Question
Common Crawl, an organization that primarily performs this action, provided the majority of the text inputs for OpenAI’s GPT-3 model. For 10 points each:
[10m] Name this process of gathering information from the Internet. Twitter blamed groups engaged in this action for the need to implement a Tweet viewing limit in July 2023.
ANSWER: web scraping [or data scraping; accept word forms thereof such as scraping data or web scrape; prompt on web crawling and word forms thereof such as crawling the web]
[10h] Getty Images sued the makers of this image-generating neural net for using web scraping to take its images without compensation. Unlike the similar DALL-E and Midjourney, this neural net’s code is publicly accessible.
ANSWER: Stable Diffusion [accept Stability AI]
[10e] Another major source of data for AI language models is this large Internet encyclopedia founded by Jimmy Wales and Larry Sanger, whose logo depicts a globe as a puzzle.
ANSWER: Wikipedia [prompt on Wikimedia Foundation]
<Benjamin McAvoy-Bickford, Other Academic>
Summary
2023 Penn Bowl @ Waterloo | 10/28/2023 | Y | 1 | 30.00 | 100% | 100% | 100% |
2023 Penn Bowl @ FSU | 10/28/2023 | Y | 1 | 20.00 | 100% | 100% | 0% |
2023 Penn Bowl (UK) | 10/28/2023 | Y | 2 | 15.00 | 100% | 50% | 0% |
Data
Toronto Weary | Toronto Joy | 10 | 10 | 10 | 30 |