Common Crawl, an organization that primarily performs this action, provided the majority of the text inputs for OpenAI’s GPT3 model. For 10 points each:
[10m] Name this process of gathering information from the Internet. Twitter blamed groups engaged in this action for the need to implement a Tweet viewing limit in July 2023.
ANSWER: web scraping [or data scraping; accept word forms thereof such as scraping data or web scrape; prompt on web crawling and word forms thereof such as crawling the web]
[10h] Getty Images sued the makers of this image-generating neural net for using web scraping to take their images without compensation. Unlike the similar DALL-E and Midjourney, this neural net’s code is publicly accessible.
ANSWER: Stable Diffusion [accept Stability AI]
[10e] Another major source of data for AI language models is this large Internet encyclopedia founded by Jimmy Wales and Larry Sanger, whose logo depicts a globe as a puzzle.
ANSWER: Wikipedia [prompt on Wikimedia Foundation]
<Benjamin McAvoy-Bickford, Other Academic>