web scraping - 2023 Penn Bowl @ Waterloo

Question

Common Crawl, an organization that primarily performs this action, provided the majority of the text inputs for OpenAI’s GPT-3 model. For 10 points each:

[10m] Name this process of gathering information from the Internet. Twitter blamed groups engaged in this action for the need to implement a Tweet viewing limit in July 2023.

ANSWER: web scraping [or data scraping; accept word forms thereof such as scraping data or web scrape; prompt on web crawling and word forms thereof such as crawling the web]

[10h] Getty Images sued the makers of this image-generating neural net for using web scraping to take their images without compensation. Unlike the similar DALL-E and Midjourney, this neural net’s code is publicly accessible.

ANSWER: Stable Diffusion [accept Stability AI]

[10e] Another major source of data for AI language models is this large Internet encyclopedia founded by Jimmy Wales and Larry Sanger, whose logo depicts a globe as a puzzle.

ANSWER: Wikipedia [prompt on Wikimedia Foundation]


Summary


2023 Penn Bowl @ Waterloo	10/28/2023	Y	1	30.00	100%	100%	100%
2023 Penn Bowl @ FSU	10/28/2023	Y	1	20.00	100%	100%	0%
2023 Penn Bowl (UK)	10/28/2023	Y	2	15.00	100%	50%	0%

Data


Toronto Weary	Toronto Joy	10	10	10	30