Presenter: Yang Xu
Web scraping is the automated process of fetching webpage content and extracting the desired data.
Scrapy is a fast, high-level Python framework for web crawling and web scraping, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
The workshop introduces web scraping and includes a hands-on project in which participants use Scrapy to scrape a webpage, parse it, and extract the desired data.
Participants will learn:
- The basics of the HTTP protocol as they apply to web scraping
- How to set up Scrapy to send requests to a remote server
- How to parse an HTML response
- How to use regular expressions (regex) to locate and extract the desired data
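To give a rough idea of the last two steps (parsing an HTML response and pulling data out with a regular expression), here is a minimal sketch using only Python's standard library `re` module, not Scrapy. The HTML snippet, the `extract_products` helper, and the pattern are hypothetical and stand in for the body of a real HTTP response.

```python
import re

# Hypothetical HTML standing in for the body of an HTTP response.
SAMPLE_HTML = """
<div class="product"><h2>Widget</h2> <span class="price">$19.99</span></div>
<div class="product"><h2>Gadget</h2>
  <span class="price">$5.00</span></div>
"""

# Capture the product name inside <h2> and the numeric price after the "$".
# \s* allows any whitespace (including newlines) between the two tags.
PRODUCT_PATTERN = re.compile(
    r'<h2>(?P<name>[^<]+)</h2>\s*'
    r'<span class="price">\$(?P<price>\d+\.\d{2})</span>'
)


def extract_products(html: str) -> list[dict[str, str]]:
    """Return one {'name': ..., 'price': ...} dict per match, in document order."""
    return [match.groupdict() for match in PRODUCT_PATTERN.finditer(html)]


if __name__ == "__main__":
    for product in extract_products(SAMPLE_HTML):
        print(product["name"], product["price"])
```

Regex applied to raw HTML breaks easily when markup changes, so in practice it is usually combined with a proper parser: in Scrapy, for example, selectors provide `.re()` and `.re_first()` methods that apply a pattern only to content already narrowed down with CSS or XPath.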
The workshop assumes a working knowledge of Python and is open to advanced undergraduate students, graduate students, faculty, and staff.
This workshop will be offered in person in Hesburgh Library. Attendance is limited to 15 participants.
Registration must be completed by Wednesday, February 8th.
Originally published at lucyinstitute.nd.edu.