Lucy Training: Web Scraping with Python

Presenter: Yang Xu

Web scraping is the automated process of fetching webpage content and extracting the desired data.

Scrapy is a fast, high-level web crawling and web scraping framework for Python, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

The workshop introduces web scraping and includes a hands-on project in which participants use Scrapy to scrape a webpage, parse its content, and extract the desired data.

Participants will learn:

The HTTP protocol basics involved in a web scraping task
How to set up Scrapy to send requests to a remote server
How to parse an HTML response
How to use regular expressions (regex) to locate and extract the desired data
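To give a feel for the last two objectives, here is a minimal sketch of parsing an HTML response and applying a regex to the extracted text. It deliberately uses only Python's standard library (`html.parser` and `re`) rather than Scrapy, so it runs without installing anything; the sample HTML, the `ItemTextParser` class, the `extract_prices` helper, and the price pattern are all hypothetical illustrations, not workshop material.

```python
import re
from html.parser import HTMLParser

# Hypothetical HTML body, standing in for the text of an HTTP response.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Catalog</h1>
    <ul>
      <li class="item">Widget A - $12.50</li>
      <li class="item">Widget B - $7.00</li>
      <li class="note">Prices subject to change</li>
    </ul>
  </body>
</html>
"""


class ItemTextParser(HTMLParser):
    """Hypothetical parser: collects the text of each <li class="item">.

    Matches only an exact class="item" attribute; elements with several
    classes are ignored to keep the sketch short.
    """

    def __init__(self):
        super().__init__()
        self._in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "li" and ("class", "item") in attrs:
            self._in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Accumulate text only while inside a matching <li>.
        if self._in_item:
            self.items[-1] += data


# Regex for a dollar amount such as "$12.50"; group 1 captures the number.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")


def extract_prices(html):
    """Return (name, price) pairs found in the item list (hypothetical helper)."""
    parser = ItemTextParser()
    parser.feed(html)
    parser.close()
    results = []
    for text in parser.items:
        match = PRICE_RE.search(text)
        if match:
            name = text.split(" - ")[0].strip()
            results.append((name, float(match.group(1))))
    return results


if __name__ == "__main__":
    for name, price in extract_prices(SAMPLE_HTML):
        print(f"{name}: {price:.2f}")
```

In Scrapy itself, the same step would typically go through the response's selectors instead of a hand-written parser, for example `response.css("li.item::text").re(r"\$(\d+(?:\.\d{2})?)")`, which applies the regex to the selected text nodes.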

The workshop assumes a working knowledge of Python and is open to advanced undergraduate students, graduate students, faculty, and staff.

This workshop will be offered in person in Hesburgh Library. Attendance is limited to 15 participants.


Registration must be completed by Wednesday, February 8th.

Originally published at lucyinstitute.nd.edu.