Web Scraping and Text Processing 2017

April 24, 2017

Dates: Monday, May 22 - Friday, May 26, 2017
Morning Session | 10:00am - 12:00pm
Afternoon Session | 1:00pm - 3:00pm+
Location: May 22-25 at Sherrerd 101; May 26 at Corwin 127
Instructors: Will Lowe & Hubert Jin

Over the last decade, both the variety and the amount of data available to social scientists have expanded. These new data sources include administrative records (e.g., voter files, campaign finance and lobbying records), geo-referenced data (e.g., satellite maps, geocoded event data), and texts (e.g., speeches, court rulings, legislative bills). Many of these data sources can be accessed through the web, and as a consequence, techniques such as web scraping have become an essential part of a social scientist's toolkit.

The objective of this workshop is to introduce basic tools and techniques for automatic content extraction from the web, parsing, and other data-handling tasks that are commonly encountered in data-intensive research projects.

This year, the 5-day workshop will be divided into 3 sections. You can register for any or all of the sections.

Section 1: Introduction to web scraping using R. This section presupposes prior experience using R.
Section 2: Introduction to Python.
Section 3: Introduction to web scraping and text processing using Python. This section presupposes prior experience using Python.

See the syllabi below for more detail on course content.

Please check the Web Scraping and Text Processing 2017 page for updates and more information.

If you are interested in attending this workshop, you must register. Please sign up here. You will be added to the Blackboard class, which will give you access to materials and the discussion board.

syllabus-DataScrapingWithR.pdf (34 KB)
syllabus-IntroductionToPython.pdf (30 KB)
syllabus-DataScrapingWithPython.pdf (34 KB)