
HathiTrust and You: Home

Data-driven Research using Text Mining

Workshop Info

HathiTrust and You: Data-driven Research using Text Mining is a recurring hands-on series, held with a group of co-learners, for anyone interested in learning about text mining and how to perform it using free resources held in the HathiTrust Digital Library. We learn essential text mining concepts and methods, and how to gather, work with, analyze, and visualize textual data with HathiTrust and other free web-based tools. The series progresses over four 1.5-hour sessions; each session covers a different topic, and no previous experience is needed to attend any session. This is a brand-new series for Winter 2020, and workshop lessons are posted online as we move along. The classroom is equipped with computers and internet access, although participants may bring their own laptops. The event is free and no registration is necessary. See the year tab for the workshop schedule, location, and lessons.

Questions? Email Shu at

Learning Objectives

At the end of the workshop, participants will be able to:

  1. Explain text mining and essential concepts and terminology
  2. Understand HathiTrust Digital Library and its data
  3. Process, analyze, and visualize textual data using free web-based tools
  4. Work with HathiTrust data and algorithms

About Shu

Shu is the Digital Scholarship Services Librarian at UCI Libraries. She wants to learn text mining and connect with UCI faculty, researchers, and students to solve real-world problems with it. 

This recurring series is a pilot for anyone at UCI interested in learning about this topic utilizing free HathiTrust data and tools with a supportive group of co-learners.   

About HathiTrust

Founded in 2008, HathiTrust is a not-for-profit collaborative of academic and research libraries preserving 17+ million digitized items. HathiTrust offers reading access to the fullest extent allowable by U.S. copyright law, computational access to the entire corpus for scholarly research, and other emerging services based on the combined collection. HathiTrust members steward the collection — the largest set of digitized books managed by academic and research libraries — under the aims of scholarly, not corporate, interests.

HathiTrust Digital Library preserves and provides lawful access to these 17+ million digitized items.
HathiTrust Research Center offers services that support use of the HathiTrust corpus as a dataset for analysis via text and data mining research.

UCI is a member of HathiTrust. See also

Shu Liu

242 Science Library
University of California, Irvine