Digital Humanities

This guide provides an introduction to digital humanities (DH) theory and practice and an overview of DH methods, tools, and resources.

What is Humanities Data?

"In the digital age, data is the raw material on which discoveries are built" (via SPARC)


Many humanists hear "data" and assume they don't have it, but humanities data can take many forms, including images, music, poetry, short stories, and more. It is not necessarily quantitative, and despite the quote above, it is not necessarily digital.


Some helpful introductions to Humanities Data

“An Introduction to Humanities Data Curation” by Julia Flanders (Northeastern University) and Trevor Muñoz (Maryland Institute for Technology in the Humanities)

"Humanities Data: A Necessary Contradiction" by Miriam Posner (UCLA)

"Big? Smart? Clean? Messy? Data in the Humanities" by Christof Schöch (University of Trier, Germany)

Finding Humanities Data

This is a selected list of humanities data sources with a focus on resources designed for computation. (Researchers may also want to explore primary sources, which can also be humanities data.) No list of data sources could ever be complete. Please contact your subject librarian or the DH librarian for guidance on identifying more specific data sources.


  • Dr. Alan Liu's DH Toychest Includes data collections and datasets, demo corpora (text collections ready for use), and a plethora of other DH-related tools, resources, and guides. (It will keep you busy!)
  • HathiTrust Digital Library HathiTrust makes the texts of public domain works in its corpus available for research purposes. The works fall into two categories: non-Google-digitized volumes, which are freely available, and Google-digitized volumes, which are available through an agreement with Google. Within each category there is a distinction between public domain works available only in the US versus public domain works available anywhere in the world.
  • Google Books Ngram Viewer and Dataset
    • Ngram Viewer Play around with Google's corpus of millions of books in this optimized dashboard, which lets you graph occurrences of words and phrases in Google Books over time
    • Downloadable Dataset Download n-gram counts drawn from the full corpus of about 5 million digitized books for large-scale analysis
  • Project Gutenberg Thousands of free, open access plain-text e-books that can be used to create corpora for text analysis
  • JSTOR Data for Research (DfR) Provides datasets of JSTOR content for use in research and teaching; researchers use DfR to define and submit a desired dataset, which is then processed automatically. Available data include metadata, n-grams, and word counts for most articles and book chapters, and for all research reports and pamphlets on JSTOR.
  • EEBO-TCP: Early English Books Online* A partnership with ProQuest and with more than 150 libraries to generate highly accurate, fully-searchable, SGML/XML-encoded texts corresponding to books from the Early English Books Online Database.
  • Oxford Text Archive (OTA) Collects, catalogs, preserves and distributes digital resources for research and teaching; holds thousands of texts in more than 25 different languages
  • Proceedings of the Old Bailey, 1674 - 1913 Contains fully searchable texts related to 197,745 criminal trials held at London's central criminal court. The platform is searchable by anyone, and also allows advanced researchers direct access to its large, complex dataset
  • European Data Portal Harvests the metadata of Public Sector Information available on public data portals across European countries; information regarding the provision of data and the benefits of re-using data is also included
  • Caselaw Access Project (CAP) From Harvard University's Library Innovation Lab; researchers can use the API and Bulk Data Service to analyze 360 years of United States caselaw
  • Seeks to help collect and disseminate information about publicly available data of particular interest to digital humanities and humanities computing
  • GDELT: A Global Database of Society Supported by Google Jigsaw, provides free access to its raw datafiles, which contain the world's broadcast, print and web news in over 100 languages
  • Internet Archive Access to millions of textual, audio, visual and web documents
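
Several of the sources above supply plain text or precomputed word counts and n-grams. As a rough illustration of what those terms mean, the following Python sketch counts words and two-word sequences (bigrams) in a short passage. The function names and the sample sentence are hypothetical and chosen only for this example; none of the services listed above is assumed to tokenize text this way.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters as word tokens.

    This is a deliberately simple rule: digits and punctuation are dropped,
    so a contraction such as "don't" becomes two tokens.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (n=1 gives plain word counts)."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Hypothetical sample; in practice this would be the contents of a
# plain-text file you have downloaded.
sample = "It was the best of times, it was the worst of times."

tokens = tokenize(sample)
word_counts = ngram_counts(tokens, 1)
bigram_counts = ngram_counts(tokens, 2)

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

For a whole book, the same two functions could be applied to the text read from a file; real projects usually need more careful tokenization and must strip any license or header text that is not part of the work itself.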

Data Management for Humanists

Federal grant-funding agencies often require "data management plans" (DMPs) alongside grant applications. Properly assessing, documenting, storing, archiving, and sharing your data will save you headaches down the road: it makes your data easier to find, use, and analyze in the long term, and it facilitates collaboration with colleagues.

Several universities and organizations are developing the DMPTool to help researchers meet new data management requirements from many U.S. funding agencies. The DMPTool will help researchers:

  • Create ready-to-use data management plans for specific funding agencies.
  • Meet requirements for data management plans.
  • Get step-by-step instructions and guidance for data management plans.
  • Learn about resources and services available at your institution to fulfill the data management requirements of your grants.

DMP Tool at UC

DMP Tool Web Site

DMP Tool Blog

Learn more about Open Data from SPARC (the Scholarly Publishing and Academic Resources Coalition)

UC Data Repositories and Services

Repository service that enables UC users to manage, archive, and share digital content including data; can be used for long-term preservation, sharing, or meeting a grant’s data sharing and preservation requirements.

The Dryad Digital Repository is a curated resource that makes research data discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types.

The eScholarship suite of open access publishing services gives UC scholars direct control over the creation and dissemination of the full range of their research. eScholarship Publishing provides comprehensive publication services for UC-affiliated departments, research units, publishing programs, and individual scholars who seek to publish original, open access journals, books, conference proceedings, and other original scholarship.

A service for researchers and others to obtain and manage long-term identifiers for digital content including data, which makes digital objects easier to access and verify, thus increasing re-use and citations; contact Digital Scholarship Services (DSS) for more information.

ORCID is a nonprofit organization helping create a world in which all who participate in research, scholarship and innovation are uniquely identified and connected to their contributions and affiliations, across disciplines, borders, and time. ORCID provides a persistent digital identifier (an ORCID iD) that you own and control, and that distinguishes you from every other researcher. You can connect your iD with your professional information — affiliations, grants, publications, peer review, and more.

Step-by-step instructions on how to get your unique ORCID identifier.