
8 US Supreme Court cases – Johnson v Texas

https://youtu.be/E5KZj1GWA8g

CAP includes all official, book-published United States case law — every volume designated as an official report of decisions by a court within the United States.

Our scope includes all state courts, federal courts, and territorial courts for American Samoa, Dakota Territory, Guam, Native American Courts, Navajo Nation, and the Northern Mariana Islands. Our earliest case is from 1658, and our most recent cases are from 2018.

Each volume has been converted into structured, case-level data broken out by majority and dissenting opinion, with human-checked metadata for party names, docket number, citation, and date.
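To make the shape of such a case-level record concrete, here is a minimal Python sketch. The class names, attribute names, and opinion labels are assumptions made for illustration and are not CAP's actual schema; the sample case is invented placeholder data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Opinion:
    kind: str    # hypothetical labels: "majority" or "dissent"
    author: str
    text: str    # opinion text as produced by OCR


@dataclass
class Case:
    # The metadata fields named in the text; attribute names are assumptions.
    party_names: str
    docket_number: str
    citation: str
    decision_date: date
    opinions: List[Opinion] = field(default_factory=list)

    def opinions_of_kind(self, kind: str) -> List[Opinion]:
        """Return every opinion of one kind, e.g. all dissents."""
        return [op for op in self.opinions if op.kind == kind]


# Invented placeholder case, not a real decision.
example = Case(
    party_names="Doe v. Roe",
    docket_number="No. 1",
    citation="1 Ex. 1",
    decision_date=date(1900, 1, 2),
    opinions=[
        Opinion(kind="majority", author="Judge A", text="Judgment affirmed."),
        Opinion(kind="dissent", author="Judge B", text="I would reverse."),
    ],
)

dissents = example.opinions_of_kind("dissent")
```

Keeping opinions as a list inside the case mirrors the "broken out by majority and dissenting opinion" description, while the metadata stays at the case level.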

We also plan to share (but have not yet published) page images and page-level OCR data for all volumes.

Scope limits

CAP does not include:

  • New cases as they are published. We currently include volumes published through June 2018, and may or may not include additional volumes in the future.
  • Cases not designated as officially published, such as most lower court decisions.
  • Non-published trial documents such as party filings, orders, and exhibits.
  • Parallel versions of cases from regional reporters, unless those cases were designated by a court as official.
  • Cases officially published in digital form, such as recent cases from Illinois and Arkansas.

Digitization Process

We created this data by digitizing roughly 40 million pages of court decisions contained in roughly 40,000 bound volumes owned by the Harvard Law School Library.

Members of our team created metadata for each volume, including a unique barcode, reporter name, title, jurisdiction, publication date and other volume-level information. We then used a high-speed scanner to produce JP2 and TIF images of every page. A vendor then used OCR to extract the text of every case, creating case-level XML files. Key metadata fields, like case name, citation, court and decision date, were corrected for accuracy, while the text of each case was left as raw OCR output. In addition, for cases from volumes not yet in the public domain, our vendor redacted any headnotes.
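The split between corrected metadata and raw OCR text can be sketched as follows. This is an illustration only: the XML element names and the sample document are hypothetical and do not reflect the vendor's real case-level XML format.

```python
import xml.etree.ElementTree as ET

# Invented sample; the element names are hypothetical, not the real schema.
SAMPLE_XML = """
<case>
  <name>Doe v. Roe</name>
  <citation>1 Ex. 1</citation>
  <court>Example Supreme Court</court>
  <decisiondate>1900-01-02</decisiondate>
  <text>Tbe court finds for tbe plaintiff.</text>
</case>
"""

# The fields described in the text as corrected for accuracy.
CORRECTED_FIELDS = ("name", "citation", "court", "decisiondate")


def split_case(xml_string):
    """Return (corrected metadata dict, raw OCR text) for one case."""
    root = ET.fromstring(xml_string.strip())
    metadata = {
        tag: root.findtext(tag, default="").strip() for tag in CORRECTED_FIELDS
    }
    # The case text is left untouched, OCR errors included.
    raw_text = root.findtext("text", default="")
    return metadata, raw_text


metadata, raw_text = split_case(SAMPLE_XML)
```

In the sample, the metadata values are clean while the text still carries a typical OCR confusion ("Tbe" for "The"), which is the distinction the paragraph above draws.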

Data quality

Our data inevitably includes countless errors introduced by the digitization process.

Some parts of our data are higher quality than others. Case metadata, such as the party names, docket number, citation, and date, has received human review. Case text and general head matter have been generated by machine OCR and have not received human review.

You can report errors of all kinds at our GitHub issue tracker, where you can also see currently known issues. We particularly welcome metadata corrections, feature requests, and suggestions for large-scale algorithmic changes. We are not currently able to process individual OCR corrections, but we welcome general suggestions on the OCR correction process.

Data citation

Data made available through the Caselaw Access Project API and bulk download service is citable. View our suggested citation in these standard formats:

Usage & access

Case metadata, such as the case name, citation, court, and date, is freely and openly accessible without limitation. Full case text can also be viewed or downloaded for free, but you must register for an account to do so, and you may currently view or download no more than 500 cases per day. In addition, research scholars can qualify for bulk data access by agreeing to certain use and redistribution restrictions. You can request a bulk access agreement by creating an account and then visiting your account page.
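If you script your own downloads, a client-side counter can help you stay under the daily limit. The sketch below is a hypothetical helper, not part of any CAP tooling. It assumes the limit resets once per calendar day; how the service itself counts a "day" is not stated here.

```python
from datetime import date

DAILY_CASE_LIMIT = 500  # the per-day full-text limit stated above


class DailyQuota:
    """Client-side counter for full-text case requests per calendar day."""

    def __init__(self, limit=DAILY_CASE_LIMIT):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today, count=1):
        """Reserve `count` cases for `today`.

        Returns False, and reserves nothing, if the request would
        exceed the daily limit.
        """
        if today != self.day:
            # Assumption: the count resets when the calendar day changes.
            self.day = today
            self.used = 0
        if self.used + count > self.limit:
            return False
        self.used += count
        return True


quota = DailyQuota()
first_day = date(2020, 1, 1)
allowed = quota.try_consume(first_day, 500)   # uses the whole allowance
over_limit = quota.try_consume(first_day)     # one more case on the same day
next_day = quota.try_consume(date(2020, 1, 2))
```

Passing the date in explicitly, rather than reading the clock inside the class, keeps the helper easy to test.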

Access limitations on full text and bulk data are a component of Harvard’s collaboration agreement with Ravel Law, Inc. (now part of LexisNexis). These limitations will end, at the latest, in March 2024. In addition, these limitations apply only to cases from jurisdictions that continue to publish their official case law in print form. Once a jurisdiction transitions from print-first publishing to digital-first publishing, the limitations cease to apply to it. Thus far, Illinois and Arkansas have made this important and positive shift and, as a result, all historical cases from these jurisdictions are freely available to the public without restriction. We hope many other jurisdictions will follow their example soon.
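The rule in the paragraph above can be written as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, and because the text gives only "March 2024" as the latest end of the limitations, the exact end date is left as a parameter rather than hard-coded.

```python
from datetime import date

# Jurisdictions named above as having moved to digital-first publishing.
DIGITAL_FIRST = {"Illinois", "Arkansas"}


def full_text_restricted(jurisdiction, on_date, limits_end):
    """Return True if full-text access limits apply on `on_date`.

    `limits_end` is the date the limitations lapse. The caller supplies it,
    because the source text does not give an exact day.
    """
    if jurisdiction in DIGITAL_FIRST:
        return False  # digital-first jurisdictions are unrestricted
    return on_date < limits_end


# Assumed end date, for illustration only.
assumed_end = date(2024, 3, 31)
illinois = full_text_restricted("Illinois", date(2020, 1, 1), assumed_end)
other_before = full_text_restricted("Example State", date(2020, 1, 1), assumed_end)
other_after = full_text_restricted("Example State", date(2024, 4, 1), assumed_end)
```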