THOMAS bulk data access

= Introduction =

This wiki gathers information concerning public bulk access to information stored on THOMAS, a comprehensive Internet-accessible database that makes federal legislative information available to the public at no cost. THOMAS is operated by the Library of Congress and was launched in January of 1995 at the inception of the 104th Congress.

= Quick Facts =


 * At least twice as many people access congressional legislative information through third-party sources as directly through the THOMAS website. Major third-party sources include GovTrack.us, OpenCongress.org, and Sunlight's Congress app for Android.
 * Providing “bulk access to data” means releasing an entire database for use by others.
 * GPO currently publishes 6 datasets in bulk (including the Federal Register); Data.gov (launched May 2009) has 400,000 datasets; New Jersey and New Hampshire publish legislative data in bulk.
 * A coalition of organizations issued the Open House Project report, which calls on Congress to "embrace structured data by publishing the status of legislation and other information to the Web not only as it is now, but also in structured data formats." (May 2007) (http://bit.ly/HkPycb)
 * The Explanatory Statement accompanying the Committee Print of the House Committee on Appropriations for Public Law 111-8 (March 2009) articulates Congress' support for bulk access to legislative information. (http://1.usa.gov/I2UvJG p. 1770)
 * In 2008, the Library of Congress said it expected to report on the resources necessary to supply the public with raw legislative data within the first part of the calendar year. It established a bulk data task force, which has never completed its deliberations. (http://bit.ly/A4c5le)
 * Rep. Bill Foster introduced H.R. 6289 in the 111th Congress, a bill that would require some legislative data to be made available in bulk and would create a THOMAS advisory committee. (Sep. 2010) (http://1.usa.gov/HZthAp)
 * Congressional Facebook Hackathon endorses bulk access to legislative data as an action item: "Release Structured Machine-Readable Legislative Data: Providing legislative data in a bulk format to enable third-party developers to create more dynamic interfaces for legislative information." (November 2011) (http://1.usa.gov/ygzQpl)
 * 30 organizations and companies call for bulk access to legislative data and the creation of an advisory committee. (April 6, 2012)

= Blog Posts =


 * "Improve Public Access to Legislative Information" by Daniel Schuman (4/10/2012)
 * "Help improve public access to Congressional/legislative information #FDLP" by James Jacobs (3/28/2012)
 * "GovTrack Users Want Better Transparency From Congress" by Josh Tauberer (3/16/2012)
 * "Tell Congress to Open Up" by Nicole Aro (3/12/2012)
 * "Your Government Transparency 'To-Do'" by Jim Harper (3/12/2012)
 * "Partners in Data Transparency: Parliaments and Non-Profits" by Daniel Schuman (3/1/2012)
 * "Put THOMAS on the Fast Track" by Daniel Schuman (2/9/2012)
 * "Benchmarks for Measuring Success for Legislative Data Transparency" by Daniel Schuman (2/2/2012)
 * "Bulk Data at the House Legislative Data Conference" by John Wonderlich (2/2/2012)
 * "Liberate OpenGovData Now" by David Moore (2/1/2012)
 * "In #HackWeTrust - The House of Representatives Opens Its Doors to Transparency Through Technology" by Daniel Schuman (12/8/2011)
 * "House Holding Wonk-a-thon on Public Access to Congressional Info This Thursday" by Daniel Schuman (12/5/2011)
 * "Sunlight Testimony: Bulk Access to THOMAS and Access to CRS Reports" by Daniel Schuman (12/5/2011)
 * "Read the Bill 2.0" by Daniel Schuman (11/14/2010)
 * "Rep. Foster Introduces Bill To Improve THOMAS" by Daniel Schuman (9/30/2010)
 * "Apps for THOMAS: 3 Wishes" by Daniel Schuman (7/29/2010)
 * "Birds of a Feather: What's in the DISCLOSE Bills" by Daniel Schuman (5/3/2010)
 * "Tip of the Hat to THOMAS" by Daniel Schuman (1/6/2010)
 * "House Leg Branch Appropriations Review" by John Wonderlich (6/27/2009)
 * "Legislative Databases recommendation makes it to House Leg Branch Appropriations markup" by Josh Tauberer (4/14/2008)
 * "Congressman Honda on the Open House cause" by Josh Tauberer (2/1/2008)
 * Discussion on the Open House Project email list (11/14/2007)
 * "Mash-ups for government transparency" by Josh Tauberer (1/25/2007)
 * "Finding Bills Online" by Paul Blumenthal (1/9/2007)

= Policy Documents and Gov't Resources =

Government Resources

 * "House Committee on Appropriations, Omnibus Act, 2009, Committee Print of the House Committee on Appropriations H.R. 1105 / Public Law 111-8." See Book G, explanatory statement on Congressional Research Service Salaries and Expenses, the paragraph starting with the phrase "Public Access to Legislative Data" (or page 10 of this PDF) (March 2009). Key language:
   "Public Access to Legislative Data.--There is support for enhancing public access to legislative documents, bill status, summary information, and other legislative data through more direct methods such as bulk data downloads and other means of no-charge digital access to legislative databases. The Library of Congress, Congressional Research Service, and Government Printing Office and the appropriate entities of the House of Representatives are directed to prepare a report on the feasibility of providing advanced search capabilities. This report is to be provided to the Committees on Appropriations of the House and Senate within 120 days of the release of Legislative Information System 2.0."
 * The Congressional Facebook Hackathon report endorses bulk access to legislative data as an action item (November 2011)
 * "Annual Report of the Congressional Research Service of the Library of Congress for Fiscal Year 2009" (January 2010). See page 20.
 * "Remarks from the Public Printer of the United States" (October 19, 2009)

Civil Society Organization Resources

 * 30 Organizations Send Letters to Appropriators and Rulemakers regarding bulk access to THOMAS (April 10, 2012)
 * Comments Submitted for the Record by Joshua Tauberer for the House Committee on Appropriations Subcommittee on the Legislative Branch regarding bulk data for legislative information (February 6, 2012)
 * Comments Submitted for the Record by the Sunlight Foundation for the House Committee on Appropriations Subcommittee on the Legislative Branch Hearing (February 6, 2012)
 * Comments Submitted for the Record by the Sunlight Foundation for the House Committee on Appropriations Subcommittee on the Legislative Branch Hearing Regarding Bulk Access to THOMAS data (May 11, 2011)
 * Open House Project Report: "Congressional Information & the Internet: A Collaborative Examination of the House of Representatives and Internet Technology" Chapter 3: Legislation Database (May 8, 2007)

= News Stories =


 * "Open government advocates seek greater access to congressional data" Federal News Radio.Com (4/16/2012)
 * "Transparency Groups Call for THOMAS bulk downloads" Fierce Government IT (4/11/2012)
 * "Transparency Groups Say THOMAS website is outdated" Federal Computer Week (4/10/2012)
 * "An API for Federal Legislation? Congress Wants Your Opinion" Threat Level (3/5/2009)
 * "Congressional Data Mining: Coming Soon?" Mother Jones (3/5/2009)
 * "Bulk Data Downloads: A Breakthrough in Government Transparency" O'Reilly Radar (3/4/2009)
 * "Lawmakers favor outside access to legislative data" Government Executive (1/23/2008)

= Additional Resources =


 * "Government: Do you really need an API?" by Eric Mill (3/21/2012)
 * Sites that use GovTrack data
 * THOMAS RSS feeds
 * How often THOMAS is updated
 * Josh Tauberer on civic technology
 * House of Representatives Adopts Standards for Electronic posting of House and committee documents and data (committee resolution as PDF) (document naming conventions as PDF)
 * House of Representatives launches transparency portal docs.house.gov

= The History of THOMAS Generally =


 * "Congress on the Internet: New Web Server Organizes Online Information" Library of Congress Information Bulletin (1/25/1995) - Announces the creation of THOMAS and includes introductory remarks at Jan. 5 launch event by then-Speaker Gingrich
 * "Access to Government Information on the Internet," Interpersonal Computing and Technology Journal (10/1993) - Discusses the precursor to THOMAS, the Library of Congress Information System (LOCIS)
 * "The Hill on the Net: Congress Enters the Information Age," by Chris Casey (1996) - Has history of creation of THOMAS.

= States that provide bulk access to legislative data =


 * New Hampshire
 * New Jersey
 * The Sunlight Foundation scrapes legislative data from all 50 states and provides bulk access to it

= Ideas for Upgrading THOMAS =

Top Suggestions

 * Bulk Access to THOMAS data
 * Incorporate open data principles

Meta Suggestions

 * Have regular roundtable discussions with members of public and government to discuss ideas for improving THOMAS
 * Create THOMAS users group (email discussion?)
 * Programmer access page for XML access, RSS feeds, email sign-ups, etc.
 * Work to improve parsability of all search results; more structured data
 * All bills in XML
 * Single page (no pagination) that lists every bill in Congress with its status, updated daily on a new page (for scraping), preferably in a feed or XML format
 * Create and make public unique IDs for commonly used entities (or draw upon those created by others)
 * List of all committee and subcommittee members
 * Incorporate Senate amendments (see S. Res. 562)
 * Consider a redesign of the site (look at LIS, GovTrack, OpenCongress, and the public for ideas)
 * Provide more detailed history of how THOMAS came to be

Specific Suggestions

 * Make public laws searchable by law number and by name
 * Add an email alert system for bills and topics
 * Add short name of bill to weekly top 5 (plus link to archives)
 * Allow highlighting of "hot" bills -- where there's some kind of legislative action
 * Word/Phrase vs. Bill Number:
    * Have the search box handle both
    * Allow search of entire bill text
    * Make the selection of phrase vs. number sticky
 * Improve "related bills" -- run comparison of bill summaries/ text -- both in this Congress and over past Congresses
 * Make it easier to trace bills, especially when there is a substitute (e.g., H.R. 3200 became H.R. 3590)
 * Is legislation searchable by CRS tags? Make the list of tags available, and add tags to each bill so users can search for related bills.
 * Organize the THOMAS front page around what is going on in Congress today, with information on yesterday's activity and upcoming business
 * Permalink: "save" on the share/save tab is confusing; perhaps make the permalink its own link
 * Daily Digest: when sending email, include the contents of the Daily Digest, not just a link
 * Increase size of search fields
 * 3 organizing links:
    * What's going on today (running info from the floor embedded into THOMAS)
    * What happened yesterday
    * What's upcoming this week
 * Order plain-language search results for bills by topic, frequency, and tags
 * Is search Boolean?
    * Allow users to exclude terms from a search (the "not" function, e.g., Israel not steve)
 * When a search result includes a calendar, link to it automatically

Fun Suggestions

 * Create a Twitter account that tweets whenever a bill is introduced (see OLRC), goes to committee, is enacted, etc.; tweet the five most-viewed bills
 * Mobile version