
The Problem of Dark Data

A March New York Times article sounded the alarm for researchers about the scourge of dark data. Dark data does not refer to anything secret or illegal; rather, it is data created by the government and other organizations that is at risk of being lost. A more complete definition, often used in the corporate context, is “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” Concern over losing data that could lead to new discoveries has centered on scientific data stored by agencies and other organizations. Much of this data is stored on government servers, with no legal obligation that it remain available. The Trump administration’s proposed cuts to scientific research and agency funding have only increased the alarm felt by scientists and other researchers.

An additional problem is that dark data, by definition, is unknown. It can’t be verified if it can’t be found, even though we know it’s there. Somewhere. Right now, data.gov is the central repository for government-created databases, but it relies on agencies to self-report and covers, by many researchers’ estimates, only a fraction of the data the agencies create. The use of proprietary code and data.gov’s practice of linking to the websites that house the data, rather than to the databases themselves, make access even more difficult for researchers.

While there does not seem to be any federal legislation prohibiting the destruction or decentralization of these types of data, several non-profits have formed to save this data from going dark by identifying and downloading data viewed as vulnerable to deletion.

To learn more about dark data, here are some resources to get you started:




Dark Web: Exploring and Data Mining the Dark Side of the Web, Hsinchun Chen

