Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, and others. Wikidata also provides support to many other sites and services beyond just Wikimedia projects. The content of Wikidata is available under a free license, exported using standard formats, and can be interlinked to other open data sets on the linked data web. For an introduction to Wikidata, visit here.
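Because Wikidata is built to be read by machines, its data can be queried directly over the web. The snippet below builds a request URL for the public Wikidata Query Service SPARQL endpoint. The sample query asks for five items that are an instance of (P31) house cat (Q146). It is only an illustration: the query and helper name are our own, and the code constructs the URL without sending it.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: five items that are an instance of (P31) house cat (Q146),
# with English labels filled in by the Wikidata label service.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
""".strip()


def build_wdqs_url(query: str) -> str:
    """Return a GET URL that asks the query service for JSON-formatted results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Paste the printed URL into a browser, or fetch it with any HTTP client.
    print(build_wdqs_url(QUERY))
```

The results come back in the standard SPARQL JSON results format, so any general-purpose JSON tool can consume them.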
Back in 2017, VentureBeat reported that LexisNexis was testing chatbots for legal search. Bob Ambrogi now reports that a chatbot for Lexis Advance is coming sooner rather than later, although no launch date has been announced.
The chatbot’s goal, LexisNexis said, is to give users the option of a more conversational approach to search, rather than “typing keywords into a search bar.” A Lexis Advance chatbot could have two key uses. First, it could guide researchers who are unfamiliar with a topic to the sources people typically consult for that topic. Second, it could help users revisit prior research: the bot could point out that a user did similar research three months ago and offer to show it again. Reportedly, the bot will also get better at predicting a user’s intent over time as the user interacts with the system.
Wait ’n see.
“The ‘Future Book’ Is Here, but It’s Not What We Expected,” from Wired, notes that despite seemingly confident predictions that digital technology would by now have revolutionized books with all kinds of interactive features, that hasn’t happened. Digital books haven’t changed much at all since their introduction more than 15 years ago, and they still haven’t displaced the demand for traditional books. That demand remains strong for the same reason the book’s design hasn’t changed since Gutenberg: the book blends form and function so perfectly that it nearly defies improvement. What has changed, according to Wired, is the publishing industry itself and the ease with which an author can get her work into print.
Google Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. Google launched the service in beta on September 5, 2018, and it is aimed at scientists and data journalists. Institutions that publish data online, such as universities and governments, will need to add metadata tags to their webpages describing their data: who created it, when it was published, how it was collected, and so on. Dataset Search will then index this information and combine it with input from Google’s Knowledge Graph.
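Dataset Search reads descriptions written in the schema.org `Dataset` vocabulary, typically embedded in a page as JSON-LD. The sketch below shows how a publisher might generate such a tag covering the fields mentioned above. We assume `creator`, `datePublished`, and `measurementTechnique` are the schema.org properties for "who created it," "when it was published," and "how it was collected." The helper names and all example values are hypothetical. Check Google's own structured-data guidelines before relying on any particular field.

```python
import json


def dataset_jsonld(name, description, creator, date_published, technique, url):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Organization", "name": creator},  # who created it
        "datePublished": date_published,  # when it was published (ISO 8601 date)
        "measurementTechnique": technique,  # how it was collected
        "url": url,
    }


def to_script_tag(metadata):
    """Wrap the metadata in the <script> element that goes in the page's HTML."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    meta = dataset_jsonld(
        name="River Temperature Readings",
        description="Hourly water temperature readings from river gauges.",
        creator="Example State University",
        date_published="2018-09-05",
        technique="Automated in-stream temperature sensors",
        url="https://example.edu/data/river-temps",
    )
    print(to_script_tag(meta))
```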
The initial release of Dataset Search will cover the environmental and social sciences, government data, and datasets from news organizations like ProPublica. However, if the service becomes popular, the amount of data it indexes should quickly snowball as institutions and scientists scramble to make their information accessible. Check out Dataset Search here.
From Lisa DeLuca, Where Do FOIA Responses Live? Electronic Reading Rooms and Web Sources, C&RL News (2019; 80.1):
The “Electronic Freedom of Information Act Amendments of 1996” required agencies to make eligible records available electronically. As a result, there are dozens of FOIA Libraries and Electronic Reading Rooms that are repositories for responses to agency FOIA requests. These documents are also known as responsive documents. Documents are often posted by agencies with redactions to protect personal privacy, national security, and other FOIA exemptions and exclusions. It is important for researchers, journalists, and citizens to use the terms “FOIA Libraries” and “Electronic Reading Rooms” as part of their search terminology. This will ensure they can find documents that might not be findable through a regular Google search.
There is no shortage of literature analyzing the challenges and administrative components of FOIA, including response wait times, complaints about excessive redactions, and lawsuits over access to government files. The purpose of this article is to describe where FOIA responses can be located. Searchable government FOIA information varies by agency. This column includes descriptions of several agency Electronic Reading Rooms, government sources (including Presidential Libraries), and the National Archives and Records Administration (NARA), as well as nongovernment sources, such as FOIA Mapper and MuckRock. The sources listed in this column are excellent starting points to locate current and historical FOIA content.
H/T Gary Price’s INFOdocket post.
From the press release:
The Government Publishing Office (GPO) makes available a subset of enrolled bills, public and private laws, and the Statutes at Large in Beta United States Legislative Markup (USLM) XML, a format that makes documents easier to download and repurpose.
The documents available in the Beta USLM XML format include enrolled bills and public laws beginning with the 113th Congress (2013) and the Statutes at Large beginning with the 108th Congress (2003). They are available on govinfo, GPO’s one-stop site to authentic, published Government information. www.govinfo.gov/bulkdata
H/T Gary Price, InfoDocket
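To give a sense of what "easier to download and repurpose" can mean in practice, here is a minimal sketch that pulls section numbers and headings out of USLM-style XML with Python's standard library. The sample document is a simplified, hypothetical fragment, not a real govinfo file, and its namespace is a placeholder. Real USLM files are far richer and declare their own namespace. For that reason the code matches elements by local name (`section`, `num`, `heading`) and ignores namespaces.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical USLM-style fragment; the namespace is a placeholder.
SAMPLE = """<lawDoc xmlns="urn:example:uslm-sketch">
  <main>
    <section>
      <num>Sec. 1.</num>
      <heading>Short title.</heading>
    </section>
    <section>
      <num>Sec. 2.</num>
      <heading>Definitions.</heading>
    </section>
  </main>
</lawDoc>"""


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so matching works whatever namespace a file uses."""
    return tag.rsplit("}", 1)[-1]


def section_headings(xml_text: str) -> list[tuple[str, str]]:
    """Return (number, heading) pairs for every <section> element, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for el in root.iter():
        if local(el.tag) != "section":
            continue
        num = heading = ""
        for child in el:  # look only at the section's direct children
            name = local(child.tag)
            if name == "num":
                num = (child.text or "").strip()
            elif name == "heading":
                heading = (child.text or "").strip()
        pairs.append((num, heading))
    return pairs


if __name__ == "__main__":
    for num, heading in section_headings(SAMPLE):
        print(num, heading)
```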
govinfo is a redesign of the FDsys public website, with a focus on implementing feedback from users and improving overall search and access to electronic Federal Government information. The redesigned, mobile-friendly website incorporates innovative technologies and includes several new features for an overall enhanced user experience. GPO’s Federal Digital System (FDsys) website will be retired and replaced with govinfo on Dec. 14, 2018. Here are answers to frequently asked questions about the transition.
The Foreign Law Web Archive is a collection of foreign legal materials, including gazettes and judicial sites. Many foreign legal materials are now posted online, with some jurisdictions dispensing with a print publication entirely. Certain jurisdictions’ legal materials are challenging to acquire or considered at-risk of disappearing from the web. The Law Library of Congress is now archiving the legal materials of selected jurisdictions to ensure we can continue to provide comprehensive and timely access to foreign legal materials to researchers from across the world.
Thomson Reuters reports that more than 1,500 legal organizations purchased Westlaw Edge within its first 15 weeks on the market. The company also reports that all law students will have access to Westlaw Edge beginning in January 2019.
Groups such as the Southern Poverty Law Center maintain databases that track hate groups, but the Center focuses on the big picture. First Vigil fills in the details the Center leaves out by tracking white nationalist court cases. For background on First Vigil’s creator, see The Data Scientist Tracking America’s White Supremacists, Motherboard, Nov. 14, 2018.
A collection of Trump administration documents on secrecy policy has been compiled by the Federation of American Scientists’ Project on Government Secrecy. View the collection here.
In 2016, Congress and the president established the U.S. Commission on Evidence-Based Policymaking and charged it with developing a strategy for addressing barriers to using federal data for evidence building. During the commission’s fact-finding efforts, it launched a survey of agencies and units across the federal government to better understand existing barriers to data access and use. The data collected in the survey then provided initial evidence that the commission considered in making its recommendations.
- Extended analysis of the commission survey confirms much of what the commission concluded in its final report, validating identified legal and regulatory barriers to using data. The extended analysis also leads to new findings:
  - Federal offices perceive that their roles in evidence-building activities are in niches and largely do not perceive their data collection as for a broad range of purposes like evaluation that would require better coordination across an agency.
  - Units within federal agencies exhibit wide variation in their capacity for data sharing and linkage.
  - Challenges to using data for evidence building are distributed across virtually every policy domain. Respondents identify federal tax information as especially difficult to access and use.
Despite some offices reportedly lacking resources to conduct evidence-building activities, it is still quite common for offices to conduct at least some data sharing and linking. However, agencies still indicate substantial gaps in developing metadata, sharing with third parties, conducting disclosure reviews, and engaging in disclosure avoidance protocols to protect data. Statistical agencies were by far better positioned for this work than other agencies.
H/T to InfoDocket
Gerry W. Beyer & Katherine Peters recently published an article entitled Sign on the [Electronic] Dotted Line: The Rise of the Electronic Will, Wills, Trusts, & Estates Law eJournal (2018). Here is the abstract: “The electronic will is here … almost. The last two years have seen rapid development in the area of electronic wills. As of September 2018, several states either have enacted electronic will statutes or are in the process of considering such legislation. This article provides the history of e-wills and reviews e-will statutes, both enacted and proposed, along with the Summer 2018 draft of the Electronic Wills Act.”
Here’s the abstract for Peter Martin, District Court Opinions that Remain Hidden Despite a Longstanding Congressional Mandate of Transparency – The Result of Judicial Autonomy and Systemic Indifference (2018):
The E-Government Act of 2002 directed the federal courts to provide access to all their written opinions, in text-searchable format, via a website. Ten years later the Judicial Conference of the United States approved national implementation of a comprehensive database of those opinions through a joint venture between the courts and the Government Publishing Office (GPO). Despite the promise implicit in these initiatives, public access to many thousands of federal district court decisions each year remains blocked. They are effectively hidden. Many court websites lack a clear link to opinions, only a bare majority of district courts transmit decisions to the GPO, and far too many courts and judges fail to take the steps necessary for opinion distribution beyond the parties.
Using the large volume of district court Social Security litigation to measure and illustrate these failures, the article examines their dimensions, consequences, and causes. It concludes that the problem is a large one, that it poses a major challenge to those carrying out empirical studies and judicial analytics, and that the courts’ radical decentralization combined with judicial autonomy will continue to frustrate goals of public access unless serious measures are taken at the national level. Finally, it argues that inclusion in the GPO database of federal judicial opinions should cease being optional.
From the abstract for Jason Zarin, A Comparison of Case Law Results between Bloomberg Law’s ‘Smart Code’ Automated Annotated Statutes and Traditional Curated Annotated Codes (2017):
Traditional annotated codes provide an edited list of cases, organized by topic, that cite a particular statute. Bloomberg Law has recently implemented “Smart Code,” a computer-generated citator to the United States Code. The computer-generated Smart Code is designed to compete with traditional edited annotated codes in that it uses an automated and algorithmic process to classify the cases that cite a statute into a set of ninety topics. Using legal research examples in a variety of topics of increasing abstraction, results using Smart Code are compared to traditional annotated codes (United States Code Service and United States Code Annotated) as well as to specialized looseleafs (e.g., Standard Federal Tax Reporter).
From the Oct. 30, 2018 press release: “Bloomberg Law today announced the formation of its Bankruptcy Innovation Board, which will provide input and consult on the digital Bloomberg Law: Bankruptcy Treatise and inform the direction of future technology-enhanced financial restructuring and insolvency tools on the Bloomberg Law platform. The board’s membership consists of leading bankruptcy attorneys from law firms, academia, and the judiciary.”
Very interesting development. I wonder whether Bloomberg Law will organize similar innovation boards for BNA labor and employment treatises, IP treatises, etc.
Subscript is a nonprofit legal news website that delivers its reports as infographics. “Our visual reports help lawyers keep up-to-date on Supreme Court decisions and other legal news quickly, and they introduce complex legal concepts to teachers, students, journalists and other non-lawyer politically-interested individuals.” See, for example, Political Gerrymandering and the Midterm Elections.
Here’s the abstract for Andrea L. Roth, Machine Testimony, 126 Yale Law Journal ___ (2017):
Machines play increasingly crucial roles in establishing facts in legal disputes. Some machines convey information — the images of cameras, the measurements of thermometers, the opinions of expert systems. When a litigant offers a human assertion for its truth, the law subjects it to testimonial safeguards — such as impeachment and the hearsay rule — to give juries the context necessary to assess the source’s credibility. But the law on machine conveyance is confused; courts shoehorn them into existing rules by treating them as “hearsay,” as “real evidence,” or as “methods” underlying human expert opinions. These attempts have not been wholly unsuccessful, but they are intellectually incoherent and fail to fully empower juries to assess machine credibility. This Article seeks to resolve this confusion and to offer a coherent framework for conceptualizing and regulating machine evidence. First, it explains that some machine evidence, like human testimony, depends on the credibility of a source. Just as so-called “hearsay dangers” lurk in human assertions, “black box dangers” — human and machine errors causing a machine to be false by design, inarticulate, or analytically unsound — potentially lurk in machine conveyances. Second, it offers a taxonomy of machine evidence, explaining which types implicate credibility and how courts have attempted to regulate them through existing law. Third, it offers a new vision of testimonial safeguards for machines. It explores credibility testing in the form of front-end design, input and operation protocols; pretrial disclosure and access rules; authentication and reliability rules; impeachment and courtroom testing mechanisms; jury instructions; and corroboration rules. And it explains why machine sources can be “witnesses” under the Sixth Amendment, refocusing the right of confrontation on meaningful impeachment. 
The Article concludes by suggesting how the decoupling of credibility testing from the prevailing courtroom-centered hearsay model could benefit the law of testimony more broadly.
Unlike the other search services tested, Westlaw and Westlaw Edge always return more results when a search is prefiltered than when the exact same search is run in postfiltering mode. The other services (Bloomberg Law and Lexis) always returned the same number of results regardless of search mode. So why does prefiltered Westlaw and Westlaw Edge searching produce more hits than postfiltered searching?
Kevin Rothenberg tested Westlaw’s search algorithm for answers after confirming this insight from Susan Nevelow Mart’s The Algorithm as a Human Artifact: Implications for Legal [Re]Search. The black box that is Westlaw defeated Rothenberg’s efforts: “clearly, I do not understand some important aspect of West’s search algorithm,” wrote Rothenberg in Prefiltering vs. Postfiltering: Which is the Best Method for Searching? AALL Spectrum Nov.-Dec. 2018 at 36 [recommended but paywalled].
I wonder if the West Search development team would shed some light on this unique phenomenon. Perhaps CRIV can ask. (I doubt Thomson Reuters would provide an illuminating look inside West Search’s black box.)
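Nothing below describes how West Search actually works, and Rothenberg's article does not settle the question. Purely to show why the order of filtering can change a result count at all, here is a toy model. It assumes a hypothetical engine that returns only a capped number of top-ranked documents. The cap, the scores, and the tiny corpus are all invented for illustration. With a cap in place, prefiltering narrows the corpus before ranking, so the cap is spent entirely on matching documents. Postfiltering ranks everything first and then discards non-matching hits, so it can return fewer results. Without a cap, both orders give identical results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: int
    jurisdiction: str
    score: float  # hypothetical relevance score for some fixed query


# Invented six-document corpus; the California documents happen to rank highest.
CORPUS = [
    Doc(1, "CA", 0.9), Doc(2, "CA", 0.8), Doc(3, "CA", 0.7),
    Doc(4, "NY", 0.6), Doc(5, "NY", 0.5), Doc(6, "NY", 0.4),
]


def ranked_search(docs, cap):
    """Hypothetical engine: return at most `cap` documents, highest score first."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:cap]


def prefilter(docs, jurisdiction, cap):
    # Narrow the corpus first, then run the capped search.
    return ranked_search([d for d in docs if d.jurisdiction == jurisdiction], cap)


def postfilter(docs, jurisdiction, cap):
    # Run the capped search over everything, then narrow the results.
    return [d for d in ranked_search(docs, cap) if d.jurisdiction == jurisdiction]


if __name__ == "__main__":
    print(len(prefilter(CORPUS, "NY", 3)), len(postfilter(CORPUS, "NY", 3)))
```

With a cap of 3, the prefiltered search finds all three New York documents while the postfiltered search finds none. Raise the cap above the corpus size and the two modes agree, which is what an uncapped engine would show.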
News analytics appears to be on the rise, and LexisNexis, the dominant player in legal news, intends to play its part in this new development. From yesterday’s press release:
Nexis® DaaS offers data-driven organizations distinct and differentiated advantages to harness big data’s potential:
- Comprehensive source universe—Access to petabytes of data including global print, broadcast, web news and social commentary, company and industry data, regulatory and legal data.
- Optimal data integrations—Delivery via flexible APIs providing normalized, XML-based, semi-structured data.
- Robust enrichments—Enriched with multiple feature extractors and metadata related to more than 7000 subjects and industries.
- Experienced big data partner—45 years of experience with content aggregation and multiple patents on machine learning, clustering and other big data applications decades before mainstream use.