From Chapter One in Evaluating Machine Learning Models by Alice Zheng:

One of the core tasks in building a machine learning model is to evaluate its performance. It’s fundamental, and it’s also really hard. My mentors in machine learning research taught me to ask these questions at the outset of any project: “How can I measure success for this project?” and “How would I know when I’ve succeeded?” These questions allow me to set my goals realistically, so that I know when to stop. Sometimes they prevent me from working on ill-formulated projects where good measurement is vague or infeasible. It’s important to think about evaluation up front.

From the summary for Counting Regulations: An Overview of Rulemaking, Types of Federal Regulations, and Pages in the Federal Register (R43056, Updated September 3, 2019):

Federal rulemaking is an important mechanism through which the federal government implements policy. Federal agencies issue regulations pursuant to statutory authority granted by Congress. Therefore, Congress may have an interest in performing oversight of those regulations, and measuring federal regulatory activity can be one way for Congress to conduct that oversight. The number of federal rules issued annually and the total number of pages in the Federal Register are often referred to as measures of the total federal regulatory burden.

Bloomberg BNA has changed its name to Bloomberg Industry Group because “The new name better reflects the diverse range of businesses and professionals the company serves and the wide range of markets where it operates.” From the press release:

“Since our company was acquired by Bloomberg in 2011, we’ve developed a broad portfolio of products and solutions while serving a changing marketplace,” said Josh Eastright, CEO of Bloomberg Industry Group. “At the same time, we’ve transformed from a periodical publisher to a product- and technology-focused company. Our new name more accurately reflects who we are today—a company that empowers industry professionals with critical information to take decisive action and make the most of every opportunity.”

Jean O’Grady opines “The new name … gives us a wonderful new acronym in Legal Publishing: BIG.”

From the abstract for G. Patrick Flanagan & Michelle Dewey, Where Do We Go from Here? Transformation and Acceleration of Legal Analytics in Practice (Georgia State University Law Review, Vol. 35, No. 4, 2019):

The advantages of evidence-based decision-making in the practice and theory of law should be obvious: Don’t make arguments to judges that seldom persuade; Jurisprudential analysis ought to align with sound social science; Attorneys should pitch legal work to clients that demonstrably need it. Despite the appearance of simplicity, there are practical and attitudinal barriers to finding and incorporating data into the practice of law.

This article evaluates the current technologies and systems used to publish and analyze legal information from a researcher’s perspective. The authors also explore the technological, economic, political, and legal impediments that have prevented legal information systems from being able to keep pace with other industries and more open models. The authors detail tangible recommendations for necessary next steps toward making legal analytics more widely adopted by practitioners.

From the abstract for John Nay, Natural Language Processing and Machine Learning for Law and Policy Texts (Aug. 23, 2019):

Almost all law is expressed in natural language; therefore, natural language processing (NLP) is a key component of understanding and predicting law at scale. NLP converts unstructured text into a formal representation that computers can understand and analyze. The intersection of NLP and law is poised for innovation because there are (i.) a growing number of repositories of digitized machine-readable legal text data, (ii.) advances in NLP methods driven by algorithmic and hardware improvements, and (iii.) the potential to improve the effectiveness of legal services due to inefficiencies in its current practice.

NLP is a large field and like many research areas related to computer science, it is rapidly evolving. Within NLP, this paper focuses primarily on statistical machine learning techniques because they demonstrate significant promise for advancing text informatics systems and will likely be relevant in the foreseeable future.

First, we provide a brief overview of the different types of legal texts and the different types of machine learning methods to process those texts. We introduce the core idea of representing words and documents as numbers. Then we describe NLP tools for leveraging legal text data to accomplish tasks. Along the way, we define important NLP terms in italics and offer examples to illustrate the utility of these tools. We describe methods for automatically summarizing content (sentiment analyses, text summaries, topic models, extracting attributes and relations, document relevance scoring), predicting outcomes, and answering questions.
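The abstract's "core idea of representing words and documents as numbers" can be illustrated with a minimal bag-of-words sketch. This is not code from the paper, just a common baseline representation, written here in plain Python with invented example sentences:

```python
from collections import Counter

def bag_of_words(docs):
    """Represent each document as a vector of word counts over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = [[Counter(toks)[w] for w in vocab] for toks in tokenized]
    return vocab, vectors

docs = ["the court granted the motion",
        "the court denied the motion to dismiss"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # shared vocabulary, alphabetized
print(vectors)  # one count vector per document
```

Once legal texts are vectors like these, the downstream tasks the paper lists (topic models, relevance scoring, outcome prediction) become standard numeric machine learning problems; modern systems replace raw counts with learned embeddings, but the principle is the same.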

From the abstract for Harry Surden’s The Ethics of Artificial Intelligence in Law: Basic Questions (Forthcoming chapter in Oxford Handbook of Ethics of AI, 2020):

Ethical issues surrounding the use of Artificial Intelligence (AI) in law often share a common theme. As AI becomes increasingly integrated within the legal system, how can society ensure that core legal values are preserved?

Among the most important of these legal values are: equal treatment under the law; public, unbiased, and independent adjudication of legal disputes; justification and explanation for legal outcomes; outcomes based upon law, principle, and facts rather than social status or power; outcomes premised upon reasonable, and socially justifiable grounds; the ability to appeal decisions and seek independent review; procedural fairness and due process; fairness in design and application of the law; public promulgation of laws; transparency in legal substance and process; adequate access to justice for all; integrity and honesty in creation and application of law; and judicial, legislative, and administrative efficiency.

The use of AI in law may diminish or enhance how these values are actually expressed within the legal system or alter their balance relative to one another. This chapter surveys some of the most important ethical topics involving the use of AI within the legal system itself (but not its use within society more broadly) and examines how central legal values might unintentionally (or intentionally) change with increased use of AI in law.

The first of its kind, Paul T. Jaeger & Natalie Greene Taylor, Foundations of Information Policy (ALA Neal-Schuman, 2019) provides a much-needed introduction to the myriad information policy issues that impact information professionals, information institutions, and the patrons and communities served by those institutions. In this key textbook for LIS students and reference text for practitioners, noted scholars Jaeger and Taylor —

  • draw from current, authoritative sources to familiarize readers with the history of information policy;
  • discuss the broader societal issues shaped by policy, including access to infrastructure, digital literacy and inclusion, accessibility, and security;
  • elucidate the specific laws, regulations, and policies that impact information, including net neutrality, filtering, privacy, openness, and much more;
  • use case studies from a range of institutions to examine the issues, bolstered by discussion questions that encourage readers to delve more deeply;
  • explore the intersections of information policy with human rights, civil rights, and professional ethics; and
  • prepare readers to turn their growing understanding of information policy into action, through activism, advocacy, and education.

The ABA Profile of the Legal Profession survey reports that when lawyers begin a research project, 37% say they start with a general search engine like Google, 31% start with a paid online resource, 11% start with a free state bar-sponsored legal research service, and 8% start with print resources.

A large majority (72%) use fee-based online resources for research. Westlaw is the most-used paid online legal research service, used by nearly two-thirds of all lawyers (64%) and preferred over other paid online services by nearly half of all lawyers (46%).

When it comes to free websites used most often for legal research, 19% said Cornell’s Legal Information Institute, followed by Findlaw, Fastcase, and government websites (17% each), Google Scholar (13%), and Casemaker (11%). Despite the popularity of online sources, 44% still use print materials regularly.

The survey also reports that 10% of lawyers say their firms use artificial intelligence-based technology tools while 36% think artificial intelligence tools will become mainstream in the legal profession in the next three to five years.

In thinking about dual provider choices for legal information vendors in the BigLaw market, I believe we tend to think the licensing equation is (Westlaw + Lexis Advance). Why? The answer may be that we tend to divide the marketplace for commercial legal information into two distinct and nearly mutually exclusive segments: general, for core legal search provided by WEXIS, and specialty, for practice-specific legal search provided by Bloomberg BNA and Wolters Kluwer. This perspective assumes the adoption of BBNA and WK is only on a practice group/per seat basis while the adoption of WEXIS is on an enterprise/firm-wide basis. In addition to perceptions of editorial quality, where topical deep dives are expected from BBNA and WK but not WEXIS, perceived vendor pricing policies have influenced our take on the structure of this market.

According to Feit Consulting, the reality is quite different. Approximately 89% of AmLaw 200 firms license Wolters Kluwer and 72% of those WK firms license this service in an enterprise/firm-wide pricing plan, not on a practice group/per seat plan. That 72% figure means WK’s firm-wide install base in the AmLaw 200 is approximately 64%, or almost the same as Lexis Advance’s install rate in BigLaw.
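The 64% figure follows directly from multiplying the two Feit percentages. A quick check of the arithmetic:

```python
# Feit Consulting figures: 89% of AmLaw 200 firms license Wolters Kluwer,
# and 72% of those license it on a firm-wide plan.
wk_share = 0.89
firmwide_share = 0.72

# Firm-wide install base across the whole AmLaw 200:
install_base = wk_share * firmwide_share
print(round(install_base, 2))  # → 0.64, i.e., roughly 64% of the AmLaw 200
```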

The dual provider licensing equation really appears to be (Westlaw) + (Lexis or Wolters Kluwer). This is reinforced by statistics from Feit on the likelihood of vendor cancellation. Only 14% of Westlaw firms and 12% of WK firms are extremely or moderately likely to eliminate those services. That’s less than half the share of firms extremely or moderately likely to eliminate Lexis (30%) or BBNA (29%). For dual provider firms, (Westlaw) + (Lexis or Wolters Kluwer) appears to be a well-established equation.

ROSS Intelligence goes after “legacy” search platforms (i.e., WEXIS) in this promotional blog post, How ROSS AI Turns Legal Research On Its Head, Aug. 6, 2019. The post claims that ROSS supplants secondary analytical sources and makes West KeyCite and LexisNexis Shepard’s obsolete because its search function provides all the relevant applied AI search output for the research task at hand. In many respects, Fastcase and Casetext also could characterize their WEXIS competitors as legacy legal search platforms. Perhaps they have and I have just missed that.

To the best of my recollection, Fastcase, Casetext and ROSS have not explicitly promoted competition with each other. WEXIS has always been the primary target in their promotions. So why are Fastcase, Casetext and ROSS competing with each other in the marketplace? What if they joined forces in such a compelling manner that users abandon WEXIS for core legal search? Two or all three of the companies could merge. In the alternative, they could find a creative way to offer license-one-get-all options.

Perhaps the first step is to reconsider the sole provider option. It’s time to revise the licensing equation; perhaps it should be (Westlaw or Lexis) + (Fastcase or Casetext or ROSS).

H/T to Bob Ambrogi for featuring results from the 2019 Aderant Business of Law and Legal Technology Survey. The survey results answered the question: What technology tools rank most important to lawyers in driving efficiency? In the section on technology tools and cloud adoption, the survey asked lawyers about the technology tools that have the greatest impact on their ability to work efficiently and manage their work effectively. Out of 18 categories of tools, the two lowest ranked were AI and blockchain. Knowledge management ranked seventh. Details on LawSites.

Andrew Martineau’s Reinforcing the ‘Crumbling Infrastructure of Legal Research’ Through Court-Authored Metadata, Law Libr. J. (Forthcoming) “examines the role of the court system in publishing legal information and how that role should be viewed in a digital, online environment. In order to ensure that the public retains access to useful legal information into the future, courts should fully embrace the digital format by authoring detailed, standardized metadata for their written work product—appellate-level case law, especially. If court systems took full advantage of the digital format, this would result in immediate, identifiable improvements in free and low-cost case law databases. Looking to the future, we can speculate on how court-authored metadata might impact the next generation of “A.I.”-powered research systems. Ultimately, courts should view their metadata responsibilities as an opportunity to “reinforce” the structure of the law itself.”
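To make Martineau's proposal concrete, a court-authored metadata record might look something like the following. This schema is purely illustrative (the field names and values are invented, not drawn from the article or any standard), but it shows the kind of structured, machine-readable data a court could publish alongside an opinion:

```python
import json

# A hypothetical metadata record authored by a court for a published opinion.
# All field names and values below are invented for illustration.
opinion_metadata = {
    "citation": "2019 XX 123",
    "court": "Example State Supreme Court",
    "decided": "2019-08-01",
    "disposition": "affirmed",
    "judges": ["Doe, J.", "Roe, J."],
    "topics": ["civil procedure", "summary judgment"],
    "cited_opinions": ["2015 XX 456"],
}
print(json.dumps(opinion_metadata, indent=2))
```

Records like this, published at the source, would let free and low-cost case law databases (and future research systems) ingest authoritative structure directly rather than reconstructing it by scraping and guessing.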

“Natural language generation (NLG) is a subset of natural language processing (NLP) that aims to produce natural language from structured data,” wrote Sam Del Rowe. “It can be used in chatbot conversations, but also for various types of content creation, such as summarizing data and generating product descriptions for online shopping. Companies in the space offer various use cases for this type of automated content creation, but the technology requires human oversight—a necessity that is likely to remain in the near future.” For more, see Get Started With Natural Language Content Generation, EContent, July 22, 2019.
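At its simplest, "producing natural language from structured data" can be as basic as filling a sentence template from a record. The sketch below is a deliberately minimal illustration of the idea (the record fields are invented); production NLG systems are far more sophisticated, but the input/output shape is the same:

```python
# Minimal template-based NLG: a structured record in, a natural-language
# product description out. Field names here are invented for illustration.
def describe_product(record):
    return (f"{record['name']} is a {record['category']} "
            f"priced at ${record['price']:.2f}.")

print(describe_product({"name": "Acme Widget",
                        "category": "desk accessory",
                        "price": 9.99}))
# → Acme Widget is a desk accessory priced at $9.99.
```

The human-oversight point in the article follows naturally: even simple templates can produce fluent but wrong text when the underlying data is bad, which is why generated content still needs review.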

Your litigation analytics tool says your win rate for summary judgment motions in class action employment discrimination cases is ranked the best in your local jurisdiction according to the database used. Forget the problems with using PACER data for litigation analytics, possible modeling error, or possible bias embedded in the tool. Can you communicate this applied AI output to a client or potential client? Are you creating an “unjustified expectation” that your next client will achieve the same result?

According to the ABA’s Model Rules of Professional Conduct Rule 7.1, you are probably creating an “unjustified expectation.” However, you may be required to use that information under Model Rule 1.1 because that rule creates a duty of technological competence. This tension between Model Rule 7.1 and Model Rule 1.1 is just beginning to play out.

For more, see Roy Strom’s The Algorithm Says You’ll Win the Case. What Do You Say? US Law Week’s Big Law Business column for August 5, 2019. See also Melissa Heelan Stanzione, Courts, Lawyers Must Address AI Ethics, ABA Proposal Says, Bloomberg Law, August 6, 2019.