Understanding Primary Law as Legal Data


Anita Gogia is an IPilogue Writer and a 2L JD Candidate at Osgoode Hall Law School.


During the Refugee Law Lab’s online seminar in September, students learned about how primary law can be used as data to increase its machine readability. Sarah Sutherland, the president and CEO of the Canadian Legal Information Institute (CanLII), discussed the role of technological solutions in legal work.

Sutherland first addressed how the structure of primary law affects its use as data. Primary law is recorded and published so that it can be referred to in ensuring proper process and governance. Legal data is “linguistic, narrational, situational, contextual, ambiguous, and process-dependent.” It is complex, just as human relationships are. When case-related documents are converted to legal data, the result is semi-structured. While section headings are visible, the content within them may not be marked or tagged in a way that makes it efficient for document creation. As a result, machine readability is weak: the content of the underlying document cannot be analyzed without manual tagging.
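To make the contrast concrete, here is a minimal Python sketch of the gap between a semi-structured document and a tagged one. Everything in it is an assumption made for illustration: the sample text, the citation "2020 XYZ 12", the helper names `split_by_heading` and `tag_citations`, and the simplified citation pattern, which is not a complete parser for any real citation format.

```python
import re

# Hypothetical excerpt of a decision: headings are visible, the body is free text.
SAMPLE = """FACTS
The applicant arrived in 2019 and filed a claim.
ANALYSIS
The panel relied on 2020 XYZ 12 and found the claim credible.
"""

def split_by_heading(text):
    """Split text into {heading: body}, treating all-caps lines as headings."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.isupper():
            current = line.strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return {heading: " ".join(body) for heading, body in sections.items()}

# Simplified assumption: a citation looks like "<year> <court code> <number>".
CITATION = re.compile(r"\b(\d{4}) ([A-Z]{2,6}) (\d+)\b")

def tag_citations(body):
    """Return the citations found in a body of text as structured records."""
    return [
        {"year": int(year), "court": court, "number": int(number)}
        for year, court, number in CITATION.findall(body)
    ]

sections = split_by_heading(SAMPLE)
print(sections["ANALYSIS"])                 # still an untagged string of prose
print(tag_citations(sections["ANALYSIS"]))  # structured records a program can query
```

Splitting on headings is the easy part. The second step, which turns prose into records, is the tagging work that either a person or a more capable parser has to do.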

Sutherland used Amazon to illustrate what analyzing large amounts of data makes possible. Amazon records millions of interactions and, through that data, can extrapolate and predict what a user may be interested in purchasing. In some instances, Amazon can even prepare shipments before customers make purchases because of its forecasting accuracy. Legal data, on the other hand, lacks “yes/no” binary data and is incompatible with statistical analysis, despite involving large data sets. Specifically, in reference to legal data, “statistical methods may not have enough data points to give reliable results,” while “machine learning works well, and natural language processing techniques are more helpful.”

Statistical analysis requires a representative random sample, a criterion that legal data cannot meet because case law itself is non-random. Sutherland noted that this selection occurs at every level. For instance, “litigants are biased towards people who have money and high conflict complex problems, cases are selected by judges for which they will write and publish decisions, and matters heard by appellate courts are selected.” Overall, people who have legal problems are not a random group.
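The effect of this layered selection can be sketched with a toy calculation in Python. The population, the dollar amounts, and the two thresholds below are invented solely to show the mechanism; they are not drawn from the seminar or from any real court data.

```python
from statistics import mean

# Hypothetical population of 1,000 disputes with amounts from 1,000 to 100,000.
disputes = [{"id": i, "amount": 1_000 * (i % 100 + 1)} for i in range(1000)]

def litigated(dispute):
    # Assumption: only higher-value disputes are taken to court.
    return dispute["amount"] >= 50_000

def published(dispute):
    # Assumption: written, published decisions exist only for the largest matters.
    return dispute["amount"] >= 90_000

all_mean = mean(d["amount"] for d in disputes)
court_mean = mean(d["amount"] for d in disputes if litigated(d))
published_mean = mean(
    d["amount"] for d in disputes if litigated(d) and published(d)
)

# Each selection step pushes the average further from the full population.
print(all_mean, court_mean, published_mean)  # 50500 75000 95000
```

Anyone who studied only the published decisions in this toy world would estimate the typical dispute at nearly twice its actual size, which is the kind of distortion a non-random sample introduces.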

Language, specifically its unintentional and intentional imprecision, is yet another reason legal data can be inefficient to use. Unintentional imprecision arises from ambiguities in human syntax, as well as from a lack of definition or precision about concepts. Intentional imprecision exists because not all situations can be anticipated and because human interpretation is subjective. For example, the legislature is mindful that if a drafted statute does not anticipate a specific set of facts, or is unclear about them, the matter will go to the courts, which will apply their tools of statutory interpretation. Sutherland suggested that we should approach data the same way we approach drafting statutes: by accounting for imprecision.

Sutherland also discussed how machine-readable law can address such problems through tagging and coding that make the law explicit, an approach referred to as “law as code.” Publishing law this way makes it usable across different applications that increase efficiency within the legal system. How can this be done? It was suggested that the law could first be expressed in a literal way, as code, and then in a human way. Rather than converting law expressed in human language to code, computer code would be converted to human language.
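One way to picture the code-first direction is the short Python sketch below, in which a rule is stored as data, evaluated against facts, and only then rendered as a sentence. The rule itself, the `Condition` and `Rule` classes, and the functions `applies` and `to_text` are all hypothetical and are not taken from the seminar or from any real statute.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Condition:
    fact: str   # name of the fact being tested
    op: str     # one of ">=", "<", "=="
    value: int

@dataclass(frozen=True)
class Rule:
    outcome: str
    conditions: tuple  # every condition must hold

# Each operator carries both its machine meaning and its wording.
OPS = {
    ">=": (lambda a, b: a >= b, "is at least"),
    "<": (lambda a, b: a < b, "is less than"),
    "==": (lambda a, b: a == b, "equals"),
}

def applies(rule, facts):
    """Evaluate the rule against a dictionary of facts."""
    return all(OPS[c.op][0](facts[c.fact], c.value) for c in rule.conditions)

def to_text(rule):
    """Render the same rule as an English sentence."""
    parts = [
        f"{c.fact.replace('_', ' ')} {OPS[c.op][1]} {c.value}"
        for c in rule.conditions
    ]
    return f"A person {rule.outcome} if their " + " and their ".join(parts) + "."

# Invented rule for illustration only.
rule = Rule(
    outcome="is eligible for the benefit",
    conditions=(Condition("age", ">=", 65), Condition("annual_income", "<", 30000)),
)

print(to_text(rule))
print(applies(rule, {"age": 70, "annual_income": 20000}))
```

Because the sentence is generated from the same data the program evaluates, the two cannot drift apart. The cost is that every concept must be reducible to an explicit test, which leaves little room for the intentional imprecision described above.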

It is unclear whether this approach would handle ambiguity in the law any better, since lawmakers would need sophisticated technical and computing skills to make it work. Even so, it has the potential to make the legal system more productive.