Episode Transcript
[00:00:03] Speaker A: Hello. Welcome to The Relay, presented by Lexamica, the leading attorney referral network. I'm Gabriel Stiritz, the founder and CEO of Lexamica. This is a show for leaders who use cutting-edge technology and AI to enhance their law firms. Our listeners are the owners and C-suite at personal injury, mass tort, and other plaintiff law firms. Joining me today is Mike Lissner, executive director and CTO at the Free Law Project.
Free Law was started in 2010 as a nonprofit using technology, data, and advocacy to make the legal ecosystem more equitable and competitive. I'm a huge fan of the work that Mike is doing at Free Law Project. I've partnered with him on projects in the past, and I'm very excited to talk about what he's doing today.
We discuss a few things that they have in the works, including AI applications for court data, a bill with Adam Schiff to bring FOIA to the judiciary, and even some services that are relevant to your plaintiff law firm, including an e-filing system. And if you stick around to the end, we're going to talk about a project that I'm very excited about that's a little bit more blue sky, but has the potential to be very revolutionary. Mike, welcome to the show.
[00:01:09] Speaker B: Thanks, Gabriel. It's great to be here.
[00:01:11] Speaker A: Yeah, great to have you. So our relationship started, I think, five years ago when I reached out to you, having found that you had reverse engineered the way that the federal court system releases documents when they are filed by parties, which was really interesting. I was working at a federal plaintiff's law firm, and so there was an opportunity to build a project together where we were able to get some documents more seamlessly in the process. I caught the vision you have for creating free public access to legal materials, and I really respect not only the vision, but the work ethic you've put behind it. Give us a high-level view of what the Free Law Project is doing. What's the scope of your data library at this point? I just think it's absolutely phenomenal.
[00:01:59] Speaker B: Sure. Yeah. So, like you said, Free Law Project is a nonprofit, so we're probably a little bit unique in the legal ecosystem for that reason. But our high-level goal is just to make the system better. Like, we want to make the judiciary better. And we do that primarily two ways.
The first is with advocacy. When we see an opportunity where, like, geez, if there was just a law, things would be so much better, we go and we pursue that, and then the major way that we do it, the second way is with technology and data.
So I think we'll get to the advocacy stuff later, hopefully, but on the data side that you asked about, we've been doing this now for over a decade, and we collect data where we see opportunities.
One data set we've got now is case law. We've been working for over a decade to create essentially the first complete, high quality, maintained, open collection of case law. It's got about 10 million decisions in it. We get maybe about 1000 more every day, working on citations, working on scanning. It's a huge project and it'll never end, but it's a little bit like Wikipedia. First you got to go back to the 1650s and start collecting the content, get it out of the books, get it however you can, put it online, and then once that's done, it's done.
Outside of that, we have, I believe, the world's largest collection of PACER filings, federal court filings.
We have metadata in that collection on about 400 million items, and actual PDFs, tens of millions of PDFs. So that's a huge dataset.
It's just massive, fully searchable, accessible, et cetera.
[00:03:49] Speaker A: So you have a massive corpus of data.
Is this one of the largest data sets of the kind that exists? My guess would be yes.
[00:03:58] Speaker B: Yeah, that's a good question.
The big thing we don't have that you see in other places is filings from state courts. So organizations that have that often have a lot more than us.
But when it comes to federal filings, we've certainly got one of the biggest collections. When it comes to case law, we've got the biggest open set that's available.
Case law is particularly hard because if you're missing one decision from the 1700s, someone's going to get mad.
And if you're missing a citation that came out in the Federal Reporter yesterday, people are going to get mad. So it's always a game of keeping it up to date and pushing it to be as good as it can be.
But I think at this point, we've got the best open collection of both of those things.
[00:04:45] Speaker A: Yeah, that's so. And then what are the applications here? I mean, obviously, law firms are willing to pay big money to Lexis and to Westlaw to roll up the data and make it presentable and help them to do research. But why do we need an open source corpus of this information? Why is that so important to our society?
[00:05:03] Speaker B: Yeah, well, one thing I come across a lot is that legal tech companies don't want to compete with their data provider. They don't want to say, hey, Lexis, I'm going to make this hot new innovation to compete with you.
Also, can you please give me some data? That's a bad situation to find yourself in.
One thing our organization has done is encouraged the legal ecosystem to flourish by creating this data and getting it out to all the innovation that's happening at the firms. What's different is that we're here to make you successful.
We're a nonprofit. We're not out there to extract every last dollar. So if you come to us and you say, hey, I'm interested in employment law, and I want to know all the initial complaints for all the employment law cases going back ten years. Cool, that is squarely in the stuff we do. We'll get that data, we'll give it to you, and then you can use it for AI, machine learning, even just put it in your system so you can search it, like, look for similar cases.
So we just try to inspire innovation.
[00:06:16] Speaker A: Yeah, absolutely. And you recently announced that you are partnered with OpenAI to provide data to their machine learning models. I imagine that you've potentially worked with other LLM providers. To me, that's absolutely massive. We need competition in this space. LLMs are only as good as the training information provided. They literally wouldn't be as smart if they didn't have the data set that you're providing for them. And I think that's really critical, because GPT isn't just useful for lawyers. It's useful for average Americans like myself, who have legal questions and will ask this thing just about anything.
[00:06:57] Speaker B: Right.
[00:06:57] Speaker A: I don't think that the average American is going to say, well, GPT is good enough to pass the bar, and then not ask it legal questions. Its ability to pass the bar, in part, is because of the data that you've provided, that you've collected over time. And I think that's really, really critical to our society functioning well. And like you said, you're also creating a more flourishing ecosystem for these legal tech services that are coming online and providing valuable help to people, from law firms all the way to consumers who have legal questions. And I think that's absolutely critical. So one of the things that you're working on is a bill with Adam Schiff to bring FOIA to the judiciary. I'd love to hear a little bit about that and why that's an important part of what you do on the advocacy side.
[00:07:48] Speaker B: Yeah. So one of the datasets we haven't talked about yet is we have a database of judges, and we have a database of their financial disclosures.
And if you've been following the news lately, I think most people are aware of all of this stuff that's happened around, like Justice Thomas and some of the other justices that haven't been disclosing things properly. That's also coming out of our data.
And in doing that work, we realized something that I think a lot of people run into eventually, which is, oh, hey, there's this amazing FOIA law for the executive branch. And if you want to do reform or if you want to understand the judicial branch, tough luck. Right? There's a couple of things that they have to publish, like the financial disclosures, and getting those has not been easy.
And beyond that, there's not a lot that you can do. And so we're working with Rep. Schiff to bring the judicial FOIA bill, and we're hopeful that'll help. Right. It's narrowly constrained. The goal is to focus on sort of the administrative and the security apparatus of the judiciary because they do have their own police force, and that should also be subject to these kinds of transparency laws.
Just so we can understand what's going on inside. Like, what are the policies they have? We don't know. They won't tell us who is even on the policy making body. They don't tell you. They treat it like it's a secret so that we literally cannot petition the people making the decisions. So there's a lot of things like that. And hopefully this bill will help.
[00:09:23] Speaker A: Absolutely. Again, the transparency, you're not advocating necessarily for a change in this direction or that direction. You're saying, hey, this needs to be open. As Americans, we have a right to information about how the country is run, how the system is working. I think that's an incredibly powerful piece of advocacy because it really should go across both sides of the aisle to say, look, we need to understand what's going on here. It's okay if we disagree about it. We got to understand it. And so that's really important.
[00:09:52] Speaker B: And I'll say it's a shame that so many of these scandals have come down on one side of the aisle.
A part of me actually wishes that we had a bipartisan problem, because if we did, we would have a bipartisan solution. But as it is, people sort of hunker down and say they're attacking the conservative justices and the conservative judges, and that's not what we're about at all.
We're all about fixing problems, and unfortunately, the problems have mostly been on one side of the aisle so far.
[00:10:20] Speaker A: Well, and I think you have a track record of doing things in a very rigorous way when it comes to unlocking information. You all are positioned well to say, hey, look, we're here to bring more things to light. We've done it in the decisions. We've done it in federal court filings, and we need to do this in the judiciary. So that actually makes a lot of sense. One of the things we're talking kind of high level here, and these are exciting things from a principled perspective. You do have some interesting tools that you've built that could benefit or do benefit law firms. Could benefit plaintiff law firms. Tell us about some of the applications that you have seen directly helping law firms in their day to day operations.
[00:11:04] Speaker B: Yeah, so, I mean, I hinted at one a minute ago. If you want to get the big picture of an area of federal law by gathering a bunch of data. Awesome, right?
If you want to build your own sort of AI model using particular kinds of legal data, whether that's like tax law, employment law, anything in the federal world. Awesome, right?
We've got that data, and we work with firms all the time.
The other big thing that we do is provide what I think of as document plumbing.
What I mean by that is when a document comes out, there's a pipeline that it goes through.
We have a lot of tools to help make that pipeline better. In a lot of firms, there's a human at every step in the pipeline, and humans make mistakes, and humans cost money, and humans are inefficient and need training, all this stuff. Right? And so we have created a handful of tools where you can sort of be like, okay, look, we're involved in this federal case.
There are email alerts from that case. Those can be captured. Those can be automatically handled and put into my internal document system. Right. And then, boom, automatically, all your cases are in your system. You no longer have to have a paralegal sitting there downloading documents all day. Right?
[00:12:31] Speaker A: Yeah.
[00:12:32] Speaker B: So we have things like that.
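To make the "document plumbing" idea concrete, here is a minimal sketch of routing a docket alert email into a matter folder by case number. It is not Free Law Project's tooling: the email format, the `route_alert` helper, the `matters` folder root, and the case number pattern are all assumptions chosen for illustration.

```python
import re
from email import message_from_string
from pathlib import PurePosixPath
from typing import Optional

# Assumed pattern for a federal case number such as "1:23-cv-04567".
CASE_NUMBER = re.compile(r"\b\d:\d{2}-[a-z]{2}-\d{3,5}\b")


def route_alert(raw_email: str, root: str = "matters") -> Optional[PurePosixPath]:
    """Return the folder an alert belongs in, or None if no case number is found.

    Hypothetical helper: a real pipeline would also download the attached
    or linked document and save it into the returned folder.
    """
    message = message_from_string(raw_email)
    subject = message.get("Subject", "")
    match = CASE_NUMBER.search(subject)
    if match is None:
        return None  # no case number found; leave it for a human to triage
    # Replace ":" so the case number is safe to use as a folder name.
    return PurePosixPath(root) / match.group(0).replace(":", "_")


# Made-up alert email used only to exercise the sketch.
sample = (
    "From: alerts@court.example\n"
    "Subject: Activity in Case 1:23-cv-04567 Doe v. Acme Corp. Order\n"
    "\n"
    "A new document has been filed.\n"
)
print(route_alert(sample))
```

Keying everything on the case number keeps the routing step simple: no document taxonomy is needed to get a filing into the right place without a person touching it.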
[00:12:33] Speaker A: If you're a high-volume federal court filing law firm, that's a massive time saver. And you're not even talking about necessarily building complex taxonomies for documents. You're talking about saying, hey, this is the court case number, let's get this filed into our system without someone touching it. That's a no-brainer. If you're involved in federal court cases, you should 100% do that. I think it's really interesting that, with the advent of LLMs, you're able to take this data set and turn it into bespoke tools. This isn't something that we had talked about before, but I can just see some absolutely amazing applications of training AI models specifically on certain types of case information, whether that's for internal use inside of a law firm or, like you said, some kind of tax accounting firm, being able to answer those questions and then cite to the actual information. Because you have ground truth with your data set, which I think is the most important thing about building any kind of generative AI solution in this space: the ability to cite to ground truth data sources. There should be zero tolerance for hallucinations, and this data set allows for that with case law and with the filings themselves. So that's fascinating. I think law firms should all be exploring that, relevant to the type of cases that they're practicing.
[00:13:57] Speaker B: Yeah, and we've built APIs as well to help with the hallucination problem. So if you have a system that's generating text, or even if you have a filing that an attorney wrote, you can send us all that text from the filing or from the generated thing, and we'll tell you, oh, here are the 20 citations within the document.
These 19 look cool. This 20th one, we don't have it in our system. So either we are missing the citation, which happens, or that citation doesn't exist because maybe it was hallucinated. Right. And so we have an API for that that makes this very simple. You can integrate it into your firm, you can integrate it into whatever legal tool you're building.
It's pretty cool.
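The citation check described here can be sketched in a few lines. This is an illustration of the idea only, not the Free Law Project API: the regular expression covers just a handful of reporter formats, and `KNOWN_CITATIONS` is a hypothetical in-memory stand-in for a lookup against a real case law database.

```python
import re

# Simplified pattern for citations like "410 U.S. 113" or "347 F.3d 123".
# Real citation grammars are far richer; this is for illustration only.
CITATION = re.compile(r"\b(\d{1,3}) (U\.S\.|F\.[23]d|F\. Supp\. [23]d) (\d{1,4})\b")

# Hypothetical stand-in for a case law database lookup.
KNOWN_CITATIONS = {"347 U.S. 483", "410 U.S. 113"}


def check_citations(text: str) -> dict[str, list[str]]:
    """Split the citations found in text into verified and unverified lists."""
    found = [" ".join(parts) for parts in CITATION.findall(text)]
    return {
        "verified": [c for c in found if c in KNOWN_CITATIONS],
        # Unverified means "not in our collection": either the collection
        # is missing it, or the citation may have been hallucinated.
        "unverified": [c for c in found if c not in KNOWN_CITATIONS],
    }


draft = (
    "See Brown v. Board of Education, 347 U.S. 483 (1954), "
    "and Smith v. Jones, 999 U.S. 999 (2031)."
)
result = check_citations(draft)
print(result["verified"])
print(result["unverified"])
```

As in the conversation, an unverified citation is a flag for human review rather than proof of a hallucination, since the reference collection itself can have gaps.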
[00:14:43] Speaker A: That's incredible. Look, the largest valuation in the space right now, or the fastest growing valuation, is Harvey, which is doing internal large language model training. It's unclear exactly what they're doing in the space, but there's massive hype, massive valuations around this. And you have built this on a nonprofit, a true nonprofit, not a fake nonprofit like OpenAI, to really get this in the hands of people who can do some good with it. So that's fascinating. That solves a massive problem in the space right now.
My only hope is that you all will eventually be able to expand your scope to some of the state-level court filings as well, and have a comprehensive database of this, because, man, it's only going to be more important, and the ability to tie generative AI to this data set is a way to really give the public the ability to understand the law for themselves. And I think that's fantastic. Something else that you are working on is maybe a little more blue sky or early stage at this point, but something that could directly impact law firms who are listening to this: the beginnings of an open source court management system. I'd love to hear a little bit about that.
[00:16:00] Speaker B: Yeah, I mean, that's the perfect lead in, right?
You sort of mentioned how wouldn't it be great if we got into the state world?
And a lot of what we do now is we gather data from the outside and it's really hard. You're dealing with scraping HTML and trying to download the latest documents when they come out. And it's not like these systems tell you, oh, hey, there's a new document over here. Come and get it. You have to figure that out in the first place.
So we, after doing that for many, many years, have decided what would happen if we got inside the court, what would happen if we built the first open system for the court to use so that we can bring the innovation of AI and LLMs. And hell, wouldn't it be cool if the open case law data set was also part of your filing system?
It could be checking for hallucinated citations, for example. Right. When you file.
These sorts of things are not science fiction. The systems that courts have are laughably simplistic and bad. They do a few things well.
They compete based on how well they do procurement. They compete based on having really complicated contracts. We've actually been using FOIA to get these contracts.
Texas is paying one and a half million dollars every month for their case management system and case management system sounds sophisticated. Sounds like this big complicated thing. It's a website.
At the end of the day, it's some software, and you have to do the math. What's 1.5 million times twelve? How many salaries could that pay for?
Somewhere along the line, the math just makes zero sense. So we're hoping to go into that, build an open system that any court can adapt and start lowering those costs so the courts can spend them in better ways, start providing better technology to people so they get a better shake when they use the justice system, make the people inside the courts have better tools, too. I don't think it's going to be that. I mean, it's complicated, but I think it's within the realm of possibility. So we're researching that now to sort of figure out what the subtleties are there.
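The arithmetic behind the monthly figure quoted above is quick to work through. In this sketch the monthly cost comes from the conversation, while the salary figure is a made-up assumption used only to answer the "how many salaries" question; it does not come from the source.

```python
MONTHLY_COST = 1_500_000  # dollars per month, as quoted in the conversation
ASSUMED_SALARY = 150_000  # hypothetical annual salary; NOT from the source

annual_cost = MONTHLY_COST * 12
salaries_funded = annual_cost // ASSUMED_SALARY

print(f"Annual cost: ${annual_cost:,}")
print(f"Salaries funded at ${ASSUMED_SALARY:,} each: {salaries_funded}")
```

Under that assumed salary, the yearly spend of $18 million would cover 120 full-time positions; a different salary assumption changes the count but not the scale.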
[00:18:18] Speaker A: Well, and I think that one of the reasons, selfishly, why law firms that are dealing with a court system would want this is, to your point early on, it will be a better service. You have legacy providers that have sat on cash cow contracts for a very long time that have no incentive to innovate. It will be very hard to unseat the incumbent competition. But at the same time, there is a real reason for all parties to want this to become the future reality. So I would encourage you, if you're listening to this, go and take a look at the Free Law Project, and you can donate. They're a 501(c)(3).
They are working on new tools. You should check that out. You should reach out to Mike if you have a speculative question that you think could be a tool that your law firm wants to build internally. I know not a lot of our listeners have the ability to go in and take raw data like this and turn it into a tool, but it's always worth having a conversation. And if nothing else, I really believe that Free Law Project is doing something that is incredibly valuable for our society and something that more people need to know and understand: that open source legal data truly underpins our ability to live as a free society. So, Mike, thank you so much for taking time to be on the show today. Really appreciate it. I know you're super busy, but always a pleasure.
[00:19:35] Speaker B: Hey, it's my pleasure to be here. It's always fun to talk about this stuff.