Healthcare company associate GC on where legal AI products break down
Jan-Erik Asplund
Background
We spoke with an in-house legal leader at a healthcare company who oversees contracting, compliance, and public-company work with a lean team. The conversation covers why tools like Harvey and Legora struggle to justify their price against enterprise ChatGPT, where CLM and legal AI products consistently break down for non-law-firm buyers, and what it would actually take to win a deal with a skeptical, resource-constrained in-house team.
Key points via Sacra AI:
- For lean in-house legal teams, Harvey and competing legal AI products consistently fail to justify their per-seat pricing against enterprise ChatGPT plus a research tool like Westlaw, because the workflow pain they solve isn't large enough to warrant adding another system on top of what already exists. "Speaking as someone who's done an ad hoc evaluation for our own in-house team—it's pricing. The per-user, per-month pricing on Harvey, GCAI, and others I looked at is just so high. We pay thirty dollars per user per month for ChatGPT, and our Westlaw subscription is more expensive but gives a lot more than just the CoCounsel product—it includes access to the proprietary database. Price is for sure the biggest hurdle. I can't say with a straight face that this is worth the money, because we can get almost all of it done with existing tools."
- The most valuable unmet use case for in-house legal teams is a contract lifecycle management tool that can ingest a team's own documents, build a working playbook, and flag anomalies in third-party contracts with specific comparison language, a capability that vendors including Luminance have promised but failed to deliver in practice. "If it could do a substantive ingestion of all our documents and build a playbook that demonstrably worked—where it could take agreements it's never seen and say, 'This clause is about this subject. Your contracts usually say this, but this one says this. Do you want to make a change? Here's some suggested language.' Really just accurately flagging third-party documents as counter to our internal documents... It would need to be part of the contract management tool—this is what I thought I was getting with Luminance."
- Defensibility in legal AI is mostly inertia and switching costs rather than a genuine product moat, which means the category is likely to commoditize or lose out to major platform vendors like OpenAI once incumbent relationships can be unseated with price concessions. "It seems like they're all racing to be the one that absorbs all the others. From the demos and sales pitches I've seen, it's all going to become commoditized, or it's going to lose out to ChatGPT or a major platform. Inertia. That's why they're trying to gobble up as much market share as they can—it's kind of a pain to switch these tools. If they become the first mover on AI at a place like Baker McKenzie or Kirkland & Ellis with thousands of attorneys using them, that's a big switch to make. But that's not super defensible—a competitor can unseat the incumbent by offering enough price concessions if they can undercut them."
Questions
- Have you come across Legora or Harvey in your evaluation process, or seen them used in environments you know well?
- Based on the Harvey demos, how would you describe what it's actually trying to be, especially compared with something like CoCounsel or a more general ChatGPT setup?
- What, if anything, in the demo felt meaningfully differentiated from just using an enterprise ChatGPT license with decent prompts—like workflow, document grounding, review, or firm knowledge integration?
- Since you haven't looked at Legora directly, have you heard enough from peers, outside counsel, or vendors to have a view on what Legora is supposed to do day to day—like what people actually open it up to do?
- In your network, what kinds of legal teams seem most likely to seriously evaluate tools like Legora or Harvey—large law firms, bigger in-house departments, regulated teams, or something else?
- For larger in-house teams, especially ones dealing with healthcare compliance or public-company sensitive materials, how much does a tool's ability to use the team's own documents really matter—things like playbooks, prior agreements, internal templates, and clause positions?
- For an in-house team, does that make a tool feel actually purpose-built only if it can work well with your own docs and precedent, or do most of these still feel more like law firm products looking for an in-house use case?
- Where do those products actually break for a lean in-house team—is it that the workflows are too narrow, the setup is too heavy, the output isn't reliable enough across very different workstreams, or something else?
- Switching back to Harvey, how does its positioning land with you—like, what is it actually trying to be for legal teams?
- For in-house teams, especially with healthcare compliance and other sensitive workflows, where do you think Harvey is genuinely strong in real use, and where does the positioning get ahead of what it can actually do?
- If you put Harvey side by side with something like enterprise ChatGPT plus Westlaw or CoCounsel, is the main issue that it doesn't clear the bar on output quality, or that it doesn't solve enough workflow pain to justify another system?
- If an in-house team were evaluating Harvey against just expanding enterprise ChatGPT usage, what usually tips the decision one way or the other—is it pricing, security comfort around non-public materials, admin controls, or something else?
- When you're dealing with non-public board materials, financing docs, or draft disclosures, are there any governance or securities-specific constraints that would make you more or less comfortable using one of these tools versus enterprise ChatGPT?
- Do you think tools like Harvey and Legora are building toward real standalone platform positions, or are they more likely to get absorbed into broader enterprise, research, or contract management stacks?
- Where do you think tools like Harvey or Legora still have something defensible, if anything, versus where they're most exposed to just getting flattened by enterprise ChatGPT, Westlaw, or broader platform vendors?
- What do you think people outside in-house legal consistently get wrong about how legal AI works in practice on the buyer side, especially in a lean team with healthcare compliance, contracting, and public-company work all mixed together?
- Anything we haven't covered that feels important to add about how these tools actually get bought, tested, or rejected inside in-house legal teams like yours?
- When vendors promise onboarding and customer success, what do you actually need to see for that to matter—faster setup, live legal-domain help, workflow configuration, training for non-technical users, or something else?
- If a vendor had truly responsive support with legal-savvy people, would that actually change a buying decision for you, or is it still secondary to price and the product just working out of the box?
- If you were pressure-testing a vendor now, how would you try to validate support before signing—reference calls, service levels in the contract, trial-period responsiveness, or something else?
- If a vendor can't prove fast, legally informed support and easy implementation, does that basically kill the deal for a lean in-house team, even if the demo looks good?
- Since you've lived the downside of a tool that overpromised on implementation and support, what would an ideal evaluation process look like now before you'd sign anything new—like, what would you insist on seeing in a pilot or proof of concept?
- In that kind of pilot, what would you want those non-legal users to actually do successfully—submit requests, redline against fallback language, answer intake questions, route approvals, or something else?
- On the contract side, is the real value mostly in better intake and routing for the business, while the AI piece is more of a secondary assist for legal once the request gets to you?
- On the litigation or outside counsel side, have you seen any AI use case that feels more compelling than contract workflow—like invoice review, matter summaries, discovery organization, or tracking advice across firms?
- On outside counsel bills specifically, what would a tool have to do well enough for you to actually trust it—flag staffing anomalies, spot duplicate or vague entries, compare against billing guidelines, or benchmark firms against each other?
- On the compliance side, for things like FDA, AKS, Stark, or privacy review, have you seen any legal AI product get meaningfully closer to being trusted for first-pass issue spotting there, or do they all still break in basically the same way?
- In that regulated review work, when they fail, where does it usually happen—missing the real issue entirely, sounding confident but shallow, or not handling the factual nuance in the arrangement?
- When it misses on factual nuance, is that because the tools can't really reason through the business arrangement, or because people don't give them enough structured context in the prompt?
- Do you think that gap is fundamentally hard for current AI to close because the judgment comes from lived pattern recognition in healthcare arrangements, not just access to more documents or better prompts?
- Do you think the realistic ceiling for AI in regulated healthcare legal is mostly productivity on drafting and organization, rather than true substantive judgment on compliance-sensitive arrangements?
- What specific capability would make you say this saves enough time or solves enough pain to actually pay for, instead of just sticking with ChatGPT plus Westlaw or CoCounsel?
- If a vendor claimed they could do that today, what proof would you need before believing it actually works in practice?
- In that kind of test, what would success look like—high-accuracy clause spotting, low false positives, useful fallback language, or time saved versus your current manual review?
- How would you want that to work operationally: upload a third-party agreement and get a clause-by-clause comparison to your playbook, or more of a red-flag summary that points you to the sections that break from your usual positions?
- If a tool did that well on third-party contracts, would that be enough on its own to justify buying it for a lean team, or would you still need it to prove value across other work too?
- When you imagine that ideal setup, where does the AI need to show up natively so it actually reduces overhead—intake, triage, first-pass review, approvals, signature, post-signature search, or basically all of it?
- If a CLM tool nailed that review layer but was just average on the rest, would that still be enough to win, or does the end-to-end workflow still have to be solid for you to adopt it?
- What does "easy enough for your least capable client" actually mean in practice—like, what are the specific failure points you'd want a vendor to eliminate for those users?
- When that kind of off-rails behavior happens, what would a good product do instead: surface a clear next step to the user, auto-route to legal, preserve all the entered context, or something else?
- When you think about why tools get rejected after a decent demo, is the biggest miss usually that they only work on the happy path and fall apart on real-world exceptions?
- In a pilot now, would you intentionally test messy edge cases—unusual approval paths, broken intake, counterparty changes, incomplete submissions—before you'd trust any vendor?
Interview
Have you come across Legora or Harvey in your evaluation process, or seen them used in environments you know well?
Based on the Harvey demos, how would you describe what it's actually trying to be, especially compared with something like CoCounsel or a more general ChatGPT setup?
My impression is it's just a wrapper for the large language models. I didn't see a lot that distinguished it.
What, if anything, in the demo felt meaningfully differentiated from just using an enterprise ChatGPT license with decent prompts—like workflow, document grounding, review, or firm knowledge integration?
It's been a little while, but they did have some pre-built workflows that might save a little bit of prep time. What was generated, though, was not meaningfully different from what the enterprise models produce.
Since you haven't looked at Legora directly, have you heard enough from peers, outside counsel, or vendors to have a view on what Legora is supposed to do day to day—like what people actually open it up to do?
I wouldn't say I do. I feel like they're trying to do everything for everybody, but frankly, I don't know anyone in my network who even has a seat license.
In your network, what kinds of legal teams seem most likely to seriously evaluate tools like Legora or Harvey—large law firms, bigger in-house departments, regulated teams, or something else?
Larger law firms and possibly larger in-house teams.
For larger in-house teams, especially ones dealing with healthcare compliance or public-company sensitive materials, how much does a tool's ability to use the team's own documents really matter—things like playbooks, prior agreements, internal templates, and clause positions?
Internally, the legal team pressed for enterprise ChatGPT precisely so we could have it manipulate our documents and feel safe feeding in material, knowing it won't get fed back into the model or violate confidentiality.
For an in-house team, does that make a tool feel actually purpose-built only if it can work well with your own docs and precedent, or do most of these still feel more like law firm products looking for an in-house use case?
It does seem like the latter. Even if they market a little bit differently, the products just don't seem designed for smaller in-house teams where everyone has to do so many different things.
Where do those products actually break for a lean in-house team—is it that the workflows are too narrow, the setup is too heavy, the output isn't reliable enough across very different workstreams, or something else?
In my experience, setup is the biggest issue. We have Luminance as our contract lifecycle management tool, and it's supposed to be able to build playbooks. It was pitched as something that could do that from our documents, but once we actually got the product installed, there's so much setup and training required that I just don't have time to get it started. And then there's reliability—I don't find any of the output to be a hundred percent reliable, which means I always need to check it. If these tools were truly trustworthy, I could empower more junior members to do more advanced work with them, but I end up having to check it myself anyway.
Switching back to Harvey, how does its positioning land with you—like, what is it actually trying to be for legal teams?
Their marketing is "everything for everyone." But the demos I've seen—and I feel the same about GCAI and some of the others I've looked at—it's not getting me a lot more than I'm getting with our enterprise ChatGPT subscription.
For in-house teams, especially with healthcare compliance and other sensitive workflows, where do you think Harvey is genuinely strong in real use, and where does the positioning get ahead of what it can actually do?
On the research side, I trust the Westlaw products more because they have the proprietary database. The sort of chat-prompt type functionality doesn't seem to distinguish itself at all.
If you put Harvey side by side with something like enterprise ChatGPT plus Westlaw or CoCounsel, is the main issue that it doesn't clear the bar on output quality, or that it doesn't solve enough workflow pain to justify another system?
The latter. It doesn't solve enough.
If an in-house team were evaluating Harvey against just expanding enterprise ChatGPT usage, what usually tips the decision one way or the other—is it pricing, security comfort around non-public materials, admin controls, or something else?
Speaking as someone who's done an ad hoc evaluation for our own in-house team—it's pricing. The per-user, per-month pricing on Harvey, GCAI, and others I looked at is just so high. We pay thirty dollars per user per month for ChatGPT, and our Westlaw subscription is more expensive but gives a lot more than just the CoCounsel product—it includes access to the proprietary database. Price is for sure the biggest hurdle. I can't say with a straight face that this is worth the money, because we can get almost all of it done with existing tools.
When you're dealing with non-public board materials, financing docs, or draft disclosures, are there any governance or securities-specific constraints that would make you more or less comfortable using one of these tools versus enterprise ChatGPT?
As it's been described to me by our IT team, I have no hesitation using ChatGPT for those source materials. My understanding is it's sandboxed—the information isn't leaving our controlled, secured environment or being fed back to train the model. So I don't have a lot of hesitation about that.
Do you think tools like Harvey and Legora are building toward real standalone platform positions, or are they more likely to get absorbed into broader enterprise, research, or contract management stacks?
It seems like they're all racing to be the one that absorbs all the others. From the demos and sales pitches I've seen, it's all going to become commoditized, or it's going to lose out to ChatGPT or a major platform.
Where do you think tools like Harvey or Legora still have something defensible, if anything, versus where they're most exposed to just getting flattened by enterprise ChatGPT, Westlaw, or broader platform vendors?
Inertia. That's why they're trying to gobble up as much market share as they can—it's kind of a pain to switch these tools. If they become the first mover on AI at a place like Baker McKenzie or Kirkland & Ellis with thousands of attorneys using them, that's a big switch to make. But that's not super defensible—a competitor can unseat the incumbent by offering enough price concessions if they can undercut them. So inertia and the difficulty of changing an established workflow, especially across a larger user base.
What do you think people outside in-house legal consistently get wrong about how legal AI works in practice on the buyer side, especially in a lean team with healthcare compliance, contracting, and public-company work all mixed together?
They don't get how seamlessly the product needs to work right out of the gate. If something is a stumbling block for me or someone on my team as we're trying to introduce a new tool, it loses credibility almost instantly. It has to fit seamlessly into what we're already doing and tolerate less experienced or resistant users while still producing the expected work product.
Anything we haven't covered that feels important to add about how these tools actually get bought, tested, or rejected inside in-house legal teams like yours?
The thing I haven't seen anyone crack yet is where you do the demo, sign the contract, cut the check, and then you're humming the next day. Better support would help, but beyond that, introducing a new tool to my whole team requires an expert who's available to help with any problems and iron them out smoothly and immediately. That just doesn't seem to be the case for most of these offerings.
When vendors promise onboarding and customer success, what do you actually need to see for that to matter—faster setup, live legal-domain help, workflow configuration, training for non-technical users, or something else?
Live, legal-savvy help would be good, along with fast turnaround times. With our contract management tool Luminance, when there's a problem, it's a twenty-four to forty-eight hour response time at minimum—sometimes longer—and the response doesn't always solve the problem. It's been a real thorn in our side.
If a vendor had truly responsive support with legal-savvy people, would that actually change a buying decision for you, or is it still secondary to price and the product just working out of the box?
It would be a strong consideration for sure. But it's also hard to evaluate during the sales and evaluation period. I thought we were getting that with our current CLM, but once we exited the evaluation period, the tone changed.
If you were pressure-testing a vendor now, how would you try to validate support before signing—reference calls, service levels in the contract, trial-period responsiveness, or something else?
References would be good. And SLAs with real teeth—not just a ten percent discount if they fail a metric. I want them to live and die by this. I want to be able to get out of the contract if I'm displeased more than a couple of times with the support response.
If a vendor can't prove fast, legally informed support and easy implementation, does that basically kill the deal for a lean in-house team, even if the demo looks good?
Definitely. Unfortunately, contracts aren't written that way. That's why we're still using Luminance two years later.
Since you've lived the downside of a tool that overpromised on implementation and support, what would an ideal evaluation process look like now before you'd sign anything new—like, what would you insist on seeing in a pilot or proof of concept?
I'd want an easy way to test it with my non-legal users, because that's where a lot of the friction is. We need non-legal people to use the product for it to be helpful to us on the contract side. There wasn't a way to experience it as a regular user—it was either my power user account or nothing, despite asking for it. I'd want a real, lengthy demonstration that acknowledged problems and showed how those problems would be solved for less technical users.
In that kind of pilot, what would you want those non-legal users to actually do successfully—submit requests, redline against fallback language, answer intake questions, route approvals, or something else?
Submit requests, route approvals, and ask questions. I don't want my non-legal users redlining.
On the contract side, is the real value mostly in better intake and routing for the business, while the AI piece is more of a secondary assist for legal once the request gets to you?
Yes, that's certainly what I had in mind for it.
On the litigation or outside counsel side, have you seen any AI use case that feels more compelling than contract workflow—like invoice review, matter summaries, discovery organization, or tracking advice across firms?
In theory, all of those would be great use cases for AI. But we're just not a large enough team for it to make sense to even explore most of them. That said, I would love to see an AI analysis of the last two years of bills from my most-used outside counsel—to see if there are excess billers or anything that looks strange. Document analysis is another area, but again, I don't think we're a big enough team, with a wide enough range of documents, that manual review isn't still a sufficient solution.
On outside counsel bills specifically, what would a tool have to do well enough for you to actually trust it—flag staffing anomalies, spot duplicate or vague entries, compare against billing guidelines, or benchmark firms against each other?
I'd love to see anomalies flagged and firms benchmarked against each other. Those would be my top two asks.
On the compliance side, for things like FDA, AKS, Stark, or privacy review, have you seen any legal AI product get meaningfully closer to being trusted for first-pass issue spotting there, or do they all still break in basically the same way?
I haven't seen anything stand out on those types of issues.
In that regulated review work, when they fail, where does it usually happen—missing the real issue entirely, sounding confident but shallow, or not handling the factual nuance in the arrangement?
Factual nuance and just straight-up missing the issue are the two most common failures I've seen, at least in demos and elsewhere.
When it misses on factual nuance, is that because the tools can't really reason through the business arrangement, or because people don't give them enough structured context in the prompt?
The creators of the product would say the latter, but I'd say it's probably the former. Things can be described many different ways, but you need experience with the deals to understand when something is creeping up on a potential compliance violation.
Do you think that gap is fundamentally hard for current AI to close because the judgment comes from lived pattern recognition in healthcare arrangements, not just access to more documents or better prompts?
I think that's right. You could have an arrangement described in such a way that, if all you had was past precedent and documents—what an AI would be trained on—it wouldn't flag it. But someone who understood how doctors work, how healthcare entities transfer money, or how value gets transferred in these arrangements could detect it.
Do you think the realistic ceiling for AI in regulated healthcare legal is mostly productivity on drafting and organization, rather than true substantive judgment on compliance-sensitive arrangements?
At the risk of being viewed as a dinosaur in a year or five, yes. That's what I think.
What specific capability would make you say this saves enough time or solves enough pain to actually pay for, instead of just sticking with ChatGPT plus Westlaw or CoCounsel?
If it could do a substantive ingestion of all our documents and build a playbook that demonstrably worked—where it could take agreements it's never seen and say, "This clause is about this subject. Your contracts usually say this, but this one says this. Do you want to make a change? Here's some suggested language." Really just accurately flagging third-party documents as counter to our internal documents.
If a vendor claimed they could do that today, what proof would you need before believing it actually works in practice?
I'd really need a full install where I let it ingest all my documents—so I'd have to trust it to do that—and then have it show me real-world examples that I'm providing, not curated demo data.
In that kind of test, what would success look like—high-accuracy clause spotting, low false positives, useful fallback language, or time saved versus your current manual review?
The first two: accurate identification and flagging. The suggested language would be nice, but if it's flagging the issue, I'm going to know what language I want to put in there.
How would you want that to work operationally: upload a third-party agreement and get a clause-by-clause comparison to your playbook, or more of a red-flag summary that points you to the sections that break from your usual positions?
The latter, and I'd want it to work with comments I can clear as I move through it.
If a tool did that well on third-party contracts, would that be enough on its own to justify buying it for a lean team, or would you still need it to prove value across other work too?
It would need to be part of the contract management tool—this is what I thought I was getting with Luminance. It would have to live in the whole lifecycle management workflow and do all of those things well. If I have to bounce from tool to tool for every discrete part of contract lifecycle management and the contracting process, it just creates too much overhead.
When you imagine that ideal setup, where does the AI need to show up natively so it actually reduces overhead—intake, triage, first-pass review, approvals, signature, post-signature search, or basically all of it?
First-pass review is probably the most important. I'd want at least a human touchpoint on all the other steps as the contract progresses through.
If a CLM tool nailed that review layer but was just average on the rest, would that still be enough to win, or does the end-to-end workflow still have to be solid for you to adopt it?
It would be closer to winning for sure. But besides the first-pass review, it also has to be easily used by my least technically capable client.
What does "easy enough for your least capable client" actually mean in practice—like, what are the specific failure points you'd want a vendor to eliminate for those users?
They need to be guardrailed through the process, but if something anomalous happens—whether caused by them or the other party—they need to be safely guided back on track. That's where my current process breaks down. If the user does something unexpected, everything comes to a halt, and I have to step in and possibly bug the vendor to fix it, or fix it myself.
When that kind of off-rails behavior happens, what would a good product do instead: surface a clear next step to the user, auto-route to legal, preserve all the entered context, or something else?
Definitely preserve the context—figuring out what happened is a big problem we have. But ideally, it would also surface a helpful next step to the user without requiring intervention from me or anyone else.
When you think about why tools get rejected after a decent demo, is the biggest miss usually that they only work on the happy path and fall apart on real-world exceptions?
Yes, that's been my experience. The happy path works great. Anything else grinds to a halt.
In a pilot now, would you intentionally test messy edge cases—unusual approval paths, broken intake, counterparty changes, incomplete submissions—before you'd trust any vendor?
I certainly would now.
Disclaimers
This transcript is for information purposes only and does not constitute advice of any type or a trade recommendation, and should not form the basis of any investment decision. Sacra accepts no liability for the transcript or for any errors, omissions, or inaccuracies in respect of it. The views of the experts expressed in the transcript are those of the experts; they are not endorsed by, nor do they represent the opinion of, Sacra. Sacra reserves all copyright and intellectual property rights in the transcript. Any modification, copying, displaying, distributing, transmitting, publishing, licensing, creating of derivative works from, or selling of any transcript is strictly prohibited.