Healthcare compliance has entered the machine-learning era, and most organizations have not yet noticed.
Providers are using artificial intelligence (AI) to generate documentation, surface reimbursable conditions, and tighten coding workflows. Regulators and payers are using AI to detect abnormal patterns, flag statistical outliers, and identify documentation that does not align with expected clinical behavior.
Both sides are operating faster than traditional human oversight can follow. That much has been said.
What has not been said clearly enough is this: both sides are increasingly using the same classes of technology – natural language processing (NLP), predictive analytics, anomaly detection, generative models, pattern recognition, and longitudinal modeling.
The provider side and the enforcement side are no longer playing different games with different tools. They are playing the same game with the same tools, just pointed in opposite directions.
We are watching the emergence of a technological chess match in healthcare compliance: two grandmasters at the same board, each anticipating the other’s moves, each running the same kinds of pattern recognition against the same patient populations. One side is optimizing for reimbursement accuracy. The other is optimizing for the detection of reimbursement inflation. The contest itself has become algorithmic.
From Coder Versus Auditor to Model Versus Model
For most of my career, the structure of a Medicare audit dispute was straightforward to describe. A coder made a coding determination. An auditor made a different one. A statistician (sometimes me) argued about the sampling and extrapolation that connected those determinations to a multi-million-dollar demand.
The conflict was human, slow, documentable. You could put a name on every decision in the chain.
That is no longer the structure.
Today, when an AI-assisted coding tool surfaces a diagnosis from the chart and a clinician accepts it, that determination has been shaped by a model. When a Unified Program Integrity Contractor (UPIC), a Recovery Audit Contractor (RAC), or a Supplemental Medical Review Contractor (SMRC) flags that claim for review, the flag itself is increasingly the output of a model – a behavioral or statistical anomaly detector running across a population of providers.
By the time a human auditor opens the file, the suspicion has already been algorithmically generated. The provider’s response, if drafted with AI assistance, will be partially algorithmic too. The conflict is no longer coder versus auditor or hospital versus payer. It is increasingly optimization models versus detection models – and the human professionals on both sides are interpreting the outputs of those models more than they are generating the underlying judgments themselves.
Consider how this plays out in a familiar setting. Say a mid-sized cardiology practice deploys an AI documentation assistant that “listens” to the encounter, drafts the note, and suggests both diagnoses and CPT® codes for clinician review. The clinician accepts the suggestions in roughly 90 percent of encounters, because they are usually right. Six months later, the practice’s evaluation and management (E&M) distribution has shifted measurably toward higher-level visits – enough that the practice now sits in the upper quartile of its specialty peer group.
No one inside the practice has noticed. No individual chart looks problematic. But somewhere in a contractor’s data warehouse, a model has noticed, because that is precisely what the model is built to do. The drift was generated by an algorithm. The detection was generated by an algorithm. By the time human professionals enter the picture, the case is already most of the way built.
Recursive Automation and Why It Matters
There is a phrase I have started using to describe this dynamic, and I think it is worth introducing here: recursive automation. The concern is not simply that documentation is being automated, or that auditing is being automated. The concern is that algorithms are now responding to the outputs of other algorithms, with diminishing human verification at each step.
Consider the loop: an NLP engine reads a chart and proposes diagnoses. A coding-assist model maps those to ICD-10 and CPT codes. A claim is generated. On the payer side, an anomaly detection model evaluates that claim against the provider’s historical pattern, peer benchmarks, and population norms. If the model flags it, a sampling algorithm pulls it into a review universe. If the review yields a denial, an extrapolation model projects an overpayment across thousands of similar claims.
The provider then engages a defense, and increasingly, that defense includes AI-assisted analytics that look for the same statistical irregularities in the auditor’s methodology that the auditor’s models were looking for in the provider’s behavior. Models, all the way down. For the first time in healthcare compliance, much of the substantive analytic work on both sides may be performed by systems that are reading the outputs of other systems, rather than reading clinical documentation directly.
The technical mechanics here are not exotic. Contractor-side anomaly detection models are statistical pattern recognizers: variants of isolation forests, density-based clustering, and increasingly, transformer-based architectures trained on claims-level features like E&M distribution, modifier utilization, case mix, and longitudinal trajectory. It is the same toolkit fraud detection has used in banking for two decades, applied to healthcare claims at scale. On the documentation side, large language models with healthcare fine-tuning read encounter transcripts, suggest diagnoses and codes, and generate supporting narratives in seconds. The clinician’s role has shifted from author to reviewer – and reviewer fatigue, well-documented elsewhere, is just as real here.
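Here is a minimal sketch of what a contractor-style outlier screen can look like, written in Python against synthetic data. The feature set, peer distributions, and contamination rate are illustrative assumptions on my part, not any contractor’s actual model; the point is how little machinery it takes to flag a drifted provider.

```python
# Minimal sketch of a contractor-style outlier screen. All features,
# distributions, and thresholds are illustrative assumptions, not any
# contractor's actual model.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic peer group: one row per provider, built from the kinds of
# claims-level features described above.
n_providers = 500
peers = np.column_stack([
    rng.normal(0.52, 0.05, n_providers),  # share of 99214 among established visits
    rng.normal(0.08, 0.02, n_providers),  # share of 99215
    rng.normal(0.16, 0.04, n_providers),  # modifier 25 rate
    rng.normal(1.00, 0.10, n_providers),  # case-mix index
])

# A provider whose AI-assisted workflow has drifted upward.
drifted = np.array([[0.68, 0.16, 0.31, 1.05]])

model = IsolationForest(contamination=0.05, random_state=0).fit(peers)

# score_samples: lower (more negative) means more anomalous.
print("peer median score:", round(float(np.median(model.score_samples(peers))), 3))
print("drifted provider: ", round(float(model.score_samples(drifted)[0]), 3))
print("flagged as outlier:", model.predict(drifted)[0] == -1)  # -1 = outlier
```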
Three Patterns I Am Seeing in the Field
Let me get concrete about what this looks like in practice. The patterns below are composites, drawn from years of compliance analytics work and the kinds of behavioral signatures that compliance risk models surface routinely. None point to a single client, and none require fraud to produce a problem. They are simply what happens when AI-assisted workflows interact with population-level statistical surveillance.
Pattern One — E&M distribution drift in a primary care group.
Say a multi-site primary-care group adopts ambient AI documentation. Within nine months, their 99214 utilization moves from 52 percent of established patient encounters to 68 percent. Their 99215 utilization doubles. The clinicians have not changed how they practice medicine. The AI is simply better at surfacing (and documenting) the elements that justify a level four or five visit.
Every individual chart will hold up under review. But the group’s distribution is now a textbook outlier signature, and any peer-comparison model run by a UPIC or a Medicare Advantage (MA) payer will flag them as a target. The clinical work did not change. The statistical footprint changed dramatically.
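A compliance team does not need a contractor’s infrastructure to see this coming. A population stability index (PSI) calculation, run monthly over the E&M mix, would surface drift of this kind long before a peer-comparison model did. The figures below mirror the hypothetical above; the 0.10 and 0.25 thresholds are common industry rules of thumb, not regulatory standards.

```python
# Sketch of a distribution-drift check on an E&M mix using the
# population stability index (PSI). Figures mirror the hypothetical
# above; thresholds are rules of thumb, not regulatory standards.
import numpy as np

def psi(expected, actual, eps=1e-6):
    """PSI = sum((actual - expected) * ln(actual / expected))."""
    expected = np.clip(np.asarray(expected, float), eps, None)
    actual = np.clip(np.asarray(actual, float), eps, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

# Established-patient E&M mix: [99212, 99213, 99214, 99215]
baseline = [0.06, 0.38, 0.52, 0.04]   # before the AI rollout
current  = [0.03, 0.21, 0.68, 0.08]   # nine months later

score = psi(baseline, current)
print(f"PSI = {score:.3f}")           # about 0.19 with these figures
# Common rules of thumb: > 0.10 is a moderate shift, > 0.25 a major one.
if score > 0.10:
    print("Investigate: the statistical footprint has moved.")
```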
Pattern Two — modifier 25 and modifier 59 patterns in a specialty practice.
A specialty practice using AI-assisted coding sees its modifier 25 application rate rise from 16 percent to 31 percent of eligible encounters over a year. The model is technically correct each time – there is documentation supporting a separately identifiable E&M service – but the cumulative pattern is the kind of signature that National Correct Coding Initiative (NCCI) edit analytics and contractor behavioral models are explicitly designed to catch. Modifier utilization is one of the most heavily monitored signals in claims surveillance, and a sustained shift of that magnitude almost certainly triggers review.
The practice may be entirely defensible, chart by chart, and still find itself in an extrapolated overpayment demand because the population-level signature is what drove the audit selection.
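For readers who want to see why a shift of that magnitude “almost certainly” triggers review, the arithmetic is unforgiving. A two-proportion z-test on hypothetical encounter volumes (the counts below are invented for illustration) shows the signal is nowhere near the boundary of statistical noise:

```python
# Is a modifier 25 rate shift from 16% to 31% statistically detectable?
# Two-proportion z-test on invented encounter counts.
from math import sqrt
from statistics import NormalDist

n1, x1 = 4000, 640    # baseline year: 640 / 4000 = 16%
n2, x2 = 4000, 1240   # current year: 1240 / 4000 = 31%

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.1f}, p = {p_value:.3g}")
# z is roughly 15.8 at these volumes: far beyond anything a
# surveillance model tracking modifier rates could miss.
```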
Pattern Three — diagnosis code expansion in a Medicare Advantage context.
This is the one regulators are watching most closely right now. AI documentation tools are very good at surfacing chronic conditions that are historically under-documented: diabetic complications, vascular disease, depression, chronic kidney disease (CKD) staging, and the rest. From a clinical accuracy standpoint, that is arguably an improvement. From a Risk Adjustment Data Validation (RADV) audit standpoint, it is a detection-model magnet. RADV contractors are now running their own NLP against the same charts to test whether the conditions submitted on encounter data are actually supported by the documentation. When provider-side AI is suggesting the diagnosis, and contractor-side AI is auditing it, the validation cycle has become essentially algorithmic on both ends.
In each of these cases, the human professionals are doing exactly what they are supposed to do. Clinicians are reviewing AI-generated documentation. Coders are accepting reasonable code suggestions. Compliance officers are running periodic chart audits and finding nothing wrong. The problem is not chart-level; it is distributional. And distributional problems are invisible to chart-level compliance programs.
Why Most Organizations Are Not Prepared for Either Side
On the documentation side, the unpreparedness shows up as a governance gap. AI-assisted documentation tools are being deployed at the practice level, without a coherent answer to a deceptively simple question: when an algorithm proposes a diagnosis and a clinician accepts it, who owns the accuracy? The clinician signed the note. The vendor wrote the model. The hospital deployed the workflow.
The False Claims Act (FCA) will not care about that ambiguity. Neither will the 60-day overpayment rule under Section 6402(a) of the Patient Protection and Affordable Care Act (PPACA), which requires overpayments to be reported and returned within 60 days of identification.
And here is where the FCA exposure gets uncomfortable. The “reckless disregard” standard has been interpreted to include situations in which a provider should have known about a problem and failed to investigate it. Deploying an AI documentation tool, observing a measurable shift in coding distributions, and choosing not to investigate that shift starts to look very much like reckless disregard – particularly if the shift increases reimbursement.
Compliance programs that treat AI-assisted documentation as a productivity tool, rather than a regulated activity, are accumulating exposure they cannot yet see.
On the enforcement side, the unpreparedness shows up as a sophistication gap. Most provider-side compliance programs are still oriented around individual chart audits, the assumption being that compliance risk is identified one record at a time.
But contractor-side enforcement has moved toward population-level statistical surveillance. The Centers for Medicare & Medicaid Services (CMS) Program Integrity Manual, particularly Chapter 8, has long contemplated statistical sampling and extrapolation, but the data analytics that drive audit selection in the first place have evolved well beyond what most provider compliance officers track.
Outlier detection does not look at your charts. It looks at the shape of your data, compared with the shape of your peers’ data. By the time you are flagged, the model has already concluded that something about your behavior is statistically improbable. The audit is the consequence of that conclusion, not the source of it.
Defending against population-level statistical surveillance with chart-level compliance tools is like bringing a microscope to a satellite photo. The scale is wrong. Once a contractor has built a sample, applied RAT-STATS to extrapolate, and issued a demand, the provider is left arguing about the methodology of a process whose entire premise – that the provider’s behavior was anomalous enough to warrant review – was established algorithmically before the first chart was ever pulled.
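The financial mechanics deserve a number. The sketch below reproduces, in simplified form, the arithmetic behind an extrapolated demand: a per-claim overpayment found in a small sample, projected to the full claims universe at the lower limit of a one-sided 90 percent confidence interval, the figure contractors commonly demand. Every number is invented; real demands follow the Chapter 8 and RAT-STATS methodology this only approximates.

```python
# Simplified sketch of the arithmetic behind an extrapolated demand.
# Every figure is invented; real demands follow Chapter 8 / RAT-STATS
# methodology, which this only approximates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

universe_size = 12_000                 # claims in the sampling frame
sample = rng.gamma(2.0, 40.0, 100)     # per-claim overpayment in a 100-claim sample
sample[rng.random(100) < 0.6] = 0.0    # most sampled claims had no overpayment

n = len(sample)
mean, sd = sample.mean(), sample.std(ddof=1)

# Point estimate, and the lower limit of a one-sided 90% confidence
# interval, the figure commonly used for the demand.
point = universe_size * mean
t90 = stats.t.ppf(0.90, df=n - 1)
demand = universe_size * (mean - t90 * sd / np.sqrt(n))

print(f"mean overpayment in sample:  ${mean:,.2f} per claim")
print(f"point estimate:              ${point:,.0f}")
print(f"demand at 90% lower bound:   ${demand:,.0f}")
```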
What Compliance Has to Become
If both sides are running similar technology, the compliance program of the future cannot be a paper-based, one-chart-at-a-time operation. It must become something closer to what the enforcement side already is: a continuous, model-aware, population-level monitoring function.
That means understanding your own data the same way a regulator’s anomaly detection model understands it: not as a collection of charts, but as a distribution. It means knowing where you sit relative to peer benchmarks, before a contractor tells you. It means running your own outlier detection against your own claims, ideally on a continuous basis, and treating distributional drift as a leading indicator, rather than a coincidence. It means treating AI-assisted documentation as a governed activity, with validation discipline, override tracking, and clear ownership of accuracy. And it means recognizing that statistical surveillance is now a permanent feature of the regulatory landscape, not a periodic event.
Practically, that translates into a short list of things compliance programs should be doing now. Track E&M distributions monthly against specialty peer benchmarks, not just internal historical baselines. Monitor modifier utilization rates – particularly modifier 25 and modifier 59 – and investigate sustained shifts. Audit a stratified sample of AI-assisted encounters for whether the documentation reflects the encounter as performed, not just whether the codes are technically defensible. Establish a written governance framework that names an owner for AI-generated coding accuracy. And build (or buy) the analytic capability to look at your own claims the way a contractor’s model looks at them.
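To make the last item concrete, a first-pass version of that monthly self-check could be as simple as the sketch below, where the peer benchmarks, signal names, and two-standard-deviation flag threshold are all placeholders an organization would replace with its own specialty data:

```python
# Sketch of a monthly self-surveillance pass. Benchmarks, signals,
# and the flag threshold are placeholder assumptions.
from __future__ import annotations

# Hypothetical specialty peer benchmarks: (mean, standard deviation).
BENCHMARKS = {
    "pct_99214":   (0.52, 0.05),
    "pct_99215":   (0.06, 0.02),
    "modifier_25": (0.16, 0.04),
    "modifier_59": (0.05, 0.02),
}

def monthly_review(observed: dict[str, float], flag_at: float = 2.0) -> list[str]:
    """Flag any signal more than `flag_at` standard deviations from peers."""
    alerts = []
    for signal, value in observed.items():
        mean, sd = BENCHMARKS[signal]
        z = (value - mean) / sd
        if abs(z) >= flag_at:
            alerts.append(f"{signal}: {value:.0%} vs peer mean {mean:.0%} (z = {z:+.1f})")
    return alerts

# This month's rates, pulled from your own claims feed.
this_month = {"pct_99214": 0.68, "pct_99215": 0.16,
              "modifier_25": 0.31, "modifier_59": 0.06}

for alert in monthly_review(this_month):
    print("ALERT:", alert)
```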
None of this requires the most expensive AI tools on the market; it requires an understanding of the dynamic. The provider organizations that adapt will not be the ones with the biggest technology budgets. They will be the ones whose compliance programs have absorbed the central insight: compliance has become a contest between models, and the human professionals on both sides are increasingly interpreters, rather than originators, of the analytic work.
The Underlying Point
Healthcare compliance may, for the first time, become less about individual charts and more about competing statistical models interpreting the same patient populations. That is not a futurist’s prediction. It is a description of what is already happening in audit selection, in coding workflows, in extrapolation defenses, and in the early case-building stages of every major Medicare program integrity contractor.
The grandmaster analogy is the right one. Both sides are studying the same board. Both sides are running similar engines.
The question is no longer whether your organization will encounter algorithmic auditing; it is whether your compliance program is positioned to play the game, or only to react to it after the move has already been made.