EDITOR’S NOTE: This RACmonitor series examines a new type of auditing in healthcare – the use of algorithms. In the first article in this series, “Artificial Intelligence is Coming to Medicare Audits – Is Your Legal Team Obsolete?” Mr. Roche reported on statistical sampling being supplanted by more complex methods based on machine learning and artificial intelligence. In this article, Mr. Roche reports on a recent study published in the journal Science finding that an algorithm used by Optum reportedly has a built-in racial bias.
Healthcare is a giant part of the gross domestic product (GDP) of the United States – around 20 percent. Private insurance covers around 34 percent of the total cost, Medicare 20 percent, and Medicaid 17 percent; the remainder is paid out of pocket or through other programs.
The healthcare industry is an ocean of data. One-quarter of total hospital costs are spent on billing and coding. For each doctor, there are around five administrative staff dedicated to handling the paperwork. Each year sees the processing of around 30 billion healthcare transactions. The Centers for Medicare & Medicaid Services (CMS) alone processes around 1.3 billion Medicare claims per year – more than 40 per second. Each claim contains a number of codes describing services provided to the patient. In 1996, there were 3,500 CPT codes; by 2019, the American Medical Association (AMA) published 10,294.
The scope of healthcare data is truly incredible, and patient records are a treasure trove: medical files, prescriptions written, drugs taken, dental histories, laboratory results, notes from physicians, nurses, technicians, and therapists, piles of insurance information, job records – these are only a few of the data sets surrounding a patient. They can be combined with treatment information from hospitals and still more detail from insurance companies, Medicare, Medicaid, employers, gene-testing services, family histories, and other sources.
Naturally, this vast ocean of data is ripe for research. Supercomputers and big-data mining have enabled correlative research simultaneously across hundreds of different data sets. The concept is simple enough: by analyzing the data, it is possible to find patterns, and patterns can lead to understanding the effectiveness of different types of healthcare. Today, we see algorithms being designed to improve patient outcomes by identifying healthcare delivery protocols tailored to specific classes of patients.
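To make the concept concrete, here is a minimal sketch in Python of pattern-finding across a patient data set. The data are randomly generated and every variable name is invented for illustration; real research joins millions of records across many linked sources.

```python
import numpy as np
import pandas as pd

# Hypothetical patient-level data set (all values synthetic).
rng = np.random.default_rng(0)
n = 10_000
age = rng.integers(20, 90, n)
er_visits = rng.poisson(age / 40.0)  # invented dependence: older patients visit the ER more
annual_cost = rng.gamma(2.0, 1_500.0, n) + 4_000.0 * er_visits

df = pd.DataFrame({
    "age": age,
    "rx_fills_per_year": rng.poisson(6, n),
    "er_visits": er_visits,
    "annual_cost": annual_cost,
})

# Correlative research in miniature: scan every pair of variables
# and surface the strongest associations for further study.
corr = df.corr()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
print(upper.stack().abs().sort_values(ascending=False).head())
```

On this synthetic data, the scan surfaces the cost/ER-visit and ER-visit/age relationships that were deliberately built into it. On real data, such a scan is only a first step, since correlation alone says nothing about cause.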
But there is a complication: money. The cost of healthcare is exorbitant, and difficult choices must be made regarding costs and benefits. Algorithms can help match the mix of healthcare services to the best patient outcomes at the lowest cost.
A patient who is a long-term customer of an insurance company can be thought of as something like a car: regular check-ups and preventive maintenance keep it running much longer and lower its costs of operation. Of course, every car, like every person, is different, but keep in mind that an algorithm’s protocol model is derived from data averaged over millions of units.
There are a number of popular algorithm models in use, some of them commercial. These include the ACG System, from Johns Hopkins University; Impact Pro, from Optum; DxCG Intelligence, from Verisk Health; the Truven Cost of Care Model, from IBM’s Truven Health Analytics; the SCIO Cost of Care Model, from SCIO Health Analytics; and HHS-HCC, from CMS. One important output of these models is a risk score for each person, which can be built from diagnoses, pharmacy data, prior costs, and other factors, in any combination. In the United States, healthcare for around 200 million persons per year is managed by these algorithms. First, patients with complex healthcare needs are identified; the algorithm can then target those patients for “high-risk” maintenance programs. The result: better health, lower costs.
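To illustrate the general idea, and only the general idea, here is a minimal sketch of such a risk-scoring step in Python. The inputs, weights, and cutoff are invented for the example; no vendor’s actual model is this simple.

```python
from dataclasses import dataclass

@dataclass
class PatientHistory:
    chronic_diagnoses: int      # count of chronic-condition codes on claims
    active_prescriptions: int   # distinct drugs currently filled
    prior_year_cost: float      # total paid claims, in dollars

def risk_score(p: PatientHistory) -> float:
    """Toy linear risk model; real vendors use far richer inputs and methods."""
    return (0.30 * p.chronic_diagnoses
            + 0.20 * p.active_prescriptions
            + 0.50 * (p.prior_year_cost / 10_000.0))

# Patients whose score exceeds a cutoff are targeted for a
# "high-risk" care-management program (threshold is hypothetical).
HIGH_RISK_CUTOFF = 2.0

patients = [PatientHistory(4, 7, 22_000.0), PatientHistory(1, 2, 3_000.0)]
for p in patients:
    s = risk_score(p)
    print(f"score={s:.2f}  high_risk={s > HIGH_RISK_CUTOFF}")
```

Note the design choice buried in the weights: the more heavily a score leans on prior cost, the more it predicts spending rather than sickness. As discussed below, that distinction turns out to matter.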
Do Algorithms Always Work?
How well do the algorithms work? A recent study published in the scientific journal Science shows they certainly are not magic. A team of researchers from the University of California, Berkeley School of Public Health, Brigham and Women’s Hospital and Massachusetts General Hospital in Boston, and the University of Chicago’s Booth School of Business found that Impact Pro, Optum’s widely used commercial prediction algorithm for identifying patients with complex healthcare needs, “exhibits significant racial bias.” They write:
At a given risk score, black patients are considerably sicker than white patients, as evidenced by signs of uncontrolled illnesses. Remedying this disparity would increase the percentage of black patients receiving additional help from 17.7 to 46.5 percent.
This result apparently was unanticipated by Optum, which reportedly is working to recalibrate the model. But it signals problems ahead in the world of algorithms and healthcare.
This opens up an entirely new line of research into the use of algorithms. It shows that a predictive model that seems to be accurate and reasonably reliable actually can have serious hidden problems.
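One way such hidden problems can be surfaced is the kind of audit the researchers performed: hold the risk score fixed and compare an independent measure of health across racial groups. Here is a minimal sketch of that check in Python; the file name and columns are hypothetical, and a simple count of chronic conditions stands in for the study’s more detailed health measures.

```python
import pandas as pd

# Hypothetical audit file: one row per patient, with the algorithm's
# risk score, the patient's race, and an independent measure of health.
df = pd.read_csv("audit_sample.csv")  # columns: risk_score, race, chronic_conditions

# Bucket patients into risk-score deciles, then compare average
# illness burden across groups within each bucket.
df["risk_decile"] = pd.qcut(df["risk_score"], 10, labels=False)
audit = df.groupby(["risk_decile", "race"])["chronic_conditions"].mean().unstack()
print(audit)

# If one group is consistently sicker at the same score, the score is
# understating that group's need: the disparity the study reported.
```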
Testing the accuracy of algorithms is itself an important area of research. For example, some researchers have noted that “one new development . . . has been the introduction of . . . models that aim to predict more than simple relative risk [including] . . . probabilities of hospitalization.” What is unique about the Science article is its focus on race.
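For models that output probabilities, such as a predicted likelihood of hospitalization, one standard accuracy test is a calibration check: do events predicted at, say, 30 percent actually occur about 30 percent of the time? Here is a minimal sketch using scikit-learn and synthetic data; in practice, the same check can be repeated within each demographic group, since a model can be well calibrated overall yet behave differently across groups.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Synthetic stand-ins: model-predicted probabilities of hospitalization
# and observed outcomes (1 = hospitalized within the year).
rng = np.random.default_rng(1)
predicted = rng.uniform(0.0, 1.0, 5_000)
observed = (rng.uniform(0.0, 1.0, 5_000) < predicted).astype(int)  # calibrated by construction

# Bin the predictions and compare predicted rates with observed rates.
observed_rate, predicted_rate = calibration_curve(observed, predicted, n_bins=10)
for p, o in zip(predicted_rate, observed_rate):
    print(f"predicted ~{p:.2f}  observed {o:.2f}")
```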
Doing scientific work to test healthcare algorithms is difficult. Many are concealed behind a wall of intellectual-property protection; they may be trade secrets and, unlike patents, unavailable to the public. The methodologies used to audit such proprietary algorithms are beyond the scope of this article.
Implications for Health Law
If comparing Americans of European descent to Americans of African descent revealed such a notable gap in the utility of an algorithm, it is plausible that a study comparing Americans of Asian descent would find a similar one. Another possibility would be testing algorithms according to genetic markers, such as HTT at 4p16.3, BRCA, MLH1, or PMS2.
It is too early to know whether a new wave of class-action suits will arise from racial or gene bias problems in algorithms. But we can be sure that algorithms will open up a new area of litigation.
We will continue our discussion of algorithms in the next article in this series.