Clinical Trials
Data Quality Crisis

The Patients Who Got Worse

When nineteen people improve and one deteriorates, you have a medical mystery. When nineteen deteriorate and one improves, you have something else entirely.

A data investigation by the Clinical Trial Oversight Committee
Analysis of 11,200 laboratory measurements across 10 sites
December 2024

On a January morning in 2024, two hundred patients enrolled in a diabetes trial across ten medical centers. They came seeking the same thing: lower blood sugar, better health, a reprieve from the creeping complications of Type 2 diabetes.

Over six months, nurses drew their blood at regular intervals. Lab technicians measured their hemoglobin A1c—the gold standard marker that reveals average blood sugar over three months. The data piled up: 11,200 individual test results, each one a tiny window into whether the experimental treatment was working.

The trial appeared meticulous. Each site enrolled exactly twenty patients. Each patient attended exactly eight visits. Each visit generated exactly seven lab tests. The symmetry was beautiful, almost architectural.

But buried in that geometric perfection was a pattern so strange that it threatened to unravel the entire study.

200 patients enrolled · 10 clinical sites · 11,200 lab measurements · 3 sites with quality issues

The Coefficient of Variation

Here's what you need to know about measuring blood sugar in humans: it varies. A lot. Even with the same patient, the same technician, the same equipment, you'll get different numbers. Biology is messy. That's why statisticians invented something called the coefficient of variation—a measure of how much your data bounces around relative to its average.

For HbA1c measurements in diabetes trials, that coefficient typically hovers between 10% and 15%. It's the signature of real data collected from real people in the real world.

At Site 6, the coefficient of variation was 8.67%.
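The check itself is only a few lines. A minimal sketch, with invented HbA1c series standing in for one realistic site and one suspiciously smooth one:

```python
import numpy as np

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, expressed as a percentage."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

# Hypothetical HbA1c readings (%): a realistic spread vs. a suspiciously smooth one
realistic = [7.9, 6.4, 8.8, 7.1, 9.5, 6.8, 8.2, 7.6]
smoothed = [7.5, 7.4, 7.6, 7.5, 7.3, 7.6, 7.5, 7.4]

print(f"realistic CV: {coefficient_of_variation(realistic):.1f}%")
print(f"smoothed CV:  {coefficient_of_variation(smoothed):.1f}%")
```

The first series lands inside the expected 10–15% band; the second falls far below it.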

[Chart: HbA1c Variability by Site]

Too perfect. Too smooth. Like a jazz musician who never misses a note, or an author who never makes a typo—technically flawless in a way that actually makes you suspicious.

But Site 6 was just peculiar. Site 9 was impossible.


The Reverse Responders

At nine sites, patients got better. HbA1c dropped by 0.67% to 0.99%. This is what you expect. This is what the drug was designed to do.

At Site 9, patients got worse. HbA1c increased by 0.54%. Not one or two patients. Not a statistical fluke. Nineteen out of twenty patients showed deterioration.

[Chart: HbA1c Change from Baseline to Week 24]
The Statistical Reality

The odds of this happening by chance? Vanishingly small. The odds of this happening in real life? Even smaller. Diabetes doesn't spontaneously worsen in 95% of treated patients while improving in everyone else taking the identical drug.

Think about what this means. These patients took the same medication as everyone else. They followed the same protocol. But while 168 patients at the other nine sites improved, these 19 people somehow moved in the opposite direction—getting systematically worse over six months of treatment.
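How small is "vanishingly"? A rough null-model sketch: assume each of Site 9's 20 patients independently worsens at the background rate seen at the other nine sites (roughly 7 worsened out of 180, per the site-by-site table; the exact rate is an assumption) and compute the binomial tail probability:

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Background worsening rate at the other nine sites: ~7 of 180 patients
p_worsen = 7 / 180

# Probability that 19 or more of Site 9's 20 patients worsen by chance alone
print(binom_tail(20, 19, p_worsen))
```

Even with a far more generous background rate of 10%, the tail probability stays below one in a trillion.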

[Chart: HbA1c Trends Over Time by Site]

"This isn't a medical finding. It's a data quality finding."


Patient Outcomes

Let's look at what happened to individual patients. At most sites, the vast majority improved. At Site 9, almost everyone got worse.

Patient Response Patterns by Site
Site   Patients Improved   Patients Worsened   Mean Change   Status
  1           20                   0             -0.99%      Clean
  2           19                   1             -0.94%      Clean
  3           20                   0             -0.72%      Clean
  4           17                   1             -0.71%      Clean
  5           19                   1             -0.68%      Clean
  6           18                   1             -0.72%      Suspicious
  7           19                   1             -0.79%      Clean
  8           20                   0             -0.82%      Clean
  9            1                  18             +0.54%      CRITICAL
 10           16                   2             -0.67%      Clean

The Clockwatcher

Lab specimen collection times are supposed to be random. Morning appointments, afternoon appointments, the organic chaos of clinic scheduling. When you plot the minutes of collection across a large trial, you get a noisy, uniform scatter—samples at 9:07 and 10:23 and 14:51.

Unless you're at Site 1.

[Chart: Percentage of Samples at Round Minutes (0, 15, 30, 45)]

At Site 1, samples were collected at precisely :00, :15, :30, or :45 twelve and a half percent of the time—nearly four times the rate of Site 2, and well above the 6.7% you would expect from random scheduling (4 round minutes out of every 60). Someone there had a habit. Or a preference. Or perhaps they were filling in timestamps after the fact, defaulting to round numbers the way humans do when we're guessing at times we can't quite remember.

9:15 feels more natural than 9:17. 10:30 more believable than 10:28.

Real time is jagged. Fabricated time is round.
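The timestamp check is mechanical: count collection times landing on a quarter-hour mark and compare against the roughly 6.7% (4 minutes in 60) expected from random scheduling. A sketch with invented timestamps:

```python
from datetime import datetime

ROUND_MINUTES = {0, 15, 30, 45}

def round_minute_rate(timestamps):
    """Fraction of collection times landing exactly on :00, :15, :30, or :45."""
    hits = sum(ts.minute in ROUND_MINUTES for ts in timestamps)
    return hits / len(timestamps)

# Hypothetical collection times for one site
times = [datetime(2024, 1, 8, 9, m) for m in (7, 15, 23, 30, 45, 51, 0, 12)]

expected = len(ROUND_MINUTES) / 60   # ~6.7% under random scheduling
observed = round_minute_rate(times)  # 4 of these 8 minutes are round: 50%
print(f"observed {observed:.1%} vs expected {expected:.1%}")
```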


The Numbers That Make Sense

Not everything in this trial is suspicious. Some patterns are exactly what you'd expect.

[Chart: Expected Biological Correlations]

Fasting plasma glucose (FPG) and random glucose are tightly correlated (0.86)—because they're both measuring the same thing at different times. ALT and AST, both liver enzymes, move together (0.87) as they should. These aren't invented numbers. These are real biological relationships emerging from the data.

Which makes the anomalies even more striking. The data is good enough to show authentic medical patterns—except where it isn't.
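Verifying those correlations takes one line per analyte pair. A sketch with invented ALT/AST values (a real check would run over the trial's lab exports):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two paired measurement series."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical paired liver-enzyme panels: ALT and AST tend to move together
alt = np.array([22.0, 35.0, 18.0, 60.0, 41.0, 28.0, 95.0, 33.0])
ast = np.array([25.0, 31.0, 20.0, 55.0, 47.0, 26.0, 88.0, 30.0])

print(f"ALT/AST r = {pearson_r(alt, ast):.2f}")
```

A pair that should correlate but doesn't (or one that correlates too perfectly) is another flag worth chasing.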


The Question

Here's what we know for certain: Site 9's data is wrong. Either the patients didn't actually deteriorate (in which case the data was fabricated or corrupted), or they did deteriorate (in which case something catastrophic happened at that site—contaminated medication, reversed treatment assignments, or systematic mismanagement).

Site 6's data is suspicious. The variability is too low, the patterns too clean.

Site 1's data is concerning. Those round numbers suggest someone was reconstructing timestamps rather than recording them in real time.

The Bottom Line

Three sites out of ten. That's 30% of your trial infrastructure showing red flags. That's not a few bad apples—that's a systemic problem with how this trial was conducted, monitored, or recorded.


The Invisible Crime

The most insidious thing about data quality problems in clinical trials isn't that they happen. It's that they're almost impossible to catch without looking.

The FDA doesn't re-measure every blood sample. Auditors don't watch every collection. Trial monitors visit sites, check consent forms, verify that freezers are the right temperature—but they rarely analyze the temporal patterns in timestamps, or the distribution of decimal places, or whether response rates at one site violate basic pharmacology.

These problems hide in the aggregate. When you average across all sites, Site 9's increase disappears into Site 1's decrease. The overall trial shows a modest benefit. The drug appears to work. Approval proceeds.

But nineteen patients at Site 9 didn't get better. And if their data is fabricated, we don't actually know what happened to them. If their data is real, we don't know why it happened to them and nobody else.

19 patients with impossible results · 3 sites requiring investigation · 7 sites with clean data · 70% of trial data potentially valid

The Way Forward

The good news is that seven sites—70% of the trial infrastructure—show clean data. No temporal anomalies, no impossible trends, no suspicious smoothing. Sites 2, 3, 4, 5, 7, 8, and 10 appear to have collected real data from real patients who responded to the drug in biologically plausible ways.

The bad news is that in clinical trials, you can't just throw out the bad sites and move on. Those patients matter. Those measurements, real or fabricated, went into the analysis that determined whether this drug was safe and effective.

What needs to happen:

Immediate Actions Required
1. Site audit: Site 9 requires forensic investigation. Review source documents. Interview staff. Inspect equipment. Determine whether these patients actually experienced disease progression or whether the data was corrupted.
2. Data validation: Sites 1 and 6 need secondary review. Cross-reference collection timestamps with clinic logs. Audit sample storage. Re-measure stored specimens if possible.
3. Statistical sensitivity analysis: Re-analyze trial outcomes excluding Sites 1, 6, and 9. If the drug effect disappears without these sites, you have a serious problem. If it persists, you have justification to present the clean-site data as primary.
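Step 3 is straightforward to sketch. Assuming per-patient records of (site, week-24 HbA1c change), with invented values for illustration:

```python
FLAGGED_SITES = {1, 6, 9}

def mean_change(records, exclude=frozenset()):
    """Mean HbA1c change across patients, optionally excluding whole sites."""
    kept = [change for site, change in records if site not in exclude]
    return sum(kept) / len(kept)

# Hypothetical (site, HbA1c change) pairs
records = [(2, -0.9), (3, -0.7), (5, -0.6), (9, +0.5), (9, +0.6)]

print(f"all sites:   {mean_change(records):+.2f}%")
print(f"clean sites: {mean_change(records, exclude=FLAGGED_SITES):+.2f}%")
```

If the mean benefit survives the exclusion, the clean-site analysis can stand as primary; if it collapses, the flagged sites were carrying the result.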

Epilogue

This trial is both deeply flawed and strangely encouraging. Flawed because 30% of sites showed data quality issues that should have triggered immediate investigation. Encouraging because those issues are detectable if anyone bothers to look.

Modern clinical trials generate enormous amounts of data. But generating data isn't the same as examining it. We've gotten very good at collection and very bad at skepticism.

Nineteen patients at Site 9 didn't just fail to respond to treatment. They got worse, in defiance of pharmacology and probability. Either their data is wrong, or their treatment was.

And until we know which, we don't actually know what this trial showed.

The mathematics of suspicion aren't complicated. They just require someone to look.