Digital Health Startups Don’t Produce Evidence

A recent article in JMIR paints a bleak picture of the evidence produced by digital health startups: they rarely produce enough of it. This limits their ability to demonstrate the value of their products. The paper concludes that more investment in clinical validation is needed, but it stops short of saying what that means. Evidentli has already made significant investments in clinical robustness that we can share here. But first, a quick story:

Super Digital Diagnostics LLC* developed an AI that can predict the risk of being pre-diabetic much more accurately than traditional methods that rely almost exclusively on HbA1c levels. Super Digital's software looks at 10 additional factors including history of type 2 diabetes mellitus (we'll call it diabetes here for short) in the family, body mass index (BMI), low density lipoprotein (LDL) cholesterol, smoking status and alcohol consumption. The company has determined these to be good predictors when combined through a complex neural network.

Before we go on, let's review why the accuracy of this test matters. There are four possible outcomes:

  • If the test shows a pre-diabetic person is indeed pre-diabetic, and especially if it can detect it much earlier than existing tests, that person has a much better chance of controlling or avoiding diabetes. That is a great outcome.
  • If the test correctly shows that a person suspected of being pre-diabetic is not, in fact, pre-diabetic, that person avoids costly and prolonged treatments and lifestyle changes that are completely unnecessary.
  • If the test shows a healthy person is pre-diabetic, or a pre-diabetic person is healthy, those people are not so lucky. A healthy person might go through unnecessary treatment, and a sick person will not get the care that they need.

So, the big question is: how would you know how accurate the test is? The first answer that should come to mind is clinical trials. And indeed, that was the answer Super Digital came up with.

Super Digital partnered with a big university hospital in North Carolina that provided access to historical data that was used to train the AI, and together they conducted a randomized controlled trial that showed that the algorithm indeed works better than the traditional method.

With preliminary confirmation that the AI works, Super Digital applied for regulatory approval for their software as a medical device. They partnered with another university hospital, this time in South Carolina, raised the funds for a state-of-the-art, double-blind, prospective, randomized, controlled trial, and then hired a contract research organisation that conducted the trial over six months. To their surprise, the trial did not yield the expected results, which was, to say the least, very disappointing. The failed trial set the company back by months or even years, and devalued it at the same time.

What went wrong? It's the data

Training an AI means teaching it to recognize patterns in dataset B that appeared in dataset A (also known as the training set). When the two datasets are vastly different, the AI will not be able to find the patterns of one in the other. Hospital data is just that different. Upon investigation, Super Digital found several issues; here are just some of them:

  • The NC hospital routinely measures LDL cholesterol for patients with any metabolic disorder. In the SC hospital, LDL is measured only in cases where liver or cardiovascular disease is suspected.
  • The NC hospital measures HbA1c in an in-house lab that uses metric units. The SC hospital uses two external labs, one of which uses metric units while the other uses imperial units.
  • The SC hospital never records BMI; it only records patient weight when patients are overweight or obese, and never their height, so calculating BMI is impossible.
  • The SC hospital records smoking status separately from alcohol consumption, but the NC hospital records them together as "lifestyle risk factors".

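The unit mismatch is the kind of problem with a well-known mechanical fix, which makes it a good illustration of how small each individual repair is compared with the cost of discovering it. The story speaks of metric and imperial units; in reality HbA1c is reported in two conventions, NGSP percent and IFCC mmol/mol, so this minimal sketch (not Super Digital's actual pipeline; the record fields are hypothetical) uses those. It normalises every reading to one canonical unit before it reaches the model:

```python
# Hedged sketch: harmonise HbA1c readings from labs that report in
# different units into a single canonical unit (NGSP percent) before
# training or prediction. Field names below are hypothetical.

def hba1c_to_percent(value, unit):
    """Convert an HbA1c reading to NGSP percent.

    Uses the published NGSP/IFCC master equation:
        NGSP(%) = 0.0915 * IFCC(mmol/mol) + 2.15
    """
    if unit == "%":            # already NGSP percent
        return value
    if unit == "mmol/mol":     # IFCC units
        return round(0.0915 * value + 2.15, 2)
    raise ValueError(f"unrecognised HbA1c unit: {unit!r}")

records = [
    {"patient_id": 1, "hba1c": 5.9,  "unit": "%"},          # in-house lab
    {"patient_id": 2, "hba1c": 48.0, "unit": "mmol/mol"},   # external lab
]
for r in records:
    r["hba1c_percent"] = hba1c_to_percent(r["hba1c"], r["unit"])
```

The hard part is not the conversion itself but knowing that it is needed: nothing in either dataset announces which convention a given lab used, which is exactly why these mismatches surface only after a failed trial.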
Other issues that the technical team overcame may have inadvertently contributed to the problem as well:

  • The two hospitals use different vendors for their medical record databases, and these organize data differently. The technical team had to come up with a new method to link patients to their relatives to collect family history: this new method was never tested and its effect on the AI's predictions is unknown.
  • The data was organized differently, so the technical team had to add a pre-processing step to rearrange it so that it could be understood by the AI. Data degrades with every pre-processing and transformation step, and the effect of this new step on the AI is unknown.

There are other, more technical reasons we could add here, but the message is clear: changes in the data have a significant effect on algorithmic accuracy. This problem is more pronounced in healthcare than in other industries. It impedes digital health innovations from scaling, because accuracy needs to be re-evaluated at every site.

The solution is to standardise both datasets to an open standard, the OMOP Common Data Model, before the data is used for training and predictions. This helps identify missing or incomplete data, and avoids the need for the technical team to adjust the algorithm to every site. Most importantly, it makes the training data more like the data that predictions are actually made on.
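To make this concrete, here is a minimal sketch of what standardising one source lab row into the OMOP CDM's `measurement` table might look like. This is an illustration, not Evidentli's implementation; the concept IDs are placeholders standing in for values that would, in practice, be resolved from the OMOP standard vocabularies, and the source field names are hypothetical.

```python
# Hedged sketch: map a site-specific lab row into an OMOP CDM
# `measurement` record. Concept IDs are placeholders; real pipelines
# look them up in the OMOP vocabulary tables (CONCEPT, etc.).

from datetime import date

# Hypothetical (local code, unit) -> standard concepts lookup.
CONCEPT_MAP = {
    ("LOCAL_HBA1C", "%"): {
        "measurement_concept_id": 3004410,  # placeholder: HbA1c concept
        "unit_concept_id": 8554,            # placeholder: percent
    },
}

def to_omop_measurement(source_row):
    """Translate one source lab row into OMOP measurement fields."""
    key = (source_row["local_code"], source_row["unit"])
    concepts = CONCEPT_MAP[key]   # raises KeyError on unmapped codes,
                                  # surfacing gaps instead of hiding them
    return {
        "person_id": source_row["patient_id"],
        "measurement_concept_id": concepts["measurement_concept_id"],
        "measurement_date": source_row["taken_on"],
        "value_as_number": source_row["value"],
        "unit_concept_id": concepts["unit_concept_id"],
    }

row = {"patient_id": 1, "local_code": "LOCAL_HBA1C", "unit": "%",
       "value": 5.9, "taken_on": date(2023, 4, 1)}
omop_row = to_omop_measurement(row)
```

Once both hospitals' data passes through a mapping like this, the model sees one schema and one vocabulary regardless of which site, vendor or lab produced the record.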

For data to be standardised, it also needs to be cleansed and, usually, aggregated from multiple systems. This used to be an expensive and prolonged step that would have made Super Digital's product too expensive to be viable. Today, Evidentli can accelerate this process, reducing time, prerequisite skill set and data degradation by two orders of magnitude. With Evidentli as a platform, AI vendors can develop scalable, reproducible and inexpensive decision support, reporting tools and dashboards.

*All names and locations have been changed.
