• Guy Tsafnat

DNA: Data Needed for AI

AI is the term we use for algorithms that perform cognitive tasks when applied to data in different ways. These algorithms tend to be very specific to the data they are trained on. For example, algorithms trained on x-ray images of shoulder fractures were shown to work well, but the same algorithms did not perform as well when trained on x-ray images of foot fractures. This is quite common in the field of AI, and a few reasons typically combine to produce this phenomenon:

  • The problem is materially different from what the language we use to describe it suggests. That is, we use the same noun to describe shoulder and foot fractures, but the manifestations of these conditions can be so different as to require different ways of detecting them.

  • The equipment used to create these images is different. People may not even notice the difference, but it may be significant to an AI.

  • The people who labelled the data for one task (in this example, identified the fractures in x-ray images) operated on different assumptions from those who labelled the data for the other.

Other issues also often arise, but they follow the same pattern: data sets differ significantly from each other in their meaning, form and intent. The fact that we have a very good interoperability standard for x-ray images (called DICOM) is of little help with this problem.

With other kinds of medical data this is even harder, and huge communities of data and clinical scientists are working on creating common data models. A common data model provides the context and a view of the data that is consistent across different sites and different data sets, regardless of where they are from, who produced them and how. By far the most widely used common data model is the OMOP Common Data Model, maintained by the OHDSI collaboration that we are proud to be members of.
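To make the idea of a common data model concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two sites, their field names, their local diagnosis codes and the `CommonRecord` shape are hypothetical and do not reproduce any real standard's schema or vocabulary. It only shows how two differently shaped source records can end up in one consistent form.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared record shape (not a real standard's table)."""
    patient_id: str
    condition: str      # term from one shared vocabulary
    recorded_on: date
    source_site: str


# Hypothetical local codes, each mapped to the same shared terms.
SITE_A_CODES = {"FX-SH": "shoulder fracture", "FX-FT": "foot fracture"}
SITE_B_CODES = {"1021": "shoulder fracture", "1044": "foot fracture"}


def from_site_a(row: dict) -> CommonRecord:
    # Site A (assumed) stores ISO dates and short text codes.
    return CommonRecord(
        patient_id=f"A-{row['pid']}",
        condition=SITE_A_CODES[row["dx"]],
        recorded_on=date.fromisoformat(row["date"]),
        source_site="site_a",
    )


def from_site_b(row: dict) -> CommonRecord:
    # Site B (assumed) stores day/month/year dates and numeric codes.
    day, month, year = (int(part) for part in row["exam_date"].split("/"))
    return CommonRecord(
        patient_id=f"B-{row['patient']}",
        condition=SITE_B_CODES[row["diagnosis_code"]],
        recorded_on=date(year, month, day),
        source_site="site_b",
    )


records = [
    from_site_a({"pid": 17, "dx": "FX-SH", "date": "2021-03-04"}),
    from_site_b({"patient": "0093", "diagnosis_code": "1021",
                 "exam_date": "04/03/2021"}),
]

# After transformation, both sites describe the condition in the same way.
conditions = {record.condition for record in records}
```

Once both records share one shape and one vocabulary, an algorithm or a study can treat them as a single data set without knowing which site they came from.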

Transforming source data to the standard remains a major hurdle for many, which is why Evidentli researched and developed tools and AI that accelerate this process by up to 50 times. This not only helps AI to work seamlessly on different data sets but also accelerates the production of other kinds of medical evidence to drive better and cheaper healthcare.

You could say that AI is in our DNA.