  • Guy Tsafnat

What to expect from a good clinical data model?

Data collected as part of routine care holds a treasure trove of insights about what works and what doesn’t in healthcare “as practiced”. Mining such data for insights that can improve quality, increase efficiency, inform decision makers and underpin important research is another matter. Real-world clinical data is stored in specialized databases optimized for specific functions of clinical care, often across multiple incompatible systems. Aggregating, harmonizing and organising real-world data in bespoke data models for use with business analytics tools is expensive and only shifts the problem to data consumers: senior administrators, researchers and practitioners. Furthermore, developing robust and versatile models for clinical data is costly, and such models rarely stand the test of time.

In addition to all the features that make a good data model (storage efficiency, interoperability, data access control, etc.), a clinical data model has specific warehousing demands that are unique to healthcare:


A good clinical data model is future- and past-proof

The most common reason clinical data warehouses are not used is arguably their failure to evolve with the data. Clinical data represents a large number of procedures, medications, clinical care pathways and more, and all of those change over time. Medical practice is continuously improving as new technologies and new research emerge. Apart from the obvious complexity of a model that keeps pace with these changes, it is rare for already stretched hospital IT departments to have the resources needed to test and maintain it.


It is easy to transform data into a clinical data model

The cost of transforming data between data models is typically high in every industry. Creating a bespoke data model can seem to simplify this process, because the model can be designed to deviate minimally from the original data. This benefit is often overestimated, because model maintenance costs and the effort still required to aggregate data from other data models are forgotten or discounted.


Governance of clinical data is faster and easier with a good data model

Clinical data is sensitive, usually identifying, and subject to unique governance rules that include duty of care, ethical and custodianship considerations. General-purpose data warehouses fit these requirements poorly, leading to non-compliant practices and prolonged, iterative data extraction requests.

Long waits for data extracts are the norm in most healthcare systems because data provisioning systems are inadequate for the scale of the demand. Demand is also unnecessarily high: extracts for users unfamiliar with the data model inevitably have to be repeated several times until all the data for the use case is included.

Perhaps even more important, specialized clinical data warehouses, such as Evidentli’s, can provide access to data without needing to extract it, provide user interfaces for access approval and management in line with ethics workflows, and perform domain-specific operations like de-identification.


Most importantly, a good data model facilitates new insights

It may seem obvious that gaining insights from data is the main reason for warehousing data in the first place. But given the large number of technical and socio-legal considerations that go into the design of a clinical data warehouse, it is perhaps not surprising how easy it is to lose sight of that goal.

The large community, the availability of open-source and commercial tools, and competitive professional services should make OMOP the obvious choice because it uniquely provides all of them. But OMOP provides more than that. A pillar of medical research is collaboration: peer review, co-design, and independent reproduction of research and results on multiple data sets. OMOP caters for these like no other data model.

This means analysts can run tried and tested data analyses, compare analytic results with other institutions, share cohort definitions, and organically form collaborative communities around specific clinical questions.
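As a rough illustration of why a common model makes cohort definitions shareable, the sketch below writes one cohort query against OMOP-style `person` and `condition_occurrence` tables and runs it unchanged on two toy in-memory databases standing in for two institutions. The table and column names follow the OMOP CDM, but the data, the helper names (`make_site`, `cohort_size`) and the concept id `1001` are invented placeholders, not real vocabulary entries or part of any OHDSI tool.

```python
import sqlite3

# One cohort definition, written once against OMOP CDM table/column names.
# It counts distinct people with a given condition concept since a given date.
COHORT_SQL = """
SELECT COUNT(DISTINCT p.person_id)
FROM person AS p
JOIN condition_occurrence AS co ON co.person_id = p.person_id
WHERE co.condition_concept_id = :concept_id
  AND co.condition_start_date >= :since
"""


def make_site(rows):
    """Build a toy in-memory 'institution' (hypothetical helper).

    rows: (person_id, condition_concept_id, condition_start_date) tuples,
    with dates as ISO strings so that string comparison orders them correctly.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)")
    db.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT)"
    )
    person_ids = sorted({r[0] for r in rows})
    db.executemany("INSERT INTO person VALUES (?, 1970)", [(pid,) for pid in person_ids])
    db.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?)", rows)
    return db


def cohort_size(db, concept_id, since):
    """Run the shared cohort definition on one site's database."""
    return db.execute(COHORT_SQL, {"concept_id": concept_id, "since": since}).fetchone()[0]


EXAMPLE_CONCEPT = 1001  # placeholder concept id, not a real vocabulary entry

site_a = make_site([(1, 1001, "2021-03-01"), (1, 1001, "2021-06-01"), (2, 2002, "2021-04-01")])
site_b = make_site([(7, 1001, "2019-01-15"), (8, 1001, "2022-02-02"), (9, 1001, "2022-05-05")])

sizes = {
    name: cohort_size(db, EXAMPLE_CONCEPT, "2021-01-01")
    for name, db in [("site_a", site_a), ("site_b", site_b)]
}
print(sizes)
```

Because both sites expose the same tables and columns, only the query text needs to be shared; each institution runs it locally and compares the resulting counts without exchanging patient-level data.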


Every cloud has a silver lining

The silver lining is an open-source common data model for secondary uses of clinical data, with ample documentation and an engaged global community of thousands of members. Observational Health Data Sciences and Informatics (OHDSI) is a not-for-profit member organisation that develops and maintains the common data model best known as OMOP.

OMOP has been tested for over a decade on a large variety of data collected in different ways and for different purposes. OMOP also comes with a complete set of guidelines on how to extend it wherever needed. A community of volunteers continues to maintain it quickly and efficiently, as can be seen from the speed with which the challenges brought about by the COVID-19 pandemic were handled. New treatment pathways, diagnoses, tests and vaccines were added to the model well in advance of the respective products entering the market. These were joined by a number of additional technologies, such as COVID-19 patient simulators and cohort definitions ready to be used in any OMOP data set.

Transformation of data into OMOP is simplified by its community. Public online forums see user questions answered within minutes or hours from any time zone. Open source and commercial tools are available to drastically reduce the data transformation time. Taking Evidentli as an example, a number of dedicated AI algorithms use natural language processing, machine learning and more to transform data into OMOP.

Taking advantage of OMOP's open documentation and large community of potential helpers allows users to request the right data the first time, lowering the load on systems and administrators and getting users the data they need in a fraction of the time. Importantly, data administrators and custodians can manage access to the right data, and only the right data, because requests and permissions are expressed using the same model.
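To make the last point concrete, here is a minimal sketch of what "requests and permissions expressed using the same model" could look like. It assumes, purely for illustration, that both a data request and an access grant are written as a mapping from OMOP table names to sets of column names; the function `uncovered` and this representation are hypothetical and do not describe how any particular product implements access control.

```python
def uncovered(request, grant):
    """Return the parts of a request that the grant does not cover.

    Both arguments map OMOP table names to sets of column names.
    An empty result means the request can be served as-is.
    """
    missing = {}
    for table, cols in request.items():
        extra = set(cols) - set(grant.get(table, ()))
        if extra:
            missing[table] = sorted(extra)
    return missing


# What the custodian has approved (illustrative values).
grant = {
    "person": {"person_id", "year_of_birth"},
    "drug_exposure": {"person_id", "drug_concept_id", "drug_exposure_start_date"},
}

# What the analyst asks for, in the same vocabulary of tables and columns.
request = {
    "person": {"person_id", "year_of_birth"},
    "drug_exposure": {"person_id", "drug_concept_id"},
    "note": {"note_text"},
}

print(uncovered(request, grant))
```

Since the request and the grant share one vocabulary, the check reduces to a set difference, and the custodian sees exactly which tables and columns still need approval (here, free-text notes).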

Using OMOP extensions, additional analyses can be unlocked, including analyzing phenotype, notes, genotype and radiology data together, importing data directly from FHIR streams automatically, updating analysis results in near real time, developing portable AI and other algorithms, and much more.


Further reading: Pitfalls in Developing HealthCare Data Warehouses

