What to expect from a good data model

February 1, 2023


Data collected as part of routine care holds a treasure trove of insights about what works and what doesn’t in healthcare “as practiced”. Mining that data for insights that can improve quality, increase efficiency, inform decision makers and underpin important research is another matter, however. Real-world clinical data is stored in specialised databases optimised for specific functions of clinical care, often across multiple incompatible systems. In short, it is fragmented.

For healthcare data to be made usable at scale, it first needs to be standardised into a common data model. 

So what makes a good data model? Generic qualities such as storage efficiency, interoperability and data access control all matter. A clinical data model, however, must also meet warehousing demands that are unique to healthcare. We discuss four important features of a robust clinical data model.

  1. It is future-proof

Arguably, the most common reason clinical data warehouses go unused is their failure to evolve with the data. Clinical data covers a large number of procedures, medications, clinical care pathways and more, and all of these change over time as new technologies and new research continuously improve medical practice. Beyond the technical complexity of adapting and expanding a data model to keep pace with these changes, already stretched hospital IT departments rarely have the resources needed to test and maintain such a model.

  2. It is easy to transform data into the clinical data model

The cost of transforming data between data models is typically high in every industry, and creating a bespoke data model may seem like a viable solution because it can be designed to fit the original data. However, its benefit is often overestimated once model maintenance costs, and the effort still required to aggregate data from other models, are taken into account. A good data model should accommodate data from multiple models, sources and formats, such as administrative, claims and laboratory data.

  3. Governance of clinical data is fast and easy

Clinical data is sensitive and usually identifying, and it is subject to unique governance rules that include duty of care, ethical and custodianship considerations. General-purpose data warehouses fit these requirements poorly, leading to non-compliant practices and prolonged, iterative data extraction requests.

Long waits for data extracts are the norm in most healthcare systems because data provisioning systems are inadequate for the scale of the demand. That demand is also unnecessarily high: extracts for users unfamiliar with the data model inevitably have to be repeated several times before all the data the use case needs is included.

Purpose-built clinical data warehouses, such as Evidentli’s, can provide access to data without needing to extract it, provide user interfaces for access approval and management in line with ethics workflows, and perform domain-specific operations like de-identification.

  4. It facilitates collaboration

Collaboration is a pillar of medical research. It means peer review, co-design, and independent reproduction of research and results on multiple data sets, so a clinical data model needs to cater for it. The model should allow analysts to run tried and tested data analyses, compare analytic results with other institutions, share cohort definitions, and organically form collaborative communities around specific clinical questions.


Fortunately, there exists a clinical data model with all of the above features. Observational Healthcare Data Sciences and Informatics (OHDSI) is a not-for-profit members organisation that develops and maintains a common data model known as OMOP.

OMOP is open source, is supported by both commercial tools and competitive professional services, and is backed by a global community of over 3000 organisations.

OMOP has been tested for over a decade on a wide variety of data, collected in different ways and for different purposes. It also comes with a complete set of guidelines on how to extend it wherever needed. A community of volunteers continues to maintain it quickly and efficiently, as the speed with which it handled the challenges brought about by the COVID-19 pandemic shows: new treatment pathways, diagnoses, tests and vaccines were added to the model well in advance of the respective products entering the market. These were joined by additional resources such as COVID-19 patient simulators and cohort definitions ready to be used in any OMOP data set.

OMOP's community also simplifies transforming data into the model. On public online forums, user questions are answered within minutes or hours, from any time zone. Open source and commercial tools are available to drastically reduce data transformation time. Evidentli, for example, uses a number of dedicated AI algorithms, including natural language processing and machine learning, to transform data into OMOP.
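To make "transforming data into OMOP" concrete, here is a minimal sketch of one ETL step: mapping a local diagnosis record onto a row shaped like OMOP's CONDITION_OCCURRENCE table. It assumes a hypothetical `SourceDiagnosis` record, a hypothetical local code `"DM2"` and a hand-written `SOURCE_TO_STANDARD` lookup. In a real ETL that mapping would come from the OHDSI standardised vocabularies, and the concept ID shown is illustrative only. Only a subset of the table's columns is populated.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical local source record, as it might come out of a hospital system.
@dataclass
class SourceDiagnosis:
    patient_id: int
    local_code: str  # site-specific diagnosis code (hypothetical)
    recorded_on: date

# Hypothetical source-to-standard lookup. A real ETL would derive this from the
# OHDSI vocabularies; the concept ID below is illustrative, verify before use.
SOURCE_TO_STANDARD = {
    "DM2": 201826,
}

# OMOP convention: concept_id 0 means "no matching concept".
UNMAPPED = 0

def to_condition_occurrence(rec: SourceDiagnosis, row_id: int) -> dict:
    """Map one source diagnosis to a (partial) CONDITION_OCCURRENCE-shaped row."""
    return {
        "condition_occurrence_id": row_id,
        "person_id": rec.patient_id,
        "condition_concept_id": SOURCE_TO_STANDARD.get(rec.local_code, UNMAPPED),
        "condition_start_date": rec.recorded_on,
        # Keep the original code so every mapped row can be traced back to its source.
        "condition_source_value": rec.local_code,
    }
```

Unmapped codes are kept with concept ID 0 rather than dropped, so they remain visible for later vocabulary work instead of silently disappearing from the warehouse.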

Taking advantage of OMOP's open documentation and large community of potential helpers allows users to request the right data the first time, lowering the load on systems and administrators and getting users the data they need in a fraction of the time. Importantly, data administrators and custodians can manage access to the right data, and only the right data, because requests and permissions are expressed using the same model.
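One way to picture "requests and permissions expressed using the same model" is the sketch below. It assumes, hypothetically, that both a researcher's request and a custodian's permissions are written as sets of (OMOP table, concept ID) pairs, so the releasable data is simply their intersection. The function names, row layout and concept IDs are illustrative, not part of OMOP or any Evidentli product.

```python
from typing import Iterable

# Hypothetical scope format: a set of (OMOP table name, concept ID) pairs.
Scope = frozenset[tuple[str, int]]

def approved_scope(requested: Scope, permitted: Scope) -> Scope:
    """Release only what was both requested and permitted."""
    return requested & permitted

def filter_extract(rows: Iterable[dict], scope: Scope) -> list[dict]:
    """Keep only rows whose (table, concept_id) falls inside the approved scope."""
    return [r for r in rows if (r["table"], r["concept_id"]) in scope]

# Usage with hypothetical concept IDs:
requested = frozenset({("condition_occurrence", 1001), ("measurement", 1002)})
permitted = frozenset({("condition_occurrence", 1001), ("drug_exposure", 1003)})
scope = approved_scope(requested, permitted)
```

Because both sides use the same vocabulary, the approval check is a plain set operation rather than a manual reconciliation between a free-text request and a database schema.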

OMOP extensions unlock additional capabilities, including phenotype analysis, clinical notes, linking genotype and radiology data, importing data directly from FHIR streams, updating analysis results in near real time, developing portable AI and other algorithms, and much more.

Evidentli’s Piano Data Automation platform is built on OMOP standards, and provides a scalable, future-proof data warehousing solution for healthcare. 


We're happy to help.
