Version: 1.0.0 | Published: 27 May 2025 | Updated: 194 days ago
University College London Hospitals NHS OMOP dataset
Dataset
Documentation
Associated Media:
Description:
UCLH has an OMOP extraction system (omop_es) that connects our Electronic Health Record (EHR) to an architecture that delivers high-quality, standardised extracts conforming to the OMOP Common Data Model (CDM). Our EHR contains records for 6 million patients, 13 million diagnoses and 50 million medication events. These derive from the UCLH patient population, which includes national referrals for tertiary and quaternary services (such as cancer and neurology) and general medical admissions from an inner-city teaching hospital that treats more than 1 million outpatients per year and has more than 100,000 inpatient admissions.
UCLH has invested effort and expertise in aligning international terminology systems (e.g. SNOMED CT, LOINC, UCUM) with NHS data standards, both during the EHR system build and after implementation. Our standardisation work has covered the following clinical domains: Diagnosis and past medical history; Surgical and Ambulatory procedures; Diagnostic Imaging; Cardiac Echo; Lab Medicine including Biochemistry, Haematology, Microbiology, Immunology, Virology, Allergens, Medications (including route of administration); and Demographic information such as Religion and Ethnicity. For some domains (e.g. diagnosis and surgical procedures) we have achieved 100% standardisation; for others the work is ongoing.
Our data pipeline, the OMOP-Extraction System (OMOP-ES), is a modular, reusable architecture written in over 20,000 lines of R. Extractions proceed through four stages:
1. Standardisation - translates source data to OMOP concepts at full fidelity
2. Projection - applies rules to redact, filter, transform & link
3. Post-processing - allows linking of de-identified non-OMOP data
4. Output - multiple formats & destinations incl. CSV, Parquet or SQLite for direct use or import in a TRE
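The four stages can be pictured as a chain of small functions. The sketch below is illustrative only: it is written in Python rather than the R used by OMOP-ES, and every function name, field name and lookup value in it is a hypothetical stand-in, not part of the real system. The one convention taken from OMOP itself is that unmapped source codes receive concept id 0.

```python
import csv
import io

# Hypothetical source-code -> OMOP concept id lookup (illustrative values only).
SOURCE_TO_CONCEPT = {"SRC_A": 1001, "SRC_B": 1002}


def standardise(rows):
    """Stage 1: attach a concept id to every source row; unmapped codes get 0."""
    return [
        {**row, "concept_id": SOURCE_TO_CONCEPT.get(row["source_code"], 0)}
        for row in rows
    ]


def project(rows, drop_fields=("name",), keep_concepts=None):
    """Stage 2: redact the named fields and optionally filter by concept id."""
    projected = []
    for row in rows:
        if keep_concepts is not None and row["concept_id"] not in keep_concepts:
            continue
        projected.append({k: v for k, v in row.items() if k not in drop_fields})
    return projected


def post_process(rows, extra_by_person):
    """Stage 3: link de-identified non-OMOP data to rows by person_id."""
    return [{**row, **extra_by_person.get(row["person_id"], {})} for row in rows]


def write_csv(rows):
    """Stage 4: serialise the rows to CSV text (one of several possible outputs)."""
    if not rows:
        return ""
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


def run_pipeline(rows, extra_by_person):
    """Run the four stages in order and return the CSV text."""
    return write_csv(post_process(project(standardise(rows)), extra_by_person))
```

Keeping each stage as a separate function mirrors the modular design described above: a project could swap the output stage for Parquet or SQLite without touching standardisation or projection.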
The system:
● is configurable to a variety of OMOP projects via a settings file
● is reproducible and automated
● queries the Epic EHR and other sources
● automates filtering of sensitive data, with safe defaults and the ability for Information Governance teams to inspect settings before and after running
● tests and reports the quality of standardisation
● is being extended both by the "core" team and by other trusts in an inner-source fashion
● has a small mock database for system development and testing
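One way to picture "safe defaults" that an Information Governance team can inspect is a settings object whose sensitive options are all off unless a project explicitly turns them on. The Python sketch below is an assumption-laden illustration, not the OMOP-ES settings format: the class, its field names and the report layout are all hypothetical.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)
class ProjectionSettings:
    """Hypothetical per-project settings; sensitive options default to off."""

    project_name: str
    include_free_text: bool = False
    include_exact_dates: bool = False
    include_postcode: bool = False


def load_settings(overrides):
    """Build settings from a dict, rejecting keys the schema does not know."""
    known = {f.name for f in fields(ProjectionSettings)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return ProjectionSettings(**overrides)


def review_report(settings):
    """List every effective setting and flag those that differ from the default."""
    lines = []
    for f in fields(settings):
        value = getattr(settings, f.name)
        if f.default is MISSING:
            status = "required"
        elif value != f.default:
            status = "OVERRIDDEN"
        else:
            status = "default"
        lines.append(f"{f.name} = {value!r} [{status}]")
    return lines
```

For example, `review_report(load_settings({"project_name": "demo", "include_exact_dates": True}))` lists all four settings and marks only `include_exact_dates` as overridden, so a reviewer can see at a glance where a project departs from the defaults.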
Coverage
Spatial:
United Kingdom
Typical Age Range:
0-110
Provenance
Origin
Purposes:
- Care
- Administrative
Sources:
EPR
Collection Situations:
- Secondary care - Accident and Emergency
- Secondary care - Outpatients
- Secondary care - In-patients
- Secondary care - Ambulance
- Secondary care - ICU
Temporal
Accrual Periodicity:
Irregular
Start Date:
01 April 2019
Time Lag:
Variable
Accessibility
Access
Access Rights:
Jurisdictions:
UK
Data Controller:
University College London Hospitals (UCLH)
Data Processor:
University College London Hospitals (UCLH)
Usage
Data Use Limitations:
General research use
Data Use Requirements:
Ethics approval required
Format and Standards
Vocabulary Encoding Schemes:
- OPCS4
- SNOMED CT
- DM+D
- LOINC
- ICD10
- RXNORM
- RXNORM EXTENSION
Conforms To:
OMOP
Languages:
en
Formats:
- parquet
- csv
Observations
Statistical Population:
Persons
Population Description:
Population Size:
1200000
Measured Property:
count
Observation Date:
30 April 2025