Version: 1.0.0 | Published: 27 May 2025 | Updated: 194 days ago

University College London Hospitals NHS OMOP dataset

Dataset

Documentation

Associated Media:
Description:
UCLH has an OMOP extraction system (omop_es) that connects our Electronic Health Record (EHR) to an architecture delivering high-quality, standardised extracts that meet the OMOP CDM standard. Our EHR contains records for 6 million patients, 13 million diagnoses and 50 million medication events. These derive from the UCLH patient population, which includes national referrals for tertiary and quaternary services (cancer, neurology, etc.) and general medical admissions from an inner-city teaching hospital that treats >1m outpatients per year and has >100k inpatient admissions.

UCLH has invested effort and expertise in aligning international terminology systems (e.g. SNOMED CT, LOINC, UCUM) with NHS data standards, both during the EHR system build and after implementation. Our standardisation work has covered the following clinical domains: Diagnosis and past medical history, Surgical and Ambulatory procedures, Diagnostic Imaging, Cardiac Echo, Lab Medicine including Biochemistry, Haematology, Microbiology, Immunology, Virology, Allergens, Medications (including route of administration); and Demographic information such as Religion and Ethnicity. For some domains (e.g. diagnosis and surgical procedures) we have achieved 100% standardisation; for others the work is ongoing.

Our data pipeline, the OMOP-Extraction System (OMOP-ES), is a modular, re-usable architecture written in over 20,000 lines of R. Extractions proceed through four stages:
1. Standardisation - translates source data to OMOP concepts at full fidelity
2. Projection - applies rules to redact, filter, transform & link
3. Post-processing - allows linking of de-identified non-OMOP data
4. Output - multiple formats & destinations, incl. CSV, Parquet or SQLite for direct use or import in a TRE

The system:
  • is configurable to a variety of OMOP projects via a settings file
  • is reproducible and automated
  • queries EPIC EHR and other sources
  • automates filtering of sensitive data, with safe defaults and the ability for Information Governance teams to inspect settings before & after running
  • tests and reports the quality of standardisation
  • is being extended both by the "core" team and by other trusts in an inner source fashion
  • has a small mock database for system development and testing
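
The four extraction stages described above can be illustrated with a minimal sketch. This is not the OMOP-ES code, which is written in R and is not shown here; it is a Python illustration under stated assumptions. The function names, the SETTINGS keys, the mock rows, the patient_name source column and the concept IDs are all hypothetical placeholders. Only the OMOP column names person_id, condition_concept_id, condition_start_date and condition_source_value come from the OMOP CDM.

```python
import csv
import io

# Hypothetical settings, standing in for a project settings file.
SETTINGS = {
    "drop_columns": ["patient_name"],  # direct identifiers to redact
    "min_year": 2019,                  # keep records starting in this year or later
}

# Placeholder mapping from source codes to concept IDs (not real OMOP concept_ids).
CONCEPT_MAP = {"SRC_DX_A": 1001, "SRC_DX_B": 1002}

# Invented mock source rows, in the spirit of a small mock database.
MOCK_SOURCE = [
    {"person_id": 1, "patient_name": "A", "condition_source_value": "SRC_DX_A",
     "condition_start_date": "2020-03-01"},
    {"person_id": 2, "patient_name": "B", "condition_source_value": "SRC_DX_X",
     "condition_start_date": "2018-07-15"},
    {"person_id": 3, "patient_name": "C", "condition_source_value": "SRC_DX_B",
     "condition_start_date": "2021-11-30"},
]


def standardise(rows):
    """Stage 1: add a concept ID for each source code; unmapped codes get 0."""
    return [
        {**row, "condition_concept_id": CONCEPT_MAP.get(row["condition_source_value"], 0)}
        for row in rows
    ]


def project(rows, settings):
    """Stage 2: filter by start year and redact the configured columns."""
    kept = []
    for row in rows:
        if int(row["condition_start_date"][:4]) < settings["min_year"]:
            continue
        kept.append({k: v for k, v in row.items() if k not in settings["drop_columns"]})
    return kept


def post_process(rows, linked_by_person):
    """Stage 3: attach de-identified non-OMOP data keyed on person_id."""
    return [
        {**row, "linked_flag": linked_by_person.get(row["person_id"], "")}
        for row in rows
    ]


def output_csv(rows):
    """Stage 4: serialise to CSV text (Parquet or SQLite would be alternatives)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def run_pipeline(source_rows, settings, linked_by_person):
    """Run stages 1 to 3 in order and return the projected rows."""
    rows = standardise(source_rows)
    rows = project(rows, settings)
    return post_process(rows, linked_by_person)
```

Calling `output_csv(run_pipeline(MOCK_SOURCE, SETTINGS, {1: "yes"}))` would yield CSV text with the redacted column removed and the pre-2019 record filtered out. Keeping standardisation separate from projection mirrors the description above: mapping happens at full fidelity first, and project-specific redaction is applied afterwards from settings.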

Coverage

Spatial:
United Kingdom
Typical Age Range:
0-110

Provenance

Origin

Purposes:
  • Care
  • Administrative
Sources:
EPR
Collection Situations:
  • Secondary care - Accident and Emergency
  • Secondary care - Outpatients
  • Secondary care - In-patients
  • Secondary care - Ambulance
  • Secondary care - ICU

Temporal

Accrual Periodicity:
Irregular
Start Date:
01 April 2019
Time Lag:
Variable

Accessibility

Access

Jurisdictions:
UK
Data Controller:
University College London Hospitals (UCLH)
Data Processor:
University College London Hospitals (UCLH)

Usage

Data Use Limitations:
General research use
Data Use Requirements:
Ethics approval required

Format and Standards

Vocabulary Encoding Schemes:
  • OPCS4
  • SNOMED CT
  • DM+D
  • LOINC
  • ICD10
  • RXNORM
  • RXNORM EXTENSION
Conforms To:
OMOP
Languages:
en
Formats:
  • parquet
  • csv
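
As an illustration of working with a CSV-format extract, the sketch below loads a few rows of an OMOP person table into an in-memory SQLite database and counts the persons. It uses only the Python standard library. The inline CSV text is invented mock data standing in for a hypothetical person.csv file, and the helper names load_person_table and count_persons are assumptions made for this example. Only a subset of the OMOP person columns is shown.

```python
import csv
import io
import sqlite3

# Stand-in for a person.csv file from an extract; the rows are invented mock data.
# 8507 and 8532 are the standard OMOP gender concept IDs for male and female.
PERSON_CSV = """person_id,gender_concept_id,year_of_birth
1,8532,1950
2,8507,1987
3,8532,2015
"""


def load_person_table(csv_text):
    """Load person rows from CSV text into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person ("
        "person_id INTEGER PRIMARY KEY, "
        "gender_concept_id INTEGER, "
        "year_of_birth INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row is a dict, so named placeholders pick out the columns.
    conn.executemany(
        "INSERT INTO person VALUES (:person_id, :gender_concept_id, :year_of_birth)",
        reader,
    )
    conn.commit()
    return conn


def count_persons(conn):
    """Return the number of rows in the person table."""
    return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

With a real extract, the same pattern would read the file with `open(path, newline="")` in place of `io.StringIO`. A Parquet extract would need a third-party reader, which is outside the scope of this sketch.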

Observations

Statistical Population:
Persons
Population Description:
Population Size:
1200000
Measured Property:
count
Observation Date:
30 April 2025