Version: 1.0.0 | Published: 8 Oct 2024 | Updated: 229 days ago

Synthetic dataset - Hospitalised patients with Thromboembolic diagnosis

Dataset

Documentation

Description:
Background: Annually in the UK, around 60,000 people develop a pulmonary embolism (PE) and 200,000 a deep vein thrombosis (DVT), and the number of emergency admissions for suspected PE and DVT is increasing. Diagnosing PE and DVT remains a challenge because presenting symptoms are non-specific. Further tests are often required, and each year the number of CT pulmonary angiograms (CTPAs) and ultrasound scans (USS) performed for suspected venous thromboembolism (VTE) increases. There is great interest in finding better tools to identify those with the highest likelihood of a DVT or PE, so that scarce screening services can be focused where they are needed most. A number of tools have been suggested but few have been adopted in clinical practice. Methods such as age-adjusted D-dimer tests and the 4PEPS and 4D scores aim to predict PE and DVT more accurately. Implementing a more precise system could transform how these dangerous conditions are diagnosed and treated. This dataset enables an exploration of VTE to better understand the disease, identify patients at greatest risk of poor outcomes, and improve health services through the development of new prognostic tools.
PIONEER geography: The West Midlands (WM) has a population of 5.9 million and includes a diverse ethnic and socio-economic mix. University Hospitals Birmingham (UHB) is one of the largest NHS Trusts in England, providing direct acute services and specialist care across four hospital sites, with 2.2 million patient episodes per year and 2,750 beds. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary and secondary care record (Your Care Connected) and a patient portal, “My Health”.
Methodology: A dedicated pipeline was designed to generate the synthetic version of the thromboembolic events dataset, comprising data pre-processing, synthesising and post-processing steps. In brief, a generative adversarial network model (CTGAN) from the SDV package (Patki et al., 2016) was used to generate a synthetic dataset that is statistically equivalent to the real dataset. The pre-processing and post-processing steps were customised to improve the realism of the synthetic data.
Scope: Enabling data-driven research and machine learning models aimed at improving the diagnosis of thromboembolic events (PE/DVT). A linked real-world dataset is available. The dataset includes extensive patient demographics, clinical scores and medical conditions for PE/DVT patients, alongside outcomes taken from ICD-10 and SNOMED-CT codes.
Available supplementary data: Real-world PE/DVT cohort.
Available supplementary support: Analytics, model build, validation and refinement; A.I.; data partner support for the ETL (extract, transform and load) process; clinical expertise; patient and end-user access; purchaser access; regulatory requirements; data-driven trials; “fast screen” services.
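The three-stage methodology described above (pre-process, synthesise, post-process) can be sketched in Python as follows. This is an illustrative sketch only, not the PIONEER pipeline: the field names `age` and `admit_date`, the helper names and the clamping rules are hypothetical, and only the age range and date window are taken from this record. The `ctgan_synthesise` function assumes the SDV 1.x API (`SingleTableMetadata`, `CTGANSynthesizer`) together with pandas; these names may differ in other SDV releases.

```python
from datetime import date, timedelta

# Bounds taken from this record; the field names used below are hypothetical.
AGE_MIN, AGE_MAX = 18, 97
START, END = date(2016, 12, 26), date(2021, 12, 28)


def preprocess(rows):
    """Encode admission dates as day offsets so the model sees numeric columns."""
    return [
        {"age": r["age"], "admit_offset": (r["admit_date"] - START).days}
        for r in rows
    ]


def postprocess(rows):
    """Clamp synthetic values to the documented ranges and decode the dates."""
    max_offset = (END - START).days
    out = []
    for r in rows:
        age = min(max(round(r["age"]), AGE_MIN), AGE_MAX)
        offset = min(max(round(r["admit_offset"]), 0), max_offset)
        out.append({"age": age, "admit_date": START + timedelta(days=offset)})
    return out


def ctgan_synthesise(rows, n, epochs=300):
    """Fit SDV's CTGAN synthesiser and sample n rows (assumes sdv 1.x and pandas)."""
    import pandas as pd
    from sdv.metadata import SingleTableMetadata
    from sdv.single_table import CTGANSynthesizer

    frame = pd.DataFrame(rows)
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(frame)
    model = CTGANSynthesizer(metadata, epochs=epochs)
    model.fit(frame)
    return model.sample(num_rows=n).to_dict("records")


def generate(rows, n, synthesise=ctgan_synthesise):
    """Run pre-processing, synthesis and post-processing in sequence."""
    return postprocess(synthesise(preprocess(rows), n))
```

Passing a different `synthesise` callable to `generate` lets the pre-processing and post-processing steps be exercised without training a model.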

Coverage

Spatial:
United Kingdom,England,West Midlands
Typical Age Range:
18-97
Follow Up:
1 - 10 Years
Pathway:
The data is representative of the multi-ethnic population of the West Midlands (42% non-white). It includes all patients admitted during this timeframe, with National Data Opt-Outs applied, and is therefore representative of admissions to secondary care. The data focuses on the in-patient hospital stay during the acute episode but can be supplemented on request to include previous and subsequent hospital contacts (including outpatient appointments) and ambulance, 111 and 999 data.

Provenance

Origin

Purposes:
Care
Sources:
Machine generated
Collection Situations:
Secondary care - Accident and Emergency

Temporal

Accrual Periodicity:
Quarterly
Distribution Release Date:
17 October 2023
Start Date:
26 December 2016
End Date:
28 December 2021
Time Lag:
Other

Accessibility

Access

Access Service:
Trusted Research Environments (TREs) are built using Microsoft Azure services and hosted in the UK to provide research teams with a safe, secure and agile environment in which users can quickly analyse, interpret and form an enriched view of primary care information through a range of integrated datasets. Health data collated from multiple sources is ingested into a secure data lake, from which subsets of data can be made available to research teams on approval of a data request. Once a request is approved, a customer-specific TRE is made available with a standard set of analytical tools from Microsoft, including Azure Databricks, Azure Machine Learning, Azure SQL and Azure Synapse (for large-scale data warehouses). Specific tools can be provided at an additional cost over the standard platform data access charge, and the PIONEER team will work with you to determine your exact needs. Access to the TRE is managed using virtual desktop technology to provide a safe and secure end-user experience. This design allows PIONEER to create TREs rapidly in response to customer requirements.
Access Request Cost:
www.pioneerdatahub.co.uk/data/data-services-costs/
Delivery Lead Time:
Not applicable
Jurisdictions:
GB-ENG
Data Controller:
University Hospitals Birmingham NHS Foundation Trust

Usage

Data Use Limitations:
General research use
Data Use Requirements:
Project-specific restrictions
Resource Creators:
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics Committee, reference 20/EM/0158).

Format and Standards

Vocabulary Encoding Schemes:
  • SNOMED CT
  • ICD10
  • OPCS4
Conforms To:
LOCAL
Languages:
en
Formats:
SQL

Observations

Statistical Population:
Persons
Population Description:
14500 spells for patients with PE/DVT between 26/12/2016 and 28/12/2021
Population Size:
14500
Measured Property:
Count
Observation Date:
18 August 2022