Version: 1.0.0 | Published: 8 Oct 2024 | Updated: 228 days ago
Documentation
Associated Media:
Description:
Background:
A PIONEER synthetic dataset of 20,000 ethnically diverse hypertrophic cardiomyopathy patients created using CT-GAN generative AI. Data includes clinical & biological phenotyping, co-morbidities, investigations (ECG, ECHO), procedures & outcomes.
Well-created synthetic data establishes a governance risk-free environment for algorithm development & experimentation. This includes evaluating new treatment models, care management systems, clinical decision support, and more. Synthetic data is of particular use in rare diseases, where real data may be in short supply, or to replicate disease in less common patient demographics (e.g. ethnicities).
Familial hypertrophic cardiomyopathy (HCM) is a rare genetic condition characterised by thickening (hypertrophy) of the cardiac muscle, usually of the interventricular septum. Arrhythmias can be life-threatening, and HCM is associated with an increased risk of sudden death. Some affected individuals develop potentially fatal heart failure, which may require heart transplantation. Approximately 130,000 people have HCM in the UK, but there is a significant burden of undiagnosed disease and diagnostic delay.
Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can provide real world data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
Coverage
Spatial:
United Kingdom, England, West Midlands
Typical Age Range:
0-150
Follow Up:
Other
Pathway:
Data is representative of the multi-ethnicity population within the West
Midlands (42% non-white). Data includes all patients admitted during this
timeframe, with National Data Opt-Outs applied, and therefore is representative
of admissions to secondary care. Data focuses on in-patient stay in hospital
during the acute episode but can be supplemented on request to include previous
and subsequent hospital contacts (including outpatient appointments) and
ambulance, 111, 999 data. University Hospitals Birmingham NHS Foundation Trust
(UHB) is one of the largest NHS Trusts in England, providing direct acute
services and specialist care across four hospital sites, with 2.2 million
patient episodes per year, 2750 beds and 100 ITU beds. UHB runs a fully
electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary
and secondary care record (Your Care Connected) and a patient portal “My
Health”.
Provenance
Origin
Purposes:
Other
Sources:
Machine generated
Collection Situations:
Secondary care - In-patients
Temporal
Accrual Periodicity:
Static
Distribution Release Date:
29 February 2024
Start Date:
22 February 2021
End Date:
31 January 2024
Time Lag:
Not applicable
Accessibility
Access
Access Service:
Trusted Research Environments (TRE) are built using Microsoft Azure services and
hosted in the UK to provide research teams a safe, secure and agile environment
which allows users to quickly analyse, interpret and form an enriched view of
primary care information through a range of integrated datasets. Health data
collated from multiple sources is ingested into a secure data lake which will
then allow subsets of data to be made available to research teams on approval of
a data request. Once approved a customer specific TRE is made available with a
standard set of leading analytical tools from Microsoft including Azure
Databricks, Azure Machine Learning, Azure SQL and Azure Synapse (for large-scale
data warehouses). Specific tools can be provided at an additional cost over the
standard platform data access charge and the PIONEER team will work with you to
determine your exact needs. Access to the TRE is managed using the latest
virtual desktop technology to provide a safe and secure end-user experience. By
utilising leading edge design PIONEER are able to create TREs rapidly to enable
us to service any customer requirement.
Access Request Cost:
www.pioneerdatahub.co.uk/data/data-services-costs/
Delivery Lead Time:
Not applicable
Jurisdictions:
GB-ENG
Data Controller:
University Hospitals Birmingham NHS Foundation Trust
Usage
Data Use Limitations:
- General research use
- Commercial research use
Data Use Requirements:
Project-specific restrictions
Resource Creators:
This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
Format and Standards
Vocabulary Encoding Schemes:
LOCAL
Conforms To:
LOCAL
Languages:
en
Formats:
csv
Observations
Statistical Population:
Persons
Population Description:
20,000 synthetically generated patient records, sampled using a CT-GAN generative AI
Population Size:
20000
Measured Property:
Count
Observation Date:
25 February 2021
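Because the dataset is delivered as csv with a stated population size of 20000, a recipient may want to confirm that an extract contains the expected number of records. The following is a minimal sketch only, using the Python standard library. The function names and the column names in the sample are hypothetical and are not taken from the PIONEER data dictionary, and the sketch assumes one record per row with a single header row.

```python
import csv
import io

# Population size stated in the Observations block of this record.
EXPECTED_ROWS = 20000


def count_records(handle):
    """Count data rows in a CSV stream, excluding the header row."""
    reader = csv.DictReader(handle)
    return sum(1 for _ in reader)


def check_extract(handle, expected=EXPECTED_ROWS):
    """Return (row_count, matches_expected) for a CSV stream."""
    n = count_records(handle)
    return n, n == expected


# Tiny in-memory stand-in for a real extract.
# The column names below are hypothetical, for illustration only.
sample = io.StringIO("patient_id,age,ethnicity\n1,54,A\n2,61,B\n")
rows, ok = check_extract(sample, expected=2)
```

For a real extract, the same check could be run on a file opened with `open(path, newline="")`, where `path` is whatever location the TRE provides.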