Version: 1.0.0 | Published: 8 Oct 2024 | Updated: 229 days ago

CPRD COVID-19 Symptoms and Risk Factors Synthetic Dataset

Dataset

Summary

DOI Name:
10.48329/yk2n-sz66

Documentation

Description:
This wholly synthetic dataset is based on real anonymised primary care patient data extracted from the CPRD Aurum database. To preserve patient privacy, researchers will not be able to access the real anonymised patient data extract that was used as the basis for generating the synthetic dataset. The dataset focuses on patients presenting to primary care with symptoms indicative of COVID-19 (confirmed or suspected COVID-19) and control patients with negative COVID-19 test results. It includes data on sociodemographic and clinical risk factors. The ‘ground truth’ CPRD Aurum data extract used as the basis for generating this synthetic dataset included data up to 13/04/2021 on patients with either suspected or confirmed COVID-19, as ascertained from the primary care record. The ground truth data extract was subject to data pre-processing; as a result, the synthetic dataset based on it does not reflect the structure of the source CPRD Aurum database. The development of this synthetic dataset was funded by NHS X using the synthetic data generation and evaluation framework developed by CPRD under a grant from the Regulators’ Pioneer Fund, launched by the Department for Business, Energy and Industrial Strategy (BEIS) and managed by Innovate UK. The methodology used to generate and evaluate this synthetic dataset is outlined in Wang et al. 2019 (DOI: 10.1109/CBMS.2019.00036).

Coverage

Spatial:
United Kingdom
Typical Age Range:
0-150
Follow Up:
Unknown
Pathway:
Primary care

Provenance

Origin

Purposes:
Study
Collection Situations:
Other

Temporal

Accrual Periodicity:
Other
Distribution Release Date:
01 December 2021
Start Date:
03 December 2019
End Date:
12 April 2021
Time Lag:
Not applicable

Accessibility

Access

Access Service:
Access to CPRD data, including UK Primary Care Data, and linked data such as Hospital Episode Statistics, is subject to protocol approval via CPRD’s Research Data Governance (RDG) Process. Independent scientific and patient advice is provided by Expert Review Committees (ERCs) and the Central Advisory Committee (CAC): https://www.cprd.com/research-applications
Access Request Cost:
Delivery Lead Time:
Not applicable
Jurisdictions:
GB-GBN
Data Controller:
Clinical Practice Research Datalink (CPRD)
Data Processor:
CPRD

Usage

Data Use Limitations:
  • General research use
  • No linkage
  • Research-specific restrictions
  • Research use only
Data Use Requirements:
  • Geographical restrictions
  • Institution-specific restrictions
  • Project-specific restrictions
  • Time limit on use
  • User-specific restriction
Resource Creators:
CPRD

Format and Standards

Vocabulary Encoding Schemes:
SNOMED CT
Languages:
en
Formats:
Tab-delimited-text
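
As an illustration of the tab-delimited text format listed above, the following is a minimal sketch of reading such data with Python's standard csv module. The function name read_tab_delimited and the sample column names (patient_id, age) are hypothetical placeholders, not the dataset's actual schema, and the sketch assumes the first line of the file is a header row.

```python
import csv
import io


def read_tab_delimited(source):
    """Parse tab-delimited text with a header row into a list of dicts.

    `source` is any iterable of lines, e.g. an open text file.
    """
    reader = csv.DictReader(source, delimiter="\t")
    return list(reader)


# Hypothetical two-row sample; the column names are placeholders and
# are NOT taken from the real dataset.
sample = io.StringIO("patient_id\tage\n1\t42\n2\t67\n")
rows = read_tab_delimited(sample)

# All values are returned as strings; convert types as needed.
ages = [int(row["age"]) for row in rows]
```

For a file on disk, the same function could be called on a handle opened with open(path, newline="", encoding="utf-8"), where path and the encoding are likewise assumptions to be checked against the delivered files.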

Observations

Statistical Population:
Persons
Population Description:
Population size
Population Size:
4173000
Measured Property:
COUNT
Observation Date:
01 December 2021