Documentation
Description:
UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. The database, which is regularly augmented with additional data, is globally accessible to approved researchers and scientists undertaking vital research into the most common and life-threatening diseases. UK Biobank’s research resource is a major contributor to the advancement of modern medicine and treatment and has enabled several scientific discoveries that improve human health.
Since 2006, UK Biobank has collected an unprecedented amount of biological and medical data on half a million people, aged between 40 and 69 years and living in the UK, as part of a large-scale prospective study. With their consent, participants regularly provide blood, urine and saliva samples, as well as detailed information about their lifestyle, which is then linked to their health-related records to provide a deeper understanding of how individuals experience diseases. Genotyping, whole exome sequencing and whole genome sequencing data are available for the whole cohort. Blood and urine biomarkers, telomere data, metabolomic and proteomic data and infectious disease markers have been assayed from the samples provided.
Since 2014 we have been undertaking the largest imaging study to date. We aim to carry out brain, cardiac and neck-to-knee MRI, whole body DXA and carotid ultrasound on 100,000 participants. We additionally have retinal images for 100,000 participants from the baseline assessment, and accelerometer data for 100,000 participants collected in 2013-2014.
Questionnaires designed to capture information that is not readily available through health data linkage are regularly sent to our participants.
The data, the largest and richest dataset of its kind, are de-identified and made widely accessible by UK Biobank to registered researchers around the world, who use them to make new scientific discoveries about common and life-threatening diseases (such as cancer, heart disease and stroke) in order to improve public health.
Coverage
Spatial:
United Kingdom
Typical Age Range:
40-69
Follow Up:
Continuous
Pathway:
UK Biobank is a volunteer-based cohort. As such, there is a healthy volunteer
effect: compared with the general population, participants tend to be of higher
socioeconomic status, to have remained in education longer, to be slimmer, to be
less likely to smoke (although those who smoke tend to be heavier smokers) and
to drink less alcohol. A comparison between UK Biobank participants and the
general UK population has been published (https://doi.org/10.1093/aje/kwx246).
Although selection biases are seen in UK Biobank, there is still substantial
heterogeneity within the cohort. Whilst incidence and prevalence calculations
are not generalisable to the UK population, exposure-outcome comparisons should
still be generalisable because of this heterogeneity. However, it is important
that researchers consider the potential biases of a dataset that might limit the
generalisability of their results (as is the case for all observational data).
Provenance
Origin
Purposes:
Study
Collection Situations:
- Primary care - Clinic
- Secondary care - Accident and Emergency
- Secondary care - In-patients
- Community
- Clinic
- Prescribing - Community pharmacy
Temporal
Accrual Periodicity:
Continuous
Start Date:
13 March 2006
Time Lag:
Variable
Accessibility
Access
Access Service:
Applications to access data are made through our bespoke access management
system (https://bbams.ndph.ox.ac.uk/ams/). Data can be accessed either by
download (phenotype and genotype data) or via our Research Analysis Platform
(RAP; phenotype, imaging, genotype, whole exome sequencing (WES), whole genome
sequencing (WGS) and omics data). The RAP is enabled by DNAnexus and hosted by
Amazon Web Services
(https://www.ukbiobank.ac.uk/enable-your-research/research-analysis-platform).
Access costs depend on the type of data access required.
Access Request Cost:
Delivery Lead Time:
Not applicable
Jurisdictions:
GB-ENG
Data Controller:
UK Biobank
Data Processor:
UK Biobank
Usage
Data Use Limitations:
General research use
Data Use Requirements:
- Institution-specific restrictions
- Project-specific restrictions
- Publication required
- Return to database or resource
- User-specific restriction
- Time limit on use
Resource Creators:
UK Biobank
Format and Standards
Vocabulary Encoding Schemes:
- LOCAL
- OPCS4
- READ
- SNOMED CT
- DM+D
- ICD10
- ICD9
Languages:
en
Formats:
- Text: CSV, DTA, SAS, R
- Image: DICOM, NIfTI, PNG
- Other: VCF, CRAM, PLINK, BGEN, BED, CWA
Observations
Statistical Population:
Persons
Population Description:
Each participant has a large number (<5000) of data points associated with them. Recruitment started in 2006, but data collection is ongoing, and linked health data predate the recruitment date. Summary statistics of all data can be found on our Data Showcase.
Population Size:
500,000
Measured Property:
Count
Observation Date:
13 March 2006