Centre for Translational Bioinformatics

HDR UK

A Cardiovascular Translational Biomedicine Platform for HDR-UK

Project Summary

 

Abstract

Translational biomedicine studies depend on the integration of multiple datasets that, together, capture the complex range of features of patients transitioning between health and disease states. The UK has several initiatives that investigate disease onset and progression longitudinally and are therefore particularly well suited to this kind of research. The UK Biobank (UKB) holds clinical data on more than 500,000 largely healthy individuals and aims to collect 100,000 magnetic resonance imaging scans of body regions such as the brain, heart and abdomen, together with information on bone tissue structure and ultrasound of the carotid arteries. These imaging data are being integrated with genetic data and with clinical information derived from detailed participant assessments and linked electronic health records (EHR). In contrast to the mainly healthy UKB cohort, the Barts Heart Centre has recruited over 14,000 patients since 2014 to create the Barts BioResource (BBR), a rich information resource for cardiovascular research that links omics, imaging and EHR data. The UKB and BBR cohorts collectively represent the full spectrum between health and cardiovascular disease. To speed up translational research using these unprecedented datasets, it is essential to record the origin of each dataset and the precise methods by which it was collected, and to integrate the datasets into a unified database system.

In parallel, the European Commission (EC), together with the European Federation of Pharmaceutical Industries and Associations (EFPIA), funded the eTRIKS project (2012-2018) to deploy a sustainable open-source data and knowledge management platform to support translational research: tranSMART. This system supports a wide variety of data and has been successfully applied to projects within the UK (e.g. U-BIOPRED and the MRC Stratified Medicine projects PSORT, MATURA, RA-MAP, IMID-BIO, CLUSTER and MASTERPLANS) and beyond (e.g. AETIONOMY). The new capabilities of tranSMART allow the integration of study metadata and various categorical and numerical data (e.g. red blood cell counts) along with omics data (e.g. gene expression, genomic copy number variation, single nucleotide polymorphisms, and peptide and metabolite profiling). tranSMART also allows programmatic data access, so that computational workflows can be built with a wide variety of software.

From the collaboration between the eTRIKS and AETIONOMY projects, a new software concept called BrainMesh emerged. It won the best-poster award at the tranSMART Foundation Annual Meeting (2016) at the University of California, San Diego (US), and was featured as a promising future technology for the tranSMART environment. Together with the new visual analytics features that the newly developed SmartR component brings to tranSMART, BrainMesh adds a new dynamic visual analytics concept, allowing clinical and image-derived data to be analysed visually in an integrated fashion.

In this proposal, we aim to load the complete UKB and BBR cardiovascular MRI cohorts into dedicated, separate tranSMART environments in which multiple analytical workflows can be executed to stratify patients who share common health data features, paving the way for data mining and discovery in these cohorts and in future projects that wish to use the platform.

Technical Details

The UKB and BBR cohorts collectively represent the full spectrum between health and cardiovascular disease. Both cohorts will be analysed using parallel tranSMART data warehouse infrastructures, enabling a comparison between healthy and diseased subjects and the integration of high-level findings from both cohorts. In this way, we will establish a foundation for translational cardiovascular research in the UK with a detailed data provenance schema and a common analytical pipeline.
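As an illustration of what one entry in such a data provenance schema might hold, the following is a minimal sketch in Python. The class name, field names and example values are hypothetical and are not taken from the project; the actual schema is not described here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProvenanceRecord:
    """One hypothetical provenance entry for a dataset loaded into a warehouse."""
    dataset_id: str          # local identifier for the dataset (assumed)
    cohort: str              # originating cohort, e.g. "UKB" or "BBR"
    modality: str            # kind of data, e.g. "cardiac MRI" or "EHR"
    collection_method: str   # free-text description of how the data were collected
    source_version: str      # version label of the source release (assumed)


def to_json(record: ProvenanceRecord) -> str:
    """Serialise a record with stable key order so it can be versioned in Git."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only; none of these come from the real cohorts.
example = ProvenanceRecord(
    dataset_id="example-001",
    cohort="BBR",
    modality="cardiac MRI",
    collection_method="illustrative placeholder",
    source_version="v0",
)
example_json = to_json(example)
```

Keeping the record immutable and serialising it with sorted keys makes each entry easy to compare between the two parallel warehouse instances.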

To achieve this, Unified Medical Language System (UMLS) coding standards will be used to standardize EHR data into official nomenclature. Analysing and potentially integrating multiple datasets across tranSMART instances will require extensive data curation; prior experience from the eTRIKS and AETIONOMY IMI projects will mitigate the risks associated with this process. The data will be made available to other applications via a flexible tranSMART API, including a data constructor that feeds a machine learning software layer called Ada. We will use Ada's machine-learning algorithms to stratify patients and, by adapting the BrainMesh package for heart data, we will investigate the MR images collected by UKB and BBR.
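To make the idea of patient stratification concrete, the sketch below groups patients by feature similarity using a plain k-means clustering written with the Python standard library only. This is an assumption chosen for illustration: the algorithms that Ada actually provides are not specified here, and the function name, feature values and number of groups are all hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (labels, centroids).

    Initialisation is deterministic (the first k points), which keeps the
    sketch reproducible but is not a recommended choice for real data.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each patient to the nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids


# Hypothetical, already normalised feature vectors for four patients.
patients = [(0.10, 0.20), (0.15, 0.22), (0.90, 0.80), (0.85, 0.95)]
labels, centroids = kmeans(patients, k=2)
```

With these illustrative values, the first two patients fall into one group and the last two into another. In practice the feature vectors would be drawn from curated clinical and image-derived variables rather than typed in by hand.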

Docker containers will help to create reproducible and portable data warehouse instances, while Git will provide a clear version history. Software and datasets will be made available in public repositories where appropriate, or via Zenodo, and referenced by top-level unique Digital Object Identifiers (DOIs). Applying the FAIR (Findable, Accessible, Interoperable and Reusable) principles, within the scope of each resource's access conditions, will support the sustainable long-term use of these tools and datasets by future researchers. This work will also allow us to build a critical mass of highly skilled scientists dedicated to health data research.

Funding: UKRI – MRC.