A Cardiovascular Translational Biomedicine Platform for HDR-UK
In parallel, the European Commission (EC), together with the European Federation of Pharmaceutical Industries and Associations (EFPIA), funded the eTRIKS project (2012-2018) to deploy a sustainable open-source data and knowledge management platform to support translational research: tranSMART. This system supports a wide variety of data and has been successfully applied to various projects within the UK (e.g. U-BIOPRED and the MRC Stratified Medicine projects PSORT, MATURA, RA-MAP, IMID-BIO, CLUSTER and MASTERPLANS) and beyond (e.g. AETIONOMY). The new capabilities of tranSMART allow the integration of study metadata and various categorical and numerical data (e.g. red blood cell counts) along with omics data (e.g. gene expression, genomic copy number variation, single nucleotide polymorphisms, and peptide and metabolite profiling). tranSMART also allows programmatic data access, so that computational workflows can be built with a large variety of software.
From the collaboration between the eTRIKS and AETIONOMY projects, a new software concept called BrainMesh emerged. It won the best-poster award at the tranSMART Foundation Annual Meeting (2016) at the University of California, San Diego (US), where it was featured as a promising future technology for the tranSMART environment. Together with the new visual analytics features that tranSMART gains from the newly developed software component SmartR, BrainMesh adds a completely new dynamic visual analytics concept to tranSMART, for example allowing the visual analysis of clinical and image-derived data in an integrated fashion.
In this proposal, we aim to load the complete UKB and BBR cardiovascular MRI cohorts into dedicated (distinct) tranSMART environments, where multiple analytical workflows can be executed to stratify patients who share common health data features. This will pave the way for data mining and discovery in these cohorts and in future projects that wish to use the platform.
The UKB and BBR cohorts collectively represent the full spectrum between health and cardiovascular disease. Both cohorts will be analysed using parallel tranSMART data warehouse infrastructures, enabling a comparison between healthy and diseased subjects and the integration of high-level findings from both cohorts. Thus, we will establish a foundation for translational cardiovascular research in the UK with a detailed data provenance schema and a common analytical pipeline.
To achieve this, Unified Medical Language System (UMLS) coding standards will be used to standardise EHR data into official nomenclature. Analysing and potentially integrating multiple datasets across tranSMART instances will require extensive data curation; prior experience from the eTRIKS and AETIONOMY IMI projects will mitigate the risks associated with this process. The data will be made available to other applications via a flexible tranSMART API, including a data constructor feeding a machine learning software layer called Ada. We will stratify patients using Ada's machine-learning algorithms and, by adapting the BrainMesh package for heart data, investigate the MR images collected by UKB and BBR.
Docker instances will help to create reproducible and portable data warehouse instances, while Git will provide a clear version history. Software and datasets will be made available in public repositories where appropriate, or via Zenodo, and referenced by a top-level unique Digital Object Identifier. Applying the FAIR (Findable, Accessible, Interoperable and Reusable) principles, within the broader scope of each resource's access conditions, will ensure the sustainable long-term use of these tools and datasets by future researchers and allow us to create a critical mass of highly skilled scientists dedicated to health data research.
Funding: UKRI – MRC.
- Link to external resources: