Architecture of the Virtual Research Environment
How is the Virtual Research Environment (VRE) structured? Which other infrastructures does the platform interface with, and how does it enable researchers to access and use data securely? Here you will find a schematic representation of the VRE and an explanation of its key elements.
The VRE Portal allows research teams to discover and query data, access user-friendly dashboards and visualizations, process and analyze datasets directly within interactive workbenches, and manage their projects and permissions.
Data uploaded by researchers first land in the VRE Green Room, a restricted zone that provides pipelines and interfaces for pseudonymizing and preparing datasets before they can be processed further and shared. Once this preparation is complete, approved data are copied from the Green Room into the Data Lake in the VRE Core zone. The Data Lake includes services and pipelines for data cataloguing, quality control, curation, standardization and processing.
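To make the Green Room step more concrete, here is a minimal sketch of "pseudonymize on upload, copy to the Data Lake only after approval". It assumes keyed hashing (HMAC-SHA256) as the pseudonymization technique and treats `patient_id`, `name` and `birth_date` as direct identifiers. The names `GreenRoom`, `pseudonymize` and `PSEUDONYM_KEY` are hypothetical and do not describe the VRE's actual pipelines.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

# Hypothetical key that would stay inside the Green Room and never reach the Core zone.
PSEUDONYM_KEY = b"example-green-room-key"

# Illustrative set of direct identifiers (an assumption, not the VRE's real rules).
DIRECT_IDENTIFIERS = {"patient_id", "name", "birth_date"}


def pseudonymize(record: dict, key: bytes = PSEUDONYM_KEY) -> dict:
    """Replace the patient ID with a keyed hash and drop direct identifiers."""
    digest = hmac.new(key, record["patient_id"].encode("utf-8"), hashlib.sha256)
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Same key + same ID -> same pseudonym, so records can still be linked.
    cleaned["pseudonym"] = digest.hexdigest()[:16]
    return cleaned


@dataclass
class GreenRoom:
    """Hypothetical staging area: records are pseudonymized as they are uploaded."""
    staged: list = field(default_factory=list)
    approved: bool = False

    def upload(self, record: dict) -> None:
        self.staged.append(pseudonymize(record))

    def promote(self, data_lake: list) -> int:
        """Copy staged records into the Data Lake, but only after approval."""
        if not self.approved:
            raise PermissionError("dataset not yet approved for the VRE Core zone")
        # Copy rather than move: the Green Room keeps its own staged version.
        data_lake.extend(dict(r) for r in self.staged)
        return len(self.staged)


if __name__ == "__main__":
    lake: list = []
    room = GreenRoom()
    room.upload({"patient_id": "P001", "name": "Jane Doe",
                 "birth_date": "1980-01-01", "hb_g_dl": 13.5})
    try:
        room.promote(lake)
    except PermissionError as err:
        print("blocked:", err)
    room.approved = True
    print(room.promote(lake), lake)
```

Because the key never leaves the restricted zone, data in the Core zone can be linked by pseudonym without anyone there being able to recompute it from a known patient ID.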
Datasets collected and processed across research projects and modalities, including clinical and imaging data as well as other data types such as genomics, can be aggregated into a centralized Data Warehouse. Metadata on these datasets are also captured centrally in the Knowledge Graph and the Metadata Repository. These systems help researchers find and extract their data for downstream analysis and visualization.
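One way to picture how centrally captured metadata supports discovery is a toy subject-predicate-object store queried by several criteria at once. This is a hypothetical sketch: `MetadataGraph`, `find_datasets` and predicates such as `modality` and `project` are illustrative assumptions, not the schema or API of the VRE Knowledge Graph.

```python
from collections import defaultdict


class MetadataGraph:
    """Toy triple store standing in for a knowledge graph (hypothetical)."""

    def __init__(self) -> None:
        # Index subjects by (predicate, object) so "which datasets have X?" is one lookup.
        self._by_predicate_object: dict = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._by_predicate_object[(predicate, obj)].add(subject)

    def subjects(self, predicate: str, obj: str) -> set:
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self._by_predicate_object.get((predicate, obj), set()))


def find_datasets(graph: MetadataGraph, **criteria: str) -> list:
    """Return the sorted datasets that match every predicate=value criterion."""
    matches = None
    for predicate, value in criteria.items():
        hits = graph.subjects(predicate, value)
        matches = hits if matches is None else matches & hits
    return sorted(matches or [])


if __name__ == "__main__":
    g = MetadataGraph()
    g.add("ds-001", "modality", "imaging")
    g.add("ds-001", "project", "cohort-a")
    g.add("ds-002", "modality", "genomics")
    g.add("ds-002", "project", "cohort-a")
    g.add("ds-003", "modality", "imaging")
    g.add("ds-003", "project", "cohort-b")
    print(find_datasets(g, project="cohort-a", modality="imaging"))  # ['ds-001']
```

A production knowledge graph would use a proper graph database and query language. The point here is only that metadata recorded once, centrally, can answer cross-project questions such as "all imaging datasets in this cohort".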
Using the VRE Analysis Workbench, researchers can work securely inside the platform to process and analyze their datasets. For example, image processing pipelines can be developed and executed in a high-performance computing environment provided by the BIH HPC cluster, which is integrated with the VRE, while simulations and analyses based on machine learning or other computationally demanding methods can be run in customized virtual machines or containerized environments. The results of this processing and analysis can flow back into the Data Lake, so that full provenance and data lineage are maintained across the data lifecycle.
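The idea of results flowing back "with full provenance" can be illustrated by recording, for every derived dataset, which inputs and which processing step produced it, and then walking that record upstream. The sketch below is a hypothetical illustration; `LineageLog`, `LineageRecord` and the dataset names are assumptions and do not describe how the VRE actually stores lineage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    """One processing step: which inputs produced which output, and how."""
    output_id: str
    input_ids: tuple
    activity: str


class LineageLog:
    """Hypothetical lineage store; names are illustrative, not the VRE's API."""

    def __init__(self) -> None:
        self._records: dict = {}

    def record(self, output_id: str, input_ids: list, activity: str) -> None:
        self._records[output_id] = LineageRecord(output_id, tuple(input_ids), activity)

    def ancestors(self, dataset_id: str) -> set:
        """Walk upstream (depth-first) to every dataset the given one derives from."""
        seen: set = set()
        stack = [dataset_id]
        while stack:
            rec = self._records.get(stack.pop())
            if rec is None:
                continue  # raw upload: no recorded inputs
            for parent in rec.input_ids:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


if __name__ == "__main__":
    log = LineageLog()
    log.record("mri-segmented", ["mri-raw"], "segmentation pipeline on HPC")
    log.record("tumour-volumes", ["mri-segmented", "clinical-table"],
               "ML model in container")
    print(sorted(log.ancestors("tumour-volumes")))
    # ['clinical-table', 'mri-raw', 'mri-segmented']
```

The `seen` set keeps the walk from revisiting shared inputs, so a dataset that feeds several branches is listed once.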
A number of essential Platform Services are also deployed to support the Portal, Green Room and VRE Core and to ensure that the platform operates securely and efficiently. These include an API Gateway, Authentication and Authorization services, Pipeline Orchestration and Messaging. Overall, the VRE has been designed and implemented using a microservices architecture, which makes the platform reliable, resilient and scalable, and highly adaptable to evolving research requirements and to changes in the underlying IT infrastructure.
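As a rough illustration of how an API Gateway combines authentication, authorization and routing in front of independent microservices, here is a minimal in-process sketch. It assumes bearer-style tokens mapped to the services they may call. `ApiGateway`, the token strings and the `catalog`/`workbench` service names are hypothetical and not part of the VRE's real interface.

```python
from typing import Callable, Dict, Set

Handler = Callable[[dict], dict]


class ApiGateway:
    """Single entry point that authenticates, authorizes and routes requests.

    Hypothetical sketch: a real gateway would verify signed tokens issued by an
    identity provider instead of looking them up in a dict.
    """

    def __init__(self, token_scopes: Dict[str, Set[str]]) -> None:
        self._token_scopes = token_scopes  # token -> services it may call
        self._routes: Dict[str, Handler] = {}

    def register(self, service: str, handler: Handler) -> None:
        self._routes[service] = handler

    def handle(self, path: str, token: str, payload: dict) -> dict:
        scopes = self._token_scopes.get(token)
        if scopes is None:
            return {"status": 401}  # authentication failed
        service = path.strip("/").split("/")[0]
        if service not in self._routes:
            return {"status": 404}  # no such microservice
        if service not in scopes:
            return {"status": 403}  # authenticated but not authorized
        return {"status": 200, "body": self._routes[service](payload)}


if __name__ == "__main__":
    gw = ApiGateway({"token-alice": {"catalog"}})
    gw.register("catalog", lambda p: {"datasets": ["ds-001"]})
    gw.register("workbench", lambda p: {"job": "started"})
    print(gw.handle("/catalog/search", "token-alice", {}))
    print(gw.handle("/workbench/jobs", "token-alice", {}))
    print(gw.handle("/catalog/search", "bad-token", {}))
```

Keeping these checks in one place means each microservice behind the gateway can be replaced or scaled independently without reimplementing access control.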
The VRE will support a variety of research data needs, ultimately leading to an ecosystem of researchers, data scientists, and developers working together to make medical research data secure, findable and usable.