COVID-19 Portal and Data Flow Update

Wednesday 9 December 2020


Guy Cochrane
Team Leader
European Molecular Biology Laboratory

The COVID-19 Data Portal and the SARS-CoV-2 Data Hubs offer data and analysis for the scientific community.

Since the first novel coronavirus (2019-nCoV) sequence was submitted in early January 2020, the International Nucleotide Sequence Database Collaboration (INSDC) has received more than 160,000 raw SARS-CoV-2 read datasets from sequenced samples and more than 40,000 assembled SARS-CoV-2 sequences. One of the main access points for these data, along with other data types held at the European Bioinformatics Institute (EMBL-EBI), is the COVID-19 Data Portal, a component of the European COVID-19 Data Platform. Most of the raw data submissions have been supported by data mobilisation efforts undertaken as part of the European COVID-19 Data Platform to make SARS-CoV-2 data accessible to the scientific community.

In addition to archiving and mobilising data, the European COVID-19 Data Platform includes the SARS-CoV-2 Data Hubs, a further component that helps the scientific community analyse their data systematically. The SARS-CoV-2 Data Hubs extend the COMPARE Data Hubs with a new SARS-CoV-2 drag and drop uploader tool and additional public analytical workflows integrated within the Data Hubs analysis workflow management system.

The SARS-CoV-2 drag and drop uploader tool provides a simple way to upload raw read and unannotated assembled SARS-CoV-2 sequence data to the European Nucleotide Archive (ENA). Submitters drag and drop their data files along with a metadata spreadsheet; ENA’s support bioinformaticians then receive the files and finalise processing and accessioning, reducing the burden on the submitter. The tool has received 7 submissions so far, with several more ongoing. In parallel, work is underway to automate the back-end processing and accessioning steps.

Four additional workflows have been fully integrated into the SARS-CoV-2 Data Hubs to systematically process SARS-CoV-2 raw sequencing read data. These include Jovian (in both metagenomic and reference alignment modes), developed by the National Institute for Public Health and the Environment (RIVM), Netherlands; the Nanopore Analysis Workflow (NAW), developed by Erasmus Medical Centre (EMC), Netherlands; and the COVID-19 Sequence Analysis Workflow, developed by Eötvös Loránd University (ELTE), Hungary. Currently, public raw SARS-CoV-2 reads are being processed through Jovian reference alignment and NAW to generate assembled sequences, annotated with quality control metrics, that will be publicly accessible in the COVID-19 Data Portal. Integration of the Evergreen trees is also planned, to support the production of phylogenetic trees within the Data Hubs system.

This work is supported by Work Package 15 (WP15), COVID-19 Response, of the VEO project. The objective of WP15 is to provide a suite of analytical tools, storage and a data sharing workspace to facilitate the sharing, analysis and reuse of raw and annotated SARS-CoV-2 genomic data.

