Genomic Diversity of SARS-CoV-2 in India
Vinod PK's Lab, Center for Computational Natural Sciences & Bioinformatics (CCNSB),
IIIT Hyderabad

13th August '20

Latest update using 1813 samples; clusters observed in Karnataka

6th July '20

We analyzed the spike protein sequences of Indian samples and observed 4 distinct clusters determined by the D614G mutation.

6th June '20

We observed that C2 (clade I/A3i) is not a dominant cluster based on the clustering pattern. C1 shows more diversity than C2. We also observed that different clusters are emerging in Gujarat and Maharashtra from C2

31st May '20

Nextstrain phylogenetic tree analysis by CSIR labs reported a new cluster, designated clade I/A3i.

26th May '20

We analyzed the genomic sequences of SARS-CoV-2 in India using an alignment-free method based on Chaos Game Representation. We found two predominant clusters (C1 and C2) in India, with one cluster (C2) containing samples mostly from Telangana.
Progression of COVID-19 in India from May to August


Diversity in the Genomic Sequences

Data Distribution
Progression of COVID-19 in India from May to July


Clustering Observed in Indian Spike Protein Sequences
The D614G mutation has been found to affect the infectivity of SARS-CoV-2.

Diversity in Spike Protein Sequences
Multiple sequence alignment was performed over 872 Indian spike protein samples. The entropy (Hx) plot for all samples indicates that the clustering pattern is influenced by position D614, which shows the maximum variance.
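As a rough illustration of how a per-position entropy profile can be computed from aligned sequences, the sketch below calculates Shannon entropy for each alignment column. It is not the pipeline used for the plot above: the function names are hypothetical, the logarithm base (bits) is an assumption, and gap characters are counted as ordinary symbols for simplicity.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(aligned):
    """Per-column entropy for a list of equal-length aligned sequences.

    Assumption: gaps ('-') are treated like any other symbol.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*aligned) iterates over alignment columns
    return [column_entropy(col) for col in zip(*aligned)]


# Toy alignment: the first column is conserved, the second is split D/G.
toy = ["AD", "AG", "AD", "AG"]
profile = alignment_entropy(toy)
most_variable = max(range(len(profile)), key=profile.__getitem__)
print(profile, most_variable)
```

In a profile like this, a fully conserved column scores 0 and a column split evenly between two residues scores 1 bit, so the most variable position stands out as the peak.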

Diversity in the Genomic Sequences

Data Distributions

Chaos Game Representation
Chaos Game Representation is a 2D graphical representation of long 1D sequences based on oligonucleotide frequencies
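The idea can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis on this page: the corner assignment (A, C, G, T) is one common convention and is an assumption here, the function names are hypothetical, and ambiguous bases such as N are simply skipped, which is a simplification.

```python
# Corner assignment is an assumption; other layouts are also in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """CGR trajectory: each point lies halfway between the previous
    point and the corner of the current base, starting at the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # simplification: skip ambiguous bases
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k=3):
    """Bin CGR points into a 2^k x 2^k grid of counts.

    From the k-th point onwards, the grid cell is fixed by the last k
    bases, so each cell count is an oligonucleotide (k-mer) frequency.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)  # clamp against float rounding
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


matrix = fcgr("ACGTACGTTGCA", k=2)
total_kmers = sum(sum(row) for row in matrix)
print(total_kmers)
```

Flattening the grid for each genome gives a fixed-length feature vector, which is the kind of input a dimensionality-reduction method such as t-SNE can take without any sequence alignment.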
Indian SARS-CoV2 Sequences Visualized through tSNE - Statewise Diversity
[Based on Data Accessed on 3rd June 2020]
[Figure: t-SNE plot of 255 Indian samples]

SARS-CoV2 Sequences Visualized through tSNE - Countrywise Diversity
[Based on Data Accessed on 3rd June 2020]
Indian Samples

Sample Distributions for Indian SARS-CoV-2 sequences

Phylogenetic tree of Indian Samples
Generated using NextStrain: Hadfield et al., Nextstrain: real-time tracking of pathogen evolution, Bioinformatics (2018)
[Figure: phylogenetic tree]

CSIR lab describes a distinct phylogenetic cluster
The clade I/A3i accounts for '41% of all genomes sequenced and deposited in the public domain from multiple states in India'
[Banu, Sofia, et al. "A distinct phylogenetic cluster of Indian SARS-CoV-2 isolates." bioRxiv (2020)]
Interesting clusters observed in the Indian SARS-CoV2 Sequences
The Indian samples, analyzed using an alignment-free approach based on Chaos Game Representation, show patterns in the t-SNE graph that conform to regional variations. One cluster from the state of Telangana is found to be diverging from the rest of the country.
[Figure: t-SNE plot of 255 Indian samples]

The presence of clades observed in Indian samples by CSIR Lab
[updated: 14 May '20] [Image Credits: Vinod Scaria Lab at CSIR Institute of Genomics & Integrative Biology]

Analysis performed by Ms. Ruchi Chauhan
Based on the data from Global Initiative on Sharing All Influenza Data (GISAID)
Details about Materials & Methods can be found here
Created by Ruchi Chauhan