Whole genome and whole exome sequence analysis:

Quality-checked sequencing data will be aligned to the human reference genome (hg38) using PEMapper, and variant calls will be generated using PECaller. Variants will then be annotated, and a VCF file containing the cleaned variants will be generated.

Somatic mutations from cancer (tumor/normal) data:

Raw sequence reads (WES/WGS) from the tumor and matched-normal samples will be aligned to the human reference genome (e.g., hg38) using the Burrows–Wheeler Aligner (BWA). After alignment and deduplication of reads with Picard, the binary version of the Sequence Alignment Map (SAM) file, i.e. the BAM file, will be sorted and indexed. Genome Analysis Toolkit (GATK) v4.0 will then be used to perform indel realignment and base quality score recalibration. Somatic variant calling will be performed on matched tumor-normal pairs using MuTect2.
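The steps above can be sketched as a short Python helper that only builds the command lines, without running anything. This is an illustrative sketch, not the production pipeline: the file naming scheme, the read-group string, and the `known_sites` resource are assumptions, and the exact flags should be checked against the installed BWA, samtools, and GATK versions. The sketch sorts before marking duplicates, and it leaves out a separate indel realignment command because, as far as we know, GATK4 does not ship a standalone tool for that step.

```python
def preprocess_commands(sample, ref, r1, r2, known_sites):
    """Return shell command strings for one sample (nothing is executed).

    All output file names are hypothetical and derived from `sample`.
    """
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Align with BWA-MEM and coordinate-sort the alignments in one pipe.
        f"bwa mem -R '{read_group}' {ref} {r1} {r2} "
        f"| samtools sort -o {sample}.sorted.bam -",
        # Mark duplicate reads with Picard (bundled with GATK4).
        f"gatk MarkDuplicates -I {sample}.sorted.bam "
        f"-O {sample}.dedup.bam -M {sample}.dup_metrics.txt",
        # Index the deduplicated BAM file.
        f"samtools index {sample}.dedup.bam",
        # Base quality score recalibration: build the model, then apply it.
        f"gatk BaseRecalibrator -R {ref} -I {sample}.dedup.bam "
        f"--known-sites {known_sites} -O {sample}.recal.table",
        f"gatk ApplyBQSR -R {ref} -I {sample}.dedup.bam "
        f"--bqsr-recal-file {sample}.recal.table -O {sample}.bqsr.bam",
    ]


def mutect2_command(tumor, normal, ref):
    """Return the somatic calling command for one matched tumor-normal pair."""
    return (
        f"gatk Mutect2 -R {ref} -I {tumor}.bqsr.bam -I {normal}.bqsr.bam "
        f"-normal {normal} -O {tumor}_vs_{normal}.somatic.vcf.gz"
    )


if __name__ == "__main__":
    for name in ("tumor", "normal"):
        for cmd in preprocess_commands(
            name, "hg38.fa", f"{name}_R1.fq.gz", f"{name}_R2.fq.gz", "dbsnp.vcf.gz"
        ):
            print(cmd)
    print(mutect2_command("tumor", "normal", "hg38.fa"))
```

Here `-normal` takes the sample name stored in the read group of the normal BAM file, which is why the read group `SM` field is set from `sample` during alignment.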

Cytogenomic/SNP Genotyping data analysis:

Microarrays are widely used for gene expression profiling, genotyping, mutation detection, and gene discovery across the genome. This document provides a workflow for the analysis of Infinium CytoSNP-850K v1.2 array data to identify genetic and structural variations.

Prokaryotic Whole Genome Sequencing – Assembly and Functional Annotation of Illumina Reads:

Raw Illumina reads from a whole genome sequencing project will be run through an analysis pipeline that includes quality control, read quality and adapter trimming, reference-based or de novo assembly, gene prediction, and functional annotation of the genome. We can assemble a genome from pooled Illumina read libraries or from single-cell reads.
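To illustrate what read quality trimming does, the sketch below removes low-quality bases from the 3' end of a single read. It is a deliberately simple stand-in, not the trimming tool used in the pipeline: the Phred+33 encoding and the quality cutoff of 20 are assumptions, and real trimmers typically use sliding windows and also remove adapter sequence.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim bases from the 3' end while their Phred quality is below min_q.

    seq  : read bases, e.g. "ACGT"
    qual : FASTQ quality string of the same length (Phred+offset encoded)
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    end = len(seq)
    # Walk backwards until a base with acceptable quality is found.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))
```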



ATAC-seq (bulk) data analysis:

ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) is a next-generation sequencing approach for analyzing open chromatin regions to assess genome-wide chromatin accessibility. ATAC-seq achieves this by simultaneously fragmenting genomic DNA and tagging it with sequencing adapters using the hyperactive Tn5 transposase. This document provides a workflow for the analysis of ATAC-seq data to identify differential chromatin accessibility.



RNA-Seq (Cancer) data analysis:

Quality-filtered sequencing data will be aligned to the reference genome (e.g., hg38) using STAR (Spliced Transcripts Alignment to a Reference). Gene quantification will be done using HTSeq-count. Fusion transcripts are characteristic of tumors. STAR-Fusion uses the chimeric reads collected during STAR alignment to predict fusion transcripts. To reduce the number of false-positive fusion genes, fusion events supported by fewer than 0.1 fusion fragments per million total reads (FFPM) and putative fusions between homologous genes will be discarded.
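The two fusion filters can be sketched as follows. The record layout (`left_gene`, `right_gene`, `junction_reads`, `spanning_frags`) and the `homolog_pairs` lookup are hypothetical and chosen for illustration only. Here FFPM is computed as junction reads plus spanning fragments, scaled to one million total reads, which is our understanding of how STAR-Fusion reports it.

```python
def ffpm(junction_reads, spanning_frags, total_reads):
    """Fusion fragments per million total reads."""
    return (junction_reads + spanning_frags) / total_reads * 1_000_000


def filter_fusions(fusions, total_reads, homolog_pairs, min_ffpm=0.1):
    """Drop weakly supported fusions and fusions between homologous genes.

    fusions       : list of dicts with hypothetical keys left_gene,
                    right_gene, junction_reads, spanning_frags
    homolog_pairs : set of frozensets, each holding two homologous gene names
    """
    kept = []
    for fusion in fusions:
        pair = frozenset((fusion["left_gene"], fusion["right_gene"]))
        if pair in homolog_pairs:
            continue  # putative fusion between homologous genes
        support = ffpm(
            fusion["junction_reads"], fusion["spanning_frags"], total_reads
        )
        if support < min_ffpm:
            continue  # below the FFPM threshold
        kept.append(fusion)
    return kept


if __name__ == "__main__":
    calls = [
        {"left_gene": "BCR", "right_gene": "ABL1",
         "junction_reads": 30, "spanning_frags": 20},
        {"left_gene": "GENEA", "right_gene": "GENEB",
         "junction_reads": 2, "spanning_frags": 1},
        {"left_gene": "HOMA", "right_gene": "HOMB",
         "junction_reads": 40, "spanning_frags": 10},
    ]
    homologs = {frozenset(("HOMA", "HOMB"))}
    for f in filter_fusions(calls, 50_000_000, homologs):
        print(f["left_gene"], f["right_gene"])
```

In this toy input only the first call passes: the second has an FFPM of 0.06, and the third joins two genes listed as homologous.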

RNA-Seq (Bulk) data analysis:

Quality-checked sequencing data will be aligned to the human reference genome (hg38) using STAR (Spliced Transcripts Alignment to a Reference). Gene quantification will be done using HTSeq-count. To characterize expressed genes, a pre-ranked, permutation-based gene set enrichment analysis (GSEA) will be performed.
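The core of a pre-ranked GSEA is a weighted running-sum statistic over the ranked gene list. The sketch below computes that enrichment score for one gene set, assuming the usual weighting exponent of 1. It is a minimal illustration rather than the GSEA software itself, and it omits the permutation step that turns the score into a significance estimate.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score for a pre-ranked gene list.

    ranked   : list of (gene, score) tuples sorted by score, descending
    gene_set : set of gene names to test
    p        : weighting exponent applied to |score| for genes in the set
    """
    weights = [abs(s) ** p if g in gene_set else 0.0 for g, s in ranked]
    n_hit = sum(1 for g, _ in ranked if g in gene_set)
    n_miss = len(ranked) - n_hit
    total_hit = sum(weights)
    if n_hit == 0 or n_miss == 0 or total_hit == 0:
        raise ValueError("gene set must overlap the ranked list only partly")

    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, weights):
        # Step up at genes in the set, step down at all other genes.
        running += w / total_hit if gene in gene_set else -1.0 / n_miss
        # Keep the maximum deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0),
              ("D", -1.0), ("E", -2.0), ("F", -3.0)]
    print(enrichment_score(ranked, {"A", "B"}))
    print(enrichment_score(ranked, {"E", "F"}))
```

A set concentrated at the top of the ranking yields a positive score and a set concentrated at the bottom yields a negative one; significance would then be assessed by recomputing the score on permuted gene sets.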

RNA-Seq (Single Cell) data analysis:

Using the CellRanger Single Cell Software Suite 3.0.2, we demultiplex, align, and quantify the sequenced data. Raw base call data from the sequencer will be demultiplexed into sample-specific FASTQ files, and the quality-checked sequencing data will then be processed using CellRanger. The R software package Seurat will be used for further analysis.



Proteomics data analysis - Label Free Quantification:

Database searches will be performed using the Andromeda search engine, with the UniProt-SwissProt human canonical database as the reference and a database of common laboratory contaminants. Protein group LFQ (label-free quantification) intensities will be log2-transformed to reduce the effect of outliers. Because some LFQ values will be missing, missing values will be imputed before the models are fitted. Two-tailed Student's t-tests will be used for statistical testing.
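The transformation, imputation, and test statistic for a single protein group can be sketched with the Python standard library. The imputation rule used here (replace a missing value with a supplied low value, such as the smallest observed log2 intensity) is an assumption made for illustration, since the text above does not name the imputation method. The function returns the pooled-variance Student's t statistic and its degrees of freedom rather than a p-value.

```python
import math
from statistics import mean, variance


def log2_intensities(values):
    """Log2-transform LFQ intensities; None or non-positive values stay missing."""
    return [math.log2(v) if v is not None and v > 0 else None for v in values]


def impute(values, fill):
    """Replace missing (None) entries with `fill` (assumed imputation rule)."""
    return [fill if v is None else v for v in values]


def student_t(a, b):
    """Student's t statistic with pooled variance, and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    group_a = log2_intensities([1024, 2048, 4096])
    group_b = log2_intensities([256, None, 1024])
    # Use the smallest observed log2 intensity as the fill value.
    observed = [v for v in group_a + group_b if v is not None]
    group_b = impute(group_b, min(observed))
    print(student_t(group_a, group_b))
```

A two-tailed p-value then follows from the t distribution with the returned degrees of freedom; with SciPy available, `scipy.stats.ttest_ind` performs the same equal-variance test directly.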



Amplicon (16S rRNA) data analysis:

Demultiplexed raw sequences will be processed using the open-source software package Quantitative Insights Into Microbial Ecology (QIIME 2), version 2018.8. Denoising and dereplication of paired-end sequences will be performed using the Divisive Amplicon Denoising Algorithm 2 (DADA2), an amplicon-specific error-correction method that models and corrects errors in Illumina-sequenced amplicons.
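Dereplication, one of the two steps named above, simply collapses identical reads into unique sequences with abundance counts. The sketch below shows that idea only; it does not reproduce DADA2, whose denoising step additionally fits an error model to decide which unique sequences are real variants.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count), most abundant first."""
    return Counter(reads).most_common()


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
    for sequence, count in dereplicate(reads):
        print(sequence, count)
```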

Shotgun metagenomic data analysis:

To perform taxonomic profiling (at the phylum, genus, or species level) of shotgun metagenomic sequencing reads, the MetaPhlAn2 pipeline will be run in a high-performance cluster computing environment or on a custom Amazon AMI. HUMAnN2 (HMP Unified Metabolic Analysis Network) will use the MetaCyc database and the UniRef gene family catalog to characterize the microbial pathways present in the samples.
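Selecting one taxonomic level from a profile can be sketched as below. The sketch assumes a tab-separated profile in which the first column is a pipe-delimited lineage with rank prefixes such as `p__`, `g__`, and `s__`, and the second column is the relative abundance; the example rows are made up, and the column layout should be checked against the actual MetaPhlAn2 output.

```python
RANK_PREFIX = {"phylum": "p__", "genus": "g__", "species": "s__"}


def clades_at_rank(lines, rank):
    """Return {clade name: relative abundance} for one taxonomic rank.

    lines : iterable of tab-separated profile lines (lineage, abundance)
    rank  : "phylum", "genus", or "species"
    """
    prefix = RANK_PREFIX[rank]
    result = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        lineage, abundance = fields[0], fields[1]
        # The last element of the lineage is the most specific clade.
        leaf = lineage.split("|")[-1]
        if leaf.startswith(prefix):
            result[leaf[len(prefix):]] = float(abundance)
    return result


if __name__ == "__main__":
    profile = [
        "#SampleID\texample",
        "k__Bacteria\t100.0",
        "k__Bacteria|p__Firmicutes\t60.0",
        "k__Bacteria|p__Bacteroidetes\t40.0",
        "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales"
        "|f__Lactobacillaceae|g__Lactobacillus\t60.0",
    ]
    print(clades_at_rank(profile, "phylum"))
    print(clades_at_rank(profile, "genus"))
```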