Bioinformatics: variant ranking for NGS data
After raw sequence data analysis, which involves base calling, demultiplexing, alignment to the hg19 human reference genome (Genome Reference Consortium GRCh37), and variant calling, variants are annotated using ANNOVAR [1] and in-house ad hoc bioinformatics tools. Alignments are visually verified with the Integrative Genomics Viewer (IGV) v.2.3 [2] and Alamut v.2.4.5 (Interactive Biosoftware, Rouen, France).
Variant analysis is performed in an unbiased manner using a previously described cascade of filtering steps. All candidate variants are required to be present on both sequenced DNA strands, to account for ≥20% of the total reads at that site, and to have a minimum depth of coverage of 20X. Common polymorphisms (≥5% in the general population) are discarded by comparison with the 1000 Genomes Project (1000G; May 2015, www.1000genomes.org), the Exome Variant Server (EVS; May 2015, evs.gs.washington.edu), the Exome Aggregation Consortium database (ExAC; May 2015, exac.broadinstitute.org), and an in-house exome variant database; this step removes both common benign variants and recurrent artifactual variant calls. However, because these databases also contain known disease-associated mutations, all detected variants are first compared with our internal mutation database (CentoMD®), the Human Gene Mutation Database (HGMD®), and ClinVar [4] to directly identify and annotate variants previously described in the literature as definitely or likely pathogenic, uncertain, or …
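The filtering cascade above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual pipeline: the `Variant` record, its field names, and the key format are hypothetical; `population_af` is assumed to hold the highest allele frequency observed across 1000G, EVS, ExAC, and the in-house database; and variants found in the known-disease set are assumed to be exempt from the population-frequency filter but not from the read-level quality filters.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Thresholds stated in the text.
MIN_DEPTH = 20           # minimum depth of coverage (20X)
MIN_ALT_FRACTION = 0.20  # variant reads must be >= 20% of total reads at the site
COMMON_AF = 0.05         # >= 5% in the general population counts as common


@dataclass
class Variant:
    key: str                        # hypothetical key, e.g. "chr1:100:A>G"
    depth: int                      # total reads covering the site
    alt_reads_fwd: int              # variant-supporting reads, forward strand
    alt_reads_rev: int              # variant-supporting reads, reverse strand
    population_af: Optional[float]  # assumed: max AF over 1000G/EVS/ExAC/in-house; None if unseen


def passes_quality(v: Variant) -> bool:
    """Read-level filters: both strands, >= 20X depth, >= 20% variant reads."""
    alt = v.alt_reads_fwd + v.alt_reads_rev
    return (
        v.alt_reads_fwd > 0
        and v.alt_reads_rev > 0
        and v.depth >= MIN_DEPTH            # checked first, so depth > 0 in the division
        and alt / v.depth >= MIN_ALT_FRACTION
    )


def filter_variants(
    variants: Iterable[Variant], known_disease_keys: Set[str]
) -> List[Tuple[Variant, bool]]:
    """Keep variants that pass quality and are not common, unless already known.

    Returns (variant, is_known) pairs. Known disease variants are looked up
    before the frequency filter so that it cannot discard them (an assumption).
    """
    kept = []
    for v in variants:
        if not passes_quality(v):
            continue
        known = v.key in known_disease_keys
        is_common = v.population_af is not None and v.population_af >= COMMON_AF
        if is_common and not known:
            continue
        kept.append((v, known))
    return kept


variants = [
    Variant("chr1:100:A>G", 50, 12, 10, None),   # passes everything
    Variant("chr2:200:C>T", 50, 20, 0, None),    # seen on one strand only
    Variant("chr3:300:G>A", 15, 4, 4, None),     # depth below 20X
    Variant("chr4:400:T>C", 40, 10, 10, 0.12),   # common and not known: dropped
    Variant("chr5:500:A>T", 40, 10, 10, 0.08),   # common but known: kept
]
kept = filter_variants(variants, known_disease_keys={"chr5:500:A>T"})
print([(v.key, known) for v, known in kept])
# [('chr1:100:A>G', False), ('chr5:500:A>T', True)]
```

Running the known-disease lookup before the frequency cut mirrors the rationale given in the text: population databases can contain disease-associated alleles, so a plain frequency threshold alone could silently remove them.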
Evaluation of the pathogenicity of variants and clinical reporting
After comparison with known pathogenic variants in databases (CentoMD®, HGMD®, and ClinVar), rare variants (MAF <2%) are evaluated using the following criteria. Mutations predicted to result in a prematurely truncated protein (nonsense, frameshift, and initiation codon variants; single- or multi-exon deletions; other larger genomic rearrangements; and canonical splice site mutations at intronic positions ±1 and ±2) are given the highest priority. Missense variants are considered a priori unclassified sequence variants (UCVs), and their potential pathogenicity is evaluated by taking into consideration the biophysical and biochemical differences between the wild-type and mutant amino acids, the evolutionary conservation of the nucleotide and amino acid residue in orthologs [5], a number of in silico predictors (SIFT, PolyPhen-2, and MutationTaster, among others), and population frequency data. Putative splicing variants are analyzed using Alamut v.2.4.5 (Interactive Biosoftware, Rouen, France), a software package that uses several splice site prediction programs to compare the normal and variant sequences for differences in potential regulatory signals.
Variants are then evaluated based on the suspected mode of inheritance of the disease and their compatibility with the clinical phenotype of the index case. This evaluation step involves consulting several databases and other sources of information, such as Online Mendelian Inheritance in Man (OMIM®; May 2015, omim.org), HGMD®, and CentoMD®, as well as scientific literature searches in PubMed (http://www.ncbi.nlm.nih.gov/). The final shortlist of variants is re-evaluated by a human geneticist and Centogene's medical director. Selected candidate pathogenic and likely pathogenic variants are confirmed by conventional PCR amplification and Sanger sequencing, and segregation of these variants with the disease is assessed in all available family members. For reporting purposes, variants are classified as fully or partially explaining the clinical phenotype of the index case.
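The prioritization criteria above can be sketched as a simple tiering function. This is an illustrative sketch only: the `RareCandidate` record, the consequence labels in `TRUNCATING_TYPES`, and the `splice_offset` field (the intronic offset as written in HGVS c. notation, e.g. c.123+2 gives 2) are hypothetical; only rare variants (MAF <2%) are ranked; and ordering by MAF within a tier is an assumption, not a stated rule. The follow-up scoring of missense UCVs with conservation data and in silico predictors is not modeled.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple

RARE_MAF = 0.02  # "rare" = MAF < 2%, as stated in the text

# Hypothetical consequence labels for variants predicted to truncate the protein.
TRUNCATING_TYPES = {
    "nonsense",
    "frameshift",
    "start_lost",           # initiation codon variant
    "exon_deletion",        # single- or multi-exon deletion
    "large_rearrangement",
}


class Tier(IntEnum):
    TRUNCATING = 1  # highest priority
    UCV = 2         # missense: unclassified sequence variant, needs further review
    OTHER = 3       # any other rare variant


@dataclass
class RareCandidate:
    key: str
    consequence: str                     # hypothetical label, e.g. "missense"
    maf: float                           # population minor allele frequency
    splice_offset: Optional[int] = None  # intronic offset: c.123+2 -> 2, c.124-1 -> -1


def is_canonical_splice(offset: Optional[int]) -> bool:
    """True for the canonical splice site positions +/-1 and +/-2."""
    return offset is not None and 1 <= abs(offset) <= 2


def tier(c: RareCandidate) -> Optional[Tier]:
    if c.maf >= RARE_MAF:
        return None  # not rare: outside this evaluation step
    if c.consequence in TRUNCATING_TYPES or is_canonical_splice(c.splice_offset):
        return Tier.TRUNCATING
    if c.consequence == "missense":
        return Tier.UCV
    return Tier.OTHER


def rank(candidates: List[RareCandidate]) -> List[Tuple[str, str]]:
    tiered = [(tier(c), c) for c in candidates]
    kept = [(t, c) for t, c in tiered if t is not None]
    # Lower tier first; within a tier, rarer variants first (an assumption).
    kept.sort(key=lambda tc: (tc[0], tc[1].maf))
    return [(c.key, t.name) for t, c in kept]


candidates = [
    RareCandidate("chr7:10:C>T", "missense", 0.001),
    RareCandidate("chr7:20:G>A", "synonymous", 0.0005),
    RareCandidate("chr7:30:A>G", "intronic", 0.0, splice_offset=2),
    RareCandidate("chr7:40:delA", "frameshift", 0.0003),
    RareCandidate("chr7:50:T>C", "missense", 0.03),  # MAF >= 2%: not ranked
]
print(rank(candidates))
# [('chr7:30:A>G', 'TRUNCATING'), ('chr7:40:delA', 'TRUNCATING'), ('chr7:10:C>T', 'UCV'), ('chr7:20:G>A', 'OTHER')]
```

The tier only fixes the order in which candidates are reviewed; the inheritance-mode and phenotype compatibility checks, expert re-evaluation, and Sanger confirmation described above remain separate steps.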