Automated variant prioritization system, a breakthrough in the field of rare disease diagnosis

Of the more than 7,000 Mendelian diseases identified worldwide so far, more than 5,000 are known to be associated with specific genetic defects. Disease-causing variants continue to be identified through whole exome sequencing (WES).

WES identifies approximately 20,000 variants in a person’s DNA, and applying WES to clinical diagnosis means assessing how likely each of those variants is to cause a disease. Previously, this whole process was performed manually, which meant longer analysis times and a higher likelihood of bias. To address this, 3billion developed EVIDENCE, an automated variant prioritization system for use in clinical diagnosis.

This paper discusses the analysis results and insights obtained from 330 probands using the EVIDENCE system.

Study method and background

The study was conducted at the Asan Medical Center, Seoul, South Korea, from April 2018 to August 2019 on 330 non-consanguineous patients suspected of having genetic diseases. For patients aged 5 months or older, only those who were strongly suspected of having rare diseases but failed to receive a positive result from other diagnostic tests (chromosome analyses, chromosome microarray, or single or targeted gene panel testing) were included. For patients younger than 5 months, those with problems in major organs and strongly suspected of having rare diseases were included as target subjects.

The Illumina NovaSeq 6000 was used as the sequencing platform for WES, and the data analysis was conducted using 3billion’s variant prioritization and interpretation tool, EVIDENCE.

The EVIDENCE system evaluates each variant using information drawn from past data and research results: the variant’s risk of causing disease and its symptom similarity score. The variants selected by the EVIDENCE system are then manually reviewed and ultimately confirmed by medical geneticists. Sanger sequencing is also performed for confirmation when necessary.
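
The idea of combining two kinds of evidence into one ranking can be illustrated with a minimal sketch. This is not the EVIDENCE algorithm, whose scoring is not described here. The `Variant` fields, the 0 to 10 similarity scale, the equal weighting, and the example values are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str                  # hypothetical label, not a real variant
    pathogenicity: float       # assumed 0..1 estimate of disease risk
    symptom_similarity: float  # assumed 0..10 similarity to the patient's symptoms


def prioritize(variants, weight=0.5):
    """Rank variants by a weighted sum of the two (normalized) scores.

    The weighted sum is an illustrative choice, not the published method.
    """
    def combined(v):
        return weight * v.pathogenicity + (1 - weight) * (v.symptom_similarity / 10)

    return sorted(variants, key=combined, reverse=True)


# Made-up candidates for demonstration only.
candidates = [
    Variant("variant_A", 0.9, 2.0),
    Variant("variant_B", 0.7, 8.0),
    Variant("variant_C", 0.1, 9.0),
]
ranked_names = [v.name for v in prioritize(candidates)]
print(ranked_names)
```

In this toy example, a variant with moderate pathogenicity evidence but a strong symptom match can outrank one with strong pathogenicity evidence and a weak symptom match. In practice, the top-ranked candidates would still go to medical geneticists for manual review, as described above.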

Summary of study results

The analysis results obtained from the study are as follows.
In a total of 330 patients, 223 variants suspected of causing the patients’ symptoms were identified. Following the American College of Medical Genetics and Genomics (ACMG) guidelines, 15 variants (9%) were classified as pathogenic, 88 variants (52.7%) as likely pathogenic, and 65 variants (38.9%) as variants of uncertain significance (VUS).


Distribution of the probable pathogenicity of variants identified by EVIDENCE before family member testing, after addition of the phenotype-specific rule (PP4), and after family member testing

One of the most significant takeaways from the analysis was the correlation between the symptom similarity score used for prioritization in EVIDENCE and the variant confirmation rate.

The variant confirmation rate was significantly higher when the symptom similarity score was greater than 5 than when it was less than 5. This indicates that, in addition to the genetic evidence, symptom similarity is a highly influential factor in determining whether a specific variant is the cause of a disease.
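
The comparison described above amounts to splitting candidate variants at a score of 5 and computing the confirmation rate in each group. The following is a minimal sketch of that calculation. The function name and the records are hypothetical toy values, not data from the study.

```python
def confirmation_rates(records, threshold=5.0):
    """Return (rate above threshold, rate below threshold).

    Each record is a (symptom_similarity_score, was_confirmed) pair.
    Scores exactly equal to the threshold fall in neither group,
    mirroring the "greater than 5" / "less than 5" wording.
    """
    high = [confirmed for score, confirmed in records if score > threshold]
    low = [confirmed for score, confirmed in records if score < threshold]

    def rate(group):
        return sum(group) / len(group) if group else None

    return rate(high), rate(low)


# Toy records for illustration only; not taken from the paper.
toy_records = [
    (8.2, True), (6.1, True), (5.5, False),
    (3.0, False), (1.2, True), (4.4, False),
]
high_rate, low_rate = confirmation_rates(toy_records)
print(f"score > 5: {high_rate:.2f}, score < 5: {low_rate:.2f}")
```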


Distribution of symptom similarity scores of patient phenotypes and genetic phenotypes suggested by the automated system


In conclusion, the diagnostic yield obtained using EVIDENCE was 42.7%, higher than the 30 to 35% previously reported for automated systems. This result, of course, is due to multiple factors.
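
As a quick sanity check on what this yield means in absolute terms, the short calculation below converts the reported percentage into an approximate patient count. The count is derived here from the two figures given in this article (330 probands, 42.7%) and is not quoted from the paper.

```python
probands = 330
reported_yield = 0.427  # diagnostic yield stated in this article

# Approximate number of diagnosed probands implied by the stated yield.
implied_diagnosed = round(probands * reported_yield)

# Recompute the percentage from the rounded count to confirm it matches.
recomputed_pct = round(implied_diagnosed / probands * 100, 1)

print(implied_diagnosed, recomputed_pct)
```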

EVIDENCE has continued to improve since the study through the adoption of a re-analysis pipeline and functional enhancements. Interest in automated variant prioritization systems is growing, as shown by this paper’s selection as the most cited paper in the field of clinical genetics in Wiley Online Library between 2020 and 2021.

For more detailed information and in-depth discussion of the paper, please access the original publication through the link provided below.


  1. Seo, G, 2020. Diagnostic yield and clinical utility of whole exome sequencing using an automated variant prioritization system, EVIDENCE. Clinical Genetics, [online] 98(6), pp.562-570.