3billion

Differences Between Raw Data and Clinical Reports | NGS Data Analysis Workflow [Part 2]

Genetic test | 26. 05. 07

📍Key Takeaways

  1. Distinct Roles of Raw Data vs. Clinical Reports: While raw VCF data provides an exhaustive list of genomic “possibilities,” the clinical report delivers “validated results” specifically curated for diagnostic accuracy and medical decision-making.
  2. Rigorous Filtering for Clinical Utility: Variants are selectively reported based on a stringent evaluation of phenotype relevance, pathogenic evidence, and population/in-house data consistency, ensuring that clinicians can focus on findings with high diagnostic yield.
  3. Technical Quality and Noise Reduction: 3billion minimizes clinical confusion by filtering out technical artifacts and low-reliability variants through BAM-level QC, providing clinicians with a high-confidence report that mitigates the risk of false positives.



Why do raw data and clinical reports differ?

“Why wasn’t this variant reported?”

After WES/WGS testing, many users review their VCF files (annotated data) and, quite naturally, ask questions like:

  • “This variant is present—why isn’t it in the report?”
  • “Please report pathogenic variants even if they are carrier findings.”

To answer these questions, we need to clarify one key point first.


Raw data and clinical reports serve different purposes

NGS analysis produces two types of outputs:

1) Annotated data: a list of all detected variants

  • Typically tens of thousands (WES) to several million (WGS) variants
  • Each variant carries annotations (VCF-based)

2) Clinical report: clinically meaningful variants

  • Used for diagnosis and medical decision-making

In simple terms:

Annotated data represents possibilities,
while the clinical report represents selected, interpreted results.
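For readers who open the annotated data directly, the sketch below shows roughly what one line of a VCF file holds. It is a minimal Python parser for the eight fixed columns defined by the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and ignores sample columns. It is an illustration only; the `GENE` key used in testing is a hypothetical placeholder, since each annotation tool writes its own INFO keys.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alt: list[str]               # one entry per alternate allele
    qual: float | None           # None when QUAL is "."
    filter: str                  # e.g. "PASS" or a semicolon-separated list of filters
    info: dict[str, str | bool]  # annotations; bare flags map to True


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, filt, info = fields[:8]
    info_dict: dict[str, str | bool] = {}
    if info != ".":
        for entry in info.split(";"):
            # INFO entries are either KEY=VALUE pairs or bare flags
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=filt,
        info=info_dict,
    )
```

Everything a parser like this returns is still a "possibility": nothing in these fields says whether the variant is relevant to the patient or technically reliable.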


Why are certain variants not included in the report?

Variants identified in raw data are often excluded from the clinical report due to a combination of the following reasons:


1. Low relevance to the patient’s phenotype

  • The gene is not related to the patient’s symptoms
  • Disease association is unclear
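Phenotype relevance can be pictured as a ranking question: how much of the patient's presentation could a given gene explain? The sketch below scores candidate genes by the fraction of patient terms they share with a gene-phenotype map. The gene names and the map are hypothetical, and plain string matching is a simplification; phenotype-driven tools generally work with ontology terms (such as Human Phenotype Ontology IDs) and ontology-aware similarity.

```python
from __future__ import annotations

# Hypothetical gene-to-phenotype map; real tools draw on curated knowledge bases
GENE_PHENOTYPES: dict[str, set[str]] = {
    "GENE_A": {"seizure", "developmental delay", "hypotonia"},
    "GENE_B": {"hearing loss", "retinal dystrophy"},
    "GENE_C": {"cardiomyopathy", "arrhythmia"},
}


def phenotype_overlap(patient_terms: set[str], gene: str) -> float:
    """Fraction of the patient's terms that the gene is associated with (0.0 to 1.0)."""
    if not patient_terms:
        return 0.0
    gene_terms = GENE_PHENOTYPES.get(gene, set())
    return len(patient_terms & gene_terms) / len(patient_terms)


def rank_genes(patient_terms: set[str], genes: list[str]) -> list[tuple[str, float]]:
    """Order candidate genes by phenotype overlap, best match first (ties keep input order)."""
    scored = [(gene, phenotype_overlap(patient_terms, gene)) for gene in genes]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For a patient with seizures and developmental delay, GENE_A ranks first, while variants in GENE_B or GENE_C score zero on phenotype relevance even if they are rare.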

2. Insufficient evidence of pathogenicity

Rarity alone is not sufficient evidence of pathogenicity.

Common situations include:

  • No published clinical reports (limited case evidence)
  • Lack of supporting in silico prediction
  • No functional studies

3. Inconsistency with population data

This is one of the most fundamental criteria in variant interpretation.

For example:

  • A variant is repeatedly observed in population databases (e.g., gnomAD)
  • Its frequency there is too high to be consistent with the disease's prevalence or inheritance pattern

→ In such cases, the likelihood of the variant being disease-causing is low.
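The population-frequency check can be expressed as a back-of-the-envelope ceiling. Assuming a single variant explains at most a given share of cases, carriers of a dominant variant occur at about 2q (q being the allele frequency) and affected homozygotes for a recessive variant at about q² under Hardy-Weinberg equilibrium, which gives the simplified bounds below. The `allelic_fraction` and `penetrance` inputs are assumptions the analyst must supply; this is an illustration, not a validated clinical threshold.

```python
import math


def max_credible_af(prevalence: float, inheritance: str,
                    allelic_fraction: float = 1.0, penetrance: float = 1.0) -> float:
    """Simplified upper bound on the allele frequency of one causal variant.

    allelic_fraction: assumed largest share of cases this single variant could explain.
    penetrance: assumed probability that a carrier of the causal genotype is affected.
    """
    share = prevalence * allelic_fraction / penetrance
    if inheritance == "dominant":
        # Heterozygote frequency ~ 2q, so 2q * penetrance <= prevalence * allelic_fraction
        return share / 2
    if inheritance == "recessive":
        # Homozygote frequency ~ q^2, so q^2 * penetrance <= prevalence * allelic_fraction
        return math.sqrt(share)
    raise ValueError(f"unsupported inheritance mode: {inheritance}")


def too_common(observed_af: float, prevalence: float, inheritance: str,
               **model: float) -> bool:
    """True if the observed population allele frequency exceeds the model's ceiling."""
    return observed_af > max_credible_af(prevalence, inheritance, **model)
```

With a prevalence of 1 in 10,000, the dominant ceiling is 5 × 10⁻⁵, so a variant at 0.1% in a population database would be flagged; the recessive ceiling of 0.01 would not flag the same variant.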


4. Inconsistency with in-house data

This is a critical complementary source of evidence in clinical interpretation.

For example:

  • The same variant is observed in internal datasets
  • But the associated phenotypes do not match the patient’s presentation
  • Or the variant is repeatedly observed in asymptomatic individuals

→ In practice, this can serve as strong evidence against pathogenicity.
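In-house evidence can be summarized in a similar spirit. The sketch below assumes a hypothetical internal cohort table in which each record notes the variant, whether the individual was affected, and their recorded phenotypes. The variant key format and the threshold of two asymptomatic observations are illustrative assumptions, not a description of any laboratory's internal criteria.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InHouseObservation:
    variant_id: str               # hypothetical key format, e.g. "chr1-12345-A-G"
    affected: bool                # False for individuals recorded as asymptomatic
    phenotypes: frozenset[str]    # recorded phenotype terms (empty if none)


def in_house_evidence(variant_id: str, patient_terms: set[str],
                      cohort: list[InHouseObservation],
                      max_asymptomatic: int = 2) -> str:
    """Summarize internal observations of one variant as a coarse evidence label."""
    hits = [obs for obs in cohort if obs.variant_id == variant_id]
    if not hits:
        return "no in-house data"
    affected = [obs for obs in hits if obs.affected]
    asymptomatic = len(hits) - len(affected)
    # Illustrative rule: repeated sightings in unaffected people argue against causality
    if asymptomatic >= max_asymptomatic:
        return "against: repeatedly seen in asymptomatic individuals"
    if not affected:
        return "inconclusive"
    # Affected carriers exist, but none share any of the patient's recorded phenotypes
    if not any(obs.phenotypes & patient_terms for obs in affected):
        return "against: associated phenotypes do not match"
    return "consistent"
```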


5. Low technical reliability (quality issues)

This is one of the most commonly overlooked factors.

Variant calls in a VCF file differ in how reliable they are.

False positives can arise due to:

  • Low read depth
  • Strand bias
  • Mapping errors (especially in repetitive regions)
  • Sequencing artifacts

These factors are evaluated at the BAM level or through internal QC processes, and cannot be fully assessed from the VCF file alone.

In other words, not every variant in a VCF file is necessarily reliable.
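To show why read-level review matters, here is a sketch that flags a call from read counts of the kind one might tally from a BAM file. The `ReadEvidence` fields and every threshold are hypothetical placeholders chosen for illustration. Low alternate-allele fraction is included as one rough proxy for sequencing artifacts in germline testing, and a production pipeline would typically use statistical tests (for example, Fisher's exact test for strand bias) rather than a fixed ratio.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReadEvidence:
    """Hypothetical read-level counts for one candidate variant (e.g. tallied from a BAM)."""
    depth: int          # total reads covering the position
    alt_forward: int    # alt-supporting reads on the forward strand
    alt_reverse: int    # alt-supporting reads on the reverse strand
    mean_mapq: float    # mean mapping quality of the alt-supporting reads


def qc_flags(ev: ReadEvidence, min_depth: int = 10, min_alt_fraction: float = 0.2,
             max_strand_share: float = 0.9, min_mapq: float = 30.0) -> list[str]:
    """Return reasons to distrust a call; an empty list means no flag was raised.

    All thresholds are illustrative placeholders, not validated clinical cut-offs.
    """
    flags: list[str] = []
    alt_total = ev.alt_forward + ev.alt_reverse
    if ev.depth < min_depth:
        flags.append("low read depth")
    if ev.depth > 0 and alt_total / ev.depth < min_alt_fraction:
        flags.append("low alt allele fraction")  # rough proxy for artifacts in germline calls
    if alt_total > 0 and max(ev.alt_forward, ev.alt_reverse) / alt_total > max_strand_share:
        flags.append("strand bias")  # alt allele seen (almost) only on one strand
    if ev.mean_mapq < min_mapq:
        flags.append("low mapping quality")  # common in repetitive regions
    return flags
```

A call supported by five reads, all on the forward strand, with low mapping quality would collect three flags here, even though it could look unremarkable as a single VCF line.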


So what does this mean in practice?

Variants missing from the clinical report have typically been excluded for a combination of reasons:

  • Lack of phenotype relevance
  • Insufficient pathogenic evidence
  • Inconsistency with population or in-house data
  • Technical quality concerns
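Because exclusion usually reflects several factors at once, it can help to think of review as collecting reasons rather than applying a single cut-off. This sketch assumes each criterion has already been assessed upstream (the `VariantAssessment` fields are hypothetical) and simply lists the ones a variant fails.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VariantAssessment:
    """Hypothetical per-variant summary produced by earlier review steps."""
    variant_id: str
    phenotype_relevant: bool
    has_supporting_evidence: bool   # e.g. case reports, functional data, in silico support
    population_consistent: bool
    in_house_consistent: bool
    qc_flags: list[str] = field(default_factory=list)


def exclusion_reasons(v: VariantAssessment) -> list[str]:
    """List the review criteria this variant fails (empty if it fails none)."""
    reasons: list[str] = []
    if not v.phenotype_relevant:
        reasons.append("lack of phenotype relevance")
    if not v.has_supporting_evidence:
        # Rarity is deliberately not an input: on its own it does not count as evidence
        reasons.append("insufficient pathogenic evidence")
    if not (v.population_consistent and v.in_house_consistent):
        reasons.append("inconsistent with population or in-house data")
    if v.qc_flags:
        reasons.append("technical quality concerns")
    return reasons
```

An empty list only means the variant cleared these screens; it would still go to expert interpretation before anything reaches a report.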

“Are there more VUS (variants of uncertain significance)?”

Yes—there are many.

However, not all of them are reported because:

  • Their clinical interpretation is uncertain
  • Reporting them may increase false-positive interpretations
  • They can lead to clinical confusion

→ Only clinically meaningful VUS are selectively reported.


“Can you report carrier pathogenic variants as well?”

This is another common request.

However, in WES/WGS diagnostic reports, variant reporting is determined by:

  • Relevance to the patient’s current phenotype
  • Consideration of incidental findings
  • The intended scope of interpretation

→ Therefore, only variants related to the clinical indication are reported.
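The carrier question often comes down to inheritance and zygosity. A single heterozygous pathogenic variant in a gene associated with recessive disease usually does not explain the patient's condition on its own, so it falls outside the diagnostic scope. The function below is a hypothetical sketch of that reasoning; it does not model phase (whether two variants lie on different copies of the gene), X-linked inheritance, or incidental-finding policies.

```python
def finding_scope(classification: str, inheritance: str, zygosity: str,
                  pathogenic_hits_in_gene: int, phenotype_relevant: bool) -> str:
    """Hypothetical rule for sorting a variant into diagnostic vs. carrier scope.

    pathogenic_hits_in_gene counts P/LP variants in the gene, including this one.
    """
    if classification not in {"pathogenic", "likely pathogenic"}:
        return "not a P/LP finding"
    if inheritance == "recessive" and zygosity == "heterozygous" and pathogenic_hits_in_gene < 2:
        # One heterozygous P/LP variant in a recessive gene: the patient is likely a carrier
        return "carrier finding (outside diagnostic scope)"
    if phenotype_relevant:
        return "diagnostic finding"
    # Incidental or secondary findings follow a separate policy not modeled here
    return "outside current clinical indication"
```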


Then how can carrier information be assessed?

It is still possible.

At 3billion, we offer separate options to address this need:

Family Insight test

  • WGS-based testing for individuals at high genetic risk
  • Interpretation incorporating family history and clinical context

Additionally, carrier findings can be requested separately to identify pathogenic or likely pathogenic variants that the individual carries.


How to best use your data

Clinical report
→ For diagnosis and clinical decision-making

Annotated data (VCF)
→ For exploration and extended analysis


Conclusion

A 3billion clinical report is not just a list of variants.
It is the product of careful interpretation and validation, in which clinical meaning is assigned to selected findings.

Understanding this distinction helps explain why certain variants are not reported—and allows for more effective use of raw genomic data.




Sohyun Lee

Clinical Genomics Scientist & Clinical Customer Support — guiding test selection, supporting variant and result interpretation, handling case inquiries, and translating field insights into service improvements.

