How is NGS data actually generated? | NGS Data Analysis Workflow [Part 1]

Insights | 26. 04. 24

📍Key Takeaways

NGS analysis runs through three stages — Primary, Secondary, and Tertiary — and the quality and tools used at each stage directly shape the final result.
Variant detection is not a single-algorithm process. Different tools are applied for different variant types: SNVs, CNVs, structural variants, repeat expansions, and more.
The same sample can yield different results depending on the pipeline and interpretation criteria used. It’s not just about whether NGS was done — it’s about how it was done.

When you order a genomic test, the report you receive typically summarizes a few variants along with their interpretations.
Behind those results, however, lies an analytical process far more complex than the report suggests.

In this article, we will walk through how WES/WGS-based testing is actually performed, focusing on the clinically relevant workflow.

NGS analysis consists of three main stages

The overall workflow can be divided into three steps:

Primary analysis
Secondary analysis
Tertiary analysis

Understanding this structure is the first step toward properly interpreting NGS results.

1. Primary analysis: “Reading the data”

This stage converts raw signals generated by the sequencing instrument into actual nucleotide sequence data.

The main steps include:

Base calling (image → sequence)
BCL to FASTQ conversion
Adapter trimming
Quality control (QC)

There is one key takeaway here:
the initial quality of the data determines everything that follows.

If the data quality is poor at this stage, even the most sophisticated downstream analysis cannot fully compensate for it.
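To make the QC step concrete, here is a minimal sketch of how per-base quality can be read from FASTQ output. It assumes the common Phred+33 encoding; the mean-quality cutoff of 20, the function names, and the two example reads are all hypothetical and chosen only for illustration. Production pipelines rely on dedicated QC tools rather than a filter like this.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(quality_line):
    """Decode an ASCII quality string into per-base Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_line]


def parse_fastq(text):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per read")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def passing_reads(text, min_mean_q=20):
    """Return IDs of reads whose mean Phred score meets the cutoff.

    The cutoff is a hypothetical value, not a recommended setting.
    """
    return [
        read_id
        for read_id, _seq, qual in parse_fastq(text)
        if mean(phred_scores(qual)) >= min_mean_q
    ]


# Toy input: 'I' decodes to Q40 and '#' decodes to Q2 under Phred+33.
EXAMPLE_FASTQ = """@read_high
ACGTACGT
+
IIIIIIII
@read_low
ACGTACGT
+
########
"""

kept = passing_reads(EXAMPLE_FASTQ)
```

With this toy input only `read_high` is kept, which illustrates why low-quality reads never reach the later stages in a usable form.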

2. Secondary analysis: “Finding where it belongs in the genome”

In this stage, the sequenced DNA fragments are aligned to the human reference genome.

Key steps include:

Alignment (mapping to the reference genome)
BAM file generation
Duplicate read marking or removal
Base quality score recalibration (BQSR)

The most critical step also happens here: variant calling.

This is where the variants themselves are identified from the aligned reads.

One important point is that no single algorithm is used to detect all types of variants.

SNV/INDEL → GATK
CNV → 3bCNV + MANTA
Structural variants → MANTA
Repeat expansion → ExpansionHunter
Mobile element insertion → MELT

In other words, a single test result is the product of multiple analytical tools working together.
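One way to picture this is as a lookup from variant type to the callers that need to run. The sketch below only encodes the mapping listed above as data; the dictionary keys, the function name, and the idea of a "requested types" list are hypothetical, and nothing here invokes the actual tools or reflects how any real pipeline is orchestrated.

```python
# Mapping taken from the list above; the key names are illustrative only.
CALLERS_BY_VARIANT_TYPE = {
    "SNV/INDEL": ["GATK"],
    "CNV": ["3bCNV", "MANTA"],
    "SV": ["MANTA"],
    "REPEAT_EXPANSION": ["ExpansionHunter"],
    "MEI": ["MELT"],
}


def callers_needed(variant_types):
    """Return the distinct callers required, in first-seen order."""
    needed = []
    for variant_type in variant_types:
        if variant_type not in CALLERS_BY_VARIANT_TYPE:
            raise KeyError(f"no caller configured for {variant_type!r}")
        for tool in CALLERS_BY_VARIANT_TYPE[variant_type]:
            if tool not in needed:  # a tool shared by two types runs once
                needed.append(tool)
    return needed


all_tools = callers_needed(CALLERS_BY_VARIANT_TYPE.keys())
```

Requesting every variant type yields five distinct tools, and a report that omits one of them simply cannot contain that class of variant.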

3. Tertiary analysis: “Assigning clinical meaning”

From this stage onward, the process moves beyond data processing into interpretation.

(1) Annotation

Identifying which gene the variant is located in
Assessing its impact on the protein
Using tools such as VEP and internal databases

(2) Filtering

Removing common variants using population databases (e.g., gnomAD)
Most clinically insignificant variants are filtered out at this step
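The frequency filter can be sketched as a simple threshold on population allele frequency. Everything specific here is an assumption made for illustration: the `population_af` field name, the cutoff values, the choice to keep variants that are absent from the database, and the toy variants themselves.

```python
def filter_rare(variants, max_af=0.01):
    """Keep variants at or below the allele-frequency cutoff.

    Variants with no frequency recorded (None) are kept, on the
    assumption that absence from a population database means "rare".
    """
    kept = []
    for variant in variants:
        af = variant.get("population_af")
        if af is None or af <= max_af:
            kept.append(variant)
    return kept


# Toy data, not real variants or real frequencies.
TOY_VARIANTS = [
    {"id": "var_common", "population_af": 0.25},
    {"id": "var_low", "population_af": 0.004},
    {"id": "var_ultra_rare", "population_af": 0.00001},
    {"id": "var_novel", "population_af": None},
]

lenient = [v["id"] for v in filter_rare(TOY_VARIANTS, max_af=0.01)]
strict = [v["id"] for v in filter_rare(TOY_VARIANTS, max_af=0.001)]
```

Here `var_low` survives the lenient cutoff but not the strict one, a small illustration of how the same input can produce different candidate lists under different filtering criteria.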

(3) Variant classification

Classification based on ACMG guidelines (Pathogenic, Likely pathogenic, VUS, etc.)

(4) Prioritization

AI-based analysis
Matching with the patient’s phenotype
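As a rough illustration of phenotype matching, the sketch below ranks candidate variants by how much the patient's features overlap with the features associated with each gene. It uses plain Jaccard similarity over free-text labels, which is a deliberately simplified stand-in and not the AI-based method mentioned above; the gene names, phenotype labels, and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_candidates(patient_terms, candidates):
    """Sort candidates by phenotype overlap with the patient, best first."""
    return sorted(
        candidates,
        key=lambda c: jaccard(patient_terms, c["phenotypes"]),
        reverse=True,
    )


# Toy data: placeholder genes and phenotype labels.
PATIENT = {"seizures", "hypotonia"}
CANDIDATES = [
    {"gene": "GENE_B", "phenotypes": {"hearing loss"}},
    {"gene": "GENE_A", "phenotypes": {"seizures", "hypotonia", "ataxia"}},
]

ranked = [c["gene"] for c in rank_candidates(PATIENT, CANDIDATES)]
```

In this toy case `GENE_A` ranks first because two of its three associated features match the patient, while `GENE_B` shares none.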

Ultimately, one key question

After all these steps, everything comes down to a single question:

“Does this variant explain the patient’s phenotype?”

No matter how advanced the technology becomes, this question does not change.

Questions we often receive in clinical practice

In clinical settings, we often hear questions like:

“This variant was reported by another lab—why is it not in the 3billion report?”
“Why is this variant classified as VUS here, but pathogenic elsewhere?”

These are natural questions, but they often overlook one important fact:

NGS results are not simply ‘discovered data’—they are the result of a series of analytical processes and interpretation criteria.

In reality, results can vary depending on:

Data quality
Variant calling algorithms used
Filtering criteria
Interpretation strategy

Even with the same sample, differences in these steps can lead to different reported variants and different interpretations.

What matters is not simply that “NGS was performed,” but how the data were analyzed and interpreted.

NGS results are not just test outputs—they are the outcome of a complex process of identifying meaningful signals from vast amounts of data.

Understanding this process is the starting point for accurate interpretation.

3billion’s WES/WGS tests run through every analysis stage described above using our in-house pipeline. If you have questions about our testing process or how we interpret variants, feel free to reach out.


Sohyun Lee

Clinical Genomics Scientist & Clinical Customer Support — guiding test selection, supporting variant and result interpretation, handling case inquiries, and translating field insights into service improvements.