LC–MS/MS in proteomics: DDA vs DIA (what you can and can’t infer)
LC–MS/MS measures peptides (MS1) and fragments selected peptides (MS2) to identify proteins. Quantification comes from how much signal you get for each peptide/protein across runs.
Data-dependent acquisition (DDA)
In DDA, the instrument repeatedly picks the “top N” most intense precursors at that moment to fragment. This makes identification strong, but sampling is stochastic—especially for complex mixtures—so the same peptide can be selected in one run/timepoint and missed in another even if it’s present (classic “missing values” problem). Reviews comparing DDA and DIA emphasize that DIA generally yields fewer missing values and more reproducible quantification than DDA for comparative studies.
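The stochastic top-N behavior can be illustrated with a toy simulation (a hedged sketch, not a real instrument model; the peptide names and abundances are invented):

```python
import random

def topn_dda_run(abundances, top_n=3, cycles=10, seed=None):
    """Toy DDA model: each cycle, precursor intensities fluctuate, and
    only the top-N most intense precursors are fragmented/identified."""
    rng = random.Random(seed)
    identified = set()
    for _ in range(cycles):
        # per-cycle noise: which peptides happen to ionize well right now
        intensities = {p: a * rng.random() for p, a in abundances.items()}
        identified.update(
            sorted(intensities, key=intensities.get, reverse=True)[:top_n]
        )
    return identified

# hypothetical mixture: 2 dominant peptides plus 38 low-abundance ones
mix = {f"pep{i}": (1000 if i < 2 else 1) for i in range(40)}
run_a = topn_dda_run(mix, seed=1)
run_b = topn_dda_run(mix, seed=2)
# at most top_n * cycles = 30 of the 40 peptides can ever be sampled, so
# some are always "missing"; which low-abundance peptides make the cut
# typically differs between technically identical runs
```

The point of the sketch is structural: no matter how the noise falls, a top-N scheme can never sample more than `top_n * cycles` distinct precursors per run, so missingness is built in for complex mixtures.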
Data-independent acquisition (DIA)
In DIA, the instrument fragments everything within systematic m/z windows, which usually improves reproducibility and completeness for comparisons across conditions/time series (at the cost of more complex analysis).
Key implication for a paper: if the conclusions are built on DDA runs (Q Exactive) of a highly complex secretion mixture, then "absence" and "changes over time or versus condition" need extra caution unless they are supported by robust quantification plus replication.
What NSAF is (and what it is not)
Definition
NSAF = Normalized Spectral Abundance Factor. It starts from spectral counts (how many MS2 spectra matched peptides from a protein). It then normalizes by protein length and by the sum across all proteins in that run, producing a relative abundance index. Reviews describe NSAF exactly this way: SpC/length, normalized by the sum of SpC/length across proteins.
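The definition above is a two-step normalization and fits in a few lines (a minimal sketch; the protein names, counts, and lengths are invented):

```python
def nsaf(spectral_counts, lengths):
    """NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), for one run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: s / total for p, s in saf.items()}

# toy run: A and B have equal raw counts, but A is five times longer
counts = {"A": 50, "B": 50, "C": 10}
lengths = {"A": 500, "B": 100, "C": 100}
vals = nsaf(counts, lengths)
# values sum to 1 by construction; B outranks A despite equal raw counts,
# because the length normalization corrects for A's larger peptide "target"
```

Note the denominator: because every NSAF value is divided by the run-wide sum, the values in a run always sum to 1, which is exactly what makes NSAF a within-run relative index rather than an absolute measure.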
What NSAF actually measures
NSAF is best thought of as “within-run relative representation” of proteins under that specific acquisition/sampling outcome—not a direct, linear, absolute measure of concentration or secretion rate.
Why NSAF (spectral counting) is not a reliable semi-quantitative measure for comparing biological conditions
Even when computed correctly, NSAF inherits several limitations of spectral counting, especially for between-condition or time-series inference:
- Compositional constraint (closed-sum problem): If you express NSAF as "% of total NSAF," every sample sums to 100%. If one protein/category truly increases, the shares of the others must decrease even if their absolute amounts stayed constant. That makes "% abundance" shifts easy to over-interpret as biology.
- DDA sampling noise and missingness dominate low/moderate-abundance proteins: Spectral counts change because the instrument chose different ions to fragment, not only because the biology changed. Comparative LFQ reviews highlight that proper design and statistics are essential and that LFQ outputs often need validation.
- Nonlinearity and saturation: Spectral counts are roughly proportional to abundance only over a limited dynamic range; highly abundant proteins can saturate, and small fold-changes can be invisible.
- Protein inference and shared peptides: In complex databases (many similar proteins/isoforms), counts can be split or ambiguously assigned, distorting NSAF.
- No error model without true biological replication: Spectral-count statistics papers explicitly warn that many spectral-count comparisons rely on simplistic transforms and are undermined by bias and limited replicates.
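The closed-sum effect in the first point can be shown with simple arithmetic (the protein labels and amounts are invented):

```python
def to_percent(abs_amounts):
    """Convert absolute amounts to percent-of-total, as with % NSAF."""
    total = sum(abs_amounts.values())
    return {p: 100 * v / total for p, v in abs_amounts.items()}

# hypothetical absolute amounts at two timepoints: only A truly changes
t0 = {"A": 10, "B": 45, "C": 45}
t1 = {"A": 60, "B": 45, "C": 45}  # B and C unchanged in absolute terms

p0, p1 = to_percent(t0), to_percent(t1)
# B's share drops from 45% to 30% even though B's amount never changed;
# a %-based readout would wrongly suggest B was "down-regulated"
```

Because the percentages are forced to sum to 100, A's real increase is automatically mirrored as apparent decreases everywhere else, which is the core reason "% abundance" comparisons across conditions are hazardous.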
Neilson et al.: don't over-interpret label-free spectral counting across conditions; label-free quantification workflows need statistical assessment and validation.
Choi et al.: on spectral-count statistical pitfalls specifically, showing how common approaches fail to address bias and limited replicates.