A statistically significant association between a gene expression marker and breast cancer outcome does not reliably indicate that the biological mechanism the marker represents is relevant to cancer, because most gene expression signatures, including random ones, are significantly associated with outcome. Without adjustment for confounders such as cell proliferation, outcome correlation alone does not justify attributing clinical significance to a marker. [@venet_most_2011]

Definitions

Synthesis

Studies have shown that a statistical association between a gene expression signature and breast cancer outcome does not necessarily indicate biological relevance: more than 90% of random gene signatures containing more than 100 genes achieve significant outcome association despite bearing no relationship to cancer biology. The mechanistic explanation is confounding by cell proliferation. Adjusting for a proliferation metagene derived from PCNA-correlated genes almost entirely abrogates the outcome associations of both published and random signatures. A critical unresolved issue is whether removing cell cycle genes from a signature adequately controls for this confounding. The evidence suggests that it does not: proliferation effects extend beyond classical cell cycle gene expression and cannot be eliminated by selective gene exclusion alone. This raises the fundamental question of whether outcome association is a valid indicator that a signature reflects a causally relevant biological mechanism.
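
The confounding argument can be illustrated with a small simulation. The sketch below is not the cited analysis. It uses synthetic data, a continuous stand-in for outcome, signature scores computed as simple gene means, and a correlation test in place of a survival model. The latent proliferation factor, the block of `N_CYCLE` "cell cycle" genes, the index gene standing in for PCNA, the `top_fraction` cutoff, and all function names are assumptions made for illustration only.

```python
import numpy as np

N_CYCLE = 50  # hypothetical block of strongly proliferation-driven "cell cycle" genes


def simulate(n_samples=300, n_genes=2000, seed=0):
    """Synthetic cohort in which every gene weakly tracks a latent proliferation factor."""
    rng = np.random.default_rng(seed)
    prolif = rng.normal(size=n_samples)
    loadings = rng.uniform(0.0, 0.3, size=n_genes)  # assumed weak, genome-wide effect
    loadings[:N_CYCLE] = 2.0                        # assumed strong effect on cell cycle genes
    expr = np.outer(prolif, loadings) + rng.normal(size=(n_samples, n_genes))
    outcome = prolif + rng.normal(size=n_samples)   # continuous stand-in for patient risk
    return expr, outcome


def proliferation_metagene(expr, index_gene=0, top_fraction=0.01):
    """Mean expression of the genes most correlated with an index gene (a PCNA stand-in)."""
    centered = expr - expr.mean(axis=0)
    ref = centered[:, index_gene]
    corr = (centered.T @ ref) / (np.linalg.norm(centered, axis=0) * np.linalg.norm(ref))
    n_top = max(1, int(top_fraction * expr.shape[1]))
    top = np.argsort(corr)[-n_top:]  # includes the index gene itself
    return expr[:, top].mean(axis=1)


def residualize(y, x):
    """Remove the linear effect of x from y (ordinary least squares with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def is_significant(a, b, z_crit=1.96):
    """Approximate two-sided 5% test of Pearson correlation via the Fisher z-transform."""
    r = np.corrcoef(a, b)[0, 1]
    return abs(np.arctanh(r)) * np.sqrt(len(a) - 3) > z_crit


def fraction_significant(expr, outcome, candidates, adjust_for=None,
                         sig_size=100, n_sigs=200, seed=1):
    """Fraction of random signatures whose score is associated with outcome."""
    rng = np.random.default_rng(seed)
    y = outcome if adjust_for is None else residualize(outcome, adjust_for)
    hits = 0
    for _ in range(n_sigs):
        genes = rng.choice(candidates, size=sig_size, replace=False)
        score = expr[:, genes].mean(axis=1)
        if adjust_for is not None:
            score = residualize(score, adjust_for)
        hits += int(is_significant(score, y))
    return hits / n_sigs


expr, outcome = simulate()
all_genes = np.arange(expr.shape[1])
metagene = proliferation_metagene(expr)

frac_raw = fraction_significant(expr, outcome, all_genes)
frac_no_cycle = fraction_significant(expr, outcome, all_genes[N_CYCLE:])
frac_adjusted = fraction_significant(expr, outcome, all_genes, adjust_for=metagene)

print(f"random signatures, unadjusted:           {frac_raw:.2f}")
print(f"random signatures, cell cycle genes out: {frac_no_cycle:.2f}")
print(f"random signatures, metagene-adjusted:    {frac_adjusted:.2f}")
```

Under these assumptions, nearly all random signatures should be associated with outcome, with or without the cell cycle block, while adjusting for the metagene should bring the fraction down toward the nominal false-positive rate. The simulation builds the genome-wide proliferation effect in by construction, so it shows how the confounding works and is not evidence that it occurs in real data.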

Bibliography