We read with interest the review article by Conrads et al. (2004) on the use of high-resolution proteomic profiling of serum for ovarian cancer detection, noting in particular the authors’ claims of 100% sensitivity and specificity. Unfortunately, an aspect of the paper causes us great concern, as it suggests the possible presence of substantial experimental bias.
The source of our concern is contained in Figs 6A and 7 of Conrads et al. (2004). In Fig 6A, the record number (number of non-zero entries) in each Qstar spectrum is plotted against sample number, with different symbols indicating the 3 days on which the samples were run: day 1 on the left, day 2 in the center, and day 3 on the right. As the authors note, this figure shows clearly that something was going wrong with the process near the end of the run. Figure 7 shows the record numbers of the files remaining after the application of a quality control filter against sample number, with controls on the left and cancers on the right. As can be seen in Fig 1 of this letter, when these two plots are superimposed, they coincide perfectly. This suggests that the controls (on the left in Fig 7 from Conrads et al. 2004) were almost all run on day 1, with a small number at the start of day 2, and the cancers were all run on days 2 and 3: the run order of the samples was not randomized. This is a serious problem, because any changes that affect the machine over time can systematically bias the results by distorting one group more than the other. The authors note that exactly such a time trend was present; weakening signals on the third day would affect only cancer spectra in this experiment.
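The confounding described above can be made concrete with a minimal sketch. The sample counts, day sizes, and function names below are hypothetical and chosen only for illustration; they are not taken from Conrads et al. (2004). The sketch compares the fraction of cancer samples run on each day under a sequential run order (all controls first, then all cancers) with that under a randomized run order.

```python
import random

def randomized_run_order(samples, seed=0):
    """Return a shuffled copy of (sample_id, group) pairs.

    Shuffling decouples run position from group membership, so that
    instrument drift over time affects both groups about equally.
    """
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    return order

def group_fraction_by_day(order, day_sizes, group="cancer"):
    """Fraction of samples belonging to `group` on each run day."""
    fractions = []
    start = 0
    for size in day_sizes:
        day = order[start:start + size]
        fractions.append(sum(1 for _, g in day if g == group) / len(day))
        start += size
    return fractions

# Hypothetical experiment: 60 controls and 60 cancers, 40 spectra per day.
samples = [(i, "control") for i in range(60)] + \
          [(i, "cancer") for i in range(60, 120)]
day_sizes = [40, 40, 40]

# Sequential order: group is almost perfectly tied to run day.
print(group_fraction_by_day(samples, day_sizes))

# Randomized order: each day holds a mix of both groups.
print(group_fraction_by_day(randomized_run_order(samples), day_sizes))
```

Under the sequential order, any signal loss on the last day falls entirely on the cancer group. Under the randomized order it is shared between the groups and is much less likely to masquerade as a biological difference.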
The presence of systematic bias can mean that the results are irreproducible, because they may represent ephemeral experimental conditions rather than true biological patterns. Signs of systematic bias have also been found (Sorace and Zhan 2003, Baggerly et al. 2004) in earlier ovarian cancer datasets used to produce proteomic patterns (Petricoin et al. 2002), calling those results into question as well. Further, valid questions have been raised as to whether the patterns being found represent true diagnostic biomarkers or non-specific epiphenomena (Diamandis 2004). These questions are not addressed in Conrads et al. (2004), though the machine being used should allow for substantially easier identification of the peaks involved. In this context, claims of 100% sensitivity and specificity strike us as premature.
Baggerly KA, Morris JS & Coombes KR 2004 Reproducibility of SELDI-TOF protein patterns in serum: comparing data sets from different experiments. Bioinformatics 20 777–785.
Conrads TP, Fusaro VA, Ross S, Johann D, Rajapakse V, Hitt BA, Steinberg SM, Kohn EC, Fishman DA, Whitely G et al. 2004 High resolution serum proteomic features for ovarian cancer detection. Endocrine-Related Cancer 11 163–178.
Diamandis EP 2004 Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Molecular and Cellular Proteomics 3 367–378.
Petricoin EF 3rd, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, Mills GB, Simone C, Fishman DA, Kohn EC & Liotta LA 2002 Use of proteomic patterns in serum to identify ovarian cancer. The Lancet 359 572–577.
Sorace J & Zhan M 2003 A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 4 http://www.biomedcentral.com/1471-2105/4/24/.