- Jul 2018
-
europepmc.org europepmc.org
-
On 2016 Jul 17, David Keller commented:
Berger's erroneous remarks in paragraph 4 imply possible serious errors in the Cologuard internal assay results
Berger wrote: "Dr. Keller’s attempt to draw a correlation between the occult hemoglobin value within a Cologuard test result and the stool hemoglobin concentration thresholds for positive results in a commercially available fecal immunochemical tests with separately validated reference points is just one example of how easily the erroneous use of data can yield potentially inappropriate clinical decisions. "
Keller: Each Cologuard specimen is subjected to an assay for occult hemoglobin concentration, measured in ng/mL, a measurement which must be accurate and repeatable by other modes of testing, including "commercially available fecal immunochemical tests (FIT)". Any biochemical measurement, such as fecal hemoglobin concentration, should remain constant and not vary depending on the test used to measure it. The Cologuard composite score requires an accurate measurement of fecal hemoglobin concentration, and it does not matter how that concentration is measured, but the concentrations measured by different tests must come out the same, within the error limits of the tests, or there is a problem.
Berger wrote: "For instance, as discussed at the FDA panel review of Cologuard, due to the Cologuard test’s use of significantly more stool in the Cologuard hemoglobin tube, the cut off of 100ng/mL concentration in the hemoglobin tube collection buffer used by some fecal immunochemical tests was estimated by FDA reviewers to be likely equivalent to more than twice that level within a Cologuard test."
Keller: Your statement is completely false, and reveals a serious misunderstanding of the very simple and basic concept of concentration itself. Assuming a homogeneous specimen, the concentration of a solute (in this case hemoglobin) is independent of the sample size. The fact that the Cologuard assay uses "significantly more stool" should not affect the measured hemoglobin concentration. The amount of collection buffer solution relative to the amount of stool specimen in the test tube should not make any difference. If two different assays are measuring different fecal hemoglobin concentrations for the same homogeneous sample, then one or both assays are erroneous. Because the commercial FIT assays are calibrated and validated by the FDA, while you admit that Cologuard's internal assays are not validated or calibrated to that standard, it appears that the internal Cologuard fecal hemoglobin concentration measurement is erroneous, which, in turn, impairs the accuracy of all Cologuard screens!
Berger wrote: "Even then, the theoretically resulting comparative threshold level is merely the product of conjecture because it has not been studied for that purpose, and to our understanding, the four most commonly used fecal immunochemical tests all generate only qualitative positive or negative results without release of the underlying quantitative information".
Keller: You are wrong - commercial FIT tests yield a BINARY result of positive or negative, which is completely QUANTITATIVE and not in any way qualitative. For example, the FIT test used as the control comparator in the Multitarget clinical trial [1] was positive for fecal hemoglobin concentrations greater than or equal to 100 ng/mL, and negative for lesser concentrations. The results of that FIT test were binary (positive or negative), quantitative, accurate, repeatable and did not vary with the size of the stool specimen! The results of the Cologuard internal fecal hemoglobin concentration assay had better agree with the FIT test results, or there is a serious problem, and it is probably in your assay (see my rebuttal of your second paragraph for a full explanation of why).
Reference
1: Imperiale, T.F., Ransohoff, D.F., Itzkowitz, S.H., Turnbull, B.A., Ross, M.E. Colorectal Cancer Study Group. Fecal DNA versus fecal occult blood for colorectal cancer screening in an average risk population. N Engl J Med. 2004 Dec 23;351(26):2704-14. PubMed PMID: 15616205.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2016 Jul 16, David Keller commented:
Rebuttal of Berger's paragraph 3; and an explanation of how to define any test's normal range
Berger wrote: "Reporting of the constituent values within a Cologuard test is not only an unapproved use of Cologuard, but the granularity that individual marker levels and the specific composite score might be assumed to provide for clinical decision making in screening patients is not supported."
Keller: The result of any assay you run on my body tissues or wastes, regardless of the granularity or the FDA approval status of the assay, must be made available to me upon request. This is a right of any patient. Nothing you have said releases you from the obligation to inform me of my test results.
Berger wrote: "While the algorithm and the algorithm cut-off that define positive/negative are clinically validated for screening, the individual constituent values are not approved for clinical use outside the context of the algorithm."
Keller: According to the FDA, the Cologuard individual DNA assays are not approved for use outside the Multitarget algorithm primarily because Exact Sciences (your employer) did not apply for their approval, which is the necessary first step in the approval process. Every patient has the right to be informed of the results of any assay you run on his or her DNA, regardless of whether or not the assay is FDA-approved.
Berger wrote: "The component markers do not have individual “normal” reference ranges associated with them and, as a result, these intermediate analytes are not separately interpretable."
Keller: The normal range of any test may be defined as the average value of all test results, plus or minus 2 standard deviations, a range which will contain approximately 95% of the results for the population when those results are roughly normally distributed. The remaining 5% or so may be defined as "abnormal".
Berger wrote: "This is different from the way that the separate tests ordered in a “test panel” can be interpreted, as under those circumstances, each test in the panel has its own separate reference range."
Keller: No, there is no difference, the process of defining a normal range is the same regardless of whether you are defining it for an internal Cologuard assay or a component of a "test panel".
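The mean plus or minus 2 standard deviations rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample readings are hypothetical, not Cologuard data, and the rule covers roughly 95% of results only when they are approximately normally distributed.

```python
from statistics import mean, stdev


def reference_range(values, k=2.0):
    """Return (low, high) as the mean minus/plus k sample standard deviations."""
    m = mean(values)
    s = stdev(values)
    return m - k * s, m + k * s


def is_abnormal(value, low, high):
    """A result is flagged as abnormal when it falls outside the reference range."""
    return value < low or value > high


# Hypothetical assay readings in arbitrary units, for illustration only.
readings = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 12.0, 10.0]
low, high = reference_range(readings)
```

The same two functions apply unchanged to any numeric assay, which is the point being made about internal assays versus components of a test panel.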
Whether or how a reference range is defined does not alter the ethical imperative that you must release the result of any test you perform on a patient to that patient. In my original commentary, I presented two clinical scenarios in which patients could come to harm because of the failure of Exact Sciences to report extreme or "abnormal" internal assay results. In the first, a high fecal hemoglobin concentration caused by a non-neoplastic colon disease goes unreported. In the second, a patient with a high-negative composite score, who has a higher-than-average risk of a false-negative Cologuard screening result, is given the same 3-year rescreening schedule as a patient whose fecal specimen result has the ideal lowest-risk composite score of zero. In the event that a patient comes to harm as a result of one of these clinical scenarios, Exact Sciences could, and should, become the target of a product liability lawsuit because of the failure to inform these patients of their abnormal internal assay results.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2016 Jul 12, David Keller commented:
Rebuttal of paragraph 2: Dr. Berger mistakes "binary" for "qualitative"
Berger: "The Cologuard test is intended for the qualitative detection of colorectal neoplasia associated biomarkers."
Keller: This so-called qualitative detection process is actually totally quantitative in nature. Each biomarker (DNA mutation, methylation abnormality or hemoglobin concentration) is carefully measured and inserted into a precise equation to generate a precise "Composite Score" which is directly related to the risk of colon cancer. This is a totally quantitative process.
Berger: " The numerical values generated from the component assays of the Cologuard test are excluded from the scope of the approval and are not clinically validated as individual test results."
Keller: I asked the FDA about that. Their reply to me was essentially as follows: the primary reason FDA did not approve the component assays as individual test results was that your company, Exact Sciences, did not apply for such approval. The first step of the FDA approval process for a test is for the manufacturer to apply for approval.
The MultiTarget algorithm was well validated at a composite score of 183 in a large randomized trial [1], which measured the risk of neoplasia (among other parameters) at that point. That validation can be extended across the entire range of "negative" Cologuard scores, from zero to 182, but clinical validation of one or more additional composite scores in that range is required. A reasonable minimum set of points to validate could be composite scores of zero, 60 and 120. Together with the already validated composite score of 183, these would allow the risk of neoplasia to be approximated as a function of composite score by the three straight line segments joining the risks measured at composite scores of zero, 60, 120 and 183. The mathematical name for this process is "interpolation", and it will provide a reasonable estimate of neoplasia risk across the entire range of negative composite scores (0 - 182). The more points that are validated within that range, the more accurate the estimate of neoplasia risk will be across the entire range (as measured by root-mean-squared error).
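A minimal sketch of the piecewise-linear interpolation described above, in Python. The four composite scores (0, 60, 120, 183) come from the text; the risk values attached to them are placeholders invented for illustration, since no such validation data exist here, and the function name is hypothetical.

```python
from bisect import bisect_right


def interpolate_risk(score, points):
    """Linearly interpolate risk between validated (score, risk) points.

    `points` must be sorted by score; scores outside the validated range
    are rejected rather than extrapolated.
    """
    xs = [x for x, _ in points]
    if not xs[0] <= score <= xs[-1]:
        raise ValueError("score outside the validated range")
    # Index of the right-hand end of the segment containing `score`.
    i = min(bisect_right(xs, score), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (score - x0) / (x1 - x0)


# Placeholder risks, NOT measured values: assumed only to make the sketch run.
validated = [(0, 0.001), (60, 0.002), (120, 0.004), (183, 0.010)]
risk_at_30 = interpolate_risk(30, validated)
risk_at_90 = interpolate_risk(90, validated)
```

Adding more validated points to the list shortens each segment, which is how the estimate would become more accurate across the range.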
Berger: "These numerical values are only constituents of the validated test algorithm that generates the qualitative Cologuard composite result (positive/negative)."
Keller: Numerical values are quantitative by definition. Your so-called "qualitative" result is derived from a number called the "composite score". The Cologuard result is "positive" if the composite score is 183 or greater, "negative" if the score is 182 or less. The Cologuard result is therefore not qualitative, it is binary. Dr. Berger has clearly mistaken a binary decision-making process for a qualitative one. This is a clear-cut error of nomenclature and conceptualization, for which I will submit an erratum to this journal.
Reference
1: Imperiale, T.F., Ransohoff, D.F., Itzkowitz, S.H., Turnbull, B.A., Ross, M.E. Colorectal Cancer Study Group. Fecal DNA versus fecal occult blood for colorectal cancer screening in an average risk population. N Engl J Med. 2004 Dec 23;351(26):2704-14. PubMed PMID: 15616205.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2016 Jul 12, David Keller commented:
Beyond rebuttal: constructive suggestions for unleashing the full power of Cologuard screening
My point-by-point rebuttal of the response by Dr. Berger and Dr. Lidgard to my commentary is necessary but of little importance. What is important is that their company's outstanding colon cancer screening system, an implementation of the MultiTarget algorithm marketed under the commercial name "Cologuard", could function at an even higher level of accuracy if its full power were unleashed for each fecal sample screened.
I propose that the Composite Score be used for risk-stratification of "negative" Cologuard screens, so that we do not perpetuate the potentially dangerous situation we have now, where patients with Composite Scores ranging from zero to 182 are all treated the same: they are told their result is negative, nobody is certain when to repeat the test, but Medicare will pay for it again in 3 years. However, a composite score of 182 must confer a higher risk of being a false negative than a score of zero. Further, a composite score of 183 is only about 0.5% higher than a score of 182, yet a patient with a score of 182 is told the screening result is negative and sent home, while a patient with a score of 183 is told a colonoscopy is needed immediately. That kind of discontinuity, called a Heaviside step function, is not found in nature. The risk of cancer must, as a natural phenomenon, vary across the range of composite scores in a manner which is smooth and continuous.
In a large randomized clinical trial, Cologuard was found to have a sensitivity of 92.3% for detecting colorectal cancer, and hence, a false-negative rate of 7.7%. The Cologuard assays for malignancy-associated DNA yield results which are directly related to cancer risk, so the risk of a false-negative result must therefore vary directly with the composite score, increasing monotonically over the range of negative composite scores, from zero to 182. In other words, across the range of zero to 182, any composite score of N+1 must confer an incrementally higher risk of false-negative colon cancer than is conferred by a composite score of N. Therefore, patients whose Cologuard screening result is currently reported simply as "negative" can be further risk-stratified by their composite scores, for example:
Score.........Repeat screening interval
0 - 60........Repeat Cologuard in 3 years
61 - 120......Repeat Cologuard in 2 years
121 - 182.....Repeat Cologuard in 1 year
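The example schedule above can be written as a small lookup function. This is a sketch of the proposal only, not a validated clinical rule: the band edges and the positive cutoff of 183 are taken from the text, while the function name and the returned wording are illustrative assumptions.

```python
POSITIVE_CUTOFF = 183  # composite scores at or above this are reported as positive


def follow_up(score):
    """Map a composite score to the follow-up suggested in the example table."""
    if score < 0:
        raise ValueError("composite score cannot be negative")
    if score >= POSITIVE_CUTOFF:
        return "positive: refer for colonoscopy"
    if score <= 60:
        return "negative: repeat Cologuard in 3 years"
    if score <= 120:
        return "negative: repeat Cologuard in 2 years"
    return "negative: repeat Cologuard in 1 year"


# Scores on either side of each break point.
examples = {s: follow_up(s) for s in (0, 60, 61, 120, 121, 182, 183)}
```

Under this scheme a score of 182 and a score of 183 still receive different advice, but the 182 result now triggers a 1-year repeat rather than the same 3-year interval as a score of zero.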
The retail price of one Cologuard test kit is $649, very expensive compared with standard fecal immunochemical screening for colon cancer, which retails for $15 per kit. Risk-stratification would concentrate medical resources where the risk is highest, by allocating additional Cologuard kits to patients at the highest risk of an initial false-negative screen. At the same time, fewer false positive screens would occur, compared with simply repeating the Cologuard screen annually for all patients.
Validation of the composite score across its entire negative range of zero to 182, and determination of where to position the break points in composite score for optimal risk stratification, can be performed using Monte Carlo simulation, with models based on post-approval Cologuard clinical data. Similar techniques were employed by CISNET to compare the outcomes of various screening algorithms, and these simulation results were recently published in JAMA, to support the latest USPSTF colon cancer screening recommendation update [1]. These results can also be applied to hybrid screening strategies, devised in response to spending limits or other constraints [2].
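To make the Monte Carlo idea concrete, here is a toy simulation in Python. Everything numeric in it is an assumption made for illustration: the risk model `missed_cancer_prob` is a made-up monotonically increasing curve, scores are drawn uniformly from 0 to 182, and the function names are hypothetical. It is far simpler than the CISNET models and is not based on any Cologuard clinical data.

```python
import random


def missed_cancer_prob(score):
    """Hypothetical probability that a negative screen at this score misses a cancer."""
    return 0.0005 + 0.004 * (score / 182) ** 2


def simulate_band_rates(breaks, n_patients=200_000, seed=0):
    """Estimate the missed-cancer rate within each score band by simulation.

    `breaks` lists the first score of each band after the lowest one, so
    breaks=[61, 121] gives the bands 0-60, 61-120 and 121-182.
    """
    rng = random.Random(seed)
    edges = [0, *breaks, 183]
    counts = [0] * (len(edges) - 1)
    missed = [0] * (len(edges) - 1)
    for _ in range(n_patients):
        score = rng.randint(0, 182)  # assumed uniform distribution of negative scores
        band = next(i for i in range(len(counts)) if edges[i] <= score < edges[i + 1])
        counts[band] += 1
        if rng.random() < missed_cancer_prob(score):
            missed[band] += 1
    return [m / c if c else 0.0 for m, c in zip(missed, counts)]


rates = simulate_band_rates(breaks=[61, 121])
```

Trying different values of `breaks` and comparing the resulting band rates is one simple way to explore where break points might sit; a real analysis would replace the assumed risk curve and score distribution with models fitted to post-approval data.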
References
1: Knudsen AB, Zauber AG, Rutter CM, Naber SK, Doria-Rose VP, Pabiniak C, Johanson C, Fischer SE, Lansdorp-Vogelaar I, Kuntz KM. Estimation of Benefits, Burden, and Harms of Colorectal Cancer Screening Strategies: Modeling Study for the US Preventive Services Task Force. JAMA. 2016 Jun 21;315(23):2595-609. doi: 10.1001/jama.2016.6828. PubMed PMID: 27305518.
2: Keller DL. A Hybrid Non-Invasive Colon Cancer Screening Strategy, To Maximize Sensitivity With Medicare Coverage. PubMed Commons, accessed on 7/30/2016 at the following URL:
http://www.ncbi.nlm.nih.gov/pubmed/27305518#cm27305518_22819
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
-