Allen, Jared
(2023)
Application of autoantibody binding curve characteristics and machine learning methods for improving the diagnostic performance of an early detection test for lung cancer.
PhD thesis, University of Nottingham.
Abstract
The EarlyCDT®-Lung test has been technically and clinically validated for the early detection of lung cancer, with a sensitivity of ~40% and a specificity of ~90%, through measurement of a panel of seven serum autoantibodies.
The test generates curves of autoantibody binding to a titrated series of capture antigen concentrations, thus providing patient-specific autoantibody titration curves. We postulated that the antibodies responsible for false positive results in healthy individuals exhibit different binding kinetics to the specific autoantibodies present in cancer patients, and that these differences may manifest themselves in the shape of the autoantibody-antigen titration curves.
The EarlyCDT®-Lung test result is currently a simple logic test combination of the results from the seven autoantibodies. The employment of machine learning models to combine the biomarker results, especially with the addition of a number of extra biomarker parameters, may allow improved clinical utility of the test through increased sensitivity and specificity.
A health economic analysis was undertaken to determine the current cost-effectiveness of the EarlyCDT®-Lung test for population screening for lung cancer compared to low-dose computed tomography (LDCT). It showed that the current test performance was more cost-effective than LDCT screening at £37,679 per QALY, and quantified the performance needed to achieve cost-effectiveness at £30,000 per QALY as a sensitivity of 39.8% at 99% specificity, 47.5% at 95% specificity, or 56.2% at 90% specificity.
Serum autoantibodies from three case-control cohorts were measured on the EarlyCDT®-Lung test, as well as on an extended panel of autoantibodies. The titration binding curves returned by the test were analysed for signal magnitude, as well as for curve characteristics including Slope, Intercept, Area Under Curve (AUC) and the maximum slope obtained over the curve (SlopeMax). A range of unsupervised and supervised machine learning strategies for combining these biomarker results were explored, including principal components analysis, cluster analysis, logistic regression, decision tree analysis, naïve Bayes, support vector machines, random forest, and extreme gradient boosting. The performance improvements of these optimised models were, however, modest and inconsistent across cohorts.
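As an illustration of the curve characteristics named above, the following is a minimal sketch of how Slope, Intercept, AUC and SlopeMax might be computed from a single titration curve. The definitions used here (a straight-line least-squares fit, a trapezoidal area, and the steepest segment between adjacent titration points) and the function name `curve_features` are assumptions made for illustration; the abstract does not specify how the thesis computes these quantities.

```python
import numpy as np


def curve_features(conc, signal):
    """Summarise one titration curve (illustrative definitions, not the thesis's)."""
    conc = np.asarray(conc, dtype=float)
    signal = np.asarray(signal, dtype=float)

    # Sort by antigen concentration so the differences below are well defined.
    order = np.argsort(conc)
    conc, signal = conc[order], signal[order]

    # Slope and Intercept: least-squares straight line through all points.
    slope, intercept = np.polyfit(conc, signal, 1)

    # AUC: trapezoidal rule over the titration range.
    auc = np.sum(np.diff(conc) * (signal[:-1] + signal[1:]) / 2.0)

    # SlopeMax: steepest segment between adjacent titration points.
    slope_max = np.max(np.diff(signal) / np.diff(conc))

    return {
        "Slope": float(slope),
        "Intercept": float(intercept),
        "AUC": float(auc),
        "SlopeMax": float(slope_max),
    }
```

Each autoantibody in the panel would yield one such feature set per patient, which could then be passed to any of the models listed above.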
Finally, a simulated-annealing-based algorithm for multivariate panel optimisation was developed as an evolution of the Monte Carlo random search strategy previously used to establish panel cutoff thresholds. This algorithm was able to derive optimal panels that compared favourably both to the current commercial thresholds and to the best models derived by machine learning strategies.
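To make the idea of simulated annealing over panel cutoffs concrete, here is a generic sketch, not the algorithm developed in the thesis. It assumes a sample is called positive when any marker exceeds its cutoff, and it assumes an objective that maximises sensitivity while penalising specificity below a target. The function names, the penalty weight and the cooling schedule are all hypothetical choices.

```python
import math
import random


def panel_performance(cutoffs, cases, controls):
    """Sensitivity and specificity under an assumed 'any marker above cutoff' rule."""
    def positive(sample):
        return any(value > cutoff for value, cutoff in zip(sample, cutoffs))

    sens = sum(positive(s) for s in cases) / len(cases)
    spec = sum(not positive(s) for s in controls) / len(controls)
    return sens, spec


def score(cutoffs, cases, controls, min_spec):
    """Assumed objective: sensitivity, penalised when specificity is below target."""
    sens, spec = panel_performance(cutoffs, cases, controls)
    return sens - 10.0 * max(0.0, min_spec - spec)


def anneal_cutoffs(cases, controls, start, min_spec=0.9, steps=2000,
                   t0=1.0, cooling=0.995, step_size=0.5, seed=0):
    """Search for cutoffs by simulated annealing; returns (best_cutoffs, best_score)."""
    rng = random.Random(seed)
    current = list(start)
    current_score = score(current, cases, controls, min_spec)
    best, best_score = list(current), current_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour by perturbing one randomly chosen cutoff.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0.0, step_size)
        cand_score = score(candidate, cases, controls, min_spec)
        delta = cand_score - current_score
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp *= cooling  # geometric cooling schedule
    return best, best_score
```

Unlike a pure random search, the acceptance of occasional worse moves early on lets the search escape poor local optima, and the falling temperature makes it increasingly greedy as it proceeds.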