Research on bladder cancer risk prediction based on machine learning

  • Abstract: Objective: To develop a stacked ensemble model that integrates clinical risk factors with plasma proteins to predict the risk of incident bladder cancer. Methods: Using the UK Biobank cohort, we included 419 incident bladder cancer cases and 33,453 controls. Cox proportional hazards models were used to screen for plasma proteins associated with bladder cancer risk. Random forest and Boruta were then each applied for feature selection, and the proteins selected by both algorithms (their intersection) were retained as feature proteins. A bladder cancer risk prediction model was built with a stacked ensemble learning strategy, with clinical risk factors and the feature proteins combined as inputs for model training and prediction. Results: The Cox proportional hazards models identified 104 proteins associated with bladder cancer risk, from which 20 feature proteins were retained after algorithmic selection. The stacked ensemble model integrating clinical risk factors and feature proteins achieved an area under the receiver operating characteristic (ROC) curve of 0.788 for predicting bladder cancer risk. Conclusion: Multiple plasma proteins are important predictors of bladder cancer risk, and a stacked ensemble model that combines them with clinical risk factors can effectively predict the risk of developing bladder cancer.
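Two steps named in the abstract are easy to state concretely: keeping only the proteins chosen by both selection algorithms, and scoring predictions with the area under the ROC curve. The sketch below illustrates both in plain Python and is not the authors' implementation. The protein names, labels, and scores are made-up placeholders, and the AUC is computed through its rank-based (Mann-Whitney) interpretation rather than with any particular library.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def shared_features(selected_a: Iterable[str], selected_b: Iterable[str]) -> list[str]:
    """Return the features chosen by both selectors, sorted for reproducibility."""
    return sorted(set(selected_a) & set(selected_b))


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen case scores higher than a
    randomly chosen control, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical protein names, not the 20 proteins reported in the study.
rf_selected = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]
boruta_selected = ["PROT_B", "PROT_D", "PROT_E"]
common = shared_features(rf_selected, boruta_selected)

# Toy labels (1 = case, 0 = control) and made-up predicted risks.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this reading, the reported AUC of 0.788 means that a randomly chosen case receives a higher predicted risk than a randomly chosen control about 79% of the time. The pairwise loop is quadratic in the number of samples, so at cohort scale a sort-based or library implementation would be the practical choice.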

     
