QIU Weirong, WU Bangyu, PAN Xueshu, TANG Yaming. Application of Several Cluster-optimization-based Machine Learning Methods in Evaluation of Landslide Susceptibility in Lingtai County[J]. Northwestern Geology, 2020, 53(1): 222-233
Application of Several Cluster-optimization-based Machine Learning Methods in Evaluation of Landslide Susceptibility in Lingtai County
Received: 2019-10-02  Revised: 2019-11-05
DOI:10.19751/j.cnki.61-1149/p.2020.01.021
Keywords: landslide; Lingtai County; landslide susceptibility evaluation; machine learning methods
Funding: China Postdoctoral Science Foundation (2016M600780) and the Fundamental Research Funds for the Central Universities (xjj2018260)
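Author affiliations
QIU Weirong  School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049
WU Bangyu  School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049
PAN Xueshu  Xi'an Xinshu Qili Information Technology Co., Ltd., Xi'an, Shaanxi 710065
TANG Yaming  Xi'an Center of China Geological Survey / Northwest China Center for Geoscience Innovation, Xi'an, Shaanxi 710054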
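Chinese abstract (translated):
      Taking Lingtai County, Pingliang City, Gansu Province, as the study area, we built landslide susceptibility evaluation models from geospatial and historical landslide data using four machine learning models optimized by Gaussian mixture model (GMM) clustering: logistic regression (LR), support vector machine (SVM), BP neural network, and random forest (RF). Seven environmental factors, namely elevation, slope, aspect, curvature, loess erosion intensity, normalized difference vegetation index (NDVI), and geological structure, were selected as landslide susceptibility influence factors. A geospatial database of these factors was built on a 30 m grid, dividing the study area into 1.8 million grid cells. The GMM was used to cluster the grid cells of the entire study area, giving a preliminary landslide susceptibility zoning. The grid cells in the cluster with the lowest susceptibility were taken as the non-landslide region, from which 500 cells were randomly selected as non-landslide units in each run; together with 203 known landslide grid cells taken from the historical landslide data as landslide units, these samples were used to build the four machine learning classification models. The trained models were then used to predict susceptibility over the entire study area, and the receiver operating characteristic (ROC) curve of each algorithm was plotted to compare their predictions. The results show that, in this study area, the landslide susceptibility zoning maps of all models are broadly consistent with the actual landslide distribution. The random forest model has the largest area under the ROC curve (AUC), 0.96, and the highest test-set accuracy, 0.93. The BP neural network ranks second, with an AUC of 0.90 and a test-set accuracy of 0.87. The AUCs and test-set accuracies of the SVM and logistic regression models are 0.86 and 0.81, and 0.85 and 0.80, respectively, both lower than those of the random forest and BP neural network models.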
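The non-landslide sampling step described above (cluster every grid cell with a GMM, treat the least-susceptible cluster as the non-landslide pool, then draw 500 cells from it) can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `fit_diag_gmm` and `sample_non_landslide` are hypothetical, the mixture uses diagonal covariances with farthest-point initialisation, factor values are assumed to be standardized beforehand, and "lowest susceptibility" is assumed to mean the cluster with the smallest share of known landslide cells, since the abstract does not state how clusters were ranked.

```python
import math
import random


def fit_diag_gmm(X, k, iters=50, seed=0):
    """Fit a k-component Gaussian mixture (diagonal covariances) by EM; return hard labels."""
    rng = random.Random(seed)
    d = len(X[0])
    # Farthest-point initialisation of the means (an assumption made for this sketch).
    means = [list(rng.choice(X))]
    while len(means) < k:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each cell (log-sum-exp).
        resp = []
        for x in X:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[t] - means[j][t]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            w = [math.exp(val - top) for val in logp]
            total = sum(w)
            resp.append([wi / total for wi in w])
        # M-step: re-estimate weights, means and variances (floored to avoid collapse).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(X)
            for t in range(d):
                mu = sum(r[j] * x[t] for r, x in zip(resp, X)) / nj
                var = sum(r[j] * (x[t] - mu) ** 2 for r, x in zip(resp, X)) / nj
                means[j][t] = mu
                variances[j][t] = max(var, 1e-6)
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def sample_non_landslide(X, landslide_idx, k=5, n=500, seed=0):
    """Draw up to n non-landslide cells from the cluster with the lowest landslide share."""
    labels = fit_diag_gmm(X, k, seed=seed)
    slides = set(landslide_idx)

    def landslide_share(c):
        members = [i for i, lab in enumerate(labels) if lab == c]
        return sum(i in slides for i in members) / len(members)

    # Only clusters that received at least one cell are ranked.
    lowest = min(sorted(set(labels)), key=landslide_share)
    pool = [i for i, lab in enumerate(labels) if lab == lowest and i not in slides]
    return random.Random(seed).sample(pool, min(n, len(pool)))


# Toy demo: two synthetic groups of standardized factor vectors (invented data).
rng = random.Random(1)
X = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(60)]
X += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(40)]
landslides = list(range(60, 70))  # hypothetical known landslide cells, all in group 2
picked = sample_non_landslide(X, landslides, k=2, n=20)
print(len(picked), max(picked) < 60)  # 20 True
```

In practice a library implementation such as scikit-learn's `sklearn.mixture.GaussianMixture` would replace the hand-written EM; the pure-Python version only makes the E- and M-steps explicit. Re-running `sample_non_landslide` with different seeds mimics drawing a fresh set of 500 non-landslide cells for each run.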
English abstract:
      This paper takes Lingtai County, Pingliang City, Gansu Province, as the study area. Based on geospatial and historical landslide data, four machine learning models optimized by Gaussian mixture model (GMM) clustering were used to construct landslide susceptibility evaluation models: a BP neural network model, a random forest classification model, a support vector machine model, and a logistic regression model. Seven factors were selected as landslide susceptibility influence factors: elevation, slope, aspect, curvature, loess erosion intensity, vegetation coverage (NDVI), and geological structure. The geospatial database of influence factors was established on a 30 m grid. The study area was divided into 1.8 million grid cells, and the grid cells of the whole area were clustered with the GMM to obtain a preliminary landslide susceptibility zoning. In each run, 500 grid cells were randomly selected from the lowest-susceptibility cluster as non-landslide units, and 203 grid cells identified from historical landslide data were used as landslide units. The trained models were used to predict susceptibility over the whole study area, and the ROC curve of each algorithm was drawn to compare their predictions. The results show that the landslide susceptibility map of each algorithm is broadly consistent with the actual distribution of landslides. The random forest model has the largest area under the ROC curve (AUC), 0.96, and the highest prediction accuracy, 0.93. It is followed by the BP neural network model, with an AUC of 0.89 and a prediction accuracy of 0.87. The AUC and prediction accuracy are 0.86 and 0.81 for the support vector machine model and 0.85 and 0.80 for the logistic regression model, respectively; both are lower than those of the first two models.
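The models are compared by ROC curve, AUC and test-set accuracy. A minimal pure-Python sketch of those three metrics follows. The function names are hypothetical, accuracy assumes a 0.5 decision threshold on the predicted landslide probability (the abstract does not state the threshold), and the six labels and scores are invented for illustration only.

```python
def roc_curve(y_true, scores):
    """ROC points (fpr, tpr), lowering the threshold through each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Absorb every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2 for i in range(1, len(fpr)))


def accuracy(y_true, scores, threshold=0.5):
    """Share of cells whose thresholded score matches the label (1 = landslide)."""
    return sum((s >= threshold) == (y == 1) for y, s in zip(y_true, scores)) / len(y_true)


# Toy test-set labels and predicted landslide probabilities (invented for illustration).
y = [1, 1, 0, 1, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve(y, p)
print(round(auc(fpr, tpr), 3), round(accuracy(y, p), 3))  # 0.889 0.833
```

Sorting once and sweeping the threshold keeps the cost at O(n log n), which matters when scoring a grid of 1.8 million cells. For real work, `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` compute the same quantities.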