Journal of Agricultural Science and Technology ›› 2020, Vol. 22 ›› Issue (6): 81-90. DOI: 10.13304/j.nykjdb.2019.0900

Study on Estimation Model of Eucalyptus Accumulation in Guangxi Based on Decision Tree Integrated Learning

LI Xiaowei1, WU Baoguo1*, SU Xiaohui1, CHEN Yuling1, PENG Yiqin2, YU Yonghui2, FAN Xiaohu2

  1. Forestry Information Institute, School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China; 2. Guangxi Gaofeng State-Owned Forest Farm, Nanning 530000, China
  • Received: 2019-10-30 Online: 2020-06-15 Published: 2020-02-08

Study on a Eucalyptus Stand Volume Prediction Model Based on Decision Tree Ensemble Learning

李晓伟1,吴保国1*,苏晓慧1,陈玉玲1,彭意钦2,于永辉2,范小虎2   

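  1. Forestry Information Institute, School of Information Science and Technology, Beijing Forestry University, Beijing 100083; 2. Gaofeng State-Owned Forest Farm of Guangxi Zhuang Autonomous Region, Nanning 530000
  • Corresponding author: *WU Baoguo (吴保国), Email: wubg@bjfu.edu.cn
  • About the author: LI Xiaowei (李晓伟), Email: lixiaowei93@126.com
  • Funding:
    National Key Research and Development Program of China (2017YFD0600906).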

Abstract: Stand volume is an important index for measuring the productivity of a subcompartment stand. Taking fast-growing eucalyptus at Gaofeng Forest Farm in Guangxi as the research object, this study used two stand factors (age and density) and site factors such as slope aspect, slope position, slope gradient and soil as independent variables, and volume per hectare as the dependent variable. Nine decision tree models were built with non-ensemble and ensemble learning methods, and the best model was selected to predict eucalyptus volume at different ages. The results showed that: ① the ensemble learning decision tree models were more accurate than the non-ensemble models, and the serial ensemble (boosting) models were more accurate than the parallel ensemble (bagging) models. Among the serial ensemble models, the XGboost model had the best evaluation indices: on the training set, R² was 0.81 and RMSE was 0.44; on the test set, RMSE was 0.48 and MAE was 0.34. ② In the optimal XGboost model, the independent variables with an importance share greater than 1% were, in order, age (78%), elevation (4.9%), soil thickness (3.8%) and density (3.2%). The importance of age was far higher than that of the other variables, vertical elevation had a greater influence than horizontal slope in spatial position, and planting density had less influence than the soil factors. ③ In a generalization test on the same species in other areas of Guangxi, the model reached an R² of 0.785 with a P value of 2.2E-16, which met the test standard. This indicates that the model predicts the productivity of fast-growing eucalyptus well in some areas of Guangxi and can provide a basis for predicting plantation harvest on forest farms.

Key words: subcompartment volume prediction model, ensemble learning, decision tree, XGboost

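Abstract (translated from Chinese): Stand volume is an important index for measuring the productivity of subcompartment stands. With fast-growing eucalyptus at Gaofeng Forest Farm in Guangxi as the research object, two stand factors (age and density) and site factors such as slope aspect, slope position, slope gradient and soil were taken as independent variables and volume per hectare as the dependent variable. Nine decision tree models were built with non-ensemble and ensemble learning methods, and the optimal decision tree model was selected to predict eucalyptus volume at different ages. The results showed that: ① ensemble learning decision tree models were more accurate than non-ensemble models, and serial ensemble (boosting) models were more accurate than parallel ensemble (bagging) models; among the serial ensemble models, XGboost had the best evaluation indices, with a training set R² of 0.81 and RMSE of 0.44, and a test set RMSE of 0.48 and MAE of 0.34. ② In the optimal XGboost model, the independent variables with an importance share greater than 1% were, in order, age (78%), elevation (4.9%), soil thickness (3.8%) and density (3.2%); the importance of age was far higher than that of the other variables, vertical elevation had a greater influence than horizontal slope in spatial position, and planting density had less influence than the soil factors. ③ In a generalization test on the same species in other areas of Guangxi, the model reached an R² of 0.785 with a P value of 2.2E-16, meeting the test standard. This shows that the model gives good productivity predictions for fast-growing eucalyptus in some areas of Guangxi and can provide a basis for predicting plantation harvest on forest farms.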

Key words (translated from Chinese): subcompartment volume prediction model, ensemble learning, decision tree, XGboost
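
The evaluation indices named in the abstract (R², RMSE, MAE) and the "importance share greater than 1%" filter can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the 1% threshold argument, and all numbers below are toy assumptions chosen only to show the standard definitions, and none of the values come from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def importance_shares(raw, threshold=0.01):
    """Normalize raw importance scores to shares of the total and keep
    only variables above the threshold, sorted from largest to smallest."""
    total = sum(raw.values())
    shares = {name: score / total for name, score in raw.items()}
    kept = {name: s for name, s in shares.items() if s > threshold}
    return dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))

# Toy data (hypothetical, NOT from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
raw_importance = {"age": 8.0, "elevation": 1.0, "density": 0.95, "aspect": 0.05}

print("RMSE:", rmse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
print("Importance shares > 1%:", importance_shares(raw_importance))
```

In practice these indices would be computed separately on the training set and the test set, as the abstract reports, so that a gap between the two can reveal overfitting.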