1. Probabilistic evaluation of cultural soil heritage hazards in China from extremely imbalanced site investigation data using SMOTE-Gaussian process classification.
- Author
-
Song, Chao; Peng, Hongzhen; Xu, Ling; Zhao, Tengyuan; Guo, Zhiqian; Chen, Wenwu
- Subjects
-
*CULTURAL property, *GAUSSIAN processes, *HAZARDS, *HUMIDITY, *SOIL classification, *MACHINE learning
- Abstract
• A SMOTE approach was proposed to handle imbalanced data on hazard levels.
• SMOTE was integrated with a Gaussian process for the prediction of hazard levels.
• A real-world example is used to illustrate and validate the proposed approach.
• Results from different data generation approaches are compared with those of SMOTE.
• Results from different machine learning methods are compared with those of GP.
Cultural soil heritages (CSHs) are artifacts with historical, artistic, and scientific significance; however, they are vulnerable to various hazards, such as weathering, fractures, hollowing, collapses, and gullies. This is especially true for CSHs exposed to the outdoors. Because of the large number of CSH sites within China, managing and protecting these heritages through detailed on-site investigations is time-consuming and expensive. Consequently, evaluating the spatial distribution and degree of hazards developed across all of these heritages is impractical. To address this issue, this paper developed a Gaussian process classification (GPC) method to predict the spatial distribution of typical hazards (i.e., weathering, fractures, hollowing, collapses, and gullies) and the development level of each hazard from eight environmental factors (e.g., annual relative humidity and annual sunshine time) and a limited amount of investigation data. Because the investigation data for the different levels of each hazard are usually imbalanced and sparse, this study combined a synthetic minority oversampling technique (SMOTE) with GPC to form the SMOTE-GPC method. A real-world example is used to illustrate this approach. Results from real-world data demonstrated that the proposed method achieved F1 score, precision, recall, and Cohen's kappa values all greater than 0.93 on both the training and testing datasets, indicating good performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
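The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the generic SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), and the function name `smote`, the parameter values, and the two-feature data (relative humidity, sunshine hours) are all made up for illustration. The classification step is not shown; in practice a balanced dataset like this one could be passed to a Gaussian process classifier such as scikit-learn's `GaussianProcessClassifier`.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical feature vectors (relative humidity, annual sunshine hours).
# These values are invented for the example, not taken from the paper.
majority = [(0.45, 2900.0), (0.48, 2850.0), (0.50, 2800.0),
            (0.52, 2750.0), (0.47, 2950.0), (0.49, 2820.0)]
minority = [(0.70, 1900.0), (0.72, 1850.0), (0.68, 2000.0)]

# Generate enough synthetic minority samples to match the majority class.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because each synthetic point is a convex combination of two real minority samples, it always lies on the segment between them, so the oversampled class stays inside the region already covered by the observed minority data.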