Safety risk evaluations of deep foundation construction schemes based on imbalanced data sets
Safety risk evaluation of deep foundation construction schemes is essential to construction safety. However, such evaluation draws on a large body of knowledge, and the historical data from deep foundation engineering are imbalanced. Both factors degrade the quality and efficiency of traditional manual evaluation. Machine learning can improve the quality of imbalanced data classification. In this study, three strategies are proposed to improve the classification accuracy of imbalanced data sets. First, information redundancy in the data set is reduced using a binary particle swarm optimization algorithm. Then, the classification algorithm is modified by using an AdaBoost-enhanced support vector machine classifier. Finally, a different classification evaluation criterion, namely, the area under the ROC curve, is adopted so that the classifier is not biased against the minority class. A comparative experiment across multiple classification algorithms shows that the proposed integrated classification algorithm can overcome the difficulty of correctly classifying minority samples in imbalanced data sets. The algorithm can also improve construction safety management evaluations, relieve the pressure caused by the shortage of experienced experts amid rapid infrastructure construction, and facilitate knowledge reuse in the field of architecture, engineering, and construction.
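To illustrate why the area under the ROC curve is a fairer criterion than plain accuracy on imbalanced data, the following minimal sketch compares two hypothetical classifiers on a toy data set. The data, scores, and function names are assumptions made for illustration only and are not taken from the study. The AUC is computed here with the standard rank interpretation: the probability that a randomly chosen minority sample receives a higher score than a randomly chosen majority sample.

```python
def accuracy(labels, predictions):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 18 majority samples (label 0) and 2 minority
# samples (label 1), mimicking an imbalanced set of evaluated schemes.
labels = [0] * 18 + [1] * 2

# Classifier A (assumed): always predicts the majority class.
scores_a = [0.1] * 20
preds_a = [0] * 20

# Classifier B (assumed): ranks the minority samples high, at the cost
# of three false alarms on majority samples.
scores_b = [0.1] * 15 + [0.6, 0.7, 0.8] + [0.65, 0.9]
preds_b = [1 if s >= 0.5 else 0 for s in scores_b]

acc_a, auc_a = accuracy(labels, preds_a), auc(labels, scores_a)
acc_b, auc_b = accuracy(labels, preds_b), auc(labels, scores_b)

print(f"A: accuracy={acc_a:.2f}, AUC={auc_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, AUC={auc_b:.2f}")
```

In this toy case, classifier A reaches the higher accuracy while never detecting a minority sample, whereas classifier B scores lower on accuracy but clearly higher on AUC. This is the kind of bias toward the majority class that an AUC-based criterion is meant to expose.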
This work is licensed under a Creative Commons Attribution 4.0 International License.