Enhancing action recognition of construction workers using data-driven scene parsing

Jun Yang


Vision-based action recognition of construction workers has attracted increasing attention for its diverse applications. Although previous studies have achieved state-of-the-art performance using spatial-temporal features, considerable challenges remain on cluttered and dynamic construction sites. Because workers' actions are closely related to various construction entities, this paper proposes a novel system for enhancing action recognition using semantic information. A data-driven scene parsing method, named label transfer, is adopted to recognize construction entities across the entire scene. A probabilistic model of actions with context is established. Worker actions are first classified using dense trajectories, and the classification is then refined using construction object recognition. Experimental results on a comprehensive dataset show that the proposed system outperforms the baseline algorithm by 10.5%. The paper provides a new way to integrate semantic information globally, rather than through conventional object detection, which can only depict local context. The proposed system is especially suitable for construction sites, where semantic information is rich, from local objects to global surroundings. Compared with other methods that use object detection to integrate context information, it is easy to implement, requires no tedious training or parameter tuning, and scales with the number of recognizable objects.
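The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of the general idea of refining action scores with scene context. It assumes a naive Bayes style fusion in which objects are conditionally independent given the action; the function name, the action and object labels, and all probability values are hypothetical and are not taken from the paper.

```python
from math import prod


def rescore_actions(action_probs, objects_present, p_object_given_action):
    """Refine action probabilities with scene context (illustrative only).

    action_probs: initial probability per action, e.g. from a motion-based
        classifier such as one built on dense trajectories.
    objects_present: one boolean per object class, e.g. from scene parsing.
    p_object_given_action: one row per action; each entry is an assumed
        P(object is present | action).
    Assumption: objects are conditionally independent given the action.
    """
    scores = []
    for prior, row in zip(action_probs, p_object_given_action):
        # Likelihood of the observed present/absent pattern under this action.
        context = prod(p if seen else 1.0 - p
                       for p, seen in zip(row, objects_present))
        scores.append(prior * context)
    total = sum(scores)
    if total == 0:
        raise ValueError("context rules out every action")
    # Normalize so the refined scores sum to one.
    return [s / total for s in scores]


# Hypothetical example: two actions, two object classes.
actions = ["bricklaying", "plastering"]
initial = [0.5, 0.5]                 # motion features alone are ambiguous
present = [True, False]              # e.g. "brick" seen, "trowel board" not
likelihood = [[0.9, 0.1],            # P(object | bricklaying)
              [0.2, 0.8]]            # P(object | plastering)
refined = rescore_actions(initial, present, likelihood)
best_action = actions[refined.index(max(refined))]
```

In this toy setting the motion-based scores are tied, and the observed objects break the tie in favour of the action they are more compatible with.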

Keywords: worker, action recognition, scene parsing, computer vision, context

How to Cite
Yang, J. (2018). Enhancing action recognition of construction workers using data-driven scene parsing. Journal of Civil Engineering and Management, 24(7), 568-580.
Published in Issue
Nov 19, 2018
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

