
Human Action Recognition in Deep Learning Aspect: Review Article

Tanakon Sawanglok, Pokpong Songmuang

Abstract


Human action recognition is the capability of a computer, through a trained model, to predict what action a person is performing, using video data of the current activity. It can be applied in many fields, such as security, entertainment, and facilitation, and research in this area is changing rapidly. This review article therefore gathers research on human action recognition that uses high-performing deep learning techniques which have been extended in subsequent research or adopted in industry. It presents the challenges of action recognition, the problems addressed by each study, the techniques and architectures used, and the limitations of the research.
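As an illustrative aside, and not drawn from the article itself: the deep learning models surveyed in this review typically map a short clip of video frames to scores over action classes. A minimal sketch of such a classifier is given below, written in PyTorch in the spirit of 3D-convolutional models such as C3D; the class name Tiny3DCNN, the layer sizes, and the 16-frame 112x112 input are illustrative assumptions rather than the architecture of any surveyed paper.

import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Toy 3D-CNN action classifier (illustrative only).

    Input clips have shape (batch, channels=3, frames, height, width);
    the 3D convolutions mix spatial and temporal information.
    """

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # spatio-temporal convolution
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                 # halve frames, height, width
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                     # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        x = self.features(clip).flatten(1)
        return self.classifier(x)                        # per-class action scores

# A single 16-frame 112x112 RGB clip, a common input size in this literature.
model = Tiny3DCNN(num_classes=10)
clip = torch.randn(1, 3, 16, 112, 112)
print(model(clip).shape)  # torch.Size([1, 10])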


Keywords




DOI: 10.14416/j.kmutnb.2022.07.013

ISSN: 2985-2145