Speaker: Bruce X.B. Yu, Post-doc Fellow, Department of Computing, Hong Kong Polytechnic University
Time: 8:30 - 9:30, July 28, 2023 (Beijing Time)
Location: Online
Abstract
This talk will introduce recent advances in human action recognition (HAR) in RGB-D videos and its future directions. Because it applies to a broad range of scenarios such as smart cities, healthcare, and manufacturing, HAR in RGB-D videos has been widely investigated since the release of the affordable Microsoft Kinect v2 sensor. Currently, unimodal approaches (e.g., skeleton-based and RGB video-based) have realized substantial improvements on increasingly larger datasets. However, multimodal methods still struggle to achieve good performance because effective fusion schemes are lacking. In this talk, a model-based multimodal network (MMNet) that fuses the skeleton and RGB modalities will be introduced. The objective of MMNet is to improve ensemble recognition accuracy by effectively exploiting mutually complementary information from different data modalities. MMNet has achieved competitive results on five benchmark datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, Northwestern-UCLA Multiview, and Toyota Smarthome. Based on these results, future research directions for HAR will be discussed.
Biography
Dr. Bruce X.B. Yu is a Post-doc Fellow with the Department of Computing at the Hong Kong Polytechnic University, where he obtained his Ph.D. in 2020. His areas of research expertise include big data analytics, artificial intelligence, and image/video processing. His main research topic is vision-based human behavior understanding, which has led to publications in top venues such as TPAMI, Pattern Recognition, ICCV, AAAI, and IJCAI. He has also been actively conducting interdisciplinary research through various application-oriented projects, with publications in journals such as IEEE Transactions on Automation Science and Engineering and International Journal of Contemporary Hospitality Management. Besides application-driven research, he works on fundamental research problems such as 3D human pose reconstruction, multimodal data/sensor fusion, and efficient transfer learning.