Multi-modal temporal action segmentation for manufacturing scenarios / Romeo, Laura; Marani, Roberto; Perri, Anna Gina. - In: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE. - ISSN 0952-1976. - STAMPA. - 148:(2025), pp. 110320.1-110320.13.

Multi-modal temporal action segmentation for manufacturing scenarios

Laura Romeo (Conceptualization); Roberto Marani (Methodology); Anna Gina Perri (Writing – Review & Editing)

2025-01-01

Abstract

Industrial robots have become prevalent in manufacturing due to their accuracy, speed, and ability to reduce operator fatigue. Nevertheless, human operators still play a crucial role on primary production lines. This study focuses on the temporal segmentation of human actions, aiming to identify the physical and cognitive behavior of operators working alongside collaborative robots. While the existing literature explores temporal action segmentation datasets, evaluations on manufacturing tasks are lacking. This work assesses six state-of-the-art action segmentation models on the Human Action Multi-Modal Monitoring in Manufacturing (HA4M) dataset, in which subjects assemble an industrial object in realistic manufacturing scenarios. By employing Cross-Subject and Cross-Location protocols, the study not only demonstrates the effectiveness of these models in industrial settings but also introduces a new benchmark for evaluating generalization across different subjects and locations. The evaluation further includes new videos recorded in simulated industrial locations, assessed with both fully and semi-supervised learning approaches. The findings reveal that the Multi-Stage Temporal Convolutional Network++ (MS-TCN++) and the Action Segmentation Transformer (ASFormer) architectures achieve high performance in both supervised and semi-supervised settings, including on the new data, particularly when trained with skeletal features, advancing the capabilities of temporal action segmentation in real-world manufacturing environments. This research lays the foundation for addressing video activity understanding challenges in manufacturing and presents opportunities for future investigation.
2025
https://authors.elsevier.com/sd/article/S0952-1976(25)00320-3
Files associated with this item:
No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11589/284820