Real-Time Speech-to-Text on Edge: A Prototype System for Ultra-Low Latency Communication with AI-Powered NLP / Di Leo, Stefano; De Cicco, Luca; Mascolo, Saverio. In: Information, ISSN 2078-2489 (electronic), vol. 16, no. 8 (2025). DOI: 10.3390/info16080685

Real-Time Speech-to-Text on Edge: A Prototype System for Ultra-Low Latency Communication with AI-Powered NLP

Di Leo, Stefano; De Cicco, Luca; Mascolo, Saverio
2025

Abstract

This paper presents a real-time speech-to-text (STT) system designed for edge computing environments that require ultra-low latency and local processing. Unlike cloud-based STT services, the proposed solution runs entirely on local infrastructure, which preserves user privacy and delivers high performance in bandwidth-limited or offline scenarios. The system combines browser-native audio capture via WebRTC, real-time streaming over WebSocket, and offline automatic speech recognition (ASR) using the Vosk engine. A natural language processing (NLP) component, implemented as a microservice, refines the transcriptions to improve spelling accuracy and clarity. Our prototype achieves sub-second end-to-end latency and strong transcription performance under realistic conditions. Furthermore, its modular architecture supports extension, the integration of more advanced AI models, and domain-specific adaptation.
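To make the streaming stage more concrete, the Python sketch below shows one way raw audio might be split into fixed-duration chunks before being sent over a WebSocket to the recognizer. It assumes 16 kHz, 16-bit mono PCM and 100 ms chunks; these parameters and the helper names `chunk_size_bytes` and `pcm_chunks` are illustrative assumptions, not details taken from the paper.

```python
from typing import Iterator


def chunk_size_bytes(sample_rate: int = 16000, sample_width: int = 2,
                     channels: int = 1, chunk_ms: int = 100) -> int:
    """Bytes in one chunk of raw PCM audio lasting chunk_ms milliseconds.

    Defaults are assumed values (16 kHz, 16-bit, mono, 100 ms), chosen only
    for illustration.
    """
    frame_bytes = sample_width * channels          # bytes per sample frame
    frames = sample_rate * chunk_ms // 1000        # sample frames per chunk
    return frames * frame_bytes                    # always a whole number of frames


def pcm_chunks(pcm: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive fixed-size chunks; the final chunk may be shorter."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


if __name__ == "__main__":
    size = chunk_size_bytes()
    one_second = bytes(16000 * 2)                  # 1 s of 16 kHz, 16-bit mono silence
    chunks = list(pcm_chunks(one_second, size))
    print(size, len(chunks))                       # 3200 10
```

On the server side, each received chunk would typically be passed to Vosk's `KaldiRecognizer.AcceptWaveform`, with partial hypotheses read via `PartialResult()` and finalized segments via `Result()`. The chunk duration is a trade-off: shorter chunks reduce the buffering delay before recognition can begin, at the cost of more messages per second.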
File in this item: 2025_Real-Time_Speech-to-Text_on_Edge_pdfeditoriale.pdf (publisher's version, open access, Creative Commons license; Adobe PDF, 724.39 kB)


Use this identifier to cite or link to this document: https://hdl.handle.net/11589/293902