Immersive multimedia content delivery is becoming increasingly popular due to the spread of Head Mounted Displays. In particular, omnidirectional video streaming is gaining ground among video delivery platforms. Delivering 360° video content over the Internet requires much more bandwidth than classic 2D video. To reduce bandwidth consumption, the tiling technique splits the video into smaller portions (tiles), so that tiles falling outside the user's viewport are encoded at a low resolution whereas those inside the viewport are encoded at a higher resolution. This is possible only when the user's future viewports are known in advance, so a reliable prediction of future viewports is required. In this work, we show that users tend to explore the environment at the beginning of the video and then focus on one of the regions that attract the most attention (Points of Interest). This insight is helpful when designing viewport-adaptive streaming techniques. On this basis, we propose a viewport prediction approach that combines Long Short-Term Memory (LSTM) networks with the classic naive technique. Preliminary simulation tests show promising results.
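The tiling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 8x4 equirectangular tile grid, the 100° x 90° field of view, the (yaw, pitch) angle convention in degrees, and all function names are assumptions made for illustration. The "naive" predictor is read here as holding the last observed viewport center, which is also an assumption; the LSTM component is not shown.

```python
def naive_predict(history):
    """Naive baseline (assumed): the future viewport center equals the last observed one."""
    return history[-1]


def tiles_in_viewport(center_yaw, center_pitch, cols=8, rows=4,
                      fov_h=100.0, fov_v=90.0):
    """Return the (row, col) tiles whose centers fall inside the viewport.

    Yaw is in [-180, 180) degrees and pitch in [-90, 90] degrees
    (hypothetical convention). Tiles form a regular grid over an
    equirectangular frame.
    """
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows
    selected = set()
    for r in range(rows):
        for c in range(cols):
            tile_yaw = -180.0 + (c + 0.5) * tile_w
            tile_pitch = 90.0 - (r + 0.5) * tile_h
            # Wrap the yaw difference into [-180, 180) so viewports that
            # straddle the +/-180 degree seam are handled correctly.
            d_yaw = (tile_yaw - center_yaw + 180.0) % 360.0 - 180.0
            d_pitch = tile_pitch - center_pitch
            if abs(d_yaw) <= fov_h / 2 and abs(d_pitch) <= fov_v / 2:
                selected.add((r, c))
    return selected


def assign_quality(center, cols=8, rows=4):
    """Map every tile to 'high' (inside the predicted viewport) or 'low'."""
    visible = tiles_in_viewport(center[0], center[1], cols=cols, rows=rows)
    return {
        (r, c): "high" if (r, c) in visible else "low"
        for r in range(rows)
        for c in range(cols)
    }


if __name__ == "__main__":
    # Hypothetical head-orientation trace: (yaw, pitch) samples in degrees.
    trace = [(0.0, 0.0), (10.0, 5.0), (170.0, 0.0)]
    quality = assign_quality(naive_predict(trace))
    print(sorted(t for t, q in quality.items() if q == "high"))
```

With the sample trace, the predicted center sits near the ±180° seam, so the high-resolution tiles come from both the first and the last column of the grid. A learned predictor could replace `naive_predict` without changing the quality assignment step.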
LSTM-based Viewport Prediction for Immersive Video Systems / Manfredi, G.; Racanelli, V. A.; De Cicco, L.; Mascolo, S. - (2023), pp. 49-52. (Paper presented at the 21st Mediterranean Communication and Computer Networking Conference, MedComNet 2023, held in Italy in 2023) [10.1109/MedComNet58619.2023.10168847].
LSTM-based Viewport Prediction for Immersive Video Systems
Manfredi G.;Racanelli V. A.;De Cicco L.;Mascolo S.
2023-01-01