A/V alignment and segmentation

Dolby AC-4 can adapt its frame rate to match commonly used video frame rates (for example, 23.976, 25, and 29.97 fps). Therefore, Dolby AC-4 frames and corresponding video access units can maintain temporal alignment so long as the same frame rate is used in both the audio and video encoders.
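The effect of matching frame rates can be checked with exact rational arithmetic. The following is a minimal sketch, not part of any Dolby tool: the function names are hypothetical, and the 1024-sample frame at 48 kHz is an assumed fixed frame length included only for contrast.

```python
from fractions import Fraction
from math import floor

# Exact rational values for the nominal rates quoted above.
VIDEO_RATES = {
    "23.976": Fraction(24000, 1001),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
}

def frame_starts(rate, count):
    """Start time, in seconds, of each of the first `count` frames."""
    return [Fraction(i) / rate for i in range(count)]

def max_offset(video_rate, audio_rate, video_frames=1000):
    """Largest distance from a video frame start back to the audio
    frame start at or before it (0 means every boundary coincides)."""
    worst = Fraction(0)
    for t in frame_starts(video_rate, video_frames):
        idx = floor(t * audio_rate)           # audio frame containing t
        worst = max(worst, t - Fraction(idx) / audio_rate)
    return worst

# Same rate in both encoders: every frame boundary coincides.
same_rate = {name: max_offset(r, r) for name, r in VIDEO_RATES.items()}

# Assumed fixed frame length (1024 samples at 48 kHz) against 25 fps video:
# boundaries no longer coincide.
fixed_rate = max_offset(Fraction(25), Fraction(48000, 1024))
```

With identical rates the offset is zero for every frame, which is the property the paragraph above relies on.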

Temporal alignment of audio and video is especially important at segment boundaries, and particularly at the end of a segment, when a seamless transition is required for use cases such as ad insertion. Without it, the transition introduces artifacts in the form of audio gaps or additional audio.

Dolby AC-4 I‑frames should be temporally aligned with the I‑frames of the video to enable seamless switching. Most importantly, the first I‑frames in the video and audio segments should be temporally aligned. Subsequent I‑frames within the corresponding segments do not need to be aligned.

In the following three figures, the segment length is two seconds. For the first audio and video pair, the I‑frame intervals of both the video and audio streams are one second. The I‑frames, frame rates, and segment sizes are all aligned. For the second and third audio and video pairs, the I‑frame intervals for the video streams are two seconds, while the I‑frame intervals for the audio streams are one second. The second I‑frame in an audio segment is not aligned with any video I‑frame. All three examples are suitable because the first I‑frames of the video and audio segments are temporally aligned.

Figure 1: Alignment of I-frame intervals, frame rates, and segment sizes


Figure 2: Alignment of frame rates and segment sizes


Figure 3: Alignment of segment sizes
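The rule illustrated by these figures can be expressed as a small check. This is a sketch under simplifying assumptions: constant I‑frame intervals, both streams starting at time zero, and segments cut on a fixed grid. The function names are hypothetical.

```python
from fractions import Fraction

def iframe_times(interval, total):
    """I-frame timestamps in [0, total) for a constant I-frame interval."""
    times, t = [], Fraction(0)
    while t < total:
        times.append(t)
        t += interval
    return times

def first_iframes_aligned(segment_len, video_interval, audio_interval, segments=3):
    """True if, in every segment, the first video I-frame and the first
    audio I-frame fall on the same timestamp."""
    total = segment_len * segments
    video = iframe_times(Fraction(video_interval), total)
    audio = iframe_times(Fraction(audio_interval), total)
    for n in range(segments):
        start, end = n * segment_len, (n + 1) * segment_len
        v_first = next((t for t in video if start <= t < end), None)
        a_first = next((t for t in audio if start <= t < end), None)
        if v_first is None or a_first is None or v_first != a_first:
            return False
    return True

# Two-second segments, as in the figures above.
case_1s_1s = first_iframes_aligned(2, 1, 1)   # video 1 s, audio 1 s
case_2s_1s = first_iframes_aligned(2, 2, 1)   # video 2 s, audio 1 s

# Hypothetical counter-example: a 1.5 s audio I-frame interval drifts
# off the 2 s segment grid, so the second segment starts misaligned.
case_drift = first_iframes_aligned(2, 2, Fraction(3, 2))
```

The first two cases pass because only the first I‑frame of each segment is compared. The third fails as soon as an audio I‑frame no longer lands on a segment boundary.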


The following figure shows an example where the first I‑frames of video segment n+1 and audio segment n+1 are not aligned; this use case is therefore not recommended.

Figure 4: I-frames in segment n+1 are not aligned


If frame alignment cannot be achieved, or if the target segment duration is not an integer multiple of the Dolby AC-4 I‑frame interval, Dolby AC-4 segment durations are allowed to fluctuate to maintain close alignment with video segments or the target segment duration. The following figure shows an example where a Dolby AC-4 segment (segment n+1 in the figure) is shorter by one Dolby AC-4 I‑frame interval to maintain close segment alignment.

Note: The actual segment length must not exceed the target duration by more than 0.5 seconds.

Figure 5: Variable Dolby AC-4 segment durations to maintain segment alignment
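One way to arrive at such fluctuating durations is to snap each ideal segment boundary to the nearest audio I‑frame and then verify the 0.5 second limit from the note above. The following is a minimal sketch of that idea, not the method used by any particular packager. The function names and the 0.8 second I‑frame interval are illustrative assumptions.

```python
from fractions import Fraction
from math import floor

def plan_audio_segments(target, iframe_interval, count):
    """Return `count` audio segment durations, each a whole number of
    I-frame intervals, with boundaries kept close to multiples of `target`."""
    target = Fraction(target)
    interval = Fraction(iframe_interval)
    boundaries = []
    for n in range(count + 1):
        ideal = n * target
        # Index of the I-frame nearest to the ideal boundary (ties round up).
        k = floor(ideal / interval + Fraction(1, 2))
        boundaries.append(k * interval)
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

def within_tolerance(durations, target, tolerance=Fraction(1, 2)):
    """True if no segment exceeds the target duration by more than `tolerance`."""
    return all(d <= Fraction(target) + tolerance for d in durations)

# Assumed values: 2 s target, 0.8 s audio I-frame interval (passed as a
# string so that Fraction parses it exactly).
durations = plan_audio_segments(2, "0.8", 4)
ok = within_tolerance(durations, 2)
```

With these assumed values the durations alternate between 2.4 and 1.6 seconds, so neighbouring segments differ by exactly one I‑frame interval and the boundaries do not drift over time. The sketch does not handle an I‑frame interval longer than the target duration.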


In an HLS playlist, the #EXT-X-TARGETDURATION tag specifies the target segment duration in whole seconds. The #EXTINF tag indicates the actual duration of each segment, as shown in the following example.

#EXTM3U
#EXT-X-TARGETDURATION:8
#EXT-X-VERSION:7
#EXT-X-MEDIA-SEQUENCE:1
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-MAP:URI="main.mp4",BYTERANGE="1118@0"
#EXTINF:7.98333,	
#EXT-X-BYTERANGE:1700094@1118
main.mp4
#EXTINF:8.00000,	
#EXT-X-BYTERANGE:1789481@1701212
main.mp4
#EXTINF:8.00000,	
#EXT-X-BYTERANGE:1777588@3490693
main.mp4
#EXTINF:8.00000,	
#EXT-X-BYTERANGE:1752144@5268281
main.mp4
#EXTINF:7.26667,	
#EXT-X-BYTERANGE:1563219@7020425
main.mp4
#EXTINF:8.00000,	
#EXT-X-BYTERANGE:1801953@8583644