A/V alignment and segmentation

Some use cases, such as ad insertion, require seamless transitions that do not introduce artifacts through audio gaps or additional audio. To achieve this, it is important that audio and video I‑frames are aligned at the beginning of segments, and that video and audio segment durations are compatible so that the segment ends are also aligned in time.

Dolby AC-4 can adapt its frame rate to match commonly used video frame rates (for example, 23.976, 25, and 29.97 fps). Therefore, Dolby AC-4 frames and corresponding video access units can maintain temporal alignment as long as the same frame rate is used in both the audio and video encoders.

To enable seamless switching, Dolby AC-4 I‑frames should be temporally aligned with the I‑frames of the video. Most importantly, the first I‑frames in the video and audio segments should be temporally aligned. Subsequent I‑frames in the corresponding segments do not need to be aligned.

In the following three figures, the segment length is two seconds. The I‑frame interval here indicates the number of frames between two I‑frames, similar to the GOP length for video streams. For the first audio and video pair, the I‑frame intervals of both the video and audio streams are one second. The I‑frames, frame rates, and segment sizes are all aligned. For the second and third audio and video pairs, the I‑frame intervals for the video streams are two seconds, while the I‑frame intervals for the audio streams are one second. The second I‑frame in an audio segment is not aligned with any video I‑frame. All three examples are suitable because the first I‑frames in the video and audio segments are temporally aligned.

Figure 1: Alignment of I-frame intervals, frame rates, and segment sizes

Figure 2: Alignment of frame rates and segment sizes

Figure 3: Alignment of segment sizes

The following figure shows an example where the first I‑frames of video segment n+1 and audio segment n+1 are not aligned, so this use case is not recommended.

Figure 4: I-frames in segment n+1 not aligned
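The alignment rule illustrated in the preceding figures can be sketched as a small check. This is a minimal illustration only: the function name first_iframes_aligned and the data model (each segment given as a non-empty list of I‑frame presentation times in seconds) are assumptions made for this example, not part of any Dolby tool or API.

```python
from fractions import Fraction


def first_iframes_aligned(video_segments, audio_segments):
    """Return True if each video/audio segment pair starts at the same time.

    Hypothetical data model: every segment is a non-empty list of I-frame
    presentation times in seconds, in ascending order. Only the first
    I-frame of each segment is compared; later I-frames may differ.
    """
    if len(video_segments) != len(audio_segments):
        return False
    return all(v[0] == a[0] for v, a in zip(video_segments, audio_segments))


# Two-second segments: video I-frame interval 2 s, audio I-frame interval 1 s.
video = [[Fraction(0)], [Fraction(2)], [Fraction(4)]]
audio_ok = [[Fraction(0), Fraction(1)], [Fraction(2), Fraction(3)],
            [Fraction(4), Fraction(5)]]
# Second audio segment starts half a second late.
audio_bad = [[Fraction(0), Fraction(1)], [Fraction(5, 2), Fraction(7, 2)],
             [Fraction(4), Fraction(5)]]

print(first_iframes_aligned(video, audio_ok))   # True
print(first_iframes_aligned(video, audio_bad))  # False
```

Exact fractions are used instead of floats so that rates such as 29.97 fps do not produce spurious mismatches from rounding.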

If I‑frame alignment cannot be achieved, or if the target segment duration is not an integer multiple of the Dolby AC-4 I‑frame interval, Dolby AC-4 segment durations are allowed to fluctuate to maintain close alignment with video segments or the target segment duration timeline. The following figure shows an example where a Dolby AC-4 segment (segment n+1 in the figure) is shorter by one Dolby AC-4 I‑frame interval to maintain close segment alignment.

Figure 5: Variable Dolby AC-4 segment durations to maintain segment alignment
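One way to picture fluctuating segment durations is to place each audio segment boundary on the I‑frame closest to the target timeline. The sketch below assumes this nearest-I‑frame strategy and a constant I‑frame interval; both are assumptions for illustration, and the function name audio_segment_durations is hypothetical. A real packager may choose boundaries differently.

```python
from fractions import Fraction


def audio_segment_durations(target, iframe_interval, count):
    """Durations of `count` audio segments on a nearest-I-frame timeline.

    Assumption: the boundary that ends segment k is the I-frame nearest to
    k * target, with I-frames spaced exactly `iframe_interval` seconds apart.
    """
    target = Fraction(target)
    iframe_interval = Fraction(iframe_interval)
    boundaries = [Fraction(0)]
    for k in range(1, count + 1):
        # Index of the I-frame closest to the k-th target boundary.
        nearest = round(k * target / iframe_interval)
        boundaries.append(nearest * iframe_interval)
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


durations = audio_segment_durations(2, Fraction(9, 20), 4)
print([float(d) for d in durations])  # [1.8, 2.25, 1.8, 2.25]
```

With a two-second target and a 0.45-second I‑frame interval, the segments alternate between four and five I‑frame intervals, so the boundaries never drift far from the two-second timeline.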

If the SegmentTimeline element is used to reference segments, the segment timeline must signal accurate segment durations. Otherwise, you must ensure that AC-4 segments still have almost equal durations. The maximum duration deviation for a single segment must be within ±50% of the signaled segment duration (for example, as indicated by the @duration attribute). The maximum accumulated duration deviation over multiple segments must also be within ±50% of the signaled segment duration, as constrained in DASH-IF interoperability points. The following two figures illustrate the maximum duration deviation of a single segment and the maximum accumulated duration deviation, respectively.

Figure 6: Maximum duration deviation of a single segment

Figure 7: Maximum accumulated deviation
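The two ±50% limits can be checked with a short routine. This is a sketch under stated assumptions: the accumulated deviation is interpreted here as the running sum of (actual minus signaled) durations from the first segment onward, and the function name check_duration_deviation is hypothetical. Consult the DASH-IF interoperability points for the normative definition.

```python
from fractions import Fraction


def check_duration_deviation(durations, signaled, tolerance=Fraction(1, 2)):
    """Check single-segment and accumulated deviation against the limit.

    `durations` are actual segment durations in seconds; `signaled` is the
    signaled segment duration (for example, from @duration) in seconds.
    Returns (ok, message).
    """
    signaled = Fraction(signaled)
    limit = tolerance * signaled
    accumulated = Fraction(0)
    for index, duration in enumerate(durations):
        deviation = Fraction(duration) - signaled
        if abs(deviation) > limit:
            return False, f"segment {index}: single-segment deviation {deviation} s"
        # Assumed interpretation: drift is summed from the first segment.
        accumulated += deviation
        if abs(accumulated) > limit:
            return False, f"segment {index}: accumulated deviation {accumulated} s"
    return True, "ok"


print(check_duration_deviation(["1.8", "2.25", "1.8", "2.25"], 2))
print(check_duration_deviation([3, 3], 2))
```

In the second call each segment is individually within the limit (one second over a two-second signaled duration), but the drift adds up to two seconds after the second segment, so the accumulated check fails.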

To minimize segment duration fluctuation and meet the preceding constraint, the AC-4 I‑frame interval must be, at most, one fourth of the target segment duration.
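As a worked example of this limit, the largest permitted I‑frame interval can be expressed in whole audio frames. The helper below is an illustrative sketch; its name is hypothetical, and treating 29.97 fps as exactly 30000/1001 is an assumption of the example.

```python
from fractions import Fraction
from math import floor


def max_iframe_interval_frames(target_segment_duration, frame_rate):
    """Largest I-frame interval, in whole frames, that does not exceed one
    fourth of the target segment duration (in seconds)."""
    max_seconds = Fraction(target_segment_duration) / 4
    return floor(max_seconds * Fraction(frame_rate))


print(max_iframe_interval_frames(2, 25))                     # 12
print(max_iframe_interval_frames(2, 30))                     # 15
print(max_iframe_interval_frames(2, Fraction(30000, 1001)))  # 14
```

For a two-second target segment, one fourth is 0.5 seconds, which corresponds to 12.5 frames at 25 fps; rounding down gives an interval of 12 frames (0.48 seconds).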

Recommended frame rates

Recommended Dolby AC-4 frame rates are listed in the following table.

Highest video frame rate used (fps)   Recommended audio frame rates (fps, in order of preference)
120                                   30, 24[a], 25[a]
119.88                                29.97
100                                   25, 24[a]
60                                    30, 24[a], 25[a]
59.94                                 29.97
50                                    25, 24[a]
48                                    24, 25[a]
30                                    30, 24[a], 25[a]
29.97                                 29.97
25                                    25, 24[a]
24                                    24, 25[a]
23.98                                 23.98

[a] Recommended only if perfect alignment of audio and video segments can be achieved using the frame rate.
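The table can also be expressed as a lookup that applies footnote [a] automatically. This is an illustrative sketch: the names RECOMMENDED_AUDIO_RATES and recommended_audio_rates are hypothetical, frame rates are kept as strings to match the table, and the key "30" is inferred from the table's descending order.

```python
# Each entry: (audio frame rate, True if footnote [a] applies), in order of
# preference. Keys are the highest video frame rate used, as table strings.
RECOMMENDED_AUDIO_RATES = {
    "120": [("30", False), ("24", True), ("25", True)],
    "119.88": [("29.97", False)],
    "100": [("25", False), ("24", True)],
    "60": [("30", False), ("24", True), ("25", True)],
    "59.94": [("29.97", False)],
    "50": [("25", False), ("24", True)],
    "48": [("24", False), ("25", True)],
    "30": [("30", False), ("24", True), ("25", True)],  # key inferred
    "29.97": [("29.97", False)],
    "25": [("25", False), ("24", True)],
    "24": [("24", False), ("25", True)],
    "23.98": [("23.98", False)],
}


def recommended_audio_rates(video_fps, perfect_alignment=False):
    """Audio frame rates for `video_fps`, in order of preference.

    Rates marked with footnote [a] are returned only when the caller states
    that perfect audio/video segment alignment is achievable.
    """
    return [rate for rate, needs_alignment in RECOMMENDED_AUDIO_RATES[video_fps]
            if perfect_alignment or not needs_alignment]


print(recommended_audio_rates("50"))                          # ['25']
print(recommended_audio_rates("50", perfect_alignment=True))  # ['25', '24']
```

An unknown video frame rate raises KeyError, which keeps the sketch from silently guessing a rate that the table does not list.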