
On 10/8/2011 11:29 PM, Justin Uberti wrote:
On Sat, Oct 8, 2011 at 10:39 PM, Randell Jesup <randell-ietf@jesup.org <mailto:randell-ietf@jesup.org>> wrote:
Well, I'm probably being overly worried about processing delays (and in particular differing delays for audio and video). Say audio is sampled at time X and (ignoring other processing steps) takes 1 ms to encode; it gets to the wire at X + <other steps> + 1. Let's say video is also sampled at X and (ignoring other processing steps) takes 10 ms to encode; it gets to the wire at X + <other steps> + 10. So we've added a 9 ms offset to all our A/V sync, and in this case it's in the "wrong" direction (people are more sensitive to early audio than early video). And if the "other steps" on each side don't balance (and they may not), it could be worse.
I also worry that in a browser, with no access to true RT_PRI (real-time priority) processing, the delays could be significantly variable (we get preempted by some other process/thread for 10 or 20 ms, etc.). Also, if the receiver isn't careful, it could be tricked into skipping frames it should be displaying, due to jitter in the packet-to-packet timestamps.
So perhaps I'm not being overly worried. I realize that I'm trading off accuracy in bandwidth estimation (or if you prefer, reaction speed) for ease in getting a consistent framerate and the best possible A/V sync. In a perfect world we'd record both the sampling time and the delta until the packet was submitted to sendto(), so we'd have both. (You could use a header extension to do that.)
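To make the trade-off concrete, here is a minimal sketch (my own illustration, not any particular implementation) of the skew that time-on-wire stamping introduces when audio and video share a capture instant but have the 1 ms and 10 ms encode delays from the example above:

```python
# Hypothetical numbers from the example above: audio and video both
# sampled at instant X, with different encode delays before hitting
# the wire. All values in milliseconds.
SAMPLE_TIME_MS = 0        # X, the common capture instant
AUDIO_ENCODE_MS = 1       # audio encode delay
VIDEO_ENCODE_MS = 10      # video encode delay

def time_on_wire(sample_time_ms, encode_delay_ms, other_steps_ms=0):
    """Timestamp a packet would carry if stamped when it reaches the wire."""
    return sample_time_ms + other_steps_ms + encode_delay_ms

audio_ts = time_on_wire(SAMPLE_TIME_MS, AUDIO_ENCODE_MS)
video_ts = time_on_wire(SAMPLE_TIME_MS, VIDEO_ENCODE_MS)

# With time-on-wire timestamps, the receiver sees a constant A/V offset
# even though both media were sampled at exactly the same instant.
skew_ms = video_ts - audio_ts
print(skew_ms)  # 9
```

Stamping with the sampling time instead keeps skew_ms at zero by construction; the cost is that the estimator no longer sees the sender-side queuing delay.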
There's a lot more going on here. The algorithmic delays for audio and video will often be different, and the capture delays perhaps wildly so. In addition, you won't want to just dump the video directly onto the wire - typically it will be leaked out over some interval to avoid bandwidth spikes, and the audio will have to maintain some jitter buffer to prevent underrun - so I think the encoding-delay deltas will be small compared to the other delays in the pipeline.
Sure - though you have the sampling time of the audio and video, and if you do your job right on the playback side, they'll be rock-solid synced (and that can be done even if there's static drift between the audio and video timestamp clocks). So long as you don't use time-on-wire timestamps...
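As a sketch of that playback-side argument (my own illustration, with assumed clock rates and offsets, not any particular stack's API): if each stream's sampling timestamps are mapped to local playout time through a per-stream offset, then any static drift between the audio and video timestamp clocks lands in that offset and cancels out, so frames sampled together play together:

```python
AUDIO_CLOCK_HZ = 48000   # common audio RTP clock rate (assumption)
VIDEO_CLOCK_HZ = 90000   # standard video RTP clock rate

def to_ms(rtp_ts, clock_hz):
    """Convert an RTP sampling timestamp to milliseconds."""
    return rtp_ts * 1000.0 / clock_hz

def playout_time_ms(rtp_ts, clock_hz, stream_offset_ms, target_delay_ms):
    """Map a sampling timestamp to a local playout time.

    stream_offset_ms maps the sender's timestamp clock onto our local
    clock; a static offset between the two senders' clocks is absorbed
    here, so it doesn't disturb relative A/V sync.
    """
    return to_ms(rtp_ts, clock_hz) + stream_offset_ms + target_delay_ms

# One audio frame and one video frame sampled at the same sender instant
# (1 second into each stream's timebase), played through the same offset
# and target delay:
audio_play = playout_time_ms(48000, AUDIO_CLOCK_HZ,
                             stream_offset_ms=100.0, target_delay_ms=60.0)
video_play = playout_time_ms(90000, VIDEO_CLOCK_HZ,
                             stream_offset_ms=100.0, target_delay_ms=60.0)
assert audio_play == video_play  # sampled together, played together
```

The key point is that none of this depends on when the packets hit the wire - only on the sampling timestamps.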
I think this also does illustrate why having "time-on-wire" timestamping is really useful for increasing estimation accuracy :-)
BTW, I was serious when I said you could improve on this with an RTP header extension carrying the "time-on-the-wire" delta from the sampling time. However, I don't think we need it here; since it would be entirely optional and could be safely ignored, it could be added later. -- Randell Jesup randell-ietf@jesup.org
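For what such an extension element might look like on the wire, here is a hedged sketch using the RFC 5285 one-byte-header layout (4-bit ID, 4-bit length-minus-one, then data). The extension ID and the 16-bit millisecond delta format are my assumptions for illustration - this is not a registered extension:

```python
import struct

def encode_delta_ext(ext_id, delta_ms):
    """Build a one-byte-header extension element carrying a 16-bit
    (time-on-wire minus sampling-time) delta in milliseconds."""
    assert 1 <= ext_id <= 14               # valid one-byte-header IDs
    payload = struct.pack("!H", delta_ms)  # hypothetical 16-bit format
    header = bytes([(ext_id << 4) | (len(payload) - 1)])
    return header + payload

def decode_delta_ext(data):
    """Parse the element back into (ext_id, delta_ms)."""
    ext_id = data[0] >> 4
    length = (data[0] & 0x0F) + 1
    (delta_ms,) = struct.unpack("!H", data[1:1 + length])
    return ext_id, delta_ms

elem = encode_delta_ext(ext_id=5, delta_ms=9)
assert decode_delta_ext(elem) == (5, 9)
```

A receiver that doesn't negotiate or understand the extension simply skips the element, which is what makes "add it later" safe.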