
Just to summarize what I currently understand about LEDBAT as compared to the scenario that RTCWEB envisions.....

- RTCWEB assumes a set of media streams, possibly supplemented by a set of data streams carrying data with real-time requirements. LEDBAT assumes that there is some amount of data that needs transferring, and that it's appropriate to delay data if congestion occurs.

- RTCWEB wants to make sure delay is low as long as it's not fighting with TCP, and wants its "fair share" when competing with TCP. LEDBAT wants to back off when encountering TCP, and uses low delay as a signal of "no competition is occurring"; it doesn't care about the specific delay.

- RTCWEB's RTP streams consist of unidirectional streams with (currently) fairly infrequent feedback messages. LEDBAT assumes an acknowledgement stream with nearly the same packet intervals as the forward stream.

My conclusion: When discussing behaviour of specific models, we can learn from LEDBAT's experiences and the scenarios it was tested in, but the design goals of LEDBAT do not resemble the design goals for congestion control in the RTCWEB scenario, and we should not expect specific properties of the implementation to fit.

If I'm totally off base on any of the above, please holler.....

Harald

On Tue, Apr 10, 2012 at 12:10 PM, Harald Alvestrand <harald@alvestrand.no> wrote:
Just to summarize what I currently understand about LEDBAT as compared to the scenario that RTCWEB envisions.....
- RTCWEB assumes a set of media streams, possibly supplemented by a set of data streams carrying data with real-time requirements. LEDBAT assumes that there is some amount of data that needs transferring, and that it's appropriate to delay data if congestion occurs.
- RTCWEB wants to make sure delay is low as long as it's not fighting with TCP, and wants its "fair share" when competing with TCP. LEDBAT wants to back off when encountering TCP, and uses low delay as a signal of "no competition is occurring"; it doesn't care about the specific delay.
I don't think it's clear that an RTCWEB flow at all times wants to compete with a TCP flow to get its fair share. For instance, I can imagine that a user may find that a delay of 1-2 seconds caused by the TCP flow makes the experience too poor, and that it's better to leave more bandwidth for TCP so that the TCP transfer will finish more quickly. That depends on the amount of bufferbloat, the length of the TCP flow, user preference, etc.
- RTCWEB's RTP streams consist of unidirectional streams with (currently) fairly infrequent feedback messages. LEDBAT assumes an acknowledgement stream with nearly the same packet intervals as the forward stream.
My conclusion: When discussing behaviour of specific models, we can learn from LEDBAT's experiences and the scenarios it was tested in, but the design goals of LEDBAT do not resemble the design goals for congestion control in the RTCWEB scenario, and we should not expect specific properties of the implementation to fit.
I agree.
If I'm totally off base on any of the above, please holler.....
Harald

On 4/10/2012 7:51 AM, Stefan Holmer wrote:
On Tue, Apr 10, 2012 at 12:10 PM, Harald Alvestrand <harald@alvestrand.no> wrote:
Just to summarize what I currently understand about LEDBAT as compared to the scenario that RTCWEB envisions.....
- RTCWEB assumes a set of media streams, possibly supplemented by a set of data streams carrying data with real-time requirements. LEDBAT assumes that there is some amount of data that needs transferring, and that it's appropriate to delay data if congestion occurs.
- RTCWEB wants to make sure delay is low as long as it's not fighting with TCP, and wants its "fair share" when competing with TCP. LEDBAT wants to back off when encountering TCP, and uses low delay as a signal of "no competition is occurring"; it doesn't care about the specific delay.
I don't think it's clear that an RTCWEB flow at all times wants to compete with a TCP flow to get its fair share. For instance, I can imagine that a user may find that a delay of 1-2 seconds caused by the TCP flow makes the experience too poor, and that it's better to leave more bandwidth for TCP so that the TCP transfer will finish more quickly. That depends on the amount of bufferbloat, the length of the TCP flow, user preference, etc.
1-2 seconds is effectively unusable.

When we compete with a (saturating) TCP flow, there are a few options:

1) reduce bandwidth and hope the TCP flow won't take it all (not all TCP flows can sustain infinite bandwidth, and the TCP flow may have bottlenecks or RTT-based limits that stop it from taking everything) or that the TCP flow will use the bandwidth to end faster.

2) reduce bandwidth and hope the TCP flow may not take the extra bandwidth fast enough to force the queues too deep - we reduce bandwidth and cause the queues to drain, and TCP will keep adding to its bandwidth - but at a certain rate depending on RTT/etc. We may be able to keep the queues low as we give up bandwidth in chunks, though eventually we will be driven down to our base. If we're lucky the TCP flow will (as per 1) hit another limit, or will end (not unusual for web browsing!). This really is a variant of #1.

3) reduce bandwidth, but allow queues to rise to a degree (say 100-200ms, perhaps an adaptive amount). This may allow an AQM or short-queue router to cause losses to the TCP flow(s) and cause them to back off. This could be a secondary measure after initial bandwidth reductions.

4) switch to pure loss-based, which means letting queue depths rise to the point of loss. Cx-TCP uses this (see recent (Nov? Dec?) ToN article referenced in my rtcweb Interim presentation from the end of Jan/early Feb). In some cases this will result in seconds of delay.

There might be some possible dynamic tricks, though they may not result in a reasonable experience, such as allowing or forcing a spike in queue depths to get TCP to back off a lot, then reducing bandwidth a lot to let them drain while TCP starts to ramp back up. Eventually TCP will saturate again and force queue depths to rise, requiring you to repeat the behavior. This will cause periodic bursts of delay or loss which will be annoying, and also may lead to poor overall link utilization (though it's an attempt to get a fair share most of the time for the delay-based protocol). I doubt that overall this is practical.
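To make the trade-off concrete, here is a toy sketch of an escalation policy along the lines of options 1-4 above (the thresholds, mode names and function are invented purely for illustration; they are not from any draft or implementation):

# Toy escalation policy for a delay-sensitive sender sharing a link with a
# saturating TCP flow. Thresholds are illustrative only.
def choose_response(queuing_delay_ms, loss_rate, at_min_rate):
    if loss_rate > 0.02:
        # Queues are overflowing anyway: fall back to loss-based operation (option 4).
        return "loss_based"
    if queuing_delay_ms < 30 and not at_min_rate:
        # Delay still low: keep ceding bandwidth and hope TCP stops short or ends (options 1/2).
        return "reduce_and_hope"
    if queuing_delay_ms < 200:
        # Tolerate a bounded standing queue so an AQM/short buffer can push TCP back (option 3).
        return "allow_bounded_queue"
    return "loss_based"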
- RTCWEB's RTP streams consist of unidirectional streams with (currently) fairly infrequent feedback messages. LEDBAT assumes an acknowledgement stream with nearly the same packet intervals as the forward stream.
My conclusion: When discussing behaviour of specific models, we can learn from LEDBAT's experiences and the scenarios it was tested in, but the design goals of LEDBAT do not resemble the design goals for congestion control in the RTCWEB scenario, and we should not expect specific properties of the implementation to fit.
I agree.
As do I. Also, I *REALLY* worry about the interaction of LEDBAT flows and rtcweb flows... If it targets 100ms queuing delay as the "I'm out of the way of TCP" level, that could seriously negatively impact us (and general VoIP as well, but even more so us), since we'll again get driven into the ground trying to keep the queues drained. It may take longer, but LEDBAT flows tend to be close-to-infinite, I would assume. If it targets 25ms, that's less problematic, I suspect.

I'm not saying I know there will be a problem here, but that I fear there will be, since LEDBAT has a non-0 queuing target - it may "poison the waters" for any delay-based algorithm that wants to target a lower number.

-- Randell Jesup randell-ietf@jesup.org

On Tue, Apr 10, 2012 at 4:02 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
On 4/10/2012 7:51 AM, Stefan Holmer wrote:
On Tue, Apr 10, 2012 at 12:10 PM, Harald Alvestrand <harald@alvestrand.no> wrote:
Just to summarize what I currently understand about LEDBAT as compared to the scenario that RTCWEB envisions.....
- RTCWEB assumes a set of media streams, possibly supplemented by a set of data streams carrying data with real-time requirements. LEDBAT assumes that there is some amount of data that needs transferring, and that it's appropriate to delay data if congestion occurs.
- RTCWEB wants to make sure delay is low as long as it's not fighting with TCP, and wants its "fair share" when competing with TCP. LEDBAT wants to back off when encountering TCP, and uses low delay as a signal of "no competition is occurring"; it doesn't care about the specific delay.
I don't think it's clear that an RTCWEB flow at all times wants to compete with a TCP flow to get its fair share. For instance, I can imagine that a user may find that a delay of 1-2 seconds caused by the TCP flow makes the experience too poor, and that it's better to leave more bandwidth for TCP so that the TCP transfer will finish more quickly. That depends on the amount of bufferbloat, the length of the TCP flow, user preference, etc.
1-2 seconds is effectively unusable.
When we compete with a (saturating) TCP flow, there are a few options:
1) reduce bandwidth and hope the TCP flow won't take it all (not all TCP flows can sustain infinite bandwidth, and the TCP flow may have bottlenecks or RTT-based limits that stop it from taking everything) or that the TCP flow will use the bandwidth to end faster.
2) reduce bandwidth and hope the TCP flow may not take the extra bandwidth fast enough to force the queues too deep - we reduce bandwidth and cause the queues to drain, and TCP will keep adding to its bandwidth - but at a certain rate depending on RTT/etc. We may be able to keep the queues low as we give up bandwidth in chunks, though eventually we will be driven down to our base. If we're lucky the TCP flow will (as per 1) hit another limit, or will end (not unusual for web browsing!). This really is a variant of #1.
3) reduce bandwidth, but allow queues to rise to a degree (say 100-200ms, perhaps an adaptive amount). This may allow an AQM or short-queue router to cause losses to the TCP flow(s) and cause them to back off. This could be a secondary measure after initial bandwidth reductions.
4) switch to pure loss-based, which means letting queue depths rise to the point of loss. Cx-TCP uses this (see recent (Nov? Dec?) ToN article referenced in my rtcweb Interim presentation from the end of Jan/early Feb). In some cases this will result in seconds of delay.
There might be some possible dynamic tricks, though they may not result in a reasonable experience, such as allowing or forcing a spike in queue depths to get TCP to back off a lot, then reducing bandwidth a lot to let them drain while TCP starts to ramp back up. Eventually TCP will saturate again and force queue depths to rise, requiring you to repeat the behavior. This will cause periodic bursts of delay or loss which will be annoying, and also may lead to poor overall link utilization (though it's an attempt to get a fair share most of the time for the delay-based protocol). I doubt that overall this is practical.
I agree that those are all possible actions.
- RTCWEB's RTP streams consist of unidirectional streams with (currently) fairly infrequent feedback messages. LEDBAT assumes an acknowledgement stream with nearly the same packet intervals as the forward stream.
My conclusion: When discussing behaviour of specific models, we can learn from LEDBAT's experiences and the scenarios it was tested in, but the design goals of LEDBAT do not resemble the design goals for congestion control in the RTCWEB scenario, and we should not expect specific properties of the implementation to fit.
I agree.
As do I. Also, I *REALLY* worry about the interaction of LEDBAT flows and rtcweb flows... If it targets 100ms queuing delay as the "I'm out of the way of TCP" level, that could seriously negatively impact us (and general VoIP as well, but even more so us), since we'll again get driven into the ground trying to keep the queues drained. It may take longer, but LEDBAT flows tend to be close-to-infinite, I would assume. If it targets 25ms, that's less problematic, I suspect.
I'm not saying I know there will be a problem here, but that I fear there will be since LEDBAT has a non-0 queuing target - it may "poison the waters" for any delay-based algorithm that wants to target a lower number.
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
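For what it's worth, the "let the queues drain before increasing the rate" rule boils down to a gate like the sketch below (the names and the 1 ms threshold are illustrative, and the draft itself works on inter-arrival deltas rather than absolute one-way delay; this is just the gating idea):

# Sketch of a "let the queue drain before increasing the rate" gate.
class DrainGate:
    def __init__(self, drain_threshold_ms=1.0):
        self.base_owd_ms = None              # lowest one-way delay seen so far
        self.drain_threshold_ms = drain_threshold_ms

    def on_delay_sample(self, owd_ms):
        # Track the base (empty-queue) delay as the minimum observed.
        if self.base_owd_ms is None or owd_ms < self.base_owd_ms:
            self.base_owd_ms = owd_ms

    def may_increase(self, owd_ms):
        # Only permit a rate increase once the estimated queuing delay is back near zero.
        return (owd_ms - self.base_owd_ms) <= self.drain_threshold_ms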
-- Randell Jesup randell-ietf@jesup.org

On 4/10/2012 10:40 AM, Stefan Holmer wrote:
On Tue, Apr 10, 2012 at 4:02 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
As do I. Also, I *REALLY* worry about the interaction of LEDBAT flows and rtcweb flows... If it targets 100ms queuing delay as the "I'm out of the way of TCP" level, that could seriously negatively impact us (and general VoIP as well, but even more so us), since we'll again get driven into the ground trying to keep the queues drained. It may take longer, but LEDBAT flows tend to be close-to-infinite, I would assume. If it targets 25ms, that's less problematic, I suspect.
I'm not saying I know there will be a problem here, but that I fear there will be since LEDBAT has a non-0 queuing target - it may "poison the waters" for any delay-based algorithm that wants to target a lower number.
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
Right - which means someone should raise this issue about LEDBAT ASAP. Which WG is handling it? -- Randell Jesup randell-ietf@jesup.org

On 10 Apr 2012, at 15:55, Randell Jesup wrote:
On 4/10/2012 10:40 AM, Stefan Holmer wrote:
On Tue, Apr 10, 2012 at 4:02 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
As do I. Also, I *REALLY* worry about the interaction of LEDBAT flows and rtcweb flows... If it targets 100ms queuing delay as the "I'm out of the way of TCP" level, that could seriously negatively impact us (and general VoIP as well, but even more so us), since we'll again get driven into the ground trying to keep the queues drained. It may take longer, but LEDBAT flows tend to be close-to-infinite, I would assume. If it targets 25ms, that's less problematic, I suspect.
I'm not saying I know there will be a problem here, but that I fear there will be since LEDBAT has a non-0 queuing target - it may "poison the waters" for any delay-based algorithm that wants to target a lower number.
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
Right - which means someone should raise this issue about LEDBAT ASAP. Which WG is handling it?
LEDBAT WG ;)

Though for the moment it's not clear how many flows are using LEDBAT - it's been mentioned that a few bittorrent clients are using it but I'm not clear on numbers - also some of them seem to implement a 'variant' called uTP, which I'm guessing isn't going to err on the side of lower throughput and delay....

I noticed that LEDBAT now ships as an available congestion control for TCP on OSX Lion for 'background' flows, though again it's unclear how many apps actually use it.

It seems that 100ms is an agreed maximum delay bound, which appears to have been adopted after an earlier 25ms proposal. As mentioned in the draft, it is a question of balancing their throughput concerns with achieving a suitable accommodation with TCP flows. The 100ms is a default for the OSX implementation, whilst the (telecom-paristech.fr) Linux one has the earlier 25ms target.

Piers.
-- Randell Jesup randell-ietf@jesup.org

On 4/10/2012 12:17 PM, Piers O'Hanlon wrote:
As do I. Also, I *REALLY* worry about the interaction of LEDBAT flows and rtcweb flows... If it targets 100ms queuing delay as the "I'm out of the way of TCP" level, that could seriously negatively impact us (and general VoIP as well, but even more so us), since we'll again get driven into the ground trying to keep the queues drained. It may take longer, but LEDBAT flows tend to be close-to-infinite, I would assume. If it targets 25ms, that's less problematic, I suspect.
I'm not saying I know there will be a problem here, but that I fear there will be since LEDBAT has a non-0 queuing target - it may "poison the waters" for any delay-based algorithm that wants to target a lower number.
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
Yeah... Not pretty. Any delay-based algorithm should target minimal/zero queue lengths to ensure fairness, otherwise it's an evil game/race where whoever cares least about delay gets all the bandwidth.
Right - which means someone should raise this issue about LEDBAT ASAP. Which WG is handling it?
LEDBAT WG ;)
Though for the moment it's not clear how many flows are using LEDBAT - It's been mentioned that a few bittorrent clients are using it but I'm not clear on numbers - also some of them seem to implement a 'variant' called uTP, which I'm guessing isn't going to err on the side of lower throughput and delay....
I noticed that LEDBAT now ships as an available congestion control for TCP on OSX Lion for 'background' flows though again it's unclear how many apps actually use it.
Also not good. We should actively discourage it, IMHO, until this is resolved.
It seems that 100ms is an agreed maximum delay bound, which appears to have been adopted after an earlier 25ms proposal. As mentioned in the draft, it is a question of balancing their throughput concerns with achieving a suitable accommodation with TCP flows.
The 100ms is a default for the OSX implementation, whilst the (telecom-paristech.fr) Linux one has the earlier 25ms target.
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that. -- Randell Jesup randell-ietf@jesup.org

On 04/10/2012 02:58 PM, Randell Jesup wrote:
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.

Unfortunately, the bloat in the broadband edge is often/usually much, much higher than this, being best measured in seconds :-(. http://gettys.files.wordpress.com/2010/12/uplink_buffer_all.png http://gettys.files.wordpress.com/2010/12/downlink_buffer_all.png (thanks to the Netalyzr folks).

Worse yet, the broadband edge is typically a single queue today (even in technologies that may support multiple classifications). So your VOIP and other traffic is likely stuck behind other traffic. ISPs' telephony services are typically bypassing these queues.

If there is AQM, then you'll get packet marking going on (drop or ECN), and decent latencies. There is hope here for AQM algorithms that are self-tuning: I now know of two such beasts, though they are a long way from "running code" state at the moment.

So the direction I'm going is to get AQM that works..... (along with classification...). But the high order bit is AQM, to keep the endpoints' TCPs behaving, which you can't do solely by classification.

- Jim
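The arithmetic behind those numbers is just buffer size divided by link rate; for example (the 256 KB figure below is a hypothetical modem buffer picked for illustration, not a measurement):

# Delay added by a full tail-drop buffer is simply buffer_bytes / link_rate.
def full_buffer_delay_s(buffer_bytes, link_bps):
    return buffer_bytes * 8 / link_bps

print(full_buffer_delay_s(12_500, 1e6))    # rule-of-thumb ~100 ms buffer at 1 Mbps -> 0.1 s
print(full_buffer_delay_s(256_000, 1e6))   # hypothetical 256 KB buffer at a 1 Mbps uplink -> ~2 s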

On 4/10/2012 3:14 PM, Jim Gettys wrote:
On 04/10/2012 02:58 PM, Randell Jesup wrote:
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.
You missed part of the point: LEDBAT is a "scavenger" protocol that currently targets 100ms of queuing delay using a one-way-delay estimator and an estimate of base transfer delay based on history. This means it won't necessarily saturate to the full buffer-bloat level that TCP would, but it may well keep the link at or above 100ms queuing *all* the time (given that a target application for this protocol is BitTorrent clients).

-- Randell Jesup randell-ietf@jesup.org
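For reference, the controller described above is roughly the following (a loose paraphrase of the RFC 6817 window update; the constants are illustrative, not normative values):

# Loose paraphrase of a LEDBAT-style window update: grow while the measured
# queuing delay is below TARGET, shrink once it rises above. Constants illustrative.
TARGET_MS = 100.0   # the queuing-delay target being discussed
GAIN = 1.0
MSS = 1500

def on_ack(cwnd, bytes_newly_acked, one_way_delay_ms, base_delay_ms):
    queuing_delay_ms = one_way_delay_ms - base_delay_ms   # base = min OWD over history
    off_target = (TARGET_MS - queuing_delay_ms) / TARGET_MS
    cwnd += GAIN * off_target * bytes_newly_acked * MSS / cwnd
    return max(cwnd, MSS)   # never shrink below one segment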

Jim,

For the record, my earlier "General thoughts" message reflects my bet that better AQM will not be sufficient by itself, because the best that you can do with AQM will not in general be good enough for teleconferencing. Two reasons:
- The optimal AQM setpoint for throughput maximization will consume too much of the RTCWEB end-to-end delay budget.
- AQM is designed to allow transient long queues for TCP slowstart, etc.

I do not disagree that AQM helps a lot. Without AQM the situation is abysmal. But this is really a different problem than bufferbloat, and good solutions to bufferbloat are not likely to automatically solve the RTCWEB problem because RTCWEB's target delay is well below the target delays for nearly all other applications.

The exception might be the gaming community. Just last week somebody was telling me that all serious gaming hackers roll QoS on their home LAN, and that many of the large ISPs honor enough of the bits to make a difference.

Thanks,
--MM--
The best way to predict the future is to create it. - Alan Kay

On Tue, Apr 10, 2012 at 12:14 PM, Jim Gettys <jg@freedesktop.org> wrote:
On 04/10/2012 02:58 PM, Randell Jesup wrote:
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.
Unfortunately, the bloat in the broadband edge is often/usually much, much higher than this, being best measured in seconds :-(. http://gettys.files.wordpress.com/2010/12/uplink_buffer_all.png http://gettys.files.wordpress.com/2010/12/downlink_buffer_all.png (thanks to the Netalyzr folks).
Worse yet, the broadband edge is typically a single queue today (even in technologies that may support multiple classifications). So your VOIP and other traffic is likely stuck behind other traffic. ISPs' telephony services are typically bypassing these queues.
If there is AQM, then you'll get packet marking going on (drop or ECN), and decent latencies.
There is hope here for AQM algorithms that are self-tuning: I now know of two such beasts, though they are a long way from "running code" state at the moment.
So the direction I'm going is to get AQM that works..... (along with classification...). But the high order bit is AQM, to keep the endpoints' TCPs behaving, which you can't do solely by classification. - Jim

On 04/10/2012 04:50 PM, Matt Mathis wrote:
Jim, For the record, my earlier "General thoughts" message reflects my bet that better AQM will not be sufficient by itself, because the best that you can do with AQM will not in general be good enough for teleconferencing. Two reasons: - The optimal AQM setpoint for throughput maximization will consume too much of the RTCWEB end-to-end delay budget. - AQM is designed to allow transient long queues for TCP slowstart, etc.
I do not disagree that AQM helps a lot. Without AQM the situation is abysmal. But this is really a different problem than bufferbloat, and good solutions to bufferbloat are not likely to automatically solve the RTCWEB problem because RTCWEB's target delay is well below the target delays for nearly all other applications.
The exception might be the gaming community. Just last week somebody was telling me that all serious gaming hackers roll QoS on their home LAN, and that many of the large ISPs honor enough of the bits to make a difference.
I think we're in violent agreement here: while AQM is necessary, it is unlikely to be sufficient, particularly with entertainment like IW10, sharded web sites, and tcp offload engines doing really evil things when the packet trains go "splat" at the edge of the network. I'm planning on both AQM and classification/"fair" queueing. The problem that scares me most are these large packet trains leaving data centers... - Jim
Thanks, --MM-- The best way to predict the future is to create it. - Alan Kay
On Tue, Apr 10, 2012 at 12:14 PM, Jim Gettys <jg@freedesktop.org> wrote:
On 04/10/2012 02:58 PM, Randell Jesup wrote:
> 100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.

You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.
Unfortunately, the bloat in the broadband edge is often/usually much, much higher than this, being best measured in seconds :-(. http://gettys.files.wordpress.com/2010/12/uplink_buffer_all.png http://gettys.files.wordpress.com/2010/12/downlink_buffer_all.png (thanks to the Netalyzr folks).
Worse yet, the broadband edge is typically a single queue today (even in technologies that may support multiple classifications). So your VOIP and other traffic is likely stuck behind other traffic. ISPs' telephony services are typically bypassing these queues.
If there is AQM, then you'll get packet marking going on (drop or ECN), and decent latencies.
There is hope here for AQM algorithms that are self-tuning: I now know of two such beasts, though they are a long way from "running code" state at the moment.
So the direction I'm going is to get AQM that works..... (along with classification...). But the high order bit is AQM, to keep the endpoints' TCPs behaving, which you can't do solely by classification. - Jim

On 04/10/2012 09:14 PM, Jim Gettys wrote:
On 04/10/2012 02:58 PM, Randell Jesup wrote:
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.
Unfortunately, the bloat in the broadband edge is often/usually much, much higher than this, being best measured in seconds :-(. http://gettys.files.wordpress.com/2010/12/uplink_buffer_all.png http://gettys.files.wordpress.com/2010/12/downlink_buffer_all.png (thanks to the Netalyzr folks).

The encouraging thing in those (depressing) charts is that the fiber stuff (green subcloud) seems to be less broken than the DSL. So the future may actually be less depressing than the past.

Worse yet, the broadband edge is typically a single queue today (even in technologies that may support multiple classifications). So your VOIP and other traffic is likely stuck behind other traffic. ISPs' telephony services are typically bypassing these queues.
If there is AQM, then you'll get packet marking going on (drop or ECN), and decent latencies.
There is hope here for AQM algorithms that are self-tuning: I now know of two such beasts, though they are a long way from "running code" state at the moment.
So the direction I'm going is to get AQM that works..... (along with classification...). But the high order bit is AQM, to keep the endpoints' TCPs behaving, which you can't do solely by classification.

AQM within a class, and DSCP to separate classes. Sounds like a necessary one-two punch.
(The authorization of DSCP is of course ANOTHER still-unsolved problem; Mike O'Dell once referred to the EF marking as "DDOS on steroids"....) Harald

On 04/11/2012 02:16 AM, Harald Alvestrand wrote:
On 04/10/2012 09:14 PM, Jim Gettys wrote:
On 04/10/2012 02:58 PM, Randell Jesup wrote:
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.
Unfortunately, the bloat in the broadband edge is often/usually much, much higher than this, being best measured in seconds :-(. http://gettys.files.wordpress.com/2010/12/uplink_buffer_all.png http://gettys.files.wordpress.com/2010/12/downlink_buffer_all.png (thanks to the Netalyzr folks).

The encouraging thing in those (depressing) charts is that the fiber stuff (green subcloud) seems to be less broken than the DSL. So the future may actually be less depressing than the past.

Get out your anti-depressants. The ICSI data *understates* the severity of the problem.
The ICSI data tops out at 20Mbps due to a limitation in their server systems, so we don't really know how good/bad fiber is (since most fiber tiers of service start around 20Mbps and so won't show up where it should on that plot).

Secondly, the home router situation is even worse than broadband. As soon as the bandwidth is higher in the broadband hop than the wireless hop (and 802.11g tops out at about 20-22Mbps), the bottleneck shifts to the wireless hop, and you have the problem on either side of the wireless hop (our OS's and home routers). This is why I spend my time on home routers and Linux. Home routers/our operating systems have yet more bloat than broadband, typically.

We have a disaster on our hands. Sorry to be the bearer of such horrifying news.

- Jim

On 04/11/2012 12:43 PM, Jim Gettys wrote:
On 04/11/2012 02:16 AM, Harald Alvestrand wrote:
On 04/10/2012 09:14 PM, Jim Gettys wrote:
On 04/10/2012 02:58 PM, Randell Jesup wrote:
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.
Unfortunately, the bloat in the broadband edge is often/usually much, much higher than this, being best measured in seconds :-(. http://gettys.files.wordpress.com/2010/12/uplink_buffer_all.png http://gettys.files.wordpress.com/2010/12/downlink_buffer_all.png (thanks to the Netalyzr folks).

The encouraging thing in those (depressing) charts is that the fiber stuff (green subcloud) seems to be less broken than the DSL. So the future may actually be less depressing than the past.

Get out your anti-depressants. The ICSI data *understates* the severity of the problem.
The ICSI data tops out at 20Mbps due to a limitation in their server systems, so we don't really know how good/bad fiber is (since most fiber tiers of service start around 20Mbps and so won't show up where it should on that plot).
Secondly, the home router situation is even worse than broadband. As soon as the bandwidth is higher in the broadband hop than the wireless hop (and 802.11g tops out at about 20-22Mbps), the bottleneck shifts to the wireless hop, and you have the problem on either side of the wireless hop (our OS's and home routers). This is why I spend my time on home routers and Linux. Home routers/our operating systems have yet more bloat than broadband, typically.
We have a disaster on our hands.
Sorry to be the bearer of such horrifying news.

I know - well enough to want to smile at any small hint of a silver lining :-)

On 04/11/2012 07:31 AM, Harald Alvestrand wrote:
On 04/11/2012 12:43 PM, Jim Gettys wrote:
On 04/11/2012 02:16 AM, Harald Alvestrand wrote:
On 04/10/2012 09:14 PM, Jim Gettys wrote:
On 04/10/2012 02:58 PM, Randell Jesup wrote:
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
You aren't going to see delay under saturating load under 100ms unless the bottleneck link is running a working AQM; that's the property of tail drop, and the "rule of thumb" for sizing buffers has been of order 100ms. This is to ensure maximum bandwidth over continental paths of a single TCP flow.
Unfortunately, the bloat in the broadband edge is often/usually much, much higher than this, being best measured in seconds :-(. http://gettys.files.wordpress.com/2010/12/uplink_buffer_all.png http://gettys.files.wordpress.com/2010/12/downlink_buffer_all.png (thanks to the Netalyzr folks).

The encouraging thing in those (depressing) charts is that the fiber stuff (green subcloud) seems to be less broken than the DSL. So the future may actually be less depressing than the past.

Get out your anti-depressants. The ICSI data *understates* the severity of the problem.
The ICSI data tops out at 20Mbps due to a limitation in their server systems, so we don't really know how good/bad fiber is (since most fiber tiers of service start around 20Mbps and so won't show up where it should on that plot).
Secondly, the home router situation is even worse than broadband. As soon as the bandwidth is higher in the broadband hop than the wireless hop (and 802.11g tops out at about 20-22Mbps), the bottleneck shifts to the wireless hop, and you have the problem on either side of the wireless hop (our OS's and home routers). This is why I spend my time on home routers and Linux. Home routers/our operating systems have yet more bloat than broadband, typically.
We have a disaster on our hands.
Sorry to be the bearer of such horrifying news.

I know - well enough to want to smile at any small hint of a silver lining :-)
I know you (and I) would like some smiles... Unfortunately, given the bandwidth limitations of ICSI's test, we don't know much for faster devices. This means much/most of fiber, and the higher tiers of cable, are not testable by that test, and you won't see valid results. (You can go look at the Netalyzr papers for better understanding and interpretation of the scatter plots.) They are working to try to raise the bandwidth limit of that test. But I'm *very* pessimistic, for the reasons I now explain.

There seem to be two common cases among engineers building network devices:

1) no conscious thought about buffering: use all the RAM you've got, or defaults inherited from other technologies (e.g. gigabit Ethernet defaults applied to wireless, for a concrete example).

2) the usual 100 ms rule of thumb *at the highest bandwidth the device can run* for a single TCP flow, so that it benchmarks well.

Sometimes this buffering seems to get stretched to hide firmware bugs too... We *know* that hiding firmware bugs is occurring in practice; it isn't a hypothetical.

So, say you have a cable modem or fiber box that is designed to be able to run at 100Mbps, and you are buying 20Mbps. You end up at least five times over-buffered out of the starting gate (at 1/2 second, presuming they designed to the usual 100 ms metric). The first generation DOCSIS 3 modems, for example, are designed to run up to 150Mbps. I've seen corresponding amounts of buffering present; God forbid you only buy 10Mbps and use such a modem (until the DOCSIS amendment is fully deployed, you'll get today's full glory). Then the buffering turns into a couple seconds.

I think it likely (actually as certain as I can be without having run tests myself) that fiber has the same problem: else they would flunk their bandwidth tests, and not get bought by the ISP. So the fact that fiber doesn't show as much buffering in the ICSI data is an artefact, almost certainly.

And as soon as this bandwidth exceeds your share of your 802.11 link, you get all the problems of both home routers and our operating systems, rather than the broadband link. There, there seem to be hundreds (or even more than 1000) packets of buffering present, depending on the vintage of your OS and home router. Since the buffering here is in packets rather than bytes, you get points in the ICSI data that do not have the "power of two" structure you see prominently in that data.

And I won't even go into what may be going on elsewhere in the internet (e.g. the broadband head end CMTS, DSLAM or fiber box) beyond saying that they likely are not running AQM.... It's a mess.

Sorry to be so depressing on a fine morning. I see no silver lining.

- Jim
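The over-buffering arithmetic above, in one line: a buffer sized for ~100 ms at the device's design rate, drained at the rate you actually bought (the rates below are the examples from this message):

# Standing delay from a buffer sized for 100 ms at the design rate but drained
# at the purchased rate.
def induced_delay_s(design_rate_bps, purchased_rate_bps, design_delay_s=0.1):
    buffer_bits = design_rate_bps * design_delay_s
    return buffer_bits / purchased_rate_bps

print(induced_delay_s(100e6, 20e6))   # 100 Mbps-capable modem on a 20 Mbps tier -> 0.5 s
print(induced_delay_s(150e6, 10e6))   # 150 Mbps DOCSIS 3 modem on a 10 Mbps tier -> 1.5 s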

On 4/10/2012 2:58 PM, Randell Jesup wrote:
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
Yeah... Not pretty. Any delay-based algorithm should target minimal/zero queue lengths to ensure fairness, otherwise it's an evil game/race where whoever cares least about delay gets all the bandwidth.
The target is a balancing act, and a heuristic. Aiming for overly small queue lengths (or zero) is poor due to delay variability of lower layers (e.g. WLAN or cellular) as well as measurement error. These are not specific to LEDBAT and will be problematic for other delay-based proposals, such as the Google one we have discussed somewhat.
Right - which means someone should raise this issue about LEDBAT ASAP. Which WG is handling it?
LEDBAT WG ;)
Though for the moment it's not clear how many flows are using LEDBAT - It's been mentioned that a few bittorrent clients are using it but I'm not clear on numbers - also some of them seem to implement a 'variant' called uTP, which I'm guessing isn't going to err on the side of lower throughput and delay....
I noticed that LEDBAT now ships as an available congestion control for TCP on OSX Lion for 'background' flows though again it's unclear how many apps actually use it.
Also not good. We should actively discourage it, IMHO, until this is resolved.
I'm not sure who the "we" is above. Have you tried sharing a link with BitTorrent traffic before and after the clients implemented LEDBAT? You might decide to actively encourage it; VoIP is totally unusable in the "before" configuration and quite usable "after", with uTP.
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
My experience suggests otherwise. 100ms is quite a bit better than the alternative of multiple seconds. You can easily use Skype with modern BitTorrent clients running. Zero queueing delay is an unreasonable target since it can't be accurately measured versus variations caused by wireless MAC, OS, and other factors. Since Windows OS over WLAN is probably one of the main ways that people run either BitTorrent or will run RTCWEB, these variations in delay need to be understood. -- Wes Eddy MTI Systems

On 4/10/2012 10:35 PM, Wesley Eddy wrote:
On 4/10/2012 2:58 PM, Randell Jesup wrote:
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
Yeah... Not pretty. Any delay-based algorithm should target minimal/zero queue lengths to ensure fairness, otherwise it's an evil game/race where whoever cares least about delay gets all the bandwidth.
The target is a balancing act, and a heuristic.
Aiming for overly small queue lengths (or zero) is poor due to delay variability of lower layers (e.g. WLAN or cellular) as well as measurement error. These are not specific to LEDBAT and will be problematic for other delay-based proposals, such as the Google one we have discussed somewhat.
Agreed, though Google's algorithm does not directly target 0 queuing delay; it only indirectly targets it. Since it doesn't attempt to measure actual queue depth, and merely focuses on staying below (in bandwidth use) the point where delay appears to rise, it can stabilize at higher queuing depths. If LEDBAT's estimates are reasonably accurate it may make sense to incorporate that into the algorithm in some manner. How the algorithms will interact is very hard to predict right now. In theory, an algorithm tolerant of a larger delay *should* end up collecting most or all of the bandwidth, but that assumes similar methods of probing for delay.
Right - which means someone should raise this issue about LEDBAT ASAP. Which WG is handling it?
LEDBAT WG ;)
Though for the moment it's not clear how many flows are using LEDBAT - It's been mentioned that a few bittorrent clients are using it but I'm not clear on numbers - also some of them seem to implement a 'variant' called uTP, which I'm guessing isn't going to err on the side of lower throughput and delay....
I noticed that LEDBAT now ships as an available congestion control for TCP on OSX Lion for 'background' flows though again it's unclear how many apps actually use it.
Also not good. We should actively discourage it, IMHO, until this is resolved.
I'm not sure who the "we" is above. Have you tried sharing a link with BitTorrent traffic before and after the clients implemented LEDBAT? You might decide to actively encourage it; VoIP is totally unusable in the "before" configuration and quite usable "after", with uTP.
I've found a lot of people have gotten used to tolerating much more delay from their VoIP client than I'd ever find comfortable; I still use landlines a lot because people on Skype and other VoIP softclients often talk over each other. I cringe when I see engineers blithely interviewing candidates over Skype or a VoIP softclient with a clear one-way mouth-to-ear delay over 200 or 300ms - it's an uncomfortable experience that encourages people to "hold the floor" and leads to awkward pauses/talkover.
100ms is just bad, bad, bad for VoIP on the same links. The only case where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
My experience suggests otherwise. 100ms is quite a bit better than the alternative of multiple seconds.
Well sure, in the same way that punched in the nose is better than shot through the head. :-)
You can easily use Skype with modern BitTorrent clients running. Zero queueing delay is an unreasonable target since it can't be accurately measured versus variations caused by wireless MAC, OS, and other factors. Since Windows OS over WLAN is probably one of the main ways that people run either BitTorrent or will run RTCWEB, these variations in delay need to be understood.
Agreed. I assume the LEDBAT WG did VoIP tests? Did they measure MOS? (not that it's a great measure, but it's well-known and understood). Any links to results? -- Randell Jesup randell-ietf@jesup.org

On Wed, Apr 11, 2012 at 8:56 AM, Randell Jesup <randell-ietf@jesup.org> wrote:
On 4/10/2012 10:35 PM, Wesley Eddy wrote:
On 4/10/2012 2:58 PM, Randell Jesup wrote:
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
Yeah... Not pretty. Any delay-based algorithm should target minimal/zero queue lengths to ensure fairness, otherwise it's an evil game/race where whoever cares least about delay gets all the bandwidth.
The target is a balancing act, and a heuristic.
Aiming for overly small queue lengths (or zero) is poor due to delay variability of lower layers (e.g. WLAN or cellular) as well as measurement error. These are not specific to LEDBAT and will be problematic for other delay-based proposals, such as the Google one we have discussed somewhat.
Agreed, though Google's algorithm does not directly target 0 queuing delay; it only indirectly targets it. Since it doesn't attempt to measure actual queue depth, and merely focuses on staying below (in bandwidth use) the point where delay appears to rise, it can stabilize at higher queuing depths. If LEDBAT's estimates are reasonably accurate it may make sense to incorporate that into the algorithm in some manner.
Correct, noisy variations in delay are suppressed. And there's also an outlier filter with the purpose of not reacting to a few packets being delayed.
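As one illustration of the idea (this is not the actual filter from draft-alvestrand-rtcweb-congestion, just a sketch of suppressing noise and ignoring a few delayed packets):

# Illustrative noise/outlier suppression on per-packet delay-gradient samples.
class DelayGradientFilter:
    def __init__(self, alpha=0.05, clamp_sigmas=3.0):
        self.alpha = alpha                # smoothing factor
        self.clamp_sigmas = clamp_sigmas  # how far an outlier may pull the estimate
        self.mean = 0.0
        self.var = 1.0

    def update(self, sample_ms):
        # Clamp outliers (a few unusually delayed packets) toward the current estimate.
        limit = self.clamp_sigmas * self.var ** 0.5
        clamped = max(self.mean - limit, min(sample_ms, self.mean + limit))
        # Exponentially weighted mean and variance smooth out measurement noise.
        self.var = (1 - self.alpha) * (self.var + self.alpha * (clamped - self.mean) ** 2)
        self.mean = (1 - self.alpha) * self.mean + self.alpha * clamped
        return self.mean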
How the algorithms will interact is very hard to predict right now. In theory, an algorithm tolerant of a larger delay *should* end up collecting most or all of the bandwidth, but that assumes similar methods of probing for delay.
Right - which means someone should raise this issue about LEDBAT
ASAP. Which WG is handling it?
LEDBAT WG ;)
Though for the moment it's not clear how many flows are using LEDBAT - It's been mentioned that a few bittorrent clients are using it but I'm not clear on numbers - also some of them seem to implement a 'variant' called uTP, which I'm guessing isn't going to err on the side of lower throughput and delay....
I noticed that LEDBAT now ships as an available congestion control for TCP on OSX Lion for 'background' flows though again it's unclear how many apps actually use it.
Also not good. We should actively discourage it, IMHO, until this is resolved.
I'm not sure who the "we" is above. Have you tried sharing a link with BitTorrent traffic before and after the clients implemented LEDBAT? You might decide to actively encourage it; VoIP is totally unusable in the "before" configuration and quite usable "after", with uTP.
I've found a lot of people have gotten used to tolerating much more delay from their VoIP client than I'd ever find comfortable; I still use landlines a lot because people on Skype and other VoIP softclients often talk over each other. I cringe when I see engineers blithely interviewing candidates over Skype or a VoIP softclient with a clear one-way mouth-to-ear delay over 200 or 300ms - it's an uncomfortable experience that encourages people to "hold the floor" and leads to awkward pauses/talkover.
100ms is just bad, bad, bad for VoIP on the same links. The only case
where I'd say it's ok is where it knows it's competing with significant TCP flows. If it reverted to 0 queuing delay or close when the channel is not saturated by TCP, then we might be ok (not sure). But I don't think it does that.
My experience suggests otherwise. 100ms is quite a bit better than the alternative of multiple seconds.
Well sure, in the same way that punched in the nose is better than shot through the head. :-)
You can easily use Skype with
modern BitTorrent clients running. Zero queueing delay is an unreasonable target since it can't be accurately measured versus variations caused by wireless MAC, OS, and other factors. Since Windows OS over WLAN is probably one of the main ways that people run either BitTorrent or will run RTCWEB, these variations in delay need to be understood.
Agreed. I assume the LEDBAT WG did VoIP tests? Did they measure MOS? (not that it's a great measure, but it's well-known and understood). Any links to results?
-- Randell Jesup randell-ietf@jesup.org

On 4/11/2012 2:56 AM, Randell Jesup wrote:
How the algorithms will interact is very hard to predict right now. In theory, an algorithm tolerant of a larger delay *should* end up collecting most or all of the bandwidth, but that assumes similar methods of probing for delay.
Yes, the other relevant aspect is the "latecomer advantage" that's been discussed in LEDBAT. Basically, if there's already a standing queue when your flow starts, you may not have a good chance of measuring the "base delay" without a queue. So even if your target delays from base delay are equal, if the base delays aren't, there can be problems.
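A tiny example of how that plays out when the base delay is taken as the minimum observed one-way delay (numbers invented for illustration):

# A flow that starts behind a standing queue learns an inflated "base delay"
# and therefore under-estimates its own queuing delay.
def base_delay(owd_samples_ms):
    return min(owd_samples_ms)

early_flow = [20, 21, 20, 55, 80, 95]   # saw the empty queue: base = 20 ms
late_flow  = [95, 98, 96, 97, 99, 95]   # started behind a ~75 ms standing queue: base = 95 ms

print(base_delay(early_flow))   # 20 -> sees ~75 ms of queuing at 95 ms OWD
print(base_delay(late_flow))    # 95 -> thinks the queue is nearly empty at the same OWD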
My experience suggests otherwise. 100ms is quite a bit better than the alternative of multiple seconds.
Well sure, in the same way that punched in the nose is better than shot through the head. :-)
Agreed!
You can easily use Skype with modern BitTorrent clients running. Zero queueing delay is an unreasonable target since it can't be accurately measured versus variations caused by wireless MAC, OS, and other factors. Since Windows OS over WLAN is probably one of the main ways that people run either BitTorrent or will run RTCWEB, these variations in delay need to be understood.
Agreed. I assume the LEDBAT WG did VoIP tests? Did they measure MOS? (not that it's a great measure, but it's well-known and understood). Any links to results?
To my knowledge, there haven't been specific test results like this presented to the working group. The working group is actually planning to close "real soon now", as the specification is on the way to the IESG and the rest of the discussion has really died off. Such tests would be interesting and useful (maybe even necessary) in thinking about progressing from Experimental to Standards Track, in my opinion. Since this will be part of the deployed base that RTCWEB is sharing links with, it will be good to think about in terms of how we evaluate candidate RTCWEB mechanisms/algorithms. -- Wes Eddy MTI Systems

On 4/11/2012 2:38 PM, Wesley Eddy wrote:
On 4/11/2012 2:56 AM, Randell Jesup wrote:
Agreed. I assume the LEDBAT WG did VoIP tests? Did they measure MOS? (not that it's a great measure, but it's well-known and understood). Any links to results?
To my knowledge, there haven't been specific test results like this presented to the working group. The working group is actually planning to close "real soon now", as the specification is on the way to the IESG and the rest of the discussion has really died off.
The link posted by Piers shows that a 25ms-target LEDBAT flow increased VoIP delay by 35ms. One should assume, barring additional tests, that a 100ms LEDBAT would increase VoIP delay by >100ms. Given a 150ms window for best quality (mouth-to-ear), you've already blown the window. At best you can avoid going *too* far down the slope. Given 50-150ms of capture, encoding, transmission, jitter buffer, decode, and playback delay for a video call, adding 100ms+ of queuing delay will put you *well* down the curve, even with a good, local connection. From that data, I would not say 100ms LEDBAT is fair (as a scavenger protocol) with classic inflexible VoIP traffic, which includes our (rtcweb) traffic. Unfortunately, they also mention that BitTorrent's equivalent to LEDBAT is deployed, and has 100ms as the target.
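The budget arithmetic, using the numbers above (the 150 ms figure is the usual "best quality" mouth-to-ear guideline; the pipeline range is the 50-150 ms quoted above):

# Mouth-to-ear delay budget with a 100 ms standing queue from a scavenger flow.
budget_ms = 150                 # "best quality" mouth-to-ear guideline
ledbat_queue_ms = 100           # standing queue from a 100 ms-target flow
for pipeline_ms in (50, 150):   # capture+encode+transmit+jitter buffer+decode+playback
    total = pipeline_ms + ledbat_queue_ms
    verdict = "at the edge of" if total <= budget_ms else "well over"
    print(f"{pipeline_ms} ms pipeline + {ledbat_queue_ms} ms queue = {total} ms ({verdict} the {budget_ms} ms budget)")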
Such tests would be interesting and useful (maybe even necessary) in thinking about progressing from Experimental to Standards Track, in my opinion.
Since this will be part of the deployed base that RTCWEB is sharing links with, it will be good to think about in terms of how we evaluate candidate RTCWEB mechanisms/algorithms.
We can and should test against LEDBAT, but I think it will just tell us that LEDBAT doesn't play nicely with other delay-based algorithms, especially if they have low-delay targets. -- Randell Jesup randell-ietf@jesup.org

On Tue, Apr 10, 2012 at 4:55 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
On 4/10/2012 10:40 AM, Stefan Holmer wrote:
On Tue, Apr 10, 2012 at 4:02 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
As do I. Also, I *REALLY* worry about the interaction of LEDBAT flows and rtcweb flows... If it targets 100ms queuing delay as the "I'm out of the way of TCP" level, that could seriously negatively impact us (and general VoIP as well, but even more so us), since we'll again get driven into the ground trying to keep the queues drained. It may take longer, but LEDBAT flows tend to be close-to-infinite, I would assume. If it targets 25ms, that's less problematic, I suspect.
I'm not saying I know there will be a problem here, but that I fear there will be since LEDBAT has a non-0 queuing target - it may "poison the waters" for any delay-based algorithm that wants to target a lower number.
Yes, having two algorithms with different delay targets compete should be approximately the same thing as having a delay-based algorithm compete with a loss-based algorithm, although the effects seen may be more or less bad depending on how close the targets are. To be clear, our draft (draft-alvestrand-rtcweb-congestion) has a 0 delay target, which means that it will always let the queues drain before increasing the rate.
Right - which means someone should raise this issue about LEDBAT ASAP. Which WG is handling it?
Gave this some more thought, and the problem might not be as big as I first anticipated. When a receive-side inter-arrival-time-based congestion control algorithm such as RRTCC (I will refer to draft-alvestrand-rtcweb-congestion as Receive-side Real-Time Congestion Control, RRTCC, for now; other suggestions are welcome) competes with LEDBAT, we will see a situation where the LEDBAT flow builds up queues of 100 ms. RRTCC will observe increased inter-arrival times and back off (although it will probably not back off too far). Sooner or later I think we will reach a steady state with 100 ms delay, and at that point RRTCC will act as normal. If the LEDBAT flow stops, RRTCC will see decreasing inter-arrival times for some time and let the queue flush before trying to grab the newly available bandwidth.
-- Randell Jesup randell-ietf@jesup.org
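To make the mechanism under discussion concrete, here is a minimal sketch (Python) of a receive-side delay-gradient detector in the spirit of RRTCC. The threshold and smoothing constants are invented for illustration, and as I understand it the draft describes a Kalman filter over these inter-arrival deltas; a simple exponential smoother stands in for it here, so treat this as a sketch of the idea rather than the draft's algorithm.

# Minimal sketch of a receive-side delay-gradient detector in the spirit
# of RRTCC.  It looks only at *changes* in one-way delay between packets,
# never at the absolute queuing delay, so a standing LEDBAT queue that has
# stopped growing eventually reads as "normal".
class DelayGradientDetector:
    def __init__(self, threshold_ms=1.0, alpha=0.9):
        self.threshold_ms = threshold_ms   # overuse threshold (assumed value)
        self.alpha = alpha                 # smoothing factor (assumed value)
        self.prev_send_ms = None
        self.prev_recv_ms = None
        self.smoothed_delta_ms = 0.0

    def on_packet(self, send_ms, recv_ms):
        """Return 'overuse', 'underuse' or 'normal' for this packet."""
        if self.prev_send_ms is not None:
            # How much longer (or shorter) the network spacing was than the
            # sender spacing: positive deltas mean the queue is growing.
            delta = (recv_ms - self.prev_recv_ms) - (send_ms - self.prev_send_ms)
            self.smoothed_delta_ms = (self.alpha * self.smoothed_delta_ms
                                      + (1.0 - self.alpha) * delta)
        self.prev_send_ms, self.prev_recv_ms = send_ms, recv_ms
        if self.smoothed_delta_ms > self.threshold_ms:
            return "overuse"    # queue growing -> back the send rate off
        if self.smoothed_delta_ms < -self.threshold_ms:
            return "underuse"   # queue draining -> hold, let it flush
        return "normal"         # queue stable -> probe gently for bandwidth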

On 4/12/2012 4:05 AM, Stefan Holmer wrote:
Right - which means someone should raise this issue about LEDBAT ASAP. Which WG is handling it?
Gave this some more thought, and the problem might not be as big as I first anticipated.
When a receive-side inter-arrival-time-based congestion control algorithm such as RRTCC (I will refer to draft-alvestrand-rtcweb-congestion as Receive-side Real-Time Congestion Control, RRTCC, for now; other suggestions are welcome) competes with LEDBAT, we will see a situation where the LEDBAT flow builds up queues of 100 ms. RRTCC will observe increased inter-arrival times and back off (although it will probably not back off too far).
Agreed, RRTCC will back off, though probably not dramatically (depending on how aggressive the slow-start/etc of LEDBAT is).
Sooner or later I think we will reach a steady state with 100 ms delay, and at that point RRTCC will act as normal.
Hmm, not sure I agree here. Both LEDBAT and RRTCC will probe for more bandwidth, and both will see the results of the other's probes. When either increases bandwidth use enough to cause the delay queue to rise, one or both will react; and there's a question of which is more likely to react, how fast, and how re-stabilization will occur. I significantly doubt this is a stable situation; one side or the other will likely drop earlier, drop faster, or stabilize and start probing again more slowly. You can even see aspects of this with LEDBAT competing with itself if the target depths are close. Since RRTCC and LEDBAT have different algorithms and time constants (I assume), it's likely that it will be unstable in a consistent direction, and one will "win". Also they'll act differently in response to minor disruptions from other traffic. If RRTCC tries to (most of the time) stay slightly below the point where delay queues increase, this backoff will leave a small amount of 'clear' bandwidth that will cause queue draining and cause LEDBAT to increase usage. Again leading to instability (and likely LEDBAT monopolizing). It's *possible* that RRTCC will 'win', it's also definitely possible that they'll end up semi-stable where neither goes too far away from a 'fair' share. The one thing I don't predict is stable, fair and predictable sharing of the bandwidth. :-/
If the LEDBAT flow stops, RRTCC will see decreasing inter-arrival time for some time and let the queue flush before trying to grab the newly available bandwidth.
Right; I don't expect any problems there. -- Randell Jesup randell-ietf@jesup.org
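As a rough illustration of why two delay-based controllers with different targets may not settle into a fair split, here is a toy fluid-model simulation (Python). All constants and both simplified reaction rules are invented; neither flow is the actual LEDBAT or RRTCC algorithm.

# Toy fluid model: one bottleneck shared by a LEDBAT-like flow (steers the
# queuing delay toward a 100 ms target) and a gradient-reacting flow that
# backs off whenever the queue grows.  Constants are invented.
CAPACITY_KBPS = 1000.0
DT = 0.1   # seconds per simulation step

def simulate(steps=3000):
    ledbat_kbps, gradient_kbps = 100.0, 100.0
    backlog_kbit, prev_delay_ms = 0.0, 0.0
    for _ in range(steps):
        backlog_kbit = max(0.0, backlog_kbit +
                           (ledbat_kbps + gradient_kbps - CAPACITY_KBPS) * DT)
        delay_ms = 1000.0 * backlog_kbit / CAPACITY_KBPS

        # LEDBAT-like: linear controller around a 100 ms queuing-delay target.
        ledbat_kbps = max(10.0, ledbat_kbps + 2.0 * (100.0 - delay_ms) * DT)

        # Gradient-reacting flow: looks only at the change in delay.
        if delay_ms - prev_delay_ms > 0.5:      # queue growing -> back off
            gradient_kbps = max(10.0, gradient_kbps * 0.9)
        elif delay_ms - prev_delay_ms < -0.5:   # queue draining -> hold
            pass
        else:                                   # stable -> probe additively
            gradient_kbps += 5.0 * DT
        prev_delay_ms = delay_ms
    return ledbat_kbps, gradient_kbps, delay_ms

if __name__ == "__main__":
    # Typically ends with the LEDBAT-like flow holding most of the link and
    # a standing queue near its target, rather than a 50/50 split.
    print(simulate())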

On Thu, Apr 12, 2012 at 10:52 PM, Randell Jesup <randell-ietf@jesup.org>wrote:
On 4/12/2012 4:05 AM, Stefan Holmer wrote:
Right - which means someone should raise this issue about LEDBAT
ASAP. Which WG is handling it?
Gave this some more thought, and the problem might not be as big as I first anticipated.
When a receive-side inter-arrival-time-based congestion control algorithm such as RRTCC (I will refer to draft-alvestrand-rtcweb-congestion as Receive-side Real-Time
Congestion Control, RRTCC, for now; other suggestions are welcome) competes with LEDBAT, we will see a situation where the LEDBAT flow builds up queues of 100 ms. RRTCC will observe increased inter-arrival times and back off (although it will probably not back off too far).
Agreed, RRTCC will back off, though probably not dramatically (depending on how aggressive the slow-start/etc of LEDBAT is).
Sooner or later I think we will reach a steady state with 100 ms
delay, and that point RRTCC will act as normal.
Hmm, not sure I agree here. Both LEDBAT and RRTCC will probe for more bandwidth, and both will see the results of the other's probes. When either increases bandwidth use enough to cause a delay queue rise, one or both will react; and there's a question of which is more likely to react, how fast, and how will re-stabilization occur.
I significantly doubt this is a stable situation; one side or the other will likely drop earlier, drop faster or stabilize and start probing again slower. You can even see aspects of this with LEDBAT competing with itself if the target depths are close. Since RRTCC and LEDBAT have different algorithms and time constants (I assume), it's likely that it will be unstable in a consistent direction, and one will "win". Also they'll act differently in response to minor disruptions from other traffic.
If RRTCC tries to (most of the time) stay slightly below the point where delay queues increase, this backoff will be a small amount of 'clear' bandwidth that will cause queue draining and cause LEDBAT to increase usage. Again leading to instability (and likely LEDBAT monopolizing).
It's *possible* that RRTCC will 'win', it's also definitely possible that they'll end up semi-stable where neither goes too far away from a 'fair' share. The one thing I don't predict is stable, fair and predictable sharing of the bandwidth. :-/
I agree. Because of different averaging and trigger thresholds they will likely not end up in a stable state. It for sure seems unlikely that they will happen to fairly share the bandwidth.
If the LEDBAT flow
stops, RRTCC will see decreasing inter-arrival time for some time and let the queue flush before trying to grab the newly available bandwidth.
Right; I don't expect any problems there.
-- Randell Jesup randell-ietf@jesup.org

On 4/13/2012 3:03 AM, Stefan Holmer wrote:
On Thu, Apr 12, 2012 at 10:52 PM, Randell Jesup <randell-ietf@jesup.org <mailto:randell-ietf@jesup.org>> wrote:
It's *possible* that RRTCC will 'win', it's also definitely possible that they'll end up semi-stable where neither goes too far away from a 'fair' share. The one thing I don't predict is stable, fair and predictable sharing of the bandwidth. :-/
I agree. Because of different averaging and trigger thresholds they will likely not end up in a stable state. It for sure seems unlikely that they will happen to fairly share the bandwidth.
The real problem here is that LEDBAT is designated a "scavenger" protocol that should get out of the way of primary uses (which includes rtcweb traffic). While it's possible that will be the result experimentally, I tend to doubt it, and it's certainly unclear without experiments - and I also doubt a fair sharing will occur. So my guess (which should be checked!) is that LEDBAT and RRTCC are not compatible on the same bottleneck links. This means that manual intervention will be needed to make RRTCC traffic usable; either stopping or bandwidth-limiting any LEDBAT flows.

In theory, if the OS was controlling the LEDBAT flows, it could be asked by an RRTCC (userspace) application to have them get out of the way (which probably means halting them, or virtually so, during RRTCC operation), or it could use send() traffic in some manner as a flag to do so. An example might be applications using LEDBAT in OSX for 'background' download/update that may not have external controls a user could use to suspend transfers during a call. I'm not holding my breath on this one; and it wouldn't help if there's another endpoint behind the same bottleneck using LEDBAT.

The last recourse is the advanced modem/router with classification (again, not something we can do anything about). However, as Jim Gettys will tell you, this may not help you as much if another link is the bottleneck, such as a wifi router behind the modem or primary router.

I think we're going to find that LEDBAT has failed in (what should be) a primary goal, which is to avoid hurting "foreground" traffic, largely because they appear to have only considered TCP flows (from review of their mailing list and specs). Regular inflexible VoIP traffic is likely badly hurt by the current spec (since 100+ms of extra delay will push typical VoIP traffic well into the "bad" part of the MOS slope), and delay-sensing foreground protocols like RRTCC are probably blown out of the water.

If LEDBAT actually is to be a 'scavenger' protocol, it must have some mechanism other than purely queue depth to allow foreground protocols to push it out of the way. It's possible it could stick to queue depth but use very small values, and/or use slower time constants than "foreground" delay-sensing algorithms to guarantee they 'win' when competing with it.

Cross-posting to the LEDBAT list in the hopes that I'm wrong. (For reference, RRTCC is a delay-sensing CC algorithm for RTP traffic, recently discussed at IETF83 in the ICCRG and planned for use in rtcweb clients. RRTCC is a brand-new moniker for the algorithm in Harald Alvestrand's draft, but similar algorithms have been in use (but not standardized) since at least 2004, long predating LEDBAT/uTP.) -- Randell Jesup randell-ietf@jesup.org

Hi Randell, I didn't follow the whole discussion, but regarding LEDBAT we have a TARGET delay of at most 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted to be sure to stay below that. If you choose a delay-based congestion control I don't think your problem is LEDBAT but standard loss-based TCP that will frequently fill up the queue completely. Maybe you don't want to look at the total queuing delay but at the changes in queuing delay...? LEDBAT will keep the delay constant. Mirja On Friday 13 April 2012 11:17:31 Randell Jesup wrote:
On 4/13/2012 3:03 AM, Stefan Holmer wrote:
On Thu, Apr 12, 2012 at 10:52 PM, Randell Jesup <randell-ietf@jesup.org <mailto:randell-ietf@jesup.org>> wrote:
It's *possible* that RRTCC will 'win', it's also definitely possible that they'll end up semi-stable where neither goes too far away from a 'fair' share. The one thing I don't predict is stable, fair and predictable sharing of the bandwidth. :-/
I agree. Because of different averaging and trigger thresholds they will likely not end up in a stable state. It for sure seems unlikely that they will happen to fairly share the bandwidth.
The real problem here is that LEDBAT is designated a "scavenger" protocol that should get out of the way of primary uses (which includes rtcweb traffic). While it's possible that will be the result experimentally, I tend to doubt it and it's certainly unclear without experiments - and I also doubt a fair sharing will occur. So my guess (which should be checked!) is that LEDBAT and RRTCC are not compatible on the same bottleneck links. This means that manual intervention will be needed to enable RRTCC traffic to be usable; either stopping or bandwidth-limiting any LEDBAT flows.
In theory if the OS was controlling the LEDBAT flows it could be asked by an RRTCC (userspace) application to have them get out of the way (which probably means halt or virtually so during RRTCC operation) or to in some manner use send() traffic as a flag to do so. An example might be applications using LEDBAT in OSX for 'background' download/update that may not have external controls that a user could use to suspend transfers during a call. I'm not holding my breath on this one; and it wouldn't help if there's another endpoint behind the same bottleneck using LEDBAT.
The last recourse is the advanced modem/router with classification (again, not something we can do anything about). However, as Jim Gettys will tell you, this may not help you as much if another link is the bottleneck, such as a wifi router behind the modem or primary router.
I think we're going to find that LEDBAT has failed in (what should be) a primary goal, which is to avoid hurting "foreground" traffic, largely because they appear to have only considered TCP flows (from review of their mailing list and specs). Regular inflexible VoIP traffic is likely badly hurt by the current spec (since 100+ms of extra delay will push typical VoIP traffic well into the "bad" part of the MOS slope), and delay-sensing foreground protocols like RRTCC are probably blown out of the water.
If LEDBAT actually is to be a 'scavenger' protocol, it must have some mechanism other than purely queue depth to allow foreground protocols to push it out of the way. It's possible it could stick to queue depth but use very small values, and/or use slower time constants than "foreground" delay-sensing algorithms to guarantee they 'win' when competing with it.
Cross-posting to the LEDBAT list in the hopes that I'm wrong. (For reference, RRTCC is a delay-sensing CC algorithm for RTP traffic, recently discussed at IETF83 in the ICCRG and planned for use in rtcweb clients. RRTCC is a brand-new moniker for the algorithm in Harald Alvestrand's draft, but similar algorithms have been in use (but not standardized) since at least 2004, long predating LEDBAT/uTP.)
-- ------------------------------------------------------------------- Dipl.-Ing. Mirja Kühlewind Institute of Communication Networks and Computer Engineering (IKR) University of Stuttgart, Germany Pfaffenwaldring 47, D-70569 Stuttgart tel: +49(0)711/685-67973 email: mirja.kuehlewind@ikr.uni-stuttgart.de web: www.ikr.uni-stuttgart.de -------------------------------------------------------------------

Hi Mirja - I am wondering whether the mechanism discussed in the following paper could be used to predict the network state, with LEDBAT or even TCP then choosing its aggressiveness based on that state: End-to-End Transmission Control by Modeling Uncertainty about the Network State http://conferences.sigcomm.org/hotnets/2011/papers/hotnetsX-final100.pdf cc-ed to the authors too. Arjuna On 20 April 2012 12:55, Mirja Kuehlewind <mirja.kuehlewind@ikr.uni-stuttgart.de> wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
If you choose a delay-based congestion control I don't think your problem is LEDBAT but standard loss-based TCP that will frequently fill up the queue completely.
Maybe you don't want to look at the total queuing delay but at the changes in queuing delay...? LEDBAT will keep the delay constant.
Mirja
On Friday 13 April 2012 11:17:31 Randell Jesup wrote:
On 4/13/2012 3:03 AM, Stefan Holmer wrote:
On Thu, Apr 12, 2012 at 10:52 PM, Randell Jesup <randell-ietf@jesup.org <mailto:randell-ietf@jesup.org>> wrote:
It's *possible* that RRTCC will 'win', it's also definitely possible that they'll end up semi-stable where neither goes too far away from a 'fair' share. The one thing I don't predict is stable, fair and predictable sharing of the bandwidth. :-/
I agree. Because of different averaging and trigger thresholds they will likely not end up in a stable state. It for sure seems unlikely that they will happen to fairly share the bandwidth.
The real problem here is that LEDBAT is designated a "scavenger" protocol that should get out of the way of primary uses (which includes rtcweb traffic). While it's possible that will be the result experimentally, I tend to doubt it and it's certainly unclear without experiments - and I also doubt a fair sharing will occur. So my guess (which should be checked!) is that LEDBAT and RRTCC are not compatible on the same bottleneck links. This means that manual intervention will be needed to enable RRTCC traffic to be usable; either stopping or bandwidth-limiting any LEDBAT flows.
In theory if the OS was controlling the LEDBAT flows it could be asked by an RRTCC (userspace) application to have them get out of the way (which probably means halt or virtually so during RRTCC operation) or to in some manner use send() traffic as a flag to do so. An example might be applications using LEDBAT in OSX for 'background' download/update that may not have external controls that a user could use to suspend transfers during a call. I'm not holding my breath on this one; and it wouldn't help if there's another endpoint behind the same bottleneck using LEDBAT.
The last recourse is the advanced modem/router with classification (again, not something we can do anything about). However, as Jim Gettys will tell you, this may not help you as much if another link is the bottleneck, such as a wifi router behind the modem or primary router.
I think we're going to find that LEDBAT has failed in (what should be) a primary goal, which is to avoid hurting "foreground" traffic, largely because they appear to have only considered TCP flows (from review of their mailing list and specs). Regular inflexible VoIP traffic is likely badly hurt by the current spec (since 100+ms of extra delay will push typical VoIP traffic well into the "bad" part of the MOS slope), and delay-sensing foreground protocols like RRTCC are probably blown out of the water.
If LEDBAT actually is to be a 'scavenger' protocol, it must have some mechanism other than purely queue depth to allow foreground protocols to push it out of the way. It's possible it could stick to queue depth but use very small values, and/or use slower time constants than "foreground" delay-sensing algorithms to guarantee they 'win' when competing with it.
Cross-posting to the LEDBAT list in the hopes that I'm wrong. (For reference, RRTCC is a delay-sensing CC algorithm for RTP traffic, recently discussed at IETF83 in the ICCRG and planned for use in rtcweb clients. RRTCC is a brand-new moniker for the algorithm in Harald Alvestrand's draft, but similar algorithms have been in use (but not standardized) since at least 2004, long predating LEDBAT/uTP.)
-- ------------------------------------------------------------------- Dipl.-Ing. Mirja Kühlewind Institute of Communication Networks and Computer Engineering (IKR) University of Stuttgart, Germany Pfaffenwaldring 47, D-70569 Stuttgart
tel: +49(0)711/685-67973 email: mirja.kuehlewind@ikr.uni-stuttgart.de web: www.ikr.uni-stuttgart.de -------------------------------------------------------------------

On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end.... Of course, it's even worse between continents, even without broken networks. Not so nice.... We *have* to fix the edge. Drop tail queues of 100ms or bigger have got to go....
If you choose a delay-based congestion control I don't think your problem is LEDBAT but standard loss-based TCP that will frequently fill up the queue completely.
Yup. All it takes is one. And once you eliminate the delays with AQM in the edge of the net, then delay based controls immediately go back to competing. - Jim
Maybe you don't want to look at the total queuing delay but at the changes in queuing delay...? LEDBAT will keep the delay constant.
Mirja
On Friday 13 April 2012 11:17:31 Randell Jesup wrote:
On 4/13/2012 3:03 AM, Stefan Holmer wrote:
On Thu, Apr 12, 2012 at 10:52 PM, Randell Jesup <randell-ietf@jesup.org <mailto:randell-ietf@jesup.org>> wrote:
It's *possible* that RRTCC will 'win', it's also definitely possible that they'll end up semi-stable where neither goes too far away from a 'fair' share. The one thing I don't predict is stable, fair and predictable sharing of the bandwidth. :-/
I agree. Because of different averaging and trigger thresholds they will likely not end up in a stable state. It for sure seems unlikely that they will happen to fairly share the bandwidth. The real problem here is that LEDBAT is designated a "scavenger" protocol that should get out of the way of primary uses (which includes rtcweb traffic). While it's possible that will be the result experimentally, I tend to doubt it and it's certainly unclear without experiments - and I also doubt a fair sharing will occur. So my guess (which should be checked!) is that LEDBAT and RRTCC are not compatible on the same bottleneck links. This means that manual intervention will be needed to enable RRTCC traffic to be usable; either stopping or bandwidth-limiting any LEDBAT flows.
In theory if the OS was controlling the LEDBAT flows it could be asked by an RRTCC (userspace) application to have them get out of the way (which probably means halt or virtually so during RRTCC operation) or to in some manner use send() traffic as a flag to do so. An example might be applications using LEDBAT in OSX for 'background' download/update that may not have external controls that a user could use to suspend transfers during a call. I'm not holding my breath on this one; and it wouldn't help if there's another endpoint behind the same bottleneck using LEDBAT.
The last recourse is the advanced modem/router with classification (again, not something we can do anything about). However, as Jim Gettys will tell you, this may not help you as much if another link is the bottleneck, such as a wifi router behind the modem or primary router.
I think we're going to find that LEDBAT has failed in (what should be) a primary goal, which is to avoid hurting "foreground" traffic, largely because they appear to have only considered TCP flows (from review of their mailing list and specs). Regular inflexible VoIP traffic is likely badly hurt by the current spec (since 100+ms of extra delay will push typical VoIP traffic well into the "bad" part of the MOS slope), and delay-sensing foreground protocols like RRTCC are probably blown out of the water.
If LEDBAT actually is to be a 'scavenger' protocol, it must have some mechanism other than purely queue depth to allow foreground protocols to push it out of the way. It's possible it could stick to queue depth but use very small values, and/or use slower time constants than "foreground" delay-sensing algorithms to guarantee they 'win' when competing with it.
Cross-posting to the LEDBAT list in the hopes that I'm wrong. (For reference, RRTCC is a delay-sensing CC algorithm for RTP traffic, recently discussed at IETF83 in the ICCRG and planned for use in rtcweb clients. RRTCC is a brand-new moniker for the algorithm in Harald Alvestrand's draft, but similar algorithms have been in use (but not standardized) since at least 2004, long predating LEDBAT/uTP.)

On Apr 20, 2012, at 2:23 PM, Jim Gettys wrote:
On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end....
Of course, it's even worse between continents, even without broken networks.
Not so nice....
Not arguing about your point here (I agree that we have to fix the edge), but: LEDBAT is an end-to-end mechanism, so I think that the 100ms reflects the total measured end-to-end delay. Cheers, Michael

On Fri, Apr 20, 2012 at 2:32 PM, Michael Welzl <michawe@ifi.uio.no> wrote:
On Apr 20, 2012, at 2:23 PM, Jim Gettys wrote:
On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end....
Of course, it's even worse between continents, even without broken networks.
Not so nice....
Not argueing about your point here (I agree that we have to fix the edge), but: LEDBAT is an end-to-end mechanism, so I think that the 100ms reflect the total measured end-to-end delay.
Is this really the case? My interpretation is that the target (100 ms) refers to queueing delay, since LEDBAT adjusts its rate based on target - queueing_delay, where queueing_delay = current_delay - base_delay. Could be wrong though.
Cheers, Michael
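For reference, a minimal sketch (Python) of the window update as I read the LEDBAT draft, which is why the target reads as a queuing-delay target rather than a total end-to-end delay; the GAIN and MSS values here are illustrative assumptions:

# Sketch of the LEDBAT congestion-window update (my reading of the draft).
# TARGET is the allowed extra *queuing* delay; base_delay is the smallest
# one-way delay observed, used as an estimate of the uncongested path delay.
TARGET_MS = 100.0   # queuing-delay target (the draft allows at most 100 ms)
GAIN = 1.0          # assumed gain; the draft limits ramp-up to no faster than TCP
MSS = 1500          # bytes

def ledbat_cwnd_update(cwnd_bytes, bytes_newly_acked, current_delay_ms, base_delay_ms):
    queuing_delay = current_delay_ms - base_delay_ms
    off_target = (TARGET_MS - queuing_delay) / TARGET_MS
    # cwnd grows while queuing delay is below TARGET and shrinks above it,
    # so the controller steers the queuing delay toward TARGET.
    return cwnd_bytes + GAIN * off_target * bytes_newly_acked * MSS / cwnd_bytes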

On Apr 20, 2012, at 2:39 PM, Stefan Holmer wrote:
On Fri, Apr 20, 2012 at 2:32 PM, Michael Welzl <michawe@ifi.uio.no> wrote:
On Apr 20, 2012, at 2:23 PM, Jim Gettys wrote:
On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote: Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end....
Of course, it's even worse between continents, even without broken networks.
Not so nice....
Not argueing about your point here (I agree that we have to fix the edge), but: LEDBAT is an end-to-end mechanism, so I think that the 100ms reflect the total measured end-to-end delay.
Is this really the case? I interpret that the target (100 ms) refers to queueing delay, since LEDBAT tries to minimize target - queueing_delay, where queueing_delay = current_delay - base_delay. Could be wrong though.
Sorry, my bad (I think). Cheers, Michael

On 20 Apr 2012, at 13:32, Michael Welzl wrote:
On Apr 20, 2012, at 2:23 PM, Jim Gettys wrote:
On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end....
Of course, it's even worse between continents, even without broken networks.
Not so nice....
Not argueing about your point here (I agree that we have to fix the edge), but: LEDBAT is an end-to-end mechanism, so I think that the 100ms reflect the total measured end-to-end delay.
I think LEDBAT's target is the relative delay (i.e. from queues) - It's not clear how it would measure the total end-to-end delay. Piers.
Cheers, Michael

On 04/20/2012 08:41 AM, Piers O'Hanlon wrote:
On 20 Apr 2012, at 13:32, Michael Welzl wrote:
On Apr 20, 2012, at 2:23 PM, Jim Gettys wrote:
On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that. 100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end....
Of course, it's even worse between continents, even without broken networks.
Not so nice.... Not argueing about your point here (I agree that we have to fix the edge), but: LEDBAT is an end-to-end mechanism, so I think that the 100ms reflect the total measured end-to-end delay.
I think LEDBAT's target is the relative delay (i.e. from queues) - It's not clear how it would measure the total end-to-end delay.
Sorry, I wasn't really clear enough. It may be that LEDBAT will stop adding at 100ms; my point is that even with "traditional" drop-tail queues sized at the "traditional" 100ms level, you have to think about the fact there are two ends. Either/both may be filled, and filled by even a single TCP connection. So the total delays are as I lay out: you can trivially get to 300ms under some circumstances, twice the ITU max recommendation for telephony, and about 10x human perception time for other applications. And the variable bandwidth case makes this all much worse (both Powerboost or its equivalent in broadband connections, and tremendously more so in wireless). Fixed-size, unmanaged, drop-tail queues have *got* to go. To "fix" this disaster, we have to make the edge "smarter", probably with a combination of some sort of "fair" queueing/diffserv *and* AQM. This is the tack that some of us are taking in Dave Taht's CeroWrt home router work. Once we've done that, delay sensitive AQM's stop being effective, since the delays drop to relative zero. So people need to think about how delay sensitive algorithms such as LEDBAT compete with other traffic in the absence of delays that trigger them. Because any fix removes the very delay that they use to try to get out of the way. - Jim

On 04/20/2012 09:13 AM, Jim Gettys wrote:
On 04/20/2012 08:41 AM, Piers O'Hanlon wrote:
On 20 Apr 2012, at 13:32, Michael Welzl wrote:
On Apr 20, 2012, at 2:23 PM, Jim Gettys wrote:
On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that. 100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end....
Of course, it's even worse between continents, even without broken networks.
Not so nice.... Not argueing about your point here (I agree that we have to fix the edge), but: LEDBAT is an end-to-end mechanism, so I think that the 100ms reflect the total measured end-to-end delay.
I think LEDBAT's target is the relative delay (i.e. from queues) - It's not clear how it would measure the total end-to-end delay.
Sorry, I wasn't really clear enough. It may be that LEDBAT will stop adding at 100ms; my point is that even with "traditional" drop-tail queues sized at the "traditional" 100ms level, you have to think about the fact there are two ends. Either/both may be filled, and filled by even a single TCP connection. So the total delays are as I lay out: you can trivially get to 300ms without trying hard under some circumstances, twice the ITU max recommendation for telephony, and about 10x human perception time for other applications.
And the variable bandwidth case makes this all much worse (both Powerboost or equivalent in the broadband connections, and tremendously more so in wireless. Fixed size, unmanaged, drop tail queues have *got* to go.
To "fix" this disaster, we have to make the edge "smarter", probably with a combination of some sort of "fair" queueing/diffserv *and* AQM. This is that tack that some of us are taking in Dave Taht's CeroWrt home router work. Once we've done that, delay sensitive AQM's stop being effective, since the delays drop to relative zero. I meant to say delay-sensitive congestion control algorithms, of course... - Jim
So people need to think about how delay sensitive algorithms such as LEDBAT compete with other traffic in the absence of delays that trigger them. Because any fix removes the very delay that they use to try to get out of the way. - Jim

On Fri, Apr 20, 2012 at 3:13 PM, Jim Gettys <jg@freedesktop.org> wrote:
On 04/20/2012 08:41 AM, Piers O'Hanlon wrote:
On 20 Apr 2012, at 13:32, Michael Welzl wrote:
On Apr 20, 2012, at 2:23 PM, Jim Gettys wrote:
On 04/20/2012 07:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that. 100 ms + 75ms speed of light delay across the US (or equivalent across Europe, for example) + 100ms at the receiving end....
Of course, it's even worse between continents, even without broken networks.
Not so nice.... Not argueing about your point here (I agree that we have to fix the edge), but: LEDBAT is an end-to-end mechanism, so I think that the 100ms reflect the total measured end-to-end delay.
I think LEDBAT's target is the relative delay (i.e. from queues) - It's not clear how it would measure the total end-to-end delay.
Sorry, I wasn't really clear enough. It may be that LEDBAT will stop adding at 100ms; my point is that even with "traditional" drop-tail queues sized at the "traditional" 100ms level, you have to think about the fact there are two ends. Either/both may be filled, and filled by even a single TCP connection. So the total delays are as I lay out: you can trivially get to 300ms without trying hard under some circumstances, twice the ITU max recommendation for telephony, and about 10x human perception time for other applications.
And the variable bandwidth case makes this all much worse (both Powerboost or equivalent in the broadband connections, and tremendously more so in wireless. Fixed size, unmanaged, drop tail queues have *got* to go.
To "fix" this disaster, we have to make the edge "smarter", probably with a combination of some sort of "fair" queueing/diffserv *and* AQM. This is that tack that some of us are taking in Dave Taht's CeroWrt home router work. Once we've done that, delay sensitive AQM's stop being effective, since the delays drop to relative zero.
So people need to think about how delay sensitive algorithms such as LEDBAT compete with other traffic in the absence of delays that trigger them. Because any fix removes the very delay that they use to try to get out of the way.
Agreed, it's indeed important to have a mechanism using signals from the AQM in parallel to the delay sensitive one.
- Jim

On 4/20/2012 7:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
I'm afraid you've misunderstood the 150ms number. That's the "knee" in the curve for mouth-to-ear delay, not congestion delay or even end-to-end packet delay. (And it's more complicated than that; the 150ms number depends on the amount of echo from the far end - with high echo (poor/no ECs), it can be smaller.) And you can get asymmetric effects where the delays in each direction are unequal.

For VoIP communication, the delay budget roughly looks like this:
- frame size - typ 20-30ms -- you have to buffer up one packet to send
- echo cancellation - typ 1-3ms?
- encoder delay - typ 1-2ms?
- algorithmic delay - typ 0-5ms
- packetization, output queuing, etc. - 0-10ms (typically low/nil)
- unloaded (no local-loop queuing) transit time - typically 20-100ms
- queuing delay - ?
- jitter buffer depth - typically 20-60ms
- decoder, time scale modifier (adaptive jitter buffer) - 0-2ms?
- rebuffering into audio frames for drivers - typ 1/2 frame size (5-10ms)
- other random signal processing - 0-2ms?
- output device driver buffering (and reframing in OS frame-size chunks - 16ms typ on Linux for 8KHz audio) - typ 10ms? longer on some OSes!!!
- hardware buffers

This is kind of abstract and people could argue the numbers or the list, but it's to give you an idea that queue depth is far from the only item (though it's the most variable one). Almost all of these are fixed and/or small, except transit time, queuing delay and jitter buffer depth (indirectly affected by queuing). Take off the fixed/small items from 150ms, and you are probably left with 80-100ms (if you're lucky, 50-80 if you're not) for transit, jitter buffer and queuing (and video calls can be a bit worse, with longer frame lengths, more jitter and often longer hardware queues). So, to stay under 150ms on local hops (with a fast access link at both ends), you need moderate jitter and can probably handle some static queuing (<25-50ms). For longer routes and/or slower access links (DSL), there's basically no budget for standing queues, especially as jitter is typically higher.

I guarantee you any VoIP engineer seeing "100ms queuing delay" has their heart sink about conversational quality. Yes, you can have calls. Yes, they *will* suffer "typical" awkward-pause/talkover. You'll probably generally end up in the 200-300ms range mouth-to-ear, which isn't at the ~400-500ms "What do you think? over!" walkie-talkie level, but is uncomfortable. And that's assuming old-style inflexible VoIP UDP streams (G.711, G.722, G.729 (ugh)). Once you add video with BW adaptation or adaptive audio codecs, interacting with LEDBAT gets painful if the VoIP stream uses a delay-sensing protocol (and it really, really wants to).
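A back-of-the-envelope check of the arithmetic above (Python); the component values are rough 'typical' picks from the list, rounded for illustration, not measurements:

# Rough mouth-to-ear budget check using typical values from the list above.
fixed_components_ms = {
    "frame buffering": 20, "echo cancellation": 2, "encoder": 2,
    "algorithmic delay": 3, "packetization/output queuing": 5,
    "decode + time-scale modifier": 2, "rebuffer to audio frames": 8,
    "other signal processing": 2, "output driver/hardware buffers": 10,
}
budget_ms = 150                                # the ITU "knee" for mouth-to-ear delay
fixed_ms = sum(fixed_components_ms.values())   # ~54 ms of mostly fixed cost
remaining_ms = budget_ms - fixed_ms            # left for transit + jitter buffer + queuing
print(f"fixed: {fixed_ms} ms, remaining: {remaining_ms} ms")
# With 20-100 ms of transit and a 20-60 ms jitter buffer, a 100 ms standing
# LEDBAT queue by itself consumes the entire remaining budget.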
If you choose a delay-based congestion control I don't think your problem is LEDBAT but standard loss-based TCP that will frequently fill up the queue completely.
Delay-based can't "beat" a loss-based TCP flow that's long enough, that's true, but luckily most TCP flows are relatively short and/or bursty, especially across the access link.
Maybe you don't want to look at the total queuing delay but at the changes in queuing delay...? LEDBAT will keep the delay constant.
RRTCC and similar algorithms do not use OWD estimates, and so are less sensitive to mis-measurement of the base delay (which the LEDBAT simulations show can cause problems). RRTCC works entirely from deltas in inter-packet delay to determine whether the queue is growing, shrinking, or stable. After a queuing event is observed (the queue growing enough to give a signal from the filter), it drops its bandwidth use and tries to stay down (not probing for extra bandwidth) until the queue has drained and is once again stable. This allows it to generally make close to full use of the available bandwidth with close to zero-length queues. It does, in general, value low queues over 100% bandwidth efficiency. -- Randell Jesup randell-ietf@jesup.org
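Sketched as code (Python), the sender-side reaction described above might look roughly like this; the back-off factor and probe step are invented constants, not values from the draft:

# Sketch of the rate reaction to the detector's signal: back off on overuse,
# hold while the queue drains, probe gently only once the delay is stable.
def update_rate(rate_bps, signal, beta=0.85, probe_step_bps=10_000):
    if signal == "overuse":
        return rate_bps * beta          # back off and stay down
    if signal == "underuse":
        return rate_bps                 # queue still draining: do not probe yet
    return rate_bps + probe_step_bps    # stable again: probe for more bandwidth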

On 04/20/2012 10:48 AM, Randell Jesup wrote:
On 4/20/2012 7:55 AM, Mirja Kuehlewind wrote:
Hi Randell,
I didn't follow the whole discussion but regarding LEDBAT we have a TARGET delay of max. 100ms. That means you can choose a smaller one. We've chosen 100ms as a max as there is an ITU recommendation that 150 ms delay is acceptable for most user voice applications and we wanted for sure stay below that.
I'm afraid you've misunderstood the 150ms number. That's the "knee" in the curve for mouth-to-ear delay, not congestion delay or even end-to-end packet delay. (And it's more complicated than that; the 150ms number depends on the amount of echo from the far end - with high echo (poor/no ECs), it can be smaller.) And you can get asymmetric effects where the delays in each direction are unequal.
For VoIP communication, the delay budget roughly looks like this:
- frame size - typ 20-30ms -- you have to buffer up one packet to send
- echo cancellation - typ 1-3ms?
- encoder delay - typ 1-2ms?
- algorithmic delay - typ 0-5ms
- packetization, output queuing, etc. - 0-10ms (typically low/nil)
- unloaded (no local-loop queuing) transit time - typically 20-100ms
- queuing delay - ?
- jitter buffer depth - typically 20-60ms
- decoder, time scale modifier (adaptive jitter buffer) - 0-2ms?
- rebuffering into audio frames for drivers - typ 1/2 frame size (5-10ms)
- other random signal processing - 0-2ms?
- output device driver buffering (and reframing in OS frame-size chunks - 16ms typ on Linux for 8KHz audio) - typ 10ms? longer on some OSes!!!
- hardware buffers
This is kind of abstract and people could argue the numbers or list, but it's to give you an idea that queue depth is far from the only item (though it's the most variable one). Almost all of these are fixed and/or small, except transit time, queuing delay and jitter buffer depth (indirectly affected by queuing).
Take off the fixed/small items from 150ms, and you are probably left 80-100ms (if you're lucky, 50-80 if you're not) - for transit, jitter buffer and queuing (and video calls can be a bit worse, with longer frame lengths, more jitter and often longer hardware queues). So, to stay under 150ms on local hops (with a fast access link at both ends), you need moderate jitter and can probably handle some static queuing (<25-50ms). For longer routes and/or slower access links (DSL), there's basically no budget for standing queues, especially as jitter is typically higher.
I guarantee you any VoIP engineer seeing "100ms queuing delay" has their heart sink about conversational quality. Yes, you can have calls. Yes, they *will* suffer "typical" awkward-pause/talkover. You'll probably generally end up in the 200-300ms range mouth-ear, which isn't at the ~400-500ms "What do you think? over!" walkie-talkie level, but is uncomfortable.
And that's assuming old-style inflexible VoIP UDP streams (G.711, G.722, G.729 (ugh). Once you add video with BW adaptation or adaptive audio codecs, interacting with LEDBAT gets painful if the VoIP stream uses a delay-sensing protocol (and it really, really wants to).
If you choose a delay-based congestion control I don't think your problem is LEDBAT but standard loss-based TCP that will frequently fill up the queue completely.
Delay-based can't "beat" a loss-based TCP flow that's long enough, that's true, but luckily most TCP flows are relatively short and/or bursty, especially across the access link.
Maybe you don't want to look at the total queuing delay but at the changes in queuing delay...? LEDBAT will keep the delay constant.
RRTCC and similar algorithms do not use OWD estimates, and so are less sensitive to mis-measurement of the base delay (which from the LEDBAT simulations can cause problems). RRTCC works entirely from deltas in inter-packet delay to determine if the queue is growing or shrinking (or stable). After a queuing event is observed (growing queue enough to give a signal from the filter), it drops bandwidth and tries to stay down (not probe for extra bandwidth) until the queue has drained and is once again stable. This allows it to generally make close to full use of bandwidth available with close to 0-length queues. It does generally value low queues over 100% bandwidth efficiency.
Thank you very much for all of this: I've been very aware of all of these effects (I did an audio server in the early '90s called AF, as a few may remember). But spelling it all out in my mail would have obscured the point I was really trying to drive home: the queueing delays are so huge that, even when *ignoring* all the rest of the latency budget, we really have to fix them, as they by themselves are badly unacceptable. We have to do away with the uncontrolled, fixed-size (usually grossly bloated), single-queued edge devices currently in the Internet. And that implies both fancy queuing and AQM that can handle the (often hugely variable) bandwidths we now see at the edge.

My personal target for queuing delays for RT-sensitive traffic is to get them below 10ms at the edge (presuming the broadband technology doesn't also get you: e.g. I measure about 9ms of latency on my Comcast cable link). This is particularly interesting given the 4ms quantisation in 802.11.

Fundamentally, latency is something you *never* want to give away: you can *never* get it back, and there has to be time for other processes to "do their thing" (e.g. echo cancellation, jitter buffer, etc.). Burning *any* latency unnecessarily just makes everything else harder.... - Jim

On Fri, Apr 20, 2012 at 8:03 AM, Jim Gettys <jg@freedesktop.org> wrote:
But espousing it in my mail was going to obscure the point I was really trying to drive home: that the queueing delays are so huge, that even when *ignoring* all the rest of the latency budget, that we really have to fix the queueing delays, as they by themselves are badly unacceptable. We have to do away with the uncontrolled, fixed (usually grossly bloated) sized, single queued, edge devices currently in the Internet. And that implies both fancy queuing and AQM that can handle (often hugely variable) bandwidths we now see in the edge.
Exactly my earlier point. In the above, "we" means the IETF and the rest of the Internet community. We (RTCweb) can't fix this problem because it is way out of scope for the WG. There are some paths that don't have excess delay or jitter, either because they are underloaded or because "fancy queuing" is present and properly configured. RTCweb must define its scope to deliver the best possible quality over these healthy links. For links with excess delay and/or delay jitter, the best we can do is report the problem, and choose a rate that doesn't make the suckage too much worse. But in a fundamental way we (RTCweb) can't directly fix the problem. To the extent that RTCweb-based applications diagnose delay and jitter problems, they will bring market pressure to bear on the bigger problem, so that perhaps it will get fixed. Thanks, --MM-- The best way to predict the future is to create it. - Alan Kay
participants (10)
- Arjuna Sathiaseelan
- Harald Alvestrand
- Jim Gettys
- Matt Mathis
- Michael Welzl
- Mirja Kuehlewind
- Piers O'Hanlon
- Randell Jesup
- Stefan Holmer
- Wesley Eddy