Comments on draft-alvestrand-rtcweb-congestion-01

Hi all,

I have read draft-alvestrand-rtcweb-congestion-01.

High-level comments:

Today, this has been called a "strawman proposal". I agree with this way of looking at it. There are some quite interesting ideas there; however, a lot seems to appear "out of the blue", leaving the reader puzzled about the rationale behind some design choices. I wonder if these design choices are based on some academic papers out there - could citations help? Anyway, instead of trying to discuss the algorithm "in the air", it would be better to first work with some data: how does the mechanism really work in a real-life system? How does it interact with TCP? How sensitive is this mechanism to the typical issues raised for delay-based controls (the latecomer unfairness problem, behavior with very small queues, ..)? How likely is it for the derived rate of the mechanism to fall below the TFRC-function limit, under what conditions will this happen?

As for the writing style, around the middle of section 3.2 you depart from what I would consider a reasonable amount of equations for an RFC. It might be better to point to a document where the derivation is given.

Finer-grain comments:

I got confused by your usage of "frame" - typically, to me, a frame is a packet at the link layer, or a picture in a video. Generally I have the impression that you use the word "frame" to refer to packets, but then towards the end of section 3.2 you talk about "the highest rate at which frames have been captured by the camera". On a side note, that is anyhow inappropriate, I guess, as this mechanism shouldn't necessarily be restricted to video (could be audio too), right?

Same paragraph: "Since our assumption that v(i) should be zero mean WGN is less accurate in some cases" => why?

Section 3.1: "Since the time ts to send a frame of size L over a path with a capacity C is" => this should at least say "roughly", as it omits a lot of (arguably, perhaps irrelevant) factors.

Equation 5: I don't think that m(i) and v(i) have been defined before. (even though it seems obvious what is meant)

Section 3.4: it's confusing to talk about increasing or decreasing "the available bandwidth" - if I got this right, what you do increase or decrease is the *receiver-side estimate* of the available bandwidth.

Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)

I hope that's useful,

Cheers Michael

Hi Michael, On Wed, Mar 28, 2012 at 1:16 AM, Michael Welzl <michawe@ifi.uio.no> wrote:
Hi all,
I have read draft-alvestrand-rtcweb-congestion-01.
High-level comments:
Today, this has been called a "strawman proposal". I agree with this way of looking at it. There are some quite interesting ideas there; however, a lot seems to appear "out of the blue", leaving the reader puzzled about the rationale behind some design choices. I wonder if these design choices are based on some academic papers out there - could citations help? Anyway, instead of trying to discuss the algorithm "in the air", it would be better to first work with some data: how does the mechanism really work in a real-life system? How does it interact with TCP? How sensitive is this mechanism to the typical issues raised for delay-based controls (the latecomer unfairness problem, behavior with very small queues, ..)? How likely is it for the derived rate of the mechanism to fall below the TFRC-function limit, under what conditions will this happen?
We are currently working on producing some numbers on performance. It is very helpful to get suggestions on what kind of measurements and scenarios are most interesting.
As for the writing style, around the middle of section 3.2 you depart from what I would consider a reasonable amount of equations for an RFC. It might be better to point to a document where the derivation is given.
Finer-grain comments:
I got confused by your usage of "frame" - typically, to me, a frame is a packet at the link layer, or a picture in a video. Generally I have the impression that you use the word "frame" to refer to packets, but then towards the end of section 3.2 you talk about "the highest rate at which frames have been captured by the camera". On a side note, that is anyhow inappropriate, I guess, as this mechanism shouldn't necessarily be restricted to video (could be audio too), right?
In the draft a frame refers to a video frame. However, if an extension such as http://tools.ietf.org/html/rfc5450 is used frames can be substituted for packets. I agree that the draft shouldn't mention video, as it might as well act on audio frames.
Same paragraph: "Since our assumption that v(i) should be zero mean WGN is less accurate in some cases" => why?
Assuming that the jitter has a WGN distribution is a very strong assumption. For instance there are situations where a frame *i* is queued up behind another frame *i-1* and in that situation the jitter will not even be white.
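A minimal sketch (with made-up timing parameters) of the effect Stefan describes: when frames share a bottleneck, each frame's one-way delay includes the backlog left by the previous frame, so consecutive delay samples are strongly correlated rather than white.

```python
# Sketch: frames sent every 33 ms through a bottleneck that needs 40 ms
# to serialize each frame. Frame i must wait behind frame i-1, so each
# delay sample carries over the previous queue backlog -- the residual
# v(i) is correlated, not white Gaussian noise.

def one_way_delays(send_interval_ms, service_ms, n):
    delays = []
    link_free_at = 0.0
    for i in range(n):
        t_send = i * send_interval_ms
        start = max(t_send, link_free_at)      # wait behind earlier frames
        link_free_at = start + service_ms
        delays.append(link_free_at - t_send)   # queuing + serialization
    return delays

d = one_way_delays(send_interval_ms=33, service_ms=40, n=10)
# Delay grows deterministically by 7 ms per frame while the queue builds.
```

Each successive sample differs from its predecessor by exactly (service_ms - send_interval_ms), which is about as far from independent zero-mean noise as it gets.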
Section 3.1: "Since the time ts to send a frame of size L over a path with a capacity C is" => this should at least say "roughly", as it omits a lot of (arguably, perhaps irrelevant) factors.
Agreed.
Equation 5: I don't think that m(i) and v(i) have been defined before. (even though it seems obvious what is meant)
Probably makes sense to refer to them from the text. E.g., "Breaking out the mean m(i) of w(i), leaving us with the zero mean process v(i), we get"
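A tiny numeric illustration of the decomposition being proposed (values are made up): subtracting an estimate of the mean m from samples of w leaves a zero-mean residual v.

```python
# Illustration of w(i) = m(i) + v(i): removing an estimate of the mean
# m from the samples w leaves the zero-mean process v.
w = [5.0, 7.0, 4.0, 8.0, 6.0]
m = sum(w) / len(w)            # here a constant mean estimate, m = 6.0
v = [x - m for x in w]         # zero-mean residual v(i)
assert abs(sum(v)) < 1e-9
```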
Section 3.4: it's confusing to talk about increasing or decreasing "the available bandwidth" - if I got this right, what you do increase or decrease is the *receiver-side estimate* of the available bandwidth.
Yes, I agree that should be clarified.
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency break mechanism if no feedback gets through.
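One possible shape for such a mechanism, sketched below with hypothetical names and constants (none of this is from the draft): a watchdog that halves the sending rate whenever no receive report has arrived within 2 * t_max_fb_interval.

```python
# Hypothetical sketch of an "emergency brake": if no receive report has
# arrived within 2 * t_max_fb_interval, assume severe congestion and
# halve the sending rate. Names, constants and the floor rate are
# illustrative only.

T_MAX_FB_INTERVAL = 1.0   # seconds between expected receive reports

class Sender:
    def __init__(self, rate_bps, now=0.0):
        self.rate_bps = rate_bps
        self.last_report = now

    def on_receive_report(self, now):
        self.last_report = now        # feedback arrived: reset watchdog

    def tick(self, now):
        if now - self.last_report > 2 * T_MAX_FB_INTERVAL:
            self.rate_bps = max(self.rate_bps / 2, 10_000)  # brake
            self.last_report = now    # wait a full interval before braking again

s = Sender(rate_bps=1_000_000)
s.tick(now=2.5)                       # 2.5 s of silence > 2.0 s threshold
```

Note that this only bounds the reaction time to the feedback interval, not to the RTT, which is exactly the concern raised in the thread.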
I hope that's useful,
Cheers Michael
_______________________________________________
Rtp-congestion mailing list
Rtp-congestion@alvestrand.no
http://www.alvestrand.no/mailman/listinfo/rtp-congestion

On Mar 29, 2012, at 1:32 PM, Stefan Holmer wrote:
Hi Michael,
On Wed, Mar 28, 2012 at 1:16 AM, Michael Welzl <michawe@ifi.uio.no> wrote: Hi all,
I have read draft-alvestrand-rtcweb-congestion-01.
High-level comments:
Today, this has been called a "strawman proposal". I agree with this way of looking at it. There are some quite interesting ideas there; however, a lot seems to appear "out of the blue", leaving the reader puzzled about the rationale behind some design choices. I wonder if these design choices are based on some academic papers out there - could citations help? Anyway, instead of trying to discuss the algorithm "in the air", it would be better to first work with some data: how does the mechanism really work in a real-life system? How does it interact with TCP? How sensitive is this mechanism to the typical issues raised for delay-based controls (the latecomer unfairness problem, behavior with very small queues, ..)? How likely is it for the derived rate of the mechanism to fall below the TFRC-function limit, under what conditions will this happen?
We are currently working on producing some numbers on performance. It is very helpful to get suggestions on what kind of measurements and scenarios are most interesting.
As a starting point, for interaction with TCP, you could perhaps use at least parts of the test suite: http://caia.swin.edu.au/ngen/tcptestsuite/ And for showing that the mechanism works well and isn't prone to the typical problems of delay-based mechanisms, take a look at the literature on delay-based congestion controls - papers on FAST, for instance, to check what the authors of these papers have investigated. Getting such suggestions is what the ICCRG should be for, IMO - fellow ICCRGers, please make suggestions!
In the draft a frame refers to a video frame. However, if an extension such as http://tools.ietf.org/html/rfc5450 is used frames can be substituted for packets. I agree that the draft shouldn't mention video, as it might as well act on audio frames.
Oh! That really wasn't clear to me - my bet was that you're really talking about packets! Well that should just be clarified.
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency break mechanism if no feedback gets through.
I totally agree - what I meant is, it isn't clear to me if that emergency break is activated in time or too late. It should be in time (i.e. after roughly an RTO). Cheers, Michael

On 03/29/2012 01:55 PM, Michael Welzl wrote:
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency break mechanism if no feedback gets through.
I totally agree - what I meant is, it isn't clear to me if that emergency break is activated in time or too late. It should be in time (i.e. after roughly an RTO).
This seems to be a subject that should be discussed in the context of the circuit-breakers draft: What kind of response time is appropriate for such a mechanism, and why? The current draft proposes that we're OK with a reaction time of around 10 seconds for a mechanism that we expect to use only in "emergencies" (such as total channel failure) - on the basis that getting a faster reaction time would require too much redesign of other parts of the ecosystem. As long as we can assume we get feedback signals in a timely fashion, we can take the absence of signals as a sign for "continue as usual", so the "emergency brake" is something we need when we have a total breakdown of the feedback channel, not for "day-to-day control". I think a discussion on what the timing requirements for an emergency brake are is a Good thing - but it may belong on AVTCORE rather than here.

On Mar 30, 2012, at 10:33 AM, Harald Alvestrand wrote:
On 03/29/2012 01:55 PM, Michael Welzl wrote:
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency break mechanism if no feedback gets through.
I totally agree - what I meant is, it isn't clear to me if that emergency break is activated in time or too late. It should be in time (i.e. after roughly an RTO).
This seems to be a subject that should be discussed in the context of the circuit-breakers draft: What kind of response time is appropriate for such a mechanism, and why?
I think not: we're talking about two kinds of situations here. The context here is: there was congestion, we should react to it within an RTO (and have an "emergency break" to always do that - but maybe that term was misleading). The circuit-breakers draft is about a much more serious condition (such as persistent congestion), warranting a much more serious reaction (terminating the connection). Cheers, Michael

Greetings Michael, Are you saying you need an emergency "brake" (i.e., slowing down) rather than emergency "break" (i.e., termination, with or without a restart later)? Cheers, Lachlan On 30 March 2012 21:17, Michael Welzl <michawe@ifi.uio.no> wrote:
On Mar 30, 2012, at 10:33 AM, Harald Alvestrand wrote:
On 03/29/2012 01:55 PM, Michael Welzl wrote:
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency break mechanism if no feedback gets through.
I totally agree - what I meant is, it isn't clear to me if that emergency break is activated in time or too late. It should be in time (i.e. after roughly an RTO).
This seems to be a subject that should be discussed in the context of the circuit-breakers draft: What kind of response time is appropriate for such a mechanism, and why?
I think not: we're talking about two kinds of situations here. The context here is: there was congestion, we should react to it within an RTO (and have an "emergency break" to always do that - but maybe that term was misleading). The circuit-breakers draft is about a much more serious condition (such as persistent congestion), warranting a much more serious reaction (terminating the connection).
Cheers, Michael
-- Lachlan Andrew Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology, Melbourne, Australia <http://caia.swin.edu.au/cv/landrew> Ph +61 3 9214 4837

On 3/30/2012 6:17 AM, Michael Welzl wrote:
On Mar 30, 2012, at 10:33 AM, Harald Alvestrand wrote:
On 03/29/2012 01:55 PM, Michael Welzl wrote:
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency break mechanism if no feedback gets through.
Per discussion, "emergency brake"
I totally agree - what I meant is, it isn't clear to me if that emergency break is activated in time or too late. It should be in time (i.e. after roughly an RTO). This seems to be a subject that should be discussed in the context of the circuit-breakers draft: What kind of response time is appropriate for such a mechanism, and why?
I think not: we're talking about two kinds of situations here. The context here is: there was congestion, we should react to it within an RTO (and have an "emergency break" to always do that - but maybe that term was misleading). The circuit-breakers draft is about a much more serious condition (such as persistent congestion), warranting a much more serious reaction (terminating the connection).
What's the RTO in this case, since we're talking UDP media streams? TCP RTO? -- Randell Jesup randell-ietf@jesup.org

On Apr 7, 2012, at 7:19 PM, Randell Jesup wrote:
On 3/30/2012 6:17 AM, Michael Welzl wrote:
On Mar 30, 2012, at 10:33 AM, Harald Alvestrand wrote:
On 03/29/2012 01:55 PM, Michael Welzl wrote:
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency break mechanism if no feedback gets through.
Per discussion, "emergency brake"
yep (sorry for that)
I totally agree - what I meant is, it isn't clear to me if that emergency break is activated in time or too late. It should be in time (i.e. after roughly an RTO). This seems to be a subject that should be discussed in the context of the circuit-breakers draft: What kind of response time is appropriate for such a mechanism, and why?
I think not: we're talking about two kinds of situations here. The context here is: there was congestion, we should react to it within an RTO (and have an "emergency break" to always do that - but maybe that term was misleading). The circuit-breakers draft is about a much more serious condition (such as persistent congestion), warranting a much more serious reaction (terminating the connection).
What's the RTO in this case, since we're talking UDP media streams? TCP RTO?
Something in the order of that is what I had in mind. An RTT is the control interval that we can and should act upon, and of course you'd rather have an estimate that is averaged, and you really want to avoid having false positives from outliers, so you want to give it a reasonable safety margin. The TCP RTO has all that. Note I'm not "religious" about this - I think a mechanism that would react e.g. an RTT late could still lead to a globally "okay" behavior (but that would then be worth a closer investigation). My main point is that it should be around an RTO, or maybe a bit more, but not completely detached from RTT measurements. Cheers, Michael
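For concreteness, the TCP RTO Michael alludes to (an averaged RTT estimate with a variance-based safety margin) is defined in RFC 6298; a sketch:

```python
# Sketch of the RFC 6298 RTO estimator: a smoothed RTT (SRTT) plus a
# safety margin of four times the RTT variation (RTTVAR) -- i.e., an
# averaged estimate with headroom against outliers, as described above.

ALPHA, BETA = 1 / 8, 1 / 4

def update_rto(srtt, rttvar, rtt_sample):
    if srtt is None:                       # first measurement (RFC 6298 s2.2)
        srtt, rttvar = rtt_sample, rtt_sample / 2
    else:
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt_sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * rtt_sample
    rto = max(1.0, srtt + 4 * rttvar)      # RFC 6298 clamps RTO to >= 1 s
    return srtt, rttvar, rto

srtt = rttvar = None
for sample in [0.100, 0.110, 0.105]:       # RTT samples in seconds
    srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
```

Note the 1-second floor: for the short RTTs typical of interactive media, "roughly an RTO" in practice means the clamp, not a small multiple of the RTT.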

On 4/7/2012 1:37 PM, Michael Welzl wrote:
On Apr 7, 2012, at 7:19 PM, Randell Jesup wrote:
What's the RTO in this case, since we're talking UDP media streams? TCP RTO?
Something in the order of that is what I had in mind. An RTT is the control interval that we can and should act upon, and of course you'd rather have an estimate that is averaged, and you really want to avoid having false positives from outliers, so you want to give it a reasonable safety margin. The TCP RTO has all that.
Note I'm not "religious" about this - I think a mechanism that would react e.g. an RTT late could still lead to a globally "okay" behavior (but that would then be worth a closer investigation). My main point is that it should be around an RTO, or maybe a bit more, but not completely detached from RTT measurements.
One thing I worry about for media streams if you mandate RTT-timeframe reaction (which usually means RTT-timeframe distance between reverse-path reporting) is the amount of potential additional reverse-path traffic they can engender. This is especially relevant in UDP land, where you don't have ACKs, and the available additional path of RTCP is bandwidth-limited itself, with rules that might not allow you to send immediately. This could be an especially relevant constraint on low-RTT channels, especially if they're also low-bandwidth. IIRC, the same issue/impact was flagged in TFRC, but I think the issue may be worse here.
In terms of global stability, an aspect of these sorts of algorithms that helps is that even if they may be a little slow to react(*) in a downward direction, they are also typically slow to re-take bandwidth. If they also use the slope of the delay change to estimate bandwidth, they can also be more accurate in their reaction to bandwidth availability changes, so when they do react they tend to avoid undershooting, and don't overshoot by much the way TCP may.
(*) It's important to note that in normal operation, a delay-sensing algorithm may react much *faster* than TCP even if the reaction delay is many RTTs - because the clock starts counting for a delay-sensing algorithm when the bottleneck appears, not when the buffer overflows at the bottleneck. A delay-sensing algorithm that isn't faced with giant changes in queue depth will almost always react faster than TCP, IMHO. The remaining question is what happens when the bottleneck faces a massive sudden cross-flow that suddenly saturates the buffer.
As mentioned, if slope is used, the delay-sensing algorithm may well cut bandwidth faster than TCP would, even if it reacts a little later in this case; doubly so if the algorithm includes losses as an indication that not only has delay increased at the slope rate, but that on top of that a buffer has overflowed, and so losses should cause it to increase the estimate of the bandwidth change.
-- Randell Jesup randell-ietf@jesup.org
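The slope-based sensing Randell describes can be sketched as a least-squares fit over recent one-way delay samples, flagging overuse when the fitted slope exceeds a threshold (the window and threshold below are illustrative, not from the draft):

```python
# Hypothetical sketch of slope-based delay sensing: fit a least-squares
# line to a window of one-way delay samples (in ms) and flag overuse
# when the slope (added delay per sample) exceeds a threshold.

def delay_slope(samples):
    n = len(samples)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(samples) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, samples))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

OVERUSE_SLOPE = 1.0   # ms of delay growth per frame (illustrative)

rising = [40, 47, 54, 61, 68]    # queue building: +7 ms per frame
flat = [40, 41, 40, 39, 40]      # stable queue: slope near zero
```

The fit reacts to delay *growth*, so it can fire while the queue is still filling, before any loss occurs; that is the "clock starts when the bottleneck appears" advantage noted above.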

Hi Randell, Thanks for your many comments! Catching up now: On Apr 8, 2012, at 8:33 AM, Randell Jesup wrote:
On 4/7/2012 1:37 PM, Michael Welzl wrote:
On Apr 7, 2012, at 7:19 PM, Randell Jesup wrote:
What's the RTO in this case, since we're talking UDP media streams? TCP RTO?
Something in the order of that is what I had in mind. An RTT is the control interval that we can and should act upon, and of course you'd rather have an estimate that is averaged, and you really want to avoid having false positives from outliers, so you want to give it a reasonable safety margin. The TCP RTO has all that.
Note I'm not "religious" about this - I think a mechanism that would react e.g. an RTT late could still lead to a globally "okay" behavior (but that would then be worth a closer investigation). My main point is that it should be around an RTO, or maybe a bit more, but not completely detached from RTT measurements.
One thing I worry about for media streams if you mandate RTT-timeframe reaction (which usually means RTT-timeframe distance between reverse-path reporting) is the amount of potential additional reverse-path traffic they can engender.
This is especially relevant in UDP land, where you don't have ACKs, and the available additional path of RTCP is bandwidth-limited itself with rules that might not allow you to send immediately. This could be an especially relevant constraint on low-RTT channels, especially if they're also low-bandwidth. IIRC, the same issue/impact was flagged in TFRC, but I think the issue may be worse here.
I'd argue that this rule is simply stupid when applied to congestion control. I understand the desire to limit feedback signaling, but doing congestion control right simply requires ACKs as a function of the RTT. One way out of this is to run RTP not over UDP but over a transport that uses the right amount of ACKs. Or call the UDP-based congestion control scheme + encapsulation a new protocol, with new rules, and run RTP "over" it, then the rules are gone :-)
In terms of global stability, an aspect of these sorts of algorithms that helps is that even if they may be a little slow to react(*) in a downward direction, they are also typically slow to re-take bandwidth. If they also use the slope of the delay change to estimate bandwidth, they can also be more accurate in their reaction to bandwidth availability changes, so when they do react they tend to avoid undershooting, and don't overshoot by much the way TCP may.
(*) It's important to note that in normal operation, a delay-sensing algorithm may react much *faster* than TCP even if the reaction delay is many RTT - because the clock starts counting for a delay-sensitive algorithm when the bottleneck appears, not when the buffer overflows at the bottleneck. A delay sensitive algorithm that isn't faced with giant changes in queue depth will almost always react faster than TCP IMHO.
... I'd generally agree, but all of that totally depends on the ACK frequency.
The remaining question is what happens when the bottleneck faces a massive sudden cross-flow that suddenly saturates the buffer. As mentioned, if slope is used the delay-sensing algorithm may well cut bandwidth faster than TCP would, even if it reacts a little later in this case; doubly-so if the algorithm includes losses as an indication that not only has delay increased at the slope rate, but that on top of that a buffer has overflowed, and so losses should cause it to increase the estimate of the bandwidth change.
As above, I think that we should not let ourselves be limited by these RTCP generation rules. Alternatively, we can go for Matt's suggestion and assume that traffic is by some means isolated from everything else, then this doesn't play a role as we only compete against our own traffic. Cheers, Michael

On 04/07/2012 07:37 PM, Michael Welzl wrote:
What's the RTO in this case, since we're talking UDP media streams? TCP RTO?
Something in the order of that is what I had in mind. An RTT is the control interval that we can and should act upon, and of course you'd rather have an estimate that is averaged, and you really want to avoid having false positives from outliers, so you want to give it a reasonable safety margin. The TCP RTO has all that.
Forgive me for being dense, but what is the metric by which the RTT is the right interval to act upon?

I know that it's the shortest interval we CAN react upon, because it's impossible for signals to travel source -> destination -> source in less time than the RTT, but what's the logic that says it's the interval we MUST act upon?

In particular, for the "bad" case of a unidirectional media stream that would normally use the AVP profile, with RTCP packets going back every 5 seconds, there's a real engineering cost in requiring that reactions happen on the RTT timescale, especially if the algorithm is required to react to the absence of feedback signals.

I would like to have at least some idea of the benefit we're gaining from a short reaction time in order to evaluate what the cost/benefit tradeoff is.

Harald
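The "every 5 seconds" Harald mentions comes from the RFC 3550 RTCP scheduling rule; a simplified sketch (omitting the randomization and sender/receiver bandwidth split of the full algorithm) shows why a small session is pinned to the floor:

```python
# Simplified sketch of the RFC 3550 RTCP interval: each member reports
# roughly every members * avg_rtcp_size / (5% of session bandwidth)
# seconds, floored at a 5-second minimum -- so a two-party call reports
# only every ~5 s regardless of RTT.

def rtcp_interval(session_bw_bps, avg_rtcp_bytes, members, t_min=5.0):
    rtcp_bw = 0.05 * session_bw_bps / 8          # 5% share, in bytes/s
    return max(t_min, members * avg_rtcp_bytes / rtcp_bw)

# Two-party 1 Mbit/s call, 120-byte compound RTCP packets:
t = rtcp_interval(1_000_000, 120, 2)             # floor wins: 5.0 s
```

The bandwidth share would allow far more frequent reports in this case; it is the fixed minimum interval, not the 5% budget, that makes AVP feedback slow on small sessions (AVPF relaxes exactly this).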

On Apr 10, 2012, at 7:13 AM, Harald Alvestrand wrote:
On 04/07/2012 07:37 PM, Michael Welzl wrote:
What's the RTO in this case, since we're talking UDP media streams? TCP RTO?
Something in the order of that is what I had in mind. An RTT is the control interval that we can and should act upon, and of course you'd rather have an estimate that is averaged, and you really want to avoid having false positives from outliers, so you want to give it a reasonable safety margin. The TCP RTO has all that.
Forgive me for being dense, but what is the metric by which the RTT is the right interval to act upon?
I know that it's the shortest interval we CAN react upon, because it's impossible for signals to travel source -> destination -> source in less time than the RTT, but what's the logic that says it's the interval we MUST act upon?
In particular, for the "bad" case of a unidirectional media stream that would normally use the AVP profile, with RTCP packets going back every 5 seconds, there's a real engineering cost in requiring that reactions happen on the RTT timescale, especially if the algorithm is required to react to the absence of feedback signals.
I would like to have at least some idea of the benefit we're gaining from a short reaction time in order to evaluate what the cost/benefit tradeoff is.
Well okay, my statement here was handwavery. Let's put it this way, I imagine you causing problems with competing TCPs if you react slower, and I'd at least like to see this potential problem investigated (and that's how you can figure out the trade-off). On the other hand, as I keep saying: if you constantly receive SCTP ACKs from a parallel RTCweb data transfer, you should make use of that other feedback. Cheers, Michael

On 4/10/2012 1:54 AM, Michael Welzl wrote:
On Apr 10, 2012, at 7:13 AM, Harald Alvestrand wrote:
I would like to have at least some idea of the benefit we're gaining from a short reaction time in order to evaluate what the cost/benefit tradeoff is.
Well okay, my statement here was handwavery. Let's put it this way, I imagine you causing problems with competing TCPs if you react slower, and I'd at least like to see this potential problem investigated (and that's how you can figure out the trade-off).
Please see my discussion elsewhere which shows that except for the huge-sudden-congestion case a delay-sensing algorithm such as this should sense and react to imminent congestion earlier than a loss-based algorithm such as TCP would. (Otherwise it wouldn't be doing a very good job at avoiding delay!)
On the other hand, as I keep saying: if you constantly receive SCTP ACKs from a parallel RTCweb data transfer, you should make use of that other feedback.
While there are use-cases for rtcweb which involve "infinite" data sources (or at least close enough), by far most uses of the data channel in rtcweb are short, bursty, or paced at relatively low bandwidth. There are a few use-cases where large background transfers are done (largish file transfer), but most of those are typically a short burst in a longer call/connection. So it's a bad idea to plan on a "constant stream of SCTP ACKs"...

If you have some SCTP traffic, then yes, it would be nice if that helped your algorithm, but I think you'll find that the existing media streams will typically be far more frequent (and reliable). In a "normal" video chat, you'll have 10-30 frames/second of video (of 1 to (say) 6 packets per frame), plus typically 50 audio packets per second - in each direction, each carrying timing information. So in normal situations, there's plenty of timing traffic to use from media streams.

There are cases where there may be one-way traffic, and knowledge of the idle direction's bandwidth will degrade - but that's ok; if the traffic starts up again it should see no delay (barring TCP causing standing queues). At most the algorithm should revert back to the starting state (faster adaptation, perhaps the start-point, though I'd want the restart point to be based on our last good estimate, etc.).

So, does that answer your concerns? -- Randell Jesup randell-ietf@jesup.org

Sorry for the delay - I just noticed that I never answered this one: On 4/10/12 2:04 PM, Randell Jesup wrote:
On 4/10/2012 1:54 AM, Michael Welzl wrote:
On Apr 10, 2012, at 7:13 AM, Harald Alvestrand wrote:
I would like to have at least some idea of the benefit we're gaining from a short reaction time in order to evaluate what the cost/benefit tradeoff is.
Well okay, my statement here was handwavery. Let's put it this way, I imagine you causing problems with competing TCPs if you react slower, and I'd at least like to see this potential problem investigated (and that's how you can figure out the trade-off).
Please see my discussion elsewhere which shows that except for the huge-sudden-congestion case a delay-sensing algorithm such as this should sense and react to imminent congestion earlier than a loss-based algorithm such as TCP would. (Otherwise it wouldn't be doing a very good job at avoiding delay!) We're in agreement here.
On the other hand, as I keep saying: if you constantly receive SCTP ACKs from a parallel RTCweb data transfer, you should make use of that other feedback.
While there are use-cases for rtcweb which involve "infinite" data sources (or at least close enough), by far most uses in rtcweb of the data channel are short, bursty, or paced at relatively low bandwidth. There are a few use-cases where large background transfers are done (largish file transfer), but most of those are typically a short burst in a longer call/connection. So it's a bad idea to plan on a "constant stream of SCTP ACKs"...
No, not rely on that always being there, but make use of that when it's there - I was thinking of the scenario you get when you chat with someone and send that person some files (private pictures, ..) in the background in parallel.
If you have some SCTP traffic, then yes it would be nice if that helped your algorithm, but I think you'll find that the existing media streams will typically be far more frequent (and reliable). In a "normal" video chat, you'll have 10-30 frames/second of video (of 1 to (say) 6 packets per You're talking about RTCweb as if that would be a very common thing out there already :-) who knows how people will want to use the data channel in the future? Maybe it becomes a very popular means to send lots of files across...
frame), plus typically 50 audio packets per second - in each direction, each carrying timing information. So in normal situations, there's plenty of timing traffic to use from media streams. There are cases where there may be one-way traffic, and knowledge of the idle direction's bandwidth will degrade - but that's ok, if the traffic starts up again is should see no delay (barring TCP causing standing queues). At most the algorithm should revert back to the starting state (faster adaptation, perhaps start-point, though I'd want the restart point to be based on our last good estimate, etc).
So, does that answer your concerns?

I'm not sure my concerns about what is now called RRTCC are answered, but I think that we pretty much agree design-wise anyway. I'm now just a bit confused by the mix of 1) arguments against my suggestion to use only one congestion control instance for everything (ideally by making all RTCweb flows streams of one SCTP association), 2) some statements saying "yeah, this was the plan anyway".
Maybe it's because I don't know RTCweb well enough yet, or don't understand who does what and who plans what... and maybe it's also because I proposed too many intertwined things in one go (all-over-SCTP *and* one congestion control instance for everything). Whatever. Anyway, I think it's an interesting discussion :-) Cheers, Michael

On 04/16/2012 03:48 PM, Michael Welzl wrote:
I'm not sure my concerns about what is now called RRTCC are answered, but I think that we pretty much agree design-wise anyway. I'm now just a bit confused by the mix of 1) arguments against my suggestion to use only one congestion control instance for everything (ideally by making all RTCweb flows streams of one SCTP association), 2) some statements saying "yeah, this was the plan anyway".
Maybe it's because I don't know RTCweb well enough yet, or don't understand who does what and who plans what... and maybe it's also because I proposed too many intertwined things in one go (all-over-SCTP *and* one congestion control instance for everything). Whatever. Anyway, I think it's an interesting discussion :-)

Very possible - my take from the RTCWEB side is that media over anything but SRTP is just not going to happen this year (or perhaps any year), for reasons having to do with the installed base of a lot of stuff, while having a bridge for congestion control information to flow between the "media side" and the "sctp side" can be implemented in each implementation, and is such a no-brainer that everyone agrees we should Just Do It.
So since your note was "all in one go", it's no surprise that people seem to be saying yes AND no.... Harald

Let me second Harald's point by making an observation: Your (Michael's) solution has two subtasks: proper real time Congestion Control for a single flow and multiplexing multiple payload types onto a single flow.

The short term agenda here is the first problem: proper real time CC for a single flow. There is (probably) 100% agreement that we need some solution to the 2nd problem (or at least some similar problem). But most of us do not want to put solving the second problem in the critical path for the first problem, which is essentially orthogonal.

Note that the first problem might be concurrently solved and tested for both SRTP and RRTCC, since we all believe the algorithm might be implemented in either.

Does this make sense?

Thanks, --MM-- The best way to predict the future is to create it. - Alan Kay

On Mon, Apr 16, 2012 at 7:25 AM, Harald Alvestrand <harald@alvestrand.no> wrote:
On 04/16/2012 03:48 PM, Michael Welzl wrote:
I'm not sure my concerns about what is now called RRTCC are answered, but I think that we pretty much agree design-wise anyway. I'm now just a bit confused by the mix of 1) arguments against my suggestion to use only one congestion control instance for everything (ideally by making all RTCweb flows streams of one SCTP association), 2) some statements saying "yeah, this was the plan anyway".
Maybe it's because I don't know RTCweb well enough yet, or don't understand who does what and who plans what... and maybe it's also because I proposed too many intertwined things in one go (all-over-SCTP *and* one congestion control instance for everything). Whatever. Anyway, I think it's an interesting discussion :-)
Very possible - my take from the RTCWEB side is that media over anything but SRTP is just not going to happen this year (or perhaps any year), for reasons having to do with the installed base of a lot of stuff, while having a bridge for congestion control information to flow between the "media side" and the "sctp side" can be implemented in each implementation, and is such a no-brainer that everyone agrees we should Just Do It.
So since your note was "all in one go", it's no surprise that people seem to be saying yes AND no....
Harald
_______________________________________________ Rtp-congestion mailing list Rtp-congestion@alvestrand.no http://www.alvestrand.no/mailman/listinfo/rtp-congestion

It totally does. Thanks!!

Cheers, Michael

On Apr 18, 2012, at 9:21 PM, Matt Mathis wrote:
Let me second Harald's point by making an observation: Your (Michael's) solution has two subtasks: proper real time Congestion Control for a single flow and multiplexing multiple payload types onto a single flow.
The short term agenda here is the first problem: proper real time CC for a single flow.
There is (probably) 100% agreement that we need some solution to the 2nd problem (or at least some similar problem).
But most of us do not want to put solving the second problem in the critical path for the first problem, which is essentially orthogonal.
Note that the first problem might be concurrently solved and tested for both SRTP and RRTCC, since we all believe the algorithm might be implemented in either.
Does this make sense?
Thanks, --MM-- The best way to predict the future is to create it. - Alan Kay

Hi Stefan,

On 29 Mar 2012, at 13:32, Stefan Holmer wrote:
Hi Michael,
On Wed, Mar 28, 2012 at 1:16 AM, Michael Welzl <michawe@ifi.uio.no> wrote:

Hi all,
I have read draft-alvestrand-rtcweb-congestion-01.
High-level comments:
Today, this has been called a "strawman proposal". I agree with this way of looking at it. There are some quite interesting ideas there; however, a lot seems to appear "out of the blue", leaving the reader puzzled about the rationale behind some design choices. I wonder if these design choices are based on some academic papers out there - could citations help? Anyway, instead of trying to discuss the algorithm "in the air", it would be better to first work with some data: how does the mechanism really work in a real-life system? How does it interact with TCP? How sensitive is this mechanism to the typical issues raised for delay-based controls (the latecomer unfairness problem, behavior with very small queues, ..)? How likely is it for the derived rate of the mechanism to fall below the TFRC-function limit, under what conditions will this happen?
We are currently working on producing some numbers on performance. It is very helpful to get suggestions on what kind of measurements and scenarios are most interesting.
Given that SCTP is being proposed for data in RTCWeb, performance against SCTP's default TCP-like algorithm would be relevant. It would seem that, since it is loss-based, it will probably lead to full queues and maximum latency, so some alternative probably needs to be sought for SCTP.

I had a question about the Kalman filter being selected as a suitable filter - it's not clear whether the underlying tracked quantities may be modelled in a linear fashion, which is what is expected for the standard Kalman filter.

Piers
As for the writing style, around the middle of section 3.2 you depart from what I would consider a reasonable amount of equations for an RFC. It might be better to point to a document where the derivation is given.
Finer-grain comments:
I got confused by your usage of "frame" - typically, to me, a frame is a packet at the link layer, or a picture in a video. Generally I have the impression that you use the word "frame" to refer to packets, but then towards the end of section 3.2 you talk about "the highest rate at which frames have been captured by the camera". On a side note, that is anyhow inappropriate, I guess, as this mechanism shouldn't necessarily be restricted to video (could be audio too), right?
In the draft a frame refers to a video frame. However, if an extension such as http://tools.ietf.org/html/rfc5450 is used frames can be substituted for packets. I agree that the draft shouldn't mention video, as it might as well act on audio frames.
Same paragraph: "Since our assumption that v(i) should be zero mean WGN is less accurate in some cases" => why?
Assuming that the jitter has a WGN distribution is a very strong assumption. For instance there are situations where a frame i is queued up behind another frame i-1 and in that situation the jitter will not even be white.
Section 3.1: "Since the time ts to send a frame of size L over a path with a capacity C is" => this should at least say "roughly", as it omits a lot of (arguably, perhaps irrelevant) factors.
Agreed.
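The "roughly" can be made concrete with a worked example (the numbers are illustrative, not from the draft): serializing one frame of L = 1500 bytes over a path of capacity C = 1 Mbit/s takes

```latex
t_s \approx \frac{L}{C} = \frac{1500 \times 8\,\text{bit}}{10^{6}\,\text{bit/s}} = 12\,\text{ms}
```

ignoring propagation delay, link-layer overhead, and cross traffic - exactly the factors a "roughly" would acknowledge.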
Equation 5: I don't think that m(i) and v(i) have been defined before. (even though it seems obvious what is meant)
Probably makes sense to refer to them from the text. E.g., "Breaking out the mean m(i) of w(i), leaving us with the zero mean process v(i), we get"
Section 3.4: it's confusing to talk about increasing or decreasing "the available bandwidth" - if I got this right, what you do increase or decrease is the *receiver-side estimate* of the available bandwidth.
Yes, I agree that should be clarified.
Section 4: par 3, "This algorithm is run every time a receive report arrives..." => so in case of severe congestion, when nothing else arrives, this algorithm waits for 2 * t_max_fb_interval... so can we rely on the mechanism to react to this congestion after roughly an RTO or not? (sounds like not) Is that bad? (I guess)
There is a need for some emergency brake mechanism if no feedback gets through.
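Such an emergency brake could look roughly like the following (a hypothetical sketch; the halving policy, the rate floor, and the use of 2 * t_max_fb_interval as the timeout are assumptions, not taken from the draft):

```python
import time

class FeedbackWatchdog:
    """Cut the send rate if no receive report arrives in time.

    If feedback is absent for more than 2 * t_max_fb_interval, assume
    severe congestion and halve the rate, once per silent interval,
    down to a floor (values here are illustrative assumptions).
    """
    def __init__(self, initial_rate_bps, t_max_fb_interval, min_rate_bps=10_000):
        self.rate = initial_rate_bps
        self.timeout = 2 * t_max_fb_interval
        self.min_rate = min_rate_bps
        self.last_feedback = time.monotonic()

    def on_receive_report(self, now=None):
        # A receive report arrived: the normal algorithm runs; reset the timer.
        self.last_feedback = now if now is not None else time.monotonic()

    def poll(self, now=None):
        """Call periodically from the sender loop; returns the current rate."""
        now = now if now is not None else time.monotonic()
        while now - self.last_feedback > self.timeout:
            self.rate = max(self.min_rate, self.rate // 2)
            self.last_feedback += self.timeout  # one cut per silent interval
            if self.rate == self.min_rate:
                break
        return self.rate
```

With, say, t_max_fb_interval = 0.5 s, a sender that hears nothing for just over two silent intervals would have halved its rate twice - addressing the "waits for 2 * t_max_fb_interval" concern above with a bounded worst-case reaction.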
I hope that's useful,
Cheers Michael

My apologies for not chiming in earlier; since the IETF meeting I've been very busy with post-IETF work and needed to find some time to dive into the flurry of messages here. I'm really glad our proposals have generated such interest.

On 3/27/2012 7:16 PM, Michael Welzl wrote:
Hi all,
I have read draft-alvestrand-rtcweb-congestion-01.
High-level comments:
Today, this has been called a "strawman proposal". I agree with this way of looking at it. There are some quite interesting ideas there; however, a lot seems to appear "out of the blue", leaving the reader puzzled about the rationale behind some design choices. I wonder if these design choices are based on some academic papers out there - could citations help? Anyway, instead of trying to discuss the algorithm "in the air", it would be better to first work with some data: how does the mechanism really work in a real-life system?
I know the Google folk are working on numbers. I'll tell you (per the archive here) that I designed and implemented a very similar delay-sensing algorithm back in 2004, which has been in use on the internet since that time (in residential videophones).

The primary difference in our algorithm was using the slope of the delay increase (the filter output in Google's algorithm) to approximate the amount by which the bottleneck is over-bandwidth (packets arriving 5% slow means something very different than packets arriving 50% slow). (The updated draft from Google might incorporate this idea now.) I also didn't have a TFRC-based limit in mine, though it also watches packet loss.

I also had a heuristic to watch for missing packets which were "fishy" due to an abrupt reduction of delay in the following packet(s) (a sign that the two packets had sat in a queue after the other packet had been lost). It's not definitive, and I didn't react unless I saw two of them within a few seconds, but it definitely helped overall to keep delay down. Overall, my algorithm worked similarly but was generally more heuristic.
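The slope idea Randell describes can be sketched as follows (a hypothetical illustration, not his videophone code or the draft's algorithm): fit a line to one-way delay as a function of send time, and treat the slope as the fraction by which the sender exceeds the bottleneck rate.

```python
def overuse_fraction(arrivals, sends):
    """Estimate overuse from the slope of one-way delay growth.

    arrivals/sends are matching per-packet timestamps in seconds.
    A slope of 0.05 means delay grows 50 ms per second of traffic,
    i.e. packets arrive roughly 5% slow, so the sender is about 5%
    over the bottleneck bandwidth.
    """
    delays = [a - s for a, s in zip(arrivals, sends)]
    n = len(delays)
    mt = sum(sends) / n
    md = sum(delays) / n
    num = sum((t - mt) * (d - md) for t, d in zip(sends, delays))
    den = sum((t - mt) ** 2 for t in sends)
    return num / den  # least-squares slope: queue growth per unit time

def adapted_rate(rate, slope):
    """Scale the send rate down in proportion to the estimated overuse."""
    return rate / (1.0 + max(0.0, slope))
```

The point of scaling by the slope, rather than applying a fixed decrease, is that 5% slow and 50% slow then trigger proportionally different reactions.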
How does it interact with TCP? How sensitive is this mechanism to the typical issues raised for delay-based controls (the latecomer unfairness problem, behavior with very small queues, ..)? How likely is it for the derived rate of the mechanism to fall below the TFRC-function limit, under what conditions will this happen?
All excellent questions. Very small queues in particular will probably force any such algorithm into loss-sensitive reactions or a loss-sensitive mode (which it must have); in that case the delay is inherently low anyway, so this isn't a problem. It is important in that case that the delay-sensing part not react incorrectly.

-- Randell Jesup randell-ietf@jesup.org
participants (7)
- Harald Alvestrand
- Lachlan Andrew
- Matt Mathis
- Michael Welzl
- Piers O'Hanlon
- Randell Jesup
- Stefan Holmer