
This is the first of a few messages on the details of RRTCC I expect to post, based on analysis (not testing) of the algorithm, and on my 8-year experience with an algorithm I designed which had very similar theory and underpinnings. That algorithm has been used (mostly in residential settings) in hundreds of thousands of devices, mostly for peer-to-peer calls.

Please feel free to critique! I make no assertion that this analysis is guaranteed correct (in fact I'm sure several ways in which it's wrong will be pointed out), but I think it will be a helpful starting point. I also realize there are some simplifications assumed below; I've tried to note them.

The first focus is on loss. I'll primarily focus on the impact of tail-drop policies for now.

Loss affects a stream in a number of ways:

1) Directly - loss of a packet in the RTP stream
2) Indirectly - loss of a packet for another traffic stream, for this or another destination
3) "Random" loss (non-congestion)

Note: not all channels experience "random" loss, and some "random" loss is really momentary congestion of a core router or some type of AQM. Since in this case we're focusing on tail-drop routers (at the bottleneck), we'll assume this category can be modeled as non-congestive random losses.

Obviously, the more inputs and knowledge of the streams (especially between the two endpoints, via any port, not just the 5-tuple), the better the model can perform. A confounding issue to be discussed later is how differential packet marking affects congestion and bandwidth models.

Since RRTCC works from inter-packet delays (compared to sending times, taken either from RTP timestamps or from header extensions carrying closer-to-the-stack sending times), let's look at how these types of loss affect the signals seen by the receiver and the Kalman filter.

1) Direct loss
-------------------

In this case, we'll see the loss directly. This is common in access-link self-congestion, where our stream(s) are the only significant user of bandwidth (or there are very few users) and we've exceeded the capacity of the physical link.

The inter-packet delay was likely increasing steadily leading up to the loss. For example, if the other side is sending at 220 kbps at 20 ms intervals with an access bottleneck (upstream at their side, or downstream at our side) of 200 kbps, and there's no other traffic on the link, then the packets would have been coming in about 22 ms apart instead of 20 ms.

Let's assume a very short queue depth (no bufferbloat!) of 2 packets (40 ms), and look at what happens. For whatever reason (packet loss, a noisy signal to the filter, long RTT, cross-traffic that just went away, etc.), let's assume that the system didn't react before the loss happened.

When the packet is lost, there must have been 2 packets in the buffer. The receiver will see packet N, N+1, and then N+3 and N+4. N will have been delayed around 38 ms, N+1 around 40 ms, N+3 about 22 ms, and N+4 around 24 ms. Pictorially this would look like a sawtooth in the delay profile.

If you naively input these into the filter, it will start to move down away from a 10% slope, and might indicate a flat queue-depth profile or perhaps even negative (draining) for a short time, when in fact we've been over-bandwidth the entire time. This depends on the exact filter type and usage, but it certainly violates some of the assumptions about error in a Kalman filter (for example, a Gaussian error distribution). At least in this case, a Kalman filter might not be the optimum choice.
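For concreteness, here is a minimal, runnable sketch of the example above. The 550-byte packet size (220 kbps x 20 ms) and the ~40 ms tail-drop queue cap are my reading of the numbers in the text, not anything RRTCC specifies; the point is the sawtooth shape of the delays the receiver sees.

# Sketch: per-packet delay at a 200 kbps tail-drop bottleneck fed at
# 220 kbps (550-byte packets every 20 ms), with roughly 40 ms of buffering.
# All constants are illustrative readings of the example in the text.

SEND_INTERVAL_MS = 20.0           # sender paces one packet per 20 ms
SERVICE_MS = 550 * 8 / 200.0      # 22 ms to serialize one packet at 200 kbps
MAX_QUEUE_MS = 40.0               # ~2 packets of buffering, "no bufferbloat"

def simulate(n_packets=16):
    """Return per-packet queuing+serialization delay in ms, or None if dropped."""
    delays = []
    link_free_at = 0.0            # time the bottleneck finishes its backlog
    for i in range(n_packets):
        arrival = i * SEND_INTERVAL_MS
        backlog = max(0.0, link_free_at - arrival)
        if backlog + SERVICE_MS > MAX_QUEUE_MS:   # tail drop: queue is full
            delays.append(None)
            continue
        link_free_at = max(link_free_at, arrival) + SERVICE_MS
        delays.append(link_free_at - arrival)
    return delays

for seq, d in enumerate(simulate()):
    print(seq, "lost" if d is None else f"{d:.0f} ms")

Running this prints delays climbing 22, 24, ... up to about 40 ms, then a loss, then a drop straight back to 22 ms: the sawtooth described above.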
* Possible modifications:

a) drop samples where there's a loss, since losses frequently perturb the delta of the next packet. This will reduce the 'noise' in the inputs to the filter, especially in simple cases like the one above.

b) increase the "uncertainty" of the next packet dramatically in the input to the filter. This is an attempt to get the Kalman filter to weight the packet much lower, but not ignore it entirely. I don't see this as very useful in practice.

c) look at whether there was a significant drop in delay following a loss, and use that as a separate indicator that we're over-bandwidth.

In my algorithm, I termed these "fishy" losses - they implied a queue that suddenly shortened. Too many (more than one) of these in a short period of time meant I would decrease sending bandwidth by a significant extra amount, since the implication is not just delay but a full queue, which I really want to get out of fast. This is a form of modification 'c'.

If we're not the only stream on the bottleneck, and we just got "unlucky" and had our packet dropped, similar issues may occur, but the reduction in delay of the next packet may be less or much less, since we're typically metering in our packets at a frame rate. With enough other traffic, the delay of the next packet will be roughly flat (around 40 ms in the case above). So modification 'c' is of most use in detecting the bandwidth/queue limit of a relatively idle link, typically an access (especially upstream) link. It therefore may want to be combined with other mechanisms.

If we see a direct loss without a significant reduction in delay, we need to assume it's either a congested link (not a physical-layer limit we're hitting on an idle link) or a "random" loss. Losses on a congested link also indicate a full queue, so even though the delay stopped increasing (or stayed on-average stable after you've hit the max and started dropping), you still want to decrease the sending rate. If it's a "random" loss (and not AQM), this causes a minor under-utilization; but for a truly congested link, or if AQM is causing the "random" drops, it's a signal we should reduce to try to start the queue draining.

To avoid over-reaction and 'hunting', it may be good to use some type of threshold, perhaps averaged or filtered. If the loss rate reached ~5%, I'd make a mild cut in sending rate on top of any filter-suggested cuts, and if it reached ~10% I'd make a strong cut. (A rough sketch of how these loss signals could feed a rate decision follows below.)

(to be continued in another post)

--
Randell Jesup
randell-ietf@jesup.org
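Here is the rough sketch referred to above: how the "fishy" loss count and the ~5%/~10% loss-rate thresholds could combine into a rate decision. The structure follows the prose; every constant is illustrative, not a value from RRTCC or from my old implementation.

# Sketch of the loss-driven rate decisions described above.  "Fishy" losses
# (loss followed by a sudden delay drop) imply a full queue that suddenly
# shortened; moderate/heavy loss rates get mild/strong extra cuts.  All
# multipliers and thresholds are illustrative only.

def loss_based_adjustment(send_rate_bps, loss_rate, fishy_losses_recent):
    """Return a reduced sending rate based only on the loss signals."""
    rate = send_rate_bps
    if fishy_losses_recent > 1:
        # More than one fishy loss in a short window: the queue was full,
        # so take an extra-large step down to start it draining fast.
        rate *= 0.7
    if loss_rate >= 0.10:
        rate *= 0.75              # strong cut on heavy loss
    elif loss_rate >= 0.05:
        rate *= 0.9               # mild cut on moderate loss
    return rate

This would be applied on top of whatever cut the delay filter suggests, not instead of it.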

On Mon, Aug 6, 2012 at 6:10 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
This is the first of a few messages on the details of RRTCC I expect to post, based on analysis (not testing) of the algorithm, and my 8-year experience with an algorithm I designed which had very similar theory and underpinnings. This has been used (mostly in residential settings) in hundreds of thousands of devices mostly for peer-to-peer calls.
Please feel free to critique! I make no assertions that this analysis is guaranteed correct (and in fact I'm sure it will be pointed out several ways in which it's wrong), but I think it will be a helpful starting point. I also realize there are some simplifications assumed below; I've tried to note them.
The first focus is on loss. I'll primarily focus on the impact of tail-drop policies for now.
Loss affects a stream in a number of ways:
1) Directly - loss of a packet in the RTP stream 2) Indirectly - loss of a packet for another traffic stream, for this or another destination 3) "Random" loss (non-congestion)
Note: Not all channels experience "random" loss, though some "random" loss is momentary congestion of a core router or some type of AQM. Since in this case we're focusing on tail-drop routers (at the bottleneck), we'll assume this category can be modeled as non-congestive random losses.
Obviously the more inputs and knowledge of the streams (especially between the two endpoints, via any port, not just the 5-tuple) the better the model can perform. A confounding issue to be discussed later is how differential packet marking affects congestion and bandwidth models.
Since RRTCC works from inter-packet delays (compared to sending times, taken either from RTP timestamps or from header extensions carrying closer-to-the-stack sending times), let's look at how these types of loss affect the signals seen by the receiver and the Kalman filter.
1) Direct loss ------------------- In this case, we'll see the loss directly. This is common in access-link self-congestion, where our stream(s) are the only significant user of bandwidth or if there are a very few users, and we've exceeded the capacity of the physical link.
The inter-packet delay was likely increasing steadily leading up to the loss. For example, if the other side is sending at 220Kbps at 20ms intervals with an access bottleneck (upstream at their side, or downstream at our side) of 200Kbps, and there's no other traffic on the link, then the packets would have been coming in about 22ms apart instead of 20ms.
Let's assume a very short queue depth (no bufferbloat!) of 2 packets (40ms), and look what happens. For whatever reason (packet loss, noisy signal to the filter, long RTT, cross-traffic that just went away, etc) let's assume that the system didn't react before the loss happened.
When the packet is lost, there must have been 2 packets in the buffer. The receiver will see packet N, N+1, and then N+3 and N+4. N will have been delayed around 38ms, N+1 around 40ms, N+3 about 22ms, and N+4 around 24ms. Pictorially this would look like a sawtooth in the delay profile.
If you naively input these into the filter, it will start to move down away from a 10% slope, and might indicate a flat queue-depth profile or perhaps even negative (draining) for a short time, when in fact we've been over-bandwidth the entire time.
Averaged over a sawtooth, the queue depth *is flat*, right? So it is rather a question of doing correct detection based on the output of the Kalman filter, not of the Kalman filter estimating incorrectly what is happening. Also, for the Kalman filter in RRTCC, the uncertainty of the samples will increase in this case, which makes the filter slower.
This would depend on the exact filter type and usage, but this certainly violates some of the assumptions about error in a Kalman filter (for example, Gaussian distribution). At least in this case, a Kalman filter might not be the optimum choice.
In the non-Gaussian case, the Kalman filter is the best linear (LMMSE) state estimator, while in the Gaussian case it is the optimal (MMSE) state estimator. So while it is certainly not the optimum choice, it isn't obviously broken either. There are computationally more expensive filters such as ensemble Kalman or particle filters, but it is not clear that they will improve significantly over straight Kalman. Some sort of simulation could sort this out. Here is a paper where they did this for bi-Gaussian and tri-Gaussian systems. I am not a statistician, so I can only skim it and sort-of kind-of see what is happening, but it could be an "inspiration" for further work: http://www.control.isy.liu.se/~fredrik/reports/05ifac_hendeby.pdf

* Possible modifications:
a) drop samples where there's a loss, since losses frequently perturb the delta of the next packet. This will reduce the 'noise' in the inputs to the filter, especially in simple cases like the one above.
Which samples in your example would you not feed to the Kalman filter? Deciding when to start feeding samples into the Kalman filter is another classification problem, or it can be done ad-hoc (ad-hoc ~= non-probabilistic). Also, reducing the 'noise' in the inputs to the filter makes it react faster, which is not necessarily what you want in this particular case. Note that there is an outlier filter in RRTCC. A sample is capped at 3 std.dev from the (exponential decay) jitter. (end of section 3.1) http://tools.ietf.org/html/draft-alvestrand-rtcweb-congestion-00#section-3.1
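Roughly what I mean by that cap, as a sketch: keep an exponentially decaying estimate of the jitter variance and clamp each new delay residual to 3 standard deviations before it reaches the Kalman filter. The names and the smoothing constant here are mine, not the draft's exact formulation.

import math

# Sketch of the 3-sigma outlier cap on an exponentially decayed jitter
# estimate.  Illustrative parameterization only.

class JitterCap:
    def __init__(self, alpha=0.05):
        self.alpha = alpha        # weight of each new sample in the average
        self.var = 1.0            # running jitter variance estimate (ms^2)

    def filter(self, residual_ms):
        std = math.sqrt(self.var)
        capped = max(-3 * std, min(3 * std, residual_ms))
        # Exponential-decay update of the variance from the capped sample.
        self.var = (1 - self.alpha) * self.var + self.alpha * capped ** 2
        return capped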
b) increase the "uncertainty" of the next packet dramatically in the input to the filter. This is an attempt to get the Kalman filter to weight the packet much lower, but not ignore it entirely. I don't see this as very useful in practice.
I think this is conceptually cleaner. What RRTCC is doing now is calculating the exponential decay jitter of incoming packets. In the sawtooth example, in stock RRTCC, the jitter variance will increase, which means that the filter will become slower. I think this is a good property. The exponential decay jitter calculation means that this will not happen on the *first loss*, but some time thereafter (i.e. the filter cannot react instantaneously). By using extra knowledge, we can probably build a more accurate model. My point is that it is not "obviously broken". A model in the filter that accounts for an absolute queue length while still retaining the current features would seem to be non-linear though, and thus require a different filter approach.
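To illustrate why inflating the per-sample uncertainty has the intended effect, here is a toy one-dimensional Kalman update (not the two-state filter in the draft): the gain K = P / (P + R) shrinks as the measurement noise R grows, so a suspect sample barely moves the estimate.

# Toy scalar Kalman update showing the effect of the measurement noise R.

def kalman_update(x, P, z, R, Q=1e-3):
    P = P + Q                     # predict: random-walk state, add process noise
    K = P / (P + R)               # gain: large R -> small K -> small correction
    x = x + K * (z - x)           # correct toward the measurement z
    P = (1 - K) * P
    return x, P

x, P = 0.0, 1.0
x, P = kalman_update(x, P, z=2.0, R=0.1)    # trusted sample: estimate moves a lot
x, P = kalman_update(x, P, z=8.0, R=100.0)  # suspect sample: estimate barely moves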
c) look at whether there was a significant drop in delay following a loss, and use that as a separate indicator that we're over-bandwidth
There will only be a significant drop in delay if the queue is short (no bufferbloat). To know whether a drop is "significant", a notion of queue length must be added to the model of the network. The larger the queue and the higher the bandwidth, the smaller the drops in delay we're searching for, and assuming that the queue sits at the sender's egress, additional jitter is added on the rest of the path. Thus, on the receiver side, ad-hoc (non-probabilistic) filtering could easily be far from optimal. I think it would certainly be interesting to look at a model that includes queue length as well as delay, but I think it should be probabilistic in nature, probably inside the filter, not as ad-hoc filtering on the outside.

Alexander
In my algorithm, I termed these "fishy" losses - they implied a queue suddenly shortened. Too many (more than 1) of these in a short period of time meant I would decrease sending bandwidth by a significant extra amount, since the implication is not just delay, but a full queue, which I really want to get out of fast. This is a form of modification 'c'.
If we're not the only stream on the bottleneck, and we just got "unlucky" and had our packet get dropped, similar issues may occur, but the reduction in delay of the next packet may be less or much less, since we're typically metering in our packets at a frame rate. With enough other traffic, the delay of the next packet will be roughly flat (or 40ms in the case above). So this modification (c) is of most use in detecting the bandwidth/queue limit of a relatively idle link, typically an access (especially upstream) link. It therefore may want to be combined with other mechanisms.
If we see a direct loss without a significant reduction in delay, we need to assume it's either a congested link (not a physical layer limit we're hitting on an idle link) or it's a "random" loss. Losses on a congested link also indicate a full queue, and so even though the delay stopped increasing (or stayed on-average stable after you've hit the max and started dropping), you still want to decrease sending rate. If it's a "random" loss (and not AQM), this causes a minor under-utilization, but for a truly congested link or if AQM is causing "random" drops, it's a signal we should reduce to try to start the queue draining. To avoid over-reaction and 'hunting', it may be good to use some type of threshold, perhaps averaged or filtered. If the loss rate reached ~5%, I'd make a mild cut in sending rate on top of any filter-suggested cuts, and if it reached ~10% I'd make a strong cut.
(to be continued in another post)
-- Randell Jesup randell-ietf@jesup.org

Alexander: thanks for the response.

On 8/7/2012 8:37 AM, Alexander Kjeldaas wrote:
On Mon, Aug 6, 2012 at 6:10 PM, Randell Jesup <randell-ietf@jesup.org> wrote:
When the packet is lost, there must have been 2 packets in the buffer. The receiver will see packet N, N+1, and then N+3 and N+4. N will have been delayed around 38ms, N+1 around 40ms, N+3 about 22ms, and N+4 around 24ms. Pictorially this would look like a sawtooth in the delay profile.
If you naively input these into the filter, it will start to move down away from a 10% slope, and might indicate a flat queue-depth profile or perhaps even negative (draining) for a short time, when in fact we've been over-bandwidth the entire time.
Averaged over a sawtooth, the queue depth *is flat*, right? So it is rather a question of doing correct detection based on the output of the Kalman filter, not of the Kalman filter estimating incorrectly what is happening.
Right, on average it is flat once you're steady-state full-queue. And so the algorithm (if we get to that state) might believe the queue isn't full.... i.e. in the state machine it would be in the "flat" state. After a while there, it would try to increase sending rate, and it would see... no increase in queue depth, and increase again. (Or at least it does in my thought experiment... ;-) Of course, normally it would notice us getting into that state, but I certainly can conceive of it not noticing properly.
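The reaction loop I'm picturing is roughly the following toy sketch; the states and multipliers are illustrative, loosely following the draft's increase/hold/decrease behaviour rather than quoting it. It shows the failure mode: if the averaged sawtooth maps to "normal", the controller keeps probing upward despite ongoing loss.

# Toy detector-signal -> rate-control reaction, for illustration only.

def react(state, signal, rate):
    if signal == "overuse":           # filter says queuing delay is growing
        return "decrease", rate * 0.85
    if signal == "underuse":          # queue is draining; hold until it empties
        return "hold", rate
    # signal == "normal": the sawtooth averages out to a flat slope here, so
    # the controller eventually probes upward even though the link is full.
    return "increase", rate * 1.05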
Also, for the Kalman filter in RRTCC, the uncertainty of the samples will increase in this case, which makes the filter slower.
This would depend on the exact filter type and usage, but this certainly violates some of the assumptions about error in a Kalman filter (for example, Gaussian distribution). At least in this case, a Kalman filter might not be the optimum choice.
In the non-Gaussian case, the Kalman filter is the best linear (LMMSE) state estimator, while in the Gaussian case it is the optimal (MMSE) state estimator.
So while it is certainly not the optimum choice, it isn't obviously broken either.
As I expected.
* Possible modifications:

a) drop samples where there's a loss, since losses frequently perturb the delta of the next packet. This will reduce the 'noise' in the inputs to the filter, especially in simple cases like the one above.
Which samples in your example would you not feed to the Kalman filter?
Samples where the previous packet was lost.
Deciding when to start feeding samples into the Kalman filter is another classification problem, or it can be done ad-hoc (ad-hoc ~= non-probabilistic).
Also, reducing the 'noise' in the inputs to the filter makes it react faster, which is not necessarily what you want in this particular case.
Well, the idea here was to segregate actions-at-a-drop from actions-not-at-a-drop, since in the 1-flow case I think that would produce a clearly distinct difference, and in that case I would want the filter to react faster. (In fact, I probably want faster reaction anytime there's significant loss, and that may help with AQM response too, where the filter will tell us relatively little or nothing useful.)
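Concretely, the gating I have in mind is roughly this (the interfaces are hypothetical, just to show the idea): samples whose predecessor was lost are handled out of band as loss evidence instead of being fed to the filter.

# Sketch of modification 'a': skip the delay sample that immediately follows
# a loss, so the post-drop dip doesn't reach the Kalman filter, and count
# the loss separately.

def feed_samples(packets, filter_update, on_loss):
    prev_seq = None
    for seq, delta_ms in packets:        # (sequence number, inter-arrival delta)
        if prev_seq is not None and seq != prev_seq + 1:
            on_loss(seq - prev_seq - 1)  # gap: handle the loss out of band
            prev_seq = seq
            continue                     # don't let the perturbed delta in
        filter_update(delta_ms)
        prev_seq = seq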
Note that there is an outlier filter in RRTCC. A sample is capped at 3 std.dev from the (exponential decay) jitter.
That might happen to drop those, or not. The model for that assumes random noise, which this really isn't.
(end of section 3.1) http://tools.ietf.org/html/draft-alvestrand-rtcweb-congestion-00#section-3.1
b) increase the "uncertainty" of the next packet dramatically in the input to the filter. This is an attempt to get the Kalman filter to weight the packet much lower, but not ignore it entirely. I don't see this as very useful in practice.
I think this is conceptually cleaner. What RRTCC is doing now is calculating the exponential decay jitter of incoming packets. In the sawtooth-example, in stock RRTCC, the jitter variance will increase, which means that the filter will become slower. I think this is a good property.
Not sure about this for this case (see above). For gaussian and probably cross-traffic jitter, I probably agree.
The exponential decay jitter calculations means that this will not happen on the *first loss*, but some time thereafter (i.e. the filter cannot react instantaneously).
By using extra knowledge, we can probably build a more accurate model. My point is that it is not "obviously broken". A model in the filter that accounts for an absolute queue length while still retaining the current features would seem to be non-linear though, and thus require a different filter approach.
Maybe, or tweaks around a linear filter. Agreed not obviously broken.
c) look at whether there was a significant drop in delay following a loss, and use that as a separate indicator that we're over-bandwidth
There will only be a significant drop in delay if the queue is short (no bufferbloat). To know whether a drop is "significant", a notion of queue length must be added to the model of the network.
Actually, the drop in delay should be the same regardless of queue depth: when one packet is removed from a FIFO queue, the following packet's delay falls by roughly that packet's serialization time at the bottleneck (packet size / link rate), no matter how many packets are queued ahead of it. The percentage reduction will be smaller with a deeper queue, but the absolute reduction will be the same.
The larger the queue, and the higher the bandwidth, the smaller drops in delay is what we're searching for, and assuming that the queue sits egress on the sender, additional jitter is added on the rest of the path. Thus on the receiver side, an ad-hoc filtering (non probabilistic) could easily be far from optimal.
I agree the more streams in the queue, the smaller the drops in delay, even if we're the one who gets the drop (and the odds of us getting it go down, and jitter goes up).
I think it would certainly be interesting to look at a model that includes queue length as well as delay, but I think it should be probabilistic in nature, probably inside the filter, not as ad-hoc filtering on the outside.
Probably correct. And this discussion makes me believe this input signal (delay reduction with a drop) is almost entirely relevant to small-number-of-streams on an otherwise non-congested link (eg access links usually). (And of course that's an immensely common case, and the main one I was designing for).
Alexander
In my algorithm, I termed these "fishy" losses - they implied a queue suddenly shortened. Too many (more than 1) of these in a short period of time meant I would decrease sending bandwidth by a significant extra amount, since the implication is not just delay, but a full queue, which I really want to get out of fast. This is a form of modification 'c'.
If we're not the only stream on the bottleneck, and we just got "unlucky" and had our packet get dropped, similar issues may occur, but the reduction in delay of the next packet may be less or much less, since we're typically metering in our packets at a frame rate. With enough other traffic, the delay of the next packet will be roughly flat (or 40ms in the case above). So this modification (c) is of most use in detecting the bandwidth/queue limit of a relatively idle link, typically an access (especially upstream) link. It therefore may want to be combined with other mechanisms.
And what I said here matches the above.
If we see a direct loss without a significant reduction in delay, we need to assume it's either a congested link (not a physical layer limit we're hitting on an idle link) or it's a "random" loss. Losses on a congested link also indicate a full queue, and so even though the delay stopped increasing (or stayed on-average stable after you've hit the max and started dropping), you still want to decrease sending rate. If it's a "random" loss (and not AQM), this causes a minor under-utilization, but for a truly congested link or if AQM is causing "random" drops, it's a signal we should reduce to try to start the queue draining. To avoid over-reaction and 'hunting', it may be good to use some type of threshold, perhaps averaged or filtered. If the loss rate reached ~5%, I'd make a mild cut in sending rate on top of any filter-suggested cuts, and if it reached ~10% I'd make a strong cut.
(to be continued in another post)
-- Randell Jesup randell-ietf@jesup.org

In the case of loss due to congestion (a full queue or AQM action), the loss itself seems like the right signal to process. Why wait to infer congestion from the subsequent delay pattern, which can be speculative/unreliable, rather than the loss itself?

If the goal is to distinguish congestion from stochastic loss, that is a general problem that probably needs more thought than the RRTCC 3-sigma outlier filter, or the Kalman filter (which is designed to filter stochastic jitter but not losses), or Randell's "fishy" filter. There should be ample research available on this topic from many years of TCP over wireless links.

Mo

On 8/8/2012 1:04 AM, Mo Zanaty (mzanaty) wrote:
In the case of loss due to congestion (a full queue or AQM action), the loss itself seems like the right signal to process. Why wait to infer congestion from the subsequent delay pattern which can be speculative/unreliable rather than the loss itself?
I agree totally, one should always assume loss is some type of congestion (though very low levels of loss might be ignored). This is an area where the current proposed algorithm can be improved.
If the goal is to distinguish congestion from stochastic loss, that is a general problem that probably needs more thought than the RRTCC 3-sigma outlier filter, or Kalman filter (which is designed to filter stochastic jitter but not losses), or Randell's "fishy" filter. There should be ample research available on this topic from many years of TCP over wireless links.
Agreed. The way I used it was to give a "bonus" reduction in bandwidth if the losses appeared 'fishy'. Per the earlier emails, this would mostly happen on otherwise-mostly-idle access links or maybe during bursts of cross-traffic.

--
Randell Jesup
randell-ietf@jesup.org

Hi Randell,

Just to make sure I understand your "bonus" reduction, it is because the "fishy" pattern confirms a congestion loss rather than a possibly stochastic loss, right? Or is there another reason to apply the bonus that I missed? Like some sort of mild congestion vs. severe congestion inference (that can't be obtained by the loss signal itself alone)?

Thanks,
Mo

On Wed, Aug 8, 2012 at 8:06 AM, Mo Zanaty (mzanaty) <mzanaty@cisco.com> wrote:
Hi Randell,
Just to make sure I understand your "bonus" reduction, it is because the "fishy" pattern confirms a congestion loss rather than a possibly stochastic loss, right? Or is there another reason to apply the bonus that I missed? Like some sort of mild congestion vs. severe congestion inference (that can't be obtained by the loss signal itself alone)?
Thanks, Mo
On 8/8/2012 1:04 AM, Mo Zanaty (mzanaty) wrote:
In the case of loss due to congestion (a full queue or AQM action), the loss itself seems like the right signal to process. Why wait to infer congestion from the subsequent delay pattern which can be speculative/unreliable rather than the loss itself?
I agree totally, one should always assume loss is some type of congestion (though very low levels of loss might be ignored). This is an area where the current proposed algorithm can be improved.
Agreed.
If the goal is to distinguish congestion from stochastic loss, that is a general problem that probably needs more thought than the RRTCC 3-sigma outlier filter, or Kalman filter (which is designed to filter stochastic jitter but not losses), or Randell's "fishy" filter. There should be ample research available on this topic from many years of TCP over wireless links.
Yes, and distinguishing between congestion and stochastic loss is a non-trivial problem. There are different approaches and ways to model it, but in the end I don't think there's a way to make it work reliably in all scenarios. But it might be possible to build something which improves things in some scenarios while not making a difference in others.
Agreed. The way I used it was to give a "bonus" reduction in bandwidth if the losses appeared 'fishy'. Per the earlier emails, this would mostly happen on otherwise-mostly-idle access links or maybe during bursts of cross-traffic.
You could even go one step further and make the "bonus" reduction depending on how certain you are that the loss was fishy or not.
-- Randell Jesup randell-ietf@jesup.org

Stefan Holmer <holmer@google.com> wrote:
You could even go one step further and make the "bonus" reduction depending on how certain you are that the loss was fishy or not.
I agree with the general approach of having a confidence factor to weight the certainty of some metrics, including this one. Of course, the factor itself becomes another metric that must be modeled and computed effectively.

Mo
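For example, the extra cut could be scaled by the confidence rather than applied all-or-nothing (purely illustrative, with an arbitrary maximum extra cut):

# Scale the extra "fishy loss" cut by how confident we are that the loss
# was congestion-related.  The 0.3 maximum extra cut is arbitrary.

def bonus_cut(rate_bps, fishy_confidence):
    """fishy_confidence in [0, 1]: 0 = probably stochastic, 1 = clearly congestion."""
    return rate_bps * (1.0 - 0.3 * fishy_confidence)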

On 8/8/2012 2:06 AM, Mo Zanaty (mzanaty) wrote:
Just to make sure I understand your "bonus" reduction, it is because the "fishy" pattern confirms a congestion loss rather than a possibly stochastic loss, right? Or is there another reason to apply the bonus that I missed? Like some sort of mild congestion vs. severe congestion inference (that can't be obtained by the loss signal itself alone)?
Mostly it was just stronger confirmation of congestion, and it seemed to work well. It increased the chance that in the first adjustment after congestion is noticed we'd be below the available bandwidth and the queues would start to drain (and drain faster!) - but the downside is that if the loss isn't congestion, you impair quality more, which is why it was limited to cases where I was pretty sure.

This brings up a related point - reaction speed. I've proposed here using the slope output of the filter to adjust the amount of bandwidth reduction. It's a fairly simple argument:

1) The slope of delay increase is directly proportional to how fast the queue is growing.
2) The rate the queue grows is directly proportional to how far over-bandwidth the bottleneck is.
3) You want to reduce bandwidth far enough to cause the queues to start to drain.

Note that in (3), it's really the combination of your reduction and the reduction of any other streams sharing the bottleneck. Once again, this argument is strongest when there are few streams at the bottleneck. In the single-stream case, if you have correctly determined the slope and you reduce bandwidth by an amount directly proportional to the slope, then you will (in theory!) exactly hit the link bandwidth. You'd want to slightly overshoot on purpose to ensure draining of the queue buildup.

As the number of competing streams increases, you're more likely to over-react, in that you're reading the bottleneck's over-use amount but you're only responsible for one of the flows. So you have to be careful... On the other hand, you want to get the bottleneck below congestion as fast as possible to avoid further delay buildup, and you want to overshoot slightly to ensure it drains (especially if another flow is still increasing (TCP sawtooth)).

So, instead of trying to hit the target in a single reduction, you can reduce the sending rate by a fraction of the apparent over-bandwidth amount (with some minimum reduction amount or percentage). This means in the single-flow case it will take several RTTs at minimum to get below the link rate, but the delay increase rate will continually slow. With this slower reaction pace, you also want to be slower to stop reacting, to ensure you get to the queues-draining point and drain them fast enough to make a difference. You can do this by a number of methods. When the filter confirms the draining is occurring, you'll want to wait to increase bandwidth until the queues are stable and hopefully drained (the current algorithm includes this). This has the advantage of reacting faster in cases of more severe congestion or larger over-link-bandwidth cases. (A rough sketch of this fractional, slope-proportional reduction follows below.)

Tuning the reaction rate for sharing the link with other RRTCC flows, and to share reasonably with TCP flows, would be the subject of tests; there are other related values to tune, such as how long it avoids switching direction (decrease->increase), how it reacts during startup, and how RTT and jitter in the delay signals feed in (more jitter may imply more competing streams, but not always).

You *might* be able to infer the number of competing streams (or some measure of competition) by the slope change after a reduction. In the single-flow case, it may be a fairly strong signal that it's single-flow, and you might want to include memory of that for a period and adjust the reaction rate to be faster.
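Here is the rough sketch referred to above: treat the filter's slope (queuing-delay growth per unit time) as the fraction by which the incoming rate exceeds the bottleneck, then take a partial step toward that target each round, with a minimum cut and a small deliberate overshoot so the queue drains. All constants are illustrative, not tuned values.

# Sketch of slope-proportional rate reduction with a fractional step.

def reduce_rate(rate_bps, slope, step_fraction=0.5,
                min_cut_fraction=0.05, overshoot=1.1):
    """slope: estimated d(queuing delay)/dt; e.g. 0.1 means ~10% over-bandwidth."""
    # Rate that would hold the queue steady, with a little overshoot so it drains:
    # if we send at R into capacity C, delay grows at (R - C)/C, so C = R/(1+slope).
    target = rate_bps / (1.0 + slope * overshoot)
    cut = max((rate_bps - target) * step_fraction,
              rate_bps * min_cut_fraction)
    return rate_bps - cut

For example, at 220 kbps with a measured slope of 0.1, the first call cuts roughly half the apparent 10% excess, and repeated calls converge below the link rate over a few rounds.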
One reason single flows are so interesting is that access-link congestion is a very common and very important case for connection quality for the user, even if it is not the worrying case (from congestion-collapse or fairness perspectives).

--
Randell Jesup
randell-ietf@jesup.org

Randell Jesup <randell-ietf@jesup.org> wrote:
You *might* be able to infer the number of competing streams (or some measure of competition) by the slope change after a reduction. In the single-flow case, it may be a fairly strong signal that it's single-flow, and you might want to include memory of that for a period and adjust the reaction rate to be faster.
I agree it is desirable to infer the amount of competing traffic, via observing trends in delay (and loss/ECN) across rate changes, including the specific signal above.
One reason single flows are so interesting is that access-link congestion is a very common and very important case for connection quality for the user, even if it is not the worrying case (from congestion-collapse or fairness perspectives).
I would have agreed in prior years, as most video devices/applications have addressed this "home alone" case (in proprietary, non-interoperable ways) and benefited from it. Dedicated video devices were common, and it was often easy to know if there was any competing traffic and stop it.

But going forward, for newer multi-function devices/apps under design, I'm less confident that addressing the "home alone" case will be very beneficial. Multi-function often immediately breaks the solo-flow assumption: webrtc games with player video, collaboration software with meeting video, chat with AR layers, etc. Even if you can coordinate all the flows via some type of congestion manager within the device/app, you are rarely the only active device/app on any network, even when seemingly "home alone". Devices that look dormant may be sleepwalking (cloud sync, or other content or software updates), so identifying and stopping competing traffic becomes impractical (not as easy as yelling "everyone off Netflix!").

I think the "home alone / solo flow" case is the right starting point to make progress on the simplest scenario. It should be the first case in the evaluation criteria draft, not because it is the most common scenario, but because it will be simplest to analyze and identify basic design flaws. Anything that fails the first simple test can't possibly survive further. But I would not consider this effort a success if that's as far as we get, or if that's the focus or sweet spot of the final solution.

Mo
participants (4):
- Alexander Kjeldaas
- Mo Zanaty (mzanaty)
- Randell Jesup
- Stefan Holmer