
On 10/14/2011 04:28 PM, Randell Jesup wrote:
On 10/13/2011 7:43 PM, Jim Gettys wrote:
On 10/13/2011 06:46 PM, Randell Jesup wrote:
Yes - though in my case, for desktops the bottleneck is generally the main Internet link or the other end's downstream, and for wireless it's usually the 802.11 hop (I have FiOS at something like 30 or 35 Mbps down, 20 Mbps up).
The problem is that wireless bandwidth is highly variable and roughly comparable to broadband bandwidth, so the bottleneck moves back and forth between the two (particularly since wireless is a shared medium, and other devices on the same wireless network can reduce the bandwidth available to you).
The best strategy for most home users is to get the bottleneck firmly into the broadband link and use bandwidth shaping to control the buffering there, since the host OS is not under your control. So you have a good excuse to go buy the shiny 802.11n router you have been lusting after and haven't convinced your wife/husband to buy..... If you do that, you can get really good behaviour today (until you wander too far from your AP).
That's fine for me, but that advice doesn't generally help our users.
Yeah, ergo shining the light on the problem.
Yup. Ergo the screed, trying to get people to stop before making things worse. The irony is that, were it not for the fact that browsers have long since discarded HTTP's two-connection rule, it might be a good idea and might help encourage better behaviour.
SPDY might help some here (though part of SPDY's purpose is to continue to saturate that TCP connection even better, so maybe not).
It helps the transient problem. It won't help if you are using SPDY for bulk download of something the way HTTP is often abused for.
And it takes time for the buffers to fill, so it might help quite a lot. The buffers fill at roughly one packet per ACK, I gather; the ACKs get further and further apart as the buffer fills.
So, it might help if for no other reason than reducing the number of TCP connections and startups, and reducing the number of congestion-control streams.
Even one TCP stream will fill the buffers... Using fewer connections reduces the transient problem.
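To put rough numbers on how much delay a single filled buffer adds (illustrative arithmetic only; the buffer size and uplink rates below are assumptions, not measurements from this thread):

    # Illustrative sketch: delay added once a FIFO buffer ahead of the
    # bottleneck link is kept full by a bulk TCP transfer.
    def standing_delay_ms(buffer_bytes, uplink_bits_per_sec):
        return buffer_bytes * 8 * 1000.0 / uplink_bits_per_sec

    # An assumed 256 KB buffer on a 2 Mbit/s uplink:
    print(standing_delay_ms(256 * 1024, 2 * 10**6))    # ~1049 ms
    # The same assumed buffer on a 20 Mbit/s FiOS-class uplink:
    print(standing_delay_ms(256 * 1024, 20 * 10**6))   # ~105 ms

And once that buffer is full, it stays full for the life of the transfer, which is why a single stream is enough.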
We can't control other browsers/devices on the same connection; we may be able to control other code within the same browser.
My point is that while external flows are outside our control, internal browser TCP flows are within our control.
We can't really make our jitter buffers big enough to give decent audio/video when bufferbloat is present, unless you like talking to someone halfway to the moon (or further). Netalyzr shows the problem in broadband, but our OSes and home routers are often even worse.
The jitter buffers don't have to be that large - in steady state you have a lot of delay, but steady delay in the network doesn't directly affect the jitter buffer; you do have to manage it somewhat. Transitions in and out of bufferbloat will affect it, but the jitter buffer should handle those.
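To illustrate the point (a hypothetical sketch, not code from any stack discussed here): the playout delay only needs to track the variation in transit time, in the spirit of the RFC 3550 interarrival jitter calculation, so a large but steady bloat-induced delay doesn't by itself force a huge jitter buffer.

    # Hypothetical sketch of an adaptive playout delay; the class name and
    # safety factor are assumptions.  Timestamps are in seconds.
    class JitterEstimator:
        def __init__(self):
            self.prev_transit = None  # (arrival - send), incl. unknown clock offset
            self.jitter = 0.0         # smoothed |delay variation|, per RFC 3550

        def on_packet(self, send_time, arrival_time):
            transit = arrival_time - send_time
            if self.prev_transit is not None:
                d = abs(transit - self.prev_transit)
                self.jitter += (d - self.jitter) / 16.0
            self.prev_transit = transit

        def playout_delay(self, safety_factor=4.0):
            # Constant network delay cancels out above; only variation
            # (including transitions in and out of bufferbloat) shows up here.
            return safety_factor * self.jitter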
I fear the spikes I see in my packet traces. I see multiple retransmits and a bunch of packets out of order each time I go through one of the buffer fill cycles.
Not something I expect with RTP data - we don't retransmit on drops. When looking at TCP data I would expect retransmits once those queues fill.
I'm referring to the fact that there are multiple packet losses very close together in time.
In the short/immediate term, mitigations are possible. My home network now works tremendously better than it did a year ago, and yours can immediately too, even with many existing home routers. But doing so is probably beyond non-network wizards today.
Yes. Useful for looking into, but not for solving the problem. (And for pressuring router makers - but that's a near-zero-margin game for most of them.)
Again, the approach I'm taking is to build a home router that actually works right - ergo CeroWrt; the vendors can pick up the results as they see fit.
That's about the only obvious way; they mostly license the base router code from the HW vendor or a 3rd-party SW vendor, then put their "corporate UI" and some features on top of it, from what I can tell.
The problem I would expect is that "hobbyist" router firmware is often not usable by manufacturers for licensing reasons, or, if it is, it's too hard to reskin in their corporate layout, or too hard for them to easily configure out the stuff they don't want, etc. And no one is *trying* to sell them on this, unless you can get the SoC/reference-design people to pick it up.
Actually, OpenWrt is the "upstream" for some of the smaller router vendors already. And yes, I'm trying to get people to realise that having a good upstream is better than where they are today. Only time will tell if we succeed. And it's our way to get changes/fixes into the upstream projects that are used by everybody, though the large commercial vendors currently ship bits that have fermented (rotted) for 5 years or more. So the way I look at it is that at worst, the fixes eventually trickle into the commercial code base; and some will ship much faster.
o exposing the bloat problem so that blame can be apportioned is *really* important. Timestamps in RTP would help greatly in doing so. Modern TCPs (may) have the TCP timestamp option turned on (I know modern Linux systems do), so I don't know of anything needed there beyond ensuring the TCP information is made available somehow, if it isn't already. Being able to reliably tell people "The network is broken, you need to fix your OS/your router/your broadband gear" is productive. And to deploy IPv6 we're looking at deploying new home kit anyway.
We can look into that. Suggestions welcome.
The first step is detection: simple timestamps get you that.
We can detect delay already (at least RTT delay; one-way is tough, but we can approximate how much we are above the low point of one-way delay).
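For example (a sketch with hypothetical names, assuming per-packet sender timestamps and no clock synchronization): the absolute one-way delay is unknowable without synchronized clocks, but the rise above the minimum observed transit time is a usable estimate of the queuing delay.

    # Sketch: relative one-way (queuing) delay from sender timestamps,
    # without synchronized clocks.  Names are hypothetical.
    class QueuingDelayEstimator:
        def __init__(self):
            self.min_transit = None   # lowest transit time seen so far

        def on_packet(self, send_timestamp, recv_timestamp):
            transit = recv_timestamp - send_timestamp  # includes unknown clock offset
            if self.min_transit is None or transit < self.min_transit:
                self.min_transit = transit
            # The clock offset cancels in the difference, leaving how far
            # we are above the path's low point - i.e. buffering delay.
            return transit - self.min_transit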
The next step is to locate the hop: basically, a traceroute-like algorithm that looks for the hop where the latency goes up unexpectedly. There is a commercial tool called "pingplotter" which does roughly this and plots the result graphically.
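Roughly, such a tool could do something like this (a sketch under assumptions: a Unix ping in PATH, a hop list taken from a prior traceroute, and an arbitrary jump threshold; this is not pingplotter's actual method):

    # Sketch: find the first hop where latency jumps sharply.
    import re
    import subprocess

    def median_rtt_ms(host, count=5):
        out = subprocess.run(["ping", "-n", "-c", str(count), host],
                             capture_output=True, text=True).stdout
        rtts = sorted(float(m) for m in re.findall(r"time=([\d.]+)", out))
        return rtts[len(rtts) // 2] if rtts else None

    def find_suspect_hop(hops, jump_ms=100.0):
        prev = 0.0
        for hop in hops:
            rtt = median_rtt_ms(hop)
            if rtt is None:
                continue                      # hop doesn't answer pings
            if rtt - prev > jump_ms:
                return hop, prev, rtt         # latency rose sharply at this hop
            prev = rtt
        return None

    # Hop addresses below are placeholders for a real traceroute's output.
    print(find_suspect_hop(["192.168.1.1", "203.0.113.1", "198.51.100.7"]))

Run both under load and at idle, the comparison points at which box's buffer is filling.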
So diagnostic tools.
o designing good congestion avoidance that will work in an unbroken, unbloated network is clearly needed. But I don't think heroic engineering around bufferbloat is worthwhile right now for RTP; that effort is better put into the solutions outlined above, I think. Trying to do so when we've already lost the war (teleconferencing isn't interesting when you're talking halfway to the moon) is not productive, and getting stable servo systems to work not just at the 100 ms level but at the multi-second level, when multi-second latency isn't even usable for the application, is a waste. RTP == Real-Time Transport Protocol; when the network is no longer real time, it's an oxymoron.
In practice it really does work most of the time. But not all.
Yes, but I worry that as more applications that move big stuff around get deployed, and Windows XP retires, the situation is only going to get worse.
Could be.