
When we encounter congestion, audio-over-TCP experiences it as jitter, while audio-over-UDP experiences it as packet loss, so the result may be different.
"Jitter" is equivalent to packet loss in a real-time system. If you don't have decoded audio ready to go when it's time to play it out, there's nothing to be done but invoke your packet loss concealment algorithms, regardless of whether the packet is simply late, or not coming at all. UDP encounters this just as much as TCP does, and while you can attempt to mitigate the effect with a jitter buffer, there are limits to what it can do. You can also snapshot internal codec state and go back and re-decode if a packet does arrive late, but this is expensive and complicated, and the only benefit is a slightly faster recovery time: it's too late to fix the initial loss. The practical difference between TCP and UDP is that, when you encounter congestion, TCP wastes time and bandwidth continually trying to re-transmit packets that you no longer care about (making the congestion worse), while UDP does not. There are many other differences, but this is the one that can't be engineered around. Thus, http will always be suboptimal. It's better than nothing if you're behind a firewall that doesn't allow UDP, but that doesn't mean that everyone should have to use a suboptimal solution just because 5-10% of users do.