Bufferbloat, The Blind Men and the Elephant

I have been reading an excellent article by Jim Gettys entitled “The Blind Men and the Elephant” [here], and it made me smile. In it, he quotes a famous poem:

It was six men of Indostan, to learning much inclined,
who went to see the elephant (Though all of them were blind),
that each by observation, might satisfy his mind.

……. (six stanzas elided)
And so these men of Indostan, disputed loud and long,
each in his own opinion, exceeding stiff and strong,
Though each was partly in the right, and all were in the wrong!
So, oft in theologic wars, the disputants, I ween,
tread on in utter ignorance, of what each other mean,
and prate about the elephant, not one of them has seen!
John Godfrey Saxe

In the article, Jim focuses on the problems of Internet performance, where each of the various roles involved (routing engineers, network operators, Internet service providers, and so on) grips only their own piece of the elephant. His viewpoint is that we keep consuming ever more bandwidth that we don't really need, just to compensate for everything we have bolted onto the simple job of pushing data from one place to another: bufferbloat.

Perhaps it is all the legacy of a time when we needed to split the stack into layers to support multiple protocols and different vendors, when networks had to cope with traffic bursts and unreliable connections, and when our data had to be segmented because our network buffers were so limited. But those days are past, and we have now largely converged on the methods we need. In our digital world, our connections are increasingly fast and error-free, and latencies are edging towards the speed-of-light limit. And still we plod on, supporting the past with its three-way handshakes, data segment acknowledgements, and routing protocols that care little about the types of traffic they carry.

I love the lines:

And so these men of Indostan, disputed loud and long,
each in his own opinion, exceeding stiff and strong,
Though each was partly in the right, and all were in the wrong!

and it makes me think of development, operations and security teams, where each thinks it is doing things right, yet all of them are wrong.

Bloated security

In cybersecurity, a good example of this bloat is the creation of encryption tunnels. To keep compatibility with the networking stack, the SSL/TLS layer adds a fairly complex handshake, one which supports a wide range of hashing, symmetric-key, key-exchange and public-key methods. That flexibility made sense while we were evolving away from methods that proved old and insecure, but we have now generally settled on a few that are well-proven and efficient. There is often little need, for example, to support MD5, SHA-1 or classic Diffie-Hellman, and yet we often still do. The more methods we support, the greater the chance of a weakness or a slow-down. Luckily, there are protocols such as WireGuard which throw out all of the old crypto methods and support only the ones that are fit for an Internet that is secure by design.
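
To make this concrete, here is a minimal sketch using Python's standard ssl module, trimming a client-side TLS context down to modern methods only. The cipher string here is one reasonable choice of mine, not something from Jim's article:

```python
import ssl

# A trimmed-down client context: TLS 1.2 as the floor, which drops
# SSLv3 and TLS 1.0/1.1, and with them the MD5- and SHA-1-era suites.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# For TLS 1.2, allow only ECDHE key exchange with AES-GCM or ChaCha20.
# (TLS 1.3 suites are configured separately and are already modern-only.)
ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")

# Show the first few cipher suites that survive the trimming.
print([c["name"] for c in ctx.get_ciphers()][:5])
```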

We must see our bufferbloat world as an old model, one which carries the past along with it. Our new world aims to cut out all the inefficiencies we have added, and looks towards almost zero latency in access times and in access to data whenever we need it. I appreciate that we are a long way from having a high-bandwidth connection for every access we make, but if we cut out the inefficiencies, and took hold of every part of the elephant, we would be heading in the right direction. Jim's viewpoint is that we just keep ramping up our bandwidth to cover everything we add to our connection to the Internet. IPv6 tried to overcome many of the problems we have caused, but it only addressed one part of the stack.

One must remember that IP and TCP were created in a world of dial-up connections and noisy transmission environments. It was a world of slow devices with limited buffers, which often had to tell faster devices to slow down and wait for a while. The mechanisms in TCP were built to cope with this. These days, even devices with limited processing power can communicate relatively quickly and rarely need to ask the other side to pause. The protocols we have are thus not really fit for a modern Internet, and we waste bandwidth just overcoming the legacy of the past.
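
That slow-down mechanism is still in every TCP socket today: the receiver advertises a window backed by a kernel buffer. A small sketch with Python's standard library (the exact byte counts the kernel reports are OS-dependent):

```python
import socket

# The receive buffer behind TCP's advertised window: the mechanism a
# slow receiver uses to tell a fast sender to back off.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print("default receive buffer:",
      s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF), "bytes")

# Shrinking it mimics the tiny buffers of early devices; the window the
# receiver can advertise shrinks with it, and the sender must slow down.
# (The value the kernel actually applies is OS-dependent.)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)
print("after shrinking:",
      s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF), "bytes")
s.close()
```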

Goodbye TCP, Hello QUIC (Quick UDP Internet Connections)

The three-way handshake within TCP seems a waste of time, given the delay it causes. So why do we even need TCP any more? Surely the more efficient UDP is all that we require in this error-free and fast networked world? Why can't we just make a connection with another host and send the data, without all the additional machinery that TCP requires? Well, QUIC aims to overcome these problems of TCP.
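
To see the cost, here is a minimal sketch in Python; example.com and the port numbers are just placeholders. The TCP socket burns a full round trip before any data moves, while the UDP socket's very first packet already carries data. (QUIC runs over UDP, but adds its own combined transport-and-crypto handshake, so it is not literally handshake-free.)

```python
import socket

# TCP: one full round trip (SYN, SYN-ACK, ACK) before any data can flow.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("example.com", 80))                    # handshake first...
tcp.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")  # ...then data
tcp.close()

# UDP: no handshake at all; the first packet already carries the data.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"hello", ("example.com", 9999))         # data in packet one
udp.close()
```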

Go fire up Wireshark, open your Chrome browser, and have a look at the traffic in the trace. You might be baffled: where you would expect lots of TLS 1.2 traffic, there are lots of QUIC UDP packets.

And if you follow one of the UDP streams, you can see the data contained in it.
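
If you would rather script the capture, here is a rough sketch using pyshark, a third-party Python wrapper around tshark (both must be installed, and the interface name is machine-specific):

```python
import pyshark  # pip install pyshark; also needs tshark on the PATH

# 'quic' is a standard Wireshark display filter; 'en0' is a placeholder
# interface name, so change it for your machine.
cap = pyshark.LiveCapture(interface="en0", display_filter="quic")
for pkt in cap.sniff_continuously(packet_count=5):
    print(pkt.highest_layer, pkt.ip.src, "->", pkt.ip.dst)
```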

So I'll have to go back to my tutorials and rewrite them to include QUIC. So what is it? Well, it is Google's protocol, which started its life in 2012 (created by Jim Roskind) and is now fully integrated into Chrome. While Chrome supports it, many other browsers still do not.

The one thing many security people know is that the three-way handshake is slow, and that TCP carries a great deal of overhead. TCP was really designed in the days when latency didn't matter that much, and when we used unreliable networks. These days networks are highly reliable and fast, so why bother with TCP at all, when UDP just sends the data with a small overhead? Perhaps there is little need to handshake every connection, as the data will almost always arrive. QUIC is now defined by the IETF [here].

It basically works by multiplexing data streams (as in HTTP/2) at either end of the connection, and by removing the handshaking latency that TCP requires. We thus get a shorter connection time and reduced latency, along with bandwidth estimation to reduce congestion problems. In a world of IoT and 5G, reduced latency will be a key attribute, and we will get near-instant access to sensor data without the massive overhead of TCP. There have also been plans for built-in FEC (Forward Error Correction), as you would find on a CD-ROM, so that even if packets are lost, the receiver can recover them.
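
The FEC idea is easy to see in a toy example. This is just an illustrative XOR parity scheme of the kind early QUIC experimented with, not the actual QUIC wire format:

```python
# Three equal-length payloads plus one XOR "parity" packet.
packets = [b"pkt-one!", b"pkt-two!", b"pkt-tri!"]
parity = bytes(a ^ b ^ c for a, b, c in zip(*packets))

# Suppose packets[1] is lost in transit. The receiver rebuilds it by
# XOR-ing the parity packet with the packets that did arrive, with no
# retransmission round trip needed.
recovered = bytes(p ^ a ^ c
                  for p, a, c in zip(parity, packets[0], packets[2]))
assert recovered == packets[1]
```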

And, finally, watch out for HTTP/3, where we map HTTP connections over the QUIC protocol. For IoT and 5G, there is no other way: TCP will be retired, and we will multiplex data streams … just like we did in the telecoms era.
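
If you want to poke at HTTP/3 yourself, a curl build compiled with HTTP/3 support can request it directly (the --http3 flag arrived in curl 7.66; not every distribution ships such a build):

```
curl --http3 -I https://www.google.com/
```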

Conclusions

We have built up the Internet over the past five decades. But, technically, the Internet we have now is not that dissimilar to the one we created decades ago. And still we continue using the same old methods that were created to overcome the problems we faced in the past. One of the core problems has always been that security and trust are a bolt-on, and have never really been properly integrated.

Basically, in both the networking and security aspects of the Internet, we are all holding onto one part of the elephant. And so we have an Internet that was built for computer traffic (bursty, but reliable), yet which increasingly needs to support real-time traffic (low-latency and constant).

To me, the future is an Internet which truly integrates the best of our methods: one which is efficient, but which builds trust and resilience into every aspect of its design. The recent Facebook outage shows that we need to get hold of the whole elephant, or the new digital-focused societies we are building will crumble from the inherent weaknesses in our digital foundations. This will require DevSecOps integration, where teams come together to properly build software and networks which are efficient, secure, and resilient. Otherwise, we will just keep asking our ISP for more bandwidth, and the bufferbloat will only continue.

And I leave you with a quote that perhaps fits with our modern world:

tread on in utter ignorance, of what each other mean,
and prate about the elephant, not one of them has seen!