
HTTP Adventure: [Part I] Head-of-Line Blocking
![HTTP Adventure: [Part I] Head-of-Line Blocking](https://static-careers.moneyforward.vn/Blog/HTTP%20series/Hapi%20-%20OGP%20Tech%20Blog%20Template.png)
As HTTP/2 and HTTP/3 have existed for a while, you may have read blog posts or heard conference talks about their amazing performance revolution. They often make it seem as if websites would magically become much faster with the flip of a switch, while in practice it is a more modest, yet beneficial, evolution. The primary objective of this series is therefore to provide a nuanced understanding of the performance enhancements of HTTP over its history, so that we can form accurate expectations about its features.
To begin, it's worth taking a brief look at the history of the HTTP protocol, as the past explains the present and gives us some intuition for the future.

In 1991, Sir Tim Berners-Lee laid the foundation of the Hypertext Transfer Protocol (HTTP) by developing the HTTP/0.9 protocol. This initial version was a straightforward text-based protocol that enabled clients to request hypertext documents from servers.
The years between 1991 and 1995 are often referred to as the Internet Boom of the Early 1990s, marked by significant events such as the introduction of the first web browser and the formation of the World Wide Web Consortium (W3C) to guide the development of HTML. Additionally, the HTTP Working Group (HTTP-WG) was created to enhance the HTTP protocol, both of which have played crucial roles in the Web's evolution.
From this period of rapid growth came the need for a protocol that could serve more than just hypertext documents, and in 1996 the HTTP Working Group published HTTP/1.0. HTTP thus evolved into a hypermedia transport: the response object could be of various types, including hypertext, plain text, images, or any other content type. However, HTTP/1.0 carried a significant performance penalty: it required a new TCP connection per request, which proved expensive (further details on this will be discussed later in this series). The HTTP/1.1 standard was then officially published in 1997, with a revision in 1999 that added a number of performance optimizations, keep-alive connections being among the key enhancements.
In early 2015, HTTP/2 was announced, focusing on enhancing transport performance with features like server push, parallel streams, and prioritization. Lastly, HTTP/3, standardized in 2022, promises significant performance enhancements over HTTP/2 by shifting the underlying transport protocol from TCP to QUIC over UDP.
So what exactly are HTTP/2 and HTTP/3? Why were they needed after HTTP/1.1? How can or should you use them? And, especially, how do they improve web performance? Let's find out.
In this post, we'll start with one of the main motivations behind not only HTTP/2 but also HTTP/3, a problem whose history offers valuable insight into what drives protocol evolution: Head-of-Line blocking (HOL blocking).
I. What is HOL blocking?
A simple definition would be:
> When a single (slow) object prevents other or following objects from making progress.
A practical illustration is a road with just a single lane, where one accident can cause a significant traffic jam. This demonstrates the First In, First Out manner in which a single issue at the "head" can "block" the entire "line".
This issue has been one of the hardest Web performance problems to solve. To understand it, let's start with the form HOL blocking takes in HTTP/1.1 and the strategies used to address it, which should be quite familiar to web developers.
II. HOL blocking in HTTP/1.1
In its design, HTTP/1.1 reflects an earlier, simpler period of protocols, allowing for text-based communication that was easily readable as it traveled over the network. This is illustrated in Figure 1 below:

*(Figure 1: an HTTP/1.1 response for script.js, with textual headers followed by the payload, split across two TCP packets.)*
In this scenario, the browser made a request for the simple script.js file (highlighted in green) using HTTP/1.1, and Figure 1 illustrates the server's response. The HTTP layer is quite simple, as it just adds some textual "headers" (highlighted in red) right before the plaintext file content known as the "payload." These headers and the payload are then sent down to the underlying TCP layer (highlighted in orange) and delivered to the client. For this example, let's assume the entire file cannot be contained within a single TCP packet and it has to be split up into two parts.
Note: When using HTTPS, there's actually an additional security layer between HTTP and TCP, usually the TLS protocol, but we'll skip that for now to keep things clear.
Now let's see what happens when the browser also requests style.css in Figure 2:

*(Figure 2: the style.css response transmitted on the same connection, directly after the script.js response.)*
In this scenario, we are transmitting style.css (highlighted in purple) after the script.js file has been sent. The headers and content for style.css are simply added after the JavaScript file. The receiver relies on the Content-Length header to determine where one response ends and the next begins; in this straightforward example, script.js is 1000 bytes, and style.css is only 600 bytes.
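To make the delimiting concrete, here is a minimal sketch (not a full HTTP parser) of how a client splits back-to-back HTTP/1.1 responses on one connection: read the headers up to the blank line, then read exactly Content-Length bytes of payload. The sizes mirror the 1000-byte and 600-byte example above:

```python
def read_responses(stream: bytes) -> list[bytes]:
    """Split a byte stream into response payloads using Content-Length."""
    payloads = []
    pos = 0
    while pos < len(stream):
        header_end = stream.index(b"\r\n\r\n", pos)
        headers = stream[pos:header_end].decode()
        length = 0
        for line in headers.split("\r\n"):
            if line.lower().startswith("content-length:"):
                length = int(line.split(":", 1)[1])
        body_start = header_end + 4
        payloads.append(stream[body_start:body_start + length])
        pos = body_start + length  # the next response begins right here
    return payloads

script_js = b"x" * 1000  # stand-in for 1000 bytes of JavaScript
style_css = b"y" * 600   # stand-in for 600 bytes of CSS

wire = (b"HTTP/1.1 200 OK\r\nContent-Length: 1000\r\n\r\n" + script_js
        + b"HTTP/1.1 200 OK\r\nContent-Length: 600\r\n\r\n" + style_css)

bodies = read_responses(wire)
print(len(bodies[0]), len(bodies[1]))  # 1000 600
```

Note how the only thing telling the receiver where script.js ends is the Content-Length value; there is no other framing on the wire.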
This approach works well for two small files, but consider a situation where the JavaScript file is significantly larger, say 1 MB instead of 1 KB. In that case, the CSS would have to wait until the entire JavaScript file is downloaded, despite being smaller and ready for use sooner. To visualize this, let us represent large_script.js as the number 1 and style.css as the number 2, giving a sequence like this:
11111111111111111111111111111111111111122
This situation illustrates the Head-of-Line blocking issue quite clearly! You might think the solution would simply be to have the browser load the CSS file before the JavaScript file. However, the browser cannot know which file will be larger at the time of the request. There is currently no way to specify file sizes in HTML, though it would be a helpful feature, for example a size attribute on the image tag like this:
<img src="https://imgs.mysite.com/thisisfine.jpg" size="15000" />
The “real” solution to this problem is multiplexing. By breaking each file into smaller segments or "chunks," we can interleave these chunks during transmission. This means sending a chunk from the JavaScript file, followed by one from the CSS file, and continuing this pattern until both files are fully downloaded. This method allows the smaller CSS file to be available much sooner, while only slightly delaying the larger JavaScript file. If we visualize this with numbers, it would look something like this:
12121111111111111111111111111111111111111
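A toy model makes the benefit visible: on one ordered byte stream, a resource is usable only when its last chunk arrives, so interleaving lets the small file finish early at almost no cost to the large one. The chunk counts below mirror the digit sequences above and are purely illustrative:

```python
large_js = ["js"] * 39   # 39 chunks, mirroring the run of 1s above
small_css = ["css"] * 2  # 2 chunks, mirroring the 2s

def finish_times(schedule):
    """Return, for each resource, how many chunks had been sent when it completed."""
    remaining = {"js": schedule.count("js"), "css": schedule.count("css")}
    done = {}
    for t, chunk in enumerate(schedule, start=1):
        remaining[chunk] -= 1
        if remaining[chunk] == 0:
            done[chunk] = t
    return done

sequential = large_js + small_css                        # 111...1122
interleaved = ["js", "css", "js", "css"] + ["js"] * 37   # 1212111...1

print(finish_times(sequential))   # {'js': 39, 'css': 41} - CSS waits for all of JS
print(finish_times(interleaved))  # {'css': 4, 'js': 41} - CSS ready after 4 chunks
```

The JavaScript finishes at the same time in both schedules; only the CSS moves, from last place to nearly first.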
Unfortunately, multiplexing is not feasible in HTTP/1.1 because of fundamental limitations within the protocol's design. Figure 3 shows the interleaving of just four chunks for the two resources, which highlights the constraints imposed by the protocol:

*(Figure 3: chunks of script.js and style.css interleaved across TCP packets, which an HTTP/1.1 receiver cannot tell apart.)*
The core issue here is that HTTP/1.1 operates as a purely textual protocol, simply adding headers to the beginning of the payload without distinguishing between different resource chunks. For instance, when a browser begins to process the headers for script.js, it expects a payload of 1000 bytes as indicated by the Content-Length. However, it only receives 450 bytes of JavaScript in the first chunk and then moves on to the headers for style.css. This leads to the browser mistakenly interpreting the CSS headers and the first chunk of CSS payload as part of the JavaScript payload, since both are just plain text. Additionally, the browser stops reading after 1000 bytes, leaving it in the middle of the second chunk of script.js. At this point, it fails to find valid new headers and discards the rest of TCP packet 3. The browser then passes what it thinks is script.js to the JavaScript parser, which fails because it’s not valid:

You may think there is a simple solution: instruct the browser to identify the HTTP header pattern to determine the beginning of a new header block. While this approach may be effective for TCP packet 2, it would not succeed with packet 3, as the browser would be unable to determine where the script.js chunk ends and the style.css chunk starts.
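We can reproduce the failure described above in a few lines. If a (hypothetical) server interleaved chunks on an HTTP/1.1 connection, a standard Content-Length-based reader would swallow the CSS headers and the first CSS chunk as if they were JavaScript payload. The chunk sizes are illustrative stand-ins for the 1000-byte example:

```python
js_chunk1 = b"J" * 450
js_chunk2 = b"J" * 550            # Content-Length declares 1000 bytes in total
css_headers = b"HTTP/1.1 200 OK\r\nContent-Length: 600\r\n\r\n"
css_chunk1 = b"C" * 300

# What an interleaving server would put on the wire:
wire = (b"HTTP/1.1 200 OK\r\nContent-Length: 1000\r\n\r\n"
        + js_chunk1 + css_headers + css_chunk1 + js_chunk2)

# What a standard HTTP/1.1 client does: read headers, then exactly 1000 bytes.
body_start = wire.index(b"\r\n\r\n") + 4
supposed_js = wire[body_start:body_start + 1000]

# The "JavaScript payload" now contains the CSS headers and CSS bytes,
# and the client stops in the middle of the second JS chunk:
print(b"Content-Length: 600" in supposed_js)  # True
```

The resulting 1000 bytes are neither valid JavaScript nor a recoverable mix, which is exactly why the JavaScript parser in the article's example fails.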
Takeaway
This is a fundamental limitation of the way the HTTP/1.1 protocol was designed. With a single HTTP/1.1 connection, you must wait for the complete delivery of one resource before sending another. This can cause significant head-of-line blocking problems, especially if earlier resources take time to generate, like a dynamically created "index.html" that needs database queries, or when those resources are large.
Consequently, various techniques have been developed over time to address this issue, but it's important to note that these solutions are merely temporary workarounds. Let's explore this further.
III. Stopgap Workarounds
1. Using multiple TCP connections
HOL blocking in HTTP/1.1 is why browsers began establishing multiple parallel TCP connections, usually 6 per origin, for each page load, mitigating the blocking by distributing requests across these connections. However, this approach becomes less effective when a page contains more than 6 resources, a situation that is increasingly common: the HTTP Archive indicates that the average webpage now consists of over 70 individual resources.
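A quick back-of-the-envelope model shows why 6 connections only partially help. If each connection carries one transfer at a time, a page needs roughly ceil(resources / connections) sequential "waves" of transfers (ignoring differing file sizes, an assumption made here for simplicity):

```python
def rounds_needed(num_resources: int, max_connections: int = 6) -> int:
    """How many sequential waves of transfers a page load needs."""
    return -(-num_resources // max_connections)  # ceiling division

print(rounds_needed(12))                     # 2 waves with 6 connections
print(rounds_needed(12, max_connections=1))  # 12 waves on a single connection
print(rounds_needed(70))                     # an average 70-resource page: 12 waves
```

So a 70-resource page still queues about as deeply on 6 connections as a 12-resource page would on one.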

2. Domain Sharding
As most browsers limit the number of TCP connections per domain, the idea arose of "sharding" resources across multiple domains and Content Delivery Networks (CDNs). Since each domain gets its own 6 connections, the greater the number of shards, the more parallelism we have!

While this approach is effective, it comes with significant overhead. Establishing a new TCP connection can be costly, particularly regarding server state, memory usage, and the computations needed for connection setup. Additionally, it places the burden on the site author to oversee the distribution and management of resources.
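To illustrate the management burden, here is a sketch of sharding as it was commonly implemented: deterministically map each resource path to one of N shard hostnames, so that every page load resolves the same URL to the same shard (important for caching). The shard hostnames are hypothetical:

```python
import hashlib

SHARDS = [f"static{i}.example.com" for i in range(1, 4)]

def shard_url(path: str) -> str:
    """Pick a stable shard hostname for a resource path."""
    digest = hashlib.md5(path.encode()).hexdigest()
    host = SHARDS[int(digest, 16) % len(SHARDS)]
    return f"https://{host}{path}"

print(shard_url("/img/logo.png"))
# With 3 shards at ~6 connections each, up to 18 transfers can run in parallel,
# at the cost of 3 DNS lookups, 3 TCP handshakes, and 3 TLS handshakes.
```

The comment at the end is the whole trade-off: the extra parallelism is paid for with extra connection setup.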
To illustrate why a TCP connection is expensive, consider the flow shown below, which explains how a TCP connection is established before the browser can send its initial request. The handshake takes one round-trip time, and for HTTPS connections the TLS handshake adds further round trips, increasing request latency on every new connection.
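The handshake cost is easy to put in numbers: 1 RTT for the TCP handshake, plus 2 RTTs for a classic TLS 1.2 handshake (TLS 1.3 needs only 1), before the request itself can even be sent. The RTT value below is an illustrative assumption:

```python
RTT_MS = 50  # assumed round-trip time to the server

def first_response_delay(https: bool, tls13: bool = False) -> int:
    """Milliseconds until the first response arrives on a fresh connection."""
    rtts = 1                       # TCP handshake (SYN / SYN-ACK / ACK)
    if https:
        rtts += 1 if tls13 else 2  # TLS handshake round trips
    rtts += 1                      # the request/response exchange itself
    return rtts * RTT_MS

print(first_response_delay(https=False))             # 100 ms
print(first_response_delay(https=True))              # 200 ms with TLS 1.2
print(first_response_delay(https=True, tls13=True))  # 150 ms
```

Multiply this by every extra sharded domain and the "free" parallelism starts looking considerably less free.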

3. Concatenation and Spriting
In the case of JavaScript and CSS files, we can combine them into a single resource. Similarly, multiple images can be combined into a larger composite image, or "image sprite", and CSS is then used to select and position the appropriate part of the sprite within the browser viewport.

These techniques can improve performance by reducing networking costs. However, they also introduce extra application complexity: preprocessing steps, deployment considerations, and extra code (for example, the CSS needed to manage sprites). Furthermore, bundling multiple independent resources can significantly hurt the page's caching strategy and execution speed, because a single update to any individual file requires invalidating and re-downloading the entire bundle.

Both JavaScript and CSS are parsed and executed only once the transfer is finished, potentially delaying your application's execution. For illustration purposes, see the figure below:

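The cache-invalidation downside of concatenation can be sketched in a few lines: a fingerprinted bundle's name changes when any member file changes, so one edited file forces clients to re-download everything. The file names and contents are hypothetical:

```python
import hashlib

def bundle(files: dict[str, str]) -> tuple[str, str]:
    """Concatenate sources and derive a cache-busting fingerprinted name."""
    combined = "\n".join(files[name] for name in sorted(files))
    fingerprint = hashlib.sha256(combined.encode()).hexdigest()[:8]
    return combined, f"bundle.{fingerprint}.js"

sources = {"a.js": "console.log('a');", "b.js": "console.log('b');"}
_, name_v1 = bundle(sources)

sources["b.js"] = "console.log('b, edited');"  # touch just one file...
_, name_v2 = bundle(sources)

print(name_v1 != name_v2)  # True: the whole cached bundle is invalidated
```

Served as separate files, only b.js would have needed re-downloading; bundled, the unchanged a.js is re-sent as well.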
💡 There are actually two additional techniques, known as Resource Inlining and HTTP Pipelining. If you're interested, you can look into these for more technical details.
4. Takeaway
For many web developers, these solutions are standard optimizations: well-known, essential, and widely applied. Yet it is important to understand that they are merely temporary fixes for the limitations of the HTTP/1.1 protocol. Concatenating files or sharding domains should not have to be our concern. These optimizations exist for good reasons, and we had to depend on them until the fundamental problems were addressed in the next version of the protocol.
IV. Conclusion
In this post, we explored the reasons behind the Head-of-Line (HOL) blocking issue present in HTTP/1.1. Since HOL blocking cannot be effectively addressed within HTTP/1.1, and the existing workarounds have their own drawbacks, it became evident that a completely new approach was necessary, which led to the development of HTTP/2. In the next part, we will explore how HTTP/2 addresses this challenge and how its new features improve web performance in practice.
