Performance testing HTTP/1.1 vs HTTP/2 vs HTTP/2 + Server Push for REST APIs


When building web services, a common piece of wisdom is to try to reduce the
number of HTTP requests to improve performance.

There are a variety of benefits to this, including fewer total bytes being
sent, but the predominant reason is that traditionally browsers will only make
6 HTTP requests in parallel for a single domain. Before 2008, most browsers
limited this to 2.

When this limit is reached, browsers have to wait until earlier requests are
finished before starting new ones. One implication is that the higher the
latency, the longer it takes until all requests finish.
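As a rough back-of-the-envelope sketch (my own model, not part of the
simulation below): if every request costs about one round-trip and at most 6
run in parallel, total time grows in steps of the round-trip time.

```typescript
// Back-of-the-envelope model: N requests, at most `maxParallel` in flight,
// each costing roughly one round-trip. Ignores bandwidth, TLS and DNS.
function estimateTotalMs(requests: number, rttMs: number, maxParallel = 6): number {
  return Math.ceil(requests / maxParallel) * rttMs;
}

// 100 requests at 50 ms of latency: ceil(100 / 6) * 50 = 850 ms.
console.log(estimateTotalMs(100, 50));
```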

Take a look at an example of this behavior. In the following simulation we’re
fetching a ‘main’ document. This could be the index of a website, or some
JSON collection.

After getting the main document, the simulator grabs 99 linked items. These
could be images, scripts, or other documents from an API.
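The simulation itself is interactive on the original post. A minimal sketch of
the workload it models, assuming hypothetical URLs and a `links` array in the
main document, might look like this:

```typescript
// Sketch of the simulated workload: fetch a 'main' document, then fetch its
// linked items with at most 6 requests in flight, like a browser would.
// (URLs and the `links` property are hypothetical.)
async function runSimulation(baseUrl: string, maxParallel = 6): Promise<void> {
  const main = await fetch(`${baseUrl}/index.json`).then(r => r.json());
  const queue: string[] = [...main.links]; // e.g. 99 item URLs

  // Each worker occupies one of the 6 connection "slots".
  const worker = async (): Promise<void> => {
    for (let url = queue.shift(); url !== undefined; url = queue.shift()) {
      await fetch(url);
    }
  };

  await Promise.all(Array.from({ length: maxParallel }, worker));
}
```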

The 6-connection limit has resulted in a variety of optimization techniques:
scripts are combined and compressed, and graphics are often combined into
‘sprite maps’.

The limit and ‘cost’ of a single HTTP connection have also had an effect on web
services. Instead of creating small, specific API calls, designers of REST
(and other HTTP-based) services are incentivized to pack many logical
‘entities’ into a single HTTP request/response.

For example, when an API client needs a list of ‘articles’ from an API, it will
usually get this list from a single endpoint instead of fetching each article
by its own URI.

The savings are massive. The following simulation is similar to the last,
except now we’ve combined all entities in a single request.
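In code, the two styles might look roughly like this (the endpoints are
hypothetical):

```typescript
// Fine-grained style: one URI per article, one request each.
async function getArticlesByUri(uris: string[]): Promise<unknown[]> {
  return Promise.all(uris.map(uri => fetch(uri).then(r => r.json())));
}

// Compound style: the entire collection in a single request/response.
async function getArticlesCompound(): Promise<unknown[]> {
  const res = await fetch('https://api.example.org/articles');
  return res.json(); // one body containing all 100 articles
}
```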

If an API client needs a specific (large) set of entities from a server, then
in order to reduce HTTP requests, API developers are compelled either to build
more API endpoints, each giving a result tailored to a specific client
use-case, or to deploy systems that can take arbitrary queries and return all
the matching entities.

The simplest form of this is perhaps a collection with many query parameters;
a much more complex version is GraphQL, which effectively uses HTTP as a pipe
for its own request/response mechanism and allows a wide range of arbitrary
queries.
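To illustrate the two ends of that spectrum (both endpoints hypothetical):

```typescript
// Simple version: a collection filtered through query parameters.
await fetch('https://api.example.org/articles?author=evert&status=published&sort=-date');

// Complex version: GraphQL tunnels an arbitrary query over a single POST.
await fetch('https://api.example.org/graphql', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: '{ articles(author: "evert", status: PUBLISHED) { title body } }',
  }),
});
```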

Drawbacks of compound documents

There are a number of drawbacks to this. Systems that require compounding of
entities typically need additional complexity on both server and client.

Instead of treating a single entity as some object that has a URI, which
can be fetched with GET and subsequently cached, a new layer is required
on both server and client-side that’s responsible for teasing these entities
apart.

Re-implementing logic that HTTP already provides has a nasty side-effect:
other HTTP features must also be reimplemented. The most common example is
caching.

On the REST side of things, examples of compound documents can be seen in
virtually every standard: JSON:API, HAL and Atom all have this notion.

If you look at most full-featured JSON:API client implementations, you will
usually see that these clients ship with some kind of ‘entity store’, allowing
them to keep track of which entities they have received, effectively
maintaining the equivalent of an HTTP cache.
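A minimal sketch of such an entity store (my own illustration, not any
specific client’s API) makes the resemblance to an HTTP cache obvious:
entities are indexed by URI, just as a cache indexes responses.

```typescript
// Minimal entity store: tease entities out of a compound document and
// index them by URI; in effect, a hand-rolled HTTP cache.
// (The compound document shape shown here is hypothetical.)
class EntityStore {
  private entities = new Map<string, unknown>();

  ingest(compound: { data: { uri: string }[] }): void {
    for (const entity of compound.data) {
      this.entities.set(entity.uri, entity);
    }
  }

  get(uri: string): unknown | undefined {
    return this.entities.get(uri);
  }
}
```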

Another issue is that with some of these systems, it’s typically harder for
clients to request just the data they need. Since entities are often combined
into compound documents, it’s all-or-nothing, or it takes significant
complexity on client and server (see GraphQL).

A more lofty drawback is that API designers may have trended towards systems
that are more opaque, and no longer part of the web of information, due to a
lack of the interconnectedness that linking affords.

HTTP/2 and HTTP/3

HTTP/2 is now widely available. In HTTP/2 the cost of HTTP requests is
significantly lower. Whereas HTTP/1.1 required opening one TCP connection per
parallel request, with HTTP/2 one connection is opened per domain. Many
requests can flow through it in parallel, and potentially out of order.

Instead of delegating parallelism to compound documents, we can now actually
rely on the protocol itself to handle this.
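For instance, with Node.js’s built-in http2 module a client opens one
connection and issues many requests over it as concurrent streams (the host
and paths below are placeholders):

```typescript
import * as http2 from 'node:http2';

// One connection for the whole origin; each request is a concurrent stream.
const client = http2.connect('https://api.example.org');

const requests = Array.from({ length: 100 }, (_, i) =>
  new Promise<void>((resolve, reject) => {
    const stream = client.request({ ':path': `/articles/${i + 1}` });
    stream.on('data', () => { /* consume the body */ });
    stream.on('end', resolve);   // responses may finish out of order
    stream.on('error', reject);
  })
);

await Promise.all(requests);
client.close();
```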
