Web front-end benchmarking approach

This document describes the intended approach for benchmarking web front-end systems based on ARM CPUs. The purpose is to identify bottlenecks in these systems that can be removed by software optimization. This document captures the outcome of discussions held between LEG team members and various stakeholders and sponsors over the course of February 2013 (including LCA '13).

Scope

The scope of these benchmarks is limited to the components involved in serving web requests, considered on a per-request basis. This covers SSL connection setup and payload encryption, caching of static and dynamic content, and the on-the-fly generation of such content. Any components performing caching and/or optimization on the client side are explicitly excluded, as are components whose load is difficult to amortize on a per-connection basis.

The test environment

The test environment (TE) shall consist of a number of nodes in addition to the node(s) whose performance we are actually interested in, mainly load generators and a database backend. The network interconnect - which should ideally be isolated from the network carrying the control and sequencing connections - is also considered part of the TE, as is the infrastructure used to perform external measurements such as power consumption.

The system under test (SUT)

The system under test is the system whose performance we are trying to optimize. This could be any kind of ARM-based system, although at the moment the Calxeda systems in the lab are the only suitable ones (i.e., systems whose I/O architecture is geared towards server use). This node will be loaded by the load generators to the point where contention starts to occur on any of its resources (CPU, I/O, memory, etc.), and the bottlenecks identified in this way will be inspected more closely and removed if possible.
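
As a minimal sketch of how such contention could be spotted (assuming only that the SUT runs Linux and exposes /proc/stat; the sampling interval is an arbitrary placeholder), a sampler along the following lines can be left running on the SUT during a load run to show whether CPU time or I/O wait saturates first:

    import time

    def read_cpu_times():
        # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
        with open('/proc/stat') as f:
            return [int(x) for x in f.readline().split()[1:]]

    def sample(interval=5):
        """Print CPU-busy and I/O-wait percentages for each interval."""
        prev = read_cpu_times()
        while True:
            time.sleep(interval)
            cur = read_cpu_times()
            delta = [c - p for c, p in zip(cur, prev)]
            total = sum(delta) or 1
            idle, iowait = delta[3], delta[4]
            busy = 100.0 * (total - idle - iowait) / total
            print("cpu busy %5.1f%%  iowait %5.1f%%" % (busy, 100.0 * iowait / total))
            prev = cur

    if __name__ == '__main__':
        sample()

Memory and network counters (e.g. from /proc/meminfo and /proc/net/dev) could be folded into the same loop in the same way.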

SUT variants

Instead of expending a lot of effort on designing the one 'golden' configuration to deploy on the SUT, we will have several configurations with different types and levels of optimization applied. Some typical variants could be (a sketch of how such variants might be swept over follows the list):

  • boilerplate
  • opcode caching only
  • dynamic content caching
  • fully optimized (e.g., Yahoo-grade)
  • SSL enabled
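
As an illustration only (the variant names, setup scripts and helper commands below are hypothetical placeholders rather than an agreed interface), a small driver could apply each configuration in turn and run the same load against it, so that every result is tagged with the variant it belongs to:

    import subprocess

    # Hypothetical variant names mapped to equally hypothetical setup scripts
    # that would reconfigure the web stack on the SUT.
    VARIANTS = [
        ('boilerplate',     'setup-boilerplate.sh'),
        ('opcode-cache',    'setup-opcode-cache.sh'),
        ('dynamic-cache',   'setup-dynamic-cache.sh'),
        ('fully-optimized', 'setup-fully-optimized.sh'),
        ('ssl',             'setup-ssl.sh'),
    ]

    def run(cmd):
        subprocess.check_call(cmd, shell=True)

    for name, setup in VARIANTS:
        run('ssh sut ./%s' % setup)      # placeholder: reconfigure the SUT
        run('./run-load.sh %s' % name)   # placeholder: drive the load, tag results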

One essential aspect of using several variants in this way is that it allows us to classify potential performance-enhancing optimizations:

  • Horizontal optimizations: optimizations which effectively turn one configuration into another (more optimized) one
  • Vertical optimizations: optimizations which make the chosen configuration itself perform better

The purpose of this classification is to avoid entering the realm of engineering that is typically applied at deployment time. Instead, we should be focusing on generic improvements that make ARM systems perform better on a per-node basis.

Another advantage of benchmarking several variants is that it allows us to prioritize identified bottlenecks based on whether they occur in multiple configurations or just a single one.

The database backend

The database backend is the single backend node hosting all the data used by the SUT to generate the dynamic content it serves. The size of the data set should be large with respect to the various caches and buffers in the SUT, to prevent caching effects from skewing the results. The hardware should be dimensioned such that the response time to requests coming from the SUT is constant over the load range induced by the SUT (in other words, it should easily handle the load coming from the most optimized configuration we take into account).

This node should be contention-free.
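
One way to check that the backend really does stay flat over the relevant load range (a sketch only: the stand-in query, thread counts and sample sizes are placeholders, and a call through the real database driver would replace the stub) is to replay a representative query at increasing concurrency directly against the backend and confirm that the latency distribution does not move:

    import threading
    import time

    def run_query():
        # Stand-in so the sketch runs as-is; replace with one representative
        # query issued through the database driver the SUT's stack actually uses.
        time.sleep(0.005)

    def measure(concurrency, samples_per_thread=200):
        latencies = []
        lock = threading.Lock()

        def worker():
            for _ in range(samples_per_thread):
                t0 = time.time()
                run_query()
                dt = time.time() - t0
                with lock:
                    latencies.append(dt)

        threads = [threading.Thread(target=worker) for _ in range(concurrency)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        latencies.sort()
        median = latencies[len(latencies) // 2]
        p95 = latencies[int(0.95 * len(latencies))]
        print("concurrency %3d  median %.1f ms  95th %.1f ms"
              % (concurrency, 1000 * median, 1000 * p95))

    if __name__ == '__main__':
        for c in (1, 4, 16, 64):
            measure(c)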

The load generators

The load generators shall be hosted on hardware separate from the SUT, to exercise the SUT's network I/O path and highlight any I/O-related bugs that may be present.
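
As a small sketch of how a load-generator node might drive the SUT (the host name and URL are placeholders, and ApacheBench is used purely as an example tool rather than a settled choice), the figure of merit for each run can be pulled straight out of the tool's output:

    import re
    import subprocess

    def run_ab(url, requests=100000, concurrency=200):
        """Run ApacheBench from the load generator and return requests/second."""
        out = subprocess.check_output(
            ['ab', '-n', str(requests), '-c', str(concurrency), url])
        match = re.search(r'Requests per second:\s+([\d.]+)', out.decode())
        return float(match.group(1))

    if __name__ == '__main__':
        # 'sut.example' stands in for the system under test.
        print(run_ab('http://sut.example/index.php'))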

Load balancing

As we are taking a one-unit-wide vertical slice of a multi-node web server system, there is no reason to involve any kind of load balancing between the load generators and the SUT. This simplifies the setup significantly and removes a potential source of noise in the latency measurements.

LAVA

We will use LAVA to sequence the benchmarks and collect the results.
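
A minimal sketch of how individual measurements could be handed to LAVA for collection, assuming the benchmark scripts run inside a LAVA test shell where the lava-test-case helper is available (the test-case name and units below are placeholders):

    import subprocess

    def report(name, value, units):
        # Record one benchmark figure as a LAVA test case result; assumes the
        # lava-test-case helper provided by the LAVA test shell is on PATH.
        subprocess.check_call([
            'lava-test-case', name,
            '--result', 'pass',
            '--measurement', str(value),
            '--units', units,
        ])

    # e.g. report('requests-per-second', rps, 'req/s') after each load run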
