Load testing is an important aspect of any project: it provides insight into how your site and infrastructure will react under load. While critical to the launch process of any project, load testing is also useful to integrate into your standard testing procedure. It can help you locate bottlenecks and general performance regressions introduced as a site evolves after launch (due to code and infrastructure changes).
There are many different methodologies and applications for performing load tests, but generally a load test involves bombarding various pages on the site with enough traffic to start causing degradation. From there, work can be done to isolate bottlenecks in order to improve performance. By performing periodic load tests, you are much more likely to catch something while it’s still a minor performance issue, rather than after it becomes a major problem.
Different Types of Load Tests
There are a number of different load testing configurations (sometimes referred to as test plans) which can be used individually or in conjunction to provide insight into site performance. Generally, we end up running three different types of tests:
- Baseline tests: These tests are run with a relatively low amount of traffic in order to obtain some baseline information about site performance. These are useful for tracking general user-facing performance (time to first byte, time for a full page load), and to compare against a higher traffic load test result, as well as to track regressions in the standard case.
- High traffic tests: Tests with relatively higher traffic are run in order to see when site performance begins to degrade as traffic increases. These tests can give you an idea of how many requests a site can handle before performance deteriorates to an unacceptable degree. At the same time, these types of tests are very good for uncovering bottlenecks in a site; many times issues with underlying services or service integration require a higher load in order to trigger. Generally speaking, this type of load test is the one most frequently run.
- Targeted tests: Most tests are designed to cover all different request types and page types for a site. Targeted tests take a different approach; they are designed to test one or several specific features or user paths. For example, if you are working to improve performance of a certain page type on your site, you could run a load test which only focuses on that particular area.
Depending on the load testing tool in use, these tests could all be based on the same test plan by tweaking the amount of traffic generated and/or by enabling or disabling certain parts of the test plan in order to focus testing on only a subset of the site.
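As a rough illustration of deriving all three test types from one plan, here is a minimal Python sketch. It is not tied to any particular load testing tool: the `Step` and `LoadPlan` classes, the `derive_plan` helper, and the example paths are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Step:
    """One request type in the plan (hypothetical structure)."""
    name: str
    path: str
    enabled: bool = True


@dataclass(frozen=True)
class LoadPlan:
    """A test plan: a traffic level plus the steps it exercises."""
    name: str
    concurrent_users: int
    steps: Tuple[Step, ...]


def derive_plan(base: LoadPlan, name: str, concurrent_users: int,
                only: Optional[Set[str]] = None) -> LoadPlan:
    """Copy the base plan with a new traffic level.

    If `only` is given, every step whose name is not listed is disabled,
    which turns the full plan into a targeted one.
    """
    steps = tuple(
        replace(step, enabled=step.enabled and (only is None or step.name in only))
        for step in base.steps
    )
    return LoadPlan(name=name, concurrent_users=concurrent_users, steps=steps)


# Example paths are assumptions, not taken from a real site.
full = LoadPlan("high-traffic", 200, (
    Step("front", "/"),
    Step("article", "/node/1"),
    Step("register", "/user/register"),
))
baseline = derive_plan(full, "baseline", concurrent_users=5)
targeted = derive_plan(full, "registration-only", 200, only={"register"})
```

Keeping one base plan and deriving the variants from it means a new page type only has to be added in one place.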
Creating a Valid Test
One of the most difficult aspects of load testing is creating a test that represents real site traffic. If the test diverges greatly from real traffic patterns, the performance results and bottlenecks found during the test may not help you improve real-world performance in a meaningful way. By starting with a test that matches real-world traffic as closely as possible (or expected traffic, if you are looking at a new or growing site), you’ll more reliably uncover performance issues on your site. A load test can be fine-tuned over time to keep it in line with shifting traffic patterns and new features added to a site. Things to take into consideration when creating and reviewing a load test plan include:
- User browsing patterns. What pages are visited more often? How long do users spend on different page types? What are the most common entry points into the site?
- Logged in traffic. What percentage of traffic is logged in versus anonymous users? Do logged in users visit different pages than anonymous users?
- Amount of content. When creating a new site, do you have enough content on the site to perform a valid test? If not, then consider creating content programmatically before running a test. The Devel module is great for this purpose (among other things).
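One way to turn the browsing-pattern and logged-in questions above into a test is to sample requests from observed proportions. The sketch below assumes you already have per-page traffic shares (for example, from access logs or analytics). The function name, page labels, and numbers are illustrative only.

```python
import random
from typing import Dict, List, Tuple


def build_request_mix(page_weights: Dict[str, float], logged_in_share: float,
                      total_requests: int, seed: int = 0) -> List[Tuple[str, bool]]:
    """Return a list of (page, is_logged_in) pairs.

    Pages are drawn in proportion to `page_weights`; each request is
    marked as logged in with probability `logged_in_share`.
    A fixed seed keeps the mix repeatable between test runs.
    """
    rng = random.Random(seed)
    pages = list(page_weights)
    weights = [page_weights[page] for page in pages]
    chosen = rng.choices(pages, weights=weights, k=total_requests)
    return [(page, rng.random() < logged_in_share) for page in chosen]


# Hypothetical traffic shares: 50% front page, 40% articles, 10% search.
mix = build_request_mix(
    {"front": 0.5, "article": 0.4, "search": 0.1},
    logged_in_share=0.2,
    total_requests=1000,
)
```

Because the seed is fixed, two runs of the same plan replay the same sequence, which makes results easier to compare.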
When to Test
There are many ways to approach load testing for a given website and infrastructure. How frequently tests are run is entirely up to you. Some sites may run a load test manually once per month, while others may run tests multiple times per day. Whatever you decide, it’s important to run a baseline test occasionally to understand what “normal” performance looks like. Only once you have baseline numbers for user-facing and server-side performance during a load test can you define what is “good” or “bad” in a particular test result.
Continuous Integration
Testing can be tied into your development process using a tool such as Jenkins. Tests could be set up to run each time a new release is pushed to the staging environment. Or, if you have sufficient resources, tests could even be run each time new code is pushed to the site’s code repository.
Periodic Testing
For those who don’t want to deal with the overhead of testing for each new code push, an alternative approach is to test on some pre-determined schedule. This could be daily, weekly, or even monthly. The more frequently tests are run, the easier it will be to directly link a change in performance to a specific change on the site. If you go too long between tests, it can become much harder to pinpoint the cause of a performance problem.
Manual Targeted Testing
In addition to the above approaches, it can be useful to run manual tests occasionally, especially if you are trying to test a specific aspect of the site with a targeted test plan. For example, if you are planning a media event which will drive a lot of new users to your site, it might be beneficial to run targeted tests against the main site entry points and features – such as user registration – which may receive higher than normal traffic.
Interpreting Test Results
One problem that many people encounter with load testing is that they are bombarded with too much data in the results, and it’s not always clear what information is important. In most situations, you will at least want to examine:
Depending on what the goals are for your load test, you may also be looking at additional information such as bytes transferred or throughput for a particular page.
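To make a large result set manageable, it can help to reduce the raw samples to a handful of summary numbers. The sketch below is one possible reduction, not a prescribed list: the choice of median, 95th percentile, error rate, throughput, and bytes transferred is an assumption for illustration, as are the `Sample` structure and function names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One request as recorded by a load testing tool (hypothetical shape)."""
    page: str
    elapsed_ms: float
    ok: bool
    bytes_received: int


def percentile(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a sorted, non-empty list (pct is 1-100)."""
    rank = (pct * len(sorted_values) + 99) // 100  # integer ceiling
    return sorted_values[max(rank, 1) - 1]


def summarize(samples: List[Sample], duration_s: float) -> Dict[str, float]:
    """Reduce raw samples to a few headline numbers."""
    times = sorted(s.elapsed_ms for s in samples)
    errors = sum(1 for s in samples if not s.ok)
    return {
        "requests": len(samples),
        "median_ms": percentile(times, 50),
        "p95_ms": percentile(times, 95),
        "error_rate": errors / len(samples),
        "throughput_rps": len(samples) / duration_s,
        "bytes_total": sum(s.bytes_received for s in samples),
    }
```

Grouping the samples by `page` before calling `summarize` gives the same numbers per page type.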
No matter what data you choose to track, it becomes even more valuable if you are able to track it over time. By comparing multiple test results, you’ll get a much better idea of your site’s performance as well as gain the ability to see trends in the performance data. It can be very useful to observe things like page load time to see how it varies over time, or how it might increase or decrease in response to a specific code or infrastructure change.
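Comparing a new result against a stored baseline can be automated with very little code. This is a minimal sketch that assumes each test run has been reduced to a dictionary of "lower is better" metrics such as load times. The 10% tolerance and the metric names are arbitrary example values.

```python
from typing import Dict


def find_regressions(baseline: Dict[str, float], current: Dict[str, float],
                     tolerance: float = 0.10) -> Dict[str, float]:
    """Return metrics that got worse by more than `tolerance`.

    Values are relative changes, e.g. 0.25 means 25% slower than baseline.
    Metrics missing from `current`, or with a non-positive baseline, are skipped.
    """
    regressions = {}
    for metric, base_value in baseline.items():
        if metric not in current or base_value <= 0:
            continue
        change = (current[metric] - base_value) / base_value
        if change > tolerance:
            regressions[metric] = change
    return regressions


# Hypothetical page load times in milliseconds.
baseline_run = {"front_ms": 200.0, "article_ms": 400.0}
latest_run = {"front_ms": 210.0, "article_ms": 500.0}
flagged = find_regressions(baseline_run, latest_run)
```

Here only `article_ms` is flagged, since the front page slowed by 5%, which is within the tolerance.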
Another important consideration is to understand how requests from load testing software differ from requests made by a user with a standard web browser. For example, JMeter does not execute JavaScript, and by default it will not download linked assets on a page. In general, those differences are acceptable as long as they are understood. However, it can be worthwhile to perform some sort of additional testing with a tool that more accurately represents a browser. These tools are not always capable of high-traffic load tests, but can at least be used to establish a baseline.
Server Monitoring During Load Tests
When you run a load test, you’ll be presented with a list of results which focus entirely on client-side performance, since that is all that the load testing application can see. It’s important to monitor servers and services during load test runs in order to get the most from your tests and to be able to track down infrastructure bottlenecks. Of course, this sort of monitoring could be left to the automated systems, but it can also be useful to manually watch the servers during test runs to see “live” how things are affected and be able to adjust what you are monitoring. Different sites will suffer from completely different infrastructure bottlenecks, so it’s best to keep an eye on as much data as possible, but for starters we recommend:
- Web servers: Watch overall system load, memory usage, swap usage, network traffic, and disk I/O. Also keep track of things like Apache process count to see if you are approaching (or hitting!) the MaxClients setting. As always, don’t forget to watch logs for Apache to see if any errors are being reported.
- Reverse proxies and other caches like memcached: Watch load, network traffic, and caching statistics. Is your cache hit-rate higher or lower than normal? Try to understand why that might be (e.g. a test plan which only hits a very small subset of the site’s pages would likely cause higher cache hit-rates). Watch memory usage and evictions to be sure that the cache isn’t becoming overfilled and forced to delete items before they’ve expired.
- Database servers: Watch the server load and connection count. Watch the MySQL error log for any unusual errors. Ensure that the MySQL slow query log is enabled, and watch it for potential query improvements that can be made (see e.g. pt-query-digest in the Percona Toolkit). You can also watch MySQL statistics directly or with tools such as mysqlreport to watch things like InnoDB buffer usage, lock wait times, and query cache usage. Watch the MySQL process list to see if there are certain queries running frequently or causing waits for other queries.
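The cache checks above come down to simple arithmetic on counters sampled before and after a test run. The sketch below assumes the counters have already been collected into dictionaries keyed by the memcached `stats` names `get_hits`, `get_misses`, and `evictions`. How you collect them, and the helper names, are left as assumptions.

```python
from typing import Dict


def cache_hit_rate(before: Dict[str, str], after: Dict[str, str]) -> float:
    """Hit rate for lookups made between two counter snapshots."""
    hits = int(after["get_hits"]) - int(before["get_hits"])
    misses = int(after["get_misses"]) - int(before["get_misses"])
    total = hits + misses
    return hits / total if total else 0.0


def evictions_during_test(before: Dict[str, str], after: Dict[str, str]) -> int:
    """Number of items evicted between the two snapshots."""
    return int(after["evictions"]) - int(before["evictions"])


# Hypothetical snapshots taken just before and just after a test run.
start = {"get_hits": "1000", "get_misses": "200", "evictions": "5"}
end = {"get_hits": "1900", "get_misses": "300", "evictions": "12"}
rate = cache_hit_rate(start, end)
evicted = evictions_during_test(start, end)
```

Working from the difference between snapshots, rather than the raw counters, keeps traffic from before the test out of the numbers.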
Where to Test
There are a number of options to determine which environment to run load tests against: development, staging, production, or potentially some environment dedicated to load testing. In an ideal world, tests should always be run against the production environment to obtain the most valid data possible. However, site users (not to mention investors) tend to dislike the website becoming unusably slow due to a load test. While some people may be able to run tests against production, even if it means scheduling the test for three a.m. or some other low-traffic time, others won’t have that option. Our advice is to take into account what the goals and requirements are for your load testing and, based on that, run the test against an appropriate environment.
One other consideration when planning which environment to run a load test against is whether or not the test will be creating and/or deleting data on the site. In general, testing things like user comments against a production site can be very difficult to do in a way which doesn’t interfere with your users.
As a general rule, the more closely your staging environment mimics production, the more useful it will be. In the case of load testing, staging can be a very fitting place to run load tests. While your staging servers may not be as numerous or as high-powered as those in the production environment, you can still easily track down many performance issues and infrastructure bottlenecks by running tests against staging, if it is similar to your production environment.
Another option is to run tests against a development environment. This is especially valid for tests integrated with CI. While the performance numbers here will, expectedly, differ from production, it’s still a great way to test for performance changes when code changes occur.
When running tests against an environment that is not your production environment, be aware that any user-facing performance numbers should be taken with a grain of salt. That is, performance will likely be slower than your production environment, but the numbers can still be useful when comparing test results over time.
In cases where test or staging environments are under heavy use, it may not be possible to run load tests against those environments. For those situations, usually the only alternative is to have a dedicated “load testing” environment used specifically for load tests – or potentially used for other automated tests such as acceptance testing. As always, the closer this environment can mimic production, the more valid your test results will be. For those infrastructures that run mostly in the cloud, this environment might be spun up on demand when tests need to be run, but otherwise left offline.
Some sites might insist on running load tests against production in order to have “real” numbers. While this can make sense in certain situations, it’s rare that a dedicated staging environment wouldn’t be sufficient to get the data required. Generally, our recommendation would be to only run low-traffic tests against production in order to obtain user-facing performance information. If you are trying to put the site under a heavy load in order to catch performance bottlenecks, then doing so in a staging environment should yield useful results.
This article text is excerpted from High Performance Drupal published by O’Reilly Media, Inc., 2013, ISBN: 978-1-4493-9261-1. You can purchase the book from your local retailer, or directly from O’Reilly Media’s online store: http://wdog.it/3/2/hpd