Web Page Analysis Basics for Testers

Posted: 22/03/2012 by oliver_nz in Exploratory Testing, Performance Testing

I usually work in the performance testing realm, and one of the things I regularly do is check for obvious omissions in website design before I get down to the nitty-gritty of testing.

What do I mean by that?

There is such a thing as (and I am having difficulty writing this) Best Practice when it comes to web page development. These are technological imperatives that can easily be checked with simple tools. You don't need to be an HTML guru to use these tools or to learn more about the website under test.

What I'm talking about are tools like YSlow (http://developer.yahoo.com/yslow/). It is a simple tool that can be applied to any website in seconds, and it can be a treasure trove for any tester. It gets even easier than that: if your site is available online, just enter its address at http://gtmetrix.com. That gives you the YSlow and Page Speed results without having to install anything at all.

Have a look at TradeMe for example. Isn’t that cool?

TradeMe on GTmetrix

You can see a heap of information on how the page loads and what issues it has. You can even compare results from different locations around the world (you'd think the results would be the same... go surprise yourself ;-)). With a login, you can also run tests over time and see how the ratings change.

Any issues are highlighted, and you can get details on what seems to be wrong. I'd always suggest understanding the issue before raising a defect, though. There are false positives: some issues are flagged for things the page genuinely needs in order to work, or for outputs that the technology stack used on your project does not let you change.

Personally, I expect web designers and web developers to check for such easy-to-find issues before delivering code to test. Or, in a vendor scenario, I'd expect their test team to have had a look. This should be common knowledge and due process. The thing is, I have yet to come across anyone who does these tests. The consequences are blatantly obvious if you run a few tests, and it gets worse with websites for internal use only. Have a play with websites you know or have tested and see how they stack up. You will also easily pick out the sites that actually did their homework.

The main things I generally look for, because they are easy to deal with and give big wins, are:

Number of HTTP requests

I expect as few resources as possible to be loaded to display a web page.

Why is that so important? Each requested resource ties up one of the browser's connections to the site, and browsers only open a limited number of these in parallel (typically 4-6 per host). Once they are all busy, further requests have to wait, so resources are effectively loaded in sequence. This impacts the load time of the whole web page. Each request for a resource (even for a cached one, if the browser has to check whether it is still current) also involves some communication, and that overhead affects performance.
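To make the queueing effect concrete, here is a deliberately simplified model in Python. It assumes (hypothetically) that every request takes the same time and that the browser opens a fixed number of parallel connections per host; real browsers overlap and prioritise work in more complicated ways, so treat the numbers as illustrative rather than as a prediction for your site.

```python
import math


def estimated_load_time(requests: int, parallel: int, ms_per_request: float) -> float:
    """Rough page-load estimate: requests are served in 'rounds' of at most
    `parallel` at a time, and each round costs `ms_per_request`.
    Simplification: ignores bandwidth, rendering and differing request sizes."""
    if requests <= 0:
        return 0.0
    # A request that finds all connections busy waits for the next round.
    rounds = math.ceil(requests / parallel)
    return float(rounds * ms_per_request)


# Hypothetical page: 25 resources, 6 connections per host, 100 ms per request.
print(estimated_load_time(25, 6, 100))  # 5 rounds -> 500.0
print(estimated_load_time(12, 6, 100))  # 2 rounds -> 200.0
```

Even in this crude model, cutting the number of requests cuts the number of rounds, which is the effect the timeline views in the tools make visible.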

Example: if you merge two CSS files into one, the request overhead for them is halved. You are now using one connection instead of two, and your compression might be better too (more on that below).
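A minimal sketch of what merging stylesheets can look like, assuming plain CSS files that don't depend on being separate. The function and file names are hypothetical, and in a real project a build tool would usually do this step. One caveat: `@import` and `@charset` rules are only valid at the top of a stylesheet, so files that use them need extra care when concatenated.

```python
from pathlib import Path


def bundle_css(stylesheets: list[tuple[str, str]]) -> str:
    """Concatenate (name, css_text) pairs into one stylesheet, keeping their order.
    Order matters: later rules win over earlier rules of equal specificity,
    exactly as they would with separate files loaded in the same order."""
    parts = []
    for name, css in stylesheets:
        parts.append(f"/* --- {name} --- */")  # marker to trace each rule back to its source file
        parts.append(css.strip())
    return "\n".join(parts) + "\n"


def bundle_css_files(sources: list[Path], target: Path) -> None:
    """Read the source files and write a single combined file (hypothetical paths)."""
    pairs = [(p.name, p.read_text(encoding="utf-8")) for p in sources]
    target.write_text(bundle_css(pairs), encoding="utf-8")


# Usage: bundle_css_files([Path("layout.css"), Path("theme.css")], Path("site.css"))
```

The page then links to `site.css` only, so two requests become one.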

Here is the analysis of the IRD page. Go to the Timeline tab, which gives you a really good representation of how the page loads. Focus on the left-hand side of each bar. The beige (for lack of a better description) segment shows how long the request has been waiting to be processed. As the page loads, these segments get longer because the available connections are tied up. On the YSlow tab, it suggests making fewer resource requests. If you click on "Make fewer HTTP requests", you'll see that it lists 25 optimizable requests in total. With this information, you could now go back to the developers and ask for a fix.
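If you want the same numbers outside the tool, a page load can be saved as a HAR (HTTP Archive) file, for example from Chrome's developer tools. The sketch below assumes the field names of the HAR 1.2 format (`log.entries`, `response.content.mimeType`, and `timings.blocked`, where `-1` means "not applicable") and summarises how many requests there are per content type and how long they spent blocked. The function names are hypothetical.

```python
import json
from collections import Counter


def summarize_har(har: dict) -> dict:
    """Summarise a HAR log: number of requests, requests per content type,
    and total time spent 'blocked' (queued before the request could be sent)."""
    entries = har["log"]["entries"]
    by_type = Counter()
    blocked_ms = 0.0
    for entry in entries:
        mime = entry["response"]["content"].get("mimeType", "")
        # Drop parameters such as "; charset=utf-8" so types group together.
        by_type[mime.split(";")[0].strip() or "unknown"] += 1
        blocked = entry.get("timings", {}).get("blocked", -1)
        if blocked > 0:  # HAR uses -1 for "not applicable"
            blocked_ms += blocked
    return {"requests": len(entries), "by_type": dict(by_type), "blocked_ms": blocked_ms}


def summarize_har_file(path: str) -> dict:
    """Load a saved .har file (it is plain JSON) and summarise it."""
    with open(path, encoding="utf-8") as f:
        return summarize_har(json.load(f))
```

A high request count for one content type (say, a dozen separate stylesheets) is a good concrete starting point for the conversation with the developers.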

Expires Headers

As the name indicates, these HTTP headers tell the browser and caching proxies how long to cache resources. If they are set to zero or not set at all, each resource will be requested anew with each page load. With the ADSL/cable/fiber speeds that homes have nowadays, one would think that should not be an issue, but…
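When reviewing a report, it helps to check the actual response headers yourself (for example in the browser's network tab). Below is a simplified Python sketch that turns a set of response headers into a cache lifetime. It assumes `Cache-Control: max-age` takes precedence over `Expires`, as HTTP/1.1 specifies, and it deliberately ignores finer points such as `s-maxage` and heuristic caching. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime


def cache_lifetime_seconds(headers: dict[str, str]) -> int:
    """Estimate how long a response may be served from cache without asking
    the server again. 0 means it is re-requested or revalidated on every load."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    # Parse all Cache-Control directives first, since their order is irrelevant.
    directives = {}
    for part in h.get("cache-control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"')
    if "no-store" in directives or "no-cache" in directives:
        return 0
    if directives.get("max-age", "").isdigit():
        return int(directives["max-age"])

    # Fall back to the HTTP/1.0 Expires header, relative to the response Date.
    if "expires" in h and "date" in h:
        try:
            delta = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            return 0  # an invalid Expires value (e.g. "0") counts as already expired
        return max(0, int(delta.total_seconds()))
    return 0
```

Usage: `cache_lifetime_seconds({"Cache-Control": "public, max-age=86400"})` gives one day.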

It matters on the server side. Companies pay a premium for every byte transmitted out of their datacenter, not only to the ISP but also for infrastructure, rack space and other costs. If pages can be cached outside their realm, those costs can be minimised, and bandwidth and capacity are freed up for other services (see also the topics Content Delivery Networks and ETags).

Additionally, this helps save hardware and energy costs. Each request not made to a server saves the CPU time and power needed to generate the response, which means you can do more with less hardware. Certainly the savings per page are minute, but at scale they add up to a significant amount.

Adding expires headers is usually quite straightforward: often it is as easy as ticking an option in your web server's admin console or changing a line in its config file. The question is how long resources should be cached. I usually suggest 24 hours as a standard, up to a maximum of 72 hours. Different resources can, and should, be cached for different lengths of time.
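How to set these headers depends on the web server product, so rather than guess at a particular server's configuration syntax, here is a Python sketch of the headers a policy along those lines would produce. The `CACHE_HOURS` table is a hypothetical example of caching different resource types for different lengths of time; adjust it to your project.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional

# Hypothetical policy following the 24-72 hour guideline; not a universal rule.
CACHE_HOURS = {
    "text/html": 0,               # pages change often: always revalidate
    "text/css": 24,
    "application/javascript": 24,
    "image/png": 72,
    "image/jpeg": 72,
}


def expiry_headers(content_type: str, now: Optional[datetime] = None) -> dict[str, str]:
    """Return Cache-Control and Expires headers for a resource type.
    `now`, if given, must be a UTC datetime (HTTP dates are always in GMT)."""
    if now is None:
        now = datetime.now(timezone.utc)
    hours = CACHE_HOURS.get(content_type, 24)  # unknown types get the 24-hour standard
    max_age = int(timedelta(hours=hours).total_seconds())
    return {
        # Cache-Control is what HTTP/1.1 caches use; Expires covers HTTP/1.0 caches.
        "Cache-Control": f"public, max-age={max_age}" if max_age else "no-cache",
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }
```

For a stylesheet served at 10:00 GMT on 22 March 2012, this gives `Cache-Control: public, max-age=86400` and an `Expires` date one day later. Setting both headers to agree avoids ambiguity for older proxies.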

The thing to note is that the danger comes when the website changes: in the most extreme case, a customer or user could see old content for a while. There are clever ways around this, though. Have a Google and you'll find lots on the topic.
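One common technique (often called cache busting or fingerprinting, and not necessarily what every project uses) is to put a hash of a file's content into its name. The page then references `site.<hash>.css` instead of `site.css`, so a changed file gets a new URL and can safely be cached for a long time. A minimal sketch with a hypothetical function name; note that the HTML page referencing these files must itself have a short cache lifetime for this to work.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Embed a short content hash in a file name: css/site.css -> css/site.<hash>.css.
    Same content always gives the same name; changed content gives a new name,
    so browsers fetch it even if the old file is still cached."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

The build or deployment step would rename the file and rewrite the reference in the HTML accordingly.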

Compress resources

Compression aims at similar benefits to the expires headers above. Browser and server can agree to use gzip compression when they communicate. The web world is quite text-heavy, and text compresses easily. Web pages can therefore shrink communication 30 to 70%. As above this has a positive effect on bandwidth and infrastructure requirements.
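You can get a feel for how well text compresses with Python's standard `gzip` module. The sample markup below is made up and very repetitive, so it will compress far better than the 30 to 70% quoted above for real pages; the point is the mechanism, not the number. Level 6 is zlib's default; higher levels save a little more at the cost of more CPU.

```python
import gzip


def gzip_savings(data: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by gzip-compressing `data`."""
    compressed = gzip.compress(data, compresslevel=level)
    return 1 - len(compressed) / len(data)


# Made-up, highly repetitive markup; real pages compress less well than this.
page = (
    b"<ul>"
    + b"".join(b'<li class="item"><a href="/product">Product</a></li>' for _ in range(200))
    + b"</ul>"
)
print(f"saved {gzip_savings(page):.0%}")
```

To see what a real site does, check that its text responses come back with a `Content-Encoding: gzip` header.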

Keep-Alive Enabled

Whenever the browser opens a new connection to load a resource, it has to go through a heap of communication just to establish a link over the network (the TCP handshake, plus the SSL/TLS handshake for HTTPS) before the request reaches the server. This is quite a significant overhead (hence also the push to reduce resources; see above). To avoid re-establishing connections to the same server, Keep-Alive is used: a connection is set up once and re-used for multiple requests. As you can imagine, this saves a lot of communication and makes things much quicker.
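Connection re-use can be observed with nothing but Python's standard library. This self-contained sketch starts a throwaway local HTTP/1.1 server on a random port and counts how many distinct TCP connections (identified by the client's source port) three requests need, with and without re-using the connection. The function name is hypothetical, and a local server has none of the network latency that makes Keep-Alive matter in practice; it only shows the mechanism.

```python
import http.client
import http.server
import threading


def count_connections(reuse: bool, requests: int = 3) -> int:
    """Fetch `requests` resources from a local server and return how many
    distinct TCP connections the server saw."""
    client_ports = []

    class Handler(http.server.BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # HTTP/1.1 connections are persistent by default

        def do_GET(self):
            client_ports.append(self.client_address[1])  # source port identifies the connection
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))  # lets the client find the end of the body
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the console quiet

    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    try:
        conn = http.client.HTTPConnection(host, port)
        for i in range(requests):
            if not reuse and i > 0:
                conn.close()  # simulate "no Keep-Alive": a fresh connection per request
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", f"/resource{i}")
            conn.getresponse().read()  # the body must be read before the connection can be reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(set(client_ports))


print(count_connections(reuse=True), count_connections(reuse=False))
```

With re-use, one connection serves all three requests; without it, each request pays the connection set-up again.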

Optimizing JavaScript and CSS

When the browser interprets a web page it can get into situations that are similar to those in databases. Certain things clash and can only proceed sequentially. In the database world that is called locking. For HTTP we’ll call this blocking.

There are two simple rules to get around this as much as possible.

  1. Put all CSS at the top
  2. Put JavaScript at the bottom

You can see the effect clearly in the timeline of the page load. If you look at the timeline of the Metservice start page, you'll see .js files loading about 10 lines down. A bit further down, each loaded item is preceded by a beige-ish segment labelled Blocked. This is the effect of the JavaScript: the items are probably waiting for it to finish processing.

CSS files define how a page should be rendered. If these instructions come further down the HTML page, the browser will have to re-render the page several times, which increases the total rendering time. So put CSS files as far up the HTML as possible. Modern browsers seem to be getting smarter about this, but it's still better to fix the issue server-side.
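Both rules are easy to check mechanically. Here is a rough Python sketch using the standard library's `html.parser`; it assumes the page has explicit `<head>` and `<body>` tags, only looks at external stylesheets and scripts, and treats scripts marked `async` or `defer` as non-blocking. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PlacementChecker(HTMLParser):
    """Flags external stylesheets in <body> and blocking external scripts in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_body = False
        self.warnings: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # boolean attributes such as `defer` map to None
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head, self.in_body = False, True
        elif tag == "link" and "stylesheet" in (a.get("rel") or "").lower().split():
            if self.in_body:
                self.warnings.append(f"CSS in body: {a.get('href')}")
        elif tag == "script" and "src" in a and self.in_head:
            if "async" not in a and "defer" not in a:
                self.warnings.append(f"Blocking script in head: {a['src']}")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def check_placement(html: str) -> list[str]:
    """Return a list of placement warnings for the given HTML source."""
    checker = PlacementChecker()
    checker.feed(html)
    checker.close()
    return checker.warnings
```

Feed it the HTML source of the page under test (for example a saved copy) and raise anything it flags with the developers, after checking it is not a false positive.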

Remove Redirects

The Metservice page also shows a possible issue with redirects. The first 3 steps of the page load return HTTP 301 response codes (301 is a permanent redirect; most HTTP 3xx codes are redirects). These redirects bounce the browser from one URL to the next. There are good reasons for redirects, but as you can see, in this case they cost over a second before the actual page loads. As a tester, I'd always question whether these redirects are really needed or could be replaced by a simpler method.
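Redirect chains are also easy to spot in a saved HAR file. This sketch assumes the HAR 1.2 fields `response.status`, `response.redirectURL` and the per-entry `time` (total milliseconds); load the file with `json.load` and pass in the resulting dictionary. The function name is hypothetical.

```python
def redirect_overhead(har: dict) -> tuple[list[str], float]:
    """List the redirect hops in a HAR log and the total time they took (ms).
    Only genuine redirects are counted: 304 Not Modified is a cache answer, not a redirect."""
    redirect_codes = {301, 302, 303, 307, 308}
    hops, total_ms = [], 0.0
    for entry in har["log"]["entries"]:
        status = entry["response"]["status"]
        if status in redirect_codes:
            target = entry["response"].get("redirectURL", "")
            hops.append(f'{status} {entry["request"]["url"]} -> {target}')
            total_ms += entry.get("time", 0)
    return hops, total_ms
```

The hop list makes a handy attachment for a defect: it shows exactly which URLs bounce where and what that costs.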

Finally

There are a good two dozen other things highlighted in these reports, and it's worth looking at them too. Once you have some practice with the reports, they deliver lots of low-hanging fruit that can very cheaply increase the speed of your website under test. And if there is one thing all experts agree on, it is that faster is better for business.

An alternative to GTmetrix is http://www.webpagetest.org, and I'm sure there are a few more out there.

Also of interest is http://www.httparchive.org. They run similar tests on the most common sites on the internet, which can give you an impression of where your site ranks and what the most common issues are. You can even add your site so that it gets monitored for you. Just beware that this will generate traffic to your site that you might not want.

Author: Oliver Erlewein

Comments
  1. oliver_nz says:

    Just found another exciting online tool for viewing HAR logs: http://www.softwareishard.com/har/viewer/
    This is useful if you save page results from YSlow or Chrome while doing the above.

  2. Hi Oliver!
    Excellent post.
    I have one remark on compressing the resources. You wrote:
    “Web pages can therefore shrink communication 30 to 70%. As above this has a positive effect on bandwidth and infrastructure requirements.”
    gzip has a negative impact on infrastructure requirements (this is the trade-off) because gzip is "hungry" for CPU time.

    –Karlo Smid

  3. oliver_nz says:

    @Karlo: Yes, I am aware of gzip using more CPU. When I wrote the post, I was wondering whether to include it or not. This used to be a problem, but on today's systems it's hardly noticeable (on both sides). I think the advantages here outweigh the (CPU) cost, as bandwidth is still the main bottleneck and cost factor. Mentioning it could scare people away from the practice, and I wouldn't want that.

    That said, SSL and gzip overhead is often offloaded to the load balancer (server side). These boxes (or software) are built to do such jobs very efficiently and are a good alternative.
