A blog by Ryan Breen of CloudFloor
Over at Surfin’ Safari, Antti Koivisto explains the preloading features in the latest WebKit nightlies. Antti begins by documenting the dominance of latency in determining total page load time, focusing on the slowdown caused by the blocking behavior of modern browsers while handling external scripts. As we’ve discussed here in the past, this has the effect of serializing object loads, resulting in a total page load time that grows linearly with network latency.
The new preloading feature available in WebKit nightlies attempts to maintain network parallelization even while the parser is blocked waiting for an external script to load. To achieve this, a separate parser is created to move through the remainder of the page, queuing up any additional objects to load. Scripts and stylesheets are also moved to the head of the queue of pending objects.
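WebKit’s implementation lives in C++ inside the engine, but the queuing behavior Antti describes can be sketched in a few lines of JavaScript. The class and method names here are illustrative, not WebKit’s:

```javascript
// Illustrative sketch of a preload queue: while the main parser is blocked
// on a script, a lightweight scanner keeps discovering resources and queues
// them, with scripts and stylesheets jumping ahead of images.
class PreloadQueue {
  constructor() {
    this.queue = [];
  }
  enqueue(url, type) {
    const resource = { url, type };
    if (type === 'script' || type === 'stylesheet') {
      // High-priority resources are inserted ahead of any queued images.
      const firstImage = this.queue.findIndex(r => r.type === 'image');
      if (firstImage === -1) this.queue.push(resource);
      else this.queue.splice(firstImage, 0, resource);
    } else {
      this.queue.push(resource);
    }
  }
  next() {
    return this.queue.shift();
  }
}
```

Even if an image appears earlier in the document, a later script or stylesheet is fetched first, since those resources can block rendering and parsing.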
The net result for end users is a faster page load.
It should be noted that IE8 promises a similar improvement to script load parallelization, as discussed by Steve Souders a few weeks back. I would guess that the underlying implementation is similar to that used by the WebKit team.
Last September I posted about a CSS sprite generator designed to reduce the tedium of a popular HTTP optimization. The developers responsible, Edward Eliot and Stuart Colville, graciously released the generator under a BSD license earlier this month.
One of the first articles I wrote about CSS sprites covered the built-in support in GWT, and I focused on their clever trick of including an MD5 checksum of the sprite map contents into the filename. This allows you to set effectively infinite cache headers since the name will change if the underlying image is modified.
From Ajaxian, a nice presentation by Cyra Richardson of the IE team at MIX 2007 last year. The presentation covers both network and client side performance, and it should come as no surprise that the client side suggestions are more focused on IE.
Here’s hoping that some of these discussions of IE inefficiencies are finding their way into the design of IE 8.
Last week, IBM developerWorks posted an article suggesting the use of client cookies to communicate some information about the freshness of the data requested so the server can optionally return a 304 if no new data should be sent. There may be some edge cases where this approach is appropriate, but it’s probably not justifiable for the majority of apps.
If the response expected from the XHR is smaller than 10 KB, the cost of retrieving the 304 is indistinguishable from loading the entire object, at least as far as network latency is concerned. The headers exchanged in that round trip also consume a meaningful fraction of such a small object’s size, and you still bear the server cost of handling the inbound connection and request.
For cases where you are requesting large chunks of data frequently, this approach makes more sense, but it still feels like it’s just obscuring a problem with the application design. If you find yourself needing something like this, the easiest strategy to lessen the bandwidth impact on your servers is to tune down the client side polling interval. You could perhaps do this dynamically based on how recently you have received fresh data.
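One way to implement that dynamic tuning is simple exponential backoff. A JavaScript sketch, where the function name and the doubling policy are my own illustration:

```javascript
// Adaptive polling interval: reset to the minimum when fresh data
// arrives, otherwise double the wait up to a ceiling.
function nextPollInterval(currentMs, gotFreshData, minMs = 1000, maxMs = 60000) {
  if (gotFreshData) return minMs;
  return Math.min(currentMs * 2, maxMs);
}
```

A client that has been idle for a while drifts toward the 60-second ceiling, cutting server load, while an active session snaps back to fast polling as soon as new data shows up.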
When I talk about Ajax performance optimization, I typically describe it as a three-stage process: reduce sources of network latency as far as possible, reduce sources of client-side latency as far as possible, and, lastly, hide whatever is left. That last step is based on the assumption that, at a certain point, you will always hit a wall in traditional performance optimization: there is some irreducible amount of content to transfer and code to run client side. The techniques for hiding whatever is left play in the area of ‘perceived performance,’ juggling the order of events so that the user sees an interactive and available application as quickly as possible.
An experimental new library from the YUI team called ImageLoader (as seen on Ajaxian) provides a perceived performance boost by allowing the developer to defer loading of images that are not visible on initial page load until they are necessary to complete the user experience. These images could either be outside of the viewport (“below the fold” as the cool kids say) or inside of hidden widgets. By deferring these objects until later, we restrict the number of network events, and thus the latency, before the user thinks the page is loaded and interactive.
The devil is in the details, of course, so the trick to making this work is ensuring that the images are loaded without blocking the initial page load but before they are actually needed. ImageLoader provides two techniques for loading these deferred images: triggering the load off of some event or loading after some fixed time interval. I’m not sure I love either approach as they both seem to err more on the side of giving the user a latency event later in the page to avoid slowing the initial load. I think I would rather start loading the content, at least in the above the fold case, as soon as all other networking operations had completed. This would be something more like a priority queue, or a “Mega Defer,” though that requires routing all network operations through some JS layer.
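A bare-bones version of the deferral idea, sketched in plain JavaScript rather than YUI’s actual API:

```javascript
// Register images with placeholder sources and swap in the real URLs
// only after some trigger fires (a window load handler, a scroll or
// mouseover event, or a timer, depending on the strategy chosen).
class DeferredImageLoader {
  constructor() {
    this.pending = [];
  }
  register(img, realSrc) {
    this.pending.push({ img, realSrc });
  }
  // Wire this to the chosen trigger, e.g.
  // window.addEventListener('load', () => loader.flush());
  flush() {
    for (const { img, realSrc } of this.pending) {
      img.src = realSrc; // the browser starts the download here
    }
    const count = this.pending.length;
    this.pending = [];
    return count;
  }
}
```

The “Mega Defer” variant I describe above would amount to calling `flush()` from whatever code observes that all other network operations on the page have completed, rather than from a fixed event or timer.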
Regardless, it’s exciting to see the YUI team focusing on codifying these advanced performance techniques into libraries. Having well documented solutions opens fancy optimizations to a much broader range of developers.
Good thing I’m sitting on a train to New York, enjoying WWAN connectivity, because Ajaxian seems to think it’s sprite day, posting links to no fewer than 3 articles on the optimization technique.
First, the self-described Big Dumb Developer analyzed the performance of the new .Mac Gallery, found it lacking, and suggested some improvements: in this case, image concatenation à la CSS sprites. However, I’m not totally sold that spriting is the appropriate optimization for this scenario. The numbers he provides for his technique, while gaudy, appear a bit soft: as far as I can tell, the timing mechanism only covers the client-side performance of swapping in different images. Also, as one commenter on the blog rightly points out, this approach does not fill the cache with the individual images, so it’s quite possible you’ll need to redownload them in other parts of the application.
Finally, CSS sprites are more appropriate in cases where the size of the images being downloaded is very small; in that case, the latency of sending a request significantly outweighs the download time for each object, so we are looking to hide latency by combining objects into one (still small) image. In the .Mac case, each image is fairly large, so combining them into one even larger image isn’t a huge win, and we would need to wait for the whole thing to load before displaying anything.
A more appropriate optimization in this case, in my opinion, is to exploit connection parallelism using the CNAME hack. Given that there are hundreds of images, using 6 (IE max) or 8 (Firefox/Safari max) persistent connections in parallel is much better than the 2 allowed by default. With this approach, we are cutting latency by a factor of 3 or 4, the images will live happily in the cache (as long as we are careful to use the same hostname for an image every time it is called), and we don’t have to wait for all content to load before being able to display something.
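A back-of-the-envelope model makes the factor-of-3-or-4 claim concrete. This treats each image as costing one round trip on its connection and ignores bandwidth and server time entirely, so it is a simplification, not a measurement:

```javascript
// With n parallel persistent connections and roughly one round trip per
// image, total latency is about ceil(images / connections) round trips.
function estimatedLoadTimeMs(imageCount, connections, rttMs) {
  return Math.ceil(imageCount / connections) * rttMs;
}
```

For 120 images at a 100 ms round trip, two connections cost roughly 6000 ms while eight connections cost roughly 1500 ms, the factor-of-4 improvement described above.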
The next article is from Glen Lipka, another of the great people I had the pleasure to meet at TAE last month, who hopes to improve the manageability and reliability of CSS sprites. After wrestling with the verbose CSS syntax necessary to make this approach work in current implementations, Glen provides several alternative approaches, with examples. His approaches also have the benefit of degrading gracefully in the absence of JS support.
Finally, sprites get a brief mention over at snook.ca, with a few links to articles explaining the approach.
Look at that iTunesy widget for displaying the contribution of different content types to page weight. So sexy. The network details pane also provides access to all request and response headers.
This is a great way to catch up with the functionality provided by Firebug since 1.0 (with a much nicer UI, I would argue). Coupled with the availability of Safari on Windows, Apple is making a clear play to remove any excuses a developer may have not to support Safari. Eagle-eyed readers may have noticed that the above screenshot, furnished by Apple, is of Web Inspector running on Windows.
The RC for GWT 1.4 provides native support for image concatenation in a feature called ImageBundle. If you’ll recall, I discussed image concatenation in a round up last month, and it’s a key embodiment of our “First Principle of Performance Optimization”: reduce network requests as much as possible.
Given that this is Google, they had to take things to that next level. And by that, I of course mean that their implementation has a few twists and refinements so brilliant and elegant that they seem obvious in retrospect and make the rest of us feel unworthy. Because we are.
The aspect of ImageBundle that jumps out at me is the infinite cacheability of the images: the file name includes an MD5 of the image contents, so there is no need for even periodic HTTP round trips for if-modified-since or if-none-match checks. You can set the expiration date to the expected heat death of the universe, or beyond. Thus, network requests are only made when they are absolutely necessary. Only one network round trip is made at that time. Everything is as simple as possible, but not one bit simpler.
And to those naysayers who may suggest a pathological case where minor modifications to one or two contained images force frequent redownloads of the larger, concatenated image: it just doesn’t matter. As this post at edgeblog explores, it’s still the latency, stupid. The real cost of object downloads is the round tripping. Establishing a connection, sending a request, and waiting for the first byte dominate network times. Once the connection is established and the reply is streaming, each additional byte costs only a small fraction of that setup time. And as bandwidth increases while latency stays relatively flat, the dominance of latency will only grow.
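A simple cost model shows why. Assume each fetch pays one round trip plus transfer time; combining n small images into one sprite pays the round trip once. The numbers below are illustrative, not measurements:

```javascript
// Cost of one fetch: a round trip plus transfer time for the payload.
function fetchTimeMs(bytes, rttMs, bytesPerMs) {
  return rttMs + bytes / bytesPerMs;
}

// Compare fetching each image serially with fetching one combined image.
function separateVsCombined(sizes, rttMs, bytesPerMs) {
  const separate = sizes.reduce((t, s) => t + fetchTimeMs(s, rttMs, bytesPerMs), 0);
  const totalBytes = sizes.reduce((a, b) => a + b, 0);
  const combined = fetchTimeMs(totalBytes, rttMs, bytesPerMs);
  return { separate, combined };
}
```

Ten 2 KB icons at a 100 ms round trip and 100 bytes/ms cost 1200 ms fetched serially, while the 20 KB combined image costs 300 ms; faster links only widen the gap, since bandwidth shrinks the transfer term but not the round trips.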
The first trick is stylesheet bundling, which collapses all of an application’s stylesheets into a single download: <%= stylesheet_link_tag :all, :cache => true %>
Speaking of connections, the second performance trick in Rails 2.0 implements the host renaming technique I discussed in detail here. The config syntax looks like this:
config.action_controller.asset_host = 'assets%d.highrisehq.com'
I have not seen a more detailed transcript of the presentation, but I hope the asset mapping is smart enough to remember what objects were mapped to what connections. That could lead to suboptimal balancing of connections used on a given page, but the alternative (potentially remapping objects to different hosts on each page) would blow caching all to hell.
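Stable mapping is straightforward to do yourself: hash the asset path and let the hash pick the host, so a given object always lands on the same hostname regardless of which page requests it. A JavaScript sketch, where the hash function and host pattern are my own illustration rather than what Rails does:

```javascript
// Deterministically map an asset path to one of n hostnames, so the same
// object is always requested from the same host and stays cacheable
// across pages.
function assetHost(path, hostCount = 4) {
  let hash = 0;
  for (let i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0; // simple unsigned string hash
  }
  return 'assets' + (hash % hostCount) + '.example.com';
}
```

Because the mapping depends only on the path, `/images/logo.png` resolves to the same hostname on every page, which is exactly the property needed to keep the browser cache warm.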
Here at Ajax Performance, I spend a lot of time (here and here, for example) discussing the network-side implications of design decisions for the simple reason that there’s no better way to improve the perceived performance of an application than to do more with less bandwidth. Sure, increasing connection parallelism is a neat trick to hide some of that download pain, but fewer objects is always the best answer.
I’ve seen a few good articles in the past month that give me yet another excuse to beat this drum. First, Steve Souders, Chief Performance Yahoo! at Yahoo!, goes so far as to make it Rule 1 in his upcoming O’Reilly title, High Performance Web Sites. For reducing the number of image requests, particularly for small images used as icons, Steve recommends combining all the images into a map and using CSS to display different slices of the combined image in place of discrete downloads for each icon. This technique is explored in depth by Matthew Batchelder in this excellent how-to article. Great stuff.
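The CSS mechanics are simple: every icon shares one background image, and a background-position offset selects its slice. A tiny helper for computing the offset in a horizontal strip of uniform icons (a sketch assuming fixed icon widths, not code from Steve’s book):

```javascript
// For a horizontal sprite strip of uniform icons, icon i begins at
// x = -(i * iconWidth); the negative offset shifts the strip left so
// the desired slice shows through the element's viewport.
function spritePosition(index, iconWidth) {
  return (-index * iconWidth) + 'px 0';
}
```

In a stylesheet this becomes rules like `.icon-mail { background: url(icons.png); background-position: -48px 0; }`, one rule per icon, all sharing a single downloaded image.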