The WebKit team makes the case for preloading

March 24, 2008 on 7:13 am | In ajax, http | No Comments

Over at Surfin’ Safari, Antti Koivisto explains the preloading features in the latest WebKit nightlies. Antti begins by documenting the dominance of latency in determining total page load time, focusing on the slowdown caused by the blocking behavior of modern browsers while handling external scripts. As we’ve discussed here in the past, this has the effect of serializing object loads, so total page load time increases linearly with network latency.

The new preloading feature available in WebKit nightlies attempts to maintain network parallelization even while the parser is blocked waiting for an external script to load. To achieve this, a separate parser is created to move through the remainder of the page, queuing up any additional objects to load. Scripts and stylesheets are also moved to the head of the queue of pending objects.
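To make the idea concrete, here is a rough sketch of a speculative scan over the unparsed remainder of a page. It is not WebKit's implementation: the function name `scanForPreloads` and the regex-based tag matching are assumptions made for illustration, and a real scanner would use a proper HTML tokenizer. It only shows the two behaviors described above: collecting the remaining external objects, and moving scripts and stylesheets to the head of the queue.

```javascript
// Hypothetical sketch of a speculative preload scan; NOT WebKit's actual code.
// It finds external resources in the not-yet-parsed part of a page and
// returns them in the order they should be requested.
function scanForPreloads(remainingHtml) {
  const found = [];
  // Simplified tag matching (assumption: well-formed, quoted attributes).
  const tagPattern = /<(script|img|link)\b([^>]*)>/gi;
  let match;
  while ((match = tagPattern.exec(remainingHtml)) !== null) {
    const tag = match[1].toLowerCase();
    const attrs = match[2];
    const attrName = tag === 'link' ? 'href' : 'src';
    const urlMatch = new RegExp(attrName + '\\s*=\\s*["\']([^"\']+)["\']', 'i').exec(attrs);
    if (!urlMatch) continue; // inline script, or a tag with no URL
    if (tag === 'link' && !/rel\s*=\s*["']?stylesheet/i.test(attrs)) continue;
    const type = tag === 'script' ? 'script' : tag === 'link' ? 'stylesheet' : 'image';
    found.push({ type, url: urlMatch[1] });
  }
  // Scripts and stylesheets go to the head of the queue;
  // document order is preserved within each group.
  const isBlocking = (r) => r.type === 'script' || r.type === 'stylesheet';
  return found.filter(isBlocking).concat(found.filter((r) => !isBlocking(r)));
}
```

Calling `scanForPreloads` on the markup that follows a blocking script yields a list of URLs that could be fetched while the main parser is still waiting.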

The net result for end users is a faster page load:

It should be noted that IE8 promises a similar improvement to script load parallelization, as discussed by Steve Souders a few weeks back. I would guess that the underlying implementation is similar to that used by the WebKit team.

Testing IE8’s Connection Parallelism

March 16, 2008 on 7:51 pm | In ajax | 9 Comments

A few weeks ago, I discussed IE8’s improved connection parallelism, specifically the increase from 2 concurrent connections per host to 6. One open question was the total number of connections allowed: my speculation was that the IE team would stick with a max of 6 rather than triple that value as well.

I was wrong. The new max is an astonishing 18 (!) concurrent connections:

That is some serious parallelism, and it has significant implications for application performance.

In December of 2006, I discussed the CNAME trick for circumventing browser connection limits, using 3 hostnames to serve images to trick the browser into using all available connections. At the time, that was 6 for IE. The above capture from IBM Page Detailer confirms 18 concurrent connections in IE8.
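For readers who have not seen the CNAME trick, here is a minimal sketch of the client-facing half of it. The hostnames are hypothetical (each would be a CNAME for the same server), and the hashing scheme is my own choice for illustration. The one design point worth keeping is that a given image should always map to the same hostname; otherwise the browser cache sees the same file under several URLs.

```javascript
// Hypothetical hostnames; each would be a CNAME pointing at the same server.
const IMAGE_HOSTS = ['img1.example.com', 'img2.example.com', 'img3.example.com'];

// Pick a host deterministically from the path so the same image is
// always requested from the same hostname (keeps browser caching effective).
function hostForPath(path, hosts) {
  let hash = 0;
  for (let i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return hosts[hash % hosts.length];
}

// Build the full image URL on one of the sharded hostnames.
function imageUrl(path, hosts = IMAGE_HOSTS) {
  return 'http://' + hostForPath(path, hosts) + path;
}
```

With three hostnames, a browser limited to 2 connections per host can open 6 in total, and one limited to 6 per host can open 18.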

As expected, IE8’s handling of the unoptimized version, where only one hostname is used, is comparable to the performance of the optimized page in previous IE versions:

As an aside, the out of the box optimization provided by IE8 is actually slightly faster than the CNAME trick applied to previous IE versions as it does not incur any hostname resolution cost when establishing the first connections. Both examples would use 6 total concurrent connections, and IE8 should be equal to or faster than optimized connection management in previous versions.

But what about IE8 against a page optimized for connection parallelism? If 6 concurrent connections is good, 18 should be terrific, right? Not so fast. While the Page Detailer captures above show some improvement in the 18 connection version, point in time metrics can only tell us so much. What we need is a tool that can collect a statistically significant sample of performance data using both 6 and 18 connections to see if any trends shake out.

For this analysis, I used a hosted performance testing solution from Gomez, my employer. This is the same tool used in my original connection parallelism article. I ran my tests in IE8 compatibility mode, mirroring the new connection levels. As before, one test is against the default (1 host) page, and one test uses the CNAME trick (3 hosts) for greater connection parallelism. The results surprised me:

This aggregate data is made up of hundreds of tests taken from 7 locations in the US over the last 14 hours. The same locations were used for both tests. The “IE8 Parallelized” test, which uses 18 connections, has a much higher standard deviation and a higher average test time than the 6 connection “IE8 Default” test. What gives?

The answer appears to be sporadic connection hangs. The median response time for the parallelized page is lower than the default page, but a higher incidence of outliers skews the mean and leads to the increased variability. Looking at the outliers, I typically see a section of the page load that looks like this:

Here we see 2 object downloads taking more than 8 seconds to complete. The average response time for an entire page is around half of a second, so this is a huge outlier. I see these outliers on between 5 and 10% of the test runs for the 18 connection page, but I have never seen any comparably high outliers for the 6 connection version.

Below is a revised version of the test averages taken by removing outliers:

Note that the parallelized version is now consistently faster than the default. As expected, the outliers are responsible for the counterintuitive poor performance of the parallelized page.
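The effect of a few hangs on an average is easy to reproduce. The sketch below uses made-up sample values (nine typical loads around half a second plus one 8.5 second hang), and the outlier cutoff, a multiple of the median, is an assumption of mine; it is not necessarily the rule used to produce the revised averages above.

```javascript
// Arithmetic mean of a list of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Median: middle value of the sorted list (average of the two middle values if even).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Drop samples above factor * median. The cutoff rule is an assumption
// for illustration only.
function withoutOutliers(values, factor = 5) {
  const cutoff = median(values) * factor;
  return values.filter((v) => v <= cutoff);
}

// Made-up response times in seconds: nine typical loads and one connection hang.
const samples = [0.45, 0.5, 0.48, 0.52, 0.47, 0.5, 0.49, 0.51, 0.46, 8.5];
```

Here the single hang pulls `mean(samples)` above one second while `median(samples)` stays near half a second, and `mean(withoutOutliers(samples))` falls back in line with the median.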

I suspect that my hosting provider (Dreamhost) simply can’t keep up with the dramatic increase in connection parallelism. 18 connections is simply too much of a good thing, and it will present a scaling problem for those who are on small to medium hosts. 10 users hitting at the same time will yield 180 concurrent connections, a pretty significant number for smaller providers.
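The arithmetic behind that 180 can be written down as a small helper. The function name and parameters are mine; the limits plugged in below are the ones discussed in this post (2 per host and 6 total for earlier IE versions, 6 per host and 18 total for IE8).

```javascript
// Upper bound on simultaneous connections a server could see,
// assuming every user's browser opens as many connections as it is allowed.
function peakConnections(users, hostnames, perHostLimit, totalLimit) {
  const perUser = Math.min(hostnames * perHostLimit, totalLimit);
  return users * perUser;
}

// Earlier IE, 3 hostnames: 10 users * min(3 * 2, 6)  = 60
// IE8, 1 hostname:         10 users * min(1 * 6, 18) = 60
// IE8, 3 hostnames:        10 users * min(3 * 6, 18) = 180
```

In other words, a page already optimized with three hostnames sees its worst-case connection count triple when users move to IE8, while an unoptimized page lands where the optimized one used to be.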

[Note: This objection was anticipated and handled by the IE team. See below.] Dial-up and cellular network users are also likely to be negatively impacted by this change. In the high broadband world where latency is the dominant factor, greater connection parallelism is a boon. But in bandwidth constrained networks, it just leads to thrash where progress is slowed by all the connections trying to share a small pipe.

I’m curious what sort of testing Microsoft has conducted to determine the impact of this change. The connection parallelism approach is used widely (including by the Virtual Earth team), and some servers may not be ready for the increase. My tests were conducted against only one host, but if similar results are experienced elsewhere, this may fall under the rubric of “don’t break the web.”

My advice to anyone who is using the connection parallelism trick is to perform a similar analysis of your application before IE8 is released. The new connection levels will create greater strain on your servers, and that may lead to occasional performance hiccups for your users. There are a few different approaches you can take to dealing with this change, but the most important first step is to understand the extent to which your application is impacted.

Update: Kris Zyp and Steve Souders have pointed out that IE8 will use 2 connections per host for dial-up users. This nicely addresses that concern, but the concern about 18 connections for pages using the CNAME approach still stands.

Google Code performance improvements: the Souders factor

March 16, 2008 on 1:05 am | In ajax | No Comments

Steve Souders is now at Google, and the Google Code team has taken some of the advice from High Performance Web Sites and applied it to reduce user-perceived latency. There is no magic in their performance improvements — the techniques (JS/CSS concatenation, CSS sprites, and lazy loading) have been discussed here and elsewhere in the past — but the user-centricity of the approach is what I find most cheering.

The explosion of web performance optimization tools and techniques would be meaningless if we were not focused on improving user experience, and the Google Code team clearly understands this message. The last approach they discuss, lazy loading, is a nice illustration. Rather than initializing the Google loader module in the traditional blocking manner (<script src="blah.js"></script>), the team used the non-blocking DOM scripting approach (document.createElement('script'), set src, append to head). A callback on completion of this operation loads the required APIs.
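The DOM scripting approach looks roughly like the sketch below. This is not the Google Code team's code; the function name `loadScriptAsync` and the script URL in the usage note are placeholders. The document object is passed in as a parameter so the function can be exercised outside a browser, and the `onreadystatechange` branch is there because older IE versions report completion through `readyState` rather than `onload`.

```javascript
// Sketch of non-blocking script loading via DOM scripting.
// 'doc' is normally the browser's document object.
function loadScriptAsync(doc, src, onLoad) {
  const script = doc.createElement('script');
  let done = false;
  const finish = () => {
    if (done) return; // make sure the callback fires only once
    done = true;
    if (onLoad) onLoad();
  };
  script.onload = finish;
  // Older IE versions signal completion through readyState instead of onload.
  script.onreadystatechange = () => {
    if (script.readyState === 'loaded' || script.readyState === 'complete') finish();
  };
  script.src = src;
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}
```

In a page this would be called as `loadScriptAsync(document, 'loader.js', function () { /* load the required APIs here */ })`, where `loader.js` is a placeholder URL. The parser keeps going while the script downloads, so the visible parts of the page are not held up.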

This approach prioritizes the load time of critical user-visible page elements. To understand the effectiveness of this optimization, you need to measure the time at which the user would perceive the page to be loaded, since total page download time may overstate the actual latency. Using experience-centric measurements, the Google Code team saw improvements between 30% and 70%, depending on the page.

IE8: The Performance Implications

March 7, 2008 on 1:25 am | In ajax | No Comments

Mix08 is here, and with it the first beta of IE8. John has a great roundup of the JS/DOM work, noting that “Internet Explorer 8 is our release.” He’s right.

I’ll run through a few of the items that have particular implications for performance.

  • This one is the most exciting for me: the IE team has finally upped the connection limit to 6 per host from the default of 2. I’ve talked before about DNS tricks to get around the 2 connection limitation, but having this support out of the box will be a great assistance in the war on round-trip latency as it’s easier to make more expensive network calls in parallel. This is especially sweet for Comet and the like where the persistent connection could previously monopolize half of the connections to your site. As you would expect, Joe Walker of DWR is happy.

    One thing I haven’t seen mentioned anywhere is the total connection limit. Previous versions supported 2 per host and 6 total. Is the new version 6 per host / 6 total or 6 per host / 18 total? I really doubt it’s the latter, but if no one has the answer I’ll grab the beta this weekend and test it out.

  • W3C Selectors API: Last month I discussed the work Firefox and WebKit have done to implement the new Selectors API spec, and it’s nice to see Microsoft joining the list. I share John’s concern that these black boxes have a significant potential (make that inevitability) for browser bugs, so smoothing over these will, as always, remain the job of libraries. But it’s nice to have that blazing speed under the covers.
  • DOM Storage and offline events are techniques still on the fringes of relevance. DOM Storage in Firefox 2, as well as Google Gears and its less nerdly cousin Dojo Offline, have a lot of promise, but to this point they’ve lacked a killer app due in no small part to the chicken and egg problem. Having Microsoft on board finally offering these HTML 5 features may help push us to widespread adoption.
  • I’ve dinged Microsoft for the lack of a Firebug-like tool since, well, I first used Firebug, and they finally have a clone. A clone in serious need of a makeover. Yeah, I’m shallow. For those keeping score at home, the sexiness hierarchy goes WebKit Inspector > Firebug > IEBug (or whatever it’s eventually called).
  • For the truly performance obsessed, there is a collection of optimizations to common low level functionality, such as string concatenation and array manipulation.

All in all, some really cool stuff in this beta. If you want to give it a try without downloading, it’s already up on BrowserCam. Just like this:

Include, a new JS compression wrapper

February 27, 2008 on 10:45 am | In ajax | No Comments

Earlier this week, I talked about a tool which removes much of the tedium from generating CSS sprite maps. In a similar vein, Brian Moschel of the JavaScriptMVC Project pointed our good friends at Ajaxian to Include, a wrapper around Dean Edwards’ excellent JS compression tool, Packer.

Include is itself a fairly small chunk of JS designed to run in the browser, for developers and production users alike. This approach has some nice advantages: there’s no need for server side compression scripts, and it’s easy to create different compressed files to match the library requirements of different parts of your application. Expanding on that last point, you can select at browser load time which library to use within a specific page, giving you runtime flexibility.

The one thing I don’t like is that Include is packaged as a separate .js file. As I’ve discussed here many times, performance in modern broadband networks is dominated by latency. The round trip time to request the initial include.js, which is only 3kB, will offset some of the gains from compressing and concatenating library files. In most use cases, the best performance approach will be to use include.js to compress your libraries only during development time, replacing all include.js references in production with a single compressed library call per page.

Yay! The CSS Sprite Generator is Open Source. Let’s play!

February 23, 2008 on 7:06 pm | In ajax, http | 3 Comments

Last September I posted about a CSS sprite generator designed to reduce the tedium of a popular HTTP optimization. The developers responsible, Edward Eliot and Stuart Colville, graciously released the generator under a BSD license earlier this month.

One of the first articles I wrote about CSS sprites covered the built-in support in GWT, and I focused on their clever trick of including an MD5 checksum of the sprite map contents into the filename. This allows you to set effectively infinite cache headers since the name will change if the underlying image is modified.
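The naming trick itself is only a few lines. The sketch below assumes Node.js and its built-in `crypto` module; it is not the code from GWT or from the sprite generator, and the function name `checksumFilename` is made up for illustration.

```javascript
const crypto = require('crypto');

// Insert an MD5 checksum of the file contents before the extension,
// e.g. sprites.png -> sprites.<32 hex characters>.png
function checksumFilename(filename, contents) {
  const digest = crypto.createHash('md5').update(contents).digest('hex');
  const dot = filename.lastIndexOf('.');
  if (dot <= 0) return filename + '.' + digest; // no extension to preserve
  return filename.slice(0, dot) + '.' + digest + filename.slice(dot);
}
```

Because the name is derived from the bytes of the image, any change to the sprite map produces a new URL, so the old one can safely be cached with a far-future expiry.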

Since the CSS sprite generator is now open source, I decided to add the checksum approach as an optional feature (on by default). You can try it here, or grab the tar or patch.

YUI Profiler

February 20, 2008 on 7:36 pm | In ajax | 3 Comments

I missed covering the release of the YUI Profiler in 2.4.0, but in YUI 2.5.0 they’ve sweetened the pot further with ProfilerViewer Control, a visual interface to the data collected by Profiler. Together, they provide an advanced, cross browser alternative to the profiling features of Firebug, something that is desperately needed as the complexity of client side applications increases. As great as Firebug is at profiling JS, I’m always worried that the data may not perfectly apply to other browsers. With YUI Profiler, that’s not an issue.

The profiling model differs significantly from Firebug, which runs as a browser extension and can reach deep into browser internals for its metrics. YUI Profiler is running within the JS sandbox, so the user is required to register specific functions for profiling. This limits the profiling from broad spectrum analysis of the whole application to more targeted measurement, but that also serves to limit the performance overhead of the profiling itself. When discussing the jsLex approach to instrumentation, I was worried that the tool itself would significantly impact runtime performance; with YUI Profiler, I don’t think that’s a concern.
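To show what "register specific functions" means in practice, here is a minimal sketch of the general technique: replace a function with a wrapper that records call counts and elapsed time. This is not the YUI Profiler API; `createProfiler`, `register`, and `report` are names I made up, and the clock is injectable only to make the sketch easy to test.

```javascript
// Minimal sketch of register-to-profile instrumentation.
// NOT the YUI Profiler API; all names here are illustrative.
function createProfiler(now = () => Date.now()) {
  const stats = {};
  return {
    // Replace owner[name] with a wrapper that records calls and elapsed time.
    register(owner, name) {
      const original = owner[name];
      stats[name] = { calls: 0, totalMs: 0 };
      owner[name] = function (...args) {
        const start = now();
        try {
          return original.apply(this, args);
        } finally {
          // Record the call even if the original function throws.
          stats[name].calls += 1;
          stats[name].totalMs += now() - start;
        }
      };
    },
    // Summarize what has been recorded for one registered function.
    report(name) {
      const s = stats[name];
      return { calls: s.calls, averageMs: s.calls ? s.totalMs / s.calls : 0 };
    },
  };
}
```

Only registered functions pay the cost of the wrapper, which is why this style of profiling adds so little overhead to the rest of the application.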

I’m a fan of this approach. I like targeted instrumentation, and YUI Profiler falls somewhere in the middle between full browser instrumentation (Firebug, jsLex) and granular instrumentation calls (Firebug Lite). There are definite advantages to the latter approach, such as the ability to time network events like async calls, which as far as I can tell isn’t possible with YUI Profiler. That might be a nice, easy addition to the suite.

Studies of mobile Ajax performance

February 20, 2008 on 2:28 pm | In ajax | No Comments

Helsinki CS Masters student Mikko Pervilä has posted his excellent thesis, “Performance of Ajax Applications on Mobile Devices,” for our edification. It’s a fantastically detailed and meticulously cited study, covering everything from the history of Ajax to the performance of current cutting edge mobile devices (including the iPhone and the Nokia N800). He analyzes mobile device performance across Ajax toolkits and large websites, yielding a mixed bag of results. Opera Mobile on the N95 was overall the strongest performer, but I was happy to see my beloved iPhone hold its own.

Also worthy of a read is yet another great Performance Research article by Tenni Theurer on the YUI blog: iPhone Cacheability – Making it Stick. The key takeaways for site designers are that components must be <25kB to be cacheable and the total cache size is only 475-500kB.

Mobile Ajax development is a hot topic, but as these two articles demonstrate, there is still a fair bit of work remaining before mobile devices are as capable as desktop browsers. In the meantime, we need to understand the specific design constraints of our target devices to deliver compelling performance to our users.

Going native

February 10, 2008 on 10:36 pm | In ajax | No Comments

Looking back on everything I’ve written here in the past 18 months, one trend is clear: every performance optimization technique I’ve discussed has involved network tweaks or client side scripting. After all, a web developer can typically only control those aspects of the delivered application, so frequently we are looking to trick the browser into behaving a bit more intelligently than it would left to its own devices.

That’s why it’s refreshing to see the browser vendors working to provide native, and dramatically faster, implementations of some foundational functionality in the Ajax space. Way back in March of 2007, John Resig posted about the native getElementsByClassName in Firefox 3, followed in December by native getElementsByClassName in WebKit. In both cases, the performance improvements over the best JS/DOM implementations were staggering.

More recent is the Safari implementation of the W3C Selectors API and its querySelector and querySelectorAll functions. Expect a similar implementation from Firefox in the near future, post 3.0. Conspicuously absent from the discussion is Microsoft, but perhaps we’ll see similar work in IE8.

In general, I think this is great progress for the community, and I like the idea of a formalized process for taking the most successful library functionality and moving it native. There is, however, one point of risk: these native implementations of complex functions will doubtless have the occasional bug, so we may end up in a situation where library vendors need to write messy code to fall back to the JS implementation. For example, if Safari 4.0.1 has a bug involving querying a specific type of CSS selector, every library would potentially need to check the browser version and analyze the input to querySelector. Enough of those checks, and the performance gains from taking the functions native begin to slip away.
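The simplest form of that fallback can be sketched as follows. The function name `select` and the `jsFallback` parameter (standing in for a library's own selector engine) are hypothetical. Note what this does and does not cover: feature detection plus try/catch handles a missing querySelectorAll and selectors that make the native call throw, but not a native implementation that quietly returns the wrong elements. That last case is where the version checks and input analysis would creep in.

```javascript
// Use native querySelectorAll when available; fall back to a library's
// JS selector engine when it is missing or throws.
// 'jsFallback' is a hypothetical stand-in for that engine.
function select(root, selector, jsFallback) {
  if (typeof root.querySelectorAll === 'function') {
    try {
      return Array.from(root.querySelectorAll(selector));
    } catch (err) {
      // Selector rejected by the native implementation; use the JS path.
    }
  }
  return jsFallback(root, selector);
}
```

Every extra branch in front of the native call costs a little time on every query, which is the performance erosion described above.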

Note: I regrettably missed the Firefox 3 getElementsByClassName in the first version of this post and have attempted to correct the story.

Look how far we’ve come

January 14, 2008 on 9:53 pm | In ajax | No Comments

Paul Irish has put together a nice timeline of CSS selector development, showing all the hard scripting work that has gone into providing this facility. It’s interesting to walk through the iterations — each generation appears to be larger than the prior, but performance is generally improving along with capability and accuracy.

This type of analysis is also interesting at a meta level: Ajax has been around long enough that we can have meaningfully detailed conversations about the evolution of our techniques. We’ve come a long way, and it’s gratifying every now and then to take a step back and look at all the little steps that got us here.
