A blog by Ryan Breen of CloudFloor
There’s been so much going on in the performance space lately that I’ve been snowed under. It’s difficult to know where to begin chronicling all of the progress. I’ll start with a few updates from SitePen.
- Back in April, Kris Zyp had a great article for IBM developerWorks called Ajax performance analysis. The developerWorks crew puts out some great material, and this is no exception. Simply put, it’s one of the best articles I’ve seen on the topic, and it should be required reading for every Ajax developer. He discusses Firebug, YSlow, and some client-side instrumentation techniques.
- Old friend of the Perf, Tom Trenka, had a nice post about string operations across browsers in May. One of the more interesting takeaways concerns IE7 versus IE6. The net — there’s no longer any justification, if there ever was, for special-casing string concat operations for IE (see the first sketch below).
- One of my favorite tools, Firebug Lite, has seen some dramatic improvements in the Dojo Toolkit version, as discussed by Mike Wilcox in early June. The features discussed: a popup mode that remembers size and position, ReCSS (so you can reload stylesheets without reloading the app), a DOM Inspector, an Object inspector, and a command line. They’ve definitely taken Firebug Lite a long way past the initial goal of offering a bare subset of Firebug functionality to IE developers.
- A few days ago, Mike posted another article — this time with a nice addition to the recent swell of client-side profiling articles. Mike whipped up a nice generic mechanism for tracking client-side performance in a cookie, removing some of the tedium from generating a statistically relevant data set in your own browser (see the second sketch below).
- Finally, Alex Russell expands on the concept of lazy loading by creating a stub loader for Dojo. Weighing in at a slim 6kB (gzipped over the wire), this build of dojo.js is just the bootstrap code necessary for loading the main functionality, all of which is deferred until it’s actually called within an application (see the third sketch below). John Resig posted a follow-up regarding some of the clear downsides of this approach, such as the potential violation of user expectations.
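To make Tom’s takeaway concrete, here is a hypothetical micro-benchmark in the spirit of his tests — my own sketch, not his code — comparing plain concatenation with the array-join idiom once recommended for IE6:

```javascript
// Hypothetical micro-benchmark (my sketch, not Tom's code) comparing
// plain += concatenation with the array-join idiom once recommended for IE6.
function concatPlus(n) {
  var s = "";
  for (var i = 0; i < n; i++) { s += "abc"; }
  return s;
}

function concatJoin(n) {
  var parts = [];
  for (var i = 0; i < n; i++) { parts.push("abc"); }
  return parts.join("");
}

function time(fn, n) {
  var start = new Date().getTime();
  fn(n);
  return new Date().getTime() - start;
}

// On IE7 the two should be comparable; on IE6 the join variant was
// historically much faster for large n.
alert("+=: " + time(concatPlus, 50000) + "ms, join: " + time(concatJoin, 50000) + "ms");
```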
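Next, a rough reconstruction of the cookie-accumulation idea from Mike’s article — my own guess at the shape, not his code. Each page load appends its timing to a cookie, so repeated visits build up a sample you can inspect or post back:

```javascript
// My own reconstruction of the idea, not Mike's code: append each
// page-load timing to a cookie so repeated visits accumulate a sample.
var _t0 = new Date().getTime();

window.onload = function () {
  var elapsed = new Date().getTime() - _t0;
  var match = document.cookie.match(/(?:^|; )perfSamples=([^;]*)/);
  var samples = match ? decodeURIComponent(match[1]).split(",") : [];
  samples.push(elapsed);
  // perfSamples is an invented cookie name; inspect it (or post it to a
  // server) once you have enough runs for meaningful statistics.
  document.cookie = "perfSamples=" + encodeURIComponent(samples.join(",")) + "; path=/";
};
```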
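Finally, a toy version of the stub-loader pattern Alex describes — not Dojo’s actual bootstrap; the payload URL and the _flush() hook are invented for illustration:

```javascript
// Toy version of the stub-loader pattern, not Dojo's actual bootstrap.
// "/js/mylib-full.js" and _flush() are invented for illustration.
var mylib = {
  _queue: [],
  _loading: false,

  // Application code defers work against the stub; the first call
  // triggers a script tag for the full payload.
  ready: function (callback) {
    this._queue.push(callback);
    if (!this._loading) {
      this._loading = true;
      var script = document.createElement("script");
      script.src = "/js/mylib-full.js";
      document.getElementsByTagName("head")[0].appendChild(script);
    }
  },

  // The full payload calls mylib._flush() once it has finished defining
  // the real API, releasing all the queued callbacks.
  _flush: function () {
    for (var i = 0; i < this._queue.length; i++) {
      this._queue[i]();
    }
    this._queue = [];
  }
};

// Usage: nothing heavy loads until the application actually needs it.
// mylib.ready(function () { mylib.charts.render("sales"); });
```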
Hot on the heels of yesterday’s discussion of Jiffy come a few unrelated notes involving client side performance testing. It looks like this approach is finally gathering the mindshare it deserves, and it’s really cool to see all the effort going into developing these solutions.
The first is a fairly basic client-side performance tracker for Rails by Eric Falcao. Currently, it appears to track only the time from the start of the document parse to the onload event. Eric could make this a really compelling tool by providing an API that lets developers add more granular timings as desired. That’s the approach I took in the prototype Rails client-side perf tool I built for the Dojo Charts measurements (and a couple of other projects), and it’s also the approach used by Jiffy.
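For the curious, here is a minimal sketch of what such an API might look like — hypothetical names throughout, not Eric’s plugin. Capture a start time at the top of the head, let the page mark named events, and beacon everything at onload:

```javascript
// Hypothetical API, not Eric's plugin. This first statement belongs at
// the top of <head> so the clock starts as early as possible.
var Perf = {
  start: new Date().getTime(),
  marks: [],
  mark: function (name) {
    this.marks.push(name + "=" + (new Date().getTime() - this.start));
  }
};

window.onload = function () {
  Perf.mark("onload");
  // Beacon the collected timings; /perf is a hypothetical Rails endpoint.
  new Image().src = "/perf?" + Perf.marks.join("&");
};

// Elsewhere in the page, e.g. after a widget finishes rendering:
// Perf.mark("chart_rendered");
```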
You could then hook into prototype.js or RJS code generators to auto-insert these performance counters for common actions (here’s an example — time every XHR fired on behalf of every end user). To use Eric’s words, there are some really cool ways to make this type of instrumentation “we’re-all-spoiled-with-rails simple.”
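Prototype’s global Ajax responders make that last example straightforward. A hedged sketch, with /perf standing in as a hypothetical collection endpoint:

```javascript
// Time every XHR in a prototype.js app by registering global Ajax
// responders: stamp each request on create, report on complete.
Ajax.Responders.register({
  onCreate: function (request) {
    request._perfStart = new Date().getTime();
  },
  onComplete: function (request) {
    var elapsed = new Date().getTime() - request._perfStart;
    // Beacon the timing back; /perf is a hypothetical collection endpoint.
    new Image().src = "/perf?event=xhr&url=" +
      encodeURIComponent(request.url) + "&ms=" + elapsed;
  }
});
```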
Next is a really cool cross-browser benchmark of SVG, VML, and Canvas by Ernest Delgado. Ernest uses two case studies of Google Maps charting to compare SVG (for Safari and Firefox) and VML (for IE) with a Canvas implementation. I’ve done some studies of the relative performance of VML and SVG, but I’ve never looked at how a Canvas implementation could compare.
Ernest’s findings are interesting. In my research, Firefox’s SVG implementation was notably slower than Safari’s, and Ernest’s data bears that out for Firefox 1 and 2. But Firefox 3 renders SVG in his case studies in between 1/2 and 1/3 the time of Firefox 2, so it appears the team has done some solid work on the SVG engine (or, perhaps, performance improvements elsewhere in the browser are responsible for the gains).
Compared to Safari 3, Firefox 3 turns in a mixed performance. In the first case, Firefox is significantly faster. In the second case, Safari is faster. I would like to know more about the sample size of the measurements to see if these numbers would hold up, but it definitely looks like the Mozilla team has been hard at work.
Speaking of the Mozilla team, John Resig last week described a new plugin for deep profiling jQuery. The plugin instruments every jQuery call and reports basic stats on call counts and time spent. And this is just the beginning. Per John:
The next stage of development for this plugin would be to reveal which methods are running inside other jQuery methods – in addition to monitoring other aspects of the application (such as timers, Ajax callbacks, etc.). I’m pleased with even this most-basic result – it gives me the ability to quickly, and easily, learn much more about a jQuery-using application.
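For readers who want the flavor of the approach, here is a rough illustration of method wrapping in jQuery — my own sketch, not John’s plugin. Each jQuery method is replaced with a shim that counts calls and accumulates elapsed time:

```javascript
// Rough illustration only, not John's plugin. Wrap each jQuery method
// in a shim that counts calls and accumulates elapsed time; inspect the
// stats object from the Firebug console when you're done.
var stats = {};
jQuery.each(jQuery.fn, function (name, method) {
  // skip non-functions (like the version string) and the constructor
  if (typeof method !== "function" || name === "init" || name === "constructor") return;
  stats[name] = { calls: 0, ms: 0 };
  jQuery.fn[name] = function () {
    var start = new Date().getTime();
    var result = method.apply(this, arguments);
    stats[name].calls += 1;
    stats[name].ms += new Date().getTime() - start;
    return result;
  };
});
```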
Clearly, the complexity of apps being run on the client side requires more measurement within the browser. That’s why we’re seeing a mini-explosion of browser-side performance collection tools and demos. This is a space ripe for innovation.
I’m at the O’Reilly Velocity Conference in San Francisco today and will be sitting on a panel with Bill Scott, Ernest Mueller, and Scott Ruthfield. Steve Souders is moderating.
Bill is kicking off the show with something really exciting — the Jiffy plugin for Firebug. Jiffy relies on Scott Ruthfield’s Jiffy-Web open source analysis suite to track the performance of an application from both the client and server side. Client side performance tracking is something I’ve been a fan of for a while (I used a similar technique for the Dojo Charts benchmark last year).
This looks like a great new tool to make this type of analysis more accessible, and I’ll be attending Bill’s sessions today to get more information.
I’m doing a Webinar Thursday with Bob Buffone of Rock Star Apps and Nexaweb. I’ve never done a joint webinar before, so it should be a lot of fun. 2 hours of Ajax/RIA performance discussions — what could be better?
Bob’s blog has more details on how to sign up. It’s free, of course.
I’m fascinated by cases where seemingly banal technical details become precious commodities because very few have expended the time and energy necessary to document them. One good example is mobile browser connection profiles — there are thousands of combinations of mobile device and browser software, and each has its own particular connection limits and concurrency profile. No central body provides gratis access to this information, so those looking to study or test mobile browsers have few and costly options to choose from.
That’s why I was excited to see a post by Jason Grigsby of Cloud Four (via Ajaxian) about a research project to collect this information with some clever server-side magic. Just hit this link on your mobile device and help contribute to a worthy cause. The results will be published under a Creative Commons license for all to use.
I’ve talked before about the recent move by browser vendors to implement the Selectors API. There is potential for significant performance benefits from moving this code into the browser, but there is risk as well. If the provided functionality is buggy (as history tells us it must be), libraries will need to patch around these bugs on a case-by-case basis. If the spec is ambiguous or differs from de facto standards used in common practice, that’s yet more work for the library maintainers.
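The patching pattern usually looks something like the sketch below — a hedged illustration, with fallbackEngine standing in for a real library engine such as Sizzle or dojo.query. Try the native API first, and fall back when it’s missing or throws:

```javascript
// Trivial stand-in for a real engine like Sizzle or dojo.query.
function fallbackEngine(selector, root) {
  // e.g. delegate to a library: return jQuery(selector, root).get();
  return []; // placeholder
}

function query(selector, root) {
  root = root || document;
  if (root.querySelectorAll) {
    try {
      return root.querySelectorAll(selector);
    } catch (e) {
      // The native implementation rejected (or mishandles) this selector;
      // fall through to the library's own code path.
    }
  }
  return fallbackEngine(selector, root);
}
```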
John Resig provided some insight with a post today into how browser vendors, the W3C, and library maintainers are coming together to smooth over the rough parts of the spec. It’s a fascinating read, providing a peek into the sausage-making process of spec wrangling for those who don’t frequent the public-webapi mailing list.
Testing new arrangements of DOM elements to improve the object load order or parallelism can be a cumbersome task. Fire up a text editor, create a test page with a meaningful name, hit it with different browsers, and repeat a few hundred times. As an exemplar of the old aphorism that good programmers are lazy, Steve Souders (formerly of Yahoo!, now of Google) created Cuzillion to remove some of the friction from these testing cycles.
Cuzillion is a simple web app that allows for easy arrangement of different page elements (external scripts, images, stylesheets) within a DOM. These sample pages are each defined by a simple RESTian URL, so they can be shared with other developers as examples of what to do (or what not to do). Loading a page in Cuzillion also reports a high-level number for page load time and some micro-metrics from within the page (the time to load an inline script, for example). You can use Page Detailer or HttpWatch to get a more detailed analysis of object load order.
When YSlow was released last year, one of the aspects of the project that excited me the most was the documentation it provided: just by ranking specific performance decisions made by the application, it served to educate developers on what they can do better. I could see a community developing around Cuzillion to serve a similar purpose, especially as the tool expands to handle more DOM elements or object load techniques (such as external scripts referenced via document.write or dynamic DOM insertion).
It’s great to see pseudo-standards such as Firebug’s console and profiling APIs gain traction. That makes it much easier for users to get meaningful comparative data between browsers while testing their applications.
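For those who haven’t used them, the core calls look like this (renderChart is a stand-in for whatever work you want to measure):

```javascript
function renderChart() {
  // stand-in for whatever work you actually want to measure
  for (var i = 0, s = ""; i < 10000; i++) { s += i; }
}

console.time("render");        // start a named timer
renderChart();
console.timeEnd("render");     // Firebug logs "render: NNNms"

console.profile("render");     // start the JavaScript profiler
renderChart();
console.profileEnd("render");  // Firebug dumps the captured profile
```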
Over at Surfin’ Safari, Antti Koivisto explains the preloading features in the latest WebKit nightlies. Antti begins by documenting the dominance of latency in determining total page load time, focusing on the slowdown caused by the blocking behavior of modern browsers while handling external scripts. As we’ve discussed here in the past, this has the effect of serializing object loads, resulting in a total page load time that increases linearly with network latency.
The new preloading feature available in WebKit nightlies attempts to maintain network parallelization even while the parser is blocked waiting for an external script to load. To achieve this, a separate parser is created to move through the remainder of the page, queuing up any additional objects to load. Scripts and stylesheets are also moved to the head of the queue of pending objects.
The net result for end users is a faster page load.
It should be noted that IE8 promises a similar improvement to script load parallelization, as discussed by Steve Souders a few weeks back. I would guess that the underlying implementation is similar to that used by the WebKit team.
A few weeks ago, I discussed IE8’s improved connection parallelism, specifically the increase from 2 concurrent connections per host to 6. One open question was the total number of connections allowed — my speculation was that the IE team would stick with a max of 6 rather than triple that value as well.
I was wrong. The new max is an astonishing 18 (!) concurrent connections:
That is some serious parallelism, and it has significant implications for application performance.
In December of 2006, I discussed the CNAME trick for circumventing browser connection limits, using 3 hostnames to serve images to trick the browser into using all available connections. At the time, that was 6 for IE. The above capture from IBM Page Detailer confirms 18 concurrent connections in IE8.
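For reference, the trick itself is simple. A hedged sketch, with img1/img2/img3.example.com as invented hostnames (each a CNAME for the same server):

```javascript
// Spread image requests across several hostnames (all CNAMEs for the
// same server) so the browser opens its per-host connection limit
// against each one. Hostnames here are invented for illustration.
var SHARDS = ["img1.example.com", "img2.example.com", "img3.example.com"];

function shardedUrl(path) {
  // Hash on the path so a given image always maps to the same hostname
  // (keeps the browser cache effective across pages).
  var hash = 0;
  for (var i = 0; i < path.length; i++) {
    hash = (hash + path.charCodeAt(i)) % SHARDS.length;
  }
  return "http://" + SHARDS[hash] + path;
}

// e.g. an <img> tag written at render time:
document.write('<img src="' + shardedUrl("/images/logo.png") + '">');
```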
As an aside, the out-of-the-box optimization provided by IE8 is actually slightly faster than the CNAME trick applied to previous IE versions, as it does not incur any hostname resolution cost when establishing the first connections. Both approaches use 6 total concurrent connections, and IE8 should be equal to or faster than optimized connection management in previous versions.
But what about IE8 against a page optimized for connection parallelism? If 6 concurrent connections is good, 18 should be terrific, right? Not so fast. While the Page Detailer captures above show some improvement in the 18 connection version, point in time metrics can only tell us so much. What we need is a tool that can collect a statistically significant sample of performance data using both 6 and 18 connections to see if any trends shake out.
For this analysis, I used a hosted performance testing solution from Gomez, my employer. This is the same tool used in my original connection parallelism article. I ran my tests in IE8 compatibility mode, mirroring the new connection levels. As before, one test is against the default (1 host) page, and one test uses the CNAME trick (3 hosts) for greater connection parallelism. The results surprised me:
This aggregate data is made up of hundreds of tests taken from 7 locations in the US over the last 14 hours. The same locations were used for both tests. The “IE8 Parallelized” test, which uses 18 connections, has a much higher standard deviation and a higher average test time than the 6 connection “IE8 Default” test. What gives?
The answer appears to be sporadic connection hangs. The median response time for the parallelized page is lower than that of the default page, but a higher incidence of outliers skews the average and leads to the increased variability. Looking at the outliers, I typically see a section of the page load that looks like this:
Here we see 2 object downloads taking more than 8 seconds to complete. The average response time for the entire page is around half a second, so this is a huge outlier. I see these outliers on 5 to 10% of the test runs for the 18 connection page, but I never saw any comparably high outliers for the 6 connection version.
Below is a revised version of the test averages taken by removing outliers:
Note that the parallelized version is now consistently faster than the default. As expected, the outliers are responsible for the counterintuitive poor performance of the parallelized page.
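A toy calculation shows why the mean and median diverge here — the numbers below are made up for illustration, not my actual test data:

```javascript
// Toy illustration of how a few hangs inflate the mean while barely
// moving the median. All numbers are invented, not the actual test data.
function mean(xs) {
  var s = 0;
  for (var i = 0; i < xs.length; i++) { s += xs[i]; }
  return s / xs.length;
}

function median(xs) {
  // upper median; close enough for illustration
  var a = xs.slice().sort(function (x, y) { return x - y; });
  return a[Math.floor(a.length / 2)];
}

var sixConn      = [0.52, 0.55, 0.50, 0.53, 0.51, 0.54, 0.52, 0.50, 0.53, 0.51];
var eighteenConn = [0.45, 0.44, 0.46, 0.43, 0.45, 0.44, 0.46, 0.43, 8.20, 0.45]; // one hung load

// The median favors 18 connections, but the single outlier drags its
// mean well above the 6-connection page's.
alert("6-conn:  mean=" + mean(sixConn).toFixed(2)      + " median=" + median(sixConn));
alert("18-conn: mean=" + mean(eighteenConn).toFixed(2) + " median=" + median(eighteenConn));
```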
I suspect that my hosting provider (Dreamhost) simply can’t keep up with the dramatic increase in connection parallelism. 18 connections is simply too much of a good thing, and it will present a scaling problem for those on small to medium hosts: 10 users hitting a site at the same time will yield 180 concurrent connections, a significant number for smaller providers.
[Note: This objection was anticipated and handled by the IE team. See below.] Dial-up and cellular network users are also likely to be negatively impacted by this change. In the high broadband world where latency is the dominant factor, greater connection parallelism is a boon. But in bandwidth constrained networks, it just leads to thrash where progress is slowed by all the connections trying to share a small pipe.
I’m curious what sort of testing Microsoft has conducted to determine the impact of this change. The connection parallelism approach is used widely (including by the Virtual Earth team), and some servers may not be ready for the increase. My tests were conducted against only one host, but if similar results are experienced elsewhere, this may fall under the rubric of “don’t break the web.”
My advice to anyone who is using the connection parallelism trick is to perform a similar analysis of your application before IE8 is released. The new connection levels will create greater strain on your servers, and that may lead to occasional performance hiccups for your users. There are a few different approaches you can take to dealing with this change, but the most important first step is to understand the extent to which your application is impacted.
Update: Kris Zyp and Steve Souders have pointed out that IE8 will use 2 connections per host for dial-up users. This nicely addresses that concern, but the concern about 18 connections for pages using the CNAME approach still stands.