Welcome to Web 2.0 - Please Wait While Your Page Loads, Part 2
This is the second in a two-part series exploring the current state of Web 2.0 and the need for better optimization in the age of Rich Internet Applications. Part 1 gives an extremely brief history of Internet development and takes stock of the current state of the World Wide Web, specifically in terms of various big-name and AJAX-enabled sites. It also discusses the large download sizes inherent in Rich Internet Applications and their impact on end-user perceptions. Part 2 introduces, and explores in detail, the client-side optimization methods that every Web 2.0 developer should perform and every Web 2.0 client should expect from their Internet solutions partner.
Current State of the Internet (Redux)
In part 1 of this post, the Congruent Media team examined some big-name websites and discovered some disconcerting results about their size and lack of optimization. By far the worst site in the data set was CNN.com (as measured the afternoon of July 23rd, 2008), which came in at a total size of 655 KB, with 263 external items (images, cascading style sheets, and JavaScript files) accounting for 635 KB, or nearly 97%, of the total download. BBC News came in second with a total download size of 503 KB.
To recap:
| Web Site | Total Size (B) | # Objects | Broadband Speed (s) | Modem Speed (s) | Calculated Latency (s) | Images (B) | External JavaScripts (B) | External CSS (B) | Other External Objects (B) |
| BBC News | 515,274 | 151 | 32.93 | 132.89 | 30.20 | 294,490 | 77,101 | 60,742 | 0 |
| CNN | 670,327 | 264 | 56.35 | 186.40 | 52.80 | 232,935 | 273,020 | 144,300 | 0 |
A further investigation revealed that none of the external JavaScript files for the CNN site, including third-party libraries, were compressed!
In this Web 2.0 age of the Internet, first impressions count - users expect things to work quickly. Waiting around for 10 or 20 seconds for a page to download and work correctly is time that a visitor is likely to spend doing something else, like hitting the back button, returning to their Google search results, and trying the next site down - your competition.
Optimize, Baby!
What's a developer to do? Clearly, if a site is going to be dynamic and engaging, implementing various Web 2.0 techniques, a developer must be concerned with optimizing their code, including any dynamic server-side code that produces the web pages, SQL queries that pull the content, and especially the JavaScript libraries that power the site once it's been downloaded to a client's machine.
The first steps, of course, are to ensure that what's happening on the server side is up to par. This means that your development team has built a well-designed, normalized database structure, complete with appropriate keys and indexes, found the major bottlenecks in server load time and ironed them out (to the extent humanly possible), and otherwise removed any unnecessary redundancy and poor code flow from your server-side code.
Assuming that's been taken care of, it's time to take a look at what's going on client side, and tackle such problems as download sizes and client load times. To do this, a developer needs to consider several key issues:
- Round Trips - If a page contains a large number of objects (images, script files, style sheets, ...), the client's browser will need to make many requests to the server to retrieve this data. Bandwidth concerns aside, each request comes with a cost - it has to complete a round-trip from the client's machine to the server and back again, introducing download latency.
- Well-written Client Code - Poorly designed client-side code can produce undue load and bottlenecks as the client's browser attempts to parse and execute it. Additionally, poorly implemented AJAX can introduce an excessive amount of client/server communication that could likely have been accomplished in fewer round-trips.
- White Space - All of those spaces, tabs, line breaks, and other formatting that actually make the source code readable contribute to the size of a page and, therefore, its download and parsing time.
- Compression - Further compression techniques, including a server that can GZip the data for delivery and variable rewriting in JavaScript code to decrease a script file's size, can help to optimize download times.
Round Trips
Information on the Internet may seem to travel virtually at light speed, but it bears remembering that light itself does not travel infinitely fast - only at a rate of 186,282.397 miles per second... It's 5,371 miles from San Francisco, California, USA to London, England, UK as the crow flies, or a 0.0288 second one-way trip for an electromagnetic wave, assuming perfect operating conditions and ground-based communications. That's 6/100ths of a second round-trip if the information could go directly, unhindered from San Francisco to London and back. Factor in that it's not moving along a direct path, is being transmitted along fiber optic cables, passing through routers, transceivers, amplifiers... and you get the picture.
Even in a perfect world, assuming synchronous download, the current laws of physics require that a person visiting the BBC News website from San Francisco and downloading those 151 web objects to assemble the page wait 4.3 seconds. Clearly, the case is even more severe for an Australian site visitor in Melbourne hitting CNN's server in Atlanta!
| News Site | Server Location* | Distance to...† | One-Way Lightspeed Trip | Downloadable Objects | Approximate Speed of Light Latency‡ |
| CNN | Atlanta, GA, USA | ... Melbourne, Australia: 9,678 miles | 0.0520 seconds | 264 | 13.7 seconds |
| BBC | London, England, UK (Tadworth) | ... San Francisco, CA, USA: 5,371 miles | 0.0288 seconds | 151 | 4.3 seconds |
* Server location is based on technical data from the Whois database, and may not reflect the actual location of the physical machine.
† Distances calculated by the "How Far Is It?" page of indo.com. Whois records indicate BBC's technical contact in Tadworth, on the outskirts of London.
‡ Approximate Speed of Light Latency is calculated as 1 one-way trip from the requesting client to the server, plus # objects X one-way trip back to the client. Realistically, this data is not sent in one lump sum package - large objects are sub-divided into data packets, and would require further communications between client and server. Additionally, synchronous downloading is assumed, although almost all modern machines support threaded, asynchronous downloads.
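The arithmetic behind the table can be sketched in a few lines of JavaScript. The function names are illustrative only; the sketch simply applies the formula from the footnote above (one trip out for the request, then one trip back per object) and ignores packets, routing, and parallel downloads.

```javascript
// Speed of light in miles per second, as quoted earlier in this post.
const SPEED_OF_LIGHT_MILES_PER_SEC = 186282.397;

// Time for a signal to cover the distance once, under ideal conditions.
function oneWayTripSeconds(distanceMiles) {
  return distanceMiles / SPEED_OF_LIGHT_MILES_PER_SEC;
}

// One trip to the server, plus one trip back to the client per object.
function lightSpeedLatencySeconds(distanceMiles, objectCount) {
  return (1 + objectCount) * oneWayTripSeconds(distanceMiles);
}

const bbcLatency = lightSpeedLatencySeconds(5371, 151); // San Francisco to London
const cnnLatency = lightSpeedLatencySeconds(9678, 264); // Melbourne to Atlanta

console.log("BBC: " + bbcLatency.toFixed(2) + " s");
console.log("CNN: " + cnnLatency.toFixed(2) + " s");
```

The results land close to the approximate values in the table; the small differences come down to rounding.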
Obviously, data doesn't travel under nearly such ideal conditions, but even in such a perfect world, there is a latency involved simply in the transit of electromagnetic waves, and this should be a good illustration that a page loaded up with objects is going to take longer to completely download and render than a page with few objects (and therefore fewer necessary round trips), even with modern web browsers that can download objects asynchronously.
Granted, many large companies deploy their web sites in a geographically distributed manner. That is, they have multiple servers spread across a geographical region, country, or even the entire globe. This cluster is responsible for serving content to the end user from the nearest available machine, significantly reducing the round-trip distance that data packets need to travel. Each machine in the cluster keeps itself synched with a master server. In the case of the CNN and BBC examples above, it's likely that both companies deploy their sites in a geographically distributed manner - when I hit the BBC site, I'm most likely not receiving data from the Tadworth server, but instead from the nearest available mirror. In fact, there are sophisticated DNS systems coupled with complex server clusters (for example, Sun Microsystems Solaris Clusters, or IBM WebSphere Application Servers) that route a user to the nearest available server node, which itself caches and reassembles dynamic content portions and returns those to the client in order to maximize the benefits of geographical dispersion and clustering.
However, in the typical small-to-medium business scenario, it usually isn't cost-effective, convenient, or practical to deploy a web site on a geographically distributed server cluster. In this case, the web site sits on a single server on a rack in the hosting environment, and all web requests, regardless of their origin, have to travel to that server. In such a situation, a good first step toward defeating long download times is to reduce the number of round trips for objects contained in the page, and that means reducing the number of objects in the page in general. This can be achieved by consolidating script files, reusing images, or adjusting page fragments to dynamically load objects when needed or stream content in real-time.
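As a minimal sketch of script consolidation, the snippet below joins several script sources into a single bundle so that the browser makes one request instead of three. The file names and their contents are hypothetical, and a real build step would read the sources from disk rather than from an object literal.

```javascript
// Hypothetical script sources, keyed by file name.
const scripts = {
  "menu.js": "function openMenu() { return 'menu'; }",
  "gallery.js": "function openGallery() { return 'gallery'; }",
  "tracking.js": "var pageViews = 0",
};

// Join the sources into one bundle. The ";" between files guards against a
// file whose last statement does not end with a semicolon.
function consolidate(sources) {
  return Object.keys(sources)
    .map((name) => "/* " + name + " */\n" + sources[name])
    .join("\n;\n");
}

const bundle = consolidate(scripts);
console.log(Object.keys(scripts).length + " script requests become 1");
```

The page then references the single bundled file in place of the individual script includes.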
So, a single user in San Francisco, California, USA, requesting content from two sites - BBC News and a friend's dedicated blog in London - is likely to have their data follow two completely different round-trip paths. The BBC News request might be channeled to the nearest server node in, for example, Los Angeles, California, whereas the request for the friend's blog is probably going to have to travel across the United States and then the Atlantic Ocean to that server in London, and then back again! If that page is full of images, script includes, style sheets, movies, and/or Flash objects, that's a lot of additional round-trips to London needing to take place in order to render the completed page.
Well-written Client Code
It should go without saying that well-written client code is key to the perception that a rich Internet application is running smoothly. Drag and drop is a wonderful feature, but only if there isn't a several-second latency between when the user initiates dragging and the browser starts to respond. Even in standard, Web 1.0 types of applications, which often utilize JavaScript-powered drop-down menus and interactive screen elements, well-written code is important. After the server side has been optimized and the number of round-trips brought down to a minimum, the next step should be to make sure that the JavaScript that powers your users' experiences is up to par.
Without going into details on how to produce well-written JavaScript (that's a different beast altogether), here are a few key concerns for optimization:
- Minimal number of nested loops and recursive function calls
- Proper utilization of reusable code (functions and classes) and the DOM
- Retrieving references to an object once and keeping that reference until it's no longer needed in the local scope
- Minimized number of round-trips for AJAX calls (retrieve all of the data you need in one call, if possible)
- Minimized number of "global" updates - only adjust the elements that need adjusting to avoid needless DOM rewrites
- Correct JavaScript syntax, utilizing semi-colons and other appropriate programming practices (well-written code lends itself to compression whereas poorly written code is likely to become broken by most compression algorithms)
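One of the points above, retrieving a reference once and reusing it, can be illustrated with a small sketch. The `fakeDocument` object is a stand-in for the browser's `document` so that the example runs outside a browser; it exists only to count how many lookups each approach performs.

```javascript
// Stand-in for the browser's document object; it counts element lookups.
const fakeDocument = {
  lookups: 0,
  elements: { status: { text: "" } },
  getElementById(id) {
    this.lookups += 1;
    return this.elements[id];
  },
};

// Looks the element up again on every pass through the loop.
function updateWithRepeatedLookups(doc, messages) {
  for (const message of messages) {
    doc.getElementById("status").text = message;
  }
}

// Looks the element up once and reuses the reference.
function updateWithCachedReference(doc, messages) {
  const status = doc.getElementById("status");
  for (const message of messages) {
    status.text = message;
  }
}

const messages = ["Loading", "Rendering", "Done"];

updateWithRepeatedLookups(fakeDocument, messages);
const repeatedLookups = fakeDocument.lookups;

fakeDocument.lookups = 0;
updateWithCachedReference(fakeDocument, messages);
const cachedLookups = fakeDocument.lookups;

console.log(repeatedLookups + " lookups versus " + cachedLookups);
```

In a real page each lookup walks the DOM, so the difference grows with the size of the document and the number of iterations.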
White Space and Compression
Even the most well-written client code is very likely bigger than it needs to be. Assuming the appropriate server configuration, and the requesting client browser is capable, web responses can be sent compressed in GZIP format - most modern servers and browsers already do this behind the scenes. However, there are further steps that should be taken to reduce the overall download size of a page. Comments, formatting, and human-readable variable names, which are all extremely important in the development of reusable code, tend to take up copious amounts of space in those documents. Once a site moves into production, it's a good idea to remove as much white space as possible, strip out the comments, and even compress your JavaScript files. Make sure to keep the originals in a handy place, though, since you'll likely need to refer back to them at some point in time, or modify them for future enhancements.
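To get a feel for what GZIP does to a script, the sketch below compresses a block of source text and restores it again. It assumes a Node.js environment, where the built-in `zlib` module is available; the script source is hypothetical and deliberately repetitive, so real files will compress less dramatically.

```javascript
const zlib = require("zlib");

// Hypothetical script source, repeated to imitate a larger file.
const scriptSource = (
  "function setStatus(message) {\n" +
  "    // Update the status line\n" +
  "    statusLine.text = message;\n" +
  "}\n"
).repeat(50);

const originalBuffer = Buffer.from(scriptSource, "utf8");
const gzippedBuffer = zlib.gzipSync(originalBuffer);
const restoredSource = zlib.gunzipSync(gzippedBuffer).toString("utf8");

console.log(originalBuffer.length + " bytes before GZIP");
console.log(gzippedBuffer.length + " bytes after GZIP");
```

The browser performs the decompression step automatically when the server marks the response as GZIP-encoded, so nothing changes in the page itself.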
The primary step in minimizing download size is to simply remove as much white space as possible from your documents, including your actual HTML, Cascading Style Sheets, and JavaScript files. This does not necessarily mean that you should remove all of the formatting from your HTML or CSS docs - there's a balance to be struck between nice formatting and too much white space - since these are the documents that you're most likely going to have to edit on a day to day basis as client needs change. However, it is a good idea to ensure that, for example, there aren't 100 lines of empty space between your header include and your document body, or that you've removed that big long explanation you wrote, as a comment in your CSS files, explaining how to implement the style sheets.
The place you'll most likely benefit from white space removal is in your JavaScript files, especially if your site utilizes AJAX techniques to provide a rich user experience. For example, the Prototype JavaScript library (version 1.6.0.1) is 124,000 bytes in size, uncompressed, with white space. By just removing the comments and white space (using the packer utility, explained in more detail below), the file size could be shrunk down to 91,405 bytes. That's a savings of 32,595 bytes.
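The idea can be sketched with a deliberately naive stripper. This is not how packer works internally; it is a hypothetical helper that removes block comments, full-line `//` comments, indentation, and blank lines, and it would damage code that contains comment-like text inside strings or regular expressions.

```javascript
// Naive comment and white space remover, for illustration only.
function stripCommentsAndWhitespace(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "")   // block comments
    .replace(/^[ \t]*\/\/.*$/gm, "")    // lines that contain only a // comment
    .split("\n")
    .map((line) => line.trim())         // indentation and trailing spaces
    .filter((line) => line.length > 0)  // blank lines
    .join("\n");
}

const sample = [
  "/* Adds two numbers. */",
  "function add(a, b) {",
  "    // return the sum",
  "    return a + b;",
  "}",
  "",
].join("\n");

const stripped = stripCommentsAndWhitespace(sample);
console.log(sample.length + " characters before, " + stripped.length + " after");
```

A purpose-built tool is the safer choice for production code, since it actually parses the script rather than pattern-matching against it.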
Of course, when compacting your JavaScript files, you should go one step further and run them through a compressor that will not only remove the white space and strip the comments, but also rewrite your variable names and perform other optimizations to decrease your script sizes... There are, in fact, various free tools available on the Internet to help with compressing client scripts. For example, packer is a web-based tool (also available as source code for download and local consumption) that allows you to paste your source scripts into one text area, click a few options, and then copy out the compressed and/or rewritten and/or packed JavaScript for your production environment. Interestingly, packer has the option of Base62 encoding your scripts, with the decompression algorithm built into the code - sort of like a manual JavaScript GZIP - for those servers that don't support compression on their end.
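Variable rewriting is easiest to see side by side. The shrunk version below was written by hand for illustration and is not actual output from packer or any other tool; the function name `totalPrice` is hypothetical. Both versions behave identically, but the second is far smaller.

```javascript
// Readable development version.
const readableSource = [
  "function totalPrice(unitPrice, quantity, taxRate) {",
  "    var subtotal = unitPrice * quantity;",
  "    return subtotal + subtotal * taxRate;",
  "}",
].join("\n");

// Hand-shrunk version: local names shortened, white space removed.
const shrunkSource = "function totalPrice(a,b,c){var d=a*b;return d+d*c}";

// Evaluate a source string and hand back the function it defines.
function loadTotalPrice(source) {
  return new Function(source + "; return totalPrice;")();
}

const readableResult = loadTotalPrice(readableSource)(10, 3, 0.5);
const shrunkResult = loadTotalPrice(shrunkSource)(10, 3, 0.5);

console.log(readableSource.length + " characters versus " + shrunkSource.length);
console.log(readableResult === shrunkResult);
```

Note that only the local names change; the public name `totalPrice` stays intact so that other scripts can still call it.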
Another useful compression tool comes from the folks at Yahoo! - the YUI Compressor. The YUI Compressor, which is a command-line Java-based tool, free for download and use, achieves similar results to packer, and has a few additional features that packer does not. In fact, there's a good comparison of the two tools, and a few other JavaScript compressors, on Julien Lecomte's blog, here http://www.julienlecomte.net/blog/2007/08/13/.
Congruent Media has utilized both the YUI Compressor and packer tools in its efforts to ensure that our clients' end-users have the best web experience possible with our clients' sites. Packer is convenient because of its web-based nature, but unless your JavaScript is written to the letter of the law, you could run into syntax trouble when the code is compressed down to one line. The YUI Compressor is a bit more forgiving, however, and appears to be able to handle at least some minor syntax variations, such as missing end-of-line semi-colons.
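The one-line problem is easy to reproduce. The sketch below uses a naive, hypothetical `collapseToOneLine` helper rather than packer itself, and checks whether the result still parses. JavaScript tolerates missing semi-colons only when a line break separates the statements, so removing the line breaks removes the safety net.

```javascript
// Returns true when the source parses as a JavaScript function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    return false;
  }
}

// Collapse a script onto a single line, as a naive white space remover might.
function collapseToOneLine(source) {
  return source
    .split("\n")
    .map((line) => line.trim())
    .join(" ");
}

const withSemicolons = "var a = 1;\nvar b = 2;\nvar c = a + b;";
const withoutSemicolons = "var a = 1\nvar b = 2\nvar c = a + b";

console.log(parses(withoutSemicolons));
console.log(parses(collapseToOneLine(withSemicolons)));
console.log(parses(collapseToOneLine(withoutSemicolons)));
```

Only the last case should fail to parse: once the line breaks are gone, nothing separates one statement from the next.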
It doesn't matter too much which tool you use to optimize your JavaScript files - packer, the YUI Compressor, your own custom code, or even doing it by hand - as long as you do in fact reduce the white space and optimize your code. The bottom line is that a smaller file takes less time to download and parse, and your site users will notice the difference.
Recap: Prototype and Scriptaculous - Before and After
A quick look at the Scriptaculous library, and the Prototype library that comes bundled with it, before and after compression with packer and the YUI Compressor, produces some enlightening results:
| JavaScript | Size Uncompressed | Compressed w/ packer | Compressed w/ YUI Compressor |
| Prototype.js (1.6.0.1) | 124,000 bytes | 74,791 bytes (39.7%) | 72,487 bytes (41.5%) |
| Scriptaculous - Builder.js (1.8.1) | 4,770 bytes | 2,432 bytes (49.0%) | 2,437 bytes (48.9%) |
| Scriptaculous - Control.js (1.8.1) | 34,868 bytes | 21,487 bytes (38.4%) | 21,540 bytes (38.2%) |
| Scriptaculous - DragDrop.js (1.8.1) | 31,605 bytes | 19,043 bytes (39.7%) | 19,290 bytes (39.0%) |
| Scriptaculous - Effects.js (1.8.1) | 38,986 bytes | 25,443 bytes (34.7%) | 24,912 bytes (36.1%) |
| Scriptaculous - Scriptaculous.js (1.8.1) | 2,654 bytes | 972 bytes (63.4%) | 918 bytes (65.4%) |
| Scriptaculous - Slider.js (1.8.1) | 10,296 bytes | 6,737 bytes (34.6%) | 6,758 bytes (34.4%) |
| Scriptaculous - Sound.js (1.8.1) | 1,920 bytes | 1,121 bytes (41.6%) | 1,130 bytes (41.1%) |
Using either compressor, we achieve results that reduce the file sizes by somewhere between 35%-40%, in most cases. This is a significant size reduction in and of itself. When coupled with download speeds and numbers of round trips, however, this can significantly improve performance.
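The percentages in the table are the share of the original file that was removed. A short sketch of that calculation, using the Prototype.js figures from the table (the function name is illustrative):

```javascript
// Percentage of the original size that compression removed.
function percentReduction(originalBytes, compressedBytes) {
  return (1 - compressedBytes / originalBytes) * 100;
}

const packerReduction = percentReduction(124000, 74791).toFixed(1);
const yuiReduction = percentReduction(124000, 72487).toFixed(1);

console.log("packer: " + packerReduction + "%");
console.log("YUI Compressor: " + yuiReduction + "%");
```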
In the CNN example, 41% of the download time, or approximately 23 seconds on a 1.44 Mbps T1 connection or 76 seconds on a 56 kbps modem, is directly attributed to JavaScript files. If all of those scripts were run through a compressor, even achieving 35% compression (recall that CNN uses Scriptaculous and Prototype), this would reduce the total script download size from 267,963 bytes to 174,176 bytes - a savings of 93,787 bytes. This equates to shaving off approximately 8.4 precious web surfer seconds in the broadband case, or roughly 26.6 seconds over dial up!
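The size estimate above is simple enough to check in a few lines. The function name is illustrative, and the 35% rate is the conservative assumption used in this post rather than a measured result for CNN's scripts.

```javascript
// Estimate the compressed size and the savings for a given reduction rate.
function estimateSavings(scriptBytes, reductionRate) {
  const compressedBytes = Math.round(scriptBytes * (1 - reductionRate));
  return {
    compressedBytes: compressedBytes,
    savedBytes: scriptBytes - compressedBytes,
  };
}

const cnnEstimate = estimateSavings(267963, 0.35);

console.log(cnnEstimate.compressedBytes + " bytes after compression");
console.log(cnnEstimate.savedBytes + " bytes saved");
```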
Conclusions
Designers have learned, over time, that they have to optimize their images for Internet consumption, saving down higher-quality versions to optimized JPG, GIF, or PNG files. Even with the increased bandwidth that's available, it's become ingrained in designers' minds and hearts that a web-optimized image is better, and for good reason. Even optimized images start adding up to a significant number of bytes in a fancy design or a site that showcases graphical elements.
Unfortunately, this lesson appears to have been forgotten when it comes to other types of web assets. Developers seem to think that whitespace is no space, but every whitespace character is 1 byte (2 bytes if your files are saved in a 16-bit Unicode encoding such as UTF-16), and all of that nice indentation and line spacing can sink your ship if you're not careful. Developers need to keep in mind that their JavaScript libraries and their CSS documents can and should be optimized for production. This is an especially important point to remember when you're producing a site that utilizes various Web 2.0 techniques. Those JavaScript libraries that power your fade effects, your drag-and-drop, your real-time data editing, and your awesome expanding portfolio, are thousands of lines long...
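To see how quickly whitespace adds up, the sketch below counts the whitespace characters in a tiny script and measures its size in two encodings. It assumes a Node.js environment for `Buffer.byteLength`; the snippet being measured is hypothetical.

```javascript
// Count the whitespace characters (spaces, tabs, line breaks) in a string.
function countWhitespace(source) {
  const matches = source.match(/\s/g);
  return matches ? matches.length : 0;
}

const snippet = "function add(a, b) {\n    return a + b;\n}\n";

const whitespaceCount = countWhitespace(snippet);
const utf8Bytes = Buffer.byteLength(snippet, "utf8");
const utf16Bytes = Buffer.byteLength(snippet, "utf16le");

console.log(whitespaceCount + " of " + snippet.length + " characters are whitespace");
console.log(utf8Bytes + " bytes as UTF-8, " + utf16Bytes + " bytes as UTF-16");
```

Even in this three-line function, roughly a third of the characters are whitespace, and the file doubles in size when saved in a two-byte encoding.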
Optimization and compression do not mean that nice, neat commenting and code formatting should be thrown out the window. On the contrary, those items are key in keeping script files legible and sensible. However, once one of these files is ready for release, it should be optimized, even if it's just a matter of stripping out all of the comments and whitespace, and the optimized versions placed on your production server. Better yet, run your code through a program like Dean Edwards' packer or the YUI Compressor from Yahoo! - these can remove whitespace and comments and will rewrite variables to the smallest possible size in order to shrink your files even further, helping to ensure that download times are kept to a minimum. Just make sure to keep your unoptimized, uncompressed, nicely formatted and commented development code around for reference and enhancement.
Just because there's been an increase in bandwidth, and most users are on broadband connections, doesn't mean that there's an infinite supply of it to go around. Pages that run upwards of 200 KB of total data are still going to download and run slower than an end user has come to expect in the era of 1.2 GHz processors, and web sites bloated with unoptimized code will ruin a surfer's experience, likely sending them elsewhere, including to the competition.



