Tuesday, December 13, 2011

How Google Can Quickly Update 200 Million Chrome Users


The development process of Chrome, and the way Google has scaled its release in an apparently effortless way to hundreds of developers and hundreds of millions of users, is one of the big success stories in software development these days. A Chrome developer recently gave some insight into this release process and the pillars that allow Google to succeed with a model that causes so many headaches for other software companies.

The notes come courtesy of Marc-Antoine Ruel, who published a summary of a presentation he gave at an agile development conference last month. Ruel’s explanations provide the most detailed information yet about the accelerated browser release cycle Google initiated a little over a year ago. There is quite a bit of background information that sheds light on Google’s philosophy of how to deal with what are now more than 200 million Chrome users and update them “within 6 hours.”

In contrast to Mozilla’s open “rapid release cycle”, which is still very much a work in progress and is continuously discussed as needing improvement, especially on the feature management side, Chrome’s approach appears to be much more defined and aggressive toward a few focus areas that may not be obvious to everyone at first sight. According to Ruel, a critical enabler of Google’s vision for developing and deploying Chrome is the idea of reducing “friction” for developers, users, the software itself and the development and release of security fixes.

Google Buildbot

The published document invites comparison to Mozilla, which followed Google with a 6-week release cycle earlier this year, but has not been able to deliver on the promise of faster feature implementation so far. The intended benefits of an accelerated cycle are twofold. First, software updates are less disruptive than an annual update with a greater number of changes to the software. Second, new features can be delivered to users when they are available, and not just in an annual upgrade. Microsoft, for example, still practices the annual model with IE10, which appears to be on a 12-month release cycle (or a 2-month cycle, if we include security fixes and the release of IE browsers with minor version number changes).

If all works well, this “agile” software deployment can also yield tremendous lessons in how to deal with the large user bases of web applications. The obvious downside of this development process is faster change for environments that do not like change, including corporate environments, which complained heavily about Mozilla’s switch from a 6-month to a 6-week cycle. Interestingly, this complaint has only hit Mozilla and not Google. Ruel’s document indicates that Google largely ignores such complaints, even if it provides administration tools that enable system administrators to set their own pace of upgrades. The company may be right in some respects: rapid releases have given Google a huge advantage in areas that count, namely convenience for most users and the fastest deployment cycle for security fixes, both of which contribute to the perception of Chrome as an always updated and secure browser. On many levels, Ruel’s presentation summary is a worthwhile read not just for those who are interested in Google’s ideas, but for all those dealing with the problem of how to translate agile software development for local apps to web apps.

Google Chrome Release Pipelining

It is apparent that Google does not take any chances on its master development branch. While most new features for Chrome are developed on this core branch, they are disabled just after “forking a release branch if not stable worthy”. This is the reason why more experimental features are available in the canary versions than in the developer, beta and stable branches of the browser. Google makes black-and-white decisions about features that do not have a chance to go stable for the current release. If development of a feature continues, it will have another chance in the next release. By the way, Google has four release versions in the works at this time: the stable version is at 15 and about to move to 16, while the canary and nightly branch is already at version 18. Since the browser’s launch in late 2008, Chrome developers have applied more than 104,000 changes to the code of Chrome. The best way to track those changes is the Chromium revision log, which logs every change made to the browser as well as to Chrome OS.
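The idea of building everything on one branch and switching unfinished features off per release can be illustrated with a minimal sketch. This is not Chrome’s actual mechanism: the `Channel` ordering, the `Feature` record and its `max_channel` field are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from enum import IntEnum


class Channel(IntEnum):
    """Release channels ordered from most experimental to most stable."""
    CANARY = 0
    DEV = 1
    BETA = 2
    STABLE = 3


@dataclass(frozen=True)
class Feature:
    name: str
    # Hypothetical field: the most stable channel this feature may ship on.
    max_channel: Channel


def enabled_features(features, channel):
    """Return the names of features left switched on for the given channel."""
    return [f.name for f in features if channel <= f.max_channel]


# Illustrative feature list; the names are made up.
FEATURES = [
    Feature("finished_feature", Channel.STABLE),
    Feature("nearly_ready_feature", Channel.BETA),
    Feature("experimental_feature", Channel.CANARY),
]

for channel in Channel:
    print(channel.name, enabled_features(FEATURES, channel))
```

In this sketch all code lives in the same tree, and a feature that is not “stable worthy” is simply filtered out of the more stable channels, which is why the canary channel ends up with the longest feature list.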

No matter which changes are made to the master branch, Google’s intent is to keep it in a state that is always “shippable”. The company refers to the branching of Chrome versions as “pipelining”, and the chart offered by Ruel certainly reveals why “pipelining” is an appropriate description for a process that includes parallel branches and accepts the abandonment of entire branches should they fail quality assurance. Ruel said that pipelining also enables Google to bring “dot releases” to the beta and stable channels for the purpose of deploying security fixes. Overall, there is no code in the development process that is older than 3 months, which helps Google with code base refactoring, a problem Mozilla Firefox developers are currently dealing with.

Ruel also provided some information on the forced, automated update process used in Chrome and the way Google tries to be “respectful” to other applications running on a PC and, in the end, to the user. For example, to avoid extension breakage, Google offers a free ping infrastructure for extension developers, which enables an automated update of extensions within 5 hours in a worst-case scenario. For the update process itself, Ruel said that Chrome never checks for updates on system startup, as the system is overloaded at that point anyway, and that it restricts the physical memory and bandwidth used for auto-updates. Background processes are slowed to reduce CPU usage and increase battery life on notebooks. However, if an update does have errors and breaks features for users, especially security features, Ruel said that the “release team must always be ready to fire and manual [quality assurance] must be minimal.” According to the developer, Google occasionally notices problems and provides fixes even before users notice the breakage.

Recently, sandboxing has been viewed as Chrome’s critical advantage in security, but Ruel noted that the key to securing the browser is to deploy fixes fast, unnoticed by the user. “The basic idea is to update the system but make it almost transparent and hopefully magical to the user,” Ruel said. “Subtle enough that most users won’t realize but power users will act upon. In both Google Chrome and ChromeOS, we decided to add a little green arrow when an update is ready to be installed.”
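Two of the “respectful” update rules Ruel describes, no update check at system startup and a cap on the bandwidth used, can be sketched in a few lines. This is only an illustration under stated assumptions: the `UpdatePolicy` class, its method names and the numeric values are hypothetical and are not taken from Chrome’s updater.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePolicy:
    # Both values are illustrative assumptions, not Chrome's real settings.
    startup_grace_seconds: float = 600.0
    max_bytes_per_second: int = 50_000

    def may_check_now(self, seconds_since_boot: float) -> bool:
        """Skip update checks while the machine is still busy starting up."""
        return seconds_since_boot >= self.startup_grace_seconds

    def min_download_seconds(self, update_size_bytes: int) -> float:
        """Lower bound on download time implied by the bandwidth cap."""
        return update_size_bytes / self.max_bytes_per_second


policy = UpdatePolicy()
print(policy.may_check_now(30))                 # shortly after boot
print(policy.may_check_now(3600))               # an hour after boot
print(policy.min_download_seconds(5_000_000))   # a hypothetical 5 MB update
```

The trade-off the sketch makes visible is that a tighter bandwidth cap is kinder to the user’s connection but stretches the time an individual machine needs to receive a fix.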

Wolfgang Gruener in Business Products on December 12
