A Brief History of HTML

HTML5 is scheduled to enter Last Call Working Draft status in October 2009, next month. While that's not really the end of work, it's close enough that introductory articles on HTML5 are popping up all over the place. Beyond the technical details, these articles have a different tone than the previous round of introductory markup articles, when XHTML was the new hotness. Where XHTML was presented as new with a reason (extensibility), HTML5 is offered as just new.

Part of that difference is just that the web has matured beyond the necessity to sell web standards; it's a given now. But another part of the difference in HTML5's pitch (or lack thereof) is the backstory, the process by which HTML5 came to be. It's an interesting story, and should be useful in understanding the why of HTML5.


In the beginning (1993), there was Tim Berners-Lee, and Tim created HTML. He said "this is good" and proposed a draft to the Internet Engineering Task Force (IETF), a standards organization. IETF drafts require implementations, so the HTML draft referenced Mosaic, a web browser that later became Netscape, which later became Firefox. Mosaic, of course, would have been rather worthless without HTML. So a symbiosis between browsers and web standards drove the web from the beginning.

When the original HTML draft expired in 1994, the IETF created the first HTML Working Group (HTMLWG), which created HTML 2. Also in 1994, Tim created the World Wide Web Consortium (W3C), with a mission "to lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web." If that sounds like what the HTMLWG was doing, that's because it was. The two standards bodies didn't run in parallel for long.

In 1996, after a series of additions to HTML 2, the IETF HTMLWG was closed and further work on HTML moved to the W3C. The W3C published HTML 3.2 and HTML 4, both in 1997. In December of 1999, HTML 4.01 was published. In the seventh year (2000), the W3C rested. And rested. And rested. HTML hasn't changed since. In summary: one man created HTML, two standards organizations worked on it for 5 years, and then it wasn't changed for 10 years.


And then there was XHTML. Backing up a bit, HTML was always based on SGML, the Standard Generalized Markup Language, an International Organization for Standardization (ISO) standard. Without getting into the details of SGML, many saw it as too ambiguous for a reliable web. Enter XML, the eXtensible Markup Language, a stricter subset of SGML with the goal of clearer communication. It's also, as the name implies, designed for extensibility.

The W3C didn't actually rest after HTML 4.01; it only rested on HTML, while publishing XHTML 1.0 in January 2000. XHTML 1.0 is essentially HTML 4 as XML instead of SGML. Many web publishers saw XHTML as the future and started publishing it instead of HTML 4. Some didn't see much value in XML syntax and decided not to switch, or to move back after trying it. One of the latter group was a guy named Ian Hickson, who in 2002 published a much-cited article on one key problem with XHTML: Microsoft's Internet Explorer browser didn't (still doesn't) handle XHTML well. Sometimes the symbiosis between browsers and standards doesn't work out so well. More on Ian later.


Like HTML, nothing has really changed with XHTML since 2000. But not for lack of trying. In 2002, the W3C first published XHTML 2, a complete rethinking of XHTML, this time less (not at all) focused on mirroring HTML 4 and more (completely) focused on extensibility. You may not have heard of XHTML 2, because browser vendors fell all over themselves in their rush to not implement it, so it never went anywhere. Symbiosis.

Many attribute XHTML 2's failure to the lack of compatibility with HTML 4 or XHTML 1. Of those who even noticed XHTML 2, one of the most scathing critiques of it came from Mark Pilgrim, who summarized a 2003 article with "Standards are bullshit. XHTML is a crock. The W3C is irrelevant." Ouch. More on Mark later.

2 Kings

Amidst the lack of progress on XHTML 2, there was a workshop in 2004 in which some people decided the W3C was no longer managing the web very well. So they created their own organization, the Web Hypertext Application Technology Working Group (WHATWG). To quote the WHATWG's FAQ:

The WHATWG was founded by individuals of Apple, the Mozilla Foundation, and Opera Software in 2004, after a W3C workshop. Apple, Mozilla and Opera were becoming increasingly concerned about the W3C’s direction with XHTML, lack of interest in HTML and apparent disregard for the needs of real-world authors. So, in response, these organizations set out with a mission to address these concerns and the Web Hypertext Application Technology Working Group was born.

Notably absent from that list of browser vendors is Microsoft, vendor of the most widely used browser. The WHATWG nonetheless pressed forward working on what they called Web Apps 1.0, an incremental improvement to HTML, which people quickly started referring to as HTML5. "But wait," you say, "I thought the W3C was in charge of HTML." Well yeah, so did they. So did the IETF. Remember that?

The initial WHATWG announcement was written by Ian Hickson, whom careful readers will remember from his earlier criticism of XHTML, specifically Microsoft's handling of it. You can follow the WHATWG's own take on their work at their blog, which is updated by Mark Pilgrim, who is also writing a book about HTML5. You may remember him as the guy who said "XHTML is a crock. The W3C is irrelevant." I did say this would be an interesting story.

1 Kings

Perhaps hoping the kids would get bored and disperse on their own, the W3C didn't directly tell the WHATWG to get off their lawn. They didn't much react to HTML5 at all until 2006, when W3C director Tim Berners-Lee (remember him? guy who started all this?) wrote "Reinventing HTML," in which he said "Some things are clearer with hindsight of several years. It is necessary to evolve HTML incrementally." That was a significant (complete) change from the W3C's previous position, essentially that HTML was dead, to be non-incrementally replaced by XHTML 2. To do that evolving, the W3C chartered a new HTML Working Group. If the new HTMLWG sounded a lot like the WHATWG, that's because it was.

But unlike the previous transition from the IETF to the W3C, the WHATWG didn't simply hand control of HTML back to the W3C. Even after the WHATWG's HTML5 draft was formally adopted by the HTMLWG in 2007, the WHATWG continued working on it. They did start working more closely with the W3C, but the relationship is ... what do the kids call it on Facebook now? ... it's complicated.

If you look at the HTML5 draft at the W3C, you'll find this explanation of who is creating the spec:

The W3C HTML Working Group is the W3C working group responsible for this specification's progress along the W3C Recommendation track... This specification is also being produced by the WHATWG.

The HTML Working Group is chaired by Sam Ruby of IBM and Chris Wilson of, wait for it... Microsoft. Ian is the editor of the HTML5 draft, and a member of both the HTMLWG and the WHATWG. Chris once said "I would hope in the eventuality of time the WHAT-WG would simply dissolve because it’s no longer necessary... in my opinion HTML is not in the hands of the WHAT-WG and never has been, despite calling a spec or set of specs 'HTML 5'; it belongs to the W3C."

Whereas Ian once said "The HTML5 work isn’t using the traditional W3C approach, and will never use a consensus approach so long as I am editor." In summary: HTML5 has been developed within the WHATWG, intentionally avoiding W3C processes, but with plans to have it be adopted by the W3C, via a group chaired by someone who thinks the WHATWG never had any authority to work on it in the first place.

Meanwhile, the W3C announced 2 months ago that XHTML 2 was officially dead. HTML5 does include XHTML5, an XML syntax of the language. So XHTML in general isn't dead, but it's not exactly healthy. Given the lack of any other path forward, it does seem very likely that somehow HTML5 will become the recommended web publishing format of both the WHATWG and the W3C's HTMLWG. Somehow.


One thing that is entirely clear about the future of HTML5 is that browsers will support it. That's clear because they already support it. One aspect of it, canvas, has been supported, and heavily relied upon, for years. If nothing else, HTML5 has already reversed the relationship between browsers and standards bodies. Instead of the W3C all but begging browsers to implement standards, browsers are now impatiently waiting for the W3C to recommend the standards they've already implemented.

So now you can answer questions about HTML5 without even looking at the draft, which is handy, because the draft is 400+ pages long. Why is there a new <video> tag in HTML5? Because some browser vendor (maybe the one that also owns a large video site) wanted it. Why are there so many scriptable interface elements in HTML5? Because some browser vendor (maybe the one selling phones without Flash support) wants them. Why is there no support for RDFa in HTML5? Apparently no browser vendor wanted it.

Beyond understanding how HTML5 got here, this background should also give you a good idea of where it's headed. Until the next major shift, browser vendors will be driving standards. If you want a <pegacorn> tag in HTML 6, get it supported by a browser or two.