Tagged.

September 4th, 2009

Part the first:

The Hypertext Markup Language was designed to describe the structure of scientific documents and how they related to one another.

Part the second:

The Hypertext Markup Language is used to describe the way a given screen of information is structured, presented and responds to user input.

Thesis:

Everything wrong with HTML5 can be ascribed to the differences between those two sentences.

The premise of HTML5 is maddeningly simple. Over the last few years, it’s become painfully obvious that the elements, attributes and behavior described by HTML4 are woefully out of touch with how developers are using it. The approach, as best I can tell, is to describe a new set of elements, attributes and behaviors that will be woefully out of touch with how developers are using it ten years from now.

Some of this insanity is being done in the name of “backwards compatibility”, though a good deal of the specification isn’t backwards compatible. Some is being done in the name of what browser manufacturers will actually implement, though most haven’t been able to implement things we agreed on over a decade ago, while implementing numerous things none of us even thought of but now find vital. And some is being done based on consensus and best practices, which are not only arbitrary, but subjective; prone to changes like the weather in Chicago.

HTML is a simple construct riddled with deep logic and a rich history. It began as a way to tell an application how a document was structured, so that it could figure out how to display it. Then we added ways to tell the applications how to display it. Then how the document should behave.

When that got too complicated to fit in one language, we divided the task. We created Cascading Stylesheets so that the HTML didn’t have to tell the browser how to display the document. JavaScript was invented to tell the browser how the page should behave. Over the last few years, we’ve managed to cobble from these three pillars rich and interactive screens of information, the likes of which HTML was never designed to describe.

For the most part, it works.

It works largely because we decided, collectively, that HTML was woefully inadequate at describing these things. We accepted that we did not have a perfect match for every element we needed, and were more than happy to just describe the general structure of the page, and rely on our presentation layer to make everything look the way we want it to look and for our behavioral layer to make our gadgets dance shiny-bright-and-right.

We can pretend that we choose our markup carefully, and that the elements we use do have meaning, but in reality, the elements we use are at best abstract ideals of the thing we’re describing. The very existence of HTML5, with its new tag soup, gives voice to that. The passionate arguing of the Super Friends and their ilk makes it clear that we know we’ve been lying to ourselves; that we’ve been compromising.

HTML5, as drafted, is just setting us up for new lies.

We’ve accepted we were meant to be women, and we’re just gonna apply some lipstick and hope we pass.

We were right.

The best part of the last six years of web development is how right so many people turned out to be about what we needed. The early evangelists drilled into our heads the idea that semantics and structure should be separated from display, that behavior should be added progressively. That we should consider the worst-case scenario and build around that.

The tiers we’ve set up work amazingly well.

Consider the stylesheet. Its basic construct is even simpler than HTML. A set of over 90 properties that can be applied to any element to make it look however we want. The benefit of this fantastically simple design is that we can transform near any element to look like damn near any other element. Yes, the specification has holes, some gaping, but the principle of its simplicity is sound enough that building atop it is easy, and that very soon, all major browsers will implement a good enough version of it that we can get our jobs done.

Consider JavaScript. Instead of a specific set of tools, JavaScript defines itself as a general-purpose programming language, with relatively advanced features, and a set of simple behaviors we can override at our will. I can make a click do near anything I want. It isn’t defined by any rules which say that the click must “perform an action” or “open a link”.

General rules with massive flexibility, agnostic to intent and focused on use. These defining qualities of CSS and JavaScript are what have gotten us this far.

Instead of exploring that success, HTML5 is a mess of forced semantics. It’s piles of paragraphs dedicated to when and why something should be called a footer. What a header should be. What elements should and shouldn’t have links on them.

Most of the tag soup in HTML5, and HTML in general, can be attributed to the belief that an element has an intrinsic meaning to a human being. A footer element exists so that someone can read the source code and know a thing is a footer. The specification doesn’t say that a footer should always appear at the bottom of a section, only that “they usually do”. It doesn’t outline exactly what we should put in a footer, only that they generally include “who wrote it, links to related documents, copyright data, and the like.” It defines nothing more than that it, and I love this, “represents a footer for its nearest ancestor sectioning content.”

What type of backwards pseudo-explanation is that? It’s an explanation destined to be read and parsed by humans. It has no technical merit. No presentational or behavioral limitations imposed beyond an arbitrary recommendation about what it should include.

The mind races.

Johnny robot can’t read.

This push towards creating an extended and representative set of elements is all about moving towards the semantic web. The belief, correct in my opinion, that to really move forward, we as developers need to be able to describe our applications and documents in richer, more appropriate ways that can give better hints as to the structure and nature of the thing we’re developing.

As our documents become more verbose, and better defined, new technologies will come online that can analyze these documents and infer meaningful relationships between them. That our computers will be able to know when there’s an address on a page, and when there’s someone’s name.

Part of what HTML5 seeks to do is codify a large set of appropriate tags that all developers will use, so that these technologies are possible.

As I mentioned earlier, developers have gotten quite good at adopting informal protocols to represent things. Better still, many browser developers have come to adopt the more widely used of these informal protocols, or to define their own, which we then adopt en masse. It’s not perfect, but it works. Because a large part of our site’s success rests on other people being able to parse and analyze our documents, the best of us will adopt informal protocols without the necessity of a codified standard. We’ve already done it. Think microformats. Sure, they’re not pervasive, but that’s largely because they’re overly verbose and hard to remember. The spirit of the new specification is right. An address should be represented by an address tag.

We’re simply not very smart. We have no real clue what we’ll be doing through a web browser in 10 years. The next big wave of innovation could come from expressive, 3D simulations. We may return to mostly desktop-oriented applications with the web becoming a rich system of data sources with limited formatting. We might abandon the whole thing and go back to gopher.

One great idea could make all of HTML5, as it stands, look immediately insufficient.

So why bother defining specific tags at all?

In the style of a waltz.

The basics of CSS provide the perfect foundation for an element agnostic markup language. CSS does not care which display properties you apply to which elements. There are defaults, yes, some of which different browser manufacturers disagree on, but the first thing any competent web developer does is define a set of CSS rules that resets all browsers to making everything essentially look exactly the same.

So there is nothing stopping me from using my own set of arbitrary elements, beyond the fact that someone decided to ignore them for no other reason than because a specification told them to.

If HTML becomes nothing but a light, formal specification for the construct of a structure, we never have to revisit this damn discussion again. Browser makers will pay attention only to the structure of a document, applying presentational rules as required. Developers will adopt informal standards for what tags to use, with some killer application eventually deciding that a particular standard is the one it prefers, and we’ll all use that.
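
To make that concrete, here is a rough sketch in TypeScript of what "structure only" could mean. Everything in it is made up for illustration: `MarkupNode`, `StyleRules`, `resolveStyles` and the `recipe`, `ingredient` and `fineprint` tags are hypothetical names, not part of any browser or specification. The toy renderer knows nothing about what a tag means. It starts every element from the same baseline (the reset) and layers on whatever rules the author supplied.

```typescript
// Hypothetical toy model: a node is a name, optional text, and children.
interface MarkupNode {
  tag: string;
  text?: string;
  children?: MarkupNode[];
}

// Author-supplied presentation rules, keyed by whatever tag names the author invented.
type StyleRules = Record<string, Record<string, string>>;

interface StyledNode {
  tag: string;
  style: Record<string, string>;
  children: StyledNode[];
}

// The "reset": every element, known or not, starts from the same baseline.
const baseline: Record<string, string> = { display: "block", margin: "0" };

// Walk the structure and attach styles. No tag name gets special treatment.
function resolveStyles(node: MarkupNode, rules: StyleRules): StyledNode {
  const style: Record<string, string> = { ...baseline, ...rules[node.tag] };
  const children = (node.children || []).map((child) => resolveStyles(child, rules));
  return { tag: node.tag, style, children };
}

// A document built entirely from invented element names.
const doc: MarkupNode = {
  tag: "recipe",
  children: [
    { tag: "ingredient", text: "2 eggs" },
    { tag: "fineprint", text: "Serves four." },
  ],
};

const rules: StyleRules = {
  ingredient: { display: "list-item" },
  fineprint: { "font-size": "0.8em" },
};

const styled = resolveStyles(doc, rules);
```

The `recipe` element has no rule of its own, so it simply keeps the baseline. Nothing in `resolveStyles` would change if a community later agreed on a different set of names.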

But without a standard, everyone will do things differently!

Well, yeah, for a while. But frankly, that’s going to happen anyway. Opera is going to do one thing with your lovingly suggested address element, Safari will do another, Firefox yet another, until one of them does it right and the rest will copy them to stay relevant. There’s a guy in a basement apartment right now thinking of a new kind of web browser that’s going to use all of this stuff in some manner none of us can fathom right now.

So break off the definition of specific element types into smaller, more agile working groups. Instead of revisiting a massive specification every 10 years, small communities can form around getting a subset of tags right, with the best standard becoming adopted by the vast majority of people. Advocates and evangelists will do what they’ve always done: make their case and make it loudly.

The notion of any standard, even one painstakingly constructed with the assistance of browser makers, ignores the idea that the guys working with us today might not be the ones we care about tomorrow. Safari, the best browser around, didn’t even exist until seven years ago, and you’re expecting the marketplace to remain stagnant for the next 25?

Insanity.

Behave, children.

Beyond the tag soup, the big HTML5 get is a new series of interactive form elements that will let developers create advanced, desktop-like behavior with better rich media capabilities.

My own excitement towards HTML5 is entirely based on these. Building something as basic as a slider, in the current environment, takes far too much code when compared to its desktop equivalent.

This has led to thousands of different interpretations as to what a slider should look like, and act like. It’s a usability nightmare that stems from a development one.

But I wonder why an element that allows a user to drag a control in one direction or another to define a numerical value will always be an “input” of type “range”. It could just as easily be a thermometer. Or a scale. Or a gauge. Depending on the interface we’re describing, any noun could apply to that functionality, just as any noun could apply to a block of text.

What we really need is for the attributes of a slider to be available to any element. We need a way to say that what I call a gauge, acts like a slider, and the value it returns is called steam pressure, not “integer”.
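
A minimal sketch of that separation, again in TypeScript and again entirely hypothetical: `SliderBehavior`, `NamedControl` and `readControl` are names I invented here, and no browser exposes anything like them. The point is only that the behavior (drag along one axis, get a number back) can be described once, while the element's noun and the name of its value stay up to the author.

```typescript
// Hypothetical: the behavior is described separately from the element's name.
interface SliderBehavior {
  valueName: string; // what the author calls the value, e.g. "steamPressure"
  min: number;
  max: number;
}

interface NamedControl {
  tag: string; // any noun the author likes: "gauge", "thermometer", "scale"
  behavior: SliderBehavior;
  position: number; // drag position, from 0 (start) to 1 (end)
}

// Clamp the drag position to [0, 1] and report the value under the author's own name.
function readControl(control: NamedControl): Record<string, number> {
  const clamped = Math.min(1, Math.max(0, control.position));
  const { valueName, min, max } = control.behavior;
  const result: Record<string, number> = {};
  result[valueName] = min + clamped * (max - min);
  return result;
}

// What I call a gauge acts like a slider, and its value is called steam pressure.
const gauge: NamedControl = {
  tag: "gauge",
  behavior: { valueName: "steamPressure", min: 0, max: 200 },
  position: 0.25,
};

const reading = readControl(gauge); // { steamPressure: 50 }
```

Swap `tag` for `thermometer` and `valueName` for `degrees` and the same function serves a completely different interface, which is the flexibility CSS and JavaScript already give us.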

Why are we correcting issues with one set of semantic names, which work but aren’t quite right, only to maintain and extend an entirely different set of nouns which are the most generic of the bunch?

Still with me?

I don’t mean to impugn or insult any of the people who’ve worked so hard, and so well, on the HTML5 specification. What they’ve managed to do, take a decidedly dead thing and give it life, is amazing. I applaud them. If I meet them, I’ll shake their hands, kiss their babies, and praise their gods.

But like them, this is the thing I do every day. The thing I think about every day.

And I guess I have some pretty strong feelings on the topic.