Update (27 Sept 2009): HTML5 now has Custom Data Attributes, a feature that provides exactly the function of "HTML annotations" that I describe here. I have added two examples to the end of the list, one just setting the "data-" attributes, the other also using the new dataset DOM attribute to retrieve the values.

What is an Annotation?

An annotation is an item of information that you wish to closely associate with some existing data, where the existing data is given in some standard data format. Often, although not always, the annotation represents metadata, i.e. data about data. In other cases, an annotation can extend the data in some way, or extend the behaviour of the "object" that the annotated data represents.

Typically, a given data format has a primary application: the application intended to process that data. The need for annotations arises when some secondary application wishes to process the same data format but also needs the extra data that makes up the annotation. (Sometimes the secondary application is an extension or a later version of the primary application.)

If the data being annotated represents some kind of source code, then for reasons of legibility, it is desirable to place each annotation as close as possible to the thing being annotated.

Java Annotations

Some well-known examples of annotation relate to the Java programming language. In fact, Java 1.5 and later versions have an explicit annotation facility, in which classes, fields and methods can be annotated with annotation objects constructed from annotation classes (which the programmer defines in separate annotation class source files).
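As a minimal sketch of this facility, the following defines an annotation type, applies it to two fields, and reads it back by reflection. The names Label, Person and AnnotationDemo are hypothetical, invented for this illustration, and all three types are kept non-public in one file only so that the example is self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation type, invented for this illustration.
@Retention(RetentionPolicy.RUNTIME) // keep the annotation visible to reflection at run time
@Target(ElementType.FIELD)          // permit it on fields only
@interface Label {
    String value();
}

// Hypothetical annotated class: the "existing data" that the primary
// application (the compiler and the JVM) processes as usual.
class Person {
    @Label("Full name")
    String name;

    @Label("Age in years")
    int age;

    String internalNote; // deliberately left without an annotation
}

// Plays the role of the "secondary application" that needs the extra data.
class AnnotationDemo {
    // Returns the label text of the named field, or null if the field
    // carries no @Label annotation.
    static String labelOf(Class<?> type, String fieldName) {
        try {
            Field field = type.getDeclaredField(fieldName);
            Label label = field.getAnnotation(Label.class);
            return label == null ? null : label.value();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(labelOf(Person.class, "name"));
        System.out.println(labelOf(Person.class, "age"));
        System.out.println(labelOf(Person.class, "internalNote"));
    }
}
```

Note that each annotation sits directly on the field it describes, and that code which never asks for Label is unaffected by its presence.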

Prior to the addition of this formal annotation facility, the only official annotation scheme in Java code was Javadoc, which required annotations to be written in the comments (and which could be regarded as a structured form of commentary). But various other unofficial annotation schemes and annotation-based frameworks appeared before Java 1.5, including the following:

Other Examples

There is one kind of annotation that can't be done, although everyone wishes it could be: an annotation specifying the character set encoding of a text file. Every byte in a text file represents a character or part of a character, so there is nowhere to place even the tiniest annotation describing the encoding. As a result, programmers are forever doomed to process text files without knowing for sure what their character set encodings are.
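The consequence can be sketched in a few lines of Java. The class and method names here are hypothetical, and the byte array stands in for the contents of a text file. The same bytes decode without complaint under two different charsets, and nothing in the bytes says which reading was intended.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Decodes the bytes under whichever charset the caller guesses;
    // the bytes themselves carry no hint about the right one.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter,
        // written as an escape so the source file's own encoding is irrelevant.
        byte[] fileContents = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct guess: recovers the original four characters.
        String asUtf8 = decode(fileContents, StandardCharsets.UTF_8);

        // Wrong guess: also succeeds, but yields five characters, because
        // the two bytes of the accented letter are read as two characters.
        String asLatin1 = decode(fileContents, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length() + " characters as UTF-8");
        System.out.println(asLatin1.length() + " characters as ISO-8859-1");
    }
}
```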

HTML Annotations

HTML is a modern structured data format which has been through various revisions and improvements. One might therefore expect it to have ample provision for annotations. But what I have found is that although there are various methods by which HTML files and components within HTML files can be annotated, none of these methods is completely satisfactory.

The following is a list of criteria by which I would judge any annotation scheme:

None of the HTML annotation schemes in my survey fully satisfies all of these criteria, although some come close.

The Example

For the purpose of testing HTML annotation schemes, I have devised a simple but perhaps slightly contrived example, as follows:

The nine examples are as follows:

Auxiliary files:

(Note: source code views were generated using Pygments.)