What is an Annotation?
An annotation is an item of information that you wish to closely associate with some existing data, where the existing data is given in some standard data format. Often, although not always, the annotation represents metadata, i.e. data about data. In other cases, an annotation can extend the data in some way, or extend the behaviour of the "object" that the annotated data represents.
Typically a given data format has a primary application which is the application intended to process that data. The need for annotations arises when some secondary application wishes to process the same data format, but it also needs the extra data which makes up the annotation. (Sometimes the secondary application is an extension or a later version of the primary application.)
If the data being annotated represents some kind of source code, then for reasons of legibility, it is desirable to place each annotation as close as possible to the thing being annotated.
Java Annotations
Some well-known examples of annotation relate to the Java programming language. In fact, Java 1.5 and later versions have an explicit annotation facility, where classes, fields and methods can be annotated with annotation objects which are constructed from annotation classes (which are defined by the programmer in separate annotation class source files).
Prior to the addition of this formal annotation facility, the only official annotation scheme in Java code was Javadoc, which required annotations to be written in the comments (and which could be regarded as a structured form of commentary). But various other unofficial annotation schemes and annotation-based frameworks appeared before Java 1.5, including the following:
- JML: a formal specification system for Java which embeds assertions and specifications for methods and code within comments.
- XDoclet: a general system for placing Javadoc-like annotations within source code.
- Naked Objects: a GUI framework which annotates classes, fields and methods using "about" methods and attributes.
Other Examples
- The properties or tags contained in media file formats such as MP3 or JPEG.
- File extensions. File extensions are an example of how to annotate when there is nowhere to put the annotation – put the annotation into the name of the thing being annotated.
There is one kind of annotation which can't be done, but everyone wishes it could be, which is an annotation specifying the character set encoding for a text file. Every byte in a text file represents a character or part of a character, so there is nowhere to place even the tiniest annotation describing the encoding. As a result, programmers are forever doomed to process text files not knowing for sure what the character set encodings of those text files are.
HTML Annotations
HTML is a modern structured data format which has been through various revisions and improvements. One might therefore expect it to have ample provision for annotations. But what I have found is that although there are various methods by which HTML files and components within HTML files can be annotated, none of these methods is completely satisfactory.
The following is a list of criteria by which I would judge any annotation scheme:
- Proximity: the annotations should be close to the data being annotated.
- Non-kludginess: the annotation author should not have to be devious or indirect in how they write the annotation into the data file.
- Syntactic Correctness: the annotations should not violate the official syntactic definition of the data format. In the case of HTML, this means that the annotated HTML should validate.
- Reliability: the annotations should be reliably parsed and interpreted by the application that reads them.
- No side-effects: the annotations should not impact on the "primary" customer of the data format.
None of the HTML annotation schemes in my survey fully satisfies all of these criteria, although some come close.
The Example
For the purpose of testing HTML annotation schemes, I have devised a simple but perhaps slightly contrived example, as follows:
- An HTML table is given, consisting of one column, where each table cell contains the name of one country. To keep the examples short, I have limited myself to the six Western English-speaking countries.
- A Javascript function drawPopulationBars, invoked in the body element's onload handler, takes one argument which is the name of a function to extract an annotated population value applicable to the table cell containing the country name. (The population values are taken from the CIA World Factbook and rounded to the nearest million.)
- For each country, the drawPopulationBars function determines the population value, and then adds a second table cell to the table row which contains a coloured bar whose width is proportional to the population, creating a sideways bar chart of country populations.
- To test the validation requirement, I have adopted XHTML 1.0 Strict. (There is also XHTML 1.1, but as far as I can tell this does not add any features that make it possible to create HTML annotations that satisfy all the criteria in my list.)
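The bar-drawing step described above can be sketched as a pure function, without any DOM code. This is only a minimal sketch under stated assumptions, not the code behind the examples: computeBarWidths, maxWidthPx and the sample rows are hypothetical names and made-up values chosen for illustration, and the extractor is passed as a function value rather than by name.

```javascript
// Hypothetical helper: given table rows and a function that extracts the
// annotated population from a cell, return the bar width for each row.
// The widest bar gets maxWidthPx pixels; the others are scaled in proportion.
function computeBarWidths(rows, getPopulation, maxWidthPx) {
  const populations = rows.map((row) => getPopulation(row.cell));
  const largest = Math.max(...populations);
  return rows.map((row, i) => ({
    country: row.country,
    widthPx: largest > 0 ? Math.round((populations[i] / largest) * maxWidthPx) : 0,
  }));
}

// Made-up rows, not real population figures. Each "cell" stands in for a
// table cell element; here the annotation is just a plain property.
const sampleRows = [
  { country: "Country A", cell: { population: 40 } },
  { country: "Country B", cell: { population: 10 } },
];

const bars = computeBarWidths(sampleRows, (cell) => cell.population, 200);
console.log(bars);
```

Keeping the extractor as a parameter is what lets the same drawing code be reused for every annotation scheme: only the function that reads the annotation changes.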
The eleven examples are as follows:
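- "Obtrusive" Javascript (source) For each country place a script element next to the table cell, which specifies the population for that country via a global population map. Unfortunately this fails the validation test, because XHTML Strict does not allow a script element directly inside a table row, which is where it has to go to sit next to the table cell. (A script element is permitted inside a table cell or elsewhere in the body.)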
- "Semi-Obtrusive" Javascript (source) Add a script element in the HTML head section to specify values in the global map. This validates, but it no longer fully satisfies the proximity requirement.
- "Unobtrusive" Javascript (source) Same as the previous scheme, but now the Javascript is in a separate file. "Unobtrusive" Javascript is considered the absolute best kind of Javascript by some web programmers, but as an annotation system it definitely fails the proximity requirement.
- Attributes (source) Place each population value in a "population" attribute of the corresponding table cell tag. For simple data values, this is the least kludgy and most natural way of annotating data. Unfortunately it fails validation, because XHTML does not allow "population" attributes or any other user-defined attributes on HTML tags.
- Namespace Attributes (not working) (source) Same as the previous scheme, but using a separate XML namespace. Unfortunately this scheme doesn't even work, and in Internet Explorer you may get an error dialog (depending on your Javascript AKA "script" settings). My code appears to be correct (as compared to various XML worked examples I have seen) and Firefox does have a getAttributeNS function defined, which presumably is intended to be used for something, so it is not too clear why this example doesn't work.
- "id" Attributes (source) Place the population data into the "id" attributes of the table cells, after the string "population_". This works, and it validates, but it has a distinctly kludgy feel, especially as the "id" attribute is supposed to be a unique identifier, and an identifier should not change just because some attribute of the thing being identified is changing.
- "class" Attributes (source) Place the population data into the "class" attributes of the table cells, after the string "population_". This is slightly better than the "id" option, because at least class attributes can have multiple values, and any unknown values are just ignored by CSS. But it is still somewhat kludgy.
- Invisible Elements (source) The population data is placed in a span element which is styled with CSS style display: none. This validates, and it works. There is, however, an unintended side-effect if the user is browsing with page style disabled, since the "hidden" data then becomes visible, whether the author wants it to be or not. Also, there is a risk that "invisible" data may be interpreted as an attempt at illegitimate SEO, resulting in your website being blacklisted by major search engines.
- HTML Comments (source) Place the population data in comments. This works and it validates. It doesn't even look too kludgy. Comments have historically been a favourite place to put annotations, and it is almost a law of computing that if a given data format has a comment syntax, then that syntax will eventually be used for the purposes of annotation. For example, "conditional comments" are already used for this purpose within Internet Explorer to specify code that should only be included by particular versions of the browser. And of the Java annotation examples given above, two of them involved annotations inside comments. There is just one problem with using comments to annotate HTML, which is that according to the XHTML specification, "XML parsers are permitted to silently remove the contents of comments". So it's a use-at-your-own-risk kind of annotation scheme.
- HTML5 "data-*" Attributes (source) Place each population value in a "data-population" attribute of the corresponding table cell tag. This uses the official new HTML5 Custom Data Attributes feature.
- HTML5 "data-*" Attributes using "dataset" (source) Place each population value in a "data-population" attribute of the corresponding table cell tag (as for previous example), and use the new dataset DOM attribute to return the data. Possibly doesn't yet work in your browser (but should work in the future).
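To make the attribute-based and comment-based schemes above more concrete, here is a rough sketch of what the corresponding extractor functions might look like. It is an illustration only, not the code from the linked sources: the function names, the fakeCell stand-in, the value 12 and the "population: 12" comment layout are all assumptions. Only standard DOM members are used on the cell (getAttribute, className, childNodes, nodeType, nodeValue), so the extractors should also accept a real table cell element.

```javascript
const POPULATION_PREFIX = "population_";
const COMMENT_NODE = 8; // numeric value of Node.COMMENT_NODE in the DOM

// "id" scheme: id="population_<value>"
function populationFromId(cell) {
  const id = cell.getAttribute("id") || "";
  return id.startsWith(POPULATION_PREFIX)
    ? Number(id.slice(POPULATION_PREFIX.length))
    : null;
}

// "class" scheme: one of possibly several space-separated class names
function populationFromClass(cell) {
  const names = (cell.className || "").split(/\s+/);
  const match = names.find((name) => name.startsWith(POPULATION_PREFIX));
  return match ? Number(match.slice(POPULATION_PREFIX.length)) : null;
}

// HTML5 scheme: data-population="<value>"
function populationFromDataAttribute(cell) {
  const value = cell.getAttribute("data-population");
  return value ? Number(value) : null;
}

// Comment scheme, assuming a comment of the form <!-- population: <value> -->
function populationFromComment(cell) {
  for (const node of Array.from(cell.childNodes)) {
    if (node.nodeType !== COMMENT_NODE) continue;
    const match = /population:\s*(\d+)/.exec(node.nodeValue);
    if (match) return Number(match[1]);
  }
  return null;
}

// Hypothetical stand-in for a table cell, so the sketch runs outside a browser.
function fakeCell(attributes, comments = []) {
  return {
    className: attributes["class"] || "",
    getAttribute: (name) => (name in attributes ? attributes[name] : null),
    childNodes: comments.map((text) => ({ nodeType: COMMENT_NODE, nodeValue: text })),
  };
}

console.log(populationFromId(fakeCell({ id: "population_12" })));
console.log(populationFromClass(fakeCell({ class: "country population_12" })));
console.log(populationFromDataAttribute(fakeCell({ "data-population": "12" })));
console.log(populationFromComment(fakeCell({}, [" population: 12 "])));
```

In a browser that supports the dataset property, cell.dataset.population returns the same string as cell.getAttribute("data-population"), so the last HTML5 scheme differs from the previous one only in how the value is read.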
(Note: source code views were generated using Pygments.)