Update (27 Sept 2009): HTML5 now has Custom Data Attributes, a feature that provides exactly the function of "HTML annotations" that I describe here. I have added two examples to the end of the list below: one that just sets the "data-" attributes, and another that also uses the new dataset DOM attribute to retrieve the values.
What is an Annotation?
An annotation is an item of information that you wish to closely associate
with some existing data, where the existing data is given in some standard
data format. Often, although not always, the annotation represents metadata,
i.e. data about data. In other cases, an annotation can extend the data in
some way, or extend the behaviour of the "object" that the annotated data represents.
Typically a given data format has a primary application, which is the application intended to process that data. The need for annotations arises when some secondary application wishes to process the same data format but also needs the extra data that makes up the annotation. (Sometimes the secondary application is an extension or a later version of the primary application.)
If the data being
annotated represents some kind of source code, then for reasons of legibility,
it is desirable to place each annotation as close as possible to the thing
being annotated.
Java Annotations
Some well-known examples of annotation relate to the Java programming language. In fact, Java 1.5 and later versions have an explicit annotation facility, in which classes, fields and methods can be annotated with annotations whose types are defined by the programmer (using the @interface keyword, usually each in its own source file).
Prior to the addition of this formal annotation facility, the only official annotation scheme
in Java code was Javadoc, which required annotations to be written in the comments
(and which could be regarded as a structured form of commentary). But various other
unofficial annotation schemes and annotation-based frameworks appeared before Java 1.5,
including the following:
- JML: a formal specification system
for Java which embeds assertions and
specifications for methods and code within comments.
- XDoclet: a general
system for placing javadoc-like annotations within source code.
- Naked Objects: a GUI framework which annotates
classes, fields and methods using "about" methods and attributes.
Other Examples
- The properties or tags contained in media file formats such as MP3 or JPEG.
- File extensions. File extensions are an example of how to annotate when there is nowhere
to put the annotation – put the annotation into the name of the thing being annotated.
There is one kind of annotation that everyone wishes were possible but which can't be done: an annotation specifying the character set encoding of a text file. Every byte in a text file represents a character or part of a character, so there is nowhere to place even the tiniest annotation describing the encoding. As a result, programmers are forever doomed to process text files without knowing for sure what their character set encodings are.
HTML Annotations
HTML is a modern structured data format which has been through various
revisions and improvements. One might therefore expect it to have ample provision for annotations.
But what I have found is that although there are various methods by which HTML files and components
within HTML files can be annotated, none of these methods is completely satisfactory.
The following is a list of criteria by which I would judge any annotation scheme:
- Proximity: the annotations should be close to the data being annotated.
- Non-kludginess: the annotation author should not have to be devious or indirect in how they write the annotation into the data file.
- Syntactic Correctness: the annotations should not violate the official syntactic definition
of the data format. In the case of HTML, this means that the annotated HTML should validate.
- Reliability: the annotations should be reliably parsed and interpreted by the
application that reads them.
- No side-effects: the annotations should not affect the "primary" customer of the data format (i.e. the primary application).
None of the HTML annotation schemes in my survey fully satisfies all of these criteria, although some come close.
The Example
For the purpose of testing HTML annotation schemes, I have devised a simple but perhaps
slightly contrived example, as follows:
- An HTML table is given, consisting of one column, where each table cell contains the name of
one country. To keep the examples short, I have limited myself to the six Western English-speaking
countries.
- A Javascript function drawPopulationBars, invoked in the body element's onload handler, takes one argument: the name of a function that extracts the annotated population value from the table cell containing the country name. (The population values are taken from the CIA World Factbook and rounded to the nearest million.)
- For each country, the drawPopulationBars function determines the population value,
and then adds a second table cell to the table row which contains a coloured bar whose width
is proportional to the population, creating a sideways bar chart of country populations.
- To test the validation requirement, I have adopted XHTML 1.0 Strict. (There is also XHTML 1.1, but as far as I can tell it does not add any features that make it possible to create HTML annotations satisfying all the criteria in my list.)
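The bar-drawing part of drawPopulationBars comes down to scaling each population to a bar width. Here is a minimal, DOM-free sketch of that step, assuming the widest bar should be maxWidthPx pixels wide; barWidths and maxWidthPx are hypothetical names, not taken from the examples' source, and the numbers are purely illustrative.

```javascript
// Hypothetical helper (not from the original examples): scale each population
// so the largest one gets a bar maxWidthPx wide and the others are proportional.
function barWidths(populations, maxWidthPx) {
  const max = Math.max(...Object.values(populations));
  const widths = {};
  for (const [country, population] of Object.entries(populations)) {
    // Guard against division by zero if every population is 0.
    widths[country] = max > 0 ? Math.round((population / max) * maxWidthPx) : 0;
  }
  return widths;
}

// Illustrative numbers only, not the CIA World Factbook figures.
const widths = barWidths({ "Country A": 300, "Country B": 60, "Country C": 30 }, 400);
console.log(widths); // Country A: 400, Country B: 80, Country C: 40
```

In the page itself, each computed width would then be applied to the coloured bar in the newly added table cell, for example through its style.width property.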
The eleven examples are as follows (the last two are the HTML5 examples added in the update described at the top):
- "Obtrusive" Javascript (source) For each country, place a script element next to the table cell, which specifies the population for that country via a global population map. Unfortunately this fails the validation test, because a script element is not allowed at that point in the table markup (in XHTML 1.0 Strict a tr element may contain only th and td elements).
- "Semi-Obtrusive" Javascript (source) Add a script element in the
HTML head section to specify values in the global map. This validates, but it no longer fully satisfies
the proximity requirement.
- "Unobtrusive" Javascript (source) Same as the previous scheme, but now the Javascript
is in a separate file. "Unobtrusive" Javascript is considered the absolute best kind of Javascript by some web
programmers, but as an annotation system it definitely fails the proximity requirement.
- Attributes (source) Place each population value in a "population" attribute of the corresponding table cell tag. For simple
data values, this is the least kludgy and most natural way of annotating data. Unfortunately it fails validation, because XHTML does not allow "population" attributes or any other user-defined attributes on HTML tags.
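- Namespace Attributes (not working) (source) Same as the previous scheme, but with the attributes placed in a separate XML namespace. Unfortunately this scheme doesn't even work, and in Internet Explorer you may get an error dialog (depending on your script error settings). My code appears to be correct (compared with various XML worked examples I have seen), and Firefox does define a getAttributeNS function, which presumably is intended to be used for something, so it is not too clear why this example doesn't work. One likely explanation is that a page served with the text/html media type is parsed as HTML rather than XML, so namespace prefixes are not processed and getAttributeNS finds no attribute in the requested namespace; Internet Explorer (before version 9) does not implement getAttributeNS at all.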
- "id" Attributes (source) Place the population data into the "id" attributes of the table cells, after the string "population_". This works, and it validates, but it has a distinctly kludgy feel, especially as the "id" attribute is supposed to be a unique identifier: an identifier should not change just because some attribute of the thing being identified changes, and two countries with the same (rounded) population would end up with duplicate ids, which is invalid.
- "class" Attributes (source) Place the population data into the "class" attributes of the table cells, after the
string "population_". This is slightly better than the "id" option, because at least class attributes can have multiple values, and any unknown values are just ignored by CSS. But it is still somewhat kludgy.
- Invisible Elements (source) The population data is placed in a span element styled with the CSS rule display: none. This validates, and it works. There is, however, an unintended side-effect if the user is browsing with page style disabled: the "hidden" data then becomes visible, whether the author wants it to be or not. Also, there is a risk that "invisible" data may be interpreted as an attempt at illegitimate SEO (hidden text), which could result in your website being penalised or blacklisted by major search engines.
- HTML Comments (source) Place the population data in comments. This works and it validates. It doesn't even look too kludgy. Comments have historically been a favourite place to put annotations, and it is almost a law of computing that if a given data format has a comment syntax, then that syntax will eventually be used for annotation. For example, Internet Explorer already uses "conditional comments" to specify markup that should only be processed by particular versions of the browser. And of the three unofficial Java annotation schemes listed above, two (JML and XDoclet) place their annotations inside comments. There is just one problem with using comments to annotate HTML, which is that according to the XHTML 1.0 specification (Appendix C.4), "XML parsers are permitted to silently remove the contents of comments". So it's a use-at-your-own-risk kind of annotation scheme.
- HTML5 "data-*" Attributes (source) Place each population value in a "data-population" attribute of the corresponding table cell tag. This uses the official new HTML5 Custom Data Attributes feature. (Note that data-* attributes are valid HTML5 but are not part of the XHTML 1.0 Strict DTD.)
- HTML5 "data-*" Attributes using "dataset" (source) Place each population value in a "data-population" attribute of the corresponding table cell tag (as in the previous example), and use the new dataset DOM attribute to retrieve the data. This may not yet work in your browser, but it should work in the future.
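The extraction step is where the schemes above really differ. As a rough sketch (not the code from the examples' source files), the helpers below recover a population value for the "id", "class" and comment schemes, and compute the dataset property name for a data-* attribute. All function names are hypothetical, and the comment helper assumes the comment text is the same "population_<n>" token used by the id and class examples. The helpers take plain strings or node-like objects so that they can run outside a browser.

```javascript
// Hypothetical helpers: each recovers the population annotation from one of
// the storage schemes. Assumed token format: "population_<digits>".
const TOKEN = /^population_(\d+)$/;

// "id" scheme: the whole id is "population_<n>".
function populationFromId(id) {
  const match = TOKEN.exec(id);
  return match ? Number(match[1]) : null;
}

// "class" scheme: class is a space-separated list; other class names are ignored.
function populationFromClass(className) {
  for (const token of className.trim().split(/\s+/)) {
    const match = TOKEN.exec(token);
    if (match) return Number(match[1]);
  }
  return null;
}

// Comment scheme: scan a cell's child nodes for a comment node
// (nodeType 8, the value of Node.COMMENT_NODE) whose text is the token.
const COMMENT_NODE = 8;
function populationFromComments(childNodes) {
  for (const node of Array.from(childNodes)) {
    if (node.nodeType === COMMENT_NODE) {
      const match = TOKEN.exec(node.data.trim());
      if (match) return Number(match[1]);
    }
  }
  return null;
}

// data-* scheme: the dataset property name for a data-* attribute, following
// the HTML5 rule (drop "data-", then "-x" becomes "X" for lowercase ASCII x).
function datasetKey(attributeName) {
  return attributeName
    .slice("data-".length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

console.log(populationFromId("population_42"));            // 42
console.log(populationFromClass("country population_42")); // 42
console.log(populationFromComments([
  { nodeType: 3, data: "Country A" },        // text node
  { nodeType: 8, data: " population_42 " },  // comment node
]));                                                        // 42
console.log(datasetKey("data-population"));                 // population
console.log(datasetKey("data-population-year"));            // populationYear
```

In a page, these would be called as populationFromId(cell.id), populationFromClass(cell.className) and populationFromComments(cell.childNodes). For the data-* scheme, cell.getAttribute("data-population") works even in browsers that lack dataset support, whereas cell.dataset.population (that is, cell.dataset[datasetKey("data-population")]) requires it.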
(Note: source code views were generated using Pygments.)