Miski: A White Paper

Philip Dorrell, 3 June 2000.

What is Miski ?

Is it a bulletin board ? Is it Usenet ? Is it email ? Is it email lists ? Is it push technology ? Miski is the code word for a new technology that has elements of all these existing technologies, but is actually not the same as any of them. A long-winded technical name for it is Poster-Centric Message Subscription Protocol, or PCMSP for short.

If there is one idea that underlies Miski, it is the following -

The best determinant of the likely quality of information that you have not yet received is the source of that information.

The result of using this as a basic design principle is that Miski, in its normal mode of operation, is intrinsically spam-proof.

A brief summary of the features that define the Miski system is as follows -

Each user has a unique identifier in the system.
Each user identifier consists of a domain name and a user name unique within that domain. The same syntax can be used as for email, however it should be noted that a 1-to-1 mapping between Miski names and email addresses is not necessarily desirable (in particular due to spammability issues, see below for more details).
Each user defines a subject space. This consists of a list of subjects. Each subject must have a self-contained description, and to make the subject completely context independent, must be prefixed by a standard language identifier.
A user can also define an inclusion relation on their subject space, which must be a partial ordering, i.e. with no cycles.
A second user can subscribe to a subject of a first user. They will then receive all messages posted by the first user in that subject, or in a subject included in the subscribed subject according to the posting user's subject inclusion relation.
A second user reading a posted message from a first user that they have subscribed to can repost that message into their own subject space. This causes all users subscribed to that subject in the second user's subject space to receive the reposted message, if they have not received it already. Messages can be reposted an indefinite number of times.
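
As a rough illustration of the subject space and inclusion relation listed above, the following sketch stores the inclusion relation as a set of direct parents per subject, rejects any inclusion that would create a cycle, and decides whether a post in one subject should reach a subscriber to another. The class and method names are hypothetical, and language prefixes on subjects are omitted for brevity.

```python
from collections import defaultdict


class SubjectSpace:
    """One user's subjects plus an inclusion relation (child included in parent)."""

    def __init__(self, subjects):
        self.subjects = set(subjects)
        self.parents = defaultdict(set)  # subject -> subjects that directly include it

    def ancestors(self, subject):
        """All subjects that include `subject`, directly or transitively."""
        seen, stack = set(), [subject]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def include(self, child, parent):
        """Declare that `child` is included in `parent`, refusing cycles."""
        if child not in self.subjects or parent not in self.subjects:
            raise KeyError("both subjects must already be defined")
        if child == parent or child in self.ancestors(parent):
            raise ValueError("inclusion would create a cycle")
        self.parents[child].add(parent)

    def matches(self, subscribed, posted):
        """True if a post in `posted` should reach a subscriber to `subscribed`."""
        return posted == subscribed or subscribed in self.ancestors(posted)


space = SubjectSpace(["Animals", "Pets", "Rabbits", "Pet Rabbits"])
space.include("Pets", "Animals")
space.include("Rabbits", "Animals")
space.include("Pet Rabbits", "Pets")
space.include("Pet Rabbits", "Rabbits")
```

With this data, a subscriber to "Animals" would receive a post made under "Pet Rabbits", while a subscriber to "Pet Rabbits" would not receive a post made under "Rabbits".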

There is more than one way that such a system could be implemented; however, the following arrangement gives the best scalability, accountability and a natural mapping to the domain name system:

Each user runs client software which communicates with that user's own server, the server being determined as a function of the domain name part of their user id. The user's server maintains that user's subject space.
When subscribing, the subscribing user communicates with their own server, which then communicates (if necessary) with the server of the user being subscribed to. Subscription information is retained on both the subscribing user's server and on the posting user's server.
A user posting messages posts them to their own server. The posting user's server delivers header information to the servers of subscribers who are to receive the message. The message body remains on the posting user's server, as a normal web page.
The subscribers' servers deliver message headers to the subscribers in whatever manner the subscribers have arranged to have them received. The headers include a web URL pointing to message bodies for those messages for which the receiving users decide they want to receive the full contents.
A user reposts a message by communication with their own server. The header is augmented with details of the reposting user, and delivered to subscribers of the reposting user. The subscribers' servers check the unaugmented portion of the header to determine if their subscribers have already received the header of the reposted message. If not, they deliver the header to their subscribers. The message contents remain on the original posting user's server, and are retrieved from there.

Reposting can be regarded as a means of moderation, in that any user can effectively moderate any message in the sense of making it available to users who have subscribed to them. It can also be regarded as a form of deep linking, and all users posting messages via Miski implicitly give their permission for their messages to be linked to in this fashion. Message contents remain under the original posting user's control, and can if necessary be edited or even deleted.

Spammability

A problem with many existing communication systems on the Internet is that they assume good intentions on the part of all users. This is not a realistic assumption for a system that can be used by hundreds or even thousands of millions of users subject only to their ability to pay communications charges. The worst sort of abuse is called spam, which consists of blatantly irrelevant content posted to a discussion system or sent to an individual user by some means, or even just left in a place where a search engine will find it.

The retro-fitted solution is to actively identify and delete the most blatant examples of spam. But this type of spam is only the most obvious symptom of a more general problem, which is that there is no way for an open system to guarantee even a reasonable probability of quality in content, especially if one takes into account the fact that quality is often very subjective.

The drastic cure imposed by the Miski system is that a subscriber only receives messages posted by users that they have subscribed to. This is the poster-centric aspect of the system. Reposting is a mechanism that makes it possible for users to receive messages originally posted by users that they are not subscribed to, and turns Miski from a somewhat closed system into a system that is as open as a system can be without being spammable.

Above I qualified the spam-resistant nature of Miski by stating that it only applies to its normal mode of operation. There are two main circumstances where users can choose to expose themselves to spam to a controlled extent if they wish to.

The first is that users will inevitably wish to search Miski servers for other users advertising themselves as interested in given topics. This will be vulnerable to spamming in the same way that web search engines are subject to spamming, and the risk is the same.

The second has to do with replies. Most discussion systems allow messages to be replied to. Such a facility will be included in Miski. The header of a reply will contain a pointer to the original message. Thus anyone reading a reply can follow the link to the original message, even if they were not subscribed to the sender of the original message, and did not receive it by way of repost. Or to put it another way, replying to a message implicitly reposts it. However there is no guarantee that readers of an original message will read all replies made to the message. This is a necessary consequence of the system's spam-proof nature.

However in some cases a user may wish to receive replies to a message, even from users they are not subscribed to. So a user has an option of accepting replies, perhaps for a limited time period, and specifying this in the header of their posted message. All replies will then be sent to that user's server (as well as the server of the user posting the reply), and the server will send headers of those replies to that user as if the user had subscribed to the replying users, and the user will have the option of reposting those replies (so that their subscribers can see them).

Any user enabling open replies to a posted message will be exposing themselves to a consequential spamming risk, and will not be exposing anyone other than themselves to that risk. It remains to be seen as to whether spammers will exploit this loophole. Given that Miski is a new system, spam control measures can be built into it to make abuse more difficult. For example a posting user can specify a puzzle that has to be solved with the answer inserted as a special header entry into the reply, and their server will only pass on replies that contain the correct answer.
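
The puzzle check and the time limit on open replies might be combined on the posting user's server along the following lines. This is only a sketch: the ReplyTicket fields and the "puzzle-answer" header name are assumptions made for illustration, not part of any agreed design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReplyTicket:
    """Hypothetical reply terms a poster attaches to a message."""
    expires: datetime      # replies arriving after this time are dropped
    puzzle_answer: str     # expected value of the reply's answer header


def accept_reply(reply_headers, ticket, now):
    """Return True if the poster's server should pass this reply on."""
    if now > ticket.expires:
        return False
    answer = reply_headers.get("puzzle-answer", "")
    # Compare loosely so that case and stray spaces do not matter.
    return answer.strip().lower() == ticket.puzzle_answer.strip().lower()


ticket = ReplyTicket(
    expires=datetime(2000, 6, 10, tzinfo=timezone.utc),
    puzzle_answer="lettuce",
)
in_time = datetime(2000, 6, 5, tzinfo=timezone.utc)
too_late = datetime(2000, 6, 11, tzinfo=timezone.utc)
```

A reply carrying the right answer before the expiry time is passed on; a reply with a wrong or missing answer, or one arriving after expiry, is silently dropped.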

I will emphasize that allowing open replies is not an essential part of Miski's operation, and the system can be used quite profitably without a user ever using this option.

Above I mentioned that it will not necessarily be desirable to map email addresses one-to-one with Miski addresses. This is because it is intrinsically desirable to publicly advertise Miski addresses, whereas to avoid spam (and other unwanted email), a user may prefer not to advertise their email address.

Miski URL's

Posted messages don't get read by anyone until someone makes a subscription. The simplest way to do this is for a posting user to put links on their web pages that advertise their existence as a Miski user and perhaps details of their subject space. These links should take the reading user to a place where they can subscribe to all or part of the posting user's subject space.

The main complication is this: a user subscribes by communicating with their own server, not with the server of the posting user. The posting user cannot know in advance the identity of the subscribing user's server. Conceivably the posting user's server could ask the user to fill in this detail in a form on a subscription web page, and then hand control over to the subscribing user's server with a URL containing the relevant details.

However the most convenient solution in the long run is to define a new web protocol, for example pcmsp:. It would then be up to the subscribing user's browser to look up the name of their server stored on their client computer, and form an appropriate URL to log on to a subscription page on their server.
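
A protocol handler of this kind could be sketched as follows, assuming the pcmsp:user@domain/language/subject form used in the example later in this paper. The "/subscribe" path and its query parameters are invented for illustration; the actual layout would be whatever the subscribing user's server chooses.

```python
from urllib.parse import quote, unquote


def parse_pcmsp(url):
    """Split 'pcmsp:user@domain/language/subject' into its parts."""
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "pcmsp":
        raise ValueError("not a pcmsp URL")
    poster, _, path = rest.partition("/")
    user, at, domain = poster.partition("@")
    if not (user and at and domain):
        raise ValueError("expected user@domain")
    language, _, subject = path.partition("/")
    return {"poster": poster, "language": language, "subject": unquote(subject)}


def subscription_page(url, own_server):
    """Build a web URL on the subscriber's own server (hypothetical layout)."""
    parts = parse_pcmsp(url)
    return "http://{}/subscribe?poster={}&lang={}&subject={}".format(
        own_server,
        quote(parts["poster"], safe=""),
        quote(parts["language"], safe=""),
        quote(parts["subject"], safe=""),
    )


# own_server would come from the add-in's stored configuration.
page = subscription_page("pcmsp:fred@example.com/en/Pet Rabbits", "miski.example.net")
```

The point of the design is that the link itself names only the poster and subject; the subscriber's own server is supplied locally, so the poster never needs to know it.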

A practical problem then arises about how to deal with browsers that do not have the pcmsp protocol handler already installed. Early users of Miski will have to advertise not only their own Miski ID, but also a web page that contains information about what Miski is and URL's for downloading required browser add-in's to handle the protocol. It is an unfortunate fact that most existing browsers do not handle unknown URL protocols in any sensible fashion (cf. their handling of new plugin types).

Some Examples

At this point it is useful to give some examples, although I have yet to discuss the issues of standard header fields, delivery methods, and nitty gritty involving DNS.

The following scenario demonstrates the basic features of the system-

A user fred@example.com defines some subjects "Pets", "Animals", "Rabbits", "Pet Rabbits".
Fred also defines an inclusion relationship, with "Pets" and "Rabbits" both included in "Animals", and "Pet Rabbits" included in both "Pets" and "Rabbits". "Pet Rabbits" is implicitly included in "Animals".
Fred includes a link on his web page with a URL pcmsp:fred@example.com/en/Pet Rabbits
A user annette@example.net clicks on this link, and is brought to a subscription page on miski.example.net (which is stored by her pcmsp browser add-in as a configuration value) which contains details of Fred's descriptions derived from Fred's server miski.example.com. Annette decides to subscribe to Fred's subject "Rabbits".
Annette's server communicates to Fred's server that it has a user subscribed to this subject (it is not necessary to reveal the subscriber's identity, and if another user on Annette's server was already subscribed, no communication between servers would be necessary).
Fred posts a message "Feeding too much lettuce" with a subject "Pet Rabbits".
Because "Pet Rabbits" is included in "Rabbits", the header for this message is delivered from Fred's server to Annette's server.
Annette's server delivers the header to her in the manner that Annette has specified to her server (see below for details on delivery methods).
Annette reads the header, which will probably tell her the title, poster id and subject.
Interested, she clicks the link to receive the message contents.
Her web browser retrieves the message contents from Fred's server.
She likes the message so much, she reposts it into her own subject "Feeding Pets".
User tom@example.org, already subscribed to Annette's subject "Feeding Pets", receives the header for this message (after his server has determined that he has not previously received that header).
Tom reads the message. He also clicks on a link in the delivered header which brings up a subscription page for Fred's subject space.
User jim@example.org also receives the header reposted by Annette. He decides that the message is so stupid that he clicks on a link that brings up a subscription page for Annette, and deletes his subscription to her. miski.example.org takes note of this deletion, and if no one else is now subscribed to that subject of Annette, communicates this deletion to Annette's server.
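
The way a subscriber's server contacts the poster's server only for the first subscription and the last cancellation, as in the steps above, could be sketched like this. The class name and the outbox list standing in for server-to-server messages are assumptions for illustration.

```python
class SubscriberServer:
    """Tracks local subscriptions; notifies remote servers only on first and last."""

    def __init__(self):
        self.subscribers = {}   # (poster, subject) -> set of local user names
        self.outbox = []        # server-to-server notices that would be sent

    def subscribe(self, user, poster, subject):
        users = self.subscribers.setdefault((poster, subject), set())
        if not users:
            # First local subscriber: the poster's server must be told.
            self.outbox.append(("subscribe", poster, subject))
        users.add(user)

    def unsubscribe(self, user, poster, subject):
        key = (poster, subject)
        users = self.subscribers.get(key)
        if users is None:
            return
        users.discard(user)
        if not users:
            # Last local subscriber gone: tell the poster's server to stop.
            del self.subscribers[key]
            self.outbox.append(("unsubscribe", poster, subject))


server = SubscriberServer()          # plays the part of miski.example.org
server.subscribe("tom", "annette@example.net", "Feeding Pets")
server.subscribe("jim", "annette@example.net", "Feeding Pets")
server.unsubscribe("jim", "annette@example.net", "Feeding Pets")
```

After these three calls only one notice has been sent, the initial subscription; Jim's cancellation stays local because Tom is still subscribed to the same subject.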

Note: In "Languages and Character Sets" I suggest that it may be best for each subject in the posting user's subject space (which can be regarded in effect as a channel) to have an ASCII name which is used to refer to it in URL's, and for the language code and full description to be attributes of the channel defined by that ASCII name.

Subjects

Subjects are required to be self-contained. For example, it does not make any sense to declare that "Restaurants" is included in a subject "London". Rather the user should create a subject called "London Restaurants". Some would suggest a hierarchy, e.g. "London/Restaurants/Some particular restaurant". The disadvantage of this is that different users are then much less likely to use the same terms for the same subjects.

When a user reposts, they have to choose a subject from their own subject space for the message being reposted. An obvious default is to use the same subject if they already have it. In other cases the client or server software could remember default mappings for specific users, along the lines of: messages from jim@example.org with subject "Pet Rabbits" always get reposted into my subject "Rabbits".
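
One possible way to choose the default subject for a repost, following the order described above, is sketched here. The function name and the shape of the remembered mappings are hypothetical.

```python
def default_repost_subject(own_subjects, remembered, poster, subject):
    """Suggest a subject for a repost, or None if the user must choose one."""
    if subject in own_subjects:
        # The reposting user already has a subject of the same name.
        return subject
    # Otherwise fall back on a mapping remembered for this poster and subject.
    return remembered.get((poster, subject))


my_subjects = {"Rabbits", "Feeding Pets"}
my_mappings = {("jim@example.org", "Pet Rabbits"): "Rabbits"}
```

Here a message from jim@example.org under "Pet Rabbits" would default to "Rabbits", while a message under an unknown subject with no remembered mapping would leave the choice to the user.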

In some cases users may wish to change an existing subject in their subject space to better match the names chosen by other users with similar interests. Given the notification-based nature of Miski, it is possible to define a special message type whose meaning is "This subject in my subject space is now renamed such and such". Receiving users can then choose to automatically (or automatically after confirmation) accept corresponding changes to their own subscription data.

Delivery Methods

Because each user interacts mainly with their own server (except when retrieving message contents), the interaction between user and server can be in a manner agreed between user and server. A user might choose to receive message headers in one of the following ways -

As HTML email, with links to message contents
Via a logon to a web site, which shows a page of all headers of messages received since the last logon.
Via a client that receives headers in real-time. The client could be a Java applet in a web page.

If enough users use real-time delivery then the Miski system can be used effectively as a moderated chat system.

Posting Methods

Again because user/server interaction is agreed upon by the user and the server, this can be done in different ways. However there will be some desire to include posting abilities into existing software, for example in a web-browser to allow a user to post a recommendation about a web page they are browsing, and this will require the existence of at least one standard method of formulating postings and sending them to a server.

Headers

Header lines need to come from a finite standard set, as they are processed in specific ways by the receiving user's server and client software.

Firstly there is a "core set" of header lines that define a message as originally posted (i.e. before any reposting), which should include -

Title
Poster id
Subject (from poster's subject space), including language code
(Optional) If the message is a recommendation for a URL, the recommended URL.
Message ID and/or URL pointing to message contents
(Optional) If replies are accepted, a ticket representing address to send replies to and time limit on those replies.
(Optional) If this message is a reply, a pointer to the original message
GMT date/time that message was posted

Two message headers will be considered to be identical if and only if the core set of headers is identical. A user will not be delivered a given message header more than once. A reposted message will contain additional headers -

The reposter's id
The reposter's subject

If a message is reposted more than once, reposters may optionally include or not include the id's and subjects of the "in between" reposters. The reason for this is that there is no way to enforce inclusion of other reposters since a reposter can always go to the original message and repost it directly.
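
The rule that a header is delivered at most once, with identity decided by the core set alone, might be implemented on a subscriber's server roughly as follows. The field names are placeholders for the core header lines listed above, since no concrete names have been fixed.

```python
# Placeholder names for the core header lines; reposter details are not included.
CORE_FIELDS = ("title", "poster", "subject", "recommended_url",
               "message_id", "reply_ticket", "in_reply_to", "posted_gmt")


def core_key(header):
    """Identity of a message: the core fields only, ignoring repost details."""
    return tuple(header.get(field) for field in CORE_FIELDS)


class Inbox:
    """Headers delivered to one user, with duplicates suppressed."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def deliver(self, header):
        """Deliver the header unless its core set has been seen; report which."""
        key = core_key(header)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.delivered.append(header)
        return True


original = {
    "title": "Feeding too much lettuce",
    "poster": "fred@example.com",
    "subject": "en/Pet Rabbits",
    "message_id": "m1",
    "posted_gmt": "2000-06-03T10:00:00",
}
repost = dict(original, reposter="annette@example.net",
              reposter_subject="en/Feeding Pets")
```

Delivering the original and then the repost to the same inbox results in a single delivered header, because the added reposter lines play no part in the comparison.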

Note that there are two possible ways to identify a message: by message id, and by URL. We would create a new URL scheme pcmsp-msg, to be used for example as pcmsp-msg:user@example.com/message_id, which would translate into an ordinary web URL for the contents of user@example.com's message with id "message_id". The advantage of a direct URL is that it can be interpreted by a browser that does not have any pcmsp specific protocol handlers installed, as these are only absolutely needed for making subscriptions, not for receiving messages nor for posting them. The advantage of a Miski-specific URL is that the browser can "know" that it is a pointer to a Miski message, and it can provide user options accordingly. The alternative is that user options have to be provided on the page that provides the headers (as delivered by the server), rather than on the message content page itself. Frames could be used so that both are visible at the same time.
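
The translation from a pcmsp-msg URL to an ordinary web URL could look something like the sketch below. The "/messages/user/id" path is an invented layout, and resolve_server stands for whatever mechanism maps a domain name to its Miski server.

```python
from urllib.parse import quote


def message_web_url(msg_url, resolve_server):
    """Translate 'pcmsp-msg:user@domain/message_id' into a web URL."""
    scheme, _, rest = msg_url.partition(":")
    if scheme.lower() != "pcmsp-msg":
        raise ValueError("not a pcmsp-msg URL")
    poster, slash, message_id = rest.partition("/")
    user, at, domain = poster.partition("@")
    if not (user and at and domain and slash and message_id):
        raise ValueError("expected user@domain/message_id")
    server = resolve_server(domain)  # e.g. a lookup of the domain's pcmsp server
    # Hypothetical path layout on the posting user's server.
    return "http://{}/messages/{}/{}".format(
        server, quote(user, safe=""), quote(message_id, safe=""))


web_url = message_web_url(
    "pcmsp-msg:user@example.com/message_id",
    lambda domain: "miski." + domain,   # stand-in resolver for the example
)
```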

Probably headers should be written in XML.
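
To give an idea of what an XML header might involve, the following sketch writes a flat set of header lines as child elements of a single root element and reads them back. The element names are purely illustrative; no schema has been defined.

```python
import xml.etree.ElementTree as ET


def header_to_xml(fields):
    """Serialise header lines as child elements of a <header> root."""
    root = ET.Element("header")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_header(text):
    """Parse the XML form back into a dictionary of header lines."""
    return {child.tag: child.text for child in ET.fromstring(text)}


fields = {
    "title": "Feeding too much lettuce",
    "poster": "fred@example.com",
    "subject": "Pet Rabbits",
    "language": "en",
}
xml_text = header_to_xml(fields)
```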

Server-to-Server protocols

As stated above, client to server interactions do not have to be standardised, although at least the option of using standardised communications is still desirable. (Compare to email: POP3 is standard, but if you use a web-based email service then the interface is server-specific, layered on top of HTTP and HTML.)

On the other hand, server to server interactions have to be completely standardised. Specific aspects that have to be standardised include -

Resolution of user names to servers. Something similar to DNS email exchange records is required here. Given that many domain name owners do not have direct control over their DNS entries, a sort of poor man's DNS via the Web could be considered. For example, to resolve "example.com", look up a file http://www.example.com/poor-mans-dns.xml, which has contents -
```
<services>
<service name="pcmsp">miski.server-for-example.com</service>
</services>
```
Retrieving a subject space for a user. This could be provided as an XML file.
Making or cancelling a subscription to a given subject in a given user's subject space.
Delivering a message header for a newly posted message.
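
Reading the "poor man's DNS" file shown above might be done as in this sketch. Fetching the file over HTTP is left out; the function takes the file contents as a string, and its name is an assumption.

```python
import xml.etree.ElementTree as ET


def find_service(xml_text, name="pcmsp"):
    """Return the server listed for the named service, or None if absent."""
    root = ET.fromstring(xml_text)
    for service in root.iter("service"):
        if service.get("name") == name:
            return (service.text or "").strip() or None
    return None


EXAMPLE = """
<services>
<service name="pcmsp">miski.server-for-example.com</service>
</services>
"""

pcmsp_server = find_service(EXAMPLE)
```

A domain whose file lists no pcmsp service yields None, which a server could treat as "this domain has no Miski users".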

Why Miski is Better than Any Existing System

Miski serves a specific purpose, which is to make information available to anyone who is interested to read it, in a way that ties the information to its originator and to anyone who recommends it to others. In a sense it allows users to trade attention for information. A poster or reposter provides quality information to their subscribers, and in return receives attention from those subscribers.

Miski does not satisfy other purposes, such as point-to-point communication, or collaboration within a private group.

Without the reposting feature, Miski would just be a standardised form of push technology. While that is not a bad thing in itself, reposting has the potential to turn Miski into something more. Basically it provides a highly optimised and frictionless "word of mouth" system. People are already using the Internet to spread information by "word of mouth", but the catch is that they are not using systems that are optimised for this purpose. If we see something we like, we can send an email about it to someone we know who might be interested in it. But they might not be that interested in it, or they might already know about it, and in either case you are adding to the clutter in their mailbox.

In some ways Miski is like Usenet, but with the difference that Miski puts the poster's identity first and the subject second. Advantages of this are -

Each posting user is free to define their own subjects without having to fit into anyone else's idea of what subjects should be defined, and without having to engage in a political exercise to get other users to agree to the existence of a subject.
Moderation is disconnected from subject definition. A Usenet group either is or isn't moderated, and if it is moderated then it is moderated by one specific authority. Miski lets anyone moderate any message.
Spam can be avoided completely in Miski.
Server-to-server traffic in Miski is proportional to what users actually want to read, and to what they actually write and/or read themselves.

Email lists are often used either as a form of push technology, or as a private bulletin board. As is the case for a Usenet group, a bulletin board must always have one specific moderating authority. A particular problem with email lists is that the subscribing user is at the mercy of the list server. For example the user has to remember how to unsubscribe, and there is no guarantee that any particular server will take any notice of any unsubscribe command.

Although a Miski server can still behave badly, a user is protected from the bad behaviour of any Miski server other than those serving users that they are subscribed to. If server X sends messages from user A@X to subscribers to user B@X, this will annoy subscribers to user B, and will therefore disadvantage user B, who will be able to hold the administrator of server X responsible for such behaviour. If server X sends messages to user A on server Y, and A@Y is not subscribed to anyone on X, then server Y will not send the messages on to user A, and furthermore may log the unexpected messages, perhaps causing the administrator of Y to complain to X if bad messages continue to arrive. In effect all participants in the system are easily held accountable for their actions.

Miski can be regarded as a referral system, in that reposting users effectively act as intermediaries between posters and readers. There are various rating systems and collaboration filters operating on the Internet, which use preferences of some users to make recommendations to other users. These use statistical methods. The advantage of Miski is that it is quite explicit. Each reposting user controls exactly how their preferences translate into recommendations, and each reposted message credits the user that reposted it. With a collaboration filter, the users that contributed to a recommendation receive no direct credit for it, and have no way to receive credit for the usefulness of recommendations that they helped form. Therefore they have less motive to make really useful recommendations. Collaboration filters and rating systems also have a tendency to highlight those sites that everyone already knows about. With Miski there is no entry barrier to receiving recognition. Just one user posting a recommendation for a new web site could be enough to start an exponential cascade of reposts.

Also see "How Fast is the Internet?"

What Needs to be Done ?

Almost everything. Most of what I have done is design work, which you have just read in this paper. A simple to-do list is -

Flesh out details of design,
Write corresponding internet draft,
Write client software for popular web-browsers to handle pcmsp and perhaps pcmsp-msg protocols,
Write server software,
Run trial servers.