Miski: A White Paper

Philip Dorrell, 3 June 2000.

What is Miski?

Is it a bulletin board? Is it Usenet? Is it email? Is it email lists? Is it push technology? Miski is the code word for a new technology that has elements of all these existing technologies, but is actually not the same as any of them. A long-winded technical name for it is Poster-Centric Message Subscription Protocol, or PCMSP for short.

If there is one idea that underlies Miski, it is the following -

The best determinant of the likely quality of information that you have not yet received is the source of that information.

The result of using this as a basic design principle is that Miski, in its normal mode of operation, is intrinsically spam-proof.

A brief summary of the features that define the Miski system is as follows -

There is more than one way that such a system could be implemented. However, the following arrangement gives the best scalability and accountability, and a natural mapping to the domain name system:

Reposting can be regarded as a means of moderation, in that any user can effectively moderate any message in the sense of making it available to users who have subscribed to them. It can also be regarded as a form of deep linking, and all users posting messages via Miski implicitly give their permission for their messages to be linked to in this fashion. Message contents remain under the original posting user's control, and can if necessary be edited or even deleted.

Spammability

A problem with many existing communication systems on the Internet is that they assume good intentions on the part of all users. This is not a realistic assumption for a system that can be used by hundreds or even thousands of millions of users subject only to their ability to pay communications charges. The worst sort of abuse is called spam, which consists of blatantly irrelevant content posted to a discussion system or sent to an individual user by some means, or even just left in a place where a search engine will find it.

The retro-fitted solution is to actively identify and delete the most blatant examples of spam. But this type of spam is only the most obvious symptom of a more general problem, which is that there is no way for an open system to guarantee even a reasonable probability of quality in content, especially if one takes into account the fact that quality is often very subjective.

The drastic cure imposed by the Miski system is that a subscriber only receives messages posted by users that they have subscribed to. This is the poster-centric aspect of the system. Reposting is a mechanism that makes it possible for users to receive messages originally posted by users that they are not subscribed to, and turns Miski from a somewhat closed system into a system that is as open as a system can be without being spammable.
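The delivery rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Header class, its field names and the should_deliver function are hypothetical, and subscription by subject is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    poster: str                      # Miski address of the original poster
    message_id: str                  # id assigned by the poster's server
    reposter: Optional[str] = None   # set only when the header is a repost


def should_deliver(header: Header, subscriptions: set) -> bool:
    """Deliver only if the subscriber follows the poster or the reposter."""
    if header.poster in subscriptions:
        return True
    return header.reposter is not None and header.reposter in subscriptions


subs = {"alice@example.org"}
original = Header(poster="bob@example.com", message_id="m1")
reposted = Header(poster="bob@example.com", message_id="m1",
                  reposter="alice@example.org")
```

Here the subscriber never sees bob's original header, but does see it once alice, whom they subscribe to, reposts it.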

Above I qualified the spam-resistant nature of Miski by stating that it only applies to its normal mode of operation. There are two main circumstances where users can choose to expose themselves to spam to a controlled extent if they wish to.

The first is that users will inevitably wish to search Miski servers for other users advertising themselves as interested in given topics. This will be vulnerable to spamming in the same way that web search engines are subject to spamming, and the risk is the same.

The second has to do with replies. Most discussion systems allow messages to be replied to. Such a facility will be included in Miski. The header of a reply will contain a pointer to the original message. Thus anyone reading a reply can follow the link to the original message, even if they were not subscribed to the sender of the original message, and did not receive it by way of repost. Or to put it another way, replying to a message implicitly reposts it. However there is no guarantee that readers of an original message will read all replies made to the message. This is a necessary consequence of the system's spam-proof nature.

However in some cases a user may wish to receive replies to a message, even from users they are not subscribed to. So a user has an option of accepting replies, perhaps for a limited time period, and specifying this in the header of their posted message. All replies will then be sent to that user's server (as well as the server of the user posting the reply), and the server will send headers of those replies to that user as if the user had subscribed to the replying users, and the user will have the option of reposting those replies (so that their subscribers can see them).

Any user enabling open replies to a posted message will be exposing themselves to a consequential spamming risk, and will not be exposing anyone other than themselves to that risk. It remains to be seen whether spammers will exploit this loophole. Given that Miski is a new system, spam control measures can be built into it to make abuse more difficult. For example, a posting user can specify a puzzle that has to be solved, with the answer inserted as a special header entry in the reply, and their server will only pass on replies that contain the correct answer.
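A minimal sketch of how a posting user's server might screen open replies, combining the limited time period with the optional puzzle. The OpenReplyPolicy class and the "Puzzle-Answer" header name are assumptions made for illustration; no header names have been fixed for Miski.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenReplyPolicy:
    accept_until: float                  # timestamp after which open replies are dropped
    puzzle_answer: Optional[str] = None  # None means the poster set no puzzle


def accept_open_reply(policy: OpenReplyPolicy, reply_headers: dict, now: float) -> bool:
    """Return True if the reply should be passed on to the posting user."""
    if now > policy.accept_until:
        return False                     # the open-reply window has closed
    if policy.puzzle_answer is None:
        return True                      # open replies with no puzzle required
    # "Puzzle-Answer" is a hypothetical name for the special header entry.
    return reply_headers.get("Puzzle-Answer") == policy.puzzle_answer
```

A reply from a subscribed user would not need to pass through this check at all; it only applies to replies that would otherwise never reach the poster.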

I will emphasize that allowing open replies is not an essential part of Miski's operation, and the system can be used quite profitably without a user ever using this option.

Above I mentioned that it will not necessarily be desirable to map email addresses one-to-one with Miski addresses. This is because it is intrinsically desirable to publicly advertise Miski addresses, whereas to avoid spam (and other unwanted email), a user may prefer not to advertise their email address.

Miski URLs

Posted messages don't get read by anyone until someone makes a subscription. The simplest way to do this is for a posting user to put links on their web pages that advertise their existence as a Miski user and perhaps details of their subject space. These links should take the reading user to a place where they can subscribe to all or part of the posting user's subject space.

The main complication is this: a user subscribes by communicating with their own server, not with the server of the posting user. The posting user cannot know in advance the identity of the subscribing user's server. Conceivably the posting user's server could ask the user to fill in this detail in a form on a subscription web page, and then hand control over to the subscribing user's server with a URL containing the relevant details.

However the most convenient solution in the long run is to define a new web protocol, for example pcmsp:. It would then be up to the subscribing user's browser to look up the name of their server stored on their client computer, and form an appropriate URL to log on to a subscription page on their server.
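As a rough sketch of what such a protocol handler might do: the form pcmsp:user@domain/subject, the /subscribe path and the query parameter names are all assumptions of mine, since the syntax of pcmsp: URLs is not yet defined.

```python
from urllib.parse import urlencode


def subscription_page_url(pcmsp_url: str, home_server: str) -> str:
    """Turn a pcmsp: link into a URL on the subscribing user's own server.

    home_server is the server name stored on the client computer.
    """
    scheme, _, rest = pcmsp_url.partition(":")
    if scheme.lower() != "pcmsp" or "@" not in rest:
        raise ValueError("not a pcmsp URL: " + pcmsp_url)
    address, _, subject = rest.partition("/")
    params = {"poster": address}
    if subject:
        params["subject"] = subject      # ASCII channel name, if one was given
    # The /subscribe path and parameter names are hypothetical.
    return "https://" + home_server + "/subscribe?" + urlencode(params)
```

The point of the design is that the link published by the poster contains nothing about the reader's server; that detail is filled in on the reader's side.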

A practical problem then arises about how to deal with browsers that do not have the pcmsp protocol handler already installed. Early users of Miski will have to advertise not only their own Miski ID, but also a web page that contains information about what Miski is and URLs for downloading the browser add-ins required to handle the protocol. It is an unfortunate fact that most existing browsers do not handle unknown URL protocols in any sensible fashion (cf. their handling of new plugin types).

Some Examples

At this point it is useful to give some examples, although I have yet to discuss the issues of standard header fields, delivery methods, and nitty gritty involving DNS.

The following scenario demonstrates the basic features of the system -

Note: In "Languages and Character Sets" I suggest that it may be best for each subject in the posting user's subject space (which can be regarded in effect as a channel) to have an ASCII name which is used to refer to it in URLs, and for the language code and full description to be attributes of the channel defined by that ASCII name.

Subjects

Subjects are required to be self-contained. For example, it does not make any sense to declare that "Restaurants" is included in a subject "London". Rather the user should create a subject called "London Restaurants". Some would suggest a hierarchy, e.g. "London/Restaurants/Some particular restaurant". The disadvantage of this is that different users are then much less likely to use the same terms for the same subjects.

When a user reposts, they have to choose a subject from their own subject space for the message being reposted. An obvious default is to use the same subject if they already have it. In other cases the client or server software could remember default mappings for specific users, along the lines of: messages from jim@example.org with subject "Pet Rabbits" always get reposted into my subject "Rabbits".
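The choice of repost subject could work roughly as follows. The function name and data shapes are hypothetical; the order of the two rules follows the paragraph above.

```python
from typing import Optional


def choose_repost_subject(poster: str, subject: str, my_subjects: set,
                          remembered: dict) -> Optional[str]:
    """Pick a default subject from the reposting user's own subject space."""
    if subject in my_subjects:
        return subject                    # same subject already exists
    # remembered maps (poster, their subject) to one of my subjects
    mapped = remembered.get((poster, subject))
    if mapped in my_subjects:
        return mapped
    return None                           # no default: ask the user


my_subjects = {"Rabbits", "London Restaurants"}
remembered = {("jim@example.org", "Pet Rabbits"): "Rabbits"}
```

With this data, a message from jim@example.org with subject "Pet Rabbits" defaults to "Rabbits", while a message on a subject with no match and no remembered mapping returns None so that the software can prompt the user.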

In some cases users may wish to change an existing subject in their subject space to better match the names chosen by other users with similar interests. Given the notification-based nature of Miski, it is possible to define a special message type whose meaning is "This subject in my subject space is now renamed such and such". Receiving users can then choose to automatically (or automatically after confirmation) accept corresponding changes to their own subscription data.
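A sketch of how a receiving user's software might apply such a rename message. The representation of subscriptions as (poster, subject) pairs and the confirm callback are assumptions for illustration only.

```python
def apply_rename(subscriptions: set, poster: str, old_subject: str,
                 new_subject: str, confirm=lambda old, new: True) -> set:
    """Return a new subscription set with the poster's subject renamed.

    confirm is called before the change is made; the default accepts
    automatically, and a client could pass a function that asks the user.
    """
    updated = set(subscriptions)
    if (poster, old_subject) not in updated:
        return updated                    # not subscribed to the old subject
    if not confirm(old_subject, new_subject):
        return updated                    # user declined the change
    updated.remove((poster, old_subject))
    updated.add((poster, new_subject))
    return updated
```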

Delivery Methods

Because each user interacts mainly with their own server (except when retrieving message contents), the interaction between user and server can be in a manner agreed between user and server. A user might choose to receive message headers in one of the following ways -

If enough users use real-time delivery then the Miski system can be used effectively as a moderated chat system.

Posting Methods

Again because user/server interaction is agreed upon by the user and the server, this can be done in different ways. However there will be some desire to include posting abilities into existing software, for example in a web-browser to allow a user to post a recommendation about a web page they are browsing, and this will require the existence of at least one standard method of formulating postings and sending them to a server.

Headers

Header lines need to come from a finite standard set, as they are processed in specific ways by the receiving user's server and client software.

Firstly there is a "core set" of header lines that define a message as originally posted (i.e. before any reposting), which should include -

Two message headers will be considered to be identical if and only if the core set of headers is identical. A user will not be delivered a given message header more than once. A reposted message will contain additional headers -
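The identity rule can be illustrated as follows. The field names in CORE_FIELDS are placeholders that I have invented for the example; they are not a proposal for the actual core set.

```python
# Hypothetical core fields, for illustration only.
CORE_FIELDS = ("poster", "message_id", "subject", "posted")


def core_key(header: dict) -> tuple:
    """Identity of a message header: the values of its core fields."""
    return tuple(header.get(field) for field in CORE_FIELDS)


class Inbox:
    """Delivers each distinct message header to the user at most once."""

    def __init__(self):
        self._seen = set()
        self.delivered = []

    def offer(self, header: dict) -> bool:
        key = core_key(header)
        if key in self._seen:
            return False                  # same core headers: already delivered
        self._seen.add(key)
        self.delivered.append(header)
        return True
```

Because additional repost headers are not part of the key, a message received directly and then again by way of a repost is delivered only once.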

If a message is reposted more than once, reposters may optionally include or not include the ids and subjects of the "in between" reposters. The reason for this is that there is no way to enforce inclusion of other reposters, since a reposter can always go to the original message and repost it directly.

Note that there are two possible ways to identify a message: by message id, and by URL. We would create a new URL scheme pcmsp-msg, to be used for example as pcmsp-msg:user@example.com/message_id, which would translate into an ordinary web URL for the contents of user@example.com's message with id "message_id". The advantage of a direct URL is that it can be interpreted by a browser that does not have any pcmsp specific protocol handlers installed, as these are only absolutely needed for making subscriptions, not for receiving messages nor for posting them. The advantage of a Miski-specific URL is that the browser can "know" that it is a pointer to a Miski message, and it can provide user options accordingly. The alternative is that user options have to be provided on the page that provides the headers (as delivered by the server), rather than on the message content page itself. Frames could be used so that both are visible at the same time.
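To make the translation concrete, here is one way it might look. The target layout https://domain/miski/user/message_id is purely an assumption; the actual mapping from a Miski address to a web server depends on the DNS arrangements, which are not settled here.

```python
from urllib.parse import quote


def message_web_url(pcmsp_msg_url: str) -> str:
    """Translate pcmsp-msg:user@domain/message_id into an ordinary web URL."""
    scheme, _, rest = pcmsp_msg_url.partition(":")
    if scheme != "pcmsp-msg":
        raise ValueError("not a pcmsp-msg URL: " + pcmsp_msg_url)
    address, slash, message_id = rest.partition("/")
    user, at, domain = address.partition("@")
    if not (slash and at and user and domain and message_id):
        raise ValueError("malformed pcmsp-msg URL: " + pcmsp_msg_url)
    # The /miski/ path prefix is a hypothetical server layout.
    return ("https://" + domain + "/miski/"
            + quote(user, safe="") + "/" + quote(message_id, safe=""))
```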

Probably headers should be written in XML.

Server-to-Server protocols

As stated above, client to server interactions do not have to be standardised, although at least the option of using standardised communications is still desirable. (Compare to email: POP3 is standard, but if you use a web-based email service then the interface is server-specific, layered on top of HTTP and HTML.)

On the other hand, server to server interactions have to be completely standardised. Specific aspects that have to be standardised include -

Why Miski is Better than Any Existing System

Miski serves a specific purpose, which is to make information available to anyone who is interested in reading it, in a way that ties the information to its originator and to anyone who recommends it to others. In a sense it allows users to trade attention for information. A poster or reposter provides quality information to their subscribers, and in return receives attention from those subscribers.

Miski does not satisfy other purposes, such as point-to-point communication, or collaboration within a private group.

Without the reposting feature, Miski would just be a standardised form of push technology. While that is not a bad thing in itself, reposting has the potential to turn Miski into something more. Basically it provides a highly optimised and frictionless "word of mouth" system. People are already using the Internet to spread information by "word of mouth", but the catch is that they are not using systems that are optimised for this purpose. If we see something we like, we can send an email about it to someone we know who might be interested in it. But they might not be that interested in it, or they might already know about it, and in either case we are adding to the clutter in their mailbox.

In some ways Miski is like Usenet, but with the difference that Miski puts the poster's identity first and the subject second. Advantages of this are -

Email lists are often used either as a form of push technology, or as a private bulletin board. As is the case for a Usenet group, a bulletin board must always have one specific moderating authority. A particular problem with email lists is that the subscribing user is at the mercy of the list server. For example, the user has to remember how to unsubscribe, and there is no guarantee that any particular server will take any notice of any unsubscribe command.

Although a Miski server can still behave badly, a user is protected from the bad behaviour of any Miski server other than those serving users that they are subscribed to. If server X sends messages from user A@X to subscribers to user B@X, this will annoy subscribers to user B, and will therefore disadvantage user B, who will be able to hold the administrator of server X responsible for such behaviour. If server X sends messages to user A on server Y, and A@Y is not subscribed to anyone on X, then server Y will not send the messages on to user A, and furthermore may log the unexpected messages, perhaps causing the administrator of Y to complain to X if bad messages continue to arrive. In effect all participants in the system are easily held accountable for their actions.
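The behaviour of server Y in this example might be sketched like this. The ReceivingServer class and its method names are hypothetical, and a real server would also need to authenticate the sending server, which is not shown.

```python
from collections import Counter


class ReceivingServer:
    """Server Y: forwards only messages its users have subscribed to."""

    def __init__(self, subscriptions: dict):
        # subscriptions maps a local user to the set of posters they follow
        self.subscriptions = subscriptions
        self.unexpected = Counter()       # unexpected messages per sending server

    def accept(self, sender_server: str, poster: str, recipient: str) -> bool:
        wanted = self.subscriptions.get(recipient, set())
        # The poster must belong to the sending server and be subscribed to.
        if poster.endswith("@" + sender_server) and poster in wanted:
            return True
        self.unexpected[sender_server] += 1   # logged, not delivered
        return False
```

The administrator of Y can then look at the unexpected counts to decide which other servers deserve a complaint.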

Miski can be regarded as a referral system, in that reposting users effectively act as intermediaries between posters and readers. There are various rating systems and collaboration filters operating on the Internet, which use preferences of some users to make recommendations to other users. These use statistical methods. The advantage of Miski is that it is quite explicit. Each reposting user controls exactly how their preferences translate into recommendations, and each reposted message credits the reposting user that reposted it. With a collaboration filter, the users that contributed to a recommendation receive no direct credit for it, and have no way to receive credit for the usefulness of recommendations that they helped form. Therefore they have less motive to make really useful recommendations. Collaboration filters and rating systems also have a tendency to highlight those sites that everyone already knows about. With Miski there is no entry barrier to receiving recognition. Just one user posting a recommendation for a new web site could be enough to start an exponential cascade of reposts.

Also see "How Fast is the Internet?"

What Needs to be Done?

Almost everything. Most of what I have done is design work, which you have just read in this paper. A simple to-do list is -