OASIS Topic Maps Published Subjects Technical Committee

Analysis of published subject documentation

Proposed by: Lars Marius Garshol

This is a private proposal, and is not in any way endorsed by the TC. It is also quite unpolished, and needs considerable further refinement before it can be useful as part of, or as a foundation for, any TC recommendation.

PSD contents

It seems that published subject documentation (PSD) may contain the types of information described below. Note that this does not in any way mean that it must, should, or will contain them. These concepts are described here in order to help the TC structure its debates. The TC (and the author of this note) may well decide that some of these forms of information are unnecessary, harmful, or at least that they should be optional.

Published subject definitions

These are definitions of a published subject optimized for a human audience. Their purpose is to allow a human to establish what the identity of a published subject is. One example can be found at http://psi.ontopia.net/ontopia/#1.

Published subject set description

This is a description of the set of published subjects, intended to be read by human beings. Its purpose is to explain to a human reader what the published subject set is and what it contains. Description is here to be taken in a wide sense; it may include human-readable metadata about the published subject documentation. An example can be found at http://psi.ontopia.net/ontopia/.

Core published subject assertions

These are formal assertions made about the published subjects in a form optimized for machine consumption. That the assertions are "core" implies that they are part of the definition of the subjects, and should be accepted by anyone who is to use the subjects. The purpose of publishing them is to simplify the use of the published subjects by allowing users to import the assertions into their own systems. Examples are: this subject has this name, this subject is an instance of this class, etc.

An example can be found at http://www.oasis-open.org/committees/geolang/docs/language.xtm. (Note that this example contains only published subject assertions.)
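
To make the two example assertions concrete (this subject has this name, this subject is an instance of this class), the following is a minimal sketch of how they might be serialized as a single XTM 1.0 topic element. The topic id "norwegian", the name "Norwegian", and the class reference "#language" are hypothetical values chosen for illustration; they are not taken from the geolang document.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by XTM 1.0 and XLink.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Serialize XTM elements without a prefix and XLink attributes as xlink:*.
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def core_assertions(topic_id, name, class_ref):
    """Build one <topic> carrying an instance-of assertion and a name."""
    topic = ET.Element(f"{{{XTM_NS}}}topic", {"id": topic_id})

    # "This subject is an instance of this class."
    instance_of = ET.SubElement(topic, f"{{{XTM_NS}}}instanceOf")
    ET.SubElement(
        instance_of,
        f"{{{XTM_NS}}}topicRef",
        {f"{{{XLINK_NS}}}href": class_ref},
    )

    # "This subject has this name."
    base_name = ET.SubElement(topic, f"{{{XTM_NS}}}baseName")
    name_string = ET.SubElement(base_name, f"{{{XTM_NS}}}baseNameString")
    name_string.text = name
    return topic


# Hypothetical subject: all three values are assumptions for illustration.
topic = core_assertions("norwegian", "Norwegian", "#language")
print(ET.tostring(topic, encoding="unicode"))
```

A user who accepts the core assertions could merge such topic elements into their own topic map unchanged, which is the simplification the section describes.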

Optional published subject assertions

These are like the core published subject assertions, except that they are not part of the definition of the published subjects, and may therefore be controversial. The purpose of publishing them separately from the core assertions is to make it easier for users to choose which assertions to use, and which to ignore.

Published subject metadata

These are formal assertions made about the definition of a particular published subject. Examples are: this definition was last updated on this date, this definition is written in this language, etc. No real-world examples of such metadata are known.

Published subject documentation metadata

These are formal assertions made about the published subject documentation as a whole. Examples are: this documentation was last updated on this date, this documentation is written in this language, etc. An example can be found at http://psi.ontopia.net/ontopia/ontopia.psi. (Note that this example contains both published subject assertions and published subject documentation metadata.)

A weakness of this terminology is that the terms are very similar to one another, making it difficult to tell them apart.

PSD structure

The above analysis of PSD contents makes it possible to discuss what the structure of PSDs might be in practice. The sections below describe a number of possible approaches.

Two resources

In this approach two resources would be published. This approach is best suited for published subject sets that don't contain too many subjects. An example of a PSD following this approach is http://psi.ontopia.net/ontopia/.

An entry resource, which contains the published subject set description and the published subject definitions. Suitable formats for this resource are: HTML and XHTML.
A resource containing the core published subject assertions and the published subject documentation metadata. Suitable formats are: RDF/XML and XTM.

It is of course possible to work the published subject documentation metadata into the entry resource, through the use of metadata encoded in HTML using META elements. Whether this is done or not is for the publisher to choose.
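
As a rough illustration of metadata carried in META elements, the sketch below reads name/content pairs out of an entry resource using only the Python standard library. The sample HTML and the metadata names "DC.date" and "DC.language" are assumptions made for this example; this note does not prescribe any particular vocabulary.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from META elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.metadata[name] = content


# Hypothetical entry resource; the metadata names are assumptions.
ENTRY_RESOURCE = """<html>
<head>
<title>Example published subject set</title>
<meta name="DC.date" content="2002-05-01">
<meta name="DC.language" content="en">
</head>
<body><h1>Example published subject set</h1></body>
</html>"""

collector = MetaCollector()
collector.feed(ENTRY_RESOURCE)
print(collector.metadata)
```

Metadata embedded this way stays in the human-oriented resource, so a consumer needs an HTML parser rather than an RDF/XML or XTM processor to read it.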

Single human-oriented resource

In this approach the published subjects are defined using nothing more than a single human-oriented resource, a suitable format for which would be HTML or XHTML. This resource would then contain the published subject definitions, the published subject set description, and possibly also the published subject documentation metadata.

Single machine-oriented resource

In this approach the published subjects are defined using nothing more than a single machine-oriented resource, a suitable format for which would be RDF/XML or XTM. This resource would then contain the published subject definitions, the published subject set description, the core published subject assertions, and the published subject documentation metadata.

Fine-grained resources

This approach is suitable for publishing large collections of published subjects. The documentation would be separated into the following resources:

An entry resource which contains the published subject set description, and possibly the published subject documentation metadata. Suitable formats: HTML or XHTML.
A resource for each published subject which contains the published subject definition, and possibly also the published subject metadata. Suitable formats: HTML or XHTML.
A resource containing the core published subject assertions and possibly the published subject documentation metadata. Suitable formats: RDF/XML or XTM.
One or more resources containing the optional published subject assertions. Suitable formats: RDF/XML or XTM.
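
The four approaches described in this section can be compared side by side by writing them down as data. The sketch below is one possible encoding, not a proposed model: the class name, the short content labels, and the helper functions are all hypothetical, and content that the text marks as "possibly" present is listed without distinction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    role: str             # what the resource is for
    formats: tuple        # suitable formats named in this note
    contents: frozenset   # kinds of PSD content it may carry


def resource(role, formats, *contents):
    return Resource(role, tuple(formats), frozenset(contents))


HUMAN = ("HTML", "XHTML")
MACHINE = ("RDF/XML", "XTM")

# Items described as "possibly" included are listed like any other item.
APPROACHES = {
    "two resources": [
        resource("entry", HUMAN, "set description", "subject definitions"),
        resource("assertions", MACHINE,
                 "core assertions", "documentation metadata"),
    ],
    "single human-oriented resource": [
        resource("entry", HUMAN, "subject definitions",
                 "set description", "documentation metadata"),
    ],
    "single machine-oriented resource": [
        resource("entry", MACHINE, "subject definitions", "set description",
                 "core assertions", "documentation metadata"),
    ],
    "fine-grained resources": [
        resource("entry", HUMAN, "set description", "documentation metadata"),
        resource("one per subject", HUMAN,
                 "subject definitions", "subject metadata"),
        resource("core assertions", MACHINE,
                 "core assertions", "documentation metadata"),
        resource("optional assertions", MACHINE, "optional assertions"),
    ],
}


def covered(approach):
    """Return every kind of content the given approach can carry."""
    kinds = set()
    for res in APPROACHES[approach]:
        kinds |= res.contents
    return kinds


def approaches_offering(kind):
    """List the approaches under which the given content kind appears."""
    return [name for name in APPROACHES if kind in covered(name)]


print(approaches_offering("optional assertions"))
```

Read this way, only the fine-grained approach has a place for optional published subject assertions and published subject metadata, and the single human-oriented resource is the only approach without core assertions.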

Issues raised

A key issue that is clarified by this proposal is: what are the published subject indicators? Should they be the published subject definitions, or the topic elements in the core published subject assertions? Both approaches are fully possible.

Do we want to recommend one of these approaches, several of the approaches, or to leave the choice entirely in the hands of the publishers? If we leave the decision of PSD structure to the publishers, what do we recommend?

Lars Marius Garshol, 2002-05-01.