OASIS Topic Maps Published Subjects Technical Committee

OASIS Topic Maps Published Subjects TC Deliverables
1. Documentation of Published Subjects - Requirements and Recommendations

Version 0.3 - updated February 20, 2002
Latest version: http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm
Editor: Bernard Vatant



Status of this document: Working Draft

This recommendation addresses the "shall, should, may" of:
Documentation of Published Subjects - TC Process Requirements




1 - Statement of Purpose

The OASIS Topic Maps Published Subjects Technical Committee was set up to support the application of the Topic Maps specification, ISO 13250, by providing recommendations for the documentation, management and use of published subjects. The underlying premise is that topic map interoperability requires unambiguous definitions of subjects (represented by topics), and that these definitions should be provided by stable resources made available online through a trustworthy publication process.

Those resources, organised in published subject documentation sets, will provide both published subject indicators (human-understandable, unambiguous definitions of subjects) and published subject identifiers (stable URIs fit for computer processing, topic map interoperability and merging, and many other foreseeable semantic applications).

The purpose of this document is to provide recommendations for the structure and content of published subject documentation sets. These recommendations are aimed at publishers of ontologies, classifications, taxonomies, thesauri, registries, catalogues, databases and similar resources. They are meant to give those publishers efficient ways to make their legacy available as published subject documentation, and therefore usable by topic maps and other semantic applications.

2 - A gentle introduction to Published Subjects

A main and original feature of topic maps is that they deal with subjects. A subject can be a unique individual object, like "Isaac Newton", "IBM, Inc." or "Paris (France)", or a class of such individuals, like "famous scientists", "software companies" or "towns", or a more abstract concept like "gravitation", "economic growth" or "baroque style".
In a nutshell, a subject can be anything deserving to be identified, named, represented and generally talked about - in short, whatever can be a subject of conversation.

  • A subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.

How do topic maps deal with a subject? First, they represent a subject formally as an abstract "topic". In XTM documents, a topic is represented by a <topic> XML element. A topic should represent a unique, well-defined, unambiguous subject. So far, so good, at least in the mind of a single topic map author. But topic map applications dream of interoperability. That means that topic map authors, users, and the computer applications dealing with topic maps must have ways to know whether two or more topics, in the same or in different topic maps, represent the same subject.

How can that be achieved? A topic map author can indicate the subject of a topic by referring to a document, or any other kind of resource, where the subject appears to be defined in a proper and unambiguous way. Such a resource will therefore be considered by the topic map author as a subject indicator. Provided with this resource, a human being will hopefully be able to know what subject the topic represents.

  • A subject indicator is a resource that is referred to by the topic map author to provide an unambiguous indication of the identity of a subject. Any resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.
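As an illustration only, the sketch below shows how such a reference might look in XTM 1.0 syntax, and how an application could read it with Python's standard XML parser. The topic id and the example.org address are placeholders invented for this example, not real subject indicators.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical topic: the id and the href are placeholders, not real resources.
XTM_FRAGMENT = f"""
<topicMap xmlns="{XTM_NS}" xmlns:xlink="{XLINK_NS}">
  <topic id="newton">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://example.org/people/isaac-newton.html"/>
    </subjectIdentity>
    <baseName><baseNameString>Isaac Newton</baseNameString></baseName>
  </topic>
</topicMap>
"""


def subject_indicator_refs(xtm_text):
    """Map each topic id to the subject indicator addresses it refers to."""
    root = ET.fromstring(xtm_text.strip())
    path = f"{{{XTM_NS}}}subjectIdentity/{{{XTM_NS}}}subjectIndicatorRef"
    refs = {}
    for topic in root.iter(f"{{{XTM_NS}}}topic"):
        # Only references placed directly under <subjectIdentity> are collected.
        refs[topic.get("id")] = [
            ref.get(f"{{{XLINK_NS}}}href") for ref in topic.findall(path)
        ]
    return refs


if __name__ == "__main__":
    print(subject_indicator_refs(XTM_FRAGMENT))
```

Note that nothing in the fragment tells the reader whether the referenced page was ever meant to be a subject indicator; that is exactly the situation the definition above describes.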

Since topic maps live in the Web universe, the subject indicator has to be an addressable (network-retrievable) resource. The reference to the subject indicator will therefore use some URI, which will both address the subject indicator and identify the subject. Computer applications will of course be happy to handle this subject identifier, since two topics with the same subject identifier clearly refer to the same subject indicator, and therefore represent the same subject.

  • A subject identifier is a URI used by a topic map author to identify and refer to a subject indicator.
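The rule that two topics with the same subject identifier represent the same subject can be sketched as a simple index lookup. This is a minimal illustration, not a merging algorithm from any specification: it assumes identifiers are compared as exact strings (no URI normalisation), and all topic ids and URIs below are hypothetical.

```python
from collections import defaultdict


def topics_by_identifier(topics):
    """Index topic ids by each subject identifier they use.

    `topics` is an iterable of (topic_id, identifiers) pairs.
    Assumption: identifiers are compared as exact strings.
    """
    index = defaultdict(list)
    for topic_id, identifiers in topics:
        for uri in identifiers:
            index[uri].append(topic_id)
    return dict(index)


def merge_candidates(topics):
    """Keep only the identifiers shared by more than one topic."""
    return {
        uri: ids
        for uri, ids in topics_by_identifier(topics).items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    # Hypothetical topics taken from two different topic maps.
    sample = [
        ("map-a#newton", ["http://example.org/psi/newton"]),
        ("map-b#t42", ["http://example.org/psi/newton"]),
        ("map-b#paris", ["http://example.org/psi/paris"]),
    ]
    print(merge_candidates(sample))
```

The sketch only reports which topics share an identifier; deciding how to combine their names and other characteristics is left to the application.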

Unfortunately, the scenario above is too simple to be sustainable. Subject indicators and subject identifiers defined only from the topic map author's end are likely to be untrustworthy and unstable. URIs and the resources they address are moving targets in the Web universe. The publishers of resources used as subject indicators might not even be aware of that use, and are likely to leave topic map authors, without prior notice, with meaningless identifiers and indicators, if any indicator at all.

Here the publishers enter the loop. If some publishers are aware of the whole problem, and want to provide topic map applications with stable, trustworthy, authoritative subject indicators and identifiers, the situation is far better. The publishers can provide sets of subject indicators and subject identifiers in a stable way, and declare their intention to keep them stable and trustworthy for topic maps and other applications. At that point, topic map authors are provided with published subjects, defined in published subject documentation sets, which come along with published subject indicators and published subject identifiers. Authors will use them as before, but the whole scenario will become really sustainable.

  • A published subject is a subject for which there exists at least one published subject indicator.
  • A published subject indicator is a subject indicator that is published and maintained at an advertised address for the purpose of facilitating topic map interchange and mergeability.
  • A published subject identifier is the canonical URI of a published subject indicator, chosen and declared by its publisher as the URI to be used within topic maps to identify the published subject.
  • A published subject documentation set is the complete set of documentation about a set of published subject indicators and identifiers, as published by its publisher.

The topic maps literature has used the acronym "PSI" for over a year. Note that it can expand to both "published subject indicator" and "published subject identifier". Those are two faces of the concept, one looking at humans (the indicator), and one looking at computers (the identifier).
Like Janus Bifrons over Roman doors, PSIs are guarantors of good communication between two universes.

3 - Glossary

The following terms and concepts will be used in this document and in further TC recommendations. Some of them are already defined and used by ISO 13250. Nevertheless, the TC proposes modifications to clarify some of them and their relationships with the new ones, and will send those proposals to ISO JTC1/SC34 for the relevant revision and extension of ISO 13250 terminology. Both the current ISO 13250 definition and the PubSubj TC proposal are given when necessary.

"Publisher" is used throughout in the sense defined in Dublin Core metadata (dc:publisher)
"Resource" is used throughout in the sense of "network-retrievable resource" (IETF) or "addressable resource" (ISO 13250)

  • subject

    as defined by ISO 13250 XTM
    A subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.
  • subject indicator

    as defined by ISO 13250 XTM
    A resource that is intended by the topic map author to provide a positive, unambiguous indication of the identity of a subject.

    definition proposal
    A resource that is referred to by the topic map author to provide an unambiguous indication of the identity of a subject. Any resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.
    See "published subject indicator"
  • subject identifier

    definition proposal

    A URI used by a topic map author to identify and refer to a subject indicator. When a subject identifier is declared by a publisher, in a published subject documentation set, to identify a published subject indicator, it is called a published subject identifier.
  • published subject

    definition proposal

    A subject for which there exists at least one published subject indicator.
  • published subject indicator

    as defined by ISO 13250 XTM
    A subject indicator that is published and maintained at an advertised address for the purpose of facilitating topic map interchange and mergeability.
  • published subject identifier

    definition proposal
    The canonical URI of a published subject indicator, chosen and declared by its publisher as the URI to be used within topic maps to identify the published subject.
  • published subject documentation set

    definition proposal

    The complete set of documentation about a set of published subject indicators and identifiers, as published by its publisher.

4 - Recommendations for published subjects documentation

Given the considerable legacy of taxonomies, classifications, ontologies, databases and catalogues likely to be made available as published subject documentation sets, their publishers should not be constrained more than necessary to use a specific structure, syntax or language. Therefore, the present recommendation does not try to impose on publishers a unique standard structure for published subject documentation, a specific syntax for subject definition resources, or a specific syntax for subject indicator reference URIs. Nevertheless, it will suggest best practices for each existing relevant syntax.

Besides access to a set of PSIs, a published subject documentation set should include at least the following information, ensuring their efficient and trustworthy use.

  • Statement of purpose
  • Publisher and documentation metadata
  • Statement of documentation structure

4.1 - Statement of purpose

A published subject documentation set shall include a formal statement from its publisher, explicitly stating its conformance to this recommendation and its intention to keep the documentation trustworthy and the PSIs stable.

4.2 - Publisher and documentation metadata

A published subject documentation set shall include the following (Dublin Core) metadata.

  • Identity of the publisher (dc:publisher)
  • Identity of the documentation set (dc:identifier)
  • Format (dc:format)
  • Source of documentation (dc:source)
  • Creator (dc:creator) and contributors (dc:contributor)

    The above identities should themselves be defined as PSIs.

  • Title of the documentation (dc:title)
  • Language of publication (dc:language)
  • Date of publication or validation (dc:date)
  • Possible restrictions of use (dc:rights)

In addition to this metadata, the documentation may include recommendations for use and a list of registered users.
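A publisher or consumer could check a documentation set against the list above with a few lines of code. The sketch below is only one possible reading of the requirement: treating dc:contributor and dc:rights as optional (there may be no contributors and no restrictions) is an assumption of this example, and the dictionary representation of the metadata is hypothetical.

```python
# Dublin Core elements listed in section 4.2 that this sketch expects to be filled in.
EXPECTED_DC_ELEMENTS = (
    "dc:publisher",
    "dc:identifier",
    "dc:format",
    "dc:source",
    "dc:creator",
    "dc:title",
    "dc:language",
    "dc:date",
)

# Assumption of this sketch: these two may legitimately be absent.
OPTIONAL_DC_ELEMENTS = ("dc:contributor", "dc:rights")


def missing_metadata(metadata):
    """Return the expected Dublin Core elements that are absent or blank."""
    missing = []
    for name in EXPECTED_DC_ELEMENTS:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical, incomplete metadata record.
    record = {
        "dc:publisher": "http://example.org/psi/publishers#acme",
        "dc:identifier": "http://example.org/psi/countries/",
        "dc:title": "Countries",
        "dc:language": "en",
        "dc:date": "2002-02-20",
    }
    print(missing_metadata(record))
```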

4.3 - Statement of documentation structure

4.3.1 - A published subject documentation set should provide explicit information on the syntax used for its published subject identifiers. This syntax should as far as possible follow a consistent scheme throughout the documentation, e.g. a uniform namespace or query string structure.
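One way to read "uniform namespace" is that all identifiers of a set share the same base URI and differ only in their last component. The sketch below checks that property. The definition of "namespace" used here (everything up to the "#" if there is a fragment, otherwise up to the last "/") is an assumption of this example and is not prescribed by the recommendation; the URIs are hypothetical.

```python
def namespace_of(identifier):
    """Return the assumed namespace part of an identifier.

    Assumption: the namespace ends at '#' when a fragment is present,
    otherwise at the last '/'.
    """
    if "#" in identifier:
        return identifier.split("#", 1)[0] + "#"
    return identifier.rsplit("/", 1)[0] + "/"


def uses_uniform_namespace(identifiers):
    """True when all identifiers share one namespace (or the list is empty)."""
    return len({namespace_of(uri) for uri in identifiers}) <= 1


if __name__ == "__main__":
    # Hypothetical published subject identifiers.
    countries = [
        "http://example.org/psi/countries#fr",
        "http://example.org/psi/countries#de",
    ]
    print(uses_uniform_namespace(countries))
    print(uses_uniform_namespace(countries + ["http://example.org/other/it"]))
```

A set that uses a query string structure instead would need a different check, for instance comparing the path and the parameter names.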

4.3.2 - Throughout a published subject documentation set, the published subject indicators should follow a consistent and uniform structure (DTD, schema or some equivalent structure definition), allowing unambiguous understanding of their content. Such uniformity could also enable their parsing and processing by topic maps engines, search engines, intelligent agents and any foreseeable kind of semantic web application.

4.4 - Information provided by published subject indicators

A published subject indicator shall provide, following a formal structure as defined in 4.3.2, explicit information items establishing the published subject identity, that should include at least the following elements.

  • Identifier (dc:identifier)
    The canonical URI that is to be used as the published subject identifier.
  • Name (dc:subject)
    A name given to the subject.
  • Type (dc:type)
    A class of which the subject is an instance.
  • Description (dc:description)
    Can be text, an image or any kind of relevant resource describing the subject in a human-understandable way.
  • Equivalence
    Reference to equivalent published subject indicators in other published subject documentation sets.
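The information items above can be pictured as a small record. The following sketch is purely illustrative: the class and field names are invented for this example, the sample values are placeholders, and since the list gives no Dublin Core element for equivalence, that item is kept outside the Dublin Core mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedSubjectIndicator:
    """Hypothetical in-memory form of the items listed in section 4.4."""

    identifier: str          # dc:identifier, the published subject identifier
    name: str                # dc:subject
    subject_type: str        # dc:type
    description: str         # dc:description
    equivalents: tuple = ()  # identifiers of equivalent PSIs in other sets

    def to_dublin_core(self):
        """Return the four items that the list maps to Dublin Core elements."""
        return {
            "dc:identifier": self.identifier,
            "dc:subject": self.name,
            "dc:type": self.subject_type,
            "dc:description": self.description,
        }

    def missing_items(self):
        """Return the names of the four mapped items that are blank."""
        return [k for k, v in self.to_dublin_core().items() if not v.strip()]


if __name__ == "__main__":
    # Placeholder values, not a real published subject.
    france = PublishedSubjectIndicator(
        identifier="http://example.org/psi/countries#fr",
        name="France",
        subject_type="http://example.org/psi/types#country",
        description="The French Republic.",
        equivalents=("http://example.net/geo/FR",),
    )
    print(france.to_dublin_core())
    print(france.missing_items())
```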

5 - Best practices for published subject documentation structure

To be delivered - this part will provide examples of, or references to, published subject documentation sets conformant to the present recommendation, in various relevant formats, such as XTM, RDF or XHTML.