OASIS Topic Maps Published Subjects Technical Committee

OASIS Topic Maps Published Subjects TC Deliverables
1. Documentation of Published Subjects - Requirements and Recommendations

Version 0.4 - updated March 2, 2002
Latest version: http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm
Editor: Bernard Vatant



Status of this document: Working Draft

This recommendation addresses the "shall, should, may" of:
Documentation of Published Subjects - TC Process Requirements




1 - Statement of Purpose

The OASIS Topic Maps Published Subjects Technical Committee was established to support the application of the Topic Maps specification, ISO 13250, by providing recommendations for the documentation, management and use of published subjects. The underlying premise is that topic map interoperability needs unambiguous definitions of subjects (represented by topics), which should be provided by stable resources made available online through a trustworthy publication process.

Those resources, organised in published subject documentation sets, will provide both published subject indicators (human-understandable, unambiguous definitions of subjects) and published subject identifiers (stable URIs fit for computer processing, topic map interoperability and merging, and many other foreseeable semantic applications).

The purpose of this document is to provide recommendations for the content and structure of published subject documentation sets. Those recommendations are aimed at publishers of ontologies, classifications, taxonomies, thesauri, registries, catalogues, databases and similar resources. They give those publishers efficient ways to make their legacy content available as published subject documentation, and therefore usable by topic maps and other semantic applications.

2 - A gentle introduction to Published Subjects

A main and original feature of topic maps is that they deal with subjects. A subject can be a unique individual object, like "Isaac Newton", "IBM, Inc." or "Paris (France)"; a class of such individuals, like "famous scientists", "software companies" or "towns"; or a more abstract concept, like "gravitation", "economic growth" or "baroque style".
In a nutshell, a subject can be anything deserving to be identified, named, represented and generally talked about - in short, whatever can be a subject of conversation.

  • A subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.

How do topic maps deal with a subject? First, they represent a subject formally as an abstract "topic". In XTM documents, a topic is represented by a <topic> XML element. A topic should represent a unique, well-defined, unambiguous subject. So far, so good, at least in the mind of a single topic map author. But topic map applications aspire to interoperability. That means that topic map authors, users, and the computer applications dealing with them must have ways to know whether two or more topics, in the same or different topic maps, represent the same subject.

How can that be achieved? A topic map author can indicate the subject of a topic by referring to a document, or any other kind of resource, in which the subject appears to be defined properly and unambiguously. The topic map author therefore treats such a resource as a subject indicator. Provided with this resource, a human being will hopefully be able to know what subject the topic represents.

  • A subject indicator is a resource that is referred to by the topic map author to provide an unambiguous indication of the identity of a subject. Any resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.

Since topic maps live in the Web universe, the subject indicator has to be an addressable (network-retrievable) resource. The reference to the subject indicator will therefore use some URI, which will both address the subject indicator and identify the subject. Computer applications will of course be happy to handle this subject identifier, since two topics with the same subject identifier clearly refer to the same subject indicator, and therefore represent the same subject.

  • A subject identifier is a URI used by a topic map author to identify and refer to a subject indicator.
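
The rule that two topics with the same subject identifier represent the same subject can be sketched in a few lines of Python. This is only an illustration under stated assumptions: topics are reduced to hypothetical (topic id, subject identifier) pairs rather than XTM elements, the URIs are invented from the example namespace used later in this document, and identifiers are compared by exact string equality, which this document does not specify.

```python
from collections import defaultdict


def group_by_subject_identifier(topics):
    """Group topic ids by the subject identifier they declare.

    `topics` is an iterable of (topic_id, subject_identifier) pairs.
    This flat shape is an assumption made for illustration only.
    """
    groups = defaultdict(list)
    for topic_id, subject_identifier in topics:
        groups[subject_identifier].append(topic_id)
    return dict(groups)


# Hypothetical topics drawn from two different topic maps.
topics = [
    ("map-a#newton", "http://psi.organization-foo/bar/isaac-newton"),
    ("map-b#t42", "http://psi.organization-foo/bar/isaac-newton"),
    ("map-b#t43", "http://psi.organization-foo/bar/gravitation"),
]

same_subject = group_by_subject_identifier(topics)

# Topics sharing an identifier are candidates for merging.
for identifier, topic_ids in same_subject.items():
    if len(topic_ids) > 1:
        print(identifier, "is shared by", topic_ids)
```

Here "map-a#newton" and "map-b#t42" end up in the same group, so an application could treat them as representing one subject.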

Unfortunately, the scenario above is too simple to be sustainable. Subject indicators and subject identifiers defined only from the topic map author's end are likely to be untrustworthy and unstable. URIs and the resources they address are moving targets in the Web universe. The publishers of resources used as subject indicators might not even be aware of that use, and may, without notice, leave topic map authors with meaningless identifiers and indicators, or with no indicator at all.

Here the publishers enter the loop. If some publishers are aware of the problem, and want to provide topic map applications with stable, trustworthy, authoritative subject indicators and identifiers, the situation is far better. The publishers can provide sets of subject indicators and subject identifiers in a stable way, and declare their intention to keep them stable and trustworthy for topic maps and other applications. At that point, topic map authors are provided with published subjects, defined in published subject documentation sets, which come with published subject indicators and published subject identifiers. Authors will use them as before, but the whole scenario becomes genuinely sustainable.

  • A published subject is a subject for which there exists at least one published subject indicator.
  • A published subject indicator is a subject indicator that is published and maintained at an advertised address for the purpose of facilitating topic map interchange and mergeability.
  • A published subject identifier is the canonical URI of a published subject indicator, chosen and declared by its publisher as the URI to be used within topic maps to identify the published subject.
  • A published subject documentation set - PS DocSet - is the complete set of documentation about a set of published subject indicators and identifiers, as published by its publisher.

The topic maps literature has used the acronym "PSI" for over a year. Note that it can expand to both "published subject indicator" and "published subject identifier". Those are two faces of the same concept, one facing humans (the indicator) and one facing computers (the identifier).
Like Janus Bifrons over Roman doors, PSIs are warrants of a good communication between two universes ...

3 - Glossary

The following terms and concepts will be used in this document and in further TC recommendations. Some of them are already defined and used by ISO 13250. Nevertheless, the TC proposes some modifications to clarify some of them and their relationships with the new ones, and will send those proposals to ISO/IEC JTC1/SC34 for the relevant revision and extension of ISO 13250 terminology. Both the current ISO 13250 definition and the PubSubj TC proposal are given where necessary.

"Publisher" is used throughout in the sense defined in Dublin Core metadata (dc:publisher)
"Resource" is used throughout in the sense of "network-retrievable resource" (IETF) or "addressable resource" (ISO 13250)

  • subject

    as defined by ISO 13250 XTM
    A subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.
  • subject indicator

    as defined by ISO 13250 XTM
    A resource that is intended by the topic map author to provide a positive, unambiguous indication of the identity of a subject.

    definition proposal
    A resource that is referred to by the topic map author to provide an unambiguous indication of the identity of a subject. Any resource can become a subject indicator by being referred to as such from within some topic map, whether or not it was intended by its publisher to be a subject indicator.
    See "published subject indicator"
  • subject identifier

    definition proposal

    A URI used by a topic map author to identify and refer to a subject indicator. When a subject identifier is declared by a publisher, in a published subject documentation set, to identify a published subject indicator, it is called a published subject identifier.
  • published subject

    definition proposal

    A subject for which there exists at least one published subject indicator.
  • published subject indicator - PS Indicator

    as defined by ISO 13250 XTM
    A subject indicator that is published and maintained at an advertised address for the purpose of facilitating topic map interchange and mergeability.
  • published subject identifier - PS Identifier

    definition proposal
    The canonical URI of a published subject indicator, chosen and declared by its publisher as the URI to be used within topic maps to identify the published subject.
  • published subject documentation set - PS DocSet

    definition proposal

    The complete set of documentation about a set of published subject indicators and identifiers, as published by its publisher.

4 - Requirements for PS DocSet content

A PS DocSet shall contain at least the following mandatory elements:

  • Statement of Purpose
  • Statement of PS DocSet structure and format
  • PS DocSet metadata
  • Homogeneous PSI set

4.1 - Statement of Purpose

A PS DocSet shall include the following formal statement from its publisher, explicitly stating its conformance to this recommendation and its intention to keep the documentation trustworthy and its URIs stable.

This namespace "http://psi.organization-foo/bar/" is dedicated by its publisher, "organization-foo"
to host a permanent and stable Published Subject Documentation Set,
in conformance with the Requirements and Recommendations of the OASIS Topic Maps Published Subjects Technical Committee:
http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm

4.2 - Statement of PS DocSet structure and format

4.2.1 - A single URI shall be used both to identify the PS DocSet, and to provide a namespace for its PS Identifiers.
All PS DocSet elements shall be identified by URIs belonging to that namespace.

Remark: The wording above certainly needs improvement.
What is intended is that if the PS DocSet is identified by http://psi.organization-foo/bar/
... then all PSIs in the PS DocSet shall be identified by URIs like
http://psi.organization-foo/bar/unameit
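
The intent of 4.2.1 can be sketched as a simple prefix test in Python. This is a minimal illustration, not a normative check: the function name is hypothetical, the comparison is a plain string prefix match, and the requirement that a PS Identifier add a non-empty local part to the PS DocSet URI is an assumption drawn from the "unameit" example above.

```python
def in_docset_namespace(docset_uri, ps_identifier):
    """Return True if ps_identifier extends docset_uri with a non-empty local part.

    Plain string prefix matching is an assumption; this recommendation
    does not define how URIs are compared.
    """
    return (
        ps_identifier.startswith(docset_uri)
        and len(ps_identifier) > len(docset_uri)
    )


# The PS DocSet URI used as an example in this document.
DOCSET_URI = "http://psi.organization-foo/bar/"

print(in_docset_namespace(DOCSET_URI, "http://psi.organization-foo/bar/unameit"))
print(in_docset_namespace(DOCSET_URI, "http://psi.organization-foo/other/unameit"))
```

The first call accepts the identifier from the remark above; the second rejects a URI that lies outside the PS DocSet namespace.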

4.2.2 - A PS DocSet shall provide explicit declaration of its structure, format and syntax.

  • Syntax used for all PS Identifiers shall follow a consistent and declared format throughout the PS DocSet.
  • Syntax and structure used for all PS Indicators shall be uniform and declared explicitly, by reference to some XML DTD, Schema or any other equivalent structure definition.

4.3 - PS DocSet metadata

A PS DocSet shall include the following mandatory Dublin Core metadata.

  • Type (dc:type)
    The declaration of the resource as a PS DocSet, by reference to a core PSI for PS DocSet
  • Identifier (dc:identifier)
    The canonical PS DocSet URI namespace
  • Subject (dc:subject)
    A declaration of the general PS DocSet subject, domain or scope
  • Publisher (dc:publisher)
    The publisher is the legal authority appearing in the Statement of Purpose
  • Language (dc:language)
    The default language of publication used by PS Indicators
  • Format (dc:format)
    The format, language or syntax in which the PS Indicators are expressed
  • Date (dc:date)
    Date of publication, latest validation or revision

It may also include the following optional Dublin Core metadata:

  • Title (dc:title)
    A usage name or title for the PS DocSet
  • Description (dc:description)
    Complementary relevant information not contained in (dc:subject) element
  • Creator (dc:creator)
  • Contributor (dc:contributor)
  • Conditions of use (dc:rights)
  • Source (dc:source)
  • Coverage (dc:coverage)

In addition to this metadata, the PS DocSet may include recommendations for use, a list of registered users, or any other relevant information.
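
A publisher could check the mandatory metadata of section 4.3 along the following lines. This is a sketch under stated assumptions: the metadata is held in a plain Python dictionary keyed by the dc: names listed above, which is an illustrative choice and not a format required by this recommendation, and every value in the sample record is a placeholder (in particular, the dc:type value stands in for a core PSI that this document does not yet define).

```python
# The seven mandatory elements listed in section 4.3.
MANDATORY_DOCSET_METADATA = (
    "dc:type",
    "dc:identifier",
    "dc:subject",
    "dc:publisher",
    "dc:language",
    "dc:format",
    "dc:date",
)


def missing_docset_metadata(metadata):
    """Return the mandatory Dublin Core elements that are absent or empty."""
    return [name for name in MANDATORY_DOCSET_METADATA if not metadata.get(name)]


# Hypothetical metadata record; all values are placeholders.
docset_metadata = {
    "dc:type": "placeholder-for-the-core-PS-DocSet-PSI",
    "dc:identifier": "http://psi.organization-foo/bar/",
    "dc:subject": "Example subject domain",
    "dc:publisher": "organization-foo",
    "dc:language": "en",
    "dc:format": "XTM",
    # "dc:date" is deliberately left out to show the check at work.
}

missing = missing_docset_metadata(docset_metadata)
print(missing)  # prints ['dc:date']
```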

4.4 - Homogeneous PSI Set

4.4.1 - Every PS Indicator in a PS DocSet shall be identified by, and retrievable through, a unique canonical URI.
This canonical URI is the corresponding PS Identifier,
uniquely defined in the PS DocSet namespace.

4.4.2 - Throughout a PS DocSet, all PS Indicators shall follow the same formal structure, as declared in 4.2.2.

4.4.3 - A PS Indicator shall include at least the following Dublin Core elements:

  • Identifier (dc:identifier)
    The canonical URI that shall be used as the PS Identifier.
    This URI shall be unique, and defined in the PS DocSet namespace.
  • Language (dc:language)
    Language in which subject, type, and description are expressed, if different from the default PS DocSet language.
  • Subject (dc:subject)
    A name given to the subject that is identified by the PS Identifier.
    This name shall be unique in the PS DocSet namespace, in a given language scope.
  • Type (dc:type)
    A class of which the subject is an instance. This class should itself be defined by a PSI.

  • Description (dc:description)
    Text, image or any kind of relevant resource, describing the subject in a non-ambiguous, human-understandable way.
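
The requirements of 4.4.1 and 4.4.3 can be sketched as a consistency check over a set of PS Indicators. The sketch below makes several assumptions: each PS Indicator is a hypothetical Python dictionary keyed by the dc: names above, dc:language is treated as optional and falls back to the PS DocSet default language, URIs are compared as plain strings, and all sample identifiers and values are invented for illustration.

```python
# Elements every PS Indicator must carry; dc:language is handled separately
# because it is only needed when it differs from the PS DocSet default.
REQUIRED_INDICATOR_ELEMENTS = ("dc:identifier", "dc:subject", "dc:type", "dc:description")


def check_indicators(indicators, docset_uri, default_language):
    """Return a list of human-readable problems found in PS Indicator records."""
    problems = []
    seen_identifiers = set()
    seen_names = set()  # (language, name) pairs already used
    for position, indicator in enumerate(indicators):
        for element in REQUIRED_INDICATOR_ELEMENTS:
            if not indicator.get(element):
                problems.append(f"indicator {position}: missing {element}")
        identifier = indicator.get("dc:identifier")
        if identifier:
            if not identifier.startswith(docset_uri):
                problems.append(f"{identifier}: outside namespace {docset_uri}")
            if identifier in seen_identifiers:
                problems.append(f"{identifier}: duplicate PS Identifier")
            seen_identifiers.add(identifier)
        name = indicator.get("dc:subject")
        if name:
            language = indicator.get("dc:language") or default_language
            if (language, name) in seen_names:
                problems.append(f"{name!r} ({language}): duplicate subject name")
            seen_names.add((language, name))
    return problems


DOCSET_URI = "http://psi.organization-foo/bar/"

# Hypothetical PS Indicators; the second one reuses a name and lacks dc:type.
indicators = [
    {
        "dc:identifier": DOCSET_URI + "paris-france",
        "dc:subject": "Paris",
        "dc:type": DOCSET_URI + "town",
        "dc:description": "The capital city of France.",
    },
    {
        "dc:identifier": DOCSET_URI + "paris-texas",
        "dc:subject": "Paris",
        "dc:description": "A town in Texas, USA.",
    },
]

problems = check_indicators(indicators, DOCSET_URI, "en")
for problem in problems:
    print(problem)
```

With this sample data the check reports the missing dc:type on the second record and the reuse of the name "Paris" within the same language scope.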

5 - Recommended Syntaxes, and examples of PS DocSet

Given the considerable legacy of taxonomies, classifications, ontologies, databases and catalogues likely to be made available as PS DocSets, their publishers should not be constrained to use any specific structure or syntax.

Therefore, the present recommendation will not impose on publishers either a unique standard structure for PS DocSets or a specific syntax for PSIs. Nevertheless, it will recommend best practices for a number of existing relevant syntaxes, listed below. This list does not claim to be exhaustive, and does not preclude any other present or future format and structure that would fit the requirements expressed in section 4.

5.1 - Recommendations for PS DocSet using XTM

Draft Proposals submitted to TC

5.2 - Recommendations for PS DocSet using RDF

To be delivered

5.3 - Recommendations for PS DocSet using XHTML

To be delivered