OASIS Topic Maps Published Subjects TC
Deliverables
1. Documentation of Published Subjects - Requirements and Recommendations
Version 0.4 - updated March 2, 2002
Latest version : http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm
Editor: Bernard Vatant
Status of this document: Working Draft
This recommendation addresses the "shall, should, may" of:
Documentation of Published Subjects
- TC Process Requirements
1 - Statement of Purpose
The OASIS Topic Maps Published Subjects Technical Committee has been set
up to support the application of the Topic Maps specification, ISO
13250, by providing recommendations for the documentation, management
and use of published subjects. The underlying rationale is that topic map
interoperability requires unambiguous definitions of subjects (represented
by topics), and that those definitions should be provided by stable
resources, made available online through a trustable publication process.
Those resources, organised in published subject documentation sets,
will provide
both published subject indicators (human-understandable non-ambiguous
definition of subjects) and published subject identifiers (stable
URIs fit for computer processing, topic maps interoperability and merging,
and many other foreseeable semantic applications).
The purpose of this document is to provide recommendations for the content
and structure of published subject documentation sets. Those recommendations
are aimed at publishers of ontologies, classifications, taxonomies, thesauri,
registries, catalogues, databases and similar resources, and are meant to give
those publishers efficient ways to make their legacy content available as
published subject documentation, and therefore usable by topic maps and other
semantic applications.
2 - A gentle introduction to Published Subjects
A main and original feature of topic maps is that they deal with subjects.
A subject can be a unique individual object, like "Isaac Newton",
"IBM, Inc.", or "Paris (France)" ... or a class of
such individuals, like "famous scientists", "software companies"
or "towns" ... or a more abstract concept like "gravitation",
"economic growth" or "baroque style" ...
In a nutshell, a subject can be anything deserving to be
identified, named, represented and generally talked about - in short,
whatever can be a subject of conversation.
- A subject is anything whatsoever, regardless of whether it exists
or has any other specific characteristics, about which anything whatsoever
may be asserted by any means whatsoever.
How do topic maps deal with a subject? First, they represent a subject
formally as an abstract "topic". In XTM documents, a topic is
represented by a <topic> XML element. A topic should represent a
unique, well-defined, unambiguous subject. So far, so good, at least
in the mind of a single topic map author. But topic map applications
dream of interoperability. That means that topic map authors, users,
and the computer applications dealing with them must have ways to know whether
two or more topics in the same or different topic maps represent the
same subject.
How can that be achieved? A topic map author can indicate what the
subject of a topic is by referring to a document, or any other kind
of resource, where the subject appears to be defined in a proper
and unambiguous way. Such a resource will therefore be considered by
the topic map author as a subject indicator. Provided with this
resource, a human being will hopefully be able to know what subject
this topic represents.
- A subject indicator is a resource that is referred to by the topic map
author to provide an unambiguous indication of the identity of a subject.
Any resource can become a subject indicator by being referred to as
such from within some topic map, whether or not it was intended by its
publisher to be a subject indicator.
Since topic maps live in the Web universe, the subject indicator has to be an
addressable (network-retrievable) resource. The reference to the subject
indicator will therefore use some URI, which will both address the
subject indicator and identify the subject. Computer applications
will of course be happy to handle this subject identifier, since
two topics with the same subject identifier clearly refer to the same
subject indicator, and therefore represent the same subject.
- A subject identifier is a URI used by a topic map author to identify and
refer to a subject indicator.
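The identity rule described above (two topics with the same subject identifier represent the same subject) can be illustrated with a minimal Python sketch. It assumes topics are available as (topic id, subject identifier) pairs and compares URIs as exact strings, without any normalization; the topic ids and the psi.example.org URIs are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def group_by_subject_identifier(topics):
    """Group topic ids by the subject identifier (URI) they refer to.

    `topics` is an iterable of (topic_id, subject_identifier) pairs.
    URIs are compared as exact strings (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for topic_id, uri in topics:
        groups[uri].append(topic_id)
    return dict(groups)


def merge_candidates(topics):
    """Keep only the identifiers shared by two or more topics."""
    groups = group_by_subject_identifier(topics)
    return {uri: ids for uri, ids in groups.items() if len(ids) > 1}


# Hypothetical topics taken from two different topic maps.
topics = [
    ("map-a#newton", "http://psi.example.org/people/isaac-newton"),
    ("map-b#t42", "http://psi.example.org/people/isaac-newton"),
    ("map-b#paris", "http://psi.example.org/places/paris-france"),
]

candidates = merge_candidates(topics)
```

Here `candidates` holds a single entry: the two topics that point to the same identifier and are therefore taken to represent the same subject.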
Unfortunately, the above scenario is too simple to be sustainable. Subject
indicators and subject identifiers defined only from the topic map author's
end are likely to be untrustworthy and unstable. URIs and the resources
they address are moving targets in the Web universe. The publishers of
resources used as subject indicators might not even be aware of that use, and
are likely to leave topic map authors with meaningless identifiers and
indicators, if any indicator at all, without prior notice.
Here the publishers enter the loop. If some publishers are aware of
the whole problem, and want to provide topic map applications with stable,
trustable, authoritative subject indicators and identifiers, the situation
is far better. The publishers can provide sets of subject indicators and
subject identifiers in a stable way, and declare their intention to keep
them stable and trustable for topic maps and other applications. At that
point, topic map authors are provided with published subjects,
defined in published subject documentation sets, coming along with
published subject indicators and published subject identifiers.
They will use them as before, but the whole scenario will become really
sustainable.
- A published subject is a subject for which there exists at least one published
subject indicator.
- A published subject indicator is a subject indicator that is published
and maintained at an advertised address for the purpose of facilitating
topic map interchange and mergeability.
- A published subject identifier is the canonical URI of a published subject
indicator, chosen and declared by its publisher as the URI to be used within
topic maps to identify the published subject.
- A published subject documentation set - PS DocSet - is the complete set
of documentation about a set of published subject indicators and identifiers,
as published by its publisher.
The topic maps literature has used the acronym "PSI" for over a year.
Note that it can expand to both "published subject indicator"
and "published subject identifier". Those are two faces of the same
concept, one facing humans (the indicator), and one facing computers
(the identifier).
Like Janus Bifrons over Roman doors, PSIs are guarantors of good
communication between two universes ...
3 - Glossary
The following terms and concepts will be used in this document and further
TC recommendations.
Some of them are already defined and used by ISO 13250. Nevertheless,
the TC proposes some modifications to clarify some of them and their relationships
with new ones, and will send those proposals to ISO JTC1/SC34 for relevant
revision and extension of ISO 13250 terminology. Both the current ISO 13250
definition and the PubSubj TC proposal are given when necessary.
"Publisher" is used throughout in the sense defined in Dublin
Core metadata (dc:publisher)
"Resource" is used throughout in the sense of "network-retrievable
resource" (IETF) or "addressable resource" (ISO 13250)
- subject
as defined by ISO 13250 XTM
A subject is anything whatsoever, regardless of whether it exists or
has any other specific characteristics, about which anything whatsoever
may be asserted by any means whatsoever.
- subject indicator
as defined by ISO 13250 XTM
A resource that is intended by the topic map author to provide a positive,
unambiguous indication of the identity of a subject.
definition proposal
A resource that is referred to by the topic map author to
provide an unambiguous indication of the identity of a subject. Any
resource can become a subject indicator by being referred to as such
from within some topic map, whether or not it was intended by its publisher
to be a subject indicator.
See "published subject indicator"
- subject identifier
definition proposal
A URI used by a topic map author to identify and refer to a subject
indicator. When a subject identifier is declared by a publisher, in
a published subject documentation set, to identify a published subject
indicator, it is called a published subject identifier.
- published subject
definition proposal
A subject for which there exists at least one published subject indicator.
- published subject indicator - PS Indicator
as defined by ISO 13250 XTM
A subject indicator that is published and maintained at an advertised
address for the purpose of facilitating topic map interchange and mergeability.
- published subject identifier - PS Identifier
definition proposal
The canonical URI of a published subject indicator, chosen and declared
by its publisher as the URI to be used within topic maps to identify
the published subject.
- published subject documentation set - PS DocSet
definition proposal
The complete set of documentation about a set of published subject indicators
and identifiers, as published by its publisher.
4 - Requirements for PS DocSet content
A PS DocSet shall contain at least the following mandatory elements:
- Statement of Purpose
- Statement of PS DocSet structure and format
- PS DocSet metadata
- Homogeneous PSI set
4.1 - Statement of Purpose
A PS DocSet shall include the following formal statement from its publisher,
explicitly stating its conformance to this recommendation, and its intention
to keep the documentation trustable and its URIs stable.
This namespace "http://psi.organization-foo/bar/" is dedicated by
its publisher, "organization-foo",
to host a permanent and stable Published Subject Documentation Set,
in conformance with the Requirements and Recommendations of the OASIS Topic Maps
Published Subjects Technical Committee:
http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm
4.2 - Statement of PS DocSet structure and format
4.2.1 - A single URI shall be used both to identify the PS DocSet, and to
provide a namespace for its PS Identifiers.
All PS DocSet elements shall be identified by URIs belonging to that namespace.
Remark: The wording of the above certainly needs improvement.
What is intended is that if the PS DocSet is identified by http://psi.organization-foo/bar/
... then all PSIs in the PS DocSet shall be identified by URIs like http://psi.organization-foo/bar/unameit
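As an illustration of the intent stated in the remark, the following Python sketch checks that a list of PS Identifiers falls inside the PS DocSet namespace. It is only one possible reading: it assumes the namespace URI ends with "/", uses plain string prefix matching, and requires a non-empty local part after the namespace. The function names are hypothetical.

```python
def in_namespace(identifier, namespace):
    """True if `identifier` starts with `namespace` and adds a local part.

    Assumes `namespace` ends with "/" so that a sibling such as
    ".../barbaz" is not mistaken for a member of ".../bar/".
    """
    return identifier.startswith(namespace) and len(identifier) > len(namespace)


def identifiers_outside_namespace(identifiers, namespace):
    """Return the identifiers that do not belong to the namespace."""
    return [uri for uri in identifiers if not in_namespace(uri, namespace)]


NAMESPACE = "http://psi.organization-foo/bar/"

offenders = identifiers_outside_namespace(
    [
        "http://psi.organization-foo/bar/unameit",
        "http://psi.organization-foo/other/unameit",
    ],
    NAMESPACE,
)
```

With this input, `offenders` contains only the second URI, which lies outside the declared namespace.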
4.2.2 - A PS DocSet shall provide an explicit declaration of its structure,
format and syntax.
- Syntax used for all PS Identifiers shall follow a consistent and
declared format throughout the PS DocSet.
- Syntax and structure used for all PS Indicators shall be uniform and declared
explicitly, by reference to some XML DTD, Schema or any other equivalent
structure definition.
4.3 - PS DocSet metadata
A PS DocSet shall include the following mandatory Dublin Core metadata.
- Type (dc:type)
The declaration of the resource as a PS DocSet, by reference to a core
PSI for PS DocSet
- Identifier (dc:identifier)
The canonical PS DocSet URI namespace
- Subject (dc:subject)
A declaration of the general PS DocSet subject, domain or scope
- Publisher (dc:publisher)
The publisher is the legal authority appearing in the Statement of Purpose
- Language (dc:language)
The default language of publication used by PS Indicators
- Format (dc:format)
The format, language or syntax in which the PS Indicators are expressed
- Date (dc:date)
Date of publication, latest validation or revision
It may also include the following optional Dublin Core metadata.
- Title (dc:title)
A usage name or title for the PS DocSet
- Description (dc:description)
Complementary relevant information not contained in the (dc:subject) element
- Creator (dc:creator)
- Contributor (dc:contributor)
- Conditions of use (dc:rights)
- Source (dc:source)
- Coverage (dc:coverage)
In addition to those metadata, the PS DocSet may include various recommendations
for use, a list of registered users, or any other relevant information item.
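The mandatory and optional metadata of section 4.3 lend themselves to a simple completeness check. The Python sketch below is one way to express it, assuming the metadata is held in a dictionary keyed by Dublin Core element name without the "dc:" prefix. The dictionary layout, the function name and the sample values (including the core PSI URI under psi.example.org) are hypothetical, since this document does not prescribe any representation.

```python
# Dublin Core elements listed as mandatory and optional in section 4.3.
MANDATORY = ("type", "identifier", "subject", "publisher",
             "language", "format", "date")
OPTIONAL = ("title", "description", "creator", "contributor",
            "rights", "source", "coverage")


def check_docset_metadata(metadata):
    """Return (missing, unknown) element names for a PS DocSet.

    `missing` lists mandatory elements that are absent or blank;
    `unknown` lists keys that are neither mandatory nor optional.
    """
    missing = [name for name in MANDATORY
               if not str(metadata.get(name) or "").strip()]
    unknown = sorted(set(metadata) - set(MANDATORY) - set(OPTIONAL))
    return missing, unknown


docset_metadata = {
    "type": "http://psi.example.org/core/ps-docset",  # hypothetical core PSI
    "identifier": "http://psi.organization-foo/bar/",
    "subject": "Famous scientists",
    "publisher": "organization-foo",
    "language": "en",
    "format": "XTM",
    "title": "Bar published subjects",
}

missing, unknown = check_docset_metadata(docset_metadata)
```

In this example the date element has been left out, so `missing` reports it, while `unknown` stays empty.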
4.4 - Homogeneous PSI Set
4.4.1 - Every PS Indicator in a PS DocSet shall be identified by, and retrievable
through, a unique canonical URI.
This canonical URI is the corresponding PS Identifier,
uniquely defined in the PS DocSet namespace.
4.4.2 - Throughout a PS DocSet, all PS Indicators shall follow the same formal
structure, as declared in 4.2.2.
4.4.3 - A PS Indicator shall include at least the following Dublin Core elements:
- Identifier (dc:identifier)
The canonical URI that shall be used as the PS Identifier.
This URI shall be unique, and defined in the PS DocSet namespace.
- Language (dc:language)
Language in which subject, type, and description are expressed - if
different from the default PS DocSet language.
- Subject (dc:subject)
A name given to the subject that is identified by the PS Identifier.
This name shall be unique in the PS DocSet namespace, in a given language
scope.
- Type (dc:type)
A class of which the subject is an instance. This class should itself be
defined by its PSI.
- Description (dc:description)
Text, image or any kind of relevant resource, describing the subject
in an unambiguous, human-understandable way.
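The constraints of section 4.4 on individual PS Indicators can be sketched as a validation pass in Python. The sketch assumes each PS Indicator is a dictionary keyed by Dublin Core element name, treats dc:language as optional (falling back to the PS DocSet default language), and checks that identifiers are unique and inside the namespace and that subject names are unique per language. The record layout, function name, message wording and sample URIs are all hypothetical.

```python
REQUIRED = ("identifier", "subject", "type", "description")


def check_indicators(indicators, namespace, default_language):
    """Return a list of human-readable problems found in PS Indicators."""
    problems = []
    seen_identifiers = set()
    seen_names = set()  # (language, subject name) pairs
    for indicator in indicators:
        identifier = indicator.get("identifier", "")
        label = identifier or "?"
        for name in REQUIRED:
            if not indicator.get(name):
                problems.append(f"{label}: missing dc:{name}")
        if identifier:
            if not identifier.startswith(namespace):
                problems.append(f"{label}: outside namespace {namespace}")
            if identifier in seen_identifiers:
                problems.append(f"{label}: duplicate dc:identifier")
            seen_identifiers.add(identifier)
        subject = indicator.get("subject")
        language = indicator.get("language") or default_language
        if subject:
            if (language, subject) in seen_names:
                problems.append(
                    f"{label}: dc:subject {subject!r} already used "
                    f"in language {language!r}")
            seen_names.add((language, subject))
    return problems


NAMESPACE = "http://psi.organization-foo/bar/"
indicators = [
    {"identifier": NAMESPACE + "newton", "subject": "Isaac Newton",
     "type": NAMESPACE + "scientist", "description": "English physicist."},
    {"identifier": NAMESPACE + "newton-fr", "language": "fr",
     "subject": "Isaac Newton", "type": NAMESPACE + "scientist",
     "description": "Physicien anglais."},
    {"identifier": NAMESPACE + "newton-2", "subject": "Isaac Newton",
     "type": NAMESPACE + "scientist", "description": ""},
]

problems = check_indicators(indicators, NAMESPACE, default_language="en")
```

The second record reuses the name "Isaac Newton" in a different language scope, which the name rule allows. The third record is reported twice: once for its empty description and once for repeating a name already used in the default language.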
5 - Recommended Syntaxes, and examples of PS DocSet
Given the considerable legacy of taxonomies, classifications, ontologies,
databases and catalogues likely to be made available as PS DocSets, their
publishers should not be constrained to use any specific structure or
syntax.
Therefore, the present recommendation will not impose on publishers
either a unique standard structure for PS DocSets or a specific syntax
for PSIs. Nevertheless, it will recommend best practices for a certain
number of existing relevant syntaxes, listed below. This list does not
claim to be exhaustive, and does not preclude any other present or future
format and structure that would fit the requirements expressed in section
4.
5.1 - Recommendations for PS DocSet using XTM
Draft Proposals submitted to TC
5.2 - Recommendations for PS DocSet using RDF
To be delivered
5.3 - Recommendations for PS DocSet using XHTML
To be delivered