OASIS Topic Maps Published Subjects Technical Committee

 

OASIS Topic Maps Published Subjects TC Deliverables
1. Documentation of Published Subjects - Requirements and Recommendations

Version 0.6 - Last release 2002, May 15

Latest version: http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm
Editor: Bernard Vatant

Status of this document:
Draft proposal, modified following the April 30 conference call

Note: To clarify the document, the parts of the previous version that proved too controversial have been removed from this version. Pending and currently discussed questions concerning those parts are gathered in the issues document.


1 - Scope and Statement of Purpose

The OASIS Topic Maps Published Subjects Technical Committee has been set up to promote widespread adoption of the topic maps specification ISO 13250, in particular by publishers and users of classifications, taxonomies, thesauri, registries, catalogues, directories, and similar resources, who need standard ways to make their legacy content available to topic map applications.

The first and main target of this recommendation is therefore topic map interoperability, achieved through efficient definition and identification of the subjects represented by topics in topic maps.

This initial target is likely to be extended in the future to a wide range of applications or technologies that make explicit use of abstract representations of subjects. These include, for example, applications leveraging ontologies or vocabularies, search engines, intelligent agents, and other Semantic Web tools, whether foreseeable or yet unknown. Throughout the present document, the generic term "applications" refers to topic map applications or any other such technologies.

Both the identification of subjects by applications and the definition of subjects for their human users can be provided by stable resources, made available through a trustworthy publication process as defined by the present and following recommendations. Subjects defined and identified in this way are called published subjects.

The purpose of this document is to set out recommendations for published subject documentation, which provides structured sets of published subject indicators (human-readable definitions of subjects), addressed and identified by published subject identifiers (stable URIs used by applications to identify the subjects).

2 - A gentle introduction to Published Subjects terminology

2.1 - Subjects and Topics

A subject can be an individual, like "Isaac Newton", "IBM, Inc.", or "Paris (France)"; a class of such individuals, like "famous scientists", "software companies", or "towns"; or a more abstract concept, like "gravitation", "economic growth", or "baroque style". In short, a subject can be anything that deserves to be identified, named, represented, and generally talked about: in other words, a subject of conversation. The Topic Maps specification XTM 1.0 proposes an extremely general definition of a subject:

A subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.

Applications deal with a subject by handling a formal representation, or proxy, which, in conformance with topic maps terminology, this document calls a topic.

A topic is a representation, inside an application, of a unique, well-defined, and unambiguous subject.

2.2 - Subject Indicators and Subject Identifiers

Interoperability of applications requires that authors, users, and applications themselves have ways to agree whether two or more topics, in the same application or in different ones, represent the same subject.

How can that be achieved? An application can indicate to human users what the subject of a topic is by referring to a document, or any other kind of resource, in which the subject appears to be defined or described in a human-readable and unambiguous way. Such a resource is called a subject indicator. Given this resource, a human user will hopefully be able to tell which subject the topic represents.

A subject indicator is a resource that is referred to by an application, to provide an unambiguous indication of the identity of a subject to a human being.

In the Web universe, a subject indicator has to be an addressable (network-retrievable) resource. A reference to a subject indicator will therefore use a URI, which both addresses the subject indicator and identifies the subject. Applications use this URI as a subject identifier.

A subject identifier is a URI that refers to a subject indicator, and provides an unambiguous identification of a subject to an application.
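To make the identification mechanism concrete, here is a minimal Python sketch. It assumes, for illustration only, that each topic carries a set of subject identifier URIs and that two topics are taken to represent the same subject when they share at least one identifier, compared as exact strings. The `Topic` class, the `same_subject` function, and the example URIs are all hypothetical; a real application would at least normalize URIs before comparing them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-application proxy (topic) for a subject."""
    name: str
    # Subject identifiers: URIs addressing subject indicators.
    subject_identifiers: set[str] = field(default_factory=set)


def same_subject(a: Topic, b: Topic) -> bool:
    """Assumed rule: topics represent the same subject if they share
    at least one subject identifier (exact string comparison)."""
    return not a.subject_identifiers.isdisjoint(b.subject_identifiers)


# Hypothetical URIs, used only for illustration.
newton_1 = Topic("Isaac Newton", {"http://psi.example.org/people/newton"})
newton_2 = Topic("Newton, Isaac", {"http://psi.example.org/people/newton",
                                   "http://other.example.com/sci/newton"})
paris = Topic("Paris (France)", {"http://psi.example.org/places/paris"})

print(same_subject(newton_1, newton_2))  # True
print(same_subject(newton_1, paris))     # False
```

The two Newton topics match even though their names differ: under this rule, identity rests on the identifier, not on the name.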

2.3 - Published Subjects

Unfortunately, the scenario above is too simple to be sustainable. Any resource can become a subject indicator by being referred to as such by an application, whether or not its publisher intended it to be one. Subject indicators and subject identifiers defined in this way are likely to be unstable and untrustworthy: URIs, and the resources they address, are moving targets in the Web universe. The publishers of resources used as subject indicators might not even be aware of that use, and may, without prior notice, leave applications and users with meaningless identifiers and indicators, or with no indicator at all.

This is where publishers enter the loop. If publishers are aware of the problem and want to provide applications with stable, trustworthy, authoritative subject indicators and identifiers, the situation is far better. They can provide sets of subject indicators and subject identifiers published in a standard way, and declare their intention to keep them stable and trustworthy for topic maps and other applications. Applications and users are then provided with published subjects, defined in published subject documentation, which contains published subject indicators addressable through published subject identifiers. These are used as described above, but the scenario now becomes truly sustainable.

A published subject is a subject for which there exists at least one published subject indicator.

A published subject indicator is a subject indicator that is published and maintained at an advertised address in order to facilitate interoperability of applications.

A published subject identifier is the URI of a published subject indicator, chosen and declared by its publisher as the URI to be used by applications to identify the published subject.

The topic maps literature has coined the acronym "PSI", also used in the XTM 1.0 specification. Note that it can expand to either "published subject indicator" or "published subject identifier". These are two faces of the same concept, one turned towards humans (the indicator) and one turned towards computers (the identifier). Like Janus Bifrons over Roman doorways, PSIs warrant good communication between two universes.

ISSUE 21 - Glossary

3 - Published Subject Documentation - an overview

Publishers are unlikely to document individual, independent subjects, but rather sets of subjects relevant to a consistent domain of application, gathered in a structured resource or set of resources. Such resources are hereafter called PSD (Published Subject Documentation). This section gives an overview of PSD constituents and structure; more detailed recommendations about them will be given in later sections.

This section is inspired by Lars Marius Garshol's proposals in: Analysis of published subject documentation

3.1 - PSD Subjects Set

The PSD Subjects Set is the set of all subjects identified in a PSD.

ISSUE 22 - Subject equivalence

3.2 - PSD Constituents

3.2.1 - Subject Indicators Set:

The Subject Indicators Set is the set of all Subject Indicators used by a PSD.
Each Subject Indicator is a human-readable resource, providing a definition of a single subject in the PSD Subjects Set. There is a one-to-one correspondence between the Subjects Set and the Subject Indicators Set.

ISSUE 23 - Self-Referencing Subject Indicators
ISSUE 24 - Internal vs External Subject Indicators

3.2.2 - Subject Identifiers Set:

The Subject Identifiers Set is the set of all URIs identifying, and resolving to, Subject Indicators.
There is a one-to-one correspondence between the Subject Identifiers Set and the Subject Indicators Set.
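The one-to-one correspondence can be checked mechanically. The Python sketch below assumes, purely for illustration, that a PSD is represented as a mapping from each subject identifier (URI) to a label for the subject indicator it resolves to; the function name, the data layout, and the URIs are hypothetical. A dictionary already gives each identifier exactly one indicator, so what remains to check is that no indicator is reached twice and that every indicator is reached.

```python
from __future__ import annotations


def is_one_to_one(identifier_to_indicator: dict[str, str],
                  indicators: set[str]) -> bool:
    """Check that identifier -> indicator is a bijection onto `indicators`."""
    targets = list(identifier_to_indicator.values())
    # Injective: no two identifiers resolve to the same indicator.
    injective = len(targets) == len(set(targets))
    # Surjective: every indicator in the set has an identifier.
    surjective = set(targets) == indicators
    return injective and surjective


# Hypothetical identifiers and indicator labels.
psd_map = {
    "http://psi.example.org/geo/paris": "indicator-paris",
    "http://psi.example.org/geo/london": "indicator-london",
}
indicators = {"indicator-paris", "indicator-london"}

# A second URI resolving to the Paris indicator breaks the correspondence.
broken = {**psd_map, "http://psi.example.org/geo/paris-fr": "indicator-paris"}

print(is_one_to_one(psd_map, indicators))  # True
print(is_one_to_one(broken, indicators))   # False
```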

ISSUE 25 - Infinite Subject Sets

3.2.3 - Documentation Description:

The Documentation Description contains a human-readable definition of the Subject Indicators Set.
The description may be either extensional (an exhaustive list of Subject Indicators with human-readable pointers) or intensional (a definition of characteristic properties of the Subjects Set and/or the Subject Indicators Set). It may also contain information about the recommended domain of use.
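The difference between a description that lists its members and one that states a characteristic property can be sketched in Python. The sketch assumes, only for illustration, that the description defines membership in the Subject Identifiers Set: a listing is an explicit finite set, while a property is a rule that any candidate URI can be tested against. The URI prefix and the two-letter pattern are hypothetical examples, not a recommended PSI scheme.

```python
import re

# Description by listing: an explicit, finite set of (hypothetical) URIs.
listed_identifiers = {
    "http://psi.example.org/country/fr",
    "http://psi.example.org/country/gb",
}

# Description by property: any URI matching this hypothetical pattern
# (prefix followed by two lowercase letters) belongs to the set.
COUNTRY_PSI = re.compile(r"http://psi\.example\.org/country/[a-z]{2}")


def in_listed_set(uri: str) -> bool:
    return uri in listed_identifiers


def in_property_set(uri: str) -> bool:
    # fullmatch: the whole URI must match, not just a prefix of it.
    return COUNTRY_PSI.fullmatch(uri) is not None


print(in_listed_set("http://psi.example.org/country/it"))    # False: not listed
print(in_property_set("http://psi.example.org/country/it"))  # True: has the property
```

A rule of this kind can cover sets that would be impractical or impossible to list in full, at the cost of having to state the property precisely.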

3.2.4 - Documentation Metadata:

The Documentation Metadata are formal, machine-readable metadata about the PSD.
Examples: publisher identification, language, version, date of validation, domain, etc.

3.2.5 - Subject Formal Assertions:

The Subject Formal Assertions are intended to express, in a formal and machine-readable way, some of the information that belongs to the subject definition.
Examples: name, class, relationships with other subjects, etc.

ISSUE 26 - Consistency of Subject Indicators with Subject Formal Assertions

3.2.6 - Subject Indicator Metadata:

The Subject Indicator Metadata are formal, machine-readable metadata about the Subject Indicator.
Examples: type, format, etc.
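Since the packaging structure and format are still under discussion, the following is only a sketch of how the constituents listed above might be held in memory, assuming Python dataclasses. All class names, field names, and sample values are hypothetical. The sketch separates what is attached to the PSD as a whole (description and documentation metadata) from what is attached to each subject (identifier, indicator, formal assertions, indicator metadata).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PublishedSubject:
    identifier: str  # published subject identifier (URI used by applications)
    indicator: str   # published subject indicator (human-readable definition)
    # Subject formal assertions, e.g. name, class, related subjects.
    assertions: dict[str, str] = field(default_factory=dict)
    # Subject indicator metadata, e.g. type, format.
    indicator_metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class PSD:
    description: str  # documentation description (human-readable)
    # Documentation metadata, e.g. publisher, language, version, domain.
    metadata: dict[str, str] = field(default_factory=dict)
    subjects: list[PublishedSubject] = field(default_factory=list)

    def indicator_for(self, identifier: str) -> str | None:
        """Return the indicator addressed by `identifier`, or None."""
        for subject in self.subjects:
            if subject.identifier == identifier:
                return subject.indicator
        return None


# Hypothetical example PSD with a single subject.
psd = PSD(
    description="Hypothetical PSD for a few towns.",
    metadata={"publisher": "Example Publisher", "language": "en", "version": "0.1"},
    subjects=[
        PublishedSubject(
            identifier="http://psi.example.org/town/paris",
            indicator="Paris, capital city of France.",
            assertions={"name": "Paris", "class": "town"},
            indicator_metadata={"format": "text/html"},
        )
    ],
)

print(psd.indicator_for("http://psi.example.org/town/paris"))
```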

3.3 - PSD structure

This section will define the recommended packaging structure and format for the different PSD constituents.

Currently discussed - To be delivered

4 - Published Subject Documentation - specific recommendations

Currently discussed - To be delivered

5 - Use cases and examples

Currently discussed - To be delivered