Published Subjects - Definitions, Requirements and Examples

OASIS Topic Maps Published Subjects Technical Committee

OASIS Topic Maps Published Subjects TC Deliverables
1. Published Subjects - Definitions, Requirements and Examples

Version 0.3 - last updated 2002, September 26 (Previous version)
Latest version : http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/general.htm
Editor: Bernard Vatant

Status of this document : Final Draft

See also a re-wording proposal by Patrick Durusau

Editor's Notes:

1. This document is the first of a series of deliverables that will address definition, publication, management and best practices about Published Subjects. This first document provides only generic and minimal concepts, requirements and recommendations necessary to understand, define, provide and use Published Subjects.

2. Technical issues currently under discussion in the Technical Committee are not addressed by this document; they will be addressed by future deliverables. Among others, the following will be addressed by Deliverable 2:

Structure and interpretation of Published Subject Identifiers (URIs)
Structure and format of Published Subject Indicators
Examples of Subject Indicators in specific syntax (XTM, RDF, XHTML ...)
Content, structure and format of Subject Indicator Metadata
Distinction between Subject Metadata and Subject Indicator Metadata.

3. This document uses topic maps terminology, and is intended to be consistent with its use both in the ISO 13250 specification and in the Standard Application Model for Topic Maps (SAM), the latter currently under discussion. Terms like "subject", "topic", "subject indicator", "subject identifier", "published subject", "published subject indicator" and "published subject identifier" are introduced informally in this document. SAM provides normative formal definitions for those terms.
See http://www.isotopicmaps.org/sam/sam-model/ for current SAM editor's draft.

1 - Statement of Purpose

The OASIS Topic Maps Published Subjects Technical Committee has been set up to promote widespread adoption of the topic maps specification ISO 13250, particularly by publishers and users of classifications, taxonomies, thesauri, registries, catalogues, directories ... Those publishers and users need interoperable and standard ways to make their subjects clearly defined and available for topic map applications in their respective industries and communities, through a trustworthy and standard process.

The first and main target of this Technical Committee's recommendations is therefore topic maps interoperability, through efficient definition and identification of subjects represented by topics in topic maps.

This initial target is likely to be extended in the future to a wide range of applications or technologies making explicit use of abstract representations of subjects. Those include for example applications leveraging ontologies or vocabularies, search engines, intelligent agents, and other foreseeable or yet unknown tools. Throughout the present document, the generic term applications will be used to refer to either topic maps applications or any other such technologies.

Both identification of subjects by applications and definition of subjects for their human users can be provided by stable network-retrievable resources, made available under a trustworthy publication process, as defined by the present and following recommendations. Subjects defined and identified in such a way are called published subjects.

2 - A gentle introduction to Published Subjects

2.1 - Subjects and Topics

A subject can be an individual, like "Isaac Newton", "IBM, Inc.", or "Paris (France)" ... or a class of such individuals, like "famous scientists", "software companies" or "towns" ... or a more abstract concept like "gravitation", "economic growth" or "baroque style" ... In short, a subject can be anything deserving to be identified, named, represented and generally talked about - in other words, a subject of conversation.

Applications deal with a subject through a formal representation or proxy, which is called a topic throughout this document, to conform to topic maps terminology. In other words, a topic is the representation, inside an application, of a unique, well-defined, and non-ambiguous subject.

2.2 - Subject Indicators and Subject Identifiers

Applications aggregate information around topics, and represent relationships between topics. Interoperability of applications sharing information in the same network requires that authors, users, and applications have ways to agree on whether two or more topics in the same or different applications represent the same subject. This agreement has to be effective in both human-to-application and application-to-application transactions.
As in human-to-human transactions, even though everyday agreement is commonly reached through natural language conversation, efficient and durable agreement has to be grounded in common reference to a stable document or resource.

2.2.1 - Indication of the subject for humans : Subject Indicators

An application can indicate to human users what the subject of a topic is by referring to a document, or any other kind of network-retrievable resource, where the subject is defined, described, or at least indicated in a human-readable and non-ambiguous way. Such a resource is called a subject indicator.

Provided with a subject indicator, human users should be able to know what subject the topic represents. Whenever applications are considered media for human transactions, subject indicators will provide a common reference to human users connected through the application, and agreement on the subject indicator will be used as the external expression of agreement on the identity of a subject.

2.2.2 - Identification of the subject for applications : Subject Identifiers

Although computer applications can provide humans with subject indicators, they cannot "know" what the subject "is". They can, however, handle identifiers (strings) that allow them to decide whether two subjects are identical or not. If the reference to a subject indicator in the network uses a URI, this URI will be the best subject identifier for applications. A subject identifier is a URI that refers to a subject indicator, and provides an unambiguous identification of a subject to an application.

Subject indicator and subject identifier are therefore two faces of the same identification mechanism, the former being for humans and the latter for applications. This identification mechanism is the support for agreement on subject identity throughout the network, between applications, between users, and between applications and users.
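
The identity test that subject identifiers make possible for applications can be sketched as follows. This is a minimal illustration only, not a normative merging rule: the `Topic` class, the `same_subject` function and the example.org URIs are all hypothetical names introduced here, and the sketch assumes that identifiers are compared as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-application proxy for a subject."""
    name: str
    # URIs that refer to subject indicators for this topic's subject.
    subject_identifiers: set[str] = field(default_factory=set)


def same_subject(a: Topic, b: Topic) -> bool:
    """Treat two topics as representing the same subject when they
    share at least one subject identifier (plain string comparison)."""
    return bool(a.subject_identifiers & b.subject_identifiers)


# Fictitious URIs, for illustration only.
newton_en = Topic("Isaac Newton", {"http://example.org/people#newton"})
newton_fr = Topic("Newton, Isaac", {"http://example.org/people#newton"})
leibniz = Topic("Gottfried Leibniz", {"http://example.org/people#leibniz"})

print(same_subject(newton_en, newton_fr))  # True
print(same_subject(newton_en, leibniz))    # False
```

The application never needs to interpret the indicator itself; it only compares strings, while human users follow the same URI to read what the subject is.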

2.2.3 - Example : Subject Identifier and Subject Indicator for the subject "Apple Tree" (Malus Domestica)

2.3 - Published Subjects

2.3.1 - Shortcomings of the above scenario

Unfortunately, the scenario above is too simple to be sustainable. Any resource can be considered a subject indicator by being referred to as such by an application, whether or not this resource was intended by its publisher to be a subject indicator, and whether or not the publisher is aware of it or even cares about it. Hence, subject indicators and subject identifiers defined in such a way are untrustworthy, and are likely to be either ambiguous, or unstable, or both.

Ambiguity: What subject exactly does the above subject indicator indicate? That this resource indicates the subject "Malus Domestica" is just an interpretation on the user's part. That it was the intention of the publisher to provide a subject indicator for this specific subject is just a guess, and is nowhere specified. Therefore another user could use the same resource differently, to indicate a different subject, for example the fruit "apple" ...

Stability: On what basis should the user trust that this resource is persistent, both in address and content? URIs and the resources they address are moving targets in the networked universe.

2.3.2 - Publishers in the loop

If publishers are aware of the above shortcomings, and want to provide applications and users with non-ambiguous, stable, trustworthy, authoritative subject indicators and identifiers, the situation is far better. The publishers can provide sets of subject indicators and subject identifiers published in a standard way, and declare their intention to maintain their stability, reliability and availability on the network. At that point, applications and users will be provided with published subjects, published subject indicators and published subject identifiers.

The publication space, where such published subjects will be used, is a network of applications connected together and of users allowed to access those applications. It can of course be as wide and open as the Web, but it can also be a more or less closed network like an enterprise intranet or community portal. "Published" does not necessarily mean "public".

2.3.3 - Example : Published Subject Identifier and Published Subject Indicator for the subject "Apple"

In the figure below, the subject identified for the computer by the (fictitious) URL "http://psi.fruit.org/#apple" is indicated to Isaac Newton by a dedicated resource in the Fruit.Org Published Subjects, providing him with a non-ambiguous and stable definition. The Publisher (Fruit.Org) has declared this resource stable and intended to be used as a PSI. Isaac Newton can trust the URI resolution to provide him with a stable on-line resource as long as he has access to the network.

1. The above picture seems only slightly different from the previous one. A minor difference is that the subject is the fruit here, and the fruit tree there. But the major differences are the publisher's statement of purpose, disambiguation of the subject, and stability.

2. The topic maps literature has coined the acronym "PSI", used in the XTM 1.0 specification. Note that it can be expanded both as "published subject indicator" and "published subject identifier". Those are two faces of the concept, one looking towards humans (the indicator), and one looking towards computers (the identifier). Like Janus Bifrons over Roman doors, PSIs are guarantors of good communication between two universes.

3 - Requirements and Recommendations for PSIs

The following are the basic requirements and recommendations for PSIs:

Requirement 1 :

A Published Subject Identifier must be a URI.

Requirement 2 :

A Published Subject Identifier must resolve to a human-interpretable Published Subject Indicator.

It has been widely discussed whether URNs could be used as PSIs, or only URLs. Although general best practice will certainly use URLs, URNs are not completely ruled out as PSIs, provided the publisher defines some resolution mechanism to conform to Requirement 2.
URNs can still be used by Topic Maps authors as valid, interoperable, machine-processable Subject Identifiers but, absent resolution to a human-interpretable Subject Indicator, they will not be considered Published Subject Identifiers conformant to the above requirements.
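
A rough syntactic pre-check against Requirements 1 and 2 might look like the sketch below. It is only an assumption-laden illustration: the function name `classify_identifier` and its three return labels are invented here, the test is purely syntactic (it uses the standard `urllib.parse.urlsplit`), and it does not attempt resolution, so it cannot confirm that a human-interpretable indicator is actually served.

```python
from urllib.parse import urlsplit


def classify_identifier(uri: str) -> str:
    """Classify a candidate identifier (hypothetical labels).

    "not-a-uri"       no scheme, so Requirement 1 is not met
    "resolvable-url"  http(s) URL; resolution can be attempted directly
    "needs-resolver"  any other scheme (e.g. a URN); Requirement 2 is met
                      only if the publisher defines a resolution mechanism
    """
    parts = urlsplit(uri)
    if not parts.scheme:
        return "not-a-uri"
    if parts.scheme in ("http", "https") and parts.netloc:
        return "resolvable-url"
    return "needs-resolver"


print(classify_identifier("http://psi.fruit.org/#apple"))  # resolvable-url
print(classify_identifier("urn:isbn:0451450523"))          # needs-resolver
print(classify_identifier("apple"))                        # not-a-uri
```

An identifier in the "needs-resolver" class remains usable as a Subject Identifier; whether it qualifies as a Published Subject Identifier depends on what the publisher provides, which this check cannot see.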

Recommendation 1 :

A Published Subject Indicator should provide human-readable metadata.

Recommendation 2 :

A Published Subject Indicator should provide machine-processable metadata.

Machine-processable metadata is recommended so that applications can use more information on the subject than solely URI identification.

Human-readable as well as machine-processable metadata can be included in the Subject Indicator itself (e.g. RDF metadata), or in a separate resource referenced from the Subject Indicator (e.g. XTM metadata).
Deliverable 2 will provide complementary recommendations on the nature of those metadata.

Recommendation 3 :

Metadata defined in Recommendations 1 and 2 should be consistent, but not necessarily equivalent.

Consistency between human-readable and machine-processable metadata is the warrant of consistent "interpretation" by applications and humans. This can be achieved, for example, by human-readable metadata being an expression of machine-processable metadata. This issue will be addressed by Deliverable 2.
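
One way to read "human-readable metadata being an expression of machine-processable metadata" is to generate the former from the latter, as in the sketch below. The field names, the description text and the function `render_human_readable` are hypothetical, chosen only for illustration; the actual content and format of Subject Indicator metadata are left to Deliverable 2.

```python
def render_human_readable(metadata: dict[str, str]) -> str:
    """Derive the human-readable text from the machine-processable
    record, so that both views come from a single source."""
    return "\n".join(f"{key}: {value}" for key, value in metadata.items())


# Hypothetical machine-processable record for the fictitious Fruit.Org PSI.
apple_metadata = {
    "identifier": "http://psi.fruit.org/#apple",
    "name": "Apple",
    "description": "The fruit of the apple tree.",
}

print(render_human_readable(apple_metadata))
```

Because the text is derived rather than written separately, the two forms stay consistent; they need not be equivalent, since a publisher may add further human-readable explanation around the generated lines.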

Recommendation 4 :

A Published Subject Indicator should indicate that it is intended to be a PSI.

This statement of purpose has to be clearly endorsed by the publisher (see below).

Recommendation 5 :

A Published Subject Indicator should identify its publisher.

Publisher is to be understood here in its Dublin Core definition: "An entity responsible for making the resource available."

Statement of purpose and publisher identification are the warrants of trust, fundamental to an efficient PSI mechanism.
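
An application that wants to verify Recommendations 4 and 5 before relying on an indicator could run a check along these lines. This is a sketch under stated assumptions: the field names `statement_of_purpose` and `publisher`, and the function `missing_trust_fields`, are hypothetical, since this document does not define how those statements are encoded.

```python
def missing_trust_fields(metadata: dict[str, str]) -> list[str]:
    """Return the trust-related fields that are absent or empty
    (hypothetical field names)."""
    required = ("statement_of_purpose", "publisher")
    return [name for name in required if not metadata.get(name, "").strip()]


# Fictitious metadata for the Fruit.Org example.
declared = {
    "statement_of_purpose": "This resource is intended to be used as a PSI.",
    "publisher": "Fruit.Org",
}
undeclared = {"publisher": "Fruit.Org"}

print(missing_trust_fields(declared))    # []
print(missing_trust_fields(undeclared))  # ['statement_of_purpose']
```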