OASIS Topic Maps Published Subjects Technical Committee

OASIS Topic Maps Published Subjects TC Deliverables
1. Published Subjects - Definitions, Requirements and Examples

Version 0.2 - last updated 2002, August 20
Latest version: http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/general.htm
Editor: Bernard Vatant

Status of this document:
Draft for review, conforming to the decisions of the August 10 Montréal meeting.


Preliminary Editor's Note:

This document is the first of a series of deliverables that will address the definition, publication, and management of Published Subjects, and best practices for them. It provides only the generic, minimal concepts, requirements, and recommendations needed to understand, define, provide, and use Published Subjects.

More specific issues, currently under discussion in the Technical Committee, are not addressed by this document but will be covered by future deliverables. Among others, Deliverable 2 will address the following:

  • Structure and "semantic" interpretation of Published Subject Identifiers (URIs)
  • Structure and format of Published Subject Indicators
  • Content, structure and format of PSI Metadata
  • Distinction between Subject Metadata and Subject Indicator Metadata

1 - Statement of Purpose

The OASIS Topic Maps Published Subjects Technical Committee was set up to foster widespread adoption of the topic maps standard ISO/IEC 13250, particularly by publishers and users of classifications, taxonomies, thesauri, registries, catalogues, directories, and similar resources, who need standard ways to make their legacy data available to topic map applications.

The first and main target of this Technical Committee's recommendations is therefore topic map interoperability, achieved through efficient definition and identification of the subjects that topics in topic maps represent.

This initial target is likely to be extended in the future to a wide range of applications and technologies that make explicit use of abstract representations of subjects. These include, for example, applications leveraging ontologies or vocabularies, search engines, intelligent agents, and other tools, whether foreseeable or as yet unknown. Throughout the present document, the generic term "applications" refers to topic map applications or to any other such technology.

Both the identification of subjects by applications and the definition of subjects for their human users can be provided by stable, network-retrievable resources made available through a trustworthy publication process, as defined by the present and following recommendations. Subjects defined and identified in this way are called published subjects.


2 - A gentle introduction to Published Subjects terminology

2.1 - Subjects and Topics

A subject can be an individual, like "Isaac Newton", "IBM, Inc.", or "Paris (France)"; or a class of such individuals, like "famous scientists", "software companies", or "towns"; or a more abstract concept, like "gravitation", "economic growth", or "baroque style". In short, a subject can be anything that deserves to be identified, named, represented, and generally talked about; in other words, a subject of conversation. The Topic Maps specification XTM 1.0 proposes an extremely general definition of a subject.

Definition 1

A subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.

Applications deal with a subject by handling a formal representation, or proxy, which this document calls a topic, in keeping with topic maps terminology.

Definition 2

A topic is a representation, inside an application, of a unique, well-defined, and non-ambiguous subject.

2.2 - Subject Indicators and Subject Identifiers

Interoperability of applications sharing information on the same network requires that authors, users, and applications be given ways to agree on whether two or more topics, in the same or in different applications, represent the same subject. This agreement has to be effective in both human-to-application and application-to-application transactions.
As for human-to-human transactions, even if everyday agreement is commonly achieved through natural-language conversation, efficient and durable agreement has to be grounded in a common reference to a stable document or resource.

2.2.1 - Indication of the subject for humans: Subject Indicators

An application can indicate to human users what the subject of a topic is by referring to a document, or to any other kind of network-retrievable resource, in which the subject is defined, described, or at least indicated in a human-readable and non-ambiguous way. Such a resource is called a subject indicator.

Given this resource, human users can know which subject the topic represents. Whenever applications serve as media for transactions between humans, subject indicators provide a common reference for the human users connected through the application.

Definition 3

A subject indicator is a network-retrievable resource that is referred to by an application, to provide an unambiguous indication of the identity of a subject to a human being.

2.2.2 - Identification of the subject for applications: Subject Identifiers

Although they can provide humans with subject indicators, computer applications cannot "know" what the subject "is". They can, however, handle identifiers (strings) that allow them to decide whether two topics represent the same subject. If the reference to a subject indicator on the network uses a URI, that URI is the best subject identifier for applications.

Definition 4

A subject identifier is a URI that refers to a subject indicator, and provides an unambiguous identification of a subject to an application.

Subject indicator and subject identifier are therefore two faces of the same identification mechanism, the former for humans and the latter for applications. This mechanism supports agreement on subject identity throughout the network: between applications, between users, and between applications and users.
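The application side of this mechanism can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the Topic class and the psi.example.org URIs are hypothetical, identifiers are compared as plain strings with no URI normalization, and only the subject-identifier rule is modelled, not the full identity and merging rules of XTM 1.0.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Topic:
    """Hypothetical in-memory topic: a display name plus its subject identifiers."""
    name: str
    subject_identifiers: Set[str] = field(default_factory=set)


def same_subject(a: Topic, b: Topic) -> bool:
    """True if the topics share at least one subject identifier.

    URIs are compared as plain strings: the application never needs to
    "know" what the subject is, only whether two identifiers are equal.
    """
    return not a.subject_identifiers.isdisjoint(b.subject_identifiers)


t1 = Topic("apple tree", {"http://psi.example.org/malus-domestica"})
t2 = Topic("pommier", {"http://psi.example.org/malus-domestica"})
t3 = Topic("apple", {"http://psi.example.org/apple-fruit"})

print(same_subject(t1, t2))  # True: different names, same identifier
print(same_subject(t1, t3))  # False: different identifiers
```

The names play no part in the decision: two topics named differently in two applications are recognized as the same subject because they point at the same subject indicator.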

2.2.3 - Example: Subject Identifier and Subject Indicator for the subject "Apple Tree" (Malus domestica)

2.3 - Published Subjects

Unfortunately, the scenario above is too simple to be sustainable. Any resource can be considered a subject indicator simply by being referred to as such by an application, whether or not its publisher intended it to be one. Subject indicators and subject identifiers defined in this way are likely to be untrustworthy and unstable: URIs and the resources they address are moving targets in the networked universe. The publishers of resources used as subject indicators might not even be aware of this use, or might not care about it, and are likely to leave applications and users with meaningless identifiers and indicators (if any indicator at all) without prior notice. As an example, check whether the resource in the example above is still available on the Web.

This is where publishers must enter the loop. If publishers are aware of the problem and want to provide applications and users with stable, trustworthy, authoritative subject indicators and identifiers, the situation is far better. Publishers can provide sets of subject indicators and subject identifiers published in a standard way, and declare their intention to maintain their stability, reliability, and availability on the network.

At that point, applications and users are provided with published subjects, published subject indicators, and published subject identifiers. They use them as described above, but the whole scenario becomes more sustainable.

Definition 5

A published subject is a subject for which at least one published subject indicator is available.

Definition 6

A published subject indicator is a subject indicator that is published and maintained at an advertised address in order to facilitate interoperability of applications.

Definition 7

A published subject identifier is the URI of a published subject indicator, chosen and declared by its publisher as the URI to be used by applications to identify the published subject.

Note that publication takes place in a "publication space": a network of connected applications, together with the users allowed to access those applications. It can of course be as wide and open as the Web, but it can also be a more or less closed network (enterprise intranet, community portal, ...). Published does not necessarily mean "public".

Example: Published Subject Identifier and Published Subject Indicator for the subject "Apple"

In the figure below, the subject identified for the computer by the URI http://psi.fruit.org/#apple is indicated to Isaac Newton by a dedicated resource in the Fruit Glossary, which provides him with a definition and an image.
The publisher (Fruit.Org) has declared this resource stable and intended for use as a PSI. Isaac Newton can therefore trust the URI to resolve to a stable online resource for as long as he has access to the network.
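On the topic map side, a topic refers to this PSI from its subject identity. The sketch below uses Python's standard xml.etree.ElementTree to build an XTM 1.0 topic whose subjectIdentity contains a subjectIndicatorRef pointing at the fictive PSI http://psi.fruit.org/#apple. The topic id "apple" and the base name are illustrative choices; this is a sketch of one topic, not a complete XTM document generator.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Serialize XTM elements in the default namespace and XLink attributes with "xlink:".
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def xtm(tag: str) -> str:
    """Qualified ElementTree name for an element in the XTM 1.0 namespace."""
    return f"{{{XTM_NS}}}{tag}"


def topic_with_psi(topic_id: str, psi: str, name: str) -> ET.Element:
    """Build an XTM 1.0 <topic> whose subject identity is given by a PSI."""
    topic = ET.Element(xtm("topic"), {"id": topic_id})
    identity = ET.SubElement(topic, xtm("subjectIdentity"))
    # The PSI is referenced through an XLink href attribute.
    ET.SubElement(identity, xtm("subjectIndicatorRef"), {f"{{{XLINK_NS}}}href": psi})
    base_name = ET.SubElement(topic, xtm("baseName"))
    ET.SubElement(base_name, xtm("baseNameString")).text = name
    return topic


topic_map = ET.Element(xtm("topicMap"))
topic_map.append(topic_with_psi("apple", "http://psi.fruit.org/#apple", "apple"))
print(ET.tostring(topic_map, encoding="unicode"))
```

Any other topic map whose topic carries the same subjectIndicatorRef is, by that reference, talking about the same subject.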

Notes:

1. The picture above differs only slightly from the previous one (a minor difference is that the subject here is the fruit, and there the fruit tree). The major differences are the publisher's statement of purpose and the user's trust.

2. The topic maps literature has coined the acronym "PSI", which is used in the XTM 1.0 specification. Note that it can be expanded both as "published subject indicator" and as "published subject identifier". These are two faces of the same concept, one looking towards humans (the indicator) and one looking towards computers (the identifier). Like Janus Bifrons over Roman doors, PSIs are the guarantors of good communication between two universes.


3 - Requirements and Recommendations for PSIs

The following are the basic requirements and recommendations for PSIs.

Requirement 1:

  • A Published Subject Identifier must be a URI.

Requirement 2:

  • A Published Subject Identifier must resolve to a human-interpretable Published Subject Indicator.

Whether URNs, or only URLs, can be used as PSIs has been widely discussed. Although general best practice will certainly use URLs, URNs are not completely ruled out as PSIs, provided the publisher defines some resolution mechanism in order to conform to Requirement 2.
In any case, topic map authors can use URNs as valid, interoperable, machine-processable Subject Identifiers; but without resolution to a human-interpretable Subject Indicator, they will not be considered Published Subject Identifiers conforming to the requirements above.
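Requirements 1 and 2 can be partly checked before any network access. The Python sketch below, built on urllib.parse.urlsplit, classifies a candidate identifier. It assumes that http and https URLs are resolvable, and that the caller supplies the URN namespaces for which the publisher offers a resolution mechanism (the urn_resolvers parameter and the x-fruit namespace are hypothetical). It does not dereference anything, so actual resolution to a human-interpretable indicator still has to be verified separately.

```python
from typing import FrozenSet
from urllib.parse import urlsplit

# Assumption of this sketch: URLs with these schemes can be resolved directly.
RESOLVABLE_SCHEMES = {"http", "https"}


def psi_status(identifier: str, urn_resolvers: FrozenSet[str] = frozenset()) -> str:
    """Classify a candidate identifier against Requirements 1 and 2.

    urn_resolvers holds the URN namespace identifiers (NIDs) for which the
    publisher provides a resolution mechanism; it is supplied by the caller.
    No network access is performed.
    """
    parts = urlsplit(identifier)
    if not parts.scheme:
        # Not an absolute URI: fails Requirement 1.
        return "not a URI"
    scheme = parts.scheme.lower()
    if scheme in RESOLVABLE_SCHEMES and parts.netloc:
        return "PSI candidate (URL)"
    if scheme == "urn":
        nid = parts.path.split(":", 1)[0].lower()
        if nid in urn_resolvers:
            return "PSI candidate (URN with resolution mechanism)"
    # A valid URI, usable as a subject identifier, but with no known way to
    # resolve it to a human-interpretable indicator (Requirement 2).
    return "subject identifier only"


for candidate in ("http://psi.fruit.org/#apple", "urn:x-fruit:apple", "apple"):
    print(candidate, "->", psi_status(candidate))
```

The URN case mirrors the paragraph above: the same URN moves from "subject identifier only" to "PSI candidate" once a resolution mechanism is declared for its namespace.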

Recommendation 1:

  • A Published Subject Indicator should provide human-readable metadata.

Recommendation 2:

  • A Published Subject Indicator should provide machine-processable metadata.

Machine-processable metadata is recommended so that applications can use more information about the subject than the URI identification alone.

Both human-readable and machine-processable metadata can be included in the Subject Indicator itself (e.g. RDF metadata), or in a separate resource referenced from the Subject Indicator (e.g. XTM metadata).
Deliverable 2 will provide complementary recommendations on the nature of this metadata.

Recommendation 3:

  • The metadata defined in Recommendations 1 and 2 should be consistent, but not necessarily equivalent.

Consistency between human-readable and machine-processable metadata guarantees a consistent "interpretation" by applications and humans. It can be achieved, for example, by making the human-readable metadata an expression of the machine-processable metadata. This issue will be addressed by Deliverable 2.
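One way to follow that approach is to generate the human-readable view from the machine-processable record. The Python sketch below assumes a hypothetical dictionary of metadata for the apple PSI (the keys and labels are illustrative, not a metadata vocabulary defined by this document) and renders it as an HTML definition list, escaping values with the standard html.escape.

```python
from html import escape
from typing import Dict

# Hypothetical machine-processable record for one PSI. The keys are
# illustrative, not a metadata vocabulary defined by this document.
APPLE_METADATA = {
    "identifier": "http://psi.fruit.org/#apple",
    "name": "apple",
    "definition": "The fruit of the apple tree (Malus domestica).",
    "publisher": "Fruit.Org",
}

# Human-facing labels for the keys above.
LABELS = {
    "identifier": "Published Subject Identifier",
    "name": "Name",
    "definition": "Definition",
    "publisher": "Publisher",
}


def render_html(metadata: Dict[str, str]) -> str:
    """Render the record as an HTML definition list.

    Because the human-readable text is derived from the machine-processable
    record, the two stay consistent (Recommendation 3) without being identical.
    """
    rows = [
        f"  <dt>{escape(LABELS.get(key, key))}</dt><dd>{escape(value)}</dd>"
        for key, value in metadata.items()
    ]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


print(render_html(APPLE_METADATA))
```

The two forms are consistent but not equivalent: the labels exist only in the human-readable view, while the record remains the single source of truth.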

Recommendation 4:

  • A Published Subject Indicator should indicate that it is intended to be a PSI.

This statement of purpose has to be clearly endorsed by the publisher (see below).

Recommendation 5:

  • A Published Subject Indicator should identify its publisher.

Publisher is to be understood here in its Dublin Core definition:
"An entity responsible for making the resource available."

The statement of purpose and the identification of the publisher are the warrants of trust on which an efficient PSI mechanism fundamentally depends.
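The recommendations above lend themselves to a simple checklist. The sketch below is a hypothetical Python model: the IndicatorDescription fields are assumptions of this sketch, not metadata defined by this document, and the values given for the apple indicator are illustrative. It reports which of Recommendations 1, 2, 4 and 5 an indicator does not meet; Recommendation 3 is left out because judging consistency requires the metadata itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndicatorDescription:
    """Hypothetical summary of what a Published Subject Indicator provides."""
    psi: str
    has_human_readable_metadata: bool        # Recommendation 1
    has_machine_processable_metadata: bool   # Recommendation 2
    declares_psi_intent: bool                # Recommendation 4
    publisher: Optional[str] = None          # Recommendation 5 (Dublin Core sense)


def unmet_recommendations(ind: IndicatorDescription) -> List[str]:
    """Return the Recommendations (among 1, 2, 4 and 5) the indicator does not meet."""
    unmet = []
    if not ind.has_human_readable_metadata:
        unmet.append("Recommendation 1: no human-readable metadata")
    if not ind.has_machine_processable_metadata:
        unmet.append("Recommendation 2: no machine-processable metadata")
    if not ind.declares_psi_intent:
        unmet.append("Recommendation 4: not declared as intended to be a PSI")
    if not ind.publisher:
        unmet.append("Recommendation 5: publisher not identified")
    return unmet


apple = IndicatorDescription(
    psi="http://psi.fruit.org/#apple",
    has_human_readable_metadata=True,
    has_machine_processable_metadata=False,
    declares_psi_intent=True,
    publisher="Fruit.Org",
)
print(unmet_recommendations(apple))
# ['Recommendation 2: no machine-processable metadata']
```

Since these are recommendations ("should") rather than requirements ("must"), a non-empty result flags a weaker PSI, not an invalid one.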


4 - Examples

In a nutshell, the purpose of PSIs is to provide a mechanism that makes it possible to distinguish apples from oranges.
The examples in this section therefore do just that, providing ways to distinguish between the fruit 'apple' and the fruit 'orange', and between the fruit 'orange' and the color 'orange'.
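The fruit/color distinction can be made concrete with a small Python sketch. The fruit URIs follow the fictive psi.fruit.org scheme used in this section; http://psi.color.example/#orange is a hypothetical PSI for the color, invented for illustration. Grouping topics by name conflates the two oranges, while grouping by subject identifier keeps them apart.

```python
from collections import defaultdict

# Hypothetical (name, subject identifier) pairs collected from several topic maps.
TOPICS = [
    ("apple", "http://psi.fruit.org/#apple"),
    ("orange", "http://psi.fruit.org/#orange"),
    ("orange", "http://psi.color.example/#orange"),  # the color, not the fruit
]


def group(topics, key):
    """Group (name, identifier) pairs by the given key function."""
    groups = defaultdict(list)
    for topic in topics:
        groups[key(topic)].append(topic)
    return dict(groups)


by_name = group(TOPICS, key=lambda t: t[0])
by_psi = group(TOPICS, key=lambda t: t[1])

print(len(by_name))  # 2: the fruit and the color "orange" collapse into one group
print(len(by_psi))   # 3: each subject keeps its own group
```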

4.1 - Examples of Published Subject Identifiers

A fictitious publisher owning the domain fruit.org uses the subdomain psi.fruit.org for Published Subjects about fruits. Various URLs can be used as Subject Identifiers for the fruit class "apple", for example:

  • http://psi.fruit.org/#apple
  • http://www.fruit.org/psi/apple.html
  • http://psi.fruit.org/fruits.html#apple
  • http://psi.fruit.org/fruits?id=apple
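Because applications compare identifiers as strings, the choice among these forms matters: each one is a different identifier, and the local name "apple" sits in a different URI component in each. The Python sketch below, using only urllib.parse, makes this visible; component_naming_subject is a hypothetical helper for illustration. It also shows, via urldefrag, that for fragment-based identifiers an HTTP client retrieves the whole document and relies on the fragment to point within it.

```python
from urllib.parse import urldefrag, urlsplit

# The four candidate identifiers for the fruit class "apple" listed above.
CANDIDATES = [
    "http://psi.fruit.org/#apple",
    "http://www.fruit.org/psi/apple.html",
    "http://psi.fruit.org/fruits.html#apple",
    "http://psi.fruit.org/fruits?id=apple",
]


def component_naming_subject(uri: str, local_name: str = "apple") -> str:
    """Return which URI component carries the local name of the subject."""
    parts = urlsplit(uri)
    for component in ("fragment", "query", "path"):
        if local_name in getattr(parts, component):
            return component
    return "none"


for uri in CANDIDATES:
    # urldefrag strips the fragment: the remainder is what an HTTP client requests.
    document = urldefrag(uri).url
    print(f"{uri}\n  subject named in: {component_naming_subject(uri)}; retrieves: {document}")

# Applications compare identifiers as strings, so these are four distinct
# identifiers, even if all four lead to indicators for the same subject.
print(len(set(CANDIDATES)))  # 4
```

This is one reason Definition 7 has the publisher choose and declare a single URI as the Published Subject Identifier.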

4.2 - Examples of Published Subject Indicators

4.2.1 - XHTML PSI

To be delivered

4.2.2 - RDF PSI

To be delivered
