OASIS Topic Maps Published Subjects Technical Committee

 

OASIS Topic Maps Published Subjects TC Deliverables
1. Published Subjects - Definitions, Requirements and Examples

Latest version : http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/general.htm
Editor: Bernard Vatant
Updated 2002, November 29

This is a version revised throughout and reworded by:
Patrick Durusau - Director of Research and Development, Society of Biblical Literature - pdurusau@emory.edu
See details of revision in Modifications and Comments


Editor's Notes:

1. This document is the first in a series that addresses the definition, publication and management of Published Subjects, as well as best practices for them. This document provides only the generic and minimal concepts, requirements and recommendations necessary to understand, define, provide and use Published Subjects.

2. Not every technical issue pending before the Published Subjects TC is addressed by this document. Further deliverables will address, among others:

  • Structure and interpretation of Published Subject Identifiers (URIs)
  • Structure and format of Published Subject Indicators
  • Examples of Subject Indicators in specific syntax (XTM, RDF, XHTML ...)
  • Content, structure and format of Subject Indicator Metadata
  • Distinction between Subject Metadata and Subject Indicator Metadata.

3. Topic map terminology used in this document is consistent with ISO/IEC 13250 Topic Maps and the Standard Application Model for Topic Maps (currently under development, see SAM editor's draft at: http://www.isotopicmaps.org/sam/sam-model/).
Normative definitions of topic map terms, such as "subject," "topic," "subject indicator," "subject identifier," "published subject," "published subject indicator," "published subject identifier," are found in ISO/IEC 13250. Informal introductions to these terms are given in this document but are provided as a reader convenience only. Such introductions have no normative value for this deliverable or any others.


1 - Statement of Purpose

The OASIS Topic Maps Published Subjects TC has been chartered to produce specifications for and guides to the production of interoperable definitions of subjects that are represented by topics in topic maps. Users of classifications, taxonomies, thesauri, registries and catalogues should be provided with the means to express their subjects in a manner that makes them available to topic map applications in their respective industries and communities.

Therefore, the principal goal of this TC is the specification of how subjects should be defined by users and identified by applications so as to result in unambiguous and stable identities known as "published subjects."


2 - A gentle introduction to Published Subjects

2.1 - Subjects and Topics

A subject can be an individual, like "Isaac Newton", "IBM, Inc.", or "Paris (France)", or a class of such individuals, like "famous scientists", "software companies" or "towns", or a more abstract concept, like "gravitation", "economic growth" or "baroque style". In short, a subject can be anything deserving to be identified, named, represented and generally talked about: in other words, a subject of conversation.

A subject is represented in an application by a "topic," as the term is used in topic map terminology. To be useful, each topic should make explicit a unique, well-defined and unambiguous subject.

2.2 - Subject Indicators and Subject Identifiers

Applications use topics as a gathering point for information that is relevant in some way to the subject that the topic represents. To ensure that all relevant information for a particular subject is gathered by a given topic, authors, users and applications should use the same topic to refer to the same subject. That common reference, the same topic representing the same subject, depends upon a stable mapping of a particular subject to a given topic. Between people, that agreement is arrived at (hopefully!) through iterative conversation. For automatic processing by computer-based applications, a less iterative process is required. Both human and automated processes benefit from a common point (document or resource) that defines this mapping of subject to topic.

2.2.1 - Subject to Topic Mapping (for humans) : Subject Indicators

An application can indicate to human users what the subject of a topic is by referring to a document, or any other kind of network-retrievable resource, where the subject is defined, described, or at least indicated in a human-readable and unambiguous way. Such a resource is called a subject indicator.

Provided with a subject indicator, human users should be able to know what subject the topic represents. Whenever applications are considered media for human transactions, subject indicators will provide a common reference to human users connected through the application, and agreement on the subject indicator will be used as the external expression of agreement on the identity of a subject.

2.2.2 - Subject to Topic Mapping (for applications) : Subject Identifiers

Computer applications using a topic cannot "know" what a subject "is". In order to reach (hopefully!) a conclusion similar to that of a human user inspecting a subject indicator, the application relies upon what is known as a subject identifier. A subject identifier is a string that an application uses to compare two topics to see if they "match." If the strings "match," the application considers the subjects of the two topics to be the same. If the strings do not "match," the subjects represented by those topics are considered to be different.

If the reference to a subject indicator in the network uses a URI, this URI will be the best subject identifier for applications. A subject identifier is a URI that refers to a subject indicator, and provides an unambiguous identification of a subject to an application.

Subject indicator and subject identifier are therefore two faces of the same identification mechanism, the former being for humans and the latter for applications. This identification mechanism is the support for agreement on subject identity throughout the network, between applications, between users, and between applications and users.
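
The string comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not part of ISO/IEC 13250 or of this deliverable: the Topic class, the same_subject and merge functions, and the example.org URIs are all hypothetical names chosen for the example, and plain string equality is assumed as the "match" test.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic: a gathering point for information about one subject."""
    name: str
    subject_identifiers: set[str] = field(default_factory=set)
    notes: list[str] = field(default_factory=list)


def same_subject(a: Topic, b: Topic) -> bool:
    """Two topics "match" when they share at least one identifier string."""
    return bool(a.subject_identifiers & b.subject_identifiers)


def merge(a: Topic, b: Topic) -> Topic:
    """Gather the information held by two matching topics into one topic."""
    if not same_subject(a, b):
        raise ValueError("topics do not share a subject identifier")
    return Topic(
        name=a.name,
        subject_identifiers=a.subject_identifiers | b.subject_identifiers,
        notes=a.notes + b.notes,
    )


# Hypothetical identifiers, for illustration only.
APPLE_TREE = "http://example.org/subjects/apple-tree"
APPLE_INC = "http://example.org/subjects/apple-inc"

t1 = Topic("apple tree", {APPLE_TREE}, ["grows in temperate climates"])
t2 = Topic("Malus domestica", {APPLE_TREE}, ["member of the rose family"])
t3 = Topic("Apple Inc.", {APPLE_INC})

merged = merge(t1, t2)  # t1 and t2 carry the same identifier string
```

The application never inspects the names "apple tree" or "Malus domestica". Only the identifier strings decide that t1 and t2 represent the same subject and that t3 does not.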

2.2.3 - Example : Subject Identifier and Subject Indicator for the subject "Apple Tree" (Malus Domestica)

2.3 - Published Subjects

2.3.1 - Problems with Simple Subject Identifiers and Subject Indicators

The Subject Identifier and Subject Indicator mechanisms outlined above are deficient in two important respects:

1. Any resource can be considered as a Subject Identifier or Subject Indicator
2. No resource is declared to be a Subject Identifier or Subject Indicator.

Since any resource can be considered as a Subject Identifier and/or Subject Indicator, there is no mechanism to require additional information that would enhance the utility of those mechanisms. The more structured information these mechanisms allow, the less ambiguous and the more useful the resource.

A second problem is that the source of the resource, i.e., the author of the topic map, may or may not intend for any particular resource to be used or even considered as a Subject Identifier or Subject Indicator. If its use in that manner was not intended by the topic map authors or they are unaware of such use, the resource could change in unpredictable ways or even disappear altogether.

The mechanisms described thus far may therefore suffer from ambiguity, from a lack of sufficient detail to be useful, and from instability of both content and location. An alternative to the current mechanism is outlined below.

2.3.2 - Publishers in the loop

If publishers are aware of the above shortcomings, and want to provide applications and users with unambiguous, stable, trustworthy, authoritative subject indicators and identifiers, the situation is far better. The publishers can provide sets of subject indicators and subject identifiers published in a standard way, and declare their intention to maintain their stability, reliability and availability on the network. At that point, applications and users will be provided with published subjects, published subject indicators and published subject identifiers.

The publication space, where such published subjects will be used, is a network of applications connected together and of users allowed to access those applications. It can, of course, be as wide and open as the Web, but it can also be a more or less closed network, like an enterprise intranet or community portal. "Published" does not necessarily mean "public".

[Note : Patrick's suggestion was to strike this whole section]

2.3.3 - Published Subject Identifier and Published Subject Indicator: Subject "Apple"

The Published Subject Identifier and Published Subject Indicator shown below differ from the first example in three very important ways:

1. The publisher states this is a Published Subject Identifier and Published Subject Indicator (intentional PSI, not accidental)

2. The subject is unambiguous

3. The resource is stable

 

The topic maps literature has coined the acronym "PSI", used in the XTM 1.0 specification. Note that it can be expanded both as "published subject indicator" and as "published subject identifier". Those are two faces of the concept, one looking towards humans (the indicator), and one looking towards computers (the identifier).


3 - Requirements and Recommendations for PSIs

3.1 - Requirements for PSIs

Requirement 1 :

  • A Published Subject Identifier must be a URI.

Requirement 2 :

  • A Published Subject Identifier must resolve to a human-interpretable Published Subject Indicator.

Whether URNs could be used as PSIs has been discussed. The best practice is to use URLs. URNs may be used as PSIs provided the publisher defines some resolution mechanism, so as to conform to Requirement 2.
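
As a rough illustration of how an application might screen candidate identifiers against Requirement 1 and the URL/URN remark above, the following Python sketch classifies a string by its URI scheme. The function name classify_identifier, its return labels and the example identifiers are hypothetical. The check is purely syntactic: it does not dereference anything, so it cannot establish that Requirement 2 is actually met.

```python
from urllib.parse import urlsplit


def classify_identifier(identifier: str) -> str:
    """Classify a candidate PSI string as "url", "urn", "other-uri" or "not-a-uri".

    Syntactic check only: no network access is attempted, so resolution to a
    human-interpretable indicator (Requirement 2) is not verified here.
    """
    parts = urlsplit(identifier)
    if not parts.scheme:
        return "not-a-uri"      # Requirement 1 fails: no URI scheme
    if parts.scheme in ("http", "https") and parts.netloc:
        return "url"            # resolvable in principle (best practice)
    if parts.scheme == "urn":
        return "urn"            # needs a publisher-defined resolution mechanism
    return "other-uri"          # a URI, but resolution is not assessed here


# Hypothetical identifiers, for illustration only.
candidates = [
    "http://example.org/subjects/apple-tree",
    "urn:example:apple-tree",
    "apple tree",
]
labels = [classify_identifier(c) for c in candidates]
```

A publisher following the best practice would expect the "url" label. A "urn" result signals that a resolution mechanism has to be defined elsewhere before the identifier can conform to Requirement 2.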

3.2 - Recommendations for PSIs

Recommendation 1 :

  • A Published Subject Indicator should provide human-readable metadata.

Recommendation 2 :

  • A Published Subject Indicator should provide machine-processable metadata.

Machine-processable metadata is recommended so that applications can use more information on the subject than solely URI identification.

Human-readable as well as machine-processable metadata can be included in the Subject Indicator itself (e.g. RDF metadata), or in a separate resource referenced from the Subject Indicator (e.g. XTM metadata).
A future deliverable will provide complementary recommendations on the nature of those metadata.

Recommendation 3 :

  • Metadata defined in Recommendations 1 and 2 should be consistent, but not necessarily equivalent.

Consistency between human-readable and machine-processable metadata is what warrants consistent "interpretation" by applications and humans. This can be achieved, for example, by human-readable metadata being an expression of machine-processable metadata. This issue will be addressed in a future deliverable.

Recommendation 4 :

  • A PSI should indicate that it is intended to act as a PSI, that is as a Published Subject Identifier and a Published Subject Indicator for a particular topic.

This statement of purpose has to be clearly endorsed by the publisher (see below).

Recommendation 5 :

  • A PSI should identify its publisher.

Publisher is to be understood here in its Dublin Core definition: "An entity responsible for making the resource available."
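
The five recommendations can be read as a checklist. The Python sketch below is one hedged way to express that checklist. The record layout (PublishedSubjectIndicator and its fields) and the function unmet_recommendations are hypothetical and are not defined by this deliverable. In particular, the test used for Recommendation 3 (properties present in both kinds of metadata must carry the same value) is only an assumed reading of "consistent, but not necessarily equivalent".

```python
from dataclasses import dataclass, field


@dataclass
class PublishedSubjectIndicator:
    """Hypothetical in-memory view of a PSI and its metadata."""
    identifier: str
    human_readable: dict[str, str] = field(default_factory=dict)
    machine_processable: dict[str, str] = field(default_factory=dict)
    declared_as_psi: bool = False   # Recommendation 4
    publisher: str = ""             # Recommendation 5


def unmet_recommendations(psi: PublishedSubjectIndicator) -> list[int]:
    """Return the numbers of the recommendations the record does not satisfy."""
    unmet = []
    if not psi.human_readable:
        unmet.append(1)
    if not psi.machine_processable:
        unmet.append(2)
    # Assumed reading of Recommendation 3: shared properties must agree,
    # but each side may carry properties the other lacks.
    shared = psi.human_readable.keys() & psi.machine_processable.keys()
    if any(psi.human_readable[k] != psi.machine_processable[k] for k in shared):
        unmet.append(3)
    if not psi.declared_as_psi:
        unmet.append(4)
    if not psi.publisher:
        unmet.append(5)
    return unmet


# Hypothetical records, for illustration only.
complete = PublishedSubjectIndicator(
    identifier="http://example.org/subjects/apple-tree",
    human_readable={"name": "apple tree", "description": "a fruit tree"},
    machine_processable={"name": "apple tree"},
    declared_as_psi=True,
    publisher="Example Publisher",
)
bare = PublishedSubjectIndicator(identifier="http://example.org/subjects/pear-tree")

complete_gaps = unmet_recommendations(complete)
bare_gaps = unmet_recommendations(bare)
```

Since these are recommendations rather than requirements, a non-empty result is better treated as advice to the publisher than as a reason to reject the PSI.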

