OASIS Topic Maps Published Subjects Technical Committee

OASIS Topic Maps Published Subjects TC
Recommendations for Documentation of Published Subjects

Version 0.1 - January 10, 2002
Latest version: http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm
Editor: Bernard Vatant
January 12: Remarks from Lars Marius Garshol added.
January 14: Remarks from Mary Nishikawa added.
January 18: Remarks from Murray Altheim added.

Status of this document: Working Draft


The section numbers below follow the respective "shall, should, may" statements of the document:
TC Requirements for Documentation of Published Subjects

1 - Statement of Purpose

The purpose of this document is to provide recommendations for the structure and content of published subject documentation, as defined below. These recommendations are aimed at publishers of classifications, taxonomies, thesauri, catalogues, ontologies and similar resources. The objective is to give those publishers efficient and standard ways to make their legacy resources available as published subjects usable by topic maps and other semantic applications.

Lars Marius Garshol : Needs to make clear that this is based on ISO 13250, and should probably briefly explain what published subjects are. I think it's necessary to write the document so that people who have no idea what any of this is can get at least a clue of what the document is.


2 - Glossary

The following terms and concepts will be used in this document.

Note: Some of these terms are already defined and used by ISO 13250. Nevertheless, the TC proposes modifications to clarify some of them and their relationships with the new ones, and has sent those proposals to ISO JTC1/SC34 for revision and extension of the ISO 13250 terminology. Both the current ISO 13250 definition and the PubSubj TC proposal are given where necessary.

  • published subject

    defined by ISO 13250 XTM

    A published subject is any subject for which a subject indicator has been made available for public use and is accessible online via a URI.

    new definition proposal

    A published subject is any subject for which at least one subject definition document has been made available by an identified publisher.

    Mary Nishikawa:
    How about this instead if we need to be more explicit?
    "A published subject is any subject for which at least one subject definition document at a stable URI has been made available for public use by the publisher identified within the published subject documentation."


    Murray Altheim:
    It might not be an entire document, but rather a document node, such as http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence

  • published subject documentation

    new definition proposal

    A published subject documentation is a resource providing a structured set of subject definition documents.
  • publisher

    defined by Dublin Core
    The publisher of a resource is an entity responsible for making it available.
  • subject

    defined by ISO 13250 XTM
    A subject is anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever.
  • subject definition document

    new definition proposal

    A subject definition document is a resource that has been intended by its publisher to provide an indication of the nature of a subject. A subject definition document should be usable both for human understanding and computer processing.

    Lars Marius Garshol:
    I think the second sentence is misleading. I think it should be replaced by something like:
    "A subject definition document is not required to use any particular notation, but it must convey an understanding to a human of what the subject is. It may also be computer-processable."

  • subject indicator

    defined by ISO 13250 XTM
    A subject indicator is a resource that is intended by the topic map author to provide a positive, unambiguous indication of the identity of a subject.

    Lars Marius Garshol:
    This definition needs to change. How about: "Any resource can become a subject indicator by being referred to as such by some topic in some topic map." Slightly circular, but should work.


    Mary Nishikawa:
    Is this in addition to the first sentence, or does it replace it completely?

  • subject indicator reference

    defined by ISO 13250 XTM
    The element <subjectIndicatorRef> provides a URI reference to a resource that acts as a subject indicator.

    new definition proposal

    A subject indicator reference is a URI reference to a resource that acts as a subject indicator.
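
    To make the last definition concrete, the following is a minimal sketch, in Python, of how a subject indicator reference appears in XTM 1.0 syntax and how it can be read back. The topic id and the psi.example.org URI are invented for illustration; only the element names and the two namespace URIs come from XTM 1.0 and XLink.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical topic: the id and the example.org URI are invented.
XTM_FRAGMENT = f"""
<topicMap xmlns="{XTM_NS}" xmlns:xlink="{XLINK_NS}">
  <topic id="puccini">
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://psi.example.org/composers#puccini"/>
    </subjectIdentity>
  </topic>
</topicMap>
""".strip()


def subject_indicator_references(xtm_text):
    """Map each topic id to the URIs of its <subjectIndicatorRef> elements."""
    root = ET.fromstring(xtm_text)
    result = {}
    for topic in root.iter(f"{{{XTM_NS}}}topic"):
        result[topic.get("id")] = [
            ref.get(f"{{{XLINK_NS}}}href")
            for ref in topic.iter(f"{{{XTM_NS}}}subjectIndicatorRef")
        ]
    return result


sirs = subject_indicator_references(XTM_FRAGMENT)
print(sirs)
```

    In this sketch the URI in the xlink:href attribute is the subject indicator reference, and the resource it addresses is the subject indicator.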

3 - Recommendations for Published Subject Documentation

3.1 - Structure of published subject documentation.


Lars Marius Garshol:

Perhaps this section is better called "Content of published subject documentation", so that we can work out what we want the PSD to contain before we dive into the how?


Since a considerable legacy of taxonomies, classifications and ontologies is likely to be made available as published subject documentation, publishers should not be constrained more than necessary to use a specific syntax or language.
Therefore, this recommendation does not aim to impose on publishers either a single specific syntax for subject definition documents (e.g. a DTD or schema) or a single structure for subject indicator references (e.g. a specific namespace structure).

Murray Altheim:

PSIs are not always going to be put into a specialized XML markup language, and I think it's a mistake to require that. Most will be in XHTML or HTML (as in Cyc), or as addressable resources online (perhaps as a database query, as in ITIS). Requiring a specialized markup creates a big expenditure of resources that seems unnecessary given that the XTM design was to allow pointing to any addressable resource, especially since it's been the XTM documents themselves that in my experience have served as the subject "anchors."

But since we're really web-based (in terms of general audience and experience), I'd suggest something along the lines of specific XHTML markup, if we were to make any recommendation. This would enable both human- and machine-readable resources, using commonly available tools like a web browser.


The minimal requirements for conformance to this recommendation are:

3.1.1 - Consistency of subject definition document structure

Throughout a published subject documentation, the subject definition documents should follow a consistent formal structure (a DTD, a schema or some equivalent structure definition), allowing easy processing of their content by topic map engines, search engines, intelligent agents and any foreseeable kind of semantic web application.

3.1.2 - Consistency of subject indicator reference structure

A published subject documentation shall use a consistent namespace and URI structure for all its subject indicator references.
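
One way to read this requirement is sketched below in Python. The sketch assumes a convention in which every subject indicator reference is the URI of the documentation followed by a fragment identifier; the recommendation does not prescribe this or any other convention, and the function name and psi.example.org URIs are hypothetical.

```python
from urllib.parse import urldefrag


def inconsistent_references(base_uri, references):
    """Return the references that do not have the form '<base_uri>#<fragment>'.

    Assumed convention, not part of the recommendation: one base URI per
    published subject documentation, one fragment per subject.
    """
    rejected = []
    for ref in references:
        document, fragment = urldefrag(ref)
        if document != base_uri or not fragment:
            rejected.append(ref)
    return rejected


refs = [
    "http://psi.example.org/countries#fr",
    "http://psi.example.org/countries#jp",
    "http://psi.example.org/Countries/de.html",  # breaks the pattern
]
rejected = inconsistent_references("http://psi.example.org/countries", refs)
print(rejected)
```

A publisher using a different pattern (for example one document per subject) would replace the test inside the loop; the point is only that the pattern is stated once and checked for every reference.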

3.1.3 - Formal declaration of subject definition document and subject indicator reference structures

A published subject documentation should include a formal declaration of the structure of its subject definition documents and of its subject indicator references.

3.2 - Content of subject definition document

A subject definition document shall provide, following a formal structure as defined in 3.1, explicit information items about the published subject and its publisher. Some of these items can be mapped to Dublin Core metadata elements.

  • Title of document (dc:title)
  • Identifier (dc:identifier) - should be the subject indicator reference
  • Language of the subject definition document (dc:language)
  • Publisher (dc:publisher)
  • Creator (dc:creator) and possible contributors (dc:contributor)
  • Source (dc:source)
  • Definition of the subject (dc:subject)
  • Rights (dc:rights)
  • History of document : dates of creation, modification, validation
  • Equivalence : reference to equivalent published subjects in other published subject documentations
  • Users : registered users of the published subject
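
As an illustration of how the Dublin Core items in the list above could be carried by an XHTML subject definition document, the following Python sketch renders them as meta elements in the document head. It assumes the common practice of "DC."-prefixed meta names with a "schema.DC" link; this is one possible encoding, not one chosen by this draft, and the function name and sample values are hypothetical.

```python
from html import escape

DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(items):
    """Render a {element: value} mapping as XHTML <link>/<meta> lines."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}" />']
    for element, value in items.items():
        # escape() also escapes quotes, so values are safe inside attributes.
        lines.append(
            f'<meta name="DC.{escape(element)}" content="{escape(value)}" />'
        )
    return "\n".join(lines)


# Invented sample values for a single subject definition document.
sdd_metadata = {
    "title": "France",
    "identifier": "http://psi.example.org/countries#fr",
    "language": "en",
    "publisher": "Example Publisher",
}
header = dublin_core_meta(sdd_metadata)
print(header)
```

Items in the list that have no Dublin Core counterpart (history, equivalence, users) are left out of the sketch and would need their own representation.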

Lars Marius Garshol:
I think this contains way too much stuff.
Remember: this resource is supposed to define a single subject, and to be part of a larger set of SDDs.

My thinking is that the PSD package should contain the following:
- dc:title (that is, the title of the PSI set)
- dc:identifier (the URI used to indicate the PSI set as a SIR)
- dc:language(s) (the language(s) in which subjects are defined)
- dc:publisher (who produced the PSD)
- dc:source (if the PSD is based on some source material)
- version information + publication date
- a set of SDDs
- a set of base PSDs

Mary Nishikawa:
Can we also recommend to add dc:date with a comment that this is the date of publication?
Is version information really needed or would this date suffice?


Murray Altheim:
The real question is whether or not we should *require* that the publication date be machine-readable, and if so, how the date(s) should be provided and maintained. DC includes ways of establishing more specialized date semantics, and we'd probably be wanting initial date of publication as well as extent of validity and last update (or "revision date"). This may be asking a lot of our audience, esp. when the PSIs are part of a database or code base from which the PSI publisher is unclear or unable to discern the date information.


Lars Marius Garshol:
I think the PSD should also be extensible, so that the publisher can put in acknowledgements, copyright information, or other information if desired. I think we should be very wary of structuring this information, however, unless we know of specific uses for that structure. Simplicity is good.

The individual SDDs, on the other hand, should have the following:
- names (with language qualifiers)
- identifier(s) (SIRs, that is)
- definition(s) (plain text definitions, with language qualifiers)
- class(es) of which the subject is an instance

I think repeating all the metadata for each SDD is not necessary; it can instead be inherited from the PSD package.

Mary Nishikawa:
Can we also add the acronyms in parentheses to the published subject documentation (PSD) and the subject definition document (SDD) definitions? No acronyms are used for the other definitions in ISO 13250, so it may not be good to have acronyms for some terms but not for others.
It would be nice to have at least one example of the PSD and the individual SDDs.
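
As a starting point for such an example, the package structure discussed in the remarks above (a PSD carrying shared metadata and a set of SDDs that inherit it) can be sketched as a Python data model. This is only a sketch under stated assumptions: the class names, field names and all sample values are hypothetical, and nothing here has been agreed by the TC.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubjectDefinitionDocument:
    identifiers: List[str]        # subject indicator references (SIRs)
    names: Dict[str, str]         # language code -> name
    definitions: Dict[str, str]   # language code -> plain-text definition
    classes: List[str] = field(default_factory=list)  # SIRs of the classes


@dataclass
class PublishedSubjectDocumentation:
    title: str
    identifier: str               # SIR for the set as a whole
    languages: List[str]
    publisher: str
    version: str
    publication_date: str
    source: Optional[str] = None
    subjects: List[SubjectDefinitionDocument] = field(default_factory=list)
    base_psds: List[str] = field(default_factory=list)

    def inherited_metadata(self):
        """Package-level metadata that each SDD inherits instead of repeating."""
        return {
            "publisher": self.publisher,
            "source": self.source,
            "version": self.version,
            "date": self.publication_date,
        }


# Invented sample data.
countries = PublishedSubjectDocumentation(
    title="Example country subjects",
    identifier="http://psi.example.org/countries",
    languages=["en", "fr"],
    publisher="Example Publisher",
    version="0.1",
    publication_date="2002-01-01",
    subjects=[
        SubjectDefinitionDocument(
            identifiers=["http://psi.example.org/countries#fr"],
            names={"en": "France", "fr": "France"},
            definitions={"en": "The French Republic."},
            classes=["http://psi.example.org/classes#country"],
        )
    ],
)
inherited = countries.inherited_metadata()
print(inherited)
```

Keeping publisher, source and version on the package only reflects the suggestion that this metadata be inherited by the individual SDDs rather than repeated in each one.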

... to be completed