OASIS Topic Maps Published Subjects TC
Recommendations
for Documentation of Published Subjects
Version 0.1 - January 10, 2002
Latest version: http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm
Editor: Bernard Vatant
January 12: Remarks from Lars Marius Garshol added.
January 14: Remarks from Mary Nishikawa added.
January 18: Remarks from Murray Altheim added.
Status of this document: Working Draft
The section numbers correspond to the respective "shall, should, may" requirements of the document:
TC Requirements for Documentation of Published Subjects
1 - Statement of Purpose
The purpose of this document is to provide recommendations for the structure and content of published subject documentation, as defined below. These recommendations are aimed at publishers of classifications, taxonomies, thesauri, catalogues, ontologies, etc., the objective being to provide those publishers with efficient and standard ways to make their legacy content available as published subjects usable by topic maps and other semantic applications.
Lars Marius Garshol: Needs to make clear that
this is based on ISO 13250, and should probably briefly explain what published
subjects are. I think it's necessary to write the document so that people
who have no idea what any of this is can get at least a clue of what the
document is.
2 - Glossary
The following terms and concepts will be used in this document.
Note:
Some of these terms are already defined and used by ISO 13250. Nevertheless, the TC proposes some modifications to clarify some of them and their relationships with new ones, and has sent those proposals to ISO/IEC JTC1/SC34 for revision and extension of ISO 13250 terminology. Both the current ISO 13250 definition and the PubSubj TC proposal are given where necessary.
- published subject
defined by ISO 13250 XTM
A published subject is any subject for which a subject indicator has
been made available for public use and is accessible online via a URI.
new definition proposal
A published subject is any subject for which at least one subject definition
document has been made available by an identified publisher.
Mary Nishikawa:
How about this instead if we need to be more explicit?
"A published subject is any subject for which at least one subject
definition document at a stable URI has been made available for public
use by the publisher identified within the published subject documentation."
Murray Altheim:
It might not be an entire document, but rather a document node, such
as http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence
- published subject documentation
new definition proposal
A published subject documentation is a resource providing a structured
set of subject definition documents.
- publisher
defined by Dublin Core
The publisher of a resource is an entity responsible for making it available.
- subject
defined by ISO 13250 XTM
A subject is anything whatsoever, regardless of whether it exists or
has any other specific characteristics, about which anything whatsoever
may be asserted by any means whatsoever.
- subject definition document
new definition proposal
A subject definition document is a resource that is intended by its publisher to provide an indication of the nature of a subject. A subject definition document should be usable both for human understanding and for computer processing.
Lars Marius Garshol:
I think the second sentence is misleading. I think it should be replaced
by something like:
"A subject definition document is not required to use any particular
notation, but it must convey an understanding to a human of what the
subject is. It may also be computer-processable."
- subject indicator
defined by ISO 13250 XTM
A subject indicator is a resource that is intended by the topic map author to provide a positive, unambiguous indication of the identity of a subject.
Lars Marius Garshol:
This definition needs to change. How about: "Any resource can become
a subject indicator by being referred to as such by some topic in some
topic map." Slightly circular, but should work.
Mary Nishikawa:
Is this in addition to the first sentence, or does it replace it completely?
- subject indicator reference
defined by ISO 13250 XTM
The element <subjectIndicatorRef> provides
a URI reference to a resource that acts as a subject indicator.
new definition proposal
A subject indicator reference is a URI reference to a resource that
acts as a subject indicator.
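The `<subjectIndicatorRef>` element can be illustrated with a short sketch. The Python below uses the standard library's `xml.etree.ElementTree` to build an XTM 1.0 `<topic>` whose `<subjectIdentity>` points at a published subject through an `xlink:href` attribute. It assumes the XTM 1.0 and XLink namespace URIs shown; the helper name `topic_with_psi` and the topic id are hypothetical, and the indicator URI reuses the core.xtm example quoted in the glossary.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for XTM 1.0 elements and XLink attributes (assumed here).
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Serialize XTM elements without a prefix and XLink attributes as "xlink:".
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def topic_with_psi(topic_id: str, psi: str) -> ET.Element:
    """Build a <topic> whose subject is identified by a subject indicator reference.

    topic_with_psi is a hypothetical helper, not part of any XTM toolkit.
    """
    topic = ET.Element(f"{{{XTM_NS}}}topic", {"id": topic_id})
    identity = ET.SubElement(topic, f"{{{XTM_NS}}}subjectIdentity")
    # The subject indicator reference is simply a URI carried in xlink:href.
    ET.SubElement(
        identity,
        f"{{{XTM_NS}}}subjectIndicatorRef",
        {f"{{{XLINK_NS}}}href": psi},
    )
    return topic


if __name__ == "__main__":
    topic = topic_with_psi(
        "occurrence", "http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence"
    )
    print(ET.tostring(topic, encoding="unicode"))
```

In XTM 1.0, two topics that share a subject indicator reference are regarded as being about the same subject, which is what makes published subjects useful when topic maps are merged.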
3 - Recommendations for Published Subject Documentation
3.1 - Structure of published subject documentation
Lars Marius Garshol:
Perhaps this section is better called "Content of published subject documentation",
so that we can work out what we want the PSD to contain before we dive
into the how?
Given that a considerable legacy of taxonomies, classifications and ontologies is likely to be made available as published subject documentations, their publishers should not be constrained to use a specific syntax or language any more than necessary.
Therefore, the present recommendation does not aim to impose on publishers either a single specific syntax for subject definition documents (e.g. a DTD or Schema) or a single structure for subject indicator references (e.g. a specific namespace structure).
Murray Altheim:
PSIs are not always going to be put into a specialized XML markup language,
and I think it's a mistake to require that. Most will be in XHTML or HTML
(as in Cyc), or as addressable resources online (perhaps as a database
query, as in ITIS). Requiring a specialized markup creates a big expenditure
of resources that seems unnecessary given that the XTM design was to allow
pointing to any addressable resource, especially since it's been the XTM
documents themselves that in my experience have served as the subject
"anchors."
But since we're really web-based (in terms of general audience and experience),
I'd suggest something along the lines of specific XHTML markup, if we
were to make any recommendation. This would enable both human- and machine-readable
resources, using commonly available tools like a web browser.
The minimal requirements for conformance to this recommendation are:
3.1.1 - Consistency of subject definition document structure
Throughout a published subject documentation, the subject definition documents should follow a consistent formal structure (DTD, schema or some equivalent structure definition), so that their content can easily be processed by topic map engines, search engines, intelligent agents and any other foreseeable kind of Semantic Web application.
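One way to check the consistency asked for in 3.1.1 is to compare every subject definition document against the list of fields the publisher has declared. The sketch below assumes, purely for illustration, that each subject definition document has already been parsed into a Python dictionary keyed by its subject indicator reference; the function name, field names and URIs are hypothetical.

```python
from typing import Iterable, Mapping


def inconsistent_sdds(
    sdds: Mapping[str, Mapping[str, object]], required: Iterable[str]
) -> dict[str, set[str]]:
    """Map each subject definition document to the required fields it lacks.

    sdds maps a subject indicator reference to the parsed fields of its
    document; documents that contain every required field are omitted.
    """
    required_set = set(required)
    report: dict[str, set[str]] = {}
    for sir, fields in sdds.items():
        missing = required_set - set(fields)
        if missing:
            report[sir] = missing
    return report


# Hypothetical documentation with one incomplete subject definition document.
sdds = {
    "http://psi.example.org/country#france": {
        "name": "France",
        "definition": "The country in western Europe whose capital is Paris.",
    },
    "http://psi.example.org/country#italy": {"name": "Italy"},
}
print(inconsistent_sdds(sdds, ["name", "definition"]))
```

A check of this kind only covers the presence of fields; a DTD or schema, as the recommendation suggests, can also constrain their order and content.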
3.1.2 - Consistency of subject indicator reference structure
A published subject documentation shall use a consistent namespace and URI structure for all its subject indicator references.
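A consistent URI structure also makes subject indicator references easy to verify mechanically. The following sketch assumes a hypothetical convention in which every reference is a base document URI followed by a fragment identifier made of lowercase letters, digits and hyphens, and reports every reference that breaks the convention. The base URI `http://psi.example.org/country` and the function name are invented for the example.

```python
import re
from urllib.parse import urldefrag

# Hypothetical local-identifier rule: a lowercase letter, then letters, digits or hyphens.
LOCAL_ID = re.compile(r"[a-z][a-z0-9-]*")


def nonconforming_sirs(sirs: list[str], base: str) -> list[str]:
    """Return the subject indicator references that are not of the form base#local-id."""
    bad = []
    for sir in sirs:
        document, fragment = urldefrag(sir)
        if document != base or not LOCAL_ID.fullmatch(fragment):
            bad.append(sir)
    return bad


sirs = [
    "http://psi.example.org/country#france",
    "http://psi.example.org/country#new-zealand",
    "http://psi.example.org/countries/Italy",
]
print(nonconforming_sirs(sirs, "http://psi.example.org/country"))
```

Fragment identifiers are only one possible design; a publisher could equally use one path segment per subject, provided the convention is applied consistently across the documentation.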
3.1.3 - Formal declaration of subject definition document and subject indicator reference structures
A published subject documentation should include a formal declaration of the structure of its subject definition documents and of its subject indicator references.
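A formal declaration is most useful when it is itself machine-readable. As a hedged illustration, the sketch below describes both structures in a small Python dataclass and serializes it to JSON so that processors could retrieve it alongside the documentation. The class name `StructureDeclaration`, its fields, the URI template syntax (Python `str.format` placeholders) and the JSON form are all assumptions; the recommendation itself mentions DTDs and schemas and does not prescribe any of these.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StructureDeclaration:
    """Hypothetical machine-readable declaration of a documentation's structures."""

    sir_template: str            # e.g. "http://psi.example.org/country#{local_id}"
    sdd_fields: tuple[str, ...]  # fields every subject definition document carries

    def mint(self, local_id: str) -> str:
        """Build a subject indicator reference from a local identifier."""
        return self.sir_template.format(local_id=local_id)

    def to_json(self) -> str:
        """Serialize the declaration so it can be published with the documentation."""
        return json.dumps(asdict(self), indent=2)


declaration = StructureDeclaration(
    sir_template="http://psi.example.org/country#{local_id}",
    sdd_fields=("name", "definition"),
)
print(declaration.mint("france"))
print(declaration.to_json())
```

Generating every reference from one declared template is a simple way to obtain the consistency required by 3.1.2.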
3.2 - Content of subject definition document
A subject definition document shall provide, following a formal structure as defined in 3.1, explicit information items about the published subject and its publisher. Some of these items can be mapped to Dublin Core metadata elements:
- Title of the document (dc:title)
- Identifier (dc:identifier) - should be the subject indicator reference
- Language of the subject definition document (dc:language)
- Publisher (dc:publisher)
- Creator (dc:creator) and any contributors (dc:contributor)
- Source (dc:source)
- Definition of the subject (dc:subject)
- Rights (dc:rights)
- History of the document: dates of creation, modification and validation
- Equivalence: references to equivalent published subjects in other published subject documentations
- Users: registered users of the published subject
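Murray Altheim's suggestion of XHTML markup can be combined with the Dublin Core items in this list. The sketch below renders the items that have a Dublin Core counterpart as `<meta>` elements, following the common `DC.element` naming convention for embedding Dublin Core in HTML; history, equivalence and users have no Dublin Core counterpart in the list and are left out. The function name and all sample values are hypothetical.

```python
from html import escape

DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"


def dc_meta_elements(items: dict[str, str]) -> str:
    """Render Dublin Core items as XHTML <link>/<meta> elements for a document head.

    Keys are unqualified Dublin Core element names such as "title" or "identifier".
    """
    lines = [f'<link rel="schema.DC" href="{DC_NAMESPACE}" />']
    for element, value in items.items():
        # escape() also escapes double quotes, so values are safe inside content="...".
        lines.append(f'<meta name="DC.{element}" content="{escape(value)}" />')
    return "\n".join(lines)


# Hypothetical subject definition document for the subject "France".
sdd_items = {
    "title": "France",
    "identifier": "http://psi.example.org/country#france",
    "language": "en",
    "publisher": "Example Maps & Atlases",
    "subject": "The country in western Europe whose capital is Paris.",
}
print(dc_meta_elements(sdd_items))
```

Because the same page is displayed by a browser and parsed by software, this keeps one resource usable both for human understanding and for computer processing.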
Lars Marius Garshol:
I think this contains way too much stuff.
Remember: this resource is supposed to define a single subject, and to
be part of a larger set of SDDs.
My thinking is that the PSD package should contain the following:
- dc:title (that is, the title of the PSI set)
- dc:identifier (the URI used to indicate the PSI set as a SIR)
- dc:language(s) (the language(s) in which subjects are defined)
- dc:publisher (who produced the PSD)
- dc:source (if the PSD is based on some source material)
- version information + publication date
- a set of SDDs
- a set of base PSDs
Mary Nishikawa:
Can we also recommend to add dc:date with a comment that this is the date
of publication?
Is version information really needed or would this date suffice?
Murray Altheim:
The real question is whether or not we should *require* that the publication
date be machine-readable, and if so, how the date(s) should be provided
and maintained. DC includes ways of establishing more specialized date
semantics, and we'd probably be wanting initial date of publication as
well as extent of validity and last update (or "revision date"). This
may be asking a lot of our audience, esp. when the PSIs are part of a
database or code base from which the PSI publisher is unsure of, or unable
to discern, the date information.
Lars Marius Garshol:
I think the PSD should also be extensible, so that the publisher can put in acknowledgements, copyright information, or other information if desired.
I think we should be very wary of structuring this information, however,
unless we know of specific uses for that structure. Simplicity is good.
The individual SDDs, on the other hand, should have the following:
- names (with language qualifiers)
- identifier(s) (SIRs, that is)
- definition(s) (plain text definitions, with language qualifiers)
- class(es) of which the subject is an instance
I think repeating all the metadata for each SDD is not necessary; it can
instead be inherited from the PSD package.
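The proposal above, a package-level PSD whose metadata is inherited by lighter SDDs, can be sketched as two Python dataclasses. This is only one possible model under stated assumptions: the class and field names are hypothetical, metadata keys are written as `dc:` strings, and an SDD may override any inherited item with a value of its own.

```python
from dataclasses import dataclass, field


@dataclass
class SDD:
    """A subject definition document as proposed: SIRs, names, definitions, classes."""

    identifiers: list[str]       # subject indicator references
    names: dict[str, str]        # language code -> name
    definitions: dict[str, str]  # language code -> plain-text definition
    classes: list[str] = field(default_factory=list)        # SIRs of classes it instantiates
    metadata: dict[str, str] = field(default_factory=dict)  # optional per-SDD overrides


@dataclass
class PSD:
    """A published subject documentation package holding the shared metadata."""

    metadata: dict[str, str]     # e.g. dc:title, dc:publisher, dc:language
    sdds: list[SDD] = field(default_factory=list)
    base_psds: list[str] = field(default_factory=list)      # identifiers of base PSDs

    def effective_metadata(self, sdd: SDD) -> dict[str, str]:
        """Package metadata, overridden by anything the SDD states itself."""
        return {**self.metadata, **sdd.metadata}


france = SDD(
    identifiers=["http://psi.example.org/country#france"],
    names={"en": "France", "fr": "France"},
    definitions={"en": "The country in western Europe whose capital is Paris."},
    classes=["http://psi.example.org/geo#country"],
)
psd = PSD(
    metadata={
        "dc:title": "Example country subjects",
        "dc:publisher": "Example Maps & Atlases",
        "dc:language": "en",
    },
    sdds=[france],
)
print(psd.effective_metadata(france)["dc:publisher"])
```

Resolving metadata at read time, rather than copying it into each SDD, keeps the individual documents small and leaves a single place to update package-level information.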
Mary Nishikawa:
Can we also add the acronyms in parentheses to the published subject documentation (PSD) and the subject definition document (SDD) definitions? There are no acronyms used for the other definitions in ISO 13250, so it may not be good to have acronyms for some but not for others.
It would be nice to have at least one example of the PSD and the individual SDDs.
...
to be completed