Draft Proposal
Best Practices for Published Subject Documentation Structure
Mary Nishikawa
- February 21, 2002
For inclusion in:
Documentation of Published Subjects
- Requirements and Recommendations - Part 5
Scenarios for the use of published subjects in Topic Maps
Scenario 1: A topic map implementor would like to represent a standard
classification as a topic map in XTM. The implementor would like to ensure
that the indicators are stable and that the references to the indicators
(the subject identifiers) are the same in all maps for these subjects.
This will ensure that topics for the same subjects in various maps are,
in fact, merged.
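The merging goal in Scenario 1 can be illustrated with a small sketch.
It is not an XTM processor: the Topic class, the function name and the
"psi-unspsc-class-desc" fragment are hypothetical, and the only rule
modelled is that two topics sharing a subject identifier are treated as
the same subject.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Minimal stand-in for an XTM topic (hypothetical structure)."""
    names: set = field(default_factory=set)
    subject_identifiers: set = field(default_factory=set)


def merge_by_subject_identifier(topics):
    """Merge topics that share at least one subject identifier."""
    merged = []
    for topic in topics:
        # Work on a copy so the caller's topics are left untouched.
        combined = Topic(set(topic.names), set(topic.subject_identifiers))
        remaining = []
        for existing in merged:
            if existing.subject_identifiers & combined.subject_identifiers:
                # Same subject: fold the existing topic into the new one.
                combined.names |= existing.names
                combined.subject_identifiers |= existing.subject_identifiers
            else:
                remaining.append(existing)
        remaining.append(combined)
        merged = remaining
    return merged


# The "Family" identifier is taken from this proposal; the "Class"
# identifier is an assumed name used only for illustration.
FAMILY_PSI = "psi-unspsc-core.xtm#psi-unspsc-family-desc"
CLASS_PSI = "psi-unspsc-core.xtm#psi-unspsc-class-desc"

topic_from_map_a = Topic({"Family"}, {FAMILY_PSI})
topic_from_map_b = Topic({"UNSPSC Family"}, {FAMILY_PSI})
unrelated_topic = Topic({"Class"}, {CLASS_PSI})

result = merge_by_subject_identifier(
    [topic_from_map_a, unrelated_topic, topic_from_map_b]
)
```

If the two maps had used different strings to refer to the same
indicator, the two "Family" topics would have stayed separate, which is
why stable, shared subject identifiers matter here.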
Method A. For this implementation, the metadata is published as Simple
Dublin Core in RDF for each XTM file containing a set of published
subjects.
The standard classification chosen was Segment 71 of the Universal Standard
Products and Services Classification (UNSPSC) published by the Electronic
Commerce Code Management Association at http://www.eccma.org.
The UNSPSC classification is broken down into 4 levels: Segment,
Family, Class, and Commodity. There are also the core elements of title,
the UNSPSC eight-digit hierarchical code called the ECCMA Global Commodity
Classification (EGCC), and the UNSPSC product or service control identifier
number, the ECCMA Global Commodity Indicator (EGCI). Since the ECCMA already
has an identifier, the EGCI, that is fixed throughout the lifetime of
each product or service, it will be used to establish the "identity"
of the subject in the topic map.
All Segments of the UNSPSC use Segment, Family, Class, Commodity, EGCC,
EGCI and title. I have placed these in a separate XTM published subject
set, psi-unspsc-core.xtm, with its corresponding metadata file,
psi-unspsc-core-meta.rdf. The metadata was written as described in
Expressing Simple Dublin Core in RDF. All of the published subject
indicators are contained within the XTM file as resourceData in an
occurrence. The resourceData element contains the product or service
control identifier number (EGCI). The published subject identifiers are
fragment identifiers that reference the id of the EGCI resourceData
element; they appear in the subjectIndicatorRef within the
subjectIdentity element. For example, the subject identity of "Family"
is psi-unspsc-core.xtm#psi-unspsc-family-desc.
Also, browsers don't seem to resolve fragment identifiers on files that
have only the .xtm extension, so the files have .xml appended to their
names (the file name is ...core.xtm.xml#psi-unspsc-family-desc). IE does
not take you to each individual fragment, only to the top of the file.
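The arrangement described above for Method A can be sketched by
generating one such topic with Python's standard library. This is only
an illustration under stated assumptions: the function name, the topic
id "family" and the EGCI value "EGCI-PLACEHOLDER" are hypothetical, and
the element layout is my reading of XTM 1.0, not a copy of
psi-unspsc-core.xtm.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_topic(topic_id, name, indicator_id, egci, psi_file):
    """Build a topic whose subject indicator is its own resourceData."""
    topic = ET.Element(f"{{{XTM_NS}}}topic", {"id": topic_id})

    # The subject identifier: file name plus the fragment identifier
    # that matches the id of the resourceData element below.
    identity = ET.SubElement(topic, f"{{{XTM_NS}}}subjectIdentity")
    ET.SubElement(
        identity,
        f"{{{XTM_NS}}}subjectIndicatorRef",
        {f"{{{XLINK_NS}}}href": f"{psi_file}#{indicator_id}"},
    )

    base_name = ET.SubElement(topic, f"{{{XTM_NS}}}baseName")
    ET.SubElement(base_name, f"{{{XTM_NS}}}baseNameString").text = name

    # The subject indicator: an occurrence holding the EGCI as
    # resourceData, addressable through its id.
    occurrence = ET.SubElement(topic, f"{{{XTM_NS}}}occurrence")
    data = ET.SubElement(
        occurrence, f"{{{XTM_NS}}}resourceData", {"id": indicator_id}
    )
    data.text = egci
    return topic


# "EGCI-PLACEHOLDER" is not a real EGCI; substitute the ECCMA value.
family_topic = build_topic(
    topic_id="family",
    name="Family",
    indicator_id="psi-unspsc-family-desc",
    egci="EGCI-PLACEHOLDER",
    psi_file="psi-unspsc-core.xtm",
)
family_xml = ET.tostring(family_topic, encoding="unicode")
```

Keeping the fragment identifier and the resourceData id in one
parameter makes it harder for the identifier and the indicator to drift
apart when the set is regenerated.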
There is a second XTM file that
contains all of the UNSPSC codes for Segment 71 (only the Mining Services
Family has been completed so far; the Oil and Gas Services Family will be
added later on). It is set up similarly to the one above and also has its
metadata in RDF.
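As a rough illustration of what a Simple Dublin Core description in
RDF/XML for one of these XTM files might look like, here is a sketch
that generates one. The function name and the field values are
assumptions made for the example (the creator and date are taken from
the header of this proposal, the title is invented); it is not the
content of psi-unspsc-core-meta.rdf.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core_rdf(about, fields):
    """Return an RDF/XML string describing `about` with DC elements."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for element, value in fields.items():
        # Each key is a Dublin Core element name such as "title".
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; the title is a placeholder.
core_metadata = dublin_core_rdf(
    "psi-unspsc-core.xtm",
    {
        "title": "Published subjects for the UNSPSC core elements",
        "creator": "Mary Nishikawa",
        "date": "2002-02-21",
        "format": "text/xml",
    },
)
```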
Note 1: The method of appending metadata to the XTM file still needs
to be worked on. The topic "metadata" is of type "psi-unspsc-71"
which is the id of the entire topic map containing the identifiers (Is
this doable?). The metadata is an instance of the Topic Map. The subject
identity of the metadata topic is the RDF metadata.
Note 2: While doing this, I realized that there are some basic PSIs that
need to be worked on as a separate set. We need PSIs for core linguistic
elements such as acronym, all capitals, character encoding (maybe),
xml:lang (maybe), data type for number, etc. I would really like to be
able to fix the data type for the EGCC as NN.NN.NN.NN, for example. I am
planning to work on this for an XML Schema example.
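Until such a data type exists, the NN.NN.NN.NN constraint mentioned in
Note 2 can be approximated with a pattern check. The sketch below
assumes that each N is an ASCII digit and that the four pairs correspond,
in order, to Segment, Family, Class and Commodity; the function names
and the sample code "71.10.15.01" are made up for the illustration.

```python
import re

# Four two-digit groups separated by full stops: NN.NN.NN.NN
EGCC_PATTERN = re.compile(r"[0-9]{2}\.[0-9]{2}\.[0-9]{2}\.[0-9]{2}")


def is_valid_egcc(code):
    """Return True if `code` has the form NN.NN.NN.NN."""
    return EGCC_PATTERN.fullmatch(code) is not None


def split_egcc(code):
    """Split a valid EGCC into its four (assumed) hierarchy levels."""
    if not is_valid_egcc(code):
        raise ValueError(f"not in NN.NN.NN.NN form: {code!r}")
    segment, family, class_, commodity = code.split(".")
    return {
        "segment": segment,
        "family": family,
        "class": class_,
        "commodity": commodity,
    }


# Sample value for illustration only, not taken from the UNSPSC files.
levels = split_egcc("71.10.15.01")
```

The same pattern could later be carried over into an XML Schema
pattern facet once the schema example is written.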
Method B. In a second implementation, the published subject indicators
are obtained from the ECCMA HTML source. ECCMA's public files do not
include the EGCI, so the EGCIs were added. ECCMA already has fragment
identifiers assigned to each classification in the HTML files on its
website (the 8-digit code, not the service control identifier number),
but since these can change with future updates to the classification,
they are not a good choice for published subject indicators. Here is an
example of the published subject indicator for "Extraction":
unspsc-71-pubsubj.htm#010295. Since it is an HTML file, the browser does
know to take you to the fragment. This is one benefit of this approach.
The subject identifiers from the XTM file point to these indicators. We
do not need to write any content into the resourceData, since we already
have subject indicators in the HTML file. I have deleted all of the
occurrences that were in the XTM file used in Method A. The metadata
is included within the HTML file.
Note 3: Encoding Dublin Core Metadata in HTML (DCMI, Network Working
Group) was used as a guide to write the metadata in the html. See http://www.ietf.org/rfc/rfc2731.txt.
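To give a feel for the kind of markup involved in Method B, the sketch
below generates Dublin Core meta elements in the general style described
by RFC 2731: a link element declaring the "DC" prefix, followed by one
meta element per Dublin Core element. The function name, the schema URI
used for the prefix and the field values are assumptions for the
example; the actual head of unspsc-71-pubsubj.htm may differ.

```python
from html import escape

# Assumed namespace URI for the "DC" prefix; check it against the
# version of the Dublin Core element set actually being used.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dublin_core_meta(fields):
    """Return HTML head markup carrying the given Dublin Core fields."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, value in fields.items():
        # Escape the value so quotes and ampersands stay well formed.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Illustrative values only; the title is a placeholder.
head_markup = dublin_core_meta(
    {
        "title": "Published subjects for UNSPSC Segment 71",
        "creator": "Mary Nishikawa",
        "date": "2002-02-21",
    }
)
```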