PSI Best Practices

OASIS Topic Maps Published Subjects Technical Committee

Draft Proposal
Best Practices for Published Subject Documentation Structure

Mary Nishikawa - February 21, 2002

For inclusion in: Documentation of Published Subjects - Requirements and Recommendations - Part 5

Scenarios for the use of published subjects in Topic Maps

Scenario 1: A topic map implementor would like to represent a standard classification as a topic map in XTM. The implementor wants the subject indicators to be stable, and wants the references to those indicators (the subject identifiers) to be the same in every map that covers these subjects. This ensures that topics representing the same subject in different maps are, in fact, merged.
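The merging behaviour this scenario relies on can be sketched in a few lines of Python. This is only an illustration, not the XTM processing model: it assumes each topic carries exactly one subject identifier, and the dictionary layout, the function name merge_topics, and the example.org identifiers are all hypothetical.

```python
def merge_topics(topics):
    """Merge topics that share a subject identifier (simplified sketch)."""
    merged = {}
    for topic in topics:
        psi = topic["subject_identifier"]
        # Topics with the same identifier collapse into a single entry.
        entry = merged.setdefault(psi, {"subject_identifier": psi, "names": set()})
        entry["names"].update(topic["names"])
    return merged


# Hypothetical topics from two independently authored maps.
map_a = [{"subject_identifier": "http://example.org/psi#family", "names": {"Family"}}]
map_b = [
    {"subject_identifier": "http://example.org/psi#family", "names": {"Famille"}},
    {"subject_identifier": "http://example.org/psi#class", "names": {"Class"}},
]

result = merge_topics(map_a + map_b)
```

If the two maps had used different identifiers for "Family", the sketch would keep two separate entries, which is the situation stable published subject identifiers are meant to avoid.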

Method A. For this implementation, the metadata is published as Simple Dublin Core in RDF for each XTM file containing a set of published subjects.

The standard classification chosen was Segment 71 of the Universal Standard Products and Services Classification (UNSPSC) published by the Electronic Commerce Code Management Association at http://www.eccma.org.

The UNSPSC classification is broken down into four levels: Segment, Family, Class, and Commodity. There are also the core elements of title, the UNSPSC eight-digit hierarchical code called the ECCMA Global Commodity Classification (EGCC), and the UNSPSC product or service control identifier number, the ECCMA Global Commodity Indicator (EGCI). Since ECCMA already provides an identifier (the EGCI), a single number that is fixed throughout the lifetime of each product or service, it will be used to establish the "identity" of the subject in the topic map.
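As a rough illustration of the hierarchical code, the following Python sketch splits an eight-digit code into the four levels. It assumes two digits per level, in the order Segment, Family, Class, Commodity; the function name split_egcc and the sample code are hypothetical and are not taken from the ECCMA files.

```python
def split_egcc(code):
    """Split an eight-digit code into its four two-digit levels."""
    digits = code.replace(".", "")  # also accept a dotted form such as 71.01.02.03
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected eight digits, got {code!r}")
    levels = ("segment", "family", "class", "commodity")
    return {level: digits[i * 2:i * 2 + 2] for i, level in enumerate(levels)}


# "71010203" is a made-up code used only to show the split.
parts = split_egcc("71010203")
```

Because the code encodes the position in the hierarchy, it changes when an item is reclassified, which is why the fixed EGCI is the better basis for identity.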

All Segments of the UNSPSC use Segment, Family, Class, Commodity, EGCC, EGCI and title. I have placed these in a separate XTM published subject set, psi-unspsc-core.xtm, with its corresponding metadata file, psi-unspsc-core-meta.rdf. The metadata was written as described in Expressing Simple Dublin Core in RDF. All of the published subject indicators are contained within the XTM file as resourceData in an occurrence. The resourceData element contains the product or service control identifier number (EGCI). The published subject identifiers end in fragment identifiers that match the id of the resourceData element holding the EGCI, and they appear in the subjectIndicatorRef within the subjectIdentity element. For example, the subject identity of "Family" is psi-unspsc-core.xtm#psi-unspsc-family-desc.

Browsers do not seem to resolve fragment identifiers on files that have only the .xtm extension, so .xml has been appended to the file names (the file name is ...core.xtm.xml#psi-unspsc-family-desc). IE, however, does not take you to the individual fragment, only to the top of the file.
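The structure described for Method A can be sketched with Python's standard ElementTree module. This is a guess at the shape of one topic, not a copy of the actual file: the topic id "family", the helper name build_topic, and the EGCI value "000000" are placeholders, and the element layout assumes XTM 1.0.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)
ET.register_namespace("xlink", XLINK)


def build_topic(topic_id, name, indicator_id, egci, psi_file="psi-unspsc-core.xtm"):
    """Build one topic whose subject identifier points at its own resourceData."""
    topic = ET.Element(f"{{{XTM}}}topic", {"id": topic_id})
    identity = ET.SubElement(topic, f"{{{XTM}}}subjectIdentity")
    # The subject identifier is the file name plus the resourceData id.
    ET.SubElement(identity, f"{{{XTM}}}subjectIndicatorRef",
                  {f"{{{XLINK}}}href": f"{psi_file}#{indicator_id}"})
    base_name = ET.SubElement(topic, f"{{{XTM}}}baseName")
    ET.SubElement(base_name, f"{{{XTM}}}baseNameString").text = name
    # The subject indicator itself: resourceData holding the EGCI.
    occurrence = ET.SubElement(topic, f"{{{XTM}}}occurrence")
    data = ET.SubElement(occurrence, f"{{{XTM}}}resourceData", {"id": indicator_id})
    data.text = egci
    return topic


# "000000" is a placeholder, not a real EGCI.
topic = build_topic("family", "Family", "psi-unspsc-family-desc", "000000")
xml_text = ET.tostring(topic, encoding="unicode")
```

Printing xml_text shows the subjectIndicatorRef and the resourceData element sharing the same fragment, psi-unspsc-family-desc.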

There is a second XTM file that contains all of the UNSPSC codes for Segment 71 (so far only the Mining Services Family has been completed; the Oil and Gas Services Family will be added later). It is set up similarly to the one above and also has its metadata in RDF.
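A metadata file of the kind mentioned for both XTM files might be generated along the following lines. This is a sketch under assumptions: it uses one rdf:Description per XTM file with Dublin Core elements as child properties, and the helper name build_dc_metadata and all property values shown are placeholders, not the content of the actual .rdf files.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def build_dc_metadata(about, properties):
    """Describe one resource with simple Dublin Core properties in RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(root, f"{{{RDF}}}Description",
                                {f"{{{RDF}}}about": about})
    for element, value in properties.items():
        ET.SubElement(description, f"{{{DC}}}{element}").text = value
    return root


# Placeholder values; the real metadata files are not reproduced here.
metadata = build_dc_metadata(
    "psi-unspsc-core.xtm",
    {"title": "Example published subject set", "language": "en"},
)
rdf_text = ET.tostring(metadata, encoding="unicode")
```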

Note 1: The method of appending metadata to the XTM file still needs work. The topic "metadata" is of type "psi-unspsc-71", which is the id of the entire topic map containing the identifiers (is this doable?). The metadata is thus an instance of the topic map. The subject identity of the metadata topic is the RDF metadata.

Note 2: While doing this, I realized that some basic PSIs need to be worked out as a separate set. We need PSIs for core linguistic elements such as acronym, all capitals, character encoding (maybe), xml:lang (maybe), data type for number, etc. For example, I would really like to be able to fix the data type for the EGCC as NN.NN.NN.NN. I am planning to work on this for an XML Schema example.
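The NN.NN.NN.NN constraint mentioned in Note 2 can be illustrated with a regular expression. This Python sketch is not the planned XML Schema example; the names EGCC_PATTERN, is_valid_egcc and format_egcc are hypothetical, and it assumes N means a single ASCII digit.

```python
import re

# Four groups of two digits separated by dots, e.g. 71.01.02.03.
EGCC_PATTERN = re.compile(r"[0-9]{2}(?:\.[0-9]{2}){3}")


def is_valid_egcc(value):
    """Return True if the value has the dotted NN.NN.NN.NN form."""
    return EGCC_PATTERN.fullmatch(value) is not None


def format_egcc(digits):
    """Turn a plain eight-digit code into the dotted form."""
    if not re.fullmatch(r"[0-9]{8}", digits):
        raise ValueError(f"expected eight digits, got {digits!r}")
    return ".".join(digits[i:i + 2] for i in range(0, 8, 2))
```

With these helpers, an undotted or truncated code is rejected by is_valid_egcc, and format_egcc converts an undotted code into the form the data type would require.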

Method B. In a second implementation, the published subject indicators are obtained from the HTML source. ECCMA's public files do not include the EGCI, so the EGCI values were added. ECCMA has already assigned a fragment identifier to each classification in the HTML files on its website (the 8-digit code, not the service control identifier number), but since these codes can change with future updates of the classification, they are not a good choice for published subject indicators. Here is an example of the published subject indicator for "Extraction": unspsc-71-pubsubj.htm#010295. Since it is an HTML file, the browser knows to take you to the fragment, which is one benefit of this approach. The subject identifiers in the XTM file point to these indicators. We do not need to write any content into resourceData, since we already have subject indicators from ECCMA. I have deleted all of the occurrences that were in the XTM file used in Method A. The metadata is included within the HTML file.
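One way to check that each subject identifier in Method B really points at an indicator is to confirm that its fragment exists as an anchor in the HTML file. The Python sketch below uses only the standard library; the class and function names are hypothetical, and the sample markup is an assumed form of the anchor, not ECCMA's actual HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collect values that can serve as fragment targets in an HTML page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            # Any element id, or the name attribute of an <a> element.
            if value and (key == "id" or (key == "name" and tag == "a")):
                self.anchors.add(value)


def fragment_resolves(subject_identifier, html_text):
    """Return True if the identifier's fragment is an anchor in html_text."""
    _, fragment = urldefrag(subject_identifier)
    if not fragment:
        return False
    collector = AnchorCollector()
    collector.feed(html_text)
    return fragment in collector.anchors


# Assumed markup for illustration only.
sample_html = '<html><body><a name="010295">Extraction</a></body></html>'
ok = fragment_resolves("unspsc-71-pubsubj.htm#010295", sample_html)
```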

Note 3: Encoding Dublin Core Metadata in HTML (DCMI, Network Working Group) was used as a guide to write the metadata in the html. See http://www.ietf.org/rfc/rfc2731.txt.
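The general pattern for embedding Dublin Core in HTML is a link element that declares the schema prefix, followed by meta elements named DC.<element>. The Python sketch below generates such a fragment. The helper name dc_meta_tags, the choice of namespace URI, and the property values are assumptions for illustration and are not taken from the actual HTML file.

```python
from html import escape

# Assumed namespace URI for the DC prefix.
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(properties):
    """Render Dublin Core properties as HTML link/meta elements."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, value in properties.items():
        # Escape quotes and markup characters inside the attribute value.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Placeholder values for illustration.
head = dc_meta_tags({"title": 'UNSPSC "Segment 71" subjects', "language": "en"})
```

The resulting lines would be placed inside the head element of the HTML file that carries the published subject indicators.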
