|  
         
       
        OASIS Topic Maps Published Subjects TC 
         Recommendations 
        for Documentation of Published Subjects 
       
        Version 0.1 - January 10, 2002
        Latest version : http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/psdoc.htm 
        Editor: Bernard Vatant 
        January 12 : Remarks from Lars Marius Garshol added.
        January 14 : Remarks from Mary Nishikawa added.
        January 18 : Remarks from Murray Altheim added.
         
         
        Status of this document : Working Draft
       
        The numbering follows the respective "shall, should, may" items of the
        document: TC Requirements for Documentation of Published Subjects
         
        1 - Statement of Purpose 
         
        The purpose of this document is to provide recommendations for the structure
        and content of published subject documentation, as defined below. These
        recommendations are aimed at publishers of classifications, taxonomies,
        thesauri, catalogues, ontologies and similar resources. The objective is to
        provide those publishers with efficient and standard ways to make their legacy
        available as published subjects usable by topic maps and other semantic applications.
         
        Lars Marius Garshol : Needs to make clear that 
        this is based on ISO 13250, and should probably briefly explain what published 
        subjects are. I think it's necessary to write the document so that people 
        who have no idea what any of this is can get at least a clue of what the 
        document is.  
         
         
        2 - Glossary  
          
        The following terms and concepts will be used in this document.
        Note: Some of these terms are already defined and used by ISO 13250. Nevertheless,
        the TC proposes some modifications to clarify some of them and their relationships
        with new ones, and has sent those proposals to ISO JTC1/SC34 for revision
        and extension of ISO 13250 terminology. Both the current ISO 13250 definition
        and the PubSubj TC proposal are given where necessary.
      
        - published subject
 
           
          defined by ISO 13250 XTM 
          A published subject is any subject for which a subject indicator has 
          been made available for public use and is accessible online via a URI. 
            
           
          new definition proposal 
           
          A published subject is any subject for which at least one subject definition 
          document has been made available by an identified publisher.  
           
          Mary Nishikawa:  
          How about this instead if we need to be more explicit? 
          "A published subject is any subject for which at least one subject 
          definition document at a stable URI has been made available for public 
          use by the publisher identified within the published subject documentation." 
           
           
          Murray Altheim:  
          It might not be an entire document, but rather a document node, such 
          as http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence 
           
       
      
        - published subject documentation
 
           
          new definition proposal
          A published subject documentation is a resource providing a structured 
          set of subject definition documents. 
       
      
        - publisher 
          
 
           
          defined by Dublin Core 
          The publisher of a resource is an entity responsible for making it available. 
           
       
      
        - subject
 
           
           defined by ISO 13250 XTM  
          A subject is anything whatsoever, regardless of whether it exists or 
          has any other specific characteristics, about which anything whatsoever 
          may be asserted by any means whatsoever. 
       
      
        - subject definition document
 
           
          new definition proposal 
           
          A subject definition document is a resource that has been intended by 
          its publisher to provide an indication of the nature of a subject. A 
          subject definition document should be usable both for human understanding 
          and computer processing.  
           
          Lars Marius Garshol:
          I think the second sentence is misleading. I think it should be replaced 
          by something like:  
          "A subject definition document is not required to use any particular 
          notation, but it must convey an understanding to a human of what the 
          subject is. It may also be computer-processable."  
           
       
      
        - subject indicator
 
           
          defined by ISO 13250 XTM  
          A subject indicator is a 
          resource that is intended by the topic map author to provide a positive, 
          unambiguous indication of the identity of a subject. 
           
          Lars Marius Garshol:
          This definition needs to change. How about: "Any resource can become 
          a subject indicator by being referred to as such by some topic in some 
          topic map." Slightly circular, but should work.  
           
          Mary Nishikawa:  
          Is this in addition to the first sentence, or does it replace it completely?
           
            
       
      
        - subject indicator reference
 
           
           defined by ISO 13250 XTM 
          The element <subjectIndicatorRef> provides
          a URI reference to a resource that acts as a subject indicator. 
           
          new definition proposal 
           
          A subject indicator reference is a URI reference to a resource that 
          acts as a subject indicator. 
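
          As an illustration of the definition above, the following is a minimal
          Python sketch that builds an XTM 1.0 <topic> carrying a
          <subjectIndicatorRef>. The helper name topic_with_subject_indicator and
          the topic id are assumptions made for this example; the URI is the one
          quoted in Murray Altheim's remark earlier in this glossary. It is a
          sketch, not a normative serialization.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XTM 1.0 documents.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def topic_with_subject_indicator(topic_id: str, indicator_uri: str) -> ET.Element:
    """Build a <topic> whose identity is given by a subject indicator reference.

    Hypothetical helper written for this example only.
    """
    topic = ET.Element(f"{{{XTM_NS}}}topic", {"id": topic_id})
    identity = ET.SubElement(topic, f"{{{XTM_NS}}}subjectIdentity")
    # The subject indicator reference is a URI reference carried by xlink:href.
    ET.SubElement(
        identity,
        f"{{{XTM_NS}}}subjectIndicatorRef",
        {f"{{{XLINK_NS}}}href": indicator_uri},
    )
    return topic


# Prefixes are only cosmetic; they make the serialized output easier to read.
ET.register_namespace("xtm", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)

topic = topic_with_subject_indicator(
    "occurrence",
    "http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence",
)
print(ET.tostring(topic, encoding="unicode"))
```

          The reference itself is nothing more than the URI held in xlink:href,
          which is what the new definition proposal states.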
       
        3 - Recommendations for Published Subject Documentation
         
        3.1 - Structure of published subject documentation. 
         
        Lars Marius Garshol:  
        Perhaps this section is better called "Content of published subject documentation", 
        so that we can work out what we want the PSD to contain before we dive 
        into the how?  
         
        Considering that a large legacy of taxonomies, classifications and
        ontologies is likely to be made available as published subject documentations,
        their publishers should not be constrained more than necessary to use
        a specific syntax or language.
         
        Therefore, the present recommendation does not aim to impose on publishers
        either a single specific syntax for subject definition documents (e.g.
        DTD or Schema), or a single structure for subject indicator references
        (e.g. specific namespace structure).
         
        Murray Altheim:  
        PSIs are not always going to be put into a specialized XML markup language, 
        and I think it's a mistake to require that. Most will be in XHTML or HTML 
        (as in Cyc), or as addressable resources online (perhaps as a database 
        query, as in ITIS). Requiring a specialized markup creates a big expenditure 
        of resources that seems unnecessary given that the XTM design was to allow 
        pointing to any addressable resource, especially since it's been the XTM 
        documents themselves that in my experience have served as the subject 
        "anchors."  
        But since we're really web-based (in terms of general audience and experience), 
        I'd suggest something along the lines of specific XHTML markup, if we 
        were to make any recommendation. This would enable both human- and machine-readable 
        resources, using commonly available tools like a web browser.  
         
        The minimal requirements for conformance to this recommendation are:  
        3.1.1 - Consistency of subject definition document structure
         
        Throughout a published subject documentation, the subject definition documents
        should be built following a consistent formal structure (DTD, schema or
        some equivalent structure definition), allowing easy processing of
        their content by topic map engines, search engines, intelligent agents
        and any foreseeable kind of semantic web application.
         
        3.1.2 - Consistency of subject indicator reference structure 
         
        A published subject documentation shall use a consistent namespace and
        URI structure for all its subject indicator references.
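
        One way to read "consistent namespace and URI structure" is that every
        subject indicator reference is formed as a shared base URI plus a
        fragment identifier. That convention is an assumption of the sketch
        below, not a rule stated in this draft, and the function name
        inconsistent_references is hypothetical.

```python
from urllib.parse import urldefrag


def inconsistent_references(references: list[str], base: str) -> list[str]:
    """Return the references that are not of the form '<base>#<fragment>'.

    The '<base>#<fragment>' pattern is an assumed convention for this example.
    """
    rejected = []
    for reference in references:
        document, fragment = urldefrag(reference)
        # A reference is consistent if it points into the shared base
        # document and names a specific node within it.
        if document != base or not fragment:
            rejected.append(reference)
    return rejected


BASE = "http://www.topicmaps.org/xtm/1.0/core.xtm"
references = [
    "http://www.topicmaps.org/xtm/1.0/core.xtm#occurrence",
    "http://www.topicmaps.org/xtm/1.0/core.xtm#association",
    "http://www.topicmaps.org/xtm/1.0/language.xtm#en",  # different base
    "http://www.topicmaps.org/xtm/1.0/core.xtm",         # no fragment
]
rejected = inconsistent_references(references, BASE)
print(rejected)
```

        A publisher using a different scheme (for example one document per
        subject) would replace the test inside the loop with its own pattern.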
        3.1.3 - Formal declaration of subject definition document and subject
        indicator reference structures
         
        A published subject documentation should include a formal declaration of
        the structure of its subject definition documents and of its subject
        indicator references.
         
        3.2 - Content of subject definition document
         
        A subject definition document shall provide, following a formal structure
        as defined in 3.1, explicit information items about the published subject
        and its publisher. Some of those items can be mapped to Dublin Core
        metadata elements.
      
        - Title of document (dc:title)
        - Identifier (dc:identifier) - should be the subject indicator reference
        - Language of the subject definition document (dc:language)
        - Publisher (dc:publisher)
        - Creator (dc:creator) and possible contributors (dc:contributor)
        - Source (dc:source)
        - Definition of the subject (dc:subject)
        - Rights (dc:rights)
        - History of document : dates of creation, modification, validation
        - Equivalence : reference to equivalent published subjects in other published subject documentations
        - Users : registered users of the published subject
           
       
        Lars Marius Garshol:
        I think this contains way too much stuff. 
         
        Remember: this resource is supposed to define a single subject, and to 
        be part of a larger set of SDDs.  
         
        My thinking is that the PSD package should contain the following:  
        - dc:title (that is, the title of the PSI set) 
        - dc:identifier (the URI used to indicate the PSI set as a SIR) 
        - dc:language(s) (the language(s) in which subjects are defined)  
        - dc:publisher (who produced the PSD)  
        - dc:source (if the PSD is based on some source material)  
        - version information + publication date  
        - a set of SDDs 
        - a set of base PSDs  
         
        Mary Nishikawa:
        Can we also recommend to add dc:date with a comment that this is the date 
        of publication?  
        Is version information really needed or would this date suffice? 
         
        Murray Altheim: 
         
        The real question is whether or not we should *require* that the publication 
        date be machine-readable, and if so, how the date(s) should be provided 
        and maintained. DC includes ways of establishing more specialized date 
        semantics, and we'd probably be wanting initial date of publication as 
        well as extent of validity and last update (or "revision date"). This 
        may be asking a lot of our audience, esp. when the PSIs are part of a 
        database or code base from which the PSI publisher is unclear or unable 
        to discern the date information.  
          
        Lars Marius Garshol:
        I think the PSD should also be extensible, so that the publisher can put in
        acknowledgements, copyright information, or other information if desired.
        I think we should be very wary of structuring this information, however,
        unless we know of specific uses for that structure. Simplicity is good.
         
         
        The individual SDDs, on the other hand, should have the following:  
        - names (with language qualifiers) 
        - identifier(s) (SIRs, that is)  
        - definition(s) (plain text definitions, with language qualifiers)  
        - class(es) of which the subject is an instance  
         
        I think repeating all the metadata for each SDD is not necessary; it can
        instead be inherited from the PSD package.
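
        The split proposed in this remark, with package-level metadata held once
        on the PSD and inherited by each SDD, could be modelled roughly as
        follows. This is a sketch under stated assumptions: the class names,
        field names and example values are hypothetical and are not part of the
        proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubjectDefinition:
    """One SDD: only the information specific to a single subject."""
    identifiers: List[str]                 # subject indicator references
    names: Dict[str, str]                  # language code -> name
    definitions: Dict[str, str]            # language code -> plain-text definition
    classes: List[str] = field(default_factory=list)


@dataclass
class DocumentationPackage:
    """The PSD package: metadata stated once for the whole set of SDDs."""
    title: str
    identifier: str
    languages: List[str]
    publisher: str
    version: str
    publication_date: str
    source: Optional[str] = None
    subjects: List[SubjectDefinition] = field(default_factory=list)
    base_packages: List[str] = field(default_factory=list)

    def metadata_for(self, subject: SubjectDefinition) -> Dict[str, object]:
        """Combine a subject's own identifiers with inherited package metadata."""
        return {
            "identifiers": subject.identifiers,
            "publisher": self.publisher,
            "source": self.source,
            "version": self.version,
            "publication_date": self.publication_date,
        }


# Illustrative values only.
occurrence = SubjectDefinition(
    identifiers=["http://psi.example.org/demo/#occurrence"],
    names={"en": "occurrence"},
    definitions={"en": "Information relevant to a given subject."},
)
package = DocumentationPackage(
    title="Demo subjects",
    identifier="http://psi.example.org/demo/",
    languages=["en"],
    publisher="Example Publisher",
    version="0.1",
    publication_date="2002-01-10",
    subjects=[occurrence],
)
inherited = package.metadata_for(occurrence)
print(inherited["publisher"])
```

        Because the publisher, source and version live only on the package, an
        individual SDD stays as small as the list above suggests.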
       
        Mary Nishikawa:
        Can we also add the acronyms in parentheses to the published subject
        documentation (PSD) and the subject definition document (SDD) definitions?
        There are no acronyms used for the other definitions in ISO 13250, so it
        may not be good to have acronyms for some but not for others.
        It would be nice to have at least one example of the PSD and the
        individual SDDs.
      ... 
        to be completed 
     |