Code list task group final report (draft 2)

$Date: 2003/09/12 01:06:06 $(UTC)


Table of Contents

1. Overview
2. Scope
3. Validation of authored content regarding code lists
3.1. Mechanical approach to validation
3.2. Example illustration of the approach to validation
4. The supply of supplemental information in a code list adjunct file
5. Synthesizing stock code list definitions from external sources
6. Code list catalogue

1.  Overview

This is the final report of the UBL code list task group addressing the issues described in the minutes of the 2003-08-28 meeting of the code list task group:

This document is not meant to be the candidate wording for any NDR rules or recommendations or formal UBL documentation to be supplied to end users; this is only meant for communicating the conclusions of the task group. It will be up to each of NDRSC, FPSC and LCSC to accept and then act on these results.

The active members of and contributors to this code list task group were (alphabetically) Jon Bosak, Chee-Kai Chin, Tony Coates, Steve Green, G. Ken Holman (chair) and Lisa Seaburg. Many thanks to Tim McGrath and Gunther Stuhec for their contributions.

Accompanying this report is a ZIP file named code-proposal2-20030911.zip containing the experimental results that illustrate validation and the first draft of the catalogue produced by this task group.

In this document reference is made to both an "end user" and an "implementer". We recognize that while an end user is responsible for editing the instances representing their information, an implementer is responsible for opening UBL out-of-the-box and configuring it for use by the end user. Of course these two roles can be played by the same person, but a distinction is made here to separate the responsibilities of getting UBL to work as an end user desires, and the end user actually using it.

2.  Scope

The scope of this work is in two areas (primarily in the first of these):

  • the validation of authored values used in code-list-based data type content of a UBL instance

  • the provision for supplemental information of interest to downstream processing applications for purposes such as the presentation and acceptance of the information to and from a human with appropriate translations

3.  Validation of authored content regarding code lists

The code list task group identified and detailed four validation perspectives, termed "code list definitions", for the values found in instance content of the type of a given code list, summarized as:

  • "standard" constraints supplied by UBL and expected to be used by users

  • "placebo" constraints delivered "out-of-the-box" by UBL but only providing rudimentary validation of the structure of the contents and not the specific value of the contents

  • "stock" constraints delivered by UBL and available to the user to override "placebo" constraints

  • "private" constraints created by the user and used to override either "standard" or "placebo" constraints

The code list task group has created a catalogue of all code lists in all UBL document models, including among a host of other pieces of information the following fields:

  • a classification of each code list as requiring either a standard or a placebo set of constraints to be developed by LCSC and supplied in the final package as the out-of-the-box delivered collection of valid values

  • a classification of each placebo code list as needing a stock set of constraints to be developed by LCSC and supplied in the final package as an alternative collection of valid values

  • an informative namespace prefix for the data type of the code list, used to ensure consistent documentation

    • namespace prefixes are never normative

    • this value is a key field in the catalogue indexing the same values found in the model spreadsheets

The "in-use" definition for a code list is that definition utilized during the validation process. Only values for standard definitions of code lists are validated for their content when UBL is run out-of-the-box. All other code lists are validated using the placebo definition merely as having a tokenized value, and this value is not checked against any further constraints.

A number of code lists have been identified as suitable for a stock definition of meaningful values that an end user may wish to use but is not obliged to use when UBL is run out-of-the-box. Alternatively, an end user may wish to create a private-use code list definition, in a separate file from the UBL definitions, containing a collection of values different from any delivered collection. An implementer can then configure the UBL validation environment to reflect the end user's requirements.

One thought expressed in the task group discussions was that LCSC could consider supplying a stock definition available to be used for each and every placebo definition.

The implementer is obliged to go through a mechanical task of engaging either a stock or private-use code list definition, and after any such engagement can revert to the out-of-the-box configuration by engaging the original standard or placebo code list definition.

3.1.  Mechanical approach to validation

It is assumed that an implementer of UBL will take UBL out-of-the-box and prepare it for the end user. Should no specific preparations be made, standard and placebo definitions are in play for the validation task for the instances created by the end user. It is a conscious configuration task by the implementer to change the behaviour of UBL by engaging alternative code list definitions for validation.

An entirely satisfactory end user/implementer mechanism for this code list definition engagement was not determined. For the time being the only viable configuration mechanism uses a brute-force file copy command, though this does have the benefit of not needing the user to go into any particular delivered file and do any editing of the file. The code list task group did not consider this an onerous task for the end user or for the implementer.

A highly modular approach is, therefore, proposed for the mechanical implementation of the data typing of the code list:

  • the delivered set of UBL code list validation constraints works out-of-the-box

    • the contents of those code lists with standard definitions are validated for specific values

    • the contents of all other code lists are validated only as being tokenized values through the placebo definitions

  • the delivered Core Component Types schema fragment is unchanged and provides a base code data type from which all specific code data types are derived

    • this base data type is named cct:CodeType

    • all code list data types have the local name DerivedCodeType

    • each code list is distinguished by a normative namespace URI calculated from the dictionary entry name for the code list

    • a conventional namespace prefix has been proposed for every namespace in UBL, including all of the namespaces used for the code lists

    • each code list data type is referenced by the common local name qualified with the conventional prefix of its distinguishing namespace, as in status:DerivedCodeType for the document status code

  • all code lists are delivered referencing the relative URL of the in-use definition file

    • delivered as either the standard definition or the placebo definition as applicable to each list

  • implementer reconfiguration of a UBL code list is accomplished by the user replacing the in-use definition file with the desired definition file:

    • any utility can be used for copying the desired files over top of delivered files

    • none of the delivered files need to be edited by the user, only copied and then only if desired

  • copies of the original delivered set of standard and placebo definitions still exist so that users can restore their configuration to the UBL out-of-the-box configuration

Note that this approach will greatly increase the number of files in the deliverable, but it won't change the complexity of the validation task from the user's perspective.
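The copy-based engagement described above can be sketched in a platform-neutral way. The following Python fragment is only an illustration under stated assumptions: the file names follow the pattern seen in the test transcript of this report, the engage() and demo() helpers are hypothetical, and the scratch files stand in for real definition files.

```python
import shutil
import tempfile
from pathlib import Path

# File names follow the pattern seen in the test transcript; treat them as
# placeholders for whatever names the final package uses.
IN_USE = "UBL-CodeList-StatusCode-Use-proposal2.xsd"
DEFINITIONS = {
    "placebo": "UBL-CodeList-StatusCode-Placebo-proposal2.xsd",
    "stock": "UBL-CodeList-StatusCode-Stock-proposal2.xsd",
    "private": "UBL-CodeList-StatusCode-Private-proposal2.xsd",
}


def engage(directory, kind):
    """Copy the chosen definition over the in-use file and return its path."""
    source = directory / DEFINITIONS[kind]
    target = directory / IN_USE
    shutil.copyfile(source, target)  # the source definition is left untouched
    return target


def demo():
    """Engage the stock definition, then revert to placebo, in a scratch directory."""
    seen = []
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        # Stand-in files: one marker line each instead of real schema content.
        for kind, name in DEFINITIONS.items():
            (directory / name).write_text(f"<!-- {kind} definition -->\n")
        for kind in ("stock", "placebo"):
            seen.append(engage(directory, kind).read_text().strip())
    return seen


if __name__ == "__main__":
    print(demo())
```

Because the delivered definitions are only ever read, reverting to the out-of-the-box configuration is the same operation as engaging any other definition.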

The following in ASCII art is the importation hierarchy of the schema fragments for a typical document type exemplified by the Order Cancellation document model:

    Order-Cancellation.xsd
        |               |
        |               v
        |       UBL-Reusable.xsd
        |          |       |||
        |          |       vvv
        |          |  Each-Code-List.xsd
        |          |   Each-Code-List.xsd
        |          |    Each-Code-List.xsd
        |          |         |||
        v          v         vvv
        Core-Component-Types.xsd

Experimentation attempted to find an alternative configuration method through the use of the XML Catalog http://www.oasis-open.org/committees/entity/spec-2001-08-06.html facility, but the tools available at the time did not support this facility well enough for proper experimentation. It is thought that this facility could usefully be reconsidered for user configuration of code lists in the future.

3.2.  Example illustration of the approach to validation

A suite of test files has been created to illustrate the mechanics of validation for a single code list data type. Should this approach be approved, these steps will need to be repeated for every code list. Although this test suite was written to run in an MS-DOS command line environment, the concepts are not specific to MS-DOS and can be replicated in any other operating system environment. A future implementation guide should avoid any Windows-centric bias in documenting this procedure and could offer the same behaviours using an Ant task or Bash script.

This illustration utilizes two validating W3C Schema processors: MSV http://wwws.sun.com/software/xml/developers/multischema/ and the DOMPrint utility in Xerces-C http://xml.apache.org/xerces-c/ (not included in the ZIP). The UBL document being validated is a short example of Order Cancellation.

Consider the example of status codes. This is probably not a good candidate for a placebo definition, but the smallest sample instance uses it and a small example is preferable here. A document has a particular status. As delivered in this example, UBL users can use any status they wish as long as it is a token value, because the delivered in-use definition is just a copy of the placebo definition and so checks only that the value is a W3C Schema token value. UBL delivers a stock set of the two values "Original" and "Copy" as valid status code members. Most users would probably find this sufficient, so most would copy the stock set over top of the in-use set. One user, however, needs a private set of the three values "Original", "Copy" and "Fraudulent", and so must create a private set of constraints and copy it over top of the in-use file.
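The effect of the three sets of constraints in this example can be illustrated with a minimal Python sketch. It is not a W3C Schema validator: it assumes, as described above, that the placebo check amounts to treating the value as a token (whitespace collapsed, no further constraint), and the collapse() and is_valid() helpers are hypothetical names introduced only for this illustration.

```python
import re

# Value sets taken from the status code example in this report.
STOCK = {"Original", "Copy"}
PRIVATE = {"Original", "Copy", "Fraudulent"}


def collapse(value):
    """Apply W3C Schema whitespace collapsing, as is done for token values."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


def is_valid(value, allowed=None):
    """Placebo (allowed=None) accepts any token; otherwise check membership."""
    token = collapse(value)
    return True if allowed is None else token in allowed


if __name__ == "__main__":
    for name, allowed in (("placebo", None), ("stock", STOCK), ("private", PRIVATE)):
        print(name, [is_valid(v, allowed) for v in ("Original", "Fraudulent")])
```

Under these assumptions only the stock set rejects "Fraudulent", which is the behaviour the private set of constraints is created to change.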

The following files (listed alphabetically) are in this illustrative suite:

  • adjunct.bat

    • reconfigures the in-use XSD file of code list constraints by copying the version of the private set of constraints that includes the adjunct URI

    • produces the output-good-private-adjunct.xml file

  • code-catalogue-proposal-20030911.sxc

    • the catalogue of all code lists and their associated information values

  • code-proposal-20030911.xml

    • the DocBook XML source to the task group report HTML file

  • CoreComponentParameters-codelist-strawman-2.xsd

    • unchanged from version 0.81D7 (except for experimental URI strings for namespaces)

  • CoreComponentTypes-codelist-strawman-2.xsd

    • adds a new data type for the <cat:StatusCode> element (and experimental namespace URI)

    • derived from an externally-defined data type in a namespace used solely for that data type

  • output-good-private-adjunct.xml

    • the output of running the test-good-private.xml instance through a W3C Schema processor that exposes the defaulted attributes in the PSVI

  • test-good-private.xml

    • a test instance file utilizing a status code value "Fraudulent" only described in the private set of constraints

  • test-good-stock.xml

    • a test instance file utilizing only a status code value "Original" described in the stock set of constraints

  • test.bat

    • reconfigures the in-use XSD file of code list constraints by copying the desired set of constraints

    • tests the resulting in-use XSD against both of the test instance files

  • UBL-Codelist-Catalog-Private.xml

    • an untested catalogue-based approach to the configuration mechanism to employ the private set of constraints

  • UBL-Codelist-Catalog-Stock.xml

    • an untested catalogue-based approach to the configuration mechanism to employ the stock set of constraints

  • UBL-CodeList-StatusCode-Placebo-strawman-2.xsd

    • a lax set of constraints checking only that the coded value is a token

    • delivered by UBL and the delivered contents of the in-use XSD file

  • UBL-CodeList-StatusCode-Private-Adjunct-strawman-2.xsd

    • a constrained set of values for the status code including "Fraudulent"

    • as if it had been created by an implementer of UBL

    • includes an optional attribute of type URI providing for a value pointing to a code list adjunct file, the contents of which are arbitrarily defined by the code list creator and documented for those writing stylesheets and processing applications

    • as with many other optional attributes in UBL, the implementation is responsible for populating these with values to be accessed by processing applications

  • UBL-CodeList-StatusCode-Private-strawman-2.xsd

    • a constrained set of values for the status code including "Fraudulent"

    • as if it had been created by a user of UBL

  • UBL-CodeList-StatusCode-Stock-strawman-2.xsd

    • a constrained set of values expected by the UBL committee to be used by the UBL user for functioning interoperability

  • UBL-CodeList-StatusCode-Use-strawman-2.xsd

    • the set of constraints used by reference by being imported by the Core Component Types schema fragment

    • this file is overwritten in the course of testing different end user configurations of UBL

  • UBL-OrderCancellation-codelist-strawman-2.xsd

    • the order cancellation document model unmodified

  • UBL-Reusable-codelist-strawman-2.xsd

    • the declaration of the status code element as being the status code data type

  • external/dummy.xsd

    • a dummy code list placebo definition not meant for production use but used to illustrate the production of stock definitions from external definitions

    • in production this file will not be used and each stock definition generation step will utilize the respective placebo definition

  • external/extcode2ubl.xsl

    • an XSLT stylesheet to read an external code list definition and a placebo definition to synthesize a stock definition comprised of all of the enumerations found in the external definition

  • external/makestock.bat

    • the dummy execution for illustrative purposes of extcode2ubl for six UNeDocs code list external definitions

Corresponding to the approach detailed above, note the following regarding various files in this deliverable:

  • the core component types are unchanged from Draft 8 and define the base data type for code lists as CodeType (CoreComponentTypes-codelist-proposal.xsd line 93)

  • the reusable fragment declares <StatusCode> to be of type status:DerivedCodeType (UBL-Reusable-codelist-proposal.xsd line 7494)

  • the reusable fragment imports the definition of status:DerivedCodeType from the in-use file of constraints using the <xsd:import> (UBL-Reusable-codelist-proposal.xsd line 57)

  • the placebo file of constraints derives the status:DerivedCodeType only from the cct:CodeType (UBL-CodeList-StatusCode-Placebo-proposal.xsd line 17)

  • the stock file of constraints declares a set of values UBL believes to be sufficient (UBL-CodeList-StatusCode-Stock-proposal.xsd lines 18-19)

  • the private file of constraints declares that set needed by a particular user and with values not anticipated by UBL (UBL-CodeList-StatusCode-Private-proposal.xsd lines 18-20)

  • the private adjunct file of constraints is the same as the private file of constraints, with the addition of the optional listAdjunctURI attribute of type URI (UBL-CodeList-StatusCode-Private-Adjunct-proposal.xsd line 21)

Running the test.bat file produces the following results using, in turn, each of MSV and Xerces C++ for validation, illustrating the same conclusions made by each of these W3C Schema validators and exposing the PSVI seen by an application:

T:\ubl>rem Step 1 - Review of the input end-user instances 

T:\ubl>rem The delivered placebo definition validates against all token values 

T:\ubl>type test-good-stock.xml 
<?xml version="1.0"?>
<xo:OrderCancellation 
  xmlns:xo="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81" 
  xmlns:cat="urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.81" 
  xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81" 
  xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1.0:0.81" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81
                      UBL-OrderCancellation-codelist-proposal2.xsd">
  <cat:ID>X03-20031234-0</cat:ID>
  <cat:StatusCode>Original</cat:StatusCode>
  <cat:IssueDate>2003-02-03</cat:IssueDate>
  <cat:Note>this cancellation is illustrative since the order has been changed too</cat:Note>
  <xo:CancellationReasonCode>0</xo:CancellationReasonCode>
  <cat:OrderReference>
    <cat:BuyersDocumentID>20031234-0</cat:BuyersDocumentID>
    <cat:IssueDate>2003-02-02</cat:IssueDate>
    <cat:StatusCode>Original</cat:StatusCode>
  </cat:OrderReference>
  <cat:BuyerParty>
    <cat:SellerAssignedAccountID>DA-VH007-G</cat:SellerAssignedAccountID>
    <cat:Party>
      <cat:PartyName>
        <cat:Name>Bills Microdevices</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:BuyerParty>
  <cat:SellerParty>
    <cat:Party>
      <cat:PartyName>
        <cat:Name>Joes Office Supply</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:SellerParty>
</xo:OrderCancellation>

T:\ubl>type test-good-private.xml 
<?xml version="1.0"?>
<xo:OrderCancellation 
  xmlns:xo="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81" 
  xmlns:cat="urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.81" 
  xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81" 
  xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1.0:0.81" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81
                      UBL-OrderCancellation-codelist-proposal2.xsd">
  <cat:ID>X03-20031234-0</cat:ID>
  <cat:StatusCode>Original</cat:StatusCode>
  <cat:IssueDate>2003-02-03</cat:IssueDate>
  <cat:Note>this cancellation is illustrative since the order has been changed too</cat:Note>
  <xo:CancellationReasonCode>0</xo:CancellationReasonCode>
  <cat:OrderReference>
    <cat:BuyersDocumentID>20031234-0</cat:BuyersDocumentID>
    <cat:IssueDate>2003-02-02</cat:IssueDate>
    <cat:StatusCode>Fraudulent</cat:StatusCode>
  </cat:OrderReference>
  <cat:BuyerParty>
    <cat:SellerAssignedAccountID>DA-VH007-G</cat:SellerAssignedAccountID>
    <cat:Party>
      <cat:PartyName>
        <cat:Name>Bills Microdevices</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:BuyerParty>
  <cat:SellerParty>
    <cat:Party>
      <cat:PartyName>
        <cat:Name>Joes Office Supply</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:SellerParty>
</xo:OrderCancellation>

T:\ubl>rem Step 2a - Implementer configuring UBL for the placebo definition 

T:\ubl>copy UBL-CodeList-StatusCode-Placebo-proposal2.xsd UBL-CodeList-StatusCode-Use-proposal2.xsd 
        1 file(s) copied.

T:\ubl>type UBL-CodeList-StatusCode-Use-proposal2.xsd 
<?xml version="1.0"?>
<!--
  This is the placebo code list definition that validates against all tokens.
-->
<xsd:schema 
        targetNamespace="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:status="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
elementFormDefault="qualified"
attributeFormDefault="qualified">
  <xsd:import namespace="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
    schemaLocation="CoreComponentTypes-codelist-proposal2.xsd"/>
  <xsd:element name="StatusCode" type="status:DerivedCodeType"/>
  <xsd:complexType name="DerivedCodeType">
    <xsd:simpleContent>
      <xsd:restriction base="cct:CodeType">
      </xsd:restriction>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
T:\ubl>rem Step 2b - Validating the two end-user instances against the model 

T:\ubl>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-proposal2.xsd test-good-stock.xml 
start parsing a grammar.
validating test-good-stock.xml
the document is valid.

T:\ubl>domprint -n -s -f -wddc=off test-good-stock.xml  1>nul 2>domprint.err 

T:\ubl>if 0 NEQ 0 type domprint.err 

T:\ubl>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-proposal2.xsd test-good-private.xml 
start parsing a grammar.
validating test-good-private.xml
the document is valid.

T:\ubl>domprint -n -s -f -wddc=off test-good-private.xml  1>nul 2>domprint.err 

T:\ubl>if 0 NEQ 0 type domprint.err 

T:\ubl>rem Step 3a - Implementer configuring UBL for the stock definition 

T:\ubl>copy UBL-CodeList-StatusCode-Stock-proposal2.xsd UBL-CodeList-StatusCode-Use-proposal2.xsd 
        1 file(s) copied.

T:\ubl>type UBL-CodeList-StatusCode-Use-proposal2.xsd 
<?xml version="1.0"?>
<!--
  This is the placebo code list definition that validates against all tokens.
-->
<xsd:schema 
        targetNamespace="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:status="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
elementFormDefault="qualified"
attributeFormDefault="qualified">
  <xsd:import namespace="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
    schemaLocation="CoreComponentTypes-codelist-proposal2.xsd"/>
  <xsd:element name="StatusCode" type="status:DerivedCodeType"/>
  <xsd:complexType name="DerivedCodeType">
    <xsd:simpleContent>
      <xsd:restriction base="cct:CodeType">
        <xsd:enumeration value="Original"/>
        <xsd:enumeration value="Copy"/>
      </xsd:restriction>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
T:\ubl>rem Step 3b - Validating the two end-user instances against the model 

T:\ubl>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-proposal2.xsd test-good-stock.xml 
start parsing a grammar.
validating test-good-stock.xml
the document is valid.

T:\ubl>domprint -n -s -f -wddc=off test-good-stock.xml  1>nul 2>domprint.err 

T:\ubl>if 0 NEQ 0 type domprint.err 

T:\ubl>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-proposal2.xsd test-good-private.xml 
start parsing a grammar.
validating test-good-private.xml
Error at line:18, column:48 of file:///T:/ubl/test-good-private.xml
  the value is not a member of the enumeration: ("Original"/"Copy")

the document is NOT valid.

T:\ubl>domprint -n -s -f -wddc=off test-good-private.xml  1>nul 2>domprint.err 

T:\ubl>if 4 NEQ 0 type domprint.err 
Error at file "T:\ubl/test-good-private.xml", line 18, column 48
   Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value 'Fraudulent' is not in enumeration .

T:\ubl>rem Step 4a - Implementer configuring UBL for the private definition 

T:\ubl>copy UBL-CodeList-StatusCode-Private-proposal2.xsd UBL-CodeList-StatusCode-Use-proposal2.xsd 
        1 file(s) copied.

T:\ubl>type UBL-CodeList-StatusCode-Use-proposal2.xsd 
<?xml version="1.0"?>
<!--
  This is the placebo code list definition that validates against all tokens.
-->
<xsd:schema 
        targetNamespace="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:status="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
elementFormDefault="qualified"
attributeFormDefault="qualified">
  <xsd:import namespace="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
    schemaLocation="CoreComponentTypes-codelist-proposal2.xsd"/>
  <xsd:element name="StatusCode" type="status:DerivedCodeType"/>
  <xsd:complexType name="DerivedCodeType">
    <xsd:simpleContent>
      <xsd:restriction base="cct:CodeType">
        <xsd:enumeration value="Original"/>
        <xsd:enumeration value="Copy"/>
        <xsd:enumeration value="Fraudulent"/>
      </xsd:restriction>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
T:\ubl>rem Step 4b - Validating the two end-user instances against the model 

T:\ubl>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-proposal2.xsd test-good-stock.xml 
start parsing a grammar.
validating test-good-stock.xml
the document is valid.

T:\ubl>domprint -n -s -f -wddc=off test-good-stock.xml  1>nul 2>domprint.err 

T:\ubl>if 0 NEQ 0 type domprint.err 

T:\ubl>java -jar p:\xml\xml\sun-msv\msvcurr\msv.jar UBL-OrderCancellation-codelist-proposal2.xsd test-good-private.xml 
start parsing a grammar.
validating test-good-private.xml
the document is valid.

T:\ubl>domprint -n -s -f -wddc=off test-good-private.xml  1>nul 2>domprint.err 

T:\ubl>if 0 NEQ 0 type domprint.err 

T:\ubl>rem Done! 

Note above how the private instance (test-good-private.xml) fails validation against the "stock" set of constraints under both W3C Schema processors, while it validates against both the "placebo" and the "private" sets.
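The outcome of steps 3b and 4b can be approximated without a schema processor, purely to show what the engaged definition contributes. The Python sketch below reads the enumerated values out of a definition and reports the status codes it would reject. It is an illustration only, not a substitute for MSV or Xerces: the embedded schema and instance are abbreviated from the transcript, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
CAT = "urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.81"
CCT = "urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
XO = "urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81"

# Abbreviated copy of the stock definition shown in step 3a.
STOCK_XSD = f"""<xsd:schema xmlns:xsd="{XSD}" xmlns:cct="{CCT}">
  <xsd:complexType name="DerivedCodeType">
    <xsd:simpleContent>
      <xsd:restriction base="cct:CodeType">
        <xsd:enumeration value="Original"/>
        <xsd:enumeration value="Copy"/>
      </xsd:restriction>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>"""

# Abbreviated copy of test-good-private.xml: only the status codes are kept.
INSTANCE = f"""<xo:OrderCancellation xmlns:xo="{XO}" xmlns:cat="{CAT}">
  <cat:StatusCode>Original</cat:StatusCode>
  <cat:OrderReference>
    <cat:StatusCode>Fraudulent</cat:StatusCode>
  </cat:OrderReference>
</xo:OrderCancellation>"""


def enumerated_values(xsd_text):
    """Return the enumeration values in a definition (empty set for placebo)."""
    root = ET.fromstring(xsd_text)
    return {e.get("value") for e in root.iter(f"{{{XSD}}}enumeration")}


def rejected_status_codes(instance_text, allowed):
    """List StatusCode values that the engaged definition would reject."""
    root = ET.fromstring(instance_text)
    # Surrounding whitespace is trimmed before comparison.
    values = [(e.text or "").strip() for e in root.iter(f"{{{CAT}}}StatusCode")]
    # An empty set of enumerations (the placebo case) rejects nothing.
    return [v for v in values if allowed and v not in allowed]


if __name__ == "__main__":
    print(rejected_status_codes(INSTANCE, enumerated_values(STOCK_XSD)))
```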

4.  The supply of supplemental information in a code list adjunct file

Some downstream processes may require supplemental information associated with each of the members described in the code list data types, for example the display string "États-Unis" for the French-language display of the country code "US".

The code list task group initially considered the definition and inclusion of defaulted attributes in the W3C Schema expression that would expose any hardwired values found in the expression to downstream processes using the W3C Schema Post-Schema-Validation Infoset (PSVI). This approach was abandoned in light of other validation technologies available to XML users that do not supplement the information set with any information maintained in the document model.

This proposal suggests providing room in a validated attribute for the file name of supplemental information, where the information in the file is indexed by some means to the individual code list member values. The maintainer of the code list has free rein to describe the supplemental information in any vocabulary desired, though it is incumbent on them to sufficiently describe the vocabulary so that people writing stylesheets or other processes know how to de-reference the strings or other values they might need.

As with other optional attributes in UBL data types, the W3C Schema declarations provide for their presence, but the defaulted attribute facility of the W3C Schema Post-Schema-Validation Infoset is not used. It is the responsibility of the implementation of UBL, not the validation schemas of UBL, to take an end-user document and populate the attributes with the configuration information. An end-user document so populated could then still be validated by the UBL schemas because of the provision for the presence of these attributes.

This report suggests that the listAdjunctURI attribute be made available for implementations to record the URI of an associated adjunct file of information related to a code list.
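How an implementation might populate this attribute can be sketched as follows. This Python fragment is an illustration under assumptions, not a prescribed mechanism: the adjunct file name is hypothetical, the sample fragment is abbreviated, and the attribute is placed in the code list namespace on the assumption that a definition declaring attributeFormDefault="qualified" applies to its declaration.

```python
import xml.etree.ElementTree as ET

CAT = "urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.81"
STATUS = "urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
# Assumption: the attribute is namespace-qualified, as a schema with
# attributeFormDefault="qualified" would require.
ADJUNCT_ATTR = f"{{{STATUS}}}listAdjunctURI"
# Hypothetical adjunct location; the report does not prescribe one.
ADJUNCT_URI = "statuscode-adjunct.xml"

# Abbreviated fragment of an end-user instance.
SAMPLE = f"""<cat:OrderReference xmlns:cat="{CAT}">
  <cat:StatusCode>Fraudulent</cat:StatusCode>
</cat:OrderReference>"""


def add_adjunct_uri(instance_text, uri=ADJUNCT_URI):
    """Return the instance with the adjunct URI set on every StatusCode."""
    root = ET.fromstring(instance_text)
    for element in root.iter(f"{{{CAT}}}StatusCode"):
        element.set(ADJUNCT_ATTR, uri)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_adjunct_uri(SAMPLE))
```

The augmented instance carries the configuration itself, so downstream applications do not depend on any particular validation technology to discover the adjunct file.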

Xerces C++ is an XML processor that can build DOM trees from the PSVI instead of the information set. The PSVI does include defaulted attributes described by W3C Schema expressions.

Running the adjunct.bat file produces the following results using Xerces C++ for validation, exposing in the resulting DOM tree the attributes defaulted by the schema:

T:\ubl>if exist test-good-private-adjunct.xml del test-good-private-adjunct.xml 

T:\ubl>copy UBL-CodeList-StatusCode-Private-Adjunct-proposal2.xsd UBL-CodeList-StatusCode-Use-proposal2.xsd 
        1 file(s) copied.

T:\ubl>type UBL-CodeList-StatusCode-Use-proposal2.xsd 
<?xml version="1.0"?>
<!--
  This is the placebo code list definition that validates against all tokens.
-->
<xsd:schema 
        targetNamespace="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
        xmlns:status="urn:oasis:names:tc:ubl:codelist:statuscode:proposal2"
elementFormDefault="qualified"
attributeFormDefault="qualified">
  <xsd:import namespace="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81"
    schemaLocation="CoreComponentTypes-codelist-proposal2.xsd"/>
  <xsd:element name="StatusCode" type="status:DerivedCodeType"/>
  <xsd:complexType name="DerivedCodeType">
    <xsd:simpleContent>
      <xsd:restriction base="cct:CodeType">
        <xsd:enumeration value="Original"/>
        <xsd:enumeration value="Copy"/>
        <xsd:enumeration value="Fraudulent"/>
        <xsd:attribute name="listAdjunctURI" type="xsd:anyURI" use="optional"/>
      </xsd:restriction>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
T:\ubl>type test-good-private.xml 
<?xml version="1.0"?>
<xo:OrderCancellation 
  xmlns:xo="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81" 
  xmlns:cat="urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.81" 
  xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81" 
  xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1.0:0.81" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81
                      UBL-OrderCancellation-codelist-proposal2.xsd">
  <cat:ID>X03-20031234-0</cat:ID>
  <cat:StatusCode>Original</cat:StatusCode>
  <cat:IssueDate>2003-02-03</cat:IssueDate>
  <cat:Note>this cancellation is illustrative since the order has been changed too</cat:Note>
  <xo:CancellationReasonCode>0</xo:CancellationReasonCode>
  <cat:OrderReference>
    <cat:BuyersDocumentID>20031234-0</cat:BuyersDocumentID>
    <cat:IssueDate>2003-02-02</cat:IssueDate>
    <cat:StatusCode>Fraudulent</cat:StatusCode>
  </cat:OrderReference>
  <cat:BuyerParty>
    <cat:SellerAssignedAccountID>DA-VH007-G</cat:SellerAssignedAccountID>
    <cat:Party>
      <cat:PartyName>
        <cat:Name>Bills Microdevices</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:BuyerParty>
  <cat:SellerParty>
    <cat:Party>
      <cat:PartyName>
        <cat:Name>Joes Office Supply</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:SellerParty>
</xo:OrderCancellation>

T:\ubl>domprint -v=always -n -s -f -wddc=off test-good-private.xml  1>output-good-private-adjunct.xml 2>domprint.err 

T:\ubl>if 0 NEQ 0 type domprint.err 

T:\ubl>type output-good-private-adjunct.xml 
<?xml version="1.0" encoding="UTF-8" standalone="no" ?><xo:OrderCancellation xmlns:xo="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81" xmlns:cat="urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.81" xmlns:cct="urn:oasis:names:tc:ubl:CoreComponentTypes:1.0:0.81" xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1.0:0.81" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:tc:ubl:OrderCancellation:1.0:0.81                       UBL-OrderCancellation-codelist-proposal2.xsd">
  <cat:ID language="en">X03-20031234-0</cat:ID>
  <cat:StatusCode language="en">Original</cat:StatusCode>
  <cat:IssueDate language="en">2003-02-03</cat:IssueDate>
  <cat:Note language="en" languageID="en">this cancellation is illustrative since the order has been changed too</cat:Note>
  <xo:CancellationReasonCode language="en">0</xo:CancellationReasonCode>
  <cat:OrderReference>
    <cat:BuyersDocumentID language="en">20031234-0</cat:BuyersDocumentID>
    <cat:IssueDate language="en">2003-02-02</cat:IssueDate>
    <cat:StatusCode language="en">Fraudulent</cat:StatusCode>
  </cat:OrderReference>
  <cat:BuyerParty>
    <cat:SellerAssignedAccountID language="en">DA-VH007-G</cat:SellerAssignedAccountID>
    <cat:Party>
      <cat:PartyName>
        <cat:Name language="en" languageID="en">Bills Microdevices</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:BuyerParty>
  <cat:SellerParty>
    <cat:Party>
      <cat:PartyName>
        <cat:Name language="en" languageID="en">Joes Office Supply</cat:Name>
      </cat:PartyName>
    </cat:Party>
  </cat:SellerParty>
</xo:OrderCancellation>
T:\ubl>rem Done! 

Note in the above how a processor working from the PSVI output of a W3C Schema processor "sees" the defaulted language= attribute added to the content of the original XML instance. This default attribute is declared in the commonAttributes attribute group of the core components schema fragment. The original XML instance shows <cat:StatusCode>Original</cat:StatusCode> while the PSVI XML instance shows <cat:StatusCode language="en">Original</cat:StatusCode> for the processed content.
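
As a rough illustration of the difference between the two views, the following Python sketch compares the StatusCode element as authored with the same element as it appears in the PSVI output, and reports which attributes were supplied by schema defaulting. The function name defaulted_attributes is an assumption made for this illustration only and is not part of the experimental package; the two element fragments are copied from the instances shown above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the instances shown above.
CAT = "urn:oasis:names:tc:ubl:CommonAggregateTypes:1.0:0.81"

# The element as authored: no language= attribute is present.
authored = ET.fromstring(
    f'<cat:StatusCode xmlns:cat="{CAT}">Original</cat:StatusCode>'
)

# The same element as serialized from the PSVI after schema processing.
processed = ET.fromstring(
    f'<cat:StatusCode xmlns:cat="{CAT}" language="en">Original</cat:StatusCode>'
)


def defaulted_attributes(before, after):
    """Hypothetical helper: attributes present after processing but not authored."""
    return {name: value for name, value in after.attrib.items()
            if name not in before.attrib}


added = defaulted_attributes(authored, processed)
print(added)
```

An application reading only the authored instance would find no language= attribute at all, which is why downstream processing that relies on defaulted attributes has to work from the schema-processed result.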

Implementers of UBL can then define their own code list W3C Schema fragments and include the URI of the associated adjunct information during processing; the provision made for the URI in the attribute declarations allows an augmented instance still to be validated. A processing application then has access to the URI and, through it, to the adjunct file contents as documented by the adjunct file creator.

As an aside, it should be noted that the W3C Schema practice of defining a defaulted attribute, as shown in the UBL fragment above, is not portable to other schema expressions such as RELAX NG.

5.  Synthesizing stock code list definitions from external sources

There are two cases where external definitions of code list enumerated members are used to create UBL stock code list definitions:

  • the code list is created by an outside agency or organization

    • e.g. the UNeDocs country, currency and location code lists

  • the code list is too large to be enumerated in the code list catalogue spreadsheet

    • a skeleton declaration of values using W3C Schema syntax, without regard for namespaces or other UBL facilities, can be used in the synthesis of a UBL stock code list definition

In the external/ subdirectory of the experimental prototype package is an XSLT 1.0 stylesheet named extcode2ubl.xsl used to create a UBL stock code list definition. The external definitions are addressed by URL, which XSLT processors will successfully open when on-line. The included makestock.bat file illustrates the synthesis of six UBL code lists from six UNeDocs code lists.

The input to the stylesheet is the placebo code list definition. This provides all of the UBL specific information needed in the stock definition, but does not include any enumerations. The external XSD file includes the desired enumerations for the stock code list definition. The two arguments to the invocation of this stylesheet are processDate= and externalURL=.
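
The merge described above can be pictured with the following minimal Python sketch. It is not the extcode2ubl.xsl stylesheet and does not reproduce its output; the placebo and external schema fragments, the type names and the function name synthesize are all assumptions invented for this illustration, and the processDate= argument is not modelled. The sketch only shows the idea of copying xsd:enumeration members from an external definition into a definition that has none.

```python
import copy
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xsd", XSD)

# Hypothetical placebo: carries the declaration but no enumerations.
PLACEBO = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:simpleType name="CodeContentType">
    <xsd:restriction base="xsd:token"/>
  </xsd:simpleType>
</xsd:schema>"""

# Hypothetical external skeleton: supplies only the enumerated values.
EXTERNAL = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:simpleType name="ExternalCodes">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="CAD"/>
      <xsd:enumeration value="USD"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""


def synthesize(placebo_xml, external_xml):
    """Copy every xsd:enumeration of the external schema into the placebo."""
    placebo = ET.fromstring(placebo_xml)
    external = ET.fromstring(external_xml)
    # The first restriction in the placebo receives the enumerations.
    restriction = placebo.find(f".//{{{XSD}}}restriction")
    for enum in external.iter(f"{{{XSD}}}enumeration"):
        restriction.append(copy.deepcopy(enum))
    return placebo


stock = synthesize(PLACEBO, EXTERNAL)
values = [e.get("value") for e in stock.iter(f"{{{XSD}}}enumeration")]
print(values)
```

Because only xsd:enumeration elements are copied, a sketch of this kind shares the limitation of any enumeration-based approach: values constrained by other facets in the external definition would not be carried over.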

Once LCSC has completed creating the UBL placebo code list definitions, this stylesheet is to be run with the then-current external XSD files (either the skeleton ones in UBL or those supplied by an outside organization).

It should be noted that this tool is not appropriate for code lists defined using W3C Schema mechanisms other than xsd:enumeration (such as the use of regular expressions).

6.  Code list catalogue

The code list catalogue for LCSC is included in the ZIP file. The code list task group has settled on its columns as a useful collection of information about the code lists.

The catalogue will be used to capture relevant information about code lists for documentation and development purposes. It is tied to the model spreadsheets through the first data column, titled "Documentary Namespace Prefix".

The reader is directed to each of the columns and their respective documentation found in pop-up boxes. These boxes are displayed when the pointer hovers near the top-right corner of the column heading cell.

A preliminary set of rows is defined as a contribution from LCSC, and is not considered part of the deliverable from the code list task group.