Subject: Re: [ubl] Examples for Proposed Single-pass Extension Validation
From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
To: Universal Business Language <ubl@lists.oasis-open.org>
Date: Sat, 30 Oct 2010 21:52:38 -0400
Thank you, Ken, for your efforts with this task.  But I'm reminded of 
the Kobayashi Maru!

I certainly agree that the approach you put forward will report valid 
documents as valid.  But it will not report invalid documents as 
invalid.  There will be invalid documents that this approach will 
report as valid.

That changes the rules of the test.

At the time the extension point was designed, the committee's 
thoughts about validation were centred around the body of the UBL 
document:  any structural invalidity must be reported and the 
document called invalid.  Thus, the philosophy was carried over to 
the extension point, and the various machinations I've put into PRD1 
address that.

Specifically, there were two issues enunciated within the committee 
at the time that are not covered with your approach:

  (1) - an extension that has a UBL construct as the apex element, including
        any CBC, any CAC and any document ABIE ... we explicitly said we
        didn't want users putting UBL documents under the extension point,
        let alone allowing something like:

                 <ext:UBLExtension>
                   <ext:ExtensionContent>
                     <cbc:UUID>123</cbc:UUID>
                   </ext:ExtensionContent>
                 </ext:UBLExtension>

          ... or:

                 <ext:UBLExtension>
                   <ext:ExtensionContent>
                     <in:Invoice>.....</in:Invoice>
                   </ext:ExtensionContent>
                 </ext:UBLExtension>

      - in retrospect, I realize now that my approach doesn't cover this
        either!  W3C Schema won't let my approach catch the above kind of error

  (2) - an extension that has an incorrect user extension element as the
        apex element;  consider that a user defines this extension:

                 <ext:UBLExtension xmlns:corpx="urn:corpx">
                   <ext:ExtensionContent>
                     <corpx:CorpXItem>
                       <cbc:ID>1</cbc:ID>
                       <cbc:TypeCode>CR</cbc:TypeCode>
                       <corpx:CorpXOtherItem>abc</corpx:CorpXOtherItem>
                     </corpx:CorpXItem>
                   </ext:ExtensionContent>
                 </ext:UBLExtension>

        ... then the proposed scheme would not signal this as invalid:

                 <ext:UBLExtension xmlns:corpx="urn:corpx">
                   <ext:ExtensionContent>
                     <corpx:CorpXOtherItem>abc</corpx:CorpXOtherItem>
                   </ext:ExtensionContent>
                 </ext:UBLExtension>

      - this is being caught by my approach as an invalid document, and this
        was the focus of our efforts at the time:  we only wanted the apex
        element to be allowed as the child of the extension content
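
A minimal sketch of the apex check described in (1) and (2), assuming 
Python's standard xml.etree.ElementTree.  The function names and the 
allow-list are hypothetical and only illustrate the policy; this is 
not how the PRD1 schemas implement it:

```python
import xml.etree.ElementTree as ET

EXT = "urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"

# Hypothetical allow-list: the only apex elements this recipient recognizes.
ALLOWED_APEXES = {("urn:corpx", "CorpXItem")}

def split_qname(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def bad_apexes(xml_text, allowed=ALLOWED_APEXES):
    """Return (namespace, local) for every extension apex not in the allow-list."""
    root = ET.fromstring(xml_text)
    found = []
    for content in root.iter("{%s}ExtensionContent" % EXT):
        for apex in content:  # direct children of ext:ExtensionContent only
            name = split_qname(apex.tag)
            if name not in allowed:
                found.append(name)
    return found
```

Under this sketch the first instance in (2) yields an empty list, 
while the second instance in (2) and both instances in (1) are flagged.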

Of course one could take the viewpoint that if the *wrong* element is 
at the apex of the extension, that simply represents an extension 
that you don't recognize and so you don't care that it is wrong.  As 
it is you don't care about other extensions you don't recognize, so 
extensions in error are simply extensions you don't recognize so why worry?

But that wasn't the attitude at the time we designed the extension 
point.  If the sender of the document incorrectly structured their 
instance (as in (2) above) and it inadvertently validated because of 
this new policy, they would send that incorrect document to the 
recipient thinking it contained important information to be processed 
by the recipient.  The recipient would then accept the document 
because it didn't violate any schema constraints.  The sender 
*thinks* he has sent extension information but the recipient *knows* 
the sender didn't because nothing recognizable was at the apex of the 
extension point.

Committee members know from years of experience I'm really quite anal 
about this validation stuff, but if as a group we accept a revised 
extension point policy in our PRD1 review, then I can certainly live 
with it.  The design I put forward supported the policy at the time 
it was designed.  This new approach was expressly rejected because it 
was "too loose".  If that policy proves overly burdensome, then let's 
change the policy and I agree I would have done it the way you've done it.

In fact, writing this response in detail has helped me to more 
appreciate the generic approach we rejected years ago and you've 
articulated here.  Its simplicity outweighs, I now think, the limited 
benefits of what the 2006 approach brings to the table.

Let's change the test.

Had RELAX NG been used, none of this would need a discussion.  It 
would catch all invalid documents as invalid.  We are wrestling with 
limitations of W3C Schema:  a wildcard offers no namespace exclusion 
other than "##other", and unique particle attribution prevents a 
choice between two of those "namespace exclusion" wildcards.
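
To make the limitation concrete, here is a rough sketch in Python of 
the two matching rules.  It assumes the XSD 1.0 reading of "##other" 
(any namespace except the schema's target namespace, with unqualified 
elements also excluded); the function names are hypothetical, and the 
second rule is the kind of multi-namespace exclusion that W3C Schema 
cannot express, not something any of the schemas contain:

```python
EXT = "urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"

def other_wildcard_admits(ns, target_ns=EXT):
    """XSD 1.0 '##other': qualified, and not the target namespace."""
    return bool(ns) and ns != target_ns

def exclusion_admits(ns, excluded):
    """Hypothetical rule: admit any namespace not in the excluded set."""
    return ns not in excluded

# cbc:UUID slips through "##other" because only the extension
# namespace itself is excluded ...
print(other_wildcard_admits(CBC))               # True
# ... whereas excluding several namespaces at once would reject it.
print(exclusion_admits(CBC, {EXT, CBC, CAC}))   # False
print(exclusion_admits("urn:corpx", {EXT, CBC, CAC}))  # True
```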

So I'm prepared to adopt this new approach for PRD2 if the technical 
committee agrees during the PRD1 review.

Thank you again for your efforts.

. . . . . . . . . . Ken

p.s. regarding your note, I ran your example instance through Xerces, 
Saxon and Altova2010 standalone validation, and all three processors 
passed it.

At 2010-10-29 23:50 -0400, Kenneth Vaughn wrote:
>Per my task from the last teleconference, I have created an example of
>what my proposal would look like for handling extensions in a single
>pass. The easiest way to demonstrate this is to unzip the contents of
>the attached file into your UBL PRD1 directory. You can then modify
>the "include" element in UBL-CommonExtensionComponents-2.1.xsd to
>point to either UBL-ExtensionContentDataType-2.1.klv.xsd or
>UBL-ExtensionContentDataType-2.1.klv-cust.xsd. The files should then be
>good to play with (with one note below). Descriptions for each file are:
>
>UBL-ExtensionContentDataType-2.1.klv.xsd
>This is the proposed STANDARD schema file that would be distributed.
>It defines the ExtensionContentType datatype and references the
>digital signature namespace. (Obviously, the namespace reference would
>be removed, if we moved signatures into the main body). If you
>evaluate Example-Invoice-2.1.klv.xml with the
>CommonExtensionComponents-2.1.xsd including this file, it will pass
>validation, but none of the extensions will be checked as the
>namespaces will not be recognized by the validator; instead, the
>extensions are simply skipped.
>
>
>UBL-ExtensionContentDataType-2.1.klv-cust.xsd
>This is an example customization of the standard file above. An
>implementor would customize the file to
>1) define the customization namespaces that the implementation can
>recognize and
>2) import the customization schemas containing the specific elements
>that can be included as extensions
>That is the only customization needed; and technically, the namespaces
>do not need to be declared since they are not actually referenced in
>the schema; they are only provided for readability. If you evaluate
>Example-Invoice-2.1.klv.xml with the CommonExtensionComponents-2.1.xsd
>including this file, the content of the defined CorpX and RegionY
>extensions will be checked - this can be shown by playing with the
>values in the example XML document as described in the comments
>embedded within the file.
>
>
>CorpX.xsd
>An example of a schema containing a customized extension.
>
>
>RegionY.xsd
>A second example of a schema containing a customized extension; the
>second example demonstrates that a single instance file can reference
>multiple custom extensions and they are all validated in a single pass.
>
>
>Example-Invoice-2.1.klv.xml
>The example XML document to play around with. As is, it should
>validate with either Extension Content Data Type schema; but if you
>change the extension content when using the customized Extension
>Content Data Type schema, errors will be flagged.  The example also
>includes an extension from a third (undefined) namespace to
>demonstrate that this extension will be skipped over while still
>validating the recognized extensions.
>
>
>
>
>===========     NOTE    NOTE    NOTE   ===================
>
>As near as I can tell, it SHOULD work as described above. For some
>reason, XMLSpy would not check the content of the Extensions as
>these files are written... I had to change the namespace for the UBL
>Invoice schema in both the XML instance document and the UBL
>Invoice schema for it to work as described. I spent about 3 hours
>trying to figure out why it was not checking the content only to
>stumble upon this work-around. It works if I simply change the
>namespace (i.e., the full URN) to have a 3 at the end instead of a 2...
>
>I cannot figure out why this would be a problem; my working theory
>is that it is some bug within XMLSpy itself (perhaps it is stuck on the
>old definition of the namespace referring to the old Extension??? But
>I rebooted my machine and still had the problem...). In short, I was
>unable to think of any reason that the namespace would pose a real
>problem and I am suspecting that it is unique to XMLSpy (older
>version) and/or my machine - but if you do not see it checking content
>when it should, please let me know and perhaps we can figure out
>why this anomaly occurred.
>
>Actually, if it does work for you, I'd be interested in knowing as well
>and what software you are using... 3 hours is too much time to spend
>chasing down a fake problem!


--
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/o/
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/o/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
References:
- Examples for Proposed Single-pass Extension Validation
  - From: Kenneth Vaughn <kvaughn@trevilon.com>