RE: [ubl] Specialised DataTypes Schema Module


Subject: RE: [ubl] Specialised DataTypes Schema Module

From: "Stig Korsgaard" <STK@Finansraadet.dk>

To: "CRAWFORD, Mark" <MCRAWFORD@lmi.org>

Date: Wed, 17 Mar 2004 13:15:43 +0100

Mark has a very good point here. I strongly support his comments!

Honestly, to me, deviating from decisions we have already agreed seems only to keep postponing the release of the final work!

Best Regards

Stig Korsgaard
M.Sc.E Standardisation Manager
Tel: +45 3370 1083
Cell: +45 2121 8234
Mail: stk@finansraadet.dk

Danish Bankers Association
Amaliegade 7
DK-1256 Copenhagen K
Tel: 3370 1000
Fax: 3393 0260
mail@finansraadet.dk
www.finansraadet.dk

-----Original Message-----
From: CRAWFORD, Mark [mailto:MCRAWFORD@lmi.org]
Sent: 17. marts 2004 12:40
To: Tim McGrath; Stephen Green
Cc: ubl@lists.oasis-open.org
Subject: RE: [ubl] Specialised DataTypes Schema Module

Quite frankly, arguments that "we haven't done it this way before" are wearing very thin. We agreed to the CCT, UDT, SDT and CLUDT schema modules in January in NDR and reaffirmed that at the F2F in a full TC meeting. The approach was chosen to ensure a logical arrangement of the various datatypes: the CCT and UDT schema modules provide CCTS conformance, while the CLUDT and SDT schema modules provide a consistent way to ensure that we have separate and distinct schema modules for all UBL-created datatypes. By separating SDT from UDT, we allow customisers to then create their own SDT schema module.

If we deviate from our approach, we run the risk of making our various DT schemas unacceptable to the larger community. For example - if the CCT schema module does not faithfully represent the CCTS CCTs in their current form, then any party interested in CCTS conformance, such as UN/CEFACT, will not support it. If the UDT schema module has any restrictions above and beyond those in CCTS for supplementary components, then once again any party interested in CCTS conformance will not support it. If we do not maintain an SDT schema module, then we have lost our clear and concise modularity solution. From my perspective, we should adhere to the agreements made at the F2F.

Mark

-----Original Message-----
From: Tim McGrath [mailto:tmcgrath@portcomm.com.au]
Sent: Wednesday, March 17, 2004 5:09 AM
To: Stephen Green
Cc: ubl@lists.oasis-open.org
Subject: Re: [ubl] Specialised DataTypes Schema Module

In response to this, I want to make the point that it has been agreed that we need Specialised Data Types for Code Lists. The question is whether we need to define them in two places - once in a combined schema and once in individual schemas for each code list. As has been expressed on several occasions, we definitely want to have separate schemas for each code list. So, the real question is, do we need the combined SDT schema?

Secondly, this issue is not a matter of changing existing practice. We have never used the SDT approach before, so whatever we do is new and therefore untested. Your conclusion recommends that we *not* remove the SDT Schema - I would say you are actually putting the case to add it. It is not there in 1.0-Beta (except as a placeholder that was not referenced) and it is not required by the code list representation mechanism (or the CCTS - to get that argument out of the way). Given our commitment to have separate schemas for each code list module, this combined SDT schema is an additional and new module, not an existing artifact we wish to retain.

To support its value, i had hoped you would....
a. provide an example of what the combined SDT schema should contain to demonstrate its value - clearly the one we generate now is not suitable.
and/or
b. use a complete example - by not removing DerivedCodeType you make it hard to contrast, not easier.

However I think that even so, you would agree there are no architectural reasons why we need a combined SDT schema module and separate Schema Modules for each code list.

The entire argument for using a combined SDT schema seems to be its possible future value for customisation. This is based on the idea that the combined SDT acts as an index to the code list schemas. I think the argument goes... if we change a code list schema we need only change the index to update the document schemas. I am not sure you make the case that this is essential for introducing substitution groups. In fact, the CLSC paper's examples are based on not having the combined SDT module - so clearly we can use the substitution group method of extension without it.

The real point should be whether having a combined SDT schema makes introducing different code lists easier.

Please note that we already have an index for mapping logical codes to code lists. This is our specialised data type spreadsheet (and the corresponding EDIFIX model).

The way this works currently is...
a. We establish a UBL data type (a qualified CCTS data type) for the BBIE. Using your example, we have Currency_ Code. Type for our Invoice. Transaction Currency. Code.
b. We also define a logical mapping from this to a specific, physical code list using our Specialised Data Type model. So Currency_ Code. Type becomes bound to the ISO 4217 Alpha version 0.3 code list. This binding is by code list, by namespace and by namespace prefix. For example, the UBL 1.0 Currency_ Code. Type has a codelistID of "ISO 4217 Alpha", a namespace prefix of "cur:" and a CodeListSchemeURI of "urn:oasis:names:tc:ubl:codelist:CurrencyCode:1:0".
c. The schema generator then uses these to assemble the structures.
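
As a rough illustration of the binding in (b), a code list schema module for the currency codes might look something like the sketch below. The target namespace is the CodeListSchemeURI quoted above; the type names, the codeListID attribute, the use of xsd:token and the three enumerated values are assumptions made for illustration and are not taken from the generated UBL schemas.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a cut-down code list schema module for currency codes. -->
<xsd:schema targetNamespace="urn:oasis:names:tc:ubl:codelist:CurrencyCode:1:0"
            xmlns="urn:oasis:names:tc:ubl:codelist:CurrencyCode:1:0"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
    <!-- The permitted code values (hypothetical three-value subset of ISO 4217 Alpha). -->
    <xsd:simpleType name="CurrencyCodeContentType">
        <xsd:restriction base="xsd:token">
            <xsd:enumeration value="AUD"/>
            <xsd:enumeration value="DKK"/>
            <xsd:enumeration value="USD"/>
        </xsd:restriction>
    </xsd:simpleType>
    <!-- The type other schemas would reference as cur:CurrencyCodeType. -->
    <xsd:complexType name="CurrencyCodeType">
        <xsd:simpleContent>
            <xsd:extension base="CurrencyCodeContentType">
                <!-- Assumed attribute carrying the code list identifier. -->
                <xsd:attribute name="codeListID" type="xsd:token" fixed="ISO 4217 Alpha"/>
            </xsd:extension>
        </xsd:simpleContent>
    </xsd:complexType>
</xsd:schema>
```

A different set of values (ISO 4217 Numeric, say) would be a different module with a different target namespace, which is the point made in the next paragraph.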

It is worth reminding ourselves that there is now, and will be forever more, only one UBL 1.0 specialised data type model and only one UBL 1.0 Currency_ Code. Type defining one set of UBL 1.0 Currency Code values (the ones stated in ISO 4217 Alpha 0.3). Any changes or customisation mean a different set of code values and this means a different code list and therefore a different namespace. So, if someone wanted to use ISO 4217 Numeric Codes, they can use Currency_ Code. Type but they must have their own namespace for this and their own specialised data type model to map it. Alternatively, they can define a Numeric Currency_ Code. Type and keep the two options logically and physically separate - which seems more sensible. Either way, both methods must use their own namespaces.

And exactly the same options would apply to someone trying to use their own set of code values.

So what difference does introducing a combined SDT schema make?

Well, without a combined SDT schema, if someone wants to hand craft schemas for their own code lists, then they would have to change the document schemas (to replace the namespace - but not the prefix). Some people may consider this a feature. Using a set of code values other than the ones published in 1.0 makes this a non-compliant UBL 1.0 document schema. There is no assurance of interoperability. So having an edited/different document schema (and corresponding changes to its namespace) makes it clear that it is a customized implementation.
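
To make the no-SDT case concrete, here is a sketch of a document schema that binds a code element straight to the code list schema. The "urn:example:..." target namespace, the element name and the schema file name are hypothetical; only the code list namespace and the "cur" prefix come from the discussion above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a document schema bound directly to a code list schema. -->
<xsd:schema targetNamespace="urn:example:invoice-sketch"
            xmlns="urn:example:invoice-sketch"
            xmlns:cur="urn:oasis:names:tc:ubl:codelist:CurrencyCode:1:0"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
    <!-- A customiser would edit this namespace, and the xmlns:cur value above. -->
    <xsd:import namespace="urn:oasis:names:tc:ubl:codelist:CurrencyCode:1:0"
                schemaLocation="CodeList-CurrencyCode-sketch.xsd"/>
    <!-- The prefix on the type stays the same whichever code list is bound. -->
    <xsd:element name="TransactionCurrencyCode" type="cur:CurrencyCodeType"/>
</xsd:schema>
```

Swapping in another code list means touching the xmlns:cur declaration and the import in this file, which is exactly what makes the customisation visible.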

However, if you want to hide this customisation and reduce the amount of editing required, then a combined SDT schema acting as an index between the logical names and physical namespaces would be the way to do it. With this method, if we want to make Currency_ Code. Type the ISO 4217 Numeric version we can modify the combined SDT schema to indicate the new namespace/location. The Invoice schema still thinks it is using CurrencyCodeType, but it picks up a different Code List schema.

To do this effectively, the combined SDT schema should only describe this logical to physical mapping and leave all other metadata to the code list schema itself. Otherwise we will get them out of synch. As with most indices, they should just be pointers and have no supplementary information themselves. So I would expect to see the combined SDT schema as something like....

<xsd:schema targetNamespace="urn:oasis:names:tc:ubl:SpecialisedDatatypes:1:0-draft-8.3"
    xmlns:cur="urn:oasis:names:tc:ubl:codelist:CurrencyCode:1:0-draft-8.3"
    xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1:0-draft-8.3"
    xmlns="urn:oasis:names:tc:ubl:SpecialisedDatatypes:1:0-draft-8.3"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    version="1:0-draft-8.3">
    <xsd:import namespace="urn:oasis:names:tc:ubl:CoreComponentParameters:1:0-draft-8.3" schemaLocation="UBL-CoreComponentParameters-1.0-draft-8.3.xsd"/>
    <xsd:import namespace="urn:oasis:names:tc:ubl:codelist:CurrencyCode:1:0-draft-8.3" schemaLocation="../codelist/use/UBL-CodeList-CurrencyCode-1.0-draft-8.3.xsd"/>
    <xsd:complexType name="CurrencyCodeType">
        <xsd:simpleContent>
            <xsd:extension base="cur:CurrencyCodeType"/>
        </xsd:simpleContent>
    </xsd:complexType>
</xsd:schema>
(obviously with entries for the other code lists as well.)

If I have got this right, it should say "what the document schema calls sdt:CurrencyCodeType is actually cur:CurrencyCodeType". Whilst this example seems trite, we could have different names for the mapping. If someone wanted to adopt ISO 4217 Numeric codes we could change this to....

<xsd:schema targetNamespace="urn:oasis:names:tc:ubl:SpecialisedDatatypes:1:0-draft-8.3"
    xmlns:cur="urn:oasis:names:tc:myown:codelist:CurrencyCode:1:0"
    xmlns:ccts="urn:oasis:names:tc:ubl:CoreComponentParameters:1:0-draft-8.3"
    xmlns="urn:oasis:names:tc:ubl:SpecialisedDatatypes:1:0-draft-8.3"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    version="1:0-draft-8.3">
    <xsd:import namespace="urn:oasis:names:tc:ubl:CoreComponentParameters:1:0-draft-8.3" schemaLocation="UBL-CoreComponentParameters-1.0-draft-8.3.xsd"/>
    <xsd:import namespace="urn:oasis:names:tc:myown:codelist:CurrencyCode:1:0" schemaLocation="../codelist/use/Customised-NumericCodeList-CurrencyCode-1.0.xsd"/>
    <xsd:complexType name="CurrencyCodeType">
        <xsd:simpleContent>
            <xsd:extension base="cur:PossiblyDifferentNameForCurrencyCodeType"/>
        </xsd:simpleContent>
    </xsd:complexType>
</xsd:schema>
(where we change namespace/location and the name of the base type).

This has some obvious architectural elegance, but is it actually solving a real problem?

Firstly, because we have changed the combined SDT schema definitions, it now needs a new version. This means changes in each affected document schema. Our Invoice schema now has to import a different combined SDT schema.
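
A sketch of what that change would touch in the document schema, assuming the customised combined SDT schema is given a namespace of its own. The "urn:example:..." names, the file name and the element name are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a document schema that reaches its code types via a combined SDT schema. -->
<xsd:schema targetNamespace="urn:example:invoice-sketch"
            xmlns="urn:example:invoice-sketch"
            xmlns:sdt="urn:example:customised:SpecialisedDatatypes"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
    <!-- A new version of the combined SDT schema means editing this import and xmlns:sdt. -->
    <xsd:import namespace="urn:example:customised:SpecialisedDatatypes"
                schemaLocation="Customised-SpecialisedDatatypes-sketch.xsd"/>
    <!-- The logical type name is untouched; only what it resolves to has changed. -->
    <xsd:element name="TransactionCurrencyCode" type="sdt:CurrencyCodeType"/>
</xsd:schema>
```

The element declaration survives unchanged, but the file as a whole does not.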

So what have we gained? Instead of changing the namespace for the Code List schema we change the namespace for the combined SDT schema (as well as changing the combined SDT schema itself).

It appears that, with or without the combined SDT schema, we end up changing the Invoice document schema whenever we change the code lists applying to any of its codes.

But, as I noted above, this is a good thing. Because after these changes, it is no longer the UBL 1.0 Invoice schema. Any instances will need to use different values to be validated.

Another side issue with this idea of using a combined SDT schema is what happens to implementations that want to use their own code lists (the "placebo" ones). I presume we would not want them to add to the UBL combined SDT schema. So do they create their own combined SDT schema? Then we get sets of these and so on, and so on...

I keep coming back to the idea of making this simple - a code has a qualified code data type that maps onto a schema that looks like the CLSC schema (less the substitution group - for now). We collectively refer to all these code list schemas as 'specialised data types' and everyone is happy :-)

This discussion reminds me of my grandfather leaving his wood offcuts - just in case he might need them later. My shed is full of old bits of wood - maybe I can give some to Stephen.

Stephen Green wrote:
Specialised DataTypes Schema Module

The next Co-ordination meeting will be preceded by a meeting to discuss the content of the Specialised DataTypes Schema Module. In particular Tim has suggested that, since it does not seem to contain anything not found already in other Schema modules, it may be that we can do without it.

In preparation for this discussion I have built a set of Schemas, as we have in draft 8.3 but without the SDT Schema. The only document schema included in this is the invoice schema. An invoice instance was produced too.

The changes necessary were as follows:

1. The namespaces for the codelist schema modules had to be added to both the document schema modules (just the invoice in this example) and to the Common Aggregate Components Schema Module, along with the schema locations.

2. References to Codes in these, where the code has a codelist Schema Module in UBL (but, importantly, *not* where it doesn't), have to be changed from 'type="sdt:CurrencyCodeType"' to, say, 'type="cur:DerivedCodeType"'. (I did not attempt to amend the use of the name 'DerivedCodeType' since I wished to compare the results as closely as possible with draft-8.3.)

The sample invoice (a maximal elements and attributes content sample, generated with XML Spy) was valid both against the original schemas and against the new ones. Although ideally the namespaces should change (sdt removed and cur added), the invoice is in fact valid (using XML Spy - XSD spec and other parsers ??) without the namespace change, since the namespaces of the codes' types are effectively hidden in the instances.

This then seems to support the case for successful removal of the SDT Schema module.
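
One way to picture why the instance is unaffected: an instance names elements, not the types behind them, so unless it uses xsi:type the namespace of a code's type never appears in it. The fragment below is a sketch only; the namespace and element names are hypothetical and far smaller than a real UBL invoice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a minimal instance with a single code element. -->
<Invoice xmlns="urn:example:invoice-sketch">
    <!-- The element is in the document namespace; neither sdt: nor cur: appears here. -->
    <TransactionCurrencyCode>DKK</TransactionCurrencyCode>
</Invoice>
```

Whether the schema declares this element with an sdt: type or a cur: type, the instance text is the same.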

However, a major concern would be:

1. What happens if UBL or other groups wish to add new codelist schema modules where, at present, either UDT is used for the code's type or the code is new to UBL altogether? Such a change would appear not to break backwards compatibility with the SDT Schema Module in place, as at present (or with the substitutionGroup design), but would this still be so with the SDT removed? Such a change would be encouraged if substitutionGroups were introduced for 1.1, say. Would this removal of the SDT prevent the later use of substitutionGroups in terms of the need to preserve backwards compatibility?

2. Does backwards compatibility only apply to instances? Does it not in some ways apply to schemas, even in cases where instances can be unaffected? Is the removal of the SDT Schema Module going to adversely affect backwards compatibility when a new codelist needs to be added, or when one which was based on UDT is changed to having a new Codelist Schema Module as the base of its type? After all, to implement the facilities offered by the substitutionGroup / abstract element Schema architecture one might have to create a codelist Schema module where previously there was only the UDT.

In answer:

Adding a codelist schema module that didn't exist before, or requiring that a new namespace be introduced to the Document Schema Modules and the Common Aggregate Schema Modules, does not necessarily mean that these namespaces have to be changed in the instances. Though one might wish that it did, that might have negative ramifications for backwards compatibility.

Adding a codelist means adding a new namespace and a new prefix to the SDT at present but not necessarily elsewhere.

Without the SDT, the namespace prefix has to be added to the type on which a Code element is based. So the namespace and prefix have to be added to the CAC and, where appropriate, the Document Schema Modules.

They do not have to be added to the instance (to my knowledge), but they could be. I do not think that adding them would necessarily cause instance problems, though I wouldn't be very surprised if it did in some situations such as with XPaths and XSLT Stylesheets as well as some applications. I'd really want to check it with the experts - and do we have time to do so?

Even if there were no instance problems that we could foresee, there is still the need to go updating namespaces and prefixes in Schema Modules which appeared to be immune before when adding or, in some cases, changing Codelist Modules.

Conclusions

I would prefer, in the light of Jon's recent statement "...taking care to construct 1.0 in a way that will allow the adoption of substitution groups in 1.1 without breaking 1.0 instances", that we *not* remove the SDT Schema Module at this stage without further expert assurance that it will not cause foreseeable problems with 1.1.

I think it may be worth getting extra advice regarding the effect of changing a codelist schema namespace on an invoice with regards to backwards compatibility too. There is no adverse effect in XML Spy and StylusStudio but how about other parsers and XSLT stylesheets? Have we any comeback about this from LMI or Ken? The question is - changing a namespace in a Schema which is not directly referenced by an instance - does it ever cause problems for such instances in a way that some would view as meaning that such changes break backwards compatibility?

One way round this, if the SDT were removed (and it might not hurt even if it weren't), might be to create schema modules of all our codelists so that we don't get problems adding these later. This doesn't seem ideal though (we did it for beta but it meant a large set of schemas and greater complexity and maintenance). I know I sought to assure that this wouldn't be necessary when considering adding substitutionGroups, etc for 1.1 but without the SDT I wouldn't be so sure.

Stephen Green
-- 
regards
tim mcgrath
phone: +618 93352228  
postal: po box 1289   fremantle    western australia 6160