ubl-dev message

Subject: RE: Datatype Methodology
From: "David RR Webber (XML)" <david@drrw.info>
To: jon.bosak@sun.com
Date: Tue, 09 May 2006 13:46:23 -0700
Jon, 

I think we're strongly agreeing here.   

The notion is to create a layering here: UBL defines the base
specification at the XML level, and overlaid on that, XML tools such
as CAM can cover people's application needs in a systematic, standard
way that's compatible with the ebXML solution stack, and in particular
is role and context aware. 

Problem today is that people code all these workarounds and back doors
internally and have no consistent way of doing real process alignment
with their partners. 

Hopefully using XML-scripting to augment the base UBL offers us a way
out here - beyond just schema definitions - and removes the temptation
for people to tinker with the underlying UBL itself. 
 
DW

 -------- Original Message --------
Subject: Re: Datatype Methodology
From: jon.bosak@sun.com
Date: Tue, May 09, 2006 3:32 pm
To: david@drrw.info
Cc: stephen.green@systml.co.uk, ubl-dev@lists.oasis-open.org

Could we please restrict these discussions to ubl-dev?  I'm really
tired of seeing two copies of every post.

The W3C XML recommendation goes into agonizing detail about which
unicode characters are allowed and which are not.  That's where
these concerns are and should be addressed -- at the level of the
XML recommendation.  An application that does not support the
characters specified in the XML recommendation is not a conformant
XML application.  Full stop.

If an application purporting to be an XML application chokes on
particular unicode characters allowed by XML, then that
application is broken; fix it.
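
For reference, the repertoire in question is the Char production of
XML 1.0. What follows is only a minimal Python sketch of that check;
the names XML10_CHAR_RANGES, is_xml_char and first_illegal_char are
illustrative and are not taken from any specification or tool.

```python
# Code point ranges of the XML 1.0 "Char" production.
XML10_CHAR_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch matches XML 1.0 Char."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in XML10_CHAR_RANGES)


def first_illegal_char(text: str):
    """Return (index, code point) of the first non-XML character, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ch):
            return i, ord(ch)
    return None
```

An application that rejects a character for which is_xml_char returns
True is rejecting something the XML recommendation allows.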

If users persist in embedding characters in file names that aren't
legal in the target system, tell them to stop.

It is not the job of UBL to create mechanisms for specifying which
of the 30,000+ unicode characters are allowed in XML documents and
which are not.  That job belongs to other specification efforts.
Insofar as this is a real problem (and I'm not sure that it is),
it applies to all XML documents, not just UBL.  If a solution is
necessary, it should be developed at a level that applies to XML
in general.

Jon

  Date: Tue, 09 May 2006 05:50:07 -0700
  From: "David RR Webber (XML)" <david@drrw.info>
  Cc: ubl-dev@lists.oasis-open.org, ubl@lists.oasis-open.org,
  CAM OASIS TC <cam@lists.oasis-open.org>

  Steve, 

  Right now the only way I'm aware of controlling this is thru the XML
  prologue and setting UTF-8, etc. 

  Like Bryan - we have found this problematic in production.  File
  attachments and file names are one area where people can create a
  filename on one O/S that is then not processable or gives problems,
  especially when persisting into the backend database (e.g. Oracle) or
  during file handle opening. 

  The only way we have addressed this to date is to issue manual
  guidelines to submitters.  Because these characters can cause issues
  in the processing at various levels - failures can occur prior to or
  after the CAM step ; -) 

  It's a good thought though - to add the ability to filter on character
  codes via an exclusion table mechanism - that would then point up the
  problem - e.g. invalid character code found in element <dataitem123>
  etc.  And then a predicate applyCharacterFilter(/XPath/, filtername). 
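
As a rough illustration of the exclusion-table idea, here is a Python
sketch. It is not CAM syntax and not an existing CAM predicate: the
function apply_character_filter, the table EXCLUSION_TABLES, the filter
name "no-path-chars" and the sample document are all hypothetical, and
the path argument uses ElementTree's limited XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical exclusion tables keyed by filter name; ranges are examples only.
# This one excludes slash, colon and backslash.
EXCLUSION_TABLES = {
    "no-path-chars": [(0x2F, 0x2F), (0x3A, 0x3A), (0x5C, 0x5C)],
}


def apply_character_filter(root, path, filter_name):
    """Report excluded characters in the text of elements matched by path."""
    ranges = EXCLUSION_TABLES[filter_name]
    problems = []
    for elem in root.findall(path):
        for ch in elem.text or "":
            if any(lo <= ord(ch) <= hi for lo, hi in ranges):
                problems.append(
                    f"invalid character code U+{ord(ch):04X} "
                    f"found in element <{elem.tag}>"
                )
    return problems


# Made-up sample input: a file name containing a colon.
doc = ET.fromstring(
    "<Attachment><FileName>report:final.pdf</FileName></Attachment>"
)
for message in apply_character_filter(doc, "./FileName", "no-path-chars"):
    print(message)
```

Returning the problems as a list, rather than raising on the first one,
lets a validation step report every offending element in one pass.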

  DW


   -------- Original Message --------
  Subject: Re: [ubl-dev] SV: [ubl] Re: [ubl-dev] Datatype Methodology RE:
  [ubl-dev] SBS and  Restricted Data Types
  From: stephen.green@systml.co.uk
  Date: Tue, May 09, 2006 5:55 am
  To: ubl-dev@lists.oasis-open.org, ubl@lists.oasis-open.org

  Bryan, All,

  This raises an interesting point. There is surely an important need
  to specify in a trading agreement the character set to be used in
  the documents. I wonder whether even CAM has this :-) After all,
  should my application have to be able to support musical notation or
  hieroglyphics in a product description? Maybe there should be a way
  to specify a subset of a character set too (especially if it is
  Unicode we are talking about). I bet many have had problems when a
  character decodes to two characters in certain systems (e.g. the GBP
  sign): not good for translation to fixed width and/or EDI.
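
The two-character effect can be reproduced in a few lines of Python.
This sketch assumes the sender writes UTF-8 and the receiver wrongly
reads the bytes as ISO-8859-1; the helper fits_fixed_width is a
hypothetical name used only for illustration.

```python
pound = "\u00a3"                    # the GBP sign
utf8_bytes = pound.encode("utf-8")  # one character becomes two bytes

# A system that treats those bytes as ISO-8859-1 sees two characters.
misread = utf8_bytes.decode("iso-8859-1")


def fits_fixed_width(value: str, width: int, encoding: str) -> bool:
    """Return True if value fits a field of width bytes in the encoding."""
    return len(value.encode(encoding)) <= width


print(len(pound), len(utf8_bytes), len(misread))
```

A four-character amount that begins with the GBP sign fits a four-byte
field in ISO-8859-1 but not in UTF-8, which is where fixed-width and
EDI translation gets into trouble.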

  All the best

  Steve

  Quoting Bryan  Rasmussen <BRS@itst.dk>:

  > I agree with not setting string length restrictions, I think it would be
  > nice to have string length minimums or constraints to require some content
  > in an element if the element is required, but it's not a big thing for me.
  >
  > Another thing though would be restricting characters that are not needed,
  > as per the recommendations in http://www.w3.org/TR/unicode-xml/#Suitable
  >
  > I think what should be restricted is (from document):
  >
  > U+202A .. U+202E BIDI embedding controls
  > (LRE, RLE, LRO, RLO, PDF) Strongly discouraged in [HTML 4.0]
  > U+206A .. U+206B Activate/Inhibit Symmetric swapping Deprecated in Unicode
  > U+206C .. U+206D Activate/Inhibit Arabic form shaping Deprecated in Unicode
  > U+206E .. U+206F Activate/Inhibit National digit shapes Deprecated in Unicode
  >
  > U+FFF9 .. U+FFFB Interlinear annotation characters Use ruby markup [Ruby]
  > U+FEFF Byte order mark / ZWNBSP Use only as byte order mark. Use U+2060 Word
  > Joiner instead of using U+FEFF as ZWNBSP
  > U+FFFC Object replacement character Use markup
  > U+1D173 .. U+1D17A Scoping for Musical Notation Use an appropriate markup
  > language
  > U+E0000 .. U+E007F Language Tag codepoints
  >
  > I don't want to restrict the use of line feeds etc. as is recommended in
  > the aforementioned document.
  >
  > Cheers,
  > Bryan Rasmussen
  >
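
The ranges Bryan lists can be expressed as a single pattern. The Python
sketch below is one possible check, not a proposed UBL mechanism. It
takes the musical notation scoping range as U+1D173..U+1D17A, and the
names DISCOURAGED and find_discouraged are illustrative. It does not
special-case a U+FEFF at the very start of the text, which a decoder
would normally have consumed as the byte order mark.

```python
import re

# Character class built from the ranges in the list above.
DISCOURAGED = re.compile(
    "["
    "\u202a-\u202e"          # BIDI embedding controls
    "\u206a-\u206f"          # deprecated formatting characters
    "\ufff9-\ufffb"          # interlinear annotation characters
    "\ufeff"                 # byte order mark / ZWNBSP
    "\ufffc"                 # object replacement character
    "\U0001d173-\U0001d17a"  # scoping for musical notation
    "\U000e0000-\U000e007f"  # language tag code points
    "]"
)


def find_discouraged(text: str):
    """Return (offset, 'U+XXXX') for each discouraged character in text."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in DISCOURAGED.finditer(text)
    ]
```

Line feeds and other ordinary whitespace are deliberately absent from
the pattern, in line with the wish not to restrict them.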