Subject: RE: Datatype Methodology
Jon,

I think we're strongly agreeing here. The notion is one of creating a layering here: UBL defines the base specification at the XML level, and XML tools such as CAM are then overlaid on that to cover people's application needs in a systematic, standard way that's compatible with the ebXML solution stack, and is in particular role and context aware.

The problem today is that people code all these workarounds and back doors internally and have no consistent way of doing real process alignment with their partners. Hopefully using XML-scripting to augment the base UBL offers us a way out here - beyond just schema definitions - and removes the temptation for people to tinker with the underlying UBL itself.

DW

-------- Original Message --------
Subject: Re: Datatype Methodology
From: jon.bosak@sun.com
Date: Tue, May 09, 2006 3:32 pm
To: david@drrw.info
Cc: stephen.green@systml.co.uk, ubl-dev@lists.oasis-open.org

Could we please restrict these discussions to ubl-dev? I'm really tired of seeing two copies of every post.

The W3C XML recommendation goes into agonizing detail about which Unicode characters are allowed and which are not. That's where these concerns are and should be addressed -- at the level of the XML recommendation. An application that does not support the characters specified in the XML recommendation is not a conformant XML application. Full stop.

If an application purporting to be an XML application chokes on particular Unicode characters allowed by XML, then that application is broken; fix it. If users persist in embedding characters in file names that aren't legal in the target system, tell them to stop.

It is not the job of UBL to create mechanisms for specifying which of the 30,000+ Unicode characters are allowed in XML documents and which are not. That job belongs to other specification efforts. Insofar as this is a real problem (and I'm not sure that it is), it applies to all XML documents, not just UBL. If a solution is necessary, it should be developed at a level that applies to XML in general.

Jon

Date: Tue, 09 May 2006 05:50:07 -0700
From: "David RR Webber (XML)" <david@drrw.info>
Cc: ubl-dev@lists.oasis-open.org, ubl@lists.oasis-open.org, CAM OASIS TC <cam@lists.oasis-open.org>

Steve,

Right now the only way I'm aware of controlling this is through the XML prologue and setting UTF-8, etc.

Like Bryan, we have found this problematic in production. File attachments and file names are one area where people can create a filename on one O/S that is then not processable or gives problems - especially when persisting into the backend database (e.g. Oracle) or during file handle opening.

The only way we have addressed this to date is to issue manual guidelines to submitters. Because these characters can cause issues in the processing at various levels, failures can occur prior to or after the CAM step ;-)

It's a good thought though - to add the ability to filter on character codes via an exclusion table mechanism - that would then point up the problem, e.g. "invalid character code found in element <dataitem123>" etc. And then add a predicate such as applyCharacterFilter(/XPath/, filtername).

DW

-------- Original Message --------
Subject: Re: [ubl-dev] SV: [ubl] Re: [ubl-dev] Datatype Methodology RE: [ubl-dev] SBS and Restricted Data Types
From: stephen.green@systml.co.uk
Date: Tue, May 09, 2006 5:55 am
To: ubl-dev@lists.oasis-open.org, ubl@lists.oasis-open.org

Bryan, All,

This raises an interesting point. There is surely an important need to specify in a trading agreement the character set to be used in the documents. I wonder whether even CAM has this :-)

After all, should my application have to be able to support musical notation or hieroglyphics in a product description? Maybe there should be a way to specify a subset of a character set too (especially if it is Unicode we are talking about). I bet many have had problems when a character decodes to two characters in certain systems (e.g. the GBP sign, £): not good for translation to fixed width and/or EDI.

All the best

Steve

Quoting Bryan Rasmussen <BRS@itst.dk>:

> I agree with not setting string length restrictions, I think it would be nice
> to have string length minimums or constraints to require some content in an
> element if the element is required, but it's not a big thing for me.
>
> Another thing though would be restricting characters that are not needed, as
> per the recommendations in http://www.w3.org/TR/unicode-xml/#Suitable
>
> I think what should be restricted is (from document):
>
> U+202A .. U+202E BIDI embedding controls
> (LRE, RLE, LRO, RLO, PDF) Strongly discouraged in [HTML 4.0]
> U+206A .. U+206B Activate/Inhibit Symmetric swapping Deprecated in Unicode
> U+206C .. U+206D Activate/Inhibit Arabic form shaping Deprecated in Unicode
> U+206E .. U+206F Activate/Inhibit National digit shapes Deprecated in Unicode
>
> U+FFF9 .. U+FFFB Interlinear annotation characters Use ruby markup [Ruby]
> U+FEFF Byte order mark / ZWNBSP Use only as byte order mark. Use U+2060 Word
> Joiner instead of using U+FEFF as ZWNBSP
> U+FFFC Object replacement character Use markup
> U+1D173 .. U+1D17A Scoping for Musical Notation Use an appropriate markup
> language
> U+E0000 .. U+E007F Language Tag codepoints
>
> I don't want to restrict the use of line feeds etc. as is recommended in the
> aforementioned document.
>
> Cheers,
> Bryan Rasmussen
>
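
For readers following the thread, here is a minimal sketch of what the exclusion-table idea might look like in practice. It is not CAM's API and not anything the thread's authors wrote: the table name DISCOURAGED, the helper find_excluded, and the function apply_character_filter (named after the predicate floated in the thread) are all hypothetical, and the sketch assumes Python's standard xml.etree.ElementTree with its limited XPath subset. The code point ranges are the ones discussed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical exclusion table: (first code point, last code point, description).
# Musical notation scoping characters run U+1D173..U+1D17A in Unicode.
DISCOURAGED = [
    (0x202A, 0x202E, "BIDI embedding controls"),
    (0x206A, 0x206F, "deprecated formatting characters"),
    (0xFFF9, 0xFFFB, "interlinear annotation characters"),
    (0xFEFF, 0xFEFF, "byte order mark / ZWNBSP"),
    (0xFFFC, 0xFFFC, "object replacement character"),
    (0x1D173, 0x1D17A, "musical notation scoping"),
    (0xE0000, 0xE007F, "language tag code points"),
]


def find_excluded(text, table):
    """Return (offset, 'U+XXXX', description) for each excluded character."""
    hits = []
    for offset, ch in enumerate(text):
        cp = ord(ch)
        for first, last, description in table:
            if first <= cp <= last:
                hits.append((offset, f"U+{cp:04X}", description))
                break
    return hits


def apply_character_filter(root, path, table):
    """Check the text of every element matching path; return error messages."""
    errors = []
    for element in root.findall(path):
        for offset, code, description in find_excluded(element.text or "", table):
            errors.append(
                f"invalid character {code} ({description}) "
                f"found in element <{element.tag}> at offset {offset}"
            )
    return errors


# Illustrative document: the second Note contains U+202E (RLO).
doc = ET.fromstring(
    "<Invoice><Note>ok</Note><Note>bad\u202Etext</Note></Invoice>"
)
for message in apply_character_filter(doc, ".//Note", DISCOURAGED):
    print(message)
```

Note that this only reports characters that XML itself permits but a trading agreement might choose to exclude; it does not replace the well-formedness checks an XML parser already performs, and it checks only element text, not attributes or tail text.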