ubl-ndrsc message

Subject: Re: [ubl-ndrsc] Rule: 115 and 116 Containers
From: Eduardo Gutentag <Eduardo.Gutentag@Sun.COM>
To: "Burcham, Bill" <Bill_Burcham@stercomm.com>
Date: Thu, 17 Jul 2003 16:08:47 -0700
Bill, I think your argument is bogus.

The alternative to

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<SuperfluousContainer>
		<Fruit>Apple</Fruit>
		<Fruit>Orange</Fruit>
		<Fruit>Banana</Fruit>
	</SuperfluousContainer>
</doc>

is not, in real life,

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<Fruit>Apple</Fruit>
	<Fruit>Orange</Fruit>
	<Fruit>Banana</Fruit>
</doc>

but more probably

<?xml version="1.0" encoding="UTF-8"?>
<doc>
	<someelement>foo</someelement>
	<Fruit>Apple</Fruit>
	<anotherone>bar</anotherone>
	<Fruit>Orange</Fruit>
	<alongcontainerlikeaddress>
		<a>
			<b>
				<c>foo</c>
			</b>
		</a>
	</alongcontainerlikeaddress>
	<Fruit>Banana</Fruit>
</doc>
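
To make the difference concrete: in the position()-based stylesheet quoted below, <xsl:apply-templates select="current()/*"/> hands every child of doc to the templates, so position() and last() count all of those children rather than only the Fruit elements. Against a document shaped like the third one above, the first Fruit sits at position 2, so <BeforeFruit/> would never be emitted, and the text of the other elements would leak through the built-in templates. A minimal sketch of one way around that follows. It assumes the goal is still only to bracket the Fruit elements and that the other children can be ignored, which is an assumption made here for illustration and not something stated in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
	<xsl:template match="doc">
		<NewDoc>
			<!-- Select only the Fruit children, so that position() and
			     last() count Fruit elements, not every child of doc.
			     Assumption: the non-Fruit children may be skipped. -->
			<xsl:apply-templates select="Fruit"/>
		</NewDoc>
	</xsl:template>
	<xsl:template match="Fruit">
		<!-- First Fruit in the selected node list -->
		<xsl:if test="position() = 1">
			<BeforeFruit/>
		</xsl:if>
		<AFruit>
			<xsl:value-of select="."/>
		</AFruit>
		<!-- Last Fruit in the selected node list -->
		<xsl:if test="position() = last()">
			<AfterFruit/>
		</xsl:if>
	</xsl:template>
</xsl:transform>
```

Run against the interleaved document, this should put BeforeFruit, the three AFruit elements, and AfterFruit inside NewDoc. The non-Fruit children are dropped, and the Fruit elements come out adjacent even though they were not adjacent in the source. That may or may not be acceptable in a real transform.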

Also, although I have neither the time nor the inclination to check this
(I am on vacation, after all), I believe your first stylesheet is far more
complicated than it needs to be for the container case. I believe it
can be cut in half, but again, I have not checked this; it is just based
on previous experience with stylesheets.
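
For what it is worth, here is an unchecked sketch of where some of the savings might come from, using only standard XSLT 1.0: literal result elements instead of xsl:element, and a bare xsl:apply-templates instead of select="current()/*". The xsl:strip-space line is an addition of this sketch, there so that whitespace-only text nodes in the source are not copied to the output by the built-in templates. How much shorter this really is depends on how one counts lines, so treat it as an illustration rather than a proof.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
	<!-- Added for this sketch: drop whitespace-only text nodes from the source -->
	<xsl:strip-space elements="*"/>
	<xsl:template match="doc">
		<!-- Literal result element in place of xsl:element -->
		<NewDoc><xsl:apply-templates/></NewDoc>
	</xsl:template>
	<xsl:template match="SuperfluousContainer">
		<!-- The container gives one obvious place for the before/after markers -->
		<BeforeFruit/>
		<xsl:apply-templates/>
		<AfterFruit/>
	</xsl:template>
	<xsl:template match="Fruit">
		<AFruit><xsl:value-of select="."/></AFruit>
	</xsl:template>
</xsl:transform>
```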

Burcham, Bill wrote:
> I'm with Chee-Kai -- I think [R 116] is wrong.  (I know it's probably too
> late -- but I'm gonna say my piece anyway :-)
> The two cases I've heard made in favor of it are:
> 
> 1. container elements foster more readable stylesheets
> 2. container elements significantly improve document processing performance
> 
> Argument 1 is weak.  Forgive me for posting working code, but here is an
> instance document with superfluous containers:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <doc>
> 	<SuperfluousContainer>
> 		<Fruit>Apple</Fruit>
> 		<Fruit>Orange</Fruit>
> 		<Fruit>Banana</Fruit>
> 	</SuperfluousContainer>
> </doc>
> 
> And here is a stylesheet to process it:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:transform version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> 	<xsl:output method="xml" version="1.0" encoding="UTF-8"
> indent="yes"/>
> 	<xsl:template match="doc">
> 		<xsl:element name="NewDoc">
> 			<xsl:apply-templates select="current()/*"/>
> 		</xsl:element>
> 	</xsl:template>
> 	<xsl:template match="SuperfluousContainer">
> 		<BeforeFruit/>
> 		<xsl:apply-templates select="current()/*"/>
> 		<AfterFruit/>
> 	</xsl:template>
> 	<xsl:template match="Fruit">
> 		<AFruit>
> 			<xsl:value-of select="text()"/>
> 		</AFruit>
> 	</xsl:template>
> </xsl:transform>
> 
> And here is the output:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <NewDoc>
> 	<BeforeFruit/>
> 	<AFruit>Apple</AFruit>
> 	<AFruit>Orange</AFruit>
> 	<AFruit>Banana</AFruit>
> 	<AfterFruit/>
> </NewDoc>
> 
> The example injects an element before the first fruit and after the last
> one.  That's the example we've been discussing for a couple years as being
> the bugaboo here.
> 
> And here is an analogous source instance doc -- this time with no
> superfluous containers:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <doc>
> 	<Fruit>Apple</Fruit>
> 	<Fruit>Orange</Fruit>
> 	<Fruit>Banana</Fruit>
> </doc>
> 
> And here is a different stylesheet to process this one:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:transform version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> 	<xsl:output method="xml" version="1.0" encoding="UTF-8"
> indent="yes"/>
> 	<xsl:template match="doc">
> 		<xsl:element name="NewDoc">
> 			<xsl:apply-templates select="current()/*"/>
> 		</xsl:element>
> 	</xsl:template>
> 	<xsl:template match="Fruit">
> 		<xsl:if test="position() = 1">
> 		<BeforeFruit/>
> 		</xsl:if>
> 		<AFruit>
> 			<xsl:value-of select="text()"/>
> 		</AFruit>
> 		<xsl:if test="position() = last()">
> 		<AfterFruit/>		
> 		</xsl:if>
> 	</xsl:template>
> </xsl:transform>
> 
> Comparing the two stylesheets I note that the one for superfluous containers
> is 19 lines and the one for repeating elements (with no superfluous
> containers) is 20 lines.  That's only one line of code difference.  And I
> don't think the second stylesheet is any less readable than the first.
> 
> If I look at the two source documents, and extrapolate to larger documents
> with more nesting I can say with certainty that superfluous containers make
> for larger documents and IMHO are a bit harder for humans to read -- due to
> the increase in indentation necessitated by the deeper hierarchy.
> 
> As for point 2 (processing performance), that's just Voodoo Computer
> Science.  So, which XML processing tools are we using for comparison?  Which
> versions of those tools?  What is the use-case/scenario/algorithm?  How big
> is the document?  Worst-case, if you tell me that the document is HUGE then
> I'll tell you a) the Bolivian rug-weaver using Perl as the processing tool
> isn't gonna see the HUGE document and b) the company (Wal*Mart) that sees
> the HUGE document can darn-well write a transform on the incoming document
> (or four or five transforms) that make it more amenable to efficient
> processing.
> 
> But you know what -- I still haven't seen any real _evidence_ that
> superfluous containers provide any processing performance advantage in the
> first place.  It's more likely they hurt performance since they _definitely_
> make documents larger!
> 
> So by my count, it's:
> 
> Superfluous containers:  they make documents bigger (inflicting a processing
> burden) and harder for humans to read
> Repeated elements (no superfluous containers): they make documents smaller
> and easier for humans to read, and necessitate a tiny bit more XSLT code in
> some situations.
> 
> Down with [R 116]!
> 
> 
> Bill Burcham
> Sr. Software Architect, Integration Software Development
> Sterling Commerce, Inc.
> 469.524.2164
> bill_burcham@stercomm.com
> 
> -----Original Message-----
> From: Chin Chee-Kai [mailto:cheekai@softml.net] 
> Sent: Wednesday, July 16, 2003 8:38 PM
> To: UBL-NDR
> Subject: Re: [ubl-ndrsc] Rule: 115 and 116 Containers
> 
> 
> 
>>>[R 115]  All documents shall have a container for metadata  and which 
>>>proceeds the body of the document and is named  "Head" _____________. 
>>>(anything but header)
> 
> 
>>>[R 116]  All elements with a cardinality of 1..n, (and lack a 
>>>qualifying structure) must be contained by a list container named
>>>"(name of repeating element)List", which has a cardinality of 1..1.
> 
> 
> I remain critical of having to maintain such virtual structure for no
> apparent use.  I've heard that the rules don't affect FPSC at all.  By
> design, they should not affect LC.  So who's benefiting from carrying all
> the empty luggages around?
> 
> 
> That said, I pointed out last time that the [R 115] should have "precedes"
> instead of "proceeds", unless the proponent of the rule wants Head sitting
> at the tail.
> 
> 
> 
> Best Regards,
> Chin Chee-Kai
> SoftML
> Tel: +65-6820-2979
> Fax: +65-6743-7875
> Email: cheekai@SoftML.Net
> http://SoftML.Net/
> 
> 
> On Wed, 16 Jul 2003, Lisa-Aeon wrote:
> 
> 
>>>Rules for Voting:  Each email will have only one rule in it, I will 
>>>try to mark the rules that group with it, or rules that might 
>>>duplicate it.  The membership has 5 working days to bring forth 
>>>objection or discussion, after the 5 working days, if there are no 
>>>objections, the rule will be assumed to be "ACCEPTED" and be given to 
>>>the LCSC for their implementation.
>>>
>>>Please Reply leaving first email in Reply.
>>>
>>>Voting period on this rule ends:  July 23, 2003
>>>
>>>*******************************
>>>I am combining the last two rules, because we have already voted on a 
>>>decision.  These are the old rules:
>>>
>>>[R 115]  All documents shall have a container for metadata  and which 
>>>proceeds the body of the document and is named  "Head" _____________. 
>>>(anything but header)
>>>
>>>[R 116]  All elements with a cardinality of 1..n, (and lack a 
>>>qualifying structure) must be contained by a list container named
>>>"(name of repeating element)List", which has a cardinality of 1..1.
>>>
>>>These are the new rules agreed upon during the teleconference call on 
>>>9 July.  These are voted as approved, just need polishing up.  To 
>>>remind everybody, here is the motion and it was approved.
>>>
>>>***Motion:(Arofan) We agree in the direction of the rules being 
>>>submitted, a. Endorse the direction as indicated in this proposal.
>>>
>>>b. Authorize Arofan to make the changes that were discussed in this 
>>>meeting.
>>>
>>>Changes:
>>>
>>>Substitute the word "Top" for "Head",
>>>
>>>Make sure we have explicitly covered the 1..n in the wording.
>>>
>>>c. Authorize Mark to make editorial changes.
>>>
>>>d. Submit to list for final approval. (vote by email)
>>>
>>>******
>>>Proposed full set of rules, as discussed:
>>>
>>>----------------------------------------------------------------------
>>>
>>>(1) All non-repeatable BIEs that are direct children of the 
>>>document-level BIE in the model will be child elements of a generated 
>>>"Top" element in the schema. The generated "Top" element will be named 
>>>"[doctype]Top", and its content model will be a sequence. It will 
>>>reference a generated type named "[doctype]TopType". Both the 
>>>generated "Top" element and its type will be declared in the same 
>>>namespace as the document-level element. (Note: This rule implies that 
>>>all documents will have generated "Top" elements, without exception, 
>>>regardless of their other 'body' contents, to cover cases where the 
>>>document will be extended with the Context mechanism, and for general
>>>consistency.)
>>>
>>>(2) All repeatable BIEs in the model will have generated containers. 
>>>The containers will be named "[name_of_repeatable_element]List". These 
>>>containers will be required if the cardinality of their contained 
>>>immediate children requires at least one; if their contained children 
>>>are optional, the container itself will be optional. At least one of 
>>>the repeatable children of the List will always be required, but there 
>>>may be more than one required child if that agrees with the 
>>>cardinality found in the business model.
>>>
>>>All "_____List" elements will reference a "_______ListType", which 
>>>will be declared in the same namespace as the element that represents 
>>>the repeatable BIE in the business model. The content model of this 
>>>type will have a single child element, which will have a maximum 
>>>occurrence that reflects the maximum occurrence in the business model, 
>>>and a minimum occurrence as described in this rule, above.
>>>
>>>(NOTE: This rule applies equally to 'list' containers at the document 
>>>level, and also at lower levels within the document.)
>>>
>>>(3) The document element in the schema will have a content model that 
>>>is a sequence of elements, the first of which will be the "Top" 
>>>element, and the others will be the generated "List" elements, in the 
>>>order in which their contained, repeatable children appeared in the 
>>>model.
>>>
>>>(4) All elements in the generated schema that are direct children of 
>>>the generated "top" elements in all documents should be gathered 
>>>together into a common aggregate type, named "TopType", which will be 
>>>declared in the Common Aggregate Types namespace. This type should be 
>>>declared abstract, and all document headers should be extensions - 
>>>even if only trivial extensions to facilitate re-naming - of this 
>>>abstract type. (Note: This rule allows for polymorphic processing of 
>>>the set of generic header elements across all document types.)
>>>
>>>
> 

-- 
Eduardo Gutentag               |         e-mail: eduardo.gutentag@Sun.COM
Web Technologies and Standards |         Phone:  +1 510 550 4616 x31442
Sun Microsystems Inc.          |
W3C AC Rep / OASIS TAB Chair
Follow-Ups:
- Re: [ubl-ndrsc] Rule: 115 and 116 Containers
  - From: Chin Chee-Kai <cheekai@softml.net>
References:
- RE: [ubl-ndrsc] Rule: 115 and 116 Containers
  - From: "Burcham, Bill" <Bill_Burcham@stercomm.com>