5. Recommendations

The WG intends these recommendations to be used to inform further work by CommerceNet and other organizations on concrete e-commerce business libraries.

5.1. Schemas

The WG recommends that an e-commerce business library expressed in XML take advantage of XML schemas for two reasons: validation and extensibility.

5.1.1. Validation

All XML documents must be well formed. A well-formed document is one that follows the syntax rules of XML, which ensures that an XML parser can reliably separate content from markup. In addition to well-formedness, XML provides two basic modes of validation:

DTD Processing—structural validation with minimal data validation
Schema Processing—structural validation and data validation
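
Well-formedness alone can be checked by any generic parser, but it says nothing about whether the data makes sense. The minimal sketch below uses Python's standard xml.etree.ElementTree, which checks XML syntax only and performs no DTD or schema validation; the PurchaseOrder fragments are hypothetical examples.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML.

    ElementTree enforces XML syntax rules (matching tags, quoting, a single
    root element) but does not validate against a DTD or a schema.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical purchase-order fragments.
good = "<PurchaseOrder><Quantity>12</Quantity></PurchaseOrder>"
bad = "<PurchaseOrder><Quantity>12</PurchaseOrder>"  # Quantity is never closed
nonsense = "<PurchaseOrder><Quantity>twelve</Quantity></PurchaseOrder>"

print(is_well_formed(good))      # True
print(is_well_formed(bad))       # False
print(is_well_formed(nonsense))  # True: well formed, even though the data is wrong
```

The third case is the point: a quantity of "twelve" passes the syntax check, so catching it requires data validation of the kind only schema processing offers.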

The requirements for processing e-commerce documents point us toward the second category. Traditional EDI-based e-commerce systems perform thorough structure and data validation, and this experience points out the need for similar levels of validation in XML processing. Full structural and data validation is possible only with schema-based processing; well-formedness and DTD validation are insufficient.

The difficulty in an EDI system is that structural and data validation both rely on the creation of application-specific parsers. In XML, we have generic tools for performing structural and weak data validation through the use of XML DTDs. Consequently, implementation is simplified. However, the level of data validation provided by XML parsers working with XML DTDs is not sufficient. If we look at EDI-based applications, we find that much of the data comes from standard code lists, or is strongly typed numeric data. Further, text-based fields may be restricted by the field lengths of the databases used to process the EDI messages.

XML schemas give us the ability to perform full validation of both structure and data types. Because schemas provide a syntax for describing field lengths, degrees of precision, and enumerations of token values (such as codes), we keep the benefits of full validation while gaining the other benefits of XML as a syntax. Schemas are also an improvement on EDI in another respect: because schema languages are themselves expressed in an XML syntax that requires only DTD-level validation, the tools for parsing them are largely generic, which speeds implementation.
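
To make the three kinds of data constraint concrete, here is a minimal hand-rolled sketch in Python. It is not a schema processor; the constraint names are loosely modeled on W3C XML Schema facets (maxLength, totalDigits, fractionDigits, enumeration), digits are counted on the lexical form as a simplification, and the field definitions and limits are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Plain decimal lexical form: optional sign, integer digits, optional fraction.
DECIMAL_RE = re.compile(r"[+-]?(\d+)(?:\.(\d+))?")

@dataclass(frozen=True)
class Facets:
    """Hypothetical data constraints on one field's text content."""
    max_length: Optional[int] = None
    total_digits: Optional[int] = None
    fraction_digits: Optional[int] = None
    enumeration: Optional[FrozenSet[str]] = None

def check(value: str, facets: Facets) -> List[str]:
    """Return the constraint violations for `value` (empty list if valid)."""
    errors: List[str] = []
    if facets.max_length is not None and len(value) > facets.max_length:
        errors.append(f"length {len(value)} exceeds {facets.max_length}")
    if facets.enumeration is not None and value not in facets.enumeration:
        errors.append(f"{value!r} is not an allowed code")
    if facets.total_digits is not None or facets.fraction_digits is not None:
        m = DECIMAL_RE.fullmatch(value)
        if m is None:
            errors.append(f"{value!r} is not a decimal number")
        else:
            int_part, frac_part = m.group(1), m.group(2) or ""
            if (facets.total_digits is not None
                    and len(int_part) + len(frac_part) > facets.total_digits):
                errors.append(f"more than {facets.total_digits} digits")
            if (facets.fraction_digits is not None
                    and len(frac_part) > facets.fraction_digits):
                errors.append(f"more than {facets.fraction_digits} decimal places")
    return errors

# Hypothetical fields: a 35-character name, a price with two decimal
# places, and a currency code drawn from a fixed list.
NAME = Facets(max_length=35)
PRICE = Facets(total_digits=10, fraction_digits=2)
CURRENCY = Facets(enumeration=frozenset({"USD", "EUR", "JPY"}))

print(check("19.99", PRICE))       # []
print(check("19.999", PRICE))      # ['more than 2 decimal places']
print(check("DOLLARS", CURRENCY))  # ["'DOLLARS' is not an allowed code"]
print(check("A" * 40, NAME))       # ['length 40 exceeds 35']
```

A DTD could only declare each of these fields as #PCDATA, so every value above would be accepted.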

However, XML DTDs will remain useful for describing e-business documents for purposes other than validation. For editing and low-level document structure validation, as performed by XML editing software, DTDs will be a needed expression of document structure. If a DTD is created and maintained that corresponds as closely as possible to the schema, then that DTD can be used both for editing and for instance document validation. (The only issue here would be one of correctly assigning a PUBLIC or SYSTEM identifier at the level of the processing software for a given task.) Note, however, that DTD-based validation typically will not be as stringent as schema-based validation.

The WG recommends that both schema and DTD expressions of an e-business document library be created and maintained in tandem.

5.1.2. Extensibility

The requirements of e-commerce are such that many basic document types are generally useful, but minor structural variations are extremely useful for specific tasks or particular markets. If a truly common XML structure is to be established for e-commerce, it will need to be easily modifiable, while keeping down the cost of implementing these variations on standard data structures.

In EDI there has been a gradual increase in the number of different elements, to accommodate market-specific variations. Several efforts within the EDI community are focused on eliminating this problem, which points out the fact that variations are a requirement, and one that is not easy to meet. A related EDI phenomenon is the overloading of the meaning and use of existing elements, creating a tangible bar to interoperation without low-level coordination between trading partners. The end result is a high cost in implementation.

XML DTDs require that a data structure be described fully before implementation, in terms of its elements, attributes, and their structural relationships and content models. Without these fundamental structural rules in place, building an e-commerce application becomes difficult or impossible. For documents of a given document type to be interoperable across different e-commerce applications, they must conform to a single DTD, with only minimal variation in their structures. In practice, the high degree of cross-application coordination required to handle structural variation reduces the usefulness of this built-in document-specific capability of XML processing with DTDs.

Schema-based XML processing offers us a way to enhance the ability of applications to interoperate, because it accommodates the required variations in basic data structures without either overloading the meaning and use of existing data elements or requiring the wholesale addition of data elements specific to a particular industry or process. It does so by allowing implementors to specify new element types that inherit the properties of existing elements. Schemas also let implementors specify exactly the structural and data content of the additions they make to existing data structures. In this way, schemas allow us to limit variations and minimize the amount of additional implementation effort required in building an application.

This benefit derives from the nature of most variations required in e-commerce documents: many data structures are very similar to “standard” data structures, but have some significant semantic difference in a particular industry or process. Because schemas give us a mechanism for indicating the semantic “predecessors” of a particular variation, generic processing of standard types provides us with a basis for implementing just the refinements needed to handle the specific semantic variation. (An example of this would be the addition of a field to an address block, to describe some industry-specific addressing information. The address structure could be taken from a common library, and only the single additional field would require new processing, even though the entire structure was given a different name to distinguish it from the “normal” address structure.)
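
The address example can be sketched in a few lines of Python using the standard ElementTree parser. Everything here is hypothetical: the common-library Address fields, the derived DockAddress element, and its single added DockDoor field. The point is only that the generic reader is written once and reused for the renamed, extended structure.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Child elements of the (hypothetical) common-library Address type.
ADDRESS_FIELDS = ("Street", "City", "PostalCode", "Country")

def read_address(elem: ET.Element) -> Dict[str, Optional[str]]:
    """Generic processing for Address and for any type derived from it."""
    return {name: elem.findtext(name) for name in ADDRESS_FIELDS}

def read_dock_address(elem: ET.Element) -> Dict[str, Optional[str]]:
    """DockAddress = Address + DockDoor; only DockDoor needs new code."""
    record = read_address(elem)
    record["DockDoor"] = elem.findtext("DockDoor")
    return record

doc = ET.fromstring(
    "<DockAddress>"
    "<Street>1 Harbor Rd</Street><City>Oakland</City>"
    "<PostalCode>94607</PostalCode><Country>US</Country>"
    "<DockDoor>D-14</DockDoor>"
    "</DockAddress>"
)
print(read_address(doc)["City"])           # Oakland
print(read_dock_address(doc)["DockDoor"])  # D-14
```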

In those cases where a variation in data structure is required only for some particular process, schemas again allow us to minimize implementation effort. It is possible to add a mechanism that allows a system to process a modified data element exactly as it would process its direct, standard parent, except for the specific interaction that requires the modified structure. By having most processes ignore the variation, except where it is specifically needed, schemas again help us reduce the effort required to build e-commerce applications, and enhance the level of interoperability.
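
One way to realize the mechanism described above is a handler lookup that falls back from a derived type to its parent. This Python sketch assumes a hypothetical derivation table and hypothetical process names; only the delivery process registers a handler for the derived type, so every other process treats it exactly as its standard parent.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical derivation table: element type -> its direct parent type.
DERIVED_FROM: Dict[str, Optional[str]] = {
    "Address": None,
    "DockAddress": "Address",
}

# Handlers keyed by (process, element type). Only the delivery process
# knows about DockAddress; other processes handle Address alone.
HANDLERS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("print_label", "Address"): lambda t: f"label for {t} (as Address)",
    ("schedule_delivery", "Address"): lambda t: f"curbside delivery for {t}",
    ("schedule_delivery", "DockAddress"): lambda t: f"dock delivery for {t}",
}

def dispatch(process: str, elem_type: str) -> str:
    """Use the most specific handler, walking up the derivation chain."""
    current: Optional[str] = elem_type
    while current is not None:
        handler = HANDLERS.get((process, current))
        if handler is not None:
            return handler(elem_type)
        current = DERIVED_FROM.get(current)
    raise LookupError(f"no handler for {process} on {elem_type}")

print(dispatch("print_label", "DockAddress"))        # label for DockAddress (as Address)
print(dispatch("schedule_delivery", "DockAddress"))  # dock delivery for DockAddress
```

Adding a new variation then costs one table entry plus handlers for only those processes that must behave differently.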

Note that schema syntax can express structural extensions and information about new data types. This ability can help users accommodate requirements placed on them by legacy processing systems with nonstandard specifications.

While the problems encountered in EDI applications cannot be avoided entirely, the use of XML schemas helps us identify variations in data structure, and manage them better. Further, it gives us a solid syntax for modifying only those specific aspects of the data structure that require modification.

5.1.3. Implications of Schemas for Business Document Design

If we look at schema capabilities, certain considerations regarding data structure design strike us:

In existing XML schema languages, extensibility is largely limited to element content, and does not readily accommodate the modification of existing attributes on a particular XML element. Consequently, designers should use elements rather than attributes to contain data that may be subject to extension in schemas.
Because data typing is much stronger when using XML schema processing, attention to the actual use of different kinds of data elements is critical in designing a common library. Where a DTD-based system would not produce errors over minor variations in the length of a #PCDATA field, for example, schema-validated XML applications will. The more control our validation gives us over our data, the more careful we need to be; otherwise we will produce a standard data structure that is not useful for some users.
In many respects, as a result of schema extensibility, less is more. If we can identify those places within business document structure that are most liable to be extended, then we should model only the absolute common core. Because schema extension mechanisms are additive, it is better to recognize what is in fact common, rather than taking a (possibly wrong) guess at what might be useful.

5.2. Modularity

The WG recommends that schemas be modular, so as to enable reusability and extensibility for vertical markets.

Consideration was given to the usability of any standard set of e-commerce components. If we look at Simpl-EDI, we have a case where the different types of elements have been formally classified:

Message Type—the type of the containing document/message
Segment—the type of the subsection (frequently nested)
Composite Data Elements—data elements that have both data members and some substructure
Data Elements—data elements without substructure

While Simpl-EDI is organized according to this set of distinctions, XML, because it has a broader application, is not. In XML, an element at any level is potentially a substructure in some other element. In effect, a PurchaseOrder element is not significantly different from an AddressBlock element, even though their uses within a processing application may be very different. The generic processing capabilities of XML tools do not recognize any inherent difference.

In many ways, this capability of XML is advantageous: it allows us to process nested (“looping”) structures easily. However, it fails to provide any useful distinction about the functional roles played by any specific element in a particular XML application. If there is any formal distinction in XML, it is between mixed content elements, which can contain plain text as well as element substructures, and those elements whose only content is element substructures. Even here, the difference is not as clear as in EDI, because an element whose content is only element substructures can still carry attributes, and attribute values always contain character data.
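
The one distinction XML itself draws can be computed from any parsed instance. The Python sketch below classifies what an element actually contains in a given document (not what a DTD declares), using the standard ElementTree API; the element names are hypothetical. Note that the root is element-only, yet it still carries data in its id attribute, which is the blurring described above.

```python
import xml.etree.ElementTree as ET

def content_kind(elem: ET.Element) -> str:
    """Classify an element instance as mixed, element-only, text-only, or empty."""
    has_children = len(elem) > 0
    # Character data can appear before the first child (elem.text) or after
    # any child (child.tail); whitespace-only text is ignored.
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem
    )
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element-only"
    if has_text:
        return "text-only"
    return "empty"

po = ET.fromstring(
    '<PurchaseOrder id="123">'
    "<Note>Ship <Emphasis>today</Emphasis>.</Note>"
    "<Quantity>12</Quantity>"
    "</PurchaseOrder>"
)
print(content_kind(po))                   # element-only
print(content_kind(po.find("Note")))      # mixed
print(content_kind(po.find("Quantity")))  # text-only
print(po.get("id"))                       # 123
```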

However, when it comes to building a standard set of business documents that are easy to understand and use, a conceptual classification of data elements may be helpful. The WG considered that, if such a classification is seen as useful, a four-level breakdown based on the Simpl-EDI model would be the best approach, while recognizing that this may or may not be helpful for a particular user population. Because it is not a strong technical distinction in XML, this conceptualization is left to those documenting a particular set of business documents for an e-commerce application. It is not seen as a necessary part of a standard business document set.

5.3. Naming

The WG recommends that the naming of data elements (in XML terms, elements and attributes) should be done in accordance with ISO/IEC 11179, Part 5.

The UN/CEFACT Committee for Trade, Industry, and Enterprise Development has recommended the use of ISO/IEC 11179 for naming in document TRADE/CEFACT/1999/3, 6 January 1999. The WG takes no position on the choice of separator characters (or the use of capitalization to substitute for a separator character), preferring to leave the issue for consideration by developers of concrete e-business libraries. We take note that ISO/IEC 11179 syntax requires two separator characters.
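
As an illustration only, the Python sketch below assumes one plausible reading of the ISO/IEC 11179-5 convention: a name composed of an object class term, a property term, and a representation term, with one separator between terms and a second between words within a term. Because the WG takes no position on separators, both are parameters; the defaults and example terms are hypothetical. A second function shows capitalization substituting for both separators, as is common in XML element names.

```python
from typing import Sequence

def dictionary_name(object_class: Sequence[str], property_term: Sequence[str],
                    representation: Sequence[str],
                    term_sep: str = ".", word_sep: str = "_") -> str:
    """Join words within each term with word_sep, and the terms with term_sep."""
    terms = (object_class, property_term, representation)
    return term_sep.join(word_sep.join(words) for words in terms)

def xml_tag_name(object_class: Sequence[str], property_term: Sequence[str],
                 representation: Sequence[str]) -> str:
    """Use capitalization in place of both separators (UpperCamelCase)."""
    words = [*object_class, *property_term, *representation]
    return "".join(w[:1].upper() + w[1:] for w in words)

print(dictionary_name(["Purchase", "Order"], ["Issue"], ["Date"]))
# Purchase_Order.Issue.Date
print(xml_tag_name(["purchase", "order"], ["issue"], ["date"]))
# PurchaseOrderIssueDate
```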

The WG recommends that English words be used in naming, and recognizes that the choice between the use of U.S. English and U.N. English requires further investigation, with consideration of the issue of globalization.

5.4. XML Design Principles

The WG recommends certain practices in use of XML:

Express semantics fully in schemas and do not rely merely on well-formedness.
Instances conforming to schemas should be readable and understandable, and should enable reasonably intuitive interactions. Model for the abstractions of the user, not the programmer.
Use markup to make data substructures explicit (that is, distinguish separate data items as separate elements and attributes).
Use well known data types.
Code lists should be cited by external reference. In terms of the eCo architecture, the provision of code lists may be regarded as a “service”.
In the context of a schema, information that expresses correspondences between data elements in different classification schemes (“mappings”) may be regarded as metadata. This information should be accessible in the same manner as the rest of the information in the schema.
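
The code-list principle can be sketched as follows: the schema side records only a reference to each list, and the list contents are obtained from a separate "service". In this Python sketch the URIs, field names, codes, and the in-memory service are all hypothetical placeholders; a real deployment might resolve the reference over the network against the maintaining agency's published list.

```python
from typing import Callable, Dict, FrozenSet

# A code-list service maps a list identifier to the set of valid codes.
CodeListService = Callable[[str], FrozenSet[str]]

def local_service(uri: str) -> FrozenSet[str]:
    """Hypothetical stand-in for a remote code-list service."""
    lists: Dict[str, FrozenSet[str]] = {
        "urn:example:codelist:currency": frozenset({"USD", "EUR", "JPY"}),
        "urn:example:codelist:unit": frozenset({"EA", "KGM", "MTR"}),
    }
    return lists[uri]

# The schema side cites each list by reference; it does not embed the codes.
FIELD_CODE_LISTS: Dict[str, str] = {
    "CurrencyCode": "urn:example:codelist:currency",
    "UnitOfMeasure": "urn:example:codelist:unit",
}

def code_is_valid(field: str, value: str, service: CodeListService) -> bool:
    """Resolve the field's code list through the service, then test membership."""
    return value in service(FIELD_CODE_LISTS[field])

print(code_is_valid("CurrencyCode", "EUR", local_service))  # True
print(code_is_valid("UnitOfMeasure", "LB", local_service))  # False
```

Because the service is passed in as a parameter, a list can be updated by its maintainer without any change to the schema or to the validating code.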