RELAX NG DTD Compatibility

Committee Specification 7 September 2001

This version:
Committee Specification: 7 September 2001
Editors:
James Clark <jjc@jclark.com>, MURATA Makoto <mura034@attglobal.net>

Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Abstract

This specification defines datatypes and annotations for use in [RELAX NG] schemas. The purpose of these datatypes and annotations is to support some of the features of XML 1.0 DTDs that are not supported directly by RELAX NG.

Status of this Document

This committee specification was approved for publication by the OASIS RELAX NG technical committee. It is a stable document which the committee believes is now ready for implementation. This specification is for version 0.9 of RELAX NG DTD Compatibility. The committee invites comments on this specification from both users and implementors until 10th October 2001. Comments should be sent to relax-ng-comment@lists.oasis-open.org. At the end of the comment period, the committee plans to consider the comments received, resolve all outstanding issues and release a specification for version 1.0 of RELAX NG DTD Compatibility.

Table of Contents

1 Introduction
2 Example
3 Conformance
4 Attribute default values
5 ID, IDREF and IDREFS
6 Documentation

Appendixes

A RELAX NG schema
References

1. Introduction

RELAX NG [RELAX NG] provides two mechanisms for extensibility:

  • RELAX NG schemas can reference external libraries of datatypes;
  • in a RELAX NG schema, RELAX NG-defined elements can be annotated with child elements and attributes from other namespaces.

The goal of this specification is to facilitate transition from XML 1.0 DTDs to RELAX NG schemas by using these extensibility mechanisms to support some of the features of XML 1.0 DTDs that are not supported by RELAX NG.

RELAX NG itself performs only validation: it does not change the infoset [XML Infoset] of an XML document. Most of the features of XML 1.0 DTDs that are not supported by RELAX NG involve modification of the infoset. In XML 1.0, validation and infoset modification are combined in a monolithic XML processor. It is a goal of this specification to provide a clean separation between validation and infoset modification, so that a wide variety of implementation scenarios is possible. In particular, it should be possible to make the infoset modifications before performing RELAX NG validation, after performing RELAX NG validation, or without performing RELAX NG validation at all. It should also be possible for an implementation of this specification not to modify the infoset at all and instead to provide the application with a description of the modifications implied by the annotations, independently of any particular instance.

This specification does not provide any support for features of XML 1.0 DTDs, such as entity declarations, that cannot be cleanly separated from validation.

In an XML 1.0 document that is valid with respect to a DTD, each element or attribute in the instance has a unique corresponding element or attribute declaration in the DTD. With RELAX NG this is not always the case: it may be ambiguous which element or attribute pattern any particular element or attribute in the instance matches. In addition, it is non-trivial to determine when a RELAX NG schema is ambiguous. A further complication is that, even in cases where the schema is not ambiguous, it may require multiple passes or lookahead to determine which element or attribute pattern a particular element or attribute matches. Detecting this situation is also non-trivial.

Some features of XML 1.0 DTDs, in particular default attribute values and ID/IDREF/IDREFS validation, depend crucially on this unambiguous correspondence between elements or attributes in the instance and their corresponding declarations. In order to support these features in RELAX NG schemas by means of datatypes and annotations, it is therefore necessary to impose restrictions on the use of these datatypes and annotations. The goals in framing these restrictions were as follows:

  1. It must be possible to determine whether a schema satisfies the restrictions independently of any particular instance.
  2. Processing of the instance must not require lookahead or multiple passes.
  3. The modified infoset must be XML 1.0 compatible: it must be an infoset that could have been produced by a validating XML 1.0 parser for some DTD.
  4. Implementation of the restrictions should be straightforward.
  5. The restrictions should not be any more restrictive than necessary.

The annotations defined in this specification have the namespace URI:

http://relaxng.org/ns/compatibility/annotations/0.9

Examples in this specification follow the convention of using the prefix a to refer to this namespace URI.

Annotations with the above namespace URI can be used in conjunction with annotations with other namespace URIs. Annotations with other namespace URIs are allowed wherever [RELAX NG] specifies that they are allowed.

2. Example

The following DTD

<!DOCTYPE employees [
<!-- A list of employees. -->
<!ELEMENT employees (employee*)>
<!-- An individual employee. -->
<!ELEMENT employee (#PCDATA)>
<!ATTLIST employee
  id ID #REQUIRED
  manages IDREFS #IMPLIED
  managedBy IDREF #IMPLIED
  country (US|JP) "US"
>
]>

could be translated to the following RELAX NG schema:

<element name="employees"
    xmlns="http://relaxng.org/ns/structure/0.9"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/0.9"
    datatypeLibrary="http://relaxng.org/ns/compatibility/datatypes/0.9">
  <a:documentation>A list of employees.</a:documentation>
  <zeroOrMore>
    <element name="employee">
      <a:documentation>An individual employee.</a:documentation>
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <optional>
        <attribute name="manages">
          <data type="IDREFS"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="managedBy">
          <data type="IDREF"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="country" a:defaultValue="US">
          <choice>
            <value>US</value>
            <value>JP</value>
          </choice>
        </attribute>
      </optional>
      <text/>
    </element>
  </zeroOrMore>
</element>
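As an illustration of how the annotations in such a schema are addressed by namespace URI rather than by prefix, the following Python sketch walks an abbreviated copy of the schema above and collects each a:defaultValue annotation. The function name collect_defaults is hypothetical, and the sketch assumes that element and attribute names are given by name attributes and that the schema uses no define or ref elements; it is not a general RELAX NG processor.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/0.9"
ANN = "http://relaxng.org/ns/compatibility/annotations/0.9"
DTL = "http://relaxng.org/ns/compatibility/datatypes/0.9"

# Abbreviated version of the example schema (IDREF/IDREFS attributes omitted).
SCHEMA = f"""
<element name="employees" xmlns="{RNG}" xmlns:a="{ANN}" datatypeLibrary="{DTL}">
  <zeroOrMore>
    <element name="employee">
      <attribute name="id"><data type="ID"/></attribute>
      <optional>
        <attribute name="country" a:defaultValue="US">
          <choice><value>US</value><value>JP</value></choice>
        </attribute>
      </optional>
      <text/>
    </element>
  </zeroOrMore>
</element>
"""

def collect_defaults(node, element_name=None, found=None):
    """Map (element name, attribute name) to the annotated default value.

    Assumption: names are given by name attributes (unsimplified syntax).
    """
    if found is None:
        found = {}
    if node.tag == f"{{{RNG}}}element":
        element_name = node.get("name")
    elif node.tag == f"{{{RNG}}}attribute":
        default = node.get(f"{{{ANN}}}defaultValue")
        if default is not None:
            found[(element_name, node.get("name"))] = default
    for child in node:
        collect_defaults(child, element_name, found)
    return found

defaults = collect_defaults(ET.fromstring(SCHEMA.strip()))
```

The lookup uses the full namespace URI, so it works whatever prefix the schema author chose for the annotations namespace.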

3. Conformance

This specification defines three features:

  • attribute default values
  • ID/IDREF/IDREFS
  • documentation

Conformance is defined separately for each feature. A conformant implementation can support any combination of features. For each feature, this specification defines

  • a compatibility property; this is a property that may hold for a correct RELAX NG schema; it is analogous to correctness for RELAX NG;
  • a soundness relationship; this is a relationship that may hold between a RELAX NG schema for which the compatibility property holds and an XML instance; it is analogous to validity for RELAX NG;
  • for an instance and schema between which the soundness relationship holds, a modification of the infoset of the instance; there is nothing analogous to this in RELAX NG.

For each feature, there are two levels of conformance.

  1. Level 1 is similar to RELAX NG conformance. It has two parts: determining whether the compatibility property holds of a schema, and determining whether the soundness relationship holds between a compatible schema and an instance.
  2. Level 2 requires that an implementation provide information about the modification of the infoset defined for the feature. An implementation can provide the application either with a modified infoset or with sufficient information that would allow the application to modify the infoset itself.

A conformant implementation may support different features at different levels.

A conformant implementation may be an integral part of a RELAX NG validator or may be a separate software module.

Note that compatibility does not affect RELAX NG correctness. Thus, a conforming RELAX NG validator is required to be able to validate an instance against a correct RELAX NG schema even if that schema is not compatible with one or more of the features defined in this specification. Furthermore, soundness is completely independent of validity. A conforming RELAX NG validator must be able to determine whether an instance is valid with respect to a correct RELAX NG schema regardless of whether it is sound with respect to that schema for any of the features defined in this specification. A conforming implementation of a feature defined by this specification must be able to determine whether an instance is sound with respect to a compatible schema, regardless of whether the instance is valid with respect to that schema.

4. Attribute default values

This feature is specified by an annotation attribute. An a:defaultValue attribute on a RELAX NG attribute element specifies the default value for the attribute.

A correct RELAX NG schema is compatible with this feature if, after schema simplification, for each attribute element that has an a:defaultValue attribute, all of the following hold:

  • its first child is a name element
  • the first child of the containing element element is a name element
  • the value of the a:defaultValue attribute matches the pattern contained in the attribute element
  • the pattern in the attribute element does not contain a data or value element with a context-dependent datatype; a context-dependent datatype is one for which there is a string for which the datatypeAllows function (defined in [RELAX NG]) returns both true and false according to the context
  • it does not have a oneOrMore ancestor
  • any ancestor that is a choice element has one child that is an empty element
  • it has at least one choice ancestor
  • if the containing definition competes with another definition, then that other definition also contains an attribute element with the same name and with an a:defaultValue attribute with the same value. A definition
    <define name="ln1">
      <element>
        nc1
        p1
      </element>
    </define>

    competes with a definition

    <define name="ln2">
      <element>
        nc2
        p2
      </element>
    </define>

    if there is a name n that belongs to both nc1 and nc2.
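The competition rule in the last item can be sketched in Python under a simplifying assumption: each name class is modeled as a finite set of element names, which covers name elements and choices between them but not anyName or nsName. The Definition class and both function names are hypothetical and serve only to illustrate the rule.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, FrozenSet, List

@dataclass
class Definition:
    # Element names admitted by the name class (assumption: a finite set).
    names: FrozenSet[str]
    # Attribute name -> a:defaultValue, for attributes carrying the annotation.
    defaults: Dict[str, str] = field(default_factory=dict)

def compete(d1: Definition, d2: Definition) -> bool:
    """Two definitions compete if some name n belongs to both name classes."""
    return not d1.names.isdisjoint(d2.names)

def defaults_consistent(definitions: List[Definition]) -> bool:
    """Competing definitions must declare the same defaulted attributes
    with the same default values."""
    for d1, d2 in combinations(definitions, 2):
        if compete(d1, d2) and d1.defaults != d2.defaults:
            return False
    return True

employee_a = Definition(frozenset({"employee"}), {"country": "US"})
employee_b = Definition(frozenset({"employee"}), {"country": "US"})
employee_jp = Definition(frozenset({"employee"}), {"country": "JP"})
any_staff = Definition(frozenset({"employee", "contractor"}))
manager = Definition(frozenset({"manager"}))
```

Because the rule applies to every annotated attribute in every definition, applying it in both directions amounts to requiring that competing definitions carry identical sets of defaults, which is why the sketch compares the two dictionaries directly.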

There is no soundness relationship for this feature.

The modification of the infoset for this feature adds attribute information items for omitted attributes.

Editorial Note

Define this more precisely.
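Since the precise definition of the infoset modification for attribute default values is still open, the following Python sketch shows only one possible reading: every element that lacks a defaulted attribute receives that attribute with the default value. The function apply_defaults and the lookup table keyed by element and attribute name are hypothetical, and an ElementTree document is used as a stand-in for an infoset.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

def apply_defaults(root: ET.Element, defaults: Dict[Tuple[str, str], str]) -> int:
    """Add each omitted defaulted attribute in place; return how many were added.

    Assumption: defaults maps (element name, attribute name) to a value.
    """
    added = 0
    for element in root.iter():
        for (element_name, attribute_name), value in defaults.items():
            if element.tag == element_name and attribute_name not in element.attrib:
                element.set(attribute_name, value)
                added += 1
    return added

DEFAULTS = {("employee", "country"): "US"}

doc = ET.fromstring(
    "<employees>"
    '<employee id="e1">Ann</employee>'
    '<employee id="e2" country="JP">Ken</employee>'
    "</employees>"
)
added = apply_defaults(doc, DEFAULTS)
countries = [e.get("country") for e in doc.iter("employee")]
```

An implementation that prefers not to modify the infoset could instead hand the lookup table itself to the application.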

5. ID, IDREF and IDREFS

A RELAX NG schema makes use of this feature by using datatypes. Each datatype is associated with an ID-type, which is one of ID, IDREF, IDREFS or null. The datatype library with URI

http://relaxng.org/ns/compatibility/datatypes/0.9

contains datatypes named ID, IDREF and IDREFS associated with ID-types of ID, IDREF and IDREFS respectively. The datatypes in other datatype libraries are associated with a null ID-type, unless the datatype library specifies otherwise.

A RELAX NG schema is compatible with this feature if and only if, after schema simplification, for each data or value element that specifies a datatype associated with a non-null ID-type, all of the following hold:

  • its parent is an attribute element
  • the first child of its attribute parent is a name element
  • the first child of the element ancestor is a name element
  • any attribute element that competes with its parent attribute element has a data or value child specifying a datatype associated with the same ID-type. Two attribute elements
    <attribute> nc1 p1 </attribute>

    and

    <attribute> nc2 p2 </attribute>

    compete if and only if the containing definitions compete and there is a name n that belongs to both nc1 and nc2. Note that a definition competes with itself.

Thus, a RELAX NG schema that is compatible with this feature implies a mapping from element/attribute name pairs onto an ID-type, and hence a mapping from attributes in the instance onto ID-types.
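The mapping described above can be sketched in Python as follows. The sketch assumes that the attribute declarations have already been extracted from the simplified schema as (element name, attribute name, ID-type) triples, with None standing for a null ID-type, and that every name class is a single name, so that two attributes compete exactly when both names are equal. The function name build_id_type_map is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Declaration = Tuple[str, str, Optional[str]]

def build_id_type_map(declarations: Iterable[Declaration]) -> Dict[Tuple[str, str], str]:
    """Map (element name, attribute name) to a non-null ID-type.

    Raises ValueError when competing attributes disagree on the ID-type,
    which means the schema is not compatible with this feature.
    """
    seen: Dict[Tuple[str, str], Optional[str]] = {}
    for element_name, attribute_name, id_type in declarations:
        key = (element_name, attribute_name)
        if key in seen and seen[key] != id_type:
            raise ValueError(f"conflicting ID-types for {key}")
        seen[key] = id_type
    # Only attributes with a non-null ID-type take part in the mapping.
    return {key: value for key, value in seen.items() if value is not None}

id_types = build_id_type_map([
    ("employee", "id", "ID"),
    ("employee", "manages", "IDREFS"),
    ("employee", "managedBy", "IDREF"),
    ("employee", "country", None),
])
```

Attributes with a null ID-type are included in the input because a competing attribute without an ID datatype also makes the schema incompatible.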

An instance is sound for this feature with respect to a compatible RELAX NG schema if and only if

  • when the value of each attribute in the instance whose ID-type is not null is split into a sequence of whitespace-separated tokens, the length of the sequence is 1 if the ID-type is ID or IDREF and greater than or equal to 1 if the ID-type is IDREFS, and
  • no two distinct tokens in attributes of ID-type ID have the same value, and
  • for each token in an attribute of ID-type IDREF or IDREFS, there is a token in an attribute of ID-type ID with the same value
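The three soundness conditions can be checked in a single pass over the instance followed by a comparison of the collected tokens. The Python sketch below is illustrative only: the function names and the ID_TYPES table (a mapping from element and attribute name to ID-type, as implied by a compatible schema) are assumptions, and an ElementTree document stands in for the instance.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

def split_tokens(value: str) -> List[str]:
    """Split on XML whitespace (space, tab, CR, LF) only."""
    return re.findall(r"[^ \t\r\n]+", value)

def soundness_errors(root: ET.Element,
                     id_types: Dict[Tuple[str, str], str]) -> List[str]:
    """Return a list of problems; an empty list means the instance is sound."""
    errors: List[str] = []
    ids: List[str] = []
    refs: List[str] = []
    for element in root.iter():
        for name, value in element.attrib.items():
            id_type = id_types.get((element.tag, name))
            if id_type is None:
                continue  # null ID-type: not subject to these checks
            tokens = split_tokens(value)
            if id_type in ("ID", "IDREF") and len(tokens) != 1:
                errors.append(f"{name}: expected exactly one token")
            elif id_type == "IDREFS" and not tokens:
                errors.append(f"{name}: expected at least one token")
            (ids if id_type == "ID" else refs).extend(tokens)
    seen = set()
    for token in ids:
        if token in seen:
            errors.append(f"duplicate ID {token}")
        seen.add(token)
    for token in refs:
        if token not in seen:
            errors.append(f"no ID matches reference {token}")
    return errors

ID_TYPES = {
    ("employee", "id"): "ID",
    ("employee", "manages"): "IDREFS",
    ("employee", "managedBy"): "IDREF",
}
```

References are resolved only after the whole instance has been read, so a reference may precede the ID it points to.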

The modification of the infoset for this feature changes the [attribute type] property of attribute information items to ID, IDREF or IDREFS and modifies the [normalized value] by applying the normalizeWhiteSpace function.

Editorial Note

Define this more precisely.
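One way to read the infoset modification for ID, IDREF and IDREFS attributes is as a list of changes reported to the application, which then updates its own data model. The Python sketch below takes that approach. It assumes that normalizeWhiteSpace strips leading and trailing whitespace and collapses each internal run of whitespace to a single space, as in [RELAX NG]; the function describe_id_modifications and the table of ID-types are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

def normalize_white_space(value: str) -> str:
    """Strip leading/trailing XML whitespace and collapse internal runs."""
    return " ".join(re.findall(r"[^ \t\r\n]+", value))

def describe_id_modifications(
    root: ET.Element, id_types: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """List (element name, attribute name, attribute type, normalized value)."""
    changes = []
    for element in root.iter():
        for name, value in element.attrib.items():
            id_type = id_types.get((element.tag, name))
            if id_type is not None:
                changes.append(
                    (element.tag, name, id_type, normalize_white_space(value))
                )
    return changes

ID_TYPES = {("employee", "id"): "ID", ("employee", "manages"): "IDREFS"}
doc = ET.fromstring(
    '<employees><employee id=" e1 " manages="e2   e3" country="US"/></employees>'
)
changes = describe_id_modifications(doc, ID_TYPES)
```

Attributes with a null ID-type, such as country here, are left out of the report because the feature does not change them.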

The semantics needed for RELAX NG validation are defined for the datatype library

http://relaxng.org/ns/compatibility/datatypes/0.9

as follows:

  • for ID and IDREF a string is an allowed representation if it is a single NCName (as defined in [XML Namespaces]) with optional leading and trailing whitespace; for IDREFS a string is an allowed representation if it is a whitespace-separated list of one or more NCNames;
  • values are tested for equality in the same way as for the builtin token datatype
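These two rules can be sketched in Python as follows. The regular expression is a deliberate simplification: it accepts only ASCII NCNames, a subset of the NCName production in [XML Namespaces], so a real implementation would need the full character classes. The function names allows and same_value are hypothetical.

```python
import re
from typing import List

# Assumption: ASCII-only subset of NCName (no colon, no leading digit, '.' or '-').
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def split_tokens(value: str) -> List[str]:
    """Split on XML whitespace (space, tab, CR, LF) only."""
    return re.findall(r"[^ \t\r\n]+", value)

def allows(datatype: str, value: str) -> bool:
    """Return True if value is an allowed representation for the datatype."""
    tokens = split_tokens(value)
    if not all(_ASCII_NCNAME.match(token) for token in tokens):
        return False
    if datatype in ("ID", "IDREF"):
        return len(tokens) == 1
    if datatype == "IDREFS":
        return len(tokens) >= 1
    raise ValueError(f"unknown datatype {datatype!r}")

def same_value(first: str, second: str) -> bool:
    """Equality as for the builtin token datatype: compare after
    whitespace normalization."""
    return split_tokens(first) == split_tokens(second)
```

Comparing token lists is equivalent to comparing the whitespace-normalized strings, which avoids building the normalized form explicitly.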

6. Documentation

The a:documentation element can be used to specify human-readable documentation. The functionality provided by an a:documentation element in a RELAX NG schema would be provided by a comment in an XML 1.0 DTD. The a:documentation element contains text specifying documentation. It can also have namespace-qualified attributes such as xml:lang.

If an a:documentation element does not have a preceding sibling element from the RELAX NG namespace, then the specified documentation applies to the parent element. Otherwise, the specified documentation applies to the nearest preceding sibling element from the RELAX NG namespace. There may be multiple a:documentation elements specifying documentation that apply to the same RELAX NG element.
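The rule for deciding which element a piece of documentation applies to can be sketched in Python as follows. The function name documentation_targets is hypothetical, and the schema is assumed to have been parsed with ElementTree, which represents a qualified name as "{namespace URI}local name".

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RNG = "http://relaxng.org/ns/structure/0.9"
ANN = "http://relaxng.org/ns/compatibility/annotations/0.9"

def documentation_targets(root: ET.Element) -> List[Tuple[ET.Element, str]]:
    """Pair each a:documentation text with the element it applies to."""
    pairs = []
    for parent in root.iter():
        # Until a RELAX NG sibling has been seen, documentation applies
        # to the parent; afterwards, to the nearest preceding such sibling.
        target = parent
        for child in parent:
            if child.tag == f"{{{ANN}}}documentation":
                pairs.append((target, child.text or ""))
            elif child.tag.startswith(f"{{{RNG}}}"):
                target = child
    return pairs

SCHEMA = f"""
<element name="employees" xmlns="{RNG}" xmlns:a="{ANN}">
  <a:documentation>A list of employees.</a:documentation>
  <zeroOrMore>
    <element>
      <name>employee</name>
      <a:documentation>The element name.</a:documentation>
      <text/>
    </element>
  </zeroOrMore>
</element>
"""

pairs = documentation_targets(ET.fromstring(SCHEMA.strip()))
summary = [(target.tag.split("}")[1], text) for target, text in pairs]
```

In this schema the first a:documentation element applies to its parent element pattern, while the second follows a name element and therefore applies to that name.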

A correct RELAX NG schema is compatible with this feature if and only if, for each a:documentation element, all of the following hold:

  • it does not have any child elements
  • it does not have any attribute whose namespace URI is the empty string, the RELAX NG namespace URI or the compatibility annotations namespace URI
  • if it has a preceding sibling element from the RELAX NG namespace, then the nearest such preceding sibling element is an element that does not allow child elements (i.e. value, param or name)
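The three conditions can be checked with a walk over the schema, sketched below in Python. The function name documentation_compatible is hypothetical; the sketch assumes an ElementTree parse, in which a qualified attribute name appears as "{namespace URI}local name" and an unqualified one appears without braces.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/0.9"
ANN = "http://relaxng.org/ns/compatibility/annotations/0.9"

DOCUMENTATION = f"{{{ANN}}}documentation"
FORBIDDEN_ATTRIBUTE_NAMESPACES = ("", RNG, ANN)
# RELAX NG elements that do not allow child elements.
LEAF_ELEMENTS = {f"{{{RNG}}}{name}" for name in ("value", "param", "name")}

def attribute_namespace(qualified_name: str) -> str:
    """Return the namespace URI part of an ElementTree attribute name."""
    if qualified_name.startswith("{"):
        return qualified_name[1:].split("}", 1)[0]
    return ""

def documentation_compatible(root: ET.Element) -> bool:
    """Check the three conditions for every a:documentation element."""
    for parent in root.iter():
        previous = None  # nearest preceding sibling from the RELAX NG namespace
        for child in parent:
            if child.tag == DOCUMENTATION:
                if len(child) > 0:
                    return False  # has child elements
                if any(attribute_namespace(name) in FORBIDDEN_ATTRIBUTE_NAMESPACES
                       for name in child.attrib):
                    return False
                if previous is not None and previous.tag not in LEAF_ELEMENTS:
                    return False
            elif child.tag.startswith(f"{{{RNG}}}"):
                previous = child
    return True

def parse(body: str) -> ET.Element:
    """Wrap a fragment in an element pattern that declares both namespaces."""
    return ET.fromstring(f'<element xmlns="{RNG}" xmlns:a="{ANN}">{body}</element>')
```

An xml:lang attribute passes the second condition because its namespace URI is the XML namespace, which is not one of the three excluded values.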

There is no soundness relationship for this feature.

There is no infoset modification for this feature.

A. RELAX NG schema

To be supplied.

References

[RELAX NG] James Clark, Makoto MURATA, editors. RELAX NG Specification. OASIS, 2001.

[XML 1.0] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors. Extensible Markup Language (XML) 1.0 Second Edition. W3C (World Wide Web Consortium), 2000.

[XML Namespaces] Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. W3C (World Wide Web Consortium), 1999.

[XML Infoset] John Cowan, Richard Tobin, editors. XML Information Set. W3C (World Wide Web Consortium), 2001.