RELAX NG DTD Compatibility Annotations

Working Draft 10 August 2001

This version:
Working Draft: 10 August 2001
Editors:
James Clark <jjc@jclark.com>, MURATA Makoto <mura034@attglobal.net>

Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Abstract

This specification defines elements and attributes that can be used as annotations in [RELAX NG] schemas. The purpose of these annotations is to support some of the features of XML 1.0 DTDs that are not supported by RELAX NG.

Status of this Document

This is a working draft constructed by the editors. It is not an official committee work product and may not reflect the consensus opinion of the committee. Comments on this document may be sent to relax-ng-comment@lists.oasis-open.org.

Table of Contents

1 Introduction
2 Example
3 Conformance
4 Attribute default values
5 ID, IDREF and IDREFS
6 Documentation

Appendixes

A RELAX NG schema
References

1. Introduction

RELAX NG [RELAX NG] provides an annotation capability. In a RELAX NG schema, RELAX NG-defined elements can be annotated with child elements and attributes from other namespaces. The goal of this specification is to facilitate transition from XML 1.0 DTDs to RELAX NG schemas by defining annotations that support some of the features of XML 1.0 DTDs that are not supported by RELAX NG.

RELAX NG itself performs only validation: it does not change the infoset [XML Infoset] of an XML document. Most of the features of XML 1.0 DTDs that are not supported by RELAX NG involve modification to the infoset. In XML 1.0, validation and infoset modification are combined in a monolithic XML processor. It is a goal of this specification to provide a clean separation between validation and infoset modification, so that a wide variety of implementation scenarios are possible. In particular, it should be possible to make the infoset modifications implied by the annotations either before performing RELAX NG validation or after performing RELAX NG validation or without performing RELAX NG validation at all. It should also be possible for an implementation of this specification not to modify the infoset at all and instead provide the application with a description of the modifications implied by the annotations, independently of any particular instance.

This specification does not provide any support for features of XML 1.0 DTDs, such as entity declarations, that cannot be cleanly separated from validation.

In an XML 1.0 document that is valid with respect to a DTD, each element or attribute in the instance has a unique corresponding element or attribute declaration in the DTD. With RELAX NG this is not always the case: it may be ambiguous which element or attribute pattern any particular element or attribute in the instance matches. In addition, it is non-trivial to determine whether a RELAX NG schema is ambiguous. A further complication is that, even in cases where the schema is not ambiguous, it may require multiple passes or lookahead to determine which element or attribute pattern a particular element or attribute matches. Detecting this situation is also non-trivial.

Some features of XML 1.0 DTDs, in particular default attribute values and ID/IDREF/IDREFS validation, depend crucially on this unambiguous correspondence between elements or attributes in the instance and their corresponding declarations. In order to support these features by means of annotations on RELAX NG patterns, it is therefore necessary to impose restrictions on the use of such annotations. The goals in framing these restrictions were as follows:

  1. It must be possible to determine whether a schema satisfies the restrictions independently of any particular instance.
  2. Processing of the instance must not require lookahead or multiple passes.
  3. The modified infoset must be XML 1.0 compatible: it must be an infoset that could have been produced by a validating XML 1.0 parser for some DTD.
  4. Implementation of the restrictions should be straightforward.
  5. The restrictions should not be any more restrictive than necessary.

The elements and attributes defined in this specification have the following namespace URI:

http://relaxng.org/ns/annotation/0.9

Examples in this specification follow the convention of using the prefix a to refer to this namespace URI.

2. Example

The following DTD

<!DOCTYPE employees [
<!-- A list of employees. -->
<!ELEMENT employees (employee*)>
<!-- An individual employee. -->
<!ELEMENT employee (#PCDATA)>
<!ATTLIST employee
  id ID #REQUIRED
  manages IDREFS #IMPLIED
  managedBy IDREF #IMPLIED
  country (US|JP) "US"
>
]>

could be translated to the following RELAX NG schema with annotations:

<element name="employees"
         xmlns="http://relaxng.org/ns/structure/0.9"
         xmlns:a="http://relaxng.org/ns/annotation/0.9">
  <a:documentation>A list of employees.</a:documentation>
  <zeroOrMore>
    <element name="employee">
      <a:documentation>An individual employee.</a:documentation>
      <attribute name="id" a:attributeType="ID">
        <data type="token"/>
      </attribute> 
      <optional>
        <attribute name="manages" a:attributeType="IDREFS">
          <list>
            <oneOrMore>
              <data type="token"/>
            </oneOrMore>
          </list>
        </attribute>
      </optional>
      <optional>
        <attribute name="managedBy" a:attributeType="IDREF">
          <data type="token"/>
        </attribute>
      </optional>
      <optional>
        <attribute name="country" a:defaultValue="US">
          <choice>
            <value>US</value>
            <value>JP</value>
          </choice>
        </attribute>
      </optional>
      <text/>
    </element>
  </zeroOrMore>
</element>

3. Conformance

This specification defines three features:

  • attribute default value
  • ID/IDREF/IDREFS
  • documentation

Conformance is defined separately for each feature. A conformant implementation can support any combination of features. There are also two levels of conformance.

  1. Level 1 is similar to RELAX NG conformance. It has two parts: checking that the schema uses the annotations correctly, and checking that an instance is valid with respect to the schema's annotations.
  2. Level 2 requires that an implementation provide information about the infoset modifications implied by annotations. An implementation can provide the application either with a modified infoset or with sufficient information that would allow the application to modify the infoset itself.
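
The second option under Level 2, giving the application enough information to modify the infoset itself, could look roughly like the following sketch. It is an illustration only, not part of this specification: the function name describe_annotations and the shape of the returned table are assumptions, the schema is read in its unsimplified form with name attributes rather than name child elements, and namespaces of instance names are ignored.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/0.9"
ANN = "http://relaxng.org/ns/annotation/0.9"

# A cut-down version of the schema from Section 2.
SCHEMA = """\
<element name="employee"
         xmlns="http://relaxng.org/ns/structure/0.9"
         xmlns:a="http://relaxng.org/ns/annotation/0.9">
  <attribute name="id" a:attributeType="ID">
    <data type="token"/>
  </attribute>
  <optional>
    <attribute name="country" a:defaultValue="US">
      <choice>
        <value>US</value>
        <value>JP</value>
      </choice>
    </attribute>
  </optional>
  <text/>
</element>
"""


def describe_annotations(schema_xml):
    """Map (element name, attribute name) to the annotations found on it.

    Hypothetical helper: the table format is an assumption of this sketch.
    """
    table = {}

    def walk(node, element_name):
        if node.tag == f"{{{RNG}}}element":
            element_name = node.get("name")
        elif node.tag == f"{{{RNG}}}attribute":
            info = {}
            default = node.get(f"{{{ANN}}}defaultValue")
            if default is not None:
                info["defaultValue"] = default
            attr_type = node.get(f"{{{ANN}}}attributeType")
            if attr_type is not None:
                # Leading and trailing whitespace is ignored (Section 5).
                info["attributeType"] = attr_type.strip()
            if info:
                table[(element_name, node.get("name"))] = info
        for child in node:
            walk(child, element_name)

    walk(ET.fromstring(schema_xml), None)
    return table


description = describe_annotations(SCHEMA)
```

The table is computed from the schema alone, so it is independent of any particular instance, in line with the goal stated in Section 1.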

A conformant implementation may support different features at different levels.

A conformant implementation may be an integral part of a RELAX NG validator or may be a separate software module.

4. Attribute default values

An a:defaultValue attribute on a RELAX NG attribute element specifies the default value for the attribute.

If an attribute element has an a:defaultValue attribute, then, after schema simplification,

  • its first child must be a name element
  • the first child of the containing element element must be a name element
  • the value of the a:defaultValue attribute must match the pattern contained in the attribute element
  • the pattern in the attribute element must not contain data or value elements with context-dependent datatypes
  • it must not have an a:attributeType attribute (see Section 5)
  • it must not have a oneOrMore ancestor
  • any ancestor that is a choice element must have one child that is an empty element
  • it must have at least one choice ancestor
  • if the containing definition competes with another definition, then that other definition must also contain an attribute element with the same name and with an a:defaultValue attribute with the same value. A definition
    <define name="ln1">
      <element>
        nc1
        p1
      </element>
    </define>

    competes with a definition

    <define name="ln2">
      <element>
        nc2
        p2
      </element>
    </define>

    if there is a name n that belongs to both nc1 and nc2.
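
The competition rule in the last item can be sketched as follows. This is a simplified illustration rather than a definitive algorithm: each name class is assumed to be a finite set of (namespace URI, local name) pairs, which leaves out wildcard name classes, and the input format and function names are hypothetical.

```python
from itertools import combinations


def competes(nc1, nc2):
    """Two definitions compete if some name n belongs to both name classes."""
    return not nc1.isdisjoint(nc2)


def check_default_consistency(definitions):
    """Report competing definitions whose default values disagree.

    definitions maps a define name to a pair:
      (name class as a set of (namespace URI, local name) pairs,
       dict mapping attribute name to its a:defaultValue)
    Returns a list of (define name, define name, attribute name) triples.
    """
    problems = []
    for ln1, ln2 in combinations(sorted(definitions), 2):
        nc1, defaults1 = definitions[ln1]
        nc2, defaults2 = definitions[ln2]
        if not competes(nc1, nc2):
            continue
        # A defaulted attribute must appear in both definitions with the same value.
        for attr in sorted(set(defaults1) | set(defaults2)):
            if defaults1.get(attr) != defaults2.get(attr):
                problems.append((ln1, ln2, attr))
    return problems


definitions = {
    "ln1": ({("", "employee")}, {"country": "US"}),
    "ln2": ({("", "employee"), ("", "contractor")}, {"country": "JP"}),
    "ln3": ({("", "office")}, {}),
}
problems = check_default_consistency(definitions)
# problems == [("ln1", "ln2", "country")]
```

Here ln1 and ln2 compete because the name employee belongs to both name classes, and they give different defaults for country, so the pair is reported. ln3 shares no name with the others and is not checked against them.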

The a:defaultValue annotation implies a modification of the infoset that adds attribute information items for omitted attributes.
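
As a rough illustration of this modification, the sketch below adds omitted attributes to a parsed instance. It is not a precise definition: the defaults table keyed by (element name, attribute name) is a hypothetical format, the instance content is invented for the example, and an ElementTree tree is only an approximation of an infoset.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of defaults derived from the schema in Section 2.
DEFAULTS = {("employee", "country"): "US"}

INSTANCE = """\
<employees>
  <employee id="e1">Alice</employee>
  <employee id="e2" country="JP">Bob</employee>
</employees>
"""


def apply_defaults(root, defaults):
    """Add each defaulted attribute that is omitted; return how many were added."""
    added = 0
    for elem in root.iter():
        for (elem_name, attr_name), value in defaults.items():
            if elem.tag == elem_name and attr_name not in elem.attrib:
                elem.set(attr_name, value)
                added += 1
    return added


root = ET.fromstring(INSTANCE)
added = apply_defaults(root, DEFAULTS)
# added == 1: only the first employee lacked a country attribute.
```

Attributes that are present in the instance are left untouched, so the second employee keeps country="JP".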

Editorial Note

Define this more precisely.

5. ID, IDREF and IDREFS

An a:attributeType attribute on a RELAX NG attribute element must have the value ID, IDREF or IDREFS. It specifies the XML 1.0 attribute type of the attribute and corresponds to the [attribute type] infoset property.

Leading and trailing whitespace is ignored in the value of the a:attributeType attribute during schema simplification.

If an attribute element has an a:attributeType attribute, then, after schema simplification,

  • its first child must be a name element
  • the first child of the containing element element must be a name element
  • if the value of the a:attributeType attribute is ID, then there must not be a definition that competes with the definition containing the attribute element and that contains an attribute element that has a different name and an a:attributeType attribute with value ID. Note that a definition competes with itself. This implies that in the instance if two attributes are both IDs and have a parent element with the same name, then the two attributes must have the same name.
  • any competing attribute element must have an a:attributeType attribute with the same value. Two attribute elements
    <attribute> nc1 p1 </attribute>

    and

    <attribute> nc2 p2 </attribute>

    compete if and only if the containing definitions compete and there is a name n that belongs to both nc1 and nc2. Note that a definition competes with itself.

  • it must not have an a:defaultValue attribute (see Section 4)

An instance is valid with respect to the a:attributeType attributes in the schema if the attribute values in the instance declared by a:attributeType attributes in the schema to be of type ID, IDREF or IDREFS meet the validity constraints specified in [XML 1.0] for values of that type, after normalizing the values by applying the normalizeWhiteSpace function defined in [RELAX NG].
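
The following sketch shows one way such a check might be organized. It covers only ID uniqueness and the resolution of IDREF and IDREFS values; it does not check that values match the Name production of [XML 1.0]. The attribute_types table and the function names are assumptions of the sketch, and the instance content is invented for the example.

```python
import re
import xml.etree.ElementTree as ET


def normalize_white_space(s):
    """Collapse runs of XML whitespace to one space and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def check_ids(root, attribute_types):
    """Return a list of error messages for ID, IDREF and IDREFS attributes.

    attribute_types maps (element name, attribute name) to
    "ID", "IDREF" or "IDREFS" (hypothetical format).
    """
    ids = set()
    refs = []
    errors = []
    for elem in root.iter():
        for attr_name, raw in elem.attrib.items():
            kind = attribute_types.get((elem.tag, attr_name))
            if kind is None:
                continue
            value = normalize_white_space(raw)
            if kind == "ID":
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            else:  # IDREFS: a space-separated list after normalization
                refs.extend(value.split(" "))
    # References are resolved only after the whole document has been seen,
    # because an IDREF may point forward to a later ID.
    for ref in refs:
        if ref not in ids:
            errors.append(f"unresolved reference {ref!r}")
    return errors


TYPES = {
    ("employee", "id"): "ID",
    ("employee", "manages"): "IDREFS",
    ("employee", "managedBy"): "IDREF",
}

INSTANCE = """\
<employees>
  <employee id="e1" manages=" e2  e3 ">Alice</employee>
  <employee id="e2" managedBy="e1">Bob</employee>
  <employee id="e2" managedBy="e9">Carol</employee>
</employees>
"""

errors = check_ids(ET.fromstring(INSTANCE), TYPES)
```

For this instance the sketch reports the repeated ID e2 and the references e3 and e9, which match no ID in the document.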

The a:attributeType annotation implies a modification of the infoset that changes the [attribute type] property of attribute information items to ID, IDREF or IDREFS and modifies the [normalized value] by applying the normalizeWhiteSpace function.

Editorial Note

Define this more precisely.

6. Documentation

The a:documentation element can be used to specify human-readable documentation. The functionality provided by an a:documentation element in a RELAX NG schema corresponds to that provided by a comment in an XML 1.0 DTD. An a:documentation element must not contain any elements. It can have any attributes whose namespace URI is not the empty string, not the RELAX NG namespace URI, and not the RELAX NG annotation namespace URI. In particular, it may have an xml:lang attribute.

The documentation specified in an a:documentation element applies to the parent of the a:documentation element. To apply documentation to a value element, wrap the value element in a group element. To apply documentation to a name element, wrap the name element in a choice element.

A RELAX NG element may have multiple a:documentation child elements, but all a:documentation child elements must precede all child elements from the RELAX NG namespace.
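
The ordering constraint and the ban on child elements can be checked mechanically. The sketch below is one possible approach, given under stated assumptions: the function name documentation_problems and the message wording are hypothetical, and the restriction on the attributes of a:documentation is not checked.

```python
import xml.etree.ElementTree as ET

RNG_PREFIX = "{http://relaxng.org/ns/structure/0.9}"
DOC = "{http://relaxng.org/ns/annotation/0.9}documentation"


def documentation_problems(schema_xml):
    """Return messages for misplaced or non-empty a:documentation elements."""
    problems = []
    for node in ET.fromstring(schema_xml).iter():
        if node.tag == DOC:
            if len(node) > 0:
                problems.append("a:documentation contains child elements")
            continue
        if not node.tag.startswith(RNG_PREFIX):
            continue
        seen_rng_child = False
        for child in node:
            if child.tag.startswith(RNG_PREFIX):
                seen_rng_child = True
            elif child.tag == DOC and seen_rng_child:
                local_name = node.tag[len(RNG_PREFIX):]
                problems.append(
                    f"a:documentation after a RELAX NG child in <{local_name}>"
                )
    return problems


GOOD = """\
<element name="employee"
         xmlns="http://relaxng.org/ns/structure/0.9"
         xmlns:a="http://relaxng.org/ns/annotation/0.9">
  <a:documentation>An individual employee.</a:documentation>
  <text/>
</element>
"""

BAD = """\
<element name="employee"
         xmlns="http://relaxng.org/ns/structure/0.9"
         xmlns:a="http://relaxng.org/ns/annotation/0.9">
  <text/>
  <a:documentation>An individual employee.</a:documentation>
</element>
"""

good_problems = documentation_problems(GOOD)
bad_problems = documentation_problems(BAD)
```

The first schema yields no messages. The second yields one, because its a:documentation element follows the text element.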

A. RELAX NG schema

To be supplied.

References

[RELAX NG] James Clark, MURATA Makoto, editors. RELAX NG Specification. OASIS, 2001.

[XML 1.0] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, editors. Extensible Markup Language (XML) 1.0 Second Edition. W3C (World Wide Web Consortium), 2000.

[XML Infoset] John Cowan, Richard Tobin, editors. XML Information Set. W3C (World Wide Web Consortium), 2001.