<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE article [
<!-- ELEMENT declarations work around MSXML bug. -->
<!ELEMENT section ANY>
<!ATTLIST section id ID #IMPLIED>
<!ELEMENT appendix ANY>
<!ATTLIST appendix id ID #IMPLIED>
<!ELEMENT bibliomixed ANY>
<!ATTLIST bibliomixed id ID #IMPLIED>
]>
<article status="Committee Specification">
<articleinfo>
<releaseinfo>$Id: tutorial.xml,v 1.63 2001/08/10 08:05:00 jjc Exp $</releaseinfo>
<title>RELAX NG Tutorial</title>
<authorgroup>
<editor>
  <firstname>James</firstname><surname>Clark</surname>
  <affiliation>
    <address><email>jjc@jclark.com</email></address>
  </affiliation>
</editor>
<editor>
  <surname>MURATA</surname><firstname>Makoto</firstname>
  <affiliation>
    <address><email>mura034@attglobal.net</email></address>
  </affiliation>
</editor>
</authorgroup>
<pubdate>10 August 2001</pubdate>
<releaseinfo role="meta">
$Id: tutorial.xml,v 1.63 2001/08/10 08:05:00 jjc Exp $
</releaseinfo>

<copyright><year>2001</year><holder>OASIS</holder></copyright>

<legalnotice>

<para>Copyright &#169; The Organization for the Advancement of
Structured Information Standards [OASIS] 2001. All Rights
Reserved.</para>

<para>This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to OASIS, except as needed for the
purpose of developing OASIS specifications, in which case the
procedures for copyrights defined in the OASIS Intellectual Property
Rights document must be followed, or as required to translate it into
languages other than English.</para>

<para>The limited permissions granted above are perpetual and will not
be revoked by OASIS or its successors or assigns.</para>

<para>This document and the information contained herein is provided
on an <quote>AS IS</quote> basis and OASIS DISCLAIMS ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE
USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.</para>

</legalnotice>

<legalnotice role="status"><title>Status of this Document</title>
<!--
<para>This is a working draft constructed by the editors. It is not an
official committee work product and may not reflect the consensus
opinion of the committee.  Comments on this document may be sent to
<ulink url="mailto:relax-ng-comment@lists.oasis-open.org"
>relax-ng-comment@lists.oasis-open.org</ulink>.</para>
-->

<para>This committee specification was approved for publication by the
OASIS RELAX NG technical committee. Comments on this document may be
sent to <ulink url="mailto:relax-ng-comment@lists.oasis-open.org"
>relax-ng-comment@lists.oasis-open.org</ulink>.</para>

</legalnotice>

<abstract>
<para>RELAX NG is a simple schema language for XML, based on <xref
linkend="relax"/> and <xref linkend="trex"/>. A RELAX NG schema
specifies a pattern for the structure and content of an XML
document. A RELAX NG schema thus identifies a class of XML documents
consisting of those documents that match the pattern.  A RELAX NG
schema is itself an XML document.</para>

<para>This document is a tutorial for RELAX NG version 0.9.</para>

</abstract>

<revhistory>
<revision>
  <revnumber>Committee Specification</revnumber>
  <date>10 August 2001</date>
</revision>
<revision>
  <revnumber>Working Draft</revnumber>
  <date>12 June 2001</date>
</revision>
</revhistory>
</articleinfo>


<section>
<title>Getting started</title>

<para>Consider a simple XML representation of an email address book:</para>

<programlisting><![CDATA[<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>]]></programlisting>

<para>The DTD would be as follows:</para>

<programlisting><![CDATA[<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>]]></programlisting>

<para>A RELAX NG pattern for this could be written as follows:</para>

<programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>If the <literal>addressBook</literal> is required to be non-empty, then
we can use <literal>oneOrMore</literal> instead of
<literal>zeroOrMore</literal>:</para>

<programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9">
  <oneOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </oneOrMore>
</element>]]></programlisting>

<para>Now let's change it to allow each <literal>card</literal> to have an
optional <literal>note</literal> element.</para>

<programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <text/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>Note that the <literal>text</literal> pattern matches arbitrary text,
including empty text. Note also that whitespace separating tags is
ignored when matching against a pattern.</para>
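
<para>As a sketch of both points, the following instance (an
illustrative document, not taken from any specification) should match
the pattern above: the first <literal>name</literal> element and the
<literal>note</literal> element are empty, which
<literal>text</literal> allows, and the whitespace between tags varies
from card to card without affecting the match:</para>

<programlisting><![CDATA[<addressBook>
  <card><name></name><email>js@example.com</email></card>
  <card>
    <name>Fred Bloggs</name>

    <email>fb@example.net</email>
    <note></note>
  </card>
</addressBook>]]></programlisting>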

<para>All the elements specifying the pattern must be namespace qualified
by the namespace URI:</para>

<programlisting>http://relaxng.org/ns/structure/0.9</programlisting>

<para>The examples above use a default namespace declaration
<literal>xmlns="http://relaxng.org/ns/structure/0.9"</literal> for this. A
namespace prefix is equally acceptable:</para>

<programlisting><![CDATA[<rng:element name="addressBook" xmlns:rng="http://relaxng.org/ns/structure/0.9">
  <rng:zeroOrMore>
    <rng:element name="card">
      <rng:element name="name">
        <rng:text/>
      </rng:element>
      <rng:element name="email">
        <rng:text/>
      </rng:element>
    </rng:element>
  </rng:zeroOrMore>
</rng:element>]]></programlisting>

<para>For the remainder of this document, the default namespace
declaration will be left out of examples.</para>

</section>

<section>
<title>Choice</title>

<para>Now suppose we want to allow the <literal>name</literal> to be broken
down into a <literal>givenName</literal> and a <literal>familyName</literal>,
allowing an <literal>addressBook</literal> like this:</para>

<programlisting><![CDATA[<addressBook>
  <card>
    <givenName>John</givenName>
    <familyName>Smith</familyName>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>]]></programlisting>

<para>We can use the following pattern:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name">
          <text/>
        </element>
        <group>
          <element name="givenName">
            <text/>
          </element>
          <element name="familyName">
            <text/>
          </element>
        </group>
      </choice>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <text/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>This corresponds to the following DTD:</para>

<programlisting><![CDATA[<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card ((name | (givenName, familyName)), email, note?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT givenName (#PCDATA)>
<!ELEMENT familyName (#PCDATA)>
<!ELEMENT note (#PCDATA)>
]>]]></programlisting>
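
<para>To illustrate how <literal>choice</literal> and
<literal>group</literal> interact, here are two hypothetical
<literal>card</literal> elements that the pattern above would
<emphasis role="strong">not</emphasis> match.  The first supplies both
alternatives of the choice; the second supplies only half of the
group:</para>

<programlisting><![CDATA[<card>
  <name>John Smith</name>
  <givenName>John</givenName>
  <familyName>Smith</familyName>
  <email>js@example.com</email>
</card>

<card>
  <givenName>John</givenName>
  <email>js@example.com</email>
</card>]]></programlisting>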
</section>

<section>
<title>Attributes</title>

<para>Suppose we want the <literal>card</literal> element to have attributes
rather than child elements. The DTD might look like this:</para>

<programlisting><![CDATA[<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED>
]>]]></programlisting>

<para>Just change each <literal>element</literal> pattern to an
<literal>attribute</literal> pattern:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>In XML, the order of attributes is traditionally not significant.
RELAX NG follows this tradition.  The above pattern would match both</para>

<programlisting><![CDATA[<card name="John Smith" email="js@example.com"/>]]></programlisting>

<para>and</para>

<programlisting><![CDATA[<card email="js@example.com" name="John Smith"/>]]></programlisting>

<para>In contrast, the order of elements is significant. The pattern</para>

<programlisting><![CDATA[<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
</element>]]></programlisting>

<para>would <emphasis role="strong">not</emphasis> match:</para>

<programlisting><![CDATA[<card><email>js@example.com</email><name>John Smith</name></card>]]></programlisting>

<para>Note that an <literal>attribute</literal> element by itself indicates a
required attribute, just as an <literal>element</literal> element by itself
indicates a required element. To specify an optional attribute, use
<literal>optional</literal> just as with <literal>element</literal>:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
      <optional>
        <attribute name="note">
          <text/>
        </attribute>
      </optional>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>The <literal>group</literal> and <literal>choice</literal> patterns can be
applied to <literal>attribute</literal> patterns in the same way they are
applied to <literal>element</literal> patterns.  For example, if we wanted
to allow either a <literal>name</literal> attribute or both a
<literal>givenName</literal> and a <literal>familyName</literal> attribute, we can
specify this in the same way that we would if we were using
elements:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <attribute name="name">
          <text/>
        </attribute>
        <group>
          <attribute name="givenName">
            <text/>
          </attribute>
          <attribute name="familyName">
            <text/>
          </attribute>
        </group>
      </choice>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>]]></programlisting>
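
<para>For instance, each of the following illustrative
<literal>card</literal> elements should match this pattern.  Either
alternative of the choice is accepted, and because attribute order is
not significant, the members of the group may appear in any
order:</para>

<programlisting><![CDATA[<card name="John Smith" email="js@example.com"/>
<card givenName="John" familyName="Smith" email="js@example.com"/>
<card email="js@example.com" familyName="Smith" givenName="John"/>]]></programlisting>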

<para>The <literal>group</literal> and <literal>choice</literal>
patterns can combine <literal>element</literal> and
<literal>attribute</literal> patterns without restriction. For
example, the following pattern would allow a choice of elements and
attributes independently for both the <literal>name</literal> and the
<literal>email</literal> part of a <literal>card</literal>:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name">
          <text/>
        </element>
        <attribute name="name">
          <text/>
        </attribute>
      </choice>
      <choice>
        <element name="email">
          <text/>
        </element>
        <attribute name="email">
          <text/>
        </attribute>
      </choice>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>As usual, the relative order of elements is significant, but the
relative order of attributes is not. Thus the above would match any
of:</para>

<programlisting><![CDATA[<card name="John Smith" email="js@example.com"/>
<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>
<card><name>John Smith</name><email>js@example.com</email></card>]]></programlisting>

<para>However, it would not match</para>

<programlisting><![CDATA[<card><email>js@example.com</email><name>John Smith</name></card>]]></programlisting>

<para>because the pattern for <literal>card</literal> requires any
<literal>email</literal> child element to follow any <literal>name</literal> child
element.</para>

<para>There is one difference between <literal>attribute</literal> and
<literal>element</literal> patterns: <literal>&lt;text/&gt;</literal>
is the default for the content of an <literal>attribute</literal> pattern,
whereas an <literal>element</literal> pattern is not allowed to be
empty. For example,</para>

<programlisting><![CDATA[<attribute name="email"/>]]></programlisting>

<para>is short for</para>

<programlisting><![CDATA[<attribute name="email">
  <text/>
</attribute>]]></programlisting>

<para>It might seem natural that</para>

<programlisting><![CDATA[<element name="x"/>]]></programlisting>

<para>matched an <literal>x</literal> element with no attributes and no
content.  However, this would make the meaning of empty content
inconsistent between the <literal>element</literal> pattern and the
<literal>attribute</literal> pattern, so RELAX NG does not allow the
<literal>element</literal> pattern to be empty. A pattern that matches an
element with no attributes and no children must use
<literal>&lt;empty/&gt;</literal> explicitly:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="prefersHTML">
          <empty/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>Even if the pattern in an <literal>element</literal> pattern
matches attributes only, there is no need to use
<literal>empty</literal>. For example,</para>

<programlisting><![CDATA[<element name="card">
  <attribute name="email">
    <text/>
  </attribute>
</element>]]></programlisting>

<para>is equivalent to</para>

<programlisting><![CDATA[<element name="card">
  <attribute name="email">
    <text/>
  </attribute>
  <empty/>
</element>]]></programlisting>

</section>

<section>
<title>Named patterns</title>

<para>For a non-trivial RELAX NG pattern, it is often convenient to be able
to give names to parts of the pattern.  Instead of</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>we can write</para>

<programlisting><![CDATA[<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>

  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>

</grammar>]]></programlisting>

<para>A <literal>grammar</literal> element has a single <literal>start</literal>
child element, and zero or more <literal>define</literal> child elements.
The <literal>start</literal> and <literal>define</literal> elements contain
patterns. These patterns can contain <literal>ref</literal> elements that
refer to patterns defined by any of the <literal>define</literal> elements
in that <literal>grammar</literal> element. A <literal>grammar</literal> pattern
is matched by matching the pattern contained in the <literal>start</literal>
element.</para>

<para>We can use the <literal>grammar</literal> element to write patterns in a
style similar to DTDs:</para>

<programlisting><![CDATA[<grammar>

  <start>
    <ref name="AddressBook"/>
  </start>

  <define name="AddressBook">
    <element name="addressBook">
      <zeroOrMore>
        <ref name="Card"/>
      </zeroOrMore>
    </element>
  </define>

  <define name="Card">
    <element name="card">
      <ref name="Name"/>
      <ref name="Email"/>
    </element>
  </define>

  <define name="Name">
    <element name="name">
      <text/>
    </element>
  </define>

  <define name="Email">
    <element name="email">
      <text/>
    </element>
  </define>

</grammar>]]></programlisting>
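
<para>Named patterns combine freely with the patterns introduced
earlier.  As a sketch, the <literal>Name</literal> definition in the
grammar above could be replaced with one that reuses the
<literal>choice</literal> from the section on choice; the
<literal>givenName</literal> and <literal>familyName</literal>
elements are carried over from that section, and the rest of the
grammar is assumed to be unchanged:</para>

<programlisting><![CDATA[<define name="Name">
  <choice>
    <element name="name">
      <text/>
    </element>
    <group>
      <element name="givenName">
        <text/>
      </element>
      <element name="familyName">
        <text/>
      </element>
    </group>
  </choice>
</define>]]></programlisting>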

<para>Recursive references are allowed.  For example:</para>

<programlisting><![CDATA[<define name="inline">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
      <element name="span">
        <optional>
          <attribute name="style"/>
        </optional>
        <ref name="inline"/>
      </element>
    </choice>
  </zeroOrMore>
</define>]]></programlisting>

<para>However, recursive references must be within an
<literal>element</literal>.  Thus, the following is <emphasis role="strong">not</emphasis>
allowed:</para>

<programlisting><![CDATA[<define name="inline">
  <choice>
    <text/>
    <element name="bold">
      <ref name="inline"/>
    </element>
    <element name="italic">
      <ref name="inline"/>
    </element>
    <element name="span">
      <optional>
        <attribute name="style"/>
      </optional>
      <ref name="inline"/>
    </element>
  </choice>
  <optional>
    <ref name="inline"/>
  </optional>
</define>]]></programlisting>

</section>

<section>
<title>Datatyping</title>

<para>RELAX NG allows patterns to reference externally-defined
datatypes, such as those defined by W3C XML Schema Part 2.  RELAX NG
implementations may differ in what datatypes they support.  You must
use datatypes that are supported by the implementation you plan to
use.</para>

<para>The <literal>data</literal> pattern matches a string that
represents a value of a named datatype. The
<literal>datatypeLibrary</literal> attribute contains a URI
identifying the library of datatypes being used. The datatype
library defined by W3C XML Schema Part 2 would be identified by the
URI <literal>http://www.w3.org/2001/XMLSchema-datatypes</literal>.
The <literal>type</literal> attribute specifies the name of the
datatype in the library identified by the
<literal>datatypeLibrary</literal> attribute. For example, if a
RELAX NG implementation supported the datatypes of W3C XML
Schema Part 2, you could use:</para>

<programlisting><![CDATA[<element name="number">
  <data type="integer" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
</element>]]></programlisting>

<para>It is inconvenient to specify the
<literal>datatypeLibrary</literal> attribute on every
<literal>data</literal> element, so RELAX NG allows the
<literal>datatypeLibrary</literal> attribute to be inherited.  The
<literal>datatypeLibrary</literal> attribute can be specified on any
RELAX NG element.  If a <literal>data</literal> element does not have
a <literal>datatypeLibrary</literal> attribute, it will use the
value from the closest ancestor that has a
<literal>datatypeLibrary</literal> attribute.  Typically, the
<literal>datatypeLibrary</literal> attribute is specified on the
root element of the RELAX NG pattern. For example:</para>

<programlisting><![CDATA[<element name="point" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="x">
    <data type="double"/>
  </element>
  <element name="y">
    <data type="double"/>
  </element>
</element>]]></programlisting>
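
<para>The <quote>closest ancestor</quote> rule can be sketched as
follows.  The second library URI and the type name
<literal>labelText</literal> are purely hypothetical, chosen only to
show that the <literal>data</literal> element inside
<literal>label</literal> would take its library from the
<literal>label</literal> element rather than from
<literal>point</literal>:</para>

<programlisting><![CDATA[<element name="point" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="x">
    <!-- inherits the XML Schema library from "point" -->
    <data type="double"/>
  </element>
  <!-- hypothetical library URI and type name, for illustration only -->
  <element name="label" datatypeLibrary="http://example.org/hypothetical-datatypes">
    <data type="labelText"/>
  </element>
</element>]]></programlisting>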

<para>If the children of an element or an attribute match a
<literal>data</literal> pattern, then the complete content of the element or
attribute must match that <literal>data</literal> pattern.  It is not
permitted to have a pattern which allows part of the content to match
a <literal>data</literal> pattern, and another part to match another
pattern. For example, the following pattern is <emphasis role="strong">not</emphasis>
allowed:</para>

<programlisting><![CDATA[<element name="bad">
  <data type="int"/>
  <element name="note">
    <text/>
  </element>
</element>]]></programlisting>

<para>However, this would be fine:</para>

<programlisting><![CDATA[<element name="ok">
  <data type="int"/>
  <attribute name="note">
    <text/>
  </attribute>
</element>]]></programlisting>

<para>Note that this restriction does not apply to the
<literal>text</literal> pattern.</para>

<para>Datatypes may have parameters. For example, a string datatype may
have a parameter controlling the length of the string.  The parameters
applicable to any particular datatype are determined by the datatyping
vocabulary.  Parameters are specified by adding one or more
<literal>param</literal> elements as children of the <literal>data</literal>
element.  For example, the following constrains the <literal>email</literal>
element to contain a string at most 127 characters long:</para>

<programlisting><![CDATA[<element name="email">
  <data type="string">
    <param name="maxLength">127</param>
  </data>
</element>]]></programlisting>
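
<para>More than one parameter may be given.  As a sketch, assuming an
implementation that supports the W3C XML Schema Part 2 datatypes and
accepts their <literal>minInclusive</literal> and
<literal>maxInclusive</literal> facets as parameters, a hypothetical
<literal>age</literal> element could be restricted to a range of
integers:</para>

<programlisting><![CDATA[<element name="age" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="integer">
    <!-- facet names come from XML Schema Part 2; support depends on the implementation -->
    <param name="minInclusive">0</param>
    <param name="maxInclusive">150</param>
  </data>
</element>]]></programlisting>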

</section>

<section>
<title>Enumerations</title>

<para>Many markup vocabularies have attributes whose value is constrained
to be one of a set of specified values.  The <literal>value</literal> pattern
matches a string that has a specified value.  For example,</para>

<programlisting><![CDATA[<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </attribute>
</element>]]></programlisting>

<para>allows the <literal>preferredFormat</literal> attribute to have the value
<literal>html</literal> or <literal>text</literal>.  This corresponds to the
DTD:</para>

<programlisting><![CDATA[<!DOCTYPE card [
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED
  preferredFormat (html|text) #REQUIRED>
]>]]></programlisting>

<para>The <literal>value</literal> pattern is not restricted to attribute
values. For example, the following is allowed:</para>

<programlisting><![CDATA[<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
  <element name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </element>
</element>]]></programlisting>

<para>The prohibition against a <literal>data</literal> pattern's matching
only part of the content of an element also applies to
<literal>value</literal> patterns.</para>

<para>By default, the <literal>value</literal> pattern will consider the string
in the pattern to match the string in the document if the two strings
are the same after the whitespace in both strings is normalized.
Whitespace normalization strips leading and trailing white-space
characters, and collapses sequences of one or more white-space
characters to a single space character.  This corresponds to the
behaviour of an XML parser for an attribute that is declared as other
than CDATA. Thus the above pattern will match any of:</para>

<programlisting><![CDATA[<card name="John Smith" email="js@example.com" preferredFormat="html"/>
<card name="John Smith" email="js@example.com" preferredFormat="  html  "/>]]></programlisting>

<para>The way that the <literal>value</literal> pattern compares the
pattern string with the document string can be controlled by
specifying a <literal>type</literal> attribute and optionally a
<literal>datatypeLibrary</literal> attribute, which identify a
datatype in the same way as for the <literal>data</literal> pattern.
The pattern string matches the document string if they both represent
the same value of the specified datatype. Thus, whereas the
<literal>data</literal> pattern matches an arbitrary value of a
datatype, the <literal>value</literal> pattern matches a specific
value of a datatype.</para>
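
<para>As a sketch, assuming an implementation that supports the W3C
XML Schema Part 2 datatypes, the following pattern for a hypothetical
<literal>entry</literal> element compares the <literal>count</literal>
attribute as an integer rather than as a string:</para>

<programlisting><![CDATA[<element name="entry">
  <attribute name="count">
    <value type="integer" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">1</value>
  </attribute>
</element>]]></programlisting>

<para>Because <literal>1</literal> and <literal>01</literal> are both
lexical representations of the same integer value, either of the
following should match:</para>

<programlisting><![CDATA[<entry count="1"/>
<entry count="01"/>]]></programlisting>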

<para>If there is no ancestor element with a
<literal>datatypeLibrary</literal> attribute, the datatype library
defaults to a built-in RELAX NG datatype library.  This provides two
datatypes, <literal>string</literal> and <literal>token</literal>.
The built-in datatype <literal>token</literal> corresponds to the
default comparison behavior of the <literal>value</literal> pattern.
The built-in datatype <literal>string</literal> compares strings
without any whitespace normalization (other than the end-of-line and
attribute value normalization automatically performed by XML).  For
example,</para>

<programlisting><![CDATA[<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value type="string">html</value>
      <value type="string">text</value>
    </choice>
  </attribute>
</element>]]></programlisting>

<para>will <emphasis role="strong">not</emphasis> match</para>

<programlisting><![CDATA[<card name="John Smith" email="js@example.com" preferredFormat="  html  "/>]]></programlisting>

</section>


<section>
<title>Lists</title>

<para>The <literal>list</literal> pattern matches a whitespace-separated
sequence of tokens; it contains a pattern that the sequence of
individual tokens must match.  The <literal>list</literal> pattern
splits a string into a list of strings, and then matches the resulting
list of strings against the pattern inside the <literal>list</literal>
pattern.</para>

<para>For example, suppose we want to have a <literal>vector</literal>
element that contains two floating point numbers separated by
whitespace.  We could use <literal>list</literal> as follows:</para>

<programlisting><![CDATA[<element name="vector">
  <list>
    <data type="float"/>
    <data type="float"/>
  </list>
</element>]]></programlisting>

<para>Or suppose we want the <literal>vector</literal> element to
contain a list of one or more floating point numbers separated by
whitespace:</para>

<programlisting><![CDATA[<element name="vector">
  <list>
    <oneOrMore>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>]]></programlisting>

<para>Or suppose we want a <literal>path</literal> element containing
an even number of floating point numbers:</para>

<programlisting><![CDATA[<element name="path">
  <list>
    <oneOrMore>
      <data type="double"/>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>]]></programlisting>
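
<para>The pattern inside <literal>list</literal> is not limited to
<literal>data</literal>.  As a sketch, a hypothetical
<literal>formats</literal> attribute holding one or more of the tokens
<literal>html</literal> and <literal>text</literal>, separated by
whitespace, could be described by combining <literal>list</literal>
with <literal>value</literal>:</para>

<programlisting><![CDATA[<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <!-- "formats" is a hypothetical attribute used only for illustration -->
  <attribute name="formats">
    <list>
      <oneOrMore>
        <choice>
          <value>html</value>
          <value>text</value>
        </choice>
      </oneOrMore>
    </list>
  </attribute>
</element>]]></programlisting>

<para>This should match, for example,
<literal>formats="html"</literal> and
<literal>formats="text html"</literal>.</para>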

</section>

<section>
<title>Interleaving</title>

<para>The <literal>interleave</literal> pattern allows child elements to occur
in any order. For example, the following would allow the
<literal>card</literal> element to contain the <literal>name</literal> and
<literal>email</literal> elements in any order:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <interleave>
        <element name="name">
          <text/>
        </element>
        <element name="email">
          <text/>
        </element>
      </interleave>
    </element>
  </zeroOrMore>
</element>]]></programlisting>
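
<para>For instance, both <literal>card</literal> elements in the
following illustrative document should match this pattern, even though
their children appear in different orders:</para>

<programlisting><![CDATA[<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <email>fb@example.net</email>
    <name>Fred Bloggs</name>
  </card>
</addressBook>]]></programlisting>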

<para>The pattern is called <literal>interleave</literal> because of how it
works with patterns that match more than one element.  Suppose we want
to write a pattern for the HTML <literal>head</literal> element which
requires exactly one <literal>title</literal> element, at most one
<literal>base</literal> element, and zero or more <literal>style</literal>,
<literal>script</literal>, <literal>link</literal> and <literal>meta</literal> elements,
and suppose we are writing a <literal>grammar</literal> pattern that has one
definition for each element.  Then we could define the pattern for
<literal>head</literal> as follows:</para>

<programlisting><![CDATA[<define name="head">
  <element name="head">
    <interleave>
      <ref name="title"/>
      <optional>
        <ref name="base"/>
      </optional>
      <zeroOrMore>
        <ref name="style"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="script"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="link"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="meta"/>
      </zeroOrMore>
    </interleave>
  </element>
</define>]]></programlisting>

<para>Suppose we had a <literal>head</literal> element that contained a
<literal>meta</literal> element, followed by a <literal>title</literal> element,
followed by a <literal>meta</literal> element.  This would match the pattern
because it is an interleaving of a sequence of two <literal>meta</literal>
elements, which match the child pattern</para>

<programlisting><![CDATA[      <zeroOrMore>
        <ref name="meta"/>
      </zeroOrMore>]]></programlisting>

<para>and a sequence of one <literal>title</literal> element, which matches
the child pattern</para>

<programlisting><![CDATA[      <ref name="title"/>]]></programlisting>

<para>The semantics of the <literal>interleave</literal> pattern are that a
sequence of elements matches an <literal>interleave</literal> pattern if it
is an interleaving of sequences that match the child patterns of the
<literal>interleave</literal> pattern.  Note that this is different from the
<literal>&amp;</literal> connector in SGML: <literal>A* &amp; B</literal> matches
the sequence of elements <literal>A A B</literal> or the sequence of
elements <literal>B A A</literal> but not the sequence of elements <literal>A B
A</literal>.</para>
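
<para>The RELAX NG side of that comparison can be sketched with two
hypothetical empty elements, <literal>A</literal> and
<literal>B</literal>.  Under the interleaving semantics described
above, the following pattern should match all three of the sequences
<literal>A A B</literal>, <literal>B A A</literal> and <literal>A B
A</literal>:</para>

<programlisting><![CDATA[<interleave>
  <!-- A and B are hypothetical element names used only for illustration -->
  <zeroOrMore>
    <element name="A">
      <empty/>
    </element>
  </zeroOrMore>
  <element name="B">
    <empty/>
  </element>
</interleave>]]></programlisting>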

<para>One special case of <literal>interleave</literal> is very common:
interleaving <literal>&lt;text/&gt;</literal> with a pattern
<replaceable>p</replaceable> represents a pattern that matches what <replaceable>p</replaceable>
matches but also allows characters to occur as children.  The
<literal>mixed</literal> element is a shorthand for this.</para>

<programlisting><![CDATA[<mixed> ]]><replaceable>p</replaceable><![CDATA[ </mixed>]]></programlisting>

<para>is short for</para>

<programlisting><![CDATA[<interleave> <text/> ]]><replaceable>p</replaceable><![CDATA[ </interleave>]]></programlisting>
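
<para>As a sketch, a <literal>note</literal> element that allows text
freely mixed with hypothetical <literal>em</literal> child elements
could be written as:</para>

<programlisting><![CDATA[<element name="note">
  <mixed>
    <zeroOrMore>
      <!-- "em" is a hypothetical child element used only for illustration -->
      <element name="em">
        <text/>
      </element>
    </zeroOrMore>
  </mixed>
</element>]]></programlisting>

<para>This should match, for example:</para>

<programlisting><![CDATA[<note>Prefers <em>plain</em> text over <em>HTML</em> mail.</note>]]></programlisting>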

</section>

<section>
<title>Modularity</title>

<section>
<title>Referencing external patterns</title>

<para>The <literal>externalRef</literal> pattern can be used to
reference a pattern defined in a separate file.  The
<literal>externalRef</literal> element has a required
<literal>href</literal> attribute that specifies the URL of a file
containing the pattern.  The <literal>externalRef</literal> matches if
the pattern contained in the specified URL matches. Suppose, for
example, that you have a RELAX NG pattern that matches HTML inline content
stored in <literal>inline.rng</literal>:</para>

<programlisting><![CDATA[<grammar>
  <start>
    <ref name="inline"/>
  </start>

  <define name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="code">
          <ref name="inline"/>
        </element>
        <element name="em">
          <ref name="inline"/>
        </element>
        <!-- etc -->
      </choice>
    </zeroOrMore>
  </define>
</grammar>]]></programlisting>

<para>Then we could allow the <literal>note</literal> element to contain
inline HTML markup by using <literal>externalRef</literal> as follows:</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
	<element name="note">
	  <externalRef href="inline.rng"/>
	</element>
      </optional>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>For another example, suppose you have two RELAX NG patterns stored in
files <literal>pattern1.rng</literal> and <literal>pattern2.rng</literal>. Then
the following is a pattern that matches anything matched
by either of those patterns:</para>

<programlisting><![CDATA[<choice>
  <externalRef href="pattern1.rng"/>
  <externalRef href="pattern2.rng"/>
</choice>]]></programlisting>

</section>

<section>
<title>Combining definitions</title>

<para>If a grammar contains multiple definitions with the same name,
then the definitions must specify how they are to be combined into a
single definition by using the <literal>combine</literal> attribute.
The <literal>combine</literal> attribute may have the value
<literal>choice</literal> or <literal>interleave</literal>. For
example:</para>

<programlisting><![CDATA[<define name="inline.class" combine="choice">
  <element name="bold">
    <ref name="inline"/>
  </element>
</define>

<define name="inline.class" combine="choice">
  <element name="italic">
    <ref name="inline"/>
  </element>
</define>]]></programlisting>

<para>is equivalent to</para>

<programlisting><![CDATA[<define name="inline.class">
  <choice>
    <element name="bold">
      <ref name="inline"/>
    </element>
    <element name="italic">
      <ref name="inline"/>
    </element>
  </choice>
</define>]]></programlisting>

<para>When combining attributes, <literal>combine="interleave"</literal>
is typically used.  For example:</para>

<programlisting><![CDATA[<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <ref name="card.attlist"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.attlist" combine="interleave">
    <attribute name="name">
      <text/>
    </attribute>
  </define>

  <define name="card.attlist" combine="interleave">
    <attribute name="email">
      <text/>
    </attribute>
  </define>

</grammar>]]></programlisting>

<para>is equivalent to</para>

<programlisting><![CDATA[<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <ref name="card.attlist"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.attlist">
    <interleave>
      <attribute name="name">
	<text/>
      </attribute>
      <attribute name="email">
	<text/>
      </attribute>
    </interleave>
  </define>

</grammar>]]></programlisting>

<para>which is equivalent to</para>

<programlisting><![CDATA[<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <ref name="card.attlist"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="card.attlist">
    <group>
      <attribute name="name">
	<text/>
      </attribute>
      <attribute name="email">
	<text/>
      </attribute>
    </group>
  </define>

</grammar>]]></programlisting>

<para>since combining attributes with <literal>interleave</literal>
has the same effect as combining them with
<literal>group</literal>.</para>

<para>It is an error for two definitions of the same name to specify
different values for <literal>combine</literal>. Note that the order
of definitions within a grammar is not significant.</para>

<para>Multiple <literal>start</literal> elements can be combined in
the same way as multiple definitions.</para>
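
<para>For example, the following sketch (which assumes a simplified
<literal>card</literal> definition that is not part of the running
example) allows either an <literal>addressBook</literal> element or a
single <literal>card</literal> element as the document element:</para>

<programlisting><![CDATA[<grammar>

  <start combine="choice">
    <element name="addressBook">
      <zeroOrMore>
        <ref name="card"/>
      </zeroOrMore>
    </element>
  </start>

  <start combine="choice">
    <ref name="card"/>
  </start>

  <define name="card">
    <element name="card">
      <text/>
    </element>
  </define>

</grammar>]]></programlisting>

<para>Here the two <literal>start</literal> elements are combined as if
a single <literal>start</literal> element containing a
<literal>choice</literal> of the two patterns had been written.</para>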

</section>

<section>
<title>Merging grammars</title>

<para>The <literal>include</literal> element allows grammars to be
merged together. A <literal>grammar</literal> pattern may have
<literal>include</literal> elements as children.  An
<literal>include</literal> element has a required
<literal>href</literal> attribute that specifies the URL of a file
containing a <literal>grammar</literal> pattern.  The definitions in
the referenced <literal>grammar</literal> pattern will be included in
the <literal>grammar</literal> pattern containing the
<literal>include</literal> element.</para>

<para>The <literal>combine</literal> attribute is particularly useful
in conjunction with <literal>include</literal>.  For example, suppose
a RELAX NG pattern <literal>inline.rng</literal> provides a pattern
for inline content, which allows <literal>bold</literal> and
<literal>italic</literal> elements arbitrarily nested:</para>

<programlisting><![CDATA[<grammar>

  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>

  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
	<ref name="inline"/>
      </element>
      <element name="italic">
	<ref name="inline"/>
      </element>
    </choice>
  </define>

</grammar>]]></programlisting>

<para>Another RELAX NG pattern could use <literal>inline.rng</literal>
and add <literal>code</literal> and <literal>em</literal> to the set
of inline elements as follows:</para>

<programlisting><![CDATA[<grammar>

  <include href="inline.rng"/>

  <start>
    <element name="doc">
      <zeroOrMore>
	<element name="p">
	  <ref name="inline"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="inline.class" combine="choice">
    <choice>
      <element name="code">
	<ref name="inline"/>
      </element>
      <element name="em">
	<ref name="inline"/>
      </element>
    </choice>
  </define>
  
</grammar>]]></programlisting>

<para>This would be equivalent to</para>

<programlisting><![CDATA[<grammar>

  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>

  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
	<ref name="inline"/>
      </element>
      <element name="italic">
	<ref name="inline"/>
      </element>
    </choice>
  </define>

  <start>
    <element name="doc">
      <zeroOrMore>
	<element name="p">
	  <ref name="inline"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="inline.class" combine="choice">
    <choice>
      <element name="code">
	<ref name="inline"/>
      </element>
      <element name="em">
	<ref name="inline"/>
      </element>
    </choice>
  </define>
  
</grammar>]]></programlisting>

<para>which is equivalent to</para>

<programlisting><![CDATA[<grammar>

  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>

  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
	<ref name="inline"/>
      </element>
      <element name="italic">
	<ref name="inline"/>
      </element>
      <element name="code">
	<ref name="inline"/>
      </element>
      <element name="em">
	<ref name="inline"/>
      </element>
    </choice>
  </define>

  <start>
    <element name="doc">
      <zeroOrMore>
	<element name="p">
	  <ref name="inline"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

</grammar>]]></programlisting>

<para>Note that it is allowed for one of the definitions of a name to
omit the <literal>combine</literal> attribute.  However, it is an
error if there is more than one definition that does so.</para>
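
<para>As a sketch of this rule (reusing the names from the example
above; the fragment is illustrative only), the first two definitions
below can be combined because only one of them omits
<literal>combine</literal>, whereas adding the definition shown in the
comment would be an error:</para>

<programlisting><![CDATA[<define name="inline.class">
  <text/>
</define>

<define name="inline.class" combine="choice">
  <element name="code">
    <ref name="inline"/>
  </element>
</define>

<!-- Adding a second definition without a combine attribute,
     such as the following, would be an error:

<define name="inline.class">
  <element name="em">
    <ref name="inline"/>
  </element>
</define>
-->]]></programlisting>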

<para>The <literal>notAllowed</literal> pattern is useful when merging
grammars. The <literal>notAllowed</literal> pattern never matches
anything.  Just as adding <literal>empty</literal> to a
<literal>group</literal> makes no difference, so adding
<literal>notAllowed</literal> to a <literal>choice</literal> makes no
difference.  It is typically used to allow an including pattern to
specify additional choices with <literal>combine="choice"</literal>.
For example, if <literal>inline.rng</literal> were written like
this</para>

<programlisting><![CDATA[<grammar>

  <define name="inline">
    <zeroOrMore>
      <choice>
	<text/>
	<element name="bold">
	  <ref name="inline"/>
	</element>
	<element name="italic">
	  <ref name="inline"/>
	</element>
	<ref name="inline.extra"/>
      </choice>
    </zeroOrMore>
  </define>

  <define name="inline.extra">
    <notAllowed/>
  </define>

</grammar>]]></programlisting>

<para>then it could be customized to allow inline
<literal>code</literal> and <literal>em</literal> elements as
follows:</para>

<programlisting><![CDATA[<grammar>

  <include href="inline.rng"/>

  <start>
    <element name="doc">
      <zeroOrMore>
	<element name="p">
	  <ref name="inline"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="inline.extra" combine="choice">
    <choice>
      <element name="code">
	<ref name="inline"/>
      </element>
      <element name="em">
	<ref name="inline"/>
      </element>
    </choice>
  </define>
  
</grammar>]]></programlisting>
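
<para>Once the two definitions of <literal>inline.extra</literal> are
combined, the <literal>notAllowed</literal> branch contributes nothing
to the choice, so the result behaves as if
<literal>inline.extra</literal> had been defined as follows (a sketch
of the net effect, not the literal output of any particular
processor):</para>

<programlisting><![CDATA[<define name="inline.extra">
  <choice>
    <element name="code">
      <ref name="inline"/>
    </element>
    <element name="em">
      <ref name="inline"/>
    </element>
  </choice>
</define>]]></programlisting>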

</section>
 
<section>
<title>Replacing definitions</title>

<para>RELAX NG allows <literal>define</literal> elements to be put
inside the <literal>include</literal> element to indicate that they
are to replace definitions in the included <literal>grammar</literal>
pattern.</para>

<para>Suppose the file <literal>addressBook.rng</literal>
contains:</para>

<programlisting><![CDATA[<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <ref name="cardContent"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>

</grammar>]]></programlisting>

<para>Suppose we wish to modify this pattern so that the
<literal>card</literal> element contains an
<literal>emailAddress</literal> element instead of an
<literal>email</literal> element. Then we could replace the definition
of <literal>cardContent</literal> as follows:</para>

<programlisting><![CDATA[<grammar>

  <include href="addressBook.rng">

    <define name="cardContent">
      <element name="name">
	<text/>
      </element>
      <element name="emailAddress">
	<text/>
      </element>
    </define>

  </include>

</grammar>]]></programlisting>

<para>This would be equivalent to</para>

<programlisting><![CDATA[<grammar>

  <start>
    <element name="addressBook">
      <zeroOrMore>
	<element name="card">
	  <ref name="cardContent"/>
	</element>
      </zeroOrMore>
    </element>
  </start>

  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="emailAddress">
      <text/>
    </element>
  </define>

</grammar>]]></programlisting>

<para>An <literal>include</literal> element can also contain a
<literal>start</literal> element, which replaces the
<literal>start</literal> in the included grammar pattern.</para>
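
<para>For example, the following sketch (an illustration built on the
<literal>addressBook.rng</literal> file above) replaces the
<literal>start</literal> pattern so that a document consists of a
single <literal>card</literal> element, while the definition of
<literal>cardContent</literal> is taken unchanged from the included
grammar:</para>

<programlisting><![CDATA[<grammar>

  <include href="addressBook.rng">

    <start>
      <element name="card">
        <ref name="cardContent"/>
      </element>
    </start>

  </include>

</grammar>]]></programlisting>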

</section>

</section>

<section>
<title>Namespaces</title>

<para>RELAX NG is namespace-aware. Thus, it considers an element or attribute
to have both a local name and a namespace URI which together
constitute the name of that element or attribute.</para>

<section>
<title>Using the <literal>ns</literal> attribute</title>

<para>The <literal>element</literal> pattern uses a <literal>ns</literal> attribute
to specify the namespace URI of the elements that it matches.  For
example</para>

<programlisting><![CDATA[<element name="foo" ns="http://www.example.com">
  <empty/>
</element>]]></programlisting>

<para>would match any of</para>

<programlisting><![CDATA[<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>
<example:foo xmlns:example="http://www.example.com"/>]]></programlisting>

<para>but not any of</para>

<programlisting><![CDATA[<foo/>
<e:foo xmlns:e="http://WWW.EXAMPLE.COM"/>
<example:foo xmlns:example="http://www.example.net"/>]]></programlisting>

<para>A value of an empty string for the <literal>ns</literal> attribute
indicates a null or absent namespace URI (just as with the
<literal>xmlns</literal> attribute).  Thus, the pattern</para>

<programlisting><![CDATA[<element name="foo" ns="">
  <empty/>
</element>]]></programlisting>

<para>matches any of</para>

<programlisting><![CDATA[<foo xmlns=""/>
<foo/>]]></programlisting>

<para>but not any of</para>

<programlisting><![CDATA[<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>]]></programlisting>

<para>It is tedious and error-prone to specify the <literal>ns</literal>
attribute on every <literal>element</literal>, so RELAX NG allows it to be
defaulted.  If an <literal>element</literal> pattern does not specify a
<literal>ns</literal> attribute, then it defaults to the value of the
<literal>ns</literal> attribute of the nearest ancestor that has a
<literal>ns</literal> attribute, or the empty string if there is no such
ancestor. Thus</para>

<programlisting><![CDATA[<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>is equivalent to</para>

<programlisting><![CDATA[<element name="addressBook" ns="">
  <zeroOrMore>
    <element name="card" ns="">
      <element name="name" ns="">
        <text/>
      </element>
      <element name="email" ns="">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>and</para>

<programlisting><![CDATA[<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>is equivalent to</para>

<programlisting><![CDATA[<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com">
      <element name="name" ns="http://www.example.com">
        <text/>
      </element>
      <element name="email" ns="http://www.example.com">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>The <literal>attribute</literal> pattern also takes a
<literal>ns</literal> attribute.  However, there is a
difference in how it defaults.  This is because the
XML Namespaces Recommendation does not apply the default namespace to
attributes.  If a <literal>ns</literal> attribute is not
specified on the <literal>attribute</literal> pattern, then it
defaults to the empty string. Thus</para>

<programlisting><![CDATA[<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <attribute name="name"/>
      <attribute name="email"/>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>is equivalent to</para>

<programlisting><![CDATA[<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com">
      <attribute name="name" ns=""/>
      <attribute name="email" ns=""/>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>and so will match</para>

<programlisting><![CDATA[<addressBook xmlns="http://www.example.com">
  <card name="John Smith" email="js@example.com"/>
</addressBook>]]></programlisting>

<para>or</para>

<programlisting><![CDATA[<example:addressBook xmlns:example="http://www.example.com">
  <example:card name="John Smith" email="js@example.com"/>
</example:addressBook>]]></programlisting>

<para>but not</para>

<programlisting><![CDATA[<example:addressBook xmlns:example="http://www.example.com">
  <example:card example:name="John Smith" example:email="js@example.com"/>
</example:addressBook>]]></programlisting>

</section>

<section>
<title>Qualified names</title>

<para>When a pattern matches elements and attributes from multiple
namespaces, using the <literal>ns</literal> attribute would require
repeating namespace URIs in different places in the pattern.  This is
error-prone and hard to maintain, so RELAX NG also allows the
<literal>element</literal> and <literal>attribute</literal> patterns to use a
prefix in the value of the <literal>name</literal> attribute to specify the
namespace URI. In this case, the prefix specifies the namespace URI to
which that prefix is bound by the namespace declarations in scope on
the <literal>element</literal> or <literal>attribute</literal> pattern. Thus</para>

<programlisting><![CDATA[<element name="e:addressBook" xmlns:e="http://www.example.com">
  <zeroOrMore>
    <element name="e:card">
      <element name="e:name">
        <text/>
      </element>
      <element name="e:email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>is equivalent to</para>

<programlisting><![CDATA[<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com">
      <element name="name" ns="http://www.example.com">
        <text/>
      </element>
      <element name="email" ns="http://www.example.com">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>If a prefix is specified in the value of the <literal>name</literal>
attribute of an <literal>element</literal> or <literal>attribute</literal>
pattern, then that prefix determines the namespace URI of the elements
or attributes that will be matched by that pattern, regardless of
the value of any <literal>ns</literal> attribute.</para>
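
<para>For example, in the following pattern (an illustrative sketch),
the prefix <literal>e</literal> takes precedence over the
<literal>ns</literal> attribute:</para>

<programlisting><![CDATA[<element name="e:foo" ns="http://www.example.net"
         xmlns:e="http://www.example.com">
  <empty/>
</element>]]></programlisting>

<para>so it matches</para>

<programlisting><![CDATA[<foo xmlns="http://www.example.com"/>]]></programlisting>

<para>but not</para>

<programlisting><![CDATA[<foo xmlns="http://www.example.net"/>]]></programlisting>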

<para>Note that the XML default namespace (as specified by the
<literal>xmlns</literal> attribute) is not used in determining the namespace
URI of elements and attributes that <literal>element</literal> and
<literal>attribute</literal> patterns match.</para>
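
<para>For example, in the following sketch (the namespace URI is the
one used for RELAX NG elements elsewhere in this tutorial), the
<literal>xmlns</literal> declaration only identifies
<literal>element</literal> as a RELAX NG element; it has no effect on
the names that the pattern matches:</para>

<programlisting><![CDATA[<element name="foo" xmlns="http://relaxng.org/ns/structure/0.9">
  <empty/>
</element>]]></programlisting>

<para>Since there is no <literal>ns</literal> attribute, this matches a
<literal>foo</literal> element with an absent namespace URI, such
as</para>

<programlisting><![CDATA[<foo/>]]></programlisting>

<para>and not a <literal>foo</literal> element in the RELAX NG
namespace.</para>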

</section>


</section>

<section>
<title>Name classes</title>

<para>Normally, the name of the element to be matched by an
<literal>element</literal> element is specified by a <literal>name</literal>
attribute.  An <literal>element</literal> element can instead start with an
element specifying a <emphasis>name-class</emphasis>.  In this case, the
<literal>element</literal> pattern will only match an element if the name of
the element is a member of the name-class.  The simplest name-class is
<literal>anyName</literal>, which any name at all is a member of, regardless
of its local name and its namespace URI.  For example, the following
pattern matches any well-formed XML document:</para>

<programlisting><![CDATA[<grammar>

  <start>
    <ref name="anyElement"/>
  </start>

  <define name="anyElement">
    <element>
      <anyName/>
      <zeroOrMore>
	<choice>
	  <attribute>
	    <anyName/>
	  </attribute>
	  <text/>
	  <ref name="anyElement"/>
	</choice>
      </zeroOrMore>
    </element>
  </define>

</grammar>]]></programlisting>

<para>The <literal>nsName</literal> name-class contains any
name with the namespace URI specified by the
<literal>ns</literal> attribute, which defaults in the same way
as the <literal>ns</literal> attribute on the
<literal>element</literal> pattern.</para>
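
<para>For example, the following pattern (a minimal sketch) matches an
element with any local name, provided that its namespace URI is
<literal>http://www.example.com</literal> and that it contains only
text:</para>

<programlisting><![CDATA[<element>
  <nsName ns="http://www.example.com"/>
  <text/>
</element>]]></programlisting>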

<para>The <literal>choice</literal> name-class matches any name that is a
member of any of its child name-classes.</para>
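
<para>For example, the following pattern (a minimal sketch using two
illustrative namespace URIs) matches an element from either of two
namespaces:</para>

<programlisting><![CDATA[<element>
  <choice>
    <nsName ns="http://www.example.com"/>
    <nsName ns="http://www.example.net"/>
  </choice>
  <text/>
</element>]]></programlisting>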

<para>The <literal>anyName</literal> and <literal>nsName</literal>
name-classes can contain an <literal>except</literal> clause. For
example</para>

<programlisting><![CDATA[<element name="card" ns="http://www.example.com">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName/>
          <nsName ns=""/>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
  <text/>
</element>]]></programlisting>

<para>would allow the <literal>card</literal> element to have any number of
namespace-qualified attributes provided that they were qualified with
a namespace other than that of the <literal>card</literal> element.</para>

<para>Note that an <literal>attribute</literal> pattern matches a single
attribute even if it has a name-class that contains multiple names.
To match zero or more attributes, the <literal>zeroOrMore</literal> element
must be used.</para>

<para>The <literal>name</literal> name-class contains a single name.
The content of the <literal>name</literal> element specifies the name
in the same way as the <literal>name</literal> attribute of the
<literal>element</literal> pattern.  The <literal>ns</literal>
attribute specifies the namespace URI in the same way as the
<literal>element</literal> pattern.</para>
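
<para>For example, the following pattern (a minimal sketch)</para>

<programlisting><![CDATA[<element>
  <name ns="http://www.example.com">card</name>
  <text/>
</element>]]></programlisting>

<para>matches the same elements as</para>

<programlisting><![CDATA[<element name="card" ns="http://www.example.com">
  <text/>
</element>]]></programlisting>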

<para>Some schema languages have a concept of <emphasis>lax</emphasis> validation,
where an element or attribute is validated against a definition only
if there is one.  We can implement this concept in RELAX NG with name
classes that use <literal>except</literal> and <literal>name</literal>.
Suppose, for example, we wanted to allow an element to have any
attribute with a qualified name, but we still wanted to ensure that if
there was an <literal>xml:space</literal> attribute, it had the value
<literal>default</literal> or <literal>preserve</literal>.  It wouldn't work to
use:</para>

<programlisting><![CDATA[<element name="example">
  <zeroOrMore>
    <attribute>
      <anyName/>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>]]></programlisting>

<para>because an <literal>xml:space</literal> attribute with a value
other than <literal>default</literal> or <literal>preserve</literal>
would match</para>

<programlisting><![CDATA[    <attribute>
      <anyName/>
    </attribute>]]></programlisting>

<para>even though it did not match</para>

<programlisting><![CDATA[    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>]]></programlisting>

<para>The solution is to use <literal>name</literal> together with
<literal>except</literal>:</para>

<programlisting><![CDATA[<element name="example">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <name>xml:space</name>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>]]></programlisting>
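
<para>With this pattern, an instance such as the following (the
<literal>id</literal> attribute is made up for illustration) is
matched:</para>

<programlisting><![CDATA[<example id="e1" xml:space="preserve"/>]]></programlisting>

<para>whereas the following is not, because the
<literal>xml:space</literal> attribute is excluded from the
<literal>anyName</literal> name-class and its value is neither
<literal>default</literal> nor <literal>preserve</literal>:</para>

<programlisting><![CDATA[<example id="e1" xml:space="collapse"/>]]></programlisting>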

<para>Note that the <literal>define</literal> element cannot contain a
name-class; it can only contain a pattern.</para>

</section>

<section>
<title>Annotations</title>

<para>If a RELAX NG element has an attribute or child element with a
namespace URI other than the RELAX NG namespace, then that attribute or
element is ignored.  Thus, you can add annotations to RELAX NG patterns
simply by using an attribute or element in a separate namespace:</para>

<programlisting><![CDATA[<element name="addressBook" xmlns="http://relaxng.org/ns/structure/0.9" xmlns:a="http://www.example.com/annotation">
  <zeroOrMore>
    <element name="card">
      <a:documentation>Information about a single email address.</a:documentation>
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>]]></programlisting>

<para>RELAX NG also provides a <literal>div</literal> element which
allows an annotation to be applied to a group of definitions in a
grammar.  For example, you might want to divide up the definitions of
the grammar into modules:</para>

<programlisting>&lt;grammar xmlns:m="http://www.example.com/module">

  &lt;div m:name="inline">

    &lt;define name="code"> <replaceable>pattern</replaceable> &lt;/define>
    &lt;define name="em"> <replaceable>pattern</replaceable> &lt;/define>
    &lt;define name="var"> <replaceable>pattern</replaceable> &lt;/define>

  &lt;/div>

  &lt;div m:name="block">

    &lt;define name="p"> <replaceable>pattern</replaceable> &lt;/define>
    &lt;define name="ul"> <replaceable>pattern</replaceable> &lt;/define>
    &lt;define name="ol"> <replaceable>pattern</replaceable> &lt;/define>

  &lt;/div>

&lt;/grammar></programlisting>

<para>This would allow you to easily generate variants of the grammar
based on a selection of modules.</para>

<para>A companion specification, RELAX NG DTD Compatibility
Annotations <xref linkend="anno"/>, defines annotations to implement
some features of XML DTDs.</para>

</section>

<section>
<title>Nested grammars</title>

<para>There is no prohibition against nesting grammar patterns.  A
<literal>ref</literal> pattern refers to a definition from the nearest
<literal>grammar</literal> ancestor. There is also a
<literal>parentRef</literal> element that escapes out of the current
grammar and references a definition from the parent of the current
grammar.</para>

<para>Imagine the problem of writing a pattern for tables.  The pattern
for tables only cares about the structure of tables; it doesn't care
about what goes inside a table cell.  First, we create a RELAX NG pattern
<literal>table.rng</literal> as follows:</para>

<programlisting><![CDATA[<grammar>

<define name="cell.content">
  <notAllowed/>
</define>

<start>
  <element name="table">
    <oneOrMore>
      <element name="tr">
        <oneOrMore>
	  <element name="td">
	    <ref name="cell.content"/>
	  </element>
        </oneOrMore>
      </element>
    </oneOrMore>
  </element>
</start>

</grammar>]]></programlisting>

<para>Patterns that include <literal>table.rng</literal> must redefine
<literal>cell.content</literal>. By using a nested
<literal>grammar</literal> pattern containing a
<literal>parentRef</literal> pattern, the including pattern can
redefine <literal>cell.content</literal> to be a pattern defined in
the including pattern's grammar, thus effectively importing a pattern
from the parent grammar into the child grammar:</para>

<programlisting><![CDATA[<grammar>

<start>
  <element name="doc">
    <zeroOrMore>
      <choice>
	<element name="p">
	  <ref name="inline"/>
	</element>
	<grammar>
	  <include href="table.rng">
	    <define name="cell.content">
	      <parentRef name="inline"/>
	    </define>
          </include>
	</grammar>
      </choice>
    </zeroOrMore>
  </element>
</start>

<define name="inline">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </zeroOrMore>
</define>

</grammar>]]></programlisting>

<para>Of course, in a trivial case like this, there is no advantage in
nesting the grammars: we could simply have included
<literal>table.rng</literal> within the outer <literal>grammar</literal> element.
However, when the included grammar has many definitions, nesting it
avoids the possibility of name conflicts between the including grammar
and the included grammar.</para>

</section>

<section>
<title>Non-restrictions</title>

<para>RELAX NG does not require patterns to be "deterministic" or
"unambiguous".</para>

<para>Suppose we wanted to write the email address book in HTML, but use
class attributes to specify the structure.</para>

<programlisting><![CDATA[<element name="html">
  <element name="head">
    <element name="title">
      <text/>
    </element>
  </element>
  <element name="body">
    <element name="table">
      <attribute name="class">
        <value>addressBook</value>
      </attribute>
      <oneOrMore>
        <element name="tr">
	  <attribute name="class">
	    <value>card</value>
	  </attribute>
          <element name="td">
	    <attribute name="class">
	      <value>name</value>
	    </attribute>
            <interleave>
              <text/>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>givenName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>familyName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
            </interleave>
          </element>
          <element name="td">
	    <attribute name="class">
	      <value>email</value>
	    </attribute>
            <text/>
          </element>
        </element>
      </oneOrMore>
    </element>
  </element>
</element>]]></programlisting>

<para>This would match an XML document such as:</para>

<programlisting><![CDATA[<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <span class="familyName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>]]></programlisting>

<para>but not</para>

<programlisting><![CDATA[<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <!-- Note the incorrect class attribute -->
          <span class="givenName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>]]></programlisting>

</section>

<section>
<title>Further information</title>

<para>The definitive specification of RELAX NG is <xref
linkend="spec"/>.</para>

</section>

<appendix>
<title>Comparison with XML DTDs</title>

<para>RELAX NG provides functionality that goes beyond XML DTDs. In
particular, RELAX NG</para>

<itemizedlist>

<listitem><para>uses XML syntax to represent schemas</para></listitem>

<listitem><para>supports datatyping</para></listitem>

<listitem><para>integrates attributes into content
models</para></listitem>

<listitem><para>supports XML namespaces</para></listitem>

<listitem><para>supports unordered content</para></listitem>

<listitem><para>supports context-sensitive content
models</para></listitem>

</itemizedlist>
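
<para>As an illustrative sketch, the following pattern combines several
of these features.  The element and attribute names are hypothetical,
and the pattern assumes that the validator supports the W3C XML Schema
datatype library.  The schema is itself XML, an optional attribute
appears in the content model alongside the child elements, the children
may occur in either order, and one child is constrained by a
datatype.</para>

<programlisting>
&lt;element name="card"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  &lt;!-- The attribute is part of the content model. -->
  &lt;optional>
    &lt;attribute name="source"/>
  &lt;/optional>
  &lt;!-- Unordered content: name and age may occur in either order. -->
  &lt;interleave>
    &lt;element name="name">
      &lt;text/>
    &lt;/element>
    &lt;element name="age">
      &lt;!-- Datatyping: the content must be a non-negative integer. -->
      &lt;data type="nonNegativeInteger"/>
    &lt;/element>
  &lt;/interleave>
&lt;/element>
</programlisting>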

<para>ID/IDREF validation is not provided by RELAX NG; however, it is
provided by a companion specification, RELAX NG DTD Compatibility
Annotations <xref linkend="anno"/>.  Comprehensive support for
cross-reference checking is planned for a future specification.</para>

<para>RELAX NG does not support features of XML DTDs that involve
changing the infoset of an XML document.  In particular, RELAX
NG</para>

<itemizedlist>

<listitem><para>does not allow defaults for attributes to be
specified; however, this is allowed by RELAX NG DTD Compatibility
Annotations <xref linkend="anno"/></para></listitem>

<listitem><para>does not allow entities to be specified</para></listitem>

<listitem><para>does not allow notations to be specified</para></listitem>

<listitem><para>does not specify whether whitespace is significant</para></listitem>

</itemizedlist>
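
<para>For instance, an attribute default might be declared with the
<literal>a:defaultValue</literal> annotation attribute, as in the sketch
below.  The namespace URI shown is the one used by version 1.0 of the
Compatibility Annotations and is an assumption here; it should be
checked against the version of <xref linkend="anno"/> in use.  The
element and attribute names are hypothetical.</para>

<programlisting>
&lt;element name="para"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  &lt;!-- The attribute is optional; the annotation records its default. -->
  &lt;optional>
    &lt;attribute name="align" a:defaultValue="left">
      &lt;choice>
        &lt;value>left&lt;/value>
        &lt;value>right&lt;/value>
      &lt;/choice>
    &lt;/attribute>
  &lt;/optional>
  &lt;text/>
&lt;/element>
</programlisting>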

<para>RELAX NG also does not define a way for an XML document to
associate itself with a RELAX NG pattern.</para>

</appendix>

<appendix>

<title>Comparison with RELAX Core</title>

<para>Any description in RELAX Core can be directly captured in RELAX
NG without loss of information.</para>

<section>
<title>Mapping RELAX Core to RELAX NG</title>

<section>
<title><literal>elementRule</literal>-<literal>tag</literal> pairs</title>

<para>An <literal>elementRule</literal>, together with the
<literal>tag</literal> element it references, is typically captured by a
<literal>define</literal> element containing an
<literal>element</literal> element as its child.</para>

<para>An <literal>elementRule</literal>-<literal>tag</literal> pair 
in RELAX Core is shown below:</para>

<programlisting>
&lt;elementRule role="foo" label="bar">
  <replaceable>hedge model</replaceable>
&lt;/elementRule>
</programlisting>

<programlisting>
&lt;tag role="foo" name="baz">
  <replaceable>attribute declarations</replaceable>
&lt;/tag>
</programlisting>

<para>A rewrite in RELAX NG is shown below:</para>

<programlisting>
&lt;define name="bar">
  &lt;element name="baz">
    <replaceable>hedge model</replaceable>
    <replaceable>attribute declarations</replaceable>
  &lt;/element>
&lt;/define>
</programlisting>

</section>
<section>
<title><literal>hedgeRule</literal></title>

<para>A <literal>hedgeRule</literal> element is captured by a
<literal>define</literal> element containing the hedge
model.</para>

<para>A <literal>hedgeRule</literal> element
in RELAX Core is shown below:</para>

<programlisting>
&lt;hedgeRule label="bar">
  <replaceable>hedge model</replaceable>
&lt;/hedgeRule>
</programlisting>

<para>A rewrite in RELAX NG is</para>

<programlisting>
&lt;define name="bar">
  <replaceable>hedge model</replaceable>
&lt;/define>
</programlisting>

</section>

<section>
<title><literal>attPool</literal></title>

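<para>An <literal>attPool</literal> element is captured by a
<literal>define</literal> element containing attribute
declarations.</para>

<para>An <literal>attPool</literal> element
in RELAX Core is shown below:</para>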

<programlisting>
&lt;attPool role="foo">
  <replaceable>attribute declarations</replaceable>
&lt;/attPool>
</programlisting>

<para>A rewrite in RELAX NG is</para>

<programlisting>
&lt;define name="foo">
  <replaceable>attribute declarations</replaceable>
&lt;/define>
</programlisting>
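
<para>Because RELAX NG treats attributes as part of the content model,
such a definition can be referenced with <literal>ref</literal> from
inside an <literal>element</literal> pattern, next to the element's
content.  The following is a sketch only; the <literal>class</literal>
attribute is a hypothetical stand-in for the attribute
declarations.</para>

<programlisting>
&lt;define name="foo">
  &lt;optional>
    &lt;attribute name="class"/>
  &lt;/optional>
&lt;/define>

&lt;define name="bar">
  &lt;element name="baz">
    &lt;!-- Pulls in the attributes declared by "foo". -->
    &lt;ref name="foo"/>
    &lt;text/>
  &lt;/element>
&lt;/define>
</programlisting>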

</section>

<section>
<title>Hedge models</title>

<para>The mapping of hedge models from RELAX Core to RELAX NG is
summarized below:</para>

<orderedlist>
<listitem><para>
<literal>occurs="*"</literal> in RELAX Core 
is captured by
<literal>&lt;zeroOrMore>...&lt;/zeroOrMore></literal>.
</para></listitem>
<listitem><para>
<literal>occurs="+"</literal> in RELAX Core 
is captured by
<literal>&lt;oneOrMore>...&lt;/oneOrMore></literal>.
</para></listitem>
<listitem><para>
<literal>occurs="?"</literal> in RELAX Core 
is captured by
<literal>&lt;optional>...&lt;/optional></literal>.
</para></listitem>

<listitem><para>
<literal>&lt;mixed>...&lt;/mixed></literal> in
RELAX Core is captured by
<literal>&lt;mixed>...&lt;/mixed></literal>.
</para></listitem>

<listitem><para>
<literal>&lt;ref label="..."/></literal> in
RELAX Core is captured by
<literal>&lt;ref name="..."/></literal>.
</para></listitem>

<listitem><para>
<literal>&lt;hedgeRef label="..."/></literal> in
RELAX Core is captured by
<literal>&lt;ref name="..."/></literal>.
</para></listitem>
</orderedlist>
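
<para>As a small worked instance of these rules, using a hypothetical
label <literal>fnote</literal>, a RELAX Core hedge model such as</para>

<programlisting>
&lt;mixed>
  &lt;ref label="fnote" occurs="*"/>
&lt;/mixed>
</programlisting>

<para>would be written in RELAX NG as follows.  The
<literal>occurs</literal> attribute becomes a wrapper element around
the reference.</para>

<programlisting>
&lt;mixed>
  &lt;zeroOrMore>
    &lt;ref name="fnote"/>
  &lt;/zeroOrMore>
&lt;/mixed>
</programlisting>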

</section>

<section>
<title>Attribute declarations</title>

<para>Both languages use <literal>attribute</literal>.  However, in
RELAX Core, an <literal>attribute</literal> without
<literal>required="true"</literal> declares an optional (defaultable)
attribute.  In RELAX NG, by contrast, an optional attribute has to
be declared by an <literal>attribute</literal> element within 
an <literal>optional</literal> element.</para>

<para>Declaration of a required attribute in RELAX Core is shown below:</para>

<programlisting>
&lt;attribute name="foo" type="integer" required="true"/>
</programlisting>

<para>In RELAX NG, this is captured by</para>

<programlisting>
&lt;attribute name="foo">
  &lt;data type="integer"/>
&lt;/attribute>
</programlisting>

<para>Declaration of an optional attribute in RELAX Core is shown
below:</para>

<programlisting>
&lt;attribute name="foo" type="integer"/>
</programlisting>

<para>In RELAX NG, this is captured by</para>

<programlisting>
&lt;optional>
  &lt;attribute name="foo">
    &lt;data type="integer"/>
  &lt;/attribute>
&lt;/optional>
</programlisting>

</section>

</section>

<section>
<title>Examples</title>

<section>
<title>Ancestor-and-sibling-sensitive content models</title>

<para>Here is a rewrite of an example in <ulink
url="http://www.xml.gr.jp/relax/html4/howToRELAX_p1_c8_en.html">STEP
7</ulink> of "HOW TO RELAX".  The first paragraph cannot contain
footnotes, but the other paragraphs can.</para>

<programlisting>
&lt;grammar>
  &lt;start>
    &lt;element name="doc">
      &lt;ref name="paraWithoutFNotes"/>
      &lt;zeroOrMore>
        &lt;ref name="paraWithFNotes"/>
      &lt;/zeroOrMore>
    &lt;/element>
  &lt;/start>

  &lt;define name="paraWithoutFNotes">
    &lt;element name="para">
      &lt;text/>
    &lt;/element>
  &lt;/define>

  &lt;define name="paraWithFNotes">
    &lt;element name="para">
      &lt;mixed>
        &lt;zeroOrMore>
          &lt;element name="fnote">
            &lt;text/>
          &lt;/element>
        &lt;/zeroOrMore>
      &lt;/mixed>
    &lt;/element>
  &lt;/define>

&lt;/grammar>
</programlisting>

<para>The following document matches this pattern.</para>

<programlisting>
&lt;doc>&lt;para/>&lt;para>&lt;fnote/>&lt;/para>&lt;/doc>
</programlisting>

<para>On the other hand, the following document does not.</para>

<programlisting>
&lt;doc>&lt;para>&lt;fnote/>&lt;/para>&lt;/doc>
</programlisting>

</section>

<section>
<title>Attribute-sensitive content model</title>

<para>Here is a rewrite of an example in <ulink
url="http://www.xml.gr.jp/relax/html4/howToRELAX_p1_c9_en.html">STEP
8</ulink> of "HOW TO RELAX".  This pattern assigns different content
models to the same tag name <literal>div</literal> depending on the
value of the attribute <literal>class</literal>.</para>

<programlisting>
&lt;grammar>

  &lt;start>
    &lt;element name="html">
      &lt;zeroOrMore>
        &lt;ref name="section"/>
      &lt;/zeroOrMore>
    &lt;/element>
  &lt;/start>

  &lt;define name="section">
    &lt;element name="div">
      &lt;attribute name="class">&lt;value>section&lt;/value>&lt;/attribute>
      &lt;zeroOrMore>
        &lt;element name="para">
          &lt;text/>
        &lt;/element>
      &lt;/zeroOrMore>
      &lt;zeroOrMore>
        &lt;ref name="subsection"/>
      &lt;/zeroOrMore>
    &lt;/element>
  &lt;/define>

  &lt;define name="subsection">
    &lt;element name="div">
      &lt;attribute name="class">&lt;value>subsection&lt;/value>&lt;/attribute>
      &lt;zeroOrMore>
        &lt;element name="para">
          &lt;text/>
        &lt;/element>
      &lt;/zeroOrMore>
    &lt;/element>
  &lt;/define>

&lt;/grammar>
</programlisting>

<para>The following document matches this pattern.</para>

<programlisting>
&lt;html>
  &lt;div class="section">
    &lt;para/>
    &lt;div class="subsection">
      &lt;para/>
    &lt;/div>
  &lt;/div>
  &lt;div class="section">
    &lt;div class="subsection">
      &lt;para/>
    &lt;/div>
  &lt;/div>
&lt;/html>
</programlisting>

<para>On the other hand, the following document does not.</para>

<programlisting>
&lt;html>
  &lt;div class="subsection">
    &lt;para/>
    &lt;div class="section">
      &lt;para/>
    &lt;/div>
  &lt;/div>
&lt;/html>
</programlisting>

</section>

</section>

<section>
<title>Features of RELAX NG beyond RELAX Core</title>

<para>RELAX NG has some features that are missing from RELAX
Core.</para>

<orderedlist>
<listitem><para>Namespaces: since RELAX Core is intended to be used in
conjunction with RELAX Namespace, RELAX Core does not support
namespaces.  On the other hand, RELAX NG supports namespaces.  RELAX
Namespace will be extended so that it can work with RELAX NG.
</para></listitem>

<listitem><para>Mixture of <literal>element</literal> and
<literal>attribute</literal>: RELAX Core does not allow their 
mixture but rather provides two types of basic constructs, 
namely <literal>elementRule/hedgeRule</literal> and 
<literal>tag/attPool</literal>.</para></listitem>

<listitem><para>Name classes: RELAX Core does not have name
classes but merely provides name literals.</para></listitem>

<listitem><para><literal>interleave</literal>:  RELAX Core does not 
provide any mechanism for interleaving.</para></listitem>

<listitem><para>Datatype libraries: RELAX Core allows XML Schema Part
2 but does not allow other datatype libraries.</para></listitem>

<listitem><para><literal>define</literal> in <literal>include</literal>: 
RELAX Core does not allow such redefinitions.</para></listitem>

<listitem><para><literal>list</literal>: RELAX Core does not provide
such structured strings.</para></listitem>

<listitem><para><literal>data</literal> in <literal>choice</literal>:
in RELAX Core, the hedge model of <literal>elementRule</literal> is 
either a datatype reference or an expression without datatype 
references.</para></listitem>

</orderedlist>
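
<para>Several of these features can be seen together in the sketch
below.  All element names are hypothetical, and the pattern assumes
that the validator supports the W3C XML Schema datatype library.</para>

<programlisting>
&lt;element name="shape"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  &lt;!-- interleave: the two children may appear in either order. -->
  &lt;interleave>
    &lt;element name="origin">
      &lt;!-- list: exactly two whitespace-separated numbers. -->
      &lt;list>
        &lt;data type="double"/>
        &lt;data type="double"/>
      &lt;/list>
    &lt;/element>
    &lt;element>
      &lt;!-- Name class: the element may be named width or size. -->
      &lt;choice>
        &lt;name>width&lt;/name>
        &lt;name>size&lt;/name>
      &lt;/choice>
      &lt;!-- data in choice: either a number or the literal "auto". -->
      &lt;choice>
        &lt;data type="double"/>
        &lt;value>auto&lt;/value>
      &lt;/choice>
    &lt;/element>
  &lt;/interleave>
&lt;/element>
</programlisting>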
</section>
</appendix>

<appendix>
<title>Comparison with TREX</title>

<para>RELAX NG has the following changes from TREX:</para>

<orderedlist>

<listitem><para>the <literal>concur</literal> pattern has been removed</para></listitem>

<listitem><para>the <literal>string</literal> pattern has been replaced by the
<literal>value</literal> pattern</para></listitem>

<listitem><para>the <literal>anyString</literal> pattern has been renamed to
<literal>text</literal></para></listitem>

<listitem><para>the namespace URI is different</para></listitem>

<listitem><para>pattern elements must be namespace qualified</para></listitem>

<listitem><para>anonymous datatypes have been removed</para></listitem>

<listitem><para>the <literal>data</literal> pattern can have parameters specified by
<literal>param</literal> child elements</para></listitem>

<listitem><para>the <literal>list</literal> pattern has been added
for matching whitespace-separated lists of tokens</para></listitem>

<listitem><para>the <literal>replace</literal> and
<literal>group</literal> values for the <literal>combine</literal>
attribute have been removed</para></listitem>

<listitem><para>an <literal>include</literal> element in a grammar may contain
<literal>define</literal> elements that replace included definitions</para></listitem>

<listitem><para>the restriction that definitions combined with the
<literal>combine</literal> attribute must be from different files has
been removed</para></listitem>

<listitem><para>a <literal>div</literal> element may be used to group
together definitions within a
<literal>grammar</literal></para></listitem>

<listitem><para>an <literal>include</literal> element occurring as a
pattern has been renamed to <literal>externalRef</literal>; an
<literal>include</literal> element is now allowed only as a child of
the <literal>grammar</literal> element</para></listitem>

<listitem><para>the <literal>parent</literal> attribute on the
<literal>ref</literal> element has been replaced by a new
<literal>parentRef</literal> element</para></listitem>

<listitem><para>the <literal>type</literal> attribute of the
<literal>data</literal> element is an unqualified name; the
<literal>data</literal> element uses the
<literal>datatypeLibrary</literal> attribute rather than the
<literal>ns</literal> attribute to identify the namespace of the
datatype</para></listitem>

<listitem><para>a <literal>start</literal> element is not allowed to
have a <literal>name</literal> attribute</para></listitem>

<listitem><para>an <literal>attribute</literal> element is not allowed
to have a <literal>global</literal> attribute</para></listitem>

<listitem><para>the <literal>not</literal> and <literal>difference</literal>
name classes have been replaced by <literal>except</literal></para></listitem>

<listitem><para>the <literal>data</literal> element may have
an <literal>except</literal> child</para></listitem>

</orderedlist>
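
<para>A few of these changes are visible in the following sketch of a
RELAX NG grammar.  The element and attribute names are hypothetical.
The namespace declaration that a complete schema requires is left out,
as in the listings of the preceding appendix.</para>

<programlisting>
&lt;grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  &lt;start>
    &lt;element name="item">
      &lt;!-- value is used where TREX used the string pattern. -->
      &lt;attribute name="status">
        &lt;choice>
          &lt;value>open&lt;/value>
          &lt;value>closed&lt;/value>
        &lt;/choice>
      &lt;/attribute>
      &lt;element name="quantity">
        &lt;!-- The type name is unqualified; the library is identified
             by datatypeLibrary, and parameters are param children. -->
        &lt;data type="integer">
          &lt;param name="minInclusive">1&lt;/param>
        &lt;/data>
      &lt;/element>
      &lt;!-- text is used where TREX used the anyString pattern. -->
      &lt;element name="note">
        &lt;text/>
      &lt;/element>
    &lt;/element>
  &lt;/start>
&lt;/grammar>
</programlisting>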

</appendix>

<appendix>
<title>Changes from 12 June 2001 version</title>

<orderedlist>
<listitem><para><literal>key</literal> and <literal>keyRef</literal>
have been removed; support for ID and IDREF is now available
in a companion specification, RELAX NG DTD Compatibility
Annotations <xref linkend="anno"/></para></listitem>

<listitem><para><literal>difference</literal> and <literal>not</literal>
have been replaced by <literal>except</literal></para></listitem>

<listitem><para>a <literal>start</literal> element is no longer
allowed to have a <literal>name</literal> attribute</para></listitem>

<listitem><para>an <literal>attribute</literal> element is no longer
allowed to have a <literal>global</literal>
attribute</para></listitem>
</orderedlist>

</appendix>

<bibliography><title>References</title>

<bibliomixed id="spec"><abbrev>RELAX NG</abbrev>James Clark, Makoto
MURATA, editors.  <citetitle><ulink
url="http://www.oasis-open.org/committees/relax-ng/spec.html">RELAX NG
Specification</ulink></citetitle>.  OASIS, 2001.</bibliomixed>

<bibliomixed id="anno"><abbrev>Annotation</abbrev>James Clark, Makoto
MURATA, editors.  <citetitle><ulink
url="http://www.oasis-open.org/committees/relax-ng/annotate.html">RELAX NG
DTD Compatibility Annotations</ulink></citetitle>.  OASIS, 2001.</bibliomixed>

<bibliomixed id="trex"><abbrev>TREX</abbrev>James Clark.
<citetitle><ulink url="http://www.thaiopensource.com/trex/">TREX - Tree Regular Expressions for XML</ulink></citetitle>.
Thai Open Source Software Center, 2001.</bibliomixed>

<bibliomixed id="relax"><abbrev>RELAX</abbrev>MURATA Makoto.
<citetitle><ulink url="http://www.xml.gr.jp/relax/">RELAX (Regular
Language description for XML)</ulink></citetitle>.  INSTAC
(Information Technology Research and Standardization Center), 2001.</bibliomixed>

</bibliography>

</article>
