RELAX NG Compact Syntax

Working Draft 7 June 2002

This version: Working Draft: 7 June 2002

Editor: James Clark <jjc@jclark.com>

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Abstract

This document specifies a compact, non-XML syntax for [RELAX NG].

Status of this Document

This is a working draft constructed by the editor. It is not an official committee work product and may not reflect the consensus opinion of the committee. Comments on this document may be sent to relax-ng-comment@lists.oasis-open.org.

1 Introduction

2 Syntax

3 Lexical structure

4 Declarations

5 Annotations

5.1 Initial annotations
5.2 Documentation shorthand
5.3 Following annotations
5.4 Grammar annotations

6 Conformance

Appendixes

A Formal description

A.1 Syntax

A.2 Lexical structure

A.2.1 Character encoding
A.2.2 BOM stripping
A.2.3 Newline normalization
A.2.4 Escape interpretation
A.2.5 Tokenization
A.2.6 Literal concatenation

B Compact syntax RELAX NG schema for RELAX NG (Non-Normative)

References

1. Introduction

This specification describes a compact, non-XML syntax for [RELAX NG].

The goals of this syntax are:

maximize readability;
support all features of RELAX NG; it must be possible to translate a schema from the XML syntax to the compact syntax and back without losing significant information;
support separate translation; a RELAX NG schema may be spread amongst multiple files; it must be possible to represent each of the files separately in the compact syntax; the representation of each file must not depend on the other files.

The syntax has similarities to [XQuery Formal Semantics], to [XDuce] and to the DTD syntax of [XML 1.0].

The body of this document contains an informal description of the syntax and how it maps onto the XML syntax. Developers should consult Appendix A. Formal description for a complete, rigorous description.

2. Syntax

The following is a summary of the syntax in EBNF. The reader may find it helpful to compare this with the syntax in Section 3 of [RELAX NG]. The start symbol is topLevel.

topLevel ::= decl* (pattern | grammarContent*)
decl ::= "namespace" identifierOrKeyword "=" namespaceUri
       | "default" "namespace" [identifierOrKeyword] "=" namespaceUri
       | "datatypes" identifierOrKeyword "=" literal
pattern ::= "element" nameClass "{" pattern "}"
          | "attribute" nameClass "{" pattern "}"
          | pattern ("," pattern)+
          | pattern ("&" pattern)+
          | pattern ("|" pattern)+
          | pattern "?"
          | pattern "*"
          | pattern "+"
          | "list" "{" pattern "}"
          | "mixed" "{" pattern "}"
          | identifier
          | "parent" identifier
          | "empty"
          | "text"
          | [datatypeName] datatypeValue
          | datatypeName ["{" param* "}"] [exceptPattern]
          | "notAllowed"
          | "externalRef" uri [inherit]
          | "grammar" "{" grammarContent* "}"
          | "(" pattern ")"
param ::= identifierOrKeyword "=" literal
exceptPattern ::= "-" pattern
grammarContent ::= start
                 | define
                 | "div" "{" grammarContent* "}"
                 | "include" uri [inherit] ["{" includeContent* "}"]
includeContent ::= define
                 | start
                 | "div" "{" includeContent* "}"
start ::= "start" assignMethod pattern
define ::= identifier assignMethod pattern
assignMethod ::= "=" | "|=" | "&="
nameClass ::= name
            | nsName [exceptNameClass]
            | anyName [exceptNameClass]
            | nameClass "|" nameClass
            | "(" nameClass ")"
name ::= identifierOrKeyword | CName
exceptNameClass ::= "-" nameClass
datatypeName ::= CName | "string" | "token"
datatypeValue ::= literal
uri ::= literal
namespaceUri ::= literal | "inherit"
inherit ::= "inherit" "=" identifierOrKeyword
identifierOrKeyword ::= identifier | keyword
identifier ::= (NCName - keyword) | quotedIdentifier
quotedIdentifier ::= "\" NCName
CName ::= NCName ":" NCName
nsName ::= NCName ":*"
anyName ::= "*"
literal ::= literalSegment+
literalSegment ::= '"' (Char - '"')* '"' | "'" (Char - "'")* "'"
keyword ::= "attribute" | "default" | "datatypes" | "div" | "element"
          | "empty" | "externalRef" | "grammar" | "include" | "inherit"
          | "list" | "mixed" | "namespace" | "notAllowed" | "parent"
          | "start" | "string" | "text" | "token"

NCName is defined in [XML Namespaces]. Char is defined in [XML 1.0].

A keyword can be used as an identifier only if it is quoted with a preceding \. It is not necessary to quote a keyword that is used as the name of an element or attribute or as the name of a datatype parameter.

The value of a literal is the concatenation of the values of its constituent literalSegments. The value of a literal segment consists of the characters between the opening and closing quote. The way to get a literal whose value contains both a single and a double quote is to divide the literal into multiple literalSegments so that the single and double quote are in separate literalSegments.
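The concatenation rule can be illustrated with a small sketch. The Python below is not part of this specification; the function name literal_value, and the assumption that the segments arrive as already-tokenized strings with their delimiting quotes still attached, are made only for illustration.

```python
def literal_value(segments):
    """Concatenate the values of already-tokenized literalSegments.

    Each segment is assumed to still carry its delimiting quotes,
    for example '"abc"' or "'abc'".
    """
    parts = []
    for seg in segments:
        # A literalSegment opens and closes with the same quote character.
        if len(seg) < 2 or seg[0] not in "\"'" or seg[-1] != seg[0]:
            raise ValueError("not a literalSegment: %r" % seg)
        body = seg[1:-1]
        # The delimiter itself may not occur between the quotes.
        if seg[0] in body:
            raise ValueError("delimiter inside literalSegment: %r" % seg)
        parts.append(body)
    return "".join(parts)
```

Under these assumptions, a literal whose value contains both kinds of quote is obtained from two segments, one delimited by double quotes and one by single quotes.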

Annotations can be specified as described in Section 5.

There is no notion of operator precedence. It is an error for a pattern to combine different operators from the set |, &, comma (,) and - without using parentheses to make the grouping explicit. For example, foo | bar, baz is not allowed; instead, either (foo | bar), baz or foo | (bar, baz) must be used. A similar restriction applies to name classes and the use of the | and - operators. These restrictions are not expressed in the above EBNF but they are made explicit in the BNF in Section A.1.

3. Lexical structure

Whitespace is allowed between tokens. Tokens are the quoted terminals appearing in the EBNF in Section 2, except that literalSegment, nsName, CName, identifier and quotedIdentifier are single tokens.

Comments are also allowed between tokens. Comments start with a # and continue to the end of the line. Comments starting with ## are treated specially; see Section 5.

A Unicode character with hex code N can be represented by the escape sequence \x{N}. Using such an escape sequence is completely equivalent to entering the corresponding character directly. For example,

element \x{66}\x{6f}\x{6f} { empty }

is equivalent to

element foo { empty }

4. Declarations

A datatypes declaration declares a prefix used in a QName identifying a datatype. For example,

datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
element height { xsd:double }

In fact, in the above example, the datatypes declaration is not required: the xsd prefix is predeclared to the above URI.

A namespace declaration declares a prefix used in a QName specifying the name of an element or attribute. For example,

namespace rng = "http://relaxng.org/ns/structure/1.0"
element rng:text { empty }

As in XML, the xml prefix is predeclared.

A default namespace declaration declares the namespace used for unprefixed names specifying the name of an element (but not of an attribute). For example,

default namespace = "http://example.com"
element foo { attribute bar { string } }

is equivalent to

namespace ex = "http://example.com"
element ex:foo { attribute bar { string } }

A default namespace declaration may have a prefix as well. For example,

default namespace ex = "http://example.com"

is equivalent to

default namespace = "http://example.com"
namespace ex = "http://example.com"

The URI may be empty. This makes the prefix stand for the absent namespace URI. This is necessary for specifying a name class that matches any name with an absent namespace URI. For example:

namespace local = ""
element foo { attribute * - local:* { string }* }

is equivalent to

<element xmlns="http://relaxng.org/ns/structure/1.0"
         name="foo"
         ns="http://example.com">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName ns=""/>
        </except>
      </anyName>
      <data type="string"/>
    </attribute>
  </zeroOrMore>
</element>

RELAX NG has the feature that if a file does not specify an ns attribute then the ns attribute can be inherited from the including file. To support this feature, the keyword inherit can be specified in place of the namespace URI in a namespace declaration. For example,

default namespace this = inherit
element foo { element * - this:* { string }* }

is equivalent to

<element xmlns="http://relaxng.org/ns/structure/1.0"
         name="foo">
  <zeroOrMore>
    <element>
      <anyName>
        <except>
          <nsName/>
        </except>
      </anyName>
      <data type="string"/>
    </element>
  </zeroOrMore>
</element>

In addition, the include and externalRef patterns can specify inherit = prefix to specify the namespace to be inherited by the referenced file. For example,

namespace x = "http://www.example.com"
externalRef "foo.rng" inherit = x

is equivalent to

<externalRef href="foo.rng"
  ns="http://www.example.com"
  xmlns="http://relaxng.org/ns/structure/1.0"/>

In the absence of an inherit parameter on include or externalRef, the default namespace will be inherited by the referenced file.

In the absence of a default namespace declaration, a declaration of

default namespace = inherit

is assumed.

5. Annotations

5.1. Initial annotations

An annotation in square brackets can be inserted immediately before a pattern, nameClass, grammarContent or includeContent. It has the following syntax:

annotation ::= "[" annotationAttribute* annotationElement* "]"
annotationAttribute ::= name "=" literal
annotationElement ::= name "[" annotationAttribute* (annotationElement | literal)* "]"

Each of the annotationAttributes will turn into attributes on the corresponding RELAX NG element. Each of the annotationElements will turn into initial children of the corresponding RELAX NG element, except in the case where the RELAX NG element cannot have children, in which case they will turn into following elements.

5.2. Documentation shorthand

Comments starting with ## are used to specify documentation elements from the http://relaxng.org/ns/compatibility/annotations/1.0 namespace as described in [Compatibility]. For example,

## Represents a language
element lang { 
  ## English
  "en" |
  ## Japanese
  "jp"
}

turns into

<element name="lang"
    xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
    xmlns="http://relaxng.org/ns/structure/1.0">
  <a:documentation>Represents a language</a:documentation>
  <choice>
    <value>en</value>
    <a:documentation>English</a:documentation>
    <value>jp</value>
    <a:documentation>Japanese</a:documentation>
  </choice>
</element>

## comments can only be used immediately before a pattern, nameClass, grammarContent or includeContent. Multiple ## comments are allowed. Multiple adjacent ## comments without any intervening blank lines are merged into a single documentation element. Any ## comments must precede any annotation in square brackets.

5.3. Following annotations

A pattern or nameClass may be followed by any number of followAnnotations with the following syntax:

followAnnotation ::= ">>" annotationElement

Each such annotationElement turns into a following sibling of the RELAX NG element representing the pattern or nameClass.

5.4. Grammar annotations

An annotationElement may be used in any place where grammarContent or includeContent is allowed. For example,

namespace x = "http://www.example.com"

start = foo

x:notation [ name="jpeg" systemId="http://www.example.com/jpeg" ]

foo = element foo { empty }

turns into

<grammar xmlns:x="http://www.example.com" 
         xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="foo"/>
  </start>
  <x:notation name="jpeg" systemId="http://www.example.com/jpeg"/>
  <define name="foo">
    <element name="foo">
      <empty/>
    </element>
  </define>
</grammar>

If the name of such an element is a keyword, then it must be escaped.

6. Conformance

TBD

A. Formal description

A.1. Syntax

The compact syntax is specified by a grammar in BNF. The translation into the XML syntax is specified by annotations in the grammar.

The start symbol is topLevel.

The BNF description consists of a set of production rules. Each production rule has a left-hand side and right-hand side separated by ::=. The left-hand side specifies the name of a non-terminal. The right-hand side specifies a list of one or more alternatives separated by |. Each alternative consists of a sequence of terminals and non-terminals. A non-terminal is specified by a name in italics. A terminal is either a literal string in quotes or a named terminal specified by a name in bold italics. An alternative can also be specified as ε, which denotes an empty sequence of tokens.

Each alternative may be followed by references to one or more named constraints that apply to that alternative.

The translation into XML syntax is specified by associating a value with each terminal and non-terminal in the derivation. Each alternative in the BNF may be followed by an expression in curly braces, which specifies how to compute the value associated with the left-hand side non-terminal. Each terminal and non-terminal on the right-hand side can be labelled with a subscript specifying a variable name. When that variable name is used within the curly braces, it refers to the value associated with that terminal or non-terminal. If an alternative consists of a single terminal or non-terminal, then the expression in curly braces can be omitted; in this case the value of the left-hand side is the value of that terminal or non-terminal.

The result of the translation is not a string containing the XML representation of a RELAX NG schema, but rather is an instance of the data model described in Section 2 of [RELAX NG]; this instance will match the RELAX NG schema for RELAX NG.

The computation of the value of a non-terminal may make use of one or more arguments. The name of such a non-terminal is always followed by a list of arguments. When the name of the non-terminal occurs on the left-hand side, this list declares the formal arguments for the non-terminal. When the name occurs on the right-hand side of a production, the list specifies the actual arguments which will be bound to the formal arguments during the computation of the value of the non-terminal. The expressions in curly braces on the right-hand side can refer to the formal arguments declared on the left-hand side. For example, see simpleNameClass.

In addition to explicit arguments, every non-terminal implicitly has an argument that specifies a context for the interpretation of a pattern. Normally the implicit context argument to each non-terminal is the same as its parent; an expression followed by a period followed by a non-terminal references that non-terminal with the context argument changed to be the value of that expression. For example, see topLevel and preamble. In the initial context used for the start symbol, xml is bound as a namespace prefix to http://www.w3.org/XML/1998/namespace, and xsd is bound as a datatype prefix to http://www.w3.org/2001/XMLSchema-datatypes.

Expressions use the following notation:

x denotes the value of the variable named x;
{ } denotes an empty set;
( ) denotes an empty sequence;
(x, y) denotes the concatenation of the sequences x and y;
context denotes the value of the implicit context argument;
true denotes boolean true;
false denotes boolean false;
inherit denotes a distinct constant used to indicate that a namespace URI should be inherited from the referencing schema;
"xyzzy" denotes a string consisting of the characters xyzzy;
foo(x, y) denotes the value of the function foo applied to the arguments x and y; the following primitive functions are used:
- qName(x, y) returns a qualified-name with prefix x and local part y;
- prefix(x) returns the prefix of the qualified-name x;
- localPart(x) returns the local-part of the qualified-name x;
- union(x, y) returns the union of the sets x and y;
- name(x, y) returns a name with namespace URI x and local name y;
- attribute(x, y) returns an attribute with name x and value y;
- element(x, y, z) returns an element with name x, attributes y and children z;
- bindPrefix(x, y, z) returns a context that is the same as x except that it has the prefix y bound to z;
- bindDefault(x, y) returns a context that is the same as x except that it has the default namespace y;
- bindDatatypePrefix(x, y, z) returns a context that is the same as x except that it has y bound as a prefix for datatypes to the URI z;
- lookupPrefix(x, y) returns the binding in the context x for the prefix y; it is an error if there is no applicable binding;
- lookupDefault(x) returns the default namespace of the context x; if no default has been bound, returns inherit;
- lookupDatatypePrefix(x, y) returns the binding as a datatype prefix in the context x for the prefix y; it is an error if there is no applicable binding;
- mapSchemaRef(x) returns a URI; x is a URI of a resource containing a schema in the syntax described by this specification; the returned URI is the URI of a resource containing the translation of this schema into RELAX NG XML syntax;
- makeNsAttribute(x) returns an empty set if x is inherit, and otherwise returns an attribute whose namespace URI is the empty string, whose local name is ns and whose value is x;
- pair(x, y) returns a pair whose first member is x and whose second member is y;
- emptyAnnotations() returns a pair whose first member is an empty set and whose second member is an empty sequence;
- applyAnnotations(x, y) returns an element whose name is the name of y, whose attributes are the union of the first member of x and the attributes of y, and whose children are the concatenation of the second member of x and the children of y;
- applyAnnotationsGroup(x, y) is equivalent to applyAnnotations(x, <group> y </group>) unless x is equal to emptyAnnotations(), in which case it is equivalent to y;
- applyAnnotationsChoice(x, y) is equivalent to applyAnnotations(x, <choice> y </choice>) unless x is equal to emptyAnnotations(), in which case it is equivalent to y;
- stringConcat(x, y) returns a string that is the concatenation of the strings x and y;
- stripFirstSpace(x);
- datatypeAttributes(x, y) returns a set of two attributes; both attributes have the empty string as their namespace URI; one attribute has local name datatypeLibrary and value x; the other attribute has local name type and value y;
- documentationElementName() returns the name of the documentation element defined in [Compatibility], that is, the name with namespace URI http://relaxng.org/ns/compatibility/annotations/1.0 and local name documentation;
x ? y : z is a conditional expression, which denotes y if x is true and z if x is false;
<foo x> y </foo> denotes an element from the RELAX NG namespace with local name foo, attributes x and content y.
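The context-related primitives in the list above can be pictured with a short sketch. The Python below is illustrative only and not part of this specification: the dictionary layout of a context, the sentinel objects, the tuple encoding of an attribute and the snake_case function names are all assumptions chosen for the example.

```python
INHERIT = object()    # stands in for the distinct constant "inherit"
_UNBOUND = object()   # assumed marker: no default namespace bound yet

XML_NS = "http://www.w3.org/XML/1998/namespace"
XSD_DATATYPES = "http://www.w3.org/2001/XMLSchema-datatypes"

# Initial context used for the start symbol: xml and xsd are pre-bound.
INITIAL_CONTEXT = {
    "prefixes": {"xml": XML_NS},
    "datatype_prefixes": {"xsd": XSD_DATATYPES},
    "default": _UNBOUND,
}

def bind_prefix(ctx, prefix, uri):
    """bindPrefix: a copy of ctx with prefix bound to uri."""
    return {**ctx, "prefixes": {**ctx["prefixes"], prefix: uri}}

def bind_default(ctx, uri):
    """bindDefault: a copy of ctx with the default namespace uri."""
    return {**ctx, "default": uri}

def lookup_prefix(ctx, prefix):
    """lookupPrefix: an error if there is no applicable binding."""
    try:
        return ctx["prefixes"][prefix]
    except KeyError:
        raise ValueError("no binding for prefix %r" % prefix) from None

def lookup_default(ctx):
    """lookupDefault: inherit if no default has been bound."""
    return INHERIT if ctx["default"] is _UNBOUND else ctx["default"]

def make_ns_attribute(uri):
    """makeNsAttribute: empty set for inherit, else a single ns attribute."""
    if uri is INHERIT:
        return set()
    # An attribute is modelled as ((namespace URI, local name), value).
    return {(("", "ns"), uri)}
```

Each binding function returns a new context rather than modifying its argument, which mirrors the way the BNF threads the context from one declaration to the next.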

topLevel ::=
    preamble_c c.topLevelBody_x
      { x }

preamble ::=
    ε
      { context }
  | decl_c c.preamble_d
      { d }

decl ::=
    "namespace" namespacePrefix_x "=" namespaceUri_y
      Constraint: xml prefix
      Constraint: xml namespace URI
      Constraint: duplicate declaration
      { bindPrefix(context, x, y) }
  | "default" "namespace" "=" namespaceUri_x
      Constraint: xml namespace URI
      Constraint: duplicate declaration
      { bindDefault(context, x) }
  | "default" "namespace" namespacePrefix_x "=" namespaceUri_y
      Constraint: xml prefix
      Constraint: xml namespace URI
      Constraint: duplicate declaration
      { bindDefault(bindPrefix(context, x, y), y) }
  | "datatypes" datatypePrefix_x "=" literal_y
      Constraint: xsd prefix
      Constraint: datatypes URI
      Constraint: duplicate declaration
      { bindDatatypePrefix(context, x, y) }

namespacePrefix ::=
    identifierOrKeyword
      Constraint: valid prefix

datatypePrefix ::=
    identifierOrKeyword

namespaceUri ::=
    literal
  | "inherit"
      { inherit }

topLevelBody ::=
    pattern
  | grammar_x
      { <grammar> x </grammar> }

grammar ::=
    ε
      { ( ) }
  | member_x grammar_y
      { (x, y) }

member ::=
    annotatedComponent
  | annotationElementNotKeyword

annotatedComponent ::=
    annotations_x component_y
      { applyAnnotations(x, y) }

component ::=
    start
  | define
  | include
  | div

start ::=
    "start" assignOp_x pattern_y
      { <start x> y </start> }

define ::=
    identifier_x assignOp_y pattern_z
      { <define name=x y> z </define> }

assignOp ::=
    "="
      { { } }
  | "|="
      { attribute(name("", "combine"), "choice") }
  | "&="
      { attribute(name("", "combine"), "interleave") }

include ::=
    "include" literal_x optInherit_y optIncludeBody_z
      { <include href=mapSchemaRef(x) y> z </include> }

optInherit ::=
    ε
      { makeNsAttribute(lookupDefault(context)) }
  | "inherit" "=" identifierOrKeyword_x
      { makeNsAttribute(lookupPrefix(context, x)) }

optIncludeBody ::=
    ε
      { ( ) }
  | "{" includeBody_x "}"
      { x }

includeBody ::=
    ε
      { ( ) }
  | includeMember_x includeBody_y
      { (x, y) }

includeMember ::=
    annotatedIncludeComponent
  | annotationElementNotKeyword

annotatedIncludeComponent ::=
    annotations_x includeComponent_y
      { applyAnnotations(x, y) }

includeComponent ::=
    start
  | define
  | includeDiv

div ::=
    "div" "{" grammar_x "}"
      { <div> x </div> }

includeDiv ::=
    "div" "{" includeBody_x "}"
      { <div> x </div> }

pattern ::=
    innerPattern(emptyAnnotations())

innerPattern(anno) ::=
    innerParticle(anno)
  | particleChoice_x
      { applyAnnotations(anno, <choice> x </choice>) }
  | particleGroup_x
      { applyAnnotations(anno, <group> x </group>) }
  | particleInterleave_x
      { applyAnnotations(anno, <interleave> x </interleave>) }
  | annotatedDataExcept_x
      { applyAnnotationsGroup(anno, x) }

particleChoice ::=
    particle_x "|" particle_y
      { (x, y) }
  | particle_x "|" particleChoice_y
      { (x, y) }

particleGroup ::=
    particle_x "," particle_y
      { (x, y) }
  | particle_x "," particleGroup_y
      { (x, y) }

particleInterleave ::=
    particle_x "&" particle_y
      { (x, y) }
  | particle_x "&" particleInterleave_y
      { (x, y) }

particle ::=
    innerParticle(emptyAnnotations())

innerParticle(anno) ::=
    annotatedPrimary_x
      { applyAnnotationsGroup(anno, x) }
  | repeatedPrimary_x followAnnotations_y
      { (applyAnnotations(anno, x), y) }

repeatedPrimary ::=
    annotatedPrimary_x "*"
      { <zeroOrMore> x </zeroOrMore> }
  | annotatedPrimary_x "+"
      { <oneOrMore> x </oneOrMore> }
  | annotatedPrimary_x "?"
      { <optional> x </optional> }

annotatedPrimary ::=
    leadAnnotatedPrimary_x followAnnotations_y
      { (x, y) }

annotatedDataExcept ::=
    leadAnnotatedDataExcept_x followAnnotations_y
      { (x, y) }

leadAnnotatedDataExcept ::=
    annotations_x dataExcept_y
      { applyAnnotations(x, y) }

leadAnnotatedPrimary ::=
    annotations_x primary_y
      { applyAnnotations(x, y) }
  | annotations_x "(" innerPattern(x)_y ")"
      { y }

primary ::=
    "element" nameClass(true)_x "{" pattern_y "}"
      { <element> x y </element> }
  | "attribute" nameClass(false)_x "{" pattern_y "}"
      { <attribute> x y </attribute> }
  | "mixed" "{" pattern_x "}"
      { <mixed> x </mixed> }
  | "list" "{" pattern_x "}"
      { <list> x </list> }
  | datatypeName_x optParams_y
      { <data x> y </data> }
  | datatypeName_x datatypeValue_y
      { <value x> y </value> }
  | datatypeValue_x
      { <value> x </value> }
  | "empty"
      { <empty/> }
  | "notAllowed"
      { <notAllowed/> }
  | "text"
      { <text/> }
  | ref_x
      { <ref name=x/> }
  | "parent" ref_x
      { <parentRef name=x/> }
  | "grammar" "{" grammar_x "}"
      { <grammar> x </grammar> }
  | "externalRef" literal_x optInherit_y
      { <externalRef href=mapSchemaRef(x) y/> }

dataExcept ::=
    datatypeName_x optParams_y "-" leadAnnotatedPrimary_z
      { <data x> y <except> z </except> </data> }

ref ::=
    identifier

datatypeName ::=
    CName_x
      { datatypeAttributes(lookupDatatypePrefix(context, prefix(x)), localPart(x)) }
  | "string"
      { datatypeAttributes("", "string") }
  | "token"
      { datatypeAttributes("", "token") }

datatypeValue ::=
    literal

optParams ::=
    ε
      { ( ) }
  | "{" params_x "}"
      { x }

params ::=
    ε
      { ( ) }
  | param_x params_y
      { (x, y) }

param ::=
    annotations_x identifierOrKeyword_y "=" literal_z
      { applyAnnotations(x, <param name=y> z </param>) }

nameClass(elem) ::=
    innerNameClass(elem, emptyAnnotations())

innerNameClass(elem, anno) ::=
    annotatedSimpleNameClass(elem)_x
      { applyAnnotationsChoice(anno, x) }
  | nameClassChoice(elem)_x
      { applyAnnotations(anno, <choice> x </choice>) }
  | annotatedExceptNameClass(elem)_x
      { applyAnnotationsChoice(anno, x) }

nameClassChoice(elem) ::=
    annotatedSimpleNameClass(elem)_x "|" annotatedSimpleNameClass(elem)_y
      { (x, y) }
  | annotatedSimpleNameClass(elem)_x "|" nameClassChoice(elem)_y
      { (x, y) }

annotatedExceptNameClass(elem) ::=
    leadAnnotatedExceptNameClass(elem)_x followAnnotations_y
      { (x, y) }

leadAnnotatedExceptNameClass(elem) ::=
    annotations_x exceptNameClass(elem)_y
      { applyAnnotations(x, y) }

annotatedSimpleNameClass(elem) ::=
    leadAnnotatedSimpleNameClass(elem)_x followAnnotations_y
      { (x, y) }

leadAnnotatedSimpleNameClass(elem) ::=
    annotations_x simpleNameClass(elem)_y
      { applyAnnotations(x, y) }
  | annotations_x "(" innerNameClass(elem, x)_y ")"
      { y }

exceptNameClass(elem) ::=
    nsName_x "-" leadAnnotatedSimpleNameClass(elem)_y
      Constraint: name class except
      { <nsName makeNsAttribute(lookupPrefix(context, x))> <except> y </except> </nsName> }
  | "*" "-" leadAnnotatedSimpleNameClass(elem)_x
      Constraint: name class except
      { <anyName> <except> x </except> </anyName> }

simpleNameClass(elem) ::=
    identifierOrKeyword_x
      { <name makeNsAttribute(elem ? lookupDefault(context) : "")> x </name> }
  | CName_x
      { <name makeNsAttribute(lookupPrefix(context, prefix(x)))> localPart(x) </name> }
  | nsName_x
      { <nsName makeNsAttribute(lookupPrefix(context, x))/> }
  | "*"
      { <anyName/> }

followAnnotations ::=
    ε
      { ( ) }
  | ">>" annotationElement_x followAnnotations_y
      { (x, y) }

annotations ::=
    documentations_x
      { pair({ }, x) }
  | documentations_x "[" prefixedAnnotationAttributes_y annotationElements_z "]"
      { pair(y, (x, z)) }

prefixedAnnotationAttributes ::=
    ε
      { ( ) }
  | prefixedAnnotationAttribute_x prefixedAnnotationAttributes_y
      Constraint: duplicate attributes
      Constraint: unqualified name
      { (x, y) }

annotationElements ::=
    ε
      { ( ) }
  | annotationElement_x annotationElements_y
      { (x, y) }

annotationElement ::=
    identifierOrKeyword_x "[" annotationAttributes_y annotationContent_z "]"
      { element(name("", x), y, z) }
  | colonAnnotationElement

annotationElementNotKeyword ::=
    identifier_x "[" annotationAttributes_y annotationContent_z "]"
      { element(name("", x), y, z) }
  | colonAnnotationElement

colonAnnotationElement ::=
    prefixedName_x "[" annotationAttributes_y annotationContent_z "]"
      { element(x, y, z) }

annotationContent ::=
    ε
      { ( ) }
  | annotationElement_x annotationContent_y
      { (x, y) }
  | literal_x annotationContent_y
      { (x, y) }

annotationAttributes ::=
    ε
      { ( ) }
  | annotationAttribute_x annotationAttributes_y
      Constraint: duplicate attributes
      { (x, y) }

annotationAttribute ::=
    prefixedAnnotationAttribute
  | unprefixedAnnotationAttribute

prefixedAnnotationAttribute ::=
    prefixedName_x "=" literal_y
      Constraint: xmlns namespace URI
      { attribute(x, y) }

prefixedName ::=
    CName_x
      Constraint: annotation inherit
      { name(lookupPrefix(context, prefix(x)), localPart(x)) }

unprefixedAnnotationAttribute ::=
    identifierOrKeyword_x "=" literal_y
      { attribute(name("", x), y) }

documentations ::=
    ε
      { ( ) }
  | documentation_x documentations_y
      { (element(documentationElementName(), { }, x), y) }

identifierOrKeyword ::=
    identifier
  | keyword

Constraint: valid prefix

It is an error if the value of a namespacePrefix is xmlns.

Constraint: xml prefix

It is an error if the value of namespacePrefix is xml and the value of the namespaceUri is not http://www.w3.org/XML/1998/namespace.

Constraint: xml namespace URI

It is an error if the value of the namespaceUri is http://www.w3.org/XML/1998/namespace and the value of the namespacePrefix is not xml.
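The three constraints on namespace declarations stated so far (valid prefix, xml prefix and xml namespace URI) can be checked together. The following Python is a sketch under stated assumptions, not a normative algorithm: the function name check_namespace_decl is hypothetical, and a default namespace declaration without a prefix is assumed to be represented by passing None as the prefix.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"

def check_namespace_decl(prefix, uri):
    """Raise ValueError if a namespace declaration breaks a constraint.

    prefix is None for "default namespace = ..." without a prefix
    (an assumption of this sketch).
    """
    # Constraint: valid prefix
    if prefix == "xmlns":
        raise ValueError("the prefix xmlns must not be declared")
    # Constraint: xml prefix
    if prefix == "xml" and uri != XML_NS:
        raise ValueError("xml may only be bound to " + XML_NS)
    # Constraint: xml namespace URI
    if uri == XML_NS and prefix != "xml":
        raise ValueError(XML_NS + " may only be bound to the prefix xml")
```

The duplicate declaration constraint is not covered here, since it depends on the other declarations in the same file.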

Constraint: xsd prefix

It is an error if the value of datatypePrefix is xsd and the value of the literal is not http://www.w3.org/2001/XMLSchema-datatypes.

Constraint: datatypes URI

It is an error if the value of the literal in a datatypes declaration is not a syntactically legal value for a datatypeLibrary as specified in Section 3 of [RELAX NG].

Constraint: duplicate declaration

It is an error if there is more than one namespace declaration of a particular prefix, more than one default namespace declaration or more than one declaration of a particular datatypes prefix.

Constraint: name class except

It is an error if the value of exceptNameClass is such that it violates the constraint in the second paragraph of Section 4.16 of [RELAX NG]: "An except element that is a child of an anyName element must not have any anyName descendant elements. An except element that is a child of an nsName element must not have any nsName or anyName descendant elements."

Constraint: unqualified name

It is an error if the namespace URI of a prefixedName in a prefixedAnnotationAttributes is the empty string.

Constraint: xmlns namespace URI

It is an error if the namespace URI of a prefixedName in a prefixedAnnotationAttribute is http://www.w3.org/2000/xmlns.

Constraint: duplicate attributes

It is an error if a prefixedAnnotationAttributes or an annotationAttributes contains two attributes with the same namespace URI and local name.

Constraint: annotation inherit

It is an error if the namespace URI in the value of a prefixedName is inherit.

A.2. Lexical structure

This section describes how to transform the textual representation of a RELAX NG schema in compact syntax into a sequence of tokens, which can be parsed using the grammar specified in Section A.1.

There are six distinct stages, which are logically consecutive; the result of each stage is the input to the following stage.

A.2.1. Character encoding

The textual representation of the RELAX NG schema in compact syntax may be either a sequence of Unicode characters or a sequence of bytes. In the latter case, the first stage is to transform the sequence of bytes to the sequence of characters. The sequence of bytes may have associated metadata specifying the encoding. One example of such metadata is the charset parameter in a MIME media type. If there is such metadata, then the specified encoding is used. Otherwise, the first two bytes of the sequence are examined. If these are #xFF followed by #xFE or #xFE followed by #xFF, then an encoding of UTF-16 [Unicode] will be used, little-endian in the former case, big-endian in the latter case. Otherwise an encoding of UTF-8 [Unicode] is used. It is an error if the sequence of bytes is not a legal sequence in the selected encoding.
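The encoding selection described above can be sketched as follows. This Python is illustrative only; the function name decode_schema and the charset parameter, which stands for whatever external metadata accompanies the bytes, are assumptions of the example.

```python
def decode_schema(data, charset=None):
    """Turn the bytes of a compact-syntax schema into characters."""
    if charset is not None:
        # External metadata (e.g. a MIME charset parameter) takes priority.
        encoding = charset
    elif data[:2] == b"\xff\xfe":
        encoding = "utf-16-le"   # #xFF #xFE: little-endian UTF-16
    elif data[:2] == b"\xfe\xff":
        encoding = "utf-16-be"   # #xFE #xFF: big-endian UTF-16
    else:
        encoding = "utf-8"
    # decode() raises UnicodeDecodeError if the bytes are not legal
    # in the selected encoding.
    return data.decode(encoding)
```

The codecs named here leave a leading byte order mark in the result as the character #xFEFF, so its removal remains the job of the next stage.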

A.2.2. BOM stripping

If the first character of the sequence is a byte order mark (#xFEFF), then it is removed.

A.2.3. Newline normalization

Representations of newlines are normalized to #xA in a similar way to [XML 1.0]. Specifically, each occurrence of a #xD character that is not followed by a #xA character or of a #xD, #xA character pair is transformed to #xA.
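As an illustration, the BOM stripping and newline normalization stages might be written in Python as below. The function names strip_bom and normalize_newlines are hypothetical, and the sketch assumes the input is already a sequence of Unicode characters.

```python
import re


def strip_bom(text: str) -> str:
    """Stage 2 sketch: remove a byte order mark only if it is the first character."""
    return text[1:] if text.startswith("\ufeff") else text


def normalize_newlines(text: str) -> str:
    """Stage 3 sketch: map each #xD #xA pair and each lone #xD to #xA."""
    # "\r\n?" matches a #xD optionally followed by #xA, so a pair is replaced
    # as a unit and never produces two newlines.
    return re.sub(r"\r\n?", "\n", text)
```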

A.2.4. Escape interpretation

In this stage, each escape sequence of the form \x{n}, where n is a hexadecimal number, is replaced by the character with Unicode code n. The escape sequence must match the production escapeSequence; the value computed in the BNF is the Unicode code of the replacement character. It is an error if the replacement character does not match the Char production of [XML 1.0]. It is an error if the input character sequence contains a character sequence escapeOpen that does not start an escapeSequence. After an escape sequence has been replaced, scanning for escape sequences continues following the replacement character; thus \x{5C}x{5C} is transformed to \x{5C} not to \.

Note

The \ character that opens an escape sequence may be followed by more than one x. This makes it possible for there to be a reversible transformation that maps a schema to a form containing only ASCII characters; the transformation adds an extra x to each existing escape sequence, and replaces every non-ASCII character by an escape sequence with exactly one x.

escapeSequence ::=
  escapeOpen hexNumber_x escapeClose
  { x }

escapeOpen ::=
  "\" xs "{"

xs ::=
  "x"
  | "x" xs

escapeClose ::=
  "}"

hexNumber ::=
  hexDigit
  | hexNumber_x hexDigit_y
  { (x * 16) + y }

hexDigit ::=
  "0"
  { 0 }
  | "1"
  { 1 }
  | "2"
  { 2 }
  | "3"
  { 3 }
  | "4"
  { 4 }
  | "5"
  { 5 }
  | "6"
  { 6 }
  | "7"
  { 7 }
  | "8"
  { 8 }
  | "9"
  { 9 }
  | [Aa]
  { 10 }
  | [Bb]
  { 11 }
  | [Cc]
  { 12 }
  | [Dd]
  { 13 }
  | [Ee]
  { 14 }
  | [Ff]
  { 15 }
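
The escape interpretation stage can be sketched in Python as follows. This is one possible reading of the rules above, under stated assumptions: the names interpret_escapes and is_xml_char are hypothetical, and errors are reported by raising ValueError.

```python
import re

# escapeOpen: a backslash, one or more "x", then "{".
_ESCAPE_OPEN = re.compile(r"\\x+\{")
# escapeSequence: escapeOpen, one or more hex digits, then "}".
_ESCAPE_SEQUENCE = re.compile(r"\\x+\{([0-9A-Fa-f]+)\}")


def is_xml_char(code: int) -> bool:
    """True if the code point matches the Char production of XML 1.0."""
    return (
        code in (0x9, 0xA, 0xD)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )


def interpret_escapes(text: str) -> str:
    """Stage 4 sketch: replace each escape sequence by the character it names."""
    out = []
    pos = 0
    while True:
        opening = _ESCAPE_OPEN.search(text, pos)
        if opening is None:
            out.append(text[pos:])
            return "".join(out)
        sequence = _ESCAPE_SEQUENCE.match(text, opening.start())
        if sequence is None:
            raise ValueError("escapeOpen does not start an escapeSequence")
        code = int(sequence.group(1), 16)
        if not is_xml_char(code):
            raise ValueError("replacement character is not a legal XML character")
        out.append(text[pos:opening.start()])
        out.append(chr(code))
        # Scanning resumes after the replacement character, so the
        # replacement itself is never rescanned.
        pos = sequence.end()
```

Because scanning resumes after the replacement, interpret_escapes applied to \x{5C}x{5C} yields \x{5C} rather than \, matching the example in the text.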

A.2.5. Tokenization

In this stage, the sequence of characters is tokenized: it is transformed into a sequence of tokens, where each token corresponds to a non-terminal in the grammar in Section A.1, except that the token sequence contains literalSegment tokens instead of literal tokens.

A sequence of characters is tokenized by first finding the longest initial subsequence that:

is one of the literal string non-terminals occurring in the BNF in Section A.1
matches the grammar of one of the named non-terminals other than literal that is referenced in Section A.1 and specified in this section, that is, identifier, CName, nsName or documentation
matches the grammar for literalSegment, or
matches the grammar for separator

If the longest such initial subsequence matches separator, this subsequence is discarded. Otherwise, a single non-terminal is produced from this initial subsequence. In either case, the tokenization proceeds with the rest of the character sequence. It is an error if there is no such initial subsequence.
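
The longest-match rule can be illustrated with the following Python sketch. It is deliberately simplified and makes several assumptions: LITERALS is only a small illustrative subset of the literal strings in Section A.1, NCName is approximated by an ASCII-only pattern, the token kind names are hypothetical, and each token carries the raw matched text rather than the value computed by the productions below.

```python
import re

# Assumption: an illustrative subset of the literal strings of Section A.1.
LITERALS = ["element", "attribute", "text", "start",
            "=", "{", "}", "(", ")", ",", "|", "&", "*", "+", "?", "-"]
KEYWORDS = {s for s in LITERALS if s.isalpha()}

# Assumption: ASCII-only approximation of NCName from XML Namespaces.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"

PATTERNS = [
    ("CName", re.compile(NCNAME + ":" + NCNAME)),
    ("nsName", re.compile(NCNAME + r":\*")),
    ("identifier", re.compile(r"\\?" + NCNAME)),
    ("documentation", re.compile(r"##[^\n]*(?:\n[ \t]*##[^\n]*)*")),
    ("literalSegment", re.compile(r'"[^"]*"|\'[^\']*\'')),
    ("separator", re.compile(r"[\t\n ]|#[^\n#][^\n]*|#")),
]


def tokenize(text):
    """Stage 5 sketch: repeatedly take the longest initial subsequence."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for lit in LITERALS:
            if text.startswith(lit, pos) and pos + len(lit) > best_end:
                best_kind, best_end = "literalString", pos + len(lit)
        for kind, pattern in PATTERNS:
            m = pattern.match(text, pos)
            if m is None:
                continue
            # identifier is NCName minus keyword unless quoted with "\".
            if kind == "identifier" and m.group() in KEYWORDS:
                continue
            if m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise ValueError("no token matches at offset %d" % pos)
        if best_kind != "separator":  # separators are discarded
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens
```

Under these assumptions, elementx is tokenized as a single identifier rather than as the keyword element followed by x, because the identifier match is longer.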

The production rules below use some additional notation. Square brackets enclose a character class. A character class of the form [^chars] specifies any legal XML character that does not occur in chars. A legal XML character is a character that matches the Char production of [XML 1.0]. A character class of the form [chars], where chars does not begin with ^, specifies any single character that occurs in chars. XML hexadecimal character references are used to denote a single character, as in XML. NCName is defined in [XML Namespaces].

identifier ::=
  NCName_x - keyword
  { x }
  | "\" NCName_x
  { x }

CName ::=
  NCName_x ":" NCName_y
  { qName(x, y) }

nsName ::=
  NCName_x ":*"
  { x }

literalSegment ::=
  """ stringNoQuot_x """
  { x }
  | "'" stringNoApos_x "'"
  { x }

stringNoQuot ::=
  ε
  { "" }
  | [^"]_x stringNoQuot_y
  { stringConcat(x, y) }

stringNoApos ::=
  ε
  { "" }
  | [^']_x stringNoApos_y
  { stringConcat(x, y) }

documentation ::=
  documentationLine
  | documentation_x documentationContinuation_y
  { stringConcat(x, y) }

documentationLine ::=
  "##" restOfLine_x
  { stripFirstSpace(x) }

documentationContinuation ::=
  [&#xA;]_x indent documentationLine_y
  { stringConcat(x, y) }

indent ::=
  ε
  { "" }
  | [&#x9;&#x20;]_x indent_y
  { stringConcat(x, y) }

restOfLine ::=
  ε
  { "" }
  | [^&#xA;]_x restOfLine_y
  { stringConcat(x, y) }

separator ::=
  [&#x9;&#xA;&#x20;]
  | "#" [^&#xA;#] restOfLine
  | "#"

A.2.6. Literal concatenation

In this stage, each maximal sequence of consecutive literalSegment tokens is concatenated into a literal token.

literal ::=
  literalSegment
  | literalSegment_x literal_y
  { stringConcat(x, y) }
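
A minimal Python sketch of this stage is given below. It assumes, purely for illustration, that tokens are represented as (kind, value) pairs and that the value of each literalSegment token is already the string between its delimiters; the function name concatenate_literals is hypothetical.

```python
def concatenate_literals(tokens):
    """Stage 6 sketch: merge each maximal run of literalSegment tokens."""
    result = []
    for kind, value in tokens:
        if kind != "literalSegment":
            result.append((kind, value))
        elif result and result[-1][0] == "literal":
            # The previous token is a literal built from the segments
            # immediately before this one, so extend it.
            result[-1] = ("literal", result[-1][1] + value)
        else:
            # First segment of a new run.
            result.append(("literal", value))
    return result
```

For example, two adjacent segments with values ab and cd become a single literal token with value abcd, while a segment separated from them by any other token starts a new literal.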

B. Compact syntax RELAX NG schema for RELAX NG (Non-Normative)

# RELAX NG XML syntax specified in compact syntax.

default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace local = ""
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

start = pattern

pattern =
  element element { (nameQName | nameClass), (common & pattern+) }
  | element attribute { (nameQName | nameClass), (common & pattern?) }
  | element group|interleave|choice|optional
            |zeroOrMore|oneOrMore|list|mixed { common & pattern+ }
  | element ref|parentRef { nameNCName, common }
  | element empty|notAllowed|text { common }
  | element data { type, param*, (common & exceptPattern?) }
  | element value { commonAttributes, type?, xsd:string }
  | element externalRef { href, common }
  | element grammar { common & grammarContent* }

param = element param { commonAttributes, nameNCName, xsd:string }

exceptPattern = element except { common & pattern+ }

grammarContent = 
  definition
  | element div { common & grammarContent* }
  | element include { href, (common & includeContent*) }

includeContent =
  definition
  | element div { common & includeContent* }

definition =
  element start { combine?, (common & pattern+) }
  | element define { nameNCName, combine?, (common & pattern+) }

combine = attribute combine { "choice" | "interleave" }

nameClass = 
  element name { commonAttributes, xsd:QName }
  | element anyName { common & exceptNameClass? }
  | element nsName { common & exceptNameClass? }
  | element choice { common & nameClass+ }

exceptNameClass = element except { common & nameClass+ }

nameQName = attribute name { xsd:QName }
nameNCName = attribute name { xsd:NCName }
href = attribute href { xsd:anyURI }
type = attribute type { xsd:NCName }

common = commonAttributes, foreignElement*

commonAttributes = 
  attribute ns { xsd:string }?,
  attribute datatypeLibrary { xsd:anyURI }?,
  foreignAttribute*

foreignElement = element * - rng:* { (anyAttribute | text | anyElement)* }
foreignAttribute = attribute * - (rng:*|local:*) { text }
anyElement = element * { (anyAttribute | text | anyElement)* }
anyAttribute = attribute * { text }

References

Normative

Compatibility: James Clark, Makoto MURATA, editors. RELAX NG DTD Compatibility. OASIS, 2001.
RELAX NG: James Clark, Makoto MURATA, editors. RELAX NG Specification. OASIS, 2001.
Unicode: The Unicode Consortium. The Unicode Standard, Version 3.2 or later.
XML 1.0: Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, and Eve Maler, editors. Extensible Markup Language (XML) 1.0 Second Edition. W3C (World Wide Web Consortium), 2000.
XML Namespaces: Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. W3C (World Wide Web Consortium), 1999.

Non-Normative

Guidelines: James Clark, Kohsuke KAWAGUCHI, editors. Guidelines for using W3C XML Schema Datatypes with RELAX NG. OASIS, 2001.
W3C XML Schema Datatypes: Paul V. Biron, Ashok Malhotra, editors. XML Schema Part 2: Datatypes. W3C (World Wide Web Consortium), 2001.
XDuce: Haruo Hosoya. Regular Expression Types for XML. PhD Thesis. The University of Tokyo, 2000.
XQuery Formal Semantics: Peter Fankhauser et al., editors. XQuery 1.0 Formal Semantics. W3C Working Draft 07 June 2001. W3C (World Wide Web Consortium), 2001.
