Copyright © 1999 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from other documents. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
The list of known errors in this specification is available at http://www.w3.org/1999/11/REC-xpath-19991116-errata.
Comments on this specification may be sent to www-xpath-comments@w3.org; archives of the comments are available.
The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/Style/XSL/translations.html.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
This specification is joint work of the XSL Working Group and the XML Linking Working Group and so is part of the W3C Style activity and of the W3C XML activity.
XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations [XSLT] and XPointer [XPointer]. The primary purpose of XPath is to address parts of an XML [XML] document. In support of this primary purpose, it also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.
In addition to its use for addressing, XPath is also designed so that it has a natural subset that can be used for matching (testing whether or not a node matches a pattern); this use of XPath is described in XSLT.
XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes. XPath defines a way to compute a string-value for each type of node. Some types of nodes also have names. XPath fully supports XML Namespaces [XML Names]. Thus, the name of a node is modeled as a pair consisting of a local part and a possibly null namespace URI; this is called an expanded-name. The data model is described in detail in [5 Data Model].
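The expanded-name idea can be illustrated with a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which reports element names in the form {namespace-URI}local. The helper expanded_name and the namespace URI http://example.org/ns are hypothetical names introduced only for this illustration; they are not defined by XPath.

```python
import xml.etree.ElementTree as ET

def expanded_name(tag):
    """Split an ElementTree tag into (local part, namespace URI or None).

    Hypothetical helper: ElementTree writes namespaced names as
    "{uri}local"; a name without a namespace has no braces.
    """
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (local, uri)
    return (tag, None)

doc = ET.fromstring(
    '<a:doc xmlns:a="http://example.org/ns"><a:para/><note/></a:doc>'
)

# One (local part, namespace URI) pair per element, in document order.
names = [expanded_name(el.tag) for el in doc.iter()]
```

In this sketch the prefix a does not appear in any pair: only the local part and the namespace URI take part in the name, and the note element has a null namespace URI (None here).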
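The primary syntactic construct in XPath is the expression. An expression matches the production Expr. An expression is evaluated to yield an object, which has one of the following four basic types: node-set (an unordered collection of nodes without duplicates), boolean (true or false), number (a floating-point number) and string (a sequence of UCS characters).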
The context position is always less than or equal to the context size.
The function library consists of a mapping from function names to functions. Each function takes zero or more arguments and returns a single result. This document defines a core function library that all XPath implementations must support (see [4 Core Function Library]). For a function in the core function library, arguments and result are of the four basic types. Both XSLT and XPointer extend XPath by defining additional functions; some of these functions operate on the four basic types; others operate on additional data types defined by XSLT and XPointer.
The namespace declarations consist of a mapping from prefixes to namespace URIs.
The variable bindings, function library and namespace declarations used to evaluate a subexpression are always the same as those used to evaluate the containing expression. The context node, context position, and context size used to evaluate a subexpression are sometimes different from those used to evaluate the containing expression. Several kinds of expressions change the context node; only predicates change the context position and context size (see [2.4 Predicates]). When the evaluation of a kind of expression is described, it will always be explicitly stated if the context node, context position, and context size change for the evaluation of subexpressions; if nothing is said about the context node, context position, and context size, they remain unchanged for the evaluation of subexpressions of that kind of expression.
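The rule about which parts of the context change can be sketched as follows. This is an illustration only, assuming a hypothetical Context record and a hypothetical helper predicate_contexts; the convention that a function receives the context as its argument is also an assumption of this sketch, not something the specification prescribes.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict

@dataclass(frozen=True)
class Context:
    """Hypothetical record of an evaluation context."""
    node: Any
    position: int
    size: int
    variables: Dict[str, Any]
    functions: Dict[str, Callable[..., Any]]
    namespaces: Dict[str, str]

def predicate_contexts(ctx, nodes):
    """Build the contexts a predicate would see for each node in nodes.

    Only node, position and size are replaced; the variable bindings,
    function library and namespace declarations are carried over unchanged.
    """
    size = len(nodes)
    return [
        replace(ctx, node=n, position=i, size=size)
        for i, n in enumerate(nodes, start=1)
    ]

outer = Context(
    node="doc",
    position=1,
    size=1,
    variables={"x": 1},
    # Assumption of this sketch: functions take the context explicitly.
    functions={"position": lambda c: c.position, "last": lambda c: c.size},
    namespaces={"a": "http://example.org/ns"},
)

inner = predicate_contexts(outer, ["para1", "para2", "para3"])
summary = [(c.node, c.position, c.size) for c in inner]
```

Each inner context shares the same variables, functions and namespaces objects as the outer one, and its position never exceeds its size.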
XPath expressions often occur in XML attributes. The grammar specified in this section applies to the attribute value after XML 1.0 normalization. So, for example, if the grammar uses the character <, this must not appear in the XML source as < but must be quoted according to XML 1.0 rules by, for example, entering it as &lt;. Within expressions, literal strings are delimited by single or double quotation marks, which are also used to delimit XML attributes. To avoid a quotation mark in an expression being interpreted by the XML processor as terminating the attribute value, the quotation mark can be entered as a character reference (&quot; or &apos;). Alternatively, the expression can use single quotation marks if the XML attribute is delimited with double quotation marks or vice versa.
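The quoting rules above can be checked with a minimal sketch, assuming Python's standard xml.etree.ElementTree parser. The element name rule and the attribute names test and alt are hypothetical, chosen only for the illustration.

```python
import xml.etree.ElementTree as ET

# The expression is escaped in the XML source: < is written as &lt; and
# the double quotation marks inside the double-quoted attribute as &quot;.
# The alt attribute uses the alternative: single-quoted attribute,
# double-quoted literal string.
source = """<rule test="price &lt; 10 and attribute::kind = &quot;book&quot;"
      alt='attribute::kind = "book"'/>"""

el = ET.fromstring(source)
expr = el.get("test")      # the value the XPath grammar applies to
alt_expr = el.get("alt")

# A raw < inside an attribute value is not well-formed XML.
try:
    ET.fromstring('<rule test="price < 10"/>')
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
```

After parsing, expr holds the expression with a literal < and literal quotation marks, which is the form an XPath processor would tokenize.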
One important kind of expression is a location path. A location path selects a set of nodes relative to the context node. The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path. Location paths can recursively contain expressions that are used to filter sets of nodes. A location path matches the production LocationPath.
In the following grammar, the non-terminals QName and NCName are defined in [XML Names], and S is defined in [XML]. The grammar uses the same EBNF notation as [XML] (except that grammar symbols always have initial capital letters).
Expressions are parsed by first dividing the character string to be parsed into tokens and then parsing the resulting sequence of tokens. Whitespace can be freely used between tokens. The tokenization process is described in [3.7 Lexical Structure].
Although location paths are not the most general grammatical construct in the language (a LocationPath is a special case of an Expr), they are the most important construct and will therefore be described first.
Every location path can be expressed using a straightforward but rather verbose syntax. There are also a number of syntactic abbreviations that allow common cases to be expressed concisely. This section will explain the semantics of location paths using the unabbreviated syntax. The abbreviated syntax will then be explained by showing how it expands into the unabbreviated syntax (see [2.5 Abbreviated Syntax]).
Here are some examples of location paths using the unabbreviated syntax:
child::para selects the para element children of the context node
child::text() selects all text node children of the context node
child::node() selects all the children of the context node, whatever their node type
attribute::name selects the name attribute of the context node
descendant::para selects the para element descendants of the context node
ancestor-or-self::div selects the div ancestors of the context node and, if the context node is a div element, the context node as well
descendant-or-self::para selects the para element descendants of the context node and, if the context node is a para element, the context node as well
self::para selects the context node if it is a para element, and otherwise selects nothing
child::chapter/descendant::para selects the para element descendants of the chapter element children of the context node
child::*/child::para selects all para grandchildren of the context node
/ selects the document root (which is always the parent of the document element)
/descendant::para selects all the para elements in the same document as the context node
/descendant::olist/child::item selects all the item elements that have an olist parent and that are in the same document as the context node
child::para[position()=1] selects the first para child of the context node
child::para[position()=last()] selects the last para child of the context node
child::para[position()=last()-1] selects the last but one para child of the context node
child::para[position()>1] selects all the para children of the context node other than the first para child of the context node
following-sibling::chapter[position()=1] selects the next chapter sibling of the context node
preceding-sibling::chapter[position()=1] selects the previous chapter sibling of the context node
/descendant::figure[position()=42] selects the forty-second figure element in the document
/child::doc/child::chapter[position()=5]/child::section[position()=2] selects the second section of the fifth chapter of the doc document element
child::para[attribute::type="warning"] selects all para children of the context node that have a type attribute with value warning
child::para[attribute::type='warning'][position()=5] selects the fifth para child of the context node that has a type attribute with value warning
child::para[position()=5][attribute::type="warning"] selects the fifth para child of the context node if that child has a type attribute with value warning
child::chapter[child::title='Introduction'] selects the chapter children of the context node that have one or more title children with string-value equal to Introduction
child::chapter[child::title] selects the chapter children of the context node that have one or more title children
child::*[self::chapter or self::appendix] selects the chapter and appendix children of the context node
child::*[self::chapter or self::appendix][position()=last()] selects the last chapter or appendix child of the context node
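Several of the examples above differ only in the order of their predicates. The following sketch hand-models the child and descendant axes and positional predicates over a Python xml.etree.ElementTree tree; it is not an XPath implementation. The helpers child, descendant and apply_predicate are hypothetical, and position 2 is used instead of 5 to keep the sample document short.

```python
import xml.etree.ElementTree as ET

def child(node, name):
    """Model of child::name: element children of node with that name."""
    return [c for c in node if c.tag == name]

def descendant(node, name):
    """Model of descendant::name: matching descendants, excluding node itself."""
    return [d for d in node.iter(name) if d is not node]

def apply_predicate(nodes, test):
    """Filter nodes with one predicate; positions are 1-based and the
    size is the number of nodes entering this predicate."""
    size = len(nodes)
    return [n for pos, n in enumerate(nodes, start=1) if test(n, pos, size)]

doc = ET.fromstring(
    "<doc>"
    '<para type="warning">w1</para>'
    "<para>p2</para>"
    '<para type="warning">w3</para>'
    "<section><para>nested</para></section>"
    "</doc>"
)

def is_warning(node, pos, size):   # [attribute::type="warning"]
    return node.get("type") == "warning"

def is_second(node, pos, size):    # [position()=2]
    return pos == 2

def is_last(node, pos, size):      # [position()=last()]
    return pos == size

paras = child(doc, "para")
# child::para[attribute::type="warning"][position()=2]
second_warning = apply_predicate(apply_predicate(paras, is_warning), is_second)
# child::para[position()=2][attribute::type="warning"]
second_if_warning = apply_predicate(apply_predicate(paras, is_second), is_warning)
# child::para[position()=last()]
last_para = apply_predicate(paras, is_last)
# descendant::para
all_paras = descendant(doc, "para")
```

With this sample document the first predicate order selects the second warning paragraph (w3), while the reversed order selects nothing, because the second para child has no type attribute.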
There are two kinds of location path: relative location paths and absolute location paths.
A relative location path consists of a sequence of one or more location steps separated by /. The steps in a relative location path are composed together from left to right. Each step in turn selects a set of nodes relative to a context node. An initial sequence of steps is composed together with a following step as follows. The initial sequence of steps selects a set of nodes relative to a context node. Each node in that set is used as a context node for the following step. The sets of nodes identified by that step are unioned together. The set of nodes identified by the composition of the steps is this union. For example, child::div/child::para selects the para element children of the div element children of the context node, or, in other words, the para element grandchildren that have div parents.
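The left-to-right composition of steps can be sketched as follows, again over a Python xml.etree.ElementTree tree. The helpers compose, child and evaluate are hypothetical names for this illustration. A list stands in for the node-set only so that the result can be inspected; the order of its entries carries no meaning in the model.

```python
import xml.etree.ElementTree as ET

def compose(initial_nodes, step):
    """Use each node as the context node for step and union the results.

    Duplicates are dropped by object identity, since a node-set contains
    each node at most once.
    """
    result, seen = [], set()
    for context_node in initial_nodes:
        for selected in step(context_node):
            if id(selected) not in seen:
                seen.add(id(selected))
                result.append(selected)
    return result

def child(name):
    """Return a step function modelling child::name."""
    return lambda node: [c for c in node if c.tag == name]

def evaluate(context_node, steps):
    """Compose the steps of a relative location path from left to right."""
    nodes = [context_node]
    for step in steps:
        nodes = compose(nodes, step)
    return nodes

doc = ET.fromstring(
    "<doc>"
    "<div><para>a</para><para>b</para></div>"
    "<para>c</para>"
    "<div><para>d</para></div>"
    "</doc>"
)

# Model of child::div/child::para with doc as the context node.
grandchildren = evaluate(doc, [child("div"), child("para")])
texts = sorted(n.text for n in grandchildren)
```

The para element with text c is a child of the context node rather than of a div, so it is not selected.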
An absolute location path consists of / optionally
followed by a relative location path. A /<