DOM Level 3 will provide an API for loading XML source documents into a DOM representation and for saving a DOM representation as an XML document.
Some environments, such as the Java platform or COM, have their own ways to persist objects to streams and to restore them. There is no direct relationship between these mechanisms and the DOM load/save mechanism. This specification defines how to serialize documents only to and from XML format.
The following requirements apply to both loading and saving documents.
Documents must be able to be parsed from and saved to the following sources:
Note that input and output streams cover the in-memory case. One point of caution is that a stream by itself provides no base URI against which the relative URIs in the document can be resolved.
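The base URI caution can be illustrated with a short sketch. It uses Python's xml.dom.minidom as a stand-in DOM implementation (an assumption; this specification defines no Python binding). The document content, the `href` attribute, and the example.org URI are invented for illustration. Because the stream carries no location, the application has to supply the base URI separately.

```python
import io
from urllib.parse import urljoin
from xml.dom import minidom

# An in-memory document; a byte stream like this has no URI of its own.
xml_bytes = b'<book><chapter href="chapter2.xml"/></book>'
stream = io.BytesIO(xml_bytes)

# Parsing from the stream works, but nothing in the stream says where the
# document lives, so relative URIs cannot be resolved from the stream alone.
document = minidom.parse(stream)
relative = document.getElementsByTagName("chapter")[0].getAttribute("href")

# Hypothetical base URI supplied by the application, not by the stream.
base_uri = "http://example.org/docs/book.xml"
resolved = urljoin(base_uri, relative)
print(resolved)
```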
While creating a new document using the DOM API, a mechanism must be provided to specify that the new document uses a pre-existing Content Model and to cause that Content Model to be loaded.
Note that while DOM Level 2 creation can specify a Content Model when creating a document (public and system IDs for the external subset, and a string for the subset), DOM Level 2 implementations do not process the Content Model's content. For DOM Level 3, the Content Model's content must be read.
When processing a series of documents, all of which use the same Content Model, implementations should be able to reuse the already parsed and loaded Content Model rather than reparsing it for each new document.
This feature may not have an explicit DOM API associated with it, but it does require that nothing in this section, or the Content Model section, of this specification block it or make it difficult to implement.
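One way an implementation might meet the reuse requirement is a cache keyed by system ID. The sketch below is only an illustration: `ContentModelCache`, `fake_loader`, and the example.org URI are hypothetical names, and the loader is a placeholder for whatever actually parses a Content Model.

```python
class ContentModelCache:
    """Hypothetical cache: parses each Content Model once, keyed by system ID."""

    def __init__(self, loader):
        self._loader = loader   # callable that parses a Content Model
        self._models = {}       # system ID -> parsed Content Model

    def get(self, system_id):
        # Parse only on the first request; later requests reuse the result.
        if system_id not in self._models:
            self._models[system_id] = self._loader(system_id)
        return self._models[system_id]


load_calls = []

def fake_loader(system_id):
    """Stand-in for an expensive parse; records each time it runs."""
    load_calls.append(system_id)
    return {"system_id": system_id}

cache = ContentModelCache(fake_loader)
models = [cache.get("http://example.org/book.dtd") for _ in range(3)]
```

Three documents that share one Content Model trigger a single load; the same parsed object is handed back each time.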
Some means is required to allow applications to map public and system IDs to the correct document. This facility should provide sufficient capability to allow the implementation of catalogs, but providing catalogs themselves is not a requirement. In addition, XML Base needs to be addressed.
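A catalog in the sense used here can be as small as a lookup table. The following sketch is an assumption about how an application might build one on top of the required mapping facility. The `Catalog` class, its preference for public IDs over system IDs, and the local file path are all hypothetical.

```python
class Catalog:
    """Hypothetical catalog mapping public and system IDs to local URIs."""

    def __init__(self, public=None, system=None):
        self._public = dict(public or {})
        self._system = dict(system or {})

    def resolve(self, public_id, system_id):
        # Assumed policy: a public ID match wins, then a system ID match,
        # otherwise the original system ID is returned unchanged.
        if public_id in self._public:
            return self._public[public_id]
        if system_id in self._system:
            return self._system[system_id]
        return system_id


catalog = Catalog(
    public={"-//W3C//DTD XHTML 1.0 Strict//EN": "file:///local/dtd/xhtml1-strict.dtd"},
)
mapped = catalog.resolve(
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
)
unmapped = catalog.resolve(None, "http://example.org/other.dtd")
```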
Loading a document can cause the generation of errors including:
Saving a document can cause the generation of errors including:
This section, as well as the DOM Level 3 Content Model section, should use a common error reporting mechanism. Well-formedness and validity checking are in the domain of the Content Model section, even though the corresponding errors may commonly be generated in response to an application asking that a document be loaded.
The following requirements apply to loading documents.
Parsers may have properties or options that can be set by applications. Examples include:
A mechanism to set properties, query the state of properties, and to query the set of properties supported by a particular DOM implementation is required.
The fundamental requirement is to write a DOM document as XML source. All information to be serialized should be available via the normal DOM API.
There are several options that can be defined when saving an XML document. Some of these are:
The following items are not committed to, but are under consideration. Public feedback on these items is especially requested.
Provide the ability for a thread that requested the loading of a document to continue execution without blocking while the document is being loaded. This would require some sort of notification or completion event when the loading process was done.
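The non-blocking behaviour described above might look like the following sketch, which uses a worker thread and a completion callback. It relies on Python's concurrent.futures and xml.dom.minidom as stand-ins; `load_async` and `on_loaded` are hypothetical names, and nothing here is a proposed DOM API.

```python
from concurrent.futures import ThreadPoolExecutor
from xml.dom import minidom


def load_async(executor, xml_text, on_loaded):
    """Start parsing on a worker thread; call on_loaded(document) when done."""
    future = executor.submit(minidom.parseString, xml_text)
    # The callback plays the role of the completion event.
    future.add_done_callback(lambda f: on_loaded(f.result()))
    return future


loaded = []
with ThreadPoolExecutor(max_workers=1) as executor:
    future = load_async(executor, "<root><item/></root>", loaded.append)
    # The requesting thread is free to do other work here.
    document = future.result()  # block only when the result is needed
```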
Provide the ability to examine the partial DOM representation before it has been fully loaded.
In one form, a document may be loaded asynchronously while a DOM based application is accessing the document. In another form, the application may explicitly ask for the next incremental portion of a document to be loaded.
Provide the capability to write out only a part of a document. It may be possible to leverage TreeWalkers, the Filters associated with TreeWalkers, or Ranges as a means of specifying the portion of the document to be written.
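As a rough illustration of writing only part of a document, the sketch below serializes a single subtree. It uses xml.dom.minidom, whose `toxml()` method is specific to that library and is not part of this proposal; selecting the subtree by tag name stands in for a TreeWalker or Range.

```python
from xml.dom import minidom

document = minidom.parseString("<doc><a>one</a><b>two</b></doc>")

# Pick the portion of the document to be written.
subtree = document.getElementsByTagName("b")[0]

# Serialize only that subtree; the rest of the document is not written.
partial_xml = subtree.toxml()
print(partial_xml)
```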
Document fragments, as specified by the XML Fragment specification, should be able to be loaded. This is useful to applications that only need to process some part of a large document. Because the DOM is typically implemented as an in-memory representation of a document, fully loading large documents can require large amounts of memory.
XPath should also be considered as a way to identify XML Document fragments to load.
Document fragments, as specified by the XML Fragment specification, should be able to be loaded into the context of an existing document at a point specified by a node position, or perhaps a range. This is a separate feature from simply loading document fragments as a new Node.
DocumentBuilder (Sun) and DOMParser (Xerces).
SAXException.toString() and SAXException.getMessage() always the same? If not, we need to add another attribute.
DOMSystemException needs to be defined as part of the error handling module that is to be shared with CM. Common I/O type errors need to be defined for it, so that they can be reported in a uniform way. A way to embed errors or exceptions from the OS or language environment is needed, to provide full information to applications that want it.
This section defines an API for loading (parsing) XML source documents into a DOM representation and for saving (serializing) a DOM representation as an XML document.
The proposal for loading is influenced by Sun's JAXP API for XML parsing in Java (http://java.sun.com/xml/download.html) and by SAX2 (http://www.megginson.com/SAX/index.html).
The following interfaces are involved in loading and saving XML documents.
DOMImplementationLS -- A new DOMImplementation interface that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder -- A parser interface.
DOMInputSource -- Encapsulates information about the source of the XML to be loaded.
DOMEntityResolver -- During loading, provides a way for applications to redirect references to external entities.
DOMBuilderFilter -- Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document.
DOMFormatter -- Provides for the actual formatting of DOM data into the output format.
DOMWriter -- An interface for writing out DOM documents. The form in which the data from the DOM will be written is controlled by a DOMFormatter, and the destination for the data is a DOMOutputStream.
DOMImplementationLS contains the factory methods for creating objects implementing the DOMBuilder (parser) and DOMWriter interfaces.
interface DOMImplementationLS {
  DOMBuilder createDOMBuilder();
  DOMWriter createDOMWriter();
};
createDOMBuilder
Create a new DOMBuilder. The newly constructed parser may then be configured by means of its setFeature() method, and used to parse documents by means of its parseURI() or parseDOMInputSource() method.
The newly created parser object.
createDOMWriter
A parser interface.
DOMBuilder provides an API for parsing XML documents and building the corresponding DOM document tree. A DOMBuilder instance is obtained from the DOMImplementationLS interface by invoking its createDOMBuilder() method.
DOMBuilders have a number of named properties that can be queried or set. Here is a list of properties that must be recognized by all implementations.
validate-if-cm feature will alter the validation behavior when this feature is set true.
interface DOMBuilder {
  attribute DOMEntityResolver entityResolver;
  attribute DOMErrorHandler errorHandler;
  attribute DOMBuilderFilter filter;
  void setFeature(in DOMString name, in boolean state)
      raises(DOMException);
  boolean supportsFeature(in DOMString name);
  boolean canSetFeature(in DOMString name, in boolean state);
  boolean getFeature(in DOMString name)
      raises(DOMException);
  Document parseURI(in DOMString uri)
      raises(DOMException, DOMSystemException);
  Document parseDOMInputSource(in DOMInputSource is)
      raises(DOMException, DOMSystemException);
};
entityResolver of type DOMEntityResolver
If a DOMEntityResolver has been specified, each time a reference to an external entity is encountered the DOMBuilder will pass the public and system IDs to the entity resolver, which can then specify the actual source of the entity.
errorHandler of type DOMErrorHandler
The DOMBuilder will call back to the errorHandler with the error information.
Note: The DOMErrorHandler interface is being developed separately, in conjunction with the design of the content model and validation module.
filter of type DOMBuilderFilter
canSetFeature
It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value.
name of type DOMString
state of type boolean
true if the feature could be successfully set to the specified value, or false if the feature is not recognized or the requested value is not supported. The value of the feature itself is not changed.
getFeature
name of type DOMString
The current state of the feature (true or false).
|
Raise a NOT_FOUND_ERR When the |
parseDOMInputSource
Parse a document from a DOMInputSource.
is of type DOMInputSource
The DOMInputSource from which the source document is to be read.
|
The newly created and populated |
|
Exceptions raised by |
|
Exceptions raised by |
parseURI
uri of type DOMString
|
The newly created and populated |
|
Exceptions raised by |
|
Exceptions raised by |
setFeature
It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value.
name of type DOMString
state of type boolean
|
Raise a NOT_SUPPORTED_ERR exception When the
Raise a NOT_FOUND_ERR When the |
supportsFeature
Query whether the DOMBuilder recognizes a feature name.
It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value. For example, a non-validating parser would recognize the feature "validation", would report that its value was false, and would raise an exception if an attempt was made to enable validation by setting the feature to true.
name of type DOMString
|
true if the feature name is recognized by the
|
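The interplay of the four feature methods can be sketched as follows. This is an illustrative model, not the DOMBuilder itself: `FeatureSet` and the two exception classes are hypothetical, and the mapping of "unrecognized name" to a not-found error and "recognized but unsettable value" to a not-supported error is an assumption based on the non-validating parser example.

```python
class NotFoundError(Exception):
    """Stand-in for NOT_FOUND_ERR: the feature name is not recognized."""


class NotSupportedError(Exception):
    """Stand-in for NOT_SUPPORTED_ERR: the requested value cannot be set."""


class FeatureSet:
    """Hypothetical holder of named boolean features."""

    def __init__(self, features):
        # features: name -> (current value, set of values that may be set)
        self._features = {n: [v, set(ok)] for n, (v, ok) in features.items()}

    def supportsFeature(self, name):
        return name in self._features

    def canSetFeature(self, name, state):
        # Never changes the feature; only reports whether setting would work.
        return name in self._features and state in self._features[name][1]

    def getFeature(self, name):
        if name not in self._features:
            raise NotFoundError(name)
        return self._features[name][0]

    def setFeature(self, name, state):
        if name not in self._features:
            raise NotFoundError(name)
        if state not in self._features[name][1]:
            raise NotSupportedError(f"{name}={state}")
        self._features[name][0] = state


# A non-validating parser: "validation" is recognized but can only be false.
parser = FeatureSet({"validation": (False, {False})})
```

With this model, `parser.supportsFeature("validation")` is true, `parser.getFeature("validation")` is false, and `parser.setFeature("validation", True)` raises the not-supported error.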
This interface represents a single input source for an XML entity.
This interface allows an application to encapsulate information about an input source in a single object, which may include a public identifier, a system identifier, a byte stream (possibly with a specified encoding), and/or a character stream.
The exact definitions of a byte stream and a character stream are binding dependent.
There are two places that the application will deliver this input source to the parser: as the argument to the parseDOMInputSource method, or as the return value of the DOMEntityResolver.resolveEntity method.
The DOMBuilder will use the DOMInputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly; if not, the parser will use a byte stream, if available; if neither a character stream nor a byte stream is available, the parser will attempt to open a URI connection to the resource identified by the system identifier.
A DOMInputSource object belongs to the application: the parser shall never modify it in any way (it may modify a copy if necessary).
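The order in which a parser consults an input source (character stream, then byte stream, then system identifier) can be written down as a small sketch. The `InputSource` dataclass and `choose_input` function are hypothetical stand-ins for the binding-dependent types, and the function only reports which source would be used rather than opening a connection.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional, TextIO


@dataclass
class InputSource:
    """Hypothetical Python stand-in for the input source described here."""
    byte_stream: Optional[BinaryIO] = None
    character_stream: Optional[TextIO] = None
    encoding: Optional[str] = None
    public_id: Optional[str] = None
    system_id: Optional[str] = None


def choose_input(source):
    """Return (kind, value) naming the input a parser would read from."""
    if source.character_stream is not None:
        return ("characters", source.character_stream)
    if source.byte_stream is not None:
        return ("bytes", source.byte_stream)
    if source.system_id is not None:
        # A real parser would open a URI connection at this point.
        return ("uri", source.system_id)
    raise ValueError("input source has no stream and no system identifier")


both = InputSource(byte_stream=io.BytesIO(b"<a/>"),
                   character_stream=io.StringIO("<a/>"))
only_uri = InputSource(system_id="http://example.org/doc.xml")
```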
interface DOMInputSource {
  attribute DOMInputStream byteStream;
  attribute DOMReader characterStream;
  attribute DOMString encoding;
  attribute DOMString publicId;
  attribute DOMString systemId;
};
byteStream of type DOMInputStream
characterStream of type DOMReader
encoding of type DOMString
publicId of type DOMString
systemId of type DOMString
DOMEntityResolver provides a way for applications to redirect references to external entities.
Applications needing to implement customized handling for external entities must implement this interface and register their implementation by setting the entityResolver property of the DOMBuilder.
The DOMBuilder will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities) before including them.
Many DOM applications will not need to implement this interface, but it will be especially useful for applications that build XML documents from databases or other specialized input sources, or for applications that use URI types other than URLs.
DOMEntityResolver is based on the SAX2 EntityResolver interface, described at http://www.megginson.com/SAX/Java/javadoc/org/xml/sax/EntityResolver.html
interface DOMEntityResolver {
  DOMInputSource resolveEntity(in DOMString publicId, in DOMString systemId)
      raises(DOMSystemException);
};
resolveEntity
The DOMBuilder will call this method before opening any external entity except the top-level document entity (including the external DTD subset, external entities referenced within the DTD, and external entities referenced within the document element); the application may request that the DOMBuilder resolve the entity itself, that it use an alternative URI, or that it use an entirely different input source.
If the system identifier is a URL, the DOMBuilder must resolve it fully before reporting it to the application through this interface.
Note: See issue #4. An alternative would be to pass the URL out without resolving it, and to provide a base as an additional parameter. SAX resolves URLs first, and does not provide a base.
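The three outcomes available to an entity resolver (let the builder resolve the entity, redirect to another URI, or supply a different input source) might be sketched as below. `InputSource` and `ExampleResolver` are hypothetical, and returning `None` to mean "resolve it yourself" is an assumption borrowed from the SAX convention, not something this text states.

```python
import io


class InputSource:
    """Hypothetical minimal input source: a system ID and/or a byte stream."""

    def __init__(self, system_id=None, byte_stream=None):
        self.system_id = system_id
        self.byte_stream = byte_stream


class ExampleResolver:
    """Hypothetical resolver illustrating the three possible answers."""

    def __init__(self, redirects=None, inline=None):
        self._redirects = dict(redirects or {})  # system ID -> alternative URI
        self._inline = dict(inline or {})        # system ID -> entity bytes

    def resolveEntity(self, public_id, system_id):
        if system_id in self._inline:
            # An entirely different input source: in-memory content.
            return InputSource(system_id=system_id,
                               byte_stream=io.BytesIO(self._inline[system_id]))
        if system_id in self._redirects:
            # An alternative URI for the same entity.
            return InputSource(system_id=self._redirects[system_id])
        # Assumption: None asks the builder to resolve the entity itself.
        return None


resolver = ExampleResolver(
    redirects={"http://example.org/a.dtd": "file:///local/a.dtd"},
    inline={"http://example.org/b.ent": b"<!ENTITY x 'y'>"},
)
redirected = resolver.resolveEntity(None, "http://example.org/a.dtd")
inlined = resolver.resolveEntity(None, "http://example.org/b.ent")
untouched = resolver.resolveEntity(None, "http://example.org/c.dtd")
```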
publicId of type DOMString
systemId of type DOMString
A |
|
Any |
DOMBuilderFilters provide applications the ability to examine Element nodes as they are being constructed during a parse. As each element is examined, it may be modified or removed, or the entire parse may be terminated early.
interface DOMBuilderFilter {
  boolean endElement(in Element element);
};
endElement
element of type Element
|
return true |
DOMWriter provides the API that an application will use when serializing (writing) a DOM document out in the form of a source document.
Use of a DOMWriter requires that two other objects be supplied: a DOMFormatter, which defines the output format in which the document will be expressed, and a DOMOutputStream, which defines where the output will go.
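The division of labour between writer, formatter, and output destination can be sketched in a few lines. The `Writer` and `XmlFormatter` classes are hypothetical, an io.StringIO stands in for the output stream, and the XML output itself is delegated to xml.dom.minidom's `writexml()`, which belongs to that library rather than to this proposal.

```python
import io
from xml.dom import minidom


class XmlFormatter:
    """Hypothetical formatter: decides how a node is expressed as text."""

    def format_node(self, node, destination):
        # Delegate the actual XML output to minidom (an assumption).
        node.writexml(destination)


class Writer:
    """Hypothetical writer: pairs a formatter with a destination per call."""

    def __init__(self, formatter=None):
        # In this sketch an XML formatter is used when none is given.
        self.formatter = formatter or XmlFormatter()

    def write_node(self, destination, node):
        self.formatter.format_node(node, destination)


document = minidom.parseString("<root><a/></root>")
destination = io.StringIO()  # stands in for the output stream
Writer().write_node(destination, document.documentElement)
output = destination.getvalue()
print(output)
```

Swapping in another formatter object changes the output format without touching the writer or the destination.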
interface DOMWriter {
  attribute DOMFormatter formatter;
  void writeNode(in DOMOutputStream destination, in Node node)
      raises(DOMSystemException);
  void writeTreeWalker(in DOMOutputStream destination, in TreeWalker tree)
      raises(DOMSystemException);
  void writeString(in DOMOutputStream destination, in DOMString aString)
      raises(DOMSystemException);
};
formatter of type DOMFormatter
The DOMFormatter used by this DOMWriter. For now, only an XML formatter is defined, but others, such as an HTML formatter or an arbitrary user-supplied formatter, could be used.
formatter defaults to an XML formatter, meaning that applications do not need to explicitly set this attribute before using a DOMWriter.
writeNode
destination of type DOMOutputStream
node of type Node
DOMSystemException: This exception will be raised in response to any sort of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.
writeString
Write out a DOMString.
destination of type DOMOutputStream
aString of type DOMString
DOMSystemException: This exception will be raised in response to any sort of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.
writeTreeWalker
Write out the nodes presented by a TreeWalker.
destination of type DOMOutputStream
tree of type TreeWalker
DOMSystemException: This exception will be raised in response to any sort of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.
DOMFormatter defines the interface through which the application controls the format in which a document will be written.
Three options are available for the general appearance of the formatted output: As-is, canonical and reformatted.
Nodes of different types are written as follows:
DOMWriter.writeNode(), output the entity expansion and a Text Decl. The resulting output will be valid as an external entity.
"&entityName;") in the output. Children (the expansion) of the entity reference are ignored.
TreeWalker that is configured to deliver that kind of a view to the DOMWriter.
Any characters that cannot be represented directly, either because of the rules of XML (& or <), or because of limitations of the output encoding, will be replaced with character references. If this is not possible (in a CDATA section, for example) the substitution character(s) will be output instead.
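The character replacement rule can be sketched with Python's codec error handlers. Both function names are hypothetical. `xmlcharrefreplace` and `replace` are standard Python encoding error handlers, used here as stand-ins for the formatter's behaviour, and the '?' fallback for an unencodable substitute string is an assumption of this sketch.

```python
def escape_text(text, encoding):
    """Escape '&' and '<', then use character references for anything
    the output encoding cannot represent."""
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    # Escaping happens first so the '&' of a character reference
    # is not escaped a second time.
    return escaped.encode(encoding, errors="xmlcharrefreplace")


def encode_cdata(text, encoding, substitute_chars="?"):
    """Inside a CDATA section character references are unavailable, so
    unrepresentable characters are replaced by substitute_chars."""
    pieces = []
    for ch in text:
        try:
            ch.encode(encoding)
            pieces.append(ch)
        except UnicodeEncodeError:
            pieces.append(substitute_chars)
    # If the substitute itself cannot be encoded, fall back to '?'.
    return "".join(pieces).encode(encoding, errors="replace")


text_out = escape_text("a<b & \u00e9", "ascii")
cdata_out = encode_cdata("x \u00e9", "ascii", substitute_chars="*")
fallback_out = encode_cdata("\u00e9", "ascii", substitute_chars="\u2022")
```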
The XML to be written is assumed to be well formed. The output is undefined if an attempt is made to write XML that is not well formed, such as a Comment containing "--", or an Element containing two attributes with the same name.
Namespace prefixes, declarations and URIs are not checked for consistency by the DOMWriter.
If necessary, the (there is one, right) function from the validation module should be used to bring these items into a consistent state within the DOM prior to writing the document.
interface DOMFormatter {
  attribute DOMString encoding;
  readonly attribute DOMString lastEncoding;
  attribute DOMString substituteChars;
  attribute unsigned short format;
  void formatNode(in Node rootNode, in DOMOutputStream destination)
      raises(DOMSystemException);
  void formatTreeWalker(in TreeWalker tree, in DOMOutputStream destination)
      raises(DOMSystemException);
};
encoding of type DOMString
format of type unsigned short
lastEncoding of type DOMString, readonly
substituteChars of type DOMString
When a character cannot be represented in the output and cannot be replaced with a character reference, the substituteChars string will be output in its place. If any of the characters from the substituteChars string cannot be represented, they will be replaced by '?'. ('?' can be represented in all known character encodings.)
formatNode
Applications should call DOMWriter.writeNode() instead, which will indirectly call back here. This interface (and this method) are intended to be implemented by classes that provide alternative output formats for DOM documents.
rootNode of type Node
destination of type DOMOutputStream
DOMSystemException: This exception will be raised in response to any kind of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.
formatTreeWalker
Applications should call DOMWriter.writeTreeWalker() instead, which will indirectly call back here. This interface (and this method) are intended to be implemented by classes that provide alternative output formats for DOM documents.
tree of type TreeWalker
destination of type DOMOutputStream
DOMSystemException: This exception will be raised in response to any kind of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.