```java
import com.ibm.xml.parser.*;
import java.io.*;
....
String filename;
....
InputStream is = new FileInputStream(filename);
TXDocument doc = new Parser(filename).readStream(is);
```
Parser#readStream() never returns null.
Used this way, the parser prints parse errors to the standard error stream.
To access the parse tree, use TXDocument#getDocumentElement() (see How to operate).
TXDocument#getDocumentElement() may return null when the XML document has serious errors.
A Parser instance cannot be reused. An application can call the Parser#readStream() method only once.
You can write the parse tree back out to a stream in XML format.
```java
String charset = "ISO-8859-1";  // MIME charset name
String jencode = MIME2Java.convert(charset);
PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, jencode));
doc.setEncoding(charset);
doc.print(pw, jencode);
```
You can configure the parser's behavior after creating the Parser instance and before calling readStream().
setErrorNoByteMark(boolean)
setKeepComment(boolean)
setPreserveSpace(boolean)
setWarningNoDoctypeDecl(boolean)
setWarningNoXMLDecl(boolean)
setWarningRedefinedEntity(boolean)
```java
import com.ibm.xml.parser.*;
import java.io.*;
....
String filename;
....
Parser parse = new Parser(filename);
parse.setWarningNoDoctypeDecl(false);
parse.setWarningNoXMLDecl(false);
InputStream is = new FileInputStream(filename);
TXDocument doc = parse.readStream(is);
```
You can control the output of errors produced by the parser.
Create an instance of a class implementing ErrorListener, and pass the instance to the Parser constructor.
The Object key parameter of the error() method is an instance of String or Exception.
When key is a String, it identifies the type of error (see the source com/ibm/xml/parser/r/Message.java).
```java
import com.ibm.xml.parser.*;
import java.io.*;

class ErrorIgnorer implements ErrorListener {
    public void error(String fname, int lineno, int charoff, Object key, String mes) {
        // do nothing
    }
}
....
String filename;
....
InputStream is = new FileInputStream(filename);
Parser parse = new Parser(filename, new ErrorIgnorer(), null);
TXDocument doc = parse.readStream(is);
```
```java
import com.ibm.xml.parser.*;
import java.awt.TextArea;
import java.io.*;

// Collects parse errors in a text area instead of printing them.
class ErrorEater extends TextArea implements ErrorListener {
    String m_fname;
    ErrorEater(String n) {
        super();
        m_fname = n;
    }
    public void error(String fname, int lineno, int charoff, Object key, String mes) {
        append((null == fname ? m_fname : fname)+":"+lineno+":"+mes+"\n");
    }
}
....
String filename;
....
InputStream is = new FileInputStream(filename);
ErrorEater ee = new ErrorEater(filename);  // declared here; the original left ee undeclared
Parser parse = new Parser(filename, ee, null);
TXDocument doc = parse.readStream(is);
```
See the sources com/ibm/xml/parser/trlxml.java and com/ibm/xml/parser/Stderr.java.
A TXDocument can have one TXElement instance, zero or one DTD instance, and instances of TXPI and TXComment as children.
All children of a TXDocument can be accessed with TXDocument#getChildren() / TXDocument#getChildrenArray().
The TXElement instance can also be accessed with TXDocument#getDocumentElement().
A TXElement can have instances of TXElement, TXText, TXPI and TXComment as children.
All children of a TXElement can be accessed with TXElement#getChildren() / TXElement#getChildrenArray().
Some methods of TXDocument and TXElement return instance(s) of the Child interface.
These Child instances are also instances of TXElement, TXText, TXPI, TXComment, or DTD (DTD only as a child of TXDocument).
To find out which class an instance belongs to, use Node#getNodeType() or the instanceof operator, as in the following:
```java
import com.ibm.xml.parser.*;
import org.w3c.dom.*;
import java.util.Enumeration;
....
TXDocument doc = ....;
TXElement root = (TXElement)doc.getDocumentElement();
Enumeration en = root.elements();
while (en.hasMoreElements()) {
    Node ch = (Node)en.nextElement();
    if (ch instanceof TXElement) {
        TXElement el = (TXElement)ch;
        ....
    } else if (ch instanceof TXText) {
        TXText te = (TXText)ch;
        ....
    }
}
```
The processor keeps all white space and passes it to applications, according to 2.10 White Space Handling in the XML 1.0 Proposed Recommendation.
The processor sets the IsIgnorableWhitespace flag on TXText instances that consist only of white space.
```
<MEMBERS>
 <PERSON>Hiroshi</PERSON>
 <PERSON>Naohiko</PERSON>
 <PERSON>
 Kent
 </PERSON>
</MEMBERS>
```
The processor parses this element as follows.
```
TXElement (getName():"MEMBERS", getText():"\n Hiroshi\n Naohiko\n \n Kent\n \n")
    TXText ("\n ", ignorable)
    TXElement (getName():"PERSON", getText():"Hiroshi")
        TXText ("Hiroshi")
    TXText ("\n ", ignorable)
    TXElement (getName():"PERSON", getText():"Naohiko")
        TXText ("Naohiko")
    TXText ("\n ", ignorable)
    TXElement (getName():"PERSON", getText():"\n Kent\n ")
        TXText ("\n Kent\n ")
    TXText ("\n", ignorable)
```
It is useful to call TXText#trim(String) / TXText#trim(String,boolean,boolean) when an application does not need leading/trailing spaces.
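The white-space behavior described above can be observed with any DOM parser that preserves white space. The following is a minimal sketch that uses the JDK's built-in javax.xml.parsers DOM parser rather than XML for Java, so the class name WhitespaceSketch and its helper methods are assumptions made for illustration only. It counts the white-space-only text children of MEMBERS and trims the text of the last PERSON element with String#trim().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative only: uses the JDK DOM parser, not com.ibm.xml.parser.
class WhitespaceSketch {
    static final String XML =
        "<MEMBERS>\n <PERSON>Hiroshi</PERSON>\n <PERSON>Naohiko</PERSON>\n"
        + " <PERSON>\n Kent\n </PERSON>\n</MEMBERS>";

    // Parses an XML string into a DOM document; white space is preserved by default.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the sample document", e);
        }
    }

    // Counts direct text children that consist only of white space.
    static int countWhitespaceOnlyText(Node parent) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Returns the text of the n-th PERSON element without leading/trailing white space.
    static String trimmedPerson(Document doc, int n) {
        return doc.getElementsByTagName("PERSON").item(n).getTextContent().trim();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(countWhitespaceOnlyText(doc.getDocumentElement()));
        System.out.println("[" + trimmedPerson(doc, 2) + "]");
    }
}
```

The white-space-only text nodes found here correspond to the nodes marked ignorable in the tree above. The trimmed value is what an application usually wants from the third PERSON element.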
```java
class AElementHandler implements ElementHandler {
    public TXElement handleElement(TXElement el) {
        ....
    }
}
....
Parser parse = new Parser(...);
parse.setElementHandler(new AElementHandler(), "CHANNEL");
TXDocument doc = parse.readStream(is);
```
The ElementHandler#handleElement() method is called during Parser#readStream(), after each end tag (</CHANNEL>) is parsed and before the element is added to its parent.
The parser adds to the parent the TXElement instance returned by handleElement().
If handleElement() returns null, the parser doesn't add the TXElement instance to the parent.
There are two ways to register an ElementHandler:
For a specific TXElement: addElementHandler(handler, "CHANNEL"); the handler is called for each </CHANNEL> tag.
For all TXElement instances: addElementHandler(handler);
When more than one ElementHandler is registered with the parser, the parser first calls the ElementHandlers for the specific TXElement (first set, first called) and then calls the ElementHandlers for all TXElement instances.
Even if an ElementHandler changes the name of a TXElement, the parser calls the other ElementHandlers registered for the original name.
When an ElementHandler returns null, the parser doesn't call the remaining ElementHandlers.
```java
Parser parse = new Parser(...);
parse.addElementHandler(handler1);
parse.addElementHandler(handler2, "CHANNEL");
parse.addElementHandler(handler3, "CHANNEL");
parse.addElementHandler(handler4);
TXDocument doc = parse.readStream(is);
```
In this case, when the parser processes a </CHANNEL> tag, it calls handler2 first, then handler3, handler1 and handler4.
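The calling order can be modeled in a few lines. The sketch below is a self-contained simulation, not the parser's actual implementation: the class HandlerOrderSketch is hypothetical, and handlers are represented by plain labels instead of ElementHandler instances. It covers only the ordering rule (handlers for a specific name in registration order, then handlers for all elements) and does not model a handler returning null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the parser's handler registry; not part of XML for Java.
class HandlerOrderSketch {
    private final Map<String, List<String>> specific = new LinkedHashMap<>();
    private final List<String> general = new ArrayList<>();

    // Registers a handler (identified by a label) for every element.
    void addElementHandler(String handler) {
        general.add(handler);
    }

    // Registers a handler for elements with the given tag name only.
    void addElementHandler(String handler, String tagName) {
        specific.computeIfAbsent(tagName, k -> new ArrayList<>()).add(handler);
    }

    // Returns the handler labels in the order they would be called for an end tag.
    List<String> callOrder(String tagName) {
        List<String> order = new ArrayList<>();
        List<String> forName = specific.get(tagName);
        if (forName != null) {
            order.addAll(forName);   // name-specific handlers first, in registration order
        }
        order.addAll(general);       // then handlers registered for all elements
        return order;
    }

    public static void main(String[] args) {
        HandlerOrderSketch registry = new HandlerOrderSketch();
        registry.addElementHandler("handler1");
        registry.addElementHandler("handler2", "CHANNEL");
        registry.addElementHandler("handler3", "CHANNEL");
        registry.addElementHandler("handler4");
        System.out.println(registry.callOrder("CHANNEL"));
        System.out.println(registry.callOrder("ITEM"));
    }
}
```

For the "CHANNEL" tag the simulated order is handler2, handler3, handler1, handler4. For any other tag name only handler1 and handler4 are called.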
Create a DefaultElementFactory instance.
ElementFactory factory = new DefaultElementFactory();
Create a TXDocument instance with the createDocument() method.
TXDocument doc = factory.createDocument();
Add the root element.
doc.addElement(factory.createElement("ROOT"));
Create a PrintWriter.
Set the encoding on the TXDocument if the encoding of the PrintWriter isn't UTF-8.
Print the document with the Format class.
Format.print(doc, pwriter);
```java
ElementFactory factory = new DefaultElementFactory();
TXDocument doc = factory.createDocument();
TXElement el = factory.createElement("CHANNEL");
....
doc.addElement(el);
PrintWriter pw = new PrintWriter(new OutputStreamWriter(System.out, MIME2Java.convert("Shift_JIS")));
doc.setEncoding("Shift_JIS");
Format.print(doc, pw);
```
XML representation | How to create |
---|---|
<?xml version="1.0" encoding="ISO-8859-1"?> | TXDocument doc = factory.createDocument(); |
<?footarget foodata?> | TXPI pi = factory.createPI("footarget", " foodata"); |
<?footarget?> | TXPI pi = factory.createPI("footarget", ""); |
<!-- comment --> | TXComment comm = factory.createComment(" comment "); |
<!DOCTYPE ROOT SYSTEM "root.dtd"> | DTD dtd = factory.createDTD("ROOT", new ExternalID("root.dtd")); |
<!DOCTYPE ROOT [...]> | DTD dtd = factory.createDTD("ROOT", null); |
<!ELEMENT ROOT EMPTY> | ElementDecl ed = factory.createElementDecl("ROOT", factory.createContentModel(ElementDecl.EMPTY)); |
<!ELEMENT ROOT (#PCDATA\|FOO\|BAR)*> | CMNode model = new CM1op('*', new CM2op('\|', new CM2op('\|', new CMLeaf("#PCDATA"), new CMLeaf("FOO")), new CMLeaf("BAR"))); |
<!ELEMENT ROOT (FOO?, (DL\|DD)+, BAR*)> | CMNode model = new CM2op(',', new CM2op(',', new CM1op('?', new CMLeaf("FOO")), new CM1op('+', new CM2op('\|', new CMLeaf("DL"), new CMLeaf("DD")))), new CM1op('*', new CMLeaf("BAR"))); |
<!ATTLIST ROOT att1 CDATA #IMPLIED att2 (A\|B\|O\|AB) "A"> | Attlist al = factory.createAttlist("ROOT"); |
<!NOTATION png SYSTEM "viewpng.exe"> | TXNotation no = factory.createNotation("png", new ExternalID("viewpng.exe")); |
<!ENTITY version.num "1.1.6"> | Entity ent = factory.createEntity("version.num", "1.1.6", false); |
<!ENTITY version.num SYSTEM "version.ent"> | Entity ent = factory.createEntity("version.num", new ExternalID("version.ent"), null); |
<!ENTITY logoicon SYSTEM "logo.png" NDATA png> | Entity ent = factory.createEntity("logoicon", new ExternalID("logo.png"), "png"); |
<ROOT att1="val1" att2="val2">any text</ROOT> | TXElement el = factory.createElement("ROOT"); |
<![CDATA[any text]]> | TXCDATASection cd = factory.createCDATASection("any text"); |
&foobar; | GeneralReference gr = factory.createGeneralReference("foobar"); |
All XML nodes can be created with ` |
If you want the parser to use a subclass of TXElement instead of the TXElement class itself, implement the ElementFactory interface and call Parser#setElementFactory().
Make a subclass of the TXElement class.
Make a subclass of the DefaultElementFactory class.
Call Parser#setElementFactory() with an instance of the class implementing ElementFactory.
```java
class MyElement extends TXElement {
    ....
}

class MyElementFactory extends DefaultElementFactory {
    public TXElement createElement(String name) {
        MyElement el = new MyElement(name);
        el.setFactory(this);
        return el;
    }
    ....
}
....
Parser parse = new Parser(...);
parse.setElementFactory(new MyElementFactory());
TXDocument doc = parse.readStream(is);
// doc has not TXElement instances but MyElement instances
```
You must call setFactory(this) in the create*() methods of your factory class.
```java
String systemlit = "http://.../foobar.dtd";
InputStream is = (new URL(systemlit)).openStream();
Parser parse = new Parser(...);
DTD dtd = parse.readDTDStream(is);
```
```java
Enumeration en = dtd.getAttributeDeclarations("FOO");
while (en.hasMoreElements()) {
    AttDef attd = (AttDef)en.nextElement();
    // attd.getName() is attribute name
}
```
First, get an AttDef instance by the above method or by DTD#getAttributeDeclaration(String,String).
Second, check the attribute type with AttDef#getDeclaredType(), which returns one of the following values.
AttDef.CDATA
AttDef.ENTITIES
```java
Enumeration en = dtd.getEntities();
while (en.hasMoreElements()) {
    EntityValue ev = (EntityValue)en.nextElement();
    if (ev.isNDATA()) {
        // Each ev.getName() is a valid value.
    }
}
```
AttDef.ENTITY
AttDef.NAME_TOKEN_GROUP
Valid values are listed by AttDef#elements().
```java
Enumeration en = attd.elements();
while (en.hasMoreElements()) {
    String s = (String)en.nextElement();
    // Each s is valid.
}
```
AttDef.ID
A new ID can be used when DTD#checkID() returns null.
```java
String newid = ...
if (null != dtd.checkID(newid)) {
    // Can't use newid
} else
    dtd.registID(element, newid);
```
AttDef.IDREF
```java
Enumeration en = dtd.IDs();
while (en.hasMoreElements()) {
    String id = (String)en.nextElement();
    // The attribute can take any one of these ids.
}
```
AttDef.IDREFS
AttDef.NMTOKEN
AttDef.NMTOKENS
AttDef.NOTATION
Valid values are listed by AttDef#elements().
```java
Enumeration en = attd.elements();
while (en.hasMoreElements()) {
    String s = (String)en.nextElement();
    // Each s is valid.
}
```
<!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>
With this declaration, a "PERSON" element must contain a "NAME" element first, a "HEIGHT" element second and a "WEIGHT" element third, and may then contain an "EMAIL" element.
Applications can query such rules with DTD#getInsertableElements() / DTD#getAppendableElements().
```java
TXElement el = new TXElement("PERSON");
....
switch (dtd.getContentType("PERSON")) {
  case 0:
    // This element is not declared.
    break;
  case ElementDecl.EMPTY:
    // No element is insertable.
    break;
  case ElementDecl.ANY:
    // Any element is insertable.
    break;
  case ElementDecl.MODEL_GROUP:
    Hashtable tab = dtd.prepareTable("PERSON");
    // This hashtable is reusable for any elements.
    dtd.getAppendableElements(el, tab);
    if (((InsertableElement)tab.get(DTD.CM_ERROR)).status) {
        // This element has incorrect structure.
    } else {
        Enumeration en = tab.elements();
        while (en.hasMoreElements()) {
            InsertableElement ie = (InsertableElement)en.nextElement();
            if (!ie.name.equals(DTD.CM_ERROR)
                && !ie.name.equals(DTD.CM_EOC)
                && ie.status) {
                if (ie.name.equals(DTD.CM_PCDATA)) {
                    // Can append TextElement instance to el.
                } else {
                    // Can append Element instance named ie.name.
                }
            }
        }
    }
    break;
}
```
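To make the idea of "appendable elements" concrete, here is a minimal self-contained sketch for the PERSON declaration above. It is not how the DTD class works internally: SequenceModelSketch is a hypothetical class that handles only a flat sequence of distinct names, each either required or optional ('?'). Given the children already present, it returns the names that may be appended next, or null when the existing children violate the model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a flat sequence model such as (NAME, HEIGHT, WEIGHT, EMAIL?).
class SequenceModelSketch {
    private final String[] names;
    private final boolean[] optional;

    SequenceModelSketch(String[] names, boolean[] optional) {
        this.names = names;
        this.optional = optional;
    }

    // Returns the names that may be appended next, or null if the children are invalid.
    List<String> appendable(List<String> children) {
        int pos = 0;
        for (String child : children) {
            // Skip optional items until the model position matches this child.
            while (pos < names.length && !names[pos].equals(child)) {
                if (!optional[pos]) {
                    return null;   // a required element was skipped
                }
                pos++;
            }
            if (pos == names.length) {
                return null;       // the child is not allowed at this point
            }
            pos++;
        }
        // Everything up to and including the next required item may follow.
        List<String> result = new ArrayList<>();
        for (int i = pos; i < names.length; i++) {
            result.add(names[i]);
            if (!optional[i]) {
                break;
            }
        }
        return result;
    }

    // Builds the model for <!ELEMENT PERSON (NAME, HEIGHT, WEIGHT, EMAIL?)>.
    static SequenceModelSketch person() {
        return new SequenceModelSketch(
            new String[] {"NAME", "HEIGHT", "WEIGHT", "EMAIL"},
            new boolean[] {false, false, false, true});
    }

    public static void main(String[] args) {
        SequenceModelSketch model = person();
        List<String> children = new ArrayList<>();
        System.out.println(model.appendable(children));
        children.add("NAME");
        System.out.println(model.appendable(children));
    }
}
```

An empty PERSON element accepts only NAME, and after NAME only HEIGHT can be appended. A real content model also has choices and repetition, which this sketch deliberately leaves out.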
The Namespace specification is still in progress. This implementation is experimental.
Call Parser#setProcessNamespace(true) when you need the namespace feature.
getTagName() / getName() always return a qualified name.
getNSName() / getNSLocalName() have no value without namespace support.
After TXElement#setTagName() / TXPI#setName(), getNSName() / getNSLocalName() return null.
setNSName() / setNSLocalName() don't change the return value of getTagName() / getName().

Method | "rdf:assertion" without namespace support | "rdf:assertion" with namespace support | "author" with namespace support |
---|---|---|---|
TXElement#getTagName() / TXAttribute#getName() / TXPI#getName() | "rdf:assertion" | "rdf:assertion" | "author" |
getNSName() | null | "http://www.w3.org/TR/WD-rdf-syntax/" | null |
getNSLocalName() | null | "assertion" | null |
getUniversalName() | null | "http://www.w3.org/TR/WD-rdf-syntax/:assertion" | "author" |
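The "with namespace support" columns of the table can be reproduced with simple string handling. The sketch below is an illustration only, not the library's implementation: UniversalNameSketch and its prefix map are hypothetical, and the treatment of an undeclared prefix (returning the name unchanged) is an assumption. It joins the namespace name and the local part with ":" as the table shows.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; mirrors the "with namespace support" columns of the table.
class UniversalNameSketch {
    // Returns the part after the prefix, or null when the name has no prefix.
    static String localName(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        return colon < 0 ? null : qualifiedName.substring(colon + 1);
    }

    // Returns namespace name + ":" + local part, or the name itself when no namespace applies.
    static String universalName(String qualifiedName, Map<String, String> prefixToUri) {
        int colon = qualifiedName.indexOf(':');
        if (colon < 0) {
            return qualifiedName;             // unprefixed name, e.g. "author"
        }
        String uri = prefixToUri.get(qualifiedName.substring(0, colon));
        if (uri == null) {
            return qualifiedName;             // assumption: undeclared prefix is left as-is
        }
        return uri + ":" + qualifiedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("rdf", "http://www.w3.org/TR/WD-rdf-syntax/");
        System.out.println(universalName("rdf:assertion", prefixes));
        System.out.println(universalName("author", prefixes));
        System.out.println(localName("rdf:assertion"));
    }
}
```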
TXElement / TXText / TXComment / TXPI have a getDigest() method.
This method returns a digest (hash) value (128-bit MD5 by default).
TXElement#getDigest() returns a digest value computed from the element itself and all of its children.
When a child is modified, getDigest() of every ancestor element returns a new digest value.
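The way a digest propagates from children to ancestors can be sketched with the JDK's java.security.MessageDigest. This is an assumption-laden illustration, not the algorithm XML for Java uses: DigestSketch and SketchNode are hypothetical, and the choice of hashing the node's label, its text, and then each child's digest is made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a digest that covers a node and all of its descendants.
class DigestSketch {
    // Hypothetical minimal tree node with a label, optional text, and children.
    static class SketchNode {
        final String label;
        String text;
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String label, String text) {
            this.label = label;
            this.text = text;
        }

        // Hashes the label, the text, and the digest of every child (recursively).
        byte[] digest() {
            try {
                MessageDigest md = MessageDigest.getInstance("MD5");
                md.update(label.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);  // separator between label and text
                md.update(text.getBytes(StandardCharsets.UTF_8));
                for (SketchNode child : children) {
                    md.update(child.digest());
                }
                return md.digest();   // 16 bytes = 128 bits
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("MD5 is not available", e);
            }
        }
    }

    // Renders a digest as lowercase hexadecimal for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("MEMBERS", "");
        SketchNode person = new SketchNode("PERSON", "Hiroshi");
        root.children.add(person);
        System.out.println(hex(root.digest()));
        person.text = "Naohiko";          // modify a child
        System.out.println(hex(root.digest()));
    }
}
```

The two printed values differ because the root's digest includes the child's digest. This is why such a digest can be used to detect whether any part of a subtree has changed.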
Code that uses the namespace feature needs substantial rewriting.
Old | New |
---|---|
TXElement#searchChildren() | TXElement#getElementNamed() |
TXElement#getNthElementByTagName() | TXElement#getNthElementNamed() |
Old | New |
---|---|
NodeIterator#getCurrent() | removed |
NodeIterator#toNext() | NodeIterator#toNextNode() |
NodeIterator#toPrevious() | NodeIterator#toPrevNode() |
NodeIterator#toFirst() | NodeIterator#toFirstNode() |
NodeIterator#toLast() | NodeIterator#toLastNode() |
NodeIterator#toNth(int) | NodeIterator#moveTo(int) |
NodeIterator#toNode(Node) | removed |
NodeIterator ni = el.getChildNodes(); for (Node n = ni.toFirst(); null != n; n = ni.toNext()) { .... } | NodeIterator ni = el.getChildNodes(); Node n; while (null != (n = ni.toNextNode())) { .... } |
The print() methods that Document#print() calls in other classes no longer add extra white space.
```java
TXElement top = new TXElement("FOO");
top.addElement(new TXElement("BAR"));
// System.out is an OutputStream, so it needs an OutputStreamWriter.
top.print(new PrintWriter(new OutputStreamWriter(System.out)), null, 0);
```
Output with extra white space:
```
<FOO>
  <BAR/>
</FOO>
```
Output without extra white space:
```
<FOO><BAR/></FOO>
```
When you want formatted output, use the com.ibm.xml.parser.Format class.
If you have a class that implements StreamProducer, add a closeInputStream(java.io.InputStream) method to it.
TXElement#TXElement(String,String,String) was removed.
The W3C Document Object Model (DOM) working draft was updated on 18 Mar. 1998. XML for Java now follows this new draft, so some DOM APIs have changed from the previous version of XML for Java.
NodeEnumerator was removed. Use NodeIterator instead.
```java
NodeEnumerator ne = parent.getChildren().getEnumerator();
Node ch;
while (null != (ch = ne.getNext())) {
    :
    :
```
```java
NodeIterator ni = parent.getChildNodes();
for (Node ch = ni.toFirst(); null != ch; ch = ni.toNext()) {
    :
    :
```
old | new |
---|---|
Node.NodeType.* | Node.* |
Node#hasChildren() | Node#hasChildNodes() |
TXAttribute.T_ENUMERATION | AttDef.NAME_TOKEN_GROUP |
TXAttribute.T_* | AttDef.* |
TXAttribute.S_TYPESTR | AttDef.S_TYPESTR |
AttDef.D_* | AttDef.* |
AttDef#setDefaultValue() | AttDef#setDefaultStringValue() |
AttDef#getDefaultValue() | AttDef#getDefaultStringValue() |
AttDef#setType() | AttDef#setDeclaredType() |
AttDef#getType() | AttDef#getDeclaredType() |
AttDef#setDefault() | AttDef#setDefaultType() |
AttDef#getDefault() | AttDef#getDefaultType() |
DTD.CM_EMPTY / DTD.CM_ANY / DTD.CM_REGULAR | ElementDecl.EMPTY / ElementDecl.ANY / ElementDecl.MODEL_GROUP |