Subject: Re: DOCBOOK-APPS: xmllint and &
Daniel Veillard writes:
> On Tue, Dec 17, 2002 at 10:58:04AM -0500, Jeff Beal wrote:
> > I'm getting the following error when parsing my documentation with xmllint:
> [...]
> > When I edit my local copy of the DocBook DTD and remove the following line
> > from the iso-num.ent file, everything works:
> > <!ENTITY amp "&#38;"> <!-- AMPERSAND -->
> >
> > Any comments or suggestions on how to fix this without messing with the DTD?
> > I have, by the way, verified that xmllint is reading the other character
> > entities just fine. It seems only to be a problem with the &amp; entity.
>
> And I don't understand what's happening, no such problem on
> a smaller testcase:
>
> paphio:~/XML -> cat tst.xml
> <?xml version="1.0" ?>
> <!DOCTYPE foobar SYSTEM "tst.dtd">
> <foobar></foobar>
> paphio:~/XML -> cat tst.dtd
> <!ENTITY amp "&#38;"> <!-- AMPERSAND -->
> paphio:~/XML -> xmllint --loaddtd --noout tst.xml
> paphio:~/XML ->
>
> and it's the first time I heard of such a problem.
> however I note that the DTDs installed on my system for DocBook have
> <!ENTITY amp "&#38;#38;"> <!-- AMPERSAND -->
> instead in docbook/xml-dtd-4.2-1.0-14/ent/iso-num.ent
> but older version had the old style declaration but commented:
> 3.1.7/ent/iso-num.ent:
> <!-- predeclared in XML <!ENTITY amp "&#38;"--> <!-- AMPERSAND -->
>
> strange,

There's nothing strange here. It's just one of the reasons why you don't
want to mix SGML and XML applications on Unix.

The reason you don't see a problem in your test is that you never use the
entity. If you add an '&amp;' to tst.xml and run xmllint --loaddtd, you will
get the error. So your test case is a bit too small.

XML *requires* amp to be declared as
<!ENTITY amp "&#38;#38;">
(or &#x26;#x26; if one prefers hex codes). See section 4.6 of the XML spec.

I think the reason is that reading the entity declaration makes &#38; from
&#38;#38;, which is read again when the entity is used, giving &. If you
just declare it as &#38;, reading the entity declaration gives &, and when
the entity is used a single bare '&' is found, which is not well-formed.

Similar arguments apply to &lt;, which must be declared as "&#38;#60;" and
not just "&#60;".

For SGML, &#38; or & for amp is ok. But SGML even accepts 'abc & def' in
PCDATA.

So the answer to the initial question is: no, this cannot be fixed without
changing the DTD, since it's the DTD that is broken.

The only thing one might consider in libxml is a warning whenever a
predefined entity is declared in a way that differs from what the XML spec
requires. The spec says (again section 4.6):

...
If the entities in question are declared, they must be declared as internal
entities whose replacement text is the single character being escaped or a
character reference to that character, as shown below.
...

greetings
Morus
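
PS: the two-pass expansion described above can be tried out with a small Python sketch. This is only a toy model and not how xmllint or libxml work internally. The names replacement_text and expand_on_use are made up for illustration, and the second pass handles single escaped characters only, not entities whose text contains real markup. Pass 1 models reading the declaration (character references in the quoted value are expanded once), pass 2 models using the entity (the stored text is parsed again).

```python
import re

# Matches a decimal or hexadecimal character reference such as &#38; or &#x26;
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_char_refs(text):
    """Replace each character reference with its character, in one pass.

    re.sub does not rescan its own output, so "&#38;#38;" becomes "&#38;"
    and is not expanded any further here.
    """
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)
    return CHAR_REF.sub(repl, text)


def replacement_text(literal_value):
    """Pass 1 (hypothetical helper): what is stored when the declaration is read."""
    return expand_char_refs(literal_value)


def expand_on_use(stored):
    """Pass 2 (hypothetical helper): the stored text is parsed again on use.

    Toy rule: anything left over after removing character references must not
    contain a bare '&' or '<', since those would be read as markup.
    """
    leftover = CHAR_REF.sub("", stored)
    if "&" in leftover or "<" in leftover:
        raise ValueError("bare markup character in replacement text: %r" % stored)
    return expand_char_refs(stored)
```

In this model, expand_on_use(replacement_text("&#38;#38;")) returns a single '&', while the same call with "&#38;" raises ValueError, because pass 1 has already turned the value into a bare '&'. The same holds for "&#38;#60;" versus "&#60;". A value like "&#62;" for gt goes through both passes unharmed, which fits the spec not asking for double escaping there.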