Subject: Re: Issue: Shall we add the pattern facet to <text>?
> The parser receives start tag events, end tag events, character events,
> etc. It creates chunks of text by concatenating adjacent character
> events. Then, it validates each chunk of text (after concatenation
> wherever possible) against the pattern.

That's certainly one possibility, but I strongly dislike it. I don't believe the chunking in mixed text should be significant. For example, if I have

  <p>This is <em>mixed</em> content.</p>

the two chunks "This is " and " content." do not represent any sort of logical grouping. It would make much more sense to validate the string

  "This is mixed content."

(i.e. the concatenation of the string descendants). But that's not always the right thing either:

  <p>This is mixed<footnote>See page 72</footnote> content.</p>

Here the concatenation would run the footnote text straight into the sentence.

As far as I remember, the main issue from the I18N folks was to be able to restrict the set of characters. For this it would make more sense to check each individual character (this can in fact be implemented very efficiently). The way to do this would be to allow the specification of a character class. The pattern facet is not really the right kind of thing for this, since it is designed for constraining strings, not individual characters.

I think we shouldn't do this for 1.0 because

- it's too hard (to do it well we would need to invent a specification for character classes)

- it's on the wrong side of the 80/20 line (especially when compared to some of the things we have already decided to leave out, such as minOccurs and maxOccurs)

In 2.0, we could consider putting some sort of specification of character classes inside <text/>. In the meantime, implementors can experiment by adding annotations inside <text/>.

James
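
To make the three options discussed above concrete, here is a minimal Python sketch. It is not taken from any specification or implementation: the function names (text_chunks, descendant_string, chunks_match, chars_allowed) are hypothetical, and a plain predicate on single characters stands in for a "character class", since no syntax for one has been defined.

```python
import re
import xml.etree.ElementTree as ET


def text_chunks(elem):
    """Yield the text chunks that are direct children of elem.

    These are the runs of character data a parser would see between
    the child elements of elem (option 1: per-chunk validation).
    """
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tail:
            yield child.tail


def descendant_string(elem):
    """Concatenate all string descendants of elem (option 2)."""
    return "".join(elem.itertext())


def chunks_match(elem, pattern):
    """Option 1: every direct chunk must match the pattern in full."""
    return all(re.fullmatch(pattern, chunk) for chunk in text_chunks(elem))


def chars_allowed(elem, allowed):
    """Option 3: check each character on its own.

    `allowed` is a hypothetical stand-in for a character class: any
    function from a single character to bool. The result does not
    depend on how the text happens to be split into chunks.
    """
    return all(allowed(ch) for chunk in text_chunks(elem) for ch in chunk)


mixed = ET.fromstring("<p>This is <em>mixed</em> content.</p>")
footnoted = ET.fromstring(
    "<p>This is mixed<footnote>See page 72</footnote> content.</p>"
)

chunks = list(text_chunks(mixed))
joined = descendant_string(mixed)
joined_footnote = descendant_string(footnoted)
ascii_only = chars_allowed(mixed, lambda ch: ord(ch) < 128)
```

With the two example documents, chunks holds the two fragments "This is " and " content.", joined is the full sentence, and joined_footnote shows the footnote text run into the sentence. The per-character check gives the same answer however the text is chunked, which is why it sidesteps the grouping problem.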