relax-ng message

Subject: Re: Issue: Shall we add the pattern facet to <text>?

From: James Clark <jjc@jclark.com>
To: Murata Makoto <mura034@attglobal.net>, relax-ng@lists.oasis-open.org
Date: Thu, 28 Jun 2001 10:47:36 +0700


> The parser receives start tag events, end tag events, character events,
> etc.  It creates chunks of text by concatenating adjacent character
> events.   Then, it validates each chunk of text  (after concatenation
> wherever possible)  against the pattern.

That's certainly one possibility, but I strongly dislike it.  I don't 
believe the chunking in mixed text should be significant.  For example, if 
I have

<p>This is <em>mixed</em> content.</p>

the two chunks "This is " and " content" do not represent any sort of 
logical grouping.  It would make much more sense to validate the string:

  "This is mixed content"

(i.e. the concatenation of the string descendants).  But that's not always 
the right thing either:

<p>This is mixed<footnote>See page 72</footnote> content.</p>
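
To make the difference concrete, here is a rough sketch in Python using the 
standard xml.etree.ElementTree module.  The helper names text_chunks and 
string_value are made up for illustration and are not part of any RELAX NG 
implementation.  It shows the per-chunk view next to the concatenated view 
for the two examples above.

```python
import xml.etree.ElementTree as ET

def text_chunks(elem):
    """Text chunks that are direct children of elem (hypothetical helper)."""
    chunks = [elem.text or ""]
    for child in elem:
        # In ElementTree, text following a child element is stored as its tail.
        chunks.append(child.tail or "")
    return [c for c in chunks if c]

def string_value(elem):
    """Concatenation of all descendant text, in document order."""
    return "".join(elem.itertext())

p1 = ET.fromstring("<p>This is <em>mixed</em> content.</p>")
p2 = ET.fromstring(
    "<p>This is mixed<footnote>See page 72</footnote> content.</p>"
)

print(text_chunks(p1))   # ['This is ', ' content.']
print(string_value(p1))  # This is mixed content.
print(string_value(p2))  # This is mixedSee page 72 content.
```

In the second case the footnote text lands in the middle of the sentence, 
so neither the chunks nor the concatenation is an obviously sensible string 
to match a pattern against.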

As far as I remember, the main issue from the I18N folks was to be able to 
restrict the set of characters.  For this it would make more sense to check 
each individual character (this can in fact be implemented very 
efficiently).  The way to do this would be to allow the specification of a 
character class.  The pattern facet is not really the right kind of thing 
for this, since it is designed for constraining strings, not individual 
characters.
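
As a rough illustration of the per-character approach, here is a sketch in 
Python.  The representation of a character class as a list of inclusive 
code point ranges, and the names in_class, disallowed_chars and 
BASIC_LATIN, are assumptions made for this example only; nothing like this 
has been specified.

```python
import xml.etree.ElementTree as ET

# Assumed representation: a character class is a list of inclusive
# (low, high) code point ranges.  Printable ASCII is used as an example.
BASIC_LATIN = [(0x20, 0x7E)]

def in_class(ch, ranges):
    """True if the character's code point falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def disallowed_chars(elem, ranges):
    """Characters in elem's text (at any depth) that are outside the class."""
    return [
        ch
        for text in elem.itertext()
        for ch in text
        if not in_class(ch, ranges)
    ]

ok = ET.fromstring("<p>This is <em>mixed</em> content.</p>")
bad = ET.fromstring("<p>caf\u00e9 <em>menu</em></p>")

print(disallowed_chars(ok, BASIC_LATIN))   # []
print(disallowed_chars(bad, BASIC_LATIN))  # ['é']
```

Because each character is checked on its own, the result does not depend 
on how the text happens to be split into chunks by the markup.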

I think we shouldn't do this for 1.0 because

- it's too hard (to do it well we would need to invent a specification for 
character classes)
- it's on the wrong side of the 80/20 line (especially when compared to 
some of the things we have already decided to leave out, such as minOccurs 
and maxOccurs)

In 2.0, we could consider putting some sort of specification of character 
classes inside <text/>. In the meantime, implementors can experiment by 
adding annotations inside <text/>.

James

References:
- Re: Issue: Shall we add the pattern facet to <text>?
  - From: Murata Makoto <mura034@attglobal.net>