OASIS Mailing List Archives

lexidma message


Subject: Whitespace and the annotation module

Hi all,

One more comment for the 2nd public review. I have been thinking about our rule for whitespace in elements, and I am still not sure about it. In particular, in the converter I have been developing, I am running into problems because we cannot apply pretty printing (indentation) to an XML file without changing the content in the model. Further, I think the rules are unintuitive, and many users will add whitespace and create unfortunate errors. Instead, I propose that we adopt the HTML methodology as described here:


In this case, before processing the content of any text-carrying element, we would first remove all newlines ('\n', '\r'), delete all leading and trailing whitespace, and replace each remaining block of ASCII whitespace with a single space.
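
To make the proposed steps concrete, here is a minimal sketch in Python. It follows the three steps literally and in the order given; the function name `normalise_text` is hypothetical, and treating tab and form feed as the remaining ASCII whitespace (alongside space) is an assumption borrowed from HTML's definition, not something the spec currently states.

```python
import re

# ASCII whitespace that can remain once '\n' and '\r' have been removed
# (assumption: space, tab and form feed, following HTML's definition).
_ASCII_WS = " \t\f"
_WS_RUN = re.compile(r"[ \t\f]+")


def normalise_text(raw: str) -> str:
    """Apply the three proposed steps to the content of a text-carrying element."""
    # Step 1: remove all newlines. They are deleted, not replaced by a space.
    no_newlines = raw.replace("\n", "").replace("\r", "")
    # Step 2: delete leading and trailing whitespace.
    trimmed = no_newlines.strip(_ASCII_WS)
    # Step 3: replace each remaining block of ASCII whitespace with one space.
    return _WS_RUN.sub(" ", trimmed)


# Pretty-printed (indented) content yields the same value as compact content.
print(normalise_text("\n    a   cat\n  "))  # a cat
```

One consequence of reading step 1 literally: a newline with no whitespace next to it simply disappears, so "foo\nbar" becomes "foobar" rather than "foo bar". Indented content is unaffected, because the indentation itself supplies the separating space.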

I would also make a model change, replacing all references to 'non-empty string' with 'normalised string'. This means a string that is non-empty, contains no newlines, does not start or end with whitespace, and contains no block of ASCII whitespace longer than a single space. This ensures that other serializations (JSON, RDF) cannot generate content that cannot be represented in XML.
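
As a rough illustration of how a JSON or RDF implementation might check this constraint, here is a small Python sketch. The name `is_normalised` is hypothetical, and it assumes that a lone tab or form feed also counts as a violation, since the normalisation would have turned it into a space.

```python
import re

# One or more runs of non-whitespace characters, separated by exactly one space.
# "Whitespace" here means ASCII whitespace only: space, tab, LF, CR, form feed.
_NORMALISED = re.compile(r"[^ \t\n\r\f]+(?: [^ \t\n\r\f]+)*")


def is_normalised(value: str) -> bool:
    """Return True if value satisfies the proposed 'normalised string' rule."""
    return _NORMALISED.fullmatch(value) is not None


print(is_normalised("a cat"))   # True
print(is_normalised("a  cat"))  # False: two consecutive spaces
print(is_normalised(" a cat"))  # False: leading space
print(is_normalised(""))        # False: empty string
```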

I do worry that this does not really cover Chinese, Japanese (and maybe Thai/Lao), as the whitespace rules for HTML are more complex in Unicode, but I think that this can probably be worked around by lexicographers working in these languages. We can add a note to the spec for these languages.



John P. McCrae
(he/him; #startsWithAName John (rhymes with "gone") McCrae (rhymes with "hay"))
Assistant Professor - SFI Insight Centre for Data Analytics, Data Science Institute & Computer Science, University of Galway
