DITA Proposed Feature # 12035

Allow any element to override its collation value

Longer description

The index-sort-as element lets an indexterm override how it is treated when the generated index is sorted alphabetically. This is useful for non-alphabetic terms:
<indexterm>1927<index-sort-as>Nineteen Twenty-Seven</index-sort-as></indexterm>
It is also crucial for languages such as Japanese where there is not a one-to-one relationship between orthography and pronunciation.
This proposal suggests a similar mechanism for elements other than indexterm, for any other situation in which a processor sorts, indexes, or collates content. The suggested name for the element is collate-as. Examples of markup:
  • <codeph>int fopen(const char *f);<collate-as>fopen</collate-as></codeph>
  • <lq><lines>The wasp, he is a nasty one,
    He scavenges and thrives.
    Unlike the honest honey-bee,
    He doesn't care for hives.
    - Pam Ayres<collate-as>Wasp, he is a nasty one, The</collate-as></lines></lq>
  • Pongo is a character in <cite>101 Dalmatians<collate-as>One Hundred and One Dalmatians</collate-as></cite>.

Statement of Requirement

The intent of this proposed feature is to give authors the ability to decouple the collation order of an element from its textual appearance. This is useful in cases where content is auto-generated, such as lists of terms/tables/figures and indexes.

Use Case

Any case in which a processor generates or reorders items based on those items' string values. Here are two examples.

Alphabetization of glossary entries
Glossary entries are usually sorted by title. When the title as written would not sort to the place where the entry belongs, use collate-as:
<glossentry>
  <title>100 Years' War<collate-as>Hundred Years War</collate-as></title>
  <glossdef>A conflict between England and France in the Middle Ages.</glossdef>
</glossentry>
Sorting lists and tables
When a list or table is long, authors may want to keep its entries in alphabetical order. This is easy until the list or table contains items brought in by conref. A specialized processor could handle the sorting, with collate-as indicating how the item is to be alphabetized:
<ul-sorted>
  <li>Pink Floyd</li>
  <li>U2</li>
  <li conref="sometopic.xml#topicid/someartist"/>
  <li>Alan Parsons<collate-as>Parsons, Alan</collate-as></li>
  <li>The Alan Parsons Project<collate-as>Parsons, Alan</collate-as></li>
  <li conref="someothertopic.xml#topicid/someotherartist"/>
  <!-- Hundreds more here. -->
</ul-sorted>
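
The sorting step described above can be sketched roughly as follows. This is an illustration only, not part of the proposal: the helper names collation_key and sort_children are hypothetical, the conref items are assumed to have been resolved before sorting, and a plain case-insensitive string comparison stands in for the language-aware collation a real processor would apply.

```python
import xml.etree.ElementTree as ET

def collation_key(elem):
    """Return the string to sort on: collate-as text if present, else visible text."""
    override = elem.find("collate-as")  # direct child only
    if override is not None and override.text:
        return override.text.strip().casefold()
    # No override: fall back to the element's own text content.
    return "".join(elem.itertext()).strip().casefold()

def sort_children(parent):
    """Reorder the children of parent in place by collation key (stable sort)."""
    parent[:] = sorted(parent, key=collation_key)

# Assumption: conrefs are already resolved, so only literal items remain.
doc = ET.fromstring("""
<ul-sorted>
  <li>Pink Floyd</li>
  <li>U2</li>
  <li>Alan Parsons<collate-as>Parsons, Alan</collate-as></li>
  <li>The Alan Parsons Project<collate-as>Parsons, Alan</collate-as></li>
</ul-sorted>
""")
sort_children(doc)
sorted_labels = [li.text for li in doc]
print(sorted_labels)
```

Because the sort is stable, the two items that share the key "Parsons, Alan" keep their authored order relative to each other.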

Scope

Trivial. Can be done as a specialization of data, which is in almost every element's content model. May benefit from a DITA 1.2 proposal which limits specializations to certain elements.

Technical Requirements

If implemented as a specialization, new DTD/Schema files and alterations to the topic shells are required. Toolkit implementations need to ignore the collate-as element (which they do for data anyway) except where they are looking for the collation value of an element.
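
As a rough sketch of the two behaviours a toolkit would need (ignore collate-as when rendering, consult it when collating), consider the following. The function names display_text and collation_value are hypothetical, and matching on the element name is a simplification made for brevity; a DITA processor would normally match specialized elements on their class attribute.

```python
import xml.etree.ElementTree as ET

def display_text(elem):
    """Visible text of elem, skipping the content of any collate-as descendants."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "collate-as":
            parts.append(display_text(child))
        # Text that follows a child belongs to the parent, so keep it either way.
        parts.append(child.tail or "")
    return "".join(parts)

def collation_value(elem):
    """Value to collate on: the collate-as override if present, else the visible text."""
    override = elem.find("collate-as")
    if override is not None and override.text:
        return override.text
    return display_text(elem)

cite = ET.fromstring(
    "<cite>101 Dalmatians"
    "<collate-as>One Hundred and One Dalmatians</collate-as></cite>"
)
print(display_text(cite))     # text a renderer would output
print(collation_value(cite))  # text a sorter would compare
```

Keeping the two functions separate mirrors the requirement: ordinary output never sees the override, and only code that needs a collation value asks for it.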

New or Changed Specification Language

In the architectural specification, a new subsection of "DITA processing" needs to be created to explain the meaning of collate-as. It should give examples similar to the ones in this proposal. Default handling of the element (to ignore it) and cases where it should be consulted (list generation) should be discussed. The specification should state that collate-as does not affect the language used in the collation process (that remains with xml:lang), only the value used in the collation process. The specification should emphasize that this element is never required, and that it may be meaningless for some outputs. The specification should remind readers that in the context of translation, it may be necessary to create or destroy collate-as elements, depending on the source and target language.

The language specification should describe the collate-as element and give its content model (#PCDATA).

Costs

Toolkits that collate on string values need to compare the contents of collate-as, where it exists, instead of the contents of its container element. In XSLT this is a simple change.

Benefits

Toolkits will be able to correctly sort glossaries with unusual titles. Japanese users will get more benefit from this feature than others. The index-sort-as element may be unifiable with collate-as.