Allow any element to override its collation value
Longer description
The index-sort-as element gives indexterms the ability to override how the index term is treated when alphabetically sorting the generated index. This is useful for non-alphabetic terms:
<indexterm>1927<index-sort-as>Nineteen Twenty-Seven</index-sort-as></indexterm>
It can also be used in the absence of Japanese ruby (furigana) to overcome collation issues where there is not a one-to-one relationship between orthography and pronunciation.
This proposal suggests a similar mechanism for elements other than indexterm, for other situations in which a processor performs sorting, indexing, or collation. The suggested name for the element is collate-as. Examples of markup:
- <codeph>int fopen(const char *f);<collate-as>fopen</collate-as></codeph>
- <lq><lines>The wasp, he is a nasty one,
He scavenges and thrives.
Unlike the honest honey-bee,
He doesn't care for hives.
- Pam Ayres<collate-as>Wasp, he is a nasty one, The</collate-as></lines></lq>
- Pongo is a character in <cite>101 Dalmatians<collate-as>One Hundred and One Dalmatians</collate-as></cite>.
Statement of Requirement
The intent of this proposed feature is to give authors the ability to decouple the collation order of an element from its textual appearance. This is useful in cases where content is auto-generated, such as lists of terms/tables/figures and indexes.
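The decoupling described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not part of the proposal itself; the function names display_text and collation_key are hypothetical, and the sketch assumes collate-as appears as a direct child of the element being collated.

```python
import xml.etree.ElementTree as ET

def display_text(elem):
    """Text of elem as a reader would see it, skipping any collate-as element."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "collate-as":
            parts.append(display_text(child))
        # Text that follows a child belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)

def collation_key(elem):
    """Text a processor would sort on: collate-as if present, else the display text."""
    override = elem.find("collate-as")  # direct children only (an assumption)
    if override is not None:
        return "".join(override.itertext()).strip()
    return display_text(elem).strip()

cite = ET.fromstring(
    "<cite>101 Dalmatians"
    "<collate-as>One Hundred and One Dalmatians</collate-as></cite>"
)
print(display_text(cite))   # 101 Dalmatians
print(collation_key(cite))  # One Hundred and One Dalmatians
```

An element without collate-as falls back to its own text, so existing content keeps its current collation behaviour.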
Use Case
This applies to any case where processing auto-generates or re-orders items based on those items' string values. Here are two examples.
- Alphabetization of glossary entries
- Glossaries are usually sorted based on their title. When the title cannot indicate where the entry should appear in the glossary, use collate-as:
<glossentry>
<title>100 Years' War<collate-as>Hundred Years War</collate-as></title>
<glossdef>A conflict between England and France in the Middle Ages.</glossdef>
</glossentry>
- Sorting lists and tables
- When a list or table is long, authors may want to keep its entries in alphabetical order. This is easy until the list or table contains items brought in by conref. A specialized processor could handle the sorting, with collate-as indicating how the item is to be alphabetized:
<ul-sorted>
<li>Pink Floyd</li>
<li>U2</li>
<li conref="sometopic.xml#topicid/someartist"/>
<li>Alan Parsons<collate-as>Parsons, Alan</collate-as></li>
<li>The Alan Parsons Project<collate-as>Parsons, Alan</collate-as></li>
<li conref="someothertopic.xml#topicid/someotherartist"/>
<!-- Hundreds more here. -->
</ul-sorted>
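As a rough sketch of what such a specialized processor might do with the list above, the following Python reorders the items of a ul-sorted element by their collation value. It assumes the conref items have already been resolved (they are left out here), that every child of ul-sorted is an li, and that a case-insensitive string comparison is an acceptable stand-in for real language-aware collation. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<ul-sorted>
  <li>Pink Floyd</li>
  <li>U2</li>
  <li>Alan Parsons<collate-as>Parsons, Alan</collate-as></li>
  <li>The Alan Parsons Project<collate-as>Parsons, Alan</collate-as></li>
</ul-sorted>
"""

def sort_key(li):
    """Use collate-as when present, otherwise the item's own leading text."""
    override = li.find("collate-as")
    text = override.text if override is not None else li.text
    return (text or "").strip().casefold()

def sort_list_in_place(ul):
    """Reorder the li children of ul by collation value."""
    # sorted() is stable, so items with equal keys keep their document order.
    ul[:] = sorted(ul.findall("li"), key=sort_key)

ul = ET.fromstring(SOURCE)
sort_list_in_place(ul)
print([li.text for li in ul])
# ['Alan Parsons', 'The Alan Parsons Project', 'Pink Floyd', 'U2']
```

Both Alan Parsons entries share the key "Parsons, Alan", so they land together under P and keep their original relative order.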
Scope
Trivial. Can be done as a specialization of data, which is in almost every element's content model.
Since data is in the content model for almost every DITA element, collate-as will also become available as a child of almost every element. This may be undesirable, so a restriction in the shell allowing collate-as only in certain elements (by way of proposal #12008) is probably a good idea.
Technical Requirements
If implemented as a specialization, this requires new DTD/Schema files and alterations to the topic shells. Toolkit implementations need to ignore the collate-as element (which they already do for data) except where they are looking for the collation value of an element.
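The "ignore it" half of that requirement could look something like the following Python sketch, which removes collate-as elements from a tree before rendering. It is an illustration only: the function name strip_collate_as is hypothetical, and a real toolkit would more likely suppress the element in its stylesheets than edit the tree.

```python
import xml.etree.ElementTree as ET

def strip_collate_as(root):
    """Remove every collate-as element in place, keeping the text that follows it."""
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag == "collate-as":
                # ElementTree stores trailing text on the child, so move it
                # to the previous sibling (or the parent) before removing.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root

para = ET.fromstring(
    "<p>Pongo is a character in <cite>101 Dalmatians"
    "<collate-as>One Hundred and One Dalmatians</collate-as></cite>.</p>"
)
strip_collate_as(para)
print(ET.tostring(para, encoding="unicode"))
# <p>Pongo is a character in <cite>101 Dalmatians</cite>.</p>
```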
New or Changed Specification Language
In the architectural specification, a new subsection of "DITA processing" needs to be created to explain the meaning of collate-as. It should cover the following points:
- Default handling of the element (to ignore it) and cases where it should be consulted (list generation).
- collate-as does not affect the language used in the collation process (that remains with xml:lang), only the value used in the collation process.
- Other specializations may define behaviour which affects collation values (for example, the indexing domain already does so for index-sort-as, and a ruby specialization may make the rt element the default collation value of a ruby group). Such cases override the use of collate-as.
- collate-as is never required, and it may be meaningless for some outputs.
- In the context of translation, it may be necessary to create or destroy collate-as elements, depending on the source and target language.
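Two of the points above, precedence and language, can be illustrated together. The Python sketch below is one possible reading rather than specified behaviour: it assumes a simple precedence list in which index-sort-as wins over collate-as, and it takes the collation language from the nearest xml:lang rather than from collate-as. The names OVERRIDES, collation_value, and collation_pairs are hypothetical, and the ruby case is left out.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed precedence: a specialization's own override wins over collate-as.
OVERRIDES = ["index-sort-as", "collate-as"]

def collation_value(elem):
    """Return the first override found, else the element's own leading text."""
    for name in OVERRIDES:
        child = elem.find(name)
        if child is not None:
            return (child.text or "").strip()
    return (elem.text or "").strip()

def collation_pairs(elem, tag, inherited_lang=None):
    """Collect (language, value) for each element named tag, inheriting xml:lang."""
    lang = elem.get(XML_LANG, inherited_lang)
    pairs = []
    if elem.tag == tag:
        pairs.append((lang, collation_value(elem)))
    for child in elem:
        pairs.extend(collation_pairs(child, tag, lang))
    return pairs

doc = ET.fromstring(
    '<topic xml:lang="en">'
    "<indexterm>1927<index-sort-as>Nineteen Twenty-Seven</index-sort-as>"
    "<collate-as>ignored here</collate-as></indexterm>"
    '<indexterm xml:lang="fr">1789'
    "<collate-as>Mille sept cent quatre-vingt-neuf</collate-as></indexterm>"
    "</topic>"
)
print(collation_pairs(doc, "indexterm"))
# [('en', 'Nineteen Twenty-Seven'), ('fr', 'Mille sept cent quatre-vingt-neuf')]
```

The language of each pair comes only from xml:lang, and the value comes only from the override elements, which keeps the two concerns separate as the second point requires.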
The language specification should describe the collate-as element and give its content model (#PCDATA).
Costs
Toolkits which use strings for collation purposes need to compare the contents of collate-as instead of its container element where it exists. XSLT for this is a simple change.
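To give a sense of how small the change is, here is a sketch of the before and after for a glossary sort. It is written in Python rather than XSLT purely so that it can be run standalone; the glossary content other than the 100 Years' War entry is made up for illustration, and title_key is a hypothetical name.

```python
import xml.etree.ElementTree as ET

GLOSSARY = """
<glossary>
  <glossentry>
    <title>Magna Carta</title>
    <glossdef>A charter sealed in 1215.</glossdef>
  </glossentry>
  <glossentry>
    <title>100 Years' War<collate-as>Hundred Years War</collate-as></title>
    <glossdef>A conflict between England and France in the Middle Ages.</glossdef>
  </glossentry>
  <glossentry>
    <title>Black Death</title>
    <glossdef>A plague pandemic of the fourteenth century.</glossdef>
  </glossentry>
</glossary>
"""

def title_key(entry, honor_collate_as=True):
    """Sort key for a glossentry: its title, or the title's collate-as if honored."""
    title = entry.find("title")
    override = title.find("collate-as") if honor_collate_as else None
    text = override.text if override is not None else title.text
    return (text or "").strip().casefold()

entries = ET.fromstring(GLOSSARY).findall("glossentry")

# Before: the container's own string is compared.
naive = sorted(entries, key=lambda e: title_key(e, honor_collate_as=False))
# After: collate-as is compared where it exists.
aware = sorted(entries, key=title_key)

print([e.find("title").text for e in naive])
# ["100 Years' War", 'Black Death', 'Magna Carta']
print([e.find("title").text for e in aware])
# ['Black Death', "100 Years' War", 'Magna Carta']
```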
Benefits
Toolkits will be able to correctly sort glossaries with unusual titles. Japanese users will get more benefit from this feature than others. The index-sort-as element may be unifiable with collate-as.