DITA Proposed Feature #45a

Add sort order indexing element

Longer Description

The current indexterm element cannot express the full range of indexing semantics needed for production book indexes. This proposal adds the ability to specify a sort phrase under which an index entry is sorted, so that an author can sort an entry differently from how its text would normally sort. The common use is to disregard insignificant leading text, such as punctuation or words like "the" or "a". For example, the author may want <data> to be sorted under the letter D rather than under the left angle bracket (<). An author may also want such an entry to appear under both the punctuation heading and the letter D, in which case there can be two index entries that differ only in their sort order.

Certain languages have special sort order needs. For example, Japanese index entries might be written partially or wholly in kanji, but need to be sorted in phonetic order according to their hiragana/katakana rendition. There is no reliable automated way to map written text to phonetic text: for kanji, there can be multiple phonetic readings depending on the context. The only way to sort Japanese index entries correctly is to keep the phonetic counterparts with the written forms. The phonetic text would be supplied as the sort order text for indexing purposes.
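
As an illustration, assuming the index-sort-as element introduced later in this proposal, a kanji entry could carry its hiragana reading as the sort text. The term chosen here is only an example:

```xml
<!-- 東京 (Tokyo) is written in kanji; its reading とうきょう supplies the sort text -->
<indexterm>東京<index-sort-as>とうきょう</index-sort-as></indexterm>
```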

There is a third use case for sort order: when generating "see also X" entries, authors would specify X as the sort order so that the entry does not sort under the letter S that starts all such entries. Because a separate proposal introduces see/see also semantics to indexterm in the form of actual elements, sort order will not be needed for this use case.

Sort ordering is not uncommon in the publishing world. FrameMaker's index markers use square brackets ([]) to delimit sort order text. DocBook uses a SortAs attribute on its indexing elements for the same purpose. LaTeX implements an out-of-sequence sort with the @ operator, which separates the sort key from the index key. DITA can provide a comparable function by adding a new optional element called <index-sort-as> to the indexterm content model. This element contains the sort order text. An indexterm that does not need a special sort order can omit this element. An index entry with a separate sort order would look like:
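
A minimal sketch, reusing the <data> example from above and assuming index-sort-as appears as a child of indexterm alongside the entry text:

```xml
<!-- Displayed as "<data>" but sorted under D, using "data" as the sort text -->
<indexterm>&lt;data&gt;<index-sort-as>data</index-sort-as></indexterm>
```

A second indexterm with the same text and no index-sort-as would additionally place the entry under the punctuation heading.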


While this markup allows you to specify different sort orders for different instances of the same index entry, it can also lead to inconsistent sort orders where you want the same one throughout. You may specify a global default sort order by placing an indexterm instance in a map's metadata (map/topicmeta/keywords/indexterm). There, the indexterm has no content context and will not generate an index page reference. Using index-sort-as there sets the global sort order expression for that term. Individual indexterm instances in the content may override this global default with their own index-sort-as elements.
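
A sketch of such a global default, following the map/topicmeta/keywords/indexterm path given above. The topic file name is hypothetical and stands in for any topic that indexes the same term:

```xml
<map>
  <topicmeta>
    <keywords>
      <!-- No page reference is generated here; only the default sort text is set -->
      <indexterm>&lt;data&gt;<index-sort-as>data</index-sort-as></indexterm>
    </keywords>
  </topicmeta>
  <!-- example-topic.dita is a hypothetical topic containing index entries for the same term -->
  <topicref href="example-topic.dita"/>
</map>
```

An indexterm for the same term inside example-topic.dita would then sort under "data" unless it supplies its own index-sort-as.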


Small: add one new element.

Use Cases

Technical Requirements

New child elements will be introduced to the indexterm element. They will not be specializations of indexterm itself: a domain specialization of indexterm would be available anywhere that indexterm is available, but these new elements are only meaningful as children of an indexterm element. Instead, they will be new elements that specialize a new common index-base element. This allows other indexing extensions to be added as specializations of index-base, and makes it easier for index generators to handle the indexing elements that they understand while filtering out those that they cannot handle. There will be a new indexing domain, indexing-d, to accommodate the specializations of index-base.

The following is a sample implementation of the indexing elements in DTD:

<!-- shell DTD ----------------------------------------------------------->
<!ENTITY % indexing-d-dec     PUBLIC 
"-//OASIS//ENTITIES DITA Indexing Domain//EN" 
"indexingDomain.ent"                                                      >
%indexing-d-dec;

<!ENTITY % index-base "index-base | %indexing-d-index-base;">

<!ENTITY included-domains 
                        "&ui-d-att; &hi-d-att; &pr-d-att; &sw-d-att;
                         &ut-d-att; &indexing-d-att;"                     >

<!ENTITY % indexing-d-def     PUBLIC 
"-//OASIS//ELEMENTS DITA Indexing Domain//EN" 
"indexingDomain.mod"                                                      >
%indexing-d-def;

<!-- metaDecl.mod -------------------------------------------------------->
<!ELEMENT indexterm      (%words.cnt;|%indexterm;|%index-base;)*>
<!ELEMENT index-base     (%words.cnt;|%indexterm;)*>

<!ATTLIST index-base   %global-atts;  class CDATA "- topic/index-base ">

<!-- indexingDomain.ent -------------------------------------------------->
<!ENTITY % indexing-d-index-base 
     "index-see | index-see-also | index-sort-as | index-range-start | index-range-end">

<!ENTITY indexing-d-att "(topic indexing-d)">

<!-- indexingDomain.mod -------------------------------------------------->
<!ELEMENT index-see        (%words.cnt;|%indexterm;)*>
<!ELEMENT index-see-also   (%words.cnt;|%indexterm;)*>
<!ELEMENT index-sort-as (%words.cnt;|%indexterm;)*>
<!ELEMENT index-range-start EMPTY>
<!ELEMENT index-range-end EMPTY>

<!ATTLIST index-see %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-see ">
<!ATTLIST index-see-also %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-see-also ">
<!ATTLIST index-sort-as %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-sort-as ">
<!ATTLIST index-range-start %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-range-start ">
<!ATTLIST index-range-end %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-range-end ">


Minimal. One new element is added to the DTDs and schemas and introduced into the indexterm content model. Processors should be updated to take advantage of this feature; those that choose not to implement sort ordering can ignore the new element and its content.
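
For a processor that does not implement sort ordering, the filtering could be sketched in XSLT 1.0 as follows. This is only an illustration: it assumes the processor matches on the class attribute values from the sample DTD above, and the entry output element is hypothetical.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit the entry text; children are processed so the rule below can filter them -->
  <xsl:template match="*[contains(@class, ' topic/indexterm ')]">
    <entry><xsl:apply-templates/></entry>
  </xsl:template>

  <!-- Drop index-base and every specialization of it, content included -->
  <xsl:template match="*[contains(@class, ' topic/index-base ')]"/>

</xsl:stylesheet>
```

Because every new indexing element carries topic/index-base in its class attribute, the single empty template covers index-sort-as as well as any later specialization the processor does not recognize.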


This will add an important feature to DITA's support for book indexes, one that is critical for certain languages.

Time Required

Minimal, no more than a day.