DITA Proposed Feature # 45

Add See, See Also indexing elements.

Longer description

The current indexterm element cannot express the full range of indexing semantics needed for production book indexes. The inability to express "see" and "see also" index entries is one glaring example. This proposal introduces new child elements of indexterm that address the missing functionality.

  • redirection elements:
    • see
    • see also
  • sort order specification
  • page ranges

The redirection elements' content will be the text of the referred-to entry. They will redirect the reader to the appropriate index entry.

Sort order will be specified by an element whose content is the text to sort by. This text controls how the index entry is sorted relative to its peers; by default, an entry is sorted by its own text. This allows a sort that ignores common leading words like "the" and "a," or leading punctuation such as the "@" in "@href". This feature is important for some languages. For example, Japanese index entries are typically sorted by their phonetic equivalent, which cannot be reliably derived from the index entry text, so correct sorting typically requires both the index entry text and the phonetic text.
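The sorting rule above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the Entry class, its sort_as field (standing in for the content of an index-sort-as child), and the case-insensitive comparison are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    text: str                      # visible index entry text
    sort_as: Optional[str] = None  # hypothetical: content of an index-sort-as child


def sort_key(entry: Entry) -> str:
    # Default to the entry's own text when no sort text was supplied.
    # Case-insensitive comparison is an assumption, not a requirement of the proposal.
    return (entry.sort_as or entry.text).casefold()


entries = [
    Entry("The Drawing", sort_as="Drawing"),
    Entry("@href", sort_as="href"),
    Entry("Art"),
    Entry("Illustration"),
]

ordered = [e.text for e in sorted(entries, key=sort_key)]
print(ordered)  # ['Art', 'The Drawing', '@href', 'Illustration']
```

A production processor would likely use locale-aware collation rather than plain string comparison, which matters for the phonetic sort text mentioned above.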

A page range will be marked by start and end range markers. Within a topic, a range can span only content in that topic. Cross-topic spanning can be achieved by coalescing adjacent ranges ("DITA page 12-14, 14-15" becomes "DITA page 12-15"), or by spanning at the map level. If there is a start range marker but no end range marker, the index entry containing the start range marker will degrade into a simple single-page entry. An unmatched end range marker will cause a processing error. Note: Scott Prentice on the dita-users list suggested an alternative for topic spanning: an attribute indicating that the indexterm spans the entire topic.
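The marker rules above might be processed roughly as follows. This is a sketch under stated assumptions: markers arrive as hypothetical (kind, page) pairs in document order for one index entry, a second start before an end is treated like an unmatched start, and ranges are coalesced only when they share a page, as in the "12-14, 14-15" example.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]


def resolve_markers(markers: List[Tuple[str, int]]) -> List[Range]:
    """Pair "start" and "end" markers, given as (kind, page) in document order."""
    ranges: List[Range] = []
    open_start: Optional[int] = None
    for kind, page in markers:
        if kind == "start":
            if open_start is not None:
                # Assumption: an unclosed start degrades to a single-page entry.
                ranges.append((open_start, open_start))
            open_start = page
        elif kind == "end":
            if open_start is None:
                raise ValueError(f"unmatched end range marker on page {page}")
            ranges.append((open_start, page))
            open_start = None
        else:
            raise ValueError(f"unknown marker kind: {kind!r}")
    if open_start is not None:
        # Start marker with no end marker: single-page entry.
        ranges.append((open_start, open_start))
    return ranges


def coalesce(ranges: List[Range]) -> List[Range]:
    """Merge ranges that share at least one page."""
    merged: List[Range] = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


print(coalesce([(12, 14), (14, 15)]))   # [(12, 15)]
print(resolve_markers([("start", 12)]))  # [(12, 12)]
```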

All new indexterm child elements will have an "index-" prefix. So an index entry might look like:

 <indexterm>Art
     <indexterm>The Drawing
         <index-sort-as>Drawing</index-sort-as>
         <index-see-also>Illustration</index-see-also>
         <index-range-start/>
     </indexterm>
 </indexterm>

The linking behavior of the redirection elements presents some difficulty, because it is not simply a matter of linking directly to another indexterm element. For example, consider the following extract from the index of a hypothetical book, "The Complete Aquarium Book":

  • Carassius auratus
    • see Goldfish
  • Carp, 56, 67, 89
    • cooking, 789
    • see also Goldfish
  • Feeding, 348
    • see also Goldfish feeding
  • Goldfish
    • selection, 12-13
    • feeding, 56
    • flushing, 128, 345

In this example, there isn't a specific indexterm element for "Goldfish." It is only implicit in a multi-level indexterm such as <indexterm>Goldfish <indexterm>feeding </indexterm> </indexterm>. Any redirection to "Goldfish" should remain valid even if individual entries like "selection" disappear depending on the DITA map's topic selection. In the case of "Carp," multiple indexterm instances created the main entry: any redirection to "Carp" should remain valid even if one or more of these indexterms disappear. A third difficulty is that the reference to "Goldfish feeding" points to a nested indexterm. We need to define an identifier that a redirection element such as index-see can use to point to something yet to be generated. Some definitions might be useful:

  • The identifier of an indexterm instance (whether at top level or nested within other indexterms) is a chain of nested identifiers consisting of the top-level text contents at each level, down to and including the contents of the indexterm instance being identified.
  • An index entry is generated from an indexterm to populate the index section of a book.
  • The identifier of an index entry is the identifier of any indexterm instance that generated it. Each leaf indexterm generates one index entry.
  • A combined index entry is a coalesced entry formed as follows:
    • Whenever two index entries have the same identifier, they are combined, and their page numbers are listed in increasing order in the combined entry.
    • Whenever the identifiers of two index entries agree on a prefix, they are combined. The common portion is listed once, and the differing portions are listed as siblings within the common portion.
  • A reference to an index entry is made in an indexterm by specifying the identifier of the index entry. A reference may be made to a non-leaf index entry.
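The definitions above can be illustrated with a small Python sketch. It assumes an identifier is represented as a tuple of strings and the combined index as a nested dictionary; the function names combine and exists are hypothetical, and the page numbers come from the aquarium example.

```python
from typing import Dict, List, Tuple

Identifier = Tuple[str, ...]


def new_node() -> Dict:
    return {"pages": [], "children": {}}


def combine(entries: List[Tuple[Identifier, int]]) -> Dict:
    """Coalesce (identifier, page) index entries into one nested index."""
    root = new_node()
    for identifier, page in entries:
        node = root
        for part in identifier:
            # Entries agreeing on a prefix share the nodes for that prefix.
            node = node["children"].setdefault(part, new_node())
        if page not in node["pages"]:
            node["pages"].append(page)
            node["pages"].sort()  # pages listed in increasing order
    return root


def exists(root: Dict, identifier: Identifier) -> bool:
    """True if a reference to this identifier resolves, leaf or non-leaf."""
    node = root
    for part in identifier:
        node = node["children"].get(part)
        if node is None:
            return False
    return True


index = combine([
    (("Carp",), 56),
    (("Carp",), 89),
    (("Carp",), 67),
    (("Carp", "cooking"), 789),
    (("Goldfish", "feeding"), 56),
    (("Goldfish", "selection"), 12),
])

print(index["children"]["Carp"]["pages"])      # [56, 67, 89]
print(exists(index, ("Goldfish",)))            # True, although no indexterm names it directly
print(exists(index, ("Goldfish", "flushing")))  # False
```

In this sketch, "Goldfish" exists only as a shared prefix, so a reference to it stays valid as long as at least one of its sub-entries survives topic selection.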

If we adopt the textual content of an indexterm as its identifier, we can make links from an indexterm to an index entry or a combined index entry that is yet to be generated. For example, <index-see>Goldfish </index-see> will link to "Goldfish" because its content is the combined index entry's common prefix identifier. Similarly, a link to "Goldfish feeding" is possible with <index-see>Goldfish <indexterm>feeding </indexterm> </index-see>, which mirrors the hierarchy of the indexterm that generates "Goldfish feeding": <indexterm>Goldfish <indexterm>feeding </indexterm> </indexterm>.
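As a rough illustration of this matching, the sketch below derives the identifier chain from both an indexterm and an index-see using Python's standard xml.etree.ElementTree module. The function name identifier_of is hypothetical, and the sketch assumes each level contains at most one nested indexterm and that surrounding whitespace is not significant.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def identifier_of(element: ET.Element) -> Tuple[str, ...]:
    """Chain of text contents from this element down through nested indexterms."""
    parts = []
    node = element
    while node is not None:
        # Leading text of the element, with surrounding whitespace trimmed.
        parts.append((node.text or "").strip())
        # Assumption: follow only the first nested indexterm child.
        node = node.find("indexterm")
    return tuple(parts)


term = ET.fromstring(
    "<indexterm>Goldfish <indexterm>feeding </indexterm> </indexterm>"
)
see_nested = ET.fromstring(
    "<index-see>Goldfish <indexterm>feeding </indexterm> </index-see>"
)
see_top = ET.fromstring("<index-see>Goldfish </index-see>")

print(identifier_of(term))        # ('Goldfish', 'feeding')
print(identifier_of(see_nested))  # ('Goldfish', 'feeding')
print(identifier_of(see_top))     # ('Goldfish',)
```

Because the redirection and the generating indexterm yield the same chain, a processor could resolve the link after the combined index has been built, without the author pointing at any particular indexterm instance.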

Discussion:

Scope

Small. Introducing a few new elements to the DTDs is trivial.

Use Case

Technical Requirements

New child elements should be introduced to the indexterm element. They will not be specializations of indexterm itself: a domain specialization of indexterm would be available anywhere that indexterm is available, but these new elements are only meaningful as children of an indexterm element.

Costs

While the actual DTD changes are minor, the new semantics may require substantial work for processors that generate indexes.

Benefits

With these additions, DITA adopters will have a standard way to generate production quality book indexes.

Time Required

A few days.

Issues

The subcommittee responsible for this feature has not reached full agreement on the details, so we have decided to submit the proposal for full debate. The following issues remain open:
  • There is disagreement on whether the index-see and index-see-also elements are semantically identical and should be just one element.
  • Instead of introducing many elements, perhaps we can use just one element with an attribute to differentiate between the types. This would reduce the disruption of adding many new elements. On the other hand, differentiating types in this manner is not very author-friendly, since element names are easier to see than attribute values.