DITA Proposed Feature #45b

Add page range indexing element

Longer Description

The current indexterm element cannot express the full range of indexing semantics needed for production book indexes. This proposal addresses the ability to express page ranges. A page range indicates that the index entry refers to an extended discussion that runs over several pages. According to the Chicago Manual of Style, "if a text discussion extends over more than one page (...), as it often does, beginning and ending references have to be given" (emphasis added). This typically appears as a page range such as 34-36, as distinct from individual references on consecutive pages (34, 35, 36).

The need to express page ranges is even more urgent in a single-sourcing context. At authoring time, you cannot tell whether the pertinent range will span a page break. Page breaks change with the media type (letter size, pocket edition, large print edition). Inserting an illustration can turn a two-paragraph range into a three-page span. Index entry ranges will be even more common if the print book and index pages use paragraph numbers instead of page numbers (e.g., 18.44-46), something quite common in nonfiction documentation. In this proposal, "page range" refers to both page number ranges and paragraph number ranges.
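
The paragraph-number form cited above (18.44-46) suggests that the shared chapter prefix is dropped from the end of the range. The following Python sketch shows one way a formatter might produce that form. It assumes locators are (chapter, paragraph) pairs; the function name and the abbreviation rule are illustrative assumptions, not part of this proposal.

```python
def format_paragraph_range(start, end):
    """Format a paragraph-number range such as 18.44-46.

    start and end are hypothetical (chapter, paragraph) tuples.
    Assumption: the chapter prefix is omitted from the end locator
    when both ends fall in the same chapter.
    """
    (start_chapter, start_para), (end_chapter, end_para) = start, end
    if (start_chapter, start_para) == (end_chapter, end_para):
        # Start and end coincide: emit a single locator.
        return f"{start_chapter}.{start_para}"
    if start_chapter == end_chapter:
        # Same chapter: abbreviate the end locator.
        return f"{start_chapter}.{start_para}-{end_para}"
    # Different chapters: spell out both locators in full.
    return f"{start_chapter}.{start_para}-{end_chapter}.{end_para}"
```

For instance, format_paragraph_range((18, 44), (18, 46)) yields the form used in the example above.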

Index page ranges are a fairly common feature in authoring environments. Microsoft Word, FrameMaker and LaTeX all offer them, to name some of the best-known applications. For writers moving to DITA, the absence of this feature will be quite jarring.

A page range cannot be expressed by wrapping content in start and end element tags: that would be too restrictive. Index entries are supposed to capture "pertinent statements" (cf. Chicago Manual of Style), not structural content. According to the Microsoft Manual of Style for Technical Publications, a section range should not be in the index if it is listed in the table of contents. Rather, index entries capture content that may be orthogonal to the main content structure. Pertinent content for index entries can overlap or straddle other index entries or structural boundaries such as task steps. The way to achieve this flexibility in expressing index page ranges is to use index "marker" elements within pairs of indexterm elements. This requires two new elements, which will be added to the content model of the indexterm element:

  • index-range-start
  • index-range-end

For example, an index entry on cheese can start with:

<indexterm>cheese<index-range-start/></indexterm>

The range can close with:

<indexterm>cheese<index-range-end/></indexterm>

Due to the potential for orphaned range markers during map assembly, page ranges cannot span topics at the topic level. Index ranges that start within a topic must end in the same topic, excluding nested topics. Topic spanning can only occur at the map level, by inserting indexterm elements into map metadata. Processors can handle unpaired ranges by generating individual references. That is, if an indexterm has a range start marker but no corresponding indexterm ends the range, the processor should generate a single page number reference, as if there were no range start marker. Conversely, an indexterm that ends a range but has no corresponding indexterm that starts it should be dropped from the output.
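
The fallback rules above could be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation. The event tuples, the function names, and two behaviours are assumptions not stated in this proposal: a repeated start for the same term degrades the earlier start to a single reference, and a range that starts and ends on the same page is printed as one page number.

```python
from collections import defaultdict


def resolve_index_refs(events):
    """Pair range markers for ONE topic and apply the fallback rules.

    events: iterable of hypothetical (term, kind, page) tuples in document
    order, where kind is "start", "end", or "point" (a plain indexterm).
    Returns {term: sorted list of (first_page, last_page)}.
    """
    open_starts = {}          # term -> page of the currently open range start
    refs = defaultdict(list)  # term -> list of (first_page, last_page)
    for term, kind, page in events:
        if kind == "point":
            refs[term].append((page, page))
        elif kind == "start":
            if term in open_starts:
                # Assumption: an earlier start that was never closed
                # degrades to a single page reference.
                earlier = open_starts[term]
                refs[term].append((earlier, earlier))
            open_starts[term] = page
        elif kind == "end":
            if term in open_starts:
                refs[term].append((open_starts.pop(term), page))
            # An end marker with no matching start is dropped.
    # Starts that were never closed become single page references.
    for term, page in open_starts.items():
        refs[term].append((page, page))
    return {term: sorted(pairs) for term, pairs in refs.items()}


def format_ref(first, last):
    """Render one reference: a single page number or a first-last range."""
    return str(first) if first == last else f"{first}-{last}"
```

Because each term is tracked independently, ranges for different terms may overlap or interleave within the topic, which matches the flexibility argued for earlier in this proposal.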

Scope

Small: introduce two new elements to the indexterm element's content model.

Use Case

An author adds a page-spanning index entry: <indexterm>DITA<index-range-start/></indexterm>. Later in the same topic, she adds a range-terminating marker: <indexterm>DITA<index-range-end/></indexterm>. The content between the two markers spans four pages on paper, so the index entry in the generated PDF looks like:

  • DITA, 46-49

Technical Requirements

New child elements will be introduced to the indexterm element. They will not be specializations of indexterm itself: a domain specialization of indexterm would be available anywhere that indexterm is available, but these new elements are only meaningful as children of an indexterm element. Instead, they will be new elements that specialize from a new common index-base element. This will allow other indexing extensions to be added as specializations of index-base. It will also make it easier for index generators to handle the indexing elements that they understand while filtering out those that they cannot handle. There will be a new indexing domain, indexing-d, to accommodate the specializations of index-base.
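
As an illustration of that filtering, the Python sketch below sorts the children of an indexterm by their class attribute tokens. It assumes class values of the form used in this proposal's sample DTD, written out explicitly in the sample because the standard-library parser used here does not apply DTD attribute defaults. The SUPPORTED set and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the index-base specializations this processor understands.
SUPPORTED = {"indexing-d/index-range-start", "indexing-d/index-range-end"}


def classify_children(indexterm):
    """Split index-base specializations into handled and skipped tag names."""
    handled, skipped = [], []
    for child in indexterm:
        tokens = child.get("class", "").split()
        if "topic/index-base" not in tokens:
            continue  # not an indexing extension (for example, a nested indexterm)
        if SUPPORTED.intersection(tokens):
            handled.append(child.tag)
        else:
            skipped.append(child.tag)  # unknown extension: filter it out
    return handled, skipped


sample = ET.fromstring(
    '<indexterm class="- topic/indexterm ">cheese'
    '<index-range-start class="+ topic/index-base indexing-d/index-range-start "/>'
    '<index-sort-as class="+ topic/index-base indexing-d/index-sort-as ">cheese</index-sort-as>'
    '</indexterm>'
)
handled, skipped = classify_children(sample)
```

Matching on the topic/index-base token rather than on element names means a generator written this way would not need changes to tolerate future specializations of index-base.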

The following is a sample implementation of the indexing elements in DTD:


<!-- shell DTD ----------------------------------------------------------->
<!ENTITY % indexing-d-dec     PUBLIC 
"-//OASIS//ENTITIES DITA Indexing Domain//EN" 
"indexingDomain.ent"                                                      >
%indexing-d-dec;

<!ENTITY % index-base "index-base | %indexing-d-index-base;">

<!ENTITY included-domains 
                        "&ui-d-att; &hi-d-att; &pr-d-att; &sw-d-att;
                         &ut-d-att; &indexing-d-att;"                     >

<!ENTITY % indexing-d-def     PUBLIC 
"-//OASIS//ELEMENTS DITA Indexing Domain//EN" 
"indexingDomain.mod">
%indexing-d-def;

<!-- metaDecl.mod -------------------------------------------------------->
<!ELEMENT indexterm      (%words.cnt;|%indexterm;|%index-base;)*>
<!ELEMENT index-base     (%words.cnt;|%indexterm;)*>

<!ATTLIST index-base   %global-atts;  class CDATA "- topic/index-base ">

<!-- indexingDomain.ent -------------------------------------------------->
<!ENTITY % indexing-d-index-base 
     "index-see | index-see-also | index-sort-as | index-range-start | index-range-end">

<!ENTITY indexing-d-att "(topic indexing-d)">

<!-- indexingDomain.mod -------------------------------------------------->
<!ELEMENT index-see        (%words.cnt;|%indexterm;)*>
<!ELEMENT index-see-also   (%words.cnt;|%indexterm;)*>
<!ELEMENT index-sort-as (%words.cnt;|%indexterm;)*>
<!ELEMENT index-range-start EMPTY>
<!ELEMENT index-range-end EMPTY>

<!ATTLIST index-see %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-see ">
<!ATTLIST index-see-also %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-see-also ">
<!ATTLIST index-sort-as %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-sort-as ">
<!ATTLIST index-range-start %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-range-start ">
<!ATTLIST index-range-end %global-atts; class CDATA 
                           "+ topic/index-base indexing-d/index-range-end ">

Costs

Minimal: only two new elements are added to one element's content model. Processors will have to be updated to interpret these elements, but they can degrade gracefully by ignoring them.

Benefits

This will add an important feature to DITA's support for book indexes, one that is already common in other authoring environments.

Time Required

Less than a day.