DITA Proposed Feature # 12020

Generic text element

Longer description

Expand the content model of base elements which contain #PCDATA so that they can also contain a new text element.

Previous discussion of this proposed feature:

Statement of Requirement

Heavy users of conref need to create fragments of text that can be reused in almost any context. This example is from the DITA 1.1 keyword nesting proposal:
The keyword element is often used to store common text such as product or platform names. This is done because keywords are allowed in nearly all locations that allow text. However, it is not possible to combine common strings into one single keyword for reuse. For example, both "Product A" and "Platform B" are common strings, and are often used together as "Product A for Platform B". To use the combined value, users must enter a new copy of each string in another keyword. Alternatively, they could always reference the two in text as <keyword conref="#topic/product"/> for <keyword conref="#topic/platform"/>. So, it is impossible to reuse the string "Product A for Platform B" without repeating text somewhere.
Similar issues arise with other elements such as term, and with specializations of keyword, term, and ph.
Specializers may have a requirement to disallow mixed content in an element, to ensure correct arity of child elements. For example, a specialization of keyword may need strict alternation of text with data elements:
<spec-keyword>
  text<spec-data-1/><spec-data-2>...</spec-data-2>
  text<spec-data-1/><spec-data-2>...</spec-data-2>
  text<spec-data-1/><spec-data-2>...</spec-data-2>
</spec-keyword>
This cannot be validated with XML Schema or DTD because a mixed content model cannot constrain where text appears among the child elements.

Use Cases

Conref of small pieces of text

Some elements, such as keyword and term, have a content model that allows no nested elements (discounting tm, which is not generic enough). When the content of such an element is assembled from several strings, there is no child element on which to place a conref.

Similarly, this problem can flow through to specializations, so that it is not possible to conref any text into a wintitle element.

With this proposal, text is available in elements (including text itself) that don't contain ph or keyword. Strings can be built from pieces and inserted by conref into any context:
<topic id="strings">
  ...
  <body>
    <keyword><text id="productA">Product A</text></keyword>
    <keyword><text id="platformB">Platform B</text></keyword>
    <keyword id="AforB"><text><text conref="#strings/productA"/> for <text conref="#strings/platformB"/></text></keyword>
  </body>
</topic>

<topic>
  <title>Using <keyword conref="strings.xml#strings/AforB"/></title>
  ...
</topic>
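The way the nested text elements above collapse into a single string can be illustrated with a minimal Python sketch. It is not the behaviour of any particular DITA processor: the function name resolve_conrefs is hypothetical, only same-file references of the form #topicid/elementid are handled, and the target's content is simply copied into the referencing element, without the attribute merging that full conref processing performs.

```python
import copy
import xml.etree.ElementTree as ET

# The strings topic from the example above, with the elided parts omitted.
STRINGS = """<topic id="strings">
  <body>
    <keyword><text id="productA">Product A</text></keyword>
    <keyword><text id="platformB">Platform B</text></keyword>
    <keyword id="AforB"><text><text conref="#strings/productA"/> for <text conref="#strings/platformB"/></text></keyword>
  </body>
</topic>"""


def resolve_conrefs(root):
    """Copy the content of each same-file conref target into the referencing element.

    Simplification (an assumption of this sketch): targets are looked up by
    element id only, and targets that themselves contain unresolved conrefs
    are not handled in any particular order.
    """
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    # Take a snapshot of the elements first, because the tree is modified below.
    for el in list(root.iter()):
        ref = el.get("conref")
        if ref is None or not ref.startswith("#"):
            continue
        target = by_id[ref.rsplit("/", 1)[-1]]
        el.text = target.text
        for child in target:
            el.append(copy.deepcopy(child))
        del el.attrib["conref"]
    return root


resolved = resolve_conrefs(ET.fromstring(STRINGS))
combined = resolved.find(".//keyword[@id='AforB']")
print("".join(combined.itertext()))
```

After resolution, the keyword with id AforB holds the combined string built from the two reusable pieces, so a conref to that keyword from another topic pulls in the whole phrase without any text being duplicated.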

Removing mixed content from a specialization

Mixed content models in DTD and XML Schema cannot prevent the appearance of text in undesired places:
<!ELEMENT spec-keyword (#PCDATA | spec-data-1 | spec-data-2)*>
By placing the text content inside a container element:
<spec-keyword>
  <text>text</text><spec-data-1/><spec-data-2>...</spec-data-2>
  <text>text</text><spec-data-1/><spec-data-2>...</spec-data-2>
  <text>text</text><spec-data-1/><spec-data-2>...</spec-data-2>
</spec-keyword>
validation can now be more strict:
<!ELEMENT spec-keyword ((text, spec-data-1, spec-data-2)*)>
Note: text may be specialized so that it has a more appropriate name in the context of the specialization.
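The effect of the stricter content model can be sketched in Python without a validating parser. The check below is only an approximation of what a DTD or schema validator would do for the spec-keyword example; the function name follows_strict_model is hypothetical, and the element names are the illustrative ones used above.

```python
import xml.etree.ElementTree as ET

# The repeating group from the stricter content model.
EXPECTED = ("text", "spec-data-1", "spec-data-2")


def follows_strict_model(xml_source):
    """Return True if the root's children match (text, spec-data-1, spec-data-2)*."""
    root = ET.fromstring(xml_source)
    children = list(root)
    # Element content: nothing but whitespace may appear between the children.
    if (root.text or "").strip():
        return False
    if any((child.tail or "").strip() for child in children):
        return False
    # The children must form complete groups in the expected order.
    if len(children) % len(EXPECTED) != 0:
        return False
    return all(
        child.tag == EXPECTED[i % len(EXPECTED)]
        for i, child in enumerate(children)
    )


mixed = "<spec-keyword>text<spec-data-1/><spec-data-2>x</spec-data-2></spec-keyword>"
wrapped = (
    "<spec-keyword><text>text</text><spec-data-1/>"
    "<spec-data-2>x</spec-data-2></spec-keyword>"
)
print(follows_strict_model(mixed), follows_strict_model(wrapped))
```

The mixed-content form fails the check because its character data sits directly in spec-keyword, while the wrapped form passes because every run of text is held in its own container element.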

Scope

Minor. Content models for a few dozen elements require an additional entry.

Technical Requirements

The text element should be added to any element which allows #PCDATA but does not allow ph or keyword. In DITA 1.1, these elements are:
  • keyword
  • term
  • tm

The text element should also be added to the content model of ph so that text can be conreffed into a phrase without additional semantics.

Rather than adding text, the following base elements should have ph added to their content model:
  • alt
  • linktext
  • navtitle
  • searchtitle
  • source
For each specialization of these elements, specializers should also decide whether to include text (or keyword or ph, which in turn include text). In DITA 1.1, the affected elements and the proposed additions are:

Bookmap

  • revisionid gets keyword
  • year gets keyword
  • month gets keyword
  • day gets keyword
  • edition gets keyword
  • isbn gets keyword
  • volume gets keyword
Programming domain

  • option gets text
  • parmname gets text
  • synph gets text
  • apiname gets text
  • kwd gets text
Software domain

  • msgnum gets text
  • cmdname gets text
  • varname gets text
UI domain

  • wintitle gets text
  • shortcut gets text
Utilities domain

  • shape gets text
xNAL domain

  • honorific gets keyword
  • firstname gets keyword
  • middlename gets keyword
  • lastname gets keyword
  • generationidentifier gets keyword
  • postalcode gets keyword
  • country gets keyword
  • contactnumber gets keyword

The content model of text is:
<!ELEMENT text (#PCDATA | text)*>
The text element allows all universal DITA attributes.

New or Changed Specification Language

This element requires no addition to the architectural specification.

The language specification should have an entry in the element reference for text. Here is a suggested description:

The text element associates no semantics with its content. It exists to serve as a container for text where a container is needed (e.g., for conref, or for restricted content models in specializations). Unlike ph, text cannot contain images. Unlike keyword, text does not imply keyword-like semantics. The text element contains only text data, or nested text elements. All universal attributes are available on text.

For contexts where ph is available, authors should use that element. Where keyword is available, authors should use that element. Where neither ph nor keyword is available, text can be used to pull content by conref.

The language specification for keyword should be reexamined now that keyword is no longer the preferred generic text container. The keyword entry also contains a remark about output processing, which is not appropriate for the language specification.

Costs

DTDs and Schemas must be updated.

Implementations need to include processing for text. This may require implementations to handle XML fragments where they once needed to handle only strings. Fallback behaviour (flattening the XML to a string, as xsl:value-of does) is not appropriate for text if attributes such as dir or translate, or filtering properties, are present. Flattening is also not appropriate for specializations of text.
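A minimal Python sketch of the flattening concern follows. It assumes that an implementation can decide per element whether a plain-string fallback is safe; the function names can_flatten and flatten, the brand element, and the acme module name in its class attribute are all hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET


def is_text_class(el):
    """True for the base text element or an element whose class attribute derives from it."""
    return el.tag == "text" or " topic/text " in (el.get("class", "") + " ")


def can_flatten(el):
    """Return True only if reducing el to a plain string would discard nothing."""
    for node in el.iter():
        if not is_text_class(node):
            continue
        # A specialization of text may carry semantics of its own.
        if node.tag != "text":
            return False
        # Attributes such as dir, translate, or filtering properties would be lost.
        if any(name != "class" for name in node.attrib):
            return False
    return True


def flatten(el):
    """String value of the element, comparable to what xsl:value-of returns."""
    return "".join(el.itertext())


plain = ET.fromstring("<keyword>Product <text>A</text></keyword>")
marked = ET.fromstring('<keyword>Run <text translate="no">make</text> now</keyword>')
special = ET.fromstring(
    '<keyword><brand class="- topic/text acme/brand ">Acme</brand></keyword>'
)
for sample in (plain, marked, special):
    print(can_flatten(sample), repr(flatten(sample)))
```

In the second sample, the flattened string still reads correctly, but the translate="no" marking is gone, which is why the structure has to be preserved in that case.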

Benefits

Users can reuse more boilerplate text and encounter fewer limitations that appear arbitrary to them. Specializers gain more control over content models by avoiding mixed content.