DITA Proposed Feature # 12038

Add a new element based on an expansion of the extant DITA <keyword> element to assist in the resolution and handling of abbreviated-form text such as acronyms, general abbreviations, and short forms in source and target text within DITA documents

Longer description

Abbreviated forms, such as acronyms, are ubiquitous in technical documentation. Although abbreviated forms and glossary terms are similar from the localization and presentation points of view, abbreviated forms are a special case. In a printed document, an abbreviated form needs to be expanded at its first occurrence. In electronically published documents, the expansion can also be made available through a hyperlink or 'tool tip' mechanism. In addition, the expanded text should be available for automatic inclusion in glossary entries for the publication. This proposal relates to all types of abbreviations, such as acronyms, initialisms, apocope, clipping, elision, syncope, syllabic abbreviation, and portmanteau.

Statement of Requirement

Abbreviated forms and their translations require special handling:


If the first occurrence of an abbreviated form in English is followed by its full form in parentheses, the translated version may require the reverse order: the expanded form followed by the abbreviated form in parentheses. It might also be necessary to include both the English expanded form and a translation of it.

For example, in a Polish book on Java Web programming, the first reference to JSP may appear as follows:

"JSP (ang. Java Server Pages)"

Another example from a publication concerning OASIS:

"OASIS (ang. Organization for the Advancement of Structured Information Standards—organizacja dla propagowania strukturalnych standardów informacyjnych)"

In the first example, the translator assumes the reader will not require a translation of the English abbreviated form. In the second example, the translator assumes the reader may not understand the English expanded form and therefore adds the translation.

Technical Proposal

The proposal is to create an element which would be a specialized form of the <keyword> element. An abbreviated form is resolved through the conref attribute, which points to the definition holding its short, expanded, and first-occurrence forms. The abbreviated form element is designed to be extended via specialization to reflect the actual form of abbreviation. For example, a reference to an abbreviated form looks like this:

<abbreviated-form conref="acronyms.dita#acronyms/abs"/>

The entry in the acronyms.dita file would be as follows:

<abbreviated-form id="abs">
  <expanded>Anti-lock Braking System</expanded>
  <short>ABS</short>
  <surface-form>Anti-lock Braking System (ABS)</surface-form>
</abbreviated-form>
Note: The ID of the abbreviated-form element only needs to be unique to the file in which it is defined and does not need to match the acronym, so translations of the above example will continue to use id="abs".
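
The conref lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the <topic id="acronyms"> wrapper is inferred from the conref value, the file is held in memory rather than read from disk, and the function names parse_conref and resolve are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for acronyms.dita. The <topic> wrapper and its
# id="acronyms" are assumptions inferred from the conref value; a real
# DITA topic would contain more structure than this.
ACRONYMS_DITA = """\
<topic id="acronyms">
  <abbreviated-form id="abs">
    <expanded>Anti-lock Braking System</expanded>
    <short>ABS</short>
    <surface-form>Anti-lock Braking System (ABS)</surface-form>
  </abbreviated-form>
</topic>
"""


def parse_conref(conref):
    """Split 'file#topicid/elementid' into its three parts."""
    filename, _, fragment = conref.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    return filename, topic_id, element_id


def resolve(conref, files):
    """Return the forms of the definition a conref points to.

    `files` maps a file name to its XML text (an in-memory substitute
    for reading the file from disk).
    """
    filename, topic_id, element_id = parse_conref(conref)
    root = ET.fromstring(files[filename])
    if root.get("id") != topic_id:
        raise KeyError(f"topic {topic_id!r} not found in {filename}")
    for entry in root.iter("abbreviated-form"):
        if entry.get("id") == element_id:
            # Map each child element name to its text content.
            return {child.tag: child.text for child in entry}
    raise KeyError(f"no abbreviated-form with id {element_id!r}")


forms = resolve("acronyms.dita#acronyms/abs", {"acronyms.dita": ACRONYMS_DITA})
print(forms["surface-form"])
```

Because the lookup uses only the file name and the IDs, a translated acronyms.dita that keeps id="abs" would resolve through the same, unchanged conref value.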

The <expanded> form will be a specialization of the <keyword> element, while the <short> element will be a specialization of the <data> element. This means that the expanded term is a normal phrase, while the short form is metadata that is hidden when processes do not know what to do with it. Translation processes should treat this data specialization as a subflow element for the purposes of translation. The <surface-form> element represents how the acronym should be displayed on the first occurrence of the acronym, or for hypertext display with the tool-tip rendition.

Table 1. New specialized elements to support acronyms

This new element…     Is specialized from this base element…
<abbreviated-form>    <keyword>
<expanded>            <keyword>
<short>               <data>
<surface-form>        <keyword>

The first time an abbreviated form is encountered, the processing tool should use the text in the <surface-form> element. Subsequent instances should be replaced by the contents of the <short> element. The <expanded> form is designed to be used in glossaries. These three elements therefore allow the full needs of acronym handling to be met.
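
The roles of the three elements can be illustrated with a small Python sketch. It is a simplification under stated assumptions rather than a prescribed implementation: the definitions are supplied as an already-resolved dictionary, references are matched with a regular expression instead of an XML parser, the "short: expanded" glossary line format is invented for the example, and the names render and glossary are hypothetical.

```python
import re

# Forms for the ABS entry defined above, keyed by element id.
DEFINITIONS = {
    "abs": {
        "expanded": "Anti-lock Braking System",
        "short": "ABS",
        "surface-form": "Anti-lock Braking System (ABS)",
    }
}

# Simplified pattern for <abbreviated-form conref="file#topic/id"/>.
# A regular expression is an illustrative shortcut, not a real XML parser.
REFERENCE = re.compile(r'<abbreviated-form conref="[^"#]*#[^"/]*/([^"]+)"\s*/>')


def render(source, definitions):
    """Use the surface form on first occurrence and the short form afterwards."""
    seen = set()

    def substitute(match):
        key = match.group(1)
        if key in seen:
            return definitions[key]["short"]
        seen.add(key)
        return definitions[key]["surface-form"]

    return REFERENCE.sub(substitute, source)


def glossary(definitions):
    """List 'short: expanded' lines; this output format is an assumption."""
    pairs = sorted((f["short"], f["expanded"]) for f in definitions.values())
    return [f"{short}: {expanded}" for short, expanded in pairs]


REF = '<abbreviated-form conref="acronyms.dita#acronyms/abs"/>'
source = f"The {REF} system prevents skidding. The {REF} system gives feedback."
rendered = render(source, DEFINITIONS)
print(rendered)
print(glossary(DEFINITIONS))
```

The set of already-seen IDs is local to one call of render, so in this sketch "first occurrence" means first within whatever text is passed in. A real publishing tool would have to decide whether that scope is the topic, the chapter, or the whole publication.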


Note: This proposal assumes that <keyword> can be nested inside <keyword>, which is not supported in DITA 1.1, but is a proposed feature of DITA 1.2 (see proposal #12020).

Translation Issues

The following cases must be contemplated when working with documents that require internationalization:


Rendition

Authors will enter the <abbreviated-form> element for every occurrence of a given acronym.

At compose time, when the publication is assembled, the publishing tool will output the contents of the <surface-form> element the first time the abbreviated form occurs. The ABS acronym used in previous examples would be rendered as:

"The Anti-lock Braking System (ABS) system will prevent the car from skidding in adverse weather conditions."

Subsequent instances will then be rendered as:

"The ABS system will provide the driver with feedback via the brake pedal."

Technical Requirements

A new <abbreviated-form> element needs to be created, which is a specialization of the <keyword> element. The content model of <abbreviated-form> is:

abbreviated-form = ( expanded, short, surface-form )

Elements expanded and surface-form are specialized from <keyword>.

Element short is specialized from <data>.
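
As a rough illustration of the content model, the following Python sketch checks that a definition entry contains exactly the three children in the stated order. It is an informal check under stated assumptions, not a replacement for DTD or schema validation: the function name check_content_model is hypothetical, and the sketch applies only to definition entries, not to empty references that carry a conref attribute.

```python
import xml.etree.ElementTree as ET

# Child elements required by the content model, in order.
EXPECTED_CHILDREN = ["expanded", "short", "surface-form"]


def check_content_model(xml_text):
    """Return a list of problems; an empty list means the entry conforms."""
    entry = ET.fromstring(xml_text)
    if entry.tag != "abbreviated-form":
        return [f"unexpected root element <{entry.tag}>"]
    children = [child.tag for child in entry]
    if children != EXPECTED_CHILDREN:
        found = ", ".join(children) or "none"
        return [
            "expected children " + ", ".join(EXPECTED_CHILDREN)
            + " but found " + found
        ]
    return []


good = """\
<abbreviated-form id="abs">
  <expanded>Anti-lock Braking System</expanded>
  <short>ABS</short>
  <surface-form>Anti-lock Braking System (ABS)</surface-form>
</abbreviated-form>
"""

# Same entry with <short> placed first, which violates the stated order.
bad = """\
<abbreviated-form id="abs">
  <short>ABS</short>
  <expanded>Anti-lock Braking System</expanded>
  <surface-form>Anti-lock Braking System (ABS)</surface-form>
</abbreviated-form>
"""

print(check_content_model(good))
print(check_content_model(bad))
```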

New or Changed Specification Language

The Architectural Specification should include a new topic on abbreviated forms. The title of this section should be "Acronyms and other Abbreviated Forms". The shortdesc element should contain the following content: "Resolution and handling of abbreviated-form text such as acronyms, general abbreviations, and short forms in source and target text within DITA documents"

The first paragraph of the acronym topic should include the following overview: "Abbreviated forms, such as acronyms, are ubiquitous in technical documentation. Although abbreviated forms and glossary terms are similar from the localization and presentation points of view, abbreviated forms are a special case. In a printed document, an abbreviated form needs to be expanded at its first occurrence. In electronically published documents, the expansion can also be made available through a hyperlink or 'tool tip' mechanism. In addition, the expanded text should be available for automatic inclusion in glossary entries for the publication. This topic relates to all types of abbreviations, such as acronyms, initialisms, apocope, clipping, elision, syncope, syllabic abbreviation, and portmanteau."

The following paragraphs should be taken as-is from Statement of Requirement.

This should be followed by a new section titled “Usage Guidelines” or similar, with the following content:

The basis of acronym handling is the <abbreviated-form> element, which is a specialized form of the <keyword> element. An abbreviated form is resolved through the conref attribute, which points to the definition holding its short, expanded, and first-occurrence forms. The abbreviated-form element is designed to be extended via specialization to reflect the actual form of abbreviation. For example, a reference to an abbreviated form looks like this:

<abbreviated-form conref="acronyms.dita#acronyms/abs"/>

The entry in the acronyms.dita file would be as follows:

<abbreviated-form id="abs">
  <expanded>Anti-lock Braking System</expanded>
  <short>ABS</short>
  <surface-form>Anti-lock Braking System (ABS)</surface-form>
</abbreviated-form>
Note: The ID of the abbreviated-form element only needs to be unique to the file in which it is defined and does not need to match the acronym, so translations of the above example will continue to use id="abs".

The <expanded> form will be a specialization of the <keyword> element, while the <short> element will be a specialization of the <data> element. This means that the expanded term is a normal phrase, while the short form is metadata that is hidden when processes do not know what to do with it. Translation processes should treat this data specialization as a subflow element for the purposes of translation. The <surface-form> element represents how the acronym should be displayed on the first occurrence of the acronym, or for hypertext display with the tool-tip rendition.

The first time an abbreviated form is encountered, the processing tool should use the text in the <surface-form> element. Subsequent instances should be replaced by the contents of the <short> element. The <expanded> form is designed to be used in glossaries. These three elements therefore allow the full needs of acronym handling to be met.


The above section should be followed by a section titled “Translation Issues,” to be taken as-is from Translation Issues.

The above section should be followed by a section titled “Rendition,” to be taken as-is from Rendition.

The Language Reference should include a new topic for each new element in this proposal: <abbreviated-form>, <expanded>, <short>, and <surface-form>.


For each of these elements, the description should state something like "This element is part of the acronym feature. For usage details, please read the Architectural Specification." The rest of the element topic would be similar to other element descriptions, with each section (parents, children, attributes, etc.) filled in appropriately.

Costs

We do not believe that adding the <abbreviated-form> element as a specialization of <keyword>, together with its child elements <expanded>, <short>, and <surface-form>, involves significant work.

Benefits

Abbreviated forms will be handled in a uniform and consistent manner. The handling of the abbreviated form will be under the control of the composition software. The first occurrence of the abbreviated form can show the <surface-form>. The text for both the source and target languages will be consistent as it will be resolved via the conref attribute from a single source. The resolution of the abbreviated form can be completely under the control of the composition software so that glossary, tooltip, and first forms can be provided as required to meet the end-user requirements.