DITA Proposed Feature #12010

Unify domains and topics to improve design flexibility and to simplify the DITA specialization constructs.

Longer description

Problem

DITA 1.1 packages elements in two distinct kinds of module: domains and topics. The elements provided by a topic module can only specialize the elements available in its base topic module. Elements provided by a domain module can only specialize the elements available in its base domain module (or the root topic module).

These restrictions significantly reduce the designer's flexibility in reusing vocabularies. The only method for combining vocabularies is domain extension, which is global and provides optional alternatives for the extended base element. Aside from these global alternatives, each vocabulary is effectively segregated from every other vocabulary except for its base vocabularies. As a result, designers who need elements in specific contexts must create elements with different names for the same content, resulting in greater complexity, difficulty in reusing content fragments, and duplication of design, processing, and documentation.

For example, the javaClass, javaInterface, and javaMethod topics cannot share a <javaInterfaceName> element for marking up Java interfaces in the definition of a class, interface, or method. Instead, javaClass must have <javaClassInterface> and javaMethod must have <javaMethodInterface>.

Inheritance                  Containment
apiRef
    apiClassifier
        javaClass            ...  javaClassInterfaceName
        javaInterface        ...  javaInterfaceName
    apiOperation
        javaMethod           ...  javaMethodInterfaceName

Solution

Relax the restriction on specialization so that common vocabulary elements can be shared between modules. For example, allow the javaClass, javaInterface, and javaMethod topics to share a <javaInterfaceName> element. Allowing broader reuse of vocabularies reduces the work of designers and processing developers.

For an earlier version of this proposal, please see:

http://www.oasis-open.org/committees/download.php/14849/Issue32.html

Statement of Requirement

  • Allow more flexible reuse of vocabularies so the same element name can be used to mark up a kind of content in all contexts.
  • Maintain reliable interoperability by generalization to ancestors.
  • Maintain reliable checking on resolution of conref across document types.
  • Minimize the increased burden on designers.

Use Cases

Taking advantage of existing vocabularies when defining new topic types

For example, a reference specialization for message content can require the existing <msgnum> and <msgph> elements from the software domain within the <msgRefTitle> element. Similarly, the reference topic type could be refactored to move the <refsyn> element into the programming domain as a general-purpose section wrapper for <syntaxdiagram> but still provided as an option within the <refbody> content model.

Providing common subelements within the content of several topic types

For example, the Java API reference provides topic types for documenting a Java class library, including javaClass, javaInterface, and javaMethod. These topic types derive from generic API types such as apiClassifier and apiOperation and thus don't have a single common base vocabulary. In the content, Java interface names are common across Java classes, interfaces, and methods for identifying implemented interfaces, base interfaces, method parameters, return values, and exceptions.

The proposal allows the javaInterfaceName element to be required in specific contexts within javaClass, javaInterface, and javaMethod topic types.

For another example, if the <prereq>, <context>, <steps>, and <result> specialized elements within the task vocabulary module were refactored as a domain, other information types could have a <context> element as substructure.

Extending an existing topic type with optional special alternatives to an existing element

For example, recovery tasks can have steps that apply to hardware or software. The proposal allows <hardwareStep> and <softwareStep> specializations of <step> to be offered as more specific alternatives to the <step> element within task content.

Scope

This proposal has the following impacts:

  • Refactors some existing domains and class attributes for the core DITA vocabulary modules.
  • Requires enhancements to existing conref and generalization processors.

Technical Requirements

Terminology

This proposal relies on a conceptual refactoring of the existing division between topic modules and domain modules. Both kinds of modules can be seen as providing vocabularies for marking up content fragments (which, in the case of topics, are complete content objects). These content fragments are rooted at elements but also have elements for substructure. The proposal formalizes this distinction in existing DITA practice between extension and substructure elements:

Extension element
An element that, under the control of a document type shell, can appear as an alternative or replacement for its base element in contexts where its base element can appear.

A specialization of <topic> is always an extension element. For instance, the <reference> element can appear instead of <topic> in all <topic> contexts including the top context in a document as well as nested contexts in the content models for the <dita> and <topic> elements.

Domain modules always provide one or more extension elements. For instance, the programming domain supplies the <apiname>, <codeblock>, <codeph>, <option>, <parml>, <parmname>, <synph>, and <syntaxdiagram> extensions of the base <dl>, <fig>, <keyword>, <ph>, and <pre> elements.

Substructure element
An element that appears only within the content fragment under an extension element.

For instance, the <properties> element from the reference vocabulary module can appear only in the content under the <reference> extension element. Similarly, the <plentry> element from the programming domain can appear only as a subelement of the <parml> extension element.

For another example, the UI domain supplies the <menucascade>, <screen>, <shortcut>, <uicontrol>, and <wintitle> elements. Of these elements, the <menucascade>, <screen>, <uicontrol>, and <wintitle> elements are provided as extensions of their base elements. That is, when the UI domain is integrated into a document type shell, these elements become alternatives for their base elements in all contexts where their base elements appear. By contrast, the <shortcut> element from the UI domain is a substructure element. It doesn't appear in every context where its base <keyword> element can appear but instead appears only under the <uicontrol> element.

Recognition of extension elements is important for developers of specialized processing because the processing handles content fragments rooted at extension elements rather than individual elements in isolation. For instance, a developer of specialized processing for the UI domain processes a <uicontrol> content fragment that may contain the <shortcut> element but doesn't have to process the <shortcut> element in isolation.

Specialization rules

The distinction between extension and substructure elements offers a strategy for relaxing some of the existing restrictions of specialization.

Note: Changes to the fundamentals of specialization aren't necessary. Specialized designs are still packaged as reusable XML Schema or DTD modules. An element still can only restrict and specialize the content model and attributes of its base element. An element still declares its type and base types with a defaulted class attribute. Processing still matches the class attribute.

The relaxed restrictions on specialization can be defined in terms of extension and substructure elements as follows:

  • All of the extension elements provided by a vocabulary must specialize base elements from a single vocabulary module (or one of its base vocabulary modules). That's true in DITA 1.1.

    For instance, if the <menucascade> extension element from the UI domain specializes the <ph> element of the topic module, all extension elements from the UI domain must specialize elements of the topic module. Similarly, if the <linkpath> element from the web user interface domain module extends the <menucascade> element, the other elements in the web user interface domain module must extend elements in the UI domain or topic modules.

    This rule preserves generalization in that every module has an unambiguous generalization fallback to a more general module. As described below, substructure elements have a somewhat different restriction.

  • A vocabulary module can extend any element defined by the base vocabulary modules. That's a new capability in this proposal.

    For instance, the <properties> element in the reference module can be extended by a <parameters> element in a paramref domain module. A document type shell can provide the <parameters> element as an alternative or replacement for the <properties> element.

    This rule preserves generalization in that every extension element generalizes to one element in the base module.

  • The substructure of an extension can use or specialize the substructure of its base element including extensions of the base substructure elements.

    For instance, because <codeph> can contain <keyword>, a <commandph> extension of <codeph> can contain or specialize <keyword>. That part is true in DITA 1.1.

    As a new capability in this proposal, the <commandph> extension can also contain or specialize any extension of <keyword>. For instance, <commandph> can require the <cmdname> extension of <keyword> provided by the software domain.

    This rule preserves generalization in that a specialization of a substructure extension can generalize back to either the extension element or to the extended substructure element in the base module. That is, using extension elements doesn't remove the ability to generalize to a single base module. In the example, <cmdname> generalizes back to <keyword>, which is valid in the content for <codeph>.

    Unless the extension module is constrained, the extension elements must also be available in other contexts for their base element. In those other contexts, however, the extension element is an optional alternative to the base element. In a specialized content model, the extension element can replace the base element, be made mandatory, or have any number of occurrences that is valid in a specialized content model.

    This requirement also ensures that the content of the extension element in a specialized position can be conreffed into any other document with the extension element because the element will be available in any nested contexts. For example, <codeph> contains its base element <ph>. Integrating the programming domain in a document type extends <ph> with <codeph> in all contexts. Thus, global integration of the programming domain allows <codeph> to nest. If such nesting is not desired, constraints can be declared and applied on the extension.

    The same option of using extensions that are valid in the base substructure is available at any depth. The vocabulary module must declare not only the base module but these substructure dependencies (as described in the following section).

  • A vocabulary module that extends a top element (topic or map) must define architectural attributes on that extension. That's true in DITA 1.1.

    Otherwise, with regard to specialization and pluggable integration, a vocabulary module with a top extension follows the same rules as any other vocabulary module, and the extension element follows the same rules as any other extension element. To put it another way, a topic or map specialization is just a domain specialization that happens to extend the top element. That's new in this proposal.

Cyclical dependencies among modules are not allowed. As in DITA 1.1, the main risk for cyclical dependencies in specialization comes from elements that nest themselves, either directly as with <ph> or indirectly as with <p>, which can contain <lq>, which can contain <p>. An example of a cyclical dependency would be a <slogan> specialization of <ph> in the attribution vocabulary module whose content model lists the <catchphrase> specialization of <ph> in the soundbyte vocabulary module where the content model of <catchphrase> lists <slogan>. In this situation, a processor could not determine how to generalize the content.
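
A minimal sketch of how a tool might detect such a cycle before attempting generalization. It assumes the module dependencies have already been collected into a dictionary that maps each module name to the modules it depends on; the function name find_cycle and the dictionary layout are illustrative, not part of DITA.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of module names, or None.

    depends_on maps a module name to the modules it depends on
    (an assumed, illustrative representation of the declarations).
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(module):
        color[module] = GRAY
        path.append(module)
        for dep in depends_on.get(module, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: a cycle is closed
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[module] = BLACK
        return None

    for module in depends_on:
        if color.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


# The <slogan>/<catchphrase> situation described above
modules = {
    "topic": [],
    "attribution": ["topic", "soundbyte"],
    "soundbyte": ["topic", "attribution"],
}
print(find_cycle(modules))
```

A document type shell could run a check of this kind when modules are integrated and reject the combination if a cycle is reported.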

Declarations

The conref and generalization processors must be able to recognize the base modules for every module. The domains attribute declares the dependencies for each vocabulary module in a parenthetical expression in which the module is the rightmost token.

Where substructure makes use of extension elements from modules other than the base module, the parenthetical expression identifies the combination of the base module and these extension modules. To avoid confusing which module is the base module or implying a dependency between the extension modules and the base module, each extension module is appended to the base module with a separating plus sign.

For example, the following declaration indicates that the codeConcept module has concept as its base module and has a dependency on the combination of the concept module and the programming extension module.

domains="... (topic concept+pr-d codeConcept) ..."
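
As a rough illustration of how a processor might read such declarations, the following sketch splits a domains value into its parenthetical expressions and separates the declared module, its base chain, and any extension modules combined with a plus sign. The function name parse_domains and the returned dictionary keys are hypothetical, and the sample value is made up for the illustration rather than taken from a real document type shell.

```python
import re


def parse_domains(value):
    """Parse a domains attribute value into a list of declarations.

    Each declaration is a dict with:
      module   - the rightmost token (the module being declared)
      bases    - the base chain, left to right, without combined extensions
      combined - extension modules appended to a base with '+'
    """
    declarations = []
    for group in re.findall(r"\(([^()]*)\)", value):
        tokens = group.split()
        if not tokens:
            continue
        *ancestors, module = tokens
        bases, combined = [], []
        for token in ancestors:
            base, *extensions = token.split("+")
            bases.append(base)
            combined.extend(extensions)
        declarations.append(
            {"module": module, "bases": bases, "combined": combined}
        )
    return declarations


sample = "(topic concept+pr-d codeConcept) (topic pr-d)"
for declaration in parse_domains(sample):
    print(declaration)
```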

Here are the details about the codeConcept example and some additional examples:

A specialized topic whose substructure includes a preexisting domain element
The codeConcept topic marks up an explanation of a programming technique.
  • The <codeConcept> extension element specializes <concept> and has a <codeConBody> substructure.
  • The <codeConBody> content model lists the <codeblock> element from the programming domain to show the code analyzed and explained by the remainder of the topic.

As a result, the codeConcept topic depends on the combination of the concept and programming modules. The architectural attribute declarations:

codeConcept/@domains: (topic concept+pr-d codeConcept)
codeConcept/@class:   (- topic/topic concept/concept codeConcept/codeConcept )
codeConBody/@class:   (- topic/body  concept/conbody codeConcept/codeConBody )
codeblock/@class:     (+ topic/pre   pr-d/codeblock                          )

Topic instances generalize to any of the following combinations of modules:
topic and concept and programming, topic and concept, topic and programming, topic

Both the topic and concept modules are available because the <concept> element is only available in contexts where the <topic> element can appear, and the <topic> element isn't excluded unless a constraint has been applied. (For more detail, see the 12008 constraints proposal.)

A specialized topic whose substructure specializes a preexisting domain element
The uiTask topic marks up a UI procedure.
  • The <uiTask> extension element specializes <task> and has <uiTaskBody> and <uiContext> substructure.
  • The <uiContext> content model lists the <uiMenuContext> specialization of <menucascade> in the UI domain to identify the menu item that launches the UI task.

As a result, the uiTask topic depends on the combination of the task and ui modules. The architectural attribute declarations:

uiTask/@domains:      (topic task+ui-d uiTask)
uiTask/@class:        (- topic/topic   task/task        uiTask/uiTask        )
uiTaskBody/@class:    (- topic/body    task/taskbody    uiTask/uiTaskBody    )
uiContext/@class:     (- topic/section task/context     uiTask/uiContext     )
uiMenuContext/@class: (+ topic/ph      ui-d/menucascade uiTask/uiMenuContext )

Topic instances generalize to any of the following combinations of modules:
topic and task and UI, topic and task, topic and UI, topic

A specialized domain extension whose substructure includes a preexisting domain element
The <commandph> element marks up inline code phrases that show the invocation of a command.
  • The <commandph> extension element specializes <codeph> from the programming domain and lists the <cmdname> element from the software domain to identify the invoked command.

As a result, the cli module depends on the combination of the programming and software modules. The architectural attribute declarations:

topic/@domains:   (topic pr-d+sw-d cli-d)
commandph/@class: (+ topic/ph      pr-d/codeph  cli-d/commandph )
cmdname/@class:   (+ topic/keyword sw-d/cmdname                 )

Content instances generalize to any of the following combinations of modules:
topic and programming and software, topic and programming, topic and software, topic
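
The class attributes in this example suggest how a generalizer could pick the fallback element for a given set of valid modules: walk the class value from right to left and stop at the first type whose module is valid. The sketch below assumes that reading; the function name generalize is hypothetical, and the parentheses used as notation in the declarations above are simply stripped.

```python
def generalize(class_value, valid_modules):
    """Return (module, element) for the most specialized valid type.

    class_value   - a class attribute value such as
                    "+ topic/ph pr-d/codeph cli-d/commandph "
    valid_modules - set of module names valid in the output
    """
    tokens = class_value.strip("() ").split()
    if not tokens or tokens[0] not in ("+", "-"):
        raise ValueError("class value must start with '+' or '-'")
    # Types run from most general (left) to most specialized (right)
    for token in reversed(tokens[1:]):
        module, element = token.split("/", 1)
        if module in valid_modules:
            return module, element
    raise ValueError("no type in the class value belongs to a valid module")


commandph = "(+ topic/ph      pr-d/codeph  cli-d/commandph )"
cmdname = "(+ topic/keyword sw-d/cmdname                 )"

print(generalize(commandph, {"topic", "pr-d", "sw-d"}))
print(generalize(commandph, {"topic", "sw-d"}))
print(generalize(cmdname, {"topic", "pr-d"}))
```

With the programming module valid, <commandph> falls back to <codeph>; without it, to <ph>. Without the software module, <cmdname> falls back to <keyword>, which remains valid inside <codeph>.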

A domain with extension and substructure elements that specialize different domains
The <widget> element marks up a component provided by a UI library.
  • The <widget> extension element specializes <uicontrol> from the UI domain and has the <widgetName> specialization of the <apiname> extension element from the programming domain to identify the API object used to display and manage the widget.

As a result, the widgetlib module depends on the combination of the UI and programming modules. The architectural attribute declarations:

topic/@domains:    (topic ui-d+pr-d widgetlib-d)
widget/@class:     (+ topic/ph      ui-d/uicontrol widgetlib-d/widget     )
widgetName/@class: (+ topic/keyword pr-d/apiname   widgetlib-d/widgetName )

Content instances generalize to any of the following combinations of modules:
topic and UI and programming, topic and UI, topic and programming, topic

A domain that extends the substructure of a topic
The <parameters> element marks up the parameters as part of the reference for a command, function, or statement.
  • The <parameters> extension element specializes <properties> from the reference topic and contains the <paramtype> and <paramdesc> specializations of <proptype> and <propdesc>.

As a result, the paramref module depends on the reference topic. The architectural attribute declarations:

reference/@domains: (topic reference paramref-d)
parameters/@class:  (+ topic/simpletable reference/properties paramref-d/parameters )

Content instances generalize to any of the following combinations of modules:
topic and reference, topic

A specialized topic whose substructure requires a domain that extends the substructure of the base topic
The commandref topic marks up the reference for a command-line statement.
  • The <commandref> extension element specializes <reference> and has a <commandBody> substructure.
  • The <commandBody> content model lists the <parameters> element from the paramref module to identify the parameters of the command.

As a result, the commandref module depends on the combination of the reference topic and the paramref module. The paramref module retains its dependency on the reference topic. The architectural attribute declarations:

commandref/@domains: (topic reference paramref-d)
                     (topic reference+paramref-d commandref)
commandref/@class:   (- topic/topic       reference/reference  commandref/commandref  )
commandBody/@class:  (- topic/body        reference/refbody    commandref/commandBody )
parameters/@class:   (+ topic/simpletable reference/properties paramref-d/parameters    )

Topic instances generalize to any of the following combinations of modules:
topic and reference and paramref, topic and reference, topic
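
One way to read the lists of generalization combinations in these examples is that a combination is allowed when every module in it is accompanied by the modules it depends on. The sketch below enumerates combinations under that assumption, which is an interpretation rather than a rule stated by the proposal. The function name generalization_targets and the dictionary layout are hypothetical.

```python
from itertools import combinations


def generalization_targets(ancestors):
    """List module combinations that are closed under dependency.

    ancestors maps each ancestor module of the specialized module to the
    modules it depends on directly (an assumed representation).
    """
    names = list(ancestors)
    results = []
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            chosen = set(subset)
            # Keep the subset only if no module is missing a dependency
            if all(set(ancestors[m]) <= chosen for m in chosen):
                results.append(subset)
    return results


# commandref: paramref-d depends on reference, which depends on topic
commandref_ancestors = {
    "topic": [],
    "reference": ["topic"],
    "paramref-d": ["reference"],
}
# codeConcept: concept and pr-d each depend only on topic
codeconcept_ancestors = {
    "topic": [],
    "concept": ["topic"],
    "pr-d": ["topic"],
}

print(generalization_targets(commandref_ancestors))
print(generalization_targets(codeconcept_ancestors))
```

Under this reading, "topic and paramref" is absent from the commandref list because the paramref module cannot stand without the reference topic, whereas "topic and programming" is present in the codeConcept list because the programming domain depends only on topic.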

As part of unifying topics and domains, designers are encouraged

  • To provide parenthetical expressions in the domains attribute for topic modules as well as for domain modules. This approach provides a consistent representation when topic modules are constrained or combined with extension modules as dependencies.
  • To represent topics in nested positions based on the topic element name rather than a special infotypes name.

A future DITA release might consider whether defaulted class attributes should be refactored to distinguish extension and substructure elements rather than topic and domain elements.

Generalization

To determine compatibility of documents, processors can check the domains attribute. Document types with a superset of modules can accept instances of document types with a subset of modules.

Combination of base modules with extension modules doesn't affect such processing. Because a combination of modules only expresses a base dependency, checking the rightmost module in each parenthetical expression is sufficient to establish the list of available modules. As noted previously, because modules are combined only by means of extension elements, generalization cannot produce invalid instances.

As noted in the Generalization section of the Architectural specification (http://docs.oasis-open.org/dita/v1.1/OS/archspec/generalize.html), a generalizer process can take the list of source modules to be generalized from or the list of target modules to be generalized to or both as parameters for generalization. The generalizer must assume the following:

Source
The specified source module and all modules depending on the source module are invalid in the generalization output. Such depending modules include any modules that extend its substructure elements as well as any modules that include its extension elements in specialized content models. The base modules for the set of source modules and all of their ancestor modules are valid in the generalization output.

For example, if the programming domain module is declared as a source, a process can discover from its declaration in the domains attribute that the codeConcept module depends on the combination of the concept module and the programming domain extensions and thus is also invalid. Similarly, if the paramref module extends the <properties> element of the reference topic and the reference module is declared as a source, a process can discover that the paramref module is invalid from its declaration in the domains attribute.

Conversely, if the codeConcept module is declared as a source, the programming domain and concept topic are by implication valid. Similarly, if the paramref module is declared as a source, the reference topic is by implication valid.

A source list can never include base topic or map because they have no base modules.

Target
The specified target modules and all of their ancestor modules are valid in the generalization output. Descendent modules of the set of valid target modules (including any domain vocabulary elements that extend substructure elements) are implicitly invalid in the generalization output.

For example, if the programming domain module is declared to be a target, a process can discover from the domains attribute that the codeConcept module depends on the programming domain extensions and thus is by implication invalid. Similarly, if the reference topic is declared to be a target, a process can discover from the domains attribute that the paramref module depends on reference and is invalid by implication.

Conversely, if the codeConcept module is declared as a target, the programming domain and concept topic are by implication valid. Similarly, if the paramref module is declared as a target, the reference topic is by implication valid.
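
The source and target rules can be sketched as two small functions over a dependency graph. This is an illustration under assumptions: the graph is taken to have been derived from the domains attribute beforehand, each function handles a single declared module, and the names ancestors_of, dependents_of, source_disposition, and target_disposition are hypothetical.

```python
def ancestors_of(module, depends_on):
    """All modules that the given module depends on, directly or indirectly."""
    seen = set()
    stack = list(depends_on.get(module, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, ()))
    return seen


def dependents_of(module, depends_on):
    """All modules that depend on the given module, directly or indirectly."""
    return {m for m in depends_on if module in ancestors_of(m, depends_on)}


def source_disposition(source, depends_on):
    """Source rule: the source and its dependents are invalid, its ancestors valid."""
    invalid = {source} | dependents_of(source, depends_on)
    valid = ancestors_of(source, depends_on)
    return valid, invalid


def target_disposition(target, depends_on):
    """Target rule: the target and its ancestors are valid, its dependents invalid."""
    valid = {target} | ancestors_of(target, depends_on)
    invalid = dependents_of(target, depends_on)
    return valid, invalid


# Dependencies assumed from the declarations used in the examples
depends_on = {
    "topic": [],
    "concept": ["topic"],
    "pr-d": ["topic"],
    "codeConcept": ["concept", "pr-d"],
    "reference": ["topic"],
    "paramref-d": ["reference"],
}

print(source_disposition("pr-d", depends_on))
print(target_disposition("reference", depends_on))
```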

Unknown
Modules that are neither source nor target modules, nor descendents or ancestors of those modules, have no specified or implied disposition. Where such vocabulary modules extend the top element, they are assumed to be invalid. Otherwise, they are assumed to be valid.

For example, if the concept module has an unknown disposition, the codeConcept module must generalize as well. The elements specialized from the concept module generalize, like all concept elements, to base topic. The elements that specialize from the extension element of the programming module generalize to the programming module because that module is assumed to be valid. Similarly, if the reference topic has an unknown disposition, a process can discover from the domains attribute that the paramref module depends on reference and thus generalize all of its elements as well.

Draft comment:
The behavior described above for unknown modules is consistent with the DITA 1.1 specification and implementable. However, as long as we're reexamining the issue, would there be more value in having a simpler, consistent rule (either pessimistic for safety or optimistic for utility)?

Where a source module is a descendent of a target module, the target module takes precedence, invalidating all of its descendents. A target cannot be a descendent of a source module.

Draft comment:
The Architectural Specification specifies something slightly different for resolving conflicts, but that would appear to be unsafe. If a <javaClass> topic can contain only a <javaMethod> topic but the generalizer specifies a source of javaMethod and a target of reference, the <javaClass> would end up containing a <reference>, which would be invalid.

Conref

The same methods for determining compatibility of documents that apply during generalization also apply during conref.

Referencing: (topic concept+pr-d codeConcept)
Referenced:  (topic pr-d)
Resolution:  Prevented - the extension elements from the programming domain might appear in some base element contexts in the referenced document where they aren't allowed in the referencing document.

Referencing: (topic pr-d) (topic concept+pr-d codeConcept)
Referenced:  (topic pr-d)
Resolution:  Allowed - the extension elements from the programming domain can appear in any base element context in the referencing document.

Referencing: (topic concept+pr-d codeConcept)
Referenced:  (topic)
Resolution:  Allowed - elements from the topic module are valid in all contexts in which they can be referenced.

Referencing: (topic concept+pr-d codeConcept)
Referenced:  (topic concept+pr-d codeConcept)
Resolution:  Allowed - codeConcept elements including substructure elements from the programming domain are valid in all contexts in which they can be referenced.
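
The four cases above can be reproduced with a simple superset check. The sketch below makes two assumptions that go beyond the text: the modules available in all contexts are taken to be each declared module plus its base chain (ignoring extension modules combined with a plus sign), and the second case is read as a referencing document that declares both expressions. The function names are hypothetical.

```python
import re


def available_modules(domains):
    """Modules assumed to be available in all contexts of a document type.

    Extension modules appended with '+' are skipped because they are
    available only within the substructure of the declared module.
    """
    modules = set()
    for group in re.findall(r"\(([^()]*)\)", domains):
        for token in group.split():
            modules.add(token.split("+")[0])
    return modules


def conref_allowed(referencing, referenced):
    """Allow the conref when the referencing document has a superset of modules."""
    return available_modules(referenced) <= available_modules(referencing)


code_concept = "(topic concept+pr-d codeConcept)"
programming = "(topic pr-d)"

print(conref_allowed(code_concept, programming))
print(conref_allowed(programming + " " + code_concept, programming))
print(conref_allowed(code_concept, "(topic)"))
print(conref_allowed(code_concept, code_concept))
```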

Combined vocabulary modules and constraints

Constraints ordinarily apply to a single vocabulary module. For example, the following parenthetical expression declares the shortdescReq constraint on the topic module.

(topic shortdescReq-c)

Constraints can also be used with combinations of vocabulary modules:

A constraint can apply to a vocabulary module with a base dependency.

For example, the following parenthetical expression declares the commandShortdescReq constraint on the commandref module.

(topic reference+paramref-d commandref commandShortdescReq-c)

Compatible constraints on base modules can be declared.

Where vocabulary modules are combined, the constraint applies only to the base module. For example, the following parenthetical expression declares that commandref constrained by commandShortdescReq is consistent with the reference constrained by referenceShortdescReq-c and also with topic constrained by shortdescReq-c.

(topic shortdescReq-c reference+paramref-d referenceShortdescReq-c commandref
    commandShortdescReq-c)
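
A processor could pair each constraint with the module it restricts by scanning the expression from left to right. The sketch below assumes that constraint tokens are recognizable by the "-c" suffix seen in these examples and that a constraint following a combined token applies to the base module only; the function name constraints_by_module is hypothetical.

```python
def constraints_by_module(expression):
    """Map each module in one parenthetical expression to its constraints."""
    tokens = expression.strip("() \n").split()
    result = {}
    current = None
    for token in tokens:
        if token.endswith("-c"):          # assumed naming convention
            if current is None:
                raise ValueError("constraint without a preceding module")
            result[current].append(token)
        else:
            # For a combination such as reference+paramref-d,
            # later constraints attach to the base module only
            current = token.split("+")[0]
            result[current] = []
    return result


expression = """(topic shortdescReq-c reference+paramref-d referenceShortdescReq-c commandref
    commandShortdescReq-c)"""
print(constraints_by_module(expression))
```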

A constraint can require extension elements as part of redefining content models.

For example, a constraints module can override the reference topic, restricting the <proptype> element to contain only the <apiname> extension from the programming domain as a replacement for the <keyword> element. The following parenthetical expression declares this constraint:

(topic reference+pr-d apiproperty-c)

Instances of the <apiname> element generalize to the <keyword> element in document type shells that have the reference topic but not the programming domain.

Similarly, a constraint for the combination of the topic module and layout module can extend the <data> element with the <blockLayout> element in block contexts.

(topic+layout-d contextLayout-c)

Instances of the <blockLayout> element generalize to the <data> element in document types without the layout vocabulary module.

Note: Applying different constraints to the same elements in different document types can prevent conrefs that are allowed between other document types. Such differences for the same elements can raise an issue of usability for writers. Designers can prevent confusion for writers by applying constraints consistently across all documents that are authored as part of the same information set.

Schema and DTD Implementation

The proposal makes no changes in the implementation of specialization other than the rules described above for determining eligible base elements and content elements and for declaring module dependencies with parenthetical expressions in the domains attribute.

New or Changed Specification Language

Replaces the distinction between structural and domain specialization (http://docs.oasis-open.org/dita/v1.1/CD02/archspec/ditaspecialization.html) with the distinction between extension and substructure elements, as described in this proposal.

Costs

  • Revising document type shells to list topic types in the domains attribute.
  • Revising the class attributes in Schema and DTD modules to distinguish extension and substructure elements rather than topic and domain elements.
  • Revising topic type modules to use topic element names to refer to nested topics instead of special info-type names (resolving the topic name by default to the value of the existing info-type names).
  • Possibly refactoring the reference and task modules to move top-level specialized elements from the body into domains and to require the domain modules in the reference and task modules.
  • Revision to the specification.
  • Modifying the generalization and conref processes to check for combinations of vocabulary modules when determining compatibility of document instances.

Benefits

  • Provides document instances that are easier to understand because vocabularies have fewer elements and mark up the same content with the same element.
  • Reduces the implementation and maintenance effort in design, documentation, and processing by eliminating redundant elements.
  • Simplifies one aspect of the DITA architecture by removing the distinction between topic and domain modules.