DITA Proposed Feature #12008

Manage constraints on vocabularies

Longer description

Problem

DITA adopters have long requested the ability to restrict element content without changing the semantics or processing expectations. As the number of specialized vocabularies grows, so does the need to restrict the choices presented to writers.

Because specialization adds branches to the type hierarchy, specialization is not an optimal solution for this request. Using specialization to implement restriction pads the type hierarchy with types that don't introduce new semantics.

Solution

Allow constraints on the content of elements provided by vocabularies.

Definition: A constraint eliminates some of the possible instances of the vocabularies assembled by a document type without changing the semantics of those vocabularies. In the same way that a specialized vocabulary is interoperable with its base vocabulary, a constrained vocabulary is interoperable with the unconstrained vocabulary.

The availability of constraints has benefits for the type hierarchy. Designers can specialize with general elements and loose content models and rely on constraints to provide a more guided authoring experience. Such flexible vocabularies provide a better base for subsequent specialization.

By extending the existing DITA design pattern to constrain vocabularies, the DITA architecture realizes the following benefits:

  • Raises awareness of the importance of customizations that don't break the inheritance of processing or violate interoperability.
  • Provides a method for reuse of constraints across document type shells and sharing of constraints across organization boundaries.
  • Makes constraints on document instances visible to people and processors including processors that must check consistency of document instances (such as conref).

For an earlier version of this proposal, please see:

http://www.oasis-open.org/committees/download.php/14936/Issue34.html

Statement of Requirement

Constraints must allow the following kinds of restrictions:

Restricted content models
In DITA 1.1, an element has the same content model in all document type shells (with the exception of domain extensions).

A constraint should be able to simplify or enforce best practices for the element content. For instance, a constraint should be able to omit optional elements, restrict the range of occurrences for a position, turn a choice with more than one occurrence into a sequence, restrict the values of an attribute, remove an attribute, and so on.

The request to simplify the content model for the <section> element or block elements is common:

http://tech.groups.yahoo.com/group/dita-users/message/258

More formally, a content constraint imposes a restriction allowed under the rules of specialization without changing the semantics of the container element.

Selective domain extension
In DITA 1.1, domain extension introduces all root elements provided by the domain module in all contexts in which their base element appears. This requirement arises because DITA establishes vocabulary compatibility by module. For instance, if differing subsets of the highlight domain were allowed, two document instances might have the same domains declaration where one allowed <b> and <i> and the other allowed <u> (resulting in the need to generalize to <ph> for compatibility).

Also because vocabulary compatibility is established by module, domain extensions cannot replace a base element but can only add alternatives to the base element.

A constraint should be able to extend a base element with only some of the specialized elements provided by domain vocabularies. For example, a constrained domain might extend the <ph> element with the <b> and <i> elements from the highlight domain but not with the <sub>, <sup>, <tt>, or <u> elements.

In addition, a constraint should be able to replace the base element with the specialized domain elements. Effectively, such replacement makes the base element an abstract element in the context.

Here is a sample request for this capability:

http://tech.groups.yahoo.com/group/dita-users/message/524

DITA 1.1 allows document type shells to replace nested <topic> elements without providing a method by which processors can detect such restrictions. To formalize this practice, constraint support for replacement domains should also handle replacement of nested topics.

Use Cases

Removal of block nesting
A DITA adopter supports a set of developers as authors for software reference content. The developers have no experience with structured authoring. The adopter wants to enable as simple an authoring experience as possible yet produce documents that have precise structure and semantics for processability and reuse.

The adopter performs the following actions:

  1. Defines a new document type shell that plugs in the reference topic type and software domain.
  2. Constrains the generic <ph> element to be replaced by its <filepath>, <userinput>, and <systemoutput> specializations so the <ph> element is no longer visible.
  3. Constrains the generic <keyword> element to be replaced by its <cmdname> and <varname> specializations so the <keyword> element is no longer available.
  4. Removes block elements like <ol> and <table> from the content models of text container elements like <lq> and <p>.
  5. Removes text and phrase elements from the content models of structural elements like <li> and <section>.

The adopter could constrain different document type shells for other authoring populations and write common processing against the unconstrained document type.

A required short description
A DITA adopter wants to enforce the best practice that all topics have a short description. The adopter constrains the content models of <topic> and its specializations to make short description mandatory.
Note: Similarly, the DITA 1.2 request for a more general task could be met by relaxing the content model of the existing <task> element and refactoring the existing task document type with a single constraint on the <taskbody> element that enforces the existing validation.

Similarly, if block content models were relaxed to allow self-nesting to enable specialization, constraints could preserve the current restriction against self-nesting in the existing document type shells.

Scope

This proposal has the following impacts:

  • Refactors the existing DTD vocabulary modules.
  • Requires enhancements to existing conref and generalization processors.

Technical Requirements

Constraint implementation

The proposed constraints would be implemented as follows:

Restriction of content model or attributes for an element
A reusable constraint module can redefine the class for an element's content model and attributes in XML Schema or predefine the entity for an element's content model or attributes in DTD. For inclusion in the document type shell, constraint modules can be nested in an XML Schema implementation (wrapping the vocabulary module) or sequenced in a DTD implementation before the vocabulary module.
Restriction of extension elements from a domain
A reusable constraint module can define a subset list of extension elements as a group in XML Schema or as an entity in DTD. The document type shell can use the group or entity when extending the base element.
Replacement of base elements by domain extensions
The document type shell can omit the base element when extending it.

The implementation of constraints on different particles of the content model for one element cannot be combined. That is, constraints implementations for a specific element cannot be aggregated.

Constraint rules

The following rules apply to constraints modules:

  • In the same way that the designer bears the responsibility of implementing a specialized content model that's at least as restrictive as its base module, the designer bears the responsibility of implementing a constrained content model that's more restrictive than the unconstrained content model for the same element.

  • The content model and attributes of one element can be constrained only by one constraints module included in a document type shell. Other shells may include different constraints modules that restrict the same element in a different way.

  • The list of extension elements provided by a domains module can be constrained only by one constraints module included in a document type shell. Other shells may include different constraints modules that restrict the list of extension elements for the same domain in a different way.

  • Each constraints module may constrain elements from only one vocabulary module. This rule maintains granularity of reuse at the module level.

  • Constraints modules that restrict different elements within the same vocabulary module can be combined with one another or with a constraints module that selects a subset of the extension elements for the vocabulary. Such combinations of constraints on a single vocabulary module have no meaningful order or precedence.

  • Designers have the option to declare a constraints module or combination of constraints modules to be more restrictive than another constraints module or combination of constraints modules on the same vocabulary module or a base vocabulary module. This option is particularly useful when a designer wants to constrain base and specialized elements in a consistent way. The advantage of declaring the consistency is that processors can take advantage of the consistency when converting document instances.
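
The one-module-per-element and one-module-per-extension-list rules above can be checked mechanically when a shell is assembled. The following is a minimal sketch, not part of the proposal: the ConstraintModule record and the find_conflicts function are hypothetical names, and the element lists attributed to each constraints module are assumptions drawn from the examples in this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintModule:
    """Hypothetical description of one constraints module."""
    name: str                           # declaration name, e.g. "shortdescReq-c"
    vocabulary: str                     # the single vocabulary module it constrains
    elements: frozenset = frozenset()   # elements whose content or attributes it redefines
    subsets_extensions: bool = False    # True if it selects a subset of extension elements


def find_conflicts(modules):
    """Return messages for constraints modules that cannot share one shell."""
    conflicts = []
    element_owner = {}   # (vocabulary, element) -> constraints module name
    subset_owner = {}    # vocabulary -> constraints module name
    for module in modules:
        for element in sorted(module.elements):
            key = (module.vocabulary, element)
            if key in element_owner:
                conflicts.append(
                    f"<{element}> in {module.vocabulary} is constrained by both "
                    f"{element_owner[key]} and {module.name}"
                )
            else:
                element_owner[key] = module.name
        if module.subsets_extensions:
            if module.vocabulary in subset_owner:
                conflicts.append(
                    f"extension list of {module.vocabulary} is constrained by both "
                    f"{subset_owner[module.vocabulary]} and {module.name}"
                )
            else:
                subset_owner[module.vocabulary] = module.name
    return conflicts


# Assumed element coverage, based on the examples in this proposal.
shortdesc_req = ConstraintModule("shortdescReq-c", "topic", frozenset({"topic"}))
simple_section = ConstraintModule("simpleSection-c", "topic", frozenset({"section", "example"}))
strict_topic = ConstraintModule("strictTopic-c", "topic", frozenset({"topic"}))
no_nested = ConstraintModule("noNestedHighlight-c", "hi-d",
                             frozenset({"b", "i", "u", "sub", "sup", "tt"}))
basic_highlight = ConstraintModule("basicHighlight-c", "hi-d", subsets_extensions=True)

print(find_conflicts([shortdesc_req, simple_section]))   # different elements: no conflict
print(find_conflicts([no_nested, basic_highlight]))      # content vs. extension list: no conflict
print(find_conflicts([shortdesc_req, strict_topic]))     # both redefine <topic>: one conflict
```

Because each record names exactly one vocabulary module, the rule that a constraints module may constrain only one vocabulary module is enforced by the data shape rather than by a separate check.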

Content processing

A document type with constraints allows a subset of the possible instances of a document type for the same vocabularies without constraints. To put it another way, all instances of the constrained document type are guaranteed to be valid instances of the unconstrained document type.

As a result, a constraint doesn't change basic or inherited content processing. The constrained instances remain valid instances of the element type, and the element retains the same semantics and class attribute declaration. In other words, a constraint never creates a new case for content processing such as output formatting.

For instance, a document type constrained to require the <shortdesc> element allows a subset of the possible instances of the unconstrained document type with an optional <shortdesc> element. Thus, the content processing for topic still works when topic is constrained to require a short description.
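
The subset guarantee can be illustrated with a bounded check. The sketch below is an assumption-laden simplification: it transcribes the unconstrained and constrained <topic> content models from Figures 4 and 6 into Python regular expressions over space-terminated child names, treats the topic-info-types entity as a plain nested topic, and enumerates child sequences only up to a fixed length. It demonstrates the idea; it is not a proof and not a validator.

```python
import itertools
import re

# Child element names used by the two content models.
NAMES = ["title", "titlealts", "shortdesc", "abstract",
         "prolog", "body", "related-links", "topic"]

# Each child element is encoded as its name followed by one space.
UNCONSTRAINED = re.compile(
    r"title (titlealts )?((shortdesc|abstract) )?"
    r"(prolog )?(body )?(related-links )?(topic )*"
)
STRICT = re.compile(
    r"title (titlealts )?(shortdesc|abstract) "
    r"(prolog )?(body )?(topic )*"
)


def child_sequences(max_children):
    """Yield every sequence of child names up to max_children long."""
    for length in range(max_children + 1):
        for combo in itertools.product(NAMES, repeat=length):
            yield "".join(name + " " for name in combo)


def is_restriction(constrained, base, max_children=5):
    """True if every bounded sequence valid for constrained is valid for base."""
    return all(
        base.fullmatch(sequence)
        for sequence in child_sequences(max_children)
        if constrained.fullmatch(sequence)
    )


print(is_restriction(STRICT, UNCONSTRAINED))   # constrained instances stay valid
print(is_restriction(UNCONSTRAINED, STRICT))   # the reverse does not hold
```

The second call fails on sequences such as a topic with only a title and a body, which the unconstrained model accepts and the constrained model rejects.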

Content interoperability

Currently, DITA document instances declare (by means of the domains attribute and the class attribute for the topic or map elements) the vocabularies available in their document type. A processor can examine these declarations to determine whether a document instance uses a subset of the vocabularies in another DITA document type and thus is compatible with that document type.

A constrained document type allows only a subset of the possible instances of the unconstrained document type. Thus, for a processor to determine whether a document instance is compatible with another document type, the document instance must declare any constraints on the document type.

For instance, an unconstrained task is compatible with an unconstrained topic because the task can be generalized to topic. If, however, the topic is constrained to require the <shortdesc> element, a document type with an unconstrained task is not compatible with the constrained document type because some instances of the task might not have a <shortdesc> element. If, however, the task document type has also been constrained to require the <shortdesc> element, it is compatible with the constrained topic document type.

Declaring constraints

To allow processors to detect constraints, the domains attribute lists constraints modules as well as vocabulary modules (such as topic or domain modules). The rules for declaring constraints modules with parenthetical expressions in the domains attribute are as follows:

  • Each constraints module included by the document type shell must appear as the rightmost token in one parenthetical expression.
  • The constrained vocabulary module must appear immediately to the left of the included constraints module. Constrained vocabulary modules can include topic modules as well as domain modules.
  • A constrained vocabulary module must not appear as the rightmost token in a parenthetical expression.
  • Where the constraint on a vocabulary module consists of replacement of a base element with its domain extensions in the document type shell, the named constraint doesn't have a corresponding implementing module (see the noBasePhrase example below).
  • The designer can declare compatibility with constraints that aren't included in the document type shell by listing those less restrictive constraints in the parenthetical expression (see the simpleTaskSection and strictTopic examples below).

The root name for a constraint is formed by removing the extension and "Constraints" infix from the module filename. In declaration contexts such as the domains attribute, the "-c" suffix is added. Thus, the shortdescReqConstraints.xsd Schema or shortdescReqConstraints.mod DTD implementation of a constraint has the root name of "shortdescReq" and the declaration name of "shortdescReq-c".
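
The naming rule can be sketched in a few lines. The function name constraint_names is hypothetical, and the sketch assumes the module filename always ends in "Constraints" before its extension, as the rule states.

```python
from pathlib import PurePosixPath


def constraint_names(module_filename):
    """Return (root name, declaration name) for a constraints module file."""
    stem = PurePosixPath(module_filename).stem   # drop the .xsd or .mod extension
    suffix = "Constraints"
    if not stem.endswith(suffix) or stem == suffix:
        raise ValueError(f"not a constraints module filename: {module_filename!r}")
    root = stem[: -len(suffix)]
    return root, root + "-c"


print(constraint_names("shortdescReqConstraints.xsd"))
print(constraint_names("shortdescReqConstraints.mod"))
```

Both calls yield the root name "shortdescReq" and the declaration name "shortdescReq-c".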

Here are some examples of constraints module declarations as qualifications on vocabulary modules:

Constraining element content in a topic vocabulary module
The shortdescReq constraints module redefines the content model of the <topic> element so that the <shortdesc> element is required.

The domains attribute declaration:

(topic shortdescReq-c)
Constraining element content in a domain vocabulary module
The noNestedHighlight constraints module redefines the content models of the highlight elements to prevent self-nesting. For example, the constrained content model for the <b> element replaces the nested <ph> element with the <i>, <u>, <sub>, and <sup> elements.

The domains attribute declaration:

(topic hi-d noNestedHighlight-c)
Integrating a subset of the extension elements from a domain module
The basicHighlight constraints module includes the <b> and <i> elements but not the <u>, <sub>, <sup>, and <tt> elements.

The domains attribute declaration:

(topic hi-d basicHighlight-c)
Applying multiple constraints to a single vocabulary module
The simpleSection constraints module redefines the content models of the <section> and <example> elements to allow a single initial <title> element and to remove text and phrase elements. Because this constraints module redefines different elements than the shortdescReq constraints module, both modules can apply to the topic module. The order in which the constraints modules are listed is not significant.

The domains attribute declaration:

(topic shortdescReq-c)
(topic simpleSection-c)
Constraining and integrating a subset of a domain
Because the noNestedHighlight constraints module redefines content models and the basicHighlight constraints module subsets extension, these constraints don't conflict in attempting to revise the same content model and thus can be combined.

The domains attribute declaration:

(topic hi-d noNestedHighlight-c)
(topic hi-d basicHighlight-c)
A topic with elements replaced by domain extensions
A document type shell replaces the <ph> element with extension elements from the highlighting and programming domains. Because the highlighting and programming domains cannot be generalized to a topic without the <ph> element, the removal constraint must be declared on the topic module with a separate parenthetical expression.

The domains attribute declaration:

(topic noBasePhrase-c)
(topic hi-d)
(topic pr-d)

For another example, the concept document type customizes the content model of <concept> to allow extension of the nested <topic> element only by other concept topics. This restriction could be declared by the domains attribute as follows:

(topic concept nestedConcept-c)
Declaring compatibility of specialized and base constraints
The simpleTaskSection constraints module redefines the content models of the <prereq>, <context>, <result>, and <postreq> elements to remove text and phrase elements. These content models are thus consistent with the content model of the <section> element as constrained by the simpleSection module.

The designer can declare the compatibility of the simpleTaskSection constraints module with the simpleSection constraints module so processors know that instances can be safely generalized to the topic module constrained by simpleSection. The designer bears the responsibility of determining that any instance of the constrained task specializations of <section> is valid for the constrained topic <section>.

The domains attribute declaration:

(topic simpleSection-c task simpleTaskSection-c)

By definition, an instance of task constrained by simpleTaskSection can always generalize to task or topic (the unconstrained vocabulary modules).

Note that the example doesn't imply that task is consistent with topic constrained by simpleSection. A vocabulary module never has a relationship with a constrained version of another vocabulary module. That is, constraints always augment the basic relations between vocabularies.

Declaring compatibility with other constraints on the same module
The strictTopic constraints module redefines the content model for the <topic> element to require the <shortdesc> element and to remove the <related-links> element and nested <topic> elements. Thus, the strictTopic constraints module is more restrictive than the shortdescReq constraints module.

A designer who knows about the shortdescReq constraints module has the option to declare the compatibility of the strictTopic constraints module with the shortdescReq constraints module so processors know that instances can be safely converted to the less restrictive schema.

The domains attribute declaration:

(topic shortdescReq-c strictTopic-c)

Note the difference from the earlier example of shortdescReq combined with simpleSection. Because the shortdescReq constraint isn't declared in the rightmost position, it doesn't constrain topic in this document type shell. Again, the designer would only want to declare this compatibility when needing interchange with a different document type shell that applies the shortdescReq constraint.
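
The declaration rules and examples above suggest a simple parse of the domains attribute. The following sketch is one possible reading, not a normative algorithm: the Expression record and parse_domains function are hypothetical names, a token is assumed to be a constraints module exactly when it ends in "-c", and each constraint is attributed to the nearest vocabulary module to its left. Only a rightmost constraint is treated as applied by the shell; any other constraint in the expression is treated as a compatibility declaration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Expression:
    """One parenthetical expression from a domains attribute (hypothetical)."""
    modules: list                                    # vocabulary modules, general to specific
    applied: Optional[tuple] = None                  # (module, constraint) included by the shell
    compatible: list = field(default_factory=list)   # (module, constraint) declared only


def parse_domains(value):
    """Split a domains attribute value into parsed expressions."""
    expressions = []
    for body in re.findall(r"\(([^()]*)\)", value):
        tokens = body.split()
        modules, declared = [], []
        for token in tokens:
            if token.endswith("-c"):
                if not modules:
                    raise ValueError(
                        f"constraint {token!r} has no vocabulary module to its left")
                declared.append((modules[-1], token))
            else:
                modules.append(token)
        # Only a rightmost constraint is included by the document type shell.
        applied = declared.pop() if tokens and tokens[-1].endswith("-c") else None
        expressions.append(Expression(modules, applied, declared))
    return expressions


for expression in parse_domains(
        "(topic shortdescReq-c strictTopic-c) "
        "(topic simpleSection-c task simpleTaskSection-c) (topic hi-d)"):
    print(expression)
```

Under this reading, the first expression applies strictTopic-c to topic and declares compatibility with shortdescReq-c, the second applies simpleTaskSection-c to task and declares compatibility with simpleSection-c on topic, and the third applies no constraint.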

Draft comment:
An alternative syntax would be to append the constraint to the vocabulary module to indicate that a constraint always qualifies a vocabulary module, as in (topic~shortdescReq) and (topic~shortdescReq~simpleSection).

Conref and generalization processing

To determine compatibility between two document instances, a conref processor can check the domains attribute to confirm that:

  • The referencing document has a superset of the vocabulary modules in the referenced document.
  • For each vocabulary module in the referenced document, the referencing document qualifies the common module with a subset of the constraints in the referenced document.

Some examples:

| Referencing | Referenced | Resolution |
|---|---|---|
| (topic) | (topic shortdescReq-c) | Allowed - content model of referenced topic is more constrained. |
| (topic shortdescReq-c) | (topic) | Prevented - content model of referenced topic is less constrained. |
| (topic hi-d) | (topic hi-d basicHighlight-c) | Allowed - domain extension list of referenced document type shell is more constrained. |
| (topic hi-d basicHighlight-c) | (topic hi-d) | Prevented - domain extension list of referenced document type shell is less constrained. |
| (topic hi-d) | (topic noBasePhrase-c) (topic hi-d) | Allowed - referencing document type shell doesn't replace base element with domain extensions. |
| (topic noBasePhrase-c) (topic hi-d) | (topic hi-d) | Prevented - referencing document type shell does replace base element with domain extensions. |
| (topic task) (topic hi-d basicHighlight-c) | (topic simpleSection-c task simpleTaskSection-c) | Allowed - referencing shell has a subset of the constraints of the referenced shell on the common vocabulary modules. |
| (topic shortdescReq-c task shortdescTaskReq-c) (topic hi-d basicHighlight-c) | (topic simpleSection-c task simpleTaskSection-c) | Prevented - referencing shell has constraints on common vocabulary modules that aren't in the referenced shell. |
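
The two conref checks can be sketched as set comparisons over the parsed domains values. This is a hedged illustration rather than the proposal's algorithm: the function names are hypothetical, tokens ending in "-c" are assumed to be constraints modules, and the sketch assumes that only rightmost (applied) constraints count for the referencing document while both applied and compatibility-declared constraints count for the referenced document.

```python
import re


def constraints_by_module(domains, applied_only):
    """Map each vocabulary module to the constraints declared on it."""
    result = {}
    for body in re.findall(r"\(([^()]*)\)", domains):
        tokens = body.split()
        current = None
        for index, token in enumerate(tokens):
            if not token.endswith("-c"):
                current = token
                result.setdefault(current, set())
            elif current is not None:
                is_applied = index == len(tokens) - 1
                if is_applied or not applied_only:
                    result[current].add(token)
    return result


def conref_allowed(referencing, referenced):
    """Apply the module superset and constraint subset checks."""
    ref_ing = constraints_by_module(referencing, applied_only=True)
    ref_ed = constraints_by_module(referenced, applied_only=False)
    if not set(ref_ed) <= set(ref_ing):
        return False   # referencing document lacks a vocabulary module
    return all(ref_ing[module] <= ref_ed[module] for module in ref_ed)


# The example pairs above, as (referencing, referenced).
EXAMPLES = [
    ("(topic)", "(topic shortdescReq-c)"),
    ("(topic shortdescReq-c)", "(topic)"),
    ("(topic hi-d)", "(topic hi-d basicHighlight-c)"),
    ("(topic hi-d basicHighlight-c)", "(topic hi-d)"),
    ("(topic hi-d)", "(topic noBasePhrase-c) (topic hi-d)"),
    ("(topic noBasePhrase-c) (topic hi-d)", "(topic hi-d)"),
    ("(topic task) (topic hi-d basicHighlight-c)",
     "(topic simpleSection-c task simpleTaskSection-c)"),
    ("(topic shortdescReq-c task shortdescTaskReq-c) (topic hi-d basicHighlight-c)",
     "(topic simpleSection-c task simpleTaskSection-c)"),
]

for referencing, referenced in EXAMPLES:
    verdict = "Allowed" if conref_allowed(referencing, referenced) else "Prevented"
    print(f"{verdict}: {referencing} -> {referenced}")
```

Run against the example pairs, the sketch alternates between Allowed and Prevented in the same order as the resolutions listed above.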

Similarly, to determine compatibility between a document instance and a target document type, a generalization processor can use the domains and class attributes for the document instance and the domains attribute for the target document type to determine how to rename elements in the document instance. For each element instance, the generalization processor:

  • Iterates over the class attribute on the element instance from specific to general, inspecting the vocabulary modules.

  • Looks for the first vocabulary module that is both present in the target document type and that has a subset of the constraints in the document instance.

    If a module is found in the target document type, that module becomes the minimum threshold for the generalization of contained element instances.

    If a module is not found, the document instance cannot be generalized to the target document type and, instead, can only be generalized to a less constrained document type.

Note that a document instance can always be converted from a constrained document type to an unconstrained document type merely by switching the binding of the document instance to the less restricted schema (which would also have a different domains attribute declaration). No renaming of elements is needed to remove constraints.
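
The per-element lookup described in the steps above can be sketched as follows. The sketch handles a single element only and does not propagate the minimum threshold to contained elements. The function names are hypothetical; the class attribute is assumed to hold space-separated module/element pairs ordered from general to specific, tokens ending in "-c" are assumed to be constraints modules, and compatibility declarations in the instance are assumed to count toward the constraints that the instance satisfies.

```python
import re


def constraints_by_module(domains, applied_only):
    """Map each vocabulary module to the constraints declared on it."""
    result = {}
    for body in re.findall(r"\(([^()]*)\)", domains):
        tokens = body.split()
        current = None
        for index, token in enumerate(tokens):
            if not token.endswith("-c"):
                current = token
                result.setdefault(current, set())
            elif current is not None:
                if index == len(tokens) - 1 or not applied_only:
                    result[current].add(token)
    return result


def generalization_target(class_attr, instance_domains, target_domains):
    """Return the (module, element) to generalize to, or None if there is none."""
    instance = constraints_by_module(instance_domains, applied_only=False)
    target = constraints_by_module(target_domains, applied_only=True)
    pairs = [token.split("/", 1) for token in class_attr.split() if "/" in token]
    # Walk the class attribute from the most specific pair to the most general.
    for module, element in reversed(pairs):
        if module in target and target[module] <= instance.get(module, set()):
            return module, element
    return None


instance_domains = "(topic simpleSection-c task simpleTaskSection-c)"
prereq_class = "- topic/section task/prereq "

print(generalization_target(prereq_class, instance_domains, "(topic simpleSection-c)"))
print(generalization_target(prereq_class, instance_domains, "(topic shortdescReq-c)"))
print(generalization_target(prereq_class, instance_domains, "(topic task)"))
```

In the first call the element generalizes to the topic module's section element, because the target's constraint on topic is declared compatible by the instance. In the second call no module qualifies, so the instance could only be generalized to a less constrained document type. In the third call the unconstrained target accepts the element as a task prereq without renaming.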

Draft comment:
An XSLT 2 function library could convert the DITA domains and constraints architectural attribute values to fully parsed XML result structures for easier processing.

Schema Implementation

The basic strategy for implementing constraints in schemas is as follows:

  • Redefine the complex type for an element to restrict its content model or attributes in a reusable constraints module.
  • Define a group with a subset list of extension elements for a domain in a reusable constraints module.
  • In the document type shell, include the content constraints module instead of the vocabulary module that it wraps.
  • Also in the document type shell, include the extension subset constraints module and use that group for domain extension.
  • Also in the document type shell, declare the constraints in the domains attribute.
Figure 1. basicHighlightConstraint.xsd. This module declares restrictions on the highlight domain set:
...
<xs:group name="basicHighlight-c-ph">
  <xs:choice>
    <xs:element ref="b"/>
    <xs:element ref="i"/>
  </xs:choice>
</xs:group>
...
Figure 2. strictTopicConstraint.xsd. This module implements constraints on the <topic> element.
...
<xs:redefine schemaLocation="topicMod.xsd">
  <!-- constrain content and attributes of <topic> element -->
  <xs:complexType name="topic.class">
    <xs:complexContent>
      <xs:restriction base="topic.class">
        <xs:sequence>
          <xs:group ref="title"/>
          <xs:group ref="titlealts" minOccurs="0"/>
          <!-- make required -->
          <xs:choice>
            <xs:group ref="shortdesc" />
            <xs:group ref="abstract" />
          </xs:choice>
          <xs:group ref="prolog" minOccurs="0"/>
          <xs:group ref="body" minOccurs="0"/>
          <!-- remove <related-links> -->
          <xs:group ref="topic-info-types" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  ...
</xs:redefine>
...
Figure 3. strictTopic.xsd (shell). This shell assembles the constraint module instead of the wrapped vocabulary module.
...
<xs:include schemaLocation="basicHighlightConstraint.xsd"/>
...
<xs:redefine schemaLocation="commonElementGrp.xsd">
  <xs:group name="ph">
    <!-- drop base <ph> as well as apply basic subset of highlight domain -->
    <xs:choice>
      <xs:group ref="basicHighlight-c-ph"/>
    </xs:choice>
  </xs:group>
  ...
</xs:redefine>

<xs:redefine schemaLocation="strictTopicConstraint.xsd">
  <xs:complexType name="topic.class">
    <xs:complexContent>
      <xs:extension base="topic.class">
        <!-- declare the constraint of topic and highlight vocabulary modules
             and compatibility of constrained highlight with subset of 
             topic constraints -->
        <xs:attribute name="domains" type="xs:string"
            default="(topic noBasePhrase-c)
                     (topic strictTopic-c)
                     (topic strictTopic-c hi-d basicHighlight-c)"/>
        ...
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  ...
</xs:redefine>
...
Note: Attributes for an element can be constrained as part of the redefinition of the complex type.

DTD Implementation

The basic strategy for implementing constraints in DTDs is as follows:

  • Introduce an entity for the content model and for the attributes of each element (excluding the architectural attributes).
  • Predefine the content model or attributes entities for an element in a reusable constraints module.
  • Predefine the extension list and domains attribute parenthetical expression for a domain in a reusable constraints module.
  • In the document type shell, include the constraints module before the vocabulary module that it overrides by predefinition.
  • Also in the document type shell, include the extension subset constraints module before the domains declaration that it overrides by predefinition.
  • Also in the document type shell, declare the constraints in the domains architectural attribute.
Figure 4. topic.mod. This vocabulary module provides entities for content models as well as elements.
...
<!ENTITY % topic.content  "((%title;), (%titlealts;)?, (%shortdesc;|%abstract;)?,
        (%prolog;)?, (%body;)?, (%related-links;)?, (%topic-info-types;)*)">
<!ENTITY % topic.attributes
            "id          ID                                #REQUIRED
             conref      CDATA                             #IMPLIED
             %select-atts;
             %localization-atts;
             outputclass CDATA                             #IMPLIED">
...
<!ELEMENT topic  %topic.content;>
<!ATTLIST topic  %topic.attributes;>
<!ATTLIST topic
             %arch-atts;
             domains    CDATA                    "&included-domains;">
...
Figure 5. basicHighlightConstraint.ent. This module declares restrictions on the highlight domain set.
<!ENTITY % basicHighlight-c-ph  "b | i">

<!ENTITY   basicHighlight-c-att   "(topic hi-d basicHighlight-c)">
Figure 6. strictTopicConstraint.mod. This module implements constraints on the <topic> element.
<!ENTITY  topic-constraints  "(topic strictTopic-c)">
...
<!ENTITY % topic.content "((%title;), (%titlealts;)?, (%shortdesc;|%abstract;), 
        (%prolog;)?, (%body;)?, (%topic-info-types;)*)">
...
Figure 7. strictTopic.dtd (shell). This shell assembles the constraint module instead of the wrapped vocabulary module.
...
<!ENTITY % basicHighlight-c-dec  SYSTEM "basicHighlightConstraint.ent">
%basicHighlight-c-dec;
...
<!-- drop base <ph> as well as apply the basic subset of highlight domain -->
<!ENTITY % ph "%basicHighlight-c-ph;">
...
<!ENTITY % strictTopic-c-def  SYSTEM  "strictTopicConstraint.mod">
%strictTopic-c-def;
...
<!-- declare the constraint of topic and highlight vocabulary modules and
     compatibility of constrained highlight with subset of topic constraints -->
<!ENTITY included-domains "(topic noBasePhrase-c)
                           (topic strictTopic-c)
                           (topic strictTopic-c hi-d basicHighlight-c)">
...
<!ENTITY % topic-type  SYSTEM  "topic.mod">
%topic-type;
...
Note: Attributes for an element can be constrained by predefining the topic.attributes entity similar to the predefinition of the topic.content entity.

New or Changed Specification Language

Adding a new branch under http://docs.oasis-open.org/dita/v1.1/CD02/archspec/ditaspecialization.html that describes constraints based on this proposal.

Costs

  • Revising the DTD modules to provide the content model and attribute list entities for each element.
  • Possibly refactoring the task module to be more general, creating a constraint that restores the existing content models, and refactoring the existing task document type to apply the constraint.
  • Possibly offering optional constraint modules that simplify <section> and block content models.
  • Revision to the specification.
  • Modifying the generalization and conref processes to check constraints qualifiers in the domains attribute expressions when determining compatibility of document instances.

Benefits

  • More selective use of domains
  • More efficient specialization through separation of restrictions
  • Better authoring experience