DITA Proposed Feature #12008

Manage constraints on vocabularies

Longer description

Problem

DITA adopters have long requested the ability to restrict element content without changing the semantics or processing expectations. As the number of specialized vocabularies grows, so does the need to restrict the choices presented to writers.

Because specialization adds branches to the type hierarchy, specialization is not an optimal solution for this request. Using specialization to implement restriction pads the type hierarchy with types that don't introduce new semantics.

Solution

Allow constraints on the content of elements provided by vocabularies.

Definition: A constraint eliminates some of the possible instances of the vocabularies assembled by a document type without changing the semantics of those vocabularies.

The availability of constraints has benefits for the type hierarchy. Designers can specialize with general elements and loose content models and rely on constraints to provide a more guided authoring experience. Such flexible vocabularies provide a better base for subsequent specialization.

For an earlier version of this proposal, please see:

http://www.oasis-open.org/committees/download.php/14936/Issue34.html

Statement of Requirement

Constraints must allow the following kinds of restrictions:

Restricted content models
In DITA 1.1, an element has the same content model in all document type shells (with the exception of domain extensions).

A constraint should be able to simplify or enforce best practices for the element content. For instance, a constraint should be able to omit optional elements, restrict the range of occurrences for a position, turn a choice with more than one occurrence into a sequence, restrict the values of an attribute, remove an attribute, and so on.

The request to simplify the content model for the <section> element or block elements is common:

http://tech.groups.yahoo.com/group/dita-users/message/258

More formally, a content constraint imposes a restriction allowed under the rules of specialization without changing the semantics of the container element.

Selective domain extension
In DITA 1.1, domain extension introduces all root elements provided by the domain module in all contexts in which their base element appears. This requirement arises because DITA establishes vocabulary compatibility by module. For instance, if differing subsets of the highlight domain were allowed, two document instances might have the same domains declaration where one allowed <b> and <i> and the other allowed <u> (resulting in the need to generalize to <ph> for compatibility).

Also because vocabulary compatibility is established by module, domain extensions cannot replace a base element but can only add alternatives to the base element.

A constraint should be able to extend a base element with only some of the specialized elements provided by domain vocabularies. For example, a constrained domain might extend the <ph> element with the <b> and <i> elements from the highlight domain but not with the <sub>, <sup>, <tt>, or <u> elements.

In addition, a constraint should be able to replace the base element with the specialized domain elements. Effectively, such replacement makes the base element an abstract element in the context.
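
The two behaviors described above (selective extension and replacement of the base element) amount to simple set operations on the elements allowed in a context. The following Python sketch illustrates this under stated assumptions: the function name allowed_in_context and the set-based model are hypothetical and are not part of DITA or of any toolkit.

```python
# Hypothetical model: the names below are illustrative, not part of DITA.
# The highlight domain elements, all of which specialize <ph>.
HIGHLIGHT_DOMAIN = {"b", "i", "u", "tt", "sup", "sub"}


def allowed_in_context(base, domain_elements, removed):
    """Return the elements allowed wherever `base` is allowed.

    `removed` lists the elements dropped by the constraint. It may include
    the base element itself, which makes the base effectively abstract.
    """
    candidates = {base} | set(domain_elements)
    return candidates - set(removed)


# Selective extension: keep <ph> and add only <b> and <i>.
selective = allowed_in_context("ph", HIGHLIGHT_DOMAIN, {"sub", "sup", "tt", "u"})

# Replacement: drop <ph> itself (and <tt>) in favor of the other extensions.
replacement = allowed_in_context("ph", HIGHLIGHT_DOMAIN, {"ph", "tt"})

print(sorted(selective))    # ['b', 'i', 'ph']
print(sorted(replacement))  # ['b', 'i', 'sub', 'sup', 'u']
```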

Here is a sample request for this capability:

http://tech.groups.yahoo.com/group/dita-users/message/524

DITA 1.1 allows document type shells to replace nested <topic> elements without providing a method by which processors can detect such restrictions. To formalize this practice, constraint support for replacement domains should also handle replacement of nested topics.

Beyond these specific requirements, the constraint capability must meet the following more general requirements:

  • Extend DITA management of document type compatibility to constrained and unconstrained vocabularies.
  • To the extent possible, enable processors to recognize compatibility for constraints that were imposed independently.
  • Use architectural attributes for all declarations.
  • Provide for backward compatibility.

Use Cases

Removal of block nesting
A DITA adopter supports a set of developers as authors for software reference content. The developers have no experience with structured authoring. The adopter wants to enable as simple an authoring experience as possible yet produce documents that have precise structure and semantics for processability and reuse.

The adopter performs the following actions:

  1. Defines a new document type shell that plugs in the reference topic type and software domain.
  2. Constrains the generic <ph> element to be replaced by its <filepath>, <userinput>, and <systemoutput> specializations so the <ph> element is no longer visible.
  3. Constrains the generic <keyword> element to be replaced by its <cmdname> and <varname> specializations so the <keyword> element is no longer available.
  4. Removes block elements like <ol> and <table> from the content models of text container elements like <lq> and <p>.
  5. Removes text and phrase elements from the content models of structural elements like <li> and <section>.

The adopter could constrain different document type shells for other authoring populations and write common processing against the unconstrained document type.

A required short description
A DITA adopter wants to enforce the best practice that all topics have a short description. The adopter constrains the content models of <topic> and its specializations to make short description mandatory.
Note: Similarly, the DITA 1.2 request for a more general task could be met by relaxing the content model of the existing <task> element and refactoring the existing task document type with a single constraint on the <taskbody> element that enforces the existing validation.

Similarly, if blocks were relaxed to allow self nesting to enable specialization, constraints could preserve that restriction in the existing document type shells.

Scope

This proposal has the following impacts:

  • Refactors the existing DTD vocabulary modules.
  • Requires enhancements to existing conref and generalization processors.

Technical Requirements

Constraint types

Constraints can be applied:

  • To the content model of any element provided by a vocabulary module by creating a reusable constraint module that wraps the vocabulary module and is referenced by the document type shell instead of the vocabulary module.

    An element content model can be constrained only by one constraint module. However, different constraint modules can constrain different elements from the same vocabulary module. These wrappers can be nested in an XSD implementation or sequenced in a DTD implementation.

  • To the integration of domain extensions by creating constraints in the document shell.

Content processing

A document type with constraints allows a subset of the possible instances of a document type for the same vocabularies without constraints. To put it another way, all instances of the constrained document type are guaranteed to be valid instances of the unconstrained document type.

As a result, a constraint doesn't change basic or inherited content processing. The constrained instances remain valid instances of the element type, and the element retains the same semantics and class attribute declaration. In other words, a constraint never creates a new case for content processing.

For instance, a document type constrained to require the <shortdesc> element allows a subset of the possible instances of the unconstrained document type with an optional <shortdesc> element. Thus, the content processing for topic still works when topic is constrained to require a short description.
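
The subset relationship can be checked mechanically for small cases. The following Python sketch is an illustration only: it rewrites the <topic> content models used in this proposal as regular expressions over child element names and confirms, by brute force over short sequences, that every instance accepted by the constrained model is also accepted by the unconstrained model. The regular-expression encoding, the helper names, and the use of a nested <topic> to stand in for the topic-info-types entity are assumptions made for the example.

```python
import itertools
import re

# Content models for <topic>, rewritten as regular expressions over a
# sequence of child element names, each followed by one space.
# Assumption: a nested <topic> stands in for the topic-info-types entity.
UNCONSTRAINED = re.compile(
    r"title (titlealts )?((shortdesc|abstract) )?(prolog )?(body )?"
    r"(related-links )?(topic )*"
)
# Constrained: short description (or abstract) required, <related-links> removed.
CONSTRAINED = re.compile(
    r"title (titlealts )?(shortdesc|abstract) (prolog )?(body )?(topic )*"
)

NAMES = ["title", "titlealts", "shortdesc", "abstract",
         "prolog", "body", "related-links", "topic"]


def is_valid(model, children):
    """Return True if the sequence of child names matches the content model."""
    text = "".join(name + " " for name in children)
    return model.fullmatch(text) is not None


def sequences(max_len):
    """Yield every sequence of child names with at most max_len children."""
    for length in range(max_len + 1):
        yield from itertools.product(NAMES, repeat=length)


# Every instance accepted by the constrained model (up to 4 children)
# must also be accepted by the unconstrained model.
subset_holds = all(
    is_valid(UNCONSTRAINED, seq)
    for seq in sequences(4)
    if is_valid(CONSTRAINED, seq)
)

print(subset_holds)                                           # True
print(is_valid(UNCONSTRAINED, ("title", "body")))             # True
print(is_valid(CONSTRAINED, ("title", "body")))               # False
print(is_valid(CONSTRAINED, ("title", "shortdesc", "body")))  # True
```

The reverse does not hold: a topic with only a title and a body is valid against the unconstrained model but not against the constrained one, which is why constraints must be declared for compatibility checking.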

Content interoperability

Currently, a DITA document instance declares (by means of the domains attribute and the class attribute for the top element) the vocabularies available in its document type. A processor can examine these declarations to determine whether a document instance uses a subset of the vocabularies in another DITA document type and thus is compatible with that document type.

A constrained document type allows only a subset of the possible instances of the unconstrained document type. Thus, for a processor to determine whether a document instance is compatible with another document type, the document instance must declare any constraints on the document type.

For instance, an unconstrained task is compatible with an unconstrained topic because the task can be generalized to topic. If, however, the topic is constrained to require the <shortdesc> element, a document type with an unconstrained task is not compatible with the constrained document type because some instances of the task might not have a <shortdesc> element. If, however, the task document type has also been constrained to require the <shortdesc> element, it is compatible with the constrained topic document type.

As a result:

  • Any document instance with constraints can be generalized to an instance of a different document type that both provides a subset of the vocabularies and imposes a subset of the constraints.
  • Any content from a document whose document type has constraints can be conreffed into a document with a subset of the constraints where the destination has a superset of the vocabularies of the source (or where the conref processor generalizes).

Thus, these operations require the declaration of constraints:

A generalization processor
Must be able to compare the constraints on the elements used in a document instance with the constraints on elements of a target document type (including base elements) to verify that the target document type is less restricted than the document instance.

No renaming of elements is needed to remove constraints.

A conref processor
Must be able to compare constraints on the referencing and referenced elements (including base elements). The processor must be able to verify that the elements in the referenced fragment are more restricted than the referencing document instance.
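
The generalization and conref rules above reduce to subset comparisons once a document type is summarized by its vocabularies and its constraints. The following Python sketch shows one way to express them. It is a deliberately coarse model: the DocType class and both function names are hypothetical, and a constraint is identified only by name rather than by the element it constrains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocType:
    """Hypothetical summary of a document type for compatibility checks."""
    vocabularies: frozenset  # vocabulary modules provided
    constraints: frozenset   # names of the constraints imposed


def can_generalize(source, target):
    """Target provides a subset of the vocabularies and imposes a subset
    of the constraints, so it is less restricted than the source."""
    return (target.vocabularies <= source.vocabularies
            and target.constraints <= source.constraints)


def can_conref(source, destination):
    """Destination imposes a subset of the constraints and provides a
    superset of the vocabularies of the source (no generalization step)."""
    return (destination.constraints <= source.constraints
            and source.vocabularies <= destination.vocabularies)


task = DocType(frozenset({"topic", "task"}), frozenset())
topic = DocType(frozenset({"topic"}), frozenset())
strict_topic = DocType(frozenset({"topic"}), frozenset({"shortdescReq"}))
strict_task = DocType(frozenset({"topic", "task"}), frozenset({"shortdescReq"}))

print(can_generalize(task, topic))                # True
print(can_generalize(task, strict_topic))         # False: shortdesc not guaranteed
print(can_generalize(strict_task, strict_topic))  # True
print(can_conref(strict_topic, task))             # True
print(can_conref(topic, strict_task))             # False
```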

Declaring constraints

The DITA architecture adds a new constraints attribute that declares the constraints applied to the content of elements so that processors can check compatibility. The constraints attribute provides a list of parenthetical expressions, each of which declares constraints on either an element's content model or the global availability of an element.

Figure 1. Formal syntax
1  constraint-declaration ::= '(' S? (content-constraint | domain-constraint) S? ')'
2  content-constraint     ::= vocabularyModule '/' element S '=' (S constraintName)+
3  domain-constraint      ::= '*' S? '=' S '-' (S element)+
  • 2. The element is a container whose content model or attribute list is restricted to be consistent with the declared constraints. S represents the standard whitespace characters specified by http://www.w3.org/TR/REC-xml/#NT-S. The first constraint name identifies the module implementing the constraint. Other constraint names indicate constraints on the same element that are known to be less restrictive (and thus compatible).
  • 3. The list of elements indicates elements (either base elements or possible domain extensions) removed from the document type in favor of some domain extensions.

Here is an example:

constraints="
    (*             = - ph tt)
    (topic/topic   = simpleTopic shortdescReq noRelLink)
    (topic/section = simpleTopic)
    (topic/example = simpleTopic)
    (task/task     = simpleTask simpleTopic)
    (task/prereq   = simpleTask simpleTopic)
    (task/context  = simpleTask simpleTopic)
    (task/result   = simpleTask simpleTopic)
    (task/postreq  = simpleTask simpleTopic)"

The previous constraints attribute declared the following constraints:

  • The <ph> element is replaced by its domain extensions.
  • The <tt> element is not provided as part of the highlight domain extensions.
  • The content models of the <topic>, <section>, and <example> elements are constrained by the simpleTopic constraint module.
  • The constraints applied to the <topic> element are at least as restrictive as the constraints applied to the <topic> element by the shortdescReq and noRelLink constraints.
  • The content models of the <task>, <prereq>, <context>, <result>, and <postreq> elements are constrained by the simpleTask constraint module.
  • The constraints applied to these elements are at least as restrictive as the constraints applied to base elements by the simpleTopic constraint.
Note: Each constraints module may constrain elements from only one vocabulary module. In the example, the simpleTopic constraint module restricts elements from the topic vocabulary module. The simpleTask constraint module restricts elements from the task vocabulary module.
Note: The content model and attributes of one element can be constrained only by one module. Different constraints modules, however, can constrain different elements from the same vocabulary module (either by nesting constraint redefines in Schema or by sequencing constraint inclusion in DTDs).
Note: In two circumstances, a designer should declare the constraints on an element to be at least as restrictive as other constraints:
  1. When enforcing more restrictions on an element than an existing constraints module. In the example, the simpleTopic constraint on the <topic> element enforces more restrictions on <topic> than the shortdescReq constraint (guaranteeing that conref will be valid).
  2. When constraining a specialized element to be compatible with the base element. In the example, the simpleTask constraint on the <task> element is declared compatible with the simpleTopic constraint.

In the same way that the designer bears the responsibility of implementing a specialized content model consistent with its base module, the designer also bears the responsibility of implementing a constrained content model consistent with a less constrained content model. In the example, if the shortdescReq constraints module becomes more restrictive than the simpleTopic constraints module, the maintainer of the simpleTopic constraints module must either enforce the same constraints or remove the declaration.
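
A processor needs the constraints attribute in a structured form before it can compare declarations. The following Python sketch parses a value written in the syntax of Figure 1, reading the alternation between content and domain constraints as grouped inside the parentheses. The function name parse_constraints and the shape of its result are assumptions made for illustration. Text outside the parenthetical expressions is ignored in this sketch.

```python
import re

# One parenthetical expression: '(' ... ')'
DECLARATION = re.compile(r"\(([^()]*)\)")
# content-constraint: vocabularyModule '/' element S '=' (S constraintName)+
CONTENT = re.compile(r"([^\s/=]+)/([^\s/=]+)\s+=((?:\s+[^\s=]+)+)")
# domain-constraint: '*' S? '=' S '-' (S element)+
DOMAIN = re.compile(r"\*\s*=\s+-((?:\s+[^\s=]+)+)")


def parse_constraints(value):
    """Parse a constraints attribute value.

    Returns (content, removed) where `content` maps (module, element) to
    the list of constraint names (implementing module first) and `removed`
    lists the elements removed from the document type.
    """
    content = {}
    removed = []
    for match in DECLARATION.finditer(value):
        body = match.group(1).strip()
        found = CONTENT.fullmatch(body)
        if found:
            content[(found.group(1), found.group(2))] = found.group(3).split()
            continue
        found = DOMAIN.fullmatch(body)
        if found:
            removed.extend(found.group(1).split())
            continue
        raise ValueError(f"unrecognized constraint declaration: ({body})")
    return content, removed


value = """
    (*             = - ph tt)
    (topic/topic   = simpleTopic shortdescReq noRelLink)
    (topic/section = simpleTopic)
"""
content, removed = parse_constraints(value)
print(removed)                        # ['ph', 'tt']
print(content[("topic", "topic")])    # ['simpleTopic', 'shortdescReq', 'noRelLink']
print(content[("topic", "section")])  # ['simpleTopic']
```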

Draft comment:
Is it acceptable to require processors to look up the topic/topic constraint compatibility? Or should task/task declare the shortdescReq and noRelLink constraints?

The following example declares that the base topic element has been replaced by a specialized topic element. The existing concept, reference, and task document types can add this declaration so conref processors can detect problems with conref to a task containing a base <topic> element (for instance, from a ditabase document instance):

constraints="(* = - topic)"

Thus, to determine compatibility, the conref processor can check the constraints attribute to confirm:

  1. That the content models and attributes in the referencing topic or map are no more restrictive than in the referenced topic or map.
  2. That the elements removed in the referencing topic or map are a subset of those removed in the referenced topic or map.

A more sophisticated conref processor can perform a fine-grained check of the constraints on the referencing placeholder and the referenced content fragment.
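
The two document-level checks listed above can be sketched in Python as follows. This is a minimal illustration, not a definitive algorithm. It assumes that each constraints attribute has already been parsed into a dictionary (the 'content' and 'removed' keys are hypothetical), that compatibility is decided purely by matching constraint names, and that compatibility declared through base elements is out of scope.

```python
def conref_compatible(referencing, referenced):
    """Check whether content from `referenced` may be pulled into `referencing`.

    Each argument is a dict (hypothetical structure) with:
      'content': {"module/element": [constraint names, implementing first]}
      'removed': set of element names removed from the document type
    """
    # Check 1: each constraint implemented in the referencing document must
    # be one that the referenced document declares for the same element,
    # so the referencing side is no more restrictive.
    for element, names in referencing["content"].items():
        implementing = names[0]
        if implementing not in referenced["content"].get(element, []):
            return False
    # Check 2: elements removed in the referencing document must be a
    # subset of those removed in the referenced document.
    return referencing["removed"] <= referenced["removed"]


loose = {
    "content": {"topic/topic": ["shortdescReq"]},
    "removed": {"tt"},
}
strict = {
    "content": {"topic/topic": ["simpleTopic", "shortdescReq", "noRelLink"]},
    "removed": {"ph", "tt"},
}

print(conref_compatible(loose, strict))  # True: strict content fits loose rules
print(conref_compatible(strict, loose))  # False: loose content may break simpleTopic
```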

The generalization transform preserves the constraints attribute during roundtripping respecialization by normalizing the attribute. A document instance can be converted from a constrained document type to a document type with a subset of the constraints merely by switching the binding of the document instance to the less restricted schema (which would also have a different constraints attribute declaration).

Similar to domains, the rootname for a constraints module has a required "Constraint" suffix that doesn't appear in references to the constraints module. For instance, "simpleTopic" is the qualifier corresponding to the simpleTopicConstraint.xsd Schema module and the simpleTopicConstraint.mod DTD module.

Note: The content-model constraints cannot be declared on the element itself because a conref processor must be able to inspect constraints on the content model even if an instance of the element isn't present in the referencing document.
Draft comment:
An XSLT 2 function library could convert the DITA domains and constraints architectural attribute values to fully parsed XML result structures for easier processing.

Schema Implementation

The basic strategy for implementing constraints in schemas is as follows:

  • Redefine the complex type for an element in a reusable constraints module.
  • In the document type shell, include the constraints module instead of the vocabulary module that it wraps.
  • Also in the document type shell, omit elements from the groups for domain extension.
  • Also in the document type shell, declare the constraints in architectural attributes in the schema shell (similar to the declaration of the domains attribute).
Figure 2. highlightNott.xsd. This module declares restrictions on the highlight domain set:
...
<xs:group name="hi-d-ph-nott">
  <xs:choice>
    <xs:element ref="b"/>
    <xs:element ref="i"/>
    <xs:element ref="sub"/>
    <xs:element ref="sup"/>
    <xs:element ref="u"/>
  </xs:choice>
</xs:group>
...
Figure 3. simpleTopicConstraint.xsd. This module implements constraints on the <topic> element.
...
<xs:redefine schemaLocation="topicMod.xsd">
  <!-- constrain content and attributes of <topic> element -->
  <xs:complexType name="topic.class">
    <xs:complexContent>
      <xs:restriction base="topic.class">
        <xs:sequence>
          <xs:group ref="title"/>
          <xs:group ref="titlealts" minOccurs="0"/>
          <!-- make required -->
          <xs:choice>
            <xs:group ref="shortdesc" />
            <xs:group ref="abstract" />
          </xs:choice>
          <xs:group ref="prolog" minOccurs="0"/>
          <xs:group ref="body" minOccurs="0"/>
          <!-- remove <related-links> -->
          <xs:group ref="topic-info-types" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
  ...
</xs:redefine>
...
Figure 4. simpleTopic.xsd (shell). This shell assembles the constraint module instead of the wrapped vocabulary module.
...
<xs:include schemaLocation="highlightNott.xsd"/>
...
<xs:redefine schemaLocation="commonElementGrp.xsd">
  <xs:group name="ph">
    <!-- drop base <ph> as well as apply nott subset of highlight domain -->
    <xs:choice>
      <xs:group ref="hi-d-ph-nott"/>
    </xs:choice>
  </xs:group>
  ...
</xs:redefine>

<xs:redefine schemaLocation="simpleTopicConstraint.xsd">
  <xs:complexType name="topic.class">
    <xs:complexContent>
      <xs:extension base="topic.class">
        <xs:attribute name="constraints" type="xs:string" default="
            (*           = - ph tt)
            (topic/topic = simpleTopic shortdescReq noRelLink)"/>
        ...
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  ...
</xs:redefine>
...
Note: Constraints implemented on a base element cannot be reused directly but, instead, must be implemented separately in a consistent way. This limitation is similar to the requirement to implement a consistent content model for a specialization rather than reuse the base model directly.
Note: Attributes for an element can be constrained as part of the redefinition of the complex type.
Draft comment:
An XSLT could generate the constraints attribute from the schema or check the constraints attribute against the schema.

DTD Implementation

The basic strategy for implementing constraints in DTDs is as follows:

  • Introduce an entity for the content model and for the attributes of each element (excluding housekeeping attributes).
  • Predefine the content model entity or attributes entity for an element in a reusable constraints module.
  • In the document type shell, include the constraints module before the vocabulary module that it overrides by predefinition.
  • Also in the document type shell, omit elements from the entities for domain extension.
  • Also in the document type shell, declare the constraints in architectural attributes in the DTD shell (similar to the declaration of the domains attribute).
Figure 5. topic.mod. This vocabulary module provides entities for content models as well as elements.
...
<!ENTITY % topic.content  "((%title;), (%titlealts;)?, (%shortdesc;|%abstract;)?,
        (%prolog;)?, (%body;)?, (%related-links;)?, (%topic-info-types;)*)">
<!ENTITY % topic.attributes
            "id          ID                                #REQUIRED
             conref      CDATA                             #IMPLIED
             %select-atts;
             %localization-atts;
             outputclass CDATA                             #IMPLIED">
...
<!ELEMENT topic  %topic.content;>
<!ATTLIST topic  %topic.attributes;>
<!ATTLIST topic
             %arch-atts;
             domains    CDATA                    "&included-domains;">
...
Figure 6. highlightNott.ent. This module declares restrictions on the highlight domain set.
<!ENTITY % hi-d-ph-nott  "b | u | i | sup | sub">

<!-- elements removed by all subset entities -->
<!ENTITY   hi-d-nott  "tt">
Figure 7. simpleTopicConstraint.mod. This module implements constraints on the <topic> element.
<!ENTITY  topic-constraints  "(topic/topic = simpleTopic shortdescReq noRelLink)">
...
<!ENTITY % topic.content "((%title;), (%titlealts;)?, (%shortdesc;|%abstract;), 
        (%prolog;)?, (%body;)?, (%topic-info-types;)*)">
...
Figure 8. simpleTopic.dtd (shell). This shell includes the constraint module before the vocabulary module that it overrides.
...
<!ENTITY % hi-d-nott  SYSTEM "highlightNott.ent">
%hi-d-nott;
...
<!-- drop base <ph> as well as apply nott subset of highlight domain -->
<!ENTITY % ph "%hi-d-ph-nott;">
...
<!ENTITY % simpleTopic-constraint  SYSTEM  "simpleTopicConstraint.mod">
%simpleTopic-constraint;
...
<!ENTITY included-constraints "(* = - ph &hi-d-nott;)
                               &topic-constraints;">
...
<!ENTITY % topic-type  SYSTEM  "topic.mod">
%topic-type;
...
Note: Attributes for an element can be constrained by predefining the topic.attributes entity similar to the predefinition of the topic.content entity.

New or Changed Specification Language

Adding a new branch under http://docs.oasis-open.org/dita/v1.1/CD02/archspec/ditaspecialization.html that describes constraints based on this proposal.

Costs

  • Revising the DTD modules to provide the content model and attribute list entities for each element.
  • Adding the domain constraint for base topic to the concept, reference, and task document types.
  • Refactoring the task module to be more general, creating a constraint that restores the existing constraints, and refactoring the existing task document type to use the constraint.
  • Possibly offering optional constraint modules that simplify <section> and block content models.
  • Revision to the specification.
  • Modifying the generalization and conref processes to check the constraints attribute when determining compatibility of document instances.

Benefits

  • More selective use of domains
  • More efficient specialization through separation of restrictions
  • Better authoring experience

Deferred requirements

Requirements

Contextual domains
In DITA 1.2, constraints on domain extension apply to all contexts in which their base element appears.

It would be quite useful for design flexibility to restrict a domain extension to a subset of the contexts where the base element can appear. For instance, because the <data> element appears in so many contexts, domain extension is essentially unusable for the <data> element. Yet, taken in isolation, the <data> element should provide a useful base for a plethora of reusable elements.

Here is a sample request for this capability:

http://www.oasis-open.org/apps/org/workgroup/dita/email/archives/200504/msg00011.html

Because DITA 1.1 treats global attributes as domains, attributes that are specific to a set of elements can be treated as a special case of constraining a domain to a subset of the possible contexts for extension.

Implementing context-specific XSD groups or DTD entities, however, would impose an unsupportable burden on the designer.

Use cases

Hybrid documents with discourse and business data
A DITA adopter needs to create hybrid documents that combine discourse and business data as part of contracts, analysis reports, and so on.

To represent the business data, the adopter would like to make use of the UN/CEFACT CCTS (Core Component Technical Specification), which provides a standard vocabulary (adopted by UBL and OAGIS) for business data starting with atomic data like price and quantity and aggregating composite data structures like order contract and product item.

http://xml.coverpages.org/ni2007-04-20-a.html#excerpts

The adopter performs the following actions:

  1. Specializes the <data> element to define elements for both the atomic and aggregate CCTS data.
  2. Packages the CCTS atomic and aggregate data elements in domains.
  3. Plugs the CCTS data domains into the shells for hybrid documents, using constraints so the CCTS elements appear only in the <data> contexts at the top of the appropriate sections (instead of in all <data> contexts).

Style policies
A DITA toolkit extender wants to create a method for easier styling without weakening the salutary separation of presentation and content. In this approach (which has been presented by John Hunt), the extender supports creation of documents that match content by example and specify the style properties of the matched content. The extender makes use of a restricted <data> specialization so adopters can:
  1. Specialize the <data> element to define appropriate style properties for block, phrase, and other contexts.
  2. Package the style properties as a domain.
  3. Plug the style domain into a document type shell that also includes the topic and domain vocabularies for a content document type, using constraints so that the block style properties appear only in block contexts and the phrase style properties appear only in phrase contexts (instead of all properties appearing in all <data> contexts).
  4. Process the example to apply its styles to the corresponding content during formatting.