DITA Proposed Feature 34

Constraints - an architectural enhancement for restriction without specialization (was replacement domains).

Longer description

The problem: Adopters have raised many requirements to restrict element content models or attributes without changing the semantics or processing of an element.

To meet these requirements through specialization, however, adopters would have to specialize every DITA topic type. Worse, they would no longer have ancestry relationships between these specialized topic types. For instance, they wouldn't be able to generalize a specialized task to a specialized topic.

The solution:

Enhance the DITA architecture with the notion of a constraint and the ability to declare, implement, and manage constraints on existing elements.

Definition: A constraint defines a set of restrictions that conform to the rules of specialization.

As in specialization, a constraint can modify a content model to remove an optional element, require an optional element, substitute a specialized element for a base element, and so on. A constraint can also modify attributes, enumerating the values of a text attribute, removing or requiring an optional attribute, and so on.

Because a constraint is always a restriction, an instance of an element modified by a constraint is, by definition, a valid instance of the same element without the constraint. For instance, the instances of a <section> element that contains only block elements are also valid for an unconstrained <section> element.

The following posts indicate requirements that could be met through this enhancement:

Scope

Major – architectural.

Use Case

Here are some specific examples of potential uses of constraints:

Simplifying content models for users
For instance, restricting the <section> element to block content. Also, restricting block elements to phrase and inline content.
Enforcing best practices
For instance, requiring the <shortdesc> element in all topics.
Specifying replacement domains
For instance, substituting the specialized programming, software, and UI domain phrases for the generic <ph> element in all contexts – instead of offering the specialized domain phrases as alternatives to the generic <ph> element.
Specifying contextual domains
For instance, adding a <legalcaveat> element as an alternative to the <ph> element but only within the context of <note> elements. Also, in maps, adding a special kind of topic reference for navigations only at the top level within the <map> element instead of generally in all <topicref> contexts.
Setting enumerations for metadata attributes
For instance, restricting the audience attribute to "administrator," "programmer," and "user" in all elements.
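The enumeration use case can be illustrated outside the validation grammar. The following is a minimal Python sketch of a custom check, not part of the proposal itself: it assumes the audience attribute holds space-separated values, and the function name, the sample topic, and the "manager" value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Enumeration taken from the use case above.
ALLOWED_AUDIENCES = {"administrator", "programmer", "user"}


def audience_violations(xml_text, allowed=ALLOWED_AUDIENCES):
    """Return (tag, value) pairs for audience values outside the enumeration."""
    root = ET.fromstring(xml_text)
    violations = []
    for element in root.iter():
        # Assumption: the attribute is a space-separated list of values.
        for value in element.get("audience", "").split():
            if value not in allowed:
                violations.append((element.tag, value))
    return violations


# Hypothetical sample content; only the last <p> breaks the enumeration.
sample = """
<topic id="t1" audience="administrator">
  <title>Backup</title>
  <body>
    <p audience="programmer user">Call the API.</p>
    <p audience="manager">Approve the budget.</p>
  </body>
</topic>
"""

print(audience_violations(sample))
```

In a real document type the enumeration would be enforced by the DTD or schema; a script like this is only useful for checking unconstrained content before it is moved into a constrained document type.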

Technical Requirements

As with specialization, constraints require discipline from designers. That is, in the same way that designers are responsible for restricted definitions when specializing an element, designers must restrict definitions when modifying an existing element to implement a constraint.

Designers declare constraints with the following architectural attributes (which should rarely if ever be seen by writers).

constraints
Provided on every element with a defaulted value that lists the constraints (separated by spaces) that have been implemented by the element definition. If there are no constraints, the value is the empty string.
constraint-scope
Provided on the outer element for the content object (that is, on the topic or map) to list all of the constraints implemented in the document type and, in addition, the list of elements on which those constraints were defined originally.

The following example shows an instance of a document type with normalized architectural attributes identifying a set of three constraints, of which two apply to the <reference> element and one to the <refsyn> element.

<reference ...
        audience="administrator"
        constraints=" db-audience-enum topic-audience-req "
        constraint-scope="db-audience-enum( topic/topic ... )
            topic-audience-req( topic/topic )
            refsyn-title-req( reference/refsyn )">

The example illustrates some of the salient features of constraints:

A single element can implement multiple constraints
In the example, the <reference> element requires the audience attribute and requires that the value of the audience attribute come from a list of database administrators.

For another example, the <section> element could implement separate constraints to require a <title> and require only block elements as content.

A constraint can be implemented on multiple elements in multiple modules
In the example, to conform to the "db-audience-enum" constraint, every element with an audience attribute must restrict the audience value to the list of database administrators.

For another example, a <topic> and <task> might both implement a constraint for a required <shortdesc>.

Note: Whether constraints can be shared across elements depends entirely on the restrictions imposed when specializing those elements. For instance, a constraint requiring titles on <section> elements could be shared across <topic>, <concept>, and <reference> but not <task> because task has <section> specializations like <prereq> that cannot take a title. Thus, as currently defined, <prereq> cannot conform to that constraint.

If a specialization conforms to an existing constraint, the specialization can declare that conformance.

A constraint on a base element must be implemented on its specializations
Like specialization, a constraint defines an unambiguous contract. By declaring a constraint, a document type is obligated to fulfill the defined contract. This rule ensures that content is generalizable from specialized elements to base elements if both conform to the same constraints.

In the example, the constraint requiring a topic audience was originally defined for the <topic> element. Thus, for the document type to conform to the same constraint, the <reference> element must implement the same constraint.

For another example, let's say a constraint is defined on <section> to require block elements as content. If a task document type declares the "section-block-req" constraint, it must restrict the <prereq>, <context>, <result>, and <postreq> specializations of <section>. Otherwise, the task document type provides an incomplete implementation of the declared constraint.

A validation routine could check constraints attributes against the constraint-scope attribute to confirm that every specialized element within the scope of the declared constraints also declares the constraints. This validation wouldn't validate the constraint definitions but would catch some potential design omissions.

For clarity, designers might want to adopt a convention of prefixing a constraint with the name of the element group or element on which the constraint is defined.
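The validation routine described above could be sketched as follows. This is a rough Python illustration under stated assumptions: the constraint-scope value is read as a series of name( module/element ... ) entries, an element is treated as "within scope" when its class attribute contains one of the listed module/element tokens, and the function names and sample document are hypothetical. The sample is a simplified instance, not the example shown earlier.

```python
import re
import xml.etree.ElementTree as ET

# Assumed syntax of one constraint-scope entry: name( module/element ... )
SCOPE_ENTRY = re.compile(r"([\w.-]+)\(\s*([^)]*)\)")


def parse_scope(value):
    """Map each constraint name to the module/element tokens it was defined on."""
    return {name: set(targets.split()) for name, targets in SCOPE_ENTRY.findall(value)}


def missing_declarations(xml_text):
    """List (tag, constraint) pairs where an in-scope element omits a declaration."""
    root = ET.fromstring(xml_text)
    scope = parse_scope(root.get("constraint-scope", ""))
    problems = []
    for element in root.iter():
        ancestry = set(element.get("class", "").split())
        declared = set(element.get("constraints", "").split())
        for name, targets in sorted(scope.items()):
            # In scope: the element is, or specializes, an element named in the scope.
            if ancestry & targets and name not in declared:
                problems.append((element.tag, name))
    return problems


# Hypothetical instance: <refsyn> forgets to declare refsyn-title-req.
sample = """
<reference class="- topic/topic reference/reference "
    audience="administrator"
    constraints="topic-audience-req"
    constraint-scope="topic-audience-req( topic/topic ) refsyn-title-req( reference/refsyn )">
  <title class="- topic/title ">Commands</title>
  <refbody class="- topic/body reference/refbody ">
    <refsyn class="- topic/section reference/refsyn " constraints="">
      <title class="- topic/title ">Syntax</title>
    </refsyn>
  </refbody>
</reference>
"""

print(missing_declarations(sample))
```

As noted above, a check like this only compares declarations; it does not confirm that the content models actually implement the constraints.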

Constraints have the following ramifications for processing:

Basic processing
Because all constrained instances are valid instances of the unconstrained element with the same semantics and the same class attribute value, the processing for an unconstrained element applies to the constrained element. For instance, the processing for a <section> element can operate on the instances of a block-only <section>.
Inherited processing (aka fallback processing)
For the same reason, inherited processing can operate on the constrained instances. For instance, the processing for a <section> element can operate on the instances of a block-only <prereq> element.
Conref
A conref operation is valid where the constraints on the content source are a superset of the constraints on the content destination. For instance, conref from a block-only task document type to an unconstrained task is guaranteed valid.

A more fine-grained conref validator can check the actual elements within the content reference fragment to determine whether the elements are constrained.

Generalization
The operation of actually renaming elements requires only the class attribute. Thus, constraints have no effect on the renaming operation itself. Constraints do, however, affect whether the generalized instances will be valid for a target document type.

Instances of a constrained element can (of course) be treated as instances of the unconstrained element and thus generalized to any valid generalization target for the unconstrained element. For instance, a block-only <prereq> can revert to an unconstrained <prereq> and thus also to an unconstrained <section>.

Because both specialization and constraints work by restriction, any constraint that's possible on a specialized element can also be implemented on its specialization ancestors. For instance, if <prereq> is restricted to blocks, by definition its ancestor <section> can be restricted to blocks. Thus, generalization can preserve constraints if a document type implementing the constraint is available. As a result, organizations that implement constraints on all of their document types can revert instances to base types while maintaining those constraints across their information inventory.

Unconstrained content (whether specialized or not) cannot be used with a constrained document type (at least, not without manual verification that the unconstrained content happens to conform to the constraint). For instance, an unconstrained <prereq> cannot be generalized to a document type with block-only <section> elements until the instance has been checked (either manually or by a custom process) for phrase and text content.
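The superset rule stated for conref, and the matching rule for generalization targets, reduce to a set comparison. The sketch below is one possible reading in Python, with a hypothetical function name; it treats the constraints of a document type as a plain set of names and ignores the finer-grained, element-level check mentioned above.

```python
def is_guaranteed_valid(source_constraints, target_constraints):
    """True when the source's constraints are a superset of the target's.

    Content that satisfies every constraint of the target is then guaranteed
    to be valid there; otherwise the content needs a manual or custom check.
    """
    return set(target_constraints) <= set(source_constraints)


block_only_task = {"section-block-req"}
unconstrained_task = set()

# Block-only content moved into an unconstrained task: guaranteed valid.
print(is_guaranteed_valid(block_only_task, unconstrained_task))

# Unconstrained content moved into a block-only task: must be checked first.
print(is_guaranteed_valid(unconstrained_task, block_only_task))
```

A result of False does not mean the content is invalid, only that validity cannot be inferred from the declarations alone.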

To preserve pluggability, the architecture must provide a mechanism for implementing a set of constraints in a module of the validation grammar. That way, adopters can easily share the implementation of constraints on elements and can easily assemble document types with and without the constraints.

A best practice for organizations might be to provide an unconstrained equivalent for every constrained document type and to implement processing to handle all instances of the unconstrained document type. This approach will maximize flexibility for interoperability.

Note: A future version of the DITA architecture might establish base relationships between constraints. For instance, the section-block-req constraint might provide the base for a constraint that replaces the generic paragraph with semantic specializations of paragraph within a <section> element. Such relationships would enable more generalization flexibility between constraints.

Costs

This change is backward compatible for instances.

DTD impact: Extensive. The DITA DTDs must be refactored so that each element defines its content model in one entity and its non-architectural attribute list in another. In addition, each topic module must provide an entity file that declares its element names (as with domains). The module implementing a constraint can be included after the element entity declarations but before the default definitions of content models. While these changes would conform to common practices for DTDs, refactoring the DTDs while preserving backward compatibility may require some care.

Schema impact: Minimal. Constraints can be implemented by redefinition.

Processing impact: Minor. Generalization and conref must check constraints. A transform that commits to a generalization must remove constraint declarations that aren't part of the document type.
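As an illustration of the last point, a generalization transform might prune the constraints attribute along these lines. This is a minimal Python sketch with a hypothetical function name and a hypothetical shortdesc-req constraint name; it handles only the constraints attribute, and a real transform would presumably need to adjust constraint-scope in the same way.

```python
import xml.etree.ElementTree as ET


def strip_undeclared_constraints(root, target_constraints):
    """Drop constraint names that the target document type does not implement."""
    for element in root.iter():
        if "constraints" in element.attrib:
            kept = [name for name in element.get("constraints").split()
                    if name in target_constraints]
            # With no constraints left, the value falls back to the empty string.
            element.set("constraints", " ".join(kept))
    return root


# Hypothetical source instance and target document type.
root = ET.fromstring(
    '<task constraints="section-block-req shortdesc-req">'
    '<prereq constraints="section-block-req"/>'
    '</task>'
)
strip_undeclared_constraints(root, {"shortdesc-req"})
print(ET.tostring(root, encoding="unicode"))
```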

Benefits

Adopters have already been "bending the rules" to provide for simpler content models or to control substitution of domain elements. Constraints provide a way to manage these restrictions in a well-defined way while maintaining interoperability, including the ability to share sets of restrictions with others.

In particular, outbound interoperability is guaranteed for adopters of the unconstrained document type. Inbound interoperability from the unconstrained document type can be handled either by processing the inbound content against the unconstrained document type or by providing the constraints to the interoperability partner.

More generally, constraints eliminate the need to specialize for structural reasons, increasing the semantic reliability of specialization. For example, DITA 1.1 could modify the <taskbody> element to allow the <section> element in its content model but preserve backward compatibility by refactoring the existing task document type to implement a taskbody-no-generic-section constraint that removes the <section> element.

Time Required

Three days to refactor the existing DTDs once the new design pattern has been verified.