DITA Proposed Feature # 43

Semantic (implicit) linking.

Longer description

The basic principle of inferring links for keywords is to associate a use of a term with the definition of the term. The same principle can also be used to associate any semantic element with a semantic topic that goes into more depth on the subject identified in the element. For example, an <apiname> element might refer to an API described in a reference topic. The linking strategy must maximize the reliability of the inferred link. The same name can mean different things in different vocabularies, so we cannot assume that the term names are globally unique. If we infer invalid links -- links that go to the wrong meaning for the term name -- the user will come to distrust the links and not follow them.

On the other hand, the linking strategy must also minimize disruption to writers who use inline semantic markup. A writer should not need to create hard references or links every time an API name or other subject is mentioned. Instead, authors should be able to identify, within a map, the scopes in which a particular semantic element (e.g., apiname) can be mapped to a particular pattern in another set of topics (for example, a branch of reference topics that documents an API library).

In other words, to balance the competing concerns of reliability and convenience, we can leverage additional clues to restrict the scope of matching:
  • The element contexts containing the used and defined terms/phrases.
  • The branches of the navigation hierarchy that contain uses of terms or definitions of terms/phrases.

Scope

Major

Use Case

{Describe this feature's use, as if ideally implemented.}

Technical Requirements

This note proposes a new map domain with elements that a writer can use to enable keyword linking by providing these clues:

  • linkword: Identifies the element contexts for used terms.
  • linkwordDefs: Identifies the topic branch containing the definitions for these terms.
  • linkwordMatch: Identifies the element contexts for term definitions within these topics.

The topics to which the keyword linking applies include every topic referenced in the branch of the topicref containing the linkword element. Here is an example:

    <!-- keyword link sources are found in this topic and 
         all topics referenced in this branch -->
    <topicref href="wsusing.dita" navtitle="Using Web Services">
        <!-- rule for matching an API name including a specialization
             anywhere within a code block -->
        <linkword href="codeblock//apiname">
            <!-- link targets for this rule are found under this topic -->
            <linkwordDefs href="wsapi.dita">
                <!-- rule for matching the link target in these topics -->
                <linkwordMatch href="reference/title"/>
                ...
            </linkwordDefs>
            ...
        </linkword>
        <topicref href="wsconcepts.dita" navtitle="Background on Web Services">
            ...
        </topicref>
        <topicref href="wsdiscover.dita" navtitle="Discovering a Web Service">
            ...
        </topicref>
        ...
    </topicref>

    <!-- because of the definition above, link targets are found in this topic
         and all topics referenced in a branch under this topic -->
    <topicref href="wsapi.dita" navtitle="Reference for the Web Services API">
        <topicref href="wsapiauthenticator.dita" navtitle="Authenticator class"/>
        <topicref href="wsapiconnector.dita"     navtitle="Connector class"/>
        ...
    </topicref>
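
To make the scoping rule concrete, the following Python sketch shows one way a processor might derive the source and target branches from such a map. It is an illustration under stated assumptions, not part of the proposal: the function names `branch_hrefs` and `link_scopes` are hypothetical, the map is a trimmed copy of the example with the elided content left out, and elements are matched by plain tag name rather than by class attribute.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example map (elided content omitted, navtitles dropped).
MAP = """<map>
  <topicref href="wsusing.dita">
    <linkword href="codeblock//apiname">
      <linkwordDefs href="wsapi.dita">
        <linkwordMatch href="reference/title"/>
      </linkwordDefs>
    </linkword>
    <topicref href="wsconcepts.dita"/>
    <topicref href="wsdiscover.dita"/>
  </topicref>
  <topicref href="wsapi.dita">
    <topicref href="wsapiauthenticator.dita"/>
    <topicref href="wsapiconnector.dita"/>
  </topicref>
</map>"""

def branch_hrefs(topicref):
    """Return the href of a topicref and of every topicref nested under it."""
    return [t.get("href") for t in topicref.iter("topicref")]

def link_scopes(map_root):
    """Collect, for each linkword rule, its source branch and target branch."""
    scopes = []
    for owner in map_root.iter("topicref"):
        for rule in owner.findall("linkword"):
            for defs in rule.findall("linkwordDefs"):
                # Find the topicref in the map that the linkwordDefs points at.
                target = next(
                    (t for t in map_root.iter("topicref")
                     if t.get("href") == defs.get("href")),
                    None,
                )
                scopes.append({
                    "use_pattern": rule.get("href"),
                    "sources": branch_hrefs(owner),
                    "def_patterns": [m.get("href")
                                     for m in defs.findall("linkwordMatch")],
                    "targets": branch_hrefs(target) if target is not None else [],
                })
    return scopes

scopes = link_scopes(ET.fromstring(MAP))
print(scopes[0]["sources"])
print(scopes[0]["targets"])
```

For the trimmed map, the sources are wsusing.dita and the two topics nested under it, and the targets are wsapi.dita and its two class topics.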

In the example above, the href attribute of the linkword element would expand to the following XSLT pattern:

*[contains(@class, ' topic/body ')] //
    *[contains(@class, ' pr-d/codeblock ')] //
    *[contains(@class, ' pr-d/apiname ')]

The href attribute of the linkwordMatch element would expand to the following XSLT pattern:

*[contains(@class, ' reference/reference ')] /
    *[contains(@class, ' topic/title ')]

As with the existing processing of the type attribute of the topicref element, the patterns implementing the rules match specializations as well as the specified element.
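
The expansion from a short path to a class-based pattern could be sketched as follows in Python. This is a minimal sketch under assumptions: the `CLASS_TOKENS` table and the `expand_pattern` function are hypothetical (a real processor would take the class tokens from the DTD or schema defaults), and predicates such as `[@outputclass='perl']` are not handled.

```python
import re

# Hypothetical lookup from element name to its DITA class token.
CLASS_TOKENS = {
    "codeblock": "pr-d/codeblock",
    "apiname": "pr-d/apiname",
    "reference": "reference/reference",
    "title": "topic/title",
}

def expand_pattern(href, in_body=False):
    """Expand a path such as 'codeblock//apiname' into a class-based pattern.

    in_body=True prefixes the topic body context, as in the linkword example.
    """
    # Split into element names and the separators ('/' or '//') between them.
    parts = re.split(r"(//|/)", href)
    out = []
    if in_body:
        out.append("*[contains(@class, ' topic/body ')]")
        out.append("//")
    for part in parts:
        if part in ("/", "//"):
            out.append(part)
        else:
            out.append("*[contains(@class, ' %s ')]" % CLASS_TOKENS[part])
    return " ".join(out)

print(expand_pattern("codeblock//apiname", in_body=True))
print(expand_pattern("reference/title"))
```

Because the generated test checks for a class token rather than an element name, a specialization whose class attribute includes the same token matches as well.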

The author can use XPath expressions to specify containment (a single solidus), containment at any depth (the double solidus), and, for advanced users, a predicate on the outputclass attribute. Here are some examples of common cases:

    Link source:       <sqlstmt>SELECT</sqlstmt>
    linkword element:  <linkword href="sqlstmt">
    Link target:       <SQLStatement ...> <title>SELECT</title>
    linkwordMatch:     <linkwordMatch href="SQLStatement/title">

    Link source:       <cmdname>XCOPY</cmdname>
    linkword element:  <linkword href="cmdname">
    Link target:       <reference ...> <command>XCOPY</command>
    linkwordMatch:     <linkwordMatch href="reference/command">

    Link source:       <man1>ls</man1>
    linkword element:  <linkword href="man1">
    Link target:       <reference ...> <title><man1>ls</man1></title>
    linkwordMatch:     <linkwordMatch href="reference/title/man1">

    Link source:       <keyword outputclass="perl">foreach </keyword>
    linkword element:  <linkword href="keyword[@outputclass='perl']">
    Link target:       <reference ... outputclass="perl"> <title>foreach</title>
    linkwordMatch:     <linkwordMatch href="reference[@outputclass='perl']/title">

    Link source:       <javamethod>read()</javamethod>
    linkword element:  <linkword href="javamethod">
    Link target:       <javaMethod ...> <apiName>read</apiName>
    linkwordMatch:     <linkwordMatch href="javaMethod/apiName">

    Link source:       <term>storefront</term>
    linkword element:  <linkword href="term">
    Link target:       <concept> <title><term>storefront</term> </title>
    linkwordMatch:     <linkwordMatch href="concept/title/term">

The link matching process performs the following sequence of actions:

  1. Delete non-alphanumeric characters (other than spaces) within the name.
  2. Normalize spaces by trimming leading and trailing spaces and by reducing multiple spaces to single spaces within the name.
  3. Match the resulting value against the normalized names of the available definitions.
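
The normalization in the steps above might be sketched in Python as shown here. The function name `normalize_name` is hypothetical, and two assumptions are made that the proposal does not state: spaces survive the first step (otherwise the second step would have nothing to do), and matching is case-sensitive.

```python
def normalize_name(name):
    """Normalize a term name before matching uses against definitions."""
    # Step 1: drop characters that are neither alphanumeric nor whitespace
    # (assumption: whitespace is kept so that step 2 can normalize it).
    kept = "".join(ch for ch in name if ch.isalnum() or ch.isspace())
    # Step 2: trim the ends and collapse runs of whitespace to single spaces.
    return " ".join(kept.split())

# Step 3: a use matches a definition when the normalized values are equal.
print(normalize_name("read()") == normalize_name("read"))         # True
print(normalize_name("foreach ") == normalize_name("foreach"))    # True
```

This is why a use such as `read()` can link to a definition titled `read`, as in the javamethod example.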

If supported by the deployment system, links could be resolved at the deployed site based on the available definitions.

If a definition isn't found for a use, a build might generate a warning, but a deployment system would provide the text without linking and not generate any warnings or errors.

A deployment system should be able to tolerate cases where the same term has several definitions, though scoping the linking to navigation branches in the map should minimize such ambiguity. In such cases, the deployment system should display a list of some kind to let the user investigate the available definitions.
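
The three outcomes described above (no definition, one definition, several definitions) could be handled along these lines. This Python sketch is illustrative only: the `resolve` function, its return labels, and the sample index are hypothetical, and the index is assumed to be keyed by already-normalized term names.

```python
def resolve(term, index, build_warnings=None):
    """Decide how to render a normalized term, given an index of definitions.

    index maps a normalized term name to a list of definition targets.
    build_warnings is a list to append to during a build; a deployment
    system would pass None and stay silent.
    """
    targets = index.get(term, [])
    if not targets:
        if build_warnings is not None:
            build_warnings.append("no definition found for '%s'" % term)
        return "plain-text", []        # render the text without a link
    if len(targets) == 1:
        return "link", targets         # unambiguous: link directly
    return "choose-from-list", targets  # let the user pick a definition

# Hypothetical index for illustration.
index = {
    "read": ["wsapiconnector.dita"],
    "open": ["wsapiconnector.dita", "wsapiauthenticator.dita"],
}
warnings = []
print(resolve("read", index, warnings))
print(resolve("open", index, warnings))
print(resolve("close", index, warnings), warnings)
```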

A branch can contain both uses and definitions of the same set of terms. For instance, the description of a class is very likely to refer to other classes in the same library. To handle such cases, a linkwordDefs element in the map can point to the same topic that contains the linkword element. If a term instance matches both the definition pattern and the use pattern, it is treated as a definition.
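
The precedence rule can be illustrated with a small Python sketch. The sample topic and the `classify` function are hypothetical, and for brevity elements are matched by plain tag name rather than by class attribute, so specializations are not covered.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference topic that both defines "ls" and mentions commands.
TOPIC = """<reference id="ls">
  <title><man1>ls</man1></title>
  <refbody><section><p>See also <man1>cp</man1> and <man1>ls</man1>.</p></section></refbody>
</reference>"""

def classify(root, use_tag, def_path):
    """Split term instances into definitions and uses.

    An element that matches both patterns counts as a definition only.
    """
    definitions = root.findall(def_path)
    # Element comparison is by identity, so this excludes exactly the
    # elements already claimed by the definition pattern.
    uses = [e for e in root.iter(use_tag) if e not in definitions]
    return definitions, uses

root = ET.fromstring(TOPIC)
definitions, uses = classify(root, "man1", "./title/man1")
print([e.text for e in definitions])  # ['ls']
print([e.text for e in uses])         # ['cp', 'ls']
```

The man1 element in the title matches both patterns but is reported only as a definition, so it does not become a link to itself.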

Costs

Several weeks for design discussion, several weeks for implementation (this one ain't trivial).

Benefits

Makes the creation and maintenance of inline links much easier in the specific range where inline links are usable and appropriate (i.e., definitional links from terms or keywords to their more detailed explanations). Increases the value of semantic inline markup: even when there is no other output effect, there is still the possibility of automatic links. Allows linking among topics without creating dependencies between them, which increases possibilities for reuse.

Time Required

Several meetings for design discussion, several weeks for implementation.