Subject: Re: [dita] Scenario for cross-deliverable referencing
In thinking about this more, I think that Michael's approach of treating the rendition-specific key-to-target binding as a literal DITA map with literal key definitions is a useful one. It provides a clear syntax for capturing the binding for interchange purposes, it will always work in distributed processing scenarios, and it gives us a clear basis on which to discuss data details. I will use this approach from now on in my discussions.

I explore the processing implications and possibilities in some detail below, but I think my difference with Michael comes down to this: is it possible to keep the key spaces of two publications distinct, or must you combine them? I say keep them distinct by enabling addressing of keys in the context of specific root maps. Michael says combine them, so that existing processors "just work" once you swap in rendition-specific key bindings, at the cost of requiring coordination of key names across the maps involved.

If we stipulate that in both cases the actual rendition processing to create working links is done by "swapping out" the target map as authored for an equivalent map that contains rendition-specific key bindings, then there are no actual processing differences between our two models. The only difference is in the details of how those rendition-specific maps are coordinated or used, which is entirely an implementation detail. That is, the processing neither requires nor disallows my all-knowing Processing Manager, and it fully allows Michael's completely informal and distributed processing environment.

This "swapping out" must be done by specifying, as an input to a rendition process, a mapping from the target map as authored (e.g., "map-b.ditamap") to the rendition-specific map to use instead (e.g., "map-b-PDF.ditamap"). That mapping has to be known regardless of how the processing is done.
It could be specified as a parameter, or it could be given as instructions to the human setting up the production, who then reflects that knowledge by modifying the input map. The functional result is the same. Given that mapping, a human or processor can reliably render key references as authored into working links in the rendition, as long as the rendition-specific key bindings are correct. Thus the mechanism by which rendition-specific keys are communicated to or used by a processor is an implementation detail.

The only question is what the processor has to do to resolve the keys: does it always have exactly one key space, or does it need to handle one or more key spaces?

My approach, which requires a new fragment identifier in order to point to specific keys in the context of specific root maps, reliably keeps distinct key spaces distinct and removes the need to coordinate names across key spaces. I think this is essential. It requires processors to handle one or more key spaces, but I don't think that should pose a problem in practice, because if you can construct one key space you can just as easily construct 100 key spaces. Since the universe has more than one map, I would hope that engineers of DITA-aware systems instinctively provide for the possibility of multiple key spaces.

Michael's approach requires combining the key spaces of otherwise separate publications into a single unified key space. This simplifies processing where the rendition-specific maps are used literally to implement cross-publication linking with DITA 1.2 processing, but at the cost of requiring coordination across all the key spaces that might be combined. I think that this coordination is impossible in the general, distributed case, because you may want to link to a publication over which you have no control and that happens to duplicate some keys in your publication that you do not want to resolve to that publication.
The only solution in that case is to keep the key spaces separate. DITA 1.2 clearly defines the notion of key space, so there can't be any ambiguity about what is intended when you address a key in the context of a given root map. It shouldn't surprise any processor that there might be more than one key space in play at any given point in time (because the universe contains more than one map).

In the case where you have, for some reason, multiple maps that contribute to a single rendered publication through some process, it would be up to that process to generate the appropriate rendition-specific map, and it could do so. In that case there might be a many-to-one mapping from maps as authored to intermediate maps, but the processing would still work just as it does for the simpler case of one map exactly equal to one publication.

So I think the question remains: do we allow referencing across key spaces in a way that keeps key spaces distinct, or do we require that all maps that might participate in cross-publication links share a single unified key space, which requires coordination of all key names across those maps? I feel strongly that the latter is neither acceptable nor sustainable. The implementation cost of allowing cross-key-space referencing is low and is, in fact, arguably inherent in the DITA 1.2 architecture, because it formally defines the concept of key space. In the case of the Toolkit in particular, I will personally implement the processing required if that's a barrier. It is certainly the case that key-aware editors and component management systems already have to manage multiple key spaces if they allow management of multiple maps, which they all do as far as I know (e.g., OxygenXML, Arbortext Editor 6, XMetal 6). [I don't know of any CMS systems that today actually manage keys or provide key-resolution services, but there might be some.
I'm actively working on adding that functionality to our CMS products, but it's a low product priority right now.]

The purpose of the rest of this message is to try to define a general abstract processing model or environment that fits both my tightly controlled approach and Michael's arbitrarily distributed model. My intent is to define some common vocabulary and appropriate abstractions that let us focus on the general requirements without worrying too much about implementation details.

Michael is presuming (but not requiring) an environment where there is no central all-knowing rendition system that maintains knowledge about all the renditions and the key-to-rendition mappings. I was assuming an all-knowing Processing Manager. But I think for both of us those are implementation details that don't really change the problem. We were both presuming that *something* had the required knowledge of the renditions involved and the intents of the renderers: in my case it was a management system, in Michael's it was the humans requesting the renditions. But I think the knowledge required in both cases is the same; the only difference is how that knowledge is captured or communicated, which is an implementation detail.

The following discussion reflects the real case of the DITA 1.2 spec, where we have a single content set that needs to be published in at least two ways: as a single publication combining the Architectural Spec and the Language Reference, and as two separate publications, the Architectural Spec and the Language Reference, with cross references between the two publications *as rendered*. I have tried to reflect this case with the smallest illustrative data set. Note that in the case of the DITA spec all the content is authored by a single, coordinated group, so it is possible to coordinate the key names across all the publication packages that might be applied to the content.
This does not reflect the more general distributed case, where you may want to link to renditions of a publication to which you have only read-only access and whose key names are not coordinated with yours.

Let us have three maps, Map A, Map B, and Map AB, and two topics, Topic 1 and Topic 2. The author of Topic 1 creates a link to Topic 2 because Topic 1 depends rhetorically on Topic 2. This is the DITA spec case, where the Architectural Spec points to Language Reference topics (and vice versa).

Topic 1 looks like this:

  <topic id="topic-01">
    <title>Topic One</title>
    <body>
      <p>See <xref keyref="topic-02"/>.</p>
    </body>
  </topic>

Topic 2 looks like this:

  <topic id="topic-02">
    <title>Topic Two</title>
    <body>
      <p>Something important to Topic 1.</p>
    </body>
  </topic>

Map AB includes both topics:

  <map>
    <title>Map AB</title>
    <keydef keys="topic-01" href="topics/topic-01.dita"/>
    <keydef keys="topic-02" href="topics/topic-02.dita"/>
    <topicref keyref="topic-01"/>
    <topicref keyref="topic-02"/>
  </map>

This is the full DITA spec case, where all the topics are used in the scope of a single root map. There is no processing ambiguity.

The other case is where we have two publications, Map A and Map B.

Map A:

  <map>
    <title>Map A</title>
    <keydef keys="topic-01" href="topics/topic-01.dita"/>
    <keydef keys="topic-02" href="????" format="????" scope="????"/>
    <topicref keyref="topic-01"/>
    <!-- NOTE: No reference to topic-02 -->
  </map>

Map B:

  <map>
    <title>Map B</title>
    <keydef keys="topic-02" href="topics/topic-02.dita"/>
    <!-- NOTE: No reference to topic-01 -->
    <topicref keyref="topic-02"/>
  </map>

Processing the publications:

When Map B is rendered to a given output, we can capture the key-to-address mapping in some way, such as Michael's keydefs, e.g.:

Map B-PDF:

  <map>
    <title>Map B PDF-specific keys</title>
    <keydef keys="topic-02" href="/workspace/output/map-b/pdf/map-b.pdf#unique-01" format="pdf" scope="external"/>
  </map>

That's as good as any other way to capture the information, and I'm happy to stipulate that this is how it is always captured for the purpose of processing interchange. This leaves open the possibility of manual or automatic inclusion of the map into the publication map, as I think Michael is describing in his processing model. How the map is used is an implementation detail, unless a map author literally includes it independently of any specific rendition process.

When Map A is rendered, the questions then are:

Question 1. What should the keydef for key "topic-02" look like in Map A?

My proposal is currently:

  <keydef keys="topic-02" href="map-b.ditamap#keyname::topic-02" format="ditamap" scope="peer"/>

where the fragment identifier is a strawman for a fragment ID that is unambiguously a reference to a key in the scope of the key space defined by the root map map-b.ditamap.

Michael's example is:

  <mapref processing-role="resource-only" href="map-b.ditamap"/>

If I understand Michael's approach, he is simply including Map B as a resource-only map so that the keys have a binding. However, his form of inclusion doesn't make it clear that those keys are intended to be treated as a separate key space. I think that is essential. That is the intent of my using scope="peer".
Michael's approach doesn't keep the two key spaces separate and therefore requires that the key names not conflict between the two root maps. His approach does allow swapping in the rendition-specific bindings for Map B, given the map-as-authored-to-rendition-map mapping stipulated above as a necessary parameter to the rendition process. But it still requires a single unified key space across Maps A and B.

When Map A is processed as authored, outside the context of a specific rendition, there would be nothing to indicate that Map B's keys are not defining resources directly required by Map A. For example, a process that takes a map and produces a package of all of Map A's dependencies would also gather up everything used by Map B, even though those resources are not really direct dependencies of Map A. (Such a processor is part of the open-source DITA for Publishers project and is also in the Open Toolkit.) If the mapref specified scope="peer", that would avoid the dependency confusion, but it wouldn't avoid combining the key spaces, because there's no separate, direct binding of a key in Map A to a key in Map B as there is in my approach.

In both cases we're pointing to the map defining the keys. The difference is that in my approach I'm also pointing to the key within that map, and using @scope to make it clear that I'm not simply using Map B's key definitions to include resources as part of Map A's content, which is otherwise the implication under the DITA 1.2 rules.

In my proposal there's an additional layer of indirection between the key name used in Map A's key space and the key name that Map A's key definition points to in Map B's key space, so the key names need not be coordinated between the two maps. That is, if Map B defined the key for Topic 2 as "second-topic", my form of keydef could be:

  <keydef keys="topic-02" href="map-b.ditamap#keyname::second-topic" format="ditamap" scope="peer"/>

and the original reference from Topic 1 would continue to work in both Map A and Map AB.
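The difference between the two models can be sketched in a few lines of Python. This is a hypothetical model, not any existing processor's API. Each root map's key space is a dictionary, and peer keydefs use the strawman #keyname:: fragment syntax. The unified key space applies only a first-definition-wins rule and ignores the rest of the DITA 1.2 key-precedence rules. Map B's extra topic-01 key (topics/b-intro.dita) is invented to show a name collision.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class KeyDef:
    """Hypothetical, simplified key definition (only the attributes used here)."""
    href: str
    format: str = "dita"
    scope: str = "local"

KeySpace = Dict[str, KeyDef]   # key name -> definition, one per root map
PEER_FRAGMENT = "#keyname::"   # strawman fragment-ID syntax from the proposal

def resolve(key_spaces: Dict[str, KeySpace], root_map: str, key: str) -> Optional[KeyDef]:
    """Resolve `key` in the key space of `root_map`. A scope="peer" keydef
    whose href is "other.ditamap#keyname::other-key" is followed into the
    other root map's own key space; the key spaces are never merged."""
    seen = set()
    while (root_map, key) not in seen:
        seen.add((root_map, key))
        keydef = key_spaces.get(root_map, {}).get(key)
        if keydef is None:
            return None        # unresolvable key
        if keydef.scope == "peer" and PEER_FRAGMENT in keydef.href:
            root_map, key = keydef.href.split(PEER_FRAGMENT, 1)
            continue
        return keydef
    return None                # cycle of peer references

def combine(*spaces: KeySpace) -> KeySpace:
    """Unified key space (Michael's model): the first definition of a key wins."""
    unified: KeySpace = {}
    for space in spaces:
        for name, keydef in space.items():
            unified.setdefault(name, keydef)
    return unified

key_spaces = {
    "map-a.ditamap": {
        "topic-01": KeyDef("topics/topic-01.dita"),
        "topic-02": KeyDef("map-b.ditamap#keyname::second-topic", "ditamap", "peer"),
    },
    "map-b.ditamap": {
        "second-topic": KeyDef("topics/topic-02.dita"),
        "topic-01": KeyDef("topics/b-intro.dita"),  # invented name collision
    },
}
```

Resolving topic-02 in Map A's key space follows the peer keydef into Map B's own key space and reaches topics/topic-02.dita, even though Map B calls that key second-topic. Map B's unrelated topic-01 never leaks into Map A. In the unified key space, Topic 1's keyref to topic-02 does not resolve at all, and which topic-01 wins depends on the order in which the maps are combined.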
I think that even if we don't address the keys via fragment ID, we have to distinguish references to peer and external key sets.

Question 2. How does the agent (person or processor) rendering Map A specify which rendition of Map B some or all of the links to Map B should point to? That is, given that there is both a PDF rendition and an HTML rendition of Map B, the choices are:

- The PDF rendition
- The HTML rendition
- Both renditions (multiple links generated from a single source link, or some intermediate fan-out link, or whatever)

Does this decision need to be made on a per-link basis or on a per-rendition-of-Map-A basis? My thinking to date had been that like would always link to like, but Michael is correct to say that that can't be the only option, so it has to be either a build-time decision or an authoring-time decision. I think it needs to be a build-time decision, determined by how you define the mapping of map-as-authored to rendition-specific map. Anything else would require additional per-key-definition syntax or metadata conventions that I think would be unworkable in practice. I suppose, if it came to it, you could modify the rendition-specific map to reflect exactly what you wanted and maintain that manually.

Another fact, which I never stated but which Michael correctly pointed out, is that a given rendition is identified not just by the rendition type (PDF, HTML, etc.) but by all the runtime parameters that define it. These include the active DITAVAL conditions, any processor-specific runtime options, the rendition-specific key-to-address mappings, and so on.

In my model, there is a Processing Manager that manages all rendition processing applied to a set of known content, e.g., all the publications managed within a given system. The Processing Manager abstracts the notion of "rendition" through a Rendition Definition object, which captures all the input parameters for a given rendition, e.g., output type PDF, DITAVAL condition platform="windows", and PDF option set "foo".
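A Rendition Definition can be sketched as an immutable record. The following Python is a minimal illustration, not the Processing Manager's actual API. The class, its field names, and file names such as platform-windows.ditaval and map-b-HTML.ditamap are hypothetical. The sketch folds the build-time decision from Question 2 into the definition as a mapping from each target map as authored to the rendition-specific map to use instead.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class RenditionDefinition:
    """Hypothetical record of every input that determines a rendition.
    Frozen (and therefore hashable), so two definitions with identical
    parameters compare equal, i.e. describe the same rendition."""
    output_type: str                           # e.g. "pdf", "html"
    ditaval: Tuple[str, ...] = ()              # active DITAVAL files
    options: Tuple[Tuple[str, str], ...] = ()  # processor-specific options
    # Build-time answer to Question 2: for each target map as authored,
    # the rendition-specific key map to swap in for it.
    key_maps: Tuple[Tuple[str, str], ...] = ()

    def key_map_for(self, authored_map: str) -> Optional[str]:
        """Rendition-specific map to use in place of `authored_map`, or None
        if this definition provides no bindings for that map."""
        return dict(self.key_maps).get(authored_map)

# Two PDF renditions of Map A that differ only in which rendition of
# Map B their cross-publication links point to.
links_to_pdf = RenditionDefinition(
    output_type="pdf",
    ditaval=("platform-windows.ditaval",),
    options=(("pdf-option-set", "foo"),),
    key_maps=(("map-b.ditamap", "map-b-PDF.ditamap"),),
)
links_to_html = RenditionDefinition(
    output_type="pdf",
    ditaval=("platform-windows.ditaval",),
    options=(("pdf-option-set", "foo"),),
    key_maps=(("map-b.ditamap", "map-b-HTML.ditamap"),),
)
```

Because links_to_pdf and links_to_html compare unequal, the two outputs count as different renditions even though both are PDF. A None from key_map_for corresponds to having no rendition-specific bindings for that target map.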
The implication of a Rendition Definition is that the same input rendered using the same Rendition Definition will produce the same (or functionally equivalent) output.

Michael's model assumes there is no Processing Manager: processing happens where it happens, and people coordinate however they do. However, the abstract notion of Rendition Definition is the same: you have to know what all the parameters were in order to reproduce the rendition. So in Michael's distributed world the Rendition Definition might be implemented as notes scribbled on your desk blotter, or an email from the supplier of the rendition you want to link to, or whatever, but the information content is the same regardless.

We can now define a "rendition instance" as a Rendition Definition/input map pair. Two different input maps that use the same Rendition Definition will have "consistent" or "compatible" output (that is, they'll reflect the same set of runtime options). This is all, I think, equivalent to Michael saying "the person who renders the map has to specify the appropriate DITAVAL files, rendition-specific key bindings, and so on". So my notion of Rendition Definition is either literal, as in my Processing Manager system, or virtual, reflected in the knowledge of the person doing the rendering, but in both cases the same information is represented.

In my model all rendering is done by the same processors, so coordination of intermediate data (key-to-rendition-location mappings) is obviously easy to do. But Michael says "no, you can't assume that; it has to be more disconnected and distributed", which is true. But I think the degree of distribution becomes an implementation detail. That is, if Rendition Definitions include the key-to-rendition bindings, it's only a question of how those bindings get communicated among processing systems, not how they are captured or represented.
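Capturing the bindings as a literal DITA map, as Michael proposes, means one system can write them and another can read them with ordinary XML tools. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree. The Bindings type and both function names are hypothetical, and the output mirrors the Map B-PDF example above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical shape: key name -> (address in the rendered output, format)
Bindings = Dict[str, Tuple[str, str]]

def bindings_to_map(title: str, bindings: Bindings) -> str:
    """Serialize rendition-specific key bindings as a DITA map of
    scope="external" keydefs, one keydef per key."""
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for key, (href, fmt) in sorted(bindings.items()):
        ET.SubElement(root, "keydef", keys=key, href=href, format=fmt, scope="external")
    return ET.tostring(root, encoding="unicode")

def map_to_bindings(map_xml: str) -> Bindings:
    """Read bindings back from such a map; for duplicate keys the first
    definition encountered wins."""
    bindings: Bindings = {}
    for keydef in ET.fromstring(map_xml).iter("keydef"):
        for key in keydef.get("keys", "").split():  # @keys may list several keys
            bindings.setdefault(key, (keydef.get("href", ""), keydef.get("format", "")))
    return bindings

map_b_pdf = {"topic-02": ("/workspace/output/map-b/pdf/map-b.pdf#unique-01", "pdf")}
map_xml = bindings_to_map("Map B PDF-specific keys", map_b_pdf)
```

The round trip shows that the captured representation is independent of the system that produced it. Whether the map travels through a Processing Manager's database or as an email attachment does not change its content.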
Michael presumes or stipulates a map-based syntax because maps are reliably interchanged and processed by DITA processors, which is fine.

So now to the processing. If we stipulate that rendition-target-type is a runtime parameter, then when I process Map A to a particular rendition and want links to go to the PDF renditions of the target publications, part of the Rendition Definition is "render cross-publication links to PDF renditions". But in fact, it needs to be "render cross-publication links to the rendition created using Rendition Definition X", that is, a specific Rendition Definition reflecting a specific set of rendition options, not just the base output type. In the context of the Open Toolkit, this means all the Ant parameters plus all the Toolkit plugins and environment variables that contribute to the configuration of the transformation type used. Any other processing system will have the equivalent set of options and starting conditions.

Given this background, we can now explore the different processing use cases.

Processing Use Case 1: Don't have rendition-specific key bindings for Map B.

If I process Map A and I don't have the rendition-specific key bindings for Map B, the processor has three choices:

1. Process Map B using the specified Rendition Definition and then use the result to complete processing Map A. Note that this could be a literal process, or it could be "get on the phone to the supplier of Map B and ask for the rendition-specific key bindings that reflect the Rendition Definition you want".
2. Process Map A with placeholder or otherwise unresolvable links.
3. Fail the rendition of Map A.

Processing Use Case 2: Do have rendition-specific key bindings for Map B.

If we are using my cross-publication key definition approach, then there needs to be an association between the root map map-b.ditamap and the corresponding set of rendition-specific keys.
Abstractly, this is part of the Rendition Definition parameters: you simply say "for map file map-b.ditamap, use key definitions map-b-PDF.ditamap" or whatever. It could also be done by literally changing "map-b.ditamap" to "map-b-PDF.ditamap" in the map source before processing it normally. In any case, given that association, the processor can resolve references to keys nominally defined in map-b.ditamap to the keys as bound in the rendition-specific binding.

Using Michael's approach it's essentially the same: you define the mapping as a rendition parameter or otherwise modify the map to be processed to replace "map-b.ditamap" with "map-b-PDF.ditamap".

In the context of the Open Toolkit, this would be done as part of the general preparation of the intermediate files used to create the final rendition (e.g., as part of the map-pull process or wherever it makes sense). The data manipulation required is consistent with the sort of manipulation the Toolkit already does. In the context of a Processing Manager, the mapping might be hidden behind a key resolution API that takes the rendition-specific key definitions into account. In any case, the result is the same: the rendered links reflect the bindings defined in the rendition-specific key map.

The only difference is the interaction or potential interference of the two key spaces. In my approach, as explained above, there's no possible interference between the two key spaces because they are kept distinct, while in Michael's approach the key spaces are combined.

I'm sure there's more to say on this subject, but I'm out of time for now. I think I've made my point about as clearly as I can.

Cheers,

E.

--
Eliot Kimber
Senior Solutions Architect
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.reallysi.com
www.rsuitecms.com