
Subject: Re: [dita] Unique topic ids in the cross publication or global CMS use case
From: Eliot Kimber <ekimber@rsicms.com>
To: Jim Tivy <jimt@bluestream.com>, dita <dita@lists.oasis-open.org>
Date: Sun, 16 Jun 2013 13:30:30 -0500
Thanks--it's a difficult subject to explain clearly. I will definitely see
what I can do as I prepare the final Stage 3 version of the proposal.

Cheers,

E.

On 6/16/13 7:13 PM, "Jim Tivy" <jimt@bluestream.com> wrote:

> Thanks Eliot
> 
> That is pretty clear.
> 
> From your PowerPoint, the text below is crystal clear. In proposal
> 13041 you have to work harder to get it (you define location as
> authored and location as delivered), but the PowerPoint is clearer.
> I think the DITA 1.3 spec needs to say something about there being "no
> requirement for topic IDs to have global uniqueness" because, as you
> say, there is a misconception about this, and we are not a strictly
> normative spec.
> 
> ******************** your text from PowerPoint ***************
> 
> Addressing within the content as authored:
> Defined by the source format, e.g., DITA XML
> For XML source, should be independent of any given output format
> DITA defines the rules for addressing within DITA XML
> 
> 
> Addressing from the publication as delivered:
> Defined by the delivery format: PDF, HTML, EPUB, etc.
> No single standard
> Details may be proprietary
> 
> ************************************************************
> 
>> -----Original Message-----
>> From: Eliot Kimber [mailto:ekimber@rsicms.com]
>> Sent: June-16-13 6:10 AM
>> To: Jim Tivy; dita
>> Subject: Re: [dita] Unique topic ids in the cross publication or global CMS use case
>> 
>> Jim,
>> 
>> I think your concern is addressed by the current cross-deliverable
>> addressing proposal: it does in fact propose the use of keys and
>> mappings from keys to locations as delivered as the way to ensure
>> reliable cross-deliverable addressing. The proposal as documented
>> should make it clear that processors are obligated to manage a mapping
>> from objects as authored to objects as delivered, such that no delivery
>> constraints (for example, a requirement that topic IDs be unique within
>> a publication) are imposed back onto the authored content.
>> 
>> If the proposal is not sufficiently clear on that point, then we must
>> correct it. Because I am so deeply into issues of linking and
>> addressing, I often forget that what to me seems obvious is in fact not
>> at all obvious.
>> 
>> Perhaps it's useful to discuss the general issue of topic IDs and their
>> non-requirement for uniqueness in the context of addressing generally.
>> I think there is both some general misunderstanding in the community
>> about what is and isn't required and probably some poor implementation
>> choices made long ago that still linger in our community. I don't fault
>> implementors for not always understanding the subtleties of
>> addressing--it's a challenging subject.
>> 
>> -----------------------------------
>> Topic ID Uniqueness Is Not Required
>> 
>> Topic *IDs* are not required to be unique outside the context of their
>> containing XML document, nor do they need to be.
>> 
>> However, topic document addresses *are* necessarily unique, because the
>> XML documents that contain topics are distinct storage objects, which
>> means they have a unique location within the storage system that
>> contains them, and that storage system has a unique location within the
>> set of all possible storage locations. That's how storage systems work.
>> 
>> In the world of the Web, every storage system exists on some kind of
>> server with a unique IP address. The storage system itself then exists
>> at some unique location within that server, and the resources managed
>> by the storage system then have unique locations, e.g., filenames,
>> object IDs, or what have you.
>> 
>> Thus, every *topic* has a unique URL/ID pair that distinguishes it from
>> *all possible other topics* in existence at any moment in time.
>> 
>> Thus the ID of the <topic> element is necessary *only* to distinguish
>> different topics within the same *XML document*. But that requirement
>> is imposed by XML itself, since DITA defines topic IDs as XML IDs.
>> 
>> If an XML document consists of exactly one topic, then addressing the
>> document is sufficient to reliably address the topic (by the rules of
>> DITA addressing), and in that case the topic ID is only of interest for
>> addressing elements within the topic, because DITA fragment identifiers
>> are {topicid}/{elementid} pairs. But even there, using the value
>> "topicid" as the ID of every such topic is as good as anything.
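As an illustration of the point about document addresses plus topic IDs, here is a minimal Python sketch that splits a DITA-style href into its absolute document URL, topic ID, and element ID. The function name and the example URLs are hypothetical; it assumes only the fragment forms "#topicid" and "#topicid/elementid" and ignores key references and other DITA addressing details.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag, urljoin


def resolve_dita_href(base_url: str, href: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split a DITA-style href into (document URL, topic ID, element ID).

    Hypothetical helper: assumes fragments of the form 'topicid' or
    'topicid/elementid'; no fragment means the document (its root topic).
    """
    absolute = urljoin(base_url, href)        # resolve relative to the referencing file
    doc_url, fragment = urldefrag(absolute)   # separate the '#...' part
    if not fragment:
        return doc_url, None, None
    topic_id, _, element_id = fragment.partition("/")
    return doc_url, topic_id, element_id or None
```

Two documents that both use the topic ID "topicid" still yield distinct addresses, because the document URL differs: with a base of `http://cms.example.com/docs/map.ditamap`, `install.dita#topicid/step1` and `upgrade.dita#topicid/step1` resolve to different `(URL, ID)` pairs.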
>> 
>> For the purposes of addressing in deliverables, there is no need for
>> topic IDs to be unique, because the processor that generates the
>> deliverable can ensure that the IDs used in the deliverable are unique
>> within that deliverable. The deliverable is itself a storage object (or
>> collection of storage objects) that, like all storage objects, has
>> identity within the set of all possible storage objects.
>> 
>> In addition, the processor that produces the deliverable always has
>> access to the information required to maintain the mapping from objects
>> as authored (that is, topic IDs, element IDs, and keys) to their
>> locations as delivered. This is true because the processor must have
>> both the original source and the deliverable it generated available to
>> it. This does not mean that all existing processors were implemented in
>> such a way that this information is maintained, only that they all
>> *could have been*.
>> 
>> So again, addressability is assured as long as the processor generating
>> the output generates unique IDs for any addressable things put into the
>> deliverable and maintains the source-to-deliverable address mapping.
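The two obligations just described (mint deliverable-unique IDs, keep the source-to-deliverable mapping) can be sketched in Python as below. The class name, the "d00001"-style ID format, and the tuple shape of an authored address are all hypothetical choices for illustration, not how any particular processor works.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# An address as authored: (document URL, topic ID, element ID or None).
SourceAddress = Tuple[str, str, Optional[str]]


@dataclass
class DeliverableIdMap:
    """Hypothetical processor-side table from authored addresses to IDs that
    are unique within a single deliverable."""

    prefix: str = "d"
    _by_source: Dict[SourceAddress, str] = field(default_factory=dict)

    def deliverable_id(self, source: SourceAddress) -> str:
        """Return the deliverable ID for an authored object, minting a new,
        deliverable-unique ID the first time the object is seen."""
        if source not in self._by_source:
            self._by_source[source] = f"{self.prefix}{len(self._by_source) + 1:05d}"
        return self._by_source[source]

    def lookup(self, source: SourceAddress) -> Optional[str]:
        """Find where an authored object landed, or None if it was not output."""
        return self._by_source.get(source)
```

Two topics that both carry the authored ID "topicid" in different documents receive different deliverable IDs, and the authored topic IDs never have to change. A real processor would probably also include the referencing topicref in the key when the same topic is output more than once.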
>> 
>> If you need to do cross-deliverable addressing, then you need to have a
>> mapping from the locations (not just IDs) of the things as authored to
>> the locations of the things as delivered. That mapping could be managed
>> in many ways, but the current cross-deliverable proposal does it through
>> the use of keys and intermediate key definition sets that map the keys
>> as used in the content as authored to the locations of the key-bound
>> resources in the deliverable. That is sufficient to support the
>> requirement for addressability.
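One way to picture the intermediate key definition set is as the composition of two maps: the key definitions as authored (key to source resource) and the processor's source-to-deliverable mapping (source resource to delivered location). The Python sketch below assumes plain dictionaries for both; the function name and the example locations are hypothetical, and proposal 13041 does not prescribe this representation.

```python
from typing import Dict


def export_key_definitions(
    keydefs: Dict[str, str],    # key name -> resource location as authored
    delivered: Dict[str, str],  # location as authored -> location as delivered
) -> Dict[str, str]:
    """Compose the key definitions with the source-to-deliverable mapping,
    giving key name -> location of the key-bound resource as delivered."""
    exported: Dict[str, str] = {}
    for key, source in keydefs.items():
        # Keys bound to resources that were not published are left out.
        if source in delivered:
            exported[key] = delivered[source]
    return exported
```

A second deliverable could then resolve its key references against the exported dictionary without knowing anything about the first deliverable's internal IDs.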
>> 
>> In addition, the @copy-to attribute on <topicref> gives authors
>> additional control over deliverable addresses by allowing the
>> assignment of new virtual source storage object locations ("filenames")
>> for distinct references to the same topic or map. That doesn't remove
>> the requirement for source-to-deliverable address mapping, but it means
>> that authors may influence the details of the result.
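The effect of @copy-to on deliverable addresses can be sketched in Python as follows. The naming rule here (use @copy-to as the virtual source filename when present, otherwise @href, then swap the extension for the output format) is a simplifying assumption for illustration, not the exact behavior of any specific processor.

```python
from pathlib import PurePosixPath
from typing import Optional


def output_path(href: str, copy_to: Optional[str], out_ext: str = ".html") -> str:
    """Hypothetical naming rule: treat @copy-to as the virtual source
    filename when present, otherwise @href, and change the extension."""
    virtual_source = copy_to or href
    return str(PurePosixPath(virtual_source).with_suffix(out_ext))
```

Two topicrefs pointing at the same `shared/safety.dita`, one of them with `copy-to="install/safety.dita"`, would then land at `shared/safety.html` and `install/safety.html`: distinct deliverable addresses chosen by the author, while the processor still has to record which source each one came from.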
>> 
>> The DITA 1.2 spec doesn't say anything about topic ID uniqueness because
>> it doesn't need to. Topic IDs don't need to be unique, except as already
>> required by XML rules.
>> 
>> It can be a *convenience* to assign unique IDs to the topics under your
>> control, but there is no way that any agency short of the divine can
>> ensure global ID uniqueness unless we mandate the use of a specific
>> UUID generator.
>> 
>> By the same token, there's nothing wrong with making your topic IDs
>> globally unique if you want to; it's just not necessary and could be a
>> waste of effort. Or it could be a useful simplifying strategy. A typical
>> use case might be to make topic IDs the object IDs of topics managed in
>> a component content management system. That's fine as long as everyone
>> is clear that these IDs can at best be unique within the scope of that
>> one component content management system instance (even if you're using
>> some sort of UUID generator, there's always the chance, however remote,
>> that somebody might randomly choose the same ID for one of their
>> topics).
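If you do choose UUID-based topic IDs, one practical detail is that an XML ID must be an NCName, which cannot start with a digit, while a UUID's hex form often does. The Python sketch below adds a prefix for that reason; the function names and the default prefix "t" are hypothetical, and the regular expression is a simplified ASCII-only approximation of the NCName rule.

```python
import re
import uuid

# Simplified, ASCII-only approximation of the XML NCName rule
# (real NCNames also allow many non-ASCII characters).
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value: str) -> bool:
    return _ASCII_NCNAME.fullmatch(value) is not None


def new_topic_id(prefix: str = "t") -> str:
    """Return a topic ID built from a random (version 4) UUID.

    The prefix is needed because a UUID's hex form may begin with a digit,
    and an XML ID may not."""
    if not is_ascii_ncname(prefix):
        raise ValueError(f"prefix {prefix!r} cannot start an XML ID")
    return prefix + uuid.uuid4().hex
```

As the message says, this makes collisions very unlikely rather than impossible, and it only helps within whatever scope actually uses the generator.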
>> 
>> Cheers,
>> 
>> Eliot
>> 
>> On 6/15/13 8:08 PM, "Jim Tivy" <jimt@bluestream.com> wrote:
>> 
>>> Hi Folks
>>> 
>>> I have found numerous discussions stating that topic IDs are not
>>> required to be unique within a publication or collection of topics,
>>> but none of these discussions are in the current 1.2 specification
>>> (that I could find, anyhow), although omission means no requirement.
>>> One such reference was:
>>> http://tech.groups.yahoo.com/group/dita-users/message/14260
>>> Of course, topic IDs do have to be unique within an XML document, but
>>> that is not what I am talking about here; rather, I am addressing
>>> intra-publication uniqueness or even global uniqueness.
>>> Some PDF processors, such as the PDF5 processor for Antenna House,
>>> however, require that topic IDs be unique within a publication.
>>> At first it seems like this requirement is overstepping what OASIS has
>>> recommended (or not recommended, through omission). However, one
>>> reason for this unique ID requirement of PDF5 is to support the
>>> cross-publication linking use case.
>>> It just so happens that we dealt with this use case recently in
>>> approving proposal 13041 (Facility for key-based, cross-deliverable
>>> referencing (Kimber)).
>>> It seems that if we do not recommend or say anything about unique
>>> topic IDs, then we leave processors to "twist in the wind" or to make
>>> extra requirements like PDF5 did. On the other hand, if we require
>>> unique topic IDs, we might be presupposing certain implementations
>>> that are in fact not necessary.
>>> It seems, however, that if we are to add proposals such as 13041, then
>>> we might want to talk about how cross-publication linking might
>>> happen: proposal 13041 opens the door to some new possibilities.
>>> 
>>> For example, if our references were key-rooted, we could use key
>>> export tables, and the processors could do something like the following:
>>> 
>>> I use a PDF example here, but it may have bearing on other
>>> cross-publication links, such as links across chunked HTML.
>>> In PDF, for example, allowing the processor to assign unique IDs to
>>> topics for the purposes of merging (like the PDF2 merge) means that
>>> linking from PDFB to PDFA would require PDFA to export its external
>>> links to PDFB, because the IDs of the topics in the PDF are not known
>>> at authoring time.
>>> 
>>> PDFA (exported as XML):
>>> 
>>> keyname   newMergetopicId   Original fragmentId
>>> MyKey1    a223345           be3333333
>>> 
>>> Then PDFB consumes this and has a reference to MyKey1/be3333333.
>>> 
>>> Then, when a processor builds PDFB, its reference to PDFA via
>>> MyKey1/be3333333 would resolve to PDFA.a223345/be3333333.
>>> 
>>> In this case, a223345 could be entirely generated by the PDF processor
>>> when PDFA is built; however, be3333333 would remain stable, though not
>>> unique, as a fragment ID.
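The export-table scheme in Jim's example can be sketched in Python. The XML element and attribute names (key-export, row, merged-topic-id, fragment-id) are hypothetical, since the message does not define a serialization; the resolution step simply follows the MyKey1/be3333333 to PDFA.a223345/be3333333 rewrite described above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set, Tuple

# Hypothetical XML serialization of PDFA's export table.
PDFA_EXPORT = """\
<key-export deliverable="PDFA">
  <row key="MyKey1" merged-topic-id="a223345" fragment-id="be3333333"/>
</key-export>
"""

ExportTable = Dict[str, Tuple[str, Set[str]]]


def load_key_export(xml_text: str) -> ExportTable:
    """Map each key to (deliverable-qualified merged topic ID, fragment IDs)."""
    root = ET.fromstring(xml_text)
    deliverable = root.get("deliverable")
    table: ExportTable = {}
    for row in root.iter("row"):
        target = f"{deliverable}.{row.get('merged-topic-id')}"
        _, fragments = table.setdefault(row.get("key"), (target, set()))
        fragments.add(row.get("fragment-id"))
    return table


def resolve(reference: str, table: ExportTable) -> str:
    """Rewrite a key-rooted reference such as 'MyKey1/be3333333'."""
    key, _, fragment_id = reference.partition("/")
    target, fragments = table[key]  # KeyError if PDFA did not export the key
    if not fragment_id:
        return target
    if fragment_id not in fragments:
        raise KeyError(f"{fragment_id!r} is not exported for key {key!r}")
    return f"{target}/{fragment_id}"
```

Only the merged topic ID (a223345) comes from PDFA's build; the authored fragment ID (be3333333) passes through unchanged, which is why it can stay stable without being unique.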
>>> 
>>> My question here is: should we say something in the spec, or when we
>>> document proposal 13041, regarding this? Should we have text that says
>>> "we DO NOT recommend that processors rely on unique topic IDs within a
>>> publication" or "we DO recommend the same"?
>>> 
>>> cheers
>>> Jim
>> 
>> --
>> Eliot Kimber
>> Senior Solutions Architect, RSI Content Solutions
>> "Bringing Strategy, Content, and Technology Together"
>> Main: 512.554.9368
>> www.rsicms.com
>> www.rsuitecms.com
>> Book: DITA For Practitioners, from XML Press,
>> http://xmlpress.net/publications/dita/practitioners-1/
> 
> 

-- 
Eliot Kimber
Senior Solutions Architect, RSI Content Solutions
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.rsicms.com
www.rsuitecms.com
Book: DITA For Practitioners, from XML Press,
http://xmlpress.net/publications/dita/practitioners-1/