-----Original Message-----
From: Dave McAlpin [mailto:dave.mcalpin@epokinc.com]
Sent: Friday, September 05, 2003 2:43 AM
To: 'Wachob, Gabe'; xri@lists.oasis-open.org
Subject: RE: [xri] Draft -07 feedback from another Visa person (responses cont'd)

-----Original Message-----
From: Wachob, Gabe [mailto:gwachob@visa.com]
Sent: Thursday, August 28, 2003 4:29 PM
To: 'xri@lists.oasis-open.org'
Subject: RE: [xri] Draft -07 feedback from another Visa person (responses cont'd)

Outlook continues to drive me up the wall - this should be the end of my initial responses to the comments from Terence Spielman. Responses continue in this email:

550-551 I didn't get the meaning of persistent or re-assignable identifiers out of this description.

Yes, this needs beefing up, as the inline comment mentions.

2.2.3.1 Is it allowable to escape Unicode characters? For example, what if one wanted to express an international XRI in IA5 (ASCII)? In that case, the %AB format described in 2.2.3.1 is insufficient to support the expanded character width.

I'll defer this question to our resident Unicode & escaping guru, Dave McAlpin.

I think step 5 in 2.2.3.2 addresses this when we specify “one escaped triplet for each octet in the UTF-8 encoding of the disallowed character”. Did you have something else in mind?

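To make step 5 concrete: a minimal sketch, assuming for illustration that "disallowed" simply means "non-ASCII" (the draft's actual disallowed set is defined in 2.2.3.2, not here). The helper names `escape_octets` and `escape_non_ascii` are hypothetical. Because each UTF-8 octet gets its own %XX triplet, a character of any width ends up as plain ASCII, which is what an IA5 representation needs.

```python
def escape_octets(ch: str) -> str:
    """Return one %XX triplet for each octet of ch's UTF-8 encoding."""
    return "".join(f"%{octet:02X}" for octet in ch.encode("utf-8"))


def escape_non_ascii(text: str) -> str:
    """Escape every non-ASCII character; ASCII characters pass through.

    Assumption (illustration only): "disallowed" is approximated as "non-ASCII".
    """
    return "".join(ch if ord(ch) < 0x80 else escape_octets(ch) for ch in text)


if __name__ == "__main__":
    print(escape_octets("\u00e9"))        # é is two UTF-8 octets: %C3%A9
    print(escape_octets("\u20ac"))        # € is three UTF-8 octets: %E2%82%AC
    print(escape_non_ascii("caf\u00e9"))  # caf%C3%A9
```
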
694 Does the lack of idempotency affect semantics or syntax? I would hope it would only be syntax.

Again, this gets deferred to Dave McAlpin.

It affects semantics. If an XRI is inadvertently escaped twice and unescaped once, for example, the result might be semantically different from the original XRI (this depends, of course, on the original XRI). It’s essentially the same problem mentioned in section 2.4.2 of RFC 2396, which says “implementers should be careful not to escape or unescape the same string more than once, since unescaping an already unescaped string might lead to misinterpreting a percent data character as another escaped character, or vice versa in the case of escaping an already escaped string.”

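Both failure modes in that RFC 2396 passage can be reproduced in a few lines. This sketch uses Python's `urllib.parse.quote`/`unquote`, which follow generic URI escaping rules rather than the draft's XRI rules; they stand in here only to illustrate why escaping is not idempotent.

```python
from urllib.parse import quote, unquote

# Escaped twice, unescaped once.
raw = "my file"          # unescaped form containing a space
once = quote(raw)        # 'my%20file'
twice = quote(once)      # 'my%2520file': the '%' of '%20' was escaped again
print(unquote(twice))    # 'my%20file', which is not the original 'my file'

# The reverse mistake: unescaping data that was never escaped.
literal = "100%41"       # here "%41" is literal percent data, not an escape
print(unquote(literal))  # '100A': the percent data was misread as an escape
```
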
2.2.3.3 How about this as an alternative?
1. Escape all current escapes (%s).
2. Escape all syntactic elements with cross references (parens).
3. Escape all parens.

Dave McAlpin has thought through the escaping issues quite a bit. We are trying to track the (as-yet-not-finalized) RFC 2396bis and IRI (internationalized resource identifiers) specs, and this adds some complexity with the benefit of aligning with emerging best practices and architectures. I'd leave it to Dave to explain exactly how he ended up with the escaping procedure we have.

I don’t understand the second step. Can you give an example of escaping “all syntactic elements with cross-references”?

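For discussion, here is a literal reading of steps 1 and 3 of the proposed alternative; step 2 is left out because its meaning is exactly what is in question. The helpers `escape_proposal` and `unescape_proposal` are hypothetical and are not the draft's procedure. The point they show is that escaping % first, and unescaping in a single left-to-right pass, keeps the transformation reversible.

```python
import re

# Hypothetical helpers: steps 1 and 3 of the proposal only (step 2 omitted).
_UNESCAPE = {"%25": "%", "%28": "(", "%29": ")"}


def escape_proposal(text: str) -> str:
    # Step 1: escape existing '%' first, so earlier escapes survive as data.
    text = text.replace("%", "%25")
    # Step 3: escape every parenthesis.
    return text.replace("(", "%28").replace(")", "%29")


def unescape_proposal(text: str) -> str:
    # One left-to-right pass, so '%2528' decodes to '%28', not to '('.
    return re.sub(r"%(?:25|28|29)", lambda m: _UNESCAPE[m.group(0)], text)


s = "50%(off)%28"
e = escape_proposal(s)
print(e)                         # 50%25%28off%29%2528
print(unescape_proposal(e) == s)  # True: the round trip restores the input
```
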
878-879 Why are XRI authorities compared in a case-insensitive manner?

That's a good question. Not sure, honestly. Dave? Drummond?

Mostly, I think, to make the comparison rules for XRIAuthority consistent with those for URIAuthority (as specified by section 6 of RFC 2396). It may be confusing, though, in that it only applies to characters in the ALPHA production. That’s fine for URIAuthorities because the only letters they allow are those in the ALPHA production, but the XRIAuthority can contain international characters. Is your objection that it’s odd that ‘e’ and ‘E’ are equivalent, but ‘e’ with an accent mark is not equivalent to ‘E’ with the same mark? If it is, then I agree. Is there a good way to specify case-insensitivity for all Unicode characters?

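The asymmetry is easy to see side by side. This sketch contrasts an ALPHA-only comparison (lower-casing just A-Z) with Unicode default case folding via Python's `str.casefold`; the function names are hypothetical, and case folding is offered only as one possible answer to the question, not as what the draft specifies.

```python
import string

# Lower-case only the ASCII letters A-Z (the ALPHA production).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def equal_ascii_ci(a: str, b: str) -> bool:
    """Case-insensitive for ALPHA characters only; all others compare exactly."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def equal_unicode_ci(a: str, b: str) -> bool:
    """Case-insensitive using Unicode default case folding (str.casefold)."""
    return a.casefold() == b.casefold()


print(equal_ascii_ci("Example", "EXAMPLE"))          # True
print(equal_ascii_ci("caf\u00e9", "CAF\u00c9"))      # False: É is not in ALPHA
print(equal_unicode_ci("caf\u00e9", "CAF\u00c9"))    # True
```

Case folding brings its own questions: it is full folding, so "Straße" and "STRASSE" compare equal, and it does not reconcile precomposed and decomposed accented characters, which would additionally need Unicode normalization (e.g. NFC).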
Section 3 (I still need to do some reading)

Has there been any work on DECODING XRIs? It's not immediately clear from the ABNF that decoding is unambiguous.

I believe the decoding is mechanical and unambiguous. Dave?

In general, the escaping/unescaping mirrors the IRI work, along with one extra step for escaping () (parentheses). We definitely wanted to make sure the transformations were reversible.

I think the question is actually whether the BNF is unambiguous, i.e. does an XRI exist that could be interpreted in more than one way by the BNF? I’ve done some work in this area, but I certainly wouldn’t consider the BNF “proven” at this point.

In addition, aside from unresolvable references, is it possible to canonicalize XRIs? This is a highly desirable feature (for equivalence, at a minimum).

We talked quite a bit about this. The decision was made to be silent on canonicalization because equivalence is actually unambiguous given the rules stated. Now, that doesn't mean that it's at all obvious.

I do think giving names to the escaped vs. unescaped forms of XRI, at least, would be useful. Canonicalization would then just be transforming an identifier into one of those forms. We didn't want to mandate a single canonical form because different environments would need XRIs at different levels of escaping, and it would be unfortunate to require a specific canonical form that would force otherwise-unneeded transformations.

Again, Dave McAlpin probably has better input on this.

A canonical representation might be useful for comparison, but it would involve a formal definition of things like “minimally escaped”, which would be fairly difficult to nail down. It would also depend on the existence of a canonical form for URIs used as cross-references. In other words, an XRI wouldn’t have a canonical form if it contained cross-references that didn’t define a canonical form.

Note that equivalence rules are generally problematic. The IRI proposal, for example, completely dodges the question of equivalence when it says, “There is no general rule or procedure to decide whether two arbitrary IRIs are equivalent or not… Each specification or application that uses IRIs has to decide on the appropriate criterion for IRI equivalence.” 2396bis notes that even terms like “different” and “equivalent” are fuzzy in the general spec and ultimately application dependent.

An XRI is not a URI (because of the expanded syntax). But is a URI an XRI? (No, because of the different scheme (xri).) I think it would be nice for all URIs to be valid XRIs.

Well, by definition, URIs can't be XRIs, because URIs have different schemes - XRIs must all have the "xri:" scheme. I think the goal of having all URIs easily and trivially transformable into XRIs (i.e. remove the scheme and insert xri:) is laudable, though it's unclear that in many cases this makes a lot of sense. This is because XRIs are structured, and resolution of XRIs (at the very least) gives special meaning to the first segment (the authority) -- not all URIs are hierarchical or treat the first "segment" specially. Examples include mailto:, uuid:, cid:, etc.

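A quick way to see the problem with the "remove the scheme and insert xri:" transformation: Python's `urllib.parse.urlsplit` finds an authority (netloc) in a hierarchical http: URI but none in a mailto: URI, so for the latter the transformed XRI's first segment would be given authority meaning the original URI never had. `swap_scheme` is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlsplit


def swap_scheme(uri: str) -> str:
    """Hypothetical helper: drop the URI's scheme and prefix 'xri:'."""
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError(f"no scheme in {uri!r}")
    return "xri:" + rest


for uri in ["http://example.com/a/b", "mailto:bob@example.com"]:
    print(swap_scheme(uri), "authority:", repr(urlsplit(uri).netloc))
# xri://example.com/a/b authority: 'example.com'
# xri:bob@example.com authority: ''
```
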
Note also that it’s trivial to convert any legal URI into an XRI by simply enclosing it in a cross-reference, e.g. mailto:bob@example.com -> xri:(mailto:bob@example.com), though I don’t know that that’s generally useful.

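The cross-reference wrapping can be sketched as below. `as_cross_reference` is a hypothetical helper, and percent-escaping literal parentheses inside the wrapped URI is an assumption (loosely following the draft's extra step for escaping parentheses) so that they cannot be mistaken for the cross-reference delimiters.

```python
def as_cross_reference(uri: str) -> str:
    """Hypothetical helper: wrap a URI as an XRI cross-reference.

    Assumption: literal parentheses inside the URI are percent-escaped so
    they cannot be confused with the cross-reference delimiters.
    """
    inner = uri.replace("(", "%28").replace(")", "%29")
    return f"xri:({inner})"


print(as_cross_reference("mailto:bob@example.com"))
# xri:(mailto:bob@example.com)
print(as_cross_reference("http://example.com/a_(b)"))
# xri:(http://example.com/a_%28b%29)
```
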
Hope that kicks off the conversation and gives us editors some good pointers on where we need to focus on cleaning up the language.