Subject: Proposal for alphanumeric version identifiers in XRI metadata spec
From: "Drummond Reed" <drummond.reed@cordance.net>
To: <xri@lists.oasis-open.org>
Date: Thu, 6 Apr 2006 12:57:12 -0700
Following is a proposal for discussion on today's XRI TC telecon. 

First, a little background. The current Committee Draft spec of XRI Metadata
2.0 (at
http://www.oasis-open.org/committees/download.php/11854/xri-metadata-V2.0-cd-01.pdf)
established four initial categories of XRI metadata: language, datetime,
version, and annotation.

In the new Working Draft under preparation by Marty Schleiff and myself (and
Dave McAlpin when he is available), we are deprecating one of these four
categories (annotation, per a message to the list last month), and adding
one new category (identifier type metadata -- $t -- also discussed
extensively last fall). Of the other three, language will remain largely
unchanged (except for the newer, less ambiguous ABNF). However, we are
proposing some changes to datetime and version.

This message summarizes the proposed changes to version metadata (which
I suspect will take up today's call time). Once we finish that discussion, a
subsequent message/discussion will summarize datetime.

*** PART ONE: PROPOSAL FOR MASTER ABNF ***

First, here's the proposed "master ABNF" that now governs all XRI metadata:

xri-metadata-exp	= "$" metadata-tag [ "*" metadata-subtag ] "*" target
metadata-tag	= alpha / xref
metadata-subtag	= 1*xri-pchar / xref
target		= *xri-pchar / xref

As explained in an earlier message several months ago, this is based on the
basic RDF subject-predicate-object pattern as follows:

Subject		= target
Predicate		= metadata-tag
Object		= metadata-subtag

Note that in the master ABNF, a metadata-subtag value is OPTIONAL. Since a
specific metadata tag can only further restrict and not loosen this ABNF,
that means some types of metadata tags may omit the subtag value (i.e.,
declare a default subtag value) and others may require a subtag. Of the four
categories of metadata, $d datetime and $v version have defaults and $l
language and $t type do NOT have defaults and thus require a subtag.

This ABNF also means all XRI metadata is now defined as describing the
target identifier included in the cross-reference containing the metadata
(which Marty and I are calling the "metadata expression"). The semantics of
this description are defined unambiguously in the spec with regard to the
target identifier; however, the interpretation of the metadata expression as
a whole (meaning relative to the parent node(s) or child node(s) in the XRI)
is left to the authority for the XRI.

For instance, in the following examples...

	xri://(example.root)*delegate/($l*fr*mot)/resource
	xri://(example.root)*delegate/resource*($v*2.1)
	xri://(example.root)*delegate/resource*($d*2000-01-12T12:13:14Z)

...the metadata "$l*fr" describes the identifier "mot", the metadata "$v"
describes the identifier "2.1", and the metadata "$d" describes the
identifier "2000-01-12T12:13:14Z".
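
To make the tag / subtag / target split concrete, here is a minimal Python
sketch that pulls apart a metadata expression once the enclosing
cross-reference parentheses have been stripped. The names MetadataExp and
parse_metadata_exp are hypothetical, and the sketch assumes that no xrefs
appear in any position and that "*" never occurs inside a subtag or target.
It illustrates the master ABNF and is not a conforming parser.

```python
from typing import NamedTuple, Optional


class MetadataExp(NamedTuple):
    tag: str               # metadata-tag (the RDF predicate)
    subtag: Optional[str]  # metadata-subtag (the RDF object), None if omitted
    target: str            # target (the RDF subject)


def parse_metadata_exp(exp: str) -> MetadataExp:
    """Split an expression such as '$l*fr*mot' into tag, subtag and target.

    Assumption: xrefs are not supported and '*' only appears as a delimiter.
    """
    if not exp.startswith("$"):
        raise ValueError("metadata expression must start with '$'")
    parts = exp[1:].split("*")
    if len(parts) == 2:
        # No subtag present: the tag's default subtag applies.
        tag, target = parts
        subtag = None
    elif len(parts) == 3:
        tag, subtag, target = parts
        if subtag == "":
            raise ValueError("a subtag, when present, must not be empty")
    else:
        raise ValueError("expected '$tag*target' or '$tag*subtag*target'")
    # metadata-tag is a single alpha character in this simplified sketch.
    if len(tag) != 1 or not (tag.isascii() and tag.isalpha()):
        raise ValueError("metadata tag must be a single alpha character")
    return MetadataExp(tag, subtag, target)
```

For example, parse_metadata_exp("$l*fr*mot") yields tag "l", subtag "fr" and
target "mot", while parse_metadata_exp("$v*2.1") yields tag "v", no subtag,
and target "2.1".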

*** PART TWO: PROPOSAL FOR VERSION METADATA ABNF ***

Now, given this master, here's the proposed ABNF for version metadata:

	xri-version-exp	= "$v" [ "*" ver-subtag / xref ] "*" ver-target

In CD01, a version tag could not include a subtag -- there was only one
version identifier format (numeric). This new ABNF allows version metadata
to include a subtag value, however because it is OPTIONAL, there is a
default value when a subtag is not present.

Marty and I originally planned to define several version subtags -- for
example, one for numeric, one for alpha, and one for alphanumeric -- and
specify one as the default. However we realized that if it was done right,
we could just define alphanumeric and make it the default and be done with
it, as it is a superset of both alpha and numeric version identifiers.

The ABNF turns out to be very simple:

	def-ver-target	= version-segment *( [ ver-seg-delim ] version-segment )
	version-segment	= 1*digit / alpha
	ver-seg-delim	= "." / "-"

Note that the version segment delimiters (dot or dash) are OPTIONAL. This
allows all of the following as valid version metadata expressions:

($v*2.1)
($v*2.12)
($v*2.12.4)
($v*2.1a)
($v*2.1ab)
($v*2.1a-b)
($v*2.1a1)
($v*a)
($v*a1)
($v*a12)
($v*a1a)
($v*a1ab74)
($v*a-1-ab.7.4)
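
As a quick check of the ABNF above, the following Python sketch tests
whether a string matches def-ver-target. The function name
is_valid_ver_target is hypothetical. The sketch relies on one observation:
because delimiters are optional and a segment is either a run of digits or a
single letter, any non-empty run of ASCII letters and digits can always be
split into valid segments, so the grammar reduces to alphanumeric runs
separated by single dots or dashes.

```python
import re

# Alphanumeric runs separated by exactly one "." or "-".
# This accepts the same strings as
#   version-segment *( [ ver-seg-delim ] version-segment )
# but avoids the ambiguous nesting of the literal translation.
_VER_TARGET = re.compile(r"[0-9A-Za-z]+(?:[.-][0-9A-Za-z]+)*")


def is_valid_ver_target(target: str) -> bool:
    """Return True if target matches the proposed def-ver-target ABNF."""
    return _VER_TARGET.fullmatch(target) is not None
```

With this sketch, every target in the list above (2.1, 2.1a-b, a1ab74,
a-1-ab.7.4, and so on) is accepted, while strings with a leading, trailing,
or doubled delimiter such as "2..1" are rejected.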

Now, the question is, if these are all valid version metadata expressions,
what are the normalization and comparison rules? This was a little tricky
to figure out, but they end up being quite simple:

** Normalization Rules **

1) Normalize all alpha characters to lowercase.

2) Normalize all delimiters to dots.

3) For all sequences of alpha characters, add dot delimiters so that every
alpha character becomes a single-character version-segment. Example:
($v*abc) becomes ($v*a.b.c) and ($v*2.1c5d) becomes ($v*2.1.c.5.d). This
means all segments are either a single alpha character or one-or-more digit
characters.

4) For all digit segments, remove leading zeros.
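
The four normalization rules can be sketched in Python as follows. The
function name normalize_ver_target is hypothetical, and the sketch assumes
its input is already a valid version target (characters other than ASCII
letters, digits, dots, and dashes are not checked for).

```python
import re


def normalize_ver_target(target: str) -> str:
    """Apply normalization rules 1-4 to a (valid) version target."""
    # Tokenize into segments: a run of digits or a single letter.
    # Existing delimiters are dropped here and re-added as dots below.
    segments = re.findall(r"[0-9]+|[A-Za-z]", target)
    normalized = []
    for seg in segments:
        if seg.isdigit():
            normalized.append(str(int(seg)))  # rule 4: strip leading zeros
        else:
            normalized.append(seg.lower())    # rule 1: lowercase alphas
    # Rules 2 and 3: every segment is separated by a dot.
    return ".".join(normalized)
```

For example, normalize_ver_target("2.1c5d") gives "2.1.c.5.d", matching the
example in rule 3.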

** Comparison Rules **

After normalization, working from left-to-right, compare each segment value
according to the following four rules:

a) An alpha-segment comes before a digit-segment.

b) All alpha-segments are compared by ASCII value. The higher ASCII value is
the later version value.

c) All digit-segments are compared by the integer value of the entire
sequence. The higher integer is the later version value.

d) If all version-segment values are equivalent but one version-target has
more version-segments than another, the version-target with more
version-segments is the later version value.
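
One way to sketch these four rules in Python is as a sort key over
already-normalized version targets. The name version_sort_key is
hypothetical, and the input is assumed to be in normalized form
(dot-separated, lowercase, no leading zeros).

```python
def version_sort_key(normalized: str) -> tuple:
    """Build a key whose ordering follows comparison rules a) through d)."""
    key = []
    for seg in normalized.split("."):
        if seg.isdigit():
            # Rule c: digit-segments compare by integer value.
            # The leading 1 places them after alpha-segments (rule a).
            key.append((1, int(seg), ""))
        else:
            # Rule b: alpha-segments compare by ASCII value.
            key.append((0, 0, seg))
    # Rule d: tuples compare element by element, and when one is a prefix
    # of the other, the longer tuple sorts later.
    return tuple(key)
```

Sorting a list of normalized targets with sorted(targets,
key=version_sort_key) then yields earliest-to-latest order.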

Examples in order of earliest-to-latest:

RAW                    NORMALIZED
($v*02.4)              ($v*2.4)
($v*02.4e)             ($v*2.4.e)
($v*02.4ef)            ($v*2.4.e.f)
($v*02.4g)             ($v*2.4.g)  
($v*02.5)              ($v*2.5)
($v*02.51)             ($v*2.51)
($v*2.52)              ($v*2.52)
($v*20.52)             ($v*20.52)
($v*21)                ($v*21)
($v*21a)               ($v*21.a)
($v*21ab)              ($v*21.a.b)
($v*21abc)             ($v*21.a.b.c)
($v*21a1)              ($v*21.a.1)