Subject: Subject identification and ontological commitment : a real-world example
Patrick, and all

> Since we are now procedurally valid, let's turn our attention to
> *substantive questions* and see if we can capitalize on all the good work
> that Mary reminded me has already been done for PubSubj.

I have underlined *substantive questions* in Patrick's post, and I fully agree with that position. Below I try to focus on the question I consider the most substantive one at this point, and which has *always*, during the past two years of "good work", been either left on the back burner or swept under the carpet (pick your metaphor). The various debates we had around this non-explicit question got stuck because everyone had his/her own implicit a priori answer(s) (the way it usually goes until questions are made explicit).

This question was at the core of my former proposal to use OWL for PSIs. But when I made that proposal, I certainly pushed the answer too quickly, before setting out the question clearly - at the time it was certainly not completely clear in my mind. Moreover, the proposal carried too much political context to be popular.

So let's forget about any language, technical or process solution for the moment, and focus on the following questions.

Q1: Is subject identification independent from ontological commitment?
I expand below on the two concepts employed here, and on why I consider the answer to this question to be "no".

Q2: If the answer to Q1 is "no", how can we articulate the two concepts in our recommendations?

Any further technical recommendation for PSI structure, metadata, publishing process, use ... should be based on explicit answers to Q1 and Q2, and a consensus on those is IMO a prerequisite to any further deliverables.

We have addressed in Del 1 the question of subject identifiers and subject indicators, but we have not really addressed the question of *subject identification*.
Subject identification is based on an agreement to use some type of subject identifiers, following the same set of rules, in some type of processing context. For example, the XTM use of subject identifiers is such a processing context. Subject identification in XTM processing is tied to the use of identifiers in some specific way, such as under <subjectIndicatorRef>. If a subject identifier (URI) is used under <occurrence>, it does not necessarily support a process of subject identification (and merging).

And if the same subject identifier is used outside the XTM context, what could/should the identification process and rules be? Do we let every other user of PSIs set their own rules for subject identification outside TM? Or do we mention/recommend processing contexts and rules (e.g. in XML, RDF, OWL, UDDI, DC metadata ...)?

Let me take a real-world example, where universal identifiers (ISBN numbers) are efficiently used for subject identification in a distributed environment.

http://isbn.nu is a very cool site providing syndicated search on books based on Author, Subject, Title or ISBN. http://isbn.nu/about.html says much about our issues in a nutshell.

"This site is a proof of concept of several ideas about information management, organization, and linkage. It's also an attempt to show how smarter systems combined with cleaner URLs can create shortcuts around roadblocks."

Look at how it works. Type either "0-534-94965-7" or "0534949657" in the ISBN search field. The URL generated from that search is http://isbn.nu/0534949657 - de facto an efficient subject identifier, in this context, for John Sowa's book "Knowledge Representation". Note the clean syntax: no weird query string, as simple as can be. If you search by author, subject or title, you will retrieve a list of books, each identified by one of those ISBN-URL-PSIs.
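The mapping from either ISBN spelling to a single URL can be sketched as follows. This is only an illustration of the idea, not a description of how isbn.nu is implemented: the function names are hypothetical, and the only things taken from the example above are the two input forms and the resulting URL pattern. The check-digit test is the standard ISBN-10 rule (weighted sum divisible by 11).

```python
# Hypothetical helper names; only the URL pattern http://isbn.nu/<isbn>
# and the two input spellings come from the example in the text.

def normalize_isbn10(raw: str) -> str:
    """Return the bare 10-character form of an ISBN-10, or raise ValueError."""
    # Drop the hyphens and spaces that people type in different places.
    bare = raw.replace("-", "").replace(" ", "").upper()
    body_ok = len(bare) == 10 and all(c in "0123456789" for c in bare[:9])
    if not body_ok or bare[9] not in "0123456789X":
        raise ValueError(f"not an ISBN-10: {raw!r}")
    # Standard ISBN-10 check: weights 10..1, 'X' counts as 10,
    # and the weighted sum must be divisible by 11.
    total = sum(
        (10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(bare)
    )
    if total % 11 != 0:
        raise ValueError(f"bad ISBN-10 check digit: {raw!r}")
    return bare


def isbn_to_identifier(raw: str, base: str = "http://isbn.nu/") -> str:
    """Build the URL form of an ISBN, one URL per book whatever the spelling."""
    return base + normalize_isbn10(raw)


if __name__ == "__main__":
    print(isbn_to_identifier("0-534-94965-7"))
    print(isbn_to_identifier("0534949657"))
```

Both calls yield the same string, which is the whole point: the one-to-one correspondence between identifier and subject only holds once every participant applies the same normalization rule.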
What you get from that URL is a search result syndicated from various booksellers, including current availability and prices, and links to partner sites. The process obviously uses the ISBN identifier throughout to query different databases in various ways, which suggests that all the partners have agreed on how to deal with ISBNs, both as internal subject identifiers and for syndication transactions. BTW, the results are very impressive.

One interesting thing is that one of the partners is amazon.com. But if you search directly at amazon.com for "ISBN 0534949657" or "ISBN 0-534-94965-7", you get total silence for the former and total noise for the latter. So the same database, depending on the processing context, may or may not make sense of subject identifiers.

I find this example very interesting food for thought, and a wonderful illustration of what subject identifiers can achieve in a distributed environment. What we can learn from it is why and how it works so well. Some reasons can be listed.

1. The class of subjects which are uniquely identified in the process (Book) is clearly defined and well known to all the actors in the information system: syndication site, providers, end users.

2. All the actors are aware of ISBN as being an identifying attribute for this class Book, and use it that way.

3. All the actors make the same sense of the attributes attached to (and in some sense defining) instances of that class, either generic and permanent ones (title, author, publisher, publication date) or local, context-defined ones (sales price, availability, time to ship).

4. The issue of identifiers being URIs or not is tackled here in the simplest way possible. Internally, the system certainly uses only the ISBN itself, but the Web human interface uses a URI-PSI-fied form of the ISBN, with an obvious one-to-one correspondence.
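Points 1 to 3 can be pictured with a small sketch, under loud assumptions: the class names, the record layout and the provider names below are all hypothetical, chosen only to show a shared notion of Book with one identifying attribute (the ISBN), generic attributes, and local context-defined ones. Nothing here is known about the actual syndication format.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Offer:
    """Local, context-defined attributes contributed by one provider."""
    provider: str
    price: float
    available: bool


@dataclass
class Book:
    """The shared class: ISBN identifies, title and author are generic."""
    isbn: str
    title: str
    author: str
    offers: List[Offer] = field(default_factory=list)


def syndicate(records: Iterable[dict]) -> Dict[str, Book]:
    """Merge provider records into one Book per ISBN.

    Assumes (hypothetically) that each record is a dict with the keys
    isbn, title, author, provider, price and available.
    """
    books: Dict[str, Book] = {}
    for rec in records:
        # Same identification rule for every provider: hyphens are ignored.
        isbn = rec["isbn"].replace("-", "")
        if isbn not in books:
            books[isbn] = Book(isbn, rec["title"], rec["author"])
        books[isbn].offers.append(
            Offer(rec["provider"], rec["price"], rec["available"])
        )
    return books


if __name__ == "__main__":
    merged = syndicate([
        {"isbn": "0-534-94965-7", "title": "Knowledge Representation",
         "author": "John Sowa", "provider": "shop-a",
         "price": 70.0, "available": True},
        {"isbn": "0534949657", "title": "Knowledge Representation",
         "author": "John Sowa", "provider": "shop-b",
         "price": 65.5, "available": False},
    ])
    for book in merged.values():
        print(book.isbn, book.title, [o.provider for o in book.offers])
```

The merge only works because both hypothetical providers agree on what a record describes and on which field identifies it; take away either agreement and the same ISBN string no longer supports identification.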
It is clear that points 1, 2 and 3 above boil down to having all actors in the process commit implicitly to the same ontology for the class Book - an ontology which could easily be made explicit and formally expressed using the attributes quoted above. Some explicit form of it, whatever its formal expression, is certainly in place under the hood to allow syndication of content between http://isbn.nu and the providers.

So, coming back to Q1, we see from this example that efficient subject identification needs some ontological commitment from all the users of the identifier in the same context. And coming back to Q2, the example gives some hints of what this ontological commitment consists of, and of how it could be made explicit by formal reference to a common ontology.

Hope that helps to understand what I am getting at now.

Bernard