Subject: RE: [search-ws-comment] Facet ranges in SRU 2.0

I am not sure we are all talking about the same model for range facets.


In the range facet model for SRU, the server might support, for example, a facet 'datePublished' for which it has indexed all publication dates. A client that wants counts only for dates from August 1, 2010 through August 31, 2010 would indicate these in facetLowValue and facetHighValue.
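To make this first model concrete, here is a minimal Python sketch. It is only an illustration, not how any actual SRU server works. It assumes the server keeps the indexed publication dates for 'datePublished' as a plain list and treats facetLowValue and facetHighValue as inclusive bounds. The function name `range_facet_count` is hypothetical.

```python
from datetime import date


def range_facet_count(indexed_dates, low, high):
    """Count indexed dates d with low <= d <= high.

    Treating both bounds as inclusive is an assumption made for this
    sketch, not something taken from the SRU 2.0 text.
    """
    return sum(1 for d in indexed_dates if low <= d <= high)


# Hypothetical index of publication dates behind the 'datePublished' facet.
published = [date(2010, 7, 20), date(2010, 8, 1), date(2010, 8, 15),
             date(2010, 8, 31), date(2010, 9, 2)]

# Client request: facetLowValue = August 1, 2010, facetHighValue = August 31, 2010.
print(range_facet_count(published, date(2010, 8, 1), date(2010, 8, 31)))
```

Because the bounds arrive with the request, the same index can answer "July 15 through August 15" just as easily. That is what the second model below cannot do.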


One can envision a different model, in which the server groups publication dates by month, so that there is a facet for 'August2010', 'July2010', etc. (Perhaps it is still one big index, but the server exposes different facets by month and lists them via Explain.) In this case, if the client wants August 1, 2010 through August 31, 2010, it specifies the August2010 facet. But if it wants July 15 through August 15, it is out of luck, because the ranges are pre-assigned by the server.


The point is that in this latter case (ranges pre-assigned by the server), the notion of range facets as defined in SRU 2.0 does not apply.


Edo, before we carry this discussion further, can you clarify which of these two models we are talking about in your case?




If the client wants control over the labeling of the facets, it can relabel them however it wants. There is no need for the client to be able to specify the labels that the server returns.


Let’s keep this as simple as we can until we get a real need for something more complicated.




Thanks for thinking this through with me, and sorry for the late reply: it's been a busy week...

Before I reply, let me say that I am using only these documents as the basis for this discussion:

the example response XML at http://www.loc.gov/standards/sru/oasis/schemas/facetedResults.xml and the SRU 2.0 specification at http://www.loc.gov/standards/sru/oasis/current/sru-2-0.doc. If I have missed something, please let me know.


I think we are getting somewhere. Ray said:

"What I think you are asking for is
-  an additional request parameter, for example  &rangeFacetLabel="published last week"; and
-  an additional response element,  <rangeFacetLabel>, a subelement of facet, which simply echoes the request element. "

Ralph agrees that there is a need for an extra response element, but he suggests that it should be a child element of <term> (which in turn is a child element of <facet>). I agree with that.


However, I believe that one of the problems we are encountering here is that there is a hierarchical relationship in both the request parameters and the response XML. I believe Ralph's suggestion solves the response part, but the issue for the request part remains: there are no hierarchical request parameters, and hierarchy in a URL is not ideal. I believe there are two things that could be done here:

1) Avoid the request problem altogether by not allowing the client to specify request parameters for range facets. If a client requests a facet on dcterms:issued, the *server* will decide:

A) that the <facetDisplayLabel> should be, say, "Publication date" (by the way, I am assuming the server always chooses the facetDisplayLabel, as there is no way to define it in the request parameters); and

B) that the facet terms are both defined *and* labeled as, say, "First quarter 2010", "Second quarter 2010" and "Third quarter 2010".

My reasoning is this: to me, it does not make practical sense to allow a client to define a range but not to label it (otherwise the server has to guess a logical name, and how is a computer supposed to know what a logical name for a range is?). It also does not make sense to me to allow only one range to be defined per index (since that is not really a facet, IMHO).
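Alternative 1) could look roughly like the Python sketch below. It assumes the server buckets dcterms:issued values into calendar quarters and chooses both the facetDisplayLabel and the term labels itself. The function `server_issued_facet` and the dictionary layout are hypothetical; they are not the SRU response format.

```python
from collections import Counter
from datetime import date

ORDINALS = ["First", "Second", "Third", "Fourth"]


def server_issued_facet(issued_dates):
    """Build a dcterms:issued facet whose ranges and labels are fixed by the server.

    Hypothetical sketch: dates are bucketed by calendar quarter, and each
    bucket gets a server-chosen label such as "First quarter 2010".
    """
    # Key each date by (year, quarter index 0..3) so terms sort chronologically.
    counts = Counter((d.year, (d.month - 1) // 3) for d in issued_dates)
    terms = [(f"{ORDINALS[q]} quarter {year}", n)
             for (year, q), n in sorted(counts.items())]
    return {"facetDisplayLabel": "Publication date", "terms": terms}


sample = [date(2010, 2, 10), date(2010, 3, 31), date(2010, 5, 1), date(2010, 8, 20)]
print(server_issued_facet(sample))
```

The client sends only the index name. The ranges and their labels are entirely the server's choice, and that is the flexibility traded away in this alternative.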

2) Have some sort of hierarchical way of requesting facets, so that a client could request the following facet (in natural language):

- Give me a facet on dcterms:issued with facet terms:

   "First Quarter 2010" (defined as 1-1-2010 until 31-3-2010),

   "Second Quarter 2010" (defined as 1-4-2010 until 30-6-2010),

   "Third Quarter 2010" (defined as 1-7-2010 until 30-9-2010).

There is a hierarchical XML suggestion for this at http://facetmap.com/demo/browse.jsp (look under "browse price") and http://facetmap.com/spec/, but it will not work for facets defined in a URL (because of the hierarchical relationship). In Solr the problem has been partly solved: http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range

Maybe something like this would work:


&facetTermLabel:dcterms:issued="First Quarter 2010"&facetLowvalue=1-1-2010&facetHighvalue=31-3-2010

&facetTermLabel:dcterms:issued="Second Quarter 2010"&facetLowvalue=1-4-2010&facetHighvalue=30-6-2010

&facetTermLabel:dcterms:issued="Third Quarter 2010"&facetLowvalue=1-7-2010&facetHighvalue=30-9-2010

The underlying assumption would be that each term label and its definition appear consecutively (preserving the sibling relationship). Hmm, I don't know....
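The order-dependent proposal can be sketched in Python as follows. It assumes that in a real URL the labels would be percent-encoded (spaces are not valid as written), and that each facetLowvalue/facetHighvalue belongs to the most recent facetTermLabel:<index>. `group_consecutive` is a hypothetical name. The June and September end dates use the 30th, since those months have 30 days.

```python
from urllib.parse import parse_qsl, urlencode


def group_consecutive(query):
    """Pair each facetTermLabel:<index> with the bounds that follow it.

    Hypothetical parser for the proposal above: it relies entirely on
    parameter order, which parse_qsl preserves.
    """
    terms = []
    for key, value in parse_qsl(query):
        if key.startswith("facetTermLabel:"):
            # A new label starts a new facet term for the named index.
            terms.append({"index": key[len("facetTermLabel:"):], "label": value})
        elif key in ("facetLowvalue", "facetHighvalue"):
            if not terms:
                raise ValueError(f"{key} appears before any facetTermLabel")
            terms[-1][key] = value  # attach to the most recent label
    return terms


query = urlencode([
    ("facetTermLabel:dcterms:issued", "First Quarter 2010"),
    ("facetLowvalue", "1-1-2010"), ("facetHighvalue", "31-3-2010"),
    ("facetTermLabel:dcterms:issued", "Second Quarter 2010"),
    ("facetLowvalue", "1-4-2010"), ("facetHighvalue", "30-6-2010"),
    ("facetTermLabel:dcterms:issued", "Third Quarter 2010"),
    ("facetLowvalue", "1-7-2010"), ("facetHighvalue", "30-9-2010"),
])
for term in group_consecutive(query):
    print(term)
```

The pairing exists only in the order of the parameters. A client library or proxy that rebuilds the query string from an unordered map could silently attach bounds to the wrong label.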

Or maybe defining a helper variable is the best way:


&rangeField:dcterms:issued:tempvar=Q1&facetTermlabel.Q1="First Quarter 2010"&facetLowvalue.Q1=1-1-2010&facetHighvalue.Q1=31-3-2010

Hmm, I don't know...
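With a helper variable, the suffix (not the parameter order) ties a label to its bounds. A minimal Python sketch follows. It assumes the parameter names exactly as written above and treats the helper variable as the part after the last ".". It also assumes one rangeField:<index>:tempvar=<var> declaration per term; the example above shows only Q1. `group_by_tempvar` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode


def group_by_tempvar(query):
    """Collect range-facet terms keyed by helper variable, regardless of order.

    Hypothetical parser: rangeField:<index>:tempvar=<var> declares <var>,
    and <name>.<var>=<value> attaches an attribute to it.
    """
    index_of = {}  # helper variable -> index name
    attrs = {}     # helper variable -> {attribute name: value}
    for key, value in parse_qsl(query):
        if key.startswith("rangeField:") and key.endswith(":tempvar"):
            index_of[value] = key[len("rangeField:"):-len(":tempvar")]
        elif "." in key:
            name, var = key.rsplit(".", 1)
            attrs.setdefault(var, {})[name] = value
    return {var: {"index": index_of.get(var), **attrs.get(var, {})}
            for var in index_of.keys() | attrs.keys()}


# Deliberately shuffled order: the suffix .Q1 still groups everything correctly.
query = urlencode([
    ("facetHighvalue.Q1", "31-3-2010"),
    ("rangeField:dcterms:issued:tempvar", "Q1"),
    ("facetTermlabel.Q1", "First Quarter 2010"),
    ("facetLowvalue.Q1", "1-1-2010"),
])
print(group_by_tempvar(query))
```

This removes the dependence on ordering, at the cost of longer, harder-to-read URLs.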


To me, alternative 1) is best: you lose some flexibility, but I guess for many collections this is not much of a problem. On the upside, this alternative is less complicated and keeps the SRU standard manageable.

Alternative 2) is more flexible, but too complicated, in my opinion.


I am curious to hear your reply.


Kind regards,



