docbook-apps message

Subject: Re: DOCBOOK-APPS: Sorting and non-en_US indexes

From: Jirka Kosek <jirka@kosek.cz>
To: David Cramer <dcramer@broadjump.com>
Date: Mon, 23 Sep 2002 09:49:13 +0200

David Cramer wrote:

> To use autoidx.xsl for non-English languages (in addition to using the
> classes for Saxon mentioned below), I have to modify autoidx.xsl in two
> ways:
> 
> 1) Supply upper and lower case letters of the alphabet which autoidx
> uses to create indexdivs. For languages lacking the distinction between
> upper and lower case, I just put the alphabet in both places so that
> indexdivs are created. Any word beginning with a character not in the
> alphabet provided here ends up in the symbol category.
> 2) Add an appropriate lang attribute to each xsl:sort in autoidx.xsl,
> whether hard coded or gotten by looking at a @lang somewhere in the
> input document so that Saxon will sort using the right Collator.

Yes.
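
A rough sketch of both changes for a Czech document follows. It assumes
that your copy of autoidx.xsl declares the lowercase and uppercase
entities in its internal DTD subset as quoted XPath string literals, so
check the file before copying anything. The template, its mode and the
select expression are made up for illustration and are not the real
stylesheet code.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!-- 1) Alphabet used to build the indexdivs. Both strings need the same
     length and order, since they are assumed to feed translate(). -->
<!ENTITY lowercase "'aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž'">
<!ENTITY uppercase "'AÁBCČDĎEÉĚFGHIÍJKLMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ'">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Hypothetical template and mode, for illustration only;
       this is not the real autoidx.xsl code. -->
  <xsl:template match="index" mode="sketch">
    <xsl:apply-templates select="//indexterm" mode="sketch">
      <!-- 2) Hard-coded language. Because lang is an attribute value
           template, lang="{/*/@lang}" would read it from the root
           element of the input document instead. -->
      <xsl:sort select="translate(normalize-space(primary),
                                  &lowercase;, &uppercase;)"
                lang="cs"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

Whether lang="cs" actually changes the order depends on the processor
having a collator for that language.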
 
> For languages with accented characters, my choices are:
> a) Add the accented characters to &uppercase; and &lowercase; and so
> have words that begin with an accented character end up in their own
> indexdivs, or
> b) Don't add these characters to &uppercase; and &lowercase; and so have
> words that begin with those characters end up in the Symbols indexdiv

Yes.

> c) Don't use words as indexterms if the first letter of the term has a
> diacritical mark of some kind :)

If you want a word beginning with an accented letter to be sorted as if
it had no diacritical mark, you can specify the sort key in the sortas
attribute.
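
For example, a minimal DocBook fragment (the French word is only an
illustration):

```xml
<indexterm>
  <!-- Displayed as "école", but sorted as if it were spelled "ecole" -->
  <primary sortas="ecole">école</primary>
</indexterm>
```

The sortas attribute is also available on secondary and tertiary.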
 
> For Traditional Chinese, where I understand indexdivs are based on the
> number of strokes rather than the initial character in the word,
> autoidx.xsl doesn't support automatically generated indexdivs. To do
> that, the stylesheet would have to be rewritten (and include the number
> of strokes in an attribute on the <primary> element).

Yes.
 
> I understand that currently there is no way to have the stylesheets
> store multiple alphabets for &uppercase; and &lowercase; and use the
> appropriate one without the intervention of a processing system. I'm
> thinking of something along the lines of storing the declarations for
> uppercase and lowercase in files (en.ent, fr.ent), include parameter
> entity declarations that point to these files, and a reference to one of
> them, then have the processing system munge my customization of
> autoidx.xsl so that it includes the correct entity reference before
> using the xsl to process the document. 

Unfortunately yes.
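
The setup described above could look roughly like the prolog below. The
file names en.ent and fr.ent come from the message. The sketch assumes
that each file contains nothing but the lowercase and uppercase entity
declarations for its language, and that the XML parser behind the XSLT
processor loads external parameter entities.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!-- One external file per language; each is assumed to declare the
     lowercase and uppercase entities for that language. -->
<!ENTITY % en SYSTEM "en.ent">
<!ENTITY % fr SYSTEM "fr.ent">
<!-- The processing system rewrites this single reference
     (%en; or %fr;) before each run. -->
%fr;
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- The rest of the customized autoidx.xsl goes here and uses
       &lowercase; and &uppercase; as before. -->
</xsl:stylesheet>
```

Only the line with the parameter entity reference changes between
languages, which is why an external processing step is still needed.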

> The alternative to something like
> that is to have a separate customization layer (with its own
> autoidx.xsl) for each target language.

This is not possible, because <xsl:key> declarations are merged rather
than replaced when you override one from an imported stylesheet. The
output will probably be very strange.
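
A sketch of the problem, under some assumptions: the key name "letter",
the use expression and the import path are guesses at what the stock
stylesheet looks like, not copied from it. In XSLT 1.0, key declarations
with the same name are all in effect at once, whatever their import
precedence.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY lowercase "'abcdefghijklmnopqrstuvwxyzé'">
<!ENTITY uppercase "'ABCDEFGHIJKLMNOPQRSTUVWXYZÉ'">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Hypothetical path to the stock stylesheet, which is assumed to
       define its own "letter" key using the English alphabet. -->
  <xsl:import href="docbook.xsl"/>

  <!-- This does not replace the imported key. Both definitions apply,
       so a term starting with "é" is indexed under "É" by this key
       and under "é" by the imported one. -->
  <xsl:key name="letter" match="indexterm"
           use="translate(substring(normalize-space(primary), 1, 1),
                          &lowercase;, &uppercase;)"/>

</xsl:stylesheet>
```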
 
> Some of these things I'll understand better as we get further in our
> experimentation, but it's helpful to know what behavior to expect since
> it saves you from debugging something that's really working as designed
> :) Once I've got this figured out, I'll write something up that we can
> include somewhere in the docs or faq.

Cool!

			Jirka

-- 
-----------------------------------------------------------------
  Jirka Kosek  	                     
  e-mail: jirka@kosek.cz
  http://www.kosek.cz

Follow-Ups:
- Re: DOCBOOK-APPS: Sorting and non-en_US indexes
  - From: Bob Stayton <bobs@caldera.com>

References:
- RE: DOCBOOK-APPS: Sorting and non-en_US indexes
  - From: David Cramer <dcramer@broadjump.com>