Introduction.1
This article explores the potential of electronic text technology,
specifically Extensible Markup Language (XML) and Extensible Stylesheet Language
Transformations (XSLT), in the study of mantra and text resources. The
reader should know that due to the necessity for clear explanation of this
technology a detailed, incisive, and otherwise comprehensive treatment of mantra
theory must remain secondary. Not unlike the competing demands for
developmental time that a scholar of the humanities finds when trying to add
digital technology to their arsenal of research techniques, this article strives
for an appropriate balance. Accordingly, the citations related to mantra use
are points of academic reference rather than a suggestion of rhetorical
conclusiveness.
In the present document, the phenomenon of mantra
presentation (specifically Vedic and Tantric) will serve as an analog and a
source of examples in the illumination of the new, extensible text technology for
research represented by XML and its related standards. There is a great deal
more to the technology and related tools for working with XML than the basic
outline of its primary characteristics presented below. However, if strictly
followed, what I have summarized will give you sufficient grasp to begin
benefiting from this technology in your own work. I've prepared files of
working examples, as well as a special edition of the Rig Veda in XML, which
accompany this publication online.2
In the case of XSLT, the reader who is already familiar with
this powerful document command set will know that I've barely scratched the
surface. Still, what is possible with a simple set of basic commands is quite
powerful. In addition, I ask the technically savvy reader to accept, in some
places, more intuitive descriptions of the commands than the verbatim wording
of the specification, in order to ease the learning curve.
Which additional functions prove useful will depend on each reader and on the
electronic texts available for his or her work. The examples below concentrate
on the core functions. For instance, I will not present the perfunctory and
largely esoteric lines of code that come before and after the key operators I am
explaining. These are somewhat opaque to the uninitiated and are themselves
largely unchanging from example to example. As a result, they could distract the
reader from understanding basic operations and can therefore be learned later as
needed.3 Unfortunately, I can assure you of no macrocosmic wisdom or
significance lying behind these code snippets.
This is not that different from the basic premise that a
disciple can only begin to direct the deity's power after s/he renders themself
fit to do so (Beyer, 1978:36). In just the same way, XSLT can be unleashed through
repetition of core techniques in working with primary sources. Just as
japa cultivates power and concentration (Sadhu Santidev, 1999, vol. II:112), so
the core units of condensed code, when carefully practiced, can extend one's
ability with XSLT across the full gamut of its potential.
I could draw the analogy further, and it would not be
disingenuous to do so. There is a conceptual parallel with the function of
mantra in Vedic and Tantric practice which became apparent as I looked through
various sources for examples which might make the application of the technology
appear more familiar. I will briefly revisit this discussion in the third and
final section of this paper.
With regard to the title of this paper, I was reading Stephan
Beyer's The Cult of Tara and noticed his citation of Kierkegaard:
"Mysticism has not the patience to wait for God's revelation . . ." Beyer
continues, "if we should ever be forced to attempt a definition of 'Tantra,' we
would say that it is a technique for magically storming the gates of Buddhahood
. . . In the broadest sense, magic is the manipulation of a distant object
through control of a simulacrum that is in some way associated with it" (Beyer, 1978:92).
Similarly, XSLT, an accessible, efficacious tool for working
with electronic texts, is directly associated with XML. It is written according
to XML's rules and acts upon, or manipulates, it. XSLT is a simulacrum for
navigating and traversing the components of an electronic text resource.
Considering how long scholars in non-technical disciplines have awaited
accessible and powerful tools for working with their resources, XSLT is a
"magical storming of the gates" of technological power.
The reader must understand that the discussion of mantra
herein occurs for two related, but very different, reasons. The first, and most
pedestrian, concerns the context in which this paper is presented, a journal of
tantric studies, where building the applied technology examples upon the
subject matter is helpful. The second, and more evocative, is the useful
analogy for understanding the relationship between XML and XSLT, the former
representing the core semantics (or syllables) and the latter the effectuation
(cf. Beyer, 1978:37, 243). In neither case am I suggesting an elevation
of XML technology to tantric praxis, or a misrepresentation of the latter as the
former.
Before the technology for research with XML can be
demonstrated or further analogized with respect to mantra and its study,
however, the core facets must be presented in some detail. It is assumed that
the reader has a working knowledge, or at least familiarity with, the basics of
markup technology. In other words, some rudimentary grasp of how HTML looks and
works will make the following discussion infinitely more fruitful. In the
absence of such prior knowledge, the astute reader will nonetheless find the
basic underpinnings of XML technology remarkably accessible.
The paper has three sections. In the first, I will introduce
XML. In the second, XSLT will be discussed. The third section will present
applications of this technology for identifying and extracting mantra material
for various ritual uses of the Rig Veda. With a little practice, you can also
adapt them to serve your own research interests. An appendix regarding access
to and installation of the related tools has also been included. It is assumed
that the reader has a working familiarity with the World Wide Web, some
awareness of HTML, and the ability to download a file and perform simple
installations of software.
I. Extensible Markup Language and the Birth of E-textnology
Context. With the recent advent of XML and its attendant suite of
technologies, scholars have at their disposal a sophisticated set of text
analysis and manipulation tools that do not demand mastery of overly abstract
programming languages. Electronic texts, or e-texts, can have handles and/or
topic "markers" built into them, and continuously added to them, according to a
scholar's own (or a collaborative group's) research agenda throughout the course
of a career.
Appropriately, then, we have before us the era of electronic
text technology, or "e-textnology." This new development builds upon the very
earliest and most consistent threads of computer technology development. It
derives from one of the most consistent and unchanging underpinnings of the
otherwise always-changing world of digital technology, the syntax of electronic
text markup.4
Since the late 1980s and before, computer technology has
striven to accurately manage text data with the same efficiency that marks its
handling of more abstract data. In fact, efforts to devise an artificial
intelligence system to translate between languages took place as early as the
1950s. The U.S. government was anxious to expedite access to Soviet espionage
materials regarding their missile program (Russell and Norvig, 1995:20f.).
It is from these early efforts that the classic example of the
failure of a computer to understand language came about. It was most notably
popularized by Noam Chomsky as an example in the discussion of how individual
language utterances are forever unique rather than pre-patterned (cf. his
effective refutation of Skinner and behaviorism on this point, in Syntactic
Structures, 1957): a computer effort to translate the biblical quote "the
spirit is willing, but the flesh is weak," produced "the wine is agreeable, but
the meat is spoiled."5
In fact, this particular event served as a rallying cry around
which the so-called "AI winter" was identified as having begun (Russell and
Norvig, 1995:21, 24f.). The space program then proceeded to dominate much of
the development of artificial intelligence technology. Nonetheless, scholars of
Sanskrit were among the earliest pioneers in electronic text technology, or
e-textnology. I'm referring of course to the landmark achievement of Lehman and
Ananthanarayanan in 1971 with their digitization of the Rig Veda and Shatapatha
BrAhmaNa under a National Endowment for the Humanities Grant. This is not
surprising considering the long-standing deference to Sanskrit linguistic theory
throughout the Artificial Intelligence community.6
Subsequent to these early efforts, work began in the 1980s to
develop a system by which the structure and content within a text could be
identified. As noted above, the term coined for this technology was "markup" or
"text markup." The original set of rules for writing markup code is known as
SGML, or Standard Generalized Markup Language. SGML was the guiding
specification that standardized the way that its many offspring markups were to
be written.
The most well-known of these offspring is, of course, HTML.
HTML has its own spirit/flesh shortcomings, however. Take for instance the
occurrence of italics in any given sentence. An HTML page will show you italics,
and it's up to you to figure out why they are there. The basic technology
behind markup is nonetheless remarkably simple.
Angle brackets, or less-than/greater-than symbols, are used to
indicate an element of a document which can be described by a general name, such
as one to designate an author: <author>Shakespeare</author>. These
brackets and the information they contain are usually called tags.7 They
generally appear immediately before and immediately after whatever it is that
they are marking. The tags are usually only visible to the computer or
processing software. If it sounds simple, it really is. In spite of the
commercially hyped changes in technology, this basic format has remained
unchanged for over 15 years.
When you tag the text in this way it becomes possible to make
your computer work with text material which has been marked by the tags as
though it knew what the text was about. At a simple level, this means that when
you search an XML-based database or web resource for Tantra ritual, you can exclude
certain unscrupulous online sources of questionable value. Or, for instance, in
a search for "Xanadu" you can avoid turning up songs by Olivia Newton-John and,
instead, focus on phonetically rich poetic offerings by Samuel Taylor Coleridge.
Of course, the technology to work with markup has largely remained so difficult
to learn and costly to employ that scholars have had little or no time to spare
for such steep learning curves and personnel resources.
Enter XML. XML begins with the basic principle of markup tags, removes
some of the more esoteric parts of SGML, and invokes a ruthlessly strict
syntax. What is achieved through this "strict" set of rules for tags is, among
other things, a wider range of software and a simpler set of applications for
writing, working with, and storing XML documents.
The other advantage of XML lies in the word "extensible."
This indicates that there is flexibility in the kinds of things that XML can
identify. For years the primary academic tagging set has been the Text Encoding
Initiative, or TEI. TEI is fairly all-purpose, primarily oriented toward
document structure (e.g., divisions and subdivisions of text), and predominantly
shaped by the requirements of Western -- usually English -- literature. Until
just recently, software for TEI was difficult to learn and fairly expensive. It
is understandable that this most ubiquitous of academic markup technologies has
not been widely adopted.
For Indologists, XML opens a wealth of possibilities by
contrast. Basic mechanisms for indicating text structure, such as TEI (why
reinvent the wheel?), can be augmented with tradition- or text-specific tags
designed to enable more precise marking, study, commentary, and reuse of an
ongoing--even lifelong--text research project. Add to this the support for
non-Roman-script languages, and you have an ideal tool for building a research
resource. Of course, XML is also designed for storing and accessing all manner
of media formats so that recordings and tapes from field study can be integrated
into the research resource.
XML takes what HTML has made possible and expands upon it.
For example, the beginning of my own work with markup technology came with my
dissertation which traced every occasion, in relative chronological order, of
some 13 different words related to the notion of the self in the Rig Veda.
Working with the relative chronology established by Oldenberg (1888), Witzel
(1995b), Lanman (1880), Narten (1968), et al., I was able to make links between
the several hundred different occasions each of some 13 other terms related to
the notion of the self throughout the Rig Veda.8
When I was writing, it was possible to review all occasions,
in order, of a given word to identify different keywords associated with it as
they appeared or presented themselves as a new path of inquiry. What I was not
able to do was to make these links between the terms tell me anything more than
that they were there and that there was one more after each occasion to which I
could advance by clicking with my mouse. I could not, for instance, classify
all occasions of brahman which occurred in the gaayatrii meter, in a hymn
by Agastya, addressed to Indra.
This is possible with XML, of course, but there is a lot more
it can do as well. Ongoing research can not only be linked, but annotated,
identified according to native categories (e.g., vyakarana, nirukta, comparison
of different chronologies, phonetic variations, thematic or terminological
associations, etc.) and continuously re-accessed and augmented based on
different inquiries and new discoveries. Where a single passage is referred to
or quoted in five other places, XML linking enables a single link to access and
insert all five, or simultaneously open them in additional windows.
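To make the idea of such a query concrete, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree module. It is not the method used in my dissertation or in the accompanying files. The element and attribute names (mantra, meter, seer, deity), the ids, and the sample lines are hypothetical placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names, attribute names, ids, and text are placeholders.
SAMPLE = """
<hymns>
  <mantra id="ex1" meter="gaayatrii" seer="agastya" deity="indra">placeholder line with brahman</mantra>
  <mantra id="ex2" meter="triSTubh" seer="agastya" deity="indra">placeholder line with brahman</mantra>
  <mantra id="ex3" meter="gaayatrii" seer="agastya" deity="indra">placeholder line without the term</mantra>
  <mantra id="ex4" meter="gaayatrii" seer="vizvaamitra" deity="agni">placeholder line with brahman</mantra>
</hymns>
"""

def find_occurrences(xml_text, word, **wanted):
    """Return ids of mantra elements whose attributes equal `wanted`
    and whose text contains `word`."""
    root = ET.fromstring(xml_text)
    hits = []
    for mantra in root.iter("mantra"):
        attributes_match = all(mantra.get(k) == v for k, v in wanted.items())
        if attributes_match and word in (mantra.text or ""):
            hits.append(mantra.get("id"))
    return hits

matches = find_occurrences(
    SAMPLE, "brahman", meter="gaayatrii", seer="agastya", deity="indra"
)
print(matches)
```

The point of the sketch is only that once meter, seer, and deity are recorded as attributes, the classification becomes a matter of filtering rather than of clicking through links one at a time.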
Specifics of XML. The accessibility, affordability (most of the best tools are
free!), and learnability of the XML technology, or "X-nology," is possible due
to the strict rules of syntax mentioned above. If you've never done any markup
before, this will make little difference to you (in other words, you'll be
starting off with good habits).
If you're used to some of the cut-n-paste shortcuts and other
tricks of HTML, your coding habits will have to "get some manners" before
working successfully with XML. It is worth it, in either case, however.
Problem? You have lots of texts in rough HTML? Good news! A program called
"HTML Tidy" will make your old HTML into new, rule-obeying, XML-compatible
material. It's free, fairly easy to use, and runs on most computer systems.9
1. Terminology. Before going further, a couple of basic terminological
clarifications will help. "Elements" are parts of the text marked with tags
(cf. note 7 above). Technically, in the following example
"<author>Shakespeare</author>", the entire string of characters
between the quote marks is an element. The word "author" is the element name.
It is common to refer to the pair of <author></author> components as
a "tag." Thus, you could say that "author" is the "tag name," but markup
purists won't accept this when you hobnob with them.
Most such elements have a start tag and an end tag (marked by the
"/"), which "wrap around," or come before and after, the text that they are
tagging. The end tag is also called the "closing tag" and when it is present,
the complete wrapped element is often said to be "closed" as of the occurrence
of the end tag.
It is possible to add more information to a tag, while keeping
the same element name. Consider the following:
"<author reference="last name">Shakespeare</author>" as opposed to
"<author reference="full name">William Shakespeare</author>". The addition of
'reference="last name"' is called an "attribute." The word "reference," then,
would be the attribute name and "last name" or "full name" would be the
attribute value (sometimes this is called a "variable," but not as frequently).
Notice also that the attribute need only be stated in the
start tag. In other words, it does not have to be "closed," or "ended" as with
the tag itself. You can also add as many attributes to an element as required
by your research task, or required level of detail (cf. Van Nooten and Holland,
1994:1, for the information below):
<mantra id="rv1.1.1a" meter="gaayatrii" family="vaizvaamitra" deity="agni">
agni'm iiLe puro'hitaM
</mantra>
These basic structures underlie, for the most part, all markup
languages currently in use.10
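For readers who would like to see how software "sees" these parts, the following is a small sketch in Python using the standard library's xml.etree.ElementTree. It simply takes the mantra element shown above apart into its element name, attributes, and content. The variable names are my own.

```python
import xml.etree.ElementTree as ET

# The element from the example above, written as a Python string.
MANTRA = (
    '<mantra id="rv1.1.1a" meter="gaayatrii" family="vaizvaamitra" deity="agni">'
    "agni'm iiLe puro'hitaM"
    "</mantra>"
)

element = ET.fromstring(MANTRA)

name = element.tag                 # the element name
attributes = dict(element.attrib)  # attribute name -> attribute value
content = element.text             # the text wrapped by the start and end tags

print(name)
for attr_name, attr_value in attributes.items():
    print(f"  {attr_name} = {attr_value}")
print(content)
```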
Also, for purposes of making notes or place markers, it's
actually possible to have a tag that doesn't contain anything, that just sits by
itself: <note correction="This is where I suggest this mandala was originally
appended" source="me" date="January, 2000" />. Note that there is not a pair
of tags here, just one, which "closes itself" by having a "/" just before the
">" symbol. This is called an "empty tag" or "empty element" and is
acceptable for use almost anywhere.
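As a brief illustration, and again only a sketch with Python's standard xml.etree.ElementTree, an empty element carries no text and no nested elements, yet its attributes remain fully readable. The wrapping hymn element and the two placeholder mantra lines are assumptions added so that the note has somewhere to sit.

```python
import xml.etree.ElementTree as ET

# An empty <note/> element between two ordinary elements.
# The <hymn> wrapper and the <mantra> placeholders are illustrative only.
DOC = (
    "<hymn>"
    "<mantra>first line</mantra>"
    '<note correction="This is where I suggest this mandala was originally appended"'
    ' source="me" date="January, 2000" />'
    "<mantra>second line</mantra>"
    "</hymn>"
)

hymn = ET.fromstring(DOC)
note = hymn.find("note")

has_text = note.text is not None   # nothing sits between a start and an end tag
child_count = len(note)            # number of elements nested inside the note
note_source = note.get("source")   # attributes are still available
print(has_text, child_count, note_source)
```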
If you follow the basic XML syntax rules outlined below, you
can begin adding your own notes to whatever primary sources you have in
electronic form, or notes you've entered on disk over the years. Then, using
the XSLT techniques below and a little practice, you can begin spending less
time finding what you've researched and more time reflecting on, adding
to, and publishing your work.
As you know, you never know a text line by line better than at the moment
you are immersed in studying it. You need to be able
to mark what you're finding as you find it, while your focus is at its most
refined. In addition, you might see other points for later study or ongoing
research. All these can be tagged as you go, and annotated over time.
Later, however, when it comes time to write something about it
or summarize it, you want quick access to all the different things you've looked
at, and maybe even combine different sets of notes you've made on the same text
as you make connections in your mind for your written work. This is where XML
rules enable powerful tools like XSLT to actualize the rich potential of your
work.
2. Syntax Rules for XML. The syntax for XML is stricter than that of the usual HTML and
other markup. These strict rules are based on a notion called
"well-formedness." Well-formedness is a new concept introduced by XML.
Essentially this means that all tags must either have closing tags or be written
in a special form (as described above), and that all the elements must nest one
within the other.
It's sort of like having good manners: saying "please" before
receiving something, and "thank you" afterward; that is, use starting tags before
the content you're marking, and closing tags afterward. The following
short review is derived from the World Wide Web Consortium's (W3C) web
site.11
1. Elements must nest; they may not overlap. Although overlapping is widely
tolerated in existing browsers, it cannot be done in XML. Overlapping can also be
called "straddling" your tags.
CORRECT- nested elements:
<p>here is an emphasized
<emph>word</emph>.</p>
INCORRECT- overlapping elements:
<p>here is an emphasized
<emph>word.</p></emph>
Here's a more familiar example of proper tag nesting:
<rigveda>
<mandala>
<hymn>
<mantra></mantra>
<mantra></mantra>
</hymn>
</mandala>
--AND SO ON--
</rigveda>
Most conventional documents are intrinsically nested anyway
(e.g., paragraphs within sections within chapters, etc.), and this key feature
of XML syntax takes advantage of and formalizes it. As you will see in the
XSLT discussion below, familiarity with the hierarchical nesting of whatever
text you're working with is a key to powerful features. This notion of layered,
hierarchical nesting should be a familiar concept to scholars and students of
Tantra (not unlike concentric circles in, for instance, the Krama school of
Kashmir Shaivism).12
2. Element and attribute names should be in lower case. XML is
case-sensitive: <li> (for a list item) and <LI> are considered different
tags, and a start tag must match its end tag exactly. Strictly speaking, XML
itself accepts either case so long as it is used consistently, but lower case is
the norm and avoids confusion.
This means, by the way, that the Harvard-Kyoto ASCII
transliteration is less conducive to tag or element names than is ITRANS. For
attribute values, however, almost any character can be used. Thus, you can have:
<mantra meter="gaayatrii">agni'm iiLe . . . </mantra>.
3. Tags which begin must end. In traditional markup such as
HTML, the paragraph or "p" tags are often left unterminated, or not closed; in
other words, bad manners: a "please" with no "thank you."
CORRECT- closed, or terminated, tags/elements:
<mantra>agni'm iiLe puro'hitaM</mantra>
<mantra>yaJJa'sya deva'm Rtvi'jam</mantra>
INCORRECT- unclosed, or unterminated, tags/elements:
<mantra>agni'm iiLe puro'hitaM
<mantra>yaJJa'sya deva'm Rtvi'jam
4. Attribute values for elements/tags must always be quoted,
that is, contained within single or double quote marks. Whichever you choose,
" " or ' ', the opening and closing marks must match for each attribute: you
cannot have 'something", only "something" or 'something'. So, when working with
attributes, or "variables," which are part of an element or tag, you should be
careful to note this.
CORRECT- quoted attribute values:
<mandala id="3">
INCORRECT- unquoted attribute values:
<mandala id=3>
5. (This one is a little abstract.) Any time you have an
attribute that is called "id," such as in the various div, div1, div2, div3, and
div4 elements which structure the main and sub headings for your ETD, and that
attribute is declared in a DTD as being of type ID (the usual convention), the
value of that attribute must begin with a letter of the alphabet:
CORRECT- an id beginning with an alphabetic character:
<mantra id="rv3.62.10">
INCORRECT- an id beginning with a numerical character:
<mantra id="3.62.10">
It is important to note that the latter, "INCORRECT" code in
each example will "work," or display correctly, in most browsers such as Netscape.
Nonetheless, it will not pass the technical checks for XML, so you will not be
able to use the wider range of XML tools.
These are the basic, essential rules for producing XML
documents according to their most fundamental and indispensable criteria, that
of "well-formedness." Within the parameters of these few basic rules, tags of
almost any kind can be added to a document to reflect and enhance the level of
inquiry desired.13
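These rules can be checked mechanically. The sketch below, in Python with the standard library's xml.etree.ElementTree, feeds variants of the CORRECT and INCORRECT examples above to a parser and records which ones it accepts. The helper name is_well_formed is my own, and some fragments are wrapped in an extra element so that each has a single root. Note one subtlety, offered as my understanding rather than as part of the W3C summary: this parser does not read a DTD, so it does not object to an id beginning with a numeral. That rule is enforced only by validating tools.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

EXAMPLES = {
    "nested": "<p>here is an emphasized <emph>word</emph>.</p>",
    "overlapping": "<p>here is an emphasized <emph>word.</p></emph>",
    "mismatched case": "<li>item</LI>",
    "closed": "<hymn><mantra>agni'm iiLe puro'hitaM</mantra></hymn>",
    "unclosed": "<hymn><mantra>agni'm iiLe puro'hitaM</hymn>",
    "quoted": '<mandala id="3"></mandala>',
    "unquoted": "<mandala id=3></mandala>",
    # Accepted by a non-validating parser; only a DTD-aware validator
    # would object to an ID value that begins with a digit.
    "numeric id": '<mantra id="3.62.10"></mantra>',
}

results = {label: is_well_formed(text) for label, text in EXAMPLES.items()}
for label, ok in results.items():
    print(f"{label:16} {'well-formed' if ok else 'NOT well-formed'}")
```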
3. Optional Additions to XML
One of the best things to remember as you reflect on the basic
rules above, is the fact that you can begin working with existing XML texts
without having to start from scratch with a new file. When you begin with a
well-structured file like the RV, then you can begin to see and understand how
to design and apply your own XML uses for your research. Adding tags to an XML
file is easy to do with various free tools (see links in the appendix) available
so that you can tailor the examples below to your own needs.
Those rules of syntax for well-formed documents, presented
above, ensure the key criteria essential to an XML document. Every XML document
must be well-formed. In addition, an XML document can be "valid." To be valid,
however, a set of tagging rules must exist such that your XML document is not
only well-formed, but also obeys these rules.
The rules for making a valid XML document are called Document
Type Definitions, or DTDs. They specify the names of tags, which tags can nest in
which other tags, and what attributes can or must be used. Beyond this, I would
recommend that you visit the online resources at http://www.oasis-open.org/ to
learn more if you are interested. However, for the work we will be doing, it is
not essential to have a DTD.14
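To give a feel for the kind of constraint a DTD expresses, here is a hand-rolled sketch in Python. It is not DTD validation, and Python's standard xml.etree.ElementTree does not validate against DTDs. The table of allowed children simply mimics, under my own assumptions, the nesting of the rigveda, mandala, hymn, and mantra elements used earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting rules of the sort a DTD would declare:
# which child elements each element may contain.
ALLOWED_CHILDREN = {
    "rigveda": {"mandala"},
    "mandala": {"hymn"},
    "hymn": {"mantra"},
    "mantra": set(),
}

def nesting_errors(xml_text):
    """Return messages for elements that are unknown or nested where the rules forbid."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            errors.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag in ALLOWED_CHILDREN and child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors

GOOD = "<rigveda><mandala><hymn><mantra>text</mantra></hymn></mandala></rigveda>"
BAD = "<rigveda><hymn><mandala><mantra>text</mantra></mandala></hymn></rigveda>"

good_errors = nesting_errors(GOOD)
bad_errors = nesting_errors(BAD)
print(good_errors)
print(bad_errors)
```

Both sample documents are well-formed. Only the second breaks the (assumed) nesting rules, which is exactly the distinction between "well-formed" and "valid."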
One reason for DTDs is that they can help you be sure you apply an
orderly structure to your e-text, which makes it easy to format the document for
printing or display when reviewing your research or publishing. You can use the
Text Encoding Initiative's (TEI) well-known tags for the basic structure of any
document. Building on this basic outline, you can take advantage of XML's
extensibility to add your own categories of inquiry such as chronology, words for
the self, thematic changes, etc.
XML comes with a raft of related standards, some already
completed, and others close behind. These standards add to the functionality
and truly exciting potential of X-nology, often without requiring expensive
software. A great deal of electronic commerce and the wireless information
devices such as web-smart cell phones use X-nology. To be sure, there are
deluxe XML gadgets out there with some pretty deluxe prices, but there are just
as many functional ones which are free or less than $100.
Some of these related XML standards are fairly esoteric for
the purposes of the current article, such as the Document Object Model (DOM)
which enables XML to be piped in and out of different software and computers
with ease. For the most part, you would never know it's there. It's like a
cyber-passport for XML's travels inside of your computer and beyond.
Two XML "technologies" that are recent standards (as of
November, 1999) are especially powerful and exciting: Extensible Stylesheet
Language Transformations (XSLT) and XPath (a system for describing where a piece
of information is, based on the tags and other content in an XML document). I
will introduce some of the core functions of XSLT below. To begin to do more,
XPath will enable more precise selection of tags and ways of working with your
files. These specifications enable sophisticated manipulation and extraction of
data you've added to an XML file with a fairly human-friendly set of commands.
Let's put it this way: if you can manage to research and read Sanskrit, Vedic,
or Prakrit, XSLT and XPath are destined to quickly become your favorite research
tools.
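To suggest what "describing where a piece of information is" looks like in practice, here is a short sketch in Python. The standard library's xml.etree.ElementTree supports only a limited subset of XPath, so this is a taste rather than the full language. The miniature document, its ids, and its text are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a nested text; names, ids, and content are placeholders.
DOC = """
<rigveda>
  <mandala id="m1">
    <hymn id="h1">
      <mantra id="a" meter="gaayatrii">line a</mantra>
      <mantra id="b" meter="triSTubh">line b</mantra>
    </hymn>
  </mandala>
  <mandala id="m3">
    <hymn id="h62">
      <mantra id="c" meter="gaayatrii">line c</mantra>
    </hymn>
  </mandala>
</rigveda>
"""

root = ET.fromstring(DOC)

# Every mantra anywhere in the document with a given meter.
gaayatrii_ids = [m.get("id") for m in root.findall(".//mantra[@meter='gaayatrii']")]

# Only the mantras inside one particular mandala: the path spells out the nesting.
m3_ids = [m.get("id") for m in root.findall("./mandala[@id='m3']/hymn/mantra")]

print(gaayatrii_ids)
print(m3_ids)
```

The second expression shows why familiarity with the hierarchical nesting of your text matters: the path is read step by step, from the outermost element inward.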
It is important to underscore that all of these technologies
are standards. They are not "owned" by a corporation (though, predictably,
Microsoft is trying to do so by implementing XML in its own unique way). Therefore, to
begin working with XML is not the same thing as upgrading in response to the
commercial forces of the digital market place, only to be out of date before
you've even received the bills.
The basic tagging technology behind XML has existed since
before SGML, and is a reliable international standard (ISO 8879). XML has been
designed to be compatible with older systems (you may not get all the bells
and whistles, but you are never at risk of being unable to read your
carefully constructed data). In short, your time invested in X-nology is a
long-term investment, not some new cyber-flavor-of-the-month.
If you want to begin tagging, you can use a plain text editor
such as the Microsoft Wordpad in your Windows/Programs/Accessories menu. There
are plenty of tools for specific work with XML which work much better, however.
If you are adding your own notes and comments, marking passages, or organizing
your own files of data, then you will want to use an XML editor such as those at
the links provided in the Appendix. For Windows, WordPerfect 2000 is the best
and easiest to use. Of all these tools, it is the most "costly," but still quite
reasonable at an educational price of less than $100. On a Mac, the Pro version
of Media Design in-Progress's Emilé is available at the same price. Among free
tools for Windows there is a wide variety. I prefer the simple approach, which
concentrates on keystrokes rather than some fussy graphic interface, of a
product called XED. On the Mac, the Lite version of Emilé is the best.
Important: the advantage of working with
XML-automating software such as these is that it makes it nearly certain that
all the rules are followed, so your files will work the first time. If you just
type your own, typos can introduce errors which violate the XML rules.
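If you do type your own tags, a parser can at least tell you where the first slip occurred. The following sketch uses Python's standard xml.etree.ElementTree, whose ParseError carries the line and column of the problem. The helper name first_error and the sample fragment are my own placeholders.

```python
import xml.etree.ElementTree as ET

def first_error(xml_text):
    """Return (line, column, message) for the first syntax error,
    or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

# A hand-typed fragment with a typo in the second closing tag (placeholder content).
TYPED = """<hymn>
<mantra>first line</mantra>
<mantra>second line</mantr>
</hymn>"""

problem = first_error(TYPED)
print(problem)
```

Here the reported line is the one containing the misspelled closing tag, which is usually enough to find and fix the typo by hand.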
But XML tagging is only part of the story. It is the
indispensable core foundation, but the addition of a few XSLT commands is where
the magic happens. The more detail—with attributes or additional
elements—that you add, the more you can do with your text.