ISSN 1084-7553

IJTS
Special Edition
June 2000

Magically Storming the Gates of Buddhahood:
Extensible Text Technology (XML/XSLT)
as a Simulacrum for Research
 
by John Robert Gardner
Introduction / Extensible Markup Language and the Birth of E-textnology / Specifics of XML / Extensible Stylesheet Language for Transformations and XPath / XSL Transformations and the Study of Mantra / Conclusion / Appendix 1: Tools and Resources for Working with XML and XSLT / Appendix 2: XPath Primer / Bibliography

Introduction.1

This article explores the potential of electronic text technology, specifically Extensible Markup Language (XML) and Extensible Stylesheet Language for Transformations (XSLT), in the study of mantra and text resources. The reader should know that, because this technology requires clear explanation, a detailed, incisive, and otherwise comprehensive treatment of mantra theory must remain secondary. Not unlike the competing demands on development time that scholars of the humanities face when trying to add digital technology to their arsenal of research techniques, this article strives for an appropriate balance. Accordingly, the citations related to mantra use are points of academic reference rather than a suggestion of rhetorical conclusiveness.

In the present document, the phenomenon of mantra presentation, specifically Vedic and Tantric, will serve as an analog and a source of examples in the illumination of the new, extensible, text technology for research represented by XML and its related standards. There is a great deal more to the technology and related tools for working with XML than the basic outline of its primary characteristics presented below. However, if strictly followed, what I have summarized will give you sufficient grasp to begin benefiting from this technology in your own work. I've prepared files of working examples, as well as a special edition of the Rig Veda in XML, which accompany this publication online.2

In the case of XSLT, the reader who is already familiar with this powerful document command set will know that I've barely scratched the surface. Still, what is possible with a simple set of basic commands is quite powerful. In addition, to ease the learning curve, I ask that the technically savvy reader accept the more intuitive descriptions of the commands provided in some places rather than the verbatim wording of the specification.

Which additional functions prove useful will depend on each reader and on the electronic text resources available for his or her work. The examples below concentrate on the core functions. For instance, I will not present the perfunctory and largely esoteric lines of code which come before and after the key operators I am explaining. These are somewhat opaque to the uninitiated and are themselves largely unchanging from example to example. As a result, they could distract the reader from understanding basic operations and can therefore be learned later as needed.3 Unfortunately, I can assure you of no macrocosmic wisdom or significance lying behind these code snippets.

This is not that different from the basic premise that a disciple can only begin to direct the deity's power after s/he renders themself fit to do so (Beyer, 1978:36). In just the same way, XSLT can be unleashed through repetition of core techniques in working with primary sources. Just as japa cultivates power and concentration (Sadhu Santidev, 1999, vol. II:112), so the core units of condensed code, when carefully practiced, can allow one's ability with XSLT to permeate the full gamut of its potential.

I could draw the analogy further, and it would not be disingenuous to do so. There is a conceptual parallel with the function of mantra in Vedic and Tantric practice which became apparent as I looked through various sources for examples which might make the application of the technology appear more familiar. I will briefly revisit this discussion in the third and final section of this paper.

With regard to the title of this paper, I was reading Stephan Beyer's The Cult of Tara and noticed his citation of Kierkegaard: "Mysticism has not the patience to wait for God's revelation . . ." Beyer continues, "if we should ever be forced to attempt a definition of 'Tantra,' we would say that it is a technique for magically storming the gates of Buddhahood . . . In the broadest sense, magic is the manipulation of a distant object through control of a simulacrum that is in some way associated with it" (Beyer, 1978:92).

Similarly, XSLT, an accessible, efficacious tool for working with electronic texts, is directly associated with XML. It is written according to XML's rules and acts upon, or manipulates, it. XSLT is a simulacrum for navigating and traversing the components of an electronic text resource. Considering how long scholars in non-technical disciplines have awaited accessible and powerful tools for working with their resources, XSLT is a "magical storming of the gates" of technological power.

The reader must understand that the discussion of mantra herein occurs for two related, but very different reasons. The first, and most pedestrian, concerns the context in which this paper is presented, a journal of tantric studies, where building the applied technology examples upon the subject matter is helpful. The second, and more evocative, is the useful analogy for understanding the relationship between XML and XSLT: the former represents the core semantics, or syllables, and the latter the effectuation (cf. Beyer, 1978:37, 243). In neither case am I suggesting an elevation of XML technology to tantric praxis, or a misrepresentation of the latter as the former.

Before the technology for research with XML can be demonstrated or further analogized with respect to mantra and its study, however, the core facets must be presented in some detail. It is assumed that the reader has a working knowledge, or at least familiarity with, the basics of markup technology. In other words, some rudimentary grasp of how HTML looks and works will make the following discussion infinitely more fruitful. In the absence of such prior knowledge, the astute reader will nonetheless find the basic underpinnings of XML technology remarkably accessible.

The paper has three sections. In the first, I will introduce XML. In the second, XSLT will be discussed. The third section will present applications of this technology for identifying and extracting mantra material for various ritual uses of the Rig Veda. With a little practice, you can also adapt them to serve your own research interests. An appendix regarding access to and installation of the related tools has also been included. It is assumed that the reader has a working familiarity with the World Wide Web, some awareness of HTML, and the ability to download a file and perform simple installations of software.

I. Extensible Markup Language and the Birth of E-textnology

Context. With the recent advent of XML and its attendant suite of technologies, scholars have at their disposal a sophisticated set of text analysis and manipulation tools that do not demand mastery of overly abstract programming languages. Electronic texts, or e-texts, can have handles and/or topic "markers" built into, and continuously added to, them according to a scholar's own (or a collaborative group's) research agenda throughout the course of their career.

Appropriately, then, we have before us the era of electronic text technology, or "e-textnology." This new development builds upon the very earliest and most consistent threads of computer technology development. It derives from one of the most consistent and unchanging underpinnings of the otherwise always-changing world of digital technology, the syntax of electronic text markup.4

Since the late 1980s and before, computer technology has striven to accurately manage text data with the same efficiency that marks its handling of more abstract data. In fact, efforts to devise an artificial intelligence system to translate between languages took place as early as the 1950s. The U.S. government was anxious to expedite access to Soviet espionage materials regarding their missile program (Russell and Norvig, 1995:20f.).

It is from these early efforts that the classic example of the failure of a computer to understand language came about. Most notably popularized by Noam Chomsky as an example in the discussion of how individual language utterances are forever unique rather than pre-patterned (cf. his effective refutation of Skinner and behaviorism on this point, in Syntactic Structures, 1957), a computer effort to translate the biblical quote "the spirit is willing, but the flesh is weak," produced "the wine is agreeable, but the meat is spoiled."5

In fact, this particular event served as a rallying cry around which the so-called "AI winter" was identified as having begun (Russell and Norvig, 1995:21, 24f.). The space program then proceeded to dominate much of the development of artificial intelligence technology. Nonetheless, scholars of Sanskrit were among the earliest pioneers in electronic text technology, or e-textnology. I'm referring of course to the landmark achievement of Lehman and Ananthanarayanan in 1971 with their digitization of the Rig Veda and Shatapatha BrAhmaNa under a National Endowment for the Humanities Grant. This is not surprising considering the long-standing deference to Sanskrit linguistic theory throughout the Artificial Intelligence community.6

Subsequent to these early efforts, work began in the 1980s to develop a system by which the structure and content within a text could be identified. As noted above, the term coined for this technology was "markup" or "text markup." The original set of rules for writing markup code is known as SGML, or Standard Generalized Markup Language. SGML was the guiding specification that standardized the way that its many offspring markups were to be written.

The most well-known of these offspring is, of course, HTML. HTML has its own spirit/flesh shortcomings however. Take for instance the occasion of Italics in any given sentence. An HTML page will show you Italics and it's up to you to figure out why they are there. The basic technology behind this is remarkably simple, however.

Angle brackets, or less-than/greater-than symbols, are used to indicate an element of a document which can be described by a general name, such as one to designate an author: <author>Shakespeare</author>. These brackets and the information they contain are usually called tags.7 They generally appear immediately before and immediately after whatever it is that they are marking. The tags are usually only visible to the computer or processing software. If it sounds simple, it really is. In spite of the commercially hyped changes in technology, this basic format has remained unchanged for over 15 years.

When you tag the text in this way it becomes possible to make your computer work with text material which has been marked by the tags as though it knew what the text was about. At a simple level, this means that when you search a database or the web for Tantra ritual based on XML, you can exclude certain unscrupulous online sources of questionable value. Or, for instance, in a search for "Xanadu" you can avoid turning up songs by Olivia Newton-John and, instead, focus on phonetically rich poetic offerings by Samuel Taylor Coleridge. Of course, the technology to work with markup has largely remained so difficult to learn and costly to employ that scholars have been unable to spare the time for its steep learning curve or the money for the personnel it requires.
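The "Xanadu" search can be sketched in a few lines of Python using the standard library's ElementTree parser. This is only an illustration: the catalogue, its element names (catalogue, song, poem, title), and the function name find_in are all invented for the example and do not come from any real resource.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; every element name here is invented for this sketch.
CATALOGUE = """
<catalogue>
  <song><title>Xanadu</title><artist>Olivia Newton-John</artist></song>
  <poem><title>Kubla Khan</title><author>Samuel Taylor Coleridge</author>
    <line>In Xanadu did Kubla Khan</line></poem>
</catalogue>
"""

def find_in(element_name, word, source=CATALOGUE):
    """Return the titles of <element_name> items whose text mentions `word`."""
    root = ET.fromstring(source)
    hits = []
    for item in root.iter(element_name):
        # Gather all the text inside this item, including nested elements.
        text = " ".join(item.itertext())
        if word in text:
            hits.append(item.findtext("title"))
    return hits

# Searching only inside <poem> elements skips the song entirely.
print(find_in("poem", "Xanadu"))
```

Because the search is limited to one kind of element, the same word in a differently tagged item never turns up in the results.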

Enter XML. XML begins with the basic principle of markup tags, removes some of the more esoteric parts of SGML, and invokes a ruthlessly strict syntax. What is achieved through this "strict" set of rules for tags is, among other things, a wider range of software and a simpler set of applications for writing, working with, and storing XML documents.

The other advantage of XML lies in the word "extensible." This indicates that there is flexibility in the kinds of things that XML can identify. For years the primary academic tagging set has been the Text Encoding Initiative, or TEI. TEI is fairly all-purpose, primarily oriented toward document structure (e.g., divisions and subdivisions of text), and predominantly shaped by the requirements of Western (usually English) literature. Until just recently, software for TEI was difficult to learn and fairly expensive. It is understandable, then, that even this best known of academic markup technologies has not been more widely adopted.

For Indologists, XML by contrast opens a wealth of possibilities. Basic mechanisms for indicating text structure, such as TEI (why reinvent the wheel?), can be augmented with tradition- or text-specific tags designed to enable more precise marking, study, commentary, and reuse of an ongoing, even lifelong, text research project. Add to this the support for non-Roman-script languages, and you have an ideal tool for building a research resource. Of course, XML is also designed for storing and accessing all manner of media formats so that recordings and tapes from field study can be integrated into the research resource.

XML takes what HTML has made possible and expands upon it. For example, the beginning of my own work with markup technology came with my dissertation which traced every occasion, in relative chronological order, of some 13 different words related to the notion of the self in the Rig Veda. Working with the relative chronology established by Oldenberg (1888), Witzel (1995b), Lanman (1880), Narten (1968), et al., I was able to make links between the several hundred different occasions each of some 13 other terms related to the notion of the self throughout the Rig Veda. 8

When I was writing, it was possible to review all occasions, in order, of a given word to identify different keywords associated with it as they appeared or presented themselves as a new path of inquiry. What I was not able to do was to make these links between the terms tell me anything more than that they were there and that there was one more after each occasion to which I could advance by clicking with my mouse. I could not, for instance, classify all occasions of brahman which occurred in the gaayatrii meter, in a hymn by Agastya, addressed to Indra.

This is possible with XML, of course, but there is a lot more it can do as well. Ongoing research can not only be linked, but annotated, identified according to native categories (e.g., vyakarana, nirukta, comparison of different chronologies, phonetic variations, thematic or terminological associations, etc.) and continuously re-accessed and augmented based on different inquiries and new discoveries. Where a single passage is referred to or quoted in five other places, XML linking enables a single link to access and insert all five, or simultaneously open them in additional windows.
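As a rough illustration of the kind of classification just described, the following Python sketch filters tagged lines by a word and by several attributes at once. The sample data are placeholders rather than Rig Veda text, and the attribute name "seer", the ids, and the function name classify are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Placeholder data only: ids, attribute names, and text are hypothetical.
SAMPLE = """
<hymns>
  <mantra id="x1" meter="gaayatrii" seer="agastya" deity="indra">placeholder text with brahman</mantra>
  <mantra id="x2" meter="triSTubh" seer="agastya" deity="indra">placeholder text with brahman</mantra>
  <mantra id="x3" meter="gaayatrii" seer="agastya" deity="indra">placeholder text without the word</mantra>
</hymns>
"""

def classify(source, word, **wanted):
    """Return ids of <mantra> elements containing `word` whose attributes
    match every name=value pair given in `wanted`."""
    root = ET.fromstring(source)
    return [
        m.get("id")
        for m in root.iter("mantra")
        if word in (m.text or "")
        and all(m.get(name) == value for name, value in wanted.items())
    ]

print(classify(SAMPLE, "brahman", meter="gaayatrii", seer="agastya", deity="indra"))
```

Changing or dropping any of the keyword arguments widens or narrows the selection without touching the text itself.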

Specifics of XML. The accessibility, affordability (most of the best tools are free!), and learnability of the XML technology, or "X-nology," are possible due to the strict rules of syntax mentioned above. If you've never done any markup before, this will make little difference to you (in other words, you'll be starting off with good habits).

If you're used to some of the cut-n-paste shortcuts and other tricks of HTML, your coding habits will have to "get some manners" before working successfully with XML. It is worth it, in either case, however. Problem? You have lots of texts in rough HTML? Good news! A program called "HTML Tidy" will make your old HTML into new, rule-obeying XML-compatible material. It's free, fairly easy to use, and fits most computer systems.9

1. Terminology. Before going further, a couple of basic terminological clarifications will help. "Elements" are parts of the text marked with tags (cf. note 7 above). Technically, in the following example "<author>Shakespeare</author>", the entire string of characters between the quote marks is an element. The word "author" is the element name. It is common to refer to the pair of <author></author> components as a "tag." Thus, you could say that "author" is the "tag name," but markup purists won't accept this when you hobnob with them.

Most such tags have a start tag and an end tag (marked by the "/"), and they "wrap around" (that is, come before and after) the text that they are tagging. The end tag is also called the "closing tag" and when it is present, the complete wrapped element is often said to be "closed" as of the occurrence of the end tag.

It is possible to add more information to a tag, while keeping the same element name. Consider the following: "<author reference="last name">Shakespeare</author>" as opposed to "<author reference="full name">William Shakespeare</author>". The addition of 'reference="last name"' is called an "attribute." The word "reference," then would be the attribute name and "last name" or "full name" would be the attribute value (sometimes this is called a "variable," but not as frequently).

Notice also that the attribute need only be stated in the start tag. In other words, it does not have to be "closed," or "ended" as with the tag itself. You can also add as many attributes to an element as required by your research task, or required level of detail (cf. Van Nooten and Holland, 1994:1, for the information below):

<mantra id="rv1.1.1a" meter="gaayatrii" family="vaizvaamitra" deity="agni">
agni'm iiLe puro'hitaM
</mantra>

These basic structures underlie, for the most part, all markup languages currently in use.10
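To see how software "reads" these pieces, here is a small Python sketch that parses the mantra element shown above with the standard library's ElementTree module and prints its element name, attributes, and content. It is only one of many ways to inspect an element.

```python
import xml.etree.ElementTree as ET

# The element from the example above, unchanged.
MANTRA = """<mantra id="rv1.1.1a" meter="gaayatrii" family="vaizvaamitra" deity="agni">
agni'm iiLe puro'hitaM
</mantra>"""

element = ET.fromstring(MANTRA)

print(element.tag)           # the element name
print(element.attrib)        # all attribute names and values, as a dictionary
print(element.get("meter"))  # the value of a single attribute
print(element.text.strip())  # the text the tags wrap around
```

The attribute names become dictionary keys and the attribute values become the corresponding dictionary values, which is what makes them convenient to search on later.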

Also, for purposes of making notes or place markers, it's actually possible to have a tag that doesn't contain anything, that just sits by itself: <note correction="This is where I suggest this mandala was originally appended" source="me" date="January, 2000" />. Note that there is not a pair of tags here, just one, which "closes itself" by having a "/" just before the ">" symbol. This is called an "empty tag" or "empty element" and is acceptable for use almost anywhere.
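A short Python sketch, offered only as an illustration, shows that a parser treats the self-closing form and an explicit start/end pair with nothing between them as the same thing. The attribute values here are shortened versions of the note above.

```python
import xml.etree.ElementTree as ET

# Two spellings of an empty element: self-closing, and an empty start/end pair.
short_form = ET.fromstring('<note source="me" date="January, 2000" />')
long_form = ET.fromstring('<note source="me" date="January, 2000"></note>')

for note in (short_form, long_form):
    # An empty element has attributes but no text and no child elements.
    print(note.tag, note.attrib, note.text, len(note))

# Both forms serialize identically once parsed.
print(ET.tostring(short_form) == ET.tostring(long_form))
```

All of the information in an empty element lives in its attributes, which is why it works well for notes and place markers.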

If you follow the basic XML syntax rules outlined below, you can begin adding your own notes to whatever primary sources you have in electronic form, or notes you've entered on disk over the years. Then, using the XSLT techniques below and a little practice, you can begin spending less time finding what you've researched and more time reflecting on, adding to, and publishing your work.

As you know, each time you immerse yourself in what you're studying, you know that text line-by-line best right then. You need to be able to mark what you're finding as you find it, while your focus is at its most refined. In addition, you might see other points for later study or ongoing research. All these can be tagged as you go, and annotated over time.

Later, however, when it comes time to write something about it or summarize it, you want quick access to all the different things you've looked at, and maybe even combine different sets of notes you've made on the same text as you make connections in your mind for your written work. This is where XML rules enable powerful tools like XSLT to actualize the rich potential of your work.

2. Syntax Rules for XML. The syntax for XML is stricter than that of the usual HTML and other markup. These strict rules are based on a notion of what is called "well-formedness." Well-formedness is a new concept introduced by XML. Essentially this means that all tags must either have closing tags or be written in a special form (as described below), and that all the elements must nest one within the other.

It's sort of like having good manners: saying "please" before receiving something, and "thank you" afterward. In markup terms, use starting tags before the content you're marking, and closing tags afterward. The following short review is derived from the World Wide Web Consortium's (W3C) web site.11

1. Elements must nest; they cannot overlap. Although overlapping is widely tolerated in existing browsers, it cannot be done in XML. Overlapping can also be called "straddling" your tags.

CORRECT- nested elements:

<p>here is an emphasized <emph>word</emph>.</p>

INCORRECT- overlapping elements:

<p>here is an emphasized <emph>word.</p></emph>

Here's a more familiar example of proper tag nesting:

<rigveda>
<mandala>
<hymn>
<mantra></mantra>
<mantra></mantra>
</hymn>
</mandala>

--AND SO ON--

</rigveda>

Most conventional documents are intrinsically nested anyway (e.g., paragraphs within sections within chapters, etc.), and this key feature of XML syntax takes advantage of and formalizes it. As you will see in the XSLT discussion below, familiarity with the hierarchical nesting of whatever text you're working with is a key to powerful features. This notion of layered, hierarchical nesting should be a familiar concept to scholars and students of Tantra (not unlike concentric circles in, for instance, the Krama school of Kashmir Shaivism).12

2. Element and attribute names should be in lower case. XML is case-sensitive, so <li> (for a list item) and <LI> are considered different tags, and a start tag must match its end tag exactly. Strictly speaking, XML itself permits upper-case names; it is XHTML that requires lower case. To avoid confusion, lower case is the norm.

This means, by the way, that the Harvard-Kyoto ASCII transliteration is less conducive to tag or element names than is ITRANS. For attribute values, almost any character can be used however. Thus, you can have: <mantra meter="gaayatrii">agni'm iiLe. .. </mantra>.

3. Tags which begin must end. In traditional markup such as HTML, the paragraph or "p" tags are often left unterminated or not closed; in other words, bad manners: a "please" with no "thank you."

CORRECT- closed, or terminated, tags/elements:

<mantra>agni'm iiLe puro'hitaM</mantra>
<mantra>yaJJa'sya deva'm Rtvi'jam</mantra>

INCORRECT- unclosed, or unterminated, tags/elements:

<mantra>agni'm iiLe puro'hitaM
<mantra>yaJJa'sya deva'm Rtvi'jam

4. Attribute values for elements/tags must always be quoted, that is, contained within single or double quote marks. Whichever you choose, either " " or ' ', you must match them for each attribute: you cannot have 'something", only "something" or 'something'. So, when working with attributes, or "variables," which are part of an element or tag, you should be careful to note this.

CORRECT- quoted attribute values:

<mandala id="3">

INCORRECT- unquoted attribute values:

<mandala id=3>

5. (This one is a little abstract.) Any time you have an attribute that is called "id", such as in the various div, div1, div2, div3, and div4 elements which structure the main and sub headings of a document, the value of that attribute must begin with a letter of the alphabet:

CORRECT- an id beginning with an alphabetic character:

<mantra id="rv3.62.10">

INCORRECT- an id beginning with a numerical character:

<mantra id="3.62.10">
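One caution before relying on a parser for this last rule: a parser that checks only well-formedness, with no DTD to tell it that "id" is an identifier, will generally accept both of the examples above. The following Python sketch therefore checks id values by hand. The pattern is a simplified, ASCII-only approximation of what XML allows in a name, and the sample hymn with its "h62" id is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of an XML name: a letter or underscore
# first, then letters, digits, periods, hyphens, or underscores.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def bad_ids(source):
    """Return every id value in `source` that does not look like an XML name."""
    root = ET.fromstring(source)
    return [
        element.get("id")
        for element in root.iter()
        if element.get("id") is not None and not NAME.fullmatch(element.get("id"))
    ]

# Hypothetical sample combining the CORRECT and INCORRECT ids from above.
SAMPLE = '<hymn id="h62"><mantra id="rv3.62.10"/><mantra id="3.62.10"/></hymn>'

print(bad_ids(SAMPLE))
```

Running a check like this over a file before handing it to stricter tools can save some puzzling error messages later.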

It is important to note that the latter "INCORRECT" codes in each example will "work" or view correctly in most browsers like Netscape. Nonetheless, they will not pass the technical checks for XML, so you can't make some of the wider range of XML tools work.

These are the basic, essential rules for producing XML documents according to their most fundamental and indispensable criterion, that of "well-formedness." Within the parameters of these few basic rules, tags of almost any kind can be added to a document to reflect and enhance the level of inquiry desired.13
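The first four rules can be tried out with any XML parser. As a minimal sketch, the Python below feeds CORRECT and INCORRECT examples to the standard library's ElementTree parser and reports which ones it accepts. The wrapping hymn element and the closing mandala tags are additions made so that each snippet is a complete document with a single outermost element; the function name is likewise my own.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(source):
    """Return None if `source` parses, else the parser's error message."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        return str(err)
    return None

EXAMPLES = {
    "nested": "<p>here is an emphasized <emph>word</emph>.</p>",
    "overlapping": "<p>here is an emphasized <emph>word.</p></emph>",
    "case mismatch": "<li>a list item</LI>",
    "closed": "<hymn><mantra>agni'm iiLe puro'hitaM</mantra></hymn>",
    "unclosed": "<hymn><mantra>agni'm iiLe puro'hitaM</hymn>",
    "quoted": '<mandala id="3"></mandala>',
    "unquoted": "<mandala id=3></mandala>",
    "mismatched quotes": "<mandala id='3\"></mandala>",
}

for label, source in EXAMPLES.items():
    error = well_formedness_error(source)
    verdict = "well-formed" if error is None else f"rejected ({error})"
    print(f"{label:18} {verdict}")
```

Unlike a forgiving browser, the parser stops at the first violation, which is exactly the strictness that lets other XML tools rely on the file.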

3. Optional Additions to XML

One of the best things to remember as you reflect on the basic rules above, is the fact that you can begin working with existing XML texts without having to start from scratch with a new file. When you begin with a well-structured file like the RV, then you can begin to see and understand how to design and apply your own XML uses for your research. Adding tags to an XML file is easy to do with various free tools (see links in the appendix) available so that you can tailor the examples below to your own needs.

The rules of syntax for well-formed documents presented above ensure the key criteria essential to an XML document. Every XML document must be well-formed. In addition, an XML document can be "valid." To be valid, however, a set of tagging rules must exist such that your XML document is not only well-formed, but that it obeys these rules.

The rules for making a valid XML document are set out in Document Type Definitions, or DTD's, which specify the names of tags, which tags can nest in which other tags, and what attributes can or must be used. Beyond this, I would recommend that you visit the online resources at http://www.oasis-open.org/ to learn more if you are interested. However, for the work we will be doing, it is not essential to have a DTD.14
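To give a feel for the kinds of rules a DTD expresses, here is a Python sketch that checks a document against a small hand-written table of allowed children and required attributes. It is emphatically not a DTD and not a real validator (the standard-library parser used here does not validate against DTDs); the rule table, the requirement that mandala and mantra carry an id, and the function name are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules in the spirit of a DTD: which elements may appear inside
# which, and which attributes each element must carry.
RULES = {
    "rigveda": {"children": {"mandala"}, "required": set()},
    "mandala": {"children": {"hymn"}, "required": {"id"}},
    "hymn": {"children": {"mantra"}, "required": set()},
    "mantra": {"children": set(), "required": {"id"}},
}

def violations(source, rules=RULES):
    """Return a list of human-readable rule violations found in `source`."""
    problems = []
    for element in ET.fromstring(source).iter():
        rule = rules.get(element.tag)
        if rule is None:
            problems.append(f"unknown element <{element.tag}>")
            continue
        for name in sorted(rule["required"] - set(element.attrib)):
            problems.append(f"<{element.tag}> lacks attribute {name}")
        for child in element:
            if child.tag not in rule["children"]:
                problems.append(f"<{child.tag}> not allowed in <{element.tag}>")
    return problems

GOOD = '<rigveda><mandala id="m3"><hymn><mantra id="rv3.62.10"/></hymn></mandala></rigveda>'
BAD = '<rigveda><mandala><mantra id="rv3.62.10"/></mandala></rigveda>'

print(violations(GOOD))
print(violations(BAD))
```

Both sample documents are well-formed; only the first also obeys the extra rules, which is the distinction between "well-formed" and "valid."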

One reason for DTD's is that they can help you be sure you apply an orderly structure to your e-text, which makes it easy to format the document for printing or display when reviewing your research or publishing. You can use the Text Encoding Initiative's (TEI) well-known tags for the basic structure of any document. Building on this basic outline, you can take advantage of XML's extensibility to add your own categories of inquiry such as chronology, words for the self, thematic changes, etc.

XML comes with a raft of related standards, some already completed, and others close behind. These standards add to the functionality and truly exciting potential of X-nology, often without requiring expensive software. A great deal of electronic commerce and the wireless information devices such as web-smart cell phones use X-nology. To be sure, there are deluxe XML gadgets out there with some pretty deluxe prices, but there are just as many functional ones which are free or less than $100.

Some of these related XML standards are fairly esoteric for the purposes of the current article, such as the Document Object Model (DOM) which enables XML to be piped in and out of different software and computers with ease. For the most part, you would never know it's there. It's like a cyber-passport for XML's travels inside of your computer and beyond.

Two XML "technologies" that are recent standards (as of November, 1999) are especially powerful and exciting: Extensible Stylesheet Language for Transformations (XSLT) and XPath (a system for describing where a piece of information is based on the tags and other content in an XML document). I will introduce some of the core functions of XSLT below. To begin to do more, XPath will enable more precise selection of tags and ways of working with your files. These specifications enable sophisticated manipulation and extraction of data you've added to an XML file with a fairly human-friendly set of commands. Let's put it this way: if you can manage to research and read Sanskrit, Vedic, or Prakrit, XSLT and XPath are destined to quickly become your favorite research tools.
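For a first taste of what "describing where a piece of information is" looks like, the Python sketch below uses the limited subset of XPath that the standard library's ElementTree module supports. It is not XSLT and not full XPath. The two mantra lines and the first id come from the examples earlier in this article; the second id, the hymn and mandala attributes, and the placeholder third mantra are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Small sample; the placeholder mantra and most attribute values are hypothetical.
SAMPLE = """
<rigveda>
  <mandala id="m1">
    <hymn id="h1" deity="agni">
      <mantra id="rv1.1.1a" meter="gaayatrii">agni'm iiLe puro'hitaM</mantra>
      <mantra id="rv1.1.1b" meter="gaayatrii">yaJJa'sya deva'm Rtvi'jam</mantra>
      <mantra id="x1" meter="triSTubh">placeholder</mantra>
    </hymn>
  </mandala>
</rigveda>
"""

root = ET.fromstring(SAMPLE)

# Every <mantra>, at any depth, whose meter attribute is "gaayatrii".
gaayatrii = root.findall(".//mantra[@meter='gaayatrii']")
print([m.get("id") for m in gaayatrii])

# The <mantra> children of any <hymn> addressed to agni.
to_agni = root.findall(".//hymn[@deity='agni']/mantra")
print(len(to_agni))

# A single element, picked out by its id.
print(root.find(".//mantra[@id='rv1.1.1b']").text)
```

Each path reads roughly like a street address: the element names give the route through the nesting, and the bracketed conditions narrow the choice by attribute.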

It is important to underscore that all of these technologies are standards. They are not "owned" by a corporation (though, predictably, Microsoft is trying to by implementing XML in their own unique way). Therefore, to begin working with XML is not the same thing as upgrading in response to the commercial forces of the digital market place, only to be out of date before you've even received the bills.

The basic tagging technology behind XML has existed since before SGML, and is a reliable international standard (ISO 8879). XML has been designed to be compatible with older systems (you may not get all the bells and whistles, but you can never be at risk of not being able to read your carefully constructed data). In short, your time invested in X-nology is a long-term investment, not some new cyber-flavor-of-the-month.

If you want to begin tagging, you can use a plain text editor such as the Microsoft Wordpad in your Windows/Programs/Accessories menu. There are plenty of tools for specific work with XML which work much better, however. If you are adding your own notes and comments, marking passages, or organizing your own files of data, then you will want to use an XML editor such as those at the links provided in the Appendix. For Windows, WordPerfect 2000 is the best and easiest to use. Of all these tools, it is the most "costly"-but still quite reasonable at less than $100 educational price. On a Mac, at the same price is the Pro version of Media Design in-Progress's Emilé. For free, on Windows there is a wide variety. I prefer the simple approach which concentrates on keystrokes rather than some fussy graphic interface, in a product called XED. On the Mac, the Lite version of Emilé is the best.

Important: the advantage of working with XML-automating software such as these is that they make it nearly certain that all the rules are followed, so your files will work the first time. If you just type your own, typos can introduce errors which violate the XML rules.
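If you do type your tags by hand, a parser can at least tell you where the first slip is. The Python sketch below, which assumes nothing beyond the standard library, reports the line and column at which parsing failed; the misspelled closing tag in the sample is deliberate, and the function name is my own.

```python
import xml.etree.ElementTree as ET

def first_error(source):
    """Return None if `source` is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(source)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

# Hand-typed sample with a deliberate typo in the closing tag on line 2.
TYPED_BY_HAND = """<hymn>
<mantra>agni'm iiLe puro'hitaM</mantar>
</hymn>"""

print(first_error(TYPED_BY_HAND))
```

The parser reports only the first problem it meets, so it is worth re-running the check after each correction until it returns nothing.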

But XML tagging is only part of the story. It is the indispensable core foundation, but the addition of a few XSLT commands is where the magic happens. The more detail—with attributes or additional elements—that you add, the more you can do with your text.

Endnotes

1 The first two sections were adapted from material previously prepared for http://vedavid.org and ATLA-CERTR, at http://purl.org/CERTR/ and http://www.oasis-open.org/cover/atlas.html.

2 http://www.asiatica.org/publications/ijts/vol4_special/examples.zip, also mirrored at http://vedavid.org/xml/docs/.

3 If you wish to advance in the esoteric ranks of XSLT interpretation, you must first find a guru, bring your load of firewood (paper, printed with numerical increments of fiscal significance in most cases), and sit patiently through long, arduous, demanding tapas. My own first "guru," G. Ken Holman, has yet to receive sufficient firewood, though these thanks must suffice.

4 Charles Goldfarb, who originally coined the term "markup," is the author of the original set of rules by which markup is performed, known as Standard Generalized Markup Language or SGML (Goldfarb and Prescod, 2000:back-jacket). To learn more, see http://www.oasis-open.org/cover/.

5 The first draft of this paper was dictated using Dragon Systems' Naturally Speaking on a laptop which was insufficiently powered. Accordingly, and somewhat germane to the current discussion of computers' ability to understand language, here is how the computer "heard" this example of how one of its ancestors failed to "understand:" 'severe it is willing, but the flash is weak,' produced "the wine is agreeable, but the beat is boiled." I should emphasize that Naturally Speaking is the sore typer's dream, and you can control your desktop with it—editing, switching programs—and train it to recognize specialized vocabulary (http://www.dragonsystems.com).

6 There is a well-known affinity between Sanskrit and computer technology. PaaNini has been a topic in the AI community for some time, specifically with regard to his formal system of logic in the grammatical sutras. Also noted by Chomsky (1956, 1957), it is a frequent allusion by computer scientists (Russell and Norvig, 1995:15; Ingerman, 1967:137), and is identified by Chomsky's type 2 grammar (Knuth, 1964:736) as a syntactic origin for propositional logic (Russell and Norvig, 1995:256, 685, cf. Briggs, 1985). My special thanks to Patrick Durusau, the digital wizard of Scholar's Press at Emory University, for one of the BNF references.

7 The definitive discussion of the often-misused nomenclature for markup is found in Goldfarb and Prescod's XML Handbook, 2nd ed. Specifically, "tag" is an acceptable reference for these markup items, and "element" refers to what is marked up. "Tag name" is technically not acceptable, though "element name" is; e.g., you can refer to "author" as the element name of the <author> tag, while <author>Shakespeare</author> in its entirety is an element. As you can see, this is indeed confusing for the beginner, so I refer to <author> as a tag, and the word "author" therein as the "tag, or element, name" in this paper (cf. Goldfarb and Prescod, 2000:71-72). Incidentally, I highly recommend this book, and Elliotte Rusty Harold's XML Bible, as the best resources for the beginner. The first edition of the XML Handbook is not a safe bet as the technology has developed significantly beyond that stage.

8 For examples--note that this is a 4-year-old web page, meant for research use more than presentation flair-- see http://vedavid.org/atha.html, for results, see http://vedavid.org/diss/.

9 This software can be used on Windows, Unix, or Mac, and is easy to learn; see http://www.w3.org/People/Raggett/tidy/. HTML with XML rules and extensible tags is known as a formal WWW specification called XHTML (Extensible HTML) and is a good way to start. There are a lot of resources on XHTML; begin with http://www.w3.org/TR/xhtml1/, and learn more from searching at http://www.xhtml.org/.

10 A general principle when you are working is always to err on the side of tagging more detail and information rather than less. Sometimes, also, it is easier to add more attributes to existing elements (see examples of how to do this in section two of this paper), so you don't have to worry about changing and remembering a complex element structure. E.g., with the RV, you are normally always going to mark mandalas, hymns, verses, and mantras. To do more detailed things, add attributes to the verse or hymn tags rather than adding more tags.

11 The XML-conformant version of HTML is a good way to start, with Tidy (mentioned in note 9), getting yourself comfortable with XML, see: http://www.w3.org/TR/xhtml1/#diffs8.

12 Thanks to E. Garzilli for furnishing the specifics of this analogy (e-mail, Thu, 06 Jan 2000 17:16:51 –0500).

13 There are additional rules and details, but most of these you will not or need not encounter. To introduce them here would detract from the fundamental simplicity of the basic essentials required for the majority of your work with XML. To learn more, follow the links from http://www.xml.com, or http://vedavid.org/xml/.

14 XML is also complemented by another standard currently nearing completion, called Schemas, which can accomplish many of the tasks managed by DTD's (see the World Wide Web Consortium for more at http://www.w3.org/TR/xmlschema-1/).