ThML: Theological Markup Language

For the Christian Classics Ethereal Library

Harry Plantinga

Version 0.8, Monday, July 20, 1998

Abstract

This document describes the Theological Markup Language (ThML), a markup language for theological texts. ThML was developed for use in the Christian Classics Ethereal Library (CCEL), but it is hoped that the language will prove useful for other applications as well. Key design goals are that the language should (1) represent the information needed for digital libraries and for theological study involving multiple, related texts, including cross-references, synchronization, indexing, and scripture references; (2) be based on XML and usable with World Wide Web tools; and (3) be easy to learn and use with common web tools.

Introduction

The study of theology uses texts in ways that are uncommon in other fields. Theological books usually make many references to the Bible: quotations, commentary, explanations, citations, and the like. Special processing for scripture references can greatly aid study. Theological study also often involves ancient texts available in multiple variations or translations, which may have to be synchronized and displayed in parallel columns. It involves the use of cross-reference systems such as Strong's numbers, various sorts of indexes, and the synchronization of multiple texts in various ways, for example layers of commentary on a text. Theological study often makes use of several texts related by subject or scripture reference, so tools that support library-wide searching by subject or scripture reference are also useful.

Existing markup languages are not well suited for non-commercial theological texts. Word processor formats don't represent semantic information about a text, an area in which HTML is also weak. This has a number of drawbacks; for example, searching, indexing, and converting to other formats are more difficult. The Text Encoding Initiative (TEI) language is semantically rich but neither easy to learn nor tuned for theological study. It doesn't offer special handling of scripture references or Strong's-like reference systems, for example. Also, the language is very large, and the overhead required to learn and handle it is high. Commercial formats, including STEP and the Logos Library System (LLS), are not designed for integration with the World Wide Web, and preparing texts for these systems requires expensive software, beyond the means of most individuals. Publication in one of these formats may also be controlled by the company or consortium in question. As a result, few public-domain or on-line texts are available in these formats.

This paper describes the Theological Markup Language, or ThML, which is a markup language for theological texts designed for use in the Christian Classics Ethereal Library (CCEL), an experimental theological library on the Internet. ThML is a superset of HTML and borrows some elements from TEI. It is also designed to handle all of the semantic information in STEP-format documents (version 0.9). Electronic texts for the CCEL will be prepared in Microsoft Word, using paragraph styles and XML elements to represent markup. The resulting files will be converted automatically into XML with ThML markup. These XML files may be used directly by XML-aware applications or converted into formats needed for end use, such as HTML webs, plain text files, PDF, and other formats.

Designing a Theological Markup Language

A language for theological study must handle the normal markup of text into headings, paragraphs, block quotes, emphasized text, and the like. This common markup can be represented with HTML. Markup needs for theological study that go beyond HTML include the special handling of scripture references, numbering and synchronization schemes such as Strongs numbers, handling multiple versions or translations of the same text, handling footnotes, index entries, lexicons, and representing page breaks in the original text.

For the use of digital libraries, bibliographic data about the text should be represented. In fact, bibliographic data about two texts may have to be represented: the electronic text and the book from which it originated. Such a language should also represent subject classifications, edit history, and other relevant meta-data. If the language also represents scripture references, subject index entries, and names, it is possible to build indexes for individual books as well as library-wide indexes of these references.

The language should be rich enough to support conversion to other electronic formats that may be needed. And, especially now that it is possible to print books on very short-run and one-off printing presses, the language should represent all information needed for printing the book. Finally, the language should be easy to learn and use and able to be processed with inexpensive, widely available tools. To top off the list of desirable features, making the language extensible and programmable would enable users to address additional needs.

I believe that the widespread availability of web browsers and programs, the widespread knowledge and use of HTML, the rapid development of tools for the Internet, the advent of the ability to work with XML documents in mainstream applications such as Microsoft Office, and the extensibility and programmability of XML make XML the best metalanguage on which to base ThML. This is all the more so given that most use of digital libraries is over the World Wide Web.

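Therefore, ThML is based on XML, and its design goals are those stated in the abstract: it should represent the information needed for digital libraries and for theological study of multiple, related texts; it should be usable with World Wide Web tools; and it should be easy to learn and use.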

Theological Markup Language

Based on XML and HTML

Because the Theological Markup Language is based on XML and is a superset of HTML, it supports all of the markup of HTML, the rich linking facilities of XLink (formerly called XLL), and stylesheets in XSL. HTML may be used for markup of paragraphs, headings, lists, tables, blockquotes, emphasis, etc. Links may use the extended link types of XLink, and formatting will be specified in XSL. These facilities will be used wherever possible, to make the language easier to learn for those who already use HTML and easier to use with the World Wide Web. The restrictions of XML also apply to ThML documents: elements must be properly nested; element and attribute names are case-sensitive, as are attribute values; attribute values must be quoted; every element must be closed, either with an end tag or with the empty-element form, as in <pb n="37"/>; etc.

Document Structure

ThML documents are contained in a single root ThML element, and, like HTML documents, they contain a head and a body section.

<ThML>

<head> … </head>

<body> … </body>

</ThML>

The body element contains all of the contents of the print edition of the book on which the text is based. It will be described in detail below. The head element contains bibliographic and meta information about the text, both the electronic publication and the print edition upon which it is based, if any. The information in this section is not generally a part of the original book but taken from MARC records or added to document the publication of the electronic version. It may also include keywords, information about the program used to generate the text, etc.

Any HTML element that may occur in the head element may occur in the head of a ThML document as well. The additional contents of the head element in ThML are described below. ThML, head, and body are all required elements of a ThML document -- in fact, the only required elements.

Divisions of the Text

Structural divisions in the body of the text are marked with <divn> elements (<div1>, <div2>, <div3>, and so on, where n is the level of the division), as in this example:

<div1 title="The Imitation of Christ">

<div2 type="Book" title="Admonitions on Things Internal" n="2">

<div3 type="Chapter" n="1" title="Of the Inward Life">

<div4 type="Section" n="I">

</div4> </div3> </div2> </div1>

<div1> is used for top-level parts of a text, including, for example, a title page, preface, table of contents, chapters, index, etc. Additional levels are used for lesser structural divisions of the document. These structural divisions show the structure of the original text; they are also used to prepare a table of contents, to split a text into files, and to provide access to a text by section.

The optional title attribute is used in constructing a table of contents and may be used for running heads or other identification purposes. The optional type and n attributes may be used to specify the type and number of the division. If they are present, they will also be used to identify the section in the table of contents.

Whereas an XML file contains all of the information relevant to an electronic text in a single file, books in HTML format are generally split into a number of files to facilitate downloading and browsing across the Internet. Typically, these splits occur according to some algorithm based on minimum and maximum desired segment sizes, breaking the text at divn tags wherever possible. Note that since XML elements must be properly nested, and since the only ancestors of a divn element are the ThML, body, and other divn elements, the contents of a divn, wrapped in <ThML><head></head><body>…</body></ThML>, form a legal ThML document. The divn elements must be properly nested, though levels may be skipped. All additional text and markup in the body must be inside divn elements.

The division elements may be used to identify locations in a text. Together with the unique publisher, author, and book codes described below, they provide a location-independent means of referring to a part of a text that is intended to remain valid even if file locations and servers change. For example, one could refer to the CCEL edition of Augustine's Confessions, Book 10, Section 5, Paragraph 3 by giving the publisher ID assigned by the CCEL and the author and book IDs assigned by the publisher, followed by the div numbers and optional paragraph and word numbers: ccel/augustine/confessions#X.5.p3. In this case, ccel is the publisher code, augustine and confessions are identifiers for the author and book, the X refers to the <div2 n="X"> section, the 5 to the <div3 n="5"> subsection, and the p3 to the third paragraph of that subsection. Appending .w7 would refer to the seventh token (word) of that paragraph. Another point in the same document may be referred to by omitting the publisher, author, and book IDs.
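A location reference of this form can be split mechanically. The Python sketch below is one possible reading of the syntax, not a defined ThML API: parse_location and Location are hypothetical names, it assumes a same-document reference is written starting with "#", and it assumes division numbers never themselves look like p3 or w7.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    publisher: Optional[str]   # e.g. "ccel"; None for a same-document reference
    author: Optional[str]      # e.g. "augustine"
    book: Optional[str]        # e.g. "confessions"
    divs: List[str]            # div n values, outermost level first
    paragraph: Optional[int] = None
    word: Optional[int] = None


def parse_location(ref: str) -> Location:
    """Split a reference such as "ccel/augustine/confessions#X.5.p3.w7"."""
    doc, _, frag = ref.partition("#")
    ids: List[Optional[str]] = doc.split("/") if doc else [None, None, None]
    if len(ids) != 3:
        raise ValueError(f"expected publisher/author/book in {ref!r}")
    parts = frag.split(".") if frag else []
    paragraph = word = None
    # Optional trailing ".w<k>" (word) and ".p<k>" (paragraph) components.
    if parts and re.fullmatch(r"w\d+", parts[-1]):
        word = int(parts.pop()[1:])
    if parts and re.fullmatch(r"p\d+", parts[-1]):
        paragraph = int(parts.pop()[1:])
    return Location(*ids, divs=parts, paragraph=paragraph, word=word)
```

For example, parse_location("ccel/augustine/confessions#X.5.p3.w7") yields divs ["X", "5"], paragraph 3, and word 7.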

The <insertContents level="2"/> element may be used to insert a table of contents at that location; the level attribute gives the deepest division level to include. In the example above, the title attribute of the <div1> tag would be used as the title of the table of contents, and all of the <div1> and <div2> entries would be gathered and listed hierarchically. Each entry would be linked to the appropriate section.
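The gathering step can be sketched in Python with the standard xml.etree.ElementTree module. This is an illustration only: table_of_contents is a hypothetical helper, it assumes that level names the deepest divn level to include, and it produces indented text lines rather than linked entries.

```python
import re
import xml.etree.ElementTree as ET
from typing import List


def table_of_contents(xml_text: str, level: int) -> List[str]:
    """Return one indented line per <div1>..<div{level}>, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():                      # pre-order = document order
        m = re.fullmatch(r"div(\d+)", elem.tag)
        if not m or int(m.group(1)) > level:
            continue
        depth = int(m.group(1))
        # Label from the optional type and n attributes, e.g. "Book 2".
        label = " ".join(v for v in (elem.get("type"), elem.get("n")) if v)
        title = elem.get("title")
        if title:
            label = f"{label}: {title}" if label else title
        lines.append("  " * (depth - 1) + label)
    return lines
```

Applied to the Imitation of Christ example with level 2, this would list the <div1> title followed by an indented "Book 2: Admonitions on Things Internal".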

Page Breaks

It is often useful to know the page breaks of the print edition of a book. They may be used as targets for subject index entries identified by page number, or to display a text with the pagination of the print edition. Page breaks are marked by inserting empty <pb/> elements, with the n attribute giving the page number of the upcoming page (<pb n="37"/> or <pb n="xii"/>). These elements should appear at the start of the identified page.

Many electronic texts will also have images of pages available on line. The pb element also takes an href attribute specifying a URI for an image of the page (<pb n="37" href="gif/0021a.gif"/>). So that it is not necessary to add the attribute to every pb element, incrementing rules are defined as follows: if the href attribute is present for one page break and not for subsequent ones, the missing hrefs are interpolated by incrementing once for each subsequent <pb> element. Increments are performed by adding one to the rightmost numeric component of the filename section of the URI, i.e. the portion of the URI to the left of the hash mark (#), if there is one. If that numeric component is followed by the letter a or b, the letter is incremented as well, according to the sequence 0001a, 0001b, 0002a, 0002b, etc. Thus, http://ccel2a.wheaton.edu/edwards/vol1/0001.gif would become http://ccel2a.wheaton.edu/edwards/vol1/0002.gif, and vol1/001b.html#x47 would become vol1/002a.html#x47.
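The incrementing rules above are mechanical enough to sketch in a few lines of Python. The names next_page_href and fill_hrefs are hypothetical; the sketch assumes that a "numeric component" is a run of digits and that zero-padding should be preserved, as in the examples.

```python
import re
from typing import List, Optional

# Rightmost run of digits, with an optional "a"/"b" side letter, that has
# no further digits after it (the lookahead checks the rest of the string).
_LAST_NUMBER = re.compile(r"(\d+)([ab]?)(?=\D*$)")


def next_page_href(href: str) -> str:
    """Return the href of the following page image."""
    base, sep, fragment = href.partition("#")   # only the part before "#" changes
    m = _LAST_NUMBER.search(base)
    if m is None:
        raise ValueError(f"no numeric component in {href!r}")
    digits, side = m.groups()
    if side == "a":                              # 0001a -> 0001b
        new = digits + "b"
    else:                                        # 0001 -> 0002, 0001b -> 0002a
        new = str(int(digits) + 1).zfill(len(digits)) + ("a" if side == "b" else "")
    return base[: m.start()] + new + base[m.end():] + sep + fragment


def fill_hrefs(hrefs: List[Optional[str]]) -> List[Optional[str]]:
    """Interpolate missing hrefs (None) for <pb> elements in document order."""
    result: List[Optional[str]] = []
    previous: Optional[str] = None
    for href in hrefs:
        if href is None and previous is not None:
            href = next_page_href(previous)
        result.append(href)
        previous = href
    return result
```

For instance, fill_hrefs(["gif/0021a.gif", None, None]) would supply gif/0021b.gif and gif/0022a.gif for the two page breaks that lack an href.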

Scripture References

In theological texts, scripture passages may be cited, quoted, or explained. A citation refers to a passage, while a quotation includes the text of the passage in the document. For citations, presentation software should provide an easy way to see the cited passage in a specified translation or a translation of choice, perhaps with a hypertext link or a separate, synchronized window of notes and references. It is also helpful to provide an index of scripture quotations and citations for a text, and perhaps for a whole library.

Scripture references may occur in footnotes, in parentheses (Phil. 2:1-11), or in the text itself -- see Rom. 8:28. Context may be needed in order to interpret a reference -- see verse 29 and 10:8-13. Several passages may be stacked together in one citation (Matt. 5:44, 46; Luke 7:42; John 5:42, 13:35, 14:15, 23; 15:12-13; 21:15-16). For marking scripture citations, ThML will use the <scripRef> element, as in this example:

<scripRef passage="Rom. 8:27,28; 10:8-13" version="NIV">Romans
8:27,28; 10:8-13
</scripRef>

The version attribute specifies the translation or version, and the passage attribute is a list of scripture references separated by commas or semicolons. Each reference may consist of a book name (or abbreviation), a chapter, and a verse. The chapter and verse are separated by a colon or period. If the book name or chapter is missing, it is assumed to be the same as in the previous reference. If two references are separated by a dash, all of the intermediate verses are included as well. In the case of books with only one chapter, a reference consists of a book name or abbreviation and a verse. Book names should be given as they appear in the version in use, or as a unique prefix of at least two letters of the name. Abbreviations that are not prefixes may also be accepted by programs that process ThML documents.
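A minimal parser for the passage attribute might look like the following Python sketch. It is not the CCEL parser: parse_passage and Ref are hypothetical, book names are kept as written without being checked against any version, a book with a lone number (e.g. "Romans 8") is read as a whole chapter, and single-chapter books such as Jude (book plus verse) are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Ref:
    book: str
    chapter: int
    verse: Optional[int]                             # None means the whole chapter
    end: Optional[Tuple[int, Optional[int]]] = None  # (chapter, verse) ending a dash range


# Optional book name ("Rom.", "1 Cor.", "Song of Songs"), then "C:V", "C.V", or one number.
_START = re.compile(r"(?P<book>(?:[1-3]\s*)?[A-Za-z][A-Za-z. ]*?)?\s*"
                    r"(?P<a>\d+)(?:\s*[:.]\s*(?P<b>\d+))?")
_END = re.compile(r"(?P<a>\d+)(?:\s*[:.]\s*(?P<b>\d+))?")


def parse_passage(passage: str) -> List[Ref]:
    refs: List[Ref] = []
    book: Optional[str] = None
    chapter: Optional[int] = None
    for item in re.split(r"[,;]", passage):
        item = item.strip()
        if not item:
            continue
        start, _, stop = item.partition("-")
        m = _START.fullmatch(start.strip())
        if m is None:
            raise ValueError(f"cannot parse {item!r}")
        if m["book"]:
            book = m["book"].strip()
        a, b = int(m["a"]), m["b"]
        if b is not None:                # "C:V" or "C.V"
            chapter, verse = a, int(b)
        elif m["book"]:                  # "Book C": a whole chapter
            chapter, verse = a, None
        elif chapter is not None:        # lone number: verse in the current chapter
            verse = a
        else:
            raise ValueError(f"no chapter context for {item!r}")
        if book is None:
            raise ValueError(f"no book context for {item!r}")
        end = None
        if stop:
            e = _END.fullmatch(stop.strip())
            if e is None:
                raise ValueError(f"cannot parse range end in {item!r}")
            if e["b"] is not None:       # "...-C:V"
                end = (int(e["a"]), int(e["b"]))
            elif verse is None:          # "Gen. 1-3": a range of chapters
                end = (int(e["a"]), None)
            else:                        # "8-13": verses in the same chapter
                end = (chapter, int(e["a"]))
        refs.append(Ref(book, chapter, verse, end))
        if end is not None:
            chapter = end[0]             # later lone verses continue from the range end
    return refs
```

With the example above, parse_passage("Rom. 8:27,28; 10:8-13") yields Romans 8:27, Romans 8:28, and Romans 10:8 through 10:13, with the book and chapter carried forward as the text describes.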

Software for processing ThML texts will likely incorporate a scripture parser that finds scripture citations and marks them appropriately, so that it will not be necessary to mark them all by hand. However, parsing text to find and identify scripture references involves several difficulties. One problem is that different translations of the Bible use different versification schemes. For example, Psalms 9 and 10 in the King James Version are combined into a single Psalm 9 in the Septuagint. In order to interpret a reference, the versification scheme used must be known. Scripture references will be assumed to be compatible with the versification scheme used by the KJV, ASV, NASB, NIV, and TLB unless otherwise specified.

Also, context is sometimes necessary in interpreting a reference. A passage may refer to Romans 8:28 at one point and later to verses 29 and 30 and to chapter 10:8-13. A parser should be able to identify the context in most cases, but in some cases it may be necessary to set the context or turn the parser off. The <scripContext version="NIV" passage="Romans 8"/> element is used to set the default context for the parser, and the <scripParseOff/> and <scripParseOn/> elements may be used to turn the parser off or on, for example to prevent linking in a passage such as "Bob had 2 apples and John 3." The version attribute may be set in a scripContext element, but it is never set by the parser.
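The effect of these milestone elements on a parser can be sketched as a walk over the document that tracks the current state. In this hypothetical Python helper, parser_segments returns each run of text with a flag saying whether the parser is on and the passage of the most recent scripContext; it assumes that text inside an existing <scripRef> is never re-parsed.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def parser_segments(xml_text: str) -> List[Tuple[str, bool, Optional[str]]]:
    """Split text into (text, parser_enabled, context_passage) runs, in document order."""
    root = ET.fromstring(xml_text)
    state = {"on": True, "context": None}
    runs: List[Tuple[str, bool, Optional[str]]] = []

    def emit(text: Optional[str]) -> None:
        if text:
            runs.append((text, state["on"], state["context"]))

    def walk(elem: ET.Element) -> None:
        if elem.tag == "scripParseOff":
            state["on"] = False
        elif elem.tag == "scripParseOn":
            state["on"] = True
        elif elem.tag == "scripContext":
            state["context"] = elem.get("passage", state["context"])
        if elem.tag == "scripRef":       # already marked: not handed to the parser
            return
        emit(elem.text)
        for child in elem:
            walk(child)
            emit(child.tail)             # text after a milestone uses the updated state

    walk(root)
    return runs
```

A parser would then examine only the runs whose flag is true, interpreting bare verse numbers against the accompanying context.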

In theological texts, scripture is also sometimes quoted. In this case, it is not desirable to link the reference to the scripture passage, but it may be desirable to incorporate the passage into a table of scripture references. Quotations of scripture may be marked with the <scripture> element. A passage may be represented as in this example:

<scripture passage="Mark 7:16" version="NKJV">If anyone has ears to hear, let him hear!</scripture>

This markup may also be used for a translation or version of a book or a whole Bible, perhaps as in the example below. Scripture marked in this way could be automatically retrieved by book, chapter, and verse with an appropriate program.

<scripContext version="Calvin's Translation, in English"/> …

<scripContext passage="Romans 8"/> …

<scripture passage="28">We further know, that to those who love God all things co-operate for good, even to those who are called according to <I>his</I> purpose:</scripture>

<scripture passage="29">for those whom he has foreknown, he has also predetermined to be conformed to the image of his Son, that he might be the first born among many brethren;</scripture>
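Marked this way, scripture can be looked up by reference. The Python sketch below (index_scripture is a hypothetical name) walks a document in order, applies the most recent scripContext version and passage, and keys each <scripture> element by version and full reference. It assumes that a passage consisting only of digits is a verse within the current context chapter.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def index_scripture(xml_text: str) -> Dict[Tuple[Optional[str], str], str]:
    """Map (version, reference) to the text of each <scripture> element."""
    root = ET.fromstring(xml_text)
    version: Optional[str] = None
    chapter: Optional[str] = None
    index: Dict[Tuple[Optional[str], str], str] = {}
    for elem in root.iter():                      # document (pre-)order
        if elem.tag == "scripContext":
            version = elem.get("version", version)
            chapter = elem.get("passage", chapter)
        elif elem.tag == "scripture":
            passage = elem.get("passage", "")
            if passage.isdigit() and chapter:     # bare verse: resolve against context
                passage = f"{chapter}:{passage}"
            key = (elem.get("version", version), passage)
            index[key] = "".join(elem.itertext()).strip()
    return index
```

Because itertext() includes the text of nested elements such as <I>, the stored verse text keeps words that were only marked for emphasis.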

Explanation of or commentary on a passage involves a semantic relationship between the explanation and the passage explained. This relationship should be represented in the text in order to be able to build an index of scripture commentary. For example, it would be useful to be able to see everything the early church fathers said or preached on a passage. Commentary on or explanation of a passage is marked with the <scripCom> element, as in this example:

<scripCom passage="Mark 7:16">Mark 7:16. This admonition seems to apply to most everyone . . .</scripCom>

Cross Referencing Schemes and Synchronization

Cross referencing is the ability to find related passages in separate texts. Cross referencing in theological texts takes many forms. They include simple links such as those that can be handled by an HTML anchor; numeric or symbolic indexing schemes such as dates, Strongs numbers, scripture references, or subject index entries; annotation such as footnotes and commentary; different translations of the same text, etc. In this section we will define markup for handling symbolic cross-referencing schemes other than those that can be handled as ordinary links, scripture references, or annotation.

Standardized symbolic cross-reference schemes such as dates, keywords, or Strong's numbers aren't really links to other documents, because any two documents with compatible cross-reference schemes can be linked together and no particular documents are intended. Therefore XLL links, element IDs, etc. don't capture the semantics of such information. We define a new sync element to represent this information. For example, the element

<sync type="Strongs" value="G42"/>

might be used to represent a Strong's number at a location in a text. The sync element can be used in either the empty (<sync/>) or non-empty (<sync></sync>) form.

Software tools may use this information in a variety of ways. For example, a program could find other passages on related topics or create an index based on Strong's numbers. With appropriate software, multiple manuscripts of the same original text could be aligned this way and displayed in parallel columns.

The scheme names given in the type attribute are not predefined; applications may invent new synchronization types for specific purposes. For example, the Rule of Benedict is available in several different manuscripts in two different traditions. If a common synchronization scheme were defined and the manuscripts marked up accordingly, any two or more of them could be selected and aligned as parallel columns on the screen, or alternate forms of a passage could be located.
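Alignment by synchronization points can be sketched as follows. In this hypothetical Python example, segments_by_sync collects the text following each <sync> element of a given type, and align pairs up the segments two documents share. The sync type "RB" in the usage test is invented for illustration, since ThML does not predefine scheme names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def segments_by_sync(xml_text: str, sync_type: str) -> Dict[str, str]:
    """Map each sync value of the given type to the text that follows it."""
    root = ET.fromstring(xml_text)
    segments: Dict[str, str] = {}
    current: Optional[str] = None          # value of the most recent sync point

    def add(text: Optional[str]) -> None:
        if text and current is not None:
            segments[current] += text

    def walk(elem: ET.Element) -> None:
        nonlocal current
        if elem.tag == "sync" and elem.get("type") == sync_type:
            current = elem.get("value")
            segments.setdefault(current, "")
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)

    walk(root)
    return segments


def align(doc_a: str, doc_b: str, sync_type: str) -> List[Tuple[str, str, str]]:
    """Pair up the segments two documents share, in the order of doc_a."""
    a = segments_by_sync(doc_a, sync_type)
    b = segments_by_sync(doc_b, sync_type)
    return [(value, a[value].strip(), b[value].strip()) for value in a if value in b]
```

Each tuple in the result corresponds to one row of a parallel-column display; sync points found in only one document are simply skipped.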

Annotation

Footnotes or endnotes occur frequently in books and are not well supported in HTML. A common strategy for handling notes is to store them in a separate file, with links back and forth between the text and the notes. A drawback of this approach is that to see a footnote it is necessary to unload the current page and load a page of footnotes -- and reverse the process to get back to the main document. This process is slow, and little semantic information about the relationship between the text and the notes is stored.

In a ThML document, footnotes, endnotes, etc. are all marked with the <note> tag, following the syntax used by TEI Lite. The note element may take the following attributes: place, resp, target, targetEnd, and anchored. The place attribute specifies how it appears in the text (e.g. end, foot, inline, or margin). The target (and targetEnd) attributes refer to the start (and end) of the text being annotated, if the note does not occur in the text at its reference point. These attributes allow the notes to be gathered at the end of a chapter or file if desired. The resp attribute identifies the person responsible for the note -- for example, the author, editor, or a person's initials. The anchored attribute specifies that the note is anchored at an exact location; margin notes typically are not anchored.

In general, it is probably best to insert notes at the point in the text to which they refer. However, if the XML text will ever be displayed directly by an HTML web browser, the text of each note would appear in the body of the text. In that case, it may be better to gather the notes at the bottom of each section. On the other hand, a ThML-to-HTML conversion program could easily handle notes inserted at the point of reference.
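A conversion program might gather notes roughly as follows. This Python sketch is hypothetical (gather_notes is not part of any ThML tool): it moves every <note> in a division, regardless of its place attribute, to a notes block at the end of that division and leaves a numbered link in its place.

```python
import xml.etree.ElementTree as ET


def gather_notes(div: ET.Element) -> None:
    """Move every <note> under div into a notes block at the end of div."""
    # Collect (parent, note) pairs first, so the tree is not modified while iterating.
    pairs = [(parent, child) for parent in div.iter() for child in parent if child.tag == "note"]
    notes_block = ET.SubElement(div, "div", {"class": "notes"})
    for n, (parent, note) in enumerate(pairs, start=1):
        marker = ET.Element("a", {"href": f"#note{n}", "id": f"ref{n}"})
        marker.text = f"[{n}]"
        marker.tail = note.tail            # keep the text that followed the note
        index = list(parent).index(note)
        parent.remove(note)
        parent.insert(index, marker)
        entry = ET.SubElement(notes_block, "p", {"id": f"note{n}"})
        entry.text = f"{n}. " + "".join(note.itertext()).strip()
```

A fuller converter would probably honor the place attribute, for example leaving margin and inline notes where they are.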

The <note> element could also be used to store commentary, margin scrawls, and the like for a text in a separate file. In that case, the target and targetEnd attributes would be references to a point in another document.

Foreign Languages

The primary language for a document is specified in the header. Passages in other languages may be marked with the foreign tag and the lang attribute. For example, the Greek passage <foreign lang="el">logos</foreign> may be marked as shown. "lang" attribute values are as specified in ISO 639. Some examples are Dutch: nl, English: en, French: fr, German: de, Greek: el, Hebrew: he, Latin: la, Spanish: es, Portuguese: pt, Russian: ru. Note that the lang attribute may also be used with most HTML elements.

If the language uses characters not available in the ISO-8859-1 (Latin-1) character set, they may be represented in Unicode or in ISO 8859 using an appropriate font. An example of the latter situation is as follows: <foreign lang="el"><font face="Symbol">logos</font></foreign>. The Greek and Hebrew fonts used for the CCEL are the excellent, freeware SIL Galatia and SIL Ezra fonts and related software from the Summer Institute of Linguistics, used here in a Greek example (logov) and a Hebrew example (hwhy). The latter method depends upon the availability of a particular font to the client. We expect the Unicode representation to become the standard when Unicode support in common web-related tools improves.

Verse

Theological books often contain verse -- poetry, hymns, or versified presentation of material such as the Psalms. A stanza, verse, or other unit of verse is encoded in a <verse> element. Verse is often written with varying levels of line indentation. Lines are marked with <l>, <l2>, and <l3> elements, identifying relative levels of indentation. In the example below, the indentation is of course ignored by the XML parser, but it should be reproduced by the presentation software based on the <l>, <l2>, and <l3> tags.

<verse> <l>O God, a world of empty show,</l>
<l2>Dark wilds of restless, fruitless quest</l2>
<l>Lie round me wheresoe'er I go: </l>
<l3>Within, with Thee, is rest.</l3>

</verse><verse>

<l>And sated with the weary sum</l>
<l2>Of all men think, and hear, and see, </l2>
<l>O more than mother's heart, I come, </l>
<l3>A tired child to Thee. </l3>

</verse><verse>

<l>Sweet childhood of eternal life! </l>
<l2>Whilst troubled days and years go by, </l2>
<l>In stillness hushed from stir and strife, </l>
<l3>Within Thine Arms I lie. </l3>

</verse><verse>

<l>Thine Arms, to whom I turn and cling</l>
<l2>With thirsting soul that longs for Thee; </l2>
<l>As rain that makes the pastures sing, </l>
<l3>Art Thou, my God, to me. </l3>

</verse>

<attr><name>G. Ter Steegen</name></attr>

Attributions and Names

Attributions of authors of poetry, letters, etc. may be marked with the <attr> element. This is typically rendered right-justified. Also, names may be marked with the <name> element. When they are thus marked, an index of names can be automatically constructed. A different representation of the name for the index may be specified with the title attribute: <name title="Ter Steegen, Gerhard">G. Ter Steegen</name>.
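Building such an index is a matter of walking the document in order. The hypothetical Python helper below records, for each name, the print pages on which it occurs, taking the page from the most recent <pb> element and using the title attribute as the index form when it is present.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Optional


def name_index(xml_text: str) -> Dict[str, List[Optional[str]]]:
    """Map each indexed name to the print pages (<pb n>) on which it occurs."""
    root = ET.fromstring(xml_text)
    page: Optional[str] = None                  # most recent page break seen
    index: Dict[str, List[Optional[str]]] = defaultdict(list)
    for elem in root.iter():                    # document order
        if elem.tag == "pb":
            page = elem.get("n")
        elif elem.tag == "name":
            # The title attribute, if present, is the form used in the index.
            key = elem.get("title") or "".join(elem.itertext()).strip()
            if page not in index[key]:
                index[key].append(page)
    return dict(sorted(index.items()))
```

Names that occur before any page break are recorded with a page of None.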

Index Entries and Indexes

Passages in the text may be marked for insertion into an index using the <index> element. For example, one might mark a passage for inclusion in a subject index this way:

<index type="subject" subject1="Christian Life" subject2="Sanctification" subject3="Apotheosis">Apotheosis (or Deification) is an ancient theological word commonly used in Eastern theology to describe the process by which a Christian becomes more like God . . . </index>.

The optional title attribute gives the heading under which the entry is listed in the generated index. If it is not present, the text inside the <index> element is used as the title.

The <index> element may also be used in the empty form, <index/>, when the reference is to a point in the text rather than a section. A document may use several user-selected types of index entries. An empty element, <insertIndex type="subject"/>, is also provided to specify that a sorted, hierarchical index of all the index entries of that type ("subject" in this case) should be inserted at that point, with links to the appropriate locations in the text. Certain special types are understood automatically: <insertIndex type="name"/> inserts an index of all names marked with the <name> element; <insertIndex type="foreign"/> inserts an index of foreign-language words and phrases marked with the <foreign> element and/or the lang attribute; and <insertIndex type="image"/> inserts an index of images inserted with the <img> tag, using their title attributes as titles in the index.
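The hierarchy of an inserted index follows from the subject1, subject2, and subject3 attributes. This Python sketch (subject_index is a hypothetical name) builds the sorted, indented headings for one index type; it omits the links to text locations that a real implementation would generate.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def subject_index(xml_text: str, index_type: str = "subject") -> List[str]:
    """Return the sorted, indented headings of all <index> entries of one type."""
    root = ET.fromstring(xml_text)
    tree: Dict[str, dict] = {}
    for entry in root.iter("index"):
        if entry.get("type") != index_type:
            continue
        node = tree
        for attr in ("subject1", "subject2", "subject3"):
            heading = entry.get(attr)
            if heading:                          # skip missing levels
                node = node.setdefault(heading, {})
    lines: List[str] = []

    def render(node: dict, depth: int) -> None:
        for heading in sorted(node):
            lines.append("  " * depth + heading)
            render(node[heading], depth + 1)

    render(tree, 0)
    return lines
```

Entries of other types are ignored, so the same document can feed several independent indexes.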

Terms, Definitions, and Glossaries

Some documents contain a glossary. Glossaries may be marked up with the <glossary> element and HTML <dl> (definition list), <dt> (term) and <dd> (definition) elements.

<glossary>

<dl>

<dt>Apotheosis</dt>

<dd>An ancient theological word used to describe the process by which a Christian becomes more like God</dd>

</dl>

</glossary>

Software tools will likely be provided for linking two documents (which may be the same), using the glossary in one and the text in another. Words of the text defined in the glossary could be footnoted, underlined and linked, or defined in a separate window.
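One such tool might start like this. The Python sketch below is hypothetical: glossary_terms reads <dt>/<dd> pairs from a <glossary>, and find_defined_terms finds whole-word, case-insensitive occurrences of those terms in plain text. It does not handle inflected forms such as plurals.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def glossary_terms(xml_text: str) -> Dict[str, str]:
    """Collect term -> definition from <dt>/<dd> pairs inside <glossary>."""
    root = ET.fromstring(xml_text)
    terms: Dict[str, str] = {}
    for glossary in root.iter("glossary"):
        for dl in glossary.iter("dl"):
            children = list(dl)
            for dt, dd in zip(children, children[1:]):
                if dt.tag == "dt" and dd.tag == "dd":
                    terms["".join(dt.itertext()).strip()] = "".join(dd.itertext()).strip()
    return terms


def find_defined_terms(text: str, terms: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return (offset, term) for each whole-word, case-insensitive occurrence."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits.extend((m.start(), term) for m in re.finditer(pattern, text, re.IGNORECASE))
    return sorted(hits)
```

The offsets returned could then drive whichever presentation is chosen: a footnote, an underlined link, or a definition in a separate window.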

Additions and Deletions

The <body> section of the electronic text should have all of the contents of the print edition. However, for display purposes, it may be desirable to add to or delete from the print edition. For example, it may be desirable to delete the original table of contents and replace it with one that is automatically generated. The <added> element is used to mark sections that have been added and do not appear in the print edition, and the <deleted> element is used to mark sections of the print edition that should not be displayed. For example, a table of contents might be marked this way:

<added>

<H1>Table of Contents</H1>

<insertContents level="2"/>

</added>

<deleted>

[original table of contents here]

</deleted>
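The effect of the two elements on presentation can be sketched in Python. In this hypothetical helper, view_text returns the plain text of a division as displayed (dropping <deleted> content) or as printed (dropping <added> content); the text following either element is always kept.

```python
import xml.etree.ElementTree as ET


def view_text(elem: ET.Element, edition: str = "display") -> str:
    """Plain text of elem as displayed ("display") or as printed ("print")."""
    skip = "deleted" if edition == "display" else "added"
    parts = []

    def walk(e: ET.Element) -> None:
        if e.tag != skip:                    # contents of the skipped element are dropped
            parts.append(e.text or "")
            for child in e:
                walk(child)
        parts.append(e.tail or "")           # text after an element always survives

    walk(elem)
    return "".join(parts)
```

Because both views come from one file, the print edition remains recoverable even after a generated table of contents replaces the original.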

Header Information

The head section of a ThML etext has the most detailed (and least frequently used) markup. In a practical ThML software system, much of this information will be filled in by making entries in a form or template, including pasting the MARC record for the print source into the form. The head section may start with some HTML elements, such as <title> and <meta>. In addition, it has three optional sections that are unique to ThML: <generalInfo>, <electronicEdInfo> and <printSourceInfo>.

The <generalInfo> section contains information about the text that is not specific to the electronic edition or the print edition on which it is based. Whenever possible, its components are filled in from the information in the MARC record. It may contain these elements:

<generalInfo>

<author>…</author> These first fields are taken from the MARC record

<title> [Closing tags (omitted for readability) are required]

<editor>

<translator>

<edition>

<notes>

<LCNumber>

<DeweyNumber>

<subjects>

<firstPublished> Date first published in any edition

<primaryLanguage> Primary language of the text

<otherLanguages> All other languages used in the text

<copyrightComments> Added by producer of electronic edition, for example, that a copyright renewal search was performed, with negative result

<description> Textual description of book, "blurb"

<pubHistory> Any available information on the publication history

<comments> Any other comments

</generalInfo>

The printSourceInfo section contains information specific to the print source from which the electronic text was derived, if there is one. The elements it may contain are these:

<printSourceInfo>

<publisher>

<pubLocation> Publisher location

<pubDate> Publication date

<copyright>

<seriesName>

<volume>

<ISBN>

<frontImageURL> Electronic photograph of front of book, ~ 200 pixels wide

<spineImageURL> Electronic photograph of spine of book, ~ 200 pixels high

<copyLocation> E.g. Buswell library, 231.4 b29h c.2.

<sourceURLbase> Base URL of print source, e.g. page scans of book

<MARCrecord> Machine readable version of MARC record

<MARCformatted> Formatted text version of MARC record, as returned by the Library of Congress' Z39.50 gateway (http://lcweb.loc.gov/z3950/gateway.html)

<comments>

</printSourceInfo>

The electronicEdInfo section contains information specific to the electronic edition, such as publication information, editorial practices and status, etc. It may contain the following elements:

<electronicEdInfo>

<URL> On-line location where text is published

<scanner> Person who scanned and OCRed the book

<typist> Person who typed the book, e.g. Kathy Sewell (ksewell@gate.net)

<source> Other source for electronic text, e.g. Wiretap

<sourceURL> URL for source

<proofreader> People who proofread the text

<markup> Person who applied ThML markup

<editorialComments> Comments about editorial practices: whether spelling was normalized, what was done with end-of-line hyphens, corrections that were made, tagging practices, etc.

<revisionHistory> A list of published editions and changes between them

<status> Current status of text -- e.g. This text still needs proofreading

<publisherID> Publisher code of electronic edition, as assigned by the CCEL, e.g. ccel

<pubDate> Date of publication, YYYY-MM-DD format

<authorID> Author ID as assigned by publisher

<bookID> Book ID as assigned by publisher

<copyright> Copyright statement for electronic edition

<version> Edition or version of electronic edition, e.g. 1.1

<ISBN> ISBN of electronic edition, if available

<MARCrecord> MARC record for electronic edition

<comments> Other comments

</electronicEdInfo>

Alphabetical List of ThML Elements in Body

The following list contains the special XML elements that may be used for ThML markup in the body section of a text. Elements that occur in HTML may also be used, but they are not listed here.

Name

Use

Example

added

Text added to print edition

<added><insertContents level="2"/></added>

deleted

Text from print edition that should not be displayed

<deleted><H1>Table of Contents</H1>…</deleted>

divn

Major divisions in text

<div2 type="Chapter" n="I" title="Of the Inward Life">

index

Index entry

<index type="subject" subject1="Christian Life" subject2="Sanctification">Apotheosis</index>

insertContents

Insert table of contents here

<insertContents level="2"/>

insertIndex

Insert index here

<insertIndex type="foreign"/>

foreign

Foreign language passage

<foreign lang="el">Logos</foreign>

glossary

Mark a glossary

<glossary><dl><dt>…</dt><dd>…</dd></dl></glossary>

l

Line of verse

<l>O God, a world of empty show,</l>

l2

Line of verse (indented)

<l2>Dark wilds of restless, fruitless quest</l2>

l3

Line of verse (more indented)

<l3>Within, with Thee, is rest.</l3>

name

A person's name

<name title="Bernard, St.">St. Bernard</name>

note

Footnotes, endnotes, etc.

<note place="foot" resp="Editor" target="#p1" targetEnd="#p2" anchored="yes">…</note>

pb

Page break in print edition

<pb n="37" href="page37.gif"/>

scripCom

Commentary on scripture

<scripCom passage="Rom. 8:28" version="NRSV">…</scripCom>

scripContext

Set scripture context for parser

<scripContext passage="Romans 8" version="NRSV"/>

scripParseOff

Turn scripture ref parser off

<scripParseOff/>

scripParseOn

Turn scripture ref parser on

<scripParseOn/>

scripRef

Scripture reference

<scripRef passage="Rom. 8.28" version="NRSV">…</scripRef>

scripture

Scripture passage

<scripture passage="Rom. 8:28" version="NIV">…</scripture>

sync

Synchronization point

<sync type="Strongs" value="G42"/>

verse

Poetry, verse

<verse>…</verse>

Using ThML in the CCEL

Software for processing and displaying ThML documents for the CCEL is being designed. The plan is that documents will be prepared in Microsoft Word, using paragraph styles and embedded XML codes. These will be entered with the assistance of macros and toolbar buttons. The MARC record for the print source will also be pasted into the document. Another paper (Theological Markup Language in Microsoft Word) describes the use of Microsoft Word for entering ThML.

The Word document will then be converted to XML format, which will be the "base format" for the text. A tool may be provided for converting the XML form back into a Microsoft Word document for further editing. (Microsoft has also said that it wants to make the Office suite the best environment for working with XML documents, so these conversion programs may not be necessary in the future.) The XML document will then be converted by programs into other desired formats, such as a collection of linked HTML files, plain text, PDF, and others.

Conclusion

Theological study requires relatively rich markup, with needs that differ from those of other applications. The Theological Markup Language has been designed to address these needs powerfully and without too much complexity. ThML was designed to be a rich enough representation to support powerful indexing and user interface features and to allow conversion into other popular formats without loss. It is also, hopefully, a language that can be learned without extraordinary effort. It is a fundamental element of the Christian Classics Ethereal Library system, and it will make the library far more functional and useful.


This document (last modified July 21, 1998) from the Christian Classics Ethereal Library server, at Wheaton College