Simple ThML Markup in Microsoft Word

Version 0.8, Monday, July 20, 1998

This paper describes how to add, in Microsoft Word, a simplified subset of the markup used to prepare texts for the Christian Classics Ethereal Library (CCEL). The markup consists of paragraph and character styles, plus XML tags used for specific purposes. A template supplies the styles, a set of macros, and a header that can be filled in with the bibliographic information for the head section.

To prepare a text for the CCEL with Microsoft Word, first get the ThML template, ThML08.doc, and put it in the Templates folder inside the Microsoft Office folder. The template can be downloaded from the ThML web page, http://ccel.wheaton.edu/ThML. Once it is installed, you can create a new ThML document by choosing New from the File menu and selecting the ThML template. The document is then typed or scanned, if necessary, and formatted with the appropriate styles and markup codes as described below. Footnotes may be entered as ordinary footnotes in Word, using the Insert | Footnote… menu item or the Insert-Footnote-Now shortcut, Alt+Ctrl+F.

Paragraph and Character Styles

Much of the formatting in Word is done by applying character and paragraph styles to the document. Paragraph styles are named groups of formatting settings for paragraphs, such as single spacing, first-line indent, or Times New Roman 11-point type. A paragraph style can be applied to a paragraph by selecting it from the leftmost drop-down box on the formatting toolbar. The ThML template provides several paragraph styles that should be used for formatting documents, such as Body Text, Body Text First Indent, Heading 1, Verse, BlockQuote, and others.

Character styles are similar to paragraph styles, except that they contain only character formatting and they may occur within a paragraph style. Two of the character styles used for ThML are "HTML Markup" and "Default". Keyboard shortcuts have been provided for certain common paragraph and character styles:

Style Name               Shortcut Keys   Description
Body Text                ctrl-alt-b      Text with no first-line indent
Body Text First Indent   ctrl-alt-i      Text with first-line indent
Default (character)      ctrl-alt-d      Default paragraph font
Heading 1                ctrl-alt-1      Level-1 heading
Heading 2                ctrl-alt-2      Level-2 heading
Heading 3                ctrl-alt-3      Level-3 heading
Heading 4                ctrl-alt-4      Level-4 heading
HTML (character)         ctrl-alt-h      HTML (or XML) markup
Verse                    ctrl-alt-v      Poetry, verse, etc.
XML                      ctrl-alt-x      XML (or HTML) markup

XML and HTML Markup

When markup requires attributes (e.g. lang="el"), paragraph styles are not sufficient, and XML or HTML tags are used. The markup may consist of opening and closing tags, with attributes on the opening tag, surrounding some text, as in <foreign lang="el">logos</foreign>. The opening tag, the closing tag, and the text they contain are together called an "element." The markup may also consist of an empty element, that is, an element that contains no text, such as <pb n="37"/>. In that case, a slash precedes the closing angle bracket and there is no closing tag. These elements appear in a Word document as red, hidden text in Courier New. (In fact, any text that is red will be interpreted as markup.) This formatting may be applied with the XML paragraph style or the HTML character style. The two styles are identical and are used identically, except that one is a paragraph style and the other is a character style.
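
As a rough illustration of the element vocabulary described above, the following Python sketch parses the two example elements with the standard library's xml.etree.ElementTree module. It assumes the markup has already been extracted from the Word document as plain text and that the empty element is written in the standard XML form <pb n="37"/>. The helper name describe is hypothetical and is not part of the ThML tools.

```python
import xml.etree.ElementTree as ET


def describe(markup: str) -> dict:
    """Parse one element and report its name, attributes, text, and emptiness."""
    element = ET.fromstring(markup)
    return {
        "name": element.tag,
        "attributes": dict(element.attrib),
        "text": element.text,
        # An empty element has no text and no child elements.
        "empty": element.text is None and len(element) == 0,
    }


foreign = describe('<foreign lang="el">logos</foreign>')
page_break = describe('<pb n="37"/>')
print(foreign)
print(page_break)
```

The first result reports the element name foreign with one attribute and the text "logos"; the second reports an empty element whose only content is its n attribute.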

Document Structure

ThML documents have a head section, with information about the document, and a body section, containing the document itself. When a new ThML document is created, a template for the head section appears. As much of this template as possible should be filled in. If possible, the MARC record should be retrieved from the Library of Congress gateway (http://lcweb.loc.gov/z3950/gateway.html), in both machine-readable and formatted form, and inserted into the header at the appropriate spot. The information in the MARC record can then be pasted into other sections of the header.

Body

The body of the document, placed between the <body> and </body> tags of the template, should contain everything in the print edition of the book. It should be made to look as similar to the book as possible using the ThML template. In fact, if desired, the ThML styles may be modified to make the document look more like the book, though style names shouldn't be changed and styles other than those in the template should not be used.

Headings

Headings for the preface, table of contents, and index, as well as chapter titles, section heads, and the like, should all be formatted using the styles Heading 1, Heading 2, Heading 3, or Heading 4. These styles can also be applied with ctrl-alt-1, etc., and viewed or modified in the outline view of a document.

Page Breaks

It is often useful to know the page breaks from the print edition of a book. They may be used as targets for subject index entries that identify the page of the entry, or to display a text with the pagination of the print edition. Page breaks are marked by inserting empty pb elements, with the n attribute giving the number of the page that follows (<pb n="37"/> or <pb n="xii"/>). Each element should appear at the start of the page it identifies.
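
To show how page-break markers could serve as targets, here is a minimal Python sketch that splits marked-up text into pages keyed by the n attribute. It assumes the markers are written as standard XML empty elements (<pb n="37"/>) in plain text. The function name split_pages and the sample text are hypothetical and are not part of the CCEL software.

```python
import re

# Matches an empty page-break element such as <pb n="37"/> or <pb n="xii"/>.
PB_PATTERN = re.compile(r'<pb\s+n="([^"]+)"\s*/>')


def split_pages(text: str) -> dict:
    """Map each page label to the text that follows its page-break marker."""
    pages = {}
    parts = PB_PATTERN.split(text)
    # parts[0] is any text before the first marker; after that, labels and
    # page contents alternate: label, content, label, content, ...
    for label, content in zip(parts[1::2], parts[2::2]):
        pages[label] = content.strip()
    return pages


sample = '<pb n="xii"/>End of the preface.<pb n="1"/>Chapter one begins here.'
pages = split_pages(sample)
print(pages)
```

Because each marker sits at the start of the page it identifies, the text following a marker belongs to that page, which is what the pairing of labels and contents relies on.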

Paragraphs

Normal paragraphs of text may be formatted with the Body Text First Indent style. This is a single-spaced paragraph with an indented first line. The Body Text style is similar, except that the first line is not indented. It is used, for example, for the first paragraph of a chapter or for the continuation of a paragraph after a figure. Body Text 2, a double-spaced version of Body Text, is also available.

Block Quotes

The BlockQuote paragraph style should be used for extended quotations. A BlockQuote paragraph is normally indented on both sides. There is also some extra space before and after a BlockQuote paragraph.

Verse

Theological books often contain verse -- poetry, hymns, or versified presentation of material such as the Psalms. Verse is often typeset with varying levels of indentation. These are represented with the Verse 1, Verse 2, and Verse 3 paragraph styles. In the example below, the first and third lines of each stanza are in style Verse 1, the second in Verse 2, and the fourth in Verse 3.

O God, a world of empty show,
    Dark wilds of restless, fruitless quest
Lie round me wheresoe'er I go:
        Within, with Thee, is rest.

And sated with the weary sum
    Of all men think, and hear, and see,
O more than mother's heart, I come,
        A tired child to Thee.

Sweet childhood of eternal life!
    Whilst troubled days and years go by,
In stillness hushed from stir and strife,
        Within Thine Arms I lie.

Thine Arms, to whom I turn and cling
    With thirsting soul that longs for Thee;
As rain that makes the pastures sing,
        Art Thou, my God, to me.

G. Ter Steegen

Scripture

In theological texts, scripture passages may be cited, quoted, or explained. Citations refer to a passage, but quotes include the text of a passage in the document. References may occur in footnotes, in parentheses (Phil. 2:1-11), or in the text itself -- see Rom. 8:28. Citations do not need to be marked, as there will be a program to find them automatically. However, quotations and explanations or commentary should be marked.
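
The text does not describe how the automatic citation finder works. Purely as an illustration of the idea, the Python sketch below uses a regular expression to pick out references of the form "Book chapter:verse" with an optional verse range. The pattern and the function name find_citations are assumptions made for this example; a real finder would need a list of recognized book names to avoid false matches such as times of day.

```python
import re

# An optional leading book number (1-3), a capitalized book abbreviation with
# an optional period, then chapter:verse and an optional "-verse" range.
CITATION = re.compile(
    r"\b((?:[1-3]\s)?[A-Z][a-z]+\.?)\s+(\d+):(\d+)(?:-(\d+))?"
)


def find_citations(text: str) -> list:
    """Return (book, chapter, first_verse, last_verse) for each citation found."""
    results = []
    for match in CITATION.finditer(text):
        book, chapter, first, last = match.groups()
        first_verse = int(first)
        # A single-verse citation starts and ends on the same verse.
        last_verse = int(last) if last else first_verse
        results.append((book, int(chapter), first_verse, last_verse))
    return results


sample = "in parentheses (Phil. 2:1-11), or in the text itself -- see Rom. 8:28."
citations = find_citations(sample)
print(citations)
```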

Quotations of scripture may be marked with the <scripture> element. A passage may be represented as in this example:

<scripture passage="Mark 7:16" version="NKJV">If anyone has ears to hear, let them hear!</scripture>

Explanation or commentary on a passage will be marked with the <scripCom> tag, as in this example:

<scripCom passage="Mark 7:16">Mark 7:16. This admonition seems to apply to most everyone . . .</scripCom>
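
As a sketch of how the passage and version attributes might be read back out of marked-up text, the following Python example parses the two elements with xml.etree.ElementTree. It assumes the markup is available as plain text and wraps it in a body element so that sibling elements parse as one document. The function name list_passages is hypothetical.

```python
import xml.etree.ElementTree as ET

fragment = (
    '<scripture passage="Mark 7:16" version="NKJV">'
    "If anyone has ears to hear, let them hear!</scripture>"
    '<scripCom passage="Mark 7:16">Mark 7:16. This admonition seems '
    "to apply to most everyone . . .</scripCom>"
)


def list_passages(markup: str) -> list:
    """Return (element name, passage, version) for each scripture element."""
    # Wrap the fragment in a single root so the parser accepts sibling elements.
    root = ET.fromstring(f"<body>{markup}</body>")
    found = []
    for element in root.iter():
        if element.tag in ("scripture", "scripCom"):
            # version is None when the attribute is absent, as on scripCom here.
            found.append((element.tag, element.get("passage"), element.get("version")))
    return found


passages = list_passages(fragment)
print(passages)
```

A closing tag that does not match its opening tag makes the parser raise ET.ParseError, so a check of this kind also catches mistyped tag names.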

Foreign Languages

The primary language for a document is specified in the header. Passages in other languages may be marked with the foreign element and its lang attribute. For example, the Greek word logos may be marked as <foreign lang="el">logos</foreign>. Values of the lang attribute are as specified in ISO 639. Some examples are Dutch: nl, English: en, French: fr, German: de, Greek: el, Hebrew: he, Latin: la, Spanish: es, Portuguese: pt, Russian: ru.

If the language uses characters not available in the ISO-8859-1 (Latin-1) character set, they may be represented with Latin-1 characters displayed in an appropriate font, for example <foreign lang="el"><font face="Symbol">logos</font></foreign>. The Greek and Hebrew fonts used for the CCEL are the freeware SIL Galatia and SIL Ezra fonts and related software from the Summer Institute of Linguistics, used here in a Greek example (logov) and a Hebrew example (hwhy). This method depends on the particular font being available to the client.
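
A mistyped language code is easy to miss by eye. The Python sketch below checks the lang values on foreign elements against the example codes listed above. The table holds only those examples, not the full ISO 639 list, and the function name unknown_languages is hypothetical.

```python
import re

# Only the example codes given in the text; ISO 639 defines many more.
KNOWN_LANGUAGES = {
    "nl": "Dutch", "en": "English", "fr": "French", "de": "German",
    "el": "Greek", "he": "Hebrew", "la": "Latin", "es": "Spanish",
    "pt": "Portuguese", "ru": "Russian",
}

# Captures the lang value on an opening <foreign> tag.
LANG_ATTR = re.compile(r'<foreign\s+lang="([^"]*)"')


def unknown_languages(markup: str) -> list:
    """Return lang values not found in KNOWN_LANGUAGES, in order of appearance."""
    return [code for code in LANG_ATTR.findall(markup) if code not in KNOWN_LANGUAGES]


sample = '<foreign lang="el">logos</foreign> and <foreign lang="gr">logos</foreign>'
suspect = unknown_languages(sample)
print(suspect)
```

In the sample, "gr" is flagged because the code for Greek is el.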

Horizontal Rules


Horizontal rules that span 30% of the page can be inserted with a paragraph using the HR30 style. These would be rendered in HTML as <hr align="center" width="30%">. The above paragraph is an example. The paragraph below, of style HR, represents a horizontal rule that spans the entire page.


 

Conclusion

Electronic texts formatted according to these guidelines will be converted to XML format using custom software. Browsers that support XML will be able to use the resulting texts directly, and the format is semantically rich enough that it will be possible to convert texts to a variety of other formats without loss. Those formats may include multi-file HTML webs, plain text, PDF, OnLine Bible, Docbook, Windows Help, and others.


This document (last modified July 21, 1998) is from the Christian Classics Ethereal Library server, at Wheaton College.