The article is an extended report on the 1992 AIA [Aerospace Industries Association] Symposium. It reports on the strong trend toward electronic manuals, including the key role of SGML in structuring online documentation, and explains the growing popularity of IETMs (interactive electronic technical manuals). SGML products, vendors, and topics discussed include: ATA-100 DTD support; SoftQuad; ArborText; Electronic Book Technologies; Westinghouse (Pathways); CALS SGML in the aerospace industry.
Although tables occur frequently in, and are an integral part of, many documents, current computer systems do not adequately support the table author. This report examines the essential characteristics of tables and proposes a model system that captures the fundamental cognitive activities involved in authoring tables. Existing table editing systems are compared to the author's model, and a prototype system containing unique functionality is presented.
The author supplies a summary of the OCLC/NCSA Metadata Workshop held in March 1995. The Dublin Core includes an SGML DTD as part of its description. See the full report from the Metadata Workshop for further details.
Available online via the UH WWW server, or here in a mirror copy (text only, partial links).
Abstract: "Much more than a better approach for formatting Web documents, XML has great potential for integrating XML documents with object oriented (OO) application programs. My theme is to suggest potential synergies between Web and object technologies, and to analyze how they might be applied to a problem like collaboration. I discuss some interesting new undercurrents in Web-based metadata. The Extensible Markup Language (XML) is a data interchange language for heterogeneous systems that is especially tuned for fast, online systems. XML is much more than a better approach for formatting Web documents: it is a representation language for describing the content and semantics of Web-based resources. I see great potential for integrating XML documents with OO application programs. First, the structure of XML documents can be very easily parsed into objects that can be programmatically manipulated. Second, XML document objects can be commingled with other application objects to create hybrid Web-object systems. As a way to introduce these possibilities, I discuss three topics: extensibility, reflection, and semantic models."
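The abstract's first claim, that the structure of an XML document parses easily into objects that programs can manipulate, can be illustrated with a minimal Python sketch. It assumes a hypothetical `<meeting>` vocabulary (not from the article) and uses the standard library's `xml.etree.ElementTree`; the `roles()` method stands for ordinary application behaviour mixed with data that arrived as XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical sample document; the element and attribute names are illustrative.
SAMPLE = """<meeting>
  <topic>Collaboration</topic>
  <participant role="chair">Ana</participant>
  <participant role="scribe">Ben</participant>
</meeting>"""


@dataclass
class Participant:
    name: str
    role: str


@dataclass
class Meeting:
    topic: str
    participants: list[Participant] = field(default_factory=list)

    def roles(self) -> dict[str, str]:
        """Ordinary application behaviour attached to data that came from XML."""
        return {p.role: p.name for p in self.participants}


def meeting_from_xml(text: str) -> Meeting:
    """Parse XML text and map its elements onto typed application objects."""
    root = ET.fromstring(text)
    meeting = Meeting(topic=(root.findtext("topic") or "").strip())
    for elem in root.iter("participant"):
        meeting.participants.append(
            Participant(name=(elem.text or "").strip(), role=elem.get("role", ""))
        )
    return meeting


meeting = meeting_from_xml(SAMPLE)
print(meeting.topic, meeting.roles())
```

The mapping here is written by hand; the article's discussion of reflection and semantic models concerns how much of it could be derived automatically, which this sketch does not attempt.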
"XML has many benefits for folks who want to improve structure, maintainability, searchability, presentation, and other aspects of document management."
See the SIGS web site for information on the journal.
"Abstract: During the spring of 1986 the proceedings of the fifth British National Conference on Databases [BNCOD-5] was prepared using electronic methods. Thirteen academic papers were received in computer readable form and, in due course, were tagged in GML starter set and assembled into the text for a complete book. GML is the IBM mark-up language on which SGML has been based. The preparation of this volume highlighted four major issues which the users of mark up languages will need to address if mark up languages are to be widely accepted for the transfer of papers between authors, editors and publishers. These are the practical technical problems of transferring computer readable documents, the possibility of direct translation of one mark up language to another, the difficulty of defining a starter set acceptable to a wide class of users, and the handling of figures, tables and mathematics within a mark up language. The present paper addresses two of these problems (using starter set GML and handling tables and mathematics) in relation to the actual documents supplied for the BNCOD-5 proceedings and proposes the formulation of a discipline around which the practical use of mark up languages will need to develop."
"Abstract: Microcosm is an open hypermedia system. A fundamental feature of Microcosm is that all link information is held in external linkbases which contain the required details about the source and destination anchors of the links. This feature enables third-party applications to act as document viewers since it is not necessary to adapt the application to manipulate data structures that include link anchor information. Another feature of Microcosm is that it is composed of independent components which communicate by passing messages. The authors began investigating the possibilities of using Microcosm as a hypertext development environment and then mapping the completed hypertext onto other delivery systems. As an intermediate stage in mapping between Microcosm and other delivery systems the authors are producing some HyTime-based document structures which describe Microcosm hypertexts, especially linkbases. They are defining a process that will convert a Microcosm dataset into this representation, and then further translation programs to convert (possibly a subset of) this HyTime representation to run on other hypermedia delivery systems."
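The central design point, that link information lives in external linkbases rather than inside the documents, can be sketched in a few lines of Python. The `Anchor`, `Link`, and `Linkbase` names are hypothetical and do not reflect Microcosm's actual data structures or message protocol; anchors are identified here simply by document name and selected text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    document: str   # hypothetical document identifier (e.g. a file name)
    selection: str  # text span that locates the anchor inside the document


@dataclass(frozen=True)
class Link:
    source: Anchor
    destination: Anchor


class Linkbase:
    """Stores link information outside the documents it refers to."""

    def __init__(self) -> None:
        self._links: list[Link] = []

    def add(self, link: Link) -> None:
        self._links.append(link)

    def follow(self, document: str, selection: str) -> list[Anchor]:
        """Return destinations of links whose source matches a user's selection."""
        wanted = Anchor(document, selection)
        return [link.destination for link in self._links if link.source == wanted]


# The documents are plain text with no embedded anchors, so any viewer that
# can report "document + selected text" could serve as a front end.
documents = {
    "intro.txt": "HyTime is an ISO standard.",
    "hytime.txt": "ISO/IEC 10744 defines HyTime.",
}
linkbase = Linkbase()
linkbase.add(Link(Anchor("intro.txt", "HyTime"), Anchor("hytime.txt", "ISO/IEC 10744")))
print(linkbase.follow("intro.txt", "HyTime"))
```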
The Hypermedia/Time-based Structuring Language (HyTime) is a recently adopted international standard (ISO/IEC Standard 10744, 1992). This paper presents the need and potential for HyTime, provides a brief explanation of its various facilities, and shows how it may be applied to good effect in various situations, with particular reference to hypertext interchange in Microcosm (an open hypertext system). It then goes on to explore several alternatives to HyTime and compares their relative strengths and weaknesses.
Summary: "This document discusses multilingual aspects of the Web. Extensions should be as compatible as possible with the present Web and they should be comprehensive; in particular, they should cover Language Engineering. The two main areas are: 1) Character set; 2) Multilingual Aligned Hypertext."
Available online: http://www.echo.lu/other/poster.html, [or mirror copy, September 1995].
"Abstract: Presents a general overview of the publishing process, including both paper and electronic publishing. The main actors (e.g. authors, publishers and libraries) and the associated publication channels are discussed. One channel is discussed in more detail: this channel incorporates all the steps involved from the acquisition of a manuscript in a generic mark-up language to the presentation of the final electronic publication to the user in the library. The use of a generic mark-up language (e.g. SGML) is viewed as being an essential component for facilitating the exchange of electronic documents between different systems and applications. In addition, the use of a generic mark-up language allows several steps of the publishing process to be automated, from the production of electronic books to the addition of the resulting books to an (electronic) library. A system is proposed which provides the acquisition and authoring tools required to generate electronic books, together with an appropriate interface and readers' services. The system incorporates two notable features: a model of an electronic book ('hyper-book'), based on the book metaphor and designed within the context of an electronic library, and an environment which supports the semi-automatic generation of electronic books ('hyper-book builder') starting from a manuscript which is already available in SGML format."
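One step a "hyper-book builder" of the kind described might automate is deriving navigation from the SGML structure of a manuscript. The sketch below, a hypothetical illustration rather than the system proposed in the paper, walks chapters and sections and emits table-of-contents entries pointing at their `id` values. It uses XML syntax in place of SGML so that Python's standard `xml.etree.ElementTree` can parse it; the element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical manuscript; XML syntax stands in for SGML here.
MANUSCRIPT = """<book>
  <title>Electronic Libraries</title>
  <chapter id="c1"><title>Actors</title>
    <section id="c1s1"><title>Authors</title></section>
    <section id="c1s2"><title>Publishers</title></section>
  </chapter>
  <chapter id="c2"><title>Channels</title></chapter>
</book>"""


def table_of_contents(element: ET.Element, depth: int = 0) -> list[str]:
    """Walk chapters and sections, emitting indented TOC entries with link targets."""
    entries = []
    for child in element:
        if child.tag in ("chapter", "section"):
            entries.append("  " * depth + f"{child.findtext('title')} -> #{child.get('id')}")
            entries.extend(table_of_contents(child, depth + 1))
    return entries


toc = table_of_contents(ET.fromstring(MANUSCRIPT))
print("\n".join(toc))
```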
Abstract: "The editorial process of the budget of the European Union provides a good example of a production environment that is entirely SGML-based, and meets severe constraints in terms of production time, quality, and costs."
"As such, it illustrates the fact that SGML realizes its full potential when used as a means of manipulating structured documents. It also highlights certain aspects of SGML, usually considered as advanced, making their significance apparent through a concrete example of their use."
"[Conclusion:] We have described a complex SGML-based client-server system that is used for the creation and maintenance of the European Union's budget, a huge 11-language document, revised three times a year. We have shown that an SGML system can be much more than just having SGML instances as input and output. We have described and illustrated how SGML is used in every aspect of the system, ranging from the server modules, over SGML based processing modules to an SGML-formatted messaging scheme between clients and server. Finally, we have outlined how at the very heart of the system presented here is SIT, the SGML Technologies Group's fully-featured SGML parser and integrated application language."
This paper was delivered as part of the "Case Studies" track in the SGML/XML '97 Conference.
A version of the document is available online in HTML format: "The European Union's Budget. SGML Used to its Full Potential"; [local archive copy]. Note: The SGML Technologies Group has published a number of other interesting papers online: see http://www.sgmltech.com/papers/index.htm.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: Standards are assuming a significant role in the fields of publishing and office automation, with the introduction of some important techniques for describing documents in electronic form. The author discusses the standards-making process. Significant standards include ISO 8879 Standard Generalized Markup Language (SGML) and ISO DIS 8613 Office Document Architecture (ODA) and interchange format. He also mentions ISO DIS 9069 SGML Document Interchange Format (SDIF) and other standards related to SGML. He discusses the origins of SGML and then looks at some of its features, describing it as a document markup metalanguage. He discusses the implementation of SGML in existing and future systems, as well as its limitations.
The article gives a detailed report on the November 1988 meeting in Boston, sponsored by the GCA: "Markup USA: SGML for the Desktop."
"This hardback is aimed at the publisher or editor who feels they need to have a better appreciation of what SGML is and how it can be incorporated into their organization. It avoids the use of jargon and acronyms and gives the reasons for needing to code documents for both conventional and electronic publishing. It gives a good grounding in what SGML is, where it comes from and how it can be used."
"Contents include: Coding schemes; What is SGML?; The document type definition; Other forms of SGML; Using SGML; Illustrations and multimedia; From SGML to EP; Alternatives to SGML; Conclusions and a look to the future." [from the publisher's blurb; see URL below]
See: http://cobham.pira.co.uk/catalog/publishing/PUB4.HTM; [mirror copy].
For further details on the SGML technologies reviewed in this document, see the main entry for the RIDDLE Project.
"Abstract: This presentation outlines various approaches to SGML 'up-translation', i.e., transformation of text data from arbitrary encoding formats to valid SGML instances. Visual recognition techniques, pattern matching techniques and two-step approaches with early conversion to low-level SGML structures are analyzed with respect to various data sources: text processor files, OCR data and phototypesetting files. This presentation also explains why 'up-translation' is by no means symmetrical to 'down-translation', i.e., transformation of SGML data to arbitrary formats, and why different tools and programming paradigms are required for each problem."
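The pattern-matching, two-step approach mentioned in the abstract can be sketched as follows: first each input line is tagged as a low-level element by matching its shape, then a second pass groups those elements into larger structures. The rules, the element names (`head`, `p`, `item`, `list`), and the input format are all hypothetical, and real up-translation of word-processor, OCR, or typesetting data needs far more context than single-line patterns provide.

```python
import re
from html import escape

# Hypothetical rules: each maps a line shape to a low-level element name.
# Order matters: the first matching rule wins.
RULES = [
    (re.compile(r"^\d+\.\s+(?P<text>.+)$"), "head"),
    (re.compile(r"^[-*]\s+(?P<text>.+)$"), "item"),
    (re.compile(r"^(?P<text>\S.*)$"), "p"),
]


def step1_low_level(lines):
    """Step 1: tag each non-blank line with a low-level element via pattern matching."""
    tagged = []
    for line in lines:
        for pattern, name in RULES:
            match = pattern.match(line.strip())
            if match:
                tagged.append((name, match.group("text")))
                break
    return tagged


def step2_structure(tagged):
    """Step 2: group consecutive <item> elements into a <list> and emit markup."""
    out, in_list = [], False
    for name, text in tagged:
        if name == "item" and not in_list:
            out.append("<list>")
            in_list = True
        elif name != "item" and in_list:
            out.append("</list>")
            in_list = False
        # escape() replaces &, < and > so the text cannot be mistaken for markup.
        out.append(f"<{name}>{escape(text, quote=False)}</{name}>")
    if in_list:
        out.append("</list>")
    return "\n".join(out)


source = ["1. Introduction", "SGML is a metalanguage.", "- DTDs", "- instances", ""]
result = step2_structure(step1_low_level(source))
print(result)
```

The reverse direction, down-translation, would be a straightforward walk over an already-valid tree; up-translation has to guess structure that the source never stated explicitly, which is one way to see the asymmetry the abstract describes.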
The document is available online in HTML format: "Current approaches to SGML up-translation" [mirror copy, December 1995]. For further details on the Conference and BeLux, see the contact information for SGML BeLux. The article concludes with a brief biographical statement for François Chahuneau.
Abstract: "This paper studies, from an historical perspective, the relationship between SGML and data modeling concerns. SGML did not invent the concept of structural document models, or 'schemata'. Nevertheless, through the notion of DTDs, it made this powerful concept available and understandable to a large number of people with little or no data modeling experience.
"With the evolutionary trend towards 'content oriented' DTDs, the emergence of well-described methodologies to design them and the appearance of specialized 'case' tools to manipulate them, the potential of SGML as a data modeling methodology became clear, and some SGML enthusiasts suggested using it as a general-purpose tool.
"However, because an SGML DTD intimately mixes the notion of a 'grammar' and that of a 'schema', these two concepts remained partly confused, at least in the 'orthodox' SGML approach. This original characteristic caused some misunderstandings and raised many suspicions from the 'traditional' data modeling world. This largely precluded, so far, the use of SGML as a general data modeling tool outside the restricted arena of structured documents.
"By introducing a simplified syntax with a fixed grammar, XML isolated the role of DTDs as 'pure schemata', and also made them unnecessary for pure recognition of the 'de facto' document structure.
"Finally, recent proposals such as MCF and XML-Data suggest using the XML syntax itself to encode document schemata, thereby making 'traditional' DTDs obsolete. At the same time, they propose several extensions to the SGML data modeling semantics, by incorporating object-oriented concepts. Will such an evolution allow XML to become the official, well-accepted and ubiquitous way to exchange structured data and associated models, and bring SGML power much beyond its original application niche?"
[Extract from the section "The Dual Nature of DTDs"]: "With the benefit of hindsight, after ten years of practice, the design of SGML appears as an unlikely and unique mixture of many brilliant ideas and a few mistakes, and strikes [one] by its total lack of references to data modeling or language design theories which had already emerged in computer science at the time it was designed. A major point of originality is the central SGML DTD concept itself: a DTD is both a generative grammar for the markup language which will be used to tag corresponding instances, and a schema which characterizes a document class: it assigns names to things and defines rules stating what structural patterns shall or shall not be possible/required in an SGML document (modeled as a tree of typed nodes with attributes) which belongs to the class. In the same set of statements, one is instructed that 'the end tag for AUTHOR can be omitted' and that 'the document must have a title and a single one', although these two pieces of information admittedly belong to totally different areas of concern. This dual nature of DTDs should not necessarily lead to confusing the two notions. Unfortunately, this is largely what happened in the SGML community..."
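The "schema" half of the dual nature described in the extract can be separated from the "grammar" half in a short sketch. Here a content model equivalent to the SGML declaration `(title, author+, section*)` is expressed as a regular expression over child element names and checked against parsed documents; because XML's syntax is fixed, parsing needs no DTD and questions such as end-tag omission never arise. The element names and the model are hypothetical examples echoing the extract, and the regex encoding is a simplification rather than a full SGML content-model implementation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model for <doc>: exactly one title, one or more
# authors, then any number of sections, i.e. SGML's (title, author+, section*).
DOC_MODEL = re.compile(r"title (author )+(section )*")


def conforms(element: ET.Element, model: re.Pattern) -> bool:
    """Schema-side check: does the sequence of child names match the model?"""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None


# Grammar side: parsing succeeds without any DTD, since every end tag is explicit.
ok = ET.fromstring("<doc><title>T</title><author>A</author><section/></doc>")
bad = ET.fromstring("<doc><title>T</title><title>U</title><author>A</author></doc>")
print(conforms(ok, DOC_MODEL), conforms(bad, DOC_MODEL))
```

The second document is perfectly well-formed, so it parses; only the separate schema check rejects it for having two titles.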
See also by François Chahuneau the posting "Beyond the SGML DTD," submitted to the W3C WG discussion forum.
This paper was delivered as part of the "Expert" track in the SGML/XML '97 Conference.
A major feature article on Unicode and its relation to internationalization issues. The article's author is François Chahuneau, general manager of AIS/Berger-Levrault, who brings a wealth of experience to the topic of multilingual software development. AIS was one of the first developers to announce XML support in a major product (Balise). As expected, the article includes a section "SGML and XML Specific Issues," but the entire article is relevant to developers who are planning to support XML.
Abstract: "Generation of SGML-coded documents as a result of database query processes is a commonly used practice. In most cases, however, the contents of such documents are entirely built from scratch as an SGML-formatted image of the query results. We present an extension to this practice, in cases when documents are made of a combination of human-generated parts and database originated parts. When such documents are updated, human-generated parts should remain untouched, while database originated parts (text, tables and graphics) should be regenerated or updated.
"The method used here is that of SGML templates, which embed links targeted to a database. Such a technique can be used in many application fields, ranging from Web applications to industrial catalog publishing, where complex, human-generated document structures coexist with database extracts."
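A minimal sketch of the template idea follows, under stated assumptions: the template uses XML syntax instead of SGML, the `<dblink>` element and its `query` attribute are hypothetical names, and SQLite stands in for whatever database a real system would use. Human-written elements pass through unchanged, while the contents of each `<dblink>` are discarded and rebuilt from the current query results.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical template: <para> is human-written; <dblink> holds a query whose
# results are regenerated on every refresh.
TEMPLATE = """<catalog>
  <para>Our pumps are tested to industry standards.</para>
  <dblink query="SELECT ref, price FROM pumps ORDER BY ref">stale rows</dblink>
</catalog>"""


def refresh(template: str, conn: sqlite3.Connection) -> str:
    """Regenerate database-originated parts; leave human-written parts untouched."""
    root = ET.fromstring(template)
    # Materialize the list first: the tree is modified inside the loop.
    for link in list(root.iter("dblink")):
        # Only use trusted templates: the query text is executed as-is.
        rows = conn.execute(link.get("query")).fetchall()
        link.text = None                  # drop previously generated content...
        for child in list(link):
            link.remove(child)
        for ref, price in rows:           # ...and rebuild it from current data
            row = ET.SubElement(link, "row")
            ET.SubElement(row, "ref").text = ref
            ET.SubElement(row, "price").text = f"{price:.2f}"
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pumps (ref TEXT, price REAL)")
conn.executemany("INSERT INTO pumps VALUES (?, ?)", [("P-100", 250.0), ("P-200", 410.5)])
updated = refresh(TEMPLATE, conn)
print(updated)
```

Because the query stays in the attribute, refreshing an already-refreshed document gives the same result, which is what repeated updates of such documents require.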
A related version of this document is available online in HTML format: see http://www.balise.com/current/articles/chahun.htm. See also the SGML '96 presentation [mirror copy, text only].
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "Generation of SGML-coded documents as a result of database query processes is a commonly used practice. In most cases, however, the contents of such documents are entirely built from scratch as an SGML-formatted image of the query results. We present an extension to this practice, in cases when documents are made of a combination of human-generated parts and database originated parts. When such documents are updated, human-generated parts should remain untouched, while database originated parts (text, tables and graphics) should be regenerated or updated.
"The method used here is that of SGML templates, which embed links targeted to a database. Such a technique can be used in many application fields, ranging from Web applications to industrial catalog publishing, where complex, human-generated document structures coexist with database extracts."
The document is available online in HTML format: http://www.balise.com/current/articles/chahun.htm; [mirror copy].
Note: The above presentation was part of the "SGML Expert" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Today's workstations make it possible for users to create and interact with many types of objects. It is desirable that a document creation tool allow all these types of objects to be mixed and nested without restriction in documents, that each type of object be treated uniformly wherever it is found, and that the tool be extensible to new types of objects. The Quill document creation system addresses these requirements by providing an extensible family of specialized editors, coordinated by a Shell that provides common services and presents a consistent user interface. The Shell manages a database that records the properties of various objects in the document, allows objects to inherit properties from other objects, and allows users to override properties when desired. Quill generalizes the concept of properties to include user-supplied procedures that specify the active behavior of an object during WYSIWYG editing.
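The property model described above (inheritance between objects, local overrides, and procedures as properties) maps closely onto Python's `collections.ChainMap`. The sketch is only an analogy: the property names and the document/table/cell hierarchy are hypothetical, and nothing here reflects how Quill's Shell actually stored its database.

```python
from collections import ChainMap

# Hypothetical property sheets; the text does not list Quill's real properties.
document_props = {"font": "Times", "size": 10, "justify": True}
table_props = {"size": 9}
cell_props = {}

document = ChainMap(document_props)
table = document.new_child(table_props)   # the table inherits from the document
cell = table.new_child(cell_props)        # the cell inherits from the table

cell["justify"] = False                   # a user override, stored only on the cell

# A property can also be a procedure. Defined on the table, it is inherited
# by every cell that does not supply its own.
table["on_insert"] = lambda text: text.strip().upper()

print(cell["font"], cell["size"], cell["justify"], cell["on_insert"]("  ok "))
```

Lookups search the cell first, then the table, then the document, while writes always land on the object being edited, so an override never leaks upward.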
Abstract: A description is given of Quill: an extensible document creation system that is organized as a collection of cooperating editors, each with its own set of objects and commands. The objects implemented by the various editors can be nested without restriction, forming a hierarchical document that can be described by the Standard Generalized Markup Language (SGML). The user is presented with a 'what you see is what you get' (WYSIWYG) view of the document in which the various objects can be directly manipulated on the display screen. A system shell ensures consistency among the editors and coordinates their foreground and background activities to ensure keystroke responsiveness. Each Quill editor is a programming object that communicates with the shell and with other editors by means of a standard set of procedures. A rigorous specification of the shell/editor interface enables additional editors to be added to the Quill system without affecting the existing editors.
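The idea of a shell coordinating independent editors through a fixed interface can be sketched with an abstract base class. The `Editor` interface with a single `render` procedure, and the text and equation editors, are hypothetical simplifications; Quill's real shell/editor protocol covered far more, including foreground and background scheduling, which this sketch omits.

```python
from abc import ABC, abstractmethod


class Editor(ABC):
    """Hypothetical standard procedure set that every editor must provide."""

    kind: str  # the object type this editor is responsible for

    @abstractmethod
    def render(self, obj: dict) -> str:
        """Produce a display form of one object of this editor's kind."""


class Shell:
    """Dispatches each object to the editor registered for its kind."""

    def __init__(self) -> None:
        self._editors: dict[str, Editor] = {}

    def register(self, editor: Editor) -> None:
        # New editors plug in here without any change to existing ones.
        self._editors[editor.kind] = editor

    def render(self, obj: dict) -> str:
        return self._editors[obj["kind"]].render(obj)


class TextEditor(Editor):
    kind = "text"

    def __init__(self, shell: Shell) -> None:
        self.shell = shell

    def render(self, obj: dict) -> str:
        # Nested objects of any kind are handed back to the shell, so nesting
        # is unrestricted and each kind is treated the same wherever it occurs.
        nested = "".join(self.shell.render(child) for child in obj.get("children", []))
        return obj.get("text", "") + nested


class EquationEditor(Editor):
    kind = "equation"

    def render(self, obj: dict) -> str:
        return f"[eq: {obj['source']}]"


shell = Shell()
shell.register(TextEditor(shell))
shell.register(EquationEditor())
doc = {
    "kind": "text",
    "text": "Energy: ",
    "children": [{"kind": "equation", "source": "E=mc^2"}],
}
print(shell.render(doc))
```

Adding, say, a table editor would mean writing one more `Editor` subclass and registering it; neither `TextEditor` nor `EquationEditor` would change, which is the property the abstract attributes to the rigorous interface specification.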
Abstract: The Standard Generalized Markup Language (SGML) is a language for representing document structure. This paper discusses ways in which the SGML language might be used to represent graphic as well as textual contents of a document. By using SGML markup for both graphics and text, a document processing application can achieve a more uniform treatment and tighter coupling between these two types of materials.
Abstract: This paper describes the architecture of a proposed document composition system named JANUS, which is intended to provide support for authors of complex documents containing mixtures of text, line art, and tone art. The JANUS system is highly interactive, providing authors with immediate feedback and direct electronic control over page layouts, using a special two-display workstation. Authors communicate with the system by marking up their documents with high-level descriptive "tags". A tag definition language is provided whereby new tags may be defined and the format of each tagged object may be controlled.
The document was also issued as IBM Computer Science Research Report, RJ3006 (37371), IBM Research Laboratory, San Jose, CA, December 1980.
The document was also issued as IBM Computer Science Research Report, RJ3366 (40402), IBM Research Laboratory, San Jose, CA, January 1982. See the abstract of a previous article under a similar title for a project overview.
Abstract: Recent years have shown two distinct but converging trends in document processing: the trend toward direct manipulation, or "WYSIWYG" systems, and the trend toward high-level generic markup. The Quill project at IBM Research is an attempt to combine the flexibility and ease of use of a WYSIWYG interface with the formatting power of the international standard SGML markup language. The Quill system will present a WYSIWYG user interface but will format documents under control of an external Document Design that specifies the degree of user control over document appearance. Quill includes a tool called the Designer's Workbench that enables a Document Designer to specify the syntax and semantics of a given type of document. Each element in the document type is defined by a "look" consisting of a property sheet and an optional semantic routine. The semantic routines are written in a high-level programming language and can call a set of system-provided utility functions that are designed, according to rules described in this paper, to be suitable for WYSIWYG processing. See also on Quill: Wolfsthal.
See the online version of the [same/related?] article in French.
Available on the Internet: George Charlebois: Le langage SGML : vue d'ensemble et derniers progrès [The SGML language: overview and latest developments] - Flash Réseau #3, or text version. Also: [mirror copy, November 1995]. See similarly the article in English.
Abstract: "The Canadian Department of National Defence requires suppliers to produce technical manuals for equipment supplied to project offices within the department. DND now requires suppliers to support the Standard Generalized Markup Language (SGML).
"The DND CALS Assembly Information Model is the content-oriented SGML structure mandated to markup information about equipment. The technical manuals are typically presentation-oriented, based on a book paradigm.
"A preliminary publishing system architecture along with Document Type Definitions (DTDs) to support this architecture were designed to address the transformation requirements necessary to produce the technical manuals from the information model as well as future requirements that may arise for electronic-based information products. This paper describes the evolution of this architecture, based on feedback from multiple field trials which validated various segments of this preliminary architecture.
"Version 2.0 of the DND CALS DTD with extensive documentation, applications and scenarios based on this architecture is now available from the DND CALS Office."
This paper was delivered as part of the "Case Studies" track in the SGML/XML '97 Conference.
Abstract: "Just as JAVA has brought an open, distributable way to enable users to interact with data by transferring applications in real time from server to client, so SGML can enable them to interact with persistent database objects by transferring, real time, the database schema for those objects. This talk explores the potential of SGML as a universal database definition language for reusable, distributable information objects and shows how existing technology is already turning that potential into reality."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: The approach to the production of technical documentation described here arose from a need to supply documentation for a large library of mathematical software. The library has evolved over a period of twenty years and for most of that period the documentation has consisted of printed volumes describing the software routines and the mathematical algorithms on which they are based. In recent years there has been an increasing demand from the users of such software to receive documentation in both printed and online forms. There is also a need to vary the technical content of the documentation. These requirements have been addressed by developing a database which contains all the information necessary to generate and maintain the technical documentation for mathematical software libraries. The database is an object-oriented model of the software products, containing source code, technical information about the software and small fragments of text marked up in SGML. The database can now be used to generate both the original documents and new documents with varying style, format or content. It also makes the information held in the original documents accessible through regular database queries.
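The single-source idea, one structured database driving several outputs that differ in format and content, can be sketched as follows. The `Routine` fields, the two routine entries, and the text and HTML renderers are hypothetical; to keep the sketch short, the text fragments are plain strings rather than the SGML-marked fragments the abstract describes.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Routine:
    # Hypothetical fields for one library routine.
    name: str
    purpose: str
    arguments: list[str]
    method: str


# Hypothetical entries; not taken from any real library.
LIBRARY = [
    Routine("SOLVE1", "Solves a real linear system.", ["A", "B", "N"],
            "LU factorization with partial pivoting."),
    Routine("QUAD2", "Estimates a definite integral.", ["F", "A", "B"],
            "Adaptive quadrature."),
]


def as_text(r: Routine, with_method: bool) -> str:
    """Printed-manual style: plain indented text."""
    lines = [r.name, f"  Purpose: {r.purpose}", f"  Arguments: {', '.join(r.arguments)}"]
    if with_method:
        lines.append(f"  Method: {r.method}")
    return "\n".join(lines)


def as_html(r: Routine, with_method: bool) -> str:
    """Online style: HTML with escaped text."""
    parts = [f"<h2>{escape(r.name)}</h2>", f"<p>{escape(r.purpose)}</p>"]
    if with_method:
        parts.append(f"<p><em>Method:</em> {escape(r.method)}</p>")
    return "\n".join(parts)


def generate(routines, fmt="text", with_method=True):
    """Same records, different format and technical content."""
    render = as_text if fmt == "text" else as_html
    return "\n\n".join(render(r, with_method) for r in routines)


# The records also answer ordinary queries, e.g. which routines take argument N.
print([r.name for r in LIBRARY if "N" in r.arguments])
print(generate(LIBRARY, fmt="html", with_method=False))
```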
Abstract: "Two approaches to electronic publishing are examined: the conventional batch-oriented programming language approach, and the more elaborate direct-manipulation paradigm. The authors indicate which aspects of document preparation are more conveniently handled under which model and point out several instances of a hybrid approach that takes advantage of multiple representations. The design of a fairly sophisticated document development environment is discussed as an example of a multiple-representation system."
Abstract: The authors review a large number of document development systems for both text and graphics from the perspectives of source-language and direct-manipulation models. They describe the task domain and discuss the pros and cons of direct-manipulation techniques versus a programming-language source code and of procedural versus declarative schemes. They then establish a framework for analyzing and designing multiple-representation systems. The central theme is that program constructs and visual feedback are complementary to each other and that a hybrid approach would be most desirable.
"Abstract: This paper presents a library references recognition system for retrospective conversion of catalogues. The system is guided by a structure model of a reference class, described by an attribute grammar. The analysis method is based on prediction and verification of segmentation hypotheses proposed by the model. The result, given in UNIMARC format, contains the different sub-fields of the reference with their confidence score. This method is general enough to be adapted for any document having a micro-structure. The method has also been used on other kinds of documents such as author and subject indexes."
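The predict-and-verify strategy can be illustrated with a much simpler model than the attribute grammar the paper uses. In this hypothetical sketch, a reference class is an ordered list of fields with patterns; every way of grouping the reference's segments into those fields is proposed (prediction), each field is checked against its pattern (verification), and the best-scoring hypothesis wins. The field patterns, the naive split on `". "`, the sample reference, and the score are all assumptions; the real system outputs UNIMARC with per-field confidence values, which this does not attempt.

```python
import re
from itertools import combinations

# Hypothetical structure model for one reference class: each field, in order,
# with a pattern its text must satisfy. This is a simple stand-in for the
# attribute grammar described in the paper.
MODEL = [
    ("author", re.compile(r"^[A-Z][a-z]+, [A-Z]\.?$")),
    ("title", re.compile(r"^[A-Z].*")),
    ("publisher", re.compile(r"^[A-Z][a-z]+ : .+")),
    ("year", re.compile(r"^(1[5-9]|20)\d\d$")),
]


def hypotheses(segments, k):
    """Predict: every way to cut the segments into k consecutive, non-empty groups."""
    n = len(segments)
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield [". ".join(segments[a:b]) for a, b in zip(bounds, bounds[1:])]


def recognise(reference):
    """Return the best-scoring field assignment and its score (0.0 to 1.0)."""
    segments = [s.strip() for s in reference.rstrip(".").split(". ") if s.strip()]
    best, best_score = None, -1.0  # stays None if there are too few segments
    for groups in hypotheses(segments, len(MODEL)):
        # Verify: check each proposed field against the model's pattern.
        checks = [bool(pattern.match(text)) for (_, pattern), text in zip(MODEL, groups)]
        score = sum(checks) / len(checks)
        if score > best_score:
            best = [(name, text, ok) for (name, _), text, ok in zip(MODEL, groups, checks)]
            best_score = score
    return best, best_score


fields, score = recognise("Dupont, J. Typographie. Tome 2. Paris : Hachette. 1952.")
print(score, fields)
```

For the sample reference, four hypotheses are proposed; only the one that keeps "Typographie. Tome 2" together as the title verifies on every field.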
Abstract: "This presentation discusses the ways in which SGML is being used to build prototypes of scholarly editions for distribution on the Internet. Based on the Text Encoding Initiative (TEI), the sophisticated markup offers greater intellectual access as well as sophisticated frameworks on which to build editions. Chesnutt will review the work of the consortium and its plans to create an American Documentary Heritage Database encompassing documentary editions drawn from all fields of the arts and sciences."
This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.
See also the Model Editions Partnership Home Page, or the main database entry for the Model Editions Partnership in the SGML/XML Web Page. Also, D-Lib Magazine contains an overview of the project.
Summary: "The Model Editions Partnership, a consortium of seven historical editions, is currently developing a series of prototypes which will be mounted on the World-Wide Web later this year. These small samples (equivalent to 150-200 pages) will demonstrate a variety of intellectual approaches in creating new editions for the Internet. Using a subset of the SGML markup system developed by the Text Encoding Initiative (TEI), the editors are preparing image editions (using images of historical manuscripts) and live text editions (using transcribed historical documents). A third approach uses a sequel [SQL] database with CGI scripts to provide the user interface. The user interface for the SGML models uses software provided under a grant from Electronic Book Technologies. . .As a successor to the Partnership, we are now preparing the ground to build an American Documentary Heritage database (ADH). Unlike the Partnership, the ADH would include modern editions from all disciplines which publish letters, diaries, journals, public records, and other documentary source materials."
The article is available online in HTML format; local archive copy. Note that the July/August 1997 double issue of D-Lib Magazine (Amy Friedlander, editor) contains several articles referencing the use of SGML encoding in digital library research.
[Excerpt]: "The Partnership is predicated on the view that SGML markup can be used to create the scholarly frameworks required and that SGML markup offers a practical method for preparing and delivering documentary editions. A close study of the scholarly issues involved in preparing documentary editions led to the publication of "A Prospectus for Electronic Historical Editions" in May 1996 (http://mep.cla.sc.edu/MEP-Docs/proptoc.HTM). The Prospectus set forth a series of design principles; a typology of the kind of editions which might be expected to develop; and a discussion of the importance of markup."
Abstract available online in HTML format: "The Model Editions Partnership -- Towards a National Database", by David R. Chesnutt; [archive copy]
Additional information on the ACH-ALLC '97 Conference is available in the SGML/XML Web Page main conference entry, or [August 1997] via the Queen's University WWW server. See also the Model Editions Partnership Home Page, or the main database entry for the Model Editions Partnership.
Summary: "This paper will discuss the relationship of the Text Encoding Initiative to the Model Editions Partnership in general, and the relationship of the TEI markup system to each of the prototypes of historical editions in particular. Concluding remarks will focus on the Partnership's experience in working with the TEI Guidelines."
"The Text Encoding Initiative and the SGML markup system developed under its aegis has had a profound effect in the development of digital resources in the humanities. Nowhere is this more evident than in the Model Editions Partnership--a project which is developing prototypes of electronic historical editions on the World Wide Web. [...] Guidelines provided us with a carefully crafted markup system which met most of our needs. If we had had to build a markup system for the Model Editions Partnership from scratch, the Partnership would probably never have come into existence. I think it is fair to say that the Partnership exists because the TEI exists."
"Our approach in adapting the TEI Guidelines to our work is probably not uncommon. We have a data capture DTD we use in marking up the texts; and we have an archival form of the DTD for long-term migration. In the data capture DTD, we redefine the element <docGroup> and we have four main elements within it. The document <doc> element is used for fulltext transcriptions; the surrogate <surrogate> element is used for abstracts of documents; the target <target> element is used to provide information about images of original documents; and finally, the docgroup <docgroup> element itself can be used to create sub-groups within a docgroup."
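To make the nesting described in this excerpt concrete, here is a minimal Python sketch that walks a <docGroup> and lists its members, recursing into sub-groups. It assumes an XML rendering of the markup (the Partnership worked in SGML); the n and href attributes and all sample content are hypothetical, and only the four element names come from the excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance: element names follow the excerpt above;
# the attributes and content are invented for illustration.
SAMPLE = """
<docGroup n="1">
  <doc n="1.1"><p>Full-text transcription of a letter.</p></doc>
  <surrogate n="1.2"><p>Abstract of a related document.</p></surrogate>
  <target n="1.3" href="images/page001.jpg"/>
  <docGroup n="1.4">
    <doc n="1.4.1"><p>A letter in a sub-group.</p></doc>
  </docGroup>
</docGroup>
"""

# The four kinds of children the data capture DTD allows inside <docGroup>.
KINDS = {"doc", "surrogate", "target", "docGroup"}


def outline(group, depth=0):
    """Return (depth, element name, n attribute) tuples for a docGroup tree."""
    rows = []
    for child in group:
        if child.tag not in KINDS:
            raise ValueError(f"unexpected element <{child.tag}> in <docGroup>")
        rows.append((depth, child.tag, child.get("n")))
        if child.tag == "docGroup":
            # A nested docGroup forms a sub-group; recurse into it.
            rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE.strip())
for depth, tag, n in outline(root):
    print("  " * depth + f"{tag} {n}")
```

Rejecting unknown children mimics, very loosely, what a validating parser would enforce from the DTD's content model.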
See the database entry for the Model Editions Partnership: Historical Editions in the Digital Age, or the University of South Carolina web site.
The extended abstract for the document is available online: http://www.stg.brown.edu/webs/tei10/tei10.papers/chesnutt.html; [local archive copy]. See the main database entry for additional information about the conference, or the Brown University web site.
"The Model Editions Partnership is a consortium of seven historical editions which joined forces with leaders from the Text Encoding Initiative and the Center for Electronic Text in the Humanities to develop a foundation for the next generation of historical editions. That generation will consist of electronic editions disseminated via the Internet or on CD-ROM (or its equivalent). . . Our central task is to create a set of "Markup Guidelines for Electronic Historical Editions" based on the TEI Guidelines. Document analysis sessions were held last fall and small samples from each of the Partner projects have been encoded using TEI Lite in a series of mock-ups. The samples and the results from the document analysis sessions will guide the development of a subset of the TEI markup as well as extensions to the TEI markup designed specifically for electronic historical editions." [from the Introduction and Current Work]
The document is available online: ; [mirror copy]. See the main entry for the Model Editions Partnership. See also the main workshop entry or the program listing for other workshop details.
The Model Editions Partnership is a consortium of seven ongoing historical editions publishing projects joined in the task of creating electronic editions. Susan Hockey and Michael Sperberg-McQueen will coordinate work in the project. The electronic editions will be created through the use of TEI/SGML encoding. Contributions from the three contributing authors are given in separate papers in the Abstracts volume: "I Already Have A Job: An Editor/Historian Contemplates Electronic Editions" [Gordon]; "Model Editions Partnership: An Overview" [Chesnutt]; "Technical Issues in the Model Editions Partnership" [Sperberg-McQueen], focusing upon the application of the TEI encoding Guidelines.
Abstract: "This paper describes a prototype Collaborative Environment for Language Learning (CELL) which is used for the text-centered multimedia study of Chinese. The CELL uses Standard Generalized Markup Language (SGML) for the definition and interchange of learning material and is deployed on the Internet through the World Wide Web (WWW). The CELL emphasizes collaboration, taking advantage of the communication capabilities of the Internet, by allowing Chinese language students, teachers, and scholars to share their knowledge with other users of the system. The two-way information flow of the WWW also permits the setup of a virtual classroom with structured exercises and on-line guidance from a human teacher."
Available on the Internet in HTML format, from Text Science, Inc.: [mirror copy, text only]. See also the link to CELL (COLLABORATIVE ENVIRONMENT FOR LANGUAGE LEARNING).
"Abstract: We present a description system for the transformation of structured documents based on context free grammars (CFGs). [The structured documents considered here are similar to those in two international standards: the Open Document Architecture (ODA) and the Standard Generalized Markup Language (SGML). These documents have tree structures; in particular, only leaf nodes are associated with contents such as texts, graphics, and mathematical expressions.] The system caters for transformations between different document class descriptions, and is presented mainly in terms of logical structure transformation. Two requirements for transformation are proposed: the output document class must be explicitly representable, and inconsistency must be avoidable. First, a grammar for document class descriptions, called a T-CFG (tree-preserving context free grammar), is introduced, then SDTT (syntax-directed tree translation) is given for a document transformation. The SDTT transformation is formal, concise, and consistent with the above two requirements."
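The syntax-directed idea can be illustrated with a small Python sketch: each internal node type of the source class has one rule that builds a target node from its already translated children, while leaves (the only content carriers) are copied, possibly under a new label. This is not the paper's T-CFG/SDTT formalism; the "report" and "article" classes and their rules are invented for illustration. A missing rule raises an error, loosely mirroring the requirement that the output document class be explicitly representable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """A document tree node; as in the abstract, only leaves carry content."""
    label: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None


# Hypothetical rules from an invented source class ("report") to an invented
# target class ("article"). Each rule receives the translated children.
def report_rule(title: Node, author: Node, body: Node) -> Node:
    # report -> title author body   ==>   article -> header(title author) body
    return Node("article", [Node("header", [title, author]), body])


RULES: Dict[str, Callable[..., Node]] = {
    "report": report_rule,
    "body": lambda *paras: Node("body", list(paras)),
}

# Leaf labels renamed in the target class; unlisted leaves keep their label.
LEAF_RENAME = {"para": "p"}


def translate(node: Node) -> Node:
    """Translate bottom-up, applying one rule per source node type."""
    if node.text is not None:
        return Node(LEAF_RENAME.get(node.label, node.label), text=node.text)
    kids = [translate(child) for child in node.children]
    if node.label not in RULES:
        # No rule means the output is not expressible in the target class.
        raise KeyError(f"no translation rule for <{node.label}>")
    return RULES[node.label](*kids)


def shape(node: Node) -> str:
    """Render only the tree structure, e.g. 'a(b,c)'."""
    if node.text is not None:
        return node.label
    return node.label + "(" + ",".join(shape(c) for c in node.children) + ")"


source = Node("report", [
    Node("title", text="Tree transformation"),
    Node("author", text="A. N. Author"),
    Node("body", [Node("para", text="First."), Node("para", text="Second.")]),
])
print(shape(translate(source)))  # article(header(title,author),body(p,p))
```

Because the rules only rearrange translated subtrees and never touch leaf text, content is carried over unchanged while the logical structure changes.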
Chisholm reports on recent progress in computer-based Post-Renaissance German Studies. He says: "Most electronic text centers are encoding textual materials in the Standard Generalized Markup Language (SGML) . . ." The report summarizes activity at the University of Virginia Electronic Text Library, the use of TEI-SGML encoding for German texts, electronic seminars, and other research centers. Two URLs relevant to the German corpora are cited: Greensboro, http://www.uncg.edu/~lixlpurc/german.html and Tucson, http://aruba.ccit.arizona.edu/~chisholm/chisholm.html.
Abstract: "This article identifies problems and proposes solutions for encoding verse texts in SGML. It is organized around a series of distinctions and oppositions which the TEI Work Group on Verse regard as significant. These include examination of the formal properties which distinguish verse from prose, followed by discussions of (1) text-searching vs analysis, (2) markup vs algorithms, (3) markup vs transcription, (4) uniformity vs choice, (5) specificity vs generality, (6) metrical convention vs linguistic realization, (7) structural vs non-structural divisions and (8) fidelity vs interpretation. Using German and English verse forms as illustrations, the advantages and disadvantages of pre-line tagging, in-line tagging and feature structure analysis are discussed. We suggest that metrical and rhyme conventions always be tagged at the highest possible level of text divisions."
[needs abstract, and xref to published version] Apparently published as pages 313-324 in SIGMOD '94. A draft version is available in PostScript [UNIX compressed] via FTP from INRIA.
Abstract: Structured documents (e.g. in SGML) can benefit a lot from database support, and more specifically from object-oriented database (OODB) management systems. This paper describes a natural mapping from SGML documents into OODBs and a formal extension of two OODB query languages (one SQL-like and the other a calculus) in order to deal with SGML document retrieval. Although motivated by structured documents, the extensions of query languages that we present are general and useful for a variety of other OODB applications. A key element is the introduction of paths as first class citizens. The new features allow one to query data (and to some extent schema) without exact knowledge of the schema, in a simple and homogeneous fashion.
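The benefit of treating paths as first-class values can be sketched in Python over a document mapped to nested objects (plain dicts and lists standing in for OODB objects). This is not the paper's SQL-like or calculus query language; the pattern syntax ("*" for exactly one step, "**" for any number of steps) and the sample document are assumptions chosen for illustration.

```python
from typing import Any, Iterator, List, Sequence, Tuple

Path = Tuple[Any, ...]


def children(value: Any) -> List[Tuple[Any, Any]]:
    """(step, child) pairs: dict keys for objects, integer indexes for lists."""
    if isinstance(value, dict):
        return list(value.items())
    if isinstance(value, list):
        return list(enumerate(value))
    return []  # atomic values (e.g. strings) have no children


def match(value: Any, pattern: Sequence[str], path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield (path, value) for every path under `value` that matches `pattern`.

    Pattern steps: a literal attribute name, "*" (exactly one step of any
    name or index), or "**" (zero or more steps).
    """
    if not pattern:
        yield path, value
        return
    step, rest = pattern[0], pattern[1:]
    if step == "**":
        yield from match(value, rest, path)  # "**" consumes zero steps here
        for key, child in children(value):  # ...or one more step, then retry
            yield from match(child, pattern, path + (key,))
        return
    for key, child in children(value):
        if step == "*" or step == key:
            yield from match(child, rest, path + (key,))


# Invented sample document, mapped to nested objects.
doc = {
    "article": {
        "title": "SGML and OODBs",
        "sections": [
            {"title": "Introduction", "paras": ["...", "..."]},
            {"title": "Mapping", "subsections": [{"title": "Paths"}]},
        ],
    }
}

# All titles at any depth, without knowing where they occur in the schema.
print([v for _, v in match(doc, ("**", "title"))])
# -> ['SGML and OODBs', 'Introduction', 'Mapping', 'Paths']

# Only section titles: "*" stands for any list index.
print([v for _, v in match(doc, ("article", "sections", "*", "title"))])
# -> ['Introduction', 'Mapping']
```

Each result also carries the concrete path that matched, so a query can report where in the structure an answer was found, not just the value.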
Submitted for publication to ACM [ECHT '94]. A draft version is available in PostScript [UNIX compressed] via FTP from INRIA. [needs abstract]
"Abstract: Charles Goldfarb has invented a document description language that is about to become extremely useful to many users. In 1979, IBM Corp. had Goldfarb - a law graduate who tired of writing elaborate assembler commands to retrieve data from legal briefs - submit his idea for a new computing standard to control the format of complex documents to the American National Standards Institute, an industry forum for making parts and such interchangeable. What the institute had done for pipe fittings and machine screws it was now trying to do for data files. Goldfarb's language, which he developed with his partners Edward Mosher and Raymond Lorie, went into the public domain as Standard GML (SGML). In the 1980s, the US Defense Department, sagging under the burden of paper documentation that accompanies procurements, demanded that contractors producing manuals for missile frigates, helicopters and the like follow certain formats based on the Goldfarb standard. A key vendor of large electronic publishing systems, Interleaf, added SGML to its products. In 1993, Goldfarb's invention will become a common feature in word processing packages."
According to "Forbes Magazine on Charles Goldfarb," SGML Users' Group Newsletter 26 (February 1994) 24, this is a feature article that chronicles and celebrates Charles F. Goldfarb's work on the development of IBM's GML and of SGML.
The note is a response to the article by Lynne Price, "A Note Comparing SGML to Text Processing Macro Languages."
The report on the Paris '93 Seybold Conference includes a dedicated section on SGML-based editorial systems. Included in the discussion are: AIS (Berger-Levrault) SGML/Store database server for SGML documents; AIS SGML/Search (an SGML-style query language compatible with SGML/Store); AIS Balise (SGML conversion tools); Grif SGML editor; GRIF GATE (an object-oriented environment for editing and database management); MID's use of Open Text's PAT.
Abstract: "Grupo Anaya is engaged in a three-year project to create an educational encyclopaedia for the XXI century. The editorial work starts from scratch with no legacy information. From day one the encyclopaedia has been conceived as a product and media independent database, ready to be deployed in traditional paper based media as well as through any electronic media and channel (CD-ROM, Internet, cable services, etc.). Textual information is SGML coded, and hyperlinks are HyTime compliant; quantitative and perishable information is stored and maintained in relational databases linked to queries embedded into SGML structures.
"The editorial team, more than 100 authors, works in a distributed environment using internet, intranet and extranet technologies. All of them have access to the central encyclopaedia database (text and images), and also to an electronic library with hundreds of textbooks and reference sources available for documentation purposes. A detailed workflow has been designed to manage the editorial flow between authors, copy editors, documentalists, and managers.
"An evaluation of different SGML-based editorial systems has been undertaken (ASTORIA and SigmaLink among them) together with other standard solutions (Oracle InterOffice). The results of these evaluations together with the architecture of the solution finally adopted will be presented.
"On top of these SGML structures, a knowledge database is built in the form of an object network with semantic relations that will allow the creation of very sophisticated Java-based interfaces for Internet access and delivery of the encyclopaedic information. For this purpose, technology developed by GMD-IPSI in the context of Macmillan's Dictionary of Art will be used."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
"The explosive growth of HTML with its role in the World Wide Web has created a whole new market for SGML products and knowledge. This thorough guide to SGML covers the key elements needed to create and use online documents, and emphasizes the use of new easy-to-use software for SGML. The book includes sections on Techniques from the Pros, highlighting examples of SGML documents from experienced users; thoroughly covers SGML elements, software, and more; Extensive Index helps readers find information quickly and easily." [published blurb]
The volume attribution reads: "Written by Martin Colby and David S. Jackson, with Steven J. DeRose, Bob DuCharme, David Durand, Elli Mylonas"; however, casual inspection does not clarify what "with" means in terms of contributions from these four individuals. See the cautions below.
QUE has made available the online Table of Contents [mirror copy] and [as of January 09, 1997], a pre-publication draft of the full text of the book. [*Note: URLs may be assigned dynamically by the QUE server, so if the TOC link fails, try the QUE Home Page or the Que Digital Bookshelf.]
The release of the book occasioned considerable public controversy stemming from an excess of typographical errors and/or errors of fact -- so serious, in the minds of some experts, that the book cannot be recommended. See especially the cautionary note by Steve Pepper [criticism for technical inaccuracies], or, more positively, the brief notes by W. Eliot Kimber and by Len Bullard [praise for a good presentation of SGML basics]. Apparently, a large number of recognized errors were not corrected before the book went to press, and it is unclear whether the publisher will release a corrected edition. One may infer that the input from the four recognized SGML authorities was not fully taken into account.
"Abstract: Working with documents in electronic format is inherently different from dealing with materials in print; nor can all electronic formats be considered equivalent. Processing and presenting SGML is not the same as processing and presenting materials in other markup or word processing formats. To maximize flexibility and extensibility, SGML is highly modular, which complicates implementation. Its emphasis on content structure rather than appearance enhances searchability but makes consistent and precise display difficult. Mechanisms used to maximize platform and software independence (e.g., entities, link protocols), though effective, can be used incorrectly or in ways difficult to implement on some systems or using certain software. This paper considers how difficult questions remain for libraries planning to implement SGML."
Abstract: "This report explores the suitability of Standard Generalized Markup Language for developing and providing access to digital libraries, with special emphasis on preservation issues. In a staged tutorial, the authors explain how the use of descriptive markup tools such as SGML is crucial to the quality and long-term accessibility of digitized materials."
"Abstract: This paper describes the development of a family of electronic library projects at De Montfort University including: ELINOR, the first UK electronic library project; ELISE, the European project to develop interconnected image banks in libraries; ELSA, a European SGML project and PHOENIX, an on-demand publishing project in the UK E:Lib programme. The copyright and licensing issues arising out of these projects, particularly the ELINOR project are discussed. The negotiations with publishers in the ELINOR project have resulted in a model agreement which is increasingly being accepted by publishers and which streamlines the negotiation process. Finally, mention is made of initial progress being made towards a model agreement for images."
For further information on ELSA (Electronic Library SGML Applications), see the main ELSA entry in the Academic Applications section.
Abstract: "Some of the information processed through SGML systems should never be stored in a document instance. In particular, tabular data may already be maintained and stored in a relational database. This paper discusses the alternatives and outlines a strategy for integrating relational data in an SGML instance, and for automating the process of updating and delivering the information content."
[Conclusion:] "We have seen that not all information contained in documents should be stored there. Graphic elements provide one obvious example of this principle, which we can easily extend to tabular data. Using relational database technology and SQL, the data can be managed appropriately; while SGML provides the mechanism for delivery. This paper has offered a simple strategy for integrating these systems, through the storage of SQL queries in the SGML instance and scripts to intelligently process those queries."
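The strategy in the conclusion (an SQL query stored in the instance, plus a script that runs it at delivery time) can be sketched in Python with the standard sqlite3 and xml.etree modules. The <dbquery> element, the <table>/<row>/<cell> output markup, and the inventory table are hypothetical, and XML stands in for SGML. Because the script executes whatever SQL the document contains, it assumes the instance comes from a trusted source.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical instance: <dbquery> is an invented element whose content is SQL.
DOC = """<report>
  <title>Parts on hand</title>
  <dbquery>SELECT part, qty FROM inventory ORDER BY part</dbquery>
</report>"""


def resolve_queries(root: ET.Element, conn: sqlite3.Connection) -> ET.Element:
    """Replace each <dbquery> with a <table> built from the query's current result."""
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag != "dbquery":
                continue
            cursor = conn.execute(child.text)
            table = ET.Element("table")
            head = ET.SubElement(table, "row", role="head")
            for column in cursor.description:  # column[0] is the column name
                ET.SubElement(head, "cell").text = column[0]
            for record in cursor:
                row = ET.SubElement(table, "row")
                for value in record:
                    ET.SubElement(row, "cell").text = str(value)
            table.tail = child.tail  # keep the surrounding whitespace
            parent.remove(child)
            parent.insert(i, table)
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (part TEXT, qty INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("nut", 300), ("bolt", 120)])
root = resolve_queries(ET.fromstring(DOC), conn)
print(ET.tostring(root, encoding="unicode"))
```

Re-running the script after the database changes regenerates the table, which is the point of storing the query rather than its result in the document.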
This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.
Abstract: "The Standard Generalized Markup Language (SGML) is a complex system for developing markup languages. It is used to define the Hypertext Markup Language (HTML) used in the World Wide Web, as well as several other hypermedia document representations.
"Systems with interactive performance constraints use only the simplest features of SGML. Unfortunately, the specification of those features is subtly mixed into the specification of SGML in all its generality. As a result, a number of ad-hoc SGML lexical analyzers have been developed and deployed on the Internet, and reliability has suffered.
"We present a self-contained specification of a lexical analyzer that uses automated parsing techniques to handle SGML document types limited to a tractable set of SGML features. An implementation is available as well."
Available electronically: http://www.w3.org/pub/WWW/Journal/3/s2.connolly.html; [archive copy].
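As a rough illustration of what a lexical analyzer for a restricted SGML subset involves, the Python sketch below tokenizes comments, start tags with attributes, end tags, and character data. It is not the specification from the paper; the supported subset is an assumption, names are folded to lower case (as under HTML's NAMECASE GENERAL YES), and anything outside the subset, such as a <!DOCTYPE declaration or a bare "<", raises an error rather than being guessed at.

```python
import re
from typing import Dict, List, Optional

NAME = r"[A-Za-z][A-Za-z0-9.-]*"

# One alternative per token type in a deliberately tiny SGML subset.
# Assumed out of scope: declarations other than comments, marked sections,
# processing instructions, short references, and tag minimization.
TOKEN_RE = re.compile(
    rf"(?P<comment><!--.*?-->)"
    rf"|(?P<endtag></(?P<ename>{NAME})\s*>)"
    rf"|(?P<starttag><(?P<sname>{NAME})(?P<attrs>[^<>]*)>)"
    rf"|(?P<data>[^<]+)",
    re.DOTALL,
)
ATTR_RE = re.compile(rf"""({NAME})\s*(?:=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?""")


def parse_attrs(text: str) -> Dict[str, Optional[str]]:
    """Attribute name -> value; a bare token (e.g. COMPACT) maps to None."""
    attrs = {}
    for m in ATTR_RE.finditer(text):
        value = next((v for v in m.group(2, 3, 4) if v is not None), None)
        attrs[m.group(1).lower()] = value
    return attrs


def tokenize(text: str) -> List[tuple]:
    """Split text into comment, start, end, and data tokens, left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}: {text[pos:pos + 20]!r}")
        if m.group("comment"):
            tokens.append(("comment", m.group("comment")))
        elif m.group("endtag"):
            tokens.append(("end", m.group("ename").lower()))
        elif m.group("starttag"):
            tokens.append(("start", m.group("sname").lower(), parse_attrs(m.group("attrs"))))
        else:
            tokens.append(("data", m.group("data")))
        pos = m.end()  # every alternative consumes at least one character
    return tokens


for token in tokenize('<P CLASS="note">Hello <EM>SGML</EM><!-- aside --></P>'):
    print(token)
```

Failing loudly on unsupported constructs is the design choice the abstract argues for: an ad-hoc analyzer that silently guesses is where reliability suffers.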
See the dedicated document containing abstracts and annotations. The book was reviewed by Bob DuCharme in <TAG> Volume 11, Number 3 (March 1998) 1-2.
Abstract: "XML, a landmark in the evolution of Internet information systems, allows authors to say what they mean, rather than merely how to say it. The shift to XML will unleash a diverse range of new applications, ranging from mathematical equation structures to new browser and client tools. This issue of the Web Journal, by guest editor Dan Connolly, is your first look at the technical specifications and early applications of a new data format that will rock every aspect of the Web, including markup, linking, and exchange." [from the publisher]
The volume is to be published as 'Volume 2, Issue 4' of the World Wide Web Journal, published by O'Reilly & Associates.
See also online: XML: Principles, Tools, and Techniques: Full Description [archive copy]. An online table of contents is also available [local archive copy].
Summary: "Guest Editor Dan Connolly and Series Editor Rohit Khare team up to herald the appearance of XML and discuss its evolution from the Standard Generalized Markup Language (SGML)."
Abstract: "HTML is the ubiquitous data format for Web pages; most information providers are not even aware that there are other options. But now, with the development of XML, that is about to change. Not only will the choices of data formats become more apparent, but they will become more attractive as well. Although XML succeeds HTML in time, its design is based on SGML, which predates HTML and the Web altogether. SGML was designed to give information managers more flexibility to say what they mean, and XML brings that principle to the Web. Because it allows the development of custom tagsets, we can think of XML as HTML without the 'training wheels.' In this article, we trace the history and evolution of Web data formats, culminating in XML. We evaluate the relationship of XML, HTML, and SGML, and discuss the impact of XML on the evolution of the Web."
A version of this document is available online in HTML format: http://www.cs.caltech.edu/~adam/papers/xml/ascent-of-xml.html.
The article presents a model for information systems reengineering which is based upon "organic" principles of synchronizing technology and business. The author relates the arguments to SGML. The article includes sidebars on "Organic DTDs" (11) and "HyTime and SGML" (17-19).
Abstract: "Many people mistakenly believe that SGML (the Standard Generalized Markup Language, ISO 8879) is useful only for document production. SGML can also be used for non-document applications, for example, to manage administrative and financial information data sets to support project planning, process improvement, and re-engineering efforts. SGML can help balance mechanical (efficiency-oriented) and organic (flexibility-oriented) approaches to information management, thereby contributing to the adaptability and well-being of an organization. This article looks at SGML implementation efforts at the Department of Energy's Hanford, Washington site and discusses the value of the standard for managing information in a changing organizational environment."
Available online: from WWW.KTIC.COM Knowledge Management Metazine (http://www.ktic.com/topic6/KMSGML.htm), or from the Sagebrush Group server: "SGML: It's Not Just for Documents Anymore" [mirror copy]. The article "is a revision of work previously prepared by the author while employed by Boeing Computer Services, Richland. It was published November, 1994, in The Proceedings of SGML '94 and was presented at SGML '94, Vienna, Virginia, November 6-11, 1994."
Abstract: "Organizational decisionmaking patterns determine SGML investment strategies and potential benefits. A framework for understanding the primary policy objectives that can influence the selection of SGML (inherent policy effects) and application design (user-defined policy goals) will be presented. Competing and often contradictory goals and perceptions of value often make the development of a business case for SGML very difficult. Methods for integrating stakeholder principles, interests, and expectations in the early stages of application conceptualization and design will increase real and perceived benefits and defuse potential political problems before they develop."
See the bibliography entry for a related article by Kurt Conrad, "SGML, HyTime, and Organic Information Management Models."
Note: The above presentation was part of the "SGML Business Management" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "This paper/presentation is an update of the one which was delivered at SGML'95. It is intended to be a general introduction to the issues and concepts involved in the selection of software tools for the electronic delivery and retrieval of SGML (Standard Generalized Marku

