Abstract: I describe a synthesis within TeX of descriptive markup and object-oriented programming. An underlying formatting system may use a number of different collections of user-level markup, such as LaTeX or SGML. I give an extension of LaTeX's markup scheme that more effectively addresses the needs of a production environment. The implementation of such a system benefits from the use of the model of object-oriented programming. LaTeX environments can be thought of as objects, and several environments may share functionality donated by a common, more general object. This article is a companion to William Baxter's "An Object-Oriented Programming System in TeX." [See the relevant bibliographic entry.]
"Abstract: In computer science, context-free grammars are used extensively to describe data sets such as manuscript types and programming languages. The data, or members, contained in a particular set represent instances of the grammar describing that set, for example, documents and programs.
Determining the elements comprising instances is the task of content investigation. Imposing structure on these elements is the task of grammar development. Creating, editing, and manipulating instances of a grammar is the task of grammar instantiation. Grammar instantiation has received much attention with software systems such as programming environments and compound-document environments. Content investigation and grammar development have only recently been recognized as recurring complex tasks. They have received little attention because of their newly emerging significance. This work focuses on grammar development.
Grammar development produces a grammar description in a particular notation that contains two types of information: a formal, context-free grammar and auxiliary information. Auxiliary information describes the application of the grammar description. For example, a grammar may describe the manuscript type ``article,'' but the auxiliary information may describe how to format the instances for layout, how to analyze the sentence structure, or how to exchange documents of that type.
The separation of the general, context-free grammar from the application-specific, auxiliary information provides the power and flexibility to generalize problem classes associated with grammar development. The formalisms of context-free grammars motivate two such problem classes: syntactic properties and semantic properties. The analysis of the development of large grammars motivates two other problem classes: reusable grammars and multiple notations.
A review of existing software systems reveals that a new, general-purpose, support environment was required for developing grammar descriptions. A prototype environment for developing grammar descriptions, DeveGram, has been designed and implemented. DeveGram controls and manages the four problem classes by capturing any context-free grammar, providing mechanisms for determining properties about a grammar, capturing auxiliary information, and generating automatically grammar descriptions in a testbed of different notations. DeveGram produces grammar descriptions for a testbed of software systems differing in syntax and purpose. The testbed presently consists of Yacc, SGML, MDL, MANDEN, and BNF."
Note: see more on the Chameleon project by Mamrak and Walter. For a paper copy of the report, send email to: strawser@cis.ohio-state.edu OR to cso@cis.ohio-state.edu.
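Note: as a minimal illustration of the separation described in the abstract above (a general context-free grammar kept apart from application-specific auxiliary information, with one description rendered into a chosen notation), the following Python sketch may help. It is not DeveGram's design; the class name `GrammarDescription`, its fields, and the sample "article" grammar are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GrammarDescription:
    # Context-free part: nonterminal -> alternatives, each a list of symbols.
    productions: Dict[str, List[List[str]]]
    # Application-specific part, stored separately from the grammar itself.
    auxiliary: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def undefined_symbols(self, terminals: Set[str]) -> Set[str]:
        """A simple syntactic property: symbols used but never defined."""
        used = {sym for alts in self.productions.values()
                for alt in alts for sym in alt}
        return used - set(self.productions) - terminals

    def to_bnf(self) -> str:
        """Render the context-free part in one notation (plain BNF)."""
        def show(sym: str) -> str:
            return f"<{sym}>" if sym in self.productions else f'"{sym}"'

        lines = []
        for lhs, alts in self.productions.items():
            rhs = " | ".join(" ".join(show(s) for s in alt) for alt in alts)
            lines.append(f"<{lhs}> ::= {rhs}")
        return "\n".join(lines)


# Hypothetical manuscript type "article" with formatting hints as auxiliary data.
article = GrammarDescription(
    productions={
        "article": [["title", "body"]],
        "body": [["section"], ["section", "body"]],
        "title": [["TEXT"]],
        "section": [["TEXT"]],
    },
    auxiliary={"formatting": {"title": "centered, bold"}},
)

print(article.to_bnf())
print(article.undefined_symbols({"TEXT"}))
```

Because the auxiliary dictionary is never consulted by `to_bnf`, the same productions could in principle be rendered into other notations without touching the formatting hints.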
"Many software systems were developed to manipulate structured objects. The system developers were performing common tasks, e.g., parsing, manipulating the structured objects in a similar fashion, and even using common data structures, e.g., trees, to represent the structured objects being manipulated. System generators were developed to eliminate this duplication of effort. Each of these generators requires a specification describing the structured object under consideration, e.g., a programming language. In general, a specification consists of two parts: a context-free grammar and auxiliary information. The context-free grammar describes the manipulations to perform on the structure and content of an object. The task of describing a specification is inherently complex for the typical specifier. In particular, defining a structured-document specification presents considerable difficulties to the specifier. In this paper, we identify the complexities of defining a specification, in particular for structured documents. We also describe ideal features of a support environment that would aid in controlling and managing these complexities. Other system generators are then evaluated according to the identified features. Finally, the design of a prototype environment driven by this discussion is presented."
Abstract: "Various components of an SGML system are examined using a graphical framework; where applicable, software applications and the relevance of XML are reviewed within this framework. Using a broad concept of an SGML document, the following tools which work with these documents are discussed, including their interrelationships: authoring, conversion, document management, and output. The basic structure of a document is a DTD, a set of rules for applying SGML to the markup of a document ('tagging'). SGML Editors make possible the creation of information using SGML tags from the DTD. Conversion tools facilitate changing data to and from various coding schemes. Document managers permit a number of functions, including revision control, and coordination of the other tools. Having defined content with these tools, formatters use output specifications to control the output of data in a formatted fashion. The introduction of XML increases the importance of application documentation as XML removes the requirement of a DTD."
This paper was delivered as part of the "Newcomer" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CD-ROM were produced courtesy of Jouve Data Management (Jouve PubUser).
"Abstract: We address the problem of automating the display of database records in an intelligent way. By this we mean the synthesis of complete multimedia documents from database records. We propose an architecture for mapping diverse data stored in a database to markup language (SGML) programs. These programs are ready for final presentation. The mapping is based on a computational extension of the linguistic concept of registers. The resulting presentation represents data as information in an intelligent way. General conditions for such a system are discussed. Our own treatment of registers as rule-based computational structures is offered with some early results on the behaviour of rule-based registers."
The document is available online in PostScript format: http://www.cs.engr.uky.edu/~oldham/ismisfinal.ps; [mirror copy]. Possibly also as: ftp://al.cs.engr.uky.edu/cs/manuscripts/dexter1.ps; [mirror copy].
Summary: "The Energy Department is spending $825,000 to exploit little-known capabilities of the Standard Generalized Markup Language for managing distributed archives. Energy's Office of Scientific and Technical Information (OSTI) recognized the value of SGML several years ago when it adopted the language as its standard for document exchange. Now Energy is laying the groundwork for a distributed multimedia archive that agency scientists and academic researchers can access from any desktop computer. With the new architecture, OSTI officials want to give government scientists desktop access to the full text and multimedia output of Energy's more than 60 research sites and program offices that conduct basic materials research and other high-interest investigations."
Available online from the Government Computer News WWW server: GCN Online; [archive copy, text only].
"Abstract: The Food and Drug Administration (FDA) is suggesting the use of a standard template, based on the Standard Generalized Markup Language (SGML), for the use of pharmaceutical companies that are submitting new drug proposals. SGML-tagged documents can be created even on PCs and there are 47 SGML products in the market for the Apple Macintosh. FDA officials are in the process of working with their counterparts in the Netherlands, Sweden and Canada in developing a multinational electronic template. The agency has implemented a pilot data-exchange project by the Center for Drug Evaluation and Research that contained electronic data standards for text, bit-mapped graphics, quantitative data, chemical structures and analytical instrument data."
Extract: "Although the Standard Generalized Markup Language promised the same thing many years ago, converting all documents to SGML proved too time-consuming, said Bill Thornburg, vice president of publisher markets for Dataware Technologies Inc. of Cambridge, Mass. Dataware's Electronic Publishing Management System (EPMS) can accept documents in any non-SGML data formats, including active news feeds, by later this year. The text-based repository for the finished documents is an SGML document store, which maps non-SGML documents to its SGML structure."
Available online from GCN: "Middleware unifies publishing"; [archive copy].
An article on the National Security Agency's Intelink SGML applications. For other details, see the SGML '96 presentation by Fredrick Thomas Martin, Deputy Director, Information Services Group, National Security Agency, "SGML in the Intranet for the US Intelligence Community: 'INTELINK' - A Case Study." Its provisional abstract, in part: "The US National Security Agency, the Central Intelligence Agency, the Defense Intelligence Agency, the National Reconnaissance Office and other agencies of the United States Intelligence Community are improving intelligence gathering and reporting through development and implementation of technology including SGML. INTELINK, the classified world-wide 'Intranet', addresses one of the world's largest data management problems."
Abstract: "The Publications Division of SAS Institute needed a way to replace the hardcopy formatting tools it had been using, and also faced the challenge of producing online documentation for its large variety of software products. After deciding to implement an SGML solution using Adept, the Institute decided to apply good software engineering and programming principles to the effort and develop a modular, maintainable store of declarative SGML structures and custom executables. This paper describes the implementation of that system."
A related presentation describing the implementation of SGML by the Publications Division of SAS Institute was given at SGML '96 by Craig R. Sampson, "SASOUT: A Context Based Table Model."
Note: The above presentation was part of the "SGML User" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "The worlds of Java objects and Web documents are converging, and XML is key to providing the final gateway between them. While there are two opposing schools of thought in this arena, XML provides a radical software change with benefits that outpace HTML, SGML, RTF, and provide an interchange format for OO developers."
Note previous articles on XML in Object Magazine, including: "[The XML Revolution.] Document Objects With Style. An XML Document is a Composite Structure of Node Objects," by David Carlson. In Object Magazine (February 1998) 14-15.
Abstract: "This article advocates the use of SGML technology for the creation, dissemination, and display of Web documents. It presents a software architecture that allows for defining the operational interpretation of arbitrary document types by means of style sheets, written in a scripting language. Our approach has been motivated by a desire to extend the functionality of the Web with support for multimedia and active documents. Although growing in complexity, HTML is still lacking in functionality. We prefer a more flexible and generic approach, as enabled by the employment of SGML."
"After a brief introduction to SGML, we will illustrate how our approach accommodates (extensions of) HTML as well as arbitrary SGML documents containing multimedia data such as video and audio. We will then briefly sketch the software components used in the realization of our approach and discuss some topics for further research."
A draft version of the document in PostScript format is available on the VU WWW server: http://www.cs.vu.nl/~dejavu/papers/ep96.ps.gz; [mirror copy]. For other conference information, see the main conference entry for EP '96, or the brief history of the conference as sixth in a series since 1986. See the volume main bibliographic entry for a linked list of other EP '96 titles relevant to SGML and structured documents.
Abstract: "The presentation will be a summary of a project at the University of Oslo, where about 80 persons have been working with SGML. The way we work with SGML is a bit different from many others. We want to use SGML as an infrastructure, applied to a wide range of documents. In this presentation I will summarize the evaluation of the project, and the interviews that I have done with some of the writers."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "The Astrophysical Journal, published by the University of Chicago Press for the American Astronomical Society, is a large and complex scientific journal of more than 25,000 pages per year. Over the last several years the production system for this publication has been re-engineered to be SGML-based, including on-screen SGML copy editing, exporting SGML for conventional typesetting, and producing an online HTML edition from the SGML archive. The most difficult part of the implementation was the use of SGML math and the problems encountered in translating complex mathematics between LaTeX, TeX, SGML, ASCII, HTML, and two different commercial typesetting systems. The key benefits of this implementation were (1) reduced conventional production costs, (2) the creation of additional electronic products, and (3) the establishment of a rigorous framework for future non-text content."
For more information on the use of SGML by the American Astronomical Society, see the main AAS entry in the SGML/XML Web Page.
Note: The above presentation was part of the "SGML User" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
"Abstract: Describes the design of a multimedia database management system for a distributed news-on-demand multimedia information system. News-on-demand is an application that uses broadband network services to deliver news articles to subscribers in the form of multimedia documents. Different news providers insert articles into the database, which are then accessed by users remotely, over a broadband, asynchronous transfer mode (ATM) network. The particulars of our design are an object-oriented approach and strict adherence to international standards, in particular the Standard Generalized Markup Language (SGML) and HyTime. The multimedia database system has a visual query facility, which is also described in this paper. The visual query interface provides three major facilities for end users: presentation, navigation and querying of multimedia news documents. The main focus, however, is the querying of multimedia objects stored in the database."
Abstract: "Information access for people with disabilities is creating numerous opportunities and challenges in the Information Highway community. In addition, as a result of the increasing paradigm shift by the publishing industry toward Internet and WWW-based document delivery systems, the importance of producing accessible information using electronic document mechanisms has increased immeasurably. . . The paper will attempt to identify major problems in information and software design that deny access; cite successful products that can be used by people with disabilities to access publications; and point to resources that assist developers in creating accessible products in the future."
Originally published as "People with Disabilities Can't Access the Web!," World Wide Web Journal Volume 2, Issue 1 (Winter 1997), pages 173-182 = Advancing HTML: Style and Substance, from O'Reilly & Associates, Inc. URL: http://www.w3.org/Journal/5/s3.paciello.html; [archive copy, text only].
Abstract: "The World Wide Web is fast becoming the de facto repository of preference for on-line information, yet the technology of the Web has inadvertently created barriers for people with disabilities. Worldwide, more than 750 million people with disabilities (more than 100 million in Europe alone) are affected by the emergence of the Web, directly or indirectly. In order to 'push the envelope' of information access and truly realize the full potential of the Web, the World Wide Web Consortium (W3C) intends to take a leadership role in removing accessibility barriers by launching the Web Accessibility Initiative (WAI, pronounced 'WAY').
"Mr. Paciello, creator of the WAI, will discuss the initiative's goals and mission and how SGML plays a major role in the advancement of information accessibility."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
[No abstract available.]
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: The article describes the Oxford University Press's implementation of the Standard Generalized Markup Language (SGML). SGML provides a rigorous syntax for describing unambiguously the content and structure of any document in such a way that its presentation can be controlled by conversion to typographic codes and selective retrieval can be enabled by the application of search software. Clearly, because the language is generic, its use is independent of specific devices and it can be implemented universally, regardless of the make of front-end, host, printer and operating system. SGML has been adopted by the Oxford University Press in order to convert the Oxford English Dictionary into a lexical database.
Abstract: "The authors develop a model of meta-information architectures (header, local index and directory) and present three current or proposed meta-information structures for networked information resources with applicability to organization and access in libraries and networked information environments. Special emphasis is given to the Text Encoding Initiative's TEI Header and Independent Header as a model for meta-information for academic and library needs. Recommendation is made for the specification of a generalized SGML meta-information header based on the principles of the TEI Independent Header, to address the needs of cataloging, automatic processing and serving of networked information resources."
"ABSTRACT: This paper presents the early results of a research initiative constructing a system for automatically identifying structural features and applying SGML tagging based on the Text Encoding Initiative DTD to text generated from the scanning and OCR processing of print documents. The system interprets typographical and geometric analysis output from a specialized OCR software system, and maps combinations of these characteristic features to TEI constructs based on a user-generated document analysis specification. The system is being developed as part of a pilot project to create from the original paper document a TEI-encoded edition of the Transactions of the American Medical Association, Vol. 2, 1849, a research resource for 19th century United States medical and urban historical study. Although this project focuses on one specific text, an important goal of the project is to create a software system that can process and at least minimally tag many types of printed documents, given a proper document analysis specification, and thus allow a more rapid process of retrospective conversion of printed documents into SGML texts in libraries."
Available in HTML format: http://stirner.library.pitt.edu/DL95paper.html [mirror copy, text only; July 1996]
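Note: the mapping step described in the abstract above (combinations of typographic and geometric features mapped to TEI constructs according to a user-supplied document analysis specification) can be pictured with a small Python sketch. This is not the project's system: the feature names (`bold`, `centered`), the rule table, and the sample text blocks are hypothetical, and only the TEI element names `head` and `p` are taken from the TEI tag set.

```python
from xml.sax.saxutils import escape

# Hypothetical document analysis specification: an ordered list of rules.
# Each rule pairs required feature values with the TEI element to apply.
# The final rule has no requirements, so it acts as the default.
RULES = [
    ({"bold": True, "centered": True}, "head"),
    ({}, "p"),
]


def tag_block(block, rules=RULES):
    """Wrap one OCR text block in the element of the first matching rule."""
    for required, element in rules:
        if all(block.get(name) == value for name, value in required.items()):
            return f"<{element}>{escape(block['text'])}</{element}>"
    raise ValueError("no rule matched this block")


# Hypothetical OCR output: text plus typographic and geometric features.
blocks = [
    {"text": "Report of the Committee", "bold": True, "centered": True},
    {"text": "The committee met & adjourned.", "bold": False, "centered": False},
]

tagged = [tag_block(block) for block in blocks]
for line in tagged:
    print(line)
```

Keeping the rules in a data table rather than in code mirrors the abstract's point that one system could process many document types, given a suitable specification for each.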
"Abstract: There is a real need for a tool to enable effective collaborative authoring of documents on the WWW. A number of sophisticated tools allow browsing of local and remote files but do not as yet allow authors to modify them. Our approach is to promote the creation of information directly on the WWW and so enable interaction between the different contributors. This approach relies on the use of a structured editing tool which recognizes the structured content of HTML documents and is wired on the network. We discuss various cooperative strategies and user interface issues and how SGML might help in the generalization of collaborative authoring on the WWW."
Available on the Internet: http://www.grif.fr/fr/newsref/coopwork.html; [mirror copy]. Apparently also published in Computer Networks and ISDN Systems, April 1995, pages 841-847.
See the conference proceedings, pages 323-333.
"Abstract: HTML suffers from its lack of extensibility, and anarchical tag proliferation is in danger of breaking the WWW. In our view, the extensibility of the text model is necessary and we should develop and make extensive use of SGML to extend the current HTML model, not only by defining other DTDs (document type definitions) which could replace HTML, but also by proposing an extensibility scheme offering Web users rules for extending the HTML DTD themselves. This approach has been developed in the Grif Symposia authoring tool. Grif Symposia, a joint INRIA/GRIF S.A. project, is an integrated authoring-browsing environment to be shipped with full extensible capabilities in order to handle mixed HTML/SGML data models. We discuss the advantages to be gained by using a mixed HTML/SGML data model for the WWW on the basis of the work that we have achieved by developing Grif Symposia. We present the different layers developed for Grif Symposia and highlight the advantages obtained in authoring information in a mixed SGML/HTML environment."
[The author's] "Conclusions:
- It is possible to build a WYSIWYG authoring and viewing environment which supports dynamic and structured tag set extensibility on the WWW.
- SGML could be used to extend the current HTML model not only by defining other DTDs which could replace HTML but by proposing an extensibility scheme offering to Web users rules for extending themselves the HTML DTD.
- The display of new SGML tags is the most serious problem one has to consider and a complete and powerful style sheet language has to be adopted.
- Adopting a structured approach to information authoring and retrieval on the WWW, we can access and manipulate intelligently on both the client and the server sites the data which is semantically and structurally identified." [from the Net version]
Based upon (or being) a paper delivered at the Fifth International World Wide Web Conference, Paris, France, 6-10 May 1996. See: the presentation slides; or the full text of the article http://www5conf.inria.fr/fich_html/papers/P18/Overview.html [mirror copy, text only].
"Abstract: There is a great need for WWW clients to be extensible. The availability of the source code of some popular browsers (Mosaic) led many people to slice the original Mosaic or CERN code and to add diverse custom code for specific applications. In our view, a WWW authoring/viewing environment must be extensible enough to allow the building of interactive document authoring environments in which the user is able to access all relevant documentary information on the Web and incorporate it directly in his/her document. Symposia (shipping since March 95) is a joint INRIA / GRIF S.A. project for building a cooperative WYSIWYG authoring tool for the WWW. Symposia will soon be shipped with an API that we have developed that presents a set of solid principles for extending the user interface, document management, network extensibility and interactive behavior of document fragments in a WWW client. We will discuss in this paper the advantages gained from basing the extensibility of a WWW client on a generic structured environment. We will present different ways proposed today to extend WWW clients: Forms/CGI and Java and will compare them with the Symposia API."
Available on the Internet: http://www.grif.fr/fr/newsref/sympapi.html [mirror copy]
Abstract: "Microsoft cofounded the XML working group at the W3C in July 96 and actively participated in the definition of the standard. This article describes why Microsoft implemented its first XML application and how it led to the development of two XML parsers shipping in Internet Explorer 4.0, one written in C++ and the other in Java. We describe the importance of designing an object model API and our vision of XML as a universal, open data format for the Internet."
A version of this document is available online in HTML format: http://www.w3j.com/xml/excerpt.html; [local archive copy]. See also Microsoft's XML Support Page.
"Abstract: The importance of reusing information is well understood in electronic publishing, and is one of the motivations for the development and use of SGML. Reuse is actually quite hard to achieve with SGML, as the elements are strongly typed and there can be incompatibilities between the DTDs. HTML, an SGML derivative, relaxes those constraints, but unfortunately it does not provide a significant level of structure for identifying and extracting information, since the tags are mostly used for presentation. XML, another SGML derivative, is a promising alternative which could bring the power of SGML to the Web while keeping the simplicity of HTML. Those standards all have particularities which must be addressed in a global solution to the reuse of information. We present our solution to the reuse of SGML information objects: a system that can dynamically combine information from various sources, including databases and SGML-like documents, to produce a virtual document, which allows an author to reuse information in a document-centric, descriptive way. We maintain support for the particularities of the data sources, by having them stored in different formats and accessed in their own native query language, but also support the integration of these information objects by converting them into a common, tree-like data structure, and by providing a language to extract and transform information in those trees. In this approach, a collection of SGML documents can be stored in an object-oriented database as a tree-like hierarchy of information objects; thus taking advantage of the strict typing of SGML to provide efficient storage and retrieval. By extending the standard query language of the object-oriented database, we can query on an incomplete or partial knowledge of the document structure whilst retaining the search efficiency that the database engine provides us.
Combination of the results with other databases or data sources, and inclusion into the SGML virtual document is handled by our tree language. HTML and XML documents do not always conform to a DTD, and, if they come from the Web, are volatile and fast-changing in nature. We propose in this case to access those documents through the standard file systems or http protocol, to convert them to our tree-like data structures on-line, and to use our tree language for both extraction and transformation, with possibly some specific instructions to handle links. The system is currently being implemented. Our prototypal application, a document to generate activity reports, reuses both an SGML database and a collection of HTML pages (as well as an SQL database), and shows how flexible and powerful our tool for information reuse is."
See also "Reuse of Linked Documents through Virtual Document Prescriptions." By Anne-Marie Vercoustre and François Paradis [INRIA (France) and CSIRO (Australia)]. Pages 499-512 in Electronic Publishing, Artistic Imaging, and Digital Typography. Proceedings of the 7th International Conference on Electronic Publishing (EP '98), Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography, RIDT '98). Saint Malo, France, March 30 - April 3, 1998. Edited by Roger D. Hersch, Jacques André, and Heather Brown. New York/Berlin/Heidelberg: Springer-Verlag, 1998.
Contact François Paradis or Brendan Hills for a copy of the paper.
Abstract: "The importance of reuse of information is well recognised for electronic publishing. However, it is rarely achieved satisfactorily because of the complexity of the task: integrating different formats, handling updates of information, addressing the document author's need for intuitiveness and simplicity, etc. An approach which addresses these problems is to dynamically generate and update documents through a descriptive definition of virtual documents. We present a document interpreter that allows information to be gathered from multiple sources and combined dynamically to produce a virtual document. Two strengths of our approach are: the generic information objects that we use, which enable access to distributed, heterogeneous data sources; and the interpreter's evaluation strategy, which permits a minimum re-evaluation of the information objects from the data sources."
"The RIO (Reuse of Information Objects) project aims to develop techniques which can support information reuse in various contexts. The focus of the project is currently on the specification and interpretation of virtual documents to enable reuse of structured information from heterogeneous sources. The instructions for the construction of virtual documents are stored in a document prescription, which is processed by the document interpreter to generate or update a virtual document. An editor facilitates the writing of the document prescriptions; it is connected to the document interpreter in order to provide dynamic editing. The document prescription consists of: 1) Static data, or the structure and the text that does not change in the document; 2) Queries, or the commands needed to generate the dynamic part of the document; 3) Transformation instructions, to convert the reused information objects into new document objects. The document prescription is written as an SGML document or as one of its derivatives such as HTML or XML, that might not enforce compliance to a formal DTD. Static data is expressed using normal SGML constructs. Queries and transformation instructions are expressed as SGML Processing Instructions (PI). There are two kinds of queries: native queries, which send requests to the data sources in their specific language (e.g., SQL for a relational database, URL for an HTML server), and pick queries, written in an OQL-like language that we designed to combine results and provide search capabilities for semi-structured information."
See Slides: A Virtual Document Interpreter for Reuse of Information. See also the online document abstract and the full text in PDF format; [local archive copy].
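Note: the document prescription described above combines static data with queries carried in processing instructions, which an interpreter evaluates to produce the virtual document. The following Python sketch shows the general idea only. It is not the RIO interpreter: the function `interpret`, the PI target `staff`, the sample data, and the use of XML-style `<?target data?>` delimiters are all assumptions made for illustration, and the project's transformation instructions and pick-query language are not modelled.

```python
import re

# Matches XML-style processing instructions: <?target data?>
PI_PATTERN = re.compile(r"<\?(\w[\w-]*)\s+(.*?)\?>", re.DOTALL)


def interpret(prescription, sources):
    """Replace each PI whose target names a known source with its result.

    Static text is copied through unchanged; PIs with unknown targets
    are left in place so that other tools can still process them.
    """
    def run(match):
        target, data = match.group(1), match.group(2).strip()
        handler = sources.get(target)
        if handler is None:
            return match.group(0)
        return handler(data)

    return PI_PATTERN.sub(run, prescription)


# Hypothetical data source standing in for a database.
STAFF = [
    {"name": "Ada", "role": "editor"},
    {"name": "Grace", "role": "writer"},
]


def staff_query(role):
    """Return list items for every staff member with the given role."""
    return "".join(f"<li>{p['name']}</li>" for p in STAFF if p["role"] == role)


prescription = "<report><h1>Activity</h1><ul><?staff editor?></ul></report>"
virtual = interpret(prescription, {"staff": staff_query})
print(virtual)
```

Re-running `interpret` after the underlying data changes regenerates the dynamic part while the static part of the prescription stays fixed, which is the property the abstract emphasises.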
Summary: "The EDIDOC project at ESA attempts to merge the functionalities of electronic data interchange with those of electronic document engineering. It uses SGML to exchange in electronic format documentation within ESA, and between ESA and its different industrial partners."
"Specific SGML and EDIFACT substandards will be defined and implemented, covering both technical and administrative applications, and will be submitted to international standardisation bodies. They will include a.o. the following document types: 1) project control documents such as Monthly Progress Report, Engineering Change Notice, and Contract Change Notice; 2) software documents that are part of the ESA Software Engineering Standards such as User Requirements, Architectural Design, Operator Manual, etc.; 3) and possibly ITT and Proposal."
The document is available online in HTML format: http://www.sgmlbelux.be/Newsletter/N12A4.HTM; [local archive copy]. See the EDIDOC main entry in the SGML/XML Web Page.
Brief description of DynaText 2.0 for the Macintosh. Includes comments of Paul Kahn, director of the IRIS Program at Brown University, where DynaText has been used to create electronic books in the areas of mathematics, literature, and science.
"ABSTRACT: During the last year many changes have been introduced into the system of maintaining OMIM. There are three major components of the reorganization. First, a distributed editorial system was introduced which provides a three-tiered editorial board with senior editors, science writers and subject editors. Second, MIM entries have been restructured to provide separate gene and phenotype information and to organize them into separate catalogs. The restructuring also establishes clearly defined sections for entering new information, converts old entries to the new structure, and establishes a file maintenance and editorial system in SGML format. Third, the entry numbering and naming system has been modified. In addition, the information has been made available through a variety of output media, including books, CD-ROM and online access based on the IRx, WAIS, Gopher and WWW formats."
"This article presents the results of research on the feasibility of using scanner technology to capture contents pages of collective monographs, and to extract the bibliographic information of each individual work and process this with a standardized language, such as SGML, for tagging electronic documents. By this means, data can be used as electronic information or stored in an online catalog (OPAC), thus providing additional access points. A pilot system has been designed to test the initial hypotheses, show the feasibility of achieving the suggested goals, and develop the tasks required so that they may be carried out as automatically as possible."
[Note the dissertation of Eduardo Peis...]
Abstract: "The article presents an intelligent information agent which provides just-in-time help to flight simulator maintenance technicians. The agent, called STEALTH, is embedded within an information management system, TOPSS, and works as a personal digital assistant (PDA) for information filtering. It provides users with intelligent guidance to useful information within a given context, taking user's preferences into account. STEALTH offers numerous advantages compared to several other information filtering paradigms: (1) its intelligence is based on an artificial neural network (ANN) technology which can easily add new documents to its knowledge base, as well as learn and forget user's preferences; (2) its information search strategy includes theme indexing and stemming, which goes beyond the use of full text keywords; (3) it can take advantage of an SGML document base, retrieving documents at different granularity levels (paragraph, section, chapter, document); and (4) it allows users to build their own collection of documents using available document parts. The approach taken in the development of STEALTH is outlined here: from the ergonomic task analysis to the actual implementation and testing of the agent. Although STEALTH is presently only used within a flight simulator context, it is believed that its generic design will make it applicable to a wide range of domains. In effect, one only has to provide the agent with the document base to be used in order to profile the agent for the new context."
Abstract: "A significant economical objective at Norsk Hydro is to reduce the time and cost of maintaining equipment used in oil production.
"According to NORSOK (NORsk SOkkels Konkuranseposisjon, or in English the competitive standing of the Norwegian offshore sector), 50% of the development cost of an off-shore installation is related to information. NORSOK is the Norwegian initiative to reduce development and operation cost for the off-shore oil and gas industry. An important part of this effort is to develop cost efficient standards to replace individual oil company specifications.
"In this case study we will explain the implementation of an interactive system to improve the accessibility of technical supplier documentation by utilising the SGML standard."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
Abstract: "A significant economical objective at Norsk Hydro is to reduce the time and cost of maintaining equipment used in oil production.
"According to NORSOK (NORsk SOkkels Konkuranseposisjon, or in English the competitive standing of the Norwegian offshore sector), 50% of the development cost of an off-shore installation is related to information. That explains why there are substantial savings to be had in making the information management process and methods more efficient.
"In this case study we will explain the implementation of an integrated, interactive system to improve the accessibility of information needed for the maintenance procedures at an off-shore installation by utilising the SGML standard. This system contains all the relevant parts in an information management process like authoring, storage and information distribution."
This paper was delivered as part of the "Case Studies" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
Abstract: "Wärtsilä Diesel is the largest medium speed diesel engine manufacturer in the world, with offices and factories all over the world. This is a case study where Wärtsilä Diesel Power Plant provides an editorial system for their subcontractors, so that they can easily produce content oriented information modules, based on the physical equipment breakdown structure (EBS) according to the WD Base-DTD. The study also covers the production system that is used at Wärtsilä to maintain and to produce presentation-oriented technical manuals from the content oriented information modules delivered by the subcontractors. We will also cover the background and problems of handling lots of information coming from several sources in different formats, why WD decided to implement a CALS/SGML information environment and what they have achieved so far.
"The editorial system consists of the WD Base-DTD that is mapped in SGML Author to templates in Microsoft WORD and a database that is used for mapping the information modules into the correct level in the EBS. This editorial system makes it very easy to author content oriented information, because of the familiar word processor that helps the user to navigate in the DTD without having any knowledge about SGML. The key thing in the application is having an interface of a database from where the author chooses an information module and puts in information by using the next legal style, which follows the structure in the WD Base-DTD.
"The production system consists of tools for navigating, searching, browsing and publishing of the technical information from the main repository. When the subcontractor delivers the technical information, it will be analysed in Wärtsilä Diesel and if it is approved it is saved into a main repository for the information. The main tool in the production system is a browser that is configured to the relational database (main repository) that holds the EBS with the associated information modules. The tool is used for searching, viewing and publishing of the information modules in a very object oriented way. By choosing publish, the user can produce information products, such as IETM, Online and/or paper manuals very easily by 'dragging and dropping'."
Note: The above presentation was part of the "SGML Case Studies" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
The Ivanhoe Career Guide to Information Systems 1996 is "a guide to the profession for students and others considering IT as a career." Penfold's article discusses the role of SGML in publishing.
Available online: http://www.bcs.org.uk/ivanhoe/part-2/g8.htm; [mirror copy].
Abstract: "Since 1994 all of the Norwegian Government's official reports (Norges Offentlige Utredninger, or the NOU series) have been produced using SGML. The same SGML source document is used to enable the publication of the printed version and two different on-line versions. As a result of this project, the information embodied in the NOU series is now available faster and more reliably to a greater number of people than before. Thanks to the use of SGML, the series can now be searched as free text, distributed via the Internet, accessed by the visually impaired, and re-used in other publications. It is also guaranteed to be available to future generations in machine-readable form. The success of the NOU Project has motivated the Norwegian Government Administration Services to use SGML for other important official publications, and a new project is currently underway.
"This presentation will describe the background to the NOU Project, the way in which it was implemented and some of the lessons that were learned. Particular emphasis will be given to the ways in which SGML enables small and medium-sized publishers to exploit new media such as CD-ROM and World Wide Web."
See a summary of the presentation in the SGML Sweden '96 program; [mirror copy].
[A presentation based upon the author's popular "Whirlwind Guide to SGML Tools and Vendors."]
Note: The above presentation was part of the "SGML Newcomer" track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
"Abstract. SGML is an enabling technology: it doesn't actually do anything in and of itself. In order to make it work for you, you need software tools: tools to help you design your application, tools to help you get your information into SGML format, and tools to help you do something useful with it once you've got it there. This presentation aims to give a brief overview of the kinds of SGML products currently available and some of the questions you should be asking yourself (and vendors) when choosing them."
Note: The Guide is developed in several parts, with an introduction for each class of SGML software tool in the document overview. The Vendor and Tool directory is thus part of a larger document which explains the role played by different kinds of tools in an enterprise information management solution. The document sections include: (1) Document abstract; (2) Introduction: Classifying SGML Tools (according to hardware and software platform, level of SGML support, and function or activity); (3) Hardware and Software Platforms; (4) Level of SGML Support: Feature support, Syntax support, Validation services offered; (5) Activities and Functionality (planning the application, capturing the data, managing the information, putting the information to work); (6) Directory of SGML Tools and Vendors. It supplies a listing of all (known) SGML tools, categorised according to functionality, along with names, addresses and telephone numbers of the vendors. The database is an extremely valuable resource, and has been updated faithfully [from 1992 through September 1997, or later].
Available on the WWW at http://www.infotek.no/sgmltool/guide.htm.
[Reference is from the PREMIUM Project]
The article summarizes some of the key changes recommended for ISO 8879, and the changes most likely to be ratified in the revision of ISO 8879:1986. Several of the changes involve the integration of HyTime constructs into SGML itself.
Summarizes material presented in a poster session at SGML '93. Discusses the value of creating reusable "DTD modules" (via parameter entities) as opposed to placing a copy of a DTD into a document instance.
The article discusses: "ISO 10646 and Unicode; Transmitting and Storing 16-bit Bit Patterns on 8-bit Byte-oriented Systems; 10646/Unicode and XML; Non-canonical Representations of Strings of UCS-2 Characters; UTF-8 and ASCII; Which Am I Getting: UCS-2, UTF-8, or Something Else?" On this topic, see also the excellent article written by François Chahuneau, "Unicode and Internationalization Issues in Document Management: A Global Solution to Local Problems," in The Gilbane Report on Open Information & Document Systems 5/4 (July/August 1997) 1-25; it also discusses Unicode and XML.
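Two of the topics listed above (UTF-8 and ASCII; storing 16-bit patterns on 8-bit byte-oriented systems) can be illustrated with a minimal Python sketch. It is not taken from the article; the function name `encoded_forms` is hypothetical, and Python's "utf-16-be" and "utf-16-le" codecs stand in for UCS-2 here, which is a fair substitute only because every character used lies in the first 65536 code positions.

```python
def encoded_forms(text):
    """Return the bytes of `text` under three encodings.

    Assumption: every character of `text` is in the 16-bit range,
    where UTF-16 and UCS-2 produce the same bit patterns.
    """
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-be": text.encode("utf-16-be"),  # high-order byte first
        "utf-16-le": text.encode("utf-16-le"),  # low-order byte first
    }


forms = encoded_forms("SGML")

# Pure ASCII text is byte-for-byte identical when encoded as UTF-8.
assert forms["utf-8"] == "SGML".encode("ascii")

# A 16-bit pattern becomes two bytes, and the byte order must be agreed on.
assert forms["utf-16-be"] == b"\x00S\x00G\x00M\x00L"
assert forms["utf-16-le"] == b"S\x00G\x00M\x00L\x00"

# A non-ASCII letter takes more than one byte in UTF-8.
assert len("Fran\u00e7ois".encode("utf-8")) == 9  # 8 characters, 9 bytes
```

The two 16-bit results differ only in byte order, which is why a receiving system needs some signal of which form it is getting.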
The article is a continuation of the author's presentation in the October issue of <TAG>. It discusses in greater detail XML's "Document Character Set," and current WG discussions on the status of "composite" characters.
See also: "More on XML Characters" (January 1998).
The first article in an announced series of articles on SGML character set issues. This lead article defines some of the key terms that are used variably within different industry and standards arenas: glyph, glyph image, font, character, character repertoire, etc.
The second in a series of tutorial articles on character sets, introducing some of the new terminology used in connection with the planned revision of SGML. See the first of the serialized articles on character sets in the February issue of <TAG>.
The author discusses the parts of the SGML declaration having to do with character sets, particularly in light of character-set handling in HTML. Titled sections of the article include: Character repertoires; SGML and characters; Coded character sets; Encoding schemes.
The author explains the different kinds of "data" (and "metadata") in SGML, with a focus upon special processing concerns that need to be understood when using "RCDATA" and "CDATA" in different contexts. The article explains various ways of hiding markup characters (as literal data) within PCDATA so that they are not recognized as markup.
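As a rough illustration of hiding markup characters, the following Python sketch shows two common techniques, entity references and a CDATA marked section. It uses XML and the Python standard library rather than a full SGML parser, so it is an analogy to the article's subject, not a reproduction of it; the helper names `hide_with_entities` and `hide_with_cdata` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def hide_with_entities(text):
    """Replace the markup delimiters &, < and > with entity references."""
    return escape(text)


def hide_with_cdata(text):
    """Wrap text in a CDATA marked section.

    Assumption: the text must not itself contain the section terminator.
    """
    if "]]>" in text:
        raise ValueError("text contains the CDATA terminator ']]>'")
    return "<![CDATA[" + text + "]]>"


literal = "if a < b & c"

# Both forms parse back to the same literal character data.
for hidden in (hide_with_entities(literal), hide_with_cdata(literal)):
    element = ET.fromstring("<p>" + hidden + "</p>")
    assert element.text == literal
```

In both cases the parser delivers the original characters as data, and none of them is recognized as markup.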
The author discusses the definition of a "character" (and related concepts) from the perspective of the SGML standard. He explains how SGML entities can be used for special characters in the ISO 8879 scheme, and explains how the SGML "character set" relates to some industry-standard character sets.
The author reflects on the current discussion (within the XML design arena) to differentiate markup for EMPTY elements (which cannot have an end-tag) from markup for elements that have an omissible end-tag -- in support of parsing an SGML/XML instance without the DTD. The current proposals are apparently very similar to one made by John Klensin (INFOODS) in 1985 -- at a time when it was recognized that the design of SGML's EMPTY element was problematic, but when it was said to be too late to turn back.
The tutorial article is one in a series on basic document analysis. Peterson treats common features in "running text" and the relevance of some style rules for DTD design and tagging.
The document analysis tutorial discusses identification of "section" text objects, and explains when using recursion (nested section structure) to model the hierarchy may or may not be a good idea.
The third in a series of tutorial articles on fundamentals of document analysis. The article includes a discussion of tables as logical structures versus tables as a display style.
Peterson provides extensive commentary on the definition of 'DTD' [document type definition] in SGML (ISO 8879) and then explains how XML schemas (may) relate to XML DTDs. He identifies three separate components in a DTD: syntax, semantic roles, and application semantics.
Comments on the difference between "document type definition" and "document type declaration".
Abstract: "SGML products could make 'good SGML' easier by separating tabular display from tabular data organization more thoroughly. This presentation will describe and discuss the structural versus display approaches to tabular data in SGML, and will describe the author's dream table-oriented capabilities for display-oriented SGML tools, especially editors. In the process, there will be provided a description of various simple and more complicated tabular structures. We don't accept products that only recognize one or two DTDs in general; why should we for tables?"
"There is a difference between tabular display of data and tabular organization of data. Tabular display involves how data is placed on the screen or page, whereas tabular organization involves the semantic relationships between various pieces of data. Most programs, such as SGML-aware editors, that provide a tabular data display currently require that the SGML markup for the data directly indicate how it is to be displayed, rather than any structural relationship between the pieces of data, and generally require that all data to be displayed tabularly be marked up to the same DTD fragment (hereinafter called a 'schema'). This should not be necessary. My 'dream' is that it not be necessary."
This paper was delivered as part of the "User" track in the SGML/XML '97 Conference.
Note: The SGML/XML '97 conference proceedings volume is available from the Graphic Communications Association, 100 Daingerfield Road, Alexandria, VA 22314-2888; Tel: +1 (703) 519-8160; FAX: +1 (703) 548-2867. Complete information about the conference (e.g., program listing, tutorials, show guide, DTDs, conference reports) is provided in the dedicated conference section of the SGML/XML Web Page and via the GCA Web server. The electronic proceedings on CDROM was produced courtesy of Jouve Data Management (Jouve PubUser).
This tutorial on SGML NOTATIONS explains how to use a notation with an external data entity and as an attribute of an SGML element.
The author refers to the work of SGML Open in its review of the CALS table model as variably interpreted by implementors. The model was designed in 1989, but the DTD did not adequately address "the semantic role of each of the types of elements and attributes declared therein." According to the author, the "lesson to be learned" from the current (expensive) re-interpretation of the CALS table model is: "It's going to cost a lot of money downstream if you don't document the semantics of your element types and attributes carefully."
The author provides an update on XML characters, as of the December 1997 XML specification. Peterson comments on the XML WG's choice of UTF-16 (variable-length representation) rather than UCS-2 (the 16-bit two-byte canonical representation of the first 65536 characters of ISO 10646). See also "Characters, Encodings, and XML, Continued" and "Characters, Encodings, and XML."
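The difference between the two representations can be seen in a small Python sketch (an illustration only, not drawn from the article; `utf16_code_units` is a hypothetical helper). Characters among the first 65536 occupy one 16-bit unit in either scheme, while UTF-16 reaches characters beyond that range with a pair of 16-bit units, which is what makes it variable-length.

```python
def utf16_code_units(ch):
    """Return the 16-bit code units that encode `ch` in UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A character within the first 65536 positions: one unit, same as UCS-2.
assert utf16_code_units("A") == [0x0041]

# A character beyond that range: two units (a surrogate pair).
assert utf16_code_units("\U00010000") == [0xD800, 0xDC00]
```

UCS-2 has no encoding at all for the second character, since it covers only the first 65536 code positions.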
The author discusses SGML/HyTime 'groves' in light of the emerging importance of this notion in the development of SGML systems. The concept was formalized in the HyTime TC [1997]. Earlier articles by the author on architectural forms and groves (from September and November 1995) are referenced. [See Dave Peterson, "Objects, Classes, Trees, and Groves." <TAG>: The SGML Newsletter 10/8 (August 1997) 9-10; and "Trees, Groves, and SGML." By Dave Peterson. <TAG>: The SGML Newsletter 8/12 (December 1995) 11-12.]
The author examines the trend toward using "permissive" DTDs - a DTD that "permits different ways of organizing the same information, not because the different ways are needed (presumably because they have slightly different semantics), but simply to accommodate the preferences of different users." He warns that many common motivations for adopting a permissive DTD are uncritical, and end up working against the goals of the information management project.
Dave Peterson discusses the special problems raised by the fact that there is frequently more than one way to represent the same abstract character in Unicode (for example, "ö", which can be represented in 16 bits as a single precomposed character or in 32 bits as a base letter followed by a combining diacritic). Some languages use a base character having several stacked diacritics, where differential ordering of the combinations would create different bit patterns for the "same" distinct (abstract) character, from XML's perspective. Where a Unicode-compliant piece of software ought to be able to equate equivalent representations, ISO/IEC 10646 "does not address the issue." A question remains as to whether the XML specification will address this issue, and what the consequences will be for designers and users of XML applications/implementations if the issue is not formally addressed.
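The problem can be reproduced in a few lines of Python. This sketch is not from the article: it relies on the Unicode normalization forms as exposed by the standard `unicodedata` module, a remedy standardized separately from the discussion summarized here, and `same_abstract_text` is a hypothetical helper name.

```python
import unicodedata

precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # 'o' followed by COMBINING DIAERESIS


def same_abstract_text(a, b):
    """Compare two strings after normalizing both to the composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two spellings occupy 16 and 32 bits respectively in a 16-bit encoding.
assert len(precomposed.encode("utf-16-be")) == 2
assert len(decomposed.encode("utf-16-be")) == 4

# A naive comparison treats them as different strings...
assert precomposed != decomposed

# ...while a normalizing comparison equates them.
assert same_abstract_text(precomposed, decomposed)
```

Software that compares names or content without such a step will treat the two spellings as distinct, which is the consequence the article warns about.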
Abstract: "There are differences of opinion as to how the current SGML standard (ISO 8879 as amended in 1988) should be interpreted with respect to the handling of the characters that make up the SGML documents it describes. But a consensus has pretty well been achieved as to how the revision now being worked on will treat 'characters' and 'character strings', and how the 'character sets' described in an SGML declaration will be interpreted and used. This paper presents the character model that is being considered by the group working on the revision of ISO 8879 (the SGML Rapporteur Group of ISO/IEC JTC1 SC 18 WG8).
"Characters are recognized as 'abstract' data types, just as, for example, are integers. The new model will not assume, for example, that characters of a given character repertoire are always represented by fixed-width bit strings, or that strings of characters are always represented by direct concatenation of the representations of single characters.
"The new character model clarifies the relationship between the character representations being used by an SGML system, the character representations used to store external entities, and the character sets described in the SGML declarations of SGML documents. It provides for the possibility of character representation information being in the SGML declaration's 'document character set' description or the 'formal system identifier' of an entity, or even being provided via external-to-the-document, system-dependent means."
Note: The above presentation was part of the "And More..." track at SGML '96. The SGML '96 Conference Proceedings volume containing the full text of the paper may be obtained from GCA.
Abstract: "SGML was designed in an environment where other-than-8-bit character representations were only vaguely known and not understood. The designers did not differentiate between (abstract) characters and the bit-patterns by which they are represented in machines. This resulted in a character-handling model that is no longer adequate in many respects. In addition, there have surfaced differences of opinion as to how the current SGML standard (ISO 8879 as amended in 1988) should be interpreted with respect to the handling of the characters that make up the SGML documents it describes.
"A new character and character-string model has been adopted by the SGML Rapporteur Group within WG8 [now WG4], where the ISO 8879 revision is being prepared. The new model encompasses handling of variable-width-character string representations such as Shift-JIS, and outside-the-document specification of character representations, as well as the traditional 'document character set' specification.
"A Technical 'Correction' to ISO 8879 was made official in 1996, which made it more feasible to use SGML with very-large-character-set languages such as Japanese and Chinese, for use on SGML systems not constrained to an 8-bit character set.
"This presentation will explain the distinction between (abstract) characters and the computer representations of characters, and will explain the new character handling model in terms thereof. It will further explain the relationship to the old character-handling model of 1986, and how older systems may be upgraded, and what is possible when still running under the old (1986/88) rules. This involves the relationship of the 'document character set' to the way systems may actually represent characters, and the use or non-use of the 'shunned character numbers' specification."
Note: The electronic conference proceedings in hypertext were produced by Inso Corporation (DynaText) and by High Text (EnLIGHTeN). Information about the SGML Europe '97 Conference may be found in the main database entry.
The author explains and illustrates the pitfalls of using nested marked sections, including precedence rules and (non-)support on the fine points by some of the leading SGML software packages, depending upon where the marked sections occur. Some of the complexities and problems will be addressed in the revision of ISO 8879.
Overview of the document analysis process from the perspective of an enterprise representative working with outside SGML consultants, with a view to conversion of legacy data.
Available online: stepir.dtd and readme.txt, or in mirror copy here [August 1995].
Abstract: "The Standard for the Exchange of Product Data (STEP) is emerging under the auspices of the International Organization of Standards (ISO). As part of a major NIST effort in support of the advancement of STEP, NIST is developing an environment which will facilitate and accelerate the development of the component specifications in STEP, known as Application Protocols (APs). This environment is called the Application Protocol Development Environment (APDE), the purpose of which is to provide an integrated suite of STEP-tailored tools to assist STEP AP developers in the development of high-quality APs. A major part of the APDE will be a document authoring, browsing, and publishing environment based on the Standard Generalized Markup Language (SGML), an ISO standard (ISO 8879) which is used to specify format. The SGML environment is expected to address two major challenges faced by the developers of STEP documents. These challenges are: 1) to ensure accurate interpretation and conformance to the specified structure of STEP documents in a "reasonable" amount of time and 2) to be able to intelligently query and access information from the component parts of the standar

