The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Last modified: November 11, 2002
General SGML/XML Applications

SGML/XML Applications in Cross-Domain and Multi-Disciplinary Enterprises

This document provides brief descriptions of, and pointers to, general industry applications that use SGML/XML encoding for structured information processing and data interchange. Specific XML applications and XML vocabularies are listed in the main XML Page, while XML/XLL/XSL software tools are listed in the software section. Academic applications of SGML and XML are described in the document "Academic Applications".



HyTime: ISO 10744 Hypermedia/Time-based Structuring Language

[CR: 19980318]

[March 18, 1998] Note: Information on HyTime is now being maintained in a separate document, http://xml.coverpages.org/hytime.html. The content below will be retained for some time (so as not to break bookmarks), but it will not be kept current.


SMDL - Standard Music Description Language, ISO/IEC DIS 10743:1995

[CR: 20031104]

SMDL is organized by Working Group 3 of ISO/IEC JTC1/SC34 (Information Association). "The Standard Music Description Language (SMDL), an application of the HyTime Hypermedia/Time-based document structuring facilities, is described. The discussion covers the domains of information that SMDL associates with any piece of music, the timing of cantus events, pitch in cantus events, gamut-based pitches, just-intoned pitches, user-defined functions for pitches, chords and chord symbols, instrumental and vocal sounds, and non-western music." [from Steven R. Newcomb]

Extracted from the [July 1995] DIS: "This International Standard defines an architecture for the representation of music information, either alone or in conjunction with text, graphics, or other information needed for publishing or business purposes. Multimedia time sequencing information is also supported. The architecture is known as the "Standard Music Description Language", or "SMDL". // SMDL is a "HyTime" application; it conforms to International Standard ISO/IEC 10744 - Hypermedia / Time-based Structuring Language ("HyTime"). Specifically, SMDL is a "derived architecture" derived from HyTime architecture, and SMDL is expressed in this International Standard in a manner which conforms to HyTime's specifications for the expression of architectures (also known as "meta-DTDs") and derived architectures. // SMDL is an SGML application conforming to International Standard ISO 8879 - Standard Generalized Markup Language." [See the Overview cited above]


SGML Initiative in Health Care (HL7 Health Level-7 and SGML/XML)

[CR: 20000620]

"HL7 was founded in 1987 to develop standards for the electronic interchange of clinical, financial and administrative information among independent health care oriented computer systems; e.g., hospital information systems, clinical laboratory systems, enterprise systems and pharmacy systems." In August 1996, the HL7 Technical Steering Committee authorized the creation of an SGML SIG as part of a larger initiative to integrate SGML into medical informatics standards. "HCML" is a proposed abbreviation for the evolving markup language: "Health Care Markup Language."

"The HL7-SGML Initiative is a special interest group of HL7 formed to create the standard for the use of SGML in all domains of health care. This standard will comply with ISO 8879 (SGML) and SGML-related standards and complement other appropriate standards. Participation is open to all parties. Our Mission statement will guide the primary objectives: (1) To create and coordinate the development of a comprehensive document architecture for health care; (2) To educate the healthcare community in the utility of SGML-based information; (3) To develop, coordinate, and maintain a framework for interoperable Document Type Definitions (DTDs) for use in health care; (4) To coordinate and cooperate with other SGML initiatives outside of healthcare where appropriate; (5) To enable and promote the use of these standards and to make the standard as widely available as possible; (6) To represent healthcare in SGML standards activities/evolution; (7) To promote longevity of all information encoded according to these guidelines." [from the main page, Mission Statement and Charter]

"Those presenting at SGML '96 made it clear that while planning for the design of the initiative is in the early stages, their vision for this initiative already differs substantially from earlier industry initiatives. The HL7/SGML initiative must design an information architecture that will make the EMR work within the heterogeneous computing environment of a healthcare enterprise and must make the information available to a wide range of applications from billing to epidemiology and decision support. They feel that it is a significant advantage to begin this initiative after the publication of the HyTime, DSSSL, and preliminary XML standards." [L.Alschuler post]

Contact addresses:
John Mattison, SGML SIG Chair
Kaiser Permanente
393 E. Walnut
Pasadena, CA 91188
USA
Tel: +1 (818) 405-5091
Email: john.mattison@kp.org
Email: Chris Zingo - Chris.Zingo@kp.org [meetings of the HL7-SGML Initiative]
or [general HL7]:
Mark McDougall, Executive Director, Health Level Seven
3300 Washtenaw Avenue, Suite 227
Ann Arbor, MI 48104-4250
Phone: (313) 677-7777
Fax: (313) 677-6622
Email: HQ@HL7.ORG


National Center for Biotechnology Information (NCBI)

[CR: 19971118]

Updated from 19971118. See now the separate document: "NCBI Molecular Biology Data Model."


SPIDER - Structured Platform-Independent Data Entry and Reporting

[CR: 19971118]

SPIDER is one of the research projects sponsored by MIDAS (The Medical Informatics and Decision Science Consortium, Milwaukee, Wisconsin, USA). Its principal investigators include Charles E. Kahn, Jr. (MD), Phiem N. Huynh, and Kurt J. Pfeifer. "SPIDER uses platform-independent, public-domain technologies such as SGML and HTML (with the World Wide Web) to achieve structured entry of medical data. Applications include radiological reports and medical questionnaires." The requirements for SPIDER and the associated form-generation tools are informed by concepts related to structured data entry (SDE), as applicable in medical informatics: "SDE standardizes data collection, increases the certainty of data summaries, and facilitates integration with decision support systems. [. . .] clinical observations in a computer-based patient record may be acquired as natural language or through structured data entry."

SPIDER makes use of the "Data-entry and Report Markup Language (DRML), a platform-independent markup language for specifying the content and format of structured reporting applications. DRML is based on the Standard Generalized Markup Language (SGML), an international standard for document exchange. [. . .] DRML was created to provide a simple standardized, universal method for specifying reporting applications. With SPIDER, DRML documents are used to create structured data-entry forms, outline-format textual reports, and datasets for analysis of aggregate results. SPIDER transforms its reporting knowledge base, written in DRML, into the appropriate hypertext markup language (HTML) codes for display by Web client programs."
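The DRML-to-HTML step described above can be illustrated with a minimal Python sketch. DRML's actual element names are not given here, so the `report`, `field`, and `option` elements, the sample document, and the `render_form` function are all hypothetical, and XML stands in for SGML so that the standard library parser can be used.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical report definition; these element names are NOT real DRML.
SAMPLE = """
<report title="Chest radiograph">
  <field name="finding" label="Finding" type="choice">
    <option>Normal</option>
    <option>Nodule</option>
  </field>
  <field name="comment" label="Comment" type="text"/>
</report>
"""

def render_form(source: str) -> str:
    """Turn a report definition into an HTML data-entry form."""
    root = ET.fromstring(source)
    lines = [f"<h1>{escape(root.get('title', ''))}</h1>", '<form method="post">']
    for field in root.findall("field"):
        raw_name = field.get("name", "")
        name = escape(raw_name)
        label = escape(field.get("label", raw_name))
        lines.append(f'<label for="{name}">{label}</label>')
        if field.get("type") == "choice":
            # A choice field becomes a drop-down list of its options.
            lines.append(f'<select id="{name}" name="{name}">')
            for option in field.findall("option"):
                lines.append(f"<option>{escape(option.text or '')}</option>")
            lines.append("</select>")
        else:
            # Any other field type falls back to free-text entry.
            lines.append(f'<input type="text" id="{name}" name="{name}">')
    lines.append("</form>")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_form(SAMPLE))
```

Because the form is generated from the definition rather than written by hand, the same definition could in principle also drive report text or dataset export, which is the kind of reuse the quoted description attributes to DRML.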


Metafile for Interactive Documents (MID)

[CR: 19960809]

"The Metafile for Interactive Documents (MID) is a common interchange structure, based on the international standards for SGML and HyTime, that takes neutral data from varying authoring systems and structures it for display on dissimilar presentation systems, with minimal human intervention. [MID Draft specification dated Nov. 94]. It is envisioned that a MID Instance (the actual MID document) will be a 'hub' document, containing references to various (external) source data components. The MID Instance will be created by an interactive, automated process (i.e., a MIDWriter), and will be interpreted for viewing by off-the-shelf software incorporating a MIDReader. Development of a MIDReader was the primary focus of the 1995 MID project, as its development was intended (and already has) served to both point out issues in the structure of the MID, and identify implementation issues. Resolution of these issues has resulted in an evolutionary improvement to the MID specification." [from the Abstract]

"The MID provides a modular approach to authoring and maintaining IETMs. A MID standardizes the presentation of information and the behavior of that presentation across platforms. This is achieved through a standard set of user interface objects combined with an internal scripting language that controls the interaction of these objects with each other and the user as the objects access databases and display information on a Presentation System."

"Cross-platform interoperability is achieved through the use of SGML/HyTime. The MID is an application of SGML (ISO 8879) and HyTime (ISO 10744). SGML standardizes the syntax of the Document Type Definition for the MID language. HyTime provides standard models for location and addressing element types used in the MID DTD. This document assumes that the reader is familiar with the concepts and requirements of SGML." [from the NavySGML description]

MID is ". . .an application of HyTime aimed at Interactive Electronic Technical Manuals [IETMs] . . . It has been implemented and it works" [Steven R. Newcomb]


Standard Hypermedia/Multimedia Scripting Language (SMSL)

[CR: 19961222]

SMSL . . ."Provides a standardized method for defining the constructs used in the script for an audiovisual presentation. . . Extends HyTime by providing SGML meta-DTD architectural forms for describing the object classes, virtual functions, messages, aggregates and class/data membership used in a multimedia presentation's script. Also contains a definitions for a starter-set of functions used by scripting languages. . .sponsored by ISO/IEC JTC1/SC18 WG8 . . . Still at committee draft stage (ISO/IEC CD 13240)" [from www.echo.lu]

"SMSL is a standardized mechanism for embedding scripts in SGML hyper-documents. SMSL prescribes: (1) a mechanism for describing script and data notations; (2) a mechanism for describing classes, data structures and method arguments; (3) services that SMSL applications use; (4) an optional library providing useful classes for multimedia, generic operating system functions and graphical user interfaces. . . The features of SMSL include: (1) object-oriented interface between scripts and documents (both the "outside-in" and "inside-out" views); (2) use of message passing model for inter-object communication; (3) use of SGML to describe data structures and interfaces; (4) support for distributed applications; (5) support for a wide variety of scripting and programming languages."

"The Editor and Architect of SMSL is Mr. Brian Markey. Comments on SMSL documents may be sent to the Editor, but under ANSI/ISO rules, the Editor is under no obligation to respond to comments from non-participating parties. To participate in the development of SMSL, employees of U.S. companies or academic institutions are encouraged to contact Mr. Rudolf Riess, Chairman, ANSI X3V1." [from Permanent Wave WWW site]


Digital Libraries (Initiative) and SGML

[CR: 19970221]

A broad range of consortial and cooperative initiatives currently [Spring 1996] underway make use of the phrase "digital library" in identifying or describing their research and development efforts. Several such projects are being carried out jointly by research libraries and government bodies. Several are making use of SGML as a means of structuring the digitized information, whether of the documents themselves or of metadata (descriptive cataloging and finding aid information). For metadata issues, see further below. Some provisional links:


SGML and Metadata

[CR: 19971101]


Hyper-G Text Format (HTF)

[CR: 19951220]

"HTF is defined in terms of the ISO Standard Generalized Markup Language (SGML). SGML is a sort of meta-standard for defining structured document types and markup languages to represent instances of those document types."

"HTF is closely related to Hyper-G in the following sense: HTF is the preferred text format for Hyper-G. That means that all current and future Hyper-G clients will support this format, even if other text formats will become available in some future versions of Hyper-G."

"Hyper-G clients employ a generic SGML parser that makes it possible to display any kind of SGML markup, provided there is a DTD and a style sheet. There is a stand-alone incarnation of the parser called hgparse that is part of the Hyper-G server distribution that allows to use the parser to verify the correctness of the SGML markup, and to convert it to other text formats, given an appropriate style sheet." [from "Hyper-G Text Format", by Frank Kappe]


Association of American Publishers (AAP)


ISO 12083 DTDs

[CR: 20001106]

ISO 12083 is the successor to the AAP/EPSIG standard, and four DTDs have been distributed by EPSIG as the "ISO" DTDs.

ISO 12083 Scope (1998): "This International Standard presents a reference document type definition which facilitates the authoring, interchange and archiving of a variety of publications. This document type definition is deliberately general. It is a reference document type definition which provides a set of building blocks for the structuring of books, articles, serials, and similar publications in print and electronic form. This International Standard is intended to provide a document architecture to facilitate the creation of various application-specific document type definitions."

[January 25, 1999] As of January 1999, Dianne Kennedy was "convener of ISO 12083, the electronic manuscript standard." Dianne indicated that ISO 12083 would be undergoing a 'major overhaul' during 1999. "ISO 12083 meeting minutes, existing DTDs, newly proposed XML DTDs and more are posted at http://www.xmlxperts.com/12083.htm. Our next meeting will be in Granada Spain, on May 1, 1999 (following XML Europe '99)."

[November 06, 2000] ISO 12083 is available free in PDF format from NISO: ANSI/NISO/ISO 12083 Electronic Manuscript Preparation and Markup ("The standard specifies the SGML declaration defining the syntax used by the document type definitions [DTD] and document instances, and a definition for mathematics which may be embedded in other SGML applications").


IBMIDDoc: IBM Information Development document type

[CR: 20021111]

Wayne L. Wohler, Rick Dennis, and Eliot Kimber were principal contributors to the IBMIDDoc specification. Ref: IBMIDDoc User's Guide and Reference. Document Number SH21-0783-02. December 17, 1995. Rick Dennis (Author), Wayne Wohler (DTD Architect).


IEEE Computer Society Digital Library

[CR: 19971023]

The Computer Society Digital Library (CSDL) is an SGML-based document repository that uses Inso's DynaWeb server to deliver the documents as HTML on the fly to Web browsers. The CD-ROM version of the library also stores the articles in SGML format, delivered by the DynaText SGML browser. With this SGML-based software, users can search for any arbitrary string to find references for their current projects, attach their own electronic notes, place electronic bookmarks, and print full articles. The 1996 edition has over 12,000 pages of text from 114 IEEE journal issues.

The CSDL digital library "contains all issues of seventeen (17) of the society's magazines and transactions from 1995 to the present. The material is viewable and full-text searchable via standard WWW browsers. . . For those interested in the publication technology, we have created a database of SGML files and linked images. These files are converted and displayed as HTML on the fly. This allows subscribers to manipulate and view the content -- including math -- with standard web browsers without any helper applications or plug-ins."

According to the CS style guide, the IEEE Computer Society "has made a corporate decision to become an active participant in the electronic publishing arena," and has now become a content provider rather than a publisher in the traditional sense. "To maintain its lead in this rapidly developing field, the society began in 1994 to archive all of its transactions and magazine articles electronically. This is done by putting Microsoft Word documents through a custom filter in order to convert them into SGML files. These SGML files can then be manipulated in any way that may be considered desirable for future reproduction. For example, the entire 1995 content of each of the society's periodical titles is now available on CD-ROM. In 1996, the society began to post the content of some of its publications on the World Wide Web home page it maintains."

Addresses:
Edna Straub
SGML Database Coordinator
Information Technology & Services Department
IEEE Computer Society
10662 Los Vaqueros Circle
Los Alamitos, CA 90720-1264
Tel: 1-714/816-2103
FAX: 1-714/821-4641
EMAIL: estraub@computer.org


IEEE Standards Department

[CR: 19990520]

The Standard Generalized Markup Language (SGML) "is the internal production format of IEEE Standards."

Provisional introduction: "SPAsystem modular DTD approach: The purpose of the SPAsystem Authoring DTD Suite (SPA Z-30) is to create an environment that allows authors to write an IEEE standards document in SGML in a simple and intuitive manner. This is accomplished through a series of DTD modules. Each module is a small, highly structured DTD that defines a particular portion of an IEEE standards document." [from the Authoring System description]

IEEESTD V3.0 DTD ('ieeestd.dtd' - DTD for an IEEE standard): "This DTD is used both to create and edit IEEE standards of all sorts and to hold legacy standards, some dating back 20 years ago. Hence it is a VERY lax structural DTD, with very few semantic elements, except in the front matter (frntmttr) and standard title (stdtitle). . . Public Identifier: -//IEEE//DTD IEEE Software Standards 3.0//EN. Please send comments regarding this DTD to m.v.rodriguez@ieee.org. Version: 3.0.0 1998-10-15."

Addresses:

IEEE Standards Department
Stephen Huffman, Electronic Program Implementation
445 Hoes Lane, P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: 908-562-3828
Fax: 908-562-1571
email: s.huffman@ieee.org
or:
email: m.v.rodriguez@ieee.org


Davenport Group: DocBook DTD

[CR: 20000830]

See now also "DocBook XML DTD."

[January 06, 1999] In January 1999, the DocBook DTD moved to a new Web site at http://www.oasis-open.org/docbook/. DocBook is an SGML DTD that is 'particularly well suited to books and papers about computer hardware and software'. While the official DocBook distribution is an SGML DTD, an XML DTD based upon DocBook version 3.0 has been under development for some time -- principally through the efforts of Norman Walsh. Walsh also maintains the DSSSL Modular DocBook Stylesheets, used for print and online publishing of DocBook documents. "Because DocBook is a large and robust DTD, and because its main structures correspond to the general notion of what constitutes a 'book,' DocBook has been adopted by a large and growing community of authors writing books of all kinds. DocBook is supported 'out of the box' by a number of commercial tools, and there is rapidly expanding support for it in a number of free software environments. These features have combined to make DocBook a generally easy to understand, widely useful, and very popular DTD. Dozens of organizations are using DocBook for millions of pages of documentation, in various print and online formats, worldwide." Originally designed and implemented by HaL Computer Systems and O'Reilly & Associates in 1991, the DocBook DTD is now maintained by the OASIS DocBook Technical Committee. The meetings of the DocBook Technical Committee are open to anyone who wishes to attend, and thus not limited to OASIS members.

[July 01, 1998] In the summer of 1998, plans were made to turn over the maintenance of the DocBook application to OASIS (Organization for the Advancement of Structured Information Standards) under a new OASIS DocBook Technical Committee. See the relevant communique to the Davenport Group mailing list from Eve Maler, and Karl Best's invitation to a wider group of DocBook supporters to attend the first committee meeting in San Diego. And see the press release.

"The purpose of the Davenport Group [is] to promote the interchange and delivery of computer documentation using the Standard Generalized Markup Language (SGML, ISO 8879:1986) and other relevant standards." [adapted from the DAVENPORT GROUP CHARTER AND BY-LAWS]. A recent "version of the DocBook DTD is 2.4.1, released 10 April 1996. This release includes much improved documentation." "The DocBook DTD was developed specifically for computer software documentation, that is, user manuals and programming references. DocBook maintenance is performed under the aegis of the Davenport Group, a discussion forum sponsored by individuals representing large-scale producers and consumers of software documentation. Davenport meetings are held roughly quarterly, and are open to everyone. However, decisions about the DocBook DTD are made only after approval by the Davenport sponsors." [from the README].

Sponsors: As of October 07, 1996, the Davenport sponsors are: Terry Allen, Fujitsu Software Corporation (San Jose, CA); Mark Buckley, Microsoft (Redmond, WA); Ralph Ferris, Fujitsu Software Corporation (San Jose, CA); Lee Fogal, Digital Equipment Corporation (Nashua, NH); Eduardo Gutentag, SunSoft Inc. (Mountain View, CA); Eve Maler, ArborText, Inc. (Boston, MA); Murray Maloney, SoftQuad Inc. (Toronto, CA); Nancy Paisner, Open Environment Corp. (Boston, MA); Bob Stayton, The Santa Cruz Operation, Inc. (Santa Cruz, CA); and Norman Walsh, O'Reilly & Associates, Inc. (Boston, MA).

Current efforts within the Davenport project include:


ICADD: International Committee on Accessible Document Design

[CR: 19970425]

SGML is being used by various entities supporting ICADD because the document structure captured in SGML is vital in the communication processes designed for persons with print disabilities. ICADD Statement of Purpose: "The International Committee for Accessible Document Design (ICADD) is dedicated to making printed materials accessible to persons with print disabilities. ICADD is an international nonpartisan consortium of representatives from industry, education, and the disabled community. We believe that advancing computer based publishing, through adaptive computer technology for persons with disabilities, offers the potential to make printed information accessible simultaneously and at no greater cost than the able bodied community enjoys." [from the 1992 Statement of Purpose document referenced below]

For more information on the InfoUCLA - ICADD project, contact: (1) Jeff Suttor, Programmer/Analyst, Library Information Systems; 11334 University Research Library, 405 Hilgard Avenue, Los Angeles, CA 90024-1575; Email: JSuttor@Library.UCLA.Edu, Tel: 310-825-1206 or 310-206-5565; FAX: 310-206-4109; or, (2) Daniel Hilton-Chalfen, UCLA Disabilities and Computing Program, hilton-chalfen@mic.ucla.edu, 310-206-7133, TDD: 310-206-5155


CAPS (Communication and Access to Information for Persons with Special Needs) and HARMONY (Horizontal Action for the Harmonisation of Accessible Structured Documents)

[CR: 19960226]

CAPS

The CAPS Project started in 1991 with a pilot action which ran until March 1993. The project then received an extension to further develop the ideas and techniques designed in the Pilot Phase. This Extension Phase was concluded in September 1994.

"The project is financed under the TIDE (Technology Initiative for Disabled and Elderly Persons) programme by the Directorate-General XIII for Telecommunications, Information Industries and Innovation of the E.U.

"The project's main objective is to provide broader access to digitally distributed documents, especially newspapers, books and public information. The Consortium believes that more traditional forms of information transfer will lose some of their importance in favour of electronic information. This presents an enormous opportunity to the reading impaired community as much more information will be available to them.

"In the Extension Project, an electronic library prototype has been set up. This electronic library provides a whole variety of SGML documents to its clients who can access them by means of the Reading Station or via a Telephone Access System. Also non-interactive access is guaranteed."

HARMONY

"The HARMONY Horizontal Action is the follow up to the two previous CAPS Projects (TP 136 and TP 218). Within HARMONY the Consortium will try to increase the quantity and quality of documents accessible to the print disabled. To accomplish this goal, the publishing community will be stimulated by means of involvement and lobbying with standardisation being a key issue.

"The use of the ISO Standard SGML (Standard Generalized Markup Language, ISO 8879) will be encouraged. The HARMONY Consortium expects to stimulate an increase in accessible newspapers through the use of the European Interchange Format (EIF) developed in CAPS, and other SGML based techniques. Publishers will be urged to introduce and incorporate these techniques within their organisations."


ELVIS - Elektronisches Literaturverzeichnis - Informatik für Sehgeschädigte

[CR: 19960730]

The ELVIS WWW server hosts a number of services and information resources relevant to SGML/HTML technologies for sight-disabled people (ICADD, CAPS, etc.). Most documents on the server are in German. As of November 1995, an SGML tutorial was also available. The resources apparently have been collected and developed by Thomas Kahlisch (email: kahlisch@inf.tu-dresden.de).

From the ELVIS Home Page [translated from German]: "Welcome to the WWW server of the working group 'Studium für Blinde und Sehbehinderte' (Studies for the Blind and Visually Impaired) at the Faculty of Computer Science of TU Dresden. Our working group is concerned with providing study materials suited to blind and visually impaired students. These materials are offered through ELVIS (Elektronisches Literaturverzeichnis - Informatik für Sehgeschädigte). We intend to offer selected materials via this WWW server. In doing so, we want to help gather experience in designing an on-ramp to the 'Infobahn' that is suited to blind and visually impaired users."

Addresses:
Technische Universität Dresden
Attention: Thomas Kahlisch
Fakultät Informatik
Arbeitsgruppe Studium für Blinde und Sehbehinderte
01062 Dresden
Tel: 0351/4575 467
Fax: 0351/4575 335
E-Mail: elvis@irz.inf.tu-dresden.de


NITF (News Industry Text Format) [Formerly UTF - Universal Text Format] - SGML for the News Distribution Industry

[CR: 19990115]

Update March, 1996: "What was called the UTF is now known as the NITF (News Industry Text Format)" [Steve Pepper]. See the links to www.iptc.org below.

UTF ("Universal Text Format") is the name of a new standard being adopted by the news distribution industry, particularly under the direction of working committees in the IPTC (International Press Telecommunications Council) and NAA (Newspaper Association of America). UTF is part of The Information Interchange Model (IIM) standard. Description from a working document on UTF by Dave Becker: "In June, 1992, a working subcommittee was established to create an industry standard for the interchange of textual material between news agencies and their clients (primarily newspapers) that would replace the current standard IPTC 7901 and ANPA 1312 formats. The new standard is called the Universal Text Format (UTF). After significant discussion, SGML was adopted as the encoding language for the new standard. Members of the working subcommittee are now attempting to finalize and prototype the new standard in selected test environments."

Further information about NITF/UTF/SGML may be found in:

Contact addresses (New Text Subcommittee):

International Press Telecommunications Council
Attention: David Allen
8 Sheet Street
Windsor, Berkshire SL4 1BG
UNITED KINGDOM
TEL: 44-753-833-728
FAX: 44-753-833-750
Email: 100321.2156@CompuServe.COM [David Allen]

Newspaper Association of America
Attention: John W. Iobst [Director, Advanced Computer Science]
1921 Gallows Road
Suite 600
Vienna, VA 22182-3900
Tel: 703/902-1838
FAX: 703/902-1842
Email: iobsj@naa.org
WWW: http://www.naa.org/


Canadian Strategic Software Consortium (CSSC): SGML and SQL

[CR: 19960321]

"The objective of the Canadian Strategic Software Consortium (CSSC) is to perform pre-competitive research in order to create software technology that will permit the extension of database management technology to text-intensive data; produce working prototypes that are based on these new technologies; apply the working technology to several large-scale real-world problems; and to present the research and the technology in forums that are appropriate to the establishment of technical standards."

The consortium is currently [March 1996] composed of eight members: Fulcrum Technologies Inc., Grafnetix Systems Inc., InContext Corporation, Megalith Technologies Inc., OpenText Corporation, Public Sector Systems Ltd., SoftQuad Inc., and the University of Waterloo (Waterloo, Ontario). SGML plays a significant role in the members' current operations and in the consortium's development plans.

"The mandate of the consortium is to undertake pre-competitive research to: (1) create the technology that will permit the extension of database management technology to text-intensive data; (2) produce working prototypes that are based on these new technologies; (3) apply working technology to several large-scale real-world problems; and (4) present the research and the technology in forums that are appropriate to the establishment of technical standards." Several of the research and development efforts work toward the integration of SQL and (SGML) structured text models. A "Hybrid Query Processor" (HQP) being developed at the University of Waterloo "will provide a gateway to a federated database system and will support the construction of "virtual" tables managed (and updated) solely by the HQP. Tuples in these managed tables can contain TEXT and standard types of relational information stored on one, two or many underlying database engines."
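The idea of a "virtual" table whose tuples draw on more than one underlying engine can be sketched in Python. This is not the HQP design, whose details are not given here; the two in-memory SQLite databases, the table and column names, and the `virtual_rows` function are assumptions made purely for illustration.

```python
import sqlite3

def make_engines():
    """Create two independent stores: one relational, one holding text."""
    relational = sqlite3.connect(":memory:")
    relational.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, author TEXT)")
    relational.executemany("INSERT INTO report VALUES (?, ?)",
                           [(1, "Smith"), (2, "Jones")])
    text_store = sqlite3.connect(":memory:")
    text_store.execute("CREATE TABLE body (report_id INTEGER, markup TEXT)")
    text_store.executemany("INSERT INTO body VALUES (?, ?)",
                           [(1, "<p>structured text</p>"), (2, "<p>more text</p>")])
    return relational, text_store

def virtual_rows(relational, text_store, term):
    """Yield (id, author, markup) tuples for texts containing `term`.

    The combined table is never stored; each tuple is assembled on demand
    from the two engines.
    """
    hits = text_store.execute(
        "SELECT report_id, markup FROM body WHERE markup LIKE ? ORDER BY report_id",
        (f"%{term}%",))
    for report_id, markup in hits:
        row = relational.execute(
            "SELECT author FROM report WHERE id = ?", (report_id,)).fetchone()
        if row is not None:
            yield report_id, row[0], markup

if __name__ == "__main__":
    rel, txt = make_engines()
    for row in virtual_rows(rel, txt, "structured"):
        print(row)
```

A substring match stands in for real structured-text search here; a production system would presumably query the markup structure itself rather than scan for a term.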


Bilingual Canadian Dictionary Project

[CR: 19971018]

The Bilingual Canadian Dictionary Project (Lexicographie comparée du français et de l'anglais au Canada) is funded by the Social Sciences and Humanities Research Council of Canada (through 1999), and is expected to publish a completed work in about 2003. SGML tools have been used for designing a dictionary DTD, editing database entries, and querying the lexical database. "This interuniversity, pan-Canadian project, known informally as the Bilingual Canadian Dictionary Project, was launched in early 1988 with the goals of producing a truly Canadian English-French, French-English dictionary and developing an electronic database of English- and French- Canadian texts for comparative work in the fields of translation and lexical research.

"Led by Professor Roda P. Roberts at the University of Ottawa's School of Translation and Interpretation, the project brings together a team of seven researchers specializing in various subfields of linguistics at the University of Ottawa, the University of Montreal and Laval University. The research team also includes a number of graduate students and researchers who are being trained in bilingual lexicography. Microstar Software Ltd. is serving as the project's computer consultant."


Addresses:
Bilingual Canadian Dictionary
University of Ottawa
40 Stewart
Ottawa, Ontario
Canada K1N 6N5
E-mail: langlois@balzac.sti.uottawa.ca
Fax: 1-613-562-5131


Electronic PROTEIN SCIENCE

[CR: 19960410] [Table of Contents]

"Protein Science, a peer reviewed journal published by The Protein Society and Cambridge University Press, has established the Electronic Protein Science at the University of California, Irvine." [see ad]. In 1993, a CD-ROM with the title 1993 Protein Science [PIR Sequence Database, and Special Edition of Brookhaven Protein Data Bank] was produced: all data were marked up in SGML and indexed and accessed by DynaText(TM) on Mac, PC, and Unix platforms.

Now "the entire Protein Science editorial production process at Cambridge University Press has been redesigned to accommodate the electronic edition and to incorporate it into the routine production of the printed edition. Underlying both the printed and electronic edition is a single master document that is prepared in the Standard Generalized Markup Language (SGML) that is sent for production of typesetting code and to the Web site for the production of Hypertext Markup Language (HTML) documents used for the delivery of the electronic edition. Underlying the SGML document is another document called the Document Type Definition (DTD) which describes the information content of the document and makes possible sophisticated indexing. A great deal of innovation has gone into the design of the DTD to make it serve simultaneously the requirement of electronic and print media. We believe it will help set standards for electronic and print publishing in the world of scientific publishing." [from: "Finally! Protein Science On-line," An Editorial from Protein Science Volume 4, Number 9, 4:1665, by Stephen H. White and Hans Neurath.]

"The copy editors and production editors must now produce a document prepared by using the Standard Generalized Markup Language (SGML), which is a machine-readable marking scheme that describes in detail the structure and information content of an article. SGML makes it possible to index the contents of articles for full-text searching and, importantly, to establish hyperlinks within individual documents and between related documents. In addition, SGML and its derivative language HTML can be used for controlling the layout and appearance of articles displayed in both the electronic and printed environments. The typesetters and printers must adapt to SGML-marked documents and the electronic production editor must make provisions for mounting the documents on computer servers in such a way that they can be indexed and delivered rapidly over the World Wide Web." [from: an Editorial from Protein Science Volume 4, Number 1, 4:1-2, by Stephen H. White and Hans Neurath.]



MIME-SGML (Multipurpose Internet Mail Extensions)

[CR: 19980606] [Table of Contents]


SSSH - Simplified SGML for Serial Headers

[CR: 19970719] [Table of Contents]

"A Serial Header contains bibliographic information about an article appearing in a Serial publication, i.e., a journal or other periodical. Serial Headers are created by journal publishers for various purposes including electronic delivery to current awareness and electronic document delivery services. SSSH - Simplified SGML for Serial Headers - was developed last year by Publishing Technology and New Media Group on behalf of Book Industry Communications, the standards body of the UK book and serials publishing industry. SSSH has much in common with its respected antecedent, MAJOUR, but reduces the number of required elements, in accordance with the recommendations of the OASIS group of UK serials publishers, and adds new elements for the article identification schemes (SICI and PII) that have been developed since MAJOUR was published in 1991."



OCLC SGML Projects

[CR: 20000828] [Table of Contents]

OCLC Fred: SGML Grammar Builder Project (DTD and document grammar) tool

"Fred is an ongoing research project at OCLC Online Computer Library Center, Inc. (OCLC) studying the manipulation of tagged text. As a service to the community, OCLC has decided to make several portions of Fred freely available via a WWW server. Fred addresses two main problems associated with managing tagged text as seen at OCLC: (1) tagged document collections with no corresponding DTD, and (2) arbitrary transformation of tagged text." [adapted from Fred main page]

"As an electronic publisher, OCLC receives tagged text from several data sources. Often, this tagged text is not valid SGML since it does not have or conform to a Document Type Definition (DTD). Despite this, OCLC must build data transformations, databases, and interfaces for this tagged text. To address the lack of DTDs, Fred can automatically build DTDs from tagged text. While it is fairly straightforward to extract tags from a tagged document without a DTD, it is non-trivial to produce a reduced representation of this structure. You can use Fred's free automatic DTD creation services to sample this process."
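
The "fairly straightforward" half of that problem, extracting tags and observed nesting from documents that have no DTD, can be sketched as follows. This is not Fred's algorithm, which the source does not describe; it assumes the tagged text happens to be well-formed XML so that Python's standard parser can read it, and it emits deliberately loose content models (any observed child, in any order, any number of times). Producing the tighter, reduced grammar that the quotation calls non-trivial is outside the sketch. The function name and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_element_declarations(documents):
    """Return naive <!ELEMENT ...> declarations observed across the documents."""
    child_names = defaultdict(list)  # element name -> distinct child names, first-seen order
    has_text = defaultdict(bool)     # element name -> True if character data was seen

    for source in documents:
        for elem in ET.fromstring(source).iter():
            child_names[elem.tag]  # register the element even if it has no children
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True
            for child in elem:
                if child.tag not in child_names[elem.tag]:
                    child_names[elem.tag].append(child.tag)
                # Text that follows a child belongs to the parent element.
                if child.tail and child.tail.strip():
                    has_text[elem.tag] = True

    declarations = []
    for name, kids in child_names.items():
        if has_text[name] and kids:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"
        elif has_text[name]:
            model = "(#PCDATA)"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        else:
            model = "EMPTY"
        declarations.append(f"<!ELEMENT {name} {model}>")
    return declarations


if __name__ == "__main__":
    samples = [
        "<article><title>A</title><para>Text <emph>x</emph> more</para></article>",
        "<article><title>B</title><para>Plain</para><note/></article>",
    ]
    for declaration in infer_element_declarations(samples):
        print(declaration)
```

Because every content model is a repeatable choice group, the generated declarations accept all of the sample documents but also many documents that were never observed, which is exactly the over-generalization a reduction step would have to tighten.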

"To address arbitrary transformations, Fred includes a translation language that allows direct mappings based on tag names, attributes, and structure, as well as movement of text within the tree and several other manipulation features. See the Fred Translation Service Home Page to read more about Fred's translation capabilities or to access Fred's free translation services."
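
A direct mapping based on tag names, the simplest of the transformations mentioned above, might look like the following sketch. It does not use Fred's translation language, whose syntax is not given here; it assumes well-formed XML input, and the function name and the mapping are hypothetical.

```python
import xml.etree.ElementTree as ET


def translate_tags(source, tag_map):
    """Rename elements according to tag_map; unmapped tags are left unchanged."""
    root = ET.fromstring(source)
    for elem in root.iter():
        elem.tag = tag_map.get(elem.tag, elem.tag)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tagged = "<article><title>A</title><para>Text</para></article>"
    print(translate_tags(tagged, {"title": "h1", "para": "p"}))
```

Mappings that depend on attributes or structure, or that move text within the tree, would need rules richer than a flat dictionary.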



SGML and Chemistry: The OCLC CORE Project (Chemistry Online Retrieval Experiment) and other Initiatives

[CR: 19960322] [Table of Contents]

"The OCLC CORE Project: Overview. The CORE project is an electronic library prototype that provides networked access to the full text and graphics content of the American Chemical Society journals and associated Chemical Abstracts Services indexing since 1980 (some 250 journal years of data). The database is coded in SGML (Standard Generalized Markup Language) which was translated from the original typography codes, captures the structural richness of the original document and provides flexibility for indexing, searching and display. The prototype provides a full-scale laboratory environment in which to explore issues of database structure, user interface capabilities, and information retrieval questions on a large, real-world scholarly electronic journal database. The complete database, representing more than 600,000 pages of full text and graphics, will be available at Cornell University in late 1994. The major contributors of this electronic library project include: Cornell University (Mann Library); OCLC; Bellcore; American Chemical Society; Chemical Abstracts Services."


Chemical Markup Language (CML)

[CR: 19990724]

"CML is a powerful generic tool for management of molecular and technical information, especially geared to Inter- and Intra-net use. Object-Oriented, based on Java and SGML (XML) it covers a wide range of chemical disciplines. . ."

NB: Some links below are obsolete. See later references in "Chemical Markup Language (CML)."


SGML and Physics: The American Physical Society, American Astronomical Society, and The American Institute of Physics

[CR: 19961222] [Table of Contents]

American Physical Society

Bob Kelly: "The current explosion in the use of information technology for early dissemination of scientific information dovetails with APS's interest in electronic publishing and consumption of scholarly journals and provides both physicists and the Society and other Physics Publishers and Librarians with an opportunity to join forces to support and improve scholarly communication in physics.

"SGML is the centerpiece of the APS strategy to accept manuscripts electronically and to provide storage and delivery choices. With the advent of software, it is becoming feasible to make the SGML files of individual articles available for viewing. It is believed that, if we as publishers can agree on some degree of SGML standardization, the authors and readers will benefit. A common SGML approach will facilitate ease of integration of papers from multiple publishers at the library or reader level. A common SGML approach will facilitate the development of many authoring tools and reading choices." [from "The American Physical Society (APS) and the Standard Generalized Markup Language (SGML)", see below]

"TORPEDO: As part of a cooperative experiment with the American Physical Society (APS), the Library is authorized to disseminate current issues of two APS journals, Physical Review Letters and Physical Review E through TORPEDO. Under the terms of the NRL-APS agreement, NRL employees, NRL on-site contractors and ONR Headquarters staff may search and view the journals and make single copies for personal use. Any other use of APS copyrighted material is not permitted."

"The APS is now in position to make Physical Review Letters available to the NRL Library in SGML format on a regular basis, thereby eliminating the need to scan and OCR the paper copies. Moreover, Physical Review E is now partially available in SGML and will soon be available entirely in SGML as well as all of Physical Revi