SGML Open/OASIS Technical Resolution 9601:1996

Steve DeRose, EBT, Co-chair, Fragment Interchange Subcommittee, SGML Open/OASIS
Paul Grosso, ArborText, Co-chair, Fragment Interchange Subcommittee, SGML Open/OASIS

Revision date: 1996 November 7

Copyright © 1996 SGML Open/OASIS

Permission to reproduce parts or all of this information in any form is granted to OASIS members provided that this information by itself is not sold for profit and that OASIS is credited as the author of this information.
Abstract

The SGML standard supports logical documents composed of a possibly complex organization of many entities. It is not uncommon to want to view or edit one or more of those entities, or parts of entities, while having no interest in, need for, or ability to view or edit the entire document. The problem, then, is how to provide the recipient of such a "fragment" with the information about the fragment's context in the original document that is embodied in the parts of the document not available to the recipient.

The goal of this resolution is to define a way to send fragments of an SGML document, regardless of whether the fragments are predetermined entities or not, without having to send everything up to the part in question. The delivered parts can either be viewed or edited immediately or accumulated for later use, assembly, or other processing. This resolution addresses the issues by defining:

- exact constraints on what portions of an SGML document may constitute fragments to be supported by this resolution;
- the set of information needed to allow for successful parsing, as well as for viewing or editing, of a fragment in a useful and important set of cases;
- the notation (i.e., language) in which this information will be described;
- some possible mechanisms for associating this information with a fragment.

Issues involved with the possible "return" of any such fragment to the original sender, and the determination of the possible validity of the "returned" fragment in its original context, are beyond the scope of this Resolution. While implementations of this Resolution may serve as part of a larger system that allows for "fragment reuse," the many important issues about reuse of SGML text are beyond the scope of this Resolution.
Technical Resolution 9601:1996

- Committee draft: 1995 November 21
- Committee draft: 1996 February 29
- Final Draft Technical Resolution: 1996 July 31
- Final Technical Resolution: 1996 November 7
Introduction

The need to make SGML documents available over the Internet is well known. This is easy as long as whole documents are sent, including their DTDs, SGML declarations, all entities, etc. But many SGML documents are too large to be managed by shipping them in their entirety when only a portion may be needed.

Many documents are megabytes in length, even excluding all the graphic, video, and other entities a document may reference. Transferring such a document can take too long for real-time access. Even after a document arrives, it may take too long to parse it and get to the desired part: if the user asked to look at chapter 20, one must parse 19 whole chapters before seeing it. With hypertext documents, one also cannot afford to include every document the first one references, when the user will likely follow only a few of the links.

The obvious solution is not to send everything, but instead to send parts as they become needed.

The goal of this resolution is to define a way senders can send small parts of an SGML document as needed, without also having to send everything up to the part needed. This can be done regardless of whether the parts are entities or not, and the parts can either be viewed immediately or accumulated for later use, assembly, or other processing.

The SGML standard has some constructs that can be used to address these issues in certain situations. External text entities can be used, but they generally do not contain the necessary context information; some tools and implementations, however, may be able to make use of such entities without explicit context information. Furthermore, ISO 8879 defines SUBDOC entities that are self-contained in terms of context (they are complete documents), but each SUBDOC forms its own ID name space and each must have its own DTD. Though some fragment applications can be addressed using the constructs already available in 8879, those constructs were not seen as sufficient for all applications that need to use fragments. This Resolution was developed to provide an interoperable solution for fragment applications when the techniques of 8879 are insufficient.

The challenge is that an isolated element from an SGML document may not contain quite enough information to be parsed correctly. This resolution enables senders to provide the remaining information required, so that systems can interchange any SGML elements they choose, from books or chapters all the way down to paragraphs, tables, footnotes, book titles, and so on, without having to manage each as a separate entity and without risking incorrect parsing due to loss of context.
Scope

This resolution enables interchanging portions of SGML documents while retaining the ability to parse them correctly (that is, as they would be parsed in their originating document context) and, as far as practical, to format, edit, and otherwise process them in useful ways. Specifically:

- A sender can send a fragment that consists of any element, or any sequence of SGML data that constitutes "mixed content" or "element content", drawn from an SGML document. Most commonly this means a sequence of contiguous sibling elements, but processing instructions, comments, whitespace, and certain other SGML constructs are also permitted. Any element that begins within the fragment must end there as well, and any element that ends in the fragment must also start there (this constraint is sometimes called "being synchronous").
- The fragment sent can be parsed correctly at the recipient end to produce precisely the same ESIS (SGML structure and content information) that the sender got when it parsed the fragment in its complete document context.
- All capabilities of Basic SGML documents (except short references) can be used in any fragment so sent, as can variant capacities and quantities and many variant delimiters and name characters.
- Fragments can be sent exactly as they occurred in the original SGML data. Because they need not be changed in any way, it is possible to authenticate or validate that they have been received intact, and it is possible for users to cache them.
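The synchronous constraint can be checked mechanically once a candidate fragment has been parsed. The following Python sketch is illustrative only: the function name `is_synchronous` and the event tuples are hypothetical, and it assumes a parser has already made every start and end tag explicit (omitted tags inferred).

```python
from typing import Iterable, List, Tuple

# Hypothetical event form: ("start", gi), ("end", gi), or ("data", text).
# Assumes omitted tags have already been inferred, so every element has
# an explicit start event and an explicit end event.
Event = Tuple[str, str]


def is_synchronous(events: Iterable[Event]) -> bool:
    """True if every element that starts in the fragment ends in it,
    and every element that ends in it also started in it."""
    open_elements: List[str] = []
    for kind, value in events:
        if kind == "start":
            open_elements.append(value)
        elif kind == "end":
            # An end event with no matching start inside the fragment
            # means the element began before the fragment.
            if not open_elements or open_elements[-1] != value:
                return False
            open_elements.pop()
        # "data" events do not affect nesting.
    # Elements still open began in the fragment but end outside it.
    return not open_elements
```

Two sibling paragraphs, or character data followed by a complete inline element, pass the check; a fragment that starts mid-paragraph and includes that paragraph's end tag does not.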
To accomplish these ends, this resolution defines:

- exact constraints on what portions of an SGML document may constitute fragments to be supported by this resolution;
- the set of information needed to allow for successful parsing, as well as for viewing or editing, of a fragment in a useful and important set of cases;
- the notation (i.e., language) in which this information will be described;
- some mechanisms for associating this information with a fragment.

Conceptually, a sender examines a fragment to be sent and, using the notation defined in this Resolution, constructs a fragment context specification. The object representing the fragment removed from its source document is called the fragment body. The sender sends the fragment context specification and the fragment body to the recipient. The storage object in which the fragment body is transmitted is called the fragment entity. (In some packaging schemes, the fragment context specification may also be embedded in the fragment entity.)

The recipient processes the fragment context specification to determine the proper parser state for the beginning of the fragment, and uses that information to put the SGML parser into the right state to parse the fragment. The fragment body itself can then be parsed normally.

Issues involved with the possible "return" of any such fragment to the original sender, and the determination of the possible validity of the "returned" fragment in its original context, are beyond the scope of this Resolution. While implementations of this Resolution may serve as part of a larger system that allows for "fragment reuse," the many important issues about reuse of SGML text are beyond the scope of this Resolution.
Definition of a fragment

This Resolution defines a fragment to be the SGML representation of SGML data that constitutes either element content (SGML production [26]) or mixed content (SGML production [25]) extracted from a complete SGML-compliant document. The fragment shall be represented using at most the syntax and feature set of a Basic SGML document as defined in 8879, definition 4.22, except that:

- the Core Concrete Syntax rather than the Reference Concrete Syntax shall be used (i.e., there can be no SHORTREFs), and
- certain changes from the concrete syntax of Basic SGML documents to the capacities, quantities, delimiters, and name characters are permitted, and the FORMAL feature can be either YES or NO.
Variant delimiters and name characters may be used to the extent that they do not introduce conflicts with the delimiters required by this resolution. For example, accented or wide characters may be used freely, but the number sign (#), single (') and double (") quotation marks, parentheses (( and )), equal sign (=), and whitespace characters may not be added to the permitted SGML name characters, because they could conflict with the use of those characters by this resolution.
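A sender or recipient could screen a variant concrete syntax for such conflicts before relying on this resolution. This minimal Python sketch assumes the extra name characters have already been collected into one string; the helper name `conflicting_name_chars` is hypothetical.

```python
# The six fragment context specification delimiters, plus the
# whitespace characters recognized by its grammar.
DELIMITERS = set("#()'\"=")
WHITESPACE = set(" \t\f\r\n")


def conflicting_name_chars(added: str) -> set:
    """Return the characters in `added` (hypothetically, the extra name
    characters a variant concrete syntax adds via LCNMSTRT, UCNMSTRT,
    LCNMCHAR, or UCNMCHAR) that conflict with this resolution."""
    return set(added) & (DELIMITERS | WHITESPACE)
```

An empty result means the variant syntax can be used; any returned character rules it out.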
Fragment context specification language

Formal syntax

A fragment context specification uses an extremely simple formal syntax, chosen (a) to prevent delimiter conflicts when placing a fragment context specification inside an SGML file; (b) to ease the task of parsing fragment context specifications, either with standard parser-generator tools or with handwritten programs; and (c) to reflect that a fragment context specification is information about SGML data, not SGML data itself. SGML syntax itself was considered as a possible syntax for the fragment context specification language, but it was rejected as not the best language for our purposes for a number of reasons, including complexities with delimiter conflicts, escaping, minimization, and the difficulty of embedding a string in SGML syntax within an SGML document.

Six delimiter characters are used in fragment context specifications; they are shown as quoted literals in the grammar below. They have the same values regardless of what SGML declaration applies to the fragment itself (and its document context). Therefore variant concrete syntaxes in which those delimiter characters are added to the list of SGML name characters (LCNMSTRT, UCNMSTRT, LCNMCHAR, and UCNMCHAR) may not be used with this specification (variant concrete syntaxes that do not introduce such conflicts can be used freely).

Literals in the grammar shall be recognized without regard to case distinctions.

Whitespace characters, represented in the grammar as "s", include space, tab, form feed, carriage return, and line feed.

Fragment context specifications use syntax that can be processed by a wide variety of commonly available parsing tools. That syntax is defined here by combining the methods of lex and yacc, with these shorthand conventions (see John R. Levine, Tony Mason, and Doug Brown, lex & yacc, O'Reilly & Associates, Inc., 1990):

- * and + are used throughout, not only at the lexical (lex) level. They indicate that the preceding token or sub-rule may be repeated; + indicates that at least one instance is required. Likewise, ? indicates that the preceding token or sub-rule is optional. Square brackets are also used as in lex (an initial "^" negates the list of permitted characters).
- All characters other than the null character and the delimiters and whitespace already discussed are permitted as name characters in fragment context specifications.

The grammar described formally in the following section, and generally in this document, defines a fragment context specification language. Entities composed in this language can be said to be written in the SGML Open Fragment Context Specification Notation, whose Formal Public Identifier is:

    -//SGML Open//NOTATION Fragment Context Specification//EN
BNF Specification

    fragspec    : global* s* context
    global      : "(" s* item s* ")"
    context     : "(" s* "CONTEXT" s+ elemspec+ s* ")"
    item        : "SGMLDECL" s+ dcl_loc
                | "DOCTYPE" s+ dtdcl_loc
                | "SUBSET" s+ external_id
                | "SOURCE" s+ external_id locator?
                | "LEVEL" (s+ attr)*
                | "COMMENT" (s+ value)*
                | "CURRENT" s+ gi (s+ attr)+
                | "LASTOPENED" s+ gi
                | "LASTCLOSED" s+ gi
                | "RESTATE" s+ revalue
                | extension (s+ attr)*
    dcl_loc     : external_id
                | "WITHFRAGMENT"
                | "WITHSOURCE"
    dtdcl_loc   : name s+ external_id
                | "WITHFRAGMENT"
                | "WITHSOURCE"
    external_id : "PUBLIC" s+ value (s+ value)?
                | "SYSTEM" s+ value
    locator     : node (s+ dataloc)?
                | node s* "TO" s+ node
    node        : s+ nameloc (s* treeloc)?
                | s+ treeloc
    nameloc     : "(" s* "ID" s+ name s* ")"
    treeloc     : "(" s* "TREELOC" (s+ number)+ s* ")"
    dataloc     : "(" s* "DATALOC" s+ number (s+ number)? s* ")"
    extension   : "X-"namechar+
    revalue     : "AFTERSTARTTAG"
                | "AFTERDATA"
                | "AFTERRSORRE"
                | "PENDINGAFTERRSORRE"
                | "PENDINGAFTERMARKUP"
    elemspec    : gi (s+ rep)? (s+ elemprop)* s* "(" s* elemspec* s* ")"
                | "#PCDATA"
                | "#FRAGMENT"
    rep         : "#"number
    elemprop    : attr
                | "#NET"
                | "#MAP" s* "=" s* value
    attr        : name s* "=" s* value
    gi          : name
    name        : namechar+
    value       : "\'"[^']*"\'"
                | "\""[^"]*"\""
    number      : [0-9]+
    namechar    : [^#()'"= \t\f\r\n]
    s           : [ \t\f\r\n]
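The lexical level of this grammar is small enough to handle with a single regular expression. The Python sketch below is a hypothetical illustration, not a conforming parser: `tokenize` splits a specification into tokens following the namechar, value, and s rules, and `read` groups parenthesized tokens into nested lists. It does not check the grammar's productions, and it does not reject the null character.

```python
import re
from typing import List, Tuple, Union

# One alternative per token kind. A namechar is anything except the six
# delimiters and the five whitespace characters. "#" followed by name
# characters is kept as one token, so "#FRAGMENT", "#NET", and a
# repetition count such as "#4" each arrive intact.
TOKEN_RE = re.compile(
    r"(?P<ws>[ \t\f\r\n]+)"
    r"|(?P<open>\()"
    r"|(?P<close>\))"
    r"|(?P<eq>=)"
    r"""|(?P<value>'[^']*'|"[^"]*")"""
    r"""|(?P<hash>#[^#()'"= \t\f\r\n]+)"""
    r"""|(?P<name>[^#()'"= \t\f\r\n]+)"""
)

Token = Tuple[str, str]
Tree = List[Union[Token, "Tree"]]


def tokenize(text: str) -> List[Token]:
    """Split a fragment context specification into (kind, text) tokens."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:  # e.g. an unterminated quoted value
            raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def read(tokens: List[Token]) -> Tree:
    """Group tokens into nested lists, one list per parenthesized group."""
    stack: List[Tree] = [[]]
    for kind, text in tokens:
        if kind == "open":
            stack.append([])
        elif kind == "close":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append((kind, text))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]
```

Because literals are case-insensitive, a caller would compare item keywords after upper-casing them, for example `item[0][1].upper() == "CONTEXT"`.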
Examples

This example is intended to represent a typical case, which does not require many of the features needed to support particular advanced SGML features:

    (DOCTYPE book PUBLIC "-//Acme//DTD Book//EN")
    (SUBSET SYSTEM "c:\foo.ent")
    (SOURCE SYSTEM "http://xyz.com/books/draft/b.sgm" (TREELOC 1 2 5 5 1))
    (CONTEXT
      book version="draft" (
        fm()
        bdy (
          chp #4 ()
          chp label="5" (
            ct() sec #3 () sec ( #fragment ) sec #5 ()
          )
          chp ()
        )
        bm()
      )
    )

The example below also includes cases that may be rare in practice:

    (COMMENT "This fragment is subsection (4.4.1) of the book in galley form.")
    (SOURCE SYSTEM "http://xyz.com/books/draft/b.sgm" (ID chap4) (TREELOC 1 5 1))
    (DOCTYPE book PUBLIC "-//Acme//DTD Book//EN")
    (SUBSET SYSTEM "c:\foo.ent")
    (LASTCLOSED CT)
    (LASTOPENED CT)
    (CURRENT FIGR ent="myvalue")
    (CURRENT P security="top")
    (CONTEXT
      book version="draft" (
        fm()
        bdy #net #map="map37" (
          chp #4 ()
          chp label="5" (
            ct() sec #3 () sec ( #fragment ) sec #5 ()
          )
          chp ()
        )
        bm()
      )
    )
Item keywords

All items shall be used with the meanings explained in this section; the order in which they are specified is insignificant. It is an error to specify any item other than CURRENT, COMMENT, SOURCE, or an extension more than once. Should such an error be encountered, the last value specified shall be applied.
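The duplicate-item rule and its recovery can be sketched as follows. This is a hypothetical Python helper, `collect_items`, which assumes the items have already been parsed into (keyword, body) pairs.

```python
from typing import Dict, List, Tuple

# Items that may legitimately appear more than once (extensions, whose
# keywords begin with "X-", are also repeatable).
REPEATABLE = {"CURRENT", "COMMENT", "SOURCE"}


def collect_items(items: List[Tuple[str, str]]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group (keyword, body) pairs by keyword. A repeated non-repeatable
    item is reported as an error and recovered by keeping the last value."""
    grouped: Dict[str, List[str]] = {}
    errors: List[str] = []
    for keyword, body in items:
        key = keyword.upper()  # literals are matched without regard to case
        repeatable = key in REPEATABLE or key.startswith("X-")
        if key in grouped and not repeatable:
            errors.append(f"{key} specified more than once; last value used")
            grouped[key] = [body]
        else:
            grouped.setdefault(key, []).append(body)
    return grouped, errors
```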
For correct processing, certain information must definitely be available to the recipient. Therefore a sender must either send those items, send references to them, or have reason to believe that the recipient already has them or knows how to find them. Such items include the SGML declaration and all markup declarations needed for correct parsing. Few other items are needed except when specific SGML capabilities are actually used: CURRENT items are only needed if #CURRENT attributes occur; attributes and sibling information are only needed for particular recipient processing such as auto-numbering or other formatting; and so on.

SGMLDECL: Reference to applicable SGML declaration

The SGMLDECL item may be included to indicate the SGML declaration applicable to the fragment's document, or to specify that it can be found within the SOURCE document or the fragment. There are several ways of indicating the declaration's location. The recipient shall determine what SGML declaration to use according to the following ordered list:
- If SGMLDECL specifies the token WITHSOURCE, the SGML declaration should be included at the beginning of the storage object indicated by the SOURCE item's external id.
- If SGMLDECL specifies the token WITHFRAGMENT, the SGML declaration should be included at the top of the fragment entity itself (except that it must follow the fragment context specification if one is embedded at the top of the fragment entity).
- If the SGMLDECL item is omitted, the fragment-aware processor shall start to process the doctype declaration (as specified implicitly or explicitly via the DOCTYPE item). If an SGML declaration is found at the top of it, it shall be used.
- If no SGML declaration is found via any of the above methods, then the receiving system shall apply any catalog resolution which it supports (e.g., the SGMLDECL and DTDDECL entries of an SGML Open TR9401 catalog).
- If none of the above steps results in an SGML declaration, the receiving system shall apply its default implied SGML declaration.
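The ordered list above can be read as a short fallback chain. In this Python sketch the function `resolve_sgml_declaration` and the `lookups` mapping are hypothetical stand-ins for however a recipient actually retrieves storage objects. The list does not spell out the case where SGMLDECL gives an external identifier, so the sketch assumes that identifier is tried first, in place of the two tokens.

```python
from typing import Callable, Dict, Optional

# Hypothetical lookups: each returns the text of an SGML declaration
# found at that location, or None if there is none.
Lookup = Callable[[], Optional[str]]


def resolve_sgml_declaration(sgmldecl: Optional[str],
                             lookups: Dict[str, Lookup],
                             default_declaration: str) -> str:
    """Apply the ordered list. `sgmldecl` is the SGMLDECL item's value,
    or None if the item is omitted; any value other than the two tokens
    is treated as an external identifier (an assumption)."""
    if sgmldecl is None:
        where = "doctype"      # top of the doctype declaration's entity
    elif sgmldecl.upper() == "WITHSOURCE":
        where = "source"       # top of the SOURCE storage object
    elif sgmldecl.upper() == "WITHFRAGMENT":
        where = "fragment"     # top of the fragment entity
    else:
        where = "external"     # entity named by the external identifier
    for step in (where, "catalog"):
        declaration = lookups[step]()
        if declaration is not None:
            return declaration
    return default_declaration  # the system's default implied declaration
```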
DOCTYPE: Reference to applicable DTD

The DOCTYPE item specifies the document type name of the document from which the fragment comes (such as "book") and the external identifier for the external subset of its DTD. This is typically obtained directly from the document's DOCTYPE declaration. For example:

    (DOCTYPE book SYSTEM "http://z.org/public/dtds/book.dtd")

Note: "Formal system identifiers" (FSIs) as described in the "SGML General Facilities" annex of the present corrigendum to ISO/IEC 10744:1992 are one appropriate means of expressing system identifiers in this context; they can accommodate identifiers such as URLs.

The token WITHSOURCE as the value of the DOCTYPE item means that the storage object indicated by the SOURCE item's external id shall be inspected for an initial doctype declaration (optionally preceded by an SGML declaration if the SGMLDECL item is omitted) in exactly the form in which it would be specified if the fragment were a complete document; if one is found there, this doctype declaration shall be used to process this fragment.

Similarly, WITHFRAGMENT means that the fragment entity (immediately following any fragment context specification that may be embedded at the top of the fragment entity) shall be inspected for an initial doctype declaration (optionally preceded by an SGML declaration if the SGMLDECL item is omitted), and if one is found there it shall be used to process this fragment.

In the case of both WITHSOURCE and WITHFRAGMENT, the doctype declaration may include an internal declaration subset.

If there is no DOCTYPE item, then (a) if there is a SOURCE item in this fragment context specification, the equivalent of (DOCTYPE WITHSOURCE) is assumed; (b) if there is no SOURCE item in this fragment context specification, the equivalent of (DOCTYPE WITHFRAGMENT) is assumed. If the doctype declaration is still not found, the results are implementation defined.

Note: In the case of WITHFRAGMENT, the presence of a DOCTYPE declaration in the fragment entity could allow a non-fragment-aware SGML parser to mistakenly attempt to parse the fragment entity as a complete document. If a system wishes to protect against any such possibility, it shall not include the DOCTYPE declaration at the top of the fragment entity.

SUBSET: Reference to applicable internal document type declaration subset

The SUBSET item specifies an external identifier for the internal document type declaration subset of the document from which the fragment comes, or for a sender-created portion of it (the [ ] delimiters are not to be included). This is typically obtained directly from the document type declaration subset of the document. If the information needed from the subset is not already in a separate SGML entity, the sender may create such an entity and assign it an external identifier.

SUBSET need not specify the entire document type declaration subset, but must specify enough of it to parse the fragment as it would have been parsed in the original, complete context. For example, it is permissible to omit general ENTITY declarations for entities that are not referenced or mentioned within the fragment, but not permissible to omit ones that are.

If the DOCTYPE declaration is provided at the top of the fragment entity (see WITHFRAGMENT above), then the subset must be provided there as well, and it is an error for a SUBSET item to appear in the fragment context specification; the correct error recovery is to ignore the SUBSET item.
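The DOCTYPE defaulting rule and the SUBSET error recovery combine into a small decision step. This hypothetical Python sketch (`doctype_and_subset` is an illustrative name) simplifies one point: it treats any effective WITHFRAGMENT as "the doctype declaration is provided at the top of the fragment entity", whereas the rule strictly applies only when such a declaration is actually found there.

```python
from typing import List, Optional, Tuple


def doctype_and_subset(doctype: Optional[str], subset: Optional[str],
                       has_source: bool) -> Tuple[str, Optional[str], List[str]]:
    """Apply the DOCTYPE defaulting rule and the SUBSET error recovery.
    Values are the raw item bodies, or None if the item is absent."""
    messages: List[str] = []
    if doctype is None:
        # No DOCTYPE item: look in the source document if one is named,
        # otherwise at the top of the fragment entity.
        doctype = "WITHSOURCE" if has_source else "WITHFRAGMENT"
    if doctype.upper() == "WITHFRAGMENT" and subset is not None:
        # The subset must then accompany the doctype declaration in the
        # fragment entity; the SUBSET item is an error and is ignored.
        messages.append("SUBSET item ignored")
        subset = None
    return doctype, subset, messages
```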
LEVEL: What optional specification information is included

The LEVEL item enables senders to specify what optional information they are in fact including in the fragment context specification. Although optional information cannot change the way the fragment is parsed, it can be useful for other types of processing, such as formatting. The LEVEL item can contain several name=value pairs, from the set defined here. If any such pair is not present, the sender is deemed not to be specifying whether the corresponding information is included. Specifying names or values not in this list is an error, and the erroneous value shall be ignored. Specifying the same name more than once in the same LEVEL item is also an error, and the correct recovery is to accept the last occurrence.

- FSIB: NO | SOME | LEFT | RIGHT | ALL
  This keyword may be used to state whether the CONTEXT item includes no siblings of the fragment, some siblings but not all, all left siblings, all right siblings, or all siblings.
- ASIB: NO | SOME | LEFT | RIGHT | ALL
  This keyword works like FSIB, but identifies what siblings are provided for ancestors of the fragment, rather than for the fragment itself.
- SATTR: NO | SOME | LEFT | RIGHT | ALL
  This keyword may be used to state what attributes are provided for siblings of the fragment: none, some but not all, all on left siblings, all on right siblings, or all on all siblings.
- AATTR: NO | SOME | LEFT | RIGHT | ALL
  This keyword works like SATTR, but identifies what attributes are provided for ancestors of the fragment.
- CONTENT: NODE | SIBLINGS | ELEMENT | MIXED
  This keyword may be used to state whether the fragment consists of a single element, a sequence of contiguous sibling elements, SGML element content, or more general SGML mixed content.
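A recipient might validate LEVEL pairs as in the following Python sketch. The helper `parse_level` is hypothetical; it assumes the values have already been unquoted, and it folds names and values to upper case, which is an assumption since the section lists them only in upper case.

```python
from typing import Dict, List, Tuple

_EXTENT = {"NO", "SOME", "LEFT", "RIGHT", "ALL"}
LEVEL_VALUES: Dict[str, set] = {
    "FSIB": _EXTENT,
    "ASIB": _EXTENT,
    "SATTR": _EXTENT,
    "AATTR": _EXTENT,
    "CONTENT": {"NODE", "SIBLINGS", "ELEMENT", "MIXED"},
}


def parse_level(pairs: List[Tuple[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Validate LEVEL name=value pairs. Unknown names or values are
    errors and are ignored; a repeated name is an error and the last
    occurrence wins. Names left out of the result are "unspecified"."""
    settings: Dict[str, str] = {}
    errors: List[str] = []
    for name, value in pairs:
        key, val = name.upper(), value.upper()  # assumption: case-insensitive
        if val not in LEVEL_VALUES.get(key, set()):
            errors.append(f"ignored {name}={value}")
            continue
        if key in settings:
            errors.append(f"{key} repeated; last occurrence used")
        settings[key] = val
    return settings, errors
```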
SOURCE: The identity of the fragment

The SOURCE item may be used to specify the origin or identity of the fragment, sufficient for the recipient to request it again later, to save a reference to it, or to do other contextual processing such as resolving IDREFs that point to elements outside the fragment. SOURCE is recommended in all fragment context specifications unless the application context makes it inapplicable (such as when no persistent identifier for the data exists or the document source is not accessible).

The external_id shall identify the entire document out of which the fragment was taken. The external_id can be any valid public or system identifier as defined by 8879. The locator shall identify the fragment element(s) within that document, using methods drawn directly from HyTime (ISO/IEC 10744:1992) and DSSSL (ISO/IEC 10179:1996).

If the fragment consists of a single element (including its descendants), the TO clause of the locator shall not appear. If the fragment consists of more than a single element, the TO clause shall appear: the locator before "TO" shall identify the first element or other node in the fragment, and the locator after "TO" shall identify the last element or other node in the fragment.

Note: Child nodes shall be counted as in the default DSSSL grove plan. "Child nodes" here means the items in the node list specified by the "content" property found on nodes of class "element". The node types used for content in the default DSSSL grove plan are: datachar, sdata, element, extdata, subdoc, and pi. Thus, the only nodes that count as children are those representing elements; processing instructions; SDATA, SUBDOC, and external data entity references; and characters in #PCDATA. Things such as comments, marked section boundaries, ignored REs, and ignored markup of any kind do not count.

In each locator, at least one of nameloc or treeloc shall appear:

- The nameloc, if present, shall contain the value of the nearest ID attribute available either on the fragment's initial element or on an ancestor of it. If neither the fragment's initial element nor any ancestor has an SGML ID attribute, the nameloc parameter shall not be specified.
- The treeloc, if present, shall contain a sequence of sibling numbers for walking down the document tree to the fragment, equivalent to the content of a marklist in a HyTime treeloc location address element. If nameloc is also specified, the element it locates shall be treated as the location source where the walk begins; otherwise the document's root element is the location source. For example, to locate the second child of the fourth child of the root of the document specified by the external_id, the treeloc would contain "1 4 2".
- The dataloc shall only be used when the fragment does not consist of SGML "element content" (essentially, when it does not consist of one or more complete elements, but includes #PCDATA chunks at its root level). Except that negative offsets may not be used, the offsets are equivalent to the content of a dimspec in a HyTime dataloc location address element whose quantum is "str" and whose location source is the element(s) specified by the adjacent nameloc and/or treeloc items. At least one of those items must be present whenever dataloc is present. The length parameter for the dataloc is optional because the receiving system can count the length for itself. The starting and ending offsets of a non-element-content fragment must point to locations directly within precisely the same SGML element.
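The treeloc walk can be illustrated with a toy tree. In this Python sketch the `Node` class and `walk_treeloc` function are hypothetical; real child counting follows the default DSSSL grove plan (for example, each #PCDATA character is its own child), which this simplified model does not reproduce. When a nameloc is present, the element it locates would be passed as `source`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy document node standing in for a grove element node."""
    gi: str
    children: List["Node"] = field(default_factory=list)


def walk_treeloc(source: Node, marklist: List[int]) -> Node:
    """Follow a treeloc marklist from its location source. The first
    number (always 1) selects the source itself; each later number
    selects that 1-based child of the current node."""
    if not marklist or marklist[0] != 1:
        raise ValueError("a marklist starts with 1, the location source")
    node = source
    for position in marklist[1:]:
        if not 1 <= position <= len(node.children):
            raise IndexError(f"{node.gi} has no child {position}")
        node = node.children[position - 1]
    return node
```

With a root that has four children, the last of which has two children, the marklist 1 4 2 reaches that last child's second child, matching the "1 4 2" example above.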
COMMENT: User comments

A fragment context specification may include arbitrary comments using this item. The COMMENT item shall not be used for extensions intended to be processed by computer; the extension mechanism shall be used for those instead.

CURRENT: Values for #CURRENT attributes

If the fragment uses no #CURRENT attributes, the CURRENT item is not needed. A CURRENT item must be included for every #CURRENT attribute whose value is not specified on its first occurrence within the SGML fragment (this is required even if a value for the attribute is also specified on some prior element mentioned in the fragment context specification, such as an ancestor). For example, given an attribute list declaration such as:
    <!ATTLIST p
              type    NUMBER  #CURRENT
              secure  (y|n)   #CURRENT>
a fragment consisting of section 2, such as:

    <chap>
    <sec>...<p type=4 secure=Y>Some text...</p></sec>
    <sec n=2><p>Some more text...</p></sec>
    </chap>

contains a P element that must receive attribute values from a prior element outside the fragment. Therefore the fragment context specification for section 2 would include:

    (Current P TYPE="4" SECURE="Y")

If multiple #CURRENT attributes are defined in the same SGML ATTLIST, they may be either combined (as just shown) or listed separately (as shown below), with no change of meaning:

    (Current P TYPE="4")
    (Current P SECURE="Y")

Note: It is never necessary to indicate that a #CURRENT attribute has not yet been set before the fragment, because under SGML rules, if that is true, the first occurrence within the fragment must have an explicit value.

The attribute value may generally be given either exactly as in the original SGML source, or as the result obtained after parsing the value, case-folding it, and/or normalizing white space within it according to SGML rules. However, if the value contains one or more entity references, then the value must be the exact source value, to ensure correct interpretation of the entity references within it.

If a #CURRENT attribute applies to a name group rather than to a single GI (as with the SGML ATTLIST declaration shown below), then each CURRENT item given for that attribute shall specify one of the GIs, not the entire name group. This is enough because the recipient has access to the DTD and can find the applicable ATTLIST and its name group.

    <!ATTLIST (p | bq | fn) secure (y | n) #CURRENT>

A CURRENT item may be included for #CURRENT attributes that do not in fact occur within the fragment; this is not an error. Senders should check and minimize what to transmit, but are permitted to send all the possibly needed values without checking.

It is an error to specify CURRENT more than once for the same attribute; should such an error be encountered, the last value specified shall be used.
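A sender that has tracked which #CURRENT values are in effect before the fragment could emit combined CURRENT items as in this Python sketch. The names `quote` and `current_items` are hypothetical. Per the rules above, a value containing entity references should be passed exactly as it appears in the source, and for a name-group ATTLIST the entry is keyed by any single GI from the group.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def quote(value: str) -> str:
    """Quote a value per the grammar's value rule, which has no escapes."""
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    raise ValueError(f"{value!r} contains both quotation marks")


def current_items(needed: Dict[Tuple[str, str], str]) -> List[str]:
    """Build one combined CURRENT item per GI from {(gi, attribute): value}."""
    by_gi: DefaultDict[str, List[str]] = defaultdict(list)
    for (gi, attribute), value in sorted(needed.items()):
        by_gi[gi].append(f"{attribute}={quote(value)}")
    return [f"(CURRENT {gi} {' '.join(pairs)})" for gi, pairs in by_gi.items()]
```

Emitting one item per attribute instead would be equally valid, since combined and separate forms have the same meaning.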
LASTOPENED and LASTCLOSED: For empty start tags

If the fragment uses SGML empty start tags (<>) in certain ways, the fragment context specification must include the LASTOPENED and/or LASTCLOSED items:

- LASTOPENED must be used to provide the GI of the last element opened prior to the fragment if OMITTAG is YES and the first element in the fragment begins with an empty start tag.
- LASTCLOSED must be used to provide the GI of the last element closed prior to the fragment if (a) OMITTAG is NO, (b) an empty start tag occurs within the fragment, and (c) such a start tag occurs before any element is closed within the fragment.

It is not an error to specify the LASTOPENED and/or LASTCLOSED items even if they are not actually needed. It is never necessary to send both. Implementors may choose to always send both, always send one (choosing which one based solely on OMITTAG), or check the conditions above and send these items only when actually needed.
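The "send only when needed" strategy can be sketched by scanning the fragment's tags in order. The function `empty_start_tag_items` and its event tuples are hypothetical; `last_opened` and `last_closed` are assumed to be known to the sender from parsing the full document.

```python
from typing import List, Tuple

# Hypothetical event form: ("start", gi, used_empty_start_tag) or
# ("end", gi, False), in document order within the fragment.
Event = Tuple[str, str, bool]


def empty_start_tag_items(omittag: bool, events: List[Event],
                          last_opened: str, last_closed: str) -> List[str]:
    """Return the LASTOPENED or LASTCLOSED item the fragment needs, if any."""
    if omittag:
        # Needed only if the fragment's first element begins with "<>".
        for kind, _gi, empty in events:
            if kind == "start":
                return [f"(LASTOPENED {last_opened})"] if empty else []
        return []
    for kind, _gi, empty in events:
        if kind == "end":
            return []  # an element closed first, inside the fragment
        if kind == "start" and empty:
            return [f"(LASTCLOSED {last_closed})"]
    return []
```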
RESTATE: record end
handling state
An SGML parser implementing
clause 7.6.1 of ISO 8879
has five distinct record-boundary
processing states. The
RESTATE item specifies
which of these states
is current at the start
of the fragment. The following
identifies these states
by specifying one situation
in which the parser enters
this state; for each state,
there are also other situations
in which the parser can
enter the state:
|
|
|
|
- AFTERSTARTTAG: immediately
after the start of a
proper sub-element
- AFTERDATA: immediately
after data
- AFTERRSORRE: immediately
after an RS encountered
in state AFTERDATA
- PENDINGAFTERRSORRE:
immediately after an
RE encountered in state
AFTERDATA
- PENDINGAFTERMARKUP:
immediately after a
processing instruction
encountered in state
PENDINGAFTERRSORRE
|
|
|
|
If RESTATE is not sent,
then modifying the fragment
before the beginning of
the first (or only) element
of the fragment, after
the end of the last (or
only) element of the fragment,
or between two elements
at the top level of the
fragment may not in all
cases have unambiguous
results. In some applications
record boundaries in content
may never occur or may
have no significance,
as determined by some
application-specific semantic
rules outside SGML. In
such cases the RESTATE
item may always be omitted.
extension: User enhancements
To add machine-processable information to fragment context specifications, a new item keyword may be created. The name of such a keyword must begin with "X-". A tool conforming to this Resolution must handle all such extensions: it processes those it recognizes and safely ignores those it does not, optionally emitting a warning message for each one it ignores.
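A minimal sketch of the required behavior, assuming the items have already been split into (keyword, body) pairs; `process_items` and its handler table are hypothetical. Treating an unknown keyword without the X- prefix as an error is our assumption, since this section only defines how X- extensions are handled.

```python
import warnings
from typing import Callable, Dict, Iterable, List, Tuple

Item = Tuple[str, str]   # (item keyword, raw item body): assumed pre-parsed form


def process_items(items: Iterable[Item],
                  handlers: Dict[str, Callable[[str], None]],
                  warn: bool = True) -> List[str]:
    """Dispatch items to handlers; return the X- extensions that were ignored."""
    known = {k.upper(): h for k, h in handlers.items()}
    ignored: List[str] = []
    for keyword, body in items:
        key = keyword.upper()
        if key in known:
            known[key](body)                  # recognized, including known X- items
        elif key.startswith("X-"):
            ignored.append(keyword)           # unrecognized extension: skip safely
            if warn:
                warnings.warn(f"ignoring unrecognized extension item {keyword}")
        else:
            # Assumption: an unknown keyword without the X- prefix is an error.
            raise ValueError(f"unknown item keyword {keyword!r}")
    return ignored
```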
CONTEXT and its keywords
The CONTEXT item is required
in all fragment context
specifications and provides
information about the
element context of the
fragment such as the list
of element types open
when it begins. It is
the last item in any fragment
context specification.
The keywords described
in this section appear
when applicable within
individual element specifications,
rather than as freestanding
items. In order to avoid
potential conflict with
attribute names, they
all begin with "#"
(which is the RNI delimiter
in the Reference Concrete
Syntax).
Parentheses in the CONTEXT item express the tree structure of the SGML document from which the fragment came. By definition, ancestors of the fragment do not have a close parenthesis until after #FRAGMENT. Prior siblings, if mentioned at all, have both open and close parentheses before #FRAGMENT, and later siblings have both after it. Thus, any element's attribute list ends at the first following (unquoted) parenthesis.
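Because the parentheses carry the tree structure, a recipient can recover the ancestor chain with a simple stack: every element specification still open when #FRAGMENT is read is an ancestor. The Python sketch below assumes quoted attribute values with no whitespace around "="; `tokenize`, `parse_context`, and `ElemSpec` are illustrative names, not part of this Resolution.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One token per match: a parenthesis, NAME="value" (also #MAP="..."), or a
# bare word (GI, #KEYWORD, #n). Values must be quoted; no spaces around "=".
_TOKEN = re.compile(r"""\s*(?:([()])|([^\s()="']+=(?:"[^"]*"|'[^']*'))|([^\s()="']+))""")


@dataclass
class ElemSpec:
    gi: str
    attrs: Dict[str, str] = field(default_factory=dict)
    keywords: List[str] = field(default_factory=list)    # #NET, #MAP="...", #n
    children: List["ElemSpec"] = field(default_factory=list)


def tokenize(spec: str) -> List[str]:
    spec = spec.strip()
    tokens, pos = [], 0
    while pos < len(spec):
        m = _TOKEN.match(spec, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}: {spec[pos:pos + 20]!r}")
        tokens.append(m.group(m.lastindex))
        pos = m.end()
    return tokens


def parse_context(spec: str) -> Tuple[ElemSpec, List[ElemSpec]]:
    """Return (root, ancestors) for a "(CONTEXT ...)" item."""
    tokens = tokenize(spec)
    if len(tokens) < 3 or tokens[0] != "(" or tokens[1].upper() != "CONTEXT" \
            or tokens[-1] != ")":
        raise ValueError("not a (CONTEXT ...) item")
    root = ElemSpec("#ROOT")
    stack: List[ElemSpec] = [root]     # element specs whose "(" is still open
    last = None                        # spec that attributes/keywords attach to
    ancestors = None
    for tok in tokens[2:-1]:
        upper = tok.upper()
        if tok == "(":
            if last is None:
                raise ValueError("'(' without a preceding element type")
            stack.append(last)
            last = None
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
            last = None
        elif upper in ("#PCDATA", "#FRAGMENT"):
            stack[-1].children.append(ElemSpec(upper))
            last = None                # these take no attributes or parentheses
            if upper == "#FRAGMENT":
                ancestors = stack[1:]  # every spec still open is an ancestor
        elif tok.startswith("#") or "=" in tok:
            if last is None:
                raise ValueError(f"{tok} does not follow an element type")
            if tok.startswith("#"):
                last.keywords.append(tok)
            else:
                name, _, value = tok.partition("=")
                last.attrs[name] = value[1:-1]   # repeated name: last one wins
        else:
            last = ElemSpec(tok)
            stack[-1].children.append(last)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    if ancestors is None:
        raise ValueError("#FRAGMENT is required")
    return root, ancestors
```

Storing attributes in a dictionary also gives the "last assignment takes effect" recovery for duplicated attribute names.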
#PCDATA: Pseudo-elements
In mixed content, portions of character content between elements count as siblings. In a fragment context specification that chooses to list siblings, such portions are specified by the keyword #PCDATA. This keyword may not have a repetition count or attributes.
#FRAGMENT: The fragment element
The token #FRAGMENT must
be included at the point
in the context where the
fragment fits. This keyword
may not have a repetition
count or attributes.
#NET: NET-enabling start tags
The parameter "#NET"
must be specified if and
only if SHORTTAG is YES
and the element for which
it is specified is an
ancestor that was opened
with a NET-enabling start
tag. It is necessary in
this case so that the
recipient can know to
recognize a NET delimiter
in the fragment. For example:

<chap/<sec/<p>Some text.....</p>//

The fragment context
specification for the
P element would then include:

CHAP #NET ( SEC #NET ( #FRAGMENT))

This parameter may also
be specified for siblings
which started with NET-enabling
start tags, but this is
unnecessary.
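For a sender that lists only ancestors, the #NET rule reduces to a per-ancestor test. In this sketch, `Ancestor` and `context_item` are hypothetical, and the caller is assumed to know the SHORTTAG setting and which ancestors were opened with NET-enabling start tags.

```python
from typing import NamedTuple, Sequence


class Ancestor(NamedTuple):
    gi: str
    net_enabling: bool   # opened with a NET-enabling start tag such as <chap/


def context_item(ancestors: Sequence[Ancestor], shorttag: bool) -> str:
    """Build a CONTEXT item listing only the ancestors of the fragment."""
    parts = []
    for anc in ancestors:
        parts.append(anc.gi)
        # #NET is sent exactly when SHORTTAG is YES and the ancestor was
        # opened with a NET-enabling start tag.
        if shorttag and anc.net_enabling:
            parts.append("#NET")
        parts.append("(")
    parts.append("#FRAGMENT")
    closers = ")" * (len(ancestors) + 1)   # one per ancestor plus the item itself
    return "(CONTEXT " + " ".join(parts) + " " + closers
```

Applied to the CHAP/SEC example, this yields `(CONTEXT CHAP #NET ( SEC #NET ( #FRAGMENT )))`.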
#MAP: Short reference maps
The parameter #MAP=mapname
must be specified for
any ancestor element that
has a USEMAP declaration
directly within it which
precedes the fragment
being sent, unless a nearer
ancestor or the fragment
itself overrides that
map (making it inapplicable
to the fragment). It is
never needed in documents
that do not use short
references or that do
not use USEMAP declarations
within the document instance.
For example:

<chap>
<sec n=1>...</sec>
<!USEMAP map37>
<sec n=2>...</sec>
<sec n=3>...</sec>
</chap>

The keyword must specify
the name of the applicable
map, for example #MAP="map37".
If more than one
USEMAP has occurred, the
most recent one must be
specified, since it is
the one in effect at the
start of the fragment.
This parameter is permitted
(but entirely unnecessary)
for specifying short reference
maps that are associated
with all instances of
an element type via a
USEMAP declaration in
the DTD. The recipient's
parser already knows about
those by virtue of the
DTD plus the list of open
element types. #MAP may
also be specified for
other elements described
in the fragment context
specification that contain
USEMAP declarations, but
this is also unnecessary.
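The "most recent USEMAP" rule can be sketched as follows. The event lists, with ("elem", gi) or ("usemap", mapname) entries for the content directly within each ancestor that precedes the branch leading to the fragment, are a hypothetical input format. The sketch considers only USEMAP declarations in the instance; it ignores maps associated through the DTD and any override by the fragment itself.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[str, str]   # ("elem", gi) or ("usemap", mapname)


def last_usemap(events: Sequence[Event]) -> Optional[str]:
    """Name of the most recent USEMAP among events preceding the fragment."""
    current = None
    for kind, value in events:
        if kind == "usemap":
            current = value
    return current


def map_keywords(path: Sequence[Tuple[str, Sequence[Event]]]) -> List[Optional[str]]:
    """path: (gi, events) per ancestor, outermost first.
    Returns the #MAP keyword to send for each ancestor, or None."""
    result: List[Optional[str]] = [None] * len(path)
    for i in range(len(path) - 1, -1, -1):      # nearest ancestor first
        name = last_usemap(path[i][1])
        if name is not None:
            result[i] = f'#MAP="{name}"'
            break   # this map overrides any USEMAP in outer ancestors
    return result
```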
Supplemental information
The preceding information
is sufficient to enable
a recipient to parse the
fragment correctly; however,
some additional information
is commonly useful for
application-specific processing
of various kinds, and
this resolution provides
an optional way to send
it. This resolution does
not specify a method for
senders and recipients
to negotiate whether such
information is sent. This
resolution does, however,
require that all recipient
software be able to receive
all optional information
safely (even if it does
not use it). It also provides,
via the LEVEL item, a
way for senders to inform
recipients of what optional
information they have
actually sent.
Attributes
Processing specifications
often test attributes
to decide what to do,
and may pass ancestors'
attribute values downward
to descendant elements.
For example, setting SECURE=SECRET
on a SECTION element might
cause all elements within
the SECTION to be hidden
even though they do not
themselves specify the
SECURE attribute at all.
This resolution permits
sending attribute lists
for all elements for which
GIs can be sent. Attribute
values appear after the
GI and are separated by
white space. This is similar
to the syntax of SGML
attribute specification
lists. The syntax details
for attribute values on
CONTEXT items are exactly
the same as specified
above for the CURRENT
item. For example:

(CONTEXT
  BOOK TYPE="MONOGRAPH" (
    BDY SECURE="PUBLIC" TOC="TRUE" (
      CHP #NET #MAP="map37" CNUM="1" (
        #FRAGMENT ))))

An element specification may provide none, some, or all of the attributes that the corresponding element instance had. Specifying the same attribute name twice in one element specification is an error; the correct error recovery is for the last assignment to take effect.
Siblings
Many auto-numbering methods
use the sequence number
of an element instance
among its siblings, or
more generally the number
among just those siblings
that fit some special
criterion. For example,
a section may be "3.2"
because it is the second
SEC within its parent
CHP, while that parent
CHP is the third CHP within
the parent BDY. Because
of this common need, this
resolution permits listing
the element types of siblings
of the fragment element(s)
and of each of its (their)
ancestors.
For example, here the
fragment is the fifth
subelement of BDY (such
as chapter 4), which is
the first subelement of
the root element BOOK
(as in a document with
no front
matter):

(CONTEXT
  BOOK( BDY( INTRO() CHP() CHP() CHP() #FRAGMENT )))

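To show how a recipient might use listed siblings, here is a sketch of the "3.2" style of numbering described above. It assumes the sibling lists have already been reduced to plain lists of GIs (for instance by parsing the CONTEXT item); `ordinal` and `hierarchical_number` are hypothetical names. #PCDATA pseudo-elements never match an element type, so they do not affect the count.

```python
from typing import Sequence, Tuple


def ordinal(preceding: Sequence[str], gi: str) -> int:
    """1-based position of an element among preceding siblings of its own type."""
    return 1 + sum(1 for sib in preceding if sib.upper() == gi.upper())


def hierarchical_number(levels: Sequence[Tuple[Sequence[str], str]]) -> str:
    """levels: (preceding sibling GIs, own GI) for each level, outermost first."""
    return ".".join(str(ordinal(prev, gi)) for prev, gi in levels)
```

With the example above, a CHP fragment preceded by INTRO and three CHP siblings gets ordinal 4 ("chapter 4").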
In addition, the attribute
specification lists of
those elements may be
specified exactly as defined
above for attribute lists
of direct-line ancestors.
A fragment context specification
that provides attributes
for ancestors is not required
to send them for siblings
as well. For example:

(CONTEXT
  BOOK TYPE="MONOGRAPH" (
    BDY SECURE="PUBLIC" TOC="TRUE" (
      INTRO() CHP() CHP() CHP() #FRAGMENT )))

A portion of character
data in mixed content
counts as a sibling. Such
portions are specified
by the keyword #PCDATA
as shown here, which
permits no associated
attributes or parentheses:

(CONTEXT
  BOOK(
    BDY(
      INTRO() #PCDATA CHP() #PCDATA CHP() CHP() #FRAGMENT )))

Series of like siblings
A list of preceding siblings
of a fragment element
or an ancestor might contain
a long sequence of repeated
instances of the same
element type. A repetition
factor may be specified
for any sibling GI listed
in the fragment context
specification. This optimization
can provide great bandwidth
benefits if a sender chooses
to include sibling information
at all.
A repetition count shall
be specified by a separate
token following the GI
to which it applies, preceding
any attributes, #NET,
or #MAP. The token shall
consist of "#"
plus an unsigned decimal
integer. It is an error
to specify a repetition
count of zero, and the
correct error recovery
is to ignore that elemspec.
A repetition count of
1 is unnecessary but permitted.
For example, the specifications
shown below are all equivalent:

(CONTEXT BOOK( BDY( CHP() CHP() CHP() CHP( P() P() #FRAGMENT ))))

(CONTEXT BOOK( BDY( CHP #2() CHP() CHP( P() P() #FRAGMENT ))))

(CONTEXT BOOK( BDY( CHP #4( P #2() #FRAGMENT ))))

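The equivalence of these three forms can be checked mechanically by expanding repetition counts. In this sketch (`expand_repetitions` is hypothetical), a repeated element specification whose group contains #FRAGMENT becomes n-1 empty prior siblings followed by the ancestor, and a zero count drops the element specification as the error recovery rule requires. Copying a closed, non-empty group into every repetition is our own interpretation. The tokenizer is deliberately simplified and assumes quoted values contain no whitespace or parentheses.

```python
import re
from typing import List, Optional

_COUNT = re.compile(r"#(\d+)$")


def split_tokens(spec: str) -> List[str]:
    """Simplified tokenizer: parentheses and whitespace-separated words."""
    return re.findall(r"[()]|[^\s()]+", spec)


def _close_of(tokens: List[str], start: int) -> Optional[int]:
    """Index of the ')' matching the '(' at tokens[start]."""
    depth = 0
    for k in range(start, len(tokens)):
        if tokens[k] == "(":
            depth += 1
        elif tokens[k] == ")":
            depth -= 1
            if depth == 0:
                return k
    return None


def expand_repetitions(tokens: List[str]) -> List[str]:
    out: List[str] = []
    i = 0
    while i < len(tokens):
        m = _COUNT.match(tokens[i + 1]) if i + 1 < len(tokens) else None
        if tokens[i] in ("(", ")") or m is None:
            out.append(tokens[i])
            i += 1
            continue
        count = int(m.group(1))
        # The prefix is the GI plus any attributes, #NET or #MAP after the count.
        j = i + 2
        while j < len(tokens) and ("=" in tokens[j] or tokens[j].upper() == "#NET"):
            j += 1
        prefix = [tokens[i]] + tokens[i + 2:j]
        group: List[str] = []
        if j < len(tokens) and tokens[j] == "(":
            close = _close_of(tokens, j)
            if close is None:
                raise ValueError("unbalanced parentheses")
            group = tokens[j:close + 1]
        if any(t.upper() == "#FRAGMENT" for t in group):
            # Not closed before #FRAGMENT: the last repetition is an ancestor,
            # the others are prior siblings whose content is unknown.
            if count == 0:
                raise ValueError(f"zero repetition count on ancestor {tokens[i]}")
            out += (prefix + ["(", ")"]) * (count - 1) + prefix
            i = j                      # continue inside the ancestor's own group
        else:
            # Closed (or absent) group: copy it for each repetition; a count
            # of zero drops the whole element specification.
            out += (prefix + expand_repetitions(group)) * count
            i = j + len(group)
    return out
```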
If an element specification
with a repetition factor
is not closed before #FRAGMENT,
then the last repetition
is an ancestor of the
fragment, and the other
repetitions constitute
prior siblings of that
ancestor.
If an element specification
gives both a repetition
count and attributes,
the specified attributes
must have the same value
for all element instances
so combined (attributes
not specified need not
have uniform values).
For example, a specification
such as this states that
all three chapters, the
last one of which is an
ancestor, have attribute
TYPE=X:

(CONTEXT BOOK( BDY( CHP #3 TYPE="X" ( P( #FRAGMENT )))))

It may be useful in such
cases to collapse runs
of elements that share
both element type and
attribute values, but
not combine potentially
longer runs that share
element type but not attribute
values.
Note: the specification
of an attribute with declared
value ID on an element
specification (elemspec)
with a repetition factor
greater than 1 would necessarily
produce an invalid context
(one in which multiple
elements have the same
ID).
Packaging the fragment and its fragment context specification
This resolution recognizes
that there are various
uses of SGML fragments
and fragment context specifications.
In particular, a fragment
body need not be permanently
associated with a specific
fragment context specification,
nor does this Resolution
limit in any way whether
a fragment body is associated
with zero, one, or more
fragment context specifications.
Furthermore, this Resolution
does not limit how a fragment
body and its associated
fragment context specification(s),
if any, shall be associated.
It is left to the individual
applications, tools, and
users to determine the
most effective way given
the particular circumstances.
The principal goal of
this Resolution is to
define the fragment context
specification language
independent of any packaging
issues.
However, this Resolution
does realize that it will
often be a practical necessity
to "package"
a fragment body and its
associated fragment context
specification; therefore,
the following sections
describe two possible
ways to associate fragment
bodies and fragment context
specifications. Furthermore,
for an implementation
to be compliant with this
Resolution, it must be
able to process fragment
entities packaged as described
in the following section,
though this in no way
constrains users or applications
to using this particular
packaging method.
Embedding the fragment context specification in the fragment entity
When the concrete syntax
of the fragment body uses
the Reference Concrete
Syntax values for the
"processing instruction
open" (PIO) and "processing
instruction close"
(PIC) delimiters, the
entire fragment context
specification can be embedded
at the top of a fragment
entity by making the fragment
context specification
string the content of
one or more special SGML
processing instructions
(PIs) as described below.
The PI used to embed
a fragment context specification
at the top of a fragment
entity must begin with
the string SO
FRAG followed by one or
more whitespace characters
(except for the special
case of the SO
ESCPIC PI described below).
The content (that is,
all system data between
the PI's open and close
delimiters except for
SO FRAG and the immediately
following whitespace)
of the PI is taken as
the fragment context specification.
If desired (for readability or to avoid exceeding certain quantities such as PILEN), the fragment context specification string can be split among multiple consecutive SO FRAG PIs. The contents of all such PIs that occur prior to the fragment body are concatenated in order to produce the fragment context specification. (Note that, since the whitespace immediately following the initial SO FRAG characters is not considered content of the PI when the fragment context specification is reconstituted, care must be taken when splitting it: any whitespace needed at the split point must end the preceding PI rather than begin the next one.) The fragment is deemed to begin at the first construct that is not a comment declaration, an SO FRAG or SO ESCPIC processing instruction, or whitespace.
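A sketch of splitting a long specification across SO FRAG PIs. It assumes the Reference Concrete Syntax delimiters, the reference PILEN value of 240 counted over the PI's content (everything between PIO and PIC), and a specification that contains no PIC delimiter; `split_into_pis` is a hypothetical helper. The split point is moved back whenever the next piece would begin with whitespace, since that whitespace would be discarded.

```python
from typing import List

PIO, PIC = "<?", ">"          # Reference Concrete Syntax delimiters
PREFIX = "SO FRAG "           # required keyword plus one whitespace character


def split_into_pis(spec: str, pilen: int = 240) -> List[str]:
    """Split spec (assumed free of PIC) into SO FRAG PIs of at most pilen content chars."""
    room = pilen - len(PREFIX)
    if room < 2:
        raise ValueError("pilen too small")
    spec = spec.lstrip()   # leading whitespace would be stripped anyway
    pis, start = [], 0
    while start < len(spec):
        end = min(start + room, len(spec))
        # The recipient strips whitespace after "SO FRAG", so the next piece
        # must not begin with whitespace: move the split back if needed.
        while end < len(spec) and spec[end].isspace():
            end -= 1
        if end <= start:
            raise ValueError("cannot split: whitespace run fills the available room")
        pis.append(PIO + PREFIX + spec[start:end] + PIC)
        start = end
    return pis
```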
When fragment context
specifications are placed
in PIs, they must not
contain any instance of
the "processing instruction
close" (PIC) delimiter
(e.g., ">"
in the Reference Concrete
Syntax). Should the need
arise to encode the PIC
delimiter--for example
within a quoted attribute
value specified for some
ancestor or sibling--it
is to be done as follows:
- The SO FRAG PI that contains the character before the PIC delimiter shall be terminated after that character (with a PIC delimiter).
- An SO ESCPIC processing instruction (e.g., <?SO ESCPIC> using the Reference Concrete Syntax PIO and PIC delimiters) shall follow, possibly separated by whitespace. (If there are consecutive occurrences of the PIC delimiter, multiple SO ESCPIC PIs shall be used.)
- Another SO FRAG processing instruction shall follow, possibly separated by whitespace. This continues the fragment context specification starting immediately after the occurrence of the PIC delimiter(s) in the fragment context specification represented by the preceding SO ESCPIC PI(s).
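The escaping procedure above can be sketched as follows, again assuming the Reference Concrete Syntax delimiters (`<?` and `>`); `embed_with_escpic` is a hypothetical helper. As we read the rules, whitespace directly after a PIC cannot be represented, because a continuation PI loses its leading whitespace, so the sketch reports that case instead of silently changing the specification.

```python
from typing import List

PIO, PIC = "<?", ">"   # Reference Concrete Syntax delimiters


def embed_with_escpic(spec: str) -> str:
    """Encode spec as SO FRAG / SO ESCPIC processing instructions."""
    pis: List[str] = []
    for i, piece in enumerate(spec.split(PIC)):
        if i > 0:
            # Each PIC removed by split() becomes one SO ESCPIC PI, so
            # consecutive PICs yield consecutive SO ESCPIC PIs.
            pis.append(PIO + "SO ESCPIC" + PIC)
        if not piece:
            continue
        if i > 0 and piece[0].isspace():
            # Whitespace right after "SO FRAG" is discarded by the recipient.
            raise ValueError("whitespace immediately after a PIC cannot be encoded")
        pis.append(PIO + "SO FRAG " + piece + PIC)
    return "\n".join(pis)
```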
The fragment context
specification shall be
reconstructed by concatenating
all SO FRAG and SO ESCPIC
processing instructions,
but replacing each SO
ESCPIC PI by an instance
of the PIC delimiter.
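A recipient-side sketch of this reconstruction, which also locates the start of the fragment body. It assumes the Reference Concrete Syntax delimiters and simple `<!-- ... -->` comment declarations; `split_fragment_entity` is a hypothetical name.

```python
import re
from typing import Tuple

PIO, PIC = "<?", ">"   # Reference Concrete Syntax delimiters
_SKIP = re.compile(r"\s+|<!--.*?--\s*>", re.DOTALL)   # whitespace, simple comments
_PI = re.compile(re.escape(PIO) + r"(.*?)" + re.escape(PIC), re.DOTALL)


def split_fragment_entity(text: str) -> Tuple[str, str]:
    """Return (fragment context specification, fragment body)."""
    pieces = []
    pos = 0
    while pos < len(text):
        skip = _SKIP.match(text, pos)
        if skip:
            pos = skip.end()
            continue
        pi = _PI.match(text, pos)
        if pi is None:
            break                                  # first other construct
        content = pi.group(1)
        if content.startswith("SO FRAG") and content[7:8].isspace():
            pieces.append(content[7:].lstrip())    # drop "SO FRAG" and whitespace
        elif content == "SO ESCPIC":
            pieces.append(PIC)                     # stands for one PIC delimiter
        else:
            break                                  # some other PI starts the body
        pos = pi.end()
    return "".join(pieces), text[pos:]
```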
Note: Most cases requiring
the PIC to be embedded
in the fragment context
specification will arise
within quoted attribute
values, which means that
quotation marks within
individual SO FRAG PIs
will not balance. This
is not an error.
In the following example
fragment entity, the "bdy"
element's "code"
attribute has the value
">":

<?SO FRAG
(DOCTYPE PUBLIC "-//Acme//DTD Book//EN")
(SOURCE SYSTEM "http://xyz.com/books/draft/b.sgm"
(TREELOC 1 2 4))
(CONTEXT book (bdy code=">
<?SO ESCPIC>
<?SO FRAG " date="1996-09-05"
( #fragment )))
>
<chp><ct>4: Printing</ct>
...

Multipart packaging protocols
Alternatively, the fragment
body and its fragment
context specification
can be packaged using
any protocol that permits
including more than one
storage object in an interchange
package. A few examples
of such protocols are
tar, pkzip, stuffit, SDIF,
and MIME Multipart/Mixed.
In such a method, there
are no constraints on
characters within the
fragment context specification
(such as with the PIC
in the previous section)
unless they are imposed
by the particular method
chosen.
The following example shows a fragment body and its fragment context specification packaged using MIME Multipart/Mixed:

Content-Type: Multipart/Mixed; Boundary=fragment-example

--fragment-example
Content-Type: Application/X-SGML-Open-Frag-Spec
Content-Id: fragment.sof.960209.153601.123

(DOCTYPE book PUBLIC "-//Acme//DTD Book//EN")
(SUBSET SYSTEM "c:\foo.ent")
(SOURCE SYSTEM "http://xyz.com/books/draft/b.sgm"
(treeloc 1 2 4))
(CONTEXT book ( bdy ( #fragment )))
--fragment-example
Content-Type: APPLICATION/SGML
Content-Id: fragment.sgm.960209.153602.345

<chp><ct>4: Printing</ct>
...

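The same packaging can be produced with Python's standard `email` package. This is a sketch: `package` and `unpack` are hypothetical names, the Content-Id headers of the example are omitted, and the MIME type names are taken from the example above.

```python
from email import message_from_string
from email.message import Message
from email.mime.multipart import MIMEMultipart
from typing import Dict


def _part(content_type: str, body: str) -> Message:
    part = Message()
    part["Content-Type"] = content_type
    part.set_payload(body)
    return part


def package(spec: str, body: str, boundary: str = "fragment-example") -> str:
    """Sender side: one multipart/mixed message holding spec and body."""
    outer = MIMEMultipart("mixed", boundary=boundary)
    outer.attach(_part("application/x-sgml-open-frag-spec", spec))
    outer.attach(_part("application/sgml", body))
    return outer.as_string()


def unpack(text: str) -> Dict[str, str]:
    """Recipient side: map each part's content type to its body."""
    msg = message_from_string(text)
    return {p.get_content_type(): p.get_payload() for p in msg.get_payload()}
```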
If sent as a separate
file, the fragment context
specification should be
assigned the name "fragspec"
and the extension
".sof"
(for "SGML Open Fragment").
If an application associates
a fragment context specification
with a fragment body via
an SGML Open Entity Catalog
(TR9401), it shall do
it via an extension whose
keyword is FRAGSPEC
and which takes as
arguments two quoted storage
object identifiers: that
of the fragment context
specification and then
that of the fragment body.
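A sketch of reading FRAGSPEC entries from a catalog. It handles only `--` comments, quoted literals, and bare keywords, which simplifies the TR9401 syntax; `fragspec_entries` is a hypothetical helper. Because a fragment body may be associated with more than one fragment context specification, each body maps to a list.

```python
import re
from typing import Dict, List

# A "--" comment, a double- or single-quoted literal, or a bare word.
_CAT_TOKEN = re.compile(r"""--.*?--|"([^"]*)"|'([^']*)'|(\S+)""", re.DOTALL)


def fragspec_entries(catalog: str) -> Dict[str, List[str]]:
    """Map each fragment body's storage object identifier to the identifiers
    of the fragment context specifications associated with it."""
    # (group number, text); comments match no group and are dropped.
    toks = [(m.lastindex, m.group(m.lastindex))
            for m in _CAT_TOKEN.finditer(catalog) if m.lastindex]
    entries: Dict[str, List[str]] = {}
    for i, (kind, value) in enumerate(toks):
        if kind == 3 and value.upper() == "FRAGSPEC" and i + 2 < len(toks):
            (k1, spec_id), (k2, body_id) = toks[i + 1], toks[i + 2]
            if k1 in (1, 2) and k2 in (1, 2):      # both arguments must be quoted
                entries.setdefault(body_id, []).append(spec_id)
    return entries
```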
If the Document Type
Declaration is placed
in the fragment entity
just prior to the fragment
body (so that the DOCTYPE
item specifies WITHFRAGMENT
instead of an external
identifier), then the
resulting combined storage
object cannot be usefully
referenced as an SGML
text entity from within
another document. If,
on the other hand, the
Document Type Declaration
is separate, it may either
accompany the fragment
body and fragment context
specification for transmission
or may be omitted and
then obtained by the recipient
on demand using the external
identifiers given in the
fragment context specification's
DOCTYPE and SUBSET items.
Additional examples
The following examples
are intended to help further
illustrate how this Technical
Resolution might be applied.

<?SO FRAG
(DOCTYPE WITHFRAGMENT)
(CONTEXT book(front()body(chapter #2 chapter(section #4()#fragment))))
>
<!DOCTYPE book PUBLIC "-//Acme//DTD Acme Book//EN" [
<!-- This is the internal subset -->
<!ENTITY foo "bar">
]>
<section>
<!-- the section contents -->
</section>

By taking advantage of the
defaults for DOCTYPE, the above
"(DOCTYPE WITHFRAGMENT)"
item can be omitted and
the example can be written:

<?SO FRAG (CONTEXT book(front()body(chapter #2 chapter(section #4()#fragment))))>
<!DOCTYPE book PUBLIC "-//Acme//DTD Acme Book//EN" [
<!-- This is the internal subset -->
<!ENTITY foo "bar">
]>
<section>
<!-- the section contents -->
</section>
