DTD-Handling
The DTD (Document Type Definition) is a separate entity in sgml2pl that can be created, freed, defined and inspected. Like the parser itself, it is filled by opening it as a Prolog output stream and sending data to it. This section summarises the predicates for handling the DTD.
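As a minimal sketch of filling a DTD through its output stream, the following creates a DTD object with new_dtd/2, opens it with open_dtd/3 and writes a single element declaration to it. The predicate name make_note_dtd/1 and the note doctype are hypothetical and chosen only for illustration.

```prolog
:- use_module(library(sgml)).

% make_note_dtd(-DTD): hypothetical helper, not part of library(sgml).
% Creates a DTD object and defines it by writing declarations to the
% stream returned by open_dtd/3. Closing the stream completes the DTD.
make_note_dtd(DTD) :-
    new_dtd(note, DTD),
    open_dtd(DTD, [], Out),
    format(Out, "<!ELEMENT note - - (#PCDATA)>~n", []),
    close(Out).
```

The caller owns the resulting object and should release it with free_dtd/1 when it is no longer needed.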
Supported options are the dialect option from open_dtd/3 and the encoding option from open/4. Notably, the dialect option must match the dialect used for subsequent parsing using this DTD.
The default dialect is sgml. Using xml or xmlns processes the DTD case-sensitively.
The DTD file is located on the file search path dtd using the call:
..., absolute_file_name(dtd(Type), [ extensions([dtd]), access(read) ], DtdFile), ...
Note that DTD objects may be modified while processing erroneous
documents. For example, loading an SGML document starting with
<?xml ...?>
switches the DTD to XML mode and
encountering unknown elements adds these elements to the DTD object.
Re-using a DTD object to parse multiple documents should be restricted
to situations where the documents processed are known to be error-free.
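One way to avoid carrying such modifications from one document to the next is to give every document its own DTD object. The sketch below assumes the DTD is available as a file. The predicate name parse_isolated/4 is hypothetical, and both file names are supplied by the caller.

```prolog
:- use_module(library(sgml)).

% parse_isolated(+DocType, +DtdFile, +DocFile, -Content)
% Hypothetical helper: loads a fresh DTD for a single document and
% frees it afterwards, so changes made to the DTD object while parsing
% a faulty document cannot affect later parses.
parse_isolated(DocType, DtdFile, DocFile, Content) :-
    setup_call_cleanup(
        new_dtd(DocType, DTD),
        once(( load_dtd(DTD, DtdFile),
               load_structure(DocFile, Content,
                              [ dtd(DTD), dialect(sgml) ])
             )),
        free_dtd(DTD)).
```

This trades the cost of reloading the DTD for isolation. When the documents are known to be error-free, sharing one DTD object remains the cheaper choice.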
The DTD html is handled separately. The Prolog flag html_dialect specifies the default HTML dialect, which is either html4 or html5 (default). Next, the corresponding DTD is loaded.
Note that HTML5 has no DTD. The loaded DTD is an informal DTD that includes most of the HTML5 extensions (http://www.cs.tut.fi/~jkorpela/html5-dtd.html). In addition, the parser sets the dialect flag of the DTD object. This is used by the parser to accept HTML extensions.
Omit is a term omit(OmitOpen, OmitClose), where both arguments are booleans (true or false) representing whether the open- or close-tag may be omitted. Content is the content model of the element represented as a Prolog term. This term takes one of the following forms:
Like cdata, but entity-references are expanded.
*(SubModel)
?(SubModel)
+(SubModel)
,(SubModel1, SubModel2)
|(SubModel1, SubModel2)
Type is one of cdata, entity, id, idref, name, nmtoken, notation, number or nutoken. For DTD types that allow for a list, the notation list(Type) is used. Finally, the DTD construct (a|b|...) is mapped to the term nameof(ListOfValues).
Default describes the SGML default. It is one of required, current, conref or implied. If a real default is present, it is one of default(Value) or fixed(Value).
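The element and attribute descriptions above can be explored with dtd_property/2. The following sketch builds a small in-memory DTD and prints the omit flags, content model and attribute details of one element. The helpers note_dtd/1 and describe_element/2 and the note doctype are hypothetical names used only for illustration.

```prolog
:- use_module(library(sgml)).

% note_dtd(-DTD): hypothetical helper that defines a tiny DTD with one
% element and one optional CDATA attribute.
note_dtd(DTD) :-
    new_dtd(note, DTD),
    open_dtd(DTD, [], Out),
    format(Out, "<!ELEMENT note - - (#PCDATA)>~n", []),
    format(Out, "<!ATTLIST note lang CDATA #IMPLIED>~n", []),
    close(Out).

% describe_element(+DTD, +Element): hypothetical helper that prints
% the omit flags and content model of Element, followed by the type
% and default of each of its attributes.
describe_element(DTD, Element) :-
    dtd_property(DTD, element(Element, omit(Open, Close), Content)),
    format("~w: omit open=~w, omit close=~w, content=~q~n",
           [Element, Open, Close, Content]),
    dtd_property(DTD, attributes(Element, Names)),
    forall(( member(Name, Names),
             dtd_property(DTD, attribute(Element, Name, Type, Default))
           ),
           format("  ~w: type=~q, default=~q~n", [Name, Type, Default])).
```

A typical session would call note_dtd(DTD), then describe_element(DTD, note), and finally free_dtd(DTD).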
NOTATION declarations.
system(+File) and/or public(+PublicId).
As this parser allows for processing partial documents and for processing the DTD separately, the DOCTYPE declaration plays a special role.
If a document has no DOCTYPE declaration, the parser returns a list holding all elements and CDATA found. If the document has a DOCTYPE declaration, the parser will open the element defined in the DOCTYPE as soon as the first real data is encountered.