Loading Structured Documents
SGML or XML files are loaded through the common predicate load_structure/3. This is a predicate with many options. For simplicity a number of commonly used shorthands are provided: load_sgml_file/2, load_xml_file/2, and load_html_file/2.
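As a minimal sketch of the shorthands, the code below writes a small XML document to a temporary file and parses it with load_xml_file/2. The predicate name demo_load_xml_file/1 and the sample document are assumptions made for illustration; they are not part of the library.

```prolog
:- use_module(library(sgml)).

% demo_load_xml_file(-DOM)
% Hypothetical helper: create a temporary XML file, parse it with the
% load_xml_file/2 shorthand and remove the file afterwards.
demo_load_xml_file(DOM) :-
    tmp_file_stream(text, File, Out),
    format(Out, "<doc><p>Hello</p></doc>", []),
    close(Out),
    setup_call_cleanup(
        true,
        load_xml_file(File, DOM),
        delete_file(File)).
```

Calling demo_load_xml_file(DOM) binds DOM to a list holding a single element/3 term for the doc element. load_sgml_file/2 and load_html_file/2 are called the same way with a file name and an unbound result.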
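load_structure(+Source, -ListOfContent, +Options)
Parse Source and return the resulting structure in ListOfContent. Source is either a term of the form stream(StreamHandle) or a file-name. Options is a list of options controlling the conversion process.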
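The stream(StreamHandle) form of Source can be sketched as follows. The helper parse_xml_text/2 is a hypothetical name; it assumes the text is available as a string and uses open_string/2 to obtain a stream for it.

```prolog
:- use_module(library(sgml)).

% parse_xml_text(+Text, -DOM)
% Hypothetical helper: parse XML held in a string by opening it as a
% stream and passing stream(In) as the Source of load_structure/3.
parse_xml_text(Text, DOM) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).
```

For example, parse_xml_text("<note lang=\"en\">Hi</note>", DOM) yields a one-element list whose element/3 term carries the attribute list [lang=en] and the content ['Hi'].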
A proper XML document contains only a single toplevel element whose name matches the document type. Nevertheless, a list is returned for consistency with the representation of element content. The ListOfContent consists of the following types:
Atom
Atoms are used to represent CDATA. Note this is possible in SWI-Prolog, as there is no length-limit on atoms and atom garbage collection is provided.
element(Name, ListOfAttributes, ListOfContent)
Name is the name of the element. ListOfAttributes is a list of Name=Value pairs for attributes. Attributes of type CDATA are returned literally. Multi-valued attributes (NAMES, etc.) are returned as a list of atoms. Handling attributes of the types NUMBER and NUMBERS depends on the setting of the number(+NumberMode) option through set_sgml_parser/2 or load_structure/3. By default they are returned as atoms, but automatic conversion to Prolog integers is supported. ListOfContent defines the content for the element.
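sdata(Text)
If an entity with declared content SDATA is encountered, this term is returned holding the data in Text.
ndata(Text)
If an entity with declared content NDATA is encountered, this term is returned holding the data in Text.
pi(Text)
If a processing instruction is encountered (<?...?>), Text holds the text of the processing instruction. Please note that the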
<?xml ...?>
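instruction is handled internally.
The Options list controls the conversion process. Currently defined options are below. Other options are passed to sgml_parse/2.
dtd(?DTD)
Reference to a DTD object. If specified, the <!DOCTYPE ...> declaration is ignored and the document is parsed and validated against the provided DTD. If provided as a variable, the created DTD is returned. See section 3.5.
dialect(+Dialect)
Specify the parsing dialect. Supported are sgml (default), html4, html5, html (same as html4), xhtml, xhtml5, xml and xmlns. See the option dialect of set_sgml_parser/2 for details.
shorttag(+Bool)
Define whether SHORTTAG abbreviation is accepted. The default is true for SGML mode and false for the XML modes. Without SHORTTAG, a / is accepted with warning as part of an unquoted attribute-value, though /> still closes the element-tag in XML mode. It may be set to false for parsing HTML documents to allow for unquoted URLs containing /.
space(+SpaceMode)
Sets the space-handling mode for the initial environment. The mode can be overruled in the document using the XML reserved attribute xml:space. See section 3.2.
number(+NumberMode)
Determines how attributes of the type NUMBER and NUMBERS are handled. If token (default) they are passed as an atom. If integer the parser attempts to convert the value to an integer. If successful, the attribute is passed as a Prolog integer. Otherwise it is still passed as an atom. Note that SGML defines a numeric attribute to be a sequence of digits. The - sign is not allowed and 1 is different from 01. For this reason the default is to handle numeric attributes as tokens. If conversion to integer is enabled, negative values are silently accepted.
case_sensitive_attributes(+Boolean)
Treat attribute values as case sensitive. The default is true for XML and false for SGML and HTML dialects.
case_preserving_attributes(+Boolean)
Treat attribute values as case insensitive but do not alter their case. The default is false. Setting this option sets case_sensitive_attributes to the same value. This option was added to support HTML quasi quotations and most likely has little value in other contexts.
system_entities(+Boolean)
Define whether SYSTEM entities are expanded. The default is false.
defaults(+Bool)
Determines how default and fixed values from the DTD are used. By default, defaults are included in the output if they do not appear in the source. If false, only the attributes occurring in the source are emitted.
entity(+Name, +Value)
Defines (overwrites) an entity definition. At the moment, only CDATA entities can be specified with this construct. Multiple entity options are allowed.
max_memory(+Max)
Sets the maximum buffer size in bytes available for input data and CDATA output. If this number is reached a resource error is raised. Using max_memory(0) (the default) means no resource limit will be enforced.
cdata(+Representation)
Specify the representation of cdata elements. Supported are atom (default), and string. The choice is not obvious. Strings are allocated on the Prolog stacks and subject to normal stack garbage collection. They are quicker to create and avoid memory fragmentation. However, identical strings are stored multiple times, while the text is shared if atoms are used. Strings are also useful for security sensitive information as they are invisible to other threads and cannot be enumerated using, e.g., current_atom/1. Finally, using strings allows for resource usage limits using the global stack limit (see set_prolog_stack/2).
attribute_value(+Representation)
Specify the representation of attribute values. Supported are atom (default), and string. See above for the advantages and disadvantages of using strings.
keep_prefix(+Boolean)
If true, xmlns namespaces with prefixes are returned as ns(Prefix, URI) terms. If false (default), the prefix is ignored and the xmlns namespace is returned as just the URI.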
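The options above are combined in a single list. The sketch below is one possible combination, assuming an XML input held in a string: it selects the xml dialect, removes leading and trailing white space with space(remove), and asks for strings rather than atoms for both CDATA and attribute values. The helper name parse_trimmed/2 is hypothetical.

```prolog
:- use_module(library(sgml)).

% parse_trimmed(+Text, -DOM)
% Hypothetical helper: parse XML text with white space removed and with
% CDATA and attribute values represented as strings instead of atoms.
parse_trimmed(Text, DOM) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM,
                       [ dialect(xml),
                         space(remove),
                         cdata(string),
                         attribute_value(string)
                       ]),
        close(In)).
```

With these options, the layout white space between the elements of an indented document should not appear in the content lists, and the text of an element such as <p> Hello </p> is returned as the string "Hello". Whether strings or atoms are the better choice depends on the trade-offs described for the cdata option.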