SWI-Prolog support for web documents started with the development of a small and fast SGML/XML parser, followed by an RDF parser (early 2000). With the semweb library we provide higher-level support for manipulating Semantic Web documents. The Semantic Web is a likely point of orientation for knowledge representation in the future, making a library designed in its spirit a promising basis.
Central to this library is the module rdf_db.pl, providing storage and basic querying for RDF triples. This triple store is filled using the RDF parser realised by rdf.pl. The storage module can quickly save and load (partial) databases. The modules rdfs.pl and owl.pl add querying in terms of the more powerful RDFS and OWL languages. Module rdf_edit.pl adds editing, undo, journaling and change-forwarding. Finally, a variety of XPCE modules visualise and edit the database. Figure 1 summarises the modular design.

Figure 1: Modules for the Semantic Web library
The central module is called rdf_db. It provides storage and indexed querying of RDF triples. Triples are stored as a quintuple. The first three elements denote the RDF triple; File and Line provide information about the origin of the triple.

{Subject Predicate Object File Line}

The actual storage is provided by the foreign language (C) module rdf_db.c. Using a dedicated C-based implementation we can reduce memory usage and improve indexing capabilities. (The original implementation was in Prolog. That version was implemented in 3 hours, where the C-based implementation cost a full week. The C-based implementation requires about half the memory and provides about twice the performance.) Currently the following indexing is provided.
Indexing of predicates happens on the most abstract predicate along the subPropertyOf relation. This makes calls to rdf_has/4 very efficient.
In queries, Object is either an atom representing a resource or a term literal(Value) if the object is a literal value. If a value of the form NameSpaceID:LocalName is provided it is expanded to a ground atom using expand_goal/2. This implies you can use this construct in compiled code without paying a performance penalty. See also section 3.5. Literal values take one of the following forms:

Atom: The value is the plain text of the literal, without type or language qualifier.

lang(LangID, Atom): Atom is the text of a string literal qualified with the given language (xml:lang) qualifier.

type(TypeID, Value): Used for literals qualified using the rdf:datatype TypeID. The Value is either the textual representation or a natural Prolog representation. See the option convert_typed_literal(:Convertor) of the parser.

The storage layer provides efficient handling of atoms, integers (64-bit) and floats (native C doubles). All other data is represented as a Prolog record.
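A minimal sketch of how these forms appear as the object of rdf/3 queries; the ex alias and the triples it refers to are assumptions for illustration:

?- rdf(ex:doc, rdfs:comment, literal(Text)).
?- rdf(ex:doc, rdfs:label, literal(lang(en, Label))).
?- rdf(ex:doc, ex:pages, literal(type(xsd:integer, Pages))).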
For string querying purposes, Object can be of the form
literal(+Query, -Value)
, where Query is one of
the terms below. Details of literal matching and indexing are described
in section 3.1.1.
Backtracking never returns duplicate triples. Duplicates can be retrieved using rdf/4. The predicate rdf/3 raises a type error if called with improper arguments. If rdf/3 is called with a term literal(_) as Subject or Predicate it fails silently. This allows graph-matching goals like rdf(S,P,O), rdf(O,P2,O2) to proceed without errors. (Discussion in the SPARQL community favours allowing literal values as subject. Although we have no objections in principle, we fear such an extension will promote poor modelling practice.)
In rdf/4, the source of a triple is reported as a term Atom:Integer, where Atom is intended to be used as a filename or URL and Integer represents the line number. Unlike rdf/3, this predicate does not remove duplicates from the result set.

The predicate rdf_has(?Subject, ?Predicate, ?Object, -TriplePred) exploits the subPropertyOf relation. It returns any triple whose stored predicate equals Predicate or can reach it by following the recursive subPropertyOf relation. The actual stored predicate is returned in TriplePred. The example below gets all subclasses of an RDFS (or OWL) class, even if the relation used is not rdfs:subClassOf, but a user-defined sub-property thereof. (This predicate realises semantics defined in RDF-Schema rather than RDF. It is part of the library(rdf_db) module because the indexing of this module incorporates the rdfs:subPropertyOf predicate.)
subclasses(Class, SubClasses) :-
	findall(S, rdf_has(S, rdfs:subClassOf, Class), SubClasses).
Note that rdf_has/4 and rdf_has/3 can return duplicate answers if they use a different TriplePred.
The predicate rdf_has/3 is equivalent to rdf_has(Subject, Predicate, Object, _).

The predicate rdf_reachable/3 tests or generates pairs connected through the transitive closure of a predicate. For example, the query below enumerates all resources that can reach rdfs:Resource over rdfs:subClassOf:

?- rdf_reachable(X, rdfs:subClassOf, rdfs:'Resource').
X = 'http://www.w3.org/2000/01/rdf-schema#Resource' ;
X = 'http://www.w3.org/2000/01/rdf-schema#Class' ;
X = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Property' ;
...

Finally, rdf_subject/1 tests or generates the subjects of the database, behaving like a duplicate-free version of rdf(Subject, _, _).
Starting with version 2.5.0 of this library, literal values are ordered and indexed using a balanced binary tree (AVL tree). The aim of this index is threefold.
As string literal matching is most frequently used for searching purposes, the match is executed case-insensitively and after removal of diacritics. Case matching and diacritics removal are based on Unicode character properties and independent of the current locale. Case conversion is based on the `simple uppercase mapping' defined by Unicode and diacritic removal on the `decomposition type'. The approach is lightweight, but somewhat simplistic for some languages. The tables are generated for Unicode characters up to 0x7fff. For more information, please check the source code of the mapping-table generator unicode_map.pl available in the sources of this package.
Currently the total order of literals is first based on the type of literal, using the ordering

numeric < string < term

Numeric values (integer and float) are ordered by value; integers precede floats if they represent the same value. Strings are sorted alphabetically after case-mapping and diacritic removal as described above. If they then compare equal, uppercase precedes lowercase and diacritics are ordered on their Unicode value. If literals still compare equal, literals without any qualifier precede literals with a type qualifier, which precede literals with a language qualifier. Equal qualifiers (both type or both language) are sorted alphabetically. (The ordering defined above may change in future versions to deal with new queries for literals.)
The ordered tree is used for indexed execution of literal(prefix(Prefix), Literal) as well as literal(like(Like), Literal) if Like does not start with a `*'. Note that results of queries that use the tree index are returned in alphabetical order.
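A sketch of queries that use the tree index; the rdfs:label triples they would match are assumed to be present:

?- rdf(Subject, rdfs:label, literal(prefix(amst), Label)).
?- rdf(Subject, rdfs:label, literal(like('amst*dam'), Label)).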
The predicates below form an experimental interface providing more reasoning inside the kernel of the rdf_db engine. Note that symmetric, inverse_of and transitive are not yet supported by the rest of the engine.
rdf_current_predicate(Predicate) :-
	findall(P, rdf(_,P,_), Ps),
	sort(Ps, S),
	member(Predicate, S).
Predicate properties are queried with rdf_predicate_property/2 and modified with rdf_set_predicate/2. Defined properties are symmetric, inverse_of and transitive. Adding an A inverse_of B also adds B inverse_of A. An inverse relation is deleted using inverse_of([]).
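A small sketch of setting such properties; the exact property-term shapes (symmetric(Bool), inverse_of(Predicate)) and the ex alias are assumptions for illustration:

?- rdf_set_predicate(ex:connectedTo, symmetric(true)).
?- rdf_set_predicate(ex:hasPart, inverse_of(ex:partOf)).
?- rdf_set_predicate(ex:hasPart, inverse_of([])).    % remove the inverse again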
As depicted in figure 1, there are two levels of modification. The rdf_db module simply modifies, while the rdf_edit library provides transactions and undo on top of this. Applications that wish to use the rdf_edit layer must never use the predicates from this section directly.
New triples are asserted with rdf_assert/3, which stores them in the database user. Subject and Predicate are resources. Object is either a resource or a term literal(Value). See rdf/3 for an explanation of Value for typed and language-qualified literals. All arguments are subject to name-space expansion (see section 3.5).
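A minimal sketch of asserting and retracting triples; the ex alias and resources are assumptions for illustration:

?- rdf_assert(ex:john, rdf:type, ex:'Person').
?- rdf_assert(ex:john, rdfs:label, literal(lang(en, 'John'))).
?- rdf_retractall(ex:john, _, _).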
The predicates from section 3.3.1 perform immediate and atomic modifications to the database. There are two cases where this is not desirable:
(   rdf(X, length, literal(L)),
    atom_number(L, IL),
    IL > 2,
    rdf_assert(X, size, large),
    fail
;   true
).
Running this code without precautions causes an error because rdf_assert/3 tries to get a write lock on the database while a read operation (rdf/3 has choicepoints) is still in progress.

Where the second case is probably obvious, the first case is less so. The storage layer may require reindexing after adding or deleting triples. Such reindexing operations, however, are not possible while there are active read operations in other threads or from choicepoints that may exist in the same thread. For this reason we added rdf_transaction/2. Note that, like the predicates from section 3.3.1, rdf_transaction/2 raises a permission-error exception if the calling thread has active choicepoints on the database. The problem is illustrated below. The rdf/3 call leaves a choicepoint and, as the read lock originates from the calling thread itself, the system would deadlock if it did not generate an exception.
1 ?- rdf_assert(a,b,c).
Yes
2 ?- rdf_assert(a,b,d).
Yes
3 ?- rdf(a,b,X), rdf_transaction(rdf_assert(a,b,e)).
ERROR: No permission to write rdf_db `default' (Operation would deadlock)
^  Exception: (8) rdf_db:rdf_transaction(rdf_assert(a, b, e)) ? no debug
4 ?-
rdf_transaction(Goal) is the same as rdf_transaction(Goal, user). On entry, rdf_transaction/1 gains exclusive write access to the database, but does allow readers to come in from all threads. After the successful completion of Goal, rdf_transaction/1 gains completely exclusive access while performing the database updates.
Transactions may be nested. Committing a nested transaction merges its change records into the outer transaction, while discarding a nested transaction simply destroys the change records belonging to the nested transaction.

The Id argument may be used to identify the transaction. It is passed to the begin/end events posted to hooks registered with rdf_monitor/2. An Id of the form log(Term) can be used to enrich the journal files with additional history context. See section 4.6.1.
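A sketch of grouping updates in a single transaction tagged with a log term; the ex alias and resources are assumptions for illustration:

?- rdf_transaction(( rdf_assert(ex:doc1, rdf:type, ex:'Report'),
                     rdf_assert(ex:doc1, rdfs:label, literal('Annual report'))
                   ),
                   log(by(jan))).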
The rdf_db module can read and write RDF/XML for import and export, as well as a binary format built for quick load and save that is described in section 3.4.3. Here are the predicates for portable RDF load and save.
rdf_load/2 processes, among others, the following options:

cache(Bool): If true (default), try to use cached data or create a cache file. Otherwise load the source.

format(Format): Specify the source format explicitly. Defined formats include xml (RDF/XML) and triples (internal quick load and cache format).

if(Condition): changed (default) loads the source if it was not loaded before or has changed; true (re-)loads the source unconditionally and not_loaded loads the source only if it was not loaded, but does not check for modifications.

silent(Bool): If true, the message reporting completion is printed using level silent. Otherwise the level is informational. See also print_message/2.

register_namespaces(Bool): If true (default false), register xmlns:ns=url namespace declarations as rdf_db:ns(ns, url) namespaces if there is no conflict.
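A sketch of a typical load call using these options; the file name is hypothetical:

?- rdf_load('ontology.rdf',
            [ if(changed),
              register_namespaces(true)
            ]).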
rdf_save/1 is equivalent to rdf_save(File, []). Options processed by rdf_save/2 include:

anon(Bool): If anon(false) is provided, anonymous resources are only saved if the resource appears in the object field of another triple that is saved.

base_uri(BaseURI): Write xml:base="BaseURI" in the header and emit all URIs relative to this base URI. The xml:base declaration can be suppressed using the option write_xml_base(false).

write_xml_base(Bool): If false (default true), do not emit the xml:base declaration for the given base_uri option. The idea behind this option is to be able to create documents with URIs relative to the document itself:

..., rdf_save(File,
              [ base_uri(BaseURI),
                write_xml_base(false)
              ]), ...

convert_typed_literal(:Convertor): Proceed as in the convert_typed_literal option of the RDF parser. The Convertor is called with the same arguments as in the RDF parser, but now with the last argument instantiated and the first two unbound. A proper convertor that can be used for both loading and saving must be a logical predicate.

encoding(Encoding): Define the character encoding. Defined values are utf8 (default), iso_latin_1 and ascii. Using iso_latin_1 or ascii, characters not covered by the encoding are emitted as XML character entities (&#...;).

document_language(Lang): Emit an xml:lang attribute in the outermost rdf:RDF element. This language acts as a default, which implies that the xml:lang tag is only used for literals with a different language identifier. Please note that this option will cause all literals without a language tag to be interpreted using Lang.
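A sketch of a save call combining some of these options; the file name and base URI are hypothetical:

?- rdf_save('out.rdf',
            [ base_uri('http://example.org/data/'),
              encoding(utf8),
              anon(false)
            ]).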
The library library(semweb/rdf_cache) defines the caching strategy for triple sources. When using large RDF sources, caching triples greatly speeds up loading RDF documents. The cache library implements two caching strategies that are controlled by rdf_set_cache_options/1.
Local caching: This approach applies to files only. Triples are cached in a sub-directory of the directory holding the source. This directory is called .cache (_cache on Windows). If the cache option create_local_directory is true, a cache directory is created if possible.

Global caching: This approach applies to all sources, except for unnamed streams. Triples are cached in a directory defined by the cache option global_directory.
When loading an RDF file, the system scans the configured cache files unless cache(false) is specified as an option to rdf_load/2 or caching is disabled. If caching is enabled but no cache exists, the system will try to create a cache file. First it will try to do this locally. On failure it will try the configured global cache.
rdf_set_cache_options/1 understands the following options:

enabled(Boolean): If true (default), caching is enabled.

local_directory(Name): Name of the local cache directory. Default is .cache (Windows: _cache).

create_local_directory(Bool): If true (default false), create a local cache directory if none exists and the directory can be created.

global_directory(Dir): Directory used for the global cache. The directory is only created if the option create_global_directory is also given and set to true. Sub-directories are created to speed up indexing on filesystems that perform poorly on directories with large numbers of files. Initially not defined.

create_global_directory(Bool): If true (default false), create a global cache directory if none exists.
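A sketch of enabling caching with local cache directories; whether rdf_set_cache_options/1 accepts a list of options as shown, rather than one option per call, is an assumption:

?- rdf_set_cache_options([ enabled(true),
                           create_local_directory(true)
                         ]).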
Sometimes it is necessary to make more arbitrary selections of
material to be saved or exchange RDF descriptions over an open network
link. The predicates in this section provide for this. Character
encoding issues are derived from the encoding of the Stream,
providing support for
utf8
, iso_latin_1
and ascii
.
The emitted header consists of the DOCTYPE with ENTITY declarations and the opening rdf:RDF element with appropriate namespace declarations. It uses the primitives from section 3.5 to generate the required namespaces and desired short names. Among the options is a list of namespaces to declare explicitly; rdf and rdfs are added to the provided List. If a namespace is not declared, the resource is emitted in non-abbreviated form.
Loading and saving RDF/XML is relatively slow. For this reason we designed a binary format that is more compact, avoids the complications of the RDF parser and avoids repetitive lookup of (URL) identifiers. Especially the speed improvement of about 25 times is worthwhile when loading large databases. These predicates are used for caching by rdf_load/[1,2] under certain conditions.
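A sketch of dumping the triple store in this binary format and reloading it later; the file name is hypothetical:

?- rdf_save_db('snapshot.trp').
?- rdf_load_db('snapshot.trp').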
Note that all information added using rdf_assert/3 is stored in the database user.

The rdf_db library provides for MD5 digests. An MD5 digest is a 128-bit hash key computed from the triples based on the RFC-1321 standard. MD5 keys are computed for each individual triple and added together to compute the final key, resulting in a key that describes the triple-set but is independent of the order in which the triples appear. It is claimed to be practically impossible for two different datasets to generate the same MD5 key. The Triple20 editor uses the MD5 key for detecting whether the triples associated to a file have changed, as well as to maintain a directory with snapshots of versioned ontology files.
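A sketch of retrieving the digest for a loaded source with rdf_md5/2; the graph name is hypothetical:

?- rdf_md5('http://example.org/ontology.rdf', MD5).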
The related predicate rdf_atom_md5/3 bears little relation to RDF handling. It is provided because the RDF library already contains the MD5 algorithm and Semantic Web services may involve security and consistency checking. It offers a platform-independent alternative to the library(crypt) library provided with the clib package.
Prolog code often contains references to constant resources in a known XML namespace. For example, http://www.w3.org/2000/01/rdf-schema#Class refers to the most general notion of a class. Readability and maintainability concerns call for abstraction here. The dynamic and multifile predicate rdf_db:ns/2 maintains a mapping between short meaningful names and namespace locations, very much like the XML xmlns construct. The initial mapping contains the namespaces required for the Semantic Web languages themselves:
ns(rdf,  'http://www.w3.org/1999/02/22-rdf-syntax-ns#').
ns(rdfs, 'http://www.w3.org/2000/01/rdf-schema#').
ns(owl,  'http://www.w3.org/2002/7/owl#').
ns(xsd,  'http://www.w3.org/2000/10/XMLSchema#').
ns(dc,   'http://purl.org/dc/elements/1.1/').
ns(eor,  'http://dublincore.org/2000/03/13/eor#').
All predicates of the semweb libraries use goal_expansion/2 rules to make the SWI-Prolog compiler rewrite terms of the form Id:Local into the fully qualified URL. In addition, the following predicates are supplied:
rdf_equal(?Resource1, ?Resource2) simply unifies Resource1 = Resource2. As this predicate is subject to goal-expansion it can be used to convert between global URL values and their readable short forms, or to test them. The following goal unifies X with http://www.w3.org/2000/01/rdf-schema#Class without more runtime overhead than normal Prolog unification.

rdf_equal(rdfs:'Class', X)
rdf_register_ns(Alias, URL) is equivalent to rdf_register_ns(Alias, URL, []) and registers Alias as a short name for the namespace URL. If the alias is already registered with a different URL, the option force(true) causes the alias to be silently modified. Rebinding an alias must be done before any code is compiled that relies on the alias. If the option keep(true) is provided the new registration is silently ignored.

rdf_global_object/2 performs the same expansion on the object of a triple, also expanding the type in literal(type(Type, Value)). This predicate is used for goal expansion of the object fields in rdf/3 and similar goals.

rdf_split_url/3 splits a URL into its namespace and local-name parts at the last # or / character.
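A sketch of registering a project-specific namespace; the alias and URL are hypothetical:

:- rdf_register_ns(ex, 'http://example.org/schema#').

%  After this, ex:'Report' in rdf/3 calls (and other goal-expanded
%  predicates) expands to 'http://example.org/schema#Report'.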
If we implement a new predicate based on one of the predicates of the semweb libraries that expand namespaces, namespace expansion is not automatically available to it. Consider the following code computing the number of distinct objects for a certain property on a certain subject.
cardinality(S, P, C) :-
	(   setof(O, rdf_has(S, P, O), Os)
	->  length(Os, C)
	;   C = 0
	).
Now assume we want to write labels/2 that returns the number of distinct labels of a resource:
labels(S, C) :- cardinality(S, rdfs:label, C).
This code will not work as rdfs:label is not expanded at compile time. To make it work, we need to add an rdf_meta/1 declaration.
:- rdf_meta cardinality(r,r,-).
As it is subject to term_expansion/2, the rdf_meta/1 declaration can only be used as a directive. The directive must be processed before the definition of the predicates as well as before compiling code that uses the rdf meta-predicates. The atom rdf_meta is declared as an operator exported from library rdf_db.pl. Files using rdf_meta/1 must explicitly load rdf_db.pl.
Below are some examples from
rdf_db.pl
:- rdf_meta
	rdf(r,r,o),
	rdf_source_location(r,-),
	rdf_transaction(:).
Considering performance and modularity, we are working on a replacement of the rdf_edit (see section 7) layered design to deal with updates, journalling, transactions, etc. Where the rdf_edit approach creates a single layer on top of rdf_db, forcing code that uses the RDF database to choose between rdf_db.pl and rdf_edit.pl, the new approach allows for registering monitors. This allows multiple modules to provide additional services, and these services will be used regardless of how the database is modified.
Monitors are used by the persistency library (section 4.6) and the literal indexing library (section 4.4).
Monitors receive event terms describing each change. The new_literal and old_literal events carry the argument literal(Arg) of the triple's object; these events were introduced in version 2.5.0 of this library.

Transaction events carry a first argument that is either begin(Nesting) or end(Nesting). Nesting expresses the nesting level of transactions, starting at `0' for a toplevel transaction. Id is the second argument of rdf_transaction/2. A number of transaction Ids are pre-defined by the library, for example for parsing a source given as file(Path) or stream(Stream) and for operations on a file(Path).
Mask is a list of events this monitor is interested in. The default (empty list) is to report all events. Otherwise each element is of the form +Event or -Event to include or exclude monitoring for certain events. The event names are the functor names of the events described above. The special name all refers to all events and assert(load) to assert events originating from rdf_load_db/1. As loading triples using rdf_load_db/1 is very fast, monitoring this at the triple level may seriously harm performance.
This predicate is intended to maintain derived data, such as a journal, information for undo, additional indexing in literals, etc. There is no way to remove registered monitors. If this is required one should register a monitor that maintains a dynamic list of subscribers, like the XPCE broadcast library. A second subscription of the same hook predicate only re-assigns the mask.
The monitor hooks are called in the order of registration and in the same thread that issued the database manipulation. To process all changes in one thread they should be sent to a thread message queue. For all updating events, the monitor is called while the calling thread holds a write lock on the RDF store. This implies that these events are processed strictly synchronously, even if modifications originate from multiple threads. In particular, the transaction sequence begin, ... updates ..., end is never interleaved with other events. The same holds for load and parse.
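A sketch of a monitor that logs assertions; the exact shape of the assert event term (assert(S, P, O, DB)) is an assumption, as the event list itself is not reproduced above:

:- rdf_monitor(log_assert, [ +assert ]).

log_assert(assert(S, P, O, DB)) :- !,
	format('Asserted ~p ~p ~p in ~p~n', [S, P, O, DB]).
log_assert(_).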
This section describes the remaining predicates of the rdf_db
module.
Statistics on the rdf_db module are obtained with rdf_statistics/1. Among the defined values for Statistics are counts of indexed lookups using patterns rdf(S,P,O), where S, P and O are either + or -. For example, rdf(+,+,-) returns the number of lookups with subject and predicate specified and the object unbound.

This RDF low-level module has been created after two years of experimenting with a plain Prolog-based module and a brief evaluation of a second-generation pure Prolog implementation. The aim was to be able to handle up to about 5 million triples on standard (notebook) hardware and to deal efficiently with subPropertyOf, which was identified as a crucial feature of RDFS to realise fusion of different data-sets.
The following issues have been identified and are not yet solved in a suitable manner.

The library does not handle subPropertyOf of subPropertyOf.

As with subPropertyOf, it is likely to be profitable to handle resource identity (equivalence) efficiently. The current system has no support for it.
The library(rdf_db) module provides several hooks for extending its functionality. Database updates can be monitored and acted upon through the features described in section 3.6. The predicate rdf_load/2 can be hooked to deal with different formats such as Turtle, different input sources (e.g. HTTP) and different strategies for caching results.
The hooks below are used to add new RDF file formats and sources from which to load data to the library. They are used by the modules described below and distributed with the package. Please examine the source-code if you want to add new formats or locations.
rdf_turtle.pl
rdf_zlib_plugin.pl
rdf_http_plugin.pl
The first hook is called with an input term of the form file(+Name), stream(+Stream) or url(Protocol, URL). If this hook succeeds, the RDF will be read from Stream using rdf_load_stream/3. Otherwise the default open functionality for file and stream is used.

A second hook relates file-name extensions to formats, e.g. mapping the extension owl to the format xml. Format is either a built-in format (xml or triples) or a format understood by the rdf_load_stream/3 hook.

The zlib plugin uses the library(zlib) library to load compressed files on the fly. The extension of the file must be .gz. The file format is deduced from the extension after stripping the .gz extension, e.g. rdf_load('file.rdf.gz').
This module allows for rdf_load('http://...')
.
It exploits the library library(http/http_open.pl)
. The
format of the URL is determined from the mime-type returned by the
server if this is one of
text/rdf+xml
, application/x-turtle
or
application/turtle
. As RDF mime-types are not yet widely
supported, the plugin uses the extension of the URL if the claimed
mime-type is not one of the above. In addition, it recognises
text/html
and application/xhtml+xml
, scanning
the XML content for embedded RDF.
The library library(semweb/rdf_litindex.pl)
exploits the
primitives of section 4.5 and the NLP
package to provide indexing on words inside literal constants. It also
allows for fuzzy matching using stemming and `sounds-like' based on the double
metaphone algorithm of the NLP package.
Token expansions are queried with rdf_token_expansions/2, which returns terms of the form sounds(Like, Words), stem(Like, Words) or prefix(Prefix, Words). On compound expressions, only combinations that provide literals are returned. Below is an example after loading the ULAN database (Unified List of Artist Names from the Getty Foundation), showing all words that sound like `rembrandt' and appear together in a literal with the word `Rijn'. Finding this result among the 228,710 literals contained in ULAN requires 0.54 milliseconds (AMD 1600+).
?- rdf_token_expansions(and('Rijn', sounds(rembrandt)), L).
L = [sounds(rembrandt, ['Rambrandt', 'Reimbrant', 'Rembradt', 'Rembrand',
                        'Rembrandt', 'Rembrandtsz', 'Rembrant', 'Rembrants',
                        'Rijmbrand'])]
Here is another example, illustrating handling of diacritics:
?- rdf_token_expansions(case(cafe), L).
L = [case(cafe, [cafe, caf\'e])]
Tokenisation of literals can be customised by defining the hook rdf_litindex:tokenization(+Literal, -Tokens). If this hook fails, the library calls tokenize_atom/2 from the NLP package and deletes the following tokens: atoms of length 1, floats, integers that are out of range and the English words and, an, or, of, on, in, this and the. Deletion first calls the hook rdf_litindex:exclude_from_index(token, X). This hook is called as follows:
no_index_token(X) :-
	exclude_from_index(token, X), !.
no_index_token(X) :-
	...
`Literal maps' provide a relation between literal values, intended to create additional indexes on literals. The current implementation can only deal with integers and atoms (string literals). A literal map maintains an ordered set of keys. The ordering uses the same rules as described in section 3.1.1. Each key is associated with an ordered set of values. Literal map objects can be shared between threads, using a locking strategy that allows for multiple concurrent readers.
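A sketch of the literal map API, assuming it provides rdf_new_literal_map/1, rdf_insert_literal_map/3 and rdf_find_literal_map/3 and that these are available through library(semweb/rdf_litindex); check the library source for the authoritative interface:

:- use_module(library(semweb/rdf_litindex)).

word_demo(Literals) :-
	rdf_new_literal_map(Map),
	rdf_insert_literal_map(Map, amsterdam, 'Amsterdam is the capital'),
	rdf_insert_literal_map(Map, capital,   'Amsterdam is the capital'),
	rdf_find_literal_map(Map, [amsterdam, capital], Literals).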
Typically, this module is used together with rdf_monitor/2 on the channels new_literal and old_literal to maintain an index of words that appear in a literal. Further abstraction using Porter stemming or Metaphone can be used to create additional search indices. These can map either directly to the literal values, or indirectly to the plain word-map. The SWI-Prolog NLP package provides complementary building blocks, such as a tokenizer, Porter stem and Double Metaphone. For a worked-out combination of these building blocks, see rdf_litindex.pl.
The library(semweb/rdf_persistency) provides reliable persistent storage for the RDF data. The store uses a directory with files for each source (see rdf_source/1) present in the database. Each source is represented by two files: one in binary format (see rdf_save_db/2) representing the base state, and one consisting of Prolog terms representing the changes made since the base state. The latter is called the journal.
rdf_attach_db/2 processes, among others, the following options:

concurrency(+Jobs): Number of threads used to load the initial state. The default is the number of physical CPUs as reported by the Prolog flag cpu_count, or 1 (one) on systems where this number is unknown. See also concurrent/3.

silent(+Bool): If true, suppress loading messages from rdf_attach_db/2.

log_nested_transactions(+Bool): If true, nested log transactions are added to the journal information. By default (false), no log-term is added for nested transactions.
The database is locked against concurrent access using a file lock in Directory. An attempt to attach to a locked database raises a permission_error exception. The error context contains a term rdf_locked(Args), where Args is a list containing time(Stamp) and pid(PID). The error can be caught by the application. Otherwise it prints:
ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
Persistency of a database is switched with rdf_persistency/2. If Bool is false, the journal and snapshot for the database are deleted and further changes to triples associated with DB are not recorded. If Bool is true, a snapshot is created for the current state and further modifications are monitored. Switching persistency does not affect the triples in the in-memory RDF database.

Journals are merged into the base state with rdf_flush_journals/1. Using the option min_size(KB), only journals larger than KB Kbytes are merged with the base state. Flushing a journal takes the following steps, ensuring a stable state can be recovered at any moment: the current state is first saved to a file with the additional extension .new; only on success is the .new file moved over the base state.
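A sketch of attaching a persistent store and flushing its journals; the directory name is hypothetical:

?- use_module(library(semweb/rdf_persistency)).
?- rdf_attach_db('DB', [ concurrency(2) ]).
?- rdf_flush_journals([ min_size(1024) ]).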
Note that journals are not merged automatically for two reasons. First of all, some applications may decide never to merge as the journal contains a complete changelog of the database. Second, merging large databases can be slow and the application may wish to schedule such actions at quiet times or scheduled maintenance periods.
The above predicates suffice for most applications. The predicates in this section provide access to the journal files and the base state files and are intended to provide additional services, such as reasoning about the journals, loaded files, etc. (A library library(rdf_history) is under development that exploits these features to support wiki-style editing of RDF.)
Using rdf_transaction(Goal, log(Message)), we can add additional records to enrich the journal of affected databases with Message and some additional bookkeeping information. Such a transaction adds a term begin(Id, Nest, Time, Message) before the change operations on each affected database and end(Id, Nest, Affected) after the change operations. Here is an example call and the content of the journal file mydb.jrn. A full explanation of the terms that appear in the journal is in the description of rdf_journal_file/2.
?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).
start([time(1183540570)]).
begin(1, 0, 1183540570.36, by(jan)).
assert(s, p, o).
end(1, 0, []).
end([time(1183540578)]).
Using rdf_transaction(Goal, log(Message, DB)), where DB is an atom denoting a (possibly empty) named graph, the system guarantees that a non-empty transaction will leave a possibly empty transaction record in DB. This feature assumes named graphs are named after the user making the changes. Even if a user action does not affect the user's own graph, such as deleting a triple from another graph, we still find a record of all actions performed by that user in the journal of that user.
The journal contains Prolog terms of the following formats:

start(Attributes): The journal was opened. Currently Attributes contains a term time(Stamp).

end(Attributes): The journal was closed. Currently Attributes contains a term time(Stamp).

begin(Id, Nest, Time, Message): Start of an rdf_transaction/2 using log(Message). Id is an integer counting the logged transactions to this database. Numbers are increasing and designed for binary search within the journal file. Nest is the nesting level, where `0' is a toplevel transaction. Time is a time-stamp, currently using float notation with two fractional digits. Message is the term provided by the user as argument of the log(Message) transaction.

end(Id, Nest, Others): End of an rdf_transaction/2 using log(Message). Id and Nest match the begin-term. Others gives a list of other databases affected by this transaction and the Id of these records. The terms in this list have the format DB:Id.
The corresponding files use the extension .trp for the base state and .jrn for the journal.
The module library(semweb/rdf_turtle) provides a parser for the alternative RDF Turtle syntax (http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/). The Turtle syntax is the basis for the SPARQL query language and is much easier to read and write for humans (and computers) than the RDF/XML syntax.
This module acts as a plugin to library(rdf_db.pl), enabling rdf_load/2 to load Turtle data transparently. The default extension for a Turtle file is .ttl.
The parser returns a list of rdf(S,P,O) triples. Input is either a term stream(Stream) or a specification for absolute_file_name/3. Processed options include the base URI (for files the default is constructed from file:// and the filename), the prefix for generated blank-node identifiers (default constructed from the base URI and __) and db(DB), specifying the named graph. The named graph defaults to the base URI.
The library(rdfs) library adds interpretation of the triple store in terms of concepts from RDF-Schema (RDFS). There are two ways to provide support for higher-level languages in RDF. One is to view such languages as a set of entailment rules. In this model the rdfs library would provide a predicate rdfs/3 with the same functionality as rdf/3 on the union of the raw graph and the triples that can be derived by applying the RDFS entailment rules.
Alternatively, RDFS provides a view on the RDF store in terms of individuals, classes, properties, etc., and we can provide predicates that query the database with this view in mind. This is the approach taken in the library(rdfs.pl) library, providing calls like rdfs_individual_of(?Resource, ?Class). (The SeRQL language is based on querying the deductive closure of the triple set. The SWI-Prolog SeRQL library provides entailment modules that take the approach outlined above.)
The predicates in this section explore the rdfs:subPropertyOf
,
rdfs:subClassOf
and rdf:type
relations. Note
that the most fundamental of these, rdfs:subPropertyOf
, is
also used by rdf_has/[3,4].
rdfs_subproperty_of(?SubProperty, ?Property): True if SubProperty is related to Property through the rdfs:subPropertyOf relation. It can be used to test as well as generate sub-properties or super-properties. Note that the commonly used semantics of this predicate is wired into rdf_has/[3,4]. (Bug: the current implementation cannot deal with cycles, nor with predicates that are an rdfs:subPropertyOf of rdfs:subPropertyOf, such as owl:samePropertyAs.)

rdfs_subclass_of(?SubClass, ?Class): True if SubClass is related to Class through the rdfs:subClassOf relation. It can be used to test as well as generate sub-classes or super-classes. (Bug: the current implementation cannot deal with cycles.)

rdfs_individual_of(?Resource, ?Class): True if Resource has an rdf:type property that refers to Class or a sub-class thereof. It can be used to test, to generate the classes Resource belongs to, or to generate the individuals described by Class.
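A sketch of the individual view; the ex alias and resources are hypothetical:

?- rdfs_individual_of(X, ex:'Person').       % enumerate individuals of a class
?- rdfs_individual_of(ex:john, Class).       % enumerate classes of a resource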
The RDF construct rdf:parseType="Collection" constructs a list using the rdf:first and rdf:rest relations. The predicates in this section deal with such lists: testing membership of an rdf:List or rdf:Container, converting an rdf:List into a Prolog list of objects, and building an rdf:List from a Prolog list, asserting the new triples in the database user.

Textual search is partly handled by the predicates from the library(rdf_db) module and its underlying C library. For example, literal objects are hashed case-insensitively to speed up the commonly used case-insensitive search.
rdfs_label(?Resource, ?Language, ?Label): Label is the label of Resource. The label is either the value of an rdfs:label property or it is extracted from the URL using rdf_split_url/3. Language is unified with the value of the xml:lang attribute of the label, or a variable if the label has no language specified.

rdfs_label(?Resource, ?Label): Same as rdfs_label(Resource, _, Label).

rdfs_ns_label(?Resource, ?Label): Same as rdfs_ns_label(Resource, _, Label).

The current SemWeb library distributed with SWI-Prolog does not yet contain an OWL module. A module owl.pl that is part of the Triple20 triple browser and editor provides limited support for OWL reasoning.
The module rdf_edit.pl is a layer that encapsulates the modification predicates from section 3.3 for use from a (graphical) editor of the triple store. It is anticipated that this layer will eventually be superseded by facilities running on top of the native rdf_transaction/2 and rdf_monitor/2 facilities (see section 3.6). It adds the following features:
Transactions group low-level modification actions together.

Transactions may be nested. A failing nested transaction only reverts the actions performed inside that nested transaction. If the outer transaction succeeds it is committed normally. Conversely, if the outer transaction fails, committed nested transactions are reverted as well. If any of the modifications inside the transaction modifies a protected file (see rdfe_set_file_property/2) the transaction is reverted and rdfe_transaction/1 throws a permission error.

A successful outer transaction (`level-0') may be undone using rdfe_undo/0.
Files can be given properties with rdfe_set_file_property/2. File access is either ro or rw. Access ro is the default when a file is loaded for which the user has no write access. If a transaction (see rdfe_transaction/1) modifies a file with access ro, the transaction is reverted.

A file may also be made the default destination of new triples. If this setting is fallback it is only the default for triples that have no clear default destination. If it is all, all new triples are added to this file.
The following predicates encapsulate predicates from the rdf_db
module that modify the triple store. These predicates can only be called
when inside a transaction. See rdfe_transaction/1.
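A sketch of wrapping edits in a transaction so they can be undone as a unit with rdfe_undo/0; the ex alias and resources are hypothetical:

?- rdfe_transaction(( rdfe_assert(ex:doc1, rdf:type, ex:'Document'),
                      rdfe_assert(ex:doc1, rdfs:label, literal('Report'))
                    )).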
This section describes a (yet very incomplete) set of more high-level operations one would like to be able to perform. Eventually this set may include operations based on RDFS and OWL.
Undo aims at user-level undo operations from a (graphical) editor.
Optionally, every action through this module is immediately sent to a journal file. The journal provides a full log of all actions with a time-stamp that may be used for inspection of behaviour, version management, crash recovery or as an alternative to regular save operations.
A journal is opened with rdfe_open_journal/2. If the mode is append and File exists, the journal is first replayed; see rdfe_replay_journal/1. If the mode is write, the journal is truncated if it exists. Replaying a journal is normally triggered through the append mode of rdfe_open_journal/2.
To realise a modular graphical interface for editing the triple store, the system must use some sort of event mechanism. This is implemented by the XPCE library library(broadcast), which is described in the XPCE User Guide. In this section we describe the terms broadcast by this library.
If a transaction is reverted due to failure or an exception, no event is broadcast. The initiating GUI element is supposed to handle this possibility itself; other components are not affected as the triple store is not changed.

Undo and redo operations are broadcast with a term whose first argument is undo or redo and whose Id identifies the transaction as above.
The SWI-Prolog SemWeb package is designed to provide access to the Semantic Web languages from Prolog. It consists of the low-level rdf_db.pl store with layers such as rdfs.pl that provide higher-level querying of a triple set with relations such as rdfs_individual_of/2, rdfs_subclass_of/2, etc.
SeRQL is a Semantic Web query language taking another route. Instead of providing alternative relations, SeRQL defines a graph query on the deductive closure of the triple set. For example, under the assumption of RDFS entailment rules this makes the query rdf(S, rdf:type, Class) equivalent to rdfs_individual_of(S, Class).
We developed a parser for SeRQL which compiles SeRQL path expressions into Prolog conjunctions of rdf(Subject, Predicate, Object) calls. Entailment modules realise a fully logical implementation of rdf/3, including the entailment reasoning required to deal with a Semantic Web language or application-specific reasoning. The infrastructure is completed with a query optimiser and an HTTP server compliant with the Sesame implementation of the SeRQL language. The Sesame Java client can be used to access Prolog servers from Java, while the Prolog client can be used to access the Sesame SeRQL server. For further details, see the project home.
This research was supported by the following projects: MIA and the MultimediaN project (www.multimedian.nl), funded through the BSIK programme of the Dutch Government, and the FP-6 project HOPS of the European Commission.
The implementation of AVL trees is based on libavl by Brad Appleton.
See the source file avl.c
for details.