This module finds literals in the RDF database based on words, stemming
and "sounds like" (metaphone) matching. The normal user-level predicate is
rdf_find_literals/2.
rdf_set_literal_index_option(+Options:list)
- Set options for the literal package. Currently defined options are:
- verbose(Bool)
- If true, print progress messages while building the index tables.
- index_threads(+Count)
- Number of threads to use for the initial indexing of literals.
- index(+How)
- How to deal with indexing new literals. How is one of self (execute in
the same thread), thread(N) (execute in N concurrent threads) or default
(depends on the number of cores).
- stopgap_threshold(+Count)
- Add a token to the dynamic stopgap set if it appears in more than Count
literals. The default is 50,000.
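As an illustration of the options listed above, the following minimal sketch sets three of them in one call. The helper name configure_literal_index/0 and the threshold value are hypothetical; only the option names come from this documentation.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_litindex)).

% configure_literal_index is a hypothetical helper name.
% It asks for progress messages, indexes new literals in the calling
% thread and raises the stopgap threshold (illustrative value).
configure_literal_index :-
    rdf_set_literal_index_option([ verbose(true),
                                   index(self),
                                   stopgap_threshold(100000)
                                 ]).
```

Setting the options before the first search is probably the safest order, since the index tables are built using the settings in effect at that time.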
rdf_find_literal(+Spec, -Literal) is nondet
rdf_find_literals(+Spec, -Literals) is det
- Find literals in the RDF database matching Spec. Spec is defined
as:
Spec ::= and(Spec,Spec)
Spec ::= or(Spec,Spec)
Spec ::= not(Spec)
Spec ::= sounds(Like)
Spec ::= stem(Like) % same as stem(Like, en)
Spec ::= stem(Like, Lang)
Spec ::= prefix(Prefix)
Spec ::= between(Low, High) % Numerical between
Spec ::= ge(Low) % Numerical greater-equal
Spec ::= le(High) % Numerical less-equal
Spec ::= Token
sounds(Like) and stem(Like) both map to a disjunction. First we
compile the spec to normal form: a disjunction of conjunctions
on elementary tokens. Then we execute all the conjunctions and
generate the union using ordered-set algorithms.
Stopgaps are ignored. If the final result is only a stopgap, the
predicate fails.
- To be done
- Exploit ordering of numbers and allow for > N, < N, etc.
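The evaluation strategy described for rdf_find_literals/2 (compile to a disjunction of conjunctions, then take the union with ordered-set operations) can be sketched in plain Prolog. This is not the library's implementation. The toy_index/2 facts, the predicate names and the data are all hypothetical, and the sketch handles only and/2, or/2 and plain tokens, not not/1, sounds/1, stem/1 or the numerical specs.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).
:- use_module(library(ordsets)).

% Hypothetical toy index: token -> ordered set of literal texts.
toy_index(amsterdam, ['Amsterdam airport', 'Amsterdam harbour']).
toy_index(harbour,   ['Amsterdam harbour', 'Rotterdam harbour']).
toy_index(rotterdam, ['Rotterdam harbour']).

% dnf(+Spec, -DNF): DNF is a list of conjunctions, each a list of tokens.
dnf(or(A, B), DNF) :- !,
    dnf(A, DA),
    dnf(B, DB),
    append(DA, DB, DNF).
dnf(and(A, B), DNF) :- !,
    dnf(A, DA),
    dnf(B, DB),
    % Distribute "and" over the disjuncts of both sides.
    findall(C,
            ( member(CA, DA), member(CB, DB), append(CA, CB, C) ),
            DNF).
dnf(Token, [[Token]]).

% token_literals(+Token, -Set): literals for Token, [] if unknown.
token_literals(Token, Set) :-
    (   toy_index(Token, Set0)
    ->  Set = Set0
    ;   Set = []
    ).

% Intersect the literal sets of all tokens in one conjunction.
conjunction_literals([Token|Tokens], Literals) :-
    token_literals(Token, Set0),
    foldl(intersect_token, Tokens, Set0, Literals).

intersect_token(Token, Set0, Set) :-
    token_literals(Token, TokenSet),
    ord_intersection(Set0, TokenSet, Set).

% find_literals(+Spec, -Literals): union over all conjunctions.
find_literals(Spec, Literals) :-
    dnf(Spec, DNF),
    maplist(conjunction_literals, DNF, Sets),
    ord_union(Sets, Literals).
```

With this toy data, the query find_literals(or(and(amsterdam, harbour), rotterdam), L) yields L = ['Amsterdam harbour', 'Rotterdam harbour']. Against a real RDF database, the corresponding call would be along the lines of rdf_find_literals(or(and(amsterdam, harbour), rotterdam), Literals).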
rdf_token_expansions(+Spec, -Extensions)
- Determine which extensions of a token contribute to finding
literals.
rdf_delete_literal_index(+Type)
- Fully delete a literal index.
rdf_tokenize_literal(+Literal, -Tokens) is semidet
- Tokenize a literal. We make this hookable as tokenization is
generally domain dependent.
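A quick way to see which tokens a literal contributes to the index is to call rdf_tokenize_literal/2 directly. The sketch below assumes that a plain atom is accepted as the literal text. The helper show_tokens/1 is a hypothetical name, and the exact token list depends on the stopgap settings and any tokenization hook in effect, so no output is shown.

```prolog
:- use_module(library(semweb/rdf_litindex)).

% show_tokens(+Text) is a hypothetical helper: print the tokens
% produced for Text, or a note if tokenization fails (the
% predicate is semidet).
show_tokens(Text) :-
    (   rdf_tokenize_literal(Text, Tokens)
    ->  format("~q -> ~q~n", [Text, Tokens])
    ;   format("~q: no tokens~n", [Text])
    ).
```

For example, show_tokens('Amsterdam harbour') prints the token list for that text.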
rdf_stopgap_token(-Token) is nondet
- True when Token is a stopgap token. Currently, this implies one
of:
- exclude_from_index(token, Token) is true
- default_stopgap(Token) is true
- Token is an atom of length 1
- Token was added to the dynamic stopgap token set because
it appeared in more than stopgap_threshold literals.
rdf_literal_index(+Type, -Index) is det
- True when Index is a literal map containing the index of Type.
Type is one of:
- token
- Tokens are basically words of literal values. See
rdf_tokenize_literal/2. The token map maps tokens to full
literal texts.
- stem
- Index of stemmed tokens. If the language is available, the
tokens are stemmed using the matching snowball stemmer.
The stem map maps stemmed tokens to full tokens.
- metaphone
- Phonetic index of tokens. The metaphone map maps phonetic
keys to tokens.
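Because each index is an ordinary literal map, it can be inspected with the literal-map predicates from library(semweb/rdf_db). The sketch below lists the keys of the token index that start with a given prefix. It assumes rdf_keys_in_literal_map/3 with a prefix(Prefix) specification, which is not described in this section, and tokens_with_prefix/2 is a hypothetical helper name.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_litindex)).

% tokens_with_prefix(+Prefix, -Tokens) is a hypothetical helper:
% Tokens are the keys of the token index that start with Prefix.
tokens_with_prefix(Prefix, Tokens) :-
    rdf_literal_index(token, Map),
    rdf_keys_in_literal_map(Map, prefix(Prefix), Tokens).
```

Replacing token by stem or metaphone in the first goal would inspect the other two indexes in the same way.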