SWI-Prolog -- Verify Type of a Term

LogicalCaptain said (2020-05-07T15:49:52):

If you have a a Prolog term denoted by variable name `T`, what could it be?

This is basically a decision tree that a program could run, although in practice you just immediately use the predictate at the very botoom: for example, integer(T) will tell you whether T denotes an integer or not. If yes, you immediately know that it is also a rational, a number, an atomic, a nonvar, and evidently "any".

Note the subtree under number:

float is a "representation type" (for want of a better word). You can have a numeric value that has a representation both as integer and float and these representations are distinguishable (1 =:= 1.0 but 1 \== 1.0).

Similarly, every number that has a float representation can equivalently be represented as a rational (except for infinities and negative zeros and NaNs), but the representations are distinguishable: 0.5 =:= 1r2 but 0.5 \== 1r2.

On the other hand, integer is a "subset type" of rational: There is no way to distinguish the (infinitely many) representations of an integer written as a non-canonical rational from "the" integer: 5r1 =:= 10r2 and also 5r1 == 10r2, as well as 5 =:= 10r2 and also 5 == 10r2 hold.

Special floats that could be added to the diagram:

The NaN (Not-a-Number), which is actually a number/1, see nan/0
The +Infinity and -Infinity, see inf/0
The "negative 0" -0.0, which is subtly different from +0.0 alias 0.0 (see Wikipedia: Signed Zero and NaN and Infinity floats and their syntax

See also in this manual:

2020-12-14




                                                                       any T
                                                                         |
                                     +-----------------------------------+---------------------------+
                                     |                                                               |
                                   var(T)                                                         nonvar(T)
                    the variable name T denotes an 'unbound                               the complement of var(T)
                    variable' (aka. uninstantiated variable),                                        |
                    which is an empty memory cell distinguishable                                    |
                    by '==' at the time the call to var/1 occurs                                     |
                    - a nonlogical predicate about the state of the                                  |
                      computation                                                                    |
                    - this predicate should have been called                                         |
                      unbound(T); a historical accident!                                             |
                                                                         +---------------------------+-----------------------------+
                                                                         |                                                         |
                                                                     atomic(T)                                                 compound(T)
                                                                         |                                                         |
                                      +----------------------------------+------------------------+                 +--------------+-------------+
                                      |                                  |                        |                 |                            |
                                  blob(T,_)                           string(T)                number(T)       compound term              compound term
                                      |                                                           |             of arity 0                 of arity > 0
                                      |                                                           |          (SWI-Prolog only)                   |
                                      |                                                           |                                              |
                                      |                                                           |                                              |
              +-----------------------+-----------------------+                         +---------+---------+                 +------------------+-------------------+
              |                       |                       |                         |                   |                 |                  |                   |
       (other blob types)    blob(T,reserved_symbol)     blob(T,text)           rational(T,Nu,De)        float(T)       SWI-Prolog dict      first listcell        others
      encapsulated foreign            |                    atom(T)                      |                                                  of a list backbone
           resources                  |                       |                         |                              an encapsulated       '[|]'(H,Rs)
                                      |                       |                         |                              data structure; for   alias [H|Rs]
                             +--------+--------+              |              +----------+----------+                   now implemented as
                             |                 |              |              |                     |                   a compound term)    "list" is a *nonlocal*
                           T==[]             T\==[]           |   rational(X),\+integer(X)      integer(T)                                 structure based on convention
                        empty list        for now only        |      "proper rational"                                                     there may or may not be an
                                         the unfakeable       |                                                                            actual list beyond the first
                                       dict functor name      |                                                                            listcell
                                                              |
                                           +------------------+------------------+
                                           |                  |                  |
                                       length=0           length=1            atom with
                                    the empty atom     character/char         length>1

The above as easily printable A4 PDF

More complete in in SVG.

Some notes I took about this.

The SWI-Prolog Wiki has more on SWI-Prolog datatypes here.

Interestingly, the empty list symbol [] can be used as functor name:

?- compound_name_arguments(F, [],[x,y]).
F = [](x,y).

Blobs can be used as functor names in general. However, performing that party trick is dis-recommended.

A symbol that is missing

In SQL, we have the value NULL, indicating missing data. This might be of some use in Prolog facts, and "officialize" a NULL value (what do we use currently? _? , '', "", []? It depends!)

Yes we can use the atom null, but having a dedicated NULL would be more expressive.

No booleans

Perphas surprisingly for a logic programming language, there is no dedicated "boolean" type (there seem to be no ready-to-use logical connectives working on "boolean" either but they are easy to write). Instead the atoms true and false are (generally) used when you want to "reify" the truth values:

xor(true,false,true) :- !.
xor(false,true,true) :- !.
xor(true,true,false) :- !.
xor(false,false,false) :- !.

There are 0-ary predicates of the same name true/0 and false/0 so you can call those to "un-reify" the truth value, obtaining proper success or failure. An atom at goal position is interpreted as a call to the similarly named 0-ary predicate:

?- true.
true.

?- false.
false.

There is also an undefined/0 truth-value predicate used in well-founded semantics:

?- undefined.
undefined.

Indeed, on can refer to must_be/2: must_be(boolean,X) checks whether X is one of the atoms true or false. (It doesn't accept a third truth value undefined, used tabling, either).

Thoughts

There should be a special "NULL" atomic value similar as for SQL to designate missing information. Currently one has to resort to contortions express NULL, choosing atoms with special meanings or using _, which is an unbund variable, not "NULL".
Actual arrays with access-by-numeric-index can be implemented with (large) compound terms.
The blob type gives the possibility to create unfakeable identifiers (i.e. enum types); there is absolutely use for this (think of manipulating a syntax tree where you really want to distinguish the variables that appear in the AST from the rest without awkward conventions about allowed names or using tagging) but it is not supported for now. You will have to add C code. See: blob
Why is Prolog still not "sorted" after all these years? It would be Prolog++ of course. Would still be worth it.

The concept of the Prolog "term" is still very primitive. In Higher-Order Logic Programming as Constraint Logic Programming (Spiro Michaylov, Frank Pfenning), 1993, we read:

Higher-order logic programming (HOLP) languages typically use a typed λ-calculus as their domain of computation. In the case of λProlog it is the simply-typed λ-calculus, while in the case of Elf it is a dependently typed λ-calculus. These languages are particularly useful for various kinds of meta-programming and theorem proving tasks because of the logical support for variable binding via λ-abstraction. They have been used for a wide range of applications including theorem proving, programming language interpretation, type inference, compilation, and natural language parsing.

However:

Full unification in higher-order languages is clearly impractical, due to the non- existence of minimal complete sets of most-general unifiers. Therefore, work on λProlog has used Huet’s algorithm for pre-unification, where so-called flex-flex pairs (which are always unifiable) are maintained as constraints, rather than being incorporated in an explicit parametric form. Yet, even pre-unifiability is undecidable, and sets of most general pre-unifiers may be infinite. While undecidability has not turned out to be a severe problem, the lack of unique most general unifiers makes it difficult to accurately predict the run-time behavior of λProlog programs that attempt to take advantage of full higher-order pre-unification. It can result in thrashing when certain combinations of unification problems have to be solved by extensive backtracking. Moreover, in a straightforward implementation, common cases of unification incur a high overhead compared to first-order unification. These problems have led to a search for natural, decidable subcases of higher-order unification where most general unifiers exist. Miller has suggested a syntactic restriction (Lλ) to λProlog, easily extensible to related languages, where most general unifiers are unique modulo βηα-equivalence

Going further, one attains logic programming on top of proper object-oriented database objects: F-Logic

A type tree just for "terms that are _texty_"

So I was thinking about what the type of a "text thing" could be and this seems appropriate:

Categorization always categorizes a term as not belonging to this tree or else as belonging to exactly one of the leaf types. Types nearer the root type just regroup sutypes in a meaningful way (for the programmer).

           +--- var?: an unbound variable; can become anything
           |
           |
           |                   +--- openvarlist?: nonempty open list of > 0 vars
           |                   |
           +--- openlist? -----+--- opencharylist?: nonempty open list of > 0 char, may contain vars
           |                   |
           |                   +--- opencodeylist?: nonempty open list of > 0 unicode code points, may contain vars
           |
           |                   +--- varlist?: list of N > 0 vars; edge case
           |                   |
           +--- changylist? ---+--- charylist?: list of C > 0 chars and V > 0 vars
           |                   |
           |                   +--- codeylist?: list of C> 0 unicode code points and V > 0 vars
           |
           |                                               +--- emptylist!: empty list; edge case as not sure whether charlist or codelist
           |                                               |
           |                             +--- textlist! ---+--- charlist!: nonempty list of chars (atoms of length 1)
           |                             |                 |
           |                             |                 +--- codelist!: nonempty list of unicode code points (integers between 0 and 0x10FFFF)
           |                             |
 texty* ---+               +--- text! ---+
           |               |             |
           |               |             |                +--- atom!: Prolog atoms including the empty atom '' (but not the empty list [])
           |               |             |                |
           +---anytext! ---+             +--- stringy! ---+
                           |                              |
                           |                              +--- string!: SWI-Prolog strings including the empty string ""
                           |
                           +--- number!  acceptable because a number can be
                                         transformed into text (according to some
                                         unspecified convention...)

Show me the code you say?

An implementation of the above type tree can be found as pre-release here

4.5 Verify Type of a Term

See also

If you have a a Prolog term denoted by variable name T, what could it be?

A symbol that is missing

No booleans

Thoughts

A type tree just for "terms that are _texty_"

If you have a a Prolog term denoted by variable name `T`, what could it be?