Predicate tokenize_atom/2
[-+]?[0-9]+(\.[0-9]+)?([eE][-+][0-9]+)? | number |
[:alpha:][:alnum:]* | word |
[:space:]+ | skipped |
anything else | single-character |
Character classification is based on the C library's wide-character functions such as iswalnum(). Recognised numbers are passed to Prolog read/1, so unbounded integers are supported.
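As an illustration of the token classes in the table above, here is a minimal sketch that tokenizes a text and tags each token. It assumes tokenize_atom/2 is loaded from library(porter_stem); the helpers token_kind/2, token_pair/2 and classify_tokens/2 are hypothetical names introduced for this example and are not part of the library.

```prolog
:- use_module(library(porter_stem)).   % assumed source of tokenize_atom/2

% token_kind(+Token, -Kind): hypothetical helper, not a library predicate.
% Numbers come back as Prolog numbers; words and single characters as atoms.
token_kind(Token, number) :-
    number(Token), !.
token_kind(Token, word) :-
    atom(Token),
    atom_chars(Token, [First|_]),
    char_type(First, alpha), !.
token_kind(_, other).

% token_pair(+Token, -Pair): pair a token with its kind.
token_pair(Token, Token-Kind) :-
    token_kind(Token, Kind).

% classify_tokens(+Text, -Pairs): tokenize Text and tag every token.
classify_tokens(Text, Pairs) :-
    tokenize_atom(Text, Tokens),
    maplist(token_pair, Tokens, Pairs).
```

For example, `classify_tokens('cat sat 42', Pairs)` should give `Pairs = [cat-word, sat-word, 42-number]`: white space is skipped and the digits arrive as the integer 42 rather than as an atom.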
It is likely that future versions of this library will provide tokenize_atom/3 with additional options to modify space handling as well as the definition of words.
word_frequency_count(Words, Counts) :-
    maplist(downcase_atom, Words, LwrWords),
    msort(LwrWords, Sorted),
    clumped(Sorted, Counts).

?- word_frequency_count([a,b,'A',c,d,'B',b,e], Counts).
Counts = [a-2, b-3, c-1, d-1, e-1].
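The counting predicate above can be fed directly from tokenize_atom/2. The following is a sketch under the assumption that only word tokens should be counted; is_word/1 and text_word_counts/2 are hypothetical helpers introduced here, and word_frequency_count/2 is repeated so the block is self-contained.

```prolog
:- use_module(library(porter_stem)).   % assumed source of tokenize_atom/2
:- use_module(library(apply)).         % include/3, maplist/3
:- use_module(library(pairs)).         % clumped/2 is exported from library(lists)
:- use_module(library(lists)).

% Case-insensitive frequency count, as in the snippet above.
word_frequency_count(Words, Counts) :-
    maplist(downcase_atom, Words, LwrWords),
    msort(LwrWords, Sorted),
    clumped(Sorted, Counts).

% is_word(+Token): hypothetical filter that keeps atoms starting with a letter,
% dropping numbers and single-character punctuation tokens.
is_word(Token) :-
    atom(Token),
    atom_chars(Token, [First|_]),
    char_type(First, alpha).

% text_word_counts(+Text, -Counts): tokenize Text, keep words, count them.
text_word_counts(Text, Counts) :-
    tokenize_atom(Text, Tokens),
    include(is_word, Tokens, Words),
    word_frequency_count(Words, Counts).
```

With this in place, `text_word_counts('the cat and The dog', Counts)` should yield `Counts = [and-1, cat-1, dog-1, the-2]`. Filtering before counting matters because downcase_atom/2 accepts numbers as well and would turn them into atoms.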