Lexicographer files correspond to the syntactic categories implemented in WordNet: noun, verb, adjective and adverb. All of the synsets in a lexicographer file are in the same syntactic category. Each synset consists of a list of synonymous words or collocations (e.g. "fountain pen", "take in"), and pointers that describe the relations between this synset and other synsets. These relations include (but are not limited to) hypernymy/hyponymy, antonymy, entailment, and meronymy/holonymy. A word or collocation may appear in more than one synset, and in more than one part of speech. Each use of a word in a synset represents a sense of that word in the part of speech corresponding to the synset.
Adjectives may be organized into clusters containing head synsets and satellite synsets. Adverbs generally point to the adjectives from which they are derived.
See wngloss(7WN) for a glossary of WordNet terminology and a discussion of the database's content and logical organization.
Lexicographer file names are of the form:
pos.suffix
where pos is either noun, verb, adj or adv. suffix may be used to organize groups of synsets into different files, for example noun.animal and noun.plant. See lexnames(5WN) for a list of lexicographer file names that are used in building WordNet.
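As a minimal sketch of the naming rule above, the following Python splits a file name into its pos and suffix parts. The function name split_lex_filename is hypothetical, and the sketch assumes that a non-empty suffix is always present.

```python
# Syntactic categories allowed as the "pos" part of a lexicographer file name.
VALID_POS = {"noun", "verb", "adj", "adv"}


def split_lex_filename(name):
    """Split a name such as 'noun.animal' into ('noun', 'animal').

    Hypothetical helper; assumes a non-empty suffix is required.
    """
    pos, sep, suffix = name.partition(".")
    if not sep or pos not in VALID_POS or not suffix:
        raise ValueError(f"not a lexicographer file name: {name!r}")
    return pos, suffix
```

For example, split_lex_filename("noun.plant") yields the pair ("noun", "plant"), while a name whose prefix is not one of the four categories is rejected.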
A relation from a source to a target synset is formed by specifying a word from the target synset in the source synset, followed by the pointer_symbol indicating the pointer type. The location of a pointer within a synset defines it as either lexical or semantic. The Lexicographer File Format section describes the syntax for entering a semantic pointer, and Word Syntax describes the syntax for entering a lexical pointer.
Although there are many pointer types, only certain types of relations are permitted between synsets of each syntactic category.
The pointer_symbols for nouns are:
! Antonym
@ Hypernym
@i Instance Hypernym
~ Hyponym
~i Instance Hyponym
#m Member holonym
#s Substance holonym
#p Part holonym
%m Member meronym
%s Substance meronym
%p Part meronym
= Attribute
+ Derivationally related form
;c Domain of synset - TOPIC
-c Member of this domain - TOPIC
;r Domain of synset - REGION
-r Member of this domain - REGION
;u Domain of synset - USAGE
-u Member of this domain - USAGE
The pointer_symbols for verbs are:
! Antonym
@ Hypernym
~ Hyponym
* Entailment
> Cause
^ Also see
$ Verb Group
+ Derivationally related form
;c Domain of synset - TOPIC
;r Domain of synset - REGION
;u Domain of synset - USAGE
The pointer_symbols for adjectives are:
! Antonym
& Similar to
< Participle of verb
\ Pertainym (pertains to noun)
= Attribute
^ Also see
;c Domain of synset - TOPIC
;r Domain of synset - REGION
;u Domain of synset - USAGE
The pointer_symbols for adverbs are:
! Antonym
\ Derived from adjective
;c Domain of synset - TOPIC
;r Domain of synset - REGION
;u Domain of synset - USAGE
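The per-category restriction can be pictured as a lookup table. The Python below is only an illustration, not part of WordNet: the names POINTER_SYMBOLS and is_permitted are hypothetical, and the hyponym symbols are taken to be ~ and ~i, as in the WordNet database format.

```python
# Pointer symbols permitted for each syntactic category, transcribed from
# the lists above. "~" and "~i" (hyponym, instance hyponym) are assumed.
POINTER_SYMBOLS = {
    "noun": {"!", "@", "@i", "~", "~i", "#m", "#s", "#p", "%m", "%s", "%p",
             "=", "+", ";c", "-c", ";r", "-r", ";u", "-u"},
    "verb": {"!", "@", "~", "*", ">", "^", "$", "+", ";c", ";r", ";u"},
    "adj": {"!", "&", "<", "\\", "=", "^", ";c", ";r", ";u"},
    "adv": {"!", "\\", ";c", ";r", ";u"},
}


def is_permitted(pos, symbol):
    """Return True if the pointer symbol is listed for the given category."""
    return symbol in POINTER_SYMBOLS[pos]
```

With this table, is_permitted("verb", "*") holds because entailment is listed for verbs, while is_permitted("adv", "@") does not because adverbs have no hypernym pointer.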
Many pointer types are reflexive, meaning that if a synset contains a pointer to another synset, the other synset should contain a corresponding reflexive pointer. grind(1WN) automatically inserts missing reflexive pointers for the following pointer types:
Pointer | Reflect |
Antonym | Antonym |
Hyponym | Hypernym |
Hypernym | Hyponym |
Instance Hyponym | Instance Hypernym |
Instance Hypernym | Instance Hyponym |
Holonym | Meronym |
Meronym | Holonym |
Similar to | Similar to |
Attribute | Attribute |
Verb Group | Verb Group |
Derivationally Related | Derivationally Related |
Domain of synset | Member of Domain |
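The effect of the table can be sketched in Python as follows. This is not how grind(1WN) is implemented; the representation of pointers as (source, pointer type, target) triples and the function name add_missing_reflexive are assumptions made for illustration.

```python
# Pointer type -> reflexive pointer type, transcribed from the table above.
REFLECT = {
    "Antonym": "Antonym",
    "Hyponym": "Hypernym",
    "Hypernym": "Hyponym",
    "Instance Hyponym": "Instance Hypernym",
    "Instance Hypernym": "Instance Hyponym",
    "Holonym": "Meronym",
    "Meronym": "Holonym",
    "Similar to": "Similar to",
    "Attribute": "Attribute",
    "Verb Group": "Verb Group",
    "Derivationally Related": "Derivationally Related",
    "Domain of synset": "Member of Domain",
}


def add_missing_reflexive(pointers):
    """Return a copy of the pointer set with reflexive pointers added.

    pointers is a set of hypothetical (source, pointer_type, target) triples.
    Pointer types that are not in the table are left without a counterpart.
    """
    result = set(pointers)
    for source, pointer_type, target in pointers:
        reflect = REFLECT.get(pointer_type)
        if reflect is not None:
            result.add((target, reflect, source))
    return result
```

Given a single Hypernym pointer from collie to dog, the function adds the Hyponym pointer from dog to collie; an Entailment pointer is passed through unchanged because the table has no entry for it.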
The generic sentence frames for verbs, preceded by their frame numbers, are:
1 Something ----s
2 Somebody ----s
3 It is ----ing
4 Something is ----ing PP
5 Something ----s something Adjective/Noun
6 Something ----s Adjective/Noun
7 Somebody ----s Adjective
8 Somebody ----s something
9 Somebody ----s somebody
10 Something ----s somebody
11 Something ----s something
12 Something ----s to somebody
13 Somebody ----s on something
14 Somebody ----s somebody something
15 Somebody ----s something to somebody
16 Somebody ----s something from somebody
17 Somebody ----s somebody with something
18 Somebody ----s somebody of something
19 Somebody ----s something on somebody
20 Somebody ----s somebody PP
21 Somebody ----s something PP
22 Somebody ----s PP
23 Somebody's (body part) ----s
24 Somebody ----s somebody to INFINITIVE
25 Somebody ----s somebody INFINITIVE
26 Somebody ----s that CLAUSE
27 Somebody ----s to somebody
28 Somebody ----s to INFINITIVE
29 Somebody ----s whether INFINITIVE
30 Somebody ----s somebody into V-ing something
31 Somebody ----s something with something
32 Somebody ----s INFINITIVE
33 Somebody ----s VERB-ing
34 It ----s that CLAUSE
35 Something ----s INFINITIVE
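To illustrate how a frame reads once a verb is substituted, the Python sketch below fills the ---- placeholder in a few of the frames listed above. The names FRAMES and fill_frame are hypothetical, only a handful of frames are included, and the substitution is naive: it does no morphology, so it is only sensible for frames ending in "----s" and for verbs that take a plain "s".

```python
# A small subset of the generic frames, keyed by frame number.
FRAMES = {
    1: "Something ----s",
    2: "Somebody ----s",
    8: "Somebody ----s something",
    10: "Something ----s somebody",
    26: "Somebody ----s that CLAUSE",
}


def fill_frame(f_num, verb):
    """Substitute the verb for the ---- placeholder (no morphology applied)."""
    return FRAMES[f_num].replace("----", verb)
```

For instance, fill_frame(8, "interpret") gives "Somebody interprets something".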
The general synset syntax is:
{ words pointers ( gloss ) }
Synsets of this form are valid for all syntactic categories except verb, and are referred to as basic synsets. At least one word and a gloss are required to form a valid synset. Pointers entered following all the words in a synset represent semantic relations between all the words in the source and target synsets.
For verbs, the basic synset syntax is defined as follows:
{ words pointers frames ( gloss ) }
Adjectives may be organized into clusters containing one or more head synsets and optional satellite synsets. Adjective clusters are of the form:
[
head synset
[satellite synsets]
[-]
[additional head/satellite synsets]
]
Each adjective cluster is enclosed in square brackets, and may have one or more parts. Each part consists of a head synset and optional satellite synsets that are conceptually similar to the head synset's meaning. Parts of a cluster are separated by one or more hyphens (-) on a line by themselves, with the terminating square bracket following the last synset. Head and satellite synsets follow the syntax of basic synsets; however, a "Similar to" pointer must be specified in a head synset for each of its satellite synsets. Most adjective clusters contain two antonymous parts. See wngloss(7WN) for a discussion of adjective clusters, and Special Adjective Syntax for more information on adjective cluster syntax.
Synsets for relational adjectives (pertainyms) and participial adjectives do not adhere to the cluster structure. They use the basic synset syntax.
Comments can be entered in a lexicographer file by enclosing the text of the comment in parentheses. Note that comments cannot appear within a synset, as parentheses within a synset have an entirely different meaning (see Gloss Syntax). However, entire synsets (or adjective clusters) can be "commented out" by enclosing them in parentheses. This is often used by the lexicographers to verify the syntax of files under development or to leave a note to oneself while working on entries.
A simple word is of the form:
word[ ( marker ) ][lex_id] ,
word may be entered in any combination of upper and lower case unless it is in an adjective cluster. A collocation is entered by joining the individual words with an underscore character (_). Numbers (integer or real) may be entered, either by themselves or as part of a word string, by following the number with a double quote (").
See Special Adjective Syntax for a description of adjective clusters and markers.
word may be followed by an integer lex_id from 1 to 15. The lex_id is used to distinguish different senses of the same word within a lexicographer file. The lexicographer assigns lex_id values, usually in ascending order, although there is no requirement that the numbers be consecutive. The default is 0, and does not have to be specified. A lex_id must be used on pointers if the desired sense has a non-zero lex_id in its synset specification.
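The word and lex_id rules can be sketched in Python as below. This is a simplified illustration, not the parser used by grind(1WN): the name parse_word is hypothetical, syntactic markers are not handled, and words that themselves contain digits (entered with a double quote) are deliberately out of scope.

```python
import re

# A word without digits, an optional one- or two-digit lex_id, then a comma.
WORD_RE = re.compile(r"(?P<word>[^\s,\d]+)(?P<lex_id>\d{1,2})?,")


def parse_word(token):
    """Parse a token such as 'dog1,' into (word, lex_id).

    Hypothetical helper. lex_id defaults to 0 when it is not specified.
    """
    match = WORD_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a simple word: {token!r}")
    lex_id = int(match["lex_id"] or 0)
    if lex_id > 15:
        raise ValueError(f"lex_id out of range: {lex_id}")
    return match["word"], lex_id
```

Here parse_word("dog1,") returns ("dog", 1), and a collocation such as "hunting_dog," is returned with the default lex_id of 0.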
Word/pointer syntax is of the form:
[ word[ ( marker ) ][lex_id] , pointers ]
This syntax is used when one or more pointers correspond only to the specific word in the word/pointer set, rather than all the words in the synset, and represents a lexical relation. Note that a word/pointer set appears within a synset; therefore the square brackets used to enclose it are treated differently from those used to define an adjective cluster. Only one word can be specified in each word/pointer set, and any number of pointers may be included. A synset can have any number of word/pointer sets. Each is treated by grind(1WN) essentially as a word, so they all must appear before any synset pointers representing semantic relations.
For verbs, the word/pointer syntax is extended in the following manner to allow the user to specify generic sentence frames that, like pointers, correspond only to a specific word, rather than all the words in the synset. In this case, pointers are optional.
[ word , [pointers] frames ]
A pointer is of the form:
[lex_filename : ]word[lex_id] , pointer_symbol
or:
[lex_filename : ]word[lex_id] ^ word[lex_id] , pointer_symbol
For pointers, word indicates a word in another synset. When the second form of a pointer is used, the first word indicates a word in a head synset, and the second is a word in a satellite of that cluster. word may be followed by a lex_id that is used to match the pointer to the correct target synset. The synset containing word may reside in another lexicographer file. In this case, word is preceded by lex_filename as shown.
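Both pointer forms can be recognized with one regular expression. The Python below is a sketch under stated assumptions rather than the grammar grind(1WN) actually uses: the names Pointer and parse_pointer are hypothetical, the pointer is assumed to be written without internal spaces, and any lex_id is left attached to the word.

```python
import re
from typing import NamedTuple, Optional


class Pointer(NamedTuple):
    lex_filename: Optional[str]  # e.g. "adj.all", or None if omitted
    word: str                    # target word (head word in the second form)
    satellite: Optional[str]     # satellite word, second form only
    symbol: str                  # pointer_symbol such as "@" or "\\"


POINTER_RE = re.compile(
    r"(?:(?P<file>[a-z]+\.[A-Za-z_]+):)?"   # optional lex_filename:
    r"(?P<word>[^\s,^:]+)"                  # word[lex_id]
    r"(?:\^(?P<sat>[^\s,^:]+))?"            # optional ^word[lex_id]
    r",(?P<symbol>\S+)"                     # ,pointer_symbol
)


def parse_pointer(text):
    """Parse text such as 'adj.all:essential^basic,\\' into a Pointer."""
    match = POINTER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a pointer: {text!r}")
    return Pointer(match["file"], match["word"], match["sat"], match["symbol"])
```

Keeping the lex_id attached to the word keeps the sketch short; a fuller parser would split it off so the pointer can be matched to the correct target synset.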
See Pointers for a list of pointer_symbols and their meanings.
A frame number list is entered as follows:
frames: f_num [, f_num...]
Where f_num specifies a generic frame number. See Verb Frames for a list of generic sentences and their corresponding frame numbers.
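A frame number list in this form can be read with a few lines of Python. The function name parse_frames is hypothetical, and the range check of 1 to 35 is an assumption based on the generic frames numbered in this document.

```python
def parse_frames(text):
    """Parse 'frames: 8, 10' into a list of frame numbers.

    Hypothetical helper; frame numbers are assumed to lie between 1 and 35.
    """
    prefix, sep, rest = text.partition(":")
    if not sep or prefix.strip() != "frames":
        raise ValueError(f"not a frame list: {text!r}")
    # int() tolerates surrounding whitespace and rejects empty entries.
    numbers = [int(part) for part in rest.split(",")]
    for number in numbers:
        if not 1 <= number <= 35:
            raise ValueError(f"frame number out of range: {number}")
    return numbers
```

Applied to the text "frames: 8, 10", the function returns the list [8, 10].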
The first word of a head synset must be entered in upper case, and can be thought of as the head word of the head synset. The word part of a pointer from one head synset to another head synset within the same cluster (usually an antonym) must also be entered in upper case. Usually antonymous adjectives are entered using the word/pointer syntax described in Word Syntax to indicate a lexical relation. There is no restriction on the number of parts that a cluster may have, and some clusters have three parts, representing antonymous triplets, such as solid, liquid, and gas.
A cross-cluster pointer may be specified, allowing a head or satellite synset to point to a head synset in a different cluster. A cross-cluster pointer is indicated by entering the word part of the pointer in upper case.
An adjective may be annotated with a syntactic marker indicating a limitation on the syntactic position the adjective may have in relation to the noun that it modifies. If so marked, the marker appears between the word and its following comma. If a lex_id is specified, the marker immediately follows it. The syntactic markers are:
(p) predicate position
(a) prenominal (attributive) position
(ip) immediately postnominal position
Sample noun synsets:
{ canine, [ dog1, cat,! ] pooch, canid,@ }
{ collie, dog1,@ (large multi-colored dog with pointy nose) }
{ hound, hunting_dog, pack,#m dog1,@ }
{ dog, }
Sample verb synsets:
{ [ confuse, clarify,! frames: 1 ] blur, obscure, frames: 8, 10 }
{ [ clarify, confuse,! ] make_clear, interpret,@ frames: 8 }
{ interpret, construe, understand,@ frames: 8 }
Sample adjective clusters:
[
{ [ HOT, COLD,! ] lukewarm(a), TEPID,^ (hot to the touch) }
{ warm, }
-
{ [ COLD, HOT,! ] frigid, (cold to the touch) }
{ freezing, }
]
Sample adverb synsets:
{ [ basically, adj.all:essential^basic,\ ] [ essentially, adj.all:basic^fundamental,\ ] ( by one's very nature )}
{ pointedly, adj.all:pungent^pointed,\ }
{ [ badly, adj.all:bad,\ well,! ] ill, ("He was badly prepared") }
Fellbaum, C. (1998), ed. "WordNet: An Electronic Lexical Database" . MIT Press, Cambridge, MA.