WordNet - A Lexical Database for English

This is a Racket FFI interface to Princeton University’s WordNet® library. The following excerpt from the WordNet website summarizes what WordNet is.

WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.

WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions. First, WordNet interlinks not just word forms—strings of letters—but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated. Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus does not follow any explicit pattern other than meaning similarity.

1 Requirements

This package has been developed and tested on Mac OS X (10.10). The instructions here should largely be applicable for Linux and other forms of Unix. If you run into any issues, please contact the author.

2 Installing the WordNet library

The WordNet library is available from the WordNet website. The default WordNet distribution builds a static library, which the Racket FFI cannot load. Follow the instructions in this section to build a shared library instead.

Assume that ~/Downloads/WordNet-3.0 is the directory into which the tarball has been untar’d.

> cd ~/Downloads/WordNet-3.0

Edit the configure.ac file and add the following lines to it, after the line that says AC_PROG_INSTALL:

AC_ENABLE_SHARED

AC_DISABLE_STATIC

AC_PROG_LIBTOOL(libtool)

Edit the lib/Makefile.am file and replace its contents with the following:

lib_LTLIBRARIES = libWN.la
libWN_la_SOURCES = binsrch.c morph.c search.c wnglobal.c wnhelp.c wnrtl.c \
                   wnutil.c
libWN_la_CPPFLAGS = $(INCLUDES) -fPIC
libWN_la_LDFLAGS = -shared -fPIC
INCLUDES = -I$(top_srcdir) -I$(top_srcdir)/include
SUBDIRS = wnres

Now reconfigure and build the distribution. Replace the prefix to suit your installation:

> autoreconf -i

> ./configure --prefix="/usr/local"

> make

> sudo make install

The library and its associated data are now installed under /usr/local.

3 About the library

 (require wn/wn) package: base

The WordNet library consists of a few sections: Search, Morphology and Utilities. This Racket interface to the library leaves out some of the utilities because they are largely redundant. The documentation of the original C library functions is available here.

The library must be initialized before any of the functions can be used. The following function initializes the library.

procedure

(wn-init)  integer?

Returns 0 upon successful initialization, and a non-zero number otherwise. This function must be called before any of the other functions are called.
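A minimal sketch of the initialization step, assuming the shared library built in section 2 is somewhere the Racket FFI can find it. The helper name ensure-wordnet! is hypothetical and not part of wn/wn; only wn-init comes from this library.

```racket
#lang racket/base
(require wn/wn)

;; ensure-wordnet! is a hypothetical helper, not part of wn/wn.
;; It calls wn-init and raises an error on a non-zero return value.
(define (ensure-wordnet!)
  (define status (wn-init))
  (unless (zero? status)
    (error 'ensure-wordnet! "wn-init failed with status ~a" status)))

(ensure-wordnet!)
```

Failing early here should make a missing or misplaced installation easier to spot than an error from a later search call.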

4 High Level Interface

This section describes a higher-level interface to the WordNet functions. It largely consists of two things: searching and lemmatization. The search functions return lists of words that match the search criteria. All of them share the following format.

procedure

(<search-fn> word    
  part-of-speech    
  [#:recursive recursive?])  (listof string?)
  word : string?
  part-of-speech : parts-of-speech?
  recursive? : boolean? = #t
Applies the search function named <search-fn> and returns the results as a list of words. word can be any word or a collocation (words joined by ‘_’). part-of-speech is one of 'noun, 'verb, 'adjective, 'adverb, or 'satellite; refer to the type definition below for details. recursive? controls whether the search follows relations recursively, which is the default. For example, when searching for the hypernyms of a word, a recursive? of #t returns not just the immediate hypernyms but also follows those hypernyms in turn, returning the whole chain. Passing #f returns only the immediate hypernyms. The following search functions are provided:

antonyms

hypernyms

hyponyms

entails

similars

member-meronyms

substance-meronyms

part-meronyms

member-holonyms

substance-holonyms

part-holonyms

meronyms

holonyms

causes

participles-of-verb

attributes

derivations

classifications

classes

synonyms

noun-coordinates

hierarchical-meronyms

hierarchical-holonyms

classification-categories

classification-usages

classification-regionals

class-categories

class-usages

class-regionals

instances-of

instances
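As an illustration of the common calling format, the following sketch calls two of the search functions listed above. The words "dog" and "hot_dog" are arbitrary examples, and the variable names are made up for this sketch.

```racket
#lang racket/base
(require wn/wn)

(unless (zero? (wn-init))
  (error "WordNet could not be initialized"))

;; Full chain of hypernyms (the default, #:recursive #t).
(define all-hypernyms (hypernyms "dog" 'noun))

;; Only the immediate hypernyms.
(define direct-hypernyms (hypernyms "dog" 'noun #:recursive #f))

;; A collocation is written with underscores between the words.
(define collocation-synonyms (synonyms "hot_dog" 'noun))

(for ([w (in-list direct-hypernyms)])
  (displayln w))
```

Since every function in the list takes the same arguments, swapping hypernyms for, say, hyponyms or antonyms should be the only change needed to run a different search.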

procedure

(lemma word part-of-speech)  (or/c string? #f)

  word : string?
  part-of-speech : parts-of-speech?
Lemmatizes the provided word in the given part of speech. part-of-speech is one of 'noun, 'verb, 'adjective, 'adverb, or 'satellite. Refer to the type definition below for details.
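A short sketch of calling lemma. Because the result may be #f (presumably when no base form is found), the hypothetical helper lemma-or-self falls back to the original word; that helper is not part of wn/wn.

```racket
#lang racket/base
(require wn/wn)

(unless (zero? (wn-init))
  (error "WordNet could not be initialized"))

;; lemma-or-self is a hypothetical helper, not part of wn/wn.
;; lemma returns a string or #f, so fall back to the word itself.
(define (lemma-or-self word pos)
  (or (lemma word pos) word))

(displayln (lemma-or-self "geese" 'noun))
(displayln (lemma-or-self "running" 'verb))
```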

procedure

(parts-of-speech? x)  boolean?

  x : any/c
Returns #t if x is one of: 'noun, 'verb, 'adjective, 'adverb, 'satellite. Most of these are obvious, but 'satellite stands for an adjective cluster that contains more than one concept. This part of speech is very specific to WordNet. Use it only if you understand what it is.

5 The C Library Interface

This section covers the lower-level C library interface. The high-level interface covers most of what is necessary, but should you need deeper access to the library, the following documentation should help.

5.1 Basic Type Definitions

procedure

(search-type? x)  boolean?

  x : any/c
Returns #t if x is one of the following values (which are mostly self-explanatory), and #f otherwise: 'antonym, 'recursive-antonym,
'hypernym, 'recursive-hypernym,
'hyponym, 'recursive-hyponym,
'entails, 'recursive-entails,
'similar, 'recursive-similar,
'member-meronym, 'recursive-member-meronym,
'substance-meronym, 'recursive-substance-meronym,
'part-meronym, 'recursive-part-meronym,
'member-holonym, 'recursive-member-holonym,
'substance-holonym, 'recursive-substance-holonym,
'part-holonym, 'recursive-part-holonym,
'meronym, 'recursive-meronym,
'holonym, 'recursive-holonym,
'cause, 'recursive-cause,
'particple-of-verb, 'recursive-particple-of-verb,
'see-also, 'recursive-see-also,
'pertains-to, 'recursive-pertains-to,
'attribute, 'recursive-attribute,
'verb-group, 'recursive-verb-group,
'derivation, 'recursive-derivation,
'classification, 'recursive-classification,
'class, 'recursive-class,
'synonyms, 'recursive-synonyms,
'polysemy, 'recursive-polysemy,
'frames, 'recursive-frames,
'noun-coordinates, 'recursive-noun-coordinates,
'relatives, 'recursive-relatives,
'hierarchical-meronym, 'recursive-hierarchical-meronym,
'hierarchical-holonym, 'recursive-hierarchical-holonym,
'keywords-by-substring, 'recursive-keywords-by-substring,
'overview, 'recursive-overview,
'classification-category, 'recursive-classification-category,
'classification-usage, 'recursive-classification-usage,
'classification-regional, 'recursive-classification-regional,
'class-category, 'recursive-class-category,
'class-usage, 'recursive-class-usage,
'class-regional, 'recursive-class-regional,
'instance-of, 'recursive-instance-of,
'instances, 'recursive-instances.

The names here are made more readable, but are drawn from the list of “search ptrs” in the documentation. They correspond to #define’d constants in the file wn.h in the WordNet source directory. The ‘recursive-’ versions of these constants are negated, according to the convention used by WordNet, which uses negative search types for recursive searches. For more information about these search types, it is best to refer to the code. The WordNet Documentation is sparse, and will mostly direct you to play with the command line tools.
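The negation convention can be sketched with a small, self-contained table. This is an illustration only, not how the binding is necessarily implemented: search-type->code and base-search-codes are hypothetical names, and the three numeric values are assumed to match ANTPTR, HYPERPTR and HYPOPTR in wn.h, so check them against your copy of the header.

```racket
#lang racket/base

;; Illustration only: a hypothetical table for three search types.
;; The values are assumed to match ANTPTR, HYPERPTR and HYPOPTR in wn.h.
(define base-search-codes
  (hash 'antonym 1 'hypernym 2 'hyponym 3))

;; Map a search-type symbol to the integer handed to the C library:
;; the plain symbol gives the positive code, and the recursive- variant
;; gives the same code negated.
(define (search-type->code sym)
  (define name (symbol->string sym))
  (define m (regexp-match #rx"^recursive-(.*)$" name))
  (define base (string->symbol (if m (cadr m) name)))
  (define code
    (hash-ref base-search-codes base
              (lambda () (error 'search-type->code "unknown type: ~a" sym))))
  (if m (- code) code))

(search-type->code 'hypernym)            ; => 2
(search-type->code 'recursive-hypernym)  ; => -2
```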

procedure

(limited-search-type? x)  boolean?

  x : any/c
Excludes the following symbols from search-type?: 'see-also, 'pertains-to, 'verb-group, 'polysemy, 'frames, 'relatives, 'keywords-by-substring, 'overview. The find-the-info-ds function only accepts limited-search-type? values.
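One way to use this predicate is to check a search type before handing it to find-the-info-ds. The wrapper below is a sketch; checked-find-ds is a hypothetical name and not part of wn/wn.

```racket
#lang racket/base
(require wn/wn)

(unless (zero? (wn-init))
  (error "WordNet could not be initialized"))

;; checked-find-ds is a hypothetical wrapper, not part of wn/wn.
;; It rejects search types that find-the-info-ds does not accept and
;; otherwise searches all senses (sense-id 0).
(define (checked-find-ds word pos type)
  (unless (limited-search-type? type)
    (raise-argument-error 'checked-find-ds "limited-search-type?" type))
  (find-the-info-ds word pos type 0))

(checked-find-ds "dog" 'noun 'hypernym)
```

Calling the wrapper with one of the excluded symbols, such as 'overview, raises a contract-style error instead of reaching the C library.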

procedure

(c-synset? x)  boolean?

  x : any/c
Values returned by the low level search function in the WordNet library. Results are pointers to a structure of this type. findTheInfo_ds in t

5.2 Search Functions

procedure

(find-the-info search-str    
  part-of-speech    
  search-type    
  sense-id)  (or/c string? #f)
  search-str : string?
  part-of-speech : parts-of-speech?
  search-type : search-type?
  sense-id : non-negative-integer?
Finds the information about a word and returns it in the form of a string. Search results are automatically formatted, and the formatted string is returned. search-str is either the word, or a collocation (words conjoined by "_") to search for. Available search-types can be queried by calling available-search-types. sense-id is a non-negative integer indicating which sense is sought. Using 0 returns results for all senses.
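A minimal sketch of a find-the-info call. The word "dog" is an arbitrary example, and treating #f as "nothing found" is an assumption based on the result contract above.

```racket
#lang racket/base
(require wn/wn)

(unless (zero? (wn-init))
  (error "WordNet could not be initialized"))

;; Ask for the hypernyms of every noun sense of "dog" (sense-id 0).
;; The result is a pre-formatted string, or #f.
(define report (find-the-info "dog" 'noun 'hypernym 0))

(if report
    (display report)
    (displayln "no information found"))
```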

procedure

(available-search-types string 
  part-of-speech) 
  (listof search-type?)
  string : string?
  part-of-speech : parts-of-speech?
Returns the types of searches that are available for a given string. It returns only the non-recursive versions of the search types.
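The two functions can be combined to print everything WordNet offers for a word. This is a sketch: "dog" is an arbitrary example, and the heading format is made up for the illustration.

```racket
#lang racket/base
(require wn/wn)

(unless (zero? (wn-init))
  (error "WordNet could not be initialized"))

;; Discover which searches apply to the noun "dog" ...
(define types (available-search-types "dog" 'noun))

;; ... and run each of them across all senses (sense-id 0).
(for ([t (in-list types)])
  (define text (find-the-info "dog" 'noun t 0))
  (when text
    (printf "== ~a ==\n~a\n" t text)))
```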

procedure

(find-the-info-ds search-str    
  part-of-speech    
  search-type    
  sense-id)  (or/c c-synset? #f)
  search-str : string?
  part-of-speech : parts-of-speech?
  search-type : limited-search-type?
  sense-id : non-negative-integer?
Finds the information about a word and returns it in the form of a list of synsets. search-str is either the word or a collocation (words conjoined by "_") to search for. Available search types can be queried by calling available-search-types, but note that they must satisfy limited-search-type?. sense-id is a non-negative integer indicating which sense is sought. Using 0 returns results for all senses.

syntax

(in-senses c-synset-ptr)

 
  c-synset-ptr : (or/c c-synset? #f)
A form intended to be used in for comprehensions to iterate over all the senses of a synset.

syntax

(in-results c-synset-ptr)

 
  c-synset-ptr : (or/c c-synset? #f)
A form intended to be used in for comprehensions to iterate over all the results in a sense of a synset.

syntax

(in-words c-synset-ptr)

 
  c-synset-ptr : (or/c c-synset? #f)
A form intended to be used in for comprehensions to iterate over all the words in a synset.

5.3 Example for iteration forms

The following example illustrates how to navigate a synset result and extract the returned values.
;; Collect the words of every recursive hypernym of word, without duplicates.
(define (hypernyms word part-of-speech)
  ;; A sense-id of 0 requests results for all senses.
  (let ([synset (find-the-info-ds word part-of-speech 'recursive-hypernym 0)])
    (remove-duplicates
     (for*/list ([sense (in-senses synset)]
                 [result (in-results sense)]
                 [w (in-words result)])
       w))))

5.4 The c-synset data-structure

The c-synset data structure is defined as follows. For each of the following fields, a field accessor called c-synset-<field-name> is defined and can be used to access data from returned C pointers. Refer to the FFI documentation for more information.

(define-cstruct _c-synset
    ([here-i-am               _long]                   ; current file position
     [synset-type             _adjective-markers]      ; type of ADJ synset
     [file-num                _int]                    ; file number that synset comes from
     [part-of-speech          _string]                 ; part of speech
     [word-count              _int]                    ; number of words in synset
     [c-words                 _string-pointer]         ; words in synset (pointer to string)
     [lex-id                  _int-pointer]            ; unique id in lexicographer file (pointer to int)
     [wn-sense                _int-pointer]            ; sense number in wordnet (pointer to int)
     [which-word              _int]                    ; which word in synset we're looking for
     [pointer-count           _int]                    ; number of pointers
     [pointer-type            _int-pointer]            ; pointer types (pointer to int)
     [pointer-offsets         _long-pointer]           ; pointer offsets (pointer to long)
     [pointer-part-of-speech  _int-pointer]            ; pointer part of speech (pointer to int)
     [pointer-to              _int-pointer]            ; pointer 'to' fields (pointer to int)
     [pointer-from            _int-pointer]            ; pointer 'from' fields (pointer to int)
     [verb-frame-count        _int]                    ; number of verb frames
     [frame-ids               _int-pointer]            ; frame numbers (pointer to int)
     [frame-to                _int-pointer]            ; frame 'to' fields (pointer to int)
     [definition              _string]                 ; synset gloss (definition)
     [key                     _uint]                   ; unique synset key
     [next-synset             _c-synset-pointer/null]  ; ptr to next synset containing searchword (pointer to synset)
     [next-form               _c-synset-pointer/null]  ; ptr to list of synsets for alternate spelling of wordform (pointer to synset)
     [search-type             _search-type]            ; type of search performed
     [pointer-list            _c-synset-pointer/null]  ; ptr to synset list result of search (pointer to synset)
     [head-word               _string]                 ; if pos is "s", this is cluster head word
     [head-sense              _short]))                ; sense number of headword
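The accessors can be used directly on a search result. The sketch below assumes that the c-synset-<field-name> accessors are exported by wn/wn and that each value produced by in-senses is itself a c-synset; "dog" is an arbitrary example word.

```racket
#lang racket/base
(require wn/wn)

(unless (zero? (wn-init))
  (error "WordNet could not be initialized"))

;; Search all noun senses of "dog"; the result is a c-synset or #f.
(define synset (find-the-info-ds "dog" 'noun 'hypernym 0))

(when synset
  ;; Fields of the first synset in the result.
  (printf "words in synset: ~a\n" (c-synset-word-count synset))
  (printf "gloss: ~a\n" (c-synset-definition synset))
  ;; The gloss of every sense, one per line.
  (for ([sense (in-senses synset)])
    (displayln (c-synset-definition sense))))
```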