
June 2008 Archives

June 3, 2008

Pimp My WordNet

WordNet may be the most widely used resource in computational linguistics. It's a great tool, but it's far from perfect. Today I was reminded of this when I tried to use its subcategorization (subcat) frames, starting with NLTK's interface to WordNet.

NLTK's documentation for its WordNet interface includes a simple Python example that extracts the subcat frames for the verb fly:

>>> import nltk
>>> w = nltk.wordnet.V['fly']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Something fly', 'Somebody fly', 'Something is flying PP', 'Somebody fly PP']
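The strings in that output look like WordNet's generic frame templates (written with a "----s" slot for the verb) with the verb substituted in. Here is a minimal sketch of that idea. The template list and the `fill_frame` helper are hypothetical, modeled on the output above (including the way this version of the interface leaves the verb bare, giving "Somebody fly" rather than "Somebody flies"). This is not NLTK's actual code.

```python
# Hypothetical WordNet-style frame templates; "----" marks the verb slot.
# These four are assumed to match the frames printed for "fly" above.
FLY_TEMPLATES = [
    "Something ----s",
    "Somebody ----s",
    "Something is ----ing PP",
    "Somebody ----s PP",
]


def fill_frame(template: str, verb: str) -> str:
    """Substitute a verb into a frame template.

    "----ing" becomes verb + "ing" (naive morphology: fine for "fly",
    wrong for verbs like "procure"); "----s" becomes the bare verb,
    mimicking the output shown above.
    """
    filled = template.replace("----ing", verb + "ing")
    return filled.replace("----s", verb)


frames = [fill_frame(t, "fly") for t in FLY_TEMPLATES]
print(frames)
# ['Something fly', 'Somebody fly', 'Something is flying PP', 'Somebody fly PP']
```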

So far, so good. But when I tried using this technique to extract the subcat frames for a longish list of verbs, things went south. I'll illustrate with a colorful verb popularized by MTV:

>>> import nltk
>>> w = nltk.wordnet.V['pimp']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Somebody pander']

I expected to get the subcat frames for the verb pimp, but instead I got a frame for a different verb, pander. What's going on here?

The problem is that WordNet provides subcat frames for a synset (a set of synonymous words) rather than for a particular verb:

>>> nltk.wordnet.V['pimp'].synsets()   
[{verb: pander, pimp, procure}]
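A toy model makes the problem concrete. In this sketch, frames are stored once per synset, and a verb lookup returns its synset's frames filled in with the synset's first word. The `Synset` class, the `INDEX` dictionary, and `frames_for_verb` are hypothetical illustrations; the choice of the first word is an assumption based on the output above ("Somebody pander" for a synset listed as pander, pimp, procure), not a description of NLTK's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous verbs sharing a single list of frame templates."""
    lemmas: list[str]
    frame_templates: list[str] = field(default_factory=list)


def fill_frame(template: str, verb: str) -> str:
    # Bare-verb substitution, mirroring the output shown above.
    return template.replace("----ing", verb + "ing").replace("----s", verb)


# Hypothetical miniature lexicon: frames hang off the synset, not off each verb.
PANDER = Synset(lemmas=["pander", "pimp", "procure"],
                frame_templates=["Somebody ----s"])
INDEX = {lemma: [PANDER] for lemma in PANDER.lemmas}


def frames_for_verb(verb: str) -> list[str]:
    """Return frame strings for the first synset containing `verb`.

    The frames belong to the synset, so they are filled with the synset's
    first lemma: asking about "pimp" yields a frame for "pander".
    """
    synset = INDEX[verb][0]
    head = synset.lemmas[0]
    return [fill_frame(t, head) for t in synset.frame_templates]


print(frames_for_verb("pimp"))  # ['Somebody pander']
```

Nothing in this structure lets the lookup say anything about pimp that isn't also said about pander and procure.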

WordNet lists only one subcat frame (intransitive) for pander, pimp, and procure, but these verbs have other frames as well, and those aren't shared by all three:

  • The PR firm is pimping (*pandering/*procuring) out their client's product.
  • John McCain is pandering (*pimping/*procuring) to religious conservatives.
  • I want MTV to pimp (*pander/*procure) my ride.

These examples show that the verbs in a synset don't necessarily share the same subcat frames. As a result, a system that provides subcat frames for synsets, rather than particular verbs, is going to be of limited utility.
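The alternative is to record frames per verb. The sketch below is a hypothetical verb-level lexicon that encodes only the judgments from the three example sentences plus the intransitive frame WordNet lists for the synset; the frame strings are invented in WordNet's template style, and procure's entry is deliberately incomplete. Intersecting the entries shows how little the three verbs actually share.

```python
# Hypothetical verb-level frame lexicon built from the example sentences above.
# Only frames attested in those examples are included; real entries would be larger.
VERB_FRAMES: dict[str, set[str]] = {
    "pimp": {
        "Somebody ----s",                # the frame WordNet lists for the synset
        "Somebody ----s out something",  # The PR firm is pimping out ...
        "Somebody ----s something",      # ... pimp my ride
    },
    "pander": {
        "Somebody ----s",
        "Somebody ----s to somebody",    # McCain is pandering to ...
    },
    "procure": {
        "Somebody ----s",
    },
}


def shared_frames(verbs: list[str]) -> set[str]:
    """Frames common to every verb listed: all a synset-level entry can safely claim."""
    return set.intersection(*(VERB_FRAMES[v] for v in verbs))


def verb_specific_frames(verb: str, synonyms: list[str]) -> set[str]:
    """Frames of `verb` that are not shared by all of its synonyms."""
    return VERB_FRAMES[verb] - shared_frames(synonyms)


synset = ["pander", "pimp", "procure"]
print(shared_frames(synset))  # {'Somebody ----s'}
print(sorted(verb_specific_frames("pimp", synset)))
# ['Somebody ----s out something', 'Somebody ----s something']
```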

About June 2008

This page contains all entries posted to Nerd Industries: Stuart Robinson's blog in June 2008. They are listed from oldest to newest.
