Pimp My WordNet

WordNet may be the most widely used resource in computational linguistics. It's a great tool, but far from perfect. I was reminded of this today when I tried to use its subcategorization (subcat) frames through NLTK's interface to WordNet.

NLTK's documentation for its WordNet interface gives a simple Python example of how to extract the subcat frames for the verb fly:

>>> import nltk
>>> w = nltk.wordnet.V['fly']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Something fly', 'Somebody fly', 'Something is flying PP', 'Somebody fly PP']

So far, so good. But when I tried using this technique to extract the subcat frames for a longish list of verbs, things went south. I'll illustrate with a colorful verb popularized by MTV:

>>> import nltk
>>> w = nltk.wordnet.V['pimp']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Somebody pander']

I expected to get the subcat frames for the verb pimp, but instead I got a frame for a different verb, pander. What's going on here?

The problem is that WordNet assigns subcat frames to a synset (a set of words treated as synonyms) rather than to a particular verb, and pimp belongs to a synset that lists pander first:

>>> nltk.wordnet.V['pimp'].synsets()   
[{verb: pander, pimp, procure}]
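
To make the behavior concrete, here is a minimal toy model of synset-level frame storage. It is only a sketch under stated assumptions: `ToySynset`, `index`, and the `%s` placeholder are hypothetical names invented for illustration, not NLTK's actual classes or WordNet's actual template format. The only data it uses is the synset and frame string shown in the output above.

```python
from dataclasses import dataclass


@dataclass
class ToySynset:
    """Hypothetical stand-in for a synset; not NLTK's real class."""
    lemmas: list   # verbs in the synset, in the order listed
    frames: list   # frame templates; "%s" marks the verb slot (an assumption)

    def frame_strings(self, verb=None):
        # With no verb given, fill the slot with the synset's first lemma,
        # which reproduces the surprising 'Somebody pander' result.
        head = verb if verb is not None else self.lemmas[0]
        return [template % head for template in self.frames]


synset = ToySynset(lemmas=["pander", "pimp", "procure"],
                   frames=["Somebody %s"])

# Every verb in the synset maps to the same synset object.
index = {lemma: synset for lemma in synset.lemmas}

print(index["pimp"].frame_strings())        # ['Somebody pander']
print(index["pimp"].frame_strings("pimp"))  # ['Somebody pimp']
```

Passing the verb explicitly fixes the display, but not the underlying issue: the frame list itself still belongs to the whole synset.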

Only one subcategorization frame (intransitive) is provided for pander, pimp, and procure, but these verbs take other frames too, and those frames aren't shared by all three:

  • The PR firm is pimping (*pandering/*procuring) out their client's product.
  • John McCain is pandering (*pimping/*procuring) to religious conservatives.
  • I want MTV to pimp (*pander/*procure) my ride.

These examples show that the verbs in a synset don't necessarily share the same subcat frames. As a result, a system that provides subcat frames for synsets, rather than particular verbs, is going to be of limited utility.
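
One way to picture the alternative is a frame table keyed on individual verbs, with the synset's frame as a shared default. The sketch below is purely illustrative: `SHARED_FRAMES`, `VERB_FRAMES`, and `frames_for` are hypothetical names, and the verb-specific frame strings are rough paraphrases of the three example sentences above, not entries from WordNet.

```python
# Members of the synset discussed above.
SYNSET = ("pander", "pimp", "procure")

# The single frame listed for the whole synset; "%s" is an assumed
# placeholder for the verb, not WordNet's own template notation.
SHARED_FRAMES = ["Somebody %s"]

# Hypothetical verb-specific frames, paraphrasing the example sentences.
VERB_FRAMES = {
    "pimp": ["Somebody %s out something", "Somebody %s something"],
    "pander": ["Somebody %s to somebody"],
}


def frames_for(verb):
    """Return the shared synset frames plus any frames specific to `verb`."""
    if verb not in SYNSET:
        raise KeyError(verb)
    templates = SHARED_FRAMES + VERB_FRAMES.get(verb, [])
    return [template % verb for template in templates]


for verb in SYNSET:
    print(verb, frames_for(verb))
```

With a table like this, a lookup for pimp no longer returns frames for pander, and a verb with no extra entries, such as procure here, falls back to the shared frame alone.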

Comments (1)

I wonder where they got the pander/pimp sense in the first place. I can't think of any examples where that works, can you? WordNet certainly isn't perfect (and of course, they need more funding to keep expanding it).

About

This page contains a single entry from the blog posted on June 3, 2008 7:56 PM.