Pimp My WordNet
WordNet may be the most widely used resource in computational linguistics. It's a great tool, but it is far from perfect. I was reminded of this today while trying to use its subcategorization (subcat) frames through NLTK's WordNet interface.
NLTK's documentation for its WordNet interface gives a simple example of how to extract the subcat frames for the verb fly in Python:
>>> import nltk
>>> w = nltk.wordnet.V['fly']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Something fly', 'Somebody fly', 'Something is flying PP', 'Somebody fly PP']
So far, so good. But when I tried using this technique to extract the subcat frames for a longish list of verbs, things went south. I'll illustrate with a colorful verb popularized by MTV:
>>> import nltk
>>> w = nltk.wordnet.V['pimp']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Somebody pander']
I expected the subcat frames for the verb pimp, but I got a frame for a different verb, pander. What's going on here?
The problem is that WordNet attaches subcat frames to a synset rather than to a particular verb, and the frame string is rendered with the first verb in the synset:
>>> nltk.wordnet.V['pimp'].synsets()
[{verb: pander, pimp, procure}]
Although only one subcategorization frame (intransitive) is provided for pander, pimp, and procure, there are others, which aren't shared by all three verbs:
- The PR firm is pimping (*pandering/*procuring) out their client's product.
- John McCain is pandering (*pimping/*procuring) to religious conservatives.
- I want MTV to pimp (*pander/*procure) my ride.
These examples show that the verbs in a synset don't necessarily share the same subcat frames. As a result, a system that provides subcat frames for synsets, rather than particular verbs, is going to be of limited utility.
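To make the alternative concrete, here is a minimal sketch of a lookup keyed on the verb rather than the synset. Everything in it is a hypothetical illustration: the two tables, the frame strings, and the function name frames_for_verb are my own assumptions, loosely based on the three example sentences above, and none of it is WordNet data or NLTK's API.

```python
# Toy illustration only: these tables are hypothetical, not WordNet data.

# Frames recorded once per synset and therefore shared by every member verb.
SYNSET_FRAMES = {
    ("pander", "pimp", "procure"): ["Somebody {verb}"],
}

# Frames that hold for one verb only (assumed from the example sentences).
VERB_FRAMES = {
    "pimp": ["Somebody {verb} something", "Somebody {verb} out something"],
    "pander": ["Somebody {verb} to somebody"],
}


def frames_for_verb(verb):
    """Return the synset-level frames for `verb`, then its verb-specific ones."""
    frames = []
    for members, shared in SYNSET_FRAMES.items():
        if verb in members:
            frames.extend(shared)
    frames.extend(VERB_FRAMES.get(verb, []))
    # Render each frame with the verb that was asked about,
    # not with whichever verb happens to come first in the synset.
    return [frame.format(verb=verb) for frame in frames]


for v in ("pimp", "pander", "procure"):
    print(v, frames_for_verb(v))
```

With a layout like this, asking about pimp returns frames spelled with pimp, and the transitive frame from "pimp my ride" is not wrongly attributed to pander or procure.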