October 23, 2008

The National

It's hard to overstate just how good The National's albums Alligator and Boxer are. I don't think I've been so impressed since a friend of a friend recommended The Walkmen's debut album Everyone Who Pretended To Like Me Is Gone. As he said, "They blew me top off."

Why music like this never seems to penetrate the airwaves is beyond me. Mainstream radio sucks. Thank goodness for the long tail.

August 29, 2008

Radiohead at the Hollywood Bowl, thanks to Original Penguin

A few weeks ago, while on my way down to San Diego for vacation, I visited the Original Penguin Store in LA with my wife, Rupe. I entered a drawing sponsored by Filter Magazine and Original Penguin for 2 tickets to the Radiohead concert at the Hollywood Bowl. Even though I desperately hoped I'd win, I wasn't holding my breath. (After all, winning contests is something other people do. Not me.) But the following week I got a voicemail from the store manager informing me that I'd won. I'm a winner! (Or, to quote Radiohead, "I'm on a roll, I'm on a roll, this time, I feel my luck could change.")

So Rupe and I drove back down to SoCal again. On the Sunday of the concert, I dropped by the store to pick up my tickets and spend my gift certificate. The folks at the store (Kenny, on the left in the first photo, and Heather, in the middle) were incredibly nice and the concert was fantastic. (Good weather, good venue, great show.) It was my third time seeing Radiohead live and they didn't disappoint.

Thanks to Filter and Original Penguin for a near-perfect day.



July 13, 2008

FolderFest

Last night I was browsing through the SFBC's list of bike-related events, Chain of Events, and realized that FolderFest had happened earlier in the day, and that I had missed it! But at least I could enjoy the event second-hand through the web thanks to some photos and a video on YouTube. Next time I'll be there.

July 2, 2008

Powerset Acquired By Microsoft

The rumors had it right. Microsoft has acquired Powerset. The official announcements can be found on Live Search's blog and Powerset's. This is a huge endorsement of the technology that we've been building, and I'm excited about the future.

June 3, 2008

Pimp My WordNet

WordNet may be the most widely used resource in computational linguistics. It's a great tool but far from perfect. Today I was reminded of this fact when trying to make use of its subcategorization frames. I started out trying to use the NLTK's interface to WordNet.

The NLTK's documentation of its WordNet interface provides a simple example of how to extract the subcat frames for the verb fly with some Python code:

>>> import nltk
>>> w = nltk.wordnet.V['fly']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Something fly', 'Somebody fly', 'Something is flying PP', 'Somebody fly PP']

So far, so good. But when I tried using this technique to extract the subcat frames for a longish list of verbs, things went south. I'll illustrate with a colorful verb popularized by MTV:

>>> import nltk
>>> w = nltk.wordnet.V['pimp']
>>> s = w.synsets()[0]
>>> s.verbFrameStrings
['Somebody pander']

I thought I would get the subcat frames for the verb pimp but I got the subcat frames for a different verb, pander. What's going on here?

The problem is that WordNet provides a subcat frame for a synset, rather than a particular verb:

>>> nltk.wordnet.V['pimp'].synsets()   
[{verb: pander, pimp, procure}]
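The `'Somebody pander'` output is consistent with the interface storing frame templates once per synset and filling them with the synset's first lemma. Here is a minimal sketch of that mechanism, assuming WordNet-style templates such as `Somebody ----s`. The `Synset` class and `verb_frame_strings` function are hypothetical stand-ins, not NLTK's actual API:

```python
# Hypothetical model of synset-level verb frames; this is NOT NLTK's API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Synset:
    lemmas: List[str]  # lemma names, in synset order
    # Frame templates stored once for the whole synset (WordNet-style notation).
    frame_templates: List[str] = field(default_factory=list)


def verb_frame_strings(synset: Synset, lemma: Optional[str] = None) -> List[str]:
    """Fill each synset-level template with a lemma name.

    Only the '----s' slot is handled, and the bare lemma is substituted
    without inflection. If no lemma is given, the synset's first lemma is
    used, which is how a query about 'pimp' can come back as 'pander'.
    """
    name = lemma if lemma is not None else synset.lemmas[0]
    return [t.replace("----s", name) for t in synset.frame_templates]


pander_pimp_procure = Synset(
    lemmas=["pander", "pimp", "procure"],
    frame_templates=["Somebody ----s"],
)

print(verb_frame_strings(pander_pimp_procure))          # ['Somebody pander']
print(verb_frame_strings(pander_pimp_procure, "pimp"))  # ['Somebody pimp']
```

Passing the lemma explicitly fixes the wording, but not the underlying issue: every lemma in the synset still gets exactly the same frames, because the frames belong to the synset.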

Although only one subcategorization frame (intransitive) is provided for pander, pimp, and procure, there are others, which aren't shared by all three verbs:

  • The PR firm is pimping (*pandering/*procuring) out their client's product.
  • John McCain is pandering (*pimping/*procuring) to religious conservatives.
  • I want MTV to pimp (*pander/*procure) my ride.

These examples show that the verbs in a synset don't necessarily share the same subcat frames. As a result, a system that provides subcat frames for synsets, rather than particular verbs, is going to be of limited utility.
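One way around this limitation is to record frames per verb as well as per synset. The sketch below is a hypothetical lexicon: the dictionaries and `frames_for` are illustrative names, and the verb-specific frames are paraphrased from the three example sentences above rather than taken from WordNet:

```python
# Hypothetical lexicon: frames shared by the synset plus frames that hold
# only for particular verbs. Not WordNet data; paraphrased from the examples.
from typing import Dict, List, Tuple

SYNSET: Tuple[str, ...] = ("pander", "pimp", "procure")

# The one (intransitive) frame all three verbs share.
SHARED_FRAMES: List[str] = ["Somebody ----s"]

# Frames that hold for one verb but not the others in the synset.
VERB_SPECIFIC_FRAMES: Dict[str, List[str]] = {
    "pimp": [
        "Somebody ----s out something",  # The PR firm is pimping out ...
        "Somebody ----s something",      # I want MTV to pimp my ride.
    ],
    "pander": [
        "Somebody ----s to somebody",    # John McCain is pandering to ...
    ],
    "procure": [],  # no verb-specific frame illustrated above
}


def frames_for(verb: str) -> List[str]:
    """Return the synset's shared frames plus any frames specific to `verb`."""
    if verb not in SYNSET:
        raise KeyError(f"{verb!r} is not a member of this synset")
    return SHARED_FRAMES + VERB_SPECIFIC_FRAMES[verb]


for verb in SYNSET:
    print(verb, frames_for(verb))
```

With frames keyed this way, asking about pimp no longer licenses "pandering out a product" or "pimping to conservatives".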

May 18, 2008

Quibbling over Semantics

It's a sign of the generally low opinion that the public holds of semantics that whenever a discussion of meaning comes up, it can be dismissed as "just semantics". But semantics is important stuff, and I think the world is wising up to the fact that we need machines that can reckon in terms of it. I've been reading the press surrounding the launch of Powerset over the last week and people seem to be struggling to come to terms with this technology, literally. What I mean is that people can't seem to agree on what term to use for it: an understanding engine, a contextual search engine, a natural language processing (NLP) search engine, etc. And let's not forget about the term semantic search. According to the answers provided to an on-line quiz, the Industry Standard considers Powerset to be "the first semantic search engine"!

Powerset has launched

It's finally happened. After many long hours and much facial hair madness (featured in the LA Times), a beta version of the Powerset search engine is now live at powerset.com. I've been working at Powerset for over a year now, and it's great to see this come to fruition. We had a launch party on Thursday. And I think a good time was had by all. (Okay, maybe not all. There was one poor fellow doubled over a bench outside, vomiting. But certainly most.) Now it's back to the trenches (or, as the Dutch say, terug aan de slag).

April 11, 2008

Autism and vaccinations

The idea that autism is caused by vaccinations has been floating around in the media recently. The first episode of the television series Eli Stone revolved around the issue (NY Times) and recently I saw an episode of Larry King Live in which Jenny McCarthy very aggressively asserted a strong link between vaccinations and various health problems, including autism.

Let me say up front that I support additional research on autism, and I don't reject out of hand a possible link between the two. However, I got the impression from McCarthy's behavior on Larry King that the case is already closed in her mind. That's troubling, because the issue has wide-reaching public policy implications. And the science is by no means clear-cut, as shown by a recent article in The Economist.

The article provides some evidence that the increase in autism diagnosis rates is not just because autism is on the rise. It's also because there is increased awareness of the disorder and therefore fewer misdiagnoses of autism as something else (e.g., a specific language impairment). In other words, it isn't necessarily the case that autism is skyrocketing. Maybe doctors are just getting better at recognizing it. (It's an open question.)
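The arithmetic behind this point fits in a few lines. The numbers below are made up purely for illustration (they are not figures from The Economist or from any study), and `diagnosed_cases` is a hypothetical helper: the true number of cases is held fixed, and only the share of cases that doctors recognize as autism changes.

```python
# Made-up numbers for illustration only; not real autism statistics.

def diagnosed_cases(true_cases: int, recognized_percent: int) -> int:
    """Cases diagnosed as autism when only some true cases are recognized.

    The rest are assumed to be missed or misdiagnosed as something else
    (e.g., a specific language impairment).
    """
    return true_cases * recognized_percent // 100


TRUE_CASES = 500  # hypothetical, and deliberately the same in both periods

earlier = diagnosed_cases(TRUE_CASES, recognized_percent=40)
later = diagnosed_cases(TRUE_CASES, recognized_percent=80)

print(earlier, later)  # 200 400
```

The diagnosis count doubles even though the number of children with autism hasn't changed at all. Real data would have to separate these two effects, which is why it remains an open question.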

But the bigger problem is this: even if it can be established definitively that vaccinations cause autism in some cases, and there is no way of changing the vaccinations to reduce this risk (say, by eliminating some harmful ingredient), there is still a cost-benefit trade-off between the potential harm caused by vaccinating children and the potential harm caused by not vaccinating them. Jenny McCarthy herself acknowledged this (albeit in a somewhat simple-minded fashion) when she said, "An increase in the measles, I'll take that way over autism any day." But it's not just measles. It's hepatitis, tetanus, diphtheria, pertussis, and polio (among others). Is the potential threat of autism so great that we're willing to put up with more children contracting these diseases instead? I guess that's the $64,000 question.

April 7, 2008

The Importance of Computational Linguistics

This month's issue of Computational Linguistics has a great article by Mark Steedman entitled "On Becoming a Discipline". In it, he talks about the success and prestige of physics compared to computational linguistics and provides an interesting analogy between the following propositions (the first being a quote from a physicist):

  1. "Everything is made of particles, so physics is very important."
  2. Human knowledge is expressed in language. So computational linguistics is very important.

Since I work as a computational linguist, I'm very sympathetic to the analogy. But I have to admit that it contains a few leaps of logic. Let's spell out the logic a bit more explicitly:

  3. Everything is made of particles. Physics is the study of particles, so physics is very important.
  4. Human knowledge is expressed in language. Computational linguistics is the study of language, so computational linguistics is very important.

When stated in this way, it's clear that there is a defect in (4). Computational linguistics is the study of language, but only to the extent that linguistics is the study of language and computational linguistics is a subfield of linguistics. Therefore, the proposition is more correctly stated as follows:

  5. Human knowledge is expressed in language. Linguistics is the study of language, so linguistics is very important.

In order to justify the proposition that computational linguistics per se is very important, we would need to identify computational linguistics with linguistics or justify the importance of computational linguistics over other subfields of linguistics. That, I think, would be subject to controversy.

Web Advertising Gone Awry

This morning I heard about the protestors who unfurled pro-Tibet flags on the Golden Gate Bridge. I found an article about the event in the San Jose Mercury News. Amazingly, at the bottom of the article, there was an ad for Chinese weight loss secrets! Oops. Fail.