Evri.com: Another Step Towards a Semantic Web

Another step towards a real semantic web has been taken recently with the incorporation of Evri.com's technology into articles from The Washington Post. (One small step for the semantic web, one giant leap for Evri.)

I originally learned about Evri.com on Twitter. A colleague at Powerset, Will Fitzgerald, tweeted:

Evri (http://www.evri.com) looks mighty. (A company formed around what I do for my day job)

Yesterday another tweet alerted me to the partnership with The Washington Post:

Evri is now showing up all over the washingtonpost.com on all articles published today. Like this one: http://is.gd/jshy

If you follow the link in the tweet, you'll find a Washington Post article that has a handy little widget providing information about some of the entities discussed in it.

You can poke around on the widget and start exploring. Want to know more about the House Appropriations Committee? It's only a click away.

Maybe you want to know about The White House. Again, just another click.

(Not all of the relationships are of the same semantic type. The last time I checked, Stevie Wonder wasn't part of Obama's cabinet. The relationships seem to be loose clusters rather than a narrowly defined type such as cabinet membership.)

But even the best collection of named entities isn't going to provide very valuable information if there isn't a way of aligning its contents with the relevant entities in the free text of newspaper articles. A good illustration of this came a little while ago when Marshall Kirkpatrick (from ReadWriteWeb) commented on Google's exposure of semantic data, pointing out that it sometimes got things badly wrong, like stating that Jesus was born in 1963. The problem wasn't that Jesus wasn't born in 1963. The problem was that it wasn't the right Jesus. The query was about Jesus Christ, but Google returned data for a different Jesus. (It reminds me of a joke I once heard about prison inmates finding religion. The punch line went something like: "That's not Jesus, our Lord and Savior. That's Jesús [Spanish pronunciation], your cellmate.")

The point is that you need some reasonably sophisticated technology in order to identify mentions of entities in a document and then figure out which entries they correspond to in your database of known entities. The Jesus example shows that relying on a first name alone isn't going to do the trick. Figuring out what will is going to be one of the major tasks that researchers in natural language processing will tackle in the coming years. It's a good time to be a computational linguist.
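
To make the disambiguation problem a bit more concrete, here is a minimal sketch in Python of one naive approach: pick the database entry whose keywords overlap most with the words surrounding the mention. Everything in it is an assumption made for illustration. The knowledge base entries, the keyword lists, and the `link_entity` function are invented, and nothing here reflects how Evri or Google actually resolve entities. Real systems would need far richer signals than word overlap.

```python
import re

# Toy knowledge base. Every entry and keyword here is made up for illustration.
KNOWLEDGE_BASE = {
    "jesus_christ": {
        "names": {"jesus", "jesus christ"},
        "keywords": {"christianity", "gospel", "nazareth", "church", "disciples"},
    },
    "jesus_born_1963": {  # hypothetical stand-in for "a different Jesus"
        "names": {"jesus"},
        "keywords": {"album", "guitar", "tour", "band", "singer"},
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_entity(mention, context, kb=KNOWLEDGE_BASE):
    """Return the id of the candidate whose keywords best overlap the context.

    Returns None if no entry lists the mention as one of its names.
    Ties go to whichever candidate appears first in the knowledge base.
    """
    mention = mention.lower()
    candidates = [eid for eid, entry in kb.items() if mention in entry["names"]]
    if not candidates:
        return None
    words = tokenize(context)
    return max(candidates, key=lambda eid: len(kb[eid]["keywords"] & words))


religious = "The gospel describes Jesus and his disciples in Nazareth."
musical = "Jesus released a new album and the band goes on tour."

print(link_entity("Jesus", religious))
print(link_entity("Jesus", musical))
```

The name alone matches both entries, so it is the surrounding words that break the tie. That is the gap the 1963 example exposes: without some use of context, the lookup has no way to tell the two apart.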

Comments (1)

Deep:

Hi Stuart,

Thanks for exploring our technology and writing about us. I thought I would explain the Stevie Wonder situation. Unfortunately, it's not always possible to get the perfect title/snippet combination revealing the relationship between two entities; there are multiple constraints including the available display size of the snippet. However, in this case, if you click on the top article result titled "Names & Faces", you will be taken to this URL: http://tinyurl.com/dhtg3u where the first sentence reveals the relationship between Stevie Wonder and the White House: "President Obama and first lady Michelle Obama plan to host a concert Wednesday in the White House's East Room in honor of Stevie Wonder, reports The Post's DeNeen Brown." While we could constrain these types of offbeat relationships, we find that in many cases they are quite entertaining and readers enjoy encountering them, especially when they are timely and in the news zeitgeist. Hope that helps.

Kind regards,
Deep

About

This page contains a single entry from the blog posted on February 14, 2009 7:15 PM.