A short biography of a researcher working at Google Inc.

by **Napoléon** Thu 19 May - 14:56


My research interests are in the area of information retrieval
(IR), its application to web search, web graph analysis, and user interfaces for search.
At Google I have worked on using IR techniques to improve web search.
Before joining Google in 2000, I did research in the following sub-areas
of Information Retrieval:

Speech Retrieval: Increasing amounts of spoken
communication are stored in digital form for archival purposes (for
instance, broadcast material). With advances in automatic speech
recognition (ASR) technology, it is now possible to automatically
transcribe speech with reasonable accuracy. Once transcribed, IR
methods can be used to search speech collections. Think of this as a
search engine for speech. The interesting problem, however, is to
search speech despite the large number of automatic speech recognition
errors. More recently I have done some work in this area. At
AT&T Labs, we developed SCAN, a system that combines speech
recognition, information retrieval and user interface techniques
to provide a multimodal interface to speech archives.

Document Ranking: Also called text/document
searching/retrieval (that makes four phrases, by the way), this is the
best-known part of our field. If you are reading this page, chances
are that you have already used a "search engine". Document
ranking is what search engines do: given a user query, rank a
large collection of documents (web pages, news articles, your email,
someone else's email that you happen to have hacked, ...) so that what
you are looking for is ranked ahead of other, less useful (or useless)
documents.
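
To make the ranking task concrete, here is a minimal Python sketch of the classic vector-space approach associated with Gerard Salton: the query and each document become TF-IDF weighted term vectors, and documents are ordered by cosine similarity to the query. The tokenizer, the log-TF and smoothed-IDF weighting, and the function names (`tokenize`, `weight`, `cosine`, `rank`) are illustrative assumptions; real search engines combine many more signals.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; no stemming or stopword removal here.
    return re.findall(r"[a-z0-9]+", text.lower())


def weight(tf: int, idf: float) -> float:
    # Log-scaled term frequency times inverse document frequency.
    return (1 + math.log(tf)) * idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs, best match first."""
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(t for counts in doc_counts for t in counts)
    # Smoothed idf, so a term that appears in every document keeps a small positive weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    doc_vecs = [{t: weight(tf, idf[t]) for t, tf in c.items()} for c in doc_counts]
    # Query terms that never occur in the collection cannot match anything, so drop them.
    q_vec = {t: weight(tf, idf[t]) for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(doc_vecs)]
    # Sort by descending score; ties keep collection order.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


docs = [
    "speech recognition errors in broadcast news",
    "ranking web pages for a search engine query",
    "filtering email with example documents",
]
for index, score in rank("search engine ranking", docs):
    print(f"{score:.3f}  {docs[index]}")
```

With this toy collection, only the second document shares terms with the query, so it is ranked first and the other two tie at zero.
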

Question Answering: People have questions and they need answers,
not documents. Automatic question answering will definitely be a
significant advance over the state of the art in information retrieval
technology. Systems that can do reliable question answering without
domain restrictions have not been developed yet.

I organized the first few editions of the QA Track
under the Text REtrieval Conference
(TREC) umbrella to advance this sub-field of language
processing.

Document Routing/Filtering: This is the "query by example"
version of document ranking. Once you point the system to a few "good
documents", the system tracks all NEW documents and points you to
only the ones you should be looking at. Typically the system
tries to find new documents that are similar to the documents that you
said were good.
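
A filtering system can be sketched in the same spirit. The version below, written under stated assumptions, builds a profile as the centroid of the "good" documents' term vectors (a Rocchio-style profile using positive examples only) and passes along each new document whose cosine similarity to that profile reaches a threshold. The names `unit_vector`, `build_profile` and `route`, and the 0.3 threshold, are hypothetical choices for illustration, not values from any particular system.

```python
import math
import re
from collections import Counter, defaultdict
from collections.abc import Iterable, Iterator


def unit_vector(text: str) -> dict[str, float]:
    # Raw term counts scaled to unit length.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def build_profile(good_docs: list[str]) -> dict[str, float]:
    # Centroid of the "good" documents' vectors (Rocchio-style, positive examples only).
    profile: dict[str, float] = defaultdict(float)
    for doc in good_docs:
        for t, w in unit_vector(doc).items():
            profile[t] += w / len(good_docs)
    return dict(profile)


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(stream: Iterable[str], profile: dict[str, float],
          threshold: float = 0.3) -> Iterator[tuple[str, float]]:
    # Score each new document once against the fixed profile; pass on only those above the threshold.
    for doc in stream:
        score = cosine(unit_vector(doc), profile)
        if score >= threshold:
            yield doc, score


good = ["query expansion for web search", "ranking web search results"]
incoming = ["new paper on web search ranking", "recipe for lemon cake"]
for doc, score in route(incoming, build_profile(good)):
    print(f"{score:.2f}  {doc}")
```

The threshold trades missed relevant documents against noise; a real filter could also keep updating the profile as the user marks more documents as good.
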

Automatic Text Summarization: Documents are huge and we
don't always want to read them in full. (I don't know about you, but I
certainly don't have the patience. And given the stuff you find on the
web ...) Techniques that automatically "summarize" documents will be
tremendously useful. Domain-independent text summarization is very
hard, at times even for humans; machines typically summarize by
text extraction: relevant pieces (sentences, paragraphs, ...) of text
are extracted and presented as a "summary".
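
Extraction-based summarization can be illustrated with a simple frequency heuristic: score each sentence by how common its content words are across the whole document, keep the top `k` sentences, and present them in their original order. The stopword list, the scoring rule and the naive sentence splitter are assumptions made for this sketch, not the technique of any specific system.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is", "are",
             "it", "for", "on", "that", "this"}


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def summarize(document: str, k: int = 2) -> str:
    # Naive sentence splitter: break on whitespace that follows ".", "!" or "?".
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    freq = Counter(content_words(document))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Mean document-wide frequency of the sentence's content words,
        # so long sentences are not favored just for being long.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Stable sort: among equally scored sentences, the earlier one wins.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Present the extracted sentences in their original order.
    return " ".join(sentences[i] for i in sorted(best))


doc = ("Speech archives are growing quickly. "
       "Speech recognition makes speech archives searchable. "
       "The weather was pleasant. "
       "Recognition errors complicate speech search.")
print(summarize(doc, k=2))
```

Here the off-topic sentence about the weather scores lowest and is left out of the two-sentence summary.
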

Miscellaneous (TREC): Since 1992, the National Institute of Standards and
Technology (NIST), along with DARPA, has sponsored an annual conference
called the Text REtrieval Conference
(TREC) to support research within the information retrieval
community by providing the infrastructure necessary for large-scale
evaluation of text retrieval methodologies. I have been actively
participating in TRECs since TREC-3 (held in 1994).

Brief Bio

I was born in India in the state of Uttar Pradesh
(Hindi, my native language, for "Northern State"). I spent most of my
boyhood in the foothills of the Himalayas. I got a BS degree in
Computer Science from the University of Roorkee (now IIT Roorkee) in India, an MS, again in Computer
Science, from the University of Minnesota (somehow, back then, I always found
myself in cold places) and a PhD in Computer Science from Cornell
University. At Cornell I studied with the late Prof. Gerard Salton, one of the founders of
the field of IR. Somewhere between my degrees I had real jobs doing
database programming and IR system hacking. After my PhD I joined AT&T
Labs in 1996. In 2000, my friend Krishna Bharat persuaded me to join Google.