Pattern Recognition (Achim's blog)
December 09
My first conference paper
I was excited to hear that the paper "Finding parallel texts on the web using cross-language information retrieval", authored by Fei Xia and me, was accepted for the cross-language information access workshop at the IJCNLP 2008 conference! I won't go to Hyderabad myself, but I hope our results will help other researchers build parallel corpora. And if you are not a language researcher: "parallel corpora" is just a fancy term for texts paired with their translations into another language (preferably lots of them).

August 04
Google to release 5-gram language model
Google will release their 5-gram language model, trained on a corpus of about 1 trillion words. Wow! What I would like to know is whether this is only for English or also for other languages.
They say you can use it no matter how small your computing resources are; that is debatable.
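For readers outside language research: a 5-gram model rests on counts of word sequences up to five words long. The Python sketch below is a minimal illustration on a toy sentence, not a description of how Google's model works. The function names `count_ngrams` and `mle_prob` are hypothetical. The sketch estimates the probability of a word given the four preceding words by plain relative frequency. Real language models also apply smoothing so that unseen sequences do not get probability zero.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every contiguous sequence of n tokens (an n-gram)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_prob(tokens, context, word):
    """Unsmoothed relative-frequency estimate of P(word | context).

    Assumes a non-empty context. Contexts are counted only where a
    following word exists (hence tokens[:-1]).
    """
    n = len(context) + 1
    ngram_counts = count_ngrams(tokens, n)
    context_counts = count_ngrams(tokens[:-1], n - 1)
    denom = context_counts[tuple(context)]
    return ngram_counts[tuple(context) + (word,)] / denom if denom else 0.0

# Toy "corpus"; a real model would be trained on billions of words.
tokens = "the cat sat on the mat and the cat sat on the rug".split()

print(count_ngrams(tokens, 5)[("the", "cat", "sat", "on", "the")])  # 2
print(mle_prob(tokens, ("cat", "sat", "on", "the"), "mat"))  # 0.5
```

With a trillion words of training text, the hard part is storing and looking up these counts efficiently, which is why "no matter how small your computing resources are" sounds optimistic.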
Update: Unfortunately, this n-gram model is only available for English right now.

January 19
Software sucks
A dense but interesting essay by Jaron Lanier on the brittleness of software and what it means for its economics.
Most interesting for me is his take on comparing computer and natural languages:

"The degree to which human, or 'natural' language is unlike computer code cannot be overemphasized. Language can only be understood by the means of interpretation, so ambiguity is central to its character, and is properly understood as a strength rather than a weakness. Perfect precision would rob language of its robustness and potential for adaptation. Human language is not a phenomenon which is well understood by either science or philosophy, and it has not been reproduced by technologies."

Well, we are working on the latter. He does raise an interesting question, though: how can the inherent imperfection and ambiguity of language be part of the solution rather than part of the problem in NLP applications? How can we leverage these properties to build robust NLP applications?

November 19
Integrated speech recognition and MT for Iraq war
Via Wired News: War-Zone Test for Babel-Fish Tool. This seems to be a good field test for integrating computational linguistics components into something like a babel fish, but for me this still falls into the "shoot first, ask questions later" category. I do not believe war is or should be the mother of invention. We need to find ways to use these technologies to avoid conflicts.

November 18
R is for statistics
An article over on O'Reilly Net about The R Project for Statistical Computing. It seems interesting as a visualization tool for some of the statistics we do in our class projects.