Tuesday, November 16, 2010

Find Your Own Keyword(s)

Author: Tie-Yan Liu

I have worked at the institute for nearly five years. Coming from an electronic engineering background, I never expected to become so closely tied to SIGIR (Note 1), the top conference in the field of information retrieval.

From 2004 to 2008, I made my way along the road of information retrieval and grew up at Microsoft Research Asia in the process: from getting to know the field and tailoring my first paper for SIGIR, to improving my research and writing skills, to settling on my own main direction, to working to lead a school of thought.

I gained and felt a great deal along the way, and I am writing it down in the hope of sharing it with you.

First year: "Publishing my first SIGIR paper"

I graduated from the Department of Electronic Engineering at Tsinghua University. My doctoral work was on video signal processing, such as video segmentation, keyframe extraction, and video summarization.

I joined Microsoft Research Asia in 2003 and moved to the Internet Search and Mining group in 2004, where I began exploring this new research area.

The switch was not as hard as one might imagine, because Microsoft Research already had many results in information retrieval and had published many papers at SIGIR.

With such a good platform, I could get up to speed quickly by talking with colleagues.

Still, the process was not easy. After all, the field of information retrieval has accumulated decades of knowledge and experience, and it takes time to understand and absorb it bit by bit.

To grasp this knowledge faster and better, my students and I launched a series of lectures in the group, covering modern information retrieval, optimization, statistical machine learning, and more. Experience showed this approach to be very effective: learning from a book on your own is one thing, but being able to explain the material thoroughly in front of others is another level entirely. It took a lot of effort, but the process laid a solid theoretical foundation for my students and me to do research in information retrieval.

While strengthening the basics, we also began to learn about the SIGIR conference by reading papers and talking with colleagues.

The wish was simple: to publish at the top academic conference SIGIR as soon as possible, like my colleagues. From reading the papers, I gradually found that SIGIR is actually a very traditional conference that places great weight on experimental results. SIGIR papers usually have very strong experimental results, because that is the only way to verify that an algorithm really performs well on large-scale data. As my first attempt in the field, I decided to "tailor" an empirical comparison paper for SIGIR.

At that time, the institute was taking part in the TREC (Note 2) competition.

One of its tasks was called Topic Distillation. Its goal is to find the entry pages of the sub-sites most relevant to the query topic, which means that even when a child page is more relevant than its parent page, we still want to return the parent page. To solve this problem, we propagated page keywords up to the parent page along the site structure. Experiments confirmed that this method was very effective. So I wondered: are there other, similar approaches? Besides keywords, could we propagate the pages' relevance scores? Besides the site structure, could we also propagate along the hyperlink structure? With these ideas in mind, we surveyed the related literature and found that others had indeed propagated relevance scores along hyperlinks. This inspired me to carry out a systematic comparative study of the propagation methods mentioned above. I listed and classified all the methods, ran a large number of comparisons, and ended up with many interesting results. Following my own summary of the SIGIR "template", I wrote up the comparison as a paper and submitted it to SIGIR 2005. In the end the paper was accepted. There was some luck involved, but in any case, through "imitation" I had begun my SIGIR journey.
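
The idea of propagating relevance scores up the site structure can be sketched roughly as follows. This is only an illustration in Python, not the method from the paper: the function name `propagate_scores`, the weight `alpha`, and the rule "own score plus alpha times the average score of the direct children" are all assumptions made for this example.

```python
from collections import defaultdict


def propagate_scores(scores, parent_of, alpha=0.5):
    """Propagate relevance scores one level up the site tree.

    scores:    dict mapping page URL -> relevance score for the query
    parent_of: dict mapping child URL -> parent URL (the site structure)
    alpha:     hypothetical weight given to the children's average score
    """
    # Build the reverse mapping: parent -> list of direct children.
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    propagated = {}
    for page, own_score in scores.items():
        kids = children.get(page, [])
        if kids:
            child_avg = sum(scores.get(k, 0.0) for k in kids) / len(kids)
        else:
            child_avg = 0.0
        # Each page keeps its own score and gains a share of its children's.
        propagated[page] = own_score + alpha * child_avg
    return propagated


# Toy site: the entry page "/" is weakly relevant, its children are strong.
scores = {"/": 0.2, "/a": 0.9, "/b": 0.7, "/a/x": 0.1}
parent_of = {"/a": "/", "/b": "/", "/a/x": "/a"}
result = propagate_scores(scores, parent_of)
print(result)
```

In this toy example the entry page gains score from its relevant children, which is the effect the Topic Distillation task calls for. Propagating along hyperlinks instead would only change how `parent_of` is built.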

Second year: "Mastering the ability to publish consistently"

Publishing the first paper mattered, but it owed something to luck; having the real strength to keep publishing at SIGIR mattered more.

Here the international platform of Microsoft Research Asia helped me enormously. Every year the institute attracts a large number of overseas scholars for visits and exchanges, and that is how I had the chance to meet Professor Yiming Yang.

Professor Yang is a professor at Carnegie Mellon University in the United States and an expert in text categorization.

I had the privilege of collaborating with her on a paper while she was visiting the institute. After I wrote the first draft, she revised it with me, and the two of us went back and forth five times just on how to write the introduction. She could have rewritten that section herself with far less effort. Instead, she patiently gave me advice and let me revise it little by little on my own. This process made me realize that a good technique is not enough; one must also express it clearly and accurately and highlight one's own contribution properly. This has been a great help both in my own later writing and in helping my students with their papers. I still envy the state Professor Yang has reached: "Writing a paper is actually very enjoyable. The writing flows like clear spring water, and it lets you share your research results with others."

After Professor Yang and I published a paper on large-scale text classification in SIGKDD Explorations (Note 3), I began preparing my own independent SIGIR submissions for the following year.

This time felt clearly different: I was no longer looking for a topic in order to tailor a paper, but writing papers around the research I was already doing.

I prepared two papers: one on web graph analysis based on stochastic complement theory, the other on a new algorithm for document retrieval.

Neither was an empirical comparison paper, and neither followed the SIGIR "template" the way my first-year paper had, yet both were accepted by SIGIR 2006.

After this, I felt I had really gotten started: at the very least, I knew what kind of work the SIGIR community truly recognizes, and I knew how to write a paper in my own style.

Third year: "Find your own keyword"

Having three papers accepted at SIGIR over two consecutive years is not an easy thing, because the competition is fierce: only a few dozen papers are accepted worldwide each year, and the vast majority of them undoubtedly come from the United States.

As a result, I gradually gained recognition from scholars outside the institute and got to know more peers and friends.

Once, during a conference, I was chatting with a few peers and we were introducing our research interests.

When my turn came, I found that I could only describe mine with a term as broad as "information retrieval", because my three SIGIR papers had little to do with one another and it was hard to find a more fitting description. A friend said: you need your own keyword. For example, the keyword of Professor ChengXiang Zhai at the University of Illinois at Urbana-Champaign is language models, and the keyword of Professor Yang at Carnegie Mellon is text categorization. What is yours?

This struck me deeply.

Come to think of it, truly renowned scholars tend to have their own signature work and a settled direction. My state at the time, however, was a bit like publishing papers for the sake of publishing papers, without really planning a direction of my own. If I went on like this, I would probably publish more SIGIR papers over the next few years, but when someone asked me the same question again, I still could not avoid the same embarrassment. So I decided to concentrate on one influential research direction as my keyword.

My manager and I had a long talk.

In that conversation, on the one hand, he stressed the open research atmosphere at Microsoft Research Asia and expressed strong support for me; on the other hand, he shared the principle of "less is more" and worked with me to analyze and identify my main line of research. Considering that I had a solid mathematical foundation and some background in machine learning and optimization theory, and that ranking is a core problem both for the information retrieval field and for Microsoft's search engine, we eventually focused the research on learning to rank.

On this basis, I made a fairly large adjustment to my own direction and that of my interns: everyone's research topics now revolved around learning to rank, for example loss functions for learning to rank, hyperplane-based ranking methods, feature selection for learning to rank, and rank aggregation based on learning to rank.

We went on to publish three papers at SIGIR 2007. Because all three were about learning to rank, they were scheduled in the same session. That session had only four papers in total, so our work drew a lot of attention. I finally had my own keyword: learning to rank.

After the conference, I was invited to join the senior program committee of SIGIR 2008 and the editorial board of the international journal Information Retrieval, moving from being a participant in the information retrieval field to being an organizer.

Fourth year: "Working to lead a school of thought"

Microsoft has an internal mentoring system that encourages senior staff to serve as mentors to younger employees, offering help and guidance as they grow.

I was very fortunate: with my manager's help, Rakesh Agrawal (Note 4), the most accomplished figure in data mining, became my mentor at the end of 2007. I still clearly remember my conversation with Rakesh; his views on research shook me deeply. For example: "A paper is not written just to be accepted by a conference right now, but to advance an academic direction and lead a school of thought, so that it still has a profound influence at least ten years on." And another: "People always forget your good papers and remember your bad ones. A reputation takes ten years to build but can be destroyed in an instant, so treat every paper very seriously and make sure of its quality."

My exchanges with Rakesh made me realize that having a keyword is far from enough; the keyword needs to stand for a school of thought that I lead.

With this in mind, and with my manager's help, I re-examined my research topics and tightened quality control over the research process. My collaborators and I are now working to lead the "listwise approach to learning to rank" as our own school of thought.

Fortunately, we have already achieved some results in this direction.

For example, we published three related papers at SIGIR 2008 and two theoretical papers at ICML (Note 5) on the listwise approach to learning to rank, discussing its statistical consistency and generalization performance. Beyond publishing papers, we have organized SIGIR workshops, released benchmark datasets, and given tutorials at SIGIR, WWW (Note 6), and other top conferences to promote the listwise approach to learning to rank.
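
To make "listwise" concrete: a listwise loss scores the whole ranked list for a query at once, rather than single documents or document pairs. The sketch below, in Python, uses one well-known listwise loss, the top-one cross entropy from ListNet-style methods. The text does not say which losses the papers above studied, so the choice of loss, the function names, and the toy numbers are all assumptions for illustration.

```python
import math


def softmax(values):
    """Turn a list of scores into a probability distribution."""
    largest = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_loss(predicted_scores, relevance_labels):
    """Cross entropy between two top-one probability distributions.

    Both lists describe the documents of ONE query, in the same order.
    The loss is small when the predicted scores put probability mass
    on the same documents as the ground-truth relevance labels.
    """
    p_true = softmax(relevance_labels)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


# One query with three documents; labels: higher means more relevant.
labels = [3.0, 1.0, 0.0]
good_loss = listwise_loss([2.0, 1.0, 0.0], labels)  # same order as labels
bad_loss = listwise_loss([0.0, 1.0, 2.0], labels)   # reversed order
print(good_loss, bad_loss)
```

A model that orders the list like the labels gets a lower loss than one that reverses it, and the loss depends on the list as a whole, which is what distinguishes the listwise view from pointwise and pairwise ones.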

Our research results are attracting more and more attention, but we know there is still a long way to go.

Still, on the platform of Microsoft Research Asia, we are confident that we can further advance the field of learning to rank and the development of the SIGIR community as a whole.

About the author

Tie-Yan Liu received his Ph.D. from Tsinghua University in 2003 and joined Microsoft Research Asia the same year, where he is a lead researcher in the information retrieval and mining group.

His research interests include the theory, algorithms, and systems of learning to rank. He has published nearly 70 papers in international journals and conferences and holds nearly 40 patents or patent applications. He received the 2004-2006 Most Cited Paper Award from the Journal of Visual Communication and Image Representation, and the Best Student Paper Award at SIGIR 2008. He serves on the program committees of dozens of international conferences and on the editorial boards of international journals. His research style is to start from the application needs of information retrieval and to propose new research directions together with effective solutions and rigorous analysis.

Note 1, SIGIR: Special Interest Group on Information Retrieval; here, its international conference on information retrieval.

Note 2, TREC: Text REtrieval Conference, the international text retrieval conference.

Note 3, SIGKDD Explorations: the publication of the ACM Special Interest Group on data mining. It focuses on frontier problems in data mining and generally publishes two issues a year.

Note 4, Rakesh Agrawal: proposed the Apriori algorithm in 1994, which greatly improved the practicality of association rule mining.

He is a member of the United States National Academy of Engineering, is known as the godfather of data mining, and is currently a Microsoft Technical Fellow at the Silicon Valley research lab.

Note 5, ICML: International Conference on Machine Learning, one of the top international conferences in this field.

Note 6, WWW: World Wide Web, the International World Wide Web Conference.

The 17th World Wide Web Conference, in 2008, was held in Beijing for the first time.
