Tuesday, November 16, 2010

Pao Ding's Blade for Search (continued)

How will the search of the future make up for the shortcomings of PageRank? Moving search from the page level to the block level will undoubtedly bring a revolution in efficiency.

Author: Yang, reporter for Internet Weekly

(Continued from the previous part)

A web page is like a cow.

It has a head and a tail and can be divided into distinct regions. When you browse a page, your eyes usually lock onto the most important area first, rather than starting with the navigation, advertisements, copyright notice, and other secondary blocks. Following the same approach, a search engine can further improve its precision.

Speed upgrade

For search, the birth of this technology is a fundamental change.

Beyond link analysis, where it makes up for the shortcomings of Google's PageRank algorithm, the technology can also be applied effectively to image search. Previously, an image search engine would grab a picture from a page and then had to search the page for context that explains the picture. With this technology, efficiency improves greatly: the engine can go directly to the block in which the picture appears, take the descriptive text beneath the picture, and use the importance of that block to judge the importance of the picture. The same technology can also be used for similar information extraction tasks.
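To make the idea of block-level link analysis concrete, here is a minimal sketch in Python. It is not Microsoft's algorithm, which the article does not describe in detail. It assumes each link already carries an importance score for the block it sits in (the `block_importance` values and the function name `block_weighted_pagerank` are hypothetical), and it simply lets a page pass on its rank in proportion to those scores instead of splitting it evenly among its links as plain PageRank does.

```python
from collections import defaultdict


def block_weighted_pagerank(links, damping=0.85, iterations=50):
    """Illustrative PageRank variant with block-weighted links.

    links: list of (source_page, target_page, block_importance) tuples,
    where block_importance is an assumed, precomputed score for the
    block that contains the link (for example, main content vs. footer).
    """
    pages = sorted({p for s, t, _ in links for p in (s, t)})
    n = len(pages)

    # Total outgoing weight per page, used to normalize each link.
    out_weight = defaultdict(float)
    for source, _, weight in links:
        out_weight[source] += weight

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no weighted out-links spread their rank evenly.
        dangling = sum(rank[p] for p in pages if out_weight[p] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for source, target, weight in links:
            if out_weight[source] > 0.0:
                share = weight / out_weight[source]
                new_rank[target] += damping * rank[source] * share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Page A links to B from its main content block and to C from a footer.
    example_links = [
        ("A", "B", 0.9),
        ("A", "C", 0.1),
        ("B", "A", 1.0),
        ("C", "A", 1.0),
    ]
    for page, score in block_weighted_pagerank(example_links).items():
        print(page, round(score, 4))
```

In this toy graph, page B ends up ranked above page C even though both receive exactly one link from A, because the link to B comes from the more important block.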

Starting in 2001, researcher Wen Jirong and two students spent more than two years in total on this technology.

In fact, once the initial idea was settled, writing the algorithm took only a few months. But in real tests the algorithm often could not "run through." "You can't imagine it: the code inside a page can be nested as deep as 100 levels," Wen Jirong says with a laugh. "But that is the reality of the Internet. However badly a web page is written, our algorithm still has to handle it." In the time that followed, they tested and refined the algorithm at the scale of a million pages. In the end, when Microsoft presented the technology at an international academic conference, it caused no small sensation.

But when it came to practical application, a new problem arose: it was too slow.

A browser usually needs a few hundred milliseconds to render a web page. That sounds very short, but when you have to process billions of pages, the speed becomes intolerable. To use the technology in the search engine, the product group at Microsoft headquarters required the time to be cut to a few milliseconds or less. This was a real challenge. Under pressure, the Microsoft researchers came up with a solution: they set aside the IE browser's rendering engine and wrote a relatively simple, streamlined rendering engine of their own. This engine only needs to output the layout data, such as length, width, and coordinates; it does not need to produce the final visual rendering of the page or deal with JavaScript and other scripts. In this way, the average processing time per page was successfully cut to 2 milliseconds or less.
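A quick back-of-the-envelope calculation shows why the difference matters. The figures below are illustrative assumptions, not numbers from the article beyond its "a few hundred milliseconds" and "2 milliseconds": we take 300 ms per page for a full browser render, 2 ms for the lightweight engine, and a corpus of one billion pages processed on a single machine. The function name `machine_days` is our own.

```python
SECONDS_PER_DAY = 86_400


def machine_days(pages, ms_per_page):
    """Days one machine needs to process `pages` at `ms_per_page` each."""
    total_seconds = pages * ms_per_page / 1000
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    corpus = 1_000_000_000  # assumed corpus size: one billion pages
    full_render = machine_days(corpus, 300)  # assumed full browser render
    layout_only = machine_days(corpus, 2)    # lightweight layout engine
    print(f"full render: {full_render:,.0f} machine-days")
    print(f"layout only: {layout_only:,.0f} machine-days")
    print(f"speedup:     {full_render / layout_only:.0f}x")
```

Under these assumptions, full rendering costs about 3,472 machine-days (roughly nine and a half years on one machine), while the 2 ms engine needs about 23 machine-days, a 150-fold reduction.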

Nothing is impossible, as long as you can imagine it.
