| An Internet search engine is a software program designed to search for information on the World Wide Web. The search results are usually presented as a list and are commonly called hits. The results may consist of web pages, images, information, and other types of files. Some search engines also collect data available in databases or open directories. Unlike web directories, which are maintained by human editors, search engines operate algorithmically or combine algorithmic and human input.
Web search engines work by storing data about a very large number of web pages, which they retrieve from the Internet. These pages are retrieved by a web crawler, also known as a spider: an automated web browser that follows every link it discovers. The content of each page is then analyzed to decide how to index it. Words, for example, are extracted from titles, headings and subheadings, or from special fields called meta tags. Data about web pages is stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page (known as a cache) as well as data about the page, while others, such as AltaVista, store every word of every page they find. The cached page always contains the text that was actually indexed, so it can be very helpful when the current page has since been updated and no longer contains the search terms.
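The crawl-then-index process described above can be illustrated with a minimal Python sketch. It is not how any particular search engine is implemented: the "web" here is a hypothetical in-memory dictionary (`PAGES`) rather than real HTTP requests, and the names `crawl` and `build_index` are chosen only for illustration. The crawler follows every link it discovers, visiting each page once, and the index maps each word to the pages that contain it.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
PAGES = {
    "a.example": ("Search engines index web pages", ["b.example", "c.example"]),
    "b.example": ("A crawler follows every link", ["a.example"]),
    "c.example": ("Pages are stored in an index", []),
}

def crawl(start):
    """Visit every page reachable from start, breadth-first, each page once."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url in urls:
        for word in re.findall(r"[a-z0-9]+", PAGES[url][0].lower()):
            index[word].add(url)
    return index

index = build_index(crawl("a.example"))
print(sorted(index["index"]))  # pages whose text contains the word "index"
```

A structure of this kind, from words to the pages containing them, is commonly called an inverted index; answering a one-word query then only requires a dictionary lookup rather than a scan of every page.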
As soon as a user types keywords into the search field, the engine checks its index and returns a list of the best-matching web pages according to its criteria, usually with the document title, a brief summary, and sometimes extracts from the text. Some search engines offer an advanced feature called proximity search, which lets users define the distance between keywords.
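Proximity search can be sketched in a few lines of Python. The example below is only an illustration under simple assumptions: distance is measured in words, the text is split on letters and digits, and the function names `positions` and `within` are hypothetical. It records where each word occurs and then checks whether two keywords appear within a given number of words of each other.

```python
import re

def positions(text):
    """Return a dict mapping each lowercased word to its word offsets in text."""
    pos = {}
    for i, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        pos.setdefault(word, []).append(i)
    return pos

def within(text, first, second, max_distance):
    """True if first and second occur at most max_distance words apart."""
    pos = positions(text)
    return any(
        abs(i - j) <= max_distance
        for i in pos.get(first.lower(), [])
        for j in pos.get(second.lower(), [])
    )

doc = "A web crawler retrieves pages so the search engine can index them"
print(within(doc, "crawler", "pages", 2))  # the two words are 2 positions apart
print(within(doc, "crawler", "index", 2))  # the two words are 8 positions apart
```

An index that stores word positions in addition to page identifiers makes this kind of check possible without re-reading the page text at query time.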
The usefulness of a search engine depends on the relevance of the result set it returns. Millions of web pages may contain a particular search term or word combination, but some of them will be more relevant or popular than others. The results are therefore ranked so that the "best" ones appear first.
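As a rough illustration of ranking, the following Python sketch orders pages by how often the query terms occur in them. This term-count score is an assumption made purely for the example; real search engines use their own, far more elaborate criteria. The page data and the names `score` and `rank` are hypothetical.

```python
import re
from collections import Counter

def score(text, query):
    """Count how many times the query terms occur in text."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(counts[term] for term in query.lower().split())

def rank(pages, query):
    """Return URLs of matching pages, highest score first; ties broken by URL."""
    scored = [(score(text, query), url) for url, text in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for s, url in ordered if s > 0]

# Hypothetical pages used only to demonstrate the ordering.
pages = {
    "a.example": "search engines rank search results",
    "b.example": "a directory edited by humans",
    "c.example": "search tips",
}
print(rank(pages, "search results"))
```

Pages that contain none of the query terms are dropped, and the remaining pages are listed from the highest score to the lowest.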
The way a search engine ranks web pages is specific to that engine. Ranking techniques also change over time, as Internet usage changes and new techniques become available. |