How Google Works

If you aren't interested in learning how Google creates the index and the database of documents that it accesses when processing a query, skip this description. I adapted the following overview from Chris Sherman and Gary Price's wonderful description of How Search Engines Work in Chapter 2 of The Invisible Web (CyberAge Books, 2001).
Google runs on a distributed network of thousands of low-cost computers and can therefore carry out fast parallel processing. Parallel processing is a method of computation in which many calculations can be performed simultaneously, significantly speeding up data processing. Google has three distinct parts:
Googlebot, a web crawler that finds and fetches web pages.
The indexer that sorts every word on every page and stores the resulting index of words in a huge database.
The query processor, which compares your search query to the index and recommends the documents that it considers most relevant.

Let's take a closer look at each part.

Googlebot, Google's Web Crawler

Googlebot is Google's web crawling robot, which finds and retrieves pages on the web and hands them off to the Google indexer. It's easy to imagine Googlebot as a little spider scurrying across the strands of cyberspace, but in reality Googlebot doesn't traverse the web at all. It functions much like your web browser: it sends a request to a web server for a web page, downloads the entire page, and then hands it off to Google's indexer.
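That fetch-and-hand-off cycle can be sketched in a few lines of Python. Everything here (the function names, the User-Agent string, the hand-off stub) is illustrative, not Google's actual code:

```python
from urllib.request import Request, urlopen

def fetch(url: str) -> bytes:
    """Request a page the way a browser does: send one HTTP GET to the
    web server, then download the entire response body."""
    req = Request(url, headers={"User-Agent": "toy-crawler/0.1"})
    with urlopen(req, timeout=10) as resp:
        return resp.read()

def hand_off_to_indexer(url: str, body: bytes) -> None:
    # Stand-in for the indexing stage described later in this essay.
    print(f"indexer received {len(body)} bytes from {url}")

# urllib also understands data: URLs, which lets you try the fetcher
# without touching a real web server.
hand_off_to_indexer("data:text/plain,hello", fetch("data:text/plain,hello"))
```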
Googlebot consists of many computers requesting and fetching pages much more quickly than you can with your web browser. In fact, Googlebot can request thousands of different pages simultaneously. To avoid overwhelming web servers or crowding out requests from human users, Googlebot deliberately makes requests of each individual web server more slowly than it's capable of doing.
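One way to sketch that politeness rule, assuming a fixed minimum delay per host (the 2-second default below is an arbitrary illustrative value, not Google's policy): many fetcher threads may run at once, but each individual web server sees requests no faster than the chosen rate.

```python
import threading
import time
from collections import defaultdict
from urllib.parse import urlparse

class PoliteScheduler:
    """Track the time of the last request to each host and enforce a
    minimum delay between requests to that host, so no single web
    server is overwhelmed even though many fetchers run in parallel."""

    def __init__(self, min_delay: float = 2.0):
        self.min_delay = min_delay
        self.last_request = defaultdict(float)  # host -> last request time
        self.lock = threading.Lock()

    def wait_turn(self, url: str) -> None:
        """Block until this URL's host may be contacted again."""
        host = urlparse(url).netloc
        while True:
            with self.lock:
                now = time.monotonic()
                if now - self.last_request[host] >= self.min_delay:
                    self.last_request[host] = now
                    return
            time.sleep(0.05)  # another fetcher hit this host recently; wait
```

Fetcher threads would call `wait_turn(url)` before each request; requests to different hosts proceed without delaying one another.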
Googlebot finds pages in two ways: through an Add URL form, www.google.com/addurl.html, and through finding links by crawling the web.

Unfortunately, spammers figured out how to create automated bots that bombarded the Add URL form with millions of URLs pointing to commercial propaganda. Google rejects those URLs submitted through its Add URL form that it suspects are trying to deceive users by employing tactics such as including hidden text or links on a page, stuffing a page with irrelevant words, cloaking (aka bait and switch), using sneaky redirects, creating doorways, domains, or sub-domains with substantially similar content, sending automated queries to Google, and linking to bad neighbors. So now the Add URL form also has a test: it displays some squiggly letters designed to fool automated "letter-guessers" and asks you to enter the letters you see, something like an eye-chart test to stop spambots.
When Googlebot fetches a page, it culls all the links appearing on the page and adds them to a queue for subsequent crawling. Googlebot tends to encounter little spam because most web authors link only to what they believe are high-quality pages. By harvesting links from every page it encounters, Googlebot can quickly build a list of links that can cover broad reaches of the web. This technique, known as deep crawling, also allows Googlebot to probe deep within individual sites. Because of their massive scale, deep crawls can reach almost every page on the web. Because the web is vast, this can take some time, so some pages may be crawled only once a month.
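Link harvesting of this kind can be sketched with Python's standard HTML parser. The class name, page snippet, and URLs are made up for illustration:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkHarvester(HTMLParser):
    """Cull every <a href> from a fetched page, resolving relative
    links against the page's own URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

# Harvest links from one fetched page and queue them for later crawling.
crawl_queue = deque(["http://example.com/"])
page = '<a href="/about">About</a> <a href="http://other.example/">Other</a>'
harvester = LinkHarvester("http://example.com/")
harvester.feed(page)
crawl_queue.extend(harvester.links)
```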
Although its function is simple, Googlebot must be programmed to handle several challenges. First, since Googlebot sends out simultaneous requests for thousands of pages, the queue of "visit soon" URLs must be constantly examined and compared with URLs already in Google's index. Duplicates in the queue must be eliminated to prevent Googlebot from fetching the same page again. Second, Googlebot must determine how often to revisit a page. On the one hand, it's a waste of resources to re-index an unchanged page. On the other hand, Google wants to re-index changed pages to deliver up-to-date results.
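The duplicate-elimination step can be sketched as a queue backed by a "seen" set; here an in-memory set stands in for the comparison against Google's index, and the class name is invented for illustration:

```python
from collections import deque

class CrawlQueue:
    """A 'visit soon' queue that drops URLs already seen, so the same
    page is never fetched twice in one crawl."""

    def __init__(self):
        self.pending = deque()
        self.seen = set()  # stands in for a lookup against the index

    def add(self, url: str) -> bool:
        """Queue a URL; return False if it was a duplicate."""
        if url in self.seen:
            return False  # duplicate eliminated
        self.seen.add(url)
        self.pending.append(url)
        return True

    def next_url(self):
        """Pop the next URL to fetch, or None if the queue is empty."""
        return self.pending.popleft() if self.pending else None
```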
To keep the index current, Google continuously recrawls popular, frequently changing web pages at a rate roughly proportional to how often the pages change. Such crawls keep the index current and are known as fresh crawls. Newspaper pages are downloaded daily; pages with stock quotes are downloaded much more frequently. Of course, fresh crawls return fewer pages than the deep crawl. The combination of the two types of crawls allows Google both to make efficient use of its resources and to keep its index reasonably current.
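One way to sketch "recrawl at a rate roughly proportional to how often the page changes": estimate the days between observed changes and clamp the result between a fast fresh-crawl floor and the monthly deep-crawl ceiling. The function and its bounds are illustrative guesses, not Google's actual scheduling values.

```python
def revisit_interval(observed_changes: int, observation_days: float,
                     min_days: float = 0.25, max_days: float = 30.0) -> float:
    """Days to wait before refetching a page: a page that changed daily
    is refetched about daily; a page that never changed drifts out to
    the monthly deep-crawl ceiling."""
    if observed_changes == 0:
        return max_days  # no changes seen; leave it to the deep crawl
    days_per_change = observation_days / observed_changes
    return max(min_days, min(max_days, days_per_change))
```

A newspaper front page that changed 30 times in 30 days would be refetched about daily, while a stock-quote page changing many times a day would hit the fast floor.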
Google's Indexer

Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google's index database. This index