How does a Search Engine Work: A Simplified Guide
What happens when you search for a keyword on Google?
A whole bunch of results in less than a second!
But how does this happen?
I mean, we do know that there’s a sophisticated algorithm behind it. But, can we break it down to understand the ‘how’ of it?
Let’s give it a shot!
Search engines, like Google, maintain a list of URLs that they’ve already collected (their index). When your search query matches URLs in that index, those URLs get displayed on the SERP (Search Engine Results Page).
They, however, get displayed in a particular order. We call it ‘Getting ranked,’ while Google refers to it as being ‘Served.’ Search engines decide which URLs to rank for a search query based on several factors (more than 200, apparently).
Earlier, search engines would check the title, description, and H1s to decide whether a page was relevant to a given search query, and then rank a list of URLs from their index.
That was the time when doing SEO meant stuffing keywords in titles, descriptions, page content, and sometimes even hiding them in the HTML code! But a lot has changed. Search engines have become smarter, and they now understand page content instead of blindly trusting website owners to tell them what the content is about.
Now, search engines are complex. They implement mechanisms to understand what each page is actually about. This evolution isn’t going to stop, so the job of SEOs will continue to evolve too.
Let’s try to simplify things to understand how Search Engines usually work!
Think of an empty library. Like, a library without any books at all.
Now, multiple book vendors are vying for a slot in the library. They need to register themselves before they can submit their books. Once they register and let the librarian know that they have a set of books, they can send their books regularly. For example, Jack has books about technology while Jill has books about travel. Mark has a whole range of newspapers and Dolly has books about science.
After the first set of books gets added to the library, the librarian continues to receive similar requests from those who want to submit books. However, since the queue of people standing outside the library keeps getting bigger (It’s a pretty big library 😀), the librarian tells them that books will get collected at delayed intervals and the library staff itself will go to collect the books!
You might think that this is insane! But, please hold onto this line of thought.
Now, think of the librarian as Google and the library staff as Googlebot! The bot goes to different websites (the people), crawls their URLs (the books), and brings them back to the library (the index). Anyone looking for a book or any piece of information just has to ask the librarian (Google) about it.
Got it?
Well, that’s pretty much how the search engine works!
To know how a search engine works, you first need to understand three concepts:
Crawling, Indexing, and Ranking.
While it’s easy to get a vague idea of these concepts, it’s just as easy to mix them up. We’ll break them down and simplify them as best we can below.
What is Crawling?
Crawling is a process of discovery. Search engines crawl the web to find pages and bring their content back for the index.
How does the crawling process begin?
The bot works out which URLs to visit (and which to skip) from:
- Robots.txt directives
- Sitemaps submitted by individual sites
- The existing list of past crawls
- Crawling requests from GSC (Google Search Console)
- Links found on the URLs it crawls
Now, let’s match these with our library example. The librarian’s staff goes to:
- People who have already informed the staff to come and collect the books (Sitemap submissions/Requests from GSC)
- New homes to get more books for the library, while skipping the ones who specifically tell them not to come (On its own; following robots.txt directives)
- Existing contacts, for new books or even new pages for existing books (List of past crawls)
- New or current contributors who were referred by existing contributors for a new book (Links on other page URLs)
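To make the discovery process a bit more concrete, here is a minimal sketch of a toy crawler in Python. It is not how Googlebot is actually built. The in-memory `TOY_WEB`, the `ROBOTS_TXT` rules, the `toy-bot` name, and the `crawl` function are all made up for illustration; only `urllib.robotparser` is a real standard-library module. The crawler starts from seed URLs (think sitemap submissions), follows links it finds, and skips anything robots.txt tells it to stay away from.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/private/notes"],
    "https://example.com/blog": ["https://example.com/blog/post-1", "https://example.com/"],
    "https://example.com/blog/post-1": [],
    "https://example.com/private/notes": ["https://example.com/private/secret"],
    "https://example.com/private/secret": [],
    "https://example.com/about": [],
}

# Hypothetical robots.txt: "please don't come to /private/".
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(seeds, web, robots_txt, agent="toy-bot"):
    """Breadth-first crawl from the seed URLs, honoring robots.txt rules."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    queue = deque(seeds)   # URLs waiting to be visited
    seen = set(seeds)      # URLs already queued, so nothing is visited twice
    crawled = []           # URLs actually fetched, in visit order

    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # the site asked the bot not to come here
        crawled.append(url)
        for link in web.get(url, []):
            if link not in seen:  # a "referral" to a page not seen before
                seen.add(link)
                queue.append(link)
    return crawled


# Seeds play the role of sitemap submissions / GSC requests.
for url in crawl(["https://example.com/", "https://example.com/about"], TOY_WEB, ROBOTS_TXT):
    print(url)
```

Notice that the page under `/private/` is never fetched, so the link on it is never discovered either. That is the "skips the ones who tell them not to come" part of the library example.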
How to Improve Crawling?
- Encourage Crawling
- Submit Sitemap
- Submit individual URLs
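For the ‘Submit Sitemap’ step, an XML sitemap is really just a list of your URLs in a standard format. Here is a small sketch that builds one with Python’s standard library. The `build_sitemap` helper and the example URLs are assumptions for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # <loc> holds the page address
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, just to show the shape of the output.
print(build_sitemap(["https://example.com/", "https://example.com/blog"]))
```

You would typically save the result as a file such as sitemap.xml on your site and submit its address in GSC.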
If there is still some confusion with regard to ‘Crawling,’ just think of it as Google knocking on every website’s door to note down all the different URLs of that website.
What is Indexing?
Indexing is the process in which a search engine tries to understand what a page or website is about, classifies it into categories, and stores it in its index, which is part of its database.
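One common way to picture an index is as an ‘inverted index’: a lookup table from each word to the pages that contain it. The sketch below is a heavily simplified assumption of that idea, not a description of how Google’s index works. The `PAGES` data and the `tokenize`, `build_index`, and `lookup` helpers are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (borrowing from the library example).
PAGES = {
    "https://jack.example/tech-books": "Books about technology and gadgets",
    "https://jill.example/travel-books": "Books about travel and adventure",
    "https://dolly.example/science-books": "Science books for curious readers",
}


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, word):
    """Return the URLs filed under a word (an empty set if there are none)."""
    return index.get(word.lower(), set())


index = build_index(PAGES)
print(lookup(index, "travel"))
```

Asking for ‘travel’ returns only Jill’s page, while ‘books’ returns all three. In other words, the librarian doesn’t re-read every book for each question; the catalog already says where to look.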
What won’t/can’t search engines index?
Your website or webpages may not get indexed if:
- They carry a noindex directive, or they are blocked from crawling in Robots.txt
- You haven’t selected a preferred domain option: www or non-www
- There is no Sitemap in place
- The content relies on JavaScript that is difficult for the bots to crawl
- There are issues with your hosting provider
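If you want to check the first item yourself, the sketch below looks for a robots meta tag with a noindex directive in a page’s HTML. It uses Python’s built-in `html.parser`; the `RobotsMetaParser` class, the `has_noindex` helper, and the sample HTML are assumptions for illustration. It only covers the meta tag, not other ways a page can be kept out of the index (for example, an HTTP header).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```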
How to improve Indexing?
- Improve Internal Linking for all relevant pages
- Create an HTML Sitemap and include all the necessary links
- Ensure the URLs are part of your XML sitemap
- Share URLs on Social Media/Popular Forums, since URLs shared there tend to get crawled and indexed faster
- Ensure the basic On-Page SEO Elements that explain what your page is about are in place
- Include Structured Data
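As an example of the last item, structured data is often added to a page as a JSON-LD snippet that uses the schema.org vocabulary. Here is a minimal sketch that generates one for an article. The `article_json_ld` helper and the values passed to it are hypothetical; `Article`, `headline`, `author`, and `datePublished` are real schema.org terms. Treat it as a starting point rather than a complete markup.

```python
import json


def article_json_ld(headline, author_name, date_published):
    """Build a JSON-LD <script> tag describing an article with schema.org terms."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical values, for illustration only.
print(article_json_ld("How does a Search Engine Work", "Jane Doe", "2024-01-15"))
```

The resulting tag goes into the HTML of the page it describes, so the search engine gets an explicit summary instead of having to guess.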
The basic idea behind improving indexing is to ask: how best can your site help search engines?
Now, it is easy to get confused between Crawling and Indexing. Let’s try to understand how these terms differ.
What’s the difference between Crawling and Indexing?
Crawling means Googlebot has visited and read your page. Indexing means the bot has understood/classified your page and included it in the index.
Not every crawled page gets indexed, but if a page is indexed, it has been crawled before.
Now that we’ve reasonably understood Crawling and Indexing, let’s take a look at the easiest of the three: Ranking!
What is Ranking?
Search engines fetch the most relevant results and display them to the user as soon as they enter the query. Google refers to this as ‘Serving.’ When an indexed page matches the query, is relevant to the user, and answers the query, it gets ranked in the results.
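To get a feel for ‘matching and ordering,’ here is a deliberately naive ranking sketch: it scores each page by how often the query words appear in its text and sorts by that score. Real search engines weigh far more signals than this, so read it as a toy model under that assumption. The `PAGES` data and the `tokenize` and `rank` helpers are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical indexed pages: URL -> page text.
PAGES = {
    "https://example.com/travel-guide": "Travel tips for budget travel in Asia",
    "https://example.com/tech-news": "Technology news and gadget reviews",
    "https://example.com/travel-gadgets": "Gadget picks for travel",
}


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, pages):
    """Return matching URLs, ordered by how often the query words occur."""
    terms = tokenize(query)
    scores = {}
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:  # pages with no matching word are not served at all
            scores[url] = score
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


print(rank("budget travel", PAGES))
```

For the query ‘budget travel,’ the travel guide comes first because it mentions both words, the gadgets page comes second, and the tech news page doesn’t show up at all.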
This answer/ranking isn’t universal, of course.
If you were to search for the same query in one country vs. another, the results would vary. They vary even from one user to another, based on factors such as personalization.
The bottom line is, the rankings for these URLs on the SERP need to be relevant.
What are the factors that make rankings vary across users?
- Page performance
- Mobile Indexing
- AMP
- hreflang
- Freshness of content
All of these factors can vary the search results from one person to another. Or even from one device type to another!
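Take hreflang as an example of why two users can see different results for the same page. A site can list alternate versions of a page per language or region, and the user gets the version that fits them best. The sketch below is a simplified assumption of that selection logic, not the actual behavior of any search engine. The `pick_alternate` helper and the URLs are hypothetical; `x-default` is the real hreflang value for a fallback page.

```python
def pick_alternate(alternates, user_locale, default="x-default"):
    """Pick the alternate URL that best fits a locale such as 'en-GB'.

    `alternates` maps hreflang values (e.g. 'en-US', 'de') to URLs.
    """
    lookup = {key.lower(): url for key, url in alternates.items()}
    locale = user_locale.lower()

    if locale in lookup:        # exact language + region match
        return lookup[locale]
    language = locale.split("-")[0]
    if language in lookup:      # language-only match, e.g. 'de' for 'de-AT'
        return lookup[language]
    return lookup.get(default)  # fallback page, or None if there isn't one


# Hypothetical alternates for a single page.
ALTERNATES = {
    "en-US": "https://example.com/us/",
    "en-GB": "https://example.com/uk/",
    "de": "https://example.com/de/",
    "x-default": "https://example.com/",
}

print(pick_alternate(ALTERNATES, "en-GB"))
print(pick_alternate(ALTERNATES, "de-AT"))
print(pick_alternate(ALTERNATES, "fr-FR"))
```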
Now, how can you improve Rankings?
Well, it’ll require a more detailed answer than what this post can promise!
Do comment and let us know if you were able to understand the difference between the three concepts: Crawling, Indexing, and Ranking.
If you did, you now understand ‘How a Search Engine Works.’