Crawling is the process of gathering content for search. To retrieve information, the Crawl Component connects to the content sources through the appropriate out-of-the-box or custom connectors. After retrieving the content, it passes the crawled items to the Content Processing Component.
There are three main types of SharePoint crawl: Full Crawl, Incremental Crawl and Continuous Crawl.
During a full crawl, the search engine crawls, processes and indexes every item in the content source, regardless of the previous crawl status.
In most cases, we do not need to (re-)index everything in the content source.
Instead, an incremental crawl enables the Crawler to crawl only the items that have been created or modified since the last crawl. Modifications include:
- content updates,
- metadata updates,
- permission updates.
An incremental crawl takes much less time than a full crawl because unchanged items are not processed.
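To make the difference between the two crawl types concrete, here is a minimal Python sketch. It is not SharePoint code: the `Item` class, its field names, and `select_for_crawl` are hypothetical, and we assume each item simply carries a timestamp for its last content, metadata, and permission change.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    """Hypothetical crawled item; the field names are illustrative only."""
    url: str
    content_changed: datetime
    metadata_changed: datetime
    permissions_changed: datetime

    def last_modified(self) -> datetime:
        # A content, metadata, or permission update each counts as a modification.
        return max(self.content_changed, self.metadata_changed, self.permissions_changed)


def select_for_crawl(items, mode, last_crawl):
    """Return the items that a crawl of the given mode would process."""
    if mode == "full":
        # A full crawl ignores the previous crawl status.
        return list(items)
    if mode == "incremental":
        # Only items created or modified since the last crawl are processed.
        return [item for item in items if item.last_modified() > last_crawl]
    raise ValueError(f"unknown crawl mode: {mode}")


last_crawl = datetime(2024, 1, 10)
old = datetime(2024, 1, 1)
items = [
    Item("/docs/a.docx", old, old, old),                    # unchanged
    Item("/docs/b.docx", datetime(2024, 1, 11), old, old),  # content updated
    Item("/docs/c.docx", old, old, datetime(2024, 1, 12)),  # permissions updated
]

full = select_for_crawl(items, "full", last_crawl)
incremental = select_for_crawl(items, "incremental", last_crawl)
print(len(full), [item.url for item in incremental])
```

In this toy model the full crawl processes all three items, while the incremental crawl skips the unchanged one.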
Continuous crawl resembles incremental crawl but differs from it significantly: it is a dynamic way of crawling SharePoint and Office 365 content. When it is enabled on a content source, the Crawler checks the SharePoint change logs regularly (every 15 minutes by default) and looks for recent changes. If there is any item in the change log, the Crawler acts on it immediately and sends it to the Content Processor.
Note that continuous crawl is not real-time. However, it checks the SharePoint change logs often enough to provide good content freshness.
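The polling behaviour can be sketched as a small simulation. This is only a model under stated assumptions: `ChangeLog` is a hypothetical in-memory stand-in for the SharePoint change log, time is counted in whole minutes from zero, and only the 15-minute interval comes from the text above.

```python
from dataclasses import dataclass, field

POLL_INTERVAL_MIN = 15  # default interval mentioned in the text


@dataclass
class ChangeLog:
    """Hypothetical in-memory stand-in for a SharePoint change log."""
    entries: list = field(default_factory=list)  # (minute, url) tuples

    def record(self, minute, url):
        self.entries.append((minute, url))

    def changes_between(self, start, end):
        # Changes recorded after `start` and up to and including `end`.
        return [url for minute, url in self.entries if start < minute <= end]


def run_continuous_crawl(log, total_minutes, interval=POLL_INTERVAL_MIN):
    """Poll the log every `interval` minutes; return (poll_minute, url) pairs."""
    processed = []
    last_poll = 0
    for now in range(interval, total_minutes + 1, interval):
        for url in log.changes_between(last_poll, now):
            processed.append((now, url))
        last_poll = now
    return processed


log = ChangeLog()
log.record(3, "/docs/a.docx")
log.record(20, "/docs/b.docx")
result = run_continuous_crawl(log, total_minutes=30)
print(result)
```

The change recorded at minute 3 is only picked up by the poll at minute 15, which is why continuous crawl is frequent but not real-time.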
Also note that continuous crawl does not fix errors. We therefore still need an incremental crawl to run on the content source (every 4 hours by default), even when continuous crawl is enabled.
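How the two schedules interleave can be illustrated with a short Python sketch. It is a simplification, not the actual scheduler: we assume both crawls start counting from minute zero, and `crawl_schedule` and `minutes_until_retry` are hypothetical helpers. Only the two default intervals come from the text above.

```python
CONTINUOUS_INTERVAL_MIN = 15       # default from the text
INCREMENTAL_INTERVAL_MIN = 4 * 60  # default from the text


def crawl_schedule(total_minutes):
    """Return (minute, kind) events for both crawl types over a time window."""
    events = []
    for minute in range(CONTINUOUS_INTERVAL_MIN, total_minutes + 1, CONTINUOUS_INTERVAL_MIN):
        events.append((minute, "continuous"))
        if minute % INCREMENTAL_INTERVAL_MIN == 0:
            events.append((minute, "incremental"))
    return events


def minutes_until_retry(error_minute):
    """Minutes an errored item waits for the next incremental crawl (assumed model)."""
    next_incremental = (error_minute // INCREMENTAL_INTERVAL_MIN + 1) * INCREMENTAL_INTERVAL_MIN
    return next_incremental - error_minute


events = crawl_schedule(8 * 60)
continuous_runs = sum(1 for _, kind in events if kind == "continuous")
incremental_runs = sum(1 for _, kind in events if kind == "incremental")
print(continuous_runs, incremental_runs, minutes_until_retry(30))
```

Under these assumptions, an eight-hour window contains 32 continuous checks and 2 incremental crawls, and an item that fails at minute 30 waits 210 minutes before the incremental crawl retries it.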