Recently I’ve been working for a customer with some interesting requirements: they had several content sources and wanted to crawl them one after another. Scheduling the incremental crawls at fixed times was not a good solution, because the duration of their incrementals was very unpredictable: an incremental crawl of the same content source took 5 minutes one time and 1.5 hours the next. And of course, they didn’t want any idle time.

But rules like these cannot be defined in the UI, so the ultimate solution was PowerShell.

First, we need to be able to start the crawl. Let’s focus on incremental crawls only this time. Here is the PowerShell script for this:

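Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue    # Needed outside the SharePoint Management Shell, e.g. in a scheduled task
$SSA = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"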

$ContentSourceName = "My Content Source"

$ContentSource = $SSA | Get-SPEnterpriseSearchCrawlContentSource -Identity $ContentSourceName

$ContentSource.StartIncrementalCrawl()

It’s an easy one, isn’t it?

The next step is checking the status of the content source. We need this for several reasons. For example, we want to start a crawl only if the content source is Idle, or we want to display the current status of the crawl every minute.

Here is the PowerShell command you need:

$ContentSource.CrawlStatus
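The status check and the start call are usually combined. Below is a minimal sketch of a hypothetical helper, Start-IncrementalCrawlIfIdle (my name, not a SharePoint cmdlet). It assumes $ContentSource is the object returned by Get-SPEnterpriseSearchCrawlContentSource, as above:

```powershell
# Hypothetical helper (not a SharePoint cmdlet): starts an incremental crawl
# only when the content source reports "Idle".
# Returns $true if a crawl was started, $false otherwise.
function Start-IncrementalCrawlIfIdle {
    param([Parameter(Mandatory = $true)] $ContentSource)

    # CrawlStatus is an enum; PowerShell converts the string "Idle" to the
    # enum type of the left operand before comparing
    if ($ContentSource.CrawlStatus -eq "Idle") {
        $ContentSource.StartIncrementalCrawl()
        return $true
    }

    # Write-Host does not add to the function's return value
    Write-Host ("{0} is busy ({1}), not starting a crawl" -f $ContentSource.Name, $ContentSource.CrawlStatus)
    return $false
}
```

Returning a boolean lets a calling script decide whether to wait for this content source or to move on to the next one.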

What values can it have? Here is the list of crawl statuses:

  • Idle
  • CrawlStarting
  • CrawlingIncremental / CrawlingFull
  • CrawlPausing
  • Paused
  • CrawlResuming
  • CrawlCompleting
  • CrawlStopping
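These statuses mean different things to a sequencing script. As a sketch, a hypothetical helper, Get-CrawlPhase (not part of SharePoint), could group them into buckets. The "Running" group follows the loop condition used in the Crawl function in this post. Treating the pausing, resuming and stopping states as "Transitioning" is my own assumption:

```powershell
# Hypothetical helper: maps a crawl status (the enum value or its name as a
# string) to one of "Idle", "Running", "Transitioning", "Paused" or "Unknown".
# Avoids -in so it also runs on PowerShell 2.0.
function Get-CrawlPhase {
    param([Parameter(Mandatory = $true)] $CrawlStatus)

    switch ($CrawlStatus.ToString()) {
        "Idle"   { "Idle" }
        "Paused" { "Paused" }
        { @("CrawlStarting", "CrawlingIncremental", "CrawlingFull", "CrawlCompleting") -contains $_ } { "Running" }
        { @("CrawlPausing", "CrawlResuming", "CrawlStopping") -contains $_ } { "Transitioning" }
        default  { "Unknown" }
    }
}
```

For example, Get-CrawlPhase $ContentSource.CrawlStatus returns "Running" while an incremental crawl is in progress.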

OK, now we can check the status and start a crawl. How do we make the whole thing event-driven? Here is the logical sequence we have to follow:

  1. Start the crawl of a content source.
  2. Wait until it’s done.
  3. Take the next content source and repeat steps 1 and 2 until you’re done with every content source.
  4. Repeat this sequence.
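The four steps map onto two nested loops. Below is a minimal sketch. It assumes the Crawl function shown below, or any command that takes a content source name and returns only when that crawl is finished. Invoke-CrawlSequence and its Rounds and CrawlCommand parameters are hypothetical names:

```powershell
# Hypothetical sketch of the four-step sequence.
# $CrawlCommand names a command that crawls one content source and returns
# when it is done (assumed: the Crawl function from this post).
# $Rounds = 0 means "repeat forever" (step 4).
function Invoke-CrawlSequence {
    param(
        [Parameter(Mandatory = $true)] [string[]] $ContentSourceNames,
        [int] $Rounds = 0,
        [string] $CrawlCommand = "Crawl"
    )

    $Round = 0
    while (($Rounds -eq 0) -or ($Round -lt $Rounds)) {    # Step 4: repeat
        $Round++
        foreach ($Name in $ContentSourceNames) {          # Step 3: next content source
            & $CrawlCommand $Name                         # Steps 1-2: start it and wait until it is done
        }
    }
}
```

For example, Invoke-CrawlSequence -ContentSourceNames "Sites", "File Shares" -Rounds 1 (hypothetical content source names) runs the sequence once. That suits a daily scheduled task. Leaving out -Rounds keeps it running until the script is stopped.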

If we want nice code, the first step is to create a function. Here you go, my first one:

function Crawl {
    param([string]$ContentSourceName)

    # Look up the content source ($SSA is set as shown at the beginning of this post)
    $ContentSource = $SSA | Get-SPEnterpriseSearchCrawlContentSource -Identity $ContentSourceName
    $CrawlStarted = Get-Date

    # Start the crawl only if the content source is idle
    if ($ContentSource.CrawlStatus -eq "Idle") {
        $ContentSource.StartIncrementalCrawl()
        Start-Sleep 1
        Write-Host $ContentSourceName "- Crawl Starting..."

        do {
            Start-Sleep 60    # Display the crawl status every 60 seconds
            $Duration = (Get-Date).Subtract($CrawlStarted)    # Duration of the current crawl
            $Speed = $ContentSource.SuccessCount / $Duration.TotalSeconds    # Speed of the current crawl, docs/sec
            Write-Host ("{0} - {1} {2} - {3}/{4}/{5} ({6:N2} doc/sec)" -f $ContentSourceName,
                $ContentSource.CrawlStatus, (Get-Date), $ContentSource.SuccessCount,
                $ContentSource.WarningCount, $ContentSource.ErrorCount, $Speed)
        } while (($ContentSource.CrawlStatus -eq "CrawlStarting") -or
                 ($ContentSource.CrawlStatus -eq "CrawlCompleting") -or
                 ($ContentSource.CrawlStatus -eq "CrawlingIncremental") -or
                 ($ContentSource.CrawlStatus -eq "CrawlingFull"))

        Write-Host $ContentSourceName "- Crawling Finished"
        Write-Host ""
    }
}

This is how you can call this function:

Crawl "My Content Source"
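The speed figure in the progress line is simply the number of successfully crawled documents divided by the elapsed seconds. Here it is pulled out into a hypothetical helper (the name Get-CrawlSpeed is mine), with a guard against dividing by zero right after a crawl starts:

```powershell
# Hypothetical helper: documents per second since $CrawlStarted.
# Returns 0 when no time has elapsed yet instead of dividing by zero.
function Get-CrawlSpeed {
    param(
        [Parameter(Mandatory = $true)] [long] $SuccessCount,
        [Parameter(Mandatory = $true)] [datetime] $CrawlStarted,
        [datetime] $Now = (Get-Date)
    )

    $Seconds = ($Now - $CrawlStarted).TotalSeconds
    if ($Seconds -le 0) { return 0 }
    return $SuccessCount / $Seconds
}
```

For example, 600 documents crawled in 2 minutes give 600 / 120 = 5 doc/sec. Inside the do loop, the call would be $Speed = Get-CrawlSpeed -SuccessCount $ContentSource.SuccessCount -CrawlStarted $CrawlStarted.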

Some additional steps you might need:

  • If you want to run this script once a day (you only need daily incrementals, but want them finished as quickly as possible), just schedule it as a Windows scheduled task.
  • If you want to run this script during the day only (to free up resources for other jobs at night, for example), you can add start-in-the-morning, stop-in-the-evening logic. I showed a simple example in a blog post a few months ago.
  • If you want to run this sequence around the clock, wrap this logic in an infinite loop. (But be careful: sometimes you’ll need to run a full crawl, and then you have to stop this script first.)
  • You can insert other steps into this script too. If you want to do something (logging, sending alerts, etc.) when a crawl starts or stops, do it here. It’ll be your custom event handler for crawl events.
  • You can even write the output of this script to a file, so that you’ll have your own crawl log.
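Two of these ideas, the daytime window and the crawl log, can be sketched as small helpers. Both function names, the default 7:00 to 19:00 window and the log line format are assumptions for illustration:

```powershell
# Hypothetical helper: $true when $Now falls inside the daily crawl window
# [StartHour, EndHour). Assumes StartHour < EndHour (no window across midnight).
function Test-CrawlWindow {
    param(
        [datetime] $Now = (Get-Date),
        [int] $StartHour = 7,    # assumed: start crawling at 07:00...
        [int] $EndHour = 19      # ...and stop at 19:00, leaving nights free
    )
    return (($Now.Hour -ge $StartHour) -and ($Now.Hour -lt $EndHour))
}

# Hypothetical helper: appends one timestamped line to a log file,
# e.g. next to the Write-Host calls in Crawl, to build your own crawl log.
function Write-CrawlLog {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $Message
    )
    $Line = "{0:yyyy-MM-dd HH:mm:ss}  {1}" -f (Get-Date), $Message
    Add-Content -Path $Path -Value $Line
}
```

In an infinite loop, you could run the content source sequence only while Test-CrawlWindow returns $true, and call Start-Sleep for a few minutes otherwise.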

The scripts above work fine with both SharePoint Search and FAST Search for SharePoint. Enjoy!