I manage a FAST installation with a few million documents.
Something went wrong and FAST is returning results for files that no longer exist.
The “normal” way to fix this is to run another crawl of the content source. In this case, it did not work.
The “best” way to fix this is to reset the index and re-crawl all the content.
Unfortunately, because of the size of our FAST install, this is not practical: it takes over a week to index everything.
In other words, fixing this problem the “right” way would also take FAST down for at least a week for some content. Not good.
On a support call, Microsoft told me of a quick way to remove individual results. It’s not quite as effective as a full index reset, but it has its place. For example, say a confidential document got crawled and its summary is showing up in search results. You’d want to get that out of the way right away, and this approach can be good for that.
First, download the free tool FS4SP Query Logger by Mikael Svenson. I found version 3 on CodePlex with a quick internet search.
Run this on the FAST server and click the start logging button, then run your search using whichever search page is returning the bad results.
Once you see your search term show up on the upper left, look at the result XML and find the result.
You’ll want to grab the value of the “contentid” field. It will look something like ssic:// followed by a number.
Be sure the area of the XML you are looking at matches the search result you are trying to eliminate!
Now, also on the FAST server, open a FAST PowerShell command prompt.
Enter the command:
DOCPUSH -c sp -U -d ssic://YourNumberHere
Just like that, the result should stop appearing in search results.
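If you have several bad results to clear, a small loop saves some typing. This is only a sketch: the content IDs are placeholders, the ssic:// prefix check is an assumption based on the command above, and it deliberately prints the commands for review rather than running them, since removals are not something you want to fat-finger.

```powershell
# Placeholder content IDs; replace with the contentid values copied from the result XML.
$contentIds = @('ssic://YourNumberHere', 'ssic://AnotherNumberHere')

# Build one docpush command line per ID, using the same flags as the command above.
$commands = @(foreach ($id in $contentIds) {
    if ($id -like 'ssic://*') {
        "docpush -c sp -U -d $id"
    } else {
        # Assumption: every content ID starts with ssic://, so anything else is skipped.
        Write-Warning "Skipping '$id': it does not look like an ssic:// content ID"
    }
})

# Print the commands so they can be checked before pasting them into the FAST shell.
$commands
```

Once the list looks right, paste each line into the FAST PowerShell prompt.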
As a side note, while we were looking into things, we used a clever PowerShell command to search multiple directories for some text:
select-string <longIDnumberrepresentingwhatwewerelookingfor> [0-9]\*[0-9]\index_data\urlmap_sorted.txt
Select-String is like grep in Unix or findstr in Windows: it looks for strings.
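To see how the path pattern narrows the search, here is a small self-contained sketch you can run anywhere on Windows. The directory names and the ID 12345 are made up for the demo and are not the real FAST layout; it builds a throwaway folder tree under the temp directory and runs the same style of Select-String call against it.

```powershell
# Throwaway sandbox under the temp folder; all names below are invented for the demo.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("urlmap-demo-" + [guid]::NewGuid())
foreach ($dir in '0\index_1', '1\index_2', 'logs\index_3') {
    $indexData = Join-Path $root (Join-Path $dir 'index_data')
    New-Item -ItemType Directory -Path $indexData -Force | Out-Null
    Set-Content -Path (Join-Path $indexData 'urlmap_sorted.txt') -Value "12345 in $dir"
}

Push-Location $root
try {
    # [0-9] matches a single-digit folder name; *[0-9] matches any subfolder ending in a digit.
    $hits = @(Select-String '12345' '[0-9]\*[0-9]\index_data\urlmap_sorted.txt')
    $hits | ForEach-Object { "$($_.Path): $($_.Line)" }
} finally {
    Pop-Location
}

# Clean up the sandbox.
Remove-Item -Recurse -Force $root
```

Only the files under 0 and 1 are searched. The copy under logs is skipped because “logs” is not a single digit, even though it contains the same text.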
What was neat about the command was the pattern for the path (strictly a PowerShell wildcard pattern rather than a regular expression): it limited the search to just a few key directories, i.e.