Crawl issue with SharePoint 2010/FAST Search Server 2010

Background:

During a recent FAST implementation, I ran into another interesting issue while crawling internet sites, and Microsoft has confirmed it to be a product (SharePoint 2010) issue. The details of my findings are given below.

Problem:

In SharePoint Server 2010, you receive the following error when you try to crawl a web site (non-SharePoint site) that has both anonymous and NTLM authentication enabled:

 “Access is denied. Verify that either the Default Content Access Account has access to this repository, or add a crawl rule to crawl this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has “Full Read” permissions on the SharePoint Web Application being crawled.”

Cause:

It is confirmed to be a product issue.

This issue can occur if you have ever set a default content access account. In that scenario, Search incorrectly picks up the default account, assumes the site being crawled requires NTLM, and then fails to crawl it using NTLM; it does not even fall back to anonymous access.
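
For context, the default content access account is normally set either from Central Administration or with PowerShell, as sketched below. On an affected farm, performing either action (even just re-entering the same account) is what puts the crawler into the NTLM-only state described above. This is a minimal sketch; the Search Service Application name and crawl account are placeholders, not values from this environment.

    # Minimal sketch -- the SSA name and crawl account below are placeholders.
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "FAST Content SSA"
    $password = Read-Host "Crawl account password" -AsSecureString

    # Setting (or simply re-typing) the default content access account is the
    # action that triggers the incorrect NTLM-only behaviour on an affected farm.
    Set-SPEnterpriseSearchServiceApplication -Identity $ssa `
        -DefaultContentAccessAccountName "CONTOSO\svc_crawl" `
        -DefaultContentAccessAccountPassword $password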

Resolution:

The Microsoft Product Group has confirmed this to be a product issue. However, we were told that because the changes required to fix it are large, they will instead consider fixing it in a future release of the product.

Workarounds:

The two workarounds below are available for this issue; either of them will solve it.

I.    Remove and re-provision a new Search Service Application (I know it's a pain, especially if you have already set up hundreds of crawl rules, etc.). After the new Search Service Application is provisioned, DO NOT set or change the default content access account. (Unfortunately, there is no public interface that can be used to remove the default content access account once it has been set.) Even re-typing the password for the same default content access account will set the authentication type to NTLM, as explained above, and will trigger the issue.
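
If you go down this route, the PowerShell below is a minimal sketch of provisioning a replacement Search Service Application. The names, service account, and database name are placeholders, and the crawl/query topology configuration that a working SSA also needs is omitted for brevity.

    # Minimal sketch -- names, account, and database below are placeholders, and
    # the crawl/query topology still has to be configured afterwards. The service
    # account must already be registered as a managed account in the farm.
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $appPool = New-SPServiceApplicationPool -Name "SearchServiceAppPool" `
        -Account "CONTOSO\svc_search"

    $ssa = New-SPEnterpriseSearchServiceApplication -Name "Search Service Application 2" `
        -ApplicationPool $appPool -DatabaseName "SearchServiceApp2_DB"

    New-SPEnterpriseSearchServiceApplicationProxy -Name "Search Service Application 2 Proxy" `
        -SearchApplication $ssa

    # Important: do not set or re-enter the default content access account on the
    # new SSA, otherwise the behaviour described above comes back.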

OR

II.    Create a crawl rule for the start address of the site and configure it to use a cookie:

  1. Create a dummy text file, e.g. dummy_cookie.txt, containing some dummy cookie text, for example:  Test: testing
  2. Create a crawl rule configured to use a cookie for crawling, and point it to the cookie file created in step 1.
  3. Specify any page (e.g. Error.aspx) as the error page; it is more of a dummy or filler value and is not really used.
  4. Crawl again.

 The idea is to configure SharePoint to crawl the site using a cookie (to some degree, this can also be interpreted as anonymous access). As long as the target site ignores the extra dummy cookie (many web sites don't specifically look for cookies other than those they set themselves), the crawl can succeed.
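
For reference, the same crawl rule can also be created from PowerShell. The sketch below is illustrative only: the SSA name, site URL, and content source name are placeholders, the -AuthenticationType CookieRuleAccess switch is assumed to be available on your build, and the cookie name/value plus the dummy error page are still entered on the crawl rule page in Central Administration, exactly as in the steps above.

    # Illustrative sketch -- the SSA name, site URL, and content source name are placeholders.
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "FAST Content SSA"

    # Create an inclusion rule for the start address and mark it for cookie-based
    # access; the cookie itself (e.g. "Test: testing") and the dummy error page
    # are then supplied on the crawl rule page in Central Administration.
    New-SPEnterpriseSearchCrawlRule -SearchApplication $ssa `
        -Path "http://www.example.com/*" `
        -Type InclusionRule `
        -AuthenticationType CookieRuleAccess

    # Step 4: kick off a fresh crawl of the content source containing the start address.
    $cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Internet Sites"
    $cs.StartFullCrawl()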

Have a nice Day !!!

12 comments on “Crawl issue with SharePoint 2010/FAST Search Server 2010”

  1. Hi, so what you are saying is never touch the default crawl account, and create crawl rules for everything else, in case you need to add a source which triggers this scenario?

    • Prashanth says:

      Yes Mikael, the workarounds are either of these.
      1. Never touch the default crawl account.
      OR
      2. Create crawl rules.
      I will update the workaround section with the word ‘either’ to make it clearer.

      • vikram says:

        We have a scenario where we use a custom service application to get content from FAST using FQL. Is it possible to do so? The content consists of three lists, and that content is placed in FAST. Please help me out with your suggestions. Thank you.

      • Prashanth says:

        I am not sure why you want to create a custom service application for this; if I understand you correctly, you are trying to search the content of three SharePoint lists.
        If that is the case, all you need to do is create a separate scope in the FAST Query Service to filter content from these three lists, and you can then use FQL to search within that scope.
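
        To illustrate, here is a rough PowerShell sketch of an FQL query issued through the KeywordQuery object model with EnableFQL turned on. The site URL, search term, and list URLs are placeholders; instead of a named scope, this sketch narrows the results with path filters, which restricts the query to the three lists in a similar way.

            # Rough sketch -- the site URL, search term, and list URLs are placeholders.
            Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

            $site  = Get-SPSite "http://portal.contoso.com"
            $query = New-Object Microsoft.Office.Server.Search.Query.KeywordQuery($site)
            $query.EnableFQL = $true   # interpret QueryText as FQL
            $query.RowLimit  = 50

            # FQL: match the term and restrict results to three lists by path.
            $query.QueryText = 'and(string("invoice", mode="and"), ' +
                               'or(path:starts-with("http://portal.contoso.com/Lists/ListA"), ' +
                               'path:starts-with("http://portal.contoso.com/Lists/ListB"), ' +
                               'path:starts-with("http://portal.contoso.com/Lists/ListC")))'

            $results  = $query.Execute()
            $relevant = $results.Item([Microsoft.Office.Server.Search.Query.ResultType]::RelevantResults)

            # Load the relevant results into a DataTable and print title/path.
            $table = New-Object System.Data.DataTable
            $table.Load($relevant)
            $table.Rows | Select-Object Title, Path | Format-Table -AutoSize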

  2. Rick Fafard says:

    Prashanth,

    Thank you for the insight into using a dummy cookie file to get around this problem.
    I’ve followed option II of your instructions, and it works for several web sites that were giving me problems.

    Regards,
    Rick Fafard

  3. Prashanth says:

    Cool 🙂 No Worries

  4. vikram says:

    Hi Prashanth,
    Thanks for the reply. In our scenario we have three lists whose content is placed in FAST, and we need to use FQL to get that content out to other sources for publishing. (In our case we need a custom service application, automated with a timer job, to send mails to different users in different time zones.) Whenever a new item is added to a list, crawling should be done for that particular item only, and it should be sent immediately to the user through email using the timer job. Will this be possible?

  5. vikram says:

    Hi Prashanth, is there any PowerShell scripting to crawl the items?
    Thanks in advance for your replies.

    • Prashanth says:

      Got carried away with some stuff and couldn't reply to your earlier post. Anyway, what I wanted to suggest is: did you consider event receivers for the lists? When a new item is added to the list, you can send the emails as well as execute the crawl from code.
      Yes, PowerShell can be used to execute the crawler, and you can run these PowerShell commands from code.
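
      As a rough illustration of the PowerShell side, the sketch below starts an incremental crawl of a content source; the SSA name and content source name are placeholders, and the event receiver / email part is outside the scope of this snippet.

          # Rough sketch -- the SSA name and content source name are placeholders.
          Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

          $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "FAST Content SSA"
          $cs  = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

          # Only start a crawl if this content source is not already being crawled.
          if ($cs.CrawlStatus -eq "Idle") {
              $cs.StartIncrementalCrawl()   # picks up newly added list items
          }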

  6. vikram says:

    Hi Prashanth,
    Thanks for the reply. Yes, we are going to use timer jobs and event receivers. Can you refer me to some blogs or snippets that I can try? Thanks in advance.

  7. Kalashnikov says:

    Hey,

    This article might help you troubleshoot your issue:

    http://kalashnikovtechnoblogs.blogspot.in/2012/02/troubleshooting-crawl-issues-for-ssl.html

    Thanks,
    Kalashnikov
    http://kalashnikovtechnoblogs.blogspot.in

    If it helps, please don't forget to mark it as an answer. Thank you.

    • Prashanth says:

      Your article covers an issue totally different from the one I am discussing in this post. This is a SharePoint 2010 product bug (confirmed by Microsoft), and the workaround I suggested will fix it.
