Summary: This blog post describes some limitations that you need to consider before implementing cloud hybrid search in your organization.
Do not hesitate to contact me on twitter or LinkedIn to see if there are any changes that are not reflected in this blog post.
For a full overview of how Cloud Hybrid Search works, read my blog post: Everything you need to know about Cloud Hybrid Search
1. Provider-hosted apps
All provider-hosted add-ins will break when running the onboarding script *
One part of the onboarding script will change the SPAuthenticationRealm for your entire SharePoint farm.
As all your SPTrustedSecurityTokenIssuers rely on this SPAuthenticationRealm, they stop working after running the onboarding script.
Microsoft released a new hybrid picker that includes a fix for this issue. If you are configuring Cloud Hybrid Search only using the onboarding script, you have to do some additional work to get your apps to work again. Microsoft released a KB article with the fix here: https://support.microsoft.com/en-us/help/4010011/provider-hosted-add-ins-stop-working-and-http-401-error
2. Search customizations
The Cloud Search Service Application shares a similar architecture as the native SharePoint Search Service Application.
However, customization is limited because the search experience is derived from Office 365.
Below is a table that shows the current limitations in search customizations when using Cloud Hybrid Search.
3. Searching on Non-default Zone URL’s
Crawling Happens on-premises and the Default Zone url which you are crawling is fed to the Index on the Cloud. Since the Query is being served from Components in the Cloud which have no knowledge of the alternate URL’s available for On-prem Web-application, no URL Translation happens.
In Fact there is No concept of AAM’s in cloud Space.
The Only way to work around these limitations is to ensure that:
- The Public Default Zone URL should be a FQDN URL or The END URL which you want to Show up to the users in search results.
- The Default Zone URL which is being crawled should have Windows Claims Auth so that ACL information sent along with crawled Content is recognized by Search Components correctly.
- The User Needs to Perform Search from a URL while logged in windows Identity , so that it can be resolved effectively to validate Claims & Security Trimming to work effectively.
Note : Please ensure that Public URL which you want end users to see , should be accessible to Crawler on premise to be able to crawl the site successfully.
4. Windows Claims Only
Cloud SSA or To be Specific the Search Components in SharePoint online at present do not understand any other ACL type other than Windows Claims , This ACL information ,about the Content which is fed along the index is used for Security Trimming the results . SAML Identity Providers are not supported with Cloud SSA . So basically the On-prem Site you are crawling using the Cloud SSA should be using Windows Auth only .
if you search on an extended URL of Web-applications which is ADFS / LDAP or Some other type of Auth other than Windows Claims , No search results will be returned as the logged in user’s Identity cannot be resolved to ACL mapping we have on an Index Item.
5. Index item limits and pricing
Vlad Catrinescu blogged about this on his blog.
For every 1TB of pooled storage in SharePoint Online, we are allowed to put one million index items from our On-Premises SharePoint Farm.
You can check your current searchable items from the Search Service Application in Central Administration
In this example, we would need 13TB of pooled storage in SharePoint Online. This might mean that you have to reconfigure your content sources.
6. Security and compliancy
As your index is stored in Office 365, what does this mean compliancy wise?
This is the answer from Microsoft:
The content that is passed from on-premises to the azure cloud search connector (SCS) consists of crawled properties, keywords, ACLs, tenant info and some other metadata about the item. This is encrypted on-premises using a key supplied by the SCS and transmitted to the endpoint in Azure. Once there it is stored in an encrypted blob store and queued for processing. We retain the encrypted package in the blob store for use should we need to issue a content recrawl. The encrypted object is not the document though, it is just a parsed and filtered version that makes sense to the search engine.
In my opinion it would be wise to consult with your legal department before setting up Cloud Hybrid search to make sure it is okay to store content in the cloud.
You can always modify the content sources to exclude highly classified documents.
7. Licensing and Office 365 accounts
Every user that wants to use the new Cloud Hybrid Search functionality will need an active Office 365 license and an account synchronized from your on-premises Active Directory.
As items are indexed in Office 365, the access control entities are looked up in the cloud directory service.
User SIDs are mapped to PUIDs; Group SIDs are mapped to Object IDs; and and are mapped to .
As the cloud hybrid search is still in preview, documentation is limited at best.
For example, if you are using a proxy for Internet access on your server, make sure to specify this in your machine.config.
I have created a more detailed blog post around this issue here: https://www.sharepointrelated.com/2015/12/11/cloud-hybrid-search-proxy-settings/
The documentation for Cloud Hybrid Search has been greatly improved. You can find all information you need here: https://technet.microsoft.com/en-us/library/dn720906.aspx