Thursday 25 January 2018

Sitecore cache problem when iterating through the content using API

Sitecore uses a number of caches to keep the most frequently used data in memory instead of querying the database every time the API needs that data.

The most important thing to remember is that a cache should keep only the data that is actually needed. However, there are a number of cases when things go wrong and the cache starts keeping data that has been accessed only once. In this case, the next time you need an item that is not stored in the cache, Sitecore will make a database request and cache that item as well. Later, the obsolete data will be removed from the cache (in other words, the cache will recover), but this takes some time. During the recovery period, request processing performance is degraded.
Things might become even worse if the database server is located far from the web servers and latency starts to play its role.

Which operations might cause problems with cache?


I would say any operation that works with a large number of data items that are not supposed to be cached or requested again within a short period of time.

A few examples:

1. Reindexing data. 
In this case, the indexing API iterates through the Sitecore data (tree) and performs an indexing operation. As a result, all indexed items will be added to the Sitecore caches. Just imagine what will happen if you are reindexing the whole content tree with millions of items... The cache will soon overflow and Sitecore will start cleaning it up in the middle of the indexing operation. Thus, instead of just indexing data, processor time is spent on cache operations and further data cleanup.

2. Publishing data
Similarly to indexing, the publishing process reads a large amount of data, so many items that will not be needed again end up in the cache.

3. Sitecore initialization logic
Sitecore initialization logic performs a number of operations that also require access to items, for example, loading localization data or scanning some items to load application settings.


What can we do to improve the situation?

The easiest solution that came to my mind is to use a kind of cache-disabling context that would prevent data from being cached.
Sitecore allows disabling data caches by using Sitecore.Data.DatabaseCacheDisabler:

using(new DatabaseCacheDisabler())
{
  // your code here...
}
All the code executed in the DatabaseCacheDisabler context will neither read data from nor write data to the Sitecore caches. Another good thing is that DatabaseCacheDisabler inherits from the Switcher class, whose state is thread static. This means that the cache-disabling logic applies to the current thread only and will not influence other threads.
The bad thing about this code is that it will not get data from the cache even if it is there! In other words, if some of the items your code needs to work with are already in the cache, Sitecore will not use them and will make a database request instead, introducing a performance penalty.

To improve performance, we would rather get data from the cache if it is already there, but not add new data to the Sitecore caches when we read it from the database.

After some investigation, I have managed to find the switcher that does the trick: Sitecore.Data.CacheWriteDisabler.
The usage of this class is the same as the previous one, but the code in the scope of this disabler will use cached data if it is available.
My performance tests confirmed that the code in the scope of this disabler runs faster than the same code under DatabaseCacheDisabler alone.

Finally, I would like to add a warning about using disablers in Sitecore.
One should remember that if the code inside the disabler not only reads items but also modifies them, the changes made to these items will not appear in the Sitecore caches if the items have already been cached.
Thus, one should use disablers wisely, without breaking cache integrity.



Saturday 6 January 2018

Using PowerShell with Sitecore

Recently, I have been asked whether it is possible to use PowerShell to work with Sitecore services.
I had never tried this before, although I knew about existing modules that allow managing Sitecore via PowerShell.
After brief research, I found a few, but all of them require a custom package to be installed on the Sitecore instance. This is not something I would like to do without knowing all the details about the package being installed. It would be interesting to check what we can do with a clean Sitecore instance.

As a starting point, I installed a clean Sitecore 8.2 Update-3 instance (with the hostname sitecore82u3) and started my experiments.

Requesting an unprotected page using PowerShell

The task seems trivial but still quite useful.
For example, you might have a page that returns some statistics or performs some actions based on the passed parameters. I decided to request the sample layout.aspx page, which is located in the /layouts folder by default.
The PowerShell command looks simple:
Invoke-WebRequest -uri "http://sitecore82u3/layouts/sample layout.aspx"
I got a response with status code 200 and the page content.
Good, but what if I need to request a security-protected page that requires a login before I continue?

Requesting protected page using PowerShell via login dialog

When I request a protected page, e.g. /sitecore/admin/cache.aspx, I receive a 200 response code, but from the content I can figure out that my request has been redirected to the login page:
Invoke-WebRequest -uri "http://sitecore82u3/sitecore/admin/cache.aspx"

Response content fragment:
...
<title>
        Sitecore Login
</title>
...
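
Since the status code is 200 in both cases, a script has to inspect the content to notice the redirect. Below is a minimal sketch of such a check; the function name Test-SitecoreLoginPage is hypothetical, and the title text is simply taken from the response fragment above, so it may need adjusting for other login pages or Sitecore versions.

```powershell
# Hypothetical helper: returns $true when the HTML looks like the Sitecore login page.
function Test-SitecoreLoginPage {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Html
    )
    # \s* tolerates the whitespace and line breaks around the title text.
    return $Html -match '<title>\s*Sitecore Login\s*</title>'
}

# Usage sketch (requires a reachable instance, so it is left commented out):
# $response = Invoke-WebRequest -Uri "http://sitecore82u3/sitecore/admin/cache.aspx" -UseBasicParsing
# if (Test-SitecoreLoginPage -Html $response.Content) { Write-Warning "Redirected to the login page" }
```

Matching on the title is fragile for the same reason any markup-based check is: it only works as long as the login page keeps that title.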

Thus, we need a way to fill in the login credentials before requesting the protected page. The easiest way of doing this is to get the input controls from the page and set some data there:

# define session variable
$session = $null
# url to the login page in admin
$loginUrl = "http://sitecore82u3/sitecore/admin/login.aspx"
$actionResponse = Invoke-WebRequest -uri $loginUrl -SessionVariable session -UseBasicParsing
$fields = @{}
# Search for input fields
$actionResponse.InputFields.ForEach({
  if($_.PSobject.Properties.name -match "Value"){
    $fields[$_.Name] = $_.Value
  }
})
# Set login info. Note: this code is specific to the admin login page. For the standard Sitecore 8.2 Update-3 login page, one should use $fields.UserName and $fields.Password
$fields.LoginTextBox = "sitecore\my_user"
$fields.PasswordTextBox = "my_password"
# Perform POST request with credentials
Invoke-WebRequest -uri $loginUrl -WebSession $session -Method POST -Body $fields -UseBasicParsing
# Using authenticated session make a request to a protected page
(Invoke-WebRequest -uri "http://sitecore82u3/sitecore/admin/cache.aspx" -WebSession $session).Content
As a result, you will see the cache page content.

This approach works, but it depends on the HTML markup of the login page, which might change at some point and break the script. A more robust approach would be preferable.

Requesting protected page using PowerShell via login endpoint

After some investigation, I noticed that Sitecore includes the Sitecore Client Services (SSC) component by default. This component makes it easy to create your own services and also provides a login endpoint. This is exactly what I need.
To authenticate the user, we just need to update our script to use the service instead of working with the login page:
# define session variable
$session = $null
# URL of the Sitecore Client Services login endpoint
$loginUrl = "https://sitecore82u3/sitecore/api/ssc/auth/login"
$params = @{
    "domain"   = "sitecore";
    "username" = "my_user";
    "password" = "my_password";
}
# Perform POST request with credentials
$actionResponse = Invoke-WebRequest -uri $loginUrl -SessionVariable session -Method POST -Body $params -UseBasicParsing
# Using authenticated session make a request to a protected page
(Invoke-WebRequest -uri "http://sitecore82u3/sitecore/admin/cache.aspx" -WebSession $session).Content
After executing this script, I was able to see the content of the protected cache page.
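
The script above keeps the password in plain text. As a possible refinement, the request body can be built from a PSCredential object (for example, one returned by Get-Credential). This is only a sketch: the function name New-SitecoreLoginBody is hypothetical, and it assumes the user name is entered in the domain\user form, e.g. sitecore\my_user.

```powershell
# Hypothetical helper: builds the login endpoint body from a PSCredential.
# Assumes the user name has the form "domain\user" (e.g. sitecore\my_user).
function New-SitecoreLoginBody {
    param(
        [Parameter(Mandatory)]
        [System.Management.Automation.PSCredential]$Credential
    )
    # Split on the first backslash only: left part is the domain, right part is the user.
    $parts = $Credential.UserName -split '\\', 2
    if ($parts.Count -ne 2) {
        throw "Expected a user name in the form domain\user"
    }
    return @{
        domain   = $parts[0]
        username = $parts[1]
        # GetNetworkCredential() exposes the password as plain text for the request body.
        password = $Credential.GetNetworkCredential().Password
    }
}

# Usage sketch (prompts for credentials, so it is left commented out):
# $params = New-SitecoreLoginBody -Credential (Get-Credential)
# Invoke-WebRequest -Uri $loginUrl -SessionVariable session -Method POST -Body $params -UseBasicParsing
```

The password is still sent in the request body, so this only avoids storing it in the script file.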

This approach looks more secure, since the login endpoint requires a configured SSL connection and will not accept plain HTTP connections.
Another important note about this endpoint: by default, it is configured to allow local connections only and will reject all remote ones.
If you need to connect to a remote Sitecore solution, I can see at least two options:

  1. Use PowerShell to connect to the remote server and proxy the requests, so that the actual requests are made by the remote server to its local Sitecore instance.
  2. Configure Sitecore Client Services to process all requests by changing the value of the "Sitecore.Services.SecurityPolicy" setting in the "Sitecore.Services.Client.config" configuration file.

To me, the first option seems more appropriate. It does not open a potential security hole and does not require changing the Sitecore configuration.
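
The first option could be sketched with PowerShell remoting as follows. This is an illustration under assumptions rather than a tested setup: it assumes PowerShell remoting is enabled on the server, that the Sitecore host name resolves to the local instance when used on that server, and the function name Invoke-RemoteSitecoreRequest and the server name in the usage comment are hypothetical.

```powershell
# Hypothetical wrapper: runs the login and the page request on the remote server itself,
# so that Sitecore sees them as local connections.
function Invoke-RemoteSitecoreRequest {
    param(
        [Parameter(Mandatory)][string]$ComputerName,   # remote server hosting Sitecore (assumption)
        [Parameter(Mandatory)][hashtable]$LoginBody,   # domain / username / password
        [Parameter(Mandatory)][string]$LoginUrl,       # login endpoint URL as seen from the server
        [Parameter(Mandatory)][string]$PageUrl,        # protected page URL as seen from the server
        [System.Management.Automation.PSCredential]$RemotingCredential
    )

    # This script block is executed on the remote server.
    $remoteScript = {
        param($body, $loginUrl, $pageUrl)
        # Authenticate against the login endpoint and keep the cookies in $session
        Invoke-WebRequest -Uri $loginUrl -SessionVariable session -Method POST -Body $body -UseBasicParsing | Out-Null
        # Reuse the authenticated session to request the protected page
        (Invoke-WebRequest -Uri $pageUrl -WebSession $session -UseBasicParsing).Content
    }

    $invokeArgs = @{
        ComputerName = $ComputerName
        ScriptBlock  = $remoteScript
        ArgumentList = @($LoginBody, $LoginUrl, $PageUrl)
    }
    if ($RemotingCredential) { $invokeArgs.Credential = $RemotingCredential }

    Invoke-Command @invokeArgs
}

# Usage sketch ("my-sitecore-server" is a placeholder):
# Invoke-RemoteSitecoreRequest -ComputerName "my-sitecore-server" `
#     -LoginBody @{ domain = "sitecore"; username = "my_user"; password = "my_password" } `
#     -LoginUrl "https://sitecore82u3/sitecore/api/ssc/auth/login" `
#     -PageUrl "http://sitecore82u3/sitecore/admin/cache.aspx"
```

Only the page content travels back to the calling machine; the authenticated web session lives and dies on the remote server.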

In addition, it is worth mentioning that with the described approach you can call Sitecore services based on the Sitecore Client Services component, the WebAPI component, or any other services protected by Sitecore security.


PS: while searching for the Sitecore Client Services login endpoint, I also noticed that the Sitecore WebAPI component provides its own login endpoint: "authenticate". However, I decided not to describe its usage here. The component has not been updated for a long time, and the main development has shifted to the more powerful SSC component.
