The most important thing here to remember is that the cache should keep the necessary data only. However, there are a number of cases when things might get wrong and cache will start keeping data that has been accessed once. In this case, next time, you need the item which is not stored in cache, Sitecore will make a database request and cache the item again. Later, the obsolete data will be removed from the cache (in other words, the cache will be recovered) but this requires some time. During the recovering time, the performance of the processing requests will be decreased.
Things might become even worth in case the database server is located far from the Web servers and latency start playing its role.
Which operations might cause problems with cache?
I would say that any that work with a great number of data items that are not supposed to be cached or requested again during a short period of time.
A few examples:
1. Reindexing data.
In this case, the indexing API iterates thought the Sitecore data (tree) and perform an indexing operation. As a result, all indexing items will be added to Sitecore caches. Just imagine, what will happen in case you are reindexing the whole content tree with millions of items... The cache will be overfilled soon and Sitecore will start cleaning it up in the middle of the indexing operation. Thus, instead of just indexing data, the processor time is spent on cache operations and further data clean up.
2. Publishing data
Similarly to the indexing, the publishing process is accessing a number of data, thus a number of odd items are cached.
3. Sitecore initialization logic
Sitecore initialization logic performs a number of operations that also require access to the items. For example loading localization data, scanning some items to load application settings, etc.
What can we do to improve the situation?
The easiest solution that came to my mind is using a sort of cache disabling context that would prevent data to be cached.Sitecore allows disabling data caches by using Sitecore.Data.DatabaseCacheDisabler:
using(new DatabaseCacheDisabler())All the code, executed in DatabaseCacheDisabler context will not get and put data from \ to Sitecore caches. Another good thing is that DatabaseCacheDisabler inherits from Switcher class that is thread static. This means that cache disabling logic will be performed in current context only and will not influence other threads.
{
// your code here...
}
The bad thing about this code is that it will not get data from the cache even if it is there! In other words, if part of the items, you need to work with, from your code is already in cache Sitecore will not use them and make a request to a database by introducing a performance penalty.
To improve the performance we would rather want to get data from cache if it is already there but do not add new data to Sotecore caches if we read it from a database.
After some investigation, I have managed to find the switcher that does the trick: Sitecore.Data.CacheWriteDisabler.
The usage of this class is quite the same as the previous one but the code in the scope of this disabler will be using cached data if it is available.
Performance tests confirmed that the code in the scope of this disabler works faster than when I was using just DatabaseCacheDisabler.
At the end, I would like to warn about using of disablers in Sitecore.
One should remember that in case the code inside the disabler not only reads the items but also modifies them, then changes you did with these items will not appear in Sitecore caches if these items have alreay been cached.
Thus, one should use disablers wisely without breaking the cache integrity.