The service is configured with a managed account that has the correct permissions.
Once started, it runs for about 3 minutes before it goes down. The events with source .NET Runtime record the actual crash of the service; they appear just 2 lines above event ID 0, which marks its start.
A closer look at the errors:
EventID 1000:
The following command shows the host as online, though (server name and user details are removed for confidentiality):
Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | select Server, Status
If you try to get the cache cluster health while the Windows service is running, you can't connect:
Health Analyzer shows nothing related to the Distributed Cache. The Services on Server page shows it as running... but you can't really do more about it. I decided to recreate the instance as the quickest fix:
The steps to do that are:
1. Remove-SPDistributedCacheServiceInstance
2. Add-SPDistributedCacheServiceInstance
The service then warms up for a few minutes, after which it's time to check its health:
3. Get-CacheClusterHealth
If any Healthy value is not 10.00, some fractions are still unallocated; wait a couple more minutes to give the service time to warm up, then run the command again.
Here’s the nice output that we’d expect from a healthy instance (all Healthy = 10.00):
Cluster health statistics
=========================
HostName = <localhost>
-------------------------
NamedCache = DistributedActivityFeedLMTCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedDefaultCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedActivityFeedCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedSecurityTrimmingCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedAccessCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedLogonTokenCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedViewStateCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedServerToAppServerAccessTokenCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedBouncerCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
NamedCache = DistributedSearchCache_6275b5f8-662d-4d06-bb63-ff3ab18a0e21
Healthy = 10.00
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00
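With ten named caches, checking every Healthy line by eye gets tedious. Here is a minimal sketch of a helper that does the check on the text output. Test-CacheClusterHealthy is a hypothetical name of my own, not a SharePoint or AppFabric cmdlet, and it assumes the output looks like the listing above, with one "Healthy = <number>" line per named cache.

```powershell
# Hypothetical helper (not part of SharePoint or AppFabric): returns $true only
# when every "Healthy = <n>" line in the given text reports 10.00.
function Test-CacheClusterHealthy {
    param(
        [string[]]$Lines
    )

    $found = 0
    foreach ($line in $Lines) {
        # Match lines such as "Healthy = 10.00" and capture the number.
        if ($line -match '^\s*Healthy\s*=\s*(\d+(?:\.\d+)?)\s*$') {
            $found++
            $value = [double]::Parse($Matches[1], [cultureinfo]::InvariantCulture)
            if ($value -ne 10.0) {
                # At least one named cache is not fully healthy yet.
                return $false
            }
        }
    }

    # No "Healthy" lines at all means there was nothing to verify.
    return ($found -gt 0)
}

# Possible use in the SharePoint Management Shell. Assumptions: Use-CacheCluster
# can connect the session, and Out-String yields text like the listing above.
#   Use-CacheCluster
#   $text = (Get-CacheClusterHealth | Out-String) -split "\r?\n"
#   Test-CacheClusterHealthy -Lines $text
```

If it returns False, wait a couple of minutes and run it again, as with the manual check.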
That's it. The service is now running stably. My guess at the root cause is that there was not enough memory in the first place, since I saw a warning to that effect in the event logs prior to the issue, but that's to be confirmed.