Sunday, 31 July 2011

Monitoring Windows Services – Automatic, Manual, and Disabled, using CheckStartupType


The Basic Service Unit Monitor is a very common monitor type to check the running status of any Windows Service.

By default, this monitor ONLY monitors the service if its Startup Type is set to “Automatic”.
This is because many services are set to manual or disabled by design, and we don’t want to consider those as a “failed” state creating noise out of the box.  Therefore – they are ignored.

Probably the biggest complaint about this behavior is the UI.  Health Explorer will show “Healthy” for the service monitor EVEN if the service is not running, or doesn’t exist.  Let me explain.  If the service is set to Manual or Disabled, and not running, the monitor will initialize, ignore the service, and show healthy.  This is probably not the best behavior, and it would be nice if we could control this to show a warning state or an unmonitored state, but that is another topic.  Additionally, if the service does not exist, the monitor will also show as healthy.  It is simply ignored.
So – to recap – the default Service Monitor will only monitor Automatic startup type services:
Startup type      Service state   Monitor state
Automatic         Running         Healthy
Automatic         Not Running     Not Healthy
Manual            Running         Healthy
Manual            Not Running     Healthy
Disabled          Not Running     Healthy
Does Not Exist    Not Running     Healthy


The PROPER way to monitor a service, NO MATTER the startup type, is to OVERRIDE the Unit monitor, setting “Alert only if service startup type is automatic” to “False”.

Doing the above will now monitor the service no matter the startup type setting.  It will ignore the startup type and only check whether the service is running.
Using the override set to false:
Startup type      Service state   Monitor state
Automatic         Running         Healthy
Automatic         Not Running     Not Healthy
Manual            Running         Healthy
Manual            Not Running     Not Healthy
Disabled          Not Running     Not Healthy
Does Not Exist    Not Running     Not Healthy


Now – let me explain why and how this works.
The Basic Service Monitor utilizes a specific MonitorType.  The MonitorType is “CheckNTServiceStateMonitorType” from the Microsoft.Windows.Library.  This MonitorType contains Member Modules of a DataSource, two expression based condition detections, and a Probe.
The datasource is “Win32ServiceInformationProvider” which is a native module to inspect a Windows Service.  In the datasource, we will pass the ComputerName, the ServiceName, the Frequency, and the CheckStartupType.  The Frequency default is 60 seconds… so we will inspect the service running state every 60 seconds.  The “CheckStartupType” is simply a value of True or False, to examine the startup type or not.
The two condition detections are based on System.ExpressionFilter, which is a simple expression.  This is where “CheckStartupType” comes into play.
The “ServiceRunning” CD (Condition Detection) uses a compound expression.  We consider the monitor healthy (ServiceRunning) when:
( ( CheckStartupType does not equal false ) AND ( StartMode does not equal 2 ) ) OR ( State = 4 )
In these expressions, StartMode 2 is Automatic and State 4 is Running.
Here you can clearly see why we treat disabled or non-existent services as healthy when CheckStartupType = True (which is the default).
When we override CheckStartupType to false, we can see why they change to Unhealthy: this condition will no longer match.

The “ServiceNotRunning” CD (Condition Detection) uses a compound expression.  We consider the monitor unhealthy (ServiceNotRunning) when:
( ( StartMode = 2 ) OR ( ( CheckStartupType = false ) AND ( StartMode does not equal 2 ) ) ) AND ( State does not equal 4 )
So for a service to be considered “Not Running”, its State must not equal 4 (that is, it is not running) *AND* it must also be ONE of the following: set to Automatic, *OR* set to Manual/Disabled with CheckStartupType = false.
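The two expressions can be checked with a small truth-table sketch. This is a Python model of the logic only, not SCOM code, and the function names are hypothetical. Only StartMode 2 and State 4 come from the expressions themselves. The values used for Manual (3), Disabled (4) and stopped (1) are the usual Win32 service constants and are an assumption about what the data source returns, and representing a non-existent service as None for both properties is likewise an assumption made for illustration.

```python
AUTOMATIC = 2  # StartMode value tested by both expressions
RUNNING = 4    # State value tested by both expressions


def service_running(check_startup_type, start_mode, state):
    """ServiceRunning: (CheckStartupType != false AND StartMode != 2) OR State = 4."""
    return (check_startup_type != "false" and start_mode != AUTOMATIC) or state == RUNNING


def service_not_running(check_startup_type, start_mode, state):
    """ServiceNotRunning: (StartMode = 2 OR (CheckStartupType = false AND StartMode != 2)) AND State != 4."""
    return (
        start_mode == AUTOMATIC
        or (check_startup_type == "false" and start_mode != AUTOMATIC)
    ) and state != RUNNING


def monitor_state(check_startup_type, start_mode, state):
    """Return the health state implied by whichever condition detection matches."""
    running = service_running(check_startup_type, start_mode, state)
    not_running = service_not_running(check_startup_type, start_mode, state)
    # The two expressions are logical complements, so exactly one matches.
    assert running != not_running
    return "Healthy" if running else "Not Healthy"


# (label, StartMode, State); values other than 2 and 4 are illustrative assumptions.
CASES = [
    ("Automatic / Running", 2, 4),
    ("Automatic / Not Running", 2, 1),
    ("Manual / Running", 3, 4),
    ("Manual / Not Running", 3, 1),
    ("Disabled / Not Running", 4, 1),
    ("Does Not Exist", None, None),
]


def health_table(check_startup_type):
    """Map each case label to its health state for a given CheckStartupType value."""
    return {
        label: monitor_state(check_startup_type, mode, state)
        for label, mode, state in CASES
    }


if __name__ == "__main__":
    for setting in ("true", "false"):
        print(f"CheckStartupType = {setting}")
        for label, result in health_table(setting).items():
            print(f"  {label:<26}{result}")
```

Under those assumptions, health_table("true") and health_table("false") reproduce the two tables shown earlier, which is a quick way to confirm that overriding CheckStartupType is the only thing that changes the outcome for Manual, Disabled, and missing services.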

Ok – that explains the Monitor and how/why it works as it does, with and without the overrides.

There are some blogs out there which document the ability to edit the XML and set CheckStartupType to false.  This hard-codes the CheckStartupType value.  I don’t recommend doing this, for a few reasons:
1.  Using the override gives you more granular control over which agents the setting applies to.
2.  If you ever EDIT the monitor again in any way using the UI (even to change something simple like an alert property, severity, etc…) this will force the XML back to true and break your monitoring.  That is simply because the UI expects this setting.  As you can see – using the override in this case is far more effective.

Let’s look at the XML of a Service Unit Monitor.
When we create the Service Monitor using the UI – it will look like the following:

      <UnitMonitor ID="UIGeneratedMonitor8b9d2b9c2ada46a284429b5569b8185b" Accessibility="Public" Enabled="true" Target="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="MicrosoftWindowsLibrary6172210!Microsoft.Windows.CheckNTServiceStateMonitorType" ConfirmDelivery="false">
        <Category>Custom</Category>
        <OperationalStates>
          <OperationalState ID="UIGeneratedOpStateId8f7f4049ca124f9db3e4c0a4b3a1c730" MonitorTypeStateID="Running" HealthState="Success" />
          <OperationalState ID="UIGeneratedOpStateId98d7e3348650477598849feb6776f583" MonitorTypeStateID="NotRunning" HealthState="Warning" />
        </OperationalStates>
        <Configuration>
          <ComputerName>$Target/Host/Property[Type="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
          <ServiceName>Spooler</ServiceName>
          <CheckStartupType>true</CheckStartupType>
        </Configuration>
      </UnitMonitor>

When we create the Service Monitor using the Authoring Console – it will look like the following:

      <UnitMonitor ID="Spooler.Auth.SpoolerSrv" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType" ConfirmDelivery="false">
        <Category>AvailabilityHealth</Category>
        <OperationalStates>
          <OperationalState ID="Running" MonitorTypeStateID="Running" HealthState="Success" />
          <OperationalState ID="NotRunning" MonitorTypeStateID="NotRunning" HealthState="Warning" />
        </OperationalStates>
        <Configuration>
          <ComputerName />
          <ServiceName>Spooler</ServiceName>
          <CheckStartupType />
        </Configuration>
      </UnitMonitor>

Note that the two use slightly different methods to set the CheckStartupType value, but both have the same effect: setting it to true.
If the Monitor has NO configuration for CheckStartupType, then the override will not work and the monitor will always assume “True”.
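Since a missing or hard-coded CheckStartupType changes how the override behaves, it can help to audit an exported management pack for these cases. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function names, the sample IDs, and the <Monitors> wrapper in the sample are hypothetical, and the sketch assumes the exported XML uses no XML namespaces.

```python
import xml.etree.ElementTree as ET

SERVICE_MONITOR_TYPE = "Microsoft.Windows.CheckNTServiceStateMonitorType"


def check_startup_type_status(monitor):
    """Describe how a UnitMonitor element configures CheckStartupType."""
    element = monitor.find("Configuration/CheckStartupType")
    if element is None:
        return "missing"            # no element at all: the override will not work
    value = (element.text or "").strip().lower()
    if not value:
        return "empty"              # <CheckStartupType />, the Authoring Console form
    return "hard-coded false" if value == "false" else value


def audit_service_monitors(xml_text):
    """Return {monitor ID: CheckStartupType status} for every service unit monitor."""
    root = ET.fromstring(xml_text)
    report = {}
    for monitor in root.iter("UnitMonitor"):
        # TypeID looks like "Alias!TypeName"; the alias differs per management pack.
        type_name = monitor.get("TypeID", "").split("!")[-1]
        if type_name == SERVICE_MONITOR_TYPE:
            report[monitor.get("ID")] = check_startup_type_status(monitor)
    return report


# Hypothetical sample data; only the element names mirror the monitors shown above.
SAMPLE = """
<Monitors>
  <UnitMonitor ID="Example.UiStyle" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration>
      <ServiceName>Spooler</ServiceName>
      <CheckStartupType>true</CheckStartupType>
    </Configuration>
  </UnitMonitor>
  <UnitMonitor ID="Example.ConsoleStyle" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration>
      <ServiceName>Spooler</ServiceName>
      <CheckStartupType />
    </Configuration>
  </UnitMonitor>
  <UnitMonitor ID="Example.HardCoded" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration>
      <ServiceName>Spooler</ServiceName>
      <CheckStartupType>false</CheckStartupType>
    </Configuration>
  </UnitMonitor>
  <UnitMonitor ID="Example.Missing" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType">
    <Configuration>
      <ServiceName>Spooler</ServiceName>
    </Configuration>
  </UnitMonitor>
  <UnitMonitor ID="Example.OtherType" TypeID="Example!Some.Other.MonitorType">
    <Configuration />
  </UnitMonitor>
</Monitors>
"""

report = audit_service_monitors(SAMPLE)
for monitor_id, status in report.items():
    print(f"{monitor_id}: {status}")
```

Anything reported as "missing" or "hard-coded false" is a candidate for review: the first will not respond to the override, and the second is the hard-coded edit that a later change in the UI would undo.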

So, if you want to monitor services set to something other than Automatic, use the override.  It is the best way.  Editing the XML and hard coding to false will also work, but your changes will be lost if anyone edits the monitor in any way in the future.  Using the override, this will not happen.

There are some advanced scenarios where the basic design won’t work well.  The scenario that comes to mind is one where you want to monitor a service with a manual startup type, but the service is clustered, so you get alerts from the passive node.  This is caused when you target your service monitor at a non-cluster aware class, such as “Windows Server Operating System”.  In those cases, you should create a new class that is cluster aware, and then target your service monitor at the new custom class.  Take a look at “SQL DBEngine”, which behaves perfectly in this way.
You should target your service monitors to the appropriate class.  You should NEVER use “Windows Computer” or “Windows Server” as a monitoring target.  If you use a widespread generic class, like “Windows Server Operating System”, you must ONLY monitor a service that would exist on ALL Windows Server Operating Systems.  If it doesn’t, then you will see false monitoring conditions, or create an unhealthy state for a computer which does not have the service.  In those cases, you should enable your monitor only for a group of systems, or (better) create a new class of systems that will always contain that service or application.
Lastly – you could create some advanced MonitorTypes if you don’t like this one.  Use the existing MonitorType as an example, and then change the Expression based Condition Detections as you see fit.  You could make a MonitorType that ignores Disabled, but does monitor Auto and Manual services by default, quite easily.
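As a rough illustration of what such a variant's two expressions might look like, here is a Python model of the logic only, not a MonitorType definition. The function names are hypothetical. The sketch assumes the data source reports Disabled as StartMode 4 (the Win32 value), and it represents a non-existent service as None for both properties, which is an assumption made for illustration.

```python
RUNNING = 4   # State value the stock expressions compare against
DISABLED = 4  # assumed StartMode value for a Disabled service


def variant_service_running(start_mode, state):
    """Healthy when the service is running, or when it is Disabled (ignored)."""
    return state == RUNNING or start_mode == DISABLED


def variant_service_not_running(start_mode, state):
    """Unhealthy when the service is not Disabled and is not running."""
    return start_mode != DISABLED and state != RUNNING


# (label, StartMode, State); 2 = Automatic, 3 = Manual (assumed), 1 = stopped (assumed).
for label, mode, state in [
    ("Automatic / Not Running", 2, 1),
    ("Manual / Not Running", 3, 1),
    ("Disabled / Not Running", 4, 1),
    ("Does Not Exist", None, None),
]:
    unhealthy = variant_service_not_running(mode, state)
    print(f"{label:<26}{'Not Healthy' if unhealthy else 'Healthy'}")
```

The two functions are logical complements, so exactly one of them matches for any input. Under the None assumption, a missing service would land in the unhealthy state with this variant, so whether that is desirable depends on how the monitor is targeted.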
Probably my only complaint in all of this, is that by default, when a service does not exist on a machine, we show the monitor as healthy.  To me, we should have some other condition detection capability to consider this an unhealthy condition.


Kevin Holman
