I had a customer looking for an example of how SCOM can monitor a server for multiple reboots in a period of time.
I previously wrote about the typical scenario of looking for repeated events in a defined time period here: http://blogs.technet.com/b/kevinholman/archive/2014/12/18/creating-a-repeated-event-detection-rule.aspx
However, this won't work across reboots. The Consolidator condition detection that keeps a count of repeated events over time does so in memory, on the agent. If the agent service or the server is restarted, the count is lost because the workflow must reinitialize.
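To make the difference concrete, here is a minimal Python sketch (not SCOM code) contrasting a counter held in memory, which starts from zero whenever the process restarts, with a count recomputed from a persisted log. `InMemoryCounter` and `count_from_log` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta


class InMemoryCounter:
    """Stands in for a consolidator-style count: state lives only in this object."""

    def __init__(self) -> None:
        self.count = 0

    def on_event(self) -> int:
        self.count += 1
        return self.count


def count_from_log(log: list[datetime], now: datetime, minutes: int) -> int:
    """Recount entries from persisted timestamps that fall within the last `minutes`."""
    start = now - timedelta(minutes=minutes)
    return sum(1 for ts in log if ts > start)


if __name__ == "__main__":
    now = datetime(2015, 1, 1, 12, 0)
    persisted_log: list[datetime] = []  # survives restarts, like the System log

    for offset in (15, 8, 1):  # three boots; each one restarts the agent
        persisted_log.append(now - timedelta(minutes=offset))
        counter = InMemoryCounter()  # the workflow reinitializes after each boot
        print("in-memory count:", counter.on_event())  # always 1

    print("count from log:", count_from_log(persisted_log, now, 20))  # 3
```

Each simulated boot creates a fresh counter, so the in-memory count never gets past 1, while recounting from the log sees all three boots. That is the reason the approach here reads the event log instead of keeping state in the workflow.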
One way to handle this is with a script write action. A reboot is typically detected via event 6009 in the System log. (Dirty shutdowns can be detected via event 6008, and you should already be monitoring for those.) However, in this example we don't want an alert on every normal reboot; we only want to know if a server is rebooted multiple times within a specific time period.
We can accomplish this via two rules.
The first rule uses an event data source, but instead of alerting, it runs a script write action in response to the event. The script is a simple VBScript that searches the System log over a specified time window and counts the matching events. The second rule alerts on the event the script writes when that count reaches the threshold.
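The counting logic the script performs can be sketched in a few lines of Python. This is an illustration of the decision only, assuming events are available as simple records; `EventRecord` and `check_event_log` are hypothetical names, and `minutes` is a positive window length here rather than the negative offset the VBScript uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EventRecord:
    """Simplified System log entry holding only the fields the check uses."""
    event_code: int
    time_written: datetime


def check_event_log(
    events: list[EventRecord], event_id: int, count: int, minutes: int, now: datetime
) -> tuple[str, int]:
    """Return ("CRITICAL", n) if event_id appears at least `count` times in the
    last `minutes` minutes, otherwise ("INFO", n)."""
    start = now - timedelta(minutes=minutes)
    n = sum(1 for e in events if e.event_code == event_id and e.time_written > start)
    return ("CRITICAL" if n >= count else "INFO"), n


if __name__ == "__main__":
    now = datetime(2015, 1, 1, 12, 0)
    log = [EventRecord(6009, now - timedelta(minutes=m)) for m in (18, 9, 2)]
    log.append(EventRecord(6009, now - timedelta(minutes=45)))  # outside the window
    log.append(EventRecord(6008, now - timedelta(minutes=5)))   # different event ID
    print(check_event_log(log, 6009, 3, 20, now))  # ('CRITICAL', 3)
```

Only the three 6009 events inside the 20-minute window count; the older 6009 and the 6008 are ignored, so the threshold of 3 is met.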
Here is the rule:
<Rule ID="Custom.Example.EventLogCheck.Event6009.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Custom</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>System</LogName>
      <Expression>
        <SimpleExpression>
          <ValueExpression>
            <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
          </ValueExpression>
          <Operator>Equal</Operator>
          <ValueExpression>
            <Value Type="UnsignedInteger">6009</Value>
          </ValueExpression>
        </SimpleExpression>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="ScriptWriteAction" TypeID="Custom.Example.EventLogCheck.WA" />
  </WriteActions>
</Rule>
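The shape of this rule (filter one log on EventDisplayNumber, then hand each match to a write action instead of raising an alert) can be modeled as a small Python function. This is a conceptual sketch, not how the SCOM agent is implemented; `run_event_rule` is a hypothetical name, and the `"log"` key is a made-up stand-in for the data source's `<LogName>` setting.

```python
from typing import Callable, Iterable


def run_event_rule(
    events: Iterable[dict],
    log_name: str,
    event_number: int,
    write_action: Callable[[dict], None],
) -> int:
    """Hand each matching event to write_action and return the match count.

    The data source reads one log and keeps items whose EventDisplayNumber
    equals event_number; the rule then runs a write action, not an alert.
    """
    matched = 0
    for item in events:
        # "log" is a hypothetical key standing in for the data source's <LogName>
        if item["log"] == log_name and item["EventDisplayNumber"] == event_number:
            write_action(item)
            matched += 1
    return matched


if __name__ == "__main__":
    received: list[dict] = []
    sample = [
        {"log": "System", "EventDisplayNumber": 6009},
        {"log": "System", "EventDisplayNumber": 7036},
        {"log": "Application", "EventDisplayNumber": 6009},
    ]
    print(run_event_rule(sample, "System", 6009, received.append))  # 1
```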
The script is very simple. To reuse it, just change the event ID, count, and time window at the top. You might also need to customize the events created by LogScriptEvent to suit your needs and to provide a good alert message.
When the threshold is reached (3 events in this example), the script logs the detection with this call:
Call oAPI.LogScriptEvent("CheckEventLog.vbs",1001,1,": CRITICAL : Event " & EventId & " has been detected " & Count & " or more times in the past " & Abs(Minutes) & " minutes")
This writes an Error-level event (severity 1) with ID 1001 to the Operations Manager event log on the agent, with a description containing "CRITICAL".
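Because the script stores `Minutes` as a negative offset (-20) for `DateAdd`, concatenating it directly into the message would read "in the past -20 minutes"; taking the absolute value avoids that. The Python sketch below builds the same message text to show the result. `critical_description` is a hypothetical helper, and it covers only the description argument, not any text the agent may add when it writes the event.

```python
def critical_description(event_id: int, count: int, minutes_offset: int) -> str:
    """Build the CRITICAL message text passed to LogScriptEvent.

    minutes_offset is negative (e.g. -20), matching the script's DateAdd
    offset, so abs() keeps the message from saying "past -20 minutes".
    """
    return (
        f": CRITICAL : Event {event_id} has been detected {count}"
        f" or more times in the past {abs(minutes_offset)} minutes"
    )


if __name__ == "__main__":
    print(critical_description(6009, 3, -20))
```

The "CRITICAL" token in this text is what the alert rule later matches on, so keep it if you customize the message.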
Here is the script:
'==========================================================================
'
' NAME: CheckEventLog.vbs
'
' COMMENT: This is a write action script to inspect the event log for previous events
'
' Change the values for EventId, Count, and Minutes for your write action example (minutes is expressed as a negative number offset)
'
'==========================================================================
Option Explicit
SetLocale("en-us")

Dim EventId, Count, Minutes
EventId = 6009
Count = 3
Minutes = -20

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")

Dim strComputer
'The script will always be run on the machine that generated the original event
strComputer = "."

'Check to see if this event has been logged x occurrences in n minutes
Dim dtmStartDate, iCount, colEvents, objWMIService, objEvent
Const CONVERT_TO_LOCAL_TIME = True
Set dtmStartDate = CreateObject("WbemScripting.SWbemDateTime")
'Start of the window: now plus the (negative) Minutes offset, as local time
dtmStartDate.SetVarDate DateAdd("n", Minutes, Now), CONVERT_TO_LOCAL_TIME
iCount = 0

Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate,(Security)}!\\" _
    & strComputer & "\root\cimv2")
Set colEvents = objWMIService.ExecQuery _
    ("Select * from Win32_NTLogEvent Where Logfile = 'SYSTEM' and " _
    & "TimeWritten > '" & dtmStartDate & "' and EventCode = " & EventId)

'Count matching events inside the window
For Each objEvent In colEvents
    iCount = iCount + 1
Next

'Abs() keeps the messages from reporting a negative number of minutes
If iCount >= Count Then
    Call oAPI.LogScriptEvent("CheckEventLog.vbs", 1001, 1, ": CRITICAL : Event " & EventId & " has been detected " & Count & " or more times in the past " & Abs(Minutes) & " minutes")
    WScript.Quit
End If

Call oAPI.LogScriptEvent("CheckEventLog.vbs", 1002, 0, ": INFO : Event " & EventId & " was detected, but has not been detected " & Count & " or more times in the past " & Abs(Minutes) & " minutes")
WScript.Quit
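The WQL filter compares `TimeWritten` against the string form of the `SWbemDateTime` object, which is a CIM/DMTF datetime (`yyyymmddHHMMSS.mmmmmmsUUU`, where `sUUU` is the UTC offset in minutes). The Python sketch below reproduces that format to show roughly what the query looks like for a 20-minute window. `to_dmtf` and `build_query` are hypothetical helpers, and the UTC-6 time zone is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def to_dmtf(dt: datetime) -> str:
    """Format an aware datetime as yyyymmddHHMMSS.mmmmmmsUUU (offset in minutes)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    offset_min = int(offset.total_seconds() // 60)
    sign = "+" if offset_min >= 0 else "-"
    return f"{dt.strftime('%Y%m%d%H%M%S')}.{dt.microsecond:06d}{sign}{abs(offset_min):03d}"


def build_query(event_id: int, minutes_offset: int, now: datetime) -> str:
    """Build the same WQL text as the script; minutes_offset is negative, e.g. -20."""
    start = now + timedelta(minutes=minutes_offset)
    return (
        "Select * from Win32_NTLogEvent Where Logfile = 'SYSTEM' and "
        f"TimeWritten > '{to_dmtf(start)}' and EventCode = {event_id}"
    )


if __name__ == "__main__":
    now = datetime(2015, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=-6)))
    print(build_query(6009, -20, now))
    # Select * from Win32_NTLogEvent Where Logfile = 'SYSTEM' and
    #   TimeWritten > '20150101114000.000000-360' and EventCode = 6009
```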
We just need to wrap this up into a write action:
<WriteActionModuleType ID="Custom.Example.EventLogCheck.WA" Accessibility="Public" Batching="false">
  <Configuration />
  <ModuleImplementation Isolation="Any">
    <Composite>
      <MemberModules>
        <WriteAction ID="ScriptWrite" TypeID="Windows!Microsoft.Windows.ScriptWriteAction">
          <ScriptName>CheckEventLog.vbs</ScriptName>
          <Arguments />
          <ScriptBody><![CDATA[
'==========================================================================
'
' NAME: CheckEventLog.vbs
'
' COMMENT: This is a write action script to inspect the event log for previous events
'
' Change the values for EventId, Count, and Minutes for your write action example (minutes is expressed as a negative number offset)
'
'==========================================================================
Option Explicit
SetLocale("en-us")

Dim EventId, Count, Minutes
EventId = 6009
Count = 3
Minutes = -20

Dim oAPI
Set oAPI = CreateObject("MOM.ScriptAPI")

Dim strComputer
'The script will always be run on the machine that generated the original event
strComputer = "."

'Check to see if this event has been logged x occurrences in n minutes
Dim dtmStartDate, iCount, colEvents, objWMIService, objEvent
Const CONVERT_TO_LOCAL_TIME = True
Set dtmStartDate = CreateObject("WbemScripting.SWbemDateTime")
dtmStartDate.SetVarDate DateAdd("n", Minutes, Now), CONVERT_TO_LOCAL_TIME
iCount = 0

Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate,(Security)}!\\" _
    & strComputer & "\root\cimv2")
Set colEvents = objWMIService.ExecQuery _
    ("Select * from Win32_NTLogEvent Where Logfile = 'SYSTEM' and " _
    & "TimeWritten > '" & dtmStartDate & "' and EventCode = " & EventId)

For Each objEvent In colEvents
    iCount = iCount + 1
Next

If iCount >= Count Then
    Call oAPI.LogScriptEvent("CheckEventLog.vbs", 1001, 1, ": CRITICAL : Event " & EventId & " has been detected " & Count & " or more times in the past " & Abs(Minutes) & " minutes")
    WScript.Quit
End If

Call oAPI.LogScriptEvent("CheckEventLog.vbs", 1002, 0, ": INFO : Event " & EventId & " was detected, but has not been detected " & Count & " or more times in the past " & Abs(Minutes) & " minutes")
WScript.Quit
]]></ScriptBody>
          <TimeoutSeconds>60</TimeoutSeconds>
        </WriteAction>
      </MemberModules>
      <Composition>
        <Node ID="ScriptWrite" />
      </Composition>
    </Composite>
  </ModuleImplementation>
  <InputType>System!System.BaseData</InputType>
</WriteActionModuleType>
Lastly, we create a simple alert-generating rule that watches the Operations Manager event log and alerts on event ID 1001 with source “Health Service Script” where the EventDescription contains “CRITICAL”.
<Rule ID="Custom.Example.EventLogCheck.MultipleReboots.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>Operations Manager</LogName>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">1001</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">PublisherName</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">Health Service Script</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <RegExExpression>
              <ValueExpression>
                <XPathQuery Type="String">EventDescription</XPathQuery>
              </ValueExpression>
              <Operator>ContainsSubstring</Operator>
              <Pattern>CRITICAL</Pattern>
            </RegExExpression>
          </Expression>
        </And>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>1</Severity>
      <AlertName />
      <AlertDescription />
      <AlertOwner />
      <AlertMessageId>$MPElement[Name="Custom.Example.EventLogCheck.MultipleReboots.Rule.AlertMessage"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/EventDescription$</AlertParameter1>
      </AlertParameters>
      <Suppression />
      <Custom1 />
      <Custom2 />
      <Custom3 />
      <Custom4 />
      <Custom5 />
      <Custom6 />
      <Custom7 />
      <Custom8 />
      <Custom9 />
      <Custom10 />
    </WriteAction>
  </WriteActions>
</Rule>
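The rule's `<And>` expression requires all three conditions at once. A minimal Python predicate with the same logic is sketched below; `matches_alert_rule` is a hypothetical name, the dictionary keys simply reuse the XPathQuery field names, and the CRITICAL check is a plain case-sensitive substring test.

```python
def matches_alert_rule(item: dict) -> bool:
    """Mirror the alert rule's <And> expression: all three conditions must hold."""
    return (
        item.get("EventDisplayNumber") == 1001
        and item.get("PublisherName") == "Health Service Script"
        and "CRITICAL" in item.get("EventDescription", "")
    )


if __name__ == "__main__":
    critical = {
        "EventDisplayNumber": 1001,
        "PublisherName": "Health Service Script",
        "EventDescription": ": CRITICAL : Event 6009 has been detected 3 or more times in the past 20 minutes",
    }
    info = dict(critical, EventDisplayNumber=1002, EventDescription=": INFO : Event 6009 was detected")
    print(matches_alert_rule(critical), matches_alert_rule(info))  # True False
```

The INFO event (ID 1002) that the script logs below the threshold fails the first condition, so only the threshold-crossing event raises an alert.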
After 3 reboots in 20 minutes, this rule raises the alert.
I will attach my example management pack below: