
Operations Manager Health Check Report

Operations Manager provides a level of monitoring beyond server health and ensures the availability of critical applications. It is therefore crucial that such an important component of your IT infrastructure functions flawlessly.

The online community has created many SQL Server scripts that help you gather information about different components of your SCOM environment. However, working through this vast amount of resources is both confusing and time-consuming, which makes it impractical for daily use.

Together with Danny Hermans from Microsoft, one of the RAP as a Service developers for SCOM, we are taking the community knowledge one step further.

We have identified the elements that usually require the most attention from SCOM administrators and combined them with suggestions and SQL Server script snippets from Kevin Holman's blog and the SCC Health Check Reports Management Pack by Oscar Landmann & Pete Zerger.

All of this information is fused into a one-page dashboard that is intuitive and easy to understand. From now on you can have better knowledge of your Operations Manager environment, all in one place and just a click away. Make sure your SCOM is healthy and do it every day without hassle!

Download Now!

Assistance

We recommend that you start by reading the installation instructions here.

If you have trouble installing the package, or are not sure how to handle a measurement that indicates a need for attention, do not hesitate to contact our Support team. We can help you troubleshoot issues in your SCOM environment and fine-tune its performance.

Need to take it further?

If you feel the need to take the free community edition of the report one step further, you are welcome to contact our Operational Intelligence team. We can help you create a tailor-made reporting package just for your business.

Walk-through of the report

Each section of the report is explained here. You will find information about where the data comes from (source), what time frame is displayed (duration), the thresholds at which the color changes to yellow for a warning state and red for a critical state (conditions), and finally a short suggestion (sometimes with links to full descriptions) on what you should do to avoid problems with your Operational Database.
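As a rough illustration of how the conditions can be read, the sketch below maps a measured value to a state using a warning and a critical threshold. The function name `classify`, the state label "ok", and the choice to treat a value equal to a threshold as having reached it are assumptions; the report does not document whether the comparison is inclusive.

```python
def classify(value, warning, critical):
    """Map a measured value to a dashboard state.

    Assumption: a value equal to a threshold counts as having reached it.
    """
    if value >= critical:
        return "critical"  # shown in red
    if value >= warning:
        return "warning"   # shown in yellow
    return "ok"


# Example thresholds: Warning (50) Critical (100)
for sample in (30, 75, 120):
    print(sample, classify(sample, warning=50, critical=100))
```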

Top 5 Alerts

Gets the 5 most frequently occurring alerts in the Operational Database.

Source: OperationsManager
Duration: 7 days
Conditions: Warning (50) Critical (100)
Recommendation: Review the alert source to tune or fix the alerts.
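The idea behind a "top 5" list can be sketched in a few lines. This is not the query used by the report; it only shows the counting step on a list of alert names that has already been fetched. The alert names and the function name `top_alerts` are hypothetical.

```python
from collections import Counter


def top_alerts(alert_names, limit=5):
    """Return the most frequent alert names with their counts, highest first."""
    return Counter(alert_names).most_common(limit)


# Hypothetical alert names standing in for 7 days of alerts.
sample = ["Disk full"] * 3 + ["Heartbeat failure"] * 2 + ["Script error"]
for name, count in top_alerts(sample):
    print(f"{count:4d}  {name}")
```

The alert at the top of the list is usually the first candidate for tuning or fixing.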

Top 5 Events

Returns the top 5 events that are being saved in the Operational Database. This helps you spot excessive event volumes and event storms, and check the results after tuning rules that generate too many events.

Source: OperationsManager
Duration: 7 days
Conditions: Warning (50) Critical (100)
Recommendation: Tune the management packs. Review the event source to determine whether an error is creating all of these events, or tune the MP if the event is relevant.

Alerts By day

Gets the alert count by day to show if alerts are increasing or decreasing over the last 7 days.

Source: OperationsManager
Duration: 7 days
Conditions: -
Recommendation: Look for an increasing number of alerts.
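A minimal sketch of the per-day count and a crude trend check follows. The input is assumed to be a list of timestamps for when alerts were raised, and "increasing" is defined here as the newer half of the period averaging more alerts per day than the older half. Both the definition and the function names are assumptions for illustration, not the report's logic.

```python
from collections import Counter
from datetime import datetime


def alerts_per_day(raised_times):
    """Count alerts per calendar day, returned in date order."""
    counts = Counter(t.date() for t in raised_times)
    return sorted(counts.items())


def is_increasing(daily_counts):
    """Compare the average of the newer half of the days with the older half."""
    values = [count for _, count in daily_counts]
    half = len(values) // 2
    if half == 0:
        return False  # not enough days to compare
    older = sum(values[:half]) / half
    newer = sum(values[-half:]) / half
    return newer > older


# Hypothetical timestamps: 1, 2, 3 and 4 alerts on four consecutive days.
times = [datetime(2024, 1, day, 8 + i) for day in range(1, 5) for i in range(day)]
daily = alerts_per_day(times)
print(daily)
print("increasing:", is_increasing(daily))
```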

Events By day

Gets the event count by day to show if events are increasing or decreasing over the last 7 days.

Source: OperationsManager
Duration: 7 days
Conditions: -
Recommendation: Look for an increasing number of events.

Top 5 Performance Counters

Displays the top 5 performance counters inserted into the operational database.

Source: OperationsManager
Duration: 7 days
Conditions: Warning (300 000) Critical (500 000)
Recommendation: Tune the management packs. Review the performance rule to check whether you need the performance data or whether you can tune the rule.

Top 5 Discoveries

Gets the top 5 discoveries from the Data Warehouse to reveal possible problems with config churn.

Source: OperationsManagerDW
Duration: 7 days
Conditions: Warning (50) Critical (100)
Recommendation: Tune the management packs. Review the discovery rule to check whether you need the discovery of the object or whether you can tune it, for example by increasing the discovery interval. Review Kevin Holman's blog post about config churn.

Top 5 State Changes

Gets the top 5 state changes from the operational database to look for flapping monitors.

Source: OperationsManager
Duration: 7 days
Conditions: Warning (100) Critical (200)
Recommendation: Tune the management packs. Review the monitor's Health State history to check whether the monitor is flip-flopping all the time. Check whether you need the monitor or whether you can tune it, for example by increasing the polling interval. Review Kevin Holman's blog post about monitors generating many state changes.

Total Rows In Staging

Gets the number of rows in the Data Warehouse staging tables: Alert.AlertStage, State.StateStage, Perf.PerformanceStage and Event.EventStage.

Source: OperationsManagerDW
Duration: All
Conditions: Warning (50 000) Critical (100 000)
Recommendation: Check that your Data Warehouse is working correctly and that these tables don't get too large. A high row count indicates that the data is not being transferred to the right tables in the Data Warehouse. View this troubleshooting article.
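The check can be sketched as one row-count query per staging table followed by a sum. The table names come from the description above; the query text, the use of the NOLOCK hint on each table, the assumption that the thresholds apply to the combined total, and the sample row counts are illustrative and not taken from the report. Running the queries against OperationsManagerDW is left out.

```python
STAGING_TABLES = [
    "Alert.AlertStage",
    "State.StateStage",
    "Perf.PerformanceStage",
    "Event.EventStage",
]


def build_count_queries(tables=STAGING_TABLES):
    """Return one T-SQL row-count query per staging table, using NOLOCK."""
    return {t: f"SELECT COUNT(*) FROM {t} WITH (NOLOCK)" for t in tables}


def staging_state(row_counts, warning=50_000, critical=100_000):
    """Sum the per-table counts and compare the total with the thresholds.

    Assumption: the thresholds apply to the combined total of all tables.
    """
    total = sum(row_counts.values())
    if total >= critical:
        return total, "critical"
    if total >= warning:
        return total, "warning"
    return total, "ok"


for table, query in build_count_queries().items():
    print(query)

# Hypothetical counts, as if returned by the queries above.
counts = {"Alert.AlertStage": 1_200, "State.StateStage": 30_000,
          "Perf.PerformanceStage": 25_000, "Event.EventStage": 800}
print(staging_state(counts))
```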

Operational Database Space

Gets the space used and the space available for the Operations Manager database only (log storage not included).

Source: OperationsManager
Duration: Current State
Conditions: Warning (free space between 40% and 50%) Critical (free space < 40%)
Recommendation: Check the DB Space by Data Types % section to see which data types use the most space. Consider tuning MPs or adding more disk space.
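The free-space conditions can be worked through with a small calculation. The sketch assumes the used and available sizes are given in megabytes and that the 40% and 50% boundaries both fall in the warning range; the function names and sample sizes are hypothetical.

```python
def free_space_percent(used_mb, free_mb):
    """Return the free space as a percentage of the total database size."""
    total = used_mb + free_mb
    if total <= 0:
        raise ValueError("total size must be positive")
    return 100.0 * free_mb / total


def space_state(free_percent):
    """Critical below 40% free, warning from 40% to 50% free, otherwise ok.

    Assumption: the boundaries 40% and 50% both count as warning.
    """
    if free_percent < 40:
        return "critical"
    if free_percent <= 50:
        return "warning"
    return "ok"


# Hypothetical sizes: 7000 MB used, 3000 MB free.
pct = free_space_percent(used_mb=7000, free_mb=3000)
print(f"{pct:.1f}% free: {space_state(pct)}")
```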

DB Space by Data Types %

Retrieves the largest tables by data type to review what is taking up space in the Operational Database.

Source: OperationsManager
Duration: Current State
Conditions: -
Recommendation: Review if the data being collected is necessary for your environment and consider tuning.

Data Warehouse Grooming History

Retrieves the list of grooming jobs and displays last run time.

Source: OperationsManager
Duration: -
Conditions: Critical (older than 3 days)
Recommendation: Grooming of the OpsDB is called once per day at 12:00am by the rule “Partitioning and Grooming”.
See http://blogs.technet.com/b/kevinholman/archive/2008/02/13/grooming-process-in-the-operations-database.aspx.

Also review the performance signature settings as described by Kevin Holman: http://blogs.technet.com/b/kevinholman/archive/2008/11/04/boosting-opsmgr-performance-by-reducing-the-opsdb-data-retention.aspx
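The "older than 3 days" condition amounts to comparing each job's last run time with the current time. The sketch below assumes the last run times are already available as a mapping from job name to timestamp; the job names and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def stale_grooming_jobs(last_runs, now, max_age=timedelta(days=3)):
    """Return the names of jobs whose last run is older than max_age."""
    return sorted(name for name, last_run in last_runs.items()
                  if now - last_run > max_age)


# Hypothetical job names and last run times.
now = datetime(2024, 1, 10, 6, 0)
last_runs = {
    "job_a": datetime(2024, 1, 10, 0, 0),  # ran today
    "job_b": datetime(2024, 1, 5, 0, 0),   # last ran more than 3 days ago
}
print(stale_grooming_jobs(last_runs, now))
```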

State Changes Within Grooming Period

Counts the age in days of the oldest state change in the Operational Database, to find state changes older than the defined grooming period.

Source: OperationsManager
Duration: All
Conditions: Warning (older than 20 days) Critical (older than 30 days)
Recommendation: Clean up State Change Event data that is older than the defined grooming period, for example data belonging to monitors currently in a disabled, warning, or critical state. Review Kevin Holman's blog post on how to clean old state changes out of the database.

Unhealthy Management Servers

Checks whether the management servers are in a warning, critical, or grayed-out state.

Source: OperationsManager
Duration: All
Conditions: Warning (gray servers) Critical (unavailable servers)
Recommendation: Review the Health Explorer and the event log on the affected management server. Try restarting the Health Service and clearing the cache files.

Disclaimer

Approved Consulting AB is not responsible for any problems arising from the management pack (MP). The MP has been tested on different installations and there are no known issues, but your environment may differ, so as always, use the documentation and make sure that you know what you are doing. Queries against the Operations Manager database are sensitive: all queries are made with the NOLOCK option, and the refresh should be scheduled only once a day to minimize impact.
