Introduction

[Title Frame]

Welcome.

In this video, we’ll guide you through creating threshold overrides in IT-Conductor.

[Definition]

But first, what are threshold overrides?

In a monitoring context, thresholds define the limits at which the system responds when a metric rises above or falls below a set value. An override allows users to modify these limits, either temporarily or permanently, based on unique requirements or conditions. In IT-Conductor, we refer to these adjustable limits, which trigger status changes and alert generation, as threshold overrides.

Let's get started!

Create Threshold Override Using an Existing Template

[Title Frame]

IT-Conductor provides predefined templates that make it easy for users to create threshold overrides.

[Service Grid]

First, navigate to the service grid. Then, select the metric for which you want to create a threshold override.

Let's take <CCMS Alerts> as an example.

[Metric Chart]

In the metric chart screen, click the Threshold Overrides icon to view the list of existing threshold overrides for the selected metric.

In this demo, no threshold overrides are currently configured.

[Overrides]

To get started using a template, click the Create from Templates icon to view the available options.

(Pause for 8 seconds before selecting the override template)

Then, select the override template you'd like to use.

For this demo, let's select SAP ShortDump Alert Count.

[Override Template]

Clicking a template creates an override with pre-filled fields. You can adjust the following fields as needed to meet your specific requirements.

  • Name refers to the assigned name for the override being added.

  • Description is any relevant information about the override being added.

  • Object Criteria refers to the specific attributes that will be monitored. You can specify the override criteria in the Object Criteria field. If you're creating an override from a template, a list of pre-selected criteria is already provided. To add more criteria, click the Add New Row icon.

  • In the Name field, you can choose one of the available criteria from the drop-down menu. 

  • In the Oper field, select the appropriate operator.

  • Finally, specify the exact value in the Value field. This is an open field where you can enter file names or formats to monitor.

(Pause for 7 seconds before scrolling to Scheduling)

// Note on Screen //

The more criteria you add, the more specific the override becomes, resulting in higher precedence.
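
// Reference Sketch (not narrated) //

The precedence rule in the note above can be illustrated with a minimal Python sketch. It assumes that an override applies only when all of its criteria match, and that among the applicable overrides the one with the most criteria wins. The class names, operator symbols, and attribute names are hypothetical and are not taken from IT-Conductor.

```python
import operator
from dataclasses import dataclass, field

# Hypothetical operator symbols; the real Oper drop-down may offer others.
OPERATORS = {"=": operator.eq, "!=": operator.ne}


@dataclass
class Criterion:
    name: str   # attribute to test (Name field)
    oper: str   # comparison operator (Oper field)
    value: str  # value to compare against (Value field)

    def matches(self, attributes: dict) -> bool:
        # A criterion on a missing attribute never matches.
        if self.name not in attributes:
            return False
        return OPERATORS[self.oper](attributes[self.name], self.value)


@dataclass
class Override:
    label: str
    criteria: list = field(default_factory=list)

    def applies_to(self, attributes: dict) -> bool:
        # An override applies only if every criterion matches.
        return all(c.matches(attributes) for c in self.criteria)


def pick_override(overrides, attributes):
    """Return the applicable override with the most criteria, or None."""
    candidates = [o for o in overrides if o.applies_to(attributes)]
    return max(candidates, key=lambda o: len(o.criteria), default=None)


overrides = [
    Override("all systems"),
    Override("production only", [Criterion("environment", "=", "PRD")]),
    Override("production host A", [
        Criterion("environment", "=", "PRD"),
        Criterion("host", "=", "hostA"),
    ]),
]

chosen = pick_override(overrides, {"environment": "PRD", "host": "hostA"})
print(chosen.label)  # production host A
```

In this sketch, a production system on a different host falls back to the "production only" override, and any other system falls back to "all systems".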

[Scheduling]

Now, let's define when the override will perform validation. You may choose to run the override on a specific day and time. If no day is specified, it will run daily at the indicated time. Alternatively, you can assign a pre-existing schedule from the dropdown menu.

[Aggregation]

The next section is where you define the aggregation values. Since we're creating from a template, these are already pre-filled. However, should you wish to modify them, each field can be adjusted according to your requirements.

  • Value Attribute specifies the object attribute that contains the value used for threshold comparison. Typically, you should not modify this value, as it associates the overrides with the monitors.

  • Aggregation Interval specifies the interval over which the historical values (see Value Attribute) are summarized and used for threshold comparison. This also defines how often the interval values are calculated.

  • Consecutive Interval is optional and overrides the interval over which historical values are aggregated (see Aggregation Interval). However, it doesn't override how often the values are calculated. For example, to calculate the hourly average value every 5 minutes, set the Aggregation Interval to 5 minutes and the Consecutive Interval to 60 minutes.

  • Aggregation denotes the function that will be applied, such as sum, average, count, minimum, or maximum.
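
// Reference Sketch (not narrated) //

The relationship between Aggregation Interval and Consecutive Interval can be sketched in Python as follows. This is only an illustration of the idea described above, assuming one sample per minute; the function name, parameter names, and windowing details are assumptions and do not describe how IT-Conductor computes values internally.

```python
from statistics import mean

# Aggregation functions named in the script: sum, average, count, minimum, maximum.
AGGREGATIONS = {
    "sum": sum,
    "average": mean,
    "count": len,
    "minimum": min,
    "maximum": max,
}


def aggregate_series(samples, aggregation_interval, aggregation,
                     consecutive_interval=None):
    """Aggregate (minute, value) samples, sorted by minute.

    A value is calculated every `aggregation_interval` minutes. Each value
    summarizes the last `consecutive_interval` minutes if that is set,
    otherwise the last `aggregation_interval` minutes.
    Returns a list of (evaluation_minute, aggregated_value).
    """
    if not samples:
        return []
    window = consecutive_interval or aggregation_interval
    func = AGGREGATIONS[aggregation]
    last_minute = samples[-1][0]
    results = []
    t = aggregation_interval
    while t <= last_minute + 1:
        # Half-open window [t - window, t)
        values = [v for m, v in samples if t - window <= m < t]
        if values:
            results.append((t, func(values)))
        t += aggregation_interval
    return results


# Two hours of per-minute samples whose value equals the minute number.
samples = [(m, m) for m in range(120)]

# Hourly average, recalculated every 5 minutes.
hourly = aggregate_series(samples, 5, "average", consecutive_interval=60)
print(dict(hourly)[60])  # 29.5 (average of minutes 0 to 59)
```

Without a Consecutive Interval, the same call summarizes only the last 5 minutes at each evaluation.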

[Thresholds]

Now, we're ready to define the threshold values. These fields are also pre-filled, but they can be adjusted.

  • Warning Value indicates the threshold value that, if met, raises a warning threshold violation. If this field is left blank, the warning is not used.

  • Warning Operator indicates the operator used for validation.

  • Warning Severity denotes the severity status that will be triggered when the validation meets the Warning Value.

  • Alarm Value refers to the specific value that, when reached or exceeded, triggers an alarm.

  • Alarm Operator refers to the operator used to evaluate the condition for triggering the alarm.

  • Alarm Severity indicates the level of criticality associated with the alarm. In IT-Conductor, we have these available for selection.

  • Finally, Reset After defines the duration or conditions under which an alarm will automatically reset or be cleared after being triggered.
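
// Reference Sketch (not narrated) //

As a rough illustration of how the value and operator fields work together, the Python sketch below compares an aggregated value against the alarm threshold first and then the warning threshold. The operator symbols, the status names returned, and the check order are assumptions made for this example; severity mapping and Reset After are not modeled.

```python
import operator

# Hypothetical operator symbols for the Warning Operator and Alarm Operator fields.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def evaluate(value, warning_value=None, warning_oper=">=",
             alarm_value=None, alarm_oper=">="):
    """Return "Alarm", "Warning", or "Normal" for an aggregated value.

    A threshold left blank (None here) is treated as not used.
    The alarm is checked first because it is the more critical level.
    """
    if alarm_value is not None and OPERATORS[alarm_oper](value, alarm_value):
        return "Alarm"
    if warning_value is not None and OPERATORS[warning_oper](value, warning_value):
        return "Warning"
    return "Normal"


print(evaluate(5, warning_value=10, alarm_value=20))   # Normal
print(evaluate(10, warning_value=10, alarm_value=20))  # Warning
print(evaluate(25, warning_value=10, alarm_value=20))  # Alarm
```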

[Alerting]

Next, we have the Alerting section, where you can specify when users will receive alerts. Remember, this section focuses on defining the conditions for notifications, in contrast to the Scheduling section, which is dedicated to setting the timing and frequency of the override validation performed by the system.

  • Alert On refers to the status that will trigger the alert and notify the users. This is usually set to “Warning”.

  • Alert Message refers to the message that the users will see. This message is customizable by the user. (Pause for 7 seconds before proceeding to the next fields)

  • // Note on Screen // You may refer to the Threshold Overrides Variables in the IT-Conductor Knowledge Base for more details.

  • Repeat After determines how long the system waits before sending a subsequent alert after the initial notification has been triggered. For example, if an alert is triggered and Repeat After is set to <120> minutes, the system will send another alert if the condition is still active after that time has passed. You can also define the time period during which alerts will be raised.

  • Alert Priority refers to the classification that determines the urgency and importance of an alert within a monitoring system. In IT-Conductor, we have these available for selection.

  • In the Notification Template field, you can select a predefined template to ensure that alert notifications adhere to a standard format.

  • Resolve Alerts allows users to mark alerts as resolved when the alert is no longer active or when the metric falls below the configured threshold value.

  • If you want to receive an alert and notifications when the system returns to its normal status, check the Alert On Normal checkbox and optionally specify the alert text.

  • The Escalate feature allows you to set rules for when alerts are escalated:

    • If alerts remain unresolved for a certain period.

    • If the number of repeated alerts (based on the rule criteria) exceeds the specified value.

    • Depending on the rules, an escalation alert is created. That alert can be picked up by a subscription that targets management.
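
// Reference Sketch (not narrated) //

The Repeat After behavior described in this section can be sketched in Python. The sketch assumes the system checks the condition at regular times, sends an alert on the first violation, and repeats it only if the condition is still active once the Repeat After time has passed. The function name and the input format are hypothetical, and escalation is not modeled.

```python
def alerts_sent(checks, repeat_after):
    """Return the minutes at which an alert would be sent.

    checks: list of (minute, violated) pairs in time order.
    repeat_after: minutes to wait before repeating an alert
                  while the condition remains active.
    """
    sent = []
    last_sent = None
    for minute, violated in checks:
        if not violated:
            # Condition cleared: the next violation counts as a new alert.
            last_sent = None
            continue
        if last_sent is None or minute - last_sent >= repeat_after:
            sent.append(minute)
            last_sent = minute
    return sent


# Checks every 30 minutes; the condition is active from minute 30 to minute 270.
checks = [(m, 30 <= m <= 270) for m in range(0, 301, 30)]
print(alerts_sent(checks, repeat_after=120))  # [30, 150, 270]
```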

[Recovery] - info taken from: https://docs.itconductor.com/user-guide/automation/sap-batch-job-restart-on-error#create-a-recovery-activity-to-restart-the-job

Finally, we have the Recovery section.

The Recovery functionality in IT-Conductor allows you to automate responses when certain thresholds are exceeded. By creating a Recovery Activity, you can configure the system to automatically take action, such as restarting a job, whenever an incident occurs.

To enable the recovery activity, you can select either Warning or Alarm in the “Recovery On” field.

  • If you choose Warning, the recovery activity will be triggered when the Warning threshold is exceeded. This is useful for preemptive measures, allowing you to address potential issues before they escalate.

  • Selecting the Alarm option means that the recovery activity will only activate when the Alarm threshold is breached. This is typically reserved for more critical situations requiring immediate attention.

Next, you’ll need to define which Recovery action the system will take. This might include restarting a job or executing a specific recovery script. IT-Conductor provides predefined recovery activities based on common scenarios, which you can easily select from the list. Any custom recovery can be implemented in collaboration with the IT-Conductor Support Team.

[Saving & Verification]

Once all fields are filled out properly, click Save to complete the configuration.

Then, verify that the newly created override has been added to the list.

Create Threshold Override Without a Template

[Title Frame]

Now, let's create a threshold override without using a template.

[Create New Override]

For this demo, let's create a threshold override for the Active Users metric.

Navigate to the Overrides screen from the service grid. Then, click the Create New Override icon here.

As you can see, most fields are blank, except for those that have been automatically populated with default values based on the selected system information.

For the rest of the video, follow along with the configuration steps being shown.

In this example, we’ll show how to create an override that will be triggered once 800 or more users have logged in to the system.

In Alert Message, with the help of the Variables button, we’ll write the message that will be sent once the 800-user threshold has been breached.
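
// Reference Sketch (not narrated) //

To illustrate the idea of an alert message built from variables, here is a minimal Python sketch for the 800-user example. The placeholder names ($system, $value, $threshold) are invented for this illustration and are not IT-Conductor's actual variables; refer to the Threshold Overrides Variables page in the IT-Conductor Knowledge Base for the real ones.

```python
from string import Template

THRESHOLD = 800  # trigger once 800 or more users have logged in

# Hypothetical placeholders, not the actual IT-Conductor variable syntax.
MESSAGE = Template("Active users on $system reached $value (threshold: $threshold)")


def build_alert(system, active_users):
    """Return the alert text if the threshold is met, otherwise None."""
    if active_users >= THRESHOLD:
        return MESSAGE.substitute(
            system=system, value=active_users, threshold=THRESHOLD
        )
    return None


print(build_alert("PRD", 799))  # None
print(build_alert("PRD", 812))  # Active users on PRD reached 812 (threshold: 800)
```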

If you want to review what each field does, feel free to revisit the previous part of this demo or visit the IT-Conductor Knowledge Base at your convenience.

Contact Us

<existing outro>
