Introduction
[Title Frame]
Welcome.
In this video, we’ll guide you through creating threshold overrides in IT-Conductor.
[Definition]
But first, what are threshold overrides?
In a monitoring context, thresholds are set so that the system responds appropriately when a metric exceeds or falls below a specific value. An override, on the other hand, allows users to modify these limits, either temporarily or permanently, based on unique requirements or conditions. In IT-Conductor, we refer to these adjustable limits, which trigger status changes and alert generation, as threshold overrides.
Let's get started!
Create Threshold Override Using an Existing Template
[Title Frame]
IT-Conductor provides predefined templates that make it easy for users to create threshold overrides.
[Service Grid]
First, navigate to the service grid and select the metric for which you want to create a threshold override.
Let's take <metric> as an example.
[Metric Chart]
In the metric chart screen, click the Threshold Overrides icon to view the list of existing threshold overrides for the selected metric.
[Overrides]
In the Overrides screen, click the Create from Templates icon to display the list of available templates.
(Pause for 8 seconds before selecting the override template)
Then, select the override template you'd like to use.
// Note on Screen //
If no templates are visible, please contact the IT-Conductor Support Team to ensure you have the appropriate access level.
For this demo, let's select <override template>.
[Override Template]
You'll be redirected to the template page, where parameters will be pre-filled. You can adjust these fields as needed to meet your specific requirements.
Name refers to the assigned name for the override being added.
Description is any relevant information about the override being added.
Object Criteria refers to the specific attributes that will be monitored. You can specify the override criteria in the Object Criteria field. If you're creating an override from a template, a list of pre-selected criteria is already provided. To add more criteria, click the Add New Row icon.
In the Name field, you can choose one of the available criteria from the drop-down menu.
In the Oper field, select the appropriate operator.
Finally, specify the exact value in the Value field. This is an open field where you can enter file names or formats to monitor.
(Pause for 7 seconds before scrolling to Scheduling)
// Note on Screen //
The more criteria you add, the more specific the override becomes, resulting in higher precedence.
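The precedence rule in the note above can be pictured with a short Python sketch. This is only an illustration, not IT-Conductor's actual implementation: the Override class, the matches and pick_override helpers, and the attribute names are hypothetical, and only an equality operator is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Override:
    """Hypothetical stand-in for a threshold override and its Object Criteria."""
    name: str
    criteria: dict = field(default_factory=dict)  # attribute name -> required value


def matches(override, monitored_object):
    """True if every criterion equals the monitored object's attribute (equality only)."""
    return all(monitored_object.get(key) == value
               for key, value in override.criteria.items())


def pick_override(overrides, monitored_object):
    """Among matching overrides, prefer the one with the most criteria."""
    candidates = [o for o in overrides if matches(o, monitored_object)]
    if not candidates:
        return None
    return max(candidates, key=lambda o: len(o.criteria))


overrides = [
    Override("general"),
    Override("by-system", {"system": "PRD"}),
    Override("by-system-and-job", {"system": "PRD", "job": "BACKUP"}),
]
chosen = pick_override(overrides, {"system": "PRD", "job": "BACKUP"})
print(chosen.name)
```

In this sketch, an override with no criteria matches everything, so it only applies when nothing more specific does.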
[Scheduling]
Now, let's define when the override will perform validation. You may choose to run the override on a specific day and time. If no day is specified, it will run daily at the indicated time. Alternatively, you can assign a pre-existing schedule from the dropdown menu.
[Aggregation]
The next section is where you define the aggregation values. Since we're creating from a template, these are already pre-filled. But should you wish to modify them, each field can be adjusted according to your requirements.
Value attribute states the action that the override will perform, based on the information entered in the fields that follow. This attribute forms the basis for evaluating conditions that trigger alerts, such as system load or job completion times.
Aggregation interval defines the period during which files are collected and added to the file server.
Consecutive interval refers to the regularity or frequency of occurrences within a specified number of minutes.
Aggregation denotes the function that will be applied, such as sum, average, count, minimum, or maximum.
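To make the aggregation functions listed above concrete, here is a minimal Python sketch that applies each one to the samples collected in a single interval. The function names in the lookup table mirror the script's wording. The aggregate helper and the sample values are assumptions for illustration and do not describe how IT-Conductor computes them internally.

```python
from statistics import mean

# Lookup table for the five functions named in the script (hypothetical keys).
AGGREGATIONS = {
    "sum": sum,
    "average": mean,
    "count": len,
    "minimum": min,
    "maximum": max,
}


def aggregate(samples, function):
    """Apply the chosen aggregation to one interval's samples.

    Returns None when the interval has no samples (an assumption of this sketch).
    """
    if not samples:
        return None
    return AGGREGATIONS[function](samples)


samples = [2, 4, 6]  # made-up metric values from one aggregation interval
for name in AGGREGATIONS:
    print(name, aggregate(samples, name))
```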
[Thresholds]
Now, we're ready to define the threshold values. These fields are also pre-filled, but they can be adjusted.
Warning Value indicates the threshold value that determines whether the limit has been reached. If this field is left blank or set to zero, it signifies that no data is available.
Warning Operator indicates the operator used for validation.
Warning Severity denotes the severity status that will be triggered when the validation meets the Warning Value.
Alarm Value refers to the specific value that, when reached or exceeded, triggers an alarm.
Alarm Operator refers to the operator used to evaluate the condition for triggering the alarm.
Alarm Severity indicates the level of criticality associated with the alarm. In IT-Conductor, we have these available for selection.
Finally, Reset After defines the duration or conditions under which an alarm will automatically reset or be cleared after being triggered.
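The way the warning and alarm fields work together can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the operator symbols, the evaluate function, and the rule that the alarm condition is checked before the warning condition are hypothetical choices for this sketch, not confirmed IT-Conductor behavior.

```python
import operator

# Hypothetical operator symbols; the actual choices appear in the operator dropdowns.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def evaluate(value, warning_op, warning_value, alarm_op, alarm_value):
    """Return the status for a metric value, checking the alarm condition first."""
    if OPERATORS[alarm_op](value, alarm_value):
        return "Alarm"
    if OPERATORS[warning_op](value, warning_value):
        return "Warning"
    return "Normal"


# Upper-bound example: warn at 80 or more, alarm at 90 or more.
print(evaluate(85, ">=", 80, ">=", 90))
# Lower-bound example: warn below 20, alarm below 10.
print(evaluate(5, "<", 20, "<", 10))
```

Checking the alarm condition first means a value that satisfies both conditions is reported at the more severe status.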
[Alerting]
Next, we have the Alerting section, where you can specify when users will receive alerts. Remember, this section focuses on defining the conditions for notifications, in contrast to the Scheduling section, which is dedicated to setting the timing and frequency of the override validation performed by the system.
Alert On refers to the status that will trigger the alert and notify the users. This is usually set to “Warning”.
Alert Message refers to the message that the users will see. This message is customizable by the user.
(Pause for 7 seconds before proceeding to the next fields)
// Note on Screen //
You may refer to the Threshold Overrides Variables in the IT-Conductor Knowledge Base for more details.
Repeat After refers to the setting that determines how long the system should wait before sending a subsequent alert after the initial notification has been triggered. For example, if an alert is triggered and the Repeat After duration is set to <120> minutes, the system will send another alert if the condition is still active after that time has passed. You can also define the duration for which this condition remains active before subsequent alerts are sent.
Alert Priority refers to the classification that determines the urgency and importance of an alert within a monitoring system. In IT-Conductor, we have these available for selection.
In the Notification Template field, you can select a predefined template to ensure that alert notifications adhere to a standard format.
Resolve Alerts allows users to mark alerts as resolved when the alert is no longer active or when the metric falls below the configured threshold value.
If you want to receive notifications when the system returns to its normal status, check the Alert On Normal checkbox.
The Escalate feature allows you to set rules for how alerts are handled if they remain unresolved for a certain period. By enabling escalation, alerts can be automatically forwarded to higher-level personnel, ensuring critical issues receive timely attention and enhancing accountability.
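The Repeat After setting described in this section can be illustrated with a small Python sketch. The should_repeat function and its parameters are hypothetical, and the timestamps are made up. It only shows the timing rule as narrated, assuming a repeat interval of 120 minutes.

```python
from datetime import datetime, timedelta


def should_repeat(last_alert_at, now, repeat_after_minutes, still_active):
    """Send a follow-up alert only if the condition persists and the wait has elapsed."""
    if not still_active:
        return False
    return now - last_alert_at >= timedelta(minutes=repeat_after_minutes)


first_alert = datetime(2024, 1, 1, 8, 0)  # made-up time of the initial alert

# 90 minutes later: too early for a repeat.
print(should_repeat(first_alert, datetime(2024, 1, 1, 9, 30), 120, True))
# 120 minutes later, condition still active: repeat is due.
print(should_repeat(first_alert, datetime(2024, 1, 1, 10, 0), 120, True))
```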
[Recovery] - info taken from: https://docs.itconductor.com/user-guide/automation/sap-batch-job-restart-on-error#create-a-recovery-activity-to-restart-the-job
Finally, we have the Recovery section.
The Recovery functionality in IT-Conductor allows you to automate responses when certain thresholds are exceeded. By creating a Recovery Activity, you can configure the system to automatically take action, such as restarting a job, whenever an incident occurs.
To enable the recovery activity, navigate to the “Recovery On” dropdown. Here, you can select either Warning or Alarm.
Warning: If you choose this option, the recovery activity will be triggered when the Warning threshold is exceeded. This is useful for preemptive measures, allowing you to address potential issues before they escalate.
Alarm: Selecting this option means that the recovery activity will only activate when the Alarm threshold is breached. This is typically reserved for more critical situations requiring immediate attention.
Next, you’ll need to define what action the system should take. This might include restarting a job or executing a specific recovery script. IT-Conductor provides predefined recovery activities based on common scenarios, which you can easily select from the list.
For Recovery, select the automation user that will be responsible for managing the recovery actions. This ensures that the appropriate people are notified and can take any necessary follow-up actions.
If you want to be alerted whenever this recovery activity is executed, check the Alert box. This feature keeps you informed about automated processes, ensuring you can monitor actions taken on critical jobs.
Once you’ve configured the recovery settings to your liking, make sure to click Save Recovery Activity to finalize your changes. This ensures that your new recovery parameters are active and ready to respond to any incidents.
[Saving & Verification]
Once all fields are filled out properly, click Save to complete the configuration.
Then, verify that the newly created override has been successfully added to the list.
Create Threshold Override Without a Template
[Title Frame]
Now, let's create a threshold override without using a template.
[Create New Override]
For this demo, let's try creating a threshold override for <metric>.
Navigate to the Overrides screen from the service grid. Then, click the Create New Override icon here.
As you can see, most fields are blank, except for those that have been automatically populated with default values based on the selected system information.
For the rest of the video, follow along with the configuration steps being shown. If you want to review what each field does, feel free to revisit the previous part of this demo or visit the IT-Conductor Knowledge Base at your convenience.
Contact Us
<existing outro>