Introduction

[Title Frame]

...

In a monitoring context, thresholds are typically set so that the system responds appropriately when specific metrics exceed or fall below a defined value. An override, on the other hand, allows users to modify these limits, either temporarily or permanently, based on unique requirements or conditions. In IT-Conductor, we refer to these adjustable limits, which trigger status changes and alert generation, as threshold overrides.

...

First, navigate to the service grid. Then, select the metric for which you want to create a threshold override.

Let's take the CCMS Alerts metric as an example.

[Metric Chart]

In the metric chart screen, click the Threshold Overrides icon to view the list of existing threshold overrides for the selected metric.

In this demo, no threshold overrides are configured yet.

[Overrides]

To get started using a template, click the Create from Templates icon in the Overrides screen to view the list of available templates.

(Pause for 8 seconds before selecting the override template)

Then, select the override template you'd like to use.

// Note on Screen //

If no templates are visible, please contact the IT-Conductor Support Team to ensure you have the appropriate access level.

For this demo, let's select the SAP ShortDump Alert Count template.

[Override Template]

Clicking on a template creates an override with its parameters pre-filled. You can adjust these fields as needed to meet your specific requirements.

  • Name refers to the assigned name for the override being added.

  • Description is any relevant information about the override being added.

  • Object Criteria refer to the specific attributes that will be monitored. You may specify the override criteria under the Object Criteria field. If you're creating an override from a template, you already have a list of pre-selected criteria. To add more criteria, click the Add New Row icon.

    • In the Name field, choose one of the available criteria from the drop-down menu.

    • In the Oper field, select the appropriate operator.

    • Finally, specify the exact value in the Value field. This is an open field where you can enter file names or formats to monitor.
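To picture how the Name / Oper / Value rows narrow down which objects an override applies to, here is a minimal Python sketch. It is not IT-Conductor's implementation: it assumes all rows must match (rows combined with AND), and the operator labels ("=", "!=", "like" with wildcards) and the attribute names are hypothetical examples.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple

# One row per criterion: (Name, Oper, Value). Hypothetical model only.
Criterion = Tuple[str, str, str]

# Hypothetical operator labels; "like" uses shell-style wildcards such as "1*".
OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "=": lambda actual, value: actual == value,
    "!=": lambda actual, value: actual != value,
    "like": fnmatchcase,
}


def matches_criteria(obj: Dict[str, str], criteria: List[Criterion]) -> bool:
    """Return True if the object satisfies every criteria row (rows ANDed)."""
    return all(
        name in obj and OPERATORS[oper](obj[name], value)
        for name, oper, value in criteria
    )
```

For example, with the hypothetical attributes `{"System": "PRD", "Client": "100"}`, the rows `[("System", "=", "PRD"), ("Client", "like", "1*")]` match, while a row that names an attribute the object does not have never matches.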

...

Now, let's define when the override will be active. You may choose to have the override active only on specific days of the week and at specific times. By default, all weekdays are selected, so it will be active daily at the indicated time. Alternatively, you can assign an existing schedule from the dropdown menu. This overrides the scheduling parameters mentioned above, and the override will be active whenever the schedule is On.
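The activation rules above can be modeled as a simple check. This Python sketch is illustrative only and rests on assumptions: days use Python's weekday numbering (Monday = 0), the time window is a single range within one day, and an assigned schedule is represented as a plain On/Off boolean that, when present, takes precedence over the day and time settings.

```python
from datetime import datetime, time
from typing import AbstractSet, Optional

# Python numbering: Monday=0 ... Sunday=6. Default: every day selected.
ALL_DAYS = frozenset(range(7))


def is_override_active(
    now: datetime,
    days: AbstractSet[int] = ALL_DAYS,
    start: time = time(0, 0),
    end: time = time(23, 59, 59),
    schedule_on: Optional[bool] = None,
) -> bool:
    """Hypothetical model of when a threshold override is active."""
    # An assigned schedule (On/Off) takes precedence over day/time settings.
    if schedule_on is not None:
        return schedule_on
    # Otherwise the day must be selected and the time inside the window.
    return now.weekday() in days and start <= now.time() <= end
```

For example, an override limited to Monday through Friday between 08:00 and 18:00 would be modeled as `is_override_active(now, days={0, 1, 2, 3, 4}, start=time(8), end=time(18))`.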

[Aggregation]

The next section is where you define the aggregation values. Since we're creating from a template, these are already pre-filled. However, should you wish to modify them, each field can be adjusted according to your requirements.

  • Value attribute specifies the object attribute that contains the value used for threshold comparison. Typically, you should not modify this value, as it associates the override with the monitor.

  • Aggregation interval specifies the interval for which the historical values (see Value attribute) are summarized and used for threshold comparison. It also defines how often the interval values are calculated.

  • Consecutive interval is optional and overrides the interval used to aggregate historical values (see Aggregation interval). However, it does not change how often the values are calculated. For example, to calculate the hourly average every 5 minutes, set the Aggregation interval to 5 minutes and the Consecutive interval to 60 minutes.

  • Aggregation denotes the function that will be applied, such as sum, average, count, minimum, or maximum.
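To make the interplay between Aggregation interval and Consecutive interval concrete, here is a small Python sketch of a rolling aggregation. It is a hypothetical model, not IT-Conductor's code: samples are assumed to be `(minute, value)` pairs, the calculation runs every Aggregation interval, and the look-back window is the Consecutive interval when set, otherwise the Aggregation interval.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Sample = Tuple[int, float]  # (minute timestamp, metric value)

# Aggregation functions named in the text; the dictionary keys are illustrative.
AGGREGATIONS: Dict[str, Callable] = {
    "sum": sum,
    "average": mean,
    "count": len,
    "minimum": min,
    "maximum": max,
}


def aggregate_series(
    samples: List[Sample],
    aggregation_interval: int,
    consecutive_interval: Optional[int],
    aggregation: str,
    end_minute: int,
) -> List[Tuple[int, float]]:
    """Return (minute, aggregated value) pairs under the assumed model."""
    # Look-back window: Consecutive interval if set, else Aggregation interval.
    window = consecutive_interval if consecutive_interval else aggregation_interval
    func = AGGREGATIONS[aggregation]
    results = []
    # The calculation still runs every Aggregation interval (regularity unchanged).
    for t in range(aggregation_interval, end_minute + 1, aggregation_interval):
        values = [v for minute, v in samples if t - window < minute <= t]
        if values:
            results.append((t, func(values)))
    return results
```

With one sample per minute over two hours, `aggregate_series(samples, 5, 60, "average", 120)` yields a value every 5 minutes, each averaging up to the last 60 minutes of data. This matches the hourly-average-every-5-minutes example above.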

...

Now, we're ready to define the threshold values. These fields are also pre-filled, but they can be adjusted as well.

  • Warning

    • Warning Value defines the threshold value that, when met based on the specified operator logic, sets monitor severity to Warning or to the severity level configured in the Warning Severity field. If this field is left blank, a warning is not used.

    • Warning Operator is the logical comparison operator used to check the Value against the Warning threshold.

    • Warning Severity denotes the monitor severity that will be set when the value matches the Warning threshold.

  • Alarm

    • Alarm Value defines the threshold value that, when met based on the specified operator logic, sets monitor severity to Alarm or to the severity level configured in the Alarm Severity field. If this field is left blank, an alarm is not used.

    • Alarm Operator is the logical comparison operator used to check the Value against the Alarm threshold.

    • Alarm Severity denotes the monitor severity that will be set when the value matches the Alarm threshold.

  • Finally, Reset After defines the duration or conditions under which an alarm will automatically reset or be cleared after being triggered.
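The Warning and Alarm fields can be read as two comparisons against the aggregated value. The Python sketch below illustrates that reading under stated assumptions: a blank value is modeled as `None` (threshold not used), the operator symbols are hypothetical labels, the Alarm threshold is checked first so the more critical severity wins, and a value matching neither threshold is reported as "Normal". Reset After is not modeled.

```python
import operator
from typing import Callable, Dict, Optional

# Hypothetical operator labels mapped to Python comparison functions.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}


def evaluate_severity(
    value: float,
    warning_value: Optional[float],
    warning_op: str,
    alarm_value: Optional[float],
    alarm_op: str,
    warning_severity: str = "Warning",
    alarm_severity: str = "Alarm",
) -> str:
    """Return the monitor severity for a value (illustrative model only)."""
    # Assumption: the Alarm threshold is checked first so it takes priority.
    if alarm_value is not None and OPERATORS[alarm_op](value, alarm_value):
        return alarm_severity
    # A blank (None) Warning Value means the warning threshold is not used.
    if warning_value is not None and OPERATORS[warning_op](value, warning_value):
        return warning_severity
    return "Normal"
```

For example, with a Warning Value of 700 and an Alarm Value of 800, both using `>=`, a value of 750 yields "Warning" and 850 yields "Alarm".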

...

  • Alert On refers to the status that will trigger the alert and notify the users. This is usually set to “Warning”.

  • Alert Message refers to the message that the users will see. This message is customizable by the user. (Pause for 7 seconds before proceeding to the next fields)

  • // Note on Screen // You may refer to the Threshold Overrides Variables in the IT-Conductor Knowledge Base for more details.

  • Repeat After determines how long the system waits before sending a subsequent alert after the initial notification has been triggered. For example, if an alert is triggered and Repeat After is set to 120 minutes, the system sends another alert if the condition is still active after that time has passed. In other words, it defines how long the condition must remain active before a subsequent alert is sent.

  • Alert Priority refers to the classification that determines the urgency and importance of an alert within a monitoring system. In IT-Conductor, we have these available for selection.

  • In the Notification Template field, you can select a predefined template to ensure that alert notifications adhere to a standard format.

  • Resolve Alerts allows users to mark alerts as resolved when the alert is no longer active or when the metric falls below the configured threshold value.

  • If you want to receive an alert and notifications when the system returns to its normal status, check the Alert On Normal checkbox and optionally specify the alert text.

  • The Escalate feature allows you to set rules for how alert escalation is handled:

    • If the number of repeated alerts (based on the rule criteria) exceeds the specified value, an escalation alert is created.

    • Depending on the rules, that escalation alert can be picked up by a subscription that targets management, so that critical issues receive timely attention.
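Repeat After and Escalate work together over time. This Python sketch simulates one continuously active condition to show the sequence of alerts. It is a hypothetical model: the condition is assumed to be checked every `check_every` minutes, `escalate_after` stands in for the rule's "specified value" of repeated alerts, and only one escalation alert is raised per condition. Subscriptions and notification delivery are not modeled.

```python
from typing import List, Optional, Tuple


def simulate_alerts(
    active_from: int,
    active_until: int,
    check_every: int,
    repeat_after: int,
    escalate_after: int,
) -> List[Tuple[int, str]]:
    """Return (minute, event) pairs for one continuously active condition."""
    events: List[Tuple[int, str]] = []
    last_alert: Optional[int] = None
    repeats = 0
    escalated = False
    for minute in range(active_from, active_until + 1, check_every):
        if last_alert is None:
            # First time the condition is seen: initial alert.
            events.append((minute, "alert"))
            last_alert = minute
        elif minute - last_alert >= repeat_after:
            # Condition still active after Repeat After: send a repeat alert.
            repeats += 1
            events.append((minute, "repeat alert"))
            last_alert = minute
            # Escalate once the number of repeats exceeds the specified value.
            if repeats > escalate_after and not escalated:
                events.append((minute, "escalation alert"))
                escalated = True
    return events
```

With Repeat After set to 120 minutes and escalation after more than 2 repeats, `simulate_alerts(0, 480, 5, 120, 2)` produces an initial alert at minute 0, repeats at 120, 240, 360, and 480, and a single escalation alert at minute 360.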

[Recovery] - info taken from: https://docs.itconductor.com/user-guide/automation/sap-batch-job-restart-on-error#create-a-recovery-activity-to-restart-the-job

...

The Recovery functionality in IT-Conductor allows you to automate responses when certain thresholds are exceeded. By creating a Recovery Activity, you can configure the system to automatically take action, such as restarting a job, whenever an incident occurs.

To enable the recovery activity, open the “Recovery On” dropdown and select either Warning or Alarm.

  • Warning: If you choose Warning, the recovery activity will be triggered when the Warning threshold is exceeded. This is useful for preemptive measures, allowing you to address potential issues before they escalate.

  • Alarm: Selecting this option means that the recovery activity will activate only when the Alarm threshold is breached. This is typically reserved for more critical situations requiring immediate attention.

Next, you’ll need to define which Recovery action the system will take. This might include restarting a job or executing a specific recovery script. IT-Conductor provides predefined recovery activities based on common scenarios, which you can easily select from the list.

For Recovery, select the automation user that will be responsible for managing the recovery actions. This ensures that the appropriate people are notified and can take any necessary follow-up actions.

If you want to be alerted whenever this recovery activity is executed, check the Alert box. This feature keeps you informed about automated processes, ensuring you can monitor actions taken on critical jobs.

Once you’ve configured the recovery settings to your liking, click Save Recovery Activity to finalize your changes. This ensures that your new recovery parameters are active and ready to respond to any incidents. Custom recoveries can be implemented in collaboration with the IT-Conductor Support Team.

[Saving & Verification]

Once all fields are filled out properly, click Save to complete the configuration.

...

For this demo, let's create a threshold override for the Active Users metric.

Navigate to the Overrides screen from the service grid. Then, click the Create New Override icon here.

...

For the rest of the video, follow along with the configuration steps being shown.

In this example, we’ll show how to create an override that will be triggered once 800 or more users have logged in to the system.
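The wording "800 or more users" matters when choosing the operator. This tiny Python sketch, a hypothetical illustration rather than IT-Conductor behavior, shows why a "greater than or equal" comparison is the one that fires at exactly 800 users, while a strict "greater than" comparison would not.

```python
import operator
from typing import Callable

THRESHOLD = 800  # from the demo: trigger once 800 or more users are logged in


def breached(
    active_users: int,
    op: Callable[[int, int], bool] = operator.ge,
    threshold: int = THRESHOLD,
) -> bool:
    """Check the active-user count against the threshold with the given operator."""
    return op(active_users, threshold)
```

Here `breached(800)` is true with the default `>=` comparison, but `breached(800, operator.gt)` is false. That difference is why the operator should match the "or more" wording.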

In the Alert Message field, we’ll use the Variables button to write the message that will be sent once the 800-user threshold has been breached.

If you want to review what each field does, feel free to revisit the previous part of this demo or visit the IT-Conductor Knowledge Base at your convenience.

...