3.2 Using Oracle Grid Control for System Monitoring

3.2.1 Set Up Default Notification Rules for Each System

Notification Rules are defined sets of alerts on metrics that are automatically applied to a target when Oracle Grid Control discovers it. For example, an administrator can create a rule that monitors the availability of database targets and generates an e-mail message if a database fails. After that rule is created, it is applied to all existing databases and to any database created in the future. Access these rules by navigating to Preferences and then choosing Rules.

The rules monitor problems that require immediate attention, such as those that can affect service availability, and Oracle or application errors. Service availability can be affected by an outage in any layer of the application stack: node, database, listener, and critical application data. A service availability failure, such as the inability to connect to the database, or the inability to access data critical to the functionality of the application, must be identified, reported, and reacted to quickly. Potential service outages such as a full archive log directory also must be addressed correctly to avoid a system outage.

Oracle Grid Control includes a series of default rules that provide a strong framework for monitoring availability. A default rule is provided for each of the preinstalled target types that come with Oracle Grid Control. These rules can be modified to conform to the policies of each individual site, and new rules can be created for site-specific targets or applications. The rules can also be set to notify users during specific time periods to create an automated coverage policy.

Consider the following recommendations:

■ Modify each rule for high-value components in the target architecture to meet the availability requirements by using the rules modification wizard. For the database rule, set the events in Table 3–1, Table 3–2, and Table 3–3 for each target.

The frequency of the monitoring is determined by the service level agreement (SLA) for each component.

See Also: Oracle Enterprise Manager Concepts for more information about monitoring and using metrics in Oracle Grid Control

■ Use Beacon functionality to track the performance of individual applications. A Beacon can be set to perform a user transaction representative of normal application work. Enterprise Manager can then break down the response time of that transaction into its component pieces for analysis. In addition, an alert can be triggered if the execution time of that transaction exceeds a predefined limit.

■ Add Notification Methods and use them in each Notification Rule. By default, the easiest method for alerting an administrator to a potential problem is to send e-mail. Supplement this notification method by adding a callout to an SNMP trap or operating system script that sends an alert by some method other than e-mail. This avoids the problem that might occur if a component of the e-mail system has failed. Set additional Notification Methods by using the Setup link at the top of any Oracle Grid Control page.

■ Modify Notification Rules to notify the administrator when there are errors in computing target availability. This might generate a false positive reading on the availability of the component, but it ensures the highest level of notification to system administrators.

Figure 3–2 shows the Notification Rule property page for choosing availability states with Down, Agent Unreachable, Agent Unreachable Resolved, and Metric Error Detected chosen.

Figure 3–2 Setting Notification Rules for Availability
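The recommendation above to supplement e-mail with an operating system script could be sketched roughly as follows. This is only an illustration, not a script shipped with Oracle Grid Control. It assumes that alert details reach the script as environment variables; the variable names (TARGET_NAME, SEVERITY, MESSAGE), the log path, and the function names are assumptions made for this example, so confirm the names your release provides in Oracle Enterprise Manager Advanced Configuration.

```python
import os
from datetime import datetime, timezone

# Assumed variable names; confirm them against your Grid Control release.
FIELDS = ("TARGET_NAME", "SEVERITY", "MESSAGE")


def format_alert(env, now=None):
    """Build one log line from the alert details found in `env`."""
    now = now or datetime.now(timezone.utc)
    parts = [f"{name}={env.get(name, 'unknown')}" for name in FIELDS]
    return f"{now.isoformat()} " + " ".join(parts)


def main(log_path="/tmp/grid_control_alerts.log"):
    """Append the current alert to a local file (hypothetical path)."""
    # A local file keeps this alert path independent of the e-mail system.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(format_alert(os.environ) + "\n")
    return 0
```

To use a script like this as a callout, it would end with a call to `main()`. In practice the body of `main()` would more likely page an operator or post to a ticketing system; the point is only that the alert travels by a route that does not depend on e-mail.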

In addition, modify the metrics monitored by the database rule to report the metrics shown in Table 3–1, Table 3–2, and Table 3–3. This ensures that these metrics are captured for all database targets and that trend data will be available for future analysis. All events described in Table 3–1, Table 3–2, and Table 3–3 can be accessed from the Database Homepage by choosing All Metrics > Expand All.

See Also:

■ Oracle Enterprise Manager Concepts for conceptual information about Beacons

■ Oracle Enterprise Manager Advanced Configuration for information about configuring service tests and Beacons
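The Beacon recommendation earlier in this section (run a representative transaction, break its response time into component pieces, and alert when the total exceeds a predefined limit) can be illustrated outside Enterprise Manager with a small timing sketch. It does not show how a Beacon is implemented or configured; the helper names `time_steps` and `check_transaction` and any step names or limits passed to them are hypothetical.

```python
import time


def time_steps(steps):
    """Run each (name, callable) step and return {name: elapsed_seconds}."""
    timings = {}
    for name, action in steps:
        start = time.perf_counter()
        action()
        timings[name] = time.perf_counter() - start
    return timings


def check_transaction(timings, limit_seconds):
    """Return (total, exceeded): total response time and whether it breaks the limit."""
    total = sum(timings.values())
    return total, total > limit_seconds
```

A caller might pass steps such as `("connect", connect_fn)` and `("query", query_fn)`, then inspect the per-step timings to see which component accounts for a slow transaction.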

Space management conditions that have the potential to cause a service outage should be monitored using the events shown in Table 3–1.
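Table 3–1 describes space metrics in terms of warning and critical percentages plus a number of samples that must be in error before a message is generated. The interaction of those settings can be sketched as follows. This is an assumed model for illustration, not Oracle Grid Control's evaluation logic: the function name, the use of `>=` comparisons, and the default of two consecutive samples are assumptions, while the 70 and 90 percent defaults come from the table.

```python
def alert_level(samples, occurrences=2, warning=70.0, critical=90.0):
    """Return 'critical', 'warning', or None for a series of used-space percentages.

    An alert is reported only when the most recent `occurrences` samples
    all breach a threshold, which filters out short-lived spikes.
    """
    if occurrences < 1:
        raise ValueError("occurrences must be at least 1")
    recent = samples[-occurrences:]
    if len(recent) < occurrences:
        return None  # not enough samples collected yet
    if all(s >= critical for s in recent):
        return "critical"
    if all(s >= warning for s in recent):
        return "warning"
    return None
```

For example, under these assumptions the series 50, 72, 75 reports a warning, while a single spike to 95 followed by 60 reports nothing.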

From the Alert Log Metric group, set Oracle Grid Control to monitor the alert log for errors as shown in Table 3–2.
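The severity mapping that Table 3–2 recommends for the Alert metric (an alert for ORA-6XX, ORA-1578, and ORA-0060, and a warning for any other recorded error) can be expressed as a short classifier. This is a sketch of the rule only, not of how Oracle Grid Control parses the alert log. It assumes that "ORA-6XX" means error numbers 600 through 699, and the function names are hypothetical.

```python
import re

ORA_CODE = re.compile(r"ORA-(\d+)")


def is_critical(code):
    """Assumed reading of Table 3-2: ORA-6XX, ORA-1578, and ORA-0060."""
    return 600 <= code <= 699 or code in (1578, 60)


def classify_line(line):
    """Return 'critical', 'warning', or None for one alert log line."""
    codes = [int(number) for number in ORA_CODE.findall(line)]
    if not codes:
        return None  # no ORA- error on this line
    if any(is_critical(code) for code in codes):
        return "critical"
    return "warning"
```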

Monitor the system to ensure that the processing capacity is not exceeded. The warning and critical levels for these events should be modified based on the usage pattern of the system. Set the events from the Database Limits metric group using the recommendations in Table 3–3.
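The Process limit and Session limit recommendations in Table 3–3 amount to comparing current usage with a configured maximum. A minimal sketch of that comparison follows. The function names are hypothetical, and the thresholds are deliberately left as required arguments because the source says they should be chosen from the usage pattern of the system.

```python
def limit_usage_pct(current, limit):
    """Return current usage as a percentage of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current / limit


def limit_severity(current, limit, warning_pct, critical_pct):
    """Return 'critical', 'warning', or 'clear' for a usage count against a limit."""
    pct = limit_usage_pct(current, limit)
    if pct >= critical_pct:
        return "critical"
    if pct >= warning_pct:
        return "warning"
    return "clear"
```

With PROCESSES set to 150 and illustrative thresholds of 75 and 90 percent, 120 current processes (80 percent of the limit) would be reported as a warning.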

Table 3–1 Recommendations for Monitoring Space

Metric: Tablespace Space Used (%)
Recommendation: Set this metric to monitor the space used in the tablespaces of any critical database. This metric enables the administrator to choose the threshold percentages that Oracle Grid Control tests against, as well as the number of samples that must occur in error before a message is generated to the administrator. The recommended default settings are 70 percent for a warning and 90 percent for an error, but these values should be adjusted depending on system usage. This metric can be customized to monitor only specific tablespaces. This metric and similar events can be set in the Tablespace Full metric group.

Metric: Archiver Hung Alert Log Error
Recommendation: Set this metric to monitor the alert log for ORA-00257 errors, which indicate a full archive log directory. This metric can be set in the Alert Log Error Status metric group.

Metric: Dump Area Used (%)
Recommendation: Set this metric to monitor the dump directory destinations. Dump space must be available so that the maximum amount of diagnostic information is saved the first time an error occurs. The recommended default settings are 70 percent for a warning and 90 percent for an error, but these values should be adjusted depending on system usage. This metric can be set in the Dump Area metric group.

Table 3–2 Recommendations for Monitoring the Alert Log

Metric: Alert
Recommendation: Set this metric to send an alert when an ORA-6XX, ORA-01578 (database corruption), or ORA-00060 (deadlock detected) error occurs. If any other error is recorded, then a warning message is generated.

Metric: Data Block Corruption
Recommendation: Set this metric to monitor the alert log for ORA-01157 and ORA-27048 errors, which signal corruption in an Oracle Database datafile.

Table 3–3 Recommendations for Monitoring Processing Capacity

Metric: Process limit
Recommendation: Set thresholds for this metric to warn if the number of current processes approaches the value of the PROCESSES initialization parameter.

Metric: Session limit
Recommendation: Set thresholds for this metric to warn if the instance is approaching the maximum number of concurrent connections allowed by the database.

Figure 3–3 shows the Notification Rule property page for choosing metrics. The user has chosen Critical and Warning as the severity states for notification. The list of Available Metrics is shown in the left list box. The metrics that have been selected for notification are shown in the right list box.

Figure 3–3 Setting Notification Rules for Metrics

From the document Oracle® Database High Availability Best Practices 10g (pages 73-76)