
Managing Outages

4.1 Outage Overview

4.1.2 Scheduled Outages

Scheduled outages are required for regular maintenance of the technology infrastructure that supports the application, including tasks such as:

■ Hardware maintenance, repair, and upgrades

■ Software upgrades and patching

■ Application changes and patching

■ Changes to improve performance and manageability of systems

These tasks should be scheduled at times best suited for continual application availability.

Table 4–4 describes scheduled outages that affect either the primary or secondary site.

Table 4–3 Recovery Steps for Unscheduled Outages on the Secondary Site

Outage Type: Computer failure (instance)

Oracle Database 10g with Data Guard:

1. Restart node and standby instance.

2. Restart recovery.

If there is only one standby database and if maximum database protection is configured, then the production database will shut down to ensure that there is no data divergence with the standby database.

Oracle Database 10g - MAA:

There is no effect on production availability if the production database Oracle Net descriptor is configured to use connect-time failover to an available standby instance.

Broker will automatically restart the apply process.

Restart node and instance when they are available.

Outage Type: Data corruption

Oracle Database 10g with Data Guard: Restoring Fault Tolerance After a Standby Database Data Failure on page 4-54

Oracle Database 10g - MAA: Restoring Fault Tolerance After a Standby Database Data Failure on page 4-54

Outage Type: Primary database opens with RESETLOGS because of Flashback Database operations or point-in-time media recovery

Recovery steps: Restoring Fault Tolerance After the Production Database Was Opened Resetlogs on page 4-55
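The connect-time failover mentioned for the MAA configuration can be illustrated with a minimal sketch that assembles an Oracle Net connect descriptor listing more than one standby address. The host names, service name, and the helper function build_failover_descriptor are hypothetical and chosen only for illustration; the sketch shows the general shape of such a descriptor, not the exact one used in any particular deployment.

```python
def build_failover_descriptor(hosts, service_name, port=1521):
    """Return an Oracle Net connect descriptor that tries each host in order.

    FAILOVER=ON asks Oracle Net to try the next address when one fails at
    connect time. LOAD_BALANCE=OFF keeps the addresses in the listed order.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host in hosts
    )
    return (
        "(DESCRIPTION="
        "(ADDRESS_LIST=(LOAD_BALANCE=OFF)(FAILOVER=ON)" + addresses + ")"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name}))"
        ")"
    )


# Hypothetical standby nodes and service name, for illustration only.
descriptor = build_failover_descriptor(["stby-node1", "stby-node2"], "stby_svc")
print(descriptor)
```

With two addresses in the list, a failed standby node leaves a second address for the production database to try, which is why the table reports no effect on production availability in that case.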

The following sections provide best practice recommendations and preparations for reducing scheduled outages on the primary and secondary sites:

■ Managing Scheduled Outages on the Primary Site

■ Managing Scheduled Outages on the Secondary Site

■ Preparing for Scheduled Outages on the Secondary Site

Table 4–4 Scheduled Outages

Outage Scope: Site-wide

Description: The entire site where the current production database resides is unavailable. Usually known well in advance.

Examples:

■ Scheduled power outages

■ Site maintenance

■ Regular planned switchovers to test infrastructure

Outage Scope: Hardware maintenance (node impact)

Description: Hardware maintenance on a database server. Restricted to a node of the database cluster.

Examples:

■ Repair of a failed component such as a memory card or CPU board

■ Addition of memory or CPU to an existing node in the database tier

Outage Scope: Hardware maintenance (clusterwide impact)

Description: Hardware maintenance on a database server cluster

Examples:

■ Some cases of adding a node to the cluster

■ Upgrade or repair of the cluster interconnect

■ Upgrade to the storage tier that requires downtime on the database tier

Outage Scope: System software maintenance (node impact)

Description: System software maintenance on a database server. The scope of the downtime is restricted to a node.

Examples:

■ Upgrade of a software component such as the operating system

■ Changes to the configuration parameters for the operating system

Outage Scope: System software maintenance (clusterwide impact)

Description: System software maintenance on a database server cluster

Examples:

■ Upgrade or patching of the cluster software

■ Upgrade of the volume management software

Outage Scope: Oracle patch upgrade for the database

Description: Scheduled outage for installation of an Oracle patch

Examples:

■ Patch Oracle software to fix a specific customer issue

Outage Scope: Oracle patch set or software upgrade for the database

Description: Scheduled outage for Oracle patch set or software upgrade

Examples:

■ Patching Oracle software with a patch set

■ Upgrading Oracle software

Outage Scope: Database object reorganization

Description: Changes to the logical structure or the physical organization of Oracle Database objects, primarily to improve performance or manageability. Using Oracle Database online reorganization features enables objects to be available during the reorganization.

Examples:

■ Moving an object to a different tablespace

■ Converting a table to a partitioned table

■ Renaming or dropping columns of a table

Outage Scope: Storage maintenance

Description: Maintenance of storage where database files reside

Examples:

■ Converting to ASM

■ Adding or removing storage

Outage Scope: Platform migration

Description: Changing operating system platform of the primary and standby databases

Examples:

■ Moving to the Linux operating system

Outage Scope: Location migration

Description: Changing physical location of the primary database

Examples:

■ Moving the primary database from one data center to another

4.1.2.1 Managing Scheduled Outages on the Primary Site

If the primary site contains the production database and the secondary site contains the standby database, then outages on the primary site are the most crucial. Solutions for these outages are critical for continued availability of the system.

Table 4–5 shows the high-level recovery steps for scheduled outages on the primary site. For outages that require multiple recovery steps, the table includes links to the detailed descriptions in Section 4.4, "Eliminating or Reducing Downtime for Scheduled Outages" beginning on page 4-57.

Table 4–5 Recovery Steps for Scheduled Outages on the Primary Site

Outage: Site (Site shutdown)

Restart database after outage

Not applicable

Restart database after outage

Not applicable

1. Database Switchover

Not applicable

In general, CRS upgrades can be done online and do not require downtime.

4.1.2.2 Managing Scheduled Outages on the Secondary Site

Outages on the secondary site do not affect availability because clients always access the primary site, so they can be managed with no effect on availability. They might, however, affect the RTO if there are concurrent failures on the primary site. If maximum protection mode is configured, then downgrade the protection mode before scheduled outages on the standby instance or database so that there is no downtime on the production database.
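The protection mode downgrade described above can be sketched as a small planning helper that emits the SQL statements to run on the production database before and after standby maintenance. The statement form ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE ... is standard Data Guard SQL; the helper names set_mode_sql and standby_outage_plan are hypothetical, and the sketch assumes a configuration managed with SQL rather than through the broker. Raising the mode back to maximum protection may have prerequisites that are not shown here.

```python
MODES = ("PROTECTION", "AVAILABILITY", "PERFORMANCE")  # strongest to weakest


def set_mode_sql(mode):
    """Return the SQL statement that sets the given Data Guard protection mode."""
    mode = mode.upper()
    if mode not in MODES:
        raise ValueError(f"unknown protection mode: {mode}")
    return f"ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE {mode};"


def standby_outage_plan(current_mode, temporary_mode="AVAILABILITY"):
    """List the steps for a scheduled standby outage (illustrative only).

    Only maximum protection mode needs a temporary downgrade, because in that
    mode the production database shuts down when no standby is reachable.
    """
    current_mode = current_mode.upper()
    temporary_mode = temporary_mode.upper()
    needs_downgrade = current_mode == "PROTECTION"
    if needs_downgrade and temporary_mode == "PROTECTION":
        raise ValueError("temporary mode must be weaker than maximum protection")

    steps = []
    if needs_downgrade:
        steps.append(set_mode_sql(temporary_mode))
    steps.append("-- perform scheduled maintenance on the standby")
    if needs_downgrade:
        steps.append(set_mode_sql("PROTECTION"))
    return steps


for step in standby_outage_plan("protection"):
    print(step)
```

For a configuration already running in maximum availability or maximum performance mode, the plan contains only the maintenance step, which matches the statement that such outages have no effect on production availability.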

Table 4–6 describes the recovery steps for scheduled outages on the secondary site.


4.1.2.3 Preparing for Scheduled Outages on the Secondary Site

To achieve continued service during scheduled outages on a secondary site when in maximum protection mode, temporarily downgrade the protection mode to maximum availability or maximum performance. When scheduling secondary site maintenance, consider that the duration of a site-wide or clusterwide outage adds to the time that the standby database lags behind the production database, which in turn lengthens the time to restore fault tolerance. See Section 2.4.2, "Data Protection Mode" on page 2-23 for an overview of the Data Guard protection modes.
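The relationship between outage duration and the time to restore fault tolerance can be made concrete with a rough estimate. The sketch below assumes a simple model that is not part of the source: the production database generates redo at a constant rate, the standby applies redo at a constant, faster rate once it is back, and the backlog accumulated during the outage must be worked off while new redo keeps arriving. The function name catch_up_hours and the sample rates are hypothetical.

```python
def catch_up_hours(outage_hours, redo_rate_mb_per_hour, apply_rate_mb_per_hour):
    """Estimate the hours a standby needs to catch up after an outage.

    Assumed model: backlog = outage duration * redo generation rate, and the
    backlog shrinks at (apply rate - redo generation rate) after restart.
    """
    if outage_hours < 0:
        raise ValueError("outage duration cannot be negative")
    if apply_rate_mb_per_hour <= redo_rate_mb_per_hour:
        raise ValueError("standby can never catch up at this apply rate")
    backlog_mb = outage_hours * redo_rate_mb_per_hour
    return backlog_mb / (apply_rate_mb_per_hour - redo_rate_mb_per_hour)


# Hypothetical figures: 4-hour outage, 500 MB/h of redo, 2500 MB/h apply rate.
print(catch_up_hours(4, 500, 2500))  # prints 1.0
```

Under these assumed rates, a 4-hour standby outage leaves the configuration without full fault tolerance for about 5 hours in total: the outage itself plus roughly 1 hour of catch-up.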

From the document Oracle® Database High Availability Best Practices 10g (pages 86-90)