Managing Outages
4.1 Outage Overview
4.1.2 Scheduled Outages
Scheduled outages are required for regular maintenance of the technology infrastructure that supports the application, including tasks such as:
■ Hardware maintenance, repair, and upgrades
■ Software upgrades and patching
■ Application changes and patching
■ Changes to improve performance and manageability of systems
Schedule these tasks at the times that are least disruptive to continuous application availability.
Table 4–4 describes scheduled outages that affect either the primary or secondary site.
Table 4–3 Recovery Steps for Unscheduled Outages on the Secondary Site

Outage type: Computer failure (instance)
Oracle Database 10g with Data Guard:
1. Restart node and standby instance.
2. Restart recovery.
If there is only one standby database and maximum protection mode is configured, then the production database shuts down to ensure that there is no data divergence with the standby database.
Oracle Database 10g - MAA:
There is no effect on production availability if the production database Oracle Net descriptor is configured to use connect-time failover to an available standby instance. Data Guard Broker automatically restarts the apply process. Restart the node and instance when they are available.

Outage type: Data corruption
Oracle Database 10g with Data Guard: Restoring Fault Tolerance After a Standby Database Data Failure on page 4-54
Oracle Database 10g - MAA: Restoring Fault Tolerance After a Standby Database Data Failure on page 4-54

Outage type: Primary database opens with RESETLOGS because of Flashback Database operations or point-in-time media recovery
Oracle Database 10g with Data Guard: Restoring Fault Tolerance After the Production Database Was Opened Resetlogs on page 4-55
Oracle Database 10g - MAA: Restoring Fault Tolerance After the Production Database Was Opened Resetlogs on page 4-55
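The instance-failure row of Table 4–3 combines two rules: in maximum protection mode the production database stays open only while some standby instance can still receive redo, and connect-time failover lets redo reach a surviving standby instance. The following Python sketch models that decision. It is a hypothetical illustration only: the names ProtectionMode, StandbyDatabase, failover_target, and production_stays_open are invented for this example and are not an Oracle API, and the ordered instance list merely stands in for the address order of an Oracle Net descriptor.

```python
# Hypothetical model of the "Computer failure (instance)" row of Table 4-3.
# No Oracle API is called; all names are invented for illustration.
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class ProtectionMode(Enum):
    MAXIMUM_PROTECTION = "MAXIMUM PROTECTION"
    MAXIMUM_AVAILABILITY = "MAXIMUM AVAILABILITY"
    MAXIMUM_PERFORMANCE = "MAXIMUM PERFORMANCE"


@dataclass
class StandbyDatabase:
    name: str
    # (instance name, is_up) pairs in the order that a descriptor with
    # connect-time failover would try them.
    instances: List[Tuple[str, bool]]


def failover_target(standby: StandbyDatabase) -> Optional[str]:
    """Return the first available instance, as connect-time failover would."""
    for instance, is_up in standby.instances:
        if is_up:
            return instance
    return None


def production_stays_open(mode: ProtectionMode,
                          standbys: List[StandbyDatabase]) -> bool:
    """Return True if the production database remains open."""
    if mode is not ProtectionMode.MAXIMUM_PROTECTION:
        # In the lower modes the primary keeps running even when no
        # standby can receive redo.
        return True
    # Maximum protection: at least one standby instance must be reachable;
    # otherwise the primary shuts down to prevent data divergence.
    return any(failover_target(s) is not None for s in standbys)


if __name__ == "__main__":
    rac_standby = StandbyDatabase("stby", [("stby1", False), ("stby2", True)])
    single = StandbyDatabase("stby", [("stby1", False)])
    print(failover_target(rac_standby))  # stby2
    print(production_stays_open(ProtectionMode.MAXIMUM_PROTECTION, [rac_standby]))  # True
    print(production_stays_open(ProtectionMode.MAXIMUM_PROTECTION, [single]))  # False
    print(production_stays_open(ProtectionMode.MAXIMUM_AVAILABILITY, [single]))  # True
```

The two-instance standby corresponds to the MAA column (a surviving instance absorbs the failure), while the single-instance standby corresponds to the Data Guard column's warning about maximum protection mode.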
The following sections provide best practice recommendations and preparations for reducing scheduled outages on the primary and secondary sites:
■ Managing Scheduled Outages on the Primary Site
■ Managing Scheduled Outages on the Secondary Site
■ Preparing for Scheduled Outages on the Secondary Site

Table 4–4 Scheduled Outages

Outage scope: Site-wide
Description: The entire site where the current production database resides is unavailable. Such outages are usually known well in advance.
Examples:
■ Scheduled power outages
■ Site maintenance
■ Regular planned switchovers to test infrastructure

Outage scope: Hardware maintenance (node impact)
Description: Hardware maintenance on a database server. The outage is restricted to one node of the database cluster.
Examples:
■ Repair of a failed component such as a memory card or CPU board
■ Addition of memory or CPU to an existing node in the database tier

Outage scope: Hardware maintenance (clusterwide impact)
Description: Hardware maintenance on a database server cluster.
Examples:
■ Some cases of adding a node to the cluster
■ Upgrade or repair of the cluster interconnect
■ Upgrade to the storage tier that requires downtime on the database tier

Outage scope: System software maintenance (node impact)
Description: System software maintenance on a database server. The scope of the downtime is restricted to one node.
Examples:
■ Upgrade of a software component such as the operating system
■ Changes to the configuration parameters for the operating system

Outage scope: System software maintenance (clusterwide impact)
Description: System software maintenance on a database server cluster.
Examples:
■ Upgrade or patching of the cluster software
■ Upgrade of the volume management software

Outage scope: Oracle patch upgrade for the database
Description: Scheduled outage for installation of an Oracle patch.
Examples:
■ Patching Oracle software to fix a specific customer issue

Outage scope: Oracle patch set or software upgrade for the database
Description: Scheduled outage for an Oracle patch set or software upgrade.
Examples:
■ Patching Oracle software with a patch set
■ Upgrading Oracle software

Outage scope: Database object reorganization
Description: Changes to the logical structure or the physical organization of Oracle Database objects, primarily to improve performance or manageability. Using Oracle Database online reorganization features keeps objects available during the reorganization.
Examples:
■ Moving an object to a different tablespace
■ Converting a table to a partitioned table
■ Renaming or dropping columns of a table

Outage scope: Storage maintenance
Description: Maintenance of the storage where database files reside.
Examples:
■ Converting to ASM
■ Adding or removing storage

Outage scope: Platform migration
Description: Changing the operating system platform of the primary and standby databases.
Examples:
■ Moving to the Linux operating system

Outage scope: Location migration
Description: Changing the physical location of the primary database.
Examples:
■ Moving the primary database from one data center to another
4.1.2.1 Managing Scheduled Outages on the Primary Site
If the primary site contains the production database and the secondary site contains the standby database, then outages on the primary site are the most crucial. Solutions for these outages are critical for continued availability of the system.
Table 4–5 shows the high-level recovery steps for scheduled outages on the primary site. For outages that require multiple recovery steps, the table includes links to the detailed descriptions in Section 4.4, "Eliminating or Reducing Downtime for Scheduled Outages" beginning on page 4-57.
Table 4–5 Recovery Steps for Scheduled Outages on the Primary Site
Outage Site Site shutdown Restart database after outage
Not applicable Restart database after outage
Not applicable 1. Database Switchover
Not applicable In general, CRS upgrades can be done online and do not require downtime.
Not applicable In general, CRS upgrades can be done online and do not require downtime.
4.1.2.2 Managing Scheduled Outages on the Secondary Site
Outages on the secondary site do not affect availability, because clients always access the primary site; they can therefore be managed with no effect on availability. However, they might affect the RTO if a failure occurs on the primary site while the secondary site is down. If maximum protection mode is configured, downgrade the protection mode before a scheduled outage of the standby instance or database. Otherwise the production database shuts down when no standby database can receive its redo, which causes downtime on the production database.
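The downgrade-then-restore sequence for a standby outage under maximum protection can be sketched as a small Python helper that only assembles the steps as text and never connects to a database. It assumes the SQL form ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE {PROTECTION | AVAILABILITY | PERFORMANCE}, issued on the primary; a configuration managed by Data Guard Broker would make the same change through the broker instead. The helper names (MODE_KEYWORD, set_mode_sql, standby_outage_plan) are hypothetical, and raising the mode back to maximum protection can carry additional requirements that vary by release, so check the Data Guard documentation before relying on the final step.

```python
# Hypothetical helper that lists the steps for a scheduled standby outage.
# It only builds text; it does not connect to any database.
from typing import List

# Keyword accepted by: ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE <keyword>
MODE_KEYWORD = {
    "MAXIMUM PROTECTION": "PROTECTION",
    "MAXIMUM AVAILABILITY": "AVAILABILITY",
    "MAXIMUM PERFORMANCE": "PERFORMANCE",
}


def set_mode_sql(mode: str) -> str:
    """Return the statement, issued on the primary, that sets `mode`."""
    return f"ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE {MODE_KEYWORD[mode]};"


def standby_outage_plan(current_mode: str,
                        temporary_mode: str = "MAXIMUM AVAILABILITY") -> List[str]:
    """List the steps for a scheduled outage of the standby instance or database."""
    if current_mode not in MODE_KEYWORD or temporary_mode not in MODE_KEYWORD:
        raise ValueError("unknown protection mode")
    if temporary_mode == "MAXIMUM PROTECTION":
        raise ValueError("temporary mode must be lower than maximum protection")
    if current_mode != "MAXIMUM PROTECTION":
        # No downgrade needed: the primary keeps running while the standby is down.
        return ["Perform the scheduled maintenance on the standby."]
    return [
        "On the primary: " + set_mode_sql(temporary_mode),
        "Perform the scheduled maintenance on the standby.",
        # Assumption: restore the original mode only after the standby
        # has caught up with the primary.
        "After the standby is synchronized, on the primary: "
        + set_mode_sql(current_mode),
    ]


if __name__ == "__main__":
    for step in standby_outage_plan("MAXIMUM PROTECTION"):
        print(step)
```

Passing "MAXIMUM PERFORMANCE" as the temporary mode yields the same plan with MAXIMIZE PERFORMANCE in the first step, matching the two downgrade options named in this section.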
Table 4–6 describes the recovery steps for scheduled outages on the secondary site.
4.1.2.3 Preparing for Scheduled Outages on the Secondary Site
To maintain service during scheduled outages on a secondary site when the configuration runs in maximum protection mode, temporarily downgrade the protection mode to maximum availability or maximum performance. When scheduling secondary site maintenance, remember that the duration of a site-wide or clusterwide outage adds to the time that the standby database lags behind the production database, which in turn lengthens the time needed to restore fault tolerance. See Section 2.4.2, "Data Protection Mode" on page 2-23 for an overview of the Data Guard protection modes.
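The effect of outage duration on standby lag can be estimated with simple arithmetic. Under the simplifying assumptions that the primary generates redo at a constant rate r_gen, the restarted standby applies it at a constant rate r_apply greater than r_gen, and transport and gap-resolution overhead are ignored, an outage of length T leaves a backlog of r_gen × T, which shrinks at r_apply − r_gen. The catch-up time is therefore r_gen × T / (r_apply − r_gen). The Python sketch below encodes this back-of-the-envelope model; the function names and the sample rates are hypothetical, not measured values.

```python
# Back-of-the-envelope estimate of how a standby outage delays restoring
# fault tolerance. Assumes constant redo generation and apply rates and
# ignores redo transport and gap-resolution overhead.


def catch_up_seconds(outage_s: float, redo_rate_mb_s: float,
                     apply_rate_mb_s: float) -> float:
    """Seconds the restarted standby needs to apply the redo backlog."""
    if apply_rate_mb_s <= redo_rate_mb_s:
        raise ValueError("standby never catches up: apply rate <= redo rate")
    backlog_mb = redo_rate_mb_s * outage_s  # redo generated while the standby was down
    # The backlog shrinks only at the net rate (apply minus new redo).
    return backlog_mb / (apply_rate_mb_s - redo_rate_mb_s)


def time_without_fault_tolerance(outage_s: float, redo_rate_mb_s: float,
                                 apply_rate_mb_s: float) -> float:
    """Outage duration plus catch-up time."""
    return outage_s + catch_up_seconds(outage_s, redo_rate_mb_s, apply_rate_mb_s)


if __name__ == "__main__":
    # Hypothetical figures: 2-hour outage, 5 MB/s of redo, 20 MB/s apply rate.
    print(catch_up_seconds(7200, 5, 20))  # 2400.0
    print(time_without_fault_tolerance(7200, 5, 20))  # 9600.0
```

With these sample figures, a two-hour standby outage leaves the configuration without a current standby for about two hours and forty minutes, and the catch-up portion grows linearly with the outage length.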