Flash Recovery Area Disk Group Failure - ASM Recovery After Disk and Storage Failures

Managing Outages

3. Click Yes to continue with the switchover. Click No to cancel

4.2.6 ASM Recovery After Disk and Storage Failures

4.2.6.4 Flash Recovery Area Disk Group Failure

When the flash recovery area disk group fails, the database crashes because a control file member usually resides in the flash recovery area and Oracle requires that all control file members be accessible. The flash recovery area can also contain the flashback logs, redo log members, and all backups.

A flash recovery area disk group failure should occur only when there have been multiple failures. For example, if the flash recovery area disk group is defined as external redundancy, a single-disk failure should not be exposed to ASM. However, multiple-disk failures in a storage array may be seen by ASM, causing the disk group to go offline. Similarly, multiple-disk failures in different failure groups in a normal or high-redundancy disk group may cause the disk group to go offline.

Note: If you have performed an Oracle Data Guard failover to a new primary database, then you can now use the following procedure to reintroduce the database into the Data Guard environment. Also, see Section 4.3.2, "Restoring a Standby Database After a Failover" on page 4-50.
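The redundancy type and failure group layout that determine this exposure can be inspected from the ASM instance. The following is a minimal sketch, assuming a disk group named RECO (the name used in the example later in this section; yours may differ). It is meant for use while the disk group is mounted, for example as a routine check or after the disk group has been rebuilt, because disks of a dismounted group are not associated with it in these views.

```sql
-- Run while connected to the ASM instance.
-- 'RECO' is an assumed disk group name; substitute your own.

-- Redundancy type (EXTERN, NORMAL, or HIGH), state, and space
SELECT name, state, type, total_mb, free_mb
  FROM v$asm_diskgroup
 WHERE name = 'RECO';

-- Disks in the group, listed by failure group
SELECT d.failgroup, d.name, d.path, d.mode_status, d.mount_status
  FROM v$asm_disk d
  JOIN v$asm_diskgroup g ON g.group_number = d.group_number
 WHERE g.name = 'RECO'
 ORDER BY d.failgroup, d.name;
```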

Table 4–9 summarizes the two possible solutions when the flash recovery-area disk group fails.

Because the loss affects only the flash recovery-area disk group, there is no loss of data.

No database media recovery is required, because the data files and the online redo log files are still present and available in the data area. For a fast local restart, start up the primary database after removing the control file member located in the failed flash recovery area and pointing local archiving to a new flash recovery area (see the "Local Restart Steps" discussion later in this section for the step-by-step procedure). However, this is only a temporary fix until a new flash recovery area is created to replace the failed storage components. Oracle recommends following the "Local Recovery Steps" discussion later in this section.

If you decide to perform a Data Guard failover, then the RTO will be expressed in terms of minutes or perhaps seconds depending on the presence of the Data Guard observer process and fast-start failover. After Data Guard failover has completed, and the application is available, the flash recovery area disk group failure must still be resolved. Continue with the instructions in the following "Local Recovery Steps" section to resolve the ASM disk group failure.

If the protection level is maximum performance or the standby database is unsynchronized with the primary database, then temporarily start up the primary database by removing the control file member and pointing to a temporary flash recovery area (file system) in the SPFILE. Issue a Data Guard switchover to ensure no data loss. After Data Guard switchover has completed, and the application is available, the flash recovery area disk group failure must still be resolved. Shut down the affected database and continue with the instructions in the following "Local Recovery Steps" section to resolve the ASM disk group failure.
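Which of the Data Guard paths applies depends on the protection level and on whether fast-start failover and an observer are in place. As a hedged sketch, these settings can be read from V$DATABASE on a mounted or open database, such as the standby or the temporarily restarted primary. The FS_FAILOVER_* columns are assumed to be available, which is the case only in releases that support fast-start failover.

```sql
-- Run on a mounted or open database (for example, the standby).
SELECT database_role,
       protection_mode,
       protection_level,
       switchover_status
  FROM v$database;

-- Fast-start failover state and observer presence.
-- Assumes a release that supports fast-start failover.
SELECT fs_failover_status,
       fs_failover_observer_present
  FROM v$database;
```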

The RTO for local recovery alone depends primarily on the time to repair and replace the failed storage components, and then on the time to restore the control file copy.

Because the loss affects only the flash recovery-area disk group, there is no loss of data.

No database media recovery is required, because the data files and the online redo log files are still present and available in the data area. As mentioned previously, you can start up the primary database by removing the control file member and pointing to a new flash recovery area. However, this is a temporary fix that carries availability and performance risks unless the new flash recovery area is configured properly. Therefore, Oracle recommends the "Local Recovery Steps" that follow.

Local Restart Steps

For a fast local restart, perform the following steps on the primary database:

1. Change the CONTROL_FILES initialization parameter to refer only to members in the Data Area. For example:

ALTER SYSTEM SET CONTROL_FILES='+DATA/sales/control1.dbf' SCOPE=spfile;

Table 4–9 Recovery Options for Flash Recovery Area Disk Group Failure

Recovery Option                      Recovery Time Objective (RTO)   Recovery Point Objective (RPO)
Local Recovery                       Five minutes or less            Zero
Data Guard Failover or Switchover    Five minutes or less            Zero

2. Change the local archive destinations, the flash recovery area, or both to a local, redundant, scalable destination. For example:

ALTER SYSTEM SET DB_RECOVERY_FILE_DEST='+DATA' SCOPE=spfile;

3. Start up with the new settings:

STARTUP MOUNT;

You may need to disable and reenable Flashback Database because the flashback logs were damaged or lost:

ALTER DATABASE FLASHBACK OFF;

ALTER DATABASE FLASHBACK ON;

ALTER DATABASE OPEN;
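After the restart steps above, it can be useful to confirm that the instance is really running without the failed disk group. The queries below are a minimal sketch using standard dynamic views; they add nothing to the procedure itself and only report the resulting state.

```sql
-- Control file members now in use; only data area files should be listed
SELECT name FROM v$controlfile;

-- Current flash recovery area location and space usage
SELECT name, space_limit, space_used, space_reclaimable, number_of_files
  FROM v$recovery_file_dest;

-- Flashback Database status, archiving mode, and open mode
SELECT flashback_on, log_mode, open_mode
  FROM v$database;
```

If the first query still lists a member on the failed disk group, the CONTROL_FILES change in step 1 did not take effect for this startup.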

Local Recovery Steps

1. Replace or get access to new storage to be leveraged as the flash recovery area.

2. Rebuild the ASM disk group using the new storage location:

SQL> CREATE DISKGROUP RECO NORMAL REDUNDANCY DISK 'path1','path2',...;

3. Start the instance NOMOUNT:

RMAN> STARTUP FORCE NOMOUNT;

4. Restore the control file from the surviving copy located in the data area:

RMAN> RESTORE CONTROLFILE FROM 'data_area_controlfile';

5. Start the instance MOUNT:

RMAN> STARTUP FORCE MOUNT;

6. If you use Flashback Database, then disable it:

SQL> ALTER DATABASE FLASHBACK OFF;

7. Open the database and allow instance recovery to complete:

SQL> ALTER DATABASE OPEN;

8. Issue the following statements only if Flashback Database is required:

SQL> SHUTDOWN IMMEDIATE;

SQL> STARTUP MOUNT;

SQL> ALTER DATABASE FLASHBACK ON;

SQL> ALTER DATABASE OPEN;

9. Re-create the log file members on the failed ASM disk group:

SQL> ALTER DATABASE DROP LOGFILE MEMBER 'filename';

SQL> ALTER DATABASE ADD LOGFILE MEMBER 'disk_group' TO GROUP group_no;

Note: If you performed an Oracle Data Guard failover to a new primary database, then you cannot use this procedure to reintroduce the old primary database as a standby database. This is because Flashback Database log files that are required as part of reintroducing the database have been lost. You must perform a full reinstantiation of the standby database.

10. Synchronize the control file and the flash recovery area:

RMAN> CATALOG RECOVERY AREA;

RMAN> CROSSCHECK ARCHIVELOG ALL;

RMAN> CROSSCHECK BACKUPSET;

RMAN> CROSSCHECK DATAFILECOPY ALL;

RMAN> LIST EXPIRED type;

RMAN> DELETE EXPIRED type;

11. Because backups stored in the failed flash recovery area may have been lost, take a new backup:

RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE;
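Once the recovery steps above are complete, the redo log layout and recovery area usage can be checked to confirm that the members re-created in step 9 are in place. This is a sketch using standard dynamic views, not part of the original procedure.

```sql
-- Each online redo log group and its member count
SELECT group#, members, status
  FROM v$log
 ORDER BY group#;

-- Individual members; IS_RECOVERY_DEST_FILE shows which ones
-- were created in the flash recovery area
SELECT group#, member, status, is_recovery_dest_file
  FROM v$logfile
 ORDER BY group#, member;

-- Space usage in the rebuilt flash recovery area
SELECT name, space_limit, space_used, number_of_files
  FROM v$recovery_file_dest;
```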

From the document Oracle® Database High Availability Best Practices 10g (pages 112-115)