Oracle® Database High Availability Best Practices 10g

(1)

High Availability Best Practices 10g Release 2 (10.2)

B25159-01

July 2006

(2)

Contributing Authors: Andrew Babb, Tammy Bednar, Immanuel Chan, Timothy Chien, Craig B. Foch, Michael Nowak, Viv Schupmann, Michael Todd Smith, Vinay Srihari, Lawrence To, Randy Urbano, Douglas Utzig, James Viscusi

Contributors: Larry Carpenter, Joseph Meeks, Ashish Ray (coauthors of MAA white papers) Contributor: Valarie Moore (graphic artist)

The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited.

The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. This document is not warranted to be error-free.

Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose.

If the Programs are delivered to the United States Government or anyone licensing or using the Programs on behalf of the United States Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data"

pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the Programs, including documentation and technical data, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement, and, to the extent applicable, the additional rights set forth in FAR 52.227-19, Commercial Computer Software--Restricted Rights (June 1987). Oracle USA, Inc., 500 Oracle Parkway, Redwood City, CA 94065.

The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and we disclaim liability for any damages caused by such use of the Programs.

Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

The Programs may provide links to Web sites and access to content, products, and services from third parties. Oracle is not responsible for the availability of, or any content provided on, third-party Web sites.

You bear all risks associated with the use of such content. If you choose to purchase any products or services from a third party, the relationship is directly between you and the third party. Oracle is not responsible for:

(a) the quality of third-party products or services; or (b) fulfilling any of the terms of the agreement with the third party, including delivery of products or services and warranty obligations related to purchased products or services. Oracle is not responsible for any loss or damage of any sort that you may incur from dealing with any third party.

(3)

Preface

... ix

Audience... ix

Documentation Accessibility ... ix

1 Introduction to High-Availability Best Practices

1.1 Oracle Database High-Availability Architecture ... 1-1 1.2 Oracle Database High-Availability Best Practices ... 1-1 1.3 Oracle Maximum Availability Architecture ... 1-2 1.4 Operational Best Practices ... 1-2

2 Configuring for High-Availability

2.1 Configuring Storage ... 2-1 2.1.1 Evaluate Database Performance Requirements and Storage Performance Capabilities...

2-2

2.1.2 Use Automatic Storage Management (ASM) to Manage Database Files ... 2-2 2.1.3 Use a Simple Disk and Disk Group Configuration ... 2-3 2.1.4 Use Disk Multipathing Software to Protect from Path Failure ... 2-5 2.1.5 Use Redundancy to Protect from Disk Failure... 2-5 2.1.6 Consider HARD-Compliant Storage ... 2-7 2.2 Configuring Oracle Database 10g ... 2-7 2.2.1 Requirements for High Availability... 2-8 2.2.2 Recommendations for High Availability and Fast Recoverability... 2-9 2.2.3 Recommendations to Improve Manageability ... 2-13 2.3 Configuring Oracle Database 10g with RAC ... 2-16 2.3.1 Connect to Database using Services and Virtual Internet Protocol (VIP) Address 2-16 2.3.2 Use Oracle Clusterware to Manage the Cluster and Application Availability ... 2-17 2.3.3 Use Client-Side and Server-Side Load Balancing ... 2-17 2.3.4 Mirror Oracle Cluster Registry (OCR) and Configure Multiple Voting Disks... 2-18 2.3.5 Regularly Back Up OCR to Tape or Offsite ... 2-18 2.3.6 Verify That CRS and RAC Use Same Interconnect Network... 2-19 2.3.7 Configure All Databases for Maximum Instances in the Cluster ... 2-19 2.4 Configuring Oracle Database 10g with Data Guard... 2-20 2.4.1 Physical or Logical Standby ... 2-21

(4)

2.4.4 General Configuration Best Practices for Data Guard ... 2-25 2.4.5 Redo Transport Services Best Practices ... 2-29 2.4.6 Log Apply Services Best Practices... 2-33 2.4.7 Role Transition Best Practices ... 2-37 2.4.8 Maintaining a Physical Standby Database as a Clone... 2-41 2.4.9 Recommendations on Protecting Data Outside of the Database... 2-43 2.4.10 Assessing Data Guard Performance ... 2-43 2.5 Configuring Backup and Recovery ... 2-45 2.5.1 Use Oracle Database Features and Products ... 2-46 2.5.2 Configuration and Administration ... 2-47 2.5.3 Backup to Disk ... 2-49 2.5.4 Backup to Tape... 2-52 2.5.5 Backup and Recovery Maintenance... 2-52 2.6 Configuring Fast Application Failover ... 2-53 2.6.1 Configuring Clients for Failover ... 2-54 2.6.2 Client Failover in a RAC Database... 2-54 2.6.3 Failover from a RAC Primary Database to a Standby Database ... 2-55

3 Monitoring Using Oracle Grid Control

3.1 Overview of Monitoring and Detection for High Availability ... 3-1 3.2 Using Oracle Grid Control for System Monitoring ... 3-1 3.2.1 Set Up Default Notification Rules for Each System... 3-3 3.2.2 Use Database Target Views to Monitor Health, Availability, and Performance... 3-6 3.2.3 Use Event Notifications to React to Metric Changes... 3-8 3.2.4 Use Events to Monitor Data Guard System Availability ... 3-8 3.3 Managing the High-Availability Environment with Oracle Grid Control... 3-9 3.3.1 Check Oracle Grid Control Policy Violations... 3-9 3.3.2 Use Oracle Grid Control to Manage Oracle Patches and Maintain System Baselines ...

3-9

3.3.3 Use Oracle Grid Control to Manage Data Guard Targets ... 3-10

4 Managing Outages

4.1 Outage Overview... 4-1 4.1.1 Unscheduled Outages ... 4-1 4.1.2 Scheduled Outages ... 4-5 4.2 Recovering from Unscheduled Outages... 4-9 4.2.1 Complete Site Failover ... 4-10 4.2.2 Database Failover with a Standby Database... 4-13 4.2.3 Database Switchover with a Standby Database ... 4-19 4.2.4 RAC Recovery for Unscheduled Outages... 4-23 4.2.5 Application Failover... 4-25 4.2.6 ASM Recovery After Disk and Storage Failures ... 4-25 4.2.7 Recovering from Data Corruption (Data Failures) ... 4-34 4.2.8 Recovering from Human Error... 4-37 4.3 Restoring Fault Tolerance ... 4-44

(5)

4.3.3 Restoring ASM Disk Groups after a Failure ... 4-52 4.3.4 Restoring Fault Tolerance After Planned Downtime on Secondary Site or Clusterwide

Outage 4-53

4.3.5 Restoring Fault Tolerance After a Standby Database Data Failure... 4-54 4.3.6 Restoring Fault Tolerance After the Production Database Was Opened Resetlogs 4-55 4.3.7 Restoring Fault Tolerance After Dual Failures... 4-57 4.4 Eliminating or Reducing Downtime for Scheduled Outages... 4-57 4.4.1 Storage Maintenance ... 4-57 4.4.2 RAC Database Patches ... 4-58 4.4.3 Database Upgrades... 4-61 4.4.4 Database Platform or Location Migration... 4-63 4.4.5 Online Database and Application Upgrades... 4-66 4.4.6 Database Object Reorganization... 4-68 4.4.7 System Maintenance... 4-70

5 Migrating to an MAA Environment

5.1 Overview of Migrating to MAA ... 5-1 5.2 Migrating to RAC from a Single Instance ... 5-2 5.3 Adding a Data Guard Configuration to a RAC Primary ... 5-2

A Database SPFILE and Oracle Net Configuration File Samples

A.1 SPFILE Samples... A-2 A.2 Oracle Net Configuration Files ... A-6 A.2.1 SQLNET.ORA Example for All Hosts Using Dynamic Instance Registration ... A-6 A.2.2 LISTENER.ORA Example for All Hosts Using Dynamic Instance Registration ... A-7 A.2.3 TNSNAMES.ORA Example for All Hosts Using Dynamic Instance Registration.... A-7

Glossary

Index

(6)

2–1 Allocating Entire Disks ... 2-4 2–2 Partitioning Each Disk... 2-4 2–3 LGWR ASYNC Archival with Network Server (LNSn) Processes... 2-31 3–1 Oracle Grid Control Home Page... 3-2 3–2 Setting Notification Rules for Availability ... 3-4 3–3 Setting Notification Rules for Metrics... 3-6 3–4 Overview of System Performance ... 3-7 4–1 Network Routes Before Site Failover ... 4-11 4–2 Network Routes After Site Failover ... 4-12 4–3 Data Guard Overview Page Showing ORA-16625 Error ... 4-16 4–4 Failover Confirmation Page ... 4-16 4–5 Failover Progress Page ... 4-17 4–6 Data Guard Overview Page After a Failover Completes... 4-18 4–7 Switchover Operation Confirmation... 4-21 4–8 Processing Page During Switchover ... 4-21 4–9 New Primary Database After Switchover ... 4-22 4–10 Enterprise Manager Reports Disk Failures ... 4-28 4–11 Enterprise Manager Reports ASM Disk Groups Status ... 4-29 4–12 Enterprise Manager Reports Pending REBAL Operation... 4-29 4–13 Partitioned Two-Node RAC Database... 4-48 4–14 RAC Instance Failover in a Partitioned Database ... 4-49 4–15 Nonpartitioned RAC Instances... 4-50 4–16 Fast-Start Failover and the Observer Are Successfully Enabled... 4-52 4–17 Reinstating the Former Primary Database After a Fast-Start Failover... 4-52 4–18 Online Database Upgrade with Oracle Streams... 4-67 4–19 Database Object Reorganization Using Oracle Enterprise Manager... 4-69

(7)

2–1 Determining the Appropriate Protection Mode ... 2-24 2–2 Archiving Recommendations... 2-27 2–3 Minimum Recommended Settings for FastStartFailoverThreshold... 2-40 2–4 Comparison of Backup Options... 2-50 2–5 Typical Wait Times for Client Failover... 2-53 3–1 Recommendations for Monitoring Space ... 3-5 3–2 Recommendations for Monitoring the Alert Log... 3-5 3–3 Recommendations for Monitoring Processing Capacity ... 3-5 3–4 Recommended Notification Rules for Metrics ... 3-8 3–5 Recommendations for Setting Data Guard Events ... 3-9 4–1 Unscheduled Outages ... 4-2 4–2 Recovery Times and Steps for Unscheduled Outages on the Primary Site... 4-3 4–3 Recovery Steps for Unscheduled Outages on the Secondary Site ... 4-5 4–4 Scheduled Outages ... 4-6 4–5 Recovery Steps for Scheduled Outages on the Primary Site ... 4-7 4–6 Managing Scheduled Outages on the Secondary Site ... 4-9 4–7 Types of ASM Failures and Recommended Repair ... 4-26 4–8 Recovery Options for Data Area Disk Group Failure ... 4-30 4–9 Recovery Options for Flash Recovery Area Disk Group Failure... 4-32 4–10 Non Database Object Corruption and Recommended Repair ... 4-35 4–11 Flashback Solutions for Different Outages... 4-38 4–12 Summary of Flashback Features... 4-38 4–13 Additional Processing When Restarting or Rejoining a Node or Instance ... 4-45 4–14 Restoration and Connection Failback ... 4-47 4–15 SQL Statements for Starting Physical and Logical Standby Databases ... 4-53 4–16 SQL Statements to Start Redo Apply and SQL Apply ... 4-53 4–17 Queries to Determine RESETLOGS SCN and Current SCN OPEN RESETLOGS ... 4-55 4–18 SCN on Standby Database is Behind Resetlogs SCN on the Production Database ... 4-55 4–19 SCN on the Standby is Ahead of Resetlogs SCN on the Production Database ... 4-56 4–20 Re-Creating the Production and Standby Databases... 4-57 4–21 Platform Migration and Database Upgrade Options ... 4-61 4–22 Platform and Location Migration Options... 4-64 4–23 Some Object Reorganization Capabilities ... 4-68 5–1 Starting configurations Before Migrating to an MAA Environment ... 5-1 A–1 Generic SPFILE Parameters for Primary, Physical Standby, and Logical Standby Databases

A-2

A–2 RAC SPFILE Parameters for Primary, Physical Standby, and Logical Standby Databases....

A-3

A–3 Data Guard SPFILE Parameters for Primary, Physical Standby, and Logical Standby Databases A-3

A–4 Data Guard Broker SPFILE Parameters for Primary, Physical Standby, and Logical Standby Databases A-4

A–5 Data Guard (No Broker) SPFILE Parameters for Primary, Physical Standby, and Logical Standby Databases A-4

A–6 Data Guard SPFILE Parameters for Primary and Physical Standby Database Only... A-4 A–7 Data Guard SPFILE Parameters for Primary and Logical Standby Database Only... A-5 A–8 Data Guard SPFILE Parameters for Primary Database, Physical Standby Database, and

Logical Standby Database: Maximum Availability or Maximum Protection Modes A-5 A–9 Data Guard SPFILE Parameters for Primary Database, Physical Standby Database, and

Logical Standby Database: Maximum Performance Mode A-6

(8)

(9)

This book describes best practices for configuring and maintaining your Oracle database system and network components for high availability.

Audience

This book is intended for chief technology officers, information technology architects, database administrators, system administrators, network administrators, and

application administrators who perform the following tasks:

■ Plan data centers

■ Implement data center policies

■ Maintain high availability systems

■ Plan and build high availability solutions

Documentation Accessibility

Our goal is to make Oracle products, services, and supporting documentation accessible, with good usability, to the disabled community. To that end, our

documentation includes features that make information available to users of assistive technology. This documentation is available in HTML format, and contains markup to facilitate access by the disabled community. Accessibility standards will continue to evolve over time, and Oracle is actively engaged with other market-leading

technology vendors to address technical obstacles so that our documentation can be accessible to all of our customers. For more information, visit the Oracle Accessibility Program Web site at

http://www.oracle.com/accessibility/

Accessibility of Code Examples in Documentation

Screen readers may not always correctly read the code examples in this document. The conventions for writing code require that closing braces should appear on an

otherwise empty line; however, some screen readers may not always read a line of text that consists solely of a bracket or brace.

Accessibility of Links to External Web Sites in Documentation

This documentation may contain links to Web sites of other companies or

organizations that Oracle does not own or control. Oracle neither evaluates nor makes any representations regarding the accessibility of these Web sites.

(10)

within the United States of America 24 hours a day, seven days a week. For TTY support, call 800.446.2398.

Conventions

The following text conventions are used in this document:

Convention Meaning

boldface Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.

italic Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.

monospace Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.

(11)

1

Introduction to High-Availability Best Practices

This chapter describes how using Oracle high-availability best practices can increase availability to the Oracle database as well as the entire technology stack. This chapter contains the following topics:

■ Oracle Database High-Availability Architecture

■ Oracle Database High-Availability Best Practices

■ Oracle Maximum Availability Architecture

■ Operational Best Practices

1.1 Oracle Database High-Availability Architecture

Choosing and implementing the architecture that best fits the availability requirements of a business can be a daunting task. This architecture must encompass appropriate redundancy, provide adequate protection from all types of outages, ensure consistent high performance and robust security, while being easy to deploy, manage, and scale.

Needless to mention, this architecture should be driven by well-understood business requirements. Choosing and implementing a high-availability architecture is covered in Oracle Database High Availability Overview.

Before using the best practices presented in this book, your organization should have already chosen a high-availability architecture for your database as described in Oracle Database High Availability Overview. If you have not already done so, then refer to that document to learn about the high-availability solutions that Oracle offers for Oracle Database before proceeding with this book.

1.2 Oracle Database High-Availability Best Practices

To build, implement and maintain a high-availability architecture, a business needs high-availability best practices that involve both technical and operational aspects of its IT systems and business processes. Such a set of best practices removes the complexity of designing a high-availability architecture, maximizes availability while using minimum system resources, reduces the implementation and maintenance costs of the high-availability systems in place, and makes it easy to duplicate the

high-availability architecture in other areas of the business. An enterprise with a well-articulated set of high-availability best practices that encompass high-availability analysis frameworks, business drivers and system capabilities, will enjoy an improved operational resilience and enhanced business agility.

(12)

Building, implementing, and maintaining a high-availability architecture for Oracle Database using high-availability best practices is the purpose of this book. By using the Oracle Database high-availability best practices described in this book, you will be able to:

■ Reduce the implementation cost of an Oracle Database high-availability system by following detailed guidelines on configuring your database, storage, application failover, backup and recovery as described in Chapter 2, "Configuring for High-Availability"

■ Avoid potential downtime by monitoring and maintaining your database using Oracle Grid Control as described in Chapter 3, "Monitoring Using Oracle Grid Control"

■ Recover quickly from unscheduled outages caused by computer failure, storage failure, human error, or data corruption as described in Chapter 4, "Managing Outages"

■ Eliminate or reduce downtime that might occur due to scheduled maintenance such as database patches or application upgrades as described in Chapter 4,

"Managing Outages"

1.3 Oracle Maximum Availability Architecture

Oracle Maximum Availability Architecture (MAA) is an Oracle best practices blueprint based on proven Oracle high-availability technologies and recommendations. The high-availability best practices described in this book make up one of several components of MAA. MAA involves high-availability best practices for all Oracle products across the entire technology stack—Oracle Database, Oracle Application Server, Oracle Applications, Oracle Collaboration Suite, and Oracle Grid Control.

Some of the key features of MAA include:

■ Considers various business service level agreements (SLA) to make high-availability best practices as widely applicable as possible

■ Leverages database grid servers and storage grid with low-cost storage to provide highly resilient, lower cost infrastructure

■ Uses results from extensive performance impact studies for different configurations to ensure that the high-availability architecture is optimally configured to perform and scale to business needs

■ Gives the ability to control the length of time to recover from an outage and the amount of acceptable data loss from a natural disaster

■ Evolves with each Oracle version and is completely independent of hardware and operating system

For more information on MAA and documentation on best practices for all components of MAA, visit the MAA web site at:

http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm

1.4 Operational Best Practices

One of the best ways to reduce downtime is incorporating operational best practices.

You can often prevent problems and downtime before they occur by rigorously testing changes in your test environment, following stringent change control policies to guard your primary database from harm, and having a well-validated repair strategy for each outage type.

(13)

A monitoring infrastructure such as Grid Control is essential to quickly detect problems. Having an outage and repair decision tree as well as an automated or automatic repair facility reduces downtime by eliminating or reducing decision and repair times.

The following is a list of key operational practices:

■ Document and communicate service level agreements (SLA)

■ Create test environments

A good test environment accurately mimics the production system to test changes and prevent problems before they can affect your business.

■ Establish change control and security procedures

Change control and security procedures maintain the stability of the system and ensure that no changes are incorporated in the primary database unless they have been rigorously evaluated on your test systems.

■ Set up and follow security best practices

The biggest threat to corporate data comes from employees and contractors with internal access to networks and facilities. Corporate data can be at grave risk if placed on a system or database that does not have proper security measures in place. A well-defined security policy can help protect your systems from unwanted access and protect sensitive corporate information from sabotage.

Proper data protection reduces the chance of outages due to security breaches.

■ Leverage Grid Control or another monitoring infrastructure to detect and react to potential failures and problems before they occur

– Monitor system, network, and database statistics – Monitor performance statistics

– Create performance thresholds as early warning indicators that a system or application has a problem or is underperforming

■ Leverage MAA recommended repair strategies and create an outage and repair decision tree for crisis scenarios using the recommended MAA matrix

■ Automate and optimize repair practices to minimize downtime by following MAA best practices

See Also:

■ Oracle Database Concepts for an overview of database security

■ Oracle Database Security Guide for security checklists and recommendations

See Also: Chapter 4, "Managing Outages" for more information on repair strategies and practices

(14)

(15)

2

Configuring for High-Availability

This chapter describes Oracle configuration best practices for Oracle Database and related components.

This chapter contains these topics:

■ Configuring Storage

■ Configuring Oracle Database 10g

■ Configuring Oracle Database 10g with RAC

■ Configuring Oracle Database 10g with Data Guard

■ Configuring Backup and Recovery

■ Configuring Fast Application Failover

2.1 Configuring Storage

This section describes best practices for configuring a fault-tolerant storage subsystem that protects data while providing manageability and performance. These practices apply to all Oracle Database high-availability architectures described in Oracle Database High Availability Overview.

This section contains these topics:

■ Evaluate Database Performance Requirements and Storage Performance Capabilities

■ Use Automatic Storage Management (ASM) to Manage Database Files

■ Use a Simple Disk and Disk Group Configuration

■ Use Disk Multipathing Software to Protect from Path Failure

■ Use Redundancy to Protect from Disk Failure

■ Consider HARD-Compliant Storage

See Also: Appendix A, "Database SPFILE and Oracle Net Configuration File Samples" for complete examples of database parameter settings

(16)

2.1.1 Evaluate Database Performance Requirements and Storage Performance Capabilities

Characterize your database performance requirements using different application workloads. Extract statistics during your target workloads by getting the beginning and end statistical snapshots. Example target workloads include:

■ Average load

■ Peak load

■ Batch processing

The necessary statistics can be derived from Automatic Workload Repository (AWR) reports or gathered from the GV$SYSSTAT view. Along with understanding the database performance requirements, the performance capabilities of a storage array must be evaluated.

Low-cost storage arrays, low-cost storage networks, and Oracle Database 10g can in combination create a low-cost storage grid with excellent performance and availability.

Low-cost storage is most successfully deployed for databases with certain types of performance and availability requirements. Compared to traditional high-end storage arrays, low-cost storage arrays have excellent data throughput and superior price for each gigabyte. However, low-cost storage arrays do not have better I/O rates for OLTP type applications than traditional storage, although the cost for each I/O per second is comparable. The Oracle Resilient Low-Cost Storage Initiative is designed to help customers reduce IT spending and promote use of low-cost storage arrays in both departmental and enterprise environments.

The Oracle Database flash recovery area is an ideal candidate for low-cost storage.

Because the flash recovery area contains recovery related files that are typically accessed with sequential 1MB streams, the performance characteristics of low-cost storage are well suited for the flash recovery area. The flash recovery area can be configured to use low-cost storage while the database area remains on traditional storage.

2.1.2 Use Automatic Storage Management (ASM) to Manage Database Files

ASM is a vertical integration of both the file system and the volume manager built specifically for Oracle database files. It extends the concept of stripe and mirror everything (SAME) to optimize performance, while removing the need for manual I/O tuning (distributing the datafile layout to avoid hot spots). ASM helps manage a dynamic database environment by letting you grow the database size without shutting down the database to adjust the storage allocation. ASM also enables low-cost

modular storage to deliver higher performance and greater availability by supporting mirroring as well as striping.

ASM should be used to manage all database files. However, ASM can be phased into your environment initially supporting only the flash recovery area. This approach is

See Also:

■ Best Practices for Creating a Low-Cost Storage Grid for Oracle Databases at

http://www.oracle.com/technology/deploy/availabil ity/htdocs/maa.htm

■ Oracle Resilient Low-Cost Storage Initiative Web site at

http://www.oracle.com/technology/deploy/availabil ity/htdocs/lowcoststorage.html

(17)

particularly well suited for introducing low-cost storage into an existing environment where traditional storage configurations currently exist.

To improve manageability, ASMLib should be used on platforms where it is available.

ASMLib is a support library for ASM. ASMLib eliminates the impact when the mappings of disk device names change upon system reboot. Although ASMLib is not required to run ASM, it simplifies the management of disk device names, makes the discovery process simpler, and removes the challenge of having disks added to one node and not be known to other nodes in the cluster.

2.1.3 Use a Simple Disk and Disk Group Configuration

When using ASM for database storage, you should create two disk groups: one disk group for the database area and another disk group for the flash recovery area:

■ The database area contains active database files, such as datafiles, control files, online redo log files, Data Guard Broker metadata files, and change tracking files used for RMAN incremental backups. For example:

CREATE DISKGROUP DATA DISK

'/devices/lun01','/devices/lun02','/devices/lun03','/devices/lun04';

■ The flash recovery area contains recovery-related files, such as a copy of the current control file, a member of each online redo log file group, archived redo log files, RMAN backup sets, and flashback log files. For example:

CREATE DISKGROUP RECO DISK

'/devices/lun05','/devices/lun06','/devices/lun07','/devices/lun08', '/devices/lun09','/devices/lun10','/devices/lun11','/devices/lun12';

To simplify file management, use Oracle managed files to control file naming. Enable Oracle managed files by setting initialization parameters DB_CREATE_FILE_DEST, DB_RECOVERY_FILE_DEST, and DB_RECOVERY_FILE_DEST_SIZE:

See Also:

■ Chapter 16 "Migrating Databases To and From ASM with Recovery Manager" in the Oracle Database Backup and Recovery Advanced User's Guide

■ Oracle Database 10g Release 2 Automatic Storage Management Overview and Technical Best Practices at

http://www.oracle.com/technology/products/databas e/asm/pdf/asm_10gr2_bptwp_sept05.pdf

■ Oracle ASMLib Web site at

http://www.oracle.com/technology/tech/linux/asmli b/index.html

■ Oracle Database Administrator's Guide for more information on configuring Automatic Storage Management

Note: Using a flash recovery area by setting DB_RECOVERY_FILE_

DEST requires that you also set DB_RECOVERY_FILE_DEST_SIZE to bound the amount of disk space used by the flash recovery area. DB_

RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE are dynamic parameters that allow you to change the destination and size of the flash recovery area.

(18)

DB_CREATE_FILE_DEST=+DATA DB_RECOVERY_FILE_DEST=+RECO DB_RECOVERY_FILE_DEST_SIZE=500G

You have two options when partitioning disks for ASM use:

■ Allocate entire disks to the database area and flash recovery area disk groups

■ Partition each disk into two partitions, one for the database area and another for the flash recovery area

Figure 2–1 Allocating Entire Disks

Figure 2–1 illustrates allocating entire disks. The advantages of this option are:

■ It is easier to manage the disk partitions at the operating system level, because each disk is partitioned as just one large partition.

■ ASM rebalance operations following a disk failure complete more quickly, because there is only one disk group to rebalance.

The disadvantage of this option is:

■ Less I/O bandwidth, because each disk group is spread over only a subset of the available disks.

Figure 2–2 Partitioning Each Disk

The second option is illustrated in Figure 2–2. It requires partitioning each disk into two partitions: a smaller partition on the faster outer portion of each drive for the database area, and a larger partition on the slower inner portion of each drive for the flash recovery area. The ratio for the size of the inner and outer partitions depends on the estimated size of the database area and the flash recovery area.

Flash Recovery Area Database Area

Database Area

Flash Recovery

Area

(19)

The advantage of this approach is:

■ Higher I/O bandwidth available, because both disk groups are spread over all available spindles. This advantage is considerable for the database area disk group for I/O intensive applications.

The disadvantages are:

■ A double disk failure may result in the loss of both disk groups, requiring the use of a standby database or tape backups for recovery.

■ An ASM rebalance operation following a disk failure is longer, because both disk groups are affected.

■ Higher initial administrative efforts are required to partition each disk properly.

2.1.4 Use Disk Multipathing Software to Protect from Path Failure

Disk multipathing software aggregates multiple independent I/O paths into a single logical path. The path abstraction provides I/O load balancing across host bus adapters (HBA) and nondisruptive failovers when there is a failure in the I/O path.

Disk multipathing software should be used in conjunction with ASM.

When specifying disk names during disk group creation in ASM, the logical device representing the single logical path should be used. For example, when using Device Mapper on Linux 2.6, a logical device path of /dev/dm-0 may be the aggregation of physical disks /dev/sdc and /dev/sdh. Within ASM, the asm_diskstring parameter should contain /dev/dm-* to discover the logical device /dev/dm-0, and that logical device should be used during disk group creation:

asm_diskstring='/dev/dm-*' CREATE DISKGROUP DATA DISK

'/dev/dm-0','/dev/dm-1','/dev/dm-2','/dev/dm-3';

2.1.5 Use Redundancy to Protect from Disk Failure

When setting up redundancy to protect from hardware failures, there are two options to consider:

■ Storage array based RAID

■ ASM redundancy

Oracle recommends that you configure redundancy in the storage array by enabling RAID protection, such as RAID1 (mirroring) or RAID5 (striping plus parity). For example, to create an ASM disk group where redundancy is provided by the storage array, first create the RAID-protected logical unit numbers (LUNs) in the storage array, and then create the ASM disk group using the EXTERNAL REDUNDANCY clause:

CREATE DISKGROUP DATA EXTERNAL REDUNDANCY DISK

'/devices/lun1','/devices/lun2','/devices/lun3','/devices/lun4';

If the storage array does not offer the desired level of redundancy, or if there is a need to configure redundancy across multiple storage arrays, then use ASM redundancy.

See Also:

■ Oracle Database Backup and Recovery Basics for information on setting up and sizing the flash recovery area

■ Oracle Database Administrator's Guide for information on automatic storage management

(20)

ASM provides redundancy with the use of failure groups, which are defined during disk group creation. ASM redundancy can be either Normal redundancy, where files are two-way mirrored, or high redundancy, where files are three-way mirrored. Once a disk group is created, the redundancy level cannot be changed.

Failure group definition is specific to each storage setup, but these guidelines should be followed:

■ If every disk is available through every I/O path, as would be the case if using disk multipathing software, then leave each disk in its own failure group. This is the default ASM behavior if creating a disk group without explicitly defining failure groups.

CREATE DISKGROUP DATA NORMAL REDUNDANCY DISK

'/devices/diska1','/devices/diska2','/devices/diska3','/devices/diska4', '/devices/diskb1','/devices/diskb2','/devices/diskb3','/devices/diskb4';

■ If every disk is not available through every I/O path, then define failure groups to protect against the piece of hardware that you are concerned about failing. Here are three examples:

– For an array with two controllers where each controller sees only half of the drives, create a disk group with two failure groups, one for each controller, to protect against controller failure:

CREATE DISKGROUP DATA NORMAL REDUNDANCY FAILGROUP controller1 DISK

'/devices/diska1','/devices/diska2','/devices/diska3','/devices/diska4' FAILGROUP controller2 DISK

'/devices/diskb1','/devices/diskb2','/devices/diskb3','/devices/diskb4';

– For an array with two controllers where every disk is seen through both controllers, create a disk group with each disk in its own failure group:

CREATE DISKGROUP DATA NORMAL REDUNDANCY DISK

'/devices/diska1','/devices/diska2','/devices/diska3','/devices/diska4', '/devices/diskb1','/devices/diskb2','/devices/diskb3','/devices/diskb4';

– For a storage network with multiple storage arrays, you want to mirror across storage arrays, then create a disk group with two failure groups, one for each array, to protect against array failure:

CREATE DISKGROUP DATA NORMAL REDUNDANCY FAILGROUP array1 DISK

'/devices/diska1','/devices/diska2','/devices/diska3','/devices/diska4' FAILGROUP array2 DISK

'/devices/diskb1','/devices/diskb2','/devices/diskb3','/devices/diskb4';

When determining the proper size of a disk group that is protected with ASM redundancy, enough free space must exist in the disk group so that when a disk fails ASM can automatically reconstruct the contents of the failed drive to other drives in the disk group while the database remains online. The amount of space required to ensure ASM can restore redundancy following disk failure is in the column

REQUIRED_MIRROR_FREE_MB in the V$ASM_DISKGROUP view. The amount of free space that can be safely used in a disk group, taking mirroring into account, and yet be able to restore redundancy after a disk failure is in the column USABLE_FILE_MB in the V$ASM_DISKGROUP view. USABLE_FILE_MB should always be greater than zero.

If USABLE_FILE_MB falls below zero, then more disks should be added to the disk group.

(21)

2.1.6 Consider HARD-Compliant Storage

Consider HARD-compliant storage for the greatest protection against data corruption.

Data corruption is very rare, but it can have a catastrophic effect on a business when it occurs.

The goal of the Hardware Assisted Resilient Data (HARD) initiative is to eliminate a class of failures that the computer industry has so far been powerless to prevent. RAID has gained a wide following in the storage industry by ensuring the physical

protection of data. HARD takes data protection to the next level by going beyond protecting physical data to protecting business data.

The HARD initiative is designed to prevent data corruption before it happens. Under the HARD initiative, Oracle partners with storage vendors to implement Oracle data validation and checking algorithms inside storage devices. This makes it possible to prevent corrupted data from being written to permanent storage.

The classes of data corruption that Oracle addresses with HARD include:

■ Writes that physically and logically corrupt Oracle blocks

■ Writes of database blocks to incorrect locations

■ Writes of partial or incomplete blocks

■ Writes by other applications to Oracle data blocks

End-to-end block validation is the technology employed by the operating system or storage subsystem to validate the Oracle Database data block contents. By validating Oracle Database data in the storage devices, data corruption is detected and

eliminated before it can be written to permanent storage. This goes beyond the current Oracle Database block validation features that do not detect a stray, lost, or corrupted write until the next physical read.

Storage vendors who are partners with Oracle are given the opportunity to implement validation checks based on a specification. A particular vendor's implementation may offer features specific to its storage technology. Oracle maintains a Web site that shows a comparison of each vendor's solution by product and Oracle version.

2.2 Configuring Oracle Database 10g

The best practices discussed in this section apply to Oracle Database 10g database architectures in general, including all architectures described in Oracle Database High Availability Overview:

■ Oracle Database 10g

■ Oracle Database 10g with RAC

Note: When using ASM to manage database storage, ASM should always be configured as external redundancy. Additionally, HARD protections should be disabled when doing any rebalance operations, such as adding a new disk, to avoid the risk of HARD inadvertently flagging the movement of data as a bad write.

See Also:

http://www.oracle.com/technology/deploy/availability /htdocs/HARD.html for the most recent information on the HARD initiative

(22)

■ Oracle Database 10g with Data Guard

■ Oracle Database 10g with RAC and Data Guard (MAA)

These recommendations are identical for both the primary and standby databases when Oracle Data Guard is used. It is necessary to adopt these practices to reduce or avoid outages, reduce risk of corruption, and improve recovery performance.

This section contains the following types of best practices for configuring the database in general:

■ Requirements for High Availability – Enable ARCHIVELOG Mode – Enable Block Checksums

■ Recommendations for High Availability and Fast Recoverability – Configure the Size of Redo Log Files and Groups Appropriately – Use a Flash Recovery Area

– Enable Flashback Database

– Use Fast-Start Fault Recovery to Control Instance Recovery Time – Enable Database Block Checking

– Set DISK_ASYNCH_IO

■ Recommendations to Improve Manageability – Use Automatic Performance Tuning Features – Use a Server Parameter File

– Use Automatic Undo Management – Use Locally Managed Tablespaces

– Use Automatic Segment Space Management

– Use Temporary Tablespaces and Specify a Default Temporary Tablespace – Use Resumable Space Allocation

– Use Database Resource Manager

2.2.1 Requirements for High Availability

This section describes the following minimum requirements for configuring Oracle Database for high availability:

■ Enable ARCHIVELOG Mode

■ Enable Block Checksums

2.2.1.1 Enable ARCHIVELOG Mode

ARCHIVELOG mode enables online database backup and is necessary to recover the database to a point in time later than what has already been restored. Architectures such as Oracle Data Guard and Flashback Database require that the production database run in ARCHIVELOG mode.

See Also: Oracle Database Administrator's Guide for more information about using automatic archiving

(23)

2.2.1.2 Enable Block Checksums

By default, Oracle always validates the data blocks that it reads from disk. Enabling data and log block checksums by setting DB_BLOCK_CHECKSUM to TYPICAL enables Oracle to detect other types of corruption caused by underlying disks, storage systems, or I/O systems. Before a data block is written to disk, a checksum is computed and stored in the block. When the block is subsequently read from disk, the checksum is computed again and compared with the stored checksum. Any difference is treated as a media error, and an ORA-1578 error is signaled. Block checksums are always maintained for the SYSTEM tablespace. If DB_BLOCK_CHECKSUM is set to FULL, then in-memory corruption is also detected before being written to disk.

In addition to enabling data block checksums, Oracle also calculates a checksum for every redo log block before writing it to the current log. Redo record corruption is found as soon as the log is archived. Without this option, corruption in a redo log can go unnoticed until the log is applied to a standby database, or until a backup is restored and rolled forward through the log containing the corrupt log block.

RMAN also calculates checksums when taking backups to ensure that all blocks being backed up are validated.

Ordinarily the overhead for TYPICAL is one to two percent and for FULL is four to five percent. The default setting, TYPICAL, provides critical detection of corruption at very low cost and remains a requirement for high availability. Measure the performance impact with your workload on a test system and ensure that the performance impact is acceptable before moving from TYPICAL to FULL on an active database.

2.2.2 Recommendations for High Availability and Fast Recoverability

This section describes Oracle Database best practices for reducing recovery time or increasing its availability and redundancy:

■ Configure the Size of Redo Log Files and Groups Appropriately

■ Use a Flash Recovery Area

■ Enable Flashback Database

■ Use Fast-Start Fault Recovery to Control Instance Recovery Time

■ Enable Database Block Checking

■ Set DISK_ASYNCH_IO

■ Set LOG_BUFFER to At Least 8 MB

■ Use Automatic Shared Memory Management

■ Increase PARALLEL_EXECUTION_MESSAGE_SIZE

■ Tune PARALLEL_MIN_SERVERS

2.2.2.1 Configure the Size of Redo Log Files and Groups Appropriately

Use Oracle log multiplexing to create multiple redo log members in each redo group, one in the data area and one in the flash recovery area. This protects against a failure involving the redo log, such as a disk or I/O failure for one of the members, or a user error that accidentally removes a member through an operating system command. If at least one redo log member is available, then the instance can continue to function.

All online redo log files should be the same size and configured to switch

approximately once an hour during normal activity. They should not switch more frequently than every 20 minutes during peak activity.

(24)

There should be a minimum of four online log groups to prevent LGWR from waiting for a group to be available following a log switch. A group might be unavailable because a checkpoint has not yet completed or because the group has not yet been archived.

2.2.2.2 Use a Flash Recovery Area

The flash recovery area is Oracle managed disk space that provides a centralized disk location for backup and recovery files. The flash recovery area is defined by setting the following database initialization parameters:

■ DB_RECOVERY_FILE_DEST

This parameter specifies the default location for the flash recovery area.

■ DB_RECOVERY_FILE_DEST_SIZE

This parameter specifies (in bytes) the hard limit on the total space to be used by target database recovery files created in the recovery area location.

The flash recovery area should be the primary location for recovery. When the flash recovery area is properly sized, files needed for repair will be readily available. The minimum recommended disk limit is the combined size of the database, incremental backups, all archived redo logs that have not been copied to tape, and flashback logs.

2.2.2.3 Enable Flashback Database

Flashback Database enables you to rewind the database to a previous point in time without restoring backup copies of the datafiles. Flashback Database is a revolutionary recovery feature that operates on only the changed data. Flashback Database makes the time to correct an error proportional to the time to cause and detect the error, without recovery time being a function of the database size. You can flash back a database from both RMAN and SQL*Plus with a single command instead of using a complex procedure.

During normal runtime, Flashback Database buffers and writes before images of data blocks into the flashback logs, which reside in the flash recovery area. Ensure there is sufficient I/O bandwidth available to the flash recovery area to maintain flashback write throughput. If flashback writes are slow, as evidenced by the flashback free buffer waits wait event, then database throughput is affected. The amount of disk writes caused by Flashback Database differs depending on the workload and

application profile. For a typical OLTP workload that is using a flash recovery area with sufficient disk spindles and I/O throughput, the overhead incurred by Flashback Database is less than two percent.

Flashback Database can flash back a primary or standby database to a point in time prior to a role transition. In addition, a Flashback Database can be performed to a

See Also:

■ Oracle Database Administrator's Guide for more information about managing redo logs

■ Oracle Data Guard Concepts and Administration for more information about online, archived, and standby redo log files

■ Section 2.4, "Configuring Oracle Database 10g with Data Guard" on page 2-20

See Also: Oracle Database Backup and Recovery Basics for detailed information about sizing the flash recovery area and setting the retention period

(25)

point in time prior to a resetlogs operation, which allows administrators more flexibility to detect and correct human errors. Flashback Database is required when using fast-start failover so that Data Guard Broker can automatically reinstate the primary database following an automatic failover.

If you have a standby database, then set DB_FLASHBACK_RETENTION_TARGET to the same value for both primary and standby databases.

2.2.2.4 Use Fast-Start Fault Recovery to Control Instance Recovery Time

The fast-start fault recovery feature reduces the time required to recover from a crash.

It also makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint. With this feature, the FAST_START_MTTR_TARGET

initialization parameter simplifies the configuration of recovery time from instance or system failure. This parameter specifies a target for the expected recover time objective (RTO), which is the time (in seconds) that it should take to start up the instance and perform cache recovery. Once this parameter is set, the database manages incremental checkpoint writes in an attempt to meet that target. If you have chosen a practical value for this parameter, then you can expect your database to recover, on average, in approximately the number of seconds you have chosen.

2.2.2.5 Enable Database Block Checking

Enable database block checking by setting DB_BLOCK_CHECKING to LOW, MEDIUM, or FULL. The block checking performed for each value is as follows:

■ LOW

Block checking is performed after any in-memory block change.

■ MEDIUM

All in-memory block change checking is performed as well as semantic block checking for all non index organized-table blocks.

■ FULL

Block checking is performed for all in-memory block changes as well as semantic block checking for all non index organized-table blocks and index blocks.

When one of these three forms of block checking is enabled, Oracle Database verifies that the block is self-consistent whenever the corresponding types of block change occur. If the block is inconsistent, then it is marked corrupt, an ORA-1578 error is returned, and a trace file is created containing details of the problem. Without block checking enabled, corruption can go undetected until the block is accessed again.

Block checking for the SYSTEM tablespace is always enabled, no matter what setting is chosen for DB_BLOCK_CHECKING.

Block checking can often prevent memory and data corruption. Turning on this feature typically causes an additional one percent to ten percent overhead, depending on the setting and the workload. Measure the performance impact on a test system using your workload and ensure that it is acceptable before introducing this feature on an active database.

See Also: Oracle Database Backup and Recovery Basics for more information on restore points and Flashback Database

See Also: Oracle Database Backup and Recovery Advanced User's Guide for more information on fast-start fault recovery

(26)

To check for block corruption on a disk that was not preventable by utilizing DB_

BLOCK_CHECKING use one of the following:

■ RMAN BACKUP command with the VALIDATE option

■ DBVERIFY utility

■ ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE SQL statement

2.2.2.6 Set DISK_ASYNCH_IO

Set DISK_ASYNCH_IO=TRUE to enable asynchronous disk I/O for optimal I/O performance.

2.2.2.7 Set LOG_BUFFER to At Least 8 MB

For large production databases, set the LOG_BUFFER initialization parameter to a minimum of 8 MB. This setting ensures the database allocates maximum memory (typically 16 MB) for writing Flashback Database logs.

2.2.2.8 Use Automatic Shared Memory Management

Memory management has improved significantly with the advent of Automatic Shared Memory Management (ASM). By setting the SGA_TARGET parameter to a nonzero value, the shared pool, large pool, Java pool, Streams pool, and buffer cache can automatically and dynamically resize, as needed. See the Oracle Database

Administrator's Guide for more information.

2.2.2.9 Increase PARALLEL_EXECUTION_MESSAGE_SIZE

Increase initialization parameter PARALLEL_EXECUTION_MESSAGE_SIZE from default value of 2048 to 4096. This configuration step accelerates parallel executions, including instance recovery.

2.2.2.10 Tune PARALLEL_MIN_SERVERS

Set PARALLEL_MIN_SERVERS so that the required number of parallel recovery processes are pre-spawned for fast recovery from an instance or node crash. This works with FAST_START_MTTR_TARGET to bound recovery time.

PARALLEL_MIN_SERVERS = CPU_COUNT + average number of parallel query processes in use for GV$ queries and parallel execution

2.2.2.11 Disable Parallel Recovery

When the value of RECOVERY_ESTIMATED_IOS in the V$INSTANCE_RECOVERY view is small (for example, < 5000), then the overhead of parallel recovery may outweigh any benefit. This will typically occur with a very aggressive setting of FAST_START_

MTTR_TARGET. In this case, set RECOVERY_PARALLELISM to 1 to disable parallel recovery.

See Also:

■ Oracle Database Backup and Recovery Reference for more information about the RMAN BACKUP VALIDATE command

■ Oracle Database SQL Reference for more information about the SQL ANALYZE TABLE statement

■ Oracle Database Utilities for information on DBVERIFY

(27)

2.2.3 Recommendations to Improve Manageability

This section describes best practices for improving Oracle Database manageability:

■ Use Automatic Performance Tuning Features

■ Use a Server Parameter File

■ Use Automatic Undo Management

■ Use Locally Managed Tablespaces

■ Use Automatic Segment Space Management

■ Use Temporary Tablespaces and Specify a Default Temporary Tablespace

■ Use Resumable Space Allocation

■ Use Database Resource Manager

2.2.3.1 Use Automatic Performance Tuning Features

Effective data collection and analysis is essential for identifying and correcting performance problems. Oracle provides a number of tools that allow a performance engineer to gather information regarding database performance.

The Oracle Database automatic performance tuning features include:

■ Automatic Workload Repository (AWR)

■ Automatic Database Diagnostic Monitor (ADDM)

■ SQL Tuning Advisor

■ SQL Access Advisor

When using AWR, consider the following best practices:

■ Set the AWR automatic snapshot interval to 10-20 minutes to capture performance peaks during stress testing or to diagnose performance issues.

■ Under usual workloads a 60-minute interval is sufficient

2.2.3.2 Use a Server Parameter File

The server parameter file (SPFILE) enables a single, central parameter file to hold all database initialization parameters associated with all instances of a database. This provides a simple, persistent, and robust environment for managing database parameters. An SPFILE is required when using Data Guard Broker.

See Also:

■ Oracle Database Administrator's Guide for information on managing initialization parameters with an SPFILE

■ Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide for information on initialization parameters with Real Application Clusters

■ Oracle Data Guard Broker for information on other prerequisites for using Oracle Data Guard Broker

■ Appendix A, "Database SPFILE and Oracle Net Configuration File Samples"

(28)

2.2.3.3 Use Automatic Undo Management

With automatic undo management, the Oracle Database server effectively and efficiently manages undo space, leading to lower administrative complexity and cost.

When Oracle Database internally manages undo segments, undo block and consistent read contention are eliminated because the size and number of undo segments are automatically adjusted to meet the current workload requirement.

To use automatic undo management, set the following initialization parameters:

■ UNDO_MANAGEMENT

This parameter should be set to AUTO.

■ UNDO_RETENTION

This parameter specifies the desired time in seconds to retain undo data. It must be the same on all instances.

■ UNDO_TABLESPACE

This parameter should specify a unique undo tablespace for each instance.

Advanced object recovery features, such as Flashback Query, Flashback Version Query, Flashback Transaction Query, and Flashback Table, require automatic undo

management. By default, Oracle Database automatically tunes undo retention by collecting database use statistics and estimating undo capacity needs. You can affect this automatic tuning by setting the UNDO_RETENTION initialization parameter. It is only necessary to set this initialization parameter in the following cases:

■ The undo tablespace has the AUTOEXTEND option enabled.

■ You want to have undo retention for LOBs.

■ You want a retention guarantee.

With the retention guarantee option, the undo guarantee is preserved even if there is need for DML activity (DDL statements are still allowed). If the tablespace is

configured with less space than the transaction throughput requires, then the following four things will occur in this sequence:

1. If you have an autoextensible file, then it will automatically grow to accommodate the retained undo data.

2. A warning alert is issued at 85 percent full.

3. A critical alert is issued at 97 percent full.

4. Transactions receive an out-of-space error.

Note: By default, undo data can be overwritten by ongoing transactions, even if the UNDO_RETENTION setting specifies that the undo data should be maintained. To guarantee that unexpired undo data is not overwritten, retention guarantee must be enabled for the undo tablespace.

See Also: Oracle Database Administrator's Guide for more information about the UNDO_RETENTION setting and the size of the undo

tablespace

(29)

2.2.3.4 Use Locally Managed Tablespaces

Locally managed tablespaces perform better than dictionary-managed tablespaces, are easier to manage, and eliminate space fragmentation concerns. Locally managed tablespaces use bitmaps stored in the datafile headers and, unlike dictionary managed tablespaces, do not contend for centrally managed resources for space allocations and de-allocations.

2.2.3.5 Use Automatic Segment Space Management

Automatic segment space management simplifies space administration tasks, thus reducing the chance of human error. An added benefit is the elimination of

performance tuning related to space management. It facilitates management of free space within objects such as tables or indexes, improves space utilization, and provides significantly better performance and scalability with simplified

administration. The automatic segment space management feature is enabled by default for all new tablespaces created using default attributes.

2.2.3.6 Use Temporary Tablespaces and Specify a Default Temporary Tablespace

Temporary tablespaces improve the concurrency of multiple sort operations, reduce sort operation overhead, and avoid data dictionary space management operations altogether. This is a more efficient way of handling temporary segments, from the perspective of both system resource usage and database performance.

A default temporary tablespace should be specified for the entire database to prevent accidental exclusion of the temporary tablespace clause. This can be done at database creation time by using the DEFAULT TEMPORARY TABLESPACE clause of the CREATE DATABASE statement, or after database creation by the ALTER DATABASE statement.

Using the default temporary tablespace ensures that all disk sorting occurs in a temporary tablespace and that other tablespaces are not mistakenly used for sorting.

2.2.3.7 Use Resumable Space Allocation

Resumable space allocation provides a way to suspend and later resume database operations if there are space allocation failures. The affected operation is suspended instead of the database returning an error. No processes need to be restarted. When the space problem is resolved, the suspended operation is automatically resumed.

To use resumable space allocation, set the RESUMABLE_TIMEOUT initialization parameter to the number of seconds of the retry time. You must also at the session level issue the ALTER SESSION ENABLE RESUMABLE statement.

2.2.3.8 Use Database Resource Manager

The Database Resource Manager gives database administrators more control over resource management decisions, so that resource allocation can be aligned with the business objectives of an enterprise. The Database Resource Manager provides the

See Also: Oracle Database Administrator's Guide for more information on locally managed tablespaces

See Also: Oracle Database Administrator's Guide for more information on segment space management

See Also: Oracle Database Administrator's Guide for more information on managing tablespaces

See Also: Oracle Database Administrator's Guide for more information on managing resumable space allocation

(30)

ability to prioritize work within the Oracle Database server. Availability of the database encompasses both its functionality and performance. If the database is available but users are not getting the level of performance they need, then availability and service level objectives are not being met. Application performance, to a large extent, is affected by how resources are distributed among the applications that access the database. The main goal of the Database Resource Manager is to give the Oracle Database server more control over resource management decisions, thus

circumventing problems resulting from inefficient operating system management and operating system resource managers. The Database Resource Manager is enabled by default.

2.3 Configuring Oracle Database 10g with RAC

The best practices discussed in this section apply to Oracle Database 10g with RAC.

These best practices build on the Oracle Database 10g configuration best practices described in Section 2.2, "Configuring Oracle Database 10g" on page 2-7. These best practices are identical for the primary and standby databases if they are used with Data Guard in Oracle Database 10g with RAC and Data Guard - MAA. Some of these best practices might reduce performance levels, but are necessary to reduce or avoid outages. The minimal performance impact is outweighed by the reduced risk of corruption or the performance improvement for recovery.

This section includes the following topics:

■ Connect to Database using Services and Virtual Internet Protocol (VIP) Address

■ Use Oracle Clusterware to Manage the Cluster and Application Availability

■ Use Client-Side and Server-Side Load Balancing

■ Mirror Oracle Cluster Registry (OCR) and Configure Multiple Voting Disks

■ Regularly Back Up OCR to Tape or Offsite

■ Verify That CRS and RAC Use Same Interconnect Network

■ Configure All Databases for Maximum Instances in the Cluster

2.3.1 Connect to Database using Services and Virtual Internet Protocol (VIP) Address

With Oracle Database 10g, application workloads can be defined as services so that they can be individually managed and controlled. DBAs control which processing resources are allocated to each service during both normal operations and in response to failures. Performance metrics are tracked by service and thresholds set to

automatically generate alerts should these thresholds be crossed. CPU resource allocations and resource consumption controls are managed for services using Resource Manager. Oracle tools and facilities such as Job Scheduler, Parallel Query, and Oracle Streams Advanced Queuing also use services to manage their workloads.

With Oracle Database 10g, rules can be defined to automatically allocate processing resources to services. Oracle RAC 10g instances can be allocated to process individual services or multiple services as needed. These allocation rules can be modified dynamically to meet changing business needs. For example, these rules could be modified at the end of a quarter to ensure that there are enough processing resources to complete critical financial functions on time. Rules can also be defined so that when instances running critical services fail, the workload will be automatically shifted to instances running less critical workloads. Services can be created and administered

See Also: Oracle Database Administrator's Guide for more information on Database Resource Manager

Oracle® Database High Availability Best Practices 10g

Preface

1 Introduction to High-Availability Best Practices

2 Configuring for High-Availability

3 Monitoring Using Oracle Grid Control

4 Managing Outages

5 Migrating to an MAA Environment

A Database SPFILE and Oracle Net Configuration File Samples

Glossary

Index

Audience

Documentation Accessibility

Related Documents

Conventions

1

Introduction to High-Availability Best Practices

1.1 Oracle Database High-Availability Architecture

1.2 Oracle Database High-Availability Best Practices

1.3 Oracle Maximum Availability Architecture

1.4 Operational Best Practices

2

Configuring for High-Availability

2.1 Configuring Storage

2.1.1 Evaluate Database Performance Requirements and Storage Performance Capabilities

2.1.2 Use Automatic Storage Management (ASM) to Manage Database Files

2.1.3 Use a Simple Disk and Disk Group Configuration

2.1.4 Use Disk Multipathing Software to Protect from Path Failure

2.1.5 Use Redundancy to Protect from Disk Failure

2.1.6 Consider HARD-Compliant Storage

2.2 Configuring Oracle Database 10g

2.2.1 Requirements for High Availability

2.2.1.1 Enable ARCHIVELOG Mode

2.2.1.2 Enable Block Checksums

2.2.2 Recommendations for High Availability and Fast Recoverability

2.2.2.1 Configure the Size of Redo Log Files and Groups Appropriately

2.2.2.2 Use a Flash Recovery Area

2.2.2.3 Enable Flashback Database

2.2.2.4 Use Fast-Start Fault Recovery to Control Instance Recovery Time

2.2.2.5 Enable Database Block Checking

2.2.2.6 Set DISK_ASYNCH_IO

2.2.2.7 Set LOG_BUFFER to At Least 8 MB

2.2.2.8 Use Automatic Shared Memory Management

2.2.2.9 Increase PARALLEL_EXECUTION_MESSAGE_SIZE

2.2.2.10 Tune PARALLEL_MIN_SERVERS

2.2.2.11 Disable Parallel Recovery

2.2.3 Recommendations to Improve Manageability

2.2.3.1 Use Automatic Performance Tuning Features

2.2.3.2 Use a Server Parameter File

2.2.3.3 Use Automatic Undo Management

2.2.3.4 Use Locally Managed Tablespaces

2.2.3.5 Use Automatic Segment Space Management

2.2.3.6 Use Temporary Tablespaces and Specify a Default Temporary Tablespace

2.2.3.7 Use Resumable Space Allocation

2.2.3.8 Use Database Resource Manager

2.3 Configuring Oracle Database 10g with RAC

2.3.1 Connect to Database using Services and Virtual Internet Protocol (VIP) Address