Installing Oracle 9iR2 RAC on SUSE LINUX Enterprise Server 8

--- Using the Network File System (NFS) ---

For example with Netapp (Network Appliance, www.netapp.com) NFS storage devices.
Advantage: no expensive Fibre Channel equipment is needed, and the setup is much
easier to administer, both from a hardware and from a software point of view! This
was the easiest Oracle RAC installation we have ever done (compared with using a
cluster filesystem like OCFS on SCSI or Fibre Channel shared storage systems).

We tested and wrote this documentation at Netapp HQ (Sunnyvale, California).
We used a Netapp F840 for NFS storage. The ORACLE_HOME was *not* put on
NFS; only the data files were shared, just like in a setup with Fibre Channel
storage and raw I/O or OCFS.
It is possible to share ORACLE_HOME too but we did not test it and will not
attempt to describe the necessary procedures in this document.
The two test nodes were two IBM xSeries 335 servers, connected via Gigabit
Ethernet.

More info about NFS options: http://www.netapp.com/tech_library/3183.html


Requires at least these version numbers:

* United Linux 1.0 (SUSE LINUX Enterprise Server 8 is "powered by UL 1.0")
* UL Kernel update for Oracle, at least these version numbers:
   k_smp-2.4.19-196.i586.rpm     - SMP kernel, for almost all Oracle users
   k_deflt-2.4.19-207.i586.rpm   - Single CPU
   k_athlon-2.4.19-200.i586.rpm  - Optimized for AMD Athlon
   k_debug-2.4.19-164.i586.rpm   - Debug kernel
   k_psmp-2.4.19-201.i586.rpm    - Support for *very* old Pentium CPUs
   kernel-source-2.4.19.SUSE-152.i586.rpm
* orarun.rpm: Version 1.8 or greater


Tip: There is an easy way to work on all nodes simultaneously! Simply open
     a KDE "Konsole" and in it open one terminal for each node. Now log
     into each node on each of the terminals. After that, under "View"
     in the "KDE Konsole" menu enable "Send Input to All Sessions" for
     one of the terminals. Now, whatever you type in this session is also
     sent to the other sessions, so that you work on all nodes
     simultaneously! This greatly reduces the amount of typing you have
     to do! If you do that, remember a few things: The node names, IPs etc.
     will be different on each node. The shell history may be different on
     each node. "vi" remembers where in a file you left off - so if you
     edit a file on all nodes simultaneously first check that the cursor
     is in the same position in the file on all terminals. And so on -
     check what's going on on the other terminals often (SHIFT left/right
     arrow makes this a very quick and painless exercise)!!!  

Tip: Use "sux" instead of "su" and X-server permissions and setting DISPLAY
     happens automatically!

Tip: If you work in a noisy server room: Get a Bose noise canceling headset
     which is sold for frequent flyers. We found it very valuable in server
     rooms, too!



ALL NODES
---------

- Install SLES-8 and the latest Service Pack, esp. orarun (at least version
  1.8) and kernel updates. Make sure to satisfy all dependencies of
  package "orarun"!
  SP1: United Linux Service Pack #1. Alternatively, get at least the orarun
  package from:
  ftp://ftp.suse.com/pub/suse/i386 ... cial/Oracle/sles-8/
  Kernel- and other updates are available on the SUSE Maintenance web and
  can be installed conveniently via YOU (Yast Online Update).

- You may have to add "acpi=off" to the boot options if the system hangs
  during boot! Selecting "Safe Settings" as boot option includes that.

- set the password for user oracle: as root do "passwd oracle"

- optional: create /home/oracle:
    cp -a /etc/skel /home/oracle
    chown -R oracle:oinstall /home/oracle
    usermod -d /home/oracle oracle

- remove gcc 3.2 ("rpm -e gcc --nodeps") to be sure it's not used - we prefer
  an error message during installation over inadvertently using gcc 3.2.
  If you choose not to remove it you have to edit $ORACLE_HOME/bin/genclntsh
  as well as $ORACLE_HOME/bin/genagtsh and add "/opt/gcc295/bin" *in front* of
  the PATH variable set in those scripts (see the sketch below)! Then do
  "relink all".
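
  The change in those two scripts might look roughly like this (a sketch only -
  the exact PATH line in your genclntsh/genagtsh may look different):

    # put the gcc 2.95 compiler first in the PATH used by the script
    PATH=/opt/gcc295/bin:$PATH
    export PATH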

- in file /etc/sysconfig/suseconfig set
    CHECK_ETC_HOSTS="no"
    BEAUTIFY_ETC_HOSTS="no"

- For using an NFS mounted filesystem for the Oracle data files add this
  line to /etc/fstab on all nodes. These particular values are the ones
  recommended and tested for Netapp NFS storage devices.

  nfs-storage.domain.com:/vol/oradata  /var/opt/oracle  nfs  rw,fg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,noatime,nfsvers=3,timeo=600   0 0

  See "man nfs" for a description of NFS options.

  Please note that these options are meant for the Oracle DATA files only; for
  other files some of them are not very useful! For example, with "fg" we mount
  the NFS filesystem in the "foreground", so that the startup process of the
  machine will hang should the NFS server be down. This way an Oracle server
  will only start fully once the NFS storage is available: instead of producing
  an error it waits until the network storage is reachable! The "nolock" option
  disables NFS locking, because within the Oracle data files Oracle does its
  own locking already.

  Also note that for non-RAC (single-instance) Oracle databases you *must* use
  the "nolock" option, but for RAC you must *not* use it - which is why it does
  not appear in the fstab line above!

  Make sure the NFS server allows write access for "root", because the Oracle
  Cluster Manager runs as "root" and has to be able to access and write to
  a shared file. All nodes need this root access, of course (see the example
  export line after the commands below).

  Then run these four commands (as root):
    chkconfig nfslock on
    chkconfig nfs on
    rcnfslock start
    rcnfs start
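
  As an illustration only: on a plain Linux NFS server the root access mentioned
  above means exporting with "no_root_squash", e.g. an /etc/exports line like
  this (path and node names are placeholders; on a Netapp filer use the
  equivalent root access option of its exports file):

    /export/oradata   node1(rw,no_root_squash,sync)   node2(rw,no_root_squash,sync)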

- set up the network interfaces (internal/external)

- set up /etc/hosts and /etc/hosts.equiv (for rsh)
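
  For example (all names and addresses are placeholders; list every node with
  both its external and its interconnect address):

    # /etc/hosts
    192.168.1.1   node1        # external interface, node 1
    192.168.1.2   node2        # external interface, node 2
    10.0.0.1      node1-int    # interconnect, node 1
    10.0.0.2      node2-int    # interconnect, node 2

    # /etc/hosts.equiv - allow rsh/rcp for user oracle between the nodes
    node1      oracle
    node2      oracle
    node1-int  oracle
    node2-int  oracle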

- rcoracle start
  This sets the kernel parameters *before* we even start the
  installation!

- edit /etc/inetd.conf (or use the yast2 module for inetd) and remove the "#"
  in front of the "shell..." and "login..." lines; one service is needed for
  "rsh", the other one for "rcp"
  (there are two lines for each; the one with the additional "-a" option
   does hostname verification via reverse lookup before accepting a
   connection - see the example below)
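
  After removing the "#" the two lines will look roughly like this (a sketch
  only - paths and options can differ slightly between inetd.conf versions):

    shell  stream  tcp  nowait  root  /usr/sbin/tcpd  in.rshd -L
    login  stream  tcp  nowait  root  /usr/sbin/tcpd  in.rlogind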

- as root, do "chkconfig inetd on" and "rcinetd start" (for immediate start)

- Check if you can "rsh" and "rcp" - as user oracle - from any node to
  any other node in the cluster
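
  A quick test, as user oracle (node names are examples); neither command should
  ask for a password:

    rsh node2 hostname
    rcp /etc/hosts node2:/tmp/hosts.rcp-test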

- Optional: install and configure xntpd to synchronize the time/date on all
  nodes; a minimal example ntp.conf is sketched below
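
    # minimal /etc/ntp.conf sketch - the server address and driftfile path
    # are placeholders, adjust them to your environment
    server    192.168.1.254
    driftfile /var/lib/ntp/ntp.drift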

- edit /etc/profile.d/oracle.sh (and oracle.csh) and set ORACLE_SID to your SID
  plus the node number (SID<nodenumber>), individually on each node, for example:
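
    # in /etc/profile.d/oracle.sh - the SID "RAC" is just an example name
    export ORACLE_SID=RAC1    # on node 1
    export ORACLE_SID=RAC2    # on node 2, and so on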


NODE #1 (installation node)
---------------------------

- as oracle: ./runInstaller - install the Cluster Manager
  For quorum file enter this: /var/opt/oracle/oracm
  (or wherever you mounted the shared NFS filesystem)

  Exit the installer when you're done. If you instead select "Next install" to
  install the patchset (next step) directly from there, the installer will
  crash.

- as oracle: ./runInstaller - change source to where you saved the 9.2.0.2
  (or later) patchset and install the 920x patch for "Cluster Manager"

  The installation of the patchset is HIGHLY recommended since beginning
  with 9.2.0.2 you no longer use "watchdogd" but a kernel-module (written by
  Oracle and included in SUSE kernels) called hangcheck-timer, which has
  many big advantages over the old "watchdogd"!

- edit /etc/sysconfig/oracle to enable start of OCM and GSD
  (GSD will work only later after the full software is installed)
    START_ORACLE_DB_OCM="yes"
    START_ORACLE_DB_GSD="yes"

- Start/stop the oracm on one node - the very first time it starts it will
  create the shared quorum file. If oracm is started simultaneously on several
  nodes while this file does not exist yet, conflicts arise and oracm comes up
  only on some of the nodes - purely a question of random timing! Once this
  file exists there is no problem any more!
    rcoracle start
    rcoracle stop


ALL NODES
---------

- rcoracle start
  Starts OCM, and hangcheck-timer (called "iofence-timer" by oracm)
  If you didn't install our Oracle update kernel you will get an error
  about a missing module "iofence-timer"!

- On each node: check the processes and $ORACLE_HOME/oracm/log/cm.log to see if
  oracm is up. Check /var/log/messages and cm.log if there are problems.
  
  The end of cm.log should look like this (here: 4 nodes):
  ....
  HandleUpdate(): SYNC(2) from node(0) completed {Thu Feb 13 18:20:19 2003 }
  HandleUpdate(): NODE(0) IS ACTIVE MEMBER OF CLUSTER {Thu Feb 13 18:20:19 2003 }
  HandleUpdate(): NODE(1) IS ACTIVE MEMBER OF CLUSTER {Thu Feb 13 18:20:19 2003 }
  HandleUpdate(): NODE(2) IS ACTIVE MEMBER OF CLUSTER {Thu Feb 13 18:20:19 2003 }
  HandleUpdate(): NODE(3) IS ACTIVE MEMBER OF CLUSTER {Thu Feb 13 18:20:19 2003 }
  NMEVENT_RECONFIG [00][00][00][00][00][00][00][0f] {Thu Feb 13 18:20:20 2003 }
  Successful reconfiguration,  4 active node(s) node 0 is the master, my node num is 0 (reconfig 3)


NODE #1 (installation node)
---------------------------

- as oracle: export SRVM_SHARED_CONFIG=/var/opt/oracle/SharedConfig

- as oracle: ./runInstaller
  The installer will detect the running Oracle Cluster Manager and through
  it all nodes that are part of the cluster, and show them to you. Select
  ALL of the nodes to install the Oracle software on all of them!
  Select "software only", i.e. no database creation (we want to upgrade to
  the latest Oracle 9.2.0.x patchset first)

  Exit the installer.

- as oracle: in $ORACLE_BASE/oui/bin/linux/, do:
  ln -s libclntsh.so.9.0 libclntsh.so

- as oracle: runInstaller
  As source select the 920x patchset directory (./stage/products.jar)
  Install 920x patchset (we already patched the Cluster Manager earlier)

- copy the file /etc/oratab from the installation node to the same location on
  all other nodes, making sure the owner (oracle:oinstall) remains the same!
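
  One way to do it (node names are placeholders; repeat for every other node):

    rcp -p /etc/oratab node2:/etc/oratab
    rcp -p /etc/oratab node3:/etc/oratab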


ALL NODES
---------

- rcoracle stop

- rcoracle start
  So that this time GSD is started too, which we only just installed. GSD is
  needed by OEM and by dbca.

- Go to $ORACLE_BASE and create a link to the shared NFS mounted directory
    cd $ORACLE_BASE
    ln -s /var/opt/oracle oradata
  (assuming you mounted the NFS directory under /var/opt/oracle)

- Go to $ORACLE_HOME and create a link to the shared NFS mounted directory
    cd $ORACLE_HOME
    rm -rf dbs
    ln -s /var/opt/oracle dbs
  (assuming you mounted the NFS directory under /var/opt/oracle)

- The installer "forgot" all log directories on the other nodes
  when copying the software from the installation node:
    mkdir $ORACLE_HOME/rdbms/log
    mkdir $ORACLE_HOME/rdbms/audit
    mkdir $ORACLE_HOME/network/log
    mkdir $ORACLE_HOME/network/agent/log
    mkdir $ORACLE_HOME/ctx/log
    mkdir $ORACLE_HOME/hs/log
    mkdir $ORACLE_HOME/mgw/log
    mkdir $ORACLE_HOME/srvm/log
    mkdir $ORACLE_HOME/sqlplus/log
    mkdir $ORACLE_HOME/sysman/log


FINISHED
--------

Now that the software is installed and the cluster manager and the GSD are up
and running, we are ready to create a database!


NODE #1 (installation node)
---------------------------

- (as of orarun-1.8-8 this is done by "rcoracle start" when oracm is started)
  as root:
    touch /etc/rac_on
  A dirty Oracle hack; see the last 5 lines of $ORACLE_HOME/bin/dbca for what it
  does, if you are interested. Alternatively, edit "dbca" and you don't need
  this "touch" command.

- (as of orarun-1.8-8 this is done by /etc/profile.d/oracle.sh - for user
   oracle only)
  to run gsdctl or gsd or srvconfig you have to do this in the same shell:
    unset JAVA_BINDIR JAVA_HOME

- as oracle: run "netca" to create an Oracle network configuration
  The Network Configuration Assistant should detect the running cluster
  manager and offer a cluster configuration option! You must at least
  configure a listener. You can accept all the defaults, i.e. simply
  press NEXT until the listener configuration is done.

- run "lsnrctl start" on ALL NODES.

- as oracle: dbca -datafileDestination $ORACLE_HOME/dbs
  Set up a database. Without the -datafileDestination parameter dbca
  assumes (and checks for!) raw devices which we don't use here!
  If there's an error right at the start, try restarting the cluster
  manager and GSD via "rcoracle stop; rcoracle start".

- edit /etc/sysconfig/oracle to start additional services, e.g. the Oracle
  listener. If you set START_ORACLE_DB="yes" you have to edit /etc/oratab (on
  ALL NODES) and change the last letter in the line for your database (usually
  just one line, at the bottom) to "Y", or no database will be started - see
  the example line below.
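
  An /etc/oratab entry then looks like this (SID and ORACLE_HOME path are
  placeholders for your own values):

    RAC1:/opt/oracle/product/9.2:Y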