Solaris Boot Process

Understanding the Solaris boot process matters because, when a system fails to boot, familiarity with the boot sequence and the steps involved lets you isolate the failing phase and resolve the problem quickly.

The boot process in Solaris can be divided into phases for ease of study. The first phase starts when the machine is switched on and runs at the boot PROM level. It displays an identification banner showing the machine's host ID, serial number, architecture type, memory and Ethernet address. This is followed by a self-test of the various subsystems in the machine.

The boot PROM then looks for the default boot device and reads the boot program from the boot block, which occupies blocks 1-15 of the boot device. The boot block contains the ufs file system reader, which the next stages of the boot process require.

The ufs file system reader opens the boot device and loads the secondary boot program from /platform/`uname -i`/ufsboot (uname -i expands to the platform name).

The secondary boot program loads a platform-specific kernel along with the generic Solaris kernel.

The kernel initializes itself and loads the modules required to mount the root partition so that booting can continue.

The booting process undergoes the following phases afterwards:

  1. init phase
  2. inittab file
  3. rc scripts & run level

1. INIT phase

The init phase begins when the /sbin/init program is executed. init reads the /etc/inittab file and starts other processes according to the directives in it.

The two most important functions of init are:

  1. It runs the processes that bring the system to the default run level (run level 3 in Solaris, defined by the initdefault entry in /etc/inittab).
  2. It controls the transition between run levels by executing the appropriate rc scripts to start and stop the processes for each run level.

2. /etc/inittab file

This file states the default run level and the actions to perform as the system reaches that level. A typical entry, with its fields explained, is as follows:

S3:3:wait:/sbin/rc3 > /dev/console 2>&1 < /dev/console

where:

  • S3 — is an identifier for the line
  • 3 — is the run level
  • wait — is the action to be performed
  • /sbin/rc3 — is the command to be run

So the fields in the inittab are:

Identification : run level : action : process

The complete line thus means: run the command /sbin/rc3 at run level 3 and wait until the rc3 process completes.
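
The field layout can be illustrated with a short shell sketch. The function name parse_inittab_line is hypothetical and not part of Solaris. It relies only on the fields being separated by colons, with the process field running to the end of the line.

```shell
# Hypothetical helper (not a Solaris tool): split one inittab entry
# into its four colon-separated fields.
parse_inittab_line() {
    # The last variable given to read receives the rest of the line,
    # so any colons inside the process field are preserved.
    printf '%s\n' "$1" | {
        IFS=: read -r id rlevel action process
        printf 'id=%s\n' "$id"
        printf 'runlevel=%s\n' "$rlevel"
        printf 'action=%s\n' "$action"
        printf 'process=%s\n' "$process"
    }
}

# Example: the entry discussed above.
parse_inittab_line 'S3:3:wait:/sbin/rc3 > /dev/console 2>&1 < /dev/console'
```

Everything after the third colon is treated as the process field, which is why the redirections stay attached to the /sbin/rc3 command.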

The action field can contain any of the following keywords:

  • initdefault — sets the default run level of the system
  • respawn — start the process and restart it whenever it stops
  • powerfail — run the process when init receives a power-failure signal
  • sysinit — run the process before init tries to access the console, and wait for it to finish
  • wait — wait until the process ends before going on to the next line
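
As an illustration of how the action field is used, the sketch below prints the run level named by the initdefault entry. The function name default_runlevel is hypothetical, and the sample entries are illustrative rather than copied from a real system.

```shell
# Hypothetical helper: print the run level named by the initdefault entry.
# $1 is the inittab file to read (default /etc/inittab); "-" means stdin.
default_runlevel() {
    # Skip commented-out lines, then match on the third (action) field.
    awk -F: '$1 !~ /^#/ && $3 == "initdefault" { print $2; exit }' \
        "${1:-/etc/inittab}"
}

# Demonstrated on sample text rather than the live /etc/inittab.
default_runlevel - <<'EOF'
is:3:initdefault:
s3:3:wait:/sbin/rc3 > /dev/console 2>&1 < /dev/console
EOF
```

On a running system, calling default_runlevel with no argument reads /etc/inittab, and who -r reports the run level the system is currently in.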

3. RC scripts & Run Levels

RC scripts perform the following functions:

  1. They check and mount the file systems.
  2. They start and stop various processes such as network and nfs services.
  3. They perform some housekeeping jobs.

After booting, the system enters one of the following run levels, depending on the default run level and on any commands issued to change the run level:

  • 0 — Boot PROM level (ok> or > prompt on Sun hardware)
  • 1 — Administrative run level; single-user mode
  • 2 — Multiuser mode with no resource sharing
  • 3 — Multiuser mode with nfs resource sharing
  • 4 — Not used
  • 5 — Shutdown and power off (sun4m and sun4u architectures)
  • 6 — Reboot to the default run level
  • S, s — Single-user mode; user logins are disabled

Broadly speaking, the running system can be in either of the following states:

  • Single user — Minimum processes running, user logins disabled, and the root password required to gain access to the shell.
  • Multiuser — All system processes running and user logins permitted.

A given run level is reached by means of a number of scripts executed by the rc program. The rc scripts are located in the /etc/rc0.d, /etc/rc1.d, /etc/rc2.d, /etc/rc3.d and /etc/rcS.d directories. All the files for a particular run level are executed in alphanumeric order. Files beginning with the letter S start processes and those beginning with K stop them.
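
The ordering described above can be sketched as a dry run that prints what would be executed instead of executing it. The function name rc_dry_run and the script names in the demonstration are hypothetical. The sketch also assumes that the K scripts of a directory run before its S scripts, which the text does not state.

```shell
# Hypothetical dry run for one rc directory: print the commands that
# would be run, without running them.
# Assumption: K scripts run first with "stop", then S scripts with "start".
rc_dry_run() (
    # Body runs in a subshell so the locale change stays local;
    # the C locale gives plain byte-order sorting of file names.
    LC_ALL=C; export LC_ALL
    dir=$1
    for f in "$dir"/K*; do
        if [ -f "$f" ]; then echo "/sbin/sh $f stop"; fi
    done
    for f in "$dir"/S*; do
        if [ -f "$f" ]; then echo "/sbin/sh $f start"; fi
    done
)

# Demonstration on a scratch directory with made-up script names.
demo=${TMPDIR:-/tmp}/rcdemo.$$
mkdir -p "$demo"
touch "$demo/S20myapp" "$demo/S10mydb" "$demo/K05myapp" "$demo/s30disabled"
rc_dry_run "$demo"
rm -rf "$demo"
```

The file s30disabled does not appear in the output because only names starting with an uppercase K or S are matched.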

These files are hard linked to the files in /etc/init.d, which provides a central location for all of them and removes the need to change the run level when a script has to be run on its own. The files in /etc/init.d have no S or K prefix and no sequence number; instead, a start or stop argument must be supplied whenever one of them is executed.

By default the system has a number of rc scripts needed for run level transitions, but sometimes it is necessary to start custom scripts at boot time and stop them at shutdown. Custom scripts can be placed in any of the rc directories, but the following considerations have to be kept in mind:

  • The sequence number of the file should not conflict with other files.
  • The services the script needs should already have been started by previously executed scripts.
  • The file should be hard linked to a file in the /etc/init.d directory.
  • The system runs only files whose names begin with an uppercase K or S and ignores everything else. To make a file inactive, simply change the uppercase K or S to lowercase.
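
A minimal sketch of installing a custom script along these lines follows. The function install_rc_script, the script name myapp and the S99 prefix are all hypothetical. The optional fourth argument is an alternate root directory, so the sketch can be tried in a scratch directory before touching /etc.

```shell
# Hypothetical helper: copy a script into init.d and hard-link it into a
# run-level directory under a prefix such as S99 or K01.
# $1 = script file, $2 = rc directory name (e.g. rc3.d), $3 = prefix,
# $4 = optional alternate root (empty means the real /etc).
install_rc_script() {
    script=$1 rcdir=$2 prefix=$3 root=${4:-}
    name=$(basename "$script")
    # Refuse a prefix that another file in the directory already uses.
    for f in "$root/etc/$rcdir/$prefix"*; do
        if [ -e "$f" ]; then
            echo "prefix $prefix already in use: $f" >&2
            return 1
        fi
    done
    cp "$script" "$root/etc/init.d/$name" || return 1
    chmod 744 "$root/etc/init.d/$name" || return 1
    # Hard link, so both names refer to the same file.
    ln "$root/etc/init.d/$name" "$root/etc/$rcdir/$prefix$name"
}

# Demonstration in a scratch root; both names show the same inode number.
scratch=${TMPDIR:-/tmp}/rcroot.$$
mkdir -p "$scratch/etc/init.d" "$scratch/etc/rc3.d"
printf '#!/sbin/sh\necho "myapp: $1"\n' > "$scratch/myapp"
install_rc_script "$scratch/myapp" rc3.d S99 "$scratch"
ls -i "$scratch/etc/init.d/myapp" "$scratch/etc/rc3.d/S99myapp"
rm -rf "$scratch"
```

The sketch does not check that the services the script depends on are started earlier; that still has to be verified by reading the lower-numbered scripts in the same directory.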

Troubleshooting

The following are common Solaris booting issues, error messages, their meaning and possible resolutions:

Booting in single user mode and mounting root hard disk

The most important step in diagnosing boot problems is to boot the system in single-user mode, examine the hard disk for errors and work out a corrective measure. Single-user mode can be reached by any of the following methods:

ok> boot -s ;from root disk
ok> boot net -s ;from network
ok> boot cdrom -s ;from cdrom

For example, from CD-ROM:

ok> boot cdrom -s
Rebooting with command: cdrom -s
Configuring the /devices directory
Configuring the /dev directory |
INIT: SINGLE USER MODE
#
# fsck /dev/rdsk/c1t0d0s0
# mount /dev/dsk/c1t0d0s0 /mnt 

Perform the required operations on the mounted disk, now accessible through /mnt, and unmount the hard disk when you are done:

# umount /mnt
# reboot

Making boot device alias

If the system cannot boot from the primary disk and another boot disk is needed to access the data, use the nvalias command.

The nvalias command creates a device alias, assigning an alternate name to a physical disk. It requires the physical address of the target disk, which can be obtained with show-disks at the ok> prompt:

ok> nvalias altdisk /iommu@f,e0000000/sbus@f,e0001000/dma@3,81000/esp@3,80000/sd2,0

The newly aliased disk can be set as the default boot device, or used for a single boot by referring to its name:

ok> setenv boot-device altdisk
ok> boot altdisk

Timeout waiting for ARP/RARP packet ?

At the ok> prompt, type printenv and look for these parameters:

boot-device disk
mfg-switch? false
diag-switch? false

If you see boot-device net, or a true value for either of the other two parameters, change it to the value shown above.

If you do want to boot from the network, make sure the client is properly configured on the boot server and that the network connections and configuration are correct.
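
On a system that is still running, the same parameters can be read with the Solaris eeprom command, which prints them as name=value lines. The sketch below assumes that output format. The function name check_boot_params is hypothetical; it flags any setting that differs from the values listed above.

```shell
# Hypothetical helper: read "name=value" lines on stdin and report
# settings that differ from the values listed above.
# Exit status is 0 when nothing needs attention, 1 otherwise.
check_boot_params() {
    awk -F= '
        $1 == "boot-device" && $2 !~ /^disk/ {
            print "check boot-device: " $2; bad = 1
        }
        ($1 == "diag-switch?" || $1 == "mfg-switch?") && $2 != "false" {
            print "check " $1 ": " $2; bad = 1
        }
        END { exit bad + 0 }
    '
}

# Demonstration on sample text instead of live eeprom output.
check_boot_params <<'EOF' || echo "parameters need attention"
boot-device=net
diag-switch?=true
mfg-switch?=false
EOF
```

On Solaris itself the input would come from a pipeline such as eeprom | check_boot_params.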

The file just loaded does not appear to be executable

The boot block on the hard disk is corrupted. Boot the system in single-user mode from the cdrom and reinstall the boot block:

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t0d0s0

bootblk: can't find the boot program

The boot block cannot find the boot program, ufsboot. Either ufsboot is missing or it is corrupted. In such cases it can be restored from the cdrom after booting from the cdrom and mounting the hard disk:

# cp /platform/`uname -i`/ufsboot /mnt/platform/`uname -i`

boot: cannot open kernel/unix

The kernel directory, or the unix kernel file in it, cannot be found. It was probably deleted during fsck or by mistake. Copy it from the cdrom or restore it from the backup tape:

# cp /platform/`uname -i`/kernel/unix /mnt/platform/`uname -i`/kernel

Error reading ELF header ?

The kernel directory, or the unix kernel file in it, is corrupted. Copy it from the cdrom or restore it from the backup tape:

# cp /platform/`uname -i`/kernel/unix /mnt/platform/`uname -i`/kernel
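
After restoring ufsboot or the kernel, the result can be checked before rebooting. The sketch below assumes the root disk is still mounted (for example on /mnt) and uses only the two paths named in this section. The function name check_boot_files is hypothetical, and the ELF test looks only at the four magic bytes that begin every ELF file.

```shell
# Hypothetical pre-reboot check for a root file system mounted at $1.
# $2 is the platform name, as printed by uname -i.
check_boot_files() {
    root=$1 platform=$2 status=0
    kernel=$root/platform/$platform/kernel/unix
    # Both files must exist and be non-empty.
    for f in "$root/platform/$platform/ufsboot" "$kernel"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    # An ELF file starts with the bytes 0x7f 'E' 'L' 'F'.
    if [ -s "$kernel" ]; then
        magic=$(dd if="$kernel" bs=4 count=1 2>/dev/null |
                od -An -tx1 | tr -d ' \n')
        if [ "$magic" != "7f454c46" ]; then
            echo "not an ELF file: $kernel"
            status=1
        fi
    fi
    return $status
}
```

A typical call after the cp commands above would be check_boot_files /mnt "`uname -i`". No output and a zero exit status mean both files are present and the kernel at least has a valid ELF signature.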

Cannot open /etc/path_to_inst

The system cannot find the /etc/path_to_inst file. It may be missing or corrupted and needs to be rebuilt.

To rebuild this file, boot the system with the -ar option:

ok> boot -ar
Press Enter to accept the default values for the questions asked during booting, and answer y when asked whether to rebuild /etc/path_to_inst:
The /etc/path_to_inst on your system does not exist or is empty. Do you want to rebuild this file [n]? y

The system will continue booting after rebuilding the file.

Can't stat /dev/rdsk/c1t0d0s0

This error occurs when booting from the root disk, even though an fsck of the root partition performed after booting from cdrom reported no problems. The device name for / is missing from the /dev/dsk directory; to resolve the issue, the /dev and /devices directories have to be restored from the root backup tapes.
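
Whether the device names are really missing can be checked from the cdrom boot before restoring anything. The sketch below assumes the root disk is mounted on a directory such as /mnt and that the entries under dev/dsk and dev/rdsk are symbolic links into the devices tree. The function name check_dev_links is hypothetical.

```shell
# Hypothetical check: with the root disk mounted at $1, verify that the
# block and raw device links for slice $2 exist and point at something.
check_dev_links() {
    root=$1 slice=$2 status=0
    for link in "$root/dev/dsk/$slice" "$root/dev/rdsk/$slice"; do
        if [ ! -h "$link" ]; then
            echo "missing link: $link"
            status=1
        elif [ ! -e "$link" ]; then
            # The link exists but its target does not.
            echo "dangling link: $link"
            status=1
        fi
    done
    return $status
}
```

For the disk used in the examples above, the call would be check_dev_links /mnt c1t0d0s0. Any line of output points to the entries that need to be restored.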