Assembling md arrays at boot time.
---------------------------------
December 2005

These notes apply to 2.6 kernels only and, in some cases,
to 2.6.15 or later.

Md arrays can be assembled at boot time using the 'autodetect' functionality
which is triggered by storing components of an array in partitions of type
'fd' - Linux Raid Autodetect.
They can also be assembled by specifying the component devices in a
kernel parameter such as
   md=0,/dev/sda,/dev/sdb
In this case, /dev/md0 will be assembled (because of the 0) from the listed
devices.

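As an illustration of the parameter format only, the following sketch
builds such a string from a minor number and a list of component devices.
The helper name md_boot_param is hypothetical; it is not part of the
kernel or of mdadm.

```shell
# md_boot_param MINOR DEVICE...
# Hypothetical helper: prints an "md=<minor>,<dev>,<dev>..." string in the
# format shown above. It only formats text; it does not touch any device.
md_boot_param() {
    minor=$1
    shift
    param="md=$minor"
    for dev in "$@"
    do
        param="$param,$dev"
    done
    echo "$param"
}
```

Called as
   md_boot_param 0 /dev/sda /dev/sdb
it prints the parameter used in the example above.
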
These mechanisms, while useful, do not provide complete functionality
and are unlikely to be extended. The preferred way to assemble md
arrays at boot time is using 'mdadm'. To assemble an array which
contains the root filesystem, mdadm needs to be run before that
filesystem is mounted, and so needs to be run from an initial-ram-fs
(initramfs). How this can work is the primary focus of this document.

It should be noted up front that only the array containing the root
filesystem should be assembled from the initramfs. Any other arrays
should be assembled under the control of files on the main filesystem
as this enhances flexibility and maintainability.

A minimal initramfs for assembling md arrays can be created using 3
files and one directory. These are:

 /bin           Directory
 /bin/mdadm     statically linked mdadm binary
 /bin/busybox   statically linked busybox binary
 /bin/sh        hard link to /bin/busybox
 /init          a shell script which calls mdadm appropriately.

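The layout above could be put together by hand along the following lines.
This is only a sketch: the function name make_initramfs_tree and all four
of its arguments are hypothetical, and the caller is assumed to supply
statically linked mdadm and busybox binaries and a suitable init script.

```shell
# make_initramfs_tree DEST MDADM BUSYBOX INIT
# Hypothetical helper: lays out the files listed above under DEST.
# MDADM and BUSYBOX are paths to statically linked binaries and INIT is
# the init script; all of them are supplied by the caller.
make_initramfs_tree() {
    dest=$1
    mdadm_bin=$2
    busybox_bin=$3
    init_script=$4

    mkdir -p "$dest/bin" || return 1
    cp "$mdadm_bin" "$dest/bin/mdadm" || return 1
    cp "$busybox_bin" "$dest/bin/busybox" || return 1
    # /bin/sh is a hard link to busybox, not a second copy.
    ln -f "$dest/bin/busybox" "$dest/bin/sh" || return 1
    cp "$init_script" "$dest/init" || return 1
    # The kernel runs /init directly, so it must be executable.
    chmod 755 "$dest/bin/mdadm" "$dest/bin/busybox" "$dest/init"
}
```

The resulting tree can then be packed as a compressed cpio archive in
'newc' format, for example (with a hypothetical staging directory)
   (cd /tmp/initramfs && find . | cpio -o -H newc | gzip -9) > init.cpio.gz
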
An example init script is:

==============================================
#!/bin/sh

echo 'Auto-assembling boot md array'
mkdir /proc
mount -t proc proc /proc
if [ -n "$rootuuid" ]
then arg=--uuid=$rootuuid
elif [ -n "$mdminor" ]
then arg=--super-minor=$mdminor
else arg=--super-minor=0
fi
echo "Using $arg"
mdadm -Acpartitions $arg --auto=part /dev/mda
cd /
mount /dev/mda1 /root || mount /dev/mda /root
umount /proc
cd /root
exec chroot . /sbin/init < /dev/console > /dev/console 2>&1
=============================================

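The order in which the script picks an array selector (rootuuid first,
then mdminor, then minor 0) can be isolated into a small function, which
makes the precedence easy to check outside an initramfs. The function
name select_array_arg is hypothetical and is not part of the script above.

```shell
# select_array_arg ROOTUUID MDMINOR
# Hypothetical helper: prints the mdadm selector that the init script
# would use for the given (possibly empty) rootuuid and mdminor values.
select_array_arg() {
    if [ -n "$1" ]
    then echo "--uuid=$1"
    elif [ -n "$2" ]
    then echo "--super-minor=$2"
    else echo "--super-minor=0"
    fi
}
```

In an init script it would be used as
   arg=$(select_array_arg "$rootuuid" "$mdminor")
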
This could certainly be extended, or merged into a larger init script.
Though tested and in production use, it is not presented here as
"The Right Way" to do it, but as a useful example.
Some key points are:

  /proc needs to be mounted so that /proc/partitions can be accessed
  by mdadm, and so that /proc/filesystems can be accessed by mount.

  The uuid of the array can be passed in as a kernel parameter
  (rootuuid). As the kernel doesn't use this value, it is made available
  in the environment for /init.

  If no uuid is given, we default to md0 (--super-minor=0), which is
  commonly used to store the root filesystem. This may not work in
  all situations.

  We assemble the array as a partitionable array (/dev/mda) even if we
  end up using the whole array. There is no cost in using the partitionable
  interface, and in this context it is simpler.

  We try mounting both /dev/mda1 and /dev/mda as they are the most likely
  parts of the array to contain the root filesystem.

  The --auto flag is given to mdadm so that it will create /dev/md*
  files automatically. This is needed as /dev will not contain
  any md files, and udev will not create them (as udev only creates device
  files after the device exists, and mdadm needs the device file to create
  the device). Note that the created md files may not exist in /dev
  of the mounted root filesystem. This needs to be dealt with separately
  from mdadm - possibly using udev.

  We do not need to create device files for the components which will
  be assembled into /dev/mda. mdadm finds the major/minor numbers from
  /proc/partitions and creates a temporary /dev file if one doesn't already
  exist.

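To see the information mdadm works from in the last point, the
major/minor pair for a named device can be read out of /proc/partitions
as sketched below. This is for inspection only and says nothing about how
mdadm is implemented; the function name partition_devnum is hypothetical,
and it assumes awk is available, which may not be true of a minimal
busybox build.

```shell
# partition_devnum NAME [FILE]
# Hypothetical helper: prints "major:minor" for the device called NAME,
# read from FILE (default /proc/partitions). Returns non-zero if NAME is
# not listed. The first line of the file is a column header and is skipped.
partition_devnum() {
    awk -v name="$1" '
        NR > 1 && $4 == name { print $1 ":" $2; found = 1; exit }
        END { exit !found }
    ' "${2:-/proc/partitions}"
}
```

For example,
   partition_devnum sda1
prints the numbers that would be needed to create a device file for sda1.
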
The script "mkinitramfs" which is included with the mdadm distribution
can be used to create a minimal initramfs. It creates a file called
'init.cpio.gz' which can be specified as an 'initrd' to lilo or grub
(or whatever boot loader is being used).


Resume from an md array
-----------------------

If you want to make use of the suspend-to-disk/resume functionality in Linux,
and want to have swap on an md array, you will need to assemble the array
before resume is possible.
However, because the array is active in the resumed image, you do not want
anything written to any drives during the resume process, such as superblock
updates or array resync.

This can be achieved in 2.6.15-rc1 and later kernels using the
'start_ro' parameter of the md_mod module.
Simply include the command
   echo 1 > /sys/module/md_mod/parameters/start_ro
before assembling the array with 'mdadm'.
You can then echo
   9:0
(the major:minor numbers of /dev/md0) or whatever is appropriate
to /sys/power/resume to trigger the resume.
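
The sequence described above might be wrapped up as follows. This is a
sketch, not a tested resume script: the function name resume_from_md and
its argument order are hypothetical, and it assumes that sysfs has
already been mounted (the example init script earlier only mounts /proc).

```shell
# resume_from_md SYSFS DEVNUM ASSEMBLE_CMD...
# Hypothetical helper. SYSFS is the sysfs mount point (normally /sys),
# DEVNUM is the major:minor of the array holding swap (for example 9:0),
# and the remaining arguments are the command that assembles that array.
resume_from_md() {
    sysfs=$1
    devnum=$2
    shift 2

    # Make newly assembled arrays start read-only, so that no superblock
    # update or resync is written before the resume.
    echo 1 > "$sysfs/module/md_mod/parameters/start_ro" || return 1

    # Assemble the array; give up without resuming if that fails.
    "$@" || return 1

    # Ask the kernel to resume from the assembled device.
    echo "$devnum" > "$sysfs/power/resume"
}
```

A call might look like the following, where the mdadm arguments are an
assumption modelled on the init script above and need to be adapted to
the array in question:
   resume_from_md /sys 9:0 mdadm -Acpartitions --super-minor=0 --auto=md /dev/md0
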