Adding upstream version 4.2.
Signed-off-by: Daniel Baumann <daniel@debian.org>
This commit is contained in:
parent
16732c81e5
commit
4fd4995b67
279 changed files with 77998 additions and 0 deletions
257
mdmon.8
Normal file
|
@@ -0,0 +1,257 @@
|
|||
.\" See file COPYING in distribution for details.
|
||||
.TH MDMON 8 "" v4.2
|
||||
.SH NAME
|
||||
mdmon \- monitor MD external metadata arrays
|
||||
|
||||
.SH SYNOPSIS
|
||||
|
||||
.BI mdmon " [--all] [--takeover] [--foreground] CONTAINER"
|
||||
|
||||
.SH OVERVIEW
|
||||
The 2.6.27 kernel brings the ability to support external metadata arrays.
|
||||
External metadata implies that user space handles all updates to the metadata.
|
||||
The kernel's responsibility is to notify user space when a "metadata event"
|
||||
occurs, like disk failures and clean-to-dirty transitions. The kernel, in
|
||||
important cases, waits for user space to take action on these notifications.
|
||||
|
||||
.SH DESCRIPTION
|
||||
.SS Metadata updates:
|
||||
To service metadata update requests, a daemon,
|
||||
.IR mdmon ,
|
||||
is introduced.
|
||||
.I Mdmon
|
||||
is tasked with polling the sysfs namespace looking for changes in
|
||||
.BR array_state ,
|
||||
.BR sync_action ,
|
||||
and per disk
|
||||
.BR state
|
||||
attributes. When a change is detected, it calls a per-metadata-type
|
||||
handler to make modifications to the metadata. The following actions
|
||||
are taken:
|
||||
.RS
|
||||
.TP
|
||||
.B array_state \- inactive
|
||||
Clear the dirty bit for the volume and let the array be stopped
|
||||
.TP
|
||||
.B array_state \- write pending
|
||||
Set the dirty bit for the array and then set
|
||||
.B array_state
|
||||
to
|
||||
.BR active .
|
||||
Writes
|
||||
are blocked until userspace writes
|
||||
.BR active .
|
||||
.TP
|
||||
.B array_state \- active-idle
|
||||
The safe mode timer has expired so set array state to clean to block writes to the array
|
||||
.TP
|
||||
.B array_state \- clean
|
||||
Clear the dirty bit for the volume
|
||||
.TP
|
||||
.B array_state \- read-only
|
||||
This is the initial state that all arrays start at.
|
||||
.I mdmon
|
||||
takes one of three actions:
|
||||
.RS
|
||||
.TP
|
||||
1/
|
||||
Transition the array to read-auto keeping the dirty bit clear if the metadata
|
||||
handler determines that the array does not need resyncing or other modification
|
||||
.TP
|
||||
2/
|
||||
Transition the array to active if the metadata handler determines a resync or
|
||||
some other manipulation is necessary
|
||||
.TP
|
||||
3/
|
||||
Leave the array read\-only if the volume is marked to not be monitored; for
|
||||
example, the metadata version has been set to "external:\-dev/md127" instead of
|
||||
"external:/dev/md127"
|
||||
.RE
|
||||
.TP
|
||||
.B sync_action \- resync\-to\-idle
|
||||
Notify the metadata handler that a resync may have completed. If a resync
|
||||
process is idled before it completes, this event allows the metadata handler to
|
||||
checkpoint resync.
|
||||
.TP
|
||||
.B sync_action \- recover\-to\-idle
|
||||
A spare may have completed rebuilding so tell the metadata handler about the
|
||||
state of each disk. This is the metadata handler's opportunity to clear
|
||||
any "out-of-sync" bits and clear the volume's degraded status. If a recovery
|
||||
process is idled before it completes, this event allows the metadata handler to
|
||||
checkpoint recovery.
|
||||
.TP
|
||||
.B <disk>/state \- faulty
|
||||
A disk failure kicks off a series of events. First, notify the metadata
|
||||
handler that a disk has failed, and then notify the kernel that it can unblock
|
||||
writes that were dependent on this disk. After unblocking the kernel, this disk
|
||||
is set to be removed+ from the member array. Finally, the disk is marked failed
|
||||
in all other member arrays in the container.
|
||||
.IP
|
||||
+ Note: this behavior differs slightly from native MD arrays, where
|
||||
removal is reserved for a
|
||||
.B mdadm --remove
|
||||
event. In the external metadata case the container holds the final
|
||||
reference on a block device and a
|
||||
.B mdadm --remove <container> <victim>
|
||||
call is still required.
|
||||
.RE
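.PP
The event handling listed above can be summarised as a lookup table.
The following Python sketch is illustrative only and is not how mdmon is
implemented. The table keys use the spellings from this page, which may
differ from the exact strings the kernel reports in sysfs, and the name
describe_action is hypothetical. The read\-only case is left out because
its outcome depends on the metadata handler.
.nf
```python
# Sketch only: maps the events listed above to the described responses.
# Keys use the spellings from this page; this is not mdmon's real code.
ACTIONS = {
    ("array_state", "inactive"): "clear dirty bit; let the array stop",
    ("array_state", "write pending"): "set dirty bit; write active to array_state",
    ("array_state", "active-idle"): "write clean to array_state",
    ("array_state", "clean"): "clear dirty bit",
    ("sync_action", "resync-to-idle"): "notify handler; checkpoint resync",
    ("sync_action", "recover-to-idle"): "report disk states; checkpoint recovery",
    ("<disk>/state", "faulty"): "record failure; unblock writes; remove disk",
}


def describe_action(attribute, change):
    """Return the documented response, or None if this page lists none."""
    return ACTIONS.get((attribute, change))
```
.fi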
|
||||
|
||||
.SS Containers:
|
||||
.P
|
||||
External metadata formats, like DDF, differ from the native MD metadata
|
||||
formats in that they define a set of disks and a series of sub-arrays
|
||||
within those disks. MD metadata in comparison defines a 1:1
|
||||
relationship between a set of block devices and a RAID array. For
|
||||
example, to create 2 arrays at different RAID levels on a single
|
||||
set of disks, MD metadata requires the disks be partitioned and then
|
||||
each array can be created with a subset of those partitions. The
|
||||
supported external formats perform this disk carving internally.
|
||||
.P
|
||||
Container devices simply hold references to all member disks and allow
|
||||
tools like
|
||||
.I mdmon
|
||||
to determine which active arrays belong to which
|
||||
container. Some array management commands like disk removal and disk
|
||||
add are now only valid at the container level. Attempts to perform
|
||||
these actions on member arrays are blocked with error messages like:
|
||||
.IP
|
||||
"mdadm: Cannot remove disks from a \'member\' array, perform this
|
||||
operation on the parent container"
|
||||
.P
|
||||
Containers are identified in /proc/mdstat with a metadata version string
|
||||
"external:<metadata name>". Member devices are identified by
|
||||
"external:/<container device>/<member index>", or "external:-<container
|
||||
device>/<member index>" if the array is to remain readonly.
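.PP
As a rough illustration of the version strings described above, the
following Python sketch classifies a string as a container, a member
array, or a native array. The function name and the returned field names
are hypothetical, and the sketch assumes the member form is exactly
"external:" followed by "/" or "-", the container device, "/", and the
member index.
.nf
```python
def parse_metadata_version(version):
    """Classify a metadata version string as described above.

    Returns a dict; the field names are illustrative only.
    """
    prefix = "external:"
    if not version.startswith(prefix):
        return {"kind": "native"}
    rest = version[len(prefix):]
    if rest.startswith("/") or rest.startswith("-"):
        # Member array: a leading "-" marks it as remaining read-only.
        container, _, index = rest[1:].rpartition("/")
        return {
            "kind": "member",
            "container": container,
            "index": index,
            "readonly": rest[0] == "-",
        }
    # Anything else after the prefix is taken as the metadata name.
    return {"kind": "container", "format": rest}
```
.fi
For example, "external:\-md127/0" would be reported as member 0 of md127
with the read-only flag set.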
|
||||
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
CONTAINER
|
||||
The
|
||||
.B container
|
||||
device to monitor. It can be a full path like /dev/md/container, or a
|
||||
simple md device name like md127.
|
||||
.TP
|
||||
.B \-\-foreground
|
||||
Normally,
|
||||
.I mdmon
|
||||
will fork and continue in the background. Adding this option will
|
||||
skip that step and run
|
||||
.I mdmon
|
||||
in the foreground.
|
||||
.TP
|
||||
.B \-\-takeover
|
||||
This instructs
|
||||
.I mdmon
|
||||
to replace any active
|
||||
.I mdmon
|
||||
which is currently monitoring the array. This is primarily used late
|
||||
in the boot process to replace any
|
||||
.I mdmon
|
||||
which was started from an
|
||||
.B initramfs
|
||||
before the root filesystem was mounted. This avoids holding a
|
||||
reference on that
|
||||
.B initramfs
|
||||
indefinitely and ensures that the
|
||||
.I pid
|
||||
and
|
||||
.I sock
|
||||
files used to communicate with
|
||||
.I mdmon
|
||||
are in a standard place.
|
||||
.TP
|
||||
.B \-\-all
|
||||
This tells mdmon to find any active containers and start monitoring
|
||||
each of them if appropriate. This is normally used with
|
||||
.B \-\-takeover
|
||||
late in the boot sequence.
|
||||
A separate
|
||||
.I mdmon
|
||||
process is started for each container as the
|
||||
.B \-\-all
|
||||
argument is over-written with the name of the container. To allow for
|
||||
containers with names longer than 5 characters, this argument can be
|
||||
arbitrarily extended, e.g. to
|
||||
.BR \-\-all-active-arrays .
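.PP
The length constraint on the
.B \-\-all
argument can be sketched as a simple comparison. This Python fragment
assumes that the only requirement is that the container name be no
longer than the argument it replaces; the function name is hypothetical
and mdmon's own check may differ.
.nf
```python
def can_overwrite(argument, container_name):
    """True if container_name fits in the space the argument occupies."""
    return len(container_name) <= len(argument)
```
.fi
Under that assumption, a five-character name such as md127 fits in
"--all", while a longer name needs an extended form such as
"--all-active-arrays".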
|
||||
|
||||
.PP
|
||||
Note that
|
||||
.I mdmon
|
||||
is automatically started by
|
||||
.I mdadm
|
||||
when needed and so does not need to be considered when working with
|
||||
RAID arrays. The only time it is run other than by
|
||||
.I mdadm
|
||||
is when the boot scripts need to restart it after mounting the new
|
||||
root filesystem.
|
||||
|
||||
.SH START UP AND SHUTDOWN
|
||||
|
||||
As
|
||||
.I mdmon
|
||||
needs to be running whenever any filesystem on the monitored device is
|
||||
mounted, there are special considerations when the root filesystem is
|
||||
mounted from an
|
||||
.I mdmon
|
||||
monitored device.
|
||||
Note that in general
|
||||
.I mdmon
|
||||
is needed even if the filesystem is mounted read-only as some
|
||||
filesystems can still write to the device in those circumstances, for
|
||||
example to replay a journal after an unclean shutdown.
|
||||
|
||||
When the array is assembled by the
|
||||
.B initramfs
|
||||
code, mdadm will automatically start
|
||||
.I mdmon
|
||||
as required. This means that
|
||||
.I mdmon
|
||||
must be installed on the
|
||||
.B initramfs
|
||||
and there must be a writable filesystem (typically tmpfs) in which
|
||||
.B mdmon
|
||||
can create a
|
||||
.B .pid
|
||||
and
|
||||
.B .sock
|
||||
file. The particular filesystem to use is given to mdmon at compile
|
||||
time and defaults to
|
||||
.BR /run/mdadm .
|
||||
|
||||
This filesystem must persist through to shutdown time.
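.PP
A boot script that needs to locate these files could derive their paths
as in the following Python sketch. The directory is the default named
above; naming the files after the container device (for example md127)
is an assumption made for illustration, and runtime_files is a
hypothetical helper.
.nf
```python
import os

# Default directory named on this page; the real value is set at compile time.
MDMON_DIR = "/run/mdadm"


def runtime_files(container, directory=MDMON_DIR):
    """Return the assumed (pid_path, sock_path) for a container name."""
    return (
        os.path.join(directory, container + ".pid"),
        os.path.join(directory, container + ".sock"),
    )
```
.fi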
|
||||
|
||||
After the final root filesystem has been instantiated (usually with
|
||||
.BR pivot_root )
|
||||
.I mdmon
|
||||
should be run with
|
||||
.I "\-\-all \-\-takeover"
|
||||
so that the
|
||||
.I mdmon
|
||||
running from the
|
||||
.B initramfs
|
||||
can be replaced with one running in the main root, and so the
|
||||
memory used by the initramfs can be released.
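.PP
The takeover step could be driven from a boot script along the lines of
this Python sketch. It assumes mdmon is on the PATH and that the script
runs with sufficient privileges; the function names are hypothetical.
.nf
```python
import subprocess


def takeover_command(binary="mdmon"):
    """Build the argument list for the takeover step described above."""
    return [binary, "--all", "--takeover"]


def run_takeover():
    """Run the takeover; raises CalledProcessError on a non-zero exit."""
    subprocess.run(takeover_command(), check=True)
```
.fi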
|
||||
|
||||
At shutdown time,
|
||||
.I mdmon
|
||||
should not be killed along with other processes. Also, as it holds a
|
||||
file (socket actually) open in
|
||||
.B /dev
|
||||
(by default) it will not be possible to unmount
|
||||
.B /dev
|
||||
if it is a separate filesystem.
|
||||
|
||||
.SH EXAMPLES
|
||||
|
||||
.B " mdmon \-\-all-active-arrays \-\-takeover"
|
||||
.br
|
||||
Any
|
||||
.I mdmon
|
||||
which is currently running is killed and a new instance is started.
|
||||
This should be run during the boot sequence if an initramfs was
|
||||
used, so that any mdmon running from the initramfs will not hold
|
||||
the initramfs active.
|
||||
.SH SEE ALSO
|
||||
.IR mdadm (8),
|
||||
.IR md (4).
|