
Adding upstream version 4.2.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-14 05:52:19 +01:00
parent 16732c81e5
commit 4fd4995b67
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
279 changed files with 77998 additions and 0 deletions

.gitignore

@ -0,0 +1,18 @@
/*.o
/*.man
/*-stamp
/mdadm
/mdadm.8
/mdadm.udeb
/mdassemble
/mdmon
/swap_super
/test_stripe
/TAGS
/mdadm.O2
/mdadm.Os
/mdadm.static
/mdassemble.auto
/mdassemble.static
/mdmon.O2
/raid6check

ANNOUNCE-3.0

@ -0,0 +1,98 @@
Subject: ANNOUNCE: mdadm 3.0 - A tool for managing Soft RAID under Linux
I am pleased to (finally) announce the availability of
mdadm version 3.0
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This is a major new version and as such should be treated with some
caution. However it has seen substantial testing and is considered
to be ready for wide use.
The significant change which justifies the new major version number is
that mdadm can now handle metadata updates entirely in userspace.
This allows mdadm to support metadata formats that the kernel knows
nothing about.
Currently two such metadata formats are supported:
- DDF - The SNIA standard format
- Intel Matrix - The metadata used by recent Intel ICH controllers.
Also the approach to device names has changed significantly.
If udev is installed on the system, mdadm will not create any devices
in /dev. Rather it allows udev to manage those devices. For this to work
as expected, the included udev rules file should be installed.
If udev is not installed, mdadm will still create devices and symlinks
as required, and will also remove them when the array is stopped.
mdadm now requires all devices which do not have a standard name (mdX
or md_dX) to live in the directory /dev/md/. Names in this directory
will always be created as symlinks back to the standard name in /dev.
The man pages contain some information about the new externally managed
metadata. However see below for a more condensed overview.
Externally managed metadata introduces the concept of a 'container'.
A container is a collection of (normally) physical devices which have
a common set of metadata. A container is assembled as an md array, but
is left 'inactive'.
A container can contain one or more data arrays. These are composed from
slices (partitions?) of various devices in the container.
For example, a 5-device DDF set can contain a RAID1 using the first
half of two devices, a RAID0 using the first half of the remaining 3 devices,
and a RAID5 over the second half of all 5 devices.
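As a rough worked example of this layout, the sketch below computes how much usable space each member array gets. It assumes five equal-sized devices split into exact halves and ignores the space the DDF metadata itself reserves; `raid_capacity` and `container_layout` are hypothetical helper names, not part of mdadm.

```python
# Hypothetical sketch: usable size of each member array in the example
# 5-device DDF container. Assumes equal-sized devices split into exact
# halves and ignores space reserved by the metadata itself.

def raid_capacity(level: int, n_slices: int, slice_size: int) -> int:
    """Usable size of an array built from n_slices slices of slice_size."""
    if level == 0:
        return n_slices * slice_size          # striping, no redundancy
    if level == 1:
        return slice_size                     # every slice holds a full copy
    if level == 5:
        return (n_slices - 1) * slice_size    # one slice's worth of parity
    raise ValueError(f"unsupported RAID level {level}")


def container_layout(device_size: int) -> dict:
    half = device_size // 2
    return {
        "raid1": raid_capacity(1, 2, half),   # first half of 2 devices
        "raid0": raid_capacity(0, 3, half),   # first half of the other 3
        "raid5": raid_capacity(5, 5, half),   # second half of all 5
    }


if __name__ == "__main__":
    for name, size in container_layout(1000).items():
        print(f"{name}: {size} GB")
```

With five 1000 GB devices this gives 500 GB for the RAID1, 1500 GB for the RAID0 and 2000 GB for the RAID5.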
A container can be created with
mdadm --create /dev/md0 -e ddf -n5 /dev/sd[abcde]
or "-e imsm" to use the Intel Matrix Storage Manager.
An array can be created within a container either by giving the
container name and the only member:
mdadm -C /dev/md1 --level raid1 -n 2 /dev/md0
or by listing the component devices
mdadm -C /dev/md2 --level raid0 -n 3 /dev/sd[cde]
To assemble a container, it is easiest just to pass each device in turn to
mdadm -I
for i in /dev/sd[abcde]
do mdadm -I $i
done
This will assemble the container and the components.
Alternatively the container can be assembled explicitly
mdadm -A /dev/md0 /dev/sd[abcde]
Then the components can all be assembled with
mdadm -I /dev/md0
For each container, mdadm will start a program called "mdmon" which will
monitor the array and effect any metadata updates needed. The array is
initially assembled readonly. It is up to "mdmon" to mark the metadata
as 'dirty' and switch the array to 'read-write'.
The version 0.90 and 1.x metadata formats supported by previous
versions of mdadm are still supported and the kernel still performs
the same updates it used to. The new 'mdmon' approach is only used for
newly introduced metadata types.
NeilBrown 2nd June 2009

ANNOUNCE-3.0.1

@ -0,0 +1,22 @@
Subject: ANNOUNCE: mdadm 3.0.1 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.0.1
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This contains only minor bug fixes over 3.0. If you are using
3.0, you could consider upgrading.
The brief change log is:
- Fix various segfaults
- Fixes for --examine with containers
- Lots of other little fixes.
NeilBrown 25th September 2009

ANNOUNCE-3.0.2

@ -0,0 +1,21 @@
Subject: ANNOUNCE: mdadm 3.0.2 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.0.2
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This just contains one bugfix over 3.0.1 - I was obviously a bit hasty
in releasing that one.
The brief change log is:
- Fix crash when homehost is not set, as often happens in
early boot.
NeilBrown 25th September 2009

ANNOUNCE-3.0.3

@ -0,0 +1,29 @@
Subject: ANNOUNCE: mdadm 3.0.3 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.0.3
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This contains a collection of bug fixes and minor enhancements over
3.0.1.
The brief change log is:
- Improvements for creating arrays giving just a name, like 'foo',
rather than the full '/dev/md/foo'.
- Improvements for assembling member arrays of containers.
- Improvements to test suite
- Add option to change increment for RebuildNN messages reported
by "mdadm --monitor"
- Improvements to mdmon 'hand-over' from initrd to final root.
- Handle merging of devices that have left an IMSM array and are
being re-incorporated.
- Add missing space in "--detail --brief" output.
NeilBrown 22nd October 2009

ANNOUNCE-3.1

@ -0,0 +1,33 @@
Subject: ANNOUNCE: mdadm 3.1 - A tool for managing Soft RAID under Linux
Hot on the heels of 3.0.3 I am pleased to announce the availability of
mdadm version 3.1
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
It contains significant feature enhancements over 3.0.x
The brief change log is:
- Support --grow to change the layout of RAID4/5/6
- Support --grow to change the chunksize of raid 4/5/6
- Support --grow to change level from RAID1 -> RAID5 -> RAID6 and
back.
- Support --grow to reduce the number of devices in RAID4/5/6.
- Support restart of these grow options when assembling an array
which is partially grown.
- Assorted tests of this code, and of different RAID6 layouts.
Note that a 2.6.31 or later kernel is needed to have access to these.
Reducing devices in a RAID4/5/6 requires 2.6.32.
Changing RAID5 to RAID1 requires 2.6.33.
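A minimal sketch of checking a kernel release against the minimum versions listed above. The feature keys are hypothetical labels chosen for illustration (they are not mdadm options), and the parser assumes a release string of the form `X.Y.Z-extra`, such as the one Python's `platform.release()` returns on Linux.

```python
# Minimum kernel versions quoted in the mdadm 3.1 announcement.
# The keys are hypothetical labels, not mdadm command-line options.
MIN_KERNEL = {
    "grow-layout-chunk-level": (2, 6, 31),
    "reduce-devices": (2, 6, 32),
    "raid5-to-raid1": (2, 6, 33),
}


def parse_release(release: str) -> tuple:
    """Turn a release string like '2.6.32-5-amd64' into (2, 6, 32).

    Assumes the part before the first '-' is dotted decimal; missing
    components are treated as 0 and anything past the third is ignored.
    """
    numeric = release.split("-")[0]
    parts = [int(p) for p in numeric.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supported(feature: str, release: str) -> bool:
    """True if the kernel release meets the minimum listed for feature."""
    return parse_release(release) >= MIN_KERNEL[feature]
```

For example, `supported("reduce-devices", platform.release())` would report whether the running kernel is new enough to shrink a RAID4/5/6 array, according to these notes. Release strings with non-numeric parts such as `4.19.0+` would need a more forgiving parser.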
You should only upgrade if you need to use, or wish to test, these
features.
NeilBrown 22nd October 2009

ANNOUNCE-3.1.1

@ -0,0 +1,39 @@
Subject: ANNOUNCE: mdadm 3.1.1 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.1.1
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This is a bugfix release over 3.1, which was withdrawn due to serious
bugs. So it might be best to ignore 3.1 and say that this is a significant
feature release over 3.0.x
Significant changes are:
- RAID level conversion between RAID1, RAID5, and RAID6 is
possible where the kernel supports it (2.6.32 at least)
- online chunksize and layout changing for RAID5 and RAID6
where the kernel supports it.
- reduce the number of devices in a RAID4/5/6 array.
- The default metadata is now v1.1. This metadata is stored at the
start of the device so is safer in many ways but could interfere with
boot loaders. The old default (0.90) is still available and fully
supported.
- The default chunksize is now 512K rather than 64K. This seems more
appropriate for modern devices.
- The default bitmap chunksize for internal bitmaps is now at least
64Meg as fine grained bitmaps tend to impact performance more for
little extra gain.
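The trade-off behind the larger bitmap chunk is simple arithmetic: each bit in a write-intent bitmap covers one bitmap chunk, so a larger chunk means the on-disk bitmap has to be updated less often, at the cost of more data to resync for each dirty bit after a crash. The sketch below assumes a hypothetical policy of starting at 64 MiB and doubling until the bitmap fits a given bit budget; mdadm's actual sizing rules may differ.

```python
MiB = 1024 * 1024


def bitmap_bits(array_bytes: int, chunk_bytes: int) -> int:
    """One bit per bitmap chunk; a partial final chunk still needs a bit."""
    return -(-array_bytes // chunk_bytes)  # ceiling division


def pick_chunk(array_bytes: int, max_bits: int, minimum: int = 64 * MiB) -> int:
    """Hypothetical policy: start at 64 MiB and double the chunk size
    until the whole array fits in max_bits bits."""
    chunk = minimum
    while bitmap_bits(array_bytes, chunk) > max_bits:
        chunk *= 2
    return chunk


if __name__ == "__main__":
    TiB = 1024 ** 4
    print(bitmap_bits(TiB, 4 * 1024))   # 4 KiB chunks: 268435456 bits
    print(bitmap_bits(TiB, 64 * MiB))   # 64 MiB chunks: 16384 bits
```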
This release is believed to be stable and you should feel free to
upgrade to 3.1.1.
NeilBrown 19th November 2009

ANNOUNCE-3.1.2

@ -0,0 +1,46 @@
Subject: ANNOUNCE: mdadm 3.1.2 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.1.2
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This is a bugfix/stability release over 3.1.1.
Significant changes are:
- The default metadata has changed again (sorry about that).
It is now v1.2 and will hopefully stay that way. It turned
out there were boot-block issues with v1.1 which make it
unsuitable for a default, though in many cases it is still
suitable to use.
- Stopping a container is not permitted when members are still
active
- Add 'homehost' to the valid words for the "AUTO" config file
line. When followed by "-all", this causes mdadm to
auto-assemble any array belonging to this host, but not
auto-assemble anything else.
- Fix some bugs with "--grow --chunksize=" for changing chunksize.
- VAR_RUN can be easily changed at compile time just like ALT_RUN.
This gives distros more flexibility in how to manage the
pid and sock files that mdmon needs.
- Various mdmon fixes
- Always make bitmap 4K-aligned if at all possible.
- If mdadm.conf lists arrays which have inter-dependencies,
they previously had to be listed in the "right" order. Now
any order should work.
- Fix --force assembly of v1.x arrays which are in the process
of recovering.
- Add section on 'scrubbing' to 'md' man page.
- Various command-line-option parsing improvements.
- ... and lots of other bug fixes.
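Where each metadata version places its superblock explains the boot-block concern with v1.1 mentioned above. The sketch below assumes the placement rules used by the md driver and mdadm: 0.90 in a 64K-aligned block 64K to 128K from the end, 1.0 4K-aligned 8K to 12K from the end, 1.1 at offset 0 and 1.2 at 4K from the start. It is an illustration, not a replacement for mdadm's own calculation, which works in 512-byte sectors.

```python
KiB = 1024


def round_down(value: int, align: int) -> int:
    """Largest multiple of align that is <= value."""
    return value - (value % align)


def superblock_offset(version: str, device_bytes: int) -> int:
    """Byte offset of the md superblock for a given metadata version.

    Sketch of the placement rules only; real mdadm handles more cases.
    """
    if version == "0.90":
        # 64K-aligned block, at least 64K and less than 128K from the end.
        return round_down(device_bytes, 64 * KiB) - 64 * KiB
    if version == "1.0":
        # 4K-aligned, at least 8K and less than 12K from the end.
        return round_down(device_bytes - 8 * KiB, 4 * KiB)
    if version == "1.1":
        return 0          # very start of the device, where a boot block lives
    if version == "1.2":
        return 4 * KiB    # 4K from the start, leaving the first 4K alone
    raise ValueError(f"unknown metadata version {version!r}")


if __name__ == "__main__":
    size = 1_000_000_000
    for v in ("0.90", "1.0", "1.1", "1.2"):
        print(v, superblock_offset(v, size))
```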
This release is believed to be stable and you should feel free to
upgrade to 3.1.2
NeilBrown 10th March 2010

ANNOUNCE-3.1.3

@ -0,0 +1,46 @@
Subject: ANNOUNCE: mdadm 3.1.3 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.1.3
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This is a bugfix/stability release over 3.1.2
Significant changes are:
- mapfile now lives in a fixed location which defaults to
/dev/.mdadm/map but can be changed at compile time. This
location is chosen as most distros provide it during early
boot and preserve it throughout boot. As long as /dev exists and is
writable, /dev/.mdadm will be created.
Other files for communication with mdmon live here too.
This fixes a bug reported by Debian and Gentoo users where
udev would spin in early-boot.
- IMSM and DDF metadata will not be recognised on partitions
as they should only be used on whole-disks.
- Various overflows caused by 2G drives have been addressed.
- A subarray of an IMSM container can now be killed with
--kill-subarray. Also subarrays can be renamed with
--update-subarray
- -If (or --incremental --fail) can be used from udev to
fail and remove from all arrays a device which has been
unplugged from the system. i.e. hot-unplug-support.
- "mdadm /dev/mdX --re-add missing" will look for any device
that looks like it should be a member of /dev/mdX but isn't
and will automatically --re-add it
- Now compile with -Wextra to get extra warnings.
- Lots of minor bug fixes, documentation improvements, etc.
This release is believed to be stable and you should feel free to
upgrade to 3.1.3
It is expected that the next release will be 3.2 with a number of new
features. 3.1.4 will only happen if important bugs show up before 3.2
is stable.
NeilBrown 6th August 2010

ANNOUNCE-3.1.4

@ -0,0 +1,37 @@
Subject: ANNOUNCE: mdadm 3.1.4 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.1.4
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This is a bugfix/stability release over 3.1.3.
3.1.3 had a couple of embarrassing regressions and a couple of other
issues surfaced which had easy fixes so I decided to make a 3.1.4
release after all.
Two fixes related to configs that aren't using udev:
- Don't remove md devices with 'standard' names on --stop
- Allow dev_open to work on read-only /dev
And fixed regressions:
- Allow --incremental to add spares to an array
- Accept --no-degraded as a deprecated option rather than
throwing an error
- Return correct success status when --incremental assembling
a container which does not yet have enough devices.
- Don't link mdadm with pthreads, only mdmon needs it.
- Fix compiler warning due to bad use of snprintf
- Fix spare migration
This release is believed to be stable and you should feel free to
upgrade to 3.1.4
It is expected that the next release will be 3.2 with a number of new
features.
NeilBrown 31st August 2010

ANNOUNCE-3.1.5

@ -0,0 +1,42 @@
Subject: ANNOUNCE: mdadm 3.1.5 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.1.5
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git?p=mdadm
This is a bugfix/stability release over 3.1.4. It contains all the
important bugfixes found while working on 3.2 and 3.2.1. It will be
the last 3.1.x release - 3.2.1 is expected to be released in a few days.
Changes include:
- Fixes for v1.x metadata on big-endian machines.
- man page improvements
- Improve '--detail --export' when run on partitions of an md array.
- Fix regression with removing 'failed' or 'detached' devices.
- Fixes for "--assemble --force" in various unusual cases.
- Allow '-Y' to mean --export. This was documented but not implemented.
- Various fixes for handling 'ddf' metadata. This is now more reliable
but could benefit from more interoperability testing.
- Correctly list subarrays of a container in "--detail" output.
- Improve checks on whether the requested number of devices is supported
by the metadata - both for --create and --grow.
- Don't remove partitions from a device that is being included in an
array until we are fully committed to including it.
- Allow "--assemble --update=no-bitmap" so an array with a corrupt
bitmap can still be assembled.
- Don't allow --add to succeed if it looks like a "--re-add" is probably
wanted, but cannot succeed. This avoids inadvertently turning
devices into spares when an array is failed.
This release is believed to be stable and you should feel free to
upgrade to 3.1.5
NeilBrown 23rd March 2011

ANNOUNCE-3.2

@ -0,0 +1,77 @@
Subject: ANNOUNCE: mdadm 3.2 - A tool for managing Soft RAID under Linux (DEVEL ONLY)
I am pleased to announce the availability of
mdadm version 3.2
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm devel-3.2
http://neil.brown.name/git?p=mdadm
This is a "Developers only" release. Please don't consider using it
or making it available to others without reading the following.
By far the most significant change in this release relates to the
management of reshaping arrays. This code has been substantially
re-written so that it can work with 'externally managed metadata' -
Intel's IMSM in particular. We now support level migration and
OnLine Capacity Expansion on these arrays.
However, while the code largely works it has not been tested
exhaustively so there are likely to be problems. As the reshape code
for native metadata arrays was changed as part of this rewrite these
problems could also result in regressions for reshape of native
metadata.
It is partly to encourage greater testing that this release is being
made. Any reports of problems - particularly reproducible recipes for
triggering the problems - will be gratefully received.
It is hoped that a "3.2.1" release will be available in early March
which will be a bugfix release over this and can be considered
suitable for general use.
Other changes of note:
- Policy framework.
Various policy statements can be made in the mdadm.conf to guide
the behaviour of mdadm, particularly with regard to how new devices
are treated by "mdadm -I".
Depending on the 'action' associated with a device (identified by
its 'path') such new devices can be automatically re-added to an
existing array that they previously fell out of, or automatically
added as a spare if they appear to contain no data.
- mdadm now has a limited understanding of partition tables. This
allows the policy framework to make decisions about partitioned
devices as well.
- --incremental --remove can be told what --path the device was on,
and this info will be recorded so that another device appearing at
the same physical location can be preferentially added to the same
array (provides the spare-same-slot action policy applied to the
path).
- A new "--invalid-backup" flag is available in --assemble
mode. This can be used to re-assemble an array which was stopped
in the middle of a reshape, and for which the 'backup file' is no
longer available or is corrupted. The array may have some
corruption in it at the point where reshape was up to, but at least
the rest of the array will become available.
- Various internal restructuring - more is needed.
Any feedback and bug reports are always welcomed at:
linux-raid@vger.kernel.org
And please: don't use this in production - particularly not the
--grow functionality.
NeilBrown 1st February 2011

ANNOUNCE-3.2.1

@ -0,0 +1,75 @@
I am pleased to announce the availability of
mdadm version 3.2.1
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git/mdadm
Many of the changes in this release are of internal interest only,
restructuring and refactoring code and so forth.
Most of the bugs found and fixed during development for 3.2.1 have been
back-ported for the recently-released 3.1.5 so this release primarily
provides a few new features over 3.1.5.
They include:
- policy framework
Policy can be expressed for moving spare devices between arrays, and
for how to handle hot-plugged devices. This policy can be different
for devices plugged in to different controllers etc.
This, for example, allows a configuration where when a device is plugged
in it is immediately included in an md array as a hot spare and
possibly starts recovery immediately if an array is degraded.
- some understanding of mbr and gpt partition tables
This is primarily to support the new hot-plug support. If a
device is plugged in and policy suggests it should have a partition table,
the partition table will be copied from a suitably similar device, and
then the partitions will hot-plug and can then be added to md arrays.
- "--incremental --remove" can remember where a device was removed from
so if a device gets plugged back in the same place, special policy applies
to it, allowing it to be included in an array even if a general hotplugged
device would not be.
- enhanced reshape options, including growing a RAID0 by converting to RAID4,
restriping, and converting back. Also conversions between RAID0 and
RAID10 and between RAID1 and RAID10 are possible (with a suitably recent
kernel).
- spare migration for IMSM arrays.
Spare migration can now work across 'containers' using non-native metadata
and specifically Intel's IMSM arrays support spare migrations.
- OLCE and level migration for Intel IMSM arrays.
OnLine Capacity Expansion and level migration (e.g. RAID0 -> RAID5) are
supported for Intel Matrix Storage Manager arrays.
This support is currently 'experimental' for technical reasons. It can
be enabled with "export MDADM_EXPERIMENTAL=1"
- avoid including wayward devices
If you split a RAID1, mount the two halves as two separate degraded RAID1s,
and then later bring the two back together, it is possible that the md
metadata won't properly show that one must over-ride the other.
mdadm now does extra checking to detect this possibility and avoid
potentially corrupting data.
- remove any possible confusion between similar options.
e.g. --brief and --bitmap were mapped to 'b' and mdadm wouldn't
notice if one was used where the other was expected.
- allow K,M,G suffixes on chunk sizes
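A sketch of how a chunk-size argument with an optional K, M or G suffix might be validated. It assumes, as the mdadm man page describes for `--chunk`, that a bare number is in kibibytes, and it applies the power-of-2 rule that the 3.2.2 announcement relaxes for RAID0 only. `parse_chunk_kib` is a hypothetical name, not mdadm's internal parser.

```python
import re

# Multipliers to KiB; a bare number is taken to be KiB already.
_SUFFIX_KIB = {"": 1, "K": 1, "M": 1024, "G": 1024 * 1024}


def parse_chunk_kib(text: str, level: int) -> int:
    """Parse a chunk size such as '512', '64K' or '1M' into KiB.

    Hypothetical helper: non-power-of-2 sizes are accepted only for
    RAID0 (level 0).
    """
    m = re.fullmatch(r"(\d+)([KMG]?)", text.strip())
    if m is None:
        raise ValueError(f"bad chunk size: {text!r}")
    kib = int(m.group(1)) * _SUFFIX_KIB[m.group(2)]
    if kib == 0:
        raise ValueError("chunk size must be non-zero")
    # A power of 2 has exactly one bit set, so kib & (kib - 1) == 0.
    if level != 0 and kib & (kib - 1):
        raise ValueError("chunk size must be a power of 2 unless level is 0")
    return kib
```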
While mdadm-3.2.1 is considered to be reasonably stable, you should
only use it if you want to try out the new features, or if you
generally like to be on the bleeding edge. If the new features are not
important to you, then 3.1.5 is probably the appropriate version to be using
until 3.2.2 comes out.
NeilBrown 28th March 2011

ANNOUNCE-3.2.2

@ -0,0 +1,36 @@
Subject: ANNOUNCE: mdadm 3.2.2 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.2.2
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git/mdadm
This release is largely a stabilising release for the 3.2 series.
Many of the changes just fix bugs introduced in 3.2 or 3.2.1.
There are some new features. They are:
- reshaping IMSM (Intel metadata) arrays is no longer 'experimental',
it should work properly and be largely compatible with IMSM drivers in
other platforms.
- --assume-clean can be used with --grow --size to avoid resyncing the
new part of the array. This is only supported with very new kernels.
- RAID0 arrays can have a chunksize which is not a power of 2. This has been
supported in the kernel for a while but is only now supported by
mdadm.
- A new tool 'raid6check' is available which can check a RAID6 array,
or part of it, and report which device is most inconsistent with the
others if any stripe is inconsistent. This is still under development
and does not have a man page yet. If anyone tries it out and has any
questions or experience to report, they would be most welcome on
linux-raid@vger.kernel.org.
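Naming the most inconsistent device is possible because RAID6 keeps two independent syndromes. The sketch below uses the standard construction from the Linux kernel's RAID6 code: P is the XOR of the data blocks and Q is the sum of g^i times block i in GF(2^8) with polynomial 0x11d and generator g = 2. If exactly one data block z is wrong by an error E, P changes by E and Q by g^z times E, so z is the difference of their discrete logarithms. This illustrates the technique only; it is not raid6check's source, and the function names are hypothetical.

```python
def _build_tables():
    """Exponent/log tables for GF(2^8) with polynomial 0x11d, generator 2."""
    exp = [0] * 512
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    for i in range(255, 512):  # duplicate so log sums need no modulo
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """P and Q for a stripe; data is a list of equal-length bytes objects."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = EXP[i]  # g**i
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate_bad_disk(data, p_stored, q_stored):
    """None if consistent, 'P' or 'Q' if a parity block alone is wrong,
    a data-disk index if one data block explains every mismatch,
    otherwise 'unknown'."""
    p_calc, q_calc = syndromes(data)
    suspects = set()
    for pc, ps, qc, qs in zip(p_calc, p_stored, q_calc, q_stored):
        dp, dq = pc ^ ps, qc ^ qs
        if dp == 0 and dq == 0:
            continue
        if dp == 0:
            suspects.add("Q")
        elif dq == 0:
            suspects.add("P")
        else:
            z = (LOG[dq] - LOG[dp]) % 255
            suspects.add(z if z < len(data) else "unknown")
    if not suspects:
        return None
    return suspects.pop() if len(suspects) == 1 else "unknown"
```

Returning "unknown" when byte positions disagree is a simplification; stripes with more than one bad block cannot be resolved this way.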
Future releases in the 3.2 series will only be made if bugfixes are needed.
The next release to add features is expected to be 3.3.
NeilBrown 17th June 2011

ANNOUNCE-3.2.3

@ -0,0 +1,24 @@
Subject: ANNOUNCE: mdadm 3.2.3 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.2.3
It is available at the usual places:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://neil.brown.name/mdadm
http://neil.brown.name/git/mdadm
This release is largely a bugfix release for the 3.2 series with many
minor fixes with little or no impact.
The largest single area of change is support for reshape of Intel
IMSM arrays (OnLine Capacity Expansion and Level Migration).
Among other fixes, this now has a better chance of surviving if a
device fails during reshape.
Upgrading is recommended - particularly if you use mdadm for IMSM
arrays - but not essential.
NeilBrown 23rd December 2011

ANNOUNCE-3.2.4

@ -0,0 +1,144 @@
Subject: ANNOUNCE: mdadm 3.2.4 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.2.4
It is available at the usual places, now including github:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://neil.brown.name/git/mdadm
This release is largely a bugfix release for the 3.2 series with many
minor fixes with little or no impact.
"--oneline" log of changes is below. Some notable ones are:
- --offroot argument to improve interactions between mdmon and initrd
- --prefer argument to select which /dev names to display in some
circumstances.
- relax restrictions on when "--add" will be allowed
- Fix bug with adding write-intent-bitmap to active array
- Now defaults to "/run/mdadm" for storing run-time files.
Upgrading is encouraged.
The next mdadm release is expected to be 3.3 with a number of new
features.
NeilBrown 9th May 2012
77b3ac8 monitor: make return from read_and_act more symbolic.
68226a8 monitor: ensure we retry soon when 'remove' fails.
8453f8d fix: Monitor sometimes crashes
90fa1a2 Work around gcc-4.7's strict aliasing checks
0c4304c fix: container creation with --incremental used.
5d1c7cd FIX: External metadata sometimes is not updated
3c20f98 FIX: mdmon check in reshape_container() can cause a problem
59ab9f5 FIX: Typo error in fprint command
9587c37 imsm: load_super_imsm_all function refactoring
ec50f7b imsm: load_imsm_super_all supports loading metadata from the device list
ca9de18 imsm: validate the number of imsm volumes per controller
30602f5 imsm: display fd in error trace when when store_imsm_mpb failes
eb155f6 mdmon: Use getopt_long() to parse command line options
08ca2ad Add --offroot argument to mdadm
da82751 Add --offroot argument to mdmon
a0963a8 Spawn mdmon with --offroot if mdadm was launched with --offroot
f878b24 imsm: fix, the second array need to have the whole available space on devices
d597705 getinfo_super1: Use MaxSector in place of sb->size
6ef8905 super1: make aread/awrite always use an aligned buffer.
de5a472 Remove avail_disks arg from 'enough'.
da8fe5a Assemble: fix --force assemble during reshape.
b10c663 config: fix handing of 'homehost' in AUTO line.
92d49ec FIX: NULL pointer to strdup() can be passed
d2bde6d imsm: FIX: No new missing disks are allowed during general migration
111e9fd FIX: Array is not run when expansion disks are added
bf5cf7c imsm: FIX: imsm_get_allowed_degradation() doesn't count degradation for raid1
50927b1 Fix: Sometimes mdmon throws core dump during reshape
78340e2 Flush mdmon before next reshape step during container operation
e174219 imsm: FIX: Chunk size migration problem
f93346e FIX: use md position to reshape restart
6a75c8c imsm: FIX: use md position to reshape restart
51d83f5 imsm: FIX: Clear migration record when migration switches to next volume.
e1dd332 FIX: restart reshape when reshape process is stopped just between 2 reshapes
1ca90aa FIX: Do not try to (continue) reshape using inactive array
9f1b0f0 config: conf_match should ignore devname when not set.
d669228 Use posix_memalign() for memory used to write bitmaps
178950e FIX: Changes in '0' case for reshape position verification
9200d41 avoid double-free upon "old buggy kernel" sysfs_read failure
4011421 Print error message if failing to write super for 1.x metadata
0011874 Use MDMON_DIR for pid files created in Monitor.c
56d1885 Assemble: don't use O_EXCL until we have checked device content.
b720636 Assemble: support assembling of a RAID0 being reshaped.
c69ffac Manage: allow --re-add to failed array.
52f07f5 Reset bad flag on map update
911cead super1: support superblocks up to 4K.
ad6db3c Create: reduce the verbosity of 'default_layout'.
b2bfdfa super1.c don't keep recalculating bitmap pointer
4122675 Define and use SUPER1_SIZE for allocations
1afa930 init_super1() memset full buffer allocated for superblock
2de0b8a match_metadata_desc1(): Use calloc instead of malloc+memset
3c0bcd4 Use 4K buffer alignment for superblock allocations
308340a Use struct align_fd to cache fd's block size for aligned reads/writes
65ed615 match_metadata_desc0(): Use calloc instead of malloc+memset
de89706 Generalize ROUND_UP() macro and introduce matching ROUND_UP_PTR()
0a2f189 super1.c: use ROUND_UP/ROUND_UP_PTR
654a381 super-intel.c: Use ROUND_UP() instead of manually coding it
42d5dfd __write_init_super_ddf(): Use posix_memalign() instead of static aligned buffer
d4633e0 Examine: fix array size calculation for RAID10.
e62b778 Assemble: improve verbose logging when including old devices.
0073a6e Remove possible crash during RAID6 -> RAID5 reshape.
69fe207 Incremental: fix adding devices with --incremental
bcbb311 Manage: replace 'return 1' with 'goto abort'.
9f58469 Manage: freeze recovery while adding multiple devices.
ae6c05a Create: round off size for RAID1 arrays.
5ca3a90 Grow: print useful error when converting RAID1->RAID5 will fail.
c07d640 Fix tests/05r1-re-add-nosupper
2d762ad Fix the new ROUND_UP macro.
fd324b0 sysfs: fixed sysfs_freeze_array array to work properly with Manage_subdevs.
5551b11 imsm: avoid overflows for disks over 1TB
97f81ee clear hi bits if not used after loading metadata from disk
e03640b simplify calculating array_blocks
29cd082 show 2TB volumes/disks support in --detail-platform
2cc699a check volume size in validate_geometry_imsm_orom
9126b9a check that no disk over 2TB is used to create container when no support
027c374 imsm: set 2tb disk attribute for spare
3556c2f Fix typo: wan -> want
15632a9 parse_size: distinguish between 0 and error.
fbdef49 Bitmap_offset is a signed number
508a7f1 super1: leave more space in front of data by default.
40110b9 Fix two typos in fprintf messages
342460c mdadm man page: fix typo
0e7f69a imsm: display maximum volumes per controller and array
36fd8cc imsm: FIX: Update function imsm_num_data_members() for Raid1/10
7abc987 imsm: FIX: Add volume size expand support to imsm_analyze_change()
f3871fd imsm: Add new metadata update for volume size expansion
54397ed imsm: Execute size change for external metatdata
016e00f FIX: Support metadata changes rollback
fbf3d20 imsm: FIX: Support metadata changes rollback
44f6f18 FIX: Extend size of raid0 array
7e7e9a4 FIX: Respect metadata size limitations
65a9798 FIX: Detect error and rollback metadata
13bcac9 imsm: Add function imsm_get_free_size()
b130333 imsm: Support setting max size for size change operation
c41e00b imsm: FIX: Component size alignment check
58d26a2 FIX: Size change is possible as standalone change only
4aecb54 FIX: Assembled second array is in read only state during reshape
ae2416e FIX: resolve make everything compilation error
480f356 Raid limit of 1024 when scanning for devices.
c2ecf5f Add --prefer option for --detail and --monitor
0a99975 Relax restrictions on when --add is permitted.
7ce0570 imsm: fix: rebuild does not continue after reboot
b51702b fix: correct extending size of raid0 array
34a1395 Fix sign extension of bitmap_offset in super1.c
012a864 Introduce sysfs_set_num_signed() and use it to set bitmap/offset
5d7b407 imsm: fix: thunderdome may drop 2tb attribute
5ffdc2d Update test for "is udev active".
96fd06e Adjust to new standard of /run
974e039 test: don't worry too much about array size.
b0a658f Grow: failing the set the per-device size is not an error.
36614e9 super-intel.c: Don't try to close negative fd
562aa10 super-intel.c: Fix resource leak from opendir()

ANNOUNCE-3.2.5

@ -0,0 +1,31 @@
Subject: ANNOUNCE: mdadm 3.2.5 - A tool for managing Soft RAID under Linux
I am somewhat disappointed to have to announce the availability of
mdadm version 3.2.5
It is available at the usual places, now including github:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://neil.brown.name/git/mdadm
This release primarily fixes a serious regression in 3.2.4.
This regression does *not* cause any risk to data. It simply
means that adding a device with "--add" would sometimes fail
when it should not.
The release also includes a couple of minor fixes, such as making
the "--layout=preserve" option to "--grow" work again.
A reminder that the default location for runtime files is now
"/run/mdadm". If you compile this for a distro that does not
have "/run", you will need to compile with an alternate setting for
MAP_DIR. e.g.
make MAP_DIR=/var/run/mdadm
or
make MAP_DIR=/dev/.mdadm
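MAP_DIR is a compile-time default. A common way to pass such a setting through a build is a preprocessor define with a fallback. The sketch below shows that approach. It assumes the Makefile turns "make MAP_DIR=..." into -DMAP_DIR="..." on the compiler command line. The helper runtime_path() and the file name "map" used below are hypothetical, and mdadm's own build may wire this differently.

```c
#include <stdio.h>
#include <string.h>

/* Fallback used when the build does not pass -DMAP_DIR=...; the value
 * mirrors the "/run/mdadm" default mentioned in the announcement. */
#ifndef MAP_DIR
#define MAP_DIR "/run/mdadm"
#endif

/* Build "<MAP_DIR>/<name>" in buf. Returns 0 on success, or -1 if name
 * contains a '/' or the result does not fit in len bytes. */
int runtime_path(char *buf, size_t len, const char *name)
{
	int n;

	if (strchr(name, '/') != NULL)
		return -1;	/* keep callers inside MAP_DIR */
	n = snprintf(buf, len, "%s/%s", MAP_DIR, name);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* encoding error or truncated output */
	return 0;
}
```

Under that assumption, building with "make MAP_DIR=/var/run/mdadm" would make runtime_path(buf, sizeof(buf), "map") yield "/var/run/mdadm/map" instead of "/run/mdadm/map".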
NeilBrown 18th May 2012

57
ANNOUNCE-3.2.6 Normal file

@ -0,0 +1,57 @@
Subject: ANNOUNCE: mdadm 3.2.6 - A tool for managing Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.2.6
It is available at the usual places, now including github:
countrycode=xx.
http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://neil.brown.name/git/mdadm
This is a stability release which adds a number of bugfixes to 3.2.5.
There are no real stand-out fixes, just lots of little bits and pieces.
Below is the "git log --oneline --reverse" list of changes since
3.2.5.
NeilBrown 25th October 2012
b7e05d2 udev-rules: prevent systemd from mount devices before they are ready.
0d478e2 mdadm: Fix Segmentation fault.
42f0ca1 imsm: fix: correct checking volume's degradation
fcf2195 Monitor: fix inconsistencies in values for ->percent
5f862fb Monitor: Report NewArray when an array the disappeared, reappears.
6f51b1c Monitor: fix reporting for Fail vs FailSpare etc.
68ad53b mdmon: fix arg parsing.
517f135 Assemble: don't leak memory with fdlist.
090900c udev-rules: prevent systemd from mount devices before they are ready.
446e000 sha1.h: remove ansidecl.h header inclusion
ec894f5 Manage: zero metadata before adding to 'external' array.
3a84db5 ddf: allow a non-spare to be used to recovery a missing device.
c5d61ca ddf: hack to fix container recognition.
23084aa mdmon: fix arg processing for -a
c4e96a3 mdmon: allow --takeover when original was started with --offroot
80841df find_free_devnum: avoid auto-using names in /etc/mdadm.conf
c5c56d6 mapfile: fix mapfile rebuild for containers
aec89f6 fix segfaults in Detail()
2117ad1 Fix 'enough' function for RAID10.
0bc300d Use --offroot flag when assembling md arrays via --incrmental
ac78f24 Grow: make warning about old metadata more explicit.
14026ab Replace sha1.h with slightly older version.
6f6809f Add zlib license to crc32.c
5267ba0 Handles spaces in array names better.
c51f288 imsm: allow --assume-clean to work.
acf7076 Grow: allow --grow --continue to work for native metadata.
335d2a6 Grow: fix a couple of typos with --assume-clean usage
9ff1427 Fix open_container
3713633 mdadm: super0: do not override uuid with homehost
31bff58 Trivial bugfix and spelling fixes.
e1e539f Detail: don't report a faulty device as 'spare' or 'rebuilding'.
22a6461 super0: allow creation of array on 2TB+ devices.
a5d47a2 Create new md devices consistently
eb48676 Monitor: don't complain about non-monitorable arrays in mdadm.conf
ecdf2d7 Query: don't be confused by partition tables.
f7b75c1 Query: allow member of non-0.90 arrays to be better reported.

63
ANNOUNCE-3.3 Normal file

@ -0,0 +1,63 @@
Subject: ANNOUNCE: mdadm 3.3 - A tool for managing md Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.3
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://git.neil.brown.name/git/mdadm
This is a major new release so don't be too surprised if there are a
few issues. If I hear about them they will be fixed in 3.3.1.
git log reports nearly 500 changes since 3.2.6 so I won't list them
all.
Some highlights are:
- Some array reshapes can proceed without needing a backup file.
This is done by changing the 'data_offset' so we never need to write
any data back over where it was before. If there is no "head space"
or "tail space" to allow data_offset to change, the old mechanism
with a backup file can still be used.
- RAID10 arrays can be reshaped to change the number of devices,
change the chunk size, or change the layout between 'near'
and 'offset'.
This will always change data_offset, and will fail if there is no
room for data_offset to be moved.
- "--assemble --update=metadata" can convert a 0.90 array to a 1.0 array.
- bad-block-logs are supported (but not heavily tested yet)
- "--assemble --update=revert-reshape" can be used to undo a reshape
that has just been started but isn't really wanted. This is very
new and while it passes basic tests it cannot be guaranteed.
- improved locking between --incremental and --assemble
- uses systemd to run "mdmon" if systemd is configured to do that.
- kernel names of md devices can be non-numeric. e.g. "md_home" rather than
"md0". This will probably confuse lots of other tools, so you need to
echo CREATE names=yes >> /etc/mdadm.conf
or the feature will not be used. (you also need a reasonably new kernel).
- "--stop" can be given a kernel name instead of a device name, e.g.
mdadm --stop md4
will work even if /dev/md4 doesn't exist.
- "--detail --export" has some information about the devices in the array
- --dump and --restore can be used to backup and restore the metadata on an
array.
- Hot-replace is supported with
mdadm /dev/mdX --replace /dev/foo
and
mdadm /dev/mdX --replace /dev/foo --with /dev/bar
- Config file can be a directory in which case all "*.conf" files are
read in lexical order.
Default is to read /etc/mdadm.conf and then /etc/mdadm.conf.d
Thus
echo CREATE names=yes > /etc/mdadm.conf.d/names.conf
will also enable the use of named md devices.
- Lots of improvements to DDF support including adding support for
RAID10 (thanks Martin Wilck).
and lots of bugfixes and other little changes.
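The configuration-directory rule from the list above says that all "*.conf" files are read in lexical order. That rule can be sketched with POSIX scandir(3) and alphasort(3). This is an illustration, not mdadm's implementation:

- The helper names are hypothetical.
- Names starting with '.' are skipped, on the assumption that "*.conf" behaves like a shell glob.
- alphasort() compares with strcoll(3), so the order equals plain byte order only in the C/POSIX locale.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if name matches "*.conf" glob-style (no leading dot). */
int is_conf_name(const char *name)
{
	const char *suffix = ".conf";
	size_t len = strlen(name);
	size_t slen = strlen(suffix);

	if (name[0] == '.' || len <= slen)
		return 0;
	return strcmp(name + len - slen, suffix) == 0;
}

/* scandir() filter callback wrapping is_conf_name(). */
static int conf_filter(const struct dirent *d)
{
	return is_conf_name(d->d_name);
}

/* Visit matching files in sorted order; here each path is just printed
 * where a real reader would parse the file. Returns the number of files,
 * or -1 if the directory cannot be read. */
int list_conf_dir(const char *dir)
{
	struct dirent **names;
	int n = scandir(dir, &names, conf_filter, alphasort);

	if (n < 0)
		return -1;
	for (int i = 0; i < n; i++) {
		printf("%s/%s\n", dir, names[i]->d_name);
		free(names[i]);
	}
	free(names);
	return n;
}
```

A reader following the documented default would first parse /etc/mdadm.conf. It would then call something like list_conf_dir("/etc/mdadm.conf.d"), with the printf() replaced by a parser.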
NeilBrown 3rd September 2013

23
ANNOUNCE-3.3.1 Normal file

@ -0,0 +1,23 @@
Subject: ANNOUNCE: mdadm 3.3.1 - A tool for managing md Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.3.1
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://git.neil.brown.name/git/mdadm.git
The main changes are:
- lots of work on "DDF" support. Hopefully it will be more stable
now. Bug reports are always welcome.
- improved interactions with 'systemd'. Where possible, background
tasks are run from systemd (if it is present) rather than forking and
disassociating from the session. This is important because udev
doesn't really let you disassociate.
There are also a number of other little bug fixes.
NeilBrown 5th June 2014

16
ANNOUNCE-3.3.2 Normal file

@ -0,0 +1,16 @@
Subject: ANNOUNCE: mdadm 3.3.2 - A tool for managing md Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.3.2
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://git.neil.brown.name/git/mdadm.git
Changes since 3.3.1 are mostly little bugfixes and some man-page
updates.
NeilBrown 21st August 2014

18
ANNOUNCE-3.3.3 Normal file

@ -0,0 +1,18 @@
Subject: ANNOUNCE: mdadm 3.3.3 - A tool for managing md Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.3.3
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://git.neil.brown.name/git/mdadm.git
The 100 changes since 3.3.2 are mostly little bugfixes and some improvements
to the selftests.
raid6check now handles all RAID6 layouts, including DDF, correctly.
See git log for the rest.
NeilBrown 24th July 2015

37
ANNOUNCE-3.3.4 Normal file

@ -0,0 +1,37 @@
Subject: ANNOUNCE: mdadm 3.3.4 - A tool for managing md Soft RAID under Linux
I am somewhat disappointed to have to announce the availability of
mdadm version 3.3.4
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://git.neil.brown.name/git/mdadm.git
In mdadm-3.3 a change was made to how IMSM (Intel Matrix Storage
Manager) metadata was handled. Previously an IMSM array would only
be assembled if it was attached to an IMSM controller.
In 3.3 this was relaxed as there are circumstances where the
controller is not properly detected. Unfortunately this has negative
consequences which have only just come to light.
If you have an IMSM RAID1 configured and then disable RAID in the
BIOS, the metadata will remain on the devices. If you then install
some other OS on one device and then install Linux on the other, Linux
might eventually start noticing the IMSM metadata (depending a bit on whether
mdadm is included in the initramfs) and might start up the RAID1. This could
copy one device over the other, thus trashing one of the installations.
Not good.
So with this release IMSM arrays will only be assembled if attached to
an IMSM controller, or if "--force" is given to --assemble, or if the
environment variable IMSM_NO_PLATFORM is set (used primarily for
testing).
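The new assembly rule can be summarised as a single predicate. The sketch below is hypothetical and is not mdadm's own check:

- The controller test and the --force flag are passed in as plain integers.
- The environment check treats IMSM_NO_PLATFORM as set whenever it is present. This may be looser than what mdadm actually accepts.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Hypothetical summary of the 3.3.4 rule: assemble an IMSM array only if
 * it is attached to an IMSM controller, if --force was given, or if
 * IMSM_NO_PLATFORM is present in the environment (mainly for testing). */
int imsm_may_assemble(int on_imsm_controller, int force)
{
	if (on_imsm_controller)
		return 1;
	if (force)
		return 1;
	/* Assumption: any value counts as "set"; mdadm itself may be stricter. */
	return getenv("IMSM_NO_PLATFORM") != NULL;
}
```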
I strongly recommend upgrading to 3.3.4 if you are using 3.3 or later.
NeilBrown 3rd August 2015.

24
ANNOUNCE-3.4 Normal file

@ -0,0 +1,24 @@
Subject: ANNOUNCE: mdadm 3.4 - A tool for managing md Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 3.4
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://github.com/neilbrown/mdadm
git://neil.brown.name/mdadm
http://git.neil.brown.name/git/mdadm
The new second-level version number reflects significant new
functionality, in particular support for journalled RAID5/6 and clustered
RAID1. This new support is probably still buggy. Please report bugs.
There are also a number of fixes for Intel's IMSM metadata support,
and an assortment of minor bug fixes.
I plan for this to be the last release of mdadm that I provide as I am
retiring from MD and mdadm maintenance. Jes Sorensen has volunteered
to oversee mdadm for the next while. Thanks Jes!
NeilBrown 28th January 2016

22
ANNOUNCE-4.0 Normal file

@ -0,0 +1,22 @@
Subject: ANNOUNCE: mdadm 4.0 - A tool for managing md Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 4.0
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
http://git.kernel.org/cgit/utils/mdadm/
The update in major version number primarily indicates this is a
release by its new maintainer. In addition, it contains a large number
of fixes, in particular for IMSM RAID and clustered RAID support. It
also adds support for IMSM 4k sector drives,
failfast and better documentation for journaled RAID.
This is my first release of mdadm. Please thank Neil Brown for his
previous work as maintainer and blame me for all the bugs I caused
since taking over.
Jes Sorensen, 2017-01-09

16
ANNOUNCE-4.1 Normal file

@ -0,0 +1,16 @@
Subject: ANNOUNCE: mdadm 4.1 - A tool for managing md Soft RAID under Linux
I am pleased to announce the availability of
mdadm version 4.1
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
http://git.kernel.org/cgit/utils/mdadm/
The update constitutes more than one year of enhancements and bug fixes
including for IMSM RAID, Partial Parity Log, clustered RAID support,
improved testing, and gcc-8 support.
Jes Sorensen, 2018-10-01

19
ANNOUNCE-4.2 Normal file

@ -0,0 +1,19 @@
Subject: ANNOUNCE: mdadm 4.2 - A tool for managing md Soft RAID under Linux
I am pleased to finally announce the availability of mdadm-4.2.
get 4.2 out the door soon.
It is available at the usual places:
http://www.kernel.org/pub/linux/utils/raid/mdadm/
and via git at
git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
http://git.kernel.org/cgit/utils/mdadm/
The release includes more than two years of development and bugfixes,
so it is difficult to remember everything. Highlights include
enhancements and bug fixes including for IMSM RAID, Partial Parity
Log, clustered RAID support, improved testing, and gcc-9 support.
Thank you everyone who contributed to this release!
Jes Sorensen, 2021-12-30

2227
Assemble.c Normal file

File diff suppressed because it is too large

227
Build.c Normal file

@ -0,0 +1,227 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2001-2009 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
#include "mdadm.h"
int Build(char *mddev, struct mddev_dev *devlist,
struct shape *s, struct context *c)
{
/* Build a linear or raid0 array without superblocks
* We cannot really do any checks, we just do it.
* For md_version < 0.90.0, we call REGISTER_DEV
* with the device numbers, and then
* START_MD giving the "geometry"
* geometry is 0xpp00cc
* where pp is personality: 1==linear, 2=raid0
* cc = chunk size factor: 0==4k, 1==8k etc.
*/
int i;
dev_t rdev;
int subdevs = 0, missing_disks = 0;
struct mddev_dev *dv;
int bitmap_fd;
unsigned long long bitmapsize;
int mdfd;
char chosen_name[1024];
int uuid[4] = {0,0,0,0};
struct map_ent *map = NULL;
mdu_array_info_t array;
mdu_param_t param; /* not used by syscall */
if (s->level == UnSet) {
pr_err("a RAID level is needed to Build an array.\n");
return 1;
}
/* scan all devices, make sure they really are block devices */
for (dv = devlist; dv; dv=dv->next) {
subdevs++;
if (strcmp("missing", dv->devname) == 0) {
missing_disks++;
continue;
}
if (!stat_is_blkdev(dv->devname, NULL))
return 1;
}
if (s->raiddisks != subdevs) {
pr_err("requested %d devices in array but listed %d\n",
s->raiddisks, subdevs);
return 1;
}
if (s->layout == UnSet)
switch(s->level) {
default: /* no layout */
s->layout = 0;
break;
case 10:
s->layout = 0x102; /* near=2, far=1 */
if (c->verbose > 0)
pr_err("layout defaults to n2\n");
break;
case 5:
case 6:
s->layout = map_name(r5layout, "default");
if (c->verbose > 0)
pr_err("layout defaults to %s\n", map_num(r5layout, s->layout));
break;
case LEVEL_FAULTY:
s->layout = map_name(faultylayout, "default");
if (c->verbose > 0)
pr_err("layout defaults to %s\n", map_num(faultylayout, s->layout));
break;
}
/* We need to create the device. It can have no name. */
map_lock(&map);
mdfd = create_mddev(mddev, NULL, c->autof, LOCAL,
chosen_name, 0);
if (mdfd < 0) {
map_unlock(&map);
return 1;
}
mddev = chosen_name;
map_update(&map, fd2devnm(mdfd), "none", uuid, chosen_name);
map_unlock(&map);
array.level = s->level;
if (s->size == MAX_SIZE)
s->size = 0;
array.size = s->size;
array.nr_disks = s->raiddisks;
array.raid_disks = s->raiddisks;
array.md_minor = 0;
if (fstat_is_blkdev(mdfd, mddev, &rdev))
array.md_minor = minor(rdev);
array.not_persistent = 1;
array.state = 0; /* not clean, but no errors */
if (s->assume_clean)
array.state |= 1;
array.active_disks = s->raiddisks - missing_disks;
array.working_disks = s->raiddisks - missing_disks;
array.spare_disks = 0;
array.failed_disks = missing_disks;
if (s->chunk == 0 && (s->level==0 || s->level==LEVEL_LINEAR))
s->chunk = 64;
array.chunk_size = s->chunk*1024;
array.layout = s->layout;
if (md_set_array_info(mdfd, &array)) {
pr_err("md_set_array_info() failed for %s: %s\n",
mddev, strerror(errno));
goto abort;
}
if (s->bitmap_file && strcmp(s->bitmap_file, "none") == 0)
s->bitmap_file = NULL;
if (s->bitmap_file && s->level <= 0) {
pr_err("bitmaps not meaningful with level %s\n",
map_num(pers, s->level)?:"given");
goto abort;
}
/* now add the devices */
for ((i=0), (dv = devlist) ; dv ; i++, dv=dv->next) {
mdu_disk_info_t disk;
unsigned long long dsize;
int fd;
if (strcmp("missing", dv->devname) == 0)
continue;
if (!stat_is_blkdev(dv->devname, &rdev))
goto abort;
fd = open(dv->devname, O_RDONLY|O_EXCL);
if (fd < 0) {
pr_err("Cannot open %s: %s\n",
dv->devname, strerror(errno));
goto abort;
}
if (get_dev_size(fd, NULL, &dsize) &&
(s->size == 0 || s->size == MAX_SIZE || dsize < s->size))
s->size = dsize;
close(fd);
disk.number = i;
disk.raid_disk = i;
disk.state = (1<<MD_DISK_SYNC) | (1<<MD_DISK_ACTIVE);
if (dv->writemostly == FlagSet)
disk.state |= 1<<MD_DISK_WRITEMOSTLY;
disk.major = major(rdev);
disk.minor = minor(rdev);
if (ioctl(mdfd, ADD_NEW_DISK, &disk)) {
pr_err("ADD_NEW_DISK failed for %s: %s\n",
dv->devname, strerror(errno));
goto abort;
}
}
/* now to start it */
if (s->bitmap_file) {
bitmap_fd = open(s->bitmap_file, O_RDWR);
if (bitmap_fd < 0) {
int major = BITMAP_MAJOR_HI;
#if 0
if (s->bitmap_chunk == UnSet) {
pr_err("%s cannot be opened.\n", s->bitmap_file);
goto abort;
}
#endif
bitmapsize = s->size >> 9; /* FIXME wrong for RAID10 */
if (CreateBitmap(s->bitmap_file, 1, NULL,
s->bitmap_chunk, c->delay,
s->write_behind, bitmapsize, major)) {
goto abort;
}
bitmap_fd = open(s->bitmap_file, O_RDWR);
if (bitmap_fd < 0) {
pr_err("%s cannot be opened.\n", s->bitmap_file);
goto abort;
}
}
if (bitmap_fd >= 0) {
if (ioctl(mdfd, SET_BITMAP_FILE, bitmap_fd) < 0) {
pr_err("Cannot set bitmap file for %s: %s\n",
mddev, strerror(errno));
goto abort;
}
}
}
if (ioctl(mdfd, RUN_ARRAY, &param)) {
pr_err("RUN_ARRAY failed: %s\n", strerror(errno));
if (s->chunk & (s->chunk - 1)) {
cont_err("Problem may be that chunk size is not a power of 2\n");
}
goto abort;
}
if (c->verbose >= 0)
pr_err("array %s built and started.\n",
mddev);
wait_for(mddev, mdfd);
close(mdfd);
return 0;
abort:
ioctl(mdfd, STOP_ARRAY, 0);
close(mdfd);
return 1;
}
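Build() relies on two packed integers:

- Its header comment describes the legacy START_MD geometry word as 0xpp00cc. Here pp is the personality and cc is a chunk-size factor: 0 means 4 KiB, 1 means 8 KiB, and so on, i.e. 4 KiB << cc.
- Its RAID10 default, 0x102, is commented as near=2, far=1.

The sketch below encodes and decodes both, under stated assumptions:

- The geometry helpers take the comment literally: pp in bits 16-23, cc in the low byte.
- The RAID10 decoder assumes the Linux md driver's packing: near copies in bits 0-7, far copies in bits 8-15 and an offset flag in bit 16. Any newer bits are ignored.

All helper names are hypothetical; this is not code from mdadm.

```c
#include <stdio.h>

/* Legacy START_MD geometry word, 0xpp00cc, as described in Build(). */
enum { GEOM_LINEAR = 1, GEOM_RAID0 = 2 };	/* pp values from the comment */

/* Chunk size is 4 KiB << cc, so chunk_kib must be a power of two >= 4.
 * Returns the packed word, or -1 if chunk_kib cannot be encoded. */
long encode_geometry(int pers, unsigned int chunk_kib)
{
	unsigned int cc = 0;

	if (chunk_kib < 4 || (chunk_kib & (chunk_kib - 1)))
		return -1;	/* same power-of-two test Build() uses */
	while ((4u << cc) < chunk_kib)
		cc++;
	return ((long)(pers & 0xff) << 16) | (long)cc;
}

/* Decode the word; a factor too large to shift yields chunk size 0. */
void decode_geometry(long geom, int *pers, unsigned int *chunk_kib)
{
	unsigned int cc = (unsigned int)(geom & 0xff);

	*pers = (int)((geom >> 16) & 0xff);
	*chunk_kib = cc < 30 ? 4u << cc : 0;
}

/* RAID10 layout word, e.g. the 0x102 default set in Build(). */
struct r10_layout {
	int near_copies;	/* bits 0-7 */
	int far_copies;		/* bits 8-15 */
	int offset;		/* bit 16: far copies use the "offset" variant */
};

struct r10_layout r10_decode(int layout)
{
	struct r10_layout l;

	l.near_copies = layout & 0xff;
	l.far_copies = (layout >> 8) & 0xff;
	l.offset = (layout >> 16) & 1;
	return l;
}

/* Write an n2/f2/o2 style name; assumes only one of near/far exceeds 1. */
int r10_name(int layout, char *buf, size_t len)
{
	struct r10_layout l = r10_decode(layout);
	char kind = 'n';
	int copies = l.near_copies;

	if (l.far_copies > 1) {
		kind = l.offset ? 'o' : 'f';
		copies = l.far_copies;
	}
	return snprintf(buf, len, "%c%d", kind, copies);
}
```

Under these assumptions:

- encode_geometry(GEOM_RAID0, 64) gives 0x20004.
- A 48 KiB chunk is rejected by the same power-of-two test behind Build()'s RUN_ARRAY hint.
- r10_name(0x102, ...) produces "n2".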

339
COPYING Normal file

@ -0,0 +1,339 @@
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License.

306
ChangeLog Normal file

@ -0,0 +1,306 @@
Please see git logs for detailed change log.
This file just contains highlights.
Changes Prior to release 3.3
- Some array reshapes can proceed without needing backup file.
This is done by changing the 'data_offset' so we never need to write
any data back over where it was before. If there is no "head space"
or "tail space" to allow data_offset to change, the old mechanism
with a backup file can still be used.
- RAID10 arrays can be reshaped to change the number of devices,
change the chunk size, or change the layout between 'near'
and 'offset'.
This will always change data_offset, and will fail if there is no
room for data_offset to be moved.
- "--assemble --update=metadata" can convert a 0.90 array to a 1.0 array.
- bad-block-logs are supported (but not heavily tested yet)
- "--assemble --update=revert-reshape" can be used to undo a reshape
that has just been started but isn't really wanted. This is very
new and while it passes basic tests it cannot be guaranteed.
- improved locking between --incremental and --assemble
- uses systemd to run "mdmon" if systemd is configured to do that.
- kernel names of md devices can be non-numeric. e.g. "md_home" rather than
"md0". This will probably confuse lots of other tools, so you need to
echo CREATE names=yes >> /etc/mdadm.conf
or the feature will not be used. (you also need a reasonably new kernel).
- "--stop" can be given a kernel name instead of a device name, i.e.
mdadm --stop md4
will work even if /dev/md4 doesn't exist.
- "--detail --export" has some information about the devices in the array
- --dump and --restore can be used to backup and restore the metadata on an
array.
- Hot-replace is supported with
mdadm /dev/mdX --replace /dev/foo
and
mdadm /dev/mdX --replace /dev/foo --with /dev/bar
- Config file can be a directory in which case all "*.conf" files are
read in lexical order.
Default is to read /etc/mdadm.conf and then /etc/mdadm.conf.d
Thus
echo CREATE names=yes > /etc/mdadm.conf.d/names.conf
will also enable the use of named md devices.
- Lots of improvements to DDF support including adding support for
RAID10 (thanks Martin Wilck).
Changes Prior to release 3.2.6
- There are no real stand-out fixes, just lots of little bits and pieces.
Changes Prior to release 3.2.5
- This release primarily fixes a serious regression in 3.2.4.
This regression does *not* cause any risk to data. It simply
means that adding a device with "--add" would sometimes fail
when it should not.
- The fix also includes a couple of minor fixes such as making
the "--layout=preserve" option to "--grow" work again.
Changes Prior to release 3.2.4
"--oneline" log of changes is below. Some notable ones are:
- --offroot argument to improve interactions between mdmon and initrd
- --prefer argument to select which /dev names to display in some
circumstances.
- relax restrictions on when "--add" will be allowed
- Fix bug with adding write-intent-bitmap to active array
- Now defaults to "/run/mdadm" for storing run-time files.
Changes Prior to release 3.2.3
- The largest single area of change is support for reshape of Intel
IMSM arrays (OnLine Capacity Expansion and Level Migration).
- Among other fixes, this now has a better chance of surviving if a
device fails during reshape.
Changes Prior to release 3.2.2
- reshaping IMSM (Intel metadata) arrays is no longer 'experimental',
it should work properly and be largely compatible with IMSM drivers in
other platforms.
- --assume-clean can be used with --grow --size to avoid resyncing the
new part of the array. This is only supported with very new kernels.
- RAID0 arrays can have chunksize which is not a power of 2. This has been
supported in the kernel for a while but is only now supported by
mdadm.
- A new tool 'raid6check' is available which can check a RAID6 array,
or part of it, and report which device is most inconsistent with the
others if any stripe is inconsistent. This is still under development
and does not have a man page yet. If anyone tries it out and has any
questions or experience to report, they would be most welcome on
linux-raid@vger.kernel.org.
Changes Prior to release 3.2.1
- policy framework
Policy can be expressed for moving spare devices between arrays, and
for how to handle hot-plugged devices. This policy can be different
for devices plugged in to different controllers etc.
This, for example, allows a configuration where when a device is plugged
in it is immediately included in an md array as a hot spare and
possibly starts recovery immediately if an array is degraded.
- some understanding of mbr and gpt partition tables
This is primarily to support the new hot-plug support. If a
device is plugged in and policy suggests it should have a partition table,
the partition table will be copied from a suitably similar device, and
then the partitions will hot-plug and can then be added to md arrays.
- "--incremental --remove" can remember where a device was removed from
so if a device gets plugged back in the same place, special policy applies
to it, allowing it to be included in an array even if a general hotplug
would not be.
- enhanced reshape options, including growing a RAID0 by converting to RAID4,
restriping, and converting back. Also conversions between RAID0 and
RAID10 and between RAID1 and RAID10 are possible (with a suitably recent
kernel).
- spare migration for IMSM arrays.
Spare migration can now work across 'containers' using non-native metadata
and specifically Intel's IMSM arrays support spare migrations.
- OLCE and level migration for Intel IMSM arrays.
OnLine Capacity Expansion and level migration (e.g. RAID0 -> RAID5) are
supported for Intel Matrix Storage Manager arrays.
This support is currently 'experimental' for technical reasons. It can
be enabled with "export MDADM_EXPERIMENTAL=1"
- avoid including wayward devices
If you split a RAID1, mount the two halves as two separate degraded RAID1s,
and then later bring the two back together, it is possible that the md
metadata won't properly show that one must over-ride the other.
mdadm now does extra checking to detect this possibility and avoid
potentially corrupting data.
- remove any possible confusion between similar options.
e.g. --brief and --bitmap were mapped to 'b' and mdadm wouldn't
notice if one was used where the other was expected.
- allow K,M,G suffixes on chunk sizes
Changes Prior to release 3.2
- By far the most significant change in this release relates to the
management of reshaping arrays. This code has been substantially
re-written so that it can work with 'externally managed metadata' -
Intel's IMSM in particular. We now support level migration and
OnLine Capacity Expansion on these arrays.
- Policy framework.
Various policy statements can be made in the mdadm.conf to guide
the behaviour of mdadm, particularly with regard to how new devices
are treated by "mdadm -I".
Depending on the 'action' associated with a device (identified by
its 'path') such new devices can be automatically re-added to an
existing array that they previously fell out of, or automatically
added as a spare if they appear to contain no data.
- mdadm now has a limited understanding of partition tables. This
allows the policy framework to make decisions about partitioned
devices as well.
- --incremental --remove can be told what --path the device was on,
and this info will be recorded so that another device appearing at
the same physical location can be preferentially added to the same
array (provides the spare-same-slot action policy applied to the
path).
- A new "--invalid-backup" flag is available in --assemble
mode. This can be used to re-assemble an array which was stopped
in the middle of a reshape, and for which the 'backup file' is no
longer available or is corrupted. The array may have some
corruption in it at the point where reshape was up to, but at least
the rest of the array will become available.
- Various internal restructuring - more is needed.
Changes Prior to release 3.1.5
- Fixes for v1.x metadata on big-endian machines.
- man page improvements
- Improve '--detail --export' when run on partitions of an md array.
- Fix regression with removing 'failed' or 'detached' devices.
- Fixes for "--assemble --force" in various unusual cases.
- Allow '-Y' to mean --export. This was documented but not implemented.
- Various fixes for handling 'ddf' metadata. This is now more reliable
but could benefit from more interoperability testing.
- Correctly list subarrays of a container in "--detail" output.
- Improve checks on whether the requested number of devices is supported
by the metadata - both for --create and --grow.
- Don't remove partitions from a device that is being included in an
array until we are fully committed to including it.
- Allow "--assemble --update=no-bitmap" so an array with a corrupt
bitmap can still be assembled.
- Don't allow --add to succeed if it looks like a "--re-add" is probably
wanted, but cannot succeed. This avoids inadvertently turning
devices into spares when an array is failed.
Changes Prior to release 3.1.4
Two fixes related to configs that aren't using udev:
- Don't remove md devices with 'standard' names on --stop
- Allow dev_open to work on read-only /dev
And fixed regressions:
- Allow --incremental to add spares to an array
- Accept --no-degraded as a deprecated option rather than
throwing an error
- Return correct success status when --incremental assembling
a container which does not yet have enough devices.
- Don't link mdadm with pthreads, only mdmon needs it.
- Fix compiler warning due to bad use of snprintf
Changes Prior to release 3.1.3
- mapfile now lives in a fixed location which defaults to
/dev/.mdadm/map but can be changed at compile time. This
location is chosen because most distros provide it during early
boot and preserve it thereafter. As long as /dev exists and is
writable, /dev/.mdadm will be created.
Other files for communication with mdmon live here too.
This fixes a bug reported by Debian and Gentoo users where
udev would spin in early-boot.
- IMSM and DDF metadata will not be recognised on partitions
as they should only be used on whole-disks.
- Various overflows caused by 2G drives have been addressed.
- A subarray of an IMSM container can now be killed with
--kill-subarray. Also subarrays can be renamed with
--update-subarray
- -If (or --incremental --fail) can be used from udev to
fail and remove from all arrays a device which has been
unplugged from the system. i.e. hot-unplug-support.
- "mdadm /dev/mdX --re-add missing" will look for any device
that looks like it should be a member of /dev/mdX but isn't
and will automatically --re-add it
- Now compile with -Wextra to get extra warnings.
- Lots of minor bug fixes, documentation improvements, etc.
Changes Prior to release 3.1.2
- The default metadata has changed again (sorry about that).
It is now v1.2 and will hopefully stay that way. It turned
out there were boot-block issues with v1.1 which make it
unsuitable for a default, though in many cases it is still
suitable to use.
- Stopping a container is not permitted when members are still
active
- Add 'homehost' to the valid words for the "AUTO" config file
line. When followed by "-all", this causes mdadm to
auto-assemble any array belonging to this host, but not
auto-assemble anything else.
- Fix some bugs with "--grow --chunksize=" for changing chunksize.
- VAR_RUN can be easily changed at compile time just like ALT_RUN.
This gives distros more flexibility in how to manage the
pid and sock files that mdmon needs.
- Various mdmon fixes
- Always make bitmap 4K-aligned if at all possible.
- If mdadm.conf lists arrays which have inter-dependencies,
they previously had to be listed in the "right" order. Now
any order should work.
- Fix --force assembly of v1.x arrays which are in the process
of recovering.
- Add section on 'scrubbing' to 'md' man page.
- Various command-line-option parsing improvements.
- ... and lots of other bug fixes.
Changes Prior to release 3.1.1
- Multiple fixes for new --grow levels including fixes for
serious data corruption problems.
- Change default metadata to v1.1
- Change default chunk size to 512K
- Change default bitmap chunk size to 64Meg
- When --re-add is used, don't fall back to
--add if --re-add fails as this can destroy data.
Changes Prior to release 3.1
- Support --grow to change the layout of RAID4/5/6
- Support --grow to change the chunksize of raid 4/5/6
- Support --grow to change level from RAID1 -> RAID5 -> RAID6 and
back.
- Support --grow to reduce the number of devices in RAID4/5/6.
- Support restart of these grow options when assembling an array
which is partially grown.
- Assorted tests of this code, and of different RAID6 layouts.
Changes Prior to release 3.0.3
- Improvements for creating arrays giving just a name, like 'foo',
rather than the full '/dev/md/foo'.
- Improvements for assembling member arrays of containers.
- Improvements to test suite
- Add option to change increment for RebuildNN messages reported
by "mdadm --monitor"
- Improvements to mdmon 'hand-over' from initrd to final root.
- Handle merging of devices that have left an IMSM array and are
being re-incorporated.
- Add missing space in "--detail --brief" output.
Changes Prior to release 3.0.2
- Fix crash when homehost is not set, as often happens in
early boot.
Changes Prior to release 3.0.1
- Fix various segfaults
- Fixes for --examine with containers
- Lots of other little fixes.
Changes Prior to release 3.0
- Support for externally managed metadata, specifically DDF and IMSM.
- Depend on udev to create entries in /dev, rather than creating them
ourselves.
- remove --auto-update-home-hosts
- new config file line "auto"
- new "<ignore>" and "any" options for "homehost"
- numerous bug fixes and minor enhancements.

1118
Create.c Normal file

File diff suppressed because it is too large

879
Detail.c Normal file

@ -0,0 +1,879 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2001-2013 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
#include "mdadm.h"
#include "md_p.h"
#include "md_u.h"
#include <ctype.h>
#include <dirent.h>
static int cmpstringp(const void *p1, const void *p2)
{
return strcmp(* (char * const *) p1, * (char * const *) p2);
}
static int add_device(const char *dev, char ***p_devices,
int *p_max_devices, int n_devices)
{
if (n_devices + 1 >= *p_max_devices) {
*p_max_devices += 16;
*p_devices = xrealloc(*p_devices, *p_max_devices *
sizeof(**p_devices));
if (!*p_devices) {
*p_max_devices = 0;
return 0;
}
};
(*p_devices)[n_devices] = xstrdup(dev);
return n_devices + 1;
}
int Detail(char *dev, struct context *c)
{
/*
* Print out details for an md array
*/
int fd = open(dev, O_RDONLY);
mdu_array_info_t array;
mdu_disk_info_t *disks = NULL;
int next;
int d;
time_t atime;
char *str;
char **devices = NULL;
int max_devices = 0, n_devices = 0;
int spares = 0;
struct stat stb;
int failed = 0;
struct supertype *st = NULL;
char *subarray = NULL;
int max_disks = MD_SB_DISKS; /* just a default */
struct mdinfo *info = NULL;
struct mdinfo *sra = NULL;
struct mdinfo *subdev;
char *member = NULL;
char *container = NULL;
int rv = c->test ? 4 : 1;
int avail_disks = 0;
char *avail = NULL;
int external;
int inactive;
int is_container = 0;
char *arrayst;
if (fd < 0) {
pr_err("cannot open %s: %s\n",
dev, strerror(errno));
return rv;
}
sra = sysfs_read(fd, NULL, GET_VERSION | GET_DEVS |
GET_ARRAY_STATE | GET_STATE);
if (!sra) {
if (md_get_array_info(fd, &array)) {
pr_err("%s does not appear to be an md device\n", dev);
goto out;
}
}
external = (sra != NULL && sra->array.major_version == -1 &&
sra->array.minor_version == -2);
inactive = (sra != NULL && !md_array_is_active(sra));
st = super_by_fd(fd, &subarray);
if (md_get_array_info(fd, &array)) {
if (errno == ENODEV) {
if (sra->array.major_version == -1 &&
sra->array.minor_version == -1 &&
sra->devs == NULL) {
pr_err("Array associated with md device %s does not exist.\n",
dev);
goto out;
}
array = sra->array;
} else {
pr_err("cannot get array detail for %s: %s\n",
dev, strerror(errno));
goto out;
}
}
if (array.raid_disks == 0 && external)
is_container = 1;
if (fstat(fd, &stb) != 0 || !S_ISBLK(stb.st_mode))
stb.st_rdev = 0;
rv = 0;
if (st)
max_disks = st->max_devs;
if (subarray) {
/* This is a subarray of some container.
* We want the name of the container, and the member
*/
dev_t devid = devnm2devid(st->container_devnm);
int cfd, err;
member = subarray;
container = map_dev_preferred(major(devid), minor(devid),
1, c->prefer);
cfd = open_dev(st->container_devnm);
if (cfd >= 0) {
err = st->ss->load_container(st, cfd, NULL);
close(cfd);
if (err == 0)
info = st->ss->container_content(st, subarray);
}
}
/* try to load a superblock. Try sra->devs first, then try ioctl */
if (st && !info)
for (d = 0, subdev = sra ? sra->devs : NULL;
d < max_disks || subdev;
subdev ? (void)(subdev = subdev->next) : (void)(d++)){
mdu_disk_info_t disk;
char *dv;
int fd2;
int err;
if (subdev)
disk = subdev->disk;
else {
disk.number = d;
if (md_get_disk_info(fd, &disk) < 0)
continue;
if (d >= array.raid_disks &&
disk.major == 0 && disk.minor == 0)
continue;
}
if (array.raid_disks > 0 &&
(disk.state & (1 << MD_DISK_ACTIVE)) == 0)
continue;
dv = map_dev(disk.major, disk.minor, 1);
if (!dv)
continue;
fd2 = dev_open(dv, O_RDONLY);
if (fd2 < 0)
continue;
if (st->sb)
st->ss->free_super(st);
err = st->ss->load_super(st, fd2, NULL);
close(fd2);
if (err)
continue;
if (info)
free(info);
if (subarray)
info = st->ss->container_content(st, subarray);
else {
info = xmalloc(sizeof(*info));
st->ss->getinfo_super(st, info, NULL);
}
if (!info)
continue;
if (array.raid_disks != 0 && /* container */
(info->array.ctime != array.ctime ||
info->array.level != array.level)) {
st->ss->free_super(st);
continue;
}
/* some formats (imsm) have free-floating-spares
* with a uuid of uuid_zero, they don't
* have very good info about the rest of the
* container, so keep searching when
* encountering such a device. Otherwise, stop
* after the first successful call to
* ->load_super.
*/
if (memcmp(uuid_zero,
info->uuid,
sizeof(uuid_zero)) == 0) {
st->ss->free_super(st);
continue;
}
break;
}
/* Ok, we have some info to print... */
if (inactive && info)
str = map_num(pers, info->array.level);
else
str = map_num(pers, array.level);
if (c->export) {
if (array.raid_disks) {
if (str)
printf("MD_LEVEL=%s\n", str);
printf("MD_DEVICES=%d\n", array.raid_disks);
} else {
if (is_container)
printf("MD_LEVEL=container\n");
printf("MD_DEVICES=%d\n", array.nr_disks);
}
if (container) {
printf("MD_CONTAINER=%s\n", container);
printf("MD_MEMBER=%s\n", member);
} else {
if (sra && sra->array.major_version < 0)
printf("MD_METADATA=%s\n", sra->text_version);
else
printf("MD_METADATA=%d.%d\n",
array.major_version,
array.minor_version);
}
if (st && st->sb && info) {
char nbuf[64];
struct map_ent *mp, *map = NULL;
fname_from_uuid(st, info, nbuf, ':');
printf("MD_UUID=%s\n", nbuf + 5);
mp = map_by_uuid(&map, info->uuid);
if (mp && mp->path &&
strncmp(mp->path, "/dev/md/", 8) == 0) {
printf("MD_DEVNAME=");
print_escape(mp->path + 8);
putchar('\n');
}
if (st->ss->export_detail_super)
st->ss->export_detail_super(st);
map_free(map);
} else {
struct map_ent *mp, *map = NULL;
char nbuf[64];
mp = map_by_devnm(&map, fd2devnm(fd));
if (mp) {
__fname_from_uuid(mp->uuid, 0, nbuf, ':');
printf("MD_UUID=%s\n", nbuf+5);
}
if (mp && mp->path &&
strncmp(mp->path, "/dev/md/", 8) == 0) {
printf("MD_DEVNAME=");
print_escape(mp->path+8);
putchar('\n');
}
map_free(map);
}
if (!c->no_devices && sra) {
struct mdinfo *mdi;
for (mdi = sra->devs; mdi; mdi = mdi->next) {
char *path;
char *sysdev = xstrdup(mdi->sys_name);
char *cp;
path = map_dev(mdi->disk.major,
mdi->disk.minor, 0);
for (cp = sysdev; *cp; cp++)
if (!isalnum(*cp))
*cp = '_';
if (mdi->disk.raid_disk >= 0)
printf("MD_DEVICE_%s_ROLE=%d\n",
sysdev,
mdi->disk.raid_disk);
else
printf("MD_DEVICE_%s_ROLE=spare\n",
sysdev);
if (path)
printf("MD_DEVICE_%s_DEV=%s\n",
sysdev, path);
}
}
goto out;
}
disks = xmalloc(max_disks * 2 * sizeof(mdu_disk_info_t));
for (d = 0; d < max_disks * 2; d++) {
disks[d].state = (1 << MD_DISK_REMOVED);
disks[d].major = disks[d].minor = 0;
disks[d].number = -1;
disks[d].raid_disk = d / 2;
}
next = array.raid_disks * 2;
if (inactive) {
struct mdinfo *mdi;
for (mdi = sra->devs; mdi; mdi = mdi->next) {
disks[next++] = mdi->disk;
disks[next - 1].number = -1;
}
} else for (d = 0; d < max_disks; d++) {
mdu_disk_info_t disk;
disk.number = d;
if (md_get_disk_info(fd, &disk) < 0) {
if (d < array.raid_disks)
pr_err("cannot get device detail for device %d: %s\n",
d, strerror(errno));
continue;
}
if (disk.major == 0 && disk.minor == 0)
continue;
if (disk.raid_disk >= 0 && disk.raid_disk < array.raid_disks &&
disks[disk.raid_disk * 2].state == (1 << MD_DISK_REMOVED) &&
((disk.state & (1 << MD_DISK_JOURNAL)) == 0))
disks[disk.raid_disk * 2] = disk;
else if (disk.raid_disk >= 0 &&
disk.raid_disk < array.raid_disks &&
disks[disk.raid_disk * 2 + 1].state ==
(1 << MD_DISK_REMOVED) &&
!(disk.state & (1 << MD_DISK_JOURNAL)))
disks[disk.raid_disk * 2 + 1] = disk;
else if (next < max_disks * 2)
disks[next++] = disk;
}
avail = xcalloc(array.raid_disks, 1);
for (d = 0; d < array.raid_disks; d++) {
char dv[PATH_MAX], dv_rep[PATH_MAX];
snprintf(dv, PATH_MAX, "/sys/dev/block/%d:%d",
disks[d*2].major, disks[d*2].minor);
snprintf(dv_rep, PATH_MAX, "/sys/dev/block/%d:%d",
disks[d*2+1].major, disks[d*2+1].minor);
if ((is_dev_alive(dv) && (disks[d*2].state & (1<<MD_DISK_SYNC))) ||
(is_dev_alive(dv_rep) && (disks[d*2+1].state & (1<<MD_DISK_SYNC)))) {
avail_disks ++;
avail[d] = 1;
} else
rv |= !! c->test;
}
if (c->brief) {
mdu_bitmap_file_t bmf;
if (inactive && !is_container)
printf("INACTIVE-ARRAY %s", dev);
else
printf("ARRAY %s", dev);
if (c->verbose > 0) {
if (array.raid_disks)
printf(" level=%s num-devices=%d",
str ? str : "-unknown-",
array.raid_disks);
else if (is_container)
printf(" level=container num-devices=%d",
array.nr_disks);
else
printf(" num-devices=%d", array.nr_disks);
}
if (container) {
printf(" container=%s", container);
printf(" member=%s", member);
} else {
if (sra && sra->array.major_version < 0)
printf(" metadata=%s", sra->text_version);
else
printf(" metadata=%d.%d", array.major_version,
array.minor_version);
}
/* Only try GET_BITMAP_FILE for 0.90.01 and later */
if (ioctl(fd, GET_BITMAP_FILE, &bmf) == 0 && bmf.pathname[0]) {
printf(" bitmap=%s", bmf.pathname);
}
} else {
mdu_bitmap_file_t bmf;
unsigned long long larray_size;
struct mdstat_ent *ms = mdstat_read(0, 0);
struct mdstat_ent *e;
char *devnm;
devnm = stat2devnm(&stb);
for (e = ms; e; e = e->next)
if (strcmp(e->devnm, devnm) == 0)
break;
if (!get_dev_size(fd, NULL, &larray_size))
larray_size = 0;
printf("%s:\n", dev);
if (container)
printf(" Container : %s, member %s\n",
container, member);
else {
if (sra && sra->array.major_version < 0)
printf(" Version : %s\n",
sra->text_version);
else
printf(" Version : %d.%d\n",
array.major_version,
array.minor_version);
}
atime = array.ctime;
if (atime)
printf(" Creation Time : %.24s\n", ctime(&atime));
if (is_container)
str = "container";
if (str)
printf(" Raid Level : %s\n", str);
if (larray_size)
printf(" Array Size : %llu%s\n",
(larray_size >> 10),
human_size(larray_size));
if (array.level >= 1) {
if (sra)
array.major_version = sra->array.major_version;
if (array.major_version != 0 &&
(larray_size >= 0xFFFFFFFFULL|| array.size == 0)) {
unsigned long long dsize;
dsize = get_component_size(fd);
if (dsize > 0)
printf(" Used Dev Size : %llu%s\n",
dsize/2,
human_size((long long)dsize<<9));
else
printf(" Used Dev Size : unknown\n");
} else
printf(" Used Dev Size : %lu%s\n",
(unsigned long)array.size,
human_size((unsigned long long)
array.size << 10));
}
if (array.raid_disks)
printf(" Raid Devices : %d\n", array.raid_disks);
printf(" Total Devices : %d\n", array.nr_disks);
if (!container &&
((sra == NULL && array.major_version == 0) ||
(sra && sra->array.major_version == 0)))
printf(" Preferred Minor : %d\n", array.md_minor);
if (sra == NULL || sra->array.major_version >= 0)
printf(" Persistence : Superblock is %spersistent\n",
array.not_persistent ? "not " : "");
printf("\n");
/* Only try GET_BITMAP_FILE for 0.90.01 and later */
if (ioctl(fd, GET_BITMAP_FILE, &bmf) == 0 && bmf.pathname[0]) {
printf(" Intent Bitmap : %s\n", bmf.pathname);
printf("\n");
} else if (array.state & (1<<MD_SB_CLUSTERED))
printf(" Intent Bitmap : Internal(Clustered)\n\n");
else if (array.state & (1<<MD_SB_BITMAP_PRESENT))
printf(" Intent Bitmap : Internal\n\n");
atime = array.utime;
if (atime)
printf(" Update Time : %.24s\n", ctime(&atime));
if (array.raid_disks) {
static char *sync_action[] = {
", recovering", ", resyncing",
", reshaping", ", checking" };
char *st;
if (avail_disks == array.raid_disks)
st = "";
else if (!enough(array.level, array.raid_disks,
array.layout, 1, avail))
st = ", FAILED";
else
st = ", degraded";
if (array.state & (1 << MD_SB_CLEAN)) {
if ((array.level == 0) ||
(array.level == LEVEL_LINEAR))
arrayst = map_num(sysfs_array_states,
sra->array_state);
else
arrayst = "clean";
} else {
arrayst = "active";
if (array.state & (1<<MD_SB_CLUSTERED)) {
for (d = 0; d < max_disks * 2; d++) {
char *dv;
mdu_disk_info_t disk = disks[d];
/* only check first valid disk in cluster env */
if ((disk.state & ((1 << MD_DISK_SYNC) | (1 << MD_DISK_ACTIVE)))
&& (disk.major | disk.minor)) {
dv = map_dev_preferred(disk.major, disk.minor, 0,
c->prefer);
if (!dv)
continue;
arrayst = IsBitmapDirty(dv) ? "active" : "clean";
break;
}
}
}
}
printf(" State : %s%s%s%s%s%s%s \n",
arrayst, st,
(!e || (e->percent < 0 &&
e->percent != RESYNC_PENDING &&
e->percent != RESYNC_DELAYED &&
e->percent != RESYNC_REMOTE)) ?
"" : sync_action[e->resync],
larray_size ? "": ", Not Started",
(e && e->percent == RESYNC_DELAYED) ?
" (DELAYED)": "",
(e && e->percent == RESYNC_PENDING) ?
" (PENDING)": "",
(e && e->percent == RESYNC_REMOTE) ?
" (REMOTE)": "");
} else if (inactive && !is_container) {
printf(" State : inactive\n");
}
if (array.raid_disks)
printf(" Active Devices : %d\n", array.active_disks);
if (array.working_disks > 0)
printf(" Working Devices : %d\n",
array.working_disks);
if (array.raid_disks) {
printf(" Failed Devices : %d\n", array.failed_disks);
if (!external)
printf(" Spare Devices : %d\n", array.spare_disks);
}
printf("\n");
if (array.level == 5) {
str = map_num(r5layout, array.layout);
printf(" Layout : %s\n",
str ? str : "-unknown-");
}
if (array.level == 0 && array.layout) {
str = map_num(r0layout, array.layout);
printf(" Layout : %s\n",
str ? str : "-unknown-");
}
if (array.level == 6) {
str = map_num(r6layout, array.layout);
printf(" Layout : %s\n",
str ? str : "-unknown-");
}
if (array.level == 10) {
printf(" Layout :");
print_r10_layout(array.layout);
printf("\n");
}
switch (array.level) {
case 0:
case 4:
case 5:
case 10:
case 6:
if (array.chunk_size)
printf(" Chunk Size : %dK\n\n",
array.chunk_size/1024);
break;
case -1:
printf(" Rounding : %dK\n\n",
array.chunk_size/1024);
break;
default:
break;
}
if (array.raid_disks) {
struct mdinfo *mdi;
mdi = sysfs_read(fd, NULL, GET_CONSISTENCY_POLICY);
if (mdi) {
char *policy = map_num(consistency_policies,
mdi->consistency_policy);
sysfs_free(mdi);
if (policy)
printf("Consistency Policy : %s\n\n",
policy);
}
}
if (e && e->percent >= 0) {
static char *sync_action[] = {
"Rebuild", "Resync", "Reshape", "Check"};
printf(" %7s Status : %d%% complete\n",
sync_action[e->resync], e->percent);
}
if ((st && st->sb) && (info && info->reshape_active)) {
#if 0
This is pretty boring
printf(" Reshape pos'n : %llu%s\n",
(unsigned long long) info->reshape_progress << 9,
human_size((unsigned long long)
info->reshape_progress << 9));
#endif
if (info->delta_disks != 0)
printf(" Delta Devices : %d, (%d->%d)\n",
info->delta_disks,
array.raid_disks - info->delta_disks,
array.raid_disks);
if (info->new_level != array.level) {
str = map_num(pers, info->new_level);
printf(" New Level : %s\n",
str ? str : "-unknown-");
}
if (info->new_level != array.level ||
info->new_layout != array.layout) {
if (info->new_level == 5) {
str = map_num(r5layout,
info->new_layout);
printf(" New Layout : %s\n",
str ? str : "-unknown-");
}
if (info->new_level == 6) {
str = map_num(r6layout,
info->new_layout);
printf(" New Layout : %s\n",
str ? str : "-unknown-");
}
if (info->new_level == 10) {
printf(" New Layout : near=%d, %s=%d\n",
info->new_layout & 255,
(info->new_layout & 0x10000) ?
"offset" : "far",
(info->new_layout >> 8) & 255);
}
}
if (info->new_chunk != array.chunk_size)
printf(" New Chunksize : %dK\n",
info->new_chunk/1024);
printf("\n");
} else if (e && e->percent >= 0)
printf("\n");
free_mdstat(ms);
if (st && st->sb)
st->ss->detail_super(st, c->homehost, subarray);
if (array.raid_disks == 0 && sra &&
sra->array.major_version == -1 &&
sra->array.minor_version == -2 &&
sra->text_version[0] != '/') {
/* This looks like a container. Find any active arrays
* That claim to be a member.
*/
DIR *dir = opendir("/sys/block");
struct dirent *de;
printf(" Member Arrays :");
while (dir && (de = readdir(dir)) != NULL) {
char path[287];
char vbuf[1024];
int nlen = strlen(sra->sys_name);
dev_t devid;
if (de->d_name[0] == '.')
continue;
sprintf(path,
"/sys/block/%s/md/metadata_version",
de->d_name);
if (load_sys(path, vbuf, sizeof(vbuf)) < 0)
continue;
if (strncmp(vbuf, "external:", 9) ||
!is_subarray(vbuf + 9) ||
strncmp(vbuf + 10, sra->sys_name, nlen) ||
vbuf[10 + nlen] != '/')
continue;
devid = devnm2devid(de->d_name);
printf(" %s",
map_dev_preferred(major(devid),
minor(devid), 1,
c->prefer));
}
if (dir)
closedir(dir);
printf("\n\n");
}
if (!c->no_devices) {
if (array.raid_disks)
printf(" Number Major Minor RaidDevice State\n");
else
printf(" Number Major Minor RaidDevice\n");
}
}
/* if --no-devices is specified, do not print component device info */
if (c->no_devices)
goto skip_devices_state;
for (d = 0; d < max_disks * 2; d++) {
char *dv;
mdu_disk_info_t disk = disks[d];
if (d >= array.raid_disks * 2 &&
disk.major == 0 && disk.minor == 0)
continue;
if ((d & 1) && disk.major == 0 && disk.minor == 0)
continue;
if (!c->brief) {
if (d == array.raid_disks*2)
printf("\n");
if (disk.number < 0 && disk.raid_disk < 0)
printf(" - %5d %5d - ",
disk.major, disk.minor);
else if (disk.raid_disk < 0 ||
disk.state & (1 << MD_DISK_JOURNAL))
printf(" %5d %5d %5d - ",
disk.number, disk.major, disk.minor);
else if (disk.number < 0)
printf(" - %5d %5d %5d ",
disk.major, disk.minor, disk.raid_disk);
else
printf(" %5d %5d %5d %5d ",
disk.number, disk.major, disk.minor,
disk.raid_disk);
}
if (!c->brief && array.raid_disks) {
if (disk.state & (1 << MD_DISK_FAULTY)) {
printf(" faulty");
if (disk.raid_disk < array.raid_disks &&
disk.raid_disk >= 0)
failed++;
}
if (disk.state & (1 << MD_DISK_ACTIVE))
printf(" active");
if (disk.state & (1 << MD_DISK_SYNC)) {
printf(" sync");
if (array.level == 10 &&
(array.layout & ~0x1FFFF) == 0) {
int nc = array.layout & 0xff;
int fc = (array.layout >> 8) & 0xff;
int copies = nc*fc;
if (fc == 1 &&
array.raid_disks % copies == 0 &&
copies <= 26) {
/* We can divide the devices
into 'sets' */
int set;
set = disk.raid_disk % copies;
printf(" set-%c", set + 'A');
}
}
}
if (disk.state & (1 << MD_DISK_REMOVED))
printf(" removed");
if (disk.state & (1 << MD_DISK_WRITEMOSTLY))
printf(" writemostly");
if (disk.state & (1 << MD_DISK_FAILFAST))
printf(" failfast");
if (disk.state & (1 << MD_DISK_JOURNAL))
printf(" journal");
if ((disk.state &
((1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC) |
(1 << MD_DISK_REMOVED) | (1 << MD_DISK_FAULTY) |
(1 << MD_DISK_JOURNAL))) == 0) {
printf(" spare");
if (disk.raid_disk < array.raid_disks &&
disk.raid_disk >= 0)
printf(" rebuilding");
}
}
if (disk.state == 0)
spares++;
dv = map_dev_preferred(disk.major, disk.minor, 0, c->prefer);
if (dv != NULL) {
if (c->brief)
n_devices = add_device(dv, &devices,
&max_devices, n_devices);
else
printf(" %s", dv);
} else if (disk.major | disk.minor)
printf(" missing");
if (!c->brief)
printf("\n");
}
skip_devices_state:
if (spares && c->brief && array.raid_disks)
printf(" spares=%d", spares);
if (c->brief && st && st->sb)
st->ss->brief_detail_super(st, subarray);
if (st)
st->ss->free_super(st);
if (c->brief && c->verbose > 0 && devices) {
qsort(devices, n_devices, sizeof(*devices), cmpstringp);
printf("\n devices=%s", devices[0]);
for (d = 1; d < n_devices; d++)
printf(",%s", devices[d]);
}
if (c->brief)
printf("\n");
if (c->test &&
!enough(array.level, array.raid_disks, array.layout, 1, avail))
rv = 2;
out:
free(info);
free(disks);
close(fd);
free(subarray);
free(avail);
if (devices)
for (d = 0; d < n_devices; d++)
free(devices[d]);
free(devices);
sysfs_free(sra);
free(st);
return rv;
}
int Detail_Platform(struct superswitch *ss, int scan, int verbose, int export, char *controller_path)
{
/* display platform capabilities for the given metadata format
* 'scan' in this context means iterate over all metadata types
*/
int i;
int err = 1;
if (ss && export && ss->export_detail_platform)
err = ss->export_detail_platform(verbose, controller_path);
else if (ss && ss->detail_platform)
err = ss->detail_platform(verbose, 0, controller_path);
else if (ss) {
if (verbose > 0)
pr_err("%s metadata is platform independent\n",
ss->name ? : "[no name]");
} else if (!scan) {
if (verbose > 0)
pr_err("specify a metadata type or --scan\n");
}
if (!scan)
return err;
err = 0;
for (i = 0; superlist[i]; i++) {
struct superswitch *meta = superlist[i];
if (meta == ss)
continue;
if (verbose > 0)
pr_err("checking metadata %s\n",
meta->name ? : "[no name]");
if (!meta->detail_platform) {
if (verbose > 0)
pr_err("%s metadata is platform independent\n",
meta->name ? : "[no name]");
} else if (export && meta->export_detail_platform) {
err |= meta->export_detail_platform(verbose, controller_path);
} else
err |= meta->detail_platform(verbose, 0, controller_path);
}
return err;
}
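The RAID10 branches of Detail() decode the layout word with fixed bit masks. Bits 0-7 give the near-copy count, bits 8-15 give the far-copy count, and bit 16 (0x10000) selects "offset" placement instead of "far". The sketch below pulls those tests into standalone helpers so they can be checked in isolation.

It is not part of mdadm. The struct `r10_layout` and the functions `decode_r10_layout`, `format_r10_layout` and `r10_set_letter` are hypothetical names. The `copies == 0` and `raid_disk < 0` guards are additions that the original loop does not have.

```c
#include <stddef.h>
#include <stdio.h>

/* Decoded form of the RAID10 layout word. The struct and field names are
 * hypothetical; only the bit positions come from Detail(). */
struct r10_layout {
	int near_copies; /* bits 0-7 */
	int far_copies;  /* bits 8-15 */
	int offset;      /* bit 16 set: "offset" layout instead of "far" */
};

static struct r10_layout decode_r10_layout(int layout)
{
	struct r10_layout l;

	l.near_copies = layout & 0xff;
	l.far_copies = (layout >> 8) & 0xff;
	l.offset = (layout & 0x10000) != 0;
	return l;
}

/* Produces the same text as the "New Layout : near=%d, %s=%d" line. */
static void format_r10_layout(int layout, char *buf, size_t len)
{
	struct r10_layout l = decode_r10_layout(layout);

	snprintf(buf, len, "near=%d, %s=%d", l.near_copies,
		 l.offset ? "offset" : "far", l.far_copies);
}

/* Letter of the set holding raid_disk, or 0 when Detail() would print no
 * "set-X" tag. The extra guards keep a malformed layout from causing a
 * division by zero or a negative index. */
static char r10_set_letter(int layout, int raid_disks, int raid_disk)
{
	struct r10_layout l;
	int copies;

	if (raid_disk < 0 || (layout & ~0x1FFFF) != 0)
		return 0;
	l = decode_r10_layout(layout);
	copies = l.near_copies * l.far_copies;
	if (l.far_copies != 1 || copies == 0 || copies > 26 ||
	    raid_disks % copies != 0)
		return 0;
	return (char)('A' + raid_disk % copies);
}
```

Take a 4-device near=2 array (layout 0x102). Here `r10_set_letter` gives A, B, A, B for slots 0 to 3. Mirrored slots get different letters, so each letter should name a group that together holds one complete copy of the data.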

319
Dump.c Normal file

@@ -0,0 +1,319 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2013 Neil Brown <neilb@suse.de>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
#include "mdadm.h"
#include <dirent.h>
int Dump_metadata(char *dev, char *dir, struct context *c,
struct supertype *st)
{
/* create a new file in 'dir' named for the basename of 'dev'.
* Truncate to the same size as 'dev' and ask the metadata
* handler to copy metadata there.
* For every name in /dev/disk/by-id that points to this device,
* create a hardlink in 'dir'.
* Complain if any of those hardlinks cannot be created.
*/
int fd, fl;
struct stat stb, dstb;
char *base;
char *fname = NULL;
unsigned long long size;
DIR *dirp;
struct dirent *de;
if (stat(dir, &stb) != 0 ||
(S_IFMT & stb.st_mode) != S_IFDIR) {
pr_err("--dump requires an existing directory, not: %s\n",
dir);
return 16;
}
fd = dev_open(dev, O_RDONLY);
if (fd < 0) {
pr_err("Cannot open %s to dump metadata: %s\n",
dev, strerror(errno));
return 1;
}
if (!get_dev_size(fd, dev, &size)) {
close(fd);
return 1;
}
if (st == NULL)
st = guess_super_type(fd, guess_array);
if (!st) {
pr_err("Cannot find RAID metadata on %s\n", dev);
close(fd);
return 1;
}
st->ignore_hw_compat = 1;
if (st->ss->load_super(st, fd, NULL) != 0) {
pr_err("No %s metadata found on %s\n",
st->ss->name, dev);
close(fd);
return 1;
}
if (st->ss->copy_metadata == NULL) {
pr_err("%s metadata on %s cannot be copied\n",
st->ss->name, dev);
close(fd);
return 1;
}
base = strrchr(dev, '/');
if (base)
base++;
else
base = dev;
xasprintf(&fname, "%s/%s", dir, base);
fl = open(fname, O_RDWR|O_CREAT|O_EXCL, 0666);
if (fl < 0) {
pr_err("Cannot create dump file %s: %s\n",
fname, strerror(errno));
close(fd);
free(fname);
return 1;
}
if (ftruncate(fl, size) < 0) {
pr_err("failed to set size of dump file: %s\n",
strerror(errno));
close(fd);
close(fl);
free(fname);
return 1;
}
if (st->ss->copy_metadata(st, fd, fl) != 0) {
pr_err("Failed to copy metadata from %s to %s\n",
dev, fname);
close(fd);
close(fl);
unlink(fname);
free(fname);
return 1;
}
if (c->verbose >= 0)
printf("%s saved as %s.\n", dev, fname);
fstat(fd, &dstb);
close(fd);
close(fl);
if ((dstb.st_mode & S_IFMT) != S_IFBLK) {
/* Not a block device, so cannot create links */
free(fname);
return 0;
}
/* mostly done: just want to find some other names */
dirp = opendir("/dev/disk/by-id");
if (!dirp) {
free(fname);
return 0;
}
while ((de = readdir(dirp)) != NULL) {
char *p = NULL;
if (de->d_name[0] == '.')
continue;
xasprintf(&p, "/dev/disk/by-id/%s", de->d_name);
if (stat(p, &stb) != 0 ||
(stb.st_mode & S_IFMT) != S_IFBLK ||
stb.st_rdev != dstb.st_rdev) {
/* Not this one */
free(p);
continue;
}
free(p);
xasprintf(&p, "%s/%s", dir, de->d_name);
if (link(fname, p) == 0) {
if (c->verbose >= 0)
printf("%s also saved as %s.\n",
dev, p);
} else {
pr_err("Could not save %s as %s!!\n",
dev, p);
}
free(p);
}
closedir(dirp);
free(fname);
return 0;
}
int Restore_metadata(char *dev, char *dir, struct context *c,
struct supertype *st, int only)
{
/* If 'dir' really is a directory we choose a name
* from it that matches a suitable name in /dev/disk/by-id,
* and copy metadata from the file to the device.
* If two names from by-id match and aren't both the same
* inode, we fail. If none match and basename of 'dev'
* can be found in dir, use that.
* If 'dir' is really a file then it is only permitted if
* 'only' is set (meaning there was only one device given)
* and the metadata is restored irrespective of file names.
*/
int fd, fl;
struct stat stb, dstb;
char *fname = NULL;
unsigned long long size;
if (stat(dir, &stb) != 0) {
pr_err("%s does not exist: cannot restore from there.\n",
dir);
return 16;
} else if ((S_IFMT & stb.st_mode) != S_IFDIR && !only) {
pr_err("--restore requires a directory when multiple devices given\n");
return 16;
}
fd = dev_open(dev, O_RDWR);
if (fd < 0) {
pr_err("Cannot open %s to restore metadata: %s\n",
dev, strerror(errno));
return 1;
}
if (!get_dev_size(fd, dev, &size)) {
close(fd);
return 1;
}
if ((S_IFMT & stb.st_mode) == S_IFDIR) {
/* choose one name from the directory. */
DIR *d = opendir(dir);
struct dirent *de;
char *chosen = NULL;
ino_t chosen_inode = 0;
fstat(fd, &dstb);
while (d && (de = readdir(d)) != NULL) {
if (de->d_name[0] == '.')
continue;
xasprintf(&fname, "/dev/disk/by-id/%s", de->d_name);
if (stat(fname, &stb) != 0) {
free(fname);
continue;
}
free(fname);
if ((S_IFMT & stb.st_mode) != S_IFBLK)
continue;
if (stb.st_rdev != dstb.st_rdev)
continue;
/* This file is a good match for our device. */
xasprintf(&fname, "%s/%s", dir, de->d_name);
if (stat(fname, &stb) != 0) {
/* Weird! */
free(fname);
continue;
}
if (chosen == NULL) {
chosen = fname;
chosen_inode = stb.st_ino;
continue;
}
if (chosen_inode == stb.st_ino) {
/* same, no need to change */
free(fname);
continue;
}
/* Oh dear, two names both match. Must give up. */
pr_err("Both %s and %s seem suitable for %s. Please choose one.\n",
chosen, fname, dev);
free(fname);
free(chosen);
close(fd);
closedir(d);
return 1;
}
if (d)
closedir(d);
if (!chosen) {
/* One last chance: try basename of device */
char *base = strrchr(dev, '/');
if (base)
base++;
else
base = dev;
xasprintf(&fname, "%s/%s", dir, base);
if (stat(fname, &stb) == 0)
chosen = fname;
else
free(fname);
}
fname = chosen;
} else
fname = strdup(dir);
if (!fname) {
pr_err("Cannot find suitable file in %s for %s\n",
dir, dev);
close(fd);
return 1;
}
fl = open(fname, O_RDONLY);
if (fl < 0) {
pr_err("Could not open %s for --restore: %s\n",
fname, strerror(errno));
goto err;
}
if (stat(fname, &stb) != 0) {
pr_err("Could not stat %s for --restore.\n",
fname);
goto err;
}
if (((unsigned long long)stb.st_size) != size) {
pr_err("%s is not the same size as %s - cannot restore.\n",
fname, dev);
goto err;
}
if (st == NULL)
st = guess_super_type(fl, guess_array);
if (!st) {
pr_err("Cannot find metadata on %s\n", fname);
goto err;
}
st->ignore_hw_compat = 1;
if (st->ss->load_super(st, fl, NULL) != 0) {
pr_err("No %s metadata found on %s\n",
st->ss->name, fname);
goto err;
}
if (st->ss->copy_metadata == NULL) {
pr_err("%s metadata on %s cannot be copied\n",
st->ss->name, dev);
goto err;
}
if (st->ss->copy_metadata(st, fl, fd) != 0) {
pr_err("Failed to copy metadata from %s to %s\n",
fname, dev);
goto err;
}
if (c->verbose >= 0)
printf("%s restored from %s.\n", dev, fname);
close(fl);
close(fd);
free(fname);
return 0;
err:
close(fd);
close(fl);
free(fname);
return 1;
}
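Restore_metadata() relies on how Dump_metadata() lays out the dump directory. That layout is one file per device, plus a hardlink for every /dev/disk/by-id name of that device, so all names for one device share an inode. When several by-id names match, the restore accepts them only if they point at the same dumped file.

The sketch below isolates that rule. `struct dump_candidate`, `choose_dump_file` and the `CHOOSE_*` codes are hypothetical. The basename fallback used when nothing matches is left out.

```c
#include <stddef.h>
#include <sys/types.h>

/* A by-id name that resolved to the target device, paired with the inode
 * of the same-named file in the dump directory (hypothetical type). */
struct dump_candidate {
	const char *name;
	ino_t dump_inode;
};

#define CHOOSE_NONE      (-1)
#define CHOOSE_AMBIGUOUS (-2)

/* Index of the candidate to restore from, CHOOSE_NONE if there are no
 * candidates, or CHOOSE_AMBIGUOUS if two names lead to different files. */
static int choose_dump_file(const struct dump_candidate *c, size_t n)
{
	int chosen = CHOOSE_NONE;
	size_t i;

	for (i = 0; i < n; i++) {
		if (chosen == CHOOSE_NONE) {
			chosen = (int)i;
			continue;
		}
		if (c[i].dump_inode == c[chosen].dump_inode)
			continue; /* another hardlink to the same dump file */
		return CHOOSE_AMBIGUOUS;
	}
	return chosen;
}
```

A caller would fill the array from readdir() and stat() results. It would then treat `CHOOSE_AMBIGUOUS` as the "Please choose one" error that Restore_metadata() reports.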

228
Examine.c Normal file

@@ -0,0 +1,228 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2001-2013 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
#include "mdadm.h"
#include "dlink.h"
#if ! defined(__BIG_ENDIAN) && ! defined(__LITTLE_ENDIAN)
#error no endian defined
#endif
#include "md_u.h"
#include "md_p.h"
int Examine(struct mddev_dev *devlist,
struct context *c,
struct supertype *forcest)
{
/* Read the raid superblock from a device and
* display important content.
*
* If it cannot be found, print the reason: too small, bad magic
*
* Print:
* version, ctime, level, size, raid+spare+
* preferred minor
* uuid
*
* utime, state etc
*
* If (brief), gather devices for the same array and just print an mdadm.conf
* line including devices=
* if devlist==NULL, use conf_get_devs()
*/
int fd;
int rv = 0;
struct array {
struct supertype *st;
struct mdinfo info;
void *devs;
struct array *next;
int spares;
} *arrays = NULL;
for (; devlist ; devlist = devlist->next) {
struct supertype *st;
int have_container = 0;
int err = 0;
int container = 0;
fd = dev_open(devlist->devname, O_RDONLY);
if (fd < 0) {
if (!c->scan) {
pr_err("cannot open %s: %s\n",
devlist->devname, strerror(errno));
rv = 1;
}
continue;
}
if (forcest)
st = dup_super(forcest);
else if (must_be_container(fd)) {
/* might be a container */
st = super_by_fd(fd, NULL);
container = 1;
} else
st = guess_super(fd);
if (st) {
err = 1;
st->ignore_hw_compat = 1;
if (!container)
err = st->ss->load_super(st, fd,
(c->brief||c->scan) ? NULL
:devlist->devname);
if (err && st->ss->load_container) {
err = st->ss->load_container(st, fd,
(c->brief||c->scan) ? NULL
:devlist->devname);
if (!err)
have_container = 1;
}
st->ignore_hw_compat = 0;
} else {
if (!c->brief) {
pr_err("No md superblock detected on %s.\n", devlist->devname);
rv = 1;
}
err = 1;
}
close(fd);
if (err) {
if (st)
st->ss->free_super(st);
continue;
}
if (c->SparcAdjust)
st->ss->update_super(st, NULL, "sparc2.2",
devlist->devname, 0, 0, NULL);
/* Ok, it's good enough to try, though the checksum could be wrong */
if (c->brief && st->ss->brief_examine_super == NULL) {
if (!c->scan)
pr_err("No brief listing for %s on %s\n",
st->ss->name, devlist->devname);
} else if (c->brief) {
struct array *ap;
char *d;
for (ap = arrays; ap; ap = ap->next) {
if (st->ss == ap->st->ss &&
st->ss->compare_super(ap->st, st, 0) == 0)
break;
}
if (!ap) {
ap = xmalloc(sizeof(*ap));
ap->devs = dl_head();
ap->next = arrays;
ap->spares = 0;
ap->st = st;
arrays = ap;
st->ss->getinfo_super(st, &ap->info, NULL);
} else
st->ss->getinfo_super(st, &ap->info, NULL);
if (!have_container &&
!(ap->info.disk.state & (1<<MD_DISK_SYNC)))
ap->spares++;
d = dl_strdup(devlist->devname);
dl_add(ap->devs, d);
} else if (c->export) {
if (st->ss->export_examine_super)
st->ss->export_examine_super(st);
st->ss->free_super(st);
} else {
printf("%s:\n",devlist->devname);
st->ss->examine_super(st, c->homehost);
st->ss->free_super(st);
}
}
if (c->brief) {
struct array *ap;
for (ap = arrays; ap; ap = ap->next) {
char sep='=';
char *d;
int newline = 0;
ap->st->ss->brief_examine_super(ap->st, c->verbose > 0);
if (ap->spares && !ap->st->ss->external)
newline += printf(" spares=%d", ap->spares);
if (c->verbose > 0) {
newline += printf(" devices");
for (d = dl_next(ap->devs);
d != ap->devs;
d=dl_next(d)) {
printf("%c%s", sep, d);
sep=',';
}
}
if (ap->st->ss->brief_examine_subarrays) {
if (newline)
printf("\n");
ap->st->ss->brief_examine_subarrays(ap->st, c->verbose);
}
ap->st->ss->free_super(ap->st);
/* FIXME free ap */
if (ap->spares || c->verbose > 0)
printf("\n");
}
}
return rv;
}
int ExamineBadblocks(char *devname, int brief, struct supertype *forcest)
{
int fd = dev_open(devname, O_RDONLY);
struct supertype *st = forcest;
int err = 1;
if (fd < 0) {
pr_err("cannot open %s: %s\n", devname, strerror(errno));
return 1;
}
if (!st)
st = guess_super(fd);
if (!st) {
if (!brief)
pr_err("No md superblock detected on %s\n", devname);
goto out;
}
if (!st->ss->examine_badblocks) {
pr_err("%s metadata does not support badblocks\n", st->ss->name);
goto out;
}
err = st->ss->load_super(st, fd, brief ? NULL : devname);
if (err)
goto out;
err = st->ss->examine_badblocks(st, fd, devname);
out:
if (fd >= 0)
close(fd);
if (st) {
st->ss->free_super(st);
free(st);
}
return err;
}
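In --brief mode, Examine() puts every device whose superblock compares equal to an already-seen one into the same `struct array`. It counts members without MD_DISK_SYNC as spares, then prints one mdadm.conf-style line per array. The sketch below reproduces that grouping with plain strings, where a matching UUID stands in for `compare_super() == 0`.

`struct dev_sb`, `struct group`, `group_devices` and `print_group` are hypothetical. Unlike Examine(), this sketch keeps groups in discovery order, whereas Examine() prepends new arrays to its list. It also prints only the UUID, spare count and device list, not the full brief_examine_super() output.

```c
#include <stdio.h>
#include <string.h>

#define MAX_GROUPS 16
#define MAX_DEVS 32

/* Stand-in for one examined device and its loaded superblock. */
struct dev_sb {
	const char *devname;
	const char *uuid;   /* equal UUIDs play the role of compare_super() == 0 */
	int in_sync;        /* plays the role of MD_DISK_SYNC in disk.state */
};

struct group {
	const char *uuid;
	const char *devs[MAX_DEVS];
	int ndevs;
	int spares;
};

/* Groups devices by array and counts non-sync members as spares.
 * g must have room for MAX_GROUPS entries. Returns the number of groups. */
static int group_devices(const struct dev_sb *d, int n, struct group *g)
{
	int ngroups = 0;

	for (int i = 0; i < n; i++) {
		int j;

		for (j = 0; j < ngroups; j++)
			if (strcmp(g[j].uuid, d[i].uuid) == 0)
				break;
		if (j == ngroups) {
			if (ngroups == MAX_GROUPS)
				continue; /* sketch only: extra arrays are ignored */
			g[j].uuid = d[i].uuid;
			g[j].ndevs = 0;
			g[j].spares = 0;
			ngroups++;
		}
		if (!d[i].in_sync)
			g[j].spares++;
		if (g[j].ndevs < MAX_DEVS)
			g[j].devs[g[j].ndevs++] = d[i].devname;
	}
	return ngroups;
}

/* Prints a simplified ARRAY line using the same '=' then ',' separator
 * trick as the devices= loop in Examine(). */
static void print_group(const struct group *g, int verbose)
{
	printf("ARRAY UUID=%s", g->uuid);
	if (g->spares)
		printf(" spares=%d", g->spares);
	if (verbose) {
		char sep = '=';

		printf(" devices");
		for (int i = 0; i < g->ndevs; i++) {
			printf("%c%s", sep, g->devs[i]);
			sep = ',';
		}
	}
	printf("\n");
}
```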

5229
Grow.c Normal file

File diff suppressed because it is too large

13
INSTALL Normal file

@@ -0,0 +1,13 @@
To build mdadm, simply run:
make
To install, run
make install
as root.
No configuration is necessary.

1764
Incremental.c Normal file

File diff suppressed because it is too large

147
Kill.c Normal file

@@ -0,0 +1,147 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2001-2009 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*
* Added by Dale Stephenson
* steph@snapserver.com
*/
#include "mdadm.h"
#include "md_u.h"
#include "md_p.h"
int Kill(char *dev, struct supertype *st, int force, int verbose, int noexcl)
{
/*
* Nothing fancy about Kill. It just zeroes out a superblock.
* Definitely not safe.
* Returns:
* 0 - a zero superblock was successfully written out
* 1 - failed to write the zero superblock
* 2 - failed to open the device.
* 4 - failed to find a superblock.
*/
int fd, rv = 0;
if (force)
noexcl = 1;
fd = open(dev, O_RDWR|(noexcl ? 0 : O_EXCL));
if (fd < 0) {
if (verbose >= 0)
pr_err("Couldn't open %s for write - not zeroing\n",
dev);
return 2;
}
if (st == NULL)
st = guess_super(fd);
if (st == NULL || st->ss->init_super == NULL) {
if (verbose >= 0)
pr_err("Unrecognised md component device - %s\n", dev);
close(fd);
return 4;
}
st->ignore_hw_compat = 1;
rv = st->ss->load_super(st, fd, dev);
if (rv == 0 || (force && rv >= 2)) {
st->ss->free_super(st);
st->ss->init_super(st, NULL, NULL, "", NULL, NULL,
INVALID_SECTORS);
if (st->ss->store_super(st, fd)) {
if (verbose >= 0)
pr_err("Could not zero superblock on %s\n",
dev);
rv = 1;
} else if (rv) {
if (verbose >= 0)
pr_err("superblock zeroed anyway\n");
rv = 0;
}
}
close(fd);
return rv;
}
int Kill_subarray(char *dev, char *subarray, int verbose)
{
/* Delete a subarray out of a container. The subarray must be
* inactive. The subarray string must be a subarray index
* number.
*
* 0 = successfully deleted subarray from all container members
* 1 = failed to sync metadata to one or more devices
* 2 = failed to find the container, subarray, or other resource
* issue
*/
struct supertype supertype, *st = &supertype;
int fd, rv = 2;
memset(st, 0, sizeof(*st));
fd = open_subarray(dev, subarray, st, verbose < 0);
if (fd < 0)
return 2;
if (!st->ss->kill_subarray) {
if (verbose >= 0)
pr_err("Operation not supported for %s metadata\n",
st->ss->name);
goto free_super;
}
if (is_subarray_active(subarray, st->devnm)) {
if (verbose >= 0)
pr_err("Subarray-%s still active, aborting\n",
subarray);
goto free_super;
}
if (mdmon_running(st->devnm))
st->update_tail = &st->updates;
/* ok we've found our victim, drop the axe */
rv = st->ss->kill_subarray(st, subarray);
if (rv) {
if (verbose >= 0)
pr_err("Failed to delete subarray-%s from %s\n",
subarray, dev);
goto free_super;
}
/* FIXME these routines do not report success/failure */
if (st->update_tail)
flush_metadata_updates(st);
else
st->ss->sync_metadata(st);
if (verbose >= 0)
pr_err("Deleted subarray-%s from %s, UUIDs may have changed\n",
subarray, dev);
rv = 0;
free_super:
st->ss->free_super(st);
close(fd);
return rv;
}
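Kill() reports its outcome through small integer codes, and `--force` changes two decisions. First, it implies `noexcl`, so O_EXCL is not requested. Second, it lets a superblock that load_super() rejected with a code of 2 or more be overwritten anyway. The sketch below restates those decisions as pure functions, so each path can be tested without a block device.

The enum and both function names are hypothetical. When load_super() fails and --force does not apply, Kill() returns load_super()'s own value unchanged, and the sketch mirrors that.

```c
#include <fcntl.h>

/* Return codes documented at the top of Kill(). */
enum kill_rc {
	KILL_OK = 0,
	KILL_WRITE_FAILED = 1,
	KILL_OPEN_FAILED = 2,
	KILL_NO_SUPER = 4,
};

/* --force implies noexcl, so O_EXCL is requested only when neither is set. */
static int kill_open_flags(int force, int noexcl)
{
	if (force)
		noexcl = 1;
	return O_RDWR | (noexcl ? 0 : O_EXCL);
}

/* Hypothetical summary of Kill(): each argument records the outcome of one
 * step (open, guess_super, load_super, store_super). */
static int kill_outcome(int open_ok, int recognised, int load_rv,
			int force, int store_ok)
{
	if (!open_ok)
		return KILL_OPEN_FAILED;
	if (!recognised)
		return KILL_NO_SUPER;
	if (load_rv == 0 || (force && load_rv >= 2))
		return store_ok ? KILL_OK : KILL_WRITE_FAILED;
	/* load_super() failed and --force did not apply: its code is
	 * passed straight back to the caller. */
	return load_rv;
}
```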

332
Makefile Normal file

@@ -0,0 +1,332 @@
#
# mdadm - manage Linux "md" devices aka RAID arrays.
#
# Copyright (C) 2001-2002 Neil Brown <neilb@cse.unsw.edu.au>
# Copyright (C) 2013 Neil Brown <neilb@suse.de>
#
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# Author: Neil Brown
# Email: <neilb@cse.unsw.edu.au>
# Paper: Neil Brown
# School of Computer Science and Engineering
# The University of New South Wales
# Sydney, 2052
# Australia
#
# define "CXFLAGS" to give extra flags to CC.
# e.g. make CXFLAGS=-O to optimise
CXFLAGS ?=-O2
TCC = tcc
UCLIBC_GCC = $(shell for nm in i386-uclibc-linux-gcc i386-uclibc-gcc; do which $$nm > /dev/null && { echo $$nm ; exit; } ; done; echo false No uclibc found )
#DIET_GCC = diet gcc
# sorry, but diet-libc doesn't know about posix_memalign,
# so we cannot use it any more.
DIET_GCC = gcc -DHAVE_STDINT_H
KLIBC=/home/src/klibc/klibc-0.77
KLIBC_GCC = gcc -nostdinc -iwithprefix include -I$(KLIBC)/klibc/include -I$(KLIBC)/linux/include -I$(KLIBC)/klibc/arch/i386/include -I$(KLIBC)/klibc/include/bits32
ifdef COVERITY
COVERITY_FLAGS=-include coverity-gcc-hack.h
endif
ifeq ($(origin CC),default)
CC := $(CROSS_COMPILE)gcc
endif
CXFLAGS ?= -ggdb
CWFLAGS = -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter
ifdef WARN_UNUSED
CWFLAGS += -Wp,-D_FORTIFY_SOURCE=2 -O3
endif
FALLTHROUGH := $(shell gcc -v --help 2>&1 | grep "implicit-fallthrough" | wc -l)
ifneq "$(FALLTHROUGH)" "0"
CWFLAGS += -Wimplicit-fallthrough=0
endif
ifdef DEBIAN
CPPFLAGS += -DDEBIAN
endif
ifdef DEFAULT_OLD_METADATA
CPPFLAGS += -DDEFAULT_OLD_METADATA
DEFAULT_METADATA=0.90
else
DEFAULT_METADATA=1.2
endif
CPPFLAGS += -DBINDIR=\"$(BINDIR)\"
PKG_CONFIG ?= pkg-config
SYSCONFDIR = /etc
CONFFILE = $(SYSCONFDIR)/mdadm.conf
CONFFILE2 = $(SYSCONFDIR)/mdadm/mdadm.conf
MAILCMD =/usr/sbin/sendmail -t
CONFFILEFLAGS = -DCONFFILE=\"$(CONFFILE)\" -DCONFFILE2=\"$(CONFFILE2)\"
# Both MAP_DIR and MDMON_DIR should be somewhere that persists across the
# pivotroot from early boot to late boot.
# /run is best, but for distros that don't support it,
# /dev can work, in which case you probably want /dev/.mdadm
RUN_DIR=/run/mdadm
CHECK_RUN_DIR=1
MAP_DIR=$(RUN_DIR)
MAP_FILE = map
MAP_PATH = $(MAP_DIR)/$(MAP_FILE)
MDMON_DIR = $(RUN_DIR)
# place for autoreplace cookies
FAILED_SLOTS_DIR = $(RUN_DIR)/failed-slots
SYSTEMD_DIR=/lib/systemd/system
LIB_DIR=/usr/libexec/mdadm
COROSYNC:=$(shell [ -d /usr/include/corosync ] || echo -DNO_COROSYNC)
DLM:=$(shell [ -f /usr/include/libdlm.h ] || echo -DNO_DLM)
DIRFLAGS = -DMAP_DIR=\"$(MAP_DIR)\" -DMAP_FILE=\"$(MAP_FILE)\"
DIRFLAGS += -DMDMON_DIR=\"$(MDMON_DIR)\"
DIRFLAGS += -DFAILED_SLOTS_DIR=\"$(FAILED_SLOTS_DIR)\"
CFLAGS = $(CWFLAGS) $(CXFLAGS) -DSendmail=\""$(MAILCMD)"\" $(CONFFILEFLAGS) $(DIRFLAGS) $(COROSYNC) $(DLM)
VERSION = $(shell [ -d .git ] && git describe HEAD | sed 's/mdadm-//')
VERS_DATE = $(shell [ -d .git ] && date --iso-8601 --date="`git log -n1 --format=format:%cd --date=iso --date=short`")
DVERS = $(if $(VERSION),-DVERSION=\"$(VERSION)\",)
DDATE = $(if $(VERS_DATE),-DVERS_DATE="\"$(VERS_DATE)\"",)
DEXTRAVERSION = $(if $(EXTRAVERSION),-DEXTRAVERSION="\" - $(EXTRAVERSION)\"",)
CFLAGS += $(DVERS) $(DDATE) $(DEXTRAVERSION)
# The glibc TLS ABI requires applications that call clone(2) to set up
# TLS data structures; use pthreads until mdmon implements this support.
USE_PTHREADS = 1
ifdef USE_PTHREADS
CFLAGS += -DUSE_PTHREADS
MON_LDFLAGS += -pthread
endif
# If you want a static binary, you might uncomment these
# LDFLAGS = -static
# STRIP = -s
LDLIBS = -ldl
# To explicitly disable libudev, set -DNO_LIBUDEV in CXFLAGS
ifeq (, $(findstring -DNO_LIBUDEV, $(CXFLAGS)))
LDLIBS += -ludev
endif
INSTALL = /usr/bin/install
DESTDIR =
BINDIR = /sbin
MANDIR = /usr/share/man
MAN4DIR = $(MANDIR)/man4
MAN5DIR = $(MANDIR)/man5
MAN8DIR = $(MANDIR)/man8
UDEVDIR := $(shell $(PKG_CONFIG) --variable=udevdir udev 2>/dev/null)
ifndef UDEVDIR
UDEVDIR = /lib/udev
endif
ifeq (,$(findstring s,$(MAKEFLAGS)))
ECHO=echo
else
ECHO=:
endif
OBJS = mdadm.o config.o policy.o mdstat.o ReadMe.o uuid.o util.o maps.o lib.o \
Manage.o Assemble.o Build.o \
Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o \
Incremental.o Dump.o \
mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o \
super-mbr.o super-gpt.o \
restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o xmalloc.o \
platform-intel.o probe_roms.o crc32c.o
CHECK_OBJS = restripe.o uuid.o sysfs.o maps.o lib.o xmalloc.o dlink.o
SRCS = $(patsubst %.o,%.c,$(OBJS))
INCL = mdadm.h part.h bitmap.h
MON_OBJS = mdmon.o monitor.o managemon.o uuid.o util.o maps.o mdstat.o sysfs.o \
policy.o lib.o \
Kill.o sg_io.o dlink.o ReadMe.o super-intel.o \
super-mbr.o super-gpt.o \
super-ddf.o sha1.o crc32.o msg.o bitmap.o xmalloc.o \
platform-intel.o probe_roms.o crc32c.o
MON_SRCS = $(patsubst %.o,%.c,$(MON_OBJS))
STATICSRC = pwgr.c
STATICOBJS = pwgr.o
all : mdadm mdmon
man : mdadm.man md.man mdadm.conf.man mdmon.man raid6check.man
check_rundir:
@if [ ! -d "$(dir $(RUN_DIR))" -a "$(CHECK_RUN_DIR)" = 1 ]; then \
echo "***** Parent of $(RUN_DIR) does not exist. Maybe set different RUN_DIR="; \
echo "***** e.g. make RUN_DIR=/dev/.mdadm" ; \
echo "***** or set CHECK_RUN_DIR=0"; exit 1; \
fi
everything: all mdadm.static swap_super test_stripe raid6check \
mdadm.Os mdadm.O2 man
everything-test: all mdadm.static swap_super test_stripe \
mdadm.Os mdadm.O2 man
# mdadm.uclibc doesn't work on x86-64
# mdadm.tcc doesn't work.
%.o: %.c
$(CC) $(CFLAGS) $(CPPFLAGS) $(COVERITY_FLAGS) -o $@ -c $<
mdadm : $(OBJS) | check_rundir
$(CC) $(CFLAGS) $(LDFLAGS) -o mdadm $(OBJS) $(LDLIBS)
mdadm.static : $(OBJS) $(STATICOBJS)
$(CC) $(CFLAGS) $(LDFLAGS) -static -o mdadm.static $(OBJS) $(STATICOBJS) $(LDLIBS)
mdadm.tcc : $(SRCS) $(INCL)
$(TCC) -o mdadm.tcc $(SRCS)
mdadm.klibc : $(SRCS) $(INCL)
rm -f $(OBJS)
$(CC) -nostdinc -iwithprefix include -I$(KLIBC)/klibc/include -I$(KLIBC)/linux/include -I$(KLIBC)/klibc/arch/i386/include -I$(KLIBC)/klibc/include/bits32 $(CFLAGS) $(SRCS)
mdadm.Os : $(SRCS) $(INCL)
$(CC) -o mdadm.Os $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -DHAVE_STDINT_H -Os $(SRCS) $(LDLIBS)
mdadm.O2 : $(SRCS) $(INCL) mdmon.O2
$(CC) -o mdadm.O2 $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -DHAVE_STDINT_H -O2 -D_FORTIFY_SOURCE=2 $(SRCS) $(LDLIBS)
mdmon.O2 : $(MON_SRCS) $(INCL) mdmon.h
$(CC) -o mdmon.O2 $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $(MON_LDFLAGS) -DHAVE_STDINT_H -O2 -D_FORTIFY_SOURCE=2 $(MON_SRCS) $(LDLIBS)
# use '-z now' to guarantee no dynamic linker interactions with the monitor thread
mdmon : $(MON_OBJS) | check_rundir
	$(CC) $(CFLAGS) $(LDFLAGS) $(MON_LDFLAGS) -Wl,-z,now -o mdmon $(MON_OBJS) $(LDLIBS)

msg.o: msg.c msg.h

test_stripe : restripe.c xmalloc.o mdadm.h
	$(CC) $(CFLAGS) $(CXFLAGS) $(LDFLAGS) -o test_stripe xmalloc.o -DMAIN restripe.c

raid6check : raid6check.o mdadm.h $(CHECK_OBJS)
	$(CC) $(CXFLAGS) $(LDFLAGS) -o raid6check raid6check.o $(CHECK_OBJS)

mdadm.8 : mdadm.8.in
	sed -e 's/{DEFAULT_METADATA}/$(DEFAULT_METADATA)/g' \
	-e 's,{MAP_PATH},$(MAP_PATH),g' mdadm.8.in > mdadm.8

mdadm.man : mdadm.8
	man -l mdadm.8 > mdadm.man

mdmon.man : mdmon.8
	man -l mdmon.8 > mdmon.man

md.man : md.4
	man -l md.4 > md.man

mdadm.conf.man : mdadm.conf.5
	man -l mdadm.conf.5 > mdadm.conf.man

raid6check.man : raid6check.8
	man -l raid6check.8 > raid6check.man

$(OBJS) : $(INCL) mdmon.h
$(MON_OBJS) : $(INCL) mdmon.h

sha1.o : sha1.c sha1.h md5.h
	$(CC) $(CFLAGS) -DHAVE_STDINT_H -o sha1.o -c sha1.c

install : install-bin install-man install-udev

install-static : mdadm.static install-man
	$(INSTALL) -D $(STRIP) -m 755 mdadm.static $(DESTDIR)$(BINDIR)/mdadm

install-tcc : mdadm.tcc install-man
	$(INSTALL) -D $(STRIP) -m 755 mdadm.tcc $(DESTDIR)$(BINDIR)/mdadm

install-uclibc : mdadm.uclibc install-man
	$(INSTALL) -D $(STRIP) -m 755 mdadm.uclibc $(DESTDIR)$(BINDIR)/mdadm

install-klibc : mdadm.klibc install-man
	$(INSTALL) -D $(STRIP) -m 755 mdadm.klibc $(DESTDIR)$(BINDIR)/mdadm

install-man: mdadm.8 md.4 mdadm.conf.5 mdmon.8
	$(INSTALL) -D -m 644 mdadm.8 $(DESTDIR)$(MAN8DIR)/mdadm.8
	$(INSTALL) -D -m 644 mdmon.8 $(DESTDIR)$(MAN8DIR)/mdmon.8
	$(INSTALL) -D -m 644 md.4 $(DESTDIR)$(MAN4DIR)/md.4
	$(INSTALL) -D -m 644 mdadm.conf.5 $(DESTDIR)$(MAN5DIR)/mdadm.conf.5

install-udev: udev-md-raid-arrays.rules udev-md-raid-assembly.rules udev-md-raid-creating.rules \
	udev-md-clustered-confirm-device.rules
	@for file in 01-md-raid-creating.rules 63-md-raid-arrays.rules 64-md-raid-assembly.rules \
		69-md-clustered-confirm-device.rules ; \
	do sed -e 's,BINDIR,$(BINDIR),g' udev-$${file#??-} > .install.tmp.1 && \
	   $(ECHO) $(INSTALL) -D -m 644 udev-$${file#??-} $(DESTDIR)$(UDEVDIR)/rules.d/$$file ; \
	   $(INSTALL) -D -m 644 .install.tmp.1 $(DESTDIR)$(UDEVDIR)/rules.d/$$file ; \
	   rm -f .install.tmp.1; \
	done

install-systemd: systemd/mdmon@.service
	@for file in mdmon@.service mdmonitor.service mdadm-last-resort@.timer \
		mdadm-last-resort@.service mdadm-grow-continue@.service \
		mdcheck_start.timer mdcheck_start.service \
		mdcheck_continue.timer mdcheck_continue.service \
		mdmonitor-oneshot.timer mdmonitor-oneshot.service \
		; \
	do sed -e 's,BINDIR,$(BINDIR),g' systemd/$$file > .install.tmp.2 && \
	   $(ECHO) $(INSTALL) -D -m 644 systemd/$$file $(DESTDIR)$(SYSTEMD_DIR)/$$file ; \
	   $(INSTALL) -D -m 644 .install.tmp.2 $(DESTDIR)$(SYSTEMD_DIR)/$$file ; \
	   rm -f .install.tmp.2; \
	done
	@for file in mdadm.shutdown ; \
	do sed -e 's,BINDIR,$(BINDIR),g' systemd/$$file > .install.tmp.3 && \
	   $(ECHO) $(INSTALL) -D -m 755 systemd/$$file $(DESTDIR)$(SYSTEMD_DIR)-shutdown/$$file ; \
	   $(INSTALL) -D -m 755 .install.tmp.3 $(DESTDIR)$(SYSTEMD_DIR)-shutdown/$$file ; \
	   rm -f .install.tmp.3; \
	done
	if [ -f /etc/SuSE-release -o -n "$(SUSE)" ] ;then $(INSTALL) -D -m 755 systemd/SUSE-mdadm_env.sh $(DESTDIR)$(LIB_DIR)/mdadm_env.sh ;fi

install-bin: mdadm mdmon
	$(INSTALL) -D $(STRIP) -m 755 mdadm $(DESTDIR)$(BINDIR)/mdadm
	$(INSTALL) -D $(STRIP) -m 755 mdmon $(DESTDIR)$(BINDIR)/mdmon

uninstall:
	rm -f $(DESTDIR)$(MAN8DIR)/mdadm.8 $(DESTDIR)$(MAN8DIR)/mdmon.8 $(DESTDIR)$(MAN4DIR)/md.4 $(DESTDIR)$(MAN5DIR)/mdadm.conf.5 $(DESTDIR)$(BINDIR)/mdadm

test: mdadm mdmon test_stripe swap_super raid6check
	@echo "Please run './test' as root"

clean :
	rm -f mdadm mdmon $(OBJS) $(MON_OBJS) $(STATICOBJS) core *.man \
	mdadm.tcc mdadm.uclibc mdadm.static *.orig *.porig *.rej *.alt \
	.merge_file_* mdadm.Os mdadm.O2 mdmon.O2 swap_super init.cpio.gz \
	mdadm.uclibc.static test_stripe raid6check raid6check.o mdmon mdadm.8
	rm -rf cov-int

dist : clean
	./makedist

testdist : everything-test clean
	./makedist test

TAGS :
	etags *.h *.c

DISTRO_MAKEFILE := $(wildcard distropkg/Makefile)
ifdef DISTRO_MAKEFILE
include $(DISTRO_MAKEFILE)
endif

1767
Manage.c Normal file

File diff suppressed because it is too large

1275
Monitor.c Normal file

File diff suppressed because it is too large

140
Query.c Normal file

@ -0,0 +1,140 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2002-2009 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
#include "mdadm.h"
#include "md_p.h"
#include "md_u.h"
int Query(char *dev)
{
	/* Give a brief description of the device,
	 * whether it is an md device and whether it has
	 * a superblock
	 */
	int fd;
	int ioctlerr, staterr;
	int superror;
	int level, raid_disks, spare_disks;
	struct mdinfo info;
	struct mdinfo *sra;
	struct supertype *st = NULL;
	unsigned long long larray_size;
	struct stat stb;
	char *mddev;
	mdu_disk_info_t disc;
	char *activity;

	fd = open(dev, O_RDONLY);
	if (fd < 0) {
		pr_err("cannot open %s: %s\n", dev, strerror(errno));
		return 1;
	}

	if (fstat(fd, &stb) < 0)
		staterr = errno;
	else
		staterr = 0;

	ioctlerr = 0;

	sra = sysfs_read(fd, dev, GET_DISKS | GET_LEVEL | GET_DEVS | GET_STATE);
	if (sra) {
		level = sra->array.level;
		raid_disks = sra->array.raid_disks;
		spare_disks = sra->array.spare_disks;
	} else {
		mdu_array_info_t array;

		if (md_get_array_info(fd, &array) < 0) {
			ioctlerr = errno;
			level = -1;
			raid_disks = -1;
			spare_disks = -1;
		} else {
			level = array.level;
			raid_disks = array.raid_disks;
			spare_disks = array.spare_disks;
		}
	}

	if (!ioctlerr && !staterr) {
		if (!get_dev_size(fd, NULL, &larray_size))
			larray_size = 0;
	}

	if (ioctlerr == ENODEV)
		printf("%s: is an md device which is not active\n", dev);
	else if (ioctlerr && major(stb.st_rdev) != MD_MAJOR)
		printf("%s: is not an md array\n", dev);
	else if (ioctlerr)
		printf("%s: is an md device, but gives \"%s\" when queried\n",
		       dev, strerror(ioctlerr));
	else {
		printf("%s: %s %s %d devices, %d spare%s. Use mdadm --detail for more detail.\n",
		       dev, human_size_brief(larray_size, IEC),
		       map_num(pers, level), raid_disks,
		       spare_disks, spare_disks == 1 ? "" : "s");
	}

	st = guess_super(fd);
	if (st && st->ss->compare_super != NULL)
		superror = st->ss->load_super(st, fd, dev);
	else
		superror = -1;
	close(fd);

	if (superror == 0) {
		/* array might be active... */
		int uuid[4];
		struct map_ent *me, *map = NULL;

		st->ss->getinfo_super(st, &info, NULL);
		st->ss->uuid_from_super(st, uuid);
		me = map_by_uuid(&map, uuid);
		if (me) {
			mddev = me->path;
			disc.number = info.disk.number;
			activity = "undetected";
			if (mddev && (fd = open(mddev, O_RDONLY)) >= 0) {
				if (md_array_active(fd)) {
					if (md_get_disk_info(fd, &disc) >= 0 &&
					    makedev((unsigned)disc.major, (unsigned)disc.minor) == stb.st_rdev)
						activity = "active";
					else
						activity = "mismatch";
				}
				close(fd);
			}
		} else {
			activity = "inactive";
			mddev = "array";
		}
		printf("%s: device %d in %d device %s %s %s. Use mdadm --examine for more detail.\n",
		       dev,
		       info.disk.number, info.array.raid_disks,
		       activity,
		       map_num(pers, info.array.level),
		       mddev);
		if (st->ss == &super0)
			put_md_name(mddev);
	}
	return 0;
}

122
README.initramfs Normal file

@ -0,0 +1,122 @@
Assembling md arrays at boot time.
---------------------------------
December 2005
These notes apply to 2.6 kernels only and, in some cases,
to 2.6.15 or later.
Md arrays can be assembled at boot time using the 'autodetect' functionality
which is triggered by storing components of an array in partitions of type
'fd' - Linux Raid Autodetect.
They can also be assembled by specifying the component devices in a
kernel parameter such as
md=0,/dev/sda,/dev/sdb
In this case, /dev/md0 will be assembled (because of the 0) from the listed
devices.
These mechanisms, while useful, do not provide complete functionality
and are unlikely to be extended. The preferred way to assemble md
arrays at boot time is using 'mdadm'. To assemble an array which
contains the root filesystem, mdadm needs to be run before that
filesystem is mounted, and so needs to be run from an initial-ram-fs.
How this works is the primary focus of this document.
It should be noted up front that only the array containing the root
filesystem should be assembled from the initramfs. Any other arrays
should be assembled under the control of files on the main filesystem
as this enhances flexibility and maintainability.
A minimal initramfs for assembling md arrays can be created using 3
files and one directory. These are:
  /bin           Directory
  /bin/mdadm     statically linked mdadm binary
  /bin/busybox   statically linked busybox binary
  /bin/sh        hard link to /bin/busybox
  /init          a shell script which calls mdadm appropriately.
An example init script is:
==============================================
#!/bin/sh
echo 'Auto-assembling boot md array'
mkdir /proc
mount -t proc proc /proc
if [ -n "$rootuuid" ]
then arg=--uuid=$rootuuid
elif [ -n "$mdminor" ]
then arg=--super-minor=$mdminor
else arg=--super-minor=0
fi
echo "Using $arg"
mdadm -Acpartitions $arg --auto=part /dev/mda
cd /
mount /dev/mda1 /root || mount /dev/mda /root
umount /proc
cd /root
exec chroot . /sbin/init < /dev/console > /dev/console 2>&1
=============================================
This could certainly be extended, or merged into a larger init script.
Though tested and in production use, it is not presented here as
"The Right Way" to do it, but as a useful example.
Some key points are:
/proc needs to be mounted so that /proc/partitions can be accessed
by mdadm, and so that /proc/filesystems can be accessed by mount.
The uuid of the array can be passed in as a kernel parameter
(rootuuid). As the kernel doesn't use this value, it is made available
in the environment for /init.
If no uuid is given, we default to md0 (--super-minor=0), which is
commonly used to store the root filesystem. This may not work in
all situations.
We assemble the array as a partitionable array (/dev/mda) even if we
end up using the whole array. There is no cost in using the partitionable
interface, and in this context it is simpler.
We try mounting both /dev/mda1 and /dev/mda as they are the most likely
parts of the array to contain the root filesystem.
The --auto flag is given to mdadm so that it will create /dev/md*
files automatically. This is needed as /dev will not contain
any md files, and udev will not create them (as udev only creates device
files after the device exists, and mdadm needs the device file to create
the device). Note that the created md files may not exist in /dev
of the mounted root filesystem. This needs to be dealt with separately
from mdadm - possibly using udev.
We do not need to create device files for the components which will
be assembled into /dev/mda. mdadm finds the major/minor numbers from
/proc/partitions and creates a temporary /dev file if one doesn't already
exist.
The script "mkinitramfs" which is included with the mdadm distribution
can be used to create a minimal initramfs. It creates a file called
'init.cpio.gz' which can be specified as an 'initrd' to lilo or grub
(or whatever boot loader is being used).
Resume from an md array
-----------------------
If you want to make use of the suspend-to-disk/resume functionality in Linux,
and want to have swap on an md array, you will need to assemble the array
before resume is possible.
However, because the array is active in the resumed image, you do not want
anything written to any drives during the resume process, such as superblock
updates or array resync.
This can be achieved in 2.6.15-rc1 and later kernels using the
'start_ro' parameter of the md_mod module.
Simply include the command
echo 1 > /sys/module/md_mod/parameters/start_ro
before assembling the array with 'mdadm'.
You can then echo
9:0
or whatever is appropriate to /sys/power/resume to trigger the resume.

656
ReadMe.c Normal file

@ -0,0 +1,656 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2001-2016 Neil Brown <neilb@suse.com>
* Copyright (C) 2016-2017 Jes Sorensen <Jes.Sorensen@gmail.com>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
* Maintainer: Jes Sorensen
* Email: <Jes.Sorensen@gmail.com>
*/
#include "mdadm.h"
#ifndef VERSION
#define VERSION "4.2"
#endif
#ifndef VERS_DATE
#define VERS_DATE "2021-12-30"
#endif
#ifndef EXTRAVERSION
#define EXTRAVERSION ""
#endif
char Version[] = "mdadm - v" VERSION " - " VERS_DATE EXTRAVERSION "\n";
/*
* File: ReadMe.c
*
* This file contains general comments about the implementation
* and the various usage messages that can be displayed by mdadm
*
*/
/*
* mdadm has 8 major modes of operation:
* 1/ Create
* This mode is used to create a new array with a superblock
* 2/ Assemble
* This mode is used to assemble the parts of a previously created
* array into an active array. Components can be explicitly given
* or can be searched for. mdadm (optionally) checks that the components
* do form a bona-fide array, and can, on request, fiddle superblock
* version numbers so as to assemble a faulty array.
* 3/ Build
* This is for building legacy arrays without superblocks
* 4/ Manage
* This is for doing something to one or more devices
* in an array, such as add,remove,fail.
* run/stop/readonly/readwrite are also available
* 5/ Misc
* This is for doing things to individual devices.
* They might be parts of an array so
* zero-superblock, examine might be appropriate
* They might be md arrays so
* run,stop,rw,ro,detail might be appropriate
* Also query will treat it as either
* 6/ Monitor
* This mode never exits but just monitors arrays and reports changes.
* 7/ Grow
* This mode allows for changing of key attributes of a raid array, such
* as size, number of devices, and possibly even layout.
* 8/ Incremental
* This assembles an array incrementally instead of all at once.
* As devices are discovered they can be passed to "mdadm --incremental"
* which will collect them. When enough devices to form an array are
* found, it is started.
*/
char short_options[]="-ABCDEFGIQhVXYWZ:vqbc:i:l:p:m:r:n:x:u:c:d:z:U:N:safRSow1tye:k";
char short_bitmap_options[]=
"-ABCDEFGIQhVXYWZ:vqb:c:i:l:p:m:r:n:x:u:c:d:z:U:N:sarfRSow1tye:k:";
char short_bitmap_auto_options[]=
"-ABCDEFGIQhVXYWZ:vqb:c:i:l:p:m:r:n:x:u:c:d:z:U:N:sa:rfRSow1tye:k:";
struct option long_options[] = {
{"manage", 0, 0, ManageOpt},
{"misc", 0, 0, MiscOpt},
{"assemble", 0, 0, 'A'},
{"build", 0, 0, 'B'},
{"create", 0, 0, 'C'},
{"detail", 0, 0, 'D'},
{"examine", 0, 0, 'E'},
{"follow", 0, 0, 'F'},
{"grow", 0, 0, 'G'},
{"incremental",0,0, 'I'},
{"zero-superblock", 0, 0, KillOpt}, /* deliberately not a short_option */
{"query", 0, 0, 'Q'},
{"examine-bitmap", 0, 0, 'X'},
{"auto-detect", 0, 0, AutoDetect},
{"detail-platform", 0, 0, DetailPlatform},
{"kill-subarray", 1, 0, KillSubarray},
{"update-subarray", 1, 0, UpdateSubarray},
{"udev-rules", 2, 0, UdevRules},
{"offroot", 0, 0, OffRootOpt},
{"examine-badblocks", 0, 0, ExamineBB},
{"dump", 1, 0, Dump},
{"restore", 1, 0, Restore},
/* synonyms */
{"monitor", 0, 0, 'F'},
/* after those will normally come the name of the md device */
{"help", 0, 0, 'h'},
{"help-options",0,0, HelpOptions},
{"version", 0, 0, 'V'},
{"verbose", 0, 0, 'v'},
{"quiet", 0, 0, 'q'},
/* For create or build: */
{"chunk", 1, 0, ChunkSize},
{"rounding", 1, 0, ChunkSize}, /* for linear, chunk is really a
* rounding number */
{"level", 1, 0, 'l'}, /* 0,1,4,5,6,linear */
{"parity", 1, 0, Layout}, /* {left,right}-{a,}symmetric */
{"layout", 1, 0, Layout},
{"raid-disks",1, 0, 'n'},
{"raid-devices",1, 0, 'n'},
{"spare-disks",1,0, 'x'},
{"spare-devices",1,0, 'x'},
{"size", 1, 0, 'z'},
{"auto", 1, 0, Auto}, /* also for --assemble */
{"assume-clean",0,0, AssumeClean },
{"metadata", 1, 0, 'e'}, /* superblock format */
{"bitmap", 1, 0, Bitmap},
{"bitmap-chunk", 1, 0, BitmapChunk},
{"write-behind", 2, 0, WriteBehind},
{"write-mostly",0, 0, WriteMostly},
{"failfast", 0, 0, FailFast},
{"nofailfast",0, 0, NoFailFast},
{"re-add", 0, 0, ReAdd},
{"homehost", 1, 0, HomeHost},
{"symlinks", 1, 0, Symlinks},
{"data-offset",1, 0, DataOffset},
{"nodes",1, 0, Nodes}, /* also for --assemble */
{"home-cluster",1, 0, ClusterName},
{"write-journal",1, 0, WriteJournal},
{"consistency-policy", 1, 0, 'k'},
/* For assemble */
{"uuid", 1, 0, 'u'},
{"super-minor",1,0, SuperMinor},
{"name", 1, 0, 'N'},
{"config", 1, 0, ConfigFile},
{"scan", 0, 0, 's'},
{"force", 0, 0, Force},
{"update", 1, 0, 'U'},
{"freeze-reshape", 0, 0, FreezeReshape},
/* Management */
{"add", 0, 0, Add},
{"add-spare", 0, 0, AddSpare},
{"add-journal", 0, 0, AddJournal},
{"remove", 0, 0, Remove},
{"fail", 0, 0, Fail},
{"set-faulty",0, 0, Fail},
{"replace", 0, 0, Replace},
{"with", 0, 0, With},
{"run", 0, 0, 'R'},
{"stop", 0, 0, 'S'},
{"readonly", 0, 0, 'o'},
{"readwrite", 0, 0, 'w'},
{"no-degraded",0,0, NoDegraded },
{"wait", 0, 0, WaitOpt},
{"wait-clean", 0, 0, Waitclean },
{"action", 1, 0, Action },
{"cluster-confirm", 0, 0, ClusterConfirm},
/* For Detail/Examine */
{"brief", 0, 0, Brief},
{"no-devices",0, 0, NoDevices},
{"export", 0, 0, 'Y'},
{"sparc2.2", 0, 0, Sparc22},
{"test", 0, 0, 't'},
{"prefer", 1, 0, Prefer},
/* For Follow/monitor */
{"mail", 1, 0, EMail},
{"program", 1, 0, ProgramOpt},
{"alert", 1, 0, ProgramOpt},
{"increment", 1, 0, Increment},
{"delay", 1, 0, 'd'},
{"daemonise", 0, 0, Fork},
{"daemonize", 0, 0, Fork},
{"oneshot", 0, 0, '1'},
{"pid-file", 1, 0, 'i'},
{"syslog", 0, 0, 'y'},
{"no-sharing", 0, 0, NoSharing},
/* For Grow */
{"backup-file", 1,0, BackupFile},
{"invalid-backup",0,0,InvalidBackup},
{"array-size", 1, 0, 'Z'},
{"continue", 0, 0, Continue},
/* For Incremental */
{"rebuild-map", 0, 0, RebuildMapOpt},
{"path", 1, 0, IncrementalPath},
{0, 0, 0, 0}
};
char Usage[] =
"Usage: mdadm --help\n"
" for help\n"
;
char Help[] =
"mdadm is used for building, managing, and monitoring\n"
"Linux md devices (aka RAID arrays)\n"
"Usage: mdadm --create device options...\n"
" Create a new array from unused devices.\n"
" mdadm --assemble device options...\n"
" Assemble a previously created array.\n"
" mdadm --build device options...\n"
" Create or assemble an array without metadata.\n"
" mdadm --manage device options...\n"
" make changes to an existing array.\n"
" mdadm --misc options... devices\n"
" report on or modify various md related devices.\n"
" mdadm --grow options device\n"
" resize/reshape an active array\n"
" mdadm --incremental device\n"
" add/remove a device to/from an array as appropriate\n"
" mdadm --monitor options...\n"
" Monitor one or more arrays for significant changes.\n"
" mdadm device options...\n"
" Shorthand for --manage.\n"
"Any parameter that does not start with '-' is treated as a device name\n"
"or, for --examine-bitmap, a file name.\n"
"The first such name is often the name of an md device. Subsequent\n"
"names are often names of component devices.\n"
"\n"
" For detailed help on the above major modes use --help after the mode\n"
" e.g.\n"
" mdadm --assemble --help\n"
" For general help on options use\n"
" mdadm --help-options\n"
;
char OptionHelp[] =
"Any parameter that does not start with '-' is treated as a device name\n"
"or, for --examine-bitmap, a file name.\n"
"The first such name is often the name of an md device. Subsequent\n"
"names are often names of component devices.\n"
"\n"
"Some common options are:\n"
" --help -h : General help message or, after above option,\n"
" mode specific help message\n"
" --help-options : This help message\n"
" --version -V : Print version information for mdadm\n"
" --verbose -v : Be more verbose about what is happening\n"
" --quiet -q : Don't print unnecessary messages\n"
" --brief -b : Be less verbose, more brief\n"
" --export -Y : With --detail, --detail-platform or --examine use\n"
" key=value format for easy import into environment\n"
" --force -f : Override normal checks and be more forceful\n"
"\n"
" --assemble -A : Assemble an array\n"
" --build -B : Build an array without metadata\n"
" --create -C : Create a new array\n"
" --detail -D : Display details of an array\n"
" --examine -E : Examine superblock on an array component\n"
" --examine-bitmap -X: Display the detail of a bitmap file\n"
" --examine-badblocks: Display list of known bad blocks on device\n"
" --monitor -F : monitor (follow) some arrays\n"
" --grow -G : resize/reshape an array\n"
" --incremental -I : add/remove a single device to/from an array as appropriate\n"
" --query -Q : Display general information about how a\n"
" device relates to the md driver\n"
" --auto-detect : Start arrays auto-detected by the kernel\n"
;
/*
"\n"
" For create or build:\n"
" --bitmap= -b : File to store bitmap in - may pre-exist for --build\n"
" --chunk= -c : chunk size in kibibytes\n"
" --rounding= : rounding factor for linear array (==chunk size)\n"
" --level= -l : raid level: 0,1,4,5,6,10,linear, or mp for create.\n"
" : 0,1,10,mp,faulty or linear for build.\n"
" --parity= -p : raid5/6 parity algorithm: {left,right}-{,a}symmetric\n"
" --layout= : same as --parity, for RAID10: [fno]NN \n"
" --raid-devices= -n : number of active devices in array\n"
" --spare-devices= -x: number of spare (eXtra) devices in initial array\n"
" --size= -z : Size (in K) of each drive in RAID1/4/5/6/10 - optional\n"
" --force -f : Honour devices as listed on command line. Don't\n"
" : insert a missing drive for RAID5.\n"
" --assume-clean : Assume the array is already in-sync. This is dangerous for RAID5.\n"
" --bitmap-chunk= : chunksize of bitmap in bitmap file (Kilobytes)\n"
" --delay= -d : seconds between bitmap updates\n"
" --write-behind= : number of simultaneous write-behind requests to allow (requires bitmap)\n"
" --name= -N : Textual name for array - max 32 characters\n"
"\n"
" For assemble:\n"
" --bitmap= -b : File to find bitmap information in\n"
" --uuid= -u : uuid of array to assemble. Devices which don't\n"
" have this uuid are excluded\n"
" --super-minor= -m : minor number to look for in super-block when\n"
" choosing devices to use.\n"
" --name= -N : Array name to look for in super-block.\n"
" --config= -c : config file\n"
" --scan -s : scan config file for missing information\n"
" --force -f : Assemble the array even if some superblocks appear out-of-date\n"
" --update= -U : Update superblock: try '-A --update=?' for list of options.\n"
" --no-degraded : Do not start any degraded arrays - default unless --scan.\n"
"\n"
" For detail or examine:\n"
" --brief -b : Just print device name and UUID\n"
"\n"
" For follow/monitor:\n"
" --mail= -m : Address to mail alerts of failure to\n"
" --program= -p : Program to run when an event is detected\n"
" --alert= : same as --program\n"
" --delay= -d : seconds of delay between polling state. default=60\n"
"\n"
" General management:\n"
" --add -a : add, or hotadd subsequent devices\n"
" --re-add : re-add a recently removed device\n"
" --remove -r : remove subsequent devices\n"
" --fail -f : mark subsequent devices as faulty\n"
" --set-faulty : same as --fail\n"
" --replace : mark a device for replacement\n"
" --run -R : start a partially built array\n"
" --stop -S : deactivate array, releasing all resources\n"
" --readonly -o : mark array as readonly\n"
" --readwrite -w : mark array as readwrite\n"
" --zero-superblock : erase the MD superblock from a device.\n"
" --wait -W : wait for recovery/resync/reshape to finish.\n"
;
*/
char Help_create[] =
"Usage: mdadm --create device --chunk=X --level=Y --raid-devices=Z devices\n"
"\n"
" This usage will initialise a new md array, associate some\n"
" devices with it, and activate the array. In order to create an\n"
" array with some devices missing, use the special word 'missing' in\n"
" place of the relevant device name.\n"
"\n"
" Before devices are added, they are checked to see if they already contain\n"
" raid superblocks or filesystems. They are also checked to see if\n"
" the variance in device size exceeds 1%.\n"
" If any discrepancy is found, the user will be prompted for confirmation\n"
" before the array is created. The presence of a '--run' can override this\n"
" caution.\n"
"\n"
" If the --size option is given then only that many kilobytes of each\n"
" device is used, no matter how big each device is.\n"
" If no --size is given, the apparent size of the smallest drive given\n"
" is used for raid level 1 and greater, and the full device is used for\n"
" other levels.\n"
"\n"
" Options that are valid with --create (-C) are:\n"
" --bitmap= -b : Create a bitmap for the array with the given filename\n"
" : or an internal bitmap if 'internal' is given\n"
" --chunk= -c : chunk size in kibibytes\n"
" --rounding= : rounding factor for linear array (==chunk size)\n"
" --level= -l : raid level: 0,1,4,5,6,10,linear,multipath and synonyms\n"
" --parity= -p : raid5/6 parity algorithm: {left,right}-{,a}symmetric\n"
" --layout= : same as --parity, for RAID10: [fno]NN \n"
" --raid-devices= -n : number of active devices in array\n"
" --spare-devices= -x : number of spare (eXtra) devices in initial array\n"
" --size= -z : Size (in K) of each drive in RAID1/4/5/6/10 - optional\n"
" --data-offset= : Space to leave between start of device and start\n"
" : of array data.\n"
" --force -f : Honour devices as listed on command line. Don't\n"
" : insert a missing drive for RAID5.\n"
" --run -R : insist on running the array even if not all\n"
" : devices are present or some look odd.\n"
" --readonly -o : start the array readonly - not supported yet.\n"
" --name= -N : Textual name for array - max 32 characters\n"
" --bitmap-chunk= : bitmap chunksize in Kilobytes.\n"
" --delay= -d : bitmap update delay in seconds.\n"
" --write-journal= : Specify journal device for RAID-4/5/6 array\n"
" --consistency-policy= : Specify the policy that determines how the array\n"
" -k : maintains consistency in case of unexpected shutdown.\n"
"\n"
;
char Help_build[] =
"Usage: mdadm --build device --chunk=X --level=Y --raid-devices=Z devices\n"
"\n"
" This usage is similar to --create. The difference is that it creates\n"
" a legacy array without a superblock. With these arrays there is no\n"
" difference between initially creating the array and subsequently\n"
" assembling the array, except that hopefully there is useful data\n"
" there in the second case.\n"
"\n"
" The level may only be 0, 1, 10, linear, multipath, or faulty.\n"
" All devices must be listed and the array will be started once complete.\n"
" Options that are valid with --build (-B) are:\n"
" --bitmap= : file to store/find bitmap information in.\n"
" --chunk= -c : chunk size in kibibytes\n"
" --rounding= : rounding factor for linear array (==chunk size)\n"
" --level= -l : 0, 1, 10, linear, multipath, faulty\n"
" --raid-devices= -n : number of active devices in array\n"
" --bitmap-chunk= : bitmap chunksize in Kilobytes.\n"
" --delay= -d : bitmap update delay in seconds.\n"
;
char Help_assemble[] =
"Usage: mdadm --assemble device options...\n"
" mdadm --assemble --scan options...\n"
"\n"
"This usage assembles one or more raid arrays from pre-existing\n"
"components.\n"
"For each array, mdadm needs to know the md device, the identity of\n"
"the array, and a number of sub devices. These can be found in a number\n"
"of ways.\n"
"\n"
"The md device is given on the command line, is found listed in the\n"
"config file, or can be deduced from the array identity.\n"
"The array identity is determined either from the --uuid, --name, or\n"
"--super-minor commandline arguments, from the config file,\n"
"or from the first component device on the command line.\n"
"\n"
"The different combinations of these are as follows:\n"
" If the --scan option is not given, then only devices and identities\n"
" listed on the command line are considered.\n"
" The first device will be the array device, and the remainder will be\n"
" examined when looking for components.\n"
" If an explicit identity is given with --uuid or --super-minor, then\n"
" only devices with a superblock which matches that identity are considered,\n"
" otherwise every device listed is considered.\n"
"\n"
" If the --scan option is given, and no devices are listed, then\n"
" every array listed in the config file is considered for assembly.\n"
" The identity of candidate devices is determined from the config file.\n"
" After these arrays are assembled, mdadm will look for other devices\n"
" that could form further arrays and try to assemble them. This can\n"
" be disabled using the 'AUTO' option in the config file.\n"
"\n"
" If the --scan option is given as well as one or more devices, then\n"
" those devices are md devices that are to be assembled. Their identity\n"
" and components are determined from the config file.\n"
"\n"
" If mdadm can not find all of the components for an array, it will assemble\n"
" it but not activate it unless --run or --scan is given. To preserve this\n"
" behaviour even with --scan, add --no-degraded. Note that \"all of the\n"
" components\" means as many as were present the last time the array was running\n"
" as recorded in the superblock. If the array was already degraded, and\n"
" the missing device is not a new problem, it will still be assembled. It\n"
" is only newly missing devices that cause the array not to be started.\n"
"\n"
"Options that are valid with --assemble (-A) are:\n"
" --bitmap= : bitmap file to use with the array\n"
" --uuid= -u : uuid of array to assemble. Devices which don't\n"
" have this uuid are excluded\n"
" --super-minor= -m : minor number to look for in super-block when\n"
" choosing devices to use.\n"
" --name= -N : Array name to look for in super-block.\n"
" --config= -c : config file\n"
" --scan -s : scan config file for missing information\n"
" --run -R : Try to start the array even if not enough devices\n"
" for a full array are present\n"
" --force -f : Assemble the array even if some superblocks appear\n"
" : out-of-date. This involves modifying the superblocks.\n"
" --update= -U : Update superblock: try '-A --update=?' for option list.\n"
" --no-degraded : Assemble but do not start degraded arrays.\n"
" --readonly -o : Mark the array as read-only. No resync will start.\n"
;
char Help_manage[] =
"Usage: mdadm arraydevice options component devices...\n"
"\n"
"This usage is for managing the component devices within an array.\n"
"The --manage option is not needed and is assumed if the first argument\n"
"is a device name or a management option.\n"
"The first device listed will be taken to be an md array device, any\n"
"subsequent devices are (potential) components of that array.\n"
"\n"
"Options that are valid with management mode are:\n"
" --add -a : hotadd subsequent devices to the array\n"
" --re-add : subsequent devices are re-added if they were\n"
" : recent members of the array\n"
" --remove -r : remove subsequent devices, which must not be active\n"
" --fail -f : mark subsequent devices as faulty\n"
" --set-faulty : same as --fail\n"
" --replace : mark device(s) to be replaced by spares. Once\n"
" : replacement completes, device will be marked faulty\n"
" --with : Indicate which spare a previous '--replace' should\n"
" : prefer to use\n"
" --run -R : start a partially built array\n"
" --stop -S : deactivate array, releasing all resources\n"
" --readonly -o : mark array as readonly\n"
" --readwrite -w : mark array as readwrite\n"
;
char Help_misc[] =
"Usage: mdadm misc_option devices...\n"
"\n"
"This usage is for performing some task on one or more devices, which\n"
"may be arrays or components, depending on the task.\n"
"The --misc option is not needed (though it is allowed) and is assumed\n"
"if the first argument is a misc option.\n"
"\n"
"Options that are valid with the miscellaneous mode are:\n"
" --query -Q : Display general information about how a\n"
" device relates to the md driver\n"
" --detail -D : Display details of an array\n"
" --detail-platform : Display hardware/firmware details\n"
" --examine -E : Examine superblock on an array component\n"
" --examine-bitmap -X: Display contents of a bitmap file\n"
" --examine-badblocks: Display list of known bad blocks on device\n"
" --zero-superblock : erase the MD superblock from a device.\n"
" --run -R : start a partially built array\n"
" --stop -S : deactivate array, releasing all resources\n"
" --readonly -o : mark array as readonly\n"
" --readwrite -w : mark array as readwrite\n"
" --test -t : exit status 0 if ok, 1 if degraded, 2 if dead, 4 if missing\n"
" --wait -W : wait for resync/rebuild/recovery to finish\n"
" --action= : initiate or abort ('idle' or 'frozen') a 'check' or 'repair'.\n"
;
char Help_monitor[] =
"Usage: mdadm --monitor options devices\n"
"\n"
"This usage causes mdadm to monitor a number of md arrays by periodically\n"
"polling their status and acting on any changes.\n"
"If any devices are listed then those devices are monitored, otherwise\n"
"all devices listed in the config file are monitored.\n"
"The address for mailing advisories to, and the program to handle\n"
"each change can be specified in the config file or on the command line.\n"
"There must be at least one destination for advisories, whether\n"
"an email address, a program, or --syslog.\n"
"\n"
"Options that are valid with the monitor (-F --follow) mode are:\n"
" --mail= -m : Address to mail alerts of failure to\n"
" --program= -p : Program to run when an event is detected\n"
" --alert= : same as --program\n"
" --syslog -y : Report alerts via syslog\n"
" --increment= -r : Report RebuildNN events in the given increment. default=20\n"
" --delay= -d : seconds of delay between polling state. default=60\n"
" --config= -c : specify a different config file\n"
" --scan -s : find mail-address/program in config file\n"
" --daemonise -f : Fork and continue in child, parent exits\n"
" --pid-file= -i : In daemon mode write pid to specified file instead of stdout\n"
" --oneshot -1 : Check for degraded arrays, then exit\n"
" --test -t : Generate a TestMessage event against each array at startup\n"
;
char Help_grow[] =
"Usage: mdadm --grow device options\n"
"\n"
"This usage causes mdadm to attempt to reconfigure a running array.\n"
"This is only possible if the kernel being used supports a particular\n"
"reconfiguration.\n"
"\n"
"Options that are valid with the grow (-G --grow) mode are:\n"
" --level= -l : Tell mdadm what level to convert the array to.\n"
" --layout= -p : For a FAULTY array, set/change the error mode.\n"
" : for other arrays, update the layout\n"
" --size= -z : Change the active size of devices in an array.\n"
" : This is useful if all devices have been replaced\n"
" : with larger devices. Value is in Kilobytes, or\n"
" : the special word 'max' meaning 'as large as possible'.\n"
" --assume-clean : When increasing the --size, this flag will avoid\n"
" : a resync of the new space\n"
" --chunk= -c : Change the chunksize of the array\n"
" --raid-devices= -n : Change the number of active devices in an array.\n"
" --add= -a : Add listed devices as part of reshape. This is\n"
" : needed for resizing a RAID0 which cannot have\n"
" : spares already present.\n"
" --bitmap= -b : Add or remove a write-intent bitmap.\n"
" --backup-file= file : A file on a different device to store data for a\n"
" : short time while increasing raid-devices on a\n"
" : RAID4/5/6 array. Also needed throughout a reshape\n"
" : when changing parameters other than raid-devices\n"
" --array-size= -Z : Change visible size of array. This does not change any\n"
" : data on the device, and is not stable across restarts.\n"
" --data-offset= : Location on device to move start of data to.\n"
" --consistency-policy= : Change the consistency policy of an active array.\n"
" -k : Currently works only for PPL with RAID5.\n"
;
char Help_incr[] =
"Usage: mdadm --incremental [-Rqrsf] device\n"
"\n"
"This usage allows for incremental assembly of md arrays. Devices can be\n"
"added one at a time as they are discovered. Once an array has all expected\n"
"devices, it will be started.\n"
"\n"
"Optionally, the process can be reversed by using the fail option.\n"
"When fail mode is invoked, mdadm will see if the device belongs to an array\n"
"and then both fail (if needed) and remove the device from that array.\n"
"\n"
"Options that are valid with incremental assembly (-I --incremental) are:\n"
" --run -R : Run arrays as soon as a minimal number of devices are\n"
" : present rather than waiting for all expected.\n"
" --quiet -q : Don't print any information messages, just errors.\n"
" --rebuild-map -r : Rebuild the 'map' file that mdadm uses for tracking\n"
" : partial arrays.\n"
" --scan -s : Use with -R to start any arrays that have the minimal\n"
" : required number of devices, but are not yet started.\n"
" --fail -f : First fail (if needed) and then remove device from\n"
" : any array that it is a member of.\n"
;
char Help_config[] =
"The /etc/mdadm.conf config file:\n\n"
" The config file contains, apart from blank lines and comment lines that\n"
" start with a hash (#), array lines, device lines, and various\n"
" configuration lines.\n"
" Each line is constructed of a number of space separated words, and can\n"
" be continued on subsequent physical lines by indenting those lines.\n"
"\n"
" A device line starts with the word 'device' and then has a number of words\n"
" which identify devices. These words should be names of devices in the\n"
" filesystem, and can contain wildcards. There can be multiple words on each\n"
" device line, and multiple device lines. All devices so listed are checked\n"
" for relevant super blocks when assembling arrays.\n"
"\n"
" An array line starts with the word 'array'. This is followed by the name of\n"
" the array device in the filesystem, e.g. '/dev/md2'. Subsequent words\n"
" describe the identity of the array, used to recognise devices to include in the\n"
" array. The identity can be given as a UUID with a word starting 'uuid=', or\n"
" as a minor-number stored in the superblock using 'super-minor=', or as a list\n"
" of devices. This is given as a comma separated list of names, possibly\n"
" containing wildcards, preceded by 'devices='. If multiple criteria are given,\n"
" then a device must match all of them to be considered.\n"
"\n"
" Other configuration lines include:\n"
" mailaddr, mailfrom, program used for --monitor mode\n"
" create, auto used when creating device names in /dev\n"
" homehost, policy, part-policy used to guide policy in various\n"
" situations\n"
"\n"
;
char *mode_help[mode_count] = {
[0] = Help,
[ASSEMBLE] = Help_assemble,
[BUILD] = Help_build,
[CREATE] = Help_create,
[MANAGE] = Help_manage,
[MISC] = Help_misc,
[MONITOR] = Help_monitor,
[GROW] = Help_grow,
[INCREMENTAL] = Help_incr,
};
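Help_config says an ARRAY line may identify its array by uuid=, super-minor=, and/or a devices= list of wildcard names, and that when several criteria are given a device must match all of them. The following is a minimal sketch of that "all given criteria must match" rule. The structs `array_ident` and `dev_info` and the function names are hypothetical, not mdadm's config.c types. POSIX `fnmatch()` stands in for the wildcard matching.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identity parsed from one ARRAY line.
 * NULL / -1 means the criterion was not given. */
struct array_ident {
	const char *uuid;         /* uuid=, or NULL */
	int super_minor;          /* super-minor=, or -1 */
	const char *devices;      /* comma separated patterns from devices=, or NULL */
};

/* Hypothetical summary of what was read from one candidate device. */
struct dev_info {
	const char *name;         /* e.g. "/dev/sda1" */
	const char *uuid;         /* uuid found in its superblock */
	int minor;                /* minor number stored in its superblock */
};

/* True if name matches any comma separated wildcard pattern in list.
 * Patterns of 256 characters or more are skipped in this sketch. */
static bool matches_device_list(const char *list, const char *name)
{
	char pat[256];

	while (*list) {
		size_t len = strcspn(list, ",");   /* length of the next pattern */

		if (len < sizeof(pat)) {
			memcpy(pat, list, len);
			pat[len] = '\0';
			if (fnmatch(pat, name, 0) == 0)
				return true;
		}
		list += len;
		if (*list == ',')
			list++;
	}
	return false;
}

/* A device is accepted only if it satisfies every criterion that was given. */
bool device_matches(const struct array_ident *id, const struct dev_info *d)
{
	if (id->uuid && strcmp(id->uuid, d->uuid) != 0)
		return false;
	if (id->super_minor >= 0 && id->super_minor != d->minor)
		return false;
	if (id->devices && !matches_device_list(id->devices, d->name))
		return false;
	return true;
}
```

For example, with `devices=/dev/sd[ab]1,/dev/hdc*`, a `/dev/sdc1` carrying the right uuid is still rejected, because it fails the device-list criterion. An ARRAY line that gives no criteria at all accepts any device.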

213
TODO Normal file

@ -0,0 +1,213 @@
- add 'name' field to metadata type and use it.
- use validate_geometry more
- metadata should be able to check/reject bitmap stuff.
DDF:
Three new metadata types:
ddf - used only to create a container.
ddf-bvd - used to create an array in a container
ddf-svd - used to create a secondary array from bvds.
Usage:
mdadm -C /dev/ddf1 /dev/sd[abcdef]
mdadm -C /dev/md1 -e ddf /dev/sd[a-f]
mdadm -C /dev/md1 -l container /dev/sd[a-f]
Each of these create a new ddf container using all those
devices. The name 'ddf*' signals that ddf metadata should be used.
'-e ddf' only supports one level - 'container'. 'container' is only
supported by ddf.
mdadm -C /dev/md1 -l0 -n4 /dev/ddf1 # or maybe not ???
mdadm -C /dev/md1 -l1 -n2 /dev/sda /dev/sdb
If exactly one device is given, and it is a container, we select
devices from that container.
If devices are given that are already in use, they must be in use by
a container, and the array is created in the container.
If devices given are bvds, we slip under the hood to make
the svd arrays.
mdadm -A /dev/ddf ......
base drives make a container. Anything in that container is started
auto-read-only.
if /dev/ddf is already assembled, we assemble bvds and svds inside it.
2005-dec-20
Want an incremental assembly mode to work nicely with udev.
Core usage would be something like
mdadm --incr-assemble /dev/newdevice
This would
- examine the device to determine uuid etc.
- look for a match in /etc/mdadm.conf, abort if not found
- find that device and collect current contents
- perform an 'assemble' analysis to make sure we have the best set of devices.
- remove or add devices as appropriate
- possibly start the array if it was complete
Other usages could involve
- specify which array to auto-add to.
This requires an existing array for uuid matching... is there any point?
-
2004-june-02
* Don't print 'errors' flag, it is meaningless. DONE
* Handle new superblock format
* create device file on demand, particularly partitionable devices. DONE
BUT figure a way to create the partition devices.
auto=partN
* Use Event: interface to listen for events. DONE, untested
* Make sure mdadm -As can assemble multi-level RAIDs ok.
* --build to build raid1 or multipath arrays
clean or not ???
----------------------------------------------------------------------------
* mdadm --monitor to monitor failed multipath paths and re-instate them.
* Maybe make "--help" fit in 80x24 and have a --long-help with more info. DONE
* maybe "missing" instead of <bold>missing</> in doco DONE
* possibly wait for resync to start, or even finish while assembling.- NO
* -Db should have a devices= entry if possible. - DONE
* when assembling multipath arrays, ignore any error indicators. - DONE
* rationalise --monitor usage:
mdadm --monitor
doesn't do as expected. DONE
* --assemble could have a --update option. - DONE
following word can be:
sparc2.2
super-minor
* mdadm /dev/md11, where md11 is raid0 can segfault, particularly when looking in the
[UU_UUU] string ... which doesn't exist !
It should be more sensible. DONE
Example:
from Raimund Sacherer <raimund.sacherer@ngit.at>
mke2fs -m0 -q /dev/ram1 300
mount -n -t ext2 /dev/ram1 /tmp
echo DEVICE /dev/[sh]* >> /tmp/mdadm.conf
mdadm -Esb /dev/[sh]* 2>/dev/null >> /tmp/mdadm.conf
mdadm -ARsc /tmp/mdadm.conf
umount /tmp
?? Allow -S /dev/md? - current complains subsequent not a/d/r - DONE
* new "Query" mode to subsume --detail and --examine.
--query or -Q, takes a device and tells if it is an MD device,
and also tells if a raid superblock is found.
DONE
* write mdstat.c to parse /proc/mdstat file
Build list of arrays: name, rebuild-percent
DONE
* parse /proc/partitions and map major/minor into /dev/* names,
and use that for default DEVICE list ????
* --detail --scan to read /proc/mdstat, and then iterate over these,
but assume --brief. --verbose can override
check each subdevice to see if it is in conf_get_devs.
Warn if not.
DONE, but don't warn yet...
* Support multipath ... maybe...
maybe DONE
* --follow to syslog
* --follow to move spares around DONE
* --follow to notice other events: DONE
rebuild started
spare activated
spare removed
spare added
------------------------------------
- --examine --scan scans all drives and build an mdadm.conf file DONE
- check superblock checksum in examine DONE
- report "chunk" or "rounding" depending on raid level DONE
- report "linear" instead of "-1" for raid level DONE
- decode layout depending on raid level DONE
- --verbose and --force flags. DONE
- set md_minor, *_disks for Create - DONE
- for create raid5, how to choose between
all working, but not insync
one missing, one spare, insync DONE (--force)
- and for raid1 - some failed drives... (missing)
- when RUN_ARRAY, make sure *_disks counts are right
- get --detail to extract extra stuff from superblock,
like uuid DONE
- --detail --brief to give a config file line DONE
- parse config file. DONE
- test...
- when --assemble --scan, if an underlying device is an md device,
then try to assemble that device first.
- mdadm -S /dev/md0 /dev/md1 gives internal error FIXED
- mdadm --detail --scan print summary of what it can find? DONE
---------
Assemble doesn't add spares. - DONE
Create to allow "missing" name for devices.
Create to accept "--force" for do exactly what is requested
- get Assemble to upgrade devices if force flag.
ARRAY lines in config file to have super_minor=n
ARRAY lines in config file to have device=pattern, and only accept
those devices
If UUID given, insist on that
If not, but super_minor given, require all found with that minor
to have same uuid
If only device given, all valid supers on those devices must have
same uuid
allow /dev/mdX as first argument before any options
Possible --dry-run option for create and assemble --force
Assemble to check that all devices mentioned in superblock
are present.
New mode: --Monitor (or --Follow)
Periodically check status of all arrays (listed in config file).
Log every event and apparent cause - or differences
Email and alert - or run a program - for important events
Move spares around if necessary.
An Array line can have a spare-group= field that indicates that
the array shares spares with other arrays with the same
spare-group name.
If an array has a failed drive and no spares, then check all other
arrays in the spare group. If one has no failures and a spare,
then consider that spare.
Choose the smallest considered spare that is large enough.
If there is one, then hot-remove it from its home, and
hot-add it to the array in question.
--mail-to address
--alert-handler program
Will also extract information from /proc/mdstat if present,
and consider 20% marks in rebuild as events.
Events are:
drive fails - causes mail to be sent
rebuild started
spare activated
spare removed
spare added
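The spare-group note above describes a selection rule. When an array has a failed drive and no spares, the monitor looks at the other arrays in the same spare group. Only arrays with no failures are considered as donors, and the monitor picks the smallest of their spares that is still large enough. Below is a sketch of just that selection step. The structs, the fixed four-entry spare list, and `pick_spare()` are hypothetical. The actual hot-remove and hot-add are not shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical spare device record; size in sectors. */
struct spare {
	const char *dev;
	unsigned long long size;
};

/* Hypothetical per-array state a monitor might keep. */
struct md_array {
	const char *name;
	const char *spare_group;    /* spare-group= value, or NULL */
	int failed;                 /* number of failed devices */
	int nspares;
	struct spare spares[4];
	unsigned long long needed;  /* minimum size of a replacement, sectors */
};

/* Return the smallest spare, taken from a healthy array in target's spare
 * group, whose size is at least target->needed; NULL if there is none.
 * On success *donor is set to the array the spare would be moved from. */
const struct spare *pick_spare(struct md_array *arrays, int n,
			       const struct md_array *target,
			       struct md_array **donor)
{
	const struct spare *best = NULL;
	int i, j;

	/* Only act for a degraded array that has no spare of its own. */
	if (!target->spare_group || target->failed == 0 || target->nspares > 0)
		return NULL;

	for (i = 0; i < n; i++) {
		struct md_array *a = &arrays[i];

		if (a == target || !a->spare_group ||
		    strcmp(a->spare_group, target->spare_group) != 0)
			continue;
		if (a->failed > 0)          /* never take from a degraded array */
			continue;
		for (j = 0; j < a->nspares; j++) {
			const struct spare *s = &a->spares[j];

			if (s->size < target->needed)
				continue;           /* too small */
			if (!best || s->size < best->size) {
				best = s;           /* smallest adequate so far */
				*donor = a;
			}
		}
	}
	return best;
}
```

Choosing the smallest adequate spare leaves larger spares available for arrays with bigger members.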

534
bitmap.c Normal file

@ -0,0 +1,534 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2004 Paul Clements, SteelEye Technology, Inc.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include "mdadm.h"
static inline void sb_le_to_cpu(bitmap_super_t *sb)
{
sb->magic = __le32_to_cpu(sb->magic);
sb->version = __le32_to_cpu(sb->version);
/* uuid gets no translation */
sb->events = __le64_to_cpu(sb->events);
sb->events_cleared = __le64_to_cpu(sb->events_cleared);
sb->state = __le32_to_cpu(sb->state);
sb->chunksize = __le32_to_cpu(sb->chunksize);
sb->daemon_sleep = __le32_to_cpu(sb->daemon_sleep);
sb->sync_size = __le64_to_cpu(sb->sync_size);
sb->write_behind = __le32_to_cpu(sb->write_behind);
sb->nodes = __le32_to_cpu(sb->nodes);
sb->sectors_reserved = __le32_to_cpu(sb->sectors_reserved);
}
static inline void sb_cpu_to_le(bitmap_super_t *sb)
{
sb_le_to_cpu(sb); /* these are really the same thing */
}
mapping_t bitmap_states[] = {
{ "OK", 0 },
{ "Out of date", 2 },
{ NULL, -1 }
};
static const char *bitmap_state(int state_num)
{
char *state = map_num(bitmap_states, state_num);
return state ? state : "Unknown";
}
static const char *human_chunksize(unsigned long bytes)
{
static char buf[16];
char *suffixes[] = { "B", "KB", "MB", "GB", "TB", NULL };
int i = 0;
while (bytes >> 10) {
bytes >>= 10;
i++;
}
snprintf(buf, sizeof(buf), "%lu %s", bytes, suffixes[i]);
return buf;
}
typedef struct bitmap_info_s {
bitmap_super_t sb;
unsigned long long total_bits;
unsigned long long dirty_bits;
} bitmap_info_t;
/* count the dirty bits in the first num_bits of byte */
static inline int count_dirty_bits_byte(char byte, int num_bits)
{
int num = 0;
switch (num_bits) { /* fall through... */
case 8: if (byte & 128) num++;
case 7: if (byte & 64) num++;
case 6: if (byte & 32) num++;
case 5: if (byte & 16) num++;
case 4: if (byte & 8) num++;
case 3: if (byte & 4) num++;
case 2: if (byte & 2) num++;
case 1: if (byte & 1) num++;
default: break;
}
return num;
}
static int count_dirty_bits(char *buf, int num_bits)
{
int i, num = 0;
for (i = 0; i < num_bits / 8; i++)
num += count_dirty_bits_byte(buf[i], 8);
if (num_bits % 8) /* not an even byte boundary */
num += count_dirty_bits_byte(buf[i], num_bits % 8);
return num;
}
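count_dirty_bits_byte() relies on switch fall-through. Entering at case k tests bit k-1 and then every lower bit down to bit 0. So "the first num_bits" of a byte means its low-order bits, and a trailing partial byte contributes only those. The loop-based equivalent below, with hypothetical names `count_low_bits()` and `count_dirty()`, may make that easier to see. It takes `unsigned char` so that no sign extension is involved.

```c
/* Count set bits among the low num_bits bits of byte (num_bits 0..8).
 * Same result as count_dirty_bits_byte(): case 1 tests the value-1 bit,
 * case 8 additionally tests the value-128 bit. */
int count_low_bits(unsigned char byte, int num_bits)
{
	unsigned int v = byte & ((1u << num_bits) - 1);  /* keep low num_bits */
	int n = 0;

	while (v) {           /* clear the lowest set bit each round */
		v &= v - 1;
		n++;
	}
	return n;
}

/* Same structure as count_dirty_bits(): whole bytes, then a partial byte. */
int count_dirty(const unsigned char *buf, int num_bits)
{
	int i, n = 0;

	for (i = 0; i < num_bits / 8; i++)
		n += count_low_bits(buf[i], 8);
	if (num_bits % 8)
		n += count_low_bits(buf[i], num_bits % 8);
	return n;
}
```

With the bytes `ff 0f f0` and 20 bits, the result is 8 + 4 + 0 = 12. The high nibble of the last byte lies beyond bit 20 and is ignored.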
static bitmap_info_t *bitmap_fd_read(int fd, int brief)
{
/* Note: fd might be open O_DIRECT, so we must be
* careful to align reads properly
*/
unsigned long long total_bits = 0, read_bits = 0, dirty_bits = 0;
bitmap_info_t *info;
void *buf;
ssize_t n; /* signed, so a read() error (-1) is caught below */
unsigned int skip;
if (posix_memalign(&buf, 4096, 8192) != 0) {
pr_err("failed to allocate 8192 bytes\n");
return NULL;
}
n = read(fd, buf, 8192);
info = xmalloc(sizeof(*info));
if (n < (ssize_t)sizeof(info->sb)) {
pr_err("failed to read superblock of bitmap file: %s\n", strerror(errno));
free(info);
free(buf);
return NULL;
}
memcpy(&info->sb, buf, sizeof(info->sb));
skip = sizeof(info->sb);
sb_le_to_cpu(&info->sb); /* convert superblock to CPU byte ordering */
if (brief || info->sb.sync_size == 0 || info->sb.chunksize == 0)
goto out;
/* read the rest of the file counting total bits and dirty bits --
* we stop when either:
* 1) we hit EOF, in which case we assume the rest of the bits (if any)
* are dirty
* 2) we've read the full bitmap, in which case we ignore any trailing
* data in the file
*/
total_bits = bitmap_bits(info->sb.sync_size, info->sb.chunksize);
while(read_bits < total_bits) {
unsigned long long remaining = total_bits - read_bits;
if (n == 0) {
n = read(fd, buf, 8192);
skip = 0;
if (n <= 0)
break;
}
if (remaining > (n-skip) * 8) /* we want the full buffer */
remaining = (n-skip) * 8;
dirty_bits += count_dirty_bits(buf+skip, remaining);
read_bits += remaining;
n = 0;
}
if (read_bits < total_bits) { /* file truncated... */
pr_err("WARNING: bitmap file is not large enough for array size %llu!\n\n",
(unsigned long long)info->sb.sync_size);
total_bits = read_bits;
}
out:
free(buf);
info->total_bits = total_bits;
info->dirty_bits = dirty_bits;
return info;
}
static int
bitmap_file_open(char *filename, struct supertype **stp, int node_num, int fd)
{
struct stat stb;
struct supertype *st = *stp;
/* won't re-open filename when (fd >= 0) */
if (fd < 0)
fd = open(filename, O_RDONLY|O_DIRECT);
if (fd < 0) {
pr_err("failed to open bitmap file %s: %s\n",
filename, strerror(errno));
return -1;
}
if (fstat(fd, &stb) < 0) {
pr_err("fstat failed for %s: %s\n", filename, strerror(errno));
close(fd);
return -1;
}
if ((stb.st_mode & S_IFMT) == S_IFBLK) {
/* block device, so we are probably after an internal bitmap */
if (!st)
st = guess_super(fd);
if (!st) {
/* just look at device... */
lseek(fd, 0, 0);
} else if (!st->ss->locate_bitmap) {
pr_err("No bitmap possible with %s metadata\n",
st->ss->name);
close(fd);
return -1;
} else {
if (st->ss->locate_bitmap(st, fd, node_num)) {
pr_err("%s doesn't have bitmap\n", filename);
close(fd);
fd = -1;
}
}
*stp = st;
}
return fd;
}
static __u32 swapl(__u32 l)
{
char *c = (char*)&l;
char t= c[0];
c[0] = c[3];
c[3] = t;
t = c[1];
c[1] = c[2];
c[2] = t;
return l;
}
int ExamineBitmap(char *filename, int brief, struct supertype *st)
{
/*
* Read the bitmap file and display its contents
*/
bitmap_super_t *sb;
bitmap_info_t *info;
int rv = 1;
char buf[64];
int swap;
int fd, i;
__u32 uuid32[4];
fd = bitmap_file_open(filename, &st, 0, -1);
if (fd < 0)
return rv;
info = bitmap_fd_read(fd, brief);
if (!info)
return rv;
sb = &info->sb;
if (sb->magic != BITMAP_MAGIC) {
pr_err("This is an md array. To view a bitmap you need to examine\n");
pr_err("a member device, not the array.\n");
pr_err("Reporting bitmap that would be used if this array were used\n");
pr_err("as a member of some other array\n");
}
printf(" Filename : %s\n", filename);
printf(" Magic : %08x\n", sb->magic);
if (sb->magic != BITMAP_MAGIC) {
pr_err("invalid bitmap magic 0x%x, the bitmap file appears\n",
sb->magic);
pr_err("to be corrupted or missing.\n");
}
printf(" Version : %d\n", sb->version);
if (sb->version < BITMAP_MAJOR_LO ||
sb->version > BITMAP_MAJOR_CLUSTERED) {
pr_err("unknown bitmap version %d, either the bitmap file\n",
sb->version);
pr_err("is corrupted or you need to upgrade your tools\n");
goto free_info;
}
rv = 0;
if (st)
swap = st->ss->swapuuid;
else
#if __BYTE_ORDER == BIG_ENDIAN
swap = 0;
#else
swap = 1;
#endif
memcpy(uuid32, sb->uuid, 16);
if (swap)
printf(" UUID : %08x:%08x:%08x:%08x\n",
swapl(uuid32[0]),
swapl(uuid32[1]),
swapl(uuid32[2]),
swapl(uuid32[3]));
else
printf(" UUID : %08x:%08x:%08x:%08x\n",
uuid32[0],
uuid32[1],
uuid32[2],
uuid32[3]);
if (sb->nodes == 0) {
printf(" Events : %llu\n", (unsigned long long)sb->events);
printf(" Events Cleared : %llu\n", (unsigned long long)sb->events_cleared);
printf(" State : %s\n", bitmap_state(sb->state));
}
printf(" Chunksize : %s\n", human_chunksize(sb->chunksize));
printf(" Daemon : %ds flush period\n", sb->daemon_sleep);
if (sb->write_behind)
sprintf(buf, "Allow write behind, max %d", sb->write_behind);
else
sprintf(buf, "Normal");
printf(" Write Mode : %s\n", buf);
printf(" Sync Size : %llu%s\n", (unsigned long long)sb->sync_size/2,
human_size(sb->sync_size * 512));
if (sb->nodes == 0) {
if (brief)
goto free_info;
printf(" Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
info->total_bits, info->dirty_bits,
100.0 * info->dirty_bits / (info->total_bits?:1));
} else {
printf(" Cluster nodes : %d\n", sb->nodes);
printf(" Cluster name : %-64s\n", sb->cluster_name);
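int nodes = sb->nodes; /* sb points into info, which is freed in the loop */
for (i = 0; i < nodes; i++) {
st = NULL;
free(info);
info = NULL; /* avoid a double free if the open or read below fails */
fd = bitmap_file_open(filename, &st, i, fd);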
if (fd < 0) {
printf(" Unable to open bitmap file on node: %i\n", i);
continue;
}
info = bitmap_fd_read(fd, brief);
if (!info) {
printf(" Unable to read bitmap on node: %i\n", i);
continue;
}
sb = &info->sb;
if (sb->magic != BITMAP_MAGIC)
pr_err("invalid bitmap magic 0x%x, the bitmap file appears to be corrupted\n", sb->magic);
printf(" Node Slot : %d\n", i);
printf(" Events : %llu\n",
(unsigned long long)sb->events);
printf(" Events Cleared : %llu\n",
(unsigned long long)sb->events_cleared);
printf(" State : %s\n", bitmap_state(sb->state));
if (brief)
continue;
printf(" Bitmap : %llu bits (chunks), %llu dirty (%2.1f%%)\n",
info->total_bits, info->dirty_bits,
100.0 * info->dirty_bits / (info->total_bits?:1));
}
}
free_info:
close(fd);
free(info);
return rv;
}
int IsBitmapDirty(char *filename)
{
/*
* Read the bitmap file
* It will break reading bitmap action immediately when meeting any error.
*
* Return: 1(dirty), 0 (clean), -1(error)
*/
int fd = -1, rv = 0, i;
struct supertype *st = NULL;
bitmap_info_t *info = NULL;
bitmap_super_t *sb = NULL;
fd = bitmap_file_open(filename, &st, 0, fd);
free(st);
if (fd < 0)
goto out;
info = bitmap_fd_read(fd, 0);
if (!info) {
close(fd);
goto out;
}
sb = &info->sb;
for (i = 0; i < (int)sb->nodes; i++) {
st = NULL;
free(info);
info = NULL;
fd = bitmap_file_open(filename, &st, i, fd);
free(st);
if (fd < 0)
goto out;
info = bitmap_fd_read(fd, 0);
if (!info) {
close(fd);
goto out;
}
sb = &info->sb;
if (sb->magic != BITMAP_MAGIC) { /* invalid bitmap magic */
free(info);
close(fd);
goto out;
}
if (info->dirty_bits)
rv = 1;
}
close(fd);
free(info);
return rv;
out:
return -1;
}
int CreateBitmap(char *filename, int force, char uuid[16],
unsigned long chunksize, unsigned long daemon_sleep,
unsigned long write_behind,
unsigned long long array_size /* sectors */,
int major)
{
/*
* Create a bitmap file with a superblock and (optionally) a full bitmap
*/
FILE *fp;
int rv = 1;
char block[512];
bitmap_super_t sb;
long long bytes, filesize;
if (!force && access(filename, F_OK) == 0) {
pr_err("bitmap file %s already exists, use --force to overwrite\n", filename);
return rv;
}
fp = fopen(filename, "w");
if (fp == NULL) {
pr_err("failed to open bitmap file %s: %s\n",
filename, strerror(errno));
return rv;
}
if (chunksize == UnSet) {
/* We don't want more than 2^21 chunks, as 2^11 counters fill up one
* 4K page (2 bytes per chunk), and 2^10 addresses of those
* fill up a 4K indexing page. 2^20 might be safer, especially
* on 64bit hosts, so use that.
*/
chunksize = DEFAULT_BITMAP_CHUNK;
/* <<20 for 2^20 chunks, >>9 to convert bytes to sectors */
while (array_size > ((unsigned long long)chunksize << (20-9)))
chunksize <<= 1;
}
memset(&sb, 0, sizeof(sb));
sb.magic = BITMAP_MAGIC;
sb.version = major;
if (uuid != NULL)
memcpy(sb.uuid, uuid, 16);
sb.chunksize = chunksize;
sb.daemon_sleep = daemon_sleep;
sb.write_behind = write_behind;
sb.sync_size = array_size;
sb_cpu_to_le(&sb); /* convert to on-disk byte ordering */
if (fwrite(&sb, sizeof(sb), 1, fp) != 1) {
pr_err("failed to write superblock to bitmap file %s: %s\n", filename, strerror(errno));
goto out;
}
/* calculate the size of the bitmap and write it to disk */
bytes = (bitmap_bits(array_size, chunksize) + 7) / 8;
if (!bytes) {
rv = 0;
goto out;
}
filesize = bytes + sizeof(sb);
memset(block, 0xff, sizeof(block));
while (bytes > 0) {
if (fwrite(block, sizeof(block), 1, fp) != 1) {
pr_err("failed to write bitmap file %s: %s\n", filename, strerror(errno));
goto out;
}
bytes -= sizeof(block);
}
rv = 0;
fflush(fp);
/* make the file be the right size (well, to the nearest byte) */
if (ftruncate(fileno(fp), filesize))
perror("ftruncate");
out:
fclose(fp);
if (rv)
unlink(filename); /* possibly corrupted, better get rid of it */
return rv;
}
int bitmap_update_uuid(int fd, int *uuid, int swap)
{
struct bitmap_super_s bm;
if (lseek(fd, 0, 0) != 0)
return 1;
if (read(fd, &bm, sizeof(bm)) != sizeof(bm))
return 1;
if (bm.magic != __cpu_to_le32(BITMAP_MAGIC))
return 1;
copy_uuid(bm.uuid, uuid, swap);
if (lseek(fd, 0, 0) != 0)
return 2;
if (write(fd, &bm, sizeof(bm)) != sizeof(bm)) {
lseek(fd, 0, 0);
return 2;
}
lseek(fd, 0, 0);
return 0;
}
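When no chunk size is given, CreateBitmap() starts from DEFAULT_BITMAP_CHUNK and doubles the chunk size until the array needs at most 2^20 chunks. It then writes (bits + 7) / 8 bytes of bitmap after the superblock. The sketch below reproduces that arithmetic. Two values are assumptions here, not copied from the source: the default is taken as 4096 bytes, and `bitmap_bits()` is modelled as one bit per chunk, rounded up. Both are believed to match mdadm.h but should be checked there. All names carry a hypothetical `sketch_` prefix.

```c
/* Assumption: DEFAULT_BITMAP_CHUNK is 4096 bytes. */
#define SKETCH_DEFAULT_BITMAP_CHUNK 4096UL

/* One bit per chunk, rounded up.
 * array_size is in 512-byte sectors; chunksize is in bytes. */
unsigned long long sketch_bitmap_bits(unsigned long long array_size,
				      unsigned long chunksize)
{
	return (array_size * 512 + chunksize - 1) / chunksize;
}

/* Mirror of the CreateBitmap() loop: grow the chunk until the array
 * is covered by at most 2^20 chunks. */
unsigned long sketch_default_chunksize(unsigned long long array_size)
{
	unsigned long chunksize = SKETCH_DEFAULT_BITMAP_CHUNK;

	/* << 20 for 2^20 chunks, >> 9 to convert bytes to sectors */
	while (array_size > ((unsigned long long)chunksize << (20 - 9)))
		chunksize <<= 1;
	return chunksize;
}

/* Bytes of bitmap data that follow the superblock in the file. */
unsigned long long sketch_bitmap_bytes(unsigned long long array_size,
				       unsigned long chunksize)
{
	return (sketch_bitmap_bits(array_size, chunksize) + 7) / 8;
}
```

For a 1 TiB array (2^31 sectors) this settles on a 1 MiB chunk. That gives 2^20 bits, or 131072 bytes of bitmap after the 256-byte superblock. CreateBitmap() writes whole 512-byte blocks of 0xff and then uses ftruncate() to cut the file back to bytes + sizeof(sb).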

291
bitmap.h Normal file

@ -0,0 +1,291 @@
/*
* bitmap.h: Copyright (C) Peter T. Breuer (ptb@ot.uc3m.es) 2003
*
* additions: Copyright (C) 2003-2004, Paul Clements, SteelEye Technology, Inc.
*/
#ifndef BITMAP_H
#define BITMAP_H 1
#define BITMAP_MAJOR_LO 3
/* version 4 insists the bitmap is in little-endian order
* with version 3, it is host-endian which is non-portable
*/
#define BITMAP_MAJOR_HI 4
#define BITMAP_MAJOR_HOSTENDIAN 3
#define BITMAP_MAJOR_CLUSTERED 5
#define BITMAP_MINOR 39
/*
* in-memory bitmap:
*
* Use 16 bit block counters to track pending writes to each "chunk".
* The 2 high order bits are special-purpose, the first is a flag indicating
* whether a resync is needed. The second is a flag indicating whether a
* resync is active.
* This means that the counter is actually 14 bits:
*
* +--------+--------+------------------------------------------------+
* | resync | resync | counter |
* | needed | active | |
* | (0-1) | (0-1) | (0-16383) |
* +--------+--------+------------------------------------------------+
*
* The "resync needed" bit is set when:
* a '1' bit is read from storage at startup.
* a write request fails on some drives
* a resync is aborted on a chunk with 'resync active' set
* It is cleared (and resync-active set) when a resync starts across all drives
* of the chunk.
*
*
* The "resync active" bit is set when:
* a resync is started on all drives, and resync_needed is set.
* resync_needed will be cleared (as long as resync_active wasn't already set).
* It is cleared when a resync completes.
*
* The counter counts pending write requests, plus the on-disk bit.
* When the counter is '1' and the resync bits are clear, the on-disk
* bit can be cleared as well, thus setting the counter to 0.
* When we set a bit, or increment the counter (to start a write), if the field is
* 0, we first set the disk bit and set the counter to 1.
*
* If the counter is 0, the on-disk bit is clear and the stripe is clean.
* Anything that dirties the stripe pushes the counter to 2 (at least)
* and sets the on-disk bit (lazily).
* If a periodic sweep finds the counter at 2, it is decremented to 1.
* If the sweep finds the counter at 1, the on-disk bit is cleared and the
* counter goes to zero.
*
* Also, we'll hijack the "map" pointer itself and use it as two 16 bit block
* counters as a fallback when "page" memory cannot be allocated:
*
* Normal case (page memory allocated):
*
* page pointer (32-bit)
*
* [ ] ------+
* |
* +-------> [ ][ ]..[ ] (4096 byte page == 2048 counters)
* c1 c2 c2048
*
* Hijacked case (page memory allocation failed):
*
* hijacked page pointer (32-bit)
*
* [ ][ ] (no page memory allocated)
* counter #1 (16-bit) counter #2 (16-bit)
*
*/
#ifdef __KERNEL__
#define PAGE_BITS (PAGE_SIZE << 3)
#define PAGE_BIT_SHIFT (PAGE_SHIFT + 3)
typedef __u16 bitmap_counter_t;
#define COUNTER_BITS 16
#define COUNTER_BIT_SHIFT 4
#define COUNTER_BYTE_RATIO (COUNTER_BITS / 8)
#define COUNTER_BYTE_SHIFT (COUNTER_BIT_SHIFT - 3)
#define NEEDED_MASK ((bitmap_counter_t) (1 << (COUNTER_BITS - 1)))
#define RESYNC_MASK ((bitmap_counter_t) (1 << (COUNTER_BITS - 2)))
#define COUNTER_MAX ((bitmap_counter_t) RESYNC_MASK - 1)
#define NEEDED(x) (((bitmap_counter_t) x) & NEEDED_MASK)
#define RESYNC(x) (((bitmap_counter_t) x) & RESYNC_MASK)
#define COUNTER(x) (((bitmap_counter_t) x) & COUNTER_MAX)
/* how many counters per page? */
#define PAGE_COUNTER_RATIO (PAGE_BITS / COUNTER_BITS)
/* same, except a shift value for more efficient bitops */
#define PAGE_COUNTER_SHIFT (PAGE_BIT_SHIFT - COUNTER_BIT_SHIFT)
/* same, except a mask value for more efficient bitops */
#define PAGE_COUNTER_MASK (PAGE_COUNTER_RATIO - 1)
#define BITMAP_BLOCK_SIZE 512
#define BITMAP_BLOCK_SHIFT 9
/* how many blocks per chunk? (this is variable) */
#define CHUNK_BLOCK_RATIO(bitmap) ((bitmap)->chunksize >> BITMAP_BLOCK_SHIFT)
#define CHUNK_BLOCK_SHIFT(bitmap) ((bitmap)->chunkshift - BITMAP_BLOCK_SHIFT)
#define CHUNK_BLOCK_MASK(bitmap) (CHUNK_BLOCK_RATIO(bitmap) - 1)
/* when hijacked, the counters and bits represent even larger "chunks" */
/* there will be 1024 chunks represented by each counter in the page pointers */
#define PAGEPTR_BLOCK_RATIO(bitmap) \
(CHUNK_BLOCK_RATIO(bitmap) << PAGE_COUNTER_SHIFT >> 1)
#define PAGEPTR_BLOCK_SHIFT(bitmap) \
(CHUNK_BLOCK_SHIFT(bitmap) + PAGE_COUNTER_SHIFT - 1)
#define PAGEPTR_BLOCK_MASK(bitmap) (PAGEPTR_BLOCK_RATIO(bitmap) - 1)
/*
* on-disk bitmap:
*
* Use one bit per "chunk" (block set). We do the disk I/O on the bitmap
* file a page at a time. There's a superblock at the start of the file.
*/
/* map chunks (bits) to file pages - offset by the size of the superblock */
#define CHUNK_BIT_OFFSET(chunk) ((chunk) + (sizeof(bitmap_super_t) << 3))
#endif
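The comment above describes each in-memory counter as 16 bits. Bit 15 means "resync needed", bit 14 means "resync active", and the low 14 bits hold the count (0 to 16383). The NEEDED_MASK, RESYNC_MASK and COUNTER_MAX macros encode this layout. Below is a userspace sketch of decoding such a value and of the periodic sweep rule: 2 goes to 1, then 1 goes to 0 and the on-disk bit may be cleared. The type, macros, and functions are hypothetical and named to avoid clashing with the kernel-only ones. The sketch follows the comment's description rather than the kernel's md bitmap code.

```c
#include <stdint.h>

typedef uint16_t sketch_counter_t;

/* Same bit layout as NEEDED_MASK / RESYNC_MASK / COUNTER_MAX */
#define SK_NEEDED_MASK ((sketch_counter_t)(1u << 15))
#define SK_RESYNC_MASK ((sketch_counter_t)(1u << 14))
#define SK_COUNTER_MAX ((sketch_counter_t)(SK_RESYNC_MASK - 1))

struct counter_fields {
	int resync_needed;
	int resync_active;
	unsigned int pending;   /* 0..16383 */
};

struct counter_fields decode_counter(sketch_counter_t c)
{
	struct counter_fields f;

	f.resync_needed = (c & SK_NEEDED_MASK) != 0;
	f.resync_active = (c & SK_RESYNC_MASK) != 0;
	f.pending = c & SK_COUNTER_MAX;
	return f;
}

/* One sweep over a single counter, as the comment describes it.
 * Returns 1 when the caller should clear the on-disk bit. */
int sweep_step(sketch_counter_t *c)
{
	unsigned int n = *c & SK_COUNTER_MAX;

	if (*c & (SK_NEEDED_MASK | SK_RESYNC_MASK))
		return 0;           /* resync pending or running: leave it */
	if (n == 2) {
		*c = 1;             /* idle for one sweep */
		return 0;
	}
	if (n == 1) {
		*c = 0;             /* idle for two sweeps: chunk is clean */
		return 1;
	}
	return 0;
}
```

The two-step decay means a chunk must stay idle for a full sweep period before its on-disk bit is cleared.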
/*
* bitmap structures:
*/
#define BITMAP_MAGIC 0x6d746962
/* use these for bitmap->flags and bitmap->sb->state bit-fields */
enum bitmap_state {
BITMAP_ACTIVE = 0x001, /* the bitmap is in use */
BITMAP_STALE = 0x002 /* the bitmap file is out of date or had -EIO */
};
/* the superblock at the front of the bitmap file -- little endian */
typedef struct bitmap_super_s {
__u32 magic; /* 0 BITMAP_MAGIC */
__u32 version; /* 4 the bitmap major for now, could change... */
__u8 uuid[16]; /* 8 128 bit uuid - must match md device uuid */
__u64 events; /* 24 event counter for the bitmap (1)*/
__u64 events_cleared;/*32 event counter when last bit cleared (2) */
__u64 sync_size; /* 40 the size of the md device's sync range(3) */
__u32 state; /* 48 bitmap state information */
__u32 chunksize; /* 52 the bitmap chunk size in bytes */
__u32 daemon_sleep; /* 56 seconds between disk flushes */
__u32 write_behind; /* 60 number of outstanding write-behind writes */
__u32 sectors_reserved; /* 64 number of 512-byte sectors that are
* reserved for the bitmap. */
__u32 nodes; /* 68 the maximum number of nodes in cluster. */
__u8 cluster_name[64]; /* 72 cluster name to which this md belongs */
__u8 pad[256 - 136]; /* set to zero */
} bitmap_super_t;
/* notes:
* (1) This event counter is updated before the event counter in the md superblock.
* When a bitmap is loaded, it is only accepted if this event counter is equal
* to, or one greater than, the event counter in the superblock.
* (2) This event counter is updated when the other one is *if*and*only*if* the
* array is not degraded. As bits are not cleared when the array is degraded,
* this represents the last time that any bits were cleared.
* If a device is being added that has an event count with this value or
* higher, it is accepted as conforming to the bitmap.
* (3) This is the number of sectors represented by the bitmap, and is the range that
* resync happens across. For raid1 and raid5/6 it is the size of individual
* devices. For raid10 it is the size of the array.
*/
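The comments in bitmap_super_t give each field's byte offset, and the on-disk form is little-endian. A userspace mirror of the struct, written with `<stdint.h>` types instead of the kernel's `__u32`/`__u64`/`__u8`, lets C11 `_Static_assert` check those offsets and the 256-byte total at compile time. The struct and helper names are hypothetical. `sketch_le32()` shows the byte-order-independent decode that `__le32_to_cpu()` provides in sb_le_to_cpu(). The offset checks assume a common ABI without unusual padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of bitmap_super_t (field order and sizes as above). */
struct sketch_bitmap_super {
	uint32_t magic;
	uint32_t version;
	uint8_t  uuid[16];
	uint64_t events;
	uint64_t events_cleared;
	uint64_t sync_size;
	uint32_t state;
	uint32_t chunksize;
	uint32_t daemon_sleep;
	uint32_t write_behind;
	uint32_t sectors_reserved;
	uint32_t nodes;
	uint8_t  cluster_name[64];
	uint8_t  pad[256 - 136];
};

/* Offsets stated in the bitmap_super_t comments, checked at compile time. */
_Static_assert(offsetof(struct sketch_bitmap_super, uuid) == 8, "uuid");
_Static_assert(offsetof(struct sketch_bitmap_super, events) == 24, "events");
_Static_assert(offsetof(struct sketch_bitmap_super, sync_size) == 40, "sync_size");
_Static_assert(offsetof(struct sketch_bitmap_super, state) == 48, "state");
_Static_assert(offsetof(struct sketch_bitmap_super, nodes) == 68, "nodes");
_Static_assert(offsetof(struct sketch_bitmap_super, cluster_name) == 72, "cluster_name");
_Static_assert(sizeof(struct sketch_bitmap_super) == 256, "size");

/* Decode a little-endian 32-bit field from raw on-disk bytes,
 * giving the same value on big- and little-endian hosts. */
uint32_t sketch_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

On disk, BITMAP_MAGIC (0x6d746962) therefore appears as the bytes 62 69 74 6d, which is "bitm" in ASCII.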
#ifdef __KERNEL__
/* the in-memory bitmap is represented by bitmap_pages */
struct bitmap_page {
/*
* map points to the actual memory page
*/
char *map;
/*
* in emergencies (when map cannot be allocated), hijack the map
* pointer and use it as two counters itself
*/
unsigned int hijacked;
/*
* count of dirty bits on the page
*/
int count;
};
/* keep track of bitmap file pages that have pending writes on them */
struct page_list {
struct list_head list;
struct page *page;
};
/* the main bitmap structure - one per mddev */
struct bitmap {
struct bitmap_page *bp;
unsigned long pages; /* total number of pages in the bitmap */
unsigned long missing_pages; /* number of pages not yet allocated */
mddev_t *mddev; /* the md device that the bitmap is for */
int counter_bits; /* how many bits per block counter */
/* bitmap chunksize -- how much data does each bit represent? */
unsigned long chunksize;
unsigned long chunkshift; /* chunksize = 2^chunkshift (for bitops) */
unsigned long chunks; /* total number of data chunks for the array */
/* We hold a count on the chunk currently being synced, and drop
* it when the last block is started. If the resync is aborted
* midway, we need to be able to drop that count, so we remember
* the counted chunk.
*/
unsigned long syncchunk;
__u64 events_cleared;
/* bitmap spinlock */
spinlock_t lock;
struct file *file; /* backing disk file */
struct page *sb_page; /* cached copy of the bitmap file superblock */
struct page **filemap; /* list of cache pages for the file */
unsigned long *filemap_attr; /* attributes associated w/ filemap pages */
unsigned long file_pages; /* number of pages in the file */
unsigned long flags;
/*
* the bitmap daemon - periodically wakes up and sweeps the bitmap
* file, cleaning up bits and flushing out pages to disk as necessary
*/
mdk_thread_t *daemon;
unsigned long daemon_sleep; /* how many seconds between updates? */
/*
* bitmap write daemon - this daemon performs writes to the bitmap file
* this thread is only needed because of a limitation in ext3 (jbd)
* that does not allow a task to have two journal transactions ongoing
* simultaneously (even if the transactions are for two different
* filesystems) -- in the case of bitmap, that would be the filesystem
* that the bitmap file resides on and the filesystem that is mounted
* on the md device -- see current->journal_info in jbd/transaction.c
*/
mdk_thread_t *write_daemon;
mdk_thread_t *writeback_daemon;
spinlock_t write_lock;
struct semaphore write_ready;
struct semaphore write_done;
unsigned long writes_pending;
wait_queue_head_t write_wait;
struct list_head write_pages;
struct list_head complete_pages;
mempool_t *write_pool;
};
/* the bitmap API */
/* these are used only by md/bitmap */
int bitmap_create(mddev_t *mddev);
void bitmap_destroy(mddev_t *mddev);
int bitmap_active(struct bitmap *bitmap);
char *file_path(struct file *file, char *buf, int count);
void bitmap_print_sb(struct bitmap *bitmap);
int bitmap_update_sb(struct bitmap *bitmap);
int bitmap_setallbits(struct bitmap *bitmap);
/* these are exported */
void bitmap_startwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors);
void bitmap_endwrite(struct bitmap *bitmap, sector_t offset, unsigned long sectors,
int success);
int bitmap_start_sync(struct bitmap *bitmap, sector_t offset, int *blocks);
void bitmap_end_sync(struct bitmap *bitmap, sector_t offset, int *blocks, int aborted);
void bitmap_close_sync(struct bitmap *bitmap);
int bitmap_unplug(struct bitmap *bitmap);
#endif
#endif
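The offsets written in the comments of bitmap_super_t, together with notes (1) to (3), can be checked in user space. Below is a minimal C11 sketch. It assumes `<stdint.h>` types can stand in for the kernel's `__u8`/`__u32`/`__u64`. It also assumes `sync_size` counts 512-byte sectors and `chunksize` counts bytes, as the field comments say. All names here (`bitmap_super_sketch_t`, `bitmap_events_acceptable`, `device_conforms_to_bitmap`, `bitmap_chunkshift`, `bitmap_chunks`) are hypothetical and do not come from mdadm or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of bitmap_super_t. */
typedef struct bitmap_super_sketch {
	uint32_t magic;            /*   0 */
	uint32_t version;          /*   4 */
	uint8_t  uuid[16];         /*   8 */
	uint64_t events;           /*  24 */
	uint64_t events_cleared;   /*  32 */
	uint64_t sync_size;        /*  40, in 512-byte sectors */
	uint32_t state;            /*  48 */
	uint32_t chunksize;        /*  52, in bytes */
	uint32_t daemon_sleep;     /*  56 */
	uint32_t write_behind;     /*  60 */
	uint32_t sectors_reserved; /*  64 */
	uint32_t nodes;            /*  68 */
	uint8_t  cluster_name[64]; /*  72 */
	uint8_t  pad[256 - 136];   /* 136, set to zero */
} bitmap_super_sketch_t;

/* Compile-time checks that the documented offsets hold on this compiler. */
_Static_assert(offsetof(bitmap_super_sketch_t, events) == 24, "events offset");
_Static_assert(offsetof(bitmap_super_sketch_t, state) == 48, "state offset");
_Static_assert(offsetof(bitmap_super_sketch_t, cluster_name) == 72, "cluster_name offset");
_Static_assert(sizeof(bitmap_super_sketch_t) == 256, "superblock is 256 bytes");

/* Note (1): a loaded bitmap is accepted only if its event counter equals,
 * or is one greater than, the md superblock's event counter. */
int bitmap_events_acceptable(uint64_t bitmap_events, uint64_t sb_events)
{
	return bitmap_events == sb_events || bitmap_events == sb_events + 1;
}

/* Note (2): a device being added conforms to the bitmap if its event
 * count is events_cleared or higher. */
int device_conforms_to_bitmap(uint64_t dev_events, uint64_t events_cleared)
{
	return dev_events >= events_cleared;
}

/* Return chunkshift such that chunksize == 1 << chunkshift,
 * or -1 if chunksize is not a power of two. */
int bitmap_chunkshift(unsigned long chunksize)
{
	int shift = 0;

	if (chunksize == 0 || (chunksize & (chunksize - 1)) != 0)
		return -1;
	while ((1UL << shift) != chunksize)
		shift++;
	return shift;
}

/* Number of bitmap chunks needed to cover sync_size sectors, rounded up. */
uint64_t bitmap_chunks(uint64_t sync_size_sectors, unsigned long chunksize)
{
	uint64_t bytes = sync_size_sectors * 512;

	assert(chunksize != 0);
	return (bytes + chunksize - 1) / chunksize;
}
```

If a compiler pads the structure differently from the documented offsets, the static assertions fail at compile time. The sketch ignores on-disk byte order.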


@ -0,0 +1,50 @@
#!/bin/bash
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 $dev0 $dev1
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check $NODE1 resync
check $NODE2 PENDING
check all wait
check all raid10
check all bitmap
check all nosync
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l10 -b clustered -n3 --layout n3 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid10
check all bitmap
check all state UUU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l10 -b clustered -n2 -x1 --layout n2 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid10
check all bitmap
check all spares 1
check all state UU
check all dmesg
stop_md all $md0
name=tstmd
mdadm -CR $md0 -l10 -b clustered -n2 $dev0 $dev1 --layout n2 --name=$name --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
for ip in $NODE1 $NODE2
do
ssh $ip "mdadm -D $md0 | grep 'Name' | grep -q $name"
[ $? -ne '0' ] &&
die "$ip: check --name=$name failed."
done
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,50 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check $NODE1 resync
check $NODE2 PENDING
check all wait
check all raid1
check all bitmap
check all nosync
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l1 -b clustered -n2 -x1 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid1
check all bitmap
check all spares 1
check all state UU
check all dmesg
stop_md all $md0
name=tstmd
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --name=$name --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
for ip in $NODE1 $NODE2
do
ssh $ip "mdadm -D $md0 | grep 'Name' | grep -q $name"
[ $? -ne '0' ] &&
die "$ip: check --name=$name failed."
done
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,51 @@
#!/bin/bash
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
# switch 'clustered' bitmap to 'none', and then 'none' to 'internal'
stop_md $NODE2 $md0
mdadm --grow $md0 --bitmap=none
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'clustered' to 'none' failed."
mdadm -X $dev0 $dev1 &> /dev/null
[ $? -eq '0' ] &&
die "$NODE1: bitmap still exists in member_disks."
check all nobitmap
mdadm --grow $md0 --bitmap=internal
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'none' to 'internal' failed."
sleep 1
mdadm -X $dev0 $dev1 &> /dev/null
[ $? -eq '0' ] ||
die "$NODE1: create 'internal' bitmap failed."
check $NODE1 bitmap
# switch 'internal' bitmap to 'none', and then 'none' to 'clustered'
mdadm --grow $md0 --bitmap=none
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'internal' to 'none' failed."
mdadm -X $dev0 $dev1 &> /dev/null
[ $? -eq '0' ] &&
die "$NODE1: bitmap still exists in member_disks."
check $NODE1 nobitmap
mdadm --grow $md0 --bitmap=clustered
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'none' to 'clustered' failed."
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
sleep 1
for ip in $NODE1 $NODE2
do
ssh $ip "mdadm -X $dev0 $dev1 | grep -q 'Cluster name'" ||
die "$ip: create 'clustered' bitmap failed."
done
check all bitmap
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,38 @@
#!/bin/bash
size=20000
mdadm -CR $md0 -l10 -b clustered --layout n2 --size $size --chunk=64 -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
mdadm --grow $md0 --size max
check $NODE1 resync
check $NODE1 wait
check all state UU
mdadm --grow $md0 --size $size
check all nosync
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l10 -b clustered --layout n2 --chunk=64 -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
mdadm --grow $md0 --chunk=128
check $NODE1 reshape
check $NODE1 wait
check all chunk 128
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,68 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
check all dmesg
mdadm --grow $md0 --raid-devices=3 --add $dev2
sleep 0.3
grep recovery /proc/mdstat
if [ $? -eq '0' ]
then
check $NODE1 wait
else
check $NODE2 recovery
check $NODE2 wait
fi
check all state UUU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l1 -b clustered -n2 -x1 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid1
check all bitmap
check all spares 1
check all state UU
check all dmesg
mdadm --grow $md0 --raid-devices=3 --add $dev3
sleep 0.3
grep recovery /proc/mdstat
if [ $? -eq '0' ]
then
check $NODE1 wait
else
check $NODE2 recovery
check $NODE2 wait
fi
check all state UUU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l1 -b clustered -n2 -x1 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid1
check all bitmap
check all spares 1
check all state UU
check all dmesg
mdadm --grow $md0 --raid-devices=3
sleep 0.3
grep recovery /proc/mdstat
if [ $? -eq '0' ]
then
check $NODE1 wait
else
check $NODE2 recovery
check $NODE2 wait
fi
check all state UUU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,51 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
# switch 'clustered' bitmap to 'none', and then 'none' to 'internal'
stop_md $NODE2 $md0
mdadm --grow $md0 --bitmap=none
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'clustered' to 'none' failed."
mdadm -X $dev0 $dev1 &> /dev/null
[ $? -eq '0' ] &&
die "$NODE1: bitmap still exists in member_disks."
check all nobitmap
mdadm --grow $md0 --bitmap=internal
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'none' to 'internal' failed."
sleep 2
mdadm -X $dev0 $dev1 &> /dev/null
[ $? -eq '0' ] ||
die "$NODE1: create 'internal' bitmap failed."
check $NODE1 bitmap
# switch 'internal' bitmap to 'none', and then 'none' to 'clustered'
mdadm --grow $md0 --bitmap=none
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'internal' to 'none' failed."
mdadm -X $dev0 $dev1 &> /dev/null
[ $? -eq '0' ] &&
die "$NODE1: bitmap still exists in member_disks."
check $NODE1 nobitmap
mdadm --grow $md0 --bitmap=clustered
[ $? -eq '0' ] ||
die "$NODE1: change bitmap 'none' to 'clustered' failed."
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
sleep 2
for ip in $NODE1 $NODE2
do
ssh $ip "mdadm -X $dev0 $dev1 | grep -q 'Cluster name'" ||
die "$ip: create 'clustered' bitmap failed."
done
check all bitmap
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,23 @@
#!/bin/bash
size=10000
mdadm -CR $md0 -l1 -b clustered --size $size -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
mdadm --grow $md0 --size max
check $NODE1 resync
check $NODE1 wait
check all state UU
mdadm --grow $md0 --size $size
check all nosync
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,33 @@
#!/bin/bash
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --fail $dev0 --remove $dev0
mdadm --zero $dev2
mdadm --manage $md0 --add $dev2
sleep 0.3
check $NODE1 recovery
check $NODE1 wait
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --add $dev2
check all spares 1
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,30 @@
#!/bin/bash
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --add-spare $dev2
check all spares 1
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 -x1 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid10
check all bitmap
check all spares 1
check all state UU
check all dmesg
mdadm --manage $md0 --add-spare $dev3
check all spares 2
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,18 @@
#!/bin/bash
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid10
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --fail $dev0 --remove $dev0
mdadm --manage $md0 --re-add $dev0
check $NODE1 recovery
check all wait
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,33 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --fail $dev0 --remove $dev0
mdadm --zero $dev2
mdadm --manage $md0 --add $dev2
sleep 0.3
check $NODE1 recovery
check $NODE1 wait
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --add $dev2
check all spares 1
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,30 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --add-spare $dev2
check all spares 1
check all state UU
check all dmesg
stop_md all $md0
mdadm -CR $md0 -l1 -b clustered -n2 -x1 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid1
check all bitmap
check all spares 1
check all state UU
check all dmesg
mdadm --manage $md0 --add-spare $dev3
check all spares 2
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,16 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check all nosync
check all raid1
check all bitmap
check all state UU
check all dmesg
mdadm --manage $md0 --fail $dev0 --remove $dev0
mdadm --manage $md0 --re-add $dev0
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,21 @@
#!/bin/bash
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 -x1 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid10
check all bitmap
check all spares 1
check all state UU
check all dmesg
mdadm --manage $md0 --fail $dev0
sleep 0.2
check $NODE1 recovery
stop_md $NODE1 $md0
check $NODE2 recovery
check $NODE2 wait
check $NODE2 state UU
check all dmesg
stop_md $NODE2 $md0
exit 0


@ -0,0 +1,18 @@
#!/bin/bash
mdadm -CR $md0 -l10 -b clustered --layout n2 -n2 $dev0 $dev1
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check $NODE1 resync
check $NODE2 PENDING
stop_md $NODE1 $md0
check $NODE2 resync
check $NODE2 wait
mdadm -A $md0 $dev0 $dev1
check all raid10
check all bitmap
check all nosync
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,21 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 -x1 $dev0 $dev1 $dev2 --assume-clean
ssh $NODE2 mdadm -A $md0 $dev0 $dev1 $dev2
check all nosync
check all raid1
check all bitmap
check all spares 1
check all state UU
check all dmesg
mdadm --manage $md0 --fail $dev0
sleep 0.3
check $NODE1 recovery
stop_md $NODE1 $md0
check $NODE2 recovery
check $NODE2 wait
check $NODE2 state UU
check all dmesg
stop_md $NODE2 $md0
exit 0


@ -0,0 +1,18 @@
#!/bin/bash
mdadm -CR $md0 -l1 -b clustered -n2 $dev0 $dev1
ssh $NODE2 mdadm -A $md0 $dev0 $dev1
check $NODE1 resync
check $NODE2 PENDING
stop_md $NODE1 $md0
check $NODE2 resync
check $NODE2 wait
mdadm -A $md0 $dev0 $dev1
check all raid1
check all bitmap
check all nosync
check all state UU
check all dmesg
stop_md all $md0
exit 0


@ -0,0 +1,43 @@
# Prerequisite:
# 1. The clustermd_tests/ cases only support testing a 2-node cluster. The
# cluster requires the packages pacemaker, corosync, sbd and crmsh (all
# available at "https://github.com/ClusterLabs/"), and a dlm resource must
# be running on each node of the cluster.
# For a quick start with an HA cluster on SUSE distributions, refer to chapters 6-8 of:
# https://www.suse.com/documentation/sle-ha-12/install-quick/data/install-quick.html
# For Red Hat distributions, please refer to:
# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/high_availability_add-on_administration/index
# 2. Set up ssh access without password prompts; both of the following
# must succeed on any node:
# # 'ssh $node1 -l root ls' and 'ssh $node2 -l root ls'
# 3. Fill in the node-IP and disk sections below.
# Node1 is the master node on which the cluster-md cases run, and node2
# is the slave node.
# For example:
# NODE1=192.168.1.100 (tests run here)
# NODE2=192.168.1.101
NODE1=
NODE2=
# Provide the disks for clustermd testing using one of the two alternatives
# below; if you use option 1, don't set option 2, and vice versa.
# 1. Use an iSCSI service to provide shared storage, then log in to the iSCSI
# target given by ISCSI_TARGET_ID and ISCSI_TARGET_IP on the iSCSI clients,
# with commands like the following.
# Execute on the iSCSI clients:
# 1) discover the iSCSI server.
# # iscsiadm -m discovery -t st -p $ISCSI_TARGET_IP
# 2) log in and establish the connection.
# # iscsiadm -m node -T $ISCSI_TARGET_ID -p $ISCSI_TARGET_IP -l
# Note:
# On the iSCSI server, all iSCSI LUNs must be created under one target_id.
# Testing requires at least 6 LUNs/disks (not counting any sbd disk), and
# each disk should be larger than 100M and smaller than 800M.
# 2. If all cluster nodes attach the same disks directly and the device names
# are the same on all nodes, list them in 'devlist'.
# For example (setting only ISCSI_TARGET_ID is enough if iSCSI is already connected):
# ISCSI_TARGET_ID=iqn.2018-01.example.com:clustermd-testing
# ISCSI_TARGET_IP=192.168.1.102
ISCSI_TARGET_ID=
#devlist=/dev/sda /dev/sdb /dev/sdc /dev/sdd
devlist=
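func.sh reads these settings with `grep '^KEY' $CLUSTER_CONF | cut -d'=' -f2`. As a result, commented lines such as `#devlist=...` are ignored, and only the text between the first and second `=` is kept. Below is a minimal C sketch of that lookup. `conf_get()` is a hypothetical helper, not part of mdadm. Unlike grep, which returns every matching line, it takes only the first matching line that contains an `=`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper emulating `grep '^KEY' | cut -d'=' -f2`.
 * Find the first line that starts with key and contains '=', then copy
 * the text between the first and second '=' (or end of line) into out.
 * Returns 0 on success, -1 if nothing matches or the value does not fit. */
int conf_get(const char *conf, const char *key, char *out, size_t outlen)
{
	size_t klen;
	const char *line = conf;

	assert(conf != NULL && key != NULL && out != NULL);
	klen = strlen(key);
	while (*line != '\0') {
		const char *nl = strchr(line, '\n');

		/* '^KEY' is a prefix match, so "NODE1" would also match "NODE10". */
		if (strncmp(line, key, klen) == 0) {
			const char *eq = strchr(line, '=');

			if (eq != NULL && (nl == NULL || eq < nl)) {
				const char *val = eq + 1;
				size_t n = strcspn(val, "=\n"); /* like cut's field 2 */

				if (n >= outlen)
					return -1;
				memcpy(out, val, n);
				out[n] = '\0';
				return 0;
			}
		}
		if (nl == NULL)
			break;
		line = nl + 1;
	}
	return -1;
}
```

An empty value, such as `NODE2=`, yields an empty string. func.sh treats that case as "not configured".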

332
clustermd_tests/func.sh Normal file

@ -0,0 +1,332 @@
#!/bin/bash
check_ssh()
{
NODE1="$(grep '^NODE1' $CLUSTER_CONF | cut -d'=' -f2)"
NODE2="$(grep '^NODE2' $CLUSTER_CONF | cut -d'=' -f2)"
[ -z "$NODE1" -o -z "$NODE2" ] && {
echo "Please provide node-ip in $CLUSTER_CONF."
exit 1
}
for ip in $NODE1 $NODE2
do
ssh -o NumberOfPasswordPrompts=0 $ip -l root "pwd" > /dev/null
[ $? -ne 0 ] && {
echo "Please set up ssh access without password prompts."
exit 1
}
done
}
fetch_devlist()
{
ISCSI_ID="$(grep '^ISCSI_TARGET_ID' $CLUSTER_CONF | cut -d'=' -f2)"
devlist="$(grep '^devlist' $CLUSTER_CONF | cut -d'=' -f2)"
if [ ! -z "$ISCSI_ID" -a ! -z "$devlist" ]
then
echo "Configure only one of ISCSI_TARGET_ID or devlist in $CLUSTER_CONF."
exit 1
elif [ ! -z "$ISCSI_ID" -a -z "$devlist" ]
then
for ip in $NODE1 $NODE2
do
ssh $ip "ls /dev/disk/by-path/*$ISCSI_ID*" > /dev/null
[ $? -ne 0 ] && {
echo "$ip: No disks found in '$ISCSI_ID' connection."
exit 1
}
done
devlist=($(ls /dev/disk/by-path/*$ISCSI_ID*))
fi
# sbd disks cannot be used in testing
# Init devlist as an array
i=''
devlist=(${devlist[@]#$i})
for i in ${devlist[@]}
do
sbd -d $i dump &> /dev/null
[ $? -eq '0' ] && devlist=(${devlist[@]#$i})
done
for i in $(seq 0 ${#devlist[@]})
do
eval "dev$i=${devlist[$i]}"
done
[ "${#devlist[@]}" -lt 6 ] && {
echo "Cluster-md testing requires at least 6 disks."
exit 1
}
}
check_dlm()
{
if ! crm configure show | grep -q dlm
then
crm configure primitive dlm ocf:pacemaker:controld \
op monitor interval=60 timeout=60 \
meta target-role=Started &> /dev/null
crm configure group base-group dlm
crm configure clone base-clone base-group \
meta interleave=true
fi
sleep 1
for ip in $NODE1 $NODE2
do
ssh $ip "pgrep dlm_controld > /dev/null" || {
echo "$ip: dlm_controld daemon is not running."
exit 1
}
done
crm_mon -r -n1 | grep -iq "fail\|not" && {
echo "Please clear cluster-resource errors."
exit 1
}
}
check_env()
{
user=$(id -un)
[ "X$user" = "Xroot" ] || {
echo "testing can only be done as 'root'."
exit 1
}
[ \! -x $mdadm ] && {
echo "test: please run 'make everything' before testing."
exit 1
}
check_ssh
commands=(mdadm iscsiadm bc modinfo dlm_controld
udevadm crm crm_mon lsblk pgrep sbd)
for ip in $NODE1 $NODE2
do
for cmd in ${commands[@]}
do
ssh $ip "which $cmd &> /dev/null" || {
echo "$ip: $cmd, command not found!"
exit 1
}
done
mods=(raid1 raid10 md_mod dlm md-cluster)
for mod in ${mods[@]}
do
ssh $ip "modinfo $mod > /dev/null" || {
echo "$ip: $mod, module doesn't exist."
exit 1
}
done
ssh $ip "lsblk -a | grep -iq raid"
[ $? -eq 0 ] && {
echo "$ip: Please run the tests with no RAID arrays running."
exit 1
}
ssh $ip "modprobe md_mod"
done
fetch_devlist
check_dlm
[ -d $logdir ] || mkdir -p $logdir
}
# $1/node, $2/optional
stop_md()
{
if [ "$1" == "all" ]
then
NODES=($NODE1 $NODE2)
elif [ "$1" == "$NODE1" -o "$1" == "$NODE2" ]
then
NODES=$1
else
die "$1: unknown parameter."
fi
if [ -z "$2" ]
then
for ip in ${NODES[@]}
do
ssh $ip mdadm -Ssq
done
else
for ip in ${NODES[@]}
do
ssh $ip mdadm -S $2
done
fi
}
# $1/optional, it shows why to save log
save_log()
{
status=$1
logfile="$status""$_basename".log
cat $targetdir/stderr >> $targetdir/log
cp $targetdir/log $logdir/$_basename.log
for ip in $NODE1 $NODE2
do
echo "##$ip: saving dmesg." >> $logdir/$logfile
ssh $ip "dmesg -c" >> $logdir/$logfile
echo "##$ip: saving proc mdstat." >> $logdir/$logfile
ssh $ip "cat /proc/mdstat" >> $logdir/$logfile
array=($(ssh $ip "mdadm -Ds | cut -d' ' -f2"))
if [ ! -z "$array" -a ${#array[@]} -ge 1 ]
then
echo "##$ip: mdadm -D ${array[@]}" >> $logdir/$logfile
ssh $ip "mdadm -D ${array[@]}" >> $logdir/$logfile
md_disks=($(ssh $ip "mdadm -DY ${array[@]} | grep "/dev/" | cut -d'=' -f2"))
ssh $ip "grep -q bitmap /proc/mdstat"
if [ $? -eq 0 ]
then
echo "##$ip: mdadm -X ${md_disks[@]}" >> $logdir/$logfile
ssh $ip "mdadm -X ${md_disks[@]}" >> $logdir/$logfile
echo "##$ip: mdadm -E ${md_disks[@]}" >> $logdir/$logfile
ssh $ip "mdadm -E ${md_disks[@]}" >> $logdir/$logfile
fi
else
echo "##$ip: no array assembled!" >> $logdir/$logfile
fi
done
[ "$1" == "fail" ] &&
echo "See $logdir/$_basename.log and $logdir/$logfile for details"
stop_md all
}
do_setup()
{
check_env
ulimit -c unlimited
}
do_clean()
{
for ip in $NODE1 $NODE2
do
ssh $ip "mdadm -Ssq; dmesg -c > /dev/null"
done
mdadm --zero ${devlist[@]} &> /dev/null
}
cleanup()
{
check_ssh
do_clean
}
# check: $1/cluster_node $2/feature $3/optional
check()
{
NODES=()
if [ "$1" == "all" ]
then
NODES=($NODE1 $NODE2)
elif [ "$1" == "$NODE1" -o "$1" == "$NODE2" ]
then
NODES=$1
else
die "$1: unknown parameter."
fi
case $2 in
spares )
for ip in ${NODES[@]}
do
spares=$(ssh $ip "tr '] ' '\012\012' < /proc/mdstat | grep -c '(S)'")
[ "$spares" -ne "$3" ] &&
die "$ip: expected $3 spares, but found $spares"
done
;;
raid* )
for ip in ${NODES[@]}
do
ssh $ip "grep -sq "$2" /proc/mdstat" ||
die "$ip: check '$2' failed."
done
;;
PENDING | recovery | resync | reshape )
cnt=5
for ip in ${NODES[@]}
do
while ! ssh $ip "grep -sq '$2' /proc/mdstat"
do
if [ "$cnt" -gt '0' ]
then
sleep 0.2
cnt=$((cnt - 1))
else
die "$ip: no '$2' happening!"
fi
done
done
;;
wait )
local cnt=60
for ip in ${NODES[@]}
do
p=$(ssh $ip "cat /proc/sys/dev/raid/speed_limit_max")
ssh $ip "echo 200000 > /proc/sys/dev/raid/speed_limit_max"
while ssh $ip "grep -Esq '(resync|recovery|reshape|check|repair)' /proc/mdstat"
do
if [ "$cnt" -gt '0' ]
then
sleep 5
cnt=$((cnt - 1))
else
die "$ip: check '$2' timed out after 300 seconds."
fi
done
ssh $ip "echo $p > /proc/sys/dev/raid/speed_limit_max"
done
;;
bitmap )
for ip in ${NODES[@]}
do
ssh $ip "grep -sq '$2' /proc/mdstat" ||
die "$ip: no '$2' found in /proc/mdstat."
done
;;
nobitmap )
for ip in ${NODES[@]}
do
ssh $ip "grep -sq 'bitmap' /proc/mdstat" &&
die "$ip: 'bitmap' found in /proc/mdstat."
done
;;
chunk )
for ip in ${NODES[@]}
do
chunk_size=$(ssh $ip "cat /proc/mdstat" | awk -F',' '/chunk/{print $2}' | awk -F'[a-z]' '{print $1}')
[ "$chunk_size" -ne "$3" ] &&
die "$ip: chunksize should be $3, but it's $chunk_size"
done
;;
state )
for ip in ${NODES[@]}
do
ssh $ip "grep -Esq 'blocks.*\[$3\]\$' /proc/mdstat" ||
die "$ip: no '$3' found in /proc/mdstat."
done
;;
nosync )
for ip in ${NODES[@]}
do
ssh $ip "grep -Eq '(resync|recovery)' /proc/mdstat" &&
die "$ip: resync or recovery is happening!"
done
;;
readonly )
for ip in ${NODES[@]}
do
ssh $ip "grep -sq "read-only" /proc/mdstat" ||
die "$ip: check '$2' failed!"
done
;;
dmesg )
for ip in ${NODES[@]}
do
ssh $ip "dmesg | grep -iq 'error\|call trace\|segfault'" &&
die "$ip: check '$2' prints errors!"
done
;;
* )
die "unknown parameter $2"
;;
esac
}
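In check(), the `spares` and `state` cases reduce to two text tests on /proc/mdstat. The first counts `(S)` markers. The second looks for a line that contains `blocks` and ends with `[<state>]`. Below is a rough C sketch of the same tests, run on an in-memory copy of the file. `mdstat_count_spares()` and `mdstat_state_matches()` are hypothetical helpers. As a simplification, lines of 256 characters or more are skipped.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count "(S)" markers; /proc/mdstat appends one to each spare member. */
int mdstat_count_spares(const char *mdstat)
{
	int n = 0;
	const char *p = mdstat;

	assert(mdstat != NULL);
	while ((p = strstr(p, "(S)")) != NULL) {
		n++;
		p += 3; /* skip past this marker */
	}
	return n;
}

/* Return 1 if some line contains "blocks" and ends with "[<state>]",
 * roughly like grep -E 'blocks.*\[UU\]$'; return 0 otherwise. */
int mdstat_state_matches(const char *mdstat, const char *state)
{
	char want[64];
	const char *line = mdstat;
	size_t wlen;
	int w;

	assert(mdstat != NULL && state != NULL);
	w = snprintf(want, sizeof(want), "[%s]", state);
	if (w < 0 || (size_t)w >= sizeof(want))
		return 0; /* state string too long for this sketch */
	wlen = (size_t)w;

	while (*line != '\0') {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		char buf[256];

		if (len < sizeof(buf)) {
			/* Copy one line so strstr() cannot run into the next. */
			memcpy(buf, line, len);
			buf[len] = '\0';
			if (strstr(buf, "blocks") != NULL && len >= wlen &&
			    strcmp(buf + len - wlen, want) == 0)
				return 1;
		}
		if (end == NULL)
			break;
		line = end + 1;
	}
	return 0;
}
```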

1235
config.c Normal file

File diff suppressed because it is too large

10
coverity-gcc-hack.h Normal file

@ -0,0 +1,10 @@
#if !defined(__KERNEL__) && defined(__x86_64__) && defined(__COVERITY_GCC_VERSION_AT_LEAST)
#if __COVERITY_GCC_VERSION_AT_LEAST(7, 0)
typedef float _Float128 __attribute__((__vector_size__(128)));
typedef float _Float64 __attribute__((__vector_size__(64)));
typedef float _Float32 __attribute__((__vector_size__(32)));
typedef float _Float128x __attribute__((__vector_size__(128)));
typedef float _Float64x __attribute__((__vector_size__(64)));
typedef float _Float32x __attribute__((__vector_size__(32)));
#endif
#endif
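The comment in crc32.c, which follows, explains how the byte-wise table is derived from the CRC-32 polynomial. Below is a minimal C99 sketch of that construction and of the DO1 update step. It covers only the one-table path that mdadm builds, because NOBYFOUR is defined. The names (`crc_table_sketch`, `crc_poly_from_terms`, `crc_make_table`, `crc_update`) are hypothetical and are not zlib's or mdadm's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table_sketch[256];

/* Reflected polynomial: set bit (31 - e) for each exponent e of
 * x^32+x^26+...+x+1, with x^32 omitted (lowest power in the MSB). */
uint32_t crc_poly_from_terms(void)
{
	static const unsigned char p[] = {0, 1, 2, 4, 5, 7, 8, 10, 11, 12, 16, 22, 23, 26};
	uint32_t poly = 0;
	size_t n;

	for (n = 0; n < sizeof(p); n++)
		poly |= UINT32_C(1) << (31 - p[n]);
	return poly;
}

/* Byte-wise table: the CRC of every possible 8-bit value, built with the
 * shift-register method (shift right, XOR poly when a one falls out). */
void crc_make_table(void)
{
	uint32_t poly = crc_poly_from_terms();
	uint32_t n;
	int k;

	for (n = 0; n < 256; n++) {
		uint32_t c = n;

		for (k = 0; k < 8; k++)
			c = (c & 1) ? poly ^ (c >> 1) : c >> 1;
		crc_table_sketch[n] = c;
	}
}

/* One table lookup per byte, as in DO1. As in mdadm's crc32(), no
 * ^ 0xffffffff is applied here. Call crc_make_table() first. */
uint32_t crc_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	while (len--)
		crc = crc_table_sketch[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
	return crc;
}
```

Inverting the register before and after `crc_update()` reproduces the common CRC-32 check value 0xcbf43926 for the ASCII string "123456789". mdadm's crc32() has those two XORs commented out, so it returns the raw register value.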

360
crc32.c Normal file

@ -0,0 +1,360 @@
/* crc32.c -- compute the CRC-32 of a data stream
* Copyright (C) 1995-2003 Mark Adler
* For conditions of distribution and use, see copyright notice in zlib.h
*
* Note: zlib license from zlib.h added explicitly as mdadm does
* not include zlib.h. License from v1.2.2 of zlib:
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the authors be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute it
* freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must not
* claim that you wrote the original software. If you use this software
* in a product, an acknowledgment in the product documentation would be
* appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must not be
* misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source distribution.
*
*
* Thanks to Rodney Brown <rbrown64@csc.com.au> for his contribution of faster
* CRC methods: exclusive-oring 32 bits of data at a time, and pre-computing
* tables for updating the shift register in one step with three exclusive-ors
* instead of four steps with four exclusive-ors. This results in about a factor
* of two increase in speed on a Power PC G4 (PPC7455) using gcc -O3.
*/
/* @(#) $Id$ */
/*
Note on the use of DYNAMIC_CRC_TABLE: there is no mutex or semaphore
protection on the static variables used to control the first-use generation
of the crc tables. Therefore, if you #define DYNAMIC_CRC_TABLE, you should
first call get_crc_table() to initialize the tables before allowing more than
one thread to use crc32().
*/
#ifdef MAKECRCH
# include <stdio.h>
# ifndef DYNAMIC_CRC_TABLE
# define DYNAMIC_CRC_TABLE
# endif /* !DYNAMIC_CRC_TABLE */
#endif /* MAKECRCH */
/* #include "zutil.h" / * for STDC and FAR definitions */
#define STDC
#define FAR
#define Z_NULL ((void*)0)
#define OF(X) X
#define ZEXPORT
typedef long ptrdiff_t;
#define NOBYFOUR
#define local static
/* Find a four-byte integer type for crc32_little() and crc32_big(). */
#ifndef NOBYFOUR
# ifdef STDC /* need ANSI C limits.h to determine sizes */
# include <limits.h>
# define BYFOUR
# if (UINT_MAX == 0xffffffffUL)
typedef unsigned int u4;
# else
# if (ULONG_MAX == 0xffffffffUL)
typedef unsigned long u4;
# else
# if (USHRT_MAX == 0xffffffffUL)
typedef unsigned short u4;
# else
# undef BYFOUR /* can't find a four-byte integer type! */
# endif
# endif
# endif
# endif /* STDC */
#endif /* !NOBYFOUR */
/* Definitions for doing the crc four data bytes at a time. */
#ifdef BYFOUR
# define REV(w) (((w)>>24)+(((w)>>8)&0xff00)+ \
(((w)&0xff00)<<8)+(((w)&0xff)<<24))
local unsigned long crc32_little OF((unsigned long,
const unsigned char FAR *, unsigned));
local unsigned long crc32_big OF((unsigned long,
const unsigned char FAR *, unsigned));
# define TBLS 8
#else
# define TBLS 1
#endif /* BYFOUR */
#ifdef DYNAMIC_CRC_TABLE
local volatile int crc_table_empty = 1;
local unsigned long FAR crc_table[TBLS][256];
local void make_crc_table OF((void));
#ifdef MAKECRCH
local void write_table OF((FILE *, const unsigned long FAR *));
#endif /* MAKECRCH */
/*
Generate tables for a byte-wise 32-bit CRC calculation on the polynomial:
x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1.
Polynomials over GF(2) are represented in binary, one bit per coefficient,
with the lowest powers in the most significant bit. Then adding polynomials
is just exclusive-or, and multiplying a polynomial by x is a right shift by
one. If we call the above polynomial p, and represent a byte as the
polynomial q, also with the lowest power in the most significant bit (so the
byte 0xb1 is the polynomial x^7+x^3+x+1), then the CRC is (q*x^32) mod p,
where a mod b means the remainder after dividing a by b.
This calculation is done using the shift-register method of multiplying and
taking the remainder. The register is initialized to zero, and for each
incoming bit, x^32 is added mod p to the register if the bit is a one (where
x^32 mod p is p+x^32 = x^26+...+1), and the register is multiplied mod p by
x (which is shifting right by one and adding x^32 mod p if the bit shifted
out is a one). We start with the highest power (least significant bit) of
q and repeat for all eight bits of q.
The first table is simply the CRC of all possible eight bit values. This is
all the information needed to generate CRCs on data a byte at a time for all
combinations of CRC register values and incoming bytes. The remaining tables
allow for word-at-a-time CRC calculation for both big-endian and little-
endian machines, where a word is four bytes.
*/
local void make_crc_table()
{
unsigned long c;
int n, k;
unsigned long poly; /* polynomial exclusive-or pattern */
/* terms of polynomial defining this crc (except x^32): */
static volatile int first = 1; /* flag to limit concurrent making */
static const unsigned char p[] = {0,1,2,4,5,7,8,10,11,12,16,22,23,26};
/* See if another task is already doing this (not thread-safe, but better
than nothing -- significantly reduces duration of vulnerability in
case the advice about DYNAMIC_CRC_TABLE is ignored) */
if (first) {
first = 0;
/* make exclusive-or pattern from polynomial (0xedb88320UL) */
poly = 0UL;
for (n = 0; n < sizeof(p)/sizeof(unsigned char); n++)
poly |= 1UL << (31 - p[n]);
/* generate a crc for every 8-bit value */
for (n = 0; n < 256; n++) {
c = (unsigned long)n;
for (k = 0; k < 8; k++)
c = c & 1 ? poly ^ (c >> 1) : c >> 1;
crc_table[0][n] = c;
}
#ifdef BYFOUR
/* generate crc for each value followed by one, two, and three zeros,
and then the byte reversal of those as well as the first table */
for (n = 0; n < 256; n++) {
c = crc_table[0][n];
crc_table[4][n] = REV(c);
for (k = 1; k < 4; k++) {
c = crc_table[0][c & 0xff] ^ (c >> 8);
crc_table[k][n] = c;
crc_table[k + 4][n] = REV(c);
}
}
#endif /* BYFOUR */
crc_table_empty = 0;
}
else { /* not first */
/* wait for the other guy to finish (not efficient, but rare) */
while (crc_table_empty)
;
}
#ifdef MAKECRCH
/* write out CRC tables to crc32.h */
{
FILE *out;
out = fopen("crc32.h", "w");
if (out == NULL) return;
fprintf(out, "/* crc32.h -- tables for rapid CRC calculation\n");
fprintf(out, " * Generated automatically by crc32.c\n */\n\n");
fprintf(out, "local const unsigned long FAR ");
fprintf(out, "crc_table[TBLS][256] =\n{\n {\n");
write_table(out, crc_table[0]);
# ifdef BYFOUR
fprintf(out, "#ifdef BYFOUR\n");
for (k = 1; k < 8; k++) {
fprintf(out, " },\n {\n");
write_table(out, crc_table[k]);
}
fprintf(out, "#endif\n");
# endif /* BYFOUR */
fprintf(out, " }\n};\n");
fclose(out);
}
#endif /* MAKECRCH */
}
#ifdef MAKECRCH
local void write_table(out, table)
FILE *out;
const unsigned long FAR *table;
{
int n;
for (n = 0; n < 256; n++)
fprintf(out, "%s0x%08lxUL%s", n % 5 ? "" : " ", table[n],
n == 255 ? "\n" : (n % 5 == 4 ? ",\n" : ", "));
}
#endif /* MAKECRCH */
#else /* !DYNAMIC_CRC_TABLE */
/* ========================================================================
* Tables of CRC-32s of all single-byte values, made by make_crc_table().
*/
#include "crc32.h"
#endif /* DYNAMIC_CRC_TABLE */
/* =========================================================================
* This function can be used by asm versions of crc32()
*/
const unsigned long FAR * ZEXPORT get_crc_table(void)
{
#ifdef DYNAMIC_CRC_TABLE
if (crc_table_empty)
make_crc_table();
#endif /* DYNAMIC_CRC_TABLE */
return (const unsigned long FAR *)crc_table;
}
/* ========================================================================= */
#define DO1 crc = crc_table[0][((int)crc ^ (*buf++)) & 0xff] ^ (crc >> 8)
#define DO8 DO1; DO1; DO1; DO1; DO1; DO1; DO1; DO1
/* ========================================================================= */
unsigned long ZEXPORT crc32(
unsigned long crc,
const unsigned char FAR *buf,
unsigned len)
{
if (buf == Z_NULL) return 0UL;
#ifdef DYNAMIC_CRC_TABLE
if (crc_table_empty)
make_crc_table();
#endif /* DYNAMIC_CRC_TABLE */
#ifdef BYFOUR
if (sizeof(void *) == sizeof(ptrdiff_t)) {
u4 endian;
endian = 1;
if (*((unsigned char *)(&endian)))
return crc32_little(crc, buf, len);
else
return crc32_big(crc, buf, len);
}
#endif /* BYFOUR */
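/* Unlike stock zlib, the pre- and post-inversion are disabled on this path,
   so it returns the raw CRC register for the caller's seed. crc32_little()
   and crc32_big() below still invert with ~c, so enabling BYFOUR would
   produce different results from this path. */
/* crc = crc ^ 0xffffffffUL;*/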
while (len >= 8) {
DO8;
len -= 8;
}
if (len) do {
DO1;
} while (--len);
return crc /* ^ 0xffffffffUL*/;
}
#ifdef BYFOUR
/* ========================================================================= */
#define DOLIT4 c ^= *buf4++; \
c = crc_table[3][c & 0xff] ^ crc_table[2][(c >> 8) & 0xff] ^ \
crc_table[1][(c >> 16) & 0xff] ^ crc_table[0][c >> 24]
#define DOLIT32 DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4
/* ========================================================================= */
local unsigned long crc32_little(crc, buf, len)
unsigned long crc;
const unsigned char FAR *buf;
unsigned len;
{
register u4 c;
register const u4 FAR *buf4;
c = (u4)crc;
c = ~c;
while (len && ((ptrdiff_t)buf & 3)) {
c = crc_table[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
len--;
}
buf4 = (const u4 FAR *)buf;
while (len >= 32) {
DOLIT32;
len -= 32;
}
while (len >= 4) {
DOLIT4;
len -= 4;
}
buf = (const unsigned char FAR *)buf4;
if (len) do {
c = crc_table[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
} while (--len);
c = ~c;
return (unsigned long)c;
}
/* ========================================================================= */
#define DOBIG4 c ^= *++buf4; \
c = crc_table[4][c & 0xff] ^ crc_table[5][(c >> 8) & 0xff] ^ \
crc_table[6][(c >> 16) & 0xff] ^ crc_table[7][c >> 24]
#define DOBIG32 DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4
/* ========================================================================= */
local unsigned long crc32_big(crc, buf, len)
unsigned long crc;
const unsigned char FAR *buf;
unsigned len;
{
register u4 c;
register const u4 FAR *buf4;
c = REV((u4)crc);
c = ~c;
while (len && ((ptrdiff_t)buf & 3)) {
c = crc_table[4][(c >> 24) ^ *buf++] ^ (c << 8);
len--;
}
buf4 = (const u4 FAR *)buf;
buf4--;
while (len >= 32) {
DOBIG32;
len -= 32;
}
while (len >= 4) {
DOBIG4;
len -= 4;
}
buf4++;
buf = (const unsigned char FAR *)buf4;
if (len) do {
c = crc_table[4][(c >> 24) ^ *buf++] ^ (c << 8);
} while (--len);
c = ~c;
return (unsigned long)(REV(c));
}
#endif /* BYFOUR */

crc32.h Normal file

@ -0,0 +1,441 @@
/* crc32.h -- tables for rapid CRC calculation
* Generated automatically by crc32.c
*/
local const unsigned long FAR crc_table[TBLS][256] =
{
{
0x00000000UL, 0x77073096UL, 0xee0e612cUL, 0x990951baUL, 0x076dc419UL,
0x706af48fUL, 0xe963a535UL, 0x9e6495a3UL, 0x0edb8832UL, 0x79dcb8a4UL,
0xe0d5e91eUL, 0x97d2d988UL, 0x09b64c2bUL, 0x7eb17cbdUL, 0xe7b82d07UL,
0x90bf1d91UL, 0x1db71064UL, 0x6ab020f2UL, 0xf3b97148UL, 0x84be41deUL,
0x1adad47dUL, 0x6ddde4ebUL, 0xf4d4b551UL, 0x83d385c7UL, 0x136c9856UL,
0x646ba8c0UL, 0xfd62f97aUL, 0x8a65c9ecUL, 0x14015c4fUL, 0x63066cd9UL,
0xfa0f3d63UL, 0x8d080df5UL, 0x3b6e20c8UL, 0x4c69105eUL, 0xd56041e4UL,
0xa2677172UL, 0x3c03e4d1UL, 0x4b04d447UL, 0xd20d85fdUL, 0xa50ab56bUL,
0x35b5a8faUL, 0x42b2986cUL, 0xdbbbc9d6UL, 0xacbcf940UL, 0x32d86ce3UL,
0x45df5c75UL, 0xdcd60dcfUL, 0xabd13d59UL, 0x26d930acUL, 0x51de003aUL,
0xc8d75180UL, 0xbfd06116UL, 0x21b4f4b5UL, 0x56b3c423UL, 0xcfba9599UL,
0xb8bda50fUL, 0x2802b89eUL, 0x5f058808UL, 0xc60cd9b2UL, 0xb10be924UL,
0x2f6f7c87UL, 0x58684c11UL, 0xc1611dabUL, 0xb6662d3dUL, 0x76dc4190UL,
0x01db7106UL, 0x98d220bcUL, 0xefd5102aUL, 0x71b18589UL, 0x06b6b51fUL,
0x9fbfe4a5UL, 0xe8b8d433UL, 0x7807c9a2UL, 0x0f00f934UL, 0x9609a88eUL,
0xe10e9818UL, 0x7f6a0dbbUL, 0x086d3d2dUL, 0x91646c97UL, 0xe6635c01UL,
0x6b6b51f4UL, 0x1c6c6162UL, 0x856530d8UL, 0xf262004eUL, 0x6c0695edUL,
0x1b01a57bUL, 0x8208f4c1UL, 0xf50fc457UL, 0x65b0d9c6UL, 0x12b7e950UL,
0x8bbeb8eaUL, 0xfcb9887cUL, 0x62dd1ddfUL, 0x15da2d49UL, 0x8cd37cf3UL,
0xfbd44c65UL, 0x4db26158UL, 0x3ab551ceUL, 0xa3bc0074UL, 0xd4bb30e2UL,
0x4adfa541UL, 0x3dd895d7UL, 0xa4d1c46dUL, 0xd3d6f4fbUL, 0x4369e96aUL,
0x346ed9fcUL, 0xad678846UL, 0xda60b8d0UL, 0x44042d73UL, 0x33031de5UL,
0xaa0a4c5fUL, 0xdd0d7cc9UL, 0x5005713cUL, 0x270241aaUL, 0xbe0b1010UL,
0xc90c2086UL, 0x5768b525UL, 0x206f85b3UL, 0xb966d409UL, 0xce61e49fUL,
0x5edef90eUL, 0x29d9c998UL, 0xb0d09822UL, 0xc7d7a8b4UL, 0x59b33d17UL,
0x2eb40d81UL, 0xb7bd5c3bUL, 0xc0ba6cadUL, 0xedb88320UL, 0x9abfb3b6UL,
0x03b6e20cUL, 0x74b1d29aUL, 0xead54739UL, 0x9dd277afUL, 0x04db2615UL,
0x73dc1683UL, 0xe3630b12UL, 0x94643b84UL, 0x0d6d6a3eUL, 0x7a6a5aa8UL,
0xe40ecf0bUL, 0x9309ff9dUL, 0x0a00ae27UL, 0x7d079eb1UL, 0xf00f9344UL,
0x8708a3d2UL, 0x1e01f268UL, 0x6906c2feUL, 0xf762575dUL, 0x806567cbUL,
0x196c3671UL, 0x6e6b06e7UL, 0xfed41b76UL, 0x89d32be0UL, 0x10da7a5aUL,
0x67dd4accUL, 0xf9b9df6fUL, 0x8ebeeff9UL, 0x17b7be43UL, 0x60b08ed5UL,
0xd6d6a3e8UL, 0xa1d1937eUL, 0x38d8c2c4UL, 0x4fdff252UL, 0xd1bb67f1UL,
0xa6bc5767UL, 0x3fb506ddUL, 0x48b2364bUL, 0xd80d2bdaUL, 0xaf0a1b4cUL,
0x36034af6UL, 0x41047a60UL, 0xdf60efc3UL, 0xa867df55UL, 0x316e8eefUL,
0x4669be79UL, 0xcb61b38cUL, 0xbc66831aUL, 0x256fd2a0UL, 0x5268e236UL,
0xcc0c7795UL, 0xbb0b4703UL, 0x220216b9UL, 0x5505262fUL, 0xc5ba3bbeUL,
0xb2bd0b28UL, 0x2bb45a92UL, 0x5cb36a04UL, 0xc2d7ffa7UL, 0xb5d0cf31UL,
0x2cd99e8bUL, 0x5bdeae1dUL, 0x9b64c2b0UL, 0xec63f226UL, 0x756aa39cUL,
0x026d930aUL, 0x9c0906a9UL, 0xeb0e363fUL, 0x72076785UL, 0x05005713UL,
0x95bf4a82UL, 0xe2b87a14UL, 0x7bb12baeUL, 0x0cb61b38UL, 0x92d28e9bUL,
0xe5d5be0dUL, 0x7cdcefb7UL, 0x0bdbdf21UL, 0x86d3d2d4UL, 0xf1d4e242UL,
0x68ddb3f8UL, 0x1fda836eUL, 0x81be16cdUL, 0xf6b9265bUL, 0x6fb077e1UL,
0x18b74777UL, 0x88085ae6UL, 0xff0f6a70UL, 0x66063bcaUL, 0x11010b5cUL,
0x8f659effUL, 0xf862ae69UL, 0x616bffd3UL, 0x166ccf45UL, 0xa00ae278UL,
0xd70dd2eeUL, 0x4e048354UL, 0x3903b3c2UL, 0xa7672661UL, 0xd06016f7UL,
0x4969474dUL, 0x3e6e77dbUL, 0xaed16a4aUL, 0xd9d65adcUL, 0x40df0b66UL,
0x37d83bf0UL, 0xa9bcae53UL, 0xdebb9ec5UL, 0x47b2cf7fUL, 0x30b5ffe9UL,
0xbdbdf21cUL, 0xcabac28aUL, 0x53b39330UL, 0x24b4a3a6UL, 0xbad03605UL,
0xcdd70693UL, 0x54de5729UL, 0x23d967bfUL, 0xb3667a2eUL, 0xc4614ab8UL,
0x5d681b02UL, 0x2a6f2b94UL, 0xb40bbe37UL, 0xc30c8ea1UL, 0x5a05df1bUL,
0x2d02ef8dUL
#ifdef BYFOUR
},
{
0x00000000UL, 0x191b3141UL, 0x32366282UL, 0x2b2d53c3UL, 0x646cc504UL,
0x7d77f445UL, 0x565aa786UL, 0x4f4196c7UL, 0xc8d98a08UL, 0xd1c2bb49UL,
0xfaefe88aUL, 0xe3f4d9cbUL, 0xacb54f0cUL, 0xb5ae7e4dUL, 0x9e832d8eUL,
0x87981ccfUL, 0x4ac21251UL, 0x53d92310UL, 0x78f470d3UL, 0x61ef4192UL,
0x2eaed755UL, 0x37b5e614UL, 0x1c98b5d7UL, 0x05838496UL, 0x821b9859UL,
0x9b00a918UL, 0xb02dfadbUL, 0xa936cb9aUL, 0xe6775d5dUL, 0xff6c6c1cUL,
0xd4413fdfUL, 0xcd5a0e9eUL, 0x958424a2UL, 0x8c9f15e3UL, 0xa7b24620UL,
0xbea97761UL, 0xf1e8e1a6UL, 0xe8f3d0e7UL, 0xc3de8324UL, 0xdac5b265UL,
0x5d5daeaaUL, 0x44469febUL, 0x6f6bcc28UL, 0x7670fd69UL, 0x39316baeUL,
0x202a5aefUL, 0x0b07092cUL, 0x121c386dUL, 0xdf4636f3UL, 0xc65d07b2UL,
0xed705471UL, 0xf46b6530UL, 0xbb2af3f7UL, 0xa231c2b6UL, 0x891c9175UL,
0x9007a034UL, 0x179fbcfbUL, 0x0e848dbaUL, 0x25a9de79UL, 0x3cb2ef38UL,
0x73f379ffUL, 0x6ae848beUL, 0x41c51b7dUL, 0x58de2a3cUL, 0xf0794f05UL,
0xe9627e44UL, 0xc24f2d87UL, 0xdb541cc6UL, 0x94158a01UL, 0x8d0ebb40UL,
0xa623e883UL, 0xbf38d9c2UL, 0x38a0c50dUL, 0x21bbf44cUL, 0x0a96a78fUL,
0x138d96ceUL, 0x5ccc0009UL, 0x45d73148UL, 0x6efa628bUL, 0x77e153caUL,
0xbabb5d54UL, 0xa3a06c15UL, 0x888d3fd6UL, 0x91960e97UL, 0xded79850UL,
0xc7cca911UL, 0xece1fad2UL, 0xf5facb93UL, 0x7262d75cUL, 0x6b79e61dUL,
0x4054b5deUL, 0x594f849fUL, 0x160e1258UL, 0x0f152319UL, 0x243870daUL,
0x3d23419bUL, 0x65fd6ba7UL, 0x7ce65ae6UL, 0x57cb0925UL, 0x4ed03864UL,
0x0191aea3UL, 0x188a9fe2UL, 0x33a7cc21UL, 0x2abcfd60UL, 0xad24e1afUL,
0xb43fd0eeUL, 0x9f12832dUL, 0x8609b26cUL, 0xc94824abUL, 0xd05315eaUL,
0xfb7e4629UL, 0xe2657768UL, 0x2f3f79f6UL, 0x362448b7UL, 0x1d091b74UL,
0x04122a35UL, 0x4b53bcf2UL, 0x52488db3UL, 0x7965de70UL, 0x607eef31UL,
0xe7e6f3feUL, 0xfefdc2bfUL, 0xd5d0917cUL, 0xcccba03dUL, 0x838a36faUL,
0x9a9107bbUL, 0xb1bc5478UL, 0xa8a76539UL, 0x3b83984bUL, 0x2298a90aUL,
0x09b5fac9UL, 0x10aecb88UL, 0x5fef5d4fUL, 0x46f46c0eUL, 0x6dd93fcdUL,
0x74c20e8cUL, 0xf35a1243UL, 0xea412302UL, 0xc16c70c1UL, 0xd8774180UL,
0x9736d747UL, 0x8e2de606UL, 0xa500b5c5UL, 0xbc1b8484UL, 0x71418a1aUL,
0x685abb5bUL, 0x4377e898UL, 0x5a6cd9d9UL, 0x152d4f1eUL, 0x0c367e5fUL,
0x271b2d9cUL, 0x3e001cddUL, 0xb9980012UL, 0xa0833153UL, 0x8bae6290UL,
0x92b553d1UL, 0xddf4c516UL, 0xc4eff457UL, 0xefc2a794UL, 0xf6d996d5UL,
0xae07bce9UL, 0xb71c8da8UL, 0x9c31de6bUL, 0x852aef2aUL, 0xca6b79edUL,
0xd37048acUL, 0xf85d1b6fUL, 0xe1462a2eUL, 0x66de36e1UL, 0x7fc507a0UL,
0x54e85463UL, 0x4df36522UL, 0x02b2f3e5UL, 0x1ba9c2a4UL, 0x30849167UL,
0x299fa026UL, 0xe4c5aeb8UL, 0xfdde9ff9UL, 0xd6f3cc3aUL, 0xcfe8fd7bUL,
0x80a96bbcUL, 0x99b25afdUL, 0xb29f093eUL, 0xab84387fUL, 0x2c1c24b0UL,
0x350715f1UL, 0x1e2a4632UL, 0x07317773UL, 0x4870e1b4UL, 0x516bd0f5UL,
0x7a468336UL, 0x635db277UL, 0xcbfad74eUL, 0xd2e1e60fUL, 0xf9ccb5ccUL,
0xe0d7848dUL, 0xaf96124aUL, 0xb68d230bUL, 0x9da070c8UL, 0x84bb4189UL,
0x03235d46UL, 0x1a386c07UL, 0x31153fc4UL, 0x280e0e85UL, 0x674f9842UL,
0x7e54a903UL, 0x5579fac0UL, 0x4c62cb81UL, 0x8138c51fUL, 0x9823f45eUL,
0xb30ea79dUL, 0xaa1596dcUL, 0xe554001bUL, 0xfc4f315aUL, 0xd7626299UL,
0xce7953d8UL, 0x49e14f17UL, 0x50fa7e56UL, 0x7bd72d95UL, 0x62cc1cd4UL,
0x2d8d8a13UL, 0x3496bb52UL, 0x1fbbe891UL, 0x06a0d9d0UL, 0x5e7ef3ecUL,
0x4765c2adUL, 0x6c48916eUL, 0x7553a02fUL, 0x3a1236e8UL, 0x230907a9UL,
0x0824546aUL, 0x113f652bUL, 0x96a779e4UL, 0x8fbc48a5UL, 0xa4911b66UL,
0xbd8a2a27UL, 0xf2cbbce0UL, 0xebd08da1UL, 0xc0fdde62UL, 0xd9e6ef23UL,
0x14bce1bdUL, 0x0da7d0fcUL, 0x268a833fUL, 0x3f91b27eUL, 0x70d024b9UL,
0x69cb15f8UL, 0x42e6463bUL, 0x5bfd777aUL, 0xdc656bb5UL, 0xc57e5af4UL,
0xee530937UL, 0xf7483876UL, 0xb809aeb1UL, 0xa1129ff0UL, 0x8a3fcc33UL,
0x9324fd72UL
},
{
0x00000000UL, 0x01c26a37UL, 0x0384d46eUL, 0x0246be59UL, 0x0709a8dcUL,
0x06cbc2ebUL, 0x048d7cb2UL, 0x054f1685UL, 0x0e1351b8UL, 0x0fd13b8fUL,
0x0d9785d6UL, 0x0c55efe1UL, 0x091af964UL, 0x08d89353UL, 0x0a9e2d0aUL,
0x0b5c473dUL, 0x1c26a370UL, 0x1de4c947UL, 0x1fa2771eUL, 0x1e601d29UL,
0x1b2f0bacUL, 0x1aed619bUL, 0x18abdfc2UL, 0x1969b5f5UL, 0x1235f2c8UL,
0x13f798ffUL, 0x11b126a6UL, 0x10734c91UL, 0x153c5a14UL, 0x14fe3023UL,
0x16b88e7aUL, 0x177ae44dUL, 0x384d46e0UL, 0x398f2cd7UL, 0x3bc9928eUL,
0x3a0bf8b9UL, 0x3f44ee3cUL, 0x3e86840bUL, 0x3cc03a52UL, 0x3d025065UL,
0x365e1758UL, 0x379c7d6fUL, 0x35dac336UL, 0x3418a901UL, 0x3157bf84UL,
0x3095d5b3UL, 0x32d36beaUL, 0x331101ddUL, 0x246be590UL, 0x25a98fa7UL,
0x27ef31feUL, 0x262d5bc9UL, 0x23624d4cUL, 0x22a0277bUL, 0x20e69922UL,
0x2124f315UL, 0x2a78b428UL, 0x2bbade1fUL, 0x29fc6046UL, 0x283e0a71UL,
0x2d711cf4UL, 0x2cb376c3UL, 0x2ef5c89aUL, 0x2f37a2adUL, 0x709a8dc0UL,
0x7158e7f7UL, 0x731e59aeUL, 0x72dc3399UL, 0x7793251cUL, 0x76514f2bUL,
0x7417f172UL, 0x75d59b45UL, 0x7e89dc78UL, 0x7f4bb64fUL, 0x7d0d0816UL,
0x7ccf6221UL, 0x798074a4UL, 0x78421e93UL, 0x7a04a0caUL, 0x7bc6cafdUL,
0x6cbc2eb0UL, 0x6d7e4487UL, 0x6f38fadeUL, 0x6efa90e9UL, 0x6bb5866cUL,
0x6a77ec5bUL, 0x68315202UL, 0x69f33835UL, 0x62af7f08UL, 0x636d153fUL,
0x612bab66UL, 0x60e9c151UL, 0x65a6d7d4UL, 0x6464bde3UL, 0x662203baUL,
0x67e0698dUL, 0x48d7cb20UL, 0x4915a117UL, 0x4b531f4eUL, 0x4a917579UL,
0x4fde63fcUL, 0x4e1c09cbUL, 0x4c5ab792UL, 0x4d98dda5UL, 0x46c49a98UL,
0x4706f0afUL, 0x45404ef6UL, 0x448224c1UL, 0x41cd3244UL, 0x400f5873UL,
0x4249e62aUL, 0x438b8c1dUL, 0x54f16850UL, 0x55330267UL, 0x5775bc3eUL,
0x56b7d609UL, 0x53f8c08cUL, 0x523aaabbUL, 0x507c14e2UL, 0x51be7ed5UL,
0x5ae239e8UL, 0x5b2053dfUL, 0x5966ed86UL, 0x58a487b1UL, 0x5deb9134UL,
0x5c29fb03UL, 0x5e6f455aUL, 0x5fad2f6dUL, 0xe1351b80UL, 0xe0f771b7UL,
0xe2b1cfeeUL, 0xe373a5d9UL, 0xe63cb35cUL, 0xe7fed96bUL, 0xe5b86732UL,
0xe47a0d05UL, 0xef264a38UL, 0xeee4200fUL, 0xeca29e56UL, 0xed60f461UL,
0xe82fe2e4UL, 0xe9ed88d3UL, 0xebab368aUL, 0xea695cbdUL, 0xfd13b8f0UL,
0xfcd1d2c7UL, 0xfe976c9eUL, 0xff5506a9UL, 0xfa1a102cUL, 0xfbd87a1bUL,
0xf99ec442UL, 0xf85cae75UL, 0xf300e948UL, 0xf2c2837fUL, 0xf0843d26UL,
0xf1465711UL, 0xf4094194UL, 0xf5cb2ba3UL, 0xf78d95faUL, 0xf64fffcdUL,
0xd9785d60UL, 0xd8ba3757UL, 0xdafc890eUL, 0xdb3ee339UL, 0xde71f5bcUL,
0xdfb39f8bUL, 0xddf521d2UL, 0xdc374be5UL, 0xd76b0cd8UL, 0xd6a966efUL,
0xd4efd8b6UL, 0xd52db281UL, 0xd062a404UL, 0xd1a0ce33UL, 0xd3e6706aUL,
0xd2241a5dUL, 0xc55efe10UL, 0xc49c9427UL, 0xc6da2a7eUL, 0xc7184049UL,
0xc25756ccUL, 0xc3953cfbUL, 0xc1d382a2UL, 0xc011e895UL, 0xcb4dafa8UL,
0xca8fc59fUL, 0xc8c97bc6UL, 0xc90b11f1UL, 0xcc440774UL, 0xcd866d43UL,
0xcfc0d31aUL, 0xce02b92dUL, 0x91af9640UL, 0x906dfc77UL, 0x922b422eUL,
0x93e92819UL, 0x96a63e9cUL, 0x976454abUL, 0x9522eaf2UL, 0x94e080c5UL,
0x9fbcc7f8UL, 0x9e7eadcfUL, 0x9c381396UL, 0x9dfa79a1UL, 0x98b56f24UL,
0x99770513UL, 0x9b31bb4aUL, 0x9af3d17dUL, 0x8d893530UL, 0x8c4b5f07UL,
0x8e0de15eUL, 0x8fcf8b69UL, 0x8a809decUL, 0x8b42f7dbUL, 0x89044982UL,
0x88c623b5UL, 0x839a6488UL, 0x82580ebfUL, 0x801eb0e6UL, 0x81dcdad1UL,
0x8493cc54UL, 0x8551a663UL, 0x8717183aUL, 0x86d5720dUL, 0xa9e2d0a0UL,
0xa820ba97UL, 0xaa6604ceUL, 0xaba46ef9UL, 0xaeeb787cUL, 0xaf29124bUL,
0xad6fac12UL, 0xacadc625UL, 0xa7f18118UL, 0xa633eb2fUL, 0xa4755576UL,
0xa5b73f41UL, 0xa0f829c4UL, 0xa13a43f3UL, 0xa37cfdaaUL, 0xa2be979dUL,
0xb5c473d0UL, 0xb40619e7UL, 0xb640a7beUL, 0xb782cd89UL, 0xb2cddb0cUL,
0xb30fb13bUL, 0xb1490f62UL, 0xb08b6555UL, 0xbbd72268UL, 0xba15485fUL,
0xb853f606UL, 0xb9919c31UL, 0xbcde8ab4UL, 0xbd1ce083UL, 0xbf5a5edaUL,
0xbe9834edUL
},
{
0x00000000UL, 0xb8bc6765UL, 0xaa09c88bUL, 0x12b5afeeUL, 0x8f629757UL,
0x37def032UL, 0x256b5fdcUL, 0x9dd738b9UL, 0xc5b428efUL, 0x7d084f8aUL,
0x6fbde064UL, 0xd7018701UL, 0x4ad6bfb8UL, 0xf26ad8ddUL, 0xe0df7733UL,
0x58631056UL, 0x5019579fUL, 0xe8a530faUL, 0xfa109f14UL, 0x42acf871UL,
0xdf7bc0c8UL, 0x67c7a7adUL, 0x75720843UL, 0xcdce6f26UL, 0x95ad7f70UL,
0x2d111815UL, 0x3fa4b7fbUL, 0x8718d09eUL, 0x1acfe827UL, 0xa2738f42UL,
0xb0c620acUL, 0x087a47c9UL, 0xa032af3eUL, 0x188ec85bUL, 0x0a3b67b5UL,
0xb28700d0UL, 0x2f503869UL, 0x97ec5f0cUL, 0x8559f0e2UL, 0x3de59787UL,
0x658687d1UL, 0xdd3ae0b4UL, 0xcf8f4f5aUL, 0x7733283fUL, 0xeae41086UL,
0x525877e3UL, 0x40edd80dUL, 0xf851bf68UL, 0xf02bf8a1UL, 0x48979fc4UL,
0x5a22302aUL, 0xe29e574fUL, 0x7f496ff6UL, 0xc7f50893UL, 0xd540a77dUL,
0x6dfcc018UL, 0x359fd04eUL, 0x8d23b72bUL, 0x9f9618c5UL, 0x272a7fa0UL,
0xbafd4719UL, 0x0241207cUL, 0x10f48f92UL, 0xa848e8f7UL, 0x9b14583dUL,
0x23a83f58UL, 0x311d90b6UL, 0x89a1f7d3UL, 0x1476cf6aUL, 0xaccaa80fUL,
0xbe7f07e1UL, 0x06c36084UL, 0x5ea070d2UL, 0xe61c17b7UL, 0xf4a9b859UL,
0x4c15df3cUL, 0xd1c2e785UL, 0x697e80e0UL, 0x7bcb2f0eUL, 0xc377486bUL,
0xcb0d0fa2UL, 0x73b168c7UL, 0x6104c729UL, 0xd9b8a04cUL, 0x446f98f5UL,
0xfcd3ff90UL, 0xee66507eUL, 0x56da371bUL, 0x0eb9274dUL, 0xb6054028UL,
0xa4b0efc6UL, 0x1c0c88a3UL, 0x81dbb01aUL, 0x3967d77fUL, 0x2bd27891UL,
0x936e1ff4UL, 0x3b26f703UL, 0x839a9066UL, 0x912f3f88UL, 0x299358edUL,
0xb4446054UL, 0x0cf80731UL, 0x1e4da8dfUL, 0xa6f1cfbaUL, 0xfe92dfecUL,
0x462eb889UL, 0x549b1767UL, 0xec277002UL, 0x71f048bbUL, 0xc94c2fdeUL,
0xdbf98030UL, 0x6345e755UL, 0x6b3fa09cUL, 0xd383c7f9UL, 0xc1366817UL,
0x798a0f72UL, 0xe45d37cbUL, 0x5ce150aeUL, 0x4e54ff40UL, 0xf6e89825UL,
0xae8b8873UL, 0x1637ef16UL, 0x048240f8UL, 0xbc3e279dUL, 0x21e91f24UL,
0x99557841UL, 0x8be0d7afUL, 0x335cb0caUL, 0xed59b63bUL, 0x55e5d15eUL,
0x47507eb0UL, 0xffec19d5UL, 0x623b216cUL, 0xda874609UL, 0xc832e9e7UL,
0x708e8e82UL, 0x28ed9ed4UL, 0x9051f9b1UL, 0x82e4565fUL, 0x3a58313aUL,
0xa78f0983UL, 0x1f336ee6UL, 0x0d86c108UL, 0xb53aa66dUL, 0xbd40e1a4UL,
0x05fc86c1UL, 0x1749292fUL, 0xaff54e4aUL, 0x322276f3UL, 0x8a9e1196UL,
0x982bbe78UL, 0x2097d91dUL, 0x78f4c94bUL, 0xc048ae2eUL, 0xd2fd01c0UL,
0x6a4166a5UL, 0xf7965e1cUL, 0x4f2a3979UL, 0x5d9f9697UL, 0xe523f1f2UL,
0x4d6b1905UL, 0xf5d77e60UL, 0xe762d18eUL, 0x5fdeb6ebUL, 0xc2098e52UL,
0x7ab5e937UL, 0x680046d9UL, 0xd0bc21bcUL, 0x88df31eaUL, 0x3063568fUL,
0x22d6f961UL, 0x9a6a9e04UL, 0x07bda6bdUL, 0xbf01c1d8UL, 0xadb46e36UL,
0x15080953UL, 0x1d724e9aUL, 0xa5ce29ffUL, 0xb77b8611UL, 0x0fc7e174UL,
0x9210d9cdUL, 0x2aacbea8UL, 0x38191146UL, 0x80a57623UL, 0xd8c66675UL,
0x607a0110UL, 0x72cfaefeUL, 0xca73c99bUL, 0x57a4f122UL, 0xef189647UL,
0xfdad39a9UL, 0x45115eccUL, 0x764dee06UL, 0xcef18963UL, 0xdc44268dUL,
0x64f841e8UL, 0xf92f7951UL, 0x41931e34UL, 0x5326b1daUL, 0xeb9ad6bfUL,
0xb3f9c6e9UL, 0x0b45a18cUL, 0x19f00e62UL, 0xa14c6907UL, 0x3c9b51beUL,
0x842736dbUL, 0x96929935UL, 0x2e2efe50UL, 0x2654b999UL, 0x9ee8defcUL,
0x8c5d7112UL, 0x34e11677UL, 0xa9362eceUL, 0x118a49abUL, 0x033fe645UL,
0xbb838120UL, 0xe3e09176UL, 0x5b5cf613UL, 0x49e959fdUL, 0xf1553e98UL,
0x6c820621UL, 0xd43e6144UL, 0xc68bceaaUL, 0x7e37a9cfUL, 0xd67f4138UL,
0x6ec3265dUL, 0x7c7689b3UL, 0xc4caeed6UL, 0x591dd66fUL, 0xe1a1b10aUL,
0xf3141ee4UL, 0x4ba87981UL, 0x13cb69d7UL, 0xab770eb2UL, 0xb9c2a15cUL,
0x017ec639UL, 0x9ca9fe80UL, 0x241599e5UL, 0x36a0360bUL, 0x8e1c516eUL,
0x866616a7UL, 0x3eda71c2UL, 0x2c6fde2cUL, 0x94d3b949UL, 0x090481f0UL,
0xb1b8e695UL, 0xa30d497bUL, 0x1bb12e1eUL, 0x43d23e48UL, 0xfb6e592dUL,
0xe9dbf6c3UL, 0x516791a6UL, 0xccb0a91fUL, 0x740cce7aUL, 0x66b96194UL,
0xde0506f1UL
},
{
0x00000000UL, 0x96300777UL, 0x2c610eeeUL, 0xba510999UL, 0x19c46d07UL,
0x8ff46a70UL, 0x35a563e9UL, 0xa395649eUL, 0x3288db0eUL, 0xa4b8dc79UL,
0x1ee9d5e0UL, 0x88d9d297UL, 0x2b4cb609UL, 0xbd7cb17eUL, 0x072db8e7UL,
0x911dbf90UL, 0x6410b71dUL, 0xf220b06aUL, 0x4871b9f3UL, 0xde41be84UL,
0x7dd4da1aUL, 0xebe4dd6dUL, 0x51b5d4f4UL, 0xc785d383UL, 0x56986c13UL,
0xc0a86b64UL, 0x7af962fdUL, 0xecc9658aUL, 0x4f5c0114UL, 0xd96c0663UL,
0x633d0ffaUL, 0xf50d088dUL, 0xc8206e3bUL, 0x5e10694cUL, 0xe44160d5UL,
0x727167a2UL, 0xd1e4033cUL, 0x47d4044bUL, 0xfd850dd2UL, 0x6bb50aa5UL,
0xfaa8b535UL, 0x6c98b242UL, 0xd6c9bbdbUL, 0x40f9bcacUL, 0xe36cd832UL,
0x755cdf45UL, 0xcf0dd6dcUL, 0x593dd1abUL, 0xac30d926UL, 0x3a00de51UL,
0x8051d7c8UL, 0x1661d0bfUL, 0xb5f4b421UL, 0x23c4b356UL, 0x9995bacfUL,
0x0fa5bdb8UL, 0x9eb80228UL, 0x0888055fUL, 0xb2d90cc6UL, 0x24e90bb1UL,
0x877c6f2fUL, 0x114c6858UL, 0xab1d61c1UL, 0x3d2d66b6UL, 0x9041dc76UL,
0x0671db01UL, 0xbc20d298UL, 0x2a10d5efUL, 0x8985b171UL, 0x1fb5b606UL,
0xa5e4bf9fUL, 0x33d4b8e8UL, 0xa2c90778UL, 0x34f9000fUL, 0x8ea80996UL,
0x18980ee1UL, 0xbb0d6a7fUL, 0x2d3d6d08UL, 0x976c6491UL, 0x015c63e6UL,
0xf4516b6bUL, 0x62616c1cUL, 0xd8306585UL, 0x4e0062f2UL, 0xed95066cUL,
0x7ba5011bUL, 0xc1f40882UL, 0x57c40ff5UL, 0xc6d9b065UL, 0x50e9b712UL,
0xeab8be8bUL, 0x7c88b9fcUL, 0xdf1ddd62UL, 0x492dda15UL, 0xf37cd38cUL,
0x654cd4fbUL, 0x5861b24dUL, 0xce51b53aUL, 0x7400bca3UL, 0xe230bbd4UL,
0x41a5df4aUL, 0xd795d83dUL, 0x6dc4d1a4UL, 0xfbf4d6d3UL, 0x6ae96943UL,
0xfcd96e34UL, 0x468867adUL, 0xd0b860daUL, 0x732d0444UL, 0xe51d0333UL,
0x5f4c0aaaUL, 0xc97c0dddUL, 0x3c710550UL, 0xaa410227UL, 0x10100bbeUL,
0x86200cc9UL, 0x25b56857UL, 0xb3856f20UL, 0x09d466b9UL, 0x9fe461ceUL,
0x0ef9de5eUL, 0x98c9d929UL, 0x2298d0b0UL, 0xb4a8d7c7UL, 0x173db359UL,
0x810db42eUL, 0x3b5cbdb7UL, 0xad6cbac0UL, 0x2083b8edUL, 0xb6b3bf9aUL,
0x0ce2b603UL, 0x9ad2b174UL, 0x3947d5eaUL, 0xaf77d29dUL, 0x1526db04UL,
0x8316dc73UL, 0x120b63e3UL, 0x843b6494UL, 0x3e6a6d0dUL, 0xa85a6a7aUL,
0x0bcf0ee4UL, 0x9dff0993UL, 0x27ae000aUL, 0xb19e077dUL, 0x44930ff0UL,
0xd2a30887UL, 0x68f2011eUL, 0xfec20669UL, 0x5d5762f7UL, 0xcb676580UL,
0x71366c19UL, 0xe7066b6eUL, 0x761bd4feUL, 0xe02bd389UL, 0x5a7ada10UL,
0xcc4add67UL, 0x6fdfb9f9UL, 0xf9efbe8eUL, 0x43beb717UL, 0xd58eb060UL,
0xe8a3d6d6UL, 0x7e93d1a1UL, 0xc4c2d838UL, 0x52f2df4fUL, 0xf167bbd1UL,
0x6757bca6UL, 0xdd06b53fUL, 0x4b36b248UL, 0xda2b0dd8UL, 0x4c1b0aafUL,
0xf64a0336UL, 0x607a0441UL, 0xc3ef60dfUL, 0x55df67a8UL, 0xef8e6e31UL,
0x79be6946UL, 0x8cb361cbUL, 0x1a8366bcUL, 0xa0d26f25UL, 0x36e26852UL,
0x95770cccUL, 0x03470bbbUL, 0xb9160222UL, 0x2f260555UL, 0xbe3bbac5UL,
0x280bbdb2UL, 0x925ab42bUL, 0x046ab35cUL, 0xa7ffd7c2UL, 0x31cfd0b5UL,
0x8b9ed92cUL, 0x1daede5bUL, 0xb0c2649bUL, 0x26f263ecUL, 0x9ca36a75UL,
0x0a936d02UL, 0xa906099cUL, 0x3f360eebUL, 0x85670772UL, 0x13570005UL,
0x824abf95UL, 0x147ab8e2UL, 0xae2bb17bUL, 0x381bb60cUL, 0x9b8ed292UL,
0x0dbed5e5UL, 0xb7efdc7cUL, 0x21dfdb0bUL, 0xd4d2d386UL, 0x42e2d4f1UL,
0xf8b3dd68UL, 0x6e83da1fUL, 0xcd16be81UL, 0x5b26b9f6UL, 0xe177b06fUL,
0x7747b718UL, 0xe65a0888UL, 0x706a0fffUL, 0xca3b0666UL, 0x5c0b0111UL,
0xff9e658fUL, 0x69ae62f8UL, 0xd3ff6b61UL, 0x45cf6c16UL, 0x78e20aa0UL,
0xeed20dd7UL, 0x5483044eUL, 0xc2b30339UL, 0x612667a7UL, 0xf71660d0UL,
0x4d476949UL, 0xdb776e3eUL, 0x4a6ad1aeUL, 0xdc5ad6d9UL, 0x660bdf40UL,
0xf03bd837UL, 0x53aebca9UL, 0xc59ebbdeUL, 0x7fcfb247UL, 0xe9ffb530UL,
0x1cf2bdbdUL, 0x8ac2bacaUL, 0x3093b353UL, 0xa6a3b424UL, 0x0536d0baUL,
0x9306d7cdUL, 0x2957de54UL, 0xbf67d923UL, 0x2e7a66b3UL, 0xb84a61c4UL,
0x021b685dUL, 0x942b6f2aUL, 0x37be0bb4UL, 0xa18e0cc3UL, 0x1bdf055aUL,
0x8def022dUL
},
{
0x00000000UL, 0x41311b19UL, 0x82623632UL, 0xc3532d2bUL, 0x04c56c64UL,
0x45f4777dUL, 0x86a75a56UL, 0xc796414fUL, 0x088ad9c8UL, 0x49bbc2d1UL,
0x8ae8effaUL, 0xcbd9f4e3UL, 0x0c4fb5acUL, 0x4d7eaeb5UL, 0x8e2d839eUL,
0xcf1c9887UL, 0x5112c24aUL, 0x1023d953UL, 0xd370f478UL, 0x9241ef61UL,
0x55d7ae2eUL, 0x14e6b537UL, 0xd7b5981cUL, 0x96848305UL, 0x59981b82UL,
0x18a9009bUL, 0xdbfa2db0UL, 0x9acb36a9UL, 0x5d5d77e6UL, 0x1c6c6cffUL,
0xdf3f41d4UL, 0x9e0e5acdUL, 0xa2248495UL, 0xe3159f8cUL, 0x2046b2a7UL,
0x6177a9beUL, 0xa6e1e8f1UL, 0xe7d0f3e8UL, 0x2483dec3UL, 0x65b2c5daUL,
0xaaae5d5dUL, 0xeb9f4644UL, 0x28cc6b6fUL, 0x69fd7076UL, 0xae6b3139UL,
0xef5a2a20UL, 0x2c09070bUL, 0x6d381c12UL, 0xf33646dfUL, 0xb2075dc6UL,
0x715470edUL, 0x30656bf4UL, 0xf7f32abbUL, 0xb6c231a2UL, 0x75911c89UL,
0x34a00790UL, 0xfbbc9f17UL, 0xba8d840eUL, 0x79dea925UL, 0x38efb23cUL,
0xff79f373UL, 0xbe48e86aUL, 0x7d1bc541UL, 0x3c2ade58UL, 0x054f79f0UL,
0x447e62e9UL, 0x872d4fc2UL, 0xc61c54dbUL, 0x018a1594UL, 0x40bb0e8dUL,
0x83e823a6UL, 0xc2d938bfUL, 0x0dc5a038UL, 0x4cf4bb21UL, 0x8fa7960aUL,
0xce968d13UL, 0x0900cc5cUL, 0x4831d745UL, 0x8b62fa6eUL, 0xca53e177UL,
0x545dbbbaUL, 0x156ca0a3UL, 0xd63f8d88UL, 0x970e9691UL, 0x5098d7deUL,
0x11a9ccc7UL, 0xd2fae1ecUL, 0x93cbfaf5UL, 0x5cd76272UL, 0x1de6796bUL,
0xdeb55440UL, 0x9f844f59UL, 0x58120e16UL, 0x1923150fUL, 0xda703824UL,
0x9b41233dUL, 0xa76bfd65UL, 0xe65ae67cUL, 0x2509cb57UL, 0x6438d04eUL,
0xa3ae9101UL, 0xe29f8a18UL, 0x21cca733UL, 0x60fdbc2aUL, 0xafe124adUL,
0xeed03fb4UL, 0x2d83129fUL, 0x6cb20986UL, 0xab2448c9UL, 0xea1553d0UL,
0x29467efbUL, 0x687765e2UL, 0xf6793f2fUL, 0xb7482436UL, 0x741b091dUL,
0x352a1204UL, 0xf2bc534bUL, 0xb38d4852UL, 0x70de6579UL, 0x31ef7e60UL,
0xfef3e6e7UL, 0xbfc2fdfeUL, 0x7c91d0d5UL, 0x3da0cbccUL, 0xfa368a83UL,
0xbb07919aUL, 0x7854bcb1UL, 0x3965a7a8UL, 0x4b98833bUL, 0x0aa99822UL,
0xc9fab509UL, 0x88cbae10UL, 0x4f5def5fUL, 0x0e6cf446UL, 0xcd3fd96dUL,
0x8c0ec274UL, 0x43125af3UL, 0x022341eaUL, 0xc1706cc1UL, 0x804177d8UL,
0x47d73697UL, 0x06e62d8eUL, 0xc5b500a5UL, 0x84841bbcUL, 0x1a8a4171UL,
0x5bbb5a68UL, 0x98e87743UL, 0xd9d96c5aUL, 0x1e4f2d15UL, 0x5f7e360cUL,
0x9c2d1b27UL, 0xdd1c003eUL, 0x120098b9UL, 0x533183a0UL, 0x9062ae8bUL,
0xd153b592UL, 0x16c5f4ddUL, 0x57f4efc4UL, 0x94a7c2efUL, 0xd596d9f6UL,
0xe9bc07aeUL, 0xa88d1cb7UL, 0x6bde319cUL, 0x2aef2a85UL, 0xed796bcaUL,
0xac4870d3UL, 0x6f1b5df8UL, 0x2e2a46e1UL, 0xe136de66UL, 0xa007c57fUL,
0x6354e854UL, 0x2265f34dUL, 0xe5f3b202UL, 0xa4c2a91bUL, 0x67918430UL,
0x26a09f29UL, 0xb8aec5e4UL, 0xf99fdefdUL, 0x3accf3d6UL, 0x7bfde8cfUL,
0xbc6ba980UL, 0xfd5ab299UL, 0x3e099fb2UL, 0x7f3884abUL, 0xb0241c2cUL,
0xf1150735UL, 0x32462a1eUL, 0x73773107UL, 0xb4e17048UL, 0xf5d06b51UL,
0x3683467aUL, 0x77b25d63UL, 0x4ed7facbUL, 0x0fe6e1d2UL, 0xccb5ccf9UL,
0x8d84d7e0UL, 0x4a1296afUL, 0x0b238db6UL, 0xc870a09dUL, 0x8941bb84UL,
0x465d2303UL, 0x076c381aUL, 0xc43f1531UL, 0x850e0e28UL, 0x42984f67UL,
0x03a9547eUL, 0xc0fa7955UL, 0x81cb624cUL, 0x1fc53881UL, 0x5ef42398UL,
0x9da70eb3UL, 0xdc9615aaUL, 0x1b0054e5UL, 0x5a314ffcUL, 0x996262d7UL,
0xd85379ceUL, 0x174fe149UL, 0x567efa50UL, 0x952dd77bUL, 0xd41ccc62UL,
0x138a8d2dUL, 0x52bb9634UL, 0x91e8bb1fUL, 0xd0d9a006UL, 0xecf37e5eUL,
0xadc26547UL, 0x6e91486cUL, 0x2fa05375UL, 0xe836123aUL, 0xa9070923UL,
0x6a542408UL, 0x2b653f11UL, 0xe479a796UL, 0xa548bc8fUL, 0x661b91a4UL,
0x272a8abdUL, 0xe0bccbf2UL, 0xa18dd0ebUL, 0x62defdc0UL, 0x23efe6d9UL,
0xbde1bc14UL, 0xfcd0a70dUL, 0x3f838a26UL, 0x7eb2913fUL, 0xb924d070UL,
0xf815cb69UL, 0x3b46e642UL, 0x7a77fd5bUL, 0xb56b65dcUL, 0xf45a7ec5UL,
0x370953eeUL, 0x763848f7UL, 0xb1ae09b8UL, 0xf09f12a1UL, 0x33cc3f8aUL,
0x72fd2493UL
},
{
0x00000000UL, 0x376ac201UL, 0x6ed48403UL, 0x59be4602UL, 0xdca80907UL,
0xebc2cb06UL, 0xb27c8d04UL, 0x85164f05UL, 0xb851130eUL, 0x8f3bd10fUL,
0xd685970dUL, 0xe1ef550cUL, 0x64f91a09UL, 0x5393d808UL, 0x0a2d9e0aUL,
0x3d475c0bUL, 0x70a3261cUL, 0x47c9e41dUL, 0x1e77a21fUL, 0x291d601eUL,
0xac0b2f1bUL, 0x9b61ed1aUL, 0xc2dfab18UL, 0xf5b56919UL, 0xc8f23512UL,
0xff98f713UL, 0xa626b111UL, 0x914c7310UL, 0x145a3c15UL, 0x2330fe14UL,
0x7a8eb816UL, 0x4de47a17UL, 0xe0464d38UL, 0xd72c8f39UL, 0x8e92c93bUL,
0xb9f80b3aUL, 0x3cee443fUL, 0x0b84863eUL, 0x523ac03cUL, 0x6550023dUL,
0x58175e36UL, 0x6f7d9c37UL, 0x36c3da35UL, 0x01a91834UL, 0x84bf5731UL,
0xb3d59530UL, 0xea6bd332UL, 0xdd011133UL, 0x90e56b24UL, 0xa78fa925UL,
0xfe31ef27UL, 0xc95b2d26UL, 0x4c4d6223UL, 0x7b27a022UL, 0x2299e620UL,
0x15f32421UL, 0x28b4782aUL, 0x1fdeba2bUL, 0x4660fc29UL, 0x710a3e28UL,
0xf41c712dUL, 0xc376b32cUL, 0x9ac8f52eUL, 0xada2372fUL, 0xc08d9a70UL,
0xf7e75871UL, 0xae591e73UL, 0x9933dc72UL, 0x1c259377UL, 0x2b4f5176UL,
0x72f11774UL, 0x459bd575UL, 0x78dc897eUL, 0x4fb64b7fUL, 0x16080d7dUL,
0x2162cf7cUL, 0xa4748079UL, 0x931e4278UL, 0xcaa0047aUL, 0xfdcac67bUL,
0xb02ebc6cUL, 0x87447e6dUL, 0xdefa386fUL, 0xe990fa6eUL, 0x6c86b56bUL,
0x5bec776aUL, 0x02523168UL, 0x3538f369UL, 0x087faf62UL, 0x3f156d63UL,
0x66ab2b61UL, 0x51c1e960UL, 0xd4d7a665UL, 0xe3bd6464UL, 0xba032266UL,
0x8d69e067UL, 0x20cbd748UL, 0x17a11549UL, 0x4e1f534bUL, 0x7975914aUL,
0xfc63de4fUL, 0xcb091c4eUL, 0x92b75a4cUL, 0xa5dd984dUL, 0x989ac446UL,
0xaff00647UL, 0xf64e4045UL, 0xc1248244UL, 0x4432cd41UL, 0x73580f40UL,
0x2ae64942UL, 0x1d8c8b43UL, 0x5068f154UL, 0x67023355UL, 0x3ebc7557UL,
0x09d6b756UL, 0x8cc0f853UL, 0xbbaa3a52UL, 0xe2147c50UL, 0xd57ebe51UL,
0xe839e25aUL, 0xdf53205bUL, 0x86ed6659UL, 0xb187a458UL, 0x3491eb5dUL,
0x03fb295cUL, 0x5a456f5eUL, 0x6d2fad5fUL, 0x801b35e1UL, 0xb771f7e0UL,
0xeecfb1e2UL, 0xd9a573e3UL, 0x5cb33ce6UL, 0x6bd9fee7UL, 0x3267b8e5UL,
0x050d7ae4UL, 0x384a26efUL, 0x0f20e4eeUL, 0x569ea2ecUL, 0x61f460edUL,
0xe4e22fe8UL, 0xd388ede9UL, 0x8a36abebUL, 0xbd5c69eaUL, 0xf0b813fdUL,
0xc7d2d1fcUL, 0x9e6c97feUL, 0xa90655ffUL, 0x2c101afaUL, 0x1b7ad8fbUL,
0x42c49ef9UL, 0x75ae5cf8UL, 0x48e900f3UL, 0x7f83c2f2UL, 0x263d84f0UL,
0x115746f1UL, 0x944109f4UL, 0xa32bcbf5UL, 0xfa958df7UL, 0xcdff4ff6UL,
0x605d78d9UL, 0x5737bad8UL, 0x0e89fcdaUL, 0x39e33edbUL, 0xbcf571deUL,
0x8b9fb3dfUL, 0xd221f5ddUL, 0xe54b37dcUL, 0xd80c6bd7UL, 0xef66a9d6UL,
0xb6d8efd4UL, 0x81b22dd5UL, 0x04a462d0UL, 0x33cea0d1UL, 0x6a70e6d3UL,
0x5d1a24d2UL, 0x10fe5ec5UL, 0x27949cc4UL, 0x7e2adac6UL, 0x494018c7UL,
0xcc5657c2UL, 0xfb3c95c3UL, 0xa282d3c1UL, 0x95e811c0UL, 0xa8af4dcbUL,
0x9fc58fcaUL, 0xc67bc9c8UL, 0xf1110bc9UL, 0x740744ccUL, 0x436d86cdUL,
0x1ad3c0cfUL, 0x2db902ceUL, 0x4096af91UL, 0x77fc6d90UL, 0x2e422b92UL,
0x1928e993UL, 0x9c3ea696UL, 0xab546497UL, 0xf2ea2295UL, 0xc580e094UL,
0xf8c7bc9fUL, 0xcfad7e9eUL, 0x9613389cUL, 0xa179fa9dUL, 0x246fb598UL,
0x13057799UL, 0x4abb319bUL, 0x7dd1f39aUL, 0x3035898dUL, 0x075f4b8cUL,
0x5ee10d8eUL, 0x698bcf8fUL, 0xec9d808aUL, 0xdbf7428bUL, 0x82490489UL,
0xb523c688UL, 0x88649a83UL, 0xbf0e5882UL, 0xe6b01e80UL, 0xd1dadc81UL,
0x54cc9384UL, 0x63a65185UL, 0x3a181787UL, 0x0d72d586UL, 0xa0d0e2a9UL,
0x97ba20a8UL, 0xce0466aaUL, 0xf96ea4abUL, 0x7c78ebaeUL, 0x4b1229afUL,
0x12ac6fadUL, 0x25c6adacUL, 0x1881f1a7UL, 0x2feb33a6UL, 0x765575a4UL,
0x413fb7a5UL, 0xc429f8a0UL, 0xf3433aa1UL, 0xaafd7ca3UL, 0x9d97bea2UL,
0xd073c4b5UL, 0xe71906b4UL, 0xbea740b6UL, 0x89cd82b7UL, 0x0cdbcdb2UL,
0x3bb10fb3UL, 0x620f49b1UL, 0x55658bb0UL, 0x6822d7bbUL, 0x5f4815baUL,
0x06f653b8UL, 0x319c91b9UL, 0xb48adebcUL, 0x83e01cbdUL, 0xda5e5abfUL,
0xed3498beUL
},
{
0x00000000UL, 0x6567bcb8UL, 0x8bc809aaUL, 0xeeafb512UL, 0x5797628fUL,
0x32f0de37UL, 0xdc5f6b25UL, 0xb938d79dUL, 0xef28b4c5UL, 0x8a4f087dUL,
0x64e0bd6fUL, 0x018701d7UL, 0xb8bfd64aUL, 0xddd86af2UL, 0x3377dfe0UL,
0x56106358UL, 0x9f571950UL, 0xfa30a5e8UL, 0x149f10faUL, 0x71f8ac42UL,
0xc8c07bdfUL, 0xada7c767UL, 0x43087275UL, 0x266fcecdUL, 0x707fad95UL,
0x1518112dUL, 0xfbb7a43fUL, 0x9ed01887UL, 0x27e8cf1aUL, 0x428f73a2UL,
0xac20c6b0UL, 0xc9477a08UL, 0x3eaf32a0UL, 0x5bc88e18UL, 0xb5673b0aUL,
0xd00087b2UL, 0x6938502fUL, 0x0c5fec97UL, 0xe2f05985UL, 0x8797e53dUL,
0xd1878665UL, 0xb4e03addUL, 0x5a4f8fcfUL, 0x3f283377UL, 0x8610e4eaUL,
0xe3775852UL, 0x0dd8ed40UL, 0x68bf51f8UL, 0xa1f82bf0UL, 0xc49f9748UL,
0x2a30225aUL, 0x4f579ee2UL, 0xf66f497fUL, 0x9308f5c7UL, 0x7da740d5UL,
0x18c0fc6dUL, 0x4ed09f35UL, 0x2bb7238dUL, 0xc518969fUL, 0xa07f2a27UL,
0x1947fdbaUL, 0x7c204102UL, 0x928ff410UL, 0xf7e848a8UL, 0x3d58149bUL,
0x583fa823UL, 0xb6901d31UL, 0xd3f7a189UL, 0x6acf7614UL, 0x0fa8caacUL,
0xe1077fbeUL, 0x8460c306UL, 0xd270a05eUL, 0xb7171ce6UL, 0x59b8a9f4UL,
0x3cdf154cUL, 0x85e7c2d1UL, 0xe0807e69UL, 0x0e2fcb7bUL, 0x6b4877c3UL,
0xa20f0dcbUL, 0xc768b173UL, 0x29c70461UL, 0x4ca0b8d9UL, 0xf5986f44UL,
0x90ffd3fcUL, 0x7e5066eeUL, 0x1b37da56UL, 0x4d27b90eUL, 0x284005b6UL,
0xc6efb0a4UL, 0xa3880c1cUL, 0x1ab0db81UL, 0x7fd76739UL, 0x9178d22bUL,
0xf41f6e93UL, 0x03f7263bUL, 0x66909a83UL, 0x883f2f91UL, 0xed589329UL,
0x546044b4UL, 0x3107f80cUL, 0xdfa84d1eUL, 0xbacff1a6UL, 0xecdf92feUL,
0x89b82e46UL, 0x67179b54UL, 0x027027ecUL, 0xbb48f071UL, 0xde2f4cc9UL,
0x3080f9dbUL, 0x55e74563UL, 0x9ca03f6bUL, 0xf9c783d3UL, 0x176836c1UL,
0x720f8a79UL, 0xcb375de4UL, 0xae50e15cUL, 0x40ff544eUL, 0x2598e8f6UL,
0x73888baeUL, 0x16ef3716UL, 0xf8408204UL, 0x9d273ebcUL, 0x241fe921UL,
0x41785599UL, 0xafd7e08bUL, 0xcab05c33UL, 0x3bb659edUL, 0x5ed1e555UL,
0xb07e5047UL, 0xd519ecffUL, 0x6c213b62UL, 0x094687daUL, 0xe7e932c8UL,
0x828e8e70UL, 0xd49eed28UL, 0xb1f95190UL, 0x5f56e482UL, 0x3a31583aUL,
0x83098fa7UL, 0xe66e331fUL, 0x08c1860dUL, 0x6da63ab5UL, 0xa4e140bdUL,
0xc186fc05UL, 0x2f294917UL, 0x4a4ef5afUL, 0xf3762232UL, 0x96119e8aUL,
0x78be2b98UL, 0x1dd99720UL, 0x4bc9f478UL, 0x2eae48c0UL, 0xc001fdd2UL,
0xa566416aUL, 0x1c5e96f7UL, 0x79392a4fUL, 0x97969f5dUL, 0xf2f123e5UL,
0x05196b4dUL, 0x607ed7f5UL, 0x8ed162e7UL, 0xebb6de5fUL, 0x528e09c2UL,
0x37e9b57aUL, 0xd9460068UL, 0xbc21bcd0UL, 0xea31df88UL, 0x8f566330UL,
0x61f9d622UL, 0x049e6a9aUL, 0xbda6bd07UL, 0xd8c101bfUL, 0x366eb4adUL,
0x53090815UL, 0x9a4e721dUL, 0xff29cea5UL, 0x11867bb7UL, 0x74e1c70fUL,
0xcdd91092UL, 0xa8beac2aUL, 0x46111938UL, 0x2376a580UL, 0x7566c6d8UL,
0x10017a60UL, 0xfeaecf72UL, 0x9bc973caUL, 0x22f1a457UL, 0x479618efUL,
0xa939adfdUL, 0xcc5e1145UL, 0x06ee4d76UL, 0x6389f1ceUL, 0x8d2644dcUL,
0xe841f864UL, 0x51792ff9UL, 0x341e9341UL, 0xdab12653UL, 0xbfd69aebUL,
0xe9c6f9b3UL, 0x8ca1450bUL, 0x620ef019UL, 0x07694ca1UL, 0xbe519b3cUL,
0xdb362784UL, 0x35999296UL, 0x50fe2e2eUL, 0x99b95426UL, 0xfcdee89eUL,
0x12715d8cUL, 0x7716e134UL, 0xce2e36a9UL, 0xab498a11UL, 0x45e63f03UL,
0x208183bbUL, 0x7691e0e3UL, 0x13f65c5bUL, 0xfd59e949UL, 0x983e55f1UL,
0x2106826cUL, 0x44613ed4UL, 0xaace8bc6UL, 0xcfa9377eUL, 0x38417fd6UL,
0x5d26c36eUL, 0xb389767cUL, 0xd6eecac4UL, 0x6fd61d59UL, 0x0ab1a1e1UL,
0xe41e14f3UL, 0x8179a84bUL, 0xd769cb13UL, 0xb20e77abUL, 0x5ca1c2b9UL,
0x39c67e01UL, 0x80fea99cUL, 0xe5991524UL, 0x0b36a036UL, 0x6e511c8eUL,
0xa7166686UL, 0xc271da3eUL, 0x2cde6f2cUL, 0x49b9d394UL, 0xf0810409UL,
0x95e6b8b1UL, 0x7b490da3UL, 0x1e2eb11bUL, 0x483ed243UL, 0x2d596efbUL,
0xc3f6dbe9UL, 0xa6916751UL, 0x1fa9b0ccUL, 0x7ace0c74UL, 0x9461b966UL,
0xf10605deUL
#endif
}
};

104
crc32c.c Normal file

@ -0,0 +1,104 @@
/*
 * Oct 28, 2015 Song Liu simplified the code and ported it to mdadm
 *
 * Aug 8, 2011 Bob Pearson with help from Joakim Tjernlund and George Spelvin
 * cleaned up code to current version of sparse and added the slicing-by-8
 * algorithm to the closely similar existing slicing-by-4 algorithm.
 *
 * Oct 15, 2000 Matt Domsch <Matt_Domsch@dell.com>
 * Nicer crc32 functions/docs submitted by linux@horizon.com. Thanks!
 * Code was from the public domain, copyright abandoned. Code was
 * subsequently included in the kernel, thus was re-licensed under the
 * GNU GPL v2.
 *
 * Oct 12, 2000 Matt Domsch <Matt_Domsch@dell.com>
 * Same crc32 function was used in 5 other places in the kernel.
 * I made one version, and deleted the others.
 * There are various incantations of crc32(). Some use a seed of 0 or ~0.
 * Some xor at the end with ~0. The generic crc32() function takes
 * seed as an argument, and doesn't xor at the end. Then individual
 * users can do whatever they need.
 *   drivers/net/smc9194.c uses seed ~0, doesn't xor with ~0.
 *   fs/jffs2 uses seed 0, doesn't xor with ~0.
 *   fs/partitions/efi.c uses seed ~0, xor's with ~0.
 *
 * This source code is licensed under the GNU General Public License,
 * Version 2. See the file COPYING for more details.
 */
#include <sys/types.h>
#include <asm/types.h>
#include <stdlib.h>
/*
 * There are multiple 16-bit CRC polynomials in common use, but this is
 * *the* standard CRC-32 polynomial, first popularized by Ethernet.
 * x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x^1+x^0
 */
#define CRCPOLY_LE 0xedb88320
#define CRCPOLY_BE 0x04c11db7
/*
 * This is the CRC32c polynomial, as outlined by Castagnoli.
 * x^32+x^28+x^27+x^26+x^25+x^23+x^22+x^20+x^19+x^18+x^14+x^13+x^11+x^10+x^9+
 * x^8+x^6+x^0
 */
#define CRC32C_POLY_LE 0x82F63B78
/**
 * crc32_le_generic() - Calculate bitwise little-endian Ethernet AUTODIN II
 *			CRC32/CRC32C
 * @crc: seed value for computation. ~0 for Ethernet, sometimes 0 for other
 *	 uses, or the previous crc32/crc32c value if computing incrementally.
 * @p: pointer to buffer over which CRC32/CRC32C is run
 * @len: length of buffer @p
 * @polynomial: CRC32/CRC32c LE polynomial
 */
static inline __u32 crc32_le_generic(__u32 crc, unsigned char const *p,
				     size_t len, __u32 polynomial)
{
	int i;
	while (len--) {
		crc ^= *p++;
		for (i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? polynomial : 0);
	}
	return crc;
}
__u32 crc32_le(__u32 crc, unsigned char const *p, size_t len)
{
	return crc32_le_generic(crc, p, len, CRCPOLY_LE);
}
__u32 crc32c_le(__u32 crc, unsigned char const *p, size_t len)
{
	return crc32_le_generic(crc, p, len, CRC32C_POLY_LE);
}
/**
 * crc32_be_generic() - Calculate bitwise big-endian Ethernet AUTODIN II CRC32
 * @crc: seed value for computation. ~0 for Ethernet, sometimes 0 for
 *	 other uses, or the previous crc32 value if computing incrementally.
 * @p: pointer to buffer over which CRC32 is run
 * @len: length of buffer @p
 * @polynomial: CRC32 BE polynomial
 */
static inline __u32 crc32_be_generic(__u32 crc, unsigned char const *p,
				     size_t len, __u32 polynomial)
{
	int i;
	while (len--) {
		/* Widen before shifting: in a plain int, 0xff << 24 overflows. */
		crc ^= (__u32)*p++ << 24;
		for (i = 0; i < 8; i++)
			crc = (crc << 1) ^
			      ((crc & 0x80000000) ? polynomial : 0);
	}
	return crc;
}
__u32 crc32_be(__u32 crc, unsigned char const *p, size_t len)
{
	return crc32_be_generic(crc, p, len, CRCPOLY_BE);
}
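The header comment above leaves the choice of seed and final xor to each caller. As a minimal standalone sketch (the names `crc32_bitwise`, `crc32_std` and `crc32_two_parts` are hypothetical and not part of crc32c.c), the same bit-at-a-time loop reproduces the well-known check values for the ASCII string "123456789" when it is seeded with ~0 and the result is inverted. Splitting the input gives the same answer as long as the running, un-inverted value is passed back in as the seed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected (LSB-first) polynomials, matching CRCPOLY_LE and CRC32C_POLY_LE. */
#define POLY_CRC32  0xedb88320u
#define POLY_CRC32C 0x82f63b78u

/* Same loop as crc32_le_generic(): no pre- or post-inversion, so the
 * caller decides the seed and the final xor. */
static uint32_t crc32_bitwise(uint32_t crc, const unsigned char *p, size_t len,
			      uint32_t poly)
{
	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? poly : 0);
	}
	return crc;
}

/* The "seed ~0, xor with ~0 at the end" convention (as in efi.c above). */
static uint32_t crc32_std(const void *buf, size_t len, uint32_t poly)
{
	return ~crc32_bitwise(~0u, buf, len, poly);
}

/* Incremental use: feed the running value back in as the next seed. */
static uint32_t crc32_two_parts(const unsigned char *buf, size_t len,
				size_t split, uint32_t poly)
{
	uint32_t crc;

	assert(split <= len);
	crc = crc32_bitwise(~0u, buf, split, poly);
	crc = crc32_bitwise(crc, buf + split, len - split, poly);
	return ~crc;
}
```

With `msg` holding those nine bytes, `crc32_std(msg, 9, POLY_CRC32)` gives 0xcbf43926 (CRC-32) and `crc32_std(msg, 9, POLY_CRC32C)` gives 0xe3069283 (CRC-32C).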

74
dlink.c Normal file

@ -0,0 +1,74 @@
/* doubly linked lists */
/* This is free software. No strings attached. No copyright claimed */
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#ifdef __dietlibc__
char *strncpy(char *dest, const char *src, size_t n) __THROW;
#endif
void *xcalloc(size_t num, size_t size);
#include "dlink.h"
void *dl_head(void)
{
	void *h;
	h = dl_alloc(0);
	dl_next(h) = h;
	dl_prev(h) = h;
	return h;
}
void dl_free(void *v)
{
	struct __dl_head *vv = v;
	free(vv-1);
}
void dl_init(void *v)
{
	dl_next(v) = v;
	dl_prev(v) = v;
}
void dl_insert(void *head, void *val)
{
	dl_next(val) = dl_next(head);
	dl_prev(val) = head;
	dl_next(dl_prev(val)) = val;
	dl_prev(dl_next(val)) = val;
}
void dl_add(void *head, void *val)
{
	dl_prev(val) = dl_prev(head);
	dl_next(val) = head;
	dl_next(dl_prev(val)) = val;
	dl_prev(dl_next(val)) = val;
}
void dl_del(void *val)
{
	if (dl_prev(val) == 0 || dl_next(val) == 0)
		return;
	dl_prev(dl_next(val)) = dl_prev(val);
	dl_next(dl_prev(val)) = dl_next(val);
	dl_prev(val) = dl_next(val) = 0;
}
char *dl_strndup(char *s, int l)
{
	char *n;
	if (s == NULL)
		return NULL;
	n = dl_newv(char, l+1);
	strncpy(n, s, l+1);
	n[l] = 0;
	return n;
}
char *dl_strdup(char *s)
{
	return dl_strndup(s, (int)strlen(s));
}
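dl_insert() links the new element directly after the head, so it becomes the first element. dl_add() links it directly before the head, which in a circular list means at the end. dl_del() also clears the element's links, so deleting it twice is harmless. The sketch below restates those semantics with an explicit, hypothetical `struct node` instead of the hidden header used by dlink.h, so it compiles without `xcalloc()`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: dlink.c keeps prev/next in a hidden header instead. */
struct node {
	struct node *prev, *next;
	int value;
};

/* Sentinel head points to itself when the list is empty (cf. dl_head()). */
static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Like dl_insert(): link val right after head, i.e. at the front. */
static void list_insert(struct node *head, struct node *val)
{
	val->next = head->next;
	val->prev = head;
	val->prev->next = val;
	val->next->prev = val;
}

/* Like dl_add(): link val right before head, i.e. at the back. */
static void list_add(struct node *head, struct node *val)
{
	val->prev = head->prev;
	val->next = head;
	val->prev->next = val;
	val->next->prev = val;
}

/* Like dl_del(): unlink, then clear the links so a second call is a no-op. */
static void list_del(struct node *val)
{
	if (val->prev == NULL || val->next == NULL)
		return;
	val->next->prev = val->prev;
	val->prev->next = val->next;
	val->prev = val->next = NULL;
}

/* Copy values front-to-back into out[]; returns the number copied. */
static int list_values(struct node *head, int *out, int max)
{
	int n = 0;

	assert(head != NULL);
	for (struct node *p = head->next; p != head && n < max; p = p->next)
		out[n++] = p->value;
	return n;
}
```

Adding 1 and 2 with `list_add()` and then 3 with `list_insert()` yields the order 3, 1, 2.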

25
dlink.h Normal file

@ -0,0 +1,25 @@
/* doubly linked lists */
/* This is free software. No strings attached. No copyright claimed */
struct __dl_head
{
	void * dh_prev;
	void * dh_next;
};
#define dl_alloc(size) ((void*)(((char*)xcalloc(1,(size)+sizeof(struct __dl_head)))+sizeof(struct __dl_head)))
#define dl_new(t) ((t*)dl_alloc(sizeof(t)))
#define dl_newv(t,n) ((t*)dl_alloc(sizeof(t)*(n)))
#define dl_next(p) *(&(((struct __dl_head*)(p))[-1].dh_next))
#define dl_prev(p) *(&(((struct __dl_head*)(p))[-1].dh_prev))
void *dl_head(void);
char *dl_strdup(char *);
char *dl_strndup(char *, int);
void dl_insert(void*, void*);
void dl_add(void*, void*);
void dl_del(void*);
void dl_free(void*);
void dl_init(void*);
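These macros hide the prev/next pointers in a `struct __dl_head` placed immediately before the memory handed to the caller. Any type can therefore be put on a list without link fields of its own, and `dl_next(p)` simply indexes element -1 of that header. A minimal sketch of the same layout, assuming plain `calloc()` in place of mdadm's `xcalloc()`; `struct hdr`, `HDR_NEXT`, `HDR_PREV` and `item_new` are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as struct __dl_head: the links sit just before the payload. */
struct hdr {
	void *prev;
	void *next;
};

/* Like dl_alloc(), but using calloc(), so it may return NULL. */
static void *hdr_alloc(size_t size)
{
	struct hdr *h = calloc(1, sizeof(*h) + size);

	return h ? (void *)(h + 1) : NULL;
}

/* Equivalent to the dl_next()/dl_prev() lvalue macros. */
#define HDR_NEXT(p) (((struct hdr *)(p))[-1].next)
#define HDR_PREV(p) (((struct hdr *)(p))[-1].prev)

/* Like dl_free(): step back over the header before freeing. */
static void hdr_free(void *p)
{
	if (p)
		free((struct hdr *)p - 1);
}

/* A payload type that carries no link fields of its own. */
struct item {
	int id;
};

/* Allocate an item and make it a one-element circular list (cf. dl_init()). */
static struct item *item_new(int id)
{
	struct item *it = hdr_alloc(sizeof(*it));

	if (!it)
		return NULL;
	it->id = id;
	HDR_NEXT(it) = HDR_PREV(it) = it;
	assert(HDR_NEXT(it) == it);
	return it;
}
```

One consequence of this design is that the payload is only aligned to `sizeof(struct hdr)` (two pointers), and the pointer must be stepped back over the header before it is freed, which is what dl_free() does.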

280
external-reshape-design.txt Normal file

@ -0,0 +1,280 @@
External Reshape
1 Problem statement
External (third-party metadata) reshape differs from native-metadata
reshape in three key ways:
1.1 Format specific constraints
In the native case reshape is limited by what is implemented in the
generic reshape routine (Grow_reshape()) and what is supported by the
kernel. There are exceptional cases where Grow_reshape() may block
operations when it knows that the kernel implementation is broken, but
otherwise the kernel is relied upon to be the final arbiter of what
reshape operations are supported.
In the external case the kernel, and the generic checks in
Grow_reshape(), become the super-set of what reshapes are possible. The
metadata format may not support, or may not yet implement, a given
reshape type. The implication for Grow_reshape() is that it must query
the metadata handler and effect changes in the metadata before the new
geometry is posted to the kernel. The ->reshape_super method allows
Grow_reshape() to validate the requested operation and post the metadata
update.
1.2 Scope of reshape
Native metadata reshape is always performed at the array scope (no
metadata relationship with sibling arrays on the same disks). External
reshape, depending on the format, may not allow the number of member
disks to be changed in a subarray unless the change is simultaneously
applied to all subarrays in the container. For example the imsm format
requires all member disks to be members of all subarrays, so a 4-disk
raid5 in a container that also houses a 4-disk raid10 array could not be
reshaped to 5 disks as the imsm format does not support a 5-disk raid10
representation. This requires the ->reshape_super method to check the
contents of the array and ask the user to run the reshape at container
scope (if all subarrays are agreeable to the change), or report an
error in the case where one subarray cannot support the change.
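The scope rule can be sketched as a check that a ->reshape_super implementation might run for a change in the number of raid disks. This is a hypothetical illustration. `level_supports_disks()` encodes only the example above (raid10 has just a 4-disk form) plus placeholder rules for other levels, and it is not the actual imsm policy:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-format rule: can a subarray of this level span n disks?
 * Only the raid10 case comes from the text; the rest are placeholders. */
static bool level_supports_disks(int level, int ndisks)
{
	switch (level) {
	case 0:
		return ndisks >= 1;
	case 1:
		return ndisks == 2;
	case 5:
		return ndisks >= 3;
	case 10:
		return ndisks == 4;
	default:
		return false;
	}
}

enum scope_verdict {
	RESHAPE_OK,              /* every subarray can take the new disk count */
	RESHAPE_NEEDS_CONTAINER, /* agreeable, but must be run at container scope */
	RESHAPE_REJECTED,        /* some subarray cannot support the change */
};

/* Every member disk belongs to every subarray, so a disk-count change
 * must be acceptable to all of them and applied to all of them. */
static enum scope_verdict check_disk_change(const int *levels, size_t nsub,
					    int new_disks, bool container_scope)
{
	assert(nsub > 0);
	for (size_t i = 0; i < nsub; i++)
		if (!level_supports_disks(levels[i], new_disks))
			return RESHAPE_REJECTED;
	if (nsub > 1 && !container_scope)
		return RESHAPE_NEEDS_CONTAINER;
	return RESHAPE_OK;
}
```

With a raid5 and a raid10 in the same container, growing to 5 disks is rejected. With two raid5 subarrays, a request made against one of them is redirected to container scope.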
1.3 Monitoring / checkpointing
Reshape, unlike rebuild/resync, requires strict checkpointing to survive
interrupted reshape operations. For example when expanding a raid5
array the first few stripes of the array will be overwritten in a
destructive manner. When restarting the reshape process we need to know
the exact location of the last successfully written stripe, and we need
to restore the data in any partially overwritten stripe. Native
metadata stores this backup data in the unused portion of spares that
are being promoted to array members, or in an external backup file
(located on a non-involved block device).
The kernel is in charge of recording checkpoints of reshape progress,
but mdadm is delegated the task of managing the backup space which
involves:
1/ Identifying what data will be overwritten in the next unit of reshape
operation
2/ Suspending access to that region so that a snapshot of the data can
be transferred to the backup space.
3/ Allowing the kernel to reshape the saved region and setting the
boundary for the next backup.
In the external reshape case we want to preserve this mdadm
'reshape-manager' arrangement, but have a third actor, mdmon, to
consider. It is tempting to give the role of managing reshape to mdmon,
but that is counter to its role as a monitor, and conflicts with the
existing capabilities and role of mdadm to manage the progress of
reshape. For clarity the external reshape implementation maintains the
role of mdmon as a (mostly) passive recorder of raid events, and mdadm
treats it as it would the kernel in the native reshape case (modulo
needing to send explicit metadata update messages and checking that
mdmon took the expected action).
External reshape can use the generic md backup file as a fallback, but in the
optimal/firmware-compatible case the reshape-manager will use the metadata
specific areas for managing reshape. The implementation also needs to spawn a
reshape-manager per subarray when the reshape is being carried out at the
container level. For these two reasons the ->manage_reshape() method is
introduced. In addition to the base tasks mentioned above, this method:
1/ Processes each subarray one at a time in series, where appropriate.
2/ Uses either generic routines in Grow.c for md-style backup file
support, or uses the metadata-format specific location for storing
recovery data.
This aims to avoid a "midlayer mistake"[1] and lets the metadata handler
optionally take advantage of generic infrastructure in Grow.c.
2 Details for specific reshape requests
There are quite a few moving pieces spread out across md, mdadm, and mdmon for
the support of external reshape, and there are several different types of
reshape that need to be comprehended by the implementation. A rundown of
these details follows.
2.0 General provisions:
Obtain an exclusive open on the container to make sure we are not
running concurrently with a Create() event.
2.1 Freezing sync_action
Before making any attempt at a reshape we 'freeze' every array in
the container to ensure no spare assignment or recovery happens.
This involves writing 'frozen' to sync_action and changing the '/'
after 'external:' in metadata_version to a '-'. mdmon knows that
this means not to perform any management.
Before doing this we check that all sync_actions are 'idle', which
is racy but still useful.
Afterwards we check that all member arrays have no spares
or partial spares (recovery_start != 'none') which would indicate a
race. If they do, we unfreeze again.
Once this completes we know all the arrays are stable. They may
still have failed devices as devices can fail at any time. However
we treat those like failures that happen during the reshape.
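The metadata_version edit can be shown as a small string helper. The sketch assumes that a subarray's metadata_version reads like `external:/md127/0` (the container name and index are illustrative). It covers only the character swap, not writing 'frozen' to sync_action or the sysfs writes themselves, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define EXT_PREFIX "external:"

/* Swap the character right after "external:" from 'from' to 'to'.
 * Returns false if the string is not in the expected state. */
static bool swap_subarray_marker(char *version, char from, char to)
{
	size_t plen = strlen(EXT_PREFIX);

	assert(version != NULL);
	if (strncmp(version, EXT_PREFIX, plen) != 0 || version[plen] != from)
		return false;
	version[plen] = to;
	return true;
}

/* '/' -> '-': tells mdmon not to manage this array. */
static bool freeze_version(char *version)
{
	return swap_subarray_marker(version, '/', '-');
}

/* '-' -> '/': hands the array back to mdmon. */
static bool unfreeze_version(char *version)
{
	return swap_subarray_marker(version, '-', '/');
}
```

Returning false on an unexpected state lets a caller detect a native array, or one that is already frozen, instead of silently corrupting the string.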
2.2 Reshape size
1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
initializes st->update_tail
2/ mdadm::Grow_reshape() calls ->reshape_super() to check that the size change
is allowed (being performed at subarray scope / enough room) and prepares a
metadata update
3/ mdadm::Grow_reshape(): flushes the metadata update (via
flush_metadata_update(), or ->sync_metadata())
4/ mdadm::Grow_reshape(): posts the new size to the kernel
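The ordering in steps 2 to 4 can be sketched with a hypothetical table of hooks: the change is validated and recorded in the metadata first, the update is flushed, and only then is the kernel told. None of these names are mdadm's. `demo_hooks` simply records the order in which the hooks run, so the sequence is visible:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for ->reshape_super(), the metadata flush,
 * and the sysfs write that posts the new size to the kernel. */
struct reshape_hooks {
	int (*reshape_super)(void *ctx, unsigned long long new_size);
	int (*flush_update)(void *ctx);
	int (*post_size)(void *ctx, unsigned long long new_size);
};

/* Any failure stops the sequence before the kernel is touched. */
static int grow_size(const struct reshape_hooks *h, void *ctx,
		     unsigned long long new_size)
{
	int rv;

	rv = h->reshape_super(ctx, new_size);
	if (rv)
		return rv;
	rv = h->flush_update(ctx);
	if (rv)
		return rv;
	return h->post_size(ctx, new_size);
}

/* Demo context: logs hook calls as 'S' (super), 'F' (flush), 'K' (kernel). */
struct trace {
	char log[16];
	size_t n;
	unsigned long long max_size; /* largest size the "metadata" can record */
};

static void note(struct trace *t, char c)
{
	assert(t->n + 1 < sizeof(t->log));
	t->log[t->n++] = c;
	t->log[t->n] = '\0';
}

static int demo_super(void *ctx, unsigned long long sz)
{
	struct trace *t = ctx;

	note(t, 'S');
	return sz <= t->max_size ? 0 : -1;
}

static int demo_flush(void *ctx)
{
	note(ctx, 'F');
	return 0;
}

static int demo_post(void *ctx, unsigned long long sz)
{
	(void)sz;
	note(ctx, 'K');
	return 0;
}

static const struct reshape_hooks demo_hooks = {
	demo_super, demo_flush, demo_post
};
```

When the metadata rejects a size, only 'S' is logged, so the kernel never sees a geometry that the metadata cannot describe.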
2.3 Reshape level (simple-takeover)
"simple-takeover" implies the level change can be satisfied without touching
sync_action.
1/ mdadm::Grow_reshape(): checks if mdmon is running and optionally
initializes st->update_tail
2/ mdadm::Grow_reshape() calls ->reshape_super() to check that the level change
is allowed (being performed at subarray scope) and prepares a
metadata update
2a/ raid10 --> raid0: degrade all mirror legs prior to calling
->reshape_super
3/ mdadm::Grow_reshape(): flushes the metadata update (via
flush_metadata_update(), or ->sync_metadata())
4/ mdadm::Grow_reshape(): posts the new level to the kernel
2.4 Reshape chunk, layout
2.5 Reshape raid disks (grow)
1/ mdadm::Grow_reshape(): unconditionally initializes st->update_tail
because only redundant raid levels can modify the number of raid disks
2/ mdadm::Grow_reshape(): calls ->reshape_super() to check that the level
change is allowed (being performed at proper scope / permissible
geometry / proper spares available in the container), chooses
the spares to use, and prepares a metadata update.
3/ mdadm::Grow_reshape(): Converts each subarray in the container to the
raid level that can perform the reshape and starts mdmon.
4/ mdadm::Grow_reshape(): Pushes the update to mdmon.
5/ mdadm::Grow_reshape(): uses container_content to find details of
the spares and passes them to the kernel.
6/ mdadm::Grow_reshape(): gives raid_disks update to the kernel,
sets sync_max, sync_min, suspend_lo, suspend_hi all to zero,
and starts the reshape by writing 'reshape' to sync_action.
7/ mdmon::monitor notices the sync_action change and tells
managemon to check for new devices. managemon notices the new
devices, opens the relevant sysfs files, and passes them all to
monitor.
8/ mdadm::Grow_reshape() calls ->manage_reshape to oversee the
rest of the reshape.
9/ mdadm::<format>->manage_reshape(): saves data that will be overwritten by
the kernel to either the backup file or the metadata specific location,
advances sync_max, waits for reshape, pings mdmon, and repeats.
Meanwhile mdmon::read_and_act(): records checkpoints.
Specifically:
9a/ if the 'next' stripe to be reshaped will over-write
itself during reshape then:
9a.1/ increase suspend_hi to cover a suitable number of
stripes.
9a.2/ backup those stripes safely.
9a.3/ advance sync_max to allow those stripes to be reshaped
9a.4/ when sync_completed indicates that those stripes have
been reshaped, manage_reshape must ping_manager
9a.5/ when mdmon notices that sync_completed has been updated,
it records the new checkpoint in the metadata
9a.6/ after the ping_manager, manage_reshape will increase
suspend_lo to allow access to those stripes again
9b/ if the 'next' stripe to be reshaped will over-write unused
space during reshape then we apply same process as above,
except that there is no need to back anything up.
Note that we *do* need to keep suspend_hi progressing as
it is not safe to write to the area-under-reshape. For
kernel-managed-metadata this protection is provided by
->reshape_safe, but that does not protect us in the case
of user-space-managed-metadata.
10/ mdadm::<format>->manage_reshape(): once the reshape completes, changes the
raid level back to the nominal raid level (if necessary).
FIXME: native metadata does not have the capability to record the original
raid level in reshape-restart case because the kernel always records current
raid level to the metadata, whereas external metadata can masquerade at an
alternate level based on the reshape state.
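Steps 9a.1 to 9a.6 can be simulated in a few lines. In this sketch the struct fields stand in for the sysfs attributes named above and for the checkpoint that mdmon writes. The "kernel" and "mdmon" are trivial stand-in functions, the units are stripes, and `overlap_until` is a hypothetical boundary after which stripes no longer overwrite themselves (case 9b). The point is the ordering: each window is suspended and backed up before sync_max lets the kernel into it, and it is reopened only after the checkpoint has been recorded:

```c
#include <assert.h>

/* Hypothetical mirror of suspend_lo/hi, sync_max, sync_completed and the
 * recorded checkpoint. All positions are in stripes. */
struct reshape_sim {
	unsigned long suspend_lo, suspend_hi;
	unsigned long sync_max, sync_completed;
	unsigned long checkpoint;   /* what mdmon last recorded */
	unsigned long backed_up_to; /* end of the last region saved to backup */
	unsigned long windows;      /* number of passes performed */
};

/* Stand-in kernel: reshapes everything it is allowed to, up to sync_max. */
static void kernel_run(struct reshape_sim *s)
{
	s->sync_completed = s->sync_max;
}

/* Stand-in mdmon: on a ping, record sync_completed as the checkpoint. */
static void ping_manager(struct reshape_sim *s)
{
	s->checkpoint = s->sync_completed;
}

static void manage_reshape_sim(struct reshape_sim *s, unsigned long total,
			       unsigned long window, unsigned long overlap_until)
{
	assert(window > 0);
	while (s->suspend_lo < total) {
		unsigned long start = s->suspend_lo;
		unsigned long end = start + window < total ? start + window : total;

		s->suspend_hi = end;            /* 9a.1: block I/O to the window */
		if (start < overlap_until)
			s->backed_up_to = end;  /* 9a.2: back it up (9b: skip) */
		s->sync_max = end;              /* 9a.3: let the kernel proceed */
		kernel_run(s);                  /* 9a.4: wait for sync_completed */
		assert(s->sync_completed >= end);
		ping_manager(s);                /* 9a.5: mdmon records checkpoint */
		assert(s->checkpoint == end);
		s->suspend_lo = end;            /* 9a.6: reopen the window */
		s->windows++;
	}
}
```

Reshaping 100 stripes in windows of 16 takes seven passes, and only the windows that start below `overlap_until` = 40 are backed up.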
2.6 Reshape raid disks (shrink)
3 Interaction with metadata handler
The following calls are made into the metadata handler to assist
with initiating and monitoring a 'reshape'.
1/ ->reshape_super is called quite early (after only minimal
checks) to make sure that the metadata can record the new shape
and any necessary transitions. It may be passed a 'container'
or an individual array within a container, and it should notice
the difference and act accordingly.
When a reshape is requested against a container it is expected
that it should be applied to every array in the container,
however it is up to the metadata handler to determine final
policy.
If the reshape is supportable, the internal copy of the metadata
should be updated, and a metadata update suitable for sending
to mdmon should be queued.
If the reshape will involve converting spares into array members,
this must be recorded in the metadata too.
2/ ->container_content will be called to find out the new state
of the array, or of all arrays in the container. Any newly
added devices (with state==0 and raid_disk >= 0) will be added
to the array as spares with the relevant slot number.
It is likely that the info returned by ->container_content will
have ->reshape_active set, ->reshape_progress set to e.g. 0, and
new_* set appropriately. mdadm will use this information to
cause the correct reshape to start at an appropriate time.
3/ ->set_array_state will be called by mdmon when reshape has
started and again periodically as it progresses. This should
record the ->last_checkpoint as the point where reshape has
progressed to. When the reshape finishes this will be called
again and it should notice that ->curr_action is no longer
'reshape' and so should record that the reshape has finished
providing 'last_checkpoint' has progressed suitably.
4/ ->manage_reshape will be called once the reshape has been set
up in the kernel but before sync_max has been moved from 0, so
no actual reshape will have happened.
->manage_reshape should call progress_reshape() to allow the
reshape to progress, and should back-up any data as indicated
by the return value. See the documentation of that function
for more details.
->manage_reshape will be called multiple times when a
container is being reshaped, once for each member array in
the container.
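The checkpoint handling described for ->set_array_state in item 3 can be sketched as follows. The `struct md_meta` fields are hypothetical, and `array_end` stands in for whatever position a format treats as "progressed suitably":

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical slice of per-array metadata state. */
struct md_meta {
	bool reshape_active;
	unsigned long long reshape_checkpoint;
};

/* While curr_action is "reshape", persist last_checkpoint. Once it is not,
 * mark the reshape finished only if the checkpoint reached the end;
 * otherwise keep the recorded checkpoint so the reshape can restart. */
static void record_reshape_state(struct md_meta *m, const char *curr_action,
				 unsigned long long last_checkpoint,
				 unsigned long long array_end)
{
	assert(m && curr_action);
	if (!m->reshape_active)
		return;
	if (strcmp(curr_action, "reshape") == 0) {
		if (last_checkpoint > m->reshape_checkpoint)
			m->reshape_checkpoint = last_checkpoint;
		return;
	}
	if (last_checkpoint >= array_end) {
		m->reshape_checkpoint = array_end;
		m->reshape_active = false; /* reshape complete */
	}
}
```

Checking `last_checkpoint` before clearing the flag is what distinguishes an interrupted reshape (curr_action went idle early) from a finished one.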
The progress of the metadata is as follows:
1/ mdadm sends a metadata update to mdmon which marks the array
as undergoing a reshape. This is set up by
->reshape_super and applied by ->process_update
For container-wide reshape, this happens once for the whole
container.
2/ mdmon notices progress via the sysfs files and calls
->set_array_state to update the state periodically
For container-wide reshape, this happens repeatedly for
one array, then repeatedly for the next, etc.
3/ mdmon notices when reshape has finished and calls
->set_array_state to record that the reshape is complete.
For container-wide reshape, this happens once for each
member array.
...
[1]: Linux kernel design patterns - part 3, Neil Brown https://lwn.net/Articles/336262/

284
inventory Executable file

@ -0,0 +1,284 @@
.gitignore
ANNOUNCE-3.0
ANNOUNCE-3.0.1
ANNOUNCE-3.0.2
ANNOUNCE-3.0.3
ANNOUNCE-3.1
ANNOUNCE-3.1.1
ANNOUNCE-3.1.2
ANNOUNCE-3.1.3
ANNOUNCE-3.1.4
ANNOUNCE-3.1.5
ANNOUNCE-3.2
ANNOUNCE-3.2.1
ANNOUNCE-3.2.2
ANNOUNCE-3.2.3
ANNOUNCE-3.2.4
ANNOUNCE-3.2.5
ANNOUNCE-3.2.6
ANNOUNCE-3.3
ANNOUNCE-3.3.1
ANNOUNCE-3.3.2
ANNOUNCE-3.3.3
ANNOUNCE-3.3.4
ANNOUNCE-3.4
ANNOUNCE-4.0
ANNOUNCE-4.1
ANNOUNCE-4.2
Assemble.c
Build.c
COPYING
ChangeLog
Create.c
Detail.c
Dump.c
Examine.c
Grow.c
INSTALL
Incremental.c
Kill.c
Makefile
Manage.c
Monitor.c
Query.c
README.initramfs
ReadMe.c
TODO
bitmap.c
bitmap.h
clustermd_tests/
clustermd_tests/00r10_Create
clustermd_tests/00r1_Create
clustermd_tests/01r10_Grow_bitmap-switch
clustermd_tests/01r10_Grow_resize
clustermd_tests/01r1_Grow_add
clustermd_tests/01r1_Grow_bitmap-switch
clustermd_tests/01r1_Grow_resize
clustermd_tests/02r10_Manage_add
clustermd_tests/02r10_Manage_add-spare
clustermd_tests/02r10_Manage_re-add
clustermd_tests/02r1_Manage_add
clustermd_tests/02r1_Manage_add-spare
clustermd_tests/02r1_Manage_re-add
clustermd_tests/03r10_switch-recovery
clustermd_tests/03r10_switch-resync
clustermd_tests/03r1_switch-recovery
clustermd_tests/03r1_switch-resync
clustermd_tests/cluster_conf
clustermd_tests/func.sh
config.c
coverity-gcc-hack.h
crc32.c
crc32.h
crc32c.c
dlink.c
dlink.h
external-reshape-design.txt
inventory
lib.c
makedist
managemon.c
mapfile.c
maps.c
md.4
md5.h
md_p.h
md_u.h
mdadm.8.in
mdadm.c
mdadm.conf-example
mdadm.conf.5
mdadm.h
mdadm.spec
mdmon-design.txt
mdmon.8
mdmon.c
mdmon.h
mdopen.c
mdstat.c
misc/
misc/mdcheck
misc/syslog-events
mkinitramfs
monitor.c
msg.c
msg.h
part.h
platform-intel.c
platform-intel.h
policy.c
probe_roms.c
probe_roms.h
pwgr.c
raid5extend.c
raid6check.8
raid6check.c
restripe.c
sg_io.c
sha1.c
sha1.h
super-ddf.c
super-gpt.c
super-intel.c
super-mbr.c
super0.c
super1.c
swap_super.c
sysfs.c
systemd/
systemd/SUSE-mdadm_env.sh
systemd/mdadm-grow-continue@.service
systemd/mdadm-last-resort@.service
systemd/mdadm-last-resort@.timer
systemd/mdadm.shutdown
systemd/mdcheck_continue.service
systemd/mdcheck_continue.timer
systemd/mdcheck_start.service
systemd/mdcheck_start.timer
systemd/mdmon@.service
systemd/mdmonitor-oneshot.service
systemd/mdmonitor-oneshot.timer
systemd/mdmonitor.service
test
tests/
tests/00linear
tests/00multipath
tests/00names
tests/00raid0
tests/00raid1
tests/00raid10
tests/00raid4
tests/00raid5
tests/00raid6
tests/00readonly
tests/01r1fail
tests/01r5fail
tests/01r5integ
tests/01raid6integ
tests/01replace
tests/02lineargrow
tests/02r1add
tests/02r1grow
tests/02r5grow
tests/02r6grow
tests/03assem-incr
tests/03r0assem
tests/03r5assem
tests/03r5assem-failed
tests/03r5assemV1
tests/04r0update
tests/04r1update
tests/04r5swap
tests/04update-metadata
tests/04update-uuid
tests/05r1-add-internalbitmap
tests/05r1-add-internalbitmap-v1a
tests/05r1-add-internalbitmap-v1b
tests/05r1-add-internalbitmap-v1c
tests/05r1-bitmapfile
tests/05r1-failfast
tests/05r1-grow-external
tests/05r1-grow-internal
tests/05r1-grow-internal-1
tests/05r1-internalbitmap
tests/05r1-internalbitmap-v1a
tests/05r1-internalbitmap-v1b
tests/05r1-internalbitmap-v1c
tests/05r1-n3-bitmapfile
tests/05r1-re-add
tests/05r1-re-add-nosuper
tests/05r1-remove-internalbitmap
tests/05r1-remove-internalbitmap-v1a
tests/05r1-remove-internalbitmap-v1b
tests/05r1-remove-internalbitmap-v1c
tests/05r5-bitmapfile
tests/05r5-internalbitmap
tests/05r6-bitmapfile
tests/05r6tor0
tests/06name
tests/06sysfs
tests/06wrmostly
tests/07autoassemble
tests/07autodetect
tests/07changelevelintr
tests/07changelevels
tests/07layouts
tests/07reshape5intr
tests/07revert-grow
tests/07revert-inplace
tests/07revert-shrink
tests/07testreshape5
tests/09imsm-assemble
tests/09imsm-create-fail-rebuild
tests/09imsm-overlap
tests/10ddf-assemble-missing
tests/10ddf-create
tests/10ddf-create-fail-rebuild
tests/10ddf-fail-create-race
tests/10ddf-fail-readd
tests/10ddf-fail-readd-readonly
tests/10ddf-fail-spare
tests/10ddf-fail-stop-readd
tests/10ddf-fail-twice
tests/10ddf-fail-two-spares
tests/10ddf-geometry
tests/10ddf-incremental-wrong-order
tests/10ddf-sudden-degraded
tests/11spare-migration
tests/12imsm-r0_2d-grow-r0_3d
tests/12imsm-r0_2d-grow-r0_4d
tests/12imsm-r0_2d-grow-r0_5d
tests/12imsm-r0_3d-grow-r0_4d
tests/12imsm-r5_3d-grow-r5_4d
tests/12imsm-r5_3d-grow-r5_5d
tests/13imsm-r0_r0_2d-grow-r0_r0_4d
tests/13imsm-r0_r0_2d-grow-r0_r0_5d
tests/13imsm-r0_r0_3d-grow-r0_r0_4d
tests/13imsm-r0_r5_3d-grow-r0_r5_4d
tests/13imsm-r0_r5_3d-grow-r0_r5_5d
tests/13imsm-r5_r0_3d-grow-r5_r0_4d
tests/13imsm-r5_r0_3d-grow-r5_r0_5d
tests/14imsm-r0_3d-r5_3d-migrate-r5_4d-r5_4d
tests/14imsm-r0_3d_no_spares-migrate-r5_3d
tests/14imsm-r0_r0_2d-takeover-r10_4d
tests/14imsm-r10_4d-grow-r10_5d
tests/14imsm-r10_r5_4d-takeover-r0_2d
tests/14imsm-r1_2d-grow-r1_3d
tests/14imsm-r1_2d-takeover-r0_2d
tests/14imsm-r5_3d-grow-r5_5d-no-spares
tests/14imsm-r5_3d-migrate-r4_3d
tests/15imsm-r0_3d_64k-migrate-r0_3d_256k
tests/15imsm-r5_3d_4k-migrate-r5_3d_256k
tests/15imsm-r5_3d_64k-migrate-r5_3d_256k
tests/15imsm-r5_6d_4k-migrate-r5_6d_256k
tests/15imsm-r5_r0_3d_64k-migrate-r5_r0_3d_256k
tests/16imsm-r0_3d-migrate-r5_4d
tests/16imsm-r0_5d-migrate-r5_6d
tests/16imsm-r5_3d-migrate-r0_3d
tests/16imsm-r5_5d-migrate-r0_5d
tests/18imsm-1d-takeover-r0_1d
tests/18imsm-1d-takeover-r1_2d
tests/18imsm-r0_2d-takeover-r10_4d
tests/18imsm-r10_4d-takeover-r0_2d
tests/18imsm-r1_2d-takeover-r0_1d
tests/19raid6auto-repair
tests/19raid6check
tests/19raid6repair
tests/19repair-does-not-destroy
tests/20raid5journal
tests/21raid5cache
tests/ToTest
tests/env-ddf-template
tests/env-imsm-template
tests/func.sh
tests/imsm-grow-template
tests/utils
udev-md-clustered-confirm-device.rules
udev-md-raid-arrays.rules
udev-md-raid-assembly.rules
udev-md-raid-creating.rules
udev-md-raid-safe-timeouts.rules
util.c
uuid.c
xmalloc.c

575
lib.c Normal file

@ -0,0 +1,575 @@
/*
 * mdadm - manage Linux "md" devices aka RAID arrays.
 *
 * Copyright (C) 2011 Neil Brown <neilb@suse.de>
 *
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * Author: Neil Brown
 * Email: <neilb@suse.de>
 */
#include "mdadm.h"
#include "dlink.h"
#include <ctype.h>
#include <limits.h>
bool is_dev_alive(char *path)
{
	if (!path)
		return false;
	if (access(path, R_OK) == 0)
		return true;
	return false;
}
/* This file contains various 'library' style functions. They
 * have no dependency on anything outside this file.
 */
int get_mdp_major(void)
{
	static int mdp_major = -1;
	FILE *fl;
	char *w;
	int have_block = 0;
	int have_devices = 0;
	int last_num = -1;
	if (mdp_major != -1)
		return mdp_major;
	fl = fopen("/proc/devices", "r");
	if (!fl)
		return -1;
	while ((w = conf_word(fl, 1))) {
		if (have_block && strcmp(w, "devices:") == 0)
			have_devices = 1;
		have_block = (strcmp(w, "Block") == 0);
		if (isdigit(w[0]))
			last_num = atoi(w);
		if (have_devices && strcmp(w, "mdp") == 0)
			mdp_major = last_num;
		free(w);
	}
	fclose(fl);
	return mdp_major;
}
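get_mdp_major() uses conf_word() to walk /proc/devices and takes the number that precedes "mdp" once it has seen the "Block devices:" heading. A line-based sketch of the same lookup works on a string, so it can be tried without /proc. The function name is hypothetical, and 254 in the test data is only an example value, since the mdp major is assigned dynamically when the driver registers:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find the major number registered for 'name' in the "Block devices:"
 * section of /proc/devices-style text. Returns -1 if it is absent. */
static int find_block_major(const char *text, const char *name)
{
	const char *line = text;
	int in_block = 0;

	assert(text && name);
	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[128], dev[64];
		int num;

		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';

		if (strcmp(buf, "Block devices:") == 0)
			in_block = 1;
		else if (strcmp(buf, "Character devices:") == 0)
			in_block = 0;
		else if (in_block && sscanf(buf, "%d %63s", &num, dev) == 2 &&
			 strcmp(dev, name) == 0)
			return num;

		line = eol ? eol + 1 : line + len;
	}
	return -1;
}
```

Unlike the word-based original, this version tracks the section explicitly, so a character device with the same name would not be matched.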
char *devid2kname(dev_t devid)
{
	char path[30];
	char link[PATH_MAX];
	static char devnm[32];
	char *cp;
	int n;
	/* Read the /sys/dev/block/%d:%d link and take the last
	 * component of its target as the kernel device name.
	 */
	sprintf(path, "/sys/dev/block/%d:%d", major(devid), minor(devid));
	n = readlink(path, link, sizeof(link) - 1);
	if (n > 0) {
		link[n] = 0;
		cp = strrchr(link, '/');
		if (cp) {
			strcpy(devnm, cp + 1);
			return devnm;
		}
	}
	return NULL;
}
char *stat2kname(struct stat *st)
{
	if ((S_IFMT & st->st_mode) != S_IFBLK)
		return NULL;
	return devid2kname(st->st_rdev);
}
char *fd2kname(int fd)
{
	struct stat stb;
	if (fstat(fd, &stb) == 0)
		return stat2kname(&stb);
	return NULL;
}
char *devid2devnm(dev_t devid)
{
	char path[30];
	char link[200];
	static char devnm[32];
	char *cp, *ep;
	int n;
	/* Might be an extended-minor partition or a
	 * named md device. Look at the
	 * /sys/dev/block/%d:%d link which must look like
	 *    ../../block/mdXXX/mdXXXpYY
	 * or
	 *    ...../block/md_FOO
	 */
	sprintf(path, "/sys/dev/block/%d:%d", major(devid), minor(devid));
	n = readlink(path, link, sizeof(link) - 1);
	if (n > 0) {
		link[n] = 0;
		cp = strstr(link, "/block/");
		if (cp) {
			cp += 7;
			ep = strchr(cp, '/');
			if (ep)
				*ep = 0;
			strcpy(devnm, cp);
			return devnm;
		}
	}
	if (major(devid) == MD_MAJOR)
		sprintf(devnm,"md%d", minor(devid));
	else if (major(devid) == (unsigned)get_mdp_major())
		sprintf(devnm,"md_d%d",
			(minor(devid)>>MdpMinorShift));
	else
		return NULL;
	return devnm;
}
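The two lookups above differ only in which component of the /sys/dev/block/MAJ:MIN link target they keep. devid2kname() keeps the last component, while devid2devnm() keeps the component after "/block/", so a partition maps to its whole array. A string-only sketch of both, plus the numeric fallback, follows. The function names are hypothetical, the example link targets are illustrative, and `MDP_MINOR_SHIFT` assumes that mdadm's MdpMinorShift is 6:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MD_MAJOR_NUM 9      /* md's block major number */
#define MDP_MINOR_SHIFT 6   /* assumed value of MdpMinorShift */

/* devid2kname()-style: the kernel name is the target's last component. */
static const char *link_last_component(const char *target)
{
	const char *cp = strrchr(target, '/');

	return cp ? cp + 1 : NULL;
}

/* devid2devnm()-style: keep the component that follows "/block/". */
static int link_block_name(const char *target, char *out, size_t outlen)
{
	const char *cp = strstr(target, "/block/");
	size_t n;

	if (!cp)
		return -1;
	cp += strlen("/block/");
	n = strcspn(cp, "/");
	if (n >= outlen)
		return -1;
	memcpy(out, cp, n);
	out[n] = '\0';
	return 0;
}

/* Fallback when there is no sysfs link: derive the name from the numbers. */
static int md_name_from_numbers(unsigned int maj, unsigned int min,
				unsigned int mdp_major, char *out, size_t outlen)
{
	assert(outlen > 0);
	if (maj == MD_MAJOR_NUM)
		snprintf(out, outlen, "md%u", min);
	else if (maj == mdp_major)
		snprintf(out, outlen, "md_d%u", min >> MDP_MINOR_SHIFT);
	else
		return -1;
	return 0;
}
```

For a partition link ending in `.../block/md127/md127p1`, the first helper yields `md127p1` and the second yields `md127`.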
char *stat2devnm(struct stat *st)
{
	if ((S_IFMT & st->st_mode) != S_IFBLK)
		return NULL;
	return devid2devnm(st->st_rdev);
}
char *fd2devnm(int fd)
{
	struct stat stb;
	if (fstat(fd, &stb) == 0)
		return stat2devnm(&stb);
	return NULL;
}
/* When we create a new array, we don't want the content to
 * be immediately examined by udev - it is probably meaningless.
 * So create /run/mdadm/creating-mdXXX and expect that a udev
 * rule will notice this and act accordingly.
 */
static char block_path[] = "/run/mdadm/creating-%s";
static char *unblock_path = NULL;
void udev_block(char *devnm)
{
	int fd;
	char *path = NULL;
	xasprintf(&path, block_path, devnm);
	fd = open(path, O_CREAT|O_RDWR, 0600);
	if (fd >= 0) {
		close(fd);
		unblock_path = path;
	} else
		free(path);
}
void udev_unblock(void)
{
	if (unblock_path)
		unlink(unblock_path);
	free(unblock_path);
	unblock_path = NULL;
}
/*
 * Convert a major/minor pair for a block device into a name in /dev, if possible.
 * On the first call, walk /dev collecting names.
 * Put them in a simple linked list for now.
 */
struct devmap {
	int major, minor;
	char *name;
	struct devmap *next;
} *devlist = NULL;
int devlist_ready = 0;
int add_dev(const char *name, const struct stat *stb, int flag, struct FTW *s)
{
	struct stat st;
	if (S_ISLNK(stb->st_mode)) {
		if (stat(name, &st) != 0)
			return 0;
		stb = &st;
	}
	if ((stb->st_mode&S_IFMT)== S_IFBLK) {
		char *n = xstrdup(name);
		struct devmap *dm = xmalloc(sizeof(*dm));
		if (strncmp(n, "/dev/./", 7) == 0)
			strcpy(n + 4, name + 6);
		if (dm) {
			dm->major = major(stb->st_rdev);
			dm->minor = minor(stb->st_rdev);
			dm->name = n;
			dm->next = devlist;
			devlist = dm;
		}
	}
	return 0;
}
#ifndef HAVE_NFTW
#ifdef HAVE_FTW
int add_dev_1(const char *name, const struct stat *stb, int flag)
{
	return add_dev(name, stb, flag, NULL);
}
int nftw(const char *path,
	 int (*han)(const char *name, const struct stat *stb,
		    int flag, struct FTW *s), int nopenfd, int flags)
{
	return ftw(path, add_dev_1, nopenfd);
}
#else
int nftw(const char *path,
	 int (*han)(const char *name, const struct stat *stb,
		    int flag, struct FTW *s), int nopenfd, int flags)
{
	return 0;
}
#endif /* HAVE_FTW */
#endif /* HAVE_NFTW */
/*
 * Find a block device with the right major/minor number.
 * If we find multiple names, choose the shortest.
 * If we find a name in /dev/md/, we prefer that.
 * This applies only to names for MD devices.
 * If 'prefer' is set (normally to e.g. /by-path/)
 * then we prefer a name which contains that string.
 */
char *map_dev_preferred(int major, int minor, int create,
			char *prefer)
{
	struct devmap *p;
	char *regular = NULL, *preferred=NULL;
	int did_check = 0;
	if (major == 0 && minor == 0)
		return NULL;
retry:
	if (!devlist_ready) {
		char *dev = "/dev";
		struct stat stb;
		while(devlist) {
			struct devmap *d = devlist;
			devlist = d->next;
			free(d->name);
			free(d);
		}
		if (lstat(dev, &stb) == 0 && S_ISLNK(stb.st_mode))
			dev = "/dev/.";
		nftw(dev, add_dev, 10, FTW_PHYS);
		devlist_ready=1;
		did_check = 1;
	}
	for (p = devlist; p; p = p->next)
		if (p->major == major && p->minor == minor) {
			if (strncmp(p->name, "/dev/md/",8) == 0 ||
			    (prefer && strstr(p->name, prefer))) {
				if (preferred == NULL ||
				    strlen(p->name) < strlen(preferred))
					preferred = p->name;
			} else {
				if (regular == NULL ||
				    strlen(p->name) < strlen(regular))
					regular = p->name;
			}
		}
	if (!regular && !preferred && !did_check) {
		devlist_ready = 0;
		goto retry;
	}
	if (create && !regular && !preferred) {
		static char buf[30];
		snprintf(buf, sizeof(buf), "%d:%d", major, minor);
		regular = buf;
	}
	return preferred ? preferred : regular;
}
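The ranking in map_dev_preferred() can be isolated from the /dev walk. This sketch (the function name is hypothetical) applies the same two-tier rule to an array of candidate names: names under /dev/md/ or containing `prefer` beat all others, and within a tier the shortest name wins:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two-tier choice: "preferred" names always win over "regular" ones;
 * inside each tier the shortest name is kept. */
static const char *pick_dev_name(const char *const *names, size_t n,
				 const char *prefer)
{
	const char *regular = NULL, *preferred = NULL;

	assert(names != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const char *p = names[i];
		int is_pref = strncmp(p, "/dev/md/", 8) == 0 ||
			      (prefer && strstr(p, prefer));

		if (is_pref) {
			if (!preferred || strlen(p) < strlen(preferred))
				preferred = p;
		} else if (!regular || strlen(p) < strlen(regular)) {
			regular = p;
		}
	}
	return preferred ? preferred : regular;
}
```

Without a `prefer` string, `/dev/sda` beats a longer by-path link; passing `"/by-path/"` reverses that, mirroring how callers steer the choice.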
/* conf_word gets one word from the conf file.
* if "allow_key", then accept words at the start of a line,
* otherwise stop when such a word is found.
* We assume that the file pointer is at the end of a word, so the
* next character is a space, or a newline. If not, it is the start of a line.
*/
char *conf_word(FILE *file, int allow_key)
{
int wsize = 100;
int len = 0;
int c;
int quote;
int wordfound = 0;
char *word = xmalloc(wsize);
while (wordfound == 0) {
/* at the end of a word.. */
c = getc(file);
if (c == '#')
while (c != EOF && c != '\n')
c = getc(file);
if (c == EOF)
break;
if (c == '\n')
continue;
if (c != ' ' && c != '\t' && ! allow_key) {
ungetc(c, file);
break;
}
/* looks like it is safe to get a word here, if there is one */
quote = 0;
/* first, skip any spaces */
while (c == ' ' || c == '\t')
c = getc(file);
if (c != EOF && c != '\n' && c != '#') {
/* we really have a character of a word, so start saving it */
while (c != EOF && c != '\n' &&
(quote || (c != ' ' && c != '\t'))) {
wordfound = 1;
if (quote && c == quote)
quote = 0;
else if (quote == 0 && (c == '\'' || c == '"'))
quote = c;
else {
if (len == wsize-1) {
wsize += 100;
word = xrealloc(word, wsize);
}
word[len++] = c;
}
c = getc(file);
/* Hack for broken kernels (2.6.14-2.6.24) that put
* "active(auto-read-only)"
* in /proc/mdstat instead of
* "active (auto-read-only)"
*/
if (c == '(' && len >= 6 &&
strncmp(word + len - 6, "active", 6) == 0)
c = ' ';
}
}
if (c != EOF)
ungetc(c, file);
}
word[len] = 0;
/* Further HACK for broken kernels.. 2.6.14-2.6.24 */
if (strcmp(word, "auto-read-only)") == 0)
strcpy(word, "(auto-read-only)");
/* printf("word is <%s>\n", word); */
if (!wordfound) {
free(word);
word = NULL;
}
return word;
}
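conf_word() reads from a FILE * and also tracks line boundaries and keyword positions. The quote handling at its core can be shown on a single in-memory line. The sketch below is hypothetical: next_word() is not an mdadm function, and it handles one line of input only. It skips blanks, stops at a '#' that begins a word, and drops matching ' or " quotes while keeping the characters between them, so a quoted word may contain spaces.

```c
#include <stddef.h>
#include <string.h>

/* Copy the next word from *p into out (at most outlen-1 bytes), removing
 * ' and " quotes as conf_word() does. Advances *p past the word.
 * Returns 1 if a word was found, 0 at end of line or at a '#' comment. */
int next_word(const char **p, char *out, size_t outlen)
{
	const char *s = *p;
	size_t len = 0;
	int quote = 0;

	while (*s == ' ' || *s == '\t')
		s++;
	if (*s == '\0' || *s == '\n' || *s == '#') {
		*p = s;
		return 0;
	}
	/* Inside quotes, blanks are part of the word. */
	while (*s && *s != '\n' && (quote || (*s != ' ' && *s != '\t'))) {
		if (quote && *s == quote)
			quote = 0;		/* closing quote: dropped */
		else if (!quote && (*s == '\'' || *s == '"'))
			quote = *s;		/* opening quote: dropped */
		else if (len + 1 < outlen)
			out[len++] = *s;	/* ordinary character: kept */
		s++;
	}
	if (outlen)
		out[len] = '\0';
	*p = s;
	return 1;
}
```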
void print_quoted(char *str)
{
/* Print the string with surrounding quotes
* iff needed.
* If no space, tab, or quote - leave unchanged.
* Else print surrounded by " or ', swapping quotes
* when we find one that will cause confusion.
*/
char first_quote = 0, q;
char *c;
for (c = str; *c; c++) {
switch(*c) {
case '\'':
case '"':
first_quote = *c;
break;
case ' ':
case '\t':
first_quote = *c;
continue;
default:
continue;
}
break;
}
if (!first_quote) {
printf("%s", str);
return;
}
if (first_quote == '"')
q = '\'';
else
q = '"';
putchar(q);
for (c = str; *c; c++) {
if (*c == q) {
putchar(q);
q ^= '"' ^ '\'';
putchar(q);
}
putchar(*c);
}
putchar(q);
}
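Because print_quoted() writes straight to stdout, its quote-switching rule is easy to miss. The following sketch applies the same rule to a caller-supplied buffer. quote_into() and put() are hypothetical names. If the first quote character in the string is a double quote, the string is opened with a single quote, and otherwise with a double quote. Whenever the current quote character turns up inside the string, the quoted segment is closed and reopened with the other quote, relying on shell-style concatenation of adjacent quoted segments.

```c
#include <stddef.h>
#include <string.h>

/* Append one character, keeping room for the terminating NUL. */
static int put(char *out, size_t outlen, size_t *n, char ch)
{
	if (*n + 1 >= outlen)
		return -1;
	out[(*n)++] = ch;
	return 0;
}

/* Buffer-based variant of print_quoted(): quote only if 'str' contains a
 * space, tab or quote. Returns 0 on success, -1 if 'out' is too small. */
int quote_into(const char *str, char *out, size_t outlen)
{
	char first = 0, q;
	const char *c;
	size_t n = 0;

	/* Stop at the first quote; otherwise note that a blank was seen. */
	for (c = str; *c; c++) {
		if (*c == '\'' || *c == '"') {
			first = *c;
			break;
		}
		if (*c == ' ' || *c == '\t')
			first = *c;
	}
	if (!first) {
		if (strlen(str) + 1 > outlen)
			return -1;
		memcpy(out, str, strlen(str) + 1);
		return 0;
	}
	q = (first == '"') ? '\'' : '"';
	if (put(out, outlen, &n, q))
		return -1;
	for (c = str; *c; c++) {
		if (*c == q) {
			/* Close this segment and reopen with the other quote. */
			if (put(out, outlen, &n, q))
				return -1;
			q = (q == '"') ? '\'' : '"';
			if (put(out, outlen, &n, q))
				return -1;
		}
		if (put(out, outlen, &n, *c))
			return -1;
	}
	if (put(out, outlen, &n, q))
		return -1;
	out[n] = '\0';
	return 0;
}
```

For example, the input a"b'c is emitted as 'a"b'"'c", which a POSIX shell reads back as the original five characters.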
void print_escape(char *str)
{
/* print str, but change space and tab to '_', and '/' to '-',
* as is suitable for device names
*/
for (; *str; str++) {
switch (*str) {
case ' ':
case '\t':
putchar('_');
break;
case '/':
putchar('-');
break;
default:
putchar(*str);
}
}
}
int check_env(char *name)
{
char *val = getenv(name);
if (val && atoi(val) == 1)
return 1;
return 0;
}
int use_udev(void)
{
static int use = -1;
struct stat stb;
if (use < 0) {
use = ((stat("/dev/.udev", &stb) == 0 ||
stat("/run/udev", &stb) == 0) &&
check_env("MDADM_NO_UDEV") == 0);
}
return use;
}
unsigned long GCD(unsigned long a, unsigned long b)
{
while (a != b) {
if (a < b)
b -= a;
if (b < a)
a -= b;
}
return a;
}
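GCD() computes the greatest common divisor by repeated subtraction. For positive arguments this gives the same result as the remainder form of Euclid's algorithm, which needs far fewer iterations when the two values differ greatly in size. The sketch below puts both forms side by side. The names gcd_sub() and gcd_mod() are hypothetical. The subtraction loop never terminates if exactly one argument is 0, so callers of that form must pass non-zero values.

```c
/* Subtraction form, as in GCD(): correct for a, b > 0, but loops forever
 * if exactly one argument is 0. */
unsigned long gcd_sub(unsigned long a, unsigned long b)
{
	while (a != b) {
		if (a < b)
			b -= a;
		if (b < a)
			a -= b;
	}
	return a;
}

/* Remainder form of Euclid's algorithm: same result for positive inputs,
 * O(log min(a, b)) iterations, and gcd_mod(x, 0) == x. */
unsigned long gcd_mod(unsigned long a, unsigned long b)
{
	while (b != 0) {
		unsigned long r = a % b;

		a = b;
		b = r;
	}
	return a;
}
```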
/*
* conf_line reads one logical line from the conffile or mdstat.
* It skips comments and continues until it finds a line that starts
* with a non-blank, non-comment character. That character is pushed back
* for the next call.
* A doubly linked list of words is returned.
* The first word will be a keyword. Other words will have had quotes removed.
*/
char *conf_line(FILE *file)
{
char *w;
char *list;
w = conf_word(file, 1);
if (w == NULL)
return NULL;
list = dl_strdup(w);
free(w);
dl_init(list);
while ((w = conf_word(file, 0))){
char *w2 = dl_strdup(w);
free(w);
dl_add(list, w2);
}
/* printf("got a line\n");*/
return list;
}
void free_line(char *line)
{
char *w;
for (w = dl_next(line); w != line; w = dl_next(line)) {
dl_del(w);
dl_free(w);
}
dl_free(line);
}
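conf_line() returns its words through mdadm's dl_* helpers, whose header is not part of this excerpt. Judging only by how they are used here, the list is circular: dl_strdup() returns a plain char * that dl_next() can still walk, and free_line() loops until it arrives back at the first word. The sketch below shows one way to build such a list, with the prev/next links stored just before the string bytes. The wl_* names are hypothetical, and this is not the dlink.h implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Hidden header placed in front of each string. */
struct wl_head {
	void *prev;
	void *next;
};

static struct wl_head *wl_hdr(void *p)
{
	return (struct wl_head *)p - 1;
}

/* Allocate header + string; the caller only ever sees the string. */
char *wl_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	struct wl_head *h = malloc(sizeof(*h) + len);

	if (!h)
		return NULL;
	memcpy(h + 1, s, len);
	return (char *)(h + 1);
}

/* A lone element is a circular list of one. */
void wl_init(void *p)
{
	wl_hdr(p)->prev = p;
	wl_hdr(p)->next = p;
}

/* Insert 'item' just before 'head', i.e. at the tail of the ring. */
void wl_add(void *head, void *item)
{
	void *tail = wl_hdr(head)->prev;

	wl_hdr(item)->prev = tail;
	wl_hdr(item)->next = head;
	wl_hdr(tail)->next = item;
	wl_hdr(head)->prev = item;
}

void *wl_next(void *p)
{
	return wl_hdr(p)->next;
}

/* Unlink 'p' and leave it as a list of one. */
void wl_del(void *p)
{
	void *prev = wl_hdr(p)->prev, *next = wl_hdr(p)->next;

	wl_hdr(prev)->next = next;
	wl_hdr(next)->prev = prev;
	wl_init(p);
}

void wl_free(void *p)
{
	free(wl_hdr(p));
}

/* Same shape as free_line(): keep removing the head's successor. */
void wl_free_line(char *line)
{
	void *w;

	for (w = wl_next(line); w != line; w = wl_next(line)) {
		wl_del(w);
		wl_free(w);
	}
	wl_free(line);
}
```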
/**
* parse_num() - Parse int from string.
* @dest: Pointer to destination.
* @num: Pointer to string that is going to be parsed.
*
* If string contains anything after a number, error code is returned.
* The same happens when number is bigger than INT_MAX or smaller than 0.
* Writes to destination only if successfully read the number.
*
* Return: 0 on success, 1 otherwise.
*/
int parse_num(int *dest, char *num)
{
char *c = NULL;
long temp;
if (!num)
return 1;
errno = 0;
temp = strtol(num, &c, 10);
if (temp < 0 || temp > INT_MAX || *c || errno != 0 || num == c)
return 1;
*dest = temp;
return 0;
}

96
makedist Executable file

@ -0,0 +1,96 @@
#!/bin/sh
# avoid silly sorting
export LANG=C
arg=$1
target=~/public_html/source/mdadm
if [ " $arg" = " test" ]
then
target=/tmp/mdadm-test
rm -rf $target
mkdir -p $target
fi
if [ -d $target ]
then :
else echo $target is not a directory
exit 2
fi
set `grep '^#define VERSION' ReadMe.c `
version=`echo $3 | sed -e 's/"//g'`
grep "^.TH MDADM 8 .. v$version" mdadm.8.in > /dev/null 2>&1 ||
{
echo mdadm.8.in does not mention version $version.
exit 1
}
grep "^.TH MDMON 8 .. v$version" mdmon.8 > /dev/null 2>&1 ||
{
echo mdmon.8 does not mention version $version.
exit 1
}
rpmv=`echo $version | tr - _`
grep "^Version: *$rpmv$" mdadm.spec > /dev/null 2>&1 ||
{
echo mdadm.spec does not mention version $version.
exit 1
}
if [ -f ANNOUNCE-$version ]
then :
else
echo ANNOUNCE-$version does not exist
exit 1
fi
if grep "^ANNOUNCE-$version\$" inventory
then :
else { cat inventory ; echo ANNOUNCE-$version ; } | sort -o inventory
fi
echo version = $version
base=mdadm-$rpmv.tar.gz
if [ " $arg" != " diff" ]
then
if [ -f $target/$base ]
then
echo $target/$base exists.
exit 1
fi
trap "rm $target/$base; exit" 1 2 3
git archive --prefix=mdadm-$rpmv/ HEAD | gzip --best > $target/$base
chmod a+r $target/$base
ls -l $target/$base
if tar tzf $target/$base | sed 's,[^/]*/,,' | sort | diff -u inventory -
then : correct files found
else echo "Extra files, or inventory is out-of-date"
rm $target/$base
exit 1
fi
rpmbuild -ta $target/$base || exit 1
find ~/rpmbuild/RPMS -name "*mdadm-$version-*" \
-exec cp {} $target/RPM \;
cp ANNOUNCE-$version $target/ANNOUNCE
cp ChangeLog $target/ChangeLog
if [ " $arg" != " test" ]
then
echo -n "Confirm signing this release? "
read a
if [ " $a" != " y" ]; then echo OK - bye. ; exit 1; fi
if zcat $target/$base | gpg -ba > $target/$base.sign && gpg -ba $target/ANNOUNCE
then
kup put $target/$base $target/$base.sign \
/pub/linux/utils/raid/mdadm/mdadm-$version.tar.gz
kup put $target/ANNOUNCE $target/ANNOUNCE.asc /pub/linux/utils/raid/mdadm/ANNOUNCE
else
echo signing failed
exit 1
fi
fi
else
if [ ! -f $target/$base ]
then
echo $target/$base does not exist.
exit 1
fi
( cd .. ; ln -s mdadm.v2 mdadm-$version ; tar chf - --exclude=.git --exclude="TAGS" --exclude='*,v' --exclude='*~' --exclude='*.o' --exclude mdadm --exclude=mdadm'.[^ch0-9]' --exclude=RCS mdadm-$version ; rm mdadm-$version ) | gzip --best > /var/tmp/mdadm-new.tgz
mkdir /var/tmp/mdadm-old ; zcat $target/$base | ( cd /var/tmp/mdadm-old ; tar xf - )
mkdir /var/tmp/mdadm-new ; zcat /var/tmp/mdadm-new.tgz | ( cd /var/tmp/mdadm-new ; tar xf - )
diff -ru /var/tmp/mdadm-old /var/tmp/mdadm-new
rm -rf /var/tmp/mdadm-old /var/tmp/mdadm-new /var/tmp/mdadm-new.tgz
fi

943
managemon.c Normal file

@ -0,0 +1,943 @@
/*
* mdmon - monitor external metadata arrays
*
* Copyright (C) 2007-2009 Neil Brown <neilb@suse.de>
* Copyright (C) 2007-2009 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
*/
/*
* The management thread for monitoring active md arrays.
* This thread does things which might block such as memory
* allocation.
* In particular:
*
* - Find out about new arrays in this container.
* Allocate the data structures and open the files.
*
* For this we watch /proc/mdstat and find new arrays with
* metadata type that confirms sharing. e.g. "md4"
* When we find a new array we slip it into the list of
* arrays and wake 'monitor' with SIGUSR1 (see wakeup_monitor()).
*
* - Respond to reshape requests by allocating new data structures
* and opening new files.
*
* These come as a change to raid_disks. We allocate a new
* version of the data structures and slip it into the list.
* 'monitor' will notice and release the old version.
* Changes to level, chunksize, layout.. do not need re-allocation.
* Reductions in raid_disks don't really either, but we handle
* them the same way for consistency.
*
* - When a device is added to the container, we add it to the metadata
* as a spare.
*
* - Deal with degraded array
* We only do this when first noticing the array is degraded.
* This can be when we first see the array, when sync completes or
* when recovery completes.
*
* Check if number of failed devices suggests recovery is needed, and
* skip if not.
* Ask metadata to allocate a spare device
* Add device as not in_sync and give a role
* Update metadata.
* Open sysfs files and pass to monitor.
* Make sure that monitor starts recovery.
*
* - Pass on metadata updates from external programs such as
* mdadm creating a new array.
*
* This is the messiest part.
* It might involve adding a new array or changing the status of
* a spare, or any reconfig that the kernel doesn't get involved in.
*
* The required updates are received via a socket. There will
* be one socket for each container. Each message contains a
* sync marker (0x5a5aa5a5), a byte count, and the message. This is
* passed to the metadata handler, which will interpret and process it.
* For DDF, messages are internal data blocks with a leading
* 'magic number' signifying what sort of data each one holds.
*
*/
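The comment above describes each message as a sync marker (0x5a5aa5a5), a byte count, and the payload. handle_message() treats counts of 0 and -1 as payload-less pings, so the count must be signed. The sketch below frames and validates such a message in memory. frame_message() and parse_frame() are hypothetical, and the 4-byte field widths and host byte order are assumptions. The real send_message()/receive_message() code is not part of this excerpt and may differ, for example by adding trailing fields.

```c
#include <stdint.h>
#include <string.h>

#define SYNC_MARKER 0x5a5aa5a5u	/* value taken from the comment above */

/* Build marker + count + payload into 'out'. A count <= 0 carries no
 * payload. Returns the frame length, or 0 if 'out' is too small. */
size_t frame_message(const void *payload, int32_t len,
		     unsigned char *out, size_t outlen)
{
	uint32_t marker = SYNC_MARKER;
	size_t body = len > 0 ? (size_t)len : 0;

	if (outlen < 8 || outlen - 8 < body)
		return 0;
	memcpy(out, &marker, 4);
	memcpy(out + 4, &len, 4);
	if (body)
		memcpy(out + 8, payload, body);
	return 8 + body;
}

/* Check marker and count. On success set *len, point *payload at the
 * payload (NULL for a payload-less message) and return 0; else -1. */
int parse_frame(const unsigned char *buf, size_t buflen,
		const unsigned char **payload, int32_t *len)
{
	uint32_t marker;

	if (buflen < 8)
		return -1;
	memcpy(&marker, buf, 4);
	if (marker != SYNC_MARKER)
		return -1;
	memcpy(len, buf + 4, 4);
	if (*len > 0 && buflen - 8 < (size_t)*len)
		return -1;
	*payload = *len > 0 ? buf + 8 : NULL;
	return 0;
}
```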
/*
* We select on /proc/mdstat and the socket.
* We create new arrays or updated versions of arrays and slip
* them into the head of the list, then wake 'monitor' with SIGUSR1.
* 'monitor' will notice and place the old array on a return list.
* Metadata updates are placed on a queue in the order they arrive
* from the socket.
*
* When new arrays are found based on correct metadata string, we
* need to identify them with an entry in the metadata. Maybe we require
* the metadata to be mdX/NN where NN is the index into an appropriate table.
*
*/
/*
* List of tasks:
* - Watch for spares to be added to the container, and write updated
* metadata to them.
* - Watch for new arrays using this container, confirm they match metadata
* and if so, start monitoring them
* - Watch for spares being added to monitored arrays. This shouldn't
* happen, as we should do all the adding. Just remove them.
* - Watch for change in raid-disks, chunk-size, etc. Update metadata and
* start a reshape.
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include "mdadm.h"
#include "mdmon.h"
#include <sys/syscall.h>
#include <sys/socket.h>
#include <signal.h>
static void close_aa(struct active_array *aa)
{
struct mdinfo *d;
for (d = aa->info.devs; d; d = d->next) {
close(d->recovery_fd);
close(d->state_fd);
close(d->bb_fd);
close(d->ubb_fd);
}
if (aa->action_fd >= 0)
close(aa->action_fd);
if (aa->info.state_fd >= 0)
close(aa->info.state_fd);
if (aa->resync_start_fd >= 0)
close(aa->resync_start_fd);
if (aa->metadata_fd >= 0)
close(aa->metadata_fd);
if (aa->sync_completed_fd >= 0)
close(aa->sync_completed_fd);
if (aa->safe_mode_delay_fd >= 0)
close(aa->safe_mode_delay_fd);
}
static void free_aa(struct active_array *aa)
{
/* Note that this doesn't close fds if they are being used
* by a clone. ->container will be set for a clone
*/
dprintf("sys_name: %s\n", aa->info.sys_name);
if (!aa->container)
close_aa(aa);
while (aa->info.devs) {
struct mdinfo *d = aa->info.devs;
aa->info.devs = d->next;
free(d);
}
free(aa);
}
static struct active_array *duplicate_aa(struct active_array *aa)
{
struct active_array *newa = xmalloc(sizeof(*newa));
struct mdinfo **dp1, **dp2;
*newa = *aa;
newa->next = NULL;
newa->replaces = NULL;
newa->info.next = NULL;
dp2 = &newa->info.devs;
for (dp1 = &aa->info.devs; *dp1; dp1 = &(*dp1)->next) {
struct mdinfo *d;
if ((*dp1)->state_fd < 0)
continue;
d = xmalloc(sizeof(*d));
*d = **dp1;
*dp2 = d;
dp2 = & d->next;
}
*dp2 = NULL;
return newa;
}
static void wakeup_monitor(void)
{
/* tgkill(getpid(), mon_tid, SIGUSR1); */
int pid = getpid();
syscall(SYS_tgkill, pid, mon_tid, SIGUSR1);
}
static void remove_old(void)
{
if (discard_this) {
discard_this->next = NULL;
free_aa(discard_this);
if (pending_discard == discard_this)
pending_discard = NULL;
discard_this = NULL;
wakeup_monitor();
}
}
static void replace_array(struct supertype *container,
struct active_array *old,
struct active_array *new)
{
/* To replace an array, we add it to the top of the list
* marked with ->replaces to point to the original.
* 'monitor' will take the original out of the list
* and put it on 'discard_this'. We take it from there
* and discard it.
*/
remove_old();
while (pending_discard) {
while (discard_this == NULL)
sleep(1);
remove_old();
}
pending_discard = old;
new->replaces = old;
new->next = container->arrays;
container->arrays = new;
wakeup_monitor();
}
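replace_array() never edits a list entry in place. It pushes the replacement onto the head of container->arrays with ->replaces pointing at the old entry. The monitor thread then unlinks the old entry and parks it on discard_this, and the manager frees it from there. The single-threaded sketch below models only the list manipulation. struct node, publish() and monitor_pass() are hypothetical, and the thread hand-off, signalling and pending_discard bookkeeping are left out.

```c
#include <stddef.h>

struct node {
	int id;
	struct node *next;
	struct node *replaces;	/* set on a node that supersedes another */
};

/* Manager side: publish 'newer' at the head, pointing at what it replaces
 * ('old' may be NULL for a brand-new entry). */
void publish(struct node **list, struct node *old, struct node *newer)
{
	newer->replaces = old;
	newer->next = *list;
	*list = newer;
}

/* Monitor side: find a node with ->replaces set, unlink the node it
 * replaces and return it (the real code parks it on discard_this).
 * Returns NULL if nothing is waiting to be replaced. */
struct node *monitor_pass(struct node **list)
{
	struct node **pp, *n, *victim = NULL;

	for (n = *list; n; n = n->next)
		if (n->replaces) {
			victim = n->replaces;
			n->replaces = NULL;
			break;
		}
	if (!victim)
		return NULL;
	for (pp = list; *pp; pp = &(*pp)->next)
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			break;
		}
	return victim;
}
```

Readers of the list always see either the old entry, or both entries with the new one first. This is why the monitor can keep walking the list while the manager publishes a replacement.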
struct metadata_update *update_queue = NULL;
struct metadata_update *update_queue_handled = NULL;
struct metadata_update *update_queue_pending = NULL;
static void free_updates(struct metadata_update **update)
{
while (*update) {
struct metadata_update *this = *update;
void **space_list = this->space_list;
*update = this->next;
free(this->buf);
free(this->space);
while (space_list) {
void *space = space_list;
space_list = *space_list;
free(space);
}
free(this);
}
}
void check_update_queue(struct supertype *container)
{
free_updates(&update_queue_handled);
if (update_queue == NULL &&
update_queue_pending) {
update_queue = update_queue_pending;
update_queue_pending = NULL;
wakeup_monitor();
}
}
static void queue_metadata_update(struct metadata_update *mu)
{
struct metadata_update **qp;
qp = &update_queue_pending;
while (*qp)
qp = & ((*qp)->next);
*qp = mu;
}
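queue_metadata_update() and check_update_queue() pass updates from the manager to the monitor in batches. New updates are appended to update_queue_pending, and the whole pending list is moved to update_queue only when update_queue is empty. The sketch below models those two steps in a single thread. struct update, enqueue() and promote() are hypothetical, and the handled list, wakeups and any cross-thread ordering concerns are not modelled.

```c
#include <stddef.h>

struct update {
	int id;
	struct update *next;
};

/* Append at the tail by walking a pointer-to-pointer, as
 * queue_metadata_update() does, so an empty list needs no special case. */
void enqueue(struct update **head, struct update *u)
{
	while (*head)
		head = &(*head)->next;
	*head = u;
}

/* Hand the whole pending batch over only once the consumer has drained
 * the previous one. Returns 1 if a batch was handed over. */
int promote(struct update **active, struct update **pending)
{
	if (*active == NULL && *pending != NULL) {
		*active = *pending;
		*pending = NULL;
		return 1;
	}
	return 0;
}
```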
static void add_disk_to_container(struct supertype *st, struct mdinfo *sd)
{
int dfd;
char nm[20];
struct metadata_update *update = NULL;
mdu_disk_info_t dk = {
.number = -1,
.major = sd->disk.major,
.minor = sd->disk.minor,
.raid_disk = -1,
.state = 0,
};
dprintf("add %d:%d to container\n", sd->disk.major, sd->disk.minor);
sd->next = st->devs;
st->devs = sd;
sprintf(nm, "%d:%d", sd->disk.major, sd->disk.minor);
dfd = dev_open(nm, O_RDWR);
if (dfd < 0)
return;
st->update_tail = &update;
st->ss->add_to_super(st, &dk, dfd, NULL, INVALID_SECTORS);
st->ss->write_init_super(st);
queue_metadata_update(update);
st->update_tail = NULL;
}
/*
* Create and queue update structure about the removed disks.
* The update is prepared by super type handler and passed to the monitor
* thread.
*/
static void remove_disk_from_container(struct supertype *st, struct mdinfo *sd)
{
struct metadata_update *update = NULL;
mdu_disk_info_t dk = {
.number = -1,
.major = sd->disk.major,
.minor = sd->disk.minor,
.raid_disk = -1,
.state = 0,
};
dprintf("remove %d:%d from container\n",
sd->disk.major, sd->disk.minor);
st->update_tail = &update;
st->ss->remove_from_super(st, &dk);
/* FIXME this write_init_super shouldn't be here.
* We have it after add_to_super to write to new device,
* but with 'remove' we don't want to write to that device!
*/
st->ss->write_init_super(st);
queue_metadata_update(update);
st->update_tail = NULL;
}
static void manage_container(struct mdstat_ent *mdstat,
struct supertype *container)
{
/* Of interest here are:
* - if a new device has been added to the container, we
* add it to the array ignoring any metadata on it.
* - if a device has been removed from the container, we
* remove it from the device list and update the metadata.
* FIXME should we look for compatible metadata and take hints
* about spare assignment.... probably not.
*/
if (mdstat->devcnt != container->devcnt) {
struct mdinfo **cdp, *cd, *di, *mdi;
int found;
/* read /sys/block/NAME/md/dev-??/block/dev to find out
* what is there, and compare with container->devs
* to see what has been removed and what has been added.
* These need to be removed from, or added to, the array.
*/
mdi = sysfs_read(-1, mdstat->devnm, GET_DEVS);
if (!mdi) {
/* invalidate the current count so we can try again */
container->devcnt = -1;
return;
}
/* check for removals */
for (cdp = &container->devs; *cdp; ) {
found = 0;
for (di = mdi->devs; di; di = di->next)
if (di->disk.major == (*cdp)->disk.major &&
di->disk.minor == (*cdp)->disk.minor) {
found = 1;
break;
}
if (!found) {
cd = *cdp;
*cdp = (*cdp)->next;
remove_disk_from_container(container, cd);
free(cd);
} else
cdp = &(*cdp)->next;
}
/* check for additions */
for (di = mdi->devs; di; di = di->next) {
for (cd = container->devs; cd; cd = cd->next)
if (di->disk.major == cd->disk.major &&
di->disk.minor == cd->disk.minor)
break;
if (!cd) {
struct mdinfo *newd = xmalloc(sizeof(*newd));
*newd = *di;
add_disk_to_container(container, newd);
}
}
sysfs_free(mdi);
container->devcnt = mdstat->devcnt;
}
}
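manage_container() works out what changed by matching devices on their major:minor numbers. Anything in container->devs that sysfs no longer reports has been removed, and anything sysfs reports that is not yet in container->devs has been added. The sketch below performs that comparison over plain arrays. struct devid, contains() and diff_devs() are hypothetical, and unlike the real code they only count the differences rather than acting on them.

```c
#include <stddef.h>

struct devid {
	int dev_major;
	int dev_minor;
};

static int contains(const struct devid *v, size_t n, struct devid d)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (v[i].dev_major == d.dev_major &&
		    v[i].dev_minor == d.dev_minor)
			return 1;
	return 0;
}

/* Count entries of 'known' missing from 'now' (removed) and entries of
 * 'now' missing from 'known' (added). Quadratic, which is fine for the
 * handful of devices in a container. */
void diff_devs(const struct devid *known, size_t nknown,
	       const struct devid *now, size_t nnow,
	       size_t *removed, size_t *added)
{
	size_t i;

	*removed = 0;
	*added = 0;
	for (i = 0; i < nknown; i++)
		if (!contains(now, nnow, known[i]))
			(*removed)++;
	for (i = 0; i < nnow; i++)
		if (!contains(known, nknown, now[i]))
			(*added)++;
}
```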
static int sysfs_open2(char *devnum, char *name, char *attr)
{
int fd = sysfs_open(devnum, name, attr);
if (fd >= 0) {
/* seq_file in the kernel allocates buffer space
* on the first read. Do that now so 'monitor'
* never needs too.
*/
char buf[200];
if (read(fd, buf, sizeof(buf)) < 0)
/* pretend not to ignore return value */
return fd;
}
return fd;
}
static int disk_init_and_add(struct mdinfo *disk, struct mdinfo *clone,
struct active_array *aa)
{
if (!disk || !clone)
return -1;
*disk = *clone;
disk->recovery_fd = sysfs_open2(aa->info.sys_name, disk->sys_name,
"recovery_start");
if (disk->recovery_fd < 0)
return -1;
disk->state_fd = sysfs_open2(aa->info.sys_name, disk->sys_name, "state");
if (disk->state_fd < 0) {
close(disk->recovery_fd);
return -1;
}
disk->bb_fd = sysfs_open2(aa->info.sys_name, disk->sys_name,
"bad_blocks");
if (disk->bb_fd < 0) {
close(disk->recovery_fd);
close(disk->state_fd);
return -1;
}
disk->ubb_fd = sysfs_open2(aa->info.sys_name, disk->sys_name,
"unacknowledged_bad_blocks");
if (disk->ubb_fd < 0) {
close(disk->recovery_fd);
close(disk->state_fd);
close(disk->bb_fd);
return -1;
}
disk->prev_state = read_dev_state(disk->state_fd);
disk->curr_state = disk->prev_state;
disk->next = aa->info.devs;
aa->info.devs = disk;
return 0;
}
static void manage_member(struct mdstat_ent *mdstat,
struct active_array *a)
{
/* Compare mdstat info with known state of member array.
* We do not need to look for device state changes here, that
* is dealt with by the monitor.
*
* If a reshape is being requested, monitor will have noticed
* that sync_action changed and will have set check_reshape.
* We just need to see if new devices have appeared. All metadata
* updates will already have been processed.
*
* We also want to handle degraded arrays here by
* trying to find and assign a spare.
* We do that whenever the monitor tells us to.
*/
char buf[64];
int frozen;
struct supertype *container = a->container;
struct mdinfo *mdi;
if (container == NULL)
/* Raced with something */
return;
if (mdstat->active) {
// FIXME
a->info.array.raid_disks = mdstat->raid_disks;
// MORE
}
mdi = sysfs_read(-1, mdstat->devnm,
GET_COMPONENT|GET_CONSISTENCY_POLICY);
if (mdi) {
a->info.component_size = mdi->component_size;
a->info.consistency_policy = mdi->consistency_policy;
sysfs_free(mdi);
}
/* honor 'frozen' */
if (sysfs_get_str(&a->info, NULL, "metadata_version", buf, sizeof(buf)) > 0)
frozen = buf[9] == '-';
else
frozen = 1; /* can't read metadata_version assume the worst */
/* If sync_action is not 'idle' then don't try recovery now */
if (!frozen &&
sysfs_get_str(&a->info, NULL, "sync_action",
buf, sizeof(buf)) > 0 && strncmp(buf, "idle", 4) != 0)
frozen = 1;
if (mdstat->level) {
int level = map_name(pers, mdstat->level);
if (level == 0 || level == LEVEL_LINEAR) {
a->to_remove = 1;
wakeup_monitor();
return;
}
else if (a->info.array.level != level && level > 0) {
struct active_array *newa = duplicate_aa(a);
if (newa) {
newa->info.array.level = level;
replace_array(container, a, newa);
a = newa;
}
}
}
/* we are after monitor kick,
* so container field can be cleared - check it again
*/
if (a->container == NULL)
return;
if (sigterm && a->info.safe_mode_delay != 1 &&
a->safe_mode_delay_fd >= 0) {
long int new_delay = 1;
char delay[10];
ssize_t len;
len = snprintf(delay, sizeof(delay), "0.%03ld\n", new_delay);
if (write(a->safe_mode_delay_fd, delay, len) == len)
a->info.safe_mode_delay = new_delay;
}
/* We don't check the array while any update is pending, as it
* might contain a change (such as a spare assignment) which
* could affect our decisions.
*/
if (a->check_degraded && !frozen &&
update_queue == NULL && update_queue_pending == NULL) {
struct metadata_update *updates = NULL;
struct mdinfo *newdev = NULL;
struct active_array *newa;
struct mdinfo *d;
a->check_degraded = 0;
/* The array may not be degraded, this is just a good time
* to check.
*/
newdev = container->ss->activate_spare(a, &updates);
if (!newdev)
return;
newa = duplicate_aa(a);
if (!newa)
goto out;
/* prevent the kernel from activating the disk(s) before we
* finish adding them
*/
dprintf("freezing %s\n", a->info.sys_name);
sysfs_set_str(&a->info, NULL, "sync_action", "frozen");
/* Add device to array and set offset/size/slot.
* and open files for each newdev */
for (d = newdev; d ; d = d->next) {
struct mdinfo *newd;
newd = xmalloc(sizeof(*newd));
if (sysfs_add_disk(&newa->info, d, 0) < 0) {
free(newd);
continue;
}
if (disk_init_and_add(newd, d, newa) < 0)
free(newd);
}
queue_metadata_update(updates);
updates = NULL;
while (update_queue_pending || update_queue) {
check_update_queue(container);
usleep(15*1000);
}
replace_array(container, a, newa);
if (sysfs_set_str(&a->info, NULL,
"sync_action", "recover") == 0)
newa->prev_action = recover;
dprintf("recovery started on %s\n", a->info.sys_name);
out:
while (newdev) {
d = newdev->next;
free(newdev);
newdev = d;
}
free_updates(&updates);
}
if (a->check_reshape) {
/* mdadm might have added some devices to the array.
* We want to disk_init_and_add any such device to a
* duplicate_aa and replace a with that.
* mdstat doesn't have enough info so we sysfs_read
* and look for new stuff.
*/
struct mdinfo *info, *d, *d2, *newd;
unsigned long long array_size;
struct active_array *newa = NULL;
a->check_reshape = 0;
info = sysfs_read(-1, mdstat->devnm,
GET_DEVS|GET_OFFSET|GET_SIZE|GET_STATE);
if (!info)
goto out2;
for (d = info->devs; d; d = d->next) {
if (d->disk.raid_disk < 0)
continue;
for (d2 = a->info.devs; d2; d2 = d2->next)
if (d2->disk.raid_disk ==
d->disk.raid_disk)
break;
if (d2)
/* already have this one */
continue;
if (!newa) {
newa = duplicate_aa(a);
if (!newa)
break;
}
newd = xmalloc(sizeof(*newd));
if (disk_init_and_add(newd, d, newa) < 0)
free(newd);
}
if (sysfs_get_ll(info, NULL, "array_size", &array_size) == 0 &&
a->info.custom_array_size > array_size*2) {
sysfs_set_num(info, NULL, "array_size",
a->info.custom_array_size/2);
}
out2:
sysfs_free(info);
if (newa)
replace_array(container, a, newa);
}
}
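manage_member() attempts recovery only when two sysfs checks allow it. First, metadata_version must not have a '-' immediately after its 9-character "external:" prefix, which the code treats as the frozen marker. Second, sync_action, if it can be read, must report "idle". The sketch below expresses that decision as a pure function. recovery_allowed() is hypothetical, and a NULL argument stands for a failed sysfs read. Unlike the original, it checks the string length before looking at index 9.

```c
#include <string.h>

/* Return 1 if manage_member()'s checks would permit trying recovery. */
int recovery_allowed(const char *metadata_version, const char *sync_action)
{
	if (!metadata_version)		/* unreadable: assume the worst */
		return 0;
	if (strlen(metadata_version) > 9 && metadata_version[9] == '-')
		return 0;		/* frozen */
	if (sync_action && strncmp(sync_action, "idle", 4) != 0)
		return 0;		/* some sync activity is under way */
	return 1;
}
```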
static int aa_ready(struct active_array *aa)
{
struct mdinfo *d;
int level = aa->info.array.level;
for (d = aa->info.devs; d; d = d->next)
if (d->state_fd < 0)
return 0;
if (aa->info.state_fd < 0)
return 0;
if (level > 0 && (aa->action_fd < 0 || aa->resync_start_fd < 0))
return 0;
if (!aa->container)
return 0;
return 1;
}
static void manage_new(struct mdstat_ent *mdstat,
struct supertype *container,
struct active_array *victim)
{
/* A new array has appeared in this container.
* Hopefully it is already recorded in the metadata.
* Check, then create the new array to report it to
* the monitor.
*/
struct active_array *new = NULL;
struct mdinfo *mdi = NULL, *di;
int i, inst;
int failed = 0;
char buf[40];
/* check if array is ready to be monitored */
if (!mdstat->active || !mdstat->level)
return;
if (strncmp(mdstat->level, "raid0", strlen("raid0")) == 0 ||
strncmp(mdstat->level, "linear", strlen("linear")) == 0)
return;
mdi = sysfs_read(-1, mdstat->devnm,
GET_LEVEL|GET_CHUNK|GET_DISKS|GET_COMPONENT|
GET_SAFEMODE|GET_DEVS|GET_OFFSET|GET_SIZE|GET_STATE|
GET_LAYOUT|GET_DEVS_ALL);
if (!mdi)
return;
new = xcalloc(1, sizeof(*new));
strcpy(new->info.sys_name, mdstat->devnm);
new->prev_state = new->curr_state = new->next_state = inactive;
new->prev_action= new->curr_action= new->next_action= idle;
new->container = container;
if (parse_num(&inst, to_subarray(mdstat, container->devnm)) != 0)
goto error;
new->info.array = mdi->array;
new->info.component_size = mdi->component_size;
for (i = 0; i < new->info.array.raid_disks; i++) {
struct mdinfo *newd = xmalloc(sizeof(*newd));
for (di = mdi->devs; di; di = di->next)
if (i == di->disk.raid_disk)
break;
if (disk_init_and_add(newd, di, new) != 0) {
if (newd)
free(newd);
failed++;
if (failed > new->info.array.failed_disks) {
/* we cannot properly monitor without all working disks */
new->container = NULL;
break;
}
}
}
new->action_fd = sysfs_open2(new->info.sys_name, NULL, "sync_action");
new->info.state_fd = sysfs_open2(new->info.sys_name, NULL, "array_state");
new->resync_start_fd = sysfs_open2(new->info.sys_name, NULL, "resync_start");
new->metadata_fd = sysfs_open2(new->info.sys_name, NULL, "metadata_version");
new->sync_completed_fd = sysfs_open2(new->info.sys_name, NULL, "sync_completed");
new->safe_mode_delay_fd = sysfs_open2(new->info.sys_name, NULL,
"safe_mode_delay");
dprintf("inst: %d action: %d state: %d\n", inst,
new->action_fd, new->info.state_fd);
if (mdi->safe_mode_delay >= 50)
/* Normal start, mdadm set this. */
new->info.safe_mode_delay = mdi->safe_mode_delay;
else
/* Restart, just pick a number */
new->info.safe_mode_delay = 5000;
sysfs_set_safemode(&new->info, new->info.safe_mode_delay);
/* reshape_position is set by mdadm in sysfs
* read this information for new arrays only (empty victim)
*/
if ((victim == NULL) &&
(sysfs_get_str(mdi, NULL, "sync_action", buf, 40) > 0) &&
(strncmp(buf, "reshape", 7) == 0)) {
if (sysfs_get_ll(mdi, NULL, "reshape_position",
&new->last_checkpoint) != 0)
new->last_checkpoint = 0;
else {
int data_disks = mdi->array.raid_disks;
if (mdi->array.level == 4 || mdi->array.level == 5)
data_disks--;
if (mdi->array.level == 6)
data_disks -= 2;
new->last_checkpoint /= data_disks;
}
dprintf("mdmon: New monitored array is under reshape.\n"
" Last checkpoint is: %llu\n",
new->last_checkpoint);
}
sysfs_free(mdi);
mdi = NULL;
/* if everything checks out tell the metadata handler we want to
* manage this instance
*/
if (!aa_ready(new) || container->ss->open_new(container, new, inst) < 0) {
goto error;
} else {
replace_array(container, victim, new);
if (failed) {
new->check_degraded = 1;
manage_member(mdstat, new);
}
}
return;
error:
pr_err("failed to monitor %s\n", mdstat->metadata_version);
if (new) {
new->container = NULL;
free_aa(new);
}
if (mdi)
sysfs_free(mdi);
}
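When manage_new() picks up an array that is already reshaping, it divides reshape_position by the number of data disks. That number is raid_disks minus one for RAID4/5, minus two for RAID6, and unchanged otherwise. The sketch below restates the arithmetic. data_disks() and member_checkpoint() are hypothetical names, and the zero guard is an addition (the original divides unconditionally).

```c
/* Data-bearing members for the levels manage_new() distinguishes. */
int data_disks(int level, int raid_disks)
{
	int d = raid_disks;

	if (level == 4 || level == 5)
		d--;		/* one parity member */
	if (level == 6)
		d -= 2;		/* two parity members */
	return d;
}

/* reshape_position divided by the data-disk count, as stored in
 * last_checkpoint; returns 0 rather than dividing by zero. */
unsigned long long member_checkpoint(unsigned long long reshape_position,
				     int level, int raid_disks)
{
	int d = data_disks(level, raid_disks);

	return d > 0 ? reshape_position / (unsigned long long)d : 0;
}
```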
void manage(struct mdstat_ent *mdstat, struct supertype *container)
{
/* We have just read mdstat and need to compare it with
* the known active arrays.
* Arrays with the wrong metadata are ignored.
*/
for ( ; mdstat ; mdstat = mdstat->next) {
struct active_array *a;
if (strcmp(mdstat->devnm, container->devnm) == 0) {
manage_container(mdstat, container);
continue;
}
if (!is_container_member(mdstat, container->devnm))
/* Not for this array */
continue;
/* Looks like a member of this container */
for (a = container->arrays; a; a = a->next) {
if (strcmp(mdstat->devnm, a->info.sys_name) == 0) {
if (a->container && a->to_remove == 0)
manage_member(mdstat, a);
break;
}
}
if ((a == NULL || !a->container) && !sigterm)
manage_new(mdstat, container, a);
}
}
static void handle_message(struct supertype *container, struct metadata_update *msg)
{
/* queue this metadata update through to the monitor */
struct metadata_update *mu;
if (msg->len <= 0)
while (update_queue_pending || update_queue) {
check_update_queue(container);
usleep(15*1000);
}
if (msg->len == 0) { /* ping_monitor */
int cnt;
cnt = monitor_loop_cnt;
if (cnt & 1)
cnt += 2; /* wait until next pselect */
else
cnt += 3; /* wait for 2 pselects */
wakeup_monitor();
while (monitor_loop_cnt - cnt < 0)
usleep(10 * 1000);
} else if (msg->len == -1) { /* ping_manager */
struct mdstat_ent *mdstat = mdstat_read(1, 0);
manage(mdstat, container);
free_mdstat(mdstat);
} else if (!sigterm) {
mu = xmalloc(sizeof(*mu));
mu->len = msg->len;
mu->buf = msg->buf;
msg->buf = NULL;
mu->space = NULL;
mu->space_list = NULL;
mu->next = NULL;
if (container->ss->prepare_update)
if (!container->ss->prepare_update(container, mu))
free_updates(&mu);
queue_metadata_update(mu);
}
}
void read_sock(struct supertype *container)
{
int fd;
struct metadata_update msg;
int terminate = 0;
long fl;
int tmo = 3; /* 3 second timeout before hanging up the socket */
fd = accept(container->sock, NULL, NULL);
if (fd < 0)
return;
fl = fcntl(fd, F_GETFL, 0);
fl |= O_NONBLOCK;
fcntl(fd, F_SETFL, fl);
do {
msg.buf = NULL;
/* read and validate the message */
if (receive_message(fd, &msg, tmo) == 0) {
handle_message(container, &msg);
if (msg.len == 0) {
/* ping reply with version */
msg.buf = Version;
msg.len = strlen(Version) + 1;
if (send_message(fd, &msg, tmo) < 0)
terminate = 1;
} else if (ack(fd, tmo) < 0)
terminate = 1;
} else
terminate = 1;
} while (!terminate);
close(fd);
}
int exit_now = 0;
int manager_ready = 0;
void do_manager(struct supertype *container)
{
struct mdstat_ent *mdstat;
sigset_t set;
sigprocmask(SIG_UNBLOCK, NULL, &set);
sigdelset(&set, SIGUSR1);
sigdelset(&set, SIGTERM);
do {
if (exit_now)
exit(0);
/* Can only 'manage' things if 'monitor' is not making
* structural changes to metadata, so need to check
* update_queue
*/
if (update_queue == NULL) {
mdstat = mdstat_read(1, 0);
manage(mdstat, container);
read_sock(container);
free_mdstat(mdstat);
}
remove_old();
check_update_queue(container);
manager_ready = 1;
if (sigterm)
wakeup_monitor();
if (update_queue == NULL)
mdstat_wait_fd(container->sock, &set);
else
/* If an update is happening, just wait for signal */
pselect(0, NULL, NULL, NULL, NULL, &set);
} while(1);
}

511
mapfile.c Normal file

@ -0,0 +1,511 @@
/*
* mapfile - keep track of uuid <-> array mapping. Part of:
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2006-2010 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
* Paper: Neil Brown
* Novell Inc
* GPO Box Q1283
* QVB Post Office, NSW 1230
* Australia
*/
/* The mapfile is used to track arrays being created in --incremental
* mode. It particularly allows lookup from UUID to array device, but
* also allows the array device name to be easily found.
*
* The map file is line based with space separated fields. The fields are:
* Device id - mdX or mdpX where X is a number.
* metadata - 0.90 1.0 1.1 1.2 ddf ...
* UUID - uuid of the array
* path - path where device created: /dev/md/home
*
* The best place for the mapfile is /run/mdadm/map. Distros and users
* which have not switched to /run yet can choose a different location
* at compile time via MAP_DIR and MAP_FILE.
*/
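Each map-file line carries the four fields listed above. map_write() prints the UUID as four 8-digit hex words joined by colons, and map_read() parses lines back with sscanf(). The sketch below performs that round trip on one line. struct map_line, format_map_line() and parse_map_line() are hypothetical, and the field sizes are illustrative. Field widths are added to the first two %s conversions so they cannot overflow, and the UUID words are unsigned to match %x.

```c
#include <stdio.h>
#include <string.h>

struct map_line {
	char devnm[32];
	char metadata[30];
	unsigned int uuid[4];
	char path[201];
};

/* Format an entry the way map_write() prints it. 0 on success. */
int format_map_line(const struct map_line *m, char *out, size_t outlen)
{
	int n = snprintf(out, outlen, "%s %s %08x:%08x:%08x:%08x %s\n",
			 m->devnm, m->metadata, m->uuid[0], m->uuid[1],
			 m->uuid[2], m->uuid[3], m->path);

	return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Parse a line like map_read() does, requiring all seven fields. */
int parse_map_line(const char *buf, struct map_line *m)
{
	m->path[0] = '\0';
	return sscanf(buf, " %31s %29s %x:%x:%x:%x %200s", m->devnm,
		      m->metadata, &m->uuid[0], &m->uuid[1], &m->uuid[2],
		      &m->uuid[3], m->path) >= 7 ? 0 : -1;
}
```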
#include "mdadm.h"
#include <sys/file.h>
#include <ctype.h>
#define MAP_READ 0
#define MAP_NEW 1
#define MAP_LOCK 2
#define MAP_DIRNAME 3
char *mapname[4] = {
MAP_DIR "/" MAP_FILE,
MAP_DIR "/" MAP_FILE ".new",
MAP_DIR "/" MAP_FILE ".lock",
MAP_DIR
};
int mapmode[3] = { O_RDONLY, O_RDWR|O_CREAT, O_RDWR|O_CREAT|O_TRUNC };
char *mapsmode[3] = { "r", "w", "w"};
FILE *open_map(int modenum)
{
int fd;
if ((mapmode[modenum] & O_CREAT))
/* Attempt to create directory, don't worry about
* failure.
*/
(void)mkdir(mapname[MAP_DIRNAME], 0755);
fd = open(mapname[modenum], mapmode[modenum], 0600);
if (fd >= 0)
return fdopen(fd, mapsmode[modenum]);
return NULL;
}
int map_write(struct map_ent *mel)
{
FILE *f;
int err;
f = open_map(MAP_NEW);
if (!f)
return 0;
for (; mel; mel = mel->next) {
if (mel->bad)
continue;
fprintf(f, "%s ", mel->devnm);
fprintf(f, "%s ", mel->metadata);
fprintf(f, "%08x:%08x:%08x:%08x ", mel->uuid[0],
mel->uuid[1], mel->uuid[2], mel->uuid[3]);
fprintf(f, "%s\n", mel->path?:"");
}
fflush(f);
err = ferror(f);
fclose(f);
if (err) {
unlink(mapname[1]);
return 0;
}
return rename(mapname[1],
mapname[0]) == 0;
}
static FILE *lf = NULL;
int map_lock(struct map_ent **melp)
{
while (lf == NULL) {
struct stat buf;
lf = open_map(MAP_LOCK);
if (lf == NULL)
return -1;
if (flock(fileno(lf), LOCK_EX) != 0) {
fclose(lf);
lf = NULL;
return -1;
}
if (fstat(fileno(lf), &buf) != 0 ||
buf.st_nlink == 0) {
/* The owner of the lock unlinked it,
* so we have a lock on a stale file,
* try again
*/
fclose(lf);
lf = NULL;
}
}
if (*melp)
map_free(*melp);
map_read(melp);
return 0;
}
void map_unlock(struct map_ent **melp)
{
if (lf) {
/* must unlink before closing the file,
* as only the owner of the lock may
* unlink the file
*/
unlink(mapname[2]);
fclose(lf);
}
if (*melp)
map_free(*melp);
lf = NULL;
}
void map_fork(void)
{
/* We are forking, so must close the lock file.
* Don't risk flushing anything though.
*/
if (lf) {
close(fileno(lf));
fclose(lf);
lf = NULL;
}
}
void map_add(struct map_ent **melp,
char * devnm, char *metadata, int uuid[4], char *path)
{
struct map_ent *me = xmalloc(sizeof(*me));
strcpy(me->devnm, devnm);
strcpy(me->metadata, metadata);
memcpy(me->uuid, uuid, 16);
me->path = path ? xstrdup(path) : NULL;
me->next = *melp;
me->bad = 0;
*melp = me;
}
void map_read(struct map_ent **melp)
{
FILE *f;
char buf[8192];
char path[201];
int uuid[4];
char devnm[32];
char metadata[30];
*melp = NULL;
f = open_map(MAP_READ);
if (!f) {
RebuildMap();
f = open_map(MAP_READ);
}
if (!f)
return;
while (fgets(buf, sizeof(buf), f)) {
path[0] = 0;
if (sscanf(buf, " %31s %29s %x:%x:%x:%x %200s",
devnm, metadata, uuid, uuid+1,
uuid+2, uuid+3, path) >= 7) {
map_add(melp, devnm, metadata, uuid, path);
}
}
fclose(f);
}
void map_free(struct map_ent *map)
{
while (map) {
struct map_ent *mp = map;
map = mp->next;
free(mp->path);
free(mp);
}
}
int map_update(struct map_ent **mpp, char *devnm, char *metadata,
int uuid[4], char *path)
{
struct map_ent *map, *mp;
int rv;
if (mpp && *mpp)
map = *mpp;
else
map_read(&map);
for (mp = map ; mp ; mp=mp->next)
if (strcmp(mp->devnm, devnm) == 0) {
strcpy(mp->metadata, metadata);
memcpy(mp->uuid, uuid, 16);
free(mp->path);
mp->path = path ? xstrdup(path) : NULL;
mp->bad = 0;
break;
}
if (!mp)
map_add(&map, devnm, metadata, uuid, path);
if (mpp)
*mpp = NULL;
rv = map_write(map);
map_free(map);
return rv;
}
void map_delete(struct map_ent **mapp, char *devnm)
{
struct map_ent *mp;
if (*mapp == NULL)
map_read(mapp);
for (mp = *mapp; mp; mp = *mapp) {
if (strcmp(mp->devnm, devnm) == 0) {
*mapp = mp->next;
free(mp->path);
free(mp);
} else
mapp = & mp->next;
}
}
void map_remove(struct map_ent **mapp, char *devnm)
{
if (devnm[0] == 0)
return;
map_delete(mapp, devnm);
map_write(*mapp);
map_free(*mapp);
*mapp = NULL;
}
struct map_ent *map_by_uuid(struct map_ent **map, int uuid[4])
{
struct map_ent *mp;
if (!*map)
map_read(map);
for (mp = *map ; mp ; mp = mp->next) {
if (memcmp(uuid, mp->uuid, 16) != 0)
continue;
if (!mddev_busy(mp->devnm)) {
mp->bad = 1;
continue;
}
return mp;
}
return NULL;
}
struct map_ent *map_by_devnm(struct map_ent **map, char *devnm)
{
struct map_ent *mp;
if (!*map)
map_read(map);
for (mp = *map ; mp ; mp = mp->next) {
if (strcmp(mp->devnm, devnm) != 0)
continue;
if (!mddev_busy(mp->devnm)) {
mp->bad = 1;
continue;
}
return mp;
}
return NULL;
}
struct map_ent *map_by_name(struct map_ent **map, char *name)
{
struct map_ent *mp;
if (!*map)
map_read(map);
for (mp = *map ; mp ; mp = mp->next) {
if (!mp->path)
continue;
if (strncmp(mp->path, "/dev/md/", 8) != 0)
continue;
if (strcmp(mp->path+8, name) != 0)
continue;
if (!mddev_busy(mp->devnm)) {
mp->bad = 1;
continue;
}
return mp;
}
return NULL;
}
/* Returns the subarray name of a member array inside an externally
* managed container (metadata version "external:/<container>/<subarray>"),
* or NULL otherwise. super_by_fd() does this automatically; this routine
* is meant as a supplement for guess_super().
*/
static char *get_member_info(struct mdstat_ent *ent)
{
if (ent->metadata_version == NULL ||
strncmp(ent->metadata_version, "external:", 9) != 0)
return NULL;
if (is_subarray(&ent->metadata_version[9])) {
char *subarray;
subarray = strrchr(ent->metadata_version, '/');
return subarray + 1;
}
return NULL;
}
void RebuildMap(void)
{
struct mdstat_ent *mdstat = mdstat_read(0, 0);
struct mdstat_ent *md;
struct map_ent *map = NULL;
int require_homehost;
char sys_hostname[256];
char *homehost = conf_get_homehost(&require_homehost);
if (homehost == NULL || strcmp(homehost, "<system>")==0) {
if (gethostname(sys_hostname, sizeof(sys_hostname)) == 0) {
sys_hostname[sizeof(sys_hostname)-1] = 0;
homehost = sys_hostname;
}
}
for (md = mdstat ; md ; md = md->next) {
struct mdinfo *sra = sysfs_read(-1, md->devnm, GET_DEVS);
struct mdinfo *sd;
if (!sra)
continue;
for (sd = sra->devs ; sd ; sd = sd->next) {
char namebuf[100];
char dn[30];
int dfd;
int ok;
dev_t devid;
struct supertype *st;
char *subarray = NULL;
char *path;
struct mdinfo *info;
sprintf(dn, "%d:%d", sd->disk.major, sd->disk.minor);
dfd = dev_open(dn, O_RDONLY);
if (dfd < 0)
continue;
st = guess_super(dfd);
if ( st == NULL)
ok = -1;
else {
subarray = get_member_info(md);
ok = st->ss->load_super(st, dfd, NULL);
}
close(dfd);
if (ok != 0)
continue;
if (subarray)
info = st->ss->container_content(st, subarray);
else {
info = xmalloc(sizeof(*info));
st->ss->getinfo_super(st, info, NULL);
}
if (!info)
continue;
devid = devnm2devid(md->devnm);
path = map_dev(major(devid), minor(devid), 0);
if (path == NULL ||
strncmp(path, "/dev/md/", 8) != 0) {
/* We would really like a name that provides
* an MD_DEVNAME for udev.
* The name needs to be unique both in /dev/md/
* and in this mapfile.
* It needs to match what -I or -As would come
* up with.
* That means:
* Check if the array is in mdadm.conf
* - if so use that.
* Otherwise, determine trustworthiness from homehost etc.
* and find a unique name based on the metadata name.
*
*/
struct mddev_ident *match = conf_match(st, info,
NULL, 0,
NULL);
struct stat stb;
if (match && match->devname && match->devname[0] == '/') {
path = match->devname;
if (path[0] != '/') {
strcpy(namebuf, "/dev/md/");
strcat(namebuf, path);
path = namebuf;
}
} else {
int unum = 0;
char *sep = "_";
const char *name;
int conflict = 1;
if ((homehost == NULL ||
st->ss->match_home(st, homehost) != 1) &&
st->ss->match_home(st, "any") != 1 &&
(require_homehost ||
!conf_name_is_free(info->name)))
/* require a numeric suffix */
unum = 0;
else
/* allow name to be used as-is if no conflict */
unum = -1;
name = info->name;
if (!*name) {
name = st->ss->name;
if (!isdigit(name[strlen(name)-1]) &&
unum == -1) {
unum = 0;
sep = "";
}
}
if (strchr(name, ':')) {
/* Probably a uniquifying
* hostname prefix. Allow
* without a suffix, and strip
* hostname if it is us.
*/
if (homehost && unum == -1 &&
strncmp(name, homehost,
strlen(homehost)) == 0 &&
name[strlen(homehost)] == ':')
name += strlen(homehost)+1;
unum = -1;
}
while (conflict) {
if (unum >= 0)
sprintf(namebuf, "/dev/md/%s%s%d",
name, sep, unum);
else
sprintf(namebuf, "/dev/md/%s",
name);
unum++;
if (lstat(namebuf, &stb) != 0 &&
(map == NULL ||
!map_by_name(&map, namebuf+8)))
conflict = 0;
}
path = namebuf;
}
}
map_add(&map, md->devnm,
info->text_version,
info->uuid, path);
st->ss->free_super(st);
free(info);
break;
}
sysfs_free(sra);
}
/* Only trigger a change if we wrote a new map file */
if (map_write(map))
for (md = mdstat ; md ; md = md->next) {
struct mdinfo *sra = sysfs_read(-1, md->devnm,
GET_VERSION);
if (sra)
sysfs_uevent(sra, "change");
sysfs_free(sra);
}
map_free(map);
free_mdstat(mdstat);
}
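The record layout described at the top of mapfile.c (device name, metadata version, UUID as four colon-separated 32-bit hex words, creation path) can be exercised on its own. The sketch below mirrors the `fprintf()` calls in `map_write()` and the `sscanf()` pattern in `map_read()`. `struct map_line`, `format_map_line()` and `parse_map_line()` are hypothetical stand-ins, not part of mdadm. The sketch stores the UUID as `unsigned int` so that `%x` receives the type it expects, and it bounds the first two `%s` conversions to the buffer sizes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical flat stand-in for struct map_ent (no list link, no 'bad'). */
struct map_line {
	char devnm[32];
	char metadata[30];
	unsigned int uuid[4];
	char path[201];
};

/* Format one record as map_write() does:
 * "<devnm> <metadata> <u0>:<u1>:<u2>:<u3> <path>\n".
 * Returns the snprintf() result. */
int format_map_line(char *buf, size_t len, const struct map_line *m)
{
	return snprintf(buf, len, "%s %s %08x:%08x:%08x:%08x %s\n",
			m->devnm, m->metadata,
			m->uuid[0], m->uuid[1], m->uuid[2], m->uuid[3],
			m->path);
}

/* Parse one record with map_read()'s pattern, using field widths that
 * match the buffers above. Returns 1 only if all seven fields were
 * converted, which is what map_read() requires. */
int parse_map_line(const char *buf, struct map_line *m)
{
	m->path[0] = '\0';
	return sscanf(buf, " %31s %29s %x:%x:%x:%x %200s",
		      m->devnm, m->metadata,
		      &m->uuid[0], &m->uuid[1], &m->uuid[2], &m->uuid[3],
		      m->path) == 7;
}
```

As in `map_read()`, a line must contain all seven fields to be accepted. A record written with an empty path would therefore not be read back.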

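`map_lock()` and `map_unlock()` rely on a small lock-file protocol. A process takes an exclusive `flock()` on the lock file. Because the holder unlinks that file before closing it, a waiter that finally obtains the lock must use `fstat()` to check whether the inode it locked still has a directory entry (`st_nlink > 0`). If it does not, the waiter has locked a stale file and must reopen. The minimal standalone sketch below implements that protocol. It uses a raw file descriptor instead of the `FILE *` that mdadm keeps in `lf`, and `lockfile_acquire()` and `lockfile_release()` are hypothetical names.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Take an exclusive lock on lockpath, retrying until the locked file
 * is still linked into the filesystem. Returns the locked fd, or -1. */
int lockfile_acquire(const char *lockpath)
{
	for (;;) {
		struct stat st;
		/* Same open flags mapfile.c uses for MAP_LOCK. */
		int fd = open(lockpath, O_RDWR | O_CREAT | O_TRUNC, 0600);

		if (fd < 0)
			return -1;
		if (flock(fd, LOCK_EX) != 0) {
			close(fd);
			return -1;
		}
		if (fstat(fd, &st) == 0 && st.st_nlink > 0)
			return fd;	/* live file: the lock is ours */
		/* The previous owner unlinked this file after we opened it,
		 * so our lock protects nothing: drop it and try again. */
		close(fd);
	}
}

/* Release the lock: unlink first (only the owner may do this),
 * then close, which drops the flock. */
void lockfile_release(const char *lockpath, int fd)
{
	unlink(lockpath);
	close(fd);
}
```

The comment in `map_unlock()` notes that only the owner of the lock may unlink the file. That is why the release path unlinks before closing.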
185
maps.c Normal file

@ -0,0 +1,185 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2011 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
#include "mdadm.h"
/* name/number mappings */
mapping_t r5layout[] = {
{ "left-asymmetric", ALGORITHM_LEFT_ASYMMETRIC},
{ "right-asymmetric", ALGORITHM_RIGHT_ASYMMETRIC},
{ "left-symmetric", ALGORITHM_LEFT_SYMMETRIC},
{ "right-symmetric", ALGORITHM_RIGHT_SYMMETRIC},
{ "default", ALGORITHM_LEFT_SYMMETRIC},
{ "la", ALGORITHM_LEFT_ASYMMETRIC},
{ "ra", ALGORITHM_RIGHT_ASYMMETRIC},
{ "ls", ALGORITHM_LEFT_SYMMETRIC},
{ "rs", ALGORITHM_RIGHT_SYMMETRIC},
{ "parity-first", ALGORITHM_PARITY_0},
{ "parity-last", ALGORITHM_PARITY_N},
{ "ddf-zero-restart", ALGORITHM_RIGHT_ASYMMETRIC},
{ "ddf-N-restart", ALGORITHM_LEFT_ASYMMETRIC},
{ "ddf-N-continue", ALGORITHM_LEFT_SYMMETRIC},
{ NULL, UnSet }
};
mapping_t r6layout[] = {
{ "left-asymmetric", ALGORITHM_LEFT_ASYMMETRIC},
{ "right-asymmetric", ALGORITHM_RIGHT_ASYMMETRIC},
{ "left-symmetric", ALGORITHM_LEFT_SYMMETRIC},
{ "right-symmetric", ALGORITHM_RIGHT_SYMMETRIC},
{ "default", ALGORITHM_LEFT_SYMMETRIC},
{ "la", ALGORITHM_LEFT_ASYMMETRIC},
{ "ra", ALGORITHM_RIGHT_ASYMMETRIC},
{ "ls", ALGORITHM_LEFT_SYMMETRIC},
{ "rs", ALGORITHM_RIGHT_SYMMETRIC},
{ "parity-first", ALGORITHM_PARITY_0},
{ "parity-last", ALGORITHM_PARITY_N},
{ "ddf-zero-restart", ALGORITHM_ROTATING_ZERO_RESTART},
{ "ddf-N-restart", ALGORITHM_ROTATING_N_RESTART},
{ "ddf-N-continue", ALGORITHM_ROTATING_N_CONTINUE},
{ "left-asymmetric-6", ALGORITHM_LEFT_ASYMMETRIC_6},
{ "right-asymmetric-6", ALGORITHM_RIGHT_ASYMMETRIC_6},
{ "left-symmetric-6", ALGORITHM_LEFT_SYMMETRIC_6},
{ "right-symmetric-6", ALGORITHM_RIGHT_SYMMETRIC_6},
{ "parity-first-6", ALGORITHM_PARITY_0_6},
{ NULL, UnSet }
};
/* raid0 layout is only needed because of a bug in Linux 3.14, which changed
* the effective layout of raid0 arrays with varying device sizes.
*/
mapping_t r0layout[] = {
{ "original", RAID0_ORIG_LAYOUT},
{ "alternate", RAID0_ALT_MULTIZONE_LAYOUT},
{ "1", 1}, /* aka ORIG */
{ "2", 2}, /* aka ALT */
{ "dangerous", 0},
{ NULL, UnSet},
};
mapping_t pers[] = {
{ "linear", LEVEL_LINEAR},
{ "raid0", 0},
{ "0", 0},
{ "stripe", 0},
{ "raid1", 1},
{ "1", 1},
{ "mirror", 1},
{ "raid4", 4},
{ "4", 4},
{ "raid5", 5},
{ "5", 5},
{ "multipath", LEVEL_MULTIPATH},
{ "mp", LEVEL_MULTIPATH},
{ "raid6", 6},
{ "6", 6},
{ "raid10", 10},
{ "10", 10},
{ "faulty", LEVEL_FAULTY},
{ "container", LEVEL_CONTAINER},
{ NULL, UnSet }
};
mapping_t modes[] = {
{ "assemble", ASSEMBLE},
{ "build", BUILD},
{ "create", CREATE},
{ "manage", MANAGE},
{ "misc", MISC},
{ "monitor", MONITOR},
{ "grow", GROW},
{ "incremental", INCREMENTAL},
{ "auto-detect", AUTODETECT},
{ NULL, UnSet }
};
mapping_t faultylayout[] = {
{ "write-transient", WriteTransient },
{ "wt", WriteTransient },
{ "read-transient", ReadTransient },
{ "rt", ReadTransient },
{ "write-persistent", WritePersistent },
{ "wp", WritePersistent },
{ "read-persistent", ReadPersistent },
{ "rp", ReadPersistent },
{ "write-all", WriteAll },
{ "wa", WriteAll },
{ "read-fixable", ReadFixable },
{ "rf", ReadFixable },
{ "clear", ClearErrors},
{ "flush", ClearFaults},
{ "none", ClearErrors},
{ "default", ClearErrors},
{ NULL, UnSet }
};
mapping_t consistency_policies[] = {
{ "unknown", CONSISTENCY_POLICY_UNKNOWN},
{ "none", CONSISTENCY_POLICY_NONE},
{ "resync", CONSISTENCY_POLICY_RESYNC},
{ "bitmap", CONSISTENCY_POLICY_BITMAP},
{ "journal", CONSISTENCY_POLICY_JOURNAL},
{ "ppl", CONSISTENCY_POLICY_PPL},
{ NULL, CONSISTENCY_POLICY_UNKNOWN }
};
mapping_t sysfs_array_states[] = {
{ "active-idle", ARRAY_ACTIVE_IDLE },
{ "active", ARRAY_ACTIVE },
{ "clear", ARRAY_CLEAR },
{ "inactive", ARRAY_INACTIVE },
{ "suspended", ARRAY_SUSPENDED },
{ "readonly", ARRAY_READONLY },
{ "read-auto", ARRAY_READ_AUTO },
{ "clean", ARRAY_CLEAN },
{ "write-pending", ARRAY_WRITE_PENDING },
{ "broken", ARRAY_BROKEN },
{ NULL, ARRAY_UNKNOWN_STATE }
};
char *map_num(mapping_t *map, int num)
{
while (map->name) {
if (map->num == num)
return map->name;
map++;
}
return NULL;
}
int map_name(mapping_t *map, char *name)
{
while (map->name && strcmp(map->name, name) != 0)
map++;
return map->num;
}
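`map_num()` and `map_name()` perform a linear scan over a table that ends with a `{ NULL, ... }` sentinel. This has two consequences. First, `map_name()` returns the sentinel's number for an unknown name. That number is `UnSet` in most tables, but `CONSISTENCY_POLICY_UNKNOWN` or `ARRAY_UNKNOWN_STATE` in others. Second, `map_num()` returns the first name listed for a number, so canonical spellings must precede aliases. For example, `r5layout` lists "left-symmetric" before "default" and "ls". The self-contained sketch below reproduces the pattern. It uses a hypothetical `mapping_sketch_t` table and a placeholder sentinel value rather than mdadm's own `UnSet`.

```c
#include <stddef.h>
#include <string.h>

#define UNSET_SKETCH (-1)	/* placeholder for mdadm's UnSet sentinel */

typedef struct {
	const char *name;
	int num;
} mapping_sketch_t;

/* Canonical names first, aliases after, sentinel last. */
static const mapping_sketch_t level_sketch[] = {
	{ "raid0", 0 }, { "0", 0 }, { "stripe", 0 },
	{ "raid1", 1 }, { "1", 1 }, { "mirror", 1 },
	{ NULL, UNSET_SKETCH }
};

/* Number -> first matching name, or NULL (like map_num()). */
const char *sketch_num(const mapping_sketch_t *map, int num)
{
	for (; map->name; map++)
		if (map->num == num)
			return map->name;
	return NULL;
}

/* Name -> number, or the sentinel's number if absent (like map_name()). */
int sketch_name(const mapping_sketch_t *map, const char *name)
{
	while (map->name && strcmp(map->name, name) != 0)
		map++;
	return map->num;
}
```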

1317
md.4 Normal file

File diff suppressed because it is too large

136
md5.h Normal file

@ -0,0 +1,136 @@
/* Declaration of functions and data types used for MD5 sum computing
library functions.
Copyright (C) 1995-1997,1999-2005 Free Software Foundation, Inc.
NOTE: The canonical source of this file is maintained with the GNU C
Library. Bugs can be reported to bug-glibc@prep.ai.mit.edu.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */
#ifndef _MD5_H
#define _MD5_H 1
#include <stdio.h>
#if HAVE_INTTYPES_H
# include <inttypes.h>
#endif
#if HAVE_STDINT_H || _LIBC || defined __UCLIBC__
# include <stdint.h>
#endif
#ifndef __GNUC_PREREQ
# if defined __GNUC__ && defined __GNUC_MINOR__
# define __GNUC_PREREQ(maj, min) \
((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
# else
# define __GNUC_PREREQ(maj, min) 0
# endif
#endif
#ifndef __THROW
# if defined __cplusplus && __GNUC_PREREQ (2,8)
# define __THROW throw ()
# else
# define __THROW
# endif
#endif
#ifndef __attribute__
# if ! __GNUC_PREREQ (2,8) || __STRICT_ANSI__
# define __attribute__(x)
# endif
#endif
#ifndef _LIBC
# define __md5_buffer md5_buffer
# define __md5_finish_ctx md5_finish_ctx
# define __md5_init_ctx md5_init_ctx
# define __md5_process_block md5_process_block
# define __md5_process_bytes md5_process_bytes
# define __md5_read_ctx md5_read_ctx
# define __md5_stream md5_stream
#endif
typedef uint32_t md5_uint32;
/* Structure to save state of computation between the single steps. */
struct md5_ctx
{
md5_uint32 A;
md5_uint32 B;
md5_uint32 C;
md5_uint32 D;
md5_uint32 total[2];
md5_uint32 buflen;
char buffer[128] __attribute__ ((__aligned__ (__alignof__ (md5_uint32))));
};
/*
* The following three functions build up the low-level interface used in
* the functions `md5_stream' and `md5_buffer'.
*/
/* Initialize structure containing state of computation.
(RFC 1321, 3.3: Step 3) */
extern void __md5_init_ctx (struct md5_ctx *ctx) __THROW;
/* Starting with the result of former calls of this function (or the
initialization function), update the context for the next LEN bytes
starting at BUFFER.
It is necessary that LEN is a multiple of 64!!! */
extern void __md5_process_block (const void *buffer, size_t len,
struct md5_ctx *ctx) __THROW;
/* Starting with the result of former calls of this function (or the
initialization function), update the context for the next LEN bytes
starting at BUFFER.
It is NOT required that LEN is a multiple of 64. */
extern void __md5_process_bytes (const void *buffer, size_t len,
struct md5_ctx *ctx) __THROW;
/* Process the remaining bytes in the buffer and put result from CTX
in first 16 bytes following RESBUF. The result is always in little
endian byte order, so that a byte-wise output yields the wanted
ASCII representation of the message digest.
IMPORTANT: On some systems it is required that RESBUF be correctly
aligned for a 32-bit value. */
extern void *__md5_finish_ctx (struct md5_ctx *ctx, void *resbuf) __THROW;
/* Put result from CTX in first 16 bytes following RESBUF. The result is
always in little endian byte order, so that a byte-wise output yields
the wanted ASCII representation of the message digest.
IMPORTANT: On some systems it is required that RESBUF is correctly
aligned for a 32-bit value. */
extern void *__md5_read_ctx (const struct md5_ctx *ctx, void *resbuf) __THROW;
/* Compute MD5 message digest for bytes read from STREAM. The
resulting message digest number will be written into the 16 bytes
beginning at RESBLOCK. */
extern int __md5_stream (FILE *stream, void *resblock) __THROW;
/* Compute MD5 message digest for LEN bytes beginning at BUFFER. The
result is always in little endian byte order, so that a byte-wise
output yields the wanted ASCII representation of the message
digest. */
extern void *__md5_buffer (const char *buffer, size_t len,
void *resblock) __THROW;
#endif /* md5.h */

295
md_p.h Normal file

@ -0,0 +1,295 @@
/*
md_p.h : physical layout of Linux RAID devices
Copyright (C) 1996-98 Ingo Molnar, Gadi Oxman
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
You should have received a copy of the GNU General Public License
(for example /usr/src/linux/COPYING); if not, write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
#ifndef _MD_P_H
#define _MD_P_H
/*
* RAID superblock.
*
* The RAID superblock maintains some statistics on each RAID configuration.
* Each real device in the RAID set contains it near the end of the device.
* Some of the ideas are copied from the ext2fs implementation.
*
* We currently use 4096 bytes as follows:
*
* word offset function
*
* 0 - 31 Constant generic RAID device information.
* 32 - 63 Generic state information.
* 64 - 127 Personality specific information.
* 128 - 991 27 32-word descriptors of the disks in the raid set.
* 992 - 1023 Disk specific descriptor.
* (No words remain for a reserved area: MD_SB_RESERVED_WORDS is 0.)
*/
/*
* If x is the real device size in bytes, we return an apparent size of:
*
* y = (x & ~(MD_RESERVED_BYTES - 1)) - MD_RESERVED_BYTES
*
* and place the 4kB superblock at offset y.
*/
#define MD_RESERVED_BYTES (64 * 1024)
#define MD_RESERVED_SECTORS (MD_RESERVED_BYTES / 512)
#define MD_RESERVED_BLOCKS (MD_RESERVED_BYTES / BLOCK_SIZE)
#define MD_NEW_SIZE_SECTORS(x) ((x & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS)
#define MD_NEW_SIZE_BLOCKS(x) ((x & ~(MD_RESERVED_BLOCKS - 1)) - MD_RESERVED_BLOCKS)
#define MD_SB_BYTES 4096
#define MD_SB_WORDS (MD_SB_BYTES / 4)
#define MD_SB_BLOCKS (MD_SB_BYTES / BLOCK_SIZE)
#define MD_SB_SECTORS (MD_SB_BYTES / 512)
/*
* The following are counted in 32-bit words
*/
#define MD_SB_GENERIC_OFFSET 0
#define MD_SB_PERSONALITY_OFFSET 64
#define MD_SB_DISKS_OFFSET 128
#define MD_SB_DESCRIPTOR_OFFSET 992
#define MD_SB_GENERIC_CONSTANT_WORDS 32
#define MD_SB_GENERIC_STATE_WORDS 32
#define MD_SB_GENERIC_WORDS (MD_SB_GENERIC_CONSTANT_WORDS + MD_SB_GENERIC_STATE_WORDS)
#define MD_SB_PERSONALITY_WORDS 64
#define MD_SB_DESCRIPTOR_WORDS 32
#define MD_SB_DISKS 27
#define MD_SB_DISKS_WORDS (MD_SB_DISKS*MD_SB_DESCRIPTOR_WORDS)
#define MD_SB_RESERVED_WORDS (1024 - MD_SB_GENERIC_WORDS - MD_SB_PERSONALITY_WORDS - MD_SB_DISKS_WORDS - MD_SB_DESCRIPTOR_WORDS)
#define MD_SB_EQUAL_WORDS (MD_SB_GENERIC_WORDS + MD_SB_PERSONALITY_WORDS + MD_SB_DISKS_WORDS)
/*
* Device "operational" state bits
*/
#define MD_DISK_FAULTY 0 /* disk is faulty / operational */
#define MD_DISK_ACTIVE 1 /* disk is running but may not be in sync */
#define MD_DISK_SYNC 2 /* disk is in sync with the raid set */
#define MD_DISK_REMOVED 3 /* disk has been removed from the raid set */
#define MD_DISK_CLUSTER_ADD 4 /* Initiate a disk add across the cluster
* For clustered environments only.
*/
#define MD_DISK_CANDIDATE 5 /* disk is added as spare (local) until confirmed
* For clustered environments only.
*/
#define MD_DISK_WRITEMOSTLY 9 /* disk is "write-mostly" in RAID1 config.
* read requests will only be sent here in
* dire need
*/
#define MD_DISK_FAILFAST 10 /* Fewer retries, more failures */
#define MD_DISK_REPLACEMENT 17
#define MD_DISK_JOURNAL 18 /* disk is used as the write journal in RAID-5/6 */
#define MD_DISK_ROLE_SPARE 0xffff
#define MD_DISK_ROLE_FAULTY 0xfffe
#define MD_DISK_ROLE_JOURNAL 0xfffd
#define MD_DISK_ROLE_MAX 0xff00 /* max value of regular disk role */
typedef struct mdp_device_descriptor_s {
__u32 number; /* 0 Device number in the entire set */
__u32 major; /* 1 Device major number */
__u32 minor; /* 2 Device minor number */
__u32 raid_disk; /* 3 The role of the device in the raid set */
__u32 state; /* 4 Operational state */
__u32 reserved[MD_SB_DESCRIPTOR_WORDS - 5];
} mdp_disk_t;
#define MD_SB_MAGIC 0xa92b4efc
/*
* Superblock state bits
*/
#define MD_SB_CLEAN 0
#define MD_SB_ERRORS 1
#define MD_SB_BBM_ERRORS 2
#define MD_SB_BLOCK_CONTAINER_RESHAPE 3 /* block container wide reshapes */
#define MD_SB_BLOCK_VOLUME 4 /* block activation of array, other arrays
* in container can be activated */
#define MD_SB_CLUSTERED 5 /* MD is clustered */
#define MD_SB_BITMAP_PRESENT 8 /* bitmap may be present nearby */
typedef struct mdp_superblock_s {
/*
* Constant generic information
*/
__u32 md_magic; /* 0 MD identifier */
__u32 major_version; /* 1 major version to which the set conforms */
__u32 minor_version; /* 2 minor version ... */
__u32 patch_version; /* 3 patchlevel version ... */
__u32 gvalid_words; /* 4 Number of used words in this section */
__u32 set_uuid0; /* 5 Raid set identifier */
__u32 ctime; /* 6 Creation time */
__u32 level; /* 7 Raid personality */
__u32 size; /* 8 Apparent size of each individual disk */
__u32 nr_disks; /* 9 total disks in the raid set */
__u32 raid_disks; /* 10 disks in a fully functional raid set */
__u32 md_minor; /* 11 preferred MD minor device number */
__u32 not_persistent; /* 12 does it have a persistent superblock */
__u32 set_uuid1; /* 13 Raid set identifier #2 */
__u32 set_uuid2; /* 14 Raid set identifier #3 */
__u32 set_uuid3; /* 15 Raid set identifier #4 */
__u32 gstate_creserved[MD_SB_GENERIC_CONSTANT_WORDS - 16];
/*
* Generic state information
*/
__u32 utime; /* 0 Superblock update time */
__u32 state; /* 1 State bits (clean, ...) */
__u32 active_disks; /* 2 Number of currently active disks */
__u32 working_disks; /* 3 Number of working disks */
__u32 failed_disks; /* 4 Number of failed disks */
__u32 spare_disks; /* 5 Number of spare disks */
__u32 sb_csum; /* 6 checksum of the whole superblock */
#if __BYTE_ORDER == __BIG_ENDIAN
__u32 events_hi; /* 7 high-order of superblock update count */
__u32 events_lo; /* 8 low-order of superblock update count */
__u32 cp_events_hi; /* 9 high-order of checkpoint update count */
__u32 cp_events_lo; /* 10 low-order of checkpoint update count */
#else
__u32 events_lo; /* 7 low-order of superblock update count */
__u32 events_hi; /* 8 high-order of superblock update count */
__u32 cp_events_lo; /* 9 low-order of checkpoint update count */
__u32 cp_events_hi; /* 10 high-order of checkpoint update count */
#endif
__u32 recovery_cp; /* 11 recovery checkpoint sector count */
/* These are only valid for minor_version > 90 */
__u64 reshape_position; /* 12,13 next address in array-space for reshape */
__u32 new_level; /* 14 new level we are reshaping to */
__u32 delta_disks; /* 15 change in number of raid_disks */
__u32 new_layout; /* 16 new layout */
__u32 new_chunk; /* 17 new chunk size (bytes) */
__u32 gstate_sreserved[MD_SB_GENERIC_STATE_WORDS - 18];
/*
* Personality information
*/
__u32 layout; /* 0 the array's physical layout */
__u32 chunk_size; /* 1 chunk size in bytes */
__u32 root_pv; /* 2 LV root PV */
__u32 root_block; /* 3 LV root block */
__u32 pstate_reserved[MD_SB_PERSONALITY_WORDS - 4];
/*
* Disks information
*/
mdp_disk_t disks[MD_SB_DISKS];
/*
* Reserved
*/
__u32 reserved[MD_SB_RESERVED_WORDS];
/*
* Active descriptor
*/
mdp_disk_t this_disk;
} mdp_super_t;
#ifdef __TINYC__
typedef unsigned long long __u64;
#endif
static inline __u64 md_event(mdp_super_t *sb) {
__u64 ev = sb->events_hi;
return (ev<<32)| sb->events_lo;
}
struct r5l_payload_header {
__u16 type;
__u16 flags;
} __attribute__ ((__packed__));
enum r5l_payload_type {
R5LOG_PAYLOAD_DATA = 0,
R5LOG_PAYLOAD_PARITY = 1,
R5LOG_PAYLOAD_FLUSH = 2,
};
struct r5l_payload_data_parity {
struct r5l_payload_header header;
__u32 size; /* sector. data/parity size. each 4k has a checksum */
__u64 location; /* sector. For data, it's raid sector. For
parity, it's stripe sector */
__u32 checksum[];
} __attribute__ ((__packed__));
enum r5l_payload_data_parity_flag {
R5LOG_PAYLOAD_FLAG_DISCARD = 1, /* payload is discard */
/*
* RESHAPED/RESHAPING is only set when there is reshape activity. Note,
* both data/parity of a stripe should have the same flag set
*
* RESHAPED: reshape is running, and this stripe finished reshape
* RESHAPING: reshape is running, and this stripe isn't reshaped
* */
R5LOG_PAYLOAD_FLAG_RESHAPED = 2,
R5LOG_PAYLOAD_FLAG_RESHAPING = 3,
};
struct r5l_payload_flush {
struct r5l_payload_header header;
__u32 size; /* flush_stripes size, bytes */
__u64 flush_stripes[];
} __attribute__ ((__packed__));
enum r5l_payload_flush_flag {
R5LOG_PAYLOAD_FLAG_FLUSH_STRIPE = 1, /* data represents whole stripe */
};
struct r5l_meta_block {
__u32 magic;
__u32 checksum;
__u8 version;
__u8 __zero_pading_1;
__u16 __zero_pading_2;
__u32 meta_size; /* whole size of the block */
__u64 seq;
__u64 position; /* sector, start from rdev->data_offset, current position */
struct r5l_payload_header payloads[];
} __attribute__ ((__packed__));
#define R5LOG_VERSION 0x1
#define R5LOG_MAGIC 0x6433c509
struct ppl_header_entry {
__u64 data_sector; /* raid sector of the new data */
__u32 pp_size; /* length of partial parity */
__u32 data_size; /* length of data */
__u32 parity_disk; /* member disk containing parity */
__u32 checksum; /* checksum of this entry's partial parity */
} __attribute__ ((__packed__));
#define PPL_HEADER_SIZE 4096
#define PPL_HDR_RESERVED 512
#define PPL_HDR_ENTRY_SPACE \
(PPL_HEADER_SIZE - PPL_HDR_RESERVED - 4 * sizeof(__u32) - sizeof(__u64))
#define PPL_HDR_MAX_ENTRIES \
(PPL_HDR_ENTRY_SPACE / sizeof(struct ppl_header_entry))
struct ppl_header {
__u8 reserved[PPL_HDR_RESERVED];/* reserved space, fill with 0xff */
__u32 signature; /* signature (family number of volume) */
__u32 padding; /* zero pad */
__u64 generation; /* generation number of the header */
__u32 entries_count; /* number of entries in entry array */
__u32 checksum; /* checksum of the header */
struct ppl_header_entry entries[PPL_HDR_MAX_ENTRIES];
} __attribute__ ((__packed__));
#endif
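A worked example makes the placement rule in the comment above MD_RESERVED_BYTES concrete. Take a device of 1,000,000 512-byte sectors. Rounding down to a 64 KiB (128-sector) boundary gives 999,936. Backing off one more 128 sectors puts the 0.90 superblock at sector 999,808, which is byte 511,901,696. The word counts also add up exactly: 32 + 32 + 64 + 27 × 32 + 32 = 1024 words = 4096 bytes, so MD_SB_RESERVED_WORDS is zero. The sketch below restates `MD_NEW_SIZE_SECTORS()` as a function. Its name, and its `UINT64_MAX` return for devices too small to hold a superblock, are assumptions of this sketch; the macro itself would simply wrap around.

```c
#include <stdint.h>

/* 64 KiB reservation in 512-byte sectors, as MD_RESERVED_SECTORS. */
#define SKETCH_RESERVED_SECTORS ((uint64_t)(64 * 1024 / 512))

/* Word budget of the 0.90 superblock, from the MD_SB_* constants. */
enum {
	SKETCH_SB_WORDS = 32 /* generic constant */ + 32 /* generic state */
		+ 64 /* personality */ + 27 * 32 /* disk descriptors */
		+ 32 /* this_disk */
};
_Static_assert(SKETCH_SB_WORDS * 4 == 4096, "0.90 superblock is exactly 4 KiB");

/* Sector offset of the 0.90 superblock on a device of dev_sectors
 * sectors, i.e. MD_NEW_SIZE_SECTORS(dev_sectors). Returns UINT64_MAX
 * when the device is smaller than one 64 KiB unit (sketch-only policy). */
uint64_t sb090_offset_sectors(uint64_t dev_sectors)
{
	uint64_t rounded = dev_sectors & ~(SKETCH_RESERVED_SECTORS - 1);

	if (rounded < SKETCH_RESERVED_SECTORS)
		return UINT64_MAX;
	return rounded - SKETCH_RESERVED_SECTORS;
}
```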

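The PPL header macros are easiest to follow as arithmetic:

- PPL_HDR_ENTRY_SPACE is 4096 − 512 − 4 × 4 − 8 = 3560 bytes.
- Each packed `ppl_header_entry` is 8 + 4 × 4 = 24 bytes.
- PPL_HDR_MAX_ENTRIES is therefore 3560 / 24 = 148 (integer division).
- The whole `struct ppl_header` occupies 536 + 148 × 24 = 4088 bytes, 8 bytes short of PPL_HEADER_SIZE.

The sketch below checks this at compile time. It uses local copies of the definitions, with `<stdint.h>` types in place of `__u32`/`__u64`, and assumes a GCC-compatible compiler for `__attribute__((__packed__))`.

```c
#include <stdint.h>

/* Local copies of the md_p.h PPL definitions. */
struct ppl_entry_sketch {
	uint64_t data_sector;
	uint32_t pp_size;
	uint32_t data_size;
	uint32_t parity_disk;
	uint32_t checksum;
} __attribute__((__packed__));

#define SK_PPL_HEADER_SIZE 4096
#define SK_PPL_HDR_RESERVED 512
/* Bytes left for entries after the reserved area, four 32-bit fields
 * (signature, padding, entries_count, checksum) and one 64-bit field
 * (generation). */
#define SK_PPL_HDR_ENTRY_SPACE \
	(SK_PPL_HEADER_SIZE - SK_PPL_HDR_RESERVED - 4 * sizeof(uint32_t) - sizeof(uint64_t))
#define SK_PPL_HDR_MAX_ENTRIES \
	(SK_PPL_HDR_ENTRY_SPACE / sizeof(struct ppl_entry_sketch))

struct ppl_header_sketch {
	uint8_t reserved[SK_PPL_HDR_RESERVED];
	uint32_t signature;
	uint32_t padding;
	uint64_t generation;
	uint32_t entries_count;
	uint32_t checksum;
	struct ppl_entry_sketch entries[SK_PPL_HDR_MAX_ENTRIES];
} __attribute__((__packed__));

_Static_assert(sizeof(struct ppl_entry_sketch) == 24, "8 + 4 * 4 bytes");
_Static_assert(SK_PPL_HDR_ENTRY_SPACE == 3560, "4096 - 512 - 16 - 8");
_Static_assert(SK_PPL_HDR_MAX_ENTRIES == 148, "3560 / 24, rounded down");
_Static_assert(sizeof(struct ppl_header_sketch) == 4088, "536 + 148 * 24");
```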
115
md_u.h Normal file

@ -0,0 +1,115 @@
/*
md_u.h : user <=> kernel API between Linux raidtools and RAID drivers
Copyright (C) 1998 Ingo Molnar
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
You should have received a copy of the GNU General Public License
(for example /usr/src/linux/COPYING); if not, write to the Free
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
#ifndef _MD_U_H
#define _MD_U_H
/* ioctls */
/* status */
#define RAID_VERSION _IOR (MD_MAJOR, 0x10, mdu_version_t)
#define GET_ARRAY_INFO _IOR (MD_MAJOR, 0x11, mdu_array_info_t)
#define GET_DISK_INFO _IOR (MD_MAJOR, 0x12, mdu_disk_info_t)
#define RAID_AUTORUN _IO (MD_MAJOR, 0x14)
#define GET_BITMAP_FILE _IOR (MD_MAJOR, 0x15, mdu_bitmap_file_t)
/* configuration */
#define ADD_NEW_DISK _IOW (MD_MAJOR, 0x21, mdu_disk_info_t)
#define HOT_REMOVE_DISK _IO (MD_MAJOR, 0x22)
#define SET_ARRAY_INFO _IOW (MD_MAJOR, 0x23, mdu_array_info_t)
#define SET_DISK_FAULTY _IO (MD_MAJOR, 0x29)
#define SET_BITMAP_FILE _IOW (MD_MAJOR, 0x2b, int)
/* usage */
#define RUN_ARRAY _IOW (MD_MAJOR, 0x30, mdu_param_t)
#define STOP_ARRAY _IO (MD_MAJOR, 0x32)
#define STOP_ARRAY_RO _IO (MD_MAJOR, 0x33)
#define RESTART_ARRAY_RW _IO (MD_MAJOR, 0x34)
#define CLUSTERED_DISK_NACK _IO (MD_MAJOR, 0x35)
typedef struct mdu_version_s {
int major;
int minor;
int patchlevel;
} mdu_version_t;
typedef struct mdu_array_info_s {
/*
* Generic constant information
*/
int major_version;
int minor_version;
int patch_version;
unsigned int ctime;
int level;
int size;
int nr_disks;
int raid_disks;
int md_minor;
int not_persistent;
/*
* Generic state information
*/
unsigned int utime; /* 0 Superblock update time */
int state; /* 1 State bits (clean, ...) */
int active_disks; /* 2 Number of currently active disks */
int working_disks; /* 3 Number of working disks */
int failed_disks; /* 4 Number of failed disks */
int spare_disks; /* 5 Number of spare disks */
/*
* Personality information
*/
int layout; /* 0 the array's physical layout */
int chunk_size; /* 1 chunk size in bytes */
} mdu_array_info_t;
typedef struct mdu_disk_info_s {
/*
* configuration/status of one particular disk
*/
int number;
int major;
int minor;
int raid_disk;
int state;
} mdu_disk_info_t;
typedef struct mdu_start_info_s {
/*
* configuration/status of one particular disk
*/
int major;
int minor;
int raid_disk;
int state;
} mdu_start_info_t;
typedef struct mdu_bitmap_file_s
{
char pathname[4096];
} mdu_bitmap_file_t;
typedef struct mdu_param_s
{
int personality; /* 1,2,3,4 */
int chunk_size; /* in bytes */
int max_fault; /* unused for now */
} mdu_param_t;
#endif
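The `state` field of `mdu_disk_info_t` carries the MD_DISK_* values defined in md_p.h. Those values are bit *numbers*, not masks, so a flag is tested as `state & (1 << MD_DISK_SYNC)` rather than by comparing against the constant directly. The small decoder below assumes local copies of those bit numbers. The `describe_disk_state()` name and the flag spellings are hypothetical, not mdadm's output format.

```c
#include <stdio.h>
#include <string.h>

/* Bit numbers copied from md_p.h (MD_DISK_*). */
enum {
	SK_DISK_FAULTY = 0,
	SK_DISK_ACTIVE = 1,
	SK_DISK_SYNC = 2,
	SK_DISK_REMOVED = 3,
	SK_DISK_WRITEMOSTLY = 9,
	SK_DISK_FAILFAST = 10,
	SK_DISK_REPLACEMENT = 17,
	SK_DISK_JOURNAL = 18,
};

/* Write the names of the bits set in state, comma-separated, into buf
 * (len must be >= 1). Stops before the first name that would not fit. */
char *describe_disk_state(unsigned int state, char *buf, size_t len)
{
	static const struct { int bit; const char *name; } flags[] = {
		{ SK_DISK_FAULTY, "faulty" },
		{ SK_DISK_ACTIVE, "active" },
		{ SK_DISK_SYNC, "sync" },
		{ SK_DISK_REMOVED, "removed" },
		{ SK_DISK_WRITEMOSTLY, "write-mostly" },
		{ SK_DISK_FAILFAST, "failfast" },
		{ SK_DISK_REPLACEMENT, "replacement" },
		{ SK_DISK_JOURNAL, "journal" },
	};
	size_t used = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int n;

		if (!(state & (1u << flags[i].bit)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)n;
	}
	return buf;
}
```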

3452
mdadm.8.in Normal file

File diff suppressed because it is too large

2078
mdadm.c Normal file

File diff suppressed because it is too large

65
mdadm.conf-example Normal file

@ -0,0 +1,65 @@
# mdadm configuration file
#
# mdadm will function properly without the use of a configuration file,
# but this file is useful for keeping track of arrays and member disks.
# In general, an mdadm.conf file is created and updated after arrays
# are created. This is the opposite of /etc/raidtab, which is
# created prior to array construction.
#
#
# the config file takes two types of lines:
#
# DEVICE lines specify a list of devices of where to look for
# potential member disks
#
# ARRAY lines specify information about how to identify arrays
# so that they can be activated
#
# You can have more than one DEVICE line and use wildcards. The first
# example includes the first partition of SCSI disks /dev/sdb,
# /dev/sdc, /dev/sdd, /dev/sdj, /dev/sdk, and /dev/sdl. The second
# line looks for array slices on IDE disks.
#
#DEVICE /dev/sd[bcdjkl]1
#DEVICE /dev/hda1 /dev/hdb1
#
# If you mount devfs on /dev, then a suitable way to list all devices is:
#DEVICE /dev/discs/*/*
#
#
# The AUTO line can control which arrays get assembled by auto-assembly,
# meaning either "mdadm -As" when there are no 'ARRAY' lines in this file,
# or "mdadm --incremental" when the array found is not listed in this file.
# By default, all arrays that are found are assembled.
# If you want to ignore all DDF arrays (maybe they are managed by dmraid),
# and only assemble 1.x arrays which are marked for 'this' homehost,
# but assemble all others, then use
#AUTO -ddf homehost -1.x +all
#
# ARRAY lines specify an array to assemble and a method of identification.
# Arrays can currently be identified by using a UUID, superblock minor number,
# or a listing of devices.
#
# super-minor is usually the minor number of the metadevice
# UUID is the Universally Unique Identifier for the array
# Each can be obtained using
#
# mdadm -D <md>
#
#ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
#ARRAY /dev/md1 super-minor=1
#ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1
#
# ARRAY lines can also specify a "spare-group" for each array. mdadm --monitor
# will then move a spare between arrays in a spare-group if one array has a failed
# drive but no spare.
#ARRAY /dev/md4 uuid=b23f3c6d:aec43a9f:fd65db85:369432df spare-group=group1
#ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977 spare-group=group1
#
# When used in --follow (aka --monitor) mode, mdadm needs a
# mail address and/or a program. This can be given with "mailaddr"
# and "program" lines so that monitoring can be started using
# mdadm --follow --scan & echo $! > /run/mdadm/mon.pid
# If the lines are not found, mdadm will exit quietly.
#MAILADDR root@mydomain.tld
#PROGRAM /usr/sbin/handle-mdadm-events

706
mdadm.conf.5 Normal file

@ -0,0 +1,706 @@
.\" Copyright Neil Brown and others.
.\" This program is free software; you can redistribute it and/or modify
.\" it under the terms of the GNU General Public License as published by
.\" the Free Software Foundation; either version 2 of the License, or
.\" (at your option) any later version.
.\" See file COPYING in distribution for details.
.TH MDADM.CONF 5
.SH NAME
mdadm.conf \- configuration for management of Software RAID with mdadm
.SH SYNOPSIS
/etc/mdadm.conf
.SH DESCRIPTION
.PP
.I mdadm
is a tool for creating, managing, and monitoring RAID devices using the
.B md
driver in Linux.
.PP
Some common tasks, such as assembling all arrays, can be simplified
by describing the devices and arrays in this configuration file.
.SS SYNTAX
The file should be seen as a collection of words separated by white
space (space, tab, or newline).
Any word that begins with a hash sign (#) starts a comment and that
word together with the remainder of the line is ignored.
Spaces can be included in a word using quotation characters. Either
single quotes
.RB ( ' )
or double quotes (\fB"\fP)
may be used. All the characters from one quotation character to the
next identical character are protected: they will not be used to
separate words or to start new quoted strings. To include a single quote
it must be between double quotes. To include a double quote it must
be between single quotes.
Any line that starts with white space (space or tab) is treated as
though it were a continuation of the previous line.
Empty lines are ignored, but otherwise each (non-continuation) line
must start with a keyword as listed below. The keywords are case
insensitive and can be abbreviated to 3 characters.
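The word-splitting and keyword rules above can be illustrated with a
short C sketch. It is not the parser that
.I mdadm
uses. The names split_words and keyword_matches are hypothetical.
The sketch assumes the caller has already joined continuation lines,
and it truncates each word at 63 characters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Split one logical line into words: white space separates words,
 * a pair of single or double quotes protects everything between them,
 * and a word that begins with # starts a comment. Returns the word count. */
int split_words(const char *line, char out[][64], int max)
{
    int n = 0;
    const char *p = line;

    while (n < max) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == 0 || *p == '#')
            break;                              /* end of line or comment */
        size_t len = 0;
        while (*p && !isspace((unsigned char)*p)) {
            if (*p == '"' || *p == 39) {        /* 39 is the single quote */
                char q = *p++;
                while (*p && *p != q) {         /* copy the quoted text */
                    if (len < 63)
                        out[n][len++] = *p;
                    p++;
                }
                if (*p)
                    p++;                        /* skip the closing quote */
            } else {
                if (len < 63)
                    out[n][len++] = *p;
                p++;
            }
        }
        out[n++][len] = 0;
    }
    return n;
}

/* Case-insensitive keyword match that allows abbreviation down to minlen. */
int keyword_matches(const char *word, const char *keyword, size_t minlen)
{
    size_t len = strlen(word);
    return len >= minlen && len <= strlen(keyword) &&
           strncasecmp(word, keyword, len) == 0;
}
```

For example, splitting ARRAY /dev/md9 name='Data Storage' # note
yields three words, and the last one is name=Data Storage.
For the MAILFROM keyword described below, a caller would pass a
minimum length of 5 instead of 3.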
The keywords are:
.TP
.B DEVICE
A
.B device
line lists the devices (whole devices or partitions) that might contain
a component of an MD array. When looking for the components of an
array,
.I mdadm
will scan these devices (or any devices listed on the command line).
The
.B device
line may contain a number of different devices (separated by spaces)
and each device name can contain wild cards as defined by
.BR glob (7).
Also, there may be several device lines present in the file.
Alternatively, a
.B device
line can contain either or both of the words
.B containers
and
.BR partitions .
The word
.B containers
will cause
.I mdadm
to look for assembled CONTAINER arrays and include them as a source
for assembling further arrays.
The word
.B partitions
will cause
.I mdadm
to read
.I /proc/partitions
and include all devices and partitions found therein.
.I mdadm
does not use the names from
.I /proc/partitions
but only the major and minor device numbers. It scans
.I /dev
to find the name that matches the numbers.
If no DEVICE line is present, then "DEVICE partitions containers" is assumed.
For example:
.IP
DEVICE /dev/hda* /dev/hdc*
.br
DEV /dev/sd*
.br
DEVICE /dev/disk/by-path/pci*
.br
DEVICE partitions
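The handling of the word partitions described above can be sketched
as a function that reads one line of /proc/partitions and keeps only
the major and minor numbers. The helper name is hypothetical. The step
that maps the numbers back to a name under /dev is not shown; one way
to do it is to compare them with the st_rdev field returned by stat(2).

```c
#include <stdio.h>

/* Read the major and minor numbers from one /proc/partitions line
 * ("major minor #blocks name"). The name is deliberately not used.
 * Returns 1 on success, 0 for the header or any malformed line. */
int parse_partitions_line(const char *line,
                          unsigned *dev_major, unsigned *dev_minor)
{
    unsigned long long blocks;
    char name[64];

    return sscanf(line, " %u %u %llu %63s",
                  dev_major, dev_minor, &blocks, name) == 4;
}
```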
.TP
.B ARRAY
The ARRAY lines identify actual arrays. The second word on the line
may be the name of the device where the array is normally
assembled, such as
.B /dev/md1
or
.BR /dev/md/backup .
If the name does not start with a slash
.RB (' / '),
it is treated as being in
.BR /dev/md/ .
Alternatively the word
.B <ignore>
(complete with angle brackets) can be given in which case any array
which matches the rest of the line will never be automatically assembled.
If no device name is given,
.I mdadm
will use various heuristics to determine an appropriate name.
Subsequent words identify the array, or identify the array as a member
of a group. If multiple identities are given,
then a component device must match ALL identities to be considered a
match. Each identity word has a tag, an equals sign, and some value.
The tags are:
.RS 4
.TP
.B uuid=
The value should be a 128 bit uuid in hexadecimal, with punctuation
interspersed if desired. This must match the uuid stored in the
superblock.
.TP
.B name=
The value should be a simple textual name as was given to
.I mdadm
when the array was created. This must match the name stored in the
superblock on a device for that device to be included in the array.
Not all superblock formats support names.
.TP
.B super\-minor=
The value is an integer which indicates the minor number that was
stored in the superblock when the array was created. When an array is
created as /dev/mdX, then the minor number X is stored.
.TP
.B devices=
The value is a comma separated list of device names or device name
patterns.
Only devices with names which match one entry in the list will be used
to assemble the array. Note that the devices
listed there must also be listed on a DEVICE line.
.TP
.B level=
The value is a RAID level. This is not normally used to
identify an array, but is supported so that the output of
.B "mdadm \-\-examine \-\-scan"
can be used directly in the configuration file.
.TP
.B num\-devices=
The value is the number of devices in a complete active array. As with
.B level=
this is mainly for compatibility with the output of
.BR "mdadm \-\-examine \-\-scan" .
.TP
.B spares=
The value is a number of spare devices to expect the array to have.
The sole use of this keyword and value is as follows:
.B mdadm \-\-monitor
will report an array if it is found to have fewer than this number of
spares when
.B \-\-monitor
starts or when
.B \-\-oneshot
is used.
.TP
.B spare\-group=
The value is a textual name for a group of arrays. All arrays with
the same
.B spare\-group
name are considered to be part of the same group. The significance of
a group of arrays is that
.I mdadm
will, when monitoring the arrays, move a spare drive from one array in
a group to another array in that group if the first array had a failed
or missing drive but no spare.
.TP
.B auto=
This option is rarely needed with mdadm-3.0, particularly if used with
the Linux kernel v2.6.28 or later.
It tells
.I mdadm
whether to use partitionable array or non-partitionable arrays and,
in the absence of
.IR udev ,
how many partition devices to create. From 2.6.28 all md array
devices are partitionable, hence this option is not needed.
The value of this option can be "yes" or "md" to indicate that a
traditional, non-partitionable md array should be created, or "mdp",
"part" or "partition" to indicate that a partitionable md array (only
available in Linux 2.6 and later) should be used. This latter set can
also have a number appended to indicate how many partitions to create
device files for, e.g.
.BR auto=mdp5 .
The default is 4.
.TP
.B bitmap=
The option specifies a file in which a write-intent bitmap should be
found. When assembling the array,
.I mdadm
will provide this file to the
.B md
driver as the bitmap file. This has the same function as the
.B \-\-bitmap\-file
option to
.BR \-\-assemble .
.TP
.B metadata=
Specify the metadata format that the array has. This is mainly
recognised for compatibility with the output of
.BR "mdadm \-Es" .
.TP
.B container=
Specify that this array is a member array of some container. The
value given can be either a path name in /dev, or a UUID of the
container array.
.TP
.B member=
Specify that this array is a member array of some container. Each
type of container has some way to enumerate member arrays, often a
simple sequence number. The value identifies which member of a
container the array is. It will usually accompany a "container=" word.
.RE
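The uuid= rule (hexadecimal digits with optional punctuation) can be
sketched as a parser that skips every non-hex character and requires
exactly 32 digits. Under that rule, two spellings of the same UUID
compare equal. These helpers are hypothetical and are not the parser
that
.I mdadm
uses. A full ARRAY match would apply the same idea to every identity
on the line and require all of them to match.

```c
#include <ctype.h>
#include <string.h>

/* Parse a 128-bit UUID written as 32 hex digits with any punctuation
 * mixed in. Returns 0 on success, -1 unless there are exactly 32 digits. */
int parse_uuid(const char *s, unsigned char uuid[16])
{
    int digits = 0;

    memset(uuid, 0, 16);
    for (; *s; s++) {
        int c = (unsigned char)*s;
        if (!isxdigit(c))
            continue;                     /* skip ':' '-' and the like */
        if (digits == 32)
            return -1;                    /* too many digits */
        int v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        uuid[digits / 2] |= (unsigned char)(digits % 2 ? v : v << 4);
        digits++;
    }
    return digits == 32 ? 0 : -1;
}

/* Two UUID strings match if they encode the same 16 bytes. */
int uuid_equal(const char *a, const char *b)
{
    unsigned char ua[16], ub[16];

    return parse_uuid(a, ua) == 0 && parse_uuid(b, ub) == 0 &&
           memcmp(ua, ub, 16) == 0;
}
```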
.TP
.B MAILADDR
The
.B mailaddr
line gives an E-mail address that alerts should be
sent to when
.I mdadm
is running in
.B \-\-monitor
mode (and was given the
.B \-\-scan
option). There should only be one
.B MAILADDR
line and it should have only one address. Any subsequent addresses
are silently ignored.
.TP
.B MAILFROM
The
.B mailfrom
line (which cannot be abbreviated to fewer than 5 characters) gives an
address to appear in the "From" address for alert mails. This can be
useful if you want to explicitly set a domain, as the default from
address is "root" with no domain. All words on this line are
concatenated with spaces to form the address.
Note that this value cannot be set via the
.I mdadm
command line. It can only be set via the config file.
.TP
.B PROGRAM
The
.B program
line gives the name of a program to be run when
.B "mdadm \-\-monitor"
detects potentially interesting events on any of the arrays that it
is monitoring. This program is run with two or three arguments: the
event, the md device, and possibly the related component
device.
There should only be one
.B program
line and it should give only one program.
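A program named on a PROGRAM line receives its arguments in the order
given above. The sketch below shows one way such a handler, written in
C here although any executable would do, might turn those arguments
into a log message. The function name is hypothetical. Event names
such as Fail or DegradedArray are listed in mdadm(8).

```c
#include <stdio.h>

/* Turn the arguments a PROGRAM receives into one log line:
 * argv[1] is the event, argv[2] the md device, and argv[3]
 * (if present) the related component device.
 * Returns the message length, or -1 on error or truncation. */
int format_md_event(char *out, size_t len, int argc, char *argv[])
{
    int n;

    if (argc < 3)
        return -1;
    if (argc > 3)
        n = snprintf(out, len, "%s on %s (component %s)",
                     argv[1], argv[2], argv[3]);
    else
        n = snprintf(out, len, "%s on %s", argv[1], argv[2]);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```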
.TP
.B CREATE
The
.B create
line gives default values to be used when creating arrays, new members
of arrays, and device entries for arrays.
These include:
.RS 4
.TP
.B owner=
.TP
.B group=
These can give user/group ids or names to use instead of system
defaults (root/wheel or root/disk).
.TP
.B mode=
An octal file mode such as 0660 can be given to override the default
of 0600.
.TP
.B auto=
This corresponds to the
.B \-\-auto
flag to mdadm. Give
.BR yes ,
.BR md ,
.BR mdp ,
.B part
\(em possibly followed by a number of partitions \(em to indicate how
missing device entries should be created.
.TP
.B metadata=
The name of the metadata format to use if none is explicitly given.
This can be useful to impose a system-wide default of version-1 superblocks.
.TP
.B symlinks=no
Normally when creating devices in
.B /dev/md/
.I mdadm
will create a matching symlink from
.B /dev/
with a name starting with
.B md
or
.BR md_ .
Give
.B symlinks=no
to suppress this symlink creation.
.TP
.B names=yes
Since Linux 2.6.29 it has been possible to create
.B md
devices with a name like
.B md_home
rather than just a number, like
.BR md3 .
.I mdadm
will use the numeric alternative by default as other tools that interact
with md arrays may expect only numbers.
If
.B names=yes
is given in
.I mdadm.conf
then
.I mdadm
will use a name when appropriate.
If
.B names=no
is given, then non-numeric
.I md
device names will not be used even if the default changes in a future
release of
.IR mdadm .
.TP
.B bbl=no
By default,
.I mdadm
will reserve space for a bad block list (bbl) on all devices
included in or added to any array that supports them. Setting
.B bbl=no
will prevent this, so newly added devices will not have a bad
block log.
.RE
.TP
.B HOMEHOST
The
.B homehost
line gives a default value for the
.B \-\-homehost=
option to mdadm. There should normally be only one other word on the line.
It should either be a host name, or one of the special words
.BR <system> ,
.B <none>
and
.BR <ignore> .
If
.B <system>
is given, then the
.BR gethostname (2)
system call is used to get the host name. This is the default.
If
.B <ignore>
is given, then a flag is set so that when arrays are being
auto-assembled the checking of the recorded
.I homehost
is disabled.
If
.B <ignore>
is given it is also possible to give an explicit name which will be
used when creating arrays. This is the only case when there can be
more than one other word on the
.B HOMEHOST
line. If there are other words, or other
.B HOMEHOST
lines, they are silently ignored.
If
.B <none>
is given, then the default of using
.BR gethostname (2)
is overridden and no homehost name is assumed.
When arrays are created, this host name will be stored in the
metadata. When arrays are assembled using auto-assembly, arrays which
do not record the correct homehost name in their metadata will be
assembled using a "foreign" name. A "foreign" name always ends with a
digit string preceded by an underscore to differentiate it
from any possible local name, e.g.
.B /dev/md/1_1
or
.BR /dev/md/home_0 .
.TP
.B AUTO
A list of names of metadata formats can be given, each preceded by a
plus or minus sign. The word
.I homehost
is also allowed, as is
.I all
preceded by a plus or minus sign.
.I all
is usually last.
When
.I mdadm
is auto-assembling an array, either via
.I \-\-assemble
or
.I \-\-incremental
and it finds metadata of a given type, it checks that metadata type
against those listed in this line. The first match wins, where
.I all
matches anything.
If a match is found that was preceded by a plus sign, the auto
assembly is allowed. If the match was preceded by a minus sign, the
auto assembly is disallowed. If no match is found, the auto assembly
is allowed.
If the metadata indicates that the array was created for
.I this
host, and the word
.I homehost
appears before any other match, then the array is treated as a valid
candidate for auto-assembly.
This can be used to disable all auto-assembly (so that only arrays
explicitly listed in mdadm.conf or on the command line are assembled),
or to disable assembly of certain metadata types which might be
handled by other software. It can also be used to disable assembly of
all foreign arrays - normally such arrays are assembled but given a
non-deterministic name in
.BR /dev/md/ .
The known metadata types are
.BR 0.90 ,
.BR 1.x ,
.BR ddf ,
.BR imsm .
.B AUTO
should be given at most once. Subsequent lines are silently ignored.
Thus an earlier config file in a config directory will override
the setting in a later config file.
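The first-match rule above can be sketched in C as follows. This is an
illustration, not the
.I mdadm
implementation, and the function name is hypothetical. The sketch
assumes that homehost appears without a sign, as it does in the
EXAMPLE section. It also assumes the caller passes the metadata type
using one of the names listed above (0.90, 1.x, ddf, imsm).

```c
#include <string.h>

/* Decide whether auto-assembly is allowed: the first matching word wins,
 * "all" matches any type, "homehost" accepts arrays created for this
 * host, and if nothing matches the assembly is allowed. */
int auto_allowed(const char *words[], int n,
                 const char *type, int for_this_host)
{
    for (int i = 0; i < n; i++) {
        const char *w = words[i];
        if (strcmp(w, "homehost") == 0) {
            if (for_this_host)
                return 1;
            continue;
        }
        if (w[0] != '+' && w[0] != '-')
            continue;                       /* ignore unsigned words */
        if (strcmp(w + 1, "all") == 0 || strcmp(w + 1, type) == 0)
            return w[0] == '+';
    }
    return 1;
}
```

With the words -ddf homehost -1.x +all, a DDF array is rejected, a 1.x
array is accepted only when it was created for this host, and any other
type is accepted.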
.TP
.B POLICY
This is used to specify what automatic behavior is allowed on devices
newly appearing in the system and provides a way of marking spares that can
be moved to other arrays as well as the migration domains.
A
.I domain
can be defined through a
.I policy
line by specifying a domain name for a number of paths from
.BR /dev/disk/by-path/ .
A device may belong to several domains. The domain of an array is a union
of domains of all devices in that array. A spare can be automatically
moved from one array to another if the set of the destination array's
.I domains
contains all the
.I domains
of the new disk or if both arrays have the same
.IR spare-group .
To update the hot-plug configuration, run the
.B mdadm \-\-udev\-rules
command after changing the config file.
Keywords used in the
.I POLICY
line and supported values are:
.RS 4
.TP
.B domain=
any arbitrary string
.TP
.B metadata=
0.9, 1.x, ddf, or imsm
.TP
.B path=
file glob matching anything from
.B /dev/disk/by-path
.TP
.B type=
either
.B disk
or
.BR part .
.TP
.B action=
include, re-add, spare, spare-same-slot, or force-spare
.TP
.B auto=
yes, no, or homehost.
.P
The
.I action
item determines the automatic behavior allowed for devices matching the
.I path
and
.I type
in the same line. If a device matches several lines with different
.I actions
then the most permissive will apply. The ordering of policy lines
is irrelevant to the end result.
.TP
.B include
allows adding a disk to an array if metadata on that disk matches that array.
.TP
.B re\-add
will include the device in the array if it appears to be a current member
or a member that was recently removed and the array has a
write-intent-bitmap to allow the
.B re\-add
functionality.
.TP
.B spare
as above and additionally: if the device is bare it can
become a spare if there is any array that it is a candidate for based
on domains and metadata.
.TP
.B spare\-same\-slot
as above and additionally: if the given slot was used by an array that went
degraded recently and the device plugged in has no metadata, then it will
be automatically added to that array (or its container).
.TP
.B force\-spare
as above, and the disk will become a spare in all remaining cases.
.RE
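Because the most permissive matching action wins, the actions can be
treated as an ordered scale and combined by taking the maximum. That
also makes the result independent of line order. The hypothetical
sketch below uses fnmatch(3) for the path= glob and ignores domain=,
metadata= and type= for brevity. The enum ordering is an assumption
drawn from the descriptions above, where each later action extends
the one before it. The relative order of include and re-add is
likewise assumed.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Ordered from least to most permissive, so "most permissive" is the max. */
enum policy_action {
    ACT_DEFAULT, ACT_INCLUDE, ACT_RE_ADD, ACT_SPARE,
    ACT_SPARE_SAME_SLOT, ACT_FORCE_SPARE
};

struct policy_line {
    const char *path;            /* glob against a /dev/disk/by-path name */
    enum policy_action action;
};

enum policy_action effective_action(const struct policy_line *lines, size_t n,
                                    const char *by_path_name)
{
    enum policy_action best = ACT_DEFAULT;

    for (size_t i = 0; i < n; i++)
        if (fnmatch(lines[i].path, by_path_name, 0) == 0 &&
            lines[i].action > best)
            best = lines[i].action;
    return best;
}
```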
.TP
.B PART-POLICY
This is similar to
.B POLICY
and accepts the same keyword assignments. It allows a consistent set
of policies to be applied to each of the partitions of a device.
A
.B PART-POLICY
line should set
.I type=disk
and identify the path to one or more disk devices. Each partition on
these disks will be treated according to the
.I action=
setting from this line. If a
.I domain
is set in the line, then the domain associated with each partition will
be based on the domain, but with
.RB \(dq -part N\(dq
appended, where N is the partition number for the partition that was
found.
.TP
.B SYSFS
The
.B SYSFS
line lists custom values of an MD device's sysfs attributes which will be
written to sysfs after the array is assembled. Multiple lines are allowed and each
line has to contain the uuid or the name of the device to which it relates.
.RS 4
.TP
.B uuid=
hexadecimal identifier of MD device. This has to match the uuid stored in the
superblock.
.TP
.B name=
name of the MD device as was given to
.I mdadm
when the array was created. It is ignored if
.B uuid
is also given.
.RE
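Each SYSFS attribute corresponds to a file in the md directory of the
array's sysfs entry. The sketch below assumes the array's kernel name
(for example md127) is already known after assembly, and that the
attributes live under /sys/block/<name>/md/. The helper names are
hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs file for one attribute of an assembled array,
 * e.g. md127 + sync_speed_max -> /sys/block/md127/md/sync_speed_max.
 * Returns 0 on success, -1 if a name is unsafe or the path is too long. */
int sysfs_attr_path(char *out, size_t len, const char *kname, const char *attr)
{
    if (strchr(attr, '/') || strchr(kname, '/'))
        return -1;
    int n = snprintf(out, len, "/sys/block/%s/md/%s", kname, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one value, as a "SYSFS ... attr=value" entry would after assembly. */
int sysfs_set(const char *kname, const char *attr, const char *value)
{
    char path[256];

    if (sysfs_attr_path(path, sizeof path, kname, attr) != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```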
.TP
.B MONITORDELAY
The
.B monitordelay
line gives the delay in seconds that
.I mdadm
waits between polls of the md arrays
when
.I mdadm
is running in
.B \-\-monitor
mode.
The
.B \-d/\-\-delay
command-line argument takes precedence over the config file.
.SH EXAMPLE
DEVICE /dev/sd[bcdjkl]1
.br
DEVICE /dev/hda1 /dev/hdb1
.br
# /dev/md0 is known by its UUID.
.br
ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
.br
# /dev/md1 contains all devices with a minor number of
.br
# 1 in the superblock.
.br
ARRAY /dev/md1 super\-minor=1
.br
# /dev/md2 is made from precisely these two devices
.br
ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1
.br
# /dev/md4 and /dev/md5 are a spare-group and spares
.br
# can be moved between them
.br
ARRAY /dev/md4 uuid=b23f3c6d:aec43a9f:fd65db85:369432df
.br
        spare\-group=group1
.br
ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977
.br
        spare\-group=group1
.br
# /dev/md/home is created if needed as a partitionable md array;
.br
# any spare device number is allocated.
.br
ARRAY /dev/md/home UUID=9187a482:5dde19d9:eea3cc4a:d646ab8b
.br
        auto=part
.br
# The name of this array contains a space.
.br
ARRAY /dev/md9 name='Data Storage'
.sp
POLICY domain=domain1 metadata=imsm path=pci-0000:00:1f.2-scsi-*
.br
        action=spare
.br
POLICY domain=domain1 metadata=imsm path=pci-0000:04:00.0-scsi-[01]*
.br
        action=include
.br
# One domain comprising devices attached to the specified paths is defined.
.br
# Bare device matching first path will be made an imsm spare on hot plug.
.br
# If more than one array is created on devices belonging to domain1 and
.br
# one of them becomes degraded, then any imsm spare matching any path for
.br
# the given domain name can be migrated.
.br
MAILADDR root@mydomain.tld
.br
PROGRAM /usr/sbin/handle\-mdadm\-events
.br
CREATE group=system mode=0640 auto=part\-8
.br
HOMEHOST <system>
.br
AUTO +1.x homehost \-all
.br
SYSFS name=/dev/md/raid5 group_thread_cnt=4 sync_speed_max=1000000
.br
SYSFS uuid=bead5eb6:31c17a27:da120ba2:7dfda40d group_thread_cnt=4
        sync_speed_max=1000000
.br
MONITORDELAY 60
.SH SEE ALSO
.BR mdadm (8),
.BR md (4).

1887
mdadm.h Normal file

File diff suppressed because it is too large

47
mdadm.spec Normal file

@ -0,0 +1,47 @@
Summary: mdadm is used for controlling Linux md devices (aka RAID arrays)
Name: mdadm
Version: 4.2
Release: 1
Source: https://www.kernel.org/pub/linux/utils/raid/mdadm/mdadm-%{version}.tar.gz
URL: https://neil.brown.name/blog/mdadm
License: GPL
Group: Utilities/System
BuildRoot: %{_tmppath}/%{name}-root
Obsoletes: mdctl
%description
mdadm is a program that can be used to create, manage, and monitor
Linux MD (Software RAID) devices.
%prep
%setup -q
# we want to install in /sbin, not /usr/sbin...
%define _exec_prefix %{nil}
%build
# This is a debatable issue. The author of this RPM spec file feels that
# people who install RPMs (especially given that the default RPM options
# will strip the binary) are not going to be running gdb against the
# program.
make CXFLAGS="$RPM_OPT_FLAGS" SYSCONFDIR="%{_sysconfdir}"
%install
make DESTDIR=$RPM_BUILD_ROOT MANDIR=%{_mandir} BINDIR=%{_sbindir} install
install -D -m644 mdadm.conf-example $RPM_BUILD_ROOT/%{_sysconfdir}/mdadm.conf
%clean
rm -rf $RPM_BUILD_ROOT
%files
%defattr(-,root,root)
%doc TODO ChangeLog mdadm.conf-example COPYING
%{_sbindir}/mdadm
%{_sbindir}/mdmon
/usr/lib/udev/rules.d/01-md-raid-creating.rules
/usr/lib/udev/rules.d/63-md-raid-arrays.rules
/usr/lib/udev/rules.d/64-md-raid-assembly.rules
/usr/lib/udev/rules.d/69-md-clustered-confirm-device.rules
%config(noreplace,missingok)/%{_sysconfdir}/mdadm.conf
%{_mandir}/man*/md*
%changelog

146
mdmon-design.txt Normal file

@ -0,0 +1,146 @@
When managing a RAID array which uses metadata other than the
"native" metadata understood by the kernel, mdadm makes use of a
partner program named 'mdmon' to manage some aspects of updating
that metadata and synchronising the metadata with the array state.
This document provides some details on how mdmon works.
Containers
----------
As background: mdadm makes a distinction between an 'array' and a
'container'. Other sources sometimes use the term 'volume' or
'device' for an 'array', and may use the term 'array' for a
'container'.
For our purposes:
- a 'container' is a collection of devices which are described by a
single set of metadata. The metadata may be stored equally
on all devices, or different devices may have quite different
subsets of the total metadata. But there is conceptually one set
of metadata that unifies the devices.
- an 'array' is a set of datablocks from various devices which
together are used to present the abstraction of a single linear
sequence of blocks, which may provide data redundancy or enhanced
performance.
So a container has some metadata and provides a number of arrays which
are described by that metadata.
Sometimes this model doesn't work perfectly. For example, global
spares may have their own metadata which is quite different from the
metadata from any device that participates in one or more arrays.
Such a global spare might still need to belong to some container so
that it is available to be used should a failure arise. In that case
we consider the 'metadata' to be the union of the metadata on the
active devices which describes the arrays, and the metadata on the
global spares which only describes the spares. In this case different
devices in the one container will have quite different metadata.
Purpose
-------
The main purpose of mdmon is to update the metadata in response to
changes to the array which need to be reflected in the metadata before
future writes to the array can safely be performed.
These include:
- transitions from 'clean' to 'dirty'.
- recording that devices have failed.
- recording the progress of a 'reshape'.
This requires mdmon to be running at any time that the array is
writable (a read-only array does not require mdmon to be running).
Because mdmon must be able to process these metadata updates at any
time, it must (when running) have exclusive write access to the
metadata. Any other changes (e.g. reconfiguration of the array) must
go through mdmon.
A secondary role for mdmon is to activate spares when a device fails.
This role is much less time-critical than the other metadata updates,
so it could be performed by a separate process, possibly
"mdadm --monitor" which has a related role of moving devices between
arrays. A main reason for including this functionality in mdmon is
that in the native-metadata case this function is handled in the
kernel, and mdmon's reason for existence is to provide functionality
which is otherwise handled by the kernel.
Design overview
---------------
mdmon is structured as two threads with a common address space and
common data structures. These threads are known as the 'monitor' and
the 'manager'.
The 'monitor' has the primary role of monitoring the array for
important state changes and updating the metadata accordingly. As
writes to the array can be blocked until 'monitor' completes and
acknowledges the update, it must be very careful not to block itself.
In particular it must not block waiting for any write to complete else
it could deadlock. This means that it must not allocate memory as
doing this can require dirty memory to be written out and if the
system chooses to write to the array that mdmon is monitoring, the
memory allocation could deadlock.
So 'monitor' must never allocate memory and must limit the number of
other system calls it performs. It may:
- use select (or poll) to wait for activity on a file descriptor
- read from a sysfs file descriptor
- write to a sysfs file descriptor
- write the metadata out to the block devices using O_DIRECT
- send a signal (kill) to the manager thread
It must not e.g. open files or do anything similar that might allocate
resources.
The 'manager' thread does everything else that is needed. If any
files are to be opened (e.g. because a device has been added to the
array), the manager does that. If any memory needs to be allocated
(e.g. to hold data about a new array as can happen when one set of
metadata describes several arrays), the manager performs that
allocation.
The 'manager' is also responsible for communicating with mdadm and
assigning spares to replace failed devices.
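The division of labour between the two threads can be sketched in C,
assuming POSIX open(2), poll(2) and pread(2). The manager opens the
sysfs attribute files, which may allocate. The monitor only waits on
descriptors it was handed and reads through them into a buffer it was
given. The function names are hypothetical and this is not mdmon's
source. The Linux sysfs documentation says a changed attribute is
signalled as POLLPRI/POLLERR and must be re-read from the start, which
is why the sketch uses pread at offset 0.

```c
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Manager side: may block and allocate, so it opens the descriptors. */
int manager_open_attr(const char *path)
{
    return open(path, O_RDWR);
}

/* Monitor side: reads through an already-open descriptor into a
 * caller-supplied buffer (no malloc, no open). Returns bytes read or -1. */
ssize_t monitor_read_attr(int fd, char *buf, size_t len)
{
    if (len == 0)
        return -1;
    ssize_t n = pread(fd, buf, len - 1, 0);   /* always re-read from offset 0 */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    char *nl = strchr(buf, '\n');             /* sysfs values end in a newline */
    if (nl)
        *nl = '\0';
    return n;
}

/* Monitor side: sleep until sysfs flags a change on one of the
 * preopened attributes (reported as POLLPRI, with POLLERR). */
int monitor_wait(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    for (nfds_t i = 0; i < nfds; i++)
        fds[i].events = POLLPRI;
    return poll(fds, nfds, timeout_ms);
}
```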
Handling metadata updates
-------------------------
There are a number of cases in which mdadm needs to update the
metadata which mdmon is managing. These include:
- creating a new array in an active container
- adding a device to a container
- reconfiguring an array
etc.
To complete these updates, mdadm must send a message to mdmon which
will merge the update into the metadata as it is at that moment.
To achieve this, mdmon creates a Unix Domain Socket which the manager
thread listens on. mdadm sends a message over this socket. The
manager thread examines the message to see if it will require
allocating any memory and allocates it. This is done in the
'prepare_update' metadata method.
The update message is then queued for handling by the monitor thread
which it will do when convenient. The monitor thread calls
->process_update which should atomically make the required changes to
the metadata, making use of the pre-allocated memory as required. Any
memory that is no longer needed can be placed back in the request and
the manager thread will free it.
The exact format of a metadata update is up to the implementer of the
metadata handlers. It will simply describe a change that needs to be
made. It will sometimes contain fragments of the metadata to be
copied into place. However the ->process_update routine must make
sure not to over-write any field that the monitor thread might have
updated, such as a 'device failed' or 'array is dirty' state.
When the monitor thread has completed the update and written it to the
devices, an acknowledgement message is sent back over the socket so
that mdadm knows it is complete.

257
mdmon.8 Normal file

@ -0,0 +1,257 @@
.\" See file COPYING in distribution for details.
.TH MDMON 8 "" v4.2
.SH NAME
mdmon \- monitor MD external metadata arrays
.SH SYNOPSIS
.BI mdmon " [--all] [--takeover] [--foreground] CONTAINER"
.SH OVERVIEW
The 2.6.27 kernel brings the ability to support external metadata arrays.
External metadata implies that user space handles all updates to the metadata.
The kernel's responsibility is to notify user space when a "metadata event"
occurs, like disk failures and clean-to-dirty transitions. The kernel, in
important cases, waits for user space to take action on these notifications.
.SH DESCRIPTION
.SS Metadata updates:
To service metadata update requests a daemon,
.IR mdmon ,
is introduced.
.I Mdmon
is tasked with polling the sysfs namespace looking for changes in
.BR array_state ,
.BR sync_action ,
and per disk
.BR state
attributes. When a change is detected it calls a per metadata type
handler to make modifications to the metadata. The following actions
are taken:
.RS
.TP
.B array_state \- inactive
Clear the dirty bit for the volume and let the array be stopped
.TP
.B array_state \- write pending
Set the dirty bit for the array and then set
.B array_state
to
.BR active .
Writes
are blocked until userspace writes
.BR active .
.TP
.B array_state \- active-idle
The safe mode timer has expired so set array state to clean to block writes to the array
.TP
.B array_state \- clean
Clear the dirty bit for the volume
.TP
.B array_state \- read-only
This is the initial state that all arrays start at.
.I mdmon
takes one of the three actions:
.RS
.TP
1/
Transition the array to read-auto keeping the dirty bit clear if the metadata
handler determines that the array does not need resyncing or other modification
.TP
2/
Transition the array to active if the metadata handler determines a resync or
some other manipulation is necessary
.TP
3/
Leave the array read\-only if the volume is marked to not be monitored; for
example, the metadata version has been set to "external:\-dev/md127" instead of
"external:/dev/md127"
.RE
.TP
.B sync_action \- resync\-to\-idle
Notify the metadata handler that a resync may have completed. If a resync
process is idled before it completes this event allows the metadata handler to
checkpoint resync.
.TP
.B sync_action \- recover\-to\-idle
A spare may have completed rebuilding so tell the metadata handler about the
state of each disk. This is the metadata handler's opportunity to clear
any "out-of-sync" bits and clear the volume's degraded status. If a recovery
process is idled before it completes this event allows the metadata handler to
checkpoint recovery.
.TP
.B <disk>/state \- faulty
A disk failure kicks off a series of events. First, notify the metadata
handler that a disk has failed, and then notify the kernel that it can unblock
writes that were dependent on this disk. After unblocking the kernel this disk
is set to be removed+ from the member array. Finally the disk is marked failed
in all other member arrays in the container.
.IP
+ Note: this behavior differs slightly from native MD arrays, where
removal is reserved for a
.B mdadm \-\-remove
event. In the external metadata case the container holds the final
reference on a block device and a
.B mdadm \-\-remove <container> <victim>
call is still required.
.RE
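The table of responses above can be restated as a lookup from the
array_state string to an action. This is a hypothetical sketch, not
.I mdmon
code, and the enum names are invented here. The strings assume the
kernel's spellings; in particular, the read-only state appears in
sysfs as readonly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the responses listed above. */
enum mon_action {
    ACT_NONE,                 /* no special handling */
    ACT_CLEAN_AND_STOP,       /* inactive: clear dirty, let it stop */
    ACT_DIRTY_THEN_ACTIVE,    /* write-pending: set dirty, then "active" */
    ACT_MARK_CLEAN,           /* active-idle or clean: mark it clean */
    ACT_DECIDE_INITIAL        /* readonly: read-auto, active, or leave */
};

enum mon_action action_for_state(const char *state)
{
    static const struct {
        const char *state;
        enum mon_action act;
    } table[] = {
        { "inactive",      ACT_CLEAN_AND_STOP },
        { "write-pending", ACT_DIRTY_THEN_ACTIVE },
        { "active-idle",   ACT_MARK_CLEAN },
        { "clean",         ACT_MARK_CLEAN },
        { "readonly",      ACT_DECIDE_INITIAL },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(state, table[i].state) == 0)
            return table[i].act;
    return ACT_NONE;
}
```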
.SS Containers:
.P
External metadata formats, like DDF, differ from the native MD metadata
formats in that they define a set of disks and a series of sub-arrays
within those disks. MD metadata in comparison defines a 1:1
relationship between a set of block devices and a RAID array. For
example to create 2 arrays at different RAID levels on a single
set of disks, MD metadata requires the disks be partitioned and then
each array can be created with a subset of those partitions. The
supported external formats perform this disk carving internally.
.P
Container devices simply hold references to all member disks and allow
tools like
.I mdmon
to determine which active arrays belong to which
container. Some array management commands like disk removal and disk
add are now only valid at the container level. Attempts to perform
these actions on member arrays are blocked with error messages like:
.IP
"mdadm: Cannot remove disks from a 'member' array, perform this
operation on the parent container"
.P
Containers are identified in /proc/mdstat with a metadata version string
"external:<metadata name>". Member devices are identified by
"external:/<container device>/<member index>", or "external:-<container
device>/<member index>" if the array is to remain readonly.
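Telling a container from a member array by its metadata version
string, as described above, can be sketched in C. The function and
enum names are hypothetical. Arrays whose version string does not
start with external: are simply reported as MD_NATIVE.

```c
#include <string.h>

enum md_kind { MD_NATIVE, MD_CONTAINER, MD_MEMBER, MD_MEMBER_RO };

/* Classify a metadata version string of the forms described above. */
enum md_kind classify_metadata(const char *v)
{
    const char *prefix = "external:";
    size_t plen = strlen(prefix);

    if (strncmp(v, prefix, plen) != 0)
        return MD_NATIVE;
    if (v[plen] == '/')
        return MD_MEMBER;         /* external:/<container>/<index> */
    if (v[plen] == '-')
        return MD_MEMBER_RO;      /* external:-<container>/<index> */
    return MD_CONTAINER;          /* external:<metadata name> */
}
```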
.SH OPTIONS
.TP
CONTAINER
The
.B container
device to monitor. It can be a full path like /dev/md/container, or a
simple md device name like md127.
.TP
.B \-\-foreground
Normally,
.I mdmon
will fork and continue in the background. Adding this option will
skip that step and run
.I mdmon
in the foreground.
.TP
.B \-\-takeover
This instructs
.I mdmon
to replace any active
.I mdmon
which is currently monitoring the array. This is primarily used late
in the boot process to replace any
.I mdmon
which was started from an
.B initramfs
before the root filesystem was mounted. This avoids holding a
reference on that
.B initramfs
indefinitely and ensures that the
.I pid
and
.I sock
files used to communicate with
.I mdmon
are in a standard place.
.TP
.B \-\-all
This tells mdmon to find any active containers and start monitoring
each of them if appropriate. This is normally used with
.B \-\-takeover
late in the boot sequence.
A separate
.I mdmon
process is started for each container, and the
.B \-\-all
argument is overwritten with the name of that container so that the
processes can be told apart in
.BR ps (1)
output. To allow for containers with names longer than 5 characters,
this argument can be arbitrarily extended, e.g. to
.BR \-\-all-active-arrays .
.PP
Note that
.I mdmon
is automatically started by
.I mdadm
when needed and so does not need to be considered when working with
RAID arrays. The only time it is run other than by
.I mdadm
is when the boot scripts need to restart it after mounting the new
root filesystem.
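The --all rewrite relies on overwriting an argument string in place, which is only safe when the new text fits in the bytes the original argument occupied; main() in mdmon.c checks exactly that before calling sprintf(). A minimal C sketch of the same idea, where overwrite_arg() is a hypothetical helper and the caller records the original length once, because later rewrites may leave a shorter string in the buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite the argument string 'arg' with 'name'
 * in place, but only if 'name' fits in 'room' bytes (the length of the
 * original argument, excluding its terminating NUL).
 * Returns 1 if the argument was rewritten, 0 if it was left untouched. */
static int overwrite_arg(char *arg, size_t room, const char *name)
{
	size_t len = strlen(name);

	if (len > room)
		return 0;
	memset(arg, 0, room);	/* clear every byte of the old text */
	memcpy(arg, name, len);	/* the zeroed tail terminates the string */
	return 1;
}
```

Because the original terminating NUL is never touched, the buffer stays a valid C string whichever container name was written last, and a name that is too long leaves the previous text in place.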
.SH START UP AND SHUTDOWN
As
.I mdmon
needs to be running whenever any filesystem on the monitored device is
mounted, there are special considerations when the root filesystem is
mounted from an
.I mdmon
monitored device.
Note that in general
.I mdmon
is needed even if the filesystem is mounted read-only, as some
filesystems can still write to the device in those circumstances, for
example to replay a journal after an unclean shutdown.
When the array is assembled by the
.B initramfs
code, mdadm will automatically start
.I mdmon
as required. This means that
.I mdmon
must be installed on the
.B initramfs
and there must be a writable filesystem (typically tmpfs) in which
.B mdmon
can create a
.B .pid
and
.B .sock
file. The particular filesystem to use is given to mdmon at compile
time and defaults to
.BR /run/mdadm .
This filesystem must persist through to shutdown time.
After the final root filesystem has been instantiated (usually with
.BR pivot_root )
.I mdmon
should be run with
.I "\-\-all \-\-takeover"
so that the
.I mdmon
running from the
.B initramfs
can be replaced with one running in the main root, and so the
memory used by the initramfs can be released.
At shutdown time,
.I mdmon
should not be killed along with other processes. Also, as it holds a
file (actually a socket) open in that directory
.RB ( /run/mdadm
by default), it will not be possible to unmount that directory's
filesystem if it is mounted separately.
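As a sketch of where the .pid and .sock files end up, assuming the compile-time default directory /run/mdadm mentioned above: mdmon_path() is a hypothetical helper. mdmon.c itself builds these names with sprintf() into 100-byte buffers (see make_pidfile() and make_control_sock()); this version uses snprintf() and reports truncation instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the compile-time directory defaults to /run/mdadm. */
#ifndef MDMON_DIR
#define MDMON_DIR "/run/mdadm"
#endif

/* Hypothetical helper: build "<MDMON_DIR>/<devnm>.<suffix>", e.g. the
 * "pid" or "sock" file for a container. Returns 0 on success, or -1 if
 * snprintf() failed or the result would not fit in 'len' bytes. */
static int mdmon_path(char *buf, size_t len, const char *devnm,
		      const char *suffix)
{
	int n = snprintf(buf, len, "%s/%s.%s", MDMON_DIR, devnm, suffix);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The socket path must also fit in the sun_path field of struct sockaddr_un, which make_control_sock() fills with strcpy(); on Linux that field is 108 bytes, so mdmon's 100-byte buffer stays within it.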
.SH EXAMPLES
.B " mdmon \-\-all-active-arrays \-\-takeover"
.br
Any
.I mdmon
which is currently running is killed and a new instance is started.
This should be run during the boot sequence if an initramfs was
used, so that any mdmon running from the initramfs will not hold
the initramfs active.
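Before signalling an existing instance during a takeover, mdmon reads /proc/<pid>/cmdline and only proceeds if argv[0] contains "mdmon" or "@dmon" (see try_kill_monitor() in mdmon.c). The second spelling exists because an instance started from an initramfs overwrites the first character of argv[0] with @. A standalone C restatement, using the hypothetical name looks_like_mdmon(), which copies argv[0] into a bounded buffer so the string search cannot run past the data read:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 'cmdline' holds 'len' raw bytes from
 * /proc/<pid>/cmdline, where arguments are separated by NUL bytes.
 * Returns nonzero if argv[0] looks like an mdmon process. */
static int looks_like_mdmon(const char *cmdline, size_t len)
{
	char argv0[100];
	const char *nul = memchr(cmdline, '\0', len);
	size_t n = nul ? (size_t)(nul - cmdline) : len;

	if (n >= sizeof(argv0))
		n = sizeof(argv0) - 1;	/* same 100-byte limit as mdmon.c */
	memcpy(argv0, cmdline, n);
	argv0[n] = '\0';
	return strstr(argv0, "mdmon") != NULL || strstr(argv0, "@dmon") != NULL;
}
```

Only argv[0] is inspected, so a process whose later arguments mention mdmon is not matched. A full path such as /sbin/mdmon still passes after the rewrite, since it becomes @sbin/mdmon.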
.SH SEE ALSO
.IR mdadm (8),
.IR md (4).

mdmon.c
@ -0,0 +1,594 @@
/*
* mdmon - monitor external metadata arrays
*
* Copyright (C) 2007-2009 Neil Brown <neilb@suse.de>
* Copyright (C) 2007-2009 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
*/
/*
* md array manager.
* When md arrays have user-space managed metadata, this is the program
* that does the managing.
*
* Given one argument: the name of the array (e.g. /dev/md0) that is
* the container.
* We fork off a helper that runs high priority and mlocked. It responds to
* device failures and other events that might stop writeout, or that are
* trivial to deal with.
* The main thread then watches for new arrays being created in the container
* and starts monitoring them too ... along with a few other tasks.
*
* The main thread communicates with the priority thread by writing over
* a pipe.
* Separate programs can communicate with the main thread via Unix-domain
* socket.
* The two threads share address space and open file table.
*
*/
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <signal.h>
#include <dirent.h>
#ifdef USE_PTHREADS
#include <pthread.h>
#else
#include <sched.h>
#endif
#include "mdadm.h"
#include "mdmon.h"
char const Name[] = "mdmon";
struct active_array *discard_this;
struct active_array *pending_discard;
int mon_tid, mgr_tid;
int sigterm;
#ifdef USE_PTHREADS
static void *run_child(void *v)
{
struct supertype *c = v;
mon_tid = syscall(SYS_gettid);
do_monitor(c);
return 0;
}
static int clone_monitor(struct supertype *container)
{
pthread_attr_t attr;
pthread_t thread;
int rc;
mon_tid = -1;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 4096);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
rc = pthread_create(&thread, &attr, run_child, container);
if (rc)
return rc;
while (mon_tid == -1)
usleep(10);
pthread_attr_destroy(&attr);
mgr_tid = syscall(SYS_gettid);
return mon_tid;
}
#else /* USE_PTHREADS */
static int run_child(void *v)
{
struct supertype *c = v;
do_monitor(c);
return 0;
}
#ifdef __ia64__
int __clone2(int (*fn)(void *),
void *child_stack_base, size_t stack_size,
int flags, void *arg, ...
/* pid_t *pid, struct user_desc *tls, pid_t *ctid */ );
#endif
static int clone_monitor(struct supertype *container)
{
static char stack[4096];
#ifdef __ia64__
mon_tid = __clone2(run_child, stack, sizeof(stack),
CLONE_FS|CLONE_FILES|CLONE_VM|CLONE_SIGHAND|CLONE_THREAD,
container);
#else
mon_tid = clone(run_child, stack+4096-64,
CLONE_FS|CLONE_FILES|CLONE_VM|CLONE_SIGHAND|CLONE_THREAD,
container);
#endif
mgr_tid = syscall(SYS_gettid);
return mon_tid;
}
#endif /* USE_PTHREADS */
static int make_pidfile(char *devname)
{
char path[100];
char pid[10];
int fd;
int n;
if (mkdir(MDMON_DIR, 0755) < 0 &&
errno != EEXIST)
return -errno;
sprintf(path, "%s/%s.pid", MDMON_DIR, devname);
fd = open(path, O_RDWR|O_CREAT|O_EXCL, 0600);
if (fd < 0)
return -errno;
sprintf(pid, "%d\n", getpid());
n = write(fd, pid, strlen(pid));
close(fd);
if (n < 0)
return -errno;
return 0;
}
static void try_kill_monitor(pid_t pid, char *devname, int sock)
{
char buf[100];
int fd;
int n;
long fl;
int rv;
/* first rule of survival... don't off yourself */
if (pid == getpid())
return;
/* kill this process if it is mdmon */
sprintf(buf, "/proc/%lu/cmdline", (unsigned long) pid);
fd = open(buf, O_RDONLY);
if (fd < 0)
return;
n = read(fd, buf, sizeof(buf)-1);
buf[sizeof(buf)-1] = 0;
close(fd);
if (n < 0 || !(strstr(buf, "mdmon") ||
strstr(buf, "@dmon")))
return;
kill(pid, SIGTERM);
if (sock < 0)
return;
/* Wait for monitor to exit by reading from the socket, after
* clearing the non-blocking flag */
fl = fcntl(sock, F_GETFL, 0);
fl &= ~O_NONBLOCK;
fcntl(sock, F_SETFL, fl);
n = read(sock, buf, 100);
/* If there is I/O going on it might take some time to get to a
* clean state. Wait for the monitor to exit fully to avoid races.
* Ping it with SIGUSR1 in case it is sleeping */
for (n = 0; n < 25; n++) {
rv = kill(pid, SIGUSR1);
if (rv < 0)
break;
usleep(200000);
}
}
void remove_pidfile(char *devname)
{
char buf[100];
sprintf(buf, "%s/%s.pid", MDMON_DIR, devname);
unlink(buf);
sprintf(buf, "%s/%s.sock", MDMON_DIR, devname);
unlink(buf);
}
static int make_control_sock(char *devname)
{
char path[100];
int sfd;
long fl;
struct sockaddr_un addr;
if (sigterm)
return -1;
sprintf(path, "%s/%s.sock", MDMON_DIR, devname);
unlink(path);
sfd = socket(PF_LOCAL, SOCK_STREAM, 0);
if (sfd < 0)
return -1;
addr.sun_family = PF_LOCAL;
strcpy(addr.sun_path, path);
umask(077); /* ensure no world write access */
if (bind(sfd, (struct sockaddr*)&addr, sizeof(addr)) < 0) {
close(sfd);
return -1;
}
listen(sfd, 10);
fl = fcntl(sfd, F_GETFL, 0);
fl |= O_NONBLOCK;
fcntl(sfd, F_SETFL, fl);
return sfd;
}
static void term(int sig)
{
sigterm = 1;
}
static void wake_me(int sig)
{
}
/* if we are debugging and starting mdmon by hand then don't fork */
static int do_fork(void)
{
#ifdef DEBUG
if (check_env("MDADM_NO_MDMON"))
return 0;
#endif
return 1;
}
void usage(void)
{
fprintf(stderr,
"Usage: mdmon [options] CONTAINER\n"
"\n"
"Options are:\n"
" --help -h : This message\n"
" --all -a : All devices\n"
" --foreground -F : Run in foreground (do not fork)\n"
" --takeover -t : Takeover container\n"
);
exit(2);
}
static int mdmon(char *devnm, int must_fork, int takeover);
int main(int argc, char *argv[])
{
char *container_name = NULL;
char *devnm = NULL;
int status = 0;
int opt;
int all = 0;
int takeover = 0;
int dofork = 1;
static struct option options[] = {
{"all", 0, NULL, 'a'},
{"takeover", 0, NULL, 't'},
{"help", 0, NULL, 'h'},
{"offroot", 0, NULL, OffRootOpt},
{"foreground", 0, NULL, 'F'},
{NULL, 0, NULL, 0}
};
if (in_initrd()) {
/*
* set first char of argv[0] to @. This is used by
* systemd to signal that the task was launched from
* initrd/initramfs and should be preserved during shutdown
*/
argv[0][0] = '@';
}
while ((opt = getopt_long(argc, argv, "thaF", options, NULL)) != -1) {
switch (opt) {
case 'a':
container_name = argv[optind-1];
all = 1;
break;
case 't':
takeover = 1;
break;
case 'F':
dofork = 0;
break;
case OffRootOpt:
argv[0][0] = '@';
break;
case 'h':
default:
usage();
break;
}
}
if (all == 0 && container_name == NULL) {
if (argv[optind])
container_name = argv[optind];
}
if (container_name == NULL)
usage();
if (argc - optind > 1)
usage();
if (strcmp(container_name, "/proc/mdstat") == 0)
all = 1;
if (all) {
struct mdstat_ent *mdstat, *e;
int container_len = strlen(container_name);
/* launch an mdmon instance for each container found */
mdstat = mdstat_read(0, 0);
for (e = mdstat; e; e = e->next) {
if (e->metadata_version &&
strncmp(e->metadata_version, "external:", 9) == 0 &&
!is_subarray(&e->metadata_version[9])) {
/* update cmdline so this mdmon instance can be
* distinguished from others in a call to ps(1)
*/
if (strlen(e->devnm) <= (unsigned)container_len) {
memset(container_name, 0, container_len);
sprintf(container_name, "%s", e->devnm);
}
status |= mdmon(e->devnm, 1, takeover);
}
}
free_mdstat(mdstat);
return status;
} else if (strncmp(container_name, "md", 2) == 0) {
int id = devnm2devid(container_name);
if (id)
devnm = container_name;
} else {
struct stat st;
if (stat(container_name, &st) == 0)
devnm = xstrdup(stat2devnm(&st));
}
if (!devnm) {
pr_err("%s is not a valid md device name\n",
container_name);
exit(1);
}
return mdmon(devnm, dofork && do_fork(), takeover);
}
static int mdmon(char *devnm, int must_fork, int takeover)
{
int mdfd;
struct mdinfo *mdi, *di;
struct supertype *container;
sigset_t set;
struct sigaction act;
int pfd[2];
int status;
int ignore;
pid_t victim = -1;
int victim_sock = -1;
dprintf("starting mdmon for %s\n", devnm);
mdfd = open_dev(devnm);
if (mdfd < 0) {
pr_err("%s: %s\n", devnm, strerror(errno));
return 1;
}
/* Fork, and have the child tell us when they are ready */
if (must_fork) {
if (pipe(pfd) != 0) {
pr_err("failed to create pipe\n");
return 1;
}
switch(fork()) {
case -1:
pr_err("failed to fork: %s\n", strerror(errno));
return 1;
case 0: /* child */
close(pfd[0]);
break;
default: /* parent */
close(pfd[1]);
if (read(pfd[0], &status, sizeof(status)) != sizeof(status)) {
wait(&status);
status = WEXITSTATUS(status);
}
close(pfd[0]);
return status;
}
} else
pfd[0] = pfd[1] = -1;
container = xcalloc(1, sizeof(*container));
strcpy(container->devnm, devnm);
container->arrays = NULL;
container->sock = -1;
mdi = sysfs_read(mdfd, container->devnm, GET_VERSION|GET_LEVEL|GET_DEVS);
if (!mdi) {
pr_err("failed to load sysfs info for %s\n", container->devnm);
exit(3);
}
if (mdi->array.level != UnSet) {
pr_err("%s is not a container - cannot monitor\n", devnm);
exit(3);
}
if (mdi->array.major_version != -1 ||
mdi->array.minor_version != -2) {
pr_err("%s does not use external metadata - cannot monitor\n",
devnm);
exit(3);
}
container->ss = version_to_superswitch(mdi->text_version);
if (container->ss == NULL) {
pr_err("%s uses unsupported metadata: %s\n",
devnm, mdi->text_version);
exit(3);
}
container->devs = NULL;
for (di = mdi->devs; di; di = di->next) {
struct mdinfo *cd = xmalloc(sizeof(*cd));
*cd = *di;
cd->next = container->devs;
container->devs = cd;
}
sysfs_free(mdi);
/* SIGUSR1 is sent between parent and child, so both block it
* and enable it only with pselect.
*/
sigemptyset(&set);
sigaddset(&set, SIGUSR1);
sigaddset(&set, SIGTERM);
sigprocmask(SIG_BLOCK, &set, NULL);
act.sa_handler = wake_me;
act.sa_flags = 0;
sigaction(SIGUSR1, &act, NULL);
act.sa_handler = term;
sigaction(SIGTERM, &act, NULL);
act.sa_handler = SIG_IGN;
sigaction(SIGPIPE, &act, NULL);
victim = mdmon_pid(container->devnm);
if (victim >= 0)
victim_sock = connect_monitor(container->devnm);
ignore = chdir("/");
if (!takeover && victim > 0 && victim_sock >= 0) {
if (fping_monitor(victim_sock) == 0) {
pr_err("%s already managed\n", container->devnm);
exit(3);
}
close(victim_sock);
victim_sock = -1;
}
if (container->ss->load_container(container, mdfd, devnm)) {
pr_err("Cannot load metadata for %s\n", devnm);
exit(3);
}
close(mdfd);
/* Ok, this is close enough. We can say goodbye to our parent now.
*/
if (victim > 0)
remove_pidfile(devnm);
if (make_pidfile(devnm) < 0) {
exit(3);
}
container->sock = make_control_sock(devnm);
status = 0;
if (pfd[1] >= 0) {
if (write(pfd[1], &status, sizeof(status)) < 0)
pr_err("failed to notify our parent: %d\n",
getppid());
close(pfd[1]);
}
mlockall(MCL_CURRENT | MCL_FUTURE);
if (clone_monitor(container) < 0) {
pr_err("failed to start monitor process: %s\n",
strerror(errno));
exit(2);
}
if (victim > 0) {
try_kill_monitor(victim, container->devnm, victim_sock);
if (victim_sock >= 0)
close(victim_sock);
}
setsid();
manage_fork_fds(0);
/* This silliness is to stop the compiler complaining
* that we ignore 'ignore'
*/
if (ignore)
ignore++;
do_manager(container);
exit(0);
}
/* Some stub functions so super-* can link with us */
int child_monitor(int afd, struct mdinfo *sra, struct reshape *reshape,
struct supertype *st, unsigned long blocks,
int *fds, unsigned long long *offsets,
int dests, int *destfd, unsigned long long *destoffsets)
{
return 0;
}
int restore_stripes(int *dest, unsigned long long *offsets,
int raid_disks, int chunk_size, int level, int layout,
int source, unsigned long long read_offset,
unsigned long long start, unsigned long long length,
char *src_buf)
{
return 1;
}
int save_stripes(int *source, unsigned long long *offsets,
int raid_disks, int chunk_size, int level, int layout,
int nwrites, int *dest,
unsigned long long start, unsigned long long length,
char *buf)
{
return 0;
}
struct superswitch super0 = {
.name = "0.90",
};
struct superswitch super1 = {
.name = "1.x",
};
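mdmon() above uses a pipe so that the parent does not return until the forked child has loaded the container metadata and created its pid file, or has failed trying. The following C sketch isolates that handshake; fork_and_wait_ready() and the setup callbacks are hypothetical, and it calls waitpid() on the specific child where mdmon() calls wait().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical readiness handshake: the parent forks, then blocks until
 * the child either writes a status word down the pipe (ready) or exits
 * without writing one, in which case the child's exit code is returned.
 * 'child_setup' stands in for the real setup work; 0 means success. */
static int fork_and_wait_ready(int (*child_setup)(void))
{
	int pfd[2];
	int status = 0;
	pid_t pid;

	if (pipe(pfd) != 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}
	if (pid == 0) {				/* child */
		close(pfd[0]);
		status = child_setup();
		if (status != 0)
			_exit(status);		/* parent sees EOF, then reaps us */
		if (write(pfd[1], &status, sizeof(status)) != (ssize_t)sizeof(status))
			_exit(1);
		close(pfd[1]);
		/* A real daemon would carry on with its main loop here. */
		_exit(0);
	}
	close(pfd[1]);				/* parent keeps only the read end */
	if (read(pfd[0], &status, sizeof(status)) != (ssize_t)sizeof(status)) {
		/* EOF before a full status word: the child gave up early. */
		if (waitpid(pid, &status, 0) < 0)
			status = -1;
		else
			status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
	}
	close(pfd[0]);
	return status;
}

/* Example setup callbacks, for illustration only. */
static int setup_ok(void) { return 0; }
static int setup_fails(void) { return 3; }
```

In the success case the sketch's child exits straight away and remains a zombie until the caller exits; in mdmon the child instead goes on to run as the monitoring daemon.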

mdmon.h
@ -0,0 +1,111 @@
/*
* mdmon - monitor external metadata arrays
*
* Copyright (C) 2007-2009 Neil Brown <neilb@suse.de>
* Copyright (C) 2007-2009 Intel Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
* version 2, as published by the Free Software Foundation.
*
* This program is distributed in the hope it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
*/
extern const char Name[];
enum array_state { clear, inactive, suspended, readonly, read_auto,
clean, active, write_pending, active_idle, broken, bad_word};
enum sync_action { idle, reshape, resync, recover, check, repair, bad_action };
struct active_array {
struct mdinfo info;
struct supertype *container;
struct active_array *next, *replaces;
int to_remove;
int action_fd;
int resync_start_fd;
int metadata_fd; /* for monitoring rw/ro status */
int sync_completed_fd; /* for checkpoint notification events */
int safe_mode_delay_fd;
unsigned long long last_checkpoint; /* sync_completed fires for many
* reasons; this field makes sure the
* kernel has made progress before
* moving the checkpoint. It is
* cleared by the metadata handler
* when it determines recovery is
* terminated.
*/
enum array_state prev_state, curr_state, next_state;
enum sync_action prev_action, curr_action, next_action;
int check_degraded; /* flag set by mon, read by manage */
int check_reshape; /* flag set by mon, read by manage */
};
/*
* Metadata updates are handled by the monitor thread,
* as it has exclusive access to the metadata.
* When the manager wants to update metadata, either
* for its own reasons (e.g. committing a spare) or
* on behalf of mdadm, it creates a metadata_update
* structure and queues it to the monitor.
* Updates are created and processed by code under the
* superswitch. All common code sees them as opaque
* blobs.
*/
extern struct metadata_update *update_queue, *update_queue_handled;
#define MD_MAJOR 9
extern struct active_array *container;
extern struct active_array *discard_this;
extern struct active_array *pending_discard;
extern struct md_generic_cmd *active_cmd;
void remove_pidfile(char *devname);
void do_monitor(struct supertype *container);
void do_manager(struct supertype *container);
extern int sigterm;
int read_dev_state(int fd);
int is_container_member(struct mdstat_ent *mdstat, char *container);
struct mdstat_ent *mdstat_read(int hold, int start);
extern int exit_now, manager_ready;
extern int mon_tid, mgr_tid;
extern int monitor_loop_cnt;
/* helper routine to determine resync completion since MaxSector is a
* moving target
*/
static inline int is_resync_complete(struct mdinfo *array)
{
unsigned long long sync_size = 0;
int ncopies, l;
switch(array->array.level) {
case 1:
case 4:
case 5:
case 6:
sync_size = array->component_size;
break;
case 10:
l = array->array.layout;
ncopies = (l & 0xff) * ((l >> 8) & 0xff);
sync_size = array->component_size * array->array.raid_disks;
sync_size /= ncopies;
break;
}
return array->resync_start >= sync_size;
}
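For RAID10, is_resync_complete() above derives the number of copies from the layout word by multiplying its low byte by its second byte (these are the near and far copy counts of md's RAID10 layout, stated here as background rather than taken from this file). A worked C restatement, with the hypothetical name raid10_sync_size() and an added guard against a zero copy count, which the original would divide by:

```c
#include <assert.h>

/* Standalone restatement of the RAID10 branch of is_resync_complete().
 * The ncopies == 0 guard is an addition; it returns 0 instead of
 * dividing by zero for a malformed layout. */
static unsigned long long raid10_sync_size(unsigned long long component_size,
					   int raid_disks, int layout)
{
	/* low byte: near copies; second byte: far copies */
	int ncopies = (layout & 0xff) * ((layout >> 8) & 0xff);

	if (ncopies == 0)
		return 0;
	return component_size * raid_disks / ncopies;
}
```

For example, layout 0x102 (2 near copies, 1 far copy) with 4 devices and a component size of 1000 gives 1000 * 4 / 2 = 2000; layout 0x201 (1 near, 2 far) yields the same total.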

mdopen.c
@ -0,0 +1,509 @@
/*
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2001-2013 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
#include "mdadm.h"
#include "md_p.h"
#include <ctype.h>
void make_parts(char *dev, int cnt)
{
/* make 'cnt' partition devices for 'dev'
* If dev is a device name we use the
* major/minor from dev and add 1..cnt
* If it is a symlink, we make similar symlinks.
* If dev ends with a digit, we add "p%d" else "%d"
* If the name exists, we use its owner/mode,
* else that of dev
*/
struct stat stb;
int major_num;
int minor_num;
int odig;
int i;
int nlen = strlen(dev) + 20;
char *name;
int dig = isdigit(dev[strlen(dev)-1]);
char orig[1001];
char sym[1024];
int err;
if (cnt == 0)
cnt = 4;
if (lstat(dev, &stb)!= 0)
return;
if (S_ISBLK(stb.st_mode)) {
major_num = major(stb.st_rdev);
minor_num = minor(stb.st_rdev);
odig = -1;
} else if (S_ISLNK(stb.st_mode)) {
int len;
len = readlink(dev, orig, sizeof(orig));
if (len < 0 || len >= (int)sizeof(orig))
return;
orig[len] = 0;
odig = isdigit(orig[len-1]);
major_num = -1;
minor_num = -1;
} else
return;
name = xmalloc(nlen);
for (i = 1; i <= cnt ; i++) {
struct stat stb2;
snprintf(name, nlen, "%s%s%d", dev, dig?"p":"", i);
if (stat(name, &stb2) == 0) {
if (!S_ISBLK(stb2.st_mode) || !S_ISBLK(stb.st_mode))
continue;
if (stb2.st_rdev == makedev(major_num, minor_num+i))
continue;
unlink(name);
} else {
stb2 = stb;
}
if (S_ISBLK(stb.st_mode)) {
if (mknod(name, S_IFBLK | 0600,
makedev(major_num, minor_num+i)))
perror("mknod");
if (chown(name, stb2.st_uid, stb2.st_gid))
perror("chown");
if (chmod(name, stb2.st_mode & 07777))
perror("chmod");
err = 0;
} else {
snprintf(sym, sizeof(sym), "%s%s%d", orig, odig?"p":"", i);
err = symlink(sym, name);
}
if (err == 0 && stat(name, &stb2) == 0)
add_dev(name, &stb2, 0, NULL);
}
free(name);
}
int create_named_array(char *devnm)
{
int fd;
int n = -1;
static const char new_array_file[] = {
"/sys/module/md_mod/parameters/new_array"
};
fd = open(new_array_file, O_WRONLY);
if (fd < 0 && errno == ENOENT) {
if (system("modprobe md_mod") == 0)
fd = open(new_array_file, O_WRONLY);
}
if (fd >= 0) {
n = write(fd, devnm, strlen(devnm));
close(fd);
}
if (fd < 0 || n != (int)strlen(devnm)) {
pr_err("Fail to create %s when using %s, fallback to creation via node\n",
devnm, new_array_file);
return 0;
}
return 1;
}
/*
* We need a new md device to assemble/build/create an array.
* 'dev' is a name given us by the user (command line or mdadm.conf)
* It might start with /dev or /dev/md and might end with a digit
* string.
* If it starts with just /dev, it must be /dev/mdX or /dev/md_dX
* If it ends with a digit string, then it must be as above, or
* 'trustworthy' must be 'METADATA' and the 'dev' must be
* /dev/md/'name'NN or 'name'NN
* If it doesn't end with a digit string, it must be /dev/md/'name'
* or 'name' or must be NULL.
* If the digit string is present, it gives the minor number to use
* If not, we choose a high, unused minor number.
* If the 'dev' is a standard name, it decides whether 'md' or 'mdp'.
* else if the name is 'd[0-9]+' then we use mdp
* else if trustworthy is 'METADATA' we use md
* else the choice depends on 'autof'.
* If name is NULL it is assumed to match whatever dev provides.
* If both name and dev are NULL, we choose a name 'mdXX' or 'mdpXX'
*
* If 'name' is given, and 'trustworthy' is 'foreign' and name is not
* supported by 'dev', we add a "_%d" suffix based on the minor number
* and use that.
*
* If udev is configured, we create a temporary device, open it, and
* unlink it.
* If not, we create the /dev/mdXX device, and if name is usable,
* /dev/md/name
* In any case we return /dev/md/name or (if that isn't available)
* /dev/mdXX in 'chosen'.
*
* When we create devices, we use uid/gid/umask from config file.
*/
int create_mddev(char *dev, char *name, int autof, int trustworthy,
char *chosen, int block_udev)
{
int mdfd;
struct stat stb;
int num = -1;
int use_mdp = -1;
struct createinfo *ci = conf_get_create_info();
int parts;
char *cname;
char devname[37];
char devnm[32];
char cbuf[400];
if (!use_udev())
block_udev = 0;
if (chosen == NULL)
chosen = cbuf;
if (autof == 0)
autof = ci->autof;
parts = autof >> 3;
autof &= 7;
strcpy(chosen, "/dev/md/");
cname = chosen + strlen(chosen);
if (dev) {
if (strncmp(dev, "/dev/md/", 8) == 0) {
strcpy(cname, dev+8);
} else if (strncmp(dev, "/dev/", 5) == 0) {
char *e = dev + strlen(dev);
while (e > dev && isdigit(e[-1]))
e--;
if (e[0])
num = strtoul(e, NULL, 10);
strcpy(cname, dev+5);
cname[e-(dev+5)] = 0;
/* name *must* be mdXX or md_dXX in this context */
if (num < 0 ||
(strcmp(cname, "md") != 0 && strcmp(cname, "md_d") != 0)) {
pr_err("%s is an invalid name for an md device. Try /dev/md/%s\n",
dev, dev+5);
return -1;
}
if (strcmp(cname, "md") == 0)
use_mdp = 0;
else
use_mdp = 1;
/* recreate name: /dev/md/0 or /dev/md/d0 */
sprintf(cname, "%s%d", use_mdp?"d":"", num);
} else
strcpy(cname, dev);
/* 'cname' must not contain a slash, and may not be
* empty.
*/
if (strchr(cname, '/') != NULL) {
pr_err("%s is an invalid name for an md device.\n", dev);
return -1;
}
if (cname[0] == 0) {
pr_err("%s is an invalid name for an md device (empty!).\n", dev);
return -1;
}
if (num < 0) {
/* If cname is 'N' or 'dN', we get dev number
* from there.
*/
char *sp = cname;
char *ep;
if (cname[0] == 'd')
sp++;
if (isdigit(sp[0]))
num = strtoul(sp, &ep, 10);
else
ep = sp;
if (ep == sp || *ep || num < 0)
num = -1;
else if (cname[0] == 'd')
use_mdp = 1;
else
use_mdp = 0;
}
}
/* Now determine device number */
/* named 'METADATA' cannot use 'mdp'. */
if (name && name[0] == 0)
name = NULL;
if (name && trustworthy == METADATA && use_mdp == 1) {
pr_err("%s is not allowed for a %s container. Consider /dev/md%d.\n", dev, name, num);
return -1;
}
if (name && trustworthy == METADATA)
use_mdp = 0;
if (use_mdp == -1) {
if (autof == 4 || autof == 6)
use_mdp = 1;
else
use_mdp = 0;
}
if (num < 0 && trustworthy == LOCAL && name) {
/* if name is numeric, possibly prefixed by
* 'md' or '/dev/md', use that for num
* if it is not already in use */
char *ep;
char *n2 = name;
if (strncmp(n2, "/dev/", 5) == 0)
n2 += 5;
if (strncmp(n2, "md", 2) == 0)
n2 += 2;
if (*n2 == '/')
n2++;
num = strtoul(n2, &ep, 10);
if (ep == n2 || *ep)
num = -1;
else {
sprintf(devnm, "md%s%d", use_mdp ? "_d":"", num);
if (mddev_busy(devnm))
num = -1;
}
}
if (cname[0] == 0 && name) {
/* Need to find a name if we can
* We don't completely trust 'name'. Truncate to
* reasonable length and remove '/'
*/
char *cp;
struct map_ent *map = NULL;
int conflict = 1;
int unum = 0;
int cnlen;
strncpy(cname, name, 200);
cname[200] = 0;
for (cp = cname; *cp ; cp++)
switch (*cp) {
case '/':
*cp = '-';
break;
case ' ':
case '\t':
*cp = '_';
break;
}
if (trustworthy == LOCAL ||
(trustworthy == FOREIGN && strchr(cname, ':') != NULL)) {
/* Only need suffix if there is a conflict */
if (map_by_name(&map, cname) == NULL)
conflict = 0;
}
cnlen = strlen(cname);
while (conflict) {
if (trustworthy == METADATA && !isdigit(cname[cnlen-1]))
sprintf(cname+cnlen, "%d", unum);
else
/* add _%d to FOREIGN arrays that don't
* have a 'host:' prefix
*/
sprintf(cname+cnlen, "_%d", unum);
unum++;
if (map_by_name(&map, cname) == NULL)
conflict = 0;
}
}
devnm[0] = 0;
if (num < 0 && cname && ci->names) {
sprintf(devnm, "md_%s", cname);
if (block_udev)
udev_block(devnm);
if (!create_named_array(devnm)) {
devnm[0] = 0;
udev_unblock();
}
}
if (num >= 0) {
sprintf(devnm, "md%d", num);
if (block_udev)
udev_block(devnm);
if (!create_named_array(devnm)) {
devnm[0] = 0;
udev_unblock();
}
}
if (devnm[0] == 0) {
if (num < 0) {
/* need to choose a free number. */
char *_devnm = find_free_devnm(use_mdp);
if (_devnm == NULL) {
pr_err("No avail md devices - aborting\n");
return -1;
}
strcpy(devnm, _devnm);
} else {
sprintf(devnm, "%s%d", use_mdp?"md_d":"md", num);
if (mddev_busy(devnm)) {
pr_err("%s is already in use.\n",
dev);
return -1;
}
}
if (block_udev)
udev_block(devnm);
}
sprintf(devname, "/dev/%s", devnm);
if (dev && dev[0] == '/')
strcpy(chosen, dev);
else if (cname[0] == 0)
strcpy(chosen, devname);
/* We have a device number and name.
* If we cannot detect udev, we need to make
* devices and links ourselves.
*/
if (!use_udev()) {
/* Make sure 'devname' exists and 'chosen' is a symlink to it */
if (lstat(devname, &stb) == 0) {
/* Must be the correct device, else error */
if ((stb.st_mode&S_IFMT) != S_IFBLK ||
stb.st_rdev != devnm2devid(devnm)) {
pr_err("%s exists but looks wrong, please fix\n",
devname);
return -1;
}
} else {
if (mknod(devname, S_IFBLK|0600,
devnm2devid(devnm)) != 0) {
pr_err("failed to create %s\n",
devname);
return -1;
}
if (chown(devname, ci->uid, ci->gid))
perror("chown");
if (chmod(devname, ci->mode))
perror("chmod");
stat(devname, &stb);
add_dev(devname, &stb, 0, NULL);
}
if (use_mdp == 1)
make_parts(devname, parts);
if (strcmp(chosen, devname) != 0) {
if (mkdir("/dev/md",0700) == 0) {
if (chown("/dev/md", ci->uid, ci->gid))
perror("chown /dev/md");
if (chmod("/dev/md", ci->mode| ((ci->mode>>2) & 0111)))
perror("chmod /dev/md");
}
if (dev && strcmp(chosen, dev) == 0)
/* We know we are allowed to use this name */
unlink(chosen);
if (lstat(chosen, &stb) == 0) {
char buf[300];
ssize_t link_len = readlink(chosen, buf, sizeof(buf)-1);
if (link_len >= 0)
buf[link_len] = '\0';
if ((stb.st_mode & S_IFMT) != S_IFLNK ||
link_len < 0 ||
strcmp(buf, devname) != 0) {
pr_err("%s exists - ignoring\n",
chosen);
strcpy(chosen, devname);
}
} else if (symlink(devname, chosen) != 0)
pr_err("failed to create %s: %s\n",
chosen, strerror(errno));
if (use_mdp && strcmp(chosen, devname) != 0)
make_parts(chosen, parts);
}
}
mdfd = open_dev_excl(devnm);
if (mdfd < 0)
pr_err("unexpected failure opening %s\n",
devname);
return mdfd;
}
/* Open this and check that it is an md device.
* On success, return a file descriptor.
* On failure, return -1 if it doesn't exist,
* or -2 if it exists but is not an md device.
*/
int open_mddev(char *dev, int report_errors)
{
int mdfd = open(dev, O_RDONLY);
if (mdfd < 0) {
if (report_errors)
pr_err("error opening %s: %s\n",
dev, strerror(errno));
return -1;
}
if (md_array_valid(mdfd) == 0) {
close(mdfd);
if (report_errors)
pr_err("%s does not appear to be an md device\n", dev);
return -2;
}
return mdfd;
}
char *find_free_devnm(int use_partitions)
{
static char devnm[32];
int devnum;
for (devnum = 127; devnum != 128;
devnum = devnum ? devnum-1 : (1<<9)-1) {
if (use_partitions)
sprintf(devnm, "md_d%d", devnum);
else
sprintf(devnm, "md%d", devnum);
if (mddev_busy(devnm))
continue;
if (!conf_name_is_free(devnm))
continue;
if (!use_udev()) {
/* make sure it is new to /dev too, at least as a
* non-standard */
dev_t devid = devnm2devid(devnm);
if (devid) {
char *dn = map_dev(major(devid),
minor(devid), 0);
if (dn && ! is_standard(dn, NULL))
continue;
}
}
break;
}
if (devnum == 128)
return NULL;
return devnm;
}
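find_free_devnm() above searches downward from 127 to 0, then wraps to 511 ((1<<9)-1) and keeps counting down, stopping when it reaches 128, so 128 itself is never offered. This C sketch isolates that loop; first_free_minor() and the example predicates are hypothetical, with busy() standing in for the mddev_busy(), conf_name_is_free() and /dev checks made by the original.

```c
#include <assert.h>

/* Hypothetical restatement of the loop in find_free_devnm(): try 127
 * down to 0, wrap to 511 ((1 << 9) - 1), continue down, and stop on
 * reaching 128, which is therefore never tried. */
static int first_free_minor(int (*busy)(int devnum))
{
	int devnum;

	for (devnum = 127; devnum != 128;
	     devnum = devnum ? devnum - 1 : (1 << 9) - 1) {
		if (!busy(devnum))
			return devnum;
	}
	return -1;	/* 0-127 and 129-511 are all taken */
}

/* Example predicates, for illustration only. */
static int none_busy(int devnum) { (void)devnum; return 0; }
static int low_busy(int devnum) { return devnum <= 127; }
static int all_but_128_busy(int devnum) { return devnum != 128; }
```

With nothing in use the first candidate, 127, is returned; once 0-127 are taken the search continues from 511, and a free 128 is never found.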

mdstat.c
@ -0,0 +1,441 @@
/*
* mdstat - parse /proc/mdstat file. Part of:
* mdadm - manage Linux "md" devices aka RAID arrays.
*
* Copyright (C) 2002-2009 Neil Brown <neilb@suse.de>
*
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* Author: Neil Brown
* Email: <neilb@suse.de>
*/
/*
* The /proc/mdstat file comes in at least 3 flavours:
* In an unpatched 2.2 kernel (md 0.36.6):
* Personalities : [n raidx] ...
* read_ahead {not set|%d sectors}
* md0 : {in}active{ raidX /dev/hda... %d blocks{ maxfault=%d}}
* md1 : .....
*
* Normally only 4 md lines, but all are listed.
*
* In a patched 2.2 kernel (md 0.90.0)
* Personalities : [raidx] ...
* read_ahead {not set|%d sectors}
* mdN : {in}active {(readonly)} raidX dev[%d]{(F)} ... %d blocks STATUS RESYNC
* ... Only initialised arrays listed
* unused devices: {dev dev ... | <none>}
*
* STATUS is personality dependent:
* linear: %dk rounding
* raid0: %dk chunks
* raid1: [%d/%d] [U_U] ( raid/working. operational or not)
* raid5: level 4/5, %dk chunk, algorithm %d [%d/%d] [U_U]
*
* RESYNC is empty or:
* {resync|recovery}=%u%% finish=%u.%umin
* or
* resync=DELAYED
*
* In a 2.4 kernel (md 0.90.0/2.4)
* Personalities : [raidX] ...
* read_ahead {not set|%d sectors}
* mdN : {in}active {(read-only)} raidX dev[%d]{(F)} ...
* %d blocks STATUS
* RESYNC
* unused devices: {dev dev .. | <none>}
*
* STATUS matches 0.90.0/2.2
* RESYNC includes [===>....],
* adds a space after {resync|recovery} and before and after '='
* adds a decimal to the recovery percent.
* adds (%d/%d) resync amount and max_blocks, before finish.
* adds speed=%dK/sec after finish
*
*
*
* Out of this we want to extract:
* list of devices, active or not
* pattern of failed drives (so need number of drives)
* percent resync complete
*
* As continuation is indicated by leading space, we use
* conf_line from config.c to read logical lines
*
*/
#include "mdadm.h"
#include "dlink.h"
#include <sys/select.h>
#include <ctype.h>
static void free_member_devnames(struct dev_member *m)
{
while(m) {
struct dev_member *t = m;
m = m->next;
free(t->name);
free(t);
}
}
static int add_member_devname(struct dev_member **m, char *name)
{
struct dev_member *new;
char *t;
if ((t = strchr(name, '[')) == NULL)
/* not a device */
return 0;
new = xmalloc(sizeof(*new));
new->name = strndup(name, t - name);
new->next = *m;
*m = new;
return 1;
}
void free_mdstat(struct mdstat_ent *ms)
{
while (ms) {
struct mdstat_ent *t;
free(ms->level);
free(ms->pattern);
free(ms->metadata_version);
free_member_devnames(ms->members);
t = ms;
ms = ms->next;
free(t);
}
}
static int mdstat_fd = -1;
struct mdstat_ent *mdstat_read(int hold, int start)
{
FILE *f;
struct mdstat_ent *all, *rv, **end, **insert_here;
char *line;
int fd;
if (hold && mdstat_fd != -1) {
off_t offset = lseek(mdstat_fd, 0L, 0);
if (offset == (off_t)-1) {
return NULL;
}
fd = dup(mdstat_fd);
if (fd < 0)
return NULL;
f = fdopen(fd, "r");
if (f == NULL) {
/* don't leak the duplicated descriptor */
close(fd);
return NULL;
}
} else
f = fopen("/proc/mdstat", "r");
if (f == NULL)
return NULL;
else
fcntl(fileno(f), F_SETFD, FD_CLOEXEC);
all = NULL;
end = &all;
for (; (line = conf_line(f)) ; free_line(line)) {
struct mdstat_ent *ent;
char *w;
char devnm[32];
int in_devs = 0;
if (strcmp(line, "Personalities") == 0)
continue;
if (strcmp(line, "read_ahead") == 0)
continue;
if (strcmp(line, "unused") == 0)
continue;
insert_here = NULL;
/* Better be an md line.. */
if (strncmp(line, "md", 2)!= 0 || strlen(line) >= 32 ||
(line[2] != '_' && !isdigit(line[2])))
continue;
strcpy(devnm, line);
ent = xmalloc(sizeof(*ent));
ent->level = ent->pattern= NULL;
ent->next = NULL;
ent->percent = RESYNC_NONE;
ent->active = -1;
ent->resync = 0;
ent->metadata_version = NULL;
ent->raid_disks = 0;
ent->devcnt = 0;
ent->members = NULL;
strcpy(ent->devnm, devnm);
for (w=dl_next(line); w!= line ; w=dl_next(w)) {
int l = strlen(w);
char *eq;
if (strcmp(w, "active") == 0)
ent->active = 1;
else if (strcmp(w, "inactive") == 0) {
ent->active = 0;
in_devs = 1;
} else if (strcmp(w, "bitmap:") == 0) {
/* We need to stop parsing here;
* otherwise, ent->raid_disks will be
* overwritten by the wrong value.
*/
break;
} else if (ent->active > 0 &&
ent->level == NULL &&
w[0] != '(' /*readonly*/) {
ent->level = xstrdup(w);
in_devs = 1;
} else if (in_devs && strcmp(w, "blocks") == 0)
in_devs = 0;
else if (in_devs) {
char *ep = strchr(w, '[');
ent->devcnt +=
add_member_devname(&ent->members, w);
if (ep && strncmp(w, "md", 2) == 0) {
/* This has an md device as a component.
* If that device is already in the
* list, make sure we insert before
* there.
*/
struct mdstat_ent **ih;
ih = &all;
while (ih != insert_here && *ih &&
((int)strlen((*ih)->devnm) !=
ep-w ||
strncmp((*ih)->devnm, w,
ep-w) != 0))
ih = & (*ih)->next;
insert_here = ih;
}
} else if (strcmp(w, "super") == 0 &&
dl_next(w) != line) {
w = dl_next(w);
ent->metadata_version = xstrdup(w);
} else if (w[0] == '[' && isdigit(w[1])) {
ent->raid_disks = atoi(w+1);
} else if (!ent->pattern &&
w[0] == '[' &&
(w[1] == 'U' || w[1] == '_')) {
ent->pattern = xstrdup(w+1);
if (ent->pattern[l-2] == ']')
ent->pattern[l-2] = '\0';
} else if (ent->percent == RESYNC_NONE &&
strncmp(w, "re", 2) == 0 &&
w[l-1] == '%' &&
(eq = strchr(w, '=')) != NULL ) {
ent->percent = atoi(eq+1);
if (strncmp(w,"resync", 6) == 0)
ent->resync = 1;
else if (strncmp(w, "reshape", 7) == 0)
ent->resync = 2;
else
ent->resync = 0;
} else if (ent->percent == RESYNC_NONE &&
(w[0] == 'r' || w[0] == 'c')) {
if (strncmp(w, "resync", 6) == 0)
ent->resync = 1;
if (strncmp(w, "reshape", 7) == 0)
ent->resync = 2;
if (strncmp(w, "recovery", 8) == 0)
ent->resync = 0;
if (strncmp(w, "check", 5) == 0)
ent->resync = 3;
if (l > 8 && strcmp(w+l-8, "=DELAYED") == 0)
ent->percent = RESYNC_DELAYED;
if (l > 8 && strcmp(w+l-8, "=PENDING") == 0)
ent->percent = RESYNC_PENDING;
if (l > 7 && strcmp(w+l-7, "=REMOTE") == 0)
ent->percent = RESYNC_REMOTE;
} else if (ent->percent == RESYNC_NONE &&
w[0] >= '0' &&
w[0] <= '9' &&
w[l-1] == '%') {
ent->percent = atoi(w);
}
}
if (insert_here && (*insert_here)) {
ent->next = *insert_here;
*insert_here = ent;
} else {
*end = ent;
end = &ent->next;
}
}
if (hold && mdstat_fd == -1) {
mdstat_fd = dup(fileno(f));
fcntl(mdstat_fd, F_SETFD, FD_CLOEXEC);
}
fclose(f);
/* If we might want to start array,
* reverse the order, so that components come before composites
*/
if (start) {
rv = NULL;
while (all) {
struct mdstat_ent *e = all;
all = all->next;
e->next = rv;
rv = e;
}
} else
rv = all;
return rv;
}
void mdstat_close(void)
{
if (mdstat_fd >= 0)
close(mdstat_fd);
mdstat_fd = -1;
}
/*
* function: mdstat_wait
* Description: Waits for an event on /proc/mdstat, or for the timeout.
* Parameters:
* seconds - timeout for waiting
* Returns:
* > 0 - detected event
* 0 - timeout
* < 0 - detected error
*/
int mdstat_wait(int seconds)
{
fd_set fds;
struct timeval tm;
int maxfd = 0;
FD_ZERO(&fds);
if (mdstat_fd >= 0) {
FD_SET(mdstat_fd, &fds);
maxfd = mdstat_fd;
} else
return -1;
tm.tv_sec = seconds;
tm.tv_usec = 0;
return select(maxfd + 1, NULL, NULL, &fds, &tm);
}
void mdstat_wait_fd(int fd, const sigset_t *sigmask)
{
fd_set fds, rfds;
int maxfd = 0;
FD_ZERO(&fds);
FD_ZERO(&rfds);
if (mdstat_fd >= 0)
FD_SET(mdstat_fd, &fds);
if (fd >= 0) {
struct stat stb;
/* only trust st_mode if fstat() actually succeeded */
if (fstat(fd, &stb) == 0 && S_ISREG(stb.st_mode))
/* Must be a /proc or /sys fd, so expect
* POLLPRI
* i.e. an 'exceptional' event.
*/
FD_SET(fd, &fds);
else
FD_SET(fd, &rfds);
if (fd > maxfd)
maxfd = fd;
}
if (mdstat_fd > maxfd)
maxfd = mdstat_fd;
pselect(maxfd + 1, &rfds, NULL, &fds,
NULL, sigmask);
}
int mddev_busy(char *devnm)
{
struct mdstat_ent *mdstat = mdstat_read(0, 0);
struct mdstat_ent *me;
for (me = mdstat ; me ; me = me->next)
if (strcmp(me->devnm, devnm) == 0)
break;
free_mdstat(mdstat);
return me != NULL;
}
struct mdstat_ent *mdstat_by_component(char *name)
{
struct mdstat_ent *mdstat = mdstat_read(0, 0);
while (mdstat) {
struct dev_member *m;
struct mdstat_ent *ent;
if (mdstat->metadata_version &&
strncmp(mdstat->metadata_version, "external:", 9) == 0 &&
is_subarray(mdstat->metadata_version+9))
/* don't return subarrays, only containers */
;
else for (m = mdstat->members; m; m = m->next) {
if (strcmp(m->name, name) == 0) {
free_mdstat(mdstat->next);
mdstat->next = NULL;
return mdstat;
}
}
ent = mdstat;
mdstat = mdstat->next;
ent->next = NULL;
free_mdstat(ent);
}
return NULL;
}
struct mdstat_ent *mdstat_by_subdev(char *subdev, char *container)
{
struct mdstat_ent *mdstat = mdstat_read(0, 0);
struct mdstat_ent *ent = NULL;
while (mdstat) {
/* metadata version must match:
* external:[/-]%s/%s
* where first %s is 'container' and second %s is 'subdev'
*/
if (ent)
free_mdstat(ent);
ent = mdstat;
mdstat = mdstat->next;
ent->next = NULL;
if (ent->metadata_version == NULL ||
strncmp(ent->metadata_version, "external:", 9) != 0)
continue;
if (!metadata_container_matches(ent->metadata_version+9,
container) ||
!metadata_subdev_matches(ent->metadata_version+9,
subdev))
continue;
free_mdstat(mdstat);
return ent;
}
return NULL;
}
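For redundant levels, the 2.4-style STATUS described in the header comment of mdstat.c carries two bracketed tokens: [%d/%d] (raid disks / working disks) and a [U_U] pattern. mdstat_read() keeps the pattern as a string and takes the disk count with atoi(). The C sketch below shows one way a caller might decode both tokens into counts. The helper names count_pattern() and parse_disk_counts() are hypothetical and are not part of mdadm.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count working ('U') and failed or missing ('_')
 * slots in a /proc/mdstat status pattern such as "[UU_]".
 * Returns 0 on success, -1 if the token is not a status pattern.
 */
int count_pattern(const char *tok, int *up, int *down)
{
	size_t len = strlen(tok);
	size_t i;

	if (len < 3 || tok[0] != '[' || tok[len - 1] != ']')
		return -1;
	*up = 0;
	*down = 0;
	for (i = 1; i < len - 1; i++) {
		if (tok[i] == 'U')
			(*up)++;
		else if (tok[i] == '_')
			(*down)++;
		else
			return -1;	/* not a U/_ pattern */
	}
	return 0;
}

/*
 * Hypothetical helper: parse a "[raid_disks/working_disks]" token
 * such as "[3/2]". Returns 0 on success, -1 otherwise.
 */
int parse_disk_counts(const char *tok, int *raid, int *working)
{
	char close = '\0';

	/* all three conversions must succeed and end with ']' */
	if (sscanf(tok, "[%d/%d%c", raid, working, &close) != 3 ||
	    close != ']')
		return -1;
	return 0;
}
```

For a status line containing "[3/2] [UU_]", the first token yields raid=3 and working=2 and the second yields up=2 and down=1, so a caller could cross-check that down equals raid minus working.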

166
misc/mdcheck Normal file

@ -0,0 +1,166 @@
#!/bin/bash
# Copyright (C) 2014-2017 Neil Brown <neilb@suse.de>
#
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# Author: Neil Brown
# Email: <neilb@suse.com>
# This script should be run periodically to automatically
# perform a 'check' on any md arrays.
#
# It supports a 'time budget' such that any incomplete 'check'
# will be checkpointed when that time has expired.
# A subsequent invocation can allow the 'check' to continue.
#
# Options are:
# --continue Don't start new checks, only continue old ones.
# --duration This is passed to "date --date=$duration" to find out
# when to finish
#
# To support '--continue', arrays are identified by UUID and the last checkpoint
# position (from 'sync_completed' or 'sync_min') is stored in
# /var/lib/mdcheck/MD_UUID_$UUID
# convert a /dev/md name into /sys/.../md equivalent
sysname() {
set `ls -lLd $1`
maj=${5%,}
min=$6
readlink -f /sys/dev/block/$maj:$min
}
args=$(getopt -o hcd: -l help,continue,duration: -n mdcheck -- "$@")
rv=$?
if [ $rv -ne 0 ]; then exit $rv; fi
eval set -- $args
cont=
endtime=
while [ " $1" != " --" ]
do
case $1 in
--help )
echo >&2 'Usage: mdcheck [--continue] [--duration time-offset]'
echo >&2 ' time-offset must be understood by "date --date"'
exit 0
;;
--continue ) cont=yes ;;
--duration ) shift; dur=$1
endtime=$(date --date "$dur" "+%s")
;;
esac
shift
done
shift
# We need a temp file occasionally...
tmp=/var/lib/mdcheck/.md-check-$$
trap 'rm -f "$tmp"' 0 2 3 15
# firstly, clean out really old state files
mkdir -p /var/lib/mdcheck
find /var/lib/mdcheck -name "MD_UUID*" -type f -mtime +180 -exec rm {} \;
# Now look at each md device.
cnt=0
for dev in /dev/md?*
do
[ -e "$dev" ] || continue
sys=`sysname $dev`
if [ ! -f "$sys/md/sync_action" ]
then # cannot check this array
continue
fi
if [ "`cat $sys/md/sync_action`" != 'idle' ]
then # This array is busy
continue
fi
mdadm --detail --export "$dev" | grep '^MD_UUID=' > $tmp || continue
source $tmp
fl="/var/lib/mdcheck/MD_UUID_$MD_UUID"
if [ -z "$cont" ]
then
start=0
logger -p daemon.info mdcheck start checking $dev
elif [ -z "$MD_UUID" -o ! -f "$fl" ]
then
# Nothing to continue here
continue
else
start=`cat "$fl"`
logger -p daemon.info mdcheck continue checking $dev from $start
fi
cnt=$((cnt+1))
eval MD_${cnt}_fl=\$fl
eval MD_${cnt}_sys=\$sys
eval MD_${cnt}_dev=\$dev
echo $start > $fl
echo $start > $sys/md/sync_min
echo check > $sys/md/sync_action
done
if [ -z "$endtime" ]
then
exit 0
fi
while [ `date +%s` -lt $endtime ]
do
any=
for i in `eval echo {1..$cnt}`
do
eval fl=\$MD_${i}_fl
eval sys=\$MD_${i}_sys
eval dev=\$MD_${i}_dev
if [ -z "$fl" ]; then continue; fi
if [ "`cat $sys/md/sync_action`" != 'check' ]
then
logger -p daemon.info mdcheck finished checking $dev
eval MD_${i}_fl=
rm -f $fl
continue;
fi
read a rest < $sys/md/sync_completed
echo $a > $fl
any=yes
done
if [ -z "$any" ]; then exit 0; fi
sleep 120
done
# We've waited, and there are still checks running.
# Time to stop them.
for i in `eval echo {1..$cnt}`
do
eval fl=\$MD_${i}_fl
eval sys=\$MD_${i}_sys
eval dev=\$MD_${i}_dev
if [ -z "$fl" ]; then continue; fi
if [ "`cat $sys/md/sync_action`" != 'check' ]
then
eval MD_${i}_fl=
rm -f $fl
continue;
fi
echo idle > $sys/md/sync_action
cat $sys/md/sync_min > $fl
logger -p daemon.info pause checking $dev at `cat $fl`
done
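mdcheck keeps only the first field of sync_completed (read a rest) as its checkpoint and later writes it back to sync_min. A C tool that wanted the full progress value could parse the attribute as sketched below. The sketch assumes the format "<done> / <total>" (in sectors) while a sync action runs, and a single word such as "none" (or "delayed" on some kernels) otherwise; treat that format as an assumption to check against your kernel's md documentation. parse_sync_completed() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: decode the contents of /sys/.../md/sync_completed.
 * Assumed format: "<done> / <total>\n" (sectors) while a sync action runs,
 * or a word such as "none" or "delayed" when no progress is available.
 * Returns 1 if progress was parsed, 0 if no progress is reported,
 * -1 on unexpected input.
 */
int parse_sync_completed(const char *text,
			 unsigned long long *done,
			 unsigned long long *total)
{
	if (strncmp(text, "none", 4) == 0 ||
	    strncmp(text, "delayed", 7) == 0)
		return 0;
	/* a space in the format matches any amount of whitespace */
	if (sscanf(text, "%llu / %llu", done, total) != 2)
		return -1;
	return 1;
}
```

With both numbers available, the fraction complete is done / total (guarding against total == 0), which is more than the single checkpoint value the script needs.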

27
misc/syslog-events Normal file

@ -0,0 +1,27 @@
#!/bin/sh
#
# sample event handling script for mdadm
# e.g. mdadm --follow --program=/sbin/syslog-events --scan
#
# License: GPL ver.2
# Copyright (C) 2004 SEKINE Tatsuo <tsekine@sdri.co.jp>
event="$1"
dev="$2"
disc="$3"
facility="kern"
tag="mdmonitor"
case x"${event}" in
xFail*) priority="error" ;;
xTest*) priority="debug" ;;
x*) priority="info" ;;
esac
msg="${event} event on ${dev}"
if [ x"${disc}" != x ]; then
msg="${msg}, related to disc ${disc}"
fi
exec logger -t "${tag}" -p "${facility}.${priority}" -- "${msg}"
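syslog-events is run by mdadm --follow with the event name, the md device and, for some events, a related component device as its arguments. If the same handler were written in C, the priority choice and message text could look like the sketch below. event_priority() and format_event() are hypothetical names; the LOG_* priorities come from <syslog.h>.

```c
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/* Map an mdadm event name to a syslog priority, mirroring the script:
 * Fail* -> error, Test* -> debug, anything else -> info. */
int event_priority(const char *event)
{
	if (strncmp(event, "Fail", 4) == 0)
		return LOG_ERR;
	if (strncmp(event, "Test", 4) == 0)
		return LOG_DEBUG;
	return LOG_INFO;
}

/* Build the same message text as the script; disc may be NULL or "". */
void format_event(char *buf, size_t len, const char *event,
		  const char *dev, const char *disc)
{
	if (disc != NULL && disc[0] != '\0')
		snprintf(buf, len, "%s event on %s, related to disc %s",
			 event, dev, disc);
	else
		snprintf(buf, len, "%s event on %s", event, dev);
}
```

One design difference worth noting: the script logs to the kern facility through logger, but syslog(3) documents LOG_KERN as not available to user processes, so a C port would more plausibly pass a facility such as LOG_DAEMON to openlog().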

55
mkinitramfs Normal file

@ -0,0 +1,55 @@
#!/bin/sh
# make sure we are being run in the right directory...
if [ -f mkinitramfs ]
then :
else
echo >&2 mkinitramfs must be run from the mdadm source directory.
exit 1
fi
if [ -f /bin/busybox ]
then : good, it exists
case `file /bin/busybox` in
*statically* ) : good ;;
* ) echo >&2 mkinitramfs: /bin/busybox is not statically linked: cannot proceed.
exit 1
esac
else
echo >&2 "mkinitramfs: /bin/busybox doesn't exist - please install it statically linked."
exit 1
fi
rm -rf initramfs
mkdir initramfs
mkdir initramfs/bin
make mdadm.static
cp mdadm.static initramfs/bin/mdadm
cp /bin/busybox initramfs/bin/busybox
ln initramfs/bin/busybox initramfs/bin/sh
cat <<- 'END' > initramfs/init
#!/bin/sh
echo 'Auto-assembling boot md array'
mkdir /proc
mount -t proc proc /proc
if [ -n "$rootuuid" ]
then arg=--uuid=$rootuuid
elif [ -n "$mdminor" ]
then arg=--super-minor=$mdminor
else arg=--super-minor=0
fi
echo "Using $arg"
mdadm -Acpartitions $arg --auto=part /dev/mda
cd /
mount /dev/mda1 /root || mount /dev/mda /root
umount /proc
cd /root
exec chroot . /sbin/init < /dev/console > /dev/console 2>&1
END
chmod +x initramfs/init
(cd initramfs
find init bin | cpio -o -H newc | gzip --best
) > init.cpio.gz
rm -rf initramfs
ls -l init.cpio.gz
