Merging upstream version 0.27.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 619358407d
commit 5e422e043e
83 changed files with 980 additions and 726 deletions
174
doc/tarlz.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 7 December 2024
-@set VERSION 0.26
+@set UPDATED 28 February 2025
+@set VERSION 0.27
 
 @dircategory Archiving
 @direntry
@@ -39,6 +39,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 * Introduction:: Purpose and features of tarlz
 * Invoking tarlz:: Command-line interface
 * Argument syntax:: By convention, options start with a hyphen
+* Creating backups safely:: Checking integrity and accuracy of archives
 * Portable character set:: POSIX portable filename character set
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
@@ -51,7 +52,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 @end menu
 
 @sp 1
-Copyright @copyright{} 2013-2024 Antonio Diaz Diaz.
+Copyright @copyright{} 2013-2025 Antonio Diaz Diaz.
 
 This manual is free documentation: you have unlimited permission to copy,
 distribute, and modify it.
@@ -76,7 +77,7 @@ compressed archives.
 
 Keeping the alignment between tar members and lzip members has two
 advantages. It adds an indexed lzip layer on top of the tar archive, making
-it possible to decode the archive safely in parallel. It also minimizes the
+it possible to decode the archive safely in parallel. It also reduces the
 amount of data lost in case of corruption. Compressing a tar archive with
 plzip may even double the amount of files lost for each lzip member damaged
 because it does not keep the members aligned.
@@ -254,7 +255,7 @@ during multi-threaded extraction. @xref{mt-extraction}.
 @item -t
 @itemx --list
 List the contents of an archive. If @var{files} are given, list only the
-@var{files} given.
+@var{files} given. @xref{mt-listing}.
 
 @item -x
 @itemx --extract
@@ -265,20 +266,23 @@ directory without extracting the files under it, use
 empty directories unconditionally before extracting over them. Other than
 that, it does not make any special effort to extract a file over an
 incompatible type of file. For example, extracting a file over a non-empty
-directory usually fails.
+directory usually fails. @xref{mt-extraction}.
 
 @item -z
 @itemx --compress
 Compress existing POSIX tar archives aligning the lzip members to the tar
 members with choice of granularity (@option{--bsolid} by default,
-@option{--dsolid} works like @option{--asolid}). Exit with error status 2 if
-any input archive is an empty file. The input archives are kept unchanged.
-Existing compressed archives are not overwritten. A hyphen @samp{-} used as
-the name of an input archive reads from standard input and writes to
-standard output (unless the option @option{--output} is used). Tarlz can be
-used as compressor for GNU tar by using a command like
-@w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be used as
-compressor for zupdate (zutils) by using a command like
+@option{--dsolid} works like @option{--asolid}). Each input archive is
+compressed to a file with the extension @file{.lz} added unless the option
+@option{--output} is used. If no archives are specified, or if a hyphen
+@samp{-} is used as the name of an archive, tarlz reads from standard input
+and writes to standard output (unless the option @option{--output} is used).
+When @option{--output} is used, only one input archive can be specified.
+Exit with error status 2 if any input archive is an empty file. The input
+archives are kept unchanged. Existing compressed archives are not
+overwritten. Tarlz can be used as compressor for GNU tar by using a command
+like @w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}. Tarlz can be
+used as compressor for zupdate (zutils) by using a command like
 @w{@samp{zupdate --lz='tarlz -z' foo.tar.gz}}. Note that tarlz only works
 reliably on archives without global headers, or with global headers whose
 content can be ignored.
@@ -289,10 +293,8 @@ block is found, and then compresses the rest of the archive. Unless solid
 compression is requested, the end-of-archive blocks are compressed in a lzip
 member separated from the preceding members and from any nonzero garbage
 following the end-of-archive blocks. @option{--compress} implies plzip
-argument style, not tar style. Each input archive is compressed to a file
-with the extension @file{.lz} added unless the option @option{--output} is
-used. When @option{--output} is used, only one input archive can be specified.
-@option{-f} can't be used with @option{--compress}.
+argument style, not tar style. @option{-f} can't be used with
+@option{--compress}.
 
 @item --check-lib
 Compare the
@@ -319,8 +321,10 @@ tarlz supports the following options: @xref{Argument syntax}.
 @itemx --data-size=@var{bytes}
 Set target size of input data blocks for the option @option{--bsolid}.
 @xref{--bsolid}. Valid values range from @w{8 KiB} to @w{1 GiB}. Default
-value is two times the dictionary size, except for option @option{-0} where it
-defaults to @w{1 MiB}. @xref{Minimum archive sizes}.
+value is two times the dictionary size, except for option @option{-0} where
+it defaults to @w{1 MiB}. @xref{Minimum archive sizes}. Tarlz does not split
+tar members. If a file is larger than @var{bytes}, tarlz will create a lzip
+member large enough to contain the file.
 
 @item -C @var{dir}
 @itemx --directory=@var{dir}
@@ -465,12 +469,13 @@ If @var{group} is not a valid group name, it is decoded as a decimal numeric
 group ID.
 
 @item --exclude=@var{pattern}
-Exclude files matching a shell pattern like @file{*.o}. A file is considered
-to match if any component of the file name matches. For example, @file{*.o}
-matches @file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If
-@var{pattern} contains a @samp{/}, it matches a corresponding @samp{/} in
-the file name. For example, @file{foo/*.o} matches @file{foo/bar.o}.
-Multiple @option{--exclude} options can be specified.
+Exclude files matching a shell pattern like @file{*.o}, even if the files
+are specified in the command line. A file is considered to match if any
+component of the file name matches. For example, @file{*.o} matches
+@file{foo.o}, @file{foo.o/bar} and @file{foo/bar.o}. If @var{pattern}
+contains a @samp{/}, it matches a corresponding @samp{/} in the file name.
+For example, @file{foo/*.o} matches @file{foo/bar.o}. Multiple
+@option{--exclude} options can be specified.
 
 @item --ignore-ids
 Make @option{--diff} ignore differences in owner and group IDs. This option is
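Reviewer note on the @option{--exclude} hunk above: the documented rule (a file matches if any component of its name matches, and a @samp{/} in the pattern matches a @samp{/} in the name) can be illustrated with a small Python sketch. This is not tarlz's implementation. The function name `excluded` and the component-by-component use of `fnmatchcase` are assumptions chosen only to reproduce the four examples given in the text.

```python
from fnmatch import fnmatchcase


def excluded(pattern: str, name: str) -> bool:
    """Return True if `pattern` matches any run of components of `name`.

    Hypothetical helper: both strings are split on '/', and the pattern
    components are compared with consecutive name components starting at
    every possible position.
    """
    pat = [c for c in pattern.split("/") if c]
    parts = [c for c in name.split("/") if c]
    if not pat:
        return False
    for start in range(len(parts) - len(pat) + 1):
        window = parts[start:start + len(pat)]
        if all(fnmatchcase(comp, p) for comp, p in zip(window, pat)):
            return True
    return False


# The examples from the manual text:
for pattern, name in [("*.o", "foo.o"), ("*.o", "foo.o/bar"),
                      ("*.o", "foo/bar.o"), ("foo/*.o", "foo/bar.o")]:
    print(pattern, name, excluded(pattern, name))
```

Edge cases that the manual does not spell out (for example, whether @samp{*} may match across a @samp{/}) may behave differently in tarlz than in this sketch.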
@@ -493,6 +498,7 @@ recover as much data as possible from each damaged member. It is recommended
 to run tarlz in single-threaded mode (@option{--threads=0}) when using this
 option.
 
+@anchor{--missing-crc}
 @item --missing-crc
 Exit with error status 2 if the CRC of the extended records is missing. When
 this option is used, tarlz detects any corruption in the extended records
@@ -525,9 +531,9 @@ values range from 1 to 1024. The default value is 64.
 During archive creation, warn if any file being archived has a modification
 time newer than the archive creation time. This option may slow archive
 creation somewhat because it makes an extra call to @samp{stat} after
-archiving each file, but it guarantees that file contents were not modified
-during the creation of the archive. Note that the file must be at least one
-second newer than the archive for it to be detected as newer.
+archiving each file, but it nearly guarantees that file contents were not
+modified during the creation of the archive. Note that the file must be at
+least one second newer than the archive for it to be detected as newer.
 
 @ignore
 @item --permissive
@@ -591,6 +597,58 @@ Thus, @w{@option{--foo bar}} and @option{--foo=bar} are equivalent.
 @end itemize
 
 
+@node Creating backups safely
+@chapter Checking the integrity and accuracy of tar.lz archives
+@cindex creating backups
+
+Uncompressed tar archives do not offer any integrity checking for the files
+they store. The pax format even fails to offer integrity checking for some
+of the metadata. @xref{crc32}. The integrity checking of tar archives is
+usually provided by a compression layer or by an external hash.
+
+Lzip compression provides safe integrity checking to tar archives. But it
+does not matter how safe is the archiving format if the archive is created
+corrupt because of a concurrent modification of the files being archived, a
+faulty RAM, or a bug in the archiving tool. The only way of guaranteeing
+that a backup archive is correct is to check its integrity and accuracy
+after creating it.
+
+Testing the integrity of the archive with @w{@samp{lzip -tv}} guarantees
+that the compression layer of the archive is valid, but it does not
+guarantee that the tar layer is valid nor that the files in the archive
+match the files in the file system. For example, if the RAM is faulty and a
+bit flip happens in the input buffer before tarlz compresses it, the archive
+will not match the files. It is safer to check the archive with
+@w{@samp{tarlz -d}} just after creation because it checks the compression
+layer and the tar layer, and it compares the files in the archive with the
+files in the file system:
+
+@example
+tarlz -cf archive.tar.lz somedir # create the archive
+tarlz -df archive.tar.lz # check the archive
+@end example
+
+Once the integrity and accuracy of an archive have been verified as in the
+example above, they can be verified again anywhere at any time with
+@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
+@option{-n0} because multi-threaded listing does not detect corruption in
+the tar member data of multimember archives: @xref{mt-listing}.
+
+@example
+tarlz -t -n0 -f archive.tar.lz > /dev/null
+@end example
+
+@w{@samp{lzip -tv}} checks the integrity of the compression layer, and
+therefore the integrity and accuracy of any archive created and verified as
+explained above. This test is reliable for solidly compressed archives, but
+it does not detect a truncated multimember archive if the truncation happens
+just at a member boundary:
+
+@example
+lzip -tv archive.tar.lz
+@end example
+
+
 @node Portable character set
 @chapter POSIX portable filename character set
 @cindex portable character set
@@ -641,7 +699,7 @@ are not allowed in multimember files.
 
 Each lzip member contains one or more tar members in a simplified POSIX pax
 interchange format. The only pax typeflag value supported by tarlz (in
-addition to the typeflag values defined by the ustar format) is @samp{x}.
+addition to the typeflag values defined by the ustar format) is 'x'.
 The pax format is an extension on top of the ustar format that removes the
 size limitations of the ustar format.
 
@@ -654,7 +712,7 @@ An optional extended header block followed by one or more blocks that
 contain the extended header records as if they were the contents of a file;
 i.e., the extended header records are included as the data for this header
 block. This header block is of the form described in pax header block, with
-a typeflag value of @samp{x}.
+a typeflag value of 'x'.
 
 @item
 A header block in ustar format that describes the file. Any fields defined
@@ -713,7 +771,7 @@ An extended header just before the end-of-archive blocks.
 @section Pax header block
 
 The pax header block is identical to the ustar header block described below
-except that the typeflag has the value @samp{x} (extended). The field
+except that the typeflag has the value 'x' (extended). The field
 @samp{size} is the size of the extended header data in bytes. Most other
 fields in the pax header block are zeroed on archive creation to prevent
 trouble if the archive is read by a ustar tool, and are ignored by tarlz on
@@ -752,8 +810,8 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
 The file name of a link being created to another file, of any type,
 previously archived. This record overrides the field @samp{linkname} in the
 following ustar header block. The following ustar header block determines
-the type of link created. If typeflag of the following header block is 1, a
-hard link is created. If typeflag is 2, a symbolic link is created and the
+the type of link created. If typeflag of the following header block is '1', a
+hard link is created. If typeflag is '2', a symbolic link is created and the
 linkpath value is used as the contents of the symbolic link. The linkpath
 record is created only for links with a link name that does not fit in the
 space provided by the ustar header.
@@ -789,13 +847,12 @@ greater than 2_097_151 @w{(octal 7_777_777)}. @xref{ustar-uid-gid}.
 @item GNU.crc32
 CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
 representing the CRC <value> itself. The <value> is represented as 8
-hexadecimal digits in big endian order,
-@w{@samp{22 GNU.crc32=00000000\n}}. The keyword of the CRC record is
-protected by the CRC to guarantee that corruption is always detected when
-using @option{--missing-crc} (except in case of CRC collision). A CRC was
-chosen because a checksum is too weak for a potentially large list of
-variable sized records. A checksum can't detect simple errors like the
-swapping of two bytes.
+hexadecimal digits in big endian order, @w{@samp{22 GNU.crc32=00000000\n}}.
+The option @option{--missing-crc} guarantees that corruption is always
+detected (except in case of CRC collision). A CRC was chosen because a
+checksum is too weak for a potentially large list of variable sized records.
+A checksum can't detect simple errors like the swapping of two bytes.
+@xref{--missing-crc}.
 
 @end table
 
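Reviewer note on the GNU.crc32 hunk above: the following Python sketch shows a bitwise CRC32-C (Castagnoli) and one way of computing it over extended header data while skipping the 8 bytes of the CRC value. The CRC routine itself is the standard reflected algorithm. The helper `crc_excluding_value` and its `value_pos` parameter are hypothetical names, and treating "excluding the 8 bytes" as "skip those bytes" is an assumption about the intended reading, not a statement about tarlz's code.

```python
CRC32C_POLY = 0x82F63B78  # reflected form of the Castagnoli polynomial


def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C; slow but free of external dependencies."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def crc_excluding_value(records: bytes, value_pos: int) -> int:
    """CRC32-C of `records`, skipping the 8 bytes starting at `value_pos`.

    Hypothetical helper illustrating one reading of the manual text.
    """
    return crc32c(records[:value_pos] + records[value_pos + 8:])


# The sample record from the manual, with a placeholder value.
record = b"22 GNU.crc32=00000000\n"
value_pos = record.index(b"=") + 1   # offset of the 8 hexadecimal digits

print(len(record))                   # the leading "22" is the record length
print(hex(crc_excluding_value(record, value_pos)))
```

The length prefix `22` in the sample record equals the total byte length of the record, including the prefix itself and the trailing newline, which the sketch confirms with `len(record)`.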
@@ -825,6 +882,7 @@ shown in the following table. All lengths and offsets are in decimal:
 @item devmajor @tab 329 @tab 8
 @item devminor @tab 337 @tab 8
 @item prefix @tab 345 @tab 155
+@item padding @tab 500 @tab 12
 @end multitable
 
 All characters in the header block are coded using the ISO/IEC 646:1991
@@ -919,7 +977,7 @@ FIFO special file.
 @item '7'
 Reserved to represent a file to which an implementation has associated some
 high-performance attribute (contiguous file). Tarlz treats this type of file
-as a regular file (type 0).
+as a regular file (type '0').
 
 @end table
 
@@ -930,8 +988,8 @@ except when all characters in the array contain non-null characters
 including the last character. Each numeric field contains a leading space-
 or zero-filled, optionally null-terminated octal number using digits from
 the ISO/IEC 646:1991 (ASCII) standard. Tarlz is able to decode numeric
-fields 1 byte longer than standard ustar by not requiring a terminating null
-character.
+fields one byte longer than standard ustar by not requiring a terminating
+null character.
 
 
 @node Amendments to pax format
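Reviewer note on the numeric-field hunk above: a short Python sketch of decoding a space- or zero-filled, optionally null-terminated octal field. The function name `parse_octal_field` is hypothetical and the parsing rules are a simplified reading of the paragraph, not tarlz's decoder. The sketch assumes an 8-byte field such as the ustar uid or gid, which shows why dropping the terminating null raises the limit from the 2_097_151 (octal 7_777_777) mentioned elsewhere in the manual to 8 octal digits.

```python
def parse_octal_field(field: bytes) -> int:
    """Decode an octal numeric header field.

    Leading spaces are skipped; leading zeros contribute nothing to the
    value. Decoding stops at the first byte that is not an octal digit
    (for example NUL or space) or at the end of the field, so a
    terminating null is optional.
    """
    value = 0
    for byte in field.lstrip(b" "):
        if not 0x30 <= byte <= 0x37:      # not in '0'..'7'
            break
        value = value * 8 + (byte - 0x30)
    return value


# 8-byte field, null-terminated: at most 7 octal digits.
print(parse_octal_field(b"7777777\x00"))
# 8-byte field without a terminating null: 8 octal digits fit.
print(parse_octal_field(b"77777777"))
```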
@@ -1044,8 +1102,7 @@ extracting file data for a hard link to a symbolic link or to a directory.
 There is no portable way to tell what charset a text string is coded into.
 Therefore, tarlz stores all fields representing text strings unmodified,
 without conversion to UTF-8 nor any other transformation. This prevents
-accidental double UTF-8 conversions. If the need arises this behavior will
-be adjusted with a command-line option in the future.
+accidental double UTF-8 conversions.
 
 
 @node Program design
@@ -1054,12 +1111,12 @@ be adjusted with a command-line option in the future.
 
 The parts of tarlz related to sequential processing of the archive are more
 or less similar to any other tar and won't be described here. The interesting
-parts described here are those related to Multi-threaded processing.
+parts described here are those related to multi-threaded processing.
 
-The structure of the part of tarlz performing Multi-threaded archive
+The structure of the part of tarlz performing multi-threaded archive
 creation is somewhat similar to that of
-@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the
-added complication of the solidity levels.
+@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
+with the added complication of the solidity levels.
 @ifnothtml
 @xref{Program design,,,plzip}.
 @end ifnothtml
@@ -1174,6 +1231,9 @@ tar.lz archives, keeping backwards compatibility. If tarlz finds a member
 misalignment during multi-threaded decoding, it switches to single-threaded
 mode and continues decoding the archive.
 
+@anchor{mt-listing}
+@section Multi-threaded listing
+
 If the files in the archive are large, multi-threaded @option{--list} on a
 regular (seekable) tar.lz archive can be hundreds of times faster than
 sequential @option{--list} because, in addition to using several processors,
@@ -1189,8 +1249,10 @@ time tarlz -tf silesia.tar.lz (0.020s)
 
 On the other hand, multi-threaded @option{--list} won't detect corruption in
 the tar member data because it only decodes the part of each lzip member
-corresponding to the tar member header. This is another reason why the tar
-headers must provide their own integrity checking.
+corresponding to the tar member header. Partial decoding of a lzip member
+can't guarantee the integrity of the data decoded. This is another reason
+why the tar headers (including the extended records) must provide their own
+integrity checking.
 
 @anchor{mt-extraction}
 @section Limitations of multi-threaded extraction
@@ -1344,11 +1406,13 @@ tarlz -z --no-solid archive.tar
 @end example
 
 @noindent
-Example 10: Compress the archive @file{archive.tar} and write the output to
-@file{foo.tar.lz}.
+Example 10: Recompress the archive @file{archive.tar.lz} with different
+solidity, write the output to @file{archive-ns.tar.lz}, and compare both
+archives.
 
 @example
-tarlz -z -o foo.tar.lz archive.tar
+lzip -cd archive.tar.lz | tarlz -9z --no-solid -o archive-ns.tar.lz
+zcmp archive.tar.lz archive-ns.tar.lz
 @end example
 
 @noindent