Merging upstream version 0.28.1.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 9c81793bca
commit ca8e65110f
26 changed files with 1067 additions and 716 deletions
147
doc/tarlz.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 4 March 2025
-@set VERSION 0.27.1
+@set UPDATED 24 June 2025
+@set VERSION 0.28.1
 
 @dircategory Archiving
 @direntry
@@ -44,8 +44,8 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
 * Program design:: Internal structure of tarlz
-* Multi-threaded decoding:: Limitations of parallel tar decoding
-* Minimum archive sizes:: Sizes required for full multi-threaded speed
+* Multithreaded decoding:: Limitations of parallel tar decoding
+* Minimum archive sizes:: Sizes required for full multithreaded speed
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
@@ -64,7 +64,7 @@ distribute, and modify it.
 @cindex introduction
 
 @uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
-(multi-threaded) combined implementation of the tar archiver and the
+(multithreaded) combined implementation of the tar archiver and the
 @uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
 compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
 
@@ -171,7 +171,8 @@ equivalent to @w{@option{-1 --solid}}.
 tarlz supports the following operations:
 
 @table @code
-@item --help
+@item -?
+@itemx --help
 Print an informative help message describing the options and exit.
 
 @item -V
@@ -195,7 +196,7 @@ no @var{files} have been specified.
 
 Concatenating archives containing files in common results in two or more tar
 members with the same name in the resulting archive, which may produce
-nondeterministic behavior during multi-threaded extraction.
+nondeterministic behavior during multithreaded extraction.
 @xref{mt-extraction}.
 
 @item -c
@@ -226,12 +227,9 @@ not delete a tar member unless it is possible to do so. For example it won't
 try to delete a tar member that is not compressed individually. Even in the
 case of finding a corrupt member after having deleted some member(s), tarlz
 stops and copies the rest of the file as soon as corruption is found,
-leaving it just as corrupt as it was, but not worse.
-
-To delete a directory without deleting the files under it, use
-@w{@samp{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
-may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
-data loss.
+leaving it just as corrupt as it was, but not worse. Deleting in place may
+be dangerous. A corrupt archive, a power cut, or an I/O error may cause data
+loss.
 
 @item -r
 @itemx --append
@@ -250,7 +248,7 @@ if no @var{files} have been specified.
 
 Appending files already present in the archive results in two or more tar
 members with the same name, which may produce nondeterministic behavior
-during multi-threaded extraction. @xref{mt-extraction}.
+during multithreaded extraction. @xref{mt-extraction}.
 
 @item -t
 @itemx --list
@@ -260,13 +258,11 @@ List the contents of an archive. If @var{files} are given, list only the
 @item -x
 @itemx --extract
 Extract files from an archive. If @var{files} are given, extract only the
-@var{files} given. Else extract all the files in the archive. To extract a
-directory without extracting the files under it, use
-@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}. Tarlz removes files and
-empty directories unconditionally before extracting over them. Other than
-that, it does not make any special effort to extract a file over an
-incompatible type of file. For example, extracting a file over a non-empty
-directory usually fails. @xref{mt-extraction}.
+@var{files} given. Else extract all the files in the archive. Tarlz removes
+files and empty directories unconditionally before extracting over them.
+Other than that, it does not make any special effort to extract a file over
+an incompatible type of file. For example, extracting a file over a
+non-empty directory usually fails. @xref{mt-extraction}.
 
 @item -z
 @itemx --compress
@@ -310,6 +306,9 @@ and the value of LZ_API_VERSION (if defined).
 @xref{Library version,,,lzlib}.
 @end ifnothtml
 
+@item --time-bits
+Print the size of time_t in bits and exit.
+
 @end table
 
 @noindent
@@ -331,15 +330,17 @@ member large enough to contain the file.
 Change to directory @var{dir}. When creating, appending, comparing, or
 extracting, the position of each option @option{-C} in the command line is
 significant; it changes the current working directory for the following
-@var{files} until a new option @option{-C} appears in the command line.
-@option{--list} and @option{--delete} ignore any option @option{-C}
-specified. @var{dir} is relative to the then current working directory,
-perhaps changed by a previous option @option{-C}.
+@var{files} (including those specified with option @option{-T}) until a new
+option @option{-C} appears in the command line. @option{--list} and
+@option{--delete} ignore any option @option{-C} specified. @var{dir} is
+relative to the then current working directory, perhaps changed by a
+previous option @option{-C}.
 
 Note that a process can only have one current working directory (CWD).
-Therefore multi-threading can't be used to create or decode an archive if an
-option @option{-C} appears after a (relative) file name in the command line.
-(All file names are made relative by removing leading slashes when decoding).
+Therefore multithreading can't be used to create or decode an archive if an
+option @option{-C} appears in the command line after a (relative) file name
+or after an option @option{-T}. (All file names are made relative by
+removing leading slashes when decoding).
 
 @item -f @var{archive}
 @itemx --file=@var{archive}
@@ -358,7 +359,7 @@ Valid values range from 0 to as many as your system can support. A value
 of 0 disables threads entirely. If this option is not used, tarlz tries to
 detect the number of processors in the system and use it as default value.
 @w{@samp{tarlz --help}} shows the system's default value. See the note about
-multi-threading in the option @option{-C} above.
+multithreading in the option @option{-C} above.
 
 Note that the number of usable threads is limited during compression to
 @w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
@@ -382,6 +383,24 @@ permissions specified in the archive.
 @itemx --quiet
 Quiet operation. Suppress all messages.
 
+@item -R
+@itemx --no-recursive
+When creating or appending, don't descend recursively into directories. When
+decoding, process only the files and directories specified.
+
+@item --recursive
+Operate recursively on directories. This is the default.
+
+@item -T @var{file}
+@itemx --files-from=@var{file}
+When creating or appending, read from @var{file} the names of the files to
+be archived. When decoding, read from @var{file} the names of the members to
+be processed. Each name is terminated by a newline. This option can be used
+in combination with the option @option{-R} to read a list of files generated
+with the @command{find} utility. A hyphen @samp{-} used as the name of
+@var{file} reads the names from standard input. Multiple @option{-T} options
+can be specified.
+
 @item -v
 @itemx --verbose
 Verbosely list files processed. Further -v's (up to 4) increase the
@@ -468,6 +487,10 @@ When creating or appending, use @var{group} for files added to the archive.
 If @var{group} is not a valid group name, it is decoded as a decimal numeric
 group ID.
 
+@item --depth
+When creating or appending, archive all entries from each directory before
+archiving the directory itself.
+
 @item --exclude=@var{pattern}
 Exclude files matching a shell pattern like @file{*.o}, even if the files
 are specified in the command line. A file is considered to match if any
@@ -520,13 +543,30 @@ format is optional and defaults to @samp{00:00:00}. The epoch is
 @w{@samp{1970-01-01 00:00:00 UTC}}. Negative seconds or years define a
 modification time before the epoch.
 
+@item --mount
+Stay in local file system when creating archive; skip mount points and don't
+descend below mount points. This is useful when doing backups of complete
+file systems.
+
+@item --xdev
+Stay in local file system when creating archive; archive the mount points
+themselves, but don't descend below mount points. This is useful when doing
+backups of complete file systems. If the function @samp{nftw} of the system
+C library does not support the flag @samp{FTW_XDEV}, @option{--xdev} behaves
+like @option{--mount}.
+
 @item --out-slots=@var{n}
 Number of @w{1 MiB} output packets buffered per worker thread during
-multi-threaded creation or appending to compressed archives. Increasing the
+multithreaded creation or appending to compressed archives. Increasing the
 number of packets may increase compression speed if the files being archived
 are larger than @w{64 MiB} compressed, but requires more memory. Valid
 values range from 1 to 1024. The default value is 64.
 
+@item --parallel
+Use multithreading to create an uncompressed archive in parallel if the
+number of threads is greater than 1. This is not the default because it uses
+much more memory than sequential creation.
+
 @item --warn-newer
 During archive creation, warn if any file being archived has a modification
 time newer than the archive creation time. This option may slow archive
@@ -630,9 +670,9 @@ tarlz -df archive.tar.lz # check the archive
 
 Once the integrity and accuracy of an archive have been verified as in the
 example above, they can be verified again anywhere at any time with
-@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
-@option{-n0} because multi-threaded listing does not detect corruption in
-the tar member data of multimember archives: @xref{mt-listing}.
+@w{@samp{tarlz -t -n0}}. It is important to disable multithreading with
+@option{-n0} because multithreaded listing does not detect corruption in the
+tar member data of multimember archives: @xref{mt-listing}.
 
 @example
 tarlz -t -n0 -f archive.tar.lz > /dev/null
@@ -648,6 +688,9 @@ just at a member boundary:
 lzip -tv archive.tar.lz
 @end example
 
+The probability of truncation happening at a member boundary is
+@w{(members - 1) / compressed_size}, usually one in several million.
+
 
 @node Portable character set
 @chapter POSIX portable filename character set
@@ -664,6 +707,8 @@ a b c d e f g h i j k l m n o p q r s t u v w x y z
 The last three characters are the period, underscore, and hyphen-minus
 characters, respectively.
 
+Tarlz does not support file names containing newline characters.
+
 File names are identifiers. Therefore, archiving works better when file
 names use only the portable character set without spaces added.
 
@@ -726,7 +771,7 @@ Zero or more blocks that contain the contents of the file.
 Each tar member must be contiguously stored in a lzip member for the
 parallel decoding operations like @option{--list} to work. If any tar member
 is split over two or more lzip members, the archive must be decoded
-sequentially. @xref{Multi-threaded decoding}.
+sequentially. @xref{Multithreaded decoding}.
 
 At the end of the archive file there are two 512-byte blocks filled with
 binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
@@ -1111,10 +1156,10 @@ accidental double UTF-8 conversions.
 
 The parts of tarlz related to sequential processing of the archive are more
 or less similar to any other tar and won't be described here. The interesting
-parts described here are those related to multi-threaded processing.
+parts described here are those related to multithreaded processing.
 
-The structure of the part of tarlz performing multi-threaded archive
-creation is somewhat similar to that of
+The structure of the part of tarlz performing multithreaded archive creation
+is somewhat similar to that of
 @uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
 with the added complication of the solidity levels.
 @ifnothtml
@@ -1190,7 +1235,7 @@ some other worker requests mastership in a previous lzip member can this
 error be avoided.
 
 
-@node Multi-threaded decoding
+@node Multithreaded decoding
 @chapter Limitations of parallel tar decoding
 @cindex parallel tar decoding
 
@@ -1215,7 +1260,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
 parallelized if the tar members are not aligned with the lzip members. Tar
 archives compressed with plzip can't be decoded in parallel because tar and
 plzip do not have a way to align both sets of members. Certainly one can
-decompress one such archive with a multi-threaded tool like plzip, but the
+decompress one such archive with a multithreaded tool like plzip, but the
 increase in speed is not as large as it could be because plzip must
 serialize the decompressed data and pass them to tar, which decodes them
 sequentially, one tar member at a time.
@@ -1228,13 +1273,13 @@ decoding it safely in parallel.
 
 Tarlz is able to automatically decode aligned and unaligned multimember
 tar.lz archives, keeping backwards compatibility. If tarlz finds a member
-misalignment during multi-threaded decoding, it switches to single-threaded
+misalignment during multithreaded decoding, it switches to single-threaded
 mode and continues decoding the archive.
 
 @anchor{mt-listing}
-@section Multi-threaded listing
+@section Multithreaded listing
 
-If the files in the archive are large, multi-threaded @option{--list} on a
+If the files in the archive are large, multithreaded @option{--list} on a
 regular (seekable) tar.lz archive can be hundreds of times faster than
 sequential @option{--list} because, in addition to using several processors,
 it only needs to decompress part of each lzip member. See the following
@@ -1247,7 +1292,7 @@ time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
 time tarlz -tf silesia.tar.lz (0.020s)
 @end example
 
-On the other hand, multi-threaded @option{--list} won't detect corruption in
+On the other hand, multithreaded @option{--list} won't detect corruption in
 the tar member data because it only decodes the part of each lzip member
 corresponding to the tar member header. Partial decoding of a lzip member
 can't guarantee the integrity of the data decoded. This is another reason
@@ -1255,12 +1300,12 @@ why the tar headers (including the extended records) must provide their own
 integrity checking.
 
 @anchor{mt-extraction}
-@section Limitations of multi-threaded extraction
+@section Limitations of multithreaded extraction
 
-Multi-threaded extraction may produce different output than single-threaded
+Multithreaded extraction may produce different output than single-threaded
 extraction in some cases:
 
-During multi-threaded extraction, several independent threads are
+During multithreaded extraction, several independent threads are
 simultaneously reading the archive and creating files in the file system.
 The archive is not read sequentially. As a consequence, any error or
 weirdness in the archive (like a corrupt member or an end-of-archive block
@@ -1270,7 +1315,7 @@ archive beyond that point has been processed.
 If the archive contains two or more tar members with the same name,
 single-threaded extraction extracts the members in the order they appear in
 the archive and leaves in the file system the last version of the file. But
-multi-threaded extraction may extract the members in any order and leave in
+multithreaded extraction may extract the members in any order and leave in
 the file system any version of the file nondeterministically. It is
 unspecified which of the tar members is extracted.
 
@@ -1283,12 +1328,12 @@ links to.
 
 
 @node Minimum archive sizes
-@chapter Minimum archive sizes required for multi-threaded block compression
+@chapter Minimum archive sizes required for multithreaded block compression
 @cindex minimum archive sizes
 
-When creating or appending to a compressed archive using multi-threaded
-block compression, tarlz puts tar members together in blocks and compresses
-as many blocks simultaneously as worker threads are chosen, creating a
+When creating or appending to a compressed archive using multithreaded block
+compression, tarlz puts tar members together in blocks and compresses as
+many blocks simultaneously as worker threads are chosen, creating a
 multimember compressed archive.
 
 For this to work as expected (and roughly multiply the compression speed by