Merging upstream version 0.28.1.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-06-25 03:37:17 +02:00
parent 9c81793bca
commit ca8e65110f
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
26 changed files with 1067 additions and 716 deletions

@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 4 March 2025
@set VERSION 0.27.1
@set UPDATED 24 June 2025
@set VERSION 0.28.1
@dircategory Archiving
@direntry
@ -44,8 +44,8 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Program design:: Internal structure of tarlz
* Multi-threaded decoding:: Limitations of parallel tar decoding
* Minimum archive sizes:: Sizes required for full multi-threaded speed
* Multithreaded decoding:: Limitations of parallel tar decoding
* Minimum archive sizes:: Sizes required for full multithreaded speed
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@ -64,7 +64,7 @@ distribute, and modify it.
@cindex introduction
@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
(multi-threaded) combined implementation of the tar archiver and the
(multithreaded) combined implementation of the tar archiver and the
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
@ -171,7 +171,8 @@ equivalent to @w{@option{-1 --solid}}.
tarlz supports the following operations:
@table @code
@item --help
@item -?
@itemx --help
Print an informative help message describing the options and exit.
@item -V
@ -195,7 +196,7 @@ no @var{files} have been specified.
Concatenating archives containing files in common results in two or more tar
members with the same name in the resulting archive, which may produce
nondeterministic behavior during multi-threaded extraction.
nondeterministic behavior during multithreaded extraction.
@xref{mt-extraction}.
@item -c
@ -226,12 +227,9 @@ not delete a tar member unless it is possible to do so. For example it won't
try to delete a tar member that is not compressed individually. Even in the
case of finding a corrupt member after having deleted some member(s), tarlz
stops and copies the rest of the file as soon as corruption is found,
leaving it just as corrupt as it was, but not worse.
To delete a directory without deleting the files under it, use
@w{@samp{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
data loss.
leaving it just as corrupt as it was, but not worse. Deleting in place may
be dangerous. A corrupt archive, a power cut, or an I/O error may cause data
loss.
@item -r
@itemx --append
@ -250,7 +248,7 @@ if no @var{files} have been specified.
Appending files already present in the archive results in two or more tar
members with the same name, which may produce nondeterministic behavior
during multi-threaded extraction. @xref{mt-extraction}.
during multithreaded extraction. @xref{mt-extraction}.
@item -t
@itemx --list
@ -260,13 +258,11 @@ List the contents of an archive. If @var{files} are given, list only the
@item -x
@itemx --extract
Extract files from an archive. If @var{files} are given, extract only the
@var{files} given. Else extract all the files in the archive. To extract a
directory without extracting the files under it, use
@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}. Tarlz removes files and
empty directories unconditionally before extracting over them. Other than
that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a non-empty
directory usually fails. @xref{mt-extraction}.
@var{files} given. Else extract all the files in the archive. Tarlz removes
files and empty directories unconditionally before extracting over them.
Other than that, it does not make any special effort to extract a file over
an incompatible type of file. For example, extracting a file over a
non-empty directory usually fails. @xref{mt-extraction}.
@item -z
@itemx --compress
@ -310,6 +306,9 @@ and the value of LZ_API_VERSION (if defined).
@xref{Library version,,,lzlib}.
@end ifnothtml
@item --time-bits
Print the size of time_t in bits and exit.
@end table
@noindent
@ -331,15 +330,17 @@ member large enough to contain the file.
Change to directory @var{dir}. When creating, appending, comparing, or
extracting, the position of each option @option{-C} in the command line is
significant; it changes the current working directory for the following
@var{files} until a new option @option{-C} appears in the command line.
@option{--list} and @option{--delete} ignore any option @option{-C}
specified. @var{dir} is relative to the then current working directory,
perhaps changed by a previous option @option{-C}.
@var{files} (including those specified with option @option{-T}) until a new
option @option{-C} appears in the command line. @option{--list} and
@option{--delete} ignore any option @option{-C} specified. @var{dir} is
relative to the then current working directory, perhaps changed by a
previous option @option{-C}.
Note that a process can only have one current working directory (CWD).
Therefore multi-threading can't be used to create or decode an archive if an
option @option{-C} appears after a (relative) file name in the command line.
(All file names are made relative by removing leading slashes when decoding).
Therefore multithreading can't be used to create or decode an archive if an
option @option{-C} appears in the command line after a (relative) file name
or after an option @option{-T}. (All file names are made relative by
removing leading slashes when decoding).
@item -f @var{archive}
@itemx --file=@var{archive}
@ -358,7 +359,7 @@ Valid values range from 0 to as many as your system can support. A value
of 0 disables threads entirely. If this option is not used, tarlz tries to
detect the number of processors in the system and use it as default value.
@w{@samp{tarlz --help}} shows the system's default value. See the note about
multi-threading in the option @option{-C} above.
multithreading in the option @option{-C} above.
Note that the number of usable threads is limited during compression to
@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
@ -382,6 +383,24 @@ permissions specified in the archive.
@itemx --quiet
Quiet operation. Suppress all messages.
@item -R
@itemx --no-recursive
When creating or appending, don't descend recursively into directories. When
decoding, process only the files and directories specified.
@item --recursive
Operate recursively on directories. This is the default.
@item -T @var{file}
@itemx --files-from=@var{file}
When creating or appending, read from @var{file} the names of the files to
be archived. When decoding, read from @var{file} the names of the members to
be processed. Each name is terminated by a newline. This option can be used
in combination with the option @option{-R} to read a list of files generated
with the @command{find} utility. A hyphen @samp{-} used as the name of
@var{file} reads the names from standard input. Multiple @option{-T} options
can be specified.
@item -v
@itemx --verbose
Verbosely list files processed. Further -v's (up to 4) increase the
@ -468,6 +487,10 @@ When creating or appending, use @var{group} for files added to the archive.
If @var{group} is not a valid group name, it is decoded as a decimal numeric
group ID.
@item --depth
When creating or appending, archive all entries from each directory before
archiving the directory itself.
@item --exclude=@var{pattern}
Exclude files matching a shell pattern like @file{*.o}, even if the files
are specified in the command line. A file is considered to match if any
@ -520,13 +543,30 @@ format is optional and defaults to @samp{00:00:00}. The epoch is
@w{@samp{1970-01-01 00:00:00 UTC}}. Negative seconds or years define a
modification time before the epoch.
@item --mount
Stay in local file system when creating archive; skip mount points and don't
descend below mount points. This is useful when doing backups of complete
file systems.
@item --xdev
Stay in local file system when creating archive; archive the mount points
themselves, but don't descend below mount points. This is useful when doing
backups of complete file systems. If the function @samp{nftw} of the system
C library does not support the flag @samp{FTW_XDEV}, @option{--xdev} behaves
like @option{--mount}.
@item --out-slots=@var{n}
Number of @w{1 MiB} output packets buffered per worker thread during
multi-threaded creation or appending to compressed archives. Increasing the
multithreaded creation or appending to compressed archives. Increasing the
number of packets may increase compression speed if the files being archived
are larger than @w{64 MiB} compressed, but requires more memory. Valid
values range from 1 to 1024. The default value is 64.
@item --parallel
Use multithreading to create an uncompressed archive in parallel if the
number of threads is greater than 1. This is not the default because it uses
much more memory than sequential creation.
@item --warn-newer
During archive creation, warn if any file being archived has a modification
time newer than the archive creation time. This option may slow archive
@ -630,9 +670,9 @@ tarlz -df archive.tar.lz # check the archive
Once the integrity and accuracy of an archive have been verified as in the
example above, they can be verified again anywhere at any time with
@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
@option{-n0} because multi-threaded listing does not detect corruption in
the tar member data of multimember archives: @xref{mt-listing}.
@w{@samp{tarlz -t -n0}}. It is important to disable multithreading with
@option{-n0} because multithreaded listing does not detect corruption in the
tar member data of multimember archives: @xref{mt-listing}.
@example
tarlz -t -n0 -f archive.tar.lz > /dev/null
@ -648,6 +688,9 @@ just at a member boundary:
lzip -tv archive.tar.lz
@end example
The probability of truncation happening at a member boundary is
@w{(members - 1) / compressed_size}, usually one in several million.
@node Portable character set
@chapter POSIX portable filename character set
@ -664,6 +707,8 @@ a b c d e f g h i j k l m n o p q r s t u v w x y z
The last three characters are the period, underscore, and hyphen-minus
characters, respectively.
Tarlz does not support file names containing newline characters.
File names are identifiers. Therefore, archiving works better when file
names use only the portable character set without spaces added.
@ -726,7 +771,7 @@ Zero or more blocks that contain the contents of the file.
Each tar member must be contiguously stored in a lzip member for the
parallel decoding operations like @option{--list} to work. If any tar member
is split over two or more lzip members, the archive must be decoded
sequentially. @xref{Multi-threaded decoding}.
sequentially. @xref{Multithreaded decoding}.
At the end of the archive file there are two 512-byte blocks filled with
binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
@ -1111,10 +1156,10 @@ accidental double UTF-8 conversions.
The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The interesting
parts described here are those related to multi-threaded processing.
parts described here are those related to multithreaded processing.
The structure of the part of tarlz performing multi-threaded archive
creation is somewhat similar to that of
The structure of the part of tarlz performing multithreaded archive creation
is somewhat similar to that of
@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
with the added complication of the solidity levels.
@ifnothtml
@ -1190,7 +1235,7 @@ some other worker requests mastership in a previous lzip member can this
error be avoided.
@node Multi-threaded decoding
@node Multithreaded decoding
@chapter Limitations of parallel tar decoding
@cindex parallel tar decoding
@ -1215,7 +1260,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
parallelized if the tar members are not aligned with the lzip members. Tar
archives compressed with plzip can't be decoded in parallel because tar and
plzip do not have a way to align both sets of members. Certainly one can
decompress one such archive with a multi-threaded tool like plzip, but the
decompress one such archive with a multithreaded tool like plzip, but the
increase in speed is not as large as it could be because plzip must
serialize the decompressed data and pass them to tar, which decodes them
sequentially, one tar member at a time.
@ -1228,13 +1273,13 @@ decoding it safely in parallel.
Tarlz is able to automatically decode aligned and unaligned multimember
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
misalignment during multithreaded decoding, it switches to single-threaded
mode and continues decoding the archive.
@anchor{mt-listing}
@section Multi-threaded listing
@section Multithreaded listing
If the files in the archive are large, multi-threaded @option{--list} on a
If the files in the archive are large, multithreaded @option{--list} on a
regular (seekable) tar.lz archive can be hundreds of times faster than
sequential @option{--list} because, in addition to using several processors,
it only needs to decompress part of each lzip member. See the following
@ -1247,7 +1292,7 @@ time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)
@end example
On the other hand, multi-threaded @option{--list} won't detect corruption in
On the other hand, multithreaded @option{--list} won't detect corruption in
the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. Partial decoding of a lzip member
can't guarantee the integrity of the data decoded. This is another reason
@ -1255,12 +1300,12 @@ why the tar headers (including the extended records) must provide their own
integrity checking.
@anchor{mt-extraction}
@section Limitations of multi-threaded extraction
@section Limitations of multithreaded extraction
Multi-threaded extraction may produce different output than single-threaded
Multithreaded extraction may produce different output than single-threaded
extraction in some cases:
During multi-threaded extraction, several independent threads are
During multithreaded extraction, several independent threads are
simultaneously reading the archive and creating files in the file system.
The archive is not read sequentially. As a consequence, any error or
weirdness in the archive (like a corrupt member or an end-of-archive block
@ -1270,7 +1315,7 @@ archive beyond that point has been processed.
If the archive contains two or more tar members with the same name,
single-threaded extraction extracts the members in the order they appear in
the archive and leaves in the file system the last version of the file. But
multi-threaded extraction may extract the members in any order and leave in
multithreaded extraction may extract the members in any order and leave in
the file system any version of the file nondeterministically. It is
unspecified which of the tar members is extracted.
@ -1283,12 +1328,12 @@ links to.
@node Minimum archive sizes
@chapter Minimum archive sizes required for multi-threaded block compression
@chapter Minimum archive sizes required for multithreaded block compression
@cindex minimum archive sizes
When creating or appending to a compressed archive using multi-threaded
block compression, tarlz puts tar members together in blocks and compresses
as many blocks simultaneously as worker threads are chosen, creating a
When creating or appending to a compressed archive using multithreaded block
compression, tarlz puts tar members together in blocks and compresses as
many blocks simultaneously as worker threads are chosen, creating a
multimember compressed archive.
For this to work as expected (and roughly multiply the compression speed by