Merging upstream version 0.28.1.

Signed-off-by: Daniel Baumann <daniel@debian.org>
2025-06-25 03:37:17 +02:00
commit ca8e65110f
parent 9c81793bca
26 changed files with 1067 additions and 716 deletions
--- a/doc/tarlz.texi
+++ b/doc/tarlz.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header

-@set UPDATED 4 March 2025
-@set VERSION 0.27.1
+@set UPDATED 24 June 2025
+@set VERSION 0.28.1

@dircategory Archiving
@direntry
@@ -44,8 +44,8 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 * File format::               Detailed format of the compressed archive
 * Amendments to pax format::  The reasons for the differences with pax
 * Program design::            Internal structure of tarlz
-* Multi-threaded decoding::   Limitations of parallel tar decoding
-* Minimum archive sizes::     Sizes required for full multi-threaded speed
+* Multithreaded decoding::    Limitations of parallel tar decoding
+* Minimum archive sizes::     Sizes required for full multithreaded speed
 * Examples::                  A small tutorial with examples
 * Problems::                  Reporting bugs
 * Concept index::             Index of concepts
@@ -64,7 +64,7 @@ distribute, and modify it.
@cindex introduction

@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
-(multi-threaded) combined implementation of the tar archiver and the
+(multithreaded) combined implementation of the tar archiver and the
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
 compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.

@@ -171,7 +171,8 @@ equivalent to @w{@option{-1 --solid}}.
 tarlz supports the following operations:

@table @code
-@item --help
+@item -?
+@itemx --help
 Print an informative help message describing the options and exit.

@item -V
@@ -195,7 +196,7 @@ no @var{files} have been specified.

 Concatenating archives containing files in common results in two or more tar
 members with the same name in the resulting archive, which may produce
-nondeterministic behavior during multi-threaded extraction.
+nondeterministic behavior during multithreaded extraction.
@xref{mt-extraction}.

@item -c
@@ -226,12 +227,9 @@ not delete a tar member unless it is possible to do so. For example it won't
 try to delete a tar member that is not compressed individually. Even in the
 case of finding a corrupt member after having deleted some member(s), tarlz
 stops and copies the rest of the file as soon as corruption is found,
-leaving it just as corrupt as it was, but not worse.
-
-To delete a directory without deleting the files under it, use
-@w{@samp{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
-may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
-data loss.
+leaving it just as corrupt as it was, but not worse. Deleting in place may
+be dangerous. A corrupt archive, a power cut, or an I/O error may cause data
+loss.

@item -r
@itemx --append
@@ -250,7 +248,7 @@ if no @var{files} have been specified.

 Appending files already present in the archive results in two or more tar
 members with the same name, which may produce nondeterministic behavior
-during multi-threaded extraction. @xref{mt-extraction}.
+during multithreaded extraction. @xref{mt-extraction}.

@item -t
@itemx --list
@@ -260,13 +258,11 @@ List the contents of an archive. If @var{files} are given, list only the
@item -x
@itemx --extract
 Extract files from an archive. If @var{files} are given, extract only the
-@var{files} given. Else extract all the files in the archive. To extract a
-directory without extracting the files under it, use
-@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}. Tarlz removes files and
-empty directories unconditionally before extracting over them. Other than
-that, it does not make any special effort to extract a file over an
-incompatible type of file. For example, extracting a file over a non-empty
-directory usually fails. @xref{mt-extraction}.
+@var{files} given. Else extract all the files in the archive. Tarlz removes
+files and empty directories unconditionally before extracting over them.
+Other than that, it does not make any special effort to extract a file over
+an incompatible type of file. For example, extracting a file over a
+non-empty directory usually fails. @xref{mt-extraction}.

@item -z
@itemx --compress
@@ -310,6 +306,9 @@ and the value of LZ_API_VERSION (if defined).
@xref{Library version,,,lzlib}.
@end ifnothtml

+@item --time-bits
+Print the size of time_t in bits and exit.
+
@end table

@noindent
@@ -331,15 +330,17 @@ member large enough to contain the file.
 Change to directory @var{dir}. When creating, appending, comparing, or
 extracting, the position of each option @option{-C} in the command line is
 significant; it changes the current working directory for the following
-@var{files} until a new option @option{-C} appears in the command line.
-@option{--list} and @option{--delete} ignore any option @option{-C}
-specified. @var{dir} is relative to the then current working directory,
-perhaps changed by a previous option @option{-C}.
+@var{files} (including those specified with option @option{-T}) until a new
+option @option{-C} appears in the command line. @option{--list} and
+@option{--delete} ignore any option @option{-C} specified. @var{dir} is
+relative to the then current working directory, perhaps changed by a
+previous option @option{-C}.

 Note that a process can only have one current working directory (CWD).
-Therefore multi-threading can't be used to create or decode an archive if an
-option @option{-C} appears after a (relative) file name in the command line.
-(All file names are made relative by removing leading slashes when decoding).
+Therefore multithreading can't be used to create or decode an archive if an
+option @option{-C} appears in the command line after a (relative) file name
+or after an option @option{-T}. (All file names are made relative by
+removing leading slashes when decoding).
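As a rough illustration of the positional rule described in this hunk, the sketch below models how each -C changes the directory applied to the file names that follow it, and when the stated restriction on multithreading would apply during creation. It is a simplified model, not tarlz's implementation: the function name `resolve_operands` and the pre-parsed `(kind, value)` token list are assumptions made for the example.

```python
import posixpath


def resolve_operands(tokens, start_dir):
    """Model positional -C handling for a pre-parsed command line.

    tokens is a list of (kind, value) pairs, where kind is "-C", "-T",
    or "file". Returns (resolved_paths, multithreading_possible).
    This is an illustrative model only, not code taken from tarlz.
    """
    cwd = start_dir
    resolved = []
    seen_relative_name_or_T = False
    multithreading_possible = True
    for kind, value in tokens:
        if kind == "-C":
            # A -C after a relative file name or after -T rules out
            # multithreading, since a process has only one CWD.
            if seen_relative_name_or_T:
                multithreading_possible = False
            # dir is relative to the then current working directory.
            cwd = posixpath.normpath(posixpath.join(cwd, value))
        elif kind == "-T":
            seen_relative_name_or_T = True
        else:
            if not posixpath.isabs(value):
                seen_relative_name_or_T = True
            resolved.append(posixpath.normpath(posixpath.join(cwd, value)))
    return resolved, multithreading_possible
```

For instance, with the hypothetical tokens `-C a x -C b y` starting from `/home/u`, the model resolves `x` under `/home/u/a` and `y` under `/home/u/a/b`, and reports that multithreading is not possible because the second -C follows a relative file name.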

@item -f @var{archive}
@itemx --file=@var{archive}
@@ -358,7 +359,7 @@ Valid values range from 0 to as many as your system can support. A value
 of 0 disables threads entirely. If this option is not used, tarlz tries to
 detect the number of processors in the system and use it as default value.
@w{@samp{tarlz --help}} shows the system's default value. See the note about
-multi-threading in the option @option{-C} above.
+multithreading in the option @option{-C} above.

 Note that the number of usable threads is limited during compression to
@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
@@ -382,6 +383,24 @@ permissions specified in the archive.
@itemx --quiet
 Quiet operation. Suppress all messages.

+@item -R
+@itemx --no-recursive
+When creating or appending, don't descend recursively into directories. When
+decoding, process only the files and directories specified.
+
+@item --recursive
+Operate recursively on directories. This is the default.
+
+@item -T @var{file}
+@itemx --files-from=@var{file}
+When creating or appending, read from @var{file} the names of the files to
+be archived. When decoding, read from @var{file} the names of the members to
+be processed. Each name is terminated by a newline. This option can be used
+in combination with the option @option{-R} to read a list of files generated
+with the @command{find} utility. A hyphen @samp{-} used as the name of
+@var{file} reads the names from standard input. Multiple @option{-T} options
+can be specified.
+
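The added text says that -T reads newline-terminated names and can be combined with -R to consume a list produced by find. A minimal sketch of producing such a list follows, assuming a hypothetical helper `build_name_list` that selects regular files by suffix, much as `find root -type f -name '*.txt'` would. It rejects names containing a newline, because such a name cannot be represented in a newline-terminated list.

```python
import os


def build_name_list(root, suffix):
    """Return newline-terminated paths of regular files under root.

    Hypothetical helper for illustration; it only prepares the text
    that an option like -T would read and does not invoke tarlz.
    """
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not name.endswith(suffix) or not os.path.isfile(path):
                continue
            if "\n" in path:
                raise ValueError("file name contains a newline: %r" % path)
            lines.append(path + "\n")
    return "".join(lines)
```

The resulting text could be saved to a file and passed with something like `tarlz -c -R -T list.txt -f archive.tar.lz`. That command line is assembled from the options documented here and is untested.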
@item -v
@itemx --verbose
 Verbosely list files processed. Further -v's (up to 4) increase the
@@ -468,6 +487,10 @@ When creating or appending, use @var{group} for files added to the archive.
 If @var{group} is not a valid group name, it is decoded as a decimal numeric
 group ID.

+@item --depth
+When creating or appending, archive all entries from each directory before
+archiving the directory itself.
+
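The ordering that --depth describes (all entries of a directory before the directory itself) is a post-order traversal. The sketch below shows that order using Python's bottom-up `os.walk`. The function name `depth_order` is an assumption for the example, and this is not how tarlz itself walks the tree (the manual elsewhere mentions the C function `nftw`).

```python
import os


def depth_order(root):
    """List paths so that each directory comes after everything inside it."""
    ordered = []
    # topdown=False makes os.walk report a directory only after its
    # subdirectories have been reported.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        entries = sorted(filenames)
        # Symlinks to directories appear in dirnames but are not descended
        # into, so list them here as plain entries.
        entries += sorted(d for d in dirnames
                          if os.path.islink(os.path.join(dirpath, d)))
        ordered.extend(os.path.join(dirpath, e) for e in entries)
        ordered.append(dirpath)
    return ordered
```

For a tree containing `a.txt` and `sub/b.txt`, this gives `sub/b.txt`, `sub`, `a.txt`, and finally the root directory.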
@item --exclude=@var{pattern}
 Exclude files matching a shell pattern like @file{*.o}, even if the files
 are specified in the command line. A file is considered to match if any
@@ -520,13 +543,30 @@ format is optional and defaults to @samp{00:00:00}. The epoch is
@w{@samp{1970-01-01 00:00:00 UTC}}. Negative seconds or years define a
 modification time before the epoch.

+@item --mount
+Stay in local file system when creating archive; skip mount points and don't
+descend below mount points. This is useful when doing backups of complete
+file systems.
+
+@item --xdev
+Stay in local file system when creating archive; archive the mount points
+themselves, but don't descend below mount points. This is useful when doing
+backups of complete file systems. If the function @samp{nftw} of the system
+C library does not support the flag @samp{FTW_XDEV}, @option{--xdev} behaves
+like @option{--mount}.
+
@item --out-slots=@var{n}
 Number of @w{1 MiB} output packets buffered per worker thread during
-multi-threaded creation or appending to compressed archives. Increasing the
+multithreaded creation or appending to compressed archives. Increasing the
 number of packets may increase compression speed if the files being archived
 are larger than @w{64 MiB} compressed, but requires more memory. Valid
 values range from 1 to 1024. The default value is 64.

+@item --parallel
+Use multithreading to create an uncompressed archive in parallel if the
+number of threads is greater than 1. This is not the default because it uses
+much more memory than sequential creation.
+
@item --warn-newer
 During archive creation, warn if any file being archived has a modification
 time newer than the archive creation time. This option may slow archive
@@ -630,9 +670,9 @@ tarlz -df archive.tar.lz                 # check the archive

 Once the integrity and accuracy of an archive have been verified as in the
 example above, they can be verified again anywhere at any time with
-@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
-@option{-n0} because multi-threaded listing does not detect corruption in
-the tar member data of multimember archives: @xref{mt-listing}.
+@w{@samp{tarlz -t -n0}}. It is important to disable multithreading with
+@option{-n0} because multithreaded listing does not detect corruption in the
+tar member data of multimember archives: @xref{mt-listing}.

@example
 tarlz -t -n0 -f archive.tar.lz > /dev/null
@@ -648,6 +688,9 @@ just at a member boundary:
 lzip -tv archive.tar.lz
@end example

+The probability of truncation happening at a member boundary is
+@w{(members - 1) / compressed_size}, usually one in several million.
+
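To make the added formula concrete, here is a small worked computation of (members - 1) / compressed_size. The function name and the sample figures (101 lzip members, a compressed size of 100,000,000 bytes) are hypothetical values chosen for the arithmetic, not measurements.

```python
def boundary_truncation_probability(members, compressed_size):
    """Evaluate (members - 1) / compressed_size from the manual's formula.

    members is the number of lzip members in the archive and
    compressed_size is the archive size in bytes.
    """
    if members < 1 or compressed_size < 1:
        raise ValueError("members and compressed_size must be positive")
    return (members - 1) / compressed_size


# Hypothetical archive: 101 members, 100,000,000 bytes compressed.
p = boundary_truncation_probability(members=101, compressed_size=100_000_000)
print(f"{p:.1e}")  # prints 1.0e-06, i.e. one in a million
```

A single-member archive gives 0, since there is no internal member boundary at which a truncation could land.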

@node Portable character set
@chapter POSIX portable filename character set
@@ -664,6 +707,8 @@ a b c d e f g h i j k l m n o p q r s t u v w x y z
 The last three characters are the period, underscore, and hyphen-minus
 characters, respectively.

+Tarlz does not support file names containing newline characters.
+
 File names are identifiers. Therefore, archiving works better when file
 names use only the portable character set without spaces added.

@@ -726,7 +771,7 @@ Zero or more blocks that contain the contents of the file.
 Each tar member must be contiguously stored in a lzip member for the
 parallel decoding operations like @option{--list} to work. If any tar member
 is split over two or more lzip members, the archive must be decoded
-sequentially. @xref{Multi-threaded decoding}.
+sequentially. @xref{Multithreaded decoding}.

 At the end of the archive file there are two 512-byte blocks filled with
 binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
@@ -1111,10 +1156,10 @@ accidental double UTF-8 conversions.

 The parts of tarlz related to sequential processing of the archive are more
 or less similar to any other tar and won't be described here. The interesting
-parts described here are those related to multi-threaded processing.
+parts described here are those related to multithreaded processing.

-The structure of the part of tarlz performing multi-threaded archive
-creation is somewhat similar to that of
+The structure of the part of tarlz performing multithreaded archive creation
+is somewhat similar to that of
@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
 with the added complication of the solidity levels.
@ifnothtml
@@ -1190,7 +1235,7 @@ some other worker requests mastership in a previous lzip member can this
 error be avoided.


-@node Multi-threaded decoding
+@node Multithreaded decoding
@chapter Limitations of parallel tar decoding
@cindex parallel tar decoding

@@ -1215,7 +1260,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
 parallelized if the tar members are not aligned with the lzip members. Tar
 archives compressed with plzip can't be decoded in parallel because tar and
 plzip do not have a way to align both sets of members. Certainly one can
-decompress one such archive with a multi-threaded tool like plzip, but the
+decompress one such archive with a multithreaded tool like plzip, but the
 increase in speed is not as large as it could be because plzip must
 serialize the decompressed data and pass them to tar, which decodes them
 sequentially, one tar member at a time.
@@ -1228,13 +1273,13 @@ decoding it safely in parallel.

 Tarlz is able to automatically decode aligned and unaligned multimember
 tar.lz archives, keeping backwards compatibility. If tarlz finds a member
-misalignment during multi-threaded decoding, it switches to single-threaded
+misalignment during multithreaded decoding, it switches to single-threaded
 mode and continues decoding the archive.

@anchor{mt-listing}
-@section Multi-threaded listing
+@section Multithreaded listing

-If the files in the archive are large, multi-threaded @option{--list} on a
+If the files in the archive are large, multithreaded @option{--list} on a
 regular (seekable) tar.lz archive can be hundreds of times faster than
 sequential @option{--list} because, in addition to using several processors,
 it only needs to decompress part of each lzip member. See the following
@@ -1247,7 +1292,7 @@ time plzip -cd silesia.tar.lz | tar -tf -           (3.256s)
 time tarlz -tf silesia.tar.lz                       (0.020s)
@end example

-On the other hand, multi-threaded @option{--list} won't detect corruption in
+On the other hand, multithreaded @option{--list} won't detect corruption in
 the tar member data because it only decodes the part of each lzip member
 corresponding to the tar member header. Partial decoding of a lzip member
 can't guarantee the integrity of the data decoded. This is another reason
@@ -1255,12 +1300,12 @@ why the tar headers (including the extended records) must provide their own
 integrity checking.

@anchor{mt-extraction}
-@section Limitations of multi-threaded extraction
+@section Limitations of multithreaded extraction

-Multi-threaded extraction may produce different output than single-threaded
+Multithreaded extraction may produce different output than single-threaded
 extraction in some cases:

-During multi-threaded extraction, several independent threads are
+During multithreaded extraction, several independent threads are
 simultaneously reading the archive and creating files in the file system.
 The archive is not read sequentially. As a consequence, any error or
 weirdness in the archive (like a corrupt member or an end-of-archive block
@@ -1270,7 +1315,7 @@ archive beyond that point has been processed.
 If the archive contains two or more tar members with the same name,
 single-threaded extraction extracts the members in the order they appear in
 the archive and leaves in the file system the last version of the file. But
-multi-threaded extraction may extract the members in any order and leave in
+multithreaded extraction may extract the members in any order and leave in
 the file system any version of the file nondeterministically. It is
 unspecified which of the tar members is extracted.

@@ -1283,12 +1328,12 @@ links to.


@node Minimum archive sizes
-@chapter Minimum archive sizes required for multi-threaded block compression
+@chapter Minimum archive sizes required for multithreaded block compression
@cindex minimum archive sizes

-When creating or appending to a compressed archive using multi-threaded
-block compression, tarlz puts tar members together in blocks and compresses
-as many blocks simultaneously as worker threads are chosen, creating a
+When creating or appending to a compressed archive using multithreaded block
+compression, tarlz puts tar members together in blocks and compresses as
+many blocks simultaneously as worker threads are chosen, creating a
 multimember compressed archive.

 For this to work as expected (and roughly multiply the compression speed by