Merging upstream version 0.28.1.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-06-25 03:37:17 +02:00
parent 9c81793bca
commit ca8e65110f
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
26 changed files with 1067 additions and 716 deletions

@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
-This manual is for Tarlz (version 0.27.1, 4 March 2025).
+This manual is for Tarlz (version 0.28.1, 24 June 2025).
* Menu:
@@ -23,8 +23,8 @@ This manual is for Tarlz (version 0.27.1, 4 March 2025).
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Program design:: Internal structure of tarlz
-* Multi-threaded decoding:: Limitations of parallel tar decoding
-* Minimum archive sizes:: Sizes required for full multi-threaded speed
+* Multithreaded decoding:: Limitations of parallel tar decoding
+* Minimum archive sizes:: Sizes required for full multithreaded speed
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@@ -41,7 +41,7 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
1 Introduction
**************
-Tarlz is a massively parallel (multi-threaded) combined implementation of
+Tarlz is a massively parallel (multithreaded) combined implementation of
the tar archiver and the lzip compressor. Tarlz uses the compression
library lzlib.
@@ -131,6 +131,7 @@ to '-1 --solid'.
tarlz supports the following operations:
'-?'
'--help'
Print an informative help message describing the options and exit.
@@ -155,7 +156,7 @@ tarlz supports the following operations:
Concatenating archives containing files in common results in two or
more tar members with the same name in the resulting archive, which
-may produce nondeterministic behavior during multi-threaded extraction.
+may produce nondeterministic behavior during multithreaded extraction.
*Note mt-extraction::.
'-c'
@@ -188,12 +189,8 @@ tarlz supports the following operations:
Even in the case of finding a corrupt member after having deleted some
member(s), tarlz stops and copies the rest of the file as soon as
corruption is found, leaving it just as corrupt as it was, but not
-worse.
-To delete a directory without deleting the files under it, use
-'tarlz --delete -f foo --exclude='dir/*' dir'. Deleting in place may
-be dangerous. A corrupt archive, a power cut, or an I/O error may cause
-data loss.
+worse. Deleting in place may be dangerous. A corrupt archive, a power
+cut, or an I/O error may cause data loss.
'-r'
'--append'
@@ -212,7 +209,7 @@ tarlz supports the following operations:
Appending files already present in the archive results in two or more
tar members with the same name, which may produce nondeterministic
-behavior during multi-threaded extraction. *Note mt-extraction::.
+behavior during multithreaded extraction. *Note mt-extraction::.
'-t'
'--list'
@@ -222,13 +219,11 @@ tarlz supports the following operations:
'-x'
'--extract'
Extract files from an archive. If FILES are given, extract only the
-FILES given. Else extract all the files in the archive. To extract a
-directory without extracting the files under it, use
-'tarlz -xf foo --exclude='dir/*' dir'. Tarlz removes files and empty
-directories unconditionally before extracting over them. Other than
-that, it does not make any special effort to extract a file over an
-incompatible type of file. For example, extracting a file over a
-non-empty directory usually fails. *Note mt-extraction::.
+FILES given. Else extract all the files in the archive. Tarlz removes
+files and empty directories unconditionally before extracting over
+them. Other than that, it does not make any special effort to extract
+a file over an incompatible type of file. For example, extracting a
+file over a non-empty directory usually fails. *Note mt-extraction::.
'-z'
'--compress'
@@ -269,6 +264,9 @@ tarlz supports the following operations:
value of LZ_API_VERSION (if defined). *Note Library version:
(lzlib)Library version.
+'--time-bits'
+Print the size of time_t in bits and exit.
tarlz supports the following options: *Note Argument syntax::.
@@ -286,16 +284,16 @@ tarlz supports the following options: *Note Argument syntax::.
Change to directory DIR. When creating, appending, comparing, or
extracting, the position of each option '-C' in the command line is
significant; it changes the current working directory for the following
-FILES until a new option '-C' appears in the command line. '--list'
-and '--delete' ignore any option '-C' specified. DIR is relative to
-the then current working directory, perhaps changed by a previous
-option '-C'.
+FILES (including those specified with option '-T') until a new option
+'-C' appears in the command line. '--list' and '--delete' ignore any
+option '-C' specified. DIR is relative to the then current working
+directory, perhaps changed by a previous option '-C'.
Note that a process can only have one current working directory (CWD).
-Therefore multi-threading can't be used to create or decode an archive
-if an option '-C' appears after a (relative) file name in the command
-line. (All file names are made relative by removing leading slashes
-when decoding).
+Therefore multithreading can't be used to create or decode an archive
+if an option '-C' appears in the command line after a (relative) file
+name or after an option '-T'. (All file names are made relative by
+removing leading slashes when decoding).
'-f ARCHIVE'
'--file=ARCHIVE'
@@ -315,7 +313,7 @@ tarlz supports the following options: *Note Argument syntax::.
support. A value of 0 disables threads entirely. If this option is not
used, tarlz tries to detect the number of processors in the system and
use it as default value. 'tarlz --help' shows the system's default
-value. See the note about multi-threading in the option '-C' above.
+value. See the note about multithreading in the option '-C' above.
Note that the number of usable threads is limited during compression to
ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
@@ -339,6 +337,25 @@ tarlz supports the following options: *Note Argument syntax::.
'--quiet'
Quiet operation. Suppress all messages.
+'-R'
+'--no-recursive'
+When creating or appending, don't descend recursively into
+directories. When decoding, process only the files and directories
+specified.
+'--recursive'
+Operate recursively on directories. This is the default.
+'-T FILE'
+'--files-from=FILE'
+When creating or appending, read from FILE the names of the files to
+be archived. When decoding, read from FILE the names of the members to
+be processed. Each name is terminated by a newline. This option can be
+used in combination with the option '-R' to read a list of files
+generated with the 'find' utility. A hyphen '-' used as the name of
+FILE reads the names from standard input. Multiple '-T' options can be
+specified.
'-v'
'--verbose'
Verbosely list files processed. Further -v's (up to 4) increase the
@@ -426,6 +443,10 @@ tarlz supports the following options: *Note Argument syntax::.
If GROUP is not a valid group name, it is decoded as a decimal numeric
group ID.
+'--depth'
+When creating or appending, archive all entries from each directory
+before archiving the directory itself.
'--exclude=PATTERN'
Exclude files matching a shell pattern like '*.o', even if the files
are specified in the command line. A file is considered to match if any
@@ -477,13 +498,29 @@ tarlz supports the following options: *Note Argument syntax::.
'1970-01-01 00:00:00 UTC'. Negative seconds or years define a
modification time before the epoch.
+'--mount'
+Stay in local file system when creating archive; skip mount points and
+don't descend below mount points. This is useful when doing backups of
+complete file systems.
+'--xdev'
+Stay in local file system when creating archive; archive the mount
+points themselves, but don't descend below mount points. This is
+useful when doing backups of complete file systems. If the function
+'nftw' of the system C library does not support the flag 'FTW_XDEV',
+'--xdev' behaves like '--mount'.
'--out-slots=N'
Number of 1 MiB output packets buffered per worker thread during
-multi-threaded creation or appending to compressed archives.
-Increasing the number of packets may increase compression speed if the
-files being archived are larger than 64 MiB compressed, but requires
-more memory. Valid values range from 1 to 1024. The default value is
-64.
+multithreaded creation or appending to compressed archives. Increasing
+the number of packets may increase compression speed if the files
+being archived are larger than 64 MiB compressed, but requires more
+memory. Valid values range from 1 to 1024. The default value is 64.
+'--parallel'
+Use multithreading to create an uncompressed archive in parallel if the
+number of threads is greater than 1. This is not the default because
+it uses much more memory than sequential creation.
'--warn-newer'
During archive creation, warn if any file being archived has a
@@ -575,8 +612,8 @@ compares the files in the archive with the files in the file system:
Once the integrity and accuracy of an archive have been verified as in
the example above, they can be verified again anywhere at any time with
-'tarlz -t -n0'. It is important to disable multi-threading with '-n0'
-because multi-threaded listing does not detect corruption in the tar member
+'tarlz -t -n0'. It is important to disable multithreading with '-n0'
+because multithreaded listing does not detect corruption in the tar member
data of multimember archives: *Note mt-listing::.
tarlz -t -n0 -f archive.tar.lz > /dev/null
@@ -589,6 +626,9 @@ at a member boundary:
lzip -tv archive.tar.lz
+The probability of truncation happening at a member boundary is
+(members - 1) / compressed_size, usually one in several million.
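
As a rough worked example of the figure added in this hunk, the following Python sketch evaluates (members - 1) / compressed_size. The function name and the sample archive (100 lzip members, 1 GiB compressed) are assumptions chosen for illustration; they do not come from the manual.

```python
def truncation_at_boundary_probability(members: int, compressed_size: int) -> float:
    """Chance that a truncation lands exactly on a boundary between lzip members.

    Assumption: the truncation point is a uniformly random byte offset
    in the compressed file.
    """
    if members < 1 or compressed_size < 1:
        raise ValueError("members and compressed_size must be positive")
    # A file with N members has N - 1 internal member boundaries.
    return (members - 1) / compressed_size


# Hypothetical archive: 100 members, 1 GiB compressed.
p = truncation_at_boundary_probability(100, 2**30)
print(f"about one in {round(1 / p):,}")
```

For this hypothetical archive the result is on the order of one in ten million, in line with the "one in several million" stated above.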

File: tarlz.info, Node: Portable character set, Next: File format, Prev: Creating backups safely, Up: Top
@@ -604,6 +644,8 @@ The set of characters from which portable file names are constructed.
The last three characters are the period, underscore, and hyphen-minus
characters, respectively.
+Tarlz does not support file names containing newline characters.
File names are identifiers. Therefore, archiving works better when file
names use only the portable character set without spaces added.
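
A minimal sketch of how names could be screened before archiving, assuming the portable character set is the POSIX one (letters, digits, period, underscore, hyphen-minus). The helper names are hypothetical and are not part of tarlz.

```python
import string

# POSIX portable filename character set: letters, digits, period,
# underscore, and hyphen-minus.
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_name(name: str) -> bool:
    """Return True if a single file name component uses only portable characters."""
    return bool(name) and all(ch in PORTABLE_CHARS for ch in name)


def unsupported_names(names):
    """Return the names containing a newline, which tarlz does not support."""
    return [n for n in names if "\n" in n]
```

`is_portable_name` checks one path component at a time, so a full path would need to be split on '/' first.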
@@ -657,7 +699,7 @@ following sequence:
Each tar member must be contiguously stored in a lzip member for the
parallel decoding operations like '--list' to work. If any tar member is
split over two or more lzip members, the archive must be decoded
-sequentially. *Note Multi-threaded decoding::.
+sequentially. *Note Multithreaded decoding::.
At the end of the archive file there are two 512-byte blocks filled with
binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
@@ -1020,17 +1062,17 @@ without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions.

-File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
+File: tarlz.info, Node: Program design, Next: Multithreaded decoding, Prev: Amendments to pax format, Up: Top
8 Internal structure of tarlz
*****************************
The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The
-interesting parts described here are those related to multi-threaded
+interesting parts described here are those related to multithreaded
processing.
-The structure of the part of tarlz performing multi-threaded archive
+The structure of the part of tarlz performing multithreaded archive
creation is somewhat similar to that of plzip with the added complication
of the solidity levels. *Note Program design: (plzip)Program design. A
grouper thread and several worker threads are created, acting the main
@@ -1100,7 +1142,7 @@ some other worker requests mastership in a previous lzip member can this
error be avoided.

-File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
+File: tarlz.info, Node: Multithreaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
9 Limitations of parallel tar decoding
**************************************
@@ -1126,7 +1168,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
parallelized if the tar members are not aligned with the lzip members. Tar
archives compressed with plzip can't be decoded in parallel because tar and
plzip do not have a way to align both sets of members. Certainly one can
-decompress one such archive with a multi-threaded tool like plzip, but the
+decompress one such archive with a multithreaded tool like plzip, but the
increase in speed is not as large as it could be because plzip must
serialize the decompressed data and pass them to tar, which decodes them
sequentially, one tar member at a time.
@@ -1139,13 +1181,13 @@ possible decoding it safely in parallel.
Tarlz is able to automatically decode aligned and unaligned multimember
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
-misalignment during multi-threaded decoding, it switches to single-threaded
+misalignment during multithreaded decoding, it switches to single-threaded
mode and continues decoding the archive.
-9.1 Multi-threaded listing
-==========================
+9.1 Multithreaded listing
+=========================
-If the files in the archive are large, multi-threaded '--list' on a regular
+If the files in the archive are large, multithreaded '--list' on a regular
(seekable) tar.lz archive can be hundreds of times faster than sequential
'--list' because, in addition to using several processors, it only needs to
decompress part of each lzip member. See the following example listing the
@@ -1156,20 +1198,20 @@ Silesia corpus on a dual core machine:
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)
-On the other hand, multi-threaded '--list' won't detect corruption in
-the tar member data because it only decodes the part of each lzip member
+On the other hand, multithreaded '--list' won't detect corruption in the
+tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. Partial decoding of a lzip member
can't guarantee the integrity of the data decoded. This is another reason
why the tar headers (including the extended records) must provide their own
integrity checking.
-9.2 Limitations of multi-threaded extraction
-============================================
+9.2 Limitations of multithreaded extraction
+===========================================
-Multi-threaded extraction may produce different output than single-threaded
+Multithreaded extraction may produce different output than single-threaded
extraction in some cases:
-During multi-threaded extraction, several independent threads are
+During multithreaded extraction, several independent threads are
simultaneously reading the archive and creating files in the file system.
The archive is not read sequentially. As a consequence, any error or
weirdness in the archive (like a corrupt member or an end-of-archive block
@@ -1179,7 +1221,7 @@ archive beyond that point has been processed.
If the archive contains two or more tar members with the same name,
single-threaded extraction extracts the members in the order they appear in
the archive and leaves in the file system the last version of the file. But
-multi-threaded extraction may extract the members in any order and leave in
+multithreaded extraction may extract the members in any order and leave in
the file system any version of the file nondeterministically. It is
unspecified which of the tar members is extracted.
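
One way to spot archives affected by this limitation is to look for repeated member names before extracting in parallel. The sketch below is an illustration only: it assumes the names have already been collected into a list, for example one name per line from a sequential listing such as 'tarlz -t -n0', and the function name is hypothetical.

```python
from collections import Counter


def duplicate_member_names(names):
    """Return the member names that appear more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name in counts if counts[name] > 1]
```

If the returned list is not empty, extracting with '-n0' keeps the single-threaded behavior of leaving the last version of each file. The check only catches identical names; it does not detect different names that resolve to the same file in the file system.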
@@ -1191,14 +1233,14 @@ names resolve to the same file in the file system), the result is undefined.
links to.

-File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
+File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multithreaded decoding, Up: Top
-10 Minimum archive sizes required for multi-threaded block compression
-**********************************************************************
+10 Minimum archive sizes required for multithreaded block compression
+*********************************************************************
-When creating or appending to a compressed archive using multi-threaded
-block compression, tarlz puts tar members together in blocks and compresses
-as many blocks simultaneously as worker threads are chosen, creating a
+When creating or appending to a compressed archive using multithreaded block
+compression, tarlz puts tar members together in blocks and compresses as
+many blocks simultaneously as worker threads are chosen, creating a
multimember compressed archive.
For this to work as expected (and roughly multiply the compression speed
@@ -1334,7 +1376,7 @@ Concept index
* invoking: Invoking tarlz. (line 6)
* minimum archive sizes: Minimum archive sizes. (line 6)
* options: Invoking tarlz. (line 6)
-* parallel tar decoding: Multi-threaded decoding. (line 6)
+* parallel tar decoding: Multithreaded decoding. (line 6)
* portable character set: Portable character set. (line 6)
* program design: Program design. (line 6)
* usage: Invoking tarlz. (line 6)
@@ -1344,29 +1386,29 @@ Concept index

Tag Table:
Node: Top216
-Node: Introduction1354
-Node: Invoking tarlz4177
-Ref: --data-size13263
-Ref: --bsolid17922
-Ref: --missing-crc21530
-Node: Argument syntax23895
-Node: Creating backups safely25671
-Node: Portable character set28055
-Node: File format28707
-Ref: key_crc3235754
-Ref: ustar-uid-gid39050
-Ref: ustar-mtime39857
-Node: Amendments to pax format41864
-Ref: crc3242572
-Ref: flawed-compat43883
-Node: Program design47868
-Node: Multi-threaded decoding51795
-Ref: mt-listing54196
-Ref: mt-extraction55234
-Node: Minimum archive sizes56540
-Node: Examples58669
-Node: Problems61164
-Node: Concept index61719
+Node: Introduction1353
+Node: Invoking tarlz4175
+Ref: --data-size13086
+Ref: --bsolid18556
+Ref: --missing-crc22292
+Node: Argument syntax25400
+Node: Creating backups safely27176
+Node: Portable character set29691
+Node: File format30412
+Ref: key_crc3237458
+Ref: ustar-uid-gid40754
+Ref: ustar-mtime41561
+Node: Amendments to pax format43568
+Ref: crc3244276
+Ref: flawed-compat45587
+Node: Program design49572
+Node: Multithreaded decoding53496
+Ref: mt-listing55894
+Ref: mt-extraction56928
+Node: Minimum archive sizes58229
+Node: Examples60354
+Node: Problems62849
+Node: Concept index63404

End Tag Table