Merging upstream version 0.28.1.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 9c81793bca
commit ca8e65110f
26 changed files with 1067 additions and 716 deletions
206
doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
 Tarlz Manual
 ************
 
-This manual is for Tarlz (version 0.27.1, 4 March 2025).
+This manual is for Tarlz (version 0.28.1, 24 June 2025).
 
 * Menu:
 
@@ -23,8 +23,8 @@ This manual is for Tarlz (version 0.27.1, 4 March 2025).
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
 * Program design:: Internal structure of tarlz
-* Multi-threaded decoding:: Limitations of parallel tar decoding
-* Minimum archive sizes:: Sizes required for full multi-threaded speed
+* Multithreaded decoding:: Limitations of parallel tar decoding
+* Minimum archive sizes:: Sizes required for full multithreaded speed
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
@@ -41,7 +41,7 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
 1 Introduction
 **************
 
-Tarlz is a massively parallel (multi-threaded) combined implementation of
+Tarlz is a massively parallel (multithreaded) combined implementation of
 the tar archiver and the lzip compressor. Tarlz uses the compression
 library lzlib.
 
@@ -131,6 +131,7 @@ to '-1 --solid'.
 
 tarlz supports the following operations:
 
+'-?'
 '--help'
 Print an informative help message describing the options and exit.
 
@@ -155,7 +156,7 @@ tarlz supports the following operations:
 
 Concatenating archives containing files in common results in two or
 more tar members with the same name in the resulting archive, which
-may produce nondeterministic behavior during multi-threaded extraction.
+may produce nondeterministic behavior during multithreaded extraction.
 *Note mt-extraction::.
 
 '-c'
@@ -188,12 +189,8 @@ tarlz supports the following operations:
 Even in the case of finding a corrupt member after having deleted some
 member(s), tarlz stops and copies the rest of the file as soon as
 corruption is found, leaving it just as corrupt as it was, but not
-worse.
-
-To delete a directory without deleting the files under it, use
-'tarlz --delete -f foo --exclude='dir/*' dir'. Deleting in place may
-be dangerous. A corrupt archive, a power cut, or an I/O error may cause
-data loss.
+worse. Deleting in place may be dangerous. A corrupt archive, a power
+cut, or an I/O error may cause data loss.
 
 '-r'
 '--append'
@@ -212,7 +209,7 @@ tarlz supports the following operations:
 
 Appending files already present in the archive results in two or more
 tar members with the same name, which may produce nondeterministic
-behavior during multi-threaded extraction. *Note mt-extraction::.
+behavior during multithreaded extraction. *Note mt-extraction::.
 
 '-t'
 '--list'
@@ -222,13 +219,11 @@ tarlz supports the following operations:
 '-x'
 '--extract'
 Extract files from an archive. If FILES are given, extract only the
-FILES given. Else extract all the files in the archive. To extract a
-directory without extracting the files under it, use
-'tarlz -xf foo --exclude='dir/*' dir'. Tarlz removes files and empty
-directories unconditionally before extracting over them. Other than
-that, it does not make any special effort to extract a file over an
-incompatible type of file. For example, extracting a file over a
-non-empty directory usually fails. *Note mt-extraction::.
+FILES given. Else extract all the files in the archive. Tarlz removes
+files and empty directories unconditionally before extracting over
+them. Other than that, it does not make any special effort to extract
+a file over an incompatible type of file. For example, extracting a
+file over a non-empty directory usually fails. *Note mt-extraction::.
 
 '-z'
 '--compress'
@@ -269,6 +264,9 @@ tarlz supports the following operations:
 value of LZ_API_VERSION (if defined). *Note Library version:
 (lzlib)Library version.
 
+'--time-bits'
+Print the size of time_t in bits and exit.
+
 
 tarlz supports the following options: *Note Argument syntax::.
 
@@ -286,16 +284,16 @@ tarlz supports the following options: *Note Argument syntax::.
 Change to directory DIR. When creating, appending, comparing, or
 extracting, the position of each option '-C' in the command line is
 significant; it changes the current working directory for the following
-FILES until a new option '-C' appears in the command line. '--list'
-and '--delete' ignore any option '-C' specified. DIR is relative to
-the then current working directory, perhaps changed by a previous
-option '-C'.
+FILES (including those specified with option '-T') until a new option
+'-C' appears in the command line. '--list' and '--delete' ignore any
+option '-C' specified. DIR is relative to the then current working
+directory, perhaps changed by a previous option '-C'.
 
 Note that a process can only have one current working directory (CWD).
-Therefore multi-threading can't be used to create or decode an archive
-if an option '-C' appears after a (relative) file name in the command
-line. (All file names are made relative by removing leading slashes
-when decoding).
+Therefore multithreading can't be used to create or decode an archive
+if an option '-C' appears in the command line after a (relative) file
+name or after an option '-T'. (All file names are made relative by
+removing leading slashes when decoding).
 
 '-f ARCHIVE'
 '--file=ARCHIVE'
@@ -315,7 +313,7 @@ tarlz supports the following options: *Note Argument syntax::.
 support. A value of 0 disables threads entirely. If this option is not
 used, tarlz tries to detect the number of processors in the system and
 use it as default value. 'tarlz --help' shows the system's default
-value. See the note about multi-threading in the option '-C' above.
+value. See the note about multithreading in the option '-C' above.
 
 Note that the number of usable threads is limited during compression to
 ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
@@ -339,6 +337,25 @@ tarlz supports the following options: *Note Argument syntax::.
 '--quiet'
 Quiet operation. Suppress all messages.
 
+'-R'
+'--no-recursive'
+When creating or appending, don't descend recursively into
+directories. When decoding, process only the files and directories
+specified.
+
+'--recursive'
+Operate recursively on directories. This is the default.
+
+'-T FILE'
+'--files-from=FILE'
+When creating or appending, read from FILE the names of the files to
+be archived. When decoding, read from FILE the names of the members to
+be processed. Each name is terminated by a newline. This option can be
+used in combination with the option '-R' to read a list of files
+generated with the 'find' utility. A hyphen '-' used as the name of
+FILE reads the names from standard input. Multiple '-T' options can be
+specified.
+
 '-v'
 '--verbose'
 Verbosely list files processed. Further -v's (up to 4) increase the
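Note on the '-T' option added in the hunk above: the list it reads is plain text with one name per line, each terminated by a newline. The following Python sketch is only an illustration of producing such a list; it is not part of tarlz. The function name write_name_list is hypothetical, and the traversal merely imitates what a plain 'find' would print (every directory and every file under a starting point). Names containing a newline are skipped, since a newline is the terminator.

```python
import os


def write_name_list(root, list_path):
    """Write one name per line for every directory and file under root.

    Hypothetical helper: imitates the output of a plain 'find root'.
    Names containing a newline are skipped because a newline terminates
    each name in the list. Returns the names written, in order.
    """
    names = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order reproducible
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in sorted(filenames)]
        for name in candidates:
            if "\n" not in name:
                names.append(name)
    # surrogateescape keeps undecodable file name bytes intact on POSIX
    with open(list_path, "w", encoding="utf-8",
              errors="surrogateescape", newline="\n") as out:
        for name in names:
            out.write(name + "\n")
    return names
```

Because such a list names directories as well as the files inside them, it seems natural to combine it with '-R' as the manual suggests, so that each directory is archived as a single entry rather than descended into again.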
@@ -426,6 +443,10 @@ tarlz supports the following options: *Note Argument syntax::.
 If GROUP is not a valid group name, it is decoded as a decimal numeric
 group ID.
 
+'--depth'
+When creating or appending, archive all entries from each directory
+before archiving the directory itself.
+
 '--exclude=PATTERN'
 Exclude files matching a shell pattern like '*.o', even if the files
 are specified in the command line. A file is considered to match if any
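Note on '--depth' in the hunk above: the ordering it describes (all entries of a directory before the directory itself) is a post-order traversal. The Python sketch below only illustrates that ordering with os.walk; it does not claim to reproduce how tarlz walks the tree, and the function name depth_first_entries is hypothetical.

```python
import os


def depth_first_entries(root):
    """Yield every entry under root, contents before their directory.

    With topdown=False, os.walk reports a directory only after all of its
    subdirectories have been reported, which gives the post-order needed.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for f in sorted(filenames):
            yield os.path.join(dirpath, f)
        # By now every entry below dirpath has been yielded.
        yield dirpath
```

In this ordering the starting directory is always the last entry produced.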
@@ -477,13 +498,29 @@ tarlz supports the following options: *Note Argument syntax::.
 '1970-01-01 00:00:00 UTC'. Negative seconds or years define a
 modification time before the epoch.
 
+'--mount'
+Stay in local file system when creating archive; skip mount points and
+don't descend below mount points. This is useful when doing backups of
+complete file systems.
+
+'--xdev'
+Stay in local file system when creating archive; archive the mount
+points themselves, but don't descend below mount points. This is
+useful when doing backups of complete file systems. If the function
+'nftw' of the system C library does not support the flag 'FTW_XDEV',
+'--xdev' behaves like '--mount'.
+
 '--out-slots=N'
 Number of 1 MiB output packets buffered per worker thread during
-multi-threaded creation or appending to compressed archives.
-Increasing the number of packets may increase compression speed if the
-files being archived are larger than 64 MiB compressed, but requires
-more memory. Valid values range from 1 to 1024. The default value is
-64.
+multithreaded creation or appending to compressed archives. Increasing
+the number of packets may increase compression speed if the files
+being archived are larger than 64 MiB compressed, but requires more
+memory. Valid values range from 1 to 1024. The default value is 64.
 
+'--parallel'
+Use multithreading to create an uncompressed archive in parallel if the
+number of threads is greater than 1. This is not the default because
+it uses much more memory than sequential creation.
+
 '--warn-newer'
 During archive creation, warn if any file being archived has a
@@ -575,8 +612,8 @@ compares the files in the archive with the files in the file system:
 
 Once the integrity and accuracy of an archive have been verified as in
 the example above, they can be verified again anywhere at any time with
-'tarlz -t -n0'. It is important to disable multi-threading with '-n0'
-because multi-threaded listing does not detect corruption in the tar member
+'tarlz -t -n0'. It is important to disable multithreading with '-n0'
+because multithreaded listing does not detect corruption in the tar member
 data of multimember archives: *Note mt-listing::.
 
 tarlz -t -n0 -f archive.tar.lz > /dev/null
@@ -589,6 +626,9 @@ at a member boundary:
 
 lzip -tv archive.tar.lz
 
+The probability of truncation happening at a member boundary is
+(members - 1) / compressed_size, usually one in several million.
+
 
 File: tarlz.info, Node: Portable character set, Next: File format, Prev: Creating backups safely, Up: Top
 
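Note on the probability formula added in the hunk above: one way to read it is that a multimember archive has (members - 1) internal member boundaries among compressed_size possible truncation offsets, assuming every offset is equally likely. That reading is an assumption of this note, not a statement from the manual. The Python sketch below just evaluates the formula; the function name is hypothetical.

```python
def truncation_at_boundary_probability(members, compressed_size):
    """Evaluate (members - 1) / compressed_size from the manual.

    members:         number of lzip members in the archive (at least 1)
    compressed_size: size of the compressed archive in bytes (at least 1)
    """
    if members < 1 or compressed_size < 1:
        raise ValueError("members and compressed_size must be positive")
    return (members - 1) / compressed_size


# Example with made-up figures: 100 members in a 1 GiB archive.
p = truncation_at_boundary_probability(100, 2**30)
print(f"about one in {round(1 / p):,}")
```

For these made-up figures the result is on the order of one in ten million, consistent with the manual's "one in several million".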
@@ -604,6 +644,8 @@ The set of characters from which portable file names are constructed.
 The last three characters are the period, underscore, and hyphen-minus
 characters, respectively.
 
+Tarlz does not support file names containing newline characters.
+
 File names are identifiers. Therefore, archiving works better when file
 names use only the portable character set without spaces added.
 
@@ -657,7 +699,7 @@ following sequence:
 Each tar member must be contiguously stored in a lzip member for the
 parallel decoding operations like '--list' to work. If any tar member is
 split over two or more lzip members, the archive must be decoded
-sequentially. *Note Multi-threaded decoding::.
+sequentially. *Note Multithreaded decoding::.
 
 At the end of the archive file there are two 512-byte blocks filled with
 binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
@@ -1020,17 +1062,17 @@ without conversion to UTF-8 nor any other transformation. This prevents
 accidental double UTF-8 conversions.
 
 
-File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
+File: tarlz.info, Node: Program design, Next: Multithreaded decoding, Prev: Amendments to pax format, Up: Top
 
 8 Internal structure of tarlz
 *****************************
 
 The parts of tarlz related to sequential processing of the archive are more
 or less similar to any other tar and won't be described here. The
-interesting parts described here are those related to multi-threaded
+interesting parts described here are those related to multithreaded
 processing.
 
-The structure of the part of tarlz performing multi-threaded archive
+The structure of the part of tarlz performing multithreaded archive
 creation is somewhat similar to that of plzip with the added complication
 of the solidity levels. *Note Program design: (plzip)Program design. A
 grouper thread and several worker threads are created, acting the main
@@ -1100,7 +1142,7 @@ some other worker requests mastership in a previous lzip member can this
 error be avoided.
 
 
-File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
+File: tarlz.info, Node: Multithreaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
 
 9 Limitations of parallel tar decoding
 **************************************
@@ -1126,7 +1168,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
 parallelized if the tar members are not aligned with the lzip members. Tar
 archives compressed with plzip can't be decoded in parallel because tar and
 plzip do not have a way to align both sets of members. Certainly one can
-decompress one such archive with a multi-threaded tool like plzip, but the
+decompress one such archive with a multithreaded tool like plzip, but the
 increase in speed is not as large as it could be because plzip must
 serialize the decompressed data and pass them to tar, which decodes them
 sequentially, one tar member at a time.
@@ -1139,13 +1181,13 @@ possible decoding it safely in parallel.
 
 Tarlz is able to automatically decode aligned and unaligned multimember
 tar.lz archives, keeping backwards compatibility. If tarlz finds a member
-misalignment during multi-threaded decoding, it switches to single-threaded
+misalignment during multithreaded decoding, it switches to single-threaded
 mode and continues decoding the archive.
 
-9.1 Multi-threaded listing
-==========================
+9.1 Multithreaded listing
+=========================
 
-If the files in the archive are large, multi-threaded '--list' on a regular
+If the files in the archive are large, multithreaded '--list' on a regular
 (seekable) tar.lz archive can be hundreds of times faster than sequential
 '--list' because, in addition to using several processors, it only needs to
 decompress part of each lzip member. See the following example listing the
@@ -1156,20 +1198,20 @@ Silesia corpus on a dual core machine:
 time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
 time tarlz -tf silesia.tar.lz (0.020s)
 
-On the other hand, multi-threaded '--list' won't detect corruption in
-the tar member data because it only decodes the part of each lzip member
+On the other hand, multithreaded '--list' won't detect corruption in the
+tar member data because it only decodes the part of each lzip member
 corresponding to the tar member header. Partial decoding of a lzip member
 can't guarantee the integrity of the data decoded. This is another reason
 why the tar headers (including the extended records) must provide their own
 integrity checking.
 
-9.2 Limitations of multi-threaded extraction
-============================================
+9.2 Limitations of multithreaded extraction
+===========================================
 
-Multi-threaded extraction may produce different output than single-threaded
+Multithreaded extraction may produce different output than single-threaded
 extraction in some cases:
 
-During multi-threaded extraction, several independent threads are
+During multithreaded extraction, several independent threads are
 simultaneously reading the archive and creating files in the file system.
 The archive is not read sequentially. As a consequence, any error or
 weirdness in the archive (like a corrupt member or an end-of-archive block
@@ -1179,7 +1221,7 @@ archive beyond that point has been processed.
 If the archive contains two or more tar members with the same name,
 single-threaded extraction extracts the members in the order they appear in
 the archive and leaves in the file system the last version of the file. But
-multithreaded extraction may extract the members in any order and leave in
+multi-threaded extraction may extract the members in any order and leave in
 the file system any version of the file nondeterministically. It is
 unspecified which of the tar members is extracted.
 
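Note on the duplicate-name paragraph in the hunk above: whether an archive is exposed to this nondeterminism can be checked from a list of its member names. The Python sketch below is an illustration only. It assumes the caller already has the names as a list of strings (for example, collected from a sequential listing); both function names are hypothetical and nothing here calls tarlz.

```python
from collections import Counter


def duplicate_member_names(names):
    """Return the sorted member names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def last_version_wins(members):
    """Model single-threaded extraction of (name, data) pairs.

    Members are processed in archive order, so a later member with the
    same name replaces the earlier one, as the manual describes.
    """
    extracted = {}
    for name, data in members:
        extracted[name] = data
    return extracted
```

If duplicate_member_names returns an empty list, the member order cannot change which version of a file ends up in the file system; otherwise only sequential processing gives the "last version" result modeled by last_version_wins.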
@@ -1191,14 +1233,14 @@ names resolve to the same file in the file system), the result is undefined.
 links to.
 
 
-File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
+File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multithreaded decoding, Up: Top
 
-10 Minimum archive sizes required for multi-threaded block compression
-**********************************************************************
+10 Minimum archive sizes required for multithreaded block compression
+*********************************************************************
 
-When creating or appending to a compressed archive using multi-threaded
-block compression, tarlz puts tar members together in blocks and compresses
-as many blocks simultaneously as worker threads are chosen, creating a
+When creating or appending to a compressed archive using multithreaded block
+compression, tarlz puts tar members together in blocks and compresses as
+many blocks simultaneously as worker threads are chosen, creating a
 multimember compressed archive.
 
 For this to work as expected (and roughly multiply the compression speed
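Note on block compression in the hunk above: an earlier hunk of this diff states that the number of usable threads during compression is limited to ceil( uncompressed_size / data_size ). The Python sketch below evaluates that bound for given sizes. It is an illustration under stated assumptions: the function name is hypothetical, the sizes are passed in by the caller in bytes, and no default for data_size is assumed because none appears in this diff.

```python
def usable_compression_threads(uncompressed_size, data_size, requested_threads):
    """Upper bound on worker threads that can be kept busy while compressing.

    Applies ceil(uncompressed_size / data_size) from the manual, then caps
    the result at the number of threads requested. Integer arithmetic is
    used so that very large sizes do not lose precision.
    """
    if data_size < 1 or uncompressed_size < 0 or requested_threads < 0:
        raise ValueError("sizes and thread count must be non-negative, data_size positive")
    blocks = -(-uncompressed_size // data_size)  # integer ceiling division
    return min(requested_threads, blocks)


# Example with made-up figures: 10 MiB of data in 4 MiB blocks, 8 threads requested.
print(usable_compression_threads(10 * 2**20, 4 * 2**20, 8))
```

With these made-up figures only three blocks exist, so at most three of the eight requested threads have work to do.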
@@ -1334,7 +1376,7 @@ Concept index
 * invoking: Invoking tarlz. (line 6)
 * minimum archive sizes: Minimum archive sizes. (line 6)
 * options: Invoking tarlz. (line 6)
-* parallel tar decoding: Multi-threaded decoding. (line 6)
+* parallel tar decoding: Multithreaded decoding. (line 6)
 * portable character set: Portable character set. (line 6)
 * program design: Program design. (line 6)
 * usage: Invoking tarlz. (line 6)
@@ -1344,29 +1386,29 @@ Concept index
 
 Tag Table:
 Node: Top216
-Node: Introduction1354
-Node: Invoking tarlz4177
-Ref: --data-size13263
-Ref: --bsolid17922
-Ref: --missing-crc21530
-Node: Argument syntax23895
-Node: Creating backups safely25671
-Node: Portable character set28055
-Node: File format28707
-Ref: key_crc3235754
-Ref: ustar-uid-gid39050
-Ref: ustar-mtime39857
-Node: Amendments to pax format41864
-Ref: crc3242572
-Ref: flawed-compat43883
-Node: Program design47868
-Node: Multi-threaded decoding51795
-Ref: mt-listing54196
-Ref: mt-extraction55234
-Node: Minimum archive sizes56540
-Node: Examples58669
-Node: Problems61164
-Node: Concept index61719
+Node: Introduction1353
+Node: Invoking tarlz4175
+Ref: --data-size13086
+Ref: --bsolid18556
+Ref: --missing-crc22292
+Node: Argument syntax25400
+Node: Creating backups safely27176
+Node: Portable character set29691
+Node: File format30412
+Ref: key_crc3237458
+Ref: ustar-uid-gid40754
+Ref: ustar-mtime41561
+Node: Amendments to pax format43568
+Ref: crc3244276
+Ref: flawed-compat45587
+Node: Program design49572
+Node: Multithreaded decoding53496
+Ref: mt-listing55894
+Ref: mt-extraction56928
+Node: Minimum archive sizes58229
+Node: Examples60354
+Node: Problems62849
+Node: Concept index63404
 
 End Tag Table
 