Merging upstream version 0.28.1.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 9c81793bca
commit ca8e65110f
26 changed files with 1067 additions and 716 deletions
31
doc/tarlz.1
|
@ -1,13 +1,13 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
|
||||
.TH TARLZ "1" "March 2025" "tarlz 0.27.1" "User Commands"
|
||||
.TH TARLZ "1" "June 2025" "tarlz 0.28.1" "User Commands"
|
||||
.SH NAME
|
||||
tarlz \- creates tar archives with multimember lzip compression
|
||||
.SH SYNOPSIS
|
||||
.B tarlz
|
||||
\fI\,operation \/\fR[\fI\,options\/\fR] [\fI\,files\/\fR]
|
||||
.SH DESCRIPTION
|
||||
Tarlz is a massively parallel (multi\-threaded) combined implementation of
|
||||
the tar archiver and the lzip compressor. Tarlz uses the compression library
|
||||
Tarlz is a massively parallel (multithreaded) combined implementation of the
|
||||
tar archiver and the lzip compressor. Tarlz uses the compression library
|
||||
lzlib.
|
||||
.PP
|
||||
Tarlz creates tar archives using a simplified and safer variant of the POSIX
|
||||
|
@ -30,7 +30,7 @@ recover as much data as possible from each damaged member, and lziprecover
|
|||
can be used to recover some of the damaged members.
|
||||
.SS "Operations:"
|
||||
.TP
|
||||
\fB\-\-help\fR
|
||||
\-?, \fB\-\-help\fR
|
||||
display this help and exit
|
||||
.TP
|
||||
\fB\-V\fR, \fB\-\-version\fR
|
||||
|
@ -62,6 +62,9 @@ compress existing POSIX tar archives
|
|||
.TP
|
||||
\fB\-\-check\-lib\fR
|
||||
check version of lzlib and exit
|
||||
.TP
|
||||
\fB\-\-time\-bits\fR
|
||||
print the size of time_t in bits and exit
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\fB\-B\fR, \fB\-\-data\-size=\fR<bytes>
|
||||
|
@ -88,6 +91,15 @@ don't subtract the umask on extraction
|
|||
\fB\-q\fR, \fB\-\-quiet\fR
|
||||
suppress all messages
|
||||
.TP
|
||||
\fB\-R\fR, \fB\-\-no\-recursive\fR
|
||||
don't operate recursively on directories
|
||||
.TP
|
||||
\fB\-\-recursive\fR
|
||||
operate recursively on directories (default)
|
||||
.TP
|
||||
\fB\-T\fR, \fB\-\-files\-from=\fR<file>
|
||||
get file names from <file>
|
||||
.TP
|
||||
\fB\-v\fR, \fB\-\-verbose\fR
|
||||
verbosely list files processed
|
||||
.TP
|
||||
|
@ -95,7 +107,7 @@ verbosely list files processed
|
|||
set compression level [default 6]
|
||||
.TP
|
||||
\fB\-\-uncompressed\fR
|
||||
don't compress the archive created
|
||||
create an uncompressed archive
|
||||
.TP
|
||||
\fB\-\-asolid\fR
|
||||
create solidly compressed appendable archive
|
||||
|
@ -121,6 +133,9 @@ use <owner> name/ID for files added to archive
|
|||
\fB\-\-group=\fR<group>
|
||||
use <group> name/ID for files added to archive
|
||||
.TP
|
||||
\fB\-\-depth\fR
|
||||
archive entries before the directory itself
|
||||
.TP
|
||||
\fB\-\-exclude=\fR<pattern>
|
||||
exclude files matching a shell pattern
|
||||
.TP
|
||||
|
@ -139,12 +154,18 @@ don't delete partially extracted files
|
|||
\fB\-\-missing\-crc\fR
|
||||
exit with error status if missing extended CRC
|
||||
.TP
|
||||
\fB\-\-mount\fR, \fB\-\-xdev\fR
|
||||
stay in local file system when creating archive
|
||||
.TP
|
||||
\fB\-\-mtime=\fR<date>
|
||||
use <date> as mtime for files added to archive
|
||||
.TP
|
||||
\fB\-\-out\-slots=\fR<n>
|
||||
number of 1 MiB output packets buffered [64]
|
||||
.TP
|
||||
\fB\-\-parallel\fR
|
||||
create uncompressed archive in parallel
|
||||
.TP
|
||||
\fB\-\-warn\-newer\fR
|
||||
warn if any file is newer than the archive
|
||||
.PP
|
||||
|
|
206
doc/tarlz.info
|
@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Tarlz Manual
|
||||
************
|
||||
|
||||
This manual is for Tarlz (version 0.27.1, 4 March 2025).
|
||||
This manual is for Tarlz (version 0.28.1, 24 June 2025).
|
||||
|
||||
* Menu:
|
||||
|
||||
|
@ -23,8 +23,8 @@ This manual is for Tarlz (version 0.27.1, 4 March 2025).
|
|||
* File format:: Detailed format of the compressed archive
|
||||
* Amendments to pax format:: The reasons for the differences with pax
|
||||
* Program design:: Internal structure of tarlz
|
||||
* Multi-threaded decoding:: Limitations of parallel tar decoding
|
||||
* Minimum archive sizes:: Sizes required for full multi-threaded speed
|
||||
* Multithreaded decoding:: Limitations of parallel tar decoding
|
||||
* Minimum archive sizes:: Sizes required for full multithreaded speed
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
|
@ -41,7 +41,7 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
|
|||
1 Introduction
|
||||
**************
|
||||
|
||||
Tarlz is a massively parallel (multi-threaded) combined implementation of
|
||||
Tarlz is a massively parallel (multithreaded) combined implementation of
|
||||
the tar archiver and the lzip compressor. Tarlz uses the compression
|
||||
library lzlib.
|
||||
|
||||
|
@ -131,6 +131,7 @@ to '-1 --solid'.
|
|||
|
||||
tarlz supports the following operations:
|
||||
|
||||
'-?'
|
||||
'--help'
|
||||
Print an informative help message describing the options and exit.
|
||||
|
||||
|
@ -155,7 +156,7 @@ tarlz supports the following operations:
|
|||
|
||||
Concatenating archives containing files in common results in two or
|
||||
more tar members with the same name in the resulting archive, which
|
||||
may produce nondeterministic behavior during multi-threaded extraction.
|
||||
may produce nondeterministic behavior during multithreaded extraction.
|
||||
*Note mt-extraction::.
|
||||
|
||||
'-c'
|
||||
|
@ -188,12 +189,8 @@ tarlz supports the following operations:
|
|||
Even in the case of finding a corrupt member after having deleted some
|
||||
member(s), tarlz stops and copies the rest of the file as soon as
|
||||
corruption is found, leaving it just as corrupt as it was, but not
|
||||
worse.
|
||||
|
||||
To delete a directory without deleting the files under it, use
|
||||
'tarlz --delete -f foo --exclude='dir/*' dir'. Deleting in place may
|
||||
be dangerous. A corrupt archive, a power cut, or an I/O error may cause
|
||||
data loss.
|
||||
worse. Deleting in place may be dangerous. A corrupt archive, a power
|
||||
cut, or an I/O error may cause data loss.
|
||||
|
||||
'-r'
|
||||
'--append'
|
||||
|
@ -212,7 +209,7 @@ tarlz supports the following operations:
|
|||
|
||||
Appending files already present in the archive results in two or more
|
||||
tar members with the same name, which may produce nondeterministic
|
||||
behavior during multi-threaded extraction. *Note mt-extraction::.
|
||||
behavior during multithreaded extraction. *Note mt-extraction::.
|
||||
|
||||
'-t'
|
||||
'--list'
|
||||
|
@ -222,13 +219,11 @@ tarlz supports the following operations:
|
|||
'-x'
|
||||
'--extract'
|
||||
Extract files from an archive. If FILES are given, extract only the
|
||||
FILES given. Else extract all the files in the archive. To extract a
|
||||
directory without extracting the files under it, use
|
||||
'tarlz -xf foo --exclude='dir/*' dir'. Tarlz removes files and empty
|
||||
directories unconditionally before extracting over them. Other than
|
||||
that, it does not make any special effort to extract a file over an
|
||||
incompatible type of file. For example, extracting a file over a
|
||||
non-empty directory usually fails. *Note mt-extraction::.
|
||||
FILES given. Else extract all the files in the archive. Tarlz removes
|
||||
files and empty directories unconditionally before extracting over
|
||||
them. Other than that, it does not make any special effort to extract
|
||||
a file over an incompatible type of file. For example, extracting a
|
||||
file over a non-empty directory usually fails. *Note mt-extraction::.
|
||||
|
||||
'-z'
|
||||
'--compress'
|
||||
|
@ -269,6 +264,9 @@ tarlz supports the following operations:
|
|||
value of LZ_API_VERSION (if defined). *Note Library version:
|
||||
(lzlib)Library version.
|
||||
|
||||
'--time-bits'
|
||||
Print the size of time_t in bits and exit.
|
||||
|
||||
|
||||
tarlz supports the following options: *Note Argument syntax::.
|
||||
|
||||
|
@ -286,16 +284,16 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
Change to directory DIR. When creating, appending, comparing, or
|
||||
extracting, the position of each option '-C' in the command line is
|
||||
significant; it changes the current working directory for the following
|
||||
FILES until a new option '-C' appears in the command line. '--list'
|
||||
and '--delete' ignore any option '-C' specified. DIR is relative to
|
||||
the then current working directory, perhaps changed by a previous
|
||||
option '-C'.
|
||||
FILES (including those specified with option '-T') until a new option
|
||||
'-C' appears in the command line. '--list' and '--delete' ignore any
|
||||
option '-C' specified. DIR is relative to the then current working
|
||||
directory, perhaps changed by a previous option '-C'.
|
||||
|
||||
Note that a process can only have one current working directory (CWD).
|
||||
Therefore multi-threading can't be used to create or decode an archive
|
||||
if an option '-C' appears after a (relative) file name in the command
|
||||
line. (All file names are made relative by removing leading slashes
|
||||
when decoding).
|
||||
Therefore multithreading can't be used to create or decode an archive
|
||||
if an option '-C' appears in the command line after a (relative) file
|
||||
name or after an option '-T'. (All file names are made relative by
|
||||
removing leading slashes when decoding).
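The rule stated above can be sketched as a small check. This is only an illustrative model, not tarlz's actual argument parser: the function name and the pre-parsed `(kind, value)` tuples are assumptions made for the example, and it covers archive creation only (when decoding, all names are treated as relative).

```python
from typing import Iterable, Tuple

def c_option_blocks_multithreading(args: Iterable[Tuple[str, str]]) -> bool:
    """Return True if a '-C' follows a relative file name or a '-T'.

    args is a hypothetical, already parsed command line: each item is
    ("C", dir), ("T", list_file), or ("name", file_name).
    """
    seen_cwd_dependent = False  # set once something depends on the CWD
    for kind, value in args:
        if kind == "C":
            if seen_cwd_dependent:
                return True
        elif kind == "T":
            seen_cwd_dependent = True
        elif kind == "name":
            # Absolute names do not depend on the current directory.
            if not value.startswith("/"):
                seen_cwd_dependent = True
        else:
            raise ValueError(f"unknown argument kind: {kind!r}")
    return False

print(c_option_blocks_multithreading([("C", "a"), ("name", "x"), ("name", "y")]))
print(c_option_blocks_multithreading([("name", "x"), ("C", "b"), ("name", "y")]))
```

Under this model, a single leading '-C' keeps multithreading available, while a '-C' placed between two relative names does not.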
|
||||
|
||||
'-f ARCHIVE'
|
||||
'--file=ARCHIVE'
|
||||
|
@ -315,7 +313,7 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
support. A value of 0 disables threads entirely. If this option is not
|
||||
used, tarlz tries to detect the number of processors in the system and
|
||||
use it as default value. 'tarlz --help' shows the system's default
|
||||
value. See the note about multi-threading in the option '-C' above.
|
||||
value. See the note about multithreading in the option '-C' above.
|
||||
|
||||
Note that the number of usable threads is limited during compression to
|
||||
ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
|
||||
|
@ -339,6 +337,25 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
'--quiet'
|
||||
Quiet operation. Suppress all messages.
|
||||
|
||||
'-R'
|
||||
'--no-recursive'
|
||||
When creating or appending, don't descend recursively into
|
||||
directories. When decoding, process only the files and directories
|
||||
specified.
|
||||
|
||||
'--recursive'
|
||||
Operate recursively on directories. This is the default.
|
||||
|
||||
'-T FILE'
|
||||
'--files-from=FILE'
|
||||
When creating or appending, read from FILE the names of the files to
|
||||
be archived. When decoding, read from FILE the names of the members to
|
||||
be processed. Each name is terminated by a newline. This option can be
|
||||
used in combination with the option '-R' to read a list of files
|
||||
generated with the 'find' utility. A hyphen '-' used as the name of
|
||||
FILE reads the names from standard input. Multiple '-T' options can be
|
||||
specified.
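As a minimal sketch of producing such a list, the helper below writes one newline-terminated name per line and skips names that cannot be represented in that format. The function name `write_names_file` and the output path are hypothetical; only the "one name per line" format comes from the text above.

```python
import os
from typing import Iterable, List

def write_names_file(names: Iterable[str], list_path: str) -> List[str]:
    """Write names to list_path, one per line; return the names skipped."""
    skipped = []
    with open(list_path, "wb") as out:
        for name in names:
            # A name containing a newline cannot be expressed in a
            # newline-terminated list, and an empty name is meaningless.
            if name == "" or "\n" in name:
                skipped.append(name)
                continue
            # os.fsencode keeps file names that are not valid UTF-8 intact.
            out.write(os.fsencode(name) + b"\n")
    return skipped
```

The resulting file would then be passed with something like 'tarlz -c -R -T names.txt -f archive.tar.lz', where 'names.txt' and 'archive.tar.lz' are placeholder names.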
|
||||
|
||||
'-v'
|
||||
'--verbose'
|
||||
Verbosely list files processed. Further -v's (up to 4) increase the
|
||||
|
@ -426,6 +443,10 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
If GROUP is not a valid group name, it is decoded as a decimal numeric
|
||||
group ID.
|
||||
|
||||
'--depth'
|
||||
When creating or appending, archive all entries from each directory
|
||||
before archiving the directory itself.
|
||||
|
||||
'--exclude=PATTERN'
|
||||
Exclude files matching a shell pattern like '*.o', even if the files
|
||||
are specified in the command line. A file is considered to match if any
|
||||
|
@ -477,13 +498,29 @@ tarlz supports the following options: *Note Argument syntax::.
|
|||
'1970-01-01 00:00:00 UTC'. Negative seconds or years define a
|
||||
modification time before the epoch.
|
||||
|
||||
'--mount'
|
||||
Stay in local file system when creating archive; skip mount points and
|
||||
don't descend below mount points. This is useful when doing backups of
|
||||
complete file systems.
|
||||
|
||||
'--xdev'
|
||||
Stay in local file system when creating archive; archive the mount
|
||||
points themselves, but don't descend below mount points. This is
|
||||
useful when doing backups of complete file systems. If the function
|
||||
'nftw' of the system C library does not support the flag 'FTW_XDEV',
|
||||
'--xdev' behaves like '--mount'.
|
||||
|
||||
'--out-slots=N'
|
||||
Number of 1 MiB output packets buffered per worker thread during
|
||||
multi-threaded creation or appending to compressed archives.
|
||||
Increasing the number of packets may increase compression speed if the
|
||||
files being archived are larger than 64 MiB compressed, but requires
|
||||
more memory. Valid values range from 1 to 1024. The default value is
|
||||
64.
|
||||
multithreaded creation or appending to compressed archives. Increasing
|
||||
the number of packets may increase compression speed if the files
|
||||
being archived are larger than 64 MiB compressed, but requires more
|
||||
memory. Valid values range from 1 to 1024. The default value is 64.
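To get a feel for the memory cost, the sketch below computes an upper bound on the output packets that can be buffered, assuming every slot of every worker is full at once. It is an estimate of packet buffers only, not of tarlz's total memory use, and the function name is an assumption made for the example.

```python
MIB = 1024 * 1024  # size of one output packet, per the description above

def max_buffered_output_bytes(worker_threads: int, out_slots: int = 64) -> int:
    """Upper bound on buffered output: threads * slots * 1 MiB."""
    if worker_threads < 1:
        raise ValueError("worker_threads must be at least 1")
    if not 1 <= out_slots <= 1024:
        raise ValueError("out_slots must be in the range 1 to 1024")
    return worker_threads * out_slots * MIB

# Hypothetical run: 4 worker threads with the default of 64 slots.
print(max_buffered_output_bytes(4) // MIB, "MiB")
```

With 4 workers and the default value this bound is 256 MiB; raising '--out-slots' to 1024 raises it to 4 GiB.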
|
||||
|
||||
'--parallel'
|
||||
Use multithreading to create an uncompressed archive in parallel if the
|
||||
number of threads is greater than 1. This is not the default because
|
||||
it uses much more memory than sequential creation.
|
||||
|
||||
'--warn-newer'
|
||||
During archive creation, warn if any file being archived has a
|
||||
|
@ -575,8 +612,8 @@ compares the files in the archive with the files in the file system:
|
|||
|
||||
Once the integrity and accuracy of an archive have been verified as in
|
||||
the example above, they can be verified again anywhere at any time with
|
||||
'tarlz -t -n0'. It is important to disable multi-threading with '-n0'
|
||||
because multi-threaded listing does not detect corruption in the tar member
|
||||
'tarlz -t -n0'. It is important to disable multithreading with '-n0'
|
||||
because multithreaded listing does not detect corruption in the tar member
|
||||
data of multimember archives: *Note mt-listing::.
|
||||
|
||||
tarlz -t -n0 -f archive.tar.lz > /dev/null
|
||||
|
@ -589,6 +626,9 @@ at a member boundary:
|
|||
|
||||
lzip -tv archive.tar.lz
|
||||
|
||||
The probability of truncation happening at a member boundary is
|
||||
(members - 1) / compressed_size, usually one in several million.
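As a rough worked example of this estimate, the sketch below evaluates (members - 1) / compressed_size for a hypothetical archive. The function name and the sample figures (16 members, 64 MiB compressed) are assumptions chosen for illustration, not values from the manual.

```python
from fractions import Fraction

def boundary_truncation_probability(members: int, compressed_size: int) -> Fraction:
    """Evaluate (members - 1) / compressed_size exactly."""
    if members < 1 or compressed_size < 1:
        raise ValueError("members and compressed_size must be positive")
    return Fraction(members - 1, compressed_size)

# Hypothetical archive: 16 lzip members, 64 MiB compressed.
p = boundary_truncation_probability(16, 64 * 1024 * 1024)
print(f"about 1 in {round(1 / p):,}")
```

For these figures the result is roughly one in 4.5 million, in line with the "one in several million" stated above.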
|
||||
|
||||
|
||||
File: tarlz.info, Node: Portable character set, Next: File format, Prev: Creating backups safely, Up: Top
|
||||
|
||||
|
@ -604,6 +644,8 @@ The set of characters from which portable file names are constructed.
|
|||
The last three characters are the period, underscore, and hyphen-minus
|
||||
characters, respectively.
|
||||
|
||||
Tarlz does not support file names containing newline characters.
|
||||
|
||||
File names are identifiers. Therefore, archiving works better when file
|
||||
names use only the portable character set without spaces added.
|
||||
|
||||
|
@ -657,7 +699,7 @@ following sequence:
|
|||
Each tar member must be contiguously stored in a lzip member for the
|
||||
parallel decoding operations like '--list' to work. If any tar member is
|
||||
split over two or more lzip members, the archive must be decoded
|
||||
sequentially. *Note Multi-threaded decoding::.
|
||||
sequentially. *Note Multithreaded decoding::.
|
||||
|
||||
At the end of the archive file there are two 512-byte blocks filled with
|
||||
binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
|
||||
|
@ -1020,17 +1062,17 @@ without conversion to UTF-8 nor any other transformation. This prevents
|
|||
accidental double UTF-8 conversions.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
|
||||
File: tarlz.info, Node: Program design, Next: Multithreaded decoding, Prev: Amendments to pax format, Up: Top
|
||||
|
||||
8 Internal structure of tarlz
|
||||
*****************************
|
||||
|
||||
The parts of tarlz related to sequential processing of the archive are more
|
||||
or less similar to any other tar and won't be described here. The
|
||||
interesting parts described here are those related to multi-threaded
|
||||
interesting parts described here are those related to multithreaded
|
||||
processing.
|
||||
|
||||
The structure of the part of tarlz performing multi-threaded archive
|
||||
The structure of the part of tarlz performing multithreaded archive
|
||||
creation is somewhat similar to that of plzip with the added complication
|
||||
of the solidity levels. *Note Program design: (plzip)Program design. A
|
||||
grouper thread and several worker threads are created, acting the main
|
||||
|
@ -1100,7 +1142,7 @@ some other worker requests mastership in a previous lzip member can this
|
|||
error be avoided.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
|
||||
File: tarlz.info, Node: Multithreaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
|
||||
|
||||
9 Limitations of parallel tar decoding
|
||||
**************************************
|
||||
|
@ -1126,7 +1168,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
|
|||
parallelized if the tar members are not aligned with the lzip members. Tar
|
||||
archives compressed with plzip can't be decoded in parallel because tar and
|
||||
plzip do not have a way to align both sets of members. Certainly one can
|
||||
decompress one such archive with a multi-threaded tool like plzip, but the
|
||||
decompress one such archive with a multithreaded tool like plzip, but the
|
||||
increase in speed is not as large as it could be because plzip must
|
||||
serialize the decompressed data and pass them to tar, which decodes them
|
||||
sequentially, one tar member at a time.
|
||||
|
@ -1139,13 +1181,13 @@ possible decoding it safely in parallel.
|
|||
|
||||
Tarlz is able to automatically decode aligned and unaligned multimember
|
||||
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
|
||||
misalignment during multi-threaded decoding, it switches to single-threaded
|
||||
misalignment during multithreaded decoding, it switches to single-threaded
|
||||
mode and continues decoding the archive.
|
||||
|
||||
9.1 Multi-threaded listing
|
||||
==========================
|
||||
9.1 Multithreaded listing
|
||||
=========================
|
||||
|
||||
If the files in the archive are large, multi-threaded '--list' on a regular
|
||||
If the files in the archive are large, multithreaded '--list' on a regular
|
||||
(seekable) tar.lz archive can be hundreds of times faster than sequential
|
||||
'--list' because, in addition to using several processors, it only needs to
|
||||
decompress part of each lzip member. See the following example listing the
|
||||
|
@ -1156,20 +1198,20 @@ Silesia corpus on a dual core machine:
|
|||
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
|
||||
time tarlz -tf silesia.tar.lz (0.020s)
|
||||
|
||||
On the other hand, multi-threaded '--list' won't detect corruption in
|
||||
the tar member data because it only decodes the part of each lzip member
|
||||
On the other hand, multithreaded '--list' won't detect corruption in the
|
||||
tar member data because it only decodes the part of each lzip member
|
||||
corresponding to the tar member header. Partial decoding of a lzip member
|
||||
can't guarantee the integrity of the data decoded. This is another reason
|
||||
why the tar headers (including the extended records) must provide their own
|
||||
integrity checking.
|
||||
|
||||
9.2 Limitations of multi-threaded extraction
|
||||
============================================
|
||||
9.2 Limitations of multithreaded extraction
|
||||
===========================================
|
||||
|
||||
Multi-threaded extraction may produce different output than single-threaded
|
||||
Multithreaded extraction may produce different output than single-threaded
|
||||
extraction in some cases:
|
||||
|
||||
During multi-threaded extraction, several independent threads are
|
||||
During multithreaded extraction, several independent threads are
|
||||
simultaneously reading the archive and creating files in the file system.
|
||||
The archive is not read sequentially. As a consequence, any error or
|
||||
weirdness in the archive (like a corrupt member or an end-of-archive block
|
||||
|
@ -1179,7 +1221,7 @@ archive beyond that point has been processed.
|
|||
If the archive contains two or more tar members with the same name,
|
||||
single-threaded extraction extracts the members in the order they appear in
|
||||
the archive and leaves in the file system the last version of the file. But
|
||||
multi-threaded extraction may extract the members in any order and leave in
|
||||
multithreaded extraction may extract the members in any order and leave in
|
||||
the file system any version of the file nondeterministically. It is
|
||||
unspecified which of the tar members is extracted.
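One way to spot this situation before extracting is to look for repeated names in a listing of the archive. The sketch below assumes the member names have already been collected into a list (for example from the output of 'tarlz -t -n0 -f archive.tar.lz'); the function name is hypothetical, and the check does not catch different names that resolve to the same file in the file system.

```python
from collections import Counter
from typing import Iterable, List

def duplicated_member_names(names: Iterable[str]) -> List[str]:
    """Return, in sorted order, every name that appears more than once."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)

# Hypothetical listing with 'a' stored three times and 'b' twice.
print(duplicated_member_names(["a", "b", "a", "c", "b", "a"]))
```

If the returned list is not empty, single-threaded extraction ('-n0') gives the deterministic "last version wins" behavior described above.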
|
||||
|
||||
|
@ -1191,14 +1233,14 @@ names resolve to the same file in the file system), the result is undefined.
|
|||
links to.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
|
||||
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multithreaded decoding, Up: Top
|
||||
|
||||
10 Minimum archive sizes required for multi-threaded block compression
|
||||
**********************************************************************
|
||||
10 Minimum archive sizes required for multithreaded block compression
|
||||
*********************************************************************
|
||||
|
||||
When creating or appending to a compressed archive using multi-threaded
|
||||
block compression, tarlz puts tar members together in blocks and compresses
|
||||
as many blocks simultaneously as worker threads are chosen, creating a
|
||||
When creating or appending to a compressed archive using multithreaded block
|
||||
compression, tarlz puts tar members together in blocks and compresses as
|
||||
many blocks simultaneously as worker threads are chosen, creating a
|
||||
multimember compressed archive.
|
||||
|
||||
For this to work as expected (and roughly multiply the compression speed
|
||||
|
@ -1334,7 +1376,7 @@ Concept index
|
|||
* invoking: Invoking tarlz. (line 6)
|
||||
* minimum archive sizes: Minimum archive sizes. (line 6)
|
||||
* options: Invoking tarlz. (line 6)
|
||||
* parallel tar decoding: Multi-threaded decoding. (line 6)
|
||||
* parallel tar decoding: Multithreaded decoding. (line 6)
|
||||
* portable character set: Portable character set. (line 6)
|
||||
* program design: Program design. (line 6)
|
||||
* usage: Invoking tarlz. (line 6)
|
||||
|
@ -1344,29 +1386,29 @@ Concept index
|
|||
|
||||
Tag Table:
|
||||
Node: Top216
|
||||
Node: Introduction1354
|
||||
Node: Invoking tarlz4177
|
||||
Ref: --data-size13263
|
||||
Ref: --bsolid17922
|
||||
Ref: --missing-crc21530
|
||||
Node: Argument syntax23895
|
||||
Node: Creating backups safely25671
|
||||
Node: Portable character set28055
|
||||
Node: File format28707
|
||||
Ref: key_crc3235754
|
||||
Ref: ustar-uid-gid39050
|
||||
Ref: ustar-mtime39857
|
||||
Node: Amendments to pax format41864
|
||||
Ref: crc3242572
|
||||
Ref: flawed-compat43883
|
||||
Node: Program design47868
|
||||
Node: Multi-threaded decoding51795
|
||||
Ref: mt-listing54196
|
||||
Ref: mt-extraction55234
|
||||
Node: Minimum archive sizes56540
|
||||
Node: Examples58669
|
||||
Node: Problems61164
|
||||
Node: Concept index61719
|
||||
Node: Introduction1353
|
||||
Node: Invoking tarlz4175
|
||||
Ref: --data-size13086
|
||||
Ref: --bsolid18556
|
||||
Ref: --missing-crc22292
|
||||
Node: Argument syntax25400
|
||||
Node: Creating backups safely27176
|
||||
Node: Portable character set29691
|
||||
Node: File format30412
|
||||
Ref: key_crc3237458
|
||||
Ref: ustar-uid-gid40754
|
||||
Ref: ustar-mtime41561
|
||||
Node: Amendments to pax format43568
|
||||
Ref: crc3244276
|
||||
Ref: flawed-compat45587
|
||||
Node: Program design49572
|
||||
Node: Multithreaded decoding53496
|
||||
Ref: mt-listing55894
|
||||
Ref: mt-extraction56928
|
||||
Node: Minimum archive sizes58229
|
||||
Node: Examples60354
|
||||
Node: Problems62849
|
||||
Node: Concept index63404
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|
147
doc/tarlz.texi
|
@ -6,8 +6,8 @@
|
|||
@finalout
|
||||
@c %**end of header
|
||||
|
||||
@set UPDATED 4 March 2025
|
||||
@set VERSION 0.27.1
|
||||
@set UPDATED 24 June 2025
|
||||
@set VERSION 0.28.1
|
||||
|
||||
@dircategory Archiving
|
||||
@direntry
|
||||
|
@ -44,8 +44,8 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
|
|||
* File format:: Detailed format of the compressed archive
|
||||
* Amendments to pax format:: The reasons for the differences with pax
|
||||
* Program design:: Internal structure of tarlz
|
||||
* Multi-threaded decoding:: Limitations of parallel tar decoding
|
||||
* Minimum archive sizes:: Sizes required for full multi-threaded speed
|
||||
* Multithreaded decoding:: Limitations of parallel tar decoding
|
||||
* Minimum archive sizes:: Sizes required for full multithreaded speed
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
|
@ -64,7 +64,7 @@ distribute, and modify it.
|
|||
@cindex introduction
|
||||
|
||||
@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
|
||||
(multi-threaded) combined implementation of the tar archiver and the
|
||||
(multithreaded) combined implementation of the tar archiver and the
|
||||
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
|
||||
compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
|
||||
|
||||
|
@ -171,7 +171,8 @@ equivalent to @w{@option{-1 --solid}}.
|
|||
tarlz supports the following operations:
|
||||
|
||||
@table @code
|
||||
@item --help
|
||||
@item -?
|
||||
@itemx --help
|
||||
Print an informative help message describing the options and exit.
|
||||
|
||||
@item -V
|
||||
|
@ -195,7 +196,7 @@ no @var{files} have been specified.
|
|||
|
||||
Concatenating archives containing files in common results in two or more tar
|
||||
members with the same name in the resulting archive, which may produce
|
||||
nondeterministic behavior during multi-threaded extraction.
|
||||
nondeterministic behavior during multithreaded extraction.
|
||||
@xref{mt-extraction}.
|
||||
|
||||
@item -c
|
||||
|
@ -226,12 +227,9 @@ not delete a tar member unless it is possible to do so. For example it won't
|
|||
try to delete a tar member that is not compressed individually. Even in the
|
||||
case of finding a corrupt member after having deleted some member(s), tarlz
|
||||
stops and copies the rest of the file as soon as corruption is found,
|
||||
leaving it just as corrupt as it was, but not worse.
|
||||
|
||||
To delete a directory without deleting the files under it, use
|
||||
@w{@samp{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
|
||||
may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
|
||||
data loss.
|
||||
leaving it just as corrupt as it was, but not worse. Deleting in place may
|
||||
be dangerous. A corrupt archive, a power cut, or an I/O error may cause data
|
||||
loss.
|
||||
|
||||
@item -r
|
||||
@itemx --append
|
||||
|
@ -250,7 +248,7 @@ if no @var{files} have been specified.
|
|||
|
||||
Appending files already present in the archive results in two or more tar
|
||||
members with the same name, which may produce nondeterministic behavior
|
||||
during multi-threaded extraction. @xref{mt-extraction}.
|
||||
during multithreaded extraction. @xref{mt-extraction}.
|
||||
|
||||
@item -t
|
||||
@itemx --list
|
||||
|
@ -260,13 +258,11 @@ List the contents of an archive. If @var{files} are given, list only the
|
|||
@item -x
|
||||
@itemx --extract
|
||||
Extract files from an archive. If @var{files} are given, extract only the
|
||||
@var{files} given. Else extract all the files in the archive. To extract a
|
||||
directory without extracting the files under it, use
|
||||
@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}. Tarlz removes files and
|
||||
empty directories unconditionally before extracting over them. Other than
|
||||
that, it does not make any special effort to extract a file over an
|
||||
incompatible type of file. For example, extracting a file over a non-empty
|
||||
directory usually fails. @xref{mt-extraction}.
|
||||
@var{files} given. Else extract all the files in the archive. Tarlz removes
|
||||
files and empty directories unconditionally before extracting over them.
|
||||
Other than that, it does not make any special effort to extract a file over
|
||||
an incompatible type of file. For example, extracting a file over a
|
||||
non-empty directory usually fails. @xref{mt-extraction}.
|
||||
|
||||
@item -z
|
||||
@itemx --compress
|
||||
|
@ -310,6 +306,9 @@ and the value of LZ_API_VERSION (if defined).
|
|||
@xref{Library version,,,lzlib}.
|
||||
@end ifnothtml
|
||||
|
||||
@item --time-bits
|
||||
Print the size of time_t in bits and exit.
|
||||
|
||||
@end table
|
||||
|
||||
@noindent
|
||||
|
@ -331,15 +330,17 @@ member large enough to contain the file.
|
|||
Change to directory @var{dir}. When creating, appending, comparing, or
|
||||
extracting, the position of each option @option{-C} in the command line is
|
||||
significant; it changes the current working directory for the following
|
||||
@var{files} until a new option @option{-C} appears in the command line.
|
||||
@option{--list} and @option{--delete} ignore any option @option{-C}
|
||||
specified. @var{dir} is relative to the then current working directory,
|
||||
perhaps changed by a previous option @option{-C}.
|
||||
@var{files} (including those specified with option @option{-T}) until a new
|
||||
option @option{-C} appears in the command line. @option{--list} and
|
||||
@option{--delete} ignore any option @option{-C} specified. @var{dir} is
|
||||
relative to the then current working directory, perhaps changed by a
|
||||
previous option @option{-C}.
|
||||
|
||||
Note that a process can only have one current working directory (CWD).
|
||||
Therefore multi-threading can't be used to create or decode an archive if an
|
||||
option @option{-C} appears after a (relative) file name in the command line.
|
||||
(All file names are made relative by removing leading slashes when decoding).
|
||||
Therefore multithreading can't be used to create or decode an archive if an
|
||||
option @option{-C} appears in the command line after a (relative) file name
|
||||
or after an option @option{-T}. (All file names are made relative by
|
||||
removing leading slashes when decoding).
|
||||
|
||||
@item -f @var{archive}
|
||||
@itemx --file=@var{archive}
|
||||
|
@ -358,7 +359,7 @@ Valid values range from 0 to as many as your system can support. A value
|
|||
of 0 disables threads entirely. If this option is not used, tarlz tries to
|
||||
detect the number of processors in the system and use it as default value.
|
||||
@w{@samp{tarlz --help}} shows the system's default value. See the note about
|
||||
multi-threading in the option @option{-C} above.
|
||||
multithreading in the option @option{-C} above.
|
||||
|
||||
Note that the number of usable threads is limited during compression to
|
||||
@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
|
||||
|
@ -382,6 +383,24 @@ permissions specified in the archive.
|
|||
@itemx --quiet
|
||||
Quiet operation. Suppress all messages.
|
||||
|
||||
@item -R
|
||||
@itemx --no-recursive
|
||||
When creating or appending, don't descend recursively into directories. When
|
||||
decoding, process only the files and directories specified.
|
||||
|
||||
@item --recursive
|
||||
Operate recursively on directories. This is the default.
|
||||
|
||||
@item -T @var{file}
|
||||
@itemx --files-from=@var{file}
|
||||
When creating or appending, read from @var{file} the names of the files to
|
||||
be archived. When decoding, read from @var{file} the names of the members to
|
||||
be processed. Each name is terminated by a newline. This option can be used
|
||||
in combination with the option @option{-R} to read a list of files generated
|
||||
with the @command{find} utility. A hyphen @samp{-} used as the name of
|
||||
@var{file} reads the names from standard input. Multiple @option{-T} options
|
||||
can be specified.
|
||||
|
||||
@item -v
|
||||
@itemx --verbose
|
||||
Verbosely list files processed. Further -v's (up to 4) increase the
|
||||
|
@ -468,6 +487,10 @@ When creating or appending, use @var{group} for files added to the archive.
|
|||
If @var{group} is not a valid group name, it is decoded as a decimal numeric
|
||||
group ID.
|
||||
|
||||
@item --depth
|
||||
When creating or appending, archive all entries from each directory before
|
||||
archiving the directory itself.
|
||||
|
||||
@item --exclude=@var{pattern}
|
||||
Exclude files matching a shell pattern like @file{*.o}, even if the files
|
||||
are specified in the command line. A file is considered to match if any
|
||||
|
@ -520,13 +543,30 @@ format is optional and defaults to @samp{00:00:00}. The epoch is
|
|||
@w{@samp{1970-01-01 00:00:00 UTC}}. Negative seconds or years define a
|
||||
modification time before the epoch.
|
||||
|
||||
@item --mount
|
||||
Stay in local file system when creating archive; skip mount points and don't
|
||||
descend below mount points. This is useful when doing backups of complete
|
||||
file systems.
|
||||
|
||||
@item --xdev
|
||||
Stay in local file system when creating archive; archive the mount points
|
||||
themselves, but don't descend below mount points. This is useful when doing
|
||||
backups of complete file systems. If the function @samp{nftw} of the system
|
||||
C library does not support the flag @samp{FTW_XDEV}, @option{--xdev} behaves
|
||||
like @option{--mount}.
|
||||
|
||||
@item --out-slots=@var{n}
|
||||
Number of @w{1 MiB} output packets buffered per worker thread during
|
||||
multi-threaded creation or appending to compressed archives. Increasing the
|
||||
multithreaded creation or appending to compressed archives. Increasing the
|
||||
number of packets may increase compression speed if the files being archived
|
||||
are larger than @w{64 MiB} compressed, but requires more memory. Valid
|
||||
values range from 1 to 1024. The default value is 64.
|
||||
|
||||
@item --parallel
|
||||
Use multithreading to create an uncompressed archive in parallel if the
|
||||
number of threads is greater than 1. This is not the default because it uses
|
||||
much more memory than sequential creation.
|
||||
|
||||
@item --warn-newer
|
||||
During archive creation, warn if any file being archived has a modification
|
||||
time newer than the archive creation time. This option may slow archive
|
||||
|
@ -630,9 +670,9 @@ tarlz -df archive.tar.lz # check the archive
|
|||
|
||||
Once the integrity and accuracy of an archive have been verified as in the
|
||||
example above, they can be verified again anywhere at any time with
|
||||
@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
|
||||
@option{-n0} because multi-threaded listing does not detect corruption in
|
||||
the tar member data of multimember archives: @xref{mt-listing}.
|
||||
@w{@samp{tarlz -t -n0}}. It is important to disable multithreading with
|
||||
@option{-n0} because multithreaded listing does not detect corruption in the
|
||||
tar member data of multimember archives: @xref{mt-listing}.
|
||||
|
||||
@example
|
||||
tarlz -t -n0 -f archive.tar.lz > /dev/null
|
||||
|
@ -648,6 +688,9 @@ just at a member boundary:
|
|||
lzip -tv archive.tar.lz
|
||||
@end example
|
||||
|
||||
The probability of truncation happening at a member boundary is
|
||||
@w{(members - 1) / compressed_size}, usually one in several million.
|
||||
|
||||
|
||||
@node Portable character set
|
||||
@chapter POSIX portable filename character set
|
||||
|
@ -664,6 +707,8 @@ a b c d e f g h i j k l m n o p q r s t u v w x y z
|
|||
The last three characters are the period, underscore, and hyphen-minus
|
||||
characters, respectively.
|
||||
|
||||
Tarlz does not support file names containing newline characters.
|
||||
|
||||
File names are identifiers. Therefore, archiving works better when file
|
||||
names use only the portable character set without spaces added.
|
||||
|
||||
|
@ -726,7 +771,7 @@ Zero or more blocks that contain the contents of the file.
|
|||
Each tar member must be contiguously stored in a lzip member for the
|
||||
parallel decoding operations like @option{--list} to work. If any tar member
|
||||
is split over two or more lzip members, the archive must be decoded
|
||||
sequentially. @xref{Multi-threaded decoding}.
|
||||
sequentially. @xref{Multithreaded decoding}.
|
||||
|
||||
At the end of the archive file there are two 512-byte blocks filled with
|
||||
binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
|
||||
|
@ -1111,10 +1156,10 @@ accidental double UTF-8 conversions.
|
|||
|
||||
The parts of tarlz related to sequential processing of the archive are more
|
||||
or less similar to any other tar and won't be described here. The interesting
|
||||
parts described here are those related to multi-threaded processing.
|
||||
parts described here are those related to multithreaded processing.
|
||||
|
||||
The structure of the part of tarlz performing multi-threaded archive
|
||||
creation is somewhat similar to that of
|
||||
The structure of the part of tarlz performing multithreaded archive creation
|
||||
is somewhat similar to that of
|
||||
@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
|
||||
with the added complication of the solidity levels.
|
||||
@ifnothtml
|
||||
|
@ -1190,7 +1235,7 @@ some other worker requests mastership in a previous lzip member can this
|
|||
error be avoided.
|
||||
|
||||
|
||||
@node Multi-threaded decoding
|
||||
@node Multithreaded decoding
|
||||
@chapter Limitations of parallel tar decoding
|
||||
@cindex parallel tar decoding
|
||||
|
||||
|
@ -1215,7 +1260,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
|
|||
parallelized if the tar members are not aligned with the lzip members. Tar
|
||||
archives compressed with plzip can't be decoded in parallel because tar and
|
||||
plzip do not have a way to align both sets of members. Certainly one can
|
||||
decompress one such archive with a multi-threaded tool like plzip, but the
|
||||
decompress one such archive with a multithreaded tool like plzip, but the
|
||||
increase in speed is not as large as it could be because plzip must
|
||||
serialize the decompressed data and pass them to tar, which decodes them
|
||||
sequentially, one tar member at a time.
|
||||
|
@ -1228,13 +1273,13 @@ decoding it safely in parallel.
|
|||
|
||||
Tarlz is able to automatically decode aligned and unaligned multimember
|
||||
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
|
||||
misalignment during multi-threaded decoding, it switches to single-threaded
|
||||
misalignment during multithreaded decoding, it switches to single-threaded
|
||||
mode and continues decoding the archive.
|
||||
|
||||
@anchor{mt-listing}
|
||||
@section Multi-threaded listing
|
||||
@section Multithreaded listing
|
||||
|
||||
If the files in the archive are large, multi-threaded @option{--list} on a
|
||||
If the files in the archive are large, multithreaded @option{--list} on a
|
||||
regular (seekable) tar.lz archive can be hundreds of times faster than
|
||||
sequential @option{--list} because, in addition to using several processors,
|
||||
it only needs to decompress part of each lzip member. See the following
|
||||
|
@ -1247,7 +1292,7 @@ time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
|
|||
time tarlz -tf silesia.tar.lz (0.020s)
|
||||
@end example
|
||||
|
||||
On the other hand, multi-threaded @option{--list} won't detect corruption in
|
||||
On the other hand, multithreaded @option{--list} won't detect corruption in
|
||||
the tar member data because it only decodes the part of each lzip member
|
||||
corresponding to the tar member header. Partial decoding of a lzip member
|
||||
can't guarantee the integrity of the data decoded. This is another reason
|
||||
|
@ -1255,12 +1300,12 @@ why the tar headers (including the extended records) must provide their own
|
|||
integrity checking.
|
||||
|
||||
@anchor{mt-extraction}
|
||||
@section Limitations of multi-threaded extraction
|
||||
@section Limitations of multithreaded extraction
|
||||
|
||||
Multi-threaded extraction may produce different output than single-threaded
|
||||
Multithreaded extraction may produce different output than single-threaded
|
||||
extraction in some cases:
|
||||
|
||||
During multi-threaded extraction, several independent threads are
|
||||
During multithreaded extraction, several independent threads are
|
||||
simultaneously reading the archive and creating files in the file system.
|
||||
The archive is not read sequentially. As a consequence, any error or
|
||||
weirdness in the archive (like a corrupt member or an end-of-archive block
|
||||
|
@ -1270,7 +1315,7 @@ archive beyond that point has been processed.
|
|||
If the archive contains two or more tar members with the same name,
|
||||
single-threaded extraction extracts the members in the order they appear in
|
||||
the archive and leaves in the file system the last version of the file. But
|
||||
multi-threaded extraction may extract the members in any order and leave in
|
||||
multithreaded extraction may extract the members in any order and leave in
|
||||
the file system any version of the file nondeterministically. It is
|
||||
unspecified which of the tar members is extracted.
|
||||
|
||||
|
@ -1283,12 +1328,12 @@ links to.
|
|||
|
||||
|
||||
@node Minimum archive sizes
|
||||
@chapter Minimum archive sizes required for multi-threaded block compression
|
||||
@chapter Minimum archive sizes required for multithreaded block compression
|
||||
@cindex minimum archive sizes
|
||||
|
||||
When creating or appending to a compressed archive using multi-threaded
|
||||
block compression, tarlz puts tar members together in blocks and compresses
|
||||
as many blocks simultaneously as worker threads are chosen, creating a
|
||||
When creating or appending to a compressed archive using multithreaded block
|
||||
compression, tarlz puts tar members together in blocks and compresses as
|
||||
many blocks simultaneously as worker threads are chosen, creating a
|
||||
multimember compressed archive.
|
||||
|
||||
For this to work as expected (and roughly multiply the compression speed by
|
||||
|
|