
Merging upstream version 0.28.1.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-06-25 03:37:17 +02:00
parent 9c81793bca
commit ca8e65110f
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
26 changed files with 1067 additions and 716 deletions

@ -1,13 +1,13 @@
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.49.2.
.TH TARLZ "1" "March 2025" "tarlz 0.27.1" "User Commands"
.TH TARLZ "1" "June 2025" "tarlz 0.28.1" "User Commands"
.SH NAME
tarlz \- creates tar archives with multimember lzip compression
.SH SYNOPSIS
.B tarlz
\fI\,operation \/\fR[\fI\,options\/\fR] [\fI\,files\/\fR]
.SH DESCRIPTION
Tarlz is a massively parallel (multi\-threaded) combined implementation of
the tar archiver and the lzip compressor. Tarlz uses the compression library
Tarlz is a massively parallel (multithreaded) combined implementation of the
tar archiver and the lzip compressor. Tarlz uses the compression library
lzlib.
.PP
Tarlz creates tar archives using a simplified and safer variant of the POSIX
@ -30,7 +30,7 @@ recover as much data as possible from each damaged member, and lziprecover
can be used to recover some of the damaged members.
.SS "Operations:"
.TP
\fB\-\-help\fR
\-?, \fB\-\-help\fR
display this help and exit
.TP
\fB\-V\fR, \fB\-\-version\fR
@ -62,6 +62,9 @@ compress existing POSIX tar archives
.TP
\fB\-\-check\-lib\fR
check version of lzlib and exit
.TP
\fB\-\-time\-bits\fR
print the size of time_t in bits and exit
.SH OPTIONS
.TP
\fB\-B\fR, \fB\-\-data\-size=\fR<bytes>
@ -88,6 +91,15 @@ don't subtract the umask on extraction
\fB\-q\fR, \fB\-\-quiet\fR
suppress all messages
.TP
\fB\-R\fR, \fB\-\-no\-recursive\fR
don't operate recursively on directories
.TP
\fB\-\-recursive\fR
operate recursively on directories (default)
.TP
\fB\-T\fR, \fB\-\-files\-from=\fR<file>
get file names from <file>
.TP
\fB\-v\fR, \fB\-\-verbose\fR
verbosely list files processed
.TP
@ -95,7 +107,7 @@ verbosely list files processed
set compression level [default 6]
.TP
\fB\-\-uncompressed\fR
don't compress the archive created
create an uncompressed archive
.TP
\fB\-\-asolid\fR
create solidly compressed appendable archive
@ -121,6 +133,9 @@ use <owner> name/ID for files added to archive
\fB\-\-group=\fR<group>
use <group> name/ID for files added to archive
.TP
\fB\-\-depth\fR
archive entries before the directory itself
.TP
\fB\-\-exclude=\fR<pattern>
exclude files matching a shell pattern
.TP
@ -139,12 +154,18 @@ don't delete partially extracted files
\fB\-\-missing\-crc\fR
exit with error status if missing extended CRC
.TP
\fB\-\-mount\fR, \fB\-\-xdev\fR
stay in local file system when creating archive
.TP
\fB\-\-mtime=\fR<date>
use <date> as mtime for files added to archive
.TP
\fB\-\-out\-slots=\fR<n>
number of 1 MiB output packets buffered [64]
.TP
\fB\-\-parallel\fR
create uncompressed archive in parallel
.TP
\fB\-\-warn\-newer\fR
warn if any file is newer than the archive
.PP
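The '-T', '--files-from' entry above reads one name per line, taking names from standard input when the file is '-'. The info manual adds that several '-T' options may be given. The Python sketch below illustrates only that reading rule and is not tarlz code. The helper name 'read_file_list' is hypothetical. Skipping empty lines and decoding names as UTF-8 are assumptions, since the manual does not cover either case.

```python
import sys
from typing import Iterable, Iterator, TextIO


def _names_from_stream(stream: TextIO) -> Iterator[str]:
    # Each name is terminated by a newline; strip only that terminator so
    # spaces inside a name are preserved.
    for line in stream:
        name = line[:-1] if line.endswith("\n") else line
        if name:  # assumption: an empty line carries no name and is skipped
            yield name


def read_file_list(sources: Iterable[str], stdin: TextIO = sys.stdin) -> list[str]:
    """Collect names from several hypothetical '-T FILE' arguments, in order.

    A source of '-' reads the names from standard input.
    """
    names: list[str] = []
    for source in sources:
        if source == "-":
            names.extend(_names_from_stream(stdin))
        else:
            with open(source, encoding="utf-8") as f:  # assumption: UTF-8 names
                names.extend(_names_from_stream(f))
    return names
```

The manual pairs this option with '-R' for lists produced by 'find'. That would presumably look like 'find dir | tarlz -c -R -T - -f dir.tar.lz': find supplies every entry, and '-R' keeps tarlz from recursing into the listed directories a second time.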

@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
This manual is for Tarlz (version 0.27.1, 4 March 2025).
This manual is for Tarlz (version 0.28.1, 24 June 2025).
* Menu:
@ -23,8 +23,8 @@ This manual is for Tarlz (version 0.27.1, 4 March 2025).
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Program design:: Internal structure of tarlz
* Multi-threaded decoding:: Limitations of parallel tar decoding
* Minimum archive sizes:: Sizes required for full multi-threaded speed
* Multithreaded decoding:: Limitations of parallel tar decoding
* Minimum archive sizes:: Sizes required for full multithreaded speed
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@ -41,7 +41,7 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
1 Introduction
**************
Tarlz is a massively parallel (multi-threaded) combined implementation of
Tarlz is a massively parallel (multithreaded) combined implementation of
the tar archiver and the lzip compressor. Tarlz uses the compression
library lzlib.
@ -131,6 +131,7 @@ to '-1 --solid'.
tarlz supports the following operations:
'-?'
'--help'
Print an informative help message describing the options and exit.
@ -155,7 +156,7 @@ tarlz supports the following operations:
Concatenating archives containing files in common results in two or
more tar members with the same name in the resulting archive, which
may produce nondeterministic behavior during multi-threaded extraction.
may produce nondeterministic behavior during multithreaded extraction.
*Note mt-extraction::.
'-c'
@ -188,12 +189,8 @@ tarlz supports the following operations:
Even in the case of finding a corrupt member after having deleted some
member(s), tarlz stops and copies the rest of the file as soon as
corruption is found, leaving it just as corrupt as it was, but not
worse.
To delete a directory without deleting the files under it, use
'tarlz --delete -f foo --exclude='dir/*' dir'. Deleting in place may
be dangerous. A corrupt archive, a power cut, or an I/O error may cause
data loss.
worse. Deleting in place may be dangerous. A corrupt archive, a power
cut, or an I/O error may cause data loss.
'-r'
'--append'
@ -212,7 +209,7 @@ tarlz supports the following operations:
Appending files already present in the archive results in two or more
tar members with the same name, which may produce nondeterministic
behavior during multi-threaded extraction. *Note mt-extraction::.
behavior during multithreaded extraction. *Note mt-extraction::.
'-t'
'--list'
@ -222,13 +219,11 @@ tarlz supports the following operations:
'-x'
'--extract'
Extract files from an archive. If FILES are given, extract only the
FILES given. Else extract all the files in the archive. To extract a
directory without extracting the files under it, use
'tarlz -xf foo --exclude='dir/*' dir'. Tarlz removes files and empty
directories unconditionally before extracting over them. Other than
that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a
non-empty directory usually fails. *Note mt-extraction::.
FILES given. Else extract all the files in the archive. Tarlz removes
files and empty directories unconditionally before extracting over
them. Other than that, it does not make any special effort to extract
a file over an incompatible type of file. For example, extracting a
file over a non-empty directory usually fails. *Note mt-extraction::.
'-z'
'--compress'
@ -269,6 +264,9 @@ tarlz supports the following operations:
value of LZ_API_VERSION (if defined). *Note Library version:
(lzlib)Library version.
'--time-bits'
Print the size of time_t in bits and exit.
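A small calculation shows why the width reported by '--time-bits' matters. A signed time_t of N bits holds at most 2^(N-1) - 1 seconds after the epoch. The sketch below converts that limit to a UTC date. It is an illustration only and assumes that time_t is a signed count of seconds since 1970-01-01 00:00:00 UTC, as on common POSIX systems. The manual does not state this, and 'latest_time' is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_time(time_bits: int):
    """Latest instant a signed time_t of 'time_bits' bits can hold.

    Assumes a signed count of seconds since the epoch. Returns None when the
    limit lies beyond what Python's datetime can represent (year 9999).
    """
    max_seconds = 2 ** (time_bits - 1) - 1
    try:
        return EPOCH + timedelta(seconds=max_seconds)
    except OverflowError:
        return None  # e.g. a 64-bit time_t reaches far past year 9999


print(latest_time(32))  # the well-known 2038 limit of a 32-bit time_t
```

Under these assumptions, later modification times presumably cannot be represented on a system where the value printed is 32. A 64-bit time_t moves the limit far beyond any practical date.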
tarlz supports the following options: *Note Argument syntax::.
@ -286,16 +284,16 @@ tarlz supports the following options: *Note Argument syntax::.
Change to directory DIR. When creating, appending, comparing, or
extracting, the position of each option '-C' in the command line is
significant; it changes the current working directory for the following
FILES until a new option '-C' appears in the command line. '--list'
and '--delete' ignore any option '-C' specified. DIR is relative to
the then current working directory, perhaps changed by a previous
option '-C'.
FILES (including those specified with option '-T') until a new option
'-C' appears in the command line. '--list' and '--delete' ignore any
option '-C' specified. DIR is relative to the then current working
directory, perhaps changed by a previous option '-C'.
Note that a process can only have one current working directory (CWD).
Therefore multi-threading can't be used to create or decode an archive
if an option '-C' appears after a (relative) file name in the command
line. (All file names are made relative by removing leading slashes
when decoding).
Therefore multithreading can't be used to create or decode an archive
if an option '-C' appears in the command line after a (relative) file
name or after an option '-T'. (All file names are made relative by
removing leading slashes when decoding).
'-f ARCHIVE'
'--file=ARCHIVE'
@ -315,7 +313,7 @@ tarlz supports the following options: *Note Argument syntax::.
support. A value of 0 disables threads entirely. If this option is not
used, tarlz tries to detect the number of processors in the system and
use it as default value. 'tarlz --help' shows the system's default
value. See the note about multi-threading in the option '-C' above.
value. See the note about multithreading in the option '-C' above.
Note that the number of usable threads is limited during compression to
ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
@ -339,6 +337,25 @@ tarlz supports the following options: *Note Argument syntax::.
'--quiet'
Quiet operation. Suppress all messages.
'-R'
'--no-recursive'
When creating or appending, don't descend recursively into
directories. When decoding, process only the files and directories
specified.
'--recursive'
Operate recursively on directories. This is the default.
'-T FILE'
'--files-from=FILE'
When creating or appending, read from FILE the names of the files to
be archived. When decoding, read from FILE the names of the members to
be processed. Each name is terminated by a newline. This option can be
used in combination with the option '-R' to read a list of files
generated with the 'find' utility. A hyphen '-' used as the name of
FILE reads the names from standard input. Multiple '-T' options can be
specified.
'-v'
'--verbose'
Verbosely list files processed. Further -v's (up to 4) increase the
@ -426,6 +443,10 @@ tarlz supports the following options: *Note Argument syntax::.
If GROUP is not a valid group name, it is decoded as a decimal numeric
group ID.
'--depth'
When creating or appending, archive all entries from each directory
before archiving the directory itself.
'--exclude=PATTERN'
Exclude files matching a shell pattern like '*.o', even if the files
are specified in the command line. A file is considered to match if any
@ -477,13 +498,29 @@ tarlz supports the following options: *Note Argument syntax::.
'1970-01-01 00:00:00 UTC'. Negative seconds or years define a
modification time before the epoch.
'--mount'
Stay in local file system when creating archive; skip mount points and
don't descend below mount points. This is useful when doing backups of
complete file systems.
'--xdev'
Stay in local file system when creating archive; archive the mount
points themselves, but don't descend below mount points. This is
useful when doing backups of complete file systems. If the function
'nftw' of the system C library does not support the flag 'FTW_XDEV',
'--xdev' behaves like '--mount'.
'--out-slots=N'
Number of 1 MiB output packets buffered per worker thread during
multi-threaded creation or appending to compressed archives.
Increasing the number of packets may increase compression speed if the
files being archived are larger than 64 MiB compressed, but requires
more memory. Valid values range from 1 to 1024. The default value is
64.
multithreaded creation or appending to compressed archives. Increasing
the number of packets may increase compression speed if the files
being archived are larger than 64 MiB compressed, but requires more
memory. Valid values range from 1 to 1024. The default value is 64.
'--parallel'
Use multithreading to create an uncompressed archive in parallel if the
number of threads is greater than 1. This is not the default because
it uses much more memory than sequential creation.
'--warn-newer'
During archive creation, warn if any file being archived has a
@ -575,8 +612,8 @@ compares the files in the archive with the files in the file system:
Once the integrity and accuracy of an archive have been verified as in
the example above, they can be verified again anywhere at any time with
'tarlz -t -n0'. It is important to disable multi-threading with '-n0'
because multi-threaded listing does not detect corruption in the tar member
'tarlz -t -n0'. It is important to disable multithreading with '-n0'
because multithreaded listing does not detect corruption in the tar member
data of multimember archives: *Note mt-listing::.
tarlz -t -n0 -f archive.tar.lz > /dev/null
@ -589,6 +626,9 @@ at a member boundary:
lzip -tv archive.tar.lz
The probability of truncation happening at a member boundary is
(members - 1) / compressed_size, usually one in several million.
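The formula above can be checked with a worked example. An archive of M lzip members has M - 1 internal member boundaries. If the truncation point is equally likely to fall after any of the compressed_size bytes (an assumption the manual implies but does not state), the chance of hitting a boundary is (members - 1) / compressed_size. The member count and archive size below are hypothetical.

```python
def boundary_truncation_probability(members: int, compressed_size: int) -> float:
    """Chance that a uniformly random truncation lands on a member boundary.

    Implements (members - 1) / compressed_size: members - 1 internal
    boundaries among compressed_size possible cut points.
    """
    if members < 1 or compressed_size < 1:
        raise ValueError("need at least one member and one byte")
    return (members - 1) / compressed_size


# Hypothetical archive: 200 lzip members, 1 GiB compressed.
p = boundary_truncation_probability(200, 2 ** 30)
print(f"p = {p:.3g}, about one in {round(1 / p):,}")
```

For this archive the odds are roughly one in five million, consistent with "one in several million". A single-member archive has no internal boundary, so the probability is zero.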

File: tarlz.info, Node: Portable character set, Next: File format, Prev: Creating backups safely, Up: Top
@ -604,6 +644,8 @@ The set of characters from which portable file names are constructed.
The last three characters are the period, underscore, and hyphen-minus
characters, respectively.
Tarlz does not support file names containing newline characters.
File names are identifiers. Therefore, archiving works better when file
names use only the portable character set without spaces added.
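The character set above is easy to check mechanically. The sketch below reports the path components that use characters outside the POSIX portable filename set (letters, digits, period, underscore, hyphen-minus). It also flags names containing a newline, which the manual says tarlz does not support. Both helper names are hypothetical, and treating '/' only as a separator is an assumption.

```python
import string

# POSIX portable filename character set, as listed in the manual.
PORTABLE = frozenset(string.ascii_letters + string.digits + "._-")


def non_portable_components(path: str) -> list[str]:
    """Return the components of 'path' that use non-portable characters.

    '/' is treated as a separator between components, not as part of a name.
    """
    return [part for part in path.split("/")
            if part and not set(part) <= PORTABLE]


def has_newline(path: str) -> bool:
    # Per the manual, tarlz does not support file names containing newlines.
    return "\n" in path


print(non_portable_components("my docs/report 1.txt"))
```

A name like 'my docs' is still archivable. It just falls outside the portable set because of the space, which is what the manual's advice is about.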
@ -657,7 +699,7 @@ following sequence:
Each tar member must be contiguously stored in a lzip member for the
parallel decoding operations like '--list' to work. If any tar member is
split over two or more lzip members, the archive must be decoded
sequentially. *Note Multi-threaded decoding::.
sequentially. *Note Multithreaded decoding::.
At the end of the archive file there are two 512-byte blocks filled with
binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
@ -1020,17 +1062,17 @@ without conversion to UTF-8 nor any other transformation. This prevents
accidental double UTF-8 conversions.

File: tarlz.info, Node: Program design, Next: Multi-threaded decoding, Prev: Amendments to pax format, Up: Top
File: tarlz.info, Node: Program design, Next: Multithreaded decoding, Prev: Amendments to pax format, Up: Top
8 Internal structure of tarlz
*****************************
The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The
interesting parts described here are those related to multi-threaded
interesting parts described here are those related to multithreaded
processing.
The structure of the part of tarlz performing multi-threaded archive
The structure of the part of tarlz performing multithreaded archive
creation is somewhat similar to that of plzip with the added complication
of the solidity levels. *Note Program design: (plzip)Program design. A
grouper thread and several worker threads are created, acting the main
@ -1100,7 +1142,7 @@ some other worker requests mastership in a previous lzip member can this
error be avoided.

File: tarlz.info, Node: Multi-threaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
File: tarlz.info, Node: Multithreaded decoding, Next: Minimum archive sizes, Prev: Program design, Up: Top
9 Limitations of parallel tar decoding
**************************************
@ -1126,7 +1168,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
parallelized if the tar members are not aligned with the lzip members. Tar
archives compressed with plzip can't be decoded in parallel because tar and
plzip do not have a way to align both sets of members. Certainly one can
decompress one such archive with a multi-threaded tool like plzip, but the
decompress one such archive with a multithreaded tool like plzip, but the
increase in speed is not as large as it could be because plzip must
serialize the decompressed data and pass them to tar, which decodes them
sequentially, one tar member at a time.
@ -1139,13 +1181,13 @@ possible decoding it safely in parallel.
Tarlz is able to automatically decode aligned and unaligned multimember
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
misalignment during multithreaded decoding, it switches to single-threaded
mode and continues decoding the archive.
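The alignment condition in this chapter can be stated as a containment check: parallel decoding needs every tar member to lie entirely inside one lzip member. A lzip member may still hold several whole tar members. In this hypothetical sketch, each member is given as a half-open byte range in the decompressed tar stream. It shows only the condition itself, not how tarlz detects misalignment while decoding.

```python
Range = tuple[int, int]  # half-open [start, end) offsets in the uncompressed tar stream


def is_aligned(tar_members: list[Range], lzip_members: list[Range]) -> bool:
    """True if no tar member is split across two or more lzip members."""
    for start, end in tar_members:
        if not any(lz_start <= start and end <= lz_end
                   for lz_start, lz_end in lzip_members):
            return False  # this tar member crosses an lzip member boundary
    return True


lzip = [(0, 3072), (3072, 5120)]
print(is_aligned([(0, 1536), (1536, 3072), (3072, 5120)], lzip))  # aligned
print(is_aligned([(0, 2048), (2048, 5120)], lzip))  # second member is split
```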
9.1 Multi-threaded listing
==========================
9.1 Multithreaded listing
=========================
If the files in the archive are large, multi-threaded '--list' on a regular
If the files in the archive are large, multithreaded '--list' on a regular
(seekable) tar.lz archive can be hundreds of times faster than sequential
'--list' because, in addition to using several processors, it only needs to
decompress part of each lzip member. See the following example listing the
@ -1156,20 +1198,20 @@ Silesia corpus on a dual core machine:
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)
On the other hand, multi-threaded '--list' won't detect corruption in
the tar member data because it only decodes the part of each lzip member
On the other hand, multithreaded '--list' won't detect corruption in the
tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. Partial decoding of a lzip member
can't guarantee the integrity of the data decoded. This is another reason
why the tar headers (including the extended records) must provide their own
integrity checking.
9.2 Limitations of multi-threaded extraction
============================================
9.2 Limitations of multithreaded extraction
===========================================
Multi-threaded extraction may produce different output than single-threaded
Multithreaded extraction may produce different output than single-threaded
extraction in some cases:
During multi-threaded extraction, several independent threads are
During multithreaded extraction, several independent threads are
simultaneously reading the archive and creating files in the file system.
The archive is not read sequentially. As a consequence, any error or
weirdness in the archive (like a corrupt member or an end-of-archive block
@ -1179,7 +1221,7 @@ archive beyond that point has been processed.
If the archive contains two or more tar members with the same name,
single-threaded extraction extracts the members in the order they appear in
the archive and leaves in the file system the last version of the file. But
multi-threaded extraction may extract the members in any order and leave in
multithreaded extraction may extract the members in any order and leave in
the file system any version of the file nondeterministically. It is
unspecified which of the tar members is extracted.
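The difference between the two modes can be illustrated with a deliberately simplified model in which each extraction overwrites the file, so the last write wins. Single-threaded extraction follows archive order. Multithreaded extraction is modeled here as "any order", so every possible order is enumerated. Real threads may interleave in other ways, and the member names and versions are hypothetical.

```python
from itertools import permutations


def final_version(extraction_order):
    """Version left on disk when same-named members are written in this order."""
    on_disk = None
    for version in extraction_order:
        on_disk = version  # each extraction replaces the previous file
    return on_disk


# Hypothetical archive holding three tar members all named 'notes.txt'.
members = ["v1", "v2", "v3"]

sequential = final_version(members)  # single-threaded: archive order
possible = {final_version(order) for order in permutations(members)}
print(sequential, sorted(possible))  # prints: v3 ['v1', 'v2', 'v3']
```

Even this simple model shows that any version may survive. That is why the manual leaves unspecified which member is extracted.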
@ -1191,14 +1233,14 @@ names resolve to the same file in the file system), the result is undefined.
links to.

File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multithreaded decoding, Up: Top
10 Minimum archive sizes required for multi-threaded block compression
**********************************************************************
10 Minimum archive sizes required for multithreaded block compression
*********************************************************************
When creating or appending to a compressed archive using multi-threaded
block compression, tarlz puts tar members together in blocks and compresses
as many blocks simultaneously as worker threads are chosen, creating a
When creating or appending to a compressed archive using multithreaded block
compression, tarlz puts tar members together in blocks and compresses as
many blocks simultaneously as worker threads are chosen, creating a
multimember compressed archive.
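The thread limit ceil(uncompressed_size / data_size), quoted in the '-n' entry above, follows from this design. With fewer data blocks than worker threads, the extra threads have nothing to compress. The sketch below computes only that bound, using integer ceiling division. The function name and the sizes in the example are hypothetical.

```python
def usable_compression_threads(requested: int, uncompressed_size: int,
                               data_size: int) -> int:
    """Threads that can work in parallel under the bound from the manual."""
    if data_size <= 0:
        raise ValueError("data_size must be positive")
    blocks = -(-uncompressed_size // data_size)  # ceil without floating point
    return min(requested, blocks)


MiB = 2 ** 20
# Hypothetical: 8 threads requested, 100 MiB of tar data, 32 MiB data blocks.
print(usable_compression_threads(8, 100 * MiB, 32 * MiB))  # prints: 4
```

Here 100 MiB splits into four blocks of at most 32 MiB, so only four of the eight requested threads can compress at once.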
For this to work as expected (and roughly multiply the compression speed
@ -1334,7 +1376,7 @@ Concept index
* invoking: Invoking tarlz. (line 6)
* minimum archive sizes: Minimum archive sizes. (line 6)
* options: Invoking tarlz. (line 6)
* parallel tar decoding: Multi-threaded decoding. (line 6)
* parallel tar decoding: Multithreaded decoding. (line 6)
* portable character set: Portable character set. (line 6)
* program design: Program design. (line 6)
* usage: Invoking tarlz. (line 6)
@ -1344,29 +1386,29 @@ Concept index

Tag Table:
Node: Top216
Node: Introduction1354
Node: Invoking tarlz4177
Ref: --data-size13263
Ref: --bsolid17922
Ref: --missing-crc21530
Node: Argument syntax23895
Node: Creating backups safely25671
Node: Portable character set28055
Node: File format28707
Ref: key_crc3235754
Ref: ustar-uid-gid39050
Ref: ustar-mtime39857
Node: Amendments to pax format41864
Ref: crc3242572
Ref: flawed-compat43883
Node: Program design47868
Node: Multi-threaded decoding51795
Ref: mt-listing54196
Ref: mt-extraction55234
Node: Minimum archive sizes56540
Node: Examples58669
Node: Problems61164
Node: Concept index61719
Node: Introduction1353
Node: Invoking tarlz4175
Ref: --data-size13086
Ref: --bsolid18556
Ref: --missing-crc22292
Node: Argument syntax25400
Node: Creating backups safely27176
Node: Portable character set29691
Node: File format30412
Ref: key_crc3237458
Ref: ustar-uid-gid40754
Ref: ustar-mtime41561
Node: Amendments to pax format43568
Ref: crc3244276
Ref: flawed-compat45587
Node: Program design49572
Node: Multithreaded decoding53496
Ref: mt-listing55894
Ref: mt-extraction56928
Node: Minimum archive sizes58229
Node: Examples60354
Node: Problems62849
Node: Concept index63404

End Tag Table

@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 4 March 2025
@set VERSION 0.27.1
@set UPDATED 24 June 2025
@set VERSION 0.28.1
@dircategory Archiving
@direntry
@ -44,8 +44,8 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Program design:: Internal structure of tarlz
* Multi-threaded decoding:: Limitations of parallel tar decoding
* Minimum archive sizes:: Sizes required for full multi-threaded speed
* Multithreaded decoding:: Limitations of parallel tar decoding
* Minimum archive sizes:: Sizes required for full multithreaded speed
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@ -64,7 +64,7 @@ distribute, and modify it.
@cindex introduction
@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
(multi-threaded) combined implementation of the tar archiver and the
(multithreaded) combined implementation of the tar archiver and the
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
@ -171,7 +171,8 @@ equivalent to @w{@option{-1 --solid}}.
tarlz supports the following operations:
@table @code
@item --help
@item -?
@itemx --help
Print an informative help message describing the options and exit.
@item -V
@ -195,7 +196,7 @@ no @var{files} have been specified.
Concatenating archives containing files in common results in two or more tar
members with the same name in the resulting archive, which may produce
nondeterministic behavior during multi-threaded extraction.
nondeterministic behavior during multithreaded extraction.
@xref{mt-extraction}.
@item -c
@ -226,12 +227,9 @@ not delete a tar member unless it is possible to do so. For example it won't
try to delete a tar member that is not compressed individually. Even in the
case of finding a corrupt member after having deleted some member(s), tarlz
stops and copies the rest of the file as soon as corruption is found,
leaving it just as corrupt as it was, but not worse.
To delete a directory without deleting the files under it, use
@w{@samp{tarlz --delete -f foo --exclude='dir/*' dir}}. Deleting in place
may be dangerous. A corrupt archive, a power cut, or an I/O error may cause
data loss.
leaving it just as corrupt as it was, but not worse. Deleting in place may
be dangerous. A corrupt archive, a power cut, or an I/O error may cause data
loss.
@item -r
@itemx --append
@ -250,7 +248,7 @@ if no @var{files} have been specified.
Appending files already present in the archive results in two or more tar
members with the same name, which may produce nondeterministic behavior
during multi-threaded extraction. @xref{mt-extraction}.
during multithreaded extraction. @xref{mt-extraction}.
@item -t
@itemx --list
@ -260,13 +258,11 @@ List the contents of an archive. If @var{files} are given, list only the
@item -x
@itemx --extract
Extract files from an archive. If @var{files} are given, extract only the
@var{files} given. Else extract all the files in the archive. To extract a
directory without extracting the files under it, use
@w{@samp{tarlz -xf foo --exclude='dir/*' dir}}. Tarlz removes files and
empty directories unconditionally before extracting over them. Other than
that, it does not make any special effort to extract a file over an
incompatible type of file. For example, extracting a file over a non-empty
directory usually fails. @xref{mt-extraction}.
@var{files} given. Else extract all the files in the archive. Tarlz removes
files and empty directories unconditionally before extracting over them.
Other than that, it does not make any special effort to extract a file over
an incompatible type of file. For example, extracting a file over a
non-empty directory usually fails. @xref{mt-extraction}.
@item -z
@itemx --compress
@ -310,6 +306,9 @@ and the value of LZ_API_VERSION (if defined).
@xref{Library version,,,lzlib}.
@end ifnothtml
@item --time-bits
Print the size of time_t in bits and exit.
@end table
@noindent
@ -331,15 +330,17 @@ member large enough to contain the file.
Change to directory @var{dir}. When creating, appending, comparing, or
extracting, the position of each option @option{-C} in the command line is
significant; it changes the current working directory for the following
@var{files} until a new option @option{-C} appears in the command line.
@option{--list} and @option{--delete} ignore any option @option{-C}
specified. @var{dir} is relative to the then current working directory,
perhaps changed by a previous option @option{-C}.
@var{files} (including those specified with option @option{-T}) until a new
option @option{-C} appears in the command line. @option{--list} and
@option{--delete} ignore any option @option{-C} specified. @var{dir} is
relative to the then current working directory, perhaps changed by a
previous option @option{-C}.
Note that a process can only have one current working directory (CWD).
Therefore multi-threading can't be used to create or decode an archive if an
option @option{-C} appears after a (relative) file name in the command line.
(All file names are made relative by removing leading slashes when decoding).
Therefore multithreading can't be used to create or decode an archive if an
option @option{-C} appears in the command line after a (relative) file name
or after an option @option{-T}. (All file names are made relative by
removing leading slashes when decoding).
@item -f @var{archive}
@itemx --file=@var{archive}
@ -358,7 +359,7 @@ Valid values range from 0 to as many as your system can support. A value
of 0 disables threads entirely. If this option is not used, tarlz tries to
detect the number of processors in the system and use it as default value.
@w{@samp{tarlz --help}} shows the system's default value. See the note about
multi-threading in the option @option{-C} above.
multithreading in the option @option{-C} above.
Note that the number of usable threads is limited during compression to
@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
@ -382,6 +383,24 @@ permissions specified in the archive.
@itemx --quiet
Quiet operation. Suppress all messages.
@item -R
@itemx --no-recursive
When creating or appending, don't descend recursively into directories. When
decoding, process only the files and directories specified.
@item --recursive
Operate recursively on directories. This is the default.
@item -T @var{file}
@itemx --files-from=@var{file}
When creating or appending, read from @var{file} the names of the files to
be archived. When decoding, read from @var{file} the names of the members to
be processed. Each name is terminated by a newline. This option can be used
in combination with the option @option{-R} to read a list of files generated
with the @command{find} utility. A hyphen @samp{-} used as the name of
@var{file} reads the names from standard input. Multiple @option{-T} options
can be specified.
@item -v
@itemx --verbose
Verbosely list files processed. Further -v's (up to 4) increase the
@ -468,6 +487,10 @@ When creating or appending, use @var{group} for files added to the archive.
If @var{group} is not a valid group name, it is decoded as a decimal numeric
group ID.
@item --depth
When creating or appending, archive all entries from each directory before
archiving the directory itself.
@item --exclude=@var{pattern}
Exclude files matching a shell pattern like @file{*.o}, even if the files
are specified in the command line. A file is considered to match if any
@ -520,13 +543,30 @@ format is optional and defaults to @samp{00:00:00}. The epoch is
@w{@samp{1970-01-01 00:00:00 UTC}}. Negative seconds or years define a
modification time before the epoch.
@item --mount
Stay in local file system when creating archive; skip mount points and don't
descend below mount points. This is useful when doing backups of complete
file systems.
@item --xdev
Stay in local file system when creating archive; archive the mount points
themselves, but don't descend below mount points. This is useful when doing
backups of complete file systems. If the function @samp{nftw} of the system
C library does not support the flag @samp{FTW_XDEV}, @option{--xdev} behaves
like @option{--mount}.
@item --out-slots=@var{n}
Number of @w{1 MiB} output packets buffered per worker thread during
multi-threaded creation or appending to compressed archives. Increasing the
multithreaded creation or appending to compressed archives. Increasing the
number of packets may increase compression speed if the files being archived
are larger than @w{64 MiB} compressed, but requires more memory. Valid
values range from 1 to 1024. The default value is 64.
@item --parallel
Use multithreading to create an uncompressed archive in parallel if the
number of threads is greater than 1. This is not the default because it uses
much more memory than sequential creation.
@item --warn-newer
During archive creation, warn if any file being archived has a modification
time newer than the archive creation time. This option may slow archive
@ -630,9 +670,9 @@ tarlz -df archive.tar.lz # check the archive
Once the integrity and accuracy of an archive have been verified as in the
example above, they can be verified again anywhere at any time with
@w{@samp{tarlz -t -n0}}. It is important to disable multi-threading with
@option{-n0} because multi-threaded listing does not detect corruption in
the tar member data of multimember archives: @xref{mt-listing}.
@w{@samp{tarlz -t -n0}}. It is important to disable multithreading with
@option{-n0} because multithreaded listing does not detect corruption in the
tar member data of multimember archives: @xref{mt-listing}.
@example
tarlz -t -n0 -f archive.tar.lz > /dev/null
@ -648,6 +688,9 @@ just at a member boundary:
lzip -tv archive.tar.lz
@end example
The probability of truncation happening at a member boundary is
@w{(members - 1) / compressed_size}, usually one in several million.
@node Portable character set
@chapter POSIX portable filename character set
@ -664,6 +707,8 @@ a b c d e f g h i j k l m n o p q r s t u v w x y z
The last three characters are the period, underscore, and hyphen-minus
characters, respectively.
Tarlz does not support file names containing newline characters.
File names are identifiers. Therefore, archiving works better when file
names use only the portable character set without spaces added.
@ -726,7 +771,7 @@ Zero or more blocks that contain the contents of the file.
Each tar member must be contiguously stored in a lzip member for the
parallel decoding operations like @option{--list} to work. If any tar member
is split over two or more lzip members, the archive must be decoded
sequentially. @xref{Multithreaded decoding}.
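The alignment condition can be expressed as a simple check. The following Python sketch is a simplified model, not tarlz's code. It assumes that the decompressed offsets where each tar member begins and ends, and where each lzip member begins, are already known. The function and its arguments are hypothetical. In this model, a tar member is split when a lzip member begins strictly inside it.

@example
import bisect

def tar_members_aligned(tar_members, lzip_starts):
    # Hypothetical helper, not part of tarlz.
    # tar_members: list of (start, end) decompressed byte offsets of each
    #   tar member (header, extended records, and data), end exclusive.
    # lzip_starts: sorted decompressed offsets where each lzip member begins.
    # Returns False if a lzip member begins strictly inside a tar member,
    # that is, if some tar member is split over two or more lzip members.
    for start, end in tar_members:
        i = bisect.bisect_right(lzip_starts, start)
        if i < len(lzip_starts) and lzip_starts[i] < end:
            return False
    return True

# Two tar members of 1536 bytes each (512-byte header + 1024 bytes of data).
members = [(0, 1536), (1536, 3072)]
print(tar_members_aligned(members, [0, 1536]))  # True: one per lzip member
print(tar_members_aligned(members, [0, 1024]))  # False: split tar member
@end example

Several tar members may share one lzip member and still count as aligned. Only a tar member that crosses into the next lzip member breaks alignment.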
At the end of the archive file there are two 512-byte blocks filled with
binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
@ -1111,10 +1156,10 @@ accidental double UTF-8 conversions.
The parts of tarlz related to sequential processing of the archive are more
or less similar to any other tar and won't be described here. The interesting
parts described here are those related to multithreaded processing.
The structure of the part of tarlz performing multithreaded archive creation
is somewhat similar to that of
@uref{http://www.nongnu.org/lzip/manual/plzip_manual.html#Program-design,,plzip}
with the added complication of the solidity levels.
@ifnothtml
@ -1190,7 +1235,7 @@ some other worker requests mastership in a previous lzip member can this
error be avoided.
@node Multithreaded decoding
@chapter Limitations of parallel tar decoding
@cindex parallel tar decoding
@ -1215,7 +1260,7 @@ parallel. Therefore, in tar.lz archives the decoding operations can't be
parallelized if the tar members are not aligned with the lzip members. Tar
archives compressed with plzip can't be decoded in parallel because tar and
plzip do not have a way to align both sets of members. Certainly one can
decompress one such archive with a multithreaded tool like plzip, but the
increase in speed is not as large as it could be because plzip must
serialize the decompressed data and pass them to tar, which decodes them
sequentially, one tar member at a time.
@ -1228,13 +1273,13 @@ decoding it safely in parallel.
Tarlz is able to automatically decode aligned and unaligned multimember
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multithreaded decoding, it switches to single-threaded
mode and continues decoding the archive.
@anchor{mt-listing}
@section Multithreaded listing
If the files in the archive are large, multithreaded @option{--list} on a
regular (seekable) tar.lz archive can be hundreds of times faster than
sequential @option{--list} because, in addition to using several processors,
it only needs to decompress part of each lzip member. See the following
@ -1247,7 +1292,7 @@ time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)
@end example
On the other hand, multithreaded @option{--list} won't detect corruption in
the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header. Partial decoding of a lzip member
can't guarantee the integrity of the data decoded. This is another reason
@ -1255,12 +1300,12 @@ why the tar headers (including the extended records) must provide their own
integrity checking.
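The following Python sketch shows why listing needs only the header. The name and size of a tar member can be read from its 512-byte header block alone, without reading the data that follows. The sketch uses the standard @code{tarfile} module to build a one-member ustar archive in memory. It then parses only the first 512 bytes. The helper @code{parse_ustar_header} is hypothetical. It ignores extended records and base-256 size encoding, so it is not how tarlz reads headers. Because the data blocks are never read, corruption in them would go unnoticed.

@example
import io
import tarfile

def parse_ustar_header(block):
    # Hypothetical helper, not part of tarlz. Ignores extended records.
    # Reads the member name (offset 0, 100 bytes) and the size
    # (offset 124, 12 bytes, octal ASCII) from a 512-byte ustar header.
    if len(block) != 512:
        raise ValueError("a tar header block is 512 bytes long")
    name = block[0:100].split(b"\0", 1)[0].decode("utf-8")
    size_field = block[124:136].split(b"\0", 1)[0].strip()
    return name, int(size_field or b"0", 8)

# Build a one-member ustar archive in memory.
data = b"hello, world\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo("hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# List the member using only its header block; the data block is not read.
print(parse_ustar_header(buf.getvalue()[:512]))
@end example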
@anchor{mt-extraction}
@section Limitations of multithreaded extraction
Multithreaded extraction may produce different output than single-threaded
extraction in some cases:
During multithreaded extraction, several independent threads are
simultaneously reading the archive and creating files in the file system.
The archive is not read sequentially. As a consequence, any error or
weirdness in the archive (like a corrupt member or an end-of-archive block
@ -1270,7 +1315,7 @@ archive beyond that point has been processed.
If the archive contains two or more tar members with the same name,
single-threaded extraction extracts the members in the order they appear in
the archive and leaves in the file system the last version of the file. But
multithreaded extraction may extract the members in any order and leave in
the file system any version of the file nondeterministically. It is
unspecified which of the tar members is extracted.
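A few lines of Python can model how extraction order affects duplicate member names. This is a toy model, not tarlz's code, and the function @code{extract} is hypothetical. A dictionary stands in for the file system, and each extracted member replaces any earlier file with the same name. Extraction in archive order always leaves the last version. Extraction in an arbitrary order, as independent threads may do, can leave any version.

@example
import random

def extract(members, order):
    # Hypothetical toy model, not part of tarlz.
    # members: list of (name, contents) in archive order.
    # order: the sequence of member indices in which they are extracted.
    # A later write to the same name replaces the earlier file.
    file_system = dict()
    for i in order:
        name, contents = members[i]
        file_system[name] = contents
    return file_system

members = [("notes.txt", "version 1"), ("a.txt", "x"),
           ("notes.txt", "version 2")]

sequential = extract(members, range(len(members)))
print(sequential["notes.txt"])       # always "version 2", the last one

shuffled = list(range(len(members)))
random.shuffle(shuffled)             # stands in for nondeterministic threads
print(extract(members, shuffled)["notes.txt"])  # "version 1" or "version 2"
@end example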
@ -1283,12 +1328,12 @@ links to.
@node Minimum archive sizes
@chapter Minimum archive sizes required for multithreaded block compression
@cindex minimum archive sizes
When creating or appending to a compressed archive using multithreaded block
compression, tarlz puts tar members together in blocks and compresses as
many blocks simultaneously as there are worker threads, creating a
multimember compressed archive.
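The following Python sketch gives a rough model of why archive size matters here. Suppose the uncompressed archive is cut into blocks of about a target size, assumed here to be the value set with @option{--data-size}. If each worker compresses one block at a time, no more workers can be busy than there are blocks. The sketch simplifies this: real blocks are made of whole tar members and so vary in size. The function name and the figures are hypothetical.

@example
def busy_workers(archive_size, block_size, workers):
    # Hypothetical simplified model, not tarlz's code.
    # Number of blocks needed for archive_size bytes (ceiling division),
    # at least one even for a tiny archive.
    blocks = max(1, -(-archive_size // block_size))
    # Each worker compresses one block at a time, so the number of busy
    # workers cannot exceed the number of blocks.
    return min(workers, blocks)

MiB = 2**20
# Hypothetical figures: 8 worker threads and 16 MiB blocks.
for size in (10 * MiB, 40 * MiB, 200 * MiB):
    print(size // MiB, "MiB ->", busy_workers(size, 16 * MiB, 8), "busy workers")
@end example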
For this to work as expected (and roughly multiply the compression speed by