Adding upstream version 0.11.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-17 21:12:08 +01:00
parent 7a2248990c
commit 6bd0c00498
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
18 changed files with 1504 additions and 654 deletions

@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
This manual is for Tarlz (version 0.10, 31 January 2019).
This manual is for Tarlz (version 0.11, 13 February 2019).
* Menu:
@ -20,6 +20,7 @@ This manual is for Tarlz (version 0.10, 31 January 2019).
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Multi-threaded tar:: Limitations of parallel tar decoding
* Minimum archive sizes:: Sizes required for full multi-threaded speed
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@ -36,23 +37,23 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
1 Introduction
**************
Tarlz is a combined implementation of the tar archiver and the lzip
compressor. By default tarlz creates, lists and extracts archives in a
simplified posix pax format compressed with lzip on a per file basis.
Each tar member is compressed in its own lzip member, as well as the
end-of-file blocks. This method adds an indexed lzip layer on top of
the tar archive, making it possible to decode the archive safely in
parallel. The resulting multimember tar.lz archive is fully backward
compatible with standard tar tools like GNU tar, which treat it like
any other tar.lz archive. Tarlz can append files to the end of such
compressed archives.
Tarlz is a massively parallel (multi-threaded) combined implementation
of the tar archiver and the lzip compressor. Tarlz creates, lists and
extracts archives in a simplified posix pax format compressed with
lzip, keeping the alignment between tar members and lzip members. This
method adds an indexed lzip layer on top of the tar archive, making it
possible to decode the archive safely in parallel. The resulting
multimember tar.lz archive is fully backward compatible with standard
tar tools like GNU tar, which treat it like any other tar.lz archive.
Tarlz can append files to the end of such compressed archives.
Tarlz can create tar archives with four levels of compression
granularity; per file, per directory, appendable solid, and solid.
Tarlz can create tar archives with five levels of compression
granularity; per file, per block, per directory, appendable solid, and
solid.
Of course, compressing each file (or each directory) individually is
less efficient than compressing the whole tar archive, but it has the
following advantages:
Of course, compressing each file (or each directory) individually can't
achieve a compression ratio as high as compressing solidly the whole tar
archive, but it has the following advantages:
* The resulting multimember tar.lz archive can be decompressed in
parallel, multiplying the decompression speed.
@ -87,17 +88,23 @@ The format for running tarlz is:
tarlz [OPTIONS] [FILES]
On archive creation or appending, tarlz removes leading and trailing
slashes from filenames, as well as filename prefixes containing a '..'
component. On extraction, archive members containing a '..' component
are skipped. Tarlz detects when the archive being created or enlarged
is among the files to be dumped, appended or concatenated, and skips it.
On archive creation or appending tarlz archives the files specified, but
removes from member names any leading and trailing slashes and any
filename prefixes containing a '..' component. On extraction, leading
and trailing slashes are also removed from member names, and archive
members containing a '..' component in the filename are skipped. Tarlz
detects when the archive being created or enlarged is among the files
to be dumped, appended or concatenated, and skips it.
On extraction and listing, tarlz removes leading './' strings from
member names in the archive or given in the command line, so that
'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from
archive 'foo'.
If several compression levels or '--*solid' options are given, the
last setting is used. For example '-9 --solid --uncompressed -1' is
equivalent to '-1 --solid'
tarlz supports the following options:
'-h'
@ -125,7 +132,7 @@ archive 'foo'.
Set target size of input data blocks for the '--bsolid' option.
Valid values range from 8 KiB to 1 GiB. Default value is two times
the dictionary size, except for option '-0' where it defaults to
1 MiB.
1 MiB. *Note Minimum archive sizes::.
'-c'
'--create'
@ -142,6 +149,11 @@ archive 'foo'.
relative to the then current working directory, perhaps changed by
a previous '-C' option.
Note that a process can only have one current working directory
(CWD). Therefore multi-threading can't be used to create an
archive if a '-C' option appears after a relative filename in the
command line.
'-f ARCHIVE'
'--file=ARCHIVE'
Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads
@ -149,18 +161,21 @@ archive 'foo'.
'-n N'
'--threads=N'
Set the number of decompression threads, overriding the system's
Set the number of (de)compression threads, overriding the system's
default. Valid values range from 0 to "as many as your system can
support". A value of 0 disables threads entirely. If this option
is not used, tarlz tries to detect the number of processors in the
system and use it as default value. 'tarlz --help' shows the
system's default value. This option currently only has effect when
listing the contents of a multimember compressed archive. *Note
system's default value. See the note about multi-threaded archive
creation in the '-C' option above. Multi-threaded extraction of
files from an archive is not yet implemented. *Note
Multi-threaded tar::.
Note that the number of usable threads is limited during
decompression to the number of lzip members in the tar.lz archive,
which you can find by running 'lzip -lv archive.tar.lz'.
compression to ceil( uncompressed_size / data_size ) (*note
Minimum archive sizes::), and during decompression to the number
of lzip members in the tar.lz archive, which you can find by
running 'lzip -lv archive.tar.lz'.
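The compression-side limit quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python, not code from tarlz; the function name and parameters are hypothetical, and sizes are assumed to be given in bytes:

```python
def usable_compression_threads(requested, uncompressed_size, data_size):
    """Upper bound on busy worker threads during block compression.

    Hypothetical helper: it applies the limit stated in the manual,
    ceil(uncompressed_size / data_size), to the requested thread count.
    """
    # Integer ceiling division avoids floating point rounding on large sizes.
    blocks = -(-uncompressed_size // data_size)
    return min(requested, blocks)


MIB = 1024 * 1024
# A 100 MiB archive with 64 MiB blocks yields only 2 blocks,
# so at most 2 of 8 requested threads get data to compress.
print(usable_compression_threads(8, 100 * MIB, 64 * MIB))
```

With a 1 GiB input and the same block size there are 16 blocks, so a request for 4 threads is not reduced.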
'-q'
'--quiet'
@ -180,7 +195,7 @@ archive 'foo'.
'-t'
'--list'
List the contents of an archive. If FILES are given, list only the
given FILES.
FILES given.
'-v'
'--verbose'
@ -189,7 +204,7 @@ archive 'foo'.
'-x'
'--extract'
Extract files from an archive. If FILES are given, extract only
the given FILES. Else extract all the files in the archive.
the FILES given. Else extract all the files in the archive.
'-0 .. -9'
Set the compression level. The default compression level is '-6'.
@ -214,38 +229,43 @@ archive 'foo'.
solid compression. All the files being added to the archive are
compressed into a single lzip member, but the end-of-file blocks
are compressed into a separate lzip member. This creates a solidly
compressed appendable archive.
compressed appendable archive. Solid archives can't be created
nor decoded in parallel.
'--bsolid'
When creating or appending to a compressed archive, compress tar
members together in a lzip member until they approximate a target
uncompressed size. The size can't be exact because each solidly
compressed data block must contain an integer number of tar
members. This option improves compression efficiency for archives
with lots of small files. *Note --data-size::, to set the target
When creating or appending to a compressed archive, use block
compression. Tar members are compressed together in a lzip member
until they approximate a target uncompressed size. The size can't
be exact because each solidly compressed data block must contain
an integer number of tar members. Block compression is the default
because it improves compression ratio for archives with many files
smaller than the block size. This option allows tarlz revert to
default behavior if, for example, it is invoked through an alias
like 'tar='tarlz --solid''. *Note --data-size::, to set the target
block size.
'--dsolid'
When creating or appending to a compressed archive, use solid
compression for each directory especified in the command line. The
end-of-file blocks are compressed into a separate lzip member. This
creates a compressed appendable archive with a separate lzip
member for each top-level directory.
When creating or appending to a compressed archive, compress each
file specified in the command line separately in its own lzip
member, and use solid compression for each directory specified in
the command line. The end-of-file blocks are compressed into a
separate lzip member. This creates a compressed appendable archive
with a separate lzip member for each file or top-level directory
specified.
'--no-solid'
When creating or appending to a compressed archive, compress each
file separately. The end-of-file blocks are compressed into a
separate lzip member. This creates a compressed appendable archive
with a separate lzip member for each file. This option allows
tarlz revert to default behavior if, for example, tarlz is invoked
through an alias like 'tar='tarlz --solid''.
file separately in its own lzip member. The end-of-file blocks are
compressed into a separate lzip member. This creates a compressed
appendable archive with a lzip member for each file.
'--solid'
When creating or appending to a compressed archive, use solid
compression. The files being added to the archive, along with the
compression. The files being added to the archive, along with the
end-of-file blocks, are compressed into a single lzip member. The
resulting archive is not appendable. No more files can be later
appended to the archive.
appended to the archive. Solid archives can't be created nor
decoded in parallel.
'--anonymous'
Equivalent to '--owner=root --group=root'.
@ -341,9 +361,9 @@ blocks are either compressed in a separate lzip member or compressed
along with the tar members contained in the last lzip member.
The diagram below shows the correspondence between each tar member
(formed by one or two headers plus optional data) in the tar archive and
each lzip member in the resulting multimember tar.lz archive: *Note
File format: (lzip)File format.
(formed by one or two headers plus optional data) in the tar archive
and each lzip member in the resulting multimember tar.lz archive, when
per file compression is used: *Note File format: (lzip)File format.
tar
+========+======+=================+===============+========+======+========+
@ -612,12 +632,12 @@ wasteful for a backup format.
There is no portable way to tell what charset a text string is coded
into. Therefore, tarlz stores all fields representing text strings
as-is, without conversion to UTF-8 nor any other transformation. This
prevents accidental double UTF-8 conversions. If the need arises this
behavior will be adjusted with a command line option in the future.
unmodified, without conversion to UTF-8 nor any other transformation.
This prevents accidental double UTF-8 conversions. If the need arises
this behavior will be adjusted with a command line option in the future.

File: tarlz.info, Node: Multi-threaded tar, Next: Examples, Prev: Amendments to pax format, Up: Top
File: tarlz.info, Node: Multi-threaded tar, Next: Minimum archive sizes, Prev: Amendments to pax format, Up: Top
5 Limitations of parallel tar decoding
**************************************
@ -659,15 +679,53 @@ sequential '--list' because, in addition to using several processors,
it only needs to decompress part of each lzip member. See the following
example listing the Silesia corpus on a dual core machine:
tarlz -9 -cf silesia.tar.lz silesia
tarlz -9 --no-solid -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)

File: tarlz.info, Node: Examples, Next: Problems, Prev: Multi-threaded tar, Up: Top
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded tar, Up: Top
6 A small tutorial with examples
6 Minimum archive sizes required for multi-threaded block compression
*********************************************************************
When creating or appending to a compressed archive using multi-threaded
block compression, tarlz puts tar members together in blocks and
compresses as many blocks simultaneously as worker threads are chosen,
creating a multimember compressed archive.
For this to work as expected (and roughly multiply the compression
speed by the number of available processors), the uncompressed archive
must be at least as large as the number of worker threads times the
block size (*note --data-size::). Else some processors will not get any
data to compress, and compression will be proportionally slower. The
maximum speed increase achievable on a given file is limited by the
ratio (uncompressed_size / data_size). For example, a tarball the size
of gcc or linux will scale up to 10 or 12 processors at level -9.
The following table shows the minimum uncompressed archive size
needed for full use of N processors at a given compression level, using
the default data size for each level:
Processors 2 4 8 16 64 256
------------------------------------------------------------------
Level
-0 2 MiB 4 MiB 8 MiB 16 MiB 64 MiB 256 MiB
-1 4 MiB 8 MiB 16 MiB 32 MiB 128 MiB 512 MiB
-2 6 MiB 12 MiB 24 MiB 48 MiB 192 MiB 768 MiB
-3 8 MiB 16 MiB 32 MiB 64 MiB 256 MiB 1 GiB
-4 12 MiB 24 MiB 48 MiB 96 MiB 384 MiB 1.5 GiB
-5 16 MiB 32 MiB 64 MiB 128 MiB 512 MiB 2 GiB
-6 32 MiB 64 MiB 128 MiB 256 MiB 1 GiB 4 GiB
-7 64 MiB 128 MiB 256 MiB 512 MiB 2 GiB 8 GiB
-8 96 MiB 192 MiB 384 MiB 768 MiB 3 GiB 12 GiB
-9 128 MiB 256 MiB 512 MiB 1 GiB 4 GiB 16 GiB
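
Each entry in the table above is the number of processors times the default data size for that level. The sketch below, in Python, reproduces entries under that assumption. It is not part of tarlz: the per-level data sizes are inferred from the 2-processor column of the table (divided by 2), and all names are hypothetical.

```python
MIB = 1024 * 1024

# Default block size in MiB for each level, inferred from the table:
# value in the 2-processor column divided by 2 (an assumption of this sketch).
DEFAULT_DATA_SIZE_MIB = {0: 1, 1: 2, 2: 3, 3: 4, 4: 6,
                         5: 8, 6: 16, 7: 32, 8: 48, 9: 64}


def min_archive_size(level, processors):
    """Minimum uncompressed size in bytes for full use of `processors`."""
    return processors * DEFAULT_DATA_SIZE_MIB[level] * MIB


def format_size(size):
    """Render a byte count in MiB or GiB, as the table does."""
    mib = size / MIB
    if mib >= 1024:
        return f"{mib / 1024:g} GiB"
    return f"{mib:g} MiB"


for level in (0, 6, 9):
    row = [format_size(min_archive_size(level, n)) for n in (2, 16, 256)]
    print(f"-{level}", *row)
```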

File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
7 A small tutorial with examples
********************************
Example 1: Create a multimember compressed archive 'archive.tar.lz'
@ -725,7 +783,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory

File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
7 Reporting bugs
8 Reporting bugs
****************
There are probably bugs in tarlz. There are certainly errors and
@ -754,6 +812,7 @@ Concept index
* getting help: Problems. (line 6)
* introduction: Introduction. (line 6)
* invoking: Invoking tarlz. (line 6)
* minimum archive sizes: Minimum archive sizes. (line 6)
* options: Invoking tarlz. (line 6)
* usage: Invoking tarlz. (line 6)
* version: Invoking tarlz. (line 6)
@ -762,18 +821,19 @@ Concept index

Tag Table:
Node: Top223
Node: Introduction1013
Node: Invoking tarlz3125
Ref: --data-size4717
Node: File format11536
Ref: key_crc3216321
Node: Amendments to pax format21738
Ref: crc3222262
Ref: flawed-compat23287
Node: Multi-threaded tar25649
Node: Examples28164
Node: Problems29830
Node: Concept index30356
Node: Introduction1089
Node: Invoking tarlz3218
Ref: --data-size5097
Node: File format12673
Ref: key_crc3217493
Node: Amendments to pax format22910
Ref: crc3223434
Ref: flawed-compat24459
Node: Multi-threaded tar26826
Node: Minimum archive sizes29365
Node: Examples31495
Node: Problems33164
Node: Concept index33690

End Tag Table