Adding upstream version 0.11.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 7a2248990c
commit 6bd0c00498
18 changed files with 1504 additions and 654 deletions

28 doc/tarlz.1
@@ -1,20 +1,20 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH TARLZ "1" "February 2019" "tarlz 0.10a" "User Commands"
+.TH TARLZ "1" "February 2019" "tarlz 0.11" "User Commands"
 .SH NAME
 tarlz \- creates tar archives with multimember lzip compression
 .SH SYNOPSIS
 .B tarlz
 [\fI\,options\/\fR] [\fI\,files\/\fR]
 .SH DESCRIPTION
-Tarlz is a combined implementation of the tar archiver and the lzip
-compressor. By default tarlz creates, lists and extracts archives in a
-simplified posix pax format compressed with lzip on a per file basis. Each
-tar member is compressed in its own lzip member, as well as the end\-of\-file
-blocks. This method adds an indexed lzip layer on top of the tar archive,
-making it possible to decode the archive safely in parallel. The resulting
-multimember tar.lz archive is fully backward compatible with standard tar
-tools like GNU tar, which treat it like any other tar.lz archive. Tarlz can
-append files to the end of such compressed archives.
+Tarlz is a massively parallel (multi\-threaded) combined implementation of
+the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
+archives in a simplified posix pax format compressed with lzip, keeping the
+alignment between tar members and lzip members. This method adds an indexed
+lzip layer on top of the tar archive, making it possible to decode the
+archive safely in parallel. The resulting multimember tar.lz archive is
+fully backward compatible with standard tar tools like GNU tar, which treat
+it like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
 .PP
 The tarlz file format is a safe posix\-style backup format. In case of
 corruption, tarlz can extract all the undamaged members from the tar.lz
@@ -46,7 +46,7 @@ change to directory <dir>
 use archive file <archive>
 .TP
 \fB\-n\fR, \fB\-\-threads=\fR<n>
-set number of decompression threads [2]
+set number of (de)compression threads [2]
 .TP
 \fB\-q\fR, \fB\-\-quiet\fR
 suppress all messages
@@ -70,13 +70,13 @@ set compression level [default 6]
 create solidly compressed appendable archive
 .TP
 \fB\-\-bsolid\fR
-create per\-data\-block compressed archive
+create per block compressed archive (default)
 .TP
 \fB\-\-dsolid\fR
-create per\-directory compressed archive
+create per directory compressed archive
 .TP
 \fB\-\-no\-solid\fR
-create per\-file compressed archive (default)
+create per file compressed archive
 .TP
 \fB\-\-solid\fR
 create solidly compressed archive

202 doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
 Tarlz Manual
 ************
 
-This manual is for Tarlz (version 0.10, 31 January 2019).
+This manual is for Tarlz (version 0.11, 13 February 2019).
 
 * Menu:
 
@@ -20,6 +20,7 @@ This manual is for Tarlz (version 0.10, 31 January 2019).
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
 * Multi-threaded tar:: Limitations of parallel tar decoding
+* Minimum archive sizes:: Sizes required for full multi-threaded speed
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
@@ -36,23 +37,23 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
 1 Introduction
 **************
 
-Tarlz is a combined implementation of the tar archiver and the lzip
-compressor. By default tarlz creates, lists and extracts archives in a
-simplified posix pax format compressed with lzip on a per file basis.
-Each tar member is compressed in its own lzip member, as well as the
-end-of-file blocks. This method adds an indexed lzip layer on top of
-the tar archive, making it possible to decode the archive safely in
-parallel. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like
-any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
+Tarlz is a massively parallel (multi-threaded) combined implementation
+of the tar archiver and the lzip compressor. Tarlz creates, lists and
+extracts archives in a simplified posix pax format compressed with
+lzip, keeping the alignment between tar members and lzip members. This
+method adds an indexed lzip layer on top of the tar archive, making it
+possible to decode the archive safely in parallel. The resulting
+multimember tar.lz archive is fully backward compatible with standard
+tar tools like GNU tar, which treat it like any other tar.lz archive.
+Tarlz can append files to the end of such compressed archives.
 
-Tarlz can create tar archives with four levels of compression
-granularity; per file, per directory, appendable solid, and solid.
+Tarlz can create tar archives with five levels of compression
+granularity; per file, per block, per directory, appendable solid, and
+solid.
 
-Of course, compressing each file (or each directory) individually is
-less efficient than compressing the whole tar archive, but it has the
-following advantages:
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing solidly the whole tar
+archive, but it has the following advantages:
 
 * The resulting multimember tar.lz archive can be decompressed in
 parallel, multiplying the decompression speed.
@@ -87,17 +88,23 @@ The format for running tarlz is:
 
 tarlz [OPTIONS] [FILES]
 
-On archive creation or appending, tarlz removes leading and trailing
-slashes from filenames, as well as filename prefixes containing a '..'
-component. On extraction, archive members containing a '..' component
-are skipped. Tarlz detects when the archive being created or enlarged
-is among the files to be dumped, appended or concatenated, and skips it.
+On archive creation or appending tarlz archives the files specified, but
+removes from member names any leading and trailing slashes and any
+filename prefixes containing a '..' component. On extraction, leading
+and trailing slashes are also removed from member names, and archive
+members containing a '..' component in the filename are skipped. Tarlz
+detects when the archive being created or enlarged is among the files
+to be dumped, appended or concatenated, and skips it.
 
 On extraction and listing, tarlz removes leading './' strings from
 member names in the archive or given in the command line, so that
 'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from
 archive 'foo'.
 
+If several compression levels or '--*solid' options are given, the
+last setting is used. For example '-9 --solid --uncompressed -1' is
+equivalent to '-1 --solid'
+
 tarlz supports the following options:
 
 '-h'
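The "last setting is used" rule for compression levels and '--*solid' options can be modeled with a small Python helper. The helper is hypothetical and only sketches the documented behavior. It is not tarlz's option parser. It assumes that '--uncompressed' shares one setting with '-0 .. -9', which is what the '-9 --solid --uncompressed -1' example implies. It takes the defaults '-6' and '--bsolid' from the 0.11 manual text.

```python
LEVEL_OPTIONS = {f"-{n}" for n in range(10)} | {"--uncompressed"}
SOLID_OPTIONS = {"--asolid", "--bsolid", "--dsolid", "--no-solid", "--solid"}

def effective_settings(args):
    """Return (level, solidity) after applying options left to right.

    Hypothetical helper: it models only the 'last setting is used' rule
    and ignores every other option. Defaults are '-6' and '--bsolid'.
    """
    level, solidity = "-6", "--bsolid"
    for arg in args:
        if arg in LEVEL_OPTIONS:
            level = arg        # a later level (or --uncompressed) wins
        elif arg in SOLID_OPTIONS:
            solidity = arg     # a later --*solid option wins
    return level, solidity

print(effective_settings(["-9", "--solid", "--uncompressed", "-1"]))
# ('-1', '--solid')
```

Keeping the two settings in separate slots is why '--solid' survives the later '-1'. Only options of the same kind override each other.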
@@ -125,7 +132,7 @@ archive 'foo'.
 Set target size of input data blocks for the '--bsolid' option.
 Valid values range from 8 KiB to 1 GiB. Default value is two times
 the dictionary size, except for option '-0' where it defaults to
-1 MiB.
+1 MiB. *Note Minimum archive sizes::.
 
 '-c'
 '--create'
@@ -142,6 +149,11 @@ archive 'foo'.
 relative to the then current working directory, perhaps changed by
 a previous '-C' option.
 
+Note that a process can only have one current working directory
+(CWD). Therefore multi-threading can't be used to create an
+archive if a '-C' option appears after a relative filename in the
+command line.
+
 '-f ARCHIVE'
 '--file=ARCHIVE'
 Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads
@@ -149,18 +161,21 @@ archive 'foo'.
 
 '-n N'
 '--threads=N'
-Set the number of decompression threads, overriding the system's
+Set the number of (de)compression threads, overriding the system's
 default. Valid values range from 0 to "as many as your system can
 support". A value of 0 disables threads entirely. If this option
 is not used, tarlz tries to detect the number of processors in the
 system and use it as default value. 'tarlz --help' shows the
-system's default value. This option currently only has effect when
-listing the contents of a multimember compressed archive. *Note
+system's default value. See the note about multi-threaded archive
+creation in the '-C' option above. Multi-threaded extraction of
+files from an archive is not yet implemented. *Note
 Multi-threaded tar::.
 
 Note that the number of usable threads is limited during
-decompression to the number of lzip members in the tar.lz archive,
-which you can find by running 'lzip -lv archive.tar.lz'.
+compression to ceil( uncompressed_size / data_size ) (*note
+Minimum archive sizes::), and during decompression to the number
+of lzip members in the tar.lz archive, which you can find by
+running 'lzip -lv archive.tar.lz'.
 
 '-q'
 '--quiet'
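The new limit on compression threads, ceil( uncompressed_size / data_size ), can be worked out by hand. The Python sketch below does the calculation with integer arithmetic. The function name is hypothetical, and the helper assumes at least one thread is requested, so the '-n 0' case is not modeled. The 16 MiB data size used for level -6 in the example is the value implied by the "Minimum archive sizes" table, which is twice the 8 MiB dictionary size.

```python
MiB = 1024 * 1024

def usable_compression_threads(requested: int, uncompressed_size: int,
                               data_size: int) -> int:
    """Return how many of 'requested' workers can get a block to compress.

    Sketch of the documented bound ceil(uncompressed_size / data_size).
    Assumes requested >= 1; always returns at least 1.
    """
    blocks = -(-uncompressed_size // data_size)  # integer ceiling division
    return max(1, min(requested, blocks))

# A 100 MiB tarball at level -6 (data size 16 MiB) yields 7 blocks,
# so an 8-thread run keeps at most 7 workers busy.
print(usable_compression_threads(8, 100 * MiB, 16 * MiB))  # 7
```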
@@ -180,7 +195,7 @@ archive 'foo'.
 '-t'
 '--list'
 List the contents of an archive. If FILES are given, list only the
-given FILES.
+FILES given.
 
 '-v'
 '--verbose'
@@ -189,7 +204,7 @@ archive 'foo'.
 '-x'
 '--extract'
 Extract files from an archive. If FILES are given, extract only
-the given FILES. Else extract all the files in the archive.
+the FILES given. Else extract all the files in the archive.
 
 '-0 .. -9'
 Set the compression level. The default compression level is '-6'.
@@ -214,38 +229,43 @@ archive 'foo'.
 solid compression. All the files being added to the archive are
 compressed into a single lzip member, but the end-of-file blocks
 are compressed into a separate lzip member. This creates a solidly
-compressed appendable archive.
+compressed appendable archive. Solid archives can't be created
+nor decoded in parallel.
 
 '--bsolid'
-When creating or appending to a compressed archive, compress tar
-members together in a lzip member until they approximate a target
-uncompressed size. The size can't be exact because each solidly
-compressed data block must contain an integer number of tar
-members. This option improves compression efficiency for archives
-with lots of small files. *Note --data-size::, to set the target
+When creating or appending to a compressed archive, use block
+compression. Tar members are compressed together in a lzip member
+until they approximate a target uncompressed size. The size can't
+be exact because each solidly compressed data block must contain
+an integer number of tar members. Block compression is the default
+because it improves compression ratio for archives with many files
+smaller than the block size. This option allows tarlz revert to
+default behavior if, for example, it is invoked through an alias
+like 'tar='tarlz --solid''. *Note --data-size::, to set the target
 block size.
 
 '--dsolid'
-When creating or appending to a compressed archive, use solid
-compression for each directory especified in the command line. The
-end-of-file blocks are compressed into a separate lzip member. This
-creates a compressed appendable archive with a separate lzip
-member for each top-level directory.
+When creating or appending to a compressed archive, compress each
+file specified in the command line separately in its own lzip
+member, and use solid compression for each directory specified in
+the command line. The end-of-file blocks are compressed into a
+separate lzip member. This creates a compressed appendable archive
+with a separate lzip member for each file or top-level directory
+specified.
 
 '--no-solid'
 When creating or appending to a compressed archive, compress each
-file separately. The end-of-file blocks are compressed into a
-separate lzip member. This creates a compressed appendable archive
-with a separate lzip member for each file. This option allows
-tarlz revert to default behavior if, for example, tarlz is invoked
-through an alias like 'tar='tarlz --solid''.
+file separately in its own lzip member. The end-of-file blocks are
+compressed into a separate lzip member. This creates a compressed
+appendable archive with a lzip member for each file.
 
 '--solid'
 When creating or appending to a compressed archive, use solid
-compression. The files being added to the archive, along with the
+compression. The files being added to the archive, along with the
 end-of-file blocks, are compressed into a single lzip member. The
 resulting archive is not appendable. No more files can be later
-appended to the archive.
+appended to the archive. Solid archives can't be created nor
+decoded in parallel.
 
 '--anonymous'
 Equivalent to '--owner=root --group=root'.
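With '--bsolid', tarlz packs whole tar members into blocks until each block approximates the target uncompressed size. The Python sketch below shows a minimal greedy version of that grouping. It assumes, although the manual does not say so, that a block is closed as soon as its size reaches or exceeds the target. Member sizes are abstract numbers, and the function name is hypothetical.

```python
def group_into_blocks(member_sizes, target):
    """Greedily pack whole tar members into blocks of about 'target' bytes.

    Hypothetical policy: a block is closed as soon as its accumulated size
    reaches or exceeds 'target'; members are never split between blocks.
    """
    blocks, current, current_size = [], [], 0
    for size in member_sizes:
        current.append(size)
        current_size += size
        if current_size >= target:
            blocks.append(current)
            current, current_size = [], 0
    if current:            # last, possibly smaller, block
        blocks.append(current)
    return blocks

print(group_into_blocks([3, 4, 5, 2, 9, 1], 8))  # [[3, 4, 5], [2, 9], [1]]
```

Because members are never split, a block overshoots the target by less than the size of its last member. This is why the manual says the size can't be exact.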
@@ -341,9 +361,9 @@ blocks are either compressed in a separate lzip member or compressed
 along with the tar members contained in the last lzip member.
 
 The diagram below shows the correspondence between each tar member
-(formed by one or two headers plus optional data) in the tar archive and
-each lzip member in the resulting multimember tar.lz archive: *Note
-File format: (lzip)File format.
+(formed by one or two headers plus optional data) in the tar archive
+and each lzip member in the resulting multimember tar.lz archive, when
+per file compression is used: *Note File format: (lzip)File format.
 
 tar
 +========+======+=================+===============+========+======+========+
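With per file compression, each lzip member holds exactly one tar member, so the uncompressed size of that lzip member is the size of the tar member. The Python sketch below estimates that size. It assumes standard ustar/pax blocking: a 512-byte header, data padded to 512-byte blocks, and an optional pax extended header followed by its padded records. The helper names are hypothetical.

```python
BLOCK = 512  # size in bytes of a tar header block and of a data block

def padded(n: int) -> int:
    """Round n up to a whole number of 512-byte blocks."""
    return -(-n // BLOCK) * BLOCK

def tar_member_size(file_size: int, extended_records: int = 0) -> int:
    """Bytes used by one tar member in the uncompressed archive.

    The ustar header plus the file data padded to 512 bytes; when a pax
    extended header is present, add its header block and padded records.
    """
    size = BLOCK + padded(file_size)
    if extended_records:
        size += BLOCK + padded(extended_records)
    return size

EOF_BLOCKS = 2 * BLOCK  # two zero-filled blocks mark the end of the archive

print(tar_member_size(1000))      # 1536
print(tar_member_size(1000, 30))  # 2560
print(tar_member_size(0))         # 512, e.g. a directory
```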
@@ -612,12 +632,12 @@ wasteful for a backup format.
 
 There is no portable way to tell what charset a text string is coded
 into. Therefore, tarlz stores all fields representing text strings
-as-is, without conversion to UTF-8 nor any other transformation. This
-prevents accidental double UTF-8 conversions. If the need arises this
-behavior will be adjusted with a command line option in the future.
+unmodified, without conversion to UTF-8 nor any other transformation.
+This prevents accidental double UTF-8 conversions. If the need arises
+this behavior will be adjusted with a command line option in the future.
 
 
-File: tarlz.info, Node: Multi-threaded tar, Next: Examples, Prev: Amendments to pax format, Up: Top
+File: tarlz.info, Node: Multi-threaded tar, Next: Minimum archive sizes, Prev: Amendments to pax format, Up: Top
 
 5 Limitations of parallel tar decoding
 **************************************
@@ -659,15 +679,53 @@ sequential '--list' because, in addition to using several processors,
 it only needs to decompress part of each lzip member. See the following
 example listing the Silesia corpus on a dual core machine:
 
-tarlz -9 -cf silesia.tar.lz silesia
+tarlz -9 --no-solid -cf silesia.tar.lz silesia
 time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
 time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
 time tarlz -tf silesia.tar.lz (0.020s)
 
 
-File: tarlz.info, Node: Examples, Next: Problems, Prev: Multi-threaded tar, Up: Top
+File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded tar, Up: Top
 
-6 A small tutorial with examples
+6 Minimum archive sizes required for multi-threaded block compression
+*********************************************************************
+
+When creating or appending to a compressed archive using multi-threaded
+block compression, tarlz puts tar members together in blocks and
+compresses as many blocks simultaneously as worker threads are chosen,
+creating a multimember compressed archive.
+
+For this to work as expected (and roughly multiply the compression
+speed by the number of available processors), the uncompressed archive
+must be at least as large as the number of worker threads times the
+block size (*note --data-size::). Else some processors will not get any
+data to compress, and compression will be proportionally slower. The
+maximum speed increase achievable on a given file is limited by the
+ratio (uncompressed_size / data_size). For example, a tarball the size
+of gcc or linux will scale up to 10 or 12 processors at level -9.
+
+The following table shows the minimum uncompressed archive size
+needed for full use of N processors at a given compression level, using
+the default data size for each level:
+
+Processors      2        4        8       16       64      256
+------------------------------------------------------------------
+Level
+-0              2 MiB    4 MiB    8 MiB   16 MiB   64 MiB  256 MiB
+-1              4 MiB    8 MiB   16 MiB   32 MiB  128 MiB  512 MiB
+-2              6 MiB   12 MiB   24 MiB   48 MiB  192 MiB  768 MiB
+-3              8 MiB   16 MiB   32 MiB   64 MiB  256 MiB    1 GiB
+-4             12 MiB   24 MiB   48 MiB   96 MiB  384 MiB  1.5 GiB
+-5             16 MiB   32 MiB   64 MiB  128 MiB  512 MiB    2 GiB
+-6             32 MiB   64 MiB  128 MiB  256 MiB    1 GiB    4 GiB
+-7             64 MiB  128 MiB  256 MiB  512 MiB    2 GiB    8 GiB
+-8             96 MiB  192 MiB  384 MiB  768 MiB    3 GiB   12 GiB
+-9            128 MiB  256 MiB  512 MiB    1 GiB    4 GiB   16 GiB
+
+
+File: tarlz.info, Node: Examples, Next: Problems, Prev: Minimum archive sizes, Up: Top
+
+7 A small tutorial with examples
 ********************************
 
 Example 1: Create a multimember compressed archive 'archive.tar.lz'
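The minimum-size table in the new "Minimum archive sizes" node follows from two rules stated in this commit. First, the default data size is twice the dictionary size, except for '-0', where it is 1 MiB. Second, full use of N processors needs an uncompressed archive of N times the data size. The Python sketch below regenerates the table from these rules. It assumes that levels -1 to -9 use lzip's default dictionary sizes of 1, 1.5, 2, 3, 4, 8, 16, 24 and 32 MiB, which is consistent with every row of the table. The helper names are hypothetical.

```python
KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

# Assumed lzip dictionary sizes for levels -1 .. -9.
DICTIONARY_SIZE = {1: 1 * MiB, 2: 3 * MiB // 2, 3: 2 * MiB, 4: 3 * MiB,
                   5: 4 * MiB, 6: 8 * MiB, 7: 16 * MiB, 8: 24 * MiB,
                   9: 32 * MiB}
PROCESSORS = (2, 4, 8, 16, 64, 256)

def default_data_size(level: int) -> int:
    """Default '--data-size': 1 MiB for -0, else twice the dictionary size."""
    return 1 * MiB if level == 0 else 2 * DICTIONARY_SIZE[level]

def min_archive_size(level: int, processors: int) -> int:
    """Smallest uncompressed archive that gives every processor a block."""
    return processors * default_data_size(level)

def human(size: int) -> str:
    """Format a byte count like the table: MiB below 1 GiB, else GiB."""
    if size < GiB:
        return f"{size // MiB} MiB"
    return f"{size / GiB:g} GiB"

for level in range(10):
    row = [human(min_archive_size(level, n)) for n in PROCESSORS]
    print(f"-{level}", *row, sep="  ")
```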
@@ -725,7 +783,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory
 
 File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
 
-7 Reporting bugs
+8 Reporting bugs
 ****************
 
 There are probably bugs in tarlz. There are certainly errors and
@@ -754,6 +812,7 @@ Concept index
 * getting help: Problems. (line 6)
 * introduction: Introduction. (line 6)
 * invoking: Invoking tarlz. (line 6)
+* minimum archive sizes: Minimum archive sizes. (line 6)
 * options: Invoking tarlz. (line 6)
 * usage: Invoking tarlz. (line 6)
 * version: Invoking tarlz. (line 6)
@@ -762,18 +821,19 @@ Concept index
 
 Tag Table:
 Node: Top223
-Node: Introduction1013
-Node: Invoking tarlz3125
-Ref: --data-size4717
-Node: File format11536
-Ref: key_crc3216321
-Node: Amendments to pax format21738
-Ref: crc3222262
-Ref: flawed-compat23287
-Node: Multi-threaded tar25649
-Node: Examples28164
-Node: Problems29830
-Node: Concept index30356
+Node: Introduction1089
+Node: Invoking tarlz3218
+Ref: --data-size5097
+Node: File format12673
+Ref: key_crc3217493
+Node: Amendments to pax format22910
+Ref: crc3223434
+Ref: flawed-compat24459
+Node: Multi-threaded tar26826
+Node: Minimum archive sizes29365
+Node: Examples31495
+Node: Problems33164
+Node: Concept index33690
 
 End Tag Table
 

186 doc/tarlz.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 31 January 2019
-@set VERSION 0.10
+@set UPDATED 13 February 2019
+@set VERSION 0.11
 
 @dircategory Data Compression
 @direntry
@@ -40,6 +40,7 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
 * Multi-threaded tar:: Limitations of parallel tar decoding
+* Minimum archive sizes:: Sizes required for full multi-threaded speed
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
@@ -56,25 +57,24 @@ to copy, distribute and modify it.
 @chapter Introduction
 @cindex introduction
 
-@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a combined
-implementation of the tar archiver and the
-@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. By default
-tarlz creates, lists and extracts archives in a simplified posix pax format
-compressed with lzip on a per file basis. Each tar member is compressed in
-its own lzip member, as well as the end-of-file blocks. This method adds an
-indexed lzip layer on top of the tar archive, making it possible to decode
-the archive safely in parallel. The resulting multimember tar.lz archive is
-fully backward compatible with standard tar tools like GNU tar, which treat
-it like any other tar.lz archive. Tarlz can append files to the end of such
-compressed archives.
+@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
+(multi-threaded) combined implementation of the tar archiver and the
+@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates,
+lists and extracts archives in a simplified posix pax format compressed with
+lzip, keeping the alignment between tar members and lzip members. This
+method adds an indexed lzip layer on top of the tar archive, making it
+possible to decode the archive safely in parallel. The resulting multimember
+tar.lz archive is fully backward compatible with standard tar tools like GNU
+tar, which treat it like any other tar.lz archive. Tarlz can append files to
+the end of such compressed archives.
 
-Tarlz can create tar archives with four levels of compression granularity;
-per file, per directory, appendable solid, and solid.
+Tarlz can create tar archives with five levels of compression granularity;
+per file, per block, per directory, appendable solid, and solid.
 
 @noindent
-Of course, compressing each file (or each directory) individually is
-less efficient than compressing the whole tar archive, but it has the
-following advantages:
+Of course, compressing each file (or each directory) individually can't
+achieve a compression ratio as high as compressing solidly the whole tar
+archive, but it has the following advantages:
 
 @itemize @bullet
 @item
@@ -120,18 +120,23 @@ tarlz [@var{options}] [@var{files}]
 @end example
 
 @noindent
-On archive creation or appending, tarlz removes leading and trailing
-slashes from filenames, as well as filename prefixes containing a
-@samp{..} component. On extraction, archive members containing a
-@samp{..} component are skipped. Tarlz detects when the archive being
-created or enlarged is among the files to be dumped, appended or
-concatenated, and skips it.
+On archive creation or appending tarlz archives the files specified, but
+removes from member names any leading and trailing slashes and any filename
+prefixes containing a @samp{..} component. On extraction, leading and
+trailing slashes are also removed from member names, and archive members
+containing a @samp{..} component in the filename are skipped. Tarlz detects
+when the archive being created or enlarged is among the files to be dumped,
+appended or concatenated, and skips it.
 
 On extraction and listing, tarlz removes leading @samp{./} strings from
 member names in the archive or given in the command line, so that
 @w{@code{tarlz -xf foo ./bar baz}} extracts members @samp{bar} and
 @samp{./baz} from archive @samp{foo}.
 
+If several compression levels or @samp{--*solid} options are given, the last
+setting is used. For example @w{@samp{-9 --solid --uncompressed -1}} is
+equivalent to @samp{-1 --solid}
+
 tarlz supports the following options:
 
 @table @code
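The member-name rules in this hunk can be sketched as two small string functions. This is a hypothetical Python illustration, not tarlz's code. It assumes that names use '/' as the separator. It also assumes that "filename prefixes containing a '..' component" means everything up to and including the last '..' component.

```python
def strip_member_name(name: str) -> str:
    """Hypothetical sketch of the name cleanup done on archive creation.

    Drop any prefix up to and including the last '..' component, then
    strip leading and trailing slashes.
    """
    parts = name.split("/")
    # Index just past the last '..' component (0 if there is none).
    cut = max((i + 1 for i, p in enumerate(parts) if p == ".."), default=0)
    return "/".join(parts[cut:]).strip("/")

def strip_dot_slash(name: str) -> str:
    """Remove leading './' strings, as done on extraction and listing."""
    while name.startswith("./"):
        name = name[2:]
    return name

print(strip_member_name("../../etc/passwd"))  # etc/passwd
print(strip_member_name("/usr/share/doc/"))   # usr/share/doc
```

Applying 'strip_dot_slash' to both the stored names and the command-line operands is what lets 'tarlz -xf foo ./bar baz' match the members 'bar' and './baz'.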
@@ -160,6 +165,7 @@ specified. Tarlz can't concatenate uncompressed tar archives.
 Set target size of input data blocks for the @samp{--bsolid} option. Valid
 values range from @w{8 KiB} to @w{1 GiB}. Default value is two times the
 dictionary size, except for option @samp{-0} where it defaults to @w{1 MiB}.
+@xref{Minimum archive sizes}.
 
 @item -c
 @itemx --create
@@ -176,6 +182,10 @@ extraction. Listing ignores any @samp{-C} options specified. @var{dir}
 is relative to the then current working directory, perhaps changed by a
 previous @samp{-C} option.
 
+Note that a process can only have one current working directory (CWD).
+Therefore multi-threading can't be used to create an archive if a @samp{-C}
+option appears after a relative filename in the command line.
+
 @item -f @var{archive}
 @itemx --file=@var{archive}
 Use archive file @var{archive}. @samp{-} used as an @var{archive}
@@ -183,17 +193,19 @@ argument reads from standard input or writes to standard output.
 
 @item -n @var{n}
 @itemx --threads=@var{n}
-Set the number of decompression threads, overriding the system's default.
+Set the number of (de)compression threads, overriding the system's default.
 Valid values range from 0 to "as many as your system can support". A value
 of 0 disables threads entirely. If this option is not used, tarlz tries to
 detect the number of processors in the system and use it as default value.
-@w{@samp{tarlz --help}} shows the system's default value. This option
-currently only has effect when listing the contents of a multimember
-compressed archive. @xref{Multi-threaded tar}.
+@w{@samp{tarlz --help}} shows the system's default value. See the note about
+multi-threaded archive creation in the @samp{-C} option above.
+Multi-threaded extraction of files from an archive is not yet implemented.
+@xref{Multi-threaded tar}.
 
-Note that the number of usable threads is limited during decompression to
-the number of lzip members in the tar.lz archive, which you can find by
-running @w{@code{lzip -lv archive.tar.lz}}.
+Note that the number of usable threads is limited during compression to
+@w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
+and during decompression to the number of lzip members in the tar.lz
+archive, which you can find by running @w{@code{lzip -lv archive.tar.lz}}.
 
 @item -q
 @itemx --quiet
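The manual suggests 'lzip -lv archive.tar.lz' to find the number of lzip members, which limits decompression threads. As an illustration only, the Python sketch below counts members the way an index reader can. It walks backwards through the 20-byte member trailers described in the File format section of the lzip manual: CRC32 (4 bytes), data size (8 bytes) and member size (8 bytes), all little-endian. The helper is hypothetical. It does not check CRCs, does not decompress anything, and assumes there is no trailing data after the last member.

```python
import struct

HEADER_SIZE = 6    # "LZIP" magic, version byte, coded dictionary size
TRAILER_SIZE = 20  # CRC32 (4) + data size (8) + member size (8), little-endian

def count_lzip_members(data: bytes) -> int:
    """Count the members of a multimember lzip file by walking its trailers.

    Each trailer ends with the total member size (header and trailer
    included), so stepping back by that amount reaches the previous member.
    """
    pos, members = len(data), 0
    while pos > 0:
        if pos < HEADER_SIZE + TRAILER_SIZE:
            raise ValueError("truncated member")
        (member_size,) = struct.unpack_from("<Q", data, pos - 8)
        start = pos - member_size
        if member_size < HEADER_SIZE + TRAILER_SIZE or start < 0:
            raise ValueError("bad member size")
        if data[start:start + 4] != b"LZIP":
            raise ValueError("missing LZIP header")
        members += 1
        pos = start
    return members
```

For a real file, pass it the bytes read in binary mode, for example 'count_lzip_members(open("archive.tar.lz", "rb").read())'.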
@@ -213,7 +225,7 @@ to an uncompressed tar archive.
 @item -t
 @itemx --list
 List the contents of an archive. If @var{files} are given, list only the
-given @var{files}.
+@var{files} given.
 
 @item -v
 @itemx --verbose
@@ -222,7 +234,7 @@ Verbosely list files processed.
 @item -x
 @itemx --extract
 Extract files from an archive. If @var{files} are given, extract only
-the given @var{files}. Else extract all the files in the archive.
+the @var{files} given. Else extract all the files in the archive.
 
 @item -0 .. -9
 Set the compression level. The default compression level is @samp{-6}.
@ -245,40 +257,42 @@ it creates, reducing the amount of memory required for decompression.
|
|||
|
||||
@item --asolid
|
||||
When creating or appending to a compressed archive, use appendable solid
|
||||
compression. All the files being added to the archive are compressed
|
||||
into a single lzip member, but the end-of-file blocks are compressed
|
||||
into a separate lzip member. This creates a solidly compressed
|
||||
appendable archive.
|
||||
compression. All the files being added to the archive are compressed into a
|
||||
single lzip member, but the end-of-file blocks are compressed into a
|
||||
separate lzip member. This creates a solidly compressed appendable archive.
|
||||
Solid archives can't be created nor decoded in parallel.
|
||||
|
||||
@item --bsolid
|
||||
When creating or appending to a compressed archive, compress tar members
|
||||
together in a lzip member until they approximate a target uncompressed size.
|
||||
The size can't be exact because each solidly compressed data block must
|
||||
contain an integer number of tar members. This option improves compression
|
||||
efficiency for archives with lots of small files. @xref{--data-size}, to set
|
||||
the target block size.
|
||||
When creating or appending to a compressed archive, use block compression.
|
||||
Tar members are compressed together in a lzip member until they approximate
|
||||
a target uncompressed size. The size can't be exact because each solidly
|
||||
compressed data block must contain an integer number of tar members. Block
|
||||
compression is the default because it improves compression ratio for
|
||||
archives with many files smaller than the block size. This option allows
|
||||
tarlz revert to default behavior if, for example, it is invoked through an
|
||||
alias like @code{tar='tarlz --solid'}. @xref{--data-size}, to set the target
|
||||
block size.
|
||||
|
||||
@item --dsolid
|
||||
When creating or appending to a compressed archive, use solid
|
||||
compression for each directory especified in the command line. The
|
||||
end-of-file blocks are compressed into a separate lzip member. This
|
||||
creates a compressed appendable archive with a separate lzip member for
|
||||
each top-level directory.
|
||||
When creating or appending to a compressed archive, compress each file
|
||||
specified in the command line separately in its own lzip member, and use
|
||||
solid compression for each directory specified in the command line. The
|
||||
end-of-file blocks are compressed into a separate lzip member. This creates
|
||||
a compressed appendable archive with a separate lzip member for each file or
|
||||
top-level directory specified.
|
||||
|
||||
@item --no-solid
|
||||
When creating or appending to a compressed archive, compress each file
separately in its own lzip member. The end-of-file blocks are compressed
into a separate lzip member. This creates a compressed appendable archive
with a lzip member for each file.

@item --solid
When creating or appending to a compressed archive, use solid compression.
The files being added to the archive, along with the end-of-file blocks, are
compressed into a single lzip member. The resulting archive is not
appendable: no more files can be appended to it later. Solid archives can
be neither created nor decoded in parallel.
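
The difference in appendability can be expressed as a simple check. The Python model below is hypothetical and is not tarlz code. It represents an archive as a list of lzip members, each listing the tar members it holds. It assumes that appending works by replacing a final lzip member that contains only the end-of-file blocks, which is impossible when those blocks share a member with file data.

@example
def is_appendable(layout):
    """Return True if files could be appended to a compressed archive.

    Simplified model (an assumption, not tarlz's code): appending needs
    the end-of-file blocks alone in the last lzip member, so that member
    can be replaced by new members plus fresh end-of-file blocks.
    """
    return bool(layout) and layout[-1] == ["EOF blocks"]


solid = [["a", "b", "c", "EOF blocks"]]           # --solid
no_solid = [["a"], ["b"], ["c"], ["EOF blocks"]]  # --no-solid
print(is_appendable(solid), is_appendable(no_solid))  # False True
@end example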

@item --anonymous
Equivalent to @samp{--owner=root --group=root}.

binary zeros, interpreted as an end-of-archive indicator. These EOF
blocks are either compressed in a separate lzip member or compressed
along with the tar members contained in the last lzip member.
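
The following Python sketch shows where the end-of-file blocks sit in an uncompressed tar stream. The hypothetical function @code{find_eof_offset} walks the stream header by header and returns the offset of the two 512-byte blocks of zeros. Everything before that offset belongs to tar members. The zeros after it are what gets compressed, either alone or together with the last tar members. The sketch reads only the size field of each header and treats pax extended headers as ordinary members. It is a simplification, not tarlz's parser.

@example
import io
import tarfile

BLOCK = 512
ZERO_BLOCK = bytes(BLOCK)


def find_eof_offset(data):
    """Return the offset of the two zero EOF blocks, or -1 if missing.

    Simplified ustar walk: only the size field (12 bytes of octal at
    offset 124 of each header) is read to skip over member data.
    """
    pos = 0
    while pos + 2 * BLOCK <= len(data):
        header = data[pos:pos + BLOCK]
        if header == ZERO_BLOCK and data[pos + BLOCK:pos + 2 * BLOCK] == ZERO_BLOCK:
            return pos
        size = int(header[124:136].strip(b"\0 ") or b"0", 8)
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK  # header + padded data
    return -1


# Build a one-member archive in memory with Python's tarfile module.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo("hello.txt")
    info.size = 5
    tar.addfile(info, io.BytesIO(b"hello"))
print(find_eof_offset(buf.getvalue()))  # 1024: one header + one data block
@end example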

The diagram below shows the correspondence between each tar member (formed
by one or two headers plus optional data) in the tar archive and each
@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format,,lzip member}
in the resulting multimember tar.lz archive, when per-file compression is
used:
@ifnothtml
@xref{File format,,,lzip}.
@end ifnothtml

format.

@section Avoid misconversions to/from UTF-8

There is no portable way to tell what charset a text string is encoded in.
Therefore, tarlz stores all fields representing text strings unmodified,
without conversion to UTF-8 or any other transformation. This prevents
accidental double UTF-8 conversions. If the need arises, this behavior will
be adjusted with a command line option in the future.
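
The Python sketch below illustrates the double conversion that storing names unmodified avoids; the member name is a hypothetical example. It converts a UTF-8 name a second time after misreading its bytes as Latin-1.

@example
# Hypothetical member name, stored in a tar header as UTF-8 bytes.
name = "caf\u00e9"                  # "cafe" with an acute accent
stored = name.encode("utf-8")       # b'caf\xc3\xa9'

# Reading the stored bytes back unmodified recovers the name.
assert stored.decode("utf-8") == name

# A second, accidental conversion: the bytes are taken for Latin-1
# text and converted to UTF-8 again, producing mojibake.
doubled = stored.decode("latin-1").encode("utf-8")
print(doubled)                          # b'caf\xc3\x83\xc2\xa9'
print(ascii(doubled.decode("utf-8")))   # 'caf\xc3\xa9'
@end example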


@node Multi-threaded tar

it only needs to decompress part of each lzip member. See the following
example listing the Silesia corpus on a dual-core machine:

@example
tarlz -9 --no-solid -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf -             (5.032s)
time plzip -cd silesia.tar.lz | tar -tf -            (3.256s)
time tarlz -tf silesia.tar.lz                        (0.020s)
@end example
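
Part of what makes this possible is that the lzip layer can serve as an index. Each lzip member ends with a 20-byte trailer (CRC32, data size and member size), and the last 8 bytes of that trailer store the member size as a little-endian integer. A reader can therefore locate every member without decompressing anything, by walking backwards from the end of the file. The Python function below is a simplified sketch of that walk. It assumes a well-formed file with no trailing data and is not tarlz's implementation. The hypothetical @code{fake_member} helper builds members with placeholder contents, only for demonstration.

@example
def lzip_member_offsets(data):
    """Return (offset, size) for each lzip member in 'data', in file order.

    Sketch assuming a well-formed multimember file without trailing data.
    Each member ends with a 20-byte trailer (CRC32, data size, member
    size); the last 8 bytes hold the member size, little-endian.
    """
    members = []
    end = len(data)
    while end > 0:
        if end < 26:  # 6-byte header + 20-byte trailer
            raise ValueError("truncated lzip member")
        member_size = int.from_bytes(data[end - 8:end], "little")
        start = end - member_size
        if member_size < 26 or start < 0 or data[start:start + 4] != b"LZIP":
            raise ValueError("corrupt member ending at offset " + str(end))
        members.append((start, member_size))
        end = start
    members.reverse()
    return members


def fake_member(payload):
    # Not real LZMA data: header, arbitrary payload, and a trailer whose
    # CRC32 and data-size fields are left as zeros.
    size = 6 + len(payload) + 20
    return (b"LZIP" + bytes([1, 12]) + payload + bytes(12)
            + size.to_bytes(8, "little"))


archive = fake_member(b"abc") + fake_member(b"defgh")
print(lzip_member_offsets(archive))  # [(0, 29), (29, 31)]
@end example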


@node Minimum archive sizes
@chapter Minimum archive sizes required for multi-threaded block compression
@cindex minimum archive sizes

When creating or appending to a compressed archive using multi-threaded
block compression, tarlz puts tar members together in blocks and compresses
as many blocks simultaneously as there are worker threads, creating a
multimember compressed archive.

For this to work as expected (and roughly multiply the compression speed by
the number of available processors), the uncompressed archive must be at
least as large as the number of worker threads times the block size
(@pxref{--data-size}). Otherwise, some processors will not get any data to
compress, and compression will be proportionally slower. The maximum speed
increase achievable on a given file is limited by the ratio
@w{(uncompressed_size / data_size)}. For example, a tarball the size of gcc
or linux will scale up to 10 or 12 processors at level -9.
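
The limit stated above can be written as a one-line formula. Below is a minimal Python sketch of this idealized bound. It ignores I/O and uneven compression times. The function name and the 700 MiB archive are hypothetical, and the 64 MiB data size is the level -9 default implied by the table of minimum sizes.

@example
MiB = 2 ** 20


def max_speedup(uncompressed_size, data_size, processors):
    """Idealized upper bound on the speedup of block compression.

    Serial work is proportional to uncompressed_size, while the parallel
    run takes at least the time needed to compress one full block, so
    the speedup cannot exceed uncompressed_size / data_size, nor the
    number of processors.
    """
    return min(processors, uncompressed_size / data_size)


# Hypothetical 700 MiB tarball at level -9 on a 16-processor machine.
print(max_speedup(700 * MiB, 64 * MiB, 16))  # 10.9375
@end example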

The following table shows the minimum uncompressed archive size needed for
full use of N processors at a given compression level, using the default
data size for each level:

@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB}
@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256
@item Level
@item -0 @tab 2 MiB @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 64 MiB @tab 256 MiB
@item -1 @tab 4 MiB @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 128 MiB @tab 512 MiB
@item -2 @tab 6 MiB @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 192 MiB @tab 768 MiB
@item -3 @tab 8 MiB @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 256 MiB @tab 1 GiB
@item -4 @tab 12 MiB @tab 24 MiB @tab 48 MiB @tab 96 MiB @tab 384 MiB @tab 1.5 GiB
@item -5 @tab 16 MiB @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 512 MiB @tab 2 GiB
@item -6 @tab 32 MiB @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 1 GiB @tab 4 GiB
@item -7 @tab 64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB @tab 8 GiB
@item -8 @tab 96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB @tab 12 GiB
@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB @tab 4 GiB @tab 16 GiB
@end multitable
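
Every entry in the table is the number of processors times the default data size of the level. The short Python sketch below regenerates the table under that reading. The per-level data sizes are simply read off the 2-processor column (minimum size divided by 2), not taken from tarlz's source, and the helper names are hypothetical.

@example
MiB = 2 ** 20

# Default data size per level in MiB, read off the 2-processor column
# of the table (minimum size / 2), for levels -0 to -9.
DATA_SIZE_MIB = [1, 2, 3, 4, 6, 8, 16, 32, 48, 64]


def human(n_bytes):
    """Format a byte count the way the table does (MiB or GiB)."""
    if n_bytes >= 1024 * MiB:
        return "%g GiB" % (n_bytes / (1024 * MiB))
    return "%d MiB" % (n_bytes // MiB)


def min_archive_size(level, processors):
    """Smallest uncompressed size that keeps every processor busy."""
    return processors * DATA_SIZE_MIB[level] * MiB


for level in range(10):
    row = [human(min_archive_size(level, n)) for n in (2, 4, 8, 16, 64, 256)]
    print("-%d  " % level + "  ".join(row))
@end example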


@node Examples
@chapter A small tutorial with examples
@cindex examples