Adding upstream version 0.9.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 9bbbd387b8
commit 7cf0407517
25 changed files with 1761 additions and 353 deletions
25
doc/tarlz.1
@@ -1,18 +1,20 @@
 .\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
-.TH TARLZ "1" "December 2018" "tarlz 0.8" "User Commands"
+.TH TARLZ "1" "January 2019" "tarlz 0.9" "User Commands"
 .SH NAME
 tarlz \- creates tar archives with multimember lzip compression
 .SH SYNOPSIS
 .B tarlz
 [\fI\,options\/\fR] [\fI\,files\/\fR]
 .SH DESCRIPTION
-Tarlz is a small and simple implementation of the tar archiver. By default
-tarlz creates, lists and extracts archives in a simplified posix pax format
-compressed with lzip on a per file basis. Each tar member is compressed in
-its own lzip member, as well as the end\-of\-file blocks. This method is fully
-backward compatible with standard tar tools like GNU tar, which treat the
-resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
-can append files to the end of such compressed archives.
+Tarlz is a combined implementation of the tar archiver and the lzip
+compressor. By default tarlz creates, lists and extracts archives in a
+simplified posix pax format compressed with lzip on a per file basis. Each
+tar member is compressed in its own lzip member, as well as the end\-of\-file
+blocks. This method adds an indexed lzip layer on top of the tar archive,
+making it possible to decode the archive safely in parallel. The resulting
+multimember tar.lz archive is fully backward compatible with standard tar
+tools like GNU tar, which treat it like any other tar.lz archive. Tarlz can
+append files to the end of such compressed archives.
 .PP
 The tarlz file format is a safe posix\-style backup format. In case of
 corruption, tarlz can extract all the undamaged members from the tar.lz
@@ -40,6 +42,9 @@ change to directory <dir>
 \fB\-f\fR, \fB\-\-file=\fR<archive>
 use archive file <archive>
 .TP
+\fB\-n\fR, \fB\-\-threads=\fR<n>
+set number of decompression threads [2]
+.TP
 \fB\-q\fR, \fB\-\-quiet\fR
 suppress all messages
 .TP
@@ -97,8 +102,8 @@ Report bugs to lzip\-bug@nongnu.org
 .br
 Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
 .SH COPYRIGHT
-Copyright \(co 2018 Antonio Diaz Diaz.
-Using lzlib 1.11\-rc2
+Copyright \(co 2019 Antonio Diaz Diaz.
+Using lzlib 1.11
 License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
 .br
 This is free software: you are free to change and redistribute it.
153
doc/tarlz.info
@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
 Tarlz Manual
 ************
 
-This manual is for Tarlz (version 0.8, 16 December 2018).
+This manual is for Tarlz (version 0.9, 22 January 2019).
 
 * Menu:
 
@@ -19,12 +19,13 @@ This manual is for Tarlz (version 0.8, 16 December 2018).
 * Invoking tarlz:: Command line interface
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
+* Multi-threaded tar:: Limitations of parallel tar decoding
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
 
 
-Copyright (C) 2013-2018 Antonio Diaz Diaz.
+Copyright (C) 2013-2019 Antonio Diaz Diaz.
 
 This manual is free documentation: you have unlimited permission to
 copy, distribute and modify it.
@@ -35,12 +36,14 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
 1 Introduction
 **************
 
-Tarlz is a small and simple implementation of the tar archiver. By
-default tarlz creates, lists and extracts archives in a simplified
-posix pax format compressed with lzip on a per file basis. Each tar
-member is compressed in its own lzip member, as well as the end-of-file
-blocks. This method is fully backward compatible with standard tar tools
-like GNU tar, which treat the resulting multimember tar.lz archive like
+Tarlz is a combined implementation of the tar archiver and the lzip
+compressor. By default tarlz creates, lists and extracts archives in a
+simplified posix pax format compressed with lzip on a per file basis.
+Each tar member is compressed in its own lzip member, as well as the
+end-of-file blocks. This method adds an indexed lzip layer on top of
+the tar archive, making it possible to decode the archive safely in
+parallel. The resulting multimember tar.lz archive is fully backward
+compatible with standard tar tools like GNU tar, which treat it like
 any other tar.lz archive. Tarlz can append files to the end of such
 compressed archives.
 
@@ -52,7 +55,7 @@ less efficient than compressing the whole tar archive, but it has the
 following advantages:
 
 * The resulting multimember tar.lz archive can be decompressed in
-parallel with plzip, multiplying the decompression speed.
+parallel, multiplying the decompression speed.
 
 * New members can be appended to the archive (by removing the EOF
 member) just like to an uncompressed tar archive.
@@ -74,10 +77,6 @@ with standard tar tools. *Note crc32::.
 Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
 'star' or 'v7'.
 
-Tarlz is intended as a showcase project for the maintainers of real
-tar programs to evaluate the format and perhaps implement it in their
-tools.
-
 
 File: tarlz.info, Node: Invoking tarlz, Next: File format, Prev: Introduction, Up: Top
 
@@ -141,6 +140,21 @@ archive 'foo'.
 Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads
 from standard input or writes to standard output.
 
+'-n N'
+'--threads=N'
+Set the number of decompression threads, overriding the system's
+default. Valid values range from 0 to "as many as your system can
+support". A value of 0 disables threads entirely. If this option
+is not used, tarlz tries to detect the number of processors in the
+system and use it as default value. 'tarlz --help' shows the
+system's default value. This option currently only has effect when
+listing the contents of a multimember compressed archive. *Note
+Multi-threaded tar::.
+
+Note that the number of usable threads is limited during
+decompression to the number of lzip members in the tar.lz archive,
+which you can find by running 'lzip -lv archive.tar.lz'.
+
 '-q'
 '--quiet'
 Quiet operation. Suppress all messages.
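The thread-count rules added in the hunk above can be summarised in a short Python sketch. The function name `effective_threads` and its parameters are hypothetical, chosen only for illustration; this is not tarlz's actual code, just a restatement of the documented behaviour (0 disables threads, the default is the detected processor count, and no more threads are usable than there are lzip members).

```python
import os


def effective_threads(requested, lzip_members):
    """Hypothetical helper: how many decompression threads can be used.

    requested    -- value given to -n/--threads, or None if the option is absent
    lzip_members -- number of lzip members in the tar.lz archive
    """
    if requested is None:
        # Without the option, fall back to the detected number of processors.
        requested = os.cpu_count() or 1
    if requested < 0:
        raise ValueError("thread count must be 0 or greater")
    if requested == 0:
        # A value of 0 disables threads entirely.
        return 0
    # The number of usable threads is limited by the number of lzip members.
    return min(requested, lzip_members)
```

For instance, asking for 8 threads on an archive with 3 lzip members leaves only 3 usable threads under this reading of the text.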
@@ -288,6 +302,11 @@ following sequence:
 
 * Zero or more blocks that contain the contents of the file.
 
+Each tar member must be contiguously stored in a lzip member for the
+parallel decoding operations like '--list' to work. If any tar member
+is split over two or more lzip members, the archive must be decoded
+sequentially. *Note Multi-threaded tar::.
+
 At the end of the archive file there are two 512-byte blocks filled
 with binary zeros, interpreted as an end-of-archive indicator. These EOF
 blocks are either compressed in a separate lzip member or compressed
@@ -417,19 +436,12 @@ record is used to store the linkname.
 The mode field provides 12 access permission bits. The following
 table shows the symbolic name of each bit and its octal value:
 
-Bit Name Bit value
-S_ISUID 04000
-S_ISGID 02000
-S_ISVTX 01000
-S_IRUSR 00400
-S_IWUSR 00200
-S_IXUSR 00100
-S_IRGRP 00040
-S_IWGRP 00020
-S_IXGRP 00010
-S_IROTH 00004
-S_IWOTH 00002
-S_IXOTH 00001
+Bit Name Value Bit Name Value Bit Name Value
+---------------------------------------------------
+S_ISUID 04000 S_ISGID 02000 S_ISVTX 01000
+S_IRUSR 00400 S_IWUSR 00200 S_IXUSR 00100
+S_IRGRP 00040 S_IWGRP 00020 S_IXGRP 00010
+S_IROTH 00004 S_IWOTH 00002 S_IXOTH 00001
 
 The uid and gid fields are the user and group ID of the owner and
 group of the file, respectively.
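The twelve mode bits tabulated in the hunk above carry the same names and octal values as the constants in Python's standard `stat` module, so a mode value can be decoded with a few lines. This is only an illustrative sketch; the list `MODE_BITS` and the function `decode_mode` are hypothetical names, not part of tarlz.

```python
import stat

# Symbolic names from the table, most significant bit first.
MODE_BITS = [
    "S_ISUID", "S_ISGID", "S_ISVTX",
    "S_IRUSR", "S_IWUSR", "S_IXUSR",
    "S_IRGRP", "S_IWGRP", "S_IXGRP",
    "S_IROTH", "S_IWOTH", "S_IXOTH",
]


def decode_mode(mode):
    """Return the names of the permission bits set in a 12-bit mode value."""
    return [name for name in MODE_BITS if mode & getattr(stat, name)]


# Example: a set-user-ID executable with permissions rwxr-xr-x.
names = decode_mode(0o4755)
```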
@@ -485,12 +497,16 @@ file archived:
 
 The magic field contains the ASCII null-terminated string "ustar".
 The version field contains the characters "00" (0x30,0x30). The fields
-uname, and gname are null-terminated character strings. Each numeric
-field contains a leading zero-filled, null-terminated octal number using
-digits from the ISO/IEC 646:1991 (ASCII) standard.
+uname, and gname are null-terminated character strings except when all
+characters in the array contain non-null characters including the last
+character. Each numeric field contains a leading space- or zero-filled,
+optionally null-terminated octal number using digits from the ISO/IEC
+646:1991 (ASCII) standard. Tarlz is able to decode numeric fields 1
+byte larger than standard ustar by not requiring a terminating null
+character.
 
 
-File: tarlz.info, Node: Amendments to pax format, Next: Examples, Prev: File format, Up: Top
+File: tarlz.info, Node: Amendments to pax format, Next: Multi-threaded tar, Prev: File format, Up: Top
 
 4 The reasons for the differences with pax
 ******************************************
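The relaxed numeric-field rule introduced in the hunk above (space- or zero-filled, terminating null optional) can be illustrated with a small Python parser. This is a sketch under stated assumptions, not tarlz's decoder: the name `parse_octal_field` is hypothetical, and treating an empty field as 0 and tolerating trailing spaces are choices made here for the example.

```python
def parse_octal_field(field: bytes) -> int:
    """Decode a ustar-style numeric header field (illustrative only)."""
    # Cut at the first NUL if there is one; the terminator is optional,
    # so a field completely filled with digits is accepted as well.
    digits = field.split(b"\0", 1)[0].strip(b" ")
    if not digits:
        # Assumption for this sketch: an empty field decodes as 0.
        return 0
    if any(c not in b"01234567" for c in digits):
        raise ValueError("non-octal character in numeric field")
    return int(digits.decode("ascii"), 8)
```

An 8-byte field such as `b"77777777"` has no room for a terminating null; accepting it is what lets a field hold one more digit than a strictly null-terminated ustar field of the same width.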
@@ -508,7 +524,7 @@ and the concrete reasons to implement them.
 The posix pax format has a serious flaw. The metadata stored in pax
 extended records are not protected by any kind of check sequence.
 Corruption in a long filename may cause the extraction of the file in
-the wrong place without warning. Corruption in a long file size may
+the wrong place without warning. Corruption in a large file size may
 cause the truncation of the file or the appending of garbage to the
 file, both followed by a spurious warning about a corrupt header far
 from the place of the undetected corruption.
@@ -573,9 +589,57 @@ prevents accidental double UTF-8 conversions. If the need arises this
 behavior will be adjusted with a command line option in the future.
 
 
-File: tarlz.info, Node: Examples, Next: Problems, Prev: Amendments to pax format, Up: Top
+File: tarlz.info, Node: Multi-threaded tar, Next: Examples, Prev: Amendments to pax format, Up: Top
 
-5 A small tutorial with examples
+5 Limitations of parallel tar decoding
+**************************************
+
+Safely decoding an arbitrary tar archive in parallel is impossible. For
+example, if a tar archive containing another tar archive is decoded
+starting from some position other than the beginning, there is no way
+to know if the first header found there belongs to the outer tar
+archive or to the inner tar archive. Tar is a format inherently serial;
+it was designed for tapes.
+
+In the case of compressed tar archives, the start of each compressed
+block determines one point through which the tar archive can be decoded
+in parallel. Therefore, in tar.lz archives the decoding operations
+can't be parallelized if the tar members are not aligned with the lzip
+members. Tar archives compressed with plzip can't be decoded in
+parallel because tar and plzip do not have a way to align both sets of
+members. Certainly one can decompress one such archive with a
+multi-threaded tool like plzip, but the increase in speed is not as
+large as it could be because plzip must serialize the decompressed data
+and pass them to tar, which decodes them sequentially, one tar member
+at a time.
+
+On the other hand, if the tar.lz archive is created with a tool like
+tarlz, which can guarantee the alignment between tar members and lzip
+members because it controls both archiving and compression, then the
+lzip format becomes an indexed layer on top of the tar archive which
+makes possible decoding it safely in parallel.
+
+Tarlz is able to automatically decode aligned and unaligned
+multimember tar.lz archives, keeping backwards compatibility. If tarlz
+finds a member misalignment during multi-threaded decoding, it switches
+to single-threaded mode and continues decoding the archive. Currently
+only the '--list' option is able to do multi-threaded decoding.
+
+If the files in the archive are large, multi-threaded '--list' on a
+regular tar.lz archive can be hundreds of times faster than sequential
+'--list' because, in addition to using several processors, it only
+needs to decompress part of each lzip member. See the following example
+listing the Silesia corpus on a dual core machine:
+
+tarlz -9 -cf silesia.tar.lz silesia
+time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
+time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
+time tarlz -tf silesia.tar.lz (0.020s)
+
+
+File: tarlz.info, Node: Examples, Next: Problems, Prev: Multi-threaded tar, Up: Top
+
+6 A small tutorial with examples
 ********************************
 
 Example 1: Create a multimember compressed archive 'archive.tar.lz'
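The "indexed layer" mentioned in the new chapter above rests on a property of the lzip format: every member ends with a 20-byte trailer (CRC32, uncompressed data size, member size, all little-endian), so member boundaries can be located from the end of the file without decompressing anything. The Python sketch below walks those trailers backwards. It is only an illustration under simplifying assumptions: `index_members` is a hypothetical name, and the code neither verifies CRCs nor copes with trailing data or corrupt archives, so it should not be read as tarlz's actual indexing code.

```python
import struct

HEADER_SIZE = 6    # "LZIP" magic (4 bytes), version (1), coded dictionary size (1)
TRAILER_SIZE = 20  # CRC32 (4 bytes), data size (8), member size (8)


def index_members(data: bytes):
    """Return a list of (offset, member_size, data_size), one per lzip member.

    Walks the member trailers from the end of the file to the beginning.
    """
    members = []
    end = len(data)
    while end > 0:
        if end < HEADER_SIZE + TRAILER_SIZE:
            raise ValueError("truncated lzip member")
        # The last 16 bytes of a member hold its data size and member size.
        data_size, member_size = struct.unpack_from("<QQ", data, end - 16)
        start = end - member_size
        if member_size < HEADER_SIZE + TRAILER_SIZE or start < 0:
            raise ValueError("bad member size in trailer")
        if data[start:start + 4] != b"LZIP":
            raise ValueError("missing lzip magic at computed member start")
        members.append((start, member_size, data_size))
        end = start
    members.reverse()
    return members
```

With such an index, each worker thread could be handed the offset of a different member; when every tar member sits in its own lzip member, those offsets are also safe starting points for tar decoding.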
@@ -633,7 +697,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory
 
 File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
 
-6 Reporting bugs
+7 Reporting bugs
 ****************
 
 There are probably bugs in tarlz. There are certainly errors and
@@ -670,16 +734,17 @@ Concept index
 
 Tag Table:
 Node: Top223
-Node: Introduction946
-Node: Invoking tarlz3084
-Node: File format9606
-Ref: key_crc3214138
-Node: Amendments to pax format19215
-Ref: crc3219729
-Ref: flawed-compat20753
-Node: Examples23126
-Node: Problems24802
-Node: Concept index25328
+Node: Introduction1012
+Node: Invoking tarlz3124
+Node: File format10384
+Ref: key_crc3215169
+Node: Amendments to pax format20586
+Ref: crc3221110
+Ref: flawed-compat22135
+Node: Multi-threaded tar24508
+Node: Examples27012
+Node: Problems28682
+Node: Concept index29208
 
 End Tag Table
 
134
doc/tarlz.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header
 
-@set UPDATED 16 December 2018
-@set VERSION 0.8
+@set UPDATED 22 January 2019
+@set VERSION 0.9
 
 @dircategory Data Compression
 @direntry
@@ -39,13 +39,14 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 * Invoking tarlz:: Command line interface
 * File format:: Detailed format of the compressed archive
 * Amendments to pax format:: The reasons for the differences with pax
+* Multi-threaded tar:: Limitations of parallel tar decoding
 * Examples:: A small tutorial with examples
 * Problems:: Reporting bugs
 * Concept index:: Index of concepts
 @end menu
 
 @sp 1
-Copyright @copyright{} 2013-2018 Antonio Diaz Diaz.
+Copyright @copyright{} 2013-2019 Antonio Diaz Diaz.
 
 This manual is free documentation: you have unlimited permission
 to copy, distribute and modify it.
@@ -55,18 +56,20 @@ to copy, distribute and modify it.
 @chapter Introduction
 @cindex introduction
 
-@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a small and simple
-implementation of the tar archiver. By default tarlz creates, lists and
-extracts archives in a simplified posix pax format compressed with
-@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} on a per file basis. Each
-tar member is compressed in its own lzip member, as well as the end-of-file
-blocks. This method is fully backward compatible with standard tar tools
-like GNU tar, which treat the resulting multimember tar.lz archive like any
-other tar.lz archive. Tarlz can append files to the end of such compressed
-archives.
+@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a combined
+implementation of the tar archiver and the
+@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. By default
+tarlz creates, lists and extracts archives in a simplified posix pax format
+compressed with lzip on a per file basis. Each tar member is compressed in
+its own lzip member, as well as the end-of-file blocks. This method adds an
+indexed lzip layer on top of the tar archive, making it possible to decode
+the archive safely in parallel. The resulting multimember tar.lz archive is
+fully backward compatible with standard tar tools like GNU tar, which treat
+it like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.
 
-Tarlz can create tar archives with four levels of compression
-granularity; per file, per directory, appendable solid, and solid.
+Tarlz can create tar archives with four levels of compression granularity;
+per file, per directory, appendable solid, and solid.
 
 @noindent
 Of course, compressing each file (or each directory) individually is
@@ -76,7 +79,7 @@ following advantages:
 @itemize @bullet
 @item
 The resulting multimember tar.lz archive can be decompressed in
-parallel with plzip, multiplying the decompression speed.
+parallel, multiplying the decompression speed.
 
 @item
 New members can be appended to the archive (by removing the EOF
@@ -102,9 +105,6 @@ standard tar tools. @xref{crc32}.
 Tarlz does not understand other tar formats like @samp{gnu}, @samp{oldgnu},
 @samp{star} or @samp{v7}.
 
-Tarlz is intended as a showcase project for the maintainers of real tar
-programs to evaluate the format and perhaps implement it in their tools.
-
 
 @node Invoking tarlz
 @chapter Invoking tarlz
@@ -174,6 +174,20 @@ previous @code{-C} option.
 Use archive file @var{archive}. @samp{-} used as an @var{archive}
 argument reads from standard input or writes to standard output.
 
+@item -n @var{n}
+@itemx --threads=@var{n}
+Set the number of decompression threads, overriding the system's default.
+Valid values range from 0 to "as many as your system can support". A value
+of 0 disables threads entirely. If this option is not used, tarlz tries to
+detect the number of processors in the system and use it as default value.
+@w{@samp{tarlz --help}} shows the system's default value. This option
+currently only has effect when listing the contents of a multimember
+compressed archive. @xref{Multi-threaded tar}.
+
+Note that the number of usable threads is limited during decompression to
+the number of lzip members in the tar.lz archive, which you can find by
+running @w{@code{lzip -lv archive.tar.lz}}.
+
 @item -q
 @itemx --quiet
 Quiet operation. Suppress all messages.
@@ -335,6 +349,11 @@ associated fields in this header block for this file.
 Zero or more blocks that contain the contents of the file.
 @end itemize
 
+Each tar member must be contiguously stored in a lzip member for the
+parallel decoding operations like @code{--list} to work. If any tar member
+is split over two or more lzip members, the archive must be decoded
+sequentially. @xref{Multi-threaded tar}.
+
 At the end of the archive file there are two 512-byte blocks filled with
 binary zeros, interpreted as an end-of-archive indicator. These EOF
 blocks are either compressed in a separate lzip member or compressed
@@ -481,20 +500,12 @@ is used to store the linkname.
 The mode field provides 12 access permission bits. The following table
 shows the symbolic name of each bit and its octal value:
 
-@multitable {Bit Name} {Bit value}
-@item Bit Name @tab Bit value
-@item S_ISUID @tab 04000
-@item S_ISGID @tab 02000
-@item S_ISVTX @tab 01000
-@item S_IRUSR @tab 00400
-@item S_IWUSR @tab 00200
-@item S_IXUSR @tab 00100
-@item S_IRGRP @tab 00040
-@item S_IWGRP @tab 00020
-@item S_IXGRP @tab 00010
-@item S_IROTH @tab 00004
-@item S_IWOTH @tab 00002
-@item S_IXOTH @tab 00001
+@multitable {Bit Name} {Value} {Bit Name} {Value} {Bit Name} {Value}
+@headitem Bit Name @tab Value @tab Bit Name @tab Value @tab Bit Name @tab Value
+@item S_ISUID @tab 04000 @tab S_ISGID @tab 02000 @tab S_ISVTX @tab 01000
+@item S_IRUSR @tab 00400 @tab S_IWUSR @tab 00200 @tab S_IXUSR @tab 00100
+@item S_IRGRP @tab 00040 @tab S_IWGRP @tab 00020 @tab S_IXGRP @tab 00010
+@item S_IROTH @tab 00004 @tab S_IWOTH @tab 00002 @tab S_IXOTH @tab 00001
 @end multitable
 
 The uid and gid fields are the user and group ID of the owner and group
@@ -551,10 +562,13 @@ regular file (type 0).
 @end table
 
 The magic field contains the ASCII null-terminated string "ustar". The
-version field contains the characters "00" (0x30,0x30). The fields
-uname, and gname are null-terminated character strings. Each numeric
-field contains a leading zero-filled, null-terminated octal number using
-digits from the ISO/IEC 646:1991 (ASCII) standard.
+version field contains the characters "00" (0x30,0x30). The fields uname,
+and gname are null-terminated character strings except when all characters
+in the array contain non-null characters including the last character. Each
+numeric field contains a leading space- or zero-filled, optionally
+null-terminated octal number using digits from the ISO/IEC 646:1991 (ASCII)
+standard. Tarlz is able to decode numeric fields 1 byte larger than standard
+ustar by not requiring a terminating null character.
 
 
 @node Amendments to pax format
@@ -574,7 +588,7 @@ concrete reasons to implement them.
 The posix pax format has a serious flaw. The metadata stored in pax extended
 records are not protected by any kind of check sequence. Corruption in a
 long filename may cause the extraction of the file in the wrong place
-without warning. Corruption in a long file size may cause the truncation of
+without warning. Corruption in a large file size may cause the truncation of
 the file or the appending of garbage to the file, both followed by a
 spurious warning about a corrupt header far from the place of the undetected
 corruption.
@@ -636,6 +650,52 @@ double UTF-8 conversions. If the need arises this behavior will be adjusted
 with a command line option in the future.
 
 
+@node Multi-threaded tar
+@chapter Limitations of parallel tar decoding
+
+Safely decoding an arbitrary tar archive in parallel is impossible. For
+example, if a tar archive containing another tar archive is decoded starting
+from some position other than the beginning, there is no way to know if the
+first header found there belongs to the outer tar archive or to the inner
+tar archive. Tar is a format inherently serial; it was designed for tapes.
+
+In the case of compressed tar archives, the start of each compressed block
+determines one point through which the tar archive can be decoded in
+parallel. Therefore, in tar.lz archives the decoding operations can't be
+parallelized if the tar members are not aligned with the lzip members. Tar
+archives compressed with plzip can't be decoded in parallel because tar and
+plzip do not have a way to align both sets of members. Certainly one can
+decompress one such archive with a multi-threaded tool like plzip, but the
+increase in speed is not as large as it could be because plzip must
+serialize the decompressed data and pass them to tar, which decodes them
+sequentially, one tar member at a time.
+
+On the other hand, if the tar.lz archive is created with a tool like tarlz,
+which can guarantee the alignment between tar members and lzip members
+because it controls both archiving and compression, then the lzip format
+becomes an indexed layer on top of the tar archive which makes possible
+decoding it safely in parallel.
+
+Tarlz is able to automatically decode aligned and unaligned multimember
+tar.lz archives, keeping backwards compatibility. If tarlz finds a member
+misalignment during multi-threaded decoding, it switches to single-threaded
+mode and continues decoding the archive. Currently only the @code{--list}
+option is able to do multi-threaded decoding.
+
+If the files in the archive are large, multi-threaded @code{--list} on a
+regular tar.lz archive can be hundreds of times faster than sequential
+@code{--list} because, in addition to using several processors, it only
+needs to decompress part of each lzip member. See the following example
+listing the Silesia corpus on a dual core machine:
+
+@example
+tarlz -9 -cf silesia.tar.lz silesia
+time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
+time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
+time tarlz -tf silesia.tar.lz (0.020s)
+@end example
+
+
 @node Examples
 @chapter A small tutorial with examples
 @cindex examples
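The ambiguity described in the "Limitations of parallel tar decoding" chapter above (a header found at an arbitrary position may belong to a nested archive) can be reproduced with Python's standard `tarfile` module. The sketch below builds a tar archive that contains another tar archive and then looks at each 512-byte block in isolation. The helper names `make_tar` and `looks_like_header` are hypothetical, and checking only the "ustar" magic is a deliberate simplification of what a real decoder would test.

```python
import io
import tarfile


def make_tar(name, payload):
    """Build an in-memory ustar archive holding one file (illustrative helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def looks_like_header(block):
    """Simplified check: the ustar magic sits at offset 257 of a header block."""
    return block[257:262] == b"ustar"


inner = make_tar("a.txt", b"hello\n")
outer = make_tar("inner.tar", inner)

# Examine every 512-byte block of the outer archive on its own, as a decoder
# would have to if it started somewhere other than the beginning.
blocks = [outer[i:i + 512] for i in range(0, len(outer), 512)]
header_like = [i for i, block in enumerate(blocks) if looks_like_header(block)]
```

Block 0 is the genuine header of the outer archive, while block 1 is file data of the outer archive that happens to be the inner archive's header; seen in isolation, both pass the check, which is why a decoder needs an external index such as the lzip member boundaries.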