Merging upstream version 0.9.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-17 21:10:53 +01:00
parent 2ab7382c1c
commit f787962ed2
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
25 changed files with 1761 additions and 353 deletions

@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
This manual is for Tarlz (version 0.8, 16 December 2018).
This manual is for Tarlz (version 0.9, 22 January 2019).
* Menu:
@ -19,12 +19,13 @@ This manual is for Tarlz (version 0.8, 16 December 2018).
* Invoking tarlz:: Command line interface
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Multi-threaded tar:: Limitations of parallel tar decoding
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
Copyright (C) 2013-2018 Antonio Diaz Diaz.
Copyright (C) 2013-2019 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to
copy, distribute and modify it.
@ -35,12 +36,14 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
1 Introduction
**************
Tarlz is a small and simple implementation of the tar archiver. By
default tarlz creates, lists and extracts archives in a simplified
posix pax format compressed with lzip on a per file basis. Each tar
member is compressed in its own lzip member, as well as the end-of-file
blocks. This method is fully backward compatible with standard tar tools
like GNU tar, which treat the resulting multimember tar.lz archive like
Tarlz is a combined implementation of the tar archiver and the lzip
compressor. By default tarlz creates, lists and extracts archives in a
simplified posix pax format compressed with lzip on a per file basis.
Each tar member is compressed in its own lzip member, as well as the
end-of-file blocks. This method adds an indexed lzip layer on top of
the tar archive, making it possible to decode the archive safely in
parallel. The resulting multimember tar.lz archive is fully backward
compatible with standard tar tools like GNU tar, which treat it like
any other tar.lz archive. Tarlz can append files to the end of such
compressed archives.
@ -52,7 +55,7 @@ less efficient than compressing the whole tar archive, but it has the
following advantages:
* The resulting multimember tar.lz archive can be decompressed in
parallel with plzip, multiplying the decompression speed.
parallel, multiplying the decompression speed.
* New members can be appended to the archive (by removing the EOF
member) just like to an uncompressed tar archive.
@ -74,10 +77,6 @@ with standard tar tools. *Note crc32::.
Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
'star' or 'v7'.
Tarlz is intended as a showcase project for the maintainers of real
tar programs to evaluate the format and perhaps implement it in their
tools.

File: tarlz.info, Node: Invoking tarlz, Next: File format, Prev: Introduction, Up: Top
@ -141,6 +140,21 @@ archive 'foo'.
Use archive file ARCHIVE. '-' used as an ARCHIVE argument reads
from standard input or writes to standard output.
'-n N'
'--threads=N'
Set the number of decompression threads, overriding the system's
default. Valid values range from 0 to "as many as your system can
support". A value of 0 disables threads entirely. If this option
is not used, tarlz tries to detect the number of processors in the
system and use it as default value. 'tarlz --help' shows the
system's default value. This option currently only has effect when
listing the contents of a multimember compressed archive. *Note
Multi-threaded tar::.
Note that the number of usable threads is limited during
decompression to the number of lzip members in the tar.lz archive,
which you can find by running 'lzip -lv archive.tar.lz'.
'-q'
'--quiet'
Quiet operation. Suppress all messages.
@ -288,6 +302,11 @@ following sequence:
* Zero or more blocks that contain the contents of the file.
Each tar member must be contiguously stored in a lzip member for the
parallel decoding operations like '--list' to work. If any tar member
is split over two or more lzip members, the archive must be decoded
sequentially. *Note Multi-threaded tar::.
At the end of the archive file there are two 512-byte blocks filled
with binary zeros, interpreted as an end-of-archive indicator. These EOF
blocks are either compressed in a separate lzip member or compressed
@ -417,19 +436,12 @@ record is used to store the linkname.
The mode field provides 12 access permission bits. The following
table shows the symbolic name of each bit and its octal value:
Bit Name Bit value
S_ISUID 04000
S_ISGID 02000
S_ISVTX 01000
S_IRUSR 00400
S_IWUSR 00200
S_IXUSR 00100
S_IRGRP 00040
S_IWGRP 00020
S_IXGRP 00010
S_IROTH 00004
S_IWOTH 00002
S_IXOTH 00001
Bit Name  Value   Bit Name  Value   Bit Name  Value
---------------------------------------------------
S_ISUID   04000   S_ISGID   02000   S_ISVTX   01000
S_IRUSR   00400   S_IWUSR   00200   S_IXUSR   00100
S_IRGRP   00040   S_IWGRP   00020   S_IXGRP   00010
S_IROTH   00004   S_IWOTH   00002   S_IXOTH   00001
The uid and gid fields are the user and group ID of the owner and
group of the file, respectively.
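
The mode bits listed in the table can be decoded with a few lines of code. The following is only an illustrative sketch in Python, not code from tarlz; the name 'decode_mode' is hypothetical, and the octal values are copied from the table.

```python
# Symbolic names and octal values of the 12 mode bits, as listed in the table.
MODE_BITS = [
    ("S_ISUID", 0o4000), ("S_ISGID", 0o2000), ("S_ISVTX", 0o1000),
    ("S_IRUSR", 0o0400), ("S_IWUSR", 0o0200), ("S_IXUSR", 0o0100),
    ("S_IRGRP", 0o0040), ("S_IWGRP", 0o0020), ("S_IXGRP", 0o0010),
    ("S_IROTH", 0o0004), ("S_IWOTH", 0o0002), ("S_IXOTH", 0o0001),
]


def decode_mode(mode):
    """Return the names of the bits set in a mode value (hypothetical helper)."""
    return [name for name, bit in MODE_BITS if mode & bit]


# A set-user-ID executable with permissions rwxr-xr-x:
print(decode_mode(0o4755))
```

For the mode 04755 this prints S_ISUID followed by the seven read, write and execute bits that make up 0755.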
@ -485,12 +497,16 @@ file archived:
The magic field contains the ASCII null-terminated string "ustar".
The version field contains the characters "00" (0x30,0x30). The fields
uname, and gname are null-terminated character strings. Each numeric
field contains a leading zero-filled, null-terminated octal number using
digits from the ISO/IEC 646:1991 (ASCII) standard.
uname, and gname are null-terminated character strings except when all
characters in the array contain non-null characters including the last
character. Each numeric field contains a leading space- or zero-filled,
optionally null-terminated octal number using digits from the ISO/IEC
646:1991 (ASCII) standard. Tarlz is able to decode numeric fields 1
byte larger than standard ustar by not requiring a terminating null
character.
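
The relaxed rules for numeric fields can be pictured with a small parser. This is a minimal sketch in Python under stated assumptions, not the parser used by tarlz: the name 'parse_octal_field' is hypothetical, and treating an empty field as 0 is an assumption made only for this example.

```python
def parse_octal_field(field):
    """Parse a ustar numeric field given as bytes (hypothetical helper).

    Accepts leading spaces or zeros and an optional terminating null,
    so a field completely filled with digits is also valid.
    """
    # Keep what precedes the first null, then drop padding spaces.
    digits = field.split(b"\0", 1)[0].strip(b" ")
    if not digits:
        return 0  # assumption: an empty field is read as zero
    if any(c not in b"01234567" for c in digits):
        raise ValueError("non-octal character in numeric field")
    return int(digits.decode("ascii"), 8)


print(parse_octal_field(b"0000644\0"))  # zero-filled, null-terminated
print(parse_octal_field(b"    644\0"))  # space-filled, null-terminated
print(parse_octal_field(b"77777777"))   # 8 digits, no terminating null
```

The last call shows the extra digit gained in an 8-byte field when the terminating null is not required.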

File: tarlz.info, Node: Amendments to pax format, Next: Examples, Prev: File format, Up: Top
File: tarlz.info, Node: Amendments to pax format, Next: Multi-threaded tar, Prev: File format, Up: Top
4 The reasons for the differences with pax
******************************************
@ -508,7 +524,7 @@ and the concrete reasons to implement them.
The posix pax format has a serious flaw. The metadata stored in pax
extended records are not protected by any kind of check sequence.
Corruption in a long filename may cause the extraction of the file in
the wrong place without warning. Corruption in a long file size may
the wrong place without warning. Corruption in a large file size may
cause the truncation of the file or the appending of garbage to the
file, both followed by a spurious warning about a corrupt header far
from the place of the undetected corruption.
@ -573,9 +589,57 @@ prevents accidental double UTF-8 conversions. If the need arises this
behavior will be adjusted with a command line option in the future.

File: tarlz.info, Node: Examples, Next: Problems, Prev: Amendments to pax format, Up: Top
File: tarlz.info, Node: Multi-threaded tar, Next: Examples, Prev: Amendments to pax format, Up: Top
5 A small tutorial with examples
5 Limitations of parallel tar decoding
**************************************
Safely decoding an arbitrary tar archive in parallel is impossible. For
example, if a tar archive containing another tar archive is decoded
starting from some position other than the beginning, there is no way
to know if the first header found there belongs to the outer tar
archive or to the inner tar archive. Tar is an inherently serial format;
it was designed for tapes.
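
The ambiguity described above can be reproduced with a short experiment. The following sketch uses Python's standard 'tarfile' module and is not related to the tarlz sources; the helper names 'make_tar' and 'header_names' are hypothetical. It builds a tar archive containing another tar archive and then scans it for 512-byte blocks that look like valid ustar headers.

```python
import io
import tarfile


def make_tar(members):
    """Build an uncompressed ustar archive in memory from {name: data}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def header_names(archive):
    """Names found in every block with the ustar magic and a valid checksum."""
    names = []
    for offset in range(0, len(archive) - 511, 512):
        block = archive[offset:offset + 512]
        if block[257:262] != b"ustar":
            continue
        stored = int((block[148:156].split(b"\0")[0].strip() or b"0").decode(), 8)
        # The checksum is computed with the chksum field taken as 8 spaces.
        computed = sum(block[:148]) + 8 * 32 + sum(block[156:])
        if stored == computed:
            names.append(block[:100].split(b"\0", 1)[0].decode())
    return names


inner = make_tar({"inner.txt": b"hello"})
outer = make_tar({"outer.txt": b"world", "inner.tar": inner})

print(header_names(outer))         # ['outer.txt', 'inner.tar', 'inner.txt']
print(header_names(outer[1536:]))  # ['inner.txt']
```

A scan that starts after the header of 'inner.tar' finds the header of 'inner.txt' first, and nothing in that block tells whether it belongs to the outer or to the inner archive.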
In the case of compressed tar archives, the start of each compressed
block determines one point through which the tar archive can be decoded
in parallel. Therefore, in tar.lz archives the decoding operations
can't be parallelized if the tar members are not aligned with the lzip
members. Tar archives compressed with plzip can't be decoded in
parallel because tar and plzip do not have a way to align both sets of
members. Certainly one can decompress one such archive with a
multi-threaded tool like plzip, but the increase in speed is not as
large as it could be because plzip must serialize the decompressed data
and pass them to tar, which decodes them sequentially, one tar member
at a time.
On the other hand, if the tar.lz archive is created with a tool like
tarlz, which can guarantee the alignment between tar members and lzip
members because it controls both archiving and compression, then the
lzip format becomes an indexed layer on top of the tar archive, which
makes it possible to decode it safely in parallel.
Tarlz is able to automatically decode aligned and unaligned
multimember tar.lz archives, keeping backwards compatibility. If tarlz
finds a member misalignment during multi-threaded decoding, it switches
to single-threaded mode and continues decoding the archive. Currently
only the '--list' option is able to do multi-threaded decoding.
If the files in the archive are large, multi-threaded '--list' on a
regular tar.lz archive can be hundreds of times faster than sequential
'--list' because, in addition to using several processors, it only
needs to decompress part of each lzip member. See the following example
listing the Silesia corpus on a dual core machine:
tarlz -9 -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)

File: tarlz.info, Node: Examples, Next: Problems, Prev: Multi-threaded tar, Up: Top
6 A small tutorial with examples
********************************
Example 1: Create a multimember compressed archive 'archive.tar.lz'
@ -633,7 +697,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory

File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
6 Reporting bugs
7 Reporting bugs
****************
There are probably bugs in tarlz. There are certainly errors and
@ -670,16 +734,17 @@ Concept index

Tag Table:
Node: Top223
Node: Introduction946
Node: Invoking tarlz3084
Node: File format9606
Ref: key_crc3214138
Node: Amendments to pax format19215
Ref: crc3219729
Ref: flawed-compat20753
Node: Examples23126
Node: Problems24802
Node: Concept index25328
Node: Introduction1012
Node: Invoking tarlz3124
Node: File format10384
Ref: key_crc3215169
Node: Amendments to pax format20586
Ref: crc3221110
Ref: flawed-compat22135
Node: Multi-threaded tar24508
Node: Examples27012
Node: Problems28682
Node: Concept index29208

End Tag Table