Adding upstream version 0.9.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-17 21:10:27 +01:00
parent 9bbbd387b8
commit 7cf0407517
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
25 changed files with 1761 additions and 353 deletions


@ -6,8 +6,8 @@
@finalout
@c %**end of header
@set UPDATED 22 January 2019
@set VERSION 0.9
@dircategory Data Compression
@direntry
@ -39,13 +39,14 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
* Invoking tarlz:: Command line interface
* File format:: Detailed format of the compressed archive
* Amendments to pax format:: The reasons for the differences with pax
* Multi-threaded tar:: Limitations of parallel tar decoding
* Examples:: A small tutorial with examples
* Problems:: Reporting bugs
* Concept index:: Index of concepts
@end menu
@sp 1
Copyright @copyright{} 2013-2019 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission
to copy, distribute and modify it.
@ -55,18 +56,20 @@ to copy, distribute and modify it.
@chapter Introduction
@cindex introduction
@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a combined
implementation of the tar archiver and the
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. By default
tarlz creates, lists and extracts archives in a simplified posix pax format
compressed with lzip on a per-file basis. Each tar member is compressed in
its own lzip member, and so are the end-of-file blocks. This method adds an
indexed lzip layer on top of the tar archive, making it possible to decode
the archive safely in parallel. The resulting multimember tar.lz archive is
fully backward compatible with standard tar tools like GNU tar, which treat
it like any other tar.lz archive. Tarlz can append files to the end of such
compressed archives.
Tarlz can create tar archives with four levels of compression granularity:
per file, per directory, appendable solid, and solid.
@noindent
Of course, compressing each file (or each directory) individually is
@ -76,7 +79,7 @@ following advantages:
@itemize @bullet
@item
The resulting multimember tar.lz archive can be decompressed in
parallel, multiplying the decompression speed.
@item
New members can be appended to the archive (by removing the EOF
@ -102,9 +105,6 @@ standard tar tools. @xref{crc32}.
Tarlz does not understand other tar formats like @samp{gnu}, @samp{oldgnu},
@samp{star} or @samp{v7}.
Tarlz is intended as a showcase project for the maintainers of real tar
programs to evaluate the format and perhaps implement it in their tools.
@node Invoking tarlz
@chapter Invoking tarlz
@ -174,6 +174,20 @@ previous @code{-C} option.
Use archive file @var{archive}. @samp{-} used as an @var{archive}
argument reads from standard input or writes to standard output.
@item -n @var{n}
@itemx --threads=@var{n}
Set the number of decompression threads, overriding the system's default.
Valid values range from 0 to "as many as your system can support". A value
of 0 disables threads entirely. If this option is not used, tarlz tries to
detect the number of processors in the system and use it as the default
value. @w{@samp{tarlz --help}} shows the system's default value. Currently
this option has an effect only when listing the contents of a multimember
compressed archive. @xref{Multi-threaded tar}.
Note that the number of usable threads is limited during decompression to
the number of lzip members in the tar.lz archive, which you can find by
running @w{@code{lzip -lv archive.tar.lz}}.
@item -q
@itemx --quiet
Quiet operation. Suppress all messages.
@ -335,6 +349,11 @@ associated fields in this header block for this file.
Zero or more blocks that contain the contents of the file.
@end itemize
For parallel decoding operations like @code{--list} to work, each tar
member must be stored contiguously within a single lzip member. If any tar
member is split over two or more lzip members, the archive must be decoded
sequentially. @xref{Multi-threaded tar}.
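The alignment requirement amounts to a simple interval check. The Python sketch below makes two assumptions. First, the uncompressed size of every lzip member is already known, along with the offset and total size (header plus padded data) of every tar member. A real tool could obtain these from the lzip member trailers and the tar headers. Second, the function name @code{is_aligned} is hypothetical and not part of tarlz.

```python
from bisect import bisect_right
from itertools import accumulate


def is_aligned(lzip_sizes, tar_members):
    """Return True if no tar member is split across two lzip members.

    lzip_sizes: uncompressed size of each lzip member, in archive order.
    tar_members: (offset, size) of each tar member in the uncompressed
    tar stream, where size covers the header and the padded data blocks.
    (Hypothetical helper for illustration only.)
    """
    # Uncompressed offset at which each lzip member starts, plus the end.
    bounds = [0] + list(accumulate(lzip_sizes))
    for offset, size in tar_members:
        # Index of the lzip member holding the first byte of this tar member.
        i = bisect_right(bounds, offset) - 1
        if i >= len(lzip_sizes) or offset + size > bounds[i + 1]:
            return False  # starts past the end, or spills into the next one
    return True


# Two tar members of 1024 and 1536 bytes, followed by 1024 bytes of EOF blocks.
members = [(0, 1024), (1024, 1536)]
print(is_aligned([1024, 1536, 1024], members))  # True: one lzip member each
print(is_aligned([2048, 1536], members))        # False: second one is split
```

Several whole tar members may share one lzip member (as in per-directory or solid archives) and still pass this check. Only a member that crosses an lzip member boundary fails it.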
At the end of the archive file there are two 512-byte blocks filled with
binary zeros, interpreted as an end-of-archive indicator. These EOF
blocks are either compressed in a separate lzip member or compressed
@ -481,20 +500,12 @@ is used to store the linkname.
The mode field provides 12 access permission bits. The following table
shows the symbolic name of each bit and its octal value:
@multitable {Bit Name} {Value} {Bit Name} {Value} {Bit Name} {Value}
@headitem Bit Name @tab Value @tab Bit Name @tab Value @tab Bit Name @tab Value
@item S_ISUID @tab 04000 @tab S_ISGID @tab 02000 @tab S_ISVTX @tab 01000
@item S_IRUSR @tab 00400 @tab S_IWUSR @tab 00200 @tab S_IXUSR @tab 00100
@item S_IRGRP @tab 00040 @tab S_IWGRP @tab 00020 @tab S_IXGRP @tab 00010
@item S_IROTH @tab 00004 @tab S_IWOTH @tab 00002 @tab S_IXOTH @tab 00001
@end multitable
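Here is a sketch of how a reader might use the table above to decode the mode field. The following Python function is a hypothetical helper, not part of tarlz. It takes the value parsed from the octal text of the field and returns the symbolic names of the bits that are set. The constants come from Python's @code{stat} module, which uses the same octal values as the table.

```python
import stat

# Symbolic names in the order of the table above.
NAMES = ["S_ISUID", "S_ISGID", "S_ISVTX",
         "S_IRUSR", "S_IWUSR", "S_IXUSR",
         "S_IRGRP", "S_IWGRP", "S_IXGRP",
         "S_IROTH", "S_IWOTH", "S_IXOTH"]
MODE_BITS = [(name, getattr(stat, name)) for name in NAMES]


def mode_bit_names(mode):
    """Return the names of the permission bits set in a mode value (hypothetical helper)."""
    return [name for name, value in MODE_BITS if mode & value]


# The mode field stores the value as octal text, for example "0004750".
print(mode_bit_names(int("0004750", 8)))
# ['S_ISUID', 'S_IRUSR', 'S_IWUSR', 'S_IXUSR', 'S_IRGRP', 'S_IXGRP']
```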
The uid and gid fields are the user and group ID of the owner and group
@ -551,10 +562,13 @@ regular file (type 0).
@end table
The magic field contains the ASCII null-terminated string "ustar". The
version field contains the characters "00" (0x30,0x30). The fields uname
and gname are null-terminated character strings, except when every byte of
the array, including the last one, is non-null. Each numeric field contains
a leading space- or zero-filled, optionally null-terminated octal number
using digits from the ISO/IEC 646:1991 (ASCII) standard. Because tarlz does
not require the terminating null character, it can decode numeric fields
that use every byte for digits, one digit more than standard ustar allows.
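Below is a minimal Python sketch of a lenient decoder for the fields just described. The function names @code{parse_octal} and @code{parse_name} are hypothetical, and only plain octal numbers are handled. The examples use the standard ustar field widths: 12 bytes for the size field and 32 bytes for uname.

```python
def parse_octal(field):
    """Decode a ustar numeric field (bytes) into an int (hypothetical helper).

    Leading spaces or zeros are accepted and the terminating null is
    optional, so every byte of the field may hold an octal digit.
    """
    i = 0
    while i < len(field) and field[i:i + 1] == b" ":
        i += 1  # skip leading space fill
    start = i
    while i < len(field) and field[i:i + 1] in b"01234567":
        i += 1
    # Whatever follows the digits may only be nulls or spaces.
    if i == start or field[i:].strip(b"\0 "):
        raise ValueError("invalid numeric field: %r" % field)
    return int(field[start:i], 8)


def parse_name(field):
    """Decode uname or gname: up to the first null, or the whole field."""
    end = field.find(b"\0")
    if end < 0:
        end = len(field)  # every byte, including the last, is non-null
    return field[:end].decode("utf-8", "replace")


print(parse_octal(b"0000644\0"))          # 420
print(parse_octal(b"777777777777"))       # 68719476735 (12 digits, no null)
print(parse_name(b"daniel" + bytes(26)))  # daniel
```

With a terminating null, the 12-byte size field can hold at most 11 octal digits. Accepting a twelfth digit raises the largest representable size from 8**11 - 1 to 8**12 - 1 bytes.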
@node Amendments to pax format
@ -574,7 +588,7 @@ concrete reasons to implement them.
The posix pax format has a serious flaw. The metadata stored in pax extended
records are not protected by any kind of check sequence. Corruption in a
long filename may cause the extraction of the file in the wrong place
without warning. Corruption in a large file size may cause the truncation of
the file or the appending of garbage to the file, both followed by a
spurious warning about a corrupt header far from the place of the undetected
corruption.
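The following Python sketch makes the flaw concrete. It builds pax extended records in the POSIX layout: a decimal length that counts the whole record including its own digits, a space, @samp{keyword=value}, and a newline. It then changes one digit of a large file size and parses the result. The record is still well-formed, so a reader silently accepts the wrong size. Only a check sequence computed over the records reveals the change. Here that check is @code{zlib.crc32}, chosen purely for illustration; it does not reproduce the check sequence used by tarlz. The function names are hypothetical.

```python
import zlib


def pax_record(keyword, value):
    """Build one pax extended record as bytes (hypothetical helper).

    The leading length counts the whole record, including its own
    digits, so it is recomputed until the number of digits settles.
    """
    payload = (" %s=%s\n" % (keyword, value)).encode("utf-8")
    length = len(payload)
    while len(str(length)) + len(payload) != length:
        length = len(str(length)) + len(payload)
    return b"%d" % length + payload


def parse_records(buf):
    """Split pax records, checking each length; return (keyword, value) pairs."""
    pairs = []
    pos = 0
    while pos < len(buf):
        space = buf.index(b" ", pos)
        length = int(buf[pos:space])
        record = buf[pos:pos + length]
        if len(record) != length or not record.endswith(b"\n"):
            raise ValueError("malformed pax record")
        keyword, _, value = record[space - pos + 1:-1].partition(b"=")
        pairs.append((keyword.decode("utf-8"), value.decode("utf-8")))
        pos += length
    return pairs


# Records for a 10 GiB file, plus a checksum taken when they were written.
records = pax_record("path", "dir/file.bin") + pax_record("size", "10737418240")
checksum = zlib.crc32(records)

# Corrupt one digit of the size: 10737418240 becomes 70737418240.
corrupt = records.replace(b"size=1", b"size=7")
print(parse_records(corrupt))           # parses without error, wrong size
print(zlib.crc32(corrupt) == checksum)  # False: the checksum catches it
```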
@ -636,6 +650,52 @@ double UTF-8 conversions. If the need arises this behavior will be adjusted
with a command line option in the future.
@node Multi-threaded tar
@chapter Limitations of parallel tar decoding
Safely decoding an arbitrary tar archive in parallel is impossible. For
example, if a tar archive containing another tar archive is decoded starting
from some position other than the beginning, there is no way to know whether
the first header found there belongs to the outer tar archive or to the
inner tar archive. Tar is an inherently serial format; it was designed for
tapes.
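The ambiguity is easy to reproduce. The sketch below uses Python's @code{tarfile} module, not tarlz, and its helper names are hypothetical. It stores one uncompressed tar archive as a member of another. It then tries to parse a header at every 512-byte boundary, as a decoder that starts at an arbitrary position would have to.

```python
import io
import tarfile


def make_tar(members):
    """Return an uncompressed ustar archive holding (name, data) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def headers_at_block_boundaries(archive):
    """Return (offset, name) for every 512-byte block that parses as a header."""
    found = []
    for offset in range(0, len(archive), tarfile.BLOCKSIZE):
        block = archive[offset:offset + tarfile.BLOCKSIZE]
        try:
            info = tarfile.TarInfo.frombuf(block, "utf-8", "surrogateescape")
        except tarfile.HeaderError:
            continue  # file data, padding or end-of-archive block
        found.append((offset, info.name))
    return found


inner = make_tar([("secret.txt", b"hello\n")])
outer = make_tar([("inner.tar", inner)])
print(headers_at_block_boundaries(outer))
# [(0, 'inner.tar'), (512, 'secret.txt')]
```

The header at offset 512 belongs to the inner archive, yet it is a perfectly valid header. Only a decoder that started at offset 0 knows that those bytes are the data of @samp{inner.tar}.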
In the case of compressed tar archives, the start of each compressed block
is a possible starting point for decoding the tar archive in parallel.
Therefore, in tar.lz archives the decoding operations can't be parallelized
if the tar members are not aligned with the lzip members. Tar archives
compressed with plzip can't be decoded in parallel because tar and plzip
have no way to align both sets of members. One can certainly decompress such
an archive with a multi-threaded tool like plzip, but the speedup is smaller
than it could be because plzip must serialize the decompressed data and pass
it to tar, which decodes it sequentially, one tar member at a time.
On the other hand, if the tar.lz archive is created with a tool like tarlz,
which can guarantee the alignment between tar members and lzip members
because it controls both archiving and compression, then the lzip format
becomes an indexed layer on top of the tar archive, which makes it possible
to decode the archive safely in parallel.
Tarlz automatically decodes both aligned and unaligned multimember tar.lz
archives, preserving backward compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
mode and continues decoding the archive. Currently only the @code{--list}
option can decode an archive using multiple threads.
If the files in the archive are large, multi-threaded @code{--list} on a
regular tar.lz archive can be hundreds of times faster than sequential
@code{--list} because, in addition to using several processors, it needs to
decompress only part of each lzip member. See the following example, which
lists the Silesia corpus on a dual-core machine:
@example
tarlz -9 -cf silesia.tar.lz silesia
time lzip -cd silesia.tar.lz | tar -tf - (5.032s)
time plzip -cd silesia.tar.lz | tar -tf - (3.256s)
time tarlz -tf silesia.tar.lz (0.020s)
@end example
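The decoding strategy described in this chapter can be modeled with a short Python sketch using @code{concurrent.futures} and @code{tarfile}. It lists members in parallel, one chunk per worker, and falls back to sequential decoding on misalignment. It also caps the number of threads at the number of chunks and treats 0 threads as "decode sequentially". To stay self-contained, it represents each lzip member by its already decompressed tar data; a real decoder would decompress each member inside its worker. All names are hypothetical, and this is not tarlz's implementation.

```python
import tarfile
from concurrent.futures import ThreadPoolExecutor

BLOCK = tarfile.BLOCKSIZE  # 512
ZERO_BLOCK = bytes(BLOCK)


class Misaligned(Exception):
    """Raised when a chunk does not hold only whole tar members."""


def list_chunk(chunk):
    """Return the member names in chunk, which must start at a tar header."""
    names = []
    pos = 0
    while pos < len(chunk):
        block = chunk[pos:pos + BLOCK]
        if block == ZERO_BLOCK:
            pos += BLOCK  # treated as an end-of-archive block
            continue
        try:
            info = tarfile.TarInfo.frombuf(block, "utf-8", "surrogateescape")
        except tarfile.HeaderError:
            raise Misaligned("chunk does not start at a tar header") from None
        names.append(info.name)
        # Skip the header and the data, padded to a whole number of blocks.
        pos += BLOCK + (info.size + BLOCK - 1) // BLOCK * BLOCK
        if pos > len(chunk):
            raise Misaligned("tar member continues in the next chunk")
    return names


def list_archive(chunks, threads=2):
    """List members chunk by chunk in parallel, else in one sequential pass."""
    if threads == 0:  # threads disabled: decode sequentially
        return list_chunk(b"".join(chunks))
    # Threads beyond the number of chunks would have nothing to do.
    workers = max(1, min(threads, len(chunks)))
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return [n for names in pool.map(list_chunk, chunks) for n in names]
    except Misaligned:
        # Simplification: restart a sequential pass over the whole stream.
        return list_chunk(b"".join(chunks))


def make_member(name, data):
    """Return one ustar member (header plus padded data) as bytes."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(format=tarfile.USTAR_FORMAT)
    return header + data + bytes(-len(data) % BLOCK)


m1 = make_member("a.txt", b"x" * 700)
m2 = make_member("b.txt", b"y" * 10)
eof = bytes(2 * BLOCK)
print(list_archive([m1, m2, eof]))                        # ['a.txt', 'b.txt']
print(list_archive([m1 + m2[:BLOCK], m2[BLOCK:] + eof]))  # ['a.txt', 'b.txt']
```

The second call splits @samp{b.txt} across two chunks, so the parallel pass raises @code{Misaligned} and the sketch falls back to a sequential pass. Unlike tarlz, which continues decoding in single-threaded mode, this sketch simply restarts from the beginning. It also accepts any all-zero block as end-of-archive padding, which a careful decoder would check more strictly.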
@node Examples
@chapter A small tutorial with examples
@cindex examples