
Adding upstream version 0.19.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-17 21:15:31 +01:00
parent 739f200278
commit 7bf1f2e322
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
28 changed files with 926 additions and 616 deletions


@@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
Tarlz Manual
************
This manual is for Tarlz (version 0.17, 30 July 2020).
This manual is for Tarlz (version 0.19, 8 January 2021).
* Menu:
@@ -28,10 +28,10 @@ This manual is for Tarlz (version 0.17, 30 July 2020).
* Concept index:: Index of concepts
Copyright (C) 2013-2020 Antonio Diaz Diaz.
Copyright (C) 2013-2021 Antonio Diaz Diaz.
This manual is free documentation: you have unlimited permission to
copy, distribute, and modify it.
This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.

File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: Top
@@ -40,13 +40,15 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
**************
Tarlz is a massively parallel (multi-threaded) combined implementation of
the tar archiver and the lzip compressor. Tarlz creates, lists and extracts
archives in a simplified and safer variant of the POSIX pax format
compressed with lzip, keeping the alignment between tar members and lzip
members. The resulting multimember tar.lz archive is fully backward
compatible with standard tar tools like GNU tar, which treat it like any
other tar.lz archive. Tarlz can append files to the end of such compressed
archives.
the tar archiver and the lzip compressor. Tarlz uses the compression
library lzlib.
Tarlz creates tar archives using a simplified and safer variant of the
POSIX pax format compressed in lzip format, keeping the alignment between
tar members and lzip members. The resulting multimember tar.lz archive is
fully backward compatible with standard tar tools like GNU tar, which treat
it like any other tar.lz archive. Tarlz can append files to the end of such
compressed archives.
Keeping the alignment between tar members and lzip members has two
advantages. It adds an indexed lzip layer on top of the tar archive, making
@@ -56,7 +58,7 @@ plzip may even double the amount of files lost for each lzip member damaged
because it does not keep the members aligned.
Tarlz can create tar archives with five levels of compression
granularity; per file (--no-solid), per block (--bsolid, default), per
granularity: per file (--no-solid), per block (--bsolid, default), per
directory (--dsolid), appendable solid (--asolid), and solid (--solid). It
can also create uncompressed tar archives.
@@ -79,8 +81,8 @@ archive, but it has the following advantages:
lziprecover can be used to recover some of the damaged members.
* A multimember tar.lz archive is usually smaller than the corresponding
solidly compressed tar.gz archive, except when compressing files
smaller than about 32 KiB individually.
solidly compressed tar.gz archive, except when individually
compressing files smaller than about 32 KiB.
Tarlz protects the extended records with a Cyclic Redundancy Check (CRC)
in a way compatible with standard tar tools. *Note crc32::.
@@ -240,8 +242,7 @@ to '-1 --solid'
not used, tarlz tries to detect the number of processors in the system
and use it as default value. 'tarlz --help' shows the system's default
value. See the note about multi-threaded archive creation in the
option '-C' above. Multi-threaded extraction of files from an archive
is not yet implemented. *Note Multi-threaded decoding::.
option '-C' above.
Note that the number of usable threads is limited during compression to
ceil( uncompressed_size / data_size ) (*note Minimum archive sizes::),
@@ -281,7 +282,8 @@ to '-1 --solid'
'-v'
'--verbose'
Verbosely list files processed.
Verbosely list files processed. Further -v's (up to 4) increase the
verbosity level.
'-x'
'--extract'
@@ -376,7 +378,8 @@ to '-1 --solid'
Don't delete partially extracted files. If a decompression error
happens while extracting a file, keep the partial data extracted. Use
this option to recover as much data as possible from each damaged
member.
member. It is recommended to run tarlz in single-threaded mode
(--threads=0) when using this option.
'--missing-crc'
Exit with error status 2 if the CRC of the extended records is missing.
@@ -396,6 +399,15 @@ to '-1 --solid'
more memory. Valid values range from 1 to 1024. The default value is
64.
'--check-lib'
Compare the version of lzlib used to compile tarlz with the version
actually being used and exit. Report any differences found. Exit with
error status 1 if differences are found. A mismatch may indicate that
lzlib is not correctly installed or that a different version of lzlib
has been installed after compiling tarlz. 'tarlz -v --check-lib' shows
the version of lzlib being used and the value of 'LZ_API_VERSION' (if
defined). *Note Library version: (lzlib)Library version.
Exit status: 0 for a normal exit, 1 for environmental problems (file not
found, files differ, invalid flags, I/O errors, etc), 2 to indicate a
@@ -546,6 +558,10 @@ space, equal-sign, and newline.
the swapping of two bytes.
At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
extended header keyword found in an archive, once per keyword.
4.2 Ustar header block
======================
@@ -770,11 +786,12 @@ interesting parts described here are those related to Multi-threaded
processing.
The structure of the part of tarlz performing Multi-threaded archive
creation is somewhat similar to that of plzip with the added complication of
the solidity levels. A grouper thread and several worker threads are
created, acting the main thread as muxer (multiplexer) thread. A "packet
courier" takes care of data transfers among threads and limits the maximum
number of data blocks (packets) being processed simultaneously.
creation is somewhat similar to that of plzip with the added complication
of the solidity levels. *Note Program design: (plzip)Program design. A
grouper thread and several worker threads are created, acting the main
thread as muxer (multiplexer) thread. A "packet courier" takes care of data
transfers among threads and limits the maximum number of data blocks
(packets) being processed simultaneously.
The grouper traverses the directory tree, groups together the metadata of
the files to be archived in each lzip member, and distributes them to the
@@ -805,8 +822,7 @@ the archive.
,--------,
| file |<---> data to/from each worker below
| system |
`--------'
,------------,
`--------' ,------------,
,-->| worker 0 |--,
| `------------' |
,---------, | ,------------, | ,-------, ,--------,
@@ -870,8 +886,7 @@ possible decoding it safely in parallel.
Tarlz is able to automatically decode aligned and unaligned multimember
tar.lz archives, keeping backwards compatibility. If tarlz finds a member
misalignment during multi-threaded decoding, it switches to single-threaded
mode and continues decoding the archive. Currently only the options
'--diff' and '--list' are able to do multi-threaded decoding.
mode and continues decoding the archive.
If the files in the archive are large, multi-threaded '--list' on a
regular (seekable) tar.lz archive can be hundreds of times faster than
@@ -886,7 +901,33 @@ example listing the Silesia corpus on a dual core machine:
On the other hand, multi-threaded '--list' won't detect corruption in
the tar member data because it only decodes the part of each lzip member
corresponding to the tar member header.
corresponding to the tar member header. This is another reason why the tar
headers must provide their own integrity checking.
7.1 Limitations of multi-threaded extraction
============================================
Multi-threaded extraction may produce different output than single-threaded
extraction in some cases:
During multi-threaded extraction, several independent processes are
simultaneously reading the archive and creating files in the file system.
The archive is not read sequentially. As a consequence, any error or
weirdness in the archive (like a corrupt member or an EOF block in the
middle of the archive) usually won't be detected until part of the archive
beyond that point has been processed.
If the archive contains two or more tar members with the same name,
single-threaded extraction extracts the members in the order they appear in
the archive and leaves in the file system the last version of the file. But
multi-threaded extraction may extract the members in any order and leave in
the file system any version of the file nondeterministically. It is
unspecified which of the tar members is extracted.
If the same file is extracted through several paths (different member
names resolve to the same file in the file system), the result is undefined.
(Probably the resulting file will be mangled).
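
The duplicate-name and aliased-path cases described above can be checked
for before choosing multi-threaded extraction. The following is only a
minimal sketch, not part of tarlz: it uses Python's standard 'tarfile'
module, which does not read lzip, so it assumes the archive has already
been decompressed to a plain tar (for example with 'lzip -d'). The helper
names 'duplicate_member_names' and 'build_demo_tar' are hypothetical, and
the check is purely lexical: it catches names such as 'a.txt' and './a.txt',
but not paths that resolve to the same file through symbolic links.

```python
import io
import posixpath
import tarfile
from collections import Counter


def duplicate_member_names(archive):
    """Return normalized member names that occur more than once.

    Names are normalized lexically (e.g. './a.txt' -> 'a.txt'), so this
    does not detect aliases created through symbolic links.
    """
    counts = Counter(
        posixpath.normpath(member.name) for member in archive.getmembers()
    )
    # Counter keeps insertion order, so results are in first-seen order.
    return [name for name, n in counts.items() if n > 1]


def build_demo_tar(names):
    """Build a small uncompressed tar in memory (illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i, name in enumerate(names):
            data = f"version {i}\n".encode()
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    demo = build_demo_tar(["a.txt", "b.txt", "./a.txt"])
    with tarfile.open(fileobj=demo, mode="r") as tar:
        print(duplicate_member_names(tar))
```

If the returned list is not empty, single-threaded extraction is the safer
choice, since only it guarantees that the last version in the archive is
the one left in the file system.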

File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
@@ -1028,22 +1069,22 @@ Concept index

Tag Table:
Node: Top223
Node: Introduction1212
Node: Invoking tarlz3982
Ref: --data-size6193
Ref: --bsolid14608
Node: Portable character set18244
Node: File format18887
Ref: key_crc3223812
Node: Amendments to pax format29271
Ref: crc3229935
Ref: flawed-compat31220
Node: Program design33865
Node: Multi-threaded decoding37756
Node: Minimum archive sizes40492
Node: Examples42630
Node: Problems44345
Node: Concept index44873
Node: Introduction1214
Node: Invoking tarlz4022
Ref: --data-size6233
Ref: --bsolid14593
Node: Portable character set18852
Node: File format19495
Ref: key_crc3224420
Node: Amendments to pax format30021
Ref: crc3230685
Ref: flawed-compat31970
Node: Program design34615
Node: Multi-threaded decoding38540
Node: Minimum archive sizes42482
Node: Examples44620
Node: Problems46335
Node: Concept index46863

End Tag Table