Adding upstream version 0.19.

Signed-off-by: Daniel Baumann <daniel@debian.org>
2025-02-17 21:15:31 +01:00 · 2025-02-17 21:15:31 +01:00 · 7bf1f2e322
commit 7bf1f2e322
parent 739f200278
28 changed files with 926 additions and 616 deletions
--- a/doc/tarlz.texi
+++ b/doc/tarlz.texi
@@ -6,8 +6,8 @@
 @finalout
 @c %**end of header

-@set UPDATED 30 July 2020
-@set VERSION 0.17
+@set UPDATED 8 January 2021
+@set VERSION 0.19

 @dircategory Data Compression
 @direntry
@@ -29,6 +29,7 @@
 @contents
 @end ifnothtml

+@ifnottex
 @node Top
 @top

@@ -49,10 +50,11 @@ This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
 @end menu

 @sp 1
-Copyright @copyright{} 2013-2020 Antonio Diaz Diaz.
+Copyright @copyright{} 2013-2021 Antonio Diaz Diaz.

-This manual is free documentation: you have unlimited permission
-to copy, distribute, and modify it.
+This manual is free documentation: you have unlimited permission to copy,
+distribute, and modify it.
+@end ifnottex


 @node Introduction
@@ -61,13 +63,15 @@ to copy, distribute, and modify it.

 @uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a massively parallel
 (multi-threaded) combined implementation of the tar archiver and the
-@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz creates,
-lists and extracts archives in a simplified and safer variant of the POSIX
-pax format compressed with lzip, keeping the alignment between tar members
-and lzip members. The resulting multimember tar.lz archive is fully backward
-compatible with standard tar tools like GNU tar, which treat it like any
-other tar.lz archive. Tarlz can append files to the end of such compressed
-archives.
+@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} compressor. Tarlz uses the
+compression library @uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
+
+Tarlz creates tar archives using a simplified and safer variant of the POSIX
+pax format compressed in lzip format, keeping the alignment between tar
+members and lzip members. The resulting multimember tar.lz archive is fully
+backward compatible with standard tar tools like GNU tar, which treat it
+like any other tar.lz archive. Tarlz can append files to the end of such
+compressed archives.

 Keeping the alignment between tar members and lzip members has two
 advantages. It adds an indexed lzip layer on top of the tar archive, making
@@ -76,7 +80,7 @@ amount of data lost in case of corruption. Compressing a tar archive with
 plzip may even double the amount of files lost for each lzip member damaged
 because it does not keep the members aligned.

-Tarlz can create tar archives with five levels of compression granularity;
+Tarlz can create tar archives with five levels of compression granularity:
 per file (---no-solid), per block (---bsolid, default), per directory
 (---dsolid), appendable solid (---asolid), and solid (---solid). It can also
 create uncompressed tar archives.
@@ -97,17 +101,17 @@ member), and unwanted members can be deleted from the archive. Just
 like an uncompressed tar archive.

 @item
-It is a safe POSIX-style backup format. In case of corruption,
-tarlz can extract all the undamaged members from the tar.lz
-archive, skipping over the damaged members, just like the standard
-(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be
-used to recover as much data as possible from each damaged member,
-and lziprecover can be used to recover some of the damaged members.
+It is a safe POSIX-style backup format. In case of corruption, tarlz
+can extract all the undamaged members from the tar.lz archive,
+skipping over the damaged members, just like the standard
+(uncompressed) tar. Moreover, the option @samp{--keep-damaged} can be used
+to recover as much data as possible from each damaged member, and
+lziprecover can be used to recover some of the damaged members.

 @item
-A multimember tar.lz archive is usually smaller than the
-corresponding solidly compressed tar.gz archive, except when
-compressing files smaller than about 32 KiB individually.
+A multimember tar.lz archive is usually smaller than the corresponding
+solidly compressed tar.gz archive, except when individually
+compressing files smaller than about 32 KiB.
 @end itemize

 Tarlz protects the extended records with a Cyclic Redundancy Check (CRC) in
@@ -275,8 +279,6 @@ of 0 disables threads entirely. If this option is not used, tarlz tries to
 detect the number of processors in the system and use it as default value.
 @w{@samp{tarlz --help}} shows the system's default value. See the note about
 multi-threaded archive creation in the option @samp{-C} above.
-Multi-threaded extraction of files from an archive is not yet implemented.
-@xref{Multi-threaded decoding}.

 Note that the number of usable threads is limited during compression to
 @w{ceil( uncompressed_size / data_size )} (@pxref{Minimum archive sizes}),
@@ -316,7 +318,8 @@ List the contents of an archive. If @var{files} are given, list only the

 @item -v
 @itemx --verbose
-Verbosely list files processed.
+Verbosely list files processed. Further -v's (up to 4) increase the
+verbosity level.

 @item -x
 @itemx --extract
@@ -409,8 +412,9 @@ decimal numeric group ID.

 @item --keep-damaged
 Don't delete partially extracted files. If a decompression error happens
-while extracting a file, keep the partial data extracted. Use this
-option to recover as much data as possible from each damaged member.
+while extracting a file, keep the partial data extracted. Use this option to
+recover as much data as possible from each damaged member. It is recommended
+to run tarlz in single-threaded mode (@w{@samp{--threads=0}}) when using this option.

 @item --missing-crc
 Exit with error status 2 if the CRC of the extended records is missing.
@@ -429,6 +433,19 @@ number of packets may increase compression speed if the files being archived
 are larger than @w{64 MiB} compressed, but requires more memory. Valid
 values range from 1 to 1024. The default value is 64.

+@item --check-lib
+Compare the
+@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib}
+used to compile tarlz with the version actually being used and exit. Report
+any differences found. Exit with error status 1 if differences are found. A
+mismatch may indicate that lzlib is not correctly installed or that a
+different version of lzlib has been installed after compiling tarlz.
+@w{@samp{tarlz -v --check-lib}} shows the version of lzlib being used and
+the value of @samp{LZ_API_VERSION} (if defined).
+@ifnothtml
+@xref{Library version,,,lzlib}.
+@end ifnothtml
+
 @ignore
 @item --permissive
 Allow some violations of the archive format, like consecutive extended
@@ -613,8 +630,12 @@ protected by the CRC to guarante that corruption is always detected
 (except in case of CRC collision). A CRC was chosen because a checksum
 is too weak for a potentially large list of variable sized records. A
 checksum can't detect simple errors like the swapping of two bytes.
+
 @end table

+At verbosity level 1 or higher tarlz prints a diagnostic for each unknown
+extended header keyword found in an archive, once per keyword.
+
 @sp 1
 @section Ustar header block

@@ -839,11 +860,16 @@ or less similar to any other tar and won't be described here. The interesting
 parts described here are those related to Multi-threaded processing.

 The structure of the part of tarlz performing Multi-threaded archive
-creation is somewhat similar to that of plzip with the added complication of
-the solidity levels. A grouper thread and several worker threads are
-created, acting the main thread as muxer (multiplexer) thread. A "packet
-courier" takes care of data transfers among threads and limits the maximum
-number of data blocks (packets) being processed simultaneously.
+creation is somewhat similar to that of
+@uref{http://www.nongnu.org/lzip/plzip.html#Program-design,,plzip} with the
+added complication of the solidity levels.
+@ifnothtml
+@xref{Program design,,,plzip}.
+@end ifnothtml
+A grouper thread and several worker threads are created, and the main
+thread acts as the muxer (multiplexer) thread. A "packet courier" takes care
+of data transfers among threads and limits the maximum number of data blocks
+(packets) being processed simultaneously.

 The grouper traverses the directory tree, groups together the metadata of
 the files to be archived in each lzip member, and distributes them to the
@@ -876,8 +902,7 @@ access files in the file system either to read them (diff) or write them
 ,--------,
 | file   |<---> data to/from each worker below
 | system |
-`--------'
-                ,------------,
+`--------'      ,------------,
            ,-->| worker   0 |--,
            |   `------------'  |
 ,---------, |   ,------------,  |   ,-------,   ,--------,
@@ -941,8 +966,7 @@ decoding it safely in parallel.
 Tarlz is able to automatically decode aligned and unaligned multimember
 tar.lz archives, keeping backwards compatibility. If tarlz finds a member
 misalignment during multi-threaded decoding, it switches to single-threaded
-mode and continues decoding the archive. Currently only the options
-@samp{--diff} and @samp{--list} are able to do multi-threaded decoding.
+mode and continues decoding the archive.

 If the files in the archive are large, multi-threaded @samp{--list} on a
 regular (seekable) tar.lz archive can be hundreds of times faster than
@@ -959,7 +983,32 @@ time tarlz -tf silesia.tar.lz                       (0.020s)

 On the other hand, multi-threaded @samp{--list} won't detect corruption in
 the tar member data because it only decodes the part of each lzip member
-corresponding to the tar member header.
+corresponding to the tar member header. This is another reason why the tar
+headers must provide their own integrity checking.
+
+@sp 1
+@section Limitations of multi-threaded extraction
+
+Multi-threaded extraction may produce different output than single-threaded
+extraction in some cases:
+
+During multi-threaded extraction, several independent threads are
+simultaneously reading the archive and creating files in the file system. The
+archive is not read sequentially. As a consequence, any error or weirdness
+in the archive (like a corrupt member or an EOF block in the middle of the
+archive) usually won't be detected until part of the archive beyond that
+point has been processed.
+
+If the archive contains two or more tar members with the same name,
+single-threaded extraction extracts the members in the order they appear in
+the archive and leaves the last version of the file in the file system. But
+multi-threaded extraction may extract the members in any order and leave
+any version of the file in the file system, nondeterministically. It is
+unspecified which of the tar members is extracted.
+
+If the same file is extracted through several paths (different member names
+resolve to the same file in the file system), the result is undefined.
+(The resulting file will probably be mangled.)


 @node Minimum archive sizes
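The CRC paragraph kept as context in the hunk starting at old line 613 argues that a checksum can't detect the swapping of two bytes, while a CRC can. The following Python sketch illustrates that claim on a made-up pax-style record. It uses the standard library's `zlib.crc32` (the IEEE CRC-32) purely as an example CRC. This is an assumption: it is not necessarily the CRC variant tarlz stores for the extended records, although any CRC-32 behaves the same way for this kind of error.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """Sum of all bytes modulo 2**32; it ignores byte order entirely."""
    return sum(data) & 0xFFFFFFFF


def crc(data: bytes) -> int:
    """IEEE CRC-32 from the standard library (example CRC, not tarlz's)."""
    return zlib.crc32(data) & 0xFFFFFFFF


# Two hypothetical pax-style records ("<length> <keyword>=<value>\n") that
# differ only by the swap of two adjacent bytes ("12" versus "21").
good = b"24 path=dir/file_12.txt\n"
bad = b"24 path=dir/file_21.txt\n"

print("checksum:", hex(additive_checksum(good)), hex(additive_checksum(bad)))
print("crc32:   ", hex(crc(good)), hex(crc(bad)))
```

Both records have the same additive checksum, so the swap goes unnoticed, while their CRCs differ. The CRC result does not depend on luck: a CRC-32 detects every error burst of up to 32 bits, and swapping two adjacent bytes alters at most 16 consecutive bits.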
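The paragraph added at the end of the hunk starting at old line 613 says that, at verbosity level 1 or higher, tarlz prints a diagnostic for each unknown extended header keyword, once per keyword. A minimal Python sketch of that "once per keyword" rule follows. The set of known keywords, the function name, and the message text are all hypothetical. They illustrate only the de-duplication logic, not tarlz's actual code or output.

```python
from typing import Iterable, List

# Hypothetical subset of keywords the reader understands. The real list is
# defined by the extended header format, not by this sketch.
KNOWN_KEYWORDS = {"path", "linkpath", "size", "GNU.crc32"}


def unknown_keyword_diagnostics(keywords: Iterable[str],
                                verbosity: int) -> List[str]:
    """Return one diagnostic per distinct unknown keyword, in order of first
    appearance, and only when verbosity is 1 or higher."""
    if verbosity < 1:
        return []
    seen = set()                  # unknown keywords already reported
    messages = []
    for keyword in keywords:
        if keyword in KNOWN_KEYWORDS or keyword in seen:
            continue
        seen.add(keyword)
        messages.append(
            f"Ignoring unknown extended header keyword '{keyword}'")
    return messages


# Keywords as they might appear across several members of one archive.
found = ["path", "VENDOR.foo", "size", "VENDOR.foo", "VENDOR.bar"]
for line in unknown_keyword_diagnostics(found, verbosity=1):
    print(line)
```

Using a set keeps the check for "already reported" cheap no matter how many members repeat the same unknown keyword.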
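The rewritten design paragraph in the hunk starting at old line 839 describes a grouper thread, several worker threads, the main thread acting as muxer, and a "packet courier" that limits the number of packets being processed simultaneously. The following Python sketch shows one way to build such a pipeline, under stated assumptions. A semaphore plays the role of the courier's limit, `zlib.compress` stands in for lzip compression, and the names `parallel_compress` and `max_packets` are hypothetical. It sketches the general pattern and is not tarlz's implementation.

```python
import queue
import threading
import zlib
from typing import Dict, List, Optional, Tuple


def parallel_compress(blocks: List[bytes], n_workers: int = 4,
                      max_packets: int = 8) -> List[bytes]:
    """Compress blocks on worker threads and return them in input order.

    At most max_packets packets are in flight (queued, being compressed,
    or waiting for the muxer) at any time.
    """
    slots = threading.BoundedSemaphore(max_packets)  # courier's packet limit
    tasks: "queue.Queue[Optional[Tuple[int, bytes]]]" = queue.Queue()
    done = threading.Condition()
    results: Dict[int, bytes] = {}

    def grouper() -> None:
        for index, data in enumerate(blocks):
            slots.acquire()                 # wait until a packet slot is free
            tasks.put((index, data))
        for _ in range(n_workers):
            tasks.put(None)                 # one stop signal per worker

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is None:
                return
            index, data = item
            packed = zlib.compress(data)    # stand-in for lzip compression
            with done:
                results[index] = packed
                done.notify_all()

    threads = [threading.Thread(target=grouper)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    output: List[bytes] = []                # the main thread acts as muxer
    for index in range(len(blocks)):
        with done:
            while index not in results:
                done.wait()
            output.append(results.pop(index))
        slots.release()                     # packet written; free its slot
    for thread in threads:
        thread.join()
    return output
```

The grouper hands out packets in order, and the muxer frees a slot only after writing the next packet in sequence. As a result, the packets in flight are always the lowest unwritten ones, so the muxer never waits for a packet that cannot be started.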
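The new section "Limitations of multi-threaded extraction" states that, when member names are duplicated, single-threaded extraction leaves the last version of the file while multi-threaded extraction may leave any version. The Python sketch below models that difference with a dictionary standing in for the file system. The archive contents and function names are hypothetical, and real extraction writes files rather than dictionary entries.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical archive member: (member name, contents), in archive order.
Member = Tuple[str, str]


def extract_sequential(members: List[Member]) -> Dict[str, str]:
    """Extract in archive order: the last member with a given name wins."""
    fs: Dict[str, str] = {}
    for name, data in members:
        fs[name] = data
    return fs


def extract_parallel(members: List[Member],
                     n_workers: int = 4) -> Dict[str, str]:
    """Extract members concurrently. Which duplicate wins depends on thread
    scheduling, so the surviving version is not specified."""
    fs: Dict[str, str] = {}

    def write(member: Member) -> None:
        name, data = member
        fs[name] = data        # a single dict store is atomic in CPython

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(write, members))   # wait for all writes to finish
    return fs


archive = [("a.txt", "v1"), ("b.txt", "x"), ("a.txt", "v2")]
print(extract_sequential(archive))
print(extract_parallel(archive))
```

The only property the parallel version guarantees is that `a.txt` holds one of the two versions. This matches the manual's statement that it is unspecified which of the tar members is extracted.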