Adding upstream version 0.21.

Signed-off-by: Daniel Baumann <daniel@debian.org>
2025-02-17 21:15:58 +01:00
commit cc1b855cb3
parent 7bf1f2e322
27 changed files with 961 additions and 324 deletions
--- a/doc/tarlz.texi
+++ b/doc/tarlz.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header

-@set UPDATED 8 January 2021
-@set VERSION 0.19
+@set UPDATED 14 June 2021
+@set VERSION 0.21

@dircategory Data Compression
@direntry
@@ -138,7 +138,7 @@ tarlz [@var{options}] [@var{files}]

@noindent
 All operations except @samp{--concatenate} operate on whole trees if any
-@var{file} is a directory.
+@var{file} is a directory. Tarlz overwrites output files without warning.

 On archive creation or appending tarlz archives the files specified, but
 removes from member names any leading and trailing slashes and any file name
@@ -155,7 +155,7 @@ member names in the archive or given in the command line, so that

 If several compression levels or @samp{--*solid} options are given, the last
 setting is used. For example @w{@samp{-9 --solid --uncompressed -1}} is
-equivalent to @samp{-1 --solid}
+equivalent to @w{@samp{-1 --solid}}.

 tarlz supports the following
@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
@@ -174,15 +174,17 @@ This version number should be included in all bug reports.

@item -A
@itemx --concatenate
-Append one or more archives to the end of an archive. All the archives
-involved must be regular (seekable) files, and must be either all compressed
-or all uncompressed. Compressed and uncompressed archives can't be mixed.
-Compressed archives must be multimember lzip files with the two end-of-file
-blocks plus any zero padding contained in the last lzip member of each
-archive. The intermediate end-of-file blocks are removed as each new archive
-is concatenated. If the archive is uncompressed, tarlz parses and skips tar
-headers until it finds the end-of-file blocks. Exit with status 0 without
-modifying the archive if no @var{files} have been specified.
+Append one or more archives to the end of an archive. If no archive is
+specified with the option @samp{-f}, the input archives are concatenated to
+standard output. All the archives involved must be regular (seekable) files,
+and must be either all compressed or all uncompressed. Compressed and
+uncompressed archives can't be mixed. Compressed archives must be
+multimember lzip files with the two end-of-file blocks plus any zero padding
+contained in the last lzip member of each archive. The intermediate
+end-of-file blocks are removed as each new archive is concatenated. If the
+archive is uncompressed, tarlz parses and skips tar headers until it finds
+the end-of-file blocks. Exit with status 0 without modifying the archive if
+no @var{files} have been specified.

@anchor{--data-size}
@item -B @var{bytes}
@@ -285,6 +287,12 @@ Note that the number of usable threads is limited during compression to
 and during decompression to the number of lzip members in the tar.lz
 archive, which you can find by running @w{@samp{lzip -lv archive.tar.lz}}.

+@item -o @var{file}
+@itemx --output=@var{file}
+Write the compressed output to @var{file}. @w{@samp{-o -}} writes the
+compressed output to standard output. Currently @samp{--output} only works
+with @samp{--compress}.
+
@item -p
@itemx --preserve-permissions
 On extraction, set file permissions as they appear in the archive. This is
@@ -331,11 +339,34 @@ special effort to extract a file over an incompatible type of file. For
 example, extracting a link over a directory will usually fail. (Principle of
 least surprise).

+@item -z
+@itemx --compress
+Compress existing POSIX tar archives aligning the lzip members to the tar
+members with a choice of granularity (@samp{--bsolid} by default; @samp{--dsolid}
+works like @samp{--asolid}). The input archives are kept unchanged. Existing compressed
+archives are not overwritten. A hyphen @samp{-} used as the name of an input
+archive reads from standard input and writes to standard output (unless the
+option @samp{--output} is used). Tarlz can be used as a compressor for GNU tar
+using a command like @w{@samp{tar -c -Hustar foo | tarlz -z -o foo.tar.lz}}.
+Note that tarlz only works reliably on archives without global headers, or
+with global headers whose content can be ignored.
+
+The compression is reversible, including any garbage present after the EOF
+blocks. Tarlz stops parsing after the first EOF block is found, and then
+compresses the rest of the archive. Unless solid compression is requested,
+the EOF blocks are compressed in a lzip member separated from the preceding
+members and from any non-zero garbage following the EOF blocks.
+@samp{--compress} implies plzip argument style, not tar style. Each input
+archive is compressed to a file with the extension @samp{.lz} added unless
+the option @samp{--output} is used. When @samp{--output} is used, only one
+input archive can be specified. @samp{-f} can't be used with
+@samp{--compress}.
+
@item -0 .. -9
-Set the compression level for @samp{--create} and @samp{--append}. The
-default compression level is @samp{-6}. Like lzip, tarlz also minimizes the
-dictionary size of the lzip members it creates, reducing the amount of
-memory required for decompression.
+Set the compression level for @samp{--create}, @samp{--append}, and
+@samp{--compress}. The default compression level is @samp{-6}. Like lzip,
+tarlz also minimizes the dictionary size of the lzip members it creates,
+reducing the amount of memory required for decompression.

@multitable {Level} {Dictionary size} {Match length limit}
@item Level @tab Dictionary size @tab Match length limit
@@ -446,6 +477,14 @@ the value of @samp{LZ_API_VERSION} (if defined).
@xref{Library version,,,lzlib}.
@end ifnothtml

+@item --warn-newer
+During archive creation, warn if any file being archived has a modification
+time newer than the archive creation time. This option may slow archive
+creation somewhat because it makes an extra call to @samp{stat} after
+archiving each file, but it guarantees that file contents were not modified
+during the creation of the archive. Note that the file must be at least one
+second newer than the archive for it to be detected as newer.
+
@ignore
@item --permissive
 Allow some violations of the archive format, like consecutive extended
@@ -457,7 +496,7 @@ keyword appearing in the same block of extended records.

 Exit status: 0 for a normal exit, 1 for environmental problems (file not
 found, files differ, invalid flags, I/O errors, etc), 2 to indicate a
-corrupt or invalid input file, 3 for an internal consistency error (eg, bug)
+corrupt or invalid input file, 3 for an internal consistency error (e.g. bug)
 which caused tarlz to panic.


@@ -477,7 +516,13 @@ The last three characters are the period, underscore, and hyphen-minus
 characters, respectively.

 File names are identifiers. Therefore, archiving works better when file
-names use only the portable character set without spaces added.
+names use only the portable character set without spaces added. Unicode is
+for human consumption. It should be
+@uref{http://www.gnu.org/software/moe/manual/moe_manual.html#why-not-Unicode,,avoided}
+in computing environments, especially in file names.
+@ifnothtml
+@xref{why not Unicode,,,moe}.
+@end ifnothtml


@node File format
@@ -796,8 +841,9 @@ integrity checking of lzip may not be able to detect the corruption before
 the metadata has been used, for example, to create a new file in the wrong
 place.

-Because of the above, tarlz protects the extended records with a CRC in a
-way compatible with standard tar tools. @xref{key_crc32}.
+Because of the above, tarlz protects the extended records with a Cyclic
+Redundancy Check (CRC) in a way compatible with standard tar tools.
+@xref{key_crc32}.

@sp 1
@anchor{flawed-compat}
@@ -818,7 +864,9 @@ To avoid this problem, tarlz writes extended headers with all fields zeroed
 except size, chksum, typeflag, magic and version. This prevents old tar
 programs from extracting the extended records as a file in the wrong place.
 Tarlz also sets to zero those fields of the ustar header overridden by
-extended records.
+extended records. Finally, tarlz skips members without a name when decoding,
+except when listing. This is needed to detect certain format violations
+during parallel extraction.

 If an extended header is required for any reason (for example a file size
 larger than @w{8 GiB} or a link name longer than 100 bytes), tarlz moves the
@@ -940,11 +988,20 @@ error be avoided.
@chapter Limitations of parallel tar decoding
@cindex parallel tar decoding

-Safely decoding an arbitrary tar archive in parallel is impossible. For
-example, if a tar archive containing another tar archive is decoded starting
-from some position other than the beginning, there is no way to know if the
-first header found there belongs to the outer tar archive or to the inner
-tar archive. Tar is a format inherently serial; it was designed for tapes.
+Safely decoding an arbitrary tar archive in parallel is only possible if one
+decodes the headers sequentially first. For example, if a tar archive
+containing another tar archive is decoded starting from some position other
+than the beginning, there is no way to know if the first header found there
+belongs to the outer tar archive or to the inner tar archive. Tar is a
+format inherently serial; it was designed for tapes.
+
+The pax format is even more serial than the ustar format. Two headers need
+to be decoded sequentially for each file. The extended header may even need
+parsing to reveal something as basic as file size. If a thread decodes the
+ustar header skipping the preceding extended header, it may extract a file
+of incorrect size at the wrong place. Moreover, a pax archive with global
+headers can't be decoded in parallel because each thread can't know about
+the global headers decoded by other threads.

 In the case of compressed tar archives, the start of each compressed block
 determines one point through which the tar archive can be decoded in
@@ -1131,6 +1188,17 @@ directory @samp{destdir}.
 tarlz -C sourcedir -c . | tarlz -C destdir -x
@end example

+@sp 1
+@noindent
+Example 9: Compress the existing POSIX archive @samp{archive.tar} and write
+the output to @samp{archive.tar.lz}. Compress each member individually for
+maximum availability. (If one member in the compressed archive gets damaged,
+the other members can still be extracted.)
+
+@example
+tarlz -z --no-solid archive.tar
+@end example
+

@node Problems
@chapter Reporting bugs