Merging upstream version 1.13~rc1.

Signed-off-by: Daniel Baumann <daniel@debian.org>
2025-02-24 06:03:46 +01:00 · 95e3ee3bd3
commit 95e3ee3bd3
parent f40403d840
29 changed files with 472 additions and 517 deletions
--- a/doc/zutils.texi
+++ b/doc/zutils.texi
@@ -6,8 +6,8 @@
@finalout
@c %**end of header

-@set UPDATED 7 January 2023
-@set VERSION 1.12
+@set UPDATED 31 December 2023
+@set VERSION 1.13-rc1

@dircategory Compression
@direntry
@@ -66,8 +66,8 @@ is a collection of utilities able to process any combination of
 compressed and uncompressed files transparently. If any file given,
 including standard input, is compressed, its decompressed content is used.
 Compressed files are decompressed on the fly; no temporary files are
-created. Data format is detected by its magic bytes, not by the file name
-extension.
+created. Data format is detected by its identifier string (magic bytes), not
+by the file name extension. Empty files are considered uncompressed.

 These utilities are not wrapper scripts but safer and more efficient C++
 programs. In particular the option @option{--recursive} is very efficient in
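
The detection rule in this hunk (identifier string first, empty input treated as uncompressed) can be illustrated with a minimal Python sketch. It is not the zutils implementation, which is written in C++ and may inspect more header bytes than shown here. The name `detect_format` is hypothetical; the magic values are the well-known leading bytes of each format, and the return values reuse the format names accepted by `--force-format`.

```python
# Leading identifier bytes of each format, listed in zutils' search order.
MAGIC = (
    ("lz", b"LZIP"),
    ("gz", b"\x1f\x8b"),
    ("bz2", b"BZh"),
    ("zst", b"\x28\xb5\x2f\xfd"),
    ("xz", b"\xfd7zXZ\x00"),
)


def detect_format(header: bytes) -> str:
    """Return the format name for the given leading bytes of a file.

    Hypothetical helper: an empty header is reported as 'un'
    (uncompressed), as is any header that matches no known magic.
    """
    if not header:
        return "un"  # empty files are considered uncompressed
    for name, magic in MAGIC:
        if header.startswith(magic):
            return name
    return "un"
```

The file name never enters the decision, so a gzip stream stored as `data.txt` is still reported as `gz`.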
@@ -86,6 +86,11 @@ improved replacements for the shell scripts provided by GNU gzip.
@command{ztest} is unique to zutils. @command{zupdate} is similar to gzip's
 znew.

+@anchor{search-order}
+When @command{zcat}, @command{zcmp}, @command{zdiff}, or @command{zgrep}
+need to try compressed file names, the search order is: lzip, gzip, bzip2,
+zstd, xz. (@var{file}.[lz|gz|bz2|zst|xz]).
+
 NOTE: Bzip2 and lzip provide well-defined values of exit status, which makes
 them safe to use with zutils. Gzip and xz may return ambiguous warning
 values, making them less reliable back ends for zutils. Zstd currently does
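
The search order added in this hunk can be sketched in Python as follows. This is an illustration under assumptions, not the zutils code: `find_compressed` is a hypothetical name, the `exists` parameter is added only so the lookup can be tried without touching the file system, and the list of known extensions is reduced to the five in the search order (the real tools may recognize more).

```python
import os

# Search order stated in the manual: lzip, gzip, bzip2, zstd, xz.
SEARCH_ORDER = ("lz", "gz", "bz2", "zst", "xz")


def find_compressed(name, exists=os.path.exists):
    """Return the first existing file name for 'name', or None.

    Compressed names are tried only if 'name' itself does not exist
    and does not already end with one of the known extensions.
    """
    if exists(name):
        return name
    if name.endswith(tuple("." + ext for ext in SEARCH_ORDER)):
        return None  # exact compressed name given; no other names are tried
    for ext in SEARCH_ORDER:
        candidate = f"{name}.{ext}"
        if exists(candidate):
            return candidate
    return None
```

With both `a.gz` and `a.xz` present, a request for `a` resolves to `a.gz`, because gzip precedes xz in the order.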
@@ -106,24 +111,6 @@ LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
 been compressed. Decompressed is used to refer to data which have undergone
 the process of decompression.

-@sp 1
-Numbers given as arguments to options (positions, sizes) may be followed
-by a multiplier and an optional @samp{B} for "byte".
-
-Table of SI and binary prefixes (unit multipliers):
-
-@multitable {Prefix} {kilobyte  (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)}
-@item Prefix @tab Value               @tab | @tab Prefix @tab Value
-@item k @tab kilobyte  (10^3 = 1000)  @tab | @tab Ki @tab kibibyte (2^10 = 1024)
-@item M @tab megabyte  (10^6)         @tab | @tab Mi @tab mebibyte (2^20)
-@item G @tab gigabyte  (10^9)         @tab | @tab Gi @tab gibibyte (2^30)
-@item T @tab terabyte  (10^12)        @tab | @tab Ti @tab tebibyte (2^40)
-@item P @tab petabyte  (10^15)        @tab | @tab Pi @tab pebibyte (2^50)
-@item E @tab exabyte   (10^18)        @tab | @tab Ei @tab exbibyte (2^60)
-@item Z @tab zettabyte (10^21)        @tab | @tab Zi @tab zebibyte (2^70)
-@item Y @tab yottabyte (10^24)        @tab | @tab Yi @tab yobibyte (2^80)
-@end multitable
-

@node Common options
@chapter Common options
@@ -132,7 +119,8 @@ Table of SI and binary prefixes (unit multipliers):
 The following
@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
 are available in all the utilities. Rather than writing identical
-descriptions for each of the programs, they are described here.
+descriptions for each of the programs, they are described here. Remember to
+prepend @file{./} to any file name beginning with a hyphen, or use @samp{--}.
@ifnothtml
@xref{Argument syntax,,,arg_parser}.
@end ifnothtml
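
A script that builds zutils command lines can apply the `./` advice added in this hunk automatically. The sketch below is a hypothetical helper, not part of zutils. It leaves a lone `-` untouched because the manual uses that name for standard input.

```python
def protect_file_name(name: str) -> str:
    """Prepend './' to a file name that would otherwise look like an option.

    Hypothetical helper: '-' alone is kept as is, since the utilities
    interpret it as standard input.
    """
    if name.startswith("-") and name != "-":
        return "./" + name
    return name
```

The alternative named in the hunk, placing `--` before the file names, avoids rewriting the names but has to be inserted at the correct position in the argument list.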
@@ -209,6 +197,26 @@ It must return 0 if no errors occurred, and a non-zero value otherwise.

@end table

+Numbers given as arguments to options may be expressed in decimal,
+hexadecimal, or octal (using the same syntax as integer constants in C++),
+and may be followed by a multiplier and an optional @samp{B} for "byte".
+
+Table of SI and binary prefixes (unit multipliers):
+
+@multitable {Prefix} {kilobyte   (10^3 = 1000)} {|} {Prefix} {kibibyte  (2^10 = 1024)}
+@item Prefix @tab Value               @tab | @tab Prefix @tab Value
+@item k @tab kilobyte   (10^3 = 1000) @tab | @tab Ki @tab kibibyte  (2^10 = 1024)
+@item M @tab megabyte   (10^6)        @tab | @tab Mi @tab mebibyte  (2^20)
+@item G @tab gigabyte   (10^9)        @tab | @tab Gi @tab gibibyte  (2^30)
+@item T @tab terabyte   (10^12)       @tab | @tab Ti @tab tebibyte  (2^40)
+@item P @tab petabyte   (10^15)       @tab | @tab Pi @tab pebibyte  (2^50)
+@item E @tab exabyte    (10^18)       @tab | @tab Ei @tab exbibyte  (2^60)
+@item Z @tab zettabyte  (10^21)       @tab | @tab Zi @tab zebibyte  (2^70)
+@item Y @tab yottabyte  (10^24)       @tab | @tab Yi @tab yobibyte  (2^80)
+@item R @tab ronnabyte  (10^27)       @tab | @tab Ri @tab robibyte  (2^90)
+@item Q @tab quettabyte (10^30)       @tab | @tab Qi @tab quebibyte (2^100)
+@end multitable
+
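
The number syntax described in the relocated paragraph and table can be sketched in Python. This is one reading of the text under stated assumptions, not the parser used by zutils: `parse_number` is a hypothetical name, the numeric part follows C++ integer-constant rules (`0x` for hexadecimal, a leading `0` for octal), a trailing `B` is accepted with or without a multiplier, and binary prefixes use the capitalized `Ki` as in the table.

```python
import re

SI_PREFIXES = "kMGTPEZYRQ"       # 10^3 ... 10^30
BINARY_PREFIXES = "KMGTPEZYRQ"   # Ki ... Qi, 2^10 ... 2^100

# Numeric part: hexadecimal, octal (leading 0), or decimal; the rest is the tail.
_NUMBER = re.compile(r"(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)(.*)\Z", re.S)


def parse_number(text: str) -> int:
    """Parse strings such as '4Ki', '0x10k', or '1MiB' into an integer."""
    match = _NUMBER.match(text)
    if not match:
        raise ValueError(f"bad number: {text!r}")
    digits, tail = match.groups()
    if digits[:2].lower() == "0x":
        value = int(digits, 16)
    elif digits.startswith("0"):
        value = int(digits, 8)
    else:
        value = int(digits)
    if tail.endswith("B"):       # optional 'B' for "byte"
        tail = tail[:-1]
    if not tail:
        return value
    binary = tail.endswith("i")
    prefix = tail[:-1] if binary else tail
    letters = BINARY_PREFIXES if binary else SI_PREFIXES
    if len(prefix) != 1 or prefix not in letters:
        raise ValueError(f"bad multiplier in number: {text!r}")
    exponent = letters.index(prefix) + 1
    return value * (1024 if binary else 1000) ** exponent
```

Because the numeric part is matched greedily, `0x1B` is read as the hexadecimal number 27 rather than as `0x1` followed by `B`.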

@node Configuration
@chapter The configuration file 'zutils.conf'
@@ -249,8 +257,9 @@ where <format> is one of @samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, or
 sequence. If any file given is compressed, its decompressed content is
 copied. If a file given does not exist, and its name does not end with one
 of the known extensions, @command{zcat} tries the compressed file names
-corresponding to the formats supported. If a file fails to decompress,
-@command{zcat} continues copying the rest of the files.
+corresponding to the formats supported until one is found.
+@xref{search-order}. If a file fails to decompress, @command{zcat} continues
+copying the rest of the files.

 If a file is specified as @samp{-}, data are read from standard input,
 decompressed if needed, and sent to standard output. Data read from
@@ -297,8 +306,8 @@ Number all output lines, starting with 1. The line count is unlimited.
 Force the compressed format given. Valid values for @var{format} are
@samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, @samp{zst}, and @samp{un} for
@samp{uncompressed}. If this option is used, the files are passed to the
-corresponding decompressor (or transmitted unmodified) without verifying
-their format, and the exact file name must be given. Other names won't be
+corresponding decompressor (or transmitted unmodified) without checking
+their format, and the exact file name must be given. Other names are not
 tried.

@item -q
@@ -360,17 +369,10 @@ zcmp [@var{options}] @var{file1} [@var{file2}]
@noindent
 This compares @var{file1} to @var{file2}. The standard input is used only if
@var{file1} or @var{file2} refers to standard input. If @var{file2} is
-omitted @command{zcmp} tries the following:
-
-@itemize -
-@item
-If @var{file1} is compressed, compares its decompressed contents with
-the corresponding uncompressed file (the name of @var{file1} with the
-extension removed).
-@item
-If @var{file1} is uncompressed, compares it with the decompressed
-contents of @var{file1}.[lz|bz2|gz|zst|xz] (the first one that is found).
-@end itemize
+omitted @command{zcmp} tries to compare @var{file1} with the corresponding
+uncompressed file (if @var{file1} is compressed), and then with the
+corresponding compressed files of the remaining formats until one is found.
+@xref{search-order}.
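
The condensed wording above leaves the exact candidate sequence implicit. One possible reading, sketched in Python under assumptions: if file1 carries a known compressed extension, the uncompressed name is tried first and then the remaining formats in search order; otherwise every format is tried in search order. The function name and the reduced extension list are hypothetical, and the actual program may behave differently in edge cases.

```python
# Search order stated in the manual: lzip, gzip, bzip2, zstd, xz.
SEARCH_ORDER = ("lz", "gz", "bz2", "zst", "xz")


def second_file_candidates(file1: str) -> list:
    """List the names that could stand in for an omitted file2, in order."""
    for ext in SEARCH_ORDER:
        suffix = "." + ext
        if file1.endswith(suffix) and len(file1) > len(suffix):
            base = file1[: -len(suffix)]
            # file1 is compressed: uncompressed name first, then other formats.
            return [base] + [f"{base}.{e}" for e in SEARCH_ORDER if e != ext]
    # file1 is uncompressed: try each compressed name in search order.
    return [f"{file1}.{e}" for e in SEARCH_ORDER]
```

The first candidate that exists would be the one compared against file1.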

@noindent
 An exit status of 0 means no differences were found, 1 means some
@@ -409,14 +411,14 @@ Compare at most @var{count} input bytes.

@item -O [@var{format1}][,@var{format2}]
@itemx --force-format=[@var{format1}][,@var{format2}]
-Force the compressed formats given. Any of @var{format1} or @var{format2}
-may be omitted and the corresponding format will be automatically detected.
-Valid values for @var{format} are @samp{bz2}, @samp{gz}, @samp{lz},
-@samp{xz}, @samp{zst}, and @samp{un} for @samp{uncompressed}. If at least
-one format is specified with this option, the file is passed to the
-corresponding decompressor (or transmitted unmodified) without verifying its
-format, and the exact file names of both @var{file1} and @var{file2} must be
-given. Other names won't be tried.
+Force the compressed formats given. If @var{format1} or @var{format2} is
+omitted, the corresponding format is automatically detected. Valid values
+for @var{format} are @samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz},
+@samp{zst}, and @samp{un} for @samp{uncompressed}. If at least one format is
+specified with this option, the file is passed to the corresponding
+decompressor (or transmitted unmodified) without checking its format, and
+the exact file names of both @var{file1} and @var{file2} must be given.
+Other names are not tried.

@item -q
@itemx --quiet
@@ -441,24 +443,6 @@ the verbosity level. @xref{version}.

@end table

-Byte counts given as arguments to options may be expressed in decimal,
-hexadecimal, or octal (using the same syntax as integer constants in C++),
-and may be followed by a multiplier and an optional @samp{B} for "byte".
-
-Table of SI and binary prefixes (unit multipliers):
-
-@multitable {Prefix} {kilobyte  (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)}
-@item Prefix @tab Value               @tab | @tab Prefix @tab Value
-@item k @tab kilobyte  (10^3 = 1000)  @tab | @tab Ki @tab kibibyte (2^10 = 1024)
-@item M @tab megabyte  (10^6)         @tab | @tab Mi @tab mebibyte (2^20)
-@item G @tab gigabyte  (10^9)         @tab | @tab Gi @tab gibibyte (2^30)
-@item T @tab terabyte  (10^12)        @tab | @tab Ti @tab tebibyte (2^40)
-@item P @tab petabyte  (10^15)        @tab | @tab Pi @tab pebibyte (2^50)
-@item E @tab exabyte   (10^18)        @tab | @tab Ei @tab exbibyte (2^60)
-@item Z @tab zettabyte (10^21)        @tab | @tab Zi @tab zebibyte (2^70)
-@item Y @tab yottabyte (10^24)        @tab | @tab Yi @tab yobibyte (2^80)
-@end multitable
-

@node Zdiff
@chapter Zdiff
@@ -480,17 +464,10 @@ zdiff [@var{options}] @var{file1} [@var{file2}]
@noindent
 This compares @var{file1} to @var{file2}. The standard input is used only if
@var{file1} or @var{file2} refers to standard input. If @var{file2} is
-omitted @command{zdiff} tries the following:
-
-@itemize -
-@item
-If @var{file1} is compressed, compares its decompressed contents with
-the corresponding uncompressed file (the name of @var{file1} with the
-extension removed).
-@item
-If @var{file1} is uncompressed, compares it with the decompressed
-contents of @var{file1}.[lz|bz2|gz|zst|xz] (the first one that is found).
-@end itemize
+omitted @command{zdiff} tries to compare @var{file1} with the corresponding
+uncompressed file (if @var{file1} is compressed), and then with the
+corresponding compressed files of the remaining formats until one is found.
+@xref{search-order}.

@noindent
 An exit status of 0 means no differences were found, 1 means some
@@ -529,18 +506,18 @@ Ignore changes due to tab expansion.

@item -i
@itemx --ignore-case
-Ignore case differences in file contents.
+Ignore case differences. Consider uppercase and lowercase letters equivalent.

@item -O [@var{format1}][,@var{format2}]
@itemx --force-format=[@var{format1}][,@var{format2}]
-Force the compressed formats given. Any of @var{format1} or @var{format2}
-may be omitted and the corresponding format will be automatically detected.
-Valid values for @var{format} are @samp{bz2}, @samp{gz}, @samp{lz},
-@samp{xz}, @samp{zst}, and @samp{un} for @samp{uncompressed}. If at least
-one format is specified with this option, the file is passed to the
-corresponding decompressor (or transmitted unmodified) without verifying its
-format, and the exact file names of both @var{file1} and @var{file2} must be
-given. Other names won't be tried.
+Force the compressed formats given. If @var{format1} or @var{format2} is
+omitted, the corresponding format is automatically detected. Valid values
+for @var{format} are @samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz},
+@samp{zst}, and @samp{un} for @samp{uncompressed}. If at least one format is
+specified with this option, the file is passed to the corresponding
+decompressor (or transmitted unmodified) without checking its format, and
+the exact file names of both @var{file1} and @var{file2} must be given.
+Other names are not tried.

@item -p
@itemx --show-c-function
@@ -599,13 +576,12 @@ search on any combination of compressed and uncompressed files. If any file
 given is compressed, its decompressed content is used. If a file given does
 not exist, and its name does not end with one of the known extensions,
@command{zgrep} tries the compressed file names corresponding to the formats
-supported. If a file fails to decompress, @command{zgrep} continues
-searching the rest of the files.
+supported until one is found. @xref{search-order}. If a file fails to
+decompress, @command{zgrep} continues searching the rest of the files.

 If a file is specified as @samp{-}, data are read from standard input,
-decompressed if needed, and fed to grep. Data read from standard input
-must be of the same type; all uncompressed or all in the same
-compressed format.
+decompressed if needed, and fed to grep. Data read from standard input must
+be of the same type; all uncompressed or all in the same compressed format.

 If no files are specified, recursive searches examine the current working
 directory, and nonrecursive searches read standard input.
@@ -738,8 +714,8 @@ Show only the part of matching lines that actually matches @var{pattern}.
 Force the compressed format given. Valid values for @var{format} are
@samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, @samp{zst}, and @samp{un} for
@samp{uncompressed}. If this option is used, the files are passed to the
-corresponding decompressor (or transmitted unmodified) without verifying
-their format, and the exact file name must be given. Other names won't be
+corresponding decompressor (or transmitted unmodified) without checking
+their format, and the exact file name must be given. Other names are not
 tried.

@item -P
@@ -809,14 +785,14 @@ unusual characters like newlines.
@chapter Ztest
@cindex ztest

-@command{ztest} verifies the integrity of the compressed files specified. It
+@command{ztest} checks the integrity of the compressed files specified. It
 also warns if an uncompressed file has a compressed file name extension, or
 if a compressed file has a wrong compressed extension. Uncompressed files
 are otherwise ignored. If a file is specified as @samp{-}, the integrity of
-compressed data read from standard input is verified. Data read from
+compressed data read from standard input is checked. Data read from
 standard input must be all in the same compressed format. If a file fails to
 decompress, does not exist, can't be opened, or is a terminal, @command{ztest}
-continues verifying the rest of the files. A final diagnostic is shown at
+continues testing the rest of the files. A final diagnostic is shown at
 verbosity level 1 or higher if any file fails the test when testing multiple
 files.

@@ -827,14 +803,14 @@ Bzip2, gzip, and lzip are the primary formats. Xz and zstd are optional. If
 the decompressor for the xz or zstd formats is not found, the corresponding
 files are ignored.

-Note that error detection in the xz format is broken. First, some xz
-files lack integrity information. Second, not all xz decompressors can
-@uref{http://www.nongnu.org/lzip/xz_inadequate.html#fragmented,,verify the integrity}
+Note that error detection in the xz format is broken. First, some xz files
+lack integrity information. Second, not all xz decompressors can
+@uref{http://www.nongnu.org/lzip/xz_inadequate.html#fragmented,,check the integrity}
 of all xz files. Third, section 2.1.1.2 'Stream Flags' of the
@uref{http://tukaani.org/xz/xz-file-format.txt,,xz format specification}
 allows xz decompressors to produce garbage output without issuing any
-warning. Therefore, xz files can't always be verified as reliably as
-files in the other formats can.
+warning. Therefore, xz files can't always be checked as reliably as files in
+the other formats can.
@c We can only hope that xz is soon abandoned.

 The format for running @command{ztest} is:
@@ -844,8 +820,8 @@ ztest [@var{options}] [@var{files}]
@end example

@noindent
-Exit status is 0 if all compressed files verify OK, 1 if environmental
-problems (file not found, invalid command line options, I/O errors, etc),
+Exit status is 0 if all compressed files check OK, 1 if environmental
+problems (file not found, invalid command-line options, I/O errors, etc),
 2 if any compressed file is corrupt or invalid, or if any file has an
 incorrect file name extension.

@@ -857,8 +833,8 @@ incorrect file name extension.
 Force the compressed format given. Valid values for @var{format} are
@samp{bz2}, @samp{gz}, @samp{lz}, @samp{xz}, and @samp{zst}. If this option
 is used, the files are passed to the corresponding decompressor without
-verifying their format, and any files in a format that the decompressor
-can't understand will fail.
+checking their format, and any files in a format that the decompressor can't
+understand fail the test.

@item -q
@itemx --quiet
@@ -877,7 +853,7 @@ recursively, following all symbolic links.

@item -v
@itemx --verbose
-Verbose mode. Show the verify status for each file processed. Further -v's
+Verbose mode. Show the check status for each file processed. Further -v's
 increase the verbosity level. @xref{version}.

@end table
@@ -894,21 +870,21 @@ recompressed, other files are ignored. Compressed files are decompressed and
 then recompressed on the fly; no temporary files are created. If an error
 happens while recompressing a file, @command{zupdate} exits immediately
 without recompressing the rest of the files. The lzip format is chosen as
-destination because it is the most appropriate for long-term data archiving.
+destination because it is the most appropriate for long-term archiving.

 If no files are specified, recursive searches examine the current working
 directory, and nonrecursive searches do nothing.

-If the lzip compressed version of a file already exists, the file is skipped
+If the lzip-compressed version of a file already exists, the file is skipped
 unless the option @option{--force} is given. In this case, if the comparison
 with the existing lzip version fails, an error is returned and the original
 file is not deleted. The operation of @command{zupdate} is meant to be safe
-and not cause any data loss. Therefore, existing lzip compressed files are
+and not cause any data loss. Therefore, existing lzip-compressed files are
 never overwritten nor deleted.

 Combining the options @option{--force} and @option{--keep}, as in
-@w{@samp{zupdate -f -k *.gz}}, verifies that there are no differences
-between each pair of files in a multiformat set of files.
+@w{@samp{zupdate -f -k *.gz}}, checks that there are no differences between
+each pair of files in a multiformat set of files.

 The names of the original files must have one of the following extensions:@*
@samp{.bz2}, @samp{.gz}, @samp{.xz}, @samp{.zst}, or @samp{.Z}, which are
@@ -938,7 +914,7 @@ zupdate [@var{options}] [@var{files}]
 Exit status is 0 if all the compressed files were successfully recompressed
 (if needed), compared, and deleted (if requested). 1 if a non-fatal error
 occurred (file not found or not regular, or has invalid format, or can't be
-deleted). 2 if a fatal error occurred (invalid command line options,
+deleted). 2 if a fatal error occurred (invalid command-line options,
 compressor can't be run, or comparison fails).

@command{zupdate} supports the following options:
@@ -968,10 +944,10 @@ Expand combined file name extensions; recompress @samp{.tbz}, @samp{.tbz2},

@item -f
@itemx --force
-Don't skip a file for which a lzip compressed version already exists.
-@option{--force} compares the content of the input file with the content
-of the existing lzip file and deletes the input file if both contents
-are identical.
+Don't skip a file for which a lzip-compressed version already exists.
+@option{--force} compares the content of the input file with the content of
+the existing lzip file and deletes the input file if both contents are
+identical.

@item -i
@itemx --ignore-errors