Adding upstream version 0.23.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 22f7f3575c
commit 9a8733dd3b
39 changed files with 2213 additions and 1444 deletions
563
doc/tarlz.info
|
@ -1,6 +1,6 @@
|
|||
This is tarlz.info, produced by makeinfo version 4.13+ from tarlz.texi.
|
||||
|
||||
INFO-DIR-SECTION Data Compression
|
||||
INFO-DIR-SECTION Archiving
|
||||
START-INFO-DIR-ENTRY
|
||||
* Tarlz: (tarlz). Archiver with multimember lzip compression
|
||||
END-INFO-DIR-ENTRY
|
||||
|
@ -11,7 +11,7 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Tarlz Manual
|
||||
************
|
||||
|
||||
This manual is for Tarlz (version 0.22, 5 January 2022).
|
||||
This manual is for Tarlz (version 0.23, 23 September 2022).
|
||||
|
||||
* Menu:
|
||||
|
||||
|
@ -69,9 +69,9 @@ archive, but it has the following advantages:
|
|||
* The resulting multimember tar.lz archive can be decompressed in
|
||||
parallel, multiplying the decompression speed.
|
||||
|
||||
* New members can be appended to the archive (by removing the EOF
|
||||
member), and unwanted members can be deleted from the archive. Just
|
||||
like an uncompressed tar archive.
|
||||
* New members can be appended to the archive (by removing the
|
||||
end-of-archive member), and unwanted members can be deleted from the
|
||||
archive. Just like an uncompressed tar archive.
|
||||
|
||||
* It is a safe POSIX-style backup format. In case of corruption, tarlz
|
||||
can extract all the undamaged members from the tar.lz archive,
|
||||
|
@ -99,19 +99,24 @@ File: tarlz.info, Node: Invoking tarlz, Next: Portable character set, Prev: I
|
|||
|
||||
The format for running tarlz is:
|
||||
|
||||
tarlz [OPTIONS] [FILES]
|
||||
tarlz OPERATION [OPTIONS] [FILES]
|
||||
|
||||
All operations except '--concatenate' and '--compress' operate on whole
|
||||
trees if any FILE is a directory. All operations except '--compress'
|
||||
overwrite output files without warning.
|
||||
overwrite output files without warning. If no archive is specified, tarlz
|
||||
tries to read it from standard input or write it to standard output. Tarlz
|
||||
refuses to read archive data from a terminal or write archive data to a
|
||||
terminal. Tarlz detects when the archive being created or enlarged is among
|
||||
the files to be archived, appended, or concatenated, and skips it.
|
||||
|
||||
On archive creation or appending tarlz archives the files specified, but
|
||||
removes from member names any leading and trailing slashes and any file name
|
||||
prefixes containing a '..' component. On extraction, leading and trailing
|
||||
slashes are also removed from member names, and archive members containing
|
||||
a '..' component in the file name are skipped. Tarlz detects when the
|
||||
archive being created or enlarged is among the files to be dumped, appended
|
||||
or concatenated, and skips it.
|
||||
Tarlz does not use absolute file names nor file names above the current
|
||||
working directory (perhaps changed by option '-C'). On archive creation or
|
||||
appending tarlz archives the files specified, but removes from member names
|
||||
any leading and trailing slashes and any file name prefixes containing a
|
||||
'..' component. On extraction, leading and trailing slashes are also
|
||||
removed from member names, and archive members containing a '..' component
|
||||
in the file name are skipped. Tarlz does not follow symbolic links during
|
||||
extraction; not even symbolic links replacing intermediate directories.
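The member name rules in the paragraph above can be sketched roughly as
follows. This is only an illustration of the documented behavior, not tarlz's
actual code; the function names 'name_for_archive' and 'name_for_extraction'
are hypothetical, and the removal of leading './' strings is not modeled.

```python
from typing import Optional


def name_for_archive(file_name: str) -> str:
    """Creation/append side: drop any prefix containing a '..' component,
    then remove leading and trailing slashes (illustrative only)."""
    parts = file_name.split("/")
    if ".." in parts:
        # Index of the last '..' component; everything up to it is dropped.
        last = len(parts) - 1 - parts[::-1].index("..")
        parts = parts[last + 1:]
    return "/".join(parts).strip("/")


def name_for_extraction(member_name: str) -> Optional[str]:
    """Extraction side: strip slashes; return None for members that would be
    skipped because the name contains a '..' component (illustrative only)."""
    stripped = member_name.strip("/")
    if ".." in stripped.split("/"):
        return None
    return stripped
```

With these definitions, 'name_for_archive("/usr/../etc/passwd/")' yields
'etc/passwd', and 'name_for_extraction("a/../b")' yields None.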
|
||||
|
||||
On extraction and listing, tarlz removes leading './' strings from
|
||||
member names in the archive or given in the command line, so that
|
||||
|
@ -122,8 +127,7 @@ member names in the archive or given in the command line, so that
|
|||
setting is used. For example '-9 --solid --uncompressed -1' is equivalent
|
||||
to '-1 --solid'.
|
||||
|
||||
tarlz supports the following options: *Note Argument syntax:
|
||||
(arg_parser)Argument syntax.
|
||||
tarlz supports the following operations:
|
||||
|
||||
'--help'
|
||||
Print an informative help message describing the options and exit.
|
||||
|
@ -140,39 +144,22 @@ to '-1 --solid'.
|
|||
standard output. All the archives involved must be regular (seekable)
|
||||
files, and must be either all compressed or all uncompressed.
|
||||
Compressed and uncompressed archives can't be mixed. Compressed
|
||||
archives must be multimember lzip files with the two end-of-file
|
||||
archives must be multimember lzip files with the two end-of-archive
|
||||
blocks plus any zero padding contained in the last lzip member of each
|
||||
archive. The intermediate end-of-file blocks are removed as each new
|
||||
archive is concatenated. If the archive is uncompressed, tarlz parses
|
||||
and skips tar headers until it finds the end-of-file blocks. Exit with
|
||||
archive. The intermediate end-of-archive blocks are removed as each
|
||||
new archive is concatenated. If the archive is uncompressed, tarlz
|
||||
parses tar headers until it finds the end-of-archive blocks. Exit with
|
||||
status 0 without modifying the archive if no FILES have been specified.
|
||||
|
||||
'-B BYTES'
|
||||
'--data-size=BYTES'
|
||||
Set target size of input data blocks for the option '--bsolid'. *Note
|
||||
--bsolid::. Valid values range from 8 KiB to 1 GiB. Default value is
|
||||
two times the dictionary size, except for option '-0' where it
|
||||
defaults to 1 MiB. *Note Minimum archive sizes::.
|
||||
Concatenating archives containing files in common results in two or
|
||||
more tar members with the same name in the resulting archive, which
|
||||
may produce nondeterministic behavior during multi-threaded extraction.
|
||||
*Note mt-extraction::.
|
||||
|
||||
'-c'
|
||||
'--create'
|
||||
Create a new archive from FILES.
|
||||
|
||||
'-C DIR'
|
||||
'--directory=DIR'
|
||||
Change to directory DIR. When creating or appending, the position of
|
||||
each '-C' option in the command line is significant; it will change the
|
||||
current working directory for the following FILES until a new '-C'
|
||||
option appears in the command line. When extracting or comparing, all
|
||||
the '-C' options are executed in sequence before reading the archive.
|
||||
Listing ignores any '-C' options specified. DIR is relative to the
|
||||
then current working directory, perhaps changed by a previous '-C'
|
||||
option.
|
||||
|
||||
Note that a process can only have one current working directory (CWD).
|
||||
Therefore multi-threading can't be used to create an archive if a '-C'
|
||||
option appears after a relative file name in the command line.
|
||||
|
||||
'-d'
|
||||
'--diff'
|
||||
Compare and report differences between archive and file system. For
|
||||
|
@ -188,10 +175,6 @@ to '-1 --solid'.
|
|||
on archive creation: 'tarlz -C / -d'. Alternatively, tarlz may be run
|
||||
from the root directory to perform the comparison.
|
||||
|
||||
'--ignore-ids'
|
||||
Make '--diff' ignore differences in owner and group IDs. This option is
|
||||
useful when comparing an '--anonymous' archive.
|
||||
|
||||
'--delete'
|
||||
Delete files and directories from an archive in place. It currently can
|
||||
delete only from uncompressed archives and from archives with files
|
||||
|
@ -210,12 +193,102 @@ to '-1 --solid'.
|
|||
be dangerous. A corrupt archive, a power cut, or an I/O error may cause
|
||||
data loss.
|
||||
|
||||
'--exclude=PATTERN'
|
||||
Exclude files matching a shell pattern like '*.o'. A file is considered
|
||||
to match if any component of the file name matches. For example, '*.o'
|
||||
matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a
|
||||
'/', it matches a corresponding '/' in the file name. For example,
|
||||
'foo/*.o' matches 'foo/bar.o'.
|
||||
'-r'
|
||||
'--append'
|
||||
Append files to the end of an archive. The archive must be a regular
|
||||
(seekable) file either compressed or uncompressed. Compressed members
|
||||
can't be appended to an uncompressed archive, nor vice versa. If the
|
||||
archive is compressed, it must be a multimember lzip file with the two
|
||||
end-of-archive blocks plus any zero padding contained in the last lzip
|
||||
member of the archive. It is possible to append files to an archive
|
||||
with a different compression granularity. Appending works as follows;
|
||||
first the end-of-archive blocks are removed, then the new members are
|
||||
appended, and finally two new end-of-archive blocks are appended to
|
||||
the archive. If the archive is uncompressed, tarlz parses and skips
|
||||
tar headers until it finds the end-of-archive blocks. Exit with status
|
||||
0 without modifying the archive if no FILES have been specified.
|
||||
|
||||
Appending files already present in the archive results in two or more
|
||||
tar members with the same name, which may produce nondeterministic
|
||||
behavior during multi-threaded extraction. *Note mt-extraction::.
|
||||
|
||||
'-t'
|
||||
'--list'
|
||||
List the contents of an archive. If FILES are given, list only the
|
||||
FILES given.
|
||||
|
||||
'-x'
|
||||
'--extract'
|
||||
Extract files from an archive. If FILES are given, extract only the
|
||||
FILES given. Else extract all the files in the archive. To extract a
|
||||
directory without extracting the files under it, use
|
||||
'tarlz -xf foo --exclude='dir/*' dir'. Tarlz removes files and empty
|
||||
directories unconditionally before extracting over them. Other than
|
||||
that, it will not make any special effort to extract a file over an
|
||||
incompatible type of file. For example, extracting a file over a
|
||||
non-empty directory will usually fail.
|
||||
|
||||
'-z'
|
||||
'--compress'
|
||||
Compress existing POSIX tar archives aligning the lzip members to the
|
||||
tar members with choice of granularity (--bsolid by default, --dsolid
|
||||
works like --asolid). The input archives are kept unchanged. Existing
|
||||
compressed archives are not overwritten. A hyphen '-' used as the name
|
||||
of an input archive reads from standard input and writes to standard
|
||||
output (unless the option '--output' is used). Tarlz can be used as
|
||||
compressor for GNU tar using a command like
|
||||
'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Note that tarlz only
|
||||
works reliably on archives without global headers, or with global
|
||||
headers whose content can be ignored.
|
||||
|
||||
The compression is reversible, including any garbage present after the
|
||||
end-of-archive blocks. Tarlz stops parsing after the first
|
||||
end-of-archive block is found, and then compresses the rest of the
|
||||
archive. Unless solid compression is requested, the end-of-archive
|
||||
blocks are compressed in a lzip member separated from the preceding
|
||||
members and from any non-zero garbage following the end-of-archive
|
||||
blocks. '--compress' implies plzip argument style, not tar style. Each
|
||||
input archive is compressed to a file with the extension '.lz' added
|
||||
unless the option '--output' is used. When '--output' is used, only
|
||||
one input archive can be specified. '-f' can't be used with
|
||||
'--compress'.
|
||||
|
||||
'--check-lib'
|
||||
Compare the version of lzlib used to compile tarlz with the version
|
||||
actually being used at run time and exit. Report any differences
|
||||
found. Exit with error status 1 if differences are found. A mismatch
|
||||
may indicate that lzlib is not correctly installed or that a different
|
||||
version of lzlib has been installed after compiling tarlz. Exit with
|
||||
error status 2 if LZ_API_VERSION and LZ_version_string don't match.
|
||||
'tarlz -v --check-lib' shows the version of lzlib being used and the
|
||||
value of LZ_API_VERSION (if defined). *Note Library version:
|
||||
(lzlib)Library version.
|
||||
|
||||
|
||||
tarlz supports the following options: *Note Argument syntax:
|
||||
(arg_parser)Argument syntax.
|
||||
|
||||
'-B BYTES'
|
||||
'--data-size=BYTES'
|
||||
Set target size of input data blocks for the option '--bsolid'. *Note
|
||||
--bsolid::. Valid values range from 8 KiB to 1 GiB. Default value is
|
||||
two times the dictionary size, except for option '-0' where it
|
||||
defaults to 1 MiB. *Note Minimum archive sizes::.
|
||||
|
||||
'-C DIR'
|
||||
'--directory=DIR'
|
||||
Change to directory DIR. When creating or appending, the position of
|
||||
each '-C' option in the command line is significant; it will change the
|
||||
current working directory for the following FILES until a new '-C'
|
||||
option appears in the command line. When extracting or comparing, all
|
||||
the '-C' options are executed in sequence before reading the archive.
|
||||
Listing ignores any '-C' options specified. DIR is relative to the
|
||||
then current working directory, perhaps changed by a previous '-C'
|
||||
option.
|
||||
|
||||
Note that a process can only have one current working directory (CWD).
|
||||
Therefore multi-threading can't be used to create an archive if a '-C'
|
||||
option appears after a relative file name in the command line.
|
||||
|
||||
'-f ARCHIVE'
|
||||
'--file=ARCHIVE'
|
||||
|
@ -228,14 +301,6 @@ to '-1 --solid'.
|
|||
Archive or compare the files they point to instead of the links
|
||||
themselves.
|
||||
|
||||
'--mtime=DATE'
|
||||
When creating or appending, use DATE as the modification time for
|
||||
files added to the archive instead of their actual modification times.
|
||||
The value of DATE may be either '@' followed by the number of seconds
|
||||
since the epoch, or a date in format 'YYYY-MM-DD HH:MM:SS', or the
|
||||
name of an existing file starting with '.' or '/'. In the latter case,
|
||||
the modification time of that file is used.
|
||||
|
||||
'-n N'
|
||||
'--threads=N'
|
||||
Set the number of (de)compression threads, overriding the system's
|
||||
|
@ -268,65 +333,11 @@ to '-1 --solid'.
|
|||
'--quiet'
|
||||
Quiet operation. Suppress all messages.
|
||||
|
||||
'-r'
|
||||
'--append'
|
||||
Append files to the end of an archive. The archive must be a regular
|
||||
(seekable) file either compressed or uncompressed. Compressed members
|
||||
can't be appended to an uncompressed archive, nor vice versa. If the
|
||||
archive is compressed, it must be a multimember lzip file with the two
|
||||
end-of-file blocks plus any zero padding contained in the last lzip
|
||||
member of the archive. It is possible to append files to an archive
|
||||
with a different compression granularity. Appending works as follows;
|
||||
first the end-of-file blocks are removed, then the new members are
|
||||
appended, and finally two new end-of-file blocks are appended to the
|
||||
archive. If the archive is uncompressed, tarlz parses and skips tar
|
||||
headers until it finds the end-of-file blocks. Exit with status 0
|
||||
without modifying the archive if no FILES have been specified.
|
||||
|
||||
'-t'
|
||||
'--list'
|
||||
List the contents of an archive. If FILES are given, list only the
|
||||
FILES given.
|
||||
|
||||
'-v'
|
||||
'--verbose'
|
||||
Verbosely list files processed. Further -v's (up to 4) increase the
|
||||
verbosity level.
|
||||
|
||||
'-x'
|
||||
'--extract'
|
||||
Extract files from an archive. If FILES are given, extract only the
|
||||
FILES given. Else extract all the files in the archive. To extract a
|
||||
directory without extracting the files under it, use
|
||||
'tarlz -xf foo --exclude='dir/*' dir'. Tarlz will not make any special
|
||||
effort to extract a file over an incompatible type of file. For
|
||||
example, extracting a link over a directory will usually fail.
|
||||
(Principle of least surprise).
|
||||
|
||||
'-z'
|
||||
'--compress'
|
||||
Compress existing POSIX tar archives aligning the lzip members to the
|
||||
tar members with choice of granularity (--bsolid by default, --dsolid
|
||||
works like --asolid). The input archives are kept unchanged. Existing
|
||||
compressed archives are not overwritten. A hyphen '-' used as the name
|
||||
of an input archive reads from standard input and writes to standard
|
||||
output (unless the option '--output' is used). Tarlz can be used as
|
||||
compressor for GNU tar using a command like
|
||||
'tar -c -Hustar foo | tarlz -z -o foo.tar.lz'. Note that tarlz only
|
||||
works reliably on archives without global headers, or with global
|
||||
headers whose content can be ignored.
|
||||
|
||||
The compression is reversible, including any garbage present after the
|
||||
EOF blocks. Tarlz stops parsing after the first EOF block is found,
|
||||
and then compresses the rest of the archive. Unless solid compression
|
||||
is requested, the EOF blocks are compressed in a lzip member separated
|
||||
from the preceding members and from any non-zero garbage following the
|
||||
EOF blocks. '--compress' implies plzip argument style, not tar style.
|
||||
Each input archive is compressed to a file with the extension '.lz'
|
||||
added unless the option '--output' is used. When '--output' is used,
|
||||
only one input archive can be specified. '-f' can't be used with
|
||||
'--compress'.
|
||||
|
||||
'-0 .. -9'
|
||||
Set the compression level for '--create', '--append', and
|
||||
'--compress'. The default compression level is '-6'. Like lzip, tarlz
|
||||
|
@ -354,8 +365,8 @@ to '-1 --solid'.
|
|||
'--asolid'
|
||||
When creating or appending to a compressed archive, use appendable
|
||||
solid compression. All the files being added to the archive are
|
||||
compressed into a single lzip member, but the end-of-file blocks are
|
||||
compressed into a separate lzip member. This creates a solidly
|
||||
compressed into a single lzip member, but the end-of-archive blocks
|
||||
are compressed into a separate lzip member. This creates a solidly
|
||||
compressed appendable archive. Solid archives can't be created nor
|
||||
decoded in parallel.
|
||||
|
||||
|
@ -375,20 +386,20 @@ to '-1 --solid'.
|
|||
When creating or appending to a compressed archive, compress each file
|
||||
specified in the command line separately in its own lzip member, and
|
||||
use solid compression for each directory specified in the command
|
||||
line. The end-of-file blocks are compressed into a separate lzip
|
||||
line. The end-of-archive blocks are compressed into a separate lzip
|
||||
member. This creates a compressed appendable archive with a separate
|
||||
lzip member for each file or top-level directory specified.
|
||||
|
||||
'--no-solid'
|
||||
When creating or appending to a compressed archive, compress each file
|
||||
separately in its own lzip member. The end-of-file blocks are
|
||||
separately in its own lzip member. The end-of-archive blocks are
|
||||
compressed into a separate lzip member. This creates a compressed
|
||||
appendable archive with a lzip member for each file.
|
||||
|
||||
'--solid'
|
||||
When creating or appending to a compressed archive, use solid
|
||||
compression. The files being added to the archive, along with the
|
||||
end-of-file blocks, are compressed into a single lzip member. The
|
||||
end-of-archive blocks, are compressed into a single lzip member. The
|
||||
resulting archive is not appendable. No more files can be later
|
||||
appended to the archive. Solid archives can't be created nor decoded
|
||||
in parallel.
|
||||
|
@ -406,22 +417,50 @@ to '-1 --solid'.
|
|||
If GROUP is not a valid group name, it is decoded as a decimal numeric
|
||||
group ID.
|
||||
|
||||
'--exclude=PATTERN'
|
||||
Exclude files matching a shell pattern like '*.o'. A file is considered
|
||||
to match if any component of the file name matches. For example, '*.o'
|
||||
matches 'foo.o', 'foo.o/bar' and 'foo/bar.o'. If PATTERN contains a
|
||||
'/', it matches a corresponding '/' in the file name. For example,
|
||||
'foo/*.o' matches 'foo/bar.o'. Multiple '--exclude' options can be
|
||||
specified.
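The matching rule described for '--exclude' can be approximated with the
sketch below. It is an assumption-laden illustration, not the code tarlz
uses: the helper name 'is_excluded' is hypothetical, and Python's 'fnmatch'
module may differ from the C library matcher in corner cases.

```python
from fnmatch import fnmatchcase


def is_excluded(file_name, patterns):
    """Return True if any run of consecutive components of file_name
    matches any of the shell patterns (illustrative approximation)."""
    parts = [p for p in file_name.split("/") if p]
    for start in range(len(parts)):
        for end in range(start + 1, len(parts) + 1):
            candidate = "/".join(parts[start:end])
            if any(fnmatchcase(candidate, pat) for pat in patterns):
                return True
    return False
```

Under this sketch, the pattern '*.o' excludes 'foo.o', 'foo.o/bar' and
'foo/bar.o', and the pattern 'foo/*.o' excludes 'foo/bar.o' but not 'bar.o'.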
|
||||
|
||||
'--ignore-ids'
|
||||
Make '--diff' ignore differences in owner and group IDs. This option is
|
||||
useful when comparing an '--anonymous' archive.
|
||||
|
||||
'--ignore-overflow'
|
||||
Make '--diff' ignore differences in mtime caused by overflow on 32-bit
|
||||
systems with a 32-bit time_t.
|
||||
|
||||
'--keep-damaged'
|
||||
Don't delete partially extracted files. If a decompression error
|
||||
happens while extracting a file, keep the partial data extracted. Use
|
||||
this option to recover as much data as possible from each damaged
|
||||
member. It is recommended to run tarlz in single-threaded mode
|
||||
(-threads=0) when using this option.
|
||||
(--threads=0) when using this option.
|
||||
|
||||
'--missing-crc'
|
||||
Exit with error status 2 if the CRC of the extended records is missing.
|
||||
When this option is used, tarlz detects any corruption in the extended
|
||||
records (only limited by CRC collisions). But note that a corrupt
|
||||
'GNU.crc32' keyword, for example 'GNU.crc33', is reported as a missing
|
||||
CRC instead of as a corrupt record. This misleading 'Missing CRC'
|
||||
message is the consequence of a flaw in the POSIX pax format; i.e.,
|
||||
the lack of a mandatory check sequence in the extended records. *Note
|
||||
crc32::.
|
||||
Exit with error status 2 if the CRC of the extended records is
|
||||
missing. When this option is used, tarlz detects any corruption in the
|
||||
extended records (only limited by CRC collisions). But note that a
|
||||
corrupt 'GNU.crc32' keyword, for example 'GNU.crc33', is reported as a
|
||||
missing CRC instead of as a corrupt record. This misleading
|
||||
'Missing CRC' message is the consequence of a flaw in the POSIX pax
|
||||
format; i.e., the lack of a mandatory check sequence of the extended
|
||||
records. *Note crc32::.
|
||||
|
||||
'--mtime=DATE'
|
||||
When creating or appending, use DATE as the modification time for
|
||||
files added to the archive instead of their actual modification times.
|
||||
The value of DATE may be either '@' followed by the number of seconds
|
||||
since (or before) the epoch, or a date in format
|
||||
'[-]YYYY-MM-DD HH:MM:SS' or '[-]YYYY-MM-DDTHH:MM:SS', or the name of
|
||||
an existing reference file starting with '.' or '/' whose modification
|
||||
time is used. The time of day 'HH:MM:SS' in the date format is
|
||||
optional and defaults to '00:00:00'. The epoch is
|
||||
'1970-01-01 00:00:00 UTC'. Negative seconds or years define a
|
||||
modification time before the epoch.
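A partial sketch of how the DATE forms listed above map to seconds relative
to the epoch. Several points are assumptions: the function name 'parse_mtime'
is hypothetical, the date is interpreted as UTC (the text above does not say
which time zone tarlz uses), and neither negative years nor reference file
names are handled here.

```python
from datetime import datetime, timezone


def parse_mtime(date: str) -> int:
    """Convert '@SECONDS' or 'YYYY-MM-DD[ HH:MM:SS]' / 'YYYY-MM-DDTHH:MM:SS'
    to seconds since (or before) the epoch. Assumes UTC; illustrative only."""
    if date.startswith("@"):
        return int(date[1:])            # may be negative: before the epoch
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(date, fmt)
        except ValueError:
            continue                    # try the next accepted format
        # A missing time of day defaults to 00:00:00.
        return int(parsed.replace(tzinfo=timezone.utc).timestamp())
    raise ValueError("unsupported DATE: " + date)
```

For example, 'parse_mtime("1970-01-02")' gives 86400 and
'parse_mtime("1969-12-31 23:59:59")' gives -1 under the UTC assumption.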
|
||||
|
||||
'--out-slots=N'
|
||||
Number of 1 MiB output packets buffered per worker thread during
|
||||
|
@ -431,17 +470,6 @@ to '-1 --solid'.
|
|||
more memory. Valid values range from 1 to 1024. The default value is
|
||||
64.
|
||||
|
||||
'--check-lib'
|
||||
Compare the version of lzlib used to compile tarlz with the version
|
||||
actually being used at run time and exit. Report any differences
|
||||
found. Exit with error status 1 if differences are found. A mismatch
|
||||
may indicate that lzlib is not correctly installed or that a different
|
||||
version of lzlib has been installed after compiling tarlz. Exit with
|
||||
error status 2 if LZ_API_VERSION and LZ_version_string don't match.
|
||||
'tarlz -v --check-lib' shows the version of lzlib being used and the
|
||||
value of LZ_API_VERSION (if defined). *Note Library version:
|
||||
(lzlib)Library version.
|
||||
|
||||
'--warn-newer'
|
||||
During archive creation, warn if any file being archived has a
|
||||
modification time newer than the archive creation time. This option
|
||||
|
@ -453,9 +481,9 @@ to '-1 --solid'.
|
|||
|
||||
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file not
|
||||
found, files differ, invalid flags, I/O errors, etc), 2 to indicate a
|
||||
corrupt or invalid input file, 3 for an internal consistency error (e.g.
|
||||
bug) which caused tarlz to panic.
|
||||
found, files differ, invalid command line options, I/O errors, etc), 2 to
|
||||
indicate a corrupt or invalid input file, 3 for an internal consistency
|
||||
error (e.g., bug) which caused tarlz to panic.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Portable character set, Next: File format, Prev: Invoking tarlz, Up: Top
|
||||
|
@ -473,9 +501,7 @@ The set of characters from which portable file names are constructed.
|
|||
characters, respectively.
|
||||
|
||||
File names are identifiers. Therefore, archiving works better when file
|
||||
names use only the portable character set without spaces added. Unicode is
|
||||
for human consumption. It should be avoided in computing environments,
|
||||
specially in file names. *Note why not Unicode: (moe)why not Unicode.
|
||||
names use only the portable character set without spaces added.
|
||||
|
||||
|
||||
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Portable character set, Up: Top
|
||||
|
@ -512,10 +538,11 @@ limitations of the ustar format.
|
|||
Each tar member contains one file archived, and is represented by the
|
||||
following sequence:
|
||||
|
||||
* An optional extended header block with extended header records. This
|
||||
header block is of the form described in pax header block, with a
|
||||
typeflag value of 'x'. The extended header records are included as the
|
||||
data for this header block.
|
||||
* An optional extended header block followed by one or more blocks that
|
||||
contain the extended header records as if they were the contents of a
|
||||
file; i.e., the extended header records are included as the data for
|
||||
this header block. This header block is of the form described in pax
|
||||
header block, with a typeflag value of 'x'.
|
||||
|
||||
* A header block in ustar format that describes the file. Any fields
|
||||
defined in the preceding optional extended header records override the
|
||||
|
@ -529,9 +556,11 @@ split over two or more lzip members, the archive must be decoded
|
|||
sequentially. *Note Multi-threaded decoding::.
|
||||
|
||||
At the end of the archive file there are two 512-byte blocks filled with
|
||||
binary zeros, interpreted as an end-of-archive indicator. These EOF blocks
|
||||
are either compressed in a separate lzip member or compressed along with
|
||||
the tar members contained in the last lzip member.
|
||||
binary zeros, interpreted as an end-of-archive indicator. These EOA blocks
|
||||
are either compressed in a separate lzip member or compressed along with the
|
||||
tar members contained in the last lzip member. For a compressed archive to
|
||||
be recognized by tarlz as appendable, the last lzip member must contain
|
||||
between 512 and 32256 zeros alone.
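The appendability condition stated above can be written as a small check on
the decompressed contents of the last lzip member. This is a sketch only; the
name 'looks_appendable' is hypothetical and the real test in tarlz may apply
further conditions. Note that 32256 bytes is 63 blocks of 512 bytes.

```python
BLOCK_SIZE = 512
MAX_EOA_BYTES = 63 * BLOCK_SIZE     # 32256


def looks_appendable(last_member_data: bytes) -> bool:
    """True if the decompressed last lzip member holds between 512 and
    32256 bytes, all of them zero (illustrative only)."""
    size = len(last_member_data)
    return BLOCK_SIZE <= size <= MAX_EOA_BYTES and not any(last_member_data)
```

Two EOA blocks, 'bytes(1024)', pass the check; a member that also contains
tar data, or more than 32256 zeros, does not.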
|
||||
|
||||
The diagram below shows the correspondence between each tar member
|
||||
(formed by one or two headers plus optional data) in the tar archive and
|
||||
|
@ -540,7 +569,7 @@ compression is used: *Note File format: (lzip)File format.
|
|||
|
||||
tar
|
||||
+========+======+=================+===============+========+======+========+
|
||||
| header | data | extended header | extended data | header | data | EOF |
|
||||
| header | data | extended header | extended data | header | data | EOA |
|
||||
+========+======+=================+===============+========+======+========+
|
||||
|
||||
tar.lz
|
||||
|
@ -572,25 +601,57 @@ space, equal-sign, and newline.
|
|||
|
||||
These are the <keyword> values currently supported by tarlz:
|
||||
|
||||
'atime'
|
||||
The signed decimal representation of the access time of the following
|
||||
file in seconds since (or before) the epoch, obtained from the function
|
||||
'stat'. The atime record is created only for files with a modification
|
||||
time outside of the ustar range. *Note ustar-mtime::.
|
||||
|
||||
'gid'
|
||||
The unsigned decimal representation of the group ID of the group that
|
||||
owns the following file. The gid record is created only for files with
|
||||
a group ID greater than 2_097_151 (octal 7777777). *Note
|
||||
ustar-uid-gid::.
|
||||
|
||||
'linkpath'
|
||||
The pathname of a link being created to another file, of any type,
|
||||
The file name of a link being created to another file, of any type,
|
||||
previously archived. This record overrides the field 'linkname' in the
|
||||
following ustar header block. The following ustar header block
|
||||
determines the type of link created. If typeflag of the following
|
||||
header block is 1, it will be a hard link. If typeflag is 2, it will
|
||||
be a symbolic link and the linkpath value will be used as the contents
|
||||
of the symbolic link.
|
||||
of the symbolic link. The linkpath record is created only for links
|
||||
with a link name that does not fit in the space provided by the ustar
|
||||
header.
|
||||
|
||||
'mtime'
|
||||
The signed decimal representation of the modification time of the
|
||||
following file in seconds since (or before) the epoch, obtained from
|
||||
the function 'stat'. This record overrides the field 'mtime' in the
|
||||
following ustar header block. The mtime record is created only for
|
||||
files with a modification time outside of the ustar range. *Note
|
||||
ustar-mtime::.
|
||||
|
||||
'path'
|
||||
The pathname of the following file. This record overrides the fields
|
||||
'name' and 'prefix' in the following ustar header block.
|
||||
The file name of the following file. This record overrides the fields
|
||||
'name' and 'prefix' in the following ustar header block. The path
|
||||
record is created for files with a name that does not fit in the space
|
||||
provided by the ustar header, but is also created for files that
|
||||
require any other extended record so that the fields 'name' and
|
||||
'prefix' in the following ustar header block can be zeroed.
|
||||
|
||||
'size'
|
||||
The size of the file in bytes, expressed as a decimal number using
|
||||
digits from the ISO/IEC 646:1991 (ASCII) standard. This record
|
||||
overrides the size field in the following ustar header block. The size
|
||||
record is used only for files with a size value greater than
|
||||
8_589_934_591 (octal 77777777777). This is 2^33 bytes or larger.
|
||||
overrides the field 'size' in the following ustar header block. The
|
||||
size record is created only for files with a size value greater than
|
||||
8_589_934_591 (octal 77777777777); that is, 8 GiB (2^33 bytes) or
|
||||
larger.
|
||||
|
||||
'uid'
|
||||
The unsigned decimal representation of the user ID of the file owner
|
||||
of the following file. The uid record is created only for files with a
|
||||
user ID greater than 2_097_151 (octal 7777777). *Note ustar-uid-gid::.
|
||||
|
||||
'GNU.crc32'
|
||||
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
|
||||
|
@ -643,18 +704,18 @@ and groups, tarlz will use the byte values in these names unmodified.
|
|||
character strings except when all characters in the array contain non-null
|
||||
characters including the last character.
|
||||
|
||||
The fields 'prefix' and 'name' produce the pathname of the file. A new
|
||||
pathname is formed, if prefix is not an empty string (its first character
|
||||
is not null), by concatenating prefix (up to the first null character), a
|
||||
slash character, and name; otherwise, name is used alone. In either case,
|
||||
name is terminated at the first null character. If prefix begins with a
|
||||
null character, it is ignored. In this manner, pathnames of at most 256
|
||||
characters can be supported. If a pathname does not fit in the space
|
||||
provided, an extended record is used to store the pathname.
|
||||
The fields 'name' and 'prefix' produce the file name. A new file name is
|
||||
formed, if prefix is not an empty string (its first character is not null),
|
||||
by concatenating prefix (up to the first null character), a slash
|
||||
character, and name; otherwise, name is used alone. In either case, name is
|
||||
terminated at the first null character. If prefix begins with a null
|
||||
character, it is ignored. In this manner, file names of at most 256
|
||||
characters can be supported. If a file name does not fit in the space
|
||||
provided, an extended record is used to store the file name.
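
The joining rule above can be sketched as follows. This is only an
illustration, not code from tarlz: the function name 'ustar_file_name' is
hypothetical, and the field widths (155 bytes for 'prefix', 100 bytes for
'name') are the standard ustar sizes, which give the 256-character limit
once the slash is counted.

```python
def ustar_file_name(prefix: bytes, name: bytes) -> bytes:
    """Build the file name from the raw ustar fields 'prefix' and 'name'.

    Hypothetical helper for illustration; both arguments are the raw,
    null-padded field contents (155 and 100 bytes in a ustar header).
    """
    # Each field ends at its first null character, if any.
    name = name.split(b"\0", 1)[0]
    prefix = prefix.split(b"\0", 1)[0]
    # An empty prefix (first character null) is ignored.
    if prefix:
        return prefix + b"/" + name
    return name
```

With a prefix of 155 bytes and a name of 100 bytes, the result is 256 bytes
long, which matches the limit stated above.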
|
||||
|
||||
The field 'linkname' does not use the prefix to produce a pathname. If
|
||||
the linkname does not fit in the 100 characters provided, an extended record
|
||||
is used to store the linkname.
|
||||
The field 'linkname' does not use the prefix to produce a file name. If
|
||||
the link name does not fit in the 100 characters provided, an extended
|
||||
record is used to store the link name.
|
||||
|
||||
The field 'mode' provides 12 access permission bits. The following table
|
||||
shows the symbolic name of each bit and its octal value:
|
||||
|
@ -667,7 +728,9 @@ S_IRGRP 00040 S_IWGRP 00020 S_IXGRP 00010
|
|||
S_IROTH 00004 S_IWOTH 00002 S_IXOTH 00001
|
||||
|
||||
The fields 'uid' and 'gid' are the user and group IDs of the owner and
|
||||
group of the file, respectively.
|
||||
group of the file, respectively. If the file uid or gid is greater than
|
||||
2_097_151 (octal 7777777), an extended record is used to store the uid or
|
||||
gid.
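
The limits quoted in this chapter follow from the width of the ustar
fields. The sketch below assumes the standard ustar widths (8 bytes for
'uid' and 'gid', 12 bytes for 'size' and 'mtime') and that one byte of each
field is taken by a terminator, which is consistent with the 7-digit and
11-digit octal limits given in the text. The function names are
hypothetical.

```python
UID_GID_WIDTH = 8      # bytes in the ustar fields 'uid' and 'gid'
SIZE_MTIME_WIDTH = 12  # bytes in the ustar fields 'size' and 'mtime'

def max_octal_value(field_width: int) -> int:
    """Largest value that fits as octal digits plus one terminator byte."""
    return 8 ** (field_width - 1) - 1

def needs_extended_record(value: int, field_width: int) -> bool:
    """True if the value cannot be stored in the ustar field itself."""
    return value < 0 or value > max_octal_value(field_width)
```

For example, 'needs_extended_record(2_097_152, UID_GID_WIDTH)' is true,
while the same call with 2_097_151 is false.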
|
||||
|
||||
The field 'size' contains the octal representation of the size of the
|
||||
file in bytes. If the field 'typeflag' specifies a file of type '0'
|
||||
|
@ -680,7 +743,10 @@ header. If the file size is larger than 8_589_934_591 bytes
|
|||
|
||||
The field 'mtime' contains the octal representation of the modification
|
||||
time of the file at the time it was archived, obtained from the function
|
||||
'stat'.
|
||||
'stat'. If the modification time is negative or larger than 8_589_934_591
|
||||
(octal 77777777777) seconds since the epoch, an extended record is used to
|
||||
store the modification time. The ustar range of mtime goes from
|
||||
'1970-01-01 00:00:00 UTC' to '2242-03-16 12:56:31 UTC'.
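
The end points of the ustar mtime range can be checked with a short
calculation. This is a minimal sketch; the names 'EPOCH',
'USTAR_MTIME_MAX', and 'mtime_to_utc' are chosen here for illustration
only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
USTAR_MTIME_MAX = 0o77777777777  # 8_589_934_591 seconds

def mtime_to_utc(seconds: int) -> datetime:
    """Convert seconds since the epoch to a UTC date and time."""
    return EPOCH + timedelta(seconds=seconds)

# Lower and upper ends of the range representable in the field 'mtime'.
print(mtime_to_utc(0))
print(mtime_to_utc(USTAR_MTIME_MAX))
```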
|
||||
|
||||
The field 'chksum' contains the octal representation of the value of the
|
||||
simple sum of all bytes in the header logical record. Each byte in the
|
||||
|
@ -694,7 +760,8 @@ file archived:
|
|||
Regular file.
|
||||
|
||||
''1''
|
||||
Hard link to another file, of any type, previously archived.
|
||||
Hard link to another file, of any type, previously archived. Hard
|
||||
links must not contain file data.
|
||||
|
||||
''2''
|
||||
Symbolic link.
|
||||
|
@ -712,8 +779,8 @@ file archived:
|
|||
|
||||
''7''
|
||||
Reserved to represent a file to which an implementation has associated
|
||||
some high-performance attribute. Tarlz treats this type of file as a
|
||||
regular file (type 0).
|
||||
some high-performance attribute (contiguous file). Tarlz treats this
|
||||
type of file as a regular file (type 0).
|
||||
|
||||
|
||||
The field 'magic' contains the ASCII null-terminated string "ustar". The
|
||||
|
@ -735,9 +802,9 @@ Tarlz creates safe archives that allow the reliable detection of invalid or
|
|||
corrupt metadata during decoding even when the integrity checking of lzip
|
||||
can't be used because the lzip members are only decompressed partially, as
|
||||
it happens in parallel '--diff', '--list', and '--extract'. In order to
|
||||
achieve this goal, tarlz makes some changes to the variant of the pax
|
||||
format that it uses. This chapter describes these changes and the concrete
|
||||
reasons to implement them.
|
||||
achieve this goal and avoid some other flaws in the pax format, tarlz makes
|
||||
some changes to the variant of the pax format that it uses. This chapter
|
||||
describes these changes and the concrete reasons to implement them.
|
||||
|
||||
|
||||
5.1 Add a CRC of the extended records
|
||||
|
@ -775,45 +842,73 @@ In order to allow the extraction of pax archives by a tar utility conforming
|
|||
to the POSIX-2:1993 standard, POSIX.1-2008 recommends selecting extended
|
||||
header field values that allow such tar to create a regular file containing
|
||||
the extended header records as data. This approach is broken because if the
|
||||
extended header is needed because of a long file name, the fields 'prefix'
|
||||
and 'name' will be unable to contain the full pathname of the file.
|
||||
Therefore the files corresponding to both the extended header and the
|
||||
overridden ustar header will be extracted using truncated file names,
|
||||
perhaps overwriting existing files or directories. It may be a security risk
|
||||
to extract a file with a truncated file name.
|
||||
extended header is needed because of a long file name, the fields 'name'
|
||||
and 'prefix' will be unable to contain the full file name. (Some tar
|
||||
implementations store the truncated name in the field 'name' alone,
|
||||
truncating the name to only 100 bytes instead of 256). Therefore the files
|
||||
corresponding to both the extended header and the overridden ustar header
|
||||
will be extracted using truncated file names, perhaps overwriting existing
|
||||
files or directories. It may be a security risk to extract a file with a
|
||||
truncated file name.
|
||||
|
||||
To avoid this problem, tarlz writes extended headers with all fields
|
||||
zeroed except size, chksum, typeflag, magic and version. This prevents old
|
||||
tar programs from extracting the extended records as a file in the wrong
|
||||
place. Tarlz also sets to zero those fields of the ustar header overridden
|
||||
by extended records. Finally, tarlz skips members without name when decoding
|
||||
except when listing. This is needed to detect certain format violations
|
||||
during parallel extraction.
|
||||
zeroed except 'size' (which contains the size of the extended records),
|
||||
'chksum', 'typeflag', 'magic', and 'version'. In particular, tarlz sets the
|
||||
fields 'name' and 'prefix' to zero. This prevents old tar programs from
|
||||
extracting the extended records as a file in the wrong place. Tarlz also
|
||||
sets to zero those fields of the ustar header overridden by extended
|
||||
records. Finally, tarlz skips members with zeroed 'name' and 'prefix' when
|
||||
decoding, except when listing. This is needed to detect certain format
|
||||
violations during parallel extraction.
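
A decoder could recognize such a header with a test like the one sketched
below. This is not the code used by tarlz; it assumes the standard ustar
layout (field 'name' at offsets 0-99, field 'prefix' at offsets 345-499 of
the 512-byte header) and it interprets "zeroed" as every byte of both
fields being null, which is an assumption. The function name is
hypothetical.

```python
NAME = slice(0, 100)      # ustar field 'name'
PREFIX = slice(345, 500)  # ustar field 'prefix'

def has_zeroed_name(header: bytes) -> bool:
    """True if both 'name' and 'prefix' contain only null bytes."""
    if len(header) != 512:
        raise ValueError("a ustar header block is 512 bytes long")
    return not any(header[NAME]) and not any(header[PREFIX])
```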
|
||||
|
||||
If an extended header is required for any reason (for example a file size
|
||||
larger than 8 GiB or a link name longer than 100 bytes), tarlz moves the
|
||||
file name also to the extended header to prevent an ustar tool from trying
|
||||
to extract the file or link. This also makes easier during parallel decoding
|
||||
the detection of a tar member split between two lzip members at the boundary
|
||||
between the extended header and the ustar header.
|
||||
If an extended header is required for any reason (for example a file
|
||||
size of 8 GiB or larger, or a link name longer than 100 bytes), tarlz also
|
||||
moves the file name to the extended records to prevent an ustar tool from
|
||||
trying to extract the file or link. This also makes it easier to detect,
|
||||
during parallel decoding, a tar member split between two lzip members at
|
||||
the boundary between the extended header and the ustar header.
|
||||
|
||||
|
||||
5.3 As simple as possible (but not simpler)
|
||||
===========================================
|
||||
|
||||
The tarlz format is mainly ustar. Extended pax headers are used only when
|
||||
needed because the length of a file name or link name, or the size of a file
|
||||
exceed the limits of the ustar format. Adding 1 KiB of extended headers to
|
||||
each member just to record subsecond timestamps seems wasteful for a backup
|
||||
format. Moreover, minimizing the overhead may help recovering the archive
|
||||
with lziprecover in case of corruption.
|
||||
needed because the length of a file name or link name, or the size or other
|
||||
attribute of a file exceeds the limits of the ustar format. Adding 1 KiB of
|
||||
extended header and records to each member just to save subsecond
|
||||
timestamps seems wasteful for a backup format. Moreover, minimizing the
|
||||
overhead may help recovering the archive with lziprecover in case of
|
||||
corruption.
|
||||
|
||||
Global pax headers are tolerated, but not supported; they are parsed and
|
||||
ignored. Some operations may not behave as expected if the archive contains
|
||||
global headers.
|
||||
|
||||
|
||||
5.4 Avoid misconversions to/from UTF-8
|
||||
5.4 Improve reproducibility
|
||||
===========================
|
||||
|
||||
Pax includes by default the process ID of the pax process in the ustar name
|
||||
of the extended headers, making the archive not reproducible. Tarlz stores
|
||||
the true name of the file just once, either in the ustar header or in the
|
||||
extended records, making it easier to produce reproducible archives.
|
||||
|
||||
Pax allows an extended record to have length x-1 or x if x is a power of
|
||||
ten; '99<97_bytes>' or '100<97_bytes>'. Tarlz minimizes the length of the
|
||||
record and always produces a length of x-1 in these cases.
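
The ambiguity arises because the length field of a pax extended record
counts its own digits. The following sketch builds a record of the form
'LENGTH KEY=VALUE\n' and searches upward for the smallest self-consistent
length, so it picks x-1 in the cases described above. It is an
illustration only; the function name 'pax_record' is hypothetical and this
is not claimed to be how tarlz computes the length.

```python
def pax_record(key: str, value: str) -> bytes:
    """Build 'LENGTH KEY=VALUE\\n' using the smallest valid LENGTH."""
    # Everything after the length digits: space, key, '=', value, newline.
    body = b" " + key.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    # Start from the smallest possible total (one length digit) and grow
    # until the number of digits agrees with the total it describes.
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body
```

For a record with 97 bytes after the length digits, this produces a length
of 99 rather than 100.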
|
||||
|
||||
|
||||
5.5 No data in hard links
|
||||
=========================
|
||||
|
||||
Tarlz does not allow data in hard link members. The data (if any) must be in
|
||||
the member determining the type of the file (which can't be a link). If all
|
||||
the names of a file are stored as hard links, the type of the file is lost.
|
||||
Not allowing data in hard links also prevents invalid actions like
|
||||
extracting file data for a hard link to a symbolic link or to a directory.
|
||||
|
||||
|
||||
5.6 Avoid misconversions to/from UTF-8
|
||||
======================================
|
||||
|
||||
There is no portable way to tell what charset a text string is coded into.
|
||||
|
@ -968,12 +1063,12 @@ headers must provide their own integrity checking.
|
|||
Multi-threaded extraction may produce different output than single-threaded
|
||||
extraction in some cases:
|
||||
|
||||
During multi-threaded extraction, several independent processes are
|
||||
During multi-threaded extraction, several independent threads are
|
||||
simultaneously reading the archive and creating files in the file system.
|
||||
The archive is not read sequentially. As a consequence, any error or
|
||||
weirdness in the archive (like a corrupt member or an EOF block in the
|
||||
middle of the archive) won't be usually detected until part of the archive
|
||||
beyond that point has been processed.
|
||||
weirdness in the archive (like a corrupt member or an end-of-archive block
|
||||
in the middle of the archive) usually won't be detected until part of the
|
||||
archive beyond that point has been processed.
|
||||
|
||||
If the archive contains two or more tar members with the same name,
|
||||
single-threaded extraction extracts the members in the order they appear in
|
||||
|
@ -986,6 +1081,9 @@ unspecified which of the tar members is extracted.
|
|||
names resolve to the same file in the file system), the result is undefined.
|
||||
(Probably the resulting file will be mangled).
|
||||
|
||||
Extraction of a hard link may fail if it is extracted before the file it
|
||||
links to.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Minimum archive sizes, Next: Examples, Prev: Multi-threaded decoding, Up: Top
|
||||
|
||||
|
@ -1054,7 +1152,7 @@ Example 4: Create a compressed appendable archive containing directories
|
|||
'dir1', 'dir2' and 'dir3' with a separate lzip member per directory. Then
|
||||
append files 'a', 'b', 'c', 'd' and 'e' to the archive, all of them
|
||||
contained in a single lzip member. The resulting archive 'archive.tar.lz'
|
||||
contains 5 lzip members (including the EOF member).
|
||||
contains 5 lzip members (including the end-of-archive member).
|
||||
|
||||
tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
|
||||
tarlz --asolid -rf archive.tar.lz a b c d e
|
||||
|
@ -1081,7 +1179,7 @@ Example 7: Extract files 'a' and 'c', and the whole tree under directory
|
|||
Example 8: Copy the contents of directory 'sourcedir' to the directory
|
||||
'destdir'.
|
||||
|
||||
tarlz -C sourcedir -c . | tarlz -C destdir -x
|
||||
tarlz -C sourcedir --uncompressed -cf - . | tarlz -C destdir -xf -
|
||||
|
||||
|
||||
Example 9: Compress the existing POSIX archive 'archive.tar' and write the
|
||||
|
@ -1091,6 +1189,18 @@ other members can still be extracted).
|
|||
|
||||
tarlz -z --no-solid archive.tar
|
||||
|
||||
|
||||
Example 10: Compress the archive 'archive.tar' and write the output to
|
||||
'foo.tar.lz'.
|
||||
|
||||
tarlz -z -o foo.tar.lz archive.tar
|
||||
|
||||
|
||||
Example 11: Concatenate and compress two archives 'archive1.tar' and
|
||||
'archive2.tar', and write the output to 'foo.tar.lz'.
|
||||
|
||||
tarlz -A archive1.tar archive2.tar | tarlz -z -o foo.tar.lz
|
||||
|
||||
|
||||
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
||||
|
||||
|
@ -1133,23 +1243,26 @@ Concept index
|
|||
|
||||
|
||||
Tag Table:
|
||||
Node: Top223
|
||||
Node: Introduction1214
|
||||
Node: Invoking tarlz4022
|
||||
Ref: --data-size6436
|
||||
Ref: --bsolid16388
|
||||
Node: Portable character set21224
|
||||
Node: File format22019
|
||||
Ref: key_crc3226944
|
||||
Node: Amendments to pax format32572
|
||||
Ref: crc3233236
|
||||
Ref: flawed-compat34547
|
||||
Node: Program design37348
|
||||
Node: Multi-threaded decoding41273
|
||||
Node: Minimum archive sizes45764
|
||||
Node: Examples47902
|
||||
Node: Problems49918
|
||||
Node: Concept index50473
|
||||
Node: Top216
|
||||
Node: Introduction1210
|
||||
Node: Invoking tarlz4029
|
||||
Ref: --data-size12880
|
||||
Ref: --bsolid17192
|
||||
Node: Portable character set22788
|
||||
Node: File format23431
|
||||
Ref: key_crc3230188
|
||||
Ref: ustar-uid-gid33452
|
||||
Ref: ustar-mtime34254
|
||||
Node: Amendments to pax format36254
|
||||
Ref: crc3236963
|
||||
Ref: flawed-compat38274
|
||||
Node: Program design42364
|
||||
Node: Multi-threaded decoding46289
|
||||
Ref: mt-extraction49570
|
||||
Node: Minimum archive sizes50876
|
||||
Node: Examples53014
|
||||
Node: Problems55381
|
||||
Node: Concept index55936
|
||||
|
||||
End Tag Table
|
||||
|
||||
|
|