Adding upstream version 0.8.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent 481ef88a11
commit 9bbbd387b8
28 changed files with 2668 additions and 574 deletions
522
doc/tarlz.texi
|
@ -6,8 +6,8 @@
|
|||
@finalout
|
||||
@c %**end of header
|
||||
|
||||
@set UPDATED 23 April 2018
|
||||
@set VERSION 0.4
|
||||
@set UPDATED 16 December 2018
|
||||
@set VERSION 0.8
|
||||
|
||||
@dircategory Data Compression
|
||||
@direntry
|
||||
|
@ -35,11 +35,13 @@
|
|||
This manual is for Tarlz (version @value{VERSION}, @value{UPDATED}).
|
||||
|
||||
@menu
|
||||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command line interface
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command line interface
|
||||
* File format:: Detailed format of the compressed archive
|
||||
* Amendments to pax format:: The reasons for the differences with pax
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
@end menu
|
||||
|
||||
@sp 1
|
||||
|
@ -53,43 +55,19 @@ to copy, distribute and modify it.
|
|||
@chapter Introduction
|
||||
@cindex introduction
|
||||
|
||||
Tarlz is a small and simple implementation of the tar archiver. By
|
||||
default tarlz creates, lists and extracts archives in the 'ustar' format
|
||||
compressed with lzip on a per file basis. Tarlz can append files to the
|
||||
end of such compressed archives.
|
||||
|
||||
Each tar member is compressed in its own lzip member, as well as the
|
||||
end-of-file blocks. This same method works for any tar format (gnu,
|
||||
ustar, posix) and is fully backward compatible with standard tar tools
|
||||
like GNU tar, which treat the resulting multimember tar.lz archive like
|
||||
any other tar.lz archive.
|
||||
@uref{http://www.nongnu.org/lzip/tarlz.html,,Tarlz} is a small and simple
|
||||
implementation of the tar archiver. By default tarlz creates, lists and
|
||||
extracts archives in a simplified posix pax format compressed with
|
||||
@uref{http://www.nongnu.org/lzip/lzip.html,,lzip} on a per file basis. Each
|
||||
tar member is compressed in its own lzip member, as well as the end-of-file
|
||||
blocks. This method is fully backward compatible with standard tar tools
|
||||
like GNU tar, which treat the resulting multimember tar.lz archive like any
|
||||
other tar.lz archive. Tarlz can append files to the end of such compressed
|
||||
archives.
|
||||
|
||||
Tarlz can create tar archives with four levels of compression
|
||||
granularity; per file, per directory, appendable solid, and solid.
|
||||
|
||||
Tarlz is intended as a showcase project for the maintainers of real tar
|
||||
programs to evaluate the format and perhaps implement it in their tools.
|
||||
|
||||
The diagram below shows the correspondence between tar members (formed
|
||||
by a header plus optional data) in the tar archive and
|
||||
@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format,,lzip members}
|
||||
in the resulting multimember tar.lz archive:
|
||||
@ifnothtml
|
||||
@xref{File format,,,lzip}.
|
||||
@end ifnothtml
|
||||
|
||||
@verbatim
|
||||
tar
|
||||
+========+======+========+======+========+======+========+
|
||||
| header | data | header | data | header | data | eof |
|
||||
+========+======+========+======+========+======+========+
|
||||
|
||||
tar.lz
|
||||
+===============+===============+===============+========+
|
||||
| member | member | member | member |
|
||||
+===============+===============+===============+========+
|
||||
@end verbatim
|
||||
|
||||
@noindent
|
||||
Of course, compressing each file (or each directory) individually is
|
||||
less efficient than compressing the whole tar archive, but it has the
|
||||
|
@ -101,15 +79,16 @@ The resulting multimember tar.lz archive can be decompressed in
|
|||
parallel with plzip, multiplying the decompression speed.
|
||||
|
||||
@item
|
||||
New members can be appended to the archive (by removing the eof
|
||||
New members can be appended to the archive (by removing the EOF
|
||||
member) just like to an uncompressed tar archive.
|
||||
|
||||
@item
|
||||
It is a safe posix-style backup format. In case of corruption,
|
||||
tarlz can extract all the undamaged members from the tar.lz
|
||||
archive, skipping over the damaged members, just like the standard
|
||||
(uncompressed) tar. Moreover, lziprecover can be used to recover at
|
||||
least part of the contents of the damaged members.
|
||||
(uncompressed) tar. Moreover, the option @code{--keep-damaged} can be
|
||||
used to recover as much data as possible from each damaged member,
|
||||
and lziprecover can be used to recover some of the damaged members.
|
||||
|
||||
@item
|
||||
A multimember tar.lz archive is usually smaller than the
|
||||
|
@ -117,6 +96,15 @@ corresponding solidly compressed tar.gz archive, except when
|
|||
individually compressing files smaller than about 32 KiB.
|
||||
@end itemize
|
||||
|
||||
Tarlz protects the extended records with a CRC in a way compatible with
|
||||
standard tar tools. @xref{crc32}.
|
||||
|
||||
Tarlz does not understand other tar formats like @samp{gnu}, @samp{oldgnu},
|
||||
@samp{star} or @samp{v7}.
|
||||
|
||||
Tarlz is intended as a showcase project for the maintainers of real tar
|
||||
programs to evaluate the format and perhaps implement it in their tools.
|
||||
|
||||
|
||||
@node Invoking tarlz
|
||||
@chapter Invoking tarlz
|
||||
|
@ -133,9 +121,16 @@ tarlz [@var{options}] [@var{files}]
|
|||
|
||||
@noindent
|
||||
On archive creation or appending, tarlz removes leading and trailing
|
||||
slashes from file names, as well as file name prefixes containing a
|
||||
slashes from filenames, as well as filename prefixes containing a
|
||||
@samp{..} component. On extraction, archive members containing a
|
||||
@samp{..} component are skipped.
|
||||
@samp{..} component are skipped. Tarlz detects when the archive being
|
||||
created or enlarged is among the files to be dumped, appended or
|
||||
concatenated, and skips it.
|
||||
|
||||
On extraction and listing, tarlz removes leading @samp{./} strings from
|
||||
member names in the archive or given in the command line, so that
|
||||
@w{@code{tarlz -xf foo ./bar baz}} extracts members @samp{bar} and
|
||||
@samp{./baz} from archive @samp{foo}.
|
||||
|
||||
tarlz supports the following options:
|
||||
|
||||
|
@ -147,10 +142,21 @@ Print an informative help message describing the options and exit.
|
|||
@item -V
|
||||
@itemx --version
|
||||
Print the version number of tarlz on the standard output and exit.
|
||||
This version number should be included in all bug reports.
|
||||
|
||||
@item -A
|
||||
@itemx --concatenate
|
||||
Append tar.lz archives to the end of a tar.lz archive. All the archives
|
||||
involved must be regular (seekable) files compressed as multimember lzip
|
||||
files, and the two end-of-file blocks plus any zero padding must be
|
||||
contained in the last lzip member of each archive. The intermediate
|
||||
end-of-file blocks are removed as each new archive is concatenated. Exit
|
||||
with status 0 without modifying the archive if no @var{files} have been
|
||||
specified. Tarlz can't concatenate uncompressed tar archives.
|
||||
|
||||
@item -c
|
||||
@itemx --create
|
||||
Create a new archive.
|
||||
Create a new archive from @var{files}.
|
||||
|
||||
@item -C @var{dir}
|
||||
@itemx --directory=@var{dir}
|
||||
|
@ -174,18 +180,19 @@ Quiet operation. Suppress all messages.
|
|||
|
||||
@item -r
|
||||
@itemx --append
|
||||
Append files to the end of an archive. The archive must be a regular
|
||||
(seekable) file compressed as a multimember lzip file, and the two
|
||||
end-of-file blocks plus any zero padding must be contained in the last
|
||||
lzip member of the archive. First this last member is removed, then the
|
||||
new members are appended, and then a new end-of-file member is appended
|
||||
to the archive. Exit with status 0 without modifying the archive if no
|
||||
@var{files} have been specified. tarlz can't append files to an
|
||||
uncompressed tar archive.
|
||||
Append files to the end of a tar.lz archive. The archive must be a
|
||||
regular (seekable) file compressed as a multimember lzip file, and the
|
||||
two end-of-file blocks plus any zero padding must be contained in the
|
||||
last lzip member of the archive. First this last member is removed, then
|
||||
the new members are appended, and then a new end-of-file member is
|
||||
appended to the archive. Exit with status 0 without modifying the
|
||||
archive if no @var{files} have been specified. Tarlz can't append files
|
||||
to an uncompressed tar archive.
|
||||
|
||||
@item -t
|
||||
@itemx --list
|
||||
List the contents of an archive.
|
||||
List the contents of an archive. If @var{files} are given, list only the
|
||||
given @var{files}.
|
||||
|
||||
@item -v
|
||||
@itemx --verbose
|
||||
|
@ -193,10 +200,13 @@ Verbosely list files processed.
|
|||
|
||||
@item -x
|
||||
@itemx --extract
|
||||
Extract files from an archive.
|
||||
Extract files from an archive. If @var{files} are given, extract only
|
||||
the given @var{files}. Else extract all the files in the archive.
|
||||
|
||||
@item -0 .. -9
|
||||
Set the compression level. The default compression level is @samp{-6}.
|
||||
Like lzip, tarlz also minimizes the dictionary size of the lzip members
|
||||
it creates, reducing the amount of memory required for decompression.
|
||||
|
||||
@item --asolid
|
||||
When creating or appending to a compressed archive, use appendable solid
|
||||
|
@ -212,22 +222,55 @@ end-of-file blocks are compressed into a separate lzip member. This
|
|||
creates a compressed appendable archive with a separate lzip member for
|
||||
each top-level directory.
|
||||
|
||||
@item --no-solid
|
||||
When creating or appending to a compressed archive, compress each file
|
||||
separately. The end-of-file blocks are compressed into a separate lzip
|
||||
member. This creates a compressed appendable archive with a separate
|
||||
lzip member for each file. This option allows tarlz to revert to default
|
||||
behavior if, for example, tarlz is invoked through an alias like
|
||||
@code{tar='tarlz --solid'}.
|
||||
|
||||
@item --solid
|
||||
When creating or appending to a compressed archive, use solid
|
||||
compression. The files being added to the archive, along with the
|
||||
end-of-file blocks, are compressed into a single lzip member. The
|
||||
resulting archive is not appendable. No more files can be later appended
|
||||
to the archive without decompressing it first.
|
||||
to the archive.
|
||||
|
||||
@item --anonymous
|
||||
Equivalent to @code{--owner=root --group=root}.
|
||||
|
||||
@item --owner=@var{owner}
|
||||
When creating or appending, use @var{owner} for files added to the
|
||||
archive. If @var{owner} is not a valid user name, it is decoded as a
|
||||
decimal numeric user ID.
|
||||
|
||||
@item --group=@var{group}
|
||||
When creating or appending, use @var{group} for files added to the
|
||||
archive. If @var{group} is not a valid group name, it is decoded as a
|
||||
decimal numeric group ID.
|
||||
|
||||
@item --owner=@var{owner}
|
||||
When creating or appending, use @var{owner} for files added to the
|
||||
archive. If @var{owner} is not a valid user name, it is decoded as a
|
||||
decimal numeric user ID.
|
||||
@item --keep-damaged
|
||||
Don't delete partially extracted files. If a decompression error happens
|
||||
while extracting a file, keep the partial data extracted. Use this
|
||||
option to recover as much data as possible from each damaged member.
|
||||
|
||||
@item --missing-crc
|
||||
Exit with error status 2 if the CRC of the extended records is missing.
|
||||
When this option is used, tarlz detects any corruption in the extended
|
||||
records (only limited by CRC collisions). But note that a corrupt
|
||||
@samp{GNU.crc32} keyword, for example @samp{GNU.crc33}, is reported as a
|
||||
missing CRC instead of as a corrupt record. This misleading
|
||||
@samp{Missing CRC} message is the consequence of a flaw in the posix pax
|
||||
format; i.e., the lack of a mandatory check sequence in the extended
|
||||
records. @xref{crc32}.
|
||||
|
||||
@ignore
|
||||
@item --permissive
|
||||
Allow some violations of the archive format, like consecutive extended
|
||||
headers preceding a ustar header, or several records with the same
|
||||
keyword appearing in the same block of extended records.
|
||||
@end ignore
|
||||
|
||||
@item --uncompressed
|
||||
With @code{--create}, don't compress the created tar archive. Create an
|
||||
|
@ -241,6 +284,358 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
|
|||
caused tarlz to panic.
|
||||
|
||||
|
||||
@node File format
|
||||
@chapter File format
|
||||
@cindex file format
|
||||
|
||||
In the diagram below, a box like this:
|
||||
@verbatim
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
@end verbatim
|
||||
|
||||
represents one byte; a box like this:
|
||||
@verbatim
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
@end verbatim
|
||||
|
||||
represents a variable number of bytes or a fixed but large number of
|
||||
bytes (for example 512).
|
||||
|
||||
@sp 1
|
||||
A tar.lz file consists of a series of lzip members (compressed data sets).
|
||||
The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them.
|
||||
|
||||
Each lzip member contains one or more tar members in a simplified posix
|
||||
pax interchange format; the only pax typeflag value supported by tarlz
|
||||
(in addition to the typeflag values defined by the ustar format) is
|
||||
@samp{x}. The pax format is an extension on top of the ustar format that
|
||||
removes the size limitations of the ustar format.
|
||||
|
||||
Each tar member contains one file archived, and is represented by the
|
||||
following sequence:
|
||||
|
||||
@itemize @bullet
|
||||
@item
|
||||
An optional extended header block with extended header records. This
|
||||
header block is of the form described in pax header block, with a
|
||||
typeflag value of @samp{x}. The extended header records are included as
|
||||
the data for this header block.
|
||||
|
||||
@item
|
||||
A header block in ustar format that describes the file. Any fields
|
||||
defined in the preceding optional extended header records override the
|
||||
associated fields in this header block for this file.
|
||||
|
||||
@item
|
||||
Zero or more blocks that contain the contents of the file.
|
||||
@end itemize
|
||||
|
||||
At the end of the archive file there are two 512-byte blocks filled with
|
||||
binary zeros, interpreted as an end-of-archive indicator. These EOF
|
||||
blocks are either compressed in a separate lzip member or compressed
|
||||
along with the tar members contained in the last lzip member.
|
||||
|
||||
The diagram below shows the correspondence between each tar member
|
||||
(formed by one or two headers plus optional data) in the tar archive and
|
||||
each
|
||||
@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#File-format,,lzip member}
|
||||
in the resulting multimember tar.lz archive:
|
||||
@ifnothtml
|
||||
@xref{File format,,,lzip}.
|
||||
@end ifnothtml
|
||||
|
||||
@verbatim
|
||||
tar
|
||||
+========+======+=================+===============+========+======+========+
|
||||
| header | data | extended header | extended data | header | data | EOF |
|
||||
+========+======+=================+===============+========+======+========+
|
||||
|
||||
tar.lz
|
||||
+===============+=================================================+========+
|
||||
| member | member | member |
|
||||
+===============+=================================================+========+
|
||||
@end verbatim
|
||||
|
||||
@ignore
|
||||
When @code{--permissive} is used, the following violations of the
|
||||
archive format are allowed:@*
|
||||
If several extended headers precede a ustar header, only the last
|
||||
extended header takes effect. The other extended headers are ignored.
|
||||
Similarly, if several records with the same keyword appear in the same
|
||||
block of extended records, only the last record for the repeated keyword
|
||||
takes effect. The other records for the repeated keyword are ignored.
|
||||
@end ignore
|
||||
|
||||
@sp 1
|
||||
@section Pax header block
|
||||
|
||||
The pax header block is identical to the ustar header block described below
|
||||
except that the typeflag has the value @samp{x} (extended). The size field
|
||||
is the size of the extended header data in bytes. Most other fields in the
|
||||
pax header block are zeroed on archive creation to prevent trouble if the
|
||||
archive is read by a ustar tool, and are ignored by tarlz on archive
|
||||
extraction. @xref{flawed-compat}.
|
||||
|
||||
The pax extended header data consists of one or more records, each of
|
||||
them constructed as follows:@*
|
||||
@code{"%d %s=%s\n", <length>, <keyword>, <value>}
|
||||
|
||||
The <length>, <blank>, <keyword>, <equals-sign>, and <newline> in the
|
||||
record must be limited to the portable character set. The <length> field
|
||||
contains the decimal length of the record in bytes, including the
|
||||
trailing <newline>. The <value> field is stored as-is, without
|
||||
conversion to UTF-8 nor any other transformation.
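
The record layout above can be sketched in Python as follows. This is only an illustration, not tarlz code: the helper name `pax_record` is hypothetical. The one subtlety it shows is that the <length> field counts its own digits, so the length has to be adjusted until it is self-consistent.

```python
def pax_record(keyword: str, value: bytes) -> bytes:
    """Build one pax extended record: b"<length> <keyword>=<value>\\n".

    The length is the decimal size of the whole record in bytes, including
    the length digits themselves, the blank, and the trailing newline.
    The value is stored as-is, with no charset conversion.
    """
    body = b" " + keyword.encode("ascii") + b"=" + value + b"\n"
    # The length field counts its own digits, so iterate until it is stable.
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


record = pax_record("path", b"some/long/name")
print(record)
```

Applied to the keyword `GNU.crc32` with an 8-character value, the helper yields a 22-byte record, matching the length shown in the manual's own example.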
|
||||
|
||||
These are the <keyword> fields currently supported by tarlz:
|
||||
|
||||
@table @code
|
||||
@item linkpath
|
||||
The pathname of a link being created to another file, of any type,
|
||||
previously archived. This record overrides the linkname field in the
|
||||
following ustar header block. The following ustar header block
|
||||
determines the type of link created. If typeflag of the following header
|
||||
block is 1, it will be a hard link. If typeflag is 2, it will be a
|
||||
symbolic link and the linkpath value will be used as the contents of the
|
||||
symbolic link.
|
||||
|
||||
@item path
|
||||
The pathname of the following file. This record overrides the name and
|
||||
prefix fields in the following ustar header block.
|
||||
|
||||
@item size
|
||||
The size of the file in bytes, expressed as a decimal number using
|
||||
digits from the ISO/IEC 646:1991 (ASCII) standard. This record overrides
|
||||
the size field in the following ustar header block. The size record is
|
||||
used only for files with a size value greater than 8_589_934_591
|
||||
@w{(octal 77777777777)}. This is 2^33 bytes or larger.
|
||||
|
||||
@anchor{key_crc32}
|
||||
@item GNU.crc32
|
||||
CRC32-C (Castagnoli) of the extended header data excluding the 8 bytes
|
||||
representing the CRC <value> itself. The <value> is represented as 8
|
||||
hexadecimal digits in big endian order,
|
||||
@w{@samp{22 GNU.crc32=00000000\n}}. The keyword of the CRC record is
|
||||
protected by the CRC to guarantee that corruption is always detected
|
||||
(except in case of CRC collision). A CRC was chosen because a checksum
|
||||
is too weak for a potentially large list of variable sized records. A
|
||||
checksum can't detect simple errors like the swapping of two bytes.
|
||||
@end table
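
The claim that a checksum cannot detect swapped bytes while a CRC can may be illustrated with a small Python sketch. The bitwise `crc32c` function below is an assumption made for illustration (a real implementation would more likely be table-driven), and `simple_sum` is a hypothetical stand-in for a plain byte sum. The sketch does not reproduce how tarlz excludes the 8 value bytes from the computation.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def simple_sum(data: bytes) -> int:
    """Plain sum of byte values; the order of the bytes does not matter."""
    return sum(data)


original = b"11 path=ab\n"
swapped = b"11 path=ba\n"   # the two bytes of the value exchanged

# The sum cannot tell the two records apart, the CRC can.
print(simple_sum(original) == simple_sum(swapped))
print(crc32c(original) == crc32c(swapped))
```

Swapping two adjacent bytes is a short burst error, which a 32-bit CRC always detects, whereas the byte sum is unchanged by any reordering.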
|
||||
|
||||
@sp 1
|
||||
@section Ustar header block
|
||||
|
||||
The ustar header block has a length of 512 bytes and is structured as
|
||||
shown in the following table. All lengths and offsets are in decimal.
|
||||
|
||||
@multitable {Field Name} {Offset} {Length (in bytes)}
|
||||
@item Field Name @tab Offset @tab Length (in bytes)
|
||||
@item name @tab 0 @tab 100
|
||||
@item mode @tab 100 @tab 8
|
||||
@item uid @tab 108 @tab 8
|
||||
@item gid @tab 116 @tab 8
|
||||
@item size @tab 124 @tab 12
|
||||
@item mtime @tab 136 @tab 12
|
||||
@item chksum @tab 148 @tab 8
|
||||
@item typeflag @tab 156 @tab 1
|
||||
@item linkname @tab 157 @tab 100
|
||||
@item magic @tab 257 @tab 6
|
||||
@item version @tab 263 @tab 2
|
||||
@item uname @tab 265 @tab 32
|
||||
@item gname @tab 297 @tab 32
|
||||
@item devmajor @tab 329 @tab 8
|
||||
@item devminor @tab 337 @tab 8
|
||||
@item prefix @tab 345 @tab 155
|
||||
@end multitable
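
As a rough sketch, the table above can be transcribed into Python and used to slice a header block into its raw fields. The names `USTAR_FIELDS` and `split_header` are hypothetical; only the offsets and lengths come from the table.

```python
# (field name, offset, length) copied from the table above.
USTAR_FIELDS = [
    ("name", 0, 100), ("mode", 100, 8), ("uid", 108, 8), ("gid", 116, 8),
    ("size", 124, 12), ("mtime", 136, 12), ("chksum", 148, 8),
    ("typeflag", 156, 1), ("linkname", 157, 100), ("magic", 257, 6),
    ("version", 263, 2), ("uname", 265, 32), ("gname", 297, 32),
    ("devmajor", 329, 8), ("devminor", 337, 8), ("prefix", 345, 155),
]
BLOCK_SIZE = 512


def split_header(block: bytes) -> dict:
    """Slice a 512-byte header block into raw (undecoded) fields."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("a ustar header block is exactly 512 bytes")
    return {name: block[off:off + length] for name, off, length in USTAR_FIELDS}


# Each field starts where the previous one ends.
end = 0
for _name, off, length in USTAR_FIELDS:
    assert off == end
    end = off + length
print(end, BLOCK_SIZE - end)   # bytes used by fields, bytes left unused
```

The fields are contiguous and cover the first 500 bytes; the remaining 12 bytes of the block are not assigned to any field in the table.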
|
||||
|
||||
All characters in the header block are coded using the ISO/IEC 646:1991
|
||||
(ASCII) standard, except in fields storing names for files, users, and
|
||||
groups. For maximum portability between implementations, names should
|
||||
only contain characters from the portable filename character set. But if
|
||||
an implementation supports the use of characters outside of @samp{/} and
|
||||
the portable filename character set in names for files, users, and
|
||||
groups, tarlz will use the byte values in these names unmodified.
|
||||
|
||||
The fields name, linkname, and prefix are null-terminated character
|
||||
strings except when all characters in the array contain non-null
|
||||
characters including the last character.
|
||||
|
||||
The name and the prefix fields produce the pathname of the file. A new
|
||||
pathname is formed, if prefix is not an empty string (its first
|
||||
character is not null), by concatenating prefix (up to the first null
|
||||
character), a <slash> character, and name; otherwise, name is used
|
||||
alone. In either case, name is terminated at the first null character.
|
||||
If prefix begins with a null character, it is ignored. In this manner,
|
||||
pathnames of at most 256 characters can be supported. If a pathname does
|
||||
not fit in the space provided, an extended record is used to store the
|
||||
pathname.
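
The rule for combining prefix and name can be written as a short Python sketch. The function name `ustar_pathname` is hypothetical, and the sketch only covers the ustar fields, not the case where an extended `path` record overrides them.

```python
def ustar_pathname(name_field: bytes, prefix_field: bytes) -> bytes:
    """Combine the raw 100-byte name and 155-byte prefix fields."""
    # A field that is completely full has no terminating null byte.
    name = name_field.split(b"\0", 1)[0]
    prefix = prefix_field.split(b"\0", 1)[0]
    if prefix:                       # empty when the first byte is null
        return prefix + b"/" + name
    return name


print(ustar_pathname(b"file.txt" + bytes(92), b"dir/sub" + bytes(148)))
print(len(ustar_pathname(b"n" * 100, b"p" * 155)))
```

With both fields full, the result is 155 + 1 + 100 = 256 bytes, which is where the 256-character limit comes from.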
|
||||
|
||||
The linkname field does not use the prefix to produce a pathname. If the
|
||||
linkname does not fit in the 100 characters provided, an extended record
|
||||
is used to store the linkname.
|
||||
|
||||
The mode field provides 12 access permission bits. The following table
|
||||
shows the symbolic name of each bit and its octal value:
|
||||
|
||||
@multitable {Bit Name} {Bit value}
|
||||
@item Bit Name @tab Bit value
|
||||
@item S_ISUID @tab 04000
|
||||
@item S_ISGID @tab 02000
|
||||
@item S_ISVTX @tab 01000
|
||||
@item S_IRUSR @tab 00400
|
||||
@item S_IWUSR @tab 00200
|
||||
@item S_IXUSR @tab 00100
|
||||
@item S_IRGRP @tab 00040
|
||||
@item S_IWGRP @tab 00020
|
||||
@item S_IXGRP @tab 00010
|
||||
@item S_IROTH @tab 00004
|
||||
@item S_IWOTH @tab 00002
|
||||
@item S_IXOTH @tab 00001
|
||||
@end multitable
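
For illustration, the bit values in the table can be used to decode a mode field in Python. The names `MODE_BITS` and `mode_names` are hypothetical, and the sample field value is an assumption about how a typical mode is stored (zero-filled octal followed by a null).

```python
# (symbolic name, octal value) copied from the table above.
MODE_BITS = [
    ("S_ISUID", 0o4000), ("S_ISGID", 0o2000), ("S_ISVTX", 0o1000),
    ("S_IRUSR", 0o0400), ("S_IWUSR", 0o0200), ("S_IXUSR", 0o0100),
    ("S_IRGRP", 0o0040), ("S_IWGRP", 0o0020), ("S_IXGRP", 0o0010),
    ("S_IROTH", 0o0004), ("S_IWOTH", 0o0002), ("S_IXOTH", 0o0001),
]


def mode_names(mode_field: bytes) -> list:
    """Decode an octal mode field such as b"0000644\\0" into bit names."""
    mode = int(mode_field.rstrip(b"\0 ").decode("ascii"), 8)
    return [name for name, bit in MODE_BITS if mode & bit]


print(mode_names(b"0000644\0"))
```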
|
||||
|
||||
The uid and gid fields are the user and group ID of the owner and group
|
||||
of the file, respectively.
|
||||
|
||||
The size field contains the octal representation of the size of the file
|
||||
in bytes. If the typeflag field specifies a file of type '0' (regular
|
||||
file) or '7' (high performance regular file), the number of logical
|
||||
records following the header is @w{(size / 512)} rounded up to the next
|
||||
integer. For all other values of typeflag, tarlz either sets the size
|
||||
field to 0 or ignores it, and does not store or expect any logical
|
||||
records following the header. If the file size is larger than
|
||||
8_589_934_591 bytes @w{(octal 77777777777)}, an extended record is used
|
||||
to store the file size.
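
The arithmetic in this paragraph can be checked with a few lines of Python. The function names are hypothetical; the constants come from the text above.

```python
BLOCK_SIZE = 512
USTAR_MAX_SIZE = 0o77777777777   # largest value that fits in 11 octal digits


def data_records(size: int) -> int:
    """Number of 512-byte records following the header of a regular file."""
    return (size + BLOCK_SIZE - 1) // BLOCK_SIZE   # size / 512, rounded up


def needs_size_record(size: int) -> bool:
    """True if the size does not fit in the ustar size field."""
    return size > USTAR_MAX_SIZE


print(USTAR_MAX_SIZE == 2**33 - 1)
print(data_records(513))
```

A file of 513 bytes therefore occupies two records, and the first size that needs an extended record is exactly 2^33 bytes (8 GiB).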
|
||||
|
||||
The mtime field contains the octal representation of the modification
|
||||
time of the file at the time it was archived, obtained from the stat()
|
||||
function.
|
||||
|
||||
The chksum field contains the octal representation of the value of the
|
||||
simple sum of all bytes in the header logical record. Each byte in the
|
||||
header is treated as an unsigned value. When calculating the checksum,
|
||||
the chksum field is treated as if it were all <space> characters.
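
A minimal Python sketch of this computation follows. The function name `ustar_chksum` is hypothetical; the offset and length of the chksum field are taken from the header table.

```python
def ustar_chksum(block: bytes) -> int:
    """Sum of all 512 header bytes, with the chksum field counted as spaces."""
    if len(block) != 512:
        raise ValueError("a ustar header block is exactly 512 bytes")
    # chksum lives at offset 148 and is 8 bytes long.
    patched = block[:148] + b" " * 8 + block[156:]
    return sum(patched)   # iterating over bytes yields unsigned values 0..255


print(ustar_chksum(bytes(512)))
```

Because the field is replaced by spaces before summing, the result does not depend on whatever value is currently stored in it, which is what allows a reader to verify the header after the checksum has been written.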
|
||||
|
||||
The typeflag field contains a single character specifying the type of
|
||||
file archived:
|
||||
|
||||
@table @code
|
||||
@item '0'
|
||||
Regular file.
|
||||
|
||||
@item '1'
|
||||
Hard link to another file, of any type, previously archived.
|
||||
|
||||
@item '2'
|
||||
Symbolic link.
|
||||
|
||||
@item '3', '4'
|
||||
Character special file and block special file respectively. In this case
|
||||
the devmajor and devminor fields contain information defining the
|
||||
device in unspecified format.
|
||||
|
||||
@item '5'
|
||||
Directory.
|
||||
|
||||
@item '6'
|
||||
FIFO special file.
|
||||
|
||||
@item '7'
|
||||
Reserved to represent a file to which an implementation has associated
|
||||
some high-performance attribute. Tarlz treats this type of file as a
|
||||
regular file (type 0).
|
||||
|
||||
@end table
|
||||
|
||||
The magic field contains the ASCII null-terminated string "ustar". The
|
||||
version field contains the characters "00" (0x30,0x30). The fields
|
||||
uname and gname are null-terminated character strings. Each numeric
|
||||
field contains a leading zero-filled, null-terminated octal number using
|
||||
digits from the ISO/IEC 646:1991 (ASCII) standard.
|
||||
|
||||
|
||||
@node Amendments to pax format
|
||||
@chapter The reasons for the differences with pax
|
||||
@cindex Amendments to pax format
|
||||
|
||||
Tarlz is meant to reliably detect invalid or corrupt metadata during
|
||||
extraction and to not create safety risks in the archives it creates. In
|
||||
order to achieve these goals, tarlz makes some changes to the variant of the
|
||||
pax format that it uses. This chapter describes these changes and the
|
||||
concrete reasons to implement them.
|
||||
|
||||
@sp 1
|
||||
@anchor{crc32}
|
||||
@section Add a CRC of the extended records
|
||||
|
||||
The posix pax format has a serious flaw. The metadata stored in pax extended
|
||||
records are not protected by any kind of check sequence. Corruption in a
|
||||
long filename may cause the extraction of the file in the wrong place
|
||||
without warning. Corruption in a long file size may cause the truncation of
|
||||
the file or the appending of garbage to the file, both followed by a
|
||||
spurious warning about a corrupt header far from the place of the undetected
|
||||
corruption.
|
||||
|
||||
Metadata like filename and file size must always be protected in an archive
|
||||
format because of the adverse effects of undetected corruption in them,
|
||||
potentially much worse than undetected corruption in the data. Even more so
|
||||
in the case of pax because the amount of metadata it stores is potentially
|
||||
large, making undetected corruption more probable.
|
||||
|
||||
Because of the above, tarlz protects the extended records with a CRC in
|
||||
a way compatible with standard tar tools. @xref{key_crc32}.
|
||||
|
||||
@sp 1
|
||||
@anchor{flawed-compat}
|
||||
@section Remove flawed backward compatibility
|
||||
|
||||
In order to allow the extraction of pax archives by a tar utility conforming
|
||||
to the POSIX-2:1993 standard, POSIX.1-2008 recommends selecting extended
|
||||
header field values that allow such tar to create a regular file containing
|
||||
the extended header records as data. This approach is broken because if the
|
||||
extended header is needed because of a long filename, the name and prefix
|
||||
fields will be unable to contain the full pathname of the file. Therefore
|
||||
the files corresponding to both the extended header and the overridden ustar
|
||||
header will be extracted using truncated filenames, perhaps overwriting
|
||||
existing files or directories. It may be a security risk to extract a file
|
||||
with a truncated filename.
|
||||
|
||||
To avoid this problem, tarlz writes extended headers with all fields zeroed
|
||||
except size, chksum, typeflag, magic and version. This prevents old tar
|
||||
programs from extracting the extended records as a file in the wrong place.
|
||||
Tarlz also sets to zero those fields of the ustar header overridden by
|
||||
extended records.
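
The following Python sketch shows what such a mostly zeroed extended header block could look like. It is an illustration under stated assumptions, not the tarlz implementation: the function name `extended_header_block` is hypothetical, and the exact digit widths of the size and chksum fields (11 and 7 octal digits plus a terminating null) are assumed from the description of numeric fields in the ustar header section.

```python
def extended_header_block(data_size: int) -> bytes:
    """Build a 512-byte pax header with only five fields filled in."""
    if not 0 <= data_size <= 0o77777777777:
        raise ValueError("size does not fit in 11 octal digits")
    block = bytearray(512)                     # every field starts zeroed
    block[124:136] = b"%011o\0" % data_size    # size of the extended data
    block[156:157] = b"x"                      # typeflag: extended header
    block[257:263] = b"ustar\0"                # magic
    block[263:265] = b"00"                     # version
    # chksum is computed with its own 8 bytes counted as spaces.
    total = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    block[148:156] = b"%07o\0" % total         # assumed width: 7 digits + null
    return bytes(block)


header = extended_header_block(22)
print(header[:100] == bytes(100))   # the name field is empty
```

Since the name and prefix fields are empty, a tool that does not understand typeflag `x` has no pathname under which to extract the extended records.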
|
||||
|
||||
If the extended header is needed because of a file size larger than
|
||||
@w{8 GiB}, the size field will be unable to contain the full size of the
|
||||
file. Therefore the file may be partially extracted, and the tool will issue
|
||||
a spurious warning about a corrupt header at the point where it thinks the
|
||||
file ends. Setting to zero the overridden size in the ustar header at least
|
||||
prevents the partial extraction and makes it obvious that the file has been
|
||||
truncated.
|
||||
|
||||
@sp 1
|
||||
@section As simple as possible (but not simpler)
|
||||
|
||||
The tarlz format is mainly ustar. Extended pax headers are used only when
|
||||
needed because the length of a filename or link name, or the size of a file
|
||||
exceeds the limits of the ustar format. Adding extended headers to each
|
||||
member just to record subsecond timestamps seems wasteful for a backup
|
||||
format.
|
||||
|
||||
@sp 1
|
||||
@section Avoid misconversions to/from UTF-8
|
||||
|
||||
There is no portable way to tell what charset a text string is coded into.
|
||||
Therefore, tarlz stores all fields representing text strings as-is, without
|
||||
conversion to UTF-8 nor any other transformation. This prevents accidental
|
||||
double UTF-8 conversions. If the need arises this behavior will be adjusted
|
||||
with a command line option in the future.
|
||||
|
||||
|
||||
@node Examples
|
||||
@chapter A small tutorial with examples
|
||||
@cindex examples
|
||||
|
@ -280,7 +675,7 @@ Example 4: Create a compressed appendable archive containing directories
|
|||
directory. Then append files @samp{a}, @samp{b}, @samp{c}, @samp{d} and
|
||||
@samp{e} to the archive, all of them contained in a single lzip member.
|
||||
The resulting archive @samp{archive.tar.lz} contains 5 lzip members
|
||||
(including the eof member).
|
||||
(including the EOF member).
|
||||
|
||||
@example
|
||||
tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
|
||||
|
@ -291,8 +686,7 @@ tarlz --asolid -rf archive.tar.lz a b c d e
|
|||
@noindent
|
||||
Example 5: Create a solidly compressed archive @samp{archive.tar.lz}
|
||||
containing files @samp{a}, @samp{b} and @samp{c}. Note that no more
|
||||
files can be later appended to the archive without decompressing it
|
||||
first.
|
||||
files can be later appended to the archive.
|
||||
|
||||
@example
|
||||
tarlz --solid -cf archive.tar.lz a b c
|
||||
|
|