Adding upstream version 0.8.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent
481ef88a11
commit
9bbbd387b8
28 changed files with 2668 additions and 574 deletions
41
doc/tarlz.1
|
@ -1,12 +1,25 @@
|
|||
.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.46.1.
|
||||
.TH TARLZ "1" "April 2018" "tarlz 0.4" "User Commands"
|
||||
.TH TARLZ "1" "December 2018" "tarlz 0.8" "User Commands"
|
||||
.SH NAME
|
||||
tarlz \- creates tar archives with multimember lzip compression
|
||||
.SH SYNOPSIS
|
||||
.B tarlz
|
||||
[\fI\,options\/\fR] [\fI\,files\/\fR]
|
||||
.SH DESCRIPTION
|
||||
Tarlz \- Archiver with multimember lzip compression.
|
||||
Tarlz is a small and simple implementation of the tar archiver. By default
|
||||
tarlz creates, lists and extracts archives in a simplified posix pax format
|
||||
compressed with lzip on a per file basis. Each tar member is compressed in
|
||||
its own lzip member, as well as the end\-of\-file blocks. This method is fully
|
||||
backward compatible with standard tar tools like GNU tar, which treat the
|
||||
resulting multimember tar.lz archive like any other tar.lz archive. Tarlz
|
||||
can append files to the end of such compressed archives.
|
||||
.PP
|
||||
The tarlz file format is a safe posix\-style backup format. In case of
|
||||
corruption, tarlz can extract all the undamaged members from the tar.lz
|
||||
archive, skipping over the damaged members, just like the standard
|
||||
(uncompressed) tar. Moreover, the option '\-\-keep\-damaged' can be used to
|
||||
recover as much data as possible from each damaged member, and lziprecover
|
||||
can be used to recover some of the damaged members.
|
||||
.SH OPTIONS
|
||||
.TP
|
||||
\fB\-h\fR, \fB\-\-help\fR
|
||||
|
@ -15,6 +28,9 @@ display this help and exit
|
|||
\fB\-V\fR, \fB\-\-version\fR
|
||||
output version information and exit
|
||||
.TP
|
||||
\fB\-A\fR, \fB\-\-concatenate\fR
|
||||
append tar.lz archives to the end of an archive
|
||||
.TP
|
||||
\fB\-c\fR, \fB\-\-create\fR
|
||||
create a new archive
|
||||
.TP
|
||||
|
@ -48,17 +64,29 @@ create solidly compressed appendable archive
|
|||
\fB\-\-dsolid\fR
|
||||
create per\-directory compressed archive
|
||||
.TP
|
||||
\fB\-\-no\-solid\fR
|
||||
create per\-file compressed archive (default)
|
||||
.TP
|
||||
\fB\-\-solid\fR
|
||||
create solidly compressed archive
|
||||
.TP
|
||||
\fB\-\-group=\fR<group>
|
||||
use <group> name/id for added files
|
||||
\fB\-\-anonymous\fR
|
||||
equivalent to '\-\-owner=root \fB\-\-group\fR=\fI\,root\/\fR'
|
||||
.TP
|
||||
\fB\-\-owner=\fR<owner>
|
||||
use <owner> name/id for added files
|
||||
use <owner> name/ID for files added
|
||||
.TP
|
||||
\fB\-\-group=\fR<group>
|
||||
use <group> name/ID for files added
|
||||
.TP
|
||||
\fB\-\-keep\-damaged\fR
|
||||
don't delete partially extracted files
|
||||
.TP
|
||||
\fB\-\-missing\-crc\fR
|
||||
exit with error status if missing extended CRC
|
||||
.TP
|
||||
\fB\-\-uncompressed\fR
|
||||
don't compress the created archive
|
||||
don't compress the archive created
|
||||
.PP
|
||||
Exit status: 0 for a normal exit, 1 for environmental problems (file
|
||||
not found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
|
||||
|
@ -70,6 +98,7 @@ Report bugs to lzip\-bug@nongnu.org
|
|||
Tarlz home page: http://www.nongnu.org/lzip/tarlz.html
|
||||
.SH COPYRIGHT
|
||||
Copyright \(co 2018 Antonio Diaz Diaz.
|
||||
Using lzlib 1.11\-rc2
|
||||
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>
|
||||
.br
|
||||
This is free software: you are free to change and redistribute it.
|
||||
|
|
498
doc/tarlz.info
|
@ -11,15 +11,17 @@ File: tarlz.info, Node: Top, Next: Introduction, Up: (dir)
|
|||
Tarlz Manual
|
||||
************
|
||||
|
||||
This manual is for Tarlz (version 0.4, 23 April 2018).
|
||||
This manual is for Tarlz (version 0.8, 16 December 2018).
|
||||
|
||||
* Menu:
|
||||
|
||||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command line interface
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
* Introduction:: Purpose and features of tarlz
|
||||
* Invoking tarlz:: Command line interface
|
||||
* File format:: Detailed format of the compressed archive
|
||||
* Amendments to pax format:: The reasons for the differences with pax
|
||||
* Examples:: A small tutorial with examples
|
||||
* Problems:: Reporting bugs
|
||||
* Concept index:: Index of concepts
|
||||
|
||||
|
||||
Copyright (C) 2013-2018 Antonio Diaz Diaz.
|
||||
|
@ -34,38 +36,17 @@ File: tarlz.info, Node: Introduction, Next: Invoking tarlz, Prev: Top, Up: T
|
|||
**************
|
||||
|
||||
Tarlz is a small and simple implementation of the tar archiver. By
|
||||
default tarlz creates, lists and extracts archives in the 'ustar' format
|
||||
compressed with lzip on a per file basis. Tarlz can append files to the
|
||||
end of such compressed archives.
|
||||
|
||||
Each tar member is compressed in its own lzip member, as well as the
|
||||
end-of-file blocks. This same method works for any tar format (gnu,
|
||||
ustar, posix) and is fully backward compatible with standard tar tools
|
||||
default tarlz creates, lists and extracts archives in a simplified
|
||||
posix pax format compressed with lzip on a per file basis. Each tar
|
||||
member is compressed in its own lzip member, as well as the end-of-file
|
||||
blocks. This method is fully backward compatible with standard tar tools
|
||||
like GNU tar, which treat the resulting multimember tar.lz archive like
|
||||
any other tar.lz archive.
|
||||
any other tar.lz archive. Tarlz can append files to the end of such
|
||||
compressed archives.
|
||||
|
||||
Tarlz can create tar archives with four levels of compression
|
||||
granularity; per file, per directory, appendable solid, and solid.
|
||||
|
||||
Tarlz is intended as a showcase project for the maintainers of real
|
||||
tar programs to evaluate the format and perhaps implement it in their
|
||||
tools.
|
||||
|
||||
The diagram below shows the correspondence between tar members
|
||||
(formed by a header plus optional data) in the tar archive and lzip
|
||||
members in the resulting multimember tar.lz archive: *Note File format:
|
||||
(lzip)File format.
|
||||
|
||||
tar
+========+======+========+======+========+======+========+
| header | data | header | data | header | data |   eof  |
+========+======+========+======+========+======+========+

tar.lz
+===============+===============+===============+========+
|     member    |     member    |     member    | member |
+===============+===============+===============+========+
|
||||
|
||||
Of course, compressing each file (or each directory) individually is
|
||||
less efficient than compressing the whole tar archive, but it has the
|
||||
following advantages:
|
||||
|
@ -73,21 +54,32 @@ following advantages:
|
|||
* The resulting multimember tar.lz archive can be decompressed in
|
||||
parallel with plzip, multiplying the decompression speed.
|
||||
|
||||
* New members can be appended to the archive (by removing the eof
|
||||
* New members can be appended to the archive (by removing the EOF
|
||||
member) just like to an uncompressed tar archive.
|
||||
|
||||
* It is a safe posix-style backup format. In case of corruption,
|
||||
tarlz can extract all the undamaged members from the tar.lz
|
||||
archive, skipping over the damaged members, just like the standard
|
||||
(uncompressed) tar. Moreover, lziprecover can be used to recover at
|
||||
least part of the contents of the damaged members.
|
||||
(uncompressed) tar. Moreover, the option '--keep-damaged' can be
|
||||
used to recover as much data as possible from each damaged member,
|
||||
and lziprecover can be used to recover some of the damaged members.
|
||||
|
||||
* A multimember tar.lz archive is usually smaller than the
|
||||
corresponding solidly compressed tar.gz archive, except when
|
||||
individually compressing files smaller than about 32 KiB.
|
||||
|
||||
Tarlz protects the extended records with a CRC in a way compatible
|
||||
with standard tar tools. *Note crc32::.
|
||||
|
||||
Tarlz does not understand other tar formats like 'gnu', 'oldgnu',
|
||||
'star' or 'v7'.
|
||||
|
||||
Tarlz is intended as a showcase project for the maintainers of real
|
||||
tar programs to evaluate the format and perhaps implement it in their
|
||||
tools.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Invoking tarlz, Next: Examples, Prev: Introduction, Up: Top
|
||||
File: tarlz.info, Node: Invoking tarlz, Next: File format, Prev: Introduction, Up: Top
|
||||
|
||||
2 Invoking tarlz
|
||||
****************
|
||||
|
@ -97,9 +89,15 @@ The format for running tarlz is:
|
|||
tarlz [OPTIONS] [FILES]
|
||||
|
||||
On archive creation or appending, tarlz removes leading and trailing
|
||||
slashes from file names, as well as file name prefixes containing a
|
||||
'..' component. On extraction, archive members containing a '..'
|
||||
component are skipped.
|
||||
slashes from filenames, as well as filename prefixes containing a '..'
|
||||
component. On extraction, archive members containing a '..' component
|
||||
are skipped. Tarlz detects when the archive being created or enlarged
|
||||
is among the files to be dumped, appended or concatenated, and skips it.
|
||||
|
||||
On extraction and listing, tarlz removes leading './' strings from
|
||||
member names in the archive or given in the command line, so that
|
||||
'tarlz -xf foo ./bar baz' extracts members 'bar' and './baz' from
|
||||
archive 'foo'.
|
||||
|
||||
tarlz supports the following options:
|
||||
|
||||
|
@ -110,10 +108,22 @@ component are skipped.
|
|||
'-V'
|
||||
'--version'
|
||||
Print the version number of tarlz on the standard output and exit.
|
||||
This version number should be included in all bug reports.
|
||||
|
||||
'-A'
|
||||
'--concatenate'
|
||||
Append tar.lz archives to the end of a tar.lz archive. All the
|
||||
archives involved must be regular (seekable) files compressed as
|
||||
multimember lzip files, and the two end-of-file blocks plus any
|
||||
zero padding must be contained in the last lzip member of each
|
||||
archive. The intermediate end-of-file blocks are removed as each
|
||||
new archive is concatenated. Exit with status 0 without modifying
|
||||
the archive if no FILES have been specified. Tarlz can't
|
||||
concatenate uncompressed tar archives.
|
||||
|
||||
'-c'
|
||||
'--create'
|
||||
Create a new archive.
|
||||
Create a new archive from FILES.
|
||||
|
||||
'-C DIR'
|
||||
'--directory=DIR'
|
||||
|
@ -137,18 +147,19 @@ component are skipped.
|
|||
|
||||
'-r'
|
||||
'--append'
|
||||
Append files to the end of an archive. The archive must be a
|
||||
Append files to the end of a tar.lz archive. The archive must be a
|
||||
regular (seekable) file compressed as a multimember lzip file, and
|
||||
the two end-of-file blocks plus any zero padding must be contained
|
||||
in the last lzip member of the archive. First this last member is
|
||||
removed, then the new members are appended, and then a new
|
||||
end-of-file member is appended to the archive. Exit with status 0
|
||||
without modifying the archive if no FILES have been specified.
|
||||
tarlz can't append files to an uncompressed tar archive.
|
||||
Tarlz can't append files to an uncompressed tar archive.
|
||||
|
||||
'-t'
|
||||
'--list'
|
||||
List the contents of an archive.
|
||||
List the contents of an archive. If FILES are given, list only the
|
||||
given FILES.
|
||||
|
||||
'-v'
|
||||
'--verbose'
|
||||
|
@ -156,10 +167,14 @@ component are skipped.
|
|||
|
||||
'-x'
|
||||
'--extract'
|
||||
Extract files from an archive.
|
||||
Extract files from an archive. If FILES are given, extract only
|
||||
the given FILES. Else extract all the files in the archive.
|
||||
|
||||
'-0 .. -9'
|
||||
Set the compression level. The default compression level is '-6'.
|
||||
Like lzip, tarlz also minimizes the dictionary size of the lzip
|
||||
members it creates, reducing the amount of memory required for
|
||||
decompression.
|
||||
|
||||
'--asolid'
|
||||
When creating or appending to a compressed archive, use appendable
|
||||
|
@ -175,22 +190,49 @@ component are skipped.
|
|||
creates a compressed appendable archive with a separate lzip
|
||||
member for each top-level directory.
|
||||
|
||||
'--no-solid'
|
||||
When creating or appending to a compressed archive, compress each
|
||||
file separately. The end-of-file blocks are compressed into a
|
||||
separate lzip member. This creates a compressed appendable archive
|
||||
with a separate lzip member for each file. This option allows
|
||||
tarlz to revert to default behavior if, for example, tarlz is invoked
|
||||
through an alias like 'tar='tarlz --solid''.
|
||||
|
||||
'--solid'
|
||||
When creating or appending to a compressed archive, use solid
|
||||
compression. The files being added to the archive, along with the
|
||||
end-of-file blocks, are compressed into a single lzip member. The
|
||||
resulting archive is not appendable. No more files can be later
|
||||
appended to the archive without decompressing it first.
|
||||
appended to the archive.
|
||||
|
||||
'--anonymous'
|
||||
Equivalent to '--owner=root --group=root'.
|
||||
|
||||
'--owner=OWNER'
|
||||
When creating or appending, use OWNER for files added to the
|
||||
archive. If OWNER is not a valid user name, it is decoded as a
|
||||
decimal numeric user ID.
|
||||
|
||||
'--group=GROUP'
|
||||
When creating or appending, use GROUP for files added to the
|
||||
archive. If GROUP is not a valid group name, it is decoded as a
|
||||
decimal numeric group ID.
|
||||
|
||||
'--owner=OWNER'
|
||||
When creating or appending, use OWNER for files added to the
|
||||
archive. If OWNER is not a valid user name, it is decoded as a
|
||||
decimal numeric user ID.
|
||||
'--keep-damaged'
|
||||
Don't delete partially extracted files. If a decompression error
|
||||
happens while extracting a file, keep the partial data extracted.
|
||||
Use this option to recover as much data as possible from each
|
||||
damaged member.
|
||||
|
||||
'--missing-crc'
|
||||
Exit with error status 2 if the CRC of the extended records is
|
||||
missing. When this option is used, tarlz detects any corruption
|
||||
in the extended records (only limited by CRC collisions). But note
|
||||
that a corrupt 'GNU.crc32' keyword, for example 'GNU.crc33', is
|
||||
reported as a missing CRC instead of as a corrupt record. This
|
||||
misleading 'Missing CRC' message is the consequence of a flaw in
|
||||
the posix pax format; i.e., the lack of a mandatory check sequence
|
||||
in the extended records. *Note crc32::.
|
||||
|
||||
'--uncompressed'
|
||||
With '--create', don't compress the created tar archive. Create an
|
||||
|
@ -203,9 +245,337 @@ invalid input file, 3 for an internal consistency error (eg, bug) which
|
|||
caused tarlz to panic.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Examples, Next: Problems, Prev: Invoking tarlz, Up: Top
|
||||
File: tarlz.info, Node: File format, Next: Amendments to pax format, Prev: Invoking tarlz, Up: Top
|
||||
|
||||
3 A small tutorial with examples
|
||||
3 File format
|
||||
*************
|
||||
|
||||
In the diagram below, a box like this:
|
||||
+---+
|
||||
| | <-- the vertical bars might be missing
|
||||
+---+
|
||||
|
||||
represents one byte; a box like this:
|
||||
+==============+
|
||||
| |
|
||||
+==============+
|
||||
|
||||
represents a variable number of bytes or a fixed but large number of
|
||||
bytes (for example 512).
|
||||
|
||||
|
||||
A tar.lz file consists of a series of lzip members (compressed data
|
||||
sets). The members simply appear one after another in the file, with no
|
||||
additional information before, between, or after them.
|
||||
|
||||
Each lzip member contains one or more tar members in a simplified
|
||||
posix pax interchange format; the only pax typeflag value supported by
|
||||
tarlz (in addition to the typeflag values defined by the ustar format)
|
||||
is 'x'. The pax format is an extension on top of the ustar format that
|
||||
removes the size limitations of the ustar format.
|
||||
|
||||
Each tar member contains one file archived, and is represented by the
|
||||
following sequence:
|
||||
|
||||
* An optional extended header block with extended header records.
|
||||
This header block is of the form described in pax header block,
|
||||
with a typeflag value of 'x'. The extended header records are
|
||||
included as the data for this header block.
|
||||
|
||||
* A header block in ustar format that describes the file. Any fields
|
||||
defined in the preceding optional extended header records override
|
||||
the associated fields in this header block for this file.
|
||||
|
||||
* Zero or more blocks that contain the contents of the file.
|
||||
|
||||
At the end of the archive file there are two 512-byte blocks filled
|
||||
with binary zeros, interpreted as an end-of-archive indicator. These EOF
|
||||
blocks are either compressed in a separate lzip member or compressed
|
||||
along with the tar members contained in the last lzip member.
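As a rough illustration of the end-of-archive rule above, the following Python sketch checks whether an already decompressed tar stream ends with the two zero-filled 512-byte blocks. The names `BLOCK_SIZE` and `has_eof_blocks` are made up for this example and are not part of tarlz.

```python
BLOCK_SIZE = 512  # size of one tar block in bytes


def has_eof_blocks(tar_data: bytes) -> bool:
    """Return True if tar_data ends with at least two zero-filled blocks."""
    # A tar stream is a whole number of 512-byte blocks.
    if len(tar_data) < 2 * BLOCK_SIZE or len(tar_data) % BLOCK_SIZE != 0:
        return False
    # bytes(n) builds n zero bytes, i.e. the expected end-of-archive marker.
    return tar_data[-2 * BLOCK_SIZE:] == bytes(2 * BLOCK_SIZE)
```

The check only looks at the uncompressed data; whether the EOF blocks sit in their own lzip member is a property of the compressed file and is not modeled here.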
|
||||
|
||||
The diagram below shows the correspondence between each tar member
|
||||
(formed by one or two headers plus optional data) in the tar archive and
|
||||
each lzip member in the resulting multimember tar.lz archive: *Note
|
||||
File format: (lzip)File format.
|
||||
|
||||
tar
+========+======+=================+===============+========+======+========+
| header | data | extended header | extended data | header | data |   EOF  |
+========+======+=================+===============+========+======+========+

tar.lz
+===============+=================================================+========+
|     member    |                      member                     | member |
+===============+=================================================+========+
|
||||
|
||||
|
||||
3.1 Pax header block
|
||||
====================
|
||||
|
||||
The pax header block is identical to the ustar header block described
|
||||
below except that the typeflag has the value 'x' (extended). The size
|
||||
field is the size of the extended header data in bytes. Most other
|
||||
fields in the pax header block are zeroed on archive creation to
|
||||
prevent trouble if the archive is read by a ustar tool, and are
|
||||
ignored by tarlz on archive extraction. *Note flawed-compat::.
|
||||
|
||||
The pax extended header data consists of one or more records, each of
|
||||
them constructed as follows:
|
||||
'"%d %s=%s\n", <length>, <keyword>, <value>'
|
||||
|
||||
The <length>, <blank>, <keyword>, <equals-sign>, and <newline> in the
|
||||
record must be limited to the portable character set. The <length> field
|
||||
contains the decimal length of the record in bytes, including the
|
||||
trailing <newline>. The <value> field is stored as-is, without
|
||||
conversion to UTF-8 nor any other transformation.
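The record layout above has one subtle point: <length> counts its own decimal digits. A minimal Python sketch of building and parsing such records follows. The function names `pax_record` and `parse_pax_records` are hypothetical and this is not the code used by tarlz.

```python
def pax_record(keyword: str, value: bytes) -> bytes:
    """Build one record of the form b"<length> <keyword>=<value>\\n"."""
    body = b" " + keyword.encode("ascii") + b"=" + value + b"\n"
    # <length> includes its own digits, so adjust it until it is consistent.
    length = len(body) + 1
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def parse_pax_records(data: bytes) -> dict[str, bytes]:
    """Split extended header data into a {keyword: value} mapping."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)
        length = int(data[pos:space].decode("ascii"))
        record = data[pos:pos + length]
        if len(record) != length or not record.endswith(b"\n"):
            raise ValueError("malformed pax record")
        # Skip "<length> " at the front and the trailing newline.
        keyword, sep, value = record[space - pos + 1:-1].partition(b"=")
        if not sep:
            raise ValueError("pax record without '='")
        records[keyword.decode("ascii")] = value  # value is kept as-is
        pos += length
    return records
```

For example, `pax_record("GNU.crc32", b"00000000")` yields `b"22 GNU.crc32=00000000\n"`, the same record shape as the one quoted for the 'GNU.crc32' keyword. The value is handled as raw bytes to mirror the "stored as-is" rule.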
|
||||
|
||||
These are the <keyword> fields currently supported by tarlz:
|
||||
|
||||
'linkpath'
|
||||
The pathname of a link being created to another file, of any type,
|
||||
previously archived. This record overrides the linkname field in
|
||||
the following ustar header block. The following ustar header block
|
||||
determines the type of link created. If typeflag of the following
|
||||
header block is 1, it will be a hard link. If typeflag is 2, it
|
||||
will be a symbolic link and the linkpath value will be used as the
|
||||
contents of the symbolic link.
|
||||
|
||||
'path'
|
||||
The pathname of the following file. This record overrides the name
|
||||
and prefix fields in the following ustar header block.
|
||||
|
||||
'size'
|
||||
The size of the file in bytes, expressed as a decimal number using
|
||||
digits from the ISO/IEC 646:1991 (ASCII) standard. This record
|
||||
overrides the size field in the following ustar header block. The
|
||||
size record is used only for files with a size value greater than
|
||||
8_589_934_591 (octal 77777777777). This is 2^33 bytes or larger.
|
||||
|
||||
'GNU.crc32'
|
||||
CRC32-C (Castagnoli) of the extended header data excluding the 8
|
||||
bytes representing the CRC <value> itself. The <value> is
|
||||
represented as 8 hexadecimal digits in big endian order,
|
||||
'22 GNU.crc32=00000000\n'. The keyword of the CRC record is
|
||||
protected by the CRC to guarantee that corruption is always detected
|
||||
(except in case of CRC collision). A CRC was chosen because a
|
||||
checksum is too weak for a potentially large list of variable
|
||||
sized records. A checksum can't detect simple errors like the
|
||||
swapping of two bytes.
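The argument for a CRC over a simple checksum can be checked with a short Python sketch. It implements the generic CRC32-C (Castagnoli) algorithm bit by bit into a lookup table and compares it with a plain byte sum on two records that differ only by a swap of two bytes. It does not reproduce how tarlz excludes the 8 bytes of the CRC <value> from the calculation; that detail is left out on purpose.

```python
def _make_crc32c_table() -> list[int]:
    """Build the 256-entry table for the reflected Castagnoli polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Return the CRC32-C of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Two records that differ only in the order of two adjacent bytes.
good = b"12 path=ab\n"
swapped = b"12 path=ba\n"
print(sum(good) == sum(swapped))        # the byte sum cannot tell them apart
print(crc32c(good) == crc32c(swapped))  # the CRC can
print(format(crc32c(good), "08X"))      # 8 hexadecimal digits, as in GNU.crc32
```

A swap of adjacent bytes is a 16-bit burst error, which any 32-bit CRC detects, while the sum of the bytes is unchanged by reordering.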
|
||||
|
||||
|
||||
3.2 Ustar header block
|
||||
======================
|
||||
|
||||
The ustar header block has a length of 512 bytes and is structured as
|
||||
shown in the following table. All lengths and offsets are in decimal.
|
||||
|
||||
Field Name   Offset   Length (in bytes)
name              0   100
mode            100     8
uid             108     8
gid             116     8
size            124    12
mtime           136    12
chksum          148     8
typeflag        156     1
linkname        157   100
magic           257     6
version         263     2
uname           265    32
gname           297    32
devmajor        329     8
devminor        337     8
prefix          345   155
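The field layout in this table maps directly onto a fixed-size binary record. The following Python sketch, using only the standard `struct` module, splits a 512-byte block into the fields listed above. The names `USTAR_FIELDS`, `HEADER_LAYOUT` and `split_header` are illustrative only. The 12 bytes after the prefix field are treated as padding, which is an assumption based on the lengths adding up to 500.

```python
import struct

# (field name, length in bytes) in header order; offsets follow from the lengths.
USTAR_FIELDS = [
    ("name", 100), ("mode", 8), ("uid", 8), ("gid", 8), ("size", 12),
    ("mtime", 12), ("chksum", 8), ("typeflag", 1), ("linkname", 100),
    ("magic", 6), ("version", 2), ("uname", 32), ("gname", 32),
    ("devmajor", 8), ("devminor", 8), ("prefix", 155),
]
HEADER_SIZE = 512

# One "<n>s" item per field, plus 12 pad bytes to reach 512.
HEADER_LAYOUT = struct.Struct(
    "".join(f"{length}s" for _, length in USTAR_FIELDS) + "12x"
)


def split_header(block: bytes) -> dict[str, bytes]:
    """Return the raw bytes of every ustar header field, keyed by field name."""
    if len(block) != HEADER_SIZE:
        raise ValueError("a ustar header block is exactly 512 bytes")
    values = HEADER_LAYOUT.unpack(block)
    return {name: value for (name, _), value in zip(USTAR_FIELDS, values)}
```

The fields are returned as raw bytes on purpose; decoding octal numbers and trimming null terminators are separate steps.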
|
||||
|
||||
All characters in the header block are coded using the ISO/IEC
|
||||
646:1991 (ASCII) standard, except in fields storing names for files,
|
||||
users, and groups. For maximum portability between implementations,
|
||||
names should only contain characters from the portable filename
|
||||
character set. But if an implementation supports the use of characters
|
||||
outside of '/' and the portable filename character set in names for
|
||||
files, users, and groups, tarlz will use the byte values in these names
|
||||
unmodified.
|
||||
|
||||
The fields name, linkname, and prefix are null-terminated character
|
||||
strings except when all characters in the array contain non-null
|
||||
characters including the last character.
|
||||
|
||||
The name and the prefix fields produce the pathname of the file. A
|
||||
new pathname is formed, if prefix is not an empty string (its first
|
||||
character is not null), by concatenating prefix (up to the first null
|
||||
character), a <slash> character, and name; otherwise, name is used
|
||||
alone. In either case, name is terminated at the first null character.
|
||||
If prefix begins with a null character, it is ignored. In this manner,
|
||||
pathnames of at most 256 characters can be supported. If a pathname does
|
||||
not fit in the space provided, an extended record is used to store the
|
||||
pathname.
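The rule for combining prefix and name can be sketched in a few lines of Python. The function name `ustar_pathname` is hypothetical. Both arguments are the raw field contents (100 and 155 bytes), which may or may not be null-terminated.

```python
def ustar_pathname(name: bytes, prefix: bytes) -> bytes:
    """Combine the raw name and prefix fields into the member pathname."""
    # Each field ends at its first null byte, or at the end of the array.
    name = name.split(b"\0", 1)[0]
    prefix = prefix.split(b"\0", 1)[0]
    # An empty prefix (first byte null) is ignored.
    return prefix + b"/" + name if prefix else name
```

With both fields completely filled the result has 155 + 1 + 100 = 256 bytes, which is where the limit of 256 characters comes from.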
|
||||
|
||||
The linkname field does not use the prefix to produce a pathname. If
|
||||
the linkname does not fit in the 100 characters provided, an extended
|
||||
record is used to store the linkname.
|
||||
|
||||
The mode field provides 12 access permission bits. The following
|
||||
table shows the symbolic name of each bit and its octal value:
|
||||
|
||||
Bit Name   Bit value
S_ISUID    04000
S_ISGID    02000
S_ISVTX    01000
S_IRUSR    00400
S_IWUSR    00200
S_IXUSR    00100
S_IRGRP    00040
S_IWGRP    00020
S_IXGRP    00010
S_IROTH    00004
S_IWOTH    00002
S_IXOTH    00001
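To make the table above concrete, here is a small Python sketch that decodes a raw mode field into the symbolic names of the bits that are set. `MODE_BITS` and `decode_mode` are names invented for this example. The field is assumed to hold a zero-filled octal number followed by a null byte.

```python
# Symbolic name and octal value of each permission bit, as in the table above.
MODE_BITS = [
    ("S_ISUID", 0o4000), ("S_ISGID", 0o2000), ("S_ISVTX", 0o1000),
    ("S_IRUSR", 0o0400), ("S_IWUSR", 0o0200), ("S_IXUSR", 0o0100),
    ("S_IRGRP", 0o0040), ("S_IWGRP", 0o0020), ("S_IXGRP", 0o0010),
    ("S_IROTH", 0o0004), ("S_IWOTH", 0o0002), ("S_IXOTH", 0o0001),
]


def decode_mode(field: bytes) -> list[str]:
    """Return the names of the permission bits set in a raw mode field."""
    text = field.rstrip(b"\0 ").decode("ascii")
    mode = int(text, 8) if text else 0
    return [name for name, bit in MODE_BITS if mode & bit]
```

For instance, a mode field of `b"0000644\0"` decodes to read and write for the owner plus read for group and others.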
|
||||
|
||||
The uid and gid fields are the user and group ID of the owner and
|
||||
group of the file, respectively.
|
||||
|
||||
The size field contains the octal representation of the size of the
|
||||
file in bytes. If the typeflag field specifies a file of type '0'
|
||||
(regular file) or '7' (high performance regular file), the number of
|
||||
logical records following the header is (size / 512) rounded up to the next
|
||||
integer. For all other values of typeflag, tarlz either sets the size
|
||||
field to 0 or ignores it, and does not store or expect any logical
|
||||
records following the header. If the file size is larger than
|
||||
8_589_934_591 bytes (octal 77777777777), an extended record is used to
|
||||
store the file size.
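The arithmetic in this paragraph can be written down as a short Python sketch. All names here (`data_blocks`, `needs_size_record`, `encode_size_field`) are hypothetical. The 11-digit, null-terminated encoding of the size field is an assumption derived from the 12-byte field length and the description of numeric fields.

```python
BLOCK_SIZE = 512
USTAR_MAX_SIZE = 0o77777777777  # 8_589_934_591, i.e. 2**33 - 1


def data_blocks(size: int) -> int:
    """Number of 512-byte logical records that follow the header."""
    return (size + BLOCK_SIZE - 1) // BLOCK_SIZE  # round up


def needs_size_record(size: int) -> bool:
    """True if the size does not fit in the ustar size field."""
    return size > USTAR_MAX_SIZE


def encode_size_field(size: int) -> bytes:
    """Encode size as 11 zero-filled octal digits plus a null byte."""
    if size < 0 or needs_size_record(size):
        raise ValueError("size does not fit in the ustar header")
    return format(size, "011o").encode("ascii") + b"\0"
```

A file of 513 bytes therefore occupies two data blocks, and a file of exactly 2^33 bytes is the smallest one that needs an extended 'size' record.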
|
||||
|
||||
The mtime field contains the octal representation of the modification
|
||||
time of the file at the time it was archived, obtained from the stat()
|
||||
function.
|
||||
|
||||
The chksum field contains the octal representation of the value of
|
||||
the simple sum of all bytes in the header logical record. Each byte in
|
||||
the header is treated as an unsigned value. When calculating the
|
||||
checksum, the chksum field is treated as if it were all <space>
|
||||
characters.
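The checksum rule above is simple enough to sketch in Python. The function names are illustrative. The offsets 148 and 156 come from the header table (chksum starts at offset 148 and is 8 bytes long). The stored form, 7 octal digits plus a null byte, is an assumption based on the general description of numeric fields.

```python
def ustar_checksum(header: bytes) -> int:
    """Sum all header bytes as unsigned values, counting chksum as spaces."""
    if len(header) != 512:
        raise ValueError("a ustar header block is exactly 512 bytes")
    # Bytes 148..155 hold the chksum field itself; treat them as 8 spaces.
    return sum(header[:148]) + 8 * 0x20 + sum(header[156:])


def encode_checksum(value: int) -> bytes:
    """Encode the checksum as zero-filled octal digits plus a null byte."""
    return format(value, "07o").encode("ascii") + b"\0"
```

Because the chksum field is replaced by spaces during the calculation, the same function serves both to fill in the field on creation and to verify it on extraction.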
|
||||
|
||||
The typeflag field contains a single character specifying the type of
|
||||
file archived:
|
||||
|
||||
''0''
|
||||
Regular file.
|
||||
|
||||
''1''
|
||||
Hard link to another file, of any type, previously archived.
|
||||
|
||||
''2''
|
||||
Symbolic link.
|
||||
|
||||
''3', '4''
|
||||
Character special file and block special file respectively. In
|
||||
this case the devmajor and devminor fields contain information
|
||||
defining the device in unspecified format.
|
||||
|
||||
''5''
|
||||
Directory.
|
||||
|
||||
''6''
|
||||
FIFO special file.
|
||||
|
||||
''7''
|
||||
Reserved to represent a file to which an implementation has
|
||||
associated some high-performance attribute. Tarlz treats this type
|
||||
of file as a regular file (type 0).
|
||||
|
||||
|
||||
The magic field contains the ASCII null-terminated string "ustar".
|
||||
The version field contains the characters "00" (0x30,0x30). The fields
|
||||
uname and gname are null-terminated character strings. Each numeric
|
||||
field contains a leading zero-filled, null-terminated octal number using
|
||||
digits from the ISO/IEC 646:1991 (ASCII) standard.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Amendments to pax format, Next: Examples, Prev: File format, Up: Top
|
||||
|
||||
4 The reasons for the differences with pax
|
||||
******************************************
|
||||
|
||||
Tarlz is meant to reliably detect invalid or corrupt metadata during
|
||||
extraction and to not create safety risks in the archives it creates. In
|
||||
order to achieve these goals, tarlz makes some changes to the variant
|
||||
of the pax format that it uses. This chapter describes these changes
|
||||
and the concrete reasons to implement them.
|
||||
|
||||
|
||||
4.1 Add a CRC of the extended records
|
||||
=====================================
|
||||
|
||||
The posix pax format has a serious flaw. The metadata stored in pax
|
||||
extended records are not protected by any kind of check sequence.
|
||||
Corruption in a long filename may cause the extraction of the file in
|
||||
the wrong place without warning. Corruption in a long file size may
|
||||
cause the truncation of the file or the appending of garbage to the
|
||||
file, both followed by a spurious warning about a corrupt header far
|
||||
from the place of the undetected corruption.
|
||||
|
||||
Metadata like filename and file size must always be protected in an
|
||||
archive format because of the adverse effects of undetected corruption
|
||||
in them, potentially much worse than undetected corruption in the data.
|
||||
Even more so in the case of pax because the amount of metadata it
|
||||
stores is potentially large, making undetected corruption more probable.
|
||||
|
||||
Because of the above, tarlz protects the extended records with a CRC
|
||||
in a way compatible with standard tar tools. *Note key_crc32::.
|
||||
|
||||
|
||||
4.2 Remove flawed backward compatibility
|
||||
========================================
|
||||
|
||||
In order to allow the extraction of pax archives by a tar utility
|
||||
conforming to the POSIX-2:1993 standard, POSIX.1-2008 recommends
|
||||
selecting extended header field values that allow such tar to create a
|
||||
regular file containing the extended header records as data. This
|
||||
approach is broken because if the extended header is needed because of
|
||||
a long filename, the name and prefix fields will be unable to contain
|
||||
the full pathname of the file. Therefore the files corresponding to
|
||||
both the extended header and the overridden ustar header will be
|
||||
extracted using truncated filenames, perhaps overwriting existing files
|
||||
or directories. It may be a security risk to extract a file with a
|
||||
truncated filename.
|
||||
|
||||
To avoid this problem, tarlz writes extended headers with all fields
|
||||
zeroed except size, chksum, typeflag, magic and version. This prevents
|
||||
old tar programs from extracting the extended records as a file in the
|
||||
wrong place. Tarlz also sets to zero those fields of the ustar header
|
||||
overridden by extended records.
|
||||
|
||||
If the extended header is needed because of a file size larger than
|
||||
8 GiB, the size field will be unable to contain the full size of the
|
||||
file. Therefore the file may be partially extracted, and the tool will
|
||||
issue a spurious warning about a corrupt header at the point where it
|
||||
thinks the file ends. Setting to zero the overridden size in the ustar
|
||||
header at least prevents the partial extraction and makes obvious that
|
||||
the file has been truncated.
|
||||
|
||||
|
||||
4.3 As simple as possible (but not simpler)
|
||||
===========================================
|
||||
|
||||
The tarlz format is mainly ustar. Extended pax headers are used only
|
||||
when needed because the length of a filename or link name, or the size
|
||||
of a file exceeds the limits of the ustar format. Adding extended
|
||||
headers to each member just to record subsecond timestamps seems
|
||||
wasteful for a backup format.
|
||||
|
||||
|
||||
4.4 Avoid misconversions to/from UTF-8
|
||||
======================================
|
||||
|
||||
There is no portable way to tell what charset a text string is coded
|
||||
into. Therefore, tarlz stores all fields representing text strings
|
||||
as-is, without conversion to UTF-8 nor any other transformation. This
|
||||
prevents accidental double UTF-8 conversions. If the need arises this
|
||||
behavior will be adjusted with a command line option in the future.
|
||||
|
||||
|
||||
File: tarlz.info, Node: Examples, Next: Problems, Prev: Amendments to pax format, Up: Top
|
||||
|
||||
5 A small tutorial with examples
|
||||
********************************
|
||||
|
||||
Example 1: Create a multimember compressed archive 'archive.tar.lz'
|
||||
|
@ -232,7 +602,7 @@ Example 4: Create a compressed appendable archive containing directories
|
|||
'dir1', 'dir2' and 'dir3' with a separate lzip member per directory.
|
||||
Then append files 'a', 'b', 'c', 'd' and 'e' to the archive, all of
|
||||
them contained in a single lzip member. The resulting archive
|
||||
'archive.tar.lz' contains 5 lzip members (including the eof member).
|
||||
'archive.tar.lz' contains 5 lzip members (including the EOF member).
|
||||
|
||||
tarlz --dsolid -cf archive.tar.lz dir1 dir2 dir3
|
||||
tarlz --asolid -rf archive.tar.lz a b c d e
|
||||
|
@ -240,7 +610,7 @@ them contained in a single lzip member. The resulting archive
|
|||
|
||||
Example 5: Create a solidly compressed archive 'archive.tar.lz'
|
||||
containing files 'a', 'b' and 'c'. Note that no more files can be later
|
||||
appended to the archive without decompressing it first.
|
||||
appended to the archive.
|
||||
|
||||
tarlz --solid -cf archive.tar.lz a b c
|
||||
|
||||
|
@ -263,7 +633,7 @@ Example 8: Copy the contents of directory 'sourcedir' to the directory
|
|||
|
||||
File: tarlz.info, Node: Problems, Next: Concept index, Prev: Examples, Up: Top
|
||||
|
||||
4 Reporting bugs
|
||||
6 Reporting bugs
|
||||
****************
|
||||
|
||||
There are probably bugs in tarlz. There are certainly errors and
|
||||
|
@ -284,8 +654,11 @@ Concept index
|
|||
|