- .\" Copyright (c) 2003-2009 Tim Kientzle
- .\" Copyright (c) 2016 Martin Matuska
- .\" All rights reserved.
- .\"
- .\" Redistribution and use in source and binary forms, with or without
- .\" modification, are permitted provided that the following conditions
- .\" are met:
- .\" 1. Redistributions of source code must retain the above copyright
- .\" notice, this list of conditions and the following disclaimer.
- .\" 2. Redistributions in binary form must reproduce the above copyright
- .\" notice, this list of conditions and the following disclaimer in the
- .\" documentation and/or other materials provided with the distribution.
- .\"
- .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
- .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
- .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- .\" SUCH DAMAGE.
- .\"
- .\" $FreeBSD$
- .\"
- .Dd December 27, 2016
- .Dt TAR 5
- .Os
- .Sh NAME
- .Nm tar
- .Nd format of tape archive files
- .Sh DESCRIPTION
- The
- .Nm
- archive format collects any number of files, directories, and other
- file system objects (symbolic links, device nodes, etc.) into a single
- stream of bytes.
- The format was originally designed to be used with
- tape drives that operate with fixed-size blocks, but is widely used as
- a general packaging mechanism.
- .Ss General Format
- A
- .Nm
- archive consists of a series of 512-byte records.
- Each file system object requires a header record which stores basic metadata
- (pathname, owner, permissions, etc.) and zero or more records containing any
- file data.
- The end of the archive is indicated by two records consisting
- entirely of zero bytes.
- .Pp
- For compatibility with tape drives that use fixed block sizes,
- programs that read or write tar files always read or write a fixed
- number of records with each I/O operation.
- These
- .Dq blocks
- are always a multiple of the record size.
- The maximum block size supported by early
- implementations was 10240 bytes or 20 records.
- This is still the default for most implementations
- although block sizes of 1MiB (2048 records) or larger are
- commonly used with modern high-speed tape drives.
- (Note: the terms
- .Dq block
- and
- .Dq record
- here are not entirely standard; this document follows the
- convention established by John Gilmore in documenting
- .Nm pdtar . )
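- .Pp
- As a rough illustration of the record and block arithmetic described above,
- the following sketch counts the 512-byte records needed for file data
- and rounds a record count up to a whole block.
- The function names and the TAR_RECORD_SIZE constant are hypothetical
- and are not taken from any tar implementation.

```c
#include <assert.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512u

/* Number of 512-byte records needed to hold `size` bytes of file data. */
uint64_t
tar_data_records(uint64_t size)
{
    return size / TAR_RECORD_SIZE + (size % TAR_RECORD_SIZE != 0);
}

/* Round a record count up to a multiple of `factor` records per block. */
uint64_t
tar_round_to_block(uint64_t records, unsigned factor)
{
    assert(factor > 0);
    return (records + factor - 1) / factor * factor;
}
```

- Under these assumptions, an archive holding a single 1000-byte file needs
- one header record, two data records, and two end-of-archive records.
- With 20-record blocks, those five records are rounded up to 20 records,
- or 10240 bytes.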
- .Ss Old-Style Archive Format
- The original tar archive format has been extended many times to
- include additional information that various implementors found
- necessary.
- This section describes the variant implemented by the tar command
- included in
- .At v7 ,
- which seems to be the earliest widely-used version of the tar program.
- .Pp
- The header record for an old-style
- .Nm
- archive consists of the following:
- .Bd -literal -offset indent
- struct header_old_tar {
- char name[100];
- char mode[8];
- char uid[8];
- char gid[8];
- char size[12];
- char mtime[12];
- char checksum[8];
- char linkflag[1];
- char linkname[100];
- char pad[255];
- };
- .Ed
- All unused bytes in the header record are filled with nulls.
- .Bl -tag -width indent
- .It Va name
- Pathname, stored as a null-terminated string.
- Early tar implementations only stored regular files (including
- hardlinks to those files).
- One common early convention used a trailing "/" character to indicate
- a directory name, allowing directory permissions and owner information
- to be archived and restored.
- .It Va mode
- File mode, stored as an octal number in ASCII.
- .It Va uid , Va gid
- User id and group id of owner, as octal numbers in ASCII.
- .It Va size
- Size of file, as octal number in ASCII.
- For regular files only, this indicates the amount of data
- that follows the header.
- In particular, this field was ignored by early tar implementations
- when extracting hardlinks.
- Modern writers should always store a zero length for hardlink entries.
- .It Va mtime
- Modification time of file, as an octal number in ASCII.
- This indicates the number of seconds since the start of the epoch,
- 00:00:00 UTC January 1, 1970.
- Note that negative values should be avoided
- here, as they are handled inconsistently.
- .It Va checksum
- Header checksum, stored as an octal number in ASCII.
- To compute the checksum, set the checksum field to all spaces,
- then sum all bytes in the header using unsigned arithmetic.
- This field should be stored as six octal digits followed by a null and a space
- character.
- Note that many early implementations of tar used signed arithmetic
- for the checksum field, which can cause interoperability problems
- when transferring archives between systems.
- Modern robust readers compute the checksum both ways and accept the
- header if either computation matches.
- .It Va linkflag , Va linkname
- In order to preserve hardlinks and conserve tape, a file
- with multiple links is only written to the archive the first
- time it is encountered.
- The next time it is encountered, the
- .Va linkflag
- is set to an ASCII
- .Sq 1
- and the
- .Va linkname
- field holds the first name under which this file appears.
- (Note that regular files have a null value in the
- .Va linkflag
- field.)
- .El
- .Pp
- Early tar implementations varied in how they terminated these fields.
- The tar command in
- .At v7
- used the following conventions (this is also documented in early BSD manpages):
- the pathname must be null-terminated;
- the mode, uid, and gid fields must end in a space and a null byte;
- the size and mtime fields must end in a space;
- the checksum is terminated by a null and a space.
- Early implementations filled the numeric fields with leading spaces.
- This seems to have been common practice until the
- .St -p1003.1-88
- standard was released.
- For best portability, modern implementations should fill the numeric
- fields with leading zeros.
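- .Pp
- The checksum rules above can be sketched as follows.
- This is a minimal illustration, not the code of any particular tar program:
- all function names and the offset constants are hypothetical, and the
- checksum offset of 148 is simply the sum of the sizes of the fields that
- precede it in the structure shown earlier.
- The reader accepts either the unsigned or the signed sum, and the writer
- stores six octal digits followed by a null and a space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE  512
#define TAR_CHECKSUM_OFF 148  /* name+mode+uid+gid+size+mtime */
#define TAR_CHECKSUM_LEN 8

/* Sum all header bytes, counting the checksum field as eight spaces.
 * Produces both the unsigned sum and the signed-char sum. */
void
tar_checksums(const unsigned char *hdr, long *usum, long *ssum)
{
    long u = 0, s = 0;
    size_t i;

    assert(hdr != NULL && usum != NULL && ssum != NULL);
    for (i = 0; i < TAR_RECORD_SIZE; i++) {
        unsigned char c = hdr[i];

        if (i >= TAR_CHECKSUM_OFF &&
            i < TAR_CHECKSUM_OFF + TAR_CHECKSUM_LEN)
            c = ' ';
        u += c;
        s += (c < 128) ? c : (long)c - 256;  /* value as a signed char */
    }
    *usum = u;
    *ssum = s;
}

/* Parse an octal field: optional leading spaces, then digits up to a
 * space, a null, or the end of the field. */
int64_t
tar_parse_octal(const unsigned char *field, size_t len)
{
    int64_t v = 0;
    size_t i = 0;

    while (i < len && field[i] == ' ')
        i++;
    for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
        v = v * 8 + (field[i] - '0');
    return v;
}

/* Write the unsigned checksum as six octal digits, a null, and a space. */
void
tar_store_checksum(unsigned char *hdr)
{
    long u, s;
    int i;

    tar_checksums(hdr, &u, &s);
    for (i = 5; i >= 0; i--) {
        hdr[TAR_CHECKSUM_OFF + i] = (unsigned char)('0' + (u & 7));
        u >>= 3;
    }
    hdr[TAR_CHECKSUM_OFF + 6] = '\0';
    hdr[TAR_CHECKSUM_OFF + 7] = ' ';
}

/* Accept the header if the stored value matches either computation. */
int
tar_checksum_ok(const unsigned char *hdr)
{
    long u, s;
    int64_t stored;

    tar_checksums(hdr, &u, &s);
    stored = tar_parse_octal(hdr + TAR_CHECKSUM_OFF, TAR_CHECKSUM_LEN);
    return stored == u || stored == s;
}
```

- The two sums differ only when the header contains bytes with the high
- bit set, which is why purely ASCII headers never exposed the
- signed-arithmetic problem.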
- .Ss Pre-POSIX Archives
- An early draft of
- .St -p1003.1-88
- served as the basis for John Gilmore's
- .Nm pdtar
- program and many system implementations from the late 1980s
- and early 1990s.
- These archives generally follow the POSIX ustar
- format described below with the following variations:
- .Bl -bullet -compact -width indent
- .It
- The magic value consists of the five characters
- .Dq ustar
- followed by a space.
- The version field contains a space character followed by a null.
- .It
- The numeric fields are generally filled with leading spaces
- (not leading zeros as recommended in the final standard).
- .It
- The prefix field is often not used, limiting pathnames to
- the 100 characters of old-style archives.
- .El
- .Ss POSIX ustar Archives
- .St -p1003.1-88
- defined a standard tar file format to be read and written
- by compliant implementations of
- .Xr tar 1 .
- This format is often called the
- .Dq ustar
- format, after the magic value used
- in the header.
- (The name is an acronym for
- .Dq Unix Standard TAR . )
- It extends the historic format with new fields:
- .Bd -literal -offset indent
- struct header_posix_ustar {
- char name[100];
- char mode[8];
- char uid[8];
- char gid[8];
- char size[12];
- char mtime[12];
- char checksum[8];
- char typeflag[1];
- char linkname[100];
- char magic[6];
- char version[2];
- char uname[32];
- char gname[32];
- char devmajor[8];
- char devminor[8];
- char prefix[155];
- char pad[12];
- };
- .Ed
- .Bl -tag -width indent
- .It Va typeflag
- Type of entry.
- POSIX extended the earlier
- .Va linkflag
- field with several new type values:
- .Bl -tag -width indent -compact
- .It Dq 0
- Regular file.
- NUL should be treated as a synonym, for compatibility purposes.
- .It Dq 1
- Hard link.
- .It Dq 2
- Symbolic link.
- .It Dq 3
- Character device node.
- .It Dq 4
- Block device node.
- .It Dq 5
- Directory.
- .It Dq 6
- FIFO node.
- .It Dq 7
- Reserved.
- .It Other
- A POSIX-compliant implementation must treat any unrecognized typeflag value
- as a regular file.
- In particular, writers should ensure that all entries
- have a valid filename so that they can be restored by readers that do not
- support the corresponding extension.
- Uppercase letters "A" through "Z" are reserved for custom extensions.
- Note that sockets and whiteout entries are not archivable.
- .El
- It is worth noting that the
- .Va size
- field, in particular, has different meanings depending on the type.
- For regular files, of course, it indicates the amount of data
- following the header.
- For directories, it may be used to indicate the total size of all
- files in the directory, for use by operating systems that pre-allocate
- directory space.
- For all other types, it should be set to zero by writers and ignored
- by readers.
- .It Va magic
- Contains the magic value
- .Dq ustar
- followed by a NUL byte to indicate that this is a POSIX standard archive.
- Full compliance requires the uname and gname fields be properly set.
- .It Va version
- Version.
- This should be
- .Dq 00
- (two copies of the ASCII digit zero) for POSIX standard archives.
- .It Va uname , Va gname
- User and group names, as null-terminated ASCII strings.
- These should be used in preference to the uid/gid values
- when they are set and the corresponding names exist on
- the system.
- .It Va devmajor , Va devminor
- Major and minor numbers for character device or block device entry.
- .It Va name , Va prefix
- If the pathname is too long to fit in the 100 bytes provided by the standard
- format, it can be split at any
- .Pa /
- character with the first portion going into the prefix field.
- If the prefix field is not empty, the reader will prepend
- the prefix value and a
- .Pa /
- character to the regular name field to obtain the full pathname.
- The standard does not require a trailing
- .Pa /
- character on directory names, though most implementations still
- include this for compatibility reasons.
- .El
- .Pp
- Note that all unused bytes must be set to
- .Dv NUL .
- .Pp
- Field termination is specified slightly differently by POSIX
- than by previous implementations.
- The
- .Va magic ,
- .Va uname ,
- and
- .Va gname
- fields must have a trailing
- .Dv NUL .
- The
- .Va name ,
- .Va linkname ,
- and
- .Va prefix
- fields must have a trailing
- .Dv NUL
- unless they fill the entire field.
- (In particular, it is possible to store a 256-character pathname if it
- happens to have a
- .Pa /
- as the 156th character.)
- POSIX requires numeric fields to be zero-padded in the front, and requires
- them to be terminated with either space or
- .Dv NUL
- characters.
- .Pp
- Currently, most tar implementations comply with the ustar
- format, occasionally extending it by adding new fields to the
- blank area at the end of the header record.
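- .Pp
- One possible way for a writer to split a long pathname between the
- name and prefix fields is sketched below.
- The function name and the length constants are hypothetical.
- The sketch takes the leftmost usable slash, although the format allows
- any slash that leaves both parts within their limits, and it leaves a
- field without a trailing null when the text fills the field exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define USTAR_NAME_LEN   100
#define USTAR_PREFIX_LEN 155

/*
 * Split `path` into the ustar prefix and name fields.
 * Both output buffers are null-padded.
 * Returns 0 on success, or -1 if the path cannot be represented.
 */
int
ustar_split_path(const char *path, char *prefix, char *name)
{
    size_t len, i;

    assert(path != NULL && prefix != NULL && name != NULL);
    len = strlen(path);
    memset(prefix, 0, USTAR_PREFIX_LEN);
    memset(name, 0, USTAR_NAME_LEN);

    if (len == 0)
        return -1;
    if (len <= USTAR_NAME_LEN) {        /* fits without a prefix */
        memcpy(name, path, len);
        return 0;
    }
    /* Try each '/' as the split point: the slash itself is not stored. */
    for (i = 1; i < len && i <= USTAR_PREFIX_LEN; i++) {
        size_t rest = len - i - 1;      /* bytes after the slash */

        if (path[i] != '/')
            continue;
        if (rest == 0 || rest > USTAR_NAME_LEN)
            continue;
        memcpy(prefix, path, i);
        memcpy(name, path + i + 1, rest);
        return 0;
    }
    return -1;
}
```

- With this sketch, a 256-character pathname whose 156th character is a
- slash fills both fields completely, matching the limiting case noted above.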
- .Ss Numeric Extensions
- There have been several attempts to extend the range of sizes
- or times supported by modifying how numbers are stored in the
- header.
- .Pp
- One obvious extension to increase the size of files is to
- eliminate the terminating characters from the various
- numeric fields.
- For example, the standard only allows the size field to contain
- 11 octal digits, reserving the twelfth byte for a trailing
- NUL character.
- Using all 12 bytes for octal digits raises the maximum file size
- from 8 GiB to 64 GiB.
- .Pp
- Another extension, utilized by GNU tar, star, and other newer
- .Nm
- implementations, permits binary numbers in the standard numeric fields.
- This is flagged by setting the high bit of the first byte.
- The remainder of the field is treated as a signed twos-complement
- value.
- This permits 95-bit values for the length and time fields
- and 63-bit values for the uid, gid, and device numbers.
- In particular, this provides a consistent way to handle
- negative time values.
- GNU tar supports this extension for the
- length, mtime, ctime, and atime fields.
- Joerg Schilling's star program and the libarchive library support
- this extension for all numeric fields.
- Note that this extension is largely obsoleted by the extended
- attribute record provided by the pax interchange format.
- .Pp
- Another early GNU extension allowed base-64 values rather than octal.
- This extension was short-lived and is no longer supported by any
- implementation.
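- .Pp
- The binary extension described above can be decoded along the following lines.
- This is only a sketch under stated assumptions: the function name is
- hypothetical, the result is limited to 64 bits (wider values are reported
- as errors), and the final conversion relies on the twos-complement
- representation used by essentially all current platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a binary numeric field of `len` bytes.
 * Returns 0 and stores the value in *out on success.
 * Returns -1 if the flag bit is clear or the value needs more than 64 bits.
 */
int
tar_decode_base256(const unsigned char *field, size_t len, int64_t *out)
{
    uint64_t v;
    unsigned char fill;
    size_t i;

    assert(field != NULL && out != NULL && len > 0);
    if ((field[0] & 0x80) == 0)
        return -1;              /* flag bit clear: ordinary octal field */

    /* The bit after the flag is the sign bit of the stored value. */
    fill = (field[0] & 0x40) ? 0xff : 0x00;
    v = fill ? UINT64_MAX : 0;  /* start from the sign extension */

    for (i = 0; i < len; i++) {
        unsigned char b = field[i];

        if (i == 0)             /* replace the flag bit with the sign */
            b = fill ? (unsigned char)(b | 0x80)
                     : (unsigned char)(b & 0x7f);
        if (len - i > 8) {      /* beyond 64 bits: must be sign extension */
            if (b != fill)
                return -1;
            continue;
        }
        if (len - i == 8 && ((b ^ fill) & 0x80) != 0)
            return -1;          /* would flip the sign of the result */
        v = (v << 8) | b;
    }
    *out = (int64_t)v;
    return 0;
}
```

- A reader would typically try this decoder first and fall back to octal
- parsing when it reports that the flag bit is not set.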
- .Ss Pax Interchange Format
- There are many attributes that cannot be portably stored in a
- POSIX ustar archive.
- .St -p1003.1-2001
- defined a
- .Dq pax interchange format
- that uses two new types of entries to hold text-formatted
- metadata that applies to following entries.
- Note that a pax interchange format archive is a ustar archive in every
- respect.
- The new data is stored in ustar-compatible archive entries that use the
- .Dq x
- or
- .Dq g
- typeflag.
- In particular, older implementations that do not fully support these
- extensions will extract the metadata into regular files, where the
- metadata can be examined as necessary.
- .Pp
- An entry in a pax interchange format archive consists of one or
- two standard ustar entries, each with its own header and data.
- The first optional entry stores the extended attributes
- for the following entry.
- This optional first entry has an "x" typeflag and a size field that
- indicates the total size of the extended attributes.
- The extended attributes themselves are stored as a series of text-format
- lines encoded in the portable UTF-8 encoding.
- Each line consists of a decimal number, a space, a key string, an equals
- sign, a value string, and a newline.
- The decimal number indicates the length of the entire line, including the
- initial length field and the trailing newline.
- An example of such a line is:
- .Dl 25 ctime=1084839148.1212\en
- Keys in all lowercase are standard keys.
- Vendors can add their own keys by prefixing them with an all uppercase
- vendor name and a period.
- Note that, unlike the historic header, numeric values are stored using
- decimal, not octal.
- A description of some common keys follows:
- .Bl -tag -width indent
- .It Cm atime , Cm ctime , Cm mtime
- File access, inode change, and modification times.
- These fields can be negative or include a decimal point and a fractional value.
- .It Cm hdrcharset
- The character set used by the pax extension values.
- By default, all textual values in the pax extended attributes
- are assumed to be in UTF-8, including pathnames, user names,
- and group names.
- In some cases, it is not possible to translate local
- conventions into UTF-8.
- If this key is present and the value is the six-character ASCII string
- .Dq BINARY ,
- then all textual values are assumed to be in a platform-dependent
- multi-byte encoding.
- Note that there are only two valid values for this key:
- .Dq BINARY
- or
- .Dq ISO-IR\ 10646\ 2000\ UTF-8 .
- No other values are permitted by the standard, and
- the latter value should generally not be used as it is the
- default when this key is not specified.
- In particular, this flag should not be used as a general
- mechanism to allow filenames to be stored in arbitrary
- encodings.
- .It Cm uname , Cm uid , Cm gname , Cm gid
- User name, group name, and numeric UID and GID values.
- The user name and group name stored here are encoded in UTF-8
- and can thus include non-ASCII characters.
- The UID and GID fields can be of arbitrary length.
- .It Cm linkpath
- The full path of the linked-to file.
- Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
- .It Cm path
- The full pathname of the entry.
- Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
- .It Cm realtime.* , Cm security.*
- These keys are reserved and may be used for future standardization.
- .It Cm size
- The size of the file.
- Note that there is no length limit on this field, allowing conforming
- archives to store files much larger than the historic 8GB limit.
- .It Cm SCHILY.*
- Vendor-specific attributes used by Joerg Schilling's
- .Nm star
- implementation.
- .It Cm SCHILY.acl.access , Cm SCHILY.acl.default , Cm SCHILY.acl.ace
- Stores the access, default and NFSv4 ACLs as textual strings in a format
- that is an extension of the format specified by POSIX.1e draft 17.
- In particular, each user or group access specification can include
- an additional colon-separated field with the numeric UID or GID.
- This allows ACLs to be restored on systems that may not have complete
- user or group information available (such as when NIS/YP or LDAP services
- are temporarily unavailable).
- .It Cm SCHILY.devminor , Cm SCHILY.devmajor
- The full minor and major numbers for device nodes.
- .It Cm SCHILY.fflags
- The file flags.
- .It Cm SCHILY.realsize
- The full size of the file on disk.
- XXX explain? XXX
- .It Cm SCHILY.dev , Cm SCHILY.ino , Cm SCHILY.nlinks
- The device number, inode number, and link count for the entry.
- In particular, note that a pax interchange format archive using Joerg
- Schilling's
- .Cm SCHILY.*
- extensions can store all of the data from
- .Va struct stat .
- .It Cm LIBARCHIVE.*
- Vendor-specific attributes used by the
- .Nm libarchive
- library and programs that use it.
- .It Cm LIBARCHIVE.creationtime
- The time when the file was created.
- (This should not be confused with the POSIX
- .Dq ctime
- attribute, which refers to the time when the file
- metadata was last changed.)
- .It Cm LIBARCHIVE.xattr. Ns Ar namespace Ns . Ns Ar key
- Libarchive stores POSIX.1e-style extended attributes using
- keys of this form.
- The
- .Ar key
- value is URL-encoded:
- All non-ASCII characters and the two special characters
- .Dq =
- and
- .Dq %
- are encoded as
- .Dq %
- followed by two uppercase hexadecimal digits.
- The value of this key is the extended attribute value
- encoded in base 64.
- XXX Detail the base-64 format here XXX
- .It Cm VENDOR.*
- XXX document other vendor-specific extensions XXX
- .El
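- .Pp
- The key encoding described above for the libarchive extended attribute
- keys could be implemented roughly as follows.
- The function name is hypothetical and this is not libarchive's own code.
- It escapes exactly the bytes the description names (non-ASCII bytes,
- the equals sign, and the percent sign) and passes all other bytes
- through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode `key` into `out` (capacity `outsize`, including the final null).
 * Returns the encoded length, or -1 if `out` is too small.
 */
int
xattr_key_encode(const char *key, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p;
    size_t n = 0;

    assert(key != NULL && out != NULL);
    for (p = (const unsigned char *)key; *p != '\0'; p++) {
        if (*p >= 0x80 || *p == '=' || *p == '%') {
            if (n + 3 >= outsize)       /* keep room for the null */
                return -1;
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0f];
        } else {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = (char)*p;
        }
    }
    if (n >= outsize)
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

- Escaping the equals sign matters because the pax record syntax uses the
- first equals sign to separate the key from the value.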
- .Pp
- Any values stored in an extended attribute override the corresponding
- values in the regular tar header.
- Note that compliant readers should ignore the regular fields when they
- are overridden.
- This is important, as existing archivers are known to store non-compliant
- values in the standard header fields in this situation.
- There are no limits on length for any of these fields.
- In particular, numeric fields can be arbitrarily large.
- All text fields are encoded in UTF-8.
- Compliant writers should store only portable 7-bit ASCII characters in
- the standard ustar header and use extended
- attributes whenever a text value contains non-ASCII characters.
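- .Pp
- Because the leading decimal number in each extended attribute line counts
- its own digits, a writer has to find a length that is consistent with
- itself.
- The following sketch shows one way to do that.
- The function name is hypothetical, and the sketch assumes that the value
- is a null-terminated string, so values containing embedded null bytes
- are out of scope.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one pax line "LEN key=value<newline>" into `buf`.
 * Returns the line length, or -1 if `buf` is too small.
 */
int
pax_format_record(char *buf, size_t bufsize, const char *key,
    const char *value)
{
    size_t body, total, digits, t;
    int n;

    assert(buf != NULL && key != NULL && value != NULL);
    /* space + key + '=' + value + newline */
    body = 1 + strlen(key) + 1 + strlen(value) + 1;

    /* Grow the digit count until body + digits has that many digits. */
    digits = 1;
    for (;;) {
        size_t d = 0;

        total = body + digits;
        for (t = total; t > 0; t /= 10)
            d++;
        if (d == digits)
            break;
        digits = d;
    }
    if (total >= bufsize)               /* need room for the final null */
        return -1;
    n = snprintf(buf, bufsize, "%zu %s=%s\n", total, key, value);
    return (n == (int)total) ? n : -1;
}
```

- Called with the key ctime and the value 1084839148.1212, this sketch
- produces the 25-byte line used as the example earlier in this section.
- The loop is needed for lengths near a power of ten: a 98-byte body cannot
- use a two-digit length and ends up as a 101-byte line.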
- .Pp
- In addition to the
- .Cm x
- entry described above, the pax interchange format
- also supports a
- .Cm g
- entry.
- The
- .Cm g
- entry is identical in format, but specifies attributes that serve as
- defaults for all subsequent archive entries.
- The
- .Cm g
- entry is not widely used.
- .Pp
- Besides the new
- .Cm x
- and
- .Cm g
- entries, the pax interchange format has a few other minor variations
- from the earlier ustar format.
- The most troubling one is that hardlinks are permitted to have
- data following them.
- This allows readers to restore any hardlink to a file without
- having to rewind the archive to find an earlier entry.
- However, it creates complications for robust readers, as it is no longer
- clear whether or not they should ignore the size field for hardlink entries.
- .Ss GNU Tar Archives
- The GNU tar program started with a pre-POSIX format similar to that
- described earlier and has extended it using several different mechanisms:
- it added new fields to the empty space in the header (some of which was later
- used by POSIX for conflicting purposes);
- it allowed the header to be continued over multiple records;
- and it defined new entries that modify following entries
- (similar in principle to the
- .Cm x
- entry described above, but each GNU special entry is single-purpose,
- unlike the general-purpose
- .Cm x
- entry).
- As a result, GNU tar archives are not POSIX compatible, although
- more lenient POSIX-compliant readers can successfully extract most
- GNU tar archives.
- .Bd -literal -offset indent
- struct header_gnu_tar {
- char name[100];
- char mode[8];
- char uid[8];
- char gid[8];
- char size[12];
- char mtime[12];
- char checksum[8];
- char typeflag[1];
- char linkname[100];
- char magic[6];
- char version[2];
- char uname[32];
- char gname[32];
- char devmajor[8];
- char devminor[8];
- char atime[12];
- char ctime[12];
- char offset[12];
- char longnames[4];
- char unused[1];
- struct {
- char offset[12];
- char numbytes[12];
- } sparse[4];
- char isextended[1];
- char realsize[12];
- char pad[17];
- };
- .Ed
- .Bl -tag -width indent
- .It Va typeflag
- GNU tar uses the following special entry types, in addition to
- those defined by POSIX:
- .Bl -tag -width indent
- .It "7"
- GNU tar treats type "7" records identically to type "0" records,
- except on one obscure RTOS where they are used to indicate the
- pre-allocation of a contiguous file on disk.
- .It "D"
- This indicates a directory entry.
- Unlike the POSIX-standard "5"
- typeflag, the header is followed by data records listing the names
- of files in this directory.
- Each name is preceded by an ASCII "Y"
- if the file is stored in this archive or "N" if the file is not
- stored in this archive.
- Each name is terminated with a null, and
- an extra null marks the end of the name list.
- The purpose of this
- entry is to support incremental backups; a program restoring from
- such an archive may wish to delete files on disk that did not exist
- in the directory when the archive was made.
- .Pp
- Note that the "D" typeflag specifically violates POSIX, which requires
- that unrecognized typeflags be restored as normal files.
- In this case, restoring the "D" entry as a file could interfere
- with subsequent creation of the like-named directory.
- .It "K"
- The data for this entry is a long linkname for the following regular entry.
- .It "L"
- The data for this entry is a long pathname for the following regular entry.
- .It "M"
- This is a continuation of the last file on the previous volume.
- GNU multi-volume archives guarantee that each volume begins with a valid
- entry header.
- To ensure this, a file may be split, with part stored at the end of one volume,
- and part stored at the beginning of the next volume.
- The "M" typeflag indicates that this entry continues an existing file.
- Such entries can only occur as the first or second entry
- in an archive (the latter only if the first entry is a volume label).
- The
- .Va size
- field specifies the size of this entry.
- The
- .Va offset
- field at bytes 369-380 specifies the offset where this file fragment
- begins.
- The
- .Va realsize
- field specifies the total size of the file (which must equal
- .Va size
- plus
- .Va offset ) .
- When extracting, GNU tar checks that the header file name is the one it is
- expecting, that the header offset is in the correct sequence, and that
- the sum of offset and size is equal to realsize.
- .It "N"
- Type "N" records are no longer generated by GNU tar.
- They contained a
- list of files to be renamed or symlinked after extraction; this was
- originally used to support long names.
- The contents of this record
- are a text description of the operations to be done, in the form
- .Dq Rename %s to %s\en
- or
- .Dq Symlink %s to %s\en ;
- in either case, both
- filenames are escaped using K&R C syntax.
- Due to security concerns, "N" records are now generally ignored
- when reading archives.
- .It "S"
- This is a
- .Dq sparse
- regular file.
- Sparse files are stored as a series of fragments.
- The header contains a list of fragment offset/length pairs.
- If more than four such entries are required, the header is
- extended as necessary with
- .Dq extra
- header extensions (an older format that is no longer used), or
- .Dq sparse
- extensions.
- .It "V"
- The
- .Va name
- field should be interpreted as a tape/volume header name.
- This entry should generally be ignored on extraction.
- .El
- .It Va magic
- The magic field holds the five characters
- .Dq ustar
- followed by a space.
- Note that POSIX ustar archives have a trailing null.
- .It Va version
- The version field holds a space character followed by a null.
- Note that POSIX ustar archives use two copies of the ASCII digit
- .Dq 0 .
- .It Va atime , Va ctime
- The time the file was last accessed and the time of
- last change of file information, stored in octal as with
- .Va mtime .
- .It Va longnames
- This field is apparently no longer used.
- .It Sparse Va offset / Va numbytes
- Each such structure specifies a single fragment of a sparse
- file.
- The two fields store values as octal numbers.
- The fragments are each padded to a multiple of 512 bytes
- in the archive.
- On extraction, the list of fragments is collected from the
- header (including any extension headers), and the data
- is then read and written to the file at appropriate offsets.
- .It Va isextended
- If this is set to non-zero, the header will be followed by additional
- .Dq sparse header
- records.
- Each such record contains information about as many as 21 additional
- sparse blocks as shown here:
- .Bd -literal -offset indent
- struct gnu_sparse_header {
- struct {
- char offset[12];
- char numbytes[12];
- } sparse[21];
- char isextended[1];
- char padding[7];
- };
- .Ed
- .It Va realsize
- A binary representation of the file's complete size, with a much larger range
- than the POSIX file size.
- In particular, with
- .Cm M
- type files, the current entry is only a portion of the file.
- In that case, the POSIX size field will indicate the size of this
- entry; the
- .Va realsize
- field will indicate the total size of the file.
- .El
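- .Pp
- The differing magic and version conventions give a reader a cheap first
- guess at which header layout it is looking at.
- The sketch below is illustrative only: the enum, the function name, and
- the offset constant are hypothetical, and the offset of 257 is the sum of
- the sizes of the fields that precede magic in the structures shown in
- this document.
- Since pre-POSIX and GNU archives share the same magic and version bytes,
- the sketch does not try to tell them apart.

```c
#include <assert.h>
#include <string.h>

enum tar_flavor {
    TAR_OLD_STYLE,          /* no recognized magic value */
    TAR_POSIX_USTAR,        /* "ustar" + null, version "00" */
    TAR_GNU_OR_PRE_POSIX    /* "ustar" + space, version space + null */
};

#define TAR_MAGIC_OFF 257   /* magic[6] followed by version[2] */

/* Classify a 512-byte header record by its magic and version fields. */
enum tar_flavor
tar_header_flavor(const unsigned char *hdr)
{
    assert(hdr != NULL);
    /* The literal is split so that "\0" is not read as part of "\000". */
    if (memcmp(hdr + TAR_MAGIC_OFF, "ustar\0" "00", 8) == 0)
        return TAR_POSIX_USTAR;
    if (memcmp(hdr + TAR_MAGIC_OFF, "ustar  \0", 8) == 0)
        return TAR_GNU_OR_PRE_POSIX;
    return TAR_OLD_STYLE;
}
```

- A lenient reader would treat the result as a hint rather than a verdict
- and still validate the checksum and individual fields.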
- .Ss GNU tar pax archives
- GNU tar 1.14 (XXX check this XXX) and later will write
- pax interchange format archives when you specify the
- .Fl -posix
- flag.
- This format follows the pax interchange format closely,
- using some
- .Cm SCHILY
- tags and introducing new keywords to store sparse file information.
- There have been three iterations of the sparse file support, referred to
- as
- .Dq 0.0 ,
- .Dq 0.1 ,
- and
- .Dq 1.0 .
- .Bl -tag -width indent
- .It Cm GNU.sparse.numblocks , Cm GNU.sparse.offset , Cm GNU.sparse.numbytes , Cm GNU.sparse.size
- The
- .Dq 0.0
- format used an initial
- .Cm GNU.sparse.numblocks
- attribute to indicate the number of blocks in the file, a pair of
- .Cm GNU.sparse.offset
- and
- .Cm GNU.sparse.numbytes
- to indicate the offset and size of each block,
- and a single
- .Cm GNU.sparse.size
- to indicate the full size of the file.
- This is not the same as the size in the tar header because the
- latter value does not include the size of any holes.
- This format required that the order of attributes be preserved and
- relied on readers accepting multiple appearances of the same attribute
- names, which is not officially permitted by the standards.
- .It Cm GNU.sparse.map
- The
- .Dq 0.1
- format used a single attribute that stored a comma-separated
- list of decimal numbers.
- Each pair of numbers indicated the offset and size, respectively,
- of a block of data.
- This does not work well if the archive is extracted by an archiver
- that does not recognize this extension, since many pax implementations
- simply discard unrecognized attributes.
- .It Cm GNU.sparse.major , Cm GNU.sparse.minor , Cm GNU.sparse.name , Cm GNU.sparse.realsize
- The
- .Dq 1.0
- format stores the sparse block map in one or more 512-byte blocks
- prepended to the file data in the entry body.
- The pax attributes indicate the existence of this map
- (via the
- .Cm GNU.sparse.major
- and
- .Cm GNU.sparse.minor
- fields)
- and the full size of the file.
- The
- .Cm GNU.sparse.name
- holds the true name of the file.
- To avoid confusion, the name stored in the regular tar header
- is a modified name so that extraction errors will be apparent
- to users.
- .El
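- .Pp
- As an example of reading one of these formats, the comma-separated map
- used by the 0.1 format can be parsed as sketched below.
- The structure and function names are hypothetical, and the sketch does
- not guard against values that overflow 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sparse_block {
    uint64_t offset;    /* where the data belongs in the file */
    uint64_t numbytes;  /* how many bytes of data are stored */
};

/*
 * Parse "offset,size,offset,size,..." into out[0..max-1].
 * Returns the number of pairs, or -1 if the map is malformed,
 * has an odd number of values, or holds more than `max` pairs.
 */
int
gnu_sparse_parse_map(const char *map, struct sparse_block *out, size_t max)
{
    uint64_t nums[2];
    size_t count = 0;
    int which = 0;      /* 0: expecting an offset, 1: expecting a size */
    const char *p = map;

    assert(map != NULL && out != NULL);
    if (*p == '\0')
        return 0;
    for (;;) {
        uint64_t v = 0;

        if (*p < '0' || *p > '9')
            return -1;
        while (*p >= '0' && *p <= '9') {
            v = v * 10 + (uint64_t)(*p - '0');
            p++;
        }
        nums[which] = v;
        if (which == 1) {
            if (count == max)
                return -1;
            out[count].offset = nums[0];
            out[count].numbytes = nums[1];
            count++;
        }
        which ^= 1;
        if (*p == '\0')
            break;
        if (*p != ',')
            return -1;
        p++;
    }
    return which == 0 ? (int)count : -1;
}
```

- For the map string 0,512,4096,1024 this returns two blocks: 512 bytes
- at offset 0 and 1024 bytes at offset 4096.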
- .Ss Solaris Tar
- XXX More Details Needed XXX
- .Pp
- Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
- .Dq extended
- format that is fundamentally similar to pax interchange format,
- with the following differences:
- .Bl -bullet -compact -width indent
- .It
- Extended attributes are stored in an entry whose type is
- .Cm X ,
- not
- .Cm x ,
- as used by pax interchange format.
- The detailed format of this entry appears to be the same
- as detailed above for the
- .Cm x
- entry.
- .It
- An additional
- .Cm A
- header is used to store an ACL for the following regular entry.
- The body of this entry contains a seven-digit octal number
- followed by a zero byte, followed by the
- textual ACL description.
- The octal value is the number of ACL entries
- plus a constant that indicates the ACL type: 01000000
- for POSIX.1e ACLs and 03000000 for NFSv4 ACLs.
- .El
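- .Pp
- The layout of the
- .Cm A
- entry body described above can be sketched in Python as follows.
- The function and constant names are hypothetical.
- Separating the type constant from the entry count with a mask, and
- stripping trailing zero bytes from the text, are assumptions rather
- than documented Solaris behavior.

```python
from typing import Tuple

# Type constants as given in the text above.
ACL_POSIX1E = 0o1000000
ACL_NFS4 = 0o3000000

# Assumption: the entry count fits in the low six octal digits, so the
# type constant can be separated from the count with a mask.
TYPE_MASK = 0o7000000
COUNT_MASK = 0o0777777


def parse_solaris_acl_body(body: bytes) -> Tuple[str, int, str]:
    """Split an 'A' entry body into (acl_type, entry_count, acl_text)."""
    prefix, sep, rest = body.partition(b"\0")
    if not sep or len(prefix) != 7 or any(c not in b"01234567" for c in prefix):
        raise ValueError("expected seven octal digits followed by a zero byte")
    value = int(prefix.decode("ascii"), 8)
    type_bits = value & TYPE_MASK
    if type_bits == ACL_POSIX1E:
        acl_type = "POSIX.1e"
    elif type_bits == ACL_NFS4:
        acl_type = "NFSv4"
    else:
        raise ValueError("unrecognized ACL type constant")
    # Assumption: anything after the text is zero-byte padding and is dropped.
    text = rest.rstrip(b"\0").decode("utf-8", errors="replace")
    return acl_type, value & COUNT_MASK, text
```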
- .Ss AIX Tar
- XXX More details needed XXX
- .Pp
- AIX tar uses a ustar-formatted header with the type
- .Cm A
- for storing coded ACL information.
- Unlike the Solaris format, AIX tar writes this header after the
- regular file body to which it applies.
- The pathname in this header is either
- .Cm NFS4
- or
- .Cm AIXC
- to indicate the type of ACL stored.
- The actual ACL is stored in platform-specific binary format.
- .Ss Mac OS X Tar
- The tar distributed with Apple's Mac OS X stores most regular files
- as two separate files in the tar archive.
- The two files have the same name except that the first
- one has
- .Dq ._
- prepended to the last path element.
- This special file stores an AppleDouble-encoded
- binary blob with additional metadata about the second file,
- including ACL, extended attributes, and resources.
- To recreate the original file on disk, each
- separate file can be extracted and the Mac OS X
- .Fn copyfile
- function can be used to unpack the separate
- metadata file and apply it to the regular file.
- Conversely, the same function provides a
- .Dq pack
- option to encode the extended metadata from
- a file into a separate file whose contents
- can then be put into a tar archive.
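- .Pp
- The naming rule described above, in which
- .Dq ._
- is prepended to the last path element, can be sketched in Python.
- Both helper names are hypothetical, and the sketch only manipulates
- pathnames; it does not read or write AppleDouble data.

```python
import posixpath


def appledouble_name(path: str) -> str:
    """Return the name of the metadata entry that accompanies `path`."""
    # Tar pathnames use '/' separators, so posixpath is used on every platform.
    head, tail = posixpath.split(path)
    return posixpath.join(head, "._" + tail)


def is_appledouble_name(path: str) -> bool:
    """Report whether the last path element carries the '._' prefix."""
    return posixpath.basename(path).startswith("._")
```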
- .Pp
- Note that the Apple extended attributes interact
- badly with long filenames.
- Because the metadata file and the regular file are each stored
- with the full name, a separate set of long-name extensions must be
- included in the archive for each one, doubling the overhead
- required for files with long names.
- .Ss Summary of tar type codes
- The following list is a condensed summary of the type codes
- used in tar header records generated by different tar implementations.
- More details about specific implementations can be found above:
- .Bl -tag -compact -width XXX
- .It NUL
- Early tar programs stored a zero byte for regular files.
- .It Cm 0
- POSIX standard type code for a regular file.
- .It Cm 1
- POSIX standard type code for a hard link description.
- .It Cm 2
- POSIX standard type code for a symbolic link description.
- .It Cm 3
- POSIX standard type code for a character device node.
- .It Cm 4
- POSIX standard type code for a block device node.
- .It Cm 5
- POSIX standard type code for a directory.
- .It Cm 6
- POSIX standard type code for a FIFO.
- .It Cm 7
- POSIX reserved.
- .It Cm 7
- GNU tar used for pre-allocated files on some systems.
- .It Cm A
- Solaris tar ACL description stored prior to a regular file header.
- .It Cm A
- AIX tar ACL description stored after the file body.
- .It Cm D
- GNU tar directory dump.
- .It Cm K
- GNU tar long linkname for the following header.
- .It Cm L
- GNU tar long pathname for the following header.
- .It Cm M
- GNU tar multivolume marker, indicating the file is a
- continuation of a file from the previous volume.
- .It Cm N
- GNU tar long filename support.
- Deprecated.
- .It Cm S
- GNU tar sparse regular file.
- .It Cm V
- GNU tar tape/volume header name.
- .It Cm X
- Solaris tar general-purpose extension header.
- .It Cm g
- POSIX pax interchange format global extensions.
- .It Cm x
- POSIX pax interchange format per-file extensions.
- .El
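- .Pp
- As a rough illustration of how a reader might group the codes listed
- above, the following Python sketch sorts a single type byte into broad
- families.
- The function name and the family labels are hypothetical; the sets
- contain only the codes from the list above.

```python
# Codes taken from the summary list above.
POSIX_CODES = frozenset("01234567")
PAX_CODES = frozenset("gx")
VENDOR_CODES = frozenset("ADKLMNSVX")


def classify_typeflag(flag: bytes) -> str:
    """Map a one-byte tar type code to a broad family name."""
    if len(flag) != 1:
        raise ValueError("type code must be exactly one byte")
    if flag == b"\0":
        # Early tar programs stored a zero byte for regular files.
        return "pre-POSIX regular file"
    code = flag.decode("latin-1")
    if code in POSIX_CODES:
        # '7' is reserved by POSIX but is also used by GNU tar.
        return "POSIX"
    if code in PAX_CODES:
        return "pax extension"
    if code in VENDOR_CODES:
        return "vendor extension"
    return "unknown"
```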
- .Sh SEE ALSO
- .Xr ar 1 ,
- .Xr pax 1 ,
- .Xr tar 1
- .Sh STANDARDS
- The
- .Nm tar
- utility is no longer a part of POSIX or the Single UNIX Specification.
- It last appeared in
- .St -susv2 .
- It has been supplanted in subsequent standards by
- .Xr pax 1 .
- The ustar format is currently part of the specification for the
- .Xr pax 1
- utility.
- The pax interchange file format is new with
- .St -p1003.1-2001 .
- .Sh HISTORY
- A
- .Nm tar
- command appeared in Seventh Edition Unix, which was released in January 1979.
- It replaced the
- .Nm tp
- program from Fourth Edition Unix, which in turn replaced the
- .Nm tap
- program from First Edition Unix.
- John Gilmore's
- .Nm pdtar
- public-domain implementation (circa 1987) was highly influential
- and formed the basis of
- .Nm GNU tar
- (circa 1988).
- Joerg Schilling's
- .Nm star
- is another open-source (CDDL) archiver (originally developed
- circa 1985) that features complete support for pax interchange
- format.
- .Pp
- This documentation was written as part of the
- .Nm libarchive
- and
- .Nm bsdtar
- project by
- .An Tim Kientzle Aq kientzle@FreeBSD.org .
|