- .\" Copyright (c) 2007 Tim Kientzle
- .\" All rights reserved.
- .\"
- .\" Redistribution and use in source and binary forms, with or without
- .\" modification, are permitted provided that the following conditions
- .\" are met:
- .\" 1. Redistributions of source code must retain the above copyright
- .\" notice, this list of conditions and the following disclaimer.
- .\" 2. Redistributions in binary form must reproduce the above copyright
- .\" notice, this list of conditions and the following disclaimer in the
- .\" documentation and/or other materials provided with the distribution.
- .\"
- .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
- .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
- .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
- .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
- .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
- .\" SUCH DAMAGE.
- .\"
- .\" $FreeBSD$
- .\"
- .Dd December 23, 2011
- .Dt CPIO 5
- .Os
- .Sh NAME
- .Nm cpio
- .Nd format of cpio archive files
- .Sh DESCRIPTION
- The
- .Nm
- archive format collects any number of files, directories, and other
- file system objects (symbolic links, device nodes, etc.) into a single
- stream of bytes.
- .Ss General Format
- Each file system object in a
- .Nm
- archive comprises a header record with basic numeric metadata
- followed by the full pathname of the entry and the file data.
- The header record stores a series of integer values that generally
- follow the fields in
- .Va struct stat .
- (See
- .Xr stat 2
- for details.)
- The variants differ primarily in how they store those integers
- (binary, octal, or hexadecimal).
- The length of the pathname is stored in the header.
- The end of the archive is indicated by a special record with
- the pathname
- .Dq TRAILER!!! .
- .Ss PWB format
- XXX Any documentation of the original PWB/UNIX 1.0 format? XXX
- .Ss Old Binary Format
- The old binary
- .Nm
- format stores numbers as 2-byte and 4-byte binary values.
- Each entry begins with a header in the following format:
- .Bd -literal -offset indent
- struct header_old_cpio {
- unsigned short c_magic;
- unsigned short c_dev;
- unsigned short c_ino;
- unsigned short c_mode;
- unsigned short c_uid;
- unsigned short c_gid;
- unsigned short c_nlink;
- unsigned short c_rdev;
- unsigned short c_mtime[2];
- unsigned short c_namesize;
- unsigned short c_filesize[2];
- };
- .Ed
- .Pp
- The
- .Va unsigned short
- fields here are 16-bit integer values.
- The two-element arrays
- .Va c_mtime
- and
- .Va c_filesize
- each hold a single 32-bit integer value split across two 16-bit words.
- The fields are as follows:
- .Bl -tag -width indent
- .It Va magic
- The integer value octal 070707.
- This value can be used to determine whether this archive is
- written with little-endian or big-endian integers.
- .It Va dev , Va ino
- The device and inode numbers from the disk.
- These are used by programs that read
- .Nm
- archives to determine when two entries refer to the same file.
- Programs that synthesize
- .Nm
- archives should be careful to set these to distinct values for each entry.
- .It Va mode
- The mode specifies both the regular permissions and the file type.
- It consists of several bit fields as follows:
- .Bl -tag -width "MMMMMMM" -compact
- .It 0170000
- This masks the file type bits.
- .It 0140000
- File type value for sockets.
- .It 0120000
- File type value for symbolic links.
- For symbolic links, the link body is stored as file data.
- .It 0100000
- File type value for regular files.
- .It 0060000
- File type value for block special devices.
- .It 0040000
- File type value for directories.
- .It 0020000
- File type value for character special devices.
- .It 0010000
- File type value for named pipes or FIFOs.
- .It 0004000
- SUID bit.
- .It 0002000
- SGID bit.
- .It 0001000
- Sticky bit.
- On some systems, this modifies the behavior of executables and/or directories.
- .It 0000777
- The lower 9 bits specify read/write/execute permissions
- for world, group, and user following standard POSIX conventions.
- .El
- .It Va uid , Va gid
- The numeric user id and group id of the owner.
- .It Va nlink
- The number of links to this file.
- Directories always have a value of at least two here.
- Note that hardlinked files include file data with every copy in the archive.
- .It Va rdev
- For block special and character special entries,
- this field contains the associated device number.
- For all other entry types, it should be set to zero by writers
- and ignored by readers.
- .It Va mtime
- Modification time of the file, indicated as the number
- of seconds since the start of the epoch,
- 00:00:00 UTC January 1, 1970.
- The four-byte integer is stored with the most-significant 16 bits first
- followed by the least-significant 16 bits.
- Each of the two 16-bit values is stored in machine-native byte order.
- .It Va namesize
- The number of bytes in the pathname that follows the header.
- This count includes the trailing NUL byte.
- .It Va filesize
- The size of the file.
- Note that this archive format is limited to
- four gigabyte file sizes.
- See
- .Va mtime
- above for a description of the storage of four-byte integers.
- .El
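- .Pp
- As a minimal sketch of the decoding rules above, the following C helpers
- detect the byte order from the magic value and reassemble a four-byte field.
- The names
- .Fn cpio_bin_order ,
- .Fn cpio_bin_u16 ,
- and
- .Fn cpio_bin_u32
- are hypothetical and are not part of any existing library.

```c
#include <stdint.h>

/* Byte order of the 16-bit words in an old binary header. */
enum cpio_order { CPIO_ORDER_LE, CPIO_ORDER_BE, CPIO_ORDER_UNKNOWN };

/*
 * Inspect the first two bytes of a header.
 * The magic value octal 070707 is 0x71C7, so the order in which
 * the bytes 0x71 and 0xC7 appear reveals the writer's byte order.
 */
static enum cpio_order
cpio_bin_order(const unsigned char *p)
{
	if (p[0] == 0xC7 && p[1] == 0x71)
		return CPIO_ORDER_LE;
	if (p[0] == 0x71 && p[1] == 0xC7)
		return CPIO_ORDER_BE;
	return CPIO_ORDER_UNKNOWN;
}

/* Read one 16-bit field in the given byte order (little-endian unless BE). */
static uint16_t
cpio_bin_u16(const unsigned char *p, enum cpio_order order)
{
	if (order == CPIO_ORDER_BE)
		return (uint16_t)((p[0] << 8) | p[1]);
	return (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Read a four-byte field such as mtime or filesize: the
 * most-significant 16-bit word comes first, and each word
 * uses the archive's byte order.
 */
static uint32_t
cpio_bin_u32(const unsigned char *p, enum cpio_order order)
{
	uint32_t hi = cpio_bin_u16(p, order);
	uint32_t lo = cpio_bin_u16(p + 2, order);

	return (hi << 16) | lo;
}
```

- .Pp
- Note that on a little-endian writer the resulting four bytes are in neither
- plain little-endian nor plain big-endian order, which is why the two words
- are decoded separately in this sketch.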
- .Pp
- The pathname immediately follows the fixed header.
- If the
- .Va namesize
- is odd, an additional NUL byte is added after the pathname.
- The file data is then appended, padded with NUL
- bytes to an even length.
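- .Pp
- The padding rule can be illustrated with a short C sketch that computes
- how many bytes one entry occupies in an old binary archive.
- The 26-byte header size follows from the thirteen 16-bit fields in the
- structure shown earlier; the names
- .Dv CPIO_BIN_HEADER_SIZE
- and
- .Fn cpio_bin_entry_size
- are assumptions made for this example.

```c
#include <stdint.h>

/* Thirteen 16-bit fields make up the fixed header. */
#define CPIO_BIN_HEADER_SIZE 26u

/*
 * Total bytes occupied by one entry: fixed header, pathname
 * (namesize already counts the trailing NUL) padded to an even
 * length, and file data padded to an even length.
 * 64-bit arithmetic avoids overflow for sizes near 4 gigabytes.
 */
static uint64_t
cpio_bin_entry_size(uint16_t namesize, uint32_t filesize)
{
	uint64_t name_bytes = (uint64_t)namesize + (namesize & 1u);
	uint64_t data_bytes = (uint64_t)filesize + (filesize & 1u);

	return CPIO_BIN_HEADER_SIZE + name_bytes + data_bytes;
}
```

- .Pp
- For instance, the trailer entry has a
- .Va namesize
- of 11 (ten characters plus the NUL byte) and no data, so under these
- assumptions it occupies 26 + 12 = 38 bytes.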
- .Pp
- Hardlinked files are not given special treatment;
- the full file contents are included with each copy of the
- file.
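- .Pp
- The bit fields of the
- .Va mode
- value described above can be separated with simple masks.
- The following C sketch is illustrative only; the helper names are
- hypothetical, and the type letters merely follow the convention of
- .Xr ls 1 .

```c
#include <stdint.h>

#define CPIO_MODE_TYPE_MASK 0170000u	/* file type bits */
#define CPIO_MODE_PERM_MASK 0000777u	/* read/write/execute bits */

/* Return an ls(1)-style type letter, or '?' for an unrecognized type. */
static char
cpio_mode_type(uint32_t mode)
{
	switch (mode & CPIO_MODE_TYPE_MASK) {
	case 0140000u: return 's';	/* socket */
	case 0120000u: return 'l';	/* symbolic link */
	case 0100000u: return '-';	/* regular file */
	case 0060000u: return 'b';	/* block special device */
	case 0040000u: return 'd';	/* directory */
	case 0020000u: return 'c';	/* character special device */
	case 0010000u: return 'p';	/* named pipe or FIFO */
	default:       return '?';
	}
}

/* Only block and character special entries carry a meaningful rdev. */
static int
cpio_mode_has_rdev(uint32_t mode)
{
	char t = cpio_mode_type(mode);

	return t == 'b' || t == 'c';
}

/* Extract the lower nine permission bits. */
static uint32_t
cpio_mode_perms(uint32_t mode)
{
	return mode & CPIO_MODE_PERM_MASK;
}
```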
- .Ss Portable ASCII Format
- .St -susv2
- standardized an ASCII variant that is portable across all
- platforms.
- It is commonly known as the
- .Dq old character
- format or as the
- .Dq odc
- format.
- It stores the same numeric fields as the old binary format, but
- represents them as 6-character or 11-character octal values.
- .Bd -literal -offset indent
- struct cpio_odc_header {
- char c_magic[6];
- char c_dev[6];
- char c_ino[6];
- char c_mode[6];
- char c_uid[6];
- char c_gid[6];
- char c_nlink[6];
- char c_rdev[6];
- char c_mtime[11];
- char c_namesize[6];
- char c_filesize[11];
- };
- .Ed
- .Pp
- The fields are identical to those in the old binary format.
- The name and file body follow the fixed header.
- Unlike the old binary format, there is no additional padding
- after the pathname or file contents.
- If the files being archived are themselves entirely ASCII, then
- the resulting archive will be entirely ASCII, except for the
- NUL byte that terminates the name field.
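- .Pp
- A reader for this format mostly needs a fixed-width octal parser.
- The sketch below is one possible approach, not a reference implementation;
- .Fn cpio_parse_octal
- and
- .Fn cpio_odc_sizes
- are hypothetical names, and the field offsets are derived from the
- structure shown above (76 bytes in total).

```c
#include <stddef.h>
#include <stdint.h>

#define CPIO_ODC_HEADER_SIZE 76u	/* 8 * 6 + 11 + 6 + 11 */
#define CPIO_ODC_OFF_NAMESIZE 59u	/* after c_mtime */
#define CPIO_ODC_OFF_FILESIZE 65u	/* after c_namesize */

/*
 * Parse a fixed-width octal field that is not NUL-terminated.
 * Returns 0 on success, -1 if a non-octal character is found.
 */
static int
cpio_parse_octal(const char *field, size_t width, uint64_t *out)
{
	uint64_t v = 0;

	for (size_t i = 0; i < width; i++) {
		if (field[i] < '0' || field[i] > '7')
			return -1;
		v = (v << 3) | (uint64_t)(field[i] - '0');
	}
	*out = v;
	return 0;
}

/*
 * Extract the pathname length and file size from a 76-byte header.
 * Returns 0 on success, -1 if the magic or a size field is malformed.
 */
static int
cpio_odc_sizes(const char *hdr, uint64_t *namesize, uint64_t *filesize)
{
	uint64_t magic;

	if (cpio_parse_octal(hdr, 6, &magic) != 0 || magic != 070707)
		return -1;
	if (cpio_parse_octal(hdr + CPIO_ODC_OFF_NAMESIZE, 6, namesize) != 0)
		return -1;
	return cpio_parse_octal(hdr + CPIO_ODC_OFF_FILESIZE, 11, filesize);
}
```

- .Pp
- Because there is no padding in this format, the next header starts exactly
- .Dv CPIO_ODC_HEADER_SIZE
- plus the two decoded sizes after the start of the current one.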
- .Ss New ASCII Format
- The
- .Dq new
- ASCII format uses 8-byte hexadecimal fields for
- all numbers and separates device numbers into separate fields
- for major and minor numbers.
- .Bd -literal -offset indent
- struct cpio_newc_header {
- char c_magic[6];
- char c_ino[8];
- char c_mode[8];
- char c_uid[8];
- char c_gid[8];
- char c_nlink[8];
- char c_mtime[8];
- char c_filesize[8];
- char c_devmajor[8];
- char c_devminor[8];
- char c_rdevmajor[8];
- char c_rdevminor[8];
- char c_namesize[8];
- char c_check[8];
- };
- .Ed
- .Pp
- Except as specified below, the fields here match those specified
- for the old binary format above.
- .Bl -tag -width indent
- .It Va magic
- The string
- .Dq 070701 .
- .It Va check
- This field is always set to zero by writers and ignored by readers.
- See the next section for more details.
- .El
- .Pp
- The pathname is followed by NUL bytes so that the total size
- of the fixed header plus pathname is a multiple of four.
- Likewise, the file data is padded to a multiple of four bytes.
- Note that this format supports only 4 gigabyte files (unlike the
- older ASCII format, which supports 8 gigabyte files).
- .Pp
- In this format, hardlinked files are handled by setting the
- filesize to zero for each entry except the last one that
- appears in the archive.
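- .Pp
- The field encoding and the padding rules of this format can be sketched in C
- as follows.
- The helper names are hypothetical, the 110-byte header size is derived from
- the structure above (6 + 13 * 8), and accepting both upper-case and
- lower-case hexadecimal digits is an assumption of this example rather than
- something this document specifies.

```c
#include <stdint.h>

#define CPIO_NEWC_HEADER_SIZE 110u	/* 6-byte magic + 13 fields * 8 */

/*
 * Parse one 8-character hexadecimal field (not NUL-terminated).
 * Returns 0 on success, -1 on an invalid digit.
 */
static int
cpio_parse_hex8(const char *field, uint32_t *out)
{
	uint32_t v = 0;

	for (int i = 0; i < 8; i++) {
		char c = field[i];
		uint32_t d;

		if (c >= '0' && c <= '9')
			d = (uint32_t)(c - '0');
		else if (c >= 'A' && c <= 'F')
			d = (uint32_t)(c - 'A') + 10u;
		else if (c >= 'a' && c <= 'f')
			d = (uint32_t)(c - 'a') + 10u;
		else
			return -1;
		v = (v << 4) | d;
	}
	*out = v;
	return 0;
}

/* NUL bytes needed after the pathname so header + name is a multiple of 4. */
static uint32_t
cpio_newc_name_pad(uint32_t namesize)
{
	return (4u - ((CPIO_NEWC_HEADER_SIZE + namesize) & 3u)) & 3u;
}

/* NUL bytes needed after the file data to reach a multiple of 4. */
static uint32_t
cpio_newc_data_pad(uint32_t filesize)
{
	return (4u - (filesize & 3u)) & 3u;
}
```

- .Pp
- With these definitions, the trailer entry (a
- .Va namesize
- of 11) needs three padding bytes, since 110 + 11 = 121 and the next
- multiple of four is 124.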
- .Ss New CRC Format
- The CRC format is identical to the new ASCII format described
- in the previous section except that the magic field is set
- to
- .Dq 070702
- and the
- .Va check
- field is set to the sum of all bytes in the file data.
- This sum is computed treating all bytes as unsigned values
- and using unsigned arithmetic.
- Only the least-significant 32 bits of the sum are stored.
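- .Pp
- The checksum described here amounts to a running byte sum that wraps
- at 32 bits.
- A minimal C sketch, using the hypothetical name
- .Fn cpio_check_sum ,
- might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add the bytes of a buffer to a running checksum.
 * Bytes are treated as unsigned, and uint32_t arithmetic wraps,
 * so only the least-significant 32 bits are kept.
 * Pass 0 as the initial sum and feed the file data in any chunking.
 */
static uint32_t
cpio_check_sum(const void *data, size_t len, uint32_t sum)
{
	const unsigned char *p = data;

	while (len-- > 0)
		sum += *p++;
	return sum;
}
```

- .Pp
- Because the sum is order-independent, this kind of check does not notice
- bytes that have been reordered within the file data.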
- .Ss HP variants
- The
- .Nm cpio
- implementation distributed with HP-UX used XXXX but stored
- device numbers differently XXX.
- .Ss Other Extensions and Variants
- Sun Solaris uses additional file types to store extended file
- data, including ACLs and extended attributes, as special
- entries in cpio archives.
- .Pp
- XXX Others? XXX
- .Sh SEE ALSO
- .Xr cpio 1 ,
- .Xr tar 5
- .Sh STANDARDS
- The
- .Nm cpio
- utility is no longer a part of POSIX or the Single UNIX Specification.
- It last appeared in
- .St -susv2 .
- It has been supplanted in subsequent standards by
- .Xr pax 1 .
- The portable ASCII format is currently part of the specification for the
- .Xr pax 1
- utility.
- .Sh HISTORY
- The original cpio utility was written by Dick Haight
- while working in AT&T's Unix Support Group.
- It appeared in 1977 as part of PWB/UNIX 1.0, the
- .Dq Programmer's Work Bench
- derived from
- .At v6
- that was used internally at AT&T.
- Both the old binary and old character formats were in use
- by 1980, according to the System III source released
- by SCO under their
- .Dq Ancient Unix
- license.
- The character format was adopted as part of
- .St -p1003.1-88 .
- XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes? XXX
- .Sh BUGS
- The
- .Dq CRC
- format is mis-named, as it uses a simple checksum and
- not a cyclic redundancy check.
- .Pp
- The old binary format is limited to 16 bits for user id,
- group id, device, and inode numbers.
- It is limited to 4 gigabyte file sizes.
- .Pp
- The portable ASCII format is limited to 18 bits for
- the user id, group id, device, and inode numbers.
- It is limited to 8 gigabyte file sizes.
- .Pp
- The new ASCII format is limited to 4 gigabyte file sizes.
- .Pp
- None of the cpio formats store user or group names,
- which are essential when moving files between systems with
- dissimilar user or group numbering.
- .Pp
- Especially when writing older cpio variants, it may be necessary
- to map actual device/inode values to synthesized values that
- fit the available fields.
- With very large filesystems, this may be necessary even for
- the newer formats.
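- .Pp
- One way to synthesize such values is to hand out small sequential inode
- numbers, reusing the same number whenever the same device/inode pair
- reappears so that hard links are still recognizable.
- The following C sketch is only an illustration under simplifying assumptions:
- the table is a fixed-size array searched linearly, and the names
- .Vt struct cpio_ino_map
- and
- .Fn cpio_ino_map_get
- are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CPIO_MAP_MAX 1024	/* arbitrary capacity for this sketch */

struct cpio_ino_map {
	struct { uint64_t dev, ino; } keys[CPIO_MAP_MAX];
	size_t count;
};

/*
 * Return a synthesized inode number (starting at 1) for the given
 * device/inode pair.  The same pair always maps to the same number.
 * Returns 0 if the table is full or if the next number would exceed
 * 'limit', the largest value the target header field can hold
 * (for example 0177777 for 16 bits or 0777777 for 18 bits).
 */
static uint32_t
cpio_ino_map_get(struct cpio_ino_map *m, uint64_t dev, uint64_t ino,
    uint32_t limit)
{
	for (size_t i = 0; i < m->count; i++) {
		if (m->keys[i].dev == dev && m->keys[i].ino == ino)
			return (uint32_t)(i + 1);
	}
	if (m->count >= CPIO_MAP_MAX || m->count + 1 > limit)
		return 0;
	m->keys[m->count].dev = dev;
	m->keys[m->count].ino = ino;
	m->count++;
	return (uint32_t)m->count;
}
```

- .Pp
- A writer using this approach would store a constant in the device field and
- the returned number in the inode field; a real implementation would likely
- replace the linear search with a hash table.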