.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 23, 2011
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
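.Pp
As a rough illustration of the end-of-archive rule, a reader might compare each entry's pathname against the trailer name.
The sketch below is not taken from any particular implementation.
The function name cpio_is_trailer is hypothetical, and the code assumes that the stored pathname length counts the trailing NUL byte, as stated for the namesize field later in this page.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return nonzero if an entry's pathname marks the end of the archive.
 * "namesize" is assumed to include the terminating NUL byte.
 */
int
cpio_is_trailer(const char *name, size_t namesize)
{
    static const char trailer[] = "TRAILER!!!";

    /* sizeof(trailer) also counts the terminating NUL. */
    return namesize == sizeof(trailer) &&
        memcmp(name, trailer, sizeof(trailer)) == 0;
}
```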
.Ss PWB format
XXX Any documentation of the original PWB/UNIX 1.0 format? XXX
.Ss Old Binary Format
The old binary
.Nm
format stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Bd -literal -offset indent
struct header_old_cpio {
        unsigned short   c_magic;
        unsigned short   c_dev;
        unsigned short   c_ino;
        unsigned short   c_mode;
        unsigned short   c_uid;
        unsigned short   c_gid;
        unsigned short   c_nlink;
        unsigned short   c_rdev;
        unsigned short   c_mtime[2];
        unsigned short   c_namesize;
        unsigned short   c_filesize[2];
};
.Ed
.Pp
The
.Va unsigned short
fields here are 16-bit integer values; the two-element
.Va unsigned short
arrays each hold one 32-bit integer value.
The fields are as follows:
.Bl -tag -width indent
.It Va magic
The integer value octal 070707.
This value can be used to determine whether this archive is
written with little-endian or big-endian integers.
.It Va dev , Va ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va mode
The mode specifies both the regular permissions and the file type.
It consists of several bit fields as follows:
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
On some systems, this modifies the behavior of executables and/or directories.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va uid , Va gid
The numeric user id and group id of the owner.
.It Va nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va rdev
For block special and character special entries,
this field contains the associated device number.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
The four-byte integer is stored with the most-significant 16 bits first
followed by the least-significant 16 bits.
Each of the two 16-bit values is stored in machine-native byte order.
.It Va namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va filesize
The size of the file.
Note that this archive format is limited to
four gigabyte file sizes.
See
.Va mtime
above for a description of the storage of four-byte integers.
.El
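.Pp
The mode bit fields listed above can be decoded with a pair of masks.
The following is a minimal sketch.
The enum, macro, and function names are invented here for illustration and do not come from any existing header.

```c
/* Bit masks taken from the mode table above. */
#define CPIO_TYPE_MASK 0170000u
#define CPIO_PERM_MASK 0000777u

enum cpio_type {
    CPIO_UNKNOWN, CPIO_FIFO, CPIO_CHR, CPIO_DIR, CPIO_BLK,
    CPIO_REG, CPIO_LNK, CPIO_SOCK
};

/* Classify an entry by the file type bits of its mode field. */
enum cpio_type
cpio_file_type(unsigned int mode)
{
    switch (mode & CPIO_TYPE_MASK) {
    case 0010000u: return CPIO_FIFO;
    case 0020000u: return CPIO_CHR;
    case 0040000u: return CPIO_DIR;
    case 0060000u: return CPIO_BLK;
    case 0100000u: return CPIO_REG;
    case 0120000u: return CPIO_LNK;
    case 0140000u: return CPIO_SOCK;
    default:       return CPIO_UNKNOWN;
    }
}

/* Extract the nine read/write/execute permission bits. */
unsigned int
cpio_permissions(unsigned int mode)
{
    return mode & CPIO_PERM_MASK;
}
```

The SUID, SGID, and sticky bits can be tested the same way with their individual masks.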
.Pp
The pathname immediately follows the fixed header.
If the
.Va namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, padded with NUL
bytes to an even length.
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
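.Pp
Putting the byte-order and padding rules above together, a reader of the old binary format might decode fields roughly as follows.
This is a sketch that works on raw header bytes rather than on the struct itself, an assumption made here so that the result does not depend on the host byte order.
All function and macro names are hypothetical.

```c
#include <stdint.h>

#define CPIO_BIN_MAGIC 070707u

/* Read one 16-bit value from two raw bytes in the given byte order. */
uint16_t
cpio_bin16(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return (uint16_t)((p[0] << 8) | p[1]);
    return (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Inspect the two magic bytes at the start of a header.
 * Returns 0 for little-endian, 1 for big-endian, and -1 if the
 * magic value does not match in either byte order.
 */
int
cpio_bin_byte_order(const unsigned char *p)
{
    if (cpio_bin16(p, 0) == CPIO_BIN_MAGIC)
        return 0;
    if (cpio_bin16(p, 1) == CPIO_BIN_MAGIC)
        return 1;
    return -1;
}

/* Combine two 16-bit halves, most-significant half first, into 32 bits. */
uint32_t
cpio_bin32(const unsigned char *p, int big_endian)
{
    return ((uint32_t)cpio_bin16(p, big_endian) << 16) |
        cpio_bin16(p + 2, big_endian);
}

/* Round a pathname or file data length up to an even number of bytes. */
uint32_t
cpio_bin_padded(uint32_t n)
{
    return n + (n & 1u);
}
```

A caller would typically determine the byte order once from the magic field and then pass it to every later field read.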
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the old binary format, but
represents them as 6-character or 11-character octal values.
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the old binary format.
The name and file body follow the fixed header.
Unlike the old binary format, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
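.Pp
Because the octal fields are fixed-width and are not NUL-terminated, a string conversion routine that expects a terminator would run on into the next field.
One possible way to decode a single field is sketched below.
The function name cpio_parse_octal is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Parse a fixed-width octal field that is not NUL-terminated.
 * Returns 0 on success, or -1 if a non-octal character is found.
 */
int
cpio_parse_octal(const char *field, size_t width, uint64_t *out)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < width; i++) {
        if (field[i] < '0' || field[i] > '7')
            return -1;
        v = (v << 3) | (uint64_t)(field[i] - '0');
    }
    *out = v;
    return 0;
}
```

The result is 64 bits wide because an 11-character field can hold values up to 2^33 - 1, which do not fit in 32 bits.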
.Ss New ASCII Format
The "new" ASCII format uses 8-byte hexadecimal fields for
all numbers and separates device numbers into separate fields
for major and minor numbers.
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the old binary format above.
.Bl -tag -width indent
.It Va magic
The string
.Dq 070701 .
.It Va check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the last one that
appears in the archive.
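.Pp
The hexadecimal fields and the four-byte alignment rule can be illustrated with a short sketch.
The names below are hypothetical.
The header size of 110 bytes follows from the struct above: a 6-byte magic plus thirteen 8-byte fields.

```c
#include <stdint.h>

/* 6-byte magic plus 13 fields of 8 bytes each. */
#define CPIO_NEWC_HEADER_SIZE 110u

/* Parse one 8-character hexadecimal field; returns 0, or -1 on error. */
int
cpio_parse_hex8(const char *field, uint32_t *out)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        char c = field[i];
        uint32_t d;

        if (c >= '0' && c <= '9')
            d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            d = (uint32_t)(c - 'A' + 10);
        else
            return -1;
        v = (v << 4) | d;
    }
    *out = v;
    return 0;
}

/* NUL bytes needed after the pathname so header plus name is a multiple of 4. */
uint32_t
cpio_newc_name_pad(uint32_t namesize)
{
    return (4u - ((CPIO_NEWC_HEADER_SIZE + namesize) & 3u)) & 3u;
}

/* NUL bytes needed after the file data. */
uint32_t
cpio_newc_data_pad(uint32_t filesize)
{
    return (4u - (filesize & 3u)) & 3u;
}
```

For the trailer entry, whose namesize is 11, the header and name occupy 121 bytes, so three padding bytes follow the name.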
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the magic field is set
to
.Dq 070702
and the
.Va check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
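.Pp
The checksum described above amounts to a plain byte sum that wraps at 32 bits.
A minimal sketch, with a hypothetical function name, might look like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Sum all bytes of the file data as unsigned values, modulo 2^32. */
uint32_t
cpio_check_sum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += data[i];    /* unsigned overflow wraps as required */
    return sum;
}
```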
.Ss HP variants
The
.Nm cpio
implementation distributed with HPUX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Workbench
derived from
.At v6
that was used internally at AT&T.
Both the old binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes? XXX
.Sh BUGS
The
.Dq CRC
format is misnamed, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The old binary format is limited to 16 bits for user id,
group id, device, and inode numbers.
It is limited to 4 gigabyte file sizes.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.
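.Pp
One way a writer could synthesize small inode numbers is to assign them sequentially the first time each device/inode pair is seen.
The sketch below only illustrates that idea and does not describe how any existing cpio writer works.
The structure and function names, the fixed table capacity, and the linear search are all assumptions chosen for brevity.

```c
#include <stddef.h>
#include <stdint.h>

#define REMAP_MAX 1024    /* arbitrary capacity for this sketch */

struct remap_entry {
    uint64_t dev;
    uint64_t ino;
};

struct remap_table {
    struct remap_entry e[REMAP_MAX];
    size_t count;
};

/*
 * Return a small synthetic inode number for the pair (dev, ino).
 * The same pair always yields the same number, so entries that are
 * hardlinks to one file still share a value.
 * Valid numbers start at 1; 0 is returned when the table is full.
 */
uint32_t
remap_inode(struct remap_table *t, uint64_t dev, uint64_t ino)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (t->e[i].dev == dev && t->e[i].ino == ino)
            return (uint32_t)(i + 1);
    if (t->count == REMAP_MAX)
        return 0;
    t->e[t->count].dev = dev;
    t->e[t->count].ino = ino;
    t->count++;
    return (uint32_t)t->count;
}
```

A writer using this approach would store a constant device number together with the synthetic inode number, and would need a larger or dynamically sized table for real archives.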