  1. ziplimit.txt
  2. Zip 3 and UnZip 6 now support many of the extended limits of Zip64.
  3. A) Hard limits of the Zip archive format:
  4. Number of entries in Zip archive: 64 k (2^16 - 1 entries)
  5. Compressed size of archive entry: 4 GByte (2^32 - 1 Bytes)
  6. Uncompressed size of entry: 4 GByte (2^32 - 1 Bytes)
  7. Size of single-volume Zip archive: 4 GByte (2^32 - 1 Bytes)
  8. Per-volume size of multi-volume archives: 4 GByte (2^32 - 1 Bytes)
  9. Number of parts for multi-volume archives: 64 k (2^16 - 1 parts)
  10. Total size of multi-volume archive: 256 TByte (4G * 64k)
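The figures in this list follow directly from the widths of the fields that store them. As a quick check of the arithmetic, here is a small Python sketch; the constant names are chosen for this example only and do not come from the Info-ZIP sources.

```python
# Largest values that fit into unsigned 2-byte and 4-byte fields.
MAX_U16 = 2**16 - 1          # entries per archive, parts per multi-volume set
MAX_U32 = 2**32 - 1          # bytes per entry, bytes per volume

# Upper bound for a multi-volume archive: every part at maximum size.
total_multivolume = MAX_U32 * MAX_U16

print(MAX_U16)                           # 65535
print(MAX_U32)                           # 4294967295
print(round(total_multivolume / 2**40))  # size in TByte (binary), rounded
```

The product is slightly below 2^48 Bytes, which the list rounds to 256 TByte.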
  11. The number of archive entries and of multivolume parts are limited by
  12. the structure of the "end-of-central-directory" record, where these
  13. numbers are stored in 2-Byte fields.
  14. Some Zip and/or UnZip implementations (for example Info-ZIP's) allow
  15. handling of archives with more than 64k entries. (The information
  16. from "number of entries" field in the "end-of-central-directory" record
  17. is not really necessary to retrieve the contents of a Zip archive;
  18. it should rather be used for consistency checks.)
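To make the 2-byte fields mentioned above more concrete, the following Python sketch builds a small archive in memory with the standard zipfile module and unpacks its end-of-central-directory record with struct. The helper names read_eocd and make_demo_archive are invented for this illustration. Locating the record with rfind is a simplification that assumes the signature does not also occur inside an archive comment.

```python
import io
import struct
import zipfile

# End-of-central-directory record, 22 bytes, little-endian:
# signature, this disk number, disk holding the central directory,
# entries on this disk, total entries, central directory size,
# central directory offset, archive comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIGNATURE = b"PK\x05\x06"


def read_eocd(archive_bytes):
    """Locate and unpack the end-of-central-directory record."""
    pos = archive_bytes.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    size = struct.calcsize(EOCD_FORMAT)
    return struct.unpack(EOCD_FORMAT, archive_bytes[pos:pos + size])


def make_demo_archive(n_entries):
    """Create an in-memory archive with n_entries tiny members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for i in range(n_entries):
            zf.writestr(f"file{i}.txt", b"demo")
    return buf.getvalue()


fields = read_eocd(make_demo_archive(3))
total_entries = fields[4]    # stored in a 2-byte ("H") field
print(total_entries)
```

The four "H" items in the format string are the 2-byte fields; two of them hold the disk numbers and two hold the entry counts, which is where the 64k limits come from.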
  19. Length of an archive entry name: 64 kByte (2^16 - 1)
  20. Length of archive member comment: 64 kByte (2^16 - 1)
  21. Total length of "extra field": 64 kByte (2^16 - 1)
  22. Length of a single e.f. block: 64 kByte (2^16 - 1)
  23. Length of archive comment: 64 kByte (2^16 - 1)
  24. Additional limitation claimed by PKWARE:
  25. Size of local-header structure (fixed fields of 30 Bytes + filename +
  26. local extra field): < 64 kByte
  27. Size of central-directory structure (46 Bytes + filename +
  28. central extra field + member comment): < 64 kByte
  29. Note:
  30. In 2001, PKWARE published version 4.5 of the Zip format specification
  31. (together with the release of PKZIP for Windows 4.5). This specification
  32. defines new extra field blocks that make it possible to exceed the size limits of the
  33. standard zipfile structures. In this extended Zip format, the size limits
  34. of zip entries (and the complete zip archive) have been extended to
  35. (2^64 - 1) Bytes and the maximum number of archive entries to (2^32-1).
  36. Zip 3.0 supports these Zip64 extensions and should be released shortly.
  37. UnZip 6.0 should support these standards.
  38. B) Implementation limits of UnZip:
  39. Note:
  40. This section should be updated when UnZip 6.0 is near release.
  41. 1. Size limits caused by file I/O and decompression handling:
  42. Size of Zip archive: 2 GByte (2^31 - 1 Bytes)
  43. Compressed size of archive entry: 2 GByte (2^31 - 1 Bytes)
  44. Note: On some systems, UnZip may support archive sizes up to 4 GByte.
  45. To get this support, the target environment has to meet the following
  46. requirements:
  47. a) The compiler's intrinsic "long" data types must be able to hold
  48. integer numbers of 2^32. In other words - the standard intrinsic
  49. integer types "long" and "unsigned long" have to be wider than
  50. 32 bit.
  51. b) The system has to supply a C runtime library that is compatible
  52. with the more-than-32-bit-wide "long int" type of condition a).
  53. c) The standard file positioning functions fseek(), ftell() (and/or
  54. the Unix style lseek() and tell() functions) have to be capable
  55. of moving to absolute file offsets of up to 4 GByte from the file
  56. start.
  57. On 32-bit CPU hardware, you generally cannot expect that a C compiler
  58. provides a "long int" type that is wider than 32-bit. So, many of the
  59. most popular systems (i386, PowerPC, 680x0, et al.) are out of luck.
  60. You may find environments that meet all requirements on systems
  61. with 64-bit CPU hardware. Examples might be Cray number crunchers
  62. or Compaq (formerly DEC) Alpha AXP machines.
  63. The number of Zip archive entries is unlimited. The "number-of-entries"
  64. field of the "end-of-central-dir" record is checked against the "number
  65. of entries found in the central directory" modulo 64k (2^16).
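The consistency check described above can be sketched in a few lines of Python. This is an illustration of the idea only, not a transcription of UnZip's code; the function name entry_count_is_consistent is hypothetical.

```python
def entry_count_is_consistent(eocd_field, entries_found):
    """Compare the 2-byte count from the end-of-central-dir record with
    the number of entries actually found, reduced modulo 2^16."""
    return eocd_field == entries_found % 2**16


# An archive with 70000 entries stores 70000 - 65536 = 4464 in the field.
print(entry_count_is_consistent(4464, 70000))   # True
print(entry_count_is_consistent(4464, 70001))   # False
```

Because only the low 16 bits are compared, the check still catches most truncated or damaged central directories without imposing the 64k limit on the archive.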
  66. Multi-volume archive extraction is not supported.
  67. Memory requirements are mostly independent of the archive size
  68. and archive contents.
  69. In general, UnZip needs a fixed amount of internal buffer space
  70. plus the size to hold the complete information of the currently
  71. processed entry's local header. Here, a large extra field
  72. (could be up to 64 kByte) may exceed the available memory
  73. for MSDOS 16-bit executables (when they were compiled in small
  74. or medium memory model, with a fixed 64kByte limit on data space).
  75. The other exception where memory requirements scale with "larger"
  76. archives is the "restore directory attributes" feature. Here, the
  77. directory attributes info for each restored directory has to be held
  78. in memory until the whole archive has been processed. So, the amount
  79. of memory needed to keep this info scales with the number of restored
  80. directories and may cause memory problems when a lot of directories
  81. are restored in a single run.
  82. C) Implementation limits of the Zip executables:
  83. Note:
  84. This section has been updated to reflect Zip 3.0.
  85. 1. Size limits caused by file I/O and compression handling:
  86. Without Zip64 extensions:
  87. Size of Zip archive: 2 GByte (2^31 - 1 Bytes)
  88. Compressed size of archive entry: 2 GByte (2^31 - 1 Bytes)
  89. Uncompressed size of entry: 2 GByte (2^31 - 1 Bytes),
  90. (could/should be 4 GBytes...)
  91. Using Zip64 extensions:
  92. Size of Zip archive: 2^63 - 1 Bytes
  93. Compressed size of archive entry: 2^63 - 1 Bytes
  94. Uncompressed size of entry: 2^63 - 1 Bytes
  95. Multi-volume archive creation now supported in the form of split
  96. archives. Currently up to 99,999 splits are supported.
  97. 2. Limits caused by handling of archive contents lists
  98. 2.1. Number of archive entries (freshen, update, delete)
  99. a) 16-bit executable: 64k (2^16 - 1) or 32k (2^15 - 1),
  100. (unsigned vs. signed type of size_t)
  101. a1) 16-bit executable: <16k ((2^16)/4)
  102. (The smaller limit a1) results from the array size limit of
  103. the "qsort()" function.)
  104. 32-bit executables: <1G ((2^32)/4)
  105. (usual system limit of the "qsort()" function on 32-bit systems)
  106. b) stack space needed by qsort to sort list of archive entries
  107. NOTE: In the current executables, overflows of limits a) and b) are NOT
  108. checked!
  109. c) amount of free memory to hold "central directory information" of
  110. all archive entries; one entry needs:
  111. 96 bytes (32-bit) or 80 bytes (16-bit)
  112. + 3 * length of entry name
  113. + length of zip entry comment (when present)
  114. + length of extra field(s) (when present, e.g.: UT needs 9 bytes)
  115. + some bytes for book-keeping of memory allocation
  116. Conclusion:
  117. For systems with limited memory space (MSDOS, small AMIGAs, other
  118. environments without virtual memory), the number of archive entries
  119. is most often limited by condition c).
  120. For example, with approx. 100 kBytes of free memory after loading and
  121. initializing the program, a 16-bit DOS Zip cannot process more than 600
  122. to 1000 (+) archive entries. (For the 16-bit Windows DLL or the 16-bit
  123. OS/2 port, limit c) is less important because Windows or OS/2 executables
  124. are not restricted to the 1024k area of real mode memory. These 16-bit
  125. ports are limited by conditions a1) and b), say: at maximum approx.
  126. 16000 entries!)
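The estimate in the example above can be reproduced with the per-entry formula from condition c). The Python sketch below is a back-of-the-envelope calculation under stated assumptions: 12-character entry names (DOS 8.3 style), one UT extra field of 9 bytes, no entry comments, and 8 bytes of allocation book-keeping per entry. The book-keeping figure and the name length are assumptions made for this example; the text does not specify them.

```python
def bytes_per_entry(name_len, comment_len=0, extra_len=0,
                    base=80, bookkeeping=8):
    """Memory needed for one entry's central directory information.

    base is 80 for 16-bit and 96 for 32-bit executables (from the text);
    bookkeeping is an assumed allocator overhead.
    """
    return base + 3 * name_len + comment_len + extra_len + bookkeeping


def max_entries(free_bytes, **entry):
    """Number of entries that fit into the given amount of free memory."""
    return free_bytes // bytes_per_entry(**entry)


# 16-bit DOS Zip with about 100 kBytes free after startup.
estimate = max_entries(100 * 1024, name_len=12, extra_len=9)
print(estimate)   # 769 with these assumptions
```

Longer path names or additional extra fields push the result toward the lower end of the 600 to 1000 range quoted above.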
  127. 2.2. Number of "new" entries (add operation)
  128. In addition to the restrictions above (2.1.), the following limits
  129. caused by the handling of the "new files" list apply:
  130. a) 16-bit executable: <16k ((2^16)/4)
  131. b) stack size required for "qsort" operation on "new entries" list.
  132. NOTE: In the current executables, the overflow checks for these limits
  133. are missing!
  134. c) amount of free memory to hold the directory info list for new entries;
  135. one entry needs:
  136. 24 bytes (32-bit) or 22 bytes (16-bit)
  137. + 3 * length of filename
  138. NOTE: For larger systems, the practical limits are more likely to be set
  139. by performance (how long you are willing to wait) than by available
  140. memory and other resources.
  141. D) Some technical remarks:
  142. 1. For executables compiled without LARGE_FILE_SUPPORT and ZIP64_SUPPORT
  143. enabled, the 2GByte size limit on archive files is a consequence of
  144. the portable C implementation of the Info-ZIP programs. Zip archive
  145. processing requires random access to the archive file for jumping
  146. between different parts of the archive's structure. In standard C,
  147. this is done via the stdio functions fseek()/ftell() or the Unix I/O functions
  148. lseek()/tell(). In many (most?) C implementations, these functions use
  149. "signed long" variables to hold offset pointers into sequential files.
  150. In most cases, this is a signed 32-bit number, which is limited to
  151. ca. 2E+09. There may be specific C runtime library implementations
  152. that interpret the offset numbers as unsigned, but for us, this is not
  153. reliable in the context of portable programming.
  154. If LARGE_FILE_SUPPORT and ZIP64_SUPPORT are defined and supported by
  155. the system, 64-bit off_t file offsets are available and the larger
  156. limits listed above apply. As off_t is signed, the maximum offset
  157. is usually limited to 2^63 - 1.
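The two offset limits in this remark come from the width of the signed type that holds the file position. A short Python sketch of the arithmetic follows; the function names are made up for this example, and Python integers stand in for the C types "long" and off_t.

```python
def max_signed_offset(bits):
    """Largest non-negative value of a signed integer with the given width."""
    return 2 ** (bits - 1) - 1


def can_seek_to(offset, bits):
    """True if an absolute file offset is representable in that type."""
    return 0 <= offset <= max_signed_offset(bits)


three_gbyte = 3 * 2**30
print(max_signed_offset(32))          # 2147483647, the 2 GByte limit
print(can_seek_to(three_gbyte, 32))   # False: needs more than 31 value bits
print(can_seek_to(three_gbyte, 64))   # True with a 64-bit off_t
```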
  158. 2. The 2GByte limit on the size of a single compressed archive member
  159. is again a consequence of the implementation in C.
  160. The variables used internally to count the size of the compressed
  161. data stream are of type "long", which is guaranteed to be at least
  162. 32-bit wide on all supported environments.
  163. But, why do we use "signed" long and not "unsigned long"?
  164. Throughout the I/O handling of the compressed data stream, the
  165. sign bit of the "long" numbers is (mis-)used as a kind of overflow
  166. detection. In the end, this is caused by the fact that standard C
  167. lacks any overflow checking on integer arithmetic and does not
  168. support access to the underlying hardware's overflow detection
  169. (the status bits, especially "carry" and "overflow" of the CPU's
  170. flags-register) in a system-independent manner.
  171. So, we "misuse" the most-significant bit of the compressed data
  172. size counters as carry bit for efficient overflow/underflow detection.
  173. We could change the code to a different method of overflow detection,
  174. by using a bunch of "sanity" comparisons (kind of "is the calculated
  175. result plausible when compared with the operands"). But, this would
  176. "blow up" the code of the "inner loop", with a noticeable loss of
  177. processing speed. Or, we could reduce the amount of consistency checks
  178. of the compressed data (e.g. detection of premature end of stream) to
  179. an absolute minimum, at the cost of the programs' stability when
  180. processing corrupted data.
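The difference between the two detection methods discussed above can be illustrated with a small Python emulation of 32-bit arithmetic. This is a sketch of the general idea, not the code used in Zip or UnZip; the function names and the masking constants are introduced here for illustration only.

```python
MASK32 = 0xFFFFFFFF      # emulate a 32-bit machine word
SIGN_BIT = 0x80000000    # most-significant bit of that word


def add_with_sign_check(counter, increment):
    """Add two values below 2^31; report overflow via the sign bit.

    Both inputs are below 2^31, so the true sum is below 2^32 and cannot
    wrap around the word. Any excess shows up as a set top bit.
    """
    total = (counter + increment) & MASK32
    return total, bool(total & SIGN_BIT)


def add_with_sanity_check(counter, increment):
    """Unsigned alternative: detect wraparound by comparing with an operand.

    After an unsigned wraparound the result is smaller than either operand.
    """
    total = (counter + increment) & MASK32
    return total, total < counter


print(add_with_sign_check(1000, 24))            # (1024, False)
print(add_with_sign_check(0x7FFFFFF0, 0x20))    # top bit set: overflow
print(add_with_sanity_check(0xFFFFFFF0, 0x20))  # wrapped: overflow
```

The sign-bit variant needs only a test of the result itself, while the unsigned variant needs the original operand for an extra comparison at every step. The price of the first variant is that half of the value range is given up.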
  181. Summary: Changing the compression/decompression core routines to
  182. be "unsigned safe" would require excessive recoding, with little
  183. gain on maximum processable uncompressed size (a gain can only be
  184. expected for barely compressible data), but at severe costs on
  185. performance, stability and maintainability. Therefore, it is
  186. quite unlikely that this will ever happen for Zip/UnZip.
  187. With LARGE_FILE_SUPPORT and ZIP64_SUPPORT enabled and supported,
  188. the above arguments still apply, but the limits are based on 64 bits
  189. instead of 32 and should allow most large files and archives to be
  190. processed.
  191. Anyway, the Zip archive format is more and more showing its age...
  192. The effort to lift the 2GByte limits would be better invested in
  193. creating a successor for the Zip archive format and tools. But given
  194. the latest improvements to the format and the wide acceptance of zip
  195. files, the format will probably be around for a while longer.
  196. Please report any problems using the web contact form at: www.Info-ZIP.org
  197. Last updated: 26 January 2002, Christian Spieler
  198. 25 May 2008, Ed Gordon