  1. README file for PCRE (Perl-compatible regular expression library)
  2. -----------------------------------------------------------------
  3. NOTE: This set of files relates to PCRE releases that use the original API,
  4. with library names libpcre, libpcre16, and libpcre32. January 2015 saw the
  5. first release of a new API, known as PCRE2, with release numbers starting at
  6. 10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old
  7. libraries (now called PCRE1) are still being maintained for bug fixes, but
  8. there will be no new development. New projects are advised to use the new PCRE2
  9. libraries.
  10. The latest release of PCRE1 is always available in three alternative formats
  11. from:
  12. ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
  13. ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.bz2
  14. ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.zip
  15. There is a mailing list for discussion about the development of PCRE at
  16. pcre-dev@exim.org. You can access the archives and subscribe or manage your
  17. subscription here:
  18. https://lists.exim.org/mailman/listinfo/pcre-dev
  19. Please read the NEWS file if you are upgrading from a previous release.
  20. The contents of this README file are:
  21. The PCRE APIs
  22. Documentation for PCRE
  23. Contributions by users of PCRE
  24. Building PCRE on non-Unix-like systems
  25. Building PCRE without using autotools
  26. Building PCRE using autotools
  27. Retrieving configuration information
  28. Shared libraries
  29. Cross-compiling using autotools
  30. Using HP's ANSI C++ compiler (aCC)
  31. Compiling in Tru64 using native compilers
  32. Using Sun's compilers for Solaris
  33. Using PCRE from MySQL
  34. Making new tarballs
  35. Testing PCRE
  36. Character tables
  37. File manifest
  38. The PCRE APIs
  39. -------------
  40. PCRE is written in C, and it has its own API. There are three sets of
  41. functions, one for the 8-bit library, which processes strings of bytes, one for
  42. the 16-bit library, which processes strings of 16-bit values, and one for the
  43. 32-bit library, which processes strings of 32-bit values. The distribution also
  44. includes a set of C++ wrapper functions (see the pcrecpp man page for details),
  45. courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
  46. C++. Other C++ wrappers have been created from time to time. See, for example:
  47. https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
  48. style to the C API.
  49. The distribution also contains a set of C wrapper functions (again, just for
  50. the 8-bit library) that are based on the POSIX regular expression API (see the
  51. pcreposix man page). These end up in the library called libpcreposix. Note that
  52. this just provides a POSIX calling interface to PCRE; the regular expressions
  53. themselves still follow Perl syntax and semantics. The POSIX API is restricted,
  54. and does not give full access to all of PCRE's facilities.
  55. The header file for the POSIX-style functions is called pcreposix.h. The
  56. official POSIX name is regex.h, but I did not want to risk possible problems
  57. with existing files of that name by distributing it that way. To use PCRE with
  58. an existing program that uses the POSIX API, pcreposix.h will have to be
  59. renamed or pointed at by a link.
  60. If you are using the POSIX interface to PCRE and there is already a POSIX regex
  61. library installed on your system, as well as worrying about the regex.h header
  62. file (as mentioned above), you must also take care when linking programs to
  63. ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
  64. up the POSIX functions of the same name from the other library.
  65. One way of avoiding this confusion is to compile PCRE with the addition of
  66. -Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
  67. compiler flags (CFLAGS if you are using "configure" -- see below). This has the
  68. effect of renaming the functions so that the names no longer clash. Of course,
  69. you have to do the same thing for your applications, or write them using the
  70. new names.
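
As a sketch of the renaming approach, the fragment below builds one -D flag
for each of the four functions declared in pcreposix.h (regcomp, regexec,
regerror, regfree). The "PCRE" prefix is only the illustrative choice used
above, and -O2 is an assumed optimisation setting. The command is printed
rather than run, so it can be checked first.

```shell
# Build one -D flag per POSIX function provided by libpcreposix.
# The "PCRE" prefix is just the example name used in the text above.
POSIX_DEFS=""
for fn in regcomp regexec regerror regfree; do
  POSIX_DEFS="$POSIX_DEFS -D$fn=PCRE$fn"
done
POSIX_DEFS="${POSIX_DEFS# }"   # drop the leading space

# Print the command; run it from the PCRE distribution directory.
# -O2 is an assumed optimisation level, not something PCRE requires.
echo "CFLAGS='-O2 $POSIX_DEFS' ./configure"
```

The same -D flags then have to be passed when compiling any application that
calls the POSIX functions, so that it refers to the renamed symbols.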
  71. Documentation for PCRE
  72. ----------------------
  73. If you install PCRE in the normal way on a Unix-like system, you will end up
  74. with a set of man pages whose names all start with "pcre". The one that is just
  75. called "pcre" lists all the others. In addition to these man pages, the PCRE
  76. documentation is supplied in two other forms:
  77. 1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
  78. doc/pcretest.txt in the source distribution. The first of these is a
  79. concatenation of the text forms of all the section 3 man pages except
  80. the listing of pcredemo.c and those that summarize individual functions.
  81. The other two are the text forms of the section 1 man pages for the
  82. pcregrep and pcretest commands. These text forms are provided for ease of
  83. scanning with text editors or similar tools. They are installed in
  84. <prefix>/share/doc/pcre, where <prefix> is the installation prefix
  85. (defaulting to /usr/local).
  86. 2. A set of files containing all the documentation in HTML form, hyperlinked
  87. in various ways, and rooted in a file called index.html, is distributed in
  88. doc/html and installed in <prefix>/share/doc/pcre/html.
  89. Users of PCRE have contributed files containing the documentation for various
  90. releases in CHM format. These can be found in the Contrib directory of the FTP
  91. site (see next section).
  92. Contributions by users of PCRE
  93. ------------------------------
  94. You can find contributions from PCRE users in the directory
  95. ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
  96. There is a README file giving brief descriptions of what they are. Some are
  97. complete in themselves; others are pointers to URLs containing relevant files.
  98. Some of this material is likely to be well out-of-date. Several of the earlier
  99. contributions provided support for compiling PCRE on various flavours of
  100. Windows (I myself do not use Windows). Nowadays there is more Windows support
  101. in the standard distribution, so these contributions have been archived.
  102. A PCRE user maintains downloadable Windows binaries of the pcregrep and
  103. pcretest programs here:
  104. http://www.rexegg.com/pcregrep-pcretest.html
  105. Building PCRE on non-Unix-like systems
  106. --------------------------------------
  107. For a non-Unix-like system, please read the comments in the file
  108. NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
  109. "make" you may be able to build PCRE using autotools in the same way as for
  110. many Unix-like systems.
  111. PCRE can also be configured using the GUI facility provided by CMake's
  112. cmake-gui command. This creates Makefiles, solution files, etc. The file
  113. NON-AUTOTOOLS-BUILD has information about CMake.
  114. PCRE has been compiled on many different operating systems. It should be
  115. straightforward to build PCRE on any system that has a Standard C compiler and
  116. library, because it uses only Standard C functions.
  117. Building PCRE without using autotools
  118. -------------------------------------
  119. The use of autotools (in particular, libtool) is problematic in some
  120. environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
  121. file for ways of building PCRE without using autotools.
  122. Building PCRE using autotools
  123. -----------------------------
  124. If you are using HP's ANSI C++ compiler (aCC), please see the special note
  125. in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
  126. The following instructions assume the use of the widely used "configure; make;
  127. make install" (autotools) process.
  128. To build PCRE on a system that supports autotools, first run the "configure"
  129. command from the PCRE distribution directory, with your current directory set
  130. to the directory where you want the files to be created. This command is a
  131. standard GNU "autoconf" configuration script, for which generic instructions
  132. are supplied in the file INSTALL.
  133. Most commonly, people build PCRE within its own distribution directory, and in
  134. this case, on many systems, just running "./configure" is sufficient. However,
  135. the usual methods of changing standard defaults are available. For example:
  136. CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
  137. This command specifies that the C compiler should be run with the flags '-O2
  138. -Wall' instead of the default, and that "make install" should install PCRE
  139. under /opt/local instead of the default /usr/local.
  140. If you want to build in a different directory, just run "configure" with that
  141. directory as current. For example, suppose you have unpacked the PCRE source
  142. into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
  143. cd /build/pcre/pcre-xxx
  144. /source/pcre/pcre-xxx/configure
  145. PCRE is written in C and is normally compiled as a C library. However, it is
  146. possible to build it as a C++ library, though the provided building apparatus
  147. does not have any features to support this.
  148. There are some optional features that can be included or omitted from the PCRE
  149. library. They are also documented in the pcrebuild man page.
  150. . By default, both shared and static libraries are built. You can change this
  151. by adding one of these options to the "configure" command:
  152. --disable-shared
  153. --disable-static
  154. (See also "Shared libraries" below.)
  155. . By default, only the 8-bit library is built. If you add --enable-pcre16 to
  156. the "configure" command, the 16-bit library is also built. If you add
  157. --enable-pcre32 to the "configure" command, the 32-bit library is also built.
  158. If you want only the 16-bit or 32-bit library, use --disable-pcre8 to disable
  159. building the 8-bit library.
  160. . If you are building the 8-bit library and want to suppress the building of
  161. the C++ wrapper library, you can add --disable-cpp to the "configure"
  162. command. Otherwise, when "configure" is run without --disable-pcre8, it will
  163. try to find a C++ compiler and C++ header files, and if it succeeds, it will
  164. try to build the C++ wrapper.
  165. . If you want to include support for just-in-time compiling, which can give
  166. large performance improvements on certain platforms, add --enable-jit to the
  167. "configure" command. This support is available only for certain hardware
  168. architectures. If you try to enable it on an unsupported architecture, there
  169. will be a compile time error.
  170. . When JIT support is enabled, pcregrep automatically makes use of it, unless
  171. you add --disable-pcregrep-jit to the "configure" command.
  172. . If you want to make use of the support for UTF-8 Unicode character strings in
  173. the 8-bit library, or UTF-16 Unicode character strings in the 16-bit library,
  174. or UTF-32 Unicode character strings in the 32-bit library, you must add
  175. --enable-utf to the "configure" command. Without it, the code for handling
  176. UTF-8, UTF-16 and UTF-32 is not included in the relevant library. Even
  177. when --enable-utf is included, the use of a UTF encoding still has to be
  178. enabled by an option at run time. When PCRE is compiled with this option, its
  179. input can only be either ASCII or UTF-8/16/32, even when running on EBCDIC
  180. platforms. It is not possible to use both --enable-utf and --enable-ebcdic at
  181. the same time.
  182. . There are no separate options for enabling UTF-8, UTF-16 and UTF-32
  183. independently because that would allow ridiculous settings such as requesting
  184. UTF-16 support while building only the 8-bit library. However, the option
  185. --enable-utf8 is retained for backwards compatibility with earlier releases
  186. that did not support 16-bit or 32-bit character strings. It is synonymous with
  187. --enable-utf. It is not possible to configure one library with UTF support
  188. and the other without in the same configuration.
  189. . If, in addition to support for UTF-8/16/32 character strings, you want to
  190. include support for the \P, \p, and \X sequences that recognize Unicode
  191. character properties, you must add --enable-unicode-properties to the
  192. "configure" command. This adds about 30K to the size of the library (in the
  193. form of a property table); only the basic two-letter properties such as Lu
  194. are supported.
  195. . You can build PCRE to recognize either CR or LF or the sequence CRLF or any
  196. of the preceding, or any of the Unicode newline sequences as indicating the
  197. end of a line. Whatever you specify at build time is the default; the caller
  198. of PCRE can change the selection at run time. The default newline indicator
  199. is a single LF character (the Unix standard). You can specify the default
  200. newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
  201. or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
  202. --enable-newline-is-any to the "configure" command, respectively.
  203. If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
  204. the standard tests will fail, because the lines in the test files end with
  205. LF. Even if the files are edited to change the line endings, there are likely
  206. to be some failures. With --enable-newline-is-anycrlf or
  207. --enable-newline-is-any, many tests should succeed, but there may be some
  208. failures.
  209. . By default, the sequence \R in a pattern matches any Unicode line ending
  210. sequence. This is independent of the option specifying what PCRE considers to
  211. be the end of a line (see above). However, the caller of PCRE can restrict \R
  212. to match only CR, LF, or CRLF. You can make this the default by adding
  213. --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
  214. . When called via the POSIX interface, PCRE uses malloc() to get additional
  215. storage for processing capturing parentheses if there are more than 10 of
  216. them in a pattern. You can increase this threshold by setting, for example,
  217. --with-posix-malloc-threshold=20
  218. on the "configure" command.
  219. . PCRE has a counter that limits the depth of nesting of parentheses in a
  220. pattern. This limits the amount of system stack that a pattern uses when it
  221. is compiled. The default is 250, but you can change it by setting, for
  222. example,
  223. --with-parens-nest-limit=500
  224. . PCRE has a counter that can be set to limit the amount of resources it uses
  225. when matching a pattern. If the limit is exceeded during a match, the match
  226. fails. The default is ten million. You can change the default by setting, for
  227. example,
  228. --with-match-limit=500000
  229. on the "configure" command. This is just the default; individual calls to
  230. pcre_exec() can supply their own value. There is more discussion on the
  231. pcreapi man page.
  232. . There is a separate counter that limits the depth of recursive function calls
  233. during a matching process. This also has a default of ten million, which is
  234. essentially "unlimited". You can change the default by setting, for example,
  235. --with-match-limit-recursion=500000
  236. Recursive function calls use up the runtime stack; running out of stack can
  237. cause programs to crash in strange ways. There is a discussion about stack
  238. sizes in the pcrestack man page.
  239. . The default maximum compiled pattern size is around 64K. You can increase
  240. this by adding --with-link-size=3 to the "configure" command. In the 8-bit
  241. library, PCRE then uses three bytes instead of two for offsets to different
  242. parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
  243. the same as --with-link-size=4, which (in both libraries) uses four-byte
  244. offsets. Increasing the internal link size reduces performance. In the 32-bit
  245. library, the only supported link size is 4.
  246. . You can build PCRE so that its internal match() function that is called from
  247. pcre_exec() does not call itself recursively. Instead, it uses memory blocks
  248. obtained from the heap via the special functions pcre_stack_malloc() and
  249. pcre_stack_free() to save data that would otherwise be saved on the stack. To
  250. build PCRE like this, use
  251. --disable-stack-for-recursion
  252. on the "configure" command. PCRE runs more slowly in this mode, but it may be
  253. necessary in environments with limited stack sizes. This applies only to the
  254. normal execution of the pcre_exec() function; if JIT support is being
  255. successfully used, it is not relevant. Equally, it does not apply to
  256. pcre_dfa_exec(), which does not use deeply nested recursion. There is a
  257. discussion about stack sizes in the pcrestack man page.
  258. . For speed, PCRE uses four tables for manipulating and identifying characters
  259. whose code point values are less than 256. By default, it uses a set of
  260. tables for ASCII encoding that is part of the distribution. If you specify
  261. --enable-rebuild-chartables
  262. a program called dftables is compiled and run in the default C locale when
  263. you run "make". It builds a source file called pcre_chartables.c. If you do
  264. not specify this option, pcre_chartables.c is created as a copy of
  265. pcre_chartables.c.dist. See "Character tables" below for further information.
  266. . It is possible to compile PCRE for use on systems that use EBCDIC as their
  267. character code (as opposed to ASCII/Unicode) by specifying
  268. --enable-ebcdic
  269. This automatically implies --enable-rebuild-chartables (see above). However,
  270. when PCRE is built this way, it always operates in EBCDIC. It cannot support
  271. both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  272. which specifies that the code value for the EBCDIC NL character is 0x25
  273. instead of the default 0x15.
  274. . In environments where valgrind is installed, if you specify
  275. --enable-valgrind
  276. PCRE will use valgrind annotations to mark certain memory regions as
  277. unaddressable. This allows it to detect invalid memory accesses, and is
  278. mostly useful for debugging PCRE itself.
  279. . In environments where the gcc compiler is used and lcov version 1.6 or above
  280. is installed, if you specify
  281. --enable-coverage
  282. the build process implements a code coverage report for the test suite. The
  283. report is generated by running "make coverage". If ccache is installed on
  284. your system, it must be disabled when building PCRE for coverage reporting.
  285. You can do this by setting the environment variable CCACHE_DISABLE=1 before
  286. running "make" to build PCRE. There is more information about coverage
  287. reporting in the "pcrebuild" documentation.
  288. . The pcregrep program currently supports only 8-bit data files, and so
  289. requires the 8-bit PCRE library. It is possible to compile pcregrep to use
  290. libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  291. specifying one or both of
  292. --enable-pcregrep-libz
  293. --enable-pcregrep-libbz2
  294. Of course, the relevant libraries must be installed on your system.
  295. . The default size (in bytes) of the internal buffer used by pcregrep can be
  296. set by, for example:
  297. --with-pcregrep-bufsize=51200
  298. The value must be a plain integer. The default is 20480.
  299. . It is possible to compile pcretest so that it links with the libreadline
  300. or libedit libraries, by specifying, respectively,
  301. --enable-pcretest-libreadline or --enable-pcretest-libedit
  302. If this is done, when pcretest's input is from a terminal, it reads it using
  303. the readline() function. This provides line-editing and history facilities.
  304. Note that libreadline is GPL-licenced, so if you distribute a binary of
  305. pcretest linked in this way, there may be licensing issues. These can be
  306. avoided by linking with libedit (which has a BSD licence) instead.
  307. Enabling libreadline causes the -lreadline option to be added to the pcretest
  308. build. In many operating environments with a system-installed readline
  309. library this is sufficient. However, in some environments (e.g. if an
  310. unmodified distribution version of readline is in use), it may be necessary
  311. to specify something like LIBS="-lncurses" as well. This is because, to quote
  312. the readline INSTALL, "Readline uses the termcap functions, but does not link
  313. with the termcap or curses library itself, allowing applications which link
  314. with readline the to choose an appropriate library." If you get error
  315. messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
  316. this is the problem, and linking with the ncurses library should fix it.
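
To show how several of the options above combine, here is a sketch of one
possible configuration: 8-bit and 16-bit libraries with UTF and Unicode
property support, JIT, a relaxed newline convention, and a lower match limit.
The particular selection and the /opt/local prefix are assumptions chosen for
illustration, not a recommended setup. The command is printed rather than run.

```shell
# Collect configure options one per line so each can carry a comment.
OPTS="--prefix=/opt/local"                 # example install prefix
OPTS="$OPTS --enable-pcre16"               # build libpcre16 as well as libpcre
OPTS="$OPTS --enable-utf"                  # UTF-8/16/32 support in each library
OPTS="$OPTS --enable-unicode-properties"   # \p, \P and \X support
OPTS="$OPTS --enable-jit"                  # only on supported architectures
OPTS="$OPTS --enable-newline-is-anycrlf"   # default newline: CR, LF or CRLF
OPTS="$OPTS --with-match-limit=500000"     # lower the default match limit

# Print the command; run it from the PCRE distribution directory.
echo "./configure $OPTS"
```

Remember that --enable-utf cannot be combined with --enable-ebcdic, and that
a non-default newline setting may cause some of the standard tests to fail.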
  317. The "configure" script builds the following files for the basic C library:
  318. . Makefile             the makefile that builds the library
  319. . config.h             build-time configuration options for the library
  320. . pcre.h               the public PCRE header file
  321. . pcre-config          script that shows the building settings such as CFLAGS
  322.                          that were set for "configure"
  323. . libpcre.pc         ) data for the pkg-config command
  324. . libpcre16.pc       )
  325. . libpcre32.pc       )
  326. . libpcreposix.pc    )
  327. . libtool              script that builds shared and/or static libraries
  328. Versions of config.h and pcre.h are distributed in the PCRE tarballs under the
  329. names config.h.generic and pcre.h.generic. These are provided for those who
  330. have to build PCRE without using "configure" or CMake. If you use "configure"
  331. or CMake, the .generic versions are not used.
  332. When building the 8-bit library, if a C++ compiler is found, the following
  333. files are also built:
  334. . libpcrecpp.pc          data for the pkg-config command
  335. . pcrecpparg.h           header file for calling PCRE via the C++ wrapper
  336. . pcre_stringpiece.h     header for the C++ "stringpiece" functions
  337. The "configure" script also creates config.status, which is an executable
  338. script that can be run to recreate the configuration, and config.log, which
  339. contains compiler output from tests that "configure" runs.
  340. Once "configure" has run, you can run "make". This builds the libraries
  341. libpcre, libpcre16 and/or libpcre32, and a test program called pcretest. If you
  342. enabled JIT support with --enable-jit, a test program called pcre_jit_test is
  343. built as well.
  344. If the 8-bit library is built, libpcreposix and the pcregrep command are also
  345. built, and if a C++ compiler was found on your system, and you did not disable
  346. it with --disable-cpp, "make" builds the C++ wrapper library, which is called
  347. libpcrecpp, as well as some test programs called pcrecpp_unittest,
  348. pcre_scanner_unittest, and pcre_stringpiece_unittest.
  349. The command "make check" runs all the appropriate tests. Details of the PCRE
  350. tests are given below in a separate section of this document.
  351. You can use "make install" to install PCRE into live directories on your
  352. system. The following are installed (file names are all relative to the
  353. <prefix> that is set when "configure" is run):
  354. Commands (bin):
  355. pcretest
  356. pcregrep (if 8-bit support is enabled)
  357. pcre-config
  358. Libraries (lib):
  359. libpcre16 (if 16-bit support is enabled)
  360. libpcre32 (if 32-bit support is enabled)
  361. libpcre (if 8-bit support is enabled)
  362. libpcreposix (if 8-bit support is enabled)
  363. libpcrecpp (if 8-bit and C++ support is enabled)
  364. Configuration information (lib/pkgconfig):
  365. libpcre16.pc
  366. libpcre32.pc
  367. libpcre.pc
  368. libpcreposix.pc
  369. libpcrecpp.pc (if C++ support is enabled)
  370. Header files (include):
  371. pcre.h
  372. pcreposix.h
  373. pcre_scanner.h )
  374. pcre_stringpiece.h ) if C++ support is enabled
  375. pcrecpp.h )
  376. pcrecpparg.h )
  377. Man pages (share/man/man{1,3}):
  378. pcregrep.1
  379. pcretest.1
  380. pcre-config.1
  381. pcre.3
  382. pcre*.3 (lots more pages, all starting "pcre")
  383. HTML documentation (share/doc/pcre/html):
  384. index.html
  385. *.html (lots more pages, hyperlinked from index.html)
  386. Text file documentation (share/doc/pcre):
  387. AUTHORS
  388. COPYING
  389. ChangeLog
  390. LICENCE
  391. NEWS
  392. README
  393. pcre.txt         (a concatenation of the man(3) pages)
  394. pcretest.txt     the pcretest man page
  395. pcregrep.txt     the pcregrep man page
  396. pcre-config.txt  the pcre-config man page
  397. If you want to remove PCRE from your system, you can run "make uninstall".
  398. This removes all the files that "make install" installed. However, it does not
  399. remove any directories, because these are often shared with other programs.
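
The whole sequence described in this section can be wrapped in a small shell
function, sketched below. The name build_pcre is hypothetical, and the
function assumes that the current directory is the unpacked PCRE
distribution. Each step runs only if the previous one succeeded.

```shell
# Run the usual autotools sequence, stopping at the first failing step.
# Assumes the current directory is the unpacked PCRE distribution.
build_pcre() {
  ./configure "$@" &&   # any arguments are passed through to configure
  make &&
  make check &&         # run the test suite before installing
  make install          # may need suitable privileges for the prefix
}

# Example call (not executed here):
#   build_pcre --prefix=/opt/local
```

Running "make check" before "make install" means that a build which fails
its tests is never copied into the live directories.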
  400. Retrieving configuration information
  401. ------------------------------------
  402. Running "make install" installs the command pcre-config, which can be used to
  403. recall information about the PCRE configuration and installation. For example:
  404. pcre-config --version
  405. prints the version number, and
  406. pcre-config --libs
  407. outputs information about where the library is installed. This command can be
  408. included in makefiles for programs that use PCRE, saving the programmer from
  409. having to remember too many details.
  410. The pkg-config command is another system for saving and retrieving information
  411. about installed libraries. Instead of separate commands for each library, a
  412. single command is used. For example:
  413. pkg-config --cflags libpcre
  414. The data is held in *.pc files that are installed in a directory called
  415. <prefix>/lib/pkgconfig.
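
As an illustration of using these commands when building an application, the
sketch below looks up the compile and link flags for the 8-bit library. It
tries pcre-config first and then pkg-config, using the module name libpcre to
match the installed libpcre.pc file. The file names myprog.c and myprog are
hypothetical, and the final fallback to a bare -lpcre is an assumption about
a default installation.

```shell
# Look up compile and link flags for the 8-bit PCRE library.
if command -v pcre-config >/dev/null 2>&1; then
  PCRE_CFLAGS=$(pcre-config --cflags)
  PCRE_LIBS=$(pcre-config --libs)
elif command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libpcre; then
  PCRE_CFLAGS=$(pkg-config --cflags libpcre)
  PCRE_LIBS=$(pkg-config --libs libpcre)
else
  # Neither tool found PCRE; guess at a default installation.
  echo "PCRE configuration not found; assuming -lpcre" >&2
  PCRE_CFLAGS=""
  PCRE_LIBS="-lpcre"
fi

# myprog.c is a hypothetical application source file.
COMPILE_CMD="cc $PCRE_CFLAGS -o myprog myprog.c $PCRE_LIBS"
echo "$COMPILE_CMD"
```

In a makefile the same lookups are usually written as command substitutions
in the CFLAGS and LIBS settings.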
  416. Shared libraries
  417. ----------------
  418. The default distribution builds PCRE as shared libraries and static libraries,
  419. as long as the operating system supports shared libraries. Shared library
  420. support relies on the "libtool" script which is built as part of the
  421. "configure" process.
  422. The libtool script is used to compile and link both shared and static
  423. libraries. They are placed in a subdirectory called .libs when they are newly
  424. built. The programs pcretest and pcregrep are built to use these uninstalled
  425. libraries (by means of wrapper scripts in the case of shared libraries). When
  426. you use "make install" to install shared libraries, pcregrep and pcretest are
  427. automatically re-built to use the newly installed shared libraries before being
  428. installed themselves. However, the versions left in the build directory still
  429. use the uninstalled libraries.
  430. To build PCRE using static libraries only you must use --disable-shared when
  431. configuring it. For example:
  432. ./configure --prefix=/usr/gnu --disable-shared
  433. Then run "make" in the usual way. Similarly, you can use --disable-static to
  434. build only shared libraries.
  435. Cross-compiling using autotools
  436. -------------------------------
  437. You can specify CC and CFLAGS in the normal way to the "configure" command, in
  438. order to cross-compile PCRE for some other host. However, you should NOT
  439. specify --enable-rebuild-chartables, because if you do, the dftables.c source
  440. file is compiled and run on the local host, in order to generate the inbuilt
  441. character tables (the pcre_chartables.c file). This will probably not work,
  442. because dftables.c needs to be compiled with the local compiler, not the cross
  443. compiler.
  444. When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
  445. by making a copy of pcre_chartables.c.dist, which is a default set of tables
  446. that assumes ASCII code. Cross-compiling with the default tables should not be
  447. a problem.
  448. If you need to modify the character tables when cross-compiling, you should
  449. move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
  450. run it on the local host to make a new version of pcre_chartables.c.dist.
  451. Then when you cross-compile PCRE this new version of the tables will be used.
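
The steps in the previous paragraph might look like the sketch below. It
rests on several assumptions: that it is run from the top of the PCRE source
tree after config.h and pcre.h have been created, that "cc" is the native
compiler on the local host, and that dftables takes the output file name as
its argument (see "Character tables" below). The function name
regen_chartables and the .orig suffix are hypothetical.

```shell
# Regenerate the default character tables on the local (build) host
# before cross-compiling. Run from the top of the PCRE source tree.
regen_chartables() {
  # Keep the distributed tables in case they are needed again.
  mv pcre_chartables.c.dist pcre_chartables.c.dist.orig &&
  # Compile dftables with the native compiler, not the cross compiler.
  cc -I. -DHAVE_CONFIG_H -o dftables dftables.c &&
  # Write the new tables where the cross-build will pick them up.
  ./dftables pcre_chartables.c.dist
}

# Example call (not executed here):
#   regen_chartables
```

If the config.h produced for the target host does not suit the native
compiler, the compile step may need adjusting by hand.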
  452. Using HP's ANSI C++ compiler (aCC)
  453. ----------------------------------
  454. Unless C++ support is disabled by specifying the "--disable-cpp" option of the
  455. "configure" script, you must include the "-AA" option in the CXXFLAGS
  456. environment variable in order for the C++ components to compile correctly.
  457. Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
  458. needed libraries fail to get included when specifying the "-AA" compiler
  459. option. If you experience unresolved symbols when linking the C++ programs,
  460. use the workaround of specifying the following environment variable prior to
  461. running the "configure" script:
  462. CXXLDFLAGS="-lstd_v2 -lCsup_v2"
  463. Compiling in Tru64 using native compilers
  464. -----------------------------------------
  465. The following error may occur when compiling with native compilers in the Tru64
  466. operating system:
  467. CXX libpcrecpp_la-pcrecpp.lo
  468. cxx: Error: /usr/lib/cmplrs/cxx/V7.1-006/include/cxx/iosfwd, line 58: #error
  469. directive: "cannot include iosfwd -- define __USE_STD_IOSTREAM to
  470. override default - see section 7.1.2 of the C++ Using Guide"
  471. #error "cannot include iosfwd -- define __USE_STD_IOSTREAM to override default
  472. - see section 7.1.2 of the C++ Using Guide"
  473. This may be followed by other errors, complaining that 'namespace "std" has no
  474. member'. The solution to this is to add the line
  475. #define __USE_STD_IOSTREAM 1
  476. to the config.h file.
  477. Using Sun's compilers for Solaris
  478. ---------------------------------
  479. A user reports that the following configurations work on Solaris 9 sparcv9 and
  480. Solaris 9 x86 (32-bit):
  481. Solaris 9 sparcv9: ./configure --disable-cpp CC=/bin/cc CFLAGS="-m64 -g"
  482. Solaris 9 x86: ./configure --disable-cpp CC=/bin/cc CFLAGS="-g"
  483. Using PCRE from MySQL
  484. ---------------------
  485. On systems where both PCRE and MySQL are installed, it is possible to make use
  486. of PCRE from within MySQL, as an alternative to the built-in pattern matching.
  487. There is a web page that tells you how to do this:
  488. http://www.mysqludf.org/lib_mysqludf_preg/index.php
  489. Making new tarballs
  490. -------------------
  491. The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
  492. zip formats. The command "make distcheck" does the same, but then does a trial
  493. build of the new distribution to ensure that it works.
  494. If you have modified any of the man page sources in the doc directory, you
  495. should first run the PrepareRelease script before making a distribution. This
  496. script creates the .txt and HTML forms of the documentation from the man pages.
  497. Testing PCRE
  498. ------------
  499. To test the basic PCRE library on a Unix-like system, run the RunTest script.
  500. There is another script called RunGrepTest that tests the options of the
  501. pcregrep command. If the C++ wrapper library is built, three test programs
  502. called pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest
  503. are also built. When JIT support is enabled, another test program called
  504. pcre_jit_test is built.
  505. Both the scripts and all the program tests are run if you obey "make check" or
  506. "make test". For other environments, see the instructions in
  507. NON-AUTOTOOLS-BUILD.
  508. The RunTest script runs the pcretest test program (which is documented in its
  509. own man page) on each of the relevant testinput files in the testdata
  510. directory, and compares the output with the contents of the corresponding
  511. testoutput files. RunTest uses a file called testtry to hold the main output
  512. from pcretest. Other files whose names begin with "test" are used as working
  513. files in some tests.
  514. Some tests are relevant only when certain build-time options were selected. For
  515. example, the tests for UTF-8/16/32 support are run only if --enable-utf was
  516. used. RunTest outputs a comment when it skips a test.
  517. Many of the tests that are not skipped are run up to three times. The second
  518. run forces pcre_study() to be called for all patterns except for a few in some
  519. tests that are marked "never study" (see the pcretest program for how this is
  520. done). If JIT support is available, the non-DFA tests are run a third time,
  521. this time with a forced pcre_study() with the PCRE_STUDY_JIT_COMPILE option.
  522. This testing can be suppressed by putting "nojit" on the RunTest command line.
  523. The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
  524. libraries that are enabled. If you want to run just one set of tests, call
  525. RunTest with either the -8, -16 or -32 option.
  526. If valgrind is installed, you can run the tests under it by putting "valgrind"
  527. on the RunTest command line. To run pcretest on just one or more specific test
  528. files, give their numbers as arguments to RunTest, for example:
  529. RunTest 2 7 11
  530. You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
  531. end), or a number preceded by ~ to exclude a test. For example:
  532. RunTest 3-15 ~10
  533. This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
  534. except test 13. Whatever order the arguments are in, the tests are always run
  535. in numerical order.
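
The selection rules above can be modelled in a few lines of C. This is only a
sketch of the documented behaviour, not the code of the RunTest script (which
is a shell script). The function name select_tests and the constant MAX_TEST
are invented for illustration, and the value 26 is an assumption based on the
tests described in this file.

```c
#include <stdlib.h>

/* Assumed for this sketch: the 26 tests described in this README. */
#define MAX_TEST 26

/* Model of the documented selection rules, not RunTest's own code.
   On return, run[t] is 1 if test t would run (t = 1..MAX_TEST); run[0] is
   not used. Returns the number of selected tests. */
static int select_tests(int argc, const char *const argv[],
                        int run[MAX_TEST + 1])
{
  int include[MAX_TEST + 1] = {0};
  int exclude[MAX_TEST + 1] = {0};
  int any_include = 0, count = 0, i;
  long t;

  for (i = 0; i < argc; i++) {
    const char *arg = argv[i];
    char *end;
    long first, last;

    if (arg[0] == '~') {                    /* "~N" excludes test N */
      first = strtol(arg + 1, &end, 10);
      if (end != arg + 1 && *end == '\0' && first >= 1 && first <= MAX_TEST)
        exclude[first] = 1;
      continue;
    }

    first = strtol(arg, &end, 10);
    if (end == arg || first < 1) continue;  /* not a test number: ignored here */
    last = first;
    if (*end == '-')                        /* "N-M", or "N-" for N to the end */
      last = (end[1] == '\0') ? MAX_TEST : strtol(end + 1, NULL, 10);

    for (t = first; t <= last && t <= MAX_TEST; t++) {
      include[t] = 1;
      any_include = 1;
    }
  }

  /* With no explicit selection, every test runs unless it is excluded. */
  for (t = 1; t <= MAX_TEST; t++) {
    run[t] = (any_include ? include[t] : 1) && !exclude[t];
    count += run[t];
  }
  return count;
}
```

Walking run[] from 1 upwards gives the tests in numerical order, whatever order
the arguments were given in. Arguments that are not test numbers (for example
"valgrind" or "-8") are simply skipped by this sketch.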
  536. You can also call RunTest with the single argument "list" to cause it to output
  537. a list of tests.
  538. The first test file can be fed directly into the perltest.pl script to check
  539. that Perl gives the same results. The only difference you should see is in the
  540. first few lines, where the Perl version is given instead of the PCRE version.
  541. The second set of tests checks pcre_fullinfo(), pcre_study(),
  542. pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
  543. detection, and run-time flags that are specific to PCRE, as well as the POSIX
  544. wrapper API. It also uses the debugging flags to check some of the internals of
  545. pcre_compile().
  546. If you build PCRE with a locale setting that is not the standard C locale, the
  547. character tables may be different (see next paragraph). In some cases, this may
  548. cause failures in the second set of tests. For example, in a locale where the
  549. isprint() function yields TRUE for characters in the range 128-255, the use of
  550. [:ascii:] inside a character class defines a different set of characters, and
  551. this shows up in this test as a difference in the compiled code, which is being
  552. listed for checking. Where the comparison test output contains [\x00-\x7f] the
  553. test will contain [\x00-\xff], and similarly in some other cases. This is not a
  554. bug in PCRE.
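
To see whether a given locale is likely to cause such differences, you can
count how many code points in the range 128-255 its isprint() accepts. The
following is a minimal sketch using only the standard C library. The function
name count_high_printable is invented for illustration and is not part of PCRE.

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Returns how many code points in 128..255 isprint() accepts in the named
   locale, or -1 if that locale cannot be set. The previous LC_CTYPE setting
   is restored before returning. */
static int count_high_printable(const char *locale_name)
{
  char saved[128];
  const char *current = setlocale(LC_CTYPE, NULL);
  int c, count = 0;

  /* Copy the current setting: the returned string may be overwritten by a
     later call to setlocale(). */
  strncpy(saved, current != NULL ? current : "C", sizeof(saved) - 1);
  saved[sizeof(saved) - 1] = '\0';

  if (setlocale(LC_CTYPE, locale_name) == NULL) return -1;
  for (c = 128; c < 256; c++)
    if (isprint(c)) count++;
  setlocale(LC_CTYPE, saved);
  return count;
}
```

In the standard "C" locale the result is expected to be zero. A non-zero
result for the locale in use when PCRE was built suggests that the differences
described above may appear in the second set of tests.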
  555. The third set of tests checks pcre_maketables(), the facility for building a
  556. set of character tables for a specific locale and using them instead of the
  557. default tables. The tests make use of the "fr_FR" (French) locale. Before
  558. running the test, the script checks for the presence of this locale by running
  559. the "locale" command. If that command fails, or if it doesn't include "fr_FR"
  560. in the list of available locales, the third test cannot be run, and a comment
  561. is output to say why. If running this test produces instances of the error
  562. ** Failed to set locale "fr_FR"
  563. in the comparison output, it means that locale is not available on your system,
  564. despite being listed by "locale". This does not mean that PCRE is broken.
  565. [If you are trying to run this test on Windows, you may be able to get it to
  566. work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
  567. RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
  568. Windows versions of test 2. More info on using RunTest.bat is included in the
  569. document entitled NON-UNIX-USE.]
  570. The fourth test checks the UTF-8/16/32 support, and the fifth test checks error
  571. handling and internal UTF features of PCRE that are not relevant to Perl. The
  572. sixth and seventh tests do the same for Unicode character properties support.
  573. The eighth, ninth, and tenth tests check the pcre_dfa_exec() alternative
  574. matching function, in non-UTF-8/16/32 mode, UTF-8/16/32 mode, and UTF-8/16/32
  575. mode with Unicode property support, respectively.
  576. The eleventh test checks some internal offsets and code size features; it is
  577. run only when the default "link size" of 2 is set (in other cases the sizes
  578. change) and when Unicode property support is enabled.
  579. The twelfth test is run only when JIT support is available, and the thirteenth
  580. test is run only when JIT support is not available. They test some JIT-specific
  581. features such as information output from pcretest about JIT compilation.
  582. The fourteenth, fifteenth, and sixteenth tests are run only in 8-bit mode, and
  583. the seventeenth, eighteenth, and nineteenth tests are run only in 16/32-bit
  584. mode. These are tests that generate different output in the two modes. They are
  585. for general cases, UTF-8/16/32 support, and Unicode property support,
  586. respectively.
  587. The twentieth test is run only in 16/32-bit mode. It tests some specific
  588. 16/32-bit features of the DFA matching engine.
  589. The twenty-first and twenty-second tests are run only in 16/32-bit mode, when
  590. the link size is set to 2 for the 16-bit library. They test reloading
  591. pre-compiled patterns.
  592. The twenty-third and twenty-fourth tests are run only in 16-bit mode. They are
  593. for general cases, and UTF-16 support, respectively.
  594. The twenty-fifth and twenty-sixth tests are run only in 32-bit mode. They are
  595. for general cases, and UTF-32 support, respectively.
  596. Character tables
  597. ----------------
  598. For speed, PCRE uses four tables for manipulating and identifying characters
  599. whose code point values are less than 256. The final argument of the
  600. pcre_compile() function is a pointer to a block of memory containing the
  601. concatenated tables. A call to pcre_maketables() can be used to generate a set
  602. of tables in the current locale. If the final argument for pcre_compile() is
  603. passed as NULL, a set of default tables that is built into the binary is used.
  604. The source file called pcre_chartables.c contains the default set of tables. By
  605. default, this is created as a copy of pcre_chartables.c.dist, which contains
  606. tables for ASCII coding. However, if --enable-rebuild-chartables is specified
  607. for ./configure, a different version of pcre_chartables.c is built by the
  608. program dftables (compiled from dftables.c), which uses the ANSI C character
  609. handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
  610. build the table sources. This means that the default C locale which is set for
  611. your system will control the contents of these default tables. You can change
  612. the default tables by editing pcre_chartables.c and then re-building PCRE. If
  613. you do this, you should take care to ensure that the file does not get
  614. automatically re-generated. The best way to do this is to move
  615. pcre_chartables.c.dist out of the way and replace it with your customized
  616. tables.
  617. When the dftables program is run as a result of --enable-rebuild-chartables,
  618. it uses the default C locale that is set on your system. It does not pay
  619. attention to the LC_xxx environment variables. In other words, it uses the
  620. system's default locale rather than whatever the compiling user happens to have
  621. set. If you really do want to build a source set of character tables in a
  622. locale that is specified by the LC_xxx variables, you can run the dftables
  623. program by hand with the -L option. For example:
  624. ./dftables -L pcre_chartables.c.special
  625. The first two 256-byte tables provide lower casing and case flipping functions,
  626. respectively. The next table consists of three 32-byte bit maps which identify
  627. digits, "word" characters, and white space, respectively. These are used when
  628. building 32-byte bit maps that represent character classes for code points less
  629. than 256.
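
The following sketch shows one way that tables of this shape can be built
with the standard C character functions. It is an illustration only, not the
code of dftables or pcre_maketables(). The function names are invented, and
the bit order within each bit map byte (lowest bit for the lowest code point)
is an assumption of the sketch.

```c
#include <ctype.h>
#include <string.h>

/* Lower-casing table: entry c holds the lower case form of c. */
static void build_lower_table(unsigned char lcc[256])
{
  int c;
  for (c = 0; c < 256; c++) lcc[c] = (unsigned char)tolower(c);
}

/* Case-flipping table: entry c holds c with its case swapped; characters
   that have no case map to themselves. */
static void build_flip_table(unsigned char fcc[256])
{
  int c;
  for (c = 0; c < 256; c++)
    fcc[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
}

/* 32-byte bit map with one bit per code point 0..255; the bit for c is set
   when c is a decimal digit. */
static void build_digit_bitmap(unsigned char map[32])
{
  int c;
  memset(map, 0, 32);
  for (c = 0; c < 256; c++)
    if (isdigit(c)) map[c / 8] |= (unsigned char)(1 << (c & 7));
}

/* Test whether code point c (0..255) is a member of a 32-byte bit map. */
static int bitmap_test(const unsigned char map[32], int c)
{
  return (map[c / 8] >> (c & 7)) & 1;
}
```

A character class for code points less than 256 can then be represented by
combining such bit maps byte by byte, which is why each one is 32 bytes long
(256 bits).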
  630. The final 256-byte table has bits indicating various character types, as
  631. follows:
  632. 1 white space character
  633. 2 letter
  634. 4 decimal digit
  635. 8 hexadecimal digit
  636. 16 alphanumeric or '_'
  637. 128 regular expression metacharacter or binary zero
  638. You should not alter the set of characters that contain the 128 bit, as that
  639. will cause PCRE to malfunction.
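
As an illustration of how the bit values listed above combine, here is a
minimal sketch that fills a 256-byte type table using the standard C character
functions. The CT_ names and the list of metacharacters are assumptions made
for this example; consult the PCRE source for the definitions it really uses.

```c
#include <ctype.h>
#include <string.h>

/* Bit values as listed in the text above; the names are invented here. */
#define CT_SPACE    1
#define CT_LETTER   2
#define CT_DIGIT    4
#define CT_XDIGIT   8
#define CT_WORD    16
#define CT_META   128

/* Metacharacter set assumed for illustration only. */
static const char META_CHARS[] = "\\*+?{^.$|()[";

/* Fill a 256-byte table with the type bits for each code point, using the
   current C locale. */
static void build_type_table(unsigned char ctypes[256])
{
  int c;
  for (c = 0; c < 256; c++) {
    unsigned char bits = 0;
    if (isspace(c))              bits |= CT_SPACE;
    if (isalpha(c))              bits |= CT_LETTER;
    if (isdigit(c))              bits |= CT_DIGIT;
    if (isxdigit(c))             bits |= CT_XDIGIT;
    if (isalnum(c) || c == '_')  bits |= CT_WORD;
    /* Binary zero counts as "meta" as well, as noted in the list above. */
    if (c == 0 || strchr(META_CHARS, c) != NULL) bits |= CT_META;
    ctypes[c] = bits;
  }
}
```

With a table like this, a test such as "is this a word character?" becomes a
single lookup and mask, for example (ctypes[c] & CT_WORD) != 0.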
  640. File manifest
  641. -------------
  642. The distribution should contain the files listed below. Where a file name is
  643. given as pcre[16|32]_xxx it means that there are three files, one with the name
  644. pcre_xxx, one with the name pcre16_xxx, and a third with the name pcre32_xxx.
  645. (A) Source files of the PCRE library functions and their headers:
  646. dftables.c auxiliary program for building pcre_chartables.c
  647. when --enable-rebuild-chartables is specified
  648. pcre_chartables.c.dist a default set of character tables that assume ASCII
  649. coding; used, unless --enable-rebuild-chartables is
  650. specified, by copying to pcre_chartables.c
  651. pcreposix.c )
  652. pcre[16|32]_byte_order.c )
  653. pcre[16|32]_compile.c )
  654. pcre[16|32]_config.c )
  655. pcre[16|32]_dfa_exec.c )
  656. pcre[16|32]_exec.c )
  657. pcre[16|32]_fullinfo.c )
  658. pcre[16|32]_get.c ) sources for the functions in the library,
  659. pcre[16|32]_globals.c ) and some internal functions that they use
  660. pcre[16|32]_jit_compile.c )
  661. pcre[16|32]_maketables.c )
  662. pcre[16|32]_newline.c )
  663. pcre[16|32]_refcount.c )
  664. pcre[16|32]_string_utils.c )
  665. pcre[16|32]_study.c )
  666. pcre[16|32]_tables.c )
  667. pcre[16|32]_ucd.c )
  668. pcre[16|32]_version.c )
  669. pcre[16|32]_xclass.c )
  670. pcre_ord2utf8.c )
  671. pcre_valid_utf8.c )
  672. pcre16_ord2utf16.c )
  673. pcre16_utf16_utils.c )
  674. pcre16_valid_utf16.c )
  675. pcre32_utf32_utils.c )
  676. pcre32_valid_utf32.c )
  677. pcre[16|32]_printint.c ) debugging function that is used by pcretest,
  678. ) and can also be #included in pcre_compile()
  679. pcre.h.in template for pcre.h when built by "configure"
  680. pcreposix.h header for the external POSIX wrapper API
  681. pcre_internal.h header for internal use
  682. sljit/* 16 files that make up the JIT compiler
  683. ucp.h header for Unicode property handling
  684. config.h.in template for config.h, which is built by "configure"
  685. pcrecpp.h public header file for the C++ wrapper
  686. pcrecpparg.h.in template for another C++ header file
  687. pcre_scanner.h public header file for C++ scanner functions
  688. pcrecpp.cc )
  689. pcre_scanner.cc ) source for the C++ wrapper library
  690. pcre_stringpiece.h.in template for pcre_stringpiece.h, the header for the
  691. C++ stringpiece functions
  692. pcre_stringpiece.cc source for the C++ stringpiece functions
  693. (B) Source files for programs that use PCRE:
  694. pcredemo.c simple demonstration of coding calls to PCRE
  695. pcregrep.c source of a grep utility that uses PCRE
  696. pcretest.c comprehensive test program
  697. (C) Auxiliary files:
  698. 132html script to turn "man" pages into HTML
  699. AUTHORS information about the author of PCRE
  700. ChangeLog log of changes to the code
  701. CleanTxt script to clean nroff output for txt man pages
  702. Detrail script to remove trailing spaces
  703. HACKING some notes about the internals of PCRE
  704. INSTALL generic installation instructions
  705. LICENCE conditions for the use of PCRE
  706. COPYING the same, using GNU's standard name
  707. Makefile.in ) template for Unix Makefile, which is built by
  708. ) "configure"
  709. Makefile.am ) the automake input that was used to create
  710. ) Makefile.in
  711. NEWS important changes in this release
  712. NON-UNIX-USE the previous name for NON-AUTOTOOLS-BUILD
  713. NON-AUTOTOOLS-BUILD notes on building PCRE without using autotools
  714. PrepareRelease script to make preparations for "make dist"
  715. README this file
  716. RunTest a Unix shell script for running tests
  717. RunGrepTest a Unix shell script for pcregrep tests
  718. aclocal.m4 m4 macros (generated by "aclocal")
  719. config.guess ) files used by libtool,
  720. config.sub ) used only when building a shared library
  721. configure a configuring shell script (built by autoconf)
  722. configure.ac ) the autoconf input that was used to build
  723. ) "configure" and config.h
  724. depcomp ) script to find program dependencies, generated by
  725. ) automake
  726. doc/*.3 man page sources for PCRE
  727. doc/*.1 man page sources for pcregrep and pcretest
  728. doc/index.html.src the base HTML page
  729. doc/html/* HTML documentation
  730. doc/pcre.txt plain text version of the man pages
  731. doc/pcretest.txt plain text documentation of test program
  732. doc/perltest.txt plain text documentation of Perl test program
  733. install-sh a shell script for installing files
  734. libpcre16.pc.in template for libpcre16.pc for pkg-config
  735. libpcre32.pc.in template for libpcre32.pc for pkg-config
  736. libpcre.pc.in template for libpcre.pc for pkg-config
  737. libpcreposix.pc.in template for libpcreposix.pc for pkg-config
  738. libpcrecpp.pc.in template for libpcrecpp.pc for pkg-config
  739. ltmain.sh file used to build a libtool script
  740. missing ) common stub for a few missing GNU programs while
  741. ) installing, generated by automake
  742. mkinstalldirs script for making install directories
  743. perltest.pl Perl test program
  744. pcre-config.in source of script which retains PCRE information
  745. pcre_jit_test.c test program for the JIT compiler
  746. pcrecpp_unittest.cc )
  747. pcre_scanner_unittest.cc ) test programs for the C++ wrapper
  748. pcre_stringpiece_unittest.cc )
  749. testdata/testinput* test data for main library tests
  750. testdata/testoutput* expected test results
  751. testdata/grep* input and output for pcregrep tests
  752. testdata/* other supporting test files
  753. (D) Auxiliary files for cmake support
  754. cmake/COPYING-CMAKE-SCRIPTS
  755. cmake/FindPackageHandleStandardArgs.cmake
  756. cmake/FindEditline.cmake
  757. cmake/FindReadline.cmake
  758. CMakeLists.txt
  759. config-cmake.h.in
  760. (E) Auxiliary files for VPASCAL
  761. makevp.bat
  762. makevp_c.txt
  763. makevp_l.txt
  764. pcregexp.pas
  765. (F) Auxiliary files for building PCRE "by hand"
  766. pcre.h.generic ) a version of the public PCRE header file
  767. ) for use in non-"configure" environments
  768. config.h.generic ) a version of config.h for use in non-"configure"
  769. ) environments
  770. (G) Miscellaneous
  771. RunTest.bat a script for running tests under Windows
  772. Philip Hazel
  773. Email local part: ph10
  774. Email domain: cam.ac.uk
  775. Last updated: 10 February 2015