  1. @node Message Translation, Searching and Sorting, Locales, Top
  2. @c %MENU% How to make the program speak the user's language
  3. @chapter Message Translation
  4. The program's interface with the user should be designed to ease the user's
  5. task. One way to ease the user's task is to use messages in whatever
  6. language the user prefers.
  7. Printing messages in different languages can be implemented in different
  8. ways. One could add all the different languages in the source code and
  9. choose among the variants every time a message has to be printed. This is
  10. certainly not a good solution since extending the set of languages is
  11. cumbersome (the code must be changed) and the code itself can become
  12. really big with dozens of message sets.
  13. A better solution is to keep the message sets for each language
  14. in separate files which are loaded at runtime depending on the language
  15. selection of the user.
  16. @Theglibc{} provides two different sets of functions to support
  17. message translation. The problem is that neither of the interfaces is
  18. officially defined by the POSIX standard. The @code{catgets} family of
  19. functions is defined in the X/Open standard but this is derived from
  20. industry decisions and therefore not necessarily based on reasonable
  21. decisions.
  22. As mentioned above, the message catalog handling provides easy
  23. extendability by using external data files which contain the message
  24. translations. I.e., these files contain for each of the messages used
  25. in the program a translation for the appropriate language. So the tasks
  26. of the message handling functions are
  27. @itemize @bullet
  28. @item
  29. locate the external data file with the appropriate translations
  30. @item
  31. load the data and make it possible to address the messages
  32. @item
  33. map a given key to the translated message
  34. @end itemize
  35. The two approaches mainly differ in the implementation of this last
  36. step. Decisions made in the last step influence the rest of the design.
  37. @menu
  38. * Message catalogs a la X/Open:: The @code{catgets} family of functions.
  39. * The Uniforum approach:: The @code{gettext} family of functions.
  40. @end menu
  41. @node Message catalogs a la X/Open
  42. @section X/Open Message Catalog Handling
  43. The @code{catgets} functions are based on the simple scheme:
  44. @quotation
  45. Associate every message to translate in the source code with a unique
  46. identifier. To retrieve a message from a catalog file solely the
  47. identifier is used.
  48. @end quotation
  49. This means for the author of the program that s/he will have to make
  50. sure the meaning of the identifier in the program code and in the
  51. message catalogs is always the same.
  52. Before a message can be translated the catalog file must be located.
  53. The user of the program must be able to guide the responsible function
  54. to find whatever catalog the user wants. This is separated from what
  55. the programmer had in mind.
  56. All the types, constants and functions for the @code{catgets} functions
  57. are defined/declared in the @file{nl_types.h} header file.
  58. @menu
  59. * The catgets Functions:: The @code{catgets} function family.
  60. * The message catalog files:: Format of the message catalog files.
  61. * The gencat program:: How to generate message catalog files which
  62. can be used by the functions.
  63. * Common Usage:: How to use the @code{catgets} interface.
  64. @end menu
  65. @node The catgets Functions
  66. @subsection The @code{catgets} function family
  67. @deftypefun nl_catd catopen (const char *@var{cat_name}, int @var{flag})
  68. @standards{X/Open, nl_types.h}
  69. @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
  70. @c catopen @mtsenv @ascuheap @acsmem
  71. @c strchr ok
  72. @c setlocale(,NULL) ok
  73. @c getenv @mtsenv
  74. @c strlen ok
  75. @c alloca ok
  76. @c stpcpy ok
  77. @c malloc @ascuheap @acsmem
  78. @c __open_catalog @ascuheap @acsmem
  79. @c strchr ok
  80. @c open_not_cancel_2 @acsfd
  81. @c strlen ok
  82. @c ENOUGH ok
  83. @c alloca ok
  84. @c memcpy ok
  85. @c fxstat64 ok
  86. @c __set_errno ok
  87. @c mmap @acsmem
  88. @c malloc dup @ascuheap @acsmem
  89. @c read_not_cancel ok
  90. @c free dup @ascuheap @acsmem
  91. @c munmap ok
  92. @c close_not_cancel_no_status ok
  93. @c free @ascuheap @acsmem
  94. The @code{catopen} function tries to locate the message data file named
  95. @var{cat_name} and loads it when found. The return value is of an
  96. opaque type and can be used in calls to the other functions to refer to
  97. this loaded catalog.
  98. The return value is @code{(nl_catd) -1} in case the function failed and
  99. no catalog was loaded. The global variable @code{errno} contains a code
  100. for the error causing the failure. But even if the function call
  101. succeeded this does not mean that all messages can be translated.
  102. Locating the catalog file must happen in a way which lets the user of
  103. the program influence the decision. It is up to the user to decide
  104. about the language to use and sometimes it is useful to use alternate
  105. catalog files. All this can be specified by the user by setting some
  106. environment variables.
  107. The first problem is to find out where all the message catalogs are
  108. stored. Every program could have its own place to keep all the
  109. different files but usually the catalog files are grouped by languages
  110. and the catalogs for all programs are kept in the same place.
  111. @cindex NLSPATH environment variable
  112. To tell the @code{catopen} function where the catalog for the program
  113. can be found the user can set the environment variable @code{NLSPATH} to
  114. a value which describes her/his choice. Since this value must be usable
  115. for different languages and locales it cannot be a simple string.
  116. Instead it is a format string (similar to @code{printf}'s). An example
  117. is
  118. @smallexample
  119. /usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N
  120. @end smallexample
  121. First one can see that more than one directory can be specified (with
  122. the usual syntax of separating them by colons). The next things to
  123. observe are the format elements, @code{%L} and @code{%N} in this case.
  124. The @code{catopen} function knows about several of them, and each is
  125. replaced by a different value.
  126. @table @code
  127. @item %N
  128. This format element is substituted with the name of the catalog file.
  129. This is the value of the @var{cat_name} argument given to
  130. @code{catopen}.
  131. @item %L
  132. This format element is substituted with the name of the currently
  133. selected locale for translating messages. How this is determined is
  134. explained below.
  135. @item %l
  136. (This is the lowercase ell.) This format element is substituted with the
  137. language element of the locale name. The string describing the selected
  138. locale is expected to have the form
  139. @code{@var{lang}[_@var{terr}[.@var{codeset}]]} and this format uses the
  140. first part @var{lang}.
  141. @item %t
  142. This format element is substituted by the territory part @var{terr} of
  143. the name of the currently selected locale. See the explanation of the
  144. format above.
  145. @item %c
  146. This format element is substituted by the codeset part @var{codeset} of
  147. the name of the currently selected locale. See the explanation of the
  148. format above.
  149. @item %%
  150. Since @code{%} is used as a meta character there must be a way to
  151. express the @code{%} character in the result itself. Using @code{%%}
  152. does this just like it works for @code{printf}.
  153. @end table
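The substitutions in the table above can be sketched in C. This is only
an illustration of the rules as described here, not the code used inside
@code{catopen}: the function name @code{expand_nlspath_element} and its
buffer-based interface are hypothetical, it handles a single
colon-separated element of @code{NLSPATH}, and copying unknown elements
through unchanged is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the catgets API: expand one NLSPATH
   template element TMPL for the locale name LOCALE and the catalog name
   NAME.  Returns 0 on success, -1 if OUT (OUTSIZE bytes) is too small.  */
int
expand_nlspath_element (const char *tmpl, const char *locale,
                        const char *name, char *out, size_t outsize)
{
  if (outsize == 0)
    return -1;

  /* Split LOCALE, expected as lang[_terr[.codeset]], into its parts.  */
  size_t lang_len = strcspn (locale, "_.");
  const char *terr = "", *codeset = "";
  size_t terr_len = 0, codeset_len = 0;
  const char *p = locale + lang_len;
  if (*p == '_')
    {
      terr = p + 1;
      terr_len = strcspn (terr, ".");
      p = terr + terr_len;
    }
  if (*p == '.')
    {
      codeset = p + 1;
      codeset_len = strlen (codeset);
    }

  size_t used = 0;
  for (const char *s = tmpl; *s != '\0'; ++s)
    {
      const char *piece = s;    /* Default: copy one literal character.  */
      size_t piece_len = 1;
      if (*s == '%' && s[1] != '\0')
        {
          ++s;
          switch (*s)
            {
            case 'N': piece = name; piece_len = strlen (name); break;
            case 'L': piece = locale; piece_len = strlen (locale); break;
            case 'l': piece = locale; piece_len = lang_len; break;
            case 't': piece = terr; piece_len = terr_len; break;
            case 'c': piece = codeset; piece_len = codeset_len; break;
            case '%': piece = "%"; piece_len = 1; break;
            default:
              /* Assumption: unknown elements are copied unchanged.  */
              piece = s - 1;
              piece_len = 2;
              break;
            }
        }
      if (used + piece_len >= outsize)
        return -1;
      memcpy (out + used, piece, piece_len);
      used += piece_len;
    }
  out[used] = '\0';
  return 0;
}
```

Called with the template @code{/usr/share/locale/%L/LC_MESSAGES/%N}, the
locale name @code{de_DE.UTF-8} and the catalog name @code{hello.cat},
this sketch produces
@code{/usr/share/locale/de_DE.UTF-8/LC_MESSAGES/hello.cat}.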
  154. Using @code{NLSPATH} allows arbitrary directories to be searched for
  155. message catalogs while still allowing different languages to be used.
  156. If the @code{NLSPATH} environment variable is not set, the default value
  157. is
  158. @smallexample
  159. @var{prefix}/share/locale/%L/%N:@var{prefix}/share/locale/%L/LC_MESSAGES/%N
  160. @end smallexample
  161. @noindent
  162. where @var{prefix} is given to @code{configure} while installing @theglibc{}
  163. (this value is in many cases @code{/usr} or the empty string).
  164. The remaining problem is to decide which locale name must be used. This
  165. value determines the substitution of the format elements mentioned above.
  166. First of all the user can specify a path in the message catalog name
  167. (i.e., the name contains a slash character). In this situation the
  168. @code{NLSPATH} environment variable is not used. The catalog must exist
  169. as specified in the program, perhaps relative to the current working
  170. directory. This situation is not desirable and catalog names should
  171. never be written this way. Besides this, this behavior is not portable
  172. to all other platforms providing the @code{catgets} interface.
  173. @cindex LC_ALL environment variable
  174. @cindex LC_MESSAGES environment variable
  175. @cindex LANG environment variable
  176. Otherwise the values of environment variables from the standard
  177. environment are examined (@pxref{Standard Environment}). Which
  178. variables are examined is decided by the @var{flag} parameter of
  179. @code{catopen}. If the value is @code{NL_CAT_LOCALE} (which is defined
  180. in @file{nl_types.h}) then the @code{catopen} function uses the name of
  181. the locale currently selected for the @code{LC_MESSAGES} category.
  182. If @var{flag} is zero the @code{LANG} environment variable is examined.
  183. This is a left-over from the early days when the concept of locales
  184. had not even reached the level of POSIX locales.
  185. The environment variable and the locale name should have a value of the
  186. form @code{@var{lang}[_@var{terr}[.@var{codeset}]]} as explained above.
  187. If no environment variable is set the @code{"C"} locale is used which
  188. prevents any translation.
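The selection rule just described can be written down as a short C
sketch. The function name @code{select_catalog_locale} is hypothetical
and the code only mirrors the prose above; treating an empty value like
an unset one is an assumption.

```c
#include <locale.h>
#include <nl_types.h>
#include <stdlib.h>

/* Hypothetical helper: return the locale name that, according to the
   description above, catopen consults for a given FLAG value.  */
const char *
select_catalog_locale (int flag)
{
  const char *name;

  if (flag == NL_CAT_LOCALE)
    /* Name of the locale currently selected for LC_MESSAGES.  */
    name = setlocale (LC_MESSAGES, NULL);
  else
    /* Historical behavior: only LANG is examined.  */
    name = getenv ("LANG");

  /* With nothing set, fall back to the "C" locale, which prevents any
     translation.  Treating an empty value as unset is an assumption.  */
  if (name == NULL || *name == '\0')
    name = "C";
  return name;
}
```

The returned name is what would then be substituted for @code{%L} (and
split for @code{%l}, @code{%t} and @code{%c}) in the @code{NLSPATH}
template.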
  189. The return value of @code{catgets} is in any case a valid string. Either
  190. it is a translation from a message catalog or it is the same as the
  191. @var{string} parameter. So a piece of code to decide whether a
  192. translation actually happened must look like this:
  193. @smallexample
  194. @{
  195. char *trans = catgets (desc, set, msg, input_string);
  196. if (trans == input_string)
  197. @{
  198. /* Something went wrong. */
  199. @}
  200. @}
  201. @end smallexample
  202. @noindent
  203. When an error occurs the global variable @code{errno} is set to
  204. @table @var
  205. @item EBADF
  206. The catalog does not exist.
  207. @item ENOMSG
  208. The set/message tuple does not name an existing element in the
  209. message catalog.
  210. @end table
  211. While it sometimes can be useful to test for errors, programs normally
  212. will avoid any test. If the translation is not available it is no big
  213. problem if the original, untranslated message is printed. Either the
  214. user understands this as well or s/he will look for the reason why the
  215. messages are not translated.
  216. @end deftypefun
  217. Please note that the currently selected locale does not depend on a call
  218. to the @code{setlocale} function. It is not necessary that the locale
  219. data files for this locale exist and calling @code{setlocale} succeeds.
  220. The @code{catopen} function directly reads the values of the environment
  221. variables.
  222. @deftypefun {char *} catgets (nl_catd @var{catalog_desc}, int @var{set}, int @var{message}, const char *@var{string})
  223. @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
  224. The function @code{catgets} has to be used to access the message catalog
  225. previously opened using the @code{catopen} function. The
  226. @var{catalog_desc} parameter must be a value previously returned by
  227. @code{catopen}.
  228. The next two parameters, @var{set} and @var{message}, reflect the
  229. internal organization of the message catalog files. This will be
  230. explained in detail below. For now it is interesting to know that a
  231. catalog can consist of several sets and the messages in each set are
  232. individually numbered. Neither the set numbers nor the message
  233. numbers need be consecutive. They can be arbitrarily chosen.
  234. But each message (unless equal to another one) must have its own unique
  235. pair of set and message numbers.
  236. Since it is not guaranteed that the message catalog for the language
  237. selected by the user exists the last parameter @var{string} helps to
  238. handle this case gracefully. If no matching string can be found
  239. @var{string} is returned. This means for the programmer that
  240. @itemize @bullet
  241. @item
  242. the @var{string} parameters should contain reasonable text (this also
  243. helps to understand the program since otherwise there would be no hint
  244. on the string which is expected to be returned).
  245. @item
  246. all @var{string} arguments should be written in the same language.
  247. @end itemize
  248. @end deftypefun
  249. It is somewhat uncomfortable to write a program using the @code{catgets}
  250. functions if no supporting functionality is available. Since each
  251. set/message number tuple must be unique the programmer must keep lists
  252. of the messages at the same time the code is written. And the work
  253. between several people working on the same project must be coordinated.
  254. We will see how some of these problems can be relaxed a bit (@pxref{Common
  255. Usage}).
  256. @deftypefun int catclose (nl_catd @var{catalog_desc})
  257. @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
  258. @c catclose @ascuheap @acucorrupt @acsmem
  259. @c __set_errno ok
  260. @c munmap ok
  261. @c free @ascuheap @acsmem
  262. The @code{catclose} function can be used to free the resources
  263. associated with a message catalog which previously was opened by a call
  264. to @code{catopen}. If the resources can be successfully freed the
  265. function returns @code{0}. Otherwise it returns @code{@minus{}1} and the
  266. global variable @code{errno} is set. Errors can occur if the catalog
  267. descriptor @var{catalog_desc} is not valid in which case @code{errno} is
  268. set to @code{EBADF}.
  269. @end deftypefun
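Taken together, the three functions are typically used as in the
following sketch. The helper name @code{print_message} is hypothetical;
the sketch assumes that skipping @code{catgets} after a failed
@code{catopen} and printing the fallback text is acceptable for the
program.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: print message MSG of set SET from the catalog
   CAT_NAME, or FALLBACK if no translation is available.  Returns 1 if
   a translation was found and 0 otherwise.  */
int
print_message (const char *cat_name, int set, int msg, const char *fallback)
{
  nl_catd desc = catopen (cat_name, NL_CAT_LOCALE);
  const char *text = fallback;

  /* Only consult the catalog if it could be opened.  */
  if (desc != (nl_catd) -1)
    text = catgets (desc, set, msg, fallback);

  /* catgets returns FALLBACK itself when nothing matches, so a pointer
     comparison tells whether a translation happened.  */
  int translated = text != fallback;

  /* Use the string before closing: it may point into catalog data.  */
  puts (text);

  if (desc != (nl_catd) -1)
    catclose (desc);
  return translated;
}
```

A real program would normally call @code{catopen} once at startup and
keep the descriptor around rather than reopening the catalog for every
message.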
  270. @node The message catalog files
  271. @subsection Format of the message catalog files
  272. The only reasonable way to translate all the messages of a program and
  273. store the result in a message catalog file which can be read by the
  274. @code{catopen} function is to hand all the message text to the
  275. translator and let her/him translate them all. I.e., we must have a
  276. file with entries which associate the set/message tuple with a specific
  277. translation. This file format is specified in the X/Open standard and
  278. is as follows:
  279. @itemize @bullet
  280. @item
  281. Lines containing only whitespace characters or empty lines are ignored.
  282. @item
  283. Lines which contain as the first non-whitespace character a @code{$}
  284. followed by a whitespace character are comments and are also ignored.
  285. @item
  286. If a line contains as the first non-whitespace characters the sequence
  287. @code{$set} followed by a whitespace character an additional argument
  288. is required to follow. This argument can either be:
  289. @itemize @minus
  290. @item
  291. a number. In this case the value of this number determines the set
  292. to which the following messages are added.
  293. @item
  294. an identifier consisting of alphanumeric characters plus the underscore
  295. character. In this case the set is automatically assigned a number.
  296. This value is one more than the largest set number seen so far.
  297. How to use the symbolic names is explained in section @ref{Common Usage}.
  298. It is an error if a symbol name appears more than once. All following
  299. messages are placed in a set with this number.
  300. @end itemize
  301. @item
  302. If a line contains as the first non-whitespace characters the sequence
  303. @code{$delset} followed by a whitespace character an additional argument
  304. is required to follow. This argument can either be:
  305. @itemize @minus
  306. @item
  307. a number. In this case the value of this number determines the set
  308. which will be deleted.
  309. @item
  310. an identifier consisting of alphanumeric characters plus the underscore
  311. character. This symbolic identifier must match a name for a set which
  312. previously was defined. It is an error if the name is unknown.
  313. @end itemize
  314. In both cases all messages in the specified set will be removed. They
  315. will not appear in the output. But if this set is later selected again
  316. with a @code{$set} command, messages can be added again and these
  317. messages will appear in the output.
  318. @item
  319. If a line contains after leading whitespace the sequence
  320. @code{$quote}, the quoting character used for this input file is
  321. changed to the first non-whitespace character following
  322. @code{$quote}. If no non-whitespace character is present before the
  323. line ends quoting is disabled.
  324. By default no quoting character is used. In this mode strings are
  325. terminated with the first unescaped line break. If there is a
  326. @code{$quote} sequence present newlines need not be escaped. Instead a
  327. string is terminated with the first unescaped appearance of the quote
  328. character.
  329. A common usage of this feature would be to set the quote character to
  330. @code{"}. Then any appearance of the @code{"} in the strings must
  331. be escaped using the backslash (i.e., @code{\"} must be written).
  332. @item
  333. Any other line must start with a number or an alphanumeric identifier
  334. (with the underscore character included). The following characters
  335. (starting after the first whitespace character) will form the string
  336. which gets associated with the currently selected set and the message
  337. number represented by the number or identifier, respectively.
  338. If the start of the line is a number the message number is obvious. It
  339. is an error if the same message number already appeared for this set.
  340. If the leading token was an identifier the message number gets
  341. automatically assigned. The value is the current maximum message
  342. number for this set plus one. It is an error if the identifier was
  343. already used for a message in this set. It is OK to reuse the
  344. identifier for a message in another set. How to use the symbolic
  345. identifiers will be explained below (@pxref{Common Usage}). There is
  346. one limitation with the identifier: it must not be @code{Set}. The
  347. reason will be explained below.
  348. The text of the messages can contain escape characters. The usual bunch
  349. of characters known from the @w{ISO C} language are recognized
  350. (@code{\n}, @code{\t}, @code{\v}, @code{\b}, @code{\r}, @code{\f},
  351. @code{\\}, and @code{\@var{nnn}}, where @var{nnn} is the octal coding of
  352. a character code).
  353. @end itemize
  354. @strong{Important:} The handling of identifiers instead of numbers for
  355. the set and messages is a GNU extension. Systems strictly following the
  356. X/Open specification do not have this feature. An example for a message
  357. catalog file is this:
  358. @smallexample
  359. $ This is a leading comment.
  360. $quote "
  361. $set SetOne
  362. 1 Message with ID 1.
  363. two " Message with ID \"two\", which gets the value 2 assigned"
  364. $set SetTwo
  365. $ Since the last set got the number 1 assigned this set has number 2.
  366. 4000 "The numbers can be arbitrary, they need not start at one."
  367. @end smallexample
  368. This small example shows various aspects:
  369. @itemize @bullet
  370. @item
  371. The first line and the line after @code{$set SetTwo} are comments since
  372. they start with @code{$} followed by a whitespace character.
  373. @item
  374. The quoting character is set to @code{"}. Otherwise the quotes in the
  375. message definition would have to be omitted and in this case the
  376. message with the identifier @code{two} would lose its leading whitespace.
  377. @item
  378. Mixing numbered messages with messages having symbolic names is no
  379. problem and the numbering happens automatically.
  380. @end itemize
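The kinds of input lines described in this section can be told apart
with a small classifier such as the following sketch. It follows the
prose rules only and is not taken from the @code{gencat} sources; the
names @code{line_kind}, @code{starts_with_word} and @code{classify_line}
are hypothetical, and required arguments, quoting and escapes are not
checked.

```c
#include <ctype.h>
#include <string.h>

enum line_kind
{
  LINE_BLANK, LINE_COMMENT, LINE_SET, LINE_DELSET,
  LINE_QUOTE, LINE_MESSAGE, LINE_INVALID
};

/* True if LINE starts with WORD followed by a whitespace character.  */
static int
starts_with_word (const char *line, const char *word)
{
  size_t n = strlen (word);
  return strncmp (line, word, n) == 0 && isspace ((unsigned char) line[n]);
}

/* Hypothetical classifier for one line of a message catalog source.  */
enum line_kind
classify_line (const char *line)
{
  /* Leading whitespace is skipped before looking at the line.  */
  while (isspace ((unsigned char) *line))
    ++line;

  if (*line == '\0')
    return LINE_BLANK;
  if (starts_with_word (line, "$set"))
    return LINE_SET;
  if (starts_with_word (line, "$delset"))
    return LINE_DELSET;
  /* $quote may be followed directly by the end of the line.  */
  if (strncmp (line, "$quote", 6) == 0)
    return LINE_QUOTE;
  /* A '$' followed by whitespace starts a comment.  */
  if (starts_with_word (line, "$"))
    return LINE_COMMENT;
  /* Anything else must start with a number or an identifier.  */
  if (isalnum ((unsigned char) *line) || *line == '_')
    return LINE_MESSAGE;
  return LINE_INVALID;
}
```

Applied to the example catalog above, the first line is classified as a
comment, @code{$quote "} as a quote directive, @code{$set SetOne} as a
set directive, and the lines starting with @code{1}, @code{two} and
@code{4000} as messages.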
  381. While this file format is pretty easy it is not the best possible for
  382. use in a running program. The @code{catopen} function would have to
  383. parse the file and handle syntactic errors gracefully. This is not so
  384. easy and the whole process is pretty slow. Therefore the @code{catgets}
  385. functions expect the data in another more compact and ready-to-use file
  386. format. There is a special program @code{gencat} which is explained in
  387. detail in the next section.
  388. Files in this other format are not human readable. To be easy to use by
  389. programs it is a binary file. But the format is byte order independent
  390. so translation files can be shared by systems of arbitrary architecture
  391. (as long as they use @theglibc{}).
  392. Details about the binary file format are not important to know since
  393. these files are always created by the @code{gencat} program. The
  394. sources of @theglibc{} also provide the sources for the
  395. @code{gencat} program and so the interested reader can look through
  396. these source files to learn about the file format.
  397. @node The gencat program
  398. @subsection Generate Message Catalog Files
  399. @cindex gencat
  400. The @code{gencat} program is specified in the X/Open standard and the
  401. GNU implementation follows this specification and so processes
  402. all correctly formed input files. Additionally some extensions are
  403. implemented which help to work in a more reasonable way with the
  404. @code{catgets} functions.
  405. The @code{gencat} program can be invoked in two ways:
  406. @example
  407. `gencat [@var{Option} @dots{}] [@var{Output-File} [@var{Input-File} @dots{}]]`
  408. @end example
  409. This is the interface defined in the X/Open standard. If no
  410. @var{Input-File} parameter is given, input will be read from standard
  411. input. Multiple input files will be read as if they were concatenated.
  412. If @var{Output-File} is also missing, the output will be written to
  413. standard output. To provide the interface familiar from other
  414. programs, a second interface is provided.
  415. @smallexample
  416. `gencat [@var{Option} @dots{}] -o @var{Output-File} [@var{Input-File} @dots{}]`
  417. @end smallexample
  418. The option @samp{-o} is used to specify the output file and all file
  419. arguments are used as input files.
  420. Besides this one can use @file{-} or @file{/dev/stdin} for
  421. @var{Input-File} to denote the standard input. Correspondingly one can
  422. use @file{-} and @file{/dev/stdout} for @var{Output-File} to denote
  423. standard output. Using @file{-} as a file name is allowed in X/Open
  424. while using the device names is a GNU extension.
  425. The @code{gencat} program works by concatenating all input files and
  426. then @strong{merging} the resulting collection of message sets with a
  427. possibly existing output file. This is done by removing all messages
  428. with set/message number tuples matching any of the generated messages
  429. from the output file and then adding all the new messages. To
  430. regenerate a catalog file while ignoring the old contents therefore
  431. requires removing the output file if it exists. If the output is
  432. written to standard output no merging takes place.
  433. @noindent
  434. The following table shows the options understood by the @code{gencat}
  435. program. The X/Open standard does not specify any options for the
  436. program so all of these are GNU extensions.
  437. @table @samp
  438. @item -V
  439. @itemx --version
  440. Print the version information and exit.
  441. @item -h
  442. @itemx --help
  443. Print a usage message listing all available options, then exit successfully.
  444. @item --new
  445. Do not merge the new messages from the input files with the old content
  446. of the output file. The old content of the output file is discarded.
  447. @item -H
  448. @itemx --header=name
  449. This option is used to emit the symbolic names given to sets and
  450. messages in the input files for use in the program. Details about how
  451. to use this are given in the next section. The @var{name} parameter to
  452. this option specifies the name of the output file. It will contain a
  453. number of C preprocessor @code{#define}s to associate a name with a
  454. number.
  455. Please note that the generated file only contains the symbols from the
  456. input files. If the output is merged with the previous content of the
  457. output file the possibly existing symbols from the file(s) which
  458. generated the old output files are not in the generated header file.
  459. @end table
  460. @node Common Usage
  461. @subsection How to use the @code{catgets} interface
The @code{catgets} functions can be used in two different ways: by
strictly following the X/Open specification without relying on any
extension, or by using the GNU extensions. We will take a look at the
former method first to understand the benefits of the extensions.
  466. @subsubsection Not using symbolic names
  467. Since the X/Open format of the message catalog files does not allow
  468. symbol names we have to work with numbers all the time. When we start
  469. writing a program we have to replace all appearances of translatable
  470. strings with something like
  471. @smallexample
  472. catgets (catdesc, set, msg, "string")
  473. @end smallexample
  474. @noindent
@var{catdesc} is retrieved from a call to @code{catopen} which is
  476. normally done once at the program start. The @code{"string"} is the
  477. string we want to translate. The problems start with the set and
  478. message numbers.
  479. In a bigger program several programmers usually work at the same time on
  480. the program and so coordinating the number allocation is crucial.
  481. Though no two different strings must be indexed by the same tuple of
  482. numbers it is highly desirable to reuse the numbers for equal strings
  483. with equal translations (please note that there might be strings which
  484. are equal in one language but have different translations due to
different contexts).
The allocation process can be relaxed a bit by using different set
numbers for different parts of the program. This reduces the number of
developers who have to coordinate the allocation. But lists must still
be kept to track the allocation and errors can easily happen. These
errors cannot be discovered by the compiler or the @code{catgets}
functions. Only the user of the program might see wrong messages
printed. In the worst cases the messages are so misleading that they
cannot be recognized as wrong. Think about the translations for
@code{"true"} and @code{"false"} being exchanged. This could result in
a disaster.
  495. @subsubsection Using symbolic names
  496. The problems mentioned in the last section derive from the fact that:
  497. @enumerate
  498. @item
  499. the numbers are allocated once and due to the possibly frequent use of
  500. them it is difficult to change a number later.
  501. @item
  502. the numbers do not allow guessing anything about the string and
  503. therefore collisions can easily happen.
  504. @end enumerate
By consistently using symbolic names and by providing a method which maps
the string content to a symbolic name (however this is done) one can
prevent both problems above. The cost of this is that the programmer
has to write a complete message catalog file while s/he is writing the
program itself.
  510. This is necessary since the symbolic names must be mapped to numbers
  511. before the program sources can be compiled. In the last section it was
  512. described how to generate a header containing the mapping of the names.
  513. E.g., for the example message file given in the last section we could
  514. call the @code{gencat} program as follows (assume @file{ex.msg} contains
  515. the sources).
  516. @smallexample
  517. gencat -H ex.h -o ex.cat ex.msg
  518. @end smallexample
  519. @noindent
  520. This generates a header file with the following content:
  521. @smallexample
  522. #define SetTwoSet 0x2 /* ex.msg:8 */
  523. #define SetOneSet 0x1 /* ex.msg:4 */
  524. #define SetOnetwo 0x2 /* ex.msg:6 */
  525. @end smallexample
As can be seen, the various symbols given in the source file are mangled
to generate unique identifiers and these identifiers get numbers
assigned. Reading the source file and knowing the rules allows one to
predict the content of the header file (it is deterministic) but this is
not necessary. The @code{gencat} program can take care of everything.
All the programmer has to do is put the generated header file in the
dependency list of the source files of her/his project and add a rule to
regenerate the header if any of the input files change.
  534. One word about the symbol mangling. Every symbol consists of two parts:
  535. the name of the message set plus the name of the message or the special
  536. string @code{Set}. So @code{SetOnetwo} means this macro can be used to
  537. access the translation with identifier @code{two} in the message set
  538. @code{SetOne}.
  539. The other names denote the names of the message sets. The special
  540. string @code{Set} is used in the place of the message identifier.
  541. If in the code the second string of the set @code{SetOne} is used the C
  542. code should look like this:
  543. @smallexample
catgets (catdesc, SetOneSet, SetOnetwo,
         " Message with ID \"two\", which gets the value 2 assigned")
  546. @end smallexample
Writing the function call this way allows changing the message number
and even the set number without requiring any change in the C source
  549. code. (The text of the string is normally not the same; this is only
  550. for this example.)
@subsubsection How this allows one to develop
  552. To illustrate the usual way to work with the symbolic version numbers
  553. here is a little example. Assume we want to write the very complex and
  554. famous greeting program. We start by writing the code as usual:
  555. @smallexample
#include <stdio.h>

int
main (void)
@{
  printf ("Hello, world!\n");
  return 0;
@}
  563. @end smallexample
  564. Now we want to internationalize the message and therefore replace the
  565. message with whatever the user wants.
  566. @smallexample
#include <nl_types.h>
#include <stdio.h>
#include "msgnrs.h"

int
main (void)
@{
  nl_catd catdesc = catopen ("hello.cat", NL_CAT_LOCALE);
  printf (catgets (catdesc, MainSet, MainHello,
                   "Hello, world!\n"));
  catclose (catdesc);
  return 0;
@}
@end smallexample

We see how the catalog object is opened and the returned descriptor used
in the other function calls. It is not really necessary to check for
failure of any of the functions since even in these situations the
functions will behave reasonably. They will simply return the
untranslated default string.

What remains unspecified here are the constants @code{MainSet} and
@code{MainHello}. These are the symbolic names describing the
message. To get the actual definitions which match the information in
the catalog file we have to create the message catalog source file and
process it using the @code{gencat} program.
  590. @smallexample
  591. $ Messages for the famous greeting program.
  592. $quote "
  593. $set Main
  594. Hello "Hallo, Welt!\n"
  595. @end smallexample
  596. Now we can start building the program (assume the message catalog source
  597. file is named @file{hello.msg} and the program source file @file{hello.c}):
  598. @smallexample
  599. % gencat -H msgnrs.h -o hello.cat hello.msg
  600. % cat msgnrs.h
  601. #define MainSet 0x1 /* hello.msg:4 */
  602. #define MainHello 0x1 /* hello.msg:5 */
  603. % gcc -o hello hello.c -I.
  604. % cp hello.cat /usr/share/locale/de/LC_MESSAGES
  605. % echo $LC_ALL
  606. de
  607. % ./hello
  608. Hallo, Welt!
  609. %
  610. @end smallexample
  611. The call of the @code{gencat} program creates the missing header file
  612. @file{msgnrs.h} as well as the message catalog binary. The former is
used in the compilation of @file{hello.c} while the latter is placed in a
  614. directory in which the @code{catopen} function will try to locate it.
  615. Please check the @code{LC_ALL} environment variable and the default path
  616. for @code{catopen} presented in the description above.
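
The earlier remark that checking for failure is not strictly necessary
can be illustrated with a small sketch. The helper name
@code{print_message} and the catalog name used in the usage note are
hypothetical and not part of the example program; the set and message
numbers are arbitrary. The sketch relies on @code{catgets} returning
its last argument when the descriptor is invalid or the message is
missing.

@smallexample
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: print message 1 of set 1 from the catalog
   NAME, or FALLBACK if the catalog or the message is missing.
   Returns 1 if FALLBACK was printed, 0 if a translation was.  */
int
print_message (const char *name, const char *fallback)
@{
  nl_catd catdesc = catopen (name, NL_CAT_LOCALE);
  /* With an invalid descriptor catgets hands back FALLBACK.  */
  const char *text = catgets (catdesc, 1, 1, fallback);
  int used_fallback = (text == fallback);

  /* Print before closing: a translation may point into the catalog.  */
  fputs (text, stdout);
  if (catdesc != (nl_catd) -1)
    catclose (catdesc);
  return used_fallback;
@}
@end smallexample

A call such as @code{print_message ("no-such-catalog", "Hello, world!\n")}
is expected to print the default string, assuming no catalog of that
name is installed.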
  617. @node The Uniforum approach
  618. @section The Uniforum approach to Message Translation
  619. Sun Microsystems tried to standardize a different approach to message
  620. translation in the Uniforum group. There never was a real standard
  621. defined but still the interface was used in Sun's operating systems.
  622. Since this approach fits better in the development process of free
  623. software it is also used throughout the GNU project and the GNU
  624. @file{gettext} package provides support for this outside @theglibc{}.
  625. The code of the @file{libintl} from GNU @file{gettext} is the same as
  626. the code in @theglibc{}. So the documentation in the GNU
  627. @file{gettext} manual is also valid for the functionality here. The
  628. following text will describe the library functions in detail. But the
  629. numerous helper programs are not described in this manual. Instead
  630. people should read the GNU @file{gettext} manual
  631. (@pxref{Top,,GNU gettext utilities,gettext,Native Language Support Library and Tools}).
  632. We will only give a short overview.
  633. Though the @code{catgets} functions are available by default on more
  634. systems the @code{gettext} interface is at least as portable as the
  635. former. The GNU @file{gettext} package can be used wherever the
  636. functions are not available.
  637. @menu
  638. * Message catalogs with gettext:: The @code{gettext} family of functions.
  639. * Helper programs for gettext:: Programs to handle message catalogs
  640. for @code{gettext}.
  641. @end menu
  642. @node Message catalogs with gettext
  643. @subsection The @code{gettext} family of functions
The paradigm underlying the @code{gettext} approach to message
translation is different from that of the @code{catgets} functions, but
the basic functionality is equivalent. There are functions of the
following categories:
  648. @menu
  649. * Translation with gettext:: What has to be done to translate a message.
  650. * Locating gettext catalog:: How to determine which catalog to be used.
  651. * Advanced gettext functions:: Additional functions for more complicated
  652. situations.
  653. * Charset conversion in gettext:: How to specify the output character set
  654. @code{gettext} uses.
  655. * GUI program problems:: How to use @code{gettext} in GUI programs.
  656. * Using gettextized software:: The possibilities of the user to influence
  657. the way @code{gettext} works.
  658. @end menu
  659. @node Translation with gettext
  660. @subsubsection What has to be done to translate a message?
  661. The @code{gettext} functions have a very simple interface. The most
  662. basic function just takes the string which shall be translated as the
  663. argument and it returns the translation. This is fundamentally
  664. different from the @code{catgets} approach where an extra key is
  665. necessary and the original string is only used for the error case.
  666. If the string which has to be translated is the only argument this of
  667. course means the string itself is the key. I.e., the translation will
  668. be selected based on the original string. The message catalogs must
  669. therefore contain the original strings plus one translation for any such
  670. string. The task of the @code{gettext} function is to compare the
  671. argument string with the available strings in the catalog and return the
appropriate translation. Of course this process is optimized so that
it is not more expensive than an access using an atomic key as in
@code{catgets}.
  675. The @code{gettext} approach has some advantages but also some
  676. disadvantages. Please see the GNU @file{gettext} manual for a detailed
  677. discussion of the pros and cons.
  678. All the definitions and declarations for @code{gettext} can be found in
  679. the @file{libintl.h} header file. On systems where these functions are
  680. not part of the C library they can be found in a separate library named
  681. @file{libintl.a} (or accordingly different for shared libraries).
  682. @deftypefun {char *} gettext (const char *@var{msgid})
  683. @standards{GNU, libintl.h}
  684. @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
  685. @c Wrapper for dcgettext.
  686. The @code{gettext} function searches the currently selected message
  687. catalogs for a string which is equal to @var{msgid}. If there is such a
  688. string available it is returned. Otherwise the argument string
  689. @var{msgid} is returned.
  690. Please note that although the return value is @code{char *} the
  691. returned string must not be changed. This broken type results from the
  692. history of the function and does not reflect the way the function should
  693. be used.
  694. Please note that above we wrote ``message catalogs'' (plural). This is
  695. a specialty of the GNU implementation of these functions and we will
  696. say more about this when we talk about the ways message catalogs are
  697. selected (@pxref{Locating gettext catalog}).
  698. The @code{gettext} function does not modify the value of the global
  699. @code{errno} variable. This is necessary to make it possible to write
  700. something like
  701. @smallexample
  702. printf (gettext ("Operation failed: %m\n"));
  703. @end smallexample
Here the @code{errno} value is used in the @code{printf} function while
processing the @code{%m} format element and if the @code{gettext}
function changed this value (it is called before @code{printf} is
called) we would get a wrong message.

So there is no easy way to detect a missing message catalog besides
comparing the argument string with the result. But it is normally the
task of the user to react to missing catalogs. The program cannot guess
when a message catalog is really necessary since for a user who speaks
the language the program was developed in, the message does not need any
translation.
  713. @end deftypefun
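
The comparison mentioned above can be sketched as follows. This is
only an illustration under the assumption that no catalog contains a
translation identical to the original string; the helper name
@code{is_untranslated} is hypothetical.

@smallexample
#include <libintl.h>
#include <string.h>

/* Hypothetical helper: return nonzero if gettext handed back a
   string equal to MSGID, i.e. no differing translation was found
   in the currently selected catalogs.  */
int
is_untranslated (const char *msgid)
@{
  /* The result must be treated as read-only despite its type.  */
  const char *result = gettext (msgid);
  return strcmp (result, msgid) == 0;
@}
@end smallexample

In a program that has not selected a locale or installed a catalog,
@code{is_untranslated ("Hello, world!\n")} is expected to return a
nonzero value.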
  714. The remaining two functions to access the message catalog add some
  715. functionality to select a message catalog which is not the default one.
  716. This is important if parts of the program are developed independently.
  717. Every part can have its own message catalog and all of them can be used
  718. at the same time. The C library itself is an example: internally it
  719. uses the @code{gettext} functions but since it must not depend on a
  720. currently selected default message catalog it must specify all ambiguous
  721. information.
  722. @deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
  723. @standards{GNU, libintl.h}
  724. @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
  725. @c Wrapper for dcgettext.
  726. The @code{dgettext} function acts just like the @code{gettext}
  727. function. It only takes an additional first argument @var{domainname}
  728. which guides the selection of the message catalogs which are searched
  729. for the translation. If the @var{domainname} parameter is the null
  730. pointer the @code{dgettext} function is exactly equivalent to
  731. @code{gettext} since the default value for the domain name is used.
  732. As for @code{gettext} the return value type is @code{char *} which is an
  733. anachronism. The returned string must never be modified.
  734. @end deftypefun
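
As a minimal sketch of how an independently developed program part
might use @code{dgettext}, assume a library with the hypothetical
domain name @code{libfoo}. The macro @code{LIBFOO_TEXT} and the
function @code{libfoo_strstatus} are likewise only illustrative names.

@smallexample
#include <libintl.h>

/* Hypothetical domain used only by this library, independent of
   the default domain chosen by the main program.  */
#define LIBFOO_DOMAIN "libfoo"
#define LIBFOO_TEXT(msgid) dgettext (LIBFOO_DOMAIN, msgid)

/* Return a translated description of a status value.  The
   returned string must not be modified.  */
const char *
libfoo_strstatus (int ok)
@{
  return ok ? LIBFOO_TEXT ("success") : LIBFOO_TEXT ("failure");
@}
@end smallexample

Because the domain is passed explicitly, a later change of the default
domain by the main program does not affect which catalog the library
consults.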
  735. @deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
  736. @standards{GNU, libintl.h}
  737. @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
  738. @c dcgettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
  739. @c dcigettext @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
  740. @c libc_rwlock_rdlock @asulock @aculock
  741. @c current_locale_name ok [protected from @mtslocale]
  742. @c tfind ok
  743. @c libc_rwlock_unlock ok
  744. @c plural_lookup ok
  745. @c plural_eval ok
  746. @c rawmemchr ok
  747. @c DETERMINE_SECURE ok, nothing
  748. @c strcmp ok
  749. @c strlen ok
  750. @c getcwd @ascuheap @acsmem @acsfd
  751. @c strchr ok
  752. @c stpcpy ok
  753. @c category_to_name ok
  754. @c guess_category_value @mtsenv
  755. @c getenv @mtsenv
  756. @c current_locale_name dup ok [protected from @mtslocale by dcigettext]
  757. @c strcmp ok
  758. @c ENABLE_SECURE ok
  759. @c _nl_find_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
  760. @c libc_rwlock_rdlock dup @asulock @aculock
  761. @c _nl_make_l10nflist dup @ascuheap @acsmem
  762. @c libc_rwlock_unlock dup ok
  763. @c _nl_load_domain @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
  764. @c libc_lock_lock_recursive @aculock
  765. @c libc_lock_unlock_recursive @aculock
  766. @c open->open_not_cancel_2 @acsfd
  767. @c fstat ok
  768. @c mmap dup @acsmem
  769. @c close->close_not_cancel_no_status @acsfd
  770. @c malloc dup @ascuheap @acsmem
  771. @c read->read_not_cancel ok
  772. @c munmap dup @acsmem
  773. @c W dup ok
  774. @c strlen dup ok
  775. @c get_sysdep_segment_value ok
  776. @c memcpy dup ok
  777. @c hash_string dup ok
  778. @c free dup @ascuheap @acsmem
  779. @c libc_rwlock_init ok
  780. @c _nl_find_msg dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
  781. @c libc_rwlock_fini ok
  782. @c EXTRACT_PLURAL_EXPRESSION @ascuheap @acsmem
  783. @c strstr dup ok
  784. @c isspace ok
  785. @c strtoul ok
  786. @c PLURAL_PARSE @ascuheap @acsmem
  787. @c malloc dup @ascuheap @acsmem
  788. @c free dup @ascuheap @acsmem
  789. @c INIT_GERMANIC_PLURAL ok, nothing
  790. @c the pre-C99 variant is @acucorrupt [protected from @mtuinit by dcigettext]
  791. @c _nl_expand_alias dup @ascuheap @asulock @acsmem @acsfd @aculock
  792. @c _nl_explode_name dup @ascuheap @acsmem
  793. @c libc_rwlock_wrlock dup @asulock @aculock
  794. @c free dup @asulock @aculock @acsfd @acsmem
  795. @c _nl_find_msg @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
  796. @c _nl_load_domain dup @mtsenv @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsfd @acsmem
  797. @c strlen ok
  798. @c hash_string ok
  799. @c W ok
  800. @c SWAP ok
  801. @c bswap_32 ok
  802. @c strcmp ok
  803. @c get_output_charset @mtsenv @ascuheap @acsmem
  804. @c getenv dup @mtsenv
  805. @c strlen dup ok
  806. @c malloc dup @ascuheap @acsmem
  807. @c memcpy dup ok
  808. @c libc_rwlock_rdlock dup @asulock @aculock
  809. @c libc_rwlock_unlock dup ok
  810. @c libc_rwlock_wrlock dup @asulock @aculock
  811. @c realloc @ascuheap @acsmem
  812. @c strdup @ascuheap @acsmem
  813. @c strstr ok
  814. @c strcspn ok
  815. @c mempcpy dup ok
  816. @c norm_add_slashes dup ok
  817. @c gconv_open @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd
  818. @c [protected from @mtslocale by dcigettext locale lock]
  819. @c free dup @ascuheap @acsmem
  820. @c libc_lock_lock @asulock @aculock
  821. @c calloc @ascuheap @acsmem
  822. @c gconv dup @acucorrupt [protected from @mtsrace and @asucorrupt by lock]
  823. @c libc_lock_unlock ok
  824. @c malloc @ascuheap @acsmem
  825. @c mempcpy ok
  826. @c memcpy ok
  827. @c strcpy ok
  828. @c libc_rwlock_wrlock @asulock @aculock
  829. @c tsearch @ascuheap @acucorrupt @acsmem [protected from @mtsrace and @asucorrupt]
  830. @c transcmp ok
  831. @c strmp dup ok
  832. @c free @ascuheap @acsmem
The @code{dcgettext} function adds another argument to those which
@code{dgettext} takes. This argument @var{category} specifies the last
piece of information needed to locate the message catalog. I.e., the
domain name and the locale category exactly specify which message
catalog has to be used (relative to a given directory, see below).
  838. The @code{dgettext} function can be expressed in terms of
  839. @code{dcgettext} by using
  840. @smallexample
  841. dcgettext (domain, string, LC_MESSAGES)
  842. @end smallexample
  843. @noindent
  844. instead of
  845. @smallexample
  846. dgettext (domain, string)
  847. @end smallexample
  848. This also shows which values are expected for the third parameter. One
  849. has to use the available selectors for the categories available in
  850. @file{locale.h}. Normally the available values are @code{LC_CTYPE},
  851. @code{LC_COLLATE}, @code{LC_MESSAGES}, @code{LC_MONETARY},
  852. @code{LC_NUMERIC}, and @code{LC_TIME}. Please note that @code{LC_ALL}
  853. must not be used and even though the names might suggest this, there is
  854. no relation to the environment variable of this name.
  855. The @code{dcgettext} function is only implemented for compatibility with
  856. other systems which have @code{gettext} functions. There is not really
  857. any situation where it is necessary (or useful) to use a different value
  858. than @code{LC_MESSAGES} for the @var{category} parameter. We are
dealing with messages here and any other choice can only be confusing.
  860. As for @code{gettext} the return value type is @code{char *} which is an
  861. anachronism. The returned string must never be modified.
  862. @end deftypefun
When using the three functions above in a program it is a frequent case
that the @var{msgid} argument is a constant string. So it is worthwhile to
optimize this case. Thinking briefly about this, one will realize that
as long as no new message catalog is loaded the translation of a message
will not change. This optimization is actually implemented by the
@code{gettext}, @code{dgettext} and @code{dcgettext} functions.
  869. @node Locating gettext catalog
  870. @subsubsection How to determine which catalog to be used
The functions to retrieve the translations for a given message have a
remarkably simple interface. But to still give the user of the program
the opportunity to select exactly the translation s/he wants, and to
give the programmer the possibility to influence where catalog files are
searched for, there is a quite complicated underlying mechanism which
controls all this. The code is complicated, but the use is easy.
  878. Basically we have two different tasks to perform which can also be
  879. performed by the @code{catgets} functions:
  880. @enumerate
  881. @item
  882. Locate the set of message catalogs. There are a number of files for
  883. different languages which all belong to the package. Usually they
  884. are all stored in the filesystem below a certain directory.
  885. There can be arbitrarily many packages installed and they can follow
  886. different guidelines for the placement of their files.
  887. @item
  888. Relative to the location specified by the package the actual translation
  889. files must be searched, based on the wishes of the user. I.e., for each
  890. language the user selects the program should be able to locate the
  891. appropriate file.
  892. @end enumerate
  893. This is the functionality required by the specifications for
  894. @code{gettext} and this is also what the @code{catgets} functions are
able to do. But some problems remain unresolved:
  896. @itemize @bullet
  897. @item
  898. The language to be used can be specified in several different ways.
  899. There is no generally accepted standard for this and the user always
  900. expects the program to understand what s/he means. E.g., to select the
  901. German translation one could write @code{de}, @code{german}, or
  902. @code{deutsch} and the program should always react the same.
  903. @item
  904. Sometimes the specification of the user is too detailed. If s/he, e.g.,
  905. specifies @code{de_DE.ISO-8859-1} which means German, spoken in Germany,
  906. coded using the @w{ISO 8859-1} character set there is the possibility
  907. that a message catalog matching this exactly is not available. But
  908. there could be a catalog matching @code{de} and if the character set
  909. used on the machine is always @w{ISO 8859-1} there is no reason why this
latter message catalog should not be used. (We call this @dfn{message
inheritance}.)
  912. @item
  913. If a catalog for a wanted language is not available it is not always the
  914. second best choice to fall back on the language of the developer and
  915. simply not translate any message. Instead a user might be better able
  916. to read the messages in another language and so the user of the program
  917. should be able to define a precedence order of languages.
  918. @end itemize
We can divide the configuration actions into two parts: one is
performed by the programmer, the other by the user. We will start with
  921. the functions the programmer can use since the user configuration will
  922. be based on this.
As the descriptions of the functions in the last sections already
mention, separate sets of messages can be selected by a @dfn{domain
name}. This is a simple string which should be unique for each program
part that uses a separate domain. It is possible to use arbitrarily
many domains in one program at the same time. E.g., @theglibc{} itself
uses a domain named @code{libc} while the program using the C Library
could use a domain named @code{foo}. The important point is that at any
time exactly one domain is active. This is controlled with the
following function.
  932. @deftypefun {char *} textdomain (const char *@var{domainname})
  933. @standards{GNU, libintl.h}
  934. @safety{@prelim{}@mtsafe{}@asunsafe{@asulock{} @ascuheap{}}@acunsafe{@aculock{} @acsmem{}}}
  935. @c textdomain @asulock @ascuheap @aculock @acsmem
  936. @c libc_rwlock_wrlock @asulock @aculock
  937. @c strcmp ok
  938. @c strdup @ascuheap @acsmem
  939. @c free @ascuheap @acsmem
  940. @c libc_rwlock_unlock ok
  941. The @code{textdomain} function sets the default domain, which is used in
  942. all future @code{gettext} calls, to @var{domainname}. Please note that
  943. @code{dgettext} and @code{dcgettext} calls are not influenced if the
  944. @var{domainname} parameter of these functions is not the null pointer.
  945. Before the first call to @code{textdomain} the default domain is
  946. @code{messages}. This is the name specified in the specification of
  947. the @code{gettext} API. This name is as good as any other name. No
  948. program should ever really use a domain with this name since this can
  949. only lead to problems.
  950. The function returns the value which is from now on taken as the default
domain. If the system ran out of memory the returned value is
  952. @code{NULL} and the global variable @code{errno} is set to @code{ENOMEM}.
  953. Despite the return value type being @code{char *} the return string must
  954. not be changed. It is allocated internally by the @code{textdomain}
  955. function.
  956. If the @var{domainname} parameter is the null pointer no new default
  957. domain is set. Instead the currently selected default domain is
  958. returned.
  959. If the @var{domainname} parameter is the empty string the default domain
  960. is reset to its initial value, the domain with the name @code{messages}.
Using this possibility is questionable since the domain @code{messages}
should never really be used.
  963. @end deftypefun
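
The cases described for @code{textdomain} can be walked through with a
short sketch. The domain name @code{foo} and the helper names
@code{show_domain} and @code{demo_textdomain} are hypothetical; the
domain names in the comments are what the description above leads one
to expect in a program that has not called @code{textdomain} before.

@smallexample
#include <libintl.h>
#include <stdio.h>

/* Print the current default domain without changing it.  */
void
show_domain (const char *label)
@{
  printf ("%s: %s\n", label, textdomain (NULL));
@}

/* Returns 0 on success, -1 if textdomain ran out of memory.  */
int
demo_textdomain (void)
@{
  show_domain ("initial");      /* expected domain: messages */
  if (textdomain ("foo") == NULL)
    return -1;
  show_domain ("selected");     /* expected domain: foo */
  if (textdomain ("") == NULL)
    return -1;
  show_domain ("reset");        /* expected domain: messages */
  return 0;
@}
@end smallexample

The null pointer argument only queries the current setting, which is
why @code{show_domain} can be called at any point without side effects.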
  964. @deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
  965. @standards{GNU, libintl.h}
  966. @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
  967. @c bindtextdomain @ascuheap @acsmem
  968. @c set_binding_values @ascuheap @acsmem
  969. @c libc_rwlock_wrlock dup @asulock @aculock
  970. @c strcmp dup ok
  971. @c strdup dup @ascuheap @acsmem
  972. @c free dup @ascuheap @acsmem
  973. @c malloc dup @ascuheap @acsmem
  974. The @code{bindtextdomain} function can be used to specify the directory
  975. which contains the message catalogs for domain @var{domainname} for the
  976. different languages. To be correct, this is the directory where the
  977. hierarchy of directories is expected. Details are explained below.
  978. For the programmer it is important to note that the translations which
  979. come with the program have to be placed in a directory hierarchy starting
  980. at, say, @file{/foo/bar}. Then the program should make a
  981. @code{bindtextdomain} call to bind the domain for the current program to
this directory. This ensures that the catalogs are found. A correctly
  983. running program does not depend on the user setting an environment
  984. variable.
  985. The @code{bindtextdomain} function can be used several times and if the
  986. @var{domainname} argument is different the previously bound domains
  987. will not be overwritten.
If a program which uses @code{bindtextdomain} calls the @code{chdir}
function at some point to change the current working directory, it is
important that the @var{dirname} string is an absolute pathname.
Otherwise the addressed directory might vary over time.
  993. If the @var{dirname} parameter is the null pointer @code{bindtextdomain}
  994. returns the currently selected directory for the domain with the name
  995. @var{domainname}.
The @code{bindtextdomain} function returns a pointer to a string
containing the name of the selected directory. The string is
allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
@code{bindtextdomain} the return value is @code{NULL} and the global
variable @code{errno} is set accordingly.
  1002. @end deftypefun
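
A typical initialization combining the two functions above might look
like the following sketch. The domain name @code{foo} and the helper
name @code{init_foo_domain} are hypothetical, and the directory passed
in is assumed to be an absolute path chosen by the package, such as the
@file{/foo/bar} mentioned above.

@smallexample
#include <libintl.h>
#include <stddef.h>

/* Bind the hypothetical domain "foo" to the directory hierarchy
   starting at DIRNAME and make it the default domain.  Returns
   the directory now bound, or NULL if memory ran out.  */
const char *
init_foo_domain (const char *dirname)
@{
  if (bindtextdomain ("foo", dirname) == NULL)
    return NULL;
  if (textdomain ("foo") == NULL)
    return NULL;
  /* A null DIRNAME only queries the current binding.  */
  return bindtextdomain ("foo", NULL);
@}
@end smallexample

A program would normally call @code{setlocale} first and then a helper
like this once at startup, before any message is translated.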
  1003. @node Advanced gettext functions
  1004. @subsubsection Additional functions for more complicated situations
  1005. The functions of the @code{gettext} family described so far (and all the
  1006. @code{catgets} functions as well) have one problem in the real world
  1007. which has been neglected completely in all existing approaches. What
  1008. is meant here is the handling of plural forms.
  1009. Looking through Unix source code before the time anybody thought about
  1010. internationalization (and, sadly, even afterwards) one can often find
  1011. code similar to the following:
  1012. @smallexample
  1013. printf ("%d file%s deleted", n, n == 1 ? "" : "s");
  1014. @end smallexample
  1015. @noindent
After the first complaints from people internationalizing the code,
people either completely avoided formulations like this or used strings
like @code{"file(s)"}. Both look unnatural and should be avoided. The
first attempts to solve the problem correctly looked like this:
  1020. @smallexample
if (n == 1)
  printf ("%d file deleted", n);
else
  printf ("%d files deleted", n);
  1025. @end smallexample
  1026. But this does not solve the problem. It helps languages where the
  1027. plural form of a noun is not simply constructed by adding an `s' but
  1028. that is all. Once again people fell into the trap of believing the
  1029. rules their language uses are universal. But the handling of plural
forms differs widely between the language families. Two things can
differ between (and even within) language families:
  1032. @itemize @bullet
  1033. @item
The way plural forms are built differs. This is a problem with
languages which have many irregularities. German, for instance, is a
drastic case. Though English and German are part of the same language
family (Germanic), the almost regular forming of plural noun forms
(appending an `s') is hardly found in German.
  1039. @item
The number of plural forms differs. This is somewhat surprising for
those who only have experience with Romance and Germanic languages
since here the number is the same (there are two).
But other language families have only one form or many forms. More
information on this in an extra section.
  1045. @end itemize
  1046. The consequence of this is that application writers should not try to
  1047. solve the problem in their code. This would be localization since it is
  1048. only usable for certain, hardcoded language environments. Instead the
  1049. extended @code{gettext} interface should be used.
These extra functions take two strings and a numerical argument
instead of the single key string. The idea is that, using
the numerical argument and the first string as a key, the implementation
can select the right plural form using rules specified by the
translator. The two string arguments are then used to provide a return
value in case no message catalog is found (similar to the normal
@code{gettext} behavior). In this case the rules for Germanic languages
are used and it is assumed that the first string argument is the singular
form, the second the plural form.
  1059. This has the consequence that programs without language catalogs can
  1060. display the correct strings only if the program itself is written using
  1061. a Germanic language. This is a limitation but since @theglibc{}
  1062. (as well as the GNU @code{gettext} package) is written as part of the
  1063. GNU package and the coding standards for the GNU project require programs
  1064. to be written in English, this solution nevertheless fulfills its
  1065. purpose.
  1066. @deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
  1067. @standards{GNU, libintl.h}
  1068. @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
  1069. @c Wrapper for dcngettext.
  1070. The @code{ngettext} function is similar to the @code{gettext} function
  1071. as it finds the message catalogs in the same way. But it takes two
  1072. extra arguments. The @var{msgid1} parameter must contain the singular
  1073. form of the string to be converted. It is also used as the key for the
  1074. search in the catalog. The @var{msgid2} parameter is the plural form.
  1075. The parameter @var{n} is used to determine the plural form. If no
message catalog is found, @var{msgid1} is returned if @code{n == 1},
otherwise @var{msgid2}.
  1078. An example for the use of this function is:
  1079. @smallexample
  1080. printf (ngettext ("%d file removed", "%d files removed", n), n);
  1081. @end smallexample
  1082. Please note that the numeric value @var{n} has to be passed to the
  1083. @code{printf} function as well. It is not sufficient to pass it only to
  1084. @code{ngettext}.
  1085. @end deftypefun
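The fallback behavior described above, and the need to pass @var{n}
twice, can be sketched in plain C. This is only an illustration: the
names @code{fallback_ngettext} and @code{format_removed} are
hypothetical and not part of any library, and the sketch mimics the
no-catalog case rather than calling @code{ngettext} itself.

```c
#include <stdio.h>

/* Hypothetical stand-in for what ngettext returns when no message
   catalog is found: the Germanic rule, singular only for n == 1.  */
static const char *
fallback_ngettext (const char *msgid1, const char *msgid2,
                   unsigned long int n)
{
  return n == 1 ? msgid1 : msgid2;
}

/* Format the message into buf.  The count is used twice: once to
   select the form and once as the argument for the %lu conversion.  */
static int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   fallback_ngettext ("%lu file removed",
                                      "%lu files removed", n),
                   n);
}
```

Note that a count of zero selects the plural string under this rule,
just like any other value different from one.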
  1086. @deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
  1087. @standards{GNU, libintl.h}
  1088. @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
  1089. @c Wrapper for dcngettext.
The @code{dngettext} function is similar to the @code{dgettext} function in the
  1091. way the message catalog is selected. The difference is that it takes
  1092. two extra parameters to provide the correct plural form. These two
  1093. parameters are handled in the same way @code{ngettext} handles them.
  1094. @end deftypefun
  1095. @deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
  1096. @standards{GNU, libintl.h}
  1097. @safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
  1098. @c Wrapper for dcigettext.
The @code{dcngettext} function is similar to the @code{dcgettext} function in the
  1100. way the message catalog is selected. The difference is that it takes
  1101. two extra parameters to provide the correct plural form. These two
  1102. parameters are handled in the same way @code{ngettext} handles them.
  1103. @end deftypefun
  1104. @subsubheading The problem of plural forms
A description of the problem can be found at the beginning of the last
section. Now there is the question of how to solve it. Without the input
of linguists (which was not available) it was not possible to determine
whether there are only a few different ways in which plural forms are
built or whether the number can increase with every new supported
language.
  1111. Therefore the solution implemented is to allow the translator to specify
  1112. the rules of how to select the plural form. Since the formula varies
  1113. with every language this is the only viable solution except for
  1114. hardcoding the information in the code (which still would require the
  1115. possibility of extensions to not prevent the use of new languages). The
  1116. details are explained in the GNU @code{gettext} manual. Here only a
  1117. bit of information is provided.
  1118. The information about the plural form selection has to be stored in the
  1119. header entry (the one with the empty @code{msgid} string). It looks
  1120. like this:
  1121. @smallexample
  1122. Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
  1123. @end smallexample
  1124. The @code{nplurals} value must be a decimal number which specifies how
  1125. many different plural forms exist for this language. The string
  1126. following @code{plural} is an expression using the C language
  1127. syntax. Exceptions are that no negative numbers are allowed, numbers
  1128. must be decimal, and the only variable allowed is @code{n}. This
  1129. expression will be evaluated whenever one of the functions
  1130. @code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
  1131. numeric value passed to these functions is then substituted for all uses
  1132. of the variable @code{n} in the expression. The resulting value then
  1133. must be greater or equal to zero and smaller than the value given as the
  1134. value of @code{nplurals}.
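To make the evaluation step concrete, the following sketch writes the
example rule out as a C function and uses its result to pick one of
@code{nplurals} translated strings. The function names and the German
sample strings in the usage note are assumptions made for illustration;
the real implementation parses the expression from the catalog header
at run time.

```c
#include <stddef.h>

/* The rule "plural=n == 1 ? 0 : 1" from the header entry, written as
   a C function.  */
static unsigned long int
plural_index_two_forms (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Pick one of the nplurals translated forms.  Returns NULL when idx
   is not smaller than nplurals, which a correct header entry never
   produces.  */
static const char *
select_form (const char *const forms[], unsigned long int nplurals,
             unsigned long int idx)
{
  return idx < nplurals ? forms[idx] : NULL;
}
```

With an array holding a singular and a plural translation, such as
@code{"%d Datei"} and @code{"%d Dateien"}, a count of 1 selects the
first entry and every other count selects the second.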
  1135. @noindent
The following rules are known at this point. The languages are listed
with their families. But this does not necessarily mean the information can be
  1138. generalized for the whole family (as can be easily seen in the table
  1139. below).@footnote{Additions are welcome. Send appropriate information to
  1140. @email{bug-glibc-manual@@gnu.org}.}
  1141. @table @asis
  1142. @item Only one form:
  1143. Some languages only require one single form. There is no distinction
  1144. between the singular and plural form. An appropriate header entry
  1145. would look like this:
  1146. @smallexample
  1147. Plural-Forms: nplurals=1; plural=0;
  1148. @end smallexample
  1149. @noindent
  1150. Languages with this property include:
  1151. @table @asis
  1152. @item Finno-Ugric family
  1153. Hungarian
  1154. @item Asian family
  1155. Japanese, Korean
  1156. @item Turkic/Altaic family
  1157. Turkish
  1158. @end table
  1159. @item Two forms, singular used for one only
  1160. This is the form used in most existing programs since it is what English
  1161. uses. A header entry would look like this:
  1162. @smallexample
  1163. Plural-Forms: nplurals=2; plural=n != 1;
  1164. @end smallexample
(Note: this uses the feature of C expressions that boolean expressions
have the value zero or one.)
  1167. @noindent
  1168. Languages with this property include:
  1169. @table @asis
  1170. @item Germanic family
  1171. Danish, Dutch, English, German, Norwegian, Swedish
  1172. @item Finno-Ugric family
  1173. Estonian, Finnish
  1174. @item Latin/Greek family
  1175. Greek
  1176. @item Semitic family
  1177. Hebrew
  1178. @item Romance family
  1179. Italian, Portuguese, Spanish
  1180. @item Artificial
  1181. Esperanto
  1182. @end table
  1183. @item Two forms, singular used for zero and one
This is an exceptional case within the language family. The header entry would be:
  1185. @smallexample
  1186. Plural-Forms: nplurals=2; plural=n>1;
  1187. @end smallexample
  1188. @noindent
  1189. Languages with this property include:
  1190. @table @asis
@item Romance family
  1192. French, Brazilian Portuguese
  1193. @end table
  1194. @item Three forms, special case for zero
  1195. The header entry would be:
  1196. @smallexample
  1197. Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
  1198. @end smallexample
  1199. @noindent
  1200. Languages with this property include:
  1201. @table @asis
  1202. @item Baltic family
  1203. Latvian
  1204. @end table
  1205. @item Three forms, special cases for one and two
  1206. The header entry would be:
  1207. @smallexample
  1208. Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
  1209. @end smallexample
  1210. @noindent
  1211. Languages with this property include:
  1212. @table @asis
  1213. @item Celtic
  1214. Gaeilge (Irish)
  1215. @end table
  1216. @item Three forms, special case for numbers ending in 1[2-9]
  1217. The header entry would look like this:
  1218. @smallexample
  1219. Plural-Forms: nplurals=3; \
  1220. plural=n%10==1 && n%100!=11 ? 0 : \
  1221. n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
  1222. @end smallexample
  1223. @noindent
  1224. Languages with this property include:
  1225. @table @asis
  1226. @item Baltic family
  1227. Lithuanian
  1228. @end table
  1229. @item Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
  1230. The header entry would look like this:
  1231. @smallexample
  1232. Plural-Forms: nplurals=3; \
  1233. plural=n%100/10==1 ? 2 : n%10==1 ? 0 : (n+9)%10>3 ? 2 : 1;
  1234. @end smallexample
  1235. @noindent
  1236. Languages with this property include:
  1237. @table @asis
  1238. @item Slavic family
  1239. Croatian, Czech, Russian, Ukrainian
  1240. @end table
  1241. @item Three forms, special cases for 1 and 2, 3, 4
  1242. The header entry would look like this:
  1243. @smallexample
  1244. Plural-Forms: nplurals=3; \
  1245. plural=(n==1) ? 1 : (n>=2 && n<=4) ? 2 : 0;
  1246. @end smallexample
  1247. @noindent
  1248. Languages with this property include:
  1249. @table @asis
  1250. @item Slavic family
  1251. Slovak
  1252. @end table
  1253. @item Three forms, special case for one and some numbers ending in 2, 3, or 4
  1254. The header entry would look like this:
  1255. @smallexample
  1256. Plural-Forms: nplurals=3; \
  1257. plural=n==1 ? 0 : \
  1258. n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
  1259. @end smallexample
  1260. @noindent
  1261. Languages with this property include:
  1262. @table @asis
  1263. @item Slavic family
  1264. Polish
  1265. @end table
  1266. @item Four forms, special case for one and all numbers ending in 02, 03, or 04
  1267. The header entry would look like this:
  1268. @smallexample
  1269. Plural-Forms: nplurals=4; \
  1270. plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
  1271. @end smallexample
  1272. @noindent
  1273. Languages with this property include:
  1274. @table @asis
  1275. @item Slavic family
  1276. Slovenian
  1277. @end table
  1278. @end table
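Since the @code{plural} expressions use C syntax, two of the rules from
the table can be transcribed directly into C functions to check which
form a given count selects. This is a sketch for experimentation only;
the function names are hypothetical.

```c
/* Lithuanian rule from the table:
   n%10==1 && n%100!=11 ? 0 : n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2  */
static unsigned long int
plural_lithuanian (unsigned long int n)
{
  return (n % 10 == 1 && n % 100 != 11 ? 0
          : n % 10 >= 2 && (n % 100 < 10 || n % 100 >= 20) ? 1 : 2);
}

/* Polish rule from the table:
   n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2  */
static unsigned long int
plural_polish (unsigned long int n)
{
  return (n == 1 ? 0
          : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)
          ? 1 : 2);
}
```

For example, the count 21 selects form 0 under the Lithuanian rule but
form 2 under the Polish rule, while 22 selects form 1 under both.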
  1279. @node Charset conversion in gettext
  1280. @subsubsection How to specify the output character set @code{gettext} uses
  1281. @code{gettext} not only looks up a translation in a message catalog, it
  1282. also converts the translation on the fly to the desired output character
  1283. set. This is useful if the user is working in a different character set
  1284. than the translator who created the message catalog, because it avoids
  1285. distributing variants of message catalogs which differ only in the
  1286. character set.
  1287. The output character set is, by default, the value of @code{nl_langinfo
  1288. (CODESET)}, which depends on the @code{LC_CTYPE} part of the current
  1289. locale. But programs which store strings in a locale independent way
  1290. (e.g. UTF-8) can request that @code{gettext} and related functions
  1291. return the translations in that encoding, by use of the
  1292. @code{bind_textdomain_codeset} function.
  1293. Note that the @var{msgid} argument to @code{gettext} is not subject to
  1294. character set conversion. Also, when @code{gettext} does not find a
  1295. translation for @var{msgid}, it returns @var{msgid} unchanged --
  1296. independently of the current output character set. It is therefore
  1297. recommended that all @var{msgid}s be US-ASCII strings.
  1298. @deftypefun {char *} bind_textdomain_codeset (const char *@var{domainname}, const char *@var{codeset})
  1299. @standards{GNU, libintl.h}
  1300. @safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
  1301. @c bind_textdomain_codeset @ascuheap @acsmem
  1302. @c set_binding_values dup @ascuheap @acsmem
  1303. The @code{bind_textdomain_codeset} function can be used to specify the
  1304. output character set for message catalogs for domain @var{domainname}.
  1305. The @var{codeset} argument must be a valid codeset name which can be used
  1306. for the @code{iconv_open} function, or a null pointer.
  1307. If the @var{codeset} parameter is the null pointer,
  1308. @code{bind_textdomain_codeset} returns the currently selected codeset
  1309. for the domain with the name @var{domainname}. It returns @code{NULL} if
  1310. no codeset has yet been selected.
  1311. The @code{bind_textdomain_codeset} function can be used several times.
  1312. If used multiple times with the same @var{domainname} argument, the
  1313. later call overrides the settings made by the earlier one.
  1314. The @code{bind_textdomain_codeset} function returns a pointer to a
  1315. string containing the name of the selected codeset. The string is
  1316. allocated internally in the function and must not be changed by the
user. If the system ran out of memory during the execution of
@code{bind_textdomain_codeset}, the return value is @code{NULL} and the
global variable @code{errno} is set accordingly.
  1320. @end deftypefun
  1321. @node GUI program problems
  1322. @subsubsection How to use @code{gettext} in GUI programs
  1323. One place where the @code{gettext} functions, if used normally, have big
  1324. problems is within programs with graphical user interfaces (GUIs). The
  1325. problem is that many of the strings which have to be translated are very
short. They have to appear in pull-down menus, which restricts their
length. But strings which do not contain entire sentences, or at
least large fragments of a sentence, may appear in more than one
situation in the program and may need a different translation in each. This is
  1330. especially true for the one-word strings which are frequently used in
  1331. GUI programs.
  1332. As a consequence many people say that the @code{gettext} approach is
  1333. wrong and instead @code{catgets} should be used which indeed does not
have this problem. But there is a very simple and powerful method to
handle this kind of problem with the @code{gettext} functions.
  1336. @noindent
  1337. As an example consider the following fictional situation. A GUI program
  1338. has a menu bar with the following entries:
  1339. @smallexample
+------------+------------+--------------------------------------+
| File       | Printer    |                                      |
+------------+------------+--------------------------------------+
  | Open     | | Select   |
  | New      | | Open     |
  +----------+ | Connect  |
               +----------+
  1347. @end smallexample
  1348. To have the strings @code{File}, @code{Printer}, @code{Open},
  1349. @code{New}, @code{Select}, and @code{Connect} translated there has to be
  1350. at some point in the code a call to a function of the @code{gettext}
  1351. family. But in two places the string passed into the function would be
  1352. @code{Open}. The translations might not be the same and therefore we
  1353. are in the dilemma described above.
  1354. One solution to this problem is to artificially extend the strings
  1355. to make them unambiguous. But what would the program do if no
  1356. translation is available? The extended string is not what should be
  1357. printed. So we should use a slightly modified version of the functions.
  1358. To extend the strings a uniform method should be used. E.g., in the
  1359. example above, the strings could be chosen as
  1360. @smallexample
  1361. Menu|File
  1362. Menu|Printer
  1363. Menu|File|Open
  1364. Menu|File|New
  1365. Menu|Printer|Select
  1366. Menu|Printer|Open
  1367. Menu|Printer|Connect
  1368. @end smallexample
  1369. Now all the strings are different and if now instead of @code{gettext}
  1370. the following little wrapper function is used, everything works just
  1371. fine:
  1372. @cindex sgettext
  1373. @smallexample
char *
sgettext (const char *msgid)
@{
  char *msgval = gettext (msgid);
  if (msgval == msgid)
    msgval = strrchr (msgid, '|') + 1;
  return msgval;
@}
  1382. @end smallexample
  1383. What this little function does is to recognize the case when no
  1384. translation is available. This can be done very efficiently by a
  1385. pointer comparison since the return value is the input value. If there
  1386. is no translation we know that the input string is in the format we used
  1387. for the Menu entries and therefore contains a @code{|} character. We
  1388. simply search for the last occurrence of this character and return a
  1389. pointer to the character following it. That's it!
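The wrapper relies on every untranslated string containing a @code{|}
character. A slightly more defensive variant of the stripping step,
which returns the string unchanged when no separator is present, might
look as follows. The helper name @code{strip_context} is hypothetical;
it only covers the fallback step and does not call @code{gettext}.

```c
#include <string.h>

/* Return the part of msgid following the last '|', or msgid itself
   when it contains no separator at all.  */
static const char *
strip_context (const char *msgid)
{
  const char *sep = strrchr (msgid, '|');
  return sep != NULL ? sep + 1 : msgid;
}
```

Whether the extra check is worth it depends on the project: if the
extended form is used consistently, the simpler version is sufficient.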
  1390. If one now consistently uses the extended string form and replaces
  1391. the @code{gettext} calls with calls to @code{sgettext} (this is normally
  1392. limited to very few places in the GUI implementation) then it is
  1393. possible to produce a program which can be internationalized.
With advanced compilers (such as GNU C) one can write the
@code{sgettext} function as an inline function or as a macro like this:
  1396. @cindex sgettext
  1397. @smallexample
#define sgettext(msgid) \
  (@{ const char *__msgid = (msgid);            \
     char *__msgval = gettext (__msgid);       \
     if (__msgval == __msgid)                  \
       __msgval = strrchr (__msgid, '|') + 1;  \
     __msgval; @})
  1404. @end smallexample
  1405. The other @code{gettext} functions (@code{dgettext}, @code{dcgettext}
  1406. and the @code{ngettext} equivalents) can and should have corresponding
  1407. functions as well which look almost identical, except for the parameters
  1408. and the call to the underlying function.
Now there is of course the question why such functions do not exist in
@theglibc{}. There are two parts to the answer to this question.
  1411. @itemize @bullet
  1412. @item
  1413. They are easy to write and therefore can be provided by the project they
  1414. are used in. This is not an answer by itself and must be seen together
  1415. with the second part which is:
  1416. @item
  1417. There is no way the C library can contain a version which can work
  1418. everywhere. The problem is the selection of the character to separate
  1419. the prefix from the actual string in the extended string. The
  1420. examples above used @code{|} which is a quite good choice because it
  1421. resembles a notation frequently used in this context and it also is a
  1422. character not often used in message strings.
But what if the character is used in message strings? Or what if the chosen
character is not available in the character set on the machine one
compiles on (e.g., @code{|} is not required to exist for @w{ISO C}; this is
why the @file{iso646.h} file exists in @w{ISO C} programming environments)?
  1427. @end itemize
There is only one more comment left to make. The wrapper function above
requires that the translation strings are not extended themselves.
  1430. This is only logical. There is no need to disambiguate the strings
  1431. (since they are never used as keys for a search) and one also saves
  1432. quite some memory and disk space by doing this.
  1433. @node Using gettextized software
  1434. @subsubsection User influence on @code{gettext}
  1435. The last sections described what the programmer can do to
  1436. internationalize the messages of the program. But it is finally up to
  1437. the user to select the message s/he wants to see. S/He must understand
  1438. them.
  1439. The POSIX locale model uses the environment variables @code{LC_COLLATE},
  1440. @code{LC_CTYPE}, @code{LC_MESSAGES}, @code{LC_MONETARY}, @code{LC_NUMERIC},
  1441. and @code{LC_TIME} to select the locale which is to be used. This way
  1442. the user can influence lots of functions. As we mentioned above, the
  1443. @code{gettext} functions also take advantage of this.
  1444. To understand how this happens it is necessary to take a look at the
  1445. various components of the filename which gets computed to locate a
  1446. message catalog. It is composed as follows:
  1447. @smallexample
  1448. @var{dir_name}/@var{locale}/LC_@var{category}/@var{domain_name}.mo
  1449. @end smallexample
  1450. The default value for @var{dir_name} is system specific. It is computed
  1451. from the value given as the prefix while configuring the C library.
  1452. This value normally is @file{/usr} or @file{/}. For the former the
  1453. complete @var{dir_name} is:
  1454. @smallexample
  1455. /usr/share/locale
  1456. @end smallexample
  1457. We can use @file{/usr/share} since the @file{.mo} files containing the
  1458. message catalogs are system independent, so all systems can use the same
  1459. files. If the program executed the @code{bindtextdomain} function for
  1460. the message domain that is currently handled, the @code{dir_name}
  1461. component is exactly the value which was given to the function as
  1462. the second parameter. I.e., @code{bindtextdomain} allows overwriting
  1463. the only system dependent and fixed value to make it possible to
  1464. address files anywhere in the filesystem.
  1465. The @var{category} is the name of the locale category which was selected
  1466. in the program code. For @code{gettext} and @code{dgettext} this is
  1467. always @code{LC_MESSAGES}, for @code{dcgettext} this is selected by the
  1468. value of the third parameter. As said above it should be avoided to
  1469. ever use a category other than @code{LC_MESSAGES}.
The @var{locale} component is computed based on the category used. Just
like for the @code{setlocale} function, this is where the user selection
comes into play. Some environment variables are examined in a fixed order
and the first environment variable set determines the return value of
the lookup process. In detail, for the category @code{LC_xxx} the
following variables are examined in this order:
  1476. @table @code
  1477. @item LANGUAGE
  1478. @item LC_ALL
  1479. @item LC_xxx
  1480. @item LANG
  1481. @end table
  1482. This looks very familiar. With the exception of the @code{LANGUAGE}
  1483. environment variable this is exactly the lookup order the
  1484. @code{setlocale} function uses. But why introduce the @code{LANGUAGE}
  1485. variable?
The reason is that the syntax of the values these variables can have
differs from what the @code{setlocale} function expects. If we
set @code{LC_ALL} to a value following the extended syntax, the
@code{setlocale} function would not be able to use the
value of this variable. An additional variable removes this
problem, and it lets us select the language independently of the locale
setting, which is sometimes useful.
  1493. While for the @code{LC_xxx} variables the value should consist of
  1494. exactly one specification of a locale the @code{LANGUAGE} variable's
  1495. value can consist of a colon separated list of locale names. The
  1496. attentive reader will realize that this is the way we manage to
  1497. implement one of our additional demands above: we want to be able to
  1498. specify an ordered list of languages.
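Walking such a colon-separated list can be sketched as follows. The
function name @code{language_entry} and its interface are assumptions
made for this illustration; the sketch also treats an empty entry as
absent, which is a simplification and not a statement about what the
library does.

```c
#include <stddef.h>
#include <string.h>

/* Copy entry number `which' (counting from 0) of the colon-separated
   list into buf.  Returns 0 on success, -1 if there is no such entry,
   the entry is empty, or it does not fit into buf.  */
static int
language_entry (const char *list, size_t which, char *buf, size_t size)
{
  const char *start = list;

  for (;;)
    {
      /* Length of the current entry, up to the next ':' or the end.  */
      size_t len = strcspn (start, ":");

      if (which == 0)
        {
          if (len == 0 || len >= size)
            return -1;
          memcpy (buf, start, len);
          buf[len] = '\0';
          return 0;
        }
      if (start[len] == '\0')
        return -1;              /* List ended before entry `which'.  */
      start += len + 1;         /* Skip the entry and the colon.  */
      --which;
    }
}
```

For a value such as @code{de_AT:de:en}, entries 0, 1, and 2 are
@code{de_AT}, @code{de}, and @code{en}, in the order of preference.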
  1499. Back to the constructed filename we have only one component missing.
  1500. The @var{domain_name} part is the name which was either registered using
  1501. the @code{textdomain} function or which was given to @code{dgettext} or
  1502. @code{dcgettext} as the first parameter. Now it becomes obvious that a
  1503. good choice for the domain name in the program code is a string which is
  1504. closely related to the program/package name. E.g., for @theglibc{}
  1505. the domain name is @code{libc}.
  1506. @noindent
  1507. A limited piece of example code should show how the program is supposed
  1508. to work:
  1509. @smallexample
@{
  setlocale (LC_ALL, "");
  textdomain ("test-package");
  bindtextdomain ("test-package", "/usr/local/share/locale");
  puts (gettext ("Hello, world!"));
@}
  1516. @end smallexample
At the program start the default domain is @code{messages}, and the
default locale is @code{"C"}. The @code{setlocale} call sets the locale
  1519. according to the user's environment variables; remember that correct
  1520. functioning of @code{gettext} relies on the correct setting of the
  1521. @code{LC_MESSAGES} locale (for looking up the message catalog) and
  1522. of the @code{LC_CTYPE} locale (for the character set conversion).
  1523. The @code{textdomain} call changes the default domain to
  1524. @code{test-package}. The @code{bindtextdomain} call specifies that
  1525. the message catalogs for the domain @code{test-package} can be found
  1526. below the directory @file{/usr/local/share/locale}.
If the user sets the variable @code{LANGUAGE} in her/his environment
to @code{de}, the @code{gettext} function will try to use the
translations from the file
  1530. @smallexample
  1531. /usr/local/share/locale/de/LC_MESSAGES/test-package.mo
  1532. @end smallexample
  1533. From the above descriptions it should be clear which component of this
  1534. filename is determined by which source.
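The composition of the four components can be written down as a small
helper. The function name @code{catalog_path} is hypothetical, and the
sketch only assembles the string; it does not reproduce the fallback
search over several locale names that the library performs.

```c
#include <stdio.h>

/* Build "dir_name/locale/category/domain_name.mo" in buf, where
   category is the full category name such as "LC_MESSAGES".
   Returns 0 on success, -1 if the result does not fit into buf.  */
static int
catalog_path (char *buf, size_t size, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, size, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);
  return n >= 0 && (size_t) n < size ? 0 : -1;
}
```

Called with @file{/usr/local/share/locale}, @code{de},
@code{LC_MESSAGES}, and @code{test-package}, the helper produces the
filename shown above.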
  1535. In the above example we assumed the @code{LANGUAGE} environment
  1536. variable to be @code{de}. This might be an appropriate selection but what
  1537. happens if the user wants to use @code{LC_ALL} because of the wider
  1538. usability and here the required value is @code{de_DE.ISO-8859-1}? We
  1539. already mentioned above that a situation like this is not infrequent.
  1540. E.g., a person might prefer reading a dialect and if this is not
  1541. available fall back on the standard language.
The @code{gettext} functions know about situations like this and can
handle them gracefully. The functions recognize the format of the value
of the environment variable. They can split the value into different pieces
and, by leaving out one or the other part, construct new
values. This happens of course in a predictable way. To understand
this one must know the format of the environment variable value. There
is one more or less standardized form, originally from the X/Open
specification:
  1550. @code{language[_territory[.codeset]][@@modifier]}
  1551. Less specific locale names will be stripped in the order of the
  1552. following list:
  1553. @enumerate
  1554. @item
  1555. @code{codeset}
  1556. @item
  1557. @code{normalized codeset}
  1558. @item
  1559. @code{territory}
  1560. @item
  1561. @code{modifier}
  1562. @end enumerate
  1563. The @code{language} field will never be dropped for obvious reasons.
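Splitting a value of this form into its pieces might be sketched as
below. The structure, the function names, and the fixed field size of
32 bytes are assumptions chosen for the example. The sketch parses the
optional parts one after another, so it also accepts a codeset directly
after the language, which is slightly more than the form shown above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the pieces of a locale name.  Parts that
   are missing from the name are left as empty strings.  */
struct locale_parts
{
  char language[32];
  char territory[32];
  char codeset[32];
  char modifier[32];
};

/* Copy len bytes of src into a 32-byte field; fail if it is too long.  */
static int
copy_part (char *dst, const char *src, size_t len)
{
  if (len >= 32)
    return -1;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return 0;
}

/* Split name into its parts.  Returns 0 on success, -1 if the
   language is missing or a part is too long.  */
static int
parse_locale_name (const char *name, struct locale_parts *out)
{
  size_t len;

  memset (out, 0, sizeof *out);

  /* The language runs up to the first '_', '.' or '@'.  */
  len = strcspn (name, "_.@");
  if (len == 0 || copy_part (out->language, name, len) != 0)
    return -1;
  name += len;

  if (*name == '_')
    {
      ++name;
      len = strcspn (name, ".@");
      if (copy_part (out->territory, name, len) != 0)
        return -1;
      name += len;
    }
  if (*name == '.')
    {
      ++name;
      len = strcspn (name, "@");
      if (copy_part (out->codeset, name, len) != 0)
        return -1;
      name += len;
    }
  if (*name == '@')
    {
      ++name;
      if (copy_part (out->modifier, name, strlen (name)) != 0)
        return -1;
    }
  return 0;
}
```

Once the pieces are separated, less specific names can be built by
leaving individual parts out and joining the rest again.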
  1564. The only new thing is the @code{normalized codeset} entry. This is
  1565. another goodie which is introduced to help reduce the chaos which
  1566. derives from the inability of people to standardize the names of
  1567. character sets. Instead of @w{ISO-8859-1} one can often see @w{8859-1},
  1568. @w{88591}, @w{iso8859-1}, or @w{iso_8859-1}. The @code{normalized
  1569. codeset} value is generated from the user-provided character set name by
  1570. applying the following rules:
  1571. @enumerate
  1572. @item
  1573. Remove all characters besides numbers and letters.
  1574. @item
  1575. Fold letters to lowercase.
  1576. @item
If the name only contains digits, prepend the string @code{"iso"}.
  1578. @end enumerate
  1579. @noindent
  1580. So all of the above names will be normalized to @code{iso88591}. This
  1581. allows the program user much more freedom in choosing the locale name.
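The three normalization rules can be sketched in a few lines of C.
The function name @code{normalize_codeset}, the size of the temporary
buffer, and the decision to reject an empty result are assumptions of
this example, not a description of the library's internal code.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Write the normalized form of the codeset name into out.
   Returns 0 on success, -1 if the name has no letters or digits or
   the result does not fit.  */
static int
normalize_codeset (const char *name, char *out, size_t size)
{
  char tmp[64];
  size_t len = 0;
  int only_digits = 1;
  const char *prefix;

  for (; *name != '\0'; ++name)
    {
      unsigned char c = (unsigned char) *name;

      if (!isalnum (c))
        continue;               /* Rule 1: drop everything else.  */
      if (len + 1 >= sizeof tmp)
        return -1;
      if (!isdigit (c))
        only_digits = 0;
      tmp[len++] = (char) tolower (c);  /* Rule 2: fold to lowercase.  */
    }
  tmp[len] = '\0';
  if (len == 0)
    return -1;

  /* Rule 3: a name consisting only of digits gets the prefix "iso".  */
  prefix = only_digits ? "iso" : "";
  if (strlen (prefix) + len >= size)
    return -1;
  strcpy (out, prefix);
  strcat (out, tmp);
  return 0;
}
```

A name like @code{UTF-8} contains letters and therefore simply becomes
@code{utf8} without any prefix.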
  1582. Even this extended functionality still does not help to solve the
  1583. problem that completely different names can be used to denote the same
  1584. locale (e.g., @code{de} and @code{german}). To be of help in this
  1585. situation the locale implementation and also the @code{gettext}
  1586. functions know about aliases.
  1587. The file @file{/usr/share/locale/locale.alias} (replace @file{/usr} with
  1588. whatever prefix you used for configuring the C library) contains a
  1589. mapping of alternative names to more regular names. The system manager
  1590. is free to add new entries to fill her/his own needs. The selected
  1591. locale from the environment is compared with the entries in the first
  1592. column of this file ignoring the case. If they match, the value of the
  1593. second column is used instead for the further handling.
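The lookup described here amounts to a case-insensitive search in a
two-column table. The following sketch uses a small in-memory table
instead of reading the file; the table entries, the structure, and the
function names are hypothetical and serve only to illustrate the
matching step.

```c
#include <ctype.h>
#include <stddef.h>

struct alias_entry
{
  const char *alias;            /* First column of the alias file.  */
  const char *value;            /* Second column of the alias file.  */
};

/* Hypothetical entries for illustration; a real table would be read
   from the locale.alias file.  */
static const struct alias_entry alias_table[] =
{
  { "german", "de_DE.ISO-8859-1" },
  { "french", "fr_FR.ISO-8859-1" },
};

/* Compare two strings while ignoring the case of letters.  */
static int
equal_ignoring_case (const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
    {
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      ++a;
      ++b;
    }
  return *a == *b;
}

/* Return the replacement for name, or name itself if the table has
   no matching alias.  */
static const char *
resolve_alias (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof alias_table / sizeof alias_table[0]; ++i)
    if (equal_ignoring_case (name, alias_table[i].alias))
      return alias_table[i].value;
  return name;
}
```

With this table, both @code{german} and @code{German} resolve to the
same regular name, while a name without an alias is passed through
unchanged.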
  1594. In the description of the format of the environment variables we already
  1595. mentioned the character set as a factor in the selection of the message
  1596. catalog. In fact, only catalogs which contain text written using the
  1597. character set of the system/program can be used (directly; there will
  1598. come a solution for this some day). This means for the user that s/he
  1599. will always have to take care of this. If in the collection of the
  1600. message catalogs there are files for the same language but coded using
  1601. different character sets the user has to be careful.
  1602. @node Helper programs for gettext
  1603. @subsection Programs to handle message catalogs for @code{gettext}
  1604. @Theglibc{} does not contain the source code for the programs to
  1605. handle message catalogs for the @code{gettext} functions. As part of
  1606. the GNU project the GNU gettext package contains everything the
  1607. developer needs. The functionality provided by the tools in this
  1608. package by far exceeds the abilities of the @code{gencat} program
  1609. described above for the @code{catgets} functions.
  1610. There is a program @code{msgfmt} which is the equivalent program to the
  1611. @code{gencat} program. It generates from the human-readable and
  1612. -editable form of the message catalog a binary file which can be used by
  1613. the @code{gettext} functions. But there are several more programs
  1614. available.
  1615. The @code{xgettext} program can be used to automatically extract the
  1616. translatable messages from a source file. I.e., the programmer need not
  1617. take care of the translations and the list of messages which have to be
translated. S/He will simply wrap the translatable string in calls to
@code{gettext} et al.@: and the rest will be done by @code{xgettext}. This
  1620. program has a lot of options which help to customize the output or
  1621. help to understand the input better.
  1622. Other programs help to manage the development cycle when new messages appear
  1623. in the source files or when a new translation of the messages appears.
  1624. Here it should only be noted that using all the tools in GNU gettext it
  1625. is possible to @emph{completely} automate the handling of message
  1626. catalogs. Besides marking the translatable strings in the source code and
  1627. generating the translations the developers do not have anything to do
  1628. themselves.