bytes.pm 3.0 KB

package bytes;

our $VERSION = '1.04';

# Bit in $^H that marks byte semantics as enabled for the code being compiled.
$bytes::hint_bits = 0x00000008;

# "use bytes" sets the hint bit for the enclosing lexical scope.
sub import {
    $^H |= $bytes::hint_bits;
}

# "no bytes" clears it again.
sub unimport {
    $^H &= ~$bytes::hint_bits;
}

# The function bodies live in bytes_heavy.pl and are loaded on first use.
sub AUTOLOAD {
    require "bytes_heavy.pl";
    goto &$AUTOLOAD if defined &$AUTOLOAD;
    require Carp;
    Carp::croak("Undefined subroutine $AUTOLOAD called");
}

# Forward declarations, so the prototypes are known before the bodies load.
sub length (_);
sub chr (_);
sub ord (_);
sub substr ($$;$$);
sub index ($$;$);
sub rindex ($$;$);

1;
__END__

=head1 NAME

bytes - Perl pragma to force byte semantics rather than character semantics

=head1 NOTICE

This pragma reflects early attempts to incorporate Unicode into perl and
has since been superseded. It breaks encapsulation (i.e. it exposes the
innards of how the perl executable currently happens to store a string),
and use of this module for anything other than debugging purposes is
strongly discouraged. If you feel that the functions herein might be
useful for your application, this possibly indicates a mismatch between
your mental model of Perl Unicode and the current reality. In that case,
you may wish to read some of the perl Unicode documentation:
L<perluniintro>, L<perlunitut>, L<perlunifaq> and L<perlunicode>.

=head1 SYNOPSIS

    use bytes;
    ... chr(...);       # or bytes::chr
    ... index(...);     # or bytes::index
    ... length(...);    # or bytes::length
    ... ord(...);       # or bytes::ord
    ... rindex(...);    # or bytes::rindex
    ... substr(...);    # or bytes::substr
    no bytes;

=head1 DESCRIPTION

The C<use bytes> pragma disables character semantics for the rest of the
lexical scope in which it appears. C<no bytes> can be used to reverse
the effect of C<use bytes> within the current lexical scope.

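A minimal sketch of the scoping described above. The variable names are illustrative only, and C<chr(400)> is chosen on the assumption that perl stores it as two bytes of UTF-8:

```perl
use strict;
use warnings;

my $x = chr(400);    # one character; assumed to occupy two bytes internally

my @lengths;
push @lengths, length $x;          # character semantics (the default)
{
    use bytes;
    push @lengths, length $x;      # byte semantics for the rest of this block
    {
        no bytes;
        push @lengths, length $x;  # character semantics inside the inner block
    }
    push @lengths, length $x;      # byte semantics again
}
push @lengths, length $x;          # the block has ended, and the pragma with it

print "@lengths\n";                # 1 2 1 2 1
```

Because the pragma is lexical, it does not leak into code called from inside the block; only the statements textually within the scope are affected.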
Perl normally assumes character semantics in the presence of character
data (i.e. data that has come from a source that has been marked as
being of a particular character encoding). When C<use bytes> is in
effect, the encoding is temporarily ignored, and each string is treated
as a series of bytes.

As an example, when Perl sees C<$x = chr(400)>, it encodes the character
in UTF-8 and stores it in C<$x>. Then it is marked as character data, so,
for instance, C<length $x> returns C<1>. However, in the scope of the
C<bytes> pragma, C<$x> is treated as a series of bytes - the bytes that make
up the UTF-8 encoding - and C<length $x> returns C<2>:

    $x = chr(400);
    print "Length is ", length $x, "\n";     # "Length is 1"
    printf "Contents are %vd\n", $x;         # "Contents are 400"
    {
        use bytes; # or "require bytes; bytes::length()"
        print "Length is ", length $x, "\n"; # "Length is 2"
        printf "Contents are %vd\n", $x;     # "Contents are 198.144"
    }

chr(), ord(), substr(), index() and rindex() behave similarly.

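As a hedged illustration of "behave similarly", the sketch below calls the C<bytes::> functions directly after C<require bytes>, so the pragma itself stays off. The variables C<$s> and C<$c> are made up for the example, and chr() is left out because its byte-semantics result is not spelled out in this document:

```perl
use strict;
use warnings;
require bytes;    # load the module at run time; byte semantics stay off

my $c = chr(400);
my $s = "a" . $c . "b";    # 3 characters; 4 bytes if chr(400) is stored as UTF-8

# Character semantics (the default).
my @chars = (
    length($s),       # 3
    ord($c),          # 400
    index($s, "b"),   # 2
    rindex($s, "b"),  # 2
);

# Byte semantics, requested per call through the bytes:: functions.
my @bytes = (
    bytes::length($s),       # 4
    bytes::ord($c),          # 198, the first byte of the encoding
    bytes::index($s, "b"),   # 3
    bytes::rindex($s, "b"),  # 3
);

# The two bytes in the middle of $s, formatted like the %vd example above.
my $middle = sprintf "%vd", bytes::substr($s, 1, 2);

print "chars:  @chars\n";    # chars:  3 400 2 2
print "bytes:  @bytes\n";    # bytes:  4 198 3 3
print "middle: $middle\n";   # middle: 198.144
```

The offsets differ only for positions after the multi-byte character, which is why index() and rindex() report 3 instead of 2 for the final "b".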
For more on the implications and differences between character
semantics and byte semantics, see L<perluniintro> and L<perlunicode>.

=head1 LIMITATIONS

bytes::substr() does not work as an lvalue.

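One possible workaround, offered as an assumption and not as documented behaviour, is the four-argument form that the C<($$;$$)> prototype appears to allow. The sketch uses plain ASCII data, where bytes and characters coincide:

```perl
use strict;
use warnings;
require bytes;

my $s = "hello";    # plain ASCII, so byte and character offsets agree

# bytes::substr($s, 0, 1) = "J";   # not possible: bytes::substr() is not an lvalue

# Assumed workaround: pass the replacement as a fourth argument instead.
my $removed = bytes::substr($s, 0, 1, "J");

print "$s (replaced '$removed')\n";    # Jello (replaced 'h')
```

On strings that hold characters above 255 this would overwrite part of the internal encoding and may leave malformed data, which is in keeping with the NOTICE section's advice to limit this module to debugging.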
=head1 SEE ALSO

L<perluniintro>, L<perlunicode>, L<utf8>

=cut