.\" Automatically generated by Pod::Man 2.22 (Pod::Simple 3.07)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
.    de IX
.    tm Index:\\$1\t\\n%\t"\\$2"
..
.    nr % 0
.    rr F
.\}
.el \{\
.    de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear.  Run.  Save yourself.  No user-serviceable parts.
.    \" fudge factors for nroff and troff
.if n \{\
.    ds #H 0
.    ds #V .8m
.    ds #F .3m
.    ds #[ \f1
.    ds #] \fP
.\}
.if t \{\
.    ds #H ((1u-(\\\\n(.fu%2u))*.13m)
.    ds #V .6m
.    ds #F 0
.    ds #[ \&
.    ds #] \&
.\}
.    \" simple accents for nroff and troff
.if n \{\
.    ds ' \&
.    ds ` \&
.    ds ^ \&
.    ds , \&
.    ds ~ ~
.    ds /
.\}
.if t \{\
.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
.    ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
.    ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
.    ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
.    ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
.    ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
.    \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
.    \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
.    \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
.    ds : e
.    ds 8 ss
.    ds o a
.    ds d- d\h'-1'\(ga
.    ds D- D\h'-1'\(hy
.    ds th \o'bp'
.    ds Th \o'LP'
.    ds ae ae
.    ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Unicode::String 3pm"
.TH Unicode::String 3pm "2003-03-11" "perl v5.10.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Unicode::String \- String of Unicode characters (UCS2/UTF16)
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 3
\& use Unicode::String qw(utf8 latin1 utf16);
\& $u = utf8("The Unicode Standard is a fixed\-width, uniform ");
\& $u .= utf8("encoding scheme for written characters and text");
\&
\& # convert to various external formats
\& print $u\->ucs4;      # 4 byte characters
\& print $u\->utf16;     # 2 byte characters + surrogates
\& print $u\->utf8;      # 1\-4 byte characters
\& print $u\->utf7;      # 7\-bit clean format
\& print $u\->latin1;    # lossy
\& print $u\->hex;       # a hexadecimal string
\&
\& # all these can be used to set string value or as constructor
\& $u\->latin1("A\*o v\*(aere eller a\*o ikke v\*(aere");
\& $u = utf16("\e0A\*o\e0 \e0v\e0\*(ae\e0r\e0e");
\&
\& # string operations
\& $u2 = $u\->copy;
\& $u\->append($u2);
\& $u\->repeat(2);
\& $u\->chop;
\&
\& $u\->length;
\& $u\->index($other);
\& $u\->index($other, $pos);
\&
\& $u\->substr($offset);
\& $u\->substr($offset, $length);
\& $u\->substr($offset, $length, $substitute);
\&
\& # overloading
\& $u .= "more";
\& $u = $u x 100;
\& print "$u\en";
\&
\& # string <\-\-> array of numbers
\& @array = $u\->unpack;
\& $u\->pack(@array);
\&
\& # misc
\& $u\->ord;
\& $u = uchr($num);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
A \fIUnicode::String\fR object represents a sequence of Unicode
characters.  The Unicode Standard is a fixed-width, uniform encoding
scheme for written characters and text.  This encoding treats
alphabetic characters, ideographic characters, and symbols
identically, which means that they can be used in any mixture and with
equal facility.  Unicode is modeled on the \s-1ASCII\s0 character set, but
uses a 16\-bit encoding to support full multilingual text.
.PP
Internally a \fIUnicode::String\fR object is a string of 2 byte values in
network byte order (big-endian).  The class provides various methods to
convert from and to various external formats, and all string
manipulations are made on strings in this internal 16\-bit format.
.PP
The functions \fIutf16()\fR, \fIutf8()\fR, \fIutf7()\fR, \fIucs2()\fR, \fIucs4()\fR, \fIlatin1()\fR, and
\&\fIuchr()\fR can be imported from the \fIUnicode::String\fR module and will
work as constructors initializing strings of the corresponding
encoding.  The names \fIucs2()\fR and \fIutf16()\fR are aliases for the same
function.
.PP
The \fIUnicode::String\fR objects overload various operators, so they
will normally work like plain 8\-bit strings in Perl.  This includes
conversions to strings, numbers and booleans as well as assignment,
concatenation and repetition.
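.PP
As a minimal sketch of how the constructors and overloaded operators
described above fit together (the variable names and sample text are
illustrative, and the input is assumed to be valid Latin\-1):
.PP
.Vb 2
\& use Unicode::String qw(latin1);
\& my $u = latin1("caf\exE9");          # four characters, given as Latin\-1 bytes
\&
\& $u .= " au lait";                   # overloaded append; plain strings go through new()
\& print $u\->length, "\en";            # length in 16\-bit units
\& print $u\->utf8, "\en";              # the same text in UTF\-8
.Ve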
.SH "METHODS"
.IX Header "METHODS"
The following methods are available:
.IP "Unicode::String\->stringify_as( [$enc] )" 4
.IX Item "Unicode::String->stringify_as( [$enc] )"
This class method specifies which encoding will be used when
\&\fIUnicode::String\fR objects are implicitly converted to and from plain
strings.  It also defines which encoding to assume for the argument of the
\&\fIUnicode::String\fR constructor \fInew()\fR.  Without an encoding argument,
\&\fIstringify_as()\fR returns the current encoding constructor function.  The
encoding argument ($enc) is a string with one of the following values:
\&\*(L"ucs4\*(R", \*(L"ucs2\*(R", \*(L"utf16\*(R", \*(L"utf8\*(R", \*(L"utf7\*(R", \*(L"latin1\*(R", \*(L"hex\*(R".  The default
is \*(L"utf8\*(R".
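.Sp
A minimal sketch of switching the implicit encoding, assuming the byte
string passed to \fInew()\fR really is Latin\-1 (the sample byte is
arbitrary):
.Sp
.Vb 3
\& use Unicode::String;
\& Unicode::String\->stringify_as("latin1");    # affects new() and stringification
\& my $us = Unicode::String\->new("\exE5");      # decoded as Latin\-1, not UTF\-8
\&
\& print "$us";                                 # stringified as Latin\-1 again
\& Unicode::String\->stringify_as("utf8");      # restore the documented default
.Ve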
.ie n .IP "$us = Unicode::String\->new( [$initial_value] )" 4
.el .IP "\f(CW$us\fR = Unicode::String\->new( [$initial_value] )" 4
.IX Item "$us = Unicode::String->new( [$initial_value] )"
This is the customary object constructor.  Without an argument, it
creates an empty \fIUnicode::String\fR object.  If an \f(CW$initial_value\fR
argument is given, it is decoded according to the specified
\&\fIstringify_as()\fR encoding and used to initialize the newly created
object.
.Sp
Normally you create \fIUnicode::String\fR objects by importing some of
the encoding methods below as functions into your namespace and
calling them with an appropriately encoded argument.
.ie n .IP "$us\->ucs4( [$newval] )" 4
.el .IP "\f(CW$us\fR\->ucs4( [$newval] )" 4
.IX Item "$us->ucs4( [$newval] )"
The \s-1UCS\-4\s0 encoding uses 32 bits per character.  The main benefit of this
encoding is that you don't have to deal with surrogate pairs.  Encoded
as a Perl string, it uses 4 bytes in network byte order for each
character.
.Sp
The \fIucs4()\fR method always returns the old value of \f(CW$us\fR and, if given an
argument, decodes the \s-1UCS\-4\s0 string and sets this as the new value of \f(CW$us\fR.
The characters in \f(CW$newval\fR must be in the range 0x0 .. 0x10FFFF.
Characters outside this range are ignored.
.ie n .IP "$us\->ucs2( [$newval] )" 4
.el .IP "\f(CW$us\fR\->ucs2( [$newval] )" 4
.IX Item "$us->ucs2( [$newval] )"
.PD 0
.ie n .IP "$us\->utf16( [$newval] )" 4
.el .IP "\f(CW$us\fR\->utf16( [$newval] )" 4
.IX Item "$us->utf16( [$newval] )"
.PD
The \fIucs2()\fR and \fIutf16()\fR are really just different names for the same
method.  The \s-1UCS\-2\s0 encoding uses 16 bits per character.  The \s-1UTF\-16\s0
encoding is identical to \s-1UCS\-2\s0, but includes the use of surrogate
pairs.  Surrogates make it possible to encode characters in the range
0x010000 .. 0x10FFFF with the use of two consecutive 16\-bit chars.
Encoded as a Perl string, it uses 2 bytes in network byte order for each
character (or surrogate code).
.Sp
The \fIucs2()\fR method always returns the old value of \f(CW$us\fR and, if given an
argument, sets this as the new value of \f(CW$us\fR.
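.Sp
The surrogate arithmetic can be sketched in plain Perl.  The code point
below is only an example, and the last line assumes that \fIuchr()\fR and
\&\fIunpack()\fR behave as described elsewhere in this manual page:
.Sp
.Vb 4
\& use Unicode::String qw(uchr);
\& my $code = 0x1F600;                             # any value in 0x010000 .. 0x10FFFF
\& my $hi = 0xD800 + (($code \- 0x10000) >> 10);    # high surrogate, here 0xD83D
\& my $lo = 0xDC00 + (($code \- 0x10000) & 0x3FF);  # low surrogate, here 0xDE00
\&
\& my @units = uchr($code)\->unpack;               # expected to hold ($hi, $lo)
.Ve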
.ie n .IP "$us\->utf8( [$newval] )" 4
.el .IP "\f(CW$us\fR\->utf8( [$newval] )" 4
.IX Item "$us->utf8( [$newval] )"
The \s-1UTF\-8\s0 encoding uses 8 bits for the encoding of characters in the
range 0x0 .. 0x7F, 16 bits for the encoding of characters in the range
0x80 .. 0x7FF, 24 bits for the encoding of characters in the range
0x800 .. 0xFFFF and 32 bits for characters in the range 0x010000
\&.. 0x10FFFF.  Americans like this encoding, because plain US-ASCII
characters are still US-ASCII.  Another benefit is that the character
\&'\e0' only occurs as the encoding of 0x0, thus the normal
NUL-terminated strings (popular in the C programming language) can
still be used.
.Sp
The \fIutf8()\fR method always returns the old value of \f(CW$us\fR encoded using
\&\s-1UTF\-8\s0 and, if given an argument, decodes the \s-1UTF\-8\s0 string and sets this as
the new value of \f(CW$us\fR.
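.Sp
A small sketch that reports the encoded length for one sample code point
from each of the four ranges above.  The code points are arbitrary picks,
and the value returned by \fIutf8()\fR is assumed to be a byte string, so that
Perl's builtin length counts bytes:
.Sp
.Vb 5
\& use Unicode::String qw(uchr);
\& for my $code (0x41, 0xE5, 0x20AC, 0x10000) {
\&     my $bytes = uchr($code)\->utf8;             # UTF\-8 encoding of one character
\&     printf "U+%04X: %d byte(s)\en", $code, length $bytes;
\& }
.Ve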
.ie n .IP "$us\->utf7( [$newval] )" 4
.el .IP "\f(CW$us\fR\->utf7( [$newval] )" 4
.IX Item "$us->utf7( [$newval] )"
The \s-1UTF\-7\s0 encoding uses only plain US-ASCII characters for the
encoding.  This makes it safe for transport through 8\-bit stripping
protocols.  Characters outside the US-ASCII range are base64\-encoded
and '+' is used as an escape character.  The \s-1UTF\-7\s0 encoding is
described in \s-1RFC1642\s0.
.Sp
The \fIutf7()\fR method always returns the old value of \f(CW$us\fR encoded using
\&\s-1UTF\-7\s0 and, if given an argument, decodes the \s-1UTF\-7\s0 string and sets this as
the new value of \f(CW$us\fR.
.Sp
If the (global) variable \f(CW$Unicode::String::UTF7_OPTIONAL_DIRECT_CHARS\fR
is \s-1TRUE\s0, then a wider range of characters are encoded as themselves.
It is \s-1TRUE\s0 by default.  The characters affected by this are:
.Sp
.Vb 1
\&   ! " # $ % & * ; < = > @ [ ] ^ _ \` { | }
.Ve
.ie n .IP "$us\->latin1( [$newval] )" 4
.el .IP "\f(CW$us\fR\->latin1( [$newval] )" 4
.IX Item "$us->latin1( [$newval] )"
The first 256 codes of Unicode are identical to the \s-1ISO\-8859\-1\s0 8\-bit
encoding, also known as Latin\-1.  The \fIlatin1()\fR method always returns
the old value of \f(CW$us\fR and, if given an argument, sets this as the new
value of \f(CW$us\fR.  Characters outside the 0x0 .. 0xFF range are ignored
when returning a Latin\-1 string.  If you want more control over the
mapping from Unicode to Latin\-1, use the \fIUnicode::Map8\fR class.  This
is also the way to deal with other 8\-bit character sets.
.ie n .IP "$us\->hex( [$newval] )" 4
.el .IP "\f(CW$us\fR\->hex( [$newval] )" 4
.IX Item "$us->hex( [$newval] )"
This method returns a plain \s-1ASCII\s0 string where each Unicode character
is represented as a \*(L"U+XXXX\*(R" string, with characters separated by a single
space.  This format can also be used to set the value of \f(CW$us\fR (in
which case the \*(L"U+\*(R" is optional).
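.Sp
A minimal sketch of reading and setting a value through this format.  The
sample text is arbitrary, and the letter case of the hexadecimal digits in
the returned string is not specified here:
.Sp
.Vb 4
\& use Unicode::String qw(latin1);
\& my $us = latin1("Hi");
\& my $dump = $us\->hex;                         # one "U+XXXX" group per character
\& $us\->hex("0048 0065 006C 006C 006F");        # "U+" omitted; the codes spell "Hello"
.Ve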
.ie n .IP "$us\->as_string;" 4
.el .IP "\f(CW$us\fR\->as_string;" 4
.IX Item "$us->as_string;"
Converts a \fIUnicode::String\fR to a plain string according to the
setting of \fIstringify_as()\fR.  The default \fIstringify_as()\fR encoding is
\&\*(L"utf8\*(R".
.ie n .IP "$us\->as_num;" 4
.el .IP "\f(CW$us\fR\->as_num;" 4
.IX Item "$us->as_num;"
Converts a \fIUnicode::String\fR to a number.  Currently only the digits
in the range 0x30 .. 0x39 are recognized.  The plan is to eventually
support all Unicode digit characters.
.ie n .IP "$us\->as_bool;" 4
.el .IP "\f(CW$us\fR\->as_bool;" 4
.IX Item "$us->as_bool;"
Converts a \fIUnicode::String\fR to a boolean value.  Only the empty
string is \s-1FALSE\s0.  A string consisting of only the character U+0030 is
considered \s-1TRUE\s0, even though Perl considers \*(L"0\*(R" to be \s-1FALSE\s0.
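.Sp
For instance, assuming the boolean overload mentioned in \s-1DESCRIPTION\s0
is implemented by this method:
.Sp
.Vb 4
\& use Unicode::String qw(utf8);
\& my $zero = utf8("0");
\& print "object is true\en" if $zero;           # only the empty string is false
\& print "plain string is false\en" unless "0";  # ordinary Perl rule for "0"
.Ve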
.ie n .IP "$us\->repeat( $count );" 4
.el .IP "\f(CW$us\fR\->repeat( \f(CW$count\fR );" 4
.IX Item "$us->repeat( $count );"
Returns a new \fIUnicode::String\fR where the content of \f(CW$us\fR is repeated
\&\f(CW$count\fR times.  This operation is also overloaded as:
.Sp
.Vb 1
\&  $us x $count
.Ve
.ie n .IP "$us\->concat( $other_string );" 4
.el .IP "\f(CW$us\fR\->concat( \f(CW$other_string\fR );" 4
.IX Item "$us->concat( $other_string );"
Concatenates the string \f(CW$us\fR and the string \f(CW$other_string\fR.  If
\&\f(CW$other_string\fR is not a \fIUnicode::String\fR object, then it is first
passed to the Unicode::String\->new constructor function.  This
operation is also overloaded as:
.Sp
.Vb 1
\&  $us . $other_string
.Ve
.ie n .IP "$us\->append( $other_string );" 4
.el .IP "\f(CW$us\fR\->append( \f(CW$other_string\fR );" 4
.IX Item "$us->append( $other_string );"
Appends the string \f(CW$other_string\fR to the value of \f(CW$us\fR.  If
\&\f(CW$other_string\fR is not a \fIUnicode::String\fR object, then it is first
passed to the Unicode::String\->new constructor function.  This
operation is also overloaded as:
.Sp
.Vb 1
\&  $us .= $other_string
.Ve
.ie n .IP "$us\->copy;" 4
.el .IP "\f(CW$us\fR\->copy;" 4
.IX Item "$us->copy;"
Returns a copy of the current \fIUnicode::String\fR object.  This
operation is overloaded as the assignment operator.
.ie n .IP "$us\->length;" 4
.el .IP "\f(CW$us\fR\->length;" 4
.IX Item "$us->length;"
Returns the length of the \fIUnicode::String\fR.  Surrogate pairs are
still counted as 2.
.ie n .IP "$us\->byteswap;" 4
.el .IP "\f(CW$us\fR\->byteswap;" 4
.IX Item "$us->byteswap;"
This method will swap the bytes in the internal representation of the
\&\fIUnicode::String\fR object.
.Sp
Unicode reserves the character U+FEFF as a byte order mark.
This works because the swapped character, U+FFFE, is guaranteed never
to be a valid character.  For strings that have the byte order mark as
the first character, we can guarantee to get the byte order right with the
following code:
.Sp
.Vb 1
\&   $ustr\->byteswap if $ustr\->ord == 0xFFFE;
.Ve
.ie n .IP "$us\->unpack;" 4
.el .IP "\f(CW$us\fR\->unpack;" 4
.IX Item "$us->unpack;"
Returns a list of integers, each representing a \s-1UTF\-16\s0 character code.
.ie n .IP "$us\->pack( @uchr );" 4
.el .IP "\f(CW$us\fR\->pack( \f(CW@uchr\fR );" 4
.IX Item "$us->pack( @uchr );"
Sets the value of \f(CW$us\fR as a sequence of \s-1UTF\-16\s0 characters with the
character codes given as parameters.
.ie n .IP "$us\->ord;" 4
.el .IP "\f(CW$us\fR\->ord;" 4
.IX Item "$us->ord;"
Returns the character code of the first character in \f(CW$us\fR.  The \fIord()\fR
method deals with surrogate pairs, which gives us a result-range of
0x0 .. 0x10FFFF.  If the \f(CW$us\fR string is empty, undef is returned.
.ie n .IP "$us\->chr( $code );" 4
.el .IP "\f(CW$us\fR\->chr( \f(CW$code\fR );" 4
.IX Item "$us->chr( $code );"
Sets the value of \f(CW$us\fR to be a string containing the character assigned
code \f(CW$code\fR.  The argument \f(CW$code\fR must be an integer in the range 0x0
\&.. 0x10FFFF.  If the code is greater than 0xFFFF then a surrogate pair
is created.
.ie n .IP "$us\->name" 4
.el .IP "\f(CW$us\fR\->name" 4
.IX Item "$us->name"
In scalar context returns the official Unicode name of the first
character in \f(CW$us\fR.  In array context returns the names of all characters
in \f(CW$us\fR.  Also see Unicode::CharName.
.ie n .IP "$us\->substr( $offset, [$length, [$subst]] )" 4
.el .IP "\f(CW$us\fR\->substr( \f(CW$offset\fR, [$length, [$subst]] )" 4
.IX Item "$us->substr( $offset, [$length, [$subst]] )"
Returns a sub-string of \f(CW$us\fR.  Works similarly to the builtin substr
function, but because we can't make \s-1LVALUE\s0 subs yet, you have to pass
the string you want to assign to the sub-string as the 3rd parameter.
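.Sp
A short sketch of both forms.  The sample text is arbitrary, and offsets
are assumed to count 16\-bit units in the same way as \fIlength()\fR:
.Sp
.Vb 4
\& use Unicode::String qw(utf8);
\& my $us = utf8("Hello world");
\& my $word = $us\->substr(0, 5);                # sub\-string covering "Hello"
\& $us\->substr(0, 5, utf8("Howdy"));            # 3rd parameter replaces that range
.Ve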
.ie n .IP "$us\->index( $other, [$pos] );" 4
.el .IP "\f(CW$us\fR\->index( \f(CW$other\fR, [$pos] );" 4
.IX Item "$us->index( $other, [$pos] );"
Locates the position of \f(CW$other\fR within \f(CW$us\fR, possibly starting the
search at position \f(CW$pos\fR.
.ie n .IP "$us\->chop;" 4
.el .IP "\f(CW$us\fR\->chop;" 4
.IX Item "$us->chop;"
Chops off the last character of \f(CW$us\fR and returns it (as a
\&\fIUnicode::String\fR object).
.SH "FUNCTIONS"
.IX Header "FUNCTIONS"
The following utility functions are provided.  They will be exported
on request.
.IP "byteswap2($str, ...)" 4
.IX Item "byteswap2($str, ...)"
This function will swap each pair of bytes in the strings passed as
arguments.  This can be used to fix up \s-1UTF\-16\s0 or \s-1UCS\-2\s0 strings from
little-endian systems.  If this function is called in void context,
then it will modify its arguments in-place.  Otherwise, the swapped
strings are returned.
.IP "byteswap4($str, ...)" 4
.IX Item "byteswap4($str, ...)"
The byteswap4 function works similarly to byteswap2, but will reverse
the byte order within each group of 4 bytes.  Can be used to fix
little-endian \s-1UCS\-4\s0 strings.
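.PP
A sketch of fixing up little-endian input before decoding it.  The byte
string is a made-up example without a byte order mark:
.PP
.Vb 3
\& use Unicode::String qw(utf16 byteswap2);
\& my $le = "H\e0i\e0";                           # "Hi" as little\-endian UTF\-16 bytes
\& my $us = utf16(byteswap2($le));               # swap to big\-endian, then decode
.Ve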
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Unicode::CharName,
Unicode::Map8,
http://www.unicode.org/
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright 1997\-2000 Gisle Aas.
.PP
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.