class ICU::Converter
- ICU::Converter
- Reference
- Object
Overview
Codepage Conversion
Convert text between Unicode and various legacy character encodings (codepages).
ICU::Converter wraps a single ICU UConverter, which encapsulates a
specific encoding (e.g. "ISO-8859-1", "Shift_JIS", "UTF-16BE").
It exposes the most important conversion operations: encoding a String
to raw bytes in the target encoding, and decoding raw bytes back to a
String (which Crystal always stores as UTF-8).
ICU::ConverterSelector wraps UConverterSelector and lets you quickly
determine which converters from a given list are able to round-trip a
particular string without data loss.
Usage
cnv = ICU::Converter.new("ISO-8859-1")
cnv.name # => "ISO-8859-1"
cnv.type # => ICU::Converter::Type::Latin1
# Encode UTF-8 string to ISO-8859-1 bytes
bytes = cnv.encode("Héllo")
# Decode ISO-8859-1 bytes back to a UTF-8 String
str = cnv.decode(bytes) # => "Héllo"
# One-shot conversion between two encodings (the ISO-8859-1 bytes back to UTF-8)
result = ICU::Converter.convert("ISO-8859-1", "UTF-8", bytes)
Defined in:
icu/converter.cr
Constructors
-
.new(name : String)
Opens a converter for the named encoding.
Class Method Summary
-
.aliases(name : String) : Array(String)
Returns the list of known aliases for the given encoding name.
-
.available_names : Array(String)
Returns the list of all available converter names.
-
.canonical_name(alias_name : String, standard : String = "IANA") : String | Nil
Returns the canonical name for the given alias and standard.
-
.convert(from_encoding : String, to_encoding : String, source : Bytes) : Bytes
Converts source bytes directly from from_encoding to to_encoding without instantiating two separate converters.
-
.default_name : String
Returns the default converter name for the current platform/locale.
Instance Method Summary
-
#ambiguous? : Bool
Returns true if the converter has an ambiguous character mapping (e.g. encodings where the same byte sequences mean different things depending on context or locale).
-
#decode(bytes : Bytes) : String
Decodes the given bytes (in this converter's encoding) to a UTF-8 String.
-
#encode(string : String) : Bytes
Encodes the given UTF-8 string to bytes using this converter's encoding.
-
#finalize
-
#fixed_width? : Bool
Returns true if every character in this encoding has a fixed byte width.
-
#max_char_size : Int32
Returns the maximum number of bytes used per character.
-
#min_char_size : Int32
Returns the minimum number of bytes used per character.
-
#name : String
Returns the canonical name of this converter.
-
#reset : self
Resets the converter to its initial state, discarding any partial conversion state.
-
#to_unsafe : LibICU::UConverter
-
#type : Type
Returns the type of this converter.
Constructor Detail
.new(name : String)
Opens a converter for the named encoding.
The name may be a canonical name, an alias, or a MIME/IANA name as
recognized by ICU (e.g. "UTF-8", "ISO-8859-1", "windows-1252").
Raises ICU::Error if the encoding is unknown.
ICU::Converter.new("UTF-8")
ICU::Converter.new("Shift_JIS")
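Because the constructor raises ICU::Error for an unknown encoding, callers that accept encoding names from user input may want to rescue it. The following is a minimal sketch: `open_converter?` is a hypothetical helper, not part of the library, and the `require "icu"` path is an assumption about how the shard is loaded.

```crystal
require "icu" # assumption: the shard is loaded under this name

# Hypothetical helper: returns a converter, or nil when ICU rejects the name.
def open_converter?(name : String) : ICU::Converter?
  ICU::Converter.new(name)
rescue ICU::Error
  nil
end

if cnv = open_converter?("windows-1252")
  puts "opened #{cnv.name}"
else
  puts "unknown encoding"
end
```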
Class Method Detail
.aliases(name : String) : Array(String)
Returns the list of known aliases for the given encoding name.
ICU::Converter.aliases("UTF-8") # => ["UTF-8", "unicode-1-1-utf-8", "utf8", ...]
.available_names : Array(String)
Returns the list of all available converter names.
The list is computed once and cached.
ICU::Converter.available_names.includes?("UTF-8") # => true
.canonical_name(alias_name : String, standard : String = "IANA") : String | Nil
Returns the canonical name for the given alias and standard.
ICU::Converter.canonical_name("utf8", "IANA") # => "UTF-8"
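Since the return type is `String | Nil`, the result needs a nil check before use. A minimal sketch, assuming the method returns nil (rather than raising) for a label ICU does not recognize; `normalize_encoding` is a hypothetical helper and the `require "icu"` path is an assumption.

```crystal
require "icu" # assumption: the shard is loaded under this name

# Hypothetical helper: maps a user-supplied label to its canonical name,
# falling back to the label itself when no canonical name is found.
# Uses the default standard ("IANA") from the signature above.
def normalize_encoding(label : String) : String
  ICU::Converter.canonical_name(label) || label
end

puts normalize_encoding("utf8")
puts normalize_encoding("not-a-real-charset")
```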
.convert(from_encoding : String, to_encoding : String, source : Bytes) : Bytes
Converts source bytes directly from from_encoding to to_encoding without instantiating two separate converters.
Returns the converted bytes.
utf8_bytes = "Héllo".encode("UTF-8")
latin1 = ICU::Converter.convert("UTF-8", "ISO-8859-1", utf8_bytes)
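The argument order matters: `source` must already be encoded in `from_encoding`. The sketch below converts in both directions and checks that the text survives; it uses only `.convert` as documented above plus Crystal's standard `String#to_slice` and `String.new(Bytes)`. The `require "icu"` path is an assumption.

```crystal
require "icu" # assumption: the shard is loaded under this name

original = "Héllo"

# UTF-8 -> ISO-8859-1: the source bytes are the string's own UTF-8 bytes.
latin1 = ICU::Converter.convert("UTF-8", "ISO-8859-1", original.to_slice)

# ISO-8859-1 -> UTF-8: the source is now the Latin-1 output from above.
utf8 = ICU::Converter.convert("ISO-8859-1", "UTF-8", latin1)

puts "UTF-8 bytes: #{original.bytesize}, ISO-8859-1 bytes: #{latin1.size}"
puts String.new(utf8) == original
```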
.default_name : String
Returns the default converter name for the current platform/locale.
ICU::Converter.default_name # => "UTF-8"
Instance Method Detail
#ambiguous? : Bool
Returns true if the converter has an ambiguous character mapping (e.g.
encodings where the same byte sequences mean different things depending
on context or locale).
#decode(bytes : Bytes) : String
Decodes the given bytes (in this converter's encoding) to a UTF-8 String.
cnv = ICU::Converter.new("ISO-8859-1")
bytes = cnv.encode("Héllo")
cnv.decode(bytes) # => "Héllo"
#encode(string : String) : Bytes
Encodes the given UTF-8 string to bytes using this converter's encoding.
cnv = ICU::Converter.new("ISO-8859-1")
cnv.encode("Héllo") # => Bytes[...]
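Encoding followed by decoding gives a simple way to check whether a string survives a given encoding. This is a minimal sketch: `round_trips?` is a hypothetical helper, and because this reference does not say whether unmappable characters are substituted or raise ICU::Error, the helper treats both outcomes as a failed round trip. The `require "icu"` path is an assumption.

```crystal
require "icu" # assumption: the shard is loaded under this name

# Hypothetical helper: true when `text` survives encode + decode unchanged.
# An ICU::Error during conversion is treated as "does not round-trip".
def round_trips?(cnv : ICU::Converter, text : String) : Bool
  cnv.decode(cnv.encode(text)) == text
rescue ICU::Error
  false
end

latin1 = ICU::Converter.new("ISO-8859-1")
puts round_trips?(latin1, "Héllo")  # every character exists in ISO-8859-1
puts round_trips?(latin1, "日本語") # none of these characters exist in ISO-8859-1
```

For checking many candidate encodings at once, ICU::ConverterSelector (mentioned in the overview) is the purpose-built tool.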
#name : String
Returns the canonical name of this converter.
ICU::Converter.new("latin-1").name # => "ISO-8859-1"
#reset : self
Resets the converter to its initial state, discarding any partial conversion state.
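Because the method returns `self`, a reset can be chained directly in front of the next conversion when one converter is reused for unrelated inputs. A minimal sketch; whether `#encode` leaves state that needs resetting is not stated in this reference, so the reset here is a defensive assumption. The `require "icu"` path is also an assumption.

```crystal
require "icu" # assumption: the shard is loaded under this name

cnv = ICU::Converter.new("Shift_JIS")

["こんにちは", "テスト"].each do |text|
  # Start each unrelated input from a clean state; #reset returns self.
  bytes = cnv.reset.encode(text)
  puts "#{text}: #{bytes.size} bytes"
end
```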
#type : Type
Returns the type of this converter.
ICU::Converter.new("UTF-8").type # => ICU::Converter::Type::Utf8
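The introspection methods from the instance method summary (`#type`, `#fixed_width?`, `#min_char_size`, `#max_char_size`) can be combined to compare encodings. A minimal sketch: `describe` is a hypothetical helper, the `require "icu"` path is an assumption, and no output is shown because the exact values depend on the ICU build.

```crystal
require "icu" # assumption: the shard is loaded under this name

# Hypothetical helper: one-line summary of a converter's byte-width properties.
def describe(cnv : ICU::Converter) : String
  width = cnv.fixed_width? ? "fixed" : "variable"
  "#{cnv.name}: #{cnv.type}, #{width} width, " \
  "#{cnv.min_char_size}..#{cnv.max_char_size} bytes per character"
end

%w(ISO-8859-1 UTF-8 Shift_JIS).each do |name|
  puts describe(ICU::Converter.new(name))
end
```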