class ICU::Converter

Overview

Codepage Conversion

Convert text between Unicode and various legacy character encodings (codepages).

ICU::Converter wraps a single ICU UConverter, which represents one specific encoding (e.g. "ISO-8859-1", "Shift_JIS", "UTF-16BE"). It exposes the two core conversion operations: encoding a String to raw bytes in the target encoding, and decoding raw bytes back to a String (which Crystal always stores as UTF-8).

ICU::ConverterSelector wraps UConverterSelector and lets you quickly determine which converters from a given list are able to round-trip a particular string without data loss.

Usage

cnv = ICU::Converter.new("ISO-8859-1")
cnv.name # => "ISO-8859-1"
cnv.type # => ICU::Converter::Type::Latin1

# Encode UTF-8 string to ISO-8859-1 bytes
bytes = cnv.encode("Héllo")

# Decode ISO-8859-1 bytes back to a UTF-8 String
str = cnv.decode(bytes) # => "Héllo"

# One-shot conversion between two encodings (here: the ISO-8859-1 bytes back to UTF-8)
result = ICU::Converter.convert("ISO-8859-1", "UTF-8", bytes)

Defined in:

icu/converter.cr

Constructor Detail

def self.new(name : String) #

Opens a converter for the named encoding.

The name may be a canonical name, an alias, or a MIME/IANA name as recognized by ICU (e.g. "UTF-8", "ISO-8859-1", "windows-1252").

Raises ICU::Error if the encoding is unknown.

ICU::Converter.new("UTF-8")
ICU::Converter.new("Shift_JIS")
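
Because the constructor raises on unknown names, code that accepts encoding names from user input may want to turn that error into a nil result. The sketch below is one way to do so; the `open_converter?` helper is hypothetical (not part of the library), and `require "icu"` is an assumption about how the shard is loaded.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

# Hypothetical helper (not part of the library): returns a converter,
# or nil when ICU does not recognize the encoding name.
def open_converter?(name : String) : ICU::Converter?
  ICU::Converter.new(name)
rescue ICU::Error
  nil
end

if cnv = open_converter?("windows-1252")
  puts "opened #{cnv.name}"
end

puts "unknown encoding" if open_converter?("no-such-encoding").nil?
```

Returning nil keeps the call site free of begin/rescue blocks; callers that prefer an exception can keep calling ICU::Converter.new directly.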

[View source]

Class Method Detail

def self.aliases(name : String) : Array(String) #

Returns the list of known aliases for the given encoding name.

ICU::Converter.aliases("UTF-8") # => ["UTF-8", "unicode-1-1-utf-8", "utf8", ...]

[View source]
def self.available_names : Array(String) #

Returns the list of all available converter names.

The list is computed once and cached.

ICU::Converter.available_names.includes?("UTF-8") # => true
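
As a rough illustration, the cached list can be used to check a name up front or to enumerate a family of converters. This sketch assumes the shard is loaded with `require "icu"`; the prefix filter is only an example.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

names = ICU::Converter.available_names

# Check a name before trying to open a converter for it.
wanted = "UTF-8"
if names.includes?(wanted)
  puts "#{wanted} is available"
else
  puts "#{wanted} is missing"
end

# Print every available converter whose name starts with "UTF-".
names.select(&.starts_with?("UTF-")).each { |name| puts name }
```

Note that `includes?` is an exact string comparison, so an alias that ICU would accept in ICU::Converter.new may not appear in this list.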

[View source]
def self.canonical_name(alias_name : String, standard : String = "IANA") : String | Nil #

Returns the canonical name for the given alias under the given standard (e.g. "IANA" or "MIME"), or nil if the alias is not known for that standard.

ICU::Converter.canonical_name("utf8", "IANA") # => "UTF-8"
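
Since the return type is nilable, callers need to handle the missing case. A minimal sketch, assuming nil is returned for an unrecognized alias (as the signature suggests) and that the shard is loaded with `require "icu"`; the `iana_name` helper is hypothetical.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

# Hypothetical helper: resolves a label to its IANA name,
# falling back to the label itself when no canonical name is found.
def iana_name(label : String) : String
  ICU::Converter.canonical_name(label, "IANA") || label
end

puts iana_name("utf8")
puts iana_name("not-a-real-charset")
```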

[View source]
def self.convert(from_encoding : String, to_encoding : String, source : Bytes) : Bytes #

Converts source bytes directly from from_encoding to to_encoding without instantiating two separate converters.

Returns the converted bytes.

utf8_bytes = "Héllo".encode("UTF-8")
latin1 = ICU::Converter.convert("UTF-8", "ISO-8859-1", utf8_bytes)
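
The argument order (source encoding first, target encoding second) is easy to get wrong, so a round trip can serve as a quick sanity check. The following is a sketch under the assumption that the shard is loaded with `require "icu"`.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

original = "Héllo"
utf8 = original.to_slice # Crystal strings are already UTF-8

# UTF-8 -> ISO-8859-1, then ISO-8859-1 -> UTF-8.
latin1 = ICU::Converter.convert("UTF-8", "ISO-8859-1", utf8)
back = ICU::Converter.convert("ISO-8859-1", "UTF-8", latin1)

puts "UTF-8 size: #{utf8.size}, ISO-8859-1 size: #{latin1.size}"
puts String.new(back) == original
```

The text used here fits in ISO-8859-1; characters outside the target encoding would not survive such a round trip.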

[View source]
def self.default_name : String #

Returns the default converter name for the current platform/locale.

ICU::Converter.default_name # => "UTF-8"

[View source]

Instance Method Detail

def ambiguous? : Bool #

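Returns true if the converter contains ambiguous character mappings, where the same byte sequence can stand for different characters depending on platform or locale conventions (for example, byte 0x5C in Shift_JIS, which may be read as either a backslash or a yen sign).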
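
A short sketch of how the flag might be inspected for a few encodings, assuming the shard is loaded with `require "icu"`. The encoding names are only examples, and the results depend on the ICU data installed.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

%w(UTF-8 ISO-8859-1 Shift_JIS).each do |name|
  cnv = ICU::Converter.new(name)
  puts "#{name}: ambiguous? = #{cnv.ambiguous?}"
end
```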


[View source]
def decode(bytes : Bytes) : String #

Decodes the given bytes (in this converter's encoding) to a UTF-8 String.

cnv = ICU::Converter.new("ISO-8859-1")
bytes = cnv.encode("Héllo")
cnv.decode(bytes) # => "Héllo"

[View source]
def encode(string : String) : Bytes #

Encodes the given UTF-8 string to bytes using this converter's encoding.

cnv = ICU::Converter.new("ISO-8859-1")
cnv.encode("Héllo") # => Bytes[...]
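
To see how the same text is laid out in different encodings, the encoded bytes can be printed in hexadecimal and decoded again. This is a sketch that assumes the shard is loaded with `require "icu"`; the encoding names are illustrative.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

text = "Héllo"

%w(UTF-8 ISO-8859-1 UTF-16BE).each do |name|
  cnv = ICU::Converter.new(name)
  bytes = cnv.encode(text)
  # Decoding with the same converter should give the original text back.
  round_trip = cnv.decode(bytes) == text
  puts "#{name}: #{bytes.size} bytes (#{bytes.hexstring}), round trip ok: #{round_trip}"
end
```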

[View source]
def finalize #

[View source]
def fixed_width? : Bool #

Returns true if every character in this encoding is encoded with the same number of bytes.


[View source]
def max_char_size : Int32 #

Returns the maximum number of bytes used per character.


[View source]
def min_char_size : Int32 #

Returns the minimum number of bytes used per character.
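
The size-related accessors (fixed_width?, max_char_size and min_char_size) are easiest to compare side by side. A minimal sketch, assuming the shard is loaded with `require "icu"`; the values printed come from ICU and are not asserted here.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

%w(ISO-8859-1 UTF-8 UTF-16BE Shift_JIS).each do |name|
  cnv = ICU::Converter.new(name)
  puts "#{name}: min=#{cnv.min_char_size} max=#{cnv.max_char_size} fixed_width=#{cnv.fixed_width?}"
end
```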


[View source]
def name : String #

Returns the canonical name of this converter.

ICU::Converter.new("latin-1").name # => "ISO-8859-1"
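
Because aliases resolve to one canonical name, comparing names is one way to test whether two labels refer to the same encoding. The `same_encoding?` helper below is hypothetical, and the sketch assumes the shard is loaded with `require "icu"`.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

# Hypothetical helper: two labels name the same encoding when ICU
# resolves both to the same canonical converter name.
def same_encoding?(a : String, b : String) : Bool
  ICU::Converter.new(a).name == ICU::Converter.new(b).name
end

puts same_encoding?("latin-1", "ISO-8859-1")
puts same_encoding?("UTF-8", "ISO-8859-1")
```

The helper opens two converters per call, which is fine for occasional checks; it raises ICU::Error if either label is unknown.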

[View source]
def reset : self #

Resets the converter to its initial state, discarding any partial conversion state.
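
When one converter is reused for several unrelated inputs, resetting between them guards against state left over from an earlier conversion. A sketch, assuming the shard is loaded with `require "icu"`; the sample strings are arbitrary.

```crystal
require "icu" # assumption: the shard providing ICU::Converter is required under this name

cnv = ICU::Converter.new("Shift_JIS")
samples = ["日本語", "テキスト"]
chunks = samples.map { |s| cnv.encode(s) }

chunks.each do |chunk|
  # reset returns the converter itself, so the call can be chained.
  puts cnv.reset.decode(chunk)
end
```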


[View source]
def to_unsafe : LibICU::UConverter #

[View source]
def type : Type #

Returns the type of this converter.

ICU::Converter.new("UTF-8").type # => ICU::Converter::Type::Utf8

[View source]