class ICU::Collator

Overview

Collation

This class performs locale-sensitive string comparison.

Sort ordering may be customized by providing your own set of rules (see CLDR root sort order).

Usage

ICU::Collator.new("en").compare("y", "k") # => 1
ICU::Collator.new("lt").compare("y", "k") # => -1

col = ICU::Collator.new(rules: "&c < b < a")
col.compare("a", "b") # => 1
col.compare("b", "c") # => 1
col.compare("d", "e") # => -1

NOTE: The #compare method requires ICU >= 50.


Defined in:

icu/collator.cr

Constant Summary

DEFAULT = AttributeValue::Default
KEYWORDS = begin
  keywords = Hash(String, Set(String)).new
  ustatus = LibICU::UErrorCode::UZeroError
  kenum = LibICU.ucol_get_keywords(pointerof(ustatus))
  ICU.check_error!(ustatus)
  (UEnum.new(kenum, owns: true)).each do |keyword|
    ustatus = LibICU::UErrorCode::UZeroError
    venum = LibICU.ucol_get_keyword_values(keyword, pointerof(ustatus))
    ICU.check_error!(ustatus)
    keywords[keyword] = Set(String).new((UEnum.new(venum, owns: true)).to_a)
  end
  keywords
end
LOCALES = begin
  ustatus = LibICU::UErrorCode::UZeroError
  uenum = LibICU.ucol_open_available_locales(pointerof(ustatus))
  ICU.check_error!(ustatus)
  Set(String).new((UEnum.new(uenum, owns: true)).to_a)
end
OFF = AttributeValue::Off
ON = AttributeValue::On
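
As the definitions above show, LOCALES is a Set(String) and KEYWORDS is a Hash(String, Set(String)), so both can be queried with ordinary lookups. The following is a minimal sketch, assuming the shard is loaded with require "icu"; the locale name and the "collation" keyword are illustrative, and the actual contents depend on the installed ICU version.

```crystal
require "icu" # assumption: the shard is required under this name

# LOCALES is a Set(String) of locale identifiers reported by ICU.
puts ICU::Collator::LOCALES.includes?("en")

# KEYWORDS maps each keyword to the set of values ICU reports for it.
# `[]?` returns nil instead of raising when the keyword is absent.
if values = ICU::Collator::KEYWORDS["collation"]?
  puts values.to_a.sort.join(", ")
end
```

Using `[]?` rather than `[]` keeps the lookup safe when a keyword is not reported by the local ICU build.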

Constructor Detail

def self.new(locale : String | Nil = nil, rules : String | Nil = nil, normalization_mode : AttributeValue = DEFAULT, strength : Strength = AttributeValue::DefaultStrength)

Initializes a new Collator from a locale or a set of rules. If neither is specified, the collator is initialized with the default locale.

ICU::Collator.new
ICU::Collator.new("pt")
ICU::Collator.new(locale: "lt")
ICU::Collator.new(rules: "&c < b < a")


Class Method Detail

def self.functional_equivalent(locale : String, keyword : String = KEYWORDS.keys.first)

Returns the locale that is functionally equivalent to the given locale for the specified keyword.

ICU::Collator.functional_equivalent("en") # => "root"


Instance Method Detail

def [](attribute : Attribute) : AttributeValue

Returns the value of the specified attribute.


def []=(attribute : Attribute, value : AttributeValue)

Sets the value of the specified attribute.


def compare(s1 : String, s2 : String) : Int

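Compares two strings using this collator's ordering. The result is negative, zero, or positive when s1 sorts before, equal to, or after s2.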

ICU::Collator.new("en").compare("y", "k")       # => 1
ICU::Collator.new("lt").compare("y", "k")       # => -1
ICU::Collator.new("fr").compare("côte", "coté") # => -1

(see String#compare)
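
The comparison result can also drive Array#sort through a block. The following is a minimal sketch, assuming the shard is loaded with require "icu"; the word list is illustrative.

```crystal
require "icu" # assumption: the shard is required under this name

words = ["k", "y", "i"]

# Sort using the Lithuanian collator. The signature declares Int,
# so to_i is used to hand the sort block an Int32.
lt = ICU::Collator.new("lt")
sorted = words.sort { |a, b| lt.compare(a, b).to_i }
puts sorted.inspect
```

Creating the collator once and reusing it inside the block avoids opening a new ICU collator for every comparison.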


def contractions : Set(Char)

Returns the set of contraction characters defined by this collator.

A contraction is a sequence of two or more characters treated as a single collation element (e.g. "ch" in traditional Spanish). Only single code points that are the start of a contraction are returned; for complete contraction strings use #contractions_and_expansions.

col = ICU::Collator.new("cs")   # Czech has "ch" as a contraction
col.contractions.includes?('c') # => true

def contractions_and_expansions(add_prefixes : Bool = false) : Tuple(Set(Char), Set(Char))

Returns the contraction and expansion character sets for this collator.

  • The first element is the set of characters that start a contraction.
  • The second element is the set of characters involved in expansions (a single code point that maps to multiple collation elements).

Pass add_prefixes: true to also include prefix characters.

conts, exps = ICU::Collator.new("cs").contractions_and_expansions
conts.includes?('c') # => true

def equals?(s1 : String, s2 : String) : Bool

Returns true if the two strings are equivalent under this collator's ordering.

col = ICU::Collator.new(rules: "&a = b")
col.equals?("a", "b") # => true

def finalize

def locale : String?

def reorder_codes : Array(ReorderCode)

def reorder_codes=(codes : Array(ReorderCode))

def rules : String?

def strength : Strength

def strength=(value : Strength)

def tailored_set : Set(Char)

Returns the set of characters that are tailored (customized) by this collator relative to the root collation order.

The result is a plain Set(Char). Multi-character contractions that do not resolve to a single code point are not included (use #contractions_and_expansions if you need them).

col = ICU::Collator.new(rules: "&b < a")
col.tailored_set.includes?('a') # => true
col.tailored_set.includes?('b') # => true

def to_unsafe : LibICU::UCollator
