Unicode
Functions for Unicode strings.
List of functions
Unicode::IsUtf(String) -> Bool
Checks whether a string is a valid UTF-8 sequence. For example, the string "\xF0" isn't a valid UTF-8 sequence, but the string "\xF0\x9F\x90\xB1" correctly describes a UTF-8 cat emoji.
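To make the notion of a valid sequence concrete, here is a minimal Python sketch that models IsUtf with Python's strict UTF-8 decoder. The helper name is_utf8 is hypothetical, and this is an approximation: YQL's validator may treat edge cases (such as encoded surrogates) differently from Python.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if `data` is a well-formed UTF-8 byte sequence (Python's strict decoder)."""
    try:
        data.decode("utf-8")  # strict mode rejects truncated or malformed sequences
        return True
    except UnicodeDecodeError:
        return False


# The two byte strings from the description above.
print(is_utf8(b"\xF0"))              # False: a 4-byte lead byte with no continuation bytes
print(is_utf8(b"\xF0\x9F\x90\xB1"))  # True: the UTF-8 encoding of U+1F431 (cat face)
```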
Unicode::GetLength(Utf8{Flags:AutoMap}) -> Uint64
Returns the length of a UTF-8 string in Unicode code points. Characters that UTF-16 would encode as surrogate pairs are counted as one character.
SELECT Unicode::GetLength("жніўня"); -- 6
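The difference between code points, UTF-16 units, and bytes can be illustrated in Python, where a str is a sequence of code points. This sketch shows the counting rule described above. The helper names get_length and utf16_units are hypothetical and are not part of YQL.

```python
def get_length(s: str) -> int:
    """Length in Unicode code points (a Python str is a sequence of code points)."""
    return len(s)


def utf16_units(s: str) -> int:
    """Length in UTF-16 code units; characters outside the BMP need a surrogate pair."""
    return len(s.encode("utf-16-le")) // 2


cat = "\U0001F431"  # U+1F431, outside the Basic Multilingual Plane
print(get_length("жніўня"))                                         # 6
print(get_length(cat), utf16_units(cat), len(cat.encode("utf-8")))  # 1 2 4
```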
Unicode::Find(string:Utf8{Flags:AutoMap}, subString:Utf8, [pos:Uint64?]) -> Uint64?
Unicode::RFind(string:Utf8{Flags:AutoMap}, subString:Utf8, [pos:Uint64?]) -> Uint64?
Finds the first (RFind: the last) occurrence of a substring in a string, starting from position pos. Returns the position of the first character of the found substring, or Null if the substring is not found.
SELECT Unicode::Find("aaa", "bb"); -- Null
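As an illustration, here is a Python sketch of the Find/RFind contract, with None standing in for Null. It assumes positions are zero-based code-point offsets. The pos argument is modeled only for find, because the text above does not say how RFind uses pos when searching backward. The names find and rfind are hypothetical.

```python
from typing import Optional


def find(s: str, sub: str, pos: int = 0) -> Optional[int]:
    """First occurrence of `sub` at or after code point `pos`, or None (YQL: Null)."""
    idx = s.find(sub, pos)
    return None if idx < 0 else idx


def rfind(s: str, sub: str) -> Optional[int]:
    """Last occurrence of `sub` in the whole string, or None (YQL: Null)."""
    idx = s.rfind(sub)
    return None if idx < 0 else idx


print(find("aaa", "bb"))        # None, like the Null in the YQL example
print(find("abcabc", "bc"))     # 1
print(find("abcabc", "bc", 2))  # 4
print(rfind("abcabc", "bc"))    # 4
```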
Unicode::Substring(string:Utf8{Flags:AutoMap}, from:Uint64?, len:Uint64?) -> Utf8
Returns the substring of string that starts at position from and is len characters long. If the len argument is omitted, the substring runs to the end of the source string.
If from exceeds the length of the source string, an empty string "" is returned.
SELECT Unicode::Substring("0123456789abcdefghij", 10); -- "abcdefghij"
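The example above implies that from is zero-based, because index 10 is the first letter "a". The following Python slicing sketch reproduces both documented rules: an omitted length runs to the end, and a start past the end yields "". The function name substring is hypothetical. The parameter is called start because from is a reserved word in Python.

```python
from typing import Optional


def substring(s: str, start: int, length: Optional[int] = None) -> str:
    """Code-point substring of `s` starting at zero-based `start`.

    If `length` is None, the substring runs to the end; a `start` past the
    end yields "" because Python slicing clamps out-of-range bounds.
    """
    if length is None:
        return s[start:]
    return s[start:start + length]


print(substring("0123456789abcdefghij", 10))     # abcdefghij
print(substring("0123456789abcdefghij", 10, 3))  # abc
print(repr(substring("short", 100)))             # ''
```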
The Unicode::Normalize... functions convert the passed UTF-8 string to a Unicode normalization form:
Unicode::Normalize(Utf8{Flags:AutoMap}) -> Utf8 -- NFC
Unicode::NormalizeNFD(Utf8{Flags:AutoMap}) -> Utf8
Unicode::NormalizeNFC(Utf8{Flags:AutoMap}) -> Utf8
Unicode::NormalizeNFKD(Utf8{Flags:AutoMap}) -> Utf8
Unicode::NormalizeNFKC(Utf8{Flags:AutoMap}) -> Utf8
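To show what the four normalization forms do, here is a Python example that uses the standard unicodedata.normalize function, which implements the same Unicode forms. It demonstrates behavior defined by the Unicode standard and does not call YQL.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

assert composed != decomposed  # different code point sequences...
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed  # ...equal after normalization

ligature = "\ufb01"  # LATIN SMALL LIGATURE FI
print(unicodedata.normalize("NFC", ligature) == ligature)  # True: canonical forms keep it
print(unicodedata.normalize("NFKC", ligature))             # fi: compatibility forms expand it
```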
Unicode::Translit(string:Utf8{Flags:AutoMap}, [lang:String?]) -> Utf8
Transliterates into Latin letters those words of the passed string that consist entirely of characters from the alphabet of the language given in the second argument. If no language is specified, words are transliterated from Russian. Available languages: "kaz", "rus", "tur", and "ukr".
SELECT Unicode::Translit("Тот уголок земли, где я провел"); -- "Tot ugolok zemli, gde ya provel"
Unicode::LevensteinDistance(stringA:Utf8{Flags:AutoMap}, stringB:Utf8{Flags:AutoMap}) -> Uint64
Calculates the Levenshtein distance between the passed strings.
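For reference, here is a minimal Python sketch of the classic dynamic-programming Levenshtein algorithm with unit costs for insertion, deletion, and substitution. It assumes the distance is counted over code points rather than bytes, which is an assumption about YQL. The function name is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
print(levenshtein_distance("", "abc"))            # 3
```

Only two rows of the table are kept at a time, so memory is O(len(b)) while time remains O(len(a) * len(b)).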
Unicode::Fold(Utf8{Flags:AutoMap}, [ Language:String?, DoLowerCase:Bool?, DoRenyxa:Bool?, DoSimpleCyr:Bool?, FillOffset:Bool? ]) -> Utf8
Performs case folding on the passed string.
Parameters:
- Language is set according to the same rules as in Unicode::Translit().
- DoLowerCase converts the string to lowercase; defaults to true.
- DoRenyxa converts diacritical characters to similar Latin characters; defaults to true.
- DoSimpleCyr converts diacritical Cyrillic characters to similar Cyrillic characters; defaults to true.
- FillOffset is not used.
SELECT Unicode::Fold("Kongreßstraße", false AS DoSimpleCyr, false AS DoRenyxa); -- "kongressstrasse"
SELECT Unicode::Fold("ҫурт"); -- "сурт"
SELECT Unicode::Fold("Eylül", "Turkish" AS Language); -- "eylul"
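As a rough analogy only, Python's str.casefold performs full Unicode case folding (for example "ß" to "ss"). Removing combining marks after NFD decomposition approximates the diacritic stripping described for DoRenyxa. This sketch is not YQL's Fold. It ignores the Language parameter, and it does not reproduce the Cyrillic simplification performed by DoSimpleCyr. The name fold_approx is hypothetical.

```python
import unicodedata


def fold_approx(s: str, strip_diacritics: bool = True) -> str:
    """Rough approximation of case folding plus diacritic removal (not YQL's Fold)."""
    folded = s.casefold()  # language-independent full case folding, e.g. "ß" -> "ss"
    if strip_diacritics:
        # Split precomposed letters into base letter + combining marks, drop the marks.
        decomposed = unicodedata.normalize("NFD", folded)
        folded = "".join(c for c in decomposed if not unicodedata.combining(c))
        folded = unicodedata.normalize("NFC", folded)
    return folded


print(fold_approx("Kongreßstraße", strip_diacritics=False))  # kongressstrasse
print(fold_approx("Eylül"))                                  # eylul
```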
Unicode::ReplaceAll(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
Unicode::ReplaceFirst(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
Unicode::ReplaceLast(input:Utf8{Flags:AutoMap}, find:Utf8, replacement:Utf8) -> Utf8
Replaces all/first/last occurrences of the find string in input with replacement.
SELECT Unicode::ReplaceLast("absence", "enc", ""); -- "abse"
Unicode::RemoveAll(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
Unicode::RemoveFirst(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
Unicode::RemoveLast(input:Utf8{Flags:AutoMap}, symbols:Utf8) -> Utf8
Deletes all/first/last occurrences of characters from the symbols set in input. The second argument is interpreted as an unordered set of characters to be removed.
SELECT Unicode::RemoveAll("abandon", "an"); -- "bdo"
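The key difference between the two families is that Replace* matches find as a whole string, while Remove* treats symbols as a set of single characters. Here is a Python sketch of both. The function names are hypothetical. RemoveFirst and RemoveLast are modeled as removing the first or last character that belongs to the set, which is our reading of the description above.

```python
def replace_all(s: str, find: str, replacement: str) -> str:
    return s.replace(find, replacement)


def replace_first(s: str, find: str, replacement: str) -> str:
    return s.replace(find, replacement, 1)


def replace_last(s: str, find: str, replacement: str) -> str:
    idx = s.rfind(find)
    if idx < 0:
        return s  # nothing to replace
    return s[:idx] + replacement + s[idx + len(find):]


def remove_all(s: str, symbols: str) -> str:
    chars = set(symbols)  # `symbols` is an unordered set of characters
    return "".join(c for c in s if c not in chars)


def remove_first(s: str, symbols: str) -> str:
    chars = set(symbols)
    for i, c in enumerate(s):
        if c in chars:
            return s[:i] + s[i + 1:]
    return s


def remove_last(s: str, symbols: str) -> str:
    chars = set(symbols)
    for i in range(len(s) - 1, -1, -1):
        if s[i] in chars:
            return s[:i] + s[i + 1:]
    return s


print(replace_last("absence", "enc", ""))  # abse
print(remove_all("abandon", "an"))         # bdo
print(remove_first("abandon", "an"))       # bandon
print(remove_last("abandon", "an"))        # abando
```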
Unicode::ToCodePointList(Utf8{Flags:AutoMap}) -> List<Uint32>
Splits a string into a list of Unicode code points.
Unicode::FromCodePointList(List<Uint32>{Flags:AutoMap}) -> Utf8
Builds a Unicode string from a list of code points.
SELECT Unicode::ToCodePointList("Щавель"); -- [1065, 1072, 1074, 1077, 1083, 1100]
SELECT Unicode::FromCodePointList(AsList(99,111,100,101,32,112,111,105,110,116,115,32,99,111,110,118,101,114,116,101,114)); -- "code points converter"
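In Python, ord and chr convert between characters and code points, so the round trip in the examples above can be sketched as follows. The helper names mirror the YQL functions but are hypothetical.

```python
from typing import List


def to_code_point_list(s: str) -> List[int]:
    return [ord(c) for c in s]  # one integer per code point


def from_code_point_list(code_points: List[int]) -> str:
    return "".join(chr(cp) for cp in code_points)


print(to_code_point_list("Щавель"))                # [1065, 1072, 1074, 1077, 1083, 1100]
print(from_code_point_list([99, 111, 100, 101]))  # code
```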
Unicode::Reverse(Utf8{Flags:AutoMap}) -> Utf8
Reverses a string.
Unicode::ToLower(Utf8{Flags:AutoMap}) -> Utf8
Unicode::ToUpper(Utf8{Flags:AutoMap}) -> Utf8
Unicode::ToTitle(Utf8{Flags:AutoMap}) -> Utf8
Converts a string to lower, UPPER, or Title case.
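Python's built-in string methods are the closest analogues of Reverse, ToLower, ToUpper, and ToTitle. This sketch assumes that Reverse works code point by code point rather than by grapheme cluster, which is an assumption about YQL. Under that model, a decomposed accent ends up in front of its base letter after reversal.

```python
s = "Щавель"
print(s[::-1])      # ьлеваЩ (reversed code point by code point)
print(s.upper())    # ЩАВЕЛЬ
print(s.lower())    # щавель
print("hello wide world".title())  # Hello Wide World

decomposed = "ne\u0301"  # "né" written with a combining acute accent
print(decomposed[::-1] == "\u0301en")  # True: the accent now precedes "e"
```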
Unicode::SplitToList(string:Utf8?, separator:Utf8, [ DelimeterString:Bool?, SkipEmpty:Bool?, Limit:Uint64? ]) -> List<Utf8>
Splits a string into substrings by a separator.
string - source string.
separator - separator.
Parameters:
- DelimeterString:Bool? - whether the separator is treated as a whole string (true, the default) or as a set of characters, any one of which splits the string (false).
- SkipEmpty:Bool? - whether to skip empty strings in the result; false by default.
- Limit:Uint64? - limits the number of splits (unlimited by default); once the limit is reached, the rest of the source string is returned unsplit as the last item.
Unicode::JoinFromList(List<Utf8>{Flags:AutoMap}, separator:Utf8) -> Utf8
Concatenates a list of strings into a single string, inserting separator between them.
SELECT Unicode::SplitToList("One, two, three, four, five", ", ", 2 AS Limit); -- ["One", "two", "three, four, five"]
SELECT Unicode::JoinFromList(["One", "two", "three", "four", "five"], ";"); -- "One;two;three;four;five"
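The three SplitToList options interact in ways that are easier to see in code, so here is a Python model of them. It treats limit as the maximum number of splits, which matches the SplitToList example (Limit 2 yields three items). It also assumes that SkipEmpty is applied after limiting. Both points are assumptions about YQL, and the function names are hypothetical.

```python
from typing import List, Optional


def split_to_list(s: str, separator: str, delimiter_string: bool = True,
                  skip_empty: bool = False, limit: Optional[int] = None) -> List[str]:
    if delimiter_string:
        # Whole-string separator; Python's maxsplit of -1 means "no limit".
        parts = s.split(separator, -1 if limit is None else limit)
    else:
        # Any single character from `separator` splits the string.
        chars = set(separator)
        parts, current, splits = [], [], 0
        for ch in s:
            if ch in chars and (limit is None or splits < limit):
                parts.append("".join(current))
                current, splits = [], splits + 1
            else:
                current.append(ch)
        parts.append("".join(current))  # the remainder, unsplit
    if skip_empty:
        parts = [p for p in parts if p]
    return parts


def join_from_list(items: List[str], separator: str) -> str:
    return separator.join(items)


print(split_to_list("One, two, three, four, five", ", ", limit=2))
# ['One', 'two', 'three, four, five']
print(split_to_list("a,,b;c", ",;", delimiter_string=False, skip_empty=True))
# ['a', 'b', 'c']
print(join_from_list(["One", "two", "three", "four", "five"], ";"))
# One;two;three;four;five
```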
Unicode::ToUint64(string:Utf8{Flags:AutoMap}, [prefix:Uint16?]) -> Uint64
Converts a string to a number.
The optional second argument sets the base. It defaults to 0, which means the base is detected automatically from the prefix.
Supported prefixes: 0x (0X) for base 16 and 0 for base 8. Without a prefix, base 10 is used.
A - sign before the number is interpreted as in C unsigned arithmetic. For example, -0x1 -> UI64_MAX.
If the string contains invalid characters or the number exceeds the Uint64 range, the function terminates with an error.
Unicode::TryToUint64(string:Utf8{Flags:AutoMap}, [prefix:Uint16?]) -> Uint64?
Similar to Unicode::ToUint64(), except that it returns NULL instead of an error.
SELECT Unicode::ToUint64("77741"); -- 77741
SELECT Unicode::ToUint64("-77741"); -- 18446744073709473875
SELECT Unicode::TryToUint64("asdh831"); -- Null
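The parsing rules above (prefix detection, unsigned negation, overflow check) can be modeled in Python. In this sketch a leading "-" negates the value modulo 2**64, so "-77741" becomes 2**64 - 77741 = 18446744073709473875, which matches the example. It is a model under stated assumptions, not YQL's parser: the names are hypothetical, only bases 2 to 16 are supported, and prefixes are recognized only when the base is auto-detected (0).

```python
from typing import Optional

UINT64_MOD = 2 ** 64
DIGITS = "0123456789abcdef"  # enough digits for bases up to 16


def to_uint64(s: str, base: int = 0) -> int:
    """Parse an unsigned 64-bit integer, raising ValueError on bad input or overflow."""
    text = s
    negative = text.startswith("-")
    if negative:
        text = text[1:]  # the sign is applied at the end, modulo 2**64
    if base == 0:  # auto-detect the base from the prefix
        if text[:2] in ("0x", "0X"):
            base, text = 16, text[2:]
        elif len(text) > 1 and text[0] == "0":
            base, text = 8, text[1:]
        else:
            base = 10
    if not text or not 2 <= base <= 16:
        raise ValueError(f"cannot parse {s!r}")
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError(f"invalid digit {ch!r} in {s!r}")
        value = value * base + digit
        if value >= UINT64_MOD:
            raise ValueError(f"{s!r} is out of the Uint64 range")
    # C unsigned semantics: -x wraps around to 2**64 - x.
    return (-value) % UINT64_MOD if negative else value


def try_to_uint64(s: str, base: int = 0) -> Optional[int]:
    """Like to_uint64, but returns None (YQL: NULL) instead of raising."""
    try:
        return to_uint64(s, base)
    except ValueError:
        return None


print(to_uint64("77741"))         # 77741
print(to_uint64("-77741"))        # 18446744073709473875
print(try_to_uint64("asdh831"))   # None
```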