hb-unicode

hb-unicode — Unicode character property access

Includes

#include <hb.h>

Description

Unicode functions are used to access Unicode character properties. With these functions, client programs can query various properties from the Unicode Character Database for any code point, such as General Category (gc), Script (sc), Canonical Combining Class (ccc), etc.

Client programs can optionally pass in their own Unicode functions that implement the same queries. The set of functions available is defined by the virtual methods in hb_unicode_funcs_t.

HarfBuzz provides built-in default functions for each method in hb_unicode_funcs_t.

Functions

hb_unicode_general_category ()

hb_unicode_general_category_t
hb_unicode_general_category (hb_unicode_funcs_t *ufuncs,
                             hb_codepoint_t unicode);

Retrieves the General Category (gc) property of code point unicode.

Parameters

ufuncs

The Unicode-functions structure

 

unicode

The code point to query

 

Returns

The hb_unicode_general_category_t of unicode

Since: 0.9.2


hb_unicode_combining_class ()

hb_unicode_combining_class_t
hb_unicode_combining_class (hb_unicode_funcs_t *ufuncs,
                            hb_codepoint_t unicode);

Retrieves the Canonical Combining Class (ccc) property of code point unicode.

Parameters

ufuncs

The Unicode-functions structure

 

unicode

The code point to query

 

Returns

The hb_unicode_combining_class_t of unicode

Since: 0.9.2


hb_unicode_mirroring ()

hb_codepoint_t
hb_unicode_mirroring (hb_unicode_funcs_t *ufuncs,
                      hb_codepoint_t unicode);

Retrieves the Bi-directional Mirroring Glyph code point defined for code point unicode.

Parameters

ufuncs

The Unicode-functions structure

 

unicode

The code point to query

 

Returns

The hb_codepoint_t of the Mirroring Glyph for unicode

Since: 0.9.2


hb_unicode_script ()

hb_script_t
hb_unicode_script (hb_unicode_funcs_t *ufuncs,
                   hb_codepoint_t unicode);

Retrieves the hb_script_t script to which code point unicode belongs.

Parameters

ufuncs

The Unicode-functions structure

 

unicode

The code point to query

 

Returns

The hb_script_t of unicode

Since: 0.9.2


hb_unicode_compose ()

hb_bool_t
hb_unicode_compose (hb_unicode_funcs_t *ufuncs,
                    hb_codepoint_t a,
                    hb_codepoint_t b,
                    hb_codepoint_t *ab);

Fetches the composition of a sequence of two Unicode code points.

Calls the composition function of the specified Unicode-functions structure ufuncs.

Parameters

ufuncs

The Unicode-functions structure

 

a

The first Unicode code point to compose

 

b

The second Unicode code point to compose

 

ab

The composition of a and b.

[out]

Returns

true if a and b composed, false otherwise

Since: 0.9.2


hb_unicode_decompose ()

hb_bool_t
hb_unicode_decompose (hb_unicode_funcs_t *ufuncs,
                      hb_codepoint_t ab,
                      hb_codepoint_t *a,
                      hb_codepoint_t *b);

Fetches the decomposition of a Unicode code point.

Calls the decomposition function of the specified Unicode-functions structure ufuncs.

Parameters

ufuncs

The Unicode-functions structure

 

ab

Unicode code point to decompose

 

a

The first code point of the decomposition of ab.

[out]

b

The second code point of the decomposition of ab.

[out]

Returns

true if ab was decomposed, false otherwise

Since: 0.9.2


hb_unicode_funcs_create ()

hb_unicode_funcs_t *
hb_unicode_funcs_create (hb_unicode_funcs_t *parent);

Creates a new hb_unicode_funcs_t structure of Unicode functions.

Parameters

parent

Parent Unicode-functions structure.

[nullable]

Returns

The Unicode-functions structure.

[transfer full]

Since: 0.9.2


hb_unicode_funcs_get_empty ()

hb_unicode_funcs_t *
hb_unicode_funcs_get_empty (void);

Fetches the singleton empty Unicode-functions structure.

Returns

The empty Unicode-functions structure.

[transfer full]

Since: 0.9.2


hb_unicode_funcs_reference ()

hb_unicode_funcs_t *
hb_unicode_funcs_reference (hb_unicode_funcs_t *ufuncs);

Increases the reference count on a Unicode-functions structure.

[skip]

Parameters

ufuncs

The Unicode-functions structure

 

Returns

The Unicode-functions structure.

[transfer full]

Since: 0.9.2


hb_unicode_funcs_destroy ()

void
hb_unicode_funcs_destroy (hb_unicode_funcs_t *ufuncs);

Decreases the reference count on a Unicode-functions structure. When the reference count reaches zero, the Unicode-functions structure is destroyed, freeing all memory.

[skip]

Parameters

ufuncs

The Unicode-functions structure

 

Since: 0.9.2


hb_unicode_funcs_set_user_data ()

hb_bool_t
hb_unicode_funcs_set_user_data (hb_unicode_funcs_t *ufuncs,
                                hb_user_data_key_t *key,
                                void *data,
                                hb_destroy_func_t destroy,
                                hb_bool_t replace);

Attaches a user-data key/data pair to the specified Unicode-functions structure.

[skip]

Parameters

ufuncs

The Unicode-functions structure

 

key

The user-data key

 

data

A pointer to the user data

 

destroy

A callback to call when data is not needed anymore.

[nullable]

replace

Whether to replace existing data with the same key

 

Returns

true on success, false otherwise

Since: 0.9.2


hb_unicode_funcs_get_user_data ()

void *
hb_unicode_funcs_get_user_data (const hb_unicode_funcs_t *ufuncs,
                                hb_user_data_key_t *key);

Fetches the user-data associated with the specified key, attached to the specified Unicode-functions structure.

[skip]

Parameters

ufuncs

The Unicode-functions structure

 

key

The user-data key to query

 

Returns

A pointer to the user data.

[transfer none]

Since: 0.9.2


hb_unicode_funcs_make_immutable ()

void
hb_unicode_funcs_make_immutable (hb_unicode_funcs_t *ufuncs);

Makes the specified Unicode-functions structure immutable.

Parameters

ufuncs

The Unicode-functions structure

 

Since: 0.9.2


hb_unicode_funcs_is_immutable ()

hb_bool_t
hb_unicode_funcs_is_immutable (hb_unicode_funcs_t *ufuncs);

Tests whether the specified Unicode-functions structure is immutable.

Parameters

ufuncs

The Unicode-functions structure

 

Returns

true if ufuncs is immutable, false otherwise

Since: 0.9.2


hb_unicode_funcs_get_default ()

hb_unicode_funcs_t *
hb_unicode_funcs_get_default (void);

Fetches a pointer to the default Unicode-functions structure that is used when no functions are explicitly set on hb_buffer_t.

Returns

a pointer to the hb_unicode_funcs_t Unicode-functions structure.

[transfer none]

Since: 0.9.2


hb_unicode_funcs_get_parent ()

hb_unicode_funcs_t *
hb_unicode_funcs_get_parent (hb_unicode_funcs_t *ufuncs);

Fetches the parent of the Unicode-functions structure ufuncs.

Parameters

ufuncs

The Unicode-functions structure

 

Returns

The parent Unicode-functions structure

Since: 0.9.2


hb_unicode_general_category_func_t ()

hb_unicode_general_category_t
(*hb_unicode_general_category_func_t) (hb_unicode_funcs_t *ufuncs,
                                       hb_codepoint_t unicode,
                                       void *user_data);

A virtual method for the hb_unicode_funcs_t structure.

This method should retrieve the General Category property for a specified Unicode code point.

Parameters

ufuncs

A Unicode-functions structure

 

unicode

The code point to query

 

user_data

User data pointer passed by the caller

 

Returns

The hb_unicode_general_category_t of unicode


hb_unicode_funcs_set_general_category_func ()

void
hb_unicode_funcs_set_general_category_func
                               (hb_unicode_funcs_t *ufuncs,
                                hb_unicode_general_category_func_t func,
                                void *user_data,
                                hb_destroy_func_t destroy);

Sets the implementation function for hb_unicode_general_category_func_t.

Parameters

ufuncs

A Unicode-functions structure

 

func

The callback function to assign.

[closure user_data][destroy destroy][scope notified]

user_data

Data to pass to func

 

destroy

The function to call when user_data is not needed anymore.

[nullable]

Since: 0.9.2


hb_unicode_combining_class_func_t ()

hb_unicode_combining_class_t
(*hb_unicode_combining_class_func_t) (hb_unicode_funcs_t *ufuncs,
                                      hb_codepoint_t unicode,
                                      void *user_data);

A virtual method for the hb_unicode_funcs_t structure.

This method should retrieve the Canonical Combining Class (ccc) property for a specified Unicode code point.

Parameters

ufuncs

A Unicode-functions structure

 

unicode

The code point to query

 

user_data

User data pointer passed by the caller

 

Returns

The hb_unicode_combining_class_t of unicode


hb_unicode_funcs_set_combining_class_func ()

void
hb_unicode_funcs_set_combining_class_func
                               (hb_unicode_funcs_t *ufuncs,
                                hb_unicode_combining_class_func_t func,
                                void *user_data,
                                hb_destroy_func_t destroy);

Sets the implementation function for hb_unicode_combining_class_func_t.

Parameters

ufuncs

A Unicode-functions structure

 

func

The callback function to assign.

[closure user_data][destroy destroy][scope notified]

user_data

Data to pass to func

 

destroy

The function to call when user_data is not needed anymore.

[nullable]

Since: 0.9.2


hb_unicode_mirroring_func_t ()

hb_codepoint_t
(*hb_unicode_mirroring_func_t) (hb_unicode_funcs_t *ufuncs,
                                hb_codepoint_t unicode,
                                void *user_data);

A virtual method for the hb_unicode_funcs_t structure.

This method should retrieve the Bi-Directional Mirroring Glyph code point for a specified Unicode code point.

Note: If a code point does not have a specified Bi-Directional Mirroring Glyph defined, the method should return the original code point.

Parameters

ufuncs

A Unicode-functions structure

 

unicode

The code point to query

 

user_data

User data pointer passed by the caller

 

Returns

The hb_codepoint_t of the Mirroring Glyph for unicode


hb_unicode_funcs_set_mirroring_func ()

void
hb_unicode_funcs_set_mirroring_func (hb_unicode_funcs_t *ufuncs,
                                     hb_unicode_mirroring_func_t func,
                                     void *user_data,
                                     hb_destroy_func_t destroy);

Sets the implementation function for hb_unicode_mirroring_func_t.

Parameters

ufuncs

A Unicode-functions structure

 

func

The callback function to assign.

[closure user_data][destroy destroy][scope notified]

user_data

Data to pass to func

 

destroy

The function to call when user_data is not needed anymore.

[nullable]

Since: 0.9.2


hb_unicode_script_func_t ()

hb_script_t
(*hb_unicode_script_func_t) (hb_unicode_funcs_t *ufuncs,
                             hb_codepoint_t unicode,
                             void *user_data);

A virtual method for the hb_unicode_funcs_t structure.

This method should retrieve the Script property for a specified Unicode code point.

Parameters

ufuncs

A Unicode-functions structure

 

unicode

The code point to query

 

user_data

User data pointer passed by the caller

 

Returns

The hb_script_t of unicode


hb_unicode_funcs_set_script_func ()

void
hb_unicode_funcs_set_script_func (hb_unicode_funcs_t *ufuncs,
                                  hb_unicode_script_func_t func,
                                  void *user_data,
                                  hb_destroy_func_t destroy);

Sets the implementation function for hb_unicode_script_func_t.

Parameters

ufuncs

A Unicode-functions structure

 

func

The callback function to assign.

[closure user_data][destroy destroy][scope notified]

user_data

Data to pass to func

 

destroy

The function to call when user_data is not needed anymore.

[nullable]

Since: 0.9.2


hb_unicode_compose_func_t ()

hb_bool_t
(*hb_unicode_compose_func_t) (hb_unicode_funcs_t *ufuncs,
                              hb_codepoint_t a,
                              hb_codepoint_t b,
                              hb_codepoint_t *ab,
                              void *user_data);

A virtual method for the hb_unicode_funcs_t structure.

This method should compose a sequence of two input Unicode code points by canonical equivalence, returning the composed code point in an hb_codepoint_t output parameter (if successful). The method must return an hb_bool_t indicating the success of the composition.

Parameters

ufuncs

A Unicode-functions structure

 

a

The first code point to compose

 

b

The second code point to compose

 

ab

The composed code point.

[out]

user_data

User data pointer passed by the caller

 

Returns

true if a and b composed, false otherwise


hb_unicode_funcs_set_compose_func ()

void
hb_unicode_funcs_set_compose_func (hb_unicode_funcs_t *ufuncs,
                                   hb_unicode_compose_func_t func,
                                   void *user_data,
                                   hb_destroy_func_t destroy);

Sets the implementation function for hb_unicode_compose_func_t.

Parameters

ufuncs

A Unicode-functions structure

 

func

The callback function to assign.

[closure user_data][destroy destroy][scope notified]

user_data

Data to pass to func

 

destroy

The function to call when user_data is not needed anymore.

[nullable]

Since: 0.9.2


hb_unicode_decompose_func_t ()

hb_bool_t
(*hb_unicode_decompose_func_t) (hb_unicode_funcs_t *ufuncs,
                                hb_codepoint_t ab,
                                hb_codepoint_t *a,
                                hb_codepoint_t *b,
                                void *user_data);

A virtual method for the hb_unicode_funcs_t structure.

This method should decompose an input Unicode code point, returning the two decomposed code points in hb_codepoint_t output parameters (if successful). The method must return an hb_bool_t indicating the success of the decomposition.

Parameters

ufuncs

A Unicode-functions structure

 

ab

The code point to decompose

 

a

The first decomposed code point.

[out]

b

The second decomposed code point.

[out]

user_data

User data pointer passed by the caller

 

Returns

true if ab decomposed, false otherwise


hb_unicode_funcs_set_decompose_func ()

void
hb_unicode_funcs_set_decompose_func (hb_unicode_funcs_t *ufuncs,
                                     hb_unicode_decompose_func_t func,
                                     void *user_data,
                                     hb_destroy_func_t destroy);

Sets the implementation function for hb_unicode_decompose_func_t.

Parameters

ufuncs

A Unicode-functions structure

 

func

The callback function to assign.

[closure user_data][destroy destroy][scope notified]

user_data

Data to pass to func

 

destroy

The function to call when user_data is not needed anymore.

[nullable]

Since: 0.9.2

Types and Values

HB_UNICODE_MAX

#define HB_UNICODE_MAX 0x10FFFFu

Maximum valid Unicode code point.

Since: 1.9.0


enum hb_unicode_combining_class_t

Data type for the Canonical_Combining_Class (ccc) property from the Unicode Character Database.

Note: newer versions of Unicode may add new values. Client programs should be ready to handle any value in the 0..254 range being returned from hb_unicode_combining_class().

Members

HB_UNICODE_COMBINING_CLASS_NOT_REORDERED

Spacing and enclosing marks; also many vowel and consonant signs, even if nonspacing

 

HB_UNICODE_COMBINING_CLASS_OVERLAY

Marks which overlay a base letter or symbol

 

HB_UNICODE_COMBINING_CLASS_NUKTA

Diacritic nukta marks in Brahmi-derived scripts

 

HB_UNICODE_COMBINING_CLASS_KANA_VOICING

Hiragana/Katakana voicing marks

 

HB_UNICODE_COMBINING_CLASS_VIRAMA

Viramas

 

HB_UNICODE_COMBINING_CLASS_CCC10

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC11

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC12

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC13

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC14

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC15

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC16

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC17

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC18

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC19

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC20

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC21

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC22

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC23

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC24

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC25

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC26

[Hebrew]

 

HB_UNICODE_COMBINING_CLASS_CCC27

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC28

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC29

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC30

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC31

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC32

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC33

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC34

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC35

[Arabic]

 

HB_UNICODE_COMBINING_CLASS_CCC36

[Syriac]

 

HB_UNICODE_COMBINING_CLASS_CCC84

[Telugu]

 

HB_UNICODE_COMBINING_CLASS_CCC91

[Telugu]

 

HB_UNICODE_COMBINING_CLASS_CCC103

[Thai]

 

HB_UNICODE_COMBINING_CLASS_CCC107

[Thai]

 

HB_UNICODE_COMBINING_CLASS_CCC118

[Lao]

 

HB_UNICODE_COMBINING_CLASS_CCC122

[Lao]

 

HB_UNICODE_COMBINING_CLASS_CCC129

[Tibetan]

 

HB_UNICODE_COMBINING_CLASS_CCC130

[Tibetan]

 

HB_UNICODE_COMBINING_CLASS_CCC132

[Tibetan] Since: 7.2.0

 

HB_UNICODE_COMBINING_CLASS_ATTACHED_BELOW_LEFT

Marks attached at the bottom left

 

HB_UNICODE_COMBINING_CLASS_ATTACHED_BELOW

Marks attached directly below

 

HB_UNICODE_COMBINING_CLASS_ATTACHED_ABOVE

Marks attached directly above

 

HB_UNICODE_COMBINING_CLASS_ATTACHED_ABOVE_RIGHT

Marks attached at the top right

 

HB_UNICODE_COMBINING_CLASS_BELOW_LEFT

Distinct marks at the bottom left

 

HB_UNICODE_COMBINING_CLASS_BELOW

Distinct marks directly below

 

HB_UNICODE_COMBINING_CLASS_BELOW_RIGHT

Distinct marks at the bottom right

 

HB_UNICODE_COMBINING_CLASS_LEFT

Distinct marks to the left

 

HB_UNICODE_COMBINING_CLASS_RIGHT

Distinct marks to the right

 

HB_UNICODE_COMBINING_CLASS_ABOVE_LEFT

Distinct marks at the top left

 

HB_UNICODE_COMBINING_CLASS_ABOVE

Distinct marks directly above

 

HB_UNICODE_COMBINING_CLASS_ABOVE_RIGHT

Distinct marks at the top right

 

HB_UNICODE_COMBINING_CLASS_DOUBLE_BELOW

Distinct marks subtending two bases

 

HB_UNICODE_COMBINING_CLASS_DOUBLE_ABOVE

Distinct marks extending above two bases

 

HB_UNICODE_COMBINING_CLASS_IOTA_SUBSCRIPT

Greek iota subscript only

 

HB_UNICODE_COMBINING_CLASS_INVALID

Invalid combining class

 

enum hb_unicode_general_category_t

Data type for the "General_Category" (gc) property from the Unicode Character Database.

Members

HB_UNICODE_GENERAL_CATEGORY_CONTROL

[Cc]

 

HB_UNICODE_GENERAL_CATEGORY_FORMAT

[Cf]

 

HB_UNICODE_GENERAL_CATEGORY_UNASSIGNED

[Cn]

 

HB_UNICODE_GENERAL_CATEGORY_PRIVATE_USE

[Co]

 

HB_UNICODE_GENERAL_CATEGORY_SURROGATE

[Cs]

 

HB_UNICODE_GENERAL_CATEGORY_LOWERCASE_LETTER

[Ll]

 

HB_UNICODE_GENERAL_CATEGORY_MODIFIER_LETTER

[Lm]

 

HB_UNICODE_GENERAL_CATEGORY_OTHER_LETTER

[Lo]

 

HB_UNICODE_GENERAL_CATEGORY_TITLECASE_LETTER

[Lt]

 

HB_UNICODE_GENERAL_CATEGORY_UPPERCASE_LETTER

[Lu]

 

HB_UNICODE_GENERAL_CATEGORY_SPACING_MARK

[Mc]

 

HB_UNICODE_GENERAL_CATEGORY_ENCLOSING_MARK

[Me]

 

HB_UNICODE_GENERAL_CATEGORY_NON_SPACING_MARK

[Mn]

 

HB_UNICODE_GENERAL_CATEGORY_DECIMAL_NUMBER

[Nd]

 

HB_UNICODE_GENERAL_CATEGORY_LETTER_NUMBER

[Nl]

 

HB_UNICODE_GENERAL_CATEGORY_OTHER_NUMBER

[No]

 

HB_UNICODE_GENERAL_CATEGORY_CONNECT_PUNCTUATION

[Pc]

 

HB_UNICODE_GENERAL_CATEGORY_DASH_PUNCTUATION

[Pd]

 

HB_UNICODE_GENERAL_CATEGORY_CLOSE_PUNCTUATION

[Pe]

 

HB_UNICODE_GENERAL_CATEGORY_FINAL_PUNCTUATION

[Pf]

 

HB_UNICODE_GENERAL_CATEGORY_INITIAL_PUNCTUATION

[Pi]

 

HB_UNICODE_GENERAL_CATEGORY_OTHER_PUNCTUATION

[Po]

 

HB_UNICODE_GENERAL_CATEGORY_OPEN_PUNCTUATION

[Ps]

 

HB_UNICODE_GENERAL_CATEGORY_CURRENCY_SYMBOL

[Sc]

 

HB_UNICODE_GENERAL_CATEGORY_MODIFIER_SYMBOL

[Sk]

 

HB_UNICODE_GENERAL_CATEGORY_MATH_SYMBOL

[Sm]

 

HB_UNICODE_GENERAL_CATEGORY_OTHER_SYMBOL

[So]

 

HB_UNICODE_GENERAL_CATEGORY_LINE_SEPARATOR

[Zl]

 

HB_UNICODE_GENERAL_CATEGORY_PARAGRAPH_SEPARATOR

[Zp]

 

HB_UNICODE_GENERAL_CATEGORY_SPACE_SEPARATOR

[Zs]

 

hb_unicode_funcs_t

typedef struct hb_unicode_funcs_t hb_unicode_funcs_t;

Data type containing a set of virtual methods used for accessing various Unicode character properties.

HarfBuzz provides a default function for each of the methods in hb_unicode_funcs_t. Client programs can implement their own replacements for the individual Unicode functions, as needed, and replace the default by calling the setter for a method.