Customizing Unicode functions

HarfBuzz needs a few simple functions for reading properties from the Unicode Character Database that are useful for shaping, such as General_Category (gc) and Script (sc). It also needs a few operations on code points, such as composition and decomposition.

HarfBuzz includes its own internal, lightweight set of Unicode functions. At build time, it is also possible to compile support for some other options, such as the Unicode functions provided by GLib or the International Components for Unicode (ICU) library. Generally, this option is only of interest for client programs that have specific integration requirements or that do a significant amount of customization.

If your program has access to other Unicode functions, however, such as through a system library or application framework, you might prefer to use those instead of the built-in options. HarfBuzz supports this by implementing its Unicode functions as a set of virtual methods that you can replace without otherwise affecting HarfBuzz's functionality.

The Unicode functions are specified in a structure called unicode_funcs, which is attached to each buffer. Even though unicode_funcs is associated with an hb_buffer_t, the functions themselves are called by other HarfBuzz APIs that access buffers, so it would be unwise to hook different functions into different buffers.

In addition, you can mark your unicode_funcs as immutable by calling hb_unicode_funcs_make_immutable (ufuncs). This is especially useful if your code is a library or framework that will have its own client programs. By marking your Unicode function choices as immutable, you prevent your own client programs from changing the unicode_funcs configuration and introducing inconsistencies and errors downstream.

You can retrieve the Unicode-functions configuration for your buffer by calling hb_buffer_get_unicode_funcs():

      hb_unicode_funcs_t *ufuncs;
      ufuncs = hb_buffer_get_unicode_funcs (buf);

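The current version of unicode_funcs uses six functions:

      hb_unicode_combining_class_func_t: returns the Canonical Combining Class of a code point.
      hb_unicode_general_category_func_t: returns the General Category (gc) of a code point.
      hb_unicode_mirroring_func_t: returns the Mirroring Glyph code point (for bi-directional replacement) of a code point.
      hb_unicode_script_func_t: returns the Script (sc) property of a code point.
      hb_unicode_compose_func_t: returns the canonical composition of a sequence of two code points.
      hb_unicode_decompose_func_t: returns the canonical decomposition of a code point.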

Note, however, that future HarfBuzz releases may alter this set.

Each Unicode function has a corresponding setter, with which you can install your replacement function as the callback. For example, to replace the hb_unicode_general_category_func_t callback, you can call

      hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);

Virtualizing this set of Unicode functions is primarily intended to improve portability. There is no need for every client program to make the effort to replace the default options, so if you are unsure, do not feel any pressure to customize unicode_funcs.