Adding text to the buffer: HarfBuzz Manual

Adding text to the buffer

Now we have a brand new HarfBuzz buffer. Let's start filling it with text! From HarfBuzz's perspective, a buffer is just a stream of Unicode code points, but your input string is probably in one of the standard Unicode character encodings (UTF-8, UTF-16, or UTF-32). HarfBuzz provides convenience functions that accept each of these encodings: hb_buffer_add_utf8(), hb_buffer_add_utf16(), and hb_buffer_add_utf32(). Other than the character encoding they accept, they function identically.

You can add UTF-8 text to a buffer by passing in the text array, the array's length, an offset into the array for the first character to add, and the length of the segment to add:

    void hb_buffer_add_utf8 (hb_buffer_t *buf,
                             const char *text,
                             int text_length,
                             unsigned int item_offset,
                             int item_length);

So, in practice, you can say:

    hb_buffer_add_utf8 (buf, text, strlen (text), 0, strlen (text));

This will append your new characters to buf, not replace its existing contents. Also, note that you can use -1 in place of the first instance of strlen(text) if your text array is null-terminated. Similarly, you can use -1 as the final argument if you want to add the full contents of the text, from item_offset to the end.

Whatever item_offset and item_length you provide, HarfBuzz will also attempt to grab the five characters before the offset point and the five characters after the designated end. These are the before and after "context" segments, which HarfBuzz uses internally to make shaping decisions. They will not be part of the final output, but they ensure that HarfBuzz's script-specific shaping operations are correct. If fewer than five characters are available for the before or after context, HarfBuzz just grabs what is there.

For longer text runs, such as full paragraphs, it might be tempting to add only smaller sub-segments to a buffer and shape them piecemeal. Generally, however, this is not a good idea, because many shaping decisions depend on this context information. For example, in Arabic and other connected scripts, HarfBuzz needs to know the code points before and after each character in order to correctly determine which glyph to return.

The safest approach is to add all of the available text (even if it contains a mix of scripts, directions, languages, and fonts), then use item_offset and item_length to indicate which characters you want shaped (these must all share the same script, direction, language, and font), so that HarfBuzz has access to all of the surrounding context.

You can also add Unicode code points directly with hb_buffer_add_codepoints(). Its arguments are the same as those of the UTF functions, except that the text is an array of hb_codepoint_t values. But it is particularly important to note that HarfBuzz does only minimal validity checking on the text that is added to a buffer: the UTF functions replace invalid input with a replacement code point (U+FFFD by default), and hb_buffer_add_codepoints() does not check its input at all. It is up to you to do any deeper sanity checking necessary.