Now we have a brand new HarfBuzz buffer. Let's start filling it
with text! From HarfBuzz's perspective, a buffer is just a stream
of Unicode code points, but your input string is probably in one of
the standard Unicode character encodings (UTF-8, UTF-16, or
UTF-32). HarfBuzz provides convenience functions that accept
each of these encodings: hb_buffer_add_utf8(),
hb_buffer_add_utf16(), and hb_buffer_add_utf32(). Other than the
character encoding they accept, they function identically.
You can add UTF-8 text to a buffer by passing in the text array, the array's length, an offset into the array for the first character to add, and the length of the segment to add:
void hb_buffer_add_utf8 (hb_buffer_t *buf, const char *text, int text_length, unsigned int item_offset, int item_length)
So, in practice, you can say:
hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
This will append your new characters to buf, not replace its
existing contents. Also, note that you can use -1 in place of the
first instance of strlen(text) if your text array is
NUL-terminated. Similarly, you can also use -1 as the final
argument if you want to add the array's full contents.
Whatever item_offset and item_length you provide, HarfBuzz will
also attempt to grab the five characters before the offset point
and the five characters after the designated end. These are the
before and after "context" segments, which HarfBuzz uses
internally to make shaping decisions. They will not be part of
the final output, but they ensure that HarfBuzz's
script-specific shaping operations are correct. If there are
fewer than five characters available for the before or after
contexts, HarfBuzz will just grab what is there.
For longer text runs, such as full paragraphs, it might be tempting to add only smaller sub-segments to a buffer and shape them piecemeal. This is generally not a good idea, because many shaping decisions depend on this context information. For example, in Arabic and other connected scripts, HarfBuzz needs to know the code points before and after each character in order to correctly determine which glyph to return.
The safest approach is to add all of the text available (even
if your text contains a mix of scripts, directions, languages
and fonts), then use item_offset and item_length to indicate
which characters you want shaped (which must all have the same
script, direction, language and font), so that HarfBuzz has
access to any context.
You can also add Unicode code points directly with
hb_buffer_add_codepoints(). The arguments to this function are
the same as those for the UTF functions, except that the text
array holds hb_codepoint_t values. But it is particularly
important to note that hb_buffer_add_codepoints() does not do
validity checking on the text that is added to a buffer. The UTF
functions replace invalid input with the buffer's replacement
code point, but here it is up to you to do any sanity checking
necessary.