The distinction between levels 0 and 1

The preceding examples demonstrate the main effects of using cluster levels 0 and 1. The only difference between the two levels is this: in level 0, at the very beginning of the shaping process, HarfBuzz merges the cluster of each base character with the clusters of all Unicode marks (combining or not) and modifiers that follow it.

For example, let us start with the following character sequence (top row) and accompanying initial cluster values (bottom row):

      A,acute,B
      0,1    ,2
    

The acute is a Unicode mark. If HarfBuzz is using cluster level 0 on this sequence, then the A and acute clusters will merge, and the result will be:

      A,acute,B
      0,0    ,2
    

This merger is performed before any other script-shaping steps.
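The initial merge can be modeled in a few lines of Python. This is only a simplified sketch, not HarfBuzz's implementation: it treats any character whose Unicode general category starts with "M" as a mark and ignores the modifiers mentioned above. The function names `initial_clusters` and `merge_marks_level0` are hypothetical and chosen for illustration.

```python
import unicodedata

def initial_clusters(text):
    """One cluster value per character: its index in the text."""
    return list(range(len(text)))

def merge_marks_level0(text, clusters):
    """Simplified model of the level 0 initial merge: each mark takes
    the cluster value of the character before it."""
    merged = list(clusters)
    for i in range(1, len(text)):
        # Assumption: general categories Mn, Mc, and Me stand in for
        # "Unicode marks"; modifiers are not handled in this sketch.
        if unicodedata.category(text[i]).startswith("M"):
            merged[i] = merged[i - 1]
    return merged

text = "A\u0301B"  # A, combining acute accent, B
level1 = initial_clusters(text)            # level 1 skips the merge
level0 = merge_marks_level0(text, level1)  # level 0 applies it first
print(level1)  # [0, 1, 2]
print(level0)  # [0, 0, 2]
```

Because each mark copies the already-merged value of its predecessor, a run of several marks after one base character ends up in that base character's cluster.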

This initial cluster merging is the default behavior of the Windows shaping engine, and the old HarfBuzz codebase copied that behavior to maintain compatibility. Consequently, it has remained the default behavior in the new HarfBuzz codebase.

But this initial cluster-merging behavior makes it impossible for client programs to implement some features (such as coloring diacritic marks differently from their base characters). That is why, in level 1, HarfBuzz does not perform the initial merging step.
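The following sketch illustrates why the merge gets in the way of coloring marks. It assumes the simplest possible shaping result, with one glyph per character and no reordering, and the cluster lists are written by hand from the A, acute, B example rather than produced by HarfBuzz. The helper `mark_glyphs` is hypothetical.

```python
import unicodedata

def mark_glyphs(text, glyph_clusters):
    """Return the indices of glyphs whose cluster value points at a mark."""
    return [
        i for i, cluster in enumerate(glyph_clusters)
        if unicodedata.category(text[cluster]).startswith("M")
    ]

text = "A\u0301B"  # A, combining acute accent, B

# Level 1: the acute keeps its own cluster, so its glyph can be found.
print(mark_glyphs(text, [0, 1, 2]))  # [1]

# Level 0: the acute's glyph reports cluster 0, which points at the A,
# so the client cannot tell it apart from the base glyph.
print(mark_glyphs(text, [0, 0, 2]))  # []
```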

For client programs that rely on HarfBuzz cluster values to perform cursor positioning, level 0 is more convenient. But relying on cluster boundaries for cursor positioning is wrong: cursor positions should be determined based on Unicode grapheme boundaries, not on shaping-cluster boundaries. As such, using level 1 clustering behavior is recommended.
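As a rough illustration of deriving cursor stops from the text instead of from cluster values, the sketch below computes approximate grapheme boundaries. It is a deliberately crude stand-in: it only keeps marks attached to the preceding character, whereas the grapheme cluster rules in Unicode Standard Annex #29 cover many more cases. A real client should use a complete segmentation implementation. The function name `caret_stops` is hypothetical.

```python
import unicodedata

def caret_stops(text):
    """Approximate grapheme boundaries: position 0, every position that
    starts a non-mark character, and the end of the text."""
    stops = [
        i for i, ch in enumerate(text)
        if i == 0 or not unicodedata.category(ch).startswith("M")
    ]
    stops.append(len(text))
    return stops

print(caret_stops("A\u0301B"))  # [0, 2, 3]
```

The result does not depend on the cluster level in use, so the client is free to choose level 1 for shaping.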

One final facet of levels 0 and 1 is worth noting. HarfBuzz currently does not allow any multiple-substitution GSUB lookups to replace a glyph with zero glyphs (in other words, to delete a glyph).

But, in some other situations, glyphs can be deleted. In those cases, if the glyph being deleted is the last glyph of its cluster, HarfBuzz makes sure to merge the deleted glyph's cluster with a neighboring cluster.

This is done primarily to guarantee that the first cluster of a run always has a cluster value pointing to the start of the run's text; more than one client program currently relies on this guarantee.
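The effect of this merge on the cluster values can be sketched as follows. This is a simplified model under stated assumptions, not HarfBuzz's code: cluster values are monotone (as in levels 0 and 1), the merged cluster keeps the lower of the two values, and the preceding glyph is preferred as the neighbor when there is one. The function name `delete_glyph` is hypothetical.

```python
def delete_glyph(clusters, index):
    """Delete the glyph at `index` from a list of per-glyph cluster values.

    If the glyph was the last one carrying its cluster value, fold that
    value into a neighboring cluster, keeping the lower value.
    """
    removed = clusters[index]
    remaining = clusters[:index] + clusters[index + 1:]
    if removed in remaining or not remaining:
        # The cluster survives, or there is nothing left to merge into.
        return remaining
    # Assumption: merge into the preceding glyph's cluster when one
    # exists, otherwise into the following glyph's cluster.
    neighbor = index - 1 if index > 0 else 0
    old = remaining[neighbor]
    merged = min(old, removed)
    return [merged if c == old else c for c in remaining]

# Deleting the first glyph: the next cluster takes over value 0, so the
# run still begins with a cluster pointing at the start of the text.
print(delete_glyph([0, 1, 2], 0))  # [0, 2]

# Deleting a glyph whose cluster is shared with another glyph changes nothing.
print(delete_glyph([0, 0, 2], 1))  # [0, 2]
```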

Incidentally, Apple's CoreText does something different to maintain the same promise: it inserts a glyph with ID 65535 at the beginning of the glyph string if the glyph corresponding to the first character in the run was deleted. HarfBuzz might do something similar in the future.