Splits CJK (Chinese, Japanese, Korean) text into tokens.
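As a rough illustration of this kind of splitting, the sketch below breaks a string into consecutive, overlapping character sequences of a fixed length. It is a minimal sketch, not the module's actual code: the function name `split_cjk_tokens`, its `minimum_word_size` parameter, and the handling of text shorter than the token length are all assumptions made for the example.

```python
def split_cjk_tokens(text: str, minimum_word_size: int = 3) -> list[str]:
    """Return consecutive, overlapping substrings of length minimum_word_size.

    Hypothetical helper for illustration only; not the Search module's API.
    """
    if minimum_word_size < 1:
        raise ValueError("minimum_word_size must be at least 1")
    # Assumption: text no longer than the token length is kept as one token.
    if len(text) <= minimum_word_size:
        return [text] if text else []
    # Slide a window of minimum_word_size characters forward one character
    # at a time, so each token overlaps its neighbor by all but one character.
    return [
        text[i:i + minimum_word_size]
        for i in range(len(text) - minimum_word_size + 1)
    ]


# A five-character string with a token length of 3 yields three tokens.
tokens = split_cjk_tokens("東京都庁舎", 3)
print(" ".join(tokens))
```

Because the tokens overlap, a query for any run of `minimum_word_size` characters inside the original string would line up with one of the generated tokens, which is what lets exact-word matching work on text that has no word delimiters.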
The Search module matches exact words, where a word is defined as a sequence of characters delimited by spaces or punctuation. CJK languages, however, are written as long strings of characters that are not split into words. To allow search matching, we therefore split CJK text into tokens consisting of consecutive, overlapping sequences of characters whose length is equal to the 'minimum_word_size' variable. T