Initial import of linebreak

2015-05-27 00:00:50 +03:00
commit 74ca5511d7
parent 56a5d7f63f
34 changed files with 6889 additions and 0 deletions
--- a/linebreak/linebreak/AUTHORS
+++ b/linebreak/linebreak/AUTHORS
@ -0,0 +1,8 @@
 Wu Yongwei.  Designed and implemented liblinebreak.
 Nikolay Pultsin.  Put forward the original requirements on liblinebreak,
 performed tests, and made a lot of suggestions on the initial versions.
 Thomas Klausner.  Autoconfiscated and libtoolized liblinebreak.
 Tom Hacohen.  Added word boundaries support.
--- a/linebreak/linebreak/CVS/Entries
+++ b/linebreak/linebreak/CVS/Entries
@ -0,0 +1,32 @@
 /AUTHORS/1.2/Wed Jan 18 14:26:13 2012//
 /ChangeLog/1.78/Sat Aug 11 07:35:23 2012//
 /Doxyfile/1.7/Sat Aug 11 06:55:18 2012//
 /LICENCE/1.4/Sat Aug 11 07:35:23 2012//
 /LineBreak1.sed/1.2/Sun Dec  7 10:54:37 2008//
 /LineBreak2.sed/1.2/Sun Dec  7 10:54:37 2008//
 /Makefile.am/1.8/Sat Aug 11 06:55:18 2012//
 /Makefile.gcc/1.4/Thu Jan 19 14:03:34 2012//
 /Makefile.msvc/1.5/Sat Aug 11 05:57:50 2012//
 /NEWS/1.7/Sat Aug 11 06:55:18 2012//
 /README/1.8/Sat Aug 11 06:55:18 2012//
 /bootstrap/1.1/Fri Dec 12 12:01:39 2008//
 /configure.ac/1.6/Sat Aug 11 06:55:18 2012//
 /filter_dup.c/1.1/Sat Feb 23 11:53:28 2008//
 /libunibreak.pc.in/1.1/Sat Aug 11 06:55:18 2012//
 /linebreak.c/1.25/Sat May  7 19:55:10 2011//
 /linebreak.h/1.14/Sat May  7 19:55:10 2011//
 /linebreakdata.c/1.5/Sat May  7 19:40:20 2011//
 /linebreakdata1.tmpl/1.1/Sat Feb 23 11:53:28 2008//
 /linebreakdata2.tmpl/1.2/Sun Mar  2 07:30:43 2008//
 /linebreakdata3.tmpl/1.1/Sat Feb 23 11:53:28 2008//
 /linebreakdef.c/1.12/Sat May  7 19:55:10 2011//
 /linebreakdef.h/1.12/Sat May  7 19:55:10 2011//
 /purge/1.1/Fri Dec 12 12:01:39 2008//
 /sort_numeric_hex.py/1.2/Wed Jan 18 14:26:13 2012//
 /wordbreak.c/1.3/Sat Feb  4 14:32:57 2012//
 /wordbreak.h/1.4/Sat Feb  4 14:32:58 2012//
 /wordbreakdata.c/1.2/Wed Jan 18 14:26:13 2012//
 /wordbreakdata1.tmpl/1.2/Wed Jan 18 14:26:13 2012//
 /wordbreakdata2.tmpl/1.2/Wed Jan 18 14:26:13 2012//
 /wordbreakdef.h/1.2/Wed Jan 18 14:26:13 2012//
 D
--- a/linebreak/linebreak/CVS/Repository
+++ b/linebreak/linebreak/CVS/Repository
@ -0,0 +1 @@
 common/tools/linebreak
--- a/linebreak/linebreak/CVS/Root
+++ b/linebreak/linebreak/CVS/Root
@ -0,0 +1 @@
 :pserver:anonymous@vimgadgets.cvs.sourceforge.net:/cvsroot/vimgadgets
--- a/linebreak/linebreak/ChangeLog
+++ b/linebreak/linebreak/ChangeLog
@ -0,0 +1,512 @@
 2012-08-11  Wu Yongwei  <wuyongwei@gmail.com>
 	* LICENCE: Add copyright information about Tom Hacohen.
 2012-08-11  Wu Yongwei  <wuyongwei@gmail.com>
 	* configure.ac (AC_INIT): Change the library name and version to
 	`libunibreak' and `1.0'.
 	(AC_PROG_LN_S): New macro.
 	(AC_OUTPUT): Change to `libunibreak.pc'.
 	* Doxyfile (PROJECT_NAME): Change to `libunibreak'.
 	(PROJECT_NUMBER): Change to `1.0'.
 	* Makefile.am (lib_LTLIBRARIES): Change to `libunibreak.la'.
 	(pkgconfig_DATA): Change to `libunibreak.pc'.
 	(libunibreak_la_LDFLAGS): Reset the version to `1:0'.
 	(install-exec-hook): Replace the static library liblinebreak.a with
 	a symlink to libunibreak.a.
 	* NEWS: Add information about libunibreak 1.0.
 	* README: Change the library name, and add information about word
 	break.
 2012-08-11  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.msvc: Change the library name to `libunibreak', and the
 	output library to `unibreak.lib'.
 2012-02-04  Wu Yongwei  <wuyongwei@gmail.com>
 	* wordbreak.h (WORDBREAK_INSIDEACHAR): Change from
 	WORDBREAK_INSIDECHAR.
 	* wordbreak.c (set_brks_to): Change `WORDBREAK_INSIDECHAR' to
 	`WORDBREAK_INSIDEACHAR'.
 2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
 	* wordbreak.h: Change angle brackets to quotation marks (which
 	caused build errors).
 2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.gcc (CFILES): Add wordbreak.c.
 	(WordBreakProperty.txt): New target.
 	(wordbreakdata): New target.
 2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.am (liblinebreak_la_SOURCES): Remove wordbreakdata.c.
 	(EXTRA_DIST): Add wordbreakdata.c, wordbreakdata1.tmpl, and
 	wordbreakdata2.tmpl.
 2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.msvc: Add wordbreak files.
 2012-01-18  Tom Hacohen  <tom@stosb.com>
 	Add word breaking support.
 	* AUTHORS: Add `Tom Hacohen'.
 	* Makefile.am (include_HEADERS): Add header files for word breaking.
 	(liblinebreak_la_SOURCES): Add source files for word breaking.
 	(EXTRA_DIST): Add `sort_numeric_hex.py'.
 	(distclean-local): Clean also `WordBreakData.txt'.
 	(WordBreakProperty.txt): New target.
 	(wordbreakdata): New target.
 	* sort_numeric_hex.py: New file.
 	* wordbreak.c: New file.
 	* wordbreak.h: New file.
 	* wordbreakdef.h: New file.
 	* wordbreakdata.c: New file.
 	* wordbreakdata1.tmpl: New file.
 	* wordbreakdata2.tmpl: New file.
 2011-05-17  Wu Yongwei  <wuyongwei@gmail.com>
 	Add support for pkg-config (thanks to Tom Hacohen).
 	* liblinebreak.pc.in: New file.
 	* configure.ac (AC_OUTPUT): Add `liblinebreak.pc'.
 	* Makefile.am (pkgconfig_DATA): Set to `liblinebreak.pc'.
 	(pkgconfigdir): Set to `$(libdir)/pkgconfig'.
 2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* README: Update the reference to UAX #14-26, for Unicode 6.0.0.
 2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* configure.ac (AC_INIT): Increase the version to 2.1.
 	* Makefile.am (liblinebreak_la_LDFLAGS): Set the version-info to
 	`2:1'.
 2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* LICENCE: Update the copyright year.
 2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
 	Update for the 2.1 release.
 	* Doxyfile (PROJECT_NUMBER): Set to `2.1'.
 	* NEWS: Add information about the 2.1 release.
 	* linebreak.h (LINEBREAK_VERSION): Set to `0x0201'.
 	* linebreak.h: Update comments.
 	* linebreak.c: Ditto.
 	* linebreakdef.h: Ditto.
 	* linebreakdef.c: Ditto.
 2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreakdata.c: Regenerate from LineBreak-6.0.0.txt.
 2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.c (set_linebreaks): Fix the assertion failure when
 	U+FFFC (OBJECT REPLACEMENT CHARACTER) appears at the beginning of a
 	line (thanks to Tom Hacohen).
 2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
 	* LICENCE: Update the copyright year.
 2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
 	* NEWS: Add information about the 2.0 release.
 2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
 	* Doxyfile (PROJECT_NUMBER): Set to `2.0'.
 	(HAVE_DOT): Set to `YES'.
 2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.c: Update the version number in comment to 2.0.
 	* linebreak.h: Ditto.
 	* linebreakdef.c: Ditto.
 	* linebreakdef.h: Ditto.
 2009-12-17  Wu Yongwei  <wuyongwei@gmail.com>
 	Change the values of enum BreakAction to the same length.
 	* linebreak.c (DIRECT_BRK): Rename to DIR_BRK.
 	(INDIRECT_BRK): Rename to IND_BRK.
 	(CM_INDIRECT_BRK): Rename to CMI_BRK.
 	(CM_PROHIBITED_BRK): Rename to CMP_BRK.
 	(PROHIBITED_BRK): Rename to PRH_BRK.
 2009-11-29  Wu Yongwei  <wuyongwei@gmail.com>
 	* Doxyfile (TAB_SIZE): Set to the correct size `4', as used in the
 	source files.
 2009-11-29  Wu Yongwei  <wuyongwei@gmail.com>
 	Update files according to UAX #14-24, for Unicode 5.2.0.
 	* linebreak.c: Update comments about UAX #14.
 	* linebreak.h: Ditto.
 	* linebreakdef.c: Ditto.
 	* linebreakdef.h: Ditto.
 	(LBP_CP): New enumerator for the new `CP' class as defined in
 	UAX #14-24.
 	* linebreak.c (baTable): Update for the new class `CP'.
 	* linebreakdata.c: Regenerate from LineBreak-5.2.0.txt.
 	* README: Update the reference to UAX #14-24, for Unicode 5.2.0.
 2009-05-03  Wu Yongwei  <wuyongwei@gmail.com>
 	* NEWS: Add information about the 1.2 release.
 2009-04-30  Wu Yongwei  <wuyongwei@gmail.com>
 	Optimize the Doxygen output.
 	* linebreak.c (lb_prop_index): Adjust its definition format
 	slightly.
 2009-04-30  Wu Yongwei  <wuyongwei@gmail.com>
 	* Doxyfile (USE_WINDOWS_ENCODING): Remove obsolete tag.
 	(DETAILS_AT_TOP): Ditto.
 	(MAX_DOT_GRAPH_WIDTH): Ditto.
 	(MAX_DOT_GRAPH_HEIGHT): Ditto.
 	(REFERENCED_BY_RELATION): Set to `NO'.
 	(REFERENCES_RELATION): Ditto.
 	(EXCLUDE): Add `filter_dup.c'.
 2009-04-28  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.c (lb_get_next_char_utf8): Fix the issue that the index
 	can point to the middle of a UTF-8 sequence if End of String (EOS)
 	is encountered prematurely (thanks to Nikolay Pultsin and Rick Xu).
 	(lb_get_next_char_utf16): Fix the issue that the index can point to
 	the middle of a UTF-16 surrogate pair if EOS is encountered
 	prematurely.
 2009-04-20  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreakdef.c (lb_prop_English): Remove the specialization of
 	right single quotation mark as closing punctuation mark, because it
 	can be used as apostrophe.
 	(lb_prop_Spanish): Ditto.
 	(lb_prop_French): Ditto.
 2009-04-09  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.msvc: Make the `clean' target work on MSVC versions other
 	than 6.0; do not use precompiled header.
 2009-03-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.h: Correct the wrong date in the documentation comment.
 	* linebreakdef.h: Ditto.
 2009-02-10  Wu Yongwei  <wuyongwei@gmail.com>
 	* configure.ac (AC_INIT): Increase the version to 2.0.
 	* Makefile.am (liblinebreak_la_LDFLAGS): Set the version-info to
 	`2:0'.
 2009-02-10  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.h (LINEBREAK_VERSION): New macro.
 	(linebreak_version): New global constant declaration.
 	* linebreak.c (linebreak_version): New global constant definition.
 2009-02-10  Wu Yongwei  <wuyongwei@gmail.com>
 	Reduce namespace pollution.
 	* linebreak.c (get_lb_prop_lang): Mark as static.
 	(get_next_char_utf8): Rename to lb_get_next_char_utf8.
 	(get_next_char_utf16): Rename to lb_get_next_char_utf16.
 	(get_next_char_utf32): Rename to lb_get_next_char_utf32.
 	(is_breakable): Rename to is_line_breakable.
 	* linebreak.h (get_next_char_utf8): Remove the function prototype
 	declaration.
 	(get_next_char_utf16): Ditto.
 	(get_next_char_utf32): Ditto.
 	(is_breakable): Rename to is_line_breakable.
 	* linebreakdef.h (lb_get_next_char_utf8): Add the function prototype
 	declaration.
 	(lb_get_next_char_utf16): Ditto.
 	(lb_get_next_char_utf32): Ditto.
 2009-02-06  Wu Yongwei  <wuyongwei@gmail.com>
 	* NEWS: Add information about the 1.1 release.
 2009-01-02  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.am (EXTRA_DIST): Add the missing `LICENCE' file.
 2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.c: Update the version number in comment to 1.0.
 	* linebreak.h: Ditto.
 	* linebreakdef.c: Ditto.
 	* linebreakdef.h: Ditto.
 2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
 	* NEWS: Update for the 1.0 release.
 2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
 	* README: Correct two typos.
 2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
 	* README: Add the online URL reference.
 2008-12-30  Wu Yongwei  <wuyongwei@gmail.com>
 	* README: Update the reference to UAX #14-22, for Unicode 5.1.0.
 2008-12-13  Wu Yongwei  <wuyongwei@gmail.com>
 	Update files according to UAX #14-22, for Unicode 5.1.0.
 	* linebreak.c (baTable): Update according to Table 2 of UAX #14-22.
 	* linebreakdef.c (lb_prop_Spanish): Remove the unnecessary
 	customization for inverted marks in Spanish.
 	* linebreakdata.c: Regenerate from LineBreak-5.1.0.txt.
 	* linebreak.h: Update comment only.
 	* linebreakdef.h: Ditto.
 2008-12-12  Wu Yongwei  <wuyongwei@gmail.com>
 	* README: Update for the new build methods and better readability.
 2008-12-12  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.msvc: Correct the inconsistent naming in the output
 	message.
 2008-12-12  Wu Yongwei  <wuyongwei@gmail.com>
 	* configure.ac (AM_INIT_AUTOMAKE): Mark `foreign'.
 	* bootstrap: New file.
 	* purge: New file.
 	* Makefile.gcc (purge): Remove this target.
 2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
 	* NEWS: New file.
 2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
 	* AUTHORS: New file.
 2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.gcc (purge): New phony target to purge files generated by
 	autoconfiscation.
 2008-12-10  Thomas Klausner  <tk@giga.or.at>
 	* configure.ac: New file.
 	* Makefile.am: New file.
 2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
 	* Doxyfile (OUTPUT_DIRECTORY): Set to `doc'.
 	(ALPHABETICAL_INDEX): Set to `YES'.
 2008-12-09  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile.msvc: New file.
 2008-12-09  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile: Remove (to become Makefile.gcc).
 	* Makefile.gcc: New file (was Makefile).
 2008-12-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.c: Adjust the comment that refers to Unicode Annex 14.
 	* linebreak.h: Ditto.
 	* linebreakdef.c: Ditto.
 	* linebreakdef.h: Ditto.
 2008-12-07  Wu Yongwei  <wuyongwei@gmail.com>
 	Use only POSIX basic regexp to ensure maximum portability (issues
 	have been found on Mac OS X, where GNU extensions do not work).
 	* LineBreak1.sed: Replace `[:xdigit:]' with `0-9A-F', and `\+' with
 	`\{1,\}'.
 	* LineBreak2.sed: Ditto.
 2008-12-07  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile: Replace `*.exe' with `filter_dup$(EXEEXT)', since the
 	extension `.exe' is specific to Windows.
 2008-04-20  Wu Yongwei  <wuyongwei@gmail.com>
 	Add README and LICENCE files, as well as a Doxyfile to generate
 	documents.
 	* README: New file.
 	* LICENCE: New file.
 	* Doxyfile: New file.
 	* Makefile (doc): Add new phony target.
 2008-04-04  Wu Yongwei  <wuyongwei@gmail.com>
 	Remove the English override for plus sign: it is better treated in
 	the text breaking program (see ../breaktext/ for an example).
 	* linebreakdef.c (lb_prop_English): Remove the line for plus sign.
 2008-03-29  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile: Correct the dependency-making rules when OLDGCC=Y.
 2008-03-23  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile (clean): Do not remove *.exe and tags here.
 	(distclean): Remove *.exe and tags.
 2008-03-23  Wu Yongwei  <wuyongwei@gmail.com>
 	Remove the English override for solidus: it is better treated in the
 	text breaking program (see ../breaktext/ for an example).
 	* linebreakdef.c (lb_prop_English): Remove the line for solidus.
 2008-03-16  Wu Yongwei  <wuyongwei@gmail.com>
 	Rename init_linebreak_prop_index to init_linebreak for future
 	safety; make visible certain functions that are potentially useful.
 	* linebreak.c (init_linebreak_prop_index): Rename to init_linebreak.
 	(get_next_char_t): Move to linebreakdef.h.
 	(get_next_char_utf8): Make non-static.
 	(get_next_char_utf16): Ditto.
 	(get_next_char_utf32): Ditto.
 	(set_linebreaks): Ditto.
 	* linebreak.h (init_linebreak_prop_index): Rename to init_linebreak.
 	(get_next_char_utf8): Add the function prototype.
 	(get_next_char_utf16): Ditto.
 	(get_next_char_utf32): Ditto.
 	* linebreakdef.h (get_next_char_t): Add the typedef.
 	(set_linebreaks): Add the function prototype.
 2008-03-16  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile (OLDGCC): Add support for GCC 2.95.3 (when OLDGCC=Y).
 2008-03-15  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.c (set_linebreaks): Fix a bug that `==' was wrongly used
 	for `='.
 2008-03-05  Wu Yongwei  <wuyongwei@gmail.com>
 	Improve the performance by reducing the look-ups of the
 	language-specific line breaking properties array from the language
 	name (thanks to Nikolay Pultsin).
 	* linebreak.c (get_lb_prop_lang): New function.
 	(get_char_lb_class_lang): Change the second parameter from the
 	language name to the line breaking properties array.
 	(set_linebreaks): Look up the language-specific line breaking
 	properties array from the language name only once in one function
 	call.
 2008-03-03  Wu Yongwei  <wuyongwei@gmail.com>
 	Make minor adjustments in code and comments.
 	* linebreak.c: Adjust the doc comments.
 	(init_linebreak_prop_index): Modify a conditional to make it more
 	robust and consistent.
 	* linebreakdef.c (lb_prop_lang_map): Replace the pointer
 	lb_prop_default with NULL, since the value is never used.
 2008-03-03  Wu Yongwei  <wuyongwei@gmail.com>
 	Accelerate get_char_lb_class for invalid Unicode code points.
 	* linebreak.c (get_char_lb_class): Adjust the conditionals so that
 	getting the line breaking class for an invalid code point is much
 	faster, which requires the array of line breaking properties be
 	sorted.
 	* linebreakdef.h: Adjust a comment that the array of line break
 	properties must be sorted.
 2008-03-02  Wu Yongwei  <wuyongwei@gmail.com>
 	Change the values of enum BreakAction to more complete forms.
 	* linebreak.c (INDRCT_BRK): Rename to INDIRECT_BRK.
 	(CM_INDRCT_BRK): Rename to CM_INDIRECT_BRK.
 	(CM_PROHIBTD_BRK): Rename to CM_PROHIBITED_BRK.
 	(PROHIBTD_BRK): Rename to PROHIBITED_BRK.
 2008-03-02  Wu Yongwei  <wuyongwei@gmail.com>
 	Implement a two-stage search in get_char_lb_class_default to
 	accelerate the overall performance, especially for non-Latin
 	languages.
 	* linebreak.c (LINEBREAK_INDEX_SIZE): New constant macro.
 	(struct LineBreakPropertiesIndex): New struct.
 	(lb_prop_index): New static variable.
 	(init_linebreak_prop_index): New function.
 	(get_char_lb_class_default): New function.
 	(get_char_lb_class_lang): Use get_char_lb_class_default.
 	* linebreak.h: Detect C++ and add extern "C" guard if necessary.
 	(init_linebreak_prop_index): Add the prototype declaration.
 	* linebreakdef.h: Adjust a comment.
 2008-03-02  Wu Yongwei  <wuyongwei@gmail.com>
 	Split/refactor the code; add (doc) comments.
 	* Makefile (CFILES): Add linebreakdata.c and linebreakdef.c.
 	* linebreak.c: Add and adjust comments.
 	(linebreakdef.h): Add include file.
 	(linebreakdata.c): Remove include file.
 	(EOS): Remove (now in linebreakdef.h).
 	(enum LineBreakClass): Ditto.
 	(struct LineBreakProperties): Ditto.
 	(lbpEnglish): Remove (now in linebreakdef.c as lb_prop_English).
 	(lbpGerman): Remove (now in linebreakdef.c as lb_prop_German).
 	(lbpSpanish): Remove (now in linebreakdef.c as lb_prop_Spanish).
 	(lbpFrench): Remove (now in linebreakdef.c as lb_prop_French).
 	(lbpRussian): Remove (now in linebreakdef.c as lb_prop_Russian).
 	(lbpChinese): Remove (now in linebreakdef.c as lb_prop_Chinese).
 	(struct LineBreakPropertiesLang): Remove (now in linebreakdef.h).
 	(lbpLangs): Remove (now in linebreakdef.c as lb_prop_lang_map).
 	(get_next_char_utf16): Make sure memory access does not go beyond
 	len.
 	* linebreak.h: Add copyright information and adjust comments.
 	(stddef.h): Add include file.
 	* linebreakdata.c (linebreak.h): Add include file.
 	(linebreakdef.h): Add include file.
 	(lbpDefault): Make global and rename to lb_prop_default.
 	* linebreakdata2.tmpl: Add two include files, a comment line, and
 	remove `static'.
 	* linebreakdef.c: New file.
 	* linebreakdef.h: New file.
 2008-02-26  Wu Yongwei  <wuyongwei@gmail.com>
 	* linebreak.c (lbpSpanish): New array for Spanish-specific data.
 	(lbpLangs): Update the index array for Spanish.
 	(resolve_lb_class): Resolve AmbIguous class to IDeographic in
 	Chinese, Japanese, and Korean.
 2008-02-26  Wu Yongwei  <wuyongwei@gmail.com>
 	* Makefile (LineBreak.txt): Add new rule to retrieve it from the Web
 	if it is not already there.
 2008-02-23  Wu Yongwei  <wuyongwei@gmail.com>
 	Add files for linebreak.
 	* LineBreak1.sed: New file.
 	* LineBreak2.sed: New file.
 	* Makefile: New file.
 	* filter_dup.c: New file.
 	* linebreak.c: New file.
 	* linebreak.h: New file.
 	* linebreakdata.c: New file.
 	* linebreakdata1.tmpl: New file.
 	* linebreakdata2.tmpl: New file.
 	* linebreakdata3.tmpl: New file.
--- a/linebreak/linebreak/Doxyfile
+++ b/linebreak/linebreak/Doxyfile
--- a/linebreak/linebreak/LICENCE
+++ b/linebreak/linebreak/LICENCE
@ -0,0 +1,19 @@
 Copyright (C) 2008-2012 Wu Yongwei <wuyongwei at gmail dot com>
 Copyright (C) 2012 Tom Hacohen <tom dot hacohen at samsung dot com>
 This software is provided 'as-is', without any express or implied
 warranty.  In no event will the author be held liable for any damages
 arising from the use of this software.
 Permission is granted to anyone to use this software for any purpose,
 including commercial applications, and to alter it and redistribute it
 freely, subject to the following restrictions:
 1. The origin of this software must not be misrepresented; you must not
   claim that you wrote the original software.  If you use this software
   in a product, an acknowledgement in the product documentation would
   be appreciated but is not required.
 2. Altered source versions must be plainly marked as such, and must not
   be misrepresented as being the original software.
 3. This notice may not be removed or altered from any source
   distribution.
--- a/linebreak/linebreak/LineBreak1.sed
+++ b/linebreak/linebreak/LineBreak1.sed
@ -0,0 +1 @@
 s/\(^[0-9A-F.]\{1,\};[A-Z][A-Z0-9]\) #.*/\1/p
--- a/linebreak/linebreak/LineBreak2.sed
+++ b/linebreak/linebreak/LineBreak2.sed
@ -0,0 +1,2 @@
 s/^\([0-9A-F]\{1,\}\);/\1..\1;/
 s/^\([0-9A-F]\{1,\}\)\.\.\([0-9A-F]\{1,\}\);\([A-Z][A-Z0-9]\)/	{ 0x\1, 0x\2, LBP_\3 },/
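The two sed scripts above turn data lines of LineBreak.txt into C array initializers. The following is a minimal Python sketch of the same transformation, not part of the imported sources: the function name convert_line and the sample input lines are assumptions made for illustration, and the later filter_dup step used by the Makefiles is not reproduced here.

```python
import re

# Mirrors LineBreak1.sed (run with "sed -n"): keep "RANGE;CLASS" from a data
# line and drop the trailing " # comment"; other lines produce no output.
KEEP = re.compile(r"^([0-9A-F.]+;[A-Z][A-Z0-9]) #.*")
# Mirrors the first command of LineBreak2.sed: expand a single code point
# "XXXX;" into the range form "XXXX..XXXX;".
SINGLE = re.compile(r"^([0-9A-F]+);")
# Mirrors the second command of LineBreak2.sed: emit a C initializer.
RANGE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);([A-Z][A-Z0-9])")


def convert_line(line):
    """Return the C initializer for one LineBreak.txt line, or None."""
    match = KEEP.match(line)
    if match is None:
        return None  # comment, blank line, or a layout the scripts skip
    entry = match.group(1)
    entry = SINGLE.sub(r"\1..\1;", entry, count=1)
    return RANGE.sub(r"\t{ 0x\1, 0x\2, LBP_\3 },", entry, count=1)


if __name__ == "__main__":
    # Hypothetical sample lines in the layout the sed patterns expect.
    samples = [
        "# LineBreak-6.0.0.txt",
        "0000..0008;CM # Cc   [9] <control-0000>..<control-0008>",
        "0009;BA # Cc       <control-0009>",
    ]
    for sample in samples:
        converted = convert_line(sample)
        if converted is not None:
            print(converted)
```

Single code points are widened to a one-element range first so that every emitted entry has the same three-field shape.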
--- a/linebreak/linebreak/Makefile.am
+++ b/linebreak/linebreak/Makefile.am
@ -0,0 +1,63 @@
 #noinst_PROGRAMS = filter_dup
 include_HEADERS = linebreak.h linebreakdef.h wordbreak.h wordbreakdef.h
 lib_LTLIBRARIES = libunibreak.la
 pkgconfig_DATA  = libunibreak.pc
 pkgconfigdir    = ${libdir}/pkgconfig
 libunibreak_la_LDFLAGS = -no-undefined -version-info 1:0
 libunibreak_la_SOURCES = \
 	linebreak.c \
 	linebreakdata.c \
 	linebreakdef.c \
 	wordbreak.c
 EXTRA_DIST = \
 	LineBreak1.sed \
 	LineBreak2.sed \
 	linebreakdata1.tmpl \
 	linebreakdata2.tmpl \
 	linebreakdata3.tmpl \
 	wordbreakdata1.tmpl \
 	wordbreakdata2.tmpl \
 	wordbreakdata.c \
 	LICENCE \
 	Doxyfile \
 	Makefile.gcc \
 	Makefile.msvc \
 	doc \
 	sort_numeric_hex.py
 install-exec-hook:
 	rm -f ${libdir}/liblinebreak.a
 	${LN_S} ${libdir}/libunibreak.a ${libdir}/liblinebreak.a
 distclean-local:
 	rm -f LineBreak.txt WordBreakData.txt filter_dup${EXEEXT}
 doc:
 	cd ${top_srcdir} && doxygen
 LineBreak.txt:
 	wget http://unicode.org/Public/UNIDATA/LineBreak.txt
 WordBreakProperty.txt:
 	wget http://www.unicode.org/Public/UNIDATA/auxiliary/WordBreakProperty.txt
 linebreakdata: ${builddir}/filter_dup LineBreak.txt
 	sed -n -f ${top_srcdir}/LineBreak1.sed LineBreak.txt > tmp.txt
 	sed -f ${top_srcdir}/LineBreak2.sed tmp.txt | ${builddir}/filter_dup > tmp.c
 	head -2 LineBreak.txt > tmp.txt
 	cat ${top_srcdir}/linebreakdata1.tmpl tmp.txt ${top_srcdir}/linebreakdata2.tmpl tmp.c ${top_srcdir}/linebreakdata3.tmpl > ${top_srcdir}/linebreakdata.c
 	rm tmp.txt tmp.c
 wordbreakdata: WordBreakProperty.txt
 	sed -E -n 's/(^[0-9A-F.]+)/\1/p' WordBreakProperty.txt > tmp2.txt
 	sed -E -i.bak 's/^([0-9A-F]+) +/\1..\1/' tmp2.txt
 	${top_srcdir}/sort_numeric_hex.py tmp2.txt > tmp.txt
 	rm tmp2.txt tmp2.txt.bak
 	sed -E -i.bak -n 's/^([0-9A-F]+)..([0-9A-F]+) *; *([A-Za-z]+).*/'$$'\t''{0x\1, 0x\2, WBP_\3},/p' tmp.txt 
 	echo "/* The content of this file is generated from:" > ${top_srcdir}/wordbreakdata.c
 	head -2 WordBreakProperty.txt >> ${top_srcdir}/wordbreakdata.c
 	echo "*/" >> ${top_srcdir}/wordbreakdata.c
 	cat ${top_srcdir}/wordbreakdata1.tmpl tmp.txt ${top_srcdir}/wordbreakdata2.tmpl >> ${top_srcdir}/wordbreakdata.c
 	rm tmp.txt tmp.txt.bak
--- a/linebreak/linebreak/Makefile.gcc
+++ b/linebreak/linebreak/Makefile.gcc
@ -0,0 +1,177 @@
 # Windows/Cygwin support
 ifdef windir
    WINDOWS := 1
    CYGWIN  := 0
 else
    ifdef WINDIR
        WINDOWS := 1
        CYGWIN  := 1
    else
        WINDOWS := 0
    endif
 endif
 ifeq ($(WINDOWS),1)
    EXEEXT := .exe
    DLLEXT := .dll
    DEVNUL := nul
    ifeq ($(CYGWIN),1)
        PATHSEP := /
    else
        PATHSEP := $(strip \ )
    endif
 else
    EXEEXT :=
    DLLEXT := .so
    DEVNUL := /dev/null
    PATHSEP := /
 endif
 CFG ?= Debug
 ifeq ($(CFG),Debug)
    all: debug
 else
    all: release
 endif
 OLDGCC ?= N
 DEBUG   := DebugDir
 RELEASE := ReleaseDir
 $(DEBUG)/%.o: %.c
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -c -o $@ $<
 $(RELEASE)/%.o: %.c
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -c -o $@ $<
 $(DEBUG)/%.o: %.cpp
 	$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -c -o $@ $<
 $(RELEASE)/%.o: %.cpp
 	$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -c -o $@ $<
 ifeq ($(OLDGCC),N)
 $(DEBUG)/%.dep: %.c
 	$(CC) -MM -MT $(patsubst %.dep,%.o,$@) $(CFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -o $@ $<
 $(RELEASE)/%.dep: %.c
 	$(CC) -MM -MT $(patsubst %.dep,%.o,$@) $(CFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -o $@ $<
 $(DEBUG)/%.dep: %.cpp
 	$(CXX) -MM -MT $(patsubst %.dep,%.o,$@) $(CXXFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -o $@ $<
 $(RELEASE)/%.dep: %.cpp
 	$(CXX) -MM -MT $(patsubst %.dep,%.o,$@) $(CXXFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -o $@ $<
 else
 $(DEBUG)/%.dep: %.c
 	$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(DEBUG)/!" > $@
 $(RELEASE)/%.dep: %.c
 	$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(RELEASE)/!" > $@
 $(DEBUG)/%.dep: %.cpp
 	$(CXX) -MM $(CXXFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(DEBUG)/!" > $@
 $(RELEASE)/%.dep: %.cpp
 	$(CXX) -MM $(CXXFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(RELEASE)/!" > $@
 endif
 CC  = gcc
 CXX = g++
 AR  = ar
 LD  = $(CXX) $(CXXFLAGS) $(TARGET_ARCH)
 INCLUDE  = -I. $(patsubst %,-I%,$(VPATH))
 CFLAGS   = -W -Wall $(INCLUDE)
 CXXFLAGS = $(CFLAGS)
 DBGFLAGS = -D_DEBUG -g
 RELFLAGS = -DNDEBUG -O2
 CPPFLAGS =
 ifeq ($(OLDGCC),N)
    CFLAGS += -fmessage-length=0
 endif
 HFILES   = $(wildcard $(patsubst -I%,%/*.h,$(INCLUDE)))
 OBJFILES = $(CFILES:.c=.o) $(CXXFILES:.cpp=.o)
 DEBUG_OBJS   = $(patsubst %.o,$(DEBUG)/%.o,$(OBJFILES))
 RELEASE_OBJS = $(patsubst %.o,$(RELEASE)/%.o,$(OBJFILES))
 DEBUG_DEPS   = $(patsubst %.o,%.dep,$(DEBUG_OBJS))
 RELEASE_DEPS = $(patsubst %.o,%.dep,$(RELEASE_OBJS))
 CFILES   := linebreak.c linebreakdata.c linebreakdef.c wordbreak.c
 CXXFILES :=
 LIBS :=
 TARGET         = liblinebreak.a
 DEBUG_TARGET   = $(patsubst %,$(DEBUG)/%,$(TARGET))
 RELEASE_TARGET = $(patsubst %,$(RELEASE)/%,$(TARGET))
 debug:   $(DEBUG) $(DEBUG_TARGET)
 release: $(RELEASE) $(RELEASE_TARGET)
 $(DEBUG):
 	mkdir $(DEBUG)
 $(RELEASE):
 	mkdir $(RELEASE)
 $(DEBUG_TARGET): $(DEBUG_DEPS) $(DEBUG_OBJS)
 	$(AR) -r $(DEBUG_TARGET) $(DEBUG_OBJS)
 $(RELEASE_TARGET): $(RELEASE_DEPS) $(RELEASE_OBJS)
 	$(AR) -r $(RELEASE_TARGET) $(RELEASE_OBJS)
 doc:
 	doxygen
 linebreakdata: filter_dup$(EXEEXT) LineBreak.txt
 	sed -n -f LineBreak1.sed LineBreak.txt > tmp.txt
 	sed -f LineBreak2.sed tmp.txt | .$(PATHSEP)filter_dup > tmp.c
 	head -2 LineBreak.txt > tmp.txt
 	cat linebreakdata1.tmpl tmp.txt linebreakdata2.tmpl tmp.c linebreakdata3.tmpl > linebreakdata.c
 	$(RM) tmp.txt tmp.c
 wordbreakdata: WordBreakProperty.txt
 	sed -E -n 's/(^[0-9A-F.]+)/\1/p' WordBreakProperty.txt > tmp2.txt
 	sed -E -i.bak 's/^([0-9A-F]+) +/\1..\1/' tmp2.txt
 	./sort_numeric_hex.py tmp2.txt > tmp.txt
 	rm tmp2.txt tmp2.txt.bak
 	sed -E -i.bak -n 's/^([0-9A-F]+)..([0-9A-F]+) *; *([A-Za-z]+).*/'$$'\t''{0x\1, 0x\2, WBP_\3},/p' tmp.txt 
 	echo "/* The content of this file is generated from:" > wordbreakdata.c
 	head -2 WordBreakProperty.txt >> wordbreakdata.c
 	echo "*/" >> wordbreakdata.c
 	cat wordbreakdata1.tmpl tmp.txt wordbreakdata2.tmpl >> wordbreakdata.c
 	rm tmp.txt tmp.txt.bak
 filter_dup$(EXEEXT): filter_dup.c
 	gcc -O2 -o filter_dup$(EXEEXT) $<
 LineBreak.txt:
 	wget http://unicode.org/Public/UNIDATA/LineBreak.txt
 WordBreakProperty.txt:
 	wget http://www.unicode.org/Public/UNIDATA/auxiliary/WordBreakProperty.txt
 .PHONY: all debug release clean distclean doc linebreakdata wordbreakdata
 clean:
 	$(RM) $(DEBUG)/*.o $(DEBUG)/*.dep $(DEBUG_TARGET)
 	$(RM) $(RELEASE)/*.o $(RELEASE)/*.dep $(RELEASE_TARGET)
 distclean: clean
 	$(RM) $(DEBUG)/* $(RELEASE)/* filter_dup$(EXEEXT) tags LineBreak.txt
 	-rmdir $(DEBUG) 2> $(DEVNUL)
 	-rmdir $(RELEASE) 2> $(DEVNUL)
 -include $(wildcard $(DEBUG)/*.dep) $(wildcard $(RELEASE)/*.dep)
--- a/linebreak/linebreak/Makefile.msvc
+++ b/linebreak/linebreak/Makefile.msvc
@ -0,0 +1,189 @@
 # Makefile for Microsoft Visual C++ and NMAKE
 !IF "$(CFG)" == ""
 CFG=libunibreak - Win32 Debug
 !MESSAGE No configuration specified. Defaulting to libunibreak - Win32 Debug.
 !ENDIF 
 !IF "$(CFG)" != "libunibreak - Win32 Release" && "$(CFG)" != "libunibreak - Win32 Debug"
 !MESSAGE Invalid configuration "$(CFG)" specified.
 !MESSAGE You can specify a configuration when running NMAKE
 !MESSAGE by defining the macro CFG on the command line. For example:
 !MESSAGE 
 !MESSAGE NMAKE /f Makefile.msvc CFG="libunibreak - Win32 Debug"
 !MESSAGE 
 !MESSAGE Possible choices for configuration are:
 !MESSAGE 
 !MESSAGE "libunibreak - Win32 Release" (based on "Win32 (x86) Static Library")
 !MESSAGE "libunibreak - Win32 Debug" (based on "Win32 (x86) Static Library")
 !MESSAGE 
 !ERROR An invalid configuration is specified.
 !ENDIF 
 !IF "$(OS)" == "Windows_NT"
 NULL=
 !ELSE 
 NULL=nul
 !ENDIF 
 CPP=cl.exe
 RSC=rc.exe
 !IF  "$(CFG)" == "libunibreak - Win32 Release"
 OUTDIR=.\Release
 INTDIR=.\Release
 # Begin Custom Macros
 OutDir=.\Release
 # End Custom Macros
 ALL : "$(OUTDIR)\unibreak.lib"
 CLEAN :
 	-@erase "$(INTDIR)\linebreak.obj"
 	-@erase "$(INTDIR)\linebreakdata.obj"
 	-@erase "$(INTDIR)\linebreakdef.obj"
 	-@erase "$(INTDIR)\wordbreak.obj"
 	-@erase "$(INTDIR)\vc*.idb"
 	-@erase "$(OUTDIR)\unibreak.lib"
 "$(OUTDIR)" :
    if not exist "$(OUTDIR)/$(NULL)" mkdir "$(OUTDIR)"
 CPP_PROJ=/nologo /ML /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_MBCS" /D "_LIB" /Fo"$(INTDIR)\\" /Fd"$(INTDIR)\\" /FD /c 
 BSC32=bscmake.exe
 BSC32_FLAGS=/nologo /o"$(OUTDIR)\unibreak.bsc" 
 BSC32_SBRS=
 LIB32=link.exe -lib
 LIB32_FLAGS=/nologo /out:"$(OUTDIR)\unibreak.lib" 
 LIB32_OBJS= \
 	"$(INTDIR)\linebreak.obj" \
 	"$(INTDIR)\linebreakdata.obj" \
 	"$(INTDIR)\linebreakdef.obj" \
 	"$(INTDIR)\wordbreak.obj"
 "$(OUTDIR)\unibreak.lib" : "$(OUTDIR)" $(DEF_FILE) $(LIB32_OBJS)
    $(LIB32) @<<
  $(LIB32_FLAGS) $(DEF_FLAGS) $(LIB32_OBJS)
 <<
 !ELSEIF  "$(CFG)" == "libunibreak - Win32 Debug"
 OUTDIR=.\Debug
 INTDIR=.\Debug
 # Begin Custom Macros
 OutDir=.\Debug
 # End Custom Macros
 ALL : "$(OUTDIR)\unibreak.lib"
 CLEAN :
 	-@erase "$(INTDIR)\linebreak.obj"
 	-@erase "$(INTDIR)\linebreakdata.obj"
 	-@erase "$(INTDIR)\linebreakdef.obj"
 	-@erase "$(INTDIR)\wordbreak.obj"
 	-@erase "$(INTDIR)\vc*.idb"
 	-@erase "$(INTDIR)\vc*.pdb"
 	-@erase "$(OUTDIR)\unibreak.lib"
 "$(OUTDIR)" :
    if not exist "$(OUTDIR)/$(NULL)" mkdir "$(OUTDIR)"
 CPP_PROJ=/nologo /MLd /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_MBCS" /D "_LIB" /Fo"$(INTDIR)\\" /Fd"$(INTDIR)\\" /FD /GZ  /c 
 BSC32=bscmake.exe
 BSC32_FLAGS=/nologo /o"$(OUTDIR)\unibreak.bsc" 
 BSC32_SBRS=
 LIB32=link.exe -lib
 LIB32_FLAGS=/nologo /out:"$(OUTDIR)\unibreak.lib" 
 LIB32_OBJS= \
 	"$(INTDIR)\linebreak.obj" \
 	"$(INTDIR)\linebreakdata.obj" \
 	"$(INTDIR)\linebreakdef.obj" \
 	"$(INTDIR)\wordbreak.obj"
 "$(OUTDIR)\unibreak.lib" : "$(OUTDIR)" $(DEF_FILE) $(LIB32_OBJS)
    $(LIB32) @<<
  $(LIB32_FLAGS) $(DEF_FLAGS) $(LIB32_OBJS)
 <<
 !ENDIF 
 .c{$(INTDIR)}.obj::
   $(CPP) @<<
   $(CPP_PROJ) $< 
 <<
 .cpp{$(INTDIR)}.obj::
   $(CPP) @<<
   $(CPP_PROJ) $< 
 <<
 .cxx{$(INTDIR)}.obj::
   $(CPP) @<<
   $(CPP_PROJ) $< 
 <<
 .c{$(INTDIR)}.sbr::
   $(CPP) @<<
   $(CPP_PROJ) $< 
 <<
 .cpp{$(INTDIR)}.sbr::
   $(CPP) @<<
   $(CPP_PROJ) $< 
 <<
 .cxx{$(INTDIR)}.sbr::
   $(CPP) @<<
   $(CPP_PROJ) $< 
 <<
 .\linebreak.c : \
 	".\linebreak.h"\
 	".\linebreakdef.h"
 .\linebreakdata.c : \
 	".\linebreak.h"\
 	".\linebreakdef.h"
 .\linebreakdef.c : \
 	".\linebreak.h"\
 	".\linebreakdef.h"
 .\wordbreak.c : \
 	".\linebreak.h"\
 	".\linebreakdef.h"\
 	".\wordbreak.h"\
 	".\wordbreakdef.h"\
 	".\wordbreakdata.c"
 !IF "$(CFG)" == "libunibreak - Win32 Release" || "$(CFG)" == "libunibreak - Win32 Debug"
 SOURCE=.\linebreak.c
 "$(INTDIR)\linebreak.obj" : $(SOURCE) "$(INTDIR)"
 SOURCE=.\linebreakdata.c
 "$(INTDIR)\linebreakdata.obj" : $(SOURCE) "$(INTDIR)"
 SOURCE=.\linebreakdef.c
 "$(INTDIR)\linebreakdef.obj" : $(SOURCE) "$(INTDIR)"
 SOURCE=.\wordbreak.c
 "$(INTDIR)\wordbreak.obj" : $(SOURCE) "$(INTDIR)"
 !ENDIF 
--- a/linebreak/linebreak/NEWS
+++ b/linebreak/linebreak/NEWS
@ -0,0 +1,49 @@
 New in libunibreak 1.0
 - Add word breaking support
 - Change the library name to "libunibreak", while keeping maximum compatibility
 - Add pkg-config support
 New in liblinebreak 2.1
 - Update the data according to LineBreak-6.0.0.txt
 - Fix the bug that an assertion in code can fail if U+FFFC is
  encountered at the beginning of a line
 New in liblinebreak 2.0
 - Update the algorithm and data according to UAX #14-24 and
  LineBreak-5.2.0.txt
 - Rename some functions to reduce namespace pollution
 - Make Doxygen documentation better
 New in liblinebreak 1.2
 - Fix the bug that an assertion in code can fail if an invalid UTF-8 or
  UTF-16 sequence is encountered near the end of input
 - Remove the specialization of right single quotation mark as closing
  punctuation mark in English, French, and Spanish, because it can be
  used as apostrophe
 - Make Doxygen documentation better
 New in liblinebreak 1.1
 - Make get_lb_prop_lang static and not an exported symbol
 - Define is_line_breakable to alias to is_breakable
 - Declare get_next_char_utf* will be changed to lb_get_next_char_utf*
 - Move the declarations of get_next_char_utf* from linebreak.h to
  linebreakdef.h
 - Add the function documentation comments to the header files
 New in liblinebreak 1.0
 - Update the line breaking data according to UAX #14-22 and
  LineBreak-5.1.0.txt
 - Add autoconfiscation support (./configure, make, make install)
 - Add Makefile for MSVC
 First public release (0.9.6, or 20080421)
 - Implement line breaking algorithm according to UAX #14-19
 - Line breaking data is generated from LineBreak-5.0.0.txt
 - Makefile only supports GCC
--- a/linebreak/linebreak/README
+++ b/linebreak/linebreak/README
@ -0,0 +1,88 @@
                         L I B U N I B R E A K
                         =====================
 Overview
 --------
 This is the README file for libunibreak, an implementation of the line
 breaking and word breaking algorithms as described in Unicode
 Standard Annex 14 and Unicode Standard Annex 29, available at
         <URL:http://www.unicode.org/reports/tr14/tr14-26.html>
         <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
 Check this URL for up-to-date information:
         <URL:http://vimgadgets.sourceforge.net/libunibreak/>
 Licence
 -------
 This library is released under an open-source licence, the zlib/libpng
 licence.  Please check the file LICENCE for details.
 Apart from using the algorithm, part of the code is derived from the
 data provided under
                  <URL:http://www.unicode.org/Public/>
 And the Unicode Terms of Use may apply:
              <URL:http://www.unicode.org/copyright.html>
 Installation
 ------------
 There are three ways to build the library:
 1) On *NIX systems supported by the autoconfiscation tools, do the
   normal
     ./configure
     make
     sudo make install
   to build and install both the dynamic and static libraries.  In
   addition, one may
   - type `make doc' to generate the doxygen documentation;
   - type `make linebreakdata' to regenerate linebreakdata.c from
     LineBreak.txt; or
   - type `make wordbreakdata' to regenerate wordbreakdata.c from
     WordBreakProperty.txt.
 2) On systems where GCC and Binutils are supported, one can type
     cp -p Makefile.gcc Makefile
     make
   to build the static library.  In addition, one may
   - type `make debug' or `make release' to explicitly generate the
     debug or release build;
   - type `make doc' to generate the doxygen documentation;
   - type `make linebreakdata' to regenerate linebreakdata.c from
     LineBreak.txt; or
   - type `make wordbreakdata' to regenerate wordbreakdata.c from
     WordBreakProperty.txt.
 3) On Windows, apart from using method 1 (Cygwin/MSYS) and method 2
   (MinGW), MSVC can also be used.  Type
     nmake -f Makefile.msvc
   to build the static library.  By default the debug version is
   built.  To build the release version, type
     nmake -f Makefile.msvc CFG="libunibreak - Win32 Release"
 Documentation
 -------------
 Check the generated documents doc/html/linebreak_8h.html and
 doc/html/wordbreak_8h.html in the downloaded package for the public
 interfaces exposed to applications.
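The break arrays filled in by set_linebreaks_utf8 and its siblings hold
one value per code unit of the input.  As a rough illustration of how an
application might consume such an array, the sketch below picks line
ends for a fixed width measured in bytes.  It is not part of
libunibreak: next_line_end and sample_brks are hypothetical names, the
four constants are copied from linebreak.h only so that the sketch
compiles on its own, and a real renderer would measure glyph widths
instead of counting bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from linebreak.h so that this sketch is self-contained;
 * a real program would include linebreak.h instead. */
#define LINEBREAK_MUSTBREAK   0
#define LINEBREAK_ALLOWBREAK  1
#define LINEBREAK_NOBREAK     2
#define LINEBREAK_INSIDEACHAR 3

/* Hand-written break data for the 8-byte text "ab cd ef": a break is
 * allowed after each space and is mandatory after the last byte. */
static const char sample_brks[8] = {
    LINEBREAK_NOBREAK, LINEBREAK_NOBREAK, LINEBREAK_ALLOWBREAK,
    LINEBREAK_NOBREAK, LINEBREAK_NOBREAK, LINEBREAK_ALLOWBREAK,
    LINEBREAK_NOBREAK, LINEBREAK_MUSTBREAK
};

/* Hypothetical helper: returns the index one past the last byte of the
 * line that begins at `start`.  The line ends at the first mandatory
 * break, or at the last allowed break seen before the line would grow
 * beyond `width` bytes.  If no break has been seen yet, the line is
 * allowed to overflow rather than being cut inside a word. */
size_t next_line_end(const char *brks, size_t len, size_t start,
                     size_t width)
{
    size_t i;
    size_t last_allowed = 0;    /* 0 means no allowed break seen yet */

    assert(start < len && width > 0);
    for (i = start; i < len; ++i)
    {
        /* i - start bytes are already on the line */
        if (i - start >= width && last_allowed != 0)
            return last_allowed;
        if (brks[i] == LINEBREAK_MUSTBREAK)
            return i + 1;
        if (brks[i] == LINEBREAK_ALLOWBREAK)
            last_allowed = i + 1;
    }
    return len;
}
```

With sample_brks and a width of 4, successive calls starting at 0, 3,
and 6 return 3, 6, and 8, which splits the text into "ab ", "cd ", and
"ef".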
 $Id: README,v 1.8 2012/08/11 06:55:18 adah Exp $
 vim:autoindent:expandtab:formatoptions=tcqlmn:textwidth=72:
--- a/linebreak/linebreak/bootstrap
+++ b/linebreak/linebreak/bootstrap
@ -0,0 +1,6 @@
 #! /bin/sh
 aclocal && \
 autoheader && \
 autoconf && \
 libtoolize && \
 automake --add-missing
--- a/linebreak/linebreak/configure.ac
+++ b/linebreak/linebreak/configure.ac
@ -0,0 +1,12 @@
 AC_PREREQ(2.57)
 AC_INIT([libunibreak],[1.0],[wuyongwei@gmail.com])
 AC_CONFIG_SRCDIR([linebreak.c])
 AC_CONFIG_HEADERS([config.h])
 AM_INIT_AUTOMAKE([foreign])
 AC_PROG_CC
 AC_PROG_LN_S
 AC_EXEEXT
 AM_PROG_LIBTOOL
 AC_CONFIG_FILES([Makefile])
 AC_OUTPUT([libunibreak.pc])
--- a/linebreak/linebreak/filter_dup.c
+++ b/linebreak/linebreak/filter_dup.c
@ -0,0 +1,47 @@
 #include <stdio.h>
 #include <string.h>
 int main()
 {
 	char s[80];
 	char beg[16];
 	char end[16];
 	char prop[16];
 	char lastbeg[16];
 	char lastend[16];
 	char lastprop[16];
 	lastprop[0] = 0;
 	for (;;)
 	{
 		if (fgets(s, sizeof s, stdin) == NULL)
 			break;
 		if (strstr(s, "LBP_") == NULL || strstr(s, "LBP_Undef") != NULL)
 		{
 			if (lastprop[0])
 			{
 				printf("\t{ %s %s %s },\n", lastbeg, lastend, lastprop);
 				lastprop[0] = 0;
 			}
 			printf("%s", s);
 			continue;
 		}
 		/* Field widths keep each token within its 16-byte buffer */
 		sscanf(s, "\t{ %15s %15s %15s }", beg, end, prop);
 		/*printf("==>\t{ \"%s\" \"%s\" \"%s\" },\n", beg, end, prop);*/
 		if (lastprop[0] && strcmp(lastprop, prop) != 0)
 		{
 			printf("\t{ %s %s %s },\n", lastbeg, lastend, lastprop);
 			lastprop[0] = 0;
 		}
 		if (lastprop[0] == 0)
 		{
 			strcpy(lastbeg, beg);
 			strcpy(lastprop, prop);
 		}
 		strcpy(lastend, end);
 	}
 	if (lastprop[0])
 	{
 		printf("\t{ %s %s %s },\n", lastbeg, lastend, lastprop);
 	}
 	return 0;
 }
--- a/linebreak/linebreak/libunibreak.pc.in
+++ b/linebreak/linebreak/libunibreak.pc.in
@ -0,0 +1,11 @@
 libunibreak:
 prefix=@prefix@
 exec_prefix=@exec_prefix@
 libdir=@libdir@
 includedir=@includedir@
 Name: libunibreak
 Description: Library to implement Unicode algorithms for line and word breaking
 Version: @VERSION@
 Libs: -L${libdir} -lunibreak
 Cflags: -I${includedir}
--- a/linebreak/linebreak/linebreak.c
+++ b/linebreak/linebreak/linebreak.c
@ -0,0 +1,737 @@
 /* vim: set tabstop=4 shiftwidth=4: */
 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
 * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute
 * it freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must
 *    not claim that you wrote the original software.  If you use this
 *    software in a product, an acknowledgement in the product
 *    documentation would be appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must
 *    not be misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source
 *    distribution.
 *
 * The main reference is Unicode Standard Annex 14 (UAX #14):
 *		<URL:http://www.unicode.org/reports/tr14/>
 *
 * When this library was designed, this annex was at Revision 19, for
 * Unicode 5.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
 *
 * This library has been updated according to Revision 26, for
 * Unicode 6.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
 *
 * The Unicode Terms of Use are available at
 *		<URL:http://www.unicode.org/copyright.html>
 */
 /**
 * @file	linebreak.c
 *
 * Implementation of the line breaking algorithm as described in Unicode
 * Standard Annex 14.
 *
 * @version	2.1, 2011/05/07
 * @author	Wu Yongwei
 */
 #include <assert.h>
 #include <stddef.h>
 #include <string.h>
 #include "linebreak.h"
 #include "linebreakdef.h"
 /**
 * Size of the second-level index to the line breaking properties.
 */
 #define LINEBREAK_INDEX_SIZE 40
 /**
 * Version number of the library.
 */
 const int linebreak_version = LINEBREAK_VERSION;
 /**
 * Enumeration of break actions.  They are used in the break action
 * pair table below.
 */
 enum BreakAction
 {
 	DIR_BRK,		/**< Direct break opportunity */
 	IND_BRK,		/**< Indirect break opportunity */
 	CMI_BRK,		/**< Indirect break opportunity for combining marks */
 	CMP_BRK,		/**< Prohibited break for combining marks */
 	PRH_BRK			/**< Prohibited break */
 };
 /**
 * Break action pair table.  This is a direct mapping of Table 2 of
 * Unicode Standard Annex 14, Revision 24.
 */
 static enum BreakAction baTable[LBP_JT][LBP_JT] = {
 	{	/* OP */
 		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, CMP_BRK,
 		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK },
 	{	/* CL */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, PRH_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* CP */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, PRH_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* QU */
 		PRH_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
 		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
 	{	/* GL */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
 		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
 	{	/* NS */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* EX */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* SY */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* IS */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* PR */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, IND_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
 	{	/* PO */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* NU */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* AL */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* ID */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* IN */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* HY */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, DIR_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* BA */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, DIR_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* BB */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
 		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
 	{	/* B2 */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, PRH_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* ZW */
 		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, PRH_BRK, DIR_BRK,
 		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* CM */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
 	{	/* WJ */
 		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
 		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
 	{	/* H2 */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK },
 	{	/* H3 */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK },
 	{	/* JL */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, DIR_BRK },
 	{	/* JV */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK },
 	{	/* JT */
 		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
 		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
 		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
 		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK }
 };
 /**
 * Struct for the second-level index to the line breaking properties.
 */
 struct LineBreakPropertiesIndex
 {
 	utf32_t end;					/**< End code point */
 	struct LineBreakProperties *lbp;/**< Pointer to line breaking properties */
 };
 /**
 * Second-level index to the line breaking properties.
 */
 static struct LineBreakPropertiesIndex lb_prop_index[LINEBREAK_INDEX_SIZE] =
 {
 	{ 0xFFFFFFFF, lb_prop_default }
 };
 /**
 * Initializes the second-level index to the line breaking properties.
 * If it is not called, the performance of #get_char_lb_class_lang (and
 * thus the main functionality) can be pretty bad, especially for big
 * code points like those of Chinese.
 */
 void init_linebreak(void)
 {
 	size_t i;
 	size_t iPropDefault;
 	size_t len;
 	size_t step;
 	len = 0;
 	while (lb_prop_default[len].prop != LBP_Undefined)
 		++len;
 	step = len / LINEBREAK_INDEX_SIZE;
 	iPropDefault = 0;
 	for (i = 0; i < LINEBREAK_INDEX_SIZE; ++i)
 	{
 		lb_prop_index[i].lbp = lb_prop_default + iPropDefault;
 		iPropDefault += step;
 		lb_prop_index[i].end = lb_prop_default[iPropDefault].start - 1;
 	}
 	lb_prop_index[--i].end = 0xFFFFFFFF;
 }
 /**
 * Gets the language-specific line breaking properties.
 *
 * @param lang	language of the text
 * @return		pointer to the language-specific line breaking
 *				properties array if found; \c NULL otherwise
 */
 static struct LineBreakProperties *get_lb_prop_lang(const char *lang)
 {
 	struct LineBreakPropertiesLang *lbplIter;
 	if (lang != NULL)
 	{
 		for (lbplIter = lb_prop_lang_map; lbplIter->lang != NULL; ++lbplIter)
 		{
 			if (strncmp(lang, lbplIter->lang, lbplIter->namelen) == 0)
 			{
 				return lbplIter->lbp;
 			}
 		}
 	}
 	return NULL;
 }
 /**
 * Gets the line breaking class of a character from a line breaking
 * properties array.
 *
 * @param ch	character to check
 * @param lbp	pointer to the line breaking properties array
 * @return		the line breaking class if found; \c LBP_XX otherwise
 */
 static enum LineBreakClass get_char_lb_class(
 		utf32_t ch,
 		struct LineBreakProperties *lbp)
 {
 	while (lbp->prop != LBP_Undefined && ch >= lbp->start)
 	{
 		if (ch <= lbp->end)
 			return lbp->prop;
 		++lbp;
 	}
 	return LBP_XX;
 }
 /**
 * Gets the line breaking class of a character from the default line
 * breaking properties array.
 *
 * @param ch	character to check
 * @return		the line breaking class if found; \c LBP_XX otherwise
 */
 static enum LineBreakClass get_char_lb_class_default(
 		utf32_t ch)
 {
 	size_t i = 0;
 	while (ch > lb_prop_index[i].end)
 		++i;
 	assert(i < LINEBREAK_INDEX_SIZE);
 	return get_char_lb_class(ch, lb_prop_index[i].lbp);
 }
 /**
 * Gets the line breaking class of a character for a specific
 * language.  This function will check the language-specific data first,
 * and then the default data if there is no language-specific property
 * available for the character.
 *
 * @param ch		character to check
 * @param lbpLang	pointer to the language-specific line breaking
 *					properties array
 * @return			the line breaking class if found; \c LBP_XX
 *					otherwise
 */
 static enum LineBreakClass get_char_lb_class_lang(
 		utf32_t ch,
 		struct LineBreakProperties *lbpLang)
 {
 	enum LineBreakClass lbcResult;
 	/* Find the language-specific line breaking class for a character */
 	if (lbpLang)
 	{
 		lbcResult = get_char_lb_class(ch, lbpLang);
 		if (lbcResult != LBP_XX)
 			return lbcResult;
 	}
 	/* Find the generic (language-independent) line breaking class, if
 	 * no language context is provided, or language-specific data are
 	 * not available for the specific character in the specified
 	 * language */
 	return get_char_lb_class_default(ch);
 }
 /**
 * Resolves the line breaking class for certain ambiguous or complicated
 * characters.  They are treated in a simplistic way in this
 * implementation.
 *
 * @param lbc	line breaking class to resolve
 * @param lang	language of the text
 * @return		the resolved line breaking class
 */
 static enum LineBreakClass resolve_lb_class(
 		enum LineBreakClass lbc,
 		const char *lang)
 {
 	switch (lbc)
 	{
 	case LBP_AI:
 		if (lang != NULL &&
 				(strncmp(lang, "zh", 2) == 0 ||	/* Chinese */
 				 strncmp(lang, "ja", 2) == 0 ||	/* Japanese */
 				 strncmp(lang, "ko", 2) == 0))	/* Korean */
 		{
 			return LBP_ID;
 		}
 		/* Fall through */
 	case LBP_SA:
 	case LBP_SG:
 	case LBP_XX:
 		return LBP_AL;
 	default:
 		return lbc;
 	}
 }
 /**
 * Gets the next Unicode character in a UTF-8 sequence.  The index will
 * be advanced to the next complete character, unless the end of string
 * is reached in the middle of a UTF-8 sequence.
 *
 * @param[in]     s		input UTF-8 string
 * @param[in]     len	length of the string in bytes
 * @param[in,out] ip	pointer to the index
 * @return				the Unicode character beginning at the index; or
 *						#EOS if end of input is encountered
 */
 utf32_t lb_get_next_char_utf8(
 		const utf8_t *s,
 		size_t len,
 		size_t *ip)
 {
 	utf8_t ch;
 	utf32_t res;
 	assert(*ip <= len);
 	if (*ip == len)
 		return EOS;
 	ch = s[*ip];
 	if (ch < 0xC2 || ch > 0xF4)
 	{	/* One-byte sequence, tail (should not occur), or invalid */
 		*ip += 1;
 		return ch;
 	}
 	else if (ch < 0xE0)
 	{	/* Two-byte sequence */
 		if (*ip + 2 > len)
 			return EOS;
 		res = ((ch & 0x1F) << 6) + (s[*ip + 1] & 0x3F);
 		*ip += 2;
 		return res;
 	}
 	else if (ch < 0xF0)
 	{	/* Three-byte sequence */
 		if (*ip + 3 > len)
 			return EOS;
 		res = ((ch & 0x0F) << 12) +
 			  ((s[*ip + 1] & 0x3F) << 6) +
 			  ((s[*ip + 2] & 0x3F));
 		*ip += 3;
 		return res;
 	}
 	else
 	{	/* Four-byte sequence */
 		if (*ip + 4 > len)
 			return EOS;
 		res = ((ch & 0x07) << 18) +
 			  ((s[*ip + 1] & 0x3F) << 12) +
 			  ((s[*ip + 2] & 0x3F) << 6) +
 			  ((s[*ip + 3] & 0x3F));
 		*ip += 4;
 		return res;
 	}
 }
 /**
 * Gets the next Unicode character in a UTF-16 sequence.  The index will
 * be advanced to the next complete character, unless the end of string
 * is reached in the middle of a UTF-16 surrogate pair.
 *
 * @param[in]     s		input UTF-16 string
 * @param[in]     len	length of the string in words
 * @param[in,out] ip	pointer to the index
 * @return				the Unicode character beginning at the index; or
 *						#EOS if end of input is encountered
 */
 utf32_t lb_get_next_char_utf16(
 		const utf16_t *s,
 		size_t len,
 		size_t *ip)
 {
 	utf16_t ch;
 	assert(*ip <= len);
 	if (*ip == len)
 		return EOS;
 	ch = s[(*ip)++];
 	if (ch < 0xD800 || ch > 0xDBFF)
 	{	/* If the character is not a high surrogate */
 		return ch;
 	}
 	if (*ip == len)
 	{	/* If the input ends here (an error) */
 		--(*ip);
 		return EOS;
 	}
 	if (s[*ip] < 0xDC00 || s[*ip] > 0xDFFF)
 	{	/* If the next character is not the low surrogate (an error) */
 		return ch;
 	}
 	/* Return the constructed character and advance the index again */
 	return (((utf32_t)ch & 0x3FF) << 10) + (s[(*ip)++] & 0x3FF) + 0x10000;
 }
 /**
 * Gets the next Unicode character in a UTF-32 sequence.  The index will
 * be advanced to the next character.
 *
 * @param[in]     s		input UTF-32 string
 * @param[in]     len	length of the string in dwords
 * @param[in,out] ip	pointer to the index
 * @return				the Unicode character beginning at the index; or
 *						#EOS if end of input is encountered
 */
 utf32_t lb_get_next_char_utf32(
 		const utf32_t *s,
 		size_t len,
 		size_t *ip)
 {
 	assert(*ip <= len);
 	if (*ip == len)
 		return EOS;
 	return s[(*ip)++];
 }
 /**
 * Sets the line breaking information for a generic input string.
 *
 * @param[in]  s			input string
 * @param[in]  len			length of the input
 * @param[in]  lang			language of the input
 * @param[out] brks			pointer to the output breaking data,
 *							containing #LINEBREAK_MUSTBREAK,
 *							#LINEBREAK_ALLOWBREAK, #LINEBREAK_NOBREAK,
 *							or #LINEBREAK_INSIDEACHAR
 * @param[in] get_next_char	function to get the next UTF-32 character
 */
 void set_linebreaks(
 		const void *s,
 		size_t len,
 		const char *lang,
 		char *brks,
 		get_next_char_t get_next_char)
 {
 	utf32_t ch;
 	enum LineBreakClass lbcCur;
 	enum LineBreakClass lbcNew;
 	enum LineBreakClass lbcLast;
 	struct LineBreakProperties *lbpLang;
 	size_t posCur = 0;
 	size_t posLast = 0;
 	--posLast;	/* To be ++'d later */
 	ch = get_next_char(s, len, &posCur);
 	if (ch == EOS)
 		return;
 	lbpLang = get_lb_prop_lang(lang);
 	lbcCur = resolve_lb_class(get_char_lb_class_lang(ch, lbpLang), lang);
 	lbcNew = LBP_Undefined;
 nextline:
 	/* Special treatment for the first character */
 	switch (lbcCur)
 	{
 	case LBP_LF:
 	case LBP_NL:
 		lbcCur = LBP_BK;
 		break;
 	case LBP_CB:
 		lbcCur = LBP_BA;
 		break;
 	case LBP_SP:
 		lbcCur = LBP_WJ;
 		break;
 	default:
 		break;
 	}
 	/* Process a line till an explicit break or end of string */
 	for (;;)
 	{
 		for (++posLast; posLast < posCur - 1; ++posLast)
 		{
 			brks[posLast] = LINEBREAK_INSIDEACHAR;
 		}
 		assert(posLast == posCur - 1);
 		lbcLast = lbcNew;
 		ch = get_next_char(s, len, &posCur);
 		if (ch == EOS)
 			break;
 		lbcNew = get_char_lb_class_lang(ch, lbpLang);
 		if (lbcCur == LBP_BK || (lbcCur == LBP_CR && lbcNew != LBP_LF))
 		{
 			brks[posLast] = LINEBREAK_MUSTBREAK;
 			lbcCur = resolve_lb_class(lbcNew, lang);
 			goto nextline;
 		}
 		switch (lbcNew)
 		{
 		case LBP_SP:
 			brks[posLast] = LINEBREAK_NOBREAK;
 			continue;
 		case LBP_BK:
 		case LBP_LF:
 		case LBP_NL:
 			brks[posLast] = LINEBREAK_NOBREAK;
 			lbcCur = LBP_BK;
 			continue;
 		case LBP_CR:
 			brks[posLast] = LINEBREAK_NOBREAK;
 			lbcCur = LBP_CR;
 			continue;
 		case LBP_CB:
 			brks[posLast] = LINEBREAK_ALLOWBREAK;
 			lbcCur = LBP_BA;
 			continue;
 		default:
 			break;
 		}
 		lbcNew = resolve_lb_class(lbcNew, lang);
 		assert(lbcCur <= LBP_JT);
 		assert(lbcNew <= LBP_JT);
 		switch (baTable[lbcCur - 1][lbcNew - 1])
 		{
 		case DIR_BRK:
 			brks[posLast] = LINEBREAK_ALLOWBREAK;
 			break;
 		case CMI_BRK:
 		case IND_BRK:
 			if (lbcLast == LBP_SP)
 			{
 				brks[posLast] = LINEBREAK_ALLOWBREAK;
 			}
 			else
 			{
 				brks[posLast] = LINEBREAK_NOBREAK;
 			}
 			break;
 		case CMP_BRK:
 			brks[posLast] = LINEBREAK_NOBREAK;
 			if (lbcLast != LBP_SP)
 				continue;
 			break;
 		case PRH_BRK:
 			brks[posLast] = LINEBREAK_NOBREAK;
 			break;
 		}
 		lbcCur = lbcNew;
 	}
 	assert(posLast == posCur - 1 && posCur <= len);
 	/* Break after the last character */
 	brks[posLast] = LINEBREAK_MUSTBREAK;
 	/* When the input contains incomplete sequences */
 	while (posCur < len)
 	{
 		brks[posCur++] = LINEBREAK_INSIDEACHAR;
 	}
 }
 /**
 * Sets the line breaking information for a UTF-8 input string.
 *
 * @param[in]  s	input UTF-8 string
 * @param[in]  len	length of the input
 * @param[in]  lang	language of the input
 * @param[out] brks	pointer to the output breaking data, containing
 *					#LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
 *					#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
 */
 void set_linebreaks_utf8(
 		const utf8_t *s,
 		size_t len,
 		const char *lang,
 		char *brks)
 {
 	set_linebreaks(s, len, lang, brks,
 				   (get_next_char_t)lb_get_next_char_utf8);
 }
 /**
 * Sets the line breaking information for a UTF-16 input string.
 *
 * @param[in]  s	input UTF-16 string
 * @param[in]  len	length of the input
 * @param[in]  lang	language of the input
 * @param[out] brks	pointer to the output breaking data, containing
 *					#LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
 *					#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
 */
 void set_linebreaks_utf16(
 		const utf16_t *s,
 		size_t len,
 		const char *lang,
 		char *brks)
 {
 	set_linebreaks(s, len, lang, brks,
 				   (get_next_char_t)lb_get_next_char_utf16);
 }
 /**
 * Sets the line breaking information for a UTF-32 input string.
 *
 * @param[in]  s	input UTF-32 string
 * @param[in]  len	length of the input
 * @param[in]  lang	language of the input
 * @param[out] brks	pointer to the output breaking data, containing
 *					#LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
 *					#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
 */
 void set_linebreaks_utf32(
 		const utf32_t *s,
 		size_t len,
 		const char *lang,
 		char *brks)
 {
 	set_linebreaks(s, len, lang, brks,
 				   (get_next_char_t)lb_get_next_char_utf32);
 }
 /**
 * Tells whether a line break can occur between two Unicode characters.
 * This is a wrapper function to expose a simple interface.  Generally
 * speaking, it is better to use #set_linebreaks_utf32 instead, since
 * this function sees only two characters at a time and therefore
 * cannot correctly process complicated cases involving combining
 * marks, spaces, etc.
 *
 * @param char1 the first Unicode character
 * @param char2 the second Unicode character
 * @param lang  language of the input
 * @return      one of #LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
 *				#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
 */
 int is_line_breakable(
 		utf32_t char1,
 		utf32_t char2,
 		const char* lang)
 {
 	utf32_t s[2];
 	char brks[2];
 	s[0] = char1;
 	s[1] = char2;
 	set_linebreaks_utf32(s, 2, lang, brks);
 	return brks[0];
 }
--- a/linebreak/linebreak/linebreak.h
+++ b/linebreak/linebreak/linebreak.h
@ -0,0 +1,87 @@
 /* vim: set tabstop=4 shiftwidth=4: */
 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
 * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute
 * it freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must
 *    not claim that you wrote the original software.  If you use this
 *    software in a product, an acknowledgement in the product
 *    documentation would be appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must
 *    not be misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source
 *    distribution.
 *
 * The main reference is Unicode Standard Annex 14 (UAX #14):
 *		<URL:http://www.unicode.org/reports/tr14/>
 *
 * When this library was designed, this annex was at Revision 19, for
 * Unicode 5.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
 *
 * This library has been updated according to Revision 26, for
 * Unicode 6.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
 *
 * The Unicode Terms of Use are available at
 *		<URL:http://www.unicode.org/copyright.html>
 */
 /**
 * @file	linebreak.h
 *
 * Header file for the line breaking algorithm.
 *
 * @version	2.1, 2011/05/07
 * @author	Wu Yongwei
 */
 #ifndef LINEBREAK_H
 #define LINEBREAK_H
 #include <stddef.h>
 #ifdef __cplusplus
 extern "C" {
 #endif
 #define LINEBREAK_VERSION	0x0201	/**< Version of the library linebreak */
 extern const int linebreak_version;
 #ifndef LINEBREAK_UTF_TYPES_DEFINED
 #define LINEBREAK_UTF_TYPES_DEFINED
 typedef unsigned char	utf8_t;		/**< Type for UTF-8 data points */
 typedef unsigned short	utf16_t;	/**< Type for UTF-16 data points */
 typedef unsigned int	utf32_t;	/**< Type for UTF-32 data points */
 #endif
 #define LINEBREAK_MUSTBREAK		0	/**< Break is mandatory */
 #define LINEBREAK_ALLOWBREAK	1	/**< Break is allowed */
 #define LINEBREAK_NOBREAK		2	/**< No break is possible */
 #define LINEBREAK_INSIDEACHAR	3	/**< A UTF-8/16 sequence is unfinished */
 void init_linebreak(void);
 void set_linebreaks_utf8(
 		const utf8_t *s, size_t len, const char* lang, char *brks);
 void set_linebreaks_utf16(
 		const utf16_t *s, size_t len, const char* lang, char *brks);
 void set_linebreaks_utf32(
 		const utf32_t *s, size_t len, const char* lang, char *brks);
 int is_line_breakable(utf32_t char1, utf32_t char2, const char* lang);
 #ifdef __cplusplus
 }
 #endif
 #endif /* LINEBREAK_H */
--- a/linebreak/linebreak/linebreakdata.c
+++ b/linebreak/linebreak/linebreakdata.c
--- a/linebreak/linebreak/linebreakdata1.tmpl
+++ b/linebreak/linebreak/linebreakdata1.tmpl
@ -0,0 +1 @@
 /* The content of this file is generated from:
--- a/linebreak/linebreak/linebreakdata2.tmpl
+++ b/linebreak/linebreak/linebreakdata2.tmpl
@ -0,0 +1,7 @@
 */
 #include "linebreak.h"
 #include "linebreakdef.h"
 /** Default line breaking properties as from the Unicode Web site. */
 struct LineBreakProperties lb_prop_default[] = {
--- a/linebreak/linebreak/linebreakdata3.tmpl
+++ b/linebreak/linebreak/linebreakdata3.tmpl
@ -0,0 +1,2 @@
 	{ 0xFFFFFFFF, 0xFFFFFFFF, LBP_Undefined }
 };
--- a/linebreak/linebreak/linebreakdef.c
+++ b/linebreak/linebreak/linebreakdef.c
@ -0,0 +1,139 @@
 /* vim: set tabstop=4 shiftwidth=4: */
 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
 * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute
 * it freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must
 *    not claim that you wrote the original software.  If you use this
 *    software in a product, an acknowledgement in the product
 *    documentation would be appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must
 *    not be misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source
 *    distribution.
 *
 * The main reference is Unicode Standard Annex 14 (UAX #14):
 *		<URL:http://www.unicode.org/reports/tr14/>
 *
 * When this library was designed, this annex was at Revision 19, for
 * Unicode 5.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
 *
 * This library has been updated according to Revision 26, for
 * Unicode 6.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
 *
 * The Unicode Terms of Use are available at
 *		<URL:http://www.unicode.org/copyright.html>
 */
 /**
 * @file	linebreakdef.c
 *
 * Definition of language-specific data.
 *
 * @version	2.1, 2011/05/07
 * @author	Wu Yongwei
 */
 #include "linebreak.h"
 #include "linebreakdef.h"
 /**
 * English-specific data over the default Unicode rules.
 */
 static struct LineBreakProperties lb_prop_English[] = {
 	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
 	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
 	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
 	{ 0, 0, LBP_Undefined }
 };
 /**
 * German-specific data over the default Unicode rules.
 */
 static struct LineBreakProperties lb_prop_German[] = {
 	{ 0x00AB, 0x00AB, LBP_CL },	/* Left double angle quotation mark: closing */
 	{ 0x00BB, 0x00BB, LBP_OP },	/* Right double angle quotation mark: opening */
 	{ 0x2018, 0x2018, LBP_CL },	/* Left single quotation mark: closing */
 	{ 0x201C, 0x201C, LBP_CL },	/* Left double quotation mark: closing */
 	{ 0x2039, 0x2039, LBP_CL },	/* Left single angle quotation mark: closing */
 	{ 0x203A, 0x203A, LBP_OP },	/* Right single angle quotation mark: opening */
 	{ 0, 0, LBP_Undefined }
 };
 /**
 * Spanish-specific data over the default Unicode rules.
 */
 static struct LineBreakProperties lb_prop_Spanish[] = {
 	{ 0x00AB, 0x00AB, LBP_OP },	/* Left double angle quotation mark: opening */
 	{ 0x00BB, 0x00BB, LBP_CL },	/* Right double angle quotation mark: closing */
 	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
 	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
 	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
 	{ 0x2039, 0x2039, LBP_OP },	/* Left single angle quotation mark: opening */
 	{ 0x203A, 0x203A, LBP_CL },	/* Right single angle quotation mark: closing */
 	{ 0, 0, LBP_Undefined }
 };
 /**
 * French-specific data over the default Unicode rules.
 */
 static struct LineBreakProperties lb_prop_French[] = {
 	{ 0x00AB, 0x00AB, LBP_OP },	/* Left double angle quotation mark: opening */
 	{ 0x00BB, 0x00BB, LBP_CL },	/* Right double angle quotation mark: closing */
 	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
 	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
 	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
 	{ 0x2039, 0x2039, LBP_OP },	/* Left single angle quotation mark: opening */
 	{ 0x203A, 0x203A, LBP_CL },	/* Right single angle quotation mark: closing */
 	{ 0, 0, LBP_Undefined }
 };
 /**
 * Russian-specific data over the default Unicode rules.
 */
 static struct LineBreakProperties lb_prop_Russian[] = {
 	{ 0x00AB, 0x00AB, LBP_OP },	/* Left double angle quotation mark: opening */
 	{ 0x00BB, 0x00BB, LBP_CL },	/* Right double angle quotation mark: closing */
 	{ 0x201C, 0x201C, LBP_CL },	/* Left double quotation mark: closing */
 	{ 0, 0, LBP_Undefined }
 };
 /**
 * Chinese-specific data over the default Unicode rules.
 */
 static struct LineBreakProperties lb_prop_Chinese[] = {
 	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
 	{ 0x2019, 0x2019, LBP_CL },	/* Right single quotation mark: closing */
 	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
 	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
 	{ 0, 0, LBP_Undefined }
 };
 /**
 * Association data of language-specific line breaking properties with
 * language names.  This is the definition for the static data in this
 * file.  If you want more flexibility, or do not need the data here,
 * you may want to redefine \e lb_prop_lang_map in your C source file.
 */
 struct LineBreakPropertiesLang lb_prop_lang_map[] = {
 	{ "en", 2, lb_prop_English },
 	{ "de", 2, lb_prop_German },
 	{ "es", 2, lb_prop_Spanish },
 	{ "fr", 2, lb_prop_French },
 	{ "ru", 2, lb_prop_Russian },
 	{ "zh", 2, lb_prop_Chinese },
 	{ NULL, 0, NULL }
 };
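The map above pairs a language prefix with a length to compare. The lookup itself lives in linebreak.c, which is not shown here, so the following is only a minimal sketch of how such a prefix match could work, assuming the first namelen characters of the caller's language tag are compared with strncmp. DemoLangEntry, demo_lang_map, and demo_find_lang are hypothetical names, and an integer stands in for the pointer to the property table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct LineBreakPropertiesLang. */
struct DemoLangEntry
{
	const char *lang;	/* language prefix, e.g. "en" */
	size_t namelen;		/* number of leading characters to compare */
	int table_id;		/* stand-in for the property-table pointer */
};

static const struct DemoLangEntry demo_lang_map[] = {
	{ "en", 2, 1 },
	{ "de", 2, 2 },
	{ NULL, 0, 0 }		/* terminator, as in lb_prop_lang_map */
};

/* Returns the table id of the first matching entry, or 0 if none. */
static int demo_find_lang(const char *lang)
{
	const struct DemoLangEntry *e;

	if (lang == NULL)
		return 0;
	for (e = demo_lang_map; e->lang != NULL; ++e)
	{
		assert(e->namelen <= strlen(e->lang));
		/* Prefix match: "en_US" and "en-GB" both select "en". */
		if (strncmp(lang, e->lang, e->namelen) == 0)
			return e->table_id;
	}
	return 0;
}
```

Under this assumption a tag such as "en_US" selects the "en" entry, while an unlisted or NULL language selects nothing and would leave only the default Unicode rules in effect.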
--- a/linebreak/linebreak/linebreakdef.h
+++ b/linebreak/linebreak/linebreakdef.h
@ -0,0 +1,149 @@
 /* vim: set tabstop=4 shiftwidth=4: */
 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
 * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute
 * it freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must
 *    not claim that you wrote the original software.  If you use this
 *    software in a product, an acknowledgement in the product
 *    documentation would be appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must
 *    not be misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source
 *    distribution.
 *
 * The main reference is Unicode Standard Annex 14 (UAX #14):
 *		<URL:http://www.unicode.org/reports/tr14/>
 *
 * When this library was designed, this annex was at Revision 19, for
 * Unicode 5.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
 *
 * This library has been updated according to Revision 26, for
 * Unicode 6.0.0:
 *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
 *
 * The Unicode Terms of Use are available at
 *		<URL:http://www.unicode.org/copyright.html>
 */
 /**
 * @file	linebreakdef.h
 *
 * Definitions of internal data structures, declarations of global
 * variables, and function prototypes for the line breaking algorithm.
 *
 * @version	2.1, 2011/05/07
 * @author	Wu Yongwei
 */
 /**
 * Constant value to mark the end of string.  It is not a valid Unicode
 * character.
 */
 #define EOS 0xFFFF
 /**
 * Line break classes.  This is a direct mapping of Table 1 of Unicode
 * Standard Annex 14, Revision 26.
 */
 enum LineBreakClass
 {
 	/* This is used to signal an error condition. */
 	LBP_Undefined,	/**< Undefined */
 	/* The following break classes are treated in the pair table. */
 	LBP_OP,			/**< Opening punctuation */
 	LBP_CL,			/**< Closing punctuation */
 	LBP_CP,			/**< Closing parenthesis */
 	LBP_QU,			/**< Ambiguous quotation */
 	LBP_GL,			/**< Glue */
 	LBP_NS,			/**< Non-starters */
 	LBP_EX,			/**< Exclamation/Interrogation */
 	LBP_SY,			/**< Symbols allowing break after */
 	LBP_IS,			/**< Infix separator */
 	LBP_PR,			/**< Prefix */
 	LBP_PO,			/**< Postfix */
 	LBP_NU,			/**< Numeric */
 	LBP_AL,			/**< Alphabetic */
 	LBP_ID,			/**< Ideographic */
 	LBP_IN,			/**< Inseparable characters */
 	LBP_HY,			/**< Hyphen */
 	LBP_BA,			/**< Break after */
 	LBP_BB,			/**< Break before */
 	LBP_B2,			/**< Break on either side (but not pair) */
 	LBP_ZW,			/**< Zero-width space */
 	LBP_CM,			/**< Combining marks */
 	LBP_WJ,			/**< Word joiner */
 	LBP_H2,			/**< Hangul LV */
 	LBP_H3,			/**< Hangul LVT */
 	LBP_JL,			/**< Hangul L Jamo */
 	LBP_JV,			/**< Hangul V Jamo */
 	LBP_JT,			/**< Hangul T Jamo */
 	/* The following break classes are not treated in the pair table */
 	LBP_AI,			/**< Ambiguous (alphabetic or ideograph) */
 	LBP_BK,			/**< Break (mandatory) */
 	LBP_CB,			/**< Contingent break */
 	LBP_CR,			/**< Carriage return */
 	LBP_LF,			/**< Line feed */
 	LBP_NL,			/**< Next line */
 	LBP_SA,			/**< South-East Asian */
 	LBP_SG,			/**< Surrogates */
 	LBP_SP,			/**< Space */
 	LBP_XX			/**< Unknown */
 };
 /**
 * Struct for entries of line break properties.  The array of the
 * entries \e must be sorted.
 */
 struct LineBreakProperties
 {
 	utf32_t start;				/**< Starting code point */
 	utf32_t end;				/**< End code point */
 	enum LineBreakClass prop;	/**< The line breaking property */
 };
 /**
 * Struct for association of language-specific line breaking properties
 * with language names.
 */
 struct LineBreakPropertiesLang
 {
 	const char *lang;					/**< Language name */
 	size_t namelen;						/**< Length of name to match */
 	struct LineBreakProperties *lbp;	/**< Pointer to associated data */
 };
 /**
 * Abstract function interface for #lb_get_next_char_utf8,
 * #lb_get_next_char_utf16, and #lb_get_next_char_utf32.
 */
 typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);
 /* Declarations */
 extern struct LineBreakProperties lb_prop_default[];
 extern struct LineBreakPropertiesLang lb_prop_lang_map[];
 /* Function Prototype */
 utf32_t lb_get_next_char_utf8(const utf8_t *s, size_t len, size_t *ip);
 utf32_t lb_get_next_char_utf16(const utf16_t *s, size_t len, size_t *ip);
 utf32_t lb_get_next_char_utf32(const utf32_t *s, size_t len, size_t *ip);
 void set_linebreaks(
 		const void *s,
 		size_t len,
 		const char *lang,
 		char *brks,
 		get_next_char_t get_next_char);
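The get_next_char_t interface above takes a buffer, its length, and a position that the callee advances. The real decoders are defined in linebreak.c, which is not shown here; what follows is a simplified sketch of a UTF-8 reader with the same shape. It assumes EOS is returned both at the end of input and for a truncated sequence, and it does not validate continuation bytes. demo_next_char_utf8 and DEMO_EOS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_EOS 0xFFFF	/* mirrors the EOS marker in linebreakdef.h */

/* Decodes one code point starting at s[*ip] and advances *ip past it. */
static unsigned int demo_next_char_utf8(
		const unsigned char *s, size_t len, size_t *ip)
{
	unsigned int ch;
	size_t need = 0;	/* continuation bytes expected after the lead */

	assert(s != NULL && ip != NULL);
	if (*ip >= len)
		return DEMO_EOS;
	ch = s[*ip];
	if ((ch & 0xE0) == 0xC0)
	{
		ch &= 0x1F;
		need = 1;
	}
	else if ((ch & 0xF0) == 0xE0)
	{
		ch &= 0x0F;
		need = 2;
	}
	else if ((ch & 0xF8) == 0xF0)
	{
		ch &= 0x07;
		need = 3;
	}
	/* ASCII and invalid lead bytes pass through as single units. */

	/* Truncated sequence: report end of string, leave *ip untouched. */
	if (len - *ip <= need)
		return DEMO_EOS;
	++*ip;
	while (need-- > 0)
		ch = (ch << 6) | (s[(*ip)++] & 0x3F);
	return ch;
}
```

A caller would loop until the function returns DEMO_EOS, which is how the word breaking loop in this commit drives its get_next_char callback.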
--- a/linebreak/linebreak/purge
+++ b/linebreak/linebreak/purge
@ -0,0 +1,2 @@
 #! /bin/sh
 rm -rf Makefile.in aclocal.m4 autom4te.cache/ config.guess config.h.in config.sub configure depcomp doc/ install-sh ltmain.sh missing
--- a/linebreak/linebreak/sort_numeric_hex.py
+++ b/linebreak/linebreak/sort_numeric_hex.py
@ -0,0 +1,6 @@
 #!/usr/bin/env python
 import sys
 lines = open(sys.argv[1]).readlines()
 lines_out = sorted(lines, key=lambda line: int(line.split("..")[0], 16))
 sys.stdout.writelines(lines_out)
--- a/linebreak/linebreak/wordbreak.c
+++ b/linebreak/linebreak/wordbreak.c
@ -0,0 +1,437 @@
 /* vim: set tabstop=4 shiftwidth=4: */
 /*
 * Word breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
 * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute
 * it freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must
 *    not claim that you wrote the original software.  If you use this
 *    software in a product, an acknowledgement in the product
 *    documentation would be appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must
 *    not be misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source
 *    distribution.
 *
 * The main reference is Unicode Standard Annex 29 (UAX #29):
 *		<URL:http://unicode.org/reports/tr29>
 *
 * When this library was designed, this annex was at Revision 17, for
 * Unicode 6.0.0:
 *		<URL:http://www.unicode.org/reports/tr29/tr29-17.html>
 *
 * The Unicode Terms of Use are available at
 *		<URL:http://www.unicode.org/copyright.html>
 */
 /**
 * @file	wordbreak.c
 *
 * Implementation of the word breaking algorithm as described in Unicode
 * Standard Annex 29.
 *
 * @version	2.2, 2012/02/04
 * @author	Tom Hacohen
 */
 #include <assert.h>
 #include <stddef.h>
 #include <string.h>
 #include "linebreak.h"
 #include "linebreakdef.h"
 #include "wordbreak.h"
 #include "wordbreakdata.c"
 #define ARRAY_LEN(x) (sizeof(x) / sizeof(x[0]))
 /**
 * Initializes the wordbreak internals.  It currently does nothing, but
 * it may in the future.
 */
 void init_wordbreak(void)
 {
 }
 /**
 * Gets the word breaking class of a character.
 *
 * @param ch	character to check
 * @param wbp	pointer to the wbp breaking properties array
 * @param len	size of the wbp array in number of items
 * @return		the word breaking class if found; \c WBP_Any otherwise
 */
 static enum WordBreakClass get_char_wb_class(
 		utf32_t ch,
 		struct WordBreakProperties *wbp,
 		size_t len)
 {
 	int min = 0;
 	int max = (int)len - 1;
 	int mid;
 	/* Test before the first access so an empty table (len == 0) is safe. */
 	while (min <= max)
 	{
 		mid = (min + max) / 2;
 		if (ch < wbp[mid].start)
 			max = mid - 1;
 		else if (ch > wbp[mid].end)
 			min = mid + 1;
 		else
 			return wbp[mid].prop;
 	}
 	return WBP_Any;
 }
 /**
 * Sets the word break types to a specific value in a range.
 *
 * It sets the inside chars to #WORDBREAK_INSIDEACHAR and the rest to brkType.
 * Assumes \a brks is initialized - all the cells with #WORDBREAK_NOBREAK are
 * cells that we really don't want to break after.
 *
 * @param[in]  s			input string
 * @param[out] brks			breaks array to fill
 * @param[in]  posStart		start position
 * @param[in]  posEnd		end position (exclusive)
 * @param[in]  len			length of the string
 * @param[in]  brkType		breaks type to use
 * @param[in] get_next_char	function to get the next UTF-32 character
 */
 static void set_brks_to(
 		const void *s,
 		char *brks,
 		size_t posStart,
 		size_t posEnd,
 		size_t len,
 		char brkType,
 		get_next_char_t get_next_char)
 {
 	size_t posNext = posStart;
 	while (posNext < posEnd)
 	{
 		utf32_t ch;
 		ch = get_next_char(s, len, &posNext);
 		assert(ch != EOS);
 		for (; posStart < posNext - 1; ++posStart)
 			brks[posStart] = WORDBREAK_INSIDEACHAR;
 		assert(posStart == posNext - 1);
 		/* Only set it if it has not already been marked as no-break. */
 		if (brks[posStart] != WORDBREAK_NOBREAK)
 			brks[posStart] = brkType;
 		posStart = posNext;
 	}
 }
 /* Checks to see if the class is newline, CR, or LF (rules WB3a and WB3b). */
 #define IS_WB3ab(cls) ((cls == WBP_Newline) || (cls == WBP_CR) || \
 					   (cls == WBP_LF))
 /**
 * Sets the word breaking information for a generic input string.
 *
 * @param[in]  s			input string
 * @param[in]  len			length of the input
 * @param[in]  lang			language of the input
 * @param[out] brks			pointer to the output breaking data, containing
 *							#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
 *							#WORDBREAK_INSIDEACHAR
 * @param[in] get_next_char	function to get the next UTF-32 character
 */
 static void set_wordbreaks(
 		const void *s,
 		size_t len,
 		const char *lang,
 		char *brks,
 		get_next_char_t get_next_char)
 {
 	enum WordBreakClass wbcLast = WBP_Undefined;
 	/* wbcSeqStart is the class that started the current sequence.
 	 * WBP_Undefined is a special case that means "start of text" (sot).
 	 * This value is the class that is at the start of the current rule
 	 * matching sequence. For example, in case of Numeric+MidNum+Numeric
 	 * it'll be Numeric all the way.
 	 */
 	enum WordBreakClass wbcSeqStart = WBP_Undefined;
 	utf32_t ch;
 	size_t posNext = 0;
 	size_t posCur = 0;
 	size_t posLast = 0;
 	/* TODO: Language-specific specialization. */
 	(void) lang;
 	/* Init brks. */
 	memset(brks, WORDBREAK_BREAK, len);
 	ch = get_next_char(s, len, &posNext);
 	while (ch != EOS)
 	{
 		enum WordBreakClass wbcCur;
 		wbcCur = get_char_wb_class(ch, wb_prop_default,
 								   ARRAY_LEN(wb_prop_default));
 		switch (wbcCur)
 		{
 	    case WBP_CR:
 			/* WB3b */
 			set_brks_to(s, brks, posLast, posCur, len,
 						WORDBREAK_BREAK, get_next_char);
 			wbcSeqStart = wbcCur;
 			posLast = posCur;
 			break;
 	    case WBP_LF:
 			if (wbcSeqStart == WBP_CR) /* WB3 */
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_NOBREAK, get_next_char);
 				wbcSeqStart = wbcCur;
 				posLast = posCur;
 				break;
 			}
 			/* Fall through */
 	    case WBP_Newline:
 			/* WB3a,3b */
 			set_brks_to(s, brks, posLast, posCur, len,
 						WORDBREAK_BREAK, get_next_char);
 			wbcSeqStart = wbcCur;
 			posLast = posCur;
 			break;
 	    case WBP_Extend:
 	    case WBP_Format:
 			/* WB4 - If not the first char/after a newline (WB3a,3b), skip
 			 * this class, set it to be the same as the prev, and mark
 			 * brks not to break before them. */
 			if ((wbcSeqStart == WBP_Undefined) || IS_WB3ab(wbcSeqStart))
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 				wbcSeqStart = wbcCur;
 			}
 			else
 			{
 				/* It's surely not the first */
 				brks[posCur - 1] = WORDBREAK_NOBREAK;
 				/* "inherit" the previous class. */
 				wbcCur = wbcLast;
 			}
 			break;
 	    case WBP_Katakana:
 			if ((wbcSeqStart == WBP_Katakana) || /* WB13 */
 					(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_NOBREAK, get_next_char);
 			}
 			/* No rule found, reset */
 			else
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 			}
 			wbcSeqStart = wbcCur;
 			posLast = posCur;
 			break;
 	    case WBP_ALetter:
 			if ((wbcSeqStart == WBP_ALetter) || /* WB5,6,7 */
 					(wbcLast == WBP_Numeric) || /* WB10 */
 					(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_NOBREAK, get_next_char);
 			}
 			/* No rule found, reset */
 			else
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 			}
 			wbcSeqStart = wbcCur;
 			posLast = posCur;
 			break;
 	    case WBP_MidNumLet:
 			if ((wbcLast == WBP_ALetter) || /* WB6,7 */
 					(wbcLast == WBP_Numeric)) /* WB11,12 */
 			{
 				/* Go on */
 			}
 			else
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 				wbcSeqStart = wbcCur;
 				posLast = posCur;
 			}
 			break;
 	    case WBP_MidLetter:
 			if (wbcLast == WBP_ALetter) /* WB6,7 */
 			{
 				/* Go on */
 			}
 			else
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 				wbcSeqStart = wbcCur;
 				posLast = posCur;
 			}
 			break;
 	    case WBP_MidNum:
 			if (wbcLast == WBP_Numeric) /* WB11,12 */
 			{
 				/* Go on */
 			}
 			else
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 				wbcSeqStart = wbcCur;
 				posLast = posCur;
 			}
 			break;
 	    case WBP_Numeric:
 			if ((wbcSeqStart == WBP_Numeric) || /* WB8,11,12 */
 					(wbcLast == WBP_ALetter) || /* WB9 */
 					(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_NOBREAK, get_next_char);
 			}
 			/* No rule found, reset */
 			else
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 			}
 			wbcSeqStart = wbcCur;
 			posLast = posCur;
 			break;
 	    case WBP_ExtendNumLet:
 			/* WB13a,13b */
 			if ((wbcSeqStart == wbcLast) &&
 				((wbcLast == WBP_ALetter) ||
 				 (wbcLast == WBP_Numeric) ||
 				 (wbcLast == WBP_Katakana) ||
 				 (wbcLast == WBP_ExtendNumLet)))
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_NOBREAK, get_next_char);
 			}
 			/* No rule found, reset */
 			else
 			{
 				set_brks_to(s, brks, posLast, posCur, len,
 							WORDBREAK_BREAK, get_next_char);
 			}
 			wbcSeqStart = wbcCur;
 			posLast = posCur;
 			break;
 		 case WBP_Any:
 			/* Allow breaks and reset */
 			set_brks_to(s, brks, posLast, posCur, len,
 						WORDBREAK_BREAK, get_next_char);
 			wbcSeqStart = wbcCur;
 			posLast = posCur;
 			break;
 	    default:
 			/* Error, should never get here! */
 			assert(0);
 			break;
 		}
 		wbcLast = wbcCur;
 		posCur = posNext;
 		ch = get_next_char(s, len, &posNext);
    }
 	/* WB2 */
 	set_brks_to(s, brks, posLast, posNext, len,
 				WORDBREAK_BREAK, get_next_char);
 }
 /**
 * Sets the word breaking information for a UTF-8 input string.
 *
 * @param[in]  s	input UTF-8 string
 * @param[in]  len	length of the input
 * @param[in]  lang	language of the input
 * @param[out] brks	pointer to the output breaking data, containing
 *					#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
 *					#WORDBREAK_INSIDEACHAR
 */
 void set_wordbreaks_utf8(
 		const utf8_t *s,
 		size_t len,
 		const char *lang,
 		char *brks)
 {
 	set_wordbreaks(s, len, lang, brks,
 				   (get_next_char_t)lb_get_next_char_utf8);
 }
 /**
 * Sets the word breaking information for a UTF-16 input string.
 *
 * @param[in]  s	input UTF-16 string
 * @param[in]  len	length of the input
 * @param[in]  lang	language of the input
 * @param[out] brks	pointer to the output breaking data, containing
 *					#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
 *					#WORDBREAK_INSIDEACHAR
 */
 void set_wordbreaks_utf16(
 		const utf16_t *s,
 		size_t len,
 		const char *lang,
 		char *brks)
 {
 	set_wordbreaks(s, len, lang, brks,
 				   (get_next_char_t)lb_get_next_char_utf16);
 }
 /**
 * Sets the word breaking information for a UTF-32 input string.
 *
 * @param[in]  s	input UTF-32 string
 * @param[in]  len	length of the input
 * @param[in]  lang	language of the input
 * @param[out] brks	pointer to the output breaking data, containing
 *					#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
 *					#WORDBREAK_INSIDEACHAR
 */
 void set_wordbreaks_utf32(
 		const utf32_t *s,
 		size_t len,
 		const char *lang,
 		char *brks)
 {
 	set_wordbreaks(s, len, lang, brks,
 				   (get_next_char_t)lb_get_next_char_utf32);
 }
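The class lookup in get_char_wb_class above is a binary search over a table of inclusive code point ranges sorted by start. A minimal standalone sketch of the same idea follows, using a tiny hand-written table in place of the generated data. DemoClass, DemoRange, demo_table, and demo_lookup are hypothetical names, and the half-open index interval is a choice made here so that an empty table needs no special case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classes; DEMO_Any plays the role of WBP_Any. */
enum DemoClass { DEMO_Any, DEMO_Letter, DEMO_Digit };

struct DemoRange
{
	unsigned int start;		/* first code point (inclusive) */
	unsigned int end;		/* last code point (inclusive) */
	enum DemoClass prop;	/* class of every code point in the range */
};

/* Must be sorted by start, with no overlapping ranges. */
static const struct DemoRange demo_table[] = {
	{ 0x0030, 0x0039, DEMO_Digit },
	{ 0x0041, 0x005A, DEMO_Letter },
	{ 0x0061, 0x007A, DEMO_Letter }
};

/* Searches the index interval [lo, hi) for a range containing ch. */
static enum DemoClass demo_lookup(
		unsigned int ch,
		const struct DemoRange *tbl,
		size_t len)
{
	size_t lo = 0;
	size_t hi = len;

	assert(tbl != NULL || len == 0);
	while (lo < hi)
	{
		size_t mid = lo + (hi - lo) / 2;
		if (ch < tbl[mid].start)
			hi = mid;
		else if (ch > tbl[mid].end)
			lo = mid + 1;
		else
			return tbl[mid].prop;
	}
	return DEMO_Any;	/* not listed: fall back to the catch-all class */
}
```

Code points that fall in a gap between ranges get the catch-all class, which matches the documented behaviour of returning WBP_Any when nothing is found.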
--- a/linebreak/linebreak/wordbreak.h
+++ b/linebreak/linebreak/wordbreak.h
@ -0,0 +1,72 @@
 /* vim: set tabstop=4 shiftwidth=4: */
 /*
 * Word breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
 * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute
 * it freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must
 *    not claim that you wrote the original software.  If you use this
 *    software in a product, an acknowledgement in the product
 *    documentation would be appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must
 *    not be misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source
 *    distribution.
 *
 * The main reference is Unicode Standard Annex 29 (UAX #29):
 *		<URL:http://unicode.org/reports/tr29>
 *
 * When this library was designed, this annex was at Revision 17, for
 * Unicode 6.0.0:
 *		<URL:http://www.unicode.org/reports/tr29/tr29-17.html>
 *
 * The Unicode Terms of Use are available at
 *		<URL:http://www.unicode.org/copyright.html>
 */
 /**
 * @file	wordbreak.h
 *
 * Header file for the word breaking (segmentation) algorithm.
 *
 * @version	2.2, 2012/02/04
 * @author	Tom Hacohen
 */
 #ifndef WORDBREAK_H
 #define WORDBREAK_H
 #include <stddef.h>
 #include "linebreak.h"
 #ifdef __cplusplus
 extern "C" {
 #endif
 #define WORDBREAK_BREAK			0	/**< Break is allowed */
 #define WORDBREAK_NOBREAK		1	/**< No break is allowed */
 #define WORDBREAK_INSIDEACHAR	2	/**< A UTF-8/16 sequence is unfinished */
 void init_wordbreak(void);
 void set_wordbreaks_utf8(
 		const utf8_t *s, size_t len, const char* lang, char *brks);
 void set_wordbreaks_utf16(
 		const utf16_t *s, size_t len, const char* lang, char *brks);
 void set_wordbreaks_utf32(
 		const utf32_t *s, size_t len, const char* lang, char *brks);
 #ifdef __cplusplus
 }
 #endif
 #endif
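The functions declared above fill one entry of brks per code unit. As the comments in wordbreak.c describe, an entry marks whether a break may occur after that unit, and the units inside a multi-unit character are marked WORDBREAK_INSIDEACHAR. A caller could walk such an array as in the sketch below. It is an illustration only: demo_segment_end is a hypothetical helper, and the constants are repeated under DEMO_ names so the block compiles without the library.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the WORDBREAK_* macros in wordbreak.h. */
#define DEMO_WORDBREAK_BREAK		0
#define DEMO_WORDBREAK_NOBREAK		1
#define DEMO_WORDBREAK_INSIDEACHAR	2

/*
 * Returns the exclusive end index of the segment that begins at start,
 * that is, one past the first position at which a break is allowed.
 * Returns len if no break is found or start is already at the end.
 */
static size_t demo_segment_end(const char *brks, size_t len, size_t start)
{
	size_t i;

	assert(brks != NULL || len == 0);
	for (i = start; i < len; ++i)
	{
		if (brks[i] == DEMO_WORDBREAK_BREAK)
			return i + 1;
	}
	return len;
}
```

Calling the helper repeatedly, each time starting from the previous result, yields the segments of the input one after another.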
--- a/linebreak/linebreak/wordbreakdata.c
+++ b/linebreak/linebreak/wordbreakdata.c
@ -0,0 +1,860 @@
 /* The content of this file is generated from:
 # WordBreakProperty-6.0.0.txt
 # Date: 2010-08-19, 00:48:48 GMT [MD]
 */
 #include "linebreak.h"
 #include "wordbreakdef.h"
 static struct WordBreakProperties wb_prop_default[] = {
 	{0x000A, 0x000A, WBP_LF},
 	{0x000B, 0x000C, WBP_Newline},
 	{0x000D, 0x000D, WBP_CR},
 	{0x0027, 0x0027, WBP_MidNumLet},
 	{0x002C, 0x002C, WBP_MidNum},
 	{0x002E, 0x002E, WBP_MidNumLet},
 	{0x0030, 0x0039, WBP_Numeric},
 	{0x003A, 0x003A, WBP_MidLetter},
 	{0x003B, 0x003B, WBP_MidNum},
 	{0x0041, 0x005A, WBP_ALetter},
 	{0x005F, 0x005F, WBP_ExtendNumLet},
 	{0x0061, 0x007A, WBP_ALetter},
 	{0x0085, 0x0085, WBP_Newline},
 	{0x00AA, 0x00AA, WBP_ALetter},
 	{0x00AD, 0x00AD, WBP_Format},
 	{0x00B5, 0x00B5, WBP_ALetter},
 	{0x00B7, 0x00B7, WBP_MidLetter},
 	{0x00BA, 0x00BA, WBP_ALetter},
 	{0x00C0, 0x00D6, WBP_ALetter},
 	{0x00D8, 0x00F6, WBP_ALetter},
 	{0x00F8, 0x01BA, WBP_ALetter},
 	{0x01BB, 0x01BB, WBP_ALetter},
 	{0x01BC, 0x01BF, WBP_ALetter},
 	{0x01C0, 0x01C3, WBP_ALetter},
 	{0x01C4, 0x0293, WBP_ALetter},
 	{0x0294, 0x0294, WBP_ALetter},
 	{0x0295, 0x02AF, WBP_ALetter},
 	{0x02B0, 0x02C1, WBP_ALetter},
 	{0x02C6, 0x02D1, WBP_ALetter},
 	{0x02E0, 0x02E4, WBP_ALetter},
 	{0x02EC, 0x02EC, WBP_ALetter},
 	{0x02EE, 0x02EE, WBP_ALetter},
 	{0x0300, 0x036F, WBP_Extend},
 	{0x0370, 0x0373, WBP_ALetter},
 	{0x0374, 0x0374, WBP_ALetter},
 	{0x0376, 0x0377, WBP_ALetter},
 	{0x037A, 0x037A, WBP_ALetter},
 	{0x037B, 0x037D, WBP_ALetter},
 	{0x037E, 0x037E, WBP_MidNum},
 	{0x0386, 0x0386, WBP_ALetter},
 	{0x0387, 0x0387, WBP_MidLetter},
 	{0x0388, 0x038A, WBP_ALetter},
 	{0x038C, 0x038C, WBP_ALetter},
 	{0x038E, 0x03A1, WBP_ALetter},
 	{0x03A3, 0x03F5, WBP_ALetter},
 	{0x03F7, 0x0481, WBP_ALetter},
 	{0x0483, 0x0487, WBP_Extend},
 	{0x0488, 0x0489, WBP_Extend},
 	{0x048A, 0x0527, WBP_ALetter},
 	{0x0531, 0x0556, WBP_ALetter},
 	{0x0559, 0x0559, WBP_ALetter},
 	{0x0561, 0x0587, WBP_ALetter},
 	{0x0589, 0x0589, WBP_MidNum},
 	{0x0591, 0x05BD, WBP_Extend},
 	{0x05BF, 0x05BF, WBP_Extend},
 	{0x05C1, 0x05C2, WBP_Extend},
 	{0x05C4, 0x05C5, WBP_Extend},
 	{0x05C7, 0x05C7, WBP_Extend},
 	{0x05D0, 0x05EA, WBP_ALetter},
 	{0x05F0, 0x05F2, WBP_ALetter},
 	{0x05F3, 0x05F3, WBP_ALetter},
 	{0x05F4, 0x05F4, WBP_MidLetter},
 	{0x0600, 0x0603, WBP_Format},
 	{0x060C, 0x060D, WBP_MidNum},
 	{0x0610, 0x061A, WBP_Extend},
 	{0x0620, 0x063F, WBP_ALetter},
 	{0x0640, 0x0640, WBP_ALetter},
 	{0x0641, 0x064A, WBP_ALetter},
 	{0x064B, 0x065F, WBP_Extend},
 	{0x0660, 0x0669, WBP_Numeric},
 	{0x066B, 0x066B, WBP_Numeric},
 	{0x066C, 0x066C, WBP_MidNum},
 	{0x066E, 0x066F, WBP_ALetter},
 	{0x0670, 0x0670, WBP_Extend},
 	{0x0671, 0x06D3, WBP_ALetter},
 	{0x06D5, 0x06D5, WBP_ALetter},
 	{0x06D6, 0x06DC, WBP_Extend},
 	{0x06DD, 0x06DD, WBP_Format},
 	{0x06DF, 0x06E4, WBP_Extend},
 	{0x06E5, 0x06E6, WBP_ALetter},
 	{0x06E7, 0x06E8, WBP_Extend},
 	{0x06EA, 0x06ED, WBP_Extend},
 	{0x06EE, 0x06EF, WBP_ALetter},
 	{0x06F0, 0x06F9, WBP_Numeric},
 	{0x06FA, 0x06FC, WBP_ALetter},
 	{0x06FF, 0x06FF, WBP_ALetter},
 	{0x070F, 0x070F, WBP_Format},
 	{0x0710, 0x0710, WBP_ALetter},
 	{0x0711, 0x0711, WBP_Extend},
 	{0x0712, 0x072F, WBP_ALetter},
 	{0x0730, 0x074A, WBP_Extend},
 	{0x074D, 0x07A5, WBP_ALetter},
 	{0x07A6, 0x07B0, WBP_Extend},
 	{0x07B1, 0x07B1, WBP_ALetter},
 	{0x07C0, 0x07C9, WBP_Numeric},
 	{0x07CA, 0x07EA, WBP_ALetter},
 	{0x07EB, 0x07F3, WBP_Extend},
 	{0x07F4, 0x07F5, WBP_ALetter},
 	{0x07F8, 0x07F8, WBP_MidNum},
 	{0x07FA, 0x07FA, WBP_ALetter},
 	{0x0800, 0x0815, WBP_ALetter},
 	{0x0816, 0x0819, WBP_Extend},
 	{0x081A, 0x081A, WBP_ALetter},
 	{0x081B, 0x0823, WBP_Extend},
 	{0x0824, 0x0824, WBP_ALetter},
 	{0x0825, 0x0827, WBP_Extend},
 	{0x0828, 0x0828, WBP_ALetter},
 	{0x0829, 0x082D, WBP_Extend},
 	{0x0840, 0x0858, WBP_ALetter},
 	{0x0859, 0x085B, WBP_Extend},
 	{0x0900, 0x0902, WBP_Extend},
 	{0x0903, 0x0903, WBP_Extend},
 	{0x0904, 0x0939, WBP_ALetter},
 	{0x093A, 0x093A, WBP_Extend},
 	{0x093B, 0x093B, WBP_Extend},
 	{0x093C, 0x093C, WBP_Extend},
 	{0x093D, 0x093D, WBP_ALetter},
 	{0x093E, 0x0940, WBP_Extend},
 	{0x0941, 0x0948, WBP_Extend},
 	{0x0949, 0x094C, WBP_Extend},
 	{0x094D, 0x094D, WBP_Extend},
 	{0x094E, 0x094F, WBP_Extend},
 	{0x0950, 0x0950, WBP_ALetter},
 	{0x0951, 0x0957, WBP_Extend},
 	{0x0958, 0x0961, WBP_ALetter},
 	{0x0962, 0x0963, WBP_Extend},
 	{0x0966, 0x096F, WBP_Numeric},
 	{0x0971, 0x0971, WBP_ALetter},
 	{0x0972, 0x0977, WBP_ALetter},
 	{0x0979, 0x097F, WBP_ALetter},
 	{0x0981, 0x0981, WBP_Extend},
 	{0x0982, 0x0983, WBP_Extend},
 	{0x0985, 0x098C, WBP_ALetter},
 	{0x098F, 0x0990, WBP_ALetter},
 	{0x0993, 0x09A8, WBP_ALetter},
 	{0x09AA, 0x09B0, WBP_ALetter},
 	{0x09B2, 0x09B2, WBP_ALetter},
 	{0x09B6, 0x09B9, WBP_ALetter},
 	{0x09BC, 0x09BC, WBP_Extend},
 	{0x09BD, 0x09BD, WBP_ALetter},
 	{0x09BE, 0x09C0, WBP_Extend},
 	{0x09C1, 0x09C4, WBP_Extend},
 	{0x09C7, 0x09C8, WBP_Extend},
 	{0x09CB, 0x09CC, WBP_Extend},
 	{0x09CD, 0x09CD, WBP_Extend},
 	{0x09CE, 0x09CE, WBP_ALetter},
 	{0x09D7, 0x09D7, WBP_Extend},
 	{0x09DC, 0x09DD, WBP_ALetter},
 	{0x09DF, 0x09E1, WBP_ALetter},
 	{0x09E2, 0x09E3, WBP_Extend},
 	{0x09E6, 0x09EF, WBP_Numeric},
 	{0x09F0, 0x09F1, WBP_ALetter},
 	{0x0A01, 0x0A02, WBP_Extend},
 	{0x0A03, 0x0A03, WBP_Extend},
 	{0x0A05, 0x0A0A, WBP_ALetter},
 	{0x0A0F, 0x0A10, WBP_ALetter},
 	{0x0A13, 0x0A28, WBP_ALetter},
 	{0x0A2A, 0x0A30, WBP_ALetter},
 	{0x0A32, 0x0A33, WBP_ALetter},
 	{0x0A35, 0x0A36, WBP_ALetter},
 	{0x0A38, 0x0A39, WBP_ALetter},
 	{0x0A3C, 0x0A3C, WBP_Extend},
 	{0x0A3E, 0x0A40, WBP_Extend},
 	{0x0A41, 0x0A42, WBP_Extend},
 	{0x0A47, 0x0A48, WBP_Extend},
 	{0x0A4B, 0x0A4D, WBP_Extend},
 	{0x0A51, 0x0A51, WBP_Extend},
 	{0x0A59, 0x0A5C, WBP_ALetter},
 	{0x0A5E, 0x0A5E, WBP_ALetter},
 	{0x0A66, 0x0A6F, WBP_Numeric},
 	{0x0A70, 0x0A71, WBP_Extend},
 	{0x0A72, 0x0A74, WBP_ALetter},
 	{0x0A75, 0x0A75, WBP_Extend},
 	{0x0A81, 0x0A82, WBP_Extend},
 	{0x0A83, 0x0A83, WBP_Extend},
 	{0x0A85, 0x0A8D, WBP_ALetter},
 	{0x0A8F, 0x0A91, WBP_ALetter},
 	{0x0A93, 0x0AA8, WBP_ALetter},
 	{0x0AAA, 0x0AB0, WBP_ALetter},
 	{0x0AB2, 0x0AB3, WBP_ALetter},
 	{0x0AB5, 0x0AB9, WBP_ALetter},
 	{0x0ABC, 0x0ABC, WBP_Extend},
 	{0x0ABD, 0x0ABD, WBP_ALetter},
 	{0x0ABE, 0x0AC0, WBP_Extend},
 	{0x0AC1, 0x0AC5, WBP_Extend},
 	{0x0AC7, 0x0AC8, WBP_Extend},
 	{0x0AC9, 0x0AC9, WBP_Extend},
 	{0x0ACB, 0x0ACC, WBP_Extend},
 	{0x0ACD, 0x0ACD, WBP_Extend},
 	{0x0AD0, 0x0AD0, WBP_ALetter},
 	{0x0AE0, 0x0AE1, WBP_ALetter},
 	{0x0AE2, 0x0AE3, WBP_Extend},
 	{0x0AE6, 0x0AEF, WBP_Numeric},
 	{0x0B01, 0x0B01, WBP_Extend},
 	{0x0B02, 0x0B03, WBP_Extend},
 	{0x0B05, 0x0B0C, WBP_ALetter},
 	{0x0B0F, 0x0B10, WBP_ALetter},
 	{0x0B13, 0x0B28, WBP_ALetter},
 	{0x0B2A, 0x0B30, WBP_ALetter},
 	{0x0B32, 0x0B33, WBP_ALetter},
 	{0x0B35, 0x0B39, WBP_ALetter},
 	{0x0B3C, 0x0B3C, WBP_Extend},
 	{0x0B3D, 0x0B3D, WBP_ALetter},
 	{0x0B3E, 0x0B3E, WBP_Extend},
 	{0x0B3F, 0x0B3F, WBP_Extend},
 	{0x0B40, 0x0B40, WBP_Extend},
 	{0x0B41, 0x0B44, WBP_Extend},
 	{0x0B47, 0x0B48, WBP_Extend},
 	{0x0B4B, 0x0B4C, WBP_Extend},
 	{0x0B4D, 0x0B4D, WBP_Extend},
 	{0x0B56, 0x0B56, WBP_Extend},
 	{0x0B57, 0x0B57, WBP_Extend},
 	{0x0B5C, 0x0B5D, WBP_ALetter},
 	{0x0B5F, 0x0B61, WBP_ALetter},
 	{0x0B62, 0x0B63, WBP_Extend},
 	{0x0B66, 0x0B6F, WBP_Numeric},
 	{0x0B71, 0x0B71, WBP_ALetter},
 	{0x0B82, 0x0B82, WBP_Extend},
 	{0x0B83, 0x0B83, WBP_ALetter},
 	{0x0B85, 0x0B8A, WBP_ALetter},
 	{0x0B8E, 0x0B90, WBP_ALetter},
 	{0x0B92, 0x0B95, WBP_ALetter},
 	{0x0B99, 0x0B9A, WBP_ALetter},
 	{0x0B9C, 0x0B9C, WBP_ALetter},
 	{0x0B9E, 0x0B9F, WBP_ALetter},
 	{0x0BA3, 0x0BA4, WBP_ALetter},
 	{0x0BA8, 0x0BAA, WBP_ALetter},
 	{0x0BAE, 0x0BB9, WBP_ALetter},
 	{0x0BBE, 0x0BBF, WBP_Extend},
 	{0x0BC0, 0x0BC0, WBP_Extend},
 	{0x0BC1, 0x0BC2, WBP_Extend},
 	{0x0BC6, 0x0BC8, WBP_Extend},
 	{0x0BCA, 0x0BCC, WBP_Extend},
 	{0x0BCD, 0x0BCD, WBP_Extend},
 	{0x0BD0, 0x0BD0, WBP_ALetter},
 	{0x0BD7, 0x0BD7, WBP_Extend},
 	{0x0BE6, 0x0BEF, WBP_Numeric},
 	{0x0C01, 0x0C03, WBP_Extend},
 	{0x0C05, 0x0C0C, WBP_ALetter},
 	{0x0C0E, 0x0C10, WBP_ALetter},
 	{0x0C12, 0x0C28, WBP_ALetter},
 	{0x0C2A, 0x0C33, WBP_ALetter},
 	{0x0C35, 0x0C39, WBP_ALetter},
 	{0x0C3D, 0x0C3D, WBP_ALetter},
 	{0x0C3E, 0x0C40, WBP_Extend},
 	{0x0C41, 0x0C44, WBP_Extend},
 	{0x0C46, 0x0C48, WBP_Extend},
 	{0x0C4A, 0x0C4D, WBP_Extend},
 	{0x0C55, 0x0C56, WBP_Extend},
 	{0x0C58, 0x0C59, WBP_ALetter},
 	{0x0C60, 0x0C61, WBP_ALetter},
 	{0x0C62, 0x0C63, WBP_Extend},
 	{0x0C66, 0x0C6F, WBP_Numeric},
 	{0x0C82, 0x0C83, WBP_Extend},
 	{0x0C85, 0x0C8C, WBP_ALetter},
 	{0x0C8E, 0x0C90, WBP_ALetter},
 	{0x0C92, 0x0CA8, WBP_ALetter},
 	{0x0CAA, 0x0CB3, WBP_ALetter},
 	{0x0CB5, 0x0CB9, WBP_ALetter},
 	{0x0CBC, 0x0CBC, WBP_Extend},
 	{0x0CBD, 0x0CBD, WBP_ALetter},
 	{0x0CBE, 0x0CBE, WBP_Extend},
 	{0x0CBF, 0x0CBF, WBP_Extend},
 	{0x0CC0, 0x0CC4, WBP_Extend},
 	{0x0CC6, 0x0CC6, WBP_Extend},
 	{0x0CC7, 0x0CC8, WBP_Extend},
 	{0x0CCA, 0x0CCB, WBP_Extend},
 	{0x0CCC, 0x0CCD, WBP_Extend},
 	{0x0CD5, 0x0CD6, WBP_Extend},
 	{0x0CDE, 0x0CDE, WBP_ALetter},
 	{0x0CE0, 0x0CE1, WBP_ALetter},
 	{0x0CE2, 0x0CE3, WBP_Extend},
 	{0x0CE6, 0x0CEF, WBP_Numeric},
 	{0x0CF1, 0x0CF2, WBP_ALetter},
 	{0x0D02, 0x0D03, WBP_Extend},
 	{0x0D05, 0x0D0C, WBP_ALetter},
 	{0x0D0E, 0x0D10, WBP_ALetter},
 	{0x0D12, 0x0D3A, WBP_ALetter},
 	{0x0D3D, 0x0D3D, WBP_ALetter},
 	{0x0D3E, 0x0D40, WBP_Extend},
 	{0x0D41, 0x0D44, WBP_Extend},
 	{0x0D46, 0x0D48, WBP_Extend},
 	{0x0D4A, 0x0D4C, WBP_Extend},
 	{0x0D4D, 0x0D4D, WBP_Extend},
 	{0x0D4E, 0x0D4E, WBP_ALetter},
 	{0x0D57, 0x0D57, WBP_Extend},
 	{0x0D60, 0x0D61, WBP_ALetter},
 	{0x0D62, 0x0D63, WBP_Extend},
 	{0x0D66, 0x0D6F, WBP_Numeric},
 	{0x0D7A, 0x0D7F, WBP_ALetter},
 	{0x0D82, 0x0D83, WBP_Extend},
 	{0x0D85, 0x0D96, WBP_ALetter},
 	{0x0D9A, 0x0DB1, WBP_ALetter},
 	{0x0DB3, 0x0DBB, WBP_ALetter},
 	{0x0DBD, 0x0DBD, WBP_ALetter},
 	{0x0DC0, 0x0DC6, WBP_ALetter},
 	{0x0DCA, 0x0DCA, WBP_Extend},
 	{0x0DCF, 0x0DD1, WBP_Extend},
 	{0x0DD2, 0x0DD4, WBP_Extend},
 	{0x0DD6, 0x0DD6, WBP_Extend},
 	{0x0DD8, 0x0DDF, WBP_Extend},
 	{0x0DF2, 0x0DF3, WBP_Extend},
 	{0x0E31, 0x0E31, WBP_Extend},
 	{0x0E34, 0x0E3A, WBP_Extend},
 	{0x0E47, 0x0E4E, WBP_Extend},
 	{0x0E50, 0x0E59, WBP_Numeric},
 	{0x0EB1, 0x0EB1, WBP_Extend},
 	{0x0EB4, 0x0EB9, WBP_Extend},
 	{0x0EBB, 0x0EBC, WBP_Extend},
 	{0x0EC8, 0x0ECD, WBP_Extend},
 	{0x0ED0, 0x0ED9, WBP_Numeric},
 	{0x0F00, 0x0F00, WBP_ALetter},
 	{0x0F18, 0x0F19, WBP_Extend},
 	{0x0F20, 0x0F29, WBP_Numeric},
 	{0x0F35, 0x0F35, WBP_Extend},
 	{0x0F37, 0x0F37, WBP_Extend},
 	{0x0F39, 0x0F39, WBP_Extend},
 	{0x0F3E, 0x0F3F, WBP_Extend},
 	{0x0F40, 0x0F47, WBP_ALetter},
 	{0x0F49, 0x0F6C, WBP_ALetter},
 	{0x0F71, 0x0F7E, WBP_Extend},
 	{0x0F7F, 0x0F7F, WBP_Extend},
 	{0x0F80, 0x0F84, WBP_Extend},
 	{0x0F86, 0x0F87, WBP_Extend},
 	{0x0F88, 0x0F8C, WBP_ALetter},
 	{0x0F8D, 0x0F97, WBP_Extend},
 	{0x0F99, 0x0FBC, WBP_Extend},
 	{0x0FC6, 0x0FC6, WBP_Extend},
 	{0x102B, 0x102C, WBP_Extend},
 	{0x102D, 0x1030, WBP_Extend},
 	{0x1031, 0x1031, WBP_Extend},
 	{0x1032, 0x1037, WBP_Extend},
 	{0x1038, 0x1038, WBP_Extend},
 	{0x1039, 0x103A, WBP_Extend},
 	{0x103B, 0x103C, WBP_Extend},
 	{0x103D, 0x103E, WBP_Extend},
 	{0x1040, 0x1049, WBP_Numeric},
 	{0x1056, 0x1057, WBP_Extend},
 	{0x1058, 0x1059, WBP_Extend},
 	{0x105E, 0x1060, WBP_Extend},
 	{0x1062, 0x1064, WBP_Extend},
 	{0x1067, 0x106D, WBP_Extend},
 	{0x1071, 0x1074, WBP_Extend},
 	{0x1082, 0x1082, WBP_Extend},
 	{0x1083, 0x1084, WBP_Extend},
 	{0x1085, 0x1086, WBP_Extend},
 	{0x1087, 0x108C, WBP_Extend},
 	{0x108D, 0x108D, WBP_Extend},
 	{0x108F, 0x108F, WBP_Extend},
 	{0x1090, 0x1099, WBP_Numeric},
 	{0x109A, 0x109C, WBP_Extend},
 	{0x109D, 0x109D, WBP_Extend},
 	{0x10A0, 0x10C5, WBP_ALetter},
 	{0x10D0, 0x10FA, WBP_ALetter},
 	{0x10FC, 0x10FC, WBP_ALetter},
 	{0x1100, 0x1248, WBP_ALetter},
 	{0x124A, 0x124D, WBP_ALetter},
 	{0x1250, 0x1256, WBP_ALetter},
 	{0x1258, 0x1258, WBP_ALetter},
 	{0x125A, 0x125D, WBP_ALetter},
 	{0x1260, 0x1288, WBP_ALetter},
 	{0x128A, 0x128D, WBP_ALetter},
 	{0x1290, 0x12B0, WBP_ALetter},
 	{0x12B2, 0x12B5, WBP_ALetter},
 	{0x12B8, 0x12BE, WBP_ALetter},
 	{0x12C0, 0x12C0, WBP_ALetter},
 	{0x12C2, 0x12C5, WBP_ALetter},
 	{0x12C8, 0x12D6, WBP_ALetter},
 	{0x12D8, 0x1310, WBP_ALetter},
 	{0x1312, 0x1315, WBP_ALetter},
 	{0x1318, 0x135A, WBP_ALetter},
 	{0x135D, 0x135F, WBP_Extend},
 	{0x1380, 0x138F, WBP_ALetter},
 	{0x13A0, 0x13F4, WBP_ALetter},
 	{0x1401, 0x166C, WBP_ALetter},
 	{0x166F, 0x167F, WBP_ALetter},
 	{0x1681, 0x169A, WBP_ALetter},
 	{0x16A0, 0x16EA, WBP_ALetter},
 	{0x16EE, 0x16F0, WBP_ALetter},
 	{0x1700, 0x170C, WBP_ALetter},
 	{0x170E, 0x1711, WBP_ALetter},
 	{0x1712, 0x1714, WBP_Extend},
 	{0x1720, 0x1731, WBP_ALetter},
 	{0x1732, 0x1734, WBP_Extend},
 	{0x1740, 0x1751, WBP_ALetter},
 	{0x1752, 0x1753, WBP_Extend},
 	{0x1760, 0x176C, WBP_ALetter},
 	{0x176E, 0x1770, WBP_ALetter},
 	{0x1772, 0x1773, WBP_Extend},
 	{0x17B4, 0x17B5, WBP_Format},
 	{0x17B6, 0x17B6, WBP_Extend},
 	{0x17B7, 0x17BD, WBP_Extend},
 	{0x17BE, 0x17C5, WBP_Extend},
 	{0x17C6, 0x17C6, WBP_Extend},
 	{0x17C7, 0x17C8, WBP_Extend},
 	{0x17C9, 0x17D3, WBP_Extend},
 	{0x17DD, 0x17DD, WBP_Extend},
 	{0x17E0, 0x17E9, WBP_Numeric},
 	{0x180B, 0x180D, WBP_Extend},
 	{0x1810, 0x1819, WBP_Numeric},
 	{0x1820, 0x1842, WBP_ALetter},
 	{0x1843, 0x1843, WBP_ALetter},
 	{0x1844, 0x1877, WBP_ALetter},
 	{0x1880, 0x18A8, WBP_ALetter},
 	{0x18A9, 0x18A9, WBP_Extend},
 	{0x18AA, 0x18AA, WBP_ALetter},
 	{0x18B0, 0x18F5, WBP_ALetter},
 	{0x1900, 0x191C, WBP_ALetter},
 	{0x1920, 0x1922, WBP_Extend},
 	{0x1923, 0x1926, WBP_Extend},
 	{0x1927, 0x1928, WBP_Extend},
 	{0x1929, 0x192B, WBP_Extend},
 	{0x1930, 0x1931, WBP_Extend},
 	{0x1932, 0x1932, WBP_Extend},
 	{0x1933, 0x1938, WBP_Extend},
 	{0x1939, 0x193B, WBP_Extend},
 	{0x1946, 0x194F, WBP_Numeric},
 	{0x19B0, 0x19C0, WBP_Extend},
 	{0x19C8, 0x19C9, WBP_Extend},
 	{0x19D0, 0x19D9, WBP_Numeric},
 	{0x1A00, 0x1A16, WBP_ALetter},
 	{0x1A17, 0x1A18, WBP_Extend},
 	{0x1A19, 0x1A1B, WBP_Extend},
 	{0x1A55, 0x1A55, WBP_Extend},
 	{0x1A56, 0x1A56, WBP_Extend},
 	{0x1A57, 0x1A57, WBP_Extend},
 	{0x1A58, 0x1A5E, WBP_Extend},
 	{0x1A60, 0x1A60, WBP_Extend},
 	{0x1A61, 0x1A61, WBP_Extend},
 	{0x1A62, 0x1A62, WBP_Extend},
 	{0x1A63, 0x1A64, WBP_Extend},
 	{0x1A65, 0x1A6C, WBP_Extend},
 	{0x1A6D, 0x1A72, WBP_Extend},
 	{0x1A73, 0x1A7C, WBP_Extend},
 	{0x1A7F, 0x1A7F, WBP_Extend},
 	{0x1A80, 0x1A89, WBP_Numeric},
 	{0x1A90, 0x1A99, WBP_Numeric},
 	{0x1B00, 0x1B03, WBP_Extend},
 	{0x1B04, 0x1B04, WBP_Extend},
 	{0x1B05, 0x1B33, WBP_ALetter},
 	{0x1B34, 0x1B34, WBP_Extend},
 	{0x1B35, 0x1B35, WBP_Extend},
 	{0x1B36, 0x1B3A, WBP_Extend},
 	{0x1B3B, 0x1B3B, WBP_Extend},
 	{0x1B3C, 0x1B3C, WBP_Extend},
 	{0x1B3D, 0x1B41, WBP_Extend},
 	{0x1B42, 0x1B42, WBP_Extend},
 	{0x1B43, 0x1B44, WBP_Extend},
 	{0x1B45, 0x1B4B, WBP_ALetter},
 	{0x1B50, 0x1B59, WBP_Numeric},
 	{0x1B6B, 0x1B73, WBP_Extend},
 	{0x1B80, 0x1B81, WBP_Extend},
 	{0x1B82, 0x1B82, WBP_Extend},
 	{0x1B83, 0x1BA0, WBP_ALetter},
 	{0x1BA1, 0x1BA1, WBP_Extend},
 	{0x1BA2, 0x1BA5, WBP_Extend},
 	{0x1BA6, 0x1BA7, WBP_Extend},
 	{0x1BA8, 0x1BA9, WBP_Extend},
 	{0x1BAA, 0x1BAA, WBP_Extend},
 	{0x1BAE, 0x1BAF, WBP_ALetter},
 	{0x1BB0, 0x1BB9, WBP_Numeric},
 	{0x1BC0, 0x1BE5, WBP_ALetter},
 	{0x1BE6, 0x1BE6, WBP_Extend},
 	{0x1BE7, 0x1BE7, WBP_Extend},
 	{0x1BE8, 0x1BE9, WBP_Extend},
 	{0x1BEA, 0x1BEC, WBP_Extend},
 	{0x1BED, 0x1BED, WBP_Extend},
 	{0x1BEE, 0x1BEE, WBP_Extend},
 	{0x1BEF, 0x1BF1, WBP_Extend},
 	{0x1BF2, 0x1BF3, WBP_Extend},
 	{0x1C00, 0x1C23, WBP_ALetter},
 	{0x1C24, 0x1C2B, WBP_Extend},
 	{0x1C2C, 0x1C33, WBP_Extend},
 	{0x1C34, 0x1C35, WBP_Extend},
 	{0x1C36, 0x1C37, WBP_Extend},
 	{0x1C40, 0x1C49, WBP_Numeric},
 	{0x1C4D, 0x1C4F, WBP_ALetter},
 	{0x1C50, 0x1C59, WBP_Numeric},
 	{0x1C5A, 0x1C77, WBP_ALetter},
 	{0x1C78, 0x1C7D, WBP_ALetter},
 	{0x1CD0, 0x1CD2, WBP_Extend},
 	{0x1CD4, 0x1CE0, WBP_Extend},
 	{0x1CE1, 0x1CE1, WBP_Extend},
 	{0x1CE2, 0x1CE8, WBP_Extend},
 	{0x1CE9, 0x1CEC, WBP_ALetter},
 	{0x1CED, 0x1CED, WBP_Extend},
 	{0x1CEE, 0x1CF1, WBP_ALetter},
 	{0x1CF2, 0x1CF2, WBP_Extend},
 	{0x1D00, 0x1D2B, WBP_ALetter},
 	{0x1D2C, 0x1D61, WBP_ALetter},
 	{0x1D62, 0x1D77, WBP_ALetter},
 	{0x1D78, 0x1D78, WBP_ALetter},
 	{0x1D79, 0x1D9A, WBP_ALetter},
 	{0x1D9B, 0x1DBF, WBP_ALetter},
 	{0x1DC0, 0x1DE6, WBP_Extend},
 	{0x1DFC, 0x1DFF, WBP_Extend},
 	{0x1E00, 0x1F15, WBP_ALetter},
 	{0x1F18, 0x1F1D, WBP_ALetter},
 	{0x1F20, 0x1F45, WBP_ALetter},
 	{0x1F48, 0x1F4D, WBP_ALetter},
 	{0x1F50, 0x1F57, WBP_ALetter},
 	{0x1F59, 0x1F59, WBP_ALetter},
 	{0x1F5B, 0x1F5B, WBP_ALetter},
 	{0x1F5D, 0x1F5D, WBP_ALetter},
 	{0x1F5F, 0x1F7D, WBP_ALetter},
 	{0x1F80, 0x1FB4, WBP_ALetter},
 	{0x1FB6, 0x1FBC, WBP_ALetter},
 	{0x1FBE, 0x1FBE, WBP_ALetter},
 	{0x1FC2, 0x1FC4, WBP_ALetter},
 	{0x1FC6, 0x1FCC, WBP_ALetter},
 	{0x1FD0, 0x1FD3, WBP_ALetter},
 	{0x1FD6, 0x1FDB, WBP_ALetter},
 	{0x1FE0, 0x1FEC, WBP_ALetter},
 	{0x1FF2, 0x1FF4, WBP_ALetter},
 	{0x1FF6, 0x1FFC, WBP_ALetter},
 	{0x200C, 0x200D, WBP_Extend},
 	{0x200E, 0x200F, WBP_Format},
 	{0x2018, 0x2018, WBP_MidNumLet},
 	{0x2019, 0x2019, WBP_MidNumLet},
 	{0x2024, 0x2024, WBP_MidNumLet},
 	{0x2027, 0x2027, WBP_MidLetter},
 	{0x2028, 0x2028, WBP_Newline},
 	{0x2029, 0x2029, WBP_Newline},
 	{0x202A, 0x202E, WBP_Format},
 	{0x203F, 0x2040, WBP_ExtendNumLet},
 	{0x2044, 0x2044, WBP_MidNum},
 	{0x2054, 0x2054, WBP_ExtendNumLet},
 	{0x2060, 0x2064, WBP_Format},
 	{0x206A, 0x206F, WBP_Format},
 	{0x2071, 0x2071, WBP_ALetter},
 	{0x207F, 0x207F, WBP_ALetter},
 	{0x2090, 0x209C, WBP_ALetter},
 	{0x20D0, 0x20DC, WBP_Extend},
 	{0x20DD, 0x20E0, WBP_Extend},
 	{0x20E1, 0x20E1, WBP_Extend},
 	{0x20E2, 0x20E4, WBP_Extend},
 	{0x20E5, 0x20F0, WBP_Extend},
 	{0x2102, 0x2102, WBP_ALetter},
 	{0x2107, 0x2107, WBP_ALetter},
 	{0x210A, 0x2113, WBP_ALetter},
 	{0x2115, 0x2115, WBP_ALetter},
 	{0x2119, 0x211D, WBP_ALetter},
 	{0x2124, 0x2124, WBP_ALetter},
 	{0x2126, 0x2126, WBP_ALetter},
 	{0x2128, 0x2128, WBP_ALetter},
 	{0x212A, 0x212D, WBP_ALetter},
 	{0x212F, 0x2134, WBP_ALetter},
 	{0x2135, 0x2138, WBP_ALetter},
 	{0x2139, 0x2139, WBP_ALetter},
 	{0x213C, 0x213F, WBP_ALetter},
 	{0x2145, 0x2149, WBP_ALetter},
 	{0x214E, 0x214E, WBP_ALetter},
 	{0x2160, 0x2182, WBP_ALetter},
 	{0x2183, 0x2184, WBP_ALetter},
 	{0x2185, 0x2188, WBP_ALetter},
 	{0x24B6, 0x24E9, WBP_ALetter},
 	{0x2C00, 0x2C2E, WBP_ALetter},
 	{0x2C30, 0x2C5E, WBP_ALetter},
 	{0x2C60, 0x2C7C, WBP_ALetter},
 	{0x2C7D, 0x2C7D, WBP_ALetter},
 	{0x2C7E, 0x2CE4, WBP_ALetter},
 	{0x2CEB, 0x2CEE, WBP_ALetter},
 	{0x2CEF, 0x2CF1, WBP_Extend},
 	{0x2D00, 0x2D25, WBP_ALetter},
 	{0x2D30, 0x2D65, WBP_ALetter},
 	{0x2D6F, 0x2D6F, WBP_ALetter},
 	{0x2D7F, 0x2D7F, WBP_Extend},
 	{0x2D80, 0x2D96, WBP_ALetter},
 	{0x2DA0, 0x2DA6, WBP_ALetter},
 	{0x2DA8, 0x2DAE, WBP_ALetter},
 	{0x2DB0, 0x2DB6, WBP_ALetter},
 	{0x2DB8, 0x2DBE, WBP_ALetter},
 	{0x2DC0, 0x2DC6, WBP_ALetter},
 	{0x2DC8, 0x2DCE, WBP_ALetter},
 	{0x2DD0, 0x2DD6, WBP_ALetter},
 	{0x2DD8, 0x2DDE, WBP_ALetter},
 	{0x2DE0, 0x2DFF, WBP_Extend},
 	{0x2E2F, 0x2E2F, WBP_ALetter},
 	{0x3005, 0x3005, WBP_ALetter},
 	{0x302A, 0x302F, WBP_Extend},
 	{0x3031, 0x3035, WBP_Katakana},
 	{0x303B, 0x303B, WBP_ALetter},
 	{0x303C, 0x303C, WBP_ALetter},
 	{0x3099, 0x309A, WBP_Extend},
 	{0x309B, 0x309C, WBP_Katakana},
 	{0x30A0, 0x30A0, WBP_Katakana},
 	{0x30A1, 0x30FA, WBP_Katakana},
 	{0x30FC, 0x30FE, WBP_Katakana},
 	{0x30FF, 0x30FF, WBP_Katakana},
 	{0x3105, 0x312D, WBP_ALetter},
 	{0x3131, 0x318E, WBP_ALetter},
 	{0x31A0, 0x31BA, WBP_ALetter},
 	{0x31F0, 0x31FF, WBP_Katakana},
 	{0x32D0, 0x32FE, WBP_Katakana},
 	{0x3300, 0x3357, WBP_Katakana},
 	{0xA000, 0xA014, WBP_ALetter},
 	{0xA015, 0xA015, WBP_ALetter},
 	{0xA016, 0xA48C, WBP_ALetter},
 	{0xA4D0, 0xA4F7, WBP_ALetter},
 	{0xA4F8, 0xA4FD, WBP_ALetter},
 	{0xA500, 0xA60B, WBP_ALetter},
 	{0xA60C, 0xA60C, WBP_ALetter},
 	{0xA610, 0xA61F, WBP_ALetter},
 	{0xA620, 0xA629, WBP_Numeric},
 	{0xA62A, 0xA62B, WBP_ALetter},
 	{0xA640, 0xA66D, WBP_ALetter},
 	{0xA66E, 0xA66E, WBP_ALetter},
 	{0xA66F, 0xA66F, WBP_Extend},
 	{0xA670, 0xA672, WBP_Extend},
 	{0xA67C, 0xA67D, WBP_Extend},
 	{0xA67F, 0xA67F, WBP_ALetter},
 	{0xA680, 0xA697, WBP_ALetter},
 	{0xA6A0, 0xA6E5, WBP_ALetter},
 	{0xA6E6, 0xA6EF, WBP_ALetter},
 	{0xA6F0, 0xA6F1, WBP_Extend},
 	{0xA717, 0xA71F, WBP_ALetter},
 	{0xA722, 0xA76F, WBP_ALetter},
 	{0xA770, 0xA770, WBP_ALetter},
 	{0xA771, 0xA787, WBP_ALetter},
 	{0xA788, 0xA788, WBP_ALetter},
 	{0xA78B, 0xA78E, WBP_ALetter},
 	{0xA790, 0xA791, WBP_ALetter},
 	{0xA7A0, 0xA7A9, WBP_ALetter},
 	{0xA7FA, 0xA7FA, WBP_ALetter},
 	{0xA7FB, 0xA801, WBP_ALetter},
 	{0xA802, 0xA802, WBP_Extend},
 	{0xA803, 0xA805, WBP_ALetter},
 	{0xA806, 0xA806, WBP_Extend},
 	{0xA807, 0xA80A, WBP_ALetter},
 	{0xA80B, 0xA80B, WBP_Extend},
 	{0xA80C, 0xA822, WBP_ALetter},
 	{0xA823, 0xA824, WBP_Extend},
 	{0xA825, 0xA826, WBP_Extend},
 	{0xA827, 0xA827, WBP_Extend},
 	{0xA840, 0xA873, WBP_ALetter},
 	{0xA880, 0xA881, WBP_Extend},
 	{0xA882, 0xA8B3, WBP_ALetter},
 	{0xA8B4, 0xA8C3, WBP_Extend},
 	{0xA8C4, 0xA8C4, WBP_Extend},
 	{0xA8D0, 0xA8D9, WBP_Numeric},
 	{0xA8E0, 0xA8F1, WBP_Extend},
 	{0xA8F2, 0xA8F7, WBP_ALetter},
 	{0xA8FB, 0xA8FB, WBP_ALetter},
 	{0xA900, 0xA909, WBP_Numeric},
 	{0xA90A, 0xA925, WBP_ALetter},
 	{0xA926, 0xA92D, WBP_Extend},
 	{0xA930, 0xA946, WBP_ALetter},
 	{0xA947, 0xA951, WBP_Extend},
 	{0xA952, 0xA953, WBP_Extend},
 	{0xA960, 0xA97C, WBP_ALetter},
 	{0xA980, 0xA982, WBP_Extend},
 	{0xA983, 0xA983, WBP_Extend},
 	{0xA984, 0xA9B2, WBP_ALetter},
 	{0xA9B3, 0xA9B3, WBP_Extend},
 	{0xA9B4, 0xA9B5, WBP_Extend},
 	{0xA9B6, 0xA9B9, WBP_Extend},
 	{0xA9BA, 0xA9BB, WBP_Extend},
 	{0xA9BC, 0xA9BC, WBP_Extend},
 	{0xA9BD, 0xA9C0, WBP_Extend},
 	{0xA9CF, 0xA9CF, WBP_ALetter},
 	{0xA9D0, 0xA9D9, WBP_Numeric},
 	{0xAA00, 0xAA28, WBP_ALetter},
 	{0xAA29, 0xAA2E, WBP_Extend},
 	{0xAA2F, 0xAA30, WBP_Extend},
 	{0xAA31, 0xAA32, WBP_Extend},
 	{0xAA33, 0xAA34, WBP_Extend},
 	{0xAA35, 0xAA36, WBP_Extend},
 	{0xAA40, 0xAA42, WBP_ALetter},
 	{0xAA43, 0xAA43, WBP_Extend},
 	{0xAA44, 0xAA4B, WBP_ALetter},
 	{0xAA4C, 0xAA4C, WBP_Extend},
 	{0xAA4D, 0xAA4D, WBP_Extend},
 	{0xAA50, 0xAA59, WBP_Numeric},
 	{0xAA7B, 0xAA7B, WBP_Extend},
 	{0xAAB0, 0xAAB0, WBP_Extend},
 	{0xAAB2, 0xAAB4, WBP_Extend},
 	{0xAAB7, 0xAAB8, WBP_Extend},
 	{0xAABE, 0xAABF, WBP_Extend},
 	{0xAAC1, 0xAAC1, WBP_Extend},
 	{0xAB01, 0xAB06, WBP_ALetter},
 	{0xAB09, 0xAB0E, WBP_ALetter},
 	{0xAB11, 0xAB16, WBP_ALetter},
 	{0xAB20, 0xAB26, WBP_ALetter},
 	{0xAB28, 0xAB2E, WBP_ALetter},
 	{0xABC0, 0xABE2, WBP_ALetter},
 	{0xABE3, 0xABE4, WBP_Extend},
 	{0xABE5, 0xABE5, WBP_Extend},
 	{0xABE6, 0xABE7, WBP_Extend},
 	{0xABE8, 0xABE8, WBP_Extend},
 	{0xABE9, 0xABEA, WBP_Extend},
 	{0xABEC, 0xABEC, WBP_Extend},
 	{0xABED, 0xABED, WBP_Extend},
 	{0xABF0, 0xABF9, WBP_Numeric},
 	{0xAC00, 0xD7A3, WBP_ALetter},
 	{0xD7B0, 0xD7C6, WBP_ALetter},
 	{0xD7CB, 0xD7FB, WBP_ALetter},
 	{0xFB00, 0xFB06, WBP_ALetter},
 	{0xFB13, 0xFB17, WBP_ALetter},
 	{0xFB1D, 0xFB1D, WBP_ALetter},
 	{0xFB1E, 0xFB1E, WBP_Extend},
 	{0xFB1F, 0xFB28, WBP_ALetter},
 	{0xFB2A, 0xFB36, WBP_ALetter},
 	{0xFB38, 0xFB3C, WBP_ALetter},
 	{0xFB3E, 0xFB3E, WBP_ALetter},
 	{0xFB40, 0xFB41, WBP_ALetter},
 	{0xFB43, 0xFB44, WBP_ALetter},
 	{0xFB46, 0xFBB1, WBP_ALetter},
 	{0xFBD3, 0xFD3D, WBP_ALetter},
 	{0xFD50, 0xFD8F, WBP_ALetter},
 	{0xFD92, 0xFDC7, WBP_ALetter},
 	{0xFDF0, 0xFDFB, WBP_ALetter},
 	{0xFE00, 0xFE0F, WBP_Extend},
 	{0xFE10, 0xFE10, WBP_MidNum},
 	{0xFE13, 0xFE13, WBP_MidLetter},
 	{0xFE14, 0xFE14, WBP_MidNum},
 	{0xFE20, 0xFE26, WBP_Extend},
 	{0xFE33, 0xFE34, WBP_ExtendNumLet},
 	{0xFE4D, 0xFE4F, WBP_ExtendNumLet},
 	{0xFE50, 0xFE50, WBP_MidNum},
 	{0xFE52, 0xFE52, WBP_MidNumLet},
 	{0xFE54, 0xFE54, WBP_MidNum},
 	{0xFE55, 0xFE55, WBP_MidLetter},
 	{0xFE70, 0xFE74, WBP_ALetter},
 	{0xFE76, 0xFEFC, WBP_ALetter},
 	{0xFEFF, 0xFEFF, WBP_Format},
 	{0xFF07, 0xFF07, WBP_MidNumLet},
 	{0xFF0C, 0xFF0C, WBP_MidNum},
 	{0xFF0E, 0xFF0E, WBP_MidNumLet},
 	{0xFF1A, 0xFF1A, WBP_MidLetter},
 	{0xFF1B, 0xFF1B, WBP_MidNum},
 	{0xFF21, 0xFF3A, WBP_ALetter},
 	{0xFF3F, 0xFF3F, WBP_ExtendNumLet},
 	{0xFF41, 0xFF5A, WBP_ALetter},
 	{0xFF66, 0xFF6F, WBP_Katakana},
 	{0xFF70, 0xFF70, WBP_Katakana},
 	{0xFF71, 0xFF9D, WBP_Katakana},
 	{0xFF9E, 0xFF9F, WBP_Extend},
 	{0xFFA0, 0xFFBE, WBP_ALetter},
 	{0xFFC2, 0xFFC7, WBP_ALetter},
 	{0xFFCA, 0xFFCF, WBP_ALetter},
 	{0xFFD2, 0xFFD7, WBP_ALetter},
 	{0xFFDA, 0xFFDC, WBP_ALetter},
 	{0xFFF9, 0xFFFB, WBP_Format},
 	{0x10000, 0x1000B, WBP_ALetter},
 	{0x1000D, 0x10026, WBP_ALetter},
 	{0x10028, 0x1003A, WBP_ALetter},
 	{0x1003C, 0x1003D, WBP_ALetter},
 	{0x1003F, 0x1004D, WBP_ALetter},
 	{0x10050, 0x1005D, WBP_ALetter},
 	{0x10080, 0x100FA, WBP_ALetter},
 	{0x10140, 0x10174, WBP_ALetter},
 	{0x101FD, 0x101FD, WBP_Extend},
 	{0x10280, 0x1029C, WBP_ALetter},
 	{0x102A0, 0x102D0, WBP_ALetter},
 	{0x10300, 0x1031E, WBP_ALetter},
 	{0x10330, 0x10340, WBP_ALetter},
 	{0x10341, 0x10341, WBP_ALetter},
 	{0x10342, 0x10349, WBP_ALetter},
 	{0x1034A, 0x1034A, WBP_ALetter},
 	{0x10380, 0x1039D, WBP_ALetter},
 	{0x103A0, 0x103C3, WBP_ALetter},
 	{0x103C8, 0x103CF, WBP_ALetter},
 	{0x103D1, 0x103D5, WBP_ALetter},
 	{0x10400, 0x1044F, WBP_ALetter},
 	{0x10450, 0x1049D, WBP_ALetter},
 	{0x104A0, 0x104A9, WBP_Numeric},
 	{0x10800, 0x10805, WBP_ALetter},
 	{0x10808, 0x10808, WBP_ALetter},
 	{0x1080A, 0x10835, WBP_ALetter},
 	{0x10837, 0x10838, WBP_ALetter},
 	{0x1083C, 0x1083C, WBP_ALetter},
 	{0x1083F, 0x10855, WBP_ALetter},
 	{0x10900, 0x10915, WBP_ALetter},
 	{0x10920, 0x10939, WBP_ALetter},
 	{0x10A00, 0x10A00, WBP_ALetter},
 	{0x10A01, 0x10A03, WBP_Extend},
 	{0x10A05, 0x10A06, WBP_Extend},
 	{0x10A0C, 0x10A0F, WBP_Extend},
 	{0x10A10, 0x10A13, WBP_ALetter},
 	{0x10A15, 0x10A17, WBP_ALetter},
 	{0x10A19, 0x10A33, WBP_ALetter},
 	{0x10A38, 0x10A3A, WBP_Extend},
 	{0x10A3F, 0x10A3F, WBP_Extend},
 	{0x10A60, 0x10A7C, WBP_ALetter},
 	{0x10B00, 0x10B35, WBP_ALetter},
 	{0x10B40, 0x10B55, WBP_ALetter},
 	{0x10B60, 0x10B72, WBP_ALetter},
 	{0x10C00, 0x10C48, WBP_ALetter},
 	{0x11000, 0x11000, WBP_Extend},
 	{0x11001, 0x11001, WBP_Extend},
 	{0x11002, 0x11002, WBP_Extend},
 	{0x11003, 0x11037, WBP_ALetter},
 	{0x11038, 0x11046, WBP_Extend},
 	{0x11066, 0x1106F, WBP_Numeric},
 	{0x11080, 0x11081, WBP_Extend},
 	{0x11082, 0x11082, WBP_Extend},
 	{0x11083, 0x110AF, WBP_ALetter},
 	{0x110B0, 0x110B2, WBP_Extend},
 	{0x110B3, 0x110B6, WBP_Extend},
 	{0x110B7, 0x110B8, WBP_Extend},
 	{0x110B9, 0x110BA, WBP_Extend},
 	{0x110BD, 0x110BD, WBP_Format},
 	{0x12000, 0x1236E, WBP_ALetter},
 	{0x12400, 0x12462, WBP_ALetter},
 	{0x13000, 0x1342E, WBP_ALetter},
 	{0x16800, 0x16A38, WBP_ALetter},
 	{0x1B000, 0x1B000, WBP_Katakana},
 	{0x1D165, 0x1D166, WBP_Extend},
 	{0x1D167, 0x1D169, WBP_Extend},
 	{0x1D16D, 0x1D172, WBP_Extend},
 	{0x1D173, 0x1D17A, WBP_Format},
 	{0x1D17B, 0x1D182, WBP_Extend},
 	{0x1D185, 0x1D18B, WBP_Extend},
 	{0x1D1AA, 0x1D1AD, WBP_Extend},
 	{0x1D242, 0x1D244, WBP_Extend},
 	{0x1D400, 0x1D454, WBP_ALetter},
 	{0x1D456, 0x1D49C, WBP_ALetter},
 	{0x1D49E, 0x1D49F, WBP_ALetter},
 	{0x1D4A2, 0x1D4A2, WBP_ALetter},
 	{0x1D4A5, 0x1D4A6, WBP_ALetter},
 	{0x1D4A9, 0x1D4AC, WBP_ALetter},
 	{0x1D4AE, 0x1D4B9, WBP_ALetter},
 	{0x1D4BB, 0x1D4BB, WBP_ALetter},
 	{0x1D4BD, 0x1D4C3, WBP_ALetter},
 	{0x1D4C5, 0x1D505, WBP_ALetter},
 	{0x1D507, 0x1D50A, WBP_ALetter},
 	{0x1D50D, 0x1D514, WBP_ALetter},
 	{0x1D516, 0x1D51C, WBP_ALetter},
 	{0x1D51E, 0x1D539, WBP_ALetter},
 	{0x1D53B, 0x1D53E, WBP_ALetter},
 	{0x1D540, 0x1D544, WBP_ALetter},
 	{0x1D546, 0x1D546, WBP_ALetter},
 	{0x1D54A, 0x1D550, WBP_ALetter},
 	{0x1D552, 0x1D6A5, WBP_ALetter},
 	{0x1D6A8, 0x1D6C0, WBP_ALetter},
 	{0x1D6C2, 0x1D6DA, WBP_ALetter},
 	{0x1D6DC, 0x1D6FA, WBP_ALetter},
 	{0x1D6FC, 0x1D714, WBP_ALetter},
 	{0x1D716, 0x1D734, WBP_ALetter},
 	{0x1D736, 0x1D74E, WBP_ALetter},
 	{0x1D750, 0x1D76E, WBP_ALetter},
 	{0x1D770, 0x1D788, WBP_ALetter},
 	{0x1D78A, 0x1D7A8, WBP_ALetter},
 	{0x1D7AA, 0x1D7C2, WBP_ALetter},
 	{0x1D7C4, 0x1D7CB, WBP_ALetter},
 	{0x1D7CE, 0x1D7FF, WBP_Numeric},
 	{0xE0001, 0xE0001, WBP_Format},
 	{0xE0020, 0xE007F, WBP_Format},
 	{0xE0100, 0xE01EF, WBP_Extend},
 	{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}
 };
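The table above relies on two properties: its rows are in ascending, non-overlapping code point order, and it ends with a `0xFFFFFFFF` sentinel row. The following is a minimal sketch of how those properties could be checked. It assumes that `utf32_t` is a 32-bit unsigned integer (it is declared in linebreak.h, which is not shown here). The names `sample_table`, `wb_table_length` and `wb_table_is_sorted` are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t utf32_t;	/* assumption: stands in for the typedef in linebreak.h */

enum WordBreakClass
{
	WBP_Undefined, WBP_CR, WBP_LF, WBP_Newline, WBP_Extend, WBP_Format,
	WBP_Katakana, WBP_ALetter, WBP_MidNumLet, WBP_MidLetter, WBP_MidNum,
	WBP_Numeric, WBP_ExtendNumLet, WBP_Any
};

struct WordBreakProperties
{
	utf32_t start;
	utf32_t end;
	enum WordBreakClass prop;
};

/* Three rows copied from the table above, followed by the sentinel row. */
static const struct WordBreakProperties sample_table[] = {
	{0x0958, 0x0961, WBP_ALetter},
	{0x0962, 0x0963, WBP_Extend},
	{0x0966, 0x096F, WBP_Numeric},
	{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}
};

/* Hypothetical helper: count the rows that precede the sentinel. */
static size_t wb_table_length(const struct WordBreakProperties *table)
{
	size_t n = 0;
	assert(table != NULL);
	while (table[n].start != 0xFFFFFFFFu)
		++n;
	return n;
}

/* Hypothetical helper: return 1 if every range is well formed and the
 * ranges are in strictly ascending, non-overlapping order, else 0. */
static int wb_table_is_sorted(const struct WordBreakProperties *table)
{
	size_t n = wb_table_length(table);
	size_t i;
	for (i = 0; i < n; ++i)
	{
		if (table[i].start > table[i].end)
			return 0;
		if (i > 0 && table[i - 1].end >= table[i].start)
			return 0;
	}
	return 1;
}
```

A check of this kind could run once in a test build, since a lookup that assumes sorted input may give wrong results on an unsorted table.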
--- a/linebreak/linebreak/wordbreakdata1.tmpl
+++ b/linebreak/linebreak/wordbreakdata1.tmpl
@ -0,0 +1,5 @@
 #include "linebreak.h"
 #include "wordbreakdef.h"
 static struct WordBreakProperties wb_prop_default[] = {
--- a/linebreak/linebreak/wordbreakdata2.tmpl
+++ b/linebreak/linebreak/wordbreakdata2.tmpl
@ -0,0 +1,2 @@
 	{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}
 };
--- a/linebreak/linebreak/wordbreakdef.h
+++ b/linebreak/linebreak/wordbreakdef.h
@ -0,0 +1,78 @@
 /* vim: set tabstop=4 shiftwidth=4: */
 /*
 * Word breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
 * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
 * arising from the use of this software.
 *
 * Permission is granted to anyone to use this software for any purpose,
 * including commercial applications, and to alter it and redistribute
 * it freely, subject to the following restrictions:
 *
 * 1. The origin of this software must not be misrepresented; you must
 *    not claim that you wrote the original software.  If you use this
 *    software in a product, an acknowledgement in the product
 *    documentation would be appreciated but is not required.
 * 2. Altered source versions must be plainly marked as such, and must
 *    not be misrepresented as being the original software.
 * 3. This notice may not be removed or altered from any source
 *    distribution.
 *
 * The main reference is Unicode Standard Annex 29 (UAX #29):
 *		<URL:http://unicode.org/reports/tr29>
 *
 * When this library was designed, this annex was at Revision 17, for
 * Unicode 6.0.0:
 *		<URL:http://www.unicode.org/reports/tr29/tr29-17.html>
 *
 * The Unicode Terms of Use are available at
 *		<URL:http://www.unicode.org/copyright.html>
 */
 /**
 * @file	wordbreakdef.h
 *
 * Definitions of internal data structures, declarations of global
 * variables, and function prototypes for the word breaking algorithm.
 *
 * @version	2.1, 2012/01/18
 * @author	Tom Hacohen
 */
 /**
 * Word break classes.  This is a direct mapping of Table 3 of Unicode
 * Standard Annex 29, Revision 17.
 */
 enum WordBreakClass
 {
   WBP_Undefined,
   WBP_CR,
   WBP_LF,
   WBP_Newline,
   WBP_Extend,
   WBP_Format,
   WBP_Katakana,
   WBP_ALetter,
   WBP_MidNumLet,
   WBP_MidLetter,
   WBP_MidNum,
   WBP_Numeric,
   WBP_ExtendNumLet,
   WBP_Any
 };
 /**
 * Struct for entries of word break properties.  The array of the
 * entries \e must be sorted.
 */
 struct WordBreakProperties
 {
 	utf32_t start;				/**< Starting code point */
 	utf32_t end;				/**< End code point */
 	enum WordBreakClass prop;	/**< The word breaking property */
 };
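Because the array of `WordBreakProperties` entries must be sorted, a code point can be mapped to its class with a binary search over the ranges. The sketch below illustrates that idea under stated assumptions and is not presented as the library's own lookup routine. `utf32_t` is assumed to be a 32-bit unsigned integer, `wb_lookup` and `sample_props` are hypothetical names, and returning `WBP_Any` for a code point that no range covers is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t utf32_t;	/* assumption: stands in for the typedef in linebreak.h */

enum WordBreakClass
{
	WBP_Undefined, WBP_CR, WBP_LF, WBP_Newline, WBP_Extend, WBP_Format,
	WBP_Katakana, WBP_ALetter, WBP_MidNumLet, WBP_MidLetter, WBP_MidNum,
	WBP_Numeric, WBP_ExtendNumLet, WBP_Any
};

struct WordBreakProperties
{
	utf32_t start;
	utf32_t end;
	enum WordBreakClass prop;
};

/* A few rows copied from the generated table, kept in ascending order. */
static const struct WordBreakProperties sample_props[] = {
	{0x0E50, 0x0E59, WBP_Numeric},
	{0x200C, 0x200D, WBP_Extend},
	{0x200E, 0x200F, WBP_Format},
	{0x30A1, 0x30FA, WBP_Katakana},
	{0xFF21, 0xFF3A, WBP_ALetter}
};

/* Hypothetical lookup: binary search over the half-open index range
 * [lo, hi).  Returns the class of the range containing ch, or WBP_Any
 * (an assumption of this sketch) when no range contains it. */
static enum WordBreakClass wb_lookup(
		utf32_t ch,
		const struct WordBreakProperties *table,
		size_t len)
{
	size_t lo = 0;
	size_t hi = len;
	assert(table != NULL || len == 0);
	while (lo < hi)
	{
		size_t mid = lo + (hi - lo) / 2;
		if (ch < table[mid].start)
			hi = mid;			/* ch lies before this range */
		else if (ch > table[mid].end)
			lo = mid + 1;		/* ch lies after this range */
		else
			return table[mid].prop;
	}
	return WBP_Any;
}
```

Here `len` counts only the real rows, so a caller working with a sentinel-terminated table would pass the row count without the final `0xFFFFFFFF` entry.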