Initial import of linebreak

2015-05-27 00:00:50 +03:00
commit 74ca5511d7
parent 56a5d7f63f
34 changed files with 6889 additions and 0 deletions
--- a/linebreak/linebreak/AUTHORS
+++ b/linebreak/linebreak/AUTHORS
@@ -0,0 +1,8 @@
+Wu Yongwei.  Designed and implemented liblinebreak.
+
+Nikolay Pultsin.  Put forward the original requirements on liblinebreak,
+performed tests, and made a lot of suggestions on the initial versions.
+
+Thomas Klausner.  Autoconfiscated and libtoolized liblinebreak.
+
+Tom Hacohen.  Added word boundaries support.
--- a/linebreak/linebreak/CVS/Entries
+++ b/linebreak/linebreak/CVS/Entries
@@ -0,0 +1,32 @@
+/AUTHORS/1.2/Wed Jan 18 14:26:13 2012//
+/ChangeLog/1.78/Sat Aug 11 07:35:23 2012//
+/Doxyfile/1.7/Sat Aug 11 06:55:18 2012//
+/LICENCE/1.4/Sat Aug 11 07:35:23 2012//
+/LineBreak1.sed/1.2/Sun Dec  7 10:54:37 2008//
+/LineBreak2.sed/1.2/Sun Dec  7 10:54:37 2008//
+/Makefile.am/1.8/Sat Aug 11 06:55:18 2012//
+/Makefile.gcc/1.4/Thu Jan 19 14:03:34 2012//
+/Makefile.msvc/1.5/Sat Aug 11 05:57:50 2012//
+/NEWS/1.7/Sat Aug 11 06:55:18 2012//
+/README/1.8/Sat Aug 11 06:55:18 2012//
+/bootstrap/1.1/Fri Dec 12 12:01:39 2008//
+/configure.ac/1.6/Sat Aug 11 06:55:18 2012//
+/filter_dup.c/1.1/Sat Feb 23 11:53:28 2008//
+/libunibreak.pc.in/1.1/Sat Aug 11 06:55:18 2012//
+/linebreak.c/1.25/Sat May  7 19:55:10 2011//
+/linebreak.h/1.14/Sat May  7 19:55:10 2011//
+/linebreakdata.c/1.5/Sat May  7 19:40:20 2011//
+/linebreakdata1.tmpl/1.1/Sat Feb 23 11:53:28 2008//
+/linebreakdata2.tmpl/1.2/Sun Mar  2 07:30:43 2008//
+/linebreakdata3.tmpl/1.1/Sat Feb 23 11:53:28 2008//
+/linebreakdef.c/1.12/Sat May  7 19:55:10 2011//
+/linebreakdef.h/1.12/Sat May  7 19:55:10 2011//
+/purge/1.1/Fri Dec 12 12:01:39 2008//
+/sort_numeric_hex.py/1.2/Wed Jan 18 14:26:13 2012//
+/wordbreak.c/1.3/Sat Feb  4 14:32:57 2012//
+/wordbreak.h/1.4/Sat Feb  4 14:32:58 2012//
+/wordbreakdata.c/1.2/Wed Jan 18 14:26:13 2012//
+/wordbreakdata1.tmpl/1.2/Wed Jan 18 14:26:13 2012//
+/wordbreakdata2.tmpl/1.2/Wed Jan 18 14:26:13 2012//
+/wordbreakdef.h/1.2/Wed Jan 18 14:26:13 2012//
+D
--- a/linebreak/linebreak/CVS/Repository
+++ b/linebreak/linebreak/CVS/Repository
@@ -0,0 +1 @@
+common/tools/linebreak
--- a/linebreak/linebreak/CVS/Root
+++ b/linebreak/linebreak/CVS/Root
@@ -0,0 +1 @@
+:pserver:anonymous@vimgadgets.cvs.sourceforge.net:/cvsroot/vimgadgets
--- a/linebreak/linebreak/ChangeLog
+++ b/linebreak/linebreak/ChangeLog
@@ -0,0 +1,512 @@
+2012-08-11  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* LICENCE: Add copyright information about Tom Hacohen.
+
+2012-08-11  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* configure.ac (AC_INIT): Change the library name and version to
+	`libunibreak' and `1.0'.
+	(AC_PROG_LN_S): New macro.
+	(AC_OUTPUT): Change to `libunibreak.pc'.
+	* Doxyfile: (PROJECT_NAME): Change to `libunibreak'.
+	(PROJECT_NUMBER): Change to `1.0'.
+	* Makefile.am (lib_LTLIBRARIES): Change to `libunibreak.la'.
+	(pkgconfig_DATA): Change to `libunibreak.pc'.
+	(libunibreak_la_LDFLAGS): Reset the version to `1:0'.
+	(install-exec-hook): Replace the static library liblinebreak.a with
+	a symlink to libunibreak.a.
+	* NEWS: Add information about libunibreak 1.0.
+	* README: Change the library name, and add information about word
+	break.
+
+2012-08-11  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.msvc: Change the library name to `libunibreak', and the
+	output library to `unibreak.lib'.
+
+2012-02-04  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* wordbreak.h (WORDBREAK_INSIDEACHAR): Change from
+	WORDBREAK_INSIDECHAR.
+	* wordbreak.c (set_brks_to): Change `WORDBREAK_INSIDECHAR' to
+	`WORDBREAK_INSIDEACHAR'.
+
+2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* wordbreak.h: Change angle brackets to quotation marks (which
+	caused build errors).
+
+2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.gcc (CFILES): Add wordbreak.c.
+	(WordBreakProperty.txt): New target.
+	(wordbreakdata): New target.
+
+2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.am (liblinebreak_la_SOURCES): Remove wordbreakdata.c.
+	(EXTRA_DIST): Add wordbreakdata.c, wordbreakdata1.tmpl, and
+	wordbreakdata2.tmpl.
+
+2012-01-19  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.msvc: Add wordbreak files.
+
+2012-01-18  Tom Hacohen  <tom@stosb.com>
+
+	Add word breaking support.
+	* AUTHORS: Add `Tom Hacohen'.
+	* Makefile.am (include_HEADERS): Add header files for word breaking.
+	(liblinebreak_la_SOURCES): Add source files for word breaking.
+	(EXTRA_DIST): Add `sort_numeric_hex.py'.
+	(distclean-local): Clean also `WordBreakData.txt'.
+	(WordBreakProperty.txt): New target.
+	(wordbreakdata): New target.
+	* sort_numeric_hex.py: New file.
+	* wordbreak.c: New file.
+	* wordbreak.h: New file.
+	* wordbreakdef.h: New file.
+	* wordbreakdata.c: New file.
+	* wordbreakdata1.tmpl: New file.
+	* wordbreakdata2.tmpl: New file.
+
+2011-05-17  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Add support for pkg-config (thanks to Tom Hacohen).
+	* liblinebreak.pc.in: New file.
+	* configure.ac (AC_OUTPUT): Add `liblinebreak.pc'.
+	* Makefile.am (pkgconfig_DATA): Set to `liblinebreak.pc'.
+	(pkgconfigdir): Set to `$(libdir)/pkgconfig'.
+
+2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* README: Update the reference to UAX #14-26, for Unicode 6.0.0.
+
+2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* configure.ac (AC_INIT): Increase the version to 2.1.
+	* Makefile.am (liblinebreak_la_LDFLAGS): Set the version-info to
+	`2:1'.
+
+2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* LICENCE: Update the copyright year.
+
+2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Update for the 2.1 release.
+	* Doxyfile (PROJECT_NUMBER): Set to `2.1'.
+	* NEWS: Add information about the 2.1 release.
+	* linebreak.h (LINEBREAK_VERSION): Set to `0x0201'.
+	* linebreak.h: Update comments.
+	* linebreak.c: Ditto.
+	* linebreakdef.h: Ditto.
+	* linebreakdef.c: Ditto.
+
+2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreakdata.c: Regenerate from LineBreak-6.0.0.txt.
+
+2011-05-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.c (set_linebreaks): Fix the assertion failure when
+	U+FFFC (OBJECT REPLACEMENT CHARACTER) appears at the beginning of a
+	line (thanks to Tom Hacohen).
+
+2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* LICENCE: Update the copyright year.
+
+2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* NEWS: Add information about the 2.0 release.
+
+2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Doxyfile (PROJECT_NUMBER): Set to `2.0'.
+	(HAVE_DOT): Set to `YES'.
+
+2010-01-03  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.c: Update the version number in comment to 2.0.
+	* linebreak.h: Ditto.
+	* linebreakdef.c: Ditto.
+	* linebreakdef.h: Ditto.
+
+2009-12-17  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Change the enumerators of enum BreakAction to names of the same length.
+	* linebreak.c (DIRECT_BRK): Rename to DIR_BRK.
+	(INDIRECT_BRK): Rename to IND_BRK.
+	(CM_INDIRECT_BRK): Rename to CMI_BRK.
+	(CM_PROHIBITED_BRK): Rename to CMP_BRK.
+	(PROHIBITED_BRK): Rename to PRH_BRK.
+
+2009-11-29  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Doxyfile (TAB_SIZE): Set to the correct size `4', as used in the
+	source files.
+
+2009-11-29  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Update files according to UAX #14-24, for Unicode 5.2.0.
+	* linebreak.c: Update comments about UAX #14.
+	* linebreak.h: Ditto.
+	* linebreakdef.c: Ditto.
+	* linebreakdef.h: Ditto.
+	(LBP_CP): New enumerator for the new `CP' class as defined in
+	UAX #14-24.
+	* linebreak.c (baTable): Update for the new class `CP'.
+	* linebreakdata.c: Regenerate from LineBreak-5.2.0.txt.
+	* README: Update the reference to UAX #14-24, for Unicode 5.2.0.
+
+2009-05-03  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* NEWS: Add information about the 1.2 release.
+
+2009-04-30  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Optimize the Doxygen output.
+	* linebreak.c (lb_prop_index): Adjust its definition format
+	slightly.
+
+2009-04-30  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Doxyfile (USE_WINDOWS_ENCODING): Remove obsolete tag.
+	(DETAILS_AT_TOP): Ditto.
+	(MAX_DOT_GRAPH_WIDTH): Ditto.
+	(MAX_DOT_GRAPH_HEIGHT): Ditto.
+	(REFERENCED_BY_RELATION): Set to `NO'.
+	(REFERENCES_RELATION): Ditto.
+	(EXCLUDE): Add `filter_dup.c'.
+
+2009-04-28  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.c (lb_get_next_char_utf8): Fix the issue that the index
+	can point to the middle of a UTF-8 sequence if End of String (EOS)
+	is encountered prematurely (thanks to Nikolay Pultsin and Rick Xu).
+	(lb_get_next_char_utf16): Fix the issue that the index can point to
+	the middle of a UTF-16 surrogate pair if EOS is encountered
+	prematurely.
+
+2009-04-20  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreakdef.c (lb_prop_English): Remove the specialization of
+	right single quotation mark as closing punctuation mark, because it
+	can be used as apostrophe.
+	(lb_prop_Spanish): Ditto.
+	(lb_prop_French): Ditto.
+
+2009-04-09  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.msvc: Make the `clean' target work on MSVC versions other
+	than 6.0; do not use precompiled header.
+
+2009-03-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.h: Correct the wrong date in the documentation comment.
+	* linebreakdef.h: Ditto.
+
+2009-02-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* configure.ac (AC_INIT): Increase the version to 2.0.
+	* Makefile.am (liblinebreak_la_LDFLAGS): Set the version-info to
+	`2:0'.
+
+2009-02-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.h (LINEBREAK_VERSION): New macro.
+	(linebreak_version): New global constant declaration.
+	* linebreak.c (linebreak_version): New global constant definition.
+
+2009-02-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Reduce namespace pollution.
+	* linebreak.c (get_lb_prop_lang): Mark as static.
+	(get_next_char_utf8): Rename to lb_get_next_char_utf8.
+	(get_next_char_utf16): Rename to lb_get_next_char_utf16.
+	(get_next_char_utf32): Rename to lb_get_next_char_utf32.
+	(is_breakable): Rename to is_line_breakable.
+	* linebreak.h (get_next_char_utf8): Remove the function prototype
+	declaration.
+	(get_next_char_utf16): Ditto.
+	(get_next_char_utf32): Ditto.
+	(is_breakable): Rename to is_line_breakable.
+	* linebreakdef.h (lb_get_next_char_utf8): Add the function prototype
+	declaration.
+	(lb_get_next_char_utf16): Ditto.
+	(lb_get_next_char_utf32): Ditto.
+
+2009-02-06  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* NEWS: Add information about the 1.1 release.
+
+2009-01-02  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.am (EXTRA_DIST): Add the missing `LICENCE' file.
+
+2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.c: Update the version number in comment to 1.0.
+	* linebreak.h: Ditto.
+	* linebreakdef.c: Ditto.
+	* linebreakdef.h: Ditto.
+
+2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* NEWS: Update for the 1.0 release.
+
+2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* README: Correct two typos.
+
+2008-12-31  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* README: Add the online URL reference.
+
+2008-12-30  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* README: Update the reference to UAX #14-22, for Unicode 5.1.0.
+
+2008-12-13  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Update files according to UAX #14-22, for Unicode 5.1.0.
+	* linebreak.c (baTable): Update according to Table 2 of UAX #14-22.
+	* linebreakdef.c (lb_prop_Spanish): Remove the unnecessary
+	customization for inverted marks in Spanish.
+	* linebreakdata.c: Regenerate from LineBreak-5.1.0.txt.
+	* linebreak.h: Update comment only.
+	* linebreakdef.h: Ditto.
+
+2008-12-12  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* README: Update for the new build methods and better readability.
+
+2008-12-12  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.msvc: Correct the inconsistent naming in the output
+	message.
+
+2008-12-12  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* configure.ac (AM_INIT_AUTOMAKE): Mark `foreign'.
+	* bootstrap: New file.
+	* purge: New file.
+	* Makefile.gcc (purge): Remove this target.
+
+2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* NEWS: New file.
+
+2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* AUTHORS: New file.
+
+2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.gcc (purge): New phony target to purge files generated by
+	autoconfiscation.
+
+2008-12-10  Thomas Klausner  <tk@giga.or.at>
+
+	* configure.ac: New file.
+	* Makefile.am: New file.
+
+2008-12-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Doxyfile (OUTPUT_DIRECTORY): Set to `doc'.
+	(ALPHABETICAL_INDEX): Set to `YES'.
+
+2008-12-09  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.msvc: New file.
+
+2008-12-09  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile: Remove (to become Makefile.gcc).
+	* Makefile.gcc: New file (was Makefile).
+
+2008-12-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.c: Adjust the comment that refers to Unicode Annex 14.
+	* linebreak.h: Ditto.
+	* linebreakdef.c: Ditto.
+	* linebreakdef.h: Ditto.
+
+2008-12-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Use only POSIX basic regexp to ensure maximum portability (issues
+	have been found on Mac OS X, where GNU extensions do not work).
+	* LineBreak1.sed: Replace `[:xdigit:]' with `0-9A-F', and `\+' with
+	`\{1,\}'.
+	* LineBreak2.sed: Ditto.
+
+2008-12-07  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile: Replace `*.exe' with `filter_dup$(EXEEXT)', since the
+	extension `.exe' is specific to Windows.
+
+2008-04-20  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Add README and LICENCE files, as well as a Doxyfile to generate
+	documents.
+	* README: New file.
+	* LICENCE: New file.
+	* Doxyfile: New file.
+	* Makefile (doc): Add new phony target.
+
+2008-04-04  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Remove the English override for plus sign: it is better treated in
+	the text breaking program (see ../breaktext/ for an example).
+	* linebreakdef.c (lb_prop_English): Remove the line for plus sign.
+
+2008-03-29  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile: Correct the dependency-making rules when OLDGCC=Y.
+
+2008-03-23  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile (clean): Do not remove *.exe and tags here.
+	(distclean): Remove *.exe and tags.
+
+2008-03-23  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Remove the English override for solidus: it is better treated in the
+	text breaking program (see ../breaktext/ for an example).
+	* linebreakdef.c (lb_prop_English): Remove the line for solidus.
+
+2008-03-16  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Rename init_linebreak_prop_index to init_linebreak for future
+	safety; make visible certain functions that are potentially useful.
+	* linebreak.c (init_linebreak_prop_index): Rename to init_linebreak.
+	(get_next_char_t): Move to linebreakdef.h.
+	(get_next_char_utf8): Make non-static.
+	(get_next_char_utf16): Ditto.
+	(get_next_char_utf32): Ditto.
+	(set_linebreaks): Ditto.
+	* linebreak.h (init_linebreak_prop_index): Rename to init_linebreak.
+	(get_next_char_utf8): Add the function prototype.
+	(get_next_char_utf16): Ditto.
+	(get_next_char_utf32): Ditto.
+	* linebreakdef.h (get_next_char_t): Add the typedef.
+	(set_linebreaks): Add the function prototype.
+
+2008-03-16  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile (OLDGCC): Add support for GCC 2.95.3 (when OLDGCC=Y).
+
+2008-03-15  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.c (set_linebreaks): Fix a bug that `==' was wrongly used
+	for `='.
+
+2008-03-05  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Improve the performance by reducing the look-ups of the
+	language-specific line breaking properties array from the language
+	name (thanks to Nikolay Pultsin).
+	* linebreak.c (get_lb_prop_lang): New function.
+	(get_char_lb_class_lang): Change the second parameter from the
+	language name to the line breaking properties array.
+	(set_linebreaks): Look up the language-specific line breaking
+	properties array from the language name only once in one function
+	call.
+
+2008-03-03  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Make minor adjustments in code and comments.
+	* linebreak.c: Adjust the doc comments.
+	(init_linebreak_prop_index): Modify a conditional to make it more
+	robust and consistent.
+	* linebreakdef.c (lb_prop_lang_map): Replace the pointer
+	lb_prop_default with NULL, since the value is never used.
+
+2008-03-03  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Accelerate get_char_lb_class for invalid Unicode code points.
+	* linebreak.c (get_char_lb_class): Adjust the conditionals so that
+	getting the line breaking class for an invalid code point is much
+	faster, which requires the array of line breaking properties be
+	sorted.
+	* linebreakdef.h: Adjust a comment that the array of line break
+	properties must be sorted.
+
+2008-03-02  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Change the enumerators of enum BreakAction to more complete names.
+	* linebreak.c (INDRCT_BRK): Rename to INDIRECT_BRK.
+	(CM_INDRCT_BRK): Rename to CM_INDIRECT_BRK.
+	(CM_PROHIBTD_BRK): Rename to CM_PROHIBITED_BRK.
+	(PROHIBTD_BRK): Rename to PROHIBITED_BRK.
+
+2008-03-02  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Implement a two-stage search in get_char_lb_class_default to
+	accelerate the overall performance, especially for non-Latin
+	languages.
+	* linebreak.c (LINEBREAK_INDEX_SIZE): New constant macro.
+	(struct LineBreakPropertiesIndex): New struct.
+	(lb_prop_index): New static variable.
+	(init_linebreak_prop_index): New function.
+	(get_char_lb_class_default): New function.
+	(get_char_lb_class_lang): Use get_char_lb_class_default.
+	* linebreak.h: Detect C++ and add extern "C" guard if necessary.
+	(init_linebreak_prop_index): Add the prototype declaration.
+	* linebreakdef.h: Adjust a comment.
+
+2008-03-02  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Split/refactor the code; add (doc) comments.
+	* Makefile (CFILES): Add linebreakdata.c and linebreakdef.c.
+	* linebreak.c: Add and adjust comments.
+	(linebreakdef.h): Add include file.
+	(linebreakdata.c): Remove include file.
+	(EOS): Remove (now in linebreakdef.h).
+	(enum LineBreakClass): Ditto.
+	(struct LineBreakProperties): Ditto.
+	(lbpEnglish): Remove (now in linebreakdef.c as lb_prop_English).
+	(lbpGerman): Remove (now in linebreakdef.c as lb_prop_German).
+	(lbpSpanish): Remove (now in linebreakdef.c as lb_prop_Spanish).
+	(lbpFrench): Remove (now in linebreakdef.c as lb_prop_French).
+	(lbpRussian): Remove (now in linebreakdef.c as lb_prop_Russian).
+	(lbpChinese): Remove (now in linebreakdef.c as lb_prop_Chinese).
+	(struct LineBreakPropertiesLang): Remove (now in linebreakdef.h).
+	(lbpLangs): Remove (now in linebreakdef.c as lb_prop_lang_map).
+	(get_next_char_utf16): Make sure memory access does not go beyond len.
+	* linebreak.h: Add copyright information and adjust comments.
+	(stddef.h): Add include file.
+	* linebreakdata.c (linebreak.h): Add include file.
+	(linebreakdef.h): Add include file.
+	(lbpDefault): Make global and rename to lb_prop_default.
+	* linebreakdata2.tmpl: Add two include files, a comment line, and
+	remove `static'.
+	* linebreakdef.c: New file.
+	* linebreakdef.h: New file.
+
+2008-02-26  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* linebreak.c (lbpSpanish): New array for Spanish-specific data.
+	(lbpLangs): Update the index array for Spanish.
+	(resolve_lb_class): Resolve AmbIguous class to IDeographic in
+	Chinese, Japanese, and Korean.
+
+2008-02-26  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile (LineBreak.txt): Add new rule to retrieve it from the Web
+	if it is not already there.
+
+2008-02-23  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Add files for linebreak.
+	* LineBreak1.sed: New file.
+	* LineBreak2.sed: New file.
+	* Makefile: New file.
+	* filter_dup.c: New file.
+	* linebreak.c: New file.
+	* linebreak.h: New file.
+	* linebreakdata.c: New file.
+	* linebreakdata1.tmpl: New file.
+	* linebreakdata2.tmpl: New file.
+	* linebreakdata3.tmpl: New file.
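Note on the 2008-03-02 ChangeLog entry above ("Implement a two-stage search in get_char_lb_class_default"): linebreak.c itself is not part of this excerpt, so the following is only a minimal Python sketch of what a two-stage search over a sorted range table might look like. The table contents, `INDEX_SIZE`, `build_index` and `lookup` are illustrative assumptions, not the library's data or API; the entry itself only names `LINEBREAK_INDEX_SIZE`, `lb_prop_index` and `init_linebreak_prop_index`.

```python
# Made-up, sorted (first code point, last code point, class) ranges.
# The real table is generated into linebreakdata.c.
TABLE = [
    (0x0000, 0x0008, "CM"),
    (0x0009, 0x0009, "BA"),
    (0x000A, 0x000A, "LF"),
    (0x0020, 0x0020, "SP"),
    (0x0030, 0x0039, "NU"),
    (0x0041, 0x005A, "AL"),
    (0x0061, 0x007A, "AL"),
    (0x4E00, 0x9FFF, "ID"),
]

INDEX_SIZE = 4  # hypothetical stand-in for LINEBREAK_INDEX_SIZE


def build_index(table, size):
    """Stage 1 setup: split the sorted table into at most `size` slices.

    Each index entry is (last code point in slice, slice start, slice stop).
    """
    step = -(-len(table) // size)  # ceiling division
    index = []
    for lo in range(0, len(table), step):
        hi = min(lo + step, len(table))
        index.append((table[hi - 1][1], lo, hi))
    return index


def lookup(cp, table, index, default="XX"):
    """Pick a slice through the index, then scan only that slice."""
    for last_cp, lo, hi in index:
        if cp <= last_cp:
            for start, end, cls in table[lo:hi]:
                if cp < start:
                    break  # sorted table: cp falls into a gap
                if cp <= end:
                    return cls
            return default
    return default  # beyond the last range


INDEX = build_index(TABLE, INDEX_SIZE)
print(lookup(0x35, TABLE, INDEX))
print(lookup(0x5000, TABLE, INDEX))
```

The index narrows the search to one slice first, so code points far into the table do not pay for a scan from the beginning. That appears to be the benefit the entry describes for non-Latin languages, and it is also why a later entry requires the properties array to be sorted.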
--- a/linebreak/linebreak/Doxyfile
+++ b/linebreak/linebreak/Doxyfile
--- a/linebreak/linebreak/LICENCE
+++ b/linebreak/linebreak/LICENCE
@@ -0,0 +1,19 @@
+Copyright (C) 2008-2012 Wu Yongwei <wuyongwei at gmail dot com>
+Copyright (C) 2012 Tom Hacohen <tom dot hacohen at samsung dot com>
+
+This software is provided 'as-is', without any express or implied
+warranty.  In no event will the author be held liable for any damages
+arising from the use of this software.
+
+Permission is granted to anyone to use this software for any purpose,
+including commercial applications, and to alter it and redistribute it
+freely, subject to the following restrictions:
+
+1. The origin of this software must not be misrepresented; you must not
+   claim that you wrote the original software.  If you use this software
+   in a product, an acknowledgement in the product documentation would
+   be appreciated but is not required.
+2. Altered source versions must be plainly marked as such, and must not
+   be misrepresented as being the original software.
+3. This notice may not be removed or altered from any source
+   distribution.
--- a/linebreak/linebreak/LineBreak1.sed
+++ b/linebreak/linebreak/LineBreak1.sed
@@ -0,0 +1 @@
+s/\(^[0-9A-F.]\{1,\};[A-Z][A-Z0-9]\) #.*/\1/p
--- a/linebreak/linebreak/LineBreak2.sed
+++ b/linebreak/linebreak/LineBreak2.sed
@@ -0,0 +1,2 @@
+s/^\([0-9A-F]\{1,\}\);/\1..\1;/
+s/^\([0-9A-F]\{1,\}\)\.\.\([0-9A-F]\{1,\}\);\([A-Z][A-Z0-9]\)/	{ 0x\1, 0x\2, LBP_\3 },/
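Note on the two sed scripts above: LineBreak1.sed keeps only the data lines of LineBreak.txt and strips the trailing comment, and LineBreak2.sed first expands a single code point into a range and then rewrites each entry as a C initializer row. A minimal Python sketch of the same two passes is given below for readers who find the basic-regexp syntax hard to follow. The helper name `convert` and the sample lines are assumptions for illustration, and the `filter_dup` step that the Makefiles run afterwards is not modelled, since filter_dup.c is not shown here.

```python
import re

# The same patterns as LineBreak1.sed and LineBreak2.sed, in Python syntax.
KEEP = re.compile(r"^([0-9A-F.]+;[A-Z][A-Z0-9]) #.*")
SINGLE = re.compile(r"^([0-9A-F]+);")
RANGE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);([A-Z][A-Z0-9])")


def convert(lines):
    """Turn LineBreak.txt-style lines into C initializer rows (hypothetical helper)."""
    rows = []
    for line in lines:
        kept = KEEP.match(line.rstrip("\n"))
        if kept is None:
            continue  # `sed -n` with the `p` flag prints only substituted lines
        entry = kept.group(1)  # e.g. "0009;BA", trailing comment dropped
        entry = SINGLE.sub(r"\1..\1;", entry, count=1)  # "0009;" -> "0009..0009;"
        m = RANGE.match(entry)
        if m:
            entry = "\t{ 0x%s, 0x%s, LBP_%s }," % m.groups() + entry[m.end():]
        rows.append(entry)
    return rows


# Sample lines in the general shape of LineBreak.txt (illustrative only).
SAMPLE = [
    "# LineBreak-6.0.0.txt",
    "0000..0008;CM # Cc   [9] <control-0000>..<control-0008>",
    "0009;BA # Cc       <control-0009>",
]
ROWS = convert(SAMPLE)
for row in ROWS:
    print(row)
```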
--- a/linebreak/linebreak/Makefile.am
+++ b/linebreak/linebreak/Makefile.am
@@ -0,0 +1,63 @@
+#noinst_PROGRAMS = filter_dup
+include_HEADERS = linebreak.h linebreakdef.h wordbreak.h wordbreakdef.h
+lib_LTLIBRARIES = libunibreak.la
+pkgconfig_DATA  = libunibreak.pc
+pkgconfigdir    = ${libdir}/pkgconfig
+
+libunibreak_la_LDFLAGS = -no-undefined -version-info 1:0
+libunibreak_la_SOURCES = \
+	linebreak.c \
+	linebreakdata.c \
+	linebreakdef.c \
+	wordbreak.c
+
+EXTRA_DIST = \
+	LineBreak1.sed \
+	LineBreak2.sed \
+	linebreakdata1.tmpl \
+	linebreakdata2.tmpl \
+	linebreakdata3.tmpl \
+	wordbreakdata1.tmpl \
+	wordbreakdata2.tmpl \
+	wordbreakdata.c \
+	LICENCE \
+	Doxyfile \
+	Makefile.gcc \
+	Makefile.msvc \
+	doc \
+	sort_numeric_hex.py
+
+install-exec-hook:
+	rm -f ${libdir}/liblinebreak.a
+	${LN_S} ${libdir}/libunibreak.a ${libdir}/liblinebreak.a
+
+distclean-local:
+	rm -f LineBreak.txt WordBreakProperty.txt filter_dup${EXEEXT}
+
+doc:
+	cd ${top_srcdir} && doxygen
+
+LineBreak.txt:
+	wget http://unicode.org/Public/UNIDATA/LineBreak.txt
+
+WordBreakProperty.txt:
+	wget http://www.unicode.org/Public/UNIDATA/auxiliary/WordBreakProperty.txt
+
+linebreakdata: ${builddir}/filter_dup LineBreak.txt
+	sed -n -f ${top_srcdir}/LineBreak1.sed LineBreak.txt > tmp.txt
+	sed -f ${top_srcdir}/LineBreak2.sed tmp.txt | ${builddir}/filter_dup > tmp.c
+	head -2 LineBreak.txt > tmp.txt
+	cat ${top_srcdir}/linebreakdata1.tmpl tmp.txt ${top_srcdir}/linebreakdata2.tmpl tmp.c ${top_srcdir}/linebreakdata3.tmpl > ${top_srcdir}/linebreakdata.c
+	rm tmp.txt tmp.c
+
+wordbreakdata: WordBreakProperty.txt
+	sed -E -n 's/(^[0-9A-F.]+)/\1/p' WordBreakProperty.txt > tmp2.txt
+	sed -E -i.bak 's/^([0-9A-F]+) +/\1..\1/' tmp2.txt
+	${top_srcdir}/sort_numeric_hex.py tmp2.txt > tmp.txt
+	rm tmp2.txt tmp2.txt.bak
+	sed -E -i.bak -n 's/^([0-9A-F]+)..([0-9A-F]+) *; *([A-Za-z]+).*/'$$'\t''{0x\1, 0x\2, WBP_\3},/p' tmp.txt 
+	echo "/* The content of this file is generated from:" > ${top_srcdir}/wordbreakdata.c
+	head -2 WordBreakProperty.txt >> ${top_srcdir}/wordbreakdata.c
+	echo "*/" >> ${top_srcdir}/wordbreakdata.c
+	cat ${top_srcdir}/wordbreakdata1.tmpl tmp.txt ${top_srcdir}/wordbreakdata2.tmpl >> ${top_srcdir}/wordbreakdata.c
+	rm tmp.txt tmp.txt.bak
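Note on the `wordbreakdata` target above: the recipe filters WordBreakProperty.txt down to its data lines, expands single code points into ranges, sorts the entries, and rewrites each one as a `{0x..., 0x..., WBP_...},` row. The Python sketch below walks through the same steps under stated assumptions: `wordbreak_rows` and the sample lines are hypothetical, sort_numeric_hex.py is not shown in this excerpt and is assumed to order entries by their first code point, and the `..` separator is matched literally here whereas the sed expression leaves the dots unescaped.

```python
import re

SINGLE = re.compile(r"^([0-9A-F]+) +")
ENTRY = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+) *; *([A-Za-z]+)")


def wordbreak_rows(lines):
    """Build C initializer rows from WordBreakProperty.txt-style lines."""
    entries = []
    for line in lines:
        if not re.match(r"[0-9A-F.]+", line):
            continue  # skip comments and blank lines
        # "000D      ; CR" -> "000D..000D; CR"
        line = SINGLE.sub(r"\1..\1", line, count=1)
        m = ENTRY.match(line)
        if m:
            first, last, prop = m.groups()
            entries.append((int(first, 16), first, last, prop))
    # Assumption: sort_numeric_hex.py sorts by the numeric first code point.
    entries.sort()
    return ["\t{0x%s, 0x%s, WBP_%s}," % (f, l, p) for _, f, l, p in entries]


# Sample lines in the general shape of WordBreakProperty.txt (illustrative only).
SAMPLE = [
    "# WordBreakProperty-6.0.0.txt",
    "",
    "0027          ; MidNumLet # Po       APOSTROPHE",
    "000D          ; CR # Cc       <control-000D>",
    "0041..005A    ; ALetter # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
]
ROWS = wordbreak_rows(SAMPLE)
for row in ROWS:
    print(row)
```

Sorting matters because the property file groups entries by class rather than by code point, and a range lookup over the generated table would typically expect ascending order.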
--- a/linebreak/linebreak/Makefile.gcc
+++ b/linebreak/linebreak/Makefile.gcc
@@ -0,0 +1,177 @@
+# Windows/Cygwin support
+ifdef windir
+    WINDOWS := 1
+    CYGWIN  := 0
+else
+    ifdef WINDIR
+        WINDOWS := 1
+        CYGWIN  := 1
+    else
+        WINDOWS := 0
+    endif
+endif
+ifeq ($(WINDOWS),1)
+    EXEEXT := .exe
+    DLLEXT := .dll
+    DEVNUL := nul
+    ifeq ($(CYGWIN),1)
+        PATHSEP := /
+    else
+        PATHSEP := $(strip \ )
+    endif
+else
+    EXEEXT :=
+    DLLEXT := .so
+    DEVNUL := /dev/null
+    PATHSEP := /
+endif
+
+CFG ?= Debug
+ifeq ($(CFG),Debug)
+    all: debug
+else
+    all: release
+endif
+
+OLDGCC ?= N
+
+DEBUG   := DebugDir
+RELEASE := ReleaseDir
+
+$(DEBUG)/%.o: %.c
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -c -o $@ $<
+
+$(RELEASE)/%.o: %.c
+	$(CC) $(CFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -c -o $@ $<
+
+$(DEBUG)/%.o: %.cpp
+	$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -c -o $@ $<
+
+$(RELEASE)/%.o: %.cpp
+	$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -c -o $@ $<
+
+ifeq ($(OLDGCC),N)
+
+$(DEBUG)/%.dep: %.c
+	$(CC) -MM -MT $(patsubst %.dep,%.o,$@) $(CFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -o $@ $<
+
+$(RELEASE)/%.dep: %.c
+	$(CC) -MM -MT $(patsubst %.dep,%.o,$@) $(CFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -o $@ $<
+
+$(DEBUG)/%.dep: %.cpp
+	$(CXX) -MM -MT $(patsubst %.dep,%.o,$@) $(CXXFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) -o $@ $<
+
+$(RELEASE)/%.dep: %.cpp
+	$(CXX) -MM -MT $(patsubst %.dep,%.o,$@) $(CXXFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) -o $@ $<
+
+else
+
+$(DEBUG)/%.dep: %.c
+	$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(DEBUG)/!" > $@
+
+$(RELEASE)/%.dep: %.c
+	$(CC) -MM $(CFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(RELEASE)/!" > $@
+
+$(DEBUG)/%.dep: %.cpp
+	$(CXX) -MM $(CXXFLAGS) $(CPPFLAGS) $(DBGFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(DEBUG)/!" > $@
+
+$(RELEASE)/%.dep: %.cpp
+	$(CXX) -MM $(CXXFLAGS) $(CPPFLAGS) $(RELFLAGS) $(TARGET_ARCH) $< | sed "s!^!$(RELEASE)/!" > $@
+
+endif
+
+CC  = gcc
+CXX = g++
+AR  = ar
+LD  = $(CXX) $(CXXFLAGS) $(TARGET_ARCH)
+
+INCLUDE  = -I. $(patsubst %,-I%,$(VPATH))
+CFLAGS   = -W -Wall $(INCLUDE)
+CXXFLAGS = $(CFLAGS)
+DBGFLAGS = -D_DEBUG -g
+RELFLAGS = -DNDEBUG -O2
+CPPFLAGS =
+
+ifeq ($(OLDGCC),N)
+    CFLAGS += -fmessage-length=0
+endif
+
+HFILES   = $(wildcard $(patsubst -I%,%/*.h,$(INCLUDE)))
+OBJFILES = $(CFILES:.c=.o) $(CXXFILES:.cpp=.o)
+
+DEBUG_OBJS   = $(patsubst %.o,$(DEBUG)/%.o,$(OBJFILES))
+RELEASE_OBJS = $(patsubst %.o,$(RELEASE)/%.o,$(OBJFILES))
+
+DEBUG_DEPS   = $(patsubst %.o,%.dep,$(DEBUG_OBJS))
+RELEASE_DEPS = $(patsubst %.o,%.dep,$(RELEASE_OBJS))
+
+CFILES   := linebreak.c linebreakdata.c linebreakdef.c wordbreak.c
+CXXFILES :=
+
+LIBS :=
+
+TARGET         = liblinebreak.a
+DEBUG_TARGET   = $(patsubst %,$(DEBUG)/%,$(TARGET))
+RELEASE_TARGET = $(patsubst %,$(RELEASE)/%,$(TARGET))
+
+debug:   $(DEBUG) $(DEBUG_TARGET)
+
+release: $(RELEASE) $(RELEASE_TARGET)
+
+
+
+$(DEBUG):
+	mkdir $(DEBUG)
+
+$(RELEASE):
+	mkdir $(RELEASE)
+
+$(DEBUG_TARGET): $(DEBUG_DEPS) $(DEBUG_OBJS)
+	$(AR) -r $(DEBUG_TARGET) $(DEBUG_OBJS)
+
+$(RELEASE_TARGET): $(RELEASE_DEPS) $(RELEASE_OBJS)
+	$(AR) -r $(RELEASE_TARGET) $(RELEASE_OBJS)
+
+doc:
+	doxygen
+
+linebreakdata: filter_dup$(EXEEXT) LineBreak.txt
+	sed -n -f LineBreak1.sed LineBreak.txt > tmp.txt
+	sed -f LineBreak2.sed tmp.txt | .$(PATHSEP)filter_dup > tmp.c
+	head -2 LineBreak.txt > tmp.txt
+	cat linebreakdata1.tmpl tmp.txt linebreakdata2.tmpl tmp.c linebreakdata3.tmpl > linebreakdata.c
+	$(RM) tmp.txt tmp.c
+
+wordbreakdata: WordBreakProperty.txt
+	sed -E -n 's/(^[0-9A-F.]+)/\1/p' WordBreakProperty.txt > tmp2.txt
+	sed -E -i.bak 's/^([0-9A-F]+) +/\1..\1/' tmp2.txt
+	./sort_numeric_hex.py tmp2.txt > tmp.txt
+	rm tmp2.txt tmp2.txt.bak
+	sed -E -i.bak -n 's/^([0-9A-F]+)..([0-9A-F]+) *; *([A-Za-z]+).*/'$$'\t''{0x\1, 0x\2, WBP_\3},/p' tmp.txt 
+	echo "/* The content of this file is generated from:" > wordbreakdata.c
+	head -2 WordBreakProperty.txt >> wordbreakdata.c
+	echo "*/" >> wordbreakdata.c
+	cat wordbreakdata1.tmpl tmp.txt wordbreakdata2.tmpl >> wordbreakdata.c
+	rm tmp.txt tmp.txt.bak
+
+filter_dup$(EXEEXT): filter_dup.c
+	$(CC) -O2 -o filter_dup$(EXEEXT) $<
+
+LineBreak.txt:
+	wget http://unicode.org/Public/UNIDATA/LineBreak.txt
+
+WordBreakProperty.txt:
+	wget http://www.unicode.org/Public/UNIDATA/auxiliary/WordBreakProperty.txt
+
+.PHONY: all debug release clean distclean doc linebreakdata wordbreakdata
+
+clean:
+	$(RM) $(DEBUG)/*.o $(DEBUG)/*.dep $(DEBUG_TARGET)
+	$(RM) $(RELEASE)/*.o $(RELEASE)/*.dep $(RELEASE_TARGET)
+
+distclean: clean
+	$(RM) $(DEBUG)/* $(RELEASE)/* filter_dup$(EXEEXT) tags LineBreak.txt WordBreakProperty.txt
+	-rmdir $(DEBUG) 2> $(DEVNUL)
+	-rmdir $(RELEASE) 2> $(DEVNUL)
+
+-include $(wildcard $(DEBUG)/*.dep) $(wildcard $(RELEASE)/*.dep)
--- a/linebreak/linebreak/Makefile.msvc
+++ b/linebreak/linebreak/Makefile.msvc
@@ -0,0 +1,189 @@
+# Makefile for Microsoft Visual C++ and NMAKE
+
+!IF "$(CFG)" == ""
+CFG=libunibreak - Win32 Debug
+!MESSAGE No configuration specified. Defaulting to libunibreak - Win32 Debug.
+!ENDIF 
+
+!IF "$(CFG)" != "libunibreak - Win32 Release" && "$(CFG)" != "libunibreak - Win32 Debug"
+!MESSAGE Invalid configuration "$(CFG)" specified.
+!MESSAGE You can specify a configuration when running NMAKE
+!MESSAGE by defining the macro CFG on the command line. For example:
+!MESSAGE 
+!MESSAGE NMAKE /f Makefile.msvc CFG="libunibreak - Win32 Debug"
+!MESSAGE 
+!MESSAGE Possible choices for configuration are:
+!MESSAGE 
+!MESSAGE "libunibreak - Win32 Release" (based on "Win32 (x86) Static Library")
+!MESSAGE "libunibreak - Win32 Debug" (based on "Win32 (x86) Static Library")
+!MESSAGE 
+!ERROR An invalid configuration is specified.
+!ENDIF 
+
+!IF "$(OS)" == "Windows_NT"
+NULL=
+!ELSE 
+NULL=nul
+!ENDIF 
+
+CPP=cl.exe
+RSC=rc.exe
+
+!IF  "$(CFG)" == "libunibreak - Win32 Release"
+
+OUTDIR=.\Release
+INTDIR=.\Release
+# Begin Custom Macros
+OutDir=.\Release
+# End Custom Macros
+
+ALL : "$(OUTDIR)\unibreak.lib"
+
+
+CLEAN :
+	-@erase "$(INTDIR)\linebreak.obj"
+	-@erase "$(INTDIR)\linebreakdata.obj"
+	-@erase "$(INTDIR)\linebreakdef.obj"
+	-@erase "$(INTDIR)\wordbreak.obj"
+	-@erase "$(INTDIR)\vc*.idb"
+	-@erase "$(OUTDIR)\unibreak.lib"
+
+"$(OUTDIR)" :
+    if not exist "$(OUTDIR)/$(NULL)" mkdir "$(OUTDIR)"
+
+CPP_PROJ=/nologo /ML /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_MBCS" /D "_LIB" /Fo"$(INTDIR)\\" /Fd"$(INTDIR)\\" /FD /c 
+BSC32=bscmake.exe
+BSC32_FLAGS=/nologo /o"$(OUTDIR)\unibreak.bsc" 
+BSC32_SBRS= \
+	
+LIB32=link.exe -lib
+LIB32_FLAGS=/nologo /out:"$(OUTDIR)\unibreak.lib" 
+LIB32_OBJS= \
+	"$(INTDIR)\linebreak.obj" \
+	"$(INTDIR)\linebreakdata.obj" \
+	"$(INTDIR)\linebreakdef.obj" \
+	"$(INTDIR)\wordbreak.obj"
+
+"$(OUTDIR)\unibreak.lib" : "$(OUTDIR)" $(DEF_FILE) $(LIB32_OBJS)
+    $(LIB32) @<<
+  $(LIB32_FLAGS) $(DEF_FLAGS) $(LIB32_OBJS)
+<<
+
+!ELSEIF  "$(CFG)" == "libunibreak - Win32 Debug"
+
+OUTDIR=.\Debug
+INTDIR=.\Debug
+# Begin Custom Macros
+OutDir=.\Debug
+# End Custom Macros
+
+ALL : "$(OUTDIR)\unibreak.lib"
+
+
+CLEAN :
+	-@erase "$(INTDIR)\linebreak.obj"
+	-@erase "$(INTDIR)\linebreakdata.obj"
+	-@erase "$(INTDIR)\linebreakdef.obj"
+	-@erase "$(INTDIR)\wordbreak.obj"
+	-@erase "$(INTDIR)\vc*.idb"
+	-@erase "$(INTDIR)\vc*.pdb"
+	-@erase "$(OUTDIR)\unibreak.lib"
+
+"$(OUTDIR)" :
+    if not exist "$(OUTDIR)/$(NULL)" mkdir "$(OUTDIR)"
+
+CPP_PROJ=/nologo /MLd /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_MBCS" /D "_LIB" /Fo"$(INTDIR)\\" /Fd"$(INTDIR)\\" /FD /GZ  /c 
+BSC32=bscmake.exe
+BSC32_FLAGS=/nologo /o"$(OUTDIR)\unibreak.bsc" 
+BSC32_SBRS= \
+	
+LIB32=link.exe -lib
+LIB32_FLAGS=/nologo /out:"$(OUTDIR)\unibreak.lib" 
+LIB32_OBJS= \
+	"$(INTDIR)\linebreak.obj" \
+	"$(INTDIR)\linebreakdata.obj" \
+	"$(INTDIR)\linebreakdef.obj" \
+	"$(INTDIR)\wordbreak.obj"
+
+"$(OUTDIR)\unibreak.lib" : "$(OUTDIR)" $(DEF_FILE) $(LIB32_OBJS)
+    $(LIB32) @<<
+  $(LIB32_FLAGS) $(DEF_FLAGS) $(LIB32_OBJS)
+<<
+
+!ENDIF 
+
+.c{$(INTDIR)}.obj::
+   $(CPP) @<<
+   $(CPP_PROJ) $< 
+<<
+
+.cpp{$(INTDIR)}.obj::
+   $(CPP) @<<
+   $(CPP_PROJ) $< 
+<<
+
+.cxx{$(INTDIR)}.obj::
+   $(CPP) @<<
+   $(CPP_PROJ) $< 
+<<
+
+.c{$(INTDIR)}.sbr::
+   $(CPP) @<<
+   $(CPP_PROJ) $< 
+<<
+
+.cpp{$(INTDIR)}.sbr::
+   $(CPP) @<<
+   $(CPP_PROJ) $< 
+<<
+
+.cxx{$(INTDIR)}.sbr::
+   $(CPP) @<<
+   $(CPP_PROJ) $< 
+<<
+
+
+.\linebreak.c : \
+	".\linebreak.h"\
+	".\linebreakdef.h"\
+	
+.\linebreakdata.c : \
+	".\linebreak.h"\
+	".\linebreakdef.h"\
+	
+.\linebreakdef.c : \
+	".\linebreak.h"\
+	".\linebreakdef.h"\
+	
+.\wordbreak.c : \
+	".\linebreak.h"\
+	".\linebreakdef.h"\
+	".\wordbreak.h"\
+	".\wordbreakdef.h"\
+	".\wordbreakdata.c"\
+	
+
+!IF "$(CFG)" == "libunibreak - Win32 Release" || "$(CFG)" == "libunibreak - Win32 Debug"
+SOURCE=.\linebreak.c
+
+"$(INTDIR)\linebreak.obj" : $(SOURCE) "$(INTDIR)"
+
+
+SOURCE=.\linebreakdata.c
+
+"$(INTDIR)\linebreakdata.obj" : $(SOURCE) "$(INTDIR)"
+
+
+SOURCE=.\linebreakdef.c
+
+"$(INTDIR)\linebreakdef.obj" : $(SOURCE) "$(INTDIR)"
+
+
+SOURCE=.\wordbreak.c
+
+"$(INTDIR)\wordbreak.obj" : $(SOURCE) "$(INTDIR)"
+
+
+
+!ENDIF 
+
--- a/linebreak/linebreak/NEWS
+++ b/linebreak/linebreak/NEWS
@ -0,0 +1,49 @@
+New in libunibreak 1.0
+
+- Add word breaking support
+- Change the library name to "libunibreak", while keeping maximum compatibility
+- Add pkg-config support
+
+New in liblinebreak 2.1
+
+- Update the data according to LineBreak-6.0.0.txt
+- Fix the bug that an assertion in code can fail if U+FFFC is
+  encountered at the beginning of a line
+
+New in liblinebreak 2.0
+
+- Update the algorithm and data according to UAX #14-24 and
+  LineBreak-5.2.0.txt
+- Rename some functions to reduce namespace pollution
+- Make Doxygen documentation better
+
+New in liblinebreak 1.2
+
+- Fix the bug that an assertion in code can fail if an invalid UTF-8 or
+  UTF-16 sequence is encountered near the end of input
+- Remove the specialization of right single quotation mark as closing
+  punctuation mark in English, French, and Spanish, because it can be
+  used as apostrophe
+- Make Doxygen documentation better
+
+New in liblinebreak 1.1
+
+- Make get_lb_prop_lang static and not an exported symbol
+- Define is_line_breakable to alias to is_breakable
+- Declare that get_next_char_utf* will become lb_get_next_char_utf*
+- Move the declarations of get_next_char_utf* from linebreak.h to
+  linebreakdef.h
+- Add the function documentation comments to the header files
+
+New in liblinebreak 1.0
+
+- Update the line breaking data according to UAX #14-22 and
+  LineBreak-5.1.0.txt
+- Add autoconfiscation support (./configure, make, make install)
+- Add Makefile for MSVC
+
+First public release (0.9.6, or 20080421)
+
+- Implement line breaking algorithm according to UAX #14-19
+- Line breaking data is generated from LineBreak-5.0.0.txt
+- Makefile only supports GCC
--- a/linebreak/linebreak/README
+++ b/linebreak/linebreak/README
@ -0,0 +1,88 @@
+                         L I B U N I B R E A K
+                         =====================
+
+Overview
+--------
+
+This is the README file for libunibreak, an implementation of the line
+breaking and word breaking algorithms as described in Unicode
+Standard Annex 14 and Unicode Standard Annex 29, available at
+         <URL:http://www.unicode.org/reports/tr14/tr14-26.html>
+         <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
+
+Check this URL for up-to-date information:
+         <URL:http://vimgadgets.sourceforge.net/libunibreak/>
+
+
+Licence
+-------
+
+This library is released under an open-source licence, the zlib/libpng
+licence.  Please check the file LICENCE for details.
+
+Apart from using the algorithm, part of the code is derived from the
+data provided under
+                  <URL:http://www.unicode.org/Public/>
+
+And the Unicode Terms of Use may apply:
+              <URL:http://www.unicode.org/copyright.html>
+
+
+Installation
+------------
+
+There are three ways to build the library:
+
+1) On *NIX systems supported by the autoconfiscation tools, do the
+   normal
+
+     ./configure
+     make
+     sudo make install
+
+   to build and install both the dynamic and static libraries.  In
+   addition, one may
+
+   - type `make doc' to generate the doxygen documentation;
+   - type `make linebreakdata' to regenerate linebreakdata.c from
+     LineBreak.txt; or
+   - type `make wordbreakdata' to regenerate wordbreakdata.c from
+     WordBreakProperty.txt.
+
+2) On systems where GCC and Binutils are supported, one can type
+
+     cp -p Makefile.gcc Makefile
+     make
+
+   to build the static library.  In addition, one may
+
+   - type `make debug' or `make release' to explicitly generate the
+     debug or release build;
+   - type `make doc' to generate the doxygen documentation;
+   - type `make linebreakdata' to regenerate linebreakdata.c from
+     LineBreak.txt; or
+   - type `make wordbreakdata' to regenerate wordbreakdata.c from
+     WordBreakProperty.txt.
+
+3) On Windows, apart from using method 1 (Cygwin/MSYS) and method 2
+   (MinGW), MSVC can also be used.  Type
+
+     nmake -f Makefile.msvc
+
+   to build the static library.  By default the debug version is built.
+   To build the release version, type
+
+     nmake -f Makefile.msvc CFG="libunibreak - Win32 Release"
+
+
+Documentation
+-------------
+
+Check the generated documents doc/html/linebreak_8h.html and
+doc/html/wordbreak_8h.html in the downloaded file for the public
+interfaces exposed to applications.
+
+
+$Id: README,v 1.8 2012/08/11 06:55:18 adah Exp $
+
+vim:autoindent:expandtab:formatoptions=tcqlmn:textwidth=72:
--- a/linebreak/linebreak/bootstrap
+++ b/linebreak/linebreak/bootstrap
@ -0,0 +1,6 @@
+#! /bin/sh
+aclocal && \
+autoheader && \
+autoconf && \
+libtoolize && \
+automake --add-missing
--- a/linebreak/linebreak/configure.ac
+++ b/linebreak/linebreak/configure.ac
@ -0,0 +1,12 @@
+AC_PREREQ(2.57)
+AC_INIT([libunibreak],[1.0],[wuyongwei@gmail.com])
+AC_CONFIG_SRCDIR([linebreak.c])
+AC_CONFIG_HEADERS([config.h])
+AM_INIT_AUTOMAKE([foreign])
+
+AC_PROG_CC
+AC_PROG_LN_S
+AC_EXEEXT
+AM_PROG_LIBTOOL
+AC_CONFIG_FILES([Makefile])
+AC_OUTPUT([libunibreak.pc])
--- a/linebreak/linebreak/filter_dup.c
+++ b/linebreak/linebreak/filter_dup.c
@ -0,0 +1,47 @@
+#include <stdio.h>
+#include <string.h>
+
+int main(void)
+{
+	char s[80];
+	char beg[16];
+	char end[16];
+	char prop[16];
+	char lastbeg[16];
+	char lastend[16];
+	char lastprop[16];
+	lastprop[0] = 0;
+	for (;;)
+	{
+		if (fgets(s, sizeof s, stdin) == NULL)
+			break;
+		if (strstr(s, "LBP_") == NULL || strstr(s, "LBP_Undef") != NULL)
+		{
+			if (lastprop[0])
+			{
+				printf("\t{ %s %s %s },\n", lastbeg, lastend, lastprop);
+				lastprop[0] = 0;
+			}
+			printf("%s", s);
+			continue;
+		}
+		sscanf(s, "\t{ %15s %15s %15s }", beg, end, prop);
+		/*printf("==>\t{ \"%s\" \"%s\" \"%s\" },\n", beg, end, prop);*/
+		if (lastprop[0] && strcmp(lastprop, prop) != 0)
+		{
+			printf("\t{ %s %s %s },\n", lastbeg, lastend, lastprop);
+			lastprop[0] = 0;
+		}
+		if (lastprop[0] == 0)
+		{
+			strcpy(lastbeg, beg);
+			strcpy(lastprop, prop);
+		}
+		strcpy(lastend, end);
+	}
+	if (lastprop[0])
+	{
+		printf("\t{ %s %s %s },\n", lastbeg, lastend, lastprop);
+	}
+	return 0;
+}
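A minimal in-memory sketch of what filter_dup.c appears to do to the generated table text: consecutive rows that share the same property are collapsed into one row that keeps the first start and the last end. The names `range_entry` and `merge_runs` are hypothetical and introduced here only for illustration. The sketch leaves out the text parsing, and it does not model the flush that happens when a line without `LBP_` interrupts a run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one "{ start, end, prop }" table row. */
struct range_entry
{
    unsigned start;
    unsigned end;
    const char *prop;   /* property name, compared by content */
};

/*
 * Collapses each run of consecutive entries with the same property
 * into a single entry, in place.  Like the text filter, it does not
 * check that the merged ranges are numerically contiguous.
 * Returns the number of entries that remain.
 */
size_t merge_runs(struct range_entry *e, size_t n)
{
    size_t in;
    size_t out = 0;

    assert(e != NULL || n == 0);
    for (in = 0; in < n; ++in)
    {
        if (out > 0 && strcmp(e[out - 1].prop, e[in].prop) == 0)
            e[out - 1].end = e[in].end;     /* extend the current run */
        else
            e[out++] = e[in];               /* start a new run */
    }
    return out;
}
```

Given two adjacent rows tagged with the same property followed by a row with a different one, the function would leave two rows, the first spanning from the first start to the second end.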
--- a/linebreak/linebreak/libunibreak.pc.in
+++ b/linebreak/linebreak/libunibreak.pc.in
@ -0,0 +1,11 @@
+libunibreak:
+prefix=@prefix@
+exec_prefix=@exec_prefix@
+libdir=@libdir@
+includedir=@includedir@
+
+Name: libunibreak
+Description: Library to implement Unicode algorithms for line and word breaking
+Version: @VERSION@
+Libs: -L${libdir} -lunibreak
+Cflags: -I${includedir}
--- a/linebreak/linebreak/linebreak.c
+++ b/linebreak/linebreak/linebreak.c
@ -0,0 +1,737 @@
+/* vim: set tabstop=4 shiftwidth=4: */
+
+/*
+ * Line breaking in a Unicode sequence.  Designed to be used in a
+ * generic text renderer.
+ *
+ * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the author be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute
+ * it freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must
+ *    not claim that you wrote the original software.  If you use this
+ *    software in a product, an acknowledgement in the product
+ *    documentation would be appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must
+ *    not be misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ *
+ * The main reference is Unicode Standard Annex 14 (UAX #14):
+ *		<URL:http://www.unicode.org/reports/tr14/>
+ *
+ * When this library was designed, this annex was at Revision 19, for
+ * Unicode 5.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
+ *
+ * This library has been updated according to Revision 26, for
+ * Unicode 6.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
+ *
+ * The Unicode Terms of Use are available at
+ *		<URL:http://www.unicode.org/copyright.html>
+ */
+
+/**
+ * @file	linebreak.c
+ *
+ * Implementation of the line breaking algorithm as described in Unicode
+ * Standard Annex 14.
+ *
+ * @version	2.1, 2011/05/07
+ * @author	Wu Yongwei
+ */
+
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+#include "linebreak.h"
+#include "linebreakdef.h"
+
+/**
+ * Size of the second-level index to the line breaking properties.
+ */
+#define LINEBREAK_INDEX_SIZE 40
+
+/**
+ * Version number of the library.
+ */
+const int linebreak_version = LINEBREAK_VERSION;
+
+/**
+ * Enumeration of break actions.  They are used in the break action
+ * pair table below.
+ */
+enum BreakAction
+{
+	DIR_BRK,		/**< Direct break opportunity */
+	IND_BRK,		/**< Indirect break opportunity */
+	CMI_BRK,		/**< Indirect break opportunity for combining marks */
+	CMP_BRK,		/**< Prohibited break for combining marks */
+	PRH_BRK			/**< Prohibited break */
+};
+
+/**
+ * Break action pair table.  This is a direct mapping of Table 2 of
+ * Unicode Standard Annex 14, Revision 24.
+ */
+static enum BreakAction baTable[LBP_JT][LBP_JT] = {
+	{	/* OP */
+		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, CMP_BRK,
+		PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK, PRH_BRK },
+	{	/* CL */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, PRH_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* CP */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, PRH_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* QU */
+		PRH_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
+		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
+	{	/* GL */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
+		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
+	{	/* NS */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* EX */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* SY */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* IS */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* PR */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, IND_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
+	{	/* PO */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* NU */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* AL */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* ID */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* IN */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* HY */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, DIR_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* BA */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, DIR_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* BB */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
+		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
+	{	/* B2 */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, IND_BRK, IND_BRK, DIR_BRK, PRH_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* ZW */
+		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, PRH_BRK, DIR_BRK,
+		DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* CM */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK },
+	{	/* WJ */
+		IND_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK,
+		IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK },
+	{	/* H2 */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK },
+	{	/* H3 */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK },
+	{	/* JL */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, IND_BRK, IND_BRK, IND_BRK, IND_BRK, DIR_BRK },
+	{	/* JV */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK, IND_BRK },
+	{	/* JT */
+		DIR_BRK, PRH_BRK, PRH_BRK, IND_BRK, IND_BRK, IND_BRK, PRH_BRK,
+		PRH_BRK, PRH_BRK, DIR_BRK, IND_BRK, DIR_BRK, DIR_BRK, DIR_BRK,
+		IND_BRK, IND_BRK, IND_BRK, DIR_BRK, DIR_BRK, PRH_BRK, CMI_BRK,
+		PRH_BRK, DIR_BRK, DIR_BRK, DIR_BRK, DIR_BRK, IND_BRK }
+};
+
+/**
+ * Struct for the second-level index to the line breaking properties.
+ */
+struct LineBreakPropertiesIndex
+{
+	utf32_t end;					/**< End code point */
+	struct LineBreakProperties *lbp;/**< Pointer to line breaking properties */
+};
+
+/**
+ * Second-level index to the line breaking properties.
+ */
+static struct LineBreakPropertiesIndex lb_prop_index[LINEBREAK_INDEX_SIZE] =
+{
+	{ 0xFFFFFFFF, lb_prop_default }
+};
+
+/**
+ * Initializes the second-level index to the line breaking properties.
+ * If it is not called, the performance of #get_char_lb_class_lang (and
+ * thus the main functionality) can be pretty bad, especially for big
+ * code points like those of Chinese.
+ */
+void init_linebreak(void)
+{
+	size_t i;
+	size_t iPropDefault;
+	size_t len;
+	size_t step;
+
+	len = 0;
+	while (lb_prop_default[len].prop != LBP_Undefined)
+		++len;
+	step = len / LINEBREAK_INDEX_SIZE;
+	iPropDefault = 0;
+	for (i = 0; i < LINEBREAK_INDEX_SIZE; ++i)
+	{
+		lb_prop_index[i].lbp = lb_prop_default + iPropDefault;
+		iPropDefault += step;
+		lb_prop_index[i].end = lb_prop_default[iPropDefault].start - 1;
+	}
+	lb_prop_index[--i].end = 0xFFFFFFFF;
+}
+
+/**
+ * Gets the language-specific line breaking properties.
+ *
+ * @param lang	language of the text
+ * @return		pointer to the language-specific line breaking
+ *				properties array if found; \c NULL otherwise
+ */
+static struct LineBreakProperties *get_lb_prop_lang(const char *lang)
+{
+	struct LineBreakPropertiesLang *lbplIter;
+	if (lang != NULL)
+	{
+		for (lbplIter = lb_prop_lang_map; lbplIter->lang != NULL; ++lbplIter)
+		{
+			if (strncmp(lang, lbplIter->lang, lbplIter->namelen) == 0)
+			{
+				return lbplIter->lbp;
+			}
+		}
+	}
+	return NULL;
+}
+
+/**
+ * Gets the line breaking class of a character from a line breaking
+ * properties array.
+ *
+ * @param ch	character to check
+ * @param lbp	pointer to the line breaking properties array
+ * @return		the line breaking class if found; \c LBP_XX otherwise
+ */
+static enum LineBreakClass get_char_lb_class(
+		utf32_t ch,
+		struct LineBreakProperties *lbp)
+{
+	while (lbp->prop != LBP_Undefined && ch >= lbp->start)
+	{
+		if (ch <= lbp->end)
+			return lbp->prop;
+		++lbp;
+	}
+	return LBP_XX;
+}
+
+/**
+ * Gets the line breaking class of a character from the default line
+ * breaking properties array.
+ *
+ * @param ch	character to check
+ * @return		the line breaking class if found; \c LBP_XX otherwise
+ */
+static enum LineBreakClass get_char_lb_class_default(
+		utf32_t ch)
+{
+	size_t i = 0;
+	while (ch > lb_prop_index[i].end)
+		++i;
+	assert(i < LINEBREAK_INDEX_SIZE);
+	return get_char_lb_class(ch, lb_prop_index[i].lbp);
+}
+
+/**
+ * Gets the line breaking class of a character for a specific
+ * language.  This function will check the language-specific data first,
+ * and then the default data if there is no language-specific property
+ * available for the character.
+ *
+ * @param ch		character to check
+ * @param lbpLang	pointer to the language-specific line breaking
+ *					properties array
+ * @return			the line breaking class if found; \c LBP_XX
+ *					otherwise
+ */
+static enum LineBreakClass get_char_lb_class_lang(
+		utf32_t ch,
+		struct LineBreakProperties *lbpLang)
+{
+	enum LineBreakClass lbcResult;
+
+	/* Find the language-specific line breaking class for a character */
+	if (lbpLang)
+	{
+		lbcResult = get_char_lb_class(ch, lbpLang);
+		if (lbcResult != LBP_XX)
+			return lbcResult;
+	}
+
+	/* Find the default line breaking class, if no
+	 * language context is provided, or language-specific data are not
+	 * available for the specific character in the specified language */
+	return get_char_lb_class_default(ch);
+}
+
+/**
+ * Resolves the line breaking class for certain ambiguous or complicated
+ * characters.  They are treated in a simplistic way in this
+ * implementation.
+ *
+ * @param lbc	line breaking class to resolve
+ * @param lang	language of the text
+ * @return		the resolved line breaking class
+ */
+static enum LineBreakClass resolve_lb_class(
+		enum LineBreakClass lbc,
+		const char *lang)
+{
+	switch (lbc)
+	{
+	case LBP_AI:
+		if (lang != NULL &&
+				(strncmp(lang, "zh", 2) == 0 ||	/* Chinese */
+				 strncmp(lang, "ja", 2) == 0 ||	/* Japanese */
+				 strncmp(lang, "ko", 2) == 0))	/* Korean */
+		{
+			return LBP_ID;
+		}
+		/* Fall through */
+	case LBP_SA:
+	case LBP_SG:
+	case LBP_XX:
+		return LBP_AL;
+	default:
+		return lbc;
+	}
+}
+
+/**
+ * Gets the next Unicode character in a UTF-8 sequence.  The index will
+ * be advanced to the next complete character, unless the end of string
+ * is reached in the middle of a UTF-8 sequence.
+ *
+ * @param[in]     s		input UTF-8 string
+ * @param[in]     len	length of the string in bytes
+ * @param[in,out] ip	pointer to the index
+ * @return				the Unicode character beginning at the index; or
+ *						#EOS if end of input is encountered
+ */
+utf32_t lb_get_next_char_utf8(
+		const utf8_t *s,
+		size_t len,
+		size_t *ip)
+{
+	utf8_t ch;
+	utf32_t res;
+
+	assert(*ip <= len);
+	if (*ip == len)
+		return EOS;
+	ch = s[*ip];
+
+	if (ch < 0xC2 || ch > 0xF4)
+	{	/* One-byte sequence, tail (should not occur), or invalid */
+		*ip += 1;
+		return ch;
+	}
+	else if (ch < 0xE0)
+	{	/* Two-byte sequence */
+		if (*ip + 2 > len)
+			return EOS;
+		res = ((ch & 0x1F) << 6) + (s[*ip + 1] & 0x3F);
+		*ip += 2;
+		return res;
+	}
+	else if (ch < 0xF0)
+	{	/* Three-byte sequence */
+		if (*ip + 3 > len)
+			return EOS;
+		res = ((ch & 0x0F) << 12) +
+			  ((s[*ip + 1] & 0x3F) << 6) +
+			  ((s[*ip + 2] & 0x3F));
+		*ip += 3;
+		return res;
+	}
+	else
+	{	/* Four-byte sequence */
+		if (*ip + 4 > len)
+			return EOS;
+		res = ((ch & 0x07) << 18) +
+			  ((s[*ip + 1] & 0x3F) << 12) +
+			  ((s[*ip + 2] & 0x3F) << 6) +
+			  ((s[*ip + 3] & 0x3F));
+		*ip += 4;
+		return res;
+	}
+}
+
+/**
+ * Gets the next Unicode character in a UTF-16 sequence.  The index will
+ * be advanced to the next complete character, unless the end of string
+ * is reached in the middle of a UTF-16 surrogate pair.
+ *
+ * @param[in]     s		input UTF-16 string
+ * @param[in]     len	length of the string in words
+ * @param[in,out] ip	pointer to the index
+ * @return				the Unicode character beginning at the index; or
+ *						#EOS if end of input is encountered
+ */
+utf32_t lb_get_next_char_utf16(
+		const utf16_t *s,
+		size_t len,
+		size_t *ip)
+{
+	utf16_t ch;
+
+	assert(*ip <= len);
+	if (*ip == len)
+		return EOS;
+	ch = s[(*ip)++];
+
+	if (ch < 0xD800 || ch > 0xDBFF)
+	{	/* If the character is not a high surrogate */
+		return ch;
+	}
+	if (*ip == len)
+	{	/* If the input ends here (an error) */
+		--(*ip);
+		return EOS;
+	}
+	if (s[*ip] < 0xDC00 || s[*ip] > 0xDFFF)
+	{	/* If the next character is not the low surrogate (an error) */
+		return ch;
+	}
+	/* Return the constructed character and advance the index again */
+	return (((utf32_t)ch & 0x3FF) << 10) + (s[(*ip)++] & 0x3FF) + 0x10000;
+}
+
+/**
+ * Gets the next Unicode character in a UTF-32 sequence.  The index will
+ * be advanced to the next character.
+ *
+ * @param[in]     s		input UTF-32 string
+ * @param[in]     len	length of the string in dwords
+ * @param[in,out] ip	pointer to the index
+ * @return				the Unicode character beginning at the index; or
+ *						#EOS if end of input is encountered
+ */
+utf32_t lb_get_next_char_utf32(
+		const utf32_t *s,
+		size_t len,
+		size_t *ip)
+{
+	assert(*ip <= len);
+	if (*ip == len)
+		return EOS;
+	return s[(*ip)++];
+}
+
+/**
+ * Sets the line breaking information for a generic input string.
+ *
+ * @param[in]  s			input string
+ * @param[in]  len			length of the input
+ * @param[in]  lang			language of the input
+ * @param[out] brks			pointer to the output breaking data,
+ *							containing #LINEBREAK_MUSTBREAK,
+ *							#LINEBREAK_ALLOWBREAK, #LINEBREAK_NOBREAK,
+ *							or #LINEBREAK_INSIDEACHAR
+ * @param[in] get_next_char	function to get the next UTF-32 character
+ */
+void set_linebreaks(
+		const void *s,
+		size_t len,
+		const char *lang,
+		char *brks,
+		get_next_char_t get_next_char)
+{
+	utf32_t ch;
+	enum LineBreakClass lbcCur;
+	enum LineBreakClass lbcNew;
+	enum LineBreakClass lbcLast;
+	struct LineBreakProperties *lbpLang;
+	size_t posCur = 0;
+	size_t posLast = 0;
+
+	--posLast;	/* To be ++'d later */
+	ch = get_next_char(s, len, &posCur);
+	if (ch == EOS)
+		return;
+	lbpLang = get_lb_prop_lang(lang);
+	lbcCur = resolve_lb_class(get_char_lb_class_lang(ch, lbpLang), lang);
+	lbcNew = LBP_Undefined;
+
+nextline:
+
+	/* Special treatment for the first character */
+	switch (lbcCur)
+	{
+	case LBP_LF:
+	case LBP_NL:
+		lbcCur = LBP_BK;
+		break;
+	case LBP_CB:
+		lbcCur = LBP_BA;
+		break;
+	case LBP_SP:
+		lbcCur = LBP_WJ;
+		break;
+	default:
+		break;
+	}
+
+	/* Process a line till an explicit break or end of string */
+	for (;;)
+	{
+		for (++posLast; posLast < posCur - 1; ++posLast)
+		{
+			brks[posLast] = LINEBREAK_INSIDEACHAR;
+		}
+		assert(posLast == posCur - 1);
+		lbcLast = lbcNew;
+		ch = get_next_char(s, len, &posCur);
+		if (ch == EOS)
+			break;
+		lbcNew = get_char_lb_class_lang(ch, lbpLang);
+		if (lbcCur == LBP_BK || (lbcCur == LBP_CR && lbcNew != LBP_LF))
+		{
+			brks[posLast] = LINEBREAK_MUSTBREAK;
+			lbcCur = resolve_lb_class(lbcNew, lang);
+			goto nextline;
+		}
+
+		switch (lbcNew)
+		{
+		case LBP_SP:
+			brks[posLast] = LINEBREAK_NOBREAK;
+			continue;
+		case LBP_BK:
+		case LBP_LF:
+		case LBP_NL:
+			brks[posLast] = LINEBREAK_NOBREAK;
+			lbcCur = LBP_BK;
+			continue;
+		case LBP_CR:
+			brks[posLast] = LINEBREAK_NOBREAK;
+			lbcCur = LBP_CR;
+			continue;
+		case LBP_CB:
+			brks[posLast] = LINEBREAK_ALLOWBREAK;
+			lbcCur = LBP_BA;
+			continue;
+		default:
+			break;
+		}
+
+		lbcNew = resolve_lb_class(lbcNew, lang);
+
+		assert(lbcCur <= LBP_JT);
+		assert(lbcNew <= LBP_JT);
+		switch (baTable[lbcCur - 1][lbcNew - 1])
+		{
+		case DIR_BRK:
+			brks[posLast] = LINEBREAK_ALLOWBREAK;
+			break;
+		case CMI_BRK:
+		case IND_BRK:
+			if (lbcLast == LBP_SP)
+			{
+				brks[posLast] = LINEBREAK_ALLOWBREAK;
+			}
+			else
+			{
+				brks[posLast] = LINEBREAK_NOBREAK;
+			}
+			break;
+		case CMP_BRK:
+			brks[posLast] = LINEBREAK_NOBREAK;
+			if (lbcLast != LBP_SP)
+				continue;
+			break;
+		case PRH_BRK:
+			brks[posLast] = LINEBREAK_NOBREAK;
+			break;
+		}
+
+		lbcCur = lbcNew;
+	}
+
+	assert(posLast == posCur - 1 && posCur <= len);
+	/* Break after the last character */
+	brks[posLast] = LINEBREAK_MUSTBREAK;
+	/* When the input contains incomplete sequences */
+	while (posCur < len)
+	{
+		brks[posCur++] = LINEBREAK_INSIDEACHAR;
+	}
+}
+
+/**
+ * Sets the line breaking information for a UTF-8 input string.
+ *
+ * @param[in]  s	input UTF-8 string
+ * @param[in]  len	length of the input
+ * @param[in]  lang	language of the input
+ * @param[out] brks	pointer to the output breaking data, containing
+ *					#LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
+ *					#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
+ */
+void set_linebreaks_utf8(
+		const utf8_t *s,
+		size_t len,
+		const char *lang,
+		char *brks)
+{
+	set_linebreaks(s, len, lang, brks,
+				   (get_next_char_t)lb_get_next_char_utf8);
+}
+
+/**
+ * Sets the line breaking information for a UTF-16 input string.
+ *
+ * @param[in]  s	input UTF-16 string
+ * @param[in]  len	length of the input
+ * @param[in]  lang	language of the input
+ * @param[out] brks	pointer to the output breaking data, containing
+ *					#LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
+ *					#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
+ */
+void set_linebreaks_utf16(
+		const utf16_t *s,
+		size_t len,
+		const char *lang,
+		char *brks)
+{
+	set_linebreaks(s, len, lang, brks,
+				   (get_next_char_t)lb_get_next_char_utf16);
+}
+
+/**
+ * Sets the line breaking information for a UTF-32 input string.
+ *
+ * @param[in]  s	input UTF-32 string
+ * @param[in]  len	length of the input
+ * @param[in]  lang	language of the input
+ * @param[out] brks	pointer to the output breaking data, containing
+ *					#LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
+ *					#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
+ */
+void set_linebreaks_utf32(
+		const utf32_t *s,
+		size_t len,
+		const char *lang,
+		char *brks)
+{
+	set_linebreaks(s, len, lang, brks,
+				   (get_next_char_t)lb_get_next_char_utf32);
+}
+
+/**
+ * Tells whether a line break can occur between two Unicode characters.
+ * This is a wrapper function to expose a simple interface.  Generally
+ * speaking, it is better to use #set_linebreaks_utf32 instead, since
+ * this function cannot correctly process complicated cases involving
+ * combining marks, spaces, etc.
+ *
+ * @param char1 the first Unicode character
+ * @param char2 the second Unicode character
+ * @param lang  language of the input
+ * @return      one of #LINEBREAK_MUSTBREAK, #LINEBREAK_ALLOWBREAK,
+ *				#LINEBREAK_NOBREAK, or #LINEBREAK_INSIDEACHAR
+ */
+int is_line_breakable(
+		utf32_t char1,
+		utf32_t char2,
+		const char* lang)
+{
+	utf32_t s[2];
+	char brks[2];
+	s[0] = char1;
+	s[1] = char2;
+	set_linebreaks_utf32(s, 2, lang, brks);
+	return brks[0];
+}
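The second-level index built by init_linebreak and used by get_char_lb_class_default can be illustrated with a toy table. The sketch below is not the library code: the `toy_*` names, the two-bucket index size, and the range data are all made up for illustration. It only mirrors the technique: split a sorted, sentinel-terminated range table into equal-sized buckets, pick the bucket by its end code point, then scan linearly from the bucket's first range.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INDEX_SIZE 2    /* stands in for LINEBREAK_INDEX_SIZE */
#define TOY_UNDEFINED  0    /* sentinel class, like LBP_Undefined */
#define TOY_UNKNOWN    99   /* "not found" class, like LBP_XX */

struct toy_range { unsigned start; unsigned end; int prop; };
struct toy_index { unsigned end; const struct toy_range *first; };

/* Sorted by start and terminated by a sentinel; the values are made up. */
static const struct toy_range toy_ranges[] = {
    { 0x0000, 0x001F, 1 },
    { 0x0020, 0x007E, 2 },
    { 0x0100, 0x017F, 3 },
    { 0x4E00, 0x9FFF, 4 },
    { 0, 0, TOY_UNDEFINED }
};

static struct toy_index toy_index_table[TOY_INDEX_SIZE];

/* Builds the index: each bucket covers `step` consecutive ranges. */
void toy_init(void)
{
    size_t len = 0;
    size_t step;
    size_t pos = 0;
    size_t i;

    while (toy_ranges[len].prop != TOY_UNDEFINED)
        ++len;
    step = len / TOY_INDEX_SIZE;
    for (i = 0; i < TOY_INDEX_SIZE; ++i)
    {
        toy_index_table[i].first = toy_ranges + pos;
        pos += step;
        toy_index_table[i].end = toy_ranges[pos].start - 1;
    }
    /* The last bucket must catch every remaining code point */
    toy_index_table[TOY_INDEX_SIZE - 1].end = 0xFFFFFFFFu;
}

/* Looks up a code point: choose the bucket, then scan its ranges. */
int toy_lookup(unsigned ch)
{
    size_t i = 0;
    const struct toy_range *r;

    while (ch > toy_index_table[i].end)
        ++i;
    assert(i < TOY_INDEX_SIZE);
    for (r = toy_index_table[i].first;
         r->prop != TOY_UNDEFINED && ch >= r->start; ++r)
    {
        if (ch <= r->end)
            return r->prop;
    }
    return TOY_UNKNOWN;
}
```

With this layout a lookup for a high code point skips the ranges of all earlier buckets, which is presumably why the comment on init_linebreak warns about poor performance for large code points when the index is left uninitialized.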
--- a/linebreak/linebreak/linebreak.h
+++ b/linebreak/linebreak/linebreak.h
@ -0,0 +1,87 @@
+/* vim: set tabstop=4 shiftwidth=4: */
+
+/*
+ * Line breaking in a Unicode sequence.  Designed to be used in a
+ * generic text renderer.
+ *
+ * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the author be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute
+ * it freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must
+ *    not claim that you wrote the original software.  If you use this
+ *    software in a product, an acknowledgement in the product
+ *    documentation would be appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must
+ *    not be misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ *
+ * The main reference is Unicode Standard Annex 14 (UAX #14):
+ *		<URL:http://www.unicode.org/reports/tr14/>
+ *
+ * When this library was designed, this annex was at Revision 19, for
+ * Unicode 5.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
+ *
+ * This library has been updated according to Revision 26, for
+ * Unicode 6.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
+ *
+ * The Unicode Terms of Use are available at
+ *		<URL:http://www.unicode.org/copyright.html>
+ */
+
+/**
+ * @file	linebreak.h
+ *
+ * Header file for the line breaking algorithm.
+ *
+ * @version	2.1, 2011/05/07
+ * @author	Wu Yongwei
+ */
+
+#ifndef LINEBREAK_H
+#define LINEBREAK_H
+
+#include <stddef.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define LINEBREAK_VERSION	0x0201	/**< Version of the library linebreak */
+extern const int linebreak_version;
+
+#ifndef LINEBREAK_UTF_TYPES_DEFINED
+#define LINEBREAK_UTF_TYPES_DEFINED
+typedef unsigned char	utf8_t;		/**< Type for UTF-8 code units */
+typedef unsigned short	utf16_t;	/**< Type for UTF-16 code units */
+typedef unsigned int	utf32_t;	/**< Type for UTF-32 code units */
+#endif
+
+#define LINEBREAK_MUSTBREAK		0	/**< Break is mandatory */
+#define LINEBREAK_ALLOWBREAK	1	/**< Break is allowed */
+#define LINEBREAK_NOBREAK		2	/**< No break is possible */
+#define LINEBREAK_INSIDEACHAR	3	/**< A UTF-8/16 sequence is unfinished */
+
+void init_linebreak(void);
+void set_linebreaks_utf8(
+		const utf8_t *s, size_t len, const char* lang, char *brks);
+void set_linebreaks_utf16(
+		const utf16_t *s, size_t len, const char* lang, char *brks);
+void set_linebreaks_utf32(
+		const utf32_t *s, size_t len, const char* lang, char *brks);
+int is_line_breakable(utf32_t char1, utf32_t char2, const char* lang);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* LINEBREAK_H */
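
A caller typically walks the `brks` array filled by one of the `set_linebreaks_*` functions to decide where a line may end. The following is a minimal sketch of that consumer side only. It assumes `brks[i]` describes the break opportunity after code unit `i`, which is what `is_line_breakable` returning `brks[0]` for a two-character input suggests. `wrap_line_length` is a hypothetical helper, not part of the library, and the constants are repeated from the header so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Values repeated from linebreak.h so the sketch is self-contained. */
#define LINEBREAK_MUSTBREAK		0
#define LINEBREAK_ALLOWBREAK	1
#define LINEBREAK_NOBREAK		2
#define LINEBREAK_INSIDEACHAR	3

/*
 * Hypothetical helper, not part of the library.  Given break data for
 * len code units, returns how many code units go on the current line
 * when at most max_units fit.  Assumes brks[i] describes the break
 * opportunity after code unit i.
 */
size_t wrap_line_length(const char *brks, size_t len, size_t max_units)
{
	size_t i;
	size_t last_allowed = 0;	/* 0: no break opportunity seen yet */
	size_t cut;

	assert(brks != NULL || len == 0);

	for (i = 0; i < len && i < max_units; ++i)
	{
		if (brks[i] == LINEBREAK_MUSTBREAK)
			return i + 1;		/* the line has to end here */
		if (brks[i] == LINEBREAK_ALLOWBREAK)
			last_allowed = i + 1;
	}
	if (i == len)
		return len;				/* the remaining text fits */
	if (last_allowed != 0)
		return last_allowed;	/* back up to the last opportunity */

	/* No opportunity within the limit: cut at the limit, but extend
	 * the cut so that a multi-unit character is not split. */
	cut = max_units;
	while (cut > 0 && cut < len && brks[cut - 1] == LINEBREAK_INSIDEACHAR)
		++cut;
	return cut;
}
```

The helper measures width in code units for simplicity. A real renderer would measure glyph widths instead and would usually also trim trailing spaces at the chosen break.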
--- a/linebreak/linebreak/linebreakdata.c
+++ b/linebreak/linebreak/linebreakdata.c
--- a/linebreak/linebreak/linebreakdata1.tmpl
+++ b/linebreak/linebreak/linebreakdata1.tmpl
@ -0,0 +1 @@
+/* The content of this file is generated from:
--- a/linebreak/linebreak/linebreakdata2.tmpl
+++ b/linebreak/linebreak/linebreakdata2.tmpl
@ -0,0 +1,7 @@
+*/
+
+#include "linebreak.h"
+#include "linebreakdef.h"
+
+/** Default line breaking properties as from the Unicode Web site. */
+struct LineBreakProperties lb_prop_default[] = {
--- a/linebreak/linebreak/linebreakdata3.tmpl
+++ b/linebreak/linebreak/linebreakdata3.tmpl
@ -0,0 +1,2 @@
+	{ 0xFFFFFFFF, 0xFFFFFFFF, LBP_Undefined }
+};
--- a/linebreak/linebreak/linebreakdef.c
+++ b/linebreak/linebreak/linebreakdef.c
@ -0,0 +1,139 @@
+/* vim: set tabstop=4 shiftwidth=4: */
+
+/*
+ * Line breaking in a Unicode sequence.  Designed to be used in a
+ * generic text renderer.
+ *
+ * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the author be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute
+ * it freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must
+ *    not claim that you wrote the original software.  If you use this
+ *    software in a product, an acknowledgement in the product
+ *    documentation would be appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must
+ *    not be misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ *
+ * The main reference is Unicode Standard Annex 14 (UAX #14):
+ *		<URL:http://www.unicode.org/reports/tr14/>
+ *
+ * When this library was designed, this annex was at Revision 19, for
+ * Unicode 5.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
+ *
+ * This library has been updated according to Revision 26, for
+ * Unicode 6.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
+ *
+ * The Unicode Terms of Use are available at
+ *		<URL:http://www.unicode.org/copyright.html>
+ */
+
+/**
+ * @file	linebreakdef.c
+ *
+ * Definition of language-specific data.
+ *
+ * @version	2.1, 2011/05/07
+ * @author	Wu Yongwei
+ */
+
+#include "linebreak.h"
+#include "linebreakdef.h"
+
+/**
+ * English-specific data over the default Unicode rules.
+ */
+static struct LineBreakProperties lb_prop_English[] = {
+	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
+	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
+	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
+	{ 0, 0, LBP_Undefined }
+};
+
+/**
+ * German-specific data over the default Unicode rules.
+ */
+static struct LineBreakProperties lb_prop_German[] = {
+	{ 0x00AB, 0x00AB, LBP_CL },	/* Left double angle quotation mark: closing */
+	{ 0x00BB, 0x00BB, LBP_OP },	/* Right double angle quotation mark: opening */
+	{ 0x2018, 0x2018, LBP_CL },	/* Left single quotation mark: closing */
+	{ 0x201C, 0x201C, LBP_CL },	/* Left double quotation mark: closing */
+	{ 0x2039, 0x2039, LBP_CL },	/* Left single angle quotation mark: closing */
+	{ 0x203A, 0x203A, LBP_OP },	/* Right single angle quotation mark: opening */
+	{ 0, 0, LBP_Undefined }
+};
+
+/**
+ * Spanish-specific data over the default Unicode rules.
+ */
+static struct LineBreakProperties lb_prop_Spanish[] = {
+	{ 0x00AB, 0x00AB, LBP_OP },	/* Left double angle quotation mark: opening */
+	{ 0x00BB, 0x00BB, LBP_CL },	/* Right double angle quotation mark: closing */
+	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
+	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
+	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
+	{ 0x2039, 0x2039, LBP_OP },	/* Left single angle quotation mark: opening */
+	{ 0x203A, 0x203A, LBP_CL },	/* Right single angle quotation mark: closing */
+	{ 0, 0, LBP_Undefined }
+};
+
+/**
+ * French-specific data over the default Unicode rules.
+ */
+static struct LineBreakProperties lb_prop_French[] = {
+	{ 0x00AB, 0x00AB, LBP_OP },	/* Left double angle quotation mark: opening */
+	{ 0x00BB, 0x00BB, LBP_CL },	/* Right double angle quotation mark: closing */
+	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
+	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
+	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
+	{ 0x2039, 0x2039, LBP_OP },	/* Left single angle quotation mark: opening */
+	{ 0x203A, 0x203A, LBP_CL },	/* Right single angle quotation mark: closing */
+	{ 0, 0, LBP_Undefined }
+};
+
+/**
+ * Russian-specific data over the default Unicode rules.
+ */
+static struct LineBreakProperties lb_prop_Russian[] = {
+	{ 0x00AB, 0x00AB, LBP_OP },	/* Left double angle quotation mark: opening */
+	{ 0x00BB, 0x00BB, LBP_CL },	/* Right double angle quotation mark: closing */
+	{ 0x201C, 0x201C, LBP_CL },	/* Left double quotation mark: closing */
+	{ 0, 0, LBP_Undefined }
+};
+
+/**
+ * Chinese-specific data over the default Unicode rules.
+ */
+static struct LineBreakProperties lb_prop_Chinese[] = {
+	{ 0x2018, 0x2018, LBP_OP },	/* Left single quotation mark: opening */
+	{ 0x2019, 0x2019, LBP_CL },	/* Right single quotation mark: closing */
+	{ 0x201C, 0x201C, LBP_OP },	/* Left double quotation mark: opening */
+	{ 0x201D, 0x201D, LBP_CL },	/* Right double quotation mark: closing */
+	{ 0, 0, LBP_Undefined }
+};
+
+/**
+ * Association data of language-specific line breaking properties with
+ * language names.  This is the definition for the static data in this
+ * file.  If you want more flexibility, or do not need the data here,
+ * you may want to redefine \e lb_prop_lang_map in your C source file.
+ */
+struct LineBreakPropertiesLang lb_prop_lang_map[] = {
+	{ "en", 2, lb_prop_English },
+	{ "de", 2, lb_prop_German },
+	{ "es", 2, lb_prop_Spanish },
+	{ "fr", 2, lb_prop_French },
+	{ "ru", 2, lb_prop_Russian },
+	{ "zh", 2, lb_prop_Chinese },
+	{ NULL, 0, NULL }
+};
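
Each entry of `lb_prop_lang_map` pairs a language name with a `namelen`, which suggests prefix matching, so that a tag such as `en_US` would select the `en` entry. Below is a minimal sketch of such a lookup, assuming a `strncmp` over the first `namelen` characters; the lookup actually used by the library is in linebreak.c and may differ. `struct lang_entry`, `lang_map`, and `find_lang_label` are hypothetical stand-ins, and the payload is a plain label rather than a property table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct LineBreakPropertiesLang. */
struct lang_entry
{
	const char *lang;		/* language name */
	size_t namelen;			/* length of name to match */
	const char *label;		/* hypothetical payload */
};

/* Hypothetical table, terminated like lb_prop_lang_map. */
const struct lang_entry lang_map[] = {
	{ "en", 2, "English" },
	{ "de", 2, "German" },
	{ "zh", 2, "Chinese" },
	{ NULL, 0, NULL }
};

/*
 * Returns the label of the first entry whose name matches the first
 * namelen characters of lang, or NULL when nothing matches.
 */
const char *find_lang_label(const char *lang)
{
	const struct lang_entry *it;

	if (lang == NULL)
		return NULL;
	for (it = lang_map; it->lang != NULL; ++it)
	{
		if (strncmp(lang, it->lang, it->namelen) == 0)
			return it->label;
	}
	return NULL;
}
```

With this scheme both `"zh"` and `"zh-TW"` resolve to the same entry, and an unknown or missing language falls back to the default rules (represented here by `NULL`).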
--- a/linebreak/linebreak/linebreakdef.h
+++ b/linebreak/linebreak/linebreakdef.h
@ -0,0 +1,149 @@
+/* vim: set tabstop=4 shiftwidth=4: */
+
+/*
+ * Line breaking in a Unicode sequence.  Designed to be used in a
+ * generic text renderer.
+ *
+ * Copyright (C) 2008-2011 Wu Yongwei <wuyongwei at gmail dot com>
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the author be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute
+ * it freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must
+ *    not claim that you wrote the original software.  If you use this
+ *    software in a product, an acknowledgement in the product
+ *    documentation would be appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must
+ *    not be misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ *
+ * The main reference is Unicode Standard Annex 14 (UAX #14):
+ *		<URL:http://www.unicode.org/reports/tr14/>
+ *
+ * When this library was designed, this annex was at Revision 19, for
+ * Unicode 5.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-19.html>
+ *
+ * This library has been updated according to Revision 26, for
+ * Unicode 6.0.0:
+ *		<URL:http://www.unicode.org/reports/tr14/tr14-26.html>
+ *
+ * The Unicode Terms of Use are available at
+ *		<URL:http://www.unicode.org/copyright.html>
+ */
+
+/**
+ * @file	linebreakdef.h
+ *
+ * Definitions of internal data structures, declarations of global
+ * variables, and function prototypes for the line breaking algorithm.
+ *
+ * @version	2.1, 2011/05/07
+ * @author	Wu Yongwei
+ */
+
+/**
+ * Constant value to mark the end of string.  It is not a valid Unicode
+ * character.
+ */
+#define EOS 0xFFFF
+
+/**
+ * Line break classes.  This is a direct mapping of Table 1 of Unicode
+ * Standard Annex 14, Revision 26.
+ */
+enum LineBreakClass
+{
+	/* This is used to signal an error condition. */
+	LBP_Undefined,	/**< Undefined */
+
+	/* The following break classes are treated in the pair table. */
+	LBP_OP,			/**< Opening punctuation */
+	LBP_CL,			/**< Closing punctuation */
+	LBP_CP,			/**< Closing parenthesis */
+	LBP_QU,			/**< Ambiguous quotation */
+	LBP_GL,			/**< Glue */
+	LBP_NS,			/**< Non-starters */
+	LBP_EX,			/**< Exclamation/Interrogation */
+	LBP_SY,			/**< Symbols allowing break after */
+	LBP_IS,			/**< Infix separator */
+	LBP_PR,			/**< Prefix */
+	LBP_PO,			/**< Postfix */
+	LBP_NU,			/**< Numeric */
+	LBP_AL,			/**< Alphabetic */
+	LBP_ID,			/**< Ideographic */
+	LBP_IN,			/**< Inseparable characters */
+	LBP_HY,			/**< Hyphen */
+	LBP_BA,			/**< Break after */
+	LBP_BB,			/**< Break before */
+	LBP_B2,			/**< Break on either side (but not pair) */
+	LBP_ZW,			/**< Zero-width space */
+	LBP_CM,			/**< Combining marks */
+	LBP_WJ,			/**< Word joiner */
+	LBP_H2,			/**< Hangul LV */
+	LBP_H3,			/**< Hangul LVT */
+	LBP_JL,			/**< Hangul L Jamo */
+	LBP_JV,			/**< Hangul V Jamo */
+	LBP_JT,			/**< Hangul T Jamo */
+
+	/* The following break classes are not treated in the pair table */
+	LBP_AI,			/**< Ambiguous (alphabetic or ideograph) */
+	LBP_BK,			/**< Break (mandatory) */
+	LBP_CB,			/**< Contingent break */
+	LBP_CR,			/**< Carriage return */
+	LBP_LF,			/**< Line feed */
+	LBP_NL,			/**< Next line */
+	LBP_SA,			/**< South-East Asian */
+	LBP_SG,			/**< Surrogates */
+	LBP_SP,			/**< Space */
+	LBP_XX			/**< Unknown */
+};
+
+/**
+ * Struct for entries of line break properties.  The array of the
+ * entries \e must be sorted.
+ */
+struct LineBreakProperties
+{
+	utf32_t start;				/**< Starting code point */
+	utf32_t end;				/**< End code point */
+	enum LineBreakClass prop;	/**< The line breaking property */
+};
+
+/**
+ * Struct for association of language-specific line breaking properties
+ * with language names.
+ */
+struct LineBreakPropertiesLang
+{
+	const char *lang;					/**< Language name */
+	size_t namelen;						/**< Length of name to match */
+	struct LineBreakProperties *lbp;	/**< Pointer to associated data */
+};
+
+/**
+ * Abstract function interface for #lb_get_next_char_utf8,
+ * #lb_get_next_char_utf16, and #lb_get_next_char_utf32.
+ */
+typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);
+
+/* Declarations */
+extern struct LineBreakProperties lb_prop_default[];
+extern struct LineBreakPropertiesLang lb_prop_lang_map[];
+
+/* Function prototypes */
+utf32_t lb_get_next_char_utf8(const utf8_t *s, size_t len, size_t *ip);
+utf32_t lb_get_next_char_utf16(const utf16_t *s, size_t len, size_t *ip);
+utf32_t lb_get_next_char_utf32(const utf32_t *s, size_t len, size_t *ip);
+void set_linebreaks(
+		const void *s,
+		size_t len,
+		const char *lang,
+		char *brks,
+		get_next_char_t get_next_char);
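
The `get_next_char_t` interface lets one algorithm serve all three encodings: the reader decodes one character at `*ip`, advances `*ip`, and signals the end of input with `EOS`. The sketch below illustrates that contract for UTF-32 only. `next_char_utf32` and `count_chars` are hypothetical names; the reader here takes `const void *` directly so that no function pointer cast is needed, which is a design choice of this sketch rather than what the library does.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int utf32_t;	/* as in linebreak.h */
#define EOS 0xFFFF				/* as in linebreakdef.h */

typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);

/*
 * Hypothetical UTF-32 reader matching get_next_char_t exactly.
 * Returns EOS once *ip has reached len; otherwise returns the code
 * point at *ip and advances *ip by one.
 */
utf32_t next_char_utf32(const void *s, size_t len, size_t *ip)
{
	const utf32_t *p = (const utf32_t *)s;

	assert(*ip <= len);
	if (*ip == len)
		return EOS;
	return p[(*ip)++];
}

/*
 * Hypothetical encoding-agnostic consumer: counts the characters
 * delivered by the supplied reader.
 */
size_t count_chars(const void *s, size_t len, get_next_char_t get_next_char)
{
	size_t pos = 0;
	size_t n = 0;

	while (get_next_char(s, len, &pos) != EOS)
		++n;
	return n;
}
```

Because `EOS` is an in-band value, input that itself contains U+FFFF would be read as end of input by a consumer written this way.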
--- a/linebreak/linebreak/purge
+++ b/linebreak/linebreak/purge
@ -0,0 +1,2 @@
+#! /bin/sh
+rm -rf Makefile.in aclocal.m4 autom4te.cache/ config.guess config.h.in config.sub configure depcomp doc/ install-sh ltmain.sh missing
--- a/linebreak/linebreak/sort_numeric_hex.py
+++ b/linebreak/linebreak/sort_numeric_hex.py
@ -0,0 +1,6 @@
+#!/usr/bin/env python
+import sys
+
+lines = open(sys.argv[1]).readlines()
+lines_out = sorted(lines, key=lambda line: int(line.split("..")[0], 16))
+sys.stdout.writelines(lines_out)
--- a/linebreak/linebreak/wordbreak.c
+++ b/linebreak/linebreak/wordbreak.c
@ -0,0 +1,437 @@
+/* vim: set tabstop=4 shiftwidth=4: */
+
+/*
+ * Word breaking in a Unicode sequence.  Designed to be used in a
+ * generic text renderer.
+ *
+ * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the author be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute
+ * it freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must
+ *    not claim that you wrote the original software.  If you use this
+ *    software in a product, an acknowledgement in the product
+ *    documentation would be appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must
+ *    not be misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ *
+ * The main reference is Unicode Standard Annex 29 (UAX #29):
+ *		<URL:http://unicode.org/reports/tr29>
+ *
+ * When this library was designed, this annex was at Revision 17, for
+ * Unicode 6.0.0:
+ *		<URL:http://www.unicode.org/reports/tr29/tr29-17.html>
+ *
+ * The Unicode Terms of Use are available at
+ *		<URL:http://www.unicode.org/copyright.html>
+ */
+
+/**
+ * @file	wordbreak.c
+ *
+ * Implementation of the word breaking algorithm as described in Unicode
+ * Standard Annex 29.
+ *
+ * @version	2.2, 2012/02/04
+ * @author	Tom Hacohen
+ */
+
+#include <assert.h>
+#include <stddef.h>
+#include <string.h>
+#include "linebreak.h"
+#include "linebreakdef.h"
+
+#include "wordbreak.h"
+#include "wordbreakdata.c"
+
+#define ARRAY_LEN(x) (sizeof(x) / sizeof(x[0]))
+
+/**
+ * Initializes the wordbreak internals.  It currently does nothing, but
+ * it may in the future.
+ */
+void init_wordbreak(void)
+{
+}
+
+/**
+ * Gets the word breaking class of a character.
+ *
+ * @param ch	character to check
+ * @param wbp	pointer to the wbp breaking properties array
+ * @param len	size of the wbp array in number of items
+ * @return		the word breaking class if found; \c WBP_Any otherwise
+ */
+static enum WordBreakClass get_char_wb_class(
+		utf32_t ch,
+		struct WordBreakProperties *wbp,
+		size_t len)
+{
+	int min = 0;
+	int max = len - 1;
+	int mid;
+
+	do
+	{
+		mid = (min + max) / 2;
+
+		if (ch < wbp[mid].start)
+			max = mid - 1;
+		else if (ch > wbp[mid].end)
+			min = mid + 1;
+		else
+			return wbp[mid].prop;
+	}
+	while (min <= max);
+
+	return WBP_Any;
+}
+
+/**
+ * Sets the word break types to a specific value in a range.
+ *
+ * It sets the inside chars to #WORDBREAK_INSIDEACHAR and the rest to brkType.
+ * Assumes \a brks is initialized - all the cells with #WORDBREAK_NOBREAK are
+ * cells that we really don't want to break after.
+ *
+ * @param[in]  s			input string
+ * @param[out] brks			breaks array to fill
+ * @param[in]  posStart		start position
+ * @param[in]  posEnd		end position (exclusive)
+ * @param[in]  len			length of the string
+ * @param[in]  brkType		breaks type to use
+ * @param[in] get_next_char	function to get the next UTF-32 character
+ */
+static void set_brks_to(
+		const void *s,
+		char *brks,
+		size_t posStart,
+		size_t posEnd,
+		size_t len,
+		char brkType,
+		get_next_char_t get_next_char)
+{
+	size_t posNext = posStart;
+	while (posNext < posEnd)
+	{
+		utf32_t ch;
+		ch = get_next_char(s, len, &posNext);
+		assert(ch != EOS);
+		for (; posStart < posNext - 1; ++posStart)
+			brks[posStart] = WORDBREAK_INSIDEACHAR;
+		assert(posStart == posNext - 1);
+
+		/* Do not overwrite a position already marked as no-break. */
+		if (brks[posStart] != WORDBREAK_NOBREAK)
+			brks[posStart] = brkType;
+		posStart = posNext;
+	}
+}
+
+/* Checks to see if the class is newline, CR, or LF (rules WB3a and b). */
+#define IS_WB3ab(cls) ((cls == WBP_Newline) || (cls == WBP_CR) || \
+					   (cls == WBP_LF))
+
+/**
+ * Sets the word breaking information for a generic input string.
+ *
+ * @param[in]  s			input string
+ * @param[in]  len			length of the input
+ * @param[in]  lang			language of the input
+ * @param[out] brks			pointer to the output breaking data, containing
+ *							#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
+ *							#WORDBREAK_INSIDEACHAR
+ * @param[in] get_next_char	function to get the next UTF-32 character
+ */
+static void set_wordbreaks(
+		const void *s,
+		size_t len,
+		const char *lang,
+		char *brks,
+		get_next_char_t get_next_char)
+{
+	enum WordBreakClass wbcLast = WBP_Undefined;
+	/* wbcSeqStart is the class that started the current sequence.
+	 * WBP_Undefined is a special case that means start of text ("sot").
+	 * This value is the class that is at the start of the current rule
+	 * matching sequence. For example, in case of Numeric+MidNum+Numeric
+	 * it'll be Numeric all the way.
+	 */
+	enum WordBreakClass wbcSeqStart = WBP_Undefined;
+	utf32_t ch;
+	size_t posNext = 0;
+	size_t posCur = 0;
+	size_t posLast = 0;
+
+	/* TODO: Language-specific specialization. */
+	(void) lang;
+
+	/* Init brks. */
+	memset(brks, WORDBREAK_BREAK, len);
+
+	ch = get_next_char(s, len, &posNext);
+
+	while (ch != EOS)
+	{
+		enum WordBreakClass wbcCur;
+		wbcCur = get_char_wb_class(ch, wb_prop_default,
+								   ARRAY_LEN(wb_prop_default));
+
+		switch (wbcCur)
+		{
+	    case WBP_CR:
+			/* WB3b */
+			set_brks_to(s, brks, posLast, posCur, len,
+						WORDBREAK_BREAK, get_next_char);
+			wbcSeqStart = wbcCur;
+			posLast = posCur;
+			break;
+
+	    case WBP_LF:
+			if (wbcSeqStart == WBP_CR) /* WB3 */
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_NOBREAK, get_next_char);
+				wbcSeqStart = wbcCur;
+				posLast = posCur;
+				break;
+			}
+			/* Fall through */
+
+	    case WBP_Newline:
+			/* WB3a,3b */
+			set_brks_to(s, brks, posLast, posCur, len,
+						WORDBREAK_BREAK, get_next_char);
+			wbcSeqStart = wbcCur;
+			posLast = posCur;
+			break;
+
+	    case WBP_Extend:
+	    case WBP_Format:
+			/* WB4 - If not the first char/after a newline (WB3a,3b), skip
+			 * this class, set it to be the same as the prev, and mark
+			 * brks not to break before them. */
+			if ((wbcSeqStart == WBP_Undefined) || IS_WB3ab(wbcSeqStart))
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+				wbcSeqStart = wbcCur;
+			}
+			else
+			{
+				/* It's surely not the first */
+				brks[posCur - 1] = WORDBREAK_NOBREAK;
+				/* "inherit" the previous class. */
+				wbcCur = wbcLast;
+			}
+			break;
+
+	    case WBP_Katakana:
+			if ((wbcSeqStart == WBP_Katakana) || /* WB13 */
+					(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_NOBREAK, get_next_char);
+			}
+			/* No rule found, reset */
+			else
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+			}
+			wbcSeqStart = wbcCur;
+			posLast = posCur;
+			break;
+
+	    case WBP_ALetter:
+			if ((wbcSeqStart == WBP_ALetter) || /* WB5,6,7 */
+					(wbcLast == WBP_Numeric) || /* WB10 */
+					(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_NOBREAK, get_next_char);
+			}
+			/* No rule found, reset */
+			else
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+			}
+			wbcSeqStart = wbcCur;
+			posLast = posCur;
+			break;
+
+	    case WBP_MidNumLet:
+			if ((wbcLast == WBP_ALetter) || /* WB6,7 */
+					(wbcLast == WBP_Numeric)) /* WB11,12 */
+			{
+				/* Go on */
+			}
+			else
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+				wbcSeqStart = wbcCur;
+				posLast = posCur;
+			}
+			break;
+
+	    case WBP_MidLetter:
+			if (wbcLast == WBP_ALetter) /* WB6,7 */
+			{
+				/* Go on */
+			}
+			else
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+				wbcSeqStart = wbcCur;
+				posLast = posCur;
+			}
+			break;
+
+	    case WBP_MidNum:
+			if (wbcLast == WBP_Numeric) /* WB11,12 */
+			{
+				/* Go on */
+			}
+			else
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+				wbcSeqStart = wbcCur;
+				posLast = posCur;
+			}
+			break;
+
+	    case WBP_Numeric:
+			if ((wbcSeqStart == WBP_Numeric) || /* WB8,11,12 */
+					(wbcLast == WBP_ALetter) || /* WB9 */
+					(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_NOBREAK, get_next_char);
+			}
+			/* No rule found, reset */
+			else
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+			}
+			wbcSeqStart = wbcCur;
+			posLast = posCur;
+			break;
+
+	    case WBP_ExtendNumLet:
+			/* WB13a,13b */
+			if ((wbcSeqStart == wbcLast) &&
+				((wbcLast == WBP_ALetter) ||
+				 (wbcLast == WBP_Numeric) ||
+				 (wbcLast == WBP_Katakana) ||
+				 (wbcLast == WBP_ExtendNumLet)))
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_NOBREAK, get_next_char);
+			}
+			/* No rule found, reset */
+			else
+			{
+				set_brks_to(s, brks, posLast, posCur, len,
+							WORDBREAK_BREAK, get_next_char);
+			}
+			wbcSeqStart = wbcCur;
+			posLast = posCur;
+			break;
+
+	    case WBP_Any:
+			/* Allow breaks and reset */
+			set_brks_to(s, brks, posLast, posCur, len,
+						WORDBREAK_BREAK, get_next_char);
+			wbcSeqStart = wbcCur;
+			posLast = posCur;
+			break;
+
+	    default:
+			/* Error, should never get here! */
+			assert(0);
+			break;
+		}
+
+		wbcLast = wbcCur;
+		posCur = posNext;
+		ch = get_next_char(s, len, &posNext);
+	}
+
+	/* WB2 */
+	set_brks_to(s, brks, posLast, posNext, len,
+				WORDBREAK_BREAK, get_next_char);
+}
+
+/**
+ * Sets the word breaking information for a UTF-8 input string.
+ *
+ * @param[in]  s	input UTF-8 string
+ * @param[in]  len	length of the input
+ * @param[in]  lang	language of the input
+ * @param[out] brks	pointer to the output breaking data, containing
+ *					#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
+ *					#WORDBREAK_INSIDEACHAR
+ */
+void set_wordbreaks_utf8(
+		const utf8_t *s,
+		size_t len,
+		const char *lang,
+		char *brks)
+{
+	set_wordbreaks(s, len, lang, brks,
+				   (get_next_char_t)lb_get_next_char_utf8);
+}
+
+/**
+ * Sets the word breaking information for a UTF-16 input string.
+ *
+ * @param[in]  s	input UTF-16 string
+ * @param[in]  len	length of the input
+ * @param[in]  lang	language of the input
+ * @param[out] brks	pointer to the output breaking data, containing
+ *					#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
+ *					#WORDBREAK_INSIDEACHAR
+ */
+void set_wordbreaks_utf16(
+		const utf16_t *s,
+		size_t len,
+		const char *lang,
+		char *brks)
+{
+	set_wordbreaks(s, len, lang, brks,
+				   (get_next_char_t)lb_get_next_char_utf16);
+}
+
+/**
+ * Sets the word breaking information for a UTF-32 input string.
+ *
+ * @param[in]  s	input UTF-32 string
+ * @param[in]  len	length of the input
+ * @param[in]  lang	language of the input
+ * @param[out] brks	pointer to the output breaking data, containing
+ *					#WORDBREAK_BREAK, #WORDBREAK_NOBREAK, or
+ *					#WORDBREAK_INSIDEACHAR
+ */
+void set_wordbreaks_utf32(
+		const utf32_t *s,
+		size_t len,
+		const char *lang,
+		char *brks)
+{
+	set_wordbreaks(s, len, lang, brks,
+				   (get_next_char_t)lb_get_next_char_utf32);
+}
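
`get_char_wb_class` relies on the property table being sorted by code point so that a binary search over inclusive `[start, end]` ranges can classify a character. The sketch below shows the same technique in a slightly different form: it uses `size_t` indices and a half-open search interval, so an empty table needs no special handling. All names (`wb_class`, `wb_range`, `wb_sample`, `lookup_class`) are hypothetical, and the four sample ranges are taken from the ASCII part of the generated table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of classes for the sketch. */
enum wb_class { WB_ANY, WB_NUMERIC, WB_ALETTER, WB_EXTENDNUMLET };

struct wb_range
{
	unsigned int start;		/* first code point of the range */
	unsigned int end;		/* last code point of the range (inclusive) */
	enum wb_class prop;
};

/* Sample ranges; must be sorted by start and must not overlap. */
const struct wb_range wb_sample[] = {
	{ 0x0030, 0x0039, WB_NUMERIC },
	{ 0x0041, 0x005A, WB_ALETTER },
	{ 0x005F, 0x005F, WB_EXTENDNUMLET },
	{ 0x0061, 0x007A, WB_ALETTER }
};

/*
 * Binary search over the half-open index interval [lo, hi).
 * Returns WB_ANY when ch falls into no range.
 */
enum wb_class lookup_class(
		unsigned int ch,
		const struct wb_range *tbl,
		size_t len)
{
	size_t lo = 0;
	size_t hi = len;

	while (lo < hi)
	{
		size_t mid = lo + (hi - lo) / 2;

		if (ch < tbl[mid].start)
			hi = mid;
		else if (ch > tbl[mid].end)
			lo = mid + 1;
		else
			return tbl[mid].prop;
	}
	return WB_ANY;
}
```

Returning the catch-all class for unlisted code points mirrors the `WBP_Any` fallback documented for `get_char_wb_class`.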
--- a/linebreak/linebreak/wordbreak.h
+++ b/linebreak/linebreak/wordbreak.h
@ -0,0 +1,72 @@
+/* vim: set tabstop=4 shiftwidth=4: */
+
+/*
+ * Word breaking in a Unicode sequence.  Designed to be used in a
+ * generic text renderer.
+ *
+ * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the author be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute
+ * it freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must
+ *    not claim that you wrote the original software.  If you use this
+ *    software in a product, an acknowledgement in the product
+ *    documentation would be appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must
+ *    not be misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ *
+ * The main reference is Unicode Standard Annex 29 (UAX #29):
+ *		<URL:http://unicode.org/reports/tr29>
+ *
+ * When this library was designed, this annex was at Revision 17, for
+ * Unicode 6.0.0:
+ *		<URL:http://www.unicode.org/reports/tr29/tr29-17.html>
+ *
+ * The Unicode Terms of Use are available at
+ *		<URL:http://www.unicode.org/copyright.html>
+ */
+
+/**
+ * @file	wordbreak.h
+ *
+ * Header file for the word breaking (segmentation) algorithm.
+ *
+ * @version	2.2, 2012/02/04
+ * @author	Tom Hacohen
+ */
+
+#ifndef WORDBREAK_H
+#define WORDBREAK_H
+
+#include <stddef.h>
+#include "linebreak.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define WORDBREAK_BREAK			0	/**< Break is allowed */
+#define WORDBREAK_NOBREAK		1	/**< No break is allowed */
+#define WORDBREAK_INSIDEACHAR	2	/**< A UTF-8/16 sequence is unfinished */
+
+void init_wordbreak(void);
+void set_wordbreaks_utf8(
+		const utf8_t *s, size_t len, const char* lang, char *brks);
+void set_wordbreaks_utf16(
+		const utf16_t *s, size_t len, const char* lang, char *brks);
+void set_wordbreaks_utf32(
+		const utf32_t *s, size_t len, const char* lang, char *brks);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
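
The output of the `set_wordbreaks_*` functions is one byte per code unit, where a value of `WORDBREAK_BREAK` at index `i` permits a boundary after code unit `i`. A minimal sketch of how a caller might split text into segments from that array follows. `next_segment_end` and `count_segments` are hypothetical helpers, not part of the library, and the constants are repeated from the header so that the block is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Values repeated from wordbreak.h so the sketch is self-contained. */
#define WORDBREAK_BREAK			0
#define WORDBREAK_NOBREAK		1
#define WORDBREAK_INSIDEACHAR	2

/*
 * Hypothetical helper: returns the index one past the segment that
 * begins at start, that is, one past the first code unit at or after
 * start that is marked WORDBREAK_BREAK.  Returns len if none is found.
 */
size_t next_segment_end(const char *brks, size_t len, size_t start)
{
	size_t i;

	assert(start <= len);
	for (i = start; i < len; ++i)
	{
		if (brks[i] == WORDBREAK_BREAK)
			return i + 1;
	}
	return len;
}

/* Hypothetical helper: counts the segments described by brks. */
size_t count_segments(const char *brks, size_t len)
{
	size_t pos = 0;
	size_t n = 0;

	while (pos < len)
	{
		pos = next_segment_end(brks, len, pos);
		++n;
	}
	return n;
}
```

For a five-unit input such as `ab cd`, hand-written break data of `{1, 0, 0, 1, 0}` (an illustration, not captured library output) yields three segments: `ab`, the space, and `cd`. Segments are not the same as words; a caller that wants words only would still need to discard segments consisting of whitespace or punctuation.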
--- a/linebreak/linebreak/wordbreakdata.c
+++ b/linebreak/linebreak/wordbreakdata.c
@ -0,0 +1,860 @@
+/* The content of this file is generated from:
+# WordBreakProperty-6.0.0.txt
+# Date: 2010-08-19, 00:48:48 GMT [MD]
+*/
+
+#include "linebreak.h"
+#include "wordbreakdef.h"
+
+static struct WordBreakProperties wb_prop_default[] = {
+	{0x000A, 0x000A, WBP_LF},
+	{0x000B, 0x000C, WBP_Newline},
+	{0x000D, 0x000D, WBP_CR},
+	{0x0027, 0x0027, WBP_MidNumLet},
+	{0x002C, 0x002C, WBP_MidNum},
+	{0x002E, 0x002E, WBP_MidNumLet},
+	{0x0030, 0x0039, WBP_Numeric},
+	{0x003A, 0x003A, WBP_MidLetter},
+	{0x003B, 0x003B, WBP_MidNum},
+	{0x0041, 0x005A, WBP_ALetter},
+	{0x005F, 0x005F, WBP_ExtendNumLet},
+	{0x0061, 0x007A, WBP_ALetter},
+	{0x0085, 0x0085, WBP_Newline},
+	{0x00AA, 0x00AA, WBP_ALetter},
+	{0x00AD, 0x00AD, WBP_Format},
+	{0x00B5, 0x00B5, WBP_ALetter},
+	{0x00B7, 0x00B7, WBP_MidLetter},
+	{0x00BA, 0x00BA, WBP_ALetter},
+	{0x00C0, 0x00D6, WBP_ALetter},
+	{0x00D8, 0x00F6, WBP_ALetter},
+	{0x00F8, 0x01BA, WBP_ALetter},
+	{0x01BB, 0x01BB, WBP_ALetter},
+	{0x01BC, 0x01BF, WBP_ALetter},
+	{0x01C0, 0x01C3, WBP_ALetter},
+	{0x01C4, 0x0293, WBP_ALetter},
+	{0x0294, 0x0294, WBP_ALetter},
+	{0x0295, 0x02AF, WBP_ALetter},
+	{0x02B0, 0x02C1, WBP_ALetter},
+	{0x02C6, 0x02D1, WBP_ALetter},
+	{0x02E0, 0x02E4, WBP_ALetter},
+	{0x02EC, 0x02EC, WBP_ALetter},
+	{0x02EE, 0x02EE, WBP_ALetter},
+	{0x0300, 0x036F, WBP_Extend},
+	{0x0370, 0x0373, WBP_ALetter},
+	{0x0374, 0x0374, WBP_ALetter},
+	{0x0376, 0x0377, WBP_ALetter},
+	{0x037A, 0x037A, WBP_ALetter},
+	{0x037B, 0x037D, WBP_ALetter},
+	{0x037E, 0x037E, WBP_MidNum},
+	{0x0386, 0x0386, WBP_ALetter},
+	{0x0387, 0x0387, WBP_MidLetter},
+	{0x0388, 0x038A, WBP_ALetter},
+	{0x038C, 0x038C, WBP_ALetter},
+	{0x038E, 0x03A1, WBP_ALetter},
+	{0x03A3, 0x03F5, WBP_ALetter},
+	{0x03F7, 0x0481, WBP_ALetter},
+	{0x0483, 0x0487, WBP_Extend},
+	{0x0488, 0x0489, WBP_Extend},
+	{0x048A, 0x0527, WBP_ALetter},
+	{0x0531, 0x0556, WBP_ALetter},
+	{0x0559, 0x0559, WBP_ALetter},
+	{0x0561, 0x0587, WBP_ALetter},
+	{0x0589, 0x0589, WBP_MidNum},
+	{0x0591, 0x05BD, WBP_Extend},
+	{0x05BF, 0x05BF, WBP_Extend},
+	{0x05C1, 0x05C2, WBP_Extend},
+	{0x05C4, 0x05C5, WBP_Extend},
+	{0x05C7, 0x05C7, WBP_Extend},
+	{0x05D0, 0x05EA, WBP_ALetter},
+	{0x05F0, 0x05F2, WBP_ALetter},
+	{0x05F3, 0x05F3, WBP_ALetter},
+	{0x05F4, 0x05F4, WBP_MidLetter},
+	{0x0600, 0x0603, WBP_Format},
+	{0x060C, 0x060D, WBP_MidNum},
+	{0x0610, 0x061A, WBP_Extend},
+	{0x0620, 0x063F, WBP_ALetter},
+	{0x0640, 0x0640, WBP_ALetter},
+	{0x0641, 0x064A, WBP_ALetter},
+	{0x064B, 0x065F, WBP_Extend},
+	{0x0660, 0x0669, WBP_Numeric},
+	{0x066B, 0x066B, WBP_Numeric},
+	{0x066C, 0x066C, WBP_MidNum},
+	{0x066E, 0x066F, WBP_ALetter},
+	{0x0670, 0x0670, WBP_Extend},
+	{0x0671, 0x06D3, WBP_ALetter},
+	{0x06D5, 0x06D5, WBP_ALetter},
+	{0x06D6, 0x06DC, WBP_Extend},
+	{0x06DD, 0x06DD, WBP_Format},
+	{0x06DF, 0x06E4, WBP_Extend},
+	{0x06E5, 0x06E6, WBP_ALetter},
+	{0x06E7, 0x06E8, WBP_Extend},
+	{0x06EA, 0x06ED, WBP_Extend},
+	{0x06EE, 0x06EF, WBP_ALetter},
+	{0x06F0, 0x06F9, WBP_Numeric},
+	{0x06FA, 0x06FC, WBP_ALetter},
+	{0x06FF, 0x06FF, WBP_ALetter},
+	{0x070F, 0x070F, WBP_Format},
+	{0x0710, 0x0710, WBP_ALetter},
+	{0x0711, 0x0711, WBP_Extend},
+	{0x0712, 0x072F, WBP_ALetter},
+	{0x0730, 0x074A, WBP_Extend},
+	{0x074D, 0x07A5, WBP_ALetter},
+	{0x07A6, 0x07B0, WBP_Extend},
+	{0x07B1, 0x07B1, WBP_ALetter},
+	{0x07C0, 0x07C9, WBP_Numeric},
+	{0x07CA, 0x07EA, WBP_ALetter},
+	{0x07EB, 0x07F3, WBP_Extend},
+	{0x07F4, 0x07F5, WBP_ALetter},
+	{0x07F8, 0x07F8, WBP_MidNum},
+	{0x07FA, 0x07FA, WBP_ALetter},
+	{0x0800, 0x0815, WBP_ALetter},
+	{0x0816, 0x0819, WBP_Extend},
+	{0x081A, 0x081A, WBP_ALetter},
+	{0x081B, 0x0823, WBP_Extend},
+	{0x0824, 0x0824, WBP_ALetter},
+	{0x0825, 0x0827, WBP_Extend},
+	{0x0828, 0x0828, WBP_ALetter},
+	{0x0829, 0x082D, WBP_Extend},
+	{0x0840, 0x0858, WBP_ALetter},
+	{0x0859, 0x085B, WBP_Extend},
+	{0x0900, 0x0902, WBP_Extend},
+	{0x0903, 0x0903, WBP_Extend},
+	{0x0904, 0x0939, WBP_ALetter},
+	{0x093A, 0x093A, WBP_Extend},
+	{0x093B, 0x093B, WBP_Extend},
+	{0x093C, 0x093C, WBP_Extend},
+	{0x093D, 0x093D, WBP_ALetter},
+	{0x093E, 0x0940, WBP_Extend},
+	{0x0941, 0x0948, WBP_Extend},
+	{0x0949, 0x094C, WBP_Extend},
+	{0x094D, 0x094D, WBP_Extend},
+	{0x094E, 0x094F, WBP_Extend},
+	{0x0950, 0x0950, WBP_ALetter},
+	{0x0951, 0x0957, WBP_Extend},
+	{0x0958, 0x0961, WBP_ALetter},
+	{0x0962, 0x0963, WBP_Extend},
+	{0x0966, 0x096F, WBP_Numeric},
+	{0x0971, 0x0971, WBP_ALetter},
+	{0x0972, 0x0977, WBP_ALetter},
+	{0x0979, 0x097F, WBP_ALetter},
+	{0x0981, 0x0981, WBP_Extend},
+	{0x0982, 0x0983, WBP_Extend},
+	{0x0985, 0x098C, WBP_ALetter},
+	{0x098F, 0x0990, WBP_ALetter},
+	{0x0993, 0x09A8, WBP_ALetter},
+	{0x09AA, 0x09B0, WBP_ALetter},
+	{0x09B2, 0x09B2, WBP_ALetter},
+	{0x09B6, 0x09B9, WBP_ALetter},
+	{0x09BC, 0x09BC, WBP_Extend},
+	{0x09BD, 0x09BD, WBP_ALetter},
+	{0x09BE, 0x09C0, WBP_Extend},
+	{0x09C1, 0x09C4, WBP_Extend},
+	{0x09C7, 0x09C8, WBP_Extend},
+	{0x09CB, 0x09CC, WBP_Extend},
+	{0x09CD, 0x09CD, WBP_Extend},
+	{0x09CE, 0x09CE, WBP_ALetter},
+	{0x09D7, 0x09D7, WBP_Extend},
+	{0x09DC, 0x09DD, WBP_ALetter},
+	{0x09DF, 0x09E1, WBP_ALetter},
+	{0x09E2, 0x09E3, WBP_Extend},
+	{0x09E6, 0x09EF, WBP_Numeric},
+	{0x09F0, 0x09F1, WBP_ALetter},
+	{0x0A01, 0x0A02, WBP_Extend},
+	{0x0A03, 0x0A03, WBP_Extend},
+	{0x0A05, 0x0A0A, WBP_ALetter},
+	{0x0A0F, 0x0A10, WBP_ALetter},
+	{0x0A13, 0x0A28, WBP_ALetter},
+	{0x0A2A, 0x0A30, WBP_ALetter},
+	{0x0A32, 0x0A33, WBP_ALetter},
+	{0x0A35, 0x0A36, WBP_ALetter},
+	{0x0A38, 0x0A39, WBP_ALetter},
+	{0x0A3C, 0x0A3C, WBP_Extend},
+	{0x0A3E, 0x0A40, WBP_Extend},
+	{0x0A41, 0x0A42, WBP_Extend},
+	{0x0A47, 0x0A48, WBP_Extend},
+	{0x0A4B, 0x0A4D, WBP_Extend},
+	{0x0A51, 0x0A51, WBP_Extend},
+	{0x0A59, 0x0A5C, WBP_ALetter},
+	{0x0A5E, 0x0A5E, WBP_ALetter},
+	{0x0A66, 0x0A6F, WBP_Numeric},
+	{0x0A70, 0x0A71, WBP_Extend},
+	{0x0A72, 0x0A74, WBP_ALetter},
+	{0x0A75, 0x0A75, WBP_Extend},
+	{0x0A81, 0x0A82, WBP_Extend},
+	{0x0A83, 0x0A83, WBP_Extend},
+	{0x0A85, 0x0A8D, WBP_ALetter},
+	{0x0A8F, 0x0A91, WBP_ALetter},
+	{0x0A93, 0x0AA8, WBP_ALetter},
+	{0x0AAA, 0x0AB0, WBP_ALetter},
+	{0x0AB2, 0x0AB3, WBP_ALetter},
+	{0x0AB5, 0x0AB9, WBP_ALetter},
+	{0x0ABC, 0x0ABC, WBP_Extend},
+	{0x0ABD, 0x0ABD, WBP_ALetter},
+	{0x0ABE, 0x0AC0, WBP_Extend},
+	{0x0AC1, 0x0AC5, WBP_Extend},
+	{0x0AC7, 0x0AC8, WBP_Extend},
+	{0x0AC9, 0x0AC9, WBP_Extend},
+	{0x0ACB, 0x0ACC, WBP_Extend},
+	{0x0ACD, 0x0ACD, WBP_Extend},
+	{0x0AD0, 0x0AD0, WBP_ALetter},
+	{0x0AE0, 0x0AE1, WBP_ALetter},
+	{0x0AE2, 0x0AE3, WBP_Extend},
+	{0x0AE6, 0x0AEF, WBP_Numeric},
+	{0x0B01, 0x0B01, WBP_Extend},
+	{0x0B02, 0x0B03, WBP_Extend},
+	{0x0B05, 0x0B0C, WBP_ALetter},
+	{0x0B0F, 0x0B10, WBP_ALetter},
+	{0x0B13, 0x0B28, WBP_ALetter},
+	{0x0B2A, 0x0B30, WBP_ALetter},
+	{0x0B32, 0x0B33, WBP_ALetter},
+	{0x0B35, 0x0B39, WBP_ALetter},
+	{0x0B3C, 0x0B3C, WBP_Extend},
+	{0x0B3D, 0x0B3D, WBP_ALetter},
+	{0x0B3E, 0x0B3E, WBP_Extend},
+	{0x0B3F, 0x0B3F, WBP_Extend},
+	{0x0B40, 0x0B40, WBP_Extend},
+	{0x0B41, 0x0B44, WBP_Extend},
+	{0x0B47, 0x0B48, WBP_Extend},
+	{0x0B4B, 0x0B4C, WBP_Extend},
+	{0x0B4D, 0x0B4D, WBP_Extend},
+	{0x0B56, 0x0B56, WBP_Extend},
+	{0x0B57, 0x0B57, WBP_Extend},
+	{0x0B5C, 0x0B5D, WBP_ALetter},
+	{0x0B5F, 0x0B61, WBP_ALetter},
+	{0x0B62, 0x0B63, WBP_Extend},
+	{0x0B66, 0x0B6F, WBP_Numeric},
+	{0x0B71, 0x0B71, WBP_ALetter},
+	{0x0B82, 0x0B82, WBP_Extend},
+	{0x0B83, 0x0B83, WBP_ALetter},
+	{0x0B85, 0x0B8A, WBP_ALetter},
+	{0x0B8E, 0x0B90, WBP_ALetter},
+	{0x0B92, 0x0B95, WBP_ALetter},
+	{0x0B99, 0x0B9A, WBP_ALetter},
+	{0x0B9C, 0x0B9C, WBP_ALetter},
+	{0x0B9E, 0x0B9F, WBP_ALetter},
+	{0x0BA3, 0x0BA4, WBP_ALetter},
+	{0x0BA8, 0x0BAA, WBP_ALetter},
+	{0x0BAE, 0x0BB9, WBP_ALetter},
+	{0x0BBE, 0x0BBF, WBP_Extend},
+	{0x0BC0, 0x0BC0, WBP_Extend},
+	{0x0BC1, 0x0BC2, WBP_Extend},
+	{0x0BC6, 0x0BC8, WBP_Extend},
+	{0x0BCA, 0x0BCC, WBP_Extend},
+	{0x0BCD, 0x0BCD, WBP_Extend},
+	{0x0BD0, 0x0BD0, WBP_ALetter},
+	{0x0BD7, 0x0BD7, WBP_Extend},
+	{0x0BE6, 0x0BEF, WBP_Numeric},
+	{0x0C01, 0x0C03, WBP_Extend},
+	{0x0C05, 0x0C0C, WBP_ALetter},
+	{0x0C0E, 0x0C10, WBP_ALetter},
+	{0x0C12, 0x0C28, WBP_ALetter},
+	{0x0C2A, 0x0C33, WBP_ALetter},
+	{0x0C35, 0x0C39, WBP_ALetter},
+	{0x0C3D, 0x0C3D, WBP_ALetter},
+	{0x0C3E, 0x0C40, WBP_Extend},
+	{0x0C41, 0x0C44, WBP_Extend},
+	{0x0C46, 0x0C48, WBP_Extend},
+	{0x0C4A, 0x0C4D, WBP_Extend},
+	{0x0C55, 0x0C56, WBP_Extend},
+	{0x0C58, 0x0C59, WBP_ALetter},
+	{0x0C60, 0x0C61, WBP_ALetter},
+	{0x0C62, 0x0C63, WBP_Extend},
+	{0x0C66, 0x0C6F, WBP_Numeric},
+	{0x0C82, 0x0C83, WBP_Extend},
+	{0x0C85, 0x0C8C, WBP_ALetter},
+	{0x0C8E, 0x0C90, WBP_ALetter},
+	{0x0C92, 0x0CA8, WBP_ALetter},
+	{0x0CAA, 0x0CB3, WBP_ALetter},
+	{0x0CB5, 0x0CB9, WBP_ALetter},
+	{0x0CBC, 0x0CBC, WBP_Extend},
+	{0x0CBD, 0x0CBD, WBP_ALetter},
+	{0x0CBE, 0x0CBE, WBP_Extend},
+	{0x0CBF, 0x0CBF, WBP_Extend},
+	{0x0CC0, 0x0CC4, WBP_Extend},
+	{0x0CC6, 0x0CC6, WBP_Extend},
+	{0x0CC7, 0x0CC8, WBP_Extend},
+	{0x0CCA, 0x0CCB, WBP_Extend},
+	{0x0CCC, 0x0CCD, WBP_Extend},
+	{0x0CD5, 0x0CD6, WBP_Extend},
+	{0x0CDE, 0x0CDE, WBP_ALetter},
+	{0x0CE0, 0x0CE1, WBP_ALetter},
+	{0x0CE2, 0x0CE3, WBP_Extend},
+	{0x0CE6, 0x0CEF, WBP_Numeric},
+	{0x0CF1, 0x0CF2, WBP_ALetter},
+	{0x0D02, 0x0D03, WBP_Extend},
+	{0x0D05, 0x0D0C, WBP_ALetter},
+	{0x0D0E, 0x0D10, WBP_ALetter},
+	{0x0D12, 0x0D3A, WBP_ALetter},
+	{0x0D3D, 0x0D3D, WBP_ALetter},
+	{0x0D3E, 0x0D40, WBP_Extend},
+	{0x0D41, 0x0D44, WBP_Extend},
+	{0x0D46, 0x0D48, WBP_Extend},
+	{0x0D4A, 0x0D4C, WBP_Extend},
+	{0x0D4D, 0x0D4D, WBP_Extend},
+	{0x0D4E, 0x0D4E, WBP_ALetter},
+	{0x0D57, 0x0D57, WBP_Extend},
+	{0x0D60, 0x0D61, WBP_ALetter},
+	{0x0D62, 0x0D63, WBP_Extend},
+	{0x0D66, 0x0D6F, WBP_Numeric},
+	{0x0D7A, 0x0D7F, WBP_ALetter},
+	{0x0D82, 0x0D83, WBP_Extend},
+	{0x0D85, 0x0D96, WBP_ALetter},
+	{0x0D9A, 0x0DB1, WBP_ALetter},
+	{0x0DB3, 0x0DBB, WBP_ALetter},
+	{0x0DBD, 0x0DBD, WBP_ALetter},
+	{0x0DC0, 0x0DC6, WBP_ALetter},
+	{0x0DCA, 0x0DCA, WBP_Extend},
+	{0x0DCF, 0x0DD1, WBP_Extend},
+	{0x0DD2, 0x0DD4, WBP_Extend},
+	{0x0DD6, 0x0DD6, WBP_Extend},
+	{0x0DD8, 0x0DDF, WBP_Extend},
+	{0x0DF2, 0x0DF3, WBP_Extend},
+	{0x0E31, 0x0E31, WBP_Extend},
+	{0x0E34, 0x0E3A, WBP_Extend},
+	{0x0E47, 0x0E4E, WBP_Extend},
+	{0x0E50, 0x0E59, WBP_Numeric},
+	{0x0EB1, 0x0EB1, WBP_Extend},
+	{0x0EB4, 0x0EB9, WBP_Extend},
+	{0x0EBB, 0x0EBC, WBP_Extend},
+	{0x0EC8, 0x0ECD, WBP_Extend},
+	{0x0ED0, 0x0ED9, WBP_Numeric},
+	{0x0F00, 0x0F00, WBP_ALetter},
+	{0x0F18, 0x0F19, WBP_Extend},
+	{0x0F20, 0x0F29, WBP_Numeric},
+	{0x0F35, 0x0F35, WBP_Extend},
+	{0x0F37, 0x0F37, WBP_Extend},
+	{0x0F39, 0x0F39, WBP_Extend},
+	{0x0F3E, 0x0F3F, WBP_Extend},
+	{0x0F40, 0x0F47, WBP_ALetter},
+	{0x0F49, 0x0F6C, WBP_ALetter},
+	{0x0F71, 0x0F7E, WBP_Extend},
+	{0x0F7F, 0x0F7F, WBP_Extend},
+	{0x0F80, 0x0F84, WBP_Extend},
+	{0x0F86, 0x0F87, WBP_Extend},
+	{0x0F88, 0x0F8C, WBP_ALetter},
+	{0x0F8D, 0x0F97, WBP_Extend},
+	{0x0F99, 0x0FBC, WBP_Extend},
+	{0x0FC6, 0x0FC6, WBP_Extend},
+	{0x102B, 0x102C, WBP_Extend},
+	{0x102D, 0x1030, WBP_Extend},
+	{0x1031, 0x1031, WBP_Extend},
+	{0x1032, 0x1037, WBP_Extend},
+	{0x1038, 0x1038, WBP_Extend},
+	{0x1039, 0x103A, WBP_Extend},
+	{0x103B, 0x103C, WBP_Extend},
+	{0x103D, 0x103E, WBP_Extend},
+	{0x1040, 0x1049, WBP_Numeric},
+	{0x1056, 0x1057, WBP_Extend},
+	{0x1058, 0x1059, WBP_Extend},
+	{0x105E, 0x1060, WBP_Extend},
+	{0x1062, 0x1064, WBP_Extend},
+	{0x1067, 0x106D, WBP_Extend},
+	{0x1071, 0x1074, WBP_Extend},
+	{0x1082, 0x1082, WBP_Extend},
+	{0x1083, 0x1084, WBP_Extend},
+	{0x1085, 0x1086, WBP_Extend},
+	{0x1087, 0x108C, WBP_Extend},
+	{0x108D, 0x108D, WBP_Extend},
+	{0x108F, 0x108F, WBP_Extend},
+	{0x1090, 0x1099, WBP_Numeric},
+	{0x109A, 0x109C, WBP_Extend},
+	{0x109D, 0x109D, WBP_Extend},
+	{0x10A0, 0x10C5, WBP_ALetter},
+	{0x10D0, 0x10FA, WBP_ALetter},
+	{0x10FC, 0x10FC, WBP_ALetter},
+	{0x1100, 0x1248, WBP_ALetter},
+	{0x124A, 0x124D, WBP_ALetter},
+	{0x1250, 0x1256, WBP_ALetter},
+	{0x1258, 0x1258, WBP_ALetter},
+	{0x125A, 0x125D, WBP_ALetter},
+	{0x1260, 0x1288, WBP_ALetter},
+	{0x128A, 0x128D, WBP_ALetter},
+	{0x1290, 0x12B0, WBP_ALetter},
+	{0x12B2, 0x12B5, WBP_ALetter},
+	{0x12B8, 0x12BE, WBP_ALetter},
+	{0x12C0, 0x12C0, WBP_ALetter},
+	{0x12C2, 0x12C5, WBP_ALetter},
+	{0x12C8, 0x12D6, WBP_ALetter},
+	{0x12D8, 0x1310, WBP_ALetter},
+	{0x1312, 0x1315, WBP_ALetter},
+	{0x1318, 0x135A, WBP_ALetter},
+	{0x135D, 0x135F, WBP_Extend},
+	{0x1380, 0x138F, WBP_ALetter},
+	{0x13A0, 0x13F4, WBP_ALetter},
+	{0x1401, 0x166C, WBP_ALetter},
+	{0x166F, 0x167F, WBP_ALetter},
+	{0x1681, 0x169A, WBP_ALetter},
+	{0x16A0, 0x16EA, WBP_ALetter},
+	{0x16EE, 0x16F0, WBP_ALetter},
+	{0x1700, 0x170C, WBP_ALetter},
+	{0x170E, 0x1711, WBP_ALetter},
+	{0x1712, 0x1714, WBP_Extend},
+	{0x1720, 0x1731, WBP_ALetter},
+	{0x1732, 0x1734, WBP_Extend},
+	{0x1740, 0x1751, WBP_ALetter},
+	{0x1752, 0x1753, WBP_Extend},
+	{0x1760, 0x176C, WBP_ALetter},
+	{0x176E, 0x1770, WBP_ALetter},
+	{0x1772, 0x1773, WBP_Extend},
+	{0x17B4, 0x17B5, WBP_Format},
+	{0x17B6, 0x17B6, WBP_Extend},
+	{0x17B7, 0x17BD, WBP_Extend},
+	{0x17BE, 0x17C5, WBP_Extend},
+	{0x17C6, 0x17C6, WBP_Extend},
+	{0x17C7, 0x17C8, WBP_Extend},
+	{0x17C9, 0x17D3, WBP_Extend},
+	{0x17DD, 0x17DD, WBP_Extend},
+	{0x17E0, 0x17E9, WBP_Numeric},
+	{0x180B, 0x180D, WBP_Extend},
+	{0x1810, 0x1819, WBP_Numeric},
+	{0x1820, 0x1842, WBP_ALetter},
+	{0x1843, 0x1843, WBP_ALetter},
+	{0x1844, 0x1877, WBP_ALetter},
+	{0x1880, 0x18A8, WBP_ALetter},
+	{0x18A9, 0x18A9, WBP_Extend},
+	{0x18AA, 0x18AA, WBP_ALetter},
+	{0x18B0, 0x18F5, WBP_ALetter},
+	{0x1900, 0x191C, WBP_ALetter},
+	{0x1920, 0x1922, WBP_Extend},
+	{0x1923, 0x1926, WBP_Extend},
+	{0x1927, 0x1928, WBP_Extend},
+	{0x1929, 0x192B, WBP_Extend},
+	{0x1930, 0x1931, WBP_Extend},
+	{0x1932, 0x1932, WBP_Extend},
+	{0x1933, 0x1938, WBP_Extend},
+	{0x1939, 0x193B, WBP_Extend},
+	{0x1946, 0x194F, WBP_Numeric},
+	{0x19B0, 0x19C0, WBP_Extend},
+	{0x19C8, 0x19C9, WBP_Extend},
+	{0x19D0, 0x19D9, WBP_Numeric},
+	{0x1A00, 0x1A16, WBP_ALetter},
+	{0x1A17, 0x1A18, WBP_Extend},
+	{0x1A19, 0x1A1B, WBP_Extend},
+	{0x1A55, 0x1A55, WBP_Extend},
+	{0x1A56, 0x1A56, WBP_Extend},
+	{0x1A57, 0x1A57, WBP_Extend},
+	{0x1A58, 0x1A5E, WBP_Extend},
+	{0x1A60, 0x1A60, WBP_Extend},
+	{0x1A61, 0x1A61, WBP_Extend},
+	{0x1A62, 0x1A62, WBP_Extend},
+	{0x1A63, 0x1A64, WBP_Extend},
+	{0x1A65, 0x1A6C, WBP_Extend},
+	{0x1A6D, 0x1A72, WBP_Extend},
+	{0x1A73, 0x1A7C, WBP_Extend},
+	{0x1A7F, 0x1A7F, WBP_Extend},
+	{0x1A80, 0x1A89, WBP_Numeric},
+	{0x1A90, 0x1A99, WBP_Numeric},
+	{0x1B00, 0x1B03, WBP_Extend},
+	{0x1B04, 0x1B04, WBP_Extend},
+	{0x1B05, 0x1B33, WBP_ALetter},
+	{0x1B34, 0x1B34, WBP_Extend},
+	{0x1B35, 0x1B35, WBP_Extend},
+	{0x1B36, 0x1B3A, WBP_Extend},
+	{0x1B3B, 0x1B3B, WBP_Extend},
+	{0x1B3C, 0x1B3C, WBP_Extend},
+	{0x1B3D, 0x1B41, WBP_Extend},
+	{0x1B42, 0x1B42, WBP_Extend},
+	{0x1B43, 0x1B44, WBP_Extend},
+	{0x1B45, 0x1B4B, WBP_ALetter},
+	{0x1B50, 0x1B59, WBP_Numeric},
+	{0x1B6B, 0x1B73, WBP_Extend},
+	{0x1B80, 0x1B81, WBP_Extend},
+	{0x1B82, 0x1B82, WBP_Extend},
+	{0x1B83, 0x1BA0, WBP_ALetter},
+	{0x1BA1, 0x1BA1, WBP_Extend},
+	{0x1BA2, 0x1BA5, WBP_Extend},
+	{0x1BA6, 0x1BA7, WBP_Extend},
+	{0x1BA8, 0x1BA9, WBP_Extend},
+	{0x1BAA, 0x1BAA, WBP_Extend},
+	{0x1BAE, 0x1BAF, WBP_ALetter},
+	{0x1BB0, 0x1BB9, WBP_Numeric},
+	{0x1BC0, 0x1BE5, WBP_ALetter},
+	{0x1BE6, 0x1BE6, WBP_Extend},
+	{0x1BE7, 0x1BE7, WBP_Extend},
+	{0x1BE8, 0x1BE9, WBP_Extend},
+	{0x1BEA, 0x1BEC, WBP_Extend},
+	{0x1BED, 0x1BED, WBP_Extend},
+	{0x1BEE, 0x1BEE, WBP_Extend},
+	{0x1BEF, 0x1BF1, WBP_Extend},
+	{0x1BF2, 0x1BF3, WBP_Extend},
+	{0x1C00, 0x1C23, WBP_ALetter},
+	{0x1C24, 0x1C2B, WBP_Extend},
+	{0x1C2C, 0x1C33, WBP_Extend},
+	{0x1C34, 0x1C35, WBP_Extend},
+	{0x1C36, 0x1C37, WBP_Extend},
+	{0x1C40, 0x1C49, WBP_Numeric},
+	{0x1C4D, 0x1C4F, WBP_ALetter},
+	{0x1C50, 0x1C59, WBP_Numeric},
+	{0x1C5A, 0x1C77, WBP_ALetter},
+	{0x1C78, 0x1C7D, WBP_ALetter},
+	{0x1CD0, 0x1CD2, WBP_Extend},
+	{0x1CD4, 0x1CE0, WBP_Extend},
+	{0x1CE1, 0x1CE1, WBP_Extend},
+	{0x1CE2, 0x1CE8, WBP_Extend},
+	{0x1CE9, 0x1CEC, WBP_ALetter},
+	{0x1CED, 0x1CED, WBP_Extend},
+	{0x1CEE, 0x1CF1, WBP_ALetter},
+	{0x1CF2, 0x1CF2, WBP_Extend},
+	{0x1D00, 0x1D2B, WBP_ALetter},
+	{0x1D2C, 0x1D61, WBP_ALetter},
+	{0x1D62, 0x1D77, WBP_ALetter},
+	{0x1D78, 0x1D78, WBP_ALetter},
+	{0x1D79, 0x1D9A, WBP_ALetter},
+	{0x1D9B, 0x1DBF, WBP_ALetter},
+	{0x1DC0, 0x1DE6, WBP_Extend},
+	{0x1DFC, 0x1DFF, WBP_Extend},
+	{0x1E00, 0x1F15, WBP_ALetter},
+	{0x1F18, 0x1F1D, WBP_ALetter},
+	{0x1F20, 0x1F45, WBP_ALetter},
+	{0x1F48, 0x1F4D, WBP_ALetter},
+	{0x1F50, 0x1F57, WBP_ALetter},
+	{0x1F59, 0x1F59, WBP_ALetter},
+	{0x1F5B, 0x1F5B, WBP_ALetter},
+	{0x1F5D, 0x1F5D, WBP_ALetter},
+	{0x1F5F, 0x1F7D, WBP_ALetter},
+	{0x1F80, 0x1FB4, WBP_ALetter},
+	{0x1FB6, 0x1FBC, WBP_ALetter},
+	{0x1FBE, 0x1FBE, WBP_ALetter},
+	{0x1FC2, 0x1FC4, WBP_ALetter},
+	{0x1FC6, 0x1FCC, WBP_ALetter},
+	{0x1FD0, 0x1FD3, WBP_ALetter},
+	{0x1FD6, 0x1FDB, WBP_ALetter},
+	{0x1FE0, 0x1FEC, WBP_ALetter},
+	{0x1FF2, 0x1FF4, WBP_ALetter},
+	{0x1FF6, 0x1FFC, WBP_ALetter},
+	{0x200C, 0x200D, WBP_Extend},
+	{0x200E, 0x200F, WBP_Format},
+	{0x2018, 0x2018, WBP_MidNumLet},
+	{0x2019, 0x2019, WBP_MidNumLet},
+	{0x2024, 0x2024, WBP_MidNumLet},
+	{0x2027, 0x2027, WBP_MidLetter},
+	{0x2028, 0x2028, WBP_Newline},
+	{0x2029, 0x2029, WBP_Newline},
+	{0x202A, 0x202E, WBP_Format},
+	{0x203F, 0x2040, WBP_ExtendNumLet},
+	{0x2044, 0x2044, WBP_MidNum},
+	{0x2054, 0x2054, WBP_ExtendNumLet},
+	{0x2060, 0x2064, WBP_Format},
+	{0x206A, 0x206F, WBP_Format},
+	{0x2071, 0x2071, WBP_ALetter},
+	{0x207F, 0x207F, WBP_ALetter},
+	{0x2090, 0x209C, WBP_ALetter},
+	{0x20D0, 0x20DC, WBP_Extend},
+	{0x20DD, 0x20E0, WBP_Extend},
+	{0x20E1, 0x20E1, WBP_Extend},
+	{0x20E2, 0x20E4, WBP_Extend},
+	{0x20E5, 0x20F0, WBP_Extend},
+	{0x2102, 0x2102, WBP_ALetter},
+	{0x2107, 0x2107, WBP_ALetter},
+	{0x210A, 0x2113, WBP_ALetter},
+	{0x2115, 0x2115, WBP_ALetter},
+	{0x2119, 0x211D, WBP_ALetter},
+	{0x2124, 0x2124, WBP_ALetter},
+	{0x2126, 0x2126, WBP_ALetter},
+	{0x2128, 0x2128, WBP_ALetter},
+	{0x212A, 0x212D, WBP_ALetter},
+	{0x212F, 0x2134, WBP_ALetter},
+	{0x2135, 0x2138, WBP_ALetter},
+	{0x2139, 0x2139, WBP_ALetter},
+	{0x213C, 0x213F, WBP_ALetter},
+	{0x2145, 0x2149, WBP_ALetter},
+	{0x214E, 0x214E, WBP_ALetter},
+	{0x2160, 0x2182, WBP_ALetter},
+	{0x2183, 0x2184, WBP_ALetter},
+	{0x2185, 0x2188, WBP_ALetter},
+	{0x24B6, 0x24E9, WBP_ALetter},
+	{0x2C00, 0x2C2E, WBP_ALetter},
+	{0x2C30, 0x2C5E, WBP_ALetter},
+	{0x2C60, 0x2C7C, WBP_ALetter},
+	{0x2C7D, 0x2C7D, WBP_ALetter},
+	{0x2C7E, 0x2CE4, WBP_ALetter},
+	{0x2CEB, 0x2CEE, WBP_ALetter},
+	{0x2CEF, 0x2CF1, WBP_Extend},
+	{0x2D00, 0x2D25, WBP_ALetter},
+	{0x2D30, 0x2D65, WBP_ALetter},
+	{0x2D6F, 0x2D6F, WBP_ALetter},
+	{0x2D7F, 0x2D7F, WBP_Extend},
+	{0x2D80, 0x2D96, WBP_ALetter},
+	{0x2DA0, 0x2DA6, WBP_ALetter},
+	{0x2DA8, 0x2DAE, WBP_ALetter},
+	{0x2DB0, 0x2DB6, WBP_ALetter},
+	{0x2DB8, 0x2DBE, WBP_ALetter},
+	{0x2DC0, 0x2DC6, WBP_ALetter},
+	{0x2DC8, 0x2DCE, WBP_ALetter},
+	{0x2DD0, 0x2DD6, WBP_ALetter},
+	{0x2DD8, 0x2DDE, WBP_ALetter},
+	{0x2DE0, 0x2DFF, WBP_Extend},
+	{0x2E2F, 0x2E2F, WBP_ALetter},
+	{0x3005, 0x3005, WBP_ALetter},
+	{0x302A, 0x302F, WBP_Extend},
+	{0x3031, 0x3035, WBP_Katakana},
+	{0x303B, 0x303B, WBP_ALetter},
+	{0x303C, 0x303C, WBP_ALetter},
+	{0x3099, 0x309A, WBP_Extend},
+	{0x309B, 0x309C, WBP_Katakana},
+	{0x30A0, 0x30A0, WBP_Katakana},
+	{0x30A1, 0x30FA, WBP_Katakana},
+	{0x30FC, 0x30FE, WBP_Katakana},
+	{0x30FF, 0x30FF, WBP_Katakana},
+	{0x3105, 0x312D, WBP_ALetter},
+	{0x3131, 0x318E, WBP_ALetter},
+	{0x31A0, 0x31BA, WBP_ALetter},
+	{0x31F0, 0x31FF, WBP_Katakana},
+	{0x32D0, 0x32FE, WBP_Katakana},
+	{0x3300, 0x3357, WBP_Katakana},
+	{0xA000, 0xA014, WBP_ALetter},
+	{0xA015, 0xA015, WBP_ALetter},
+	{0xA016, 0xA48C, WBP_ALetter},
+	{0xA4D0, 0xA4F7, WBP_ALetter},
+	{0xA4F8, 0xA4FD, WBP_ALetter},
+	{0xA500, 0xA60B, WBP_ALetter},
+	{0xA60C, 0xA60C, WBP_ALetter},
+	{0xA610, 0xA61F, WBP_ALetter},
+	{0xA620, 0xA629, WBP_Numeric},
+	{0xA62A, 0xA62B, WBP_ALetter},
+	{0xA640, 0xA66D, WBP_ALetter},
+	{0xA66E, 0xA66E, WBP_ALetter},
+	{0xA66F, 0xA66F, WBP_Extend},
+	{0xA670, 0xA672, WBP_Extend},
+	{0xA67C, 0xA67D, WBP_Extend},
+	{0xA67F, 0xA67F, WBP_ALetter},
+	{0xA680, 0xA697, WBP_ALetter},
+	{0xA6A0, 0xA6E5, WBP_ALetter},
+	{0xA6E6, 0xA6EF, WBP_ALetter},
+	{0xA6F0, 0xA6F1, WBP_Extend},
+	{0xA717, 0xA71F, WBP_ALetter},
+	{0xA722, 0xA76F, WBP_ALetter},
+	{0xA770, 0xA770, WBP_ALetter},
+	{0xA771, 0xA787, WBP_ALetter},
+	{0xA788, 0xA788, WBP_ALetter},
+	{0xA78B, 0xA78E, WBP_ALetter},
+	{0xA790, 0xA791, WBP_ALetter},
+	{0xA7A0, 0xA7A9, WBP_ALetter},
+	{0xA7FA, 0xA7FA, WBP_ALetter},
+	{0xA7FB, 0xA801, WBP_ALetter},
+	{0xA802, 0xA802, WBP_Extend},
+	{0xA803, 0xA805, WBP_ALetter},
+	{0xA806, 0xA806, WBP_Extend},
+	{0xA807, 0xA80A, WBP_ALetter},
+	{0xA80B, 0xA80B, WBP_Extend},
+	{0xA80C, 0xA822, WBP_ALetter},
+	{0xA823, 0xA824, WBP_Extend},
+	{0xA825, 0xA826, WBP_Extend},
+	{0xA827, 0xA827, WBP_Extend},
+	{0xA840, 0xA873, WBP_ALetter},
+	{0xA880, 0xA881, WBP_Extend},
+	{0xA882, 0xA8B3, WBP_ALetter},
+	{0xA8B4, 0xA8C3, WBP_Extend},
+	{0xA8C4, 0xA8C4, WBP_Extend},
+	{0xA8D0, 0xA8D9, WBP_Numeric},
+	{0xA8E0, 0xA8F1, WBP_Extend},
+	{0xA8F2, 0xA8F7, WBP_ALetter},
+	{0xA8FB, 0xA8FB, WBP_ALetter},
+	{0xA900, 0xA909, WBP_Numeric},
+	{0xA90A, 0xA925, WBP_ALetter},
+	{0xA926, 0xA92D, WBP_Extend},
+	{0xA930, 0xA946, WBP_ALetter},
+	{0xA947, 0xA951, WBP_Extend},
+	{0xA952, 0xA953, WBP_Extend},
+	{0xA960, 0xA97C, WBP_ALetter},
+	{0xA980, 0xA982, WBP_Extend},
+	{0xA983, 0xA983, WBP_Extend},
+	{0xA984, 0xA9B2, WBP_ALetter},
+	{0xA9B3, 0xA9B3, WBP_Extend},
+	{0xA9B4, 0xA9B5, WBP_Extend},
+	{0xA9B6, 0xA9B9, WBP_Extend},
+	{0xA9BA, 0xA9BB, WBP_Extend},
+	{0xA9BC, 0xA9BC, WBP_Extend},
+	{0xA9BD, 0xA9C0, WBP_Extend},
+	{0xA9CF, 0xA9CF, WBP_ALetter},
+	{0xA9D0, 0xA9D9, WBP_Numeric},
+	{0xAA00, 0xAA28, WBP_ALetter},
+	{0xAA29, 0xAA2E, WBP_Extend},
+	{0xAA2F, 0xAA30, WBP_Extend},
+	{0xAA31, 0xAA32, WBP_Extend},
+	{0xAA33, 0xAA34, WBP_Extend},
+	{0xAA35, 0xAA36, WBP_Extend},
+	{0xAA40, 0xAA42, WBP_ALetter},
+	{0xAA43, 0xAA43, WBP_Extend},
+	{0xAA44, 0xAA4B, WBP_ALetter},
+	{0xAA4C, 0xAA4C, WBP_Extend},
+	{0xAA4D, 0xAA4D, WBP_Extend},
+	{0xAA50, 0xAA59, WBP_Numeric},
+	{0xAA7B, 0xAA7B, WBP_Extend},
+	{0xAAB0, 0xAAB0, WBP_Extend},
+	{0xAAB2, 0xAAB4, WBP_Extend},
+	{0xAAB7, 0xAAB8, WBP_Extend},
+	{0xAABE, 0xAABF, WBP_Extend},
+	{0xAAC1, 0xAAC1, WBP_Extend},
+	{0xAB01, 0xAB06, WBP_ALetter},
+	{0xAB09, 0xAB0E, WBP_ALetter},
+	{0xAB11, 0xAB16, WBP_ALetter},
+	{0xAB20, 0xAB26, WBP_ALetter},
+	{0xAB28, 0xAB2E, WBP_ALetter},
+	{0xABC0, 0xABE2, WBP_ALetter},
+	{0xABE3, 0xABE4, WBP_Extend},
+	{0xABE5, 0xABE5, WBP_Extend},
+	{0xABE6, 0xABE7, WBP_Extend},
+	{0xABE8, 0xABE8, WBP_Extend},
+	{0xABE9, 0xABEA, WBP_Extend},
+	{0xABEC, 0xABEC, WBP_Extend},
+	{0xABED, 0xABED, WBP_Extend},
+	{0xABF0, 0xABF9, WBP_Numeric},
+	{0xAC00, 0xD7A3, WBP_ALetter},
+	{0xD7B0, 0xD7C6, WBP_ALetter},
+	{0xD7CB, 0xD7FB, WBP_ALetter},
+	{0xFB00, 0xFB06, WBP_ALetter},
+	{0xFB13, 0xFB17, WBP_ALetter},
+	{0xFB1D, 0xFB1D, WBP_ALetter},
+	{0xFB1E, 0xFB1E, WBP_Extend},
+	{0xFB1F, 0xFB28, WBP_ALetter},
+	{0xFB2A, 0xFB36, WBP_ALetter},
+	{0xFB38, 0xFB3C, WBP_ALetter},
+	{0xFB3E, 0xFB3E, WBP_ALetter},
+	{0xFB40, 0xFB41, WBP_ALetter},
+	{0xFB43, 0xFB44, WBP_ALetter},
+	{0xFB46, 0xFBB1, WBP_ALetter},
+	{0xFBD3, 0xFD3D, WBP_ALetter},
+	{0xFD50, 0xFD8F, WBP_ALetter},
+	{0xFD92, 0xFDC7, WBP_ALetter},
+	{0xFDF0, 0xFDFB, WBP_ALetter},
+	{0xFE00, 0xFE0F, WBP_Extend},
+	{0xFE10, 0xFE10, WBP_MidNum},
+	{0xFE13, 0xFE13, WBP_MidLetter},
+	{0xFE14, 0xFE14, WBP_MidNum},
+	{0xFE20, 0xFE26, WBP_Extend},
+	{0xFE33, 0xFE34, WBP_ExtendNumLet},
+	{0xFE4D, 0xFE4F, WBP_ExtendNumLet},
+	{0xFE50, 0xFE50, WBP_MidNum},
+	{0xFE52, 0xFE52, WBP_MidNumLet},
+	{0xFE54, 0xFE54, WBP_MidNum},
+	{0xFE55, 0xFE55, WBP_MidLetter},
+	{0xFE70, 0xFE74, WBP_ALetter},
+	{0xFE76, 0xFEFC, WBP_ALetter},
+	{0xFEFF, 0xFEFF, WBP_Format},
+	{0xFF07, 0xFF07, WBP_MidNumLet},
+	{0xFF0C, 0xFF0C, WBP_MidNum},
+	{0xFF0E, 0xFF0E, WBP_MidNumLet},
+	{0xFF1A, 0xFF1A, WBP_MidLetter},
+	{0xFF1B, 0xFF1B, WBP_MidNum},
+	{0xFF21, 0xFF3A, WBP_ALetter},
+	{0xFF3F, 0xFF3F, WBP_ExtendNumLet},
+	{0xFF41, 0xFF5A, WBP_ALetter},
+	{0xFF66, 0xFF6F, WBP_Katakana},
+	{0xFF70, 0xFF70, WBP_Katakana},
+	{0xFF71, 0xFF9D, WBP_Katakana},
+	{0xFF9E, 0xFF9F, WBP_Extend},
+	{0xFFA0, 0xFFBE, WBP_ALetter},
+	{0xFFC2, 0xFFC7, WBP_ALetter},
+	{0xFFCA, 0xFFCF, WBP_ALetter},
+	{0xFFD2, 0xFFD7, WBP_ALetter},
+	{0xFFDA, 0xFFDC, WBP_ALetter},
+	{0xFFF9, 0xFFFB, WBP_Format},
+	{0x10000, 0x1000B, WBP_ALetter},
+	{0x1000D, 0x10026, WBP_ALetter},
+	{0x10028, 0x1003A, WBP_ALetter},
+	{0x1003C, 0x1003D, WBP_ALetter},
+	{0x1003F, 0x1004D, WBP_ALetter},
+	{0x10050, 0x1005D, WBP_ALetter},
+	{0x10080, 0x100FA, WBP_ALetter},
+	{0x10140, 0x10174, WBP_ALetter},
+	{0x101FD, 0x101FD, WBP_Extend},
+	{0x10280, 0x1029C, WBP_ALetter},
+	{0x102A0, 0x102D0, WBP_ALetter},
+	{0x10300, 0x1031E, WBP_ALetter},
+	{0x10330, 0x10340, WBP_ALetter},
+	{0x10341, 0x10341, WBP_ALetter},
+	{0x10342, 0x10349, WBP_ALetter},
+	{0x1034A, 0x1034A, WBP_ALetter},
+	{0x10380, 0x1039D, WBP_ALetter},
+	{0x103A0, 0x103C3, WBP_ALetter},
+	{0x103C8, 0x103CF, WBP_ALetter},
+	{0x103D1, 0x103D5, WBP_ALetter},
+	{0x10400, 0x1044F, WBP_ALetter},
+	{0x10450, 0x1049D, WBP_ALetter},
+	{0x104A0, 0x104A9, WBP_Numeric},
+	{0x10800, 0x10805, WBP_ALetter},
+	{0x10808, 0x10808, WBP_ALetter},
+	{0x1080A, 0x10835, WBP_ALetter},
+	{0x10837, 0x10838, WBP_ALetter},
+	{0x1083C, 0x1083C, WBP_ALetter},
+	{0x1083F, 0x10855, WBP_ALetter},
+	{0x10900, 0x10915, WBP_ALetter},
+	{0x10920, 0x10939, WBP_ALetter},
+	{0x10A00, 0x10A00, WBP_ALetter},
+	{0x10A01, 0x10A03, WBP_Extend},
+	{0x10A05, 0x10A06, WBP_Extend},
+	{0x10A0C, 0x10A0F, WBP_Extend},
+	{0x10A10, 0x10A13, WBP_ALetter},
+	{0x10A15, 0x10A17, WBP_ALetter},
+	{0x10A19, 0x10A33, WBP_ALetter},
+	{0x10A38, 0x10A3A, WBP_Extend},
+	{0x10A3F, 0x10A3F, WBP_Extend},
+	{0x10A60, 0x10A7C, WBP_ALetter},
+	{0x10B00, 0x10B35, WBP_ALetter},
+	{0x10B40, 0x10B55, WBP_ALetter},
+	{0x10B60, 0x10B72, WBP_ALetter},
+	{0x10C00, 0x10C48, WBP_ALetter},
+	{0x11000, 0x11000, WBP_Extend},
+	{0x11001, 0x11001, WBP_Extend},
+	{0x11002, 0x11002, WBP_Extend},
+	{0x11003, 0x11037, WBP_ALetter},
+	{0x11038, 0x11046, WBP_Extend},
+	{0x11066, 0x1106F, WBP_Numeric},
+	{0x11080, 0x11081, WBP_Extend},
+	{0x11082, 0x11082, WBP_Extend},
+	{0x11083, 0x110AF, WBP_ALetter},
+	{0x110B0, 0x110B2, WBP_Extend},
+	{0x110B3, 0x110B6, WBP_Extend},
+	{0x110B7, 0x110B8, WBP_Extend},
+	{0x110B9, 0x110BA, WBP_Extend},
+	{0x110BD, 0x110BD, WBP_Format},
+	{0x12000, 0x1236E, WBP_ALetter},
+	{0x12400, 0x12462, WBP_ALetter},
+	{0x13000, 0x1342E, WBP_ALetter},
+	{0x16800, 0x16A38, WBP_ALetter},
+	{0x1B000, 0x1B000, WBP_Katakana},
+	{0x1D165, 0x1D166, WBP_Extend},
+	{0x1D167, 0x1D169, WBP_Extend},
+	{0x1D16D, 0x1D172, WBP_Extend},
+	{0x1D173, 0x1D17A, WBP_Format},
+	{0x1D17B, 0x1D182, WBP_Extend},
+	{0x1D185, 0x1D18B, WBP_Extend},
+	{0x1D1AA, 0x1D1AD, WBP_Extend},
+	{0x1D242, 0x1D244, WBP_Extend},
+	{0x1D400, 0x1D454, WBP_ALetter},
+	{0x1D456, 0x1D49C, WBP_ALetter},
+	{0x1D49E, 0x1D49F, WBP_ALetter},
+	{0x1D4A2, 0x1D4A2, WBP_ALetter},
+	{0x1D4A5, 0x1D4A6, WBP_ALetter},
+	{0x1D4A9, 0x1D4AC, WBP_ALetter},
+	{0x1D4AE, 0x1D4B9, WBP_ALetter},
+	{0x1D4BB, 0x1D4BB, WBP_ALetter},
+	{0x1D4BD, 0x1D4C3, WBP_ALetter},
+	{0x1D4C5, 0x1D505, WBP_ALetter},
+	{0x1D507, 0x1D50A, WBP_ALetter},
+	{0x1D50D, 0x1D514, WBP_ALetter},
+	{0x1D516, 0x1D51C, WBP_ALetter},
+	{0x1D51E, 0x1D539, WBP_ALetter},
+	{0x1D53B, 0x1D53E, WBP_ALetter},
+	{0x1D540, 0x1D544, WBP_ALetter},
+	{0x1D546, 0x1D546, WBP_ALetter},
+	{0x1D54A, 0x1D550, WBP_ALetter},
+	{0x1D552, 0x1D6A5, WBP_ALetter},
+	{0x1D6A8, 0x1D6C0, WBP_ALetter},
+	{0x1D6C2, 0x1D6DA, WBP_ALetter},
+	{0x1D6DC, 0x1D6FA, WBP_ALetter},
+	{0x1D6FC, 0x1D714, WBP_ALetter},
+	{0x1D716, 0x1D734, WBP_ALetter},
+	{0x1D736, 0x1D74E, WBP_ALetter},
+	{0x1D750, 0x1D76E, WBP_ALetter},
+	{0x1D770, 0x1D788, WBP_ALetter},
+	{0x1D78A, 0x1D7A8, WBP_ALetter},
+	{0x1D7AA, 0x1D7C2, WBP_ALetter},
+	{0x1D7C4, 0x1D7CB, WBP_ALetter},
+	{0x1D7CE, 0x1D7FF, WBP_Numeric},
+	{0xE0001, 0xE0001, WBP_Format},
+	{0xE0020, 0xE007F, WBP_Format},
+	{0xE0100, 0xE01EF, WBP_Extend},
+	{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}
+};
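
Review note: the generated table above ends with a `{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}` sentinel, and `wordbreakdef.h` requires the entries to be sorted. The following is a minimal sketch of how that invariant could be checked after regenerating the data. It is not part of the imported sources: `wb_table_length` and `wb_table_is_sorted` are hypothetical helper names, the `utf32_t` typedef is an assumption about what `linebreak.h` declares, and the enum is a trimmed copy used only for the sample rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: utf32_t is an unsigned 32-bit type (declared in linebreak.h). */
typedef uint32_t utf32_t;

/* Trimmed copy of the enum from wordbreakdef.h, enough for the samples. */
enum WordBreakClass { WBP_Undefined, WBP_Extend, WBP_ALetter, WBP_Numeric };

struct WordBreakProperties {
	utf32_t start;				/* first code point of the range */
	utf32_t end;				/* last code point of the range */
	enum WordBreakClass prop;	/* class shared by the whole range */
};

/* A few rows copied from the table above, plus the sentinel. */
static const struct WordBreakProperties sample[] = {
	{0x0610, 0x061A, WBP_Extend},
	{0x0620, 0x063F, WBP_ALetter},
	{0x0660, 0x0669, WBP_Numeric},
	{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}
};

/* Deliberately broken rows: the second range overlaps the first. */
static const struct WordBreakProperties unsorted_sample[] = {
	{0x0620, 0x063F, WBP_ALetter},
	{0x0630, 0x064A, WBP_ALetter},
	{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}
};

/* Hypothetical helper: number of entries before the sentinel. */
static size_t wb_table_length(const struct WordBreakProperties *table)
{
	size_t n = 0;

	assert(table != NULL);
	while (table[n].prop != WBP_Undefined)
		++n;
	return n;
}

/* Hypothetical helper: returns 1 when ranges are ascending and disjoint. */
static int wb_table_is_sorted(const struct WordBreakProperties *table)
{
	size_t i;
	size_t n = wb_table_length(table);

	for (i = 0; i < n; ++i) {
		if (table[i].start > table[i].end)
			return 0;	/* malformed range */
		if (i > 0 && table[i - 1].end >= table[i].start)
			return 0;	/* out of order or overlapping */
	}
	return 1;
}
```

A check like this could run as a build-time test against `wb_prop_default` whenever the data file is regenerated from the Unicode property list.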
--- a/linebreak/linebreak/wordbreakdata1.tmpl
+++ b/linebreak/linebreak/wordbreakdata1.tmpl
@ -0,0 +1,5 @@
+
+#include "linebreak.h"
+#include "wordbreakdef.h"
+
+static struct WordBreakProperties wb_prop_default[] = {
--- a/linebreak/linebreak/wordbreakdata2.tmpl
+++ b/linebreak/linebreak/wordbreakdata2.tmpl
@ -0,0 +1,2 @@
+	{0xFFFFFFFF, 0xFFFFFFFF, WBP_Undefined}
+};
--- a/linebreak/linebreak/wordbreakdef.h
+++ b/linebreak/linebreak/wordbreakdef.h
@ -0,0 +1,78 @@
+/* vim: set tabstop=4 shiftwidth=4: */
+
+/*
+ * Word breaking in a Unicode sequence.  Designed to be used in a
+ * generic text renderer.
+ *
+ * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
+ *
+ * This software is provided 'as-is', without any express or implied
+ * warranty.  In no event will the author be held liable for any damages
+ * arising from the use of this software.
+ *
+ * Permission is granted to anyone to use this software for any purpose,
+ * including commercial applications, and to alter it and redistribute
+ * it freely, subject to the following restrictions:
+ *
+ * 1. The origin of this software must not be misrepresented; you must
+ *    not claim that you wrote the original software.  If you use this
+ *    software in a product, an acknowledgement in the product
+ *    documentation would be appreciated but is not required.
+ * 2. Altered source versions must be plainly marked as such, and must
+ *    not be misrepresented as being the original software.
+ * 3. This notice may not be removed or altered from any source
+ *    distribution.
+ *
+ * The main reference is Unicode Standard Annex 29 (UAX #29):
+ *		<URL:http://unicode.org/reports/tr29>
+ *
+ * When this library was designed, this annex was at Revision 17, for
+ * Unicode 6.0.0:
+ *		<URL:http://www.unicode.org/reports/tr29/tr29-17.html>
+ *
+ * The Unicode Terms of Use are available at
+ *		<URL:http://www.unicode.org/copyright.html>
+ */
+
+/**
+ * @file	wordbreakdef.h
+ *
+ * Definitions of internal data structures, declarations of global
+ * variables, and function prototypes for the word breaking algorithm.
+ *
+ * @version	2.1, 2012/01/18
+ * @author	Tom Hacohen
+ */
+
+/**
+ * Word break classes.  This is a direct mapping of Table 3 of Unicode
+ * Standard Annex 29, Revision 17.
+ */
+enum WordBreakClass
+{
+	WBP_Undefined,
+	WBP_CR,
+	WBP_LF,
+	WBP_Newline,
+	WBP_Extend,
+	WBP_Format,
+	WBP_Katakana,
+	WBP_ALetter,
+	WBP_MidNumLet,
+	WBP_MidLetter,
+	WBP_MidNum,
+	WBP_Numeric,
+	WBP_ExtendNumLet,
+	WBP_Any
+};
+
+/**
+ * Struct for entries of word break properties.  The array of
+ * entries \e must be sorted by code point in ascending order.
+ */
+struct WordBreakProperties
+{
+	utf32_t start;				/**< Starting code point */
+	utf32_t end;				/**< End code point */
+	enum WordBreakClass prop;	/**< The word breaking property */
+};
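
Review note: the sorted-array requirement on `struct WordBreakProperties` is what makes a binary search over the code point ranges possible. The sketch below illustrates that idea under stated assumptions and is not the lookup code from `wordbreak.c`, which is not shown in this part of the diff. `wb_lookup` is a hypothetical name, the `utf32_t` typedef is assumed, and returning `WBP_Any` for code points that fall in a gap is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: utf32_t is an unsigned 32-bit type (declared in linebreak.h). */
typedef uint32_t utf32_t;

/* Same enumerators, in the same order, as wordbreakdef.h. */
enum WordBreakClass {
	WBP_Undefined, WBP_CR, WBP_LF, WBP_Newline, WBP_Extend, WBP_Format,
	WBP_Katakana, WBP_ALetter, WBP_MidNumLet, WBP_MidLetter, WBP_MidNum,
	WBP_Numeric, WBP_ExtendNumLet, WBP_Any
};

struct WordBreakProperties {
	utf32_t start;
	utf32_t end;
	enum WordBreakClass prop;
};

/* Five consecutive rows copied from the generated table (no sentinel). */
static const struct WordBreakProperties sample[] = {
	{0x0660, 0x0669, WBP_Numeric},
	{0x066B, 0x066B, WBP_Numeric},
	{0x066C, 0x066C, WBP_MidNum},
	{0x066E, 0x066F, WBP_ALetter},
	{0x0670, 0x0670, WBP_Extend}
};

#define SAMPLE_COUNT (sizeof sample / sizeof sample[0])

/*
 * Hypothetical lookup: binary search over sorted, non-overlapping ranges.
 * Code points not covered by any range yield WBP_Any (sketch assumption).
 */
static enum WordBreakClass wb_lookup(const struct WordBreakProperties *table,
                                     size_t count, utf32_t ch)
{
	size_t lo = 0;
	size_t hi = count;	/* search window is [lo, hi) */

	assert(table != NULL || count == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ch < table[mid].start)
			hi = mid;
		else if (ch > table[mid].end)
			lo = mid + 1;
		else
			return table[mid].prop;
	}
	return WBP_Any;
}
```

Passing the entry count explicitly keeps the sentinel row out of the search window. A caller working with the full table would count the entries up to the sentinel once and reuse that length.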