| Index: third_party/sqlite/src/ext/icu/README.txt
|
| diff --git a/third_party/sqlite/src/ext/icu/README.txt b/third_party/sqlite/src/ext/icu/README.txt
|
| new file mode 100644
|
| index 0000000000000000000000000000000000000000..c5cadb57d1ff575b615a2c1e75dff2657e904b72
|
| --- /dev/null
|
| +++ b/third_party/sqlite/src/ext/icu/README.txt
|
| @@ -0,0 +1,169 @@
|
| +
|
| +This directory contains source code for the SQLite "ICU" extension, an
|
| +integration of the "International Components for Unicode" library with
|
| +SQLite. Documentation follows.
|
| +
|
| + 1. Features
|
| +
|
| + 1.1 SQL Scalars upper() and lower()
|
| + 1.2 Unicode Aware LIKE Operator
|
| + 1.3 ICU Collation Sequences
|
| + 1.4 SQL REGEXP Operator
|
| +
|
| + 2. Compilation and Usage
|
| +
|
| + 3. Bugs, Problems and Security Issues
|
| +
|
| + 3.1 The "case_sensitive_like" Pragma
|
| + 3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
|
| + 3.3 Collation Sequence Security Issue
|
| +
|
| +
|
| +1. FEATURES
|
| +
|
| + 1.1 SQL Scalars upper() and lower()
|
| +
|
| + SQLite's built-in implementations of these two functions only
|
| + provide case mapping for the 26 letters used in the English
|
| + language. The ICU based functions provided by this extension
|
| + provide case mapping, where defined, for the full range of
|
| + unicode characters.
|
| +
|
| + ICU provides two types of case mapping, "general" case mapping and
|
| + "language specific". Refer to ICU documentation for the differences
|
| + between the two. Specifically:
|
| +
|
| + http://www.icu-project.org/userguide/caseMappings.html
|
| + http://www.icu-project.org/userguide/posix.html#case_mappings
|
| +
|
| + To utilise "general" case mapping, the upper() or lower() scalar
|
| + functions are invoked with one argument:
|
| +
|
| + upper('ABC') -> 'abc'
|
| + lower('abc') -> 'ABC'
|
| +
|
| + To access ICU "language specific" case mapping, upper() or lower()
|
| + should be invoked with two arguments. The second argument is the name
|
| + of the locale to use. Passing an empty string ("") or SQL NULL value
|
| + as the second argument is the same as invoking the 1 argument version
|
| + of upper() or lower():
|
| +
|
| + lower('I', 'en_us') -> 'i'
|
| + lower('I', 'tr_tr') -> 'ı' (small dotless i)
|
| +
|
| + 1.2 Unicode Aware LIKE Operator
|
| +
|
| + Similarly to the upper() and lower() functions, the built-in SQLite LIKE
|
| + operator understands case equivalence for the 26 letters of the English
|
| + language alphabet. The implementation of LIKE included in this
|
| + extension uses the ICU function u_foldCase() to provide case
|
| + independent comparisons for the full range of unicode characters.
|
| +
|
| + The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the
|
| + dotless 'I' character used in the Turkish language is considered
|
| + to be in the same equivalence class as the dotted 'I' character
|
| + used by many languages (including English).
|
| +
|
| + 1.3 ICU Collation Sequences
|
| +
|
| + A special SQL scalar function, icu_load_collation() is provided that
|
| + may be used to register ICU collation sequences with SQLite. It
|
| + is always called with exactly two arguments, the ICU locale
|
| + identifying the collation sequence to ICU, and the name of the
|
| + SQLite collation sequence to create. For example, to create an
|
| + SQLite collation sequence named "turkish" using Turkish language
|
| + sorting rules, the SQL statement:
|
| +
|
| + SELECT icu_load_collation('tr_TR', 'turkish');
|
| +
|
| + Or, for Australian English:
|
| +
|
| + SELECT icu_load_collation('en_AU', 'australian');
|
| +
|
| + The identifiers "turkish" and "australian" may then be used
|
| + as collation sequence identifiers in SQL statements:
|
| +
|
| + CREATE TABLE aust_turkish_penpals(
|
| + australian_penpal_name TEXT COLLATE australian,
|
| + turkish_penpal_name TEXT COLLATE turkish
|
| + );
|
| +
|
| + 1.4 SQL REGEXP Operator
|
| +
|
| + This extension provides an implementation of the SQL binary
|
| + comparision operator "REGEXP", based on the regular expression functions
|
| + provided by the ICU library. The syntax of the operator is as described
|
| + in SQLite documentation:
|
| +
|
| + <string> REGEXP <re-pattern>
|
| +
|
| + This extension uses the ICU defaults for regular expression matching
|
| + behaviour. Specifically, this means that:
|
| +
|
| + * Matching is case-sensitive,
|
| + * Regular expression comments are not allowed within patterns, and
|
| + * The '^' and '$' characters match the beginning and end of the
|
| + <string> argument, not the beginning and end of lines within
|
| + the <string> argument.
|
| +
|
| + Even more specifically, the value passed to the "flags" parameter
|
| + of ICU C function uregex_open() is 0.
|
| +
|
| +
|
| +2 COMPILATION AND USAGE
|
| +
|
| + The easiest way to compile and use the ICU extension is to build
|
| + and use it as a dynamically loadable SQLite extension. To do this
|
| + using gcc on *nix:
|
| +
|
| + gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so
|
| +
|
| + You may need to add "-I" flags so that gcc can find sqlite3ext.h
|
| + and sqlite3.h. The resulting shared lib, libSqliteIcu.so, may be
|
| + loaded into sqlite in the same way as any other dynamically loadable
|
| + extension.
|
| +
|
| +
|
| +3 BUGS, PROBLEMS AND SECURITY ISSUES
|
| +
|
| + 3.1 The "case_sensitive_like" Pragma
|
| +
|
| + This extension does not work well with the "case_sensitive_like"
|
| + pragma. If this pragma is used before the ICU extension is loaded,
|
| + then the pragma has no effect. If the pragma is used after the ICU
|
| + extension is loaded, then SQLite ignores the ICU implementation and
|
| + always uses the built-in LIKE operator.
|
| +
|
| + The ICU extension LIKE operator is always case insensitive.
|
| +
|
| + 3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
|
| +
|
| + Passing very long patterns to the built-in SQLite LIKE operator can
|
| + cause excessive CPU usage. To curb this problem, SQLite defines the
|
| + SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
|
| + pattern in bytes (irrespective of encoding). The default value is
|
| + defined in internal header file "limits.h".
|
| +
|
| + The ICU extension LIKE implementation suffers from the same
|
| + problem and uses the same solution. However, since the ICU extension
|
| + code does not include the SQLite file "limits.h", modifying
|
| + the default value therein does not affect the ICU extension.
|
| + The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
|
| + the ICU extension LIKE operator is 50000, defined in source
|
| + file "icu.c".
|
| +
|
| + 3.3 Collation Sequence Security Issue
|
| +
|
| + Internally, SQLite assumes that indices stored in database files
|
| + are sorted according to the collation sequence indicated by the
|
| + SQL schema. Changing the definition of a collation sequence after
|
| + an index has been built is therefore equivalent to database
|
| + corruption. The SQLite library is not very well tested under
|
| + these conditions, and may contain potential buffer overruns
|
| + or other programming errors that could be exploited by a malicious
|
| + programmer.
|
| +
|
| + If the ICU extension is used in an environment where potentially
|
| + malicious users may execute arbitrary SQL (i.e. gears), they
|
| + should be prevented from invoking the icu_load_collation() function,
|
| + possibly using the authorisation callback.
|
|
|