third_party/sqlite/sqlite-src-3070603/ext/icu/README.txt - Issue 949043002: Add //third_party/sqlite to dirs_to_snapshot, remove net_sql.patch

Unified Diff: third_party/sqlite/sqlite-src-3070603/ext/icu/README.txt

Issue 949043002: Add //third_party/sqlite to dirs_to_snapshot, remove net_sql.patch (Closed) Base URL: git@github.com:domokit/mojo.git@master

Patch Set: Created 5 years, 10 months ago

Index: third_party/sqlite/sqlite-src-3070603/ext/icu/README.txt

diff --git a/third_party/sqlite/sqlite-src-3070603/ext/icu/README.txt b/third_party/sqlite/sqlite-src-3070603/ext/icu/README.txt

new file mode 100644

index 0000000000000000000000000000000000000000..c5cadb57d1ff575b615a2c1e75dff2657e904b72

--- /dev/null

+++ b/third_party/sqlite/sqlite-src-3070603/ext/icu/README.txt

@@ -0,0 +1,169 @@

+This directory contains source code for the SQLite "ICU" extension, an
+integration of the "International Components for Unicode" library with
+SQLite. Documentation follows.

+ 1. Features
+     1.1 SQL Scalars upper() and lower()
+     1.2 Unicode Aware LIKE Operator
+     1.3 ICU Collation Sequences
+     1.4 SQL REGEXP Operator
+ 2. Compilation and Usage
+ 3. Bugs, Problems and Security Issues
+     3.1 The "case_sensitive_like" Pragma
+     3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro
+     3.3 Collation Sequence Security Issue

+1. FEATURES

+ 1.1 SQL Scalars upper() and lower()

+ SQLite's built-in implementations of these two functions only
+ provide case mapping for the 26 letters used in the English
+ language. The ICU-based functions provided by this extension
+ provide case mapping, where defined, for the full range of
+ Unicode characters.

+ ICU provides two types of case mapping, "general" case mapping and
+ "language specific" case mapping. Refer to the ICU documentation for
+ the differences between the two, specifically:

+     http://www.icu-project.org/userguide/caseMappings.html
+     http://www.icu-project.org/userguide/posix.html#case_mappings

+ To utilise "general" case mapping, invoke the upper() or lower()
+ scalar function with one argument:

+     upper('abc') -> 'ABC'
+     lower('ABC') -> 'abc'

+ To access ICU "language specific" case mapping, invoke upper() or
+ lower() with two arguments. The second argument is the name of the
+ locale to use. Passing an empty string ('') or an SQL NULL value as
+ the second argument is the same as invoking the one-argument version
+ of upper() or lower():

+     lower('I', 'en_us') -> 'i'
+     lower('I', 'tr_tr') -> 'ı'   (small dotless i)
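A minimal sketch of the one- and two-argument behaviour described above, using Python's standard-library sqlite3 module rather than ICU. Python's str.upper() and str.lower() stand in for ICU's general case mapping. The Turkish handling is a hypothetical hand-written special case for the dotted and dotless i, not ICU's language-specific tables. Like the extension, the sketch replaces the built-in upper() and lower() by registering functions with the same names.

```python
import sqlite3

# Hypothetical approximation: only the dotted/dotless i rules of Turkish and
# Azerbaijani are special-cased; real ICU locale data covers far more.
TURKIC_LANGUAGES = {"tr", "az"}


def _is_turkic(locale):
    # '' and None mean "no locale", i.e. general case mapping.
    return bool(locale) and locale.split("_")[0].lower() in TURKIC_LANGUAGES


def unicode_upper(text, locale=None):
    if text is None:
        return None
    if _is_turkic(locale):
        text = text.replace("i", "İ").replace("ı", "I")
    return text.upper()


def unicode_lower(text, locale=None):
    if text is None:
        return None
    if _is_turkic(locale):
        text = text.replace("I", "ı").replace("İ", "i")
    return text.lower()


def connect_with_unicode_case(path=":memory:"):
    """Open a connection whose upper()/lower() accept an optional locale."""
    conn = sqlite3.connect(path)
    for name, func in (("upper", unicode_upper), ("lower", unicode_lower)):
        conn.create_function(name, 1, func)  # general case mapping
        conn.create_function(name, 2, func)  # locale-specific case mapping
    return conn
```

With this connection, `SELECT lower('I', 'tr_tr')` and `SELECT lower('I', 'en_us')` reproduce the two examples above. Passing '' or NULL as the locale falls back to the one-argument behaviour.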

+ 1.2 Unicode Aware LIKE Operator

+ Like the built-in upper() and lower() functions, the built-in SQLite
+ LIKE operator only understands case equivalence for the 26 letters of
+ the English alphabet. The LIKE implementation included in this
+ extension uses the ICU function u_foldCase() to provide
+ case-independent comparisons for the full range of Unicode characters.

+ The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), so the
+ Turkish-specific mappings are not applied: uppercase 'I' is folded to
+ the dotted lowercase 'i', as in English and many other languages,
+ rather than to the Turkish dotless 'ı'.
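A rough Python sketch of a Unicode-aware like(), registered through the standard-library sqlite3 module. It uses Python's str.casefold() (full Unicode case folding) in place of ICU's per-character u_foldCase(). Results can therefore differ for characters such as 'ß' that fold to more than one character. The helper names are hypothetical. SQLite evaluates "X LIKE Y [ESCAPE Z]" by calling like(Y, X[, Z]), so the pattern arrives as the first argument.

```python
import re
import sqlite3


def like_pattern_to_regex(pattern, escape=None):
    """Translate a LIKE pattern into a regex that matches casefolded text."""
    parts = []
    chars = iter(pattern)
    for ch in chars:
        if escape is not None and ch == escape:
            literal = next(chars, None)  # the escaped character is taken literally
            if literal is None:
                raise ValueError("pattern ends with the ESCAPE character")
            parts.append(re.escape(literal.casefold()))
        elif ch == "%":
            parts.append(".*")  # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch.casefold()))
    return re.compile("".join(parts), re.DOTALL)


def unicode_like(pattern, value, escape=None):
    """like(Y, X[, Z]) implements 'X LIKE Y [ESCAPE Z]'; a NULL operand gives NULL."""
    if pattern is None or value is None:
        return None
    # Numbers are compared via their text form, roughly as SQLite does.
    pattern, value = str(pattern), str(value)
    regex = like_pattern_to_regex(pattern, escape)
    return int(regex.fullmatch(value.casefold()) is not None)


def connect_with_unicode_like(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.create_function("like", 2, unicode_like)  # X LIKE Y
    conn.create_function("like", 3, unicode_like)  # X LIKE Y ESCAPE Z
    return conn
```

Because the whole pattern is translated into one regular expression, '%' needs no backtracking logic of its own. A production version would also cache compiled patterns.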

+ 1.3 ICU Collation Sequences

+ A special SQL scalar function, icu_load_collation(), is provided that
+ may be used to register ICU collation sequences with SQLite. It is
+ always called with exactly two arguments: the ICU locale identifying
+ the collation sequence to ICU, and the name of the SQLite collation
+ sequence to create. For example, to create an SQLite collation
+ sequence named "turkish" using Turkish language sorting rules,
+ execute the SQL statement:

+     SELECT icu_load_collation('tr_TR', 'turkish');

+ Or, for Australian English:

+     SELECT icu_load_collation('en_AU', 'australian');

+ The identifiers "turkish" and "australian" may then be used
+ as collation sequence identifiers in SQL statements:

+     CREATE TABLE aust_turkish_penpals(
+       australian_penpal_name TEXT COLLATE australian,
+       turkish_penpal_name TEXT COLLATE turkish
+     );
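Python's sqlite3 module wraps SQLite's collation-registration C API as Connection.create_collation(). That makes it easy to see what registering a "turkish" collation means in practice. The comparison function below is a deliberately simplified, hypothetical stand-in for ICU's tr_TR rules. It only orders by the 29-letter Turkish alphabet, whereas ICU's collator also handles accents, punctuation and multiple comparison strengths.

```python
import sqlite3

TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
_RANK = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}


def _turkish_lower(text):
    # Turkish pairs I/ı and İ/i, unlike the default Unicode lowercasing.
    return text.replace("I", "ı").replace("İ", "i").lower()


def turkish_key(text):
    # Characters outside the alphabet sort after it, by code point; the original
    # string breaks ties so that e.g. 'A' and 'a' do not compare equal.
    primary = tuple(_RANK.get(ch, len(_RANK) + ord(ch)) for ch in _turkish_lower(text))
    return (primary, text)


def turkish_compare(a, b):
    """Collation callback: negative, zero or positive, like strcmp()."""
    ka, kb = turkish_key(a), turkish_key(b)
    return (ka > kb) - (ka < kb)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.create_collation("turkish", turkish_compare)
    conn.execute("CREATE TABLE turkish_penpals(name TEXT COLLATE turkish)")
    conn.executemany("INSERT INTO turkish_penpals VALUES (?)",
                     [(n,) for n in ("ilk", "ıspanak", "dal", "çay", "cam")])
    turkish = [r[0] for r in conn.execute(
        "SELECT name FROM turkish_penpals ORDER BY name")]
    binary = [r[0] for r in conn.execute(
        "SELECT name FROM turkish_penpals ORDER BY name COLLATE BINARY")]
    return turkish, binary
```

With the ICU extension loaded, `SELECT icu_load_collation('tr_TR', 'turkish')` would take the place of the create_collation() call. The table definition and queries stay the same.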

+ 1.4 SQL REGEXP Operator

+ This extension provides an implementation of the SQL binary
+ comparison operator "REGEXP", based on the regular expression functions
+ provided by the ICU library. The syntax of the operator is as described
+ in the SQLite documentation:

+     <string> REGEXP <re-pattern>

+ This extension uses the ICU defaults for regular expression matching
+ behaviour. Specifically, this means that:

+     * Matching is case-sensitive,
+     * Regular expression comments are not allowed within patterns, and
+     * The '^' and '$' characters match the beginning and end of the
+       <string> argument, not the beginning and end of lines within
+       the <string> argument.

+ Even more specifically, the value passed to the "flags" parameter
+ of the ICU C function uregex_open() is 0.
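SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X), so any application-defined regexp() function supplies the operator. The sketch below uses Python's re module in place of ICU's regular expressions; the two syntaxes overlap but are not identical. Like the extension, it passes no flags, giving the defaults listed above. It assumes the pattern must match the entire <string>, which is what the extension appears to do by calling ICU's uregex_matches(). Substitute re.search() if substring matching is wanted.

```python
import re
import sqlite3
from functools import lru_cache


@lru_cache(maxsize=128)
def _compile(pattern):
    # flags=0: case-sensitive, no verbose/comment syntax, and without
    # re.MULTILINE the '^' and '$' anchors are not line anchors.
    return re.compile(pattern)


def regexp(pattern, value):
    """regexp(Y, X) implements 'X REGEXP Y'; a NULL operand gives NULL."""
    if pattern is None or value is None:
        return None
    return int(_compile(pattern).fullmatch(value) is not None)


def connect_with_regexp(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.create_function("regexp", 2, regexp)
    return conn
```

Caching compiled patterns plays a similar role to the extension keeping its compiled ICU expression between rows of the same statement.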

+2. COMPILATION AND USAGE

+ The easiest way to compile and use the ICU extension is to build
+ and use it as a dynamically loadable SQLite extension. To do this
+ using gcc on *nix:

+     gcc -fPIC -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so

+ You may need to add "-I" flags so that gcc can find sqlite3ext.h
+ and sqlite3.h. On systems whose ICU installation does not provide
+ icu-config, `pkg-config --cflags --libs icu-uc icu-i18n` can usually
+ supply the equivalent flags. The resulting shared library,
+ libSqliteIcu.so, may be loaded into SQLite in the same way as any
+ other dynamically loadable extension.
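Once libSqliteIcu.so has been built, loading it from Python's standard-library sqlite3 module looks roughly like this. The library path is an assumption. Some Python builds are compiled without extension loading, in which case enable_load_extension() is missing. The sqlite3 command-line shell offers the same facility through its .load command.

```python
import sqlite3


def load_icu_extension(conn, path="./libSqliteIcu.so"):
    """Try to load the ICU extension into conn; return True on success."""
    if not hasattr(conn, "enable_load_extension"):
        return False  # this Python's sqlite3 module was built without extension loading
    conn.enable_load_extension(True)
    try:
        conn.load_extension(path)
        return True
    except sqlite3.OperationalError:
        return False  # e.g. file missing, wrong architecture, or no usable entry point
    finally:
        # Re-disable so that SQL statements cannot call load_extension() later.
        conn.enable_load_extension(False)
```

A typical caller opens a connection, calls `load_icu_extension(conn)`, and only then issues statements such as `SELECT icu_load_collation('tr_TR', 'turkish')`.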

+3. BUGS, PROBLEMS AND SECURITY ISSUES

+ 3.1 The "case_sensitive_like" Pragma

+ This extension does not work well with the "case_sensitive_like"
+ pragma. If this pragma is used before the ICU extension is loaded,
+ then the pragma has no effect. If the pragma is used after the ICU
+ extension is loaded, then SQLite ignores the ICU implementation and
+ always uses the built-in LIKE operator.

+ The ICU extension LIKE operator is always case insensitive.
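The override described above can be observed from Python's sqlite3 module. SQLite's documentation notes that this pragma overloads LIKE through sqlite3_create_function(), which can replace a like() the application registered earlier. The toy like() below does whole-string casefolded equality with no wildcards. It is a hypothetical stand-in for the ICU version, used only to make the switch visible.

```python
import sqlite3


def toy_casefold_like(pattern, value):
    """Stand-in for a Unicode-aware like(): casefolded equality, no wildcards."""
    if pattern is None or value is None:
        return None
    return int(pattern.casefold() == value.casefold())


def scalar(conn, sql):
    # fetchall() runs the statement to completion so it is not left active.
    return conn.execute(sql).fetchall()[0][0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.create_function("like", 2, toy_casefold_like)
    before = scalar(conn, "SELECT 'é' LIKE 'É'")  # the application's like()
    conn.execute("PRAGMA case_sensitive_like = 0").fetchall()
    after = scalar(conn, "SELECT 'é' LIKE 'É'")   # built-in like(): ASCII-only folding
    return before, after
```

Even setting the pragma to 0, the case-insensitive value, discards the application's function. This is why the pragma and the ICU LIKE operator cannot be combined.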

+ 3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

+ Passing very long patterns to the built-in SQLite LIKE operator can
+ cause excessive CPU usage. To curb this problem, SQLite defines the
+ SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a
+ pattern in bytes (irrespective of encoding). The default value is
+ defined in the internal header file "limits.h".

+ The ICU extension LIKE implementation suffers from the same
+ problem and uses the same solution. However, since the ICU extension
+ code does not include the SQLite file "limits.h", modifying
+ the default value therein does not affect the ICU extension.
+ The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by
+ the ICU extension LIKE operator is 50000, defined in source
+ file "icu.c".
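A minimal Python sketch of the same kind of guard, applied to an application-defined like(). The wrapper name and the choice to measure the pattern's UTF-8 byte length are assumptions made for illustration. The 50000 default mirrors the value quoted above for icu.c, and the toy matcher stands in for the real comparison.

```python
import sqlite3

MAX_LIKE_PATTERN_LENGTH = 50000  # default quoted above for icu.c


def with_pattern_limit(like_impl, max_bytes=MAX_LIKE_PATTERN_LENGTH):
    """Wrap like(pattern, value, ...) so over-long patterns are rejected up front."""
    def guarded(pattern, value, *rest):
        if pattern is not None and len(str(pattern).encode("utf-8")) > max_bytes:
            # Raising inside a user-defined function makes the SQL statement fail.
            raise ValueError("LIKE pattern longer than %d bytes" % max_bytes)
        return like_impl(pattern, value, *rest)
    return guarded


def toy_like(pattern, value):
    """Stand-in matcher: casefolded equality, no wildcards."""
    if pattern is None or value is None:
        return None
    return int(pattern.casefold() == value.casefold())
```

Registered with `conn.create_function("like", 2, with_pattern_limit(toy_like))`, an over-long pattern makes the statement fail with an error instead of starting an expensive match.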

+ 3.3 Collation Sequence Security Issue

+ Internally, SQLite assumes that indices stored in database files
+ are sorted according to the collation sequence indicated by the
+ SQL schema. Changing the definition of a collation sequence after
+ an index has been built is therefore equivalent to database
+ corruption. The SQLite library is not very well tested under
+ these conditions, and may contain buffer overruns or other
+ programming errors that could be exploited by a malicious
+ programmer.

+ If the ICU extension is used in an environment where potentially
+ malicious users may execute arbitrary SQL (e.g. Gears), they
+ should be prevented from invoking the icu_load_collation() function,
+ possibly using the authorisation callback (see sqlite3_set_authorizer()).
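A sketch of such an authorizer using Python's sqlite3 module, whose Connection.set_authorizer() wraps sqlite3_set_authorizer(). It assumes the SQLite library in use reports each function call to the authorizer with the SQLITE_FUNCTION action code, with the function name as the second argument, as current releases do. The test below registers a hypothetical stand-in icu_load_collation() so the sketch can run without ICU.

```python
import sqlite3

# SQLITE_FUNCTION is 31 in sqlite3.h; older Python versions may not export it.
SQLITE_FUNCTION = getattr(sqlite3, "SQLITE_FUNCTION", 31)
BLOCKED_FUNCTIONS = {"icu_load_collation"}


def deny_collation_loading(action, arg1, arg2, db_name, trigger_or_view):
    """Authorizer callback: refuse calls to functions in BLOCKED_FUNCTIONS."""
    if action == SQLITE_FUNCTION and arg2 and arg2.lower() in BLOCKED_FUNCTIONS:
        return sqlite3.SQLITE_DENY  # the statement fails to prepare: "not authorized"
    return sqlite3.SQLITE_OK        # everything else is allowed


def open_untrusted_connection(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.set_authorizer(deny_collation_loading)
    return conn
```

Denying the call at prepare time means the collation is never registered, so the index-corruption scenario described above cannot be triggered through this function. Other functions, such as upper(), continue to work.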