third_party/sqlite/ext/icu/README.txt - Issue 3108030: Move bundled copy of sqlite one level deeper to better separate it...

Side by Side Diff: third_party/sqlite/ext/icu/README.txt

Issue 3108030: Move bundled copy of sqlite one level deeper to better separate it... (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src/

Patch Set: Created 10 years, 4 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch | Annotate | Revision Log

OLD	NEW
	(Empty)
1

2 This directory contains source code for the SQLite "ICU" extension, an

3 integration of the "International Components for Unicode" library with

4 SQLite. Documentation follows.

5

6 1. Features

7

8 1.1 SQL Scalars upper() and lower()

9 1.2 Unicode Aware LIKE Operator

10 1.3 ICU Collation Sequences

11 1.4 SQL REGEXP Operator

12

13 2. Compilation and Usage

14

15 3. Bugs, Problems and Security Issues

16

17 3.1 The "case_sensitive_like" Pragma

18 3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

19 3.3 Collation Sequence Security Issue

20

21

22 1. FEATURES

23

24 1.1 SQL Scalars upper() and lower()

25

26 SQLite's built-in implementations of these two functions only

27 provide case mapping for the 26 letters used in the English

28 language. The ICU based functions provided by this extension

29 provide case mapping, where defined, for the full range of

30 unicode characters.

31

32 ICU provides two types of case mapping, "general" case mapping and

33 "language specific". Refer to ICU documentation for the differences

34 between the two. Specifically:

35

36 http://www.icu-project.org/userguide/caseMappings.html

37 http://www.icu-project.org/userguide/posix.html#case_mappings

38

39 To utilise "general" case mapping, the upper() or lower() scalar

40 functions are invoked with one argument:

41

42 upper('ABC') -> 'abc'

43 lower('abc') -> 'ABC'

44

45 To access ICU "language specific" case mapping, upper() or lower()

46 should be invoked with two arguments. The second argument is the name

47 of the locale to use. Passing an empty string ("") or SQL NULL value

48 as the second argument is the same as invoking the 1 argument version

49 of upper() or lower():

50

51 lower('I', 'en_us') -> 'i'

52 lower('I', 'tr_tr') -> 'ı' (small dotless i)

53

54 1.2 Unicode Aware LIKE Operator

55

56 Similarly to the upper() and lower() functions, the built-in SQLite LIKE

57 operator understands case equivalence for the 26 letters of the English

58 language alphabet. The implementation of LIKE included in this

59 extension uses the ICU function u_foldCase() to provide case

60 independent comparisons for the full range of unicode characters.

61

62 The U_FOLD_CASE_DEFAULT flag is passed to u_foldCase(), meaning the

63 dotless 'I' character used in the Turkish language is considered

64 to be in the same equivalence class as the dotted 'I' character

65 used by many languages (including English).

66

67 1.3 ICU Collation Sequences

68

69 A special SQL scalar function, icu_load_collation() is provided that

70 may be used to register ICU collation sequences with SQLite. It

71 is always called with exactly two arguments, the ICU locale

72 identifying the collation sequence to ICU, and the name of the

73 SQLite collation sequence to create. For example, to create an

74 SQLite collation sequence named "turkish" using Turkish language

75 sorting rules, the SQL statement:

76

77 SELECT icu_load_collation('tr_TR', 'turkish');

78

79 Or, for Australian English:

80

81 SELECT icu_load_collation('en_AU', 'australian');

82

83 The identifiers "turkish" and "australian" may then be used

84 as collation sequence identifiers in SQL statements:

85

86 CREATE TABLE aust_turkish_penpals(

87 australian_penpal_name TEXT COLLATE australian,

88 turkish_penpal_name TEXT COLLATE turkish

89 );

90

91 1.4 SQL REGEXP Operator

92

93 This extension provides an implementation of the SQL binary

94 comparision operator "REGEXP", based on the regular expression functions

95 provided by the ICU library. The syntax of the operator is as described

96 in SQLite documentation:

97

98 <string> REGEXP <re-pattern>

99

100 This extension uses the ICU defaults for regular expression matching

101 behaviour. Specifically, this means that:

102

103 * Matching is case-sensitive,

104 * Regular expression comments are not allowed within patterns, and

105 * The '^' and '$' characters match the beginning and end of the

106 <string> argument, not the beginning and end of lines within

107 the <string> argument.

108

109 Even more specifically, the value passed to the "flags" parameter

110 of ICU C function uregex_open() is 0.

111

112

113 2 COMPILATION AND USAGE

114

115 The easiest way to compile and use the ICU extension is to build

116 and use it as a dynamically loadable SQLite extension. To do this

117 using gcc on *nix:

118

119 gcc -shared icu.c `icu-config --ldflags` -o libSqliteIcu.so

120

121 You may need to add "-I" flags so that gcc can find sqlite3ext.h

122 and sqlite3.h. The resulting shared lib, libSqliteIcu.so, may be

123 loaded into sqlite in the same way as any other dynamically loadable

124 extension.

125

126

127 3 BUGS, PROBLEMS AND SECURITY ISSUES

128

129 3.1 The "case_sensitive_like" Pragma

130

131 This extension does not work well with the "case_sensitive_like"

132 pragma. If this pragma is used before the ICU extension is loaded,

133 then the pragma has no effect. If the pragma is used after the ICU

134 extension is loaded, then SQLite ignores the ICU implementation and

135 always uses the built-in LIKE operator.

136

137 The ICU extension LIKE operator is always case insensitive.

138

139 3.2 The SQLITE_MAX_LIKE_PATTERN_LENGTH Macro

140

141 Passing very long patterns to the built-in SQLite LIKE operator can

142 cause a stack overflow. To curb this problem, SQLite defines the

143 SQLITE_MAX_LIKE_PATTERN_LENGTH macro as the maximum length of a

144 pattern in bytes (irrespective of encoding). The default value is

145 defined in internal header file "limits.h".

146

147 The ICU extension LIKE implementation suffers from the same

148 problem and uses the same solution. However, since the ICU extension

149 code does not include the SQLite file "limits.h", modifying

150 the default value therein does not affect the ICU extension.

151 The default value of SQLITE_MAX_LIKE_PATTERN_LENGTH used by

152 the ICU extension LIKE operator is 50000, defined in source

153 file "icu.c".

154

155 3.3 Collation Sequence Security Issue

156

157 Internally, SQLite assumes that indices stored in database files

158 are sorted according to the collation sequence indicated by the

159 SQL schema. Changing the definition of a collation sequence after

160 an index has been built is therefore equivalent to database

161 corruption. The SQLite library is not very well tested under

162 these conditions, and may contain potential buffer overruns

163 or other programming errors that could be exploited by a malicious

164 programmer.

165

166 If the ICU extension is used in an environment where potentially

167 malicious users may execute arbitrary SQL (i.e. gears), they

168 should be prevented from invoking the icu_load_collation() function,

169 possibly using the authorisation callback.

170

OLD	NEW

« no previous file with comments | « third_party/sqlite/ext/fts3/mkfts3amal.tcl ('k') | third_party/sqlite/ext/icu/icu.c » ('j') | no next file with comments »