OLD | NEW |
| (Empty) |
1 | |
2 FTS4 CONTENT OPTION | |
3 | |
4 Normally, in order to create a full-text index on a dataset, the FTS4 | |
5 module stores a copy of all indexed documents in a specially created | |
6 database table. | |
7 | |
8 As of SQLite version 3.7.9, FTS4 supports a new option - "content" - | |
9 designed to extend FTS4 to support the creation of full-text indexes where: | |
10 | |
11 * The indexed documents are not stored within the SQLite database | |
12 at all (a "contentless" FTS4 table), or | |
13 | |
14 * The indexed documents are stored in a database table created and | |
15 managed by the user (an "external content" FTS4 table). | |
16 | |
17 Because the indexed documents themselves are usually much larger than | |
18 the full-text index, the content option can sometimes be used to achieve | |
19 significant space savings. | |
20 | |
21 CONTENTLESS FTS4 TABLES | |
22 | |
23 In order to create an FTS4 table that does not store a copy of the indexed | |
24 documents at all, the content option should be set to an empty string. | |
25 For example, the following SQL creates such an FTS4 table with three | |
26 columns - "a", "b", and "c": | |
27 | |
28 CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c); | |
29 | |
30 Data can be inserted into such an FTS4 table using an INSERT statements. | |
31 However, unlike ordinary FTS4 tables, the user must supply an explicit | |
32 integer docid value. For example: | |
33 | |
34 -- This statement is Ok: | |
35 INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i'); | |
36 | |
37 -- This statement causes an error, as no docid value has been provided: | |
38 INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r'); | |
39 | |
40 It is not possible to UPDATE or DELETE a row stored in a contentless FTS4 | |
41 table. Attempting to do so is an error. | |
42 | |
43 Contentless FTS4 tables also support SELECT statements. However, it is | |
44 an error to attempt to retrieve the value of any table column other than | |
45 the docid column. The auxiliary function matchinfo() may be used, but | |
46 snippet() and offsets() may not. For example: | |
47 | |
48 -- The following statements are Ok: | |
49 SELECT docid FROM t1 WHERE t1 MATCH 'xxx'; | |
50 SELECT docid FROM t1 WHERE a MATCH 'xxx'; | |
51 SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx'; | |
52 | |
53 -- The following statements all cause errors, as the value of columns | |
54 -- other than docid are required to evaluate them. | |
55 SELECT * FROM t1; | |
56 SELECT a, b FROM t1 WHERE t1 MATCH 'xxx'; | |
57 SELECT docid FROM t1 WHERE a LIKE 'xxx%'; | |
58 SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx'; | |
59 | |
60 Errors related to attempting to retrieve column values other than docid | |
61 are runtime errors that occur within sqlite3_step(). In some cases, for | |
62 example if the MATCH expression in a SELECT query matches zero rows, there | |
63 may be no error at all even if a statement does refer to column values | |
64 other than docid. | |
65 | |
66 EXTERNAL CONTENT FTS4 TABLES | |
67 | |
68 An "external content" FTS4 table is similar to a contentless table, except | |
69 that if evaluation of a query requires the value of a column other than | |
70 docid, FTS4 attempts to retrieve that value from a table (or view, or | |
71 virtual table) nominated by the user (hereafter referred to as the "content | |
72 table"). The FTS4 module never writes to the content table, and writing | |
73 to the content table does not affect the full-text index. It is the | |
74 responsibility of the user to ensure that the content table and the | |
75 full-text index are consistent. | |
76 | |
77 An external content FTS4 table is created by setting the content option | |
78 to the name of a table (or view, or virtual table) that may be queried by | |
79 FTS4 to retrieve column values when required. If the nominated table does | |
80 not exist, then an external content table behaves in the same way as | |
81 a contentless table. For example: | |
82 | |
83 CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c); | |
84 CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c); | |
85 | |
86 Assuming the nominated table does exist, then its columns must be the same | |
87 as or a superset of those defined for the FTS table. | |
88 | |
89 When a users query on the FTS table requires a column value other than | |
90 docid, FTS attempts to read this value from the corresponding column of | |
91 the row in the content table with a rowid value equal to the current FTS | |
92 docid. Or, if such a row cannot be found in the content table, a NULL | |
93 value is used instead. For example: | |
94 | |
95 CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d); | |
96 CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c); | |
97 | |
98 INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f'); | |
99 INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l'); | |
100 INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2; | |
101 | |
102 -- The following query returns a single row with two columns containing | |
103 -- the text values "i j" and "k l". | |
104 -- | |
105 -- The query uses the full-text index to discover that the MATCH | |
106 -- term matches the row with docid=3. It then retrieves the values | |
107 -- of columns b and c from the row with rowid=3 in the content table | |
108 -- to return. | |
109 -- | |
110 SELECT * FROM t3 WHERE t3 MATCH 'k'; | |
111 | |
112 -- Following the UPDATE, the query still returns a single row, this | |
113 -- time containing the text values "xxx" and "yyy". This is because the | |
114 -- full-text index still indicates that the row with docid=3 matches | |
115 -- the FTS4 query 'k', even though the documents stored in the content | |
116 -- table have been modified. | |
117 -- | |
118 UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3; | |
119 SELECT * FROM t3 WHERE t3 MATCH 'k'; | |
120 | |
121 -- Following the DELETE below, the query returns one row containing two | |
122 -- NULL values. NULL values are returned because FTS is unable to find | |
123 -- a row with rowid=3 within the content table. | |
124 -- | |
125 DELETE FROM t2; | |
126 SELECT * FROM t3 WHERE t3 MATCH 'k'; | |
127 | |
128 When a row is deleted from an external content FTS4 table, FTS4 needs to | |
129 retrieve the column values of the row being deleted from the content table. | |
130 This is so that FTS4 can update the full-text index entries for each token | |
131 that occurs within the deleted row to indicate that that row has been | |
132 deleted. If the content table row cannot be found, or if it contains values | |
133 inconsistent with the contents of the FTS index, the results can be difficult | |
134 to predict. The FTS index may be left containing entries corresponding to the | |
135 deleted row, which can lead to seemingly nonsensical results being returned | |
136 by subsequent SELECT queries. The same applies when a row is updated, as | |
137 internally an UPDATE is the same as a DELETE followed by an INSERT. | |
138 | |
139 Instead of writing separately to the full-text index and the content table, | |
140 some users may wish to use database triggers to keep the full-text index | |
141 up to date with respect to the set of documents stored in the content table. | |
142 For example, using the tables from earlier examples: | |
143 | |
144 CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN | |
145 DELETE FROM t3 WHERE docid=old.rowid; | |
146 END; | |
147 CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN | |
148 DELETE FROM t3 WHERE docid=old.rowid; | |
149 END; | |
150 | |
151 CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN | |
152 INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c); | |
153 END; | |
154 CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN | |
155 INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c); | |
156 END; | |
157 | |
158 The DELETE trigger must be fired before the actual delete takes place | |
159 on the content table. This is so that FTS4 can still retrieve the original | |
160 values in order to update the full-text index. And the INSERT trigger must | |
161 be fired after the new row is inserted, so as to handle the case where the | |
162 rowid is assigned automatically within the system. The UPDATE trigger must | |
163 be split into two parts, one fired before and one after the update of the | |
164 content table, for the same reasons. | |
165 | |
166 FTS4 features a special command similar to the 'optimize' command that | |
167 deletes the entire full-text index and rebuilds it based on the current | |
168 set of documents in the content table. Assuming again that "t3" is the | |
169 name of the external content FTS4 table, the command is: | |
170 | |
171 INSERT INTO t3(t3) VALUES('rebuild'); | |
172 | |
173 This command may also be used with ordinary FTS4 tables, although it may | |
174 only be useful if the full-text index has somehow become corrupt. It is an | |
175 error to attempt to rebuild the full-text index maintained by a contentless | |
176 FTS4 table. | |
177 | |
178 | |
OLD | NEW |