OLD | NEW |
(Empty) | |
| 1 |
| 2 FTS4 CONTENT OPTION |
| 3 |
| 4 Normally, in order to create a full-text index on a dataset, the FTS4 |
| 5 module stores a copy of all indexed documents in a specially created |
| 6 database table. |
| 7 |
| 8 As of SQLite version 3.7.9, FTS4 supports a new option - "content" - |
| 9 designed to extend FTS4 to support the creation of full-text indexes where: |
| 10 |
| 11 * The indexed documents are not stored within the SQLite database |
| 12 at all (a "contentless" FTS4 table), or |
| 13 |
| 14 * The indexed documents are stored in a database table created and |
| 15 managed by the user (an "external content" FTS4 table). |
| 16 |
| 17 Because the indexed documents themselves are usually much larger than |
| 18 the full-text index, the content option can sometimes be used to achieve |
| 19 significant space savings. |
| 20 |
| 21 CONTENTLESS FTS4 TABLES |
| 22 |
| 23 In order to create an FTS4 table that does not store a copy of the indexed |
| 24 documents at all, the content option should be set to an empty string. |
| 25 For example, the following SQL creates such an FTS4 table with three |
| 26 columns - "a", "b", and "c": |
| 27 |
| 28 CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c); |
| 29 |
| 30 Data can be inserted into such an FTS4 table using an INSERT statements. |
| 31 However, unlike ordinary FTS4 tables, the user must supply an explicit |
| 32 integer docid value. For example: |
| 33 |
| 34 -- This statement is Ok: |
| 35 INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i'); |
| 36 |
| 37 -- This statement causes an error, as no docid value has been provided: |
| 38 INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r'); |
| 39 |
| 40 It is not possible to UPDATE or DELETE a row stored in a contentless FTS4 |
| 41 table. Attempting to do so is an error. |
| 42 |
| 43 Contentless FTS4 tables also support SELECT statements. However, it is |
| 44 an error to attempt to retrieve the value of any table column other than |
| 45 the docid column. The auxiliary function matchinfo() may be used, but |
| 46 snippet() and offsets() may not. For example: |
| 47 |
| 48 -- The following statements are Ok: |
| 49 SELECT docid FROM t1 WHERE t1 MATCH 'xxx'; |
| 50 SELECT docid FROM t1 WHERE a MATCH 'xxx'; |
| 51 SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx'; |
| 52 |
| 53 -- The following statements all cause errors, as the value of columns |
| 54 -- other than docid are required to evaluate them. |
| 55 SELECT * FROM t1; |
| 56 SELECT a, b FROM t1 WHERE t1 MATCH 'xxx'; |
| 57 SELECT docid FROM t1 WHERE a LIKE 'xxx%'; |
| 58 SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx'; |
| 59 |
| 60 Errors related to attempting to retrieve column values other than docid |
| 61 are runtime errors that occur within sqlite3_step(). In some cases, for |
| 62 example if the MATCH expression in a SELECT query matches zero rows, there |
| 63 may be no error at all even if a statement does refer to column values |
| 64 other than docid. |
| 65 |
| 66 EXTERNAL CONTENT FTS4 TABLES |
| 67 |
| 68 An "external content" FTS4 table is similar to a contentless table, except |
| 69 that if evaluation of a query requires the value of a column other than |
| 70 docid, FTS4 attempts to retrieve that value from a table (or view, or |
| 71 virtual table) nominated by the user (hereafter referred to as the "content |
| 72 table"). The FTS4 module never writes to the content table, and writing |
| 73 to the content table does not affect the full-text index. It is the |
| 74 responsibility of the user to ensure that the content table and the |
| 75 full-text index are consistent. |
| 76 |
| 77 An external content FTS4 table is created by setting the content option |
| 78 to the name of a table (or view, or virtual table) that may be queried by |
| 79 FTS4 to retrieve column values when required. If the nominated table does |
| 80 not exist, then an external content table behaves in the same way as |
| 81 a contentless table. For example: |
| 82 |
| 83 CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c); |
| 84 CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c); |
| 85 |
| 86 Assuming the nominated table does exist, then its columns must be the same |
| 87 as or a superset of those defined for the FTS table. |
| 88 |
| 89 When a users query on the FTS table requires a column value other than |
| 90 docid, FTS attempts to read this value from the corresponding column of |
| 91 the row in the content table with a rowid value equal to the current FTS |
| 92 docid. Or, if such a row cannot be found in the content table, a NULL |
| 93 value is used instead. For example: |
| 94 |
| 95 CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d); |
| 96 CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c); |
| 97 |
| 98 INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f'); |
| 99 INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l'); |
| 100 INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2; |
| 101 |
| 102 -- The following query returns a single row with two columns containing |
| 103 -- the text values "i j" and "k l". |
| 104 -- |
| 105 -- The query uses the full-text index to discover that the MATCH |
| 106 -- term matches the row with docid=3. It then retrieves the values |
| 107 -- of columns b and c from the row with rowid=3 in the content table |
| 108 -- to return. |
| 109 -- |
| 110 SELECT * FROM t3 WHERE t3 MATCH 'k'; |
| 111 |
| 112 -- Following the UPDATE, the query still returns a single row, this |
| 113 -- time containing the text values "xxx" and "yyy". This is because the |
| 114 -- full-text index still indicates that the row with docid=3 matches |
| 115 -- the FTS4 query 'k', even though the documents stored in the content |
| 116 -- table have been modified. |
| 117 -- |
| 118 UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3; |
| 119 SELECT * FROM t3 WHERE t3 MATCH 'k'; |
| 120 |
| 121 -- Following the DELETE below, the query returns one row containing two |
| 122 -- NULL values. NULL values are returned because FTS is unable to find |
| 123 -- a row with rowid=3 within the content table. |
| 124 -- |
| 125 DELETE FROM t2; |
| 126 SELECT * FROM t3 WHERE t3 MATCH 'k'; |
| 127 |
| 128 When a row is deleted from an external content FTS4 table, FTS4 needs to |
| 129 retrieve the column values of the row being deleted from the content table. |
| 130 This is so that FTS4 can update the full-text index entries for each token |
| 131 that occurs within the deleted row to indicate that that row has been |
| 132 deleted. If the content table row cannot be found, or if it contains values |
| 133 inconsistent with the contents of the FTS index, the results can be difficult |
| 134 to predict. The FTS index may be left containing entries corresponding to the |
| 135 deleted row, which can lead to seemingly nonsensical results being returned |
| 136 by subsequent SELECT queries. The same applies when a row is updated, as |
| 137 internally an UPDATE is the same as a DELETE followed by an INSERT. |
| 138 |
| 139 Instead of writing separately to the full-text index and the content table, |
| 140 some users may wish to use database triggers to keep the full-text index |
| 141 up to date with respect to the set of documents stored in the content table. |
| 142 For example, using the tables from earlier examples: |
| 143 |
| 144 CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN |
| 145 DELETE FROM t3 WHERE docid=old.rowid; |
| 146 END; |
| 147 CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN |
| 148 DELETE FROM t3 WHERE docid=old.rowid; |
| 149 END; |
| 150 |
| 151 CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN |
| 152 INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c); |
| 153 END; |
| 154 CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN |
| 155 INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c); |
| 156 END; |
| 157 |
| 158 The DELETE trigger must be fired before the actual delete takes place |
| 159 on the content table. This is so that FTS4 can still retrieve the original |
| 160 values in order to update the full-text index. And the INSERT trigger must |
| 161 be fired after the new row is inserted, so as to handle the case where the |
| 162 rowid is assigned automatically within the system. The UPDATE trigger must |
| 163 be split into two parts, one fired before and one after the update of the |
| 164 content table, for the same reasons. |
| 165 |
| 166 FTS4 features a special command similar to the 'optimize' command that |
| 167 deletes the entire full-text index and rebuilds it based on the current |
| 168 set of documents in the content table. Assuming again that "t3" is the |
| 169 name of the external content FTS4 table, the command is: |
| 170 |
| 171 INSERT INTO t3(t3) VALUES('rebuild'); |
| 172 |
| 173 This command may also be used with ordinary FTS4 tables, although it may |
| 174 only be useful if the full-text index has somehow become corrupt. It is an |
| 175 error to attempt to rebuild the full-text index maintained by a contentless |
| 176 FTS4 table. |
| 177 |
| 178 |
OLD | NEW |