third_party/sqlite/sqlite-src-3080704/ext/fts2/README.tokenizers - Issue 949043002: Add //third_party/sqlite to dirs_to_snapshot, remove net_sql.patch

Unified Diff: third_party/sqlite/sqlite-src-3080704/ext/fts2/README.tokenizers

Issue 949043002: Add //third_party/sqlite to dirs_to_snapshot, remove net_sql.patch (Closed) Base URL: git@github.com:domokit/mojo.git@master

Patch Set: Created 5 years, 10 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Download patch

« no previous file with comments | « third_party/sqlite/sqlite-src-3080704/ext/fts1/tokenizer.h ('k') | third_party/sqlite/sqlite-src-3080704/ext/fts2/README.txt » ('j') | no next file with comments »
Expand Comments ('e') | Collapse Comments ('c') | Hide Comments ('s')

Index: third_party/sqlite/sqlite-src-3080704/ext/fts2/README.tokenizers

diff --git a/third_party/sqlite/sqlite-src-3080704/ext/fts2/README.tokenizers b/third_party/sqlite/sqlite-src-3080704/ext/fts2/README.tokenizers

new file mode 100644

index 0000000000000000000000000000000000000000..98d2021ba1d02cef16b67eed84b2103a45adfaff

--- /dev/null

+++ b/third_party/sqlite/sqlite-src-3080704/ext/fts2/README.tokenizers

@@ -0,0 +1,133 @@

+1. FTS2 Tokenizers

+ When creating a new full-text table, FTS2 allows the user to select

+ the text tokenizer implementation to be used when indexing text

+ by specifying a "tokenizer" clause as part of the CREATE VIRTUAL TABLE

+ statement:

+ CREATE VIRTUAL TABLE <table-name> USING fts2(

+ <columns ...> [, tokenizer <tokenizer-name> [<tokenizer-args>]]

+ );

+ The built-in tokenizers (valid values to pass as <tokenizer name>) are

+ "simple" and "porter".

+ <tokenizer-args> should consist of zero or more white-space separated

+ arguments to pass to the selected tokenizer implementation. The

+ interpretation of the arguments, if any, depends on the individual

+ tokenizer.

+2. Custom Tokenizers

+ FTS2 allows users to provide custom tokenizer implementations. The

+ interface used to create a new tokenizer is defined and described in

+ the fts2_tokenizer.h source file.

+ Registering a new FTS2 tokenizer is similar to registering a new

+ virtual table module with SQLite. The user passes a pointer to a

+ structure containing pointers to various callback functions that

+ make up the implementation of the new tokenizer type. For tokenizers,

+ the structure (defined in fts2_tokenizer.h) is called

+ "sqlite3_tokenizer_module".

+ FTS2 does not expose a C-function that users call to register new

+ tokenizer types with a database handle. Instead, the pointer must

+ be encoded as an SQL blob value and passed to FTS2 through the SQL

+ engine by evaluating a special scalar function, "fts2_tokenizer()".

+ The fts2_tokenizer() function may be called with one or two arguments,

+ as follows:

+ SELECT fts2_tokenizer(<tokenizer-name>);

+ SELECT fts2_tokenizer(<tokenizer-name>, <sqlite3_tokenizer_module ptr>);

+ Where <tokenizer-name> is a string identifying the tokenizer and

+ <sqlite3_tokenizer_module ptr> is a pointer to an sqlite3_tokenizer_module

+ structure encoded as an SQL blob. If the second argument is present,

+ it is registered as tokenizer <tokenizer-name> and a copy of it

+ returned. If only one argument is passed, a pointer to the tokenizer

+ implementation currently registered as <tokenizer-name> is returned,

+ encoded as a blob. Or, if no such tokenizer exists, an SQL exception

+ (error) is raised.

+ SECURITY: If the fts2 extension is used in an environment where potentially

+ malicious users may execute arbitrary SQL (i.e. gears), they should be

+ prevented from invoking the fts2_tokenizer() function, possibly using the

+ authorisation callback.

+ See "Sample code" below for an example of calling the fts2_tokenizer()

+ function from C code.

+3. ICU Library Tokenizers

+ If this extension is compiled with the SQLITE_ENABLE_ICU pre-processor

+ symbol defined, then there exists a built-in tokenizer named "icu"

+ implemented using the ICU library. The first argument passed to the

+ xCreate() method (see fts2_tokenizer.h) of this tokenizer may be

+ an ICU locale identifier. For example "tr_TR" for Turkish as used

+ in Turkey, or "en_AU" for English as used in Australia. For example:

+ "CREATE VIRTUAL TABLE thai_text USING fts2(text, tokenizer icu th_TH)"

+ The ICU tokenizer implementation is very simple. It splits the input

+ text according to the ICU rules for finding word boundaries and discards

+ any tokens that consist entirely of white-space. This may be suitable

+ for some applications in some locales, but not all. If more complex

+ processing is required, for example to implement stemming or

+ discard punctuation, this can be done by creating a tokenizer

+ implementation that uses the ICU tokenizer as part of its implementation.

+ When using the ICU tokenizer this way, it is safe to overwrite the

+ contents of the strings returned by the xNext() method (see

+ fts2_tokenizer.h).

+4. Sample code.

+ The following two code samples illustrate the way C code should invoke

+ the fts2_tokenizer() scalar function:

+ int registerTokenizer(

+ sqlite3 *db,

+ char *zName,

+ const sqlite3_tokenizer_module *p

+ ){

+ int rc;

+ sqlite3_stmt *pStmt;

+ const char zSql[] = "SELECT fts2_tokenizer(?, ?)";

+ rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);

+ if( rc!=SQLITE_OK ){

+ return rc;

+ }

+ sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);

+ sqlite3_bind_blob(pStmt, 2, &p, sizeof(p), SQLITE_STATIC);

+ sqlite3_step(pStmt);

+ return sqlite3_finalize(pStmt);

+ }

+ int queryTokenizer(

+ sqlite3 *db,

+ char *zName,

+ const sqlite3_tokenizer_module **pp

+ ){

+ int rc;

+ sqlite3_stmt *pStmt;

+ const char zSql[] = "SELECT fts2_tokenizer(?)";

+ *pp = 0;

+ rc = sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0);

+ if( rc!=SQLITE_OK ){

+ return rc;

+ }

+ sqlite3_bind_text(pStmt, 1, zName, -1, SQLITE_STATIC);

+ if( SQLITE_ROW==sqlite3_step(pStmt) ){

+ if( sqlite3_column_type(pStmt, 0)==SQLITE_BLOB ){

+ memcpy(pp, sqlite3_column_blob(pStmt, 0), sizeof(*pp));

+ }

+ return sqlite3_finalize(pStmt);

+ }