OLD | NEW |
(Empty) | |
| 1 IN-Operator Implementation Notes |
| 2 ================================ |
| 3 |
| 4 ## Definitions: |
| 5 |
| 6 An IN operator has one of the following formats: |
| 7 |
| 8 > |
| 9 x IN (y1,y2,y3,...,yN) |
| 10 x IN (subquery) |
| 11 |
| 12 The "x" is referred to as the LHS (left-hand side). The list or subquery |
| 13 on the right is called the RHS (right-hand side). If the RHS is a list |
| 14 it must be a non-empty list. But if the RHS is a subquery, it can be an |
| 15 empty set. |
| 16 |
| 17 The LHS can be a scalar (a single quantity) or a vector (a list of |
| 18 two or or more values) or a subquery that returns one or more columns. |
| 19 We use the term "vector" to mean an actually list of values or a |
| 20 subquery that returns two or more columns. An isolated value or |
| 21 a subquery that returns a single columns is called a scalar. |
| 22 |
| 23 The RHS can be a subquery that returns a single column, a subquery |
| 24 that returns two or more columns, or a list of scalars. It is not |
| 25 currently support for the RHS to be a list of vectors. |
| 26 |
| 27 The number of columns for LHS must match the number of columns for |
| 28 the RHS. If the RHS is a list of values, then the LHS must be a |
| 29 scalar. If the RHS is a subquery returning N columns, then the LHS |
| 30 must be a vector of size N. |
| 31 |
| 32 NULL values can occur in either or both of the LHS and RHS. |
| 33 If the LHS contains only |
| 34 NULL values then we say that it is a "total-NULL". If the LHS contains |
| 35 some NULL values and some non-NULL values, then it is a "partial-NULL". |
| 36 For a scalar, there is no difference between a partial-NULL and a total-NULL. |
| 37 The RHS is a partial-NULL if any row contains a NULL value. The RHS is |
| 38 a total-NULL if it contains one or more rows that contain only NULL values. |
| 39 The LHS is called "non-NULL" if it contains no NULL values. The RHS is |
| 40 called "non-NULL" if it contains no NULL values in any row. |
| 41 |
| 42 The result of an IN operator is one of TRUE, FALSE, or NULL. A NULL result |
| 43 means that it cannot be determined if the LHS is contained in the RHS due |
| 44 to the presence of NULL values. In some contexts (for example, when the IN |
| 45 operator occurs in a WHERE clause) |
| 46 the system only needs a binary result: TRUE or NOT-TRUE. One can also |
| 47 to define a binary result of FALSE and NOT-FALSE, but |
| 48 it turns out that no extra optimizations are possible in that case, so if |
| 49 the FALSE/NOT-FALSE binary is needed, we have to compute the three-state |
| 50 TRUE/FALSE/NULL result and then combine the TRUE and NULL values into |
| 51 NOT-FALSE. |
| 52 |
| 53 A "NOT IN" operator is computed by first computing the equivalent IN |
| 54 operator, then interchanging the TRUE and FALSE results. |
| 55 |
| 56 ## Simple Full-Scan Algorithm |
| 57 |
| 58 The following algorithm always compute the correct answer. However, this |
| 59 algorithm is suboptimal, especially if there are many rows on the RHS. |
| 60 |
| 61 1. Set the null-flag to false |
| 62 2. For each row in the RHS: |
| 63 <ol type='a'> |
| 64 <li> Compare the LHS against the RHS |
| 65 <li> If the LHS exactly matches the RHS, immediately return TRUE |
| 66 <li> If the comparison result is NULL, set the null-flag to true |
| 67 </ol> |
| 68 3. If the null-flag is true, return NULL. |
| 69 4. Return FALSE |
| 70 |
| 71 ## Optimized Algorithm |
| 72 |
| 73 The following procedure computes the same answer as the simple full-scan |
| 74 algorithm, though it does so with less work in the common case. This |
| 75 is the algorithm that is implemented in SQLite. |
| 76 |
| 77 1. If the RHS is a constant list of length 1 or 2, then rewrite the |
| 78 IN operator as a simple expression. Implement |
| 79 |
| 80 x IN (y1,y2) |
| 81 |
| 82 as if it were |
| 83 |
| 84 x=y1 OR x=y2 |
| 85 |
| 86 This is the INDEX_NOOP optimization and is only undertaken if the |
| 87 IN operator is used for membership testing. If the IN operator is |
| 88 driving a loop, then skip this step entirely. |
| 89 |
| 90 2. Check the LHS to see if it is a partial-NULL and if it is, jump |
| 91 ahead to step 5. |
| 92 |
| 93 3. Do a binary search of the RHS using the LHS as a probe. If |
| 94 an exact match is found, return TRUE. |
| 95 |
| 96 4. If the RHS is non-NULL then return FALSE. |
| 97 |
| 98 5. If we do not need to distinguish between FALSE and NULL, |
| 99 then return FALSE. |
| 100 |
| 101 6. For each row in the RHS, compare that row against the LHS and |
| 102 if the result is NULL, immediately return NULL. In the case |
| 103 of a scalar IN operator, we only need to look at the very first |
| 104 row the RHS because for a scalar RHS, all NULLs will always come |
| 105 first. If the RHS is empty, this step is a no-op. |
| 106 |
| 107 7. Return FALSE. |
OLD | NEW |