Index: third_party/sqlite/src/src/in-operator.md |
diff --git a/third_party/sqlite/src/src/in-operator.md b/third_party/sqlite/src/src/in-operator.md |
new file mode 100644 |
index 0000000000000000000000000000000000000000..e9ad2101aa94f3765234bff63f2d9c6ce56d792c |
--- /dev/null |
+++ b/third_party/sqlite/src/src/in-operator.md |
@@ -0,0 +1,107 @@ |
+IN-Operator Implementation Notes |
+================================ |
+ |
+## Definitions: |
+ |
+An IN operator has one of the following formats: |
+ |
+> |
+ x IN (y1,y2,y3,...,yN) |
+ x IN (subquery) |
+ |
+The "x" is referred to as the LHS (left-hand side). The list or subquery |
+on the right is called the RHS (right-hand side). If the RHS is a list |
+it must be a non-empty list. But if the RHS is a subquery, it can be an |
+empty set. |
+ |
+The LHS can be a scalar (a single quantity) or a vector (a list of |
+two or or more values) or a subquery that returns one or more columns. |
+We use the term "vector" to mean an actually list of values or a |
+subquery that returns two or more columns. An isolated value or |
+a subquery that returns a single columns is called a scalar. |
+ |
+The RHS can be a subquery that returns a single column, a subquery |
+that returns two or more columns, or a list of scalars. It is not |
+currently support for the RHS to be a list of vectors. |
+ |
+The number of columns for LHS must match the number of columns for |
+the RHS. If the RHS is a list of values, then the LHS must be a |
+scalar. If the RHS is a subquery returning N columns, then the LHS |
+must be a vector of size N. |
+ |
+NULL values can occur in either or both of the LHS and RHS. |
+If the LHS contains only |
+NULL values then we say that it is a "total-NULL". If the LHS contains |
+some NULL values and some non-NULL values, then it is a "partial-NULL". |
+For a scalar, there is no difference between a partial-NULL and a total-NULL. |
+The RHS is a partial-NULL if any row contains a NULL value. The RHS is |
+a total-NULL if it contains one or more rows that contain only NULL values. |
+The LHS is called "non-NULL" if it contains no NULL values. The RHS is |
+called "non-NULL" if it contains no NULL values in any row. |
+ |
+The result of an IN operator is one of TRUE, FALSE, or NULL. A NULL result |
+means that it cannot be determined if the LHS is contained in the RHS due |
+to the presence of NULL values. In some contexts (for example, when the IN |
+operator occurs in a WHERE clause) |
+the system only needs a binary result: TRUE or NOT-TRUE. One can also |
+to define a binary result of FALSE and NOT-FALSE, but |
+it turns out that no extra optimizations are possible in that case, so if |
+the FALSE/NOT-FALSE binary is needed, we have to compute the three-state |
+TRUE/FALSE/NULL result and then combine the TRUE and NULL values into |
+NOT-FALSE. |
+ |
+A "NOT IN" operator is computed by first computing the equivalent IN |
+operator, then interchanging the TRUE and FALSE results. |
+ |
+## Simple Full-Scan Algorithm |
+ |
+The following algorithm always compute the correct answer. However, this |
+algorithm is suboptimal, especially if there are many rows on the RHS. |
+ |
+ 1. Set the null-flag to false |
+ 2. For each row in the RHS: |
+ <ol type='a'> |
+ <li> Compare the LHS against the RHS |
+ <li> If the LHS exactly matches the RHS, immediately return TRUE |
+ <li> If the comparison result is NULL, set the null-flag to true |
+ </ol> |
+ 3. If the null-flag is true, return NULL. |
+ 4. Return FALSE |
+ |
+## Optimized Algorithm |
+ |
+The following procedure computes the same answer as the simple full-scan |
+algorithm, though it does so with less work in the common case. This |
+is the algorithm that is implemented in SQLite. |
+ |
+ 1. If the RHS is a constant list of length 1 or 2, then rewrite the |
+ IN operator as a simple expression. Implement |
+ |
+ x IN (y1,y2) |
+ |
+ as if it were |
+ |
+ x=y1 OR x=y2 |
+ |
+ This is the INDEX_NOOP optimization and is only undertaken if the |
+ IN operator is used for membership testing. If the IN operator is |
+ driving a loop, then skip this step entirely. |
+ |
+ 2. Check the LHS to see if it is a partial-NULL and if it is, jump |
+ ahead to step 5. |
+ |
+ 3. Do a binary search of the RHS using the LHS as a probe. If |
+ an exact match is found, return TRUE. |
+ |
+ 4. If the RHS is non-NULL then return FALSE. |
+ |
+ 5. If we do not need to distinguish between FALSE and NULL, |
+ then return FALSE. |
+ |
+ 6. For each row in the RHS, compare that row against the LHS and |
+ if the result is NULL, immediately return NULL. In the case |
+ of a scalar IN operator, we only need to look at the very first |
+ row the RHS because for a scalar RHS, all NULLs will always come |
+ first. If the RHS is empty, this step is a no-op. |
+ |
+ 7. Return FALSE. |