components/omnibox/browser/url_index_private_data.h - Issue 2690303012: Cleaning up url_index_private_data and in_memory_url_index_types.

Side by Side Diff: components/omnibox/browser/url_index_private_data.h

Issue 2690303012: Cleaning up url_index_private_data and in_memory_url_index_types. (Closed)

Patch Set: Removing sorting, utility for sets intersection. Created 3 years, 10 months ago


OLD	NEW
1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.	1 // Copyright (c) 2012 The Chromium Authors. All rights reserved.

2 // Use of this source code is governed by a BSD-style license that can be	2 // Use of this source code is governed by a BSD-style license that can be

3 // found in the LICENSE file.	3 // found in the LICENSE file.

4	4

5 #ifndef COMPONENTS_OMNIBOX_BROWSER_URL_INDEX_PRIVATE_DATA_H_	5 #ifndef COMPONENTS_OMNIBOX_BROWSER_URL_INDEX_PRIVATE_DATA_H_

6 #define COMPONENTS_OMNIBOX_BROWSER_URL_INDEX_PRIVATE_DATA_H_	6 #define COMPONENTS_OMNIBOX_BROWSER_URL_INDEX_PRIVATE_DATA_H_

7	7

8 #include <stddef.h>	8 #include <stddef.h>

9	9

10 #include <set>	10 #include <set>

	11 #include <stack>

11 #include <string>	12 #include <string>

12	13

13 #include "base/files/file_path.h"	14 #include "base/files/file_path.h"

14 #include "base/gtest_prod_util.h"	15 #include "base/gtest_prod_util.h"

15 #include "base/memory/ref_counted.h"	16 #include "base/memory/ref_counted.h"

16 #include "components/history/core/browser/history_service.h"	17 #include "components/history/core/browser/history_service.h"

17 #include "components/omnibox/browser/in_memory_url_index_cache.pb.h"	18 #include "components/omnibox/browser/in_memory_url_index_cache.pb.h"

18 #include "components/omnibox/browser/in_memory_url_index_types.h"	19 #include "components/omnibox/browser/in_memory_url_index_types.h"

19 #include "components/omnibox/browser/scored_history_match.h"	20 #include "components/omnibox/browser/scored_history_match.h"

20	21

(...skipping 176 matching lines...)
197 ~HistoryItemFactorGreater();	198 ~HistoryItemFactorGreater();

198	199

199 bool operator()(const HistoryID h1, const HistoryID h2);	200 bool operator()(const HistoryID h1, const HistoryID h2);

200	201

201 private:	202 private:

202 const HistoryInfoMap& history_info_map_;	203 const HistoryInfoMap& history_info_map_;

203 };	204 };

204	205

205 // URL History indexing support functions.	206 // URL History indexing support functions.

206	207

207 // Composes a set of history item IDs by intersecting the set for each word	208 // Composes a vector of history item IDs by intersecting the set for each word

208 // in |unsorted_words|.	209 // in |unsorted_words|.

209 HistoryIDSet HistoryIDSetFromWords(const String16Vector& unsorted_words);	210 HistoryIDVector HistoryIDsFromWords(const String16Vector& unsorted_words);
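For readers following the change from a set result to a vector result, the sketch below shows one way to intersect per-word ID sets into a sorted vector. It is only an illustration under stated assumptions: `IntersectHistoryIDSets` is a hypothetical helper, not a function from this patch, and the type aliases are stand-ins for the typedefs in in_memory_url_index_types.h, which may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Stand-in aliases (assumption); the real typedefs may differ.
using HistoryID = int64_t;
using HistoryIDSet = std::set<HistoryID>;
using HistoryIDVector = std::vector<HistoryID>;

// Hypothetical helper: returns the IDs present in every set, in ascending
// order. Returns an empty vector when no sets are given.
HistoryIDVector IntersectHistoryIDSets(
    const std::vector<HistoryIDSet>& per_word_sets) {
  if (per_word_sets.empty())
    return {};

  // std::set iterates in sorted order, so the running result stays sorted
  // and std::set_intersection can be applied directly at each step.
  HistoryIDVector result(per_word_sets[0].begin(), per_word_sets[0].end());
  for (size_t i = 1; i < per_word_sets.size() && !result.empty(); ++i) {
    HistoryIDVector next;
    std::set_intersection(result.begin(), result.end(),
                          per_word_sets[i].begin(), per_word_sets[i].end(),
                          std::back_inserter(next));
    result.swap(next);
  }
  return result;
}
```

The loop stops early once the running intersection is empty, since no later set can add IDs back.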

	211

	212 // Trim the candidate pool if it is large. Note that we do not filter out
	Peter Kasting 2017/02/18 01:46:31
	Nit: Trims
	Might want to give a little idea of how/why, e.g. "Trims the candidate pool in advance of doing proper substring searching, to cap the cost of such searching. Discards the least-relevant items (based on visit stats), which are least likely to score highly in the end. To minimize the risk of discarding a valuable URL, the candidate pool is still left two orders of magnitude larger than the final number of results returned from the HQP."
	dyaroshev 2017/02/18 11:48:14
	Done.
	213 // items that do not contain the search terms as proper substrings --

	214 // doing so is the performance-costly operation we are trying to avoid in

	215 // order to maintain omnibox responsiveness.

	216 void TrimHistoryIdsPool(HistoryIDVector* history_ids) const;
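The trimming described in the comment and in the review note can be sketched as a partial selection that keeps only the most relevant candidates. This is a rough illustration, not the code under review: `VisitStats`, `TrimPool`, the ordering by typed count and then visit count, and the `max_items` parameter are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-in aliases (assumption); the real typedefs may differ.
using HistoryID = int64_t;
using HistoryIDVector = std::vector<HistoryID>;

// Hypothetical per-URL visit statistics used to rank candidates.
struct VisitStats {
  int typed_count;
  int visit_count;
};

// Hypothetical helper: keeps the |max_items| most relevant IDs in |ids| and
// discards the rest. Every ID in |ids| must have an entry in |stats|.
// The surviving IDs are left in unspecified order.
void TrimPool(const std::unordered_map<HistoryID, VisitStats>& stats,
              size_t max_items,
              HistoryIDVector* ids) {
  if (ids->size() <= max_items)
    return;  // Pool is already small enough.

  auto more_relevant = [&stats](HistoryID a, HistoryID b) {
    const VisitStats& sa = stats.at(a);
    const VisitStats& sb = stats.at(b);
    if (sa.typed_count != sb.typed_count)
      return sa.typed_count > sb.typed_count;
    return sa.visit_count > sb.visit_count;
  };

  // nth_element moves the |max_items| best candidates to the front without
  // fully sorting the pool, which is cheaper than a complete sort.
  std::nth_element(ids->begin(),
                   ids->begin() + static_cast<std::ptrdiff_t>(max_items),
                   ids->end(), more_relevant);
  ids->resize(max_items);
}
```

Selection by `std::nth_element` runs in linear time on average, which fits the stated goal of capping cost before the expensive substring matching.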

210	217

211 // Helper function to HistoryIDSetFromWords which composes a set of history	218 // Helper function to HistoryIDSetFromWords which composes a set of history

212 // ids for the given term given in |term|.	219 // ids for the given term given in |term|.

213 HistoryIDSet HistoryIDsForTerm(const base::string16& term);	220 HistoryIDSet HistoryIDsForTerm(const base::string16& term);

214	221

215 // Given a set of Char16s, finds words containing those characters.	222 // Given a set of Char16s, finds words containing those characters.

216 WordIDSet WordIDSetForTermChars(const Char16Set& term_chars);	223 WordIDSet WordIDSetForTermChars(const Char16Set& term_chars);

217	224

218 // Helper function for HistoryItemsForTerms(). Fills in |scored_items| from	225 // Helper function for HistoryItemsForTerms(). Fills in |scored_items| from

219 // the matches listed in |history_id_set|.	226 // the matches listed in |history_ids|.

220 void HistoryIdSetToScoredMatches(	227 void HistoryIdsToScoredMatches(HistoryIDVector history_ids,

221 HistoryIDSet history_id_set,	228 const base::string16& lower_raw_string,

222 const base::string16& lower_raw_string,	229 const TemplateURLService* template_url_service,

223 const TemplateURLService* template_url_service,	230 bookmarks::BookmarkModel* bookmark_model,

224 bookmarks::BookmarkModel* bookmark_model,	231 ScoredHistoryMatches* scored_items) const;

225 ScoredHistoryMatches* scored_items) const;

226	232

227 // Fills in |terms_to_word_starts_offsets| according to where the word starts	233 // Fills in |terms_to_word_starts_offsets| according to where the word starts

228 // in each term. For example, in the term "-foo" the word starts at offset 1.	234 // in each term. For example, in the term "-foo" the word starts at offset 1.

229 static void CalculateWordStartsOffsets(	235 static void CalculateWordStartsOffsets(

230 const String16Vector& terms,	236 const String16Vector& terms,

231 WordStarts* terms_to_word_starts_offsets);	237 WordStarts* terms_to_word_starts_offsets);
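The "-foo" example in the comment can be reproduced with a simplified sketch that skips leading characters that are not ASCII letters or digits. This is an assumption-laden illustration: `WordStartOffsets` and `IsAsciiAlnum` are hypothetical names, the real function may use a different word-break rule, and the behavior for a term with no word characters (here, the term length) is a choice made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for ASCII letters and digits only.
bool IsAsciiAlnum(char16_t c) {
  return (c >= u'0' && c <= u'9') || (c >= u'a' && c <= u'z') ||
         (c >= u'A' && c <= u'Z');
}

// Hypothetical helper: for each term, returns the offset of the first ASCII
// letter or digit. A term without any such character yields its length.
std::vector<size_t> WordStartOffsets(
    const std::vector<std::u16string>& terms) {
  std::vector<size_t> offsets;
  offsets.reserve(terms.size());
  for (const std::u16string& term : terms) {
    size_t i = 0;
    while (i < term.size() && !IsAsciiAlnum(term[i]))
      ++i;
    offsets.push_back(i);
  }
  return offsets;
}
```

With this sketch, the term "-foo" yields offset 1 and "bar" yields offset 0.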

232	238

233 // Indexes one URL history item as described by |row|. Returns true if the	239 // Indexes one URL history item as described by |row|. Returns true if the

234 // row was actually indexed. |scheme_whitelist| is used to filter	240 // row was actually indexed. |scheme_whitelist| is used to filter

235 // non-qualifying schemes. If |history_db| is not NULL then this function	241 // non-qualifying schemes. If |history_db| is not NULL then this function

(...skipping 10 matching lines...)
246	252

247 // Parses and indexes the words in the URL and page title of |row| and	253 // Parses and indexes the words in the URL and page title of |row| and

248 // calculate the word starts in each, saving the starts in |word_starts|.	254 // calculate the word starts in each, saving the starts in |word_starts|.

249 void AddRowWordsToIndex(const history::URLRow& row,	255 void AddRowWordsToIndex(const history::URLRow& row,

250 RowWordStarts* word_starts);	256 RowWordStarts* word_starts);

251	257

252 // Given a single word in |uni_word|, adds a reference for the containing	258 // Given a single word in |uni_word|, adds a reference for the containing

253 // history item identified by |history_id| to the index.	259 // history item identified by |history_id| to the index.

254 void AddWordToIndex(const base::string16& uni_word, HistoryID history_id);	260 void AddWordToIndex(const base::string16& uni_word, HistoryID history_id);

255	261

256 // Creates a new entry in the word/history map for |word_id| and add	262 // Adds a new entry to word_list. Uses previously freed positions if

257 // |history_id| as the initial element of the word's set.	263 // available.

258 void AddWordHistory(const base::string16& uni_word, HistoryID history_id);	264 WordID AddNewWordToWordList(const base::string16& term);

259

260 // Updates an existing entry in the word/history index by adding the

261 // |history_id| to set for |word_id| in the word_id_history_map_.

262 void UpdateWordHistory(WordID word_id, HistoryID history_id);

263

264 // Adds |word_id| to |history_id|'s entry in the history/word map,

265 // creating a new entry if one does not already exist.

266 void AddToHistoryIDWordMap(HistoryID history_id, WordID word_id);

267	265

268 // Removes |row| and all associated words and characters from the index.	266 // Removes |row| and all associated words and characters from the index.

269 void RemoveRowFromIndex(const history::URLRow& row);	267 void RemoveRowFromIndex(const history::URLRow& row);

270	268

271 // Removes all words and characters associated with |row| from the index.	269 // Removes all words and characters associated with |row| from the index.

272 void RemoveRowWordsFromIndex(const history::URLRow& row);	270 void RemoveRowWordsFromIndex(const history::URLRow& row);

273	271

274 // Clears |used_| for each item in the search term cache.	272 // Clears |used_| for each item in the search term cache.

275 void ResetSearchTermCache();	273 void ResetSearchTermCache();

276	274

(...skipping 62 matching lines...)
339 String16Vector word_list_;	337 String16Vector word_list_;

340	338

341 // A list of available words slots in |word_list_|. An available word slot	339 // A list of available words slots in |word_list_|. An available word slot

342 // is the index of a unused word in word_list_ vector, also referred to as	340 // is the index of a unused word in word_list_ vector, also referred to as

343 // a WordID. As URL visits are added or modified new words may be added to	341 // a WordID. As URL visits are added or modified new words may be added to

344 // the index, in which case any available words are used, if any, and then	342 // the index, in which case any available words are used, if any, and then

345 // words are added to the end of the word_list_. When URL visits are	343 // words are added to the end of the word_list_. When URL visits are

346 // modified or deleted old words may be removed from the index, in which	344 // modified or deleted old words may be removed from the index, in which

347 // case the slots for those words are added to available_words_ for resuse	345 // case the slots for those words are added to available_words_ for resuse

348 // by future URL updates.	346 // by future URL updates.

349 WordIDSet available_words_;	347 std::stack<WordID> available_words_;
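The slot-reuse scheme described in the comment above maps naturally onto a stack of freed indices, which is the container the NEW side switches to. The sketch below is a minimal illustration under assumptions: the `WordList` class, its method names, and the `WordID` alias are hypothetical and are not taken from the patch.

```cpp
#include <cstddef>
#include <stack>
#include <string>
#include <vector>

// Stand-in alias (assumption); the real WordID typedef may differ.
using WordID = size_t;

// Hypothetical wrapper showing how freed slots can be reused before the
// word list grows.
class WordList {
 public:
  // Stores |term| in a previously freed slot if one exists, otherwise
  // appends it. Returns the slot index, which serves as the WordID.
  WordID Add(const std::u16string& term) {
    if (!free_slots_.empty()) {
      WordID id = free_slots_.top();
      free_slots_.pop();
      words_[id] = term;
      return id;
    }
    words_.push_back(term);
    return words_.size() - 1;
  }

  // Marks the slot |id| as available for reuse by a later Add().
  void Remove(WordID id) {
    words_[id].clear();
    free_slots_.push(id);
  }

  const std::u16string& Get(WordID id) const { return words_[id]; }
  size_t size() const { return words_.size(); }

 private:
  std::vector<std::u16string> words_;
  std::stack<WordID> free_slots_;
};
```

A stack gives constant-time push and pop of free slots. Which free slot gets reused does not matter for correctness, so the ordering a set would maintain is not needed here.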

350	348

351 // A one-to-one mapping from the a word string to its slot number (i.e.	349 // A one-to-one mapping from the a word string to its slot number (i.e.

352 // WordID) in the |word_list_|.	350 // WordID) in the |word_list_|.

353 WordMap word_map_;	351 WordMap word_map_;

354	352

355 // A one-to-many mapping from a single character to all WordIDs of words	353 // A one-to-many mapping from a single character to all WordIDs of words

356 // containing that character.	354 // containing that character.

357 CharWordIDMap char_word_map_;	355 CharWordIDMap char_word_map_;

358	356

359 // A one-to-many mapping from a WordID to all HistoryIDs (the row_id as	357 // A one-to-many mapping from a WordID to all HistoryIDs (the row_id as

(...skipping 21 matching lines...)
381 int saved_cache_version_;	379 int saved_cache_version_;

382	380

383 // Used for unit testing only. Records the number of candidate history items	381 // Used for unit testing only. Records the number of candidate history items

384 // at three stages in the index searching process.	382 // at three stages in the index searching process.

385 size_t pre_filter_item_count_; // After word index is queried.	383 size_t pre_filter_item_count_; // After word index is queried.

386 size_t post_filter_item_count_; // After trimming large result set.	384 size_t post_filter_item_count_; // After trimming large result set.

387 size_t post_scoring_item_count_; // After performing final filter/scoring.	385 size_t post_scoring_item_count_; // After performing final filter/scoring.

388 };	386 };

389	387

390 #endif // COMPONENTS_OMNIBOX_BROWSER_URL_INDEX_PRIVATE_DATA_H_	388 #endif // COMPONENTS_OMNIBOX_BROWSER_URL_INDEX_PRIVATE_DATA_H_
