Chromium Code Reviews

Unified Diff: chrome/test/functional/dataset_converter.py

Issue 6246147: Test Autofill's ability to merge duplicate profiles and... (Closed) Base URL: svn://chrome-svn/chrome/trunk/src/
Patch Set: '' Created 9 years, 10 months ago
Index: chrome/test/functional/dataset_converter.py
===================================================================
--- chrome/test/functional/dataset_converter.py (revision 0)
+++ chrome/test/functional/dataset_converter.py (revision 0)
@@ -0,0 +1,203 @@
+#!/usr/bin/python
+# Copyright (c) 2011 The Chromium Authors. All rights reserved.
+# Use of this source code is governed by a BSD-style license that can be
+# found in the LICENSE file.
+
+"""Takes in a dataset profiles file and outputs to a dictionary list format
dennisjeffrey 2011/02/11 00:53:17 The first line of this comment should be a 1-line
dyu1 2011/02/16 03:17:31 Done.
+for converting Autofill profile datasets.
+
+Used for test autofill.AutoFillTest.testMergeDuplicateProfilesInAutofill.
+"""
+
+import re
+import codecs
+import sys
+import os
dennisjeffrey 2011/02/11 00:53:17 These should be specified in alphabetical order.
dyu1 2011/02/16 03:17:31 Done.
+
+
+class DatasetConverter(object):
+  def __init__(self, input_filename, output_filename = None,
+               display_nothing = True, display_input_lines = False,
+               display_converted_lines = False):
dennisjeffrey 2011/02/11 00:53:17 Don't put spaces around the "=" when you're defini
dennisjeffrey 2011/02/11 00:53:17 Using the "logging" module with different verbosit
dyu1 2011/02/16 03:17:31 Done.
+    """Constructs a dataset converter object.
+
+    Full input pattern:
+      '(?P<NAME_FIRST>.*?)\|(?P<NAME_MIDDLE>.*?)\|(?P<NAME_LAST>.*?)\|
+      (?P<EMAIL_ADDRESS>.*?)\|(?P<COMPANY_NAME>.*?)\|(?P<ADDRESS_HOME_LINE1>.*?)
+      \|(?P<ADDRESS_HOME_LINE2>.*?)\|(?P<ADDRESS_HOME_CITY>.*?)\|
+      (?P<ADDRESS_HOME_STATE>.*?)\|(?P<ADDRESS_HOME_ZIP>.*?)\|
+      (?P<ADDRESS_HOME_COUNTRY>.*?)\|
+      (?P<PHONE_HOME_WHOLE_NUMBER>.*?)\|(?P<PHONE_FAX_WHOLE_NUMBER>.*?)$'
+
+    Full output pattern:
+      "{u'NAME_FIRST': u'%s', u'NAME_MIDDLE': u'%s', u'NAME_LAST': u'%s',
+      u'EMAIL_ADDRESS': u'%s', u'COMPANY_NAME': u'%s', u'ADDRESS_HOME_LINE1':
+      u'%s', u'ADDRESS_HOME_LINE2': u'%s', u'ADDRESS_HOME_CITY': u'%s',
+      u'ADDRESS_HOME_STATE': u'%s', u'ADDRESS_HOME_ZIP': u'%s',
+      u'ADDRESS_HOME_COUNTRY': u'%s', u'PHONE_HOME_WHOLE_NUMBER': u'%s',
+      u'PHONE_FAX_WHOLE_NUMBER': u'%s',},"
+
+    The pattern is a regular expression which has named parenthesized groups
Nirnimesh 2011/02/11 19:39:54 I think the input/output pattern above is illustra
dyu1 2011/02/16 03:17:31 Done.
+    like this (?P<name>...) in order to match the '|' separated fields.
+    If we had only the NAME_FIRST and NAME_MIDDLE fields (e.g. 'Jared|JV'), our
+    pattern would be: "(?P<NAME_FIRST>.*?)\|(?P<NAME_MIDDLE>.*?)$"
+
+    This means that '(?P<NAME_FIRST>regexp)\|' matches whatever regular
+    expression is inside the parentheses and marks the start and end of a
+    group; the contents of a group can be retrieved after a match has been
+    performed using the symbolic group name 'NAME_FIRST'.
+
+    The regexp is '.*?'. '.*' matches zero or more repetitions of any
+    character, and the following '?' makes it non-greedy, so it stops at the
+    first occurrence of the '|' character (escaped in the pattern).
+
+    For '(?P<NAME_MIDDLE>.*?)$' there is no '|' at the end, so '$' marks the
+    end of the line.
+
+    The full pattern is constructed once from the FIELDS list.
+
+    The output pattern for one field, "{u'NAME_FIRST': u'%s',", is ready to
+    accept the value of the 'NAME_FIRST' field once it has been extracted from
+    an input line using the above group pattern.
+
+    'pattern' is used in CreateDictionaryFromRecord(line) to construct and
+    return a dictionary from a line.
+
+    '_output_pattern' is used in _Convert() to construct the final dataset
+    line that will be written to the output file.
+
+    Args:
+      input_filename: name and path of the input dataset.
+      output_filename: name and path of the converted file; default is None.
+      display_nothing: if True, print nothing to the screen; default is True.
+      display_input_lines: if True (and display_nothing is False), print each
+          input line; default is False.
+      display_converted_lines: if True (and display_nothing is False), print
+          each converted line; default is False.
+    """
+    self._fields = [
+        u'NAME_FIRST',
+        u'NAME_MIDDLE',
+        u'NAME_LAST',
+        u'EMAIL_ADDRESS',
+        u'COMPANY_NAME',
+        u'ADDRESS_HOME_LINE1',
+        u'ADDRESS_HOME_LINE2',
+        u'ADDRESS_HOME_CITY',
+        u'ADDRESS_HOME_STATE',
+        u'ADDRESS_HOME_ZIP',
+        u'ADDRESS_HOME_COUNTRY',
+        u'PHONE_HOME_WHOLE_NUMBER',
+        u'PHONE_FAX_WHOLE_NUMBER',
+    ]
dennisjeffrey 2011/02/11 00:53:17 Since _fields is just a constant array, would it b
dyu1 2011/02/16 03:17:31 Done.
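The __init__ docstring describes building one named-group regular expression from the field list. A minimal Python 3 sketch of that construction follows; `build_pattern` is a hypothetical helper (the patch itself splits on '|' instead of compiling this pattern), and only three of the thirteen fields are used to keep it short.

```python
import re

# Shortened, hypothetical field list; the patch uses all thirteen fields.
FIELDS = [u'NAME_FIRST', u'NAME_MIDDLE', u'NAME_LAST']


def build_pattern(fields):
    """Joins one non-greedy named group per field with an escaped '|'."""
    groups = [r'(?P<%s>.*?)' % name for name in fields]
    # The last group is anchored with '$' so it runs to the end of the line.
    return re.compile(r'\|'.join(groups) + '$', re.UNICODE)


pattern = build_pattern(FIELDS)
match = pattern.match(u'Jared|JV|Smith')
print(match.groupdict())
# {'NAME_FIRST': 'Jared', 'NAME_MIDDLE': 'JV', 'NAME_LAST': 'Smith'}
```

One difference from the split('|') approach: because '.' also matches '|', a line with too many fields still matches and the last group absorbs the extras (u'a|b|c|d' gives NAME_LAST == u'c|d'), so a separate field-count check would still be needed.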
+    self._output_pattern = u"{"
Nirnimesh 2011/02/11 19:39:54 prefer single quote char '
dyu1 2011/02/16 03:17:31 Done.
+    for key in self._fields:
+      self._output_pattern += u"u'%s': u'%s', " %(key, "%s")
dennisjeffrey 2011/02/11 00:53:17 I think this could be re-written like this: self.
dyu1 2011/02/16 03:17:31 Done.
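For comparison, the loop that builds `_output_pattern` can also be written with `str.join`. The Python 3 sketch below (hypothetical helper names, two fields only) reproduces the loop's result exactly, including the trailing comma before the closing brace; the rewrite the truncated review comment proposes may differ.

```python
FIELDS = [u'NAME_FIRST', u'NAME_LAST']


def build_output_pattern_loop(fields):
    """Mirrors the loop in __init__."""
    pattern = u'{'
    for key in fields:
        pattern += u"u'%s': u'%s', " % (key, '%s')
    # Drop the final space, keep the comma, then close the dict literal.
    return pattern[:-1] + u'},\n'


def build_output_pattern_join(fields):
    """Same string, built from one "u'KEY': u'%s'," entry per field."""
    entries = u' '.join(u"u'%s': u'%%s'," % key for key in fields)
    return u'{' + entries + u'},\n'


line = build_output_pattern_join(FIELDS) % (u'John', u'Doe')
print(line)
# {u'NAME_FIRST': u'John', u'NAME_LAST': u'Doe',},
```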
+    self._output_pattern = self._output_pattern[:-1] + "},\n"
+
+    self._input_filename = input_filename
dennisjeffrey 2011/02/11 00:53:17 We should probably check to ensure that input_file
dyu1 2011/02/16 03:17:31 Done.
+    self._output_filename = output_filename
+    self._display_nothing = display_nothing
+    self._display_input_lines = display_input_lines
+    self._display_converted_lines = display_converted_lines
+    self._record_length = len(self._fields)
dennisjeffrey 2011/02/11 00:53:17 Perhaps we could remove this variable and just rep
dyu1 2011/02/16 03:17:31 Done.
+
+  def CreateDictionaryFromRecord(self, line):
dennisjeffrey 2011/02/11 00:53:17 If this function is only used by the _Convert() fu
dyu1 2011/02/16 03:17:31 Done.
+    """Constructs and returns a dictionary from a record in the dataset file.
+    Escapes single quotation marks first and uses split('|') to separate values.
dennisjeffrey 2011/02/11 00:53:17 This first line of the comment should be a 1-line
dyu1 2011/02/16 03:17:31 Done.
+
+    Example:
+      Takes a string argument such as u'John|Doe|Mountain View'
+      and returns the dictionary
+      {
+        u'NAME_FIRST': u'John',
+        u'NAME_LAST': u'Doe',
+        u'ADDRESS_HOME_CITY': u'Mountain View',
+      }
+
+    Arg:
dennisjeffrey 2011/02/11 00:53:17 "Arg" --> "Args" (I think it should be "Args" eve
dyu1 2011/02/16 03:17:31 Done.
+      line: a record row from the dataset file.
dennisjeffrey 2011/02/11 00:53:17 Since this method returns something, you should ha
dyu1 2011/02/16 03:17:31 Done.
+    """
+    # Ignore irrelevant record lines such as comment lines.
dennisjeffrey 2011/02/11 00:53:17 Besides comment lines, what other lines are consid
dyu1 2011/02/16 03:17:31 Done.
+    if not '|' in line:
dennisjeffrey 2011/02/11 00:53:17 What if a comment contains a "|" character? Then
dyu1 2011/02/16 03:17:31 No, I have a check in place (line 129) where it ch
dennis_jeffrey 2011/02/16 19:43:29 Oh, ok. I didn't realize that each line is expect
+      return
dennisjeffrey 2011/02/11 00:53:17 Is it possible to have a valid line that does not
dyu1 2011/02/16 03:17:31 Well the dataset given to me is in the following f
dennis_jeffrey 2011/02/16 19:43:29 Ok, I see. I was thinking that in general, a reco
+    re_pattern = re.compile("'", re.UNICODE)
+    line = re_pattern.sub(r"\'", line)
dennisjeffrey 2011/02/11 00:53:17 You might want to add a comment to describe what y
dyu1 2011/02/16 03:17:31 Done.
dennis_jeffrey 2011/02/16 19:43:29 Oops, sorry - Now that I see your comment, I reali
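The `re.sub` call above prefixes every single quote with a backslash so that a value can sit inside the u'%s' slots of `_output_pattern`. The Python 3 sketch below (hypothetical helper names) shows the equivalent `str.replace` and checks the result with `ast.literal_eval`. It also escapes backslashes, which the patch does not do; without that, a value such as C:\temp would have its \t turned into a tab when the output is parsed.

```python
import ast
import re


def escape_quotes(value):
    # Same result as re.sub("'", r"\'", value) in the patch.
    return value.replace(u"'", u"\\'")


def escape_for_literal(value):
    # Escape backslashes first, then quotes, so the text round-trips
    # through a single-quoted Python string literal.
    return value.replace(u'\\', u'\\\\').replace(u"'", u"\\'")


assert re.sub("'", r"\'", u"O'Brien") == escape_quotes(u"O'Brien")

for raw in [u"O'Brien", u'C:\\temp', u'plain']:
    literal = u"u'%s'" % escape_for_literal(raw)
    assert ast.literal_eval(literal) == raw
```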
+
+    line_list = line.split('|')
+    if line_list:
+      # Check for the case where a line has more or fewer fields than expected.
+      if len(line_list) != self._record_length:
+        print >> sys.stderr, ("Error: a '|' separated line has %d fields "
+                              "instead of %d" % (len(line_list),
+                                                 self._record_length))
+        print >> sys.stderr, "\t%s" % line
+        return
dennisjeffrey 2011/02/11 00:53:17 How about raising an exception rather than just re
dyu1 2011/02/16 03:17:31 Done for logging. If I raise an exception here th
dennis_jeffrey 2011/02/16 19:43:29 Ok, I think a logging.warning like what you do now
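Following the suggestion to report malformed lines with `logging.warning` rather than writing to `sys.stderr`, the field-count check could look like the Python 3 sketch below. `split_record` and the shortened field list are hypothetical; returning None keeps the conversion going, as in the patch, instead of raising.

```python
import logging

FIELDS = ['NAME_FIRST', 'NAME_MIDDLE', 'NAME_LAST']


def split_record(line, fields=FIELDS):
    """Returns the '|' separated values of line, or None on a count mismatch."""
    values = line.split('|')
    if len(values) != len(fields):
        # Lazy %-style arguments: the message is only formatted if emitted.
        logging.warning("'|' separated line has %d fields instead of %d: %s",
                        len(values), len(fields), line)
        return None
    return values
```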
+      out_record = {}
+      i = 0
+      for key in self._fields:
+        out_record[key] = line_list[i]
+        i += 1
dennisjeffrey 2011/02/11 00:53:17 It looks like here, you're assuming that the order
dyu1 2011/02/16 03:17:31 Yes, since the order of the keys from the order in
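The index-based loop relies on the dataset columns appearing in the same order as `_fields`. Under that same assumption, `zip` pairs keys and values without a manual counter; a Python 3 sketch with a hypothetical helper name and a three-field list matching the docstring example:

```python
FIELDS = ['NAME_FIRST', 'NAME_LAST', 'ADDRESS_HOME_CITY']


def record_to_dict(values, fields=FIELDS):
    """Maps the i-th value to the i-th field name."""
    # zip() silently stops at the shorter input, so check the length first.
    if len(values) != len(fields):
        raise ValueError('expected %d values, got %d'
                         % (len(fields), len(values)))
    return dict(zip(fields, values))


print(record_to_dict('John|Doe|Mountain View'.split('|')))
# {'NAME_FIRST': 'John', 'NAME_LAST': 'Doe', 'ADDRESS_HOME_CITY': 'Mountain View'}
```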
+      return out_record
+
+  def _Convert(self, input_file, output_file):
+    """The real conversion takes place here.
dennisjeffrey 2011/02/11 00:53:17 I think it would be more useful to say what's bein
dyu1 2011/02/16 03:17:31 Done.
+
+    Args:
+      input_file: dataset input file.
+      output_file: the converted dictionary list output file.
dennisjeffrey 2011/02/11 00:53:17 Since this function returns something, you need a
dyu1 2011/02/16 03:17:31 Done.
+    """
+    list_of_dict = []
+    i = 0
+    if output_file:
+      output_file.write("[")
+      output_file.write(os.linesep)
+    for line in input_file.readlines():
+      line = line.strip()
+      if not line:
+        continue
+      line = unicode(line, 'UTF-8')
+      output_record = self.CreateDictionaryFromRecord(line)
+      if output_record:
+        i += 1
+        list_of_dict.append(output_record)
+        output_line = self._output_pattern %tuple(
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
+            [output_record[key] for key in self._fields])
+        if output_file:
+          output_file.write(output_line)
+          output_file.write(os.linesep)
+        if not self._display_nothing:
+          if self._display_input_lines:
+            print "\n%d: %s" %(i, line.encode(sys.stdout.encoding, 'ignore'))
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
+          if self._display_converted_lines:
+            print "\tconverted to: %s" %output_line.encode(
dennisjeffrey 2011/02/11 00:53:17 You may want to consider using the "logging" modul
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
+                sys.stdout.encoding, 'ignore')
+          else:
+            if not self._display_input_lines and not i % 10:
+              print "\t%d lines converted so far!" %i
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dennisjeffrey 2011/02/11 00:53:17 I assume all lines should be converted nearly inst
+    if output_file:
+      output_file.write("]")
+      output_file.write(os.linesep)
+    if not self._display_nothing:
+      print
+      print "%d lines converted SUCCESSFULLY!" %i
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
+      print "--- FINISHED ---"
+      print
dennisjeffrey 2011/02/11 00:53:17 Again, consider using "logging" instead of "print"
dyu1 2011/02/16 03:17:31 Done.
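As the review suggests, the display_* flags could map onto `logging` levels: per-line detail at DEBUG, progress and the final count at INFO. A Python 3 sketch; `configure_logging`, `report_progress` and `report_done` are hypothetical names, and the every-10-lines interval is copied from the patch.

```python
import logging


def configure_logging(verbose):
    """Shows DEBUG messages when verbose, otherwise INFO and above."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(format='%(levelname)s: %(message)s')
    # basicConfig() does nothing if the root logger already has handlers,
    # so set the level on the root logger explicitly.
    logging.getLogger().setLevel(level)
    return level


def report_progress(count, line, converted):
    logging.debug('%d: %s', count, line)
    logging.debug('converted to: %s', converted)
    if count % 10 == 0:
        logging.info('%d lines converted so far', count)


def report_done(count):
    logging.info('%d lines converted successfully', count)
```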
+    return list_of_dict
+
+  def Convert(self):
+    """Takes arguments of two file names and creates two file objects, then
dennisjeffrey 2011/02/11 00:53:17 This method actually doesn't take any parameter ar
dyu1 2011/02/16 03:17:31 Done.
+    calls _Convert() with these two file objects to do the real conversion."""
dennisjeffrey 2011/02/11 00:53:17 The first comment line should be a 1-line summary
dyu1 2011/02/16 03:17:31 Done.
+    with open(self._input_filename) as input_file:
+      if self._output_filename:
+        with codecs.open(self._output_filename, mode = 'wb',
+                         encoding = 'utf-8-sig') as output_file:
dennisjeffrey 2011/02/11 00:53:17 Remove the spaces around the "=" when specifying t
dyu1 2011/02/16 03:17:31 Done.
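The 'utf-8-sig' codec used above writes a UTF-8 byte order mark (EF BB BF) before the first character and strips it again on read. A Python 3 sketch using the built-in `open`, which accepts the same codec name; the temporary file path is only for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'converted.txt')

# newline='' turns off newline translation so the bytes are predictable.
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    f.write(u'[\n]\n')

with open(path, 'rb') as f:
    raw = f.read()

with open(path, encoding='utf-8-sig') as f:
    text_sig = f.read()    # BOM removed by the codec

with open(path, encoding='utf-8') as f:
    text_plain = f.read()  # BOM kept as u'\ufeff'
```

A reader that opens the converted file as plain 'utf-8' sees a leading u'\ufeff', which matters for whatever code loads the output.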
+          return self._Convert(input_file, output_file)
+      else:
+        return self._Convert(input_file, None)
+
dennisjeffrey 2011/02/11 00:53:17 Should have an extra blank line here: the style gu
dyu1 2011/02/16 03:17:31 Done.
+def main():
+  c = DatasetConverter(r'../data/autofill/dataset.txt',
dennisjeffrey 2011/02/11 00:53:17 Is it better to hard-code the input filename and o
dyu1 2011/02/16 03:17:31 Well, command-line input would be fine for the stan
dennis_jeffrey 2011/02/16 19:43:29 When this module is invoked via the PyAuto test, t
+      r'../data/autofill/dataset_duplicate-profiles.txt')
dennisjeffrey 2011/02/11 00:53:17 The second argument should line up underneath the
dyu1 2011/02/16 03:17:31 Done.
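On the hard-coded paths: one option that keeps the current behavior when the module runs with no arguments (for example from the PyAuto test) is to use the current paths as defaults for optional positional arguments. A Python 3 sketch with `argparse` (in the standard library since Python 2.7 and 3.2; `optparse` on older interpreters); `parse_args` and the `--verbose` flag are hypothetical.

```python
import argparse

DEFAULT_INPUT = '../data/autofill/dataset.txt'
DEFAULT_OUTPUT = '../data/autofill/dataset_duplicate-profiles.txt'


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description='Convert an Autofill profile dataset.')
    # nargs='?' makes each positional optional, falling back to the default.
    parser.add_argument('input_filename', nargs='?', default=DEFAULT_INPUT)
    parser.add_argument('output_filename', nargs='?', default=DEFAULT_OUTPUT)
    parser.add_argument('-v', '--verbose', action='store_true')
    return parser.parse_args(argv)
```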
+  c.Convert()
+
+if __name__ == '__main__':
+  main()
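Because Convert() writes a Python list literal of dicts, a consumer can load it back without executing code. A Python 3 sketch with `ast.literal_eval`; `load_converted` is a hypothetical helper, and the sample text mimics the shape the output pattern produces (trailing commas and blank lines included), shortened to two fields.

```python
import ast


def load_converted(text):
    """Parses the '[ {...}, ... ]' text written by Convert()."""
    # Drop a UTF-8 BOM in case the file was decoded as plain 'utf-8'.
    return ast.literal_eval(text.lstrip(u'\ufeff'))


sample = u"[\n{u'NAME_FIRST': u'John', u'NAME_LAST': u'Doe',},\n\n]\n"
profiles = load_converted(sample)
print(profiles)
# [{'NAME_FIRST': 'John', 'NAME_LAST': 'Doe'}]
```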