chrome/test/functional/dataset_converter.py - Issue 6246147: Test Autofill's ability to merge duplicate profiles and...

Unified Diff: chrome/test/functional/dataset_converter.py

Issue 6246147: Test Autofill's ability to merge duplicate profiles and... (Closed) Base URL: svn://chrome-svn/chrome/trunk/src/

Patch Set: '' Created 9 years, 10 months ago


Index: chrome/test/functional/dataset_converter.py

===================================================================

--- chrome/test/functional/dataset_converter.py (revision 0)

+++ chrome/test/functional/dataset_converter.py (revision 0)

@@ -0,0 +1,203 @@

+#!/usr/bin/python

+# Use of this source code is governed by a BSD-style license that can be

+# found in the LICENSE file.

+"""Takes in a dataset profiles file and outputs to a dictionary list format

dennisjeffrey 2011/02/11 00:53:17 The first line of this comment should be a 1-line

dyu1 2011/02/16 03:17:31 Done.

+for converting Autofill profile datasets.

+Used for test autofill.AutoFillTest.testMergeDuplicateProfilesInAutofill.

+"""

+import re

+import codecs

+import sys

+import os

dennisjeffrey 2011/02/11 00:53:17 These should be specified in alphabetical order.

dyu1 2011/02/16 03:17:31 Done.

+class DatasetConverter(object):

+ def __init__(self, input_filename, output_filename = None,

+ display_nothing = True, display_input_lines = False,

+ display_converted_lines = False):

dennisjeffrey 2011/02/11 00:53:17 Don't put spaces around the "=" when you're defini

dennisjeffrey 2011/02/11 00:53:17 Using the "logging" module with different verbosit

dyu1 2011/02/16 03:17:31 Done.

+ """Constructs a dataset converter object.

+ Full input pattern:

+ '(?P<NAME_FIRST>.*?)\|(?P<NAME_MIDDLE>.*?)\|(?P<NAME_LAST>.*?)\|

+ (?P<EMAIL_ADDRESS>.*?)\|(?P<COMPANY_NAME>.*?)\|(?P<ADDRESS_HOME_LINE1>.*?)

+ \|(?P<ADDRESS_HOME_LINE2>.*?)\|(?P<ADDRESS_HOME_CITY>.*?)\|

+ (?P<ADDRESS_HOME_STATE>.*?)\|(?P<ADDRESS_HOME_ZIP>.*?)\|

+ (?P<ADDRESS_HOME_COUNTRY>.*?)\|

+ (?P<PHONE_HOME_WHOLE_NUMBER>.*?)\|(?P<PHONE_FAX_WHOLE_NUMBER>.*?)$'

+ Full output pattern:

+ "{u'NAME_FIRST': u'%s', u'NAME_MIDDLE': u'%s', u'NAME_LAST': u'%s',

+ u'EMAIL_ADDRESS': u'%s', u'COMPANY_NAME': u'%s', u'ADDRESS_HOME_LINE1':

+ u'%s', u'ADDRESS_HOME_LINE2': u'%s', u'ADDRESS_HOME_CITY': u'%s',

+ u'ADDRESS_HOME_STATE': u'%s', u'ADDRESS_HOME_ZIP': u'%s',

+ u'ADDRESS_HOME_COUNTRY': u'%s', u'PHONE_HOME_WHOLE_NUMBER': u'%s',

+ u'PHONE_FAX_WHOLE_NUMBER': u'%s',},"

+ The pattern is a regular expression which has named parenthesis groups

Nirnimesh 2011/02/11 19:39:54 I think the input/output pattern above is illustra

dyu1 2011/02/16 03:17:31 Done.

+ like this (?P<name>...) in order to match the '|' separated fields.

+ If we had only the NAME_FIRST and NAME_MIDDLE fields (e.g. 'Jared|JV') our

+ pattern would be: "(?P<NAME_FIRST>.*?)\|(?P<NAME_MIDDLE>.*?)$"

+ This means that '(?P<NAME_FIRST> regexp)\|' matches whatever regular

+ expression is inside the parentheses, and indicates the start and end of a

+ group; the contents of a group can be retrieved after a match has been

+ performed using the symbolic group name 'NAME_FIRST'.

+ The regexp is '.*?'. The '.*' part matches 0 or more repetitions of any

+ character. The following '?' makes the regexp non-greedy, meaning it will

+ stop at the first occurrence of the '|' character (escaped in the pattern).

+ For '(?P<NAME_MIDDLE>.*?)$' there is no '|' at the end, so we have '$' to

+ indicate the end of the line.

+ The full pattern is constructed once from the FIELDS list.

+ The out_line_pattern for one field: "{u'NAME_FIRST': u'%s',"

+ is ready to accept the value for the 'NAME_FIRST' field once it is extracted

+ from an input line using the above group pattern.

+ 'pattern' is used in CreateDictionaryFromRecord(line) to construct and

+ return a dictionary from a line.

+ 'out_line_pattern' is used in 'convert()' to construct the final dataset

+ line that will be printed to the output file.

+ Args:

+ input_filename: name and path of the input dataset.

+ output_filename: name and path of the converted file, default is None.

+ display_nothing: output display on the screen, default is True.

+ display_input_lines: output display of the input file, default is False.

+ display_converted_lines: output display of the converted file,

+ default is False.

+ """

+ self._fields = [

+ u'NAME_FIRST',

+ u'NAME_MIDDLE',

+ u'NAME_LAST',

+ u'EMAIL_ADDRESS',

+ u'COMPANY_NAME',

+ u'ADDRESS_HOME_LINE1',

+ u'ADDRESS_HOME_LINE2',

+ u'ADDRESS_HOME_CITY',

+ u'ADDRESS_HOME_STATE',

+ u'ADDRESS_HOME_ZIP',

+ u'ADDRESS_HOME_COUNTRY',

+ u'PHONE_HOME_WHOLE_NUMBER',

+ u'PHONE_FAX_WHOLE_NUMBER',

+ ]

dennisjeffrey 2011/02/11 00:53:17 Since _fields is just a constant array, would it b

dyu1 2011/02/16 03:17:31 Done.
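The constructor docstring above describes an input pattern with one named group per '|'-separated field, built from the field list. A minimal standalone sketch of that idea, assuming a shortened two-field list (`FIELDS` and `build_input_pattern` are names chosen for this illustration, not identifiers confirmed by the patch):

```python
import re

# Hypothetical shortened field list; the patch lists thirteen fields.
FIELDS = ['NAME_FIRST', 'NAME_MIDDLE']


def build_input_pattern(fields):
    """Returns a compiled regex with one non-greedy named group per field."""
    groups = ['(?P<%s>.*?)' % name for name in fields]
    # Groups are joined by an escaped '|'; '$' anchors the last field.
    return re.compile(r'\|'.join(groups) + '$')


pattern = build_input_pattern(FIELDS)
match = pattern.match('Jared|JV')
record = match.groupdict()  # maps each group name to the text it matched
```

Note that `CreateDictionaryFromRecord` in this patch set splits on '|' with `line.split('|')` rather than matching a compiled pattern, so the sketch illustrates the docstring, not the code path.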

+ self._output_pattern = u"{"

Nirnimesh 2011/02/11 19:39:54 prefer single quote char '

dyu1 2011/02/16 03:17:31 Done.

+ for key in self._fields:

+ self._output_pattern += u"u'%s': u'%s', " %(key, "%s")

dennisjeffrey 2011/02/11 00:53:17 I think this could be re-written like this: self.

dyu1 2011/02/16 03:17:31 Done.

+ self._output_pattern = self._output_pattern[:-1] + "},\n"
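The loop and the trailing `[:-1]` slice above build one `u'KEY': u'%s'` slot per field. As a sketch only, the same string can be produced with `str.join`; this is one possible rewrite and not necessarily the one the reviewer had in mind, since that comment is truncated here. The two-field `FIELDS` list and both function names are hypothetical:

```python
# Hypothetical shortened field list; the patch lists thirteen fields.
FIELDS = ['NAME_FIRST', 'NAME_LAST']


def build_output_pattern_loop(fields):
    """Mirrors the loop in the patch, including the trailing-space slice."""
    pattern = u"{"
    for key in fields:
        pattern += u"u'%s': u'%s', " % (key, "%s")
    return pattern[:-1] + u"},\n"


def build_output_pattern_join(fields):
    """Builds the same string with join; '%%s' leaves a literal '%s'."""
    items = [u"u'%s': u'%%s'" % key for key in fields]
    return u"{" + u", ".join(items) + u",},\n"


output_line = build_output_pattern_join(FIELDS) % (u'John', u'Doe')
```

Both variants keep the trailing comma before the closing brace, matching the "Full output pattern" shown in the docstring.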

+ self._input_filename = input_filename

dennisjeffrey 2011/02/11 00:53:17 We should probably check to ensure that input_file

dyu1 2011/02/16 03:17:31 Done.
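The comment above is truncated, so the exact check requested is not visible. Assuming it asks for the input file to exist before conversion starts, a minimal sketch could look like this (`check_input_file` is a hypothetical helper, not a name from the patch):

```python
import os


def check_input_file(input_filename):
    """Returns input_filename, or raises IOError if it is not a file."""
    if not os.path.isfile(input_filename):
        raise IOError('Input dataset not found: %s' % input_filename)
    return input_filename
```

Failing in the constructor would surface a bad path before any output file is created, whereas the code as posted only fails once `Convert()` calls `open()`.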

+ self._output_filename = output_filename

+ self._display_nothing = display_nothing

+ self._display_input_lines = display_input_lines

+ self._display_converted_lines = display_converted_lines

+ self._record_length = len(self._fields)

dennisjeffrey 2011/02/11 00:53:17 Perhaps we could remove this variable and just rep

dyu1 2011/02/16 03:17:31 Done.

+ def CreateDictionaryFromRecord(self, line):

dennisjeffrey 2011/02/11 00:53:17 If this function is only used by the _Convert() fu

dyu1 2011/02/16 03:17:31 Done.

+ """Constructs and returns a dictionary from a record in the dataset file.

+ Escapes single quotation first and uses split('|') to separate values.

dennisjeffrey 2011/02/11 00:53:17 This first line of the comment should be a 1-line

dyu1 2011/02/16 03:17:31 Done.

+ Example:

+ Takes a string argument such as u'John|Doe|Mountain View'

+ and returns a dictionary

+ {

+ u'NAME_FIRST': u'John',

+ u'NAME_LAST': u'Doe',

+ u'ADDRESS_HOME_CITY': u'Mountain View',

+ }

+ Arg:

dennisjeffrey 2011/02/11 00:53:17 "Arg" --> "Args" (I think it should be "Args" eve

dyu1 2011/02/16 03:17:31 Done.

+ line: row of record from the dataset file.

dennisjeffrey 2011/02/11 00:53:17 Since this method returns something, you should ha

dyu1 2011/02/16 03:17:31 Done.

+ """

+ # Ignore irrelevant record lines such as comment lines.

dennisjeffrey 2011/02/11 00:53:17 Besides comment lines, what other lines are consid

dyu1 2011/02/16 03:17:31 Done.

+ if not '|' in line:

dennisjeffrey 2011/02/11 00:53:17 What if a comment contains a "|" character? Then

dyu1 2011/02/16 03:17:31 No, I have a check in place (line 129) where it ch

dennis_jeffrey 2011/02/16 19:43:29 Oh, ok. I didn't realize that each line is expect

+ return

dennisjeffrey 2011/02/11 00:53:17 Is it possible to have a valid line that does not

dyu1 2011/02/16 03:17:31 Well the dataset given to me is in the following f

dennis_jeffrey 2011/02/16 19:43:29 Ok, I see. I was thinking that in general, a reco

+ re_pattern = re.compile("'", re.UNICODE)

+ line = re_pattern.sub(r"\'", line)

dennisjeffrey 2011/02/11 00:53:17 You might want to add a comment to describe what y

dyu1 2011/02/16 03:17:31 Done.

dennis_jeffrey 2011/02/16 19:43:29 Oops, sorry - Now that I see your comment, I reali
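For illustration, the two lines under discussion prefix every single quote in the record with a backslash, presumably so that values can sit inside the `u'...'` literals of the output pattern. A sketch that shows the effect with plain `str.replace` and compares it with the regex form from the patch (`escape_single_quotes` is a hypothetical name):

```python
import re


def escape_single_quotes(line):
    """Prefixes every single quote in line with a backslash."""
    return line.replace("'", "\\'")


sample = u"Conan|O'Brien"
# The regex form used in the patch; '\'' is not a recognised escape in a
# replacement template, so the backslash and the quote are kept as they are.
regex_result = re.compile("'", re.UNICODE).sub(r"\'", sample)
replace_result = escape_single_quotes(sample)
```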

+ line_list = line.split('|')

+ if line_list:

+ # Check for case when a line may have more or less fields than expected.

+ if len(line_list) != self._record_length:

+ print >> sys.stderr, "Error: a '|' separated line has %d fields \

+ instead of %d" % (len(line_list), self._record_length)

+ print >> sys.stderr, "\t%s" % line

+ return

dennisjeffrey 2011/02/11 00:53:17 How about raising an exception rather than just re

dyu1 2011/02/16 03:17:31 Done for logging. If I raise an exception here th

dennis_jeffrey 2011/02/16 19:43:29 Ok, I think a logging.warning like what you do now
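The thread above settles on logging a warning and skipping a malformed line instead of raising. The revised code is not shown on this page, so the following is only a sketch of that behaviour; `split_record` and `EXPECTED_FIELD_COUNT` are hypothetical names:

```python
import logging

EXPECTED_FIELD_COUNT = 13  # number of entries in the patch's field list


def split_record(line, expected=EXPECTED_FIELD_COUNT):
    """Returns the list of field values, or None for a malformed line."""
    values = line.split('|')
    if len(values) != expected:
        # Warn and skip so one bad record does not abort the whole run.
        logging.warning("A '|' separated line has %d fields instead of %d: %s",
                        len(values), expected, line)
        return None
    return values
```

A caller would treat `None` as "skip this line", which matches the `if output_record:` test in `_Convert()`.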

+ out_record = {}

+ i = 0

+ for key in self._fields:

+ out_record[key] = line_list[i]

+ i += 1

dennisjeffrey 2011/02/11 00:53:17 It looks like here, you're assuming that the order

dyu1 2011/02/16 03:17:31 Yes, since the order of the keys from the order in
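The index-based loop above relies on values arriving in the same order as the field list. As a sketch, the same pairing can be written with `zip`, which makes that positional assumption explicit; `record_from_values` and the shortened `FIELDS` list are hypothetical:

```python
# Hypothetical shortened field list, matching the docstring example.
FIELDS = ['NAME_FIRST', 'NAME_LAST', 'ADDRESS_HOME_CITY']


def record_from_values(values, fields=FIELDS):
    """Pairs each field name with the value at the same position."""
    return dict(zip(fields, values))


record = record_from_values(u'John|Doe|Mountain View'.split('|'))
```

`zip` stops at the shorter sequence, so the field-count check still has to run first; otherwise a short line would silently yield a partial record.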

+ return out_record

+ def _Convert(self, input_file, output_file):

+ """The real conversion takes place here.

dennisjeffrey 2011/02/11 00:53:17 I think it would be more useful to say what's bein

dyu1 2011/02/16 03:17:31 Done.

+ Args:

+ input_file: dataset input file.

+ output_file: the converted dictionary list output file.

dennisjeffrey 2011/02/11 00:53:17 Since this function returns something, you need a

dyu1 2011/02/16 03:17:31 Done.

+ """

+ list_of_dict = []

+ i = 0

+ if output_file:

+ output_file.write("[")

+ output_file.write(os.linesep)

+ for line in input_file.readlines():

+ line = line.strip()

+ if not line:

+ continue

+ line = unicode(line, 'UTF-8')

+ output_record = self.CreateDictionaryFromRecord(line)

+ if output_record:

+ i += 1

+ list_of_dict.append(output_record)

+ output_line = self._output_pattern %tuple(

dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".

dyu1 2011/02/16 03:17:31 Done.

+ [output_record[key] for key in self._fields])

+ if output_file:

+ output_file.write(output_line)

+ output_file.write(os.linesep)

+ if not self._display_nothing:

+ if self._display_input_lines:

+ print "\n%d: %s" %(i, line.encode(sys.stdout.encoding, 'ignore'))

dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".

dyu1 2011/02/16 03:17:31 Done.

+ if self._display_converted_lines:

+ print "\tconverted to: %s" %output_line.encode(

dennisjeffrey 2011/02/11 00:53:17 You may want to consider using the "logging" modul

dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".

dyu1 2011/02/16 03:17:31 Done.

+ sys.stdout.encoding, 'ignore')

+ else:

+ if not self._display_input_lines and not i % 10:

+ print "\t%d lines converted so far!" %i

dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".

dennisjeffrey 2011/02/11 00:53:17 I assume all lines should be converted nearly inst

+ if output_file:

+ output_file.write("]")

+ output_file.write(os.linesep)

+ if not self._display_nothing:

+ print

+ print "%d lines converted SUCCESSFULLY!" %i

dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".

dyu1 2011/02/16 03:17:31 Done.

+ print "--- FINISHED ---"

+ print

dennisjeffrey 2011/02/11 00:53:17 Again, consider using "logging" instead of "print"

dyu1 2011/02/16 03:17:31 Done.
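Several comments in this file suggest the `logging` module in place of `print` and the `display_*` flags. The updated code is not part of this patch set, so the following is only a sketch of how the per-line output and the progress message might map onto log levels; `configure_logging`, `report_progress`, and the logger name are hypothetical:

```python
import logging


def configure_logging(verbose=False):
    """Returns the converter's logger; DEBUG when verbose, else INFO."""
    logger = logging.getLogger('dataset_converter')
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    return logger


def report_progress(logger, count, line, converted):
    """Logs per-line detail at DEBUG and a summary every ten lines at INFO."""
    logger.debug('%d: %s', count, line)
    logger.debug('converted to: %s', converted)
    if count % 10 == 0:
        logger.info('%d lines converted so far', count)
```

A script entry point would still need to call `logging.basicConfig()` once so that INFO and DEBUG records reach a handler.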

+ return list_of_dict

+ def Convert(self):

+ """Takes arguments of two file names and creates two file objects, then

dennisjeffrey 2011/02/11 00:53:17 This method actually doesn't take any parameter ar

dyu1 2011/02/16 03:17:31 Done.

+ calls _Convert() with these two file objects to do the real conversion."""

dennisjeffrey 2011/02/11 00:53:17 The first comment line should be a 1-line summary

dyu1 2011/02/16 03:17:31 Done.

+ with open(self._input_filename) as input_file:

+ if self._output_filename:

+ with codecs.open(self._output_filename, mode = 'wb',

+ encoding = 'utf-8-sig') as output_file:

dennisjeffrey 2011/02/11 00:53:17 Remove the spaces around the "=" when specifying t

dyu1 2011/02/16 03:17:31 Done.

+ return self._Convert(input_file, output_file)

+ else:

+ return self._Convert(input_file, None)

dennisjeffrey 2011/02/11 00:53:17 Should have an extra blank line here: the style gu

dyu1 2011/02/16 03:17:31 Done.
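`Convert()` opens the output with the 'utf-8-sig' encoding, which writes a UTF-8 byte-order mark at the start of the file. The sketch below shows that effect; it swaps in `io.open` for the patch's `codecs.open`, and `write_converted` plus the sample record are made up for the illustration:

```python
import codecs
import io
import os
import tempfile


def write_converted(path, lines):
    """Writes lines as UTF-8 with a leading byte-order mark."""
    with io.open(path, 'w', encoding='utf-8-sig') as output_file:
        for line in lines:
            output_file.write(line)


fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
write_converted(path, [u"[\n", u"{u'NAME_FIRST': u'Jos\u00e9',},\n", u"]\n"])

# Raw bytes start with the BOM; decoding with 'utf-8-sig' strips it again.
with io.open(path, 'rb') as raw_file:
    raw_bytes = raw_file.read()
with io.open(path, 'r', encoding='utf-8-sig') as text_file:
    text = text_file.read()
os.remove(path)
```

Any reader of the converted file therefore needs to decode with 'utf-8-sig' (or skip the BOM itself) before parsing the list.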

+def main():

+ c = DatasetConverter(r'../data/autofill/dataset.txt',

dennisjeffrey 2011/02/11 00:53:17 Is it better to hard-code the input filename and o

dyu1 2011/02/16 03:17:31 Well command-line input would be fine for the stan

dennis_jeffrey 2011/02/16 19:43:29 When this module is invoked via the PyAuto test, t

+ r'../data/autofill/dataset_duplicate-profiles.txt')

dennisjeffrey 2011/02/11 00:53:17 The second argument should line up underneath the

dyu1 2011/02/16 03:17:31 Done.

+ c.Convert()

+if __name__ == '__main__':

+ main()
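The discussion on `main()` weighs hard-coded filenames against command-line input. One way to support both, sketched here with `argparse` (Python 2.7 and later), is to keep the hard-coded paths as defaults. The `--input` and `--output` flag names are assumptions for this illustration and do not come from the patch:

```python
import argparse

# Defaults copied from the hard-coded paths in main() of the patch.
DEFAULT_INPUT = '../data/autofill/dataset.txt'
DEFAULT_OUTPUT = '../data/autofill/dataset_duplicate-profiles.txt'


def parse_args(argv=None):
    """Parses optional input/output paths, falling back to the defaults."""
    parser = argparse.ArgumentParser(
        description='Converts an Autofill profile dataset.')
    parser.add_argument('--input', default=DEFAULT_INPUT,
                        help='path of the input dataset')
    parser.add_argument('--output', default=DEFAULT_OUTPUT,
                        help='path of the converted file')
    return parser.parse_args(argv)


args = parse_args([])  # no flags given, so both defaults apply
```

When the module is imported by a test rather than run as a script, the caller can still construct `DatasetConverter` directly and bypass argument parsing.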
