Chromium Code Reviews

Side by Side Diff: chrome/test/functional/dataset_converter.py

Issue 6246147: Test Autofill's ability to merge duplicate profiles and... (Closed) Base URL: svn://chrome-svn/chrome/trunk/src/
Patch Set: '' Created 9 years, 10 months ago
1 #!/usr/bin/python
2 # Copyright (c) 2011 The Chromium Authors. All rights reserved.
3 # Use of this source code is governed by a BSD-style license that can be
4 # found in the LICENSE file.
5
6 """Takes in a dataset profiles file and outputs to a dictionary list format
dennisjeffrey 2011/02/11 00:53:17 The first line of this comment should be a 1-line
dyu1 2011/02/16 03:17:31 Done.
7 for converting Autofill profile datasets.
8
9 Used for test autofill.AutoFillTest.testMergeDuplicateProfilesInAutofill.
10 """
11
12 import re
13 import codecs
14 import sys
15 import os
dennisjeffrey 2011/02/11 00:53:17 These should be specified in alphabetical order.
dyu1 2011/02/16 03:17:31 Done.
16
17
18 class DatasetConverter(object):
19 def __init__(self, input_filename, output_filename = None,
20 display_nothing = True, display_input_lines = False,
21 display_converted_lines = False):
dennisjeffrey 2011/02/11 00:53:17 Don't put spaces around the "=" when you're defini
dennisjeffrey 2011/02/11 00:53:17 Using the "logging" module with different verbosit
dyu1 2011/02/16 03:17:31 Done.
22 """Constructs a dataset converter object.
23
24 Full input pattern:
25 '(?P<NAME_FIRST>.*?)\|(?P<NAME_MIDDLE>.*?)\|(?P<NAME_LAST>.*?)\|
26 (?P<EMAIL_ADDRESS>.*?)\|(?P<COMPANY_NAME>.*?)\|(?P<ADDRESS_HOME_LINE1>.*?)
27 \|(?P<ADDRESS_HOME_LINE2>.*?)\|(?P<ADDRESS_HOME_CITY>.*?)\|
28 (?P<ADDRESS_HOME_STATE>.*?)\|(?P<ADDRESS_HOME_ZIP>.*?)\|
29 (?P<ADDRESS_HOME_COUNTRY>.*?)\|
30 (?P<PHONE_HOME_WHOLE_NUMBER>.*?)\|(?P<PHONE_FAX_WHOLE_NUMBER>.*?)$'
31
32 Full output pattern:
33 "{u'NAME_FIRST': u'%s', u'NAME_MIDDLE': u'%s', u'NAME_LAST': u'%s',
34 u'EMAIL_ADDRESS': u'%s', u'COMPANY_NAME': u'%s', u'ADDRESS_HOME_LINE1':
35 u'%s', u'ADDRESS_HOME_LINE2': u'%s', u'ADDRESS_HOME_CITY': u'%s',
36 u'ADDRESS_HOME_STATE': u'%s', u'ADDRESS_HOME_ZIP': u'%s',
37 u'ADDRESS_HOME_COUNTRY': u'%s', u'PHONE_HOME_WHOLE_NUMBER': u'%s',
38 u'PHONE_FAX_WHOLE_NUMBER': u'%s',},"
39
40 The pattern is a regular expression which has named parenthesis groups
Nirnimesh 2011/02/11 19:39:54 I think the input/output pattern above is illustra
dyu1 2011/02/16 03:17:31 Done.
41 like this (?P<name>...) in order to match the '|' separated fields.
42 If we had only the NAME_FIRST and NAME_MIDDLE fields (e.g. 'Jared|JV'), our
43 pattern would be: "(?P<NAME_FIRST>.*?)\|(?P<NAME_MIDDLE>.*?)$"
44
45 This means that '(?P<NAME_FIRST> regexp)\|' matches whatever regular
46 expression is inside the parentheses, and indicates the start and end of a
47 group; the contents of a group can be retrieved after a match has been
48 performed using the symbolic group name 'NAME_FIRST'.
49
50 The regexp is '.*?'. The '.*' part matches 0 or more repetitions of any
51 character. The following '?' makes the regexp non-greedy, meaning it will
52 stop at the first occurrence of the '|' character (escaped in the pattern).
53
54 For '(?P<NAME_MIDDLE>.*?)$' there is no '|' at the end, so we have '$' to
55 indicate the end of the line.
56
57 The full pattern is constructed once from the FIELDS list.
58
59 The out_line_pattern for one field: "{u'NAME_FIRST': u'%s',"
60 is ready to accept the value for the 'NAME_FIRST' field once it is extracted
61 from an input line using the above group pattern.
62
63 'pattern' is used in CreateDictionaryFromRecord(line) to construct and
64 return a dictionary from a line.
65
66 'out_line_pattern' is used in 'convert()' to construct the final dataset
67 line that will be printed to the output file.
68
69 Args:
70 input_filename: name and path of the input dataset.
71 output_filename: name and path of the converted file, default is None.
72 display_nothing: output display on the screen, default is True.
73 display_input_lines: output display of the input file, default is False.
74 display_converted_lines: output display of the converted file,
75 default is False.
76 """
77 self._fields = [
78 u'NAME_FIRST',
79 u'NAME_MIDDLE',
80 u'NAME_LAST',
81 u'EMAIL_ADDRESS',
82 u'COMPANY_NAME',
83 u'ADDRESS_HOME_LINE1',
84 u'ADDRESS_HOME_LINE2',
85 u'ADDRESS_HOME_CITY',
86 u'ADDRESS_HOME_STATE',
87 u'ADDRESS_HOME_ZIP',
88 u'ADDRESS_HOME_COUNTRY',
89 u'PHONE_HOME_WHOLE_NUMBER',
90 u'PHONE_FAX_WHOLE_NUMBER',
91 ]
dennisjeffrey 2011/02/11 00:53:17 Since _fields is just a constant array, would it b
dyu1 2011/02/16 03:17:31 Done.
92 self._output_pattern = u"{"
Nirnimesh 2011/02/11 19:39:54 prefer single quote char '
dyu1 2011/02/16 03:17:31 Done.
93 for key in self._fields:
94 self._output_pattern += u"u'%s': u'%s', " %(key, "%s")
dennisjeffrey 2011/02/11 00:53:17 I think this could be re-written like this: self.
dyu1 2011/02/16 03:17:31 Done.
95 self._output_pattern = self._output_pattern[:-1] + "},\n"
96
97 self._input_filename = input_filename
dennisjeffrey 2011/02/11 00:53:17 We should probably check to ensure that input_file
dyu1 2011/02/16 03:17:31 Done.
98 self._output_filename = output_filename
99 self._display_nothing = display_nothing
100 self._display_input_lines = display_input_lines
101 self._display_converted_lines = display_converted_lines
102 self._record_length = len(self._fields)
dennisjeffrey 2011/02/11 00:53:17 Perhaps we could remove this variable and just rep
dyu1 2011/02/16 03:17:31 Done.
103
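Note on the pattern docstring above: both patterns it describes could be generated from the field list instead of being written out by hand. The following is only a sketch under stated assumptions. It is written for Python 3, `build_input_pattern` and `build_output_pattern` are hypothetical helper names that do not appear in the patch, and the patch itself splits each line on '|' rather than matching a regex. The output format here also omits the trailing comma before the closing brace that the docstring example shows.

```python
import re

# Same field order as the _fields list in the patch.
FIELDS = [
    'NAME_FIRST', 'NAME_MIDDLE', 'NAME_LAST', 'EMAIL_ADDRESS',
    'COMPANY_NAME', 'ADDRESS_HOME_LINE1', 'ADDRESS_HOME_LINE2',
    'ADDRESS_HOME_CITY', 'ADDRESS_HOME_STATE', 'ADDRESS_HOME_ZIP',
    'ADDRESS_HOME_COUNTRY', 'PHONE_HOME_WHOLE_NUMBER',
    'PHONE_FAX_WHOLE_NUMBER',
]


def build_input_pattern(fields):
    """Returns a compiled regex with one non-greedy named group per field."""
    groups = ['(?P<%s>.*?)' % name for name in fields]
    # Groups are separated by an escaped '|'; '$' anchors the last group.
    return re.compile(r'\|'.join(groups) + '$', re.UNICODE)


def build_output_pattern(fields):
    """Returns a format string with one '%s' slot per field value."""
    pairs = ["u'%s': u'%%s'" % name for name in fields]
    return '{' + ', '.join(pairs) + '},'


# The two-field example from the docstring.
two_field_pattern = build_input_pattern(['NAME_FIRST', 'NAME_MIDDLE'])
match = two_field_pattern.match('Jared|JV')
print(match.group('NAME_FIRST'), match.group('NAME_MIDDLE'))
print(build_output_pattern(['NAME_FIRST']) % ('Jared',))
```

Generating both strings from one list keeps the group names and the dictionary keys from drifting apart, which is the kind of mismatch a hand-written pattern invites.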
104 def CreateDictionaryFromRecord(self, line):
dennisjeffrey 2011/02/11 00:53:17 If this function is only used by the _Convert() fu
dyu1 2011/02/16 03:17:31 Done.
105 """Constructs and returns a dictionary from a record in the dataset file.
106 Escapes single quotation first and uses split('|') to separate values.
dennisjeffrey 2011/02/11 00:53:17 This first line of the comment should be a 1-line
dyu1 2011/02/16 03:17:31 Done.
107
108 Example:
109 Takes a string argument such as u'John|Doe|Mountain View'
110 and returns a dictionary
111 {
112 u'NAME_FIRST': u'John',
113 u'NAME_LAST': u'Doe',
114 u'ADDRESS_HOME_CITY': u'Mountain View',
115 }
116
117 Arg:
dennisjeffrey 2011/02/11 00:53:17 "Arg" --> "Args" (I think it should be "Args" eve
dyu1 2011/02/16 03:17:31 Done.
118 line: row of record from the dataset file.
dennisjeffrey 2011/02/11 00:53:17 Since this method returns something, you should ha
dyu1 2011/02/16 03:17:31 Done.
119 """
120 # Ignore irrelevant record lines such as comment lines.
dennisjeffrey 2011/02/11 00:53:17 Besides comment lines, what other lines are consid
dyu1 2011/02/16 03:17:31 Done.
121 if not '|' in line:
dennisjeffrey 2011/02/11 00:53:17 What if a comment contains a "|" character? Then
dyu1 2011/02/16 03:17:31 No, I have a check in place (line 129) where it ch
dennis_jeffrey 2011/02/16 19:43:29 Oh, ok. I didn't realize that each line is expect
122 return
dennisjeffrey 2011/02/11 00:53:17 Is it possible to have a valid line that does not
dyu1 2011/02/16 03:17:31 Well the dataset given to me is in the following f
dennis_jeffrey 2011/02/16 19:43:29 Ok, I see. I was thinking that in general, a reco
123 re_pattern = re.compile("'", re.UNICODE)
124 line = re_pattern.sub(r"\'", line)
dennisjeffrey 2011/02/11 00:53:17 You might want to add a comment to describe what y
dyu1 2011/02/16 03:17:31 Done.
dennis_jeffrey 2011/02/16 19:43:29 Oops, sorry - Now that I see your comment, I reali
125
126 line_list = line.split('|')
127 if line_list:
128 # Check for case when a line may have more or less fields than expected.
129 if len(line_list) != self._record_length:
130 print >> sys.stderr, "Error: a '|' separated line has %d fields \
131 instead of %d" % (len(line_list), self._record_length)
132 print >> sys.stderr, "\t%s" % line
133 return
dennisjeffrey 2011/02/11 00:53:17 How about raising an exception rather than just re
dyu1 2011/02/16 03:17:31 Done for logging. If I raise an exception here th
dennis_jeffrey 2011/02/16 19:43:29 Ok, I think a logging.warning like what you do now
134 out_record = {}
135 i = 0
136 for key in self._fields:
137 out_record[key] = line_list[i]
138 i += 1
dennisjeffrey 2011/02/11 00:53:17 It looks like here, you're assuming that the order
dyu1 2011/02/16 03:17:31 Yes, since the order of the keys from the order in
139 return out_record
140
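Note on the method above: the index counter that pairs keys with values could be replaced by zip, and the stderr prints by a logging.warning as discussed in the thread. This is a hedged sketch for Python 3, not the code that landed. `record_to_dict` is a hypothetical name, and the three-entry field list is shortened to match the docstring example, whereas the patch uses 13 fields.

```python
import logging

# Shortened field list, for illustration only.
FIELDS = ['NAME_FIRST', 'NAME_LAST', 'ADDRESS_HOME_CITY']


def record_to_dict(line, fields=FIELDS):
    """Returns a dict mapping field names to values, or None if malformed."""
    if '|' not in line:
        return None  # Not a record line (for example, a comment).
    # Escape single quotes so values can sit inside u'...' literals.
    values = line.replace("'", "\\'").split('|')
    if len(values) != len(fields):
        logging.warning("Line has %d '|'-separated fields instead of %d: %s",
                        len(values), len(fields), line)
        return None
    # zip pairs each field name with the value in the same position.
    return dict(zip(fields, values))


print(record_to_dict('John|Doe|Mountain View'))
```

Because zip relies on position, this still assumes the dataset columns appear in the same order as the field list, the same assumption raised in the thread above.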
141 def _Convert(self, input_file, output_file):
142 """The real conversion takes place here.
dennisjeffrey 2011/02/11 00:53:17 I think it would be more useful to say what's bein
dyu1 2011/02/16 03:17:31 Done.
143
144 Args:
145 input_file: dataset input file.
146 output_file: the converted dictionary list output file.
dennisjeffrey 2011/02/11 00:53:17 Since this function returns something, you need a
dyu1 2011/02/16 03:17:31 Done.
147 """
148 list_of_dict = []
149 i = 0
150 if output_file:
151 output_file.write("[")
152 output_file.write(os.linesep)
153 for line in input_file.readlines():
154 line = line.strip()
155 if not line:
156 continue
157 line = unicode(line, 'UTF-8')
158 output_record = self.CreateDictionaryFromRecord(line)
159 if output_record:
160 i += 1
161 list_of_dict.append(output_record)
162 output_line = self._output_pattern %tuple(
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
163 [output_record[key] for key in self._fields])
164 if output_file:
165 output_file.write(output_line)
166 output_file.write(os.linesep)
167 if not self._display_nothing:
168 if self._display_input_lines:
169 print "\n%d: %s" %(i, line.encode(sys.stdout.encoding, 'ignore'))
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
170 if self._display_converted_lines:
171 print "\tconverted to: %s" %output_line.encode(
dennisjeffrey 2011/02/11 00:53:17 You may want to consider using the "logging" modul
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
172 sys.stdout.encoding, 'ignore')
173 else:
174 if not self._display_input_lines and not i % 10:
175 print "\t%d lines converted so far!" %i
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dennisjeffrey 2011/02/11 00:53:17 I assume all lines should be converted nearly inst
176 if output_file:
177 output_file.write("]")
178 output_file.write(os.linesep)
179 if not self._display_nothing:
180 print
181 print "%d lines converted SUCCESSFULLY!" %i
dennisjeffrey 2011/02/11 00:53:17 Put a space after the "%".
dyu1 2011/02/16 03:17:31 Done.
182 print "--- FINISHED ---"
183 print
dennisjeffrey 2011/02/11 00:53:17 Again, consider using "logging" instead of "print"
dyu1 2011/02/16 03:17:31 Done.
184 return list_of_dict
185
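Note on the logging suggestion in the threads above: the three display_* flags could collapse into a single logging level chosen by the caller. The sketch below assumes Python 3 and uses a hypothetical `convert_lines` helper with a caller-supplied `convert_one` callback; neither name is part of the patch, and file writing is left out to keep the example short.

```python
import logging


def convert_lines(lines, convert_one):
    """Converts each line, reporting progress through the logging module."""
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        result = convert_one(line)
        if result is None:
            continue
        converted.append(result)
        # Emitted only at DEBUG verbosity; stands in for the
        # display_input_lines and display_converted_lines flags.
        logging.debug('%d: %s', len(converted), line)
        logging.debug('\tconverted to: %s', result)
    # Emitted at INFO verbosity; stands in for the final summary print.
    logging.info('%d lines converted successfully', len(converted))
    return converted


logging.basicConfig(level=logging.INFO)
records = convert_lines(['a|b', '', '# comment'],
                        lambda s: s.split('|') if '|' in s else None)
print(records)
```

Setting the level to logging.WARNING would silence everything except malformed-line warnings, which roughly corresponds to display_nothing=True.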
186 def Convert(self):
187 """Takes arguments of two file names and creates two file objects, then
dennisjeffrey 2011/02/11 00:53:17 This method actually doesn't take any parameter ar
dyu1 2011/02/16 03:17:31 Done.
188 calls _Convert() with these two file objects to do the real conversion."""
dennisjeffrey 2011/02/11 00:53:17 The first comment line should be a 1-line summary
dyu1 2011/02/16 03:17:31 Done.
189 with open(self._input_filename) as input_file:
190 if self._output_filename:
191 with codecs.open(self._output_filename, mode = 'wb',
192 encoding = 'utf-8-sig') as output_file:
dennisjeffrey 2011/02/11 00:53:17 Remove the spaces around the "=" when specifying t
dyu1 2011/02/16 03:17:31 Done.
193 return self._Convert(input_file, output_file)
194 else:
195 return self._Convert(input_file, None)
196
dennisjeffrey 2011/02/11 00:53:17 Should have an extra blank line here: the style gu
dyu1 2011/02/16 03:17:31 Done.
197 def main():
198 c = DatasetConverter(r'../data/autofill/dataset.txt',
dennisjeffrey 2011/02/11 00:53:17 Is it better to hard-code the input filename and o
dyu1 2011/02/16 03:17:31 Well command-line input would be fine for the stan
dennis_jeffrey 2011/02/16 19:43:29 When this module is invoked via the PyAuto test, t
199 r'../data/autofill/dataset_duplicate-profiles.txt')
dennisjeffrey 2011/02/11 00:53:17 The second argument should line up underneath the
dyu1 2011/02/16 03:17:31 Done.
200 c.Convert()
201
202 if __name__ == '__main__':
203 main()
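Note on the hard-coded paths discussed in the thread above: one possible middle ground is to accept optional command-line arguments and fall back to the current paths when none are given. This is a sketch only, assuming Python 2.7 or later for argparse. The `--input` and `--output` flag names and the `parse_args` helper are hypothetical and do not come from the patch.

```python
import argparse

# Defaults copied from main() in the patch.
DEFAULT_INPUT = '../data/autofill/dataset.txt'
DEFAULT_OUTPUT = '../data/autofill/dataset_duplicate-profiles.txt'


def parse_args(argv=None):
    """Parses optional file names, falling back to the hard-coded paths."""
    parser = argparse.ArgumentParser(
        description='Convert an Autofill profile dataset.')
    # Flag names are hypothetical, chosen for this sketch.
    parser.add_argument('--input', default=DEFAULT_INPUT,
                        help='path of the input dataset')
    parser.add_argument('--output', default=DEFAULT_OUTPUT,
                        help='path of the converted file')
    return parser.parse_args(argv)


# An empty argument list yields the defaults.
args = parse_args([])
print(args.input, args.output)
```

A standalone run could then pass `args.input` and `args.output` to the DatasetConverter constructor, while a test that imports the module can keep constructing the converter directly.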