tools/findit/crash_utils.py - Issue 430943003: [Findit] Plain objects to represent and parse stack trace.

Side by Side Diff: tools/findit/crash_utils.py

Issue 430943003: [Findit] Plain objects to represent and parse stack trace. (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Addressed code review and changed the stacktrace parsing logic to correctly look at the type of cal… Created 6 years, 4 months ago


OLD	NEW
(Empty)
	1 # Copyright (c) 2014 The Chromium Authors. All rights reserved.
	2 # Use of this source code is governed by a BSD-style license that can be
	3 # found in the LICENSE file.
	4
	5 import cgi
	6 import json
	7 import time
	8 import urllib
	9
	10

	11 def NormalizePathLinux(path):
	12   """Normalizes linux path.
	13
	14   Args:
	15     path: A string representing a path.
	16
	17   Returns:
	18     A tuple containing a component this path is in (e.g. blink, skia, etc.)
	19     and a path in that component's repository.
	20   """
	21   normalized_path = path
	22   # TODO(jeun): Integrate with parsing DEPS file.
	23   if 'WebKit/' in path:
	24     component = 'blink'
	25     normalized_path = ''.join(path.split('WebKit/')[1:])
	26   else:
	27     component = 'chromium'
	28
	29   if normalized_path.startswith(
	aarya 2014/08/07 15:38:47: Look for the first src/ in path. Build locations are very different for different job types.
	jeun 2014/08/07 18:43:19: Made it so that it would look for /build/.
	30       '/b/build/slave/ASAN_Release__symbolized_/build/'):
	31     normalized_path = normalized_path.split(
	32         '/b/build/slave/ASAN_Release__symbolized_/build/')[1]
	33
	34   if '../../' in normalized_path:
	aarya 2014/08/07 15:38:47: This is bad and hacky (how about three ../../../). Try using abs_path to see if it gets rid of the ..; this should be the first step in normalization.
	jeun 2014/08/07 18:43:19: Done.
	35     normalized_path = normalized_path.split('../../')[1]
	36
	37   if 'src/v8/' in normalized_path:
	38     component = 'v8'
	39     normalized_path = normalized_path.split('src/v8/')[1]
	40
	41   if './' in normalized_path:
	aarya 2014/08/07 15:38:48: Why this? No comments in code? Why?
	jeun 2014/08/07 18:43:20: Removed and used abspath.
	42     normalized_path = normalized_path.split('./')[1]
	43
	44   if not normalized_path.startswith('src/') and (
	45       not normalized_path.startswith('Source/')):
	46     normalized_path = 'src/' + normalized_path
	47
	48   return (component, normalized_path)

	49
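A minimal sketch of the direction discussed in the comments on lines 29 to 42: collapse `./` and `../` segments as the first step, then keep everything from the first `src/` onward. It targets Python 3 and uses `posixpath.normpath` instead of the `abspath` call mentioned in the reply, so the result does not depend on the current working directory. The function name and the sample paths are hypothetical, component detection is left out, and the follow-up patch set may look different.

```python
import posixpath


def normalize_path_sketch(path):
  """Hypothetical helper; not the code from the follow-up patch set."""
  # Collapse './' and any number of '../' segments before anything else.
  normalized_path = posixpath.normpath(path)

  # Keep everything from the first 'src/' onward, so the result does not
  # depend on the build location used by a particular job type.
  index = normalized_path.find('src/')
  if index != -1:
    normalized_path = normalized_path[index:]
  return normalized_path


print(normalize_path_sketch(
    '/b/build/slave/ASAN_Release__symbolized_/build/src/out/Release/'
    '../../base/logging.cc'))
```

A plain substring search for `src/` would also match a directory such as `mysrc/`, so a real implementation would probably want to match on whole path components.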

	50

	51 def SplitRange(regression):
	52   """Splits a range as retrieved from clusterfuzz.
	53
	54   Args:
	55     regression: A string in format 'r1234:r5678'.
	56
	57   Returns:
	58     A list containing two revision numbers as strings, for example
	59     ['1234','5678'].
	60   """
	61   revisions = regression.split(':')
	62
	63   # If regression information is not available, return None.
	64   if len(revisions) != 2:
	65     return None
	66
	67   start_range = revisions[0]
	68   end_range = revisions[1]
	69
	70   # Check if the range starts with r, such as in 'r10000' format.
	71   if start_range.startswith('r'):
	aarya 2014/08/07 15:38:47: Use lstrip and then don't need an if.
	jeun 2014/08/07 18:43:20: Done.
	72     start_range = start_range[1:]
	73   if end_range.startswith('r'):
	74     end_range = end_range[1:]
	75
	76   return [start_range, end_range]

	77
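The `lstrip` suggestion on line 71 could look roughly like the sketch below. The function name is hypothetical. Note that `lstrip('r')` removes every leading `r`, not just one, which makes no difference for inputs of the documented form `'r1234:r5678'`.

```python
def split_range_sketch(regression):
  """Hypothetical rewrite of SplitRange using lstrip, as the review suggests."""
  revisions = regression.split(':')

  # If regression information is not available, return None.
  if len(revisions) != 2:
    return None

  # lstrip is a no-op when there is no leading 'r', so no if is needed.
  return [revision.lstrip('r') for revision in revisions]


print(split_range_sketch('r1234:r5678'))
```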

	78

	79 def LoadJSON(json_string):
	80   """Loads json object from string, or None.
	81
	82   Args:
	83     json_string: A string to get object from.
	84
	85   Returns:
	86     JSON object if the string represents a JSON object, None otherwise.
	87   """
	88   try:
	89     data = json.loads(json_string)
	90   except ValueError:
	91     data = None
	92   return data
	aarya 2014/08/07 15:38:47: new line before this line.
	jeun 2014/08/07 18:43:20: Done.
	93

	94

	95 def GetDataFromURL(url, retries=10, sleep_time=0.1):
	96   """Retrieves raw data from URL, tries 10 times.
	97
	98   Args:
	99     url: URL to get data from.
	100     retries: Number of times to retry connection.
	101     sleep_time: Time in seconds to wait before retrying connection.
	102
	103   Returns:
	104     None if the data retrieval fails, or the raw data.
	105   """
	106   data = None
	107   for i in range(retries):
	108     # Retrieves data from URL.
	109     try:
	110       data = urllib.urlopen(url)
	111
	112       # If retrieval is successful, break from the retry loop.
	113       if data:
	114         break
	aarya 2014/08/07 15:38:47: Why not just return data.read() then don't need 121-124 and just return None there.
	jeun 2014/08/07 18:43:20: Done.
	115
	116     # If retrieval fails, retry after sleep_time seconds.
	117     except IOError:
	118       time.sleep(sleep_time)
	119       continue
	120
	121   # If returned data has something in it, return the content.
	122   if data:
	123     return data.read()
	124   else:
	125     return None

	126
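A sketch of the simplification suggested on line 114: return `data.read()` as soon as a request succeeds and fall through to `return None` once the retries are exhausted. It assumes Python 3, where `urllib.request.urlopen` replaces the Python 2 `urllib.urlopen` used in the patch. The function name and the `opener` parameter are hypothetical; the parameter exists only so the retry logic can be exercised without network access.

```python
import time
import urllib.request


def get_data_from_url_sketch(url, retries=10, sleep_time=0.1,
                             opener=urllib.request.urlopen):
  """Hypothetical rewrite; returns the raw data, or None if every try fails."""
  for _ in range(retries):
    try:
      # On success, return the content right away.
      return opener(url).read()
    except IOError:
      # On failure, wait sleep_time seconds and try again.
      time.sleep(sleep_time)

  # Every attempt failed.
  return None
```

In Python 3, `urllib.error.URLError` is a subclass of `OSError`, and `IOError` is an alias of `OSError`, so the `except IOError` clause still catches connection failures.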

	127

	128 def FindMinLineDistance(crashed_line_list, changed_line_numbers):
	129   """Calculates how far the changed line is from one of the crashes.
	130
	131   Finds the minimum distance between the lines that the file crashed on
	132   and the lines that the file changed. For example, if the file crashed on
	133   line 200 and the CL changes lines 203, 204 and 205, the function returns 3.
	134
	135   Args:
	136     crashed_line_list: A list of lines that the file crashed on.
	137     changed_line_numbers: A list of lines that the file changed.
	138
	139   Returns:
	140     The minimum distance. If either of the input lists is empty,
	141     it returns inf.
	142
	143   """
	144   min_distance = float('inf')
	145
	146   for line in crashed_line_list:
	147     for distance in changed_line_numbers:
	148       # Find the current distance and update the min if current distance is
	149       # less than current min.
	150       current_distance = abs(line - distance)
	151       if current_distance < min_distance:
	152         min_distance = current_distance
	153
	154   return min_distance

	155
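For reference, the docstring example (crash on line 200, CL changing lines 203, 204 and 205) can be reproduced with a compact equivalent of the nested loops. This is a sketch for Python 3.4 or later, where `min` accepts a `default` argument; the names `INFINITY` and `find_min_line_distance_sketch` are hypothetical.

```python
INFINITY = float('inf')  # Hypothetical module-level constant.


def find_min_line_distance_sketch(crashed_line_list, changed_line_numbers):
  """Returns the smallest |crashed - changed| distance, or INFINITY."""
  return min(
      (abs(crashed_line - changed_line)
       for crashed_line in crashed_line_list
       for changed_line in changed_line_numbers),
      default=INFINITY)


print(find_min_line_distance_sketch([200], [203, 204, 205]))
```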

	156

	157 def GuessIfSamePath(path1, path2):
	aarya 2014/08/07 15:38:47: SamePath naming is wrong, this should be like SameSubPath.
	jeun 2014/08/07 18:43:20: Done.
	158   """Guesses if two paths represent the same path.
	159
	160   Compares the names of the folders in the paths (by split('/')), and checks
	161   if at least 3 of them match, or all of them if the shorter path has fewer
	162   than 3 components.
	163   Args:
	164     path1: First path.
	165     path2: Second path to compare.
	166
	167   Returns:
	168     True if they are thought to be the same path, False otherwise.
	169   """
	170   path1 = path1.split('/')
	171   path2 = path2.split('/')
	172
	173   intersection = set(path1).intersection(set(path2))
	174   return len(intersection) >= (min(3, min(len(path1), len(path2))))

	175
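A small worked example of the heuristic on line 174, using the name proposed in the review as a hypothetical function name. Because the comparison is made on sets of path components, the order of the components is ignored, which the last call illustrates.

```python
def guess_if_same_sub_path(path1, path2):
  """Hypothetical copy of the heuristic above, under the suggested name."""
  components1 = path1.split('/')
  components2 = path2.split('/')

  shared = set(components1).intersection(components2)
  # Require 3 shared names, or fewer if the shorter path has fewer components.
  return len(shared) >= min(3, len(components1), len(components2))


# Three shared components: 'src', 'base' and 'logging.cc'.
print(guess_if_same_sub_path('src/base/logging.cc',
                             'a/b/src/base/logging.cc'))
# The same components in reverse order also count as a match.
print(guess_if_same_sub_path('x/y/z', 'z/y/x'))
```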

	176

	177 def FindMinStackFrameNum(stack_frame_index, priorities):
	aarya 2014/08/07 15:38:47: s/Num/Number.
	jeun 2014/08/07 18:43:20: Done.
	178   """Finds the minimum stack number, from the list of stack numbers.
	179
	180   Args:
	181     stack_frame_index: A list of lists containing stack positions.
	aarya 2014/08/07 15:38:48: list of list ?
	jeun 2014/08/07 18:43:20: yes, each sublist is a list of stack frames in one match.
	182     priorities: A list of priorities, one for each file.
	183
	184   Returns:
	185     Inf if stack_frame_index is empty, minimum stack number otherwise.
	186   """
	187   # Get the indexes of the highest priority (or low priority number)
	188   highest_priority = min(priorities)
	189   highest_priority_indices = []
	190   for i in range(len(priorities)):
	191     if priorities[i] == highest_priority:
	192       highest_priority_indices.append(i)
	193
	194   # Gather the list of stack frame numbers for the files that change the
	195   # crash lines.
	196   flattened = []
	197   for i in highest_priority_indices:
	198     flattened += stack_frame_index[i]
	199
	200   # If no stack frame information is available, return inf. Else, return min.
	201   if not flattened:
	202     return float('inf')
	aarya 2014/08/07 15:38:47: define float('inf') in a global var and use throughout code with a better name.
	jeun 2014/08/07 18:43:20: Done.
	203   else:
	204     return min(flattened)

	205
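A sketch that combines the two suggestions on this function: the `Number` spelling and a module-level constant in place of the inline `float('inf')`. The constant name `INFINITY` and the function name are hypothetical; the names chosen in the follow-up patch set are not shown here.

```python
INFINITY = float('inf')  # Hypothetical name for the shared constant.


def find_min_stack_frame_number_sketch(stack_frame_index, priorities):
  """Returns the lowest frame number among the highest-priority matches."""
  # A lower number means a higher priority.
  highest_priority = min(priorities)

  # Flatten the frame lists that belong to the highest-priority entries.
  flattened = [
      frame
      for frames, priority in zip(stack_frame_index, priorities)
      if priority == highest_priority
      for frame in frames]

  return min(flattened) if flattened else INFINITY


# Entries 0 and 2 share the highest priority (1); their frames are 3, 5, 0.
print(find_min_stack_frame_number_sketch([[3, 5], [1], [0]], [1, 2, 1]))
```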

	206

	207 def AddHyperlink(to_add, link):
	208   """Returns a string with HTML link tag.
	209
	210   Args:
	211     to_add: A string to add link.
	212     link: A link to add to the string.
	213
	214   Returns:
	215     A string with hyperlink added.
	216   """
	217   sanitized_link = cgi.escape(link)
	218   return '<a href="%s">%s</a>' % (sanitized_link, to_add)
	Martin Barbella 2014/08/07 15:48:41: This is still potentially unsafe. You need to sanitize both of these, and the link needs to use quote=True.
	jeun 2014/08/07 18:43:19: Done.
	219
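The fix requested on line 218, escaping both the link and the link text with quotes included, might look like the sketch below. It assumes Python 3 and therefore uses `html.escape`, since `cgi.escape` was removed in Python 3.8. The function name is hypothetical.

```python
import html


def add_hyperlink_sketch(to_add, link):
  """Hypothetical rewrite that escapes both the link and its text."""
  # quote=True also escapes double and single quotes, which matters because
  # the link is placed inside a quoted href attribute.
  sanitized_link = html.escape(link, quote=True)
  sanitized_text = html.escape(to_add, quote=True)
  return '<a href="%s">%s</a>' % (sanitized_link, sanitized_text)


print(add_hyperlink_sketch('<b>file.cc</b>', 'http://a/?x="1"&y=2'))
```

Escaping prevents markup injection but does not validate the URL scheme, so a `javascript:` link would still pass through unchanged.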

	220

	221 def PrettifyList(l):
	222   """Returns a string representation of a list.
	223
	224   It adds comma in between the elements and removes the brackets.
	225   Args:
	226     l: A list to prettify.
	227   Returns:
	228     A string representation of the list.
	229   """
	230   return str(l)[1:-1]

	231

	232

	233 def PrettifyFiles(file_list):
	234   """Returns a string representation of a list of file names.
	235
	236   Args:
	237     file_list: A list of tuple, (file_name, file_url).
	238   Returns:
	239     A string representation of file names with their urls.
	240   """
	241   ret = ['\n']
	242   for file_name, file_url in file_list:
	243     ret.append(' %s\n' % AddHyperlink(file_name, file_url))
	244   return ''.join(ret)

	245

	246

	247 def Intersection(crashed_line_list, stack_frame_index, changed_line_numbers):
	248   """Finds the overlap between changed lines and crashed lines.
	249
	250   Finds the intersection of the lines that caused the crash and
	251   lines that the file changes. The intersection looks within 3 lines
	252   of the line that caused the crash.
	253
	254   Args:
	255     crashed_line_list: A list of lines that the file crashed on.
	256     stack_frame_index: A list of positions in stack for each of the lines.
	257     changed_line_numbers: A list of lines that the file changed.
	258
	259   Returns:
	260     line_intersection: Intersection between crashed_line_list and
	261       changed_line_numbers.
	262     stack_frame_index_intersection: Stack number for each of the intersections.
	263   """
	264   line_intersection = []
	265   stack_frame_index_intersection = []
	266
	267   # Iterate through the crashed lines, and their occurrence in stack.
	268   for (line, stack_frame_index) in zip(crashed_line_list, stack_frame_index):
	269
	270     # Also check previous 3 lines.
	271     line_minus_n = range(line - 3, line + 1)
	aarya 2014/08/07 15:38:47: Also, you shouldn't go backward, just forward by 5 lines. Any line change can't affect anything before it. Give me a scenario.
	jeun 2014/08/07 18:43:19: It is backward by 5 lines from the crashed line, instead of forward 5 lines from the changed lines. So I think they are the same?
	aarya 2014/08/07 15:38:47: Don't hardcode numbers in code, put them in a global at the start of the file or in a configuration number. Also this number should be like 5.
	jeun 2014/08/07 18:43:20: Done.
	272
	273     for changed_line in changed_line_numbers:
	274
	275       # If a CL does not change crashed line, check next line.
	276       if changed_line not in line_minus_n:
	277         continue
	278
	279       # If the changed line is exactly the crashed line, add that line.
	280       if line in changed_line_numbers:
	281         to_add = line
	282
	283       # If the changed line is within 3 lines of the crashed line, add the line.
	284       else:
	285         to_add = changed_line
	aarya 2014/08/07 15:38:47: s/to_add/pick better meaningful name.
	jeun 2014/08/07 18:43:20: Done.
	286
	287       # Avoid adding the same line twice.
	288       if to_add not in line_intersection:
	289         line_intersection.append(to_add)
	290         stack_frame_index_intersection.append(stack_frame_index)
	291
	292       break
	293
	294   return (line_intersection, stack_frame_index_intersection)
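A sketch of how the comments on lines 271 and 285 could be addressed together: the window size moves into a module-level constant (set to 5, the value the reviewer asks for), `to_add` gets a more descriptive name, and the loop variable no longer shadows the `stack_frame_index` parameter. All names here are hypothetical. The matching logic is intended to mirror the loop above, including the backward-looking window from the crashed line described in the reply.

```python
# Hypothetical constant: how many lines before a crashed line still count.
MAX_LINE_DISTANCE = 5


def intersection_sketch(crashed_line_list, stack_frame_index,
                        changed_line_numbers):
  """Returns (matched lines, stack frame of each match)."""
  line_intersection = []
  stack_frame_index_intersection = []

  for crashed_line, frame in zip(crashed_line_list, stack_frame_index):
    # Lines from crashed_line - MAX_LINE_DISTANCE up to crashed_line itself.
    window = range(crashed_line - MAX_LINE_DISTANCE, crashed_line + 1)

    if crashed_line in changed_line_numbers:
      # The CL changes exactly the crashed line.
      matched_line = crashed_line
    else:
      # Otherwise take the first changed line inside the window, if any.
      matched_line = next(
          (changed for changed in changed_line_numbers if changed in window),
          None)

    # Skip misses and avoid adding the same line twice.
    if matched_line is not None and matched_line not in line_intersection:
      line_intersection.append(matched_line)
      stack_frame_index_intersection.append(frame)

  return (line_intersection, stack_frame_index_intersection)


# Line 197 is within 5 lines before crashed line 200; nothing is near 300.
print(intersection_sketch([200, 300], [0, 2], [197, 310]))
```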
