tools/chrome_proxy/webdriver/common.py - Issue 2572383003: Add domain checking to returned HTTPResponses

Side by Side Diff: tools/chrome_proxy/webdriver/common.py

Issue 2572383003: Add domain checking to returned HTTPResponses (Closed)

Patch Set: Created 4 years ago

```diff
@@ -1,25 +1,26 @@
 # Copyright 2016 The Chromium Authors. All rights reserved.
 # Use of this source code is governed by a BSD-style license that can be
 # found in the LICENSE file.
 
 import argparse
 import json
 import logging
 import os
 import re
 import socket
 import shlex
 import sys
 import time
 import traceback
 import unittest
+import urlparse
 
 sys.path.append(os.path.join(os.path.dirname(__file__), os.pardir, os.pardir,
     os.pardir, 'third_party', 'webdriver', 'pylib'))
 from selenium import webdriver
 from selenium.webdriver.chrome.options import Options
 
 def ParseFlags():
   """Parses the given command line arguments.
 
   Returns:
```
(...skipping 324 matching lines...)
```diff
@@ -350,11 +351,12 @@
     all_messages = []
     for log in self._driver.execute('getLog', {'type': 'performance'})['value']:
       message = json.loads(log['message'])['message']
       self._logger.debug('Got Performance log: %s', log['message'])
       if re.match(method_filter, message['method']):
         all_messages.append(message)
     self._logger.info('Got %d performance logs with filter method=%s',
         len(all_messages), method_filter)
     return all_messages
 
-  def GetHTTPResponses(self, include_favicon=False):
+  def GetHTTPResponses(self, include_favicon=False,
+      only_include_loaded_domain=True):
```
Review thread on new line 362:

sclittle 2016/12/19 19:50:09
I'm not sure this is the best way to solve the problem of data:// or about:blank URLs being requested. Could you just filter out all non-http/https URLs by default, or are there other URLs from other domains that pose problems?

Robert Ogden 2016/12/19 20:23:10
Filtering on presence of the scheme (http,https) would work. The idea behind domains is not a concrete one, just a thought/fear of unexpected behavior for test sites that use cross-origin requests.

sclittle 2016/12/19 21:17:08
Hmm - I think I see what you mean, but I'm worried that ignoring the cross-origin requests could cause more problems than it solves. E.g., suppose there's an integration test to check if bypass works. If the request that caused the bypass gets ignored here because it was cross origin, it'll be harder to debug.

Plus, hopefully it shouldn't be too hard to have our test pages only make first-party requests. If they do make cross-origin requests, then that's probably on purpose and we likely don't want to ignore those requests anyways.

WDYT?

Robert Ogden 2016/12/19 22:17:46
SGTM. After looking through the pydoc, I ended up checking the presence of a net_loc (or domain as I know it) instead of a scheme, since scheme='data' where url='data:,,' which has been happening a lot recently.

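The thread weighs three ways of keeping `data:` and `about:blank` entries out of the results. The following sketch compares them on a few sample URLs. It is an illustration only and is not code from this patch. The helper names (`same_netloc`, `http_scheme`, `has_netloc`) and the example.com URLs are hypothetical. The sketch imports `urlparse` from `urllib.parse` on Python 3 and falls back to the Python 2 `urlparse` module that common.py imports.

```python
try:
  from urllib.parse import urlparse  # Python 3
except ImportError:
  from urlparse import urlparse  # Python 2, the module common.py imports


def same_netloc(url, loaded_url):
  """This patch set: keep a URL only if its netloc matches the loaded page's."""
  return urlparse(url).netloc == urlparse(loaded_url).netloc


def http_scheme(url):
  """Suggested in review: keep only http and https URLs."""
  return urlparse(url).scheme in ('http', 'https')


def has_netloc(url):
  """Follow-up approach: keep any URL that has a network location."""
  return bool(urlparse(url).netloc)


loaded_page = 'http://www.example.com/test.html'
sample_urls = [
    'http://www.example.com/image.png',   # same origin as the loaded page
    'https://cdn.example.org/script.js',  # cross origin
    'data:,,',                            # the data URL mentioned in review
    'about:blank',
]

for url in sample_urls:
  print('%-36s same_netloc=%-5s http_scheme=%-5s has_netloc=%s' % (
      url, same_netloc(url, loaded_page), http_scheme(url), has_netloc(url)))
```

On these samples, the scheme filter and the non-empty-netloc filter agree. They diverge on URLs such as `ftp://files.example.com/a`, which has a network location but is not HTTP(S). Also, `netloc` retains any port (and user info). A response from `http://www.example.com:8080/` therefore does not count as the same domain as a page loaded from `http://www.example.com/`. Comparing `urlparse(...).hostname` would be the looser check.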
```diff
@@ -361,46 +363,54 @@
     """Parses the Performance Logs and returns a list of HTTPResponse objects.
 
     Use caution when calling this function multiple times. Only responses
     since the last time this function was called are returned (or since Chrome
     started, whichever is later).
 
     Args:
       include_favicon: A bool that if True will include responses for favicons.
+      only_include_loaded_domain: If True, only responses with the same domain
+        as the last requested page will be included in the result.
     Returns:
       A list of HTTPResponse objects, each representing a single completed HTTP
       transaction by Chrome.
     """
     def MakeHTTPResponse(log_dict):
       params = log_dict['params']
       response_dict = params['response']
       http_response_dict = {
         'response_headers': response_dict['headers'] if 'headers' in
           response_dict else {},
         'request_headers': response_dict['requestHeaders'] if 'requestHeaders'
           in response_dict else {},
         'url': response_dict['url'] if 'url' in response_dict else '',
         'protocol': response_dict['protocol'] if 'protocol' in response_dict
           else '',
         'port': response_dict['remotePort'] if 'remotePort' in response_dict
           else -1,
         'status': response_dict['status'] if 'status' in response_dict else -1,
         'request_type': params['type'] if 'type' in params else ''
       }
       return HTTPResponse(**http_response_dict)
     all_responses = []
     for message in self.GetPerformanceLogs():
       response = MakeHTTPResponse(message)
       self._logger.debug('New HTTPResponse: %s', str(response))
       is_favicon = response.url.endswith('favicon.ico')
-      if not is_favicon or include_favicon:
+      is_same_loaded_domain = (urlparse.urlparse(response.url).netloc ==
+          urlparse.urlparse(self._url).netloc)
+      if ((not is_favicon or include_favicon) and
+          (not only_include_loaded_domain or is_same_loaded_domain)):
         all_responses.append(response)
+      else:
+        self._logger.info("Skipping HTTPResponse with url=%s in returned logs.",
+            response.url)
     self._logger.info('%d new HTTPResponse objects found in the logs %s '
         'favicons', len(all_responses), ('including' if include_favicon else
           'not including'))
     return all_responses
 
 class HTTPResponse:
   """This class represents a single HTTP transaction (request and response) by
   Chrome.
 
   This class also includes several convenience functions for ChromeProxy
```
(...skipping 114 matching lines...)
```diff
@@ -521,10 +531,10 @@
   """
   flags = ParseFlags()
   logger = GetLogger()
   logger.debug('Command line args: %s', str(sys.argv))
   logger.info('sys.argv parsed to %s', str(flags))
   # The unittest library uses sys.argv itself and is easily confused by our
   # command line options. Pass it a simpler argv instead, while working in the
   # unittest command line args functionality.
   unittest.main(argv=[sys.argv[0]], verbosity=2, failfast=flags.failfast,
       catchbreak=flags.catch, buffer=(not flags.disable_buffer))
```
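The comment in this hunk explains why `unittest.main` receives `argv=[sys.argv[0]]`. unittest parses the argv it is given for its own options and exits on any flag it does not recognize. The following sketch demonstrates this under stated assumptions. `SmokeTest`, `run_tests`, and the `--chrome_exec` flag are hypothetical stand-ins. The sketch passes `exit=False` only so that the script continues running after `unittest.main` returns.

```python
import sys
import unittest


class SmokeTest(unittest.TestCase):
  """A trivial test so unittest.main has something to run."""

  def test_truth(self):
    self.assertTrue(True)


def run_tests(argv):
  """Runs this module's tests with the given argv; True if they all pass."""
  program = unittest.main(module=sys.modules[__name__], argv=argv,
                          exit=False, verbosity=0)
  return program.result.wasSuccessful()


# A hypothetical harness command line: '--chrome_exec' stands in for any flag
# the harness understands but unittest's own parser does not.
harness_argv = ['run_tests.py', '--chrome_exec=/usr/bin/chrome']

# Handing unittest the full argv makes its parser reject the unknown flag and
# raise SystemExit before any test runs.
try:
  run_tests(harness_argv)
  rejected_full_argv = False
except SystemExit:
  rejected_full_argv = True

# Handing it only the program name (argv=[sys.argv[0]] in the diff above)
# avoids the clash, and the tests run normally.
passed_minimal_argv = run_tests([harness_argv[0]])
```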
