|
|
Created:
3 years, 11 months ago by romax Modified:
3 years, 11 months ago CC:
chromium-reviews, romax+watch_chromium.org, fgorski+watch_chromium.org, dewittj+watch_chromium.org, petewil+watch_chromium.org, chili+watch_chromium.org, dimich+watch_chromium.org Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Description[Offline Pages] Rename page names saved by test harness to hostnames.
Fixed the issue where the names of the archives would be random GUIDs
after pulling down from the device. Added a renaming step in the script
which would parse the mhtml source, get the original URL and use that as
filename in order to make archive names easier to understand.
BUG=None
Review-Url: https://codereview.chromium.org/2618863002
Cr-Commit-Position: refs/heads/master@{#442704}
Committed: https://chromium.googlesource.com/chromium/src/+/af210b9fb582034f0fe5fde6f675a10ddc40954a
Patch Set 1 #Patch Set 2 : Appending GUID to the filename. #Patch Set 3 : Adding full path, removing GUID. #Messages
Total messages: 16 (4 generated)
romax@chromium.org changed reviewers: + dougarnett@chromium.org, petewil@chromium.org
PTAL. The filenames for now would be like: login_live_com.mhtml www_facebook_com.mhtml www_flipcart_com.mhtml www_myntra_com.mhtml please let me know if you have better ideas on naming:) Thanks.
On 2017/01/06 01:16:15, romax wrote: > PTAL. > The filenames for now would be like: > login_live_com.mhtml > www_facebook_com.mhtml > www_flipcart_com.mhtml > www_myntra_com.mhtml > > please let me know if you have better ideas on naming:) > > Thanks. Did you consider adding versioning(0000, 0001, ...) and hash of everything after the domain name? Or is that already taken care of?
On 2017/01/06 19:07:13, fgorski wrote: > On 2017/01/06 01:16:15, romax wrote: > > PTAL. > > The filenames for now would be like: > > login_live_com.mhtml > > www_facebook_com.mhtml > > www_flipcart_com.mhtml > > www_myntra_com.mhtml > > > > please let me know if you have better ideas on naming:) > > > > Thanks. > > Did you consider adding versioning(0000, 0001, ...) and hash of everything after > the domain name? Or is that already taken care of? It's not. Currently there's no case which would need version/hash since the url list would be 1 url per hostname. However I do think we need this in near future(like multiple snapshot during one save page request), but yet I have no idea what it should look like.
On 2017/01/06 19:12:26, romax wrote: > On 2017/01/06 19:07:13, fgorski wrote: > > On 2017/01/06 01:16:15, romax wrote: > > > PTAL. > > > The filenames for now would be like: > > > login_live_com.mhtml > > > www_facebook_com.mhtml > > > www_flipcart_com.mhtml > > > www_myntra_com.mhtml > > > > > > please let me know if you have better ideas on naming:) > > > > > > Thanks. > > > > Did you consider adding versioning(0000, 0001, ...) and hash of everything > after > > the domain name? Or is that already taken care of? > > It's not. Currently there's no case which would need version/hash since the url > list would be 1 url per hostname. > However I do think we need this in near future(like multiple snapshot during one > save page request), but yet I have no idea what it should look like. Perhaps just prepending the hostname to the current name would take care of uniqueness plus maintain the mapping just in case useful somehow.
On 2017/01/06 19:18:36, dougarnett wrote: > On 2017/01/06 19:12:26, romax wrote: > > On 2017/01/06 19:07:13, fgorski wrote: > > > On 2017/01/06 01:16:15, romax wrote: > > > > PTAL. > > > > The filenames for now would be like: > > > > login_live_com.mhtml > > > > www_facebook_com.mhtml > > > > www_flipcart_com.mhtml > > > > www_myntra_com.mhtml > > > > > > > > please let me know if you have better ideas on naming:) > > > > > > > > Thanks. > > > > > > Did you consider adding versioning(0000, 0001, ...) and hash of everything > > after > > > the domain name? Or is that already taken care of? > > > > It's not. Currently there's no case which would need version/hash since the > url > > list would be 1 url per hostname. > > However I do think we need this in near future(like multiple snapshot during > one > > save page request), but yet I have no idea what it should look like. > > Perhaps just prepending the hostname to the current name would take care of > uniqueness plus maintain the mapping just in case useful somehow. made the filenames like this: login_live_com-96de3123-ea71-421b-a8de-57b6eebcf156.mhtml m_facebook_com-4cc6b7d5-95e0-4f84-8c29-9c0d17e405a7.mhtml www_flipkart_com-776c7c9d-069b-4361-8c3f-aa194cf17c90.mhtml www_myntra_com-e33b36b9-9823-4cfe-bf91-78224ad98b89.mhtml I think starting with hostnames might be better for sorting... PTAL
On 2017/01/06 19:51:31, romax wrote: > On 2017/01/06 19:18:36, dougarnett wrote: > > On 2017/01/06 19:12:26, romax wrote: > > > On 2017/01/06 19:07:13, fgorski wrote: > > > > On 2017/01/06 01:16:15, romax wrote: > > > > > PTAL. > > > > > The filenames for now would be like: > > > > > login_live_com.mhtml > > > > > www_facebook_com.mhtml > > > > > www_flipcart_com.mhtml > > > > > www_myntra_com.mhtml > > > > > > > > > > please let me know if you have better ideas on naming:) > > > > > > > > > > Thanks. > > > > > > > > Did you consider adding versioning(0000, 0001, ...) and hash of everything > > > after > > > > the domain name? Or is that already taken care of? > > > > > > It's not. Currently there's no case which would need version/hash since the > > url > > > list would be 1 url per hostname. > > > However I do think we need this in near future(like multiple snapshot during > > one > > > save page request), but yet I have no idea what it should look like. > > > > Perhaps just prepending the hostname to the current name would take care of > > uniqueness plus maintain the mapping just in case useful somehow. > > made the filenames like this: > login_live_com-96de3123-ea71-421b-a8de-57b6eebcf156.mhtml > m_facebook_com-4cc6b7d5-95e0-4f84-8c29-9c0d17e405a7.mhtml > www_flipkart_com-776c7c9d-069b-4361-8c3f-aa194cf17c90.mhtml > www_myntra_com-e33b36b9-9823-4cfe-bf91-78224ad98b89.mhtml > I think starting with hostnames might be better for sorting... PTAL This is a huge step forward, but sometimes we have more than one site from the same domain in our list. Would it be hard to include the path in addition to the domain? Failing that, adding the GUID after the hostname would also be OK.
lgtm I'm happy with this improvement (currently no duplicate hostnames in 200 eval urls). The final urls we had previously often looked fairly different from original urls so have just been doing domain name mapping when rating mhtml files anyway.
Removed GUID from filename, using full path (without fragments/usernames etc). Now the filenames will look like: www_flipkart_com-sony_playstation_4_ps4_1_tb_p_itmedsdh9jhh9pyt.mhtml m_facebook_com-.mhtml www_myntra_com-towels.mhtml login_live_com-login_srf.mhtml www_hotstar_com-sports_cricket_live_cricket_score_.mhtml PTAL, thanks.
Great, still lgtm from me.
lgtm. Thanks for going the extra mile, this will make all our tests easier to handle!
The CQ bit was checked by romax@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1484085375519950, "parent_rev": "53b088cdab1b90a81d475d96e9e052a125a85590", "commit_rev": "af210b9fb582034f0fe5fde6f675a10ddc40954a"}
Message was sent while issue was closed.
Description was changed from ========== [Offline Pages] Rename page names saved by test harness to hostnames. Fixed the issue where the names of the archives would be random GUIDs after pulling down from the device. Added a renaming step in the script which would parse the mhtml source, get the original URL and use that as filename in order to make archive names easier to understand. BUG=None ========== to ========== [Offline Pages] Rename page names saved by test harness to hostnames. Fixed the issue where the names of the archives would be random GUIDs after pulling down from the device. Added a renaming step in the script which would parse the mhtml source, get the original URL and use that as filename in order to make archive names easier to understand. BUG=None Review-Url: https://codereview.chromium.org/2618863002 Cr-Commit-Position: refs/heads/master@{#442704} Committed: https://chromium.googlesource.com/chromium/src/+/af210b9fb582034f0fe5fde6f675... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/chromium/src/+/af210b9fb582034f0fe5fde6f675... |