|
|
Created:
5 years, 4 months ago by lwchkg Modified:
5 years, 4 months ago CC:
grit-developer_googlegroups.com Base URL:
https://chromium.googlesource.com/external/grit-i18n.git@master Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionFixed: The depends file should be written in text mode instead of binary mode (UTF-32-LE).
BUG=466008
Patch Set 1 #Patch Set 2 : #Messages
Total messages: 44 (9 generated)
lwchkg@gmail.com changed reviewers: + brettw@chromium.org, dpranke@chromium.org, mdemp...@chromium.org, thakis@chromium.org
Dear dpranke and others, I've just found the exact problem in issue 466008. Please review the patch. Thank you. Regards, WC Leung.
Testing procedure (which I've done myself): 1. Get a Linux checkout with the problem. (That is what I use every day.) 2. Edit the give file. (tools/grit/grit/tool/build.py) 3. Delete *_stamp.d ( Command: find out/Default -name "*_stamp.d" -delete ) 4. Build chrome (ninja -C out/Default chrome) 5. Repeat 4. Expected result: Step 5 does not fail (display ninja: no work to do.)
On 2015/08/05 14:15:58, lwchkg wrote: > Testing procedure (which I've done myself): > 1. Get a Linux checkout with the problem. (That is what I use every day.) > 2. Edit the give file. (tools/grit/grit/tool/build.py) > 3. Delete *_stamp.d ( Command: find out/Default -name "*_stamp.d" -delete ) > 4. Build chrome (ninja -C out/Default chrome) > 5. Repeat 4. > > Expected result: > Step 5 does not fail (display ninja: no work to do.) I think this fix is probably close to correct, if not definitely correct, but I'm curious to know why I haven't been able to reproduce this. Are you using a stock Chromium build? Anything else in your environment that might cause the strings to be UTF-32 rather than ASCII or UTF-8 (I'm not actually sure which we would normally expect to see here)?
Some related questions: - Are you using cygwin? - Do you have non-ASCII characters in the path to your source code?
On 2015/08/05 15:54:50, brettw wrote: > Some related questions: > - Are you using cygwin? This is on Linux.
On 2015/08/05 15:23:08, Dirk Pranke wrote: > On 2015/08/05 14:15:58, lwchkg wrote: > > Testing procedure (which I've done myself): > > 1. Get a Linux checkout with the problem. (That is what I use every day.) > > 2. Edit the give file. (tools/grit/grit/tool/build.py) > > 3. Delete *_stamp.d ( Command: find out/Default -name "*_stamp.d" -delete ) > > 4. Build chrome (ninja -C out/Default chrome) > > 5. Repeat 4. > > > > Expected result: > > Step 5 does not fail (display ninja: no work to do.) > > I think this fix is probably close to correct, if not definitely correct, but > I'm curious to know why I haven't been able to reproduce this. > > Are you using a stock Chromium build? Anything else in your > environment that might cause the strings to be UTF-32 rather > than ASCII or UTF-8 (I'm not actually sure which we would > normally expect to see here)? I guess it is about Python. I'm using 64-bit Ubuntu 15.04 in my computer. https://docs.python.org/2/c-api/unicode.html Just tried the following script: f = open('test.txt', 'wb') f.writelines([unicode('123'), '456']) In Windows (Python comes with depot_tools), a 6-byte file is created. But in Ubuntu a 15-byte file is created. (123 is in UTF-32, 456 in ASCII) (Is wchar_t simply converted into ascii when it is saved to a file?)
Interesting, on my Linux I see 6 bytes. Can you tell me the value of import sys print sys.stdout.encoding and also the LANG and LC_CTYPE environment variables?
(I'm asking because I want to understand this for future reference when doing this kind of thing.)
On 2015/08/05 16:04:17, brettw wrote: > Interesting, on my Linux I see 6 bytes. Can you tell me the value of > import sys > print sys.stdout.encoding > and also the LANG and LC_CTYPE environment variables? python program: UTF-8 LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8"
On 2015/08/05 16:25:04, lwchkg wrote: > On 2015/08/05 16:04:17, brettw wrote: > > Interesting, on my Linux I see 6 bytes. Can you tell me the value of > > import sys > > print sys.stdout.encoding > > and also the LANG and LC_CTYPE environment variables? > > python program: UTF-8 > > LANG=en_US.UTF-8 > LC_CTYPE="en_US.UTF-8" Now it's my turn to ask. What is/are the Linux version(s) you're using?
> Now it's my turn to ask. What is/are the Linux version(s) you're using? Very strange, I would have expected a different encoding. I have Ubuntu 14.04 LTS 64-bit with Python 2.7.6. I wonder if we should always be explicit. I think the current code will fail if the path contains non-ASCII characters, which is probably bad. It looks like codecs.open can be used instead, and then you can specify the encoding (should be UTF-8). Do you think that would be more robust?
On 2015/08/05 16:51:31, brettw wrote: > > Now it's my turn to ask. What is/are the Linux version(s) you're using? > > Very strange, I would have expected a different encoding. I have Ubuntu 14.04 > LTS 64-bit with Python 2.7.6. > > I wonder if we should always be explicit. I think the current code will fail if > the path contains non-ASCII characters, which is probably bad. It looks like > codecs.open can be used instead, and then you can specify the encoding (should > be UTF-8). Do you think that would be more robust? Good idea. I've changed the source code and now testing the building process. Will upload the patch when testing is successful. Anyway, by using codecs.open 'w' and 'wb' are the same now.
On 2015/08/05 17:25:12, lwchkg wrote: > Anyway, by using codecs.open 'w' and 'wb' are the same now. There will be a difference in line endings on Windows. We should be consistent in the types of files we generate.
On 2015/08/05 17:25:12, lwchkg wrote: > On 2015/08/05 16:51:31, brettw wrote: > > > Now it's my turn to ask. What is/are the Linux version(s) you're using? > > > > Very strange, I would have expected a different encoding. I have Ubuntu 14.04 > > LTS 64-bit with Python 2.7.6. > > > > I wonder if we should always be explicit. I think the current code will fail > if > > the path contains non-ASCII characters, which is probably bad. It looks like > > codecs.open can be used instead, and then you can specify the encoding (should > > be UTF-8). Do you think that would be more robust? > > Good idea. I've changed the source code and now testing the building process. > Will upload the patch when testing is successful. > > Anyway, by using codecs.open 'w' and 'wb' are the same now. AFAIK the generated files do not contain any line break. (writelines does not insert line break.)
Revised patch is now uploaded.
lgtm
lgtm lgtm
One wonders if there is some variation in the way the system python was compiled or something that is causing open(..., 'wb').write(<unicode>) thing to be written as utf32-le or ucs-4 raw bytes instead of ascii (which I believe should normally be the default). From a python interpreter, what happens if you run: >>> open('foo', 'wb').write(u'\2020') ? I'd expect you to get: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2020' in position 0: ordinal not in range(128) but maybe on your system you don't get an error and the file gets written in utf32-le ?
On 2015/08/05 19:30:57, Dirk Pranke wrote: > One wonders if there is some variation in the way the system python was compiled > or something that is causing open(..., 'wb').write(<unicode>) thing to be > written as utf32-le or ucs-4 raw bytes instead of ascii (which I believe should > normally be the default). Definitely yes. See https://docs.python.org/2/c-api/unicode.html > Very strange, I would have expected a different encoding. I have Ubuntu 14.04 > LTS 64-bit with Python 2.7.6. As bug 466008 comment #25, it's apparent that the change started from 14.10 > > From a python interpreter, what happens if you run: > > >>> open('foo', 'wb').write(u'\2020') ? > > I'd expect you to get: > > UnicodeEncodeError: 'ascii' codec can't encode character u'\u2020' in position > 0: ordinal not in range(128) > > but maybe on your system you don't get an error and the file gets written in > utf32-le ? Still the same error in Ubuntu 15.04. But if the line is changed into open('foo', 'wb').writelines([u'\u2020']) then interesting things happens: In Windows I get TypeError: writelines() argument must be a sequence of strings But in Ubuntu 15.04 I get a file. The contents are listed below: lwchkg@ubuntu:~/test$ hd -b foo 00000000 20 20 00 00 | ..| 0000000 040 040 000 000 0000004
The CQ bit was checked by lwchkg@gmail.com to run a CQ dry run
The CQ bit was unchecked by lwchkg@gmail.com
The CQ bit was checked by lwchkg@gmail.com
The CQ bit was unchecked by lwchkg@gmail.com
The CQ bit was checked by lwchkg@gmail.com to run a CQ dry run
The CQ bit was unchecked by commit-bot@chromium.org
Commit queue rejected this change because it did not recognize the base URL. Please commit your change manually.
The CQ bit was checked by lwchkg@gmail.com
The CQ bit was unchecked by commit-bot@chromium.org
Commit queue rejected this change because it did not recognize the base URL. Please commit your change manually.
Newbie question: How can I get the patch committed? Note: I've tried "git cl dcommit" but some unrelated presubmit checks failed.
On 2015/08/06 02:19:50, lwchkg wrote: > On 2015/08/05 19:30:57, Dirk Pranke wrote: > > One wonders if there is some variation in the way the system python was > compiled > > or something that is causing open(..., 'wb').write(<unicode>) thing to be > > written as utf32-le or ucs-4 raw bytes instead of ascii (which I believe > should > > normally be the default). > > Definitely yes. > See https://docs.python.org/2/c-api/unicode.html > It's true that Python can be compiled to store unicode as either 2-byte or 4-byte entities -- and it would be interesting to confirm if that was a difference for you -- but I wouldn't expect that to change how the write() call behaved. > > Very strange, I would have expected a different encoding. I have Ubuntu 14.04 > > LTS 64-bit with Python 2.7.6. > > As bug 466008 comment #25, it's apparent that the change started from 14.10 > Didn't comment #24 say that you saw this on 14.04 as well? I also wouldn't expect the O/S version to affect this, but maybe the python package installed with the O/S was configured differently? Can you tell what the pkg-config flags for your Python binary were? > > > > From a python interpreter, what happens if you run: > > > > >>> open('foo', 'wb').write(u'\2020') ? > > > > I'd expect you to get: > > > > UnicodeEncodeError: 'ascii' codec can't encode character u'\u2020' in position > > 0: ordinal not in range(128) > > > > but maybe on your system you don't get an error and the file gets written in > > utf32-le ? > > Still the same error in Ubuntu 15.04. But if the line is changed into > open('foo', 'wb').writelines([u'\u2020']) > then interesting things happens: > > In Windows I get > TypeError: writelines() argument must be a sequence of strings > > But in Ubuntu 15.04 I get a file. The contents are listed below: > > lwchkg@ubuntu:~/test$ hd -b foo > 00000000 20 20 00 00 | ..| > 0000000 040 040 000 000 > 0000004 interesting.
On 2015/08/06 06:41:46, lwchkg wrote: > Newbie question: How can I get the patch committed? > > Note: I've tried "git cl dcommit" but some unrelated presubmit checks failed. grit lives in a separate repo from Chromium and doesn't have a commit queue, and you're probably not a committer. Brett or I will commit it for you.
> It's true that Python can be compiled to store unicode as either 2-byte or > 4-byte > entities -- and it would be interesting to confirm if that was a difference for > you -- > but I wouldn't expect that to change how the write() call behaved. > Correction: write() is the same. Only writelines() is different. > > > Very strange, I would have expected a different encoding. I have Ubuntu > 14.04 > > > LTS 64-bit with Python 2.7.6. > > > > As bug 466008 comment #25, it's apparent that the change started from 14.10 > > > > Didn't comment #24 say that you saw this on 14.04 as well? I also wouldn't > expect the O/S version to affect this, but maybe the python package installed > with the O/S was configured differently? > Sorry I've missed that comment. We may need to ask the Intel guys. Maybe some people is not using stock Python that comes with Ubuntu. Or maybe simply a typo. > Can you tell what the pkg-config flags for your Python binary were? Here's what I've got in Ubuntu 15.04. >>> print sysconfig.get_config_var('CONFIG_ARGS') '--enable-shared' '--prefix=/usr' '--enable-ipv6' '--enable-unicode=ucs4' '--with-dbmliborder=bdb:gdbm' '--with-system-expat' '--with-system-ffi' '--with-fpectl' 'CC=x86_64-linux-gnu-gcc' 'CFLAGS=-D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security ' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' The Windows version coming with depot_tools returns "None" for the some command.
On 2015/08/06 15:42:14, Dirk Pranke wrote: > On 2015/08/06 06:41:46, lwchkg wrote: > > Newbie question: How can I get the patch committed? > > > > Note: I've tried "git cl dcommit" but some unrelated presubmit checks failed. > > grit lives in a separate repo from Chromium and doesn't have a commit queue, > and you're probably not a committer. > > Brett or I will commit it for you. Thanks. BTW I'm still working on my virgin commit in bug 501916. This one simply skipped the queue. LOL.
On 2015/08/06 16:10:15, lwchkg wrote: > Sorry I've missed that comment. We may need to ask the Intel guys. Maybe some > people is not using stock Python that comes with Ubuntu. Or maybe simply a typo. Hi, I'm the reporter of bug 466008. This bug does appear on Ubuntu 14.10 and 15.04(what I'm currently working on). I do use the system python (v2.7.9). The pkg-config is exact same as lwchkg offered. BTW, I've verified this CL works fine for me.
> Hi, I'm the reporter of bug 466008. This bug does appear on Ubuntu 14.10 and > 15.04(what I'm currently working on). I do use the system python (v2.7.9). The > pkg-config is exact same as lwchkg offered. > > BTW, I've verified this CL works fine for me. I see. How about Leon... at comment #24?
Is it possible to add a test for this?
On 2015/08/07 19:31:52, Nico (hiding) wrote: > Is it possible to add a test for this? There are already existing tests, and the tests caught the error correctly in my Ubuntu 15.04. What we need to is to build one/two test machine (trybots?) to test with different python configurations. Below are the tests failed before the patch. After the patch all the three tests passed. Anyway, there is another testGetInputFilesChromeScaledImage still failing. Should we file another bug to fix it? ====================================================================== ERROR: testGenerateDepFile (grit.tool.build_unittest.BuildUnittest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/lwchkg/chromium/src/tools/grit/grit/tool/build_unittest.py", line 53, in testGenerateDepFile (dep_output_file, deps_string) = line.split(': ') ValueError: need more than 1 value to unpack ====================================================================== ERROR: testGenerateDepFileWithDependOnStamp (grit.tool.build_unittest.BuildUnittest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/lwchkg/chromium/src/tools/grit/grit/tool/build_unittest.py", line 343, in testGenerateDepFileWithDependOnStamp (dep_output_file, deps_string) = line.split(': ') ValueError: need more than 1 value to unpack ====================================================================== ERROR: testGenerateDepFileWithResourceIds (grit.tool.build_unittest.BuildUnittest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/lwchkg/chromium/src/tools/grit/grit/tool/build_unittest.py", line 79, in testGenerateDepFileWithResourceIds (dep_output_file, deps_string) = line.split(': ') ValueError: need more than 1 value to unpack ----------------------------------------------------------------------
On 2015/08/09 14:27:56, lwchkg wrote: > On 2015/08/07 19:31:52, Nico (hiding) wrote: > > Is it possible to add a test for this? > Just an idea: Is it possible to make a modified version of /src/tools/grit/grit/testdata/substitute.grd that outputs a non-ascii file name? Initial testing shows that python can handle Unicode filenames in both Windows and Ubuntu 15.04. Of course we need to make sure that all OS/filesystem that grit is used support Unicode filenames before making such a test.
Ah cool, if existing tests catch this on your system, then I'm happy. I'm going to land this in a second. It's probably a good idea to fix the other failing test too. Adding a non-ascii substitute.grd to increase test coverage sounds good to me too.
Message was sent while issue was closed.
Landed in grit r193.
Message was sent while issue was closed.
> It's probably a good idea to fix the other failing test too. Okay. I'll investigate the test once I have time. > > Adding a non-ascii substitute.grd to increase test coverage sounds good to me > too. Should we create a new .grd file or just modify the existing substitute.grd?
Message was sent while issue was closed.
I've found that there are more tests failing in Windows. So I started issue 518840 and attached a log there. It's a good idea discuss about the failed tests there. Of course, the non-ascii unit test still belongs to here. |