|
|
Created:
8 years, 3 months ago by agl Modified:
8 years, 1 month ago Reviewers:
wtc CC:
chromium-reviews Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
Description32-bit, P-256 implementation.
This is a dummy changelist to get this code into the codereview system.
Patch Set 1 #
Total comments: 22
Patch Set 2 : ... #Patch Set 3 : ... #Patch Set 4 : Including results of ekasper's review. #
Total comments: 86
Patch Set 5 : Addressing wtc'c comments. #Messages
Total messages: 13 (0 generated)
Sorry that it took so long, but I finally finished off the P-256 code. The changes to hook it into freebl are trivial compared to the implementation itself, which I've put into this dummy CL in order to make it easier to review. (I fear this codereview may kill you!)
agl: Thank you for the patch. I will take a look at it on Thursday. Please also send me the changes to hook it into freebl. If you have time before Thursday, could you figure out how to use the NSS ecperf program to demonstrate the speedup? http://mxr.mozilla.org/security/source/security/nss/cmd/ecperf/ I haven't used the ecperf program before.
Preliminary review comments on patch set 1: Status update: sorry about the late review. I read the CL for a few hours on my flight to Boston last Monday. I only made it to felem_diff() (about 250 lines). There is quite a bit of math in this CL that I don't understand, which is why the review was slow. (I was hoping I could ask you some questions at the offsite meeting.) I'm going to take a different strategy and make a first pass without understanding the math. Here are some preliminary comments. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c File ecp_256_32.c (right): http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode1 ecp_256_32.c:1: #include <endian.h> <endian.h> is not portable. For example, it doesn't exist on Mac OS X. so probably doesn't exist for iOS either. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode8 ecp_256_32.c:8: #include <unistd.h> Do you remember why you need <unistd.h>? This header seems to declare system calls. It's not clear to me why this file (which is mostly math) needs to make system calls. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode19 ecp_256_32.c:19: /* Our field elements are represented as nine, 32-bit words. Freebl's MPI Nit: I think the comma after "nine" is not necessary. Similarly for line 63. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode31 ecp_256_32.c:31: * when multipling as terms end up one bit short of a limb which would require Typo: multipling => multiplying http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode160 ecp_256_32.c:160: #define NON_ZERO_TO_ALL_ONES(x) (~((u32) (((s32) (x-1)) >> 31))) I believe you are depending on the right-shift operator extending the sign bit when the left operand is negative. This is implementation-defined behavior. The C language standard says (paraphrasing): If the left operand has a signed type and negative value, the result is implementation defined. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode190 ecp_256_32.c:190: for (i = 0; i < 9; i++) { Should "9" be changed to NLIMBS here and on line 197? Not sure which is more readable. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode216 ecp_256_32.c:216: /* zero31 is 0 mod p. */ zero31 is 0 mod p. Just curious: is it p, 2p, 3p, 4p, ...? http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode255 ecp_256_32.c:255: static void felem_reduce_degree(felem out, u64 tmp[17]) Why does the 'tmp' array have a size of 17?
http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c File ecp_256_32.c (right): http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode1 ecp_256_32.c:1: #include <endian.h> On 2012/10/09 21:04:47, wtc wrote: > > <endian.h> is not portable. For example, it doesn't exist on > Mac OS X. so probably doesn't exist for iOS either. In which case there doesn't appear to be any portable method for checking the endianness of the system at compile time. Is this something that the NSS build system knows about? http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode8 ecp_256_32.c:8: #include <unistd.h> On 2012/10/09 21:04:47, wtc wrote: > > Do you remember why you need <unistd.h>? This header seems to > declare system calls. It's not clear to me why this file (which > is mostly math) needs to make system calls. It's probably leftover from debugging calls. Will remove. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode19 ecp_256_32.c:19: /* Our field elements are represented as nine, 32-bit words. Freebl's MPI On 2012/10/09 21:04:47, wtc wrote: > > Nit: I think the comma after "nine" is not necessary. Similarly > for line 63. Done. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode31 ecp_256_32.c:31: * when multipling as terms end up one bit short of a limb which would require On 2012/10/09 21:04:47, wtc wrote: > > Typo: multipling => multiplying Done. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode160 ecp_256_32.c:160: #define NON_ZERO_TO_ALL_ONES(x) (~((u32) (((s32) (x-1)) >> 31))) On 2012/10/09 21:04:47, wtc wrote: > > I believe you are depending on the right-shift operator > extending the sign bit when the left operand is negative. > This is implementation-defined behavior. The C language > standard says (paraphrasing): > If the left operand has a signed type and negative value, > the result is implementation defined. I wasn't aware that the result was undefined by C. That's unfortunate because the alternative is quite a lot slower. Can we compile this only on ARM (and maybe Intel) and thus restrict it to chips that implement an arithmetic-shift-right? Tests will certainly uncover if a platform is doing anything else. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode190 ecp_256_32.c:190: for (i = 0; i < 9; i++) { On 2012/10/09 21:04:47, wtc wrote: > > Should "9" be changed to NLIMBS here and on line 197? Not sure > which is more readable. I've changed it to NLIMBS here and in other places where I still had an explicit '9'. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode216 ecp_256_32.c:216: /* zero31 is 0 mod p. */ On 2012/10/09 21:04:47, wtc wrote: > > zero31 is 0 mod p. Just curious: is it p, 2p, 3p, 4p, ...? It ends up as 8p. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode255 ecp_256_32.c:255: static void felem_reduce_degree(felem out, u64 tmp[17]) On 2012/10/09 21:04:47, wtc wrote: > > Why does the 'tmp' array have a size of 17? The result of multiplying two arrays with indexes [0...8] is an array with indexes [0...16], so there's 17 elements in the product.
http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c File ecp_256_32.c (right): http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode1 ecp_256_32.c:1: #include <endian.h> On 2012/10/11 14:55:04, agl wrote: > > In which case there doesn't appear to be any portable method for checking the > endianness of the system at compile time. Is this something that the NSS build > system knows about? Yes, NSPR provides the endianness of the system using two macros: IS_BIG_ENDIAN and IS_LITTLE_ENDIAN. Only one of the macro is defined. Include the NSPR header "prcpucfg.h" (which is included by any NSPR header) and test either IS_BIG_ENDIAN or IS_LITTLE_ENDIAN. For example, the "prcpucfg.h" file for Linux defines the two macros for ARM as follows: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/include/md/_li... http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode160 ecp_256_32.c:160: #define NON_ZERO_TO_ALL_ONES(x) (~((u32) (((s32) (x-1)) >> 31))) On 2012/10/11 14:55:04, agl wrote: > Can we compile this only on ARM (and > maybe Intel) and thus restrict it to chips that implement an > arithmetic-shift-right? Tests will certainly uncover if a platform is doing > anything else. If that property depends on the processor rather than the compiler, yes, we can do that. Please also add a comment to state your assumption of arithmetic right-shift. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode216 ecp_256_32.c:216: /* zero31 is 0 mod p. */ On 2012/10/11 14:55:04, agl wrote: > > It ends up as 8p. Just curious: why is 8p chosen? Is it more convenient than the other 0 mod p values?
http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c File ecp_256_32.c (right): http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode1 ecp_256_32.c:1: #include <endian.h> On 2012/10/11 16:29:00, wtc wrote: > Include the NSPR header "prcpucfg.h" (which is included by > any NSPR header) and test either IS_BIG_ENDIAN or > IS_LITTLE_ENDIAN. Thanks, done. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode160 ecp_256_32.c:160: #define NON_ZERO_TO_ALL_ONES(x) (~((u32) (((s32) (x-1)) >> 31))) On 2012/10/11 16:29:00, wtc wrote: > > On 2012/10/11 14:55:04, agl wrote: > > Can we compile this only on ARM (and > > maybe Intel) and thus restrict it to chips that implement an > > arithmetic-shift-right? Tests will certainly uncover if a platform is doing > > anything else. > > If that property depends on the processor rather than the > compiler, yes, we can do that. Please also add a comment to > state your assumption of arithmetic right-shift. Done. http://codereview.chromium.org/10944017/diff/1/ecp_256_32.c#newcode216 ecp_256_32.c:216: /* zero31 is 0 mod p. */ On 2012/10/11 16:29:00, wtc wrote: > Just curious: why is 8p chosen? Is it more convenient than > the other 0 mod p values? The important feature of the value is that the high bits of each word are set (i.e. either bit 30 or 31). This allows us to add this value to another felem without affecting it's value mod p, but now each word can be subtracted from without underflow. So it happens to be 8p because we want to set bit 31 in the last word and that happens to hit 2**259. Therefore the closest value that is 0 mod p is 8p.
Would you like me to ask bmoeller or ekasper to review this? I fear that I've overloaded you :)
On 2012/11/12 16:12:42, agl wrote: > Would you like me to ask bmoeller or ekasper to review this? I fear that I've > overloaded you :) Yes, having a second reviewer who can check the math is a great idea. Thanks.
I asked ekasper to review the code and she asked for a few places to be commented (done), to express some indexes differently (done) and pointed out that I had missed a couple of small terms in the bounds calculations and had been overly conservative in some others. I've fixed the bounds calculations as suggested but it didn't change any of the results.
Patch set 4 LGTM. High-level comments: 1. It would be nice to add a comment to provide insight on why this implementation is much faster than the current implementation in freebl, and recommend the configurations that this implementation is suitable for (32-bit, little-endian processors?). 2. Is there any code in freebl that can be conditionally compiled out if this file is used? 3. I'm glad that ekasper reviewed the math. This simplified the code review for me. I read every line, so I couldn't resist asking a few questions about the math. I also suggest some changes to fix coding style issues. Thanks! http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c File ecp_256_32.c (right): http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1 ecp_256_32.c:1: #include "prcpucfg.h" 1. Please copy the MPL 2 header from another file. 2. NSS doesn't have style recommendations for headers. I suggest we list the headers in two groups: #include <stdint.h> #include <stdio.h> #include <string.h> #include <stdlib.h> #include "prcpucfg.h" #include "mpi.h" #include "mpi-priv.h" #include "ecp.h" 3. The <stdint.h> header could be problematic when we integrate this file with upstream NSS. A solution is to replace <stdint.h> with "prtypes.h" and replace uint8_t, uint32_t, etc. with PRUint8, PRUint32, etc. If "prtypes.h" is included, "prcpucfg.h" can be omitted. (Every NSPR header includes "prcpucfg.h".) http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode56 ecp_256_32.c:56: static const limb k2P[NLIMBS] = { Use the felem typedef when you define kZero, kP, and k2P, as in the definition of kOne. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode60 ecp_256_32.c:60: }; Add a blank line here? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode86 ecp_256_32.c:86: * The second table follows the same style, but the multiples are 2**32G, Nit: 2**32G, 2**96G, etc. probably should be called "terms" rather than "multiples". http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode89 ecp_256_32.c:89: * This is ~2KB of data. */ Nit: put the closing */ on its own line. If you make this change, please make the same change to all comment blocks in this file for consistency. (I am also fine with your current style.) http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode152 ecp_256_32.c:152: Nit: delete one blank line. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode163 ecp_256_32.c:163: #define NON_ZERO_TO_ALL_ONES(x) (~((u32) (((s32) (x-1)) >> 31))) Should we add parentheses around x in the macro's expansion? It is not necessary in the current code, and this macro already has a lot of parentheses. So I am fine to go without for clarity. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode165 ecp_256_32.c:165: /* felem_reduce_degree adds a multiple of p in order to cancel |carry|, Typo: felem_reduce_degree => felem_reduce_carry http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode168 ecp_256_32.c:168: * On entry: carry < 2**3, inout[0,2,...] < 2**29, inout[1,3,...] < 2**28. Question: The above says |carry| is a term at 2**257. But here we have carry < 2**3. These seem to contradict each other. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode184 ecp_256_32.c:184: // next line. Why don't we add and then subtract? Will that avoid the potential underflow altogether? Just curious. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode192 ecp_256_32.c:192: * On exit: in[0,2,...] < 2**30, in2[1,3,...] < 2**29 */ Are "in" and "in2" typos? I think you meant "out". http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode196 ecp_256_32.c:196: unsigned i; Nit: change "unsigned" to "unsigned int" throughout the file. The latter is more common in NSS. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode224 ecp_256_32.c:224: static const felem zero31 = { two31m3, two30m2, two31m2, two30p13m2, two31m2, two30m2, two31p24m2, two30m27m2, two31m2 }; Please fold this long line. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode257 ecp_256_32.c:257: /* felem_reduce_degree sets out = tmp/R mod p where tmp contains 64-bit words Please add "where R = 2**257", or define R earlier in the file, such as on line 34. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode260 ecp_256_32.c:260: * On entry: tmp[i] < 2**64 You said tmp contains the 29, 28 bit pattern. Did you mean tmp[0,2,...] < 2**30, tmp[1,3,...] < 2**29? The on-entry condition you stated here implies that's not the case. So I guess the 29, 28 bits are packed into 64-bit words? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode269 ecp_256_32.c:269: * (odd word): 0 | 28| 57| 85|114|142|142|171|199|228|256 Nit: should "(odd word):" be left aligned? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode321 ecp_256_32.c:321: * in Montgomery form. This paragraph should be moved to the comment block in front of this function, because it explains why this function computes out = tmp/R mod p. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode375 ecp_256_32.c:375: * So the greatest amount is added to tmp2[10/12]. If tmp2[10/12] has an What does tmp2[10/12] mean? tmp2[10] and tmp2[12]? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode388 ecp_256_32.c:388: exactly hits word 8. */ Nit: add '*' to the left of this comment line. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode411 ecp_256_32.c:411: * 0x1e000000 = 2**29 - 2**25. Since we haved updated i, the 9th word Is "we have updated i" correct? This loop does not do i++ in the middle of the loop. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode458 ecp_256_32.c:458: uint64_t tmp[17]; You can use u64 instead of uint64_t here and in the typecasts. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode463 ecp_256_32.c:463: ((uint64_t) in[1]) * (in[1] << 1); Just curious: where do these << 1 and << 2 come from? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode609 ecp_256_32.c:609: memcpy(out, in, sizeof(felem)); Does C allow us to assign an array directly? I know we can do that with structs. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode678 ecp_256_32.c:678: felem_mul(out, ftmp2, ftmp); /* 2^256 - 2^224 + 2^192 + 2^96 - 3 */ How did you figure out this sequence?! :-) http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode690 ecp_256_32.c:690: for (i = 0;; i++) { In some other for loops you still put the i < NLIMBS condition here. It would be good to be consistent. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode809 ecp_256_32.c:809: * coordinates. An affine point (x', y') is x=x'/z, y=y'/z in Jacobian form. */ What is z? Should the Jacobian form include the z coordinate? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1020 ecp_256_32.c:1020: } Do you think this is a little clearer? for (j = 0; j < NLIMBS; j++, table++) { out_x[j] |= *table & mask; } http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1029 ecp_256_32.c:1029: static void select_jacobian_point(felem out_x, felem out_y, felem out_z, Typo: select_affine_point => select_jacobian_point http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1038 ecp_256_32.c:1038: table += 3*NLIMBS; Why does this function add an offset to table, but the previous function (select_affine_point) doesn't? Should this fact be documented? Or should we have the caller add this offset before calling this function? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1127 ecp_256_32.c:1127: felem_square(z_inv_sq, z_inv); In your formula on line 809, I see x=x'/z, y=y'/z. I only see z inverse. So I don't understand why you're using z_inv_sq here. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1129 ecp_256_32.c:1129: felem_mul(z_inv, z_inv, z_inv_sq); Isn't this z inverse cubed? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1193 ecp_256_32.c:1193: Delete a blank line? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1200 ecp_256_32.c:1200: static const u32 kRInvDigits[8] = {0x80000000, 1, 0xffffffff, 0, 0x80000001, 0xfffffffe, 1, 0x7fffffff}; Fold this long line. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1215 ecp_256_32.c:1215: static mp_err to_montgomery(felem out, const mp_int *in, const ECGroup *group) { Put the opening '{' for this function on its own line. Same for all subsequent functions. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1241 ecp_256_32.c:1241: mp_clear(&in_shifted); I believe we need to do this mp_clear call under the CLEANUP: label so that in_shifted is also freed on error. Make the same change to the next function. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1249 ecp_256_32.c:1249: const ECGroup *group) { Adjust the indentation. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1290 ecp_256_32.c:1290: mp_int *out_x, mp_int *out_y, Does freebl follow the convention that output arguments (out_x, out_y) should be listed before the input argument (n)? I guess this function's prototype is prescribed? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1296 ecp_256_32.c:1296: // FIXME(agl): test that n < order. Use C-style /* */ comment delimiters. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1332 ecp_256_32.c:1332: /* ec_GFp_nistp256_point_mul sets {out_x,out_y} = n1*G + n2*{in_x,in_y}, where Typo: ec_GFp_nistp256_point_mul => ec_GFp_nistp256_points_mul_vartime http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1339 ecp_256_32.c:1339: (const mp_int *n1, const mp_int *n2, Put the opening '(' on the previous line.
http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c File ecp_256_32.c (right): http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode331 ecp_256_32.c:331: for (i = 0; i < 9; i += 2) { 9 => NLIMBS ? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode398 ecp_256_32.c:398: break; I discovered that this used to be i++ if (i == NLIMBS) break; I guess you or ekasper thinks the new form is clearer? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode411 ecp_256_32.c:411: * 0x1e000000 = 2**29 - 2**25. Since we haved updated i, the 9th word Should this comment say: Since we haved not updated i, the 9th word from i+1 is i+10. because tmp2[i + 10] is the last element updated by this loop? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1239 ecp_256_32.c:1239: } Should we write this for loop using the prevalent form in this file? for (i = 0; i < NLIMBS; i++) { out[i] = MP_DIGIT(&in_shifted, 0) & kBottom29Bits; mp_div_d(&in_shifted, kTwo29, &in_shifted, NULL); i++; if (i == NLIMBS) break; out[i] = MP_DIGIT(&in_shifted, 0) & kBottom28Bits; mp_div_d(&in_shifted, kTwo28, &in_shifted, NULL); }
http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c File ecp_256_32.c (right): http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1 ecp_256_32.c:1: #include "prcpucfg.h" On 2012/11/15 22:58:41, wtc wrote: > > 1. Please copy the MPL 2 header from another file. Done. > 2. NSS doesn't have style recommendations for headers. I > suggest we list the headers in two groups: Done. > 3. The <stdint.h> header could be problematic when we integrate > this file with upstream NSS. A solution is to replace > <stdint.h> with "prtypes.h" and replace uint8_t, uint32_t, > etc. with PRUint8, PRUint32, etc. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode56 ecp_256_32.c:56: static const limb k2P[NLIMBS] = { On 2012/11/15 22:58:41, wtc wrote: > > Use the felem typedef when you define kZero, kP, and k2P, > as in the definition of kOne. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode60 ecp_256_32.c:60: }; On 2012/11/15 22:58:41, wtc wrote: > > Add a blank line here? Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode86 ecp_256_32.c:86: * The second table follows the same style, but the multiples are 2**32G, On 2012/11/15 22:58:41, wtc wrote: > > Nit: 2**32G, 2**96G, etc. probably should be called "terms" > rather than "multiples". Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode89 ecp_256_32.c:89: * This is ~2KB of data. */ On 2012/11/15 22:58:41, wtc wrote: > If you make this change, please make the same change to > all comment blocks in this file for consistency. (I am > also fine with your current style.) Done for all multi-line comments in the file. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode152 ecp_256_32.c:152: On 2012/11/15 22:58:41, wtc wrote: > > Nit: delete one blank line. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode163 ecp_256_32.c:163: #define NON_ZERO_TO_ALL_ONES(x) (~((u32) (((s32) (x-1)) >> 31))) > Should we add parentheses around x in the macro's expansion? Despite the noted abundance of them already, I think so. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode165 ecp_256_32.c:165: /* felem_reduce_degree adds a multiple of p in order to cancel |carry|, On 2012/11/15 22:58:41, wtc wrote: > > Typo: felem_reduce_degree => felem_reduce_carry Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode168 ecp_256_32.c:168: * On entry: carry < 2**3, inout[0,2,...] < 2**29, inout[1,3,...] < 2**28. > Question: The above says |carry| is a term at 2**257. But > here we have carry < 2**3. These seem to contradict each > other. |carry| is the coefficient of the term at 2**257, but it is less than 8. In the same way that, in a big-endian, 32-bit word, the first byte is a term at 2**24, and is less than 256. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode184 ecp_256_32.c:184: // next line. On 2012/11/15 22:58:41, wtc wrote: > > Why don't we add and then subtract? Will that avoid the > potential underflow altogether? Just curious. Since we're using two's complement, adding then subtracting is exactly the same as subtracting and then adding. I added the comment because ekasper asked for it when she missed the " & carry_mask". I think I prefer to keep the subtraction nearer its cause (the addition to [6]). http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode192 ecp_256_32.c:192: * On exit: in[0,2,...] < 2**30, in2[1,3,...] < 2**29 */ On 2012/11/15 22:58:41, wtc wrote: > > Are "in" and "in2" typos? I think you meant "out". Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode196 ecp_256_32.c:196: unsigned i; On 2012/11/15 22:58:41, wtc wrote: > > Nit: change "unsigned" to "unsigned int" throughout the file. > The latter is more common in NSS. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode224 ecp_256_32.c:224: static const felem zero31 = { two31m3, two30m2, two31m2, two30p13m2, two31m2, two30m2, two31p24m2, two30m27m2, two31m2 }; On 2012/11/15 22:58:41, wtc wrote: > > Please fold this long line. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode257 ecp_256_32.c:257: /* felem_reduce_degree sets out = tmp/R mod p where tmp contains 64-bit words On 2012/11/15 22:58:41, wtc wrote: > > Please add "where R = 2**257", or define R earlier in the file, > such as on line 34. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode260 ecp_256_32.c:260: * On entry: tmp[i] < 2**64 On 2012/11/15 22:58:41, wtc wrote: > > You said tmp contains the 29, 28 bit pattern. I've changed it to say "bit positions". The values themselves can take the whole 64-bit range, but they are coefficients of a polynomial-like equation with terms at 2**0, 2**29, 2**57 etc. Hopefully the table makes things a little clearer. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode269 ecp_256_32.c:269: * (odd word): 0 | 28| 57| 85|114|142|142|171|199|228|256 On 2012/11/15 22:58:41, wtc wrote: > > Nit: should "(odd word):" be left aligned? It's right aligned because the row contains the starting bits when starting on an odd-numbered word. I've changed it to read "odd phase". Maybe that's clearer? http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode321 ecp_256_32.c:321: * in Montgomery form. On 2012/11/15 22:58:41, wtc wrote: > > This paragraph should be moved to the comment block in front > of this function, because it explains why this function computes out = tmp/R mod > p. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode331 ecp_256_32.c:331: for (i = 0; i < 9; i += 2) { On 2012/11/16 05:28:53, wtc wrote: > > 9 => NLIMBS ? Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode375 ecp_256_32.c:375: * So the greatest amount is added to tmp2[10/12]. If tmp2[10/12] has an > What does tmp2[10/12] mean? tmp2[10] and tmp2[12]? Yes. Have expanded that in the comment. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode388 ecp_256_32.c:388: exactly hits word 8. */ On 2012/11/15 22:58:41, wtc wrote: > > Nit: add '*' to the left of this comment line. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode398 ecp_256_32.c:398: break; > I guess you or ekasper thinks the new form is clearer? ekasper asked for it. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode411 ecp_256_32.c:411: * 0x1e000000 = 2**29 - 2**25. Since we haved updated i, the 9th word On 2012/11/16 05:28:53, wtc wrote: > Since we haved not updated i, the 9th word > from i+1 is i+10. Yes, thanks, there was a missing "not" in there. I also duplicated a column in the table at the top of the function. After fixing that, the comment reads: /* At position 199, which is the starting bit of the 8th word when * dealing with a context starting on an odd word, we have a factor of * 0x1e000000 = 2**29 - 2**25. Since we have not updated i, the 8th * word from i+1 is i+8. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode458 ecp_256_32.c:458: uint64_t tmp[17]; On 2012/11/15 22:58:41, wtc wrote: > > You can use u64 instead of uint64_t here and in the typecasts. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode463 ecp_256_32.c:463: ((uint64_t) in[1]) * (in[1] << 1); > Just curious: where do these << 1 and << 2 come from? Two sources: When the indexes unequal, a factor of two comes from the fact that we're doing a squaring. So, in a multiplication of two different values we would have in1[0] * in2[1] + in1[1] + in2[0]. But here, as in1 == in2, thus just 2*(in[0] * in[1]). There are also factors of two that arise from the limb pattern. When multiplying two terms at 2**29, the result is a term at 2**57, rather than 2**58 which would be natural. Therefore the result has to be shifted up a bit. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode609 ecp_256_32.c:609: memcpy(out, in, sizeof(felem)); > Does C allow us to assign an array directly? I know we can > do that with structs. I'm not sure. Since neither of us are sure, I'd still want to use the obvious code even if it worked :) http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode678 ecp_256_32.c:678: felem_mul(out, ftmp2, ftmp); /* 2^256 - 2^224 + 2^192 + 2^96 - 3 */ > How did you figure out this sequence?! :-) Pen and paper, if I recall. In general, the problem of calculating these sequences is NP-complete and is known as the `addition chain' problem. I am not sure that this chain is optimal, but it works. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode690 ecp_256_32.c:690: for (i = 0;; i++) { On 2012/11/15 22:58:41, wtc wrote: > > In some other for loops you still put the i < NLIMBS condition > here. It would be good to be consistent. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode809 ecp_256_32.c:809: * coordinates. An affine point (x', y') is x=x'/z, y=y'/z in Jacobian form. */ > Should the Jacobian form include the z coordinate? Yes, thanks. I also messed up the definition of Jacobian here and confused it with projective coordinates. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1020 ecp_256_32.c:1020: } > Do you think this is a little clearer? > > for (j = 0; j < NLIMBS; j++, table++) { > out_x[j] |= *table & mask; > } Done. (And 4x below.) http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1029 ecp_256_32.c:1029: static void select_jacobian_point(felem out_x, felem out_y, felem out_z, On 2012/11/15 22:58:41, wtc wrote: > > Typo: select_affine_point => select_jacobian_point Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1038 ecp_256_32.c:1038: table += 3*NLIMBS; On 2012/11/15 22:58:41, wtc wrote: > Why does this function add an offset to table, but > the previous function (select_affine_point) doesn't? Have added comment. /* The implicit value at index 0 is all zero. We don't need to perform that * iteration of the loop because we already set out_* to zero. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1127 ecp_256_32.c:1127: felem_square(z_inv_sq, z_inv); On 2012/11/15 22:58:41, wtc wrote: > > In your formula on line 809, I see x=x'/z, y=y'/z. I only > see z inverse. So I don't understand why you're using > z_inv_sq here. I had written down the formula for projective coordinates, but I'm using Jacobian here! http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1193 ecp_256_32.c:1193: On 2012/11/15 22:58:41, wtc wrote: > > Delete a blank line? Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1200 ecp_256_32.c:1200: static const u32 kRInvDigits[8] = {0x80000000, 1, 0xffffffff, 0, 0x80000001, 0xfffffffe, 1, 0x7fffffff}; On 2012/11/15 22:58:41, wtc wrote: > > Fold this long line. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1215 ecp_256_32.c:1215: static mp_err to_montgomery(felem out, const mp_int *in, const ECGroup *group) { On 2012/11/15 22:58:41, wtc wrote: > > Put the opening '{' for this function on its own line. > Same for all subsequent functions. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1239 ecp_256_32.c:1239: } On 2012/11/16 05:28:53, wtc wrote: > > Should we write this for loop using the prevalent form in > this file? > > for (i = 0; i < NLIMBS; i++) { > out[i] = MP_DIGIT(&in_shifted, 0) & kBottom29Bits; > mp_div_d(&in_shifted, kTwo29, &in_shifted, NULL); > > i++; > if (i == NLIMBS) > break; > > out[i] = MP_DIGIT(&in_shifted, 0) & kBottom28Bits; > mp_div_d(&in_shifted, kTwo28, &in_shifted, NULL); > } Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1241 ecp_256_32.c:1241: mp_clear(&in_shifted); On 2012/11/15 22:58:41, wtc wrote: > > I believe we need to do this mp_clear call under the > CLEANUP: label so that in_shifted is also freed on error. > > Make the same change to the next function. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1249 ecp_256_32.c:1249: const ECGroup *group) { On 2012/11/15 22:58:41, wtc wrote: > > Adjust the indentation. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1290 ecp_256_32.c:1290: mp_int *out_x, mp_int *out_y, On 2012/11/15 22:58:41, wtc wrote: > I guess this function's prototype is prescribed? Yes, this function's prototype matches base_point_mul elsewhere in freebl. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1296 ecp_256_32.c:1296: // FIXME(agl): test that n < order. On 2012/11/15 22:58:41, wtc wrote: > > Use C-style /* */ comment delimiters. Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1332 ecp_256_32.c:1332: /* ec_GFp_nistp256_point_mul sets {out_x,out_y} = n1*G + n2*{in_x,in_y}, where On 2012/11/15 22:58:41, wtc wrote: > > Typo: ec_GFp_nistp256_point_mul => ec_GFp_nistp256_points_mul_vartime Done. http://codereview.chromium.org/10944017/diff/12001/ecp_256_32.c#newcode1339 ecp_256_32.c:1339: (const mp_int *n1, const mp_int *n2, On 2012/11/15 22:58:41, wtc wrote: > > Put the opening '(' on the previous line. Done.
For reference, here's a copy of the file and the patch to NSS that wires it in. I'm running Chrome with it currently. Cheers AGL |