 Chromium Code Reviews
        
  Description
  Sk4x4f: Simplify x86 down to SSE2.
  - This drops the minimum requirement for Sk4x4f on x86 to SSE2 by
    removing calls to _mm_shuffle_epi8().  Instead we use good old
    shifting and masking.
  - Performance is very similar to SSSE3, close enough I'm having trouble
    telling which is faster.  I think we should let ourselves circle back
    on whether we need an SSSE3 version later.  When possible it's nice
    to stick to SSE2: it's most available, and performs most uniformly
    across different chips.
This makes Sk4x4f fast on Windows and Linux, and may help mobile x86.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1817353005
Committed: https://skia.googlesource.com/skia/+/1443c6920c4b7aa80811c30ed9cdc81395d5df4f
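
For illustration, the shift-and-mask approach in the description could look roughly like the sketch below. This is not the code from the patch: `Channels` and `unpack_8888_sse2` are hypothetical names, and the byte order (red in the low byte of each 32-bit pixel) is an assumption. Every intrinsic used comes from `<emmintrin.h>`, so nothing beyond SSE2 is needed.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics only
#include <cstdint>

// Hypothetical holder for four pixels' worth of each channel, as floats.
struct Channels { __m128 r, g, b, a; };

// Hypothetical helper (not Skia's Sk4x4f): split four 8888 pixels into
// per-channel float vectors using shifts and masks instead of a byte shuffle.
// Assumes red is the low byte of each 32-bit pixel.
inline Channels unpack_8888_sse2(const uint32_t px[4]) {
    const __m128i v    = _mm_loadu_si128(reinterpret_cast<const __m128i*>(px));
    const __m128i mask = _mm_set1_epi32(0xFF);

    Channels c;
    // Low byte: mask only.
    c.r = _mm_cvtepi32_ps(_mm_and_si128(v, mask));
    // Middle bytes: logical shift right, then mask.
    c.g = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 8), mask));
    c.b = _mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(v, 16), mask));
    // Top byte: the logical shift already clears the upper bits.
    c.a = _mm_cvtepi32_ps(_mm_srli_epi32(v, 24));
    return c;
}
```

Each channel costs at most one shift, one AND, and one convert, which is consistent with the description's point that a byte shuffle is not strictly needed here.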
   
  Patch Set 1
  Patch Set 2 : tweaks
  Patch Set 3 : derp
  Messages
    Total messages: 25 (12 generated)
     
  
  
 Description was changed: added the line
GOLD_TRYBOT_URL=
https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
 Description was changed: corrected "performst" to "performs" and added the line
This makes Sk4x4f fast(er) on Windows, Linux, and mobile x86.
 Description was changed: replaced the line
This makes Sk4x4f fast(er) on Windows, Linux, and mobile x86.
with
This makes Sk4x4f fast on Windows and Linux, and may help mobile x86.
 The CQ bit was checked by mtklein@chromium.org to run a CQ dry run 
 Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/1 
 mtklein@chromium.org changed reviewers: + fmalita@chromium.org 
 The CQ bit was checked by mtklein@chromium.org 
 
 CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/20001 
 Note for Reviewers: The CQ is waiting for an approval. If you believe that the CL is not ready yet, or if you would like to L-G-T-M with comments then please uncheck the CQ checkbox. Waiting for LGTM from valid reviewer(s) till 2016-03-23 08:22 UTC 
 The CQ bit was unchecked by commit-bot@chromium.org 
 Try jobs failed on following builders:
   Test-Ubuntu-GCC-ShuttleA-GPU-GTX660-x86_64-Release-Trybot on client.skia (JOB_FAILED, http://build.chromium.org/p/client.skia/builders/Test-Ubuntu-GCC-ShuttleA-GPU...)
   Build-Mac-Clang-x86_64-Release-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Mac-Clang-x86_...)
   Build-Ubuntu-Clang-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-Clang-x...)
   Build-Ubuntu-GCC-x86_64-Release-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-x86...)
 The CQ bit was checked by mtklein@google.com to run a CQ dry run 
 Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/40001 
 The CQ bit was unchecked by commit-bot@chromium.org 
 Dry run: This issue passed the CQ dry run. 
 mtklein@chromium.org changed reviewers: + herb@google.com - mtklein@google.com 
 +Herb 
 The CQ bit was checked by fmalita@chromium.org 
 lgtm 
 CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/40001 
 
            
              
 Message was sent while issue was closed.
 Description was changed: added the line
Committed:
https://skia.googlesource.com/skia/+/1443c6920c4b7aa80811c30ed9cdc81395d5df4f
 
            
              
        Committed patchset #3 (id:40001) as https://skia.googlesource.com/skia/+/1443c6920c4b7aa80811c30ed9cdc81395d5df4f 
 
            
              
        You may be able to move the shifts into the mults and divs for normalization. 
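
One possible reading of this suggestion, sketched under assumptions: if a channel is normalized to [0,1] after extraction, the right shift can be skipped by masking the byte in place and folding the power of two into the normalization constant. The function names below are hypothetical, and whether the patched code normalizes at this point is not stated in the review.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics only
#include <cstdint>

// Hypothetical baseline: shift the green byte down, mask, convert, scale by 1/255.
inline __m128 green_shift_then_scale(__m128i v) {
    const __m128i g = _mm_and_si128(_mm_srli_epi32(v, 8), _mm_set1_epi32(0xFF));
    return _mm_mul_ps(_mm_cvtepi32_ps(g), _mm_set1_ps(1.0f / 255.0f));
}

// Hypothetical variant: mask the green byte where it sits (value is g * 256),
// then let the multiply undo both the shift and the 255 scaling.
inline __m128 green_scale_only(__m128i v) {
    const __m128i g = _mm_and_si128(v, _mm_set1_epi32(0xFF00));
    return _mm_mul_ps(_mm_cvtepi32_ps(g), _mm_set1_ps(1.0f / (255.0f * 256.0f)));
}
```

This saves one shift per middle channel. It does not carry over directly to the top byte: `_mm_cvtepi32_ps` treats each lane as a signed integer, so a value masked with 0xFF000000 could convert as negative, and a logical shift by 24 is still the simpler choice there.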
 
            
              
        lgtm