OLD | NEW |
---|---|
1 ============================ | 1 ============================ |
2 PNaCl C/C++ Language Support | 2 PNaCl C/C++ Language Support |
3 ============================ | 3 ============================ |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :backlinks: none | 7 :backlinks: none |
8 :depth: 3 | 8 :depth: 3 |
9 | 9 |
10 Source language support | 10 Source language support |
(...skipping 209 matching lines...) | |
220 <http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors >`_ | 220 <http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors >`_ |
221 and `GCC vectors | 221 and `GCC vectors |
222 <http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these | 222 <http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these |
223 are well supported by different hardware platforms and don't require any | 223 are well supported by different hardware platforms and don't require any |
224 new compiler intrinsics. | 224 new compiler intrinsics. |
225 | 225 |
226 Vector types can be used through the ``vector_size`` attribute: | 226 Vector types can be used through the ``vector_size`` attribute: |
227 | 227 |
228 .. naclcode:: | 228 .. naclcode:: |
229 | 229 |
230 typedef int v4s __attribute__((vector_size(16))); | 230 #define VECTOR_BYTES 16 |
| 231 typedef int v4s __attribute__((vector_size(VECTOR_BYTES))); |
231 v4s a = {1,2,3,4}; | 232 v4s a = {1,2,3,4}; |
232 v4s b = {5,6,7,8}; | 233 v4s b = {5,6,7,8}; |
233 v4s c, d, e; | 234 v4s c, d, e; |
234 c = b + 1; /* c = b + {1,1,1,1}; */ | 235 c = b + 1; /* c = b + {1,1,1,1}; */ |
235 d = 2 * b; /* d = {2,2,2,2} * b; */ | 236 d = 2 * b; /* d = {2,2,2,2} * b; */ |
236 e = c + d; | 237 e = c + d; |
237 | 238 |
238 Vector comparisons are represented as a bitmask as wide as the compared | 239 Vector comparisons are represented as a bitmask as wide as the compared |
239 elements of all ``0`` or all ``1``: | 240 elements of all ``0`` or all ``1``: |
240 | 241 |
(...skipping 65 matching lines...) | |
306 .. naclcode:: | 307 .. naclcode:: |
307 | 308 |
308 typedef unsigned v4u __attribute__((vector_size(16))); | 309 typedef unsigned v4u __attribute__((vector_size(16))); |
309 template<typename T> | 310 template<typename T> |
310 void print(const T v) { | 311 void print(const T v) { |
311 for (size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i) | 312 for (size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i) |
312 std::cout << v[i] << ' '; | 313 std::cout << v[i] << ' '; |
313 std::cout << std::endl; | 314 std::cout << std::endl; |
314 } | 315 } |
315 | 316 |
316 Vector shuffles are currently unsupported but will be added soon. | 317 Vector shuffle (often called permutation or swizzle) operations are |
| 318 supported through ``__builtin_shufflevector``. The builtin has two |
| 319 vector arguments of the same element type, followed by a list of |
| 320 constant integers that specify the elements indices of the first two |
Derek Schuff (2014/05/06 17:29:50): "elements indices" -> "element indices"
JF (2014/05/06 21:11:02): Done.
| 321 vectors that should be extracted and returned in a new vector. These |
| 322 element indices are numbered sequentially starting with the first |
| 323 vector, continuing into the second vector. Thus, if ``vec1`` is a |
| 324 4-element vector, index ``5`` would refer to the second element of |
| 325 ``vec2``. An index of ``-1`` can be used to indicate that the |
| 326 corresponding element in the returned vector is a don’t care and can be |
| 327 optimized by the backend. |
| 328 |
| 329 The result of ``__builtin_shufflevector`` is a vector with the same |
| 330 element type as ``vec1`` / ``vec2`` but that has an element count equal |
| 331 to the number of indices specified. |
| 332 |
| 333 .. naclcode:: |
| 334 |
| 335 // identity operation - return 4-element vector v1. |
| 336 __builtin_shufflevector(v1, v1, 0, 1, 2, 3) |
| 337 |
| 338 // "Splat" element 0 of V1 into a 4-element result. |
| 339 __builtin_shufflevector(V1, V1, 0, 0, 0, 0) |
| 340 |
| 341 // Reverse 4-element vector V1. |
| 342 __builtin_shufflevector(V1, V1, 3, 2, 1, 0) |
| 343 |
| 344 // Concatenate every other element of 4-element vectors V1 and V2. |
| 345 __builtin_shufflevector(V1, V2, 0, 2, 4, 6) |
| 346 |
| 347 // Concatenate every other element of 8-element vectors V1 and V2. |
| 348 __builtin_shufflevector(V1, V2, 0, 2, 4, 6, 8, 10, 12, 14) |
| 349 |
| 350 // Shuffle v1 with some elements being undefined |
| 351 __builtin_shufflevector(v1, v1, 3, -1, 1, -1) |
317 | 352 |
318 Auto-Vectorization | 353 Auto-Vectorization |
319 ------------------ | 354 ------------------ |
320 | 355 |
321 Auto-vectorization is currently not enabled for Portable Native Client, | 356 Auto-vectorization is currently not enabled for Portable Native Client, |
322 but will be in a future release. | 357 but will be in a future release. |
323 | 358 |
324 Undefined Behavior | 359 Undefined Behavior |
325 ================== | 360 ================== |
326 | 361 |
(...skipping 76 matching lines...) | |
403 A similar feature is **thread suspension**: The ability to | 438 A similar feature is **thread suspension**: The ability to |
404 asynchronously suspend and resume a thread and inspect or modify its | 439 asynchronously suspend and resume a thread and inspect or modify its |
405 execution state (such as register state). | 440 execution state (such as register state). |
406 | 441 |
407 Neither PNaCl nor NaCl currently support asynchronous interruption | 442 Neither PNaCl nor NaCl currently support asynchronous interruption |
408 or suspension of threads. | 443 or suspension of threads. |
409 | 444 |
410 If PNaCl were to support either of these, the interaction of | 445 If PNaCl were to support either of these, the interaction of |
411 ``volatile`` and atomics with same-thread signal handling would need | 446 ``volatile`` and atomics with same-thread signal handling would need |
412 to be carefully detailed. | 447 to be carefully detailed. |