Chromium Code Reviews

Side by Side Diff: source/common/unicode/unistr.h

Issue 1621843002: ICU 56 update step 1 (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/icu.git@561
Patch Set: Created 4 years, 11 months ago
OLD | NEW
1 /* 1 /*
2 ********************************************************************** 2 **********************************************************************
3 * Copyright (C) 1998-2013, International Business Machines 3 * Copyright (C) 1998-2015, International Business Machines
4 * Corporation and others. All Rights Reserved. 4 * Corporation and others. All Rights Reserved.
5 ********************************************************************** 5 **********************************************************************
6 * 6 *
7 * File unistr.h 7 * File unistr.h
8 * 8 *
9 * Modification History: 9 * Modification History:
10 * 10 *
11 * Date Name Description 11 * Date Name Description
12 * 09/25/98 stephen Creation. 12 * 09/25/98 stephen Creation.
13 * 11/11/98 stephen Changed per 11/9 code review. 13 * 11/11/98 stephen Changed per 11/9 code review.
(...skipping 151 matching lines...)
165 #ifndef UNISTR_FROM_STRING_EXPLICIT 165 #ifndef UNISTR_FROM_STRING_EXPLICIT
166 # if defined(U_COMBINED_IMPLEMENTATION) || defined(U_COMMON_IMPLEMENTATION) || defined(U_I18N_IMPLEMENTATION) || defined(U_IO_IMPLEMENTATION) 166 # if defined(U_COMBINED_IMPLEMENTATION) || defined(U_COMMON_IMPLEMENTATION) || defined(U_I18N_IMPLEMENTATION) || defined(U_IO_IMPLEMENTATION)
167 // Auto-"explicit" in ICU library code. 167 // Auto-"explicit" in ICU library code.
168 # define UNISTR_FROM_STRING_EXPLICIT explicit 168 # define UNISTR_FROM_STRING_EXPLICIT explicit
169 # else 169 # else
170 // Empty by default for source code compatibility. 170 // Empty by default for source code compatibility.
171 # define UNISTR_FROM_STRING_EXPLICIT 171 # define UNISTR_FROM_STRING_EXPLICIT
172 # endif 172 # endif
173 #endif 173 #endif
174 174
175 /* Cannot make the following #ifndef U_HIDE_INTERNAL_API,
176 it is used to construct other non-internal constants */
177 /**
178 * \def UNISTR_OBJECT_SIZE
179 * Desired sizeof(UnicodeString) in bytes.
180 * It should be a multiple of sizeof(pointer) to avoid unusable space for padding.
181 * The object size may want to be a multiple of 16 bytes,
182 * which is a common granularity for heap allocation.
183 *
184 * Any space inside the object beyond sizeof(vtable pointer) + 2
185 * is available for storing short strings inside the object.
186 * The bigger the object, the longer a string that can be stored inside the object,
187 * without additional heap allocation.
188 *
189 * Depending on a platform's pointer size, pointer alignment requirements,
190 * and struct padding, the compiler will usually round up sizeof(UnicodeString)
191 * to 4 * sizeof(pointer) (or 3 * sizeof(pointer) for P128 data models),
192 * to hold the fields for heap-allocated strings.
193 * Such a minimum size also ensures that the object is easily large enough
194 * to hold at least 2 UChars, for one supplementary code point (U16_MAX_LENGTH).
195 *
196 * sizeof(UnicodeString) >= 48 should work for all known platforms.
197 *
198 * For example, on a 64-bit machine where sizeof(vtable pointer) is 8,
199 * sizeof(UnicodeString) = 64 would leave space for
200 * (64 - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR = (64 - 8 - 2) / 2 = 27
201 * UChars stored inside the object.
202 *
203 * The minimum object size on a 64-bit machine would be
204 * 4 * sizeof(pointer) = 4 * 8 = 32 bytes,
205 * and the internal buffer would hold up to 11 UChars in that case.
206 *
207 * @see U16_MAX_LENGTH
208 * @draft ICU 56
209 */
210 #ifndef UNISTR_OBJECT_SIZE
211 # define UNISTR_OBJECT_SIZE 64
212 #endif
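Reviewer note: the arithmetic in the UNISTR_OBJECT_SIZE comment can be checked with a small standalone sketch. The helper name `stackBufferCapacity` is hypothetical and not part of ICU; it only restates the formula `(objectSize - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR` from the comment, assuming 2-byte code units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an ICU function): how many 16-bit code units fit
// into the in-object buffer for a given object size and pointer size.
// The "- 2" accounts for the 2 bytes of flags and short length, and the
// division by 2 assumes U_SIZEOF_UCHAR == 2.
constexpr int32_t stackBufferCapacity(std::size_t objectSize,
                                      std::size_t pointerSize) {
    return static_cast<int32_t>((objectSize - pointerSize - 2) / 2);
}

// The two cases spelled out in the doc comment for 64-bit machines.
static_assert(stackBufferCapacity(64, 8) == 27, "default 64-byte object");
static_assert(stackBufferCapacity(32, 8) == 11, "minimum 32-byte object");
```

With the same formula, a 48-byte object on a 64-bit machine would hold 19 code units, and a 64-byte object with 4-byte pointers would hold 29.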
213
175 /** 214 /**
176 * UnicodeString is a string class that stores Unicode characters directly and provides 215 * UnicodeString is a string class that stores Unicode characters directly and provides
177 * similar functionality as the Java String and StringBuffer classes. 216 * similar functionality as the Java String and StringBuffer/StringBuilder classes.
178 * It is a concrete implementation of the abstract class Replaceable (for transliteration). 217 * It is a concrete implementation of the abstract class Replaceable (for transliteration).
179 * 218 *
219 * A UnicodeString may also "alias" an external array of characters
220 * (that is, point to it, rather than own the array)
221 * whose lifetime must then at least match the lifetime of the aliasing object.
222 * This aliasing may be preserved when returning a UnicodeString by value,
223 * depending on the compiler and the function implementation,
224 * via Return Value Optimization (RVO) or the move assignment operator.
225 * (However, the copy assignment operator does not preserve aliasing.)
226 * For details see the description of storage models at the end of the class API docs
227 * and in the User Guide chapter linked from there.
228 *
180 * The UnicodeString class is not suitable for subclassing. 229 * The UnicodeString class is not suitable for subclassing.
181 * 230 *
182 * <p>For an overview of Unicode strings in C and C++ see the 231 * <p>For an overview of Unicode strings in C and C++ see the
183 * <a href="http://icu-project.org/userguide/strings.html">User Guide Strings chapter</a>.</p> 232 * <a href="http://userguide.icu-project.org/strings#TOC-Strings-in-C-C-">User Guide Strings chapter</a>.</p>
184 * 233 *
185 * <p>In ICU, a Unicode string consists of 16-bit Unicode <em>code units</em>. 234 * <p>In ICU, a Unicode string consists of 16-bit Unicode <em>code units</em>.
186 * A Unicode character may be stored with either one code unit 235 * A Unicode character may be stored with either one code unit
187 * (the most common case) or with a matched pair of special code units 236 * (the most common case) or with a matched pair of special code units
188 * ("surrogates"). The data type for code units is UChar. 237 * ("surrogates"). The data type for code units is UChar.
189 * For single-character handling, a Unicode character code <em>point</em> is a value 238 * For single-character handling, a Unicode character code <em>point</em> is a value
190 * in the range 0..0x10ffff. ICU uses the UChar32 type for code points.</p> 239 * in the range 0..0x10ffff. ICU uses the UChar32 type for code points.</p>
191 * 240 *
192 * <p>Indexes and offsets into and lengths of strings always count code units, not code points. 241 * <p>Indexes and offsets into and lengths of strings always count code units, not code points.
193 * This is the same as with multi-byte char* strings in traditional string handling. 242 * This is the same as with multi-byte char* strings in traditional string handling.
(...skipping 34 matching lines...)
228 * This includes the const UnicodeString & parameters for 277 * This includes the const UnicodeString & parameters for
229 * copy construction, assignment, and cloning. 278 * copy construction, assignment, and cloning.
230 * 279 *
231 * <p>UnicodeString uses several storage methods. 280 * <p>UnicodeString uses several storage methods.
232 * String contents can be stored inside the UnicodeString object itself, 281 * String contents can be stored inside the UnicodeString object itself,
233 * in an allocated and shared buffer, or in an outside buffer that is "aliased". 282 * in an allocated and shared buffer, or in an outside buffer that is "aliased".
234 * Most of this is done transparently, but careful aliasing in particular provides 283 * Most of this is done transparently, but careful aliasing in particular provides
235 * significant performance improvements. 284 * significant performance improvements.
236 * Also, the internal buffer is accessible via special functions. 285 * Also, the internal buffer is accessible via special functions.
237 * For details see the 286 * For details see the
238 * <a href="http://icu-project.org/userguide/strings.html">User Guide Strings chapter</a>.</p> 287 * <a href="http://userguide.icu-project.org/strings#TOC-Maximizing-Performance-with-the-UnicodeString-Storage-Model">User Guide Strings chapter</a>.</p>
239 * 288 *
240 * @see utf.h 289 * @see utf.h
241 * @see CharacterIterator 290 * @see CharacterIterator
242 * @stable ICU 2.0 291 * @stable ICU 2.0
243 */ 292 */
244 class U_COMMON_API UnicodeString : public Replaceable 293 class U_COMMON_API UnicodeString : public Replaceable
245 { 294 {
246 public: 295 public:
247 296
248 /** 297 /**
(...skipping 1225 matching lines...)
1474 * @param target UnicodeString into which to copy characters. 1523 * @param target UnicodeString into which to copy characters.
1475 * @return A reference to <TT>target</TT> 1524 * @return A reference to <TT>target</TT>
1476 * @stable ICU 2.0 1525 * @stable ICU 2.0
1477 */ 1526 */
1478 virtual void extractBetween(int32_t start, 1527 virtual void extractBetween(int32_t start,
1479 int32_t limit, 1528 int32_t limit,
1480 UnicodeString& target) const; 1529 UnicodeString& target) const;
1481 1530
1482 /** 1531 /**
1483 * Copy the characters in the range 1532 * Copy the characters in the range
1484 * [<tt>start</TT>, <tt>start + length</TT>) into an array of characters. 1533 * [<tt>start</TT>, <tt>start + startLength</TT>) into an array of characters.
1485 * All characters must be invariant (see utypes.h). 1534 * All characters must be invariant (see utypes.h).
1486 * Use US_INV as the last, signature-distinguishing parameter. 1535 * Use US_INV as the last, signature-distinguishing parameter.
1487 * 1536 *
1488 * This function does not write any more than <code>targetLength</code> 1537 * This function does not write any more than <code>targetCapacity</code>
1489 * characters but returns the length of the entire output string 1538 * characters but returns the length of the entire output string
1490 * so that one can allocate a larger buffer and call the function again 1539 * so that one can allocate a larger buffer and call the function again
1491 * if necessary. 1540 * if necessary.
1492 * The output string is NUL-terminated if possible. 1541 * The output string is NUL-terminated if possible.
1493 * 1542 *
1494 * @param start offset of first character which will be copied 1543 * @param start offset of first character which will be copied
1495 * @param startLength the number of characters to extract 1544 * @param startLength the number of characters to extract
1496 * @param target the target buffer for extraction, can be NULL 1545 * @param target the target buffer for extraction, can be NULL
1497 * if targetLength is 0 1546 * if targetLength is 0
1498 * @param targetCapacity the length of the target buffer 1547 * @param targetCapacity the length of the target buffer
(...skipping 303 matching lines...)
1802 1851
1803 //======================================== 1852 //========================================
1804 // Write operations 1853 // Write operations
1805 //======================================== 1854 //========================================
1806 1855
1807 /* Assignment operations */ 1856 /* Assignment operations */
1808 1857
1809 /** 1858 /**
1810 * Assignment operator. Replace the characters in this UnicodeString 1859 * Assignment operator. Replace the characters in this UnicodeString
1811 * with the characters from <TT>srcText</TT>. 1860 * with the characters from <TT>srcText</TT>.
1861 *
1862 * Starting with ICU 2.4, the assignment operator and the copy constructor
1863 * allocate a new buffer and copy the buffer contents even for readonly aliases.
1864 * By contrast, the fastCopyFrom() function implements the old,
1865 * more efficient but less safe behavior
1866 * of making this string also a readonly alias to the same buffer.
1867 *
1868 * If the source object has an "open" buffer from getBuffer(minCapacity),
1869 * then the copy is an empty string.
1870 *
1812 * @param srcText The text containing the characters to replace 1871 * @param srcText The text containing the characters to replace
1813 * @return a reference to this 1872 * @return a reference to this
1814 * @stable ICU 2.0 1873 * @stable ICU 2.0
1874 * @see fastCopyFrom
1815 */ 1875 */
1816 UnicodeString &operator=(const UnicodeString &srcText); 1876 UnicodeString &operator=(const UnicodeString &srcText);
1817 1877
1818 /** 1878 /**
1819 * Almost the same as the assignment operator. 1879 * Almost the same as the assignment operator.
1820 * Replace the characters in this UnicodeString 1880 * Replace the characters in this UnicodeString
1821 * with the characters from <code>srcText</code>. 1881 * with the characters from <code>srcText</code>.
1822 * 1882 *
1823 * This function works the same as the assignment operator 1883 * This function works the same as the assignment operator
1824 * for all strings except for ones that are readonly aliases. 1884 * for all strings except for ones that are readonly aliases.
1825 * 1885 *
1826 * Starting with ICU 2.4, the assignment operator and the copy constructor 1886 * Starting with ICU 2.4, the assignment operator and the copy constructor
1827 * allocate a new buffer and copy the buffer contents even for readonly aliases. 1887 * allocate a new buffer and copy the buffer contents even for readonly aliases.
1828 * This function implements the old, more efficient but less safe behavior 1888 * This function implements the old, more efficient but less safe behavior
1829 * of making this string also a readonly alias to the same buffer. 1889 * of making this string also a readonly alias to the same buffer.
1830 * 1890 *
1831 * The fastCopyFrom function must be used only if it is known that the lifetime of 1891 * The fastCopyFrom function must be used only if it is known that the lifetime of
1832 * this UnicodeString does not exceed the lifetime of the aliased buffer 1892 * this UnicodeString does not exceed the lifetime of the aliased buffer
1833 * including its contents, for example for strings from resource bundles 1893 * including its contents, for example for strings from resource bundles
1834 * or aliases to string constants. 1894 * or aliases to string constants.
1835 * 1895 *
1896 * If the source object has an "open" buffer from getBuffer(minCapacity),
1897 * then the copy is an empty string.
1898 *
1836 * @param src The text containing the characters to replace. 1899 * @param src The text containing the characters to replace.
1837 * @return a reference to this 1900 * @return a reference to this
1838 * @stable ICU 2.4 1901 * @stable ICU 2.4
1839 */ 1902 */
1840 UnicodeString &fastCopyFrom(const UnicodeString &src); 1903 UnicodeString &fastCopyFrom(const UnicodeString &src);
1841 1904
1905 #ifndef U_HIDE_DRAFT_API
1906 #if U_HAVE_RVALUE_REFERENCES
1907 /**
1908 * Move assignment operator, might leave src in bogus state.
1909 * This string will have the same contents and state that the source string had.
1910 * The behavior is undefined if *this and src are the same object.
1911 * @param src source string
1912 * @return *this
1913 * @draft ICU 56
1914 */
1915 UnicodeString &operator=(UnicodeString &&src) U_NOEXCEPT {
1916 return moveFrom(src);
1917 }
1918 #endif
1919 /**
1920 * Move assignment, might leave src in bogus state.
1921 * This string will have the same contents and state that the source string had.
1922 * The behavior is undefined if *this and src are the same object.
1923 *
1924 * Can be called explicitly, does not need C++11 support.
1925 * @param src source string
1926 * @return *this
1927 * @draft ICU 56
1928 */
1929 UnicodeString &moveFrom(UnicodeString &src) U_NOEXCEPT;
1930
1931 /**
1932 * Swap strings.
1933 * @param other other string
1934 * @draft ICU 56
1935 */
1936 void swap(UnicodeString &other) U_NOEXCEPT;
1937
1938 /**
1939 * Non-member UnicodeString swap function.
1940 * @param s1 will get s2's contents and state
1941 * @param s2 will get s1's contents and state
1942 * @draft ICU 56
1943 */
1944 friend U_COMMON_API inline void U_EXPORT2
1945 swap(UnicodeString &s1, UnicodeString &s2) U_NOEXCEPT {
1946 s1.swap(s2);
1947 }
1948 #endif /* U_HIDE_DRAFT_API */
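Reviewer note: the contract described for moveFrom(), the rvalue operator=, and swap() can be illustrated with a toy class. This is only a sketch under stated assumptions: `ToyString` and all of its members are hypothetical, it stores plain `char` data on the heap, and it does not reflect how UnicodeString actually manages its storage. It shows the target taking over the source's contents and state, the source being left bogus, and why self-move is undefined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical toy type for illustration only; this is not ICU's UnicodeString.
class ToyString {
public:
    ToyString() : fArray(nullptr), fLength(0), fBogus(false) {}
    explicit ToyString(const char *s)
            : fArray(nullptr), fLength(static_cast<int32_t>(std::strlen(s))),
              fBogus(false) {
        fArray = new char[fLength + 1];
        std::memcpy(fArray, s, fLength + 1);
    }
    ~ToyString() { delete[] fArray; }
    ToyString(const ToyString &) = delete;
    ToyString &operator=(const ToyString &) = delete;

    // Takes over src's buffer and state and leaves src bogus.
    // Precondition: this != &src. With self-move, the buffer would be
    // deleted before being "taken over", hence undefined behavior.
    ToyString &moveFrom(ToyString &src) noexcept {
        delete[] fArray;
        fArray = src.fArray;
        fLength = src.fLength;
        fBogus = src.fBogus;
        src.fArray = nullptr;
        src.fLength = 0;
        src.fBogus = true;
        return *this;
    }

    // Inside the function, src is a named lvalue, so it binds to moveFrom().
    ToyString &operator=(ToyString &&src) noexcept { return moveFrom(src); }

    // Exchanges contents and state without allocating.
    void swap(ToyString &other) noexcept {
        char *a = fArray; fArray = other.fArray; other.fArray = a;
        int32_t n = fLength; fLength = other.fLength; other.fLength = n;
        bool b = fBogus; fBogus = other.fBogus; other.fBogus = b;
    }

    bool isBogus() const { return fBogus; }
    int32_t length() const { return fLength; }
    const char *data() const { return fArray; }

private:
    char *fArray;
    int32_t fLength;
    bool fBogus;
};
```

Because moveFrom() takes a plain lvalue reference, it can be called explicitly from code built without rvalue-reference support, which matches the "does not need C++11 support" remark in the patch.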
1949
1842 /** 1950 /**
1843 * Assignment operator. Replace the characters in this UnicodeString 1951 * Assignment operator. Replace the characters in this UnicodeString
1844 * with the code unit <TT>ch</TT>. 1952 * with the code unit <TT>ch</TT>.
1845 * @param ch the code unit to replace 1953 * @param ch the code unit to replace
1846 * @return a reference to this 1954 * @return a reference to this
1847 * @stable ICU 2.0 1955 * @stable ICU 2.0
1848 */ 1956 */
1849 inline UnicodeString& operator= (UChar ch); 1957 inline UnicodeString& operator= (UChar ch);
1850 1958
1851 /** 1959 /**
(...skipping 1229 matching lines...)
3081 * @param inv Signature-distinguishing paramater, use US_INV. 3189 * @param inv Signature-distinguishing paramater, use US_INV.
3082 * 3190 *
3083 * @see US_INV 3191 * @see US_INV
3084 * @stable ICU 3.2 3192 * @stable ICU 3.2
3085 */ 3193 */
3086 UnicodeString(const char *src, int32_t length, enum EInvariant inv); 3194 UnicodeString(const char *src, int32_t length, enum EInvariant inv);
3087 3195
3088 3196
3089 /** 3197 /**
3090 * Copy constructor. 3198 * Copy constructor.
3199 *
3200 * Starting with ICU 2.4, the assignment operator and the copy constructor
3201 * allocate a new buffer and copy the buffer contents even for readonly aliases.
3202 * By contrast, the fastCopyFrom() function implements the old,
3203 * more efficient but less safe behavior
3204 * of making this string also a readonly alias to the same buffer.
3205 *
3206 * If the source object has an "open" buffer from getBuffer(minCapacity),
3207 * then the copy is an empty string.
3208 *
3091 * @param that The UnicodeString object to copy. 3209 * @param that The UnicodeString object to copy.
3092 * @stable ICU 2.0 3210 * @stable ICU 2.0
3211 * @see fastCopyFrom
3093 */ 3212 */
3094 UnicodeString(const UnicodeString& that); 3213 UnicodeString(const UnicodeString& that);
3095 3214
3215 #ifndef U_HIDE_DRAFT_API
3216 #if U_HAVE_RVALUE_REFERENCES
3217 /**
3218 * Move constructor, might leave src in bogus state.
3219 * This string will have the same contents and state that the source string had.
3220 * @param src source string
3221 * @draft ICU 56
3222 */
3223 UnicodeString(UnicodeString &&src) U_NOEXCEPT;
3224 #endif
3225 #endif /* U_HIDE_DRAFT_API */
3226
3096 /** 3227 /**
3097 * 'Substring' constructor from tail of source string. 3228 * 'Substring' constructor from tail of source string.
3098 * @param src The UnicodeString object to copy. 3229 * @param src The UnicodeString object to copy.
3099 * @param srcStart The offset into <tt>src</tt> at which to start copying. 3230 * @param srcStart The offset into <tt>src</tt> at which to start copying.
3100 * @stable ICU 2.2 3231 * @stable ICU 2.2
3101 */ 3232 */
3102 UnicodeString(const UnicodeString& src, int32_t srcStart); 3233 UnicodeString(const UnicodeString& src, int32_t srcStart);
3103 3234
3104 /** 3235 /**
3105 * 'Substring' constructor from subrange of source string. 3236 * 'Substring' constructor from subrange of source string.
(...skipping 245 matching lines...)
3351 const UnicodeString& srcText, 3482 const UnicodeString& srcText,
3352 int32_t srcStart, 3483 int32_t srcStart,
3353 int32_t srcLength); 3484 int32_t srcLength);
3354 3485
3355 UnicodeString& doReplace(int32_t start, 3486 UnicodeString& doReplace(int32_t start,
3356 int32_t length, 3487 int32_t length,
3357 const UChar *srcChars, 3488 const UChar *srcChars,
3358 int32_t srcStart, 3489 int32_t srcStart,
3359 int32_t srcLength); 3490 int32_t srcLength);
3360 3491
3492 UnicodeString& doAppend(const UnicodeString& src, int32_t srcStart, int32_t srcLength);
3493 UnicodeString& doAppend(const UChar *srcChars, int32_t srcStart, int32_t srcLength);
3494
3361 UnicodeString& doReverse(int32_t start, 3495 UnicodeString& doReverse(int32_t start,
3362 int32_t length); 3496 int32_t length);
3363 3497
3364 // calculate hash code 3498 // calculate hash code
3365 int32_t doHashCode(void) const; 3499 int32_t doHashCode(void) const;
3366 3500
3367 // get pointer to start of array 3501 // get pointer to start of array
3368 // these do not check for kOpenGetBuffer, unlike the public getBuffer() function 3502 // these do not check for kOpenGetBuffer, unlike the public getBuffer() function
3369 inline UChar* getArrayStart(void); 3503 inline UChar* getArrayStart(void);
3370 inline const UChar* getArrayStart(void) const; 3504 inline const UChar* getArrayStart(void) const;
3371 3505
3506 inline UBool hasShortLength() const;
3507 inline int32_t getShortLength() const;
3508
3372 // A UnicodeString object (not necessarily its current buffer) 3509 // A UnicodeString object (not necessarily its current buffer)
3373 // is writable unless it isBogus() or it has an "open" getBuffer(minCapacity). 3510 // is writable unless it isBogus() or it has an "open" getBuffer(minCapacity).
3374 inline UBool isWritable() const; 3511 inline UBool isWritable() const;
3375 3512
3376 // Is the current buffer writable? 3513 // Is the current buffer writable?
3377 inline UBool isBufferWritable() const; 3514 inline UBool isBufferWritable() const;
3378 3515
3379 // None of the following does releaseArray(). 3516 // None of the following does releaseArray().
3380 inline void setLength(int32_t len); // sets only fShortLength and fLength 3517 inline void setZeroLength();
3381 inline void setToEmpty(); // sets fFlags=kShortString 3518 inline void setShortLength(int32_t len);
3382 inline void setArray(UChar *array, int32_t len, int32_t capacity); // does not set fFlags 3519 inline void setLength(int32_t len);
3520 inline void setToEmpty();
3521 inline void setArray(UChar *array, int32_t len, int32_t capacity); // sets length but not flags
3383 3522
3384 // allocate the array; result may be fStackBuffer 3523 // allocate the array; result may be the stack buffer
3385 // sets refCount to 1 if appropriate 3524 // sets refCount to 1 if appropriate
3386 // sets fArray, fCapacity, and fFlags 3525 // sets fArray, fCapacity, and flags
3526 // sets length to 0
3387 // returns boolean for success or failure 3527 // returns boolean for success or failure
3388 UBool allocate(int32_t capacity); 3528 UBool allocate(int32_t capacity);
3389 3529
3390 // release the array if owned 3530 // release the array if owned
3391 void releaseArray(void); 3531 void releaseArray(void);
3392 3532
3393 // turn a bogus string into an empty one 3533 // turn a bogus string into an empty one
3394 void unBogus(); 3534 void unBogus();
3395 3535
3396 // implements assigment operator, copy constructor, and fastCopyFrom() 3536 // implements assigment operator, copy constructor, and fastCopyFrom()
3397 UnicodeString &copyFrom(const UnicodeString &src, UBool fastCopy=FALSE); 3537 UnicodeString &copyFrom(const UnicodeString &src, UBool fastCopy=FALSE);
3398 3538
3539 // Copies just the fields without memory management.
3540 void copyFieldsFrom(UnicodeString &src, UBool setSrcToBogus) U_NOEXCEPT;
3541
3399 // Pin start and limit to acceptable values. 3542 // Pin start and limit to acceptable values.
3400 inline void pinIndex(int32_t& start) const; 3543 inline void pinIndex(int32_t& start) const;
3401 inline void pinIndices(int32_t& start, 3544 inline void pinIndices(int32_t& start,
3402 int32_t& length) const; 3545 int32_t& length) const;
3403 3546
3404 #if !UCONFIG_NO_CONVERSION 3547 #if !UCONFIG_NO_CONVERSION
3405 3548
3406 /* Internal extract() using UConverter. */ 3549 /* Internal extract() using UConverter. */
3407 int32_t doExtract(int32_t start, int32_t length, 3550 int32_t doExtract(int32_t start, int32_t length,
3408 char *dest, int32_t destCapacity, 3551 char *dest, int32_t destCapacity,
(...skipping 51 matching lines...)
3460 UnicodeString & 3603 UnicodeString &
3461 caseMap(const UCaseMap *csm, UStringCaseMapper *stringCaseMapper); 3604 caseMap(const UCaseMap *csm, UStringCaseMapper *stringCaseMapper);
3462 3605
3463 // ref counting 3606 // ref counting
3464 void addRef(void); 3607 void addRef(void);
3465 int32_t removeRef(void); 3608 int32_t removeRef(void);
3466 int32_t refCount(void) const; 3609 int32_t refCount(void) const;
3467 3610
3468 // constants 3611 // constants
3469 enum { 3612 enum {
3470 // Set the stack buffer size so that sizeof(UnicodeString) is, 3613 /**
3471 // naturally (without padding), a multiple of sizeof(pointer). 3614 * Size of stack buffer for short strings.
3472 US_STACKBUF_SIZE= sizeof(void *)==4 ? 13 : 15, // Size of stack buffer for short strings 3615 * Must be at least U16_MAX_LENGTH for the single-code point constructor to work.
3473 kInvalidUChar=0xffff, // invalid UChar index 3616 * @see UNISTR_OBJECT_SIZE
3617 */
3618 US_STACKBUF_SIZE=(int32_t)(UNISTR_OBJECT_SIZE-sizeof(void *)-2)/U_SIZEOF_UCHAR,
3619 kInvalidUChar=0xffff, // U+FFFF returned by charAt(invalid index)
3474 kGrowSize=128, // grow size for this buffer 3620 kGrowSize=128, // grow size for this buffer
3475 kInvalidHashCode=0, // invalid hash code 3621 kInvalidHashCode=0, // invalid hash code
3476 kEmptyHashCode=1, // hash code for empty string 3622 kEmptyHashCode=1, // hash code for empty string
3477 3623
3478 // bit flag values for fFlags 3624 // bit flag values for fLengthAndFlags
3479 kIsBogus=1, // this string is bogus, i.e., not valid or NULL 3625 kIsBogus=1, // this string is bogus, i.e., not valid or NULL
3480 kUsingStackBuffer=2,// using fUnion.fStackBuffer instead of fUnion.fFields 3626 kUsingStackBuffer=2,// using fUnion.fStackFields instead of fUnion.fFields
3481 kRefCounted=4, // there is a refCount field before the characters in fArray 3627 kRefCounted=4, // there is a refCount field before the characters in fArray
3482 kBufferIsReadonly=8,// do not write to this buffer 3628 kBufferIsReadonly=8,// do not write to this buffer
3483 kOpenGetBuffer=16, // getBuffer(minCapacity) was called (is "open"), 3629 kOpenGetBuffer=16, // getBuffer(minCapacity) was called (is "open"),
3484 // and releaseBuffer(newLength) must be called 3630 // and releaseBuffer(newLength) must be called
3631 kAllStorageFlags=0x1f,
3632
3633 kLengthShift=5, // remaining 11 bits for non-negative short length, or negative if long
3634 kLength1=1<<kLengthShift,
3635 kMaxShortLength=0x3ff, // max non-negative short length (leaves top bit 0)
3636 kLengthIsLarge=0xffe0, // short length < 0, real length is in fUnion.fFields.fLength
3485 3637
3486 // combined values for convenience 3638 // combined values for convenience
3487 kShortString=kUsingStackBuffer, 3639 kShortString=kUsingStackBuffer,
3488 kLongString=kRefCounted, 3640 kLongString=kRefCounted,
3489 kReadonlyAlias=kBufferIsReadonly, 3641 kReadonlyAlias=kBufferIsReadonly,
3490 kWritableAlias=0 3642 kWritableAlias=0
3491 }; 3643 };
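Reviewer note: the new constants imply that the low 5 bits of the 16-bit fLengthAndFlags hold the storage flags and the upper 11 bits hold a signed short length, with a negative value meaning "look in fLength". The sketch below shows one way such packing could work. The constant values are copied from the enum above, but the `LengthFields` struct and the free functions are hypothetical stand-ins; the patch shown here only declares hasShortLength(), getShortLength(), setShortLength() and setLength() without their bodies, so this is an assumption about their behavior, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the enum in the patch.
enum {
    kUsingStackBuffer = 2,
    kAllStorageFlags = 0x1f,
    kLengthShift = 5,
    kMaxShortLength = 0x3ff,
    kLengthIsLarge = 0xffe0
};

// Hypothetical stand-in for the two length-bearing fields of the union.
struct LengthFields {
    int16_t lengthAndFlags;  // flags in bits 0..4, short length in bits 5..15
    int32_t longLength;      // only meaningful when the short length is negative
};

// A non-negative 16-bit value means the short length is valid.
inline bool hasShortLength(const LengthFields &f) {
    return f.lengthAndFlags >= 0;
}

inline int32_t getLength(const LengthFields &f) {
    return hasShortLength(f) ? (f.lengthAndFlags >> kLengthShift) : f.longLength;
}

// Stores len while preserving the 5 storage flag bits.
inline void setLength(LengthFields &f, int32_t len) {
    if (len <= kMaxShortLength) {
        f.lengthAndFlags = static_cast<int16_t>(
            (f.lengthAndFlags & kAllStorageFlags) | (len << kLengthShift));
    } else {
        // Setting all 11 length bits makes the 16-bit value negative.
        f.lengthAndFlags = static_cast<int16_t>(f.lengthAndFlags | kLengthIsLarge);
        f.longLength = len;
    }
}
```

Under this reading, lengths up to 1023 need no separate length field, compared with the 0..127 range of the old int8_t fShortLength.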
3492 3644
3493 friend class UnicodeStringAppendable; 3645 friend class UnicodeStringAppendable;
3494 3646
(...skipping 11 matching lines...)
3506 * on 64-bit machines (8-byte pointers), it should be 40 bytes. 3658 * on 64-bit machines (8-byte pointers), it should be 40 bytes.
3507 * 3659 *
3508 * We use a hack to achieve this. 3660 * We use a hack to achieve this.
3509 * 3661 *
3510 * With at least some compilers, each of the following is forced to 3662 * With at least some compilers, each of the following is forced to
3511 * a multiple of sizeof(pointer) [the largest field base unit here is a data pointer], 3663 * a multiple of sizeof(pointer) [the largest field base unit here is a data pointer],
3512 * rounded up with additional padding if the fields do not already fit that requirement: 3664 * rounded up with additional padding if the fields do not already fit that requirement:
3513 * - sizeof(class UnicodeString) 3665 * - sizeof(class UnicodeString)
3514 * - offsetof(UnicodeString, fUnion) 3666 * - offsetof(UnicodeString, fUnion)
3515 * - sizeof(fUnion) 3667 * - sizeof(fUnion)
3516 * - sizeof(fFields) 3668 * - sizeof(fStackFields)
3517 * 3669 *
3518 * In order to avoid padding, we make sizeof(fStackBuffer)=16 (=8 UChars) 3670 * We optimize for the longest possible internal buffer for short strings.
3519 * which is at least as large as sizeof(fFields) on 32-bit and 64-bit machines. 3671 * fUnion.fStackFields begins with 2 bytes for storage flags
3672 * and the length of relatively short strings,
3673 * followed by the buffer for short string contents.
3674 * There is no padding inside fStackFields.
3675 *
3676 * Heap-allocated and aliased strings use fUnion.fFields.
3677 * Both fStackFields and fFields must begin with the same fields for flags and short length,
3678 * that is, those must have the same memory offsets inside the object,
3679 * because the flags must be inspected in order to decide which half of fUnion is being used.
3680 * We assume that the compiler does not reorder the fields.
3681 *
3520 * (Padding at the end of fFields is ok: 3682 * (Padding at the end of fFields is ok:
3521 * As long as there is no padding after fStackBuffer, it is not wasted space.) 3683 * As long as it is no larger than fStackFields, it is not wasted space.)
3522 * 3684 *
3523 * We further assume that the compiler does not reorder the fields, 3685 * For some of the history of the UnicodeString class fields layout, see
3524 * so that fRestOfStackBuffer (which holds a few more UChars) immediately follows after fUnion, 3686 * - ICU ticket #11551 "longer UnicodeString contents in stack buffer"
3525 * with at most some padding (but no other field) in between. 3687 * - ICU ticket #11336 "UnicodeString: recombine stack buffer arrays"
3526 * (Padding there would be wasted space, but functionally harmless.) 3688 * - ICU ticket #8322 "why is sizeof(UnicodeString)==48?"
3527 *
3528 * We use a few more sizeof(pointer)'s chunks of space with
3529 * fRestOfStackBuffer, fShortLength and fFlags,
3530 * to get up exactly to the intended sizeof(UnicodeString).
3531 */ 3689 */
3532 // (implicit) *vtable; 3690 // (implicit) *vtable;
3533 union StackBufferOrFields { 3691 union StackBufferOrFields {
3534 // fStackBuffer is used iff (fFlags&kUsingStackBuffer) 3692 // fStackFields is used iff (fLengthAndFlags&kUsingStackBuffer) else fFields is used.
3535 // else fFields is used 3693 // Each struct of the union must begin with fLengthAndFlags.
3536 UChar fStackBuffer[8]; // buffer for short strings, together with fRestOfStackBuffer
3537 struct { 3694 struct {
3695 int16_t fLengthAndFlags; // bit fields: see constants above
3696 UChar fBuffer[US_STACKBUF_SIZE]; // buffer for short strings
3697 } fStackFields;
3698 struct {
3699 int16_t fLengthAndFlags; // bit fields: see constants above
3700 int32_t fLength; // number of characters in fArray if >127; else undefined
3701 int32_t fCapacity; // capacity of fArray (in UChars)
3702 // array pointer last to minimize padding for machines with P128 data model
3703 // or pointer sizes that are not a power of 2
3538 UChar *fArray; // the Unicode data 3704 UChar *fArray; // the Unicode data
3539 int32_t fCapacity; // capacity of fArray (in UChars)
3540 int32_t fLength; // number of characters in fArray if >127; else undefined
3541 } fFields; 3705 } fFields;
3542 } fUnion; 3706 } fUnion;
3543 UChar fRestOfStackBuffer[US_STACKBUF_SIZE-8];
3544 int8_t fShortLength; // 0..127: length <0: real length is in fUnion.fFields.fLength
3545 uint8_t fFlags; // bit flags: see constants above
3546 }; 3707 };
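
Note on the new layout: the comment "Each struct of the union must begin with fLengthAndFlags" is what lets the inline accessors read the flags through fUnion.fFields even when the stack buffer is active. A minimal standalone sketch of that idea follows. It is not the ICU class itself: the namespace, the named struct types, and the constant values (kStackBufSize, kUsingStackBuffer) are assumptions made for illustration, since the real definitions sit in a part of unistr.h that this diff skips.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace layout_sketch {

// Assumed values for illustration only; the real constants are defined
// in a part of unistr.h that this diff does not show.
constexpr int kStackBufSize = 27;
constexpr int16_t kUsingStackBuffer = 2;

// Short strings: flags plus an inline buffer.
struct StackFields {
    int16_t fLengthAndFlags;
    char16_t fBuffer[kStackBufSize];
};

// Longer strings: flags plus length, capacity, and a pointer (pointer last).
struct HeapFields {
    int16_t fLengthAndFlags;
    int32_t fLength;
    int32_t fCapacity;
    char16_t *fArray;
};

union StackBufferOrFields {
    StackFields fStackFields;
    HeapFields fFields;
};

// Both structs are standard-layout and start with the same member, so
// fLengthAndFlags forms a common initial sequence at offset 0.
static_assert(std::is_standard_layout<StackFields>::value, "standard layout");
static_assert(std::is_standard_layout<HeapFields>::value, "standard layout");
static_assert(offsetof(StackFields, fLengthAndFlags) == 0, "flags first");
static_assert(offsetof(HeapFields, fLengthAndFlags) == 0, "flags first");

// Reads the flags through fFields regardless of which struct was written,
// mirroring how the patched accessors test kUsingStackBuffer.
inline bool usesStackBuffer(const StackBufferOrFields &u) {
    return (u.fFields.fLengthAndFlags & kUsingStackBuffer) != 0;
}

}  // namespace layout_sketch
```

Because the shared member is the first field of two standard-layout structs, writing it through one struct and reading it through the other stays within the C++ common-initial-sequence rule for unions.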
3547 3708
3548 /** 3709 /**
3549 * Create a new UnicodeString with the concatenation of two others. 3710 * Create a new UnicodeString with the concatenation of two others.
3550 * 3711 *
3551 * @param s1 The first string to be copied to the new one. 3712 * @param s1 The first string to be copied to the new one.
3552 * @param s2 The second string to be copied to the new one, after s1. 3713 * @param s2 The second string to be copied to the new one, after s1.
3553 * @return UnicodeString(s1).append(s2) 3714 * @return UnicodeString(s1).append(s2)
3554 * @stable ICU 2.8 3715 * @stable ICU 2.8
3555 */ 3716 */
(...skipping 31 matching lines...)
3587 start = len; 3748 start = len;
3588 } 3749 }
3589 if(_length < 0) { 3750 if(_length < 0) {
3590 _length = 0; 3751 _length = 0;
3591 } else if(_length > (len - start)) { 3752 } else if(_length > (len - start)) {
3592 _length = (len - start); 3753 _length = (len - start);
3593 } 3754 }
3594 } 3755 }
3595 3756
3596 inline UChar* 3757 inline UChar*
3597 UnicodeString::getArrayStart() 3758 UnicodeString::getArrayStart() {
3598 { return (fFlags&kUsingStackBuffer) ? fUnion.fStackBuffer : fUnion.fFields.fArray; } 3759 return (fUnion.fFields.fLengthAndFlags&kUsingStackBuffer) ?
3760 fUnion.fStackFields.fBuffer : fUnion.fFields.fArray;
3761 }
3599 3762
3600 inline const UChar* 3763 inline const UChar*
3601 UnicodeString::getArrayStart() const 3764 UnicodeString::getArrayStart() const {
3602 { return (fFlags&kUsingStackBuffer) ? fUnion.fStackBuffer : fUnion.fFields.fArray; } 3765 return (fUnion.fFields.fLengthAndFlags&kUsingStackBuffer) ?
3766 fUnion.fStackFields.fBuffer : fUnion.fFields.fArray;
3767 }
3603 3768
3604 //======================================== 3769 //========================================
3605 // Default constructor 3770 // Default constructor
3606 //======================================== 3771 //========================================
3607 3772
3608 inline 3773 inline
3609 UnicodeString::UnicodeString() 3774 UnicodeString::UnicodeString() {
3610 : fShortLength(0), 3775 fUnion.fStackFields.fLengthAndFlags=kShortString;
3611 fFlags(kShortString) 3776 }
3612 {}
3613 3777
3614 //======================================== 3778 //========================================
3615 // Read-only implementation methods 3779 // Read-only implementation methods
3616 //======================================== 3780 //========================================
3617 inline int32_t 3781 inline UBool
3618 UnicodeString::length() const 3782 UnicodeString::hasShortLength() const {
3619 { return fShortLength>=0 ? fShortLength : fUnion.fFields.fLength; } 3783 return fUnion.fFields.fLengthAndFlags>=0;
3784 }
3620 3785
3621 inline int32_t 3786 inline int32_t
3622 UnicodeString::getCapacity() const 3787 UnicodeString::getShortLength() const {
3623 { return (fFlags&kUsingStackBuffer) ? US_STACKBUF_SIZE : fUnion.fFields.fCapacity; } 3788 // fLengthAndFlags must be non-negative -> short length >= 0
3789 // and arithmetic or logical shift does not matter.
3790 return fUnion.fFields.fLengthAndFlags>>kLengthShift;
3791 }
3792
3793 inline int32_t
3794 UnicodeString::length() const {
3795 return hasShortLength() ? getShortLength() : fUnion.fFields.fLength;
3796 }
3797
3798 inline int32_t
3799 UnicodeString::getCapacity() const {
3800 return (fUnion.fFields.fLengthAndFlags&kUsingStackBuffer) ?
3801 US_STACKBUF_SIZE : fUnion.fFields.fCapacity;
3802 }
3624 3803
3625 inline int32_t 3804 inline int32_t
3626 UnicodeString::hashCode() const 3805 UnicodeString::hashCode() const
3627 { return doHashCode(); } 3806 { return doHashCode(); }
3628 3807
3629 inline UBool 3808 inline UBool
3630 UnicodeString::isBogus() const 3809 UnicodeString::isBogus() const
3631 { return (UBool)(fFlags & kIsBogus); } 3810 { return (UBool)(fUnion.fFields.fLengthAndFlags & kIsBogus); }
3632 3811
3633 inline UBool 3812 inline UBool
3634 UnicodeString::isWritable() const 3813 UnicodeString::isWritable() const
3635 { return (UBool)!(fFlags&(kOpenGetBuffer|kIsBogus)); } 3814 { return (UBool)!(fUnion.fFields.fLengthAndFlags&(kOpenGetBuffer|kIsBogus)); }
3636 3815
3637 inline UBool 3816 inline UBool
3638 UnicodeString::isBufferWritable() const 3817 UnicodeString::isBufferWritable() const
3639 { 3818 {
3640 return (UBool)( 3819 return (UBool)(
3641 !(fFlags&(kOpenGetBuffer|kIsBogus|kBufferIsReadonly)) && 3820 !(fUnion.fFields.fLengthAndFlags&(kOpenGetBuffer|kIsBogus|kBufferIsReadonly)) &&
3642 (!(fFlags&kRefCounted) || refCount()==1)); 3821 (!(fUnion.fFields.fLengthAndFlags&kRefCounted) || refCount()==1));
3643 } 3822 }
3644 3823
3645 inline const UChar * 3824 inline const UChar *
3646 UnicodeString::getBuffer() const { 3825 UnicodeString::getBuffer() const {
3647 if(fFlags&(kIsBogus|kOpenGetBuffer)) { 3826 if(fUnion.fFields.fLengthAndFlags&(kIsBogus|kOpenGetBuffer)) {
3648 return 0; 3827 return 0;
3649 } else if(fFlags&kUsingStackBuffer) { 3828 } else if(fUnion.fFields.fLengthAndFlags&kUsingStackBuffer) {
3650 return fUnion.fStackBuffer; 3829 return fUnion.fStackFields.fBuffer;
3651 } else { 3830 } else {
3652 return fUnion.fFields.fArray; 3831 return fUnion.fFields.fArray;
3653 } 3832 }
3654 } 3833 }
3655 3834
3656 //======================================== 3835 //========================================
3657 // Read-only alias methods 3836 // Read-only alias methods
3658 //======================================== 3837 //========================================
3659 inline int8_t 3838 inline int8_t
3660 UnicodeString::doCompare(int32_t start, 3839 UnicodeString::doCompare(int32_t start,
(...skipping 580 matching lines...)
4241 inline UChar 4420 inline UChar
4242 UnicodeString::charAt(int32_t offset) const 4421 UnicodeString::charAt(int32_t offset) const
4243 { return doCharAt(offset); } 4422 { return doCharAt(offset); }
4244 4423
4245 inline UChar 4424 inline UChar
4246 UnicodeString::operator[] (int32_t offset) const 4425 UnicodeString::operator[] (int32_t offset) const
4247 { return doCharAt(offset); } 4426 { return doCharAt(offset); }
4248 4427
4249 inline UBool 4428 inline UBool
4250 UnicodeString::isEmpty() const { 4429 UnicodeString::isEmpty() const {
4251 return fShortLength == 0; 4430 // Arithmetic or logical right shift does not matter: only testing for 0.
4431 return (fUnion.fFields.fLengthAndFlags>>kLengthShift) == 0;
4252 } 4432 }
4253 4433
4254 //======================================== 4434 //========================================
4255 // Write implementation methods 4435 // Write implementation methods
4256 //======================================== 4436 //========================================
4257 inline void 4437 inline void
4438 UnicodeString::setZeroLength() {
4439 fUnion.fFields.fLengthAndFlags &= kAllStorageFlags;
4440 }
4441
4442 inline void
4443 UnicodeString::setShortLength(int32_t len) {
4444 // requires 0 <= len <= kMaxShortLength
4445 fUnion.fFields.fLengthAndFlags =
4446 (int16_t)((fUnion.fFields.fLengthAndFlags & kAllStorageFlags) | (len << kLengthShift));
4447 }
4448
4449 inline void
4258 UnicodeString::setLength(int32_t len) { 4450 UnicodeString::setLength(int32_t len) {
4259 if(len <= 127) { 4451 if(len <= kMaxShortLength) {
4260 fShortLength = (int8_t)len; 4452 setShortLength(len);
4261 } else { 4453 } else {
4262 fShortLength = (int8_t)-1; 4454 fUnion.fFields.fLengthAndFlags |= kLengthIsLarge;
4263 fUnion.fFields.fLength = len; 4455 fUnion.fFields.fLength = len;
4264 } 4456 }
4265 } 4457 }
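
Note on the length encoding: the patched setLength(), setShortLength(), hasShortLength() and getShortLength() pack a short length into the upper bits of the 16-bit fLengthAndFlags and keep the storage flags in the low bits; a negative value signals that the real length is in a separate 32-bit field. The following is a minimal standalone sketch of that scheme, not ICU code. The struct name PackedLength and the member longLength are hypothetical, and the constant values are assumptions, because the diff only says "see constants above" and the definitions are in a skipped region.

```cpp
#include <cassert>
#include <cstdint>

namespace length_sketch {

// Assumed values; the names follow the diff, the numbers are illustrative.
enum : int32_t {
    kUsingStackBuffer = 2,     // one of the storage flags in the low bits
    kAllStorageFlags = 0x1f,   // mask covering all low flag bits
    kLengthShift = 5,          // short length sits above the flag bits
    kMaxShortLength = 0x3ff,   // largest length that keeps the sign bit 0
    kLengthIsLarge = 0xffe0    // all length bits set: value becomes negative
};

struct PackedLength {
    int16_t lengthAndFlags = kUsingStackBuffer;
    int32_t longLength = 0;    // only meaningful when lengthAndFlags < 0

    bool hasShortLength() const { return lengthAndFlags >= 0; }

    int32_t length() const {
        // A non-negative value makes arithmetic vs. logical shift irrelevant.
        return hasShortLength() ? (lengthAndFlags >> kLengthShift) : longLength;
    }

    void setLength(int32_t len) {
        if (len <= kMaxShortLength) {
            // Keep the flag bits, replace the length bits.
            lengthAndFlags = static_cast<int16_t>(
                (lengthAndFlags & kAllStorageFlags) | (len << kLengthShift));
        } else {
            // Set every length bit (assumes two's-complement narrowing, which
            // makes the 16-bit value negative) and store the real length.
            lengthAndFlags = static_cast<int16_t>(lengthAndFlags | kLengthIsLarge);
            longLength = len;
        }
    }
};

}  // namespace length_sketch
```

With these assumed constants, lengths up to 1023 stay in the 16-bit field, compared with the old limit of 127 for the int8_t fShortLength, and the flag bits survive every length update.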
4266 4458
4267 inline void 4459 inline void
4268 UnicodeString::setToEmpty() { 4460 UnicodeString::setToEmpty() {
4269 fShortLength = 0; 4461 fUnion.fFields.fLengthAndFlags = kShortString;
4270 fFlags = kShortString;
4271 } 4462 }
4272 4463
4273 inline void 4464 inline void
4274 UnicodeString::setArray(UChar *array, int32_t len, int32_t capacity) { 4465 UnicodeString::setArray(UChar *array, int32_t len, int32_t capacity) {
4275 setLength(len); 4466 setLength(len);
4276 fUnion.fFields.fArray = array; 4467 fUnion.fFields.fArray = array;
4277 fUnion.fFields.fCapacity = capacity; 4468 fUnion.fFields.fCapacity = capacity;
4278 } 4469 }
4279 4470
4280 inline UnicodeString& 4471 inline UnicodeString&
(...skipping 47 matching lines...)
4328 UnicodeString::setTo(UChar32 srcChar) 4519 UnicodeString::setTo(UChar32 srcChar)
4329 { 4520 {
4330 unBogus(); 4521 unBogus();
4331 return replace(0, length(), srcChar); 4522 return replace(0, length(), srcChar);
4332 } 4523 }
4333 4524
4334 inline UnicodeString& 4525 inline UnicodeString&
4335 UnicodeString::append(const UnicodeString& srcText, 4526 UnicodeString::append(const UnicodeString& srcText,
4336 int32_t srcStart, 4527 int32_t srcStart,
4337 int32_t srcLength) 4528 int32_t srcLength)
4338 { return doReplace(length(), 0, srcText, srcStart, srcLength); } 4529 { return doAppend(srcText, srcStart, srcLength); }
4339 4530
4340 inline UnicodeString& 4531 inline UnicodeString&
4341 UnicodeString::append(const UnicodeString& srcText) 4532 UnicodeString::append(const UnicodeString& srcText)
4342 { return doReplace(length(), 0, srcText, 0, srcText.length()); } 4533 { return doAppend(srcText, 0, srcText.length()); }
4343 4534
4344 inline UnicodeString& 4535 inline UnicodeString&
4345 UnicodeString::append(const UChar *srcChars, 4536 UnicodeString::append(const UChar *srcChars,
4346 int32_t srcStart, 4537 int32_t srcStart,
4347 int32_t srcLength) 4538 int32_t srcLength)
4348 { return doReplace(length(), 0, srcChars, srcStart, srcLength); } 4539 { return doAppend(srcChars, srcStart, srcLength); }
4349 4540
4350 inline UnicodeString& 4541 inline UnicodeString&
4351 UnicodeString::append(const UChar *srcChars, 4542 UnicodeString::append(const UChar *srcChars,
4352 int32_t srcLength) 4543 int32_t srcLength)
4353 { return doReplace(length(), 0, srcChars, 0, srcLength); } 4544 { return doAppend(srcChars, 0, srcLength); }
4354 4545
4355 inline UnicodeString& 4546 inline UnicodeString&
4356 UnicodeString::append(UChar srcChar) 4547 UnicodeString::append(UChar srcChar)
4357 { return doReplace(length(), 0, &srcChar, 0, 1); } 4548 { return doAppend(&srcChar, 0, 1); }
4358 4549
4359 inline UnicodeString& 4550 inline UnicodeString&
4360 UnicodeString::operator+= (UChar ch) 4551 UnicodeString::operator+= (UChar ch)
4361 { return doReplace(length(), 0, &ch, 0, 1); } 4552 { return doAppend(&ch, 0, 1); }
4362 4553
4363 inline UnicodeString& 4554 inline UnicodeString&
4364 UnicodeString::operator+= (UChar32 ch) { 4555 UnicodeString::operator+= (UChar32 ch) {
4365 return append(ch); 4556 return append(ch);
4366 } 4557 }
4367 4558
4368 inline UnicodeString& 4559 inline UnicodeString&
4369 UnicodeString::operator+= (const UnicodeString& srcText) 4560 UnicodeString::operator+= (const UnicodeString& srcText)
4370 { return doReplace(length(), 0, srcText, 0, srcText.length()); } 4561 { return doAppend(srcText, 0, srcText.length()); }
4371 4562
4372 inline UnicodeString& 4563 inline UnicodeString&
4373 UnicodeString::insert(int32_t start, 4564 UnicodeString::insert(int32_t start,
4374 const UnicodeString& srcText, 4565 const UnicodeString& srcText,
4375 int32_t srcStart, 4566 int32_t srcStart,
4376 int32_t srcLength) 4567 int32_t srcLength)
4377 { return doReplace(start, 0, srcText, srcStart, srcLength); } 4568 { return doReplace(start, 0, srcText, srcStart, srcLength); }
4378 4569
4379 inline UnicodeString& 4570 inline UnicodeString&
4380 UnicodeString::insert(int32_t start, 4571 UnicodeString::insert(int32_t start,
(...skipping 24 matching lines...)
4405 { return replace(start, 0, srcChar); } 4596 { return replace(start, 0, srcChar); }
4406 4597
4407 4598
4408 inline UnicodeString& 4599 inline UnicodeString&
4409 UnicodeString::remove() 4600 UnicodeString::remove()
4410 { 4601 {
4411 // remove() of a bogus string makes the string empty and non-bogus 4602 // remove() of a bogus string makes the string empty and non-bogus
4412 if(isBogus()) { 4603 if(isBogus()) {
4413 setToEmpty(); 4604 setToEmpty();
4414 } else { 4605 } else {
4415 fShortLength = 0; 4606 setZeroLength();
4416 } 4607 }
4417 return *this; 4608 return *this;
4418 } 4609 }
4419 4610
4420 inline UnicodeString& 4611 inline UnicodeString&
4421 UnicodeString::remove(int32_t start, 4612 UnicodeString::remove(int32_t start,
4422 int32_t _length) 4613 int32_t _length)
4423 { 4614 {
4424 if(start <= 0 && _length == INT32_MAX) { 4615 if(start <= 0 && _length == INT32_MAX) {
4425 // remove(guaranteed everything) of a bogus string makes the string empty and non-bogus 4616 // remove(guaranteed everything) of a bogus string makes the string empty and non-bogus
(...skipping 33 matching lines...)
4459 { return doReverse(0, length()); } 4650 { return doReverse(0, length()); }
4460 4651
4461 inline UnicodeString& 4652 inline UnicodeString&
4462 UnicodeString::reverse(int32_t start, 4653 UnicodeString::reverse(int32_t start,
4463 int32_t _length) 4654 int32_t _length)
4464 { return doReverse(start, _length); } 4655 { return doReverse(start, _length); }
4465 4656
4466 U_NAMESPACE_END 4657 U_NAMESPACE_END
4467 4658
4468 #endif 4659 #endif