Chromium Code Reviews

Side by Side Diff: recipes/src/core/strings.md

Issue 12335109: Strings recipes for the Dart Cookbook (Closed) Base URL: https://github.com/dart-lang/cookbook.git@master
Patch Set: Numerous minor changes based on reviewers' comments. Created 7 years, 9 months ago
OLD | NEW
(Empty)
1 # Strings
2
3 A Dart string represents a sequence of characters encoded in UTF-16. Decoding
4 UTF-16 yields Unicode code points. Borrowing terminology from Go, Dart uses
5 the term `rune` for an integer representing a Unicode code point.
6
7 The string recipes included in this chapter assume that you have some
8 familiarity with Unicode and UTF-16. Here is a brief refresher:
9
10 ### What is the Basic Multilingual Plane?
11
12 The Unicode code space is divided into seventeen planes of 65,536 points each.
13 The first plane (code points U+0000 to U+FFFF) contains the most
14 frequently used characters and is called the Basic Multilingual Plane or BMP.
15
16 ### What is a Surrogate Pair?
17
18 The term 'surrogate pair' refers to a means of encoding Unicode characters
19 outside the Basic Multilingual Plane.
20
21 In UTF-16, Unicode characters are stored as two-byte (16-bit) code units.
22 Since 16 bits can represent only the 65,536 values in the 0x0 to 0xFFFF
23 range, characters in the 0x10000 to 0x10FFFF range are stored as a pair of
24 code units: the surrogate pair.
25
26 For example, the Unicode character MUSICAL SCORE (🎼), with a code point of
27 U+1F3BC, is too large to fit in 16 bits.
28
29 var clef = '\u{1F3BC}'; // 🎼
30
31 '\u{1F3BC}' is composed of a UTF-16 surrogate pair: [\uD83C, \uDFBC].
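The arithmetic behind a surrogate pair fits in a few lines. This is a minimal sketch for current (null-safe) Dart, not code from the cookbook; `toSurrogatePair` is a hypothetical helper. It subtracts 0x10000 from the code point and splits the remaining 20 bits into a high surrogate (0xD800 to 0xDBFF) and a low surrogate (0xDC00 to 0xDFFF):

```dart
/// Hypothetical helper: encodes a supplementary code point
/// (U+10000 to U+10FFFF) as its UTF-16 surrogate pair [high, low].
List<int> toSurrogatePair(int codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw ArgumentError.value(
        codePoint, 'codePoint', 'must be in the range 0x10000-0x10FFFF');
  }
  final offset = codePoint - 0x10000; // a 20-bit value
  final high = 0xD800 + (offset >> 10); // upper 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // lower 10 bits
  return [high, low];
}

void main() {
  final pair = toSurrogatePair(0x1F3BC);
  print(pair.map((unit) => unit.toRadixString(16)).toList()); // [d83c, dfbc]
  // Dart's own encoding of the same character agrees.
  print('\u{1F3BC}'.codeUnits); // [55356, 57276]
}
```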
32
33 ### What is the difference between a code point and a code unit?
34
35 Within the Basic Multilingual Plane, the code point for a character is
36 numerically the same as the code unit for that character.
37
38 'D'.runes.first; // 68
39 'D'.codeUnits.first; // 68
40
41 For non-BMP characters, each code point is represented by two code units.
42
43 var clef = '\u{1F3BC}'; // 🎼
44 clef.runes.length; // 1
45 clef.codeUnits.length; // 2
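Going the other way, the two code units of a non-BMP character can be combined back into its single code point. Again a sketch for current Dart; `fromSurrogatePair` is a hypothetical helper that assumes its arguments really are a high and a low surrogate, in that order:

```dart
/// Hypothetical helper: decodes a UTF-16 surrogate pair into a code point.
/// Assumes [high] is in 0xD800-0xDBFF and [low] is in 0xDC00-0xDFFF.
int fromSurrogatePair(int high, int low) =>
    0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);

void main() {
  var clef = '\u{1F3BC}'; // 🎼
  var units = clef.codeUnits; // [55356, 57276]
  print(fromSurrogatePair(units[0], units[1]) == clef.runes.first); // true
}
```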
46
47 ### What exactly is a character?
48
49 A character is a member of the Universal Character Set. Each character
50 maps to a single rune value (code point); a BMP character maps to one code
51 unit, and a non-BMP character maps to two code units.
52
53 You can read more about the Universal Character Set at
54 http://en.wikipedia.org/wiki/Universal_Character_Set.
55
56 ### Do I really have to deal with Unicode?
57
58 Yes, if you want to build robust international applications, you do.
59 Besides, the String library makes working with Unicode relatively painless,
60 so there's no great overhead in doing things right.
61
62 ## Concatenating Strings
63
64 ### Problem
65
66 You want to concatenate strings in Dart. You tried using `+`, but
67 that resulted in an error.
68
69 ### Solution
70
71 Use adjacent string literals:
72
73 var fact = 'Dart ' 'is' ' fun!'; // 'Dart is fun!'
74
75 ### Discussion
76
77 Adjacent literals also work over multiple lines:
78
79 var fact = 'Dart '
80 'is '
81 'fun!'; // 'Dart is fun!'
82
83 They also work when using multiline strings:
84
85 var lunch = '''Peanut
86 butter'''
87 ''' and
88 jelly'''; // 'Peanut\nbutter and\njelly'
89
90 You can concatenate adjacent single line literals with multiline strings:
91
92 var funnyGuys = 'Dewey ' 'Cheatem'
93 ''' and
94 Howe'''; // 'Dewey Cheatem and\n Howe'
95
96
97 #### Alternatives to adjacent string literals
98
99 You can also use the `concat()` method on a string to concatenate it to another
floitsch 2013/03/09 00:01:41 I just gave an LGTM to Lasse for changing concat t
100 string:
101
102 var film = filmToWatch();
103 film = film.concat('\n'); // 'The Big Lebowski\n'
104
105 Since `concat()` creates a new string every time it is invoked, a long chain of
106 `concat()` calls can be expensive. Instead of chaining, use a StringBuffer (see
107 _Incrementally building a string efficiently using a StringBuffer_, below).
108
109 You can use `join()` to combine a sequence of strings:
110
111 var film = ['The', 'Big', 'Lebowski'].join(' '); // 'The Big Lebowski'
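To compare these approaches side by side, here is a small sketch for current Dart. It leaves out `concat()` and uses only adjacent literals, `join()`, and interpolation; `joinWords` and `interpolate` are hypothetical names:

```dart
// Adjacent literals are combined into a single string constant.
const adjacent = 'The ' 'Big ' 'Lebowski';

// join() inserts the separator between elements, not after the last one.
String joinWords(List<String> words) => words.join(' ');

// Interpolation embeds each value's toString() in the literal.
String interpolate(String first, String last) => '$first Big $last';

void main() {
  print(adjacent); // The Big Lebowski
  print(joinWords(['The', 'Big', 'Lebowski']) == adjacent); // true
  print(interpolate('The', 'Lebowski') == adjacent); // true
}
```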
112
113 You can also use string interpolation to concatenate strings (see
114 _Interpolating expressions inside strings_, below).
115
116
117 ## Interpolating expressions inside strings
118
119 ### Problem
120
121 You want to create strings that contain Dart expressions and identifiers.
122
123 ### Solution
124
125 You can put the value of an expression inside a string by using `${expression}`:
126
127 var favFood = 'sushi';
128 var whatDoILove = 'I love ${favFood.toUpperCase()}'; // 'I love SUSHI'
129
130 You can skip the `{}` if the expression is an identifier:
131
132 var whatDoILove = 'I love $favFood'; // 'I love sushi'
133
134 ### Discussion
135
136 An interpolated string such as `'string ${expression}'` produces the same
137 result as concatenating the string 'string ' with `expression.toString()`.
138 Consider this code:
139
140 var four = 4;
141 var seasons = 'The $four seasons'; // 'The 4 seasons'
142
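143 It produces the same string as the following, although the chained `concat()` calls also create an intermediate string ('The 4') that interpolation need not create: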
floitsch 2013/03/09 00:01:41 It is not. the concat will make two copies, wherea
144
145 var seasons = 'The '.concat(4.toString()).concat(' seasons'); // 'The 4 seasons'
146
147 You should consider implementing a `toString()` method for user-defined
148 objects. Here's what happens if you don't:
149
150 class Point {
151 num x, y;
152 Point(this.x, this.y);
153 }
154
155 var point = new Point(3, 4);
156 print('Point: $point'); // "Point: Instance of 'Point'"
157
158 Probably not what you wanted. Here is the same example with an explicit
159 `toString()`:
160
161 class Point {
162 ...
163
164 String toString() => 'x: $x, y: $y';
165 }
166
167 print('Point: $point'); // 'Point: x: 3, y: 4'
168
169
170 ## Escaping special characters
171
172 ### Problem
173
174 You want to put newlines, dollar signs, or other special characters in your strings.
175
176 ### Solution
177
178 Prefix special characters with a `\`.
179
180 print('Wile\nCoyote');
181 // Wile
182 // Coyote
183
184 ### Discussion
185
186 Dart designates a few characters as special, and these can be escaped:
187
188 - \n for newline, equivalent to \x0A.
189 - \r for carriage return, equivalent to \x0D.
190 - \f for form feed, equivalent to \x0C.
191 - \b for backspace, equivalent to \x08.
192 - \t for tab, equivalent to \x09.
193 - \v for vertical tab, equivalent to \x0B.
194
195 If you prefer, you can use `\x` or `\u` notation to indicate the special
196 character:
197
198 print('Wile\x0ACoyote'); // same as print('Wile\nCoyote');
199 print('Wile\u000ACoyote'); // same as print('Wile\nCoyote');
200
201 You can also use `\u{}` notation:
202
203 print('Wile\u{000A}Coyote'); // same as print('Wile\nCoyote');
204
205 You can also escape the `$` used in string interpolation:
206
207 var superGenius = 'Wile Coyote';
208 print('$superGenius and Road Runner'); // 'Wile Coyote and Road Runner'
209 print('\$superGenius and Road Runner'); // '$superGenius and Road Runner'
210
211 If you escape a non-special character, the `\` is ignored:
212
213 print('Wile \E Coyote'); // 'Wile E Coyote'
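The equivalences listed above can be checked directly. A sketch for current Dart; the `equivalents` table and the `allEquivalent` function are illustrative, not part of any library. Each row holds one named escape followed by the same character written with `\x` and `\u{}`:

```dart
// Each row: a named escape, then the same character as \x and \u{} escapes.
const equivalents = [
  ['\n', '\x0A', '\u{A}'], // newline
  ['\r', '\x0D', '\u{D}'], // carriage return
  ['\f', '\x0C', '\u{C}'], // form feed
  ['\b', '\x08', '\u{8}'], // backspace
  ['\t', '\x09', '\u{9}'], // tab
  ['\v', '\x0B', '\u{B}'], // vertical tab
];

bool allEquivalent() =>
    equivalents.every((row) => row.every((s) => s == row.first));

void main() {
  print(allEquivalent()); // true
  // The escaped dollar sign stays in the string as a literal '$'.
  print('\$superGenius'); // $superGenius
}
```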
214
215
216 ## Incrementally building a string efficiently using a StringBuffer
217
218 ### Problem
219
220 You want to collect string fragments and combine them in an efficient manner.
221
222 ### Solution
223
224 Use a StringBuffer to programmatically generate a string. A StringBuffer
225 collects the string fragments, but does not generate a new string until
226 `toString()` is called:
227
228 var sb = new StringBuffer();
229 sb.write('John, ');
230 sb.write('Paul, ');
231 sb.write('George, ');
232 sb.write('and Ringo');
233 var beatles = sb.toString(); // 'John, Paul, George, and Ringo'
234
235 ### Discussion
236
237 In addition to `write()`, the StringBuffer class provides methods to write a
238 list of strings (`writeAll()`), write a numerical character code
239 (`writeCharCode()`), write with an added newline (`writeln()`), and more. Here
240 is a simple example that shows the use of these methods:
241
242 var sb = new StringBuffer();
243 sb.writeln('The Beatles:');
244 sb.writeAll(['John, ', 'Paul, ', 'George, and Ringo']);
245 sb.writeCharCode(33); // charCode for '!'.
246 var beatles = sb.toString(); // 'The Beatles:\nJohn, Paul, George, and Ringo!'
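Here is a slightly larger sketch for current Dart that builds a phrase like the one above from a list of any length. `listNames` is a hypothetical helper; it writes every fragment into one StringBuffer and calls `toString()` once at the end:

```dart
/// Hypothetical helper: formats names as 'A and B' or 'A, B, and C'.
String listNames(List<String> names) {
  final sb = StringBuffer();
  for (var i = 0; i < names.length; i++) {
    if (i > 0) sb.write(names.length > 2 ? ', ' : ' ');
    if (i > 0 && i == names.length - 1) sb.write('and ');
    sb.write(names[i]);
  }
  return sb.toString(); // toString() produces the final string
}

void main() {
  print(listNames(['John', 'Paul', 'George', 'Ringo']));
  // John, Paul, George, and Ringo
}
```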
247
248 Since a StringBuffer waits until the call to `toString()` to generate the
249 concatenated string, it is a more efficient way of combining strings
250 than `concat()`. See the _Concatenating Strings_ recipe for a description of
251 `concat()`.
252
253 ## Converting between string characters and numerical codes
254
255 ### Problem
256
257 You want to convert string characters into numerical codes and back.
258
259 ### Solution
260
261 Use the `runes` getter to access a string's code points:
262
263 'Dart'.runes.toList(); // [68, 97, 114, 116]
264
265 var smileyFace = '\u263A'; // ☺
266 smileyFace.runes.toList(); // [9786]
267
268 The number 9786 is the code point for '\u263A'.
269
270 Use `string.codeUnits` to get a string's UTF-16 code units:
271
272 'Dart'.codeUnits.toList(); // [68, 97, 114, 116]
273 smileyFace.codeUnits.toList(); // [9786]
274
275 ### Discussion
276
277 Notice that using `runes` and `codeUnits` produces identical results
278 in the examples above. That happens because each character in 'Dart' and in
279 `smileyFace` fits within 16 bits, resulting in a code unit corresponding
280 neatly with a code point.
281
282 Consider a character that cannot be represented within 16 bits: the Unicode
283 character MUSICAL SCORE ('\u{1F3BC}'). This character is encoded as a
284 surrogate pair: '\uD83C', '\uDFBC'. Getting the numerical value of this
285 character using `codeUnits` and `runes` produces the following result:
286
287 var clef = '\u{1F3BC}'; // 🎼
288 clef.codeUnits.toList(); // [55356, 57276]
289 clef.runes.toList(); // [127932]
290
291 The numbers 55356 and 57276 represent `clef`'s surrogate pair, '\uD83C' and
292 '\uDFBC', respectively. The number 127932 represents the code point '\u{1F3BC}'.
293
294 #### Using codeUnitAt() to access individual code units
295
296 To access the 16-bit UTF-16 code unit at a particular index, use
297 `codeUnitAt()`:
298
299 'Dart'.codeUnitAt(0); // 68
300 smileyFace.codeUnitAt(0); // 9786
301
302 Using `codeUnitAt()` with the non-BMP `clef` character leads to problems:
303
304 clef.codeUnitAt(0); // 55356
305 clef.codeUnitAt(1); // 57276
306
307 In either call to `clef.codeUnitAt()`, the value returned is only one half of
308 a UTF-16 surrogate pair. A string containing just one half is not valid
309 UTF-16.
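One way to detect this situation is to check whether a code unit falls in one of the surrogate ranges. The helpers below are a hypothetical sketch for current Dart, not SDK functions; high surrogates occupy 0xD800 to 0xDBFF and low surrogates 0xDC00 to 0xDFFF:

```dart
bool isHighSurrogate(int codeUnit) => codeUnit >= 0xD800 && codeUnit <= 0xDBFF;
bool isLowSurrogate(int codeUnit) => codeUnit >= 0xDC00 && codeUnit <= 0xDFFF;

/// True if the code unit at [index] is only half of a character.
bool isSurrogateAt(String s, int index) {
  final unit = s.codeUnitAt(index);
  return isHighSurrogate(unit) || isLowSurrogate(unit);
}

void main() {
  var clef = '\u{1F3BC}'; // 🎼
  print(isHighSurrogate(clef.codeUnitAt(0))); // true
  print(isLowSurrogate(clef.codeUnitAt(1))); // true
  print(isSurrogateAt('\u263A', 0)); // false
}
```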
310
311
312 #### Converting numerical codes to strings
313
314 You can generate a new string from runes or code units using the factory
315 `String.fromCharCodes(charCodes)`:
316
317 new String.fromCharCodes([68, 97, 114, 116]); // 'Dart'
318
319 new String.fromCharCodes([73, 32, 9825, 32, 76, 117, 99, 121]);
320 // 'I ♡ Lucy'
321
322 new String.fromCharCodes([55356, 57276]); // 🎼
323 new String.fromCharCodes([127932]); // 🎼
324
325 You can use the `String.fromCharCode()` factory to convert a single rune or
326 code unit to a string:
327
328 new String.fromCharCode(68); // 'D'
329 new String.fromCharCode(9786); // ☺
330 new String.fromCharCode(127932); // 🎼
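A quick round-trip check, sketched for current Dart with a hypothetical `roundTrips` helper: rebuilding a string from its runes or from its code units gives back the original string, whether or not it contains non-BMP characters.

```dart
/// Hypothetical helper: true if both conversions reproduce [s].
bool roundTrips(String s) =>
    String.fromCharCodes(s.runes) == s &&
    String.fromCharCodes(s.codeUnits) == s;

void main() {
  var mixed = 'I \u2661 \u{1F3BC}'; // 'I ♡ 🎼'
  print(roundTrips(mixed)); // true
  // fromCharCode() with a non-BMP rune yields a two-code-unit string.
  print(String.fromCharCode(0x1F3BC).length); // 2
}
```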
331
332 Creating a string with only one half of a surrogate pair is permitted, but not
333 recommended.
334
335 ## Determining if a string is empty
336
337 ### Problem
338
339 You want to know if a string is empty. You tried `if (string) {...}`, but that
340 did not work.
341
342 ### Solution
343
344 Use `string.isEmpty`:
345
346 var emptyString = '';
347 emptyString.isEmpty; // true
348
349 A string with a space is not empty:
350
351 var space = ' ';
352 space.isEmpty; // false
353
354 ### Discussion
355
356 Don't use `if (string)` to test the emptiness of a string. In Dart, all
357 objects except the boolean true evaluate to false. `if(string)` will always
358 be false.
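A related case is a string that is not empty but contains only whitespace. A minimal sketch for current Dart, where `isBlank` is a hypothetical helper built on `trim()` (covered in the next recipe) and `isNotEmpty` is the String getter that complements `isEmpty`:

```dart
/// Hypothetical helper: true for '' and for whitespace-only strings.
bool isBlank(String s) => s.trim().isEmpty;

void main() {
  print(''.isEmpty); // true
  print(' '.isNotEmpty); // true: a space is a character
  print(isBlank(' \t\n')); // true
  print(isBlank(' X ')); // false
}
```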
359
360
361 ## Removing leading and trailing whitespace
362
363 ### Problem
364
365 You want to remove leading and trailing whitespace from a string.
366
367 ### Solution
368
369 Use `string.trim()`:
370
371 var space = '\n\r\f\t\v'; // We'll use a variety of space characters.
372 var string = '$space X $space';
373 var newString = string.trim(); // 'X'
374
375 The String class has no methods to remove only leading or only trailing
376 whitespace. But you can always use regExps.
377
378 Remove only leading whitespace:
379
380 var newString = string.replaceFirst(new RegExp(r'^\s+'), ''); // 'X $space'
381
382 Remove only trailing whitespace:
383
384 var newString = string.replaceFirst(new RegExp(r'\s+$'), ''); // '$space X'
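The two regExps can be wrapped in small helpers. A sketch for current Dart; `trimLeading` and `trimTrailing` are hypothetical names, and each RegExp is compiled once at top level rather than on every call. (Later Dart SDKs also provide `trimLeft()` and `trimRight()` for the same job.)

```dart
// Compiled once and reused by every call.
final leadingWhitespace = RegExp(r'^\s+');
final trailingWhitespace = RegExp(r'\s+$');

/// Hypothetical helper: removes whitespace from the start only.
String trimLeading(String s) => s.replaceFirst(leadingWhitespace, '');

/// Hypothetical helper: removes whitespace from the end only.
String trimTrailing(String s) => s.replaceFirst(trailingWhitespace, '');

void main() {
  var padded = '  X  ';
  print('[${trimLeading(padded)}]'); // [X  ]
  print('[${trimTrailing(padded)}]'); // [  X]
}
```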
385
386
387 ## Calculating the length of a string
388
389 ### Problem
390
391 You want to get the length of a string, but are not sure how to
392 correctly calculate the length when working with Unicode.
393
394 ### Solution
395
396 Use `string.length` to get the number of UTF-16 code units in a string:
397
398 'I love music'.length; // 12
399 'I love music'.runes.length; // 12
400
401 ### Discussion
402
403 For characters that fit into 16 bits, the code unit length is the same as the
404 rune length:
405
406 var hearts = '\u2661'; // ♡
407 hearts.length; // 1
408 hearts.runes.length; // 1
409
410 If the string contains any characters outside the Basic Multilingual
411 Plane (BMP), the rune length will be less than the code unit length:
412
413 var clef = '\u{1F3BC}'; // 🎼
414 clef.length; // 2
415 clef.runes.length; // 1
416
417 var music = 'I $hearts $clef'; // 'I ♡ 🎼'
418 music.length; // 6
419 music.runes.length; // 5
420
421 Use `length` if you want the number of code units; use `runes.length` if you
422 want the number of runes.
floitsch 2013/03/09 00:01:41 You could add, that Twitter uses runes for the len
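Because each non-BMP character contributes exactly one extra code unit, the difference between `length` and `runes.length` counts those characters. A sketch for current Dart with a hypothetical `nonBmpCount` helper; it assumes the string is well-formed UTF-16, with no unpaired surrogates:

```dart
/// Hypothetical helper: number of characters outside the BMP in [s].
/// Assumes every surrogate in [s] is part of a complete pair.
int nonBmpCount(String s) => s.length - s.runes.length;

void main() {
  var music = 'I \u2661 \u{1F3BC}'; // 'I ♡ 🎼'
  print(music.length); // 6 code units
  print(music.runes.length); // 5 runes
  print(nonBmpCount(music)); // 1
}
```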
423
424
425 ## Subscripting a string
426
427 ### Problem
428
429 You want to be able to access a character in a string at a particular index.
430
431 ### Solution
432
433 Subscript runes:
434
435 var teacup = '\u{1F375}'; // 🍵
436 teacup.runes.toList()[0]; // 127861
floitsch 2013/03/09 00:01:41 If you want to access it only once, you can also u
437
438 The number 127861 represents the code point for teacup, '\u{1F375}' (🍵).
439
440 ### Discussion
441
442 Subscripting a string directly can be problematic, because the `[]`
443 operator subscripts along code units. For non-BMP characters, subscripting
444 yields a string that contains only half of a surrogate pair:
445
446 'Dart'[0]; // 'D'
447
448 var hearts = '\u2661'; // ♡
449 hearts[0]; // '\u2661' (♡)
450
451 teacup[0]; // '\uD83C': half of a surrogate pair, not a valid string.
452 teacup.codeUnits.toList()[0]; // 55356, the code unit for '\uD83C'.
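To read the character at a given rune position, the rune can be converted back into a string. A sketch for current Dart; `charAt` is a hypothetical helper. `runes.elementAt()` walks the string from the start, so for repeated lookups, converting the runes to a list once (as in the Solution above) is cheaper:

```dart
/// Hypothetical helper: the character at rune position [index] of [s].
String charAt(String s, int index) =>
    String.fromCharCode(s.runes.elementAt(index));

void main() {
  var order = '\u{1F375} tea'; // '🍵 tea'
  print(charAt(order, 0)); // 🍵 (one whole character)
  print(order[0].codeUnitAt(0) == 0xD83C); // true: [] returns only the high surrogate
  print(charAt(order, 2)); // t
}
```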
453
454
455 ## Processing a string one character at a time
456
457 ### Problem
458
459 You want to do something with each individual character in a string.
460
461 ### Solution
462
463 To access individual characters, map over the string's runes:
464
465 var charList = "Dart".runes.map((rune) => '*${new String.fromCharCode(rune)}*').toList();
466 // ['*D*', '*a*', '*r*', '*t*']
467
468 var runeList = 'I am \u263A'.runes.map((rune) => [rune, new String.fromCharCode(rune)]).toList();
469 // [[73, 'I'], [32, ' '], [97, 'a'], [109, 'm'], [32, ' '], [9786, '☺']]
470
471 If you are sure that all of the string's characters are in the Basic
472 Multilingual Plane (BMP), you can use `string.split('')`:
473
474 'Dart'.split(''); // ['D', 'a', 'r', 't']
475 smileyFace.split('').length; // 1
476
477 Since `split('')` splits at the UTF-16 code unit boundaries,
478 invoking it on a non-BMP character yields the string's surrogate pair:
479
480 var clef = '\u{1F3BC}'; // 🎼, not in BMP.
481 clef.split('').length; // 2
482
483 The surrogate pair members are not valid UTF-16 strings.
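A `for`-in loop over `runes` is another way to visit each character. A sketch for current Dart with a hypothetical `runeStrings` helper, contrasted with `split('')` on a string that contains a non-BMP character:

```dart
/// Hypothetical helper: splits [s] into one string per rune.
List<String> runeStrings(String s) => [
      for (final rune in s.runes) String.fromCharCode(rune),
    ];

void main() {
  var text = 'I \u{1F3BC}'; // 'I 🎼'
  print(runeStrings(text).length); // 3: 'I', ' ', '🎼'
  print(text.split('').length); // 4: '🎼' is split into its two surrogates
}
```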
484
485
486 ## Splitting a string into substrings
487
488 ### Problem
489
490 You want to split a string into substrings.
491
492 ### Solution
493
494 Use the `split()` method with a string or a regExp as an argument.
495
496 var smileyFace = '\u263A';
497 var happy = 'I am $smileyFace';
498 happy.split(' '); // ['I', 'am', '☺']
499
500 Here is an example of using `split()` with a regExp:
501
502 var nums = '2/7 3 4/5 3~/5';
503 var numsRegExp = new RegExp(r'(\s|/|~/)');
504 nums.split(numsRegExp); // ['2', '7', '3', '4', '5', '3', '5']
505
506 In the code above, the string `nums` contains various numbers, some of which
507 are expressed as fractions or as int-divisions. A regExp is used to split the
508 string to extract just the numbers.
509
510 You can perform operations on the matched and unmatched portions of a string
511 by using `splitMapJoin()` with a regExp:
512
513 'Eats SHOOTS leaves'.splitMapJoin(new RegExp(r'SHOOTS'),
514 onMatch: (m) => '*${m.group(0).toLowerCase()}*',
515 onNonMatch: (n) => n.toUpperCase()); // 'EATS *shoots* LEAVES'
516
517 The regExp matches the middle word ('SHOOTS'). Two callbacks, `onMatch` and
518 `onNonMatch`, transform the matched and unmatched substrings before the
519 substrings are joined together again.
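Splitting is often followed by parsing. As a sketch for current Dart, assuming every piece is a decimal integer (`parseNumbers` is a hypothetical helper, and `int.parse` throws a FormatException for anything else), the `nums` example can be turned into a list of ints:

```dart
final numsRegExp = RegExp(r'(\s|/|~/)');

/// Hypothetical helper: splits [nums] and parses each piece as an int.
List<int> parseNumbers(String nums) =>
    nums.split(numsRegExp).map(int.parse).toList();

void main() {
  print(parseNumbers('2/7 3 4/5 3~/5')); // [2, 7, 3, 4, 5, 3, 5]
}
```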
520
521
522 ## Changing string case
523
524 ### Problem
525
526 You want to change the case of strings.
527
528 ### Solution
529
530 Use `string.toUpperCase()` and `string.toLowerCase()` to convert a string to
531 upper-case or lower-case, respectively:
532
533 var theOneILove = 'I love Lucy!';
534 theOneILove.toUpperCase(); // 'I LOVE LUCY!'
535 theOneILove.toLowerCase(); // 'i love lucy!'
536
537 ### Discussion
538
539 Case changes affect the characters of bicameral scripts such as Greek and Latin (used to write French):
540 var zeus = '\u0394\u03af\u03b1\u03c2'; // 'Δίας' (Zeus in modern Greek)
541 zeus.toUpperCase(); // 'ΔΊΑΣ'
542
543 var resume = '\u0052\u00e9\u0073\u0075\u006d\u00e9'; // 'Résumé'
544 resume.toLowerCase(); // 'résumé'
545
546 They do not affect the characters of unicameral scripts such as Devanagari (used for
547 writing many of the languages of India):
548
549 var chickenKebab = '\u091a\u093f\u0915\u0928 \u0915\u092c\u093e\u092c';
550 // 'चिकन कबाब' (in Devanagari)
551 chickenKebab.toLowerCase(); // 'चिकन कबाब'
552 chickenKebab.toUpperCase(); // 'चिकन कबाब'
553
554 If a character's case does not change when using `toUpperCase()` and
555 `toLowerCase()`, it is most likely because the character only has one
556 form.
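A common use of these methods is a simple case-insensitive comparison. A minimal sketch for current Dart; `equalsIgnoreCase` is a hypothetical helper, and lower-casing both sides is a simplification that does not implement full Unicode case folding:

```dart
/// Hypothetical helper: compares two strings after lower-casing both.
bool equalsIgnoreCase(String a, String b) => a.toLowerCase() == b.toLowerCase();

void main() {
  print(equalsIgnoreCase('R\u00e9sum\u00e9', 'R\u00c9SUM\u00c9')); // true
  print(equalsIgnoreCase('Lucy', 'Lucie')); // false
}
```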
557
558 ## Determining whether a string contains another string
559
560 ### Problem
561
562 You want to find out whether a string contains another string.
563
564 ### Solution
565
566 Use `string.contains()`:
567
568 var string = 'Dart strings are immutable';
569 string.contains('immutable'); // true
570
571 You can pass a start index as an optional second argument:
572
573 string.contains('Dart', 2); // false
574
575 ### Discussion
576
577 The String library provides a couple of shortcuts for testing whether a string
578 begins or ends with another string:
579
580 string.startsWith('Dart'); // true
581 string.endsWith('e'); // true
582
583 You can also use `string.indexOf()`, which returns -1 if the substring is
584 not found, and the index of the first match if it is:
585
586 string.indexOf('art') != -1; // true, 'art' is found in 'Dart'
587
588 You can also use a regExp and `hasMatch()`:
589
590 new RegExp(r'ar[et]').hasMatch(string); // true, 'art' and 'are' match
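`indexOf()` also accepts a start index, which makes it possible to find every occurrence rather than just the first. A sketch for current Dart; `countOccurrences` is a hypothetical helper that counts non-overlapping matches:

```dart
/// Hypothetical helper: counts non-overlapping occurrences of [pattern].
int countOccurrences(String s, String pattern) {
  if (pattern.isEmpty) return 0; // avoid an infinite loop
  var count = 0;
  var index = s.indexOf(pattern);
  while (index != -1) {
    count++;
    // Resume searching just past the current match.
    index = s.indexOf(pattern, index + pattern.length);
  }
  return count;
}

void main() {
  print(countOccurrences('Dart strings are immutable', 'ar')); // 2
  print(countOccurrences('aaaa', 'aa')); // 2
}
```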
591
592
593 ## Finding matches of a regExp pattern in a string
594
595 ### Problem
596
597 You want to use a regExp to match a pattern in a string, and
598 want to be able to access the matches.
599
600 ### Solution
601
602 Construct a regular expression using the RegExp class and find matches using
603 the `allMatches()` method:
604
605 var neverEatingThat = 'Not with a fox, not in a box';
606 var regExp = new RegExp(r'[fb]ox');
607 var matches = regExp.allMatches(neverEatingThat);
608 matches.map((match) => match.group(0)).toList(); // ['fox', 'box']
609
610 ### Discussion
611
612 You can query the object returned by `allMatches()` to find out the number of
613 matches:
614
615 matches.length; // 2
616
617 To find the first match, use `firstMatch()`:
618
619 regExp.firstMatch(neverEatingThat).group(0); // 'fox'
620
621 To directly access the matched string, use `stringMatch()`:
622
623 regExp.stringMatch(neverEatingThat); // 'fox'
624 regExp.stringMatch('I like bagels and lox'); // null
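Each Match object also records where it was found. A sketch for current Dart; `describeMatches` is a hypothetical helper that reads the `start` and `end` properties of each match (`end` is exclusive):

```dart
/// Hypothetical helper: describes each match of [pattern] in [input].
List<String> describeMatches(RegExp pattern, String input) => [
      for (final m in pattern.allMatches(input))
        '${m[0]} at ${m.start}-${m.end}',
    ];

void main() {
  var neverEatingThat = 'Not with a fox, not in a box';
  print(describeMatches(RegExp(r'[fb]ox'), neverEatingThat));
  // [fox at 11-14, box at 25-28]
}
```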
625
626
627 ## Substituting strings based on regExp matches
628
629 ### Problem
630
631 You want to match substrings within a string and make substitutions based on
632 the matches.
633
634 ### Solution
635
636 Construct a regular expression using the RegExp class and make replacements
637 using the `replaceAll()` method:
638
639 'resume'.replaceAll(new RegExp(r'e'), '\u00E9'); // 'résumé'
640
641 If you want to replace just the first match, use `replaceFirst()`:
642
643 '0.0001'.replaceFirst(new RegExp(r'0+'), ''); // '.0001'
644
645 The RegExp matches one or more 0s; `replaceFirst()` replaces only the first such run (the leading '0') with an empty string.
646
647 You can use `replaceAllMapped()` and register a function to modify the
648 matches:
649
650 var heart = '\u2661'; // '♡'
651 var string = 'I like Ike but I $heart Lucy';
652 var regExp = new RegExp(r'[A-Z]\w+');
653 string.replaceAllMapped(regExp, (match) => match.group(0).toUpperCase());
654 // 'I like IKE but I ♡ LUCY'
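The callback passed to `replaceAllMapped()` can also use capture groups. A sketch for current Dart; `swapNames` and its 'Last, First' input format are assumptions made for illustration:

```dart
// Group 1 captures the last name, group 2 the first name.
final lastCommaFirst = RegExp(r'(\w+), (\w+)');

/// Hypothetical helper: rewrites every 'Last, First' as 'First Last'.
String swapNames(String s) =>
    s.replaceAllMapped(lastCommaFirst, (m) => '${m[2]} ${m[1]}');

void main() {
  print(swapNames('Ball, Lucille and Arnaz, Desi'));
  // Lucille Ball and Desi Arnaz
}
```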