recipes/src/core/strings.md - Issue 12335109: Strings recipes for the Dart Cookbook

Side by Side Diff: recipes/src/core/strings.md

Issue 12335109: Strings recipes for the Dart Cookbook (Closed) Base URL: https://github.com/dart-lang/cookbook.git@master

Patch Set: Numerous minor changes based on reviewers' comments. Created 7 years, 9 months ago

OLD	NEW
(Empty)
	1 # Strings

	2

	3 A Dart string represents a sequence of characters encoded in UTF-16. Decoding

	4 UTF-16 yields Unicode code points. Borrowing terminology from Go, Dart uses

	5 the term `rune` for an integer representing a Unicode code point.

	6

	7 The string recipes included in this chapter assume that you have some

	8 familiarity with Unicode and UTF-16. Here is a brief refresher:

	9

	10 ### What is the Basic Multilingual Plane?

	11

	12 The Unicode code space is divided into seventeen planes of 65,536 points each.

	13 The first plane (code points U+0000 to U+FFFF) contains the most

	14 frequently used characters and is called the Basic Multilingual Plane or BMP.

	15

	16 ### What is a Surrogate Pair?

	17

	18 The term 'surrogate pair' refers to a means of encoding Unicode characters

	19 outside the Basic Multilingual Plane.

	20

	21 In UTF-16, Unicode characters are stored as 16-bit (two-byte) code units.

	22 Since one code unit can only represent the 65,536 values in the 0x0

	23 to 0xFFFF range, a pair of code units is used to store code points in the

	24 0x10000 to 0x10FFFF range.

	25

	26 For example, the Unicode character for the musical Treble-clef (🎼), with

	27 a value of '\u{1F3BC}', is too large to fit in 16 bits.

	28

	29 var clef = '\u{1F3BC}'; // 🎼

	30

	31 '\u{1F3BC}' is composed of a UTF-16 surrogate pair: [\uD83C, \uDFBC].

	32
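
The arithmetic behind a surrogate pair can be sketched as follows. This is a minimal illustration, assuming a current null-safe Dart SDK; `toSurrogatePair` is a hypothetical helper name, not part of the String library.

```dart
/// Splits a non-BMP code point into its UTF-16 surrogate pair.
/// `toSurrogatePair` is a hypothetical helper used only for illustration.
List<int> toSurrogatePair(int codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw RangeError.range(codePoint, 0x10000, 0x10FFFF, 'codePoint');
  }
  final offset = codePoint - 0x10000; // 20 significant bits remain.
  final high = 0xD800 + (offset >> 10); // Top 10 bits.
  final low = 0xDC00 + (offset & 0x3FF); // Bottom 10 bits.
  return [high, low];
}

void main() {
  final pair = toSurrogatePair(0x1F3BC);
  print(pair.map((u) => u.toRadixString(16)).toList()); // [d83c, dfbc]
  print(String.fromCharCodes(pair) == '\u{1F3BC}'); // true
}
```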

	33 ### What is the difference between a code point and a code unit?

	34

	35 Within the Basic Multilingual Plane, the code point for a character is

	36 numerically the same as the code unit for that character.

	37

	38 'D'.runes.first; // 68

	39 'D'.codeUnits.first; // 68

	40

	41 For non-BMP characters, each code point is represented by two code units.

	42

	43 var clef = '\u{1F3BC}'; // 🎼

	44 clef.runes.length; // 1

	45 clef.codeUnits.length; // 2

	46

	47 ### What exactly is a character?

	48

	49 A character is a string contained in the Universal Character Set. Each character

	50 maps to a single rune value (code point); BMP characters map to 1 code

	51 unit; non-BMP characters map to 2 code units.

	52

	53 You can read more about the Universal Character Set at

	54 http://en.wikipedia.org/wiki/Universal_Character_Set.

	55

	56 ### Do I really have to deal with Unicode?

	57

	58 Yes, if you want to build robust international applications, you do.

	59 Besides, the String library makes working with Unicode relatively painless,

	60 so there's no great overhead in doing things right.

	61

	62 ## Concatenating Strings

	63

	64 ### Problem

	65

	66 You want to concatenate strings in Dart. You tried using `+`, but

	67 that resulted in an error.

	68

	69 ### Solution

	70

	71 Use adjacent string literals:

	72

	73 var fact = 'Dart ' 'is' ' fun!'; // 'Dart is fun!'

	74

	75 ### Discussion

	76

	77 Adjacent literals also work over multiple lines:

	78

	79 var fact = 'Dart '

	80 'is '

	81 'fun!'; // 'Dart is fun!'

	82

	83 They also work when using multiline strings:

	84

	85 var lunch = '''Peanut

	86 butter'''

	87 ''' and

	88 jelly'''; // 'Peanut\nbutter and\njelly'

	89

	90 You can concatenate adjacent single line literals with multiline strings:

	91

	92 var funnyGuys = 'Dewey ' 'Cheatem'

	93 ''' and

	94 Howe'''; // 'Dewey Cheatem and\n Howe'

	95

	96

	97 #### Alternatives to adjacent string literals

	98

	99 You can also use the `concat()` method on a string to concatenate it to another
	floitsch 2013/03/09 00:01:41 I just gave an LGTM to Lasse for changing concat to String.+. So you might want to wait for that to be committed and adapt this section.
	100 string:

	101

	102 var film = filmToWatch();

	103 film = film.concat('\n'); // 'The Big Lebowski\n'

	104

	105 Since `concat()` creates a new string every time it is invoked, a long chain of

	106 `concat()`s can be expensive. Avoid those. Use a StringBuffer instead (see

	107 _Incrementally building a string efficiently using a StringBuffer_, below).

	108

	109 You can use `join()` to combine a sequence of strings:

	110

	111 var film = ['The', 'Big', 'Lebowski'].join(' '); // 'The Big Lebowski'

	112

	113 You can also use string interpolation to concatenate strings (see

	114 _Interpolating expressions inside strings_, below).

	115

	116

	117 ## Interpolating expressions inside strings

	118

	119 ### Problem

	120

	121 You want to create strings that contain Dart expressions and identifiers.

	122

	123 ### Solution

	124

	125 You can put the value of an expression inside a string by using ${expression}.

	126

	127 var favFood = 'sushi';

	128 var whatDoILove = 'I love ${favFood.toUpperCase()}'; // 'I love SUSHI'

	129

	130 You can skip the {} if the expression is an identifier:

	131

	132 var whatDoILove = 'I love $favFood'; // 'I love sushi'

	133

	134 ### Discussion

	135

	136 An interpolated string, `string ${expression}` is equivalent to the

	137 concatenation of the strings 'string ' and `expression.toString()`.

	138 Consider this code:

	139

	140 var four = 4;

	141 var seasons = 'The $four seasons'; // 'The 4 seasons'

	142

	143 It is equivalent to the following:
	floitsch 2013/03/09 00:01:41 It is not. the concat will make two copies, whereas the string-interpolation only copies into the result. The result is equivalent, though.
	144

	145 var seasons = 'The '.concat(4.toString()).concat(' seasons'); // 'The 4 seasons'

	146

	147 You should consider implementing a `toString()` method for user-defined

	148 objects. Here's what happens if you don't:

	149

	150 class Point {

	151 num x, y;

	152 Point(this.x, this.y);

	153 }

	154

	155 var point = new Point(3, 4);

	156 print('Point: $point'); // "Point: Instance of 'Point'"

	157

	158 Probably not what you wanted. Here is the same example with an explicit

	159 `toString()`:

	160

	161 class Point {

	162 ...

	163

	164 String toString() => 'x: $x, y: $y';

	165 }

	166

	167 print('Point: $point'); // 'Point: x: 3, y: 4'

	168
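
For reference, here is the `toString()` example as one self-contained program. It is a sketch that assumes a current Dart SDK, so `new` is omitted and the override is annotated; the recipe's own snippets use the older style.

```dart
class Point {
  final num x, y;
  Point(this.x, this.y);

  // Called implicitly whenever a Point is interpolated into a string.
  @override
  String toString() => 'x: $x, y: $y';
}

void main() {
  final point = Point(3, 4);
  print('Point: $point'); // Point: x: 3, y: 4
}
```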

	169

	170 ## Escaping special characters

	171

	172 ### Problem

	173

	174 You want to put newlines, dollar signs, or other special characters in your strings.

	175

	176 ### Solution

	177

	178 Prefix special characters with a `\`.

	179

	180 print('Wile\nCoyote');

	181 // Wile

	182 // Coyote

	183

	184 ### Discussion

	185

	186 Dart designates a few characters as special, and these can be escaped:

	187

	188 - \n for newline, equivalent to \x0A.

	189 - \r for carriage return, equivalent to \x0D.

	190 - \f for form feed, equivalent to \x0C.

	191 - \b for backspace, equivalent to \x08.

	192 - \t for tab, equivalent to \x09.

	193 - \v for vertical tab, equivalent to \x0B.

	194

	195 If you prefer, you can use `\x` or `\u` notation to indicate the special

	196 character:

	197

	198 print('Wile\x0ACoyote'); // same as print('Wile\nCoyote');

	199 print('Wile\u000ACoyote'); // same as print('Wile\nCoyote');

	200

	201 You can also use `\u{}` notation:

	202

	203 print('Wile\u{000A}Coyote'); // same as print('Wile\nCoyote');

	204

	205 You can also escape the `$` used in string interpolation:

	206

	207 var superGenius = 'Wile Coyote';

	208 print('$superGenius and Road Runner'); // 'Wile Coyote and Road Runner'

	209 print('\$superGenius and Road Runner'); // '$superGenius and Road Runner'

	210

	211 If you escape a non-special character, the `\` is ignored:

	212

	213 print('Wile \E Coyote'); // 'Wile E Coyote'

	214

	215

	216 ## Incrementally building a string efficiently using a StringBuffer

	217

	218 ### Problem

	219

	220 You want to collect string fragments and combine them in an efficient manner.

	221

	222 ### Solution

	223

	224 Use a StringBuffer to programmatically generate a string. A StringBuffer

	225 collects the string fragments, but does not generate a new string until

	226 `toString()` is called:

	227

	228 var sb = new StringBuffer();

	229 sb.write('John, ');

	230 sb.write('Paul, ');

	231 sb.write('George, ');

	232 sb.write('and Ringo');

	233 var beatles = sb.toString(); // 'John, Paul, George, and Ringo'

	234

	235 ### Discussion

	236

	237 In addition to `write()`, the StringBuffer class provides methods to write a

	238 list of strings (`writeAll()`), write a numerical character code

	239 (`writeCharCode()`), write with an added newline (`writeln()`), and more. Here

	240 is a simple example that shows the use of these methods:

	241

	242 var sb = new StringBuffer();

	243 sb.writeln('The Beatles:');

	244 sb.writeAll(['John, ', 'Paul, ', 'George, and Ringo']);

	245 sb.writeCharCode(33); // charCode for '!'.

	246 var beatles = sb.toString(); // 'The Beatles:\nJohn, Paul, George, and Ringo!'

	247

	248 Since a StringBuffer waits until the call to `toString()` to generate the

	249 concatenated string, it represents a more efficient way of combining strings

	250 than `concat()`. See the _Concatenating Strings_ recipe for a description of

	251 `concat()`.

	252
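
The typical use of a StringBuffer is inside a loop, where repeated concatenation would create many intermediate strings. The sketch below assumes a current Dart SDK; `numberedList` is a hypothetical helper name.

```dart
/// Builds a numbered list, one item per line, with a single StringBuffer.
/// `numberedList` is a hypothetical helper used only for illustration.
String numberedList(List<String> items) {
  final sb = StringBuffer();
  for (var i = 0; i < items.length; i++) {
    sb.writeln('${i + 1}. ${items[i]}'); // writeln() appends '\n'.
  }
  return sb.toString(); // The only point where the full string is built.
}

void main() {
  print(numberedList(['John', 'Paul', 'George', 'Ringo']));
}
```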

	253 ## Converting between string characters and numerical codes

	254

	255 ### Problem

	256

	257 You want to convert string characters into numerical codes and back.

	258

	259 ### Solution

	260

	261 Use the `runes` getter to access a string's code points:

	262

	263 'Dart'.runes.toList(); // [68, 97, 114, 116]

	264

	265 var smileyFace = '\u263A'; // ☺

	266 smileyFace.runes.toList(); // [9786]

	267

	268 The number 9786 represents the code point '\u263A'.

	269

	270 Use `string.codeUnits` to get a string's UTF-16 code units:

	271

	272 'Dart'.codeUnits.toList(); // [68, 97, 114, 116]

	273 smileyFace.codeUnits.toList(); // [9786]

	274

	275 ### Discussion

	276

	277 Notice that using `runes` and `codeUnits` produces identical results

	278 in the examples above. That happens because each character in 'Dart' and in

	279 `smileyFace` fits within 16 bits, resulting in a code unit corresponding

	280 neatly with a code point.

	281

	282 Consider an example where a character cannot be represented within 16-bits,

	283 the Unicode character for a Treble clef ('\u{1F3BC}'). This character consists

	284 of a surrogate pair: '\uD83C', '\uDFBC'. Getting the numerical value of this

	285 character using `codeUnits` and `runes` produces the following result:

	286

	287 var clef = '\u{1F3BC}'; // 🎼

	288 clef.codeUnits.toList(); // [55356, 57276]

	289 clef.runes.toList(); // [127932]

	290

	291 The numbers 55356 and 57276 represent `clef`'s surrogate pair, '\uD83C' and

	292 '\uDFBC', respectively. The number 127932 represents the code point '\u{1F3BC}'.

	293

	294 #### Using codeUnitAt() to access individual code units

	295

	296 To access the 16-bit UTF-16 code unit at a particular index, use

	297 `codeUnitAt()`:

	298

	299 'Dart'.codeUnitAt(0); // 68

	300 smileyFace.codeUnitAt(0); // 9786

	301

	302 Using `codeUnitAt()` with the non-BMP `clef` character leads to problems:

	303

	304 clef.codeUnitAt(0); // 55356

	305 clef.codeUnitAt(1); // 57276

	306

	307 In either call to `clef.codeUnitAt()`, the values returned represent strings

	308 that are only one half of a UTF-16 surrogate pair. These are not valid UTF-16

	309 strings.

	310

	311

	312 #### Converting numerical codes to strings

	313

	314 You can generate a new string from runes or code units using the factory

	315 `String.fromCharCodes(charCodes)`:

	316

	317 new String.fromCharCodes([68, 97, 114, 116]); // 'Dart'

	318

	319 new String.fromCharCodes([73, 32, 9825, 32, 76, 117, 99, 121]);

	320 // 'I ♡ Lucy'

	321

	322 new String.fromCharCodes([55356, 57276]); // 🎼

	323 new String.fromCharCodes([127932]); // 🎼

	324

	325 You can use the `String.fromCharCode()` factory to convert a single rune or

	326 code unit to a string:

	327

	328 new String.fromCharCode(68); // 'D'

	329 new String.fromCharCode(9786); // ☺

	330 new String.fromCharCode(127932); // 🎼

	331

	332 Creating a string with only one half of a surrogate pair is permitted, but not

	333 recommended.

	334
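
A quick way to check the conversions above is a round trip: decode a string to numbers, then rebuild it. This sketch assumes a current Dart SDK; `fromRunes` and `fromCodeUnits` are hypothetical helper names.

```dart
/// Rebuilds a string from its code points.
String fromRunes(String s) => String.fromCharCodes(s.runes);

/// Rebuilds a string from its UTF-16 code units; adjacent surrogate
/// halves are recombined into a single character.
String fromCodeUnits(String s) => String.fromCharCodes(s.codeUnits);

void main() {
  const music = 'I \u2661 \u{1F3BC}'; // 'I ♡ 🎼'
  print(fromRunes(music) == music); // true
  print(fromCodeUnits(music) == music); // true
}
```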

	335 ## Determining if a string is empty

	336

	337 ### Problem

	338

	339 You want to know if a string is empty. You tried ` if(string) {...}`, but that

	340 did not work.

	341

	342 ### Solution

	343

	344 Use `string.isEmpty`:

	345

	346 var emptyString = '';

	347 emptyString.isEmpty; // true

	348

	349 A string with a space is not empty:

	350

	351 var space = ' ';

	352 space.isEmpty; // false

	353

	354 ### Discussion

	355

	356 Don't use `if (string)` to test the emptiness of a string. In Dart, all

	357 objects except the boolean true evaluate to false. `if(string)` will always

	358 be false.

	359

	360

	361 ## Removing leading and trailing whitespace

	362

	363 ### Problem

	364

	365 You want to remove leading and trailing whitespace from a string.

	366

	367 ### Solution

	368

	369 Use `string.trim()`:

	370

	371 var space = '\n\r\f\t\v'; // We'll use a variety of space characters.

	372 var string = '$space X $space';

	373 var newString = string.trim(); // 'X'

	374

	375 The String class has no methods to remove only leading or only trailing

	376 whitespace. But you can always use regExps.

	377

	378 Remove only leading whitespace:

	379

	380 var newString = string.replaceFirst(new RegExp(r'^\s+'), ''); // 'X $space'

	381

	382 Remove only trailing whitespace:

	383

	384 var newString = string.replaceFirst(new RegExp(r'\s+$'), ''); // '$space X'

	385
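
The two regExp replacements can be wrapped in small helpers, as sketched below. The names `trimLeading` and `trimTrailing` are hypothetical. Note that current Dart SDKs also provide `trimLeft()` and `trimRight()` on String, which postdate this recipe.

```dart
/// Removes leading whitespace only (hypothetical helper).
String trimLeading(String s) => s.replaceFirst(RegExp(r'^\s+'), '');

/// Removes trailing whitespace only (hypothetical helper).
String trimTrailing(String s) => s.replaceFirst(RegExp(r'\s+$'), '');

void main() {
  const padded = ' \t X \n';
  print(trimLeading(padded) == 'X \n'); // true
  print(trimTrailing(padded) == ' \t X'); // true
  // Built-in equivalents in current SDKs:
  print(padded.trimLeft() == trimLeading(padded)); // true
  print(padded.trimRight() == trimTrailing(padded)); // true
}
```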

	386

	387 ## Calculating the length of a string

	388

	389 ### Problem

	390

	391 You want to get the length of a string, but are not sure how to

	392 correctly calculate the length when working with Unicode.

	393

	394 ### Solution

	395

	396 Use string.length to get the number of UTF-16 code units in a string:

	397

	398 'I love music'.length; // 12

	399 'I love music'.runes.length; // 12

	400

	401 ### Discussion

	402

	403 For characters that fit into 16 bits, the code unit length is the same as the

	404 rune length:

	405

	406 var hearts = '\u2661'; // ♡

	407 hearts.length; // 1

	408 hearts.runes.length; // 1

	409

	410 If the string contains any characters outside the Basic Multilingual

	411 Plane (BMP), the rune length will be less than the code unit length:

	412

	413 var clef = '\u{1F3BC}'; // 🎼

	414 clef.length; // 2

	415 clef.runes.length; // 1

	416

	417 var music = 'I $hearts $clef'; // 'I ♡ 🎼'

	418 music.length; // 6

	419 music.runes.length; // 5

	420

	421 Use `length` if you want the number of code units; use `runes.length` if you

	422 want the number of runes.
	floitsch 2013/03/09 00:01:41 You could add, that Twitter uses runes for the length limit.
	423
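
The difference matters when enforcing a length limit. The sketch below truncates by runes so that a surrogate pair is never cut in half. It assumes a current Dart SDK, and `truncateRunes` is a hypothetical helper name.

```dart
/// Keeps at most [maxRunes] code points of [s] (hypothetical helper).
String truncateRunes(String s, int maxRunes) =>
    String.fromCharCodes(s.runes.take(maxRunes));

void main() {
  const music = 'I \u2661 \u{1F3BC}'; // 'I ♡ 🎼': 6 code units, 5 runes.
  print(truncateRunes(music, 5) == music); // true
  print(truncateRunes(music, 4) == 'I \u2661 '); // true
  // substring() counts code units, so it can leave a lone surrogate behind:
  print(music.substring(0, 5).codeUnitAt(4) == 0xD83C); // true
}
```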

	424

	425 ## Subscripting a string

	426

	427 ### Problem

	428

	429 You want to be able to access a character in a string at a particular index.

	430

	431 ### Solution

	432

	433 Subscript runes:

	434

	435 var teacup = '\u{1F375}'; // 🍵

	436 teacup.runes.toList()[0]; // 127861
	floitsch 2013/03/09 00:01:41 If you want to access it only once, you can also use 'elementAt()'. teacup.runes.first is also valid here. Furthermore note that runes has a special iterator that allows to move forward and backward.
	437

	438 The number 127861 represents the code point for teacup, '\u{1F375}' (🍵 ).

	439

	440 ### Discussion

	441

	442 Subscripting a string directly can be problematic. This is because the default

	443 `[]` implementation subscripts along code units. This means that

	444 for non-BMP characters, subscripting yields invalid UTF-16 characters:

	445

	446 'Dart'[0]; // 'D'

	447

	448 var hearts = '\u2661'; // ♡

	449 hearts[0]; // '\u2661' (♡)

	450

	451 teacup[0]; // '\uD83C' (code unit 55356): invalid string, half of a surrogate pair.

	452 teacup.codeUnits.toList()[0]; // 55356, the same lone surrogate as a number.

	453
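
A rune-based subscript can be sketched as a small helper. This assumes a current Dart SDK; `characterAt` is a hypothetical name. It uses `elementAt()` on `runes`, which walks the string from the start, so it is linear in the index rather than constant time.

```dart
/// Returns the character at rune index [index] (hypothetical helper).
String characterAt(String s, int index) =>
    String.fromCharCode(s.runes.elementAt(index));

void main() {
  const tea = 'a\u{1F375}b'; // 'a🍵b': 3 runes, 4 code units.
  print(characterAt(tea, 1) == '\u{1F375}'); // true
  print(characterAt(tea, 2) == 'b'); // true
  // Plain [] indexes code units, so index 2 is the low surrogate:
  print(tea.codeUnitAt(2) == 0xDF75); // true
}
```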

	454

	455 ## Processing a string one character at a time

	456

	457 ### Problem

	458

	459 You want to do something with each individual character in a string.

	460

	461 ### Solution

	462

	463 To access an individual character, map the string runes:

	464

	465 var charList = "Dart".runes.map((rune) => new String.fromCharCode(rune)).toList();

	466 // ['D', 'a', 'r', 't']

	467

	468 var runeList = happy.runes.map((rune) => [rune, new String.fromCharCode(rune)]).toList();

	469 // [[73, 'I'], [32, ' '], [97, 'a'], [109, 'm'], [32, ' '], [9786, '☺']]

	470

	471 If you are sure that the string is in the Basic Multilingual Plane (BMP), you

	472 can use string.split(''):

	473

	474 'Dart'.split(''); // ['D', 'a', 'r', 't']

	475 smileyFace.split('').length; // 1

	476

	477 Since `split('')` splits at the UTF-16 code unit boundaries,

	478 invoking it on a non-BMP character yields the string's surrogate pair:

	479

	480 var clef = '\u{1F3BC}'; // 🎼 , not in BMP.

	481 clef.split('').length; // 2

	482

	483 The surrogate pair members are not valid UTF-16 strings.

	484
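
The contrast between mapping runes and `split('')` can be shown side by side. This is a sketch assuming a current Dart SDK; `toCharacters` is a hypothetical helper name.

```dart
/// Splits [s] into one string per code point (hypothetical helper).
List<String> toCharacters(String s) =>
    s.runes.map((rune) => String.fromCharCode(rune)).toList();

void main() {
  const text = '\u{1F3BC}!'; // '🎼!'
  print(toCharacters(text).length); // 2
  print(text.split('').length); // 3, the clef is split into two halves.
}
```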

	485

	486 ## Splitting a string into substrings

	487

	488 ### Problem

	489

	490 You want to split a string into substrings.

	491

	492 ### Solution

	493

	494 Use the `split()` method with a string or a regExp as an argument.

	495

	496 var smileyFace = '\u263A';

	497 var happy = 'I am $smileyFace';

	498 happy.split(' '); // ['I', 'am', '☺']

	499

	500 Here is an example of using `split()` with a regExp:

	501

	502 var nums = '2/7 3 4/5 3~/5';

	503 var numsRegExp = new RegExp(r'(\s|/|~/)');

	504 nums.split(numsRegExp); // ['2', '7', '3', '4', '5', '3', '5']

	505

	506 In the code above, the string `nums` contains various numbers, some of which

	507 are expressed as fractions or as int-divisions. A regExp is used to split the

	508 string to extract just the numbers.

	509

	510 You can perform operations on the matched and unmatched portions of a string

	511 by using `splitMapJoin()` with a regExp:

	512

	513 'Eats SHOOTS leaves'.splitMapJoin((new RegExp(r'SHOOTS')),

	514 onMatch: (m) => '${m.group(0).toLowerCase()}',

	515 onNonMatch: (n) => n.toUpperCase()); // 'EATS shoots LEAVES'

	516

	517 The regExp matches the middle word ('SHOOTS'). A pair of callbacks are

	518 registered to transform the matched and unmatched substrings before the

	519 substrings are joined together again.

	520
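
Both techniques from this recipe are sketched below for a current null-safe Dart SDK, where `Match.group()` and `m[0]` return a nullable string and need `!`. The helper names `numbersIn` and `swapEmphasis` are hypothetical.

```dart
/// Splits on whitespace, '/', or '~/' (hypothetical helper).
List<String> numbersIn(String s) => s.split(RegExp(r'\s|/|~/'));

/// Lower-cases every match of [loud] and upper-cases everything else.
String swapEmphasis(String text, Pattern loud) => text.splitMapJoin(
      loud,
      onMatch: (m) => m[0]!.toLowerCase(), // m[0] is the whole match.
      onNonMatch: (n) => n.toUpperCase(),
    );

void main() {
  print(numbersIn('2/7 3 4/5 3~/5')); // [2, 7, 3, 4, 5, 3, 5]
  print(swapEmphasis('Eats SHOOTS leaves', RegExp(r'SHOOTS')));
  // EATS shoots LEAVES
}
```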

	521

	522 ## Changing string case

	523

	524 ### Problem

	525

	526 You want to change the case of strings.

	527

	528 ### Solution

	529

	530 Use `string.toUpperCase()` and `string.toLowerCase()` to convert a string to

	531 upper-case or lower-case, respectively:

	532

	533 var theOneILove = 'I love Lucy';

	534 theOneILove.toUpperCase(); // 'I LOVE LUCY'

	535 theOneILove.toLowerCase(); // 'i love lucy'

	536

	537 ### Discussion

	538

	539 Case changes affect the characters of bicameral scripts like Greek and Latin (used for French):

	540 var zeus = '\u0394\u03af\u03b1\u03c2'; // 'Δίας' (Zeus in modern Greek)

	541 zeus.toUpperCase(); // 'ΔΊΑΣ'

	542

	543 var resume = '\u0052\u00e9\u0073\u0075\u006d\u00e9'; // 'Résumé'

	544 resume.toLowerCase(); // 'résumé'

	545

	546 They do not affect the characters of unicameral scripts like Devanagari (used for

	547 writing many of the languages of India):

	548

	549 var chickenKebab = '\u091a\u093f\u0915\u0928 \u0915\u092c\u093e\u092c';

	550 // 'चिकन कबाब' (in Devanagari)

	551 chickenKebab.toLowerCase(); // 'चिकन कबाब'

	552 chickenKebab.toUpperCase(); // 'चिकन कबाब'

	553

	554 If a character's case does not change when using `toUpperCase()` and

	555 `toLowerCase()`, it is most likely because the character only has one

	556 form.

	557
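
One common use of case conversion is a case-insensitive comparison. The sketch below assumes a current Dart SDK; `equalsIgnoreCase` is a hypothetical helper, and it is only a simple approach: characters with context-dependent forms, such as the Greek final sigma, may not compare equal this way.

```dart
/// Compares two strings after lower-casing both (hypothetical helper).
bool equalsIgnoreCase(String a, String b) =>
    a.toLowerCase() == b.toLowerCase();

void main() {
  print(equalsIgnoreCase('I love Lucy', 'i LOVE lucy')); // true
  print(equalsIgnoreCase('R\u00e9sum\u00e9', 'R\u00c9SUM\u00c9')); // true
  print(equalsIgnoreCase('Lucy', 'Ethel')); // false
}
```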

	558 ## Determining whether a string contains another string

	559

	560 ### Problem

	561

	562 You want to find out if a string is the substring of another string.

	563

	564 ### Solution

	565

	566 Use `string.contains()`:

	567

	568 var string = 'Dart strings are immutable';

	569 string.contains('immutable'); // True.

	570

	571 You can indicate a startIndex as a second argument:

	572

	573 string.contains('Dart', 2); // False

	574

	575 ### Discussion

	576

	577 The String library provides a couple of shortcuts for testing whether a string

	578 is a substring of another:

	579

	580 string.startsWith('Dart'); // True.

	581 string.endsWith('e'); // True.

	582

	583 You can also use `string.indexOf()`, which returns -1 if the substring is

	584 not found within a string, and its matching index, if it is:

	585

	586 string.indexOf('art') != -1; // True, `art` is found in `Dart`

	587

	588 You can also use a regExp and `hasMatch()`:

	589

	590 new RegExp(r'ar[et]').hasMatch(string); // True, 'art' and 'are' match.

	591

	592

	593 ## Finding matches of a regExp pattern in a string

	594

	595 ### Problem

	596

	597 You want to use regExp to match a pattern in a string, and

	598 want to be able to access the matches.

	599

	600 ### Solution

	601

	602 Construct a regular expression using the RegExp class and find matches using

	603 the `allMatches()` method:

	604

	605 var neverEatingThat = 'Not with a fox, not in a box';

	606 var regExp = new RegExp(r'[fb]ox');

	607 var matches = regExp.allMatches(neverEatingThat); // An Iterable of matches.

	608 matches.map((match) => match.group(0)).toList(); // ['fox', 'box']

	609

	610 ### Discussion

	611

	612 You can query the object returned by `allMatches()` to find out the number of

	613 matches:

	614

	615 matches.length; // 2

	616

	617 To find the first match, use `firstMatch()`:

	618

	619 regExp.firstMatch(neverEatingThat).group(0); // 'fox'

	620

	621 To directly access the matched string, use `stringMatch()`:

	622

	623 regExp.stringMatch(neverEatingThat); // 'fox'

	624 regExp.stringMatch('I like bagels and lox'); // null

	625
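
Each match also records where it was found. The sketch below, which assumes a current null-safe Dart SDK, collects the start index alongside the matched text and handles the no-match case with `?.`. The helper names are hypothetical.

```dart
/// Lists each match as 'start:text' (hypothetical helper).
List<String> describeMatches(RegExp pattern, String input) =>
    pattern.allMatches(input).map((m) => '${m.start}:${m[0]}').toList();

/// Returns the first matched text, or null when nothing matches.
String? firstMatchText(RegExp pattern, String input) =>
    pattern.firstMatch(input)?.group(0);

void main() {
  final regExp = RegExp(r'[fb]ox');
  print(describeMatches(regExp, 'Not with a fox, not in a box'));
  // [11:fox, 25:box]
  print(firstMatchText(regExp, 'I like bagels and lox')); // null
}
```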

	626

	627 ## Substituting strings based on regExp matches

	628

	629 ### Problem

	630

	631 You want to match substrings within a string and make substitutions based on

	632 the matches.

	633

	634 ### Solution

	635

	636 Construct a regular expression using the RegExp class and make replacements

	637 using the `replaceAll()` method:

	638

	639 'resume'.replaceAll(new RegExp(r'e'), '\u00E9'); // 'résumé'

	640

	641 If you want to replace just the first match, use `replaceFirst()`:

	642

	643 '0.0001'.replaceFirst(new RegExp(r'0+'), ''); // '.0001'

	644

	645 The RegExp matches the first run of one or more 0's, which is replaced with an empty string.

	646

	647 You can use `replaceAllMapped()` and register a function to modify the

	648 matches:

	649

	650 var heart = '\u2661'; // '♡'

	651 var string = 'I like Ike but I $heart Lucy';

	652 var regExp = new RegExp(r'[A-Z]\w+');

	653 string.replaceAllMapped(regExp, (match) => match.group(0).toUpperCase());

	654 // 'I like IKE but I ♡ LUCY'
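
The same substitution, written as a self-contained sketch for a current null-safe Dart SDK (where the matched text is nullable and needs `!`). `shoutNames` is a hypothetical helper name.

```dart
/// Upper-cases every capitalized word of two or more letters.
/// `shoutNames` is a hypothetical helper used only for illustration.
String shoutNames(String s) =>
    s.replaceAllMapped(RegExp(r'[A-Z]\w+'), (m) => m[0]!.toUpperCase());

void main() {
  print(shoutNames('I like Ike but I \u2661 Lucy'));
  // I like IKE but I ♡ LUCY
}
```

The lone 'I' is left alone because the pattern requires at least one word character after the capital letter.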
