|
|
Created:
6 years, 3 months ago by nweiz Modified:
6 years, 3 months ago Reviewers:
Bob Nystrom CC:
reviews_dartlang.org Visibility:
Public. |
DescriptionCreate a glob package.
BUG=17093
R=rnystrom@google.com
Committed: https://code.google.com/p/dart/source/detail?r=39784
Patch Set 1 #
Total comments: 52
Patch Set 2 : Code review changes #
Total comments: 18
Patch Set 3 : Code review changes #
Total comments: 2
Patch Set 4 : Code review changes #
Messages
Total messages: 13 (0 generated)
nweiz@google.com changed reviewers: + rnystrom@google.com
I ended up changing my mind after our last discussion about syntax. Globs now use POSIX syntax, since that's consistent with the closest thing there is to a standard, the Bash implementation.
https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md File pkg/glob/README.md (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode46 pkg/glob/README.md:46: exceptions that will be outlined below. "will be" -> "are". https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode51 pkg/glob/README.md:51: for example, a glob matching all files in the C drive would be `C:/*`. This means r"C:\foo.txt" will not match the file C:\foo.txt? That's going to make users really sad. Maybe this should do platform detection like path does? https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode55 pkg/glob/README.md:55: The `*` character matches any character other than `/`. This means that it can "any" -> "zero or more of any" https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode62 pkg/glob/README.md:62: `**` is like `*`, but matches `/` as well. It's useful for matching files or Does this mean "a**b" will match "azzzb", "a/zzb", etc.? I know of course that idiomatic usage of ** is surrounded by path separators, but it seems confusing if you omit them. https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode69 pkg/glob/README.md:69: glob produces the same results as listing that glob. I don't understand this. Can you explain why this exception is needed? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart File pkg/glob/lib/glob.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:13: final _quoteRegExp = new RegExp(r'[*{[?\\}\],\-]'); Doc comment. Also, escape "()". https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:19: /// POSIX path syntax, it can match against POSIX, Windows, or URL paths. Which This is confusing. Maybe: 'Although the glob pattern syntax always uses "/" as a path separator, it can match POSIX, Windows, or URL paths. The format it expects paths to use is based on the `context`...' https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:40: /// Whether this glob recursively matches and lists entities. It took me a while to figure out what this meant in the context of matches (for listing it's a little more intuitive). Maybe something like, "If true, then a path matches if it matches the glob itself or is contained in a directory that does." https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:48: if (_contextIsAbsoluteCache == null) { Why is this cached? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:56: bool get _patternCouldBeAbsolute { "could be" reads confusingly to me. I don't think of globs as being relative or absolute. How about _canMatchAbsolute? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:92: Glob._(String pattern, p.Context context, bool recursive) It might be simpler to make this take the ast too and then make the above ctor a factor ctor. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart File pkg/glob/lib/src/ast.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart#ne... pkg/glob/lib/src/ast.dart:14: abstract class AstNode { Instead of actually parsing to an AST and then compiling that down to a RegExp in a separate step, how about having the parser just accumulate the result as it goes along? There's no real reason to keep the AST around after generating the RegExp is done, is there? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart#ne... pkg/glob/lib/src/ast.dart:167: final List<SequenceNode> options; Why not AstNode? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart File pkg/glob/lib/src/parser.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:38: return new SequenceNode(nodes); This allows an empty sequence intentionally, right? Document that. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:91: if (_scanner.matches('-') || _scanner.matches(']')) { '?' or '^' too? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:101: position: _scanner.position - 3, length: 3); This doesn't take into account '\'. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:120: while (_scanner.scan(',')) { Use a do/while here. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:123: if (options.length == 1) _scanner.expect(','); What's this for? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:132: inOptions ? r'[^*{[?\\}\],()]*' : r'[^*{[?\\}\]()]*'); Document this. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:139: _scanner.scan(regExp); What about multiple sequential escaped characters?
Code review changes
https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md File pkg/glob/README.md (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode46 pkg/glob/README.md:46: exceptions that will be outlined below. On 2014/08/27 00:42:04, Bob Nystrom wrote: > "will be" -> "are". Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode51 pkg/glob/README.md:51: for example, a glob matching all files in the C drive would be `C:/*`. On 2014/08/27 00:42:04, Bob Nystrom wrote: > This means r"C:\foo.txt" will not match the file C:\foo.txt? That's going to > make users really sad. That's correct. r"C:\foo.txt" is already a well-formed, meaningful glob (equivalent to "C:foo.txt"). There's no good way to make backslashes work how Windows users expect, and I think this is the simplest and most-understandable alternative. > Maybe this should do platform detection like path does? It's important that globs be as cross-platform as possible. If we fundamentally change their syntax across platforms, they lose their utility as a general-purpose representation of "this group of files". https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode55 pkg/glob/README.md:55: The `*` character matches any character other than `/`. This means that it can On 2014/08/27 00:42:04, Bob Nystrom wrote: > "any" -> "zero or more of any" Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode62 pkg/glob/README.md:62: `**` is like `*`, but matches `/` as well. It's useful for matching files or On 2014/08/27 00:42:03, Bob Nystrom wrote: > Does this mean "a**b" will match "azzzb", "a/zzb", etc.? I know of course that > idiomatic usage of ** is surrounded by path separators, but it seems confusing > if you omit them. Yes. It's useful to use ** without surrounding separators when doing things like "**.dart", which will match all dart files recursively beneath the current directory. https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode69 pkg/glob/README.md:69: glob produces the same results as listing that glob. On 2014/08/27 00:42:04, Bob Nystrom wrote: > I don't understand this. Can you explain why this exception is needed? I explain in the text: it ensures that the list behavior is the same as the matches behavior. In other words, there shouldn't be any paths that match a glob that the glob wouldn't list. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart File pkg/glob/lib/glob.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:13: final _quoteRegExp = new RegExp(r'[*{[?\\}\],\-]'); On 2014/08/27 00:42:04, Bob Nystrom wrote: > Doc comment. Also, escape "()". Done, although I'm ambivalent about rigorously document regular expressions that would be inlined if not for Dart's const rules. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:19: /// POSIX path syntax, it can match against POSIX, Windows, or URL paths. Which On 2014/08/27 00:42:04, Bob Nystrom wrote: > This is confusing. Maybe: > > 'Although the glob pattern syntax always uses "/" as a path separator, it can > match POSIX, Windows, or URL paths. The format it expects paths to use is based > on the `context`...' I think that's more confusing than my phrasing, especially since there are differences beyond the separator for URL paths. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:40: /// Whether this glob recursively matches and lists entities. On 2014/08/27 00:42:04, Bob Nystrom wrote: > It took me a while to figure out what this meant in the context of matches (for > listing it's a little more intuitive). Maybe something like, "If true, then a > path matches if it matches the glob itself or is contained in a directory that > does." Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:48: if (_contextIsAbsoluteCache == null) { On 2014/08/27 00:42:04, Bob Nystrom wrote: > Why is this cached? I want to minimize the amount of work that has to be done on each [matches] call, especially work that involves path stuff. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:56: bool get _patternCouldBeAbsolute { On 2014/08/27 00:42:04, Bob Nystrom wrote: > "could be" reads confusingly to me. I don't think of globs as being relative or > absolute. How about _canMatchAbsolute? Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:92: Glob._(String pattern, p.Context context, bool recursive) On 2014/08/27 00:42:04, Bob Nystrom wrote: > It might be simpler to make this take the ast too and then make the above ctor a > factor ctor. I don't like making things factory constructors when I can avoid it since it makes inheritance harder. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart File pkg/glob/lib/src/ast.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart#ne... pkg/glob/lib/src/ast.dart:14: abstract class AstNode { On 2014/08/27 00:42:04, Bob Nystrom wrote: > Instead of actually parsing to an AST and then compiling that down to a RegExp > in a separate step, how about having the parser just accumulate the result as it > goes along? There's no real reason to keep the AST around after generating the > RegExp is done, is there? Checking for absolute vs relative globs uses the AST. Listing will also use it. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart#ne... pkg/glob/lib/src/ast.dart:167: final List<SequenceNode> options; On 2014/08/27 00:42:04, Bob Nystrom wrote: > Why not AstNode? Because in practice they're always SequenceNodes and making that explicit can simplify manipulation logic. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart File pkg/glob/lib/src/parser.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:38: return new SequenceNode(nodes); On 2014/08/27 00:42:04, Bob Nystrom wrote: > This allows an empty sequence intentionally, right? Document that. Eh, it should probably error for an empty sequence. Added that and a test. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:91: if (_scanner.matches('-') || _scanner.matches(']')) { On 2014/08/27 00:42:04, Bob Nystrom wrote: > '?' or '^' too? No, something like [#-?] is fine. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:101: position: _scanner.position - 3, length: 3); On 2014/08/27 00:42:04, Bob Nystrom wrote: > This doesn't take into account '\'. Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:120: while (_scanner.scan(',')) { On 2014/08/27 00:42:04, Bob Nystrom wrote: > Use a do/while here. Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:123: if (options.length == 1) _scanner.expect(','); On 2014/08/27 00:42:04, Bob Nystrom wrote: > What's this for? It disallows single-option groups. Added a comment. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:132: inOptions ? r'[^*{[?\\}\],()]*' : r'[^*{[?\\}\]()]*'); On 2014/08/27 00:42:04, Bob Nystrom wrote: > Document this. Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/parser.dart... pkg/glob/lib/src/parser.dart:139: _scanner.scan(regExp); On 2014/08/27 00:42:04, Bob Nystrom wrote: > What about multiple sequential escaped characters? [regExp] just matches zero characters here.
Sorry for the new batch of comments. I lost my train of thought yesterday and sent this out before I reviewed the whole patch. Overall, I really like this! :) https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md File pkg/glob/README.md (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode69 pkg/glob/README.md:69: glob produces the same results as listing that glob. On 2014/08/27 01:36:32, nweiz wrote: > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > I don't understand this. Can you explain why this exception is needed? > > I explain in the text: it ensures that the list behavior is the same as the > matches behavior. In other words, there shouldn't be any paths that match a glob > that the glob wouldn't list. Right, but you don't say *why* the behavior between listing and matching would be different without this exception. Thinking about it more, I get it now. If you list the files that match "**.dart", you only get stuff within the cwd that match that. When listing, there's a sort of implied "./" prefix. Will "**.dart" match "../foo.dart"? Should it? https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart File pkg/glob/lib/glob.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:13: final _quoteRegExp = new RegExp(r'[*{[?\\}\],\-]'); On 2014/08/27 01:36:32, nweiz wrote: > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > Doc comment. Also, escape "()". > > Done, although I'm ambivalent about rigorously document regular expressions that > would be inlined if not for Dart's const rules. Never in my life have I found myself thinking, "you know, I wish we had fewer docs for our regular expressions." :) https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:19: /// POSIX path syntax, it can match against POSIX, Windows, or URL paths. Which On 2014/08/27 01:36:32, nweiz wrote: > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > This is confusing. Maybe: > > > > 'Although the glob pattern syntax always uses "/" as a path separator, it can > > match POSIX, Windows, or URL paths. The format it expects paths to use is > based > > on the `context`...' > > I think that's more confusing than my phrasing, especially since there are > differences beyond the separator for URL paths. How about then: "Although the glob pattern uses POSIX syntax, it can match POSIX, Windows, or URL paths. The format it expects paths to use is based... The current text uses "path path path" and it isn't clear when you're talking about a pattern, a path style, or a path argument. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:48: if (_contextIsAbsoluteCache == null) { On 2014/08/27 01:36:32, nweiz wrote: > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > Why is this cached? > > I want to minimize the amount of work that has to be done on each [matches] > call, especially work that involves path stuff. That makes sense. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart File pkg/glob/lib/src/ast.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart#ne... pkg/glob/lib/src/ast.dart:14: abstract class AstNode { On 2014/08/27 01:36:33, nweiz wrote: > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > Instead of actually parsing to an AST and then compiling that down to a RegExp > > in a separate step, how about having the parser just accumulate the result as > it > > goes along? There's no real reason to keep the AST around after generating the > > RegExp is done, is there? > > Checking for absolute vs relative globs uses the AST. Sure, but you could bubble that up at parse time and just store that in the Glob. I don't see much use in keeping the entire tree around when its immutable and only contains a few fields. For example, each tree node calculates and caches its _regExp, but you only need the top-level one, right? I think it would be simpler (and certainly more memory efficient) to treat it more like a builder pattern and either flatten as you go while parsing or flatten once you're done and then discard the tree. > Listing will also use it. I guess I'd have to see the details here. https://codereview.chromium.org/506993004/diff/1/pkg/glob/test/glob_test.dart File pkg/glob/test/glob_test.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/test/glob_test.dart... pkg/glob/test/glob_test.dart:54: Test that it returns an empty collection if it doesn't match. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/glob_test.... File pkg/glob/test/glob_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/glob_test.... pkg/glob/test/glob_test.dart:54: Test that it returns an empty collection if it doesn't match. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... File pkg/glob/test/match_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:18: expect("foo*", contains(new Glob("foo\\*"))); Raw string here? https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:30: }); What about a totally empty string? expect("", contains(new Glob("*")); I know it's meaningless for a real path, but it would be good to nail down the edge case behavior. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:136: expect("foo/bar", isNot(contains(new Glob("foo[/]bar")))); Should it be an error to use "/" in a [] then? https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:137: expect("foo/bar", isNot(contains(new Glob("foo[\t-~]bar")))); Document that \t-~ is a range that includes "/". https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... File pkg/glob/test/parse_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... pkg/glob/test/parse_test.dart:21: expect("fo1", contains(bang)); Use something other than 1 here. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... pkg/glob/test/parse_test.dart:49: }); Test "[-]", "[a-]", "[-b]", and "[-". https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... pkg/glob/test/parse_test.dart:77: }); Test "{abc,".
Code review changes
https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md File pkg/glob/README.md (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/README.md#newcode69 pkg/glob/README.md:69: glob produces the same results as listing that glob. On 2014/08/27 22:16:44, Bob Nystrom wrote: > On 2014/08/27 01:36:32, nweiz wrote: > > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > > I don't understand this. Can you explain why this exception is needed? > > > > I explain in the text: it ensures that the list behavior is the same as the > > matches behavior. In other words, there shouldn't be any paths that match a > glob > > that the glob wouldn't list. > > Right, but you don't say *why* the behavior between listing and matching would > be different without this exception. > > Thinking about it more, I get it now. If you list the files that match > "**.dart", you only get stuff within the cwd that match that. When listing, > there's a sort of implied "./" prefix. > > Will "**.dart" match "../foo.dart"? Should it? Added some more clarification. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart File pkg/glob/lib/glob.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:13: final _quoteRegExp = new RegExp(r'[*{[?\\}\],\-]'); On 2014/08/27 22:16:44, Bob Nystrom wrote: > On 2014/08/27 01:36:32, nweiz wrote: > > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > > Doc comment. Also, escape "()". > > > > Done, although I'm ambivalent about rigorously document regular expressions > that > > would be inlined if not for Dart's const rules. > > Never in my life have I found myself thinking, "you know, I wish we had fewer > docs for our regular expressions." :) https://github.com/dart-lang/bleeding_edge/commit/098dc476b08b658bbdf71ab3fc8... Just sayin'. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:19: /// POSIX path syntax, it can match against POSIX, Windows, or URL paths. Which On 2014/08/27 22:16:44, Bob Nystrom wrote: > On 2014/08/27 01:36:32, nweiz wrote: > > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > > This is confusing. Maybe: > > > > > > 'Although the glob pattern syntax always uses "/" as a path separator, it > can > > > match POSIX, Windows, or URL paths. The format it expects paths to use is > > based > > > on the `context`...' > > > > I think that's more confusing than my phrasing, especially since there are > > differences beyond the separator for URL paths. > > How about then: "Although the glob pattern uses POSIX syntax, it can match > POSIX, Windows, or URL paths. The format it expects paths to use is based... > > The current text uses "path path path" and it isn't clear when you're talking > about a pattern, a path style, or a path argument. Done. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart File pkg/glob/lib/src/ast.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart#ne... pkg/glob/lib/src/ast.dart:14: abstract class AstNode { On 2014/08/27 22:16:44, Bob Nystrom wrote: > On 2014/08/27 01:36:33, nweiz wrote: > > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > > Instead of actually parsing to an AST and then compiling that down to a > RegExp > > > in a separate step, how about having the parser just accumulate the result > as > > it > > > goes along? There's no real reason to keep the AST around after generating > the > > > RegExp is done, is there? > > > > Checking for absolute vs relative globs uses the AST. > > Sure, but you could bubble that up at parse time and just store that in the > Glob. > > I don't see much use in keeping the entire tree around when its immutable and > only contains a few fields. For example, each tree node calculates and caches > its _regExp, but you only need the top-level one, right? No; only the top-level AST node caches its RegExp. The rest are generated using [AstNode._toRegExp], which doesn't cache. > I think it would be simpler (and certainly more memory efficient) to treat it > more like a builder pattern and either flatten as you go while parsing or > flatten once you're done and then discard the tree. Once listing exists, it will compile the AST to a completely different data structure. I don't want to do both the RegExp compilation and the listing compilation when the glob is first created, since probably only one of them will be used. I could add some logic to ditch the AST if both of the compilations *are* done, but that doesn't seem worth the complexity. > > Listing will also use it. > > I guess I'd have to see the details here. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/glob_test.... File pkg/glob/test/glob_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/glob_test.... pkg/glob/test/glob_test.dart:54: On 2014/08/27 22:16:44, Bob Nystrom wrote: > Test that it returns an empty collection if it doesn't match. Done. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... File pkg/glob/test/match_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:18: expect("foo*", contains(new Glob("foo\\*"))); On 2014/08/27 22:16:44, Bob Nystrom wrote: > Raw string here? Done. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:30: }); On 2014/08/27 22:16:44, Bob Nystrom wrote: > What about a totally empty string? > > expect("", contains(new Glob("*")); > > I know it's meaningless for a real path, but it would be good to nail down the > edge case behavior. Done. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:136: expect("foo/bar", isNot(contains(new Glob("foo[/]bar")))); On 2014/08/27 22:16:44, Bob Nystrom wrote: > Should it be an error to use "/" in a [] then? I'm not sure. On one hand, it won't ever match anything; on the other hand, I don't think it makes sense to disallow ranges that include "/" (such as "!-~"), and it seems weird to allow it there but forbid it in and of itself. What do you think? https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:137: expect("foo/bar", isNot(contains(new Glob("foo[\t-~]bar")))); On 2014/08/27 22:16:44, Bob Nystrom wrote: > Document that \t-~ is a range that includes "/". Done. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... File pkg/glob/test/parse_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... pkg/glob/test/parse_test.dart:21: expect("fo1", contains(bang)); On 2014/08/27 22:16:44, Bob Nystrom wrote: > Use something other than 1 here. Done. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... pkg/glob/test/parse_test.dart:49: }); On 2014/08/27 22:16:44, Bob Nystrom wrote: > Test "[-]", "[a-]", "[-b]", and "[-". I think the first three are actually expected to work. They treat the "-" as a literal character in RegExps and Bash globs. Added positive tests for them. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/parse_test... pkg/glob/test/parse_test.dart:77: }); On 2014/08/27 22:16:45, Bob Nystrom wrote: > Test "{abc,". Done.
LGTM! https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart File pkg/glob/lib/glob.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/glob.dart#newco... pkg/glob/lib/glob.dart:13: final _quoteRegExp = new RegExp(r'[*{[?\\}\],\-]'); On 2014/09/02 19:48:15, nweiz wrote: > On 2014/08/27 22:16:44, Bob Nystrom wrote: > > On 2014/08/27 01:36:32, nweiz wrote: > > > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > > > Doc comment. Also, escape "()". > > > > > > Done, although I'm ambivalent about rigorously document regular expressions > > that > > > would be inlined if not for Dart's const rules. > > > > Never in my life have I found myself thinking, "you know, I wish we had fewer > > docs for our regular expressions." :) > > https://github.com/dart-lang/bleeding_edge/commit/098dc476b08b658bbdf71ab3fc8... > > Just sayin'. The person reviewing that code didn't seem to mind! :) I will note that that regex is a good bit easier to read than this one, but fine. https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart File pkg/glob/lib/src/ast.dart (right): https://codereview.chromium.org/506993004/diff/1/pkg/glob/lib/src/ast.dart#ne... pkg/glob/lib/src/ast.dart:14: abstract class AstNode { On 2014/09/02 19:48:15, nweiz wrote: > On 2014/08/27 22:16:44, Bob Nystrom wrote: > > On 2014/08/27 01:36:33, nweiz wrote: > > > On 2014/08/27 00:42:04, Bob Nystrom wrote: > > > > Instead of actually parsing to an AST and then compiling that down to a > > RegExp > > > > in a separate step, how about having the parser just accumulate the result > > as > > > it > > > > goes along? There's no real reason to keep the AST around after generating > > the > > > > RegExp is done, is there? > > > > > > Checking for absolute vs relative globs uses the AST. > > > > Sure, but you could bubble that up at parse time and just store that in the > > Glob. > > > > I don't see much use in keeping the entire tree around when its immutable and > > only contains a few fields. For example, each tree node calculates and caches > > its _regExp, but you only need the top-level one, right? > > No; only the top-level AST node caches its RegExp. The rest are generated using > [AstNode._toRegExp], which doesn't cache. > > > I think it would be simpler (and certainly more memory efficient) to treat it > > more like a builder pattern and either flatten as you go while parsing or > > flatten once you're done and then discard the tree. > > Once listing exists, it will compile the AST to a completely different data > structure. I don't want to do both the RegExp compilation and the listing > compilation when the glob is first created, since probably only one of them will > be used. I could add some logic to ditch the AST if both of the compilations > *are* done, but that doesn't seem worth the complexity. SGTM. https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... File pkg/glob/test/match_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:136: expect("foo/bar", isNot(contains(new Glob("foo[/]bar")))); On 2014/09/02 19:48:15, nweiz wrote: > On 2014/08/27 22:16:44, Bob Nystrom wrote: > > Should it be an error to use "/" in a [] then? > > I'm not sure. On one hand, it won't ever match anything; on the other hand, I > don't think it makes sense to disallow ranges that include "/" (such as "!-~"), > and it seems weird to allow it there but forbid it in and of itself. What do you > think? Yeah, I think it makes sense to allow ranges that span it (even though it will be ignored). However, if a user uses "/" by itself, or as the endpoint of a range, it seems like they are trying to express that the "/" is meaningful, and it's not going to do anything useful for them. Given that, I think raising an error will be more helpful than silently doing nothing. https://codereview.chromium.org/506993004/diff/40001/pkg/glob/README.md File pkg/glob/README.md (right): https://codereview.chromium.org/506993004/diff/40001/pkg/glob/README.md#newco... pkg/glob/README.md:69: checking whether they matches a glob produces the same results as listing that "match"
https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... File pkg/glob/test/match_test.dart (right): https://codereview.chromium.org/506993004/diff/20001/pkg/glob/test/match_test... pkg/glob/test/match_test.dart:136: expect("foo/bar", isNot(contains(new Glob("foo[/]bar")))); On 2014/09/02 20:23:51, Bob Nystrom wrote: > On 2014/09/02 19:48:15, nweiz wrote: > > On 2014/08/27 22:16:44, Bob Nystrom wrote: > > > Should it be an error to use "/" in a [] then? > > > > I'm not sure. On one hand, it won't ever match anything; on the other hand, I > > don't think it makes sense to disallow ranges that include "/" (such as > "!-~"), > > and it seems weird to allow it there but forbid it in and of itself. What do > you > > think? > > Yeah, I think it makes sense to allow ranges that span it (even though it will > be ignored). > > However, if a user uses "/" by itself, or as the endpoint of a range, it seems > like they are trying to express that the "/" is meaningful, and it's not going > to do anything useful for them. Given that, I think raising an error will be > more helpful than silently doing nothing. Done. https://codereview.chromium.org/506993004/diff/40001/pkg/glob/README.md File pkg/glob/README.md (right): https://codereview.chromium.org/506993004/diff/40001/pkg/glob/README.md#newco... pkg/glob/README.md:69: checking whether they matches a glob produces the same results as listing that On 2014/09/02 20:23:51, Bob Nystrom wrote: > "match" Done.
Code review changes
Message was sent while issue was closed.
Committed patchset #4 (id:60001) manually as 39784 (presubmit successful). |