runtime/vm/regexp.cc - Issue 678193004: Copy irregexp related code from V8.

Side by Side Diff: runtime/vm/regexp.cc

Issue 678193004: Copy irregexp related code from V8. (Closed) Base URL: https://dart.googlecode.com/svn/branches/bleeding_edge/dart

Patch Set: rebase Created 6 years, 1 month ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch | Annotate | Revision Log

OLD	NEW
(Empty)
	1 // Copyright (c) 2014, the Dart project authors. Please see the AUTHORS file

	2 // for details. All rights reserved. Use of this source code is governed by a

	3 // BSD-style license that can be found in the LICENSE file.

	4

	5 #include "vm/regexp.h"

	6

	7 // SNIP

	8

	9 namespace dart {

	10

	11 // SNIP

	12

	13 // -------------------------------------------------------------------

	14 // Implementation of the Irregexp regular expression engine.

	15 //

	16 // The Irregexp regular expression engine is intended to be a complete

	17 // implementation of ECMAScript regular expressions. It generates either

	18 // bytecodes or native code.

	19

	20 // The Irregexp regexp engine is structured in three steps.

	21 // 1) The parser generates an abstract syntax tree. See ast.cc.

	22 // 2) From the AST a node network is created. The nodes are all

	23 // subclasses of RegExpNode. The nodes represent states when

	24 // executing a regular expression. Several optimizations are

	25 // performed on the node network.

	26 // 3) From the nodes we generate either byte codes or native code

	27 // that can actually execute the regular expression (perform

	28 // the search). The code generation step is described in more

	29 // detail below.

	30

	31 // Code generation.

	32 //

	33 // The nodes are divided into four main categories.

	34 // * Choice nodes

	35 // These represent places where the regular expression can

	36 // match in more than one way. For example on entry to an

	37 // alternation (foo\|bar) or a repetition (*, +, ? or {}).

	38 // * Action nodes

	39 // These represent places where some action should be

	40 // performed. Examples include recording the current position

	41 // in the input string to a register (in order to implement

	42 // captures) or other actions on register for example in order

	43 // to implement the counters needed for {} repetitions.

	44 // * Matching nodes

	45 // These attempt to match some element part of the input string.

	46 // Examples of elements include character classes, plain strings

	47 // or back references.

	48 // * End nodes

	49 // These are used to implement the actions required on finding

	50 // a successful match or failing to find a match.

	51 //

	52 // The code generated (whether as byte codes or native code) maintains

	53 // some state as it runs. This consists of the following elements:

	54 //

	55 // * The capture registers. Used for string captures.

	56 // * Other registers. Used for counters etc.

	57 // * The current position.

	58 // * The stack of backtracking information. Used when a matching node

	59 // fails to find a match and needs to try an alternative.

	60 //

	61 // Conceptual regular expression execution model:

	62 //

	63 // There is a simple conceptual model of regular expression execution

	64 // which will be presented first. The actual code generated is a more

	65 // efficient simulation of the simple conceptual model:

	66 //

	67 // * Choice nodes are implemented as follows:

	68 // For each choice except the last {

	69 // push current position

	70 // push backtrack code location

	71 // <generate code to test for choice>

	72 // backtrack code location:

	73 // pop current position

	74 // }

	75 // <generate code to test for last choice>

	76 //

	77 // * Actions nodes are generated as follows

	78 // <push affected registers on backtrack stack>

	79 // <generate code to perform action>

	80 // push backtrack code location

	81 // <generate code to test for following nodes>

	82 // backtrack code location:

	83 // <pop affected registers to restore their state>

	84 // <pop backtrack location from stack and go to it>

	85 //

	86 // * Matching nodes are generated as follows:

	87 // if input string matches at current position

	88 // update current position

	89 // <generate code to test for following nodes>

	90 // else

	91 // <pop backtrack location from stack and go to it>

	92 //

	93 // Thus it can be seen that the current position is saved and restored

	94 // by the choice nodes, whereas the registers are saved and restored by

	95 // by the action nodes that manipulate them.

	96 //

	97 // The other interesting aspect of this model is that nodes are generated

	98 // at the point where they are needed by a recursive call to Emit(). If

	99 // the node has already been code generated then the Emit() call will

	100 // generate a jump to the previously generated code instead. In order to

	101 // limit recursion it is possible for the Emit() function to put the node

	102 // on a work list for later generation and instead generate a jump. The

	103 // destination of the jump is resolved later when the code is generated.

	104 //

	105 // Actual regular expression code generation.

	106 //

	107 // Code generation is actually more complicated than the above. In order

	108 // to improve the efficiency of the generated code some optimizations are

	109 // performed

	110 //

	111 // * Choice nodes have 1-character lookahead.

	112 // A choice node looks at the following character and eliminates some of

	113 // the choices immediately based on that character. This is not yet

	114 // implemented.

	115 // * Simple greedy loops store reduced backtracking information.

	116 // A quantifier like /.*foo/m will greedily match the whole input. It will

	117 // then need to backtrack to a point where it can match "foo". The naive

	118 // implementation of this would push each character position onto the

	119 // backtracking stack, then pop them off one by one. This would use space

	120 // proportional to the length of the input string. However since the "."

	121 // can only match in one way and always has a constant length (in this case

	122 // of 1) it suffices to store the current position on the top of the stack

	123 // once. Matching now becomes merely incrementing the current position and

	124 // backtracking becomes decrementing the current position and checking the

	125 // result against the stored current position. This is faster and saves

	126 // space.

	127 // * The current state is virtualized.

	128 // This is used to defer expensive operations until it is clear that they

	129 // are needed and to generate code for a node more than once, allowing

	130 // specialized an efficient versions of the code to be created. This is

	131 // explained in the section below.

	132 //

	133 // Execution state virtualization.

	134 //

	135 // Instead of emitting code, nodes that manipulate the state can record their

	136 // manipulation in an object called the Trace. The Trace object can record a

	137 // current position offset, an optional backtrack code location on the top of

	138 // the virtualized backtrack stack and some register changes. When a node is

	139 // to be emitted it can flush the Trace or update it. Flushing the Trace

	140 // will emit code to bring the actual state into line with the virtual state.

	141 // Avoiding flushing the state can postpone some work (e.g. updates of capture

	142 // registers). Postponing work can save time when executing the regular

	143 // expression since it may be found that the work never has to be done as a

	144 // failure to match can occur. In addition it is much faster to jump to a

	145 // known backtrack code location than it is to pop an unknown backtrack

	146 // location from the stack and jump there.

	147 //

	148 // The virtual state found in the Trace affects code generation. For example

	149 // the virtual state contains the difference between the actual current

	150 // position and the virtual current position, and matching code needs to use

	151 // this offset to attempt a match in the correct location of the input

	152 // string. Therefore code generated for a non-trivial trace is specialized

	153 // to that trace. The code generator therefore has the ability to generate

	154 // code for each node several times. In order to limit the size of the

	155 // generated code there is an arbitrary limit on how many specialized sets of

	156 // code may be generated for a given node. If the limit is reached, the

	157 // trace is flushed and a generic version of the code for a node is emitted.

	158 // This is subsequently used for that node. The code emitted for non-generic

	159 // trace is not recorded in the node and so it cannot currently be reused in

	160 // the event that code generation is requested for an identical trace.

	161

	162

	163 void RegExpTree::AppendToText(RegExpText* text, Zone* zone) {

	164 UNREACHABLE();

	165 }

	166

	167

	168 void RegExpAtom::AppendToText(RegExpText* text, Zone* zone) {

	169 text->AddElement(TextElement::Atom(this), zone);

	170 }

	171

	172

	173 void RegExpCharacterClass::AppendToText(RegExpText* text, Zone* zone) {

	174 text->AddElement(TextElement::CharClass(this), zone);

	175 }

	176

	177

	178 void RegExpText::AppendToText(RegExpText* text, Zone* zone) {

	179 for (int i = 0; i < elements()->length(); i++)

	180 text->AddElement(elements()->at(i), zone);

	181 }

	182

	183

	184 TextElement TextElement::Atom(RegExpAtom* atom) {

	185 return TextElement(ATOM, atom);

	186 }

	187

	188

	189 TextElement TextElement::CharClass(RegExpCharacterClass* char_class) {

	190 return TextElement(CHAR_CLASS, char_class);

	191 }

	192

	193

	194 int TextElement::length() const {

	195 switch (text_type()) {

	196 case ATOM:

	197 return atom()->length();

	198

	199 case CHAR_CLASS:

	200 return 1;

	201 }

	202 UNREACHABLE();

	203 return 0;

	204 }

	205

	206

	207 DispatchTable* ChoiceNode::GetTable(bool ignore_case) {

	208 if (table_ == NULL) {

	209 table_ = new(zone()) DispatchTable(zone());

	210 DispatchTableConstructor cons(table_, ignore_case, zone());

	211 cons.BuildTable(this);

	212 }

	213 return table_;

	214 }

	215

	216

	217 class FrequencyCollator {

	218 public:

	219 FrequencyCollator() : total_samples_(0) {

	220 for (int i = 0; i < RegExpMacroAssembler::kTableSize; i++) {

	221 frequencies_[i] = CharacterFrequency(i);

	222 }

	223 }

	224

	225 void CountCharacter(int character) {

	226 int index = (character & RegExpMacroAssembler::kTableMask);

	227 frequencies_[index].Increment();

	228 total_samples_++;

	229 }

	230

	231 // Does not measure in percent, but rather per-128 (the table size from the

	232 // regexp macro assembler).

	233 int Frequency(int in_character) {

	234 DCHECK((in_character & RegExpMacroAssembler::kTableMask) == in_character);

	235 if (total_samples_ < 1) return 1; // Division by zero.

	236 int freq_in_per128 =

	237 (frequencies_[in_character].counter() * 128) / total_samples_;

	238 return freq_in_per128;

	239 }

	240

	241 private:

	242 class CharacterFrequency {

	243 public:

	244 CharacterFrequency() : counter_(0), character_(-1) { }

	245 explicit CharacterFrequency(int character)

	246 : counter_(0), character_(character) { }

	247

	248 void Increment() { counter_++; }

	249 int counter() { return counter_; }

	250 int character() { return character_; }

	251

	252 private:

	253 int counter_;

	254 int character_;

	255 };

	256

	257

	258 private:

	259 CharacterFrequency frequencies_[RegExpMacroAssembler::kTableSize];

	260 int total_samples_;

	261 };

	262

	263

	264 class RegExpCompiler {

	265 public:

	266 RegExpCompiler(int capture_count, bool ignore_case, bool is_one_byte,

	267 Zone* zone);

	268

	269 int AllocateRegister() {

	270 if (next_register_ >= RegExpMacroAssembler::kMaxRegister) {

	271 reg_exp_too_big_ = true;

	272 return next_register_;

	273 }

	274 return next_register_++;

	275 }

	276

	277 RegExpEngine::CompilationResult Assemble(RegExpMacroAssembler* assembler,

	278 RegExpNode* start,

	279 int capture_count,

	280 Handle<String> pattern);

	281

	282 inline void AddWork(RegExpNode* node) { work_list_->Add(node); }

	283

	284 static const int kImplementationOffset = 0;

	285 static const int kNumberOfRegistersOffset = 0;

	286 static const int kCodeOffset = 1;

	287

	288 RegExpMacroAssembler* macro_assembler() { return macro_assembler_; }

	289 EndNode* accept() { return accept_; }

	290

	291 static const int kMaxRecursion = 100;

	292 inline int recursion_depth() { return recursion_depth_; }

	293 inline void IncrementRecursionDepth() { recursion_depth_++; }

	294 inline void DecrementRecursionDepth() { recursion_depth_--; }

	295

	296 void SetRegExpTooBig() { reg_exp_too_big_ = true; }

	297

	298 inline bool ignore_case() { return ignore_case_; }

	299 inline bool one_byte() { return one_byte_; }

	300 FrequencyCollator* frequency_collator() { return &frequency_collator_; }

	301

	302 int current_expansion_factor() { return current_expansion_factor_; }

	303 void set_current_expansion_factor(int value) {

	304 current_expansion_factor_ = value;

	305 }

	306

	307 Zone* zone() const { return zone_; }

	308

	309 static const int kNoRegister = -1;

	310

	311 private:

	312 EndNode* accept_;

	313 int next_register_;

	314 List<RegExpNode> work_list_;

	315 int recursion_depth_;

	316 RegExpMacroAssembler* macro_assembler_;

	317 bool ignore_case_;

	318 bool one_byte_;

	319 bool reg_exp_too_big_;

	320 int current_expansion_factor_;

	321 FrequencyCollator frequency_collator_;

	322 Zone* zone_;

	323 };

	324

	325

	326 class RecursionCheck {

	327 public:

	328 explicit RecursionCheck(RegExpCompiler* compiler) : compiler_(compiler) {

	329 compiler->IncrementRecursionDepth();

	330 }

	331 ~RecursionCheck() { compiler_->DecrementRecursionDepth(); }

	332 private:

	333 RegExpCompiler* compiler_;

	334 };

	335

	336

	337 static RegExpEngine::CompilationResult IrregexpRegExpTooBig(Isolate* isolate) {

	338 return RegExpEngine::CompilationResult(isolate, "RegExp too big");

	339 }

	340

	341

	342 // Attempts to compile the regexp using an Irregexp code generator. Returns

	343 // a fixed array or a null handle depending on whether it succeeded.

	344 RegExpCompiler::RegExpCompiler(int capture_count, bool ignore_case,

	345 bool one_byte, Zone* zone)

	346 : next_register_(2 * (capture_count + 1)),

	347 work_list_(NULL),

	348 recursion_depth_(0),

	349 ignore_case_(ignore_case),

	350 one_byte_(one_byte),

	351 reg_exp_too_big_(false),

	352 current_expansion_factor_(1),

	353 frequency_collator_(),

	354 zone_(zone) {

	355 accept_ = new(zone) EndNode(EndNode::ACCEPT, zone);

	356 DCHECK(next_register_ - 1 <= RegExpMacroAssembler::kMaxRegister);

	357 }

	358

	359

	360 RegExpEngine::CompilationResult RegExpCompiler::Assemble(

	361 RegExpMacroAssembler* macro_assembler,

	362 RegExpNode* start,

	363 int capture_count,

	364 Handle<String> pattern) {

	365 Heap* heap = pattern->GetHeap();

	366

	367 bool use_slow_safe_regexp_compiler = false;

	368 if (heap->total_regexp_code_generated() >

	369 RegExpImpl::kRegWxpCompiledLimit &&

	370 heap->isolate()->memory_allocator()->SizeExecutable() >

	371 RegExpImpl::kRegExpExecutableMemoryLimit) {

	372 use_slow_safe_regexp_compiler = true;

	373 }

	374

	375 macro_assembler->set_slow_safe(use_slow_safe_regexp_compiler);

	376

	377 #ifdef DEBUG

	378 if (FLAG_trace_regexp_assembler)

	379 macro_assembler_ = new RegExpMacroAssemblerTracer(macro_assembler);

	380 else

	381 #endif

	382 macro_assembler_ = macro_assembler;

	383

	384 List <RegExpNode*> work_list(0);

	385 work_list_ = &work_list;

	386 Label fail;

	387 macro_assembler_->PushBacktrack(&fail);

	388 Trace new_trace;

	389 start->Emit(this, &new_trace);

	390 macro_assembler_->Bind(&fail);

	391 macro_assembler_->Fail();

	392 while (!work_list.is_empty()) {

	393 work_list.RemoveLast()->Emit(this, &new_trace);

	394 }

	395 if (reg_exp_too_big_) return IrregexpRegExpTooBig(zone_->isolate());

	396

	397 Handle<HeapObject> code = macro_assembler_->GetCode(pattern);

	398 heap->IncreaseTotalRegexpCodeGenerated(code->Size());

	399 work_list_ = NULL;

	400 #ifdef DEBUG

	401 if (FLAG_print_code) {

	402 CodeTracer::Scope trace_scope(heap->isolate()->GetCodeTracer());

	403 OFStream os(trace_scope.file());

	404 Handle<Code>::cast(code)->Disassemble(pattern->ToCString().get(), os);

	405 }

	406 if (FLAG_trace_regexp_assembler) {

	407 delete macro_assembler_;

	408 }

	409 #endif

	410 return RegExpEngine::CompilationResult(*code, next_register_);

	411 }

	412

	413

	414 bool Trace::DeferredAction::Mentions(int that) {

	415 if (action_type() == ActionNode::CLEAR_CAPTURES) {

	416 Interval range = static_cast<DeferredClearCaptures*>(this)->range();

	417 return range.Contains(that);

	418 } else {

	419 return reg() == that;

	420 }

	421 }

	422

	423

	424 bool Trace::mentions_reg(int reg) {

	425 for (DeferredAction* action = actions_;

	426 action != NULL;

	427 action = action->next()) {

	428 if (action->Mentions(reg))

	429 return true;

	430 }

	431 return false;

	432 }

	433

	434

	435 bool Trace::GetStoredPosition(int reg, int* cp_offset) {

	436 DCHECK_EQ(0, *cp_offset);

	437 for (DeferredAction* action = actions_;

	438 action != NULL;

	439 action = action->next()) {

	440 if (action->Mentions(reg)) {

	441 if (action->action_type() == ActionNode::STORE_POSITION) {

	442 cp_offset = static_cast<DeferredCapture>(action)->cp_offset();

	443 return true;

	444 } else {

	445 return false;

	446 }

	447 }

	448 }

	449 return false;

	450 }

	451

	452

	453 int Trace::FindAffectedRegisters(OutSet* affected_registers,

	454 Zone* zone) {

	455 int max_register = RegExpCompiler::kNoRegister;

	456 for (DeferredAction* action = actions_;

	457 action != NULL;

	458 action = action->next()) {

	459 if (action->action_type() == ActionNode::CLEAR_CAPTURES) {

	460 Interval range = static_cast<DeferredClearCaptures*>(action)->range();

	461 for (int i = range.from(); i <= range.to(); i++)

	462 affected_registers->Set(i, zone);

	463 if (range.to() > max_register) max_register = range.to();

	464 } else {

	465 affected_registers->Set(action->reg(), zone);

	466 if (action->reg() > max_register) max_register = action->reg();

	467 }

	468 }

	469 return max_register;

	470 }

	471

	472

	473 void Trace::RestoreAffectedRegisters(RegExpMacroAssembler* assembler,

	474 int max_register,

	475 const OutSet& registers_to_pop,

	476 const OutSet& registers_to_clear) {

	477 for (int reg = max_register; reg >= 0; reg--) {

	478 if (registers_to_pop.Get(reg)) {

	479 assembler->PopRegister(reg);

	480 } else if (registers_to_clear.Get(reg)) {

	481 int clear_to = reg;

	482 while (reg > 0 && registers_to_clear.Get(reg - 1)) {

	483 reg--;

	484 }

	485 assembler->ClearRegisters(reg, clear_to);

	486 }

	487 }

	488 }

	489

	490

	491 void Trace::PerformDeferredActions(RegExpMacroAssembler* assembler,

	492 int max_register,

	493 const OutSet& affected_registers,

	494 OutSet* registers_to_pop,

	495 OutSet* registers_to_clear,

	496 Zone* zone) {

	497 // The "+1" is to avoid a push_limit of zero if stack_limit_slack() is 1.

	498 const int push_limit = (assembler->stack_limit_slack() + 1) / 2;

	499

	500 // Count pushes performed to force a stack limit check occasionally.

	501 int pushes = 0;

	502

	503 for (int reg = 0; reg <= max_register; reg++) {

	504 if (!affected_registers.Get(reg)) {

	505 continue;

	506 }

	507

	508 // The chronologically first deferred action in the trace

	509 // is used to infer the action needed to restore a register

	510 // to its previous state (or not, if it's safe to ignore it).

	511 enum DeferredActionUndoType { IGNORE, RESTORE, CLEAR };

	512 DeferredActionUndoType undo_action = IGNORE;

	513

	514 int value = 0;

	515 bool absolute = false;

	516 bool clear = false;

	517 int store_position = -1;

	518 // This is a little tricky because we are scanning the actions in reverse

	519 // historical order (newest first).

	520 for (DeferredAction* action = actions_;

	521 action != NULL;

	522 action = action->next()) {

	523 if (action->Mentions(reg)) {

	524 switch (action->action_type()) {

	525 case ActionNode::SET_REGISTER: {

	526 Trace::DeferredSetRegister* psr =

	527 static_cast<Trace::DeferredSetRegister*>(action);

	528 if (!absolute) {

	529 value += psr->value();

	530 absolute = true;

	531 }

	532 // SET_REGISTER is currently only used for newly introduced loop

	533 // counters. They can have a significant previous value if they

	534 // occour in a loop. TODO(lrn): Propagate this information, so

	535 // we can set undo_action to IGNORE if we know there is no value to

	536 // restore.

	537 undo_action = RESTORE;

	538 DCHECK_EQ(store_position, -1);

	539 DCHECK(!clear);

	540 break;

	541 }

	542 case ActionNode::INCREMENT_REGISTER:

	543 if (!absolute) {

	544 value++;

	545 }

	546 DCHECK_EQ(store_position, -1);

	547 DCHECK(!clear);

	548 undo_action = RESTORE;

	549 break;

	550 case ActionNode::STORE_POSITION: {

	551 Trace::DeferredCapture* pc =

	552 static_cast<Trace::DeferredCapture*>(action);

	553 if (!clear && store_position == -1) {

	554 store_position = pc->cp_offset();

	555 }

	556

	557 // For captures we know that stores and clears alternate.

	558 // Other register, are never cleared, and if the occur

	559 // inside a loop, they might be assigned more than once.

	560 if (reg <= 1) {

	561 // Registers zero and one, aka "capture zero", is

	562 // always set correctly if we succeed. There is no

	563 // need to undo a setting on backtrack, because we

	564 // will set it again or fail.

	565 undo_action = IGNORE;

	566 } else {

	567 undo_action = pc->is_capture() ? CLEAR : RESTORE;

	568 }

	569 DCHECK(!absolute);

	570 DCHECK_EQ(value, 0);

	571 break;

	572 }

	573 case ActionNode::CLEAR_CAPTURES: {

	574 // Since we're scanning in reverse order, if we've already

	575 // set the position we have to ignore historically earlier

	576 // clearing operations.

	577 if (store_position == -1) {

	578 clear = true;

	579 }

	580 undo_action = RESTORE;

	581 DCHECK(!absolute);

	582 DCHECK_EQ(value, 0);

	583 break;

	584 }

	585 default:

	586 UNREACHABLE();

	587 break;

	588 }

	589 }

	590 }

	591 // Prepare for the undo-action (e.g., push if it's going to be popped).

	592 if (undo_action == RESTORE) {

	593 pushes++;

	594 RegExpMacroAssembler::StackCheckFlag stack_check =

	595 RegExpMacroAssembler::kNoStackLimitCheck;

	596 if (pushes == push_limit) {

	597 stack_check = RegExpMacroAssembler::kCheckStackLimit;

	598 pushes = 0;

	599 }

	600

	601 assembler->PushRegister(reg, stack_check);

	602 registers_to_pop->Set(reg, zone);

	603 } else if (undo_action == CLEAR) {

	604 registers_to_clear->Set(reg, zone);

	605 }

	606 // Perform the chronologically last action (or accumulated increment)

	607 // for the register.

	608 if (store_position != -1) {

	609 assembler->WriteCurrentPositionToRegister(reg, store_position);

	610 } else if (clear) {

	611 assembler->ClearRegisters(reg, reg);

	612 } else if (absolute) {

	613 assembler->SetRegister(reg, value);

	614 } else if (value != 0) {

	615 assembler->AdvanceRegister(reg, value);

	616 }

	617 }

	618 }

	619

	620

	621 // This is called as we come into a loop choice node and some other tricky

	622 // nodes. It normalizes the state of the code generator to ensure we can

	623 // generate generic code.

	624 void Trace::Flush(RegExpCompiler* compiler, RegExpNode* successor) {

	625 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	626

	627 DCHECK(!is_trivial());

	628

	629 if (actions_ == NULL && backtrack() == NULL) {

	630 // Here we just have some deferred cp advances to fix and we are back to

	631 // a normal situation. We may also have to forget some information gained

	632 // through a quick check that was already performed.

	633 if (cp_offset_ != 0) assembler->AdvanceCurrentPosition(cp_offset_);

	634 // Create a new trivial state and generate the node with that.

	635 Trace new_state;

	636 successor->Emit(compiler, &new_state);

	637 return;

	638 }

	639

	640 // Generate deferred actions here along with code to undo them again.

	641 OutSet affected_registers;

	642

	643 if (backtrack() != NULL) {

	644 // Here we have a concrete backtrack location. These are set up by choice

	645 // nodes and so they indicate that we have a deferred save of the current

	646 // position which we may need to emit here.

	647 assembler->PushCurrentPosition();

	648 }

	649

	650 int max_register = FindAffectedRegisters(&affected_registers,

	651 compiler->zone());

	652 OutSet registers_to_pop;

	653 OutSet registers_to_clear;

	654 PerformDeferredActions(assembler,

	655 max_register,

	656 affected_registers,

	657 &registers_to_pop,

	658 &registers_to_clear,

	659 compiler->zone());

	660 if (cp_offset_ != 0) {

	661 assembler->AdvanceCurrentPosition(cp_offset_);

	662 }

	663

	664 // Create a new trivial state and generate the node with that.

	665 Label undo;

	666 assembler->PushBacktrack(&undo);

	667 Trace new_state;

	668 successor->Emit(compiler, &new_state);

	669

	670 // On backtrack we need to restore state.

	671 assembler->Bind(&undo);

	672 RestoreAffectedRegisters(assembler,

	673 max_register,

	674 registers_to_pop,

	675 registers_to_clear);

	676 if (backtrack() == NULL) {

	677 assembler->Backtrack();

	678 } else {

	679 assembler->PopCurrentPosition();

	680 assembler->GoTo(backtrack());

	681 }

	682 }

	683

	684

	685 void NegativeSubmatchSuccess::Emit(RegExpCompiler* compiler, Trace* trace) {

	686 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	687

	688 // Omit flushing the trace. We discard the entire stack frame anyway.

	689

	690 if (!label()->is_bound()) {

	691 // We are completely independent of the trace, since we ignore it,

	692 // so this code can be used as the generic version.

	693 assembler->Bind(label());

	694 }

	695

	696 // Throw away everything on the backtrack stack since the start

	697 // of the negative submatch and restore the character position.

	698 assembler->ReadCurrentPositionFromRegister(current_position_register_);

	699 assembler->ReadStackPointerFromRegister(stack_pointer_register_);

	700 if (clear_capture_count_ > 0) {

	701 // Clear any captures that might have been performed during the success

	702 // of the body of the negative look-ahead.

	703 int clear_capture_end = clear_capture_start_ + clear_capture_count_ - 1;

	704 assembler->ClearRegisters(clear_capture_start_, clear_capture_end);

	705 }

	706 // Now that we have unwound the stack we find at the top of the stack the

	707 // backtrack that the BeginSubmatch node got.

	708 assembler->Backtrack();

	709 }

	710

	711

	712 void EndNode::Emit(RegExpCompiler* compiler, Trace* trace) {

	713 if (!trace->is_trivial()) {

	714 trace->Flush(compiler, this);

	715 return;

	716 }

	717 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	718 if (!label()->is_bound()) {

	719 assembler->Bind(label());

	720 }

	721 switch (action_) {

	722 case ACCEPT:

	723 assembler->Succeed();

	724 return;

	725 case BACKTRACK:

	726 assembler->GoTo(trace->backtrack());

	727 return;

	728 case NEGATIVE_SUBMATCH_SUCCESS:

	729 // This case is handled in a different virtual method.

	730 UNREACHABLE();

	731 }

	732 UNIMPLEMENTED();

	733 }

	734

	735

	736 void GuardedAlternative::AddGuard(Guard* guard, Zone* zone) {

	737 if (guards_ == NULL)

	738 guards_ = new(zone) ZoneList<Guard*>(1, zone);

	739 guards_->Add(guard, zone);

	740 }

	741

	742

	743 ActionNode* ActionNode::SetRegister(int reg,

	744 int val,

	745 RegExpNode* on_success) {

	746 ActionNode* result =

	747 new(on_success->zone()) ActionNode(SET_REGISTER, on_success);

	748 result->data_.u_store_register.reg = reg;

	749 result->data_.u_store_register.value = val;

	750 return result;

	751 }

	752

	753

	754 ActionNode* ActionNode::IncrementRegister(int reg, RegExpNode* on_success) {

	755 ActionNode* result =

	756 new(on_success->zone()) ActionNode(INCREMENT_REGISTER, on_success);

	757 result->data_.u_increment_register.reg = reg;

	758 return result;

	759 }

	760

	761

	762 ActionNode* ActionNode::StorePosition(int reg,

	763 bool is_capture,

	764 RegExpNode* on_success) {

	765 ActionNode* result =

	766 new(on_success->zone()) ActionNode(STORE_POSITION, on_success);

	767 result->data_.u_position_register.reg = reg;

	768 result->data_.u_position_register.is_capture = is_capture;

	769 return result;

	770 }

	771

	772

	773 ActionNode* ActionNode::ClearCaptures(Interval range,

	774 RegExpNode* on_success) {

	775 ActionNode* result =

	776 new(on_success->zone()) ActionNode(CLEAR_CAPTURES, on_success);

	777 result->data_.u_clear_captures.range_from = range.from();

	778 result->data_.u_clear_captures.range_to = range.to();

	779 return result;

	780 }

	781

	782

	783 ActionNode* ActionNode::BeginSubmatch(int stack_reg,

	784 int position_reg,

	785 RegExpNode* on_success) {

	786 ActionNode* result =

	787 new(on_success->zone()) ActionNode(BEGIN_SUBMATCH, on_success);

	788 result->data_.u_submatch.stack_pointer_register = stack_reg;

	789 result->data_.u_submatch.current_position_register = position_reg;

	790 return result;

	791 }

	792

	793

	794 ActionNode* ActionNode::PositiveSubmatchSuccess(int stack_reg,

	795 int position_reg,

	796 int clear_register_count,

	797 int clear_register_from,

	798 RegExpNode* on_success) {

	799 ActionNode* result =

	800 new(on_success->zone()) ActionNode(POSITIVE_SUBMATCH_SUCCESS, on_success);

	801 result->data_.u_submatch.stack_pointer_register = stack_reg;

	802 result->data_.u_submatch.current_position_register = position_reg;

	803 result->data_.u_submatch.clear_register_count = clear_register_count;

	804 result->data_.u_submatch.clear_register_from = clear_register_from;

	805 return result;

	806 }

	807

	808

	809 ActionNode* ActionNode::EmptyMatchCheck(int start_register,

	810 int repetition_register,

	811 int repetition_limit,

	812 RegExpNode* on_success) {

	813 ActionNode* result =

	814 new(on_success->zone()) ActionNode(EMPTY_MATCH_CHECK, on_success);

	815 result->data_.u_empty_match_check.start_register = start_register;

	816 result->data_.u_empty_match_check.repetition_register = repetition_register;

	817 result->data_.u_empty_match_check.repetition_limit = repetition_limit;

	818 return result;

	819 }

	820

	821

	822 #define DEFINE_ACCEPT(Type) \

	823 void Type##Node::Accept(NodeVisitor* visitor) { \

	824 visitor->Visit##Type(this); \

	825 }

	826 FOR_EACH_NODE_TYPE(DEFINE_ACCEPT)

	827 #undef DEFINE_ACCEPT

	828

	829

	830 void LoopChoiceNode::Accept(NodeVisitor* visitor) {

	831 visitor->VisitLoopChoice(this);

	832 }

	833

	834

	835 // -------------------------------------------------------------------

	836 // Emit code.

	837

	838

	839 void ChoiceNode::GenerateGuard(RegExpMacroAssembler* macro_assembler,

	840 Guard* guard,

	841 Trace* trace) {

	842 switch (guard->op()) {

	843 case Guard::LT:

	844 DCHECK(!trace->mentions_reg(guard->reg()));

	845 macro_assembler->IfRegisterGE(guard->reg(),

	846 guard->value(),

	847 trace->backtrack());

	848 break;

	849 case Guard::GEQ:

	850 DCHECK(!trace->mentions_reg(guard->reg()));

	851 macro_assembler->IfRegisterLT(guard->reg(),

	852 guard->value(),

	853 trace->backtrack());

	854 break;

	855 }

	856 }

	857

	858

	859 // Returns the number of characters in the equivalence class, omitting those

	860 // that cannot occur in the source string because it is ASCII.

	861 static int GetCaseIndependentLetters(Isolate* isolate, uc16 character,

	862 bool one_byte_subject,

	863 unibrow::uchar* letters) {

	864 int length =

	865 isolate->jsregexp_uncanonicalize()->get(character, '\0', letters);

	866 // Unibrow returns 0 or 1 for characters where case independence is

	867 // trivial.

	868 if (length == 0) {

	869 letters[0] = character;

	870 length = 1;

	871 }

	872 if (!one_byte_subject \|\| character <= String::kMaxOneByteCharCode) {

	873 return length;

	874 }

	875

	876 // The standard requires that non-ASCII characters cannot have ASCII

	877 // character codes in their equivalence class.

	878 // TODO(dcarney): issue 3550 this is not actually true for Latin1 anymore,

	879 // is it? For example, \u00C5 is equivalent to \u212B.

	880 return 0;

	881 }

	882

	883

	884 static inline bool EmitSimpleCharacter(Isolate* isolate,

	885 RegExpCompiler* compiler,

	886 uc16 c,

	887 Label* on_failure,

	888 int cp_offset,

	889 bool check,

	890 bool preloaded) {

	891 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	892 bool bound_checked = false;

	893 if (!preloaded) {

	894 assembler->LoadCurrentCharacter(

	895 cp_offset,

	896 on_failure,

	897 check);

	898 bound_checked = true;

	899 }

	900 assembler->CheckNotCharacter(c, on_failure);

	901 return bound_checked;

	902 }

	903

	904

	905 // Only emits non-letters (things that don't have case). Only used for case

	906 // independent matches.

	907 static inline bool EmitAtomNonLetter(Isolate* isolate,

	908 RegExpCompiler* compiler,

	909 uc16 c,

	910 Label* on_failure,

	911 int cp_offset,

	912 bool check,

	913 bool preloaded) {

	914 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	915 bool one_byte = compiler->one_byte();

	916 unibrow::uchar chars[unibrow::Ecma262UnCanonicalize::kMaxWidth];

	917 int length = GetCaseIndependentLetters(isolate, c, one_byte, chars);

	918 if (length < 1) {

	919 // This can't match. Must be an one-byte subject and a non-one-byte

	920 // character. We do not need to do anything since the one-byte pass

	921 // already handled this.

	922 return false; // Bounds not checked.

	923 }

	924 bool checked = false;

	925 // We handle the length > 1 case in a later pass.

	926 if (length == 1) {

	927 if (one_byte && c > String::kMaxOneByteCharCodeU) {

	928 // Can't match - see above.

	929 return false; // Bounds not checked.

	930 }

	931 if (!preloaded) {

	932 macro_assembler->LoadCurrentCharacter(cp_offset, on_failure, check);

	933 checked = check;

	934 }

	935 macro_assembler->CheckNotCharacter(c, on_failure);

	936 }

	937 return checked;

	938 }

	939

	940

	941 static bool ShortCutEmitCharacterPair(RegExpMacroAssembler* macro_assembler,

	942 bool one_byte, uc16 c1, uc16 c2,

	943 Label* on_failure) {

	944 uc16 char_mask;

	945 if (one_byte) {

	946 char_mask = String::kMaxOneByteCharCode;

	947 } else {

	948 char_mask = String::kMaxUtf16CodeUnit;

	949 }

	950 uc16 exor = c1 ^ c2;

	951 // Check whether exor has only one bit set.

	952 if (((exor - 1) & exor) == 0) {

	953 // If c1 and c2 differ only by one bit.

	954 // Ecma262UnCanonicalize always gives the highest number last.

	955 DCHECK(c2 > c1);

	956 uc16 mask = char_mask ^ exor;

	957 macro_assembler->CheckNotCharacterAfterAnd(c1, mask, on_failure);

	958 return true;

	959 }

	960 DCHECK(c2 > c1);

	961 uc16 diff = c2 - c1;

	962 if (((diff - 1) & diff) == 0 && c1 >= diff) {

	963 // If the characters differ by 2^n but don't differ by one bit then

	964 // subtract the difference from the found character, then do the or

	965 // trick. We avoid the theoretical case where negative numbers are

	966 // involved in order to simplify code generation.

	967 uc16 mask = char_mask ^ diff;

	968 macro_assembler->CheckNotCharacterAfterMinusAnd(c1 - diff,

	969 diff,

	970 mask,

	971 on_failure);

	972 return true;

	973 }

	974 return false;

	975 }

	976

	977

	978 typedef bool EmitCharacterFunction(Isolate* isolate,

	979 RegExpCompiler* compiler,

	980 uc16 c,

	981 Label* on_failure,

	982 int cp_offset,

	983 bool check,

	984 bool preloaded);

	985

	986 // Only emits letters (things that have case). Only used for case independent

	987 // matches.

	988 static inline bool EmitAtomLetter(Isolate* isolate,

	989 RegExpCompiler* compiler,

	990 uc16 c,

	991 Label* on_failure,

	992 int cp_offset,

	993 bool check,

	994 bool preloaded) {

	995 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	996 bool one_byte = compiler->one_byte();

	997 unibrow::uchar chars[unibrow::Ecma262UnCanonicalize::kMaxWidth];

	998 int length = GetCaseIndependentLetters(isolate, c, one_byte, chars);

	999 if (length <= 1) return false;

	1000 // We may not need to check against the end of the input string

	1001 // if this character lies before a character that matched.

	1002 if (!preloaded) {

	1003 macro_assembler->LoadCurrentCharacter(cp_offset, on_failure, check);

	1004 }

	1005 Label ok;

	1006 DCHECK(unibrow::Ecma262UnCanonicalize::kMaxWidth == 4);

	1007 switch (length) {

	1008 case 2: {

	1009 if (ShortCutEmitCharacterPair(macro_assembler, one_byte, chars[0],

	1010 chars[1], on_failure)) {

	1011 } else {

	1012 macro_assembler->CheckCharacter(chars[0], &ok);

	1013 macro_assembler->CheckNotCharacter(chars[1], on_failure);

	1014 macro_assembler->Bind(&ok);

	1015 }

	1016 break;

	1017 }

	1018 case 4:

	1019 macro_assembler->CheckCharacter(chars[3], &ok);

	1020 // Fall through!

	1021 case 3:

	1022 macro_assembler->CheckCharacter(chars[0], &ok);

	1023 macro_assembler->CheckCharacter(chars[1], &ok);

	1024 macro_assembler->CheckNotCharacter(chars[2], on_failure);

	1025 macro_assembler->Bind(&ok);

	1026 break;

	1027 default:

	1028 UNREACHABLE();

	1029 break;

	1030 }

	1031 return true;

	1032 }

	1033

	1034

	1035 static void EmitBoundaryTest(RegExpMacroAssembler* masm,

	1036 int border,

	1037 Label* fall_through,

	1038 Label* above_or_equal,

	1039 Label* below) {

	1040 if (below != fall_through) {

	1041 masm->CheckCharacterLT(border, below);

	1042 if (above_or_equal != fall_through) masm->GoTo(above_or_equal);

	1043 } else {

	1044 masm->CheckCharacterGT(border - 1, above_or_equal);

	1045 }

	1046 }

	1047

	1048

	1049 static void EmitDoubleBoundaryTest(RegExpMacroAssembler* masm,

	1050 int first,

	1051 int last,

	1052 Label* fall_through,

	1053 Label* in_range,

	1054 Label* out_of_range) {

	1055 if (in_range == fall_through) {

	1056 if (first == last) {

	1057 masm->CheckNotCharacter(first, out_of_range);

	1058 } else {

	1059 masm->CheckCharacterNotInRange(first, last, out_of_range);

	1060 }

	1061 } else {

	1062 if (first == last) {

	1063 masm->CheckCharacter(first, in_range);

	1064 } else {

	1065 masm->CheckCharacterInRange(first, last, in_range);

	1066 }

	1067 if (out_of_range != fall_through) masm->GoTo(out_of_range);

	1068 }

	1069 }

	1070

	1071

	1072 // even_label is for ranges[i] to ranges[i + 1] where i - start_index is even.

	1073 // odd_label is for ranges[i] to ranges[i + 1] where i - start_index is odd.

	1074 static void EmitUseLookupTable(

	1075 RegExpMacroAssembler* masm,

	1076 ZoneList<int>* ranges,

	1077 int start_index,

	1078 int end_index,

	1079 int min_char,

	1080 Label* fall_through,

	1081 Label* even_label,

	1082 Label* odd_label) {

	1083 static const int kSize = RegExpMacroAssembler::kTableSize;

	1084 static const int kMask = RegExpMacroAssembler::kTableMask;

	1085

	1086 int base = (min_char & ~kMask);

	1087 USE(base);

	1088

	1089 // Assert that everything is on one kTableSize page.

	1090 for (int i = start_index; i <= end_index; i++) {

	1091 DCHECK_EQ(ranges->at(i) & ~kMask, base);

	1092 }

	1093 DCHECK(start_index == 0 \|\| (ranges->at(start_index - 1) & ~kMask) <= base);

	1094

	1095 char templ[kSize];

	1096 Label* on_bit_set;

	1097 Label* on_bit_clear;

	1098 int bit;

	1099 if (even_label == fall_through) {

	1100 on_bit_set = odd_label;

	1101 on_bit_clear = even_label;

	1102 bit = 1;

	1103 } else {

	1104 on_bit_set = even_label;

	1105 on_bit_clear = odd_label;

	1106 bit = 0;

	1107 }

	1108 for (int i = 0; i < (ranges->at(start_index) & kMask) && i < kSize; i++) {

	1109 templ[i] = bit;

	1110 }

	1111 int j = 0;

	1112 bit ^= 1;

	1113 for (int i = start_index; i < end_index; i++) {

	1114 for (j = (ranges->at(i) & kMask); j < (ranges->at(i + 1) & kMask); j++) {

	1115 templ[j] = bit;

	1116 }

	1117 bit ^= 1;

	1118 }

	1119 for (int i = j; i < kSize; i++) {

	1120 templ[i] = bit;

	1121 }

	1122 Factory* factory = masm->zone()->isolate()->factory();

	1123 // TODO(erikcorry): Cache these.

	1124 Handle<ByteArray> ba = factory->NewByteArray(kSize, TENURED);

	1125 for (int i = 0; i < kSize; i++) {

	1126 ba->set(i, templ[i]);

	1127 }

	1128 masm->CheckBitInTable(ba, on_bit_set);

	1129 if (on_bit_clear != fall_through) masm->GoTo(on_bit_clear);

	1130 }

	1131

	1132

	1133 static void CutOutRange(RegExpMacroAssembler* masm,

	1134 ZoneList<int>* ranges,

	1135 int start_index,

	1136 int end_index,

	1137 int cut_index,

	1138 Label* even_label,

	1139 Label* odd_label) {

	1140 bool odd = (((cut_index - start_index) & 1) == 1);

	1141 Label* in_range_label = odd ? odd_label : even_label;

	1142 Label dummy;

	1143 EmitDoubleBoundaryTest(masm,

	1144 ranges->at(cut_index),

	1145 ranges->at(cut_index + 1) - 1,

	1146 &dummy,

	1147 in_range_label,

	1148 &dummy);

	1149 DCHECK(!dummy.is_linked());

	1150 // Cut out the single range by rewriting the array. This creates a new

	1151 // range that is a merger of the two ranges on either side of the one we

	1152 // are cutting out. The oddity of the labels is preserved.

	1153 for (int j = cut_index; j > start_index; j--) {

	1154 ranges->at(j) = ranges->at(j - 1);

	1155 }

	1156 for (int j = cut_index + 1; j < end_index; j++) {

	1157 ranges->at(j) = ranges->at(j + 1);

	1158 }

	1159 }

	1160

	1161

	1162 // Unicode case. Split the search space into kSize spaces that are handled

	1163 // with recursion.

	1164 static void SplitSearchSpace(ZoneList<int>* ranges,

	1165 int start_index,

	1166 int end_index,

	1167 int* new_start_index,

	1168 int* new_end_index,

	1169 int* border) {

	1170 static const int kSize = RegExpMacroAssembler::kTableSize;

	1171 static const int kMask = RegExpMacroAssembler::kTableMask;

	1172

	1173 int first = ranges->at(start_index);

	1174 int last = ranges->at(end_index) - 1;

	1175

	1176 *new_start_index = start_index;

	1177 *border = (ranges->at(start_index) & ~kMask) + kSize;

	1178 while (*new_start_index < end_index) {

	1179 if (ranges->at(new_start_index) > border) break;

	1180 (*new_start_index)++;

	1181 }

	1182 // new_start_index is the index of the first edge that is beyond the

	1183 // current kSize space.

	1184

	1185 // For very large search spaces we do a binary chop search of the non-Latin1

	1186 // space instead of just going to the end of the current kSize space. The

	1187 // heuristics are complicated a little by the fact that any 128-character

	1188 // encoding space can be quickly tested with a table lookup, so we don't

	1189 // wish to do binary chop search at a smaller granularity than that. A

	1190 // 128-character space can take up a lot of space in the ranges array if,

	1191 // for example, we only want to match every second character (eg. the lower

	1192 // case characters on some Unicode pages).

	1193 int binary_chop_index = (end_index + start_index) / 2;

	1194 // The first test ensures that we get to the code that handles the Latin1

	1195 // range with a single not-taken branch, speeding up this important

	1196 // character range (even non-Latin1 charset-based text has spaces and

	1197 // punctuation).

	1198 if (*border - 1 > String::kMaxOneByteCharCode && // Latin1 case.

	1199 end_index - start_index > (new_start_index - start_index) 2 &&

	1200 last - first > kSize * 2 && binary_chop_index > *new_start_index &&

	1201 ranges->at(binary_chop_index) >= first + 2 * kSize) {

	1202 int scan_forward_for_section_border = binary_chop_index;;

	1203 int new_border = (ranges->at(binary_chop_index) \| kMask) + 1;

	1204

	1205 while (scan_forward_for_section_border < end_index) {

	1206 if (ranges->at(scan_forward_for_section_border) > new_border) {

	1207 *new_start_index = scan_forward_for_section_border;

	1208 *border = new_border;

	1209 break;

	1210 }

	1211 scan_forward_for_section_border++;

	1212 }

	1213 }

	1214

	1215 DCHECK(*new_start_index > start_index);

	1216 new_end_index = new_start_index - 1;

	1217 if (ranges->at(new_end_index) == border) {

	1218 (*new_end_index)--;

	1219 }

	1220 if (*border >= ranges->at(end_index)) {

	1221 *border = ranges->at(end_index);

	1222 *new_start_index = end_index; // Won't be used.

	1223 *new_end_index = end_index - 1;

	1224 }

	1225 }

	1226

	1227

	1228 // Gets a series of segment boundaries representing a character class. If the

	1229 // character is in the range between an even and an odd boundary (counting from

	1230 // start_index) then go to even_label, otherwise go to odd_label. We already

	1231 // know that the character is in the range of min_char to max_char inclusive.

	1232 // Either label can be NULL indicating backtracking. Either label can also be

	1233 // equal to the fall_through label.

	1234 static void GenerateBranches(RegExpMacroAssembler* masm,

	1235 ZoneList<int>* ranges,

	1236 int start_index,

	1237 int end_index,

	1238 uc16 min_char,

	1239 uc16 max_char,

	1240 Label* fall_through,

	1241 Label* even_label,

	1242 Label* odd_label) {

	1243 int first = ranges->at(start_index);

	1244 int last = ranges->at(end_index) - 1;

	1245

	1246 DCHECK_LT(min_char, first);

	1247

	1248 // Just need to test if the character is before or on-or-after

	1249 // a particular character.

	1250 if (start_index == end_index) {

	1251 EmitBoundaryTest(masm, first, fall_through, even_label, odd_label);

	1252 return;

	1253 }

	1254

	1255 // Another almost trivial case: There is one interval in the middle that is

	1256 // different from the end intervals.

	1257 if (start_index + 1 == end_index) {

	1258 EmitDoubleBoundaryTest(

	1259 masm, first, last, fall_through, even_label, odd_label);

	1260 return;

	1261 }

	1262

	1263 // It's not worth using table lookup if there are very few intervals in the

	1264 // character class.

	1265 if (end_index - start_index <= 6) {

	1266 // It is faster to test for individual characters, so we look for those

	1267 // first, then try arbitrary ranges in the second round.

	1268 static int kNoCutIndex = -1;

	1269 int cut = kNoCutIndex;

	1270 for (int i = start_index; i < end_index; i++) {

	1271 if (ranges->at(i) == ranges->at(i + 1) - 1) {

	1272 cut = i;

	1273 break;

	1274 }

	1275 }

	1276 if (cut == kNoCutIndex) cut = start_index;

	1277 CutOutRange(

	1278 masm, ranges, start_index, end_index, cut, even_label, odd_label);

	1279 DCHECK_GE(end_index - start_index, 2);

	1280 GenerateBranches(masm,

	1281 ranges,

	1282 start_index + 1,

	1283 end_index - 1,

	1284 min_char,

	1285 max_char,

	1286 fall_through,

	1287 even_label,

	1288 odd_label);

	1289 return;

	1290 }

	1291

	1292 // If there are a lot of intervals in the regexp, then we will use tables to

	1293 // determine whether the character is inside or outside the character class.

	1294 static const int kBits = RegExpMacroAssembler::kTableSizeBits;

	1295

	1296 if ((max_char >> kBits) == (min_char >> kBits)) {

	1297 EmitUseLookupTable(masm,

	1298 ranges,

	1299 start_index,

	1300 end_index,

	1301 min_char,

	1302 fall_through,

	1303 even_label,

	1304 odd_label);

	1305 return;

	1306 }

	1307

	1308 if ((min_char >> kBits) != (first >> kBits)) {

	1309 masm->CheckCharacterLT(first, odd_label);

	1310 GenerateBranches(masm,

	1311 ranges,

	1312 start_index + 1,

	1313 end_index,

	1314 first,

	1315 max_char,

	1316 fall_through,

	1317 odd_label,

	1318 even_label);

	1319 return;

	1320 }

	1321

	1322 int new_start_index = 0;

	1323 int new_end_index = 0;

	1324 int border = 0;

	1325

	1326 SplitSearchSpace(ranges,

	1327 start_index,

	1328 end_index,

	1329 &new_start_index,

	1330 &new_end_index,

	1331 &border);

	1332

	1333 Label handle_rest;

	1334 Label* above = &handle_rest;

	1335 if (border == last + 1) {

	1336 // We didn't find any section that started after the limit, so everything

	1337 // above the border is one of the terminal labels.

	1338 above = (end_index & 1) != (start_index & 1) ? odd_label : even_label;

	1339 DCHECK(new_end_index == end_index - 1);

	1340 }

	1341

	1342 DCHECK_LE(start_index, new_end_index);

	1343 DCHECK_LE(new_start_index, end_index);

	1344 DCHECK_LT(start_index, new_start_index);

	1345 DCHECK_LT(new_end_index, end_index);

	1346 DCHECK(new_end_index + 1 == new_start_index \|\|

	1347 (new_end_index + 2 == new_start_index &&

	1348 border == ranges->at(new_end_index + 1)));

	1349 DCHECK_LT(min_char, border - 1);

	1350 DCHECK_LT(border, max_char);

	1351 DCHECK_LT(ranges->at(new_end_index), border);

	1352 DCHECK(border < ranges->at(new_start_index) \|\|

	1353 (border == ranges->at(new_start_index) &&

	1354 new_start_index == end_index &&

	1355 new_end_index == end_index - 1 &&

	1356 border == last + 1));

	1357 DCHECK(new_start_index == 0 \|\| border >= ranges->at(new_start_index - 1));

	1358

	1359 masm->CheckCharacterGT(border - 1, above);

	1360 Label dummy;

	1361 GenerateBranches(masm,

	1362 ranges,

	1363 start_index,

	1364 new_end_index,

	1365 min_char,

	1366 border - 1,

	1367 &dummy,

	1368 even_label,

	1369 odd_label);

	1370 if (handle_rest.is_linked()) {

	1371 masm->Bind(&handle_rest);

	1372 bool flip = (new_start_index & 1) != (start_index & 1);

	1373 GenerateBranches(masm,

	1374 ranges,

	1375 new_start_index,

	1376 end_index,

	1377 border,

	1378 max_char,

	1379 &dummy,

	1380 flip ? odd_label : even_label,

	1381 flip ? even_label : odd_label);

	1382 }

	1383 }

	1384

	1385

	1386 static void EmitCharClass(RegExpMacroAssembler* macro_assembler,

	1387 RegExpCharacterClass* cc, bool one_byte,

	1388 Label* on_failure, int cp_offset, bool check_offset,

	1389 bool preloaded, Zone* zone) {

	1390 ZoneList<CharacterRange>* ranges = cc->ranges(zone);

	1391 if (!CharacterRange::IsCanonical(ranges)) {

	1392 CharacterRange::Canonicalize(ranges);

	1393 }

	1394

	1395 int max_char;

	1396 if (one_byte) {

	1397 max_char = String::kMaxOneByteCharCode;

	1398 } else {

	1399 max_char = String::kMaxUtf16CodeUnit;

	1400 }

	1401

	1402 int range_count = ranges->length();

	1403

	1404 int last_valid_range = range_count - 1;

	1405 while (last_valid_range >= 0) {

	1406 CharacterRange& range = ranges->at(last_valid_range);

	1407 if (range.from() <= max_char) {

	1408 break;

	1409 }

	1410 last_valid_range--;

	1411 }

	1412

	1413 if (last_valid_range < 0) {

	1414 if (!cc->is_negated()) {

	1415 macro_assembler->GoTo(on_failure);

	1416 }

	1417 if (check_offset) {

	1418 macro_assembler->CheckPosition(cp_offset, on_failure);

	1419 }

	1420 return;

	1421 }

	1422

	1423 if (last_valid_range == 0 &&

	1424 ranges->at(0).IsEverything(max_char)) {

	1425 if (cc->is_negated()) {

	1426 macro_assembler->GoTo(on_failure);

	1427 } else {

	1428 // This is a common case hit by non-anchored expressions.

	1429 if (check_offset) {

	1430 macro_assembler->CheckPosition(cp_offset, on_failure);

	1431 }

	1432 }

	1433 return;

	1434 }

	1435 if (last_valid_range == 0 &&

	1436 !cc->is_negated() &&

	1437 ranges->at(0).IsEverything(max_char)) {

	1438 // This is a common case hit by non-anchored expressions.

	1439 if (check_offset) {

	1440 macro_assembler->CheckPosition(cp_offset, on_failure);

	1441 }

	1442 return;

	1443 }

	1444

	1445 if (!preloaded) {

	1446 macro_assembler->LoadCurrentCharacter(cp_offset, on_failure, check_offset);

	1447 }

	1448

	1449 if (cc->is_standard(zone) &&

	1450 macro_assembler->CheckSpecialCharacterClass(cc->standard_type(),

	1451 on_failure)) {

	1452 return;

	1453 }

	1454

	1455

	1456 // A new list with ascending entries. Each entry is a code unit

	1457 // where there is a boundary between code units that are part of

	1458 // the class and code units that are not. Normally we insert an

	1459 // entry at zero which goes to the failure label, but if there

	1460 // was already one there we fall through for success on that entry.

	1461 // Subsequent entries have alternating meaning (success/failure).

	1462 ZoneList<int>* range_boundaries =

	1463 new(zone) ZoneList<int>(last_valid_range, zone);

	1464

	1465 bool zeroth_entry_is_failure = !cc->is_negated();

	1466

	1467 for (int i = 0; i <= last_valid_range; i++) {

	1468 CharacterRange& range = ranges->at(i);

	1469 if (range.from() == 0) {

	1470 DCHECK_EQ(i, 0);

	1471 zeroth_entry_is_failure = !zeroth_entry_is_failure;

	1472 } else {

	1473 range_boundaries->Add(range.from(), zone);

	1474 }

	1475 range_boundaries->Add(range.to() + 1, zone);

	1476 }

	1477 int end_index = range_boundaries->length() - 1;

	1478 if (range_boundaries->at(end_index) > max_char) {

	1479 end_index--;

	1480 }

	1481

	1482 Label fall_through;

	1483 GenerateBranches(macro_assembler,

	1484 range_boundaries,

	1485 0, // start_index.

	1486 end_index,

	1487 0, // min_char.

	1488 max_char,

	1489 &fall_through,

	1490 zeroth_entry_is_failure ? &fall_through : on_failure,

	1491 zeroth_entry_is_failure ? on_failure : &fall_through);

	1492 macro_assembler->Bind(&fall_through);

	1493 }

	1494

	1495

	1496 RegExpNode::~RegExpNode() {

	1497 }

	1498

	1499

	1500 RegExpNode::LimitResult RegExpNode::LimitVersions(RegExpCompiler* compiler,

	1501 Trace* trace) {

	1502 // If we are generating a greedy loop then don't stop and don't reuse code.

	1503 if (trace->stop_node() != NULL) {

	1504 return CONTINUE;

	1505 }

	1506

	1507 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	1508 if (trace->is_trivial()) {

	1509 if (label_.is_bound()) {

	1510 // We are being asked to generate a generic version, but that's already

	1511 // been done so just go to it.

	1512 macro_assembler->GoTo(&label_);

	1513 return DONE;

	1514 }

	1515 if (compiler->recursion_depth() >= RegExpCompiler::kMaxRecursion) {

	1516 // To avoid too deep recursion we push the node to the work queue and just

	1517 // generate a goto here.

	1518 compiler->AddWork(this);

	1519 macro_assembler->GoTo(&label_);

	1520 return DONE;

	1521 }

	1522 // Generate generic version of the node and bind the label for later use.

	1523 macro_assembler->Bind(&label_);

	1524 return CONTINUE;

	1525 }

	1526

	1527 // We are being asked to make a non-generic version. Keep track of how many

	1528 // non-generic versions we generate so as not to overdo it.

	1529 trace_count_++;

	1530 if (FLAG_regexp_optimization &&

	1531 trace_count_ < kMaxCopiesCodeGenerated &&

	1532 compiler->recursion_depth() <= RegExpCompiler::kMaxRecursion) {

	1533 return CONTINUE;

	1534 }

	1535

	1536 // If we get here code has been generated for this node too many times or

	1537 // recursion is too deep. Time to switch to a generic version. The code for

	1538 // generic versions above can handle deep recursion properly.

	1539 trace->Flush(compiler, this);

	1540 return DONE;

	1541 }

	1542

	1543

	1544 int ActionNode::EatsAtLeast(int still_to_find,

	1545 int budget,

	1546 bool not_at_start) {

	1547 if (budget <= 0) return 0;

	1548 if (action_type_ == POSITIVE_SUBMATCH_SUCCESS) return 0; // Rewinds input!

	1549 return on_success()->EatsAtLeast(still_to_find,

	1550 budget - 1,

	1551 not_at_start);

	1552 }

	1553

	1554

	1555 void ActionNode::FillInBMInfo(int offset,

	1556 int budget,

	1557 BoyerMooreLookahead* bm,

	1558 bool not_at_start) {

	1559 if (action_type_ == BEGIN_SUBMATCH) {

	1560 bm->SetRest(offset);

	1561 } else if (action_type_ != POSITIVE_SUBMATCH_SUCCESS) {

	1562 on_success()->FillInBMInfo(offset, budget - 1, bm, not_at_start);

	1563 }

	1564 SaveBMInfo(bm, not_at_start, offset);

	1565 }

	1566

	1567

	1568 int AssertionNode::EatsAtLeast(int still_to_find,

	1569 int budget,

	1570 bool not_at_start) {

	1571 if (budget <= 0) return 0;

	1572 // If we know we are not at the start and we are asked "how many characters

	1573 // will you match if you succeed?" then we can answer anything since false

	1574 // implies false. So lets just return the max answer (still_to_find) since

	1575 // that won't prevent us from preloading a lot of characters for the other

	1576 // branches in the node graph.

	1577 if (assertion_type() == AT_START && not_at_start) return still_to_find;

	1578 return on_success()->EatsAtLeast(still_to_find,

	1579 budget - 1,

	1580 not_at_start);

	1581 }

	1582

	1583

	1584 void AssertionNode::FillInBMInfo(int offset,

	1585 int budget,

	1586 BoyerMooreLookahead* bm,

	1587 bool not_at_start) {

	1588 // Match the behaviour of EatsAtLeast on this node.

	1589 if (assertion_type() == AT_START && not_at_start) return;

	1590 on_success()->FillInBMInfo(offset, budget - 1, bm, not_at_start);

	1591 SaveBMInfo(bm, not_at_start, offset);

	1592 }

	1593

	1594

	1595 int BackReferenceNode::EatsAtLeast(int still_to_find,

	1596 int budget,

	1597 bool not_at_start) {

	1598 if (budget <= 0) return 0;

	1599 return on_success()->EatsAtLeast(still_to_find,

	1600 budget - 1,

	1601 not_at_start);

	1602 }

	1603

	1604

	1605 int TextNode::EatsAtLeast(int still_to_find,

	1606 int budget,

	1607 bool not_at_start) {

	1608 int answer = Length();

	1609 if (answer >= still_to_find) return answer;

	1610 if (budget <= 0) return answer;

	1611 // We are not at start after this node so we set the last argument to 'true'.

	1612 return answer + on_success()->EatsAtLeast(still_to_find - answer,

	1613 budget - 1,

	1614 true);

	1615 }

	1616

	1617

	1618 int NegativeLookaheadChoiceNode::EatsAtLeast(int still_to_find,

	1619 int budget,

	1620 bool not_at_start) {

	1621 if (budget <= 0) return 0;

	1622 // Alternative 0 is the negative lookahead, alternative 1 is what comes

	1623 // afterwards.

	1624 RegExpNode* node = alternatives_->at(1).node();

	1625 return node->EatsAtLeast(still_to_find, budget - 1, not_at_start);

	1626 }

	1627

	1628

	1629 void NegativeLookaheadChoiceNode::GetQuickCheckDetails(

	1630 QuickCheckDetails* details,

	1631 RegExpCompiler* compiler,

	1632 int filled_in,

	1633 bool not_at_start) {

	1634 // Alternative 0 is the negative lookahead, alternative 1 is what comes

	1635 // afterwards.

	1636 RegExpNode* node = alternatives_->at(1).node();

	1637 return node->GetQuickCheckDetails(details, compiler, filled_in, not_at_start);

	1638 }

	1639

	1640

	1641 int ChoiceNode::EatsAtLeastHelper(int still_to_find,

	1642 int budget,

	1643 RegExpNode* ignore_this_node,

	1644 bool not_at_start) {

	1645 if (budget <= 0) return 0;

	1646 int min = 100;

	1647 int choice_count = alternatives_->length();

	1648 budget = (budget - 1) / choice_count;

	1649 for (int i = 0; i < choice_count; i++) {

	1650 RegExpNode* node = alternatives_->at(i).node();

	1651 if (node == ignore_this_node) continue;

	1652 int node_eats_at_least =

	1653 node->EatsAtLeast(still_to_find, budget, not_at_start);

	1654 if (node_eats_at_least < min) min = node_eats_at_least;

	1655 if (min == 0) return 0;

	1656 }

	1657 return min;

	1658 }

	1659

	1660

	1661 int LoopChoiceNode::EatsAtLeast(int still_to_find,

	1662 int budget,

	1663 bool not_at_start) {

	1664 return EatsAtLeastHelper(still_to_find,

	1665 budget - 1,

	1666 loop_node_,

	1667 not_at_start);

	1668 }

	1669

	1670

	1671 int ChoiceNode::EatsAtLeast(int still_to_find,

	1672 int budget,

	1673 bool not_at_start) {

	1674 return EatsAtLeastHelper(still_to_find,

	1675 budget,

	1676 NULL,

	1677 not_at_start);

	1678 }

	1679

	1680

	1681 // Takes the left-most 1-bit and smears it out, setting all bits to its right.

	1682 static inline uint32_t SmearBitsRight(uint32_t v) {

	1683 v \|= v >> 1;

	1684 v \|= v >> 2;

	1685 v \|= v >> 4;

	1686 v \|= v >> 8;

	1687 v \|= v >> 16;

	1688 return v;

	1689 }

	1690

	1691

	1692 bool QuickCheckDetails::Rationalize(bool asc) {

	1693 bool found_useful_op = false;

	1694 uint32_t char_mask;

	1695 if (asc) {

	1696 char_mask = String::kMaxOneByteCharCode;

	1697 } else {

	1698 char_mask = String::kMaxUtf16CodeUnit;

	1699 }

	1700 mask_ = 0;

	1701 value_ = 0;

	1702 int char_shift = 0;

	1703 for (int i = 0; i < characters_; i++) {

	1704 Position* pos = &positions_[i];

	1705 if ((pos->mask & String::kMaxOneByteCharCode) != 0) {

	1706 found_useful_op = true;

	1707 }

	1708 mask_ \|= (pos->mask & char_mask) << char_shift;

	1709 value_ \|= (pos->value & char_mask) << char_shift;

	1710 char_shift += asc ? 8 : 16;

	1711 }

	1712 return found_useful_op;

	1713 }

	1714

	1715

	1716 bool RegExpNode::EmitQuickCheck(RegExpCompiler* compiler,

	1717 Trace* bounds_check_trace,

	1718 Trace* trace,

	1719 bool preload_has_checked_bounds,

	1720 Label* on_possible_success,

	1721 QuickCheckDetails* details,

	1722 bool fall_through_on_failure) {

	1723 if (details->characters() == 0) return false;

	1724 GetQuickCheckDetails(

	1725 details, compiler, 0, trace->at_start() == Trace::FALSE_VALUE);

	1726 if (details->cannot_match()) return false;

	1727 if (!details->Rationalize(compiler->one_byte())) return false;

	1728 DCHECK(details->characters() == 1 \|\|

	1729 compiler->macro_assembler()->CanReadUnaligned());

	1730 uint32_t mask = details->mask();

	1731 uint32_t value = details->value();

	1732

	1733 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	1734

	1735 if (trace->characters_preloaded() != details->characters()) {

	1736 DCHECK(trace->cp_offset() == bounds_check_trace->cp_offset());

	1737 // We are attempting to preload the minimum number of characters

	1738 // any choice would eat, so if the bounds check fails, then none of the

	1739 // choices can succeed, so we can just immediately backtrack, rather

	1740 // than go to the next choice.

	1741 assembler->LoadCurrentCharacter(trace->cp_offset(),

	1742 bounds_check_trace->backtrack(),

	1743 !preload_has_checked_bounds,

	1744 details->characters());

	1745 }

	1746

	1747

	1748 bool need_mask = true;

	1749

	1750 if (details->characters() == 1) {

	1751 // If number of characters preloaded is 1 then we used a byte or 16 bit

	1752 // load so the value is already masked down.

	1753 uint32_t char_mask;

	1754 if (compiler->one_byte()) {

	1755 char_mask = String::kMaxOneByteCharCode;

	1756 } else {

	1757 char_mask = String::kMaxUtf16CodeUnit;

	1758 }

	1759 if ((mask & char_mask) == char_mask) need_mask = false;

	1760 mask &= char_mask;

	1761 } else {

	1762 // For 2-character preloads in one-byte mode or 1-character preloads in

	1763 // two-byte mode we also use a 16 bit load with zero extend.

	1764 if (details->characters() == 2 && compiler->one_byte()) {

	1765 if ((mask & 0xffff) == 0xffff) need_mask = false;

	1766 } else if (details->characters() == 1 && !compiler->one_byte()) {

	1767 if ((mask & 0xffff) == 0xffff) need_mask = false;

	1768 } else {

	1769 if (mask == 0xffffffff) need_mask = false;

	1770 }

	1771 }

	1772

	1773 if (fall_through_on_failure) {

	1774 if (need_mask) {

	1775 assembler->CheckCharacterAfterAnd(value, mask, on_possible_success);

	1776 } else {

	1777 assembler->CheckCharacter(value, on_possible_success);

	1778 }

	1779 } else {

	1780 if (need_mask) {

	1781 assembler->CheckNotCharacterAfterAnd(value, mask, trace->backtrack());

	1782 } else {

	1783 assembler->CheckNotCharacter(value, trace->backtrack());

	1784 }

	1785 }

	1786 return true;

	1787 }

	1788

	1789

	1790 // Here is the meat of GetQuickCheckDetails (see also the comment on the

	1791 // super-class in the .h file).

	1792 //

	1793 // We iterate along the text object, building up for each character a

	1794 // mask and value that can be used to test for a quick failure to match.

	1795 // The masks and values for the positions will be combined into a single

	1796 // machine word for the current character width in order to be used in

	1797 // generating a quick check.

	1798 void TextNode::GetQuickCheckDetails(QuickCheckDetails* details,

	1799 RegExpCompiler* compiler,

	1800 int characters_filled_in,

	1801 bool not_at_start) {

	1802 Isolate* isolate = compiler->macro_assembler()->zone()->isolate();

	1803 DCHECK(characters_filled_in < details->characters());

	1804 int characters = details->characters();

	1805 int char_mask;

	1806 if (compiler->one_byte()) {

	1807 char_mask = String::kMaxOneByteCharCode;

	1808 } else {

	1809 char_mask = String::kMaxUtf16CodeUnit;

	1810 }

	1811 for (int k = 0; k < elms_->length(); k++) {

	1812 TextElement elm = elms_->at(k);

	1813 if (elm.text_type() == TextElement::ATOM) {

	1814 Vector<const uc16> quarks = elm.atom()->data();

	1815 for (int i = 0; i < characters && i < quarks.length(); i++) {

	1816 QuickCheckDetails::Position* pos =

	1817 details->positions(characters_filled_in);

	1818 uc16 c = quarks[i];

	1819 if (c > char_mask) {

	1820 // If we expect a non-Latin1 character from an one-byte string,

	1821 // there is no way we can match. Not even case-independent

	1822 // matching can turn an Latin1 character into non-Latin1 or

	1823 // vice versa.

	1824 // TODO(dcarney): issue 3550. Verify that this works as expected.

	1825 // For example, \u0178 is uppercase of \u00ff (y-umlaut).

	1826 details->set_cannot_match();

	1827 pos->determines_perfectly = false;

	1828 return;

	1829 }

	1830 if (compiler->ignore_case()) {

	1831 unibrow::uchar chars[unibrow::Ecma262UnCanonicalize::kMaxWidth];

	1832 int length = GetCaseIndependentLetters(isolate, c,

	1833 compiler->one_byte(), chars);

	1834 DCHECK(length != 0); // Can only happen if c > char_mask (see above).

	1835 if (length == 1) {

	1836 // This letter has no case equivalents, so it's nice and simple

	1837 // and the mask-compare will determine definitely whether we have

	1838 // a match at this character position.

	1839 pos->mask = char_mask;

	1840 pos->value = c;

	1841 pos->determines_perfectly = true;

	1842 } else {

	1843 uint32_t common_bits = char_mask;

	1844 uint32_t bits = chars[0];

	1845 for (int j = 1; j < length; j++) {

	1846 uint32_t differing_bits = ((chars[j] & common_bits) ^ bits);

	1847 common_bits ^= differing_bits;

	1848 bits &= common_bits;

	1849 }

	1850 // If length is 2 and common bits has only one zero in it then

	1851 // our mask and compare instruction will determine definitely

	1852 // whether we have a match at this character position. Otherwise

	1853 // it can only be an approximate check.

	1854 uint32_t one_zero = (common_bits \| ~char_mask);

	1855 if (length == 2 && ((~one_zero) & ((~one_zero) - 1)) == 0) {

	1856 pos->determines_perfectly = true;

	1857 }

	1858 pos->mask = common_bits;

	1859 pos->value = bits;

	1860 }

	1861 } else {

	1862 // Don't ignore case. Nice simple case where the mask-compare will

	1863 // determine definitely whether we have a match at this character

	1864 // position.

	1865 pos->mask = char_mask;

	1866 pos->value = c;

	1867 pos->determines_perfectly = true;

	1868 }

	1869 characters_filled_in++;

	1870 DCHECK(characters_filled_in <= details->characters());

	1871 if (characters_filled_in == details->characters()) {

	1872 return;

	1873 }

	1874 }

	1875 } else {

	1876 QuickCheckDetails::Position* pos =

	1877 details->positions(characters_filled_in);

	1878 RegExpCharacterClass* tree = elm.char_class();

	1879 ZoneList<CharacterRange>* ranges = tree->ranges(zone());

	1880 if (tree->is_negated()) {

	1881 // A quick check uses multi-character mask and compare. There is no

	1882 // useful way to incorporate a negative char class into this scheme

	1883 // so we just conservatively create a mask and value that will always

	1884 // succeed.

	1885 pos->mask = 0;

	1886 pos->value = 0;

	1887 } else {

	1888 int first_range = 0;

	1889 while (ranges->at(first_range).from() > char_mask) {

	1890 first_range++;

	1891 if (first_range == ranges->length()) {

	1892 details->set_cannot_match();

	1893 pos->determines_perfectly = false;

	1894 return;

	1895 }

	1896 }

	1897 CharacterRange range = ranges->at(first_range);

	1898 uc16 from = range.from();

	1899 uc16 to = range.to();

	1900 if (to > char_mask) {

	1901 to = char_mask;

	1902 }

	1903 uint32_t differing_bits = (from ^ to);

	1904 // A mask and compare is only perfect if the differing bits form a

	1905 // number like 00011111 with one single block of trailing 1s.

	1906 if ((differing_bits & (differing_bits + 1)) == 0 &&

	1907 from + differing_bits == to) {

	1908 pos->determines_perfectly = true;

	1909 }

	1910 uint32_t common_bits = ~SmearBitsRight(differing_bits);

	1911 uint32_t bits = (from & common_bits);

	1912 for (int i = first_range + 1; i < ranges->length(); i++) {

	1913 CharacterRange range = ranges->at(i);

	1914 uc16 from = range.from();

	1915 uc16 to = range.to();

	1916 if (from > char_mask) continue;

	1917 if (to > char_mask) to = char_mask;

	1918 // Here we are combining more ranges into the mask and compare

	1919 // value. With each new range the mask becomes more sparse and

	1920 // so the chances of a false positive rise. A character class

	1921 // with multiple ranges is assumed never to be equivalent to a

	1922 // mask and compare operation.

	1923 pos->determines_perfectly = false;

	1924 uint32_t new_common_bits = (from ^ to);

	1925 new_common_bits = ~SmearBitsRight(new_common_bits);

	1926 common_bits &= new_common_bits;

	1927 bits &= new_common_bits;

	1928 uint32_t differing_bits = (from & common_bits) ^ bits;

	1929 common_bits ^= differing_bits;

	1930 bits &= common_bits;

	1931 }

	1932 pos->mask = common_bits;

	1933 pos->value = bits;

	1934 }

	1935 characters_filled_in++;

	1936 DCHECK(characters_filled_in <= details->characters());

	1937 if (characters_filled_in == details->characters()) {

	1938 return;

	1939 }

	1940 }

	1941 }

	1942 DCHECK(characters_filled_in != details->characters());

	1943 if (!details->cannot_match()) {

	1944 on_success()-> GetQuickCheckDetails(details,

	1945 compiler,

	1946 characters_filled_in,

	1947 true);

	1948 }

	1949 }

	1950

	1951

	1952 void QuickCheckDetails::Clear() {

	1953 for (int i = 0; i < characters_; i++) {

	1954 positions_[i].mask = 0;

	1955 positions_[i].value = 0;

	1956 positions_[i].determines_perfectly = false;

	1957 }

	1958 characters_ = 0;

	1959 }

	1960

	1961

	1962 void QuickCheckDetails::Advance(int by, bool one_byte) {

	1963 DCHECK(by >= 0);

	1964 if (by >= characters_) {

	1965 Clear();

	1966 return;

	1967 }

	1968 for (int i = 0; i < characters_ - by; i++) {

	1969 positions_[i] = positions_[by + i];

	1970 }

	1971 for (int i = characters_ - by; i < characters_; i++) {

	1972 positions_[i].mask = 0;

	1973 positions_[i].value = 0;

	1974 positions_[i].determines_perfectly = false;

	1975 }

	1976 characters_ -= by;

	1977 // We could change mask_ and value_ here but we would never advance unless

	1978 // they had already been used in a check and they won't be used again because

	1979 // it would gain us nothing. So there's no point.

	1980 }

	1981

	1982

	1983 void QuickCheckDetails::Merge(QuickCheckDetails* other, int from_index) {

	1984 DCHECK(characters_ == other->characters_);

	1985 if (other->cannot_match_) {

	1986 return;

	1987 }

	1988 if (cannot_match_) {

	1989 this = other;

	1990 return;

	1991 }

	1992 for (int i = from_index; i < characters_; i++) {

	1993 QuickCheckDetails::Position* pos = positions(i);

	1994 QuickCheckDetails::Position* other_pos = other->positions(i);

	1995 if (pos->mask != other_pos->mask \|\|

	1996 pos->value != other_pos->value \|\|

	1997 !other_pos->determines_perfectly) {

	1998 // Our mask-compare operation will be approximate unless we have the

	1999 // exact same operation on both sides of the alternation.

	2000 pos->determines_perfectly = false;

	2001 }

	2002 pos->mask &= other_pos->mask;

	2003 pos->value &= pos->mask;

	2004 other_pos->value &= pos->mask;

	2005 uc16 differing_bits = (pos->value ^ other_pos->value);

	2006 pos->mask &= ~differing_bits;

	2007 pos->value &= pos->mask;

	2008 }

	2009 }

	2010

	2011

	2012 class VisitMarker {

	2013 public:

	2014 explicit VisitMarker(NodeInfo* info) : info_(info) {

	2015 DCHECK(!info->visited);

	2016 info->visited = true;

	2017 }

	2018 ~VisitMarker() {

	2019 info_->visited = false;

	2020 }

	2021 private:

	2022 NodeInfo* info_;

	2023 };

	2024

	2025

	2026 RegExpNode* SeqRegExpNode::FilterOneByte(int depth, bool ignore_case) {

	2027 if (info()->replacement_calculated) return replacement();

	2028 if (depth < 0) return this;

	2029 DCHECK(!info()->visited);

	2030 VisitMarker marker(info());

	2031 return FilterSuccessor(depth - 1, ignore_case);

	2032 }

	2033

	2034

	2035 RegExpNode* SeqRegExpNode::FilterSuccessor(int depth, bool ignore_case) {

	2036 RegExpNode* next = on_success_->FilterOneByte(depth - 1, ignore_case);

	2037 if (next == NULL) return set_replacement(NULL);

	2038 on_success_ = next;

	2039 return set_replacement(this);

	2040 }

	2041

	2042

	2043 // We need to check for the following characters: 0x39c 0x3bc 0x178.

	2044 static inline bool RangeContainsLatin1Equivalents(CharacterRange range) {

	2045 // TODO(dcarney): this could be a lot more efficient.

	2046 return range.Contains(0x39c) \|\|

	2047 range.Contains(0x3bc) \|\| range.Contains(0x178);

	2048 }

	2049

	2050

	2051 static bool RangesContainLatin1Equivalents(ZoneList<CharacterRange>* ranges) {

	2052 for (int i = 0; i < ranges->length(); i++) {

	2053 // TODO(dcarney): this could be a lot more efficient.

	2054 if (RangeContainsLatin1Equivalents(ranges->at(i))) return true;

	2055 }

	2056 return false;

	2057 }

	2058

	2059

	2060 RegExpNode* TextNode::FilterOneByte(int depth, bool ignore_case) {

	2061 if (info()->replacement_calculated) return replacement();

	2062 if (depth < 0) return this;

	2063 DCHECK(!info()->visited);

	2064 VisitMarker marker(info());

	2065 int element_count = elms_->length();

	2066 for (int i = 0; i < element_count; i++) {

	2067 TextElement elm = elms_->at(i);

	2068 if (elm.text_type() == TextElement::ATOM) {

	2069 Vector<const uc16> quarks = elm.atom()->data();

	2070 for (int j = 0; j < quarks.length(); j++) {

	2071 uint16_t c = quarks[j];

	2072 if (c <= String::kMaxOneByteCharCode) continue;

	2073 if (!ignore_case) return set_replacement(NULL);

	2074 // Here, we need to check for characters whose upper and lower cases

	2075 // are outside the Latin-1 range.

	2076 uint16_t converted = unibrow::Latin1::ConvertNonLatin1ToLatin1(c);

	2077 // Character is outside Latin-1 completely

	2078 if (converted == 0) return set_replacement(NULL);

	2079 // Convert quark to Latin-1 in place.

	2080 uint16_t* copy = const_cast<uint16_t*>(quarks.start());

	2081 copy[j] = converted;

	2082 }

	2083 } else {

	2084 DCHECK(elm.text_type() == TextElement::CHAR_CLASS);

	2085 RegExpCharacterClass* cc = elm.char_class();

	2086 ZoneList<CharacterRange>* ranges = cc->ranges(zone());

	2087 if (!CharacterRange::IsCanonical(ranges)) {

	2088 CharacterRange::Canonicalize(ranges);

	2089 }

	2090 // Now they are in order so we only need to look at the first.

	2091 int range_count = ranges->length();

	2092 if (cc->is_negated()) {

	2093 if (range_count != 0 &&

	2094 ranges->at(0).from() == 0 &&

	2095 ranges->at(0).to() >= String::kMaxOneByteCharCode) {

	2096 // This will be handled in a later filter.

	2097 if (ignore_case && RangesContainLatin1Equivalents(ranges)) continue;

	2098 return set_replacement(NULL);

	2099 }

	2100 } else {

	2101 if (range_count == 0 \|\|

	2102 ranges->at(0).from() > String::kMaxOneByteCharCode) {

	2103 // This will be handled in a later filter.

	2104 if (ignore_case && RangesContainLatin1Equivalents(ranges)) continue;

	2105 return set_replacement(NULL);

	2106 }

	2107 }

	2108 }

	2109 }

	2110 return FilterSuccessor(depth - 1, ignore_case);

	2111 }

	2112

	2113

	2114 RegExpNode* LoopChoiceNode::FilterOneByte(int depth, bool ignore_case) {

	2115 if (info()->replacement_calculated) return replacement();

	2116 if (depth < 0) return this;

	2117 if (info()->visited) return this;

	2118 {

	2119 VisitMarker marker(info());

	2120

	2121 RegExpNode* continue_replacement =

	2122 continue_node_->FilterOneByte(depth - 1, ignore_case);

	2123 // If we can't continue after the loop then there is no sense in doing the

	2124 // loop.

	2125 if (continue_replacement == NULL) return set_replacement(NULL);

	2126 }

	2127

	2128 return ChoiceNode::FilterOneByte(depth - 1, ignore_case);

	2129 }

	2130

	2131

	2132 RegExpNode* ChoiceNode::FilterOneByte(int depth, bool ignore_case) {

	2133 if (info()->replacement_calculated) return replacement();

	2134 if (depth < 0) return this;

	2135 if (info()->visited) return this;

	2136 VisitMarker marker(info());

	2137 int choice_count = alternatives_->length();

	2138

	2139 for (int i = 0; i < choice_count; i++) {

	2140 GuardedAlternative alternative = alternatives_->at(i);

	2141 if (alternative.guards() != NULL && alternative.guards()->length() != 0) {

	2142 set_replacement(this);

	2143 return this;

	2144 }

	2145 }

	2146

	2147 int surviving = 0;

	2148 RegExpNode* survivor = NULL;

	2149 for (int i = 0; i < choice_count; i++) {

	2150 GuardedAlternative alternative = alternatives_->at(i);

	2151 RegExpNode* replacement =

	2152 alternative.node()->FilterOneByte(depth - 1, ignore_case);

	2153 DCHECK(replacement != this); // No missing EMPTY_MATCH_CHECK.

	2154 if (replacement != NULL) {

	2155 alternatives_->at(i).set_node(replacement);

	2156 surviving++;

	2157 survivor = replacement;

	2158 }

	2159 }

	2160 if (surviving < 2) return set_replacement(survivor);

	2161

	2162 set_replacement(this);

	2163 if (surviving == choice_count) {

	2164 return this;

	2165 }

	2166 // Only some of the nodes survived the filtering. We need to rebuild the

	2167 // alternatives list.

	2168 ZoneList<GuardedAlternative>* new_alternatives =

	2169 new(zone()) ZoneList<GuardedAlternative>(surviving, zone());

	2170 for (int i = 0; i < choice_count; i++) {

	2171 RegExpNode* replacement =

	2172 alternatives_->at(i).node()->FilterOneByte(depth - 1, ignore_case);

	2173 if (replacement != NULL) {

	2174 alternatives_->at(i).set_node(replacement);

	2175 new_alternatives->Add(alternatives_->at(i), zone());

	2176 }

	2177 }

	2178 alternatives_ = new_alternatives;

	2179 return this;

	2180 }

	2181

	2182

	2183 RegExpNode* NegativeLookaheadChoiceNode::FilterOneByte(int depth,

	2184 bool ignore_case) {

	2185 if (info()->replacement_calculated) return replacement();

	2186 if (depth < 0) return this;

	2187 if (info()->visited) return this;

	2188 VisitMarker marker(info());

	2189 // Alternative 0 is the negative lookahead, alternative 1 is what comes

	2190 // afterwards.

	2191 RegExpNode* node = alternatives_->at(1).node();

	2192 RegExpNode* replacement = node->FilterOneByte(depth - 1, ignore_case);

	2193 if (replacement == NULL) return set_replacement(NULL);

	2194 alternatives_->at(1).set_node(replacement);

	2195

	2196 RegExpNode* neg_node = alternatives_->at(0).node();

	2197 RegExpNode* neg_replacement = neg_node->FilterOneByte(depth - 1, ignore_case);

	2198 // If the negative lookahead is always going to fail then

	2199 // we don't need to check it.

	2200 if (neg_replacement == NULL) return set_replacement(replacement);

	2201 alternatives_->at(0).set_node(neg_replacement);

	2202 return set_replacement(this);

	2203 }

	2204

	2205

	2206 void LoopChoiceNode::GetQuickCheckDetails(QuickCheckDetails* details,

	2207 RegExpCompiler* compiler,

	2208 int characters_filled_in,

	2209 bool not_at_start) {

	2210 if (body_can_be_zero_length_ \|\| info()->visited) return;

	2211 VisitMarker marker(info());

	2212 return ChoiceNode::GetQuickCheckDetails(details,

	2213 compiler,

	2214 characters_filled_in,

	2215 not_at_start);

	2216 }

	2217

	2218

	2219 void LoopChoiceNode::FillInBMInfo(int offset,

	2220 int budget,

	2221 BoyerMooreLookahead* bm,

	2222 bool not_at_start) {

	2223 if (body_can_be_zero_length_ \|\| budget <= 0) {

	2224 bm->SetRest(offset);

	2225 SaveBMInfo(bm, not_at_start, offset);

	2226 return;

	2227 }

	2228 ChoiceNode::FillInBMInfo(offset, budget - 1, bm, not_at_start);

	2229 SaveBMInfo(bm, not_at_start, offset);

	2230 }

	2231

	2232

	2233 void ChoiceNode::GetQuickCheckDetails(QuickCheckDetails* details,

	2234 RegExpCompiler* compiler,

	2235 int characters_filled_in,

	2236 bool not_at_start) {

	2237 not_at_start = (not_at_start \|\| not_at_start_);

	2238 int choice_count = alternatives_->length();

	2239 DCHECK(choice_count > 0);

	2240 alternatives_->at(0).node()->GetQuickCheckDetails(details,

	2241 compiler,

	2242 characters_filled_in,

	2243 not_at_start);

	2244 for (int i = 1; i < choice_count; i++) {

	2245 QuickCheckDetails new_details(details->characters());

	2246 RegExpNode* node = alternatives_->at(i).node();

	2247 node->GetQuickCheckDetails(&new_details, compiler,

	2248 characters_filled_in,

	2249 not_at_start);

	2250 // Here we merge the quick match details of the two branches.

	2251 details->Merge(&new_details, characters_filled_in);

	2252 }

	2253 }

	2254

	2255

	2256 // Check for [0-9A-Z_a-z].

	2257 static void EmitWordCheck(RegExpMacroAssembler* assembler,

	2258 Label* word,

	2259 Label* non_word,

	2260 bool fall_through_on_word) {

	2261 if (assembler->CheckSpecialCharacterClass(

	2262 fall_through_on_word ? 'w' : 'W',

	2263 fall_through_on_word ? non_word : word)) {

	2264 // Optimized implementation available.

	2265 return;

	2266 }

	2267 assembler->CheckCharacterGT('z', non_word);

	2268 assembler->CheckCharacterLT('0', non_word);

	2269 assembler->CheckCharacterGT('a' - 1, word);

	2270 assembler->CheckCharacterLT('9' + 1, word);

	2271 assembler->CheckCharacterLT('A', non_word);

	2272 assembler->CheckCharacterLT('Z' + 1, word);

	2273 if (fall_through_on_word) {

	2274 assembler->CheckNotCharacter('_', non_word);

	2275 } else {

	2276 assembler->CheckCharacter('_', word);

	2277 }

	2278 }

	2279

	2280

	2281 // Emit the code to check for a ^ in multiline mode (1-character lookbehind

	2282 // that matches newline or the start of input).

	2283 static void EmitHat(RegExpCompiler* compiler,

	2284 RegExpNode* on_success,

	2285 Trace* trace) {

	2286 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	2287 // We will be loading the previous character into the current character

	2288 // register.

	2289 Trace new_trace(*trace);

	2290 new_trace.InvalidateCurrentCharacter();

	2291

	2292 Label ok;

	2293 if (new_trace.cp_offset() == 0) {

	2294 // The start of input counts as a newline in this context, so skip to

	2295 // ok if we are at the start.

	2296 assembler->CheckAtStart(&ok);

	2297 }

	2298 // We already checked that we are not at the start of input so it must be

	2299 // OK to load the previous character.

	2300 assembler->LoadCurrentCharacter(new_trace.cp_offset() -1,

	2301 new_trace.backtrack(),

	2302 false);

	2303 if (!assembler->CheckSpecialCharacterClass('n',

	2304 new_trace.backtrack())) {

	2305 // Newline means \n, \r, 0x2028 or 0x2029.

	2306 if (!compiler->one_byte()) {

	2307 assembler->CheckCharacterAfterAnd(0x2028, 0xfffe, &ok);

	2308 }

	2309 assembler->CheckCharacter('\n', &ok);

	2310 assembler->CheckNotCharacter('\r', new_trace.backtrack());

	2311 }

	2312 assembler->Bind(&ok);

	2313 on_success->Emit(compiler, &new_trace);

	2314 }

	2315

	2316

	2317 // Emit the code to handle \b and \B (word-boundary or non-word-boundary).

	2318 void AssertionNode::EmitBoundaryCheck(RegExpCompiler* compiler, Trace* trace) {

	2319 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	2320 Trace::TriBool next_is_word_character = Trace::UNKNOWN;

	2321 bool not_at_start = (trace->at_start() == Trace::FALSE_VALUE);

	2322 BoyerMooreLookahead* lookahead = bm_info(not_at_start);

	2323 if (lookahead == NULL) {

	2324 int eats_at_least =

	2325 Min(kMaxLookaheadForBoyerMoore, EatsAtLeast(kMaxLookaheadForBoyerMoore,

	2326 kRecursionBudget,

	2327 not_at_start));

	2328 if (eats_at_least >= 1) {

	2329 BoyerMooreLookahead* bm =

	2330 new(zone()) BoyerMooreLookahead(eats_at_least, compiler, zone());

	2331 FillInBMInfo(0, kRecursionBudget, bm, not_at_start);

	2332 if (bm->at(0)->is_non_word())

	2333 next_is_word_character = Trace::FALSE_VALUE;

	2334 if (bm->at(0)->is_word()) next_is_word_character = Trace::TRUE_VALUE;

	2335 }

	2336 } else {

	2337 if (lookahead->at(0)->is_non_word())

	2338 next_is_word_character = Trace::FALSE_VALUE;

	2339 if (lookahead->at(0)->is_word())

	2340 next_is_word_character = Trace::TRUE_VALUE;

	2341 }

	2342 bool at_boundary = (assertion_type_ == AssertionNode::AT_BOUNDARY);

	2343 if (next_is_word_character == Trace::UNKNOWN) {

	2344 Label before_non_word;

	2345 Label before_word;

	2346 if (trace->characters_preloaded() != 1) {

	2347 assembler->LoadCurrentCharacter(trace->cp_offset(), &before_non_word);

	2348 }

	2349 // Fall through on non-word.

	2350 EmitWordCheck(assembler, &before_word, &before_non_word, false);

	2351 // Next character is not a word character.

	2352 assembler->Bind(&before_non_word);

	2353 Label ok;

	2354 BacktrackIfPrevious(compiler, trace, at_boundary ? kIsNonWord : kIsWord);

	2355 assembler->GoTo(&ok);

	2356

	2357 assembler->Bind(&before_word);

	2358 BacktrackIfPrevious(compiler, trace, at_boundary ? kIsWord : kIsNonWord);

	2359 assembler->Bind(&ok);

	2360 } else if (next_is_word_character == Trace::TRUE_VALUE) {

	2361 BacktrackIfPrevious(compiler, trace, at_boundary ? kIsWord : kIsNonWord);

	2362 } else {

	2363 DCHECK(next_is_word_character == Trace::FALSE_VALUE);

	2364 BacktrackIfPrevious(compiler, trace, at_boundary ? kIsNonWord : kIsWord);

	2365 }

	2366 }

	2367

	2368

	2369 void AssertionNode::BacktrackIfPrevious(

	2370 RegExpCompiler* compiler,

	2371 Trace* trace,

	2372 AssertionNode::IfPrevious backtrack_if_previous) {

	2373 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	2374 Trace new_trace(*trace);

	2375 new_trace.InvalidateCurrentCharacter();

	2376

	2377 Label fall_through, dummy;

	2378

	2379 Label* non_word = backtrack_if_previous == kIsNonWord ?

	2380 new_trace.backtrack() :

	2381 &fall_through;

	2382 Label* word = backtrack_if_previous == kIsNonWord ?

	2383 &fall_through :

	2384 new_trace.backtrack();

	2385

	2386 if (new_trace.cp_offset() == 0) {

	2387 // The start of input counts as a non-word character, so the question is

	2388 // decided if we are at the start.

	2389 assembler->CheckAtStart(non_word);

	2390 }

	2391 // We already checked that we are not at the start of input so it must be

	2392 // OK to load the previous character.

	2393 assembler->LoadCurrentCharacter(new_trace.cp_offset() - 1, &dummy, false);

	2394 EmitWordCheck(assembler, word, non_word, backtrack_if_previous == kIsNonWord);

	2395

	2396 assembler->Bind(&fall_through);

	2397 on_success()->Emit(compiler, &new_trace);

	2398 }

	2399

	2400

	2401 void AssertionNode::GetQuickCheckDetails(QuickCheckDetails* details,

	2402 RegExpCompiler* compiler,

	2403 int filled_in,

	2404 bool not_at_start) {

	2405 if (assertion_type_ == AT_START && not_at_start) {

	2406 details->set_cannot_match();

	2407 return;

	2408 }

	2409 return on_success()->GetQuickCheckDetails(details,

	2410 compiler,

	2411 filled_in,

	2412 not_at_start);

	2413 }

	2414

	2415

	2416 void AssertionNode::Emit(RegExpCompiler* compiler, Trace* trace) {

	2417 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	2418 switch (assertion_type_) {

	2419 case AT_END: {

	2420 Label ok;

	2421 assembler->CheckPosition(trace->cp_offset(), &ok);

	2422 assembler->GoTo(trace->backtrack());

	2423 assembler->Bind(&ok);

	2424 break;

	2425 }

	2426 case AT_START: {

	2427 if (trace->at_start() == Trace::FALSE_VALUE) {

	2428 assembler->GoTo(trace->backtrack());

	2429 return;

	2430 }

	2431 if (trace->at_start() == Trace::UNKNOWN) {

	2432 assembler->CheckNotAtStart(trace->backtrack());

	2433 Trace at_start_trace = *trace;

	2434 at_start_trace.set_at_start(true);

	2435 on_success()->Emit(compiler, &at_start_trace);

	2436 return;

	2437 }

	2438 }

	2439 break;

	2440 case AFTER_NEWLINE:

	2441 EmitHat(compiler, on_success(), trace);

	2442 return;

	2443 case AT_BOUNDARY:

	2444 case AT_NON_BOUNDARY: {

	2445 EmitBoundaryCheck(compiler, trace);

	2446 return;

	2447 }

	2448 }

	2449 on_success()->Emit(compiler, trace);

	2450 }

	2451

	2452

	2453 static bool DeterminedAlready(QuickCheckDetails* quick_check, int offset) {

	2454 if (quick_check == NULL) return false;

	2455 if (offset >= quick_check->characters()) return false;

	2456 return quick_check->positions(offset)->determines_perfectly;

	2457 }

	2458

	2459

	2460 static void UpdateBoundsCheck(int index, int* checked_up_to) {

	2461 if (index > *checked_up_to) {

	2462 *checked_up_to = index;

	2463 }

	2464 }

	2465

	2466

	2467 // We call this repeatedly to generate code for each pass over the text node.

	2468 // The passes are in increasing order of difficulty because we hope one

	2469 // of the first passes will fail in which case we are saved the work of the

	2470 // later passes. for example for the case independent regexp /%[asdfghjkl]a/

	2471 // we will check the '%' in the first pass, the case independent 'a' in the

	2472 // second pass and the character class in the last pass.

	2473 //

	2474 // The passes are done from right to left, so for example to test for /bar/

	2475 // we will first test for an 'r' with offset 2, then an 'a' with offset 1

	2476 // and then a 'b' with offset 0. This means we can avoid the end-of-input

	2477 // bounds check most of the time. In the example we only need to check for

	2478 // end-of-input when loading the putative 'r'.

	2479 //

	2480 // A slight complication involves the fact that the first character may already

	2481 // be fetched into a register by the previous node. In this case we want to

	2482 // do the test for that character first. We do this in separate passes. The

	2483 // 'preloaded' argument indicates that we are doing such a 'pass'. If such a

	2484 // pass has been performed then subsequent passes will have true in

	2485 // first_element_checked to indicate that that character does not need to be

	2486 // checked again.

	2487 //

	2488 // In addition to all this we are passed a Trace, which can

	2489 // contain an AlternativeGeneration object. In this AlternativeGeneration

	2490 // object we can see details of any quick check that was already passed in

	2491 // order to get to the code we are now generating. The quick check can involve

	2492 // loading characters, which means we do not need to recheck the bounds

	2493 // up to the limit the quick check already checked. In addition the quick

	2494 // check can have involved a mask and compare operation which may simplify

	2495 // or obviate the need for further checks at some character positions.

	2496 void TextNode::TextEmitPass(RegExpCompiler* compiler,

	2497 TextEmitPassType pass,

	2498 bool preloaded,

	2499 Trace* trace,

	2500 bool first_element_checked,

	2501 int* checked_up_to) {

	2502 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	2503 Isolate* isolate = assembler->zone()->isolate();

	2504 bool one_byte = compiler->one_byte();

	2505 Label* backtrack = trace->backtrack();

	2506 QuickCheckDetails* quick_check = trace->quick_check_performed();

	2507 int element_count = elms_->length();

	2508 for (int i = preloaded ? 0 : element_count - 1; i >= 0; i--) {

	2509 TextElement elm = elms_->at(i);

	2510 int cp_offset = trace->cp_offset() + elm.cp_offset();

	2511 if (elm.text_type() == TextElement::ATOM) {

	2512 Vector<const uc16> quarks = elm.atom()->data();

	2513 for (int j = preloaded ? 0 : quarks.length() - 1; j >= 0; j--) {

	2514 if (first_element_checked && i == 0 && j == 0) continue;

	2515 if (DeterminedAlready(quick_check, elm.cp_offset() + j)) continue;

	2516 EmitCharacterFunction* emit_function = NULL;

	2517 switch (pass) {

	2518 case NON_LATIN1_MATCH:

	2519 DCHECK(one_byte);

	2520 if (quarks[j] > String::kMaxOneByteCharCode) {

	2521 assembler->GoTo(backtrack);

	2522 return;

	2523 }

	2524 break;

	2525 case NON_LETTER_CHARACTER_MATCH:

	2526 emit_function = &EmitAtomNonLetter;

	2527 break;

	2528 case SIMPLE_CHARACTER_MATCH:

	2529 emit_function = &EmitSimpleCharacter;

	2530 break;

	2531 case CASE_CHARACTER_MATCH:

	2532 emit_function = &EmitAtomLetter;

	2533 break;

	2534 default:

	2535 break;

	2536 }

	2537 if (emit_function != NULL) {

	2538 bool bound_checked = emit_function(isolate,

	2539 compiler,

	2540 quarks[j],

	2541 backtrack,

	2542 cp_offset + j,

	2543 *checked_up_to < cp_offset + j,

	2544 preloaded);

	2545 if (bound_checked) UpdateBoundsCheck(cp_offset + j, checked_up_to);

	2546 }

	2547 }

	2548 } else {

	2549 DCHECK_EQ(TextElement::CHAR_CLASS, elm.text_type());

	2550 if (pass == CHARACTER_CLASS_MATCH) {

	2551 if (first_element_checked && i == 0) continue;

	2552 if (DeterminedAlready(quick_check, elm.cp_offset())) continue;

	2553 RegExpCharacterClass* cc = elm.char_class();

	2554 EmitCharClass(assembler, cc, one_byte, backtrack, cp_offset,

	2555 *checked_up_to < cp_offset, preloaded, zone());

	2556 UpdateBoundsCheck(cp_offset, checked_up_to);

	2557 }

	2558 }

	2559 }

	2560 }

	2561

	2562

	2563 int TextNode::Length() {

	2564 TextElement elm = elms_->last();

	2565 DCHECK(elm.cp_offset() >= 0);

	2566 return elm.cp_offset() + elm.length();

	2567 }

	2568

	2569

	2570 bool TextNode::SkipPass(int int_pass, bool ignore_case) {

	2571 TextEmitPassType pass = static_cast<TextEmitPassType>(int_pass);

	2572 if (ignore_case) {

	2573 return pass == SIMPLE_CHARACTER_MATCH;

	2574 } else {

	2575 return pass == NON_LETTER_CHARACTER_MATCH \|\| pass == CASE_CHARACTER_MATCH;

	2576 }

	2577 }

	2578

	2579

	2580 // This generates the code to match a text node. A text node can contain

	2581 // straight character sequences (possibly to be matched in a case-independent

	2582 // way) and character classes. For efficiency we do not do this in a single

	2583 // pass from left to right. Instead we pass over the text node several times,

	2584 // emitting code for some character positions every time. See the comment on

	2585 // TextEmitPass for details.

	2586 void TextNode::Emit(RegExpCompiler* compiler, Trace* trace) {

	2587 LimitResult limit_result = LimitVersions(compiler, trace);

	2588 if (limit_result == DONE) return;

	2589 DCHECK(limit_result == CONTINUE);

	2590

	2591 if (trace->cp_offset() + Length() > RegExpMacroAssembler::kMaxCPOffset) {

	2592 compiler->SetRegExpTooBig();

	2593 return;

	2594 }

	2595

	2596 if (compiler->one_byte()) {

	2597 int dummy = 0;

	2598 TextEmitPass(compiler, NON_LATIN1_MATCH, false, trace, false, &dummy);

	2599 }

	2600

	2601 bool first_elt_done = false;

	2602 int bound_checked_to = trace->cp_offset() - 1;

	2603 bound_checked_to += trace->bound_checked_up_to();

	2604

	2605 // If a character is preloaded into the current character register then

	2606 // check that now.

	2607 if (trace->characters_preloaded() == 1) {

	2608 for (int pass = kFirstRealPass; pass <= kLastPass; pass++) {

	2609 if (!SkipPass(pass, compiler->ignore_case())) {

	2610 TextEmitPass(compiler,

	2611 static_cast<TextEmitPassType>(pass),

	2612 true,

	2613 trace,

	2614 false,

	2615 &bound_checked_to);

	2616 }

	2617 }

	2618 first_elt_done = true;

	2619 }

	2620

	2621 for (int pass = kFirstRealPass; pass <= kLastPass; pass++) {

	2622 if (!SkipPass(pass, compiler->ignore_case())) {

	2623 TextEmitPass(compiler,

	2624 static_cast<TextEmitPassType>(pass),

	2625 false,

	2626 trace,

	2627 first_elt_done,

	2628 &bound_checked_to);

	2629 }

	2630 }

	2631

	2632 Trace successor_trace(*trace);

	2633 successor_trace.set_at_start(false);

	2634 successor_trace.AdvanceCurrentPositionInTrace(Length(), compiler);

	2635 RecursionCheck rc(compiler);

	2636 on_success()->Emit(compiler, &successor_trace);

	2637 }

	2638

	2639

	2640 void Trace::InvalidateCurrentCharacter() {

	2641 characters_preloaded_ = 0;

	2642 }

	2643

	2644

	2645 void Trace::AdvanceCurrentPositionInTrace(int by, RegExpCompiler* compiler) {

	2646 DCHECK(by > 0);

	2647 // We don't have an instruction for shifting the current character register

	2648 // down or for using a shifted value for anything so lets just forget that

	2649 // we preloaded any characters into it.

	2650 characters_preloaded_ = 0;

	2651 // Adjust the offsets of the quick check performed information. This

	2652 // information is used to find out what we already determined about the

	2653 // characters by means of mask and compare.

	2654 quick_check_performed_.Advance(by, compiler->one_byte());

	2655 cp_offset_ += by;

	2656 if (cp_offset_ > RegExpMacroAssembler::kMaxCPOffset) {

	2657 compiler->SetRegExpTooBig();

	2658 cp_offset_ = 0;

	2659 }

	2660 bound_checked_up_to_ = Max(0, bound_checked_up_to_ - by);

	2661 }

	2662

	2663

	2664 void TextNode::MakeCaseIndependent(bool is_one_byte) {

	2665 int element_count = elms_->length();

	2666 for (int i = 0; i < element_count; i++) {

	2667 TextElement elm = elms_->at(i);

	2668 if (elm.text_type() == TextElement::CHAR_CLASS) {

	2669 RegExpCharacterClass* cc = elm.char_class();

	2670 // None of the standard character classes is different in the case

	2671 // independent case and it slows us down if we don't know that.

	2672 if (cc->is_standard(zone())) continue;

	2673 ZoneList<CharacterRange>* ranges = cc->ranges(zone());

	2674 int range_count = ranges->length();

	2675 for (int j = 0; j < range_count; j++) {

	2676 ranges->at(j).AddCaseEquivalents(ranges, is_one_byte, zone());

	2677 }

	2678 }

	2679 }

	2680 }

	2681

	2682

	2683 int TextNode::GreedyLoopTextLength() {

	2684 TextElement elm = elms_->at(elms_->length() - 1);

	2685 return elm.cp_offset() + elm.length();

	2686 }

	2687

	2688

	2689 RegExpNode* TextNode::GetSuccessorOfOmnivorousTextNode(

	2690 RegExpCompiler* compiler) {

	2691 if (elms_->length() != 1) return NULL;

	2692 TextElement elm = elms_->at(0);

	2693 if (elm.text_type() != TextElement::CHAR_CLASS) return NULL;

	2694 RegExpCharacterClass* node = elm.char_class();

	2695 ZoneList<CharacterRange>* ranges = node->ranges(zone());

	2696 if (!CharacterRange::IsCanonical(ranges)) {

	2697 CharacterRange::Canonicalize(ranges);

	2698 }

	2699 if (node->is_negated()) {

	2700 return ranges->length() == 0 ? on_success() : NULL;

	2701 }

	2702 if (ranges->length() != 1) return NULL;

	2703 uint32_t max_char;

	2704 if (compiler->one_byte()) {

	2705 max_char = String::kMaxOneByteCharCode;

	2706 } else {

	2707 max_char = String::kMaxUtf16CodeUnit;

	2708 }

	2709 return ranges->at(0).IsEverything(max_char) ? on_success() : NULL;

	2710 }

	2711

	2712

	2713 // Finds the fixed match length of a sequence of nodes that goes from

	2714 // this alternative and back to this choice node. If there are variable

	2715 // length nodes or other complications in the way then return a sentinel

	2716 // value indicating that a greedy loop cannot be constructed.

	2717 int ChoiceNode::GreedyLoopTextLengthForAlternative(

	2718 GuardedAlternative* alternative) {

	2719 int length = 0;

	2720 RegExpNode* node = alternative->node();

	2721 // Later we will generate code for all these text nodes using recursion

	2722 // so we have to limit the max number.

	2723 int recursion_depth = 0;

	2724 while (node != this) {

	2725 if (recursion_depth++ > RegExpCompiler::kMaxRecursion) {

	2726 return kNodeIsTooComplexForGreedyLoops;

	2727 }

	2728 int node_length = node->GreedyLoopTextLength();

	2729 if (node_length == kNodeIsTooComplexForGreedyLoops) {

	2730 return kNodeIsTooComplexForGreedyLoops;

	2731 }

	2732 length += node_length;

	2733 SeqRegExpNode* seq_node = static_cast<SeqRegExpNode*>(node);

	2734 node = seq_node->on_success();

	2735 }

	2736 return length;

	2737 }

	2738

	2739

	2740 void LoopChoiceNode::AddLoopAlternative(GuardedAlternative alt) {

	2741 DCHECK_EQ(loop_node_, NULL);

	2742 AddAlternative(alt);

	2743 loop_node_ = alt.node();

	2744 }

	2745

	2746

	2747 void LoopChoiceNode::AddContinueAlternative(GuardedAlternative alt) {

	2748 DCHECK_EQ(continue_node_, NULL);

	2749 AddAlternative(alt);

	2750 continue_node_ = alt.node();

	2751 }

	2752

	2753

	2754 void LoopChoiceNode::Emit(RegExpCompiler* compiler, Trace* trace) {

	2755 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	2756 if (trace->stop_node() == this) {

	2757 // Back edge of greedy optimized loop node graph.

	2758 int text_length =

	2759 GreedyLoopTextLengthForAlternative(&(alternatives_->at(0)));

	2760 DCHECK(text_length != kNodeIsTooComplexForGreedyLoops);

	2761 // Update the counter-based backtracking info on the stack. This is an

	2762 // optimization for greedy loops (see below).

	2763 DCHECK(trace->cp_offset() == text_length);

	2764 macro_assembler->AdvanceCurrentPosition(text_length);

	2765 macro_assembler->GoTo(trace->loop_label());

	2766 return;

	2767 }

	2768 DCHECK(trace->stop_node() == NULL);

	2769 if (!trace->is_trivial()) {

	2770 trace->Flush(compiler, this);

	2771 return;

	2772 }

	2773 ChoiceNode::Emit(compiler, trace);

	2774 }

	2775

	2776

	2777 int ChoiceNode::CalculatePreloadCharacters(RegExpCompiler* compiler,

	2778 int eats_at_least) {

	2779 int preload_characters = Min(4, eats_at_least);

	2780 if (compiler->macro_assembler()->CanReadUnaligned()) {

	2781 bool one_byte = compiler->one_byte();

	2782 if (one_byte) {

	2783 if (preload_characters > 4) preload_characters = 4;

	2784 // We can't preload 3 characters because there is no machine instruction

	2785 // to do that. We can't just load 4 because we could be reading

	2786 // beyond the end of the string, which could cause a memory fault.

	2787 if (preload_characters == 3) preload_characters = 2;

	2788 } else {

	2789 if (preload_characters > 2) preload_characters = 2;

	2790 }

	2791 } else {

	2792 if (preload_characters > 1) preload_characters = 1;

	2793 }

	2794 return preload_characters;

	2795 }

	2796

	2797

	2798 // This class is used when generating the alternatives in a choice node. It

	2799 // records the way the alternative is being code generated.

	2800 class AlternativeGeneration: public Malloced {

	2801 public:

	2802 AlternativeGeneration()

	2803 : possible_success(),

	2804 expects_preload(false),

	2805 after(),

	2806 quick_check_details() { }

	2807 Label possible_success;

	2808 bool expects_preload;

	2809 Label after;

	2810 QuickCheckDetails quick_check_details;

	2811 };

	2812

	2813

	2814 // Creates a list of AlternativeGenerations. If the list has a reasonable

	2815 // size then it is on the stack, otherwise the excess is on the heap.

	2816 class AlternativeGenerationList {

	2817 public:

	2818 AlternativeGenerationList(int count, Zone* zone)

	2819 : alt_gens_(count, zone) {

	2820 for (int i = 0; i < count && i < kAFew; i++) {

	2821 alt_gens_.Add(a_few_alt_gens_ + i, zone);

	2822 }

	2823 for (int i = kAFew; i < count; i++) {

	2824 alt_gens_.Add(new AlternativeGeneration(), zone);

	2825 }

	2826 }

	2827 ~AlternativeGenerationList() {

	2828 for (int i = kAFew; i < alt_gens_.length(); i++) {

	2829 delete alt_gens_[i];

	2830 alt_gens_[i] = NULL;

	2831 }

	2832 }

	2833

	2834 AlternativeGeneration* at(int i) {

	2835 return alt_gens_[i];

	2836 }

	2837

	2838 private:

	2839 static const int kAFew = 10;

	2840 ZoneList<AlternativeGeneration*> alt_gens_;

	2841 AlternativeGeneration a_few_alt_gens_[kAFew];

	2842 };

	2843

	2844

	2845 // The '2' variant is has inclusive from and exclusive to.

	2846 // This covers \s as defined in ECMA-262 5.1, 15.10.2.12,

	2847 // which include WhiteSpace (7.2) or LineTerminator (7.3) values.

	2848 static const int kSpaceRanges[] = { '\t', '\r' + 1, ' ', ' ' + 1,

	2849 0x00A0, 0x00A1, 0x1680, 0x1681, 0x180E, 0x180F, 0x2000, 0x200B,

	2850 0x2028, 0x202A, 0x202F, 0x2030, 0x205F, 0x2060, 0x3000, 0x3001,

	2851 0xFEFF, 0xFF00, 0x10000 };

	2852 static const int kSpaceRangeCount = arraysize(kSpaceRanges);

	2853

	2854 static const int kWordRanges[] = {

	2855 '0', '9' + 1, 'A', 'Z' + 1, '_', '_' + 1, 'a', 'z' + 1, 0x10000 };

	2856 static const int kWordRangeCount = arraysize(kWordRanges);

	2857 static const int kDigitRanges[] = { '0', '9' + 1, 0x10000 };

	2858 static const int kDigitRangeCount = arraysize(kDigitRanges);

	2859 static const int kSurrogateRanges[] = { 0xd800, 0xe000, 0x10000 };

	2860 static const int kSurrogateRangeCount = arraysize(kSurrogateRanges);

	2861 static const int kLineTerminatorRanges[] = { 0x000A, 0x000B, 0x000D, 0x000E,

	2862 0x2028, 0x202A, 0x10000 };

	2863 static const int kLineTerminatorRangeCount = arraysize(kLineTerminatorRanges);

	2864

	2865

	2866 void BoyerMoorePositionInfo::Set(int character) {

	2867 SetInterval(Interval(character, character));

	2868 }

	2869

	2870

	2871 void BoyerMoorePositionInfo::SetInterval(const Interval& interval) {

	2872 s_ = AddRange(s_, kSpaceRanges, kSpaceRangeCount, interval);

	2873 w_ = AddRange(w_, kWordRanges, kWordRangeCount, interval);

	2874 d_ = AddRange(d_, kDigitRanges, kDigitRangeCount, interval);

	2875 surrogate_ =

	2876 AddRange(surrogate_, kSurrogateRanges, kSurrogateRangeCount, interval);

	2877 if (interval.to() - interval.from() >= kMapSize - 1) {

	2878 if (map_count_ != kMapSize) {

	2879 map_count_ = kMapSize;

	2880 for (int i = 0; i < kMapSize; i++) map_->at(i) = true;

	2881 }

	2882 return;

	2883 }

	2884 for (int i = interval.from(); i <= interval.to(); i++) {

	2885 int mod_character = (i & kMask);

	2886 if (!map_->at(mod_character)) {

	2887 map_count_++;

	2888 map_->at(mod_character) = true;

	2889 }

	2890 if (map_count_ == kMapSize) return;

	2891 }

	2892 }

	2893

	2894

	2895 void BoyerMoorePositionInfo::SetAll() {

	2896 s_ = w_ = d_ = kLatticeUnknown;

	2897 if (map_count_ != kMapSize) {

	2898 map_count_ = kMapSize;

	2899 for (int i = 0; i < kMapSize; i++) map_->at(i) = true;

	2900 }

	2901 }

	2902

	2903

	2904 BoyerMooreLookahead::BoyerMooreLookahead(

	2905 int length, RegExpCompiler* compiler, Zone* zone)

	2906 : length_(length),

	2907 compiler_(compiler) {

	2908 if (compiler->one_byte()) {

	2909 max_char_ = String::kMaxOneByteCharCode;

	2910 } else {

	2911 max_char_ = String::kMaxUtf16CodeUnit;

	2912 }

	2913 bitmaps_ = new(zone) ZoneList<BoyerMoorePositionInfo*>(length, zone);

	2914 for (int i = 0; i < length; i++) {

	2915 bitmaps_->Add(new(zone) BoyerMoorePositionInfo(zone), zone);

	2916 }

	2917 }

	2918

	2919

	2920 // Find the longest range of lookahead that has the fewest number of different

	2921 // characters that can occur at a given position. Since we are optimizing two

	2922 // different parameters at once this is a tradeoff.

	2923 bool BoyerMooreLookahead::FindWorthwhileInterval(int* from, int* to) {

	2924 int biggest_points = 0;

	2925 // If more than 32 characters out of 128 can occur it is unlikely that we can

	2926 // be lucky enough to step forwards much of the time.

	2927 const int kMaxMax = 32;

	2928 for (int max_number_of_chars = 4;

	2929 max_number_of_chars < kMaxMax;

	2930 max_number_of_chars *= 2) {

	2931 biggest_points =

	2932 FindBestInterval(max_number_of_chars, biggest_points, from, to);

	2933 }

	2934 if (biggest_points == 0) return false;

	2935 return true;

	2936 }

	2937

	2938

	2939 // Find the highest-points range between 0 and length_ where the character

	2940 // information is not too vague. 'Too vague' means that there are more than

	2941 // max_number_of_chars that can occur at this position. Calculates the number

	2942 // of points as the product of width-of-the-range and

	2943 // probability-of-finding-one-of-the-characters, where the probability is

	2944 // calculated using the frequency distribution of the sample subject string.

	2945 int BoyerMooreLookahead::FindBestInterval(

	2946 int max_number_of_chars, int old_biggest_points, int* from, int* to) {

	2947 int biggest_points = old_biggest_points;

	2948 static const int kSize = RegExpMacroAssembler::kTableSize;

	2949 for (int i = 0; i < length_; ) {

	2950 while (i < length_ && Count(i) > max_number_of_chars) i++;

	2951 if (i == length_) break;

	2952 int remembered_from = i;

	2953 bool union_map[kSize];

	2954 for (int j = 0; j < kSize; j++) union_map[j] = false;

	2955 while (i < length_ && Count(i) <= max_number_of_chars) {

	2956 BoyerMoorePositionInfo* map = bitmaps_->at(i);

	2957 for (int j = 0; j < kSize; j++) union_map[j] \|= map->at(j);

	2958 i++;

	2959 }

	2960 int frequency = 0;

	2961 for (int j = 0; j < kSize; j++) {

	2962 if (union_map[j]) {

	2963 // Add 1 to the frequency to give a small per-character boost for

	2964 // the cases where our sampling is not good enough and many

	2965 // characters have a frequency of zero. This means the frequency

	2966 // can theoretically be up to 2*kSize though we treat it mostly as

	2967 // a fraction of kSize.

	2968 frequency += compiler_->frequency_collator()->Frequency(j) + 1;

	2969 }

	2970 }

	2971 // We use the probability of skipping times the distance we are skipping to

	2972 // judge the effectiveness of this. Actually we have a cut-off: By

	2973 // dividing by 2 we switch off the skipping if the probability of skipping

	2974 // is less than 50%. This is because the multibyte mask-and-compare

	2975 // skipping in quickcheck is more likely to do well on this case.

	2976 bool in_quickcheck_range =

	2977 ((i - remembered_from < 4) \|\|

	2978 (compiler_->one_byte() ? remembered_from <= 4 : remembered_from <= 2));

	2979 // Called 'probability' but it is only a rough estimate and can actually

	2980 // be outside the 0-kSize range.

	2981 int probability = (in_quickcheck_range ? kSize / 2 : kSize) - frequency;

	2982 int points = (i - remembered_from) * probability;

	2983 if (points > biggest_points) {

	2984 *from = remembered_from;

	2985 *to = i - 1;

	2986 biggest_points = points;

	2987 }

	2988 }

	2989 return biggest_points;

	2990 }

	2991

	2992

	2993 // Take all the characters that will not prevent a successful match if they

	2994 // occur in the subject string in the range between min_lookahead and

	2995 // max_lookahead (inclusive) measured from the current position. If the

	2996 // character at max_lookahead offset is not one of these characters, then we

	2997 // can safely skip forwards by the number of characters in the range.

	2998 int BoyerMooreLookahead::GetSkipTable(int min_lookahead,

	2999 int max_lookahead,

	3000 Handle<ByteArray> boolean_skip_table) {

	3001 const int kSize = RegExpMacroAssembler::kTableSize;

	3002

	3003 const int kSkipArrayEntry = 0;

	3004 const int kDontSkipArrayEntry = 1;

	3005

	3006 for (int i = 0; i < kSize; i++) {

	3007 boolean_skip_table->set(i, kSkipArrayEntry);

	3008 }

	3009 int skip = max_lookahead + 1 - min_lookahead;

	3010

	3011 for (int i = max_lookahead; i >= min_lookahead; i--) {

	3012 BoyerMoorePositionInfo* map = bitmaps_->at(i);

	3013 for (int j = 0; j < kSize; j++) {

	3014 if (map->at(j)) {

	3015 boolean_skip_table->set(j, kDontSkipArrayEntry);

	3016 }

	3017 }

	3018 }

	3019

	3020 return skip;

	3021 }

	3022

	3023

	3024 // See comment above on the implementation of GetSkipTable.

	3025 void BoyerMooreLookahead::EmitSkipInstructions(RegExpMacroAssembler* masm) {

	3026 const int kSize = RegExpMacroAssembler::kTableSize;

	3027

	3028 int min_lookahead = 0;

	3029 int max_lookahead = 0;

	3030

	3031 if (!FindWorthwhileInterval(&min_lookahead, &max_lookahead)) return;

	3032

	3033 bool found_single_character = false;

	3034 int single_character = 0;

	3035 for (int i = max_lookahead; i >= min_lookahead; i--) {

	3036 BoyerMoorePositionInfo* map = bitmaps_->at(i);

	3037 if (map->map_count() > 1 \|\|

	3038 (found_single_character && map->map_count() != 0)) {

	3039 found_single_character = false;

	3040 break;

	3041 }

	3042 for (int j = 0; j < kSize; j++) {

	3043 if (map->at(j)) {

	3044 found_single_character = true;

	3045 single_character = j;

	3046 break;

	3047 }

	3048 }

	3049 }

	3050

	3051 int lookahead_width = max_lookahead + 1 - min_lookahead;

	3052

	3053 if (found_single_character && lookahead_width == 1 && max_lookahead < 3) {

	3054 // The mask-compare can probably handle this better.

	3055 return;

	3056 }

	3057

	3058 if (found_single_character) {

	3059 Label cont, again;

	3060 masm->Bind(&again);

	3061 masm->LoadCurrentCharacter(max_lookahead, &cont, true);

	3062 if (max_char_ > kSize) {

	3063 masm->CheckCharacterAfterAnd(single_character,

	3064 RegExpMacroAssembler::kTableMask,

	3065 &cont);

	3066 } else {

	3067 masm->CheckCharacter(single_character, &cont);

	3068 }

	3069 masm->AdvanceCurrentPosition(lookahead_width);

	3070 masm->GoTo(&again);

	3071 masm->Bind(&cont);

	3072 return;

	3073 }

	3074

	3075 Factory* factory = masm->zone()->isolate()->factory();

	3076 Handle<ByteArray> boolean_skip_table = factory->NewByteArray(kSize, TENURED);

	3077 int skip_distance = GetSkipTable(

	3078 min_lookahead, max_lookahead, boolean_skip_table);

	3079 DCHECK(skip_distance != 0);

	3080

	3081 Label cont, again;

	3082 masm->Bind(&again);

	3083 masm->LoadCurrentCharacter(max_lookahead, &cont, true);

	3084 masm->CheckBitInTable(boolean_skip_table, &cont);

	3085 masm->AdvanceCurrentPosition(skip_distance);

	3086 masm->GoTo(&again);

	3087 masm->Bind(&cont);

	3088 }

	3089

	3090

	3091 /* Code generation for choice nodes.

	3092 *

	3093 * We generate quick checks that do a mask and compare to eliminate a

	3094 * choice. If the quick check succeeds then it jumps to the continuation to

	3095 * do slow checks and check subsequent nodes. If it fails (the common case)

	3096 * it falls through to the next choice.

	3097 *

	3098 * Here is the desired flow graph. Nodes directly below each other imply

	3099 * fallthrough. Alternatives 1 and 2 have quick checks. Alternative

	3100 * 3 doesn't have a quick check so we have to call the slow check.

	3101 * Nodes are marked Qn for quick checks and Sn for slow checks. The entire

	3102 * regexp continuation is generated directly after the Sn node, up to the

	3103 * next GoTo if we decide to reuse some already generated code. Some

	3104 * nodes expect preload_characters to be preloaded into the current

	3105 * character register. R nodes do this preloading. Vertices are marked

	3106 * F for failures and S for success (possible success in the case of quick

	3107 * nodes). L, V, < and > are used as arrow heads.

	3108 *

	3109 * ----------> R

	3110 * \|

	3111 * V

	3112 * Q1 -----> S1

	3113 * \| S /

	3114 * F\| /

	3115 * \| F/

	3116 * \| /

	3117 * \| R

	3118 * \| /

	3119 * V L

	3120 * Q2 -----> S2

	3121 * \| S /

	3122 * F\| /

	3123 * \| F/

	3124 * \| /

	3125 * \| R

	3126 * \| /

	3127 * V L

	3128 * S3

	3129 * \|

	3130 * F\|

	3131 * \|

	3132 * R

	3133 * \|

	3134 * backtrack V

	3135 * <----------Q4

	3136 * \ F \|

	3137 * \ \|S

	3138 * \ F V

	3139 * \-----S4

	3140 *

	3141 * For greedy loops we push the current position, then generate the code that

	3142 * eats the input specially in EmitGreedyLoop. The other choice (the

	3143 * continuation) is generated by the normal code in EmitChoices, and steps back

	3144 * in the input to the starting position when it fails to match. The loop code

	3145 * looks like this (U is the unwind code that steps back in the greedy loop).

	3146 *

	3147 * _____

	3148 * / \

	3149 * V \|

	3150 * ----------> S1 \|

	3151 * /\| \|

	3152 * / \|S \|

	3153 * F/ \_____/

	3154 * /

	3155 * \|<-----

	3156 * \| \

	3157 * V \|S

	3158 * Q2 ---> U----->backtrack

	3159 * \| F /

	3160 * S\| /

	3161 * V F /

	3162 * S2--/

	3163 */

	3164

	3165 GreedyLoopState::GreedyLoopState(bool not_at_start) {

	3166 counter_backtrack_trace_.set_backtrack(&label_);

	3167 if (not_at_start) counter_backtrack_trace_.set_at_start(false);

	3168 }

	3169

	3170

	3171 void ChoiceNode::AssertGuardsMentionRegisters(Trace* trace) {

	3172 #ifdef DEBUG

	3173 int choice_count = alternatives_->length();

	3174 for (int i = 0; i < choice_count - 1; i++) {

	3175 GuardedAlternative alternative = alternatives_->at(i);

	3176 ZoneList<Guard> guards = alternative.guards();

	3177 int guard_count = (guards == NULL) ? 0 : guards->length();

	3178 for (int j = 0; j < guard_count; j++) {

	3179 DCHECK(!trace->mentions_reg(guards->at(j)->reg()));

	3180 }

	3181 }

	3182 #endif

	3183 }

	3184

	3185

	3186 void ChoiceNode::SetUpPreLoad(RegExpCompiler* compiler,

	3187 Trace* current_trace,

	3188 PreloadState* state) {

	3189 if (state->eats_at_least_ == PreloadState::kEatsAtLeastNotYetInitialized) {

	3190 // Save some time by looking at most one machine word ahead.

	3191 state->eats_at_least_ =

	3192 EatsAtLeast(compiler->one_byte() ? 4 : 2, kRecursionBudget,

	3193 current_trace->at_start() == Trace::FALSE_VALUE);

	3194 }

	3195 state->preload_characters_ =

	3196 CalculatePreloadCharacters(compiler, state->eats_at_least_);

	3197

	3198 state->preload_is_current_ =

	3199 (current_trace->characters_preloaded() == state->preload_characters_);

	3200 state->preload_has_checked_bounds_ = state->preload_is_current_;

	3201 }

	3202

	3203

	3204 void ChoiceNode::Emit(RegExpCompiler* compiler, Trace* trace) {

	3205 int choice_count = alternatives_->length();

	3206

	3207 AssertGuardsMentionRegisters(trace);

	3208

	3209 LimitResult limit_result = LimitVersions(compiler, trace);

	3210 if (limit_result == DONE) return;

	3211 DCHECK(limit_result == CONTINUE);

	3212

	3213 // For loop nodes we already flushed (see LoopChoiceNode::Emit), but for

	3214 // other choice nodes we only flush if we are out of code size budget.

	3215 if (trace->flush_budget() == 0 && trace->actions() != NULL) {

	3216 trace->Flush(compiler, this);

	3217 return;

	3218 }

	3219

	3220 RecursionCheck rc(compiler);

	3221

	3222 PreloadState preload;

	3223 preload.init();

	3224 GreedyLoopState greedy_loop_state(not_at_start());

	3225

	3226 int text_length = GreedyLoopTextLengthForAlternative(&alternatives_->at(0));

	3227 AlternativeGenerationList alt_gens(choice_count, zone());

	3228

	3229 if (choice_count > 1 && text_length != kNodeIsTooComplexForGreedyLoops) {

	3230 trace = EmitGreedyLoop(compiler,

	3231 trace,

	3232 &alt_gens,

	3233 &preload,

	3234 &greedy_loop_state,

	3235 text_length);

	3236 } else {

	3237 // TODO(erikcorry): Delete this. We don't need this label, but it makes us

	3238 // match the traces produced pre-cleanup.

	3239 Label second_choice;

	3240 compiler->macro_assembler()->Bind(&second_choice);

	3241

	3242 preload.eats_at_least_ = EmitOptimizedUnanchoredSearch(compiler, trace);

	3243

	3244 EmitChoices(compiler,

	3245 &alt_gens,

	3246 0,

	3247 trace,

	3248 &preload);

	3249 }

	3250

	3251 // At this point we need to generate slow checks for the alternatives where

	3252 // the quick check was inlined. We can recognize these because the associated

	3253 // label was bound.

	3254 int new_flush_budget = trace->flush_budget() / choice_count;

	3255 for (int i = 0; i < choice_count; i++) {

	3256 AlternativeGeneration* alt_gen = alt_gens.at(i);

	3257 Trace new_trace(*trace);

	3258 // If there are actions to be flushed we have to limit how many times

	3259 // they are flushed. Take the budget of the parent trace and distribute

	3260 // it fairly amongst the children.

	3261 if (new_trace.actions() != NULL) {

	3262 new_trace.set_flush_budget(new_flush_budget);

	3263 }

	3264 bool next_expects_preload =

	3265 i == choice_count - 1 ? false : alt_gens.at(i + 1)->expects_preload;

	3266 EmitOutOfLineContinuation(compiler,

	3267 &new_trace,

	3268 alternatives_->at(i),

	3269 alt_gen,

	3270 preload.preload_characters_,

	3271 next_expects_preload);

	3272 }

	3273 }

	3274

	3275

	3276 Trace* ChoiceNode::EmitGreedyLoop(RegExpCompiler* compiler,

	3277 Trace* trace,

	3278 AlternativeGenerationList* alt_gens,

	3279 PreloadState* preload,

	3280 GreedyLoopState* greedy_loop_state,

	3281 int text_length) {

	3282 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	3283 // Here we have special handling for greedy loops containing only text nodes

	3284 // and other simple nodes. These are handled by pushing the current

	3285 // position on the stack and then incrementing the current position each

	3286 // time around the switch. On backtrack we decrement the current position

	3287 // and check it against the pushed value. This avoids pushing backtrack

	3288 // information for each iteration of the loop, which could take up a lot of

	3289 // space.

	3290 DCHECK(trace->stop_node() == NULL);

	3291 macro_assembler->PushCurrentPosition();

	3292 Label greedy_match_failed;

	3293 Trace greedy_match_trace;

	3294 if (not_at_start()) greedy_match_trace.set_at_start(false);

	3295 greedy_match_trace.set_backtrack(&greedy_match_failed);

	3296 Label loop_label;

	3297 macro_assembler->Bind(&loop_label);

	3298 greedy_match_trace.set_stop_node(this);

	3299 greedy_match_trace.set_loop_label(&loop_label);

	3300 alternatives_->at(0).node()->Emit(compiler, &greedy_match_trace);

	3301 macro_assembler->Bind(&greedy_match_failed);

	3302

	3303 Label second_choice; // For use in greedy matches.

	3304 macro_assembler->Bind(&second_choice);

	3305

	3306 Trace* new_trace = greedy_loop_state->counter_backtrack_trace();

	3307

	3308 EmitChoices(compiler,

	3309 alt_gens,

	3310 1,

	3311 new_trace,

	3312 preload);

	3313

	3314 macro_assembler->Bind(greedy_loop_state->label());

	3315 // If we have unwound to the bottom then backtrack.

	3316 macro_assembler->CheckGreedyLoop(trace->backtrack());

	3317 // Otherwise try the second priority at an earlier position.

	3318 macro_assembler->AdvanceCurrentPosition(-text_length);

	3319 macro_assembler->GoTo(&second_choice);

	3320 return new_trace;

	3321 }

	3322

	3323 int ChoiceNode::EmitOptimizedUnanchoredSearch(RegExpCompiler* compiler,

	3324 Trace* trace) {

	3325 int eats_at_least = PreloadState::kEatsAtLeastNotYetInitialized;

	3326 if (alternatives_->length() != 2) return eats_at_least;

	3327

	3328 GuardedAlternative alt1 = alternatives_->at(1);

	3329 if (alt1.guards() != NULL && alt1.guards()->length() != 0) {

	3330 return eats_at_least;

	3331 }

	3332 RegExpNode* eats_anything_node = alt1.node();

	3333 if (eats_anything_node->GetSuccessorOfOmnivorousTextNode(compiler) != this) {

	3334 return eats_at_least;

	3335 }

	3336

	3337 // Really we should be creating a new trace when we execute this function,

	3338 // but there is no need, because the code it generates cannot backtrack, and

	3339 // we always arrive here with a trivial trace (since it's the entry to a

	3340 // loop. That also implies that there are no preloaded characters, which is

	3341 // good, because it means we won't be violating any assumptions by

	3342 // overwriting those characters with new load instructions.

	3343 DCHECK(trace->is_trivial());

	3344

	3345 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	3346 // At this point we know that we are at a non-greedy loop that will eat

	3347 // any character one at a time. Any non-anchored regexp has such a

	3348 // loop prepended to it in order to find where it starts. We look for

	3349 // a pattern of the form ...abc... where we can look 6 characters ahead

	3350 // and step forwards 3 if the character is not one of abc. Abc need

	3351 // not be atoms, they can be any reasonably limited character class or

	3352 // small alternation.

	3353 BoyerMooreLookahead* bm = bm_info(false);

	3354 if (bm == NULL) {

	3355 eats_at_least = Min(kMaxLookaheadForBoyerMoore,

	3356 EatsAtLeast(kMaxLookaheadForBoyerMoore,

	3357 kRecursionBudget,

	3358 false));

	3359 if (eats_at_least >= 1) {

	3360 bm = new(zone()) BoyerMooreLookahead(eats_at_least,

	3361 compiler,

	3362 zone());

	3363 GuardedAlternative alt0 = alternatives_->at(0);

	3364 alt0.node()->FillInBMInfo(0, kRecursionBudget, bm, false);

	3365 }

	3366 }

	3367 if (bm != NULL) {

	3368 bm->EmitSkipInstructions(macro_assembler);

	3369 }

	3370 return eats_at_least;

	3371 }

	3372

	3373

	3374 void ChoiceNode::EmitChoices(RegExpCompiler* compiler,

	3375 AlternativeGenerationList* alt_gens,

	3376 int first_choice,

	3377 Trace* trace,

	3378 PreloadState* preload) {

	3379 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	3380 SetUpPreLoad(compiler, trace, preload);

	3381

	3382 // For now we just call all choices one after the other. The idea ultimately

	3383 // is to use the Dispatch table to try only the relevant ones.

	3384 int choice_count = alternatives_->length();

	3385

	3386 int new_flush_budget = trace->flush_budget() / choice_count;

	3387

	3388 for (int i = first_choice; i < choice_count; i++) {

	3389 bool is_last = i == choice_count - 1;

	3390 bool fall_through_on_failure = !is_last;

	3391 GuardedAlternative alternative = alternatives_->at(i);

	3392 AlternativeGeneration* alt_gen = alt_gens->at(i);

	3393 alt_gen->quick_check_details.set_characters(preload->preload_characters_);

	3394 ZoneList<Guard> guards = alternative.guards();

	3395 int guard_count = (guards == NULL) ? 0 : guards->length();

	3396 Trace new_trace(*trace);

	3397 new_trace.set_characters_preloaded(preload->preload_is_current_ ?

	3398 preload->preload_characters_ :

	3399 0);

	3400 if (preload->preload_has_checked_bounds_) {

	3401 new_trace.set_bound_checked_up_to(preload->preload_characters_);

	3402 }

	3403 new_trace.quick_check_performed()->Clear();

	3404 if (not_at_start_) new_trace.set_at_start(Trace::FALSE_VALUE);

	3405 if (!is_last) {

	3406 new_trace.set_backtrack(&alt_gen->after);

	3407 }

	3408 alt_gen->expects_preload = preload->preload_is_current_;

	3409 bool generate_full_check_inline = false;

	3410 if (FLAG_regexp_optimization &&

	3411 try_to_emit_quick_check_for_alternative(i == 0) &&

	3412 alternative.node()->EmitQuickCheck(compiler,

	3413 trace,

	3414 &new_trace,

	3415 preload->preload_has_checked_bounds_,

	3416 &alt_gen->possible_success,

	3417 &alt_gen->quick_check_details,

	3418 fall_through_on_failure)) {

	3419 // Quick check was generated for this choice.

	3420 preload->preload_is_current_ = true;

	3421 preload->preload_has_checked_bounds_ = true;

	3422 // If we generated the quick check to fall through on possible success,

	3423 // we now need to generate the full check inline.

	3424 if (!fall_through_on_failure) {

	3425 macro_assembler->Bind(&alt_gen->possible_success);

	3426 new_trace.set_quick_check_performed(&alt_gen->quick_check_details);

	3427 new_trace.set_characters_preloaded(preload->preload_characters_);

	3428 new_trace.set_bound_checked_up_to(preload->preload_characters_);

	3429 generate_full_check_inline = true;

	3430 }

	3431 } else if (alt_gen->quick_check_details.cannot_match()) {

	3432 if (!fall_through_on_failure) {

	3433 macro_assembler->GoTo(trace->backtrack());

	3434 }

	3435 continue;

	3436 } else {

	3437 // No quick check was generated. Put the full code here.

	3438 // If this is not the first choice then there could be slow checks from

	3439 // previous cases that go here when they fail. There's no reason to

	3440 // insist that they preload characters since the slow check we are about

	3441 // to generate probably can't use it.

	3442 if (i != first_choice) {

	3443 alt_gen->expects_preload = false;

	3444 new_trace.InvalidateCurrentCharacter();

	3445 }

	3446 generate_full_check_inline = true;

	3447 }

	3448 if (generate_full_check_inline) {

	3449 if (new_trace.actions() != NULL) {

	3450 new_trace.set_flush_budget(new_flush_budget);

	3451 }

	3452 for (int j = 0; j < guard_count; j++) {

	3453 GenerateGuard(macro_assembler, guards->at(j), &new_trace);

	3454 }

	3455 alternative.node()->Emit(compiler, &new_trace);

	3456 preload->preload_is_current_ = false;

	3457 }

	3458 macro_assembler->Bind(&alt_gen->after);

	3459 }

	3460 }

	3461

	3462

	3463 void ChoiceNode::EmitOutOfLineContinuation(RegExpCompiler* compiler,

	3464 Trace* trace,

	3465 GuardedAlternative alternative,

	3466 AlternativeGeneration* alt_gen,

	3467 int preload_characters,

	3468 bool next_expects_preload) {

	3469 if (!alt_gen->possible_success.is_linked()) return;

	3470

	3471 RegExpMacroAssembler* macro_assembler = compiler->macro_assembler();

	3472 macro_assembler->Bind(&alt_gen->possible_success);

	3473 Trace out_of_line_trace(*trace);

	3474 out_of_line_trace.set_characters_preloaded(preload_characters);

	3475 out_of_line_trace.set_quick_check_performed(&alt_gen->quick_check_details);

	3476 if (not_at_start_) out_of_line_trace.set_at_start(Trace::FALSE_VALUE);

	3477 ZoneList<Guard> guards = alternative.guards();

	3478 int guard_count = (guards == NULL) ? 0 : guards->length();

	3479 if (next_expects_preload) {

	3480 Label reload_current_char;

	3481 out_of_line_trace.set_backtrack(&reload_current_char);

	3482 for (int j = 0; j < guard_count; j++) {

	3483 GenerateGuard(macro_assembler, guards->at(j), &out_of_line_trace);

	3484 }

	3485 alternative.node()->Emit(compiler, &out_of_line_trace);

	3486 macro_assembler->Bind(&reload_current_char);

	3487 // Reload the current character, since the next quick check expects that.

	3488 // We don't need to check bounds here because we only get into this

	3489 // code through a quick check which already did the checked load.

	3490 macro_assembler->LoadCurrentCharacter(trace->cp_offset(),

	3491 NULL,

	3492 false,

	3493 preload_characters);

	3494 macro_assembler->GoTo(&(alt_gen->after));

	3495 } else {

	3496 out_of_line_trace.set_backtrack(&(alt_gen->after));

	3497 for (int j = 0; j < guard_count; j++) {

	3498 GenerateGuard(macro_assembler, guards->at(j), &out_of_line_trace);

	3499 }

	3500 alternative.node()->Emit(compiler, &out_of_line_trace);

	3501 }

	3502 }

	3503

	3504

	3505 void ActionNode::Emit(RegExpCompiler* compiler, Trace* trace) {

	3506 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	3507 LimitResult limit_result = LimitVersions(compiler, trace);

	3508 if (limit_result == DONE) return;

	3509 DCHECK(limit_result == CONTINUE);

	3510

	3511 RecursionCheck rc(compiler);

	3512

	3513 switch (action_type_) {

	3514 case STORE_POSITION: {

	3515 Trace::DeferredCapture

	3516 new_capture(data_.u_position_register.reg,

	3517 data_.u_position_register.is_capture,

	3518 trace);

	3519 Trace new_trace = *trace;

	3520 new_trace.add_action(&new_capture);

	3521 on_success()->Emit(compiler, &new_trace);

	3522 break;

	3523 }

	3524 case INCREMENT_REGISTER: {

	3525 Trace::DeferredIncrementRegister

	3526 new_increment(data_.u_increment_register.reg);

	3527 Trace new_trace = *trace;

	3528 new_trace.add_action(&new_increment);

	3529 on_success()->Emit(compiler, &new_trace);

	3530 break;

	3531 }

	3532 case SET_REGISTER: {

	3533 Trace::DeferredSetRegister

	3534 new_set(data_.u_store_register.reg, data_.u_store_register.value);

	3535 Trace new_trace = *trace;

	3536 new_trace.add_action(&new_set);

	3537 on_success()->Emit(compiler, &new_trace);

	3538 break;

	3539 }

	3540 case CLEAR_CAPTURES: {

	3541 Trace::DeferredClearCaptures

	3542 new_capture(Interval(data_.u_clear_captures.range_from,

	3543 data_.u_clear_captures.range_to));

	3544 Trace new_trace = *trace;

	3545 new_trace.add_action(&new_capture);

	3546 on_success()->Emit(compiler, &new_trace);

	3547 break;

	3548 }

	3549 case BEGIN_SUBMATCH:

	3550 if (!trace->is_trivial()) {

	3551 trace->Flush(compiler, this);

	3552 } else {

	3553 assembler->WriteCurrentPositionToRegister(

	3554 data_.u_submatch.current_position_register, 0);

	3555 assembler->WriteStackPointerToRegister(

	3556 data_.u_submatch.stack_pointer_register);

	3557 on_success()->Emit(compiler, trace);

	3558 }

	3559 break;

	3560 case EMPTY_MATCH_CHECK: {

	3561 int start_pos_reg = data_.u_empty_match_check.start_register;

	3562 int stored_pos = 0;

	3563 int rep_reg = data_.u_empty_match_check.repetition_register;

	3564 bool has_minimum = (rep_reg != RegExpCompiler::kNoRegister);

	3565 bool know_dist = trace->GetStoredPosition(start_pos_reg, &stored_pos);

	3566 if (know_dist && !has_minimum && stored_pos == trace->cp_offset()) {

	3567 // If we know we haven't advanced and there is no minimum we

	3568 // can just backtrack immediately.

	3569 assembler->GoTo(trace->backtrack());

	3570 } else if (know_dist && stored_pos < trace->cp_offset()) {

	3571 // If we know we've advanced we can generate the continuation

	3572 // immediately.

	3573 on_success()->Emit(compiler, trace);

	3574 } else if (!trace->is_trivial()) {

	3575 trace->Flush(compiler, this);

	3576 } else {

	3577 Label skip_empty_check;

	3578 // If we have a minimum number of repetitions we check the current

	3579 // number first and skip the empty check if it's not enough.

	3580 if (has_minimum) {

	3581 int limit = data_.u_empty_match_check.repetition_limit;

	3582 assembler->IfRegisterLT(rep_reg, limit, &skip_empty_check);

	3583 }

	3584 // If the match is empty we bail out, otherwise we fall through

	3585 // to the on-success continuation.

	3586 assembler->IfRegisterEqPos(data_.u_empty_match_check.start_register,

	3587 trace->backtrack());

	3588 assembler->Bind(&skip_empty_check);

	3589 on_success()->Emit(compiler, trace);

	3590 }

	3591 break;

	3592 }

	3593 case POSITIVE_SUBMATCH_SUCCESS: {

	3594 if (!trace->is_trivial()) {

	3595 trace->Flush(compiler, this);

	3596 return;

	3597 }

	3598 assembler->ReadCurrentPositionFromRegister(

	3599 data_.u_submatch.current_position_register);

	3600 assembler->ReadStackPointerFromRegister(

	3601 data_.u_submatch.stack_pointer_register);

	3602 int clear_register_count = data_.u_submatch.clear_register_count;

	3603 if (clear_register_count == 0) {

	3604 on_success()->Emit(compiler, trace);

	3605 return;

	3606 }

	3607 int clear_registers_from = data_.u_submatch.clear_register_from;

	3608 Label clear_registers_backtrack;

	3609 Trace new_trace = *trace;

	3610 new_trace.set_backtrack(&clear_registers_backtrack);

	3611 on_success()->Emit(compiler, &new_trace);

	3612

	3613 assembler->Bind(&clear_registers_backtrack);

	3614 int clear_registers_to = clear_registers_from + clear_register_count - 1;

	3615 assembler->ClearRegisters(clear_registers_from, clear_registers_to);

	3616

	3617 DCHECK(trace->backtrack() == NULL);

	3618 assembler->Backtrack();

	3619 return;

	3620 }

	3621 default:

	3622 UNREACHABLE();

	3623 }

	3624 }

	3625

	3626

	3627 void BackReferenceNode::Emit(RegExpCompiler* compiler, Trace* trace) {

	3628 RegExpMacroAssembler* assembler = compiler->macro_assembler();

	3629 if (!trace->is_trivial()) {

	3630 trace->Flush(compiler, this);

	3631 return;

	3632 }

	3633

	3634 LimitResult limit_result = LimitVersions(compiler, trace);

	3635 if (limit_result == DONE) return;

	3636 DCHECK(limit_result == CONTINUE);

	3637

	3638 RecursionCheck rc(compiler);

	3639

	3640 DCHECK_EQ(start_reg_ + 1, end_reg_);

	3641 if (compiler->ignore_case()) {

	3642 assembler->CheckNotBackReferenceIgnoreCase(start_reg_,

	3643 trace->backtrack());

	3644 } else {

	3645 assembler->CheckNotBackReference(start_reg_, trace->backtrack());

	3646 }

	3647 on_success()->Emit(compiler, trace);

	3648 }

	3649

	3650

	3651 // -------------------------------------------------------------------

	3652 // Dot/dotty output

	3653

	3654

	3655 #ifdef DEBUG

	3656

	3657

	3658 class DotPrinter: public NodeVisitor {

	3659 public:

	3660 DotPrinter(OStream& os, bool ignore_case) // NOLINT

	3661 : os_(os),

	3662 ignore_case_(ignore_case) {}

	3663 void PrintNode(const char* label, RegExpNode* node);

	3664 void Visit(RegExpNode* node);

	3665 void PrintAttributes(RegExpNode* from);

	3666 void PrintOnFailure(RegExpNode* from, RegExpNode* to);

	3667 #define DECLARE_VISIT(Type) \

	3668 virtual void Visit##Type(Type##Node* that);

	3669 FOR_EACH_NODE_TYPE(DECLARE_VISIT)

	3670 #undef DECLARE_VISIT

	3671 private:

	3672 OStream& os_;

	3673 bool ignore_case_;

	3674 };

	3675

	3676

	3677 void DotPrinter::PrintNode(const char* label, RegExpNode* node) {

	3678 os_ << "digraph G {\n graph [label=\"";

	3679 for (int i = 0; label[i]; i++) {

	3680 switch (label[i]) {

	3681 case '\\':

	3682 os_ << "\\\\";

	3683 break;

	3684 case '"':

	3685 os_ << "\"";

	3686 break;

	3687 default:

	3688 os_ << label[i];

	3689 break;

	3690 }

	3691 }

	3692 os_ << "\"];\n";

	3693 Visit(node);

	3694 os_ << "}" << endl;

	3695 }

	3696

	3697

	3698 void DotPrinter::Visit(RegExpNode* node) {

	3699 if (node->info()->visited) return;

	3700 node->info()->visited = true;

	3701 node->Accept(this);

	3702 }

	3703

	3704

	3705 void DotPrinter::PrintOnFailure(RegExpNode* from, RegExpNode* on_failure) {

	3706 os_ << " n" << from << " -> n" << on_failure << " [style=dotted];\n";

	3707 Visit(on_failure);

	3708 }

	3709

	3710

	3711 class TableEntryBodyPrinter {

	3712 public:

	3713 TableEntryBodyPrinter(OStream& os, ChoiceNode* choice) // NOLINT

	3714 : os_(os),

	3715 choice_(choice) {}

	3716 void Call(uc16 from, DispatchTable::Entry entry) {

	3717 OutSet* out_set = entry.out_set();

	3718 for (unsigned i = 0; i < OutSet::kFirstLimit; i++) {

	3719 if (out_set->Get(i)) {

	3720 os_ << " n" << choice() << ":s" << from << "o" << i << " -> n"

	3721 << choice()->alternatives()->at(i).node() << ";\n";

	3722 }

	3723 }

	3724 }

	3725 private:

	3726 ChoiceNode* choice() { return choice_; }

	3727 OStream& os_;

	3728 ChoiceNode* choice_;

	3729 };

	3730

	3731

	3732 class TableEntryHeaderPrinter {

	3733 public:

	3734 explicit TableEntryHeaderPrinter(OStream& os) // NOLINT

	3735 : first_(true),

	3736 os_(os) {}

	3737 void Call(uc16 from, DispatchTable::Entry entry) {

	3738 if (first_) {

	3739 first_ = false;

	3740 } else {

	3741 os_ << "\|";

	3742 }

	3743 os_ << "{\\" << AsUC16(from) << "-\\" << AsUC16(entry.to()) << "\|{";

	3744 OutSet* out_set = entry.out_set();

	3745 int priority = 0;

	3746 for (unsigned i = 0; i < OutSet::kFirstLimit; i++) {

	3747 if (out_set->Get(i)) {

	3748 if (priority > 0) os_ << "\|";

	3749 os_ << "<s" << from << "o" << i << "> " << priority;

	3750 priority++;

	3751 }

	3752 }

	3753 os_ << "}}";

	3754 }

	3755

	3756 private:

	3757 bool first_;

	3758 OStream& os_;

	3759 };

	3760

	3761

	3762 class AttributePrinter {

	3763 public:

	3764 explicit AttributePrinter(OStream& os) // NOLINT

	3765 : os_(os),

	3766 first_(true) {}

	3767 void PrintSeparator() {

	3768 if (first_) {

	3769 first_ = false;

	3770 } else {

	3771 os_ << "\|";

	3772 }

	3773 }

	3774 void PrintBit(const char* name, bool value) {

	3775 if (!value) return;

	3776 PrintSeparator();

	3777 os_ << "{" << name << "}";

	3778 }

	3779 void PrintPositive(const char* name, int value) {

	3780 if (value < 0) return;

	3781 PrintSeparator();

	3782 os_ << "{" << name << "\|" << value << "}";

	3783 }

	3784

	3785 private:

	3786 OStream& os_;

	3787 bool first_;

	3788 };

	3789

	3790

	3791 void DotPrinter::PrintAttributes(RegExpNode* that) {

	3792 os_ << " a" << that << " [shape=Mrecord, color=grey, fontcolor=grey, "

	3793 << "margin=0.1, fontsize=10, label=\"{";

	3794 AttributePrinter printer(os_);

	3795 NodeInfo* info = that->info();

	3796 printer.PrintBit("NI", info->follows_newline_interest);

	3797 printer.PrintBit("WI", info->follows_word_interest);

	3798 printer.PrintBit("SI", info->follows_start_interest);

	3799 Label* label = that->label();

	3800 if (label->is_bound())

	3801 printer.PrintPositive("@", label->pos());

	3802 os_ << "}\"];\n"

	3803 << " a" << that << " -> n" << that

	3804 << " [style=dashed, color=grey, arrowhead=none];\n";

	3805 }

	3806

	3807

	3808 static const bool kPrintDispatchTable = false;

	3809 void DotPrinter::VisitChoice(ChoiceNode* that) {

	3810 if (kPrintDispatchTable) {

	3811 os_ << " n" << that << " [shape=Mrecord, label=\"";

	3812 TableEntryHeaderPrinter header_printer(os_);

	3813 that->GetTable(ignore_case_)->ForEach(&header_printer);

	3814 os_ << "\"]\n";

	3815 PrintAttributes(that);

	3816 TableEntryBodyPrinter body_printer(os_, that);

	3817 that->GetTable(ignore_case_)->ForEach(&body_printer);

	3818 } else {

	3819 os_ << " n" << that << " [shape=Mrecord, label=\"?\"];\n";

	3820 for (int i = 0; i < that->alternatives()->length(); i++) {

	3821 GuardedAlternative alt = that->alternatives()->at(i);

	3822 os_ << " n" << that << " -> n" << alt.node();

	3823 }

	3824 }

	3825 for (int i = 0; i < that->alternatives()->length(); i++) {

	3826 GuardedAlternative alt = that->alternatives()->at(i);

	3827 alt.node()->Accept(this);

	3828 }

	3829 }

	3830

	3831

	3832 void DotPrinter::VisitText(TextNode* that) {

	3833 Zone* zone = that->zone();

	3834 os_ << " n" << that << " [label=\"";

	3835 for (int i = 0; i < that->elements()->length(); i++) {

	3836 if (i > 0) os_ << " ";

	3837 TextElement elm = that->elements()->at(i);

	3838 switch (elm.text_type()) {

	3839 case TextElement::ATOM: {

	3840 Vector<const uc16> data = elm.atom()->data();

	3841 for (int i = 0; i < data.length(); i++) {

	3842 os_ << static_cast<char>(data[i]);

	3843 }

	3844 break;

	3845 }

	3846 case TextElement::CHAR_CLASS: {

	3847 RegExpCharacterClass* node = elm.char_class();

	3848 os_ << "[";

	3849 if (node->is_negated()) os_ << "^";

	3850 for (int j = 0; j < node->ranges(zone)->length(); j++) {

	3851 CharacterRange range = node->ranges(zone)->at(j);

	3852 os_ << AsUC16(range.from()) << "-" << AsUC16(range.to());

	3853 }

	3854 os_ << "]";

	3855 break;

	3856 }

	3857 default:

	3858 UNREACHABLE();

	3859 }

	3860 }

	3861 os_ << "\", shape=box, peripheries=2];\n";

	3862 PrintAttributes(that);

	3863 os_ << " n" << that << " -> n" << that->on_success() << ";\n";

	3864 Visit(that->on_success());

	3865 }

	3866

	3867

	3868 void DotPrinter::VisitBackReference(BackReferenceNode* that) {

	3869 os_ << " n" << that << " [label=\"$" << that->start_register() << "..$"

	3870 << that->end_register() << "\", shape=doubleoctagon];\n";

	3871 PrintAttributes(that);

	3872 os_ << " n" << that << " -> n" << that->on_success() << ";\n";

	3873 Visit(that->on_success());

	3874 }

	3875

	3876

	3877 void DotPrinter::VisitEnd(EndNode* that) {

	3878 os_ << " n" << that << " [style=bold, shape=point];\n";

	3879 PrintAttributes(that);

	3880 }

	3881

	3882

	3883 void DotPrinter::VisitAssertion(AssertionNode* that) {

	3884 os_ << " n" << that << " [";

	3885 switch (that->assertion_type()) {

	3886 case AssertionNode::AT_END:

	3887 os_ << "label=\"$\", shape=septagon";

	3888 break;

	3889 case AssertionNode::AT_START:

	3890 os_ << "label=\"^\", shape=septagon";

	3891 break;

	3892 case AssertionNode::AT_BOUNDARY:

	3893 os_ << "label=\"\\b\", shape=septagon";

	3894 break;

	3895 case AssertionNode::AT_NON_BOUNDARY:

	3896 os_ << "label=\"\\B\", shape=septagon";

	3897 break;

	3898 case AssertionNode::AFTER_NEWLINE:

	3899 os_ << "label=\"(?<=\\n)\", shape=septagon";

	3900 break;

	3901 }

	3902 os_ << "];\n";

	3903 PrintAttributes(that);

	3904 RegExpNode* successor = that->on_success();

	3905 os_ << " n" << that << " -> n" << successor << ";\n";

	3906 Visit(successor);

	3907 }

	3908

	3909

	3910 void DotPrinter::VisitAction(ActionNode* that) {

	3911 os_ << " n" << that << " [";

	3912 switch (that->action_type_) {

	3913 case ActionNode::SET_REGISTER:

	3914 os_ << "label=\"$" << that->data_.u_store_register.reg

	3915 << ":=" << that->data_.u_store_register.value << "\", shape=octagon";

	3916 break;

	3917 case ActionNode::INCREMENT_REGISTER:

	3918 os_ << "label=\"$" << that->data_.u_increment_register.reg

	3919 << "++\", shape=octagon";

	3920 break;

	3921 case ActionNode::STORE_POSITION:

	3922 os_ << "label=\"$" << that->data_.u_position_register.reg

	3923 << ":=$pos\", shape=octagon";

	3924 break;

	3925 case ActionNode::BEGIN_SUBMATCH:

	3926 os_ << "label=\"$" << that->data_.u_submatch.current_position_register

	3927 << ":=$pos,begin\", shape=septagon";

	3928 break;

	3929 case ActionNode::POSITIVE_SUBMATCH_SUCCESS:

	3930 os_ << "label=\"escape\", shape=septagon";

	3931 break;

	3932 case ActionNode::EMPTY_MATCH_CHECK:

	3933 os_ << "label=\"$" << that->data_.u_empty_match_check.start_register

	3934 << "=$pos?,$" << that->data_.u_empty_match_check.repetition_register

	3935 << "<" << that->data_.u_empty_match_check.repetition_limit

	3936 << "?\", shape=septagon";

	3937 break;

	3938 case ActionNode::CLEAR_CAPTURES: {

	3939 os_ << "label=\"clear $" << that->data_.u_clear_captures.range_from

	3940 << " to $" << that->data_.u_clear_captures.range_to

	3941 << "\", shape=septagon";

	3942 break;

	3943 }

	3944 }

	3945 os_ << "];\n";

	3946 PrintAttributes(that);

	3947 RegExpNode* successor = that->on_success();

	3948 os_ << " n" << that << " -> n" << successor << ";\n";

	3949 Visit(successor);

	3950 }

	3951

	3952

	3953 class DispatchTableDumper {

	3954 public:

	3955 explicit DispatchTableDumper(OStream& os) : os_(os) {}

	3956 void Call(uc16 key, DispatchTable::Entry entry);

	3957 private:

	3958 OStream& os_;

	3959 };

	3960

	3961

	3962 void DispatchTableDumper::Call(uc16 key, DispatchTable::Entry entry) {

	3963 os_ << "[" << AsUC16(key) << "-" << AsUC16(entry.to()) << "]: {";

	3964 OutSet* set = entry.out_set();

	3965 bool first = true;

	3966 for (unsigned i = 0; i < OutSet::kFirstLimit; i++) {

	3967 if (set->Get(i)) {

	3968 if (first) {

	3969 first = false;

	3970 } else {

	3971 os_ << ", ";

	3972 }

	3973 os_ << i;

	3974 }

	3975 }

	3976 os_ << "}\n";

	3977 }

	3978

	3979

	3980 void DispatchTable::Dump() {

	3981 OFStream os(stderr);

	3982 DispatchTableDumper dumper(os);

	3983 tree()->ForEach(&dumper);

	3984 }

	3985

	3986

	3987 void RegExpEngine::DotPrint(const char* label,

	3988 RegExpNode* node,

	3989 bool ignore_case) {

	3990 OFStream os(stdout);

	3991 DotPrinter printer(os, ignore_case);

	3992 printer.PrintNode(label, node);

	3993 }

	3994

	3995

	3996 #endif // DEBUG

	3997

	3998

	3999 // -------------------------------------------------------------------

	4000 // Tree to graph conversion

	4001

	4002 RegExpNode* RegExpAtom::ToNode(RegExpCompiler* compiler,

	4003 RegExpNode* on_success) {

	4004 ZoneList<TextElement>* elms =

	4005 new(compiler->zone()) ZoneList<TextElement>(1, compiler->zone());

	4006 elms->Add(TextElement::Atom(this), compiler->zone());

	4007 return new(compiler->zone()) TextNode(elms, on_success);

	4008 }

	4009

	4010

	4011 RegExpNode* RegExpText::ToNode(RegExpCompiler* compiler,

	4012 RegExpNode* on_success) {

	4013 return new(compiler->zone()) TextNode(elements(), on_success);

	4014 }

	4015

	4016

	4017 static bool CompareInverseRanges(ZoneList<CharacterRange>* ranges,

	4018 const int* special_class,

	4019 int length) {

	4020 length--; // Remove final 0x10000.

	4021 DCHECK(special_class[length] == 0x10000);

	4022 DCHECK(ranges->length() != 0);

	4023 DCHECK(length != 0);

	4024 DCHECK(special_class[0] != 0);

	4025 if (ranges->length() != (length >> 1) + 1) {

	4026 return false;

	4027 }

	4028 CharacterRange range = ranges->at(0);

	4029 if (range.from() != 0) {

	4030 return false;

	4031 }

	4032 for (int i = 0; i < length; i += 2) {

	4033 if (special_class[i] != (range.to() + 1)) {

	4034 return false;

	4035 }

	4036 range = ranges->at((i >> 1) + 1);

	4037 if (special_class[i+1] != range.from()) {

	4038 return false;

	4039 }

	4040 }

	4041 if (range.to() != 0xffff) {

	4042 return false;

	4043 }

	4044 return true;

	4045 }

	4046

	4047

	4048 static bool CompareRanges(ZoneList<CharacterRange>* ranges,

	4049 const int* special_class,

	4050 int length) {

	4051 length--; // Remove final 0x10000.

	4052 DCHECK(special_class[length] == 0x10000);

	4053 if (ranges->length() * 2 != length) {

	4054 return false;

	4055 }

	4056 for (int i = 0; i < length; i += 2) {

	4057 CharacterRange range = ranges->at(i >> 1);

	4058 if (range.from() != special_class[i] \|\|

	4059 range.to() != special_class[i + 1] - 1) {

	4060 return false;

	4061 }

	4062 }

	4063 return true;

	4064 }

	4065

	4066

	4067 bool RegExpCharacterClass::is_standard(Zone* zone) {

	4068 // TODO(lrn): Remove need for this function, by not throwing away information

	4069 // along the way.

	4070 if (is_negated_) {

	4071 return false;

	4072 }

	4073 if (set_.is_standard()) {

	4074 return true;

	4075 }

	4076 if (CompareRanges(set_.ranges(zone), kSpaceRanges, kSpaceRangeCount)) {

	4077 set_.set_standard_set_type('s');

	4078 return true;

	4079 }

	4080 if (CompareInverseRanges(set_.ranges(zone), kSpaceRanges, kSpaceRangeCount)) {

	4081 set_.set_standard_set_type('S');

	4082 return true;

	4083 }

	4084 if (CompareInverseRanges(set_.ranges(zone),

	4085 kLineTerminatorRanges,

	4086 kLineTerminatorRangeCount)) {

	4087 set_.set_standard_set_type('.');

	4088 return true;

	4089 }

	4090 if (CompareRanges(set_.ranges(zone),

	4091 kLineTerminatorRanges,

	4092 kLineTerminatorRangeCount)) {

	4093 set_.set_standard_set_type('n');

	4094 return true;

	4095 }

	4096 if (CompareRanges(set_.ranges(zone), kWordRanges, kWordRangeCount)) {

	4097 set_.set_standard_set_type('w');

	4098 return true;

	4099 }

	4100 if (CompareInverseRanges(set_.ranges(zone), kWordRanges, kWordRangeCount)) {

	4101 set_.set_standard_set_type('W');

	4102 return true;

	4103 }

	4104 return false;

	4105 }

	4106

	4107

	4108 RegExpNode* RegExpCharacterClass::ToNode(RegExpCompiler* compiler,

	4109 RegExpNode* on_success) {

	4110 return new(compiler->zone()) TextNode(this, on_success);

	4111 }

	4112

	4113

	4114 RegExpNode* RegExpDisjunction::ToNode(RegExpCompiler* compiler,

	4115 RegExpNode* on_success) {

	4116 ZoneList<RegExpTree> alternatives = this->alternatives();

	4117 int length = alternatives->length();

	4118 ChoiceNode* result =

	4119 new(compiler->zone()) ChoiceNode(length, compiler->zone());

	4120 for (int i = 0; i < length; i++) {

	4121 GuardedAlternative alternative(alternatives->at(i)->ToNode(compiler,

	4122 on_success));

	4123 result->AddAlternative(alternative);

	4124 }

	4125 return result;

	4126 }

	4127

	4128

	4129 RegExpNode* RegExpQuantifier::ToNode(RegExpCompiler* compiler,

	4130 RegExpNode* on_success) {

	4131 return ToNode(min(),

	4132 max(),

	4133 is_greedy(),

	4134 body(),

	4135 compiler,

	4136 on_success);

	4137 }

	4138

	4139

	4140 // Scoped object to keep track of how much we unroll quantifier loops in the

	4141 // regexp graph generator.

	4142 class RegExpExpansionLimiter {

	4143 public:

	4144 static const int kMaxExpansionFactor = 6;

	4145 RegExpExpansionLimiter(RegExpCompiler* compiler, int factor)

	4146 : compiler_(compiler),

	4147 saved_expansion_factor_(compiler->current_expansion_factor()),

	4148 ok_to_expand_(saved_expansion_factor_ <= kMaxExpansionFactor) {

	4149 DCHECK(factor > 0);

	4150 if (ok_to_expand_) {

	4151 if (factor > kMaxExpansionFactor) {

	4152 // Avoid integer overflow of the current expansion factor.

	4153 ok_to_expand_ = false;

	4154 compiler->set_current_expansion_factor(kMaxExpansionFactor + 1);

	4155 } else {

	4156 int new_factor = saved_expansion_factor_ * factor;

	4157 ok_to_expand_ = (new_factor <= kMaxExpansionFactor);

	4158 compiler->set_current_expansion_factor(new_factor);

	4159 }

	4160 }

	4161 }

	4162

	4163 ~RegExpExpansionLimiter() {

	4164 compiler_->set_current_expansion_factor(saved_expansion_factor_);

	4165 }

	4166

	4167 bool ok_to_expand() { return ok_to_expand_; }

	4168

	4169 private:

	4170 RegExpCompiler* compiler_;

	4171 int saved_expansion_factor_;

	4172 bool ok_to_expand_;

	4173

	4174 DISALLOW_IMPLICIT_CONSTRUCTORS(RegExpExpansionLimiter);

	4175 };

	4176

	4177

	4178 RegExpNode* RegExpQuantifier::ToNode(int min,

	4179 int max,

	4180 bool is_greedy,

	4181 RegExpTree* body,

	4182 RegExpCompiler* compiler,

	4183 RegExpNode* on_success,

	4184 bool not_at_start) {

	4185 // x{f, t} becomes this:

	4186 //

	4187 // (r++)<-.

	4188 // \| `

	4189 // \| (x)

	4190 // v ^

	4191 // (r=0)-->(?)---/ [if r < t]

	4192 // \|

	4193 // [if r >= f] \----> ...

	4194 //

	4195

	4196 // 15.10.2.5 RepeatMatcher algorithm.

	4197 // The parser has already eliminated the case where max is 0. In the case

	4198 // where max_match is zero the parser has removed the quantifier if min was

	4199 // > 0 and removed the atom if min was 0. See AddQuantifierToAtom.

	4200

	4201 // If we know that we cannot match zero length then things are a little

	4202 // simpler since we don't need to make the special zero length match check

	4203 // from step 2.1. If the min and max are small we can unroll a little in

	4204 // this case.

	4205 static const int kMaxUnrolledMinMatches = 3; // Unroll (foo)+ and (foo){3,}

	4206 static const int kMaxUnrolledMaxMatches = 3; // Unroll (foo)? and (foo){x,3}

	4207 if (max == 0) return on_success; // This can happen due to recursion.

	4208 bool body_can_be_empty = (body->min_match() == 0);

	4209 int body_start_reg = RegExpCompiler::kNoRegister;

	4210 Interval capture_registers = body->CaptureRegisters();

	4211 bool needs_capture_clearing = !capture_registers.is_empty();

	4212 Zone* zone = compiler->zone();

	4213

	4214 if (body_can_be_empty) {

	4215 body_start_reg = compiler->AllocateRegister();

	4216 } else if (FLAG_regexp_optimization && !needs_capture_clearing) {

	4217 // Only unroll if there are no captures and the body can't be

	4218 // empty.

	4219 {

	4220 RegExpExpansionLimiter limiter(

	4221 compiler, min + ((max != min) ? 1 : 0));

	4222 if (min > 0 && min <= kMaxUnrolledMinMatches && limiter.ok_to_expand()) {

	4223 int new_max = (max == kInfinity) ? max : max - min;

	4224 // Recurse once to get the loop or optional matches after the fixed

	4225 // ones.

	4226 RegExpNode* answer = ToNode(

	4227 0, new_max, is_greedy, body, compiler, on_success, true);

	4228 // Unroll the forced matches from 0 to min. This can cause chains of

	4229 // TextNodes (which the parser does not generate). These should be

	4230 // combined if it turns out they hinder good code generation.

	4231 for (int i = 0; i < min; i++) {

	4232 answer = body->ToNode(compiler, answer);

	4233 }

	4234 return answer;

	4235 }

	4236 }

	4237 if (max <= kMaxUnrolledMaxMatches && min == 0) {

	4238 DCHECK(max > 0); // Due to the 'if' above.

	4239 RegExpExpansionLimiter limiter(compiler, max);

	4240 if (limiter.ok_to_expand()) {

	4241 // Unroll the optional matches up to max.

	4242 RegExpNode* answer = on_success;

	4243 for (int i = 0; i < max; i++) {

	4244 ChoiceNode* alternation = new(zone) ChoiceNode(2, zone);

	4245 if (is_greedy) {

	4246 alternation->AddAlternative(

	4247 GuardedAlternative(body->ToNode(compiler, answer)));

	4248 alternation->AddAlternative(GuardedAlternative(on_success));

	4249 } else {

	4250 alternation->AddAlternative(GuardedAlternative(on_success));

	4251 alternation->AddAlternative(

	4252 GuardedAlternative(body->ToNode(compiler, answer)));

	4253 }

	4254 answer = alternation;

	4255 if (not_at_start) alternation->set_not_at_start();

	4256 }

	4257 return answer;

	4258 }

	4259 }

	4260 }

	4261 bool has_min = min > 0;

	4262 bool has_max = max < RegExpTree::kInfinity;

	4263 bool needs_counter = has_min \|\| has_max;

	4264 int reg_ctr = needs_counter

	4265 ? compiler->AllocateRegister()

	4266 : RegExpCompiler::kNoRegister;

	4267 LoopChoiceNode* center = new(zone) LoopChoiceNode(body->min_match() == 0,

	4268 zone);

	4269 if (not_at_start) center->set_not_at_start();

	4270 RegExpNode* loop_return = needs_counter

	4271 ? static_cast<RegExpNode*>(ActionNode::IncrementRegister(reg_ctr, center))

	4272 : static_cast<RegExpNode*>(center);

	4273 if (body_can_be_empty) {

	4274 // If the body can be empty we need to check if it was and then

	4275 // backtrack.

	4276 loop_return = ActionNode::EmptyMatchCheck(body_start_reg,

	4277 reg_ctr,

	4278 min,

	4279 loop_return);

	4280 }

	4281 RegExpNode* body_node = body->ToNode(compiler, loop_return);

	4282 if (body_can_be_empty) {

	4283 // If the body can be empty we need to store the start position

	4284 // so we can bail out if it was empty.

	4285 body_node = ActionNode::StorePosition(body_start_reg, false, body_node);

	4286 }

	4287 if (needs_capture_clearing) {

	4288 // Before entering the body of this loop we need to clear captures.

	4289 body_node = ActionNode::ClearCaptures(capture_registers, body_node);

	4290 }

	4291 GuardedAlternative body_alt(body_node);

	4292 if (has_max) {

	4293 Guard* body_guard =

	4294 new(zone) Guard(reg_ctr, Guard::LT, max);

	4295 body_alt.AddGuard(body_guard, zone);

	4296 }

	4297 GuardedAlternative rest_alt(on_success);

	4298 if (has_min) {

	4299 Guard* rest_guard = new(compiler->zone()) Guard(reg_ctr, Guard::GEQ, min);

	4300 rest_alt.AddGuard(rest_guard, zone);

	4301 }

	4302 if (is_greedy) {

	4303 center->AddLoopAlternative(body_alt);

	4304 center->AddContinueAlternative(rest_alt);

	4305 } else {

	4306 center->AddContinueAlternative(rest_alt);

	4307 center->AddLoopAlternative(body_alt);

	4308 }

	4309 if (needs_counter) {

	4310 return ActionNode::SetRegister(reg_ctr, 0, center);

	4311 } else {

	4312 return center;

	4313 }

	4314 }

	4315

	4316

	4317 RegExpNode* RegExpAssertion::ToNode(RegExpCompiler* compiler,

	4318 RegExpNode* on_success) {

	4319 NodeInfo info;

	4320 Zone* zone = compiler->zone();

	4321

	4322 switch (assertion_type()) {

	4323 case START_OF_LINE:

	4324 return AssertionNode::AfterNewline(on_success);

	4325 case START_OF_INPUT:

	4326 return AssertionNode::AtStart(on_success);

	4327 case BOUNDARY:

	4328 return AssertionNode::AtBoundary(on_success);

	4329 case NON_BOUNDARY:

	4330 return AssertionNode::AtNonBoundary(on_success);

	4331 case END_OF_INPUT:

	4332 return AssertionNode::AtEnd(on_success);

	4333 case END_OF_LINE: {

	4334 // Compile $ in multiline regexps as an alternation with a positive

	4335 // lookahead in one side and an end-of-input on the other side.

	4336 // We need two registers for the lookahead.

	4337 int stack_pointer_register = compiler->AllocateRegister();

	4338 int position_register = compiler->AllocateRegister();

	4339 // The ChoiceNode to distinguish between a newline and end-of-input.

	4340 ChoiceNode* result = new(zone) ChoiceNode(2, zone);

	4341 // Create a newline atom.

	4342 ZoneList<CharacterRange>* newline_ranges =

	4343 new(zone) ZoneList<CharacterRange>(3, zone);

	4344 CharacterRange::AddClassEscape('n', newline_ranges, zone);

	4345 RegExpCharacterClass* newline_atom = new(zone) RegExpCharacterClass('n');

	4346 TextNode* newline_matcher = new(zone) TextNode(

	4347 newline_atom,

	4348 ActionNode::PositiveSubmatchSuccess(stack_pointer_register,

	4349 position_register,

	4350 0, // No captures inside.

	4351 -1, // Ignored if no captures.

	4352 on_success));

	4353 // Create an end-of-input matcher.

	4354 RegExpNode* end_of_line = ActionNode::BeginSubmatch(

	4355 stack_pointer_register,

	4356 position_register,

	4357 newline_matcher);

	4358 // Add the two alternatives to the ChoiceNode.

	4359 GuardedAlternative eol_alternative(end_of_line);

	4360 result->AddAlternative(eol_alternative);

	4361 GuardedAlternative end_alternative(AssertionNode::AtEnd(on_success));

	4362 result->AddAlternative(end_alternative);

	4363 return result;

	4364 }

	4365 default:

	4366 UNREACHABLE();

	4367 }

	4368 return on_success;

	4369 }

	4370

	4371

	4372 RegExpNode* RegExpBackReference::ToNode(RegExpCompiler* compiler,

	4373 RegExpNode* on_success) {

	4374 return new(compiler->zone())

	4375 BackReferenceNode(RegExpCapture::StartRegister(index()),

	4376 RegExpCapture::EndRegister(index()),

	4377 on_success);

	4378 }

	4379

	4380

	4381 RegExpNode* RegExpEmpty::ToNode(RegExpCompiler* compiler,

	4382 RegExpNode* on_success) {

	4383 return on_success;

	4384 }

	4385

	4386

	4387 RegExpNode* RegExpLookahead::ToNode(RegExpCompiler* compiler,

	4388 RegExpNode* on_success) {

	4389 int stack_pointer_register = compiler->AllocateRegister();

	4390 int position_register = compiler->AllocateRegister();

	4391

	4392 const int registers_per_capture = 2;

	4393 const int register_of_first_capture = 2;

	4394 int register_count = capture_count_ * registers_per_capture;

	4395 int register_start =

	4396 register_of_first_capture + capture_from_ * registers_per_capture;

	4397

	4398 RegExpNode* success;

	4399 if (is_positive()) {

	4400 RegExpNode* node = ActionNode::BeginSubmatch(

	4401 stack_pointer_register,

	4402 position_register,

	4403 body()->ToNode(

	4404 compiler,

	4405 ActionNode::PositiveSubmatchSuccess(stack_pointer_register,

	4406 position_register,

	4407 register_count,

	4408 register_start,

	4409 on_success)));

	4410 return node;

	4411 } else {

	4412 // We use a ChoiceNode for a negative lookahead because it has most of

	4413 // the characteristics we need. It has the body of the lookahead as its

	4414 // first alternative and the expression after the lookahead of the second

	4415 // alternative. If the first alternative succeeds then the

	4416 // NegativeSubmatchSuccess will unwind the stack including everything the

	4417 // choice node set up and backtrack. If the first alternative fails then

	4418 // the second alternative is tried, which is exactly the desired result

	4419 // for a negative lookahead. The NegativeLookaheadChoiceNode is a special

	4420 // ChoiceNode that knows to ignore the first exit when calculating quick

	4421 // checks.

	4422 Zone* zone = compiler->zone();

	4423

	4424 GuardedAlternative body_alt(

	4425 body()->ToNode(

	4426 compiler,

	4427 success = new(zone) NegativeSubmatchSuccess(stack_pointer_register,

	4428 position_register,

	4429 register_count,

	4430 register_start,

	4431 zone)));

	4432 ChoiceNode* choice_node =

	4433 new(zone) NegativeLookaheadChoiceNode(body_alt,

	4434 GuardedAlternative(on_success),

	4435 zone);

	4436 return ActionNode::BeginSubmatch(stack_pointer_register,

	4437 position_register,

	4438 choice_node);

	4439 }

	4440 }

	4441

	4442

	4443 RegExpNode* RegExpCapture::ToNode(RegExpCompiler* compiler,

	4444 RegExpNode* on_success) {

	4445 return ToNode(body(), index(), compiler, on_success);

	4446 }

	4447

	4448

	4449 RegExpNode* RegExpCapture::ToNode(RegExpTree* body,

	4450 int index,

	4451 RegExpCompiler* compiler,

	4452 RegExpNode* on_success) {

	4453 int start_reg = RegExpCapture::StartRegister(index);

	4454 int end_reg = RegExpCapture::EndRegister(index);

	4455 RegExpNode* store_end = ActionNode::StorePosition(end_reg, true, on_success);

	4456 RegExpNode* body_node = body->ToNode(compiler, store_end);

	4457 return ActionNode::StorePosition(start_reg, true, body_node);

	4458 }

	4459

	4460

	4461 RegExpNode* RegExpAlternative::ToNode(RegExpCompiler* compiler,

	4462 RegExpNode* on_success) {

	4463 ZoneList<RegExpTree> children = nodes();

	4464 RegExpNode* current = on_success;

	4465 for (int i = children->length() - 1; i >= 0; i--) {

	4466 current = children->at(i)->ToNode(compiler, current);

	4467 }

	4468 return current;

	4469 }

	4470

	4471

	4472 static void AddClass(const int* elmv,

	4473 int elmc,

	4474 ZoneList<CharacterRange>* ranges,

	4475 Zone* zone) {

	4476 elmc--;

	4477 DCHECK(elmv[elmc] == 0x10000);

	4478 for (int i = 0; i < elmc; i += 2) {

	4479 DCHECK(elmv[i] < elmv[i + 1]);

	4480 ranges->Add(CharacterRange(elmv[i], elmv[i + 1] - 1), zone);

	4481 }

	4482 }

	4483

	4484

	4485 static void AddClassNegated(const int *elmv,

	4486 int elmc,

	4487 ZoneList<CharacterRange>* ranges,

	4488 Zone* zone) {

	4489 elmc--;

	4490 DCHECK(elmv[elmc] == 0x10000);

	4491 DCHECK(elmv[0] != 0x0000);

	4492 DCHECK(elmv[elmc-1] != String::kMaxUtf16CodeUnit);

	4493 uc16 last = 0x0000;

	4494 for (int i = 0; i < elmc; i += 2) {

	4495 DCHECK(last <= elmv[i] - 1);

	4496 DCHECK(elmv[i] < elmv[i + 1]);

	4497 ranges->Add(CharacterRange(last, elmv[i] - 1), zone);

	4498 last = elmv[i + 1];

	4499 }

	4500 ranges->Add(CharacterRange(last, String::kMaxUtf16CodeUnit), zone);

	4501 }

	4502

	4503

	4504 void CharacterRange::AddClassEscape(uc16 type,

	4505 ZoneList<CharacterRange>* ranges,

	4506 Zone* zone) {

	4507 switch (type) {

	4508 case 's':

	4509 AddClass(kSpaceRanges, kSpaceRangeCount, ranges, zone);

	4510 break;

	4511 case 'S':

	4512 AddClassNegated(kSpaceRanges, kSpaceRangeCount, ranges, zone);

	4513 break;

	4514 case 'w':

	4515 AddClass(kWordRanges, kWordRangeCount, ranges, zone);

	4516 break;

	4517 case 'W':

	4518 AddClassNegated(kWordRanges, kWordRangeCount, ranges, zone);

	4519 break;

	4520 case 'd':

	4521 AddClass(kDigitRanges, kDigitRangeCount, ranges, zone);

	4522 break;

	4523 case 'D':

	4524 AddClassNegated(kDigitRanges, kDigitRangeCount, ranges, zone);

	4525 break;

	4526 case '.':

	4527 AddClassNegated(kLineTerminatorRanges,

	4528 kLineTerminatorRangeCount,

	4529 ranges,

	4530 zone);

	4531 break;

	4532 // This is not a character range as defined by the spec but a

	4533 // convenient shorthand for a character class that matches any

	4534 // character.

	4535 case '*':

	4536 ranges->Add(CharacterRange::Everything(), zone);

	4537 break;

	4538 // This is the set of characters matched by the $ and ^ symbols

	4539 // in multiline mode.

	4540 case 'n':

	4541 AddClass(kLineTerminatorRanges,

	4542 kLineTerminatorRangeCount,

	4543 ranges,

	4544 zone);

	4545 break;

	4546 default:

	4547 UNREACHABLE();

	4548 }

	4549 }

	4550

	4551

	4552 Vector<const int> CharacterRange::GetWordBounds() {

	4553 return Vector<const int>(kWordRanges, kWordRangeCount - 1);

	4554 }

	4555

	4556

	4557 class CharacterRangeSplitter {

	4558 public:

	4559 CharacterRangeSplitter(ZoneList<CharacterRange>** included,

	4560 ZoneList<CharacterRange>** excluded,

	4561 Zone* zone)

	4562 : included_(included),

	4563 excluded_(excluded),

	4564 zone_(zone) { }

	4565 void Call(uc16 from, DispatchTable::Entry entry);

	4566

	4567 static const int kInBase = 0;

	4568 static const int kInOverlay = 1;

	4569

	4570 private:

	4571 ZoneList<CharacterRange>** included_;

	4572 ZoneList<CharacterRange>** excluded_;

	4573 Zone* zone_;

	4574 };

	4575

	4576

	4577 void CharacterRangeSplitter::Call(uc16 from, DispatchTable::Entry entry) {

	4578 if (!entry.out_set()->Get(kInBase)) return;

	4579 ZoneList<CharacterRange>** target = entry.out_set()->Get(kInOverlay)

	4580 ? included_

	4581 : excluded_;

	4582 if (target == NULL) target = new(zone_) ZoneList<CharacterRange>(2, zone_);

	4583 (*target)->Add(CharacterRange(entry.from(), entry.to()), zone_);

	4584 }

	4585

	4586

	4587 void CharacterRange::Split(ZoneList<CharacterRange>* base,

	4588 Vector<const int> overlay,

	4589 ZoneList<CharacterRange>** included,

	4590 ZoneList<CharacterRange>** excluded,

	4591 Zone* zone) {

	4592 DCHECK_EQ(NULL, *included);

	4593 DCHECK_EQ(NULL, *excluded);

	4594 DispatchTable table(zone);

	4595 for (int i = 0; i < base->length(); i++)

	4596 table.AddRange(base->at(i), CharacterRangeSplitter::kInBase, zone);

	4597 for (int i = 0; i < overlay.length(); i += 2) {

	4598 table.AddRange(CharacterRange(overlay[i], overlay[i + 1] - 1),

	4599 CharacterRangeSplitter::kInOverlay, zone);

	4600 }

	4601 CharacterRangeSplitter callback(included, excluded, zone);

	4602 table.ForEach(&callback);

	4603 }

	4604

	4605

	4606 void CharacterRange::AddCaseEquivalents(ZoneList<CharacterRange>* ranges,

	4607 bool is_one_byte, Zone* zone) {

	4608 Isolate* isolate = zone->isolate();

	4609 uc16 bottom = from();

	4610 uc16 top = to();

	4611 if (is_one_byte && !RangeContainsLatin1Equivalents(*this)) {

	4612 if (bottom > String::kMaxOneByteCharCode) return;

	4613 if (top > String::kMaxOneByteCharCode) top = String::kMaxOneByteCharCode;

	4614 }

	4615 unibrow::uchar chars[unibrow::Ecma262UnCanonicalize::kMaxWidth];

	4616 if (top == bottom) {

	4617 // If this is a singleton we just expand the one character.

	4618 int length = isolate->jsregexp_uncanonicalize()->get(bottom, '\0', chars);

	4619 for (int i = 0; i < length; i++) {

	4620 uc32 chr = chars[i];

	4621 if (chr != bottom) {

	4622 ranges->Add(CharacterRange::Singleton(chars[i]), zone);

	4623 }

	4624 }

	4625 } else {

	4626 // If this is a range we expand the characters block by block,

	4627 // expanding contiguous subranges (blocks) one at a time.

	4628 // The approach is as follows. For a given start character we

	4629 // look up the remainder of the block that contains it (represented

	4630 // by the end point), for instance we find 'z' if the character

	4631 // is 'c'. A block is characterized by the property

	4632 // that all characters uncanonicalize in the same way, except that

	4633 // each entry in the result is incremented by the distance from the first

	4634 // element. So a-z is a block because 'a' uncanonicalizes to ['a', 'A'] and

	4635 // the k'th letter uncanonicalizes to ['a' + k, 'A' + k].

	4636 // Once we've found the end point we look up its uncanonicalization

	4637 // and produce a range for each element. For instance for [c-f]

	4638 // we look up ['z', 'Z'] and produce [c-f] and [C-F]. We then only

	4639 // add a range if it is not already contained in the input, so [c-f]

	4640 // will be skipped but [C-F] will be added. If this range is not

	4641 // completely contained in a block we do this for all the blocks

	4642 // covered by the range (handling characters that is not in a block

	4643 // as a "singleton block").

	4644 unibrow::uchar range[unibrow::Ecma262UnCanonicalize::kMaxWidth];

	4645 int pos = bottom;

	4646 while (pos <= top) {

	4647 int length = isolate->jsregexp_canonrange()->get(pos, '\0', range);

	4648 uc16 block_end;

	4649 if (length == 0) {

	4650 block_end = pos;

	4651 } else {

	4652 DCHECK_EQ(1, length);

	4653 block_end = range[0];

	4654 }

	4655 int end = (block_end > top) ? top : block_end;

	4656 length = isolate->jsregexp_uncanonicalize()->get(block_end, '\0', range);

	4657 for (int i = 0; i < length; i++) {

	4658 uc32 c = range[i];

	4659 uc16 range_from = c - (block_end - pos);

	4660 uc16 range_to = c - (block_end - end);

	4661 if (!(bottom <= range_from && range_to <= top)) {

	4662 ranges->Add(CharacterRange(range_from, range_to), zone);

	4663 }

	4664 }

	4665 pos = end + 1;

	4666 }

	4667 }

	4668 }

	4669

	4670

	4671 bool CharacterRange::IsCanonical(ZoneList<CharacterRange>* ranges) {

	4672 DCHECK_NOT_NULL(ranges);

	4673 int n = ranges->length();

	4674 if (n <= 1) return true;

	4675 int max = ranges->at(0).to();

	4676 for (int i = 1; i < n; i++) {

	4677 CharacterRange next_range = ranges->at(i);

	4678 if (next_range.from() <= max + 1) return false;

	4679 max = next_range.to();

	4680 }

	4681 return true;

	4682 }

	4683

	4684

	4685 ZoneList<CharacterRange>* CharacterSet::ranges(Zone* zone) {

	4686 if (ranges_ == NULL) {

	4687 ranges_ = new(zone) ZoneList<CharacterRange>(2, zone);

	4688 CharacterRange::AddClassEscape(standard_set_type_, ranges_, zone);

	4689 }

	4690 return ranges_;

	4691 }

	4692

	4693

	4694 // Move a number of elements in a zonelist to another position

	4695 // in the same list. Handles overlapping source and target areas.

	4696 static void MoveRanges(ZoneList<CharacterRange>* list,

	4697 int from,

	4698 int to,

	4699 int count) {

	4700 // Ranges are potentially overlapping.

	4701 if (from < to) {

	4702 for (int i = count - 1; i >= 0; i--) {

	4703 list->at(to + i) = list->at(from + i);

	4704 }

	4705 } else {

	4706 for (int i = 0; i < count; i++) {

	4707 list->at(to + i) = list->at(from + i);

	4708 }

	4709 }

	4710 }

	4711

	4712

	4713 static int InsertRangeInCanonicalList(ZoneList<CharacterRange>* list,

	4714 int count,

	4715 CharacterRange insert) {

	4716 // Inserts a range into list[0..count[, which must be sorted

	4717 // by from value and non-overlapping and non-adjacent, using at most

	4718 // list[0..count] for the result. Returns the number of resulting

	4719 // canonicalized ranges. Inserting a range may collapse existing ranges into

	4720 // fewer ranges, so the return value can be anything in the range 1..count+1.

	4721 uc16 from = insert.from();

	4722 uc16 to = insert.to();

	4723 int start_pos = 0;

	4724 int end_pos = count;

	4725 for (int i = count - 1; i >= 0; i--) {

	4726 CharacterRange current = list->at(i);

	4727 if (current.from() > to + 1) {

	4728 end_pos = i;

	4729 } else if (current.to() + 1 < from) {

	4730 start_pos = i + 1;

	4731 break;

	4732 }

	4733 }

	4734

	4735 // Inserted range overlaps, or is adjacent to, ranges at positions

	4736 // [start_pos..end_pos[. Ranges before start_pos or at or after end_pos are

	4737 // not affected by the insertion.

	4738 // If start_pos == end_pos, the range must be inserted before start_pos.

	4739 // if start_pos < end_pos, the entire range from start_pos to end_pos

	4740 // must be merged with the insert range.

	4741

	4742 if (start_pos == end_pos) {

	4743 // Insert between existing ranges at position start_pos.

	4744 if (start_pos < count) {

	4745 MoveRanges(list, start_pos, start_pos + 1, count - start_pos);

	4746 }

	4747 list->at(start_pos) = insert;

	4748 return count + 1;

	4749 }

	4750 if (start_pos + 1 == end_pos) {

	4751 // Replace single existing range at position start_pos.

	4752 CharacterRange to_replace = list->at(start_pos);

	4753 int new_from = Min(to_replace.from(), from);

	4754 int new_to = Max(to_replace.to(), to);

	4755 list->at(start_pos) = CharacterRange(new_from, new_to);

	4756 return count;

	4757 }

	4758 // Replace a number of existing ranges from start_pos to end_pos - 1.

	4759 // Move the remaining ranges down.

	4760

	4761 int new_from = Min(list->at(start_pos).from(), from);

	4762 int new_to = Max(list->at(end_pos - 1).to(), to);

	4763 if (end_pos < count) {

	4764 MoveRanges(list, end_pos, start_pos + 1, count - end_pos);

	4765 }

	4766 list->at(start_pos) = CharacterRange(new_from, new_to);

	4767 return count - (end_pos - start_pos) + 1;

	4768 }

	4769

	4770

	4771 void CharacterSet::Canonicalize() {

	4772 // Special/default classes are always considered canonical. The result

	4773 // of calling ranges() will be sorted.

	4774 if (ranges_ == NULL) return;

	4775 CharacterRange::Canonicalize(ranges_);

	4776 }

	4777

	4778

	4779 void CharacterRange::Canonicalize(ZoneList<CharacterRange>* character_ranges) {

	4780 if (character_ranges->length() <= 1) return;

	4781 // Check whether ranges are already canonical (increasing, non-overlapping,

	4782 // non-adjacent).

	4783 int n = character_ranges->length();

	4784 int max = character_ranges->at(0).to();

	4785 int i = 1;

	4786 while (i < n) {

	4787 CharacterRange current = character_ranges->at(i);

	4788 if (current.from() <= max + 1) {

	4789 break;

	4790 }

	4791 max = current.to();

	4792 i++;

	4793 }

	4794 // Canonical until the i'th range. If that's all of them, we are done.

	4795 if (i == n) return;

	4796

	4797 // The ranges at index i and forward are not canonicalized. Make them so by

	4798 // doing the equivalent of insertion sort (inserting each into the previous

	4799 // list, in order).

	4800 // Notice that inserting a range can reduce the number of ranges in the

	4801 // result due to combining of adjacent and overlapping ranges.

	4802 int read = i; // Range to insert.

	4803 int num_canonical = i; // Length of canonicalized part of list.

	4804 do {

	4805 num_canonical = InsertRangeInCanonicalList(character_ranges,

	4806 num_canonical,

	4807 character_ranges->at(read));

	4808 read++;

	4809 } while (read < n);

	4810 character_ranges->Rewind(num_canonical);

	4811

	4812 DCHECK(CharacterRange::IsCanonical(character_ranges));

	4813 }

	4814

	4815

	4816 void CharacterRange::Negate(ZoneList<CharacterRange>* ranges,

	4817 ZoneList<CharacterRange>* negated_ranges,

	4818 Zone* zone) {

	4819 DCHECK(CharacterRange::IsCanonical(ranges));

	4820 DCHECK_EQ(0, negated_ranges->length());

	4821 int range_count = ranges->length();

	4822 uc16 from = 0;

	4823 int i = 0;

	4824 if (range_count > 0 && ranges->at(0).from() == 0) {

	4825 from = ranges->at(0).to();

	4826 i = 1;

	4827 }

	4828 while (i < range_count) {

	4829 CharacterRange range = ranges->at(i);

	4830 negated_ranges->Add(CharacterRange(from + 1, range.from() - 1), zone);

	4831 from = range.to();

	4832 i++;

	4833 }

	4834 if (from < String::kMaxUtf16CodeUnit) {

	4835 negated_ranges->Add(CharacterRange(from + 1, String::kMaxUtf16CodeUnit),

	4836 zone);

	4837 }

	4838 }

	4839

	4840

	4841 // -------------------------------------------------------------------

	4842 // Splay tree

	4843

	4844

	4845 OutSet* OutSet::Extend(unsigned value, Zone* zone) {

	4846 if (Get(value))

	4847 return this;

	4848 if (successors(zone) != NULL) {

	4849 for (int i = 0; i < successors(zone)->length(); i++) {

	4850 OutSet* successor = successors(zone)->at(i);

	4851 if (successor->Get(value))

	4852 return successor;

	4853 }

	4854 } else {

	4855 successors_ = new(zone) ZoneList<OutSet*>(2, zone);

	4856 }

	4857 OutSet* result = new(zone) OutSet(first_, remaining_);

	4858 result->Set(value, zone);

	4859 successors(zone)->Add(result, zone);

	4860 return result;

	4861 }

	4862

	4863

	4864 void OutSet::Set(unsigned value, Zone *zone) {

	4865 if (value < kFirstLimit) {

	4866 first_ \|= (1 << value);

	4867 } else {

	4868 if (remaining_ == NULL)

	4869 remaining_ = new(zone) ZoneList<unsigned>(1, zone);

	4870 if (remaining_->is_empty() \|\| !remaining_->Contains(value))

	4871 remaining_->Add(value, zone);

	4872 }

	4873 }

	4874

	4875

	4876 bool OutSet::Get(unsigned value) const {

	4877 if (value < kFirstLimit) {

	4878 return (first_ & (1 << value)) != 0;

	4879 } else if (remaining_ == NULL) {

	4880 return false;

	4881 } else {

	4882 return remaining_->Contains(value);

	4883 }

	4884 }

	4885

	4886

	4887 const uc16 DispatchTable::Config::kNoKey = unibrow::Utf8::kBadChar;

	4888

	4889

	4890 void DispatchTable::AddRange(CharacterRange full_range, int value,

	4891 Zone* zone) {

	4892 CharacterRange current = full_range;

	4893 if (tree()->is_empty()) {

	4894 // If this is the first range we just insert into the table.

	4895 ZoneSplayTree<Config>::Locator loc;

	4896 DCHECK_RESULT(tree()->Insert(current.from(), &loc));

	4897 loc.set_value(Entry(current.from(), current.to(),

	4898 empty()->Extend(value, zone)));

	4899 return;

	4900 }

	4901 // First see if there is a range to the left of this one that

	4902 // overlaps.

	4903 ZoneSplayTree<Config>::Locator loc;

	4904 if (tree()->FindGreatestLessThan(current.from(), &loc)) {

	4905 Entry* entry = &loc.value();

	4906 // If we've found a range that overlaps with this one, and it

	4907 // starts strictly to the left of this one, we have to fix it

	4908 // because the following code only handles ranges that start on

	4909 // or after the start point of the range we're adding.

	4910 if (entry->from() < current.from() && entry->to() >= current.from()) {

	4911 // Snap the overlapping range in half around the start point of

	4912 // the range we're adding.

	4913 CharacterRange left(entry->from(), current.from() - 1);

	4914 CharacterRange right(current.from(), entry->to());

	4915 // The left part of the overlapping range doesn't overlap.

	4916 // Truncate the whole entry to be just the left part.

	4917 entry->set_to(left.to());

	4918 // The right part is the one that overlaps. We add this part

	4919 // to the map and let the next step deal with merging it with

	4920 // the range we're adding.

	4921 ZoneSplayTree<Config>::Locator loc;

	4922 DCHECK_RESULT(tree()->Insert(right.from(), &loc));

	4923 loc.set_value(Entry(right.from(),

	4924 right.to(),

	4925 entry->out_set()));

	4926 }

	4927 }

	4928 while (current.is_valid()) {

	4929 if (tree()->FindLeastGreaterThan(current.from(), &loc) &&

	4930 (loc.value().from() <= current.to()) &&

	4931 (loc.value().to() >= current.from())) {

	4932 Entry* entry = &loc.value();

	4933 // We have overlap. If there is space between the start point of

	4934 // the range we're adding and where the overlapping range starts

	4935 // then we have to add a range covering just that space.

	4936 if (current.from() < entry->from()) {

	4937 ZoneSplayTree<Config>::Locator ins;

	4938 DCHECK_RESULT(tree()->Insert(current.from(), &ins));

	4939 ins.set_value(Entry(current.from(),

	4940 entry->from() - 1,

	4941 empty()->Extend(value, zone)));

	4942 current.set_from(entry->from());

	4943 }

	4944 DCHECK_EQ(current.from(), entry->from());

	4945 // If the overlapping range extends beyond the one we want to add

	4946 // we have to snap the right part off and add it separately.

	4947 if (entry->to() > current.to()) {

	4948 ZoneSplayTree<Config>::Locator ins;

	4949 DCHECK_RESULT(tree()->Insert(current.to() + 1, &ins));

	4950 ins.set_value(Entry(current.to() + 1,

	4951 entry->to(),

	4952 entry->out_set()));

	4953 entry->set_to(current.to());

	4954 }

	4955 DCHECK(entry->to() <= current.to());

	4956 // The overlapping range is now completely contained by the range

	4957 // we're adding so we can just update it and move the start point

	4958 // of the range we're adding just past it.

	4959 entry->AddValue(value, zone);

	4960 // Bail out if the last interval ended at 0xFFFF since otherwise

	4961 // adding 1 will wrap around to 0.

	4962 if (entry->to() == String::kMaxUtf16CodeUnit)

	4963 break;

	4964 DCHECK(entry->to() + 1 > current.from());

	4965 current.set_from(entry->to() + 1);

	4966 } else {

	4967 // There is no overlap so we can just add the range

	4968 ZoneSplayTree<Config>::Locator ins;

	4969 DCHECK_RESULT(tree()->Insert(current.from(), &ins));

	4970 ins.set_value(Entry(current.from(),

	4971 current.to(),

	4972 empty()->Extend(value, zone)));

	4973 break;

	4974 }

	4975 }

	4976 }

	4977

	4978

	4979 OutSet* DispatchTable::Get(uc16 value) {

	4980 ZoneSplayTree<Config>::Locator loc;

	4981 if (!tree()->FindGreatestLessThan(value, &loc))

	4982 return empty();

	4983 Entry* entry = &loc.value();

	4984 if (value <= entry->to())

	4985 return entry->out_set();

	4986 else

	4987 return empty();

	4988 }

	4989

	4990

	4991 // -------------------------------------------------------------------

	4992 // Analysis

	4993

	4994

	4995 void Analysis::EnsureAnalyzed(RegExpNode* that) {

	4996 StackLimitCheck check(that->zone()->isolate());

	4997 if (check.HasOverflowed()) {

	4998 fail("Stack overflow");

	4999 return;

	5000 }

	5001 if (that->info()->been_analyzed \|\| that->info()->being_analyzed)

	5002 return;

	5003 that->info()->being_analyzed = true;

	5004 that->Accept(this);

	5005 that->info()->being_analyzed = false;

	5006 that->info()->been_analyzed = true;

	5007 }

	5008

	5009

	5010 void Analysis::VisitEnd(EndNode* that) {

	5011 // nothing to do

	5012 }

	5013

	5014

	5015 void TextNode::CalculateOffsets() {

	5016 int element_count = elements()->length();

	5017 // Set up the offsets of the elements relative to the start. This is a fixed

	5018 // quantity since a TextNode can only contain fixed-width things.

	5019 int cp_offset = 0;

	5020 for (int i = 0; i < element_count; i++) {

	5021 TextElement& elm = elements()->at(i);

	5022 elm.set_cp_offset(cp_offset);

	5023 cp_offset += elm.length();

	5024 }

	5025 }

	5026

	5027

	5028 void Analysis::VisitText(TextNode* that) {

	5029 if (ignore_case_) {

	5030 that->MakeCaseIndependent(is_one_byte_);

	5031 }

	5032 EnsureAnalyzed(that->on_success());

	5033 if (!has_failed()) {

	5034 that->CalculateOffsets();

	5035 }

	5036 }

	5037

	5038

	5039 void Analysis::VisitAction(ActionNode* that) {

	5040 RegExpNode* target = that->on_success();

	5041 EnsureAnalyzed(target);

	5042 if (!has_failed()) {

	5043 // If the next node is interested in what it follows then this node

	5044 // has to be interested too so it can pass the information on.

	5045 that->info()->AddFromFollowing(target->info());

	5046 }

	5047 }

	5048

	5049

	5050 void Analysis::VisitChoice(ChoiceNode* that) {

	5051 NodeInfo* info = that->info();

	5052 for (int i = 0; i < that->alternatives()->length(); i++) {

	5053 RegExpNode* node = that->alternatives()->at(i).node();

	5054 EnsureAnalyzed(node);

	5055 if (has_failed()) return;

	5056 // Anything the following nodes need to know has to be known by

	5057 // this node also, so it can pass it on.

	5058 info->AddFromFollowing(node->info());

	5059 }

	5060 }

	5061

	5062

	5063 void Analysis::VisitLoopChoice(LoopChoiceNode* that) {

	5064 NodeInfo* info = that->info();

	5065 for (int i = 0; i < that->alternatives()->length(); i++) {

	5066 RegExpNode* node = that->alternatives()->at(i).node();

	5067 if (node != that->loop_node()) {

	5068 EnsureAnalyzed(node);

	5069 if (has_failed()) return;

	5070 info->AddFromFollowing(node->info());

	5071 }

	5072 }

	5073 // Check the loop last since it may need the value of this node

	5074 // to get a correct result.

	5075 EnsureAnalyzed(that->loop_node());

	5076 if (!has_failed()) {

	5077 info->AddFromFollowing(that->loop_node()->info());

	5078 }

	5079 }

	5080

	5081

	5082 void Analysis::VisitBackReference(BackReferenceNode* that) {

	5083 EnsureAnalyzed(that->on_success());

	5084 }

	5085

	5086

	5087 void Analysis::VisitAssertion(AssertionNode* that) {

	5088 EnsureAnalyzed(that->on_success());

	5089 }

	5090

	5091

	5092 void BackReferenceNode::FillInBMInfo(int offset,

	5093 int budget,

	5094 BoyerMooreLookahead* bm,

	5095 bool not_at_start) {

	5096 // Working out the set of characters that a backreference can match is too

	5097 // hard, so we just say that any character can match.

	5098 bm->SetRest(offset);

	5099 SaveBMInfo(bm, not_at_start, offset);

	5100 }

	5101

	5102

	5103 STATIC_ASSERT(BoyerMoorePositionInfo::kMapSize ==

	5104 RegExpMacroAssembler::kTableSize);

	5105

	5106

	5107 void ChoiceNode::FillInBMInfo(int offset,

	5108 int budget,

	5109 BoyerMooreLookahead* bm,

	5110 bool not_at_start) {

	5111 ZoneList<GuardedAlternative>* alts = alternatives();

	5112 budget = (budget - 1) / alts->length();

	5113 for (int i = 0; i < alts->length(); i++) {

	5114 GuardedAlternative& alt = alts->at(i);

	5115 if (alt.guards() != NULL && alt.guards()->length() != 0) {

	5116 bm->SetRest(offset); // Give up trying to fill in info.

	5117 SaveBMInfo(bm, not_at_start, offset);

	5118 return;

	5119 }

	5120 alt.node()->FillInBMInfo(offset, budget, bm, not_at_start);

	5121 }

	5122 SaveBMInfo(bm, not_at_start, offset);

	5123 }

	5124

	5125

	5126 void TextNode::FillInBMInfo(int initial_offset,

	5127 int budget,

	5128 BoyerMooreLookahead* bm,

	5129 bool not_at_start) {

	5130 if (initial_offset >= bm->length()) return;

	5131 int offset = initial_offset;

	5132 int max_char = bm->max_char();

	5133 for (int i = 0; i < elements()->length(); i++) {

	5134 if (offset >= bm->length()) {

	5135 if (initial_offset == 0) set_bm_info(not_at_start, bm);

	5136 return;

	5137 }

	5138 TextElement text = elements()->at(i);

	5139 if (text.text_type() == TextElement::ATOM) {

	5140 RegExpAtom* atom = text.atom();

	5141 for (int j = 0; j < atom->length(); j++, offset++) {

	5142 if (offset >= bm->length()) {

	5143 if (initial_offset == 0) set_bm_info(not_at_start, bm);

	5144 return;

	5145 }

	5146 uc16 character = atom->data()[j];

	5147 if (bm->compiler()->ignore_case()) {

	5148 unibrow::uchar chars[unibrow::Ecma262UnCanonicalize::kMaxWidth];

	5149 int length = GetCaseIndependentLetters(

	5150 Isolate::Current(),

	5151 character,

	5152 bm->max_char() == String::kMaxOneByteCharCode,

	5153 chars);

	5154 for (int j = 0; j < length; j++) {

	5155 bm->Set(offset, chars[j]);

	5156 }

	5157 } else {

	5158 if (character <= max_char) bm->Set(offset, character);

	5159 }

	5160 }

	5161 } else {

	5162 DCHECK_EQ(TextElement::CHAR_CLASS, text.text_type());

	5163 RegExpCharacterClass* char_class = text.char_class();

	5164 ZoneList<CharacterRange>* ranges = char_class->ranges(zone());

	5165 if (char_class->is_negated()) {

	5166 bm->SetAll(offset);

	5167 } else {

	5168 for (int k = 0; k < ranges->length(); k++) {

	5169 CharacterRange& range = ranges->at(k);

	5170 if (range.from() > max_char) continue;

	5171 int to = Min(max_char, static_cast<int>(range.to()));

	5172 bm->SetInterval(offset, Interval(range.from(), to));

	5173 }

	5174 }

	5175 offset++;

	5176 }

	5177 }

	5178 if (offset >= bm->length()) {

	5179 if (initial_offset == 0) set_bm_info(not_at_start, bm);

	5180 return;

	5181 }

	5182 on_success()->FillInBMInfo(offset,

	5183 budget - 1,

	5184 bm,

	5185 true); // Not at start after a text node.

	5186 if (initial_offset == 0) set_bm_info(not_at_start, bm);

	5187 }

	5188

	5189

	5190 // -------------------------------------------------------------------

	5191 // Dispatch table construction

	5192

	5193

	5194 void DispatchTableConstructor::VisitEnd(EndNode* that) {

	5195 AddRange(CharacterRange::Everything());

	5196 }

	5197

	5198

	5199 void DispatchTableConstructor::BuildTable(ChoiceNode* node) {

	5200 node->set_being_calculated(true);

	5201 ZoneList<GuardedAlternative>* alternatives = node->alternatives();

	5202 for (int i = 0; i < alternatives->length(); i++) {

	5203 set_choice_index(i);

	5204 alternatives->at(i).node()->Accept(this);

	5205 }

	5206 node->set_being_calculated(false);

	5207 }

	5208

	5209

	5210 class AddDispatchRange {

	5211 public:

	5212 explicit AddDispatchRange(DispatchTableConstructor* constructor)

	5213 : constructor_(constructor) { }

	5214 void Call(uc32 from, DispatchTable::Entry entry);

	5215 private:

	5216 DispatchTableConstructor* constructor_;

	5217 };

	5218

	5219

	5220 void AddDispatchRange::Call(uc32 from, DispatchTable::Entry entry) {

	5221 CharacterRange range(from, entry.to());

	5222 constructor_->AddRange(range);

	5223 }

	5224

	5225

	5226 void DispatchTableConstructor::VisitChoice(ChoiceNode* node) {

	5227 if (node->being_calculated())

	5228 return;

	5229 DispatchTable* table = node->GetTable(ignore_case_);

	5230 AddDispatchRange adder(this);

	5231 table->ForEach(&adder);

	5232 }

	5233

	5234

	5235 void DispatchTableConstructor::VisitBackReference(BackReferenceNode* that) {

	5236 // TODO(160): Find the node that we refer back to and propagate its start

	5237 // set back to here. For now we just accept anything.

	5238 AddRange(CharacterRange::Everything());

	5239 }

	5240

	5241

	5242 void DispatchTableConstructor::VisitAssertion(AssertionNode* that) {

	5243 RegExpNode* target = that->on_success();

	5244 target->Accept(this);

	5245 }

	5246

	5247

	5248 static int CompareRangeByFrom(const CharacterRange* a,

	5249 const CharacterRange* b) {

	5250 return Compare<uc16>(a->from(), b->from());

	5251 }

	5252

	5253

	5254 void DispatchTableConstructor::AddInverse(ZoneList<CharacterRange>* ranges) {

	5255 ranges->Sort(CompareRangeByFrom);

	5256 uc16 last = 0;

	5257 for (int i = 0; i < ranges->length(); i++) {

	5258 CharacterRange range = ranges->at(i);

	5259 if (last < range.from())

	5260 AddRange(CharacterRange(last, range.from() - 1));

	5261 if (range.to() >= last) {

	5262 if (range.to() == String::kMaxUtf16CodeUnit) {

	5263 return;

	5264 } else {

	5265 last = range.to() + 1;

	5266 }

	5267 }

	5268 }

	5269 AddRange(CharacterRange(last, String::kMaxUtf16CodeUnit));

	5270 }

	5271

	5272

	5273 void DispatchTableConstructor::VisitText(TextNode* that) {

	5274 TextElement elm = that->elements()->at(0);

	5275 switch (elm.text_type()) {

	5276 case TextElement::ATOM: {

	5277 uc16 c = elm.atom()->data()[0];

	5278 AddRange(CharacterRange(c, c));

	5279 break;

	5280 }

	5281 case TextElement::CHAR_CLASS: {

	5282 RegExpCharacterClass* tree = elm.char_class();

	5283 ZoneList<CharacterRange>* ranges = tree->ranges(that->zone());

	5284 if (tree->is_negated()) {

	5285 AddInverse(ranges);

	5286 } else {

	5287 for (int i = 0; i < ranges->length(); i++)

	5288 AddRange(ranges->at(i));

	5289 }

	5290 break;

	5291 }

	5292 default: {

	5293 UNIMPLEMENTED();

	5294 }

	5295 }

	5296 }

	5297

	5298

	5299 void DispatchTableConstructor::VisitAction(ActionNode* that) {

	5300 RegExpNode* target = that->on_success();

	5301 target->Accept(this);

	5302 }

	5303

	5304

	5305 RegExpEngine::CompilationResult RegExpEngine::Compile(

	5306 RegExpCompileData* data, bool ignore_case, bool is_global,

	5307 bool is_multiline, bool is_sticky, Handle<String> pattern,

	5308 Handle<String> sample_subject, bool is_one_byte, Zone* zone) {

	5309 if ((data->capture_count + 1) * 2 - 1 > RegExpMacroAssembler::kMaxRegister) {

	5310 return IrregexpRegExpTooBig(zone->isolate());

	5311 }

	5312 RegExpCompiler compiler(data->capture_count, ignore_case, is_one_byte, zone);

	5313

	5314 // Sample some characters from the middle of the string.

	5315 static const int kSampleSize = 128;

	5316

	5317 sample_subject = String::Flatten(sample_subject);

	5318 int chars_sampled = 0;

	5319 int half_way = (sample_subject->length() - kSampleSize) / 2;

	5320 for (int i = Max(0, half_way);

	5321 i < sample_subject->length() && chars_sampled < kSampleSize;

	5322 i++, chars_sampled++) {

	5323 compiler.frequency_collator()->CountCharacter(sample_subject->Get(i));

	5324 }

	5325

	5326 // Wrap the body of the regexp in capture #0.

	5327 RegExpNode* captured_body = RegExpCapture::ToNode(data->tree,

	5328 0,

	5329 &compiler,

	5330 compiler.accept());

	5331 RegExpNode* node = captured_body;

	5332 bool is_end_anchored = data->tree->IsAnchoredAtEnd();

	5333 bool is_start_anchored = data->tree->IsAnchoredAtStart();

	5334 int max_length = data->tree->max_match();

	5335 if (!is_start_anchored && !is_sticky) {

	5336 // Add a .*? at the beginning, outside the body capture, unless

	5337 // this expression is anchored at the beginning or sticky.

	5338 RegExpNode* loop_node =

	5339 RegExpQuantifier::ToNode(0,

	5340 RegExpTree::kInfinity,

	5341 false,

	5342 new(zone) RegExpCharacterClass('*'),

	5343 &compiler,

	5344 captured_body,

	5345 data->contains_anchor);

	5346

	5347 if (data->contains_anchor) {

	5348 // Unroll loop once, to take care of the case that might start

	5349 // at the start of input.

	5350 ChoiceNode* first_step_node = new(zone) ChoiceNode(2, zone);

	5351 first_step_node->AddAlternative(GuardedAlternative(captured_body));

	5352 first_step_node->AddAlternative(GuardedAlternative(

	5353 new(zone) TextNode(new(zone) RegExpCharacterClass('*'), loop_node)));

	5354 node = first_step_node;

	5355 } else {

	5356 node = loop_node;

	5357 }

	5358 }

	5359 if (is_one_byte) {

	5360 node = node->FilterOneByte(RegExpCompiler::kMaxRecursion, ignore_case);

	5361 // Do it again to propagate the new nodes to places where they were not

	5362 // put because they had not been calculated yet.

	5363 if (node != NULL) {

	5364 node = node->FilterOneByte(RegExpCompiler::kMaxRecursion, ignore_case);

	5365 }

	5366 }

	5367

	5368 if (node == NULL) node = new(zone) EndNode(EndNode::BACKTRACK, zone);

	5369 data->node = node;

	5370 Analysis analysis(ignore_case, is_one_byte);

	5371 analysis.EnsureAnalyzed(node);

	5372 if (analysis.has_failed()) {

	5373 const char* error_message = analysis.error_message();

	5374 return CompilationResult(zone->isolate(), error_message);

	5375 }

	5376

	5377 // Create the correct assembler for the architecture.

	5378 #ifndef V8_INTERPRETED_REGEXP

	5379 // Native regexp implementation.

	5380

	5381 NativeRegExpMacroAssembler::Mode mode =

	5382 is_one_byte ? NativeRegExpMacroAssembler::LATIN1

	5383 : NativeRegExpMacroAssembler::UC16;

	5384

	5385 #if V8_TARGET_ARCH_IA32

	5386 RegExpMacroAssemblerIA32 macro_assembler(mode, (data->capture_count + 1) * 2,

	5387 zone);

	5388 #elif V8_TARGET_ARCH_X64

	5389 RegExpMacroAssemblerX64 macro_assembler(mode, (data->capture_count + 1) * 2,

	5390 zone);

	5391 #elif V8_TARGET_ARCH_ARM

	5392 RegExpMacroAssemblerARM macro_assembler(mode, (data->capture_count + 1) * 2,

	5393 zone);

	5394 #elif V8_TARGET_ARCH_ARM64

	5395 RegExpMacroAssemblerARM64 macro_assembler(mode, (data->capture_count + 1) * 2,

	5396 zone);

	5397 #elif V8_TARGET_ARCH_MIPS

	5398 RegExpMacroAssemblerMIPS macro_assembler(mode, (data->capture_count + 1) * 2,

	5399 zone);

	5400 #elif V8_TARGET_ARCH_MIPS64

	5401 RegExpMacroAssemblerMIPS macro_assembler(mode, (data->capture_count + 1) * 2,

	5402 zone);

	5403 #elif V8_TARGET_ARCH_X87

	5404 RegExpMacroAssemblerX87 macro_assembler(mode, (data->capture_count + 1) * 2,

	5405 zone);

	5406 #else

	5407 #error "Unsupported architecture"

	5408 #endif

	5409

	5410 #else // V8_INTERPRETED_REGEXP

	5411 // Interpreted regexp implementation.

	5412 EmbeddedVector<byte, 1024> codes;

	5413 RegExpMacroAssemblerIrregexp macro_assembler(codes, zone);

	5414 #endif // V8_INTERPRETED_REGEXP

	5415

	5416 // Inserted here, instead of in Assembler, because it depends on information

	5417 // in the AST that isn't replicated in the Node structure.

	5418 static const int kMaxBacksearchLimit = 1024;

	5419 if (is_end_anchored &&

	5420 !is_start_anchored &&

	5421 max_length < kMaxBacksearchLimit) {

	5422 macro_assembler.SetCurrentPositionFromEnd(max_length);

	5423 }

	5424

	5425 if (is_global) {

	5426 macro_assembler.set_global_mode(

	5427 (data->tree->min_match() > 0)

	5428 ? RegExpMacroAssembler::GLOBAL_NO_ZERO_LENGTH_CHECK

	5429 : RegExpMacroAssembler::GLOBAL);

	5430 }

	5431

	5432 return compiler.Assemble(&macro_assembler,

	5433 node,

	5434 data->capture_count,

	5435 pattern);

	5436 }

	5437

	5438 } // namespace dart

OLD	NEW

« no previous file with comments | « runtime/vm/regexp.h ('k') | runtime/vm/regexp_assembler.h » ('j') | no next file with comments »