|
|
Chromium Code Reviews
Descriptionoptimize Deoptimizer::GetOutputInfo to binary search
BUG=
Patch Set 1 #
Messages
Total messages: 8 (0 generated)
@yangguo , @kasper, please review it, thanks!
NOT LGTM: I don't think that anything deoptimizer-related is really performance-critical. (If we have benchmarks for this, I'd like too see them.) Furthermore, *if* we decide to improve this, we should really use std::binary_search: Making DeoptimizationOutputData a ForwardIterator is very easy, much less error-prone than writing the n-th binary search by hand and much more readable and flexible.
On 2014/06/30 06:12:55, Sven Panne wrote: > NOT LGTM: I don't think that anything deoptimizer-related is really > performance-critical. (If we have benchmarks for this, I'd like too see them.) > Furthermore, *if* we decide to improve this, we should really use > std::binary_search: Making DeoptimizationOutputData a ForwardIterator is very > easy, much less error-prone than writing the n-th binary search by hand and much > more readable and flexible. I used OProfile to take a look d8's hot functions. command : d8 run.js here is result: CPU: CPU with timer interrupt, speed 2472.74 MHz (estimated) Profiling through timer interrupt samples % symbol name 1166 5.3772 v8::internal::UseInterval::start() const 995 4.5886 v8::internal::LiveRange::first_interval() const 523 2.4119 v8::internal::HSuccessorIterator::Done() 501 2.3105 v8::internal::HandleScope::ZapRange(v8::internal::Object**, v8::internal::Object**) 468 2.1583 v8::internal::Object::Number() 432 1.9923 v8::internal::Runtime_Interrupt(int, v8::internal::Object**, v8::internal::Isolate*) 382 1.7617 v8::internal::LAllocator::UnhandledIsSorted() 355 1.6372 v8::internal::HBasicBlock::dominator() const 348 1.6049 v8::internal::LiveRange::Start() const 330 1.5219 v8::internal::FixedArray::get(int) 314 1.4481 v8::internal::HeapObject::map_word() 310 1.4296 v8::internal::Heap::ZapFromSpace() 307 1.4158 v8::internal::List<v8::internal::LiveRange*, v8::internal::ZoneAllocationPolicy>::operator[](int) const 301 1.3881 v8::internal::LiveRange::IsEmpty() const 297 1.3697 v8::internal::HBasicBlock::block_id() const 246 1.1345 v8::internal::LifetimePosition::Value() const 242 1.1160 v8::internal::DeoptimizationOutputData::AstId(int) 237 1.0930 v8::internal::List<v8::internal::LiveRange*, v8::internal::ZoneAllocationPolicy>::at(int) const 221 1.0192 v8::internal::EnumSet<v8::internal::GVNFlag, int>::Add(v8::internal::GVNFlag) 206 0.9500 v8::internal::Internals::SmiValue(v8::internal::Object*) 203 0.9362 v8::internal::SmiTagging<8ul>::SmiToInt(v8::internal::Object*) 198 0.9131 v8::internal::HBasicBlock::end() const 191 0.8808 v8::internal::Runtime_StringReplaceGlobalRegExpWithString(int, v8::internal::Object**, v8::internal::Isolate*) 189 0.8716 v8::internal::HBasicBlock::Dominates(v8::internal::HBasicBlock*) const 188 0.8670 v8::internal::Scanner::DoubleValue() 181 0.8347 v8::internal::Deoptimizer::GetOutputInfo(v8::internal::DeoptimizationOutputData*, v8::internal::BailoutId, v8::internal::SharedFunctionInfo*) 174 0.8024 v8::internal::Smi::value() 148 0.6825 v8::internal::HTemplateControlInstruction<1, 0>::SuccessorAt(int) 0.8347 v8::internal::Deoptimizer::GetOutputInfo cost 0.8% , After I optimize it , Oprofile shows it reduce the time cost.
On 2014/06/30 09:08:06, jianghua wrote: > On 2014/06/30 06:12:55, Sven Panne wrote: > > NOT LGTM: I don't think that anything deoptimizer-related is really > > performance-critical. (If we have benchmarks for this, I'd like too see them.) > > Furthermore, *if* we decide to improve this, we should really use > > std::binary_search: Making DeoptimizationOutputData a ForwardIterator is very > > easy, much less error-prone than writing the n-th binary search by hand and > much > > more readable and flexible. > > I used OProfile to take a look d8's hot functions. > > command : d8 run.js > > > here is result: > CPU: CPU with timer interrupt, speed 2472.74 MHz (estimated) > Profiling through timer interrupt > samples % symbol name > 1166 5.3772 v8::internal::UseInterval::start() const > 995 4.5886 v8::internal::LiveRange::first_interval() const > 523 2.4119 v8::internal::HSuccessorIterator::Done() > 501 2.3105 v8::internal::HandleScope::ZapRange(v8::internal::Object**, > v8::internal::Object**) > 468 2.1583 v8::internal::Object::Number() > 432 1.9923 v8::internal::Runtime_Interrupt(int, v8::internal::Object**, > v8::internal::Isolate*) > 382 1.7617 v8::internal::LAllocator::UnhandledIsSorted() > 355 1.6372 v8::internal::HBasicBlock::dominator() const > 348 1.6049 v8::internal::LiveRange::Start() const > 330 1.5219 v8::internal::FixedArray::get(int) > 314 1.4481 v8::internal::HeapObject::map_word() > 310 1.4296 v8::internal::Heap::ZapFromSpace() > 307 1.4158 v8::internal::List<v8::internal::LiveRange*, > v8::internal::ZoneAllocationPolicy>::operator[](int) const > 301 1.3881 v8::internal::LiveRange::IsEmpty() const > 297 1.3697 v8::internal::HBasicBlock::block_id() const > 246 1.1345 v8::internal::LifetimePosition::Value() const > 242 1.1160 v8::internal::DeoptimizationOutputData::AstId(int) > 237 1.0930 v8::internal::List<v8::internal::LiveRange*, > v8::internal::ZoneAllocationPolicy>::at(int) const > 221 1.0192 v8::internal::EnumSet<v8::internal::GVNFlag, > int>::Add(v8::internal::GVNFlag) > 206 0.9500 v8::internal::Internals::SmiValue(v8::internal::Object*) > 203 0.9362 v8::internal::SmiTagging<8ul>::SmiToInt(v8::internal::Object*) > 198 0.9131 v8::internal::HBasicBlock::end() const > 191 0.8808 v8::internal::Runtime_StringReplaceGlobalRegExpWithString(int, > v8::internal::Object**, v8::internal::Isolate*) > 189 0.8716 > v8::internal::HBasicBlock::Dominates(v8::internal::HBasicBlock*) const > 188 0.8670 v8::internal::Scanner::DoubleValue() > 181 0.8347 > v8::internal::Deoptimizer::GetOutputInfo(v8::internal::DeoptimizationOutputData*, > v8::internal::BailoutId, v8::internal::SharedFunctionInfo*) > 174 0.8024 v8::internal::Smi::value() > 148 0.6825 v8::internal::HTemplateControlInstruction<1, > 0>::SuccessorAt(int) > > > 0.8347 v8::internal::Deoptimizer::GetOutputInfo cost 0.8% , After I optimize it > , Oprofile shows it reduce the time cost. What's that run.js you are referring to?
On 2014/06/30 09:10:17, Yang wrote: > On 2014/06/30 09:08:06, jianghua wrote: > > On 2014/06/30 06:12:55, Sven Panne wrote: > > > NOT LGTM: I don't think that anything deoptimizer-related is really > > > performance-critical. (If we have benchmarks for this, I'd like too see > them.) > > > Furthermore, *if* we decide to improve this, we should really use > > > std::binary_search: Making DeoptimizationOutputData a ForwardIterator is > very > > > easy, much less error-prone than writing the n-th binary search by hand and > > much > > > more readable and flexible. > > > > I used OProfile to take a look d8's hot functions. > > > > command : d8 run.js > > > > > > here is result: > > CPU: CPU with timer interrupt, speed 2472.74 MHz (estimated) > > Profiling through timer interrupt > > samples % symbol name > > 1166 5.3772 v8::internal::UseInterval::start() const > > 995 4.5886 v8::internal::LiveRange::first_interval() const > > 523 2.4119 v8::internal::HSuccessorIterator::Done() > > 501 2.3105 v8::internal::HandleScope::ZapRange(v8::internal::Object**, > > v8::internal::Object**) > > 468 2.1583 v8::internal::Object::Number() > > 432 1.9923 v8::internal::Runtime_Interrupt(int, v8::internal::Object**, > > v8::internal::Isolate*) > > 382 1.7617 v8::internal::LAllocator::UnhandledIsSorted() > > 355 1.6372 v8::internal::HBasicBlock::dominator() const > > 348 1.6049 v8::internal::LiveRange::Start() const > > 330 1.5219 v8::internal::FixedArray::get(int) > > 314 1.4481 v8::internal::HeapObject::map_word() > > 310 1.4296 v8::internal::Heap::ZapFromSpace() > > 307 1.4158 v8::internal::List<v8::internal::LiveRange*, > > v8::internal::ZoneAllocationPolicy>::operator[](int) const > > 301 1.3881 v8::internal::LiveRange::IsEmpty() const > > 297 1.3697 v8::internal::HBasicBlock::block_id() const > > 246 1.1345 v8::internal::LifetimePosition::Value() const > > 242 1.1160 v8::internal::DeoptimizationOutputData::AstId(int) > > 237 1.0930 v8::internal::List<v8::internal::LiveRange*, > > v8::internal::ZoneAllocationPolicy>::at(int) const > > 221 1.0192 v8::internal::EnumSet<v8::internal::GVNFlag, > > int>::Add(v8::internal::GVNFlag) > > 206 0.9500 v8::internal::Internals::SmiValue(v8::internal::Object*) > > 203 0.9362 > v8::internal::SmiTagging<8ul>::SmiToInt(v8::internal::Object*) > > 198 0.9131 v8::internal::HBasicBlock::end() const > > 191 0.8808 > v8::internal::Runtime_StringReplaceGlobalRegExpWithString(int, > > v8::internal::Object**, v8::internal::Isolate*) > > 189 0.8716 > > v8::internal::HBasicBlock::Dominates(v8::internal::HBasicBlock*) const > > 188 0.8670 v8::internal::Scanner::DoubleValue() > > 181 0.8347 > > > v8::internal::Deoptimizer::GetOutputInfo(v8::internal::DeoptimizationOutputData*, > > v8::internal::BailoutId, v8::internal::SharedFunctionInfo*) > > 174 0.8024 v8::internal::Smi::value() > > 148 0.6825 v8::internal::HTemplateControlInstruction<1, > > 0>::SuccessorAt(int) > > > > > > 0.8347 v8::internal::Deoptimizer::GetOutputInfo cost 0.8% , After I optimize > it > > , Oprofile shows it reduce the time cost. > > What's that run.js you are referring to? oo, it's v8/benchmarks/run.js
On 2014/06/30 09:12:06, jianghua wrote: > On 2014/06/30 09:10:17, Yang wrote: > > On 2014/06/30 09:08:06, jianghua wrote: > > > On 2014/06/30 06:12:55, Sven Panne wrote: > > > > NOT LGTM: I don't think that anything deoptimizer-related is really > > > > performance-critical. (If we have benchmarks for this, I'd like too see > > them.) > > > > Furthermore, *if* we decide to improve this, we should really use > > > > std::binary_search: Making DeoptimizationOutputData a ForwardIterator is > > very > > > > easy, much less error-prone than writing the n-th binary search by hand > and > > > much > > > > more readable and flexible. > > > > > > I used OProfile to take a look d8's hot functions. > > > > > > command : d8 run.js > > > > > > > > > here is result: > > > CPU: CPU with timer interrupt, speed 2472.74 MHz (estimated) > > > Profiling through timer interrupt > > > samples % symbol name > > > 1166 5.3772 v8::internal::UseInterval::start() const > > > 995 4.5886 v8::internal::LiveRange::first_interval() const > > > 523 2.4119 v8::internal::HSuccessorIterator::Done() > > > 501 2.3105 > v8::internal::HandleScope::ZapRange(v8::internal::Object**, > > > v8::internal::Object**) > > > 468 2.1583 v8::internal::Object::Number() > > > 432 1.9923 v8::internal::Runtime_Interrupt(int, > v8::internal::Object**, > > > v8::internal::Isolate*) > > > 382 1.7617 v8::internal::LAllocator::UnhandledIsSorted() > > > 355 1.6372 v8::internal::HBasicBlock::dominator() const > > > 348 1.6049 v8::internal::LiveRange::Start() const > > > 330 1.5219 v8::internal::FixedArray::get(int) > > > 314 1.4481 v8::internal::HeapObject::map_word() > > > 310 1.4296 v8::internal::Heap::ZapFromSpace() > > > 307 1.4158 v8::internal::List<v8::internal::LiveRange*, > > > v8::internal::ZoneAllocationPolicy>::operator[](int) const > > > 301 1.3881 v8::internal::LiveRange::IsEmpty() const > > > 297 1.3697 v8::internal::HBasicBlock::block_id() const > > > 246 1.1345 v8::internal::LifetimePosition::Value() const > > > 242 1.1160 v8::internal::DeoptimizationOutputData::AstId(int) > > > 237 1.0930 v8::internal::List<v8::internal::LiveRange*, > > > v8::internal::ZoneAllocationPolicy>::at(int) const > > > 221 1.0192 v8::internal::EnumSet<v8::internal::GVNFlag, > > > int>::Add(v8::internal::GVNFlag) > > > 206 0.9500 v8::internal::Internals::SmiValue(v8::internal::Object*) > > > 203 0.9362 > > v8::internal::SmiTagging<8ul>::SmiToInt(v8::internal::Object*) > > > 198 0.9131 v8::internal::HBasicBlock::end() const > > > 191 0.8808 > > v8::internal::Runtime_StringReplaceGlobalRegExpWithString(int, > > > v8::internal::Object**, v8::internal::Isolate*) > > > 189 0.8716 > > > v8::internal::HBasicBlock::Dominates(v8::internal::HBasicBlock*) const > > > 188 0.8670 v8::internal::Scanner::DoubleValue() > > > 181 0.8347 > > > > > > v8::internal::Deoptimizer::GetOutputInfo(v8::internal::DeoptimizationOutputData*, > > > v8::internal::BailoutId, v8::internal::SharedFunctionInfo*) > > > 174 0.8024 v8::internal::Smi::value() > > > 148 0.6825 v8::internal::HTemplateControlInstruction<1, > > > 0>::SuccessorAt(int) > > > > > > > > > 0.8347 v8::internal::Deoptimizer::GetOutputInfo cost 0.8% , After I > optimize > > it > > > , Oprofile shows it reduce the time cost. > > > > What's that run.js you are referring to? > > oo, it's v8/benchmarks/run.js Are you sure you are measuring the release build? I see some functions in the profile that should not run in release build. (Debug build profiles are misleading - they perform expensive checks in ASSERTs, especially around the tricky parts of V8, such as deopt.)
On 2014/06/30 09:29:11, jarin wrote: > On 2014/06/30 09:12:06, jianghua wrote: > > On 2014/06/30 09:10:17, Yang wrote: > > > On 2014/06/30 09:08:06, jianghua wrote: > > > > On 2014/06/30 06:12:55, Sven Panne wrote: > > > > > NOT LGTM: I don't think that anything deoptimizer-related is really > > > > > performance-critical. (If we have benchmarks for this, I'd like too see > > > them.) > > > > > Furthermore, *if* we decide to improve this, we should really use > > > > > std::binary_search: Making DeoptimizationOutputData a ForwardIterator is > > > very > > > > > easy, much less error-prone than writing the n-th binary search by hand > > and > > > > much > > > > > more readable and flexible. > > > > > > > > I used OProfile to take a look d8's hot functions. > > > > > > > > command : d8 run.js > > > > > > > > > > > > here is result: > > > > CPU: CPU with timer interrupt, speed 2472.74 MHz (estimated) > > > > Profiling through timer interrupt > > > > samples % symbol name > > > > 1166 5.3772 v8::internal::UseInterval::start() const > > > > 995 4.5886 v8::internal::LiveRange::first_interval() const > > > > 523 2.4119 v8::internal::HSuccessorIterator::Done() > > > > 501 2.3105 > > v8::internal::HandleScope::ZapRange(v8::internal::Object**, > > > > v8::internal::Object**) > > > > 468 2.1583 v8::internal::Object::Number() > > > > 432 1.9923 v8::internal::Runtime_Interrupt(int, > > v8::internal::Object**, > > > > v8::internal::Isolate*) > > > > 382 1.7617 v8::internal::LAllocator::UnhandledIsSorted() > > > > 355 1.6372 v8::internal::HBasicBlock::dominator() const > > > > 348 1.6049 v8::internal::LiveRange::Start() const > > > > 330 1.5219 v8::internal::FixedArray::get(int) > > > > 314 1.4481 v8::internal::HeapObject::map_word() > > > > 310 1.4296 v8::internal::Heap::ZapFromSpace() > > > > 307 1.4158 v8::internal::List<v8::internal::LiveRange*, > > > > v8::internal::ZoneAllocationPolicy>::operator[](int) const > > > > 301 1.3881 v8::internal::LiveRange::IsEmpty() const > > > > 297 1.3697 v8::internal::HBasicBlock::block_id() const > > > > 246 1.1345 v8::internal::LifetimePosition::Value() const > > > > 242 1.1160 v8::internal::DeoptimizationOutputData::AstId(int) > > > > 237 1.0930 v8::internal::List<v8::internal::LiveRange*, > > > > v8::internal::ZoneAllocationPolicy>::at(int) const > > > > 221 1.0192 v8::internal::EnumSet<v8::internal::GVNFlag, > > > > int>::Add(v8::internal::GVNFlag) > > > > 206 0.9500 v8::internal::Internals::SmiValue(v8::internal::Object*) > > > > 203 0.9362 > > > v8::internal::SmiTagging<8ul>::SmiToInt(v8::internal::Object*) > > > > 198 0.9131 v8::internal::HBasicBlock::end() const > > > > 191 0.8808 > > > v8::internal::Runtime_StringReplaceGlobalRegExpWithString(int, > > > > v8::internal::Object**, v8::internal::Isolate*) > > > > 189 0.8716 > > > > v8::internal::HBasicBlock::Dominates(v8::internal::HBasicBlock*) const > > > > 188 0.8670 v8::internal::Scanner::DoubleValue() > > > > 181 0.8347 > > > > > > > > > > v8::internal::Deoptimizer::GetOutputInfo(v8::internal::DeoptimizationOutputData*, > > > > v8::internal::BailoutId, v8::internal::SharedFunctionInfo*) > > > > 174 0.8024 v8::internal::Smi::value() > > > > 148 0.6825 v8::internal::HTemplateControlInstruction<1, > > > > 0>::SuccessorAt(int) > > > > > > > > > > > > 0.8347 v8::internal::Deoptimizer::GetOutputInfo cost 0.8% , After I > > optimize > > > it > > > > , Oprofile shows it reduce the time cost. > > > > > > What's that run.js you are referring to? > > > > oo, it's v8/benchmarks/run.js > > Are you sure you are measuring the release build? I see some functions in the > profile that should not run in release build. (Debug build profiles are > misleading - they perform expensive checks in ASSERTs, especially around the > tricky parts of V8, such as deopt.) I closed macro DEBUG.
After discussing this with a colleague, you do seem to be measuring a build that fundamentally looks like debug build. In release mode, GetOutputInfo only gets called once per optimized code object. So you are replacing a O(ln(n) * m) with a O(ln(n) * n + m * ln(n)) algorithm (where n == number of deopt points and m == number of opt/deopts cycles per SharedFunctionInfo). For a small m--which something V8 strives for anyway--your patch likely makes the complexity and performance worse. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
