|
|
DescriptionIt derives the dng_host and overrides PerformAreaTask() to split the task into several sub-tasks which get added to SkTaskGroup.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1634763002
Committed: https://skia.googlesource.com/skia/+/b84b5b42c1d848aa2b87216b9bda2b8c9e5781c1
Patch Set 1 #Patch Set 2 : Fixes the tile organization. #
Total comments: 10
Patch Set 3 : Uses lambda instead of bind now. #Patch Set 4 : Simplifies the thread area calculation. #
Total comments: 2
Patch Set 5 : Addressed comment from adaubert. #
Total comments: 23
Patch Set 6 : Addressed a couple of comments. #Patch Set 7 : Differentiates between internal sub-tasks and threads. #Patch Set 8 : Synced with internal code. #Patch Set 9 : Resolves comments #Messages
Total messages: 30 (9 generated)
Description was changed from ========== WIP: This is how I like to do the multithreaded dng host in principal. Unfortunately the dm results in scrambled images. Can that happen due to my use of SkTaskGroup? Please comment and guide me. BUG=skia: ========== to ========== WIP: This is how I like to do the multithreaded dng host in principal. Unfortunately the dm results in scrambled images. Can that happen due to my use of SkTaskGroup? Please comment and guide me. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
Description was changed from ========== WIP: This is how I like to do the multithreaded dng host in principal. Unfortunately the dm results in scrambled images. Can that happen due to my use of SkTaskGroup? Please comment and guide me. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== WIP: This is how I like to do the multithreaded dng host in principal. Unfortunately the dm results in scrambled images. Can that happen due to my use of SkTaskGroup? Please comment and guide me. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
ebrauer@google.com changed reviewers: + msarett@google.com, scroggo@google.com
PTAL
ebrauer@google.com changed reviewers: + djsollen@google.com
PTAL. As far as I can see with dm it is working now.
mtklein@google.com changed reviewers: + mtklein@google.com
Are you sure this is a situation that's best handled by multithreading, as opposed to improving the single-threaded performance of that library? https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:71: SkTaskGroup threadPool; This isn't really an independent thread pool; it represents a group of tasks that are multiplexed onto a global threadpool. Creating it does not create a thread pool, nor does destroying it destroy any threads. You should be fine to break your problem up into as many small parts as necessary. It'll usually perform better if you add more tasks to the task group than there are cpu threads, to allow for uneven task runtime. https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:87: std::function<void()> func = std::bind(&dng_area_task::ProcessOnThread, &task, Generally I'd rather see a lambda for this than std::bind. You can usually pass them right into SkTaskGroup.add() without giving them a name.
On 2016/01/25 19:50:59, mtklein wrote: > Are you sure this is a situation that's best handled by multithreading, as > opposed to improving the single-threaded performance of that library? Not totally, but the multithreading part reduces the rendering time about 50% and is easy to do. > https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cpp > File src/codec/SkRawCodec.cpp (right): > > https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:71: SkTaskGroup threadPool; > This isn't really an independent thread pool; it represents a group of tasks > that are multiplexed onto a global threadpool. Creating it does not create a > thread pool, nor does destroying it destroy any threads. True, but the threadpool itself is hidden in SkTaskGroup. However a better name is taskGroup. > > You should be fine to break your problem up into as many small parts as > necessary. It'll usually perform better if you add more tasks to the task group > than there are cpu threads, to allow for uneven task runtime. Only constraint is that the dng task has a fix maximum that I must not exceed. > > https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:87: std::function<void()> func = > std::bind(&dng_area_task::ProcessOnThread, &task, > Generally I'd rather see a lambda for this than std::bind. > You can usually pass them right into SkTaskGroup.add() without giving them a > name. I am not familiar with lambdas. Could you help me with this?
https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:89: threadPool.add(func); Something like, threadPool.add([&task, this, taskIndex, threadArea, tileSize] { task.ProcessOnThread(taskIndex, threadArea, tileSize, this->Sniffer()); });
"> Are you sure this is a situation that's best handled by multithreading, as > opposed to improving the single-threaded performance of that library? Not totally, but the multithreading part reduces the rendering time about 50% and is easy to do." I have a similar concern. I believe a common Photos/Gallery app use case would be to decode many images at once. In that case, would it make sense to maximize single-threaded performance and allow a higher level user to perform the decodes in multiple threads? Can you point me to your performance test? It wouldn't surprise me if adding threads makes a single image decode faster, but I think the results hold more weight if we're testing a larger use case end to end. Have you talked to any of the Photos people (maybe judds@) about their opinion on multi-threading single image decodes? If they're in favor, maybe this is something we should consider for all of our codecs? In general, I think it's a little strange to have some codecs that are optimized for single thread performance and others that are multi-threaded.
https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:306: SkAutoTDelete<SkDngHost> host(fHost.release()); This change is not necessary. https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:399: SkAutoTDelete<SkDngHost> fHost; This change is not necessary.
On 2016/01/25 21:49:00, msarett wrote: > "> Are you sure this is a situation that's best handled by multithreading, as > > opposed to improving the single-threaded performance of that library? > Not totally, but the multithreading part reduces the rendering time about 50% > and is easy to do." > > I have a similar concern. > > I believe a common Photos/Gallery app use case would be to decode many images at > once. In that case, would it make sense to maximize single-threaded performance > and allow a higher level user to perform the decodes in multiple threads? > > Can you point me to your performance test? It wouldn't surprise me if adding > threads makes a single image decode faster, but I think the results hold more > weight if we're testing a larger use case end to end. > > Have you talked to any of the Photos people (maybe judds@) about their opinion > on multi-threading single image decodes? If they're in favor, maybe this is > something we should consider for all of our codecs? In general, I think it's a > little strange to have some codecs that are optimized for single thread > performance and others that are multi-threaded. We have benchmarks for Linux and manually tested on Android N with the Files app. In both cases we have experienced a speedup of 2x. (See CL/112784559) I can see your concern with this approach, but the code was not dramatically slow when we decode a smaller preview, but on the full size rendering it took a long time. I think the general use case is to decode a small preview to show the image in e.g. the Files app and to render the full image if the user tries to open the image in e.g. Fotos, for which our multithreading approach on a single image seems to be a good option. No, I haven't talked to Sam about this, but it makes sense to know what they prefer.
https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:71: SkTaskGroup threadPool; On 2016/01/25 19:50:59, mtklein wrote: > This isn't really an independent thread pool; it represents a group of tasks > that are multiplexed onto a global threadpool. Creating it does not create a > thread pool, nor does destroying it destroy any threads. > > You should be fine to break your problem up into as many small parts as > necessary. It'll usually perform better if you add more tasks to the task group > than there are cpu threads, to allow for uneven task runtime. Changed the name. The dng_task has a maximum we must not exceed. https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:87: std::function<void()> func = std::bind(&dng_area_task::ProcessOnThread, &task, On 2016/01/25 19:50:59, mtklein wrote: > Generally I'd rather see a lambda for this than std::bind. > You can usually pass them right into SkTaskGroup.add() without giving them a > name. Done. https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:89: threadPool.add(func); On 2016/01/25 21:02:01, mtklein wrote: > Something like, > > threadPool.add([&task, this, taskIndex, threadArea, tileSize] { > task.ProcessOnThread(taskIndex, threadArea, tileSize, this->Sniffer()); > }); Done. https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:306: SkAutoTDelete<SkDngHost> host(fHost.release()); On 2016/01/26 09:26:37, adaubert wrote: > This change is not necessary. Done. https://codereview.chromium.org/1634763002/diff/20001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:399: SkAutoTDelete<SkDngHost> fHost; On 2016/01/26 09:26:38, adaubert wrote: > This change is not necessary. Done.
https://codereview.chromium.org/1634763002/diff/60001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/60001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:101: ++taskIndex; ++taskIndex; should be in the inner scope.
https://codereview.chromium.org/1634763002/diff/60001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/60001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:101: ++taskIndex; On 2016/01/26 14:20:24, adaubert wrote: > ++taskIndex; should be in the inner scope. Done.
“We have benchmarks for Linux and manually tested on Android N with the Files app. In both cases we have experienced a speedup of 2x. (See CL/112784559)“ Can you link to this CL? I’m not really sure what you’re referring to. “I can see your concern with this approach, but the code was not dramatically slow when we decode a smaller preview, but on the full size rendering it took a long time. I think the general use case is to decode a small preview to show the image in e.g. the Files app and to render the full image if the user tries to open the image in e.g. Fotos, for which our multithreading approach on a single image seems to be a good option.” I’m not sure I really understand what you’re saying here. My understanding is that we will always decode the preview (using Piex) when possible. And when it’s not possible, we will always decode the full image (using dng_sdk). So we cannot choose whether to decode a preview or the full image depending on the situation? We choose depending on if Piex supports the image? I do agree that the multi-threading approach is more practical when we know that we are only decoding a single image. And that use case is also important. In general, I’m feeling better about this, given that the dng_sdk seems to be designed to encourage multi-threaded decodes. Maybe this should have some influence on how we choose to design SkCodec? Still 32 (or 8) threads seems like a lot, if we are assuming that clients will be multi-threading decodes at the BitmapFactory level. Of course, I am just speculating. It’d be great if we could test how this impacts multiple decodes in parallel using BitmapFactory. Is there a way to parallelize your test using the Files app? https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:44: return dng_point((areaSize.v + tileSize.v - 1) / tileSize.v, nit: This is non-critical, but can we use a ceil_div() helper function? SkGifCodec actually already has one. Can we share it in SkCodecPriv.h? https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:60: if (tilesInThread.h < tilesInArea.h) { So, when possible, we will always split into tiles horizontally (as much as possible) before splitting into tiles vertically? Is there a reason for this? Or does it not matter? https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:65: ThrowProgramError("num_tiles_per_thread calculation is wrong."); Can we signal an error here without throwing an exception? (Or maybe not, because this is used in overriding a function in the dng_sdk?) If it's code that will never be reached, you might just omit the else case? Or maybe more Skia-like: SkASSERT(false); return {0,0}; https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:76: const int maxThreads = static_cast<int>(kMaxMPThreads); So this is 32 on 64-bit systems and 8 on 32-bit systems? Can you explain why we need to get this value from the dng_sdk? https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:79: const dng_point tileSize(task.FindTileSize(area)); Can you add a comment explaining how the dng_sdk determines the tile size? Side note: It would be useful for me to know some typical values of "area" and "tileSize" (and how those compare to the image size). I've looked at this function in dng_sdk and find it a little confusing. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:108: uint32 PerformAreaTaskThreads() override { I'm not sure how this is used, but the comments in dng_host.h make it seem like this should return the actual number of threads (rather than the max).
On 2016/01/26 21:34:03, msarett wrote: > “We have benchmarks for Linux and manually tested on Android N with the Files > app. In both cases we have experienced a speedup of 2x. (See CL/112784559)“ > > Can you link to this CL? I’m not really sure what you’re referring to. I was referring to this cl: https://critique.corp.google.com/#review/112784559 > > “I can see your concern with this approach, but the code was not dramatically > slow when we decode a smaller preview, but on the full size rendering it took a > long time. I think the general use case is to decode a small preview to show > the image in e.g. the Files app and to render the full image if the user tries > to open the image in e.g. Fotos, for which our multithreading approach on a > single image seems to be a good option.” > > I’m not sure I really understand what you’re saying here. My understanding is > that we will always decode the preview (using Piex) when possible. And when > it’s not possible, we will always decode the full image (using dng_sdk). So we > cannot choose whether to decode a preview or the full image depending on the > situation? We choose depending on if Piex supports the image? Just wanted to say that DNGs from Android do not contain preview images. Yes, we then use the dng sdk. However it is possible to render small, not full size images that are faster. > > I do agree that the multi-threading approach is more practical when we know that > we are only decoding a single image. And that use case is also important. > > > In general, I’m feeling better about this, given that the dng_sdk seems to be > designed to encourage multi-threaded decodes. Maybe this should have some > influence on how we choose to design SkCodec? Basically the SkTaskGroup and its threadpool could also influence the amount of parallel tasks. > > Still 32 (or 8) threads seems like a lot, if we are assuming that clients will > be multi-threading decodes at the BitmapFactory level. Of course, I am just > speculating. It’d be great if we could test how this impacts multiple decodes > in parallel using BitmapFactory. Is there a way to parallelize your test using > the Files app? Lets see. I can not react from home on this task. > > https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cpp > File src/codec/SkRawCodec.cpp (right): > > https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:44: return dng_point((areaSize.v + tileSize.v - 1) / > tileSize.v, > nit: > > This is non-critical, but can we use a ceil_div() helper function? SkGifCodec > actually already has one. Can we share it in SkCodecPriv.h? > > https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:60: if (tilesInThread.h < tilesInArea.h) { > So, when possible, we will always split into tiles horizontally (as much as > possible) before splitting into tiles vertically? Yes, that is the goal here. > > Is there a reason for this? Or does it not matter? The reason was a better locality mboehme asked for in that cl I have referenced above. > > https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:65: ThrowProgramError("num_tiles_per_thread calculation > is wrong."); > Can we signal an error here without throwing an exception? (Or maybe not, > because this is used in overriding a function in the dng_sdk?) > > If it's code that will never be reached, you might just omit the else case? > > Or maybe more Skia-like: > SkASSERT(false); > return {0,0}; > > https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:76: const int maxThreads = > static_cast<int>(kMaxMPThreads); > So this is 32 on 64-bit systems and 8 on 32-bit systems? > > Can you explain why we need to get this value from the dng_sdk? > > https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:79: const dng_point tileSize(task.FindTileSize(area)); > Can you add a comment explaining how the dng_sdk determines the tile size? > > Side note: It would be useful for me to know some typical values of "area" and > "tileSize" (and how those compare to the image size). I've looked at this > function in dng_sdk and find it a little confusing. > > https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... > src/codec/SkRawCodec.cpp:108: uint32 PerformAreaTaskThreads() override { > I'm not sure how this is used, but the comments in dng_host.h make it seem like > this should return the actual number of threads (rather than the max).
https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:44: return dng_point((areaSize.v + tileSize.v - 1) / tileSize.v, On 2016/01/26 21:34:03, msarett wrote: > nit: > > This is non-critical, but can we use a ceil_div() helper function? SkGifCodec > actually already has one. Can we share it in SkCodecPriv.h? Would it be Ok to add a Fixit? https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:60: if (tilesInThread.h < tilesInArea.h) { On 2016/01/26 21:34:03, msarett wrote: > So, when possible, we will always split into tiles horizontally (as much as > possible) before splitting into tiles vertically? > > Is there a reason for this? Or does it not matter? I think it's preferable to make them as wide as possible -- i.e. to increase tiles_in_thread.h as long as possible while keeping tiles_in_thread.v at 1 -- because it gives better locality of reference. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:65: ThrowProgramError("num_tiles_per_thread calculation is wrong."); On 2016/01/26 21:34:03, msarett wrote: > Can we signal an error here without throwing an exception? (Or maybe not, > because this is used in overriding a function in the dng_sdk?) > > If it's code that will never be reached, you might just omit the else case? > > Or maybe more Skia-like: > SkASSERT(false); > return {0,0}; It might be reached on program errors or invalid data. This prevents the function from producing erroneous results if the caller should ever pass in max_threads == 0. I think I like the assert. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:76: const int maxThreads = static_cast<int>(kMaxMPThreads); On 2016/01/26 21:34:03, msarett wrote: > So this is 32 on 64-bit systems and 8 on 32-bit systems? > > Can you explain why we need to get this value from the dng_sdk? I think we should usetask.MaxThreads(), because the task that's passed in to PerformAreaTask() may request to be processed on a smaller number of threads, though it seems to be either 8 or 32. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:79: const dng_point tileSize(task.FindTileSize(area)); On 2016/01/26 21:34:03, msarett wrote: > Can you add a comment explaining how the dng_sdk determines the tile size? > > Side note: It would be useful for me to know some typical values of "area" and > "tileSize" (and how those compare to the image size). I've looked at this > function in dng_sdk and find it a little confusing. area is typically something like (4000, 3000) while tileSize is (256, 256). I have always seen tileSize to be that which is maxTileSize in the dng sdk. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:108: uint32 PerformAreaTaskThreads() override { On 2016/01/26 21:34:03, msarett wrote: > I'm not sure how this is used, but the comments in dng_host.h make it seem like > this should return the actual number of threads (rather than the max). The documentation for PerformAreaTaskThreads() isn't completely clear, but my understanding is that it's supposed to return the maximum number of threads that PerformAreaTask() is potentially able to use. (This also makes sense given that PerformAreaTaskThreads() is used in multiple places to check whether multithreading is possible or not.)
The CQ bit was checked by adaubert@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1634763002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1634763002/120001
Note for Reviewers: The CQ is waiting for an approval. If you believe that the CL is not ready yet, or if you would like to L-G-T-M with comments then please uncheck the CQ checkbox. Waiting for LGTM from valid reviewer(s) till 2016-01-27 19:25 UTC
Just a few nits. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:60: if (tilesInThread.h < tilesInArea.h) { On 2016/01/26 22:50:58, ebrauer wrote: > On 2016/01/26 21:34:03, msarett wrote: > > So, when possible, we will always split into tiles horizontally (as much as > > possible) before splitting into tiles vertically? > > > > Is there a reason for this? Or does it not matter? > > I think it's preferable to make them as wide as possible -- i.e. to increase > tiles_in_thread.h as long as possible while keeping tiles_in_thread.v at 1 -- > because it gives better locality of reference. Great! Can you add a comment explaining this? https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:65: ThrowProgramError("num_tiles_per_thread calculation is wrong."); On 2016/01/26 22:50:59, ebrauer wrote: > On 2016/01/26 21:34:03, msarett wrote: > > Can we signal an error here without throwing an exception? (Or maybe not, > > because this is used in overriding a function in the dng_sdk?) > > > > If it's code that will never be reached, you might just omit the else case? > > > > Or maybe more Skia-like: > > SkASSERT(false); > > return {0,0}; > It might be reached on program errors or invalid data. This prevents the > function from producing erroneous results if the caller should ever pass in > max_threads == 0. I think I like the assert. Sorry for being unclear here. If reaching this code indicates programmer error, then let's use SkASSERT. This will assert in Debug mode and do nothing in Release mode. If reaching this code might indicate invalid input data, we need to indicate that the decode failed. AFAICT, the only way to do this is using an exception? That we will catch in SkRawCodec and return a failure? https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:76: const int maxThreads = static_cast<int>(kMaxMPThreads); On 2016/01/26 22:50:59, ebrauer wrote: > On 2016/01/26 21:34:03, msarett wrote: > > So this is 32 on 64-bit systems and 8 on 32-bit systems? > > > > Can you explain why we need to get this value from the dng_sdk? > > I think we should usetask.MaxThreads(), because the task that's passed in to > PerformAreaTask() may request to be processed on a smaller number of threads, > though it seems to be either 8 or 32. sgtm. Thanks for the comment. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:79: const dng_point tileSize(task.FindTileSize(area)); On 2016/01/26 22:50:58, ebrauer wrote: > On 2016/01/26 21:34:03, msarett wrote: > > Can you add a comment explaining how the dng_sdk determines the tile size? > > > > Side note: It would be useful for me to know some typical values of "area" and > > "tileSize" (and how those compare to the image size). I've looked at this > > function in dng_sdk and find it a little confusing. > > area is typically something like (4000, 3000) while tileSize is (256, 256). I > have always seen tileSize to be that which is maxTileSize in the dng sdk. Can you add a comment that is something like this? // tileSize is typically 256x256. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:108: uint32 PerformAreaTaskThreads() override { On 2016/01/26 22:50:58, ebrauer wrote: > On 2016/01/26 21:34:03, msarett wrote: > > I'm not sure how this is used, but the comments in dng_host.h make it seem > like > > this should return the actual number of threads (rather than the max). > > The documentation for PerformAreaTaskThreads() isn't completely clear, but my > understanding is that it's supposed to return the maximum number of threads that > PerformAreaTask() is potentially able to use. (This also makes sense given that > PerformAreaTaskThreads() is used in multiple places to check whether > multithreading is possible or not.) Yeah I noticed that they mostly compare it to 1, so we definitely need to return a value greater than 1. They also seem to use it to determine the size of the AreaTask. I don't have insight into how this might matter, but I think the code looks fine as is.
https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cpp File src/codec/SkRawCodec.cpp (right): https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:44: return dng_point((areaSize.v + tileSize.v - 1) / tileSize.v, On 2016/01/26 22:50:59, ebrauer wrote: > On 2016/01/26 21:34:03, msarett wrote: > > nit: > > > > This is non-critical, but can we use a ceil_div() helper function? SkGifCodec > > actually already has one. Can we share it in SkCodecPriv.h? > > Would it be Ok to add a Fixit? Done. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:60: if (tilesInThread.h < tilesInArea.h) { On 2016/01/27 15:34:42, msarett wrote: > On 2016/01/26 22:50:58, ebrauer wrote: > > On 2016/01/26 21:34:03, msarett wrote: > > > So, when possible, we will always split into tiles horizontally (as much as > > > possible) before splitting into tiles vertically? > > > > > > Is there a reason for this? Or does it not matter? > > > > I think it's preferable to make them as wide as possible -- i.e. to increase > > tiles_in_thread.h as long as possible while keeping tiles_in_thread.v at 1 -- > > because it gives better locality of reference. > > Great! Can you add a comment explaining this? Done. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:65: ThrowProgramError("num_tiles_per_thread calculation is wrong."); On 2016/01/27 15:34:42, msarett wrote: > On 2016/01/26 22:50:59, ebrauer wrote: > > On 2016/01/26 21:34:03, msarett wrote: > > > Can we signal an error here without throwing an exception? (Or maybe not, > > > because this is used in overriding a function in the dng_sdk?) > > > > > > If it's code that will never be reached, you might just omit the else case? > > > > > > Or maybe more Skia-like: > > > SkASSERT(false); > > > return {0,0}; > > It might be reached on program errors or invalid data. This prevents the > > function from producing erroneous results if the caller should ever pass in > > max_threads == 0. I think I like the assert. > > Sorry for being unclear here. > > If reaching this code indicates programmer error, then let's use SkASSERT. This > will assert in Debug mode and do nothing in Release mode. > > If reaching this code might indicate invalid input data, we need to indicate > that the decode failed. AFAICT, the only way to do this is using an exception? > That we will catch in SkRawCodec and return a failure? Discussed this with Anton and we agreed that it is better to be safe here and throw. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:76: const int maxThreads = static_cast<int>(kMaxMPThreads); On 2016/01/27 15:34:42, msarett wrote: > On 2016/01/26 22:50:59, ebrauer wrote: > > On 2016/01/26 21:34:03, msarett wrote: > > > So this is 32 on 64-bit systems and 8 on 32-bit systems? > > > > > > Can you explain why we need to get this value from the dng_sdk? > > > > I think we should usetask.MaxThreads(), because the task that's passed in to > > PerformAreaTask() may request to be processed on a smaller number of threads, > > though it seems to be either 8 or 32. > > sgtm. Thanks for the comment. Done. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:79: const dng_point tileSize(task.FindTileSize(area)); On 2016/01/27 15:34:42, msarett wrote: > On 2016/01/26 22:50:58, ebrauer wrote: > > On 2016/01/26 21:34:03, msarett wrote: > > > Can you add a comment explaining how the dng_sdk determines the tile size? > > > > > > Side note: It would be useful for me to know some typical values of "area" > and > > > "tileSize" (and how those compare to the image size). I've looked at this > > > function in dng_sdk and find it a little confusing. > > > > area is typically something like (4000, 3000) while tileSize is (256, 256). I > > have always seen tileSize to be that which is maxTileSize in the dng sdk. > > Can you add a comment that is something like this? > > // tileSize is typically 256x256. Done. https://codereview.chromium.org/1634763002/diff/80001/src/codec/SkRawCodec.cp... src/codec/SkRawCodec.cpp:108: uint32 PerformAreaTaskThreads() override { On 2016/01/27 15:34:42, msarett wrote: > On 2016/01/26 22:50:58, ebrauer wrote: > > On 2016/01/26 21:34:03, msarett wrote: > > > I'm not sure how this is used, but the comments in dng_host.h make it seem > > like > > > this should return the actual number of threads (rather than the max). > > > > The documentation for PerformAreaTaskThreads() isn't completely clear, but my > > understanding is that it's supposed to return the maximum number of threads > that > > PerformAreaTask() is potentially able to use. (This also makes sense given > that > > PerformAreaTaskThreads() is used in multiple places to check whether > > multithreading is possible or not.) > > Yeah I noticed that they mostly compare it to 1, so we definitely need to return > a value greater than 1. > > They also seem to use it to determine the size of the AreaTask. > > I don't have insight into how this might matter, but I think the code looks fine > as is. Acknowledged.
Description was changed from ========== WIP: This is how I like to do the multithreaded dng host in principal. Unfortunately the dm results in scrambled images. Can that happen due to my use of SkTaskGroup? Please comment and guide me. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== It derives the dng_host and overrides PerformAreaTask() to split the task into several sub-tasks which get added to SkTaskGroup. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
lgtm :)
The CQ bit was checked by ebrauer@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1634763002/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1634763002/160001
Message was sent while issue was closed.
Description was changed from ========== It derives the dng_host and overrides PerformAreaTask() to split the task into several sub-tasks which get added to SkTaskGroup. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== It derives the dng_host and overrides PerformAreaTask() to split the task into several sub-tasks which get added to SkTaskGroup. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... Committed: https://skia.googlesource.com/skia/+/b84b5b42c1d848aa2b87216b9bda2b8c9e5781c1 ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as https://skia.googlesource.com/skia/+/b84b5b42c1d848aa2b87216b9bda2b8c9e5781c1 |