|
|
Descriptioncrypto: Load libchaps.so with RTLD_DEEPBIND
This fixes the component=shared_library build on Chrome OS. Previously,
the TPM token initialization would crash the browser because libchaps.so
loads a 2nd, incompatible copy of the protobuf-lite library into the
browser.
The crash happens because by default the global scope takes precedence,
so symbols in the bundled copy shadow symbols in the system copy. This
can lead to accessing the wrong object or calling the wrong function.
The usual symptom is a crash during static initialization of libchaps.so.
RTL_DEEPBIND rearranges the scope for libchaps.so so that the system
library takes precedence instead. The scope for chrome is unaffected.
Some other possibilities were considered:
- Unbundling the library. This would cause us to lose our local
modifications to libprotobuf on Chrome OS.
- Fixing the soname so that the bundled copy is used by chaps. This
doesn't even link because the bundled code is not ABI compatible
with upstream.
- Statically linking libchaps.so. This is tricky because we need
position-independent static libs for its deps, and we don't support
building those currently. It will also cause bloat.
This hack is minimally invasive and allows use of shared library builds
on Chrome OS devices.
BUG=175508
TEST=cros chrome-sdk --board=link
GYP_DEFINES="$GYP_DEFINES component=shared_library" gclient runhooks
ninja -C out_link/Release chrome chrome-sandbox
deploy_chrome --board link --build-dir out_link/Release --to $IP
Log into the system. No crash.
TPM-backed user certs show up in chrome://certificate-manager.
Committed: https://crrev.com/ba254c4e4a69cfa48871ab483fd3dc9c98b3582c
Cr-Commit-Position: refs/heads/master@{#329743}
Patch Set 1 #Patch Set 2 : move inside if (!tpm_args->chaps_module) {} #Patch Set 3 : move to class #
Total comments: 1
Patch Set 4 : s/Drop ours.// #Messages
Total messages: 23 (3 generated)
spang@chromium.org changed reviewers: + dkrahn@chromium.org, rsleevi@chromium.org
The shared_library build has been broken on Chrome OS for at least 2 years. I'd like to get the ball rolling on some kind of solution to this problem because it really saves a lot of time linking. chaps has been the only blocker for those 2 years. This patch is one way out. It's not very high road, but it works.
On 2015/04/29 19:28:38, spang wrote: > The shared_library build has been broken on Chrome OS for at least 2 years. I'd > like to get the ball rolling on some kind of solution to this problem because it > really saves a lot of time linking. > > chaps has been the only blocker for those 2 years. > > This patch is one way out. It's not very high road, but it works. Does RTLD_DEEPBIND have side effects that would be bad for other PKCS #11 modules? If not, would patching LoadModule to always use RTLD_DEEPBIND be an option?
On 2015/04/30 00:05:45, Darren Krahn wrote: > Does RTLD_DEEPBIND have side effects that would be bad for other PKCS #11 > modules? I mean, it seems like it. A PKCS#11 module is only supposed to export 1 and exactly 1 symbol, and bringing in other symbols from the library seems like a bad time. Wouldn't a .def file for the SO so that it only exported certain symbols resolve this? Am I missing something (caveat: I know very little about Linux linkers, beyond enough to bash them to submission) > If not, would patching LoadModule to always use RTLD_DEEPBIND be an > option?
On 2015/04/30 00:21:16, Ryan Sleevi wrote: > On 2015/04/30 00:05:45, Darren Krahn wrote: > > Does RTLD_DEEPBIND have side effects that would be bad for other PKCS #11 > > modules? I think if a module breaks from this, it'd have to be doing something pretty unusual. > > I mean, it seems like it. A PKCS#11 module is only supposed to export 1 and > exactly 1 symbol, and bringing in other symbols from the library seems like a > bad time. It's not the PKCS#11 module causing the conflict, but rather a library it depends on (libprotobuf). The protobuf public functions obviously have to be exported in order to be used. On static_library builds, we still have two copies of protobuf in the browser process, but only the system copy exports its symbols. So even without RTLD_DEEPBIND, chaps resolves symbols from the system copy as expected and everything works fine. On shared_library builds, both copies of libprotobuf necessarily export their symbols. Symbol resolution always prefers the chrome copy unless you specify RTLD_DEEPBIND, so chaps erroneously link to the bundled version, which is not only a different version but has also been modified incompatibly. So it crashes. > > Wouldn't a .def file for the SO so that it only exported certain symbols resolve > this? Am I missing something (caveat: I know very little about Linux linkers, > beyond enough to bash them to submission) > > > If not, would patching LoadModule to always use RTLD_DEEPBIND be an > > option? I tried to keep the scope as narrow as possible in this CL - it only affects the build that currently always crashes. We could try a bigger change if that's preferable.
On 2015/04/30 01:02:18, spang wrote: > It's not the PKCS#11 module causing the conflict, but rather a library it > depends on (libprotobuf). The protobuf public functions obviously have to be > exported in order to be used. I guess put differently, isn't an alternative, viable solution that libchaps statically links its dependency? This is how it's solved for other PKCS#11 modules.
On 2015/04/30 01:04:36, Ryan Sleevi wrote: > On 2015/04/30 01:02:18, spang wrote: > > It's not the PKCS#11 module causing the conflict, but rather a library it > > depends on (libprotobuf). The protobuf public functions obviously have to be > > exported in order to be used. > > I guess put differently, isn't an alternative, viable solution that libchaps > statically links its dependency? > This is how it's solved for other PKCS#11 modules. Yes, if we statically link libprotobuf-lite from chaps there will be no problem.
On 2015/04/30 01:06:50, spang wrote: > On 2015/04/30 01:04:36, Ryan Sleevi wrote: > > On 2015/04/30 01:02:18, spang wrote: > > > It's not the PKCS#11 module causing the conflict, but rather a library it > > > depends on (libprotobuf). The protobuf public functions obviously have to be > > > exported in order to be used. > > > > I guess put differently, isn't an alternative, viable solution that libchaps > > statically links its dependency? > > > This is how it's solved for other PKCS#11 modules. > > Yes, if we statically link libprotobuf-lite from chaps there will be no problem. Should we go with that approach? There will obviously be some space cost but otherwise I don't think there's any issues.
On 2015/05/04 18:42:20, spang wrote: > Should we go with that approach? There will obviously be some space cost but > otherwise I don't think there's any issues. ^ question for Darren :)
On 2015/05/05 01:10:10, Ryan Sleevi wrote: > On 2015/05/04 18:42:20, spang wrote: > > Should we go with that approach? There will obviously be some space cost but > > otherwise I don't think there's any issues. > > > ^ question for Darren :) I have no problem with this approach. As long as it works...
On 2015/05/05 17:54:22, Darren Krahn wrote: > On 2015/05/05 01:10:10, Ryan Sleevi wrote: > > On 2015/05/04 18:42:20, spang wrote: > > > Should we go with that approach? There will obviously be some space cost but > > > otherwise I don't think there's any issues. > > > > > > ^ question for Darren :) > > I have no problem with this approach. As long as it works... I looked more into this approach & managed to get a static relocatable libprotobuf-lite built and linked into libchaps.so. However, there are more problems. For one, there's a 2nd indirect dependency on libprotobuf-lite via libbase-dbus. For another, pulling in libbase is itself a problem. We shouldn't have two versions of chromium base/ in scope, either. We could build static relocatable copies of all of these objects, but then chaps will contain a third copy of all of them. This is not a usual build config so turning it on in all the necessary places is a bit tricky. Also, chaps will get quite big if it embeds a third copy of all these objects. So we have painted ourselves into a corner here - chaps is pulling a large number of duplicate libraries into chrome, including parts of chrome itself (base::). Could we land this chrome change? It works around all of these issues neatly.
On 2015/05/12 20:59:30, spang wrote: > On 2015/05/05 17:54:22, Darren Krahn wrote: > > On 2015/05/05 01:10:10, Ryan Sleevi wrote: > > > On 2015/05/04 18:42:20, spang wrote: > > > > Should we go with that approach? There will obviously be some space cost > but > > > > otherwise I don't think there's any issues. > > > > > > > > > ^ question for Darren :) > > > > I have no problem with this approach. As long as it works... > > I looked more into this approach & managed to get a static relocatable > libprotobuf-lite built and linked into libchaps.so. > > However, there are more problems. For one, there's a 2nd indirect dependency on > libprotobuf-lite via libbase-dbus. For another, pulling in libbase is itself a > problem. We shouldn't have two versions of chromium base/ in scope, either. > > We could build static relocatable copies of all of these objects, but then chaps > will contain a third copy of all of them. This is not a usual build config so > turning it on in all the necessary places is a bit tricky. Also, chaps will get > quite big if it embeds a third copy of all these objects. > > So we have painted ourselves into a corner here - chaps is pulling a large > number of duplicate libraries into chrome, including parts of chrome itself > (base::). > > Could we land this chrome change? It works around all of these issues neatly. I think either way is fine. Loading chaps is ChromeOS-specific so loading it in a way that works well on ChromeOS seems reasonable. LGTM
I'm really not keen for hacks, which this feels like, certainly in approach. This would equally be a problem for platforms that want to load Chaps outside of ChromeOS (e.g. Linux). And I honestly have no idea what sort of special hell this will introduce w/r/t component-builds, so I find it easier to say "not supported" than to chase down issues here :/ I think for me to stamp this, a couple concrete suggestions: - Move the dlopen/dlclose/#ifdefs into a helper class - Have the helper class "do something" if COMPONENT_BUILD && OS_CHROMEOS - Have the helper class noop otherwise e.g. #if defined(COMPONENT_BUILD) && defined(OS_CHROMEOS) class Foo { public: Foo() { dlopen(...); } ~Foo() { dlclose(...); } private: void* chaps_handle; DISALLOW_COPY_AND_ASSIGN(Foo); }; #else class Foo { }; #endif
On 2015/05/12 21:31:56, Ryan Sleevi wrote: > I'm really not keen for hacks, which this feels like, certainly in approach. Fully agreed, and it's marked as such in the code. The problem is organizational: there are two software distributions at play: Chrome and Chrome OS. Both ship different versions of libraries. Chrome OS even cleaves bits from Chrome and ships them separately from Chrome. It's a big headache, but there's no purely technical solution to it. > > This would equally be a problem for platforms that want to load Chaps outside of > ChromeOS (e.g. Linux). And I honestly have no idea what sort of special hell > this will introduce w/r/t component-builds, so I find it easier to say "not > supported" than to chase down issues here :/ I don't think unsupported is really satisfactory. Chrome is such a behemoth that doing a static_library build basically breaks the compile/edit/debug flow. As to how Chrome eng test on Chrome OS when its so slow, experience shows that they generally don't. That's a shame. Having an incremental workflow that is actually usable might help that. > > I think for me to stamp this, a couple concrete suggestions: > - Move the dlopen/dlclose/#ifdefs into a helper class > - Have the helper class "do something" if COMPONENT_BUILD && OS_CHROMEOS > - Have the helper class noop otherwise On it. > > e.g. > #if defined(COMPONENT_BUILD) && defined(OS_CHROMEOS) > class Foo { > public: > Foo() { dlopen(...); } > ~Foo() { dlclose(...); } > > private: > void* chaps_handle; > > DISALLOW_COPY_AND_ASSIGN(Foo); > }; > #else > class Foo { > }; > #endif
This sounds similar to some BoringSSL/system-OpenSSL issues I was worried we might start having if our dependencies are enthusiastic enough about linking in system OpenSSL. I don't have a good answer for this apart from "this is why we can never ever ship the components build for real" and "shared libraries are the worst". Linux is especially bad at symbols. (I believe every other platform can scope symbols by soname which avoids much of this nonsense.) At least without RTLD_DEEPBIND, all this globalness means the set of shared objects that a module links in is fundamentally part of the public interface of that module, which is pretty terrible. What does RTLD_DEEPBIND do exactly? The man page says "Place the lookup scope of the symbols in this library ahead of the global scope. This means that a self-contained library will use its own symbols in preference to global symbols with the same name contained in libraries that have already been loaded. This flag is not specified in POSIX.1-2001.". I gather that means that symbols within libchaps.so will use the libraries it pulls in to resolve symbols. That's definitely half of what we want. What does that do to Chromium's symbol lookup? If Chromium were then to resolve a symbol (or dlopen some other library which tries to resolve a symbol), would it also prefer libchaps.so's versions. That one isn't desirable.
On 2015/05/12 21:46:22, David Benjamin wrote: > This sounds similar to some BoringSSL/system-OpenSSL issues I was worried we > might start having if our dependencies are enthusiastic enough about linking in > system OpenSSL. > > I don't have a good answer for this apart from "this is why we can never ever > ship the components build for real" and "shared libraries are the worst". Linux > is especially bad at symbols. (I believe every other platform can scope symbols > by soname which avoids much of this nonsense.) At least without RTLD_DEEPBIND, > all this globalness means the set of shared objects that a module links in is > fundamentally part of the public interface of that module, which is pretty > terrible. > > What does RTLD_DEEPBIND do exactly? The man page says "Place the lookup scope of > the symbols in this library ahead of the global scope. This means that a > self-contained library will use its own symbols in preference to global symbols > with the same name contained in libraries that have already been loaded. This > flag is not specified in POSIX.1-2001.". I gather that means that symbols within > libchaps.so will use the libraries it pulls in to resolve symbols. That's > definitely half of what we want. Either way, it creates a new local scope and puts the target object and its dependencies into it. Without RTLD_DEEPBIND, the global scope is consulted first for symbol lookup. With RTLD_DEEPBIND, the new local scope is consulted first for symbol lookup. > > What does that do to Chromium's symbol lookup? If Chromium were then to resolve > a symbol (or dlopen some other library which tries to resolve a symbol), would > it also prefer libchaps.so's versions. That one isn't desirable. When you pass RTLD_LOCAL dlopen won't make any change to the global (chrome's) scope. So with or without RTLD_DEEPBIND symbol resolution from chrome and its dependencies are unaffected.
On 2015/05/12 21:31:56, Ryan Sleevi wrote: > I'm really not keen for hacks, which this feels like, certainly in approach. > > This would equally be a problem for platforms that want to load Chaps outside of > ChromeOS (e.g. Linux). And I honestly have no idea what sort of special hell > this will introduce w/r/t component-builds, so I find it easier to say "not > supported" than to chase down issues here :/ > > I think for me to stamp this, a couple concrete suggestions: > - Move the dlopen/dlclose/#ifdefs into a helper class > - Have the helper class "do something" if COMPONENT_BUILD && OS_CHROMEOS > - Have the helper class noop otherwise Done. > > e.g. > #if defined(COMPONENT_BUILD) && defined(OS_CHROMEOS) > class Foo { > public: > Foo() { dlopen(...); } > ~Foo() { dlclose(...); } > > private: > void* chaps_handle; > > DISALLOW_COPY_AND_ASSIGN(Foo); > }; > #else > class Foo { > }; > #endif
lgtm https://codereview.chromium.org/1111373003/diff/40001/crypto/nss_util.cc File crypto/nss_util.cc (right): https://codereview.chromium.org/1111373003/diff/40001/crypto/nss_util.cc#newc... crypto/nss_util.cc:295: // LoadModule() will have taken a 2nd reference. Drop ours. s/Drop ours.// (In general, I grumble about pronouns, but it's also a comment that doesn't add too much)
The CQ bit was checked by spang@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dkrahn@chromium.org, rsleevi@chromium.org Link to the patchset: https://codereview.chromium.org/1111373003/#ps60001 (title: "s/Drop ours.//")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1111373003/60001
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/ba254c4e4a69cfa48871ab483fd3dc9c98b3582c Cr-Commit-Position: refs/heads/master@{#329743} |