|
|
Created:
5 years, 11 months ago by Robert Sesek Modified:
5 years, 11 months ago Reviewers:
Mark Mentovai CC:
crashpad-dev_chromium.org Base URL:
https://chromium.googlesource.com/crashpad/crashpad@master Target Ref:
refs/heads/master Project:
crashpad Visibility:
Public. |
DescriptionCreate CrashReportDatabase interface, a test, and a Mac implementation.
R=mark@chromium.org
TEST=client_test --gtest_filter=CrashReportDatabase\*
Committed: https://chromium.googlesource.com/crashpad/crashpad/+/ee9844975570e6f9e06451ed7971c3bdc21487a8
Patch Set 1 #
Total comments: 1
Patch Set 2 : #
Total comments: 74
Patch Set 3 : Address comments #Patch Set 4 : Fix 80cols #
Total comments: 24
Patch Set 5 : Address comments #
Messages
Total messages: 12 (4 generated)
https://codereview.chromium.org/842513002/diff/1/client/crash_report_database... File client/crash_report_database_test.cc (right): https://codereview.chromium.org/842513002/diff/1/client/crash_report_database... client/crash_report_database_test.cc:333: // Fail to upload one report. After this, things could be more thorough. I didn't want to write it, though, and I suspect you don't want to read it.
Patchset #2 (id:20001) has been deleted
Patchset #2 (id:40001) has been deleted
https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... File client/crash_report_database.h (right): https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:1: // Copyright 2015 The Crashpad Authors. All rights reserved. Top-level question: what do you do about reports that we collect while uploads are turned off? https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:33: //! All \a Report objects that are returned by this class are logically const. You don’t need the \a, this gets linked automatically and that’s all of the special formatting we need. Same on line 41. http://www.doxygen.org/manual/autolink.html#linkclass \a is best for parameters, which aren’t detected in prose normally, and need special treatment to be formatted differently. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:34: //! They are snapshots of the database at the time the query was ran, and the run https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:41: //! of the report itself. A \a CrashReportDatabase maintains at least this No \a, it gets linked automatically. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:44: Report() : uuid(), file_path(), id(), uploaded(false), I wouldn’t inline this. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:47: // A unique identifier by which this report will always be known to the //! doxygen me, throughout this struct. Looks like you meant to do that, because I see \a in here (which should be changed to #, read on…) https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:51: // The current location of the crash report on the client's filesystem. Your quotes aren’t too smart (but if you find it hard to type ’em smart, it’s OK.) https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:65: // been uploaded. If \a uploaded is true, then this timestamp is the time #uploaded https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:71: // a collection server. If this is more than zero, then \a #last_upload_attempt_time. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:107: //! \param[in] path A path to a writable directory, where the database can The database doesn’t have to be a directory as far as this interface is concerned, does it? https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:107: //! \param[in] path A path to a writable directory, where the database can It’s kinda weird for the interface to take the parent pathname instead of the actual pathname. Think about it from the perspective of other hypothetical implementations of this interface, like one that stuffs everything in a sqlite3 database. You’d probably want to say “use the crash report database at /home/rsesek/crashes” and if you were using the dir/xattr-based implementation, have that mean that /home/rsesek/crashes should be the directory that has the three subdirectories. If you were using the sqlite3 one, you’d probably want it to mean /home/rsesek/crashes.db. I don’t see a good reason to mandate a directory named “Crashpad Database” at the level of this interface or the dir/xattr-based implementation. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:110: //! \return A database object on success, NULL on failure with an error `nullptr` https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:120: //! \param[out] report A crash report record. …only valid if this returns OperationStatus::kNoError. (Right?) Same for several other methods. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:148: //! empty. must be empty on entry. Same for GetUploadedReports. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... File client/crash_report_database_mac.mm (right): https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:20: #include <sys/errno.h> Just <errno.h> https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:40: const char kWriteDirectory[] = "In Progress"; If I saw these three directories side-by-side on the disk, I wouldn’t know the difference between “reports” and “in progress.” All three directories contain reports. It’s not obvious that “in progress” should mean “being written” or “being uploaded” or “needs upload.” Also, spaceless lowercase names are easier to deal with from a shell for things that don’t really need to be user-visible. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:69: class CrashReportDatabaseMac : public CrashReportDatabase { Either this file or this class should have some comments explaining interesting bits of implementation in English, like the life-cycle of things on disk, what the flocks do, etc. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:139: if (mkdir(base_dir_.value().c_str(), S_IRWXU | S_IRWXG)) { Why are you giving group (but not other) permission (especially write)? I could see 0700 or 0755 here. Same on line 160. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:154: if ([file_manager fileExistsAtPath:base::SysUTF8ToNSString(path.value()) You’ve got this pattern twice now, maybe you want to extract it into a helper that ensures its path argument exists and is a directory, and attempts to create it otherwise. The safest way to implement it is, I think: if (mkdir(…) == 0) { return true; } else if (errno == EEXIST) { if (stat(...) == 0) { // For this, call stat(), not lstat() if (S_ISDIR(stat_buf.st_mode)) { return true; } LOG(ERROR) << "not a dir"; } else { PLOG(ERROR) << "stat"; } } else { PLOG(ERROR) << "mkdir"; } return false; How is it different than checking existence and type first? Atomicity. This way lets two guys come along and try to initialize the same directory simultaneously. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:167: // them. This attribute will never be read. Smart! https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:186: if (![manager createFileAtPath:base::SysUTF8ToNSString(placeholder.value()) Honestly, I was kinda hoping that the on-disk filenames would be a little friendlier, with a timestamp, pid, and maybe something in the crashy app’s control that Chrome would stuff with ptype, so that it’d be easy to look at a dump_dir and know quickly which crash was responsible for yesterday morning’s sad tab. But these are hopes and dreams. Let’s go with the UUIDs for now. They do smell like Breakpad, but we can revisit later. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:186: if (![manager createFileAtPath:base::SysUTF8ToNSString(placeholder.value()) This doesn’t have O_EXCL semantics, but it probably ought to. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:194: report->file_path = write_dir.Append( Why is there a separate placeholder file? Why don’t you just have a single file in the write_dir and have the client write to that? I think, ideally: 1. PrepareNewCrashReport() makes the new file with O_WRONLY | O_CREAT | O_EXCL | O_EXLOCK. 2. The dumpers write straight to this file. (Ideally, the client would just get a FD, not a path.) 3. FinishedWritingCrashReport() sets xattrs, renames it to the gotta-upload directory, and unlocks it. … 0. (for the future, no need to add it to this CL which is already long) On startup, maybe on a background thread, look at everything in the write_dir. Anything that’s in there and that isn’t locked (you can “trylock” with LOCK_EX | LOCK_NB) should be removed, because it’s a leftover turd from a dumper that never finished and no longer exists. It’s nice to clean up after ourselves. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:216: PLOG(ERROR) << "unlink " << placeholder.value(); For things that you’re tolerating, WARNING is probably better. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:223: if (![manager fileExistsAtPath:base::SysUTF8ToNSString(report_path.value()) I don’t know if you need any of these file-exists checks. The WriteXattr and rename would fail with obvious messages if the report isn’t there. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:260: return kDatabaseError; The interface documentation said that this return code is supposed to log something, but it doesn’t. (ReadReportMetadataLocked() never logs anything.) https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:287: ScopedFileLock lock(ObtainReportLock(report_path)); The file needs to stay locked while the uploader is trying to upload it, otherwise someone else can come along and try to upload it at the same time. This scoper here makes me think that the locking isn’t quite right. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:317: if (!WriteXattrTimeT(report_path, I’d set the time_t before doing the upload_attempts maintenance. The upload_attempts thing can fail if the existing upload_attempts xattr isn’t an int, and that’s fine, but at least you’ll have stored the time_t first in case anyone cares. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:332: [[NSFileManager defaultManager] enumeratorAtPath:path_ns_string]; Directory enumeration is kind of heavy. Why isn’t this just looking for the file at its two known locations? (That’d also make it easy to have a consistency check here: it shouldn’t exist in both locations.) https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:355: PCHECK(fd >= 0) << "open lock " << path.value(); CHECK is a little harsh here: 1. The filesystem might not support locks. OK, so maybe you’d say “we don’t support running on that filesystem right now, so doing anything other than a CHECK wouldn’t help.” Maybe. More importantly… 2. The whole point of the locking is to keep multiple invocations of this code from stepping on each other, but there’s no guarantee on anything about |path| until the lock is actually obtained. There isn’t even a guarantee that anything will continue to exist at |path|. So if two of these guys are working in the same directory at once, and one of them is actively working on a report and decides to rename it right as the other one tries to take a lock on it, then they’ll race and the second one could fail and abort. There’s another problem here: what happens when you do an open with O_EXLOCK? Does the kernel treat the open and lock as a single atomic operation, or does it open the file at its path and then block to take a lock? I think it’s the latter, and that’s also bad. Here’s why: if the same two guys are working in the same directory, and are trying to upload a report, the first one’s holding a lock and uploading the report, and then the second one comes along and also tries to take a lock. It blocks. The first guy then renames the file, let’s say because the upload succeeded, and it does some bookkeeping and unlocks it. Now the second guy wins the lock he was waiting for, but he’s under the mistaken impression that this report hasn’t been uploaded yet. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:359: bool CrashReportDatabaseMac::ReadReportMetadataLocked( This one can be static. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:366: if (!HasXattr(path, XattrName(kXattrCollectorID))) { HasXattr(), ReadXattr(), HasXattr(), ReadXattr(), and so on. “test for something” and “get something” is a sign that the interface isn’t doing all that it should. The xattr interface should have a way to handle this. It’d be better if you only had to call getxattr() once and were able to distinguish between success, ENOATTR, and other failures. Maybe it should return a 3-state enum. Perhaps this would render HasXattr() unnecessary altogether. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:394: CrashReportDatabaseMac::ReportsInDirectory( This one can be static too. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:402: error:&error]; Not really sure that this is any better than opendir()/readdir()/closedir(), since you need a loop to stick things into the vector anyway. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:415: LOG(ERROR) << "Failed to read report metadata for " Since you continue on past this, it’s more of a WARNING than an ERROR. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:427: return base::StringPrintf("com.google.crashpad.%s", name.data()); org.googlecode https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:437: NSFileManager* file_manager = [NSFileManager defaultManager]; I don’t think you need this check, because CrashReportDatabaseMac will do the same thing. https://codereview.chromium.org/842513002/diff/60001/util/mac/xattr.cc File util/mac/xattr.cc (right): https://codereview.chromium.org/842513002/diff/60001/util/mac/xattr.cc#newcode30 util/mac/xattr.cc:30: if (getxattr(file.value().c_str(), name.data(), nullptr, 0, 0, 0) < 0) { Especially since you just wind up calling getxattr() twice on the same file for the same attribute. Seems unnecessary. https://codereview.chromium.org/842513002/diff/60001/util/mac/xattr.h File util/mac/xattr.h (right): https://codereview.chromium.org/842513002/diff/60001/util/mac/xattr.h#newcode27 util/mac/xattr.h:27: //! \brief Tests if an extended attribute is present on a file. Meh. I’d rather the ReadXattr-family functions be extended to return a 3-state enum.
https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... File client/crash_report_database_mac.mm (right): https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:427: return base::StringPrintf("com.google.crashpad.%s", name.data()); I wrote: > org.googlecode com.googlecode, I mean.
Patchset #3 (id:80001) has been deleted
Patchset #3 (id:100001) has been deleted
https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... File client/crash_report_database.h (right): https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:1: // Copyright 2015 The Crashpad Authors. All rights reserved. On 2015/01/08 22:38:09, Mark Mentovai wrote: > Top-level question: what do you do about reports that we collect while uploads > are turned off? They can be retrieved with GetReportsNotUploaded(), and using the (newly added) creation time, the uploader can choose to upload recent ones. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:33: //! All \a Report objects that are returned by this class are logically const. On 2015/01/08 22:38:09, Mark Mentovai wrote: > You don’t need the \a, this gets linked automatically and that’s all of the > special formatting we need. Same on line 41. > http://www.doxygen.org/manual/autolink.html#linkclass > > \a is best for parameters, which aren’t detected in prose normally, and need > special treatment to be formatted differently. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:34: //! They are snapshots of the database at the time the query was ran, and the On 2015/01/08 22:38:09, Mark Mentovai wrote: > run Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:41: //! of the report itself. A \a CrashReportDatabase maintains at least this On 2015/01/08 22:38:09, Mark Mentovai wrote: > No \a, it gets linked automatically. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:44: Report() : uuid(), file_path(), id(), uploaded(false), On 2015/01/08 22:38:09, Mark Mentovai wrote: > I wouldn’t inline this. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:47: // A unique identifier by which this report will always be known to the On 2015/01/08 22:38:09, Mark Mentovai wrote: > //! doxygen me, throughout this struct. > > Looks like you meant to do that, because I see \a in here (which should be > changed to #, read on…) Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:51: // The current location of the crash report on the client's filesystem. On 2015/01/08 22:38:09, Mark Mentovai wrote: > Your quotes aren’t too smart (but if you find it hard to type ’em smart, it’s > OK.) Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:65: // been uploaded. If \a uploaded is true, then this timestamp is the time On 2015/01/08 22:38:09, Mark Mentovai wrote: > #uploaded Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:71: // a collection server. If this is more than zero, then \a On 2015/01/08 22:38:09, Mark Mentovai wrote: > #last_upload_attempt_time. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:107: //! \param[in] path A path to a writable directory, where the database can On 2015/01/08 22:38:09, Mark Mentovai wrote: > The database doesn’t have to be a directory as far as this interface is > concerned, does it? No. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:107: //! \param[in] path A path to a writable directory, where the database can On 2015/01/08 22:38:09, Mark Mentovai wrote: > It’s kinda weird for the interface to take the parent pathname instead of the > actual pathname. > > Think about it from the perspective of other hypothetical implementations of > this interface, like one that stuffs everything in a sqlite3 database. You’d > probably want to say “use the crash report database at /home/rsesek/crashes” and > if you were using the dir/xattr-based implementation, have that mean that > /home/rsesek/crashes should be the directory that has the three subdirectories. > If you were using the sqlite3 one, you’d probably want it to mean > /home/rsesek/crashes.db. > > I don’t see a good reason to mandate a directory named “Crashpad Database” at > the level of this interface or the dir/xattr-based implementation. I disagree. The caller does not know the structure nor name of the database, so it should pass the directory to where it wants the database set up. In Chrome, this would be the user data dir, for example. Even if the database were SQLite-backed, the database may still require a directory for files. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:110: //! \return A database object on success, NULL on failure with an error On 2015/01/08 22:38:09, Mark Mentovai wrote: > `nullptr` Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:120: //! \param[out] report A crash report record. On 2015/01/08 22:38:09, Mark Mentovai wrote: > …only valid if this returns OperationStatus::kNoError. (Right?) Same for several > other methods. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database.h:148: //! empty. On 2015/01/08 22:38:09, Mark Mentovai wrote: > must be empty on entry. > > Same for GetUploadedReports. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... File client/crash_report_database_mac.mm (right): https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:20: #include <sys/errno.h> On 2015/01/08 22:38:10, Mark Mentovai wrote: > Just <errno.h> Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:40: const char kWriteDirectory[] = "In Progress"; On 2015/01/08 22:38:10, Mark Mentovai wrote: > If I saw these three directories side-by-side on the disk, I wouldn’t know the > difference between “reports” and “in progress.” All three directories contain > reports. It’s not obvious that “in progress” should mean “being written” or > “being uploaded” or “needs upload.” > > Also, spaceless lowercase names are easier to deal with from a shell for things > that don’t really need to be user-visible. This seems self-contradictory to say that it should be obvious yet it's also not user-visible. But changed the names. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:69: class CrashReportDatabaseMac : public CrashReportDatabase { On 2015/01/08 22:38:10, Mark Mentovai wrote: > Either this file or this class should have some comments explaining interesting > bits of implementation in English, like the life-cycle of things on disk, what > the flocks do, etc. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:139: if (mkdir(base_dir_.value().c_str(), S_IRWXU | S_IRWXG)) { On 2015/01/08 22:38:10, Mark Mentovai wrote: > Why are you giving group (but not other) permission (especially write)? I could > see 0700 or 0755 here. Same on line 160. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:154: if ([file_manager fileExistsAtPath:base::SysUTF8ToNSString(path.value()) On 2015/01/08 22:38:10, Mark Mentovai wrote: > You’ve got this pattern twice now, maybe you want to extract it into a helper > that ensures its path argument exists and is a directory, and attempts to create > it otherwise. > > The safest way to implement it is, I think: > > if (mkdir(…) == 0) { > return true; > } else if (errno == EEXIST) { > if (stat(...) == 0) { // For this, call stat(), not lstat() > if (S_ISDIR(stat_buf.st_mode)) { > return true; > } > LOG(ERROR) << "not a dir"; > } else { > PLOG(ERROR) << "stat"; > } > } else { > PLOG(ERROR) << "mkdir"; > } > return false; > > How is it different than checking existence and type first? Atomicity. This way > lets two guys come along and try to initialize the same directory > simultaneously. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:186: if (![manager createFileAtPath:base::SysUTF8ToNSString(placeholder.value()) On 2015/01/08 22:38:10, Mark Mentovai wrote: > Honestly, I was kinda hoping that the on-disk filenames would be a little > friendlier, with a timestamp, pid, and maybe something in the crashy app’s > control that Chrome would stuff with ptype, so that it’d be easy to look at a > dump_dir and know quickly which crash was responsible for yesterday morning’s > sad tab. > > But these are hopes and dreams. Let’s go with the UUIDs for now. They do smell > like Breakpad, but we can revisit later. I'm not sure the PID is useful. Timestamp definitely would be, but that'd be recorded in the filesystem. I hadn't thought a user-supplied string. But it seems like displaying these reports would be better handled by the embedding application (e.g. with chrome://crashes). https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:194: report->file_path = write_dir.Append( On 2015/01/08 22:38:10, Mark Mentovai wrote: > Why is there a separate placeholder file? Why don’t you just have a single file > in the write_dir and have the client write to that? > > I think, ideally: > > 1. PrepareNewCrashReport() makes the new file with O_WRONLY | O_CREAT | O_EXCL | > O_EXLOCK. > 2. The dumpers write straight to this file. (Ideally, the client would just get > a FD, not a path.) > 3. FinishedWritingCrashReport() sets xattrs, renames it to the gotta-upload > directory, and unlocks it. > … > 0. (for the future, no need to add it to this CL which is already long) On > startup, maybe on a background thread, look at everything in the write_dir. > Anything that’s in there and that isn’t locked (you can “trylock” with LOCK_EX | > LOCK_NB) should be removed, because it’s a leftover turd from a dumper that > never finished and no longer exists. It’s nice to clean up after ourselves. I'm glad you suggested returning the FD. I went back and forth on whether this should return the FileHandle or not, so will happily make it so. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:223: if (![manager fileExistsAtPath:base::SysUTF8ToNSString(report_path.value()) On 2015/01/08 22:38:10, Mark Mentovai wrote: > I don’t know if you need any of these file-exists checks. The WriteXattr and > rename would fail with obvious messages if the report isn’t there. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:260: return kDatabaseError; On 2015/01/08 22:38:10, Mark Mentovai wrote: > The interface documentation said that this return code is supposed to log > something, but it doesn’t. (ReadReportMetadataLocked() never logs anything.) ReadXattr and friends, do, though. Do you want more explicit logging than that? https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:287: ScopedFileLock lock(ObtainReportLock(report_path)); On 2015/01/08 22:38:10, Mark Mentovai wrote: > The file needs to stay locked while the uploader is trying to upload it, > otherwise someone else can come along and try to upload it at the same time. > This scoper here makes me think that the locking isn’t quite right. Done. Now the caller has a two-phase call like with generating a new crash report. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:317: if (!WriteXattrTimeT(report_path, On 2015/01/08 22:38:10, Mark Mentovai wrote: > I’d set the time_t before doing the upload_attempts maintenance. The > upload_attempts thing can fail if the existing upload_attempts xattr isn’t an > int, and that’s fine, but at least you’ll have stored the time_t first in case > anyone cares. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:332: [[NSFileManager defaultManager] enumeratorAtPath:path_ns_string]; On 2015/01/08 22:38:09, Mark Mentovai wrote: > Directory enumeration is kind of heavy. Why isn’t this just looking for the file > at its two known locations? > > (That’d also make it easy to have a consistency check here: it shouldn’t exist > in both locations.) Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:355: PCHECK(fd >= 0) << "open lock " << path.value(); On 2015/01/08 22:38:10, Mark Mentovai wrote: > CHECK is a little harsh here: > > 1. The filesystem might not support locks. OK, so maybe you’d say “we don’t > support running on that filesystem right now, so doing anything other than a > CHECK wouldn’t help.” Maybe. > > More importantly… > > 2. The whole point of the locking is to keep multiple invocations of this code > from stepping on each other, but there’s no guarantee on anything about |path| > until the lock is actually obtained. There isn’t even a guarantee that anything > will continue to exist at |path|. So if two of these guys are working in the > same directory at once, and one of them is actively working on a report and > decides to rename it right as the other one tries to take a lock on it, then > they’ll race and the second one could fail and abort. > > There’s another problem here: what happens when you do an open with O_EXLOCK? > Does the kernel treat the open and lock as a single atomic operation, or does it > open the file at its path and then block to take a lock? I think it’s the > latter, and that’s also bad. Here’s why: if the same two guys are working in the > same directory, and are trying to upload a report, the first one’s holding a > lock and uploading the report, and then the second one comes along and also > tries to take a lock. It blocks. The first guy then renames the file, let’s say > because the upload succeeded, and it does some bookkeeping and unlocks it. Now > the second guy wins the lock he was waiting for, but he’s under the mistaken > impression that this report hasn’t been uploaded yet. 1) Callers now check the result of ObtainReportLock() and return an error if that it fails. 2) I don't think the atomicity of open with O_EXLOCK is actually relevant, but the blocking nature is. In the scenario you outline, the process would error out when reading extended attributes because the file would have moved, but it's wrong to force this (indirect) sanity check on the caller. I switched this open to be O_NONBLOCK, which, when combined with the new result checking, causes this to error out early in a safe manner. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:359: bool CrashReportDatabaseMac::ReadReportMetadataLocked( On 2015/01/08 22:38:09, Mark Mentovai wrote: > This one can be static. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:366: if (!HasXattr(path, XattrName(kXattrCollectorID))) { On 2015/01/08 22:38:10, Mark Mentovai wrote: > HasXattr(), ReadXattr(), HasXattr(), ReadXattr(), and so on. > > “test for something” and “get something” is a sign that the interface isn’t > doing all that it should. > > The xattr interface should have a way to handle this. It’d be better if you only > had to call getxattr() once and were able to distinguish between success, > ENOATTR, and other failures. Maybe it should return a 3-state enum. > > Perhaps this would render HasXattr() unnecessary altogether. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:394: CrashReportDatabaseMac::ReportsInDirectory( On 2015/01/08 22:38:09, Mark Mentovai wrote: > This one can be static too. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:402: error:&error]; On 2015/01/08 22:38:10, Mark Mentovai wrote: > Not really sure that this is any better than opendir()/readdir()/closedir(), > since you need a loop to stick things into the vector anyway. I don't have to manage the DIR* or handle dot-dirs. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:415: LOG(ERROR) << "Failed to read report metadata for " On 2015/01/08 22:38:09, Mark Mentovai wrote: > Since you continue on past this, it’s more of a WARNING than an ERROR. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:427: return base::StringPrintf("com.google.crashpad.%s", name.data()); On 2015/01/09 14:26:09, Mark Mentovai wrote: > I wrote: > > org.googlecode > > com.googlecode, I mean. Done. https://codereview.chromium.org/842513002/diff/60001/client/crash_report_data... client/crash_report_database_mac.mm:437: NSFileManager* file_manager = [NSFileManager defaultManager]; On 2015/01/08 22:38:10, Mark Mentovai wrote: > I don’t think you need this check, because CrashReportDatabaseMac will do the > same thing. Done.
https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... File client/crash_report_database.h (right): https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:105: //! \brief The operation could not be completed because a concurrent A concurrent what? https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:128: //! written. Only valid if this returns #kNoError. The caller should not should→must. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:137: //! the file at Record::file_path. I don’t know Record::, only Report::. Lines 177 and 197 too. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:163: virtual OperationStatus GetNotUploadedReports( The only real major problem I see with this now is that there’s really only a distinction between “not uploaded” and “uploaded.” In practice, I think that these terms will be confusing: We shouldn’t upload reports for with uploads turned off. If they subsequently turn uploads on, we shouldn’t upload reports that we collected while uploads were turned off. The reports should graduate to a “done” state somewhat rapidly. You’ve got a comment in the .cc file indicating that this is how it works, but I don’t think we should keep everything confused by perpetuating the “uploaded” naming. When uploads are on, we shouldn’t keep attempting to upload a report indefinitely. After a certain number of attempts, we should give up. These, too, should graduate to a “done” state. We’ve got three possible top-level states: - report is being written - report has been written - report is “done” In the done state, a report can have a server ID attribute if it was uploaded successfully, some error attribute if it wasn’t ever able to be uploaded after a number of attempts, or no attribute at all (or a different attribute) if it was set aside without an upload attempt due to user preference. These three states map pretty well to the directories you’ve already got, which should maybe be named “tmp”, (“pending” or “written”), and “done”. More importantly, the naming needs to bubble through to the method names like these. GetNotUploadedReports(), this method, is almost OK, except it makes it sound like the report is definitely going to be uploaded, which may not be the case if the user preference is to not do so. GetReportForUploading() makes it even worse. And if a report did graduate from pending to done without upload, GetUploadedReports() doesn’t really do what it says. Tying it all up: I expect the client of your class will be responsible for “graduating” things between the pending and done states. Your class doesn’t need to be involved in the decision-making process of “are uploads desired?” or “should I give up trying to upload this report?” The client of your class will be in charge of these decisions. All that CrashReportDatabase needs to know is that the client might want to move something from “pending” to “done” without a successful upload and crash ID. This has implications for [the reanmed] RecordUploadAttempt(). https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:178: //! the returned Record object should be disposed of via a call to Strengthen this to “must.” https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:181: //! A subsequent call to this method with the same \a uuid is illegal if And since it’s a “must”, the “if x() has not been called first” can become “until x() has been called”. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... File client/crash_report_database_mac.mm (right): https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:71: using ScopedFileLock = base::ScopedGeneric<int, ScopedFileLockTraits>; You can just do “using ScopedFileLock = base::ScopedFD” and get rid of the duplicate traits implementation. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:82: if (stat(path.value().c_str(), &st)) { != 0 for clarity, otherwise these calls that return 0==false for success read pretty oddly. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:185: //! \return The operatino status code. spelling https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:236: 0644)); Memory dumps can contain sensitive data, let’s be more conservative here and use 0600. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:256: if (fcntl(handle, F_GETPATH, path_buf)) { First: != 0 as above. More improtantly: system calls involving MAXPATHLEN/PATH_MAX are among my least favorite, because MAXPATHLEN isn’t actually the maximum length of a path. We knew the path in PrepareNewCrashReport(), can we define a new struct that carries both the FD and the path, and use that as the baton between PrepareNewCrashReport() and FinishedWritinCrashReport(). https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:262: ScopedFileLock lock(handle); This should be the first thing in the method, regardless of how it winds up figuring out the path to operate on. Otherwise, we could have an FD leak. Comment that this takes both ownership of the lock established in the open by O_EXLOCK as well as ownership of the FD itself.
https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... File client/crash_report_database.h (right): https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:105: //! \brief The operation could not be completed because a concurrent On 2015/01/21 18:01:35, Mark Mentovai wrote: > A concurrent what? Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:128: //! written. Only valid if this returns #kNoError. The caller should not On 2015/01/21 18:01:35, Mark Mentovai wrote: > should→must. Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:137: //! the file at Record::file_path. On 2015/01/21 18:01:35, Mark Mentovai wrote: > I don’t know Record::, only Report::. Lines 177 and 197 too. Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:163: virtual OperationStatus GetNotUploadedReports( On 2015/01/21 18:01:35, Mark Mentovai wrote: > The only real major problem I see with this now is that there’s really only a > distinction between “not uploaded” and “uploaded.” In practice, I think that > these terms will be confusing: > > We shouldn’t upload reports for with uploads turned off. If they subsequently > turn uploads on, we shouldn’t upload reports that we collected while uploads > were turned off. The reports should graduate to a “done” state somewhat rapidly. > You’ve got a comment in the .cc file indicating that this is how it works, but I > don’t think we should keep everything confused by perpetuating the > “uploaded” naming. > > When uploads are on, we shouldn’t keep attempting to upload a report > indefinitely. After a certain number of attempts, we should give up. These, too, > should graduate to a “done” state. > > We’ve got three possible top-level states: > > - report is being written > - report has been written > - report is “done” > > In the done state, a report can have a server ID attribute if it was uploaded > successfully, some error attribute if it wasn’t ever able to be uploaded after a > number of attempts, or no attribute at all (or a different attribute) if it was > set aside without an upload attempt due to user preference. > > These three states map pretty well to the directories you’ve already got, which > should maybe be named “tmp”, (“pending” or “written”), and “done”. More > importantly, the naming needs to bubble through to the method names like these. > GetNotUploadedReports(), this method, is almost OK, except it makes it sound > like the report is definitely going to be uploaded, which may not be the case if > the user preference is to not do so. GetReportForUploading() makes it even > worse. And if a report did graduate from pending to done without upload, > GetUploadedReports() doesn’t really do what it says. > > Tying it all up: I expect the client of your class will be responsible for > “graduating” things between the pending and done states. Your class doesn’t need > to be involved in the decision-making process of “are uploads desired?” or > “should I give up trying to upload this report?” The client of your class will > be in charge of these decisions. All that CrashReportDatabase needs to know is > that the client might want to move something from “pending” to “done” without a > successful upload and crash ID. This has implications for [the reanmed] > RecordUploadAttempt(). Done. I've added a new SkipReportUpload() method and added some commentary about report lifecycle (new -> pending -> completed). https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:178: //! the returned Record object should be disposed of via a call to On 2015/01/21 18:01:35, Mark Mentovai wrote: > Strengthen this to “must.” Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database.h:181: //! A subsequent call to this method with the same \a uuid is illegal if On 2015/01/21 18:01:35, Mark Mentovai wrote: > And since it’s a “must”, the “if x() has not been called first” can become > “until x() has been called”. Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... File client/crash_report_database_mac.mm (right): https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:71: using ScopedFileLock = base::ScopedGeneric<int, ScopedFileLockTraits>; On 2015/01/21 18:01:35, Mark Mentovai wrote: > You can just do “using ScopedFileLock = base::ScopedFD” and get rid of the > duplicate traits implementation. Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:82: if (stat(path.value().c_str(), &st)) { On 2015/01/21 18:01:35, Mark Mentovai wrote: > != 0 for clarity, otherwise these calls that return 0==false for success read > pretty oddly. Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:185: //! \return The operatino status code. On 2015/01/21 18:01:35, Mark Mentovai wrote: > spelling Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:236: 0644)); On 2015/01/21 18:01:35, Mark Mentovai wrote: > Memory dumps can contain sensitive data, let’s be more conservative here and use > 0600. Done. That's why I originally had the directory mode not be o+r. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:256: if (fcntl(handle, F_GETPATH, path_buf)) { On 2015/01/21 18:01:35, Mark Mentovai wrote: > First: != 0 as above. > > More improtantly: system calls involving MAXPATHLEN/PATH_MAX are among my least > favorite, because MAXPATHLEN isn’t actually the maximum length of a path. We > knew the path in PrepareNewCrashReport(), can we define a new struct that > carries both the FD and the path, and use that as the baton between > PrepareNewCrashReport() and FinishedWritinCrashReport(). Done. https://codereview.chromium.org/842513002/diff/140001/client/crash_report_dat... client/crash_report_database_mac.mm:262: ScopedFileLock lock(handle); On 2015/01/21 18:01:35, Mark Mentovai wrote: > This should be the first thing in the method, regardless of how it winds up > figuring out the path to operate on. Otherwise, we could have an FD leak. > > Comment that this takes both ownership of the lock established in the open by > O_EXLOCK as well as ownership of the FD itself. Done.
Great! Let’s go with this. If we need any other changes, we should just iterate on it. LGTM
Message was sent while issue was closed.
Committed patchset #5 (id:160001) manually as ee9844975570e6f9e06451ed7971c3bdc21487a8 (presubmit successful). |