LogDog Archivist
================

The LogDog Archivist is tasked by the **Coordinator** with collecting log
stream data from **Intermediate Storage** (where it was deposited by the
**Collector**) and loading it into **Archival Storage**. It does this by
scanning through **Intermediate Storage** for consecutive log entries and
constructing archive files:

* The Logs file, consisting of RecordIO entries containing the
  `LogStreamDescriptor` protobuf followed by every `LogEntry` protobuf in the
  stream.
* The Index file, consisting of a `LogStreamDescriptor` RecordIO entry followed
  by a `LogIndex` protobuf entry.
* An optional Data file, consisting of the reassembled contiguous raw log stream
  data.

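As a rough illustration of the Logs file layout described above, the following
Go sketch frames a descriptor and a sequence of entries as length-prefixed
records. The uvarint length prefix is an assumption about how RecordIO frames
its records, the payloads are placeholder byte strings rather than real
`LogStreamDescriptor` or `LogEntry` protobufs, and the helper names
(`writeFrame`, `buildLogsFile`, `decodeLogsFile`) are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeFrame writes one record: a uvarint length prefix followed by the payload.
// The framing is an assumption made for this sketch.
func writeFrame(w io.Writer, payload []byte) error {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(payload)))
	if _, err := w.Write(hdr[:n]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// buildLogsFile emits the descriptor record first, then one record per entry.
func buildLogsFile(descriptor []byte, entries [][]byte) ([]byte, error) {
	var buf bytes.Buffer
	if err := writeFrame(&buf, descriptor); err != nil {
		return nil, err
	}
	for _, e := range entries {
		if err := writeFrame(&buf, e); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// decodeLogsFile reads records back until the input is exhausted.
func decodeLogsFile(data []byte) ([][]byte, error) {
	r := bufio.NewReader(bytes.NewReader(data))
	var frames [][]byte
	for {
		size, err := binary.ReadUvarint(r)
		if err == io.EOF {
			return frames, nil // clean end of file between records
		}
		if err != nil {
			return nil, err
		}
		payload := make([]byte, size)
		if _, err := io.ReadFull(r, payload); err != nil {
			return nil, err
		}
		frames = append(frames, payload)
	}
}

func main() {
	// Placeholder payloads standing in for serialized protobufs.
	data, err := buildLogsFile([]byte("desc"), [][]byte{[]byte("e0"), []byte("e1")})
	if err != nil {
		panic(err)
	}
	frames, err := decodeLogsFile(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, %d records\n", len(data), len(frames))
}
```

Under these assumptions the first record read back is always the descriptor,
so a reader can identify the stream before touching any entries.
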
These files are written into **Archival Storage** by the **Archivist** during
archival. After archival is complete, the **Archivist** notifies the
**Coordinator** and the log stream's state is updated.

**Archivist** microservices are designed to operate cooperatively as part of
a scalable cluster. Deploying additional **Archivist** instances increases
archival throughput linearly.

**Archivist** instances load the global LogDog configuration, and are
additionally configured via the `Archivist` configuration message in
[config.proto](../../../api/config/svcconfig/config.proto). Configuration
is loaded from the **Coordinator** and the **Configuration Service**.

## Staging

Archive files are initially written to a staging storage location. After the
archival successfully completes, the staged files are moved to their permanent
location using an inexpensive rename operation.

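The stage-then-rename pattern can be sketched in Go as follows. This is a
minimal illustration that uses the local filesystem as a stand-in for
**Archival Storage**. The directory layout, the file name `logstream.entries`,
and the helpers `archiveWithStaging` and `runExample` are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// archiveWithStaging writes data under stagingDir and moves it into finalDir
// only after the write has succeeded, so finalDir never holds a partial file.
func archiveWithStaging(stagingDir, finalDir, name string, data []byte) (string, error) {
	staged := filepath.Join(stagingDir, name)
	if err := os.WriteFile(staged, data, 0o644); err != nil {
		return "", err
	}
	final := filepath.Join(finalDir, name)
	// The rename is cheap when both directories share a filesystem.
	if err := os.Rename(staged, final); err != nil {
		return "", err
	}
	return final, nil
}

// runExample archives one placeholder file inside a temporary directory and
// reports the final contents and whether the staged copy is gone.
func runExample() (string, bool, error) {
	root, err := os.MkdirTemp("", "archivist-sketch")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(root)

	staging := filepath.Join(root, "staging")
	final := filepath.Join(root, "archive")
	for _, dir := range []string{staging, final} {
		if err := os.Mkdir(dir, 0o755); err != nil {
			return "", false, err
		}
	}

	path, err := archiveWithStaging(staging, final, "logstream.entries", []byte("log data"))
	if err != nil {
		return "", false, err
	}
	got, err := os.ReadFile(path)
	if err != nil {
		return "", false, err
	}
	_, statErr := os.Stat(filepath.Join(staging, "logstream.entries"))
	return string(got), os.IsNotExist(statErr), nil
}

func main() {
	got, stagedGone, err := runExample()
	if err != nil {
		panic(err)
	}
	fmt.Printf("final contents: %q, staged copy removed: %v\n", got, stagedGone)
}
```

If the write fails partway through, only the staging location is affected,
which is the property the staging step is meant to provide.
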
## Incomplete Logs

It is possible for log streams to be missing data at the time of archival. Each
archival request includes a completeness threshold. If the archival request is
younger than that threshold and the archival fails due to an error or
incompleteness, the request is returned to the queue for future processing.

If, however, the archival request is older than that threshold, the
**Archivist** executes a best-effort archival in which missing logs are not
considered errors. This gracefully skips over any missing log entries,
resulting in an incomplete log stream.

This threshold is configured using the `archive_settle_delay` option in the
`Coordinator` configuration message.
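
The decision described in this section can be sketched in Go as below. This is
an illustrative model, not the Archivist's actual code: the names
`onArchivalFailure` and `scan` are hypothetical, entry indices are assumed to
start at 0 and arrive in ascending order, and a request whose age exactly
equals the threshold is treated as old enough for best-effort archival, which
the text above does not specify.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type action int

const (
	requeue    action = iota // return the request to the queue
	bestEffort               // archive what exists, skipping gaps
)

var errMissing = errors.New("missing log entry")

// onArchivalFailure picks the follow-up for a failed or incomplete attempt.
func onArchivalFailure(age, threshold time.Duration) action {
	if age < threshold {
		return requeue
	}
	return bestEffort
}

// scan walks entry indices in ascending order. In strict mode the first gap
// is an error; otherwise gaps are counted and skipped.
func scan(indices []uint64, strict bool) (archived, gaps int, err error) {
	next := uint64(0)
	for _, idx := range indices {
		if idx != next {
			if strict {
				return archived, gaps, errMissing
			}
			gaps++
		}
		archived++
		next = idx + 1
	}
	return archived, gaps, nil
}

func main() {
	indices := []uint64{0, 1, 3, 4} // entry 2 is missing
	threshold := time.Hour          // hypothetical settle delay

	for _, age := range []time.Duration{10 * time.Minute, 2 * time.Hour} {
		if _, _, err := scan(indices, true); err == nil {
			fmt.Printf("age %v: archived completely\n", age)
			continue
		}
		switch onArchivalFailure(age, threshold) {
		case requeue:
			fmt.Printf("age %v: requeue for later\n", age)
		case bestEffort:
			n, gaps, _ := scan(indices, false)
			fmt.Printf("age %v: best effort, %d entries, %d gap(s)\n", age, n, gaps)
		}
	}
}
```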