OLD | NEW |
(Empty) | |
| 1 gRPC Connectivity Semantics and API |
| 2 =================================== |
| 3 |
| 4 This document describes the connectivity semantics for gRPC channels and the |
| 5 corresponding impact on RPCs. We then discuss an API. |
| 6 |
| 7 States of Connectivity |
| 8 ---------------------- |
| 9 |
| 10 gRPC Channels provide the abstraction over which clients can communicate with |
| 11 servers. The client-side channel object can be constructed using little more |
| 12 than a DNS name. Channels encapsulate a range of functionality including name |
| 13 resolution, establishing a TCP connection (with retries and backoff) and TLS |
| 14 handshakes. Channels can also handle errors on established connections and |
| 15 reconnect, or in the case of HTTP/2 GOAWAY, re-resolve the name and reconnect. |
| 16 |
| 17 To hide the details of all this activity from the user of the gRPC API (i.e., |
| 18 application code) while exposing meaningful information about the state of a |
| 19 channel, we use a state machine with five states, defined below: |
| 20 |
| 21 CONNECTING: The channel is trying to establish a connection and is waiting to |
| 22 make progress on one of the steps involved in name resolution, TCP connection |
| 23 establishment or TLS handshake. This may be used as the initial state for channels upon |
| 24 creation. |
| 25 |
| 26 READY: The channel has successfully established a connection all the way |
| 27 through TLS handshake (or equivalent) and all subsequent attempts to communicate |
| 28 have succeeded (or are pending without any known failure). |
| 29 |
| 30 TRANSIENT_FAILURE: There has been some transient failure (such as a TCP 3-way |
| 31 handshake timing out or a socket error). Channels in this state will eventually |
| 32 switch to the CONNECTING state and try to establish a connection again. Since |
| 33 retries are done with exponential backoff, channels that fail to connect will |
| 34 start out spending very little time in this state, but as the attempts fail |
| 35 repeatedly, the channel will spend increasingly large amounts of time in this |
| 36 state. This is typical of non-fatal failures that persist across many retries |
| 37 (e.g., TCP connection attempts timing out because the server is not yet |
| 38 available). |
| 39 |
| 40 IDLE: This is the state where the channel is not even trying to create a |
| 41 connection because of a lack of new or pending RPCs. New RPCs MAY be created |
| 42 in this state. Any attempt to start an RPC on the channel will push the channel |
| 43 out of this state to CONNECTING. When there has been no RPC activity on a channel |
| 44 for a specified IDLE_TIMEOUT, i.e., no new or pending (active) RPCs for this |
| 45 period, channels that are READY or CONNECTING switch to IDLE. Additionally, |
| 46 channels that receive a GOAWAY when there are no active or pending RPCs should |
| 47 also switch to IDLE to avoid connection overload at servers that are attempting |
| 48 to shed connections. We will use a default IDLE_TIMEOUT of 300 seconds (5 minutes). |
| 49 |
| 50 SHUTDOWN: This channel has started shutting down. Any new RPCs should fail |
| 51 immediately. Pending RPCs may continue running until the application cancels them. |
| 52 Channels may enter this state either because the application explicitly requested |
| 53 a shutdown or because a non-recoverable error occurred during attempts to connect or |
| 54 communicate. (As of 6/12/2015, there are no known errors (while connecting or |
| 55 communicating) that are classified as non-recoverable.) |
| 56 Channels that enter this state never leave this state. |
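
The exponential backoff mentioned under TRANSIENT_FAILURE can be sketched as follows. This is only an illustration: the helper name `BackoffDelay` and the constants (1 second initial delay, doubling per failure, 120 second cap) are assumptions chosen for the example, not values specified by this document, and jitter is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: how long a channel might stay in TRANSIENT_FAILURE
// after `failed_attempts` consecutive connection failures.
// All constants below are illustrative assumptions.
inline std::chrono::milliseconds BackoffDelay(int failed_attempts) {
  assert(failed_attempts >= 1);
  const double kInitialMs = 1000.0;   // assumed initial delay
  const double kMultiplier = 2.0;     // assumed growth factor per failure
  const double kMaxMs = 120000.0;     // assumed upper bound on the delay

  double delay_ms = kInitialMs;
  // Grow the delay once per additional failure, stopping early at the cap.
  for (int i = 1; i < failed_attempts && delay_ms < kMaxMs; ++i) {
    delay_ms *= kMultiplier;
  }
  return std::chrono::milliseconds(
      static_cast<long long>(std::min(delay_ms, kMaxMs)));
}
```

Under these assumed constants, the first failure keeps the channel in TRANSIENT_FAILURE for one second, and repeated failures lengthen the wait until it reaches the cap.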
| 57 |
| 58 The following table lists the legal transitions from one state to another and |
| 59 corresponding reasons. Empty cells denote disallowed transitions. |
| 60 |
| 61 <table style='border: 1px solid black'> |
| 62 <tr> |
| 63 <th>From/To</th> |
| 64 <th>CONNECTING</th> |
| 65 <th>READY</th> |
| 66 <th>TRANSIENT_FAILURE</th> |
| 67 <th>IDLE</th> |
| 68 <th>SHUTDOWN</th> |
| 69 </tr> |
| 70 <tr> |
| 71 <th>CONNECTING</th> |
| 72 <td>Incremental progress during connection establishment</td> |
| 73 <td>All steps needed to establish a connection succeeded</td> |
| 74 <td>Any failure in any of the steps needed to establish connection</td> |
| 75 <td>No RPC activity on channel for IDLE_TIMEOUT</td> |
| 76 <td>Shutdown triggered by application.</td> |
| 77 </tr> |
| 78 <tr> |
| 79 <th>READY</th> |
| 80 <td></td> |
| 81 <td>Incremental successful communication on established channel.</td> |
| 82 <td>Any failure encountered while expecting successful communication on |
| 83 established channel.</td> |
| 84 <td>No RPC activity on channel for IDLE_TIMEOUT <br>OR<br>upon receiving a GOAWAY while there are no pending RPCs.</td> |
| 85 <td>Shutdown triggered by application.</td> |
| 86 </tr> |
| 87 <tr> |
| 88 <th>TRANSIENT_FAILURE</th> |
| 89 <td>Wait time required to implement (exponential) backoff is over.</td> |
| 90 <td></td> |
| 91 <td></td> |
| 92 <td></td> |
| 93 <td>Shutdown triggered by application.</td> |
| 94 </tr> |
| 95 <tr> |
| 96 <th>IDLE</th> |
| 97 <td>Any new RPC activity on the channel</td> |
| 98 <td></td> |
| 99 <td></td> |
| 100 <td></td> |
| 101 <td>Shutdown triggered by application.</td> |
| 102 </tr> |
| 103 <tr> |
| 104 <th>SHUTDOWN</th> |
| 105 <td></td> |
| 106 <td></td> |
| 107 <td></td> |
| 108 <td></td> |
| 109 <td></td> |
| 110 </tr> |
| 111 </table> |
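
The transition table can also be written down as a lookup, which is handy for checking a state machine implementation in tests. The sketch below is not part of any gRPC library: the `ChannelState` enum layout and the `IsLegalTransition` helper are names assumed for this example. The final row encodes the rule that a channel in SHUTDOWN never leaves that state.

```cpp
#include <cassert>
#include <cstddef>

// The five connectivity states, in the column order of the table.
enum class ChannelState {
  CONNECTING,
  READY,
  TRANSIENT_FAILURE,
  IDLE,
  SHUTDOWN
};

constexpr std::size_t kNumStates = 5;

// Rows are the "from" state, columns the "to" state, both in enum order.
// `true` marks a non-empty cell in the table.
constexpr bool kLegalTransitions[kNumStates][kNumStates] = {
    /* CONNECTING        */ {true, true, true, true, true},
    /* READY             */ {false, true, true, true, true},
    /* TRANSIENT_FAILURE */ {true, false, false, false, true},
    /* IDLE              */ {true, false, false, false, true},
    /* SHUTDOWN          */ {false, false, false, false, false},
};

// Hypothetical helper: true if the table permits moving from `from` to `to`.
inline bool IsLegalTransition(ChannelState from, ChannelState to) {
  const std::size_t row = static_cast<std::size_t>(from);
  const std::size_t col = static_cast<std::size_t>(to);
  // Guard against values cast from out-of-range integers.
  assert(row < kNumStates && col < kNumStates);
  return kLegalTransitions[row][col];
}
```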
| 112 |
| 113 |
| 114 Channel State API |
| 115 ----------------- |
| 116 |
| 117 All gRPC libraries will expose a channel-level API method to poll the current |
| 118 state of a channel. In C++, this method is called GetCurrentState and returns |
| 119 an enum for one of the five legal states. |
| 120 |
| 121 All libraries should also expose an API that enables the application (user of |
| 122 the gRPC API) to be notified when the channel state changes. Since state |
| 123 changes can be rapid and race with any such notification, the notification |
| 124 should just inform the user that some state change has happened, leaving it to |
| 125 the user to poll the channel for the current state. |
| 126 |
| 127 The synchronous version of this API is: |
| 128 |
| 129 ```cpp |
| 130 bool WaitForStateChange(gpr_timespec deadline, ChannelState source_state); |
| 131 ``` |
| 132 |
| 133 which returns true when the state changes to something other than the |
| 134 source_state and false if the deadline expires. Asynchronous and futures-based |
| 135 APIs should have a corresponding method that allows the application to be |
| 136 notified when the state of a channel changes. |
| 137 |
| 138 Note that a notification is delivered every time there is a transition from any |
| 139 state to any *other* state. On the other hand, the rules for legal state |
| 140 transitions require a transition from CONNECTING to TRANSIENT_FAILURE and back |
| 141 to CONNECTING for every recoverable failure, even if the corresponding |
| 142 exponential backoff requires no wait before retry. The combined effect is that |
| 143 the application may receive state change notifications that appear spurious. |
| 144 For example, an application waiting for state changes on a channel that is CONNECTING |
| 145 may receive a state change notification but find the channel in the same |
| 146 CONNECTING state on polling for current state because the channel may have |
| 147 spent an infinitesimally small amount of time in the TRANSIENT_FAILURE state. |
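
Because a notification only says that some change happened, callers are expected to poll again after each wake-up. The sketch below shows that loop against a toy stand-in, not the gRPC channel itself. `ToyChannel`, `SetState`, and `WaitUntilReady` are names invented for this example, and the deadline is a `std::chrono` time point rather than `gpr_timespec` so that the block is self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

enum class ChannelState { CONNECTING, READY, TRANSIENT_FAILURE, IDLE, SHUTDOWN };

using Deadline = std::chrono::steady_clock::time_point;

// Toy stand-in for a channel; it only models the state API described above.
class ToyChannel {
 public:
  ChannelState GetCurrentState() const {
    std::lock_guard<std::mutex> lock(mu_);
    return state_;
  }

  // Returns true once the state differs from source_state,
  // false if the deadline expires first.
  bool WaitForStateChange(Deadline deadline, ChannelState source_state) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_until(lock, deadline,
                          [&] { return state_ != source_state; });
  }

  // Test hook (hypothetical): moves the channel to a new state.
  void SetState(ChannelState s) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      // A channel that has entered SHUTDOWN never leaves it.
      assert(state_ != ChannelState::SHUTDOWN || s == ChannelState::SHUTDOWN);
      state_ = s;
    }
    cv_.notify_all();
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  ChannelState state_ = ChannelState::IDLE;
};

// Blocks until the channel is READY. Gives up on SHUTDOWN or on deadline.
// The state is polled again after every notification, so a notification that
// leaves the channel in a non-READY state simply causes another wait.
inline bool WaitUntilReady(ToyChannel& channel, Deadline deadline) {
  ChannelState state = channel.GetCurrentState();
  while (state != ChannelState::READY) {
    if (state == ChannelState::SHUTDOWN) return false;
    if (!channel.WaitForStateChange(deadline, state)) return false;
    state = channel.GetCurrentState();
  }
  return true;
}
```

The loop never assumes that a wake-up means the channel is READY: it treats each notification as a hint and decides based on the freshly polled state.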