-
Notifications
You must be signed in to change notification settings - Fork 39
/
l02-rpc.html
454 lines (367 loc) · 12.9 KB
/
l02-rpc.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
<h1>6.824 2015 Lecture 2: Infrastructure: RPC and threads</h1>
<p><strong>Note:</strong> These lecture notes were slightly modified from the ones posted on the 6.824
<a href="http://nil.csail.mit.edu/6.824/2015/schedule.html">course website</a> from Spring 2015.</p>
<h2>Remote Procedure Call (RPC)</h2>
<ul>
<li>A key piece of distrib sys machinery; all the labs use RPC</li>
<li><em>Goal:</em> easy-to-program network communication
<ul>
<li>hides most details of client/server communication</li>
<li>client call is much like ordinary procedure call</li>
<li>server handlers are much like ordinary procedures</li>
</ul></li>
<li>RPC is widely used!</li>
</ul>
<p>RPC ideally makes net communication look just like an ordinary <em>function call</em>:</p>
<pre><code> Client
------
z = fn(x, y)
Server
------
fn(x, y) {
compute
return z
}
</code></pre>
<p>RPC aims for this level of transparency</p>
<h3>RPC message diagram</h3>
<pre><code> Client Server
------ ------
"fn", x, y
request ---------->
compute fn(x, y)
z = fn(x, y)
<------------- response
</code></pre>
<h3>Software structure</h3>
<pre><code> Client Server
------ ------
client app handlers
stubs dispatcher
RPC lib RPC lib
net <-----------> net
</code></pre>
<p><em>Stubs</em> are sort of the fake client-side functions that look like the real <code>f(x, y)</code> but
they just take care of packaging the arguments, sending them over the
network and ask the server to compute <code>f(x, y)</code>. The stub can then
receive the result over the network and return the value to the client
code.</p>
<p>Examples from lab 1:</p>
<ul>
<li><code>DoJob</code></li>
<li><code>Register</code></li>
</ul>
<h3>A few details of RPC</h3>
<ul>
<li><em>Marshalling:</em> format data into packets
<ul>
<li>Tricky for arrays, pointers, objects, etc.</li>
<li>Go's RPC library is pretty powerful!</li>
<li>some things you cannot pass/marshall: e.g., channels, functions</li>
</ul></li>
<li><em>Binding:</em> how does client know who to talk to?
<ul>
<li>Might be a name service -- e.g. DNS</li>
</ul></li>
<li><em>Threads:</em>
<ul>
<li>Client often has many threads, so <code>> 1</code> call outstanding, match up replies to calls</li>
<li>Handlers may be slow, so server often runs each in a thread</li>
</ul></li>
</ul>
<p><em>RPC problem:</em> what to do about failures?</p>
<ul>
<li>e.g. lost packets, broken network, crashed servers, slow servers</li>
</ul>
<p>What does a failure look like to the client's RPC library?</p>
<ul>
<li>It never sees a response from the server
<ul>
<li>Maybe packet was lost</li>
</ul></li>
<li>It does <em>not</em> know if the server saw the request!
<ul>
<li>Maybe server/net failed just before sending reply</li>
</ul></li>
</ul>
<h3>Simplest scheme: <em>"at least once"</em> behavior</h3>
<pre><code> while true
send req
wait up to 10 seconds for reply
if reply arrives
return reply
else
continue
</code></pre>
<ul>
<li>RPC client library waits for response for a while</li>
<li>If none arrives, re-send the request</li>
<li>Do this a few times</li>
<li>Still no response -- return an error to the application</li>
</ul>
<p><strong>Q:</strong> is "at least once" easy for applications to cope with?</p>
<p>Simple problem w/ at least once:</p>
<ul>
<li>Occurs with requests that are <strong>not</strong> <em>side-effect free</em></li>
<li>Client sends <em>"deduct $10 from bank account"</em> twice because
it did not hear back for the first one</li>
</ul>
<p>More subtle problem: what can go wrong with this client program?</p>
<ul>
<li><code>Put("k", "v")</code> overwrites the value at <code>k</code> with <code>v</code></li>
<li><code>Put("key", "value1")</code> -- an RPC to set key's value in a DB server</li>
<li><code>Put("key", "value2")</code> -- client then does a 2nd Put to same key</li>
</ul>
<p>Example:</p>
<pre><code>Client Server
------ ------
put k, 10
----\
\
put k, 20 ---------------------> k <- 20
\
-------------> k <- 10
get k --------------------->
10
<---------------------
</code></pre>
<p><strong>Note:</strong> This situation where client sends a request, server does some work and
replies, but the reply is lost occurs frequently and will come up a lot
in labs.</p>
<p>Is at-least-once ever OK?</p>
<ul>
<li>Yes: if it's OK to repeat operations, e.g. read-only op</li>
<li>Yes: if application has its own plan for detecting duplicates
<ul>
<li>which you will need for Lab 1</li>
</ul></li>
</ul>
<h3>Better RPC behavior: <em>"at most once"</em></h3>
<ul>
<li><em>Idea:</em> server RPC code detects duplicate requests
<ul>
<li>returns previous reply instead of re-running handler</li>
</ul></li>
<li>Client includes <em>unique ID (XID)</em> with each request
<ul>
<li>uses same XID for re-send</li>
</ul></li>
<li>Server checks if XID has been seen before</li>
</ul>
<p>Example:</p>
<pre><code> if seen[xid]:
r = old[xid]
else
r = handler()
old[xid] = r
seen[xid] = true
</code></pre>
<p>Some at-most-once complexities</p>
<ul>
<li>How to ensure XID is unique?
<ul>
<li>big random number?</li>
<li>combine unique client ID (ip address?) with sequence #?</li>
</ul></li>
<li>Server must eventually discard info about old RPCs
<ul>
<li>When is discard safe?</li>
<li><em>Idea:</em>
<ul>
<li>unique client IDs</li>
<li>per-client RPC sequence numbers</li>
<li>client includes <em>"seen all replies <code><= X</code>"</em> with every RPC
much like TCP sequence #s and ACKs</li>
<li>or only allow client one outstanding RPC at a time s.t.
arrival of <code>seq+1</code> allows server to discard all <code><= seq</code></li>
<li>or client agrees to keep retrying for <code>< 5</code> minutes
server discards after 5+ minutes</li>
</ul></li>
</ul></li>
<li>How to handle duplicate request while original is still executing?
<ul>
<li>Server doesn't know reply yet; don't want to run twice</li>
<li><em>Idea:</em> "pending" flag per executing RPC; wait or ignore</li>
</ul></li>
</ul>
<p>What if an at-most-once server crashes?</p>
<ul>
<li>if at-most-once duplicate info in memory, server will forget
<ul>
<li>and accept duplicate requests</li>
</ul></li>
<li>maybe it should write the duplicate info to disk?</li>
<li>maybe replica server should also replicate duplicate info?</li>
</ul>
<h3>What about <em>"exactly once"</em>?</h3>
<ul>
<li><em>at-most-once</em> semantics plus unbounded retries plus fault-tolerant service</li>
</ul>
<h3>Go RPC is "at-most-once"</h3>
<ul>
<li>open TCP connection</li>
<li>write request to TCP connection</li>
<li>TCP may retransmit, but server's TCP will filter out duplicates</li>
<li>no retry in Go code (i.e. will NOT create 2nd TCP connection)</li>
<li>Go RPC code returns an error if it doesn't get a reply
<ul>
<li>perhaps after a timeout (from TCP)</li>
<li>perhaps server didn't see request</li>
<li>perhaps server processed request but server/net failed before reply came back</li>
</ul></li>
</ul>
<h3>Go's at-most-once RPC isn't enough for Lab 1</h3>
<ul>
<li>it only applies to a single RPC call</li>
<li>if worker doesn't respond, the master re-sends to it to another worker
<ul>
<li>but original worker may have not failed, and is working on it too</li>
</ul></li>
<li>Go RPC can't detect this kind of duplicate
<ul>
<li>No problem in lab 1, which handles at application level</li>
<li>In lab 2 you will have to protect against these kinds of duplicates</li>
</ul></li>
</ul>
<h2>Threads</h2>
<ul>
<li>threads are a fundamental server structuring tool</li>
<li>you'll use them a lot in the labs</li>
<li>they can be tricky</li>
<li>useful with RPC </li>
<li>called goroutines in Go</li>
</ul>
<p>Thread = "thread of control"</p>
<ul>
<li>threads allow one program to (logically) do many things at once</li>
<li>the threads share memory</li>
<li>each thread includes some per-thread state:
<ul>
<li>program counter, registers, stack</li>
</ul></li>
</ul>
<h3>Threading challenges:</h3>
<ul>
<li>sharing data between thread
<ul>
<li>what if two threads modify same variable at same time?</li>
<li>what if one thread reads data another thread is changing?</li>
<li>these problems are often called <em>races</em></li>
<li>need to protect invariants on shared data (Go: <em>mutex</em>)</li>
</ul></li>
<li><em>coordination</em> between threads (Go: <em>channels</em>)
<ul>
<li>e.g. wait for all Map threads to finish</li>
</ul></li>
<li><em>deadlocks</em>
<ul>
<li>thread 1 is waiting for thread 2</li>
<li>thread 2 is waiting for thread 1</li>
<li>easy detectable (unlike races)</li>
</ul></li>
<li>lock granularity
<ul>
<li>goarse-grained <code>-></code> little concurrency/parallelism</li>
<li>fine-grained <code>-></code> lots of concurrency, but race and deadlocks</li>
</ul></li>
<li>let's look at a toy RPC package to illustrate these problems</li>
</ul>
<h2>Look at today's handout -- <a href="code/l-rpc.go">l-rpc.go</a></h2>
<p>Get it <a href="code/l-rpc.go">here</a>.</p>
<ul>
<li>it's a toy RPC system</li>
<li>illustrates threads, mutexes, channels</li>
<li>it's a toy
<ul>
<li>assumes connection already open</li>
<li>only supports an integer arg, integer reply</li>
<li>doesn't deal with errors</li>
</ul></li>
</ul>
<h4><code>struct ToyClient</code></h4>
<ul>
<li>client RPC state </li>
<li>mutex per <code>ToyClient</code></li>
<li>connection to server (e.g. TCP socket)</li>
<li>xid -- unique ID per call, to match reply to caller</li>
<li><code>pending[]</code> -- multiple threads may call, need to find them
<ul>
<li>channel on which caller is waiting</li>
</ul></li>
</ul>
<h4><code>Call()</code></h4>
<ul>
<li>application calls <code>reply := client.Call(procNum, arg)</code></li>
<li><code>procNum</code> indicates what function to run on server</li>
<li><code>WriteRequest</code> knows the format of an RPC msg
<ul>
<li>basically just the arguments turned into bits in a packet</li>
</ul></li>
<li><strong>Q:</strong> why the mutex in <code>Call()</code>? what does <code>mu.Lock()</code> do?</li>
<li><strong>Q:</strong> could we move <code>xid := tc.xid</code> outside the critical section?
<ul>
<li>after all, we are not changing anything</li>
<li>[See diagram below]</li>
</ul></li>
<li><strong>Q:</strong> do we need to <code>WriteRequest</code> inside the critical section?
<ul>
<li>note: Go says you are responsible for preventing concurrent map ops</li>
<li>that's one reason the update to pending is locked</li>
</ul></li>
</ul>
<p>Diagram:</p>
<h4><code>Listener()</code></h4>
<ul>
<li>runs as a background thread</li>
<li>what is <code><-</code> doing?</li>
<li>not quite right that it may need to wait on chan for caller</li>
</ul>
<h4>Back to <code>Call()</code>...</h4>
<p><strong>Q:</strong> what if reply comes back very quickly?</p>
<ul>
<li>could <code>Listener()</code> see reply before <code>pending[xid]</code> entry exists?</li>
<li>or before caller is waiting for channel?</li>
</ul>
<p><strong>Q:</strong> should we put <code>reply := <-done</code> inside the critical section?</p>
<ul>
<li>why is it OK outside? after all, two threads use it.</li>
</ul>
<p><strong>Q:</strong> why mutex per <code>ToyClient</code>, rather than single mutex per whole RPC pkg?</p>
<h4>Server's <code>Dispatcher()</code></h4>
<ul>
<li>note that the Dispatcher echos the xid back to the client
<ul>
<li>so that <code>Listener</code> knows which Call to wake up</li>
</ul></li>
<li><strong>Q:</strong> why run the handler in a separate thread?</li>
<li><strong>Q:</strong> is it a problem that the dispatcher can reply out of order?</li>
</ul>
<h4><code>main()</code></h4>
<ul>
<li>note registering handler in <code>handlers[]</code></li>
<li>what will the program print?</li>
</ul>
<p>When to use shared memory (and locks) vs when to use channels?</p>
<ul>
<li>here is my opinion</li>
<li>use channels when you want one thread to explicitly wait for another
<ul>
<li>often wait for a result, or wait for the next request</li>
<li>e.g. when client <code>Call()</code> waits for <code>Listener()</code></li>
</ul></li>
<li>use shared memory and locks when the threads are not intentionally
<ul>
<li>directly interacting, but just happen to r/w the same data</li>
<li>e.g. when <code>Call()</code> uses <code>tc.xid</code></li>
</ul></li>
</ul>
<p>Go's "memory model" requires explicit synchronization to communicate!</p>
<p>This code is not correct:</p>
<pre><code> var x int
done := false
go func() { x = f(...); done = true }
while done == false { }
</code></pre>
<p>It's very tempting to write, but the Go spec says it's undefined
use a channel or <code>sync.WaitGroup</code> instead</p>
<p>Study the Go tutorials on <em>goroutines</em> and <em>channels</em>.</p>