l16-timing-attacks.html

<h1>Side-channel attacks on RSA</h1>

<p><strong>Note:</strong> These lecture notes were slightly modified from the ones posted on the 6.858 <a href="http://css.csail.mit.edu/6.858/2014/schedule.html">course website</a> from 2014.</p>

<h2>Side channel attacks</h2>

<ul>
<li>Historically, worried about EM signals leaking. <a href="http://cryptome.org/nsa-tempest.pdf">NSA TEMPEST</a>.</li>
<li>Broadly, systems may need to worry about many unexpected ways in which</li>
<li>information can be revealed.</li>
</ul>

<p><em>Example setting:</em> a server (e.g., Apache) has an RSA private key.</p>

<ul>
<li>Server uses RSA private key (e.g., decrypt message from client).</li>
<li>Something about the server's computation is leaked to the client.</li>
</ul>

<p>Many information leaks have been looked at:</p>

<ul>
<li>How long it takes to decrypt.</li>
<li>How decryption affects shared resources (cache, TLB, branch predictor).</li>
<li>Emissions from the CPU itself (RF, audio, power consumption, etc).</li>
</ul>

<p>Side-channel attacks don't have to be crypto-related.</p>

<ul>
<li>E.g., operation time relates to which character of password was incorrect.</li>
<li>Or time related to how many common friends you + some user have on Facebook.</li>
<li>Or how long it takes to load a page in browser (depends if it was cached).</li>
<li>Or recovering printed text based on sound from dot-matrix printer.
 <a href="https://www.usenix.org/conference/usenixsecurity10/acoustic-side-channel-attacks-printers">Ref</a></li>
<li>But attacks on passwords or keys are usually the most damaging.</li>
</ul>

<p>Adversary can analyze information leaks, use it to reconstruct private key.</p>

<ul>
<li>Currently, side-channel attacks on systems described in the paper are rare.
<ul>
<li>E.g., Apache web server running on some Internet-connected machine.</li>
<li>Often some other vulnerability exists and is easier to exploit.</li>
<li>Slowly becoming a bigger concern: new side-channels (VMs), better attacks.</li>
</ul></li>
<li>Side-channel attacks are more commonly used to attack trusted/embedded hw.
<ul>
<li>E.g., chip running cryptographic operations on a smartcard.</li>
<li>Often these have a small attack surface, not many other ways to get in.</li>
<li>As paper mentions, some crypto coprocessors designed to avoid this attack.</li>
</ul></li>
</ul>

<p>What's the <em>"Remote timing attacks are practical"</em> paper's contribution? <a href="http://css.csail.mit.edu/6.858/2014/readings/brumley-timing.pdf">Ref</a></p>

<ul>
<li>Timing attacks known for a while.</li>
<li>This paper: possible to attack standard Apache web server over the network.</li>
<li>Uses lots of observations/techniques from prior work on timing attacks.</li>
<li>To understand how this works, first let's look at some internals of RSA..</li>
</ul>

<h2>RSA: high level plan</h2>

<ul>
<li>Pick two random primes, <code>p</code> and <code>q</code>.
<ul>
<li>Let <code>n = p*q</code>.</li>
</ul></li>
<li>A reasonable key length, i.e., <code>|n|</code> or <code>|d|</code>, is 2048 bits today.</li>
<li>Euler's function <code>phi(n)</code>: number of elements of <code>Z_n^*</code> relatively prime to <code>n</code>.
<ul>
<li><strong>Theorem</strong> [no proof here]: <code>a^(phi(n)) = 1 mod n</code>, for all <code>a</code> and <code>n</code>.</li>
</ul></li>
<li>So, how to encrypt and decrypt?
<ul>
<li>Pick two exponents <code>d</code> and <code>e</code>, such that <code>m^(e*d) = m (mod n)</code>, which
<ul>
<li>means <code>e*d = 1 mod phi(n)</code>.</li>
</ul></li>
<li>Encryption will be <code>c = m^e (mod n)</code>; decryption will be <code>m = c^d (mod n)</code>.</li>
</ul></li>
<li>How to get such <code>e</code> and <code>d</code>?
<ul>
<li>For <code>n=pq</code>, <code>phi(n) = (p-1)(q-1)</code>.</li>
<li>Easy to compute <code>d=1/e</code>, if we know <code>phi(n)</code>.
<a href="http://en.wikipedia.org/wiki/Modular_multiplicative_inverse">Extended Euclidean algorithm</a></li>
<li>In practice, pick small <code>e</code> (e.g., 65537), to make encryption fast.</li>
</ul></li>
<li>Public key is <code>(n, e)</code>.</li>
<li>Private key is, in principle, <code>(n, d)</code>.
<ul>
<li><strong>Note:</strong> <code>p</code> and <code>q</code> must be kept secret!</li>
<li>Otherwise, adversary can compute <code>d</code> from <code>e</code>, as we did above.</li>
<li>Knowing <code>p</code> and <code>q</code> also turns out to be helpful for fast decryption.</li>
<li>So, in practice, private key includes <code>(p, q)</code> as well.</li>
</ul></li>
</ul>

<p>RSA is tricky to use "securely" -- be careful if using RSA directly!</p>

<ul>
<li>Ciphertexts are multiplicative
<ul>
<li><code>E(a)*E(b) = a^e * b^e = (ab)^e</code>.</li>
<li>Can allow adversary to manipulate encryptions, generate new ones.</li>
</ul></li>
<li>RSA is deterministic
<ul>
<li>Encrypting the same plaintext will generate the same ciphertext each time.</li>
<li>Adversary can tell when the same thing is being re-encrypted.</li>
</ul></li>
<li>Typically solved by "padding" messages before encryption.
<a href="http://en.wikipedia.org/wiki/Optimal_asymmetric_encryption_padding">OAEP</a>
<ul>
<li>Take plaintext message bits, add padding bits before and after plaintext.</li>
<li>Encrypt the combined bits (must be less than <code>|n|</code> bits total).</li>
<li>Padding includes randomness, as well as fixed bit patterns.</li>
<li>Helps detect tampering (e.g. ciphertext multiplication).</li>
</ul></li>
</ul>

<h2>How to implement RSA?</h2>

<ul>
<li><strong>Key problem:</strong> fast modular exponentiation.
<ul>
<li>In general, quadratic complexity.</li>
</ul></li>
<li>Multiplying two 1024-bit numbers is slow.</li>
<li>Computing the modulus for 1024-bit numbers is slow (1024-bit divison).</li>
</ul>

<h3>Optimization 1: Chinese Remainder Theorem (CRT).</h3>

<ul>
<li>Recall what the CRT says:
<ul>
<li>if <code>x==a1 (mod p)</code> and <code>x==a2 (mod q)</code>, where <code>p</code> and <code>q</code> are relatively prime,</li>
<li>then there's a unique solution <code>x==a (mod pq)</code>.
<ul>
<li>and, there's an efficient algorithm for computing <code>a</code></li>
</ul></li>
</ul></li>
<li>Suppose we want to compute <code>m = c^d (mod pq)</code>.</li>
<li>Can compute <code>m1 = c^d (mod p)</code>, and <code>m2 = c^d (mod q)</code>.</li>
<li>Then use CRT to compute <code>m = c^d (mod n)</code> from <code>m1</code>, <code>m2</code>; it's unique and fast.</li>
<li>Computing <code>m1</code> (or <code>m2</code>) is ~4x faster than computing <code>m</code> directly (~quadratic).</li>
<li>Computing <code>m</code> from <code>m1</code> and <code>m2</code> using CRT is ~negligible in comparison.</li>
<li>So, roughly a 2x speedup.</li>
</ul>

<h3>Optimization 2: Repeated squaring and Sliding windows.</h3>

<ul>
<li>Naive approach to computing <code>c^d</code>: multiply <code>c</code> by itself, <code>d</code> times.</li>
<li>Better approach, called repeated squaring:
<ul>
<li><code>c^(2x)   = (c^x)^2</code></li>
<li><code>c^(2x+1) = (c^x)^2 * c</code></li>
<li>To compute <code>c^d</code>, first compute <code>c^(floor(d/2))</code>, then use above for <code>c^d</code>.</li>
<li>Recursively apply until the computation hits <code>c^0 = 1</code>.</li>
<li>Number of squarings:       <code>|d|</code> (the number of bits needed to represent <code>d</code>)</li>
<li>Number of multiplications: number of 1 bits in <code>d</code></li>
</ul></li>
<li>Better yet (sometimes), called <em>sliding window</em>:
<ul>
<li><code>c^(2x)    = (c^x)^2</code></li>
<li><code>c^(32x+1) = (c^x)^32 * c</code></li>
<li><code>c^(32x+3) = (c^x)^32 * c^3</code></li>
<li>...</li>
<li><code>c^(32x+z) = (c^x)^32 * c^z</code>, generally (where <code>z&lt;=31</code>)</li>
<li>Can pre-compute a table of all necessary <code>c^z</code> powers, store in memory.</li>
<li>The choice of power-of-2 constant (e.g., 32) depends on usage.
<ul>
<li>Costs: extra memory, extra time to pre-compute powers ahead of time.</li>
</ul></li>
<li>Note: only pre-compute odd powers of <code>c</code> (use first rule for even).</li>
<li>OpenSSL uses 32 (table with 16 pre-computed entries).</li>
</ul></li>
</ul>

<h3>Optimization 3: Montgomery representation.</h3>

<ul>
<li>Reducing <code>mod p</code> each time (after square or multiply) is expensive.
<ul>
<li>Typical implementation: do long division, find remainder.</li>
<li>Hard to avoid reduction: otherwise, value grows exponentially.</li>
</ul></li>
<li>Idea (by Peter Montgomery): do computations in another representation.
<ul>
<li>Shift the base (e.g., <code>c</code>) into different representation upfront.</li>
<li>Perform modular operations in this representation (will be cheaper).</li>
<li>Shift numbers back into original representation when done.</li>
<li>Ideally, savings from reductions outweigh cost of shifting.</li>
</ul></li>
<li>Montgomery representation: multiply everything by some factor R.
<ul>
<li><code>a mod q        &lt;-&gt; aR mod q</code></li>
<li><code>b mod q        &lt;-&gt; bR mod q</code></li>
<li><code>c = a*b mod q  &lt;-&gt; cR mod q = (aR * bR)/R mod q</code></li>
<li>Each mul (or sqr) in Montgomery-space requires division by <code>R</code>.</li>
</ul></li>
<li>Why is modular multiplication cheaper in Montgomery repr.?
<ul>
<li>Choose <code>R</code> so division by <code>R</code> is easy: <code>R = 2^|q|</code> (<code>2^512</code> for 1024-bit keys).</li>
<li>Because we divide by <code>R</code>, we will often not need to do <code>mod q</code>.
<ul>
<li><code>|aR|          = |q|</code></li>
<li><code>|bR|          = |q|</code></li>
<li><code>|aR * bR|     = 2|q|</code></li>
<li><code>|aR * bR / R| = |q|</code></li>
</ul></li>
<li>How do we divide by <code>R</code> cheaply? 
<ul>
<li>Only works if lower bits are zero.</li>
</ul></li>
<li><em>Observation:</em> since we care about value <code>mod q</code>, multiples of <code>q</code> don't matter.</li>
<li><em>Trick:</em> add multiples of <code>q</code> to the number being divided by <code>R</code>, make low bits 0.
<ul>
<li>For example, suppose <code>R=2^4 (10000)</code>, <code>q=7 (111)</code>, divide <code>x=26 (11010)</code> by R.</li>
<li><code>x+2q    = (binary) * 101000</code></li>
<li><code>x+2q+8q = (binary) 1100000</code></li>
<li>Now, can easily divide by <code>R</code>: result is binary 110 (or 6).</li>
<li>Generally, always possible:</li>
<li>Low bit of <code>q</code> is 1 (<code>q</code> is prime), so can "shoot down" any bits.</li>
<li>To "shoot down" bit <code>k</code>, add <code>2^k * q</code></li>
<li>To shoot down low-order bits <code>l</code>, add <code>q*(l*(-q^-1) mod R)</code></li>
<li>Then, dividing by <code>R</code> means simply discarding low zero bits.</li>
</ul></li>
</ul></li>
<li><em>One remaining problem:</em> result will be <code>&lt; R</code>, but might be <code>&gt; q</code>.
<ul>
<li>If the result happens to be greater than <code>q</code>, need to subtract <code>q</code>.</li>
<li>This is called the "extra reduction".</li>
<li>When computing <code>x^d mod q</code>, <code>Pr[extra reduction] = (x mod q) / 2R</code>.
<ul>
<li>Here, <code>x</code> is assumed to be already in Montgomery form.</li>
<li><em>Intuition:</em> as we multiply bigger numbers, will overflow more often.</li>
</ul></li>
</ul></li>
</ul>

<h3>Optimization 4: Efficient multiplication.</h3>

<ul>
<li>How to multiply 512-bit numbers?</li>
<li>Representation: break up into 32-bit values (or whatever hardware supports).</li>
<li>Naive approach: pair-wise multiplication of all 32-bit components.
<ul>
<li>Same as if you were doing digit-wise multiplication of numbers on paper.</li>
<li>Requires <code>O(nm)</code> time if two numbers have <code>n</code> and <code>m</code> components respectively.</li>
<li><code>O(n^2)</code> if the two numbers are close.</li>
</ul></li>
<li><strong>Karatsuba multiplication:</strong> assumes both numbers have same number of components.
<ul>
<li><code>O(n^log_3(2)) = O(n^1.585)</code>time.</li>
<li>Split both numbers (<code>x</code> and <code>y</code>) into two components (<code>x1</code>, <code>x0</code> and <code>y1</code>, <code>y0</code>).
<ul>
<li><code>x = x1 * B + x0</code></li>
<li><code>y = y1 * B + y0</code></li>
<li>E.g., <code>B=2^32</code> when splitting 64-bit numbers into 32-bit components.</li>
</ul></li>
<li>Naive: <code>x*y = x1y1 * B^2 + x0y1 * B + x1y0 * B + x0y0</code>
<ul>
<li>Four multiplies: <code>O(n^2)</code></li>
</ul></li>
<li>Faster: <code>x*y = x1y1 * (B^2+B) - (x1-x0)(y1-y0) * B + x0y0 * (B+1)</code>
<ul>
<li>... <code>= x1y1 * B^2 + ( -(x1-x0)(y1-y0) + x1y1 + x0y0 ) * B + x0y0</code></li>
<li>Just three multiplies, and a few more additions.</li>
</ul></li>
<li>Recursively apply this algorithm to keep splitting into more halves.
<ul>
<li>Sometimes called "recursive multiplication".</li>
</ul></li>
<li>Meaningfully faster (no hidden big constants)
<ul>
<li>For 1024-bit keys, "<code>n</code>" here is 16 (512/32).</li>
<li><code>n^2     = 256</code></li>
<li><code>n^1.585 = 81</code></li>
</ul></li>
</ul></li>
<li>Multiplication algorithm needs to decide when to use Karatsuba vs. Naive.
<ul>
<li>Two cases matter: <em>two large numbers</em>, and <em>one large + one small number</em>.</li>
<li>OpenSSL: if equal number of components, use Karatsuba, otherwise Naive.</li>
<li>In some intermediate cases, Karatsuba may win too, but OpenSSL ignores it,
<ul>
<li>according to this paper.</li>
</ul></li>
</ul></li>
</ul>

<h2>How does SSL use RSA?</h2>

<ul>
<li>Server's SSL certificate contains public key.</li>
<li>Server must use private key to prove its identity.</li>
<li>Client sends random bits to server, encrypted with server's public key.</li>
<li>Server decrypts client's message, uses these bits to generate session key.
<ul>
<li>In reality, server also verifies message padding.</li>
<li>However, can still measure time until server responds in some way.</li>
</ul></li>
</ul>

<p>Figure of <strong>decryption pipeline</strong> on the server:</p>

<pre><code>                  * CRT             * To Montgomery             * Modular exp
        --&gt; * c_0 = c mod q  --&gt; * c'_0 = c_0*R mod q  --&gt; * m'_0 = (c'_0)^d mod q
       /
      /                                            * Use sliding window for bits
     /                                               * of the exponent d

    c                                              * Karatsuba if c'_0 and q have
                                                     * same number of 32-bit parts
     \
      \                                            * Extra reductions proportional
       \                                             * to ((c'_0)^z mod q) / 2R;
        --&gt;  ...                                     * z comes from sliding window
</code></pre>

<ul>
<li>Then, compute <code>m_0 = m'_0/R mod q</code>.</li>
<li>Then, combine <code>m_0</code> and <code>m_1</code> using CRT to get <code>m</code>.</li>
<li>Then verify padding in <code>m</code>.</li>
<li>Finally, use payload in some way (SSL, etc).</li>
</ul>

<h2>Setup for the attack described in Brumley's paper</h2>

<ul>
<li>Victim Apache HTTPS web server using OpenSSL, has private key in memory.</li>
<li>Connected to Stanford's campus network.</li>
<li>Adversary controls some client machine on campus network.</li>
<li>Adversary sends specially-constructed ciphertext in msg to server.
<ul>
<li>Server decrypts ciphertext, finds garbage padding, returns an error.</li>
<li>Client measures response time to get error message.</li>
<li>Uses the response time to guess bits of <code>q</code>.</li>
</ul></li>
<li>Overall response time is on the order of 5 msec.
<ul>
<li>Time difference between requests can be around 10 usec.</li>
</ul></li>
<li>What causes time variations? 
<ul>
<li>Karatsuba vs. Naive</li>
<li>extra reductions</li>
</ul></li>
<li>Once guessed enough bits of <code>q</code>, can factor <code>n=p*q</code>, compute <code>d</code> from <code>e</code>.</li>
<li>About 1M queries seem enough to obtain 512-bit <code>p</code> and <code>q</code> for 1024-bit key.
<ul>
<li>Only need to guess the top 256 bits of <code>p</code> and <code>q</code>, then use another algorithm.</li>
</ul></li>
</ul>

<h2>Attack from Brumley's paper</h2>

<ul>
<li>See the <em>Remote timing attacks are practical</em> paper cited in the <em>References</em> section at the end for more details.</li>
<li>Let <code>q = q_0 q_1 .. q_N</code>, where <code>N = |q|</code> (say, 512 bits for 1024-bit keys).</li>
<li>Assume we know some number <code>j</code> of high-order bits of <code>q</code> (<code>q_0</code> through <code>q_j</code>).</li>
<li>Construct two approximations of q, guessing <code>q_{j+1}</code> is either 0 or 1:
<ul>
<li><code>g    = q_0 q_1 .. q_j 0 0 0 .. 0</code></li>
<li><code>g_hi = q_0 q_1 .. q_j 1 0 0 .. 0</code></li>
</ul></li>
<li>Get the server to perform modular exponentiation (<code>g^d</code>) for both guesses.
<ul>
<li>We know <code>g</code> is necessarily less than <code>q</code>.</li>
<li>If <code>g</code> and <code>g_hi</code> are both less than <code>q</code>, time taken shouldn't change much.</li>
<li>If <code>g_hi</code> is greater than <code>q</code>, time taken might change noticeably.
<ul>
<li><code>g_hi mod q</code> is small.</li>
<li>Less time: fewer extra reductions in Montgomery.</li>
<li>More time: switch from Karatsuba to normal multiplication.</li>
</ul></li>
<li>Knowing the time taken can tell us if 0 or 1 was the right guess.</li>
</ul></li>
<li>How to get the server to perform modular exponentiation on our guess?
<ul>
<li>Send our guess as if it were the encryption of randomness to server.</li>
<li>One snag: server will convert our message to Montgomery form.</li>
<li>Since Montgomery's <code>R</code> is known, send (<code>g/R mod n</code>) as message to server.</li>
</ul></li>
<li>How do we know if the time difference should be positive or negative?
<ul>
<li>Paper seems to suggest it doesn't matter: just look for large diff.</li>
<li>Figure 3a shows the measured time differences for each bit's guess.</li>
<li>Karatsuba vs. normal multiplication happens at 32-bit boundaries.</li>
<li>First 32 bits: extra reductions dominate.</li>
<li>Next bits: Karatsuba vs normal multiplication dominates.</li>
<li>At some point, extra reductions start dominating again.</li>
</ul></li>
<li>What happens if the time difference from the two effects cancels out?
<ul>
<li>Figure 3, key 3.</li>
<li>Larger neighborhood changes the balance a bit, reveals a non-zero gap.</li>
</ul></li>
<li>How does the paper get accurate measurements?
<ul>
<li>Client machine uses processor's timestamp counter (<code>rdtsc</code> on x86).</li>
<li>Measure several times, take the median value.
<ul>
<li>Not clear why median; min seems like it would be the true compute time.</li>
</ul></li>
<li>One snag: relatively few multiplications by <code>g</code>, due to sliding windows.</li>
<li>Solution: get more multiplications by values close to <code>g</code> (+ same for <code>g_hi</code>).</li>
<li>Specifically, probe a "neighborhood" of <code>g</code> (<code>g, g+1, .., g+400</code>).</li>
</ul></li>
<li>Why probe a 400-value neighborhood of <code>g</code> instead of measuring <code>g</code> 400 times?
<ul>
<li>Consider the kinds of noise we are trying to deal with.</li>
<li>(1) Noise unrelated to computation (e.g. interrupts, network latency).
<ul>
<li>This might go away when we measure the same thing many times.</li>
<li>See Figure 2a in the paper.</li>
</ul></li>
<li>(2) "Noise" related to computation.
<ul>
<li>E.g., multiplying by <code>g^3</code> and <code>g_hi^3</code> in sliding window takes diff time.</li>
<li>Repeated measurements will return the same value.</li>
<li>Will not help determine whether mul by <code>g</code> or <code>g_hi</code> has more reductions.</li>
<li>See Figure 2b in the paper.</li>
</ul></li>
<li>Neighborhood values average out 2nd kind of noise.</li>
<li>Since neighborhood values are nearby, still has ~same # reductions.</li>
</ul></li>
</ul>

<h2>How to avoid these attacks?</h2>

<ul>
<li>Timing attack on decryption time: RSA blinding.
<ul>
<li>Choose random <code>r</code>.</li>
<li>Multiply ciphertext by <code>r^e mod n</code>: <code>c' = c*r^e mod n</code>.</li>
<li>Due to multiplicative property of RSA, <code>c'</code> is an encryption of <code>m*r</code>.</li>
<li>Decrypt ciphertext <code>c'</code> to get message <code>m'</code>.</li>
<li>Divide plaintext by <code>r</code>: <code>m = m'/r</code>.</li>
<li>About a 10% CPU overhead for OpenSSL, according to Brumley's paper.</li>
</ul></li>
<li>Make all code paths predictable in terms of execution time.
<ul>
<li>Hard, compilers will strive to remove unnecessary operations.</li>
<li>Precludes efficient special-case algorithms.</li>
<li>Difficult to predict execution time: instructions aren't fixed-time.</li>
</ul></li>
<li>Can we take away access to precise clocks?
<ul>
<li>Yes for single-threaded attackers on a machine we control.</li>
<li>Can add noise to legitimate computation, but attacker might average.</li>
<li>Can quantize legitimate computations, at some performance cost.</li>
<li>But with "sleeping" quantization, throughput can still leak info.</li>
</ul></li>
</ul>

<h3>How worried should we be about these attacks?</h3>

<ul>
<li>Relatively tricky to develop an exploit (but that's a one-time problem).</li>
<li>Possible to notice attack on server (many connection requests).
<ul>
<li>Though maybe not so easy on a busy web server cluster?</li>
</ul></li>
<li>Adversary has to be close by, in terms of network.
<ul>
<li>Not that big of a problem for adversary.</li>
<li>Can average over more queries, co-locate nearby (Amazon EC2),
<ul>
<li>Run on a nearby bot or browser, etc.</li>
</ul></li>
</ul></li>
<li>Adversary may need to know the version, optimization flags, etc of OpenSSL.
<ul>
<li>Is it a good idea to rely on such a defense?</li>
<li>How big of an impediment is this?</li>
</ul></li>
<li>If adversary mounts attack, effects are quite bad (key leaked).</li>
</ul>

<h2>Other types of timing attacks</h2>

<ul>
<li><strong>Page-fault timing for password guessing</strong> [Tenex system]
<ul>
<li>Suppose the kernel provides a system call to check user's password.
<ul>
<li>Checks the password one byte at a time, returns error when finds mismatch.</li>
</ul></li>
<li>Adversary aligns password, so that first byte is at the end of a page,
<ul>
<li>Rest of password is on next page.</li>
</ul></li>
<li>Somehow arrange for the second page to be swapped out to disk.
<ul>
<li>Or just unmap the next page entirely (using equivalent of <code>mmap</code>).</li>
</ul></li>
<li>Measure time to return an error when guessing password.
<ul>
<li>If it took a long time, kernel had to read in the second page from disk.
<ul>
<li>Or, if unmapped, if crashed, then kernel tried to read second page. ]</li>
</ul></li>
<li>Means first character was right!</li>
</ul></li>
<li>Can guess an <code>N</code>-character password in <code>256*N</code> tries, rather than <code>256^N</code>.</li>
</ul></li>
<li><strong>Cache analysis attacks:</strong> processor's cache shared by all processes.
<ul>
<li>E.g.: accessing one of the sliding-window multiples brings it in cache.</li>
<li>Necessarily evicts something else in the cache.</li>
<li>Malicious process could fill cache with large array, watch what's evicted.</li>
<li>Guess parts of exponent (<code>d</code>) based on offsets being evicted.</li>
</ul></li>
<li>Cache attacks are potentially problematic with "mobile code".
<ul>
<li>NaCl modules, Javascript, Flash, etc running on your desktop or phone.</li>
</ul></li>
<li><strong>Network traffic timing / analysis attacks</strong>.
<ul>
<li>Even when data is encrypted, its ciphertext size remains ~same as plaintext.</li>
<li>Recent papers show can infer a lot about SSL/VPN traffic by sizes, timing.</li>
<li>E.g., Fidelity lets customers manage stocks through an SSL web site.
<ul>
<li>Web site displays some kind of pie chart image for each stock.</li>
<li>User's browser requests images for all of the user's stocks.</li>
<li>Adversary can enumerate all stock pie chart images, knows sizes.</li>
<li>Can tell what stocks a user has, based on sizes of data transfers.</li>
</ul></li>
<li>Similar to CRIME attack mentioned in guest lecture earlier this term.</li>
</ul></li>
</ul>

<h2>References</h2>

<ul>
<li><a href="http://css.csail.mit.edu/6.858/2014/readings/brumley-timing.pdf">Remote timing attacks are practical</a></li>
<li><a href="http://css.csail.mit.edu/6.858/2014/readings/ht-cache.pdf">Cache missing for fun and profit</a></li>
<li><a href="http://www.tau.ac.il/~tromer/papers/cache-joc-20090619.pdf">Efficient Cache Attacks on AES, and Countermeasures</a></li>
<li><a href="http://www.tau.ac.il/~tromer/papers/handsoff-20140731.pdf">Get Your Hands Off My Laptop: Physical Side-Channel Key-Extraction Attacks on PCs</a></li>
<li><a href="http://www.cs.unc.edu/~reiter/papers/2012/CCS.pdf">Cross-VM Side Channels and Their Use to Extract Private Keys</a></li>
<li><a href="http://ed25519.cr.yp.to/">Ed25519: high-speed high-security signatures</a></li>
</ul>