Authenticating people
=====================

Problem: how to authenticate a person in a computer system?
  Authentication is a basic starting point for most security discussions.
  Turns out to be a difficult problem!

Larger context: deciding what to do with requests.
  User asks computer to perform some request.
  Typical plan:
    1. Authenticate: what user issued the request?
      The result is some identifier (name) for the user.
    2. Authorize: is that user allowed to do the requested operation?
      The result is yes/no.
  Often useful to separate authentication from authorization.
  Will talk more about this broader context later.
  For now, how can we authenticate the user issuing a request?
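
Sketch: authenticate, then authorize.
  A minimal Python illustration of the two-step plan above, not a real design.
  All names (PASSWORDS, PERMISSIONS, handle_request, ...) are made up here.
  Passwords are kept in plaintext only to keep the sketch short.

```python
import hmac

# Hypothetical tables, for illustration only.
PASSWORDS = {"alice": "correct horse", "bob": "hunter2"}
PERMISSIONS = {("alice", "read"), ("alice", "write"), ("bob", "read")}

def authenticate(username, password):
    """Step 1: return an identifier (name) for the user, or None."""
    expected = PASSWORDS.get(username)
    if expected is not None and hmac.compare_digest(
        expected.encode("utf-8"), password.encode("utf-8")
    ):
        return username
    return None

def authorize(user, operation):
    """Step 2: yes/no answer for this user and this operation."""
    return (user, operation) in PERMISSIONS

def handle_request(username, password, operation):
    user = authenticate(username, password)
    if user is None:
        return "authentication failed"
    if not authorize(user, operation):
        return "not authorized"
    return "ok: " + operation
```

  Keeping the two steps in separate functions means the authorization
    check only ever sees a name, never a credential.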

Many ways to authenticate.
  Different approaches make sense in different settings.
  For this lecture, will focus on authenticating humans.
  Later lecture will talk about authenticating computers.

Simplest setting: exchanging messages between a person and a computer.
  User <~~~> Computer.
  Authentication using a shared secret.
  Password, PIN.

What makes for a good password?
  Adversary should not be able to log in as a victim user.
  Is "password" a good password?
  Is "passW0rd1!" better than, say, "yellow-elephant-reading"?
  What matters is the entropy with respect to adversary's knowledge of
    password distributions, not formats.
  Practical password distributions are extremely skewed.
    E.g., the 5000 most popular passwords cover 20% of accounts (millions).
    Similar patterns across web sites.
    [[ Ref: statistics from various password disclosures ]]
  Complex-looking passwords are not strong if they are predictable.

Password goal: high entropy.
  Generate password for user: high entropy, but may have trouble remembering.
  Allow user to choose password: unclear entropy.
    Hard to enforce entropy, unlike format requirements.
  Forcing password changes may reduce entropy.
    User forced to remember new passwords, might not bother with high entropy.
    Forcing password change is more relevant as defense for leaked passwords.
    Trade-off depending on what attacks you think are more relevant.
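
Sketch: generating a password for the user.
  A rough Python example of the "generate password for user" option.
  The word list and function names are hypothetical; a real list would
    have thousands of words.
  Entropy here is exact only because every word is chosen uniformly at random.

```python
import math
import secrets

# Hypothetical tiny word list, for illustration only.
WORDS = ["yellow", "elephant", "reading", "copper",
         "window", "river", "candle", "orbit"]

def generate_passphrase(num_words, words=WORDS):
    """Pick each word uniformly at random using the OS's secure RNG."""
    return "-".join(secrets.choice(words) for _ in range(num_words))

def entropy_bits(num_words, list_size):
    """Entropy of a uniform choice: num_words * log2(list_size)."""
    return num_words * math.log2(list_size)
```

  With this 8-word list, 3 words give only 9 bits.
  With a 7776-word list, 3 words give about 38.8 bits, even if the
    adversary knows the list and the format.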

Passwords require limiting the number of attempts.
  Even strong passwords are unlikely to offer lots of entropy.
  If adversary can make many guesses, can probably access many accounts.
    20-30 bits of entropy is pretty high for a password.
    2^20 (1M) or even 2^30 (1B) attempts is easy for a computer.
  Important to limit the number of password authentication attempts.
  Suppose we limit every account to 10 attempts; what goes wrong?
  Availability.  Lock out users?
  Security across users.
  Rate-limiting based on some other resource (e.g., IP address).
  CAPTCHAs, though tricky to use correctly (see previous lecture).
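
Sketch: limiting attempts per account and per IP address.
  One possible shape for the rate limiting discussed above, in Python.
  The class name, the one-hour window, and the per-IP limit of 100 are
    assumptions for this example; only the 10-attempt figure is from above.

```python
import time
from collections import defaultdict, deque

class AttemptLimiter:
    """Allow at most max_attempts per key within window_seconds."""

    def __init__(self, max_attempts, window_seconds):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.attempts = defaultdict(deque)  # key -> times of recent attempts

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        recent = self.attempts[key]
        # Drop attempts that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False
        recent.append(now)
        return True

per_account = AttemptLimiter(max_attempts=10, window_seconds=3600)
per_ip = AttemptLimiter(max_attempts=100, window_seconds=3600)

def may_attempt_login(username, ip_address, now=None):
    # The IP check runs first, so an already-blocked IP does not use up
    # the per-account budget of the user it is targeting.
    return per_ip.allow(ip_address, now) and per_account.allow(username, now)
```

  The per-account limit still lets an adversary lock out a victim with
    10 bad guesses; the per-IP limit only makes that more expensive.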

How to store passwords on a server?
  Naive approach: table of user, password.
  Problematic if adversary compromises server, learns everyone's password.
    Problem 1: need to establish new passwords for users on that server.
    Problem 2: if users had same password on many systems,
      adversary can now log into other systems as a result!

Better plan for storing passwords: hashing.
  Store username and H(password).
  If hash function is one-way, cannot directly obtain password from hash.
  Adversary has to repeatedly hash guesses and compare.
  Good to have a slow hash function!  Quite a different goal from fast hashes.
    "Key derivation function" is the name typically used for this construct.
    PBKDF2, bcrypt, scrypt, etc.
  Potential attack: pre-compute hashes of known passwords ("rainbow table").
    Similar to the rate-limiting problem we saw earlier.

Even better hashing plan: use salts.
  Store username, salt, H(salt || password).
  Every time an account is created or updated, pick a fresh salt.
  Can still authenticate users: get the salt from the table.
  Pre-computing tables no longer sensible: too many salts.
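
Sketch: salted, slow password hashing.
  A minimal Python example of storing salt and H(salt || password),
    using PBKDF2 from the standard library as the slow hash.
  The iteration count and function names are assumptions, not
    recommendations; tune the count for your own hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor for this sketch

def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest); picks a fresh random salt if none is given."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest

def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Re-hash the guess with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

  The server's table would hold (username, salt, digest).
  Two users with the same password get different digests, so one
    pre-computed table cannot cover both.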

Improving passwords using a password manager.
  New setting: human <-> human's computer <-> server.
    We assume human's computer is trusted here, but it's a big assumption.
    Many real attacks take advantage of this (e.g., key loggers).
  Better secrets on the device: password manager, effectively.
  Really long password ("key") stored on the client computer.
  Authenticate human to client computer first:
    Log into your laptop / password manager.
  Then use the long key to authenticate to server.
  Benefit: adversary has to guess a high-entropy password for servers.
  Benefit: human-memorized password is only used when logging into laptop.
    Adversary would need to steal your laptop to try to guess that password
      so they might get access to the high-entropy server passwords.
    .. Or just get malware running on your laptop somehow.

What if the link between the user and server is being monitored?
  Sending password directly to server means adversary sees the password.
  Alternative plan: challenge-response protocol.
  Server picks some random value, called a nonce, sends it to the client.
  Client sends back MAC(KDF(password), nonce).
    Think of MAC as a fancy hash function, to be covered in lectures soon.
  Server can check if this matches (requires server to store raw password,
    or the equally sensitive KDF(password)).
  Need to be careful: what happens if there are many servers out there?
    Client might accidentally authenticate to the wrong server.
    Client should include expected server name in the input to the MAC.
  Should probably use a slow hash (KDF) here too.
    Adversary monitoring the network could use response to brute-force password.
  Fancier cryptographic schemes can reconcile hashed storage, challenge-response.
    SRP.
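
Sketch: challenge-response with a server name.
  A rough Python version of MAC(KDF(password), nonce), with the expected
    server name added to the MAC input.
  Assumptions in this sketch: HMAC-SHA256 as the MAC, PBKDF2 as the KDF,
    a per-user salt that the client learns somehow, and a 16-byte nonce.

```python
import hashlib
import hmac
import secrets

def derive_key(password, salt, iterations=100_000):
    """Slow hash of the password; the iteration count is an assumed value."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )

def make_nonce():
    """Server side: a fresh random challenge for each login."""
    return secrets.token_bytes(16)

def respond(key, nonce, server_name):
    """Client side: MAC over nonce || server name.

    The nonce has a fixed length, so the concatenation is unambiguous.
    """
    message = nonce + server_name.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).digest()

def check(key, nonce, server_name, response):
    """Server side: recompute with its own name and compare."""
    return hmac.compare_digest(respond(key, nonce, server_name), response)
```

  A response computed for one server name or one nonce fails the check
    for any other, so it cannot be replayed elsewhere or later.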

What if an adversary can substitute their own request?
  Important to bind the authentication to the request.
  We've been informally assuming a secure channel (e.g., direct wire).
  Over the network, adversary can observe messages, tamper with requests, etc.
  Future lectures will build up to how we can establish a secure channel.
  Another approach: directly connect the request to the authentication.
  E.g., send MAC(KDF(password), nonce || request).
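
Sketch: binding the request to the authentication.
  A small Python example of MAC(key, nonce || request), where key stands
    for KDF(password).
  HMAC-SHA256 and the 2-byte length prefix are choices made for this
    sketch; the prefix makes nonce || request parse in only one way.

```python
import hashlib
import hmac

def tag_request(key, nonce, request):
    """Compute the MAC over a length-prefixed nonce followed by the request."""
    message = len(nonce).to_bytes(2, "big") + nonce + request
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_request(key, nonce, request, tag):
    """Server side: recompute the tag for the request it received."""
    return hmac.compare_digest(tag_request(key, nonce, request), tag)
```

  An adversary who swaps in a different request cannot produce a
    matching tag without the key.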

Meta-technique: sessions.
  Passwords are tricky and error-prone; minimize their use.
  Authenticate once to establish a "session".
  Subsequent requests can be issued with a session secret instead of password.
    In HTTP/HTTPS, this often ends up in a Cookie.
  Requires cryptography to exchange a secret without adversary seeing it.
  Session token has lots of randomness -- not susceptible to guessing.
  Typically short-lived; re-authenticate with password every once in a while.
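
Sketch: session tokens.
  A minimal Python example of issuing and checking a session secret.
  The in-memory table, the one-hour lifetime, and the function names are
    assumptions for illustration.

```python
import secrets
import time

SESSION_LIFETIME = 3600  # seconds; assumed value
sessions = {}  # token -> (username, expiry time)

def create_session(username, now=None):
    """Call only after the user has authenticated with their password."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # 32 random bytes, i.e. 256 bits
    sessions[token] = (username, now + SESSION_LIFETIME)
    return token

def lookup_session(token, now=None):
    """Return the username for a live token, or None."""
    now = time.time() if now is None else now
    entry = sessions.get(token)
    if entry is None:
        return None
    username, expires = entry
    if now >= expires:
        del sessions[token]  # expired: force re-authentication
        return None
    return username
```

  The token would travel in a Cookie; it still needs a channel the
    adversary cannot read.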

One more big problem with passwords: susceptible to "phishing" attacks.
  User can inadvertently reveal password to the adversary.
  E.g., visit a website that looks like the real bank, type in bank password.

Defending against password weaknesses: two-factor authentication.
  Require user to authenticate in some other way in addition to passwords.
  Ideally the two authentication plans have uncorrelated security failures.
  Typically three classes of authentication plans:
    "Something you know": passwords.
    "Something you have": device.
    "Something you are": biometrics.

Time-based one-time passwords (TOTP).
  Slightly different setting again: human's TOTP device <-> human <-> server.
    E.g., Google Authenticator app on an Android phone.
  Secret K shared between the TOTP device and the server.
    This is set up when you enroll in TOTP-based two-factor authentication.
  TOTP device continuously computes:
    MAC(K, current time div 30 seconds)
    Typically displayed as a 6-digit code (truncated hash).
  User must enter the current code when logging in.
  Server can compute the expected MAC value for any given user.
    Can even check for several time values, just in case.
  Defends against weak passwords: adversary does not know TOTP code.
  Defends against device loss: adversary cannot log in w/ TOTP alone.
  Still susceptible to phishing attacks.
    User can mistakenly disclose both password and TOTP value to wrong server.
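
Sketch: computing and checking a TOTP code.
  A short Python example of MAC(K, current time div 30 seconds) truncated
    to 6 digits, following the standard TOTP construction (RFC 6238,
    HMAC-SHA1).
  The function names and the window of one step on either side are
    choices made for this sketch.

```python
import hashlib
import hmac
import struct
import time

def totp(key, now=None, step=30, digits=6):
    """Code for the time step containing `now` (seconds since the epoch)."""
    now = time.time() if now is None else now
    counter = int(now) // step
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Truncation: the low 4 bits of the last byte select a 4-byte slice.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)

def verify_totp(key, code, now=None, step=30, window=1):
    """Server side: accept codes from a few adjacent time steps."""
    now = time.time() if now is None else now
    return any(
        hmac.compare_digest(
            totp(key, now + i * step, step).encode("utf-8"),
            code.encode("utf-8"),
        )
        for i in range(-window, window + 1)
    )
```

  Checking adjacent steps tolerates clock drift and slow typing, at the
    cost of a slightly longer lifetime for each code.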

Defending against phishing attacks: bind authentication to server.
  Slightly different setting: human <-> human's computer w/ 2FA key <-> server.
  Strawman / sketch: compute TOTP code by including server name.
    MAC(K, current time || server name)
  Need computer involved in 2FA because computer can precisely identify server.
    E.g., web browser knows exactly what URL is being loaded.
    But human might be sloppy in checking all the URL letters are correct.
  Server receiving TOTP code checks that it's based on expected server name.
    Otherwise, this code came from user being tricked by phishing attack.
    Reject incorrect server names.
  This is a much simplified version of the U2F two-factor authentication protocol.
    Typically U2F involves a USB device with the secret.
    Browser and USB device cooperate.
    Uses what's called "public-key cryptography" instead of MACs; will see later.
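
Sketch: the server-name strawman.
  A rough Python version of MAC(K, current time || server name); this is
    the simplified strawman above, not the actual U2F protocol.
  Assumptions in this sketch: HMAC-SHA256, 30-second time steps as in
    TOTP, and a code truncated to 16 hex characters.

```python
import hashlib
import hmac
import struct

def bound_code(key, now, server_name, step=30):
    """Computer side: code covering the time step and the server name.

    server_name is what the browser actually connected to, not what the
    human believes they are visiting.
    """
    counter = int(now) // step
    # Fixed-width counter first, so counter || name is unambiguous.
    message = struct.pack(">Q", counter) + server_name.encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]

def server_check(key, now, my_name, code):
    """Server side: recompute with its own name; other names fail."""
    return hmac.compare_digest(bound_code(key, now, my_name), code)
```

  A code produced while visiting a look-alike site names that site, so
    relaying it to the real server does not help the adversary.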

Biometrics.
  Modest amount of entropy.
  Not particularly secret.
  More of an identity rather than a key/password.

How to use biometric authentication?
  Authentication between a human and a computer, directly next to each other.
  Need a trusted input path (sensor) for biometrics.
  Hard to authenticate over the network.
  Not useful to think of biometrics as a password: noisy, hard to change.
  Useful for authenticating to your phone / computer.

Main point: computer has to trust biometric reading is coming from real human.
  E.g., fingerprint reader's hard problem is identifying live finger.
  E.g., face authentication's hard problem is identifying live image vs. photo.
    Apple devices have specialized hardware to project random IR dots on face.
  Moderate cost to defeat biometrics.
    Manufacture silicone finger replica, textured head replica, etc.
    Does not scale to many devices: attack each one separately.

Meta-approach: delegation / "single-sign-on".
  Human authenticates with computer A.
  Computer A vouches for human's identity to computer B.
    B trusts A to correctly authenticate the human.
  Naive plan: human <-> computer A <-> computer B.
    A relays human's request to B.
    Not great (bottleneck, privacy, etc) but gives some intuition.
  Real systems use more sophisticated cryptographic techniques.
    Will learn about them soon!
  Common in larger systems.
    MIT: many servers but just one account.
    Kerberos, Touchstone.
    "Sign in with Google", etc.

So far, we've talked about authentication in the steady-state.
  Two other important phases: registration and recovery.

Bootstrapping / registration.
  How to establish the initial link between a person's identity and credentials
    (password, TOTP secret, etc)?
  First-come first-served.
    Anyone can register a new identity, as long as it wasn't used before.
    Used for registering accounts on open systems -- e.g., Gmail.
  Bootstrap identity verification from another mechanism.
    Must prove identity in some other way.
    E.g., web sites require verifying email to create an account.
  Administrator-managed / out-of-band.
    New employee account at company created by HR.
    New student account at MIT created by admissions / registrar's office.

Recovery.
  Password, device loss.
  Security questions: talked about these last week.
  Another mechanism: recovery email, credit card number, etc.
  No recovery: create new account (if there's little value in identity).
  Call customer service: escape hatch, often with ill-defined policies.
    Susceptible to social engineering attacks.

Privacy.
  May be desirable to avoid linking identities across applications.
  Don't reuse passwords.
  Hard to avoid with biometrics.

Summary.
  Three basic approaches for authenticating humans:
    "Something you know": passwords.
    "Something you have": devices.
    "Something you are": biometrics.
  Different approaches make sense in different settings.
  Important to bind authentication to a request.
    Will talk more in future lectures about MACs, secure channels.
  Cryptography used in many ways related to user authentication.
  Registration is crucial: establishes trust.
  Recovery needs a plan too (lost password, lost device).
  Meta-approaches: two-factor, sessions, delegation.