begriffs

How Transparent Encryption Works in HDFS

February 22, 2015

Charles Lamb, software engineer at Cloudera, describes the tradeoffs between various levels of encryption, the choices he made when designing transparent encryption in HDFS, and the concepts you need to understand to use it.

Summary

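  • Transparent encryption: data written to an encrypted subtree on HDFS is encrypted automatically, and data read from it is decrypted automatically, without changes to application code
  • This helps applications meet regulatory compliance requirements
  • Encryption and decryption are always handled by the client; HDFS itself never sees plaintext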
  • The levels of encryption:
    • Application (hard to do and to add to legacy apps)
    • Database (prone to leaks through e.g. secondary indices)
    • Filesystem (higher performance, transparent, less flexible when serving multiple tenants)
    • Disk level (only protects against physical theft)
  • HDFS transparent encryption lives somewhere between the database and filesystem levels
  • Design goals
  • In-depth explanations of architectural concepts
    • Key Management Server (KMS)
    • Encryption zones
    • Keys
  • HDFS encryption configuration
  • Per-user and per-key ACLs
  • Performance results
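To make the client-side model in the summary concrete, here is a minimal sketch under toy assumptions: a plain dict stands in for the DataNodes, and a SHA-256 counter-mode keystream stands in for AES-CTR, the cipher HDFS uses. The helper names `keystream_xor`, `client_write`, and `client_read` are hypothetical, and the toy cipher is not secure. It shows two things: the storage layer only ever holds ciphertext, and a counter-mode cipher lets a reader decrypt starting at an arbitrary byte offset, which is what makes seeks cheap.

```python
import hashlib
import os

BLOCK = 32  # bytes of keystream per counter value (SHA-256 digest size)


def keystream_xor(key: bytes, iv: bytes, data: bytes, offset: int = 0) -> bytes:
    """XOR `data` with a counter-mode keystream starting at byte `offset`.

    Toy stand-in for AES-CTR: keystream block n is SHA-256(key || iv || n).
    Encryption and decryption are the same operation. NOT secure.
    """
    out = bytearray()
    block = b""
    for i, byte in enumerate(data):
        counter, within = divmod(offset + i, BLOCK)
        if i == 0 or within == 0:
            # Derive the keystream block for this counter value.
            block = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        out.append(byte ^ block[within])
    return bytes(out)


def client_write(store: dict, path: str, plaintext: bytes, dek: bytes, iv: bytes) -> None:
    """Encrypt on the client; the store (standing in for DataNodes) sees only ciphertext."""
    store[path] = keystream_xor(dek, iv, plaintext)


def client_read(store: dict, path: str, dek: bytes, iv: bytes,
                offset: int = 0, length: int = -1) -> bytes:
    """Fetch ciphertext and decrypt it on the client, starting at any byte offset."""
    ciphertext = store[path]
    end = len(ciphertext) if length < 0 else offset + length
    return keystream_xor(dek, iv, ciphertext[offset:end], offset)


datanodes = {}  # hypothetical stand-in for block storage
dek, iv = os.urandom(32), os.urandom(16)
record = b"patient=alice;diagnosis=flu"
client_write(datanodes, "/secure/records.txt", record, dek, iv)
print(client_read(datanodes, "/secure/records.txt", dek, iv))        # b'patient=alice;diagnosis=flu'
print(client_read(datanodes, "/secure/records.txt", dek, iv, 8, 5))  # b'alice' (seek, then decrypt)
```

A block cipher mode that chains blocks together would force a reader to decrypt from the start of the file; counter mode avoids that, which is why the sketch carries an explicit `offset`.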
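The keys behind encryption zones follow an envelope-encryption pattern: each file gets its own data encryption key (DEK), the KMS encrypts that DEK with the zone's key to produce an encrypted DEK (EDEK), and the NameNode stores only the EDEK with the file's metadata. A client sends the EDEK to the KMS, which checks the per-key ACL before returning the DEK. The sketch below models this with a hypothetical `ToyKMS` class. It assumes the same toy SHA-256 keystream in place of AES, and it collapses the KMS's per-key ACL operations (such as `DECRYPT_EEK`) into a single "may decrypt" user set.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, Iterable, Set


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy counter-mode cipher (SHA-256 keystream) standing in for AES. NOT secure."""
    out = bytearray()
    for counter in range((len(data) + 31) // 32):
        pad = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        chunk = data[counter * 32:(counter + 1) * 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


@dataclass(frozen=True)
class EDEK:
    """What the NameNode stores per file: the DEK wrapped by the zone key."""
    key_name: str
    nonce: bytes
    wrapped_dek: bytes


class ToyKMS:
    """Hypothetical KMS: zone keys never leave it; DEKs are released per ACL."""

    def __init__(self) -> None:
        self._zone_keys: Dict[str, bytes] = {}
        self._decrypt_acl: Dict[str, Set[str]] = {}

    def create_key(self, name: str, decrypt_users: Iterable[str]) -> None:
        self._zone_keys[name] = os.urandom(32)
        self._decrypt_acl[name] = set(decrypt_users)

    def generate_edek(self, key_name: str) -> EDEK:
        """Make a fresh per-file DEK and hand it out only in wrapped form."""
        dek = os.urandom(32)
        nonce = os.urandom(16)
        wrapped = xor_stream(self._zone_keys[key_name], nonce, dek)
        return EDEK(key_name, nonce, wrapped)

    def decrypt_edek(self, edek: EDEK, user: str) -> bytes:
        """Unwrap an EDEK for a client, enforcing the per-key ACL."""
        if user not in self._decrypt_acl[edek.key_name]:
            raise PermissionError(f"{user!r} may not decrypt EDEKs for {edek.key_name!r}")
        return xor_stream(self._zone_keys[edek.key_name], edek.nonce, edek.wrapped_dek)


kms = ToyKMS()
kms.create_key("hr-key", decrypt_users={"alice"})

# File creation in the /hr zone: the NameNode asks the KMS for an EDEK and
# stores it with the file's metadata. It never sees the plaintext DEK.
namenode_metadata = {"/hr/salaries.csv": kms.generate_edek("hr-key")}

# Client path: send the EDEK to the KMS, receive the DEK, encrypt locally.
dek = kms.decrypt_edek(namenode_metadata["/hr/salaries.csv"], user="alice")
ciphertext = xor_stream(dek, os.urandom(16), b"alice,100000")
```

Because only the zone key has to stay secret inside the KMS, a compromised NameNode leaks EDEKs but no usable DEKs, and a user without decrypt rights on `hr-key` gets a `PermissionError` instead of a key.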
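An encryption zone is a directory whose whole subtree is encrypted under one zone key, so HDFS has to work out which zone, if any, a path falls in. HDFS also refuses renames that would move a file into, out of, or between zones, roughly because the file's EDEK is wrapped with the original zone's key. A minimal sketch with hypothetical `zone_of` and `rename_allowed` helpers, assuming zones are given as a dict from zone root to key name and ignoring edge cases such as renaming a zone root itself:

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def zone_of(path: str, zones: Dict[str, str]) -> Optional[str]:
    """Return the root of the nearest encryption zone containing `path`, or None."""
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):  # deepest directory first
        if str(candidate) in zones:
            return str(candidate)
    return None


def rename_allowed(src: str, dst: str, zones: Dict[str, str]) -> bool:
    """A rename may not move a file into, out of, or between encryption zones."""
    return zone_of(src, zones) == zone_of(dst, zones)


# Hypothetical zones: zone root directory -> name of its key in the KMS.
zones = {"/hr": "hr-key", "/finance/ledger": "ledger-key"}

print(zone_of("/hr/2015/salaries.csv", zones))                   # /hr
print(zones[zone_of("/finance/ledger/q1.csv", zones)])           # ledger-key
print(rename_allowed("/hr/a.csv", "/hr/archive/a.csv", zones))   # True
print(rename_allowed("/hr/a.csv", "/tmp/a.csv", zones))          # False
```

Matching on path components rather than raw string prefixes keeps `/hrx/file` from being treated as part of the `/hr` zone.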