Good books for deep hacks
April 13, 2017
For the past few months I’ve been compiling a list of books for a deep dive into interesting technical topics. My theory is that projects based on these topics will be strong individual threads that I can weave into epic hacks. This list is basically a curriculum for decades of learning about the wonders of computers.
What’s exciting about many of these books is how they draw on the good ideas from history. Many of them cover technologies created in the 1990s and earlier, things that we’d do well to understand, even while surpassing them. Much old software has had time to mature and has been refined until it is very effective. If a printed book is old but still accurate, that suggests the software it describes is well constructed.
I’ve also chosen books that cover alternative ways to do things: for instance, learning about document layout engines to compare them with the current DOM/CSS monoculture, or seeing how various distributed version control systems compare with Git.
The books here are emphatically not about “cracking” coding interviews, or any other demonstrative brainteasing. It’s all about intrinsically interesting things. I’ve also omitted the usual suspects like SICP, TAOCP or CLRS – my choices are higher-level. They are guides for jumping into fun deep hacks.
Let’s start here. I want a language to grow with, one with enough depth to offer years of learning. For me that language is Haskell. Depending on the hack, I’ll be using Haskell or C. Why mess with the things in between? (What’s up with everyone nowadays using a misbegotten child of the browser wars as their main language?)
Haskell compiles into fast code if you avoid some gotchas, and prevents classes of dumb bugs that nobody should have to worry about.
- Haskell from First Principles
- Parallel and Concurrent Programming in Haskell: Techniques for Multicore and Multithreaded Programming
- Haskell High Performance Programming
- Data 61 Functional Programming Course
Sure, Haskell is great and its abstraction is rewarding, but you can’t beat the C language for intrinsic simplicity. The attendant tasks of manual memory management and concurrency may be complex, but there is certainly no hand-waving.
- The C Programming Language aka K&R
- Expert C Programming: Deep C Secrets
- The Standard C Library
- 21st Century C: C Tips from the New School
- C: A Reference Manual
Learn the measurements that are relevant to system performance, and how to design rigorous experiments to capture them.
- Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling
- Systems Performance: Enterprise and the Cloud
Stop guessing and flailing and instead use a systematic approach for finding bugs.
- Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
- Why Programs Fail: A Guide to Systematic Debugging
- Debug It!: Find, Repair, and Prevent Bugs in Your Code
- C language: Debugging with GDB: The GNU Source-Level Debugger
- C language: Valgrind 3.3 - Advanced Debugging and Profiling for Gnu/Linux Applications
- C language: DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD
Relational Data Management
Talk about mature technology: SQL has evolved for decades as the world’s foremost declarative language. This selection of books covers SQL mastery along with a deep understanding of the problems of transactions and recovery solved by modern RDBMSs.
- SQL for Smarties: Advanced SQL Programming
- Art of SQL
- SQL and Relational Theory: How to Write Accurate SQL Code
- Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery
- New Relational Database Dictionary: Terms, Concepts, and Examples
These books cover the history and design of TCP/IP and the standard network layers. They talk about design choices, and new developments like IPv6.
- History: Inventing the Internet
- History: Where Wizards Stay Up Late: The Origins of the Internet
- TCP/IP Illustrated, Volume 1: The Protocols
- TCP/IP Guide: A Comprehensive, Illustrated Internet Protocols Reference
- Wireshark Network Analysis: The Official Wireshark Certified Network Analyst Study Guide
The magic of radio… it’s a wonder of nature. From its simple spark gap origins to modern mesh networking, radio offers free lightspeed communication to all.
- ARRL Ham Radio License Manual
- Software Receiver Design
- Fundamentals of Mobile Data Networks
- History: Innovation Journey of Wi-Fi
- History: Wi-Fi and the Bad Boys of Radio: Dawn of a Wireless Technology
- 802.11 Wireless Networks: The Definitive Guide
- Wireless Mesh Networking
Delay-tolerant networked programs are designed to work smoothly under an intermittent network connection. They often use a store-and-forward system in which nodes exchange traffic only when they are able.
The old reality of telephone modems and long distance costs made these programs tough and resilient. In today’s always-connected world of pocket surveillance devices it’s nice to have software that works offline.
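The store-and-forward idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: the `node` struct, the fixed-size outbox, and the `node_store` and `node_forward` names are all hypothetical, and a real system such as UUCP spools to disk and handles retries and routing. The point is just that messages wait locally until a link happens to be available.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 8   /* hypothetical outbox capacity */
#define MSG_LEN   64  /* hypothetical maximum message size, including NUL */

/* Hypothetical node: holds messages until they can be handed on. */
struct node {
    char   outbox[QUEUE_CAP][MSG_LEN];
    size_t pending;
};

/* Store a message locally. Returns false if the outbox is full. */
bool node_store(struct node *n, const char *msg)
{
    if (n->pending == QUEUE_CAP)
        return false;
    strncpy(n->outbox[n->pending], msg, MSG_LEN - 1);
    n->outbox[n->pending][MSG_LEN - 1] = '\0';  /* always terminate */
    n->pending++;
    return true;
}

/* Hand queued messages from one node to another, but only if the link
   is up. Returns how many messages were transferred. Messages that do
   not fit on the receiver stay queued on the sender for a later try. */
size_t node_forward(struct node *from, struct node *to, bool link_up)
{
    size_t sent = 0;

    if (!link_up)
        return 0;  /* nothing is lost; everything stays in the outbox */

    while (sent < from->pending && node_store(to, from->outbox[sent]))
        sent++;

    /* Shift any messages that were not sent to the front of the outbox. */
    memmove(from->outbox, from->outbox + sent,
            (from->pending - sent) * MSG_LEN);
    from->pending -= sent;
    return sent;
}
```

Calling `node_forward` with `link_up` set to false returns 0 and leaves the sender’s outbox untouched; calling it again once the link is up drains the outbox in order.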
Good old email, the original social network. As a successful interoperable world-wide communications standard that has lasted for decades, it should be a rich and instructive topic.
- History: X.400 and SMTP: Battle of the E-mail Protocols
- Programmer’s Guide to Internet Mail: SMTP, Pop, IMAP, and LDAP
- Internet Email Protocols: A Developer’s Guide
UUCP and Usenet
These systems allow decentralized propagation of files and messages over several different types of physical connections and link layer protocols.
Distributed Version Control
I’ve been using Git for many years and quite enjoy it, or at least am brainwashed by familiarity. It would be worthwhile to give other systems a try for comparison.
- Pro Git
- Mercurial: The Definitive Guide
- Darcs Manual | Understanding Darcs
- Bazaar User Guide
Chat / Instant Messaging
Before the proliferation of web-based companies competing to host, hoard, and mine organizations’ chats, there was IRC. Learn how to use it and how to operate a channel. Help keep an open internet alive.
For a more person-to-person chat experience with support for multimedia, there’s XMPP, a well-established open standard.
HTTP Reverse Proxy and Caching
Reverse proxies and load balancers have come up many times for me when working with web applications. I think it would pay off to learn them thoroughly.
Learn the building blocks of cryptography, and how/when to apply them as full cryptosystems. These books go deep but not in an overly proof-heavy way.
- Practical Cryptography
- Understanding Cryptography: A Textbook For Students And Practitioners
- Introduction to Public Key Infrastructures
- Implementing SSL/TLS Using Cryptography and PKI
- SSH, The Secure Shell: The Definitive Guide
- OpenPGP Message Format, RFC4880
- Digital Watermarking and Steganography
- History: PGP: Pretty Good Privacy
Much of the geeky encryption mumbo jumbo is defenseless against the power of law. What are reasonable expectations for privacy, what is the current law, and how should we frame this issue for those unfamiliar with it?
- Nothing to Hide: The False Tradeoff between Privacy and Security
- Obfuscation: A User’s Guide for Privacy and Protest
- Privacy, Information, And Technology
Dates and Times
Whenever a coding task involves date or time processing I always mentally add a big bump to my cost estimation. That’s because we’re hurtling through a cosmos of spinning rocks that are simultaneously free-falling toward each other, whose very measurements of time and distance are a relativistic funhouse mirror. We make feeble calendar simplifications and smirk, “looks like somebody has a case of the Mondays,” while the infinitude of space rolls above.
- Calendrical Calculations
- Time: From Earth Rotation to Atomic Physics
- vCalendar Specification
- iCalendar, RFC5545
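To give a taste of even the “easy” part of calendar arithmetic, here is a small C sketch of the Gregorian leap year rule and a day-of-week calculation. The day-of-week function uses Sakamoto’s well-known method, which is my choice for illustration and not something taken from the books above. It assumes the proleptic Gregorian calendar and says nothing about time zones, leap seconds, or the historical switch from the Julian calendar, which is where the real cost estimates blow up. The function names are hypothetical.

```c
#include <stdbool.h>

/* Gregorian leap year rule: every 4th year, except centuries,
   except centuries divisible by 400. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Day of week for a Gregorian date using Sakamoto's method.
   month is 1..12, day is 1..31, year is assumed to be positive.
   Returns 0 for Sunday through 6 for Saturday. */
int day_of_week(int year, int month, int day)
{
    static const int offsets[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};

    /* Treat January and February as belonging to the previous year,
       so that a leap day is counted only after it has occurred. */
    if (month < 3)
        year -= 1;

    return (year + year / 4 - year / 100 + year / 400
            + offsets[month - 1] + day) % 7;
}
```

For example, `day_of_week(2017, 4, 13)` returns 4, a Thursday, and `is_leap_year(1900)` is false even though 1900 is divisible by 4.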
Geographical Information Systems
Like measurements of time, measurements of space are complicated. However, the payoff appears to be big, with query systems like PostGIS able to plan routes and answer sophisticated spatial queries.
Unicode and Fonts
Amazingly, people have created a standard that can encode all written human languages. Learning about this should provide an interesting perspective on writing and language itself.
- Unicode Explained
- Unicode Demystified: A Practical Programmer’s Guide to the Encoding Standard
- Fonts & Encodings
Being able to parse languages feels like the stuff that wizards do. Those people. Thus far I’m constrained by the syntax devised by others, but creating my own would feel pretty magical.
Understanding the techniques of automatic memory management allows us to predict and tune this aspect of runtime performance of programs written in high level languages. For instance, Haskell uses a generational garbage collector with tunable parameters. Knowing the theory allows for reasoned tuning.
I’ve used various distros (and of course OS X) for a long time. My knowledge is pretty strong, but it can be haphazard. These books and manuals fill in all the gaps.
- Debian Administrator’s Handbook, Debian Jessie from Discovery to Mastery
- Debian Package Management (in German, use Google Translate)
- Filesystem Hierarchy Standard
- GNU Coreutils Manual
- X Power Tools
- Learning Linux Binary Analysis
- Linux Programming Interface: A Linux and UNIX System Programming Handbook
- Security Power Tools
- Backup & Recovery: Inexpensive Backup Solutions for Open Systems
- Logging and Log Management
Document layout engines are designed to specify exactly how a document should look on a fixed size page. There are a number of popular systems and comparing them should be interesting.
- Developing with PDF: Dive Into the Portable Document Format
- PostScript Language Reference
- Computers & Typesetting, Volume A: The TeXBook
- Presentations with LaTeX: Which package, which command, which syntax? and the Beamer User Guide
- Groff, the GNU implementation of troff
Application layout engines deal with organizing graphical user interfaces which must accommodate variable window and display sizes.
Seems like everybody’s unreflectively in love with the DOM and CSS. They even use bloatware like Electron in order to bring this beloved layout engine to the desktop. What are the alternatives?
- Designing Visual Interfaces: Communication Oriented Techniques
- Explore these and try them out
What with evil maid attacks and PoisonTap, I think it would be good to be educated about how USB really works. Plus it’s the way most devices connect.
I would love to make impeccable graphics, choosing raster or vector appropriately, and using the best file format for the job. Really understanding how images are encoded and how to efficiently use open source editing tools would provide a lot of power for designing beautiful and usable documentation and ornamentation.
- Real World Color Management: Industrial Strength Production Techniques
- Book of GIMP: A Complete Guide to Nearly Everything
- Compressed Image File Formats: JPEG, PNG, GIF, XBM, BMP
- Graphics File Formats: Reference and Guide
- SVG Essentials
- Book of Inkscape: The Definitive Guide to the Free Graphics Editor
- Inkscape Beginner’s Guide
I’m pretty good with Vim, but my reliance on fancy plugins makes me think there may be basics yet to learn in the core program. Also the Emacs-based Org Mode looks like the textual Evernote killer.
- Learning the vi and Vim Editors
- Vim User Manual
- Org Mode 9 Reference Manual: Organize Your Life with GNU Emacs
How do you efficiently and accurately represent the arithmetic of the real numbers in a computer? The IEEE floating point standard has been called “one of the greatest achievements in computing,” so yeah, tell me more!
- Handbook of Floating-Point Arithmetic
- Numerical Computing with IEEE Floating Point Arithmetic
- The actual standard
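A quick way to start poking at the representation is to look at the bits directly. The C sketch below assumes the platform’s `double` is the 64-bit IEEE 754 binary64 format (1 sign bit, 11 exponent bits, 52 fraction bits), which is true on most current machines but is not guaranteed by the C standard. The helper names are hypothetical, and the relative-tolerance comparison is one common workaround rather than a universal answer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a double into an integer.
   Assumes double is 64-bit IEEE 754 binary64. */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Unbiased exponent of a normal double: the stored 11-bit field
   minus the bias of 1023. Not meaningful for zero, subnormals,
   infinities, or NaN. */
int double_exponent(double x)
{
    return (int)((double_bits(x) >> 52) & 0x7FF) - 1023;
}

static double abs_double(double x)
{
    return x < 0 ? -x : x;
}

/* Compare two doubles with a relative tolerance, since decimal
   fractions like 0.1 are rounded when stored in binary. */
bool nearly_equal(double a, double b, double rel_tol)
{
    double abs_a = abs_double(a);
    double abs_b = abs_double(b);
    double scale = abs_a > abs_b ? abs_a : abs_b;
    return abs_double(a - b) <= rel_tol * scale;
}
```

With these, `double_bits(1.0)` is `0x3FF0000000000000` and `double_exponent(0.1)` is -4, because 0.1 is stored as roughly 1.6 × 2^-4. The sum `0.1 + 0.2` is typically not equal to `0.3` under `==`, while `nearly_equal(0.1 + 0.2, 0.3, 1e-12)` holds.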
The Human Side
Licenses and Law
Licenses capture people’s expectations for the behavior, development, and use of programs. Ultimately software exists for human beings, so this topic is very important. It’s also good to understand the implications of the terms and conditions attached to pretty much every commercial program and web site.
- Software Licensing Handbook
- Tech Contracts Handbook
- Software Agreements Line by Line
- Understanding Open Source and Free Software Licensing
- Intellectual Property and Open Source
I suck at estimating software development time! The reassuring thing is most people do. Think what a difference it would make to be able to formulate accurate confidence intervals for development time.
Most of my code review experience comes from the GitHub pull request workflow. However, I’ve heard people complain that it is too primitive. I’m curious to see other approaches and try other programs for the job.