A Journey in Protecting Remote Data Backups
“Encryption works. Properly implemented strong crypto systems are one of the few things that you can rely on.” ~ Edward Snowden
“The solution to government surveillance is to encrypt everything.” ~ Eric Schmidt, Google
“If you've done nothing wrong you've got nothing to fear...”
(depends on what a country's government / dictatorship defines as 'right' and 'wrong')
“Still, I trust the mathematics [of cryptography].” ~ Bruce Schneier
“The only privacy left is inside of your head.” ~ Enemy of the State (tagline of the 1998 film)
Data backups need to be stored remotely from local drives – to protect against loss from fire, theft, and failure. Until the 2013 Snowden revelations, some people regarded an HTTPS or SSH connection as sufficient for the secure transfer of files, and left the rest to the storage provider.
In the early 2000s, a chance event showed me the insufficiency of HTTPS: after logging into my Yahoo Briefcase I had entered another user's Briefcase through a system error – revealing the vulnerability of my own files. Since then I've encrypted important files before uploading. DSCrypt on Windows (partial open source) provided convenient drag-and-drop encryption by a talented and crypto-aware programmer. However, as I used Linux PCs more, I needed a cross-platform encryption program.
My backups are important to me – information for my job, passwords – but of little importance to others, except for identity theft, passwords ...
Government agencies are unlikely to be interested in anything other than the passwords, but should they break the encryption, it doesn't mean an individual agent won't overstep his remit (as has occurred in the U.S.).
- I store compressed archives – many files into a single file – which makes encryption almost simple compared to encrypting email messages etc.
- I work on both Windows and Linux, so I need seamless cross-platform encryption and decryption on each operating system.
- GUI programs can be great, but I'm fine with the command-line, so that opens more choice.
- Strong encryption algorithms (ciphers) are essential.
- Effective password hashing and key-stretching are also essential to make brute-force cracking more difficult.
- An encryption program needs to be open source, so anyone can check the implementation (nothing like lots of eyeballs to find bugs and backdoors, paraphrasing Eric Raymond).
After much searching and evaluating, the most promising open source cross-platform file encryption programs I've used so far are 7-Zip and GNU Privacy Guard (GnuPG or GPG). In addition to operating on both Windows and Linux, these programs also run on ARM-based Linux (e.g. Raspberry Pi) and Mac OS X.
7-Zip
- Uses perhaps the finest file compression algorithm available.
- Many compression formats are supported.
- GUI file manager on Windows.
- Linux GUI Archive Manager integration on GNOME after installation.
- Command-line use and scripting available on all platforms.
- Supports ZipCrypto (legacy, insecure) and AES256 file encryption.
GnuPG (GPG)
- Included by default in most Linux distros.
- Command-line based, with some independent GUI interfaces available.
- A large download installer for Windows. Alternatively, gpg.exe can be extracted from the Windows Git application.
- Adheres closely to the OpenPGP standard (explains the odd default settings).
- Complex security program, with a range of uses – file encryption is just one part.
How good are 7-Zip and GPG at strong file encryption?
... it quickly goes down a rabbit hole.
Cryptographic experts have reviewed GPG. Bruce Schneier currently uses it, which, given his experience and knowledge, is a recommendation. But it's unthinkable that he would use the default options.
The GPG default file encryption options are barely adequate. Adhering to the OpenPGP standard, versions before 1.4.20 (which uses AES) encrypt files by default with the CAST5 cipher, SHA1 key hashing, and 65 thousand rounds of key-stretching. CAST5 is an older cipher (1996) with a 64-bit block size (a size which throws 'message integrity' warnings in GPG) and a 128-bit key size. CAST5 hasn't been analysed as much as DES or AES. It dates from a time before the AES competition, which produced three much stronger ciphers (Rijndael [AES], Twofish, and Serpent).
If you don't need to adhere to the OpenPGP standard, you can harden GPG's key generation and encryption considerably.
Instead of the default file encryption:
gpg -c MyFile.txt
(which, after prompting for a password twice, creates a CAST5-encrypted MyFile.txt.gpg [< v. 1.4.20])
use:
gpg -c --s2k-mode 3 --s2k-count 65011712 --s2k-digest-algo SHA512 --cipher-algo TWOFISH MyFile.txt
– which hashes the password using SHA512, key-stretches the hash 65 million times, and encrypts the file data using the Twofish cipher with a 256-bit key.
(AES256 or CAMELLIA256 can be substituted for TWOFISH: they are all strong ciphers with 256-bit keys. In theory, CAMELLIA256 and TWOFISH have the greater safety margin (higher number of encryption rounds), whilst AES has undergone the most cryptanalysis [i.e. vetting].)
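The odd-looking value 65011712 is not arbitrary. As far as I can tell from the OpenPGP specification (RFC 4880, iterated and salted S2K), the count is stored in the file as a single coded byte, and 65011712 is the largest value that byte can express, while the default of 65536 corresponds to the coded byte 96. The sketch below decodes that byte; the function names are my own and are not part of GPG. Note that the specification describes the count as the number of bytes fed to the hash rather than the number of separate hashing rounds.

```python
def decode_s2k_count(coded: int) -> int:
    """Expand the one-byte coded count of an OpenPGP iterated+salted S2K."""
    if not 0 <= coded <= 255:
        raise ValueError("coded count must fit in one byte")
    # Low 4 bits are a mantissa, high 4 bits an exponent (RFC 4880 formula).
    return (16 + (coded & 15)) << ((coded >> 4) + 6)


def encode_s2k_count(requested: int) -> int:
    """Return the smallest coded byte whose count is >= requested (255 at most)."""
    for coded in range(256):
        if decode_s2k_count(coded) >= requested:
            return coded
    return 255


print(decode_s2k_count(96))   # 65536
print(decode_s2k_count(255))  # 65011712
```

Any --s2k-count between the representable values has to be rounded to one of them, which is why only 256 distinct counts can ever appear in a file.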
Despite the complexity of the encryption parameters above, decryption of the resulting encrypted file is identical to the default mode:
Great! Strong usable encryption (apart from the messy and unmemorable command-line switches [which can be ameliorated with GPGit or GPG-Gui]), which, when used with a password of 20+ characters, should mean that at current CPU/GPU speeds the Sun should expand and die before a government agency can brute-force the encryption.
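To put a rough number on "20+ characters", the following sketch estimates the strength of a password in bits. It assumes every character is chosen uniformly at random from the 94 printable ASCII characters (excluding the space), which is my assumption and flatters any password a human invented; the function name is illustrative only.

```python
import math


def password_entropy_bits(length: int, alphabet_size: int = 94) -> float:
    """Bits of entropy for a password of uniformly random characters."""
    return length * math.log2(alphabet_size)


for length in (8, 12, 20):
    print(length, "characters:", round(password_entropy_bits(length), 1), "bits")
```

Under that assumption, a random 20-character password lands slightly above 128 bits, so the password stops being the weakest link against a cipher with a 128-bit or larger key.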
Except... GPG knows exactly how to decrypt the .gpg file it is given without any of the original custom parameters. Indeed, GPG reports the cipher used before decrypting. Therefore, hash, key-stretching, and cipher information must be attached to the encrypted file. And if you enter the wrong password when trying to decrypt the file, GPG responds with:
gpg: decryption failed: bad key
So there's a password checksum present: if a block of bytes in the file does not match another block on initial decryption, the password must be wrong [further details]. GPG also uses file compression. A valid compression format, once decrypted, has checksums too. All this means that there is significant decryption information embedded in a GPG-encrypted file.
When an attacker intercepts such a file, much of the custom encryption information can be quickly obtained [see File Identification]. A brute-force attack is also potentially given a cue: process the file's password checksum and compare it to the encrypted verification block.
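My understanding of the verification block, taken from the OpenPGP specification (RFC 4880) rather than from the GPG source, is this: before encryption, a prefix of one cipher block of random bytes is placed in front of the data, followed by a copy of the last two of those bytes. After decryption with a candidate key, the two pairs either match or they don't. The sketch below shows only that comparison on an already-decrypted prefix; it performs no encryption, and the function names are hypothetical.

```python
import secrets


def make_prefix(block_size: int) -> bytes:
    """Random block followed by a repeat of its last two bytes."""
    random_part = secrets.token_bytes(block_size)
    return random_part + random_part[-2:]


def quick_check(decrypted_prefix: bytes, block_size: int) -> bool:
    """True if the repeated two bytes match, i.e. the key is probably right."""
    if len(decrypted_prefix) < block_size + 2:
        raise ValueError("prefix is shorter than block_size + 2 bytes")
    last_two = decrypted_prefix[block_size - 2:block_size]
    repeat = decrypted_prefix[block_size:block_size + 2]
    return last_two == repeat


print(quick_check(make_prefix(16), 16))      # True
print(quick_check(bytes(range(18)), 16))     # False
```

With only two bytes compared, a wrong key should still pass by chance about once in 65,536 attempts, so the check rejects nearly every wrong password cheaply without proving that a passing one is right.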
GPG file protection must rely on strong user passwords and on the extreme computer processing, with its associated electrical power, that makes brute-forcing a keyspace of 128-bit+ keys infeasible (the keyspace doubles with each key bit). But – brute-forcing a GPG-encrypted file using a list of common passwords (pre-hashed from the embedded hints, or using a modified-source GPG executable that receives bulk plaintext) is feasible.
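The gap between the two attacks is easier to see with numbers. The sketch below is back-of-envelope arithmetic only: the guess rates and the wordlist size are figures I have assumed for illustration, not measurements of GPG.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust_keyspace(key_bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key of the given size."""
    return (2 ** key_bits) / guesses_per_second / SECONDS_PER_YEAR


def hours_to_try_wordlist(wordlist_size: int, guesses_per_second: float) -> float:
    """Hours needed to try every entry in a password list."""
    return wordlist_size / guesses_per_second / 3600


# Assumed: an attacker testing a trillion raw keys per second.
print(f"{years_to_exhaust_keyspace(128, 1e12):.2e} years for a 128-bit keyspace")

# Assumed: ten million common passwords, slowed to 100 guesses per second
# by key-stretching.
print(f"{hours_to_try_wordlist(10_000_000, 100):.1f} hours for the wordlist")
```

Even with a generous rate for the raw keyspace and a heavily throttled rate for the wordlist, the first figure is astronomically large while the second is a matter of a day or so, which is why the password matters more than the choice between 256-bit ciphers.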
7-Zip
- Usable GUI and Explorer shell integration on Windows.
- Integrates into the Archive Manager on Ubuntu-based Linux distros (e.g. file > right-click > compress > select '.7z' in dropdown).
- Always select the option 'Encrypt file names' / 'Encrypt the file list too' when creating password-protected archives.
- (Enter a strong password in the password fields on Windows, else a non-encrypted archive is produced.)
- 7z / 7zr is the command-line utility.
7-Zip is primarily created by one skilled programmer, Igor Pavlov. From the author's claim, 7-Zip uses a key hashed with SHA256 and key-stretched 512,000 times. The single cipher provided is 256-bit AES in CBC-mode. The random initialization vector (IV) is a “complex function with many iterations of SHA1 (current time, CPU speed, processID, threadID)”, the key-stretching is “512K iterations of (UTF-16 password, iteration_number)” (Igor Pavlov, SourceForge, 2011-02-22). There are no configuration options. However, if these methods are implemented correctly in all the code, the security is strong.
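To make the quoted description concrete, here is a minimal sketch of how I read it: one SHA256 computation fed 512K (2^19) repetitions of the UTF-16 password and an iteration number. The salt handling, the 8-byte little-endian counter, and the function name are my assumptions; I have not confirmed them against the 7-Zip source, so treat this as an illustration of the idea, not as 7-Zip's code.

```python
import hashlib


def derive_key_sketch(password: str, salt: bytes = b"", cycles_power: int = 19) -> bytes:
    """Stretch a password into a 32-byte key by hashing it 2**cycles_power times."""
    sha = hashlib.sha256()
    encoded = password.encode("utf-16-le")  # assumed byte order
    for iteration in range(1 << cycles_power):
        sha.update(salt)
        sha.update(encoded)
        sha.update(iteration.to_bytes(8, "little"))  # assumed counter layout
    return sha.digest()


# A reduced iteration count keeps this demonstration quick.
print(derive_key_sketch("correct horse", cycles_power=10).hex())
```

The point of the loop is cost: every password guess has to repeat all the iterations, which is what limits an attacker to a modest number of guesses per second.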
Looking at the 7z source code (version 9.20), the trouble is finding confirmation of the above in 4MB of code (despite fast greps):
static const UInt32 kNumKeyGenIterations = 1000; (WzAES.cpp)
16 byte salt for AES (7zAes.h)
(1 << (kNumHashBytes * 8) # 65536 (BwtSort.c)
// This is not very good random number generator. (RandGen.cpp)
// Please use it only for salt.
The two comments above are copied from the C++ file RandGen.cpp; crypto-secure random number generation is paramount for IV, salt, and complex-password generation – weaknesses will compromise the encryption system.
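For comparison, this is what drawing on a crypto-secure source looks like in Python, whose standard secrets module reads from the operating system's random generator. It is a sketch of the principle, not a description of what 7-Zip or GPG do internally; the function names and default lengths are my own choices.

```python
import secrets
import string


def new_salt(num_bytes: int = 16) -> bytes:
    """Random salt (or IV) drawn from the OS cryptographic generator."""
    return secrets.token_bytes(num_bytes)


def new_password(length: int = 24) -> str:
    """Random password of letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(new_salt().hex())
print(new_password())
```

The general-purpose random module would produce similar-looking output, but its generator is predictable once enough output has been observed, which is exactly the weakness the source comment warns about.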
Both 7-Zip encrypted archives and GPG-encrypted files clearly identify themselves. That helps the Linux 'file' information command, but will also assist government agencies seeking to brute-force an encrypted file or attack a known weakness in a cipher.
The following can be identified in hex editors:
7-Zip archives:
- '7z' is the first two characters of the file.
- The first eight bytes of the file are always the same.
- Linux reports the file as '7-zip archive (application/x-7z-compressed)'
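GPG-encrypted files:
- The first three bytes of the file are constant (0x8C 0x0D 0x04). Going by the OpenPGP specification (RFC 4880), these are a packet tag, a packet length, and a version number rather than a dedicated GPG file identifier.
- The fourth byte is the cipher used, e.g. 0x09 for AES256, 0x0A for Twofish.
- The fifth byte does not change: 0x03 is the S2K mode (iterated and salted), matching --s2k-mode 3.
- The sixth byte is the hash algorithm used e.g. 0x08 for SHA256, 0x0A for SHA512.
- Subsequent bytes are the 8-byte salt followed by a one-byte coded iteration count, not the IV.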
- Linux reports the file as 'PGP/MIME-encrypted message header (application/pgp-encrypted)'
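The observations above can be turned into a few lines of code, which shows how little effort identification takes. This sketch recognises the 7z signature and the header layout of a password-encrypted GPG file as the OpenPGP specification (RFC 4880) describes it, where the bytes after the hash identifier are an 8-byte salt and a coded count. It covers only that one layout; the function name and the short lookup tables are mine and are deliberately incomplete.

```python
SEVEN_ZIP_MAGIC = bytes([0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C])  # '7z' + 4 bytes

# Partial tables of OpenPGP algorithm identifiers.
CIPHERS = {3: "CAST5", 7: "AES128", 8: "AES192", 9: "AES256", 10: "TWOFISH"}
HASHES = {2: "SHA1", 8: "SHA256", 10: "SHA512"}


def identify(header: bytes) -> dict:
    """Classify a file from its first bytes."""
    if header.startswith(SEVEN_ZIP_MAGIC):
        return {"format": "7z"}
    # 0x8C: old-format packet, tag 3 (symmetric-key encrypted session key).
    # Byte 2 is the packet version (4); byte 4 is S2K mode 3 (iterated+salted).
    if len(header) >= 15 and header[0] == 0x8C and header[2] == 4 and header[4] == 3:
        return {
            "format": "openpgp-symmetric",
            "cipher": CIPHERS.get(header[3], f"unknown ({header[3]})"),
            "hash": HASHES.get(header[5], f"unknown ({header[5]})"),
            "salt": header[6:14].hex(),
            "coded_count": header[14],
        }
    return {"format": "unknown"}


sample = bytes([0x8C, 0x0D, 0x04, 0x0A, 0x03, 0x0A]) + bytes(8) + b"\xff"
print(identify(sample))
print(identify(SEVEN_ZIP_MAGIC + b"\x00\x04"))
```

In practice the header would come from something like open(path, "rb").read(32); the sample bytes here are constructed by hand to mimic a Twofish/SHA512 file.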
With 7-Zip, you use what it provides for creating and encrypting .7z files. In theory, its key generation and encryption should be strong, and indeed the Crark-7z program suggests it is (~100 password attempts per second on an i3 2.3GHz). Nevertheless, parts of the v. 9.20 source code are vague.
GPG gives you options to considerably strengthen the key generation and encryption. For usability, GPG embeds this information into the encrypted file. But this information also provides decryption insights to a powerful and patient attacker.
Strong encryption relies on the strength of the chosen cipher and its programming implementation, along with a crypto-secure pseudo-random number generator and a strong user password. The sheer complexity and size of the code-base of GPG and 7-Zip (particularly GPG, as it's a multi-role application) means that code can be overlooked, errors inadvertently introduced, and surreptitious tampering carried out. A single character changed in a program statement can be subtle, difficult to spot, and create a devastating security hole (e.g. == changed to =). GPG probably has more eyeballs reviewing its source code for vulnerabilities than 7-Zip. Nevertheless, it has been suggested that a large open source code-base with many developers and sloppy version control can provide just the environment for tampering.
So, do I trust GPG or 7-Zip with protecting my data? Mostly, I do.
For securely protecting remote backups, I would recommend overlaying encrypted 7-Zip archives with customised GPG encryption.
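One way to script that layering is sketched below: build the 7z command first, then wrap the resulting archive with the hardened GPG options shown earlier. The -p switch (prompt for a password) and -mhe=on (encrypt the file list too) are 7z options as I understand them; the function names and file names are hypothetical, and both programs are assumed to be on the PATH.

```python
import subprocess


def seven_zip_command(archive: str, paths: list[str]) -> list[str]:
    """7z: add files to an encrypted archive, hiding the file names as well."""
    return ["7z", "a", "-p", "-mhe=on", archive, *paths]


def gpg_command(path: str, cipher: str = "TWOFISH") -> list[str]:
    """gpg: password-encrypt a file with the hardened S2K settings."""
    return [
        "gpg", "-c",
        "--s2k-mode", "3",
        "--s2k-count", "65011712",
        "--s2k-digest-algo", "SHA512",
        "--cipher-algo", cipher,
        path,
    ]


def run_layered_backup(archive: str, paths: list[str]) -> None:
    """Create archive, then archive.gpg; each tool prompts for its own password."""
    subprocess.run(seven_zip_command(archive, paths), check=True)
    subprocess.run(gpg_command(archive), check=True)


print(" ".join(seven_zip_command("backup.7z", ["Documents"])))
print(" ".join(gpg_command("backup.7z")))
```

Calling run_layered_backup("backup.7z", ["Documents"]) would leave backup.7z.gpg as the file to upload; the intermediate backup.7z still needs to be deleted afterwards.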
(Please note that this subject is an ongoing interest. I'm always learning, and 7-Zip and GPG are always progressing and fixing issues.)
Page created November 2013, last updated 17 February 2017.
* * *
Other cross-platform encryption programs.
An initially promising old standalone program. But I found it full of bugs that aren't being fixed (see its SourceForge page), it hasn't been updated since 2007 on Windows, and Arch Linux recently dropped the package from its repository. Will be removed from PHP soon.
My lightweight standalone program created in response to dropping NCrypt. Uses the Blowfish cipher with a 448-bit key (the old cipher still has its strengths, particularly against brute-forcing). The program's key-stretching routine can be improved. Its advantages over some other command-line programs are simple options and no header byte identification or checksum verification. Conversely, this approach costs usability: enter the wrong password and the output file becomes a mangled binary blob. The only saving grace is that the program refuses to overwrite the encrypted source file, so the decryption process can be repeated with different passwords indefinitely. Files ideally need to be compressed before encrypting. An MLCrypt-encrypted file is reported as plain text or binary.
No key-stretching used, merely uses a SHA1 hash of the password. The program appears to be a bolt-together of the AES competition source code (which is an achievement in itself). Linux operation is fine. File creation is buggy and annoying on 32-bit Windows. Due to 32-bit pointer errors in the C source, NCrypt refuses to run on 64-bit Windows versions.
Possible, but in practice I found the command-line switches (e.g. salt parameters) and usage tricky. Too risky for my usage. Additionally, OpenSSL is not the most respected codebase.