
Cryptography in MLA

MLA uses cryptographic primitives mainly for the Encryption and Signature layers.

This document introduces the primitives used, the arguments for the choices made, and some security considerations.

Keys used for encryption and signature are generated and used separately.

Signature

As described in FORMAT.md, an archive can be signed. Implementations must ensure that users explicitly choose whether a signature is made and verified.

A PQ/T key consists of a pair of a post-quantum key and a traditional key. An archive is considered correctly signed for a PQ/T key if and only if it is correctly signed for its post-quantum part AND its traditional part.

Two signature methods are available and must be used together. The input of the signature methods is called m. The SHA-512 hash h of m may be computed in a first step.

For method MLAEd25519SigMethod, signature_data is the Ed25519ph (as described in RFC 8032 1) signature of m (not h, even though h can be used to compute the result, since Ed25519ph itself prehashes its input with SHA-512). The context given as parameter to Ed25519ph is the ASCII string MLAEd25519SigMethod. Signature verification and key generation are done as described in RFC 8032. Key storage is described in KEY_FORMAT.md.

For method MLAMLDSA87SigMethod, signature_data is the ML-DSA-87 signature (as described in FIPS 204 2, not HashML-DSA) of h (not m this time), with the ASCII string MLAMLDSA87SigMethod as context. Signature verification and key generation are done as described in FIPS 204. Key storage is described in KEY_FORMAT.md.
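To make the difference between the two methods concrete, the Python sketch below only builds the domain-separation inputs each algorithm derives from its context; no key and no actual signing are involved. It follows RFC 8032 (the dom2 prefix and SHA-512 prehash of Ed25519ph) and FIPS 204 (the formatted message of pure ML-DSA). The helper names are hypothetical and are not part of MLA's API.

```python
import hashlib

ED25519_CONTEXT = b"MLAEd25519SigMethod"
MLDSA87_CONTEXT = b"MLAMLDSA87SigMethod"


def ed25519ph_dom2(context: bytes) -> bytes:
    """dom2(phflag=1, context) prefix hashed by Ed25519ph (RFC 8032, section 5.1)."""
    if len(context) > 255:
        raise ValueError("context must be at most 255 bytes")
    return b"SigEd25519 no Ed25519 collisions" + bytes([1, len(context)]) + context


def ed25519ph_prehash(m: bytes) -> bytes:
    """Ed25519ph signs PH(m) = SHA-512(m): this is h, hence h can be reused."""
    return hashlib.sha512(m).digest()


def mldsa_formatted_message(h: bytes, context: bytes) -> bytes:
    """Pure ML-DSA message M' = 0x00 || len(ctx) || ctx || M (FIPS 204), with M = h."""
    if len(context) > 255:
        raise ValueError("context must be at most 255 bytes")
    return bytes([0, len(context)]) + context + h


m = b"signature method input"
h = hashlib.sha512(m).digest()
assert ed25519ph_prehash(m) == h
print(len(ed25519ph_dom2(ED25519_CONTEXT)))              # 53
print(len(mldsa_formatted_message(h, MLDSA87_CONTEXT)))  # 85
```

Both contexts are 19 bytes long, well under the 255-byte limit shared by the two standards.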

An archive can be signed with multiple signing keys. If a user provides a set of PQ/T keys for signature verification, implementations should give the user a way to know whether the archive is correctly signed for at least one key. Implementations may give users a way to know whether the archive is correctly signed for all keys. Users must explicitly know whether they are validating against at least one key or against all keys. Implementations may also give users a way to know which PQ/T keys correspond to valid signatures, or how many do.
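The verification policies above can be sketched in Python, assuming a hypothetical PqtVerification record holding the outcome of each signature method for one PQ/T key. A key only counts when both its post-quantum and its traditional signatures verify. Reporting an empty key set as never "verified by all keys" is a design choice of this sketch, not a documented MLA behavior.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PqtVerification:
    """Hypothetical outcome of checking one PQ/T key against an archive."""
    key_name: str
    post_quantum_ok: bool  # MLAMLDSA87SigMethod signature verified
    traditional_ok: bool   # MLAEd25519SigMethod signature verified

    @property
    def valid(self) -> bool:
        # Correctly signed for a PQ/T key only if BOTH parts verify
        return self.post_quantum_ok and self.traditional_ok


def signed_by_at_least_one(results: Sequence[PqtVerification]) -> bool:
    return any(r.valid for r in results)


def signed_by_all(results: Sequence[PqtVerification]) -> bool:
    # An empty set of keys is never reported as "all keys verified"
    return len(results) > 0 and all(r.valid for r in results)


def valid_key_names(results: Sequence[PqtVerification]) -> List[str]:
    return [r.key_name for r in results if r.valid]


results = [
    PqtVerification("alice", post_quantum_ok=True, traditional_ok=True),
    PqtVerification("bob", post_quantum_ok=True, traditional_ok=False),
]
print(signed_by_at_least_one(results), signed_by_all(results))  # True False
```

Keeping the two policies as separate, explicitly named functions mirrors the requirement that users know which of the two checks they are relying on.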

Encryption high-level overview

Objectives

The purpose of the Encryption layer is to provide confidentiality and data integrity of the inner layer.

These objectives are obtained using:

  • Authenticated encryption
  • Asymmetric cryptography, for several recipients

This layer does not provide signatures.

General design guidelines

  1. The size and the initial computation time used for encryption are not a big issue, if kept reasonable. Indeed, in the author's understanding, MLA archives are usually several MB long, and the computation time is primarily spent in compression/decompression and encryption/decryption of the data.

As a result, some optimizations have not been performed, which helps keep the design (hopefully) auditable and conservative.

  2. Only one encryption method and key type is available, to avoid confusion and potential corner-case errors

  3. When possible, use audited code and test vectors

Main bricks: Encryption

The data is encrypted using AES-256-GCM, an AEAD algorithm. To offer a seekable layer, data is encrypted in chunks of 128 KiB each, except for the last one. Each encrypted chunk is stored with its associated tag. Tags are checked during decryption before returning data to the upper layer.

To prevent truncation attacks, another chunk is added at the end, corresponding to the encryption of the ASCII string "FINALBLOCK" with "FINALAAD" as additional authenticated data. Any reader of the archive must check the correct decryption (including tag verification) of this last block.
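As an illustration of this chunk layout, the Python sketch below splits a plaintext into 128 KiB chunks, appends the anti-truncation chunk, and computes the size of the encrypted chunks (header and key commitment excluded). It assumes empty associated data for data chunks and no data chunk at all for an empty input; neither point is specified in this section.

```python
CHUNK_SIZE = 128 * 1024  # plaintext bytes per chunk (128 KiB)
TAG_SIZE = 16            # AES-GCM tag (128 bits)
FINAL_CONTENT = b"FINALBLOCK"
FINAL_AAD = b"FINALAAD"


def plaintext_chunks(data: bytes) -> list:
    """Return (plaintext, aad) pairs, the last one being the final block.

    Assumption: data chunks use empty AAD; empty input gives no data chunk.
    """
    chunks = [(data[i:i + CHUNK_SIZE], b"") for i in range(0, len(data), CHUNK_SIZE)]
    chunks.append((FINAL_CONTENT, FINAL_AAD))
    return chunks


def encrypted_chunks_size(data_len: int) -> int:
    """Size of the encrypted data chunks plus the final chunk, each with its tag."""
    n_data_chunks = -(-data_len // CHUNK_SIZE)  # ceiling division
    return data_len + n_data_chunks * TAG_SIZE + len(FINAL_CONTENT) + TAG_SIZE


chunks = plaintext_chunks(bytes(300 * 1024))
print([len(p) for p, _ in chunks])        # [131072, 131072, 45056, 10]
print(encrypted_chunks_size(300 * 1024))  # 307274
```

The overhead grows by one 16-byte tag per chunk, plus 26 bytes for the final chunk, which stays small next to 128 KiB chunks.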

The key, the base nonce and the nonce derivation for each data chunk are computed following HPKE (RFC 9180) 3. HPKE is parameterized with:

  • Mode: "Base" (no PSK, no sender authentication)
  • KDF: HKDF-SHA512
  • AEAD: AES-256-GCM
  • KEM: Multi-Recipient Hybrid KEM, a custom KEM described later in this document

Thus, only one cryptographic suite is available for now. If this setting ends up broken by cryptanalysis, we will move users onward to the next MLA version, using appropriate cryptography. Therefore, MLA lacks cryptographic agility, a property ANSSI encourages for post-quantum cryptography 4. Still, the use of HPKE improves this aspect of MLA 3.

Full details are available below.

Additionally, "key commitment" is included using a method described in 5 and detailed in 6.

Main bricks: Asymmetric encryption

Since format v2, the Encrypt layer uses post-quantum cryptography (PQC) through a hybrid approach, to avoid "Harvest now, decrypt later" attacks.

The algorithms used are:

  • X25519 for pre-quantum cryptography, using DHKEM (RFC 9180) 3
  • FIPS 203 7 (CRYSTALS-Kyber) MLKEM-1024 for post-quantum cryptography

The two resulting shared secrets are combined (see below) in a manner that keeps IND-CCA2 security as long as at least one of the two algorithms is IND-CCA2.

Sending to multiple recipients is achieved using a three-step process:

  1. For each recipient, a per-recipient Hybrid KEM is done, leading to a per-recipient shared secret
  2. Each per-recipient shared secret is derived through HPKE to obtain a key and a nonce
  3. These per-recipient keys and nonces are used to encrypt (and, on the recipient side, decrypt) a secret shared by all recipients

This final secret is the one later used as an input to the encryption layer. The whole process can be viewed as a KEM encapsulation for multiple recipients.

Encryption Details

The following sections describe the whole process for data encryption and seed derivation. They are meant to ease the understanding of the code and the re-implementation of the MLA format.

The interested reader could also look at the Rust implementation in this repository for more details. The implementation also includes tests (including some test vectors) and comments.

Asymmetric encryption - Per-recipient KEM

Notations
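  • $pk_{ecc}$, $sk_{ecc}$, $pk_{mlkem}$ and $sk_{mlkem}$: respectively the X25519 public key and secret key, and the MLKEM-1024 (FIPS 203 7) encapsulating key and decapsulating key
  • $Encap_{ecc}$ and $Decap_{ecc}$: key encapsulation methods with X25519, as defined in RFC 9180, section 4 3
  • $Encap_{mlkem}$ and $Decap_{mlkem}$: key encapsulation methods on MLKEM-1024, as defined in FIPS 203 7
  • $ss_{recipients}$: a 32-byte secret, produced by a cryptographic RNG. Informally, this is the secret shared among recipients, encapsulated separately for each recipient
  • $KeySchedule_{recipient}$: KeySchedule function from RFC 9180 3, instantiated with:
    • Mode: "Base"
    • KDF: HKDF-SHA-512
    • AEAD: AES-256-GCM
    • KEM: a custom KEM ID, numbered 0x1120
  • $Encrypt_{AES\text{-}256\text{-}GCM}(key, nonce, data)$: AES-256-GCM encryption, returning the encrypted data concatenated with the associated tag
  • $Decrypt_{AES\text{-}256\text{-}GCM}(key, nonce, data)$: AES-256-GCM decryption, returning the decrypted data after verifying the tag
  • $Serialize$ and $Deserialize$: respectively produce a byte string encoding the data given as argument, and recover the data from the byte string given as argument

Process

To encrypt to a target recipient $i$, knowing $pk^i_{ecc}$ and $pk^i_{mlkem}$:

  1. Compute shared secrets and ciphertexts for both KEMs:

$$(ss^i_{ecc}, ct^i_{ecc}) = Encap_{ecc}(pk^i_{ecc})$$
$$(ss^i_{mlkem}, ct^i_{mlkem}) = Encap_{mlkem}(pk^i_{mlkem})$$

  2. Combine the shared secrets and ciphertexts into the recipient's shared secret $ss^i_{recipient}$ (implemented in mla::crypto::hybrid::combine):

import hashlib
import hmac

def hkdf_sha512_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha512).digest()  # HKDF-Extract (RFC 5869)

def hkdf_sha512_expand(prk: bytes, info: bytes, length: int) -> bytes:
    okm, block = b"", b""  # HKDF-Expand (RFC 5869): T(k) = HMAC(prk, T(k-1) || info || k)
    for counter in range(1, -(-length // 64) + 1):
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
    return okm[:length]

def combine(ss1: bytes, ss2: bytes, ct1: bytes, ct2: bytes, length: int = 32) -> bytes:
    # Nested Dual-PRF Combiner; "salt=0" means no salt, i.e. 64 zero bytes (RFC 5869)
    uniformly_random_ss1 = hkdf_sha512_extract(salt=bytes(64), ikm=ss1)
    # HKDF-SHA512 salted with the uniformly random ss1, with info = ct1 || ct2;
    # the 32-byte default output length is an assumption, not stated in this document
    prk = hkdf_sha512_extract(salt=uniformly_random_ss1, ikm=ss2)
    return hkdf_sha512_expand(prk, info=ct1 + ct2, length=length)

  3. Wrap the secret shared among recipients, where $info_{recipient}$ is the MLA-specific info string used for this derivation:

$$(key^i, nonce^i) = KeySchedule_{recipient}(ss^i_{recipient}, info_{recipient})$$
$$ct^i_{wrap} = Encrypt_{AES\text{-}256\text{-}GCM}(key^i, nonce^i, ss_{recipients})$$
$$ct^i = Serialize(ct^i_{ecc}, ct^i_{mlkem}, ct^i_{wrap})$$

Informally, this process can be viewed as a per-recipient KEM taking the shared secret $ss_{recipients}$ and the recipient public key (made of the elliptic curve and the PQC public keys), and returning a ciphertext $ct^i$.

To obtain the shared secret $ss_{recipients}$ from $ct^i$ for a recipient knowing $sk_{ecc}$ and $sk_{mlkem}$:

  1. Compute the recipient's shared secret:

$$(ct^i_{ecc}, ct^i_{mlkem}, ct^i_{wrap}) = Deserialize(ct^i)$$
$$ss^i_{ecc} = Decap_{ecc}(ct^i_{ecc}, sk_{ecc})$$
$$ss^i_{mlkem} = Decap_{mlkem}(ct^i_{mlkem}, sk_{mlkem})$$

then obtain $ss^i_{recipient}$ by calling combine exactly as during encryption.

  2. Try to decrypt the secret shared among recipients:

$$(key^i, nonce^i) = KeySchedule_{recipient}(ss^i_{recipient}, info_{recipient})$$
$$ss_{recipients} = Decrypt_{AES\text{-}256\text{-}GCM}(key^i, nonce^i, ct^i_{wrap})$$

If the decryption succeeds, $ss_{recipients}$ is returned. Otherwise, an error is returned.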
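Since KeySchedule is only a few HKDF calls, its Base-mode instantiation can be sketched in Python following RFC 9180, section 5.1, with the KDF identifier 0x0003 (HKDF-SHA512) and AEAD identifier 0x0002 (AES-256-GCM) defined in that RFC. The shared secret and info values in the usage lines are placeholders, since MLA's actual info strings are not given in this document, and the exporter secret is omitted because only the key and base nonce are used here.

```python
import hashlib
import hmac

MODE_BASE = 0x00
KDF_ID_HKDF_SHA512 = 0x0003
AEAD_ID_AES_256_GCM = 0x0002
NK, NN = 32, 12  # AES-256-GCM key and nonce sizes in bytes


def i2osp(value: int, length: int) -> bytes:
    return value.to_bytes(length, "big")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha512).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def key_schedule_base(kem_id: int, shared_secret: bytes, info: bytes):
    """Return (key, base_nonce) as in RFC 9180 KeySchedule, mode Base (no PSK)."""
    suite_id = (b"HPKE" + i2osp(kem_id, 2)
                + i2osp(KDF_ID_HKDF_SHA512, 2) + i2osp(AEAD_ID_AES_256_GCM, 2))

    def labeled_extract(salt: bytes, label: bytes, ikm: bytes) -> bytes:
        return hkdf_extract(salt, b"HPKE-v1" + suite_id + label + ikm)

    def labeled_expand(prk: bytes, label: bytes, info_: bytes, length: int) -> bytes:
        labeled_info = i2osp(length, 2) + b"HPKE-v1" + suite_id + label + info_
        return hkdf_expand(prk, labeled_info, length)

    psk, psk_id = b"", b""  # Base mode: default empty PSK inputs
    psk_id_hash = labeled_extract(b"", b"psk_id_hash", psk_id)
    info_hash = labeled_extract(b"", b"info_hash", info)
    context = i2osp(MODE_BASE, 1) + psk_id_hash + info_hash
    secret = labeled_extract(shared_secret, b"secret", psk)
    key = labeled_expand(secret, b"key", context, NK)
    base_nonce = labeled_expand(secret, b"base_nonce", context, NN)
    return key, base_nonce


ss = bytes(32)  # placeholder shared secret
key_r, nonce_r = key_schedule_base(0x1120, ss, b"placeholder info")
key_h, nonce_h = key_schedule_base(0x1020, ss, b"placeholder info")
print(len(key_r), len(nonce_r), key_r != key_h)  # 32 12 True
```

Because the KEM ID is part of suite_id, the same shared secret yields unrelated keys under 0x1120 and 0x1020, which is the domain separation sought by using two distinct IDs.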

Arguments
  • Using HPKE (RFC 9180 3) for both elliptic curve encryption (DHKEM) and post-quantum encryption (MLKEM) offers several benefits 8:
    • Easier re-implementation of the format MLA, thanks to the availability of HPKE in cryptographic libraries
    • An existing formal analysis 9
    • Easier code and security auditing, thanks to the use of known bricks
    • Availability of test vectors in the RFC, making the implementation more reliable
    • If signature is added to MLA in a future version, it could also be integrated using HPKE
  • To the knowledge of the author, no HPKE algorithm has been standardized for post-quantum hybridization, hence the custom algorithm
  • FIPS 203 is used as, at the time of writing:
    • It is the only KEM algorithm standardized by NIST 10
    • It is in line with the French suggestions 4 for PQ cryptography
  • The MLKEM-1024 mode is used for stronger security, and to limit the consequences of future advances 11 12. This is also the choice of other industry standards 13 14
  • The shared secret from the two KEMs is produced using a "Nested Dual-PRF Combiner", proved in 15 (3.3):
    • The use of a concatenation scheme including ciphertexts keeps IND-CCA2 if one of the two underlying schemes is IND-CCA2, as proved in 16 and explained in 17
    • TLS 18 uses a similar scheme, and IKE 19 also uses a concatenation scheme
    • This kind of scheme follows ANSSI recommendations 4
    • HKDF can be considered a Dual-PRF if both inputs are uniformly random 20. In MLA, the combine method is called with a shared secret from ML-KEM and the output of the DHKEM key derivation; both are uniformly random
    • To avoid potential mistakes in the future, or a misuse of this method, the "Nested Dual-PRF Combiner" is used instead of the "Dual-PRF Combiner" (also from 15). Indeed, this combiner forces the "salt" part of HKDF to be uniformly random using an additional PRF call, ensuring the following HKDF is indeed a Dual-PRF

Asymmetric encryption - Multi-Recipient Hybrid KEM

Intuition

A KEM, such as the one described above, returns a fresh and distinct secret for each recipient.

To obtain a "meta-KEM" working for multiple recipients, the strategy is to use the per-recipient KEM to encrypt a common secret.

This whole process can then be viewed as a multi-recipient KEM, taking as input a list of public keys and returning a shared secret and a ciphertext made of the concatenation of the per-recipient ciphertexts.

To avoid revealing which per-recipient ciphertext corresponds to which recipient public key, the decapsulation process "brute-forces" each ciphertext with the given decapsulation key. If the decryption succeeds (including the tag check), the shared secret is returned.

Key commitment, which guards against a rather unlikely mismatch, is further ensured inside the Encrypt layer (see below).

Process

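The "Per-recipient KEM" process described above is noted:

  • $Encap_{recipient}(pk_{ecc}, pk_{mlkem}, ss)$, taking a couple of public keys ($pk_{ecc}$ and $pk_{mlkem}$) and a shared secret $ss$, and returning a recipient ciphertext $ct_{recipient}$
  • $Decap_{recipient}(sk_{ecc}, sk_{mlkem}, ct_{recipient})$, taking a couple of private keys ($sk_{ecc}$ and $sk_{mlkem}$) and a ciphertext $ct_{recipient}$, and returning either the shared secret $ss$ if the recipient is a legitimate recipient (if the AEAD decryption works), or an error otherwise

$Rand(n)$ is a cryptographically secure RNG producing an n-byte secret.

To encapsulate to a list of recipients $(pk^1_{ecc}, pk^1_{mlkem}), \dots, (pk^m_{ecc}, pk^m_{mlkem})$:

$$ss_{recipients} = Rand(32)$$
$$ct^i = Encap_{recipient}(pk^i_{ecc}, pk^i_{mlkem}, ss_{recipients}) \quad \text{for } i = 1, \dots, m$$
$$ct_{hybrid} = Serialize(ct^1, \dots, ct^m)$$

and return $(ss_{recipients}, ct_{hybrid})$.

To decapsulate from a ciphertext $ct_{hybrid}$, knowing a recipient private key ($sk_{ecc}$, $sk_{mlkem}$):

$$(ct^1, \dots, ct^m) = Deserialize(ct_{hybrid})$$

then, for $i = 1, \dots, m$, compute $Decap_{recipient}(sk_{ecc}, sk_{mlkem}, ct^i)$ and return the first shared secret obtained without error. If every $ct^i$ fails, return an error.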
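The encapsulation and the brute-force decapsulation loop can be sketched in Python. To stay self-contained, the per-recipient KEM is replaced by an insecure toy stand-in (the hypothetical toy_encap_recipient and toy_decap_recipient use a symmetric key with an HMAC-derived pad and tag, not a real KEM), and Serialize is assumed to be a plain concatenation of fixed-size ciphertexts. Only the multi-recipient logic mirrors the process above.

```python
import hashlib
import hmac
import secrets
from typing import List, Optional, Tuple

SECRET_LEN = 32                     # Rand(32)
TOY_CT_LEN = 16 + SECRET_LEN + 32   # nonce || masked secret || tag


def toy_encap_recipient(recipient_key: bytes, ss: bytes) -> bytes:
    """INSECURE toy stand-in for Encap_recipient (symmetric key, not a real KEM)."""
    nonce = secrets.token_bytes(16)
    pad = hmac.new(recipient_key, b"pad" + nonce, hashlib.sha256).digest()
    masked = bytes(a ^ b for a, b in zip(ss, pad))
    tag = hmac.new(recipient_key, b"tag" + nonce + masked, hashlib.sha256).digest()
    return nonce + masked + tag


def toy_decap_recipient(recipient_key: bytes, ct: bytes) -> Optional[bytes]:
    """Toy Decap_recipient: None plays the role of the AEAD tag error."""
    nonce, masked, tag = ct[:16], ct[16:16 + SECRET_LEN], ct[16 + SECRET_LEN:]
    expected = hmac.new(recipient_key, b"tag" + nonce + masked, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    pad = hmac.new(recipient_key, b"pad" + nonce, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(masked, pad))


def encap_hybrid(recipient_keys: List[bytes]) -> Tuple[bytes, bytes]:
    ss_recipients = secrets.token_bytes(SECRET_LEN)
    ct = b"".join(toy_encap_recipient(k, ss_recipients) for k in recipient_keys)
    return ss_recipients, ct


def decap_hybrid(recipient_key: bytes, ct: bytes) -> bytes:
    # Nothing marks which ciphertext is ours: try each one in turn
    for offset in range(0, len(ct), TOY_CT_LEN):
        ss = toy_decap_recipient(recipient_key, ct[offset:offset + TOY_CT_LEN])
        if ss is not None:
            return ss
    raise ValueError("not a recipient of this archive")


keys = [secrets.token_bytes(32) for _ in range(3)]
ss, ct = encap_hybrid(keys)
assert all(decap_hybrid(k, ct) == ss for k in keys)
print(len(ct) // TOY_CT_LEN)  # 3: the ciphertext length reveals the recipient count
```

As the last line shows, the number of recipients remains visible from the ciphertext length even though nothing links a given ciphertext to a given public key.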









Arguments
  • The shared secret is cryptographically generated, so it can later be used as a shared secret in HPKE encryption
  • This secret is unique per archive, as it is generated on archive creation. Even "converting" or "repairing" an archive in mlar CLI forces a fresh secret: as no edit feature is implemented (even though one would be feasible), a new archive, and thus a new secret, is always created. Hence, a new random symmetric key is used to encrypt its content when "converting" or "repairing" an archive.
  • Even if the AEAD decryption worked for a non-legitimate recipient, for instance following an intentional manipulation, the shared secret obtained will later be checked using key commitment before decrypting actual data (see below)
  • Optimizations would have been possible here, such as sharing a common ephemeral key for the DHKEM. But the size gain is not worth it given the MLKEM ciphertext size, and it would move the implementation away from the DHKEM in RFC 9180

Encryption

Notation

The "Multi-Recipient Hybrid KEM" process described above is noted:

  • $Encap_{hybrid}$, taking a list of public keys and returning a shared secret $ss_{hybrid}$ and a ciphertext $ct_{hybrid}$
  • $Decap_{hybrid}$, taking a couple of private keys ($sk_{ecc}$ and $sk_{mlkem}$) and a ciphertext $ct_{hybrid}$, and returning either the shared secret $ss_{hybrid}$ if the recipient is a legitimate recipient (if the AEAD decryption works), or an error otherwise

KeyCommitmentChain is defined as the 64-byte array -KEY COMMITMENT--KEY COMMITMENT--KEY COMMITMENT--KEY COMMITMENT-, that is the 16-byte ASCII string -KEY COMMITMENT- repeated four times.
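For illustration, the constant and the comparison performed once the sequence-0 chunk has been decrypted could look like the Python sketch below. The AES-GCM decryption and its tag check, which must happen first, are not modeled here, and check_key_commitment is a hypothetical helper.

```python
import hmac

KEY_COMMITMENT_CHAIN = b"-KEY COMMITMENT-" * 4  # 4 x 16 bytes


def check_key_commitment(decrypted_chunk0: bytes) -> None:
    """Raise unless the decrypted sequence-0 chunk is exactly KeyCommitmentChain."""
    # The constant is public, so a constant-time comparison is not essential here,
    # but compare_digest is a harmless default for security checks.
    if not hmac.compare_digest(decrypted_chunk0, KEY_COMMITMENT_CHAIN):
        raise ValueError("key commitment mismatch: refusing to decrypt data")


print(len(KEY_COMMITMENT_CHAIN))  # 64
```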

$KeySchedule_{hybrid}$: KeySchedule function from RFC 9180 3, instantiated with:

  • Mode: "Base"
  • KDF: HKDF-SHA-512
  • AEAD: AES-256-GCM
  • KEM: a custom KEM ID, numbered 0x1020

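$ComputeNonce(base\_nonce, seq)$: nonce computation function from RFC 9180 3, XORing the base nonce with the big-endian encoding of the sequence number $seq$.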

Process

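To encrypt data $D$ to a list of public keys $pk^1, \dots, pk^m$:

  1. Compute a shared secret and the corresponding ciphertext:

$$(ss_{hybrid}, ct_{hybrid}) = Encap_{hybrid}(pk^1, \dots, pk^m)$$

  2. Derive the key and base nonce using HPKE, where $info_{hybrid}$ is the MLA-specific info string for this derivation:

$$(key, base\_nonce) = KeySchedule_{hybrid}(ss_{hybrid}, info_{hybrid})$$

  3. Ensure key commitment:

$$keycommit = Encrypt_{AES\text{-}256\text{-}GCM}(key, ComputeNonce(base\_nonce, 0), KeyCommitmentChain)$$

  4. For each 128 KiB chunk $D_i$ of data:

$$C_i = Encrypt_{AES\text{-}256\text{-}GCM}(key, ComputeNonce(base\_nonce, i + 1), D_i)$$

Note: $i$ starts at 0. $i + 1$ is used because the sequence number 0 has already been used by the key commitment.

  5. When the layer is finalized, the last chunk of data (with a length lower than or equal to 128 KiB) is encrypted the same way

  6. Finally, a final chunk with sequence number $n + 1$ (where $n$ is the number of data chunks), the content "FINALBLOCK" and the additional authenticated data "FINALAAD" is appended:

$$C_{final} = Encrypt_{AES\text{-}256\text{-}GCM}(key, ComputeNonce(base\_nonce, n + 1), \text{"FINALBLOCK"}) \text{ with AAD } \text{"FINALAAD"}$$

The resulting layer is composed of:

  • header: $ct_{hybrid}$ and $keycommit$
  • data: $C_0, C_1, \dots, C_{n-1}, C_{final}$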

Special care must be taken not to reuse a sequence number in implementations, as this would be catastrophic given GCM properties. For $n$ chunks of data:

  • sequence 0: key commitment
  • sequence 1 to $n$: data
  • sequence $n + 1$: final chunk, with only the 10 bytes "FINALBLOCK" as content
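The sequence plan and the RFC 9180 nonce computation can be checked with a small Python sketch: ComputeNonce XORs the base nonce with the big-endian 12-byte encoding of the sequence number, so distinct sequence numbers always yield distinct nonces under the same base nonce. sequence_plan is a hypothetical helper, and the u64 bound mirrors the overflow check this document mentions for the MLA implementation.

```python
NN = 12  # AES-GCM nonce size in bytes


def compute_nonce(base_nonce: bytes, seq: int) -> bytes:
    """RFC 9180 ComputeNonce: base_nonce XOR I2OSP(seq, Nn)."""
    if len(base_nonce) != NN:
        raise ValueError("base_nonce must be 12 bytes")
    if not 0 <= seq < 2 ** 64:  # sequence kept in a u64, checked for overflow
        raise OverflowError("sequence number out of range")
    seq_bytes = seq.to_bytes(NN, "big")
    return bytes(a ^ b for a, b in zip(base_nonce, seq_bytes))


def sequence_plan(n_data_chunks: int) -> list:
    """List (sequence number, role) for every encrypted chunk."""
    plan = [(0, "key commitment")]
    plan += [(i + 1, f"data chunk {i}") for i in range(n_data_chunks)]
    plan.append((n_data_chunks + 1, "FINALBLOCK"))
    return plan


base_nonce = bytes.fromhex("00112233445566778899aabb")  # arbitrary example value
for seq, role in sequence_plan(2):
    print(seq, role, compute_nonce(base_nonce, seq).hex())
```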

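To decrypt the data at position $pos$:

  1. Once for the whole session, get the cryptographic materials:

$$ss_{hybrid} = Decap_{hybrid}(sk_{ecc}, sk_{mlkem}, ct_{hybrid})$$
$$(key, base\_nonce) = KeySchedule_{hybrid}(ss_{hybrid}, info_{hybrid})$$

  2. Once for the whole session, check that the key commitment decrypts to KeyCommitmentChain:

$$Decrypt_{AES\text{-}256\text{-}GCM}(key, ComputeNonce(base\_nonce, 0), keycommit) = KeyCommitmentChain$$

  3. Retrieve the encrypted chunk of data $C_i$ containing position $pos$:

$$i = pos \div 128\,\text{KiB}$$

Where $\div$ is the Euclidean division.

Then:

$$D_i = Decrypt_{AES\text{-}256\text{-}GCM}(key, ComputeNonce(base\_nonce, i + 1), C_i)$$

and the requested data starts at offset $pos \bmod 128\,\text{KiB}$ in $D_i$.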
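The position arithmetic of the last step can be sketched in Python. The byte offset of the encrypted chunk assumes chunks are stored back to back, each being 128 KiB of ciphertext followed by its 16-byte tag; this layout detail is an assumption of the sketch, and locate is a hypothetical helper.

```python
CHUNK_SIZE = 128 * 1024  # plaintext bytes per chunk
TAG_SIZE = 16            # AES-GCM tag
ENCRYPTED_CHUNK_SIZE = CHUNK_SIZE + TAG_SIZE


def locate(pos: int) -> tuple:
    """Map a plaintext position to (chunk index, sequence number,
    offset of the encrypted chunk in the data part, offset inside the chunk)."""
    if pos < 0:
        raise ValueError("position must be non-negative")
    i, offset_in_chunk = divmod(pos, CHUNK_SIZE)  # Euclidean division and remainder
    seq = i + 1                                   # sequence 0 is the key commitment
    encrypted_offset = i * ENCRYPTED_CHUNK_SIZE   # assumption: chunks stored back to back
    return i, seq, encrypted_offset, offset_in_chunk


print(locate(300_000))  # (2, 3, 262176, 37856)
```

Only the chunk containing the position has to be decrypted and authenticated, which is what makes the layer seekable.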

Arguments
  • Key commitment is always checked before returning clear-text data to the caller
  • AEAD tag of a chunk is always checked before returning the corresponding clear-text data to the caller
  • Arguments for HPKE use are very similar to the ones mentioned above. In particular, this is a standardized approach with existing analysis
  • As there are two kinds of custom KEM used ("Per-recipient KEM" and "Hybrid KEM"), two distinct KEM IDs are used. In addition, two distinct MLA-specific info strings are used to bind these derivations to MLA
  • As described in 5 and 21, AES in GCM mode does not ensure "key commitment". This property is added in the layer using the "padding fix" scheme from 5, with the recommended 512-bit size for 256-bit security
  • Key commitment is mainly used to ensure that two recipients will decrypt to the same plaintext if given the same ciphertext, i.e. an attacker modifying the header of an archive cannot provide two distinct plaintexts to two distinct recipients
  • AES-GCM is used as an industry standard AEAD
    • the base nonce, and therefore each nonce used, are unique per archive because they are generated from the archive-specific shared secret, limiting the nonce-reuse risk to standard acceptability 3
    • the number of chunks is bounded by the sequence number type used in the MLA implementation, a u64 checked for overflow. This is a widely accepted limit for AES-GCM, and it is also within the range allowed by 3
    • the tag size is 128-bits (standard one), avoiding attacks described in 22
    • 128 KiB is lower than the maximum plaintext length for a single message in AES-GCM (64 GiB) 22

Seed derivation

The asymmetric encryption in MLA, particularly the KEMs, provides deterministic APIs.

These APIs are usually fed with cryptographically generated data, except for the regression tests and the "seed derivation" feature in mlar CLI.

This feature is meant to provide a way for clients to:

  • Implement a derivation tree
  • Keep the root secret in a safe place, and still be able to recover the derived secrets

The derivation scheme is based on the same ideas as mla::crypto::hybrid::combine:

  1. A dual-PRF (HKDF-Extract with a uniform random salt 20) to extract entropy from the private key
  2. HKDF-Expand to derive along the given path component

From a private key (made of $sk_{ecc}$ and $sk_{mlkem}$), a secret is derived for each path component by applying these two steps.

To derive a key using a seed, a ChaCha20Rng is used. If a seed is provided, the ChaCha20Rng is seeded with the first 32 bytes of the derived secret. Otherwise, the seed comes from the OS cryptographic RNG.

A ChaCha20Rng is the ChaCha20 23 stream cipher, fed with the seed as key and 8 null bytes as nonce.
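To show what "ChaCha20 with the seed as key and 8 null bytes as nonce" produces, the Python sketch below implements the ChaCha20 block function in its original layout (64-bit block counter, 64-bit nonce) and outputs its keystream. It assumes the generator's output is the raw keystream in order; it is not the Rust crate itself, and derived_secret in the usage lines is a placeholder.

```python
import struct

CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"
MASK = 0xFFFFFFFF


def _rotl(v: int, c: int) -> int:
    return ((v << c) & MASK) | (v >> (32 - c))


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK
    s[d] = _rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK
    s[d] = _rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotl(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, counter: int, nonce: bytes = bytes(8)) -> bytes:
    """One 64-byte block: 32-byte key, 64-bit block counter, 64-bit nonce."""
    state = list(CONSTANTS) + list(struct.unpack("<8I", key))
    state += [counter & MASK, counter >> 32] + list(struct.unpack("<2I", nonce))
    w = state[:]
    for _ in range(10):  # 20 rounds as 10 double rounds (columns, then diagonals)
        for a, b, c, d in ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
                           (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)):
            _quarter_round(w, a, b, c, d)
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(w, state)))


def chacha20_keystream(seed: bytes, length: int) -> bytes:
    """First `length` keystream bytes for a 32-byte seed and an all-zero nonce."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += chacha20_block(seed, counter)
        counter += 1
    return bytes(out[:length])


# Known answer: all-zero key and nonce, counter 0 (RFC 8439, appendix A.1, test vector #1)
assert chacha20_keystream(bytes(32), 8).hex() == "76b8e0ada0f13d90"

derived_secret = bytes(range(64))  # placeholder for a secret produced by seed derivation
seed = derived_secret[:32]         # only the first 32 bytes are used as the seed
print(chacha20_keystream(seed, 16).hex())
```

With a zero nonce and a counter below 2^32, this layout produces the same keystream as the RFC 8439 variant, which is why the RFC test vector applies.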

The CSRNG is then provided to MLA deterministic APIs.

Implementation specificities

External dependencies

Some of the external cryptographic libraries have been reviewed:

  • RustCrypto AES-GCM, reviewed by NCC Group 24
  • Dalek cryptography library, reviewed by Quarkslab 25
  • rust-hpke library, reviewed in version 0.8 by CloudFlare 26

In addition to the review, rust-hpke is mainly based on RustCrypto, avoiding the need for additional newer dependencies.

The MLKEM implementation used is the one from RustCrypto, as MLA already depends on this project and the code quality and auditability are, in the author's understanding, rather good.

The generation uses OsRng from crate rand, which uses getrandom() from crate getrandom. getrandom provides implementations for many systems, listed in its documentation. On Linux it uses the getrandom() syscall and falls back on /dev/urandom. On Windows it uses the RtlGenRandom API (available since Windows XP/Windows Server 2003).

To be "better safe than sorry", a ChaCha20Rng is seeded from the bytes generated by OsRng in order to build a CSPRNG (Cryptographically Secure PseudoRandom Number Generator). This ChaCha20Rng provides the actual bytes used in key and nonce generation.

The authors decided to use elliptic curves over RSA, because:

  • No production-ready Rust-based RSA library had been found at the time of writing
  • A security-audited Rust library already exists for Curve25519
  • Curve25519 is widely used and respects several criteria
  • Common arguments, such as the ones from Trail of Bits

AES-GCM is used because it is one of the most commonly used AEAD algorithms, and using an AEAD avoids a whole class of attacks. In addition, it lets us rely on hardware acceleration (like AES-NI) to keep reasonable performance.

AES-GCM re-implementation

While the AES and GHash bricks come from RustCrypto, the GCM mode for AES-256 has been re-implemented in MLA.

Indeed, the repair mode must be able to only partially decrypt a data chunk, and decide whether the associated tag must be verified or not. This API is not provided by the RustCrypto project, for very understandable reasons.

To ensure the implementation follows the standard, it is tested against AES-256-GCM test vectors in MLA regression tests.

HPKE Key Schedule re-implementation

For several reasons described in the code, but mainly API availability, the possibility to add custom KEM IDs and the relatively few lines needed for re-implementation, the KeySchedule method has been re-implemented in MLA.

It still uses some bricks from rust-hpke, such as the KDF, LabeledExtract and LabeledExpand. It is tested against RFC 9180 3 test vectors in MLA regression tests.

MLKEM implementation without a review

Thanks to the hybrid approach, a flawed implementation of MLKEM would have limited consequences. It satisfies the ANSSI guidelines for the first phase of the transition to PQC (hybridization) 4. For this reason, MLA is eligible for a security visa evaluation.

For now, it is therefore accepted by the author (as a trade-off) to use an MLKEM implementation without an existing review, to bring a reasonable protection against "Harvest now, decrypt later" attacks as soon as possible.

If a reviewed implementation with acceptable dependencies emerges in the future, it can easily be swapped into MLA. Thus, by including its PQC implementation, MLA would also satisfy the requirements to get a security visa evaluation in the second and third phases of these guidelines.

Security considerations

Absence of signature

When an archive is not signed, or when its signature is not verified, an attacker knowing the recipient public key can always create a custom archive with arbitrary data.

For this reason, several known attacks are considered acceptable, such as:

  • The bit indicating whether the Encrypt layer is present is not integrity-protected

An attacker can remove it, making the reader treat the archive as if encryption was absent. The reader is responsible for checking the encryption bit if encryption was expected in the first place.

For instance, the mlar CLI will refuse to open an archive without the Encrypt bit unless --accept-unencrypted is provided on the command line.

  • An attacker with the ability to modify a real archive in transit can replace what the reader will be able to read with arbitrary data

To perform this attack, the attacker will have to either remove the Encrypt bit or replace the key material used for decryption with one she controls. The remaining encrypted data will then act as random values.

Still, the attacker could hope to gain enough privileges during the archive read, such as arbitrary code execution in the process. She could then try to reuse the provided key to decrypt, then act on the real data.

Limiting this attack is beyond the scope of this document. It mainly involves the security features of Rust, reviewed implementation, testing & fuzzing, zeroizing secrets when possible 27, etc.

  • An attacker can truncate an archive and hope for repair

This attack is based on a trade-off: should the SafeReader try to get as many bytes as possible, or should it return only data that have been authenticated?

The choice has been made to leave the decision to the user of the library 28.

Other properties

  • Plaintext length

The Encrypt layer does not hide the plaintext length.

Usually, this layer is used with the Compress layer. If an attacker knows the original file size, he might learn information about the original data entropy.
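The effect can be illustrated with Python's zlib, used here only as a stand-in for the Compress layer's actual algorithm: two inputs of the same original size compress to very different lengths depending on their entropy, and those lengths remain observable after encryption. pseudo_random_bytes is a hypothetical helper producing deterministic high-entropy data for the demo.

```python
import hashlib
import zlib


def pseudo_random_bytes(n: int) -> bytes:
    """Deterministic, high-entropy demo data: SHA-256 over a counter."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


original_size = 1_000_000
low_entropy = bytes(original_size)                 # a file full of zeros
high_entropy = pseudo_random_bytes(original_size)  # incompressible-looking data

for label, data in (("low entropy", low_entropy), ("high entropy", high_entropy)):
    # Encryption only adds tag overhead, so this compressed length stays observable
    print(label, original_size, "->", len(zlib.compress(data, 9)))
```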

  • Hidden recipient list

Only the owner of a recipient's private key can determine that they are a recipient of the archive. As a result, the recipient list remains private, although the total number of recipients is still visible.

This is an intentional privacy feature.


  1. RFC 8032 - Ed25519 Signature Algorithm

  2. NIST FIPS 204

  3. Hybrid Public Key Encryption, RFC 9180

  4. ANSSI Position Paper on Post-Quantum Cryptography

  5. How to Abuse and Fix Authenticated Encryption Without Key Commitment, Usenix'22

  6. MLA GitHub Issue #206

  7. FIPS 203 - MLKEM Standard

  8. MLA GitHub Issue #211

  9. A Formal Analysis of HPKE

  10. NIST PQC Standardization News

  11. Counting Correctly in MLKEM

  12. KyberSlash

  13. Signal PQXDH Specification

  14. Apple iMessage PQ3 Security Blog

  15. Dual-PRF Combiners

  16. Hybrid Key Exchange Security

  17. Hybrid Key Exchange Security (2024)

  18. TLS Hybrid Design Draft

  19. RFC 9370 - IKEv2 Post-quantum Hybrid Key Exchange

  20. On the Security of Dual-PRF Combiners

  21. Key Commitment in AEAD

  22. Authentication weaknesses in GCM

  23. RFC 8439 - ChaCha20 and Poly1305 for IETF Protocols

  24. NCC Group Review of RustCrypto AES-GCM

  25. Quarkslab Security Audit of Dalek Libraries

  26. Cloudflare on HPKE

  27. MLA GitHub Issue #46

  28. MLA GitHub Issue #167