Cryptography in MLA

MLA uses cryptographic primitives mainly for the Encryption and Signature layers.

This document introduces the primitives used, the rationale for choosing them, and some security considerations.

Keys used for encryption and signature are generated and used separately.

Signature

As described in FORMAT.md, an archive can be signed. Implementations must ensure that users explicitly choose whether a signature is produced and whether it is verified.

A PQ/T key consists of a pair of a post-quantum key and a traditional key. An archive is considered correctly signed for a PQ/T key if and only if it is correctly signed for its post-quantum part AND its traditional part.

Two signature methods are available and must be used together. The input to the signature methods is called m. The SHA-512 hash h of m may be computed in a first step.

For method MLAEd25519SigMethod, signature_data is the Ed25519ph (as described in RFC 8032 ¹) signature of m (not of h, although h can be reused to compute the result, since Ed25519ph prehashes m with SHA-512). The context given as parameter to Ed25519ph is the ASCII string MLAEd25519SigMethod. Signature verification and key generation are done as described in RFC 8032. Key storage is described in KEY_FORMAT.md.

For method MLAMLDSA87SigMethod, signature_data is the ML-DSA-87 signature (as described in FIPS 204 ², not HashML-DSA) of h (not m this time) with the ASCII string MLAMLDSA87SigMethod as context. Signature verification and key generation are done as described in FIPS 204. Key storage is described in KEY_FORMAT.md.
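As an illustration of which bytes each method signs, here is a minimal Python sketch. The function name `signature_inputs` is hypothetical and no signature is actually computed; it only prepares the (message, context) pairs described above.

```python
import hashlib

# Context strings named in the text above, encoded as ASCII.
ED25519PH_CONTEXT = b"MLAEd25519SigMethod"
MLDSA87_CONTEXT = b"MLAMLDSA87SigMethod"


def signature_inputs(m: bytes) -> dict:
    """Return the (message, context) pair that each method signs.

    Hypothetical helper for illustration only.
    """
    h = hashlib.sha512(m).digest()  # computed once, 64 bytes
    return {
        # Ed25519ph signs m. It prehashes m with SHA-512 internally
        # (RFC 8032), so an implementation may feed it h directly.
        "MLAEd25519SigMethod": (m, ED25519PH_CONTEXT),
        # Pure ML-DSA-87 (not HashML-DSA) signs the digest h itself.
        "MLAMLDSA87SigMethod": (h, MLDSA87_CONTEXT),
    }
```

The actual signing calls depend on the Ed25519ph and ML-DSA-87 libraries in use and are deliberately left out.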

An archive can be signed with multiple signing keys. If a user provides a set of PQ/T keys for signature verification, implementations should give a way for the user to know if the archive is correctly signed for at least one key. Implementations may give a way for users to know if the archive is correctly signed for all keys. Users must explicitly know whether they are validating against at least one key or against all keys. Implementations may also give a way for users to know which PQ/T keys correspond to valid signatures, or how many do.
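The verification rules above can be sketched as follows. All names (`PqtResult`, `signed_by_any`, `signed_by_all`, `count_valid`) are hypothetical and only model the decision logic, assuming the two per-method verification results are already known for each PQ/T key.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PqtResult:
    """Verification outcome for one PQ/T key (hypothetical model)."""
    traditional_ok: bool   # Ed25519ph part verified
    post_quantum_ok: bool  # ML-DSA-87 part verified

    @property
    def valid(self) -> bool:
        # A PQ/T key validates only if BOTH parts verify.
        return self.traditional_ok and self.post_quantum_ok


def signed_by_any(results: Iterable[PqtResult]) -> bool:
    """True if at least one provided key has a valid signature."""
    return any(r.valid for r in results)


def signed_by_all(results: Iterable[PqtResult]) -> bool:
    """True if every provided key has a valid signature.

    An empty key set is rejected here; this is a design choice of
    the sketch, not something the text above specifies.
    """
    results = list(results)
    return bool(results) and all(r.valid for r in results)


def count_valid(results: Iterable[PqtResult]) -> int:
    """Number of provided keys with a valid signature."""
    return sum(1 for r in results if r.valid)
```

Keeping the "any" and "all" policies as two distinct entry points makes it harder for a caller to confuse which guarantee was actually checked.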

Encryption high-level overview

Objectives

The purpose of the Encryption layer is to provide confidentiality and data integrity of the inner layer.

These objectives are obtained using:

Authenticated encryption
Asymmetric cryptography, for several recipients

This layer does not provide signature.

General design guidelines

The size and the initial computation time required for encryption are not a big issue, if kept reasonable. Indeed, to the author's understanding, MLA archives are usually several MB long, and the computation time is primarily spent in compression/decompression and encryption/decryption of the data.

As a result, some optimizations have not been performed, which helps keep the design conservative and, hopefully, auditable.

Only one encryption method and key type is available, to avoid confusion and potential corner-case errors
When possible, use audited code and test vectors

Main bricks: Encryption

The data is encrypted using AES-256-GCM, an AEAD algorithm. To offer a seekable layer, data is encrypted in chunks of 128 KiB each, except for the last one, which may be shorter. Each encrypted chunk is stored with its associated tag. Tags are checked during decryption before data is returned to the upper layer.

To prevent truncation attacks, another chunk is added at the end corresponding to the encryption of the ASCII string "FINALBLOCK" with "FINALAAD" as additional authenticated data. Any usage of the archive must check correct decryption (including tag verification) of this last block.

The key, the base nonce and the nonce derivation for each data chunk are computed following HPKE (RFC 9180) ³. HPKE is parameterized with:

Mode: "Base" (no PSK, no sender authentication)
KDF: HKDF-SHA512
AEAD: AES-256-GCM
KEM: Multi-Recipient Hybrid KEM, a custom KEM described later in this document

Thus, only one cryptographic suite is available for now. If this setting ends up broken by cryptanalysis, users will be moved to the next MLA version, using appropriate cryptography. MLA therefore lacks cryptographic agility, a property that ANSSI encourages for post-quantum cryptography ⁴. Still, the use of HPKE improves this aspect of MLA ³.

Full details are available below.

Additionally, "key commitment" is included using a method described in ⁵ and detailed in ⁶.

Main bricks: Asymmetric encryption

Since format v2, the Encrypt layer uses post-quantum cryptography (PQC) through a hybrid approach, to avoid "Harvest now, decrypt later" attacks.

The algorithms used are:

X25519 for pre-quantum cryptography, using DHKEM (RFC 9180) ³
FIPS 203⁷ (CRYSTALS Kyber) MLKEM-1024 for post-quantum cryptography

The two keys are mixed together (see below) in a manner keeping the IND-CCA2 properties of the two algorithms.

Sending to multiple recipients is achieved using a two-step process:

For each recipient, a per-recipient Hybrid KEM is done, leading to a per-recipient shared secret
Each per-recipient shared secret is derived through HPKE to obtain a key and a nonce
Each per-recipient key and nonce are used to encrypt (and, on the recipient side, decrypt) a secret shared by all recipients

This final secret is the one later used as an input to the encryption layer. The whole process can be viewed as a KEM encapsulation for multiple recipients.

Encryption Details

The following sections describe the whole process for data encryption and seed derivation. They are meant to ease the understanding of the code and MLA format re-implementation.

The interested reader could also look at the Rust implementation in this repository for more details. The implementation also includes tests (including some test vectors) and comments.

Asymmetric encryption - Per-recipient KEM

Notations

$pk_{ecc}^{i}$, $sk_{ecc}^{i}$, $pk_{mlkem}^{i}$ and $sk_{mlkem}^{i}$: respectively the X25519 public key and secret key, and the MLKEM-1024 (FIPS 203 ⁷) encapsulating key and decapsulating key
$DHKEM.Encapsulate$ and $DHKEM.Decapsulate$: key encapsulation methods with X25519, as defined in RFC 9180, section 4 ³
$MLKEM.Encapsulate$ and $MLKEM.Decapsulate$: key encapsulation methods on MLKEM-1024, as defined in FIPS 203 ⁷
$ss_{recipients}$: a 32-byte secret, produced by a cryptographic RNG. Informally, this is the secret shared among recipients, encapsulated separately for each recipient
$KeySchedule_{recipient}$: KeySchedule function from RFC 9180 ³, instantiated with:
- Mode: "Base"
- KDF: HKDF-SHA-512
- AEAD: AES-256-GCM
- KEM: a custom KEM ID, numbered 0x1120
$Encrypt_{AES256GCM}$: AES-256-GCM encryption, returning the encrypted data concatenated with the associated tag
$Decrypt_{AES256GCM}$: AES-256-GCM decryption, returning the decrypted data after verifying the tag
$Serialize$ and $Deserialize$: respectively produce a byte string encoding the data in argument, and produce the data from the byte string in argument

Process

To encrypt to a target recipient $i$, knowing $pk_{ecc}^{i}$ and $pk_{mlkem}^{i}$:

Compute shared secrets and ciphertexts for both KEMs:

$$
\begin{aligned}
(ss_{ecc}^{i}, ct_{ecc}^{i}) &= DHKEM.Encapsulate(pk_{ecc}^{i}) \\
(ss_{mlkem}^{i}, ct_{mlkem}^{i}) &= MLKEM.Encapsulate(pk_{mlkem}^{i})
\end{aligned}
$$

Combine the shared secrets (implemented in mla::crypto::hybrid::combine):

def combine(ss1, ss2, ct1, ct2):
    uniformly_random_ss1 = HKDF-SHA512-Extract(
        salt=0,
        ikm=ss1
    )
    key = HKDF(
        salt=uniformly_random_ss1,
        ikm=ss2,
        info=ct1 . ct2
    )
    return key

$ss_{recipient}^{i} = combine(ss_{ecc}^{i}, ss_{mlkem}^{i}, ct_{ecc}^{i}, ct_{mlkem}^{i})$

Wrap the recipients' shared secret:

$$
\begin{aligned}
(key^{i}, nonce^{i}) &= KeySchedule_{recipient}(shared\_secret = ss_{recipient}^{i}, info = \text{"MLA Recipient"}) \\
ct_{wrap}^{i} &= Encrypt_{AES256GCM}(key = key^{i}, nonce = nonce^{i}, data = ss_{recipients}) \\
ct_{recipient}^{i} &= Serialize(ct_{wrap}^{i}, ct_{ecc}^{i}, ct_{mlkem}^{i})
\end{aligned}
$$

Informally, this process can be viewed as a per-recipient KEM taking a shared secret $ss_{recipients}$, the recipient public key (made of the elliptic curve and the PQC public keys) and returning a ciphertext $ct_{recipient}^{i}$.
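The combine step of this process can be made concrete with the Python standard library. This is a minimal sketch, not the code of mla::crypto::hybrid::combine: the 32-byte output length is an assumption (the pseudocode does not state it), and `salt=0` is read as an all-zero salt of the hash length, which HMAC treats the same as an empty salt.

```python
import hashlib
import hmac

HASH_LEN = 64  # SHA-512 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-SHA512(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha512).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): T(i) = HMAC(PRK, T(i-1) | info | i)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine(ss1: bytes, ss2: bytes, ct1: bytes, ct2: bytes,
            length: int = 32) -> bytes:
    """Nested Dual-PRF combiner, following the pseudocode above.

    `length` is an assumption of this sketch.
    """
    # First PRF use: turn ss1 into a uniformly random HKDF salt.
    uniformly_random_ss1 = hkdf_extract(bytes(HASH_LEN), ss1)
    # HKDF(salt, ikm, info) = Expand(Extract(salt, ikm), info).
    prk = hkdf_extract(uniformly_random_ss1, ss2)
    return hkdf_expand(prk, ct1 + ct2, length)
```

Binding both ciphertexts through `info` means that changing either ciphertext yields a different combined secret, even when both shared secrets are unchanged.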

To obtain the shared secret from $ct_{recipient}^{i}$ for a recipient $i$ knowing $sk_{ecc}^{i}$ and $sk_{mlkem}^{i}$:

Compute the recipient's shared secret:

$$
\begin{aligned}
(ct_{wrap}^{i}, ct_{ecc}^{i}, ct_{mlkem}^{i}) &= Deserialize(ct_{recipient}^{i}) \\
ss_{ecc}^{i} &= DHKEM.Decapsulate(sk_{ecc}^{i}, ct_{ecc}^{i}) \\
ss_{mlkem}^{i} &= MLKEM.Decapsulate(sk_{mlkem}^{i}, ct_{mlkem}^{i}) \\
ss_{recipient}^{i} &= combine(ss_{ecc}^{i}, ss_{mlkem}^{i}, ct_{ecc}^{i}, ct_{mlkem}^{i})
\end{aligned}
$$

Try to decrypt the secret shared among recipients:

$$
\begin{aligned}
(key^{i}, nonce^{i}) &= KeySchedule_{recipient}(shared\_secret = ss_{recipient}^{i}, info = \text{"MLA Recipient"}) \\
ss_{recipients} &= Decrypt_{AES256GCM}(key = key^{i}, nonce = nonce^{i}, data = ct_{wrap}^{i})
\end{aligned}
$$

If the decryption succeeds, return $ss_{recipients}$. Otherwise, return an error.

Arguments

Using HPKE (RFC 9180 ³) for both elliptic curve encryption (DHKEM) and post-quantum encryption (MLKEM) offers several benefits⁸:
- Easier re-implementation of the format MLA, thanks to the availability of HPKE in cryptographic libraries
- An existing formal analysis ⁹
- Easier code and security auditing, thanks to the use of known bricks
- Availability of test vectors in the RFC, making the implementation more reliable
To the knowledge of the author, no HPKE algorithm has been standardized for quantum hybridization, hence the custom algorithm
FIPS 203 is used as, at the time of writing:
- It is the only KEM algorithm standardized by the NIST ¹⁰
- It is in line with the French suggestions ⁴ for PQ cryptography
The MLKEM-1024 mode is used for stronger security, and to limit the consequences of future advances ¹¹ ¹². This is also the choice of other industry standards ¹³ ¹⁴
The shared secret from the two KEMs is produced using a "Nested Dual-PRF Combiner", proved in ¹⁵ (3.3):
- The use of a concatenation scheme including ciphertexts keeps IND-CCA2 if one of the two underlying schemes is IND-CCA2, as proved in ¹⁶ and explained in ¹⁷
- TLS ¹⁸ uses a similar scheme, and IKE ¹⁹ also uses a concatenation scheme
- This kind of scheme follows ANSSI recommendations ⁴
- HKDF can be considered as a Dual-PRF if both inputs are uniformly random ²⁰. In MLA, the combine method is called with a shared secret from ML-KEM, and the resulting ECC key derivation; both are uniformly random
- To avoid potential mistakes in the future, or a misuse of this method, the "Nested Dual-PRF Combiner" is used instead of the "Dual-PRF Combiner" (also from ¹⁵). Indeed, this combiner forces the "salt" part of HKDF to be uniformly random using an additional PRF use, ensuring the following HKDF is indeed a Dual-PRF

Asymmetric encryption - Multi-Recipient Hybrid KEM

Intuition

A KEM, such as the one described above, returns a fresh and distinct secret for each recipient.

To obtain a "meta-KEM" that works for multiple recipients, the strategy is to use the per-recipient KEM to encrypt a common secret.

This whole process can then be viewed as a multi-recipient KEM, taking as input a list of public keys and returning a shared secret and a ciphertext made of the concatenation of each per-recipient ciphertext.

To avoid marking which per-recipient ciphertext corresponds to which recipient public key, the decapsulation process tries ("brute-forces") each ciphertext with a given decapsulation key. If the decryption works (with the associated tag), the shared secret is returned.

Key commitment, to avoid rather unlikely mismatch, is further ensured inside the Encrypt layer (see below).

Process

The "Per-recipient KEM" process described above is noted:

$PerRecipientKEM.Encapsulate$, taking a pair of public keys ($pk_{ecc}^{i}$ and $pk_{mlkem}^{i}$), a shared secret $ss_{recipients}$ and returning a recipient ciphertext $ct_{recipient}^{i}$
$PerRecipientKEM.Decapsulate$, taking a pair of private keys ($sk_{ecc}^{i}$ and $sk_{mlkem}^{i}$), a per-recipient ciphertext $ct_{recipient}^{i}$ and returning either a shared secret $ss_{recipients}$ if the recipient $i$ is a legitimate recipient (if the AEAD decryption works), or an error otherwise

$CSPRNG(n)$ is a cryptographically secure RNG producing an n-byte secret.

To encapsulate to a list of recipients $[(pk_{ecc}^{0}, pk_{mlkem}^{0}), ..., (pk_{ecc}^{n-1}, pk_{mlkem}^{n-1})]$:

$def\ HybridKEM.Encapsulate([(pk_{ecc}^{0}, pk_{mlkem}^{0}), ..., (pk_{ecc}^{n-1}, pk_{mlkem}^{n-1})]):$
$\quad ss_{recipients} = CSPRNG(32)$
$\quad ct_{recipient}^{0} = PerRecipientKEM.Encapsulate((pk_{ecc}^{0}, pk_{mlkem}^{0}), ss_{recipients})$
$\quad \dots$
$\quad ct_{recipient}^{n-1} = PerRecipientKEM.Encapsulate((pk_{ecc}^{n-1}, pk_{mlkem}^{n-1}), ss_{recipients})$
$\quad ct_{recipients} = Serialize(ct_{recipient}^{0}, \dots, ct_{recipient}^{n-1})$
$\quad return\ ss_{recipients}, ct_{recipients}$

To decapsulate from a ciphertext $ct_{recipients}$, knowing a recipient private key $(sk_{ecc}^{i}, sk_{mlkem}^{i})$:

$def\ HybridKEM.Decapsulate((sk_{ecc}^{i}, sk_{mlkem}^{i}), ct_{recipients}):$
$\quad foreach\ ct_{k}\ in\ Deserialize(ct_{recipients}):$
$\quad \quad try:$
$\quad \quad \quad ss_{recipients} = PerRecipientKEM.Decapsulate((sk_{ecc}^{i}, sk_{mlkem}^{i}), ct_{k})$
$\quad \quad success:$
$\quad \quad \quad return\ ss_{recipients}$
$\quad \quad error:$
$\quad \quad \quad continue$
$\quad throw\ KeyNotFoundError$
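The control flow of the two procedures above, one common secret wrapped once per recipient and then trial decapsulation, can be exercised with a runnable Python sketch. The per-recipient KEM is replaced here by a toy symmetric wrap built from SHA-512 and HMAC: it is NOT secure, is NOT the MLA scheme, and `_toy_wrap` / `_toy_unwrap` are hypothetical stand-ins whose only purpose is to fail with an authentication error for non-recipients.

```python
import hashlib
import hmac
import secrets


class KeyNotFoundError(Exception):
    """No per-recipient ciphertext could be opened with this key."""


def _toy_wrap(key: bytes, ss: bytes) -> bytes:
    # Toy stand-in for PerRecipientKEM.Encapsulate (illustration only).
    salt = secrets.token_bytes(16)
    mask = hashlib.sha512(key + salt).digest()[:len(ss)]
    ct = bytes(a ^ b for a, b in zip(ss, mask))
    tag = hmac.new(key, salt + ct, hashlib.sha512).digest()[:16]
    return salt + ct + tag


def _toy_unwrap(key: bytes, blob: bytes) -> bytes:
    # Toy stand-in for PerRecipientKEM.Decapsulate (illustration only).
    salt, ct, tag = blob[:16], blob[16:-16], blob[-16:]
    expected = hmac.new(key, salt + ct, hashlib.sha512).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    mask = hashlib.sha512(key + salt).digest()[:len(ct)]
    return bytes(a ^ b for a, b in zip(ct, mask))


def encapsulate(recipient_keys):
    """Wrap one fresh 32-byte secret for every recipient."""
    ss_recipients = secrets.token_bytes(32)
    return ss_recipients, [_toy_wrap(k, ss_recipients) for k in recipient_keys]


def decapsulate(key: bytes, ciphertexts) -> bytes:
    """Try each per-recipient ciphertext until one authenticates."""
    for ct in ciphertexts:
        try:
            return _toy_unwrap(key, ct)
        except ValueError:
            continue
    raise KeyNotFoundError("no ciphertext matches this key")
```

Because no recipient identifier is stored next to each ciphertext, decapsulation costs up to one trial per recipient, which is the price of not revealing which ciphertext belongs to whom.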

Arguments

The shared secret is cryptographically generated, so it can later be used as a shared secret in HPKE encryption
This secret is unique per archive, as it is generated on archive creation. Even converting an archive (convert) or cleaning a truncated archive (clean-truncate) with the mlar CLI forces a fresh secret: no edit feature is implemented, even though one would be feasible. Hence, a new random symmetric key is used to encrypt the content when "converting" or "recovering" an archive.
Even if the AEAD decryption worked for a non-legitimate recipient, for instance following an intentional manipulation, the shared secret obtained will later be checked using key commitment before decrypting actual data (see below)
Optimizations would have been possible here, such as sharing a common ephemeral key for the DHKEM. But the size gain is small compared to the ciphertext size of MLKEM, and it would move the implementation away from the DHKEM in RFC 9180

Encryption

Notation

The "Multi-Recipient Hybrid KEM" process described above is noted:

$MultiRecipientHybridKEM.Encapsulate$, taking a list of public keys $[(pk_{ecc}^{0}, pk_{mlkem}^{0}), ..., (pk_{ecc}^{n-1}, pk_{mlkem}^{n-1})]$ and returning a shared secret $ss_{recipients}$ and a ciphertext $ct_{recipients}$
$MultiRecipientHybridKEM.Decapsulate$, taking a pair of private keys ($sk_{ecc}^{i}$ and $sk_{mlkem}^{i}$), a ciphertext $ct_{recipients}$ and returning either a shared secret $ss_{recipients}$ if the recipient $i$ is a legitimate recipient (if the AEAD decryption works), or an error otherwise

KeyCommitmentChain is defined as the following 64-byte array: -KEY COMMITMENT--KEY COMMITMENT--KEY COMMITMENT--KEY COMMITMENT-.

$KeySchedule_{hybrid}$: KeySchedule function from RFC 9180 ³, instantiated with:

Mode: "Base"
KDF: HKDF-SHA-512
AEAD: AES-256-GCM
KEM: a custom KEM ID, numbered 0x1020
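To make this instantiation concrete, here is a minimal Python sketch of the RFC 9180 KeySchedule in "Base" mode, parameterized by the KEM ID. It follows the RFC definitions of LabeledExtract, LabeledExpand and suite_id; whether MLA's re-implementation matches it byte for byte is an assumption, as are the function names and the ASCII encoding of `info`. The exporter secret is omitted since this document does not use it.

```python
import hashlib
import hmac

KDF_ID_HKDF_SHA512 = 0x0003   # RFC 9180 KDF identifier
AEAD_ID_AES_256_GCM = 0x0002  # RFC 9180 AEAD identifier
NK, NN, NH = 32, 12, 64       # key, nonce and hash sizes in bytes
MODE_BASE = b"\x00"


def _extract(salt: bytes, ikm: bytes) -> bytes:
    # An absent salt defaults to NH zero bytes (RFC 5869).
    return hmac.new(salt or bytes(NH), ikm, hashlib.sha512).digest()


def _expand(prk: bytes, info: bytes, length: int) -> bytes:
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def suite_id(kem_id: int) -> bytes:
    return (b"HPKE" + kem_id.to_bytes(2, "big")
            + KDF_ID_HKDF_SHA512.to_bytes(2, "big")
            + AEAD_ID_AES_256_GCM.to_bytes(2, "big"))


def labeled_extract(suite: bytes, salt: bytes, label: bytes, ikm: bytes) -> bytes:
    return _extract(salt, b"HPKE-v1" + suite + label + ikm)


def labeled_expand(suite: bytes, prk: bytes, label: bytes,
                   info: bytes, length: int) -> bytes:
    labeled_info = length.to_bytes(2, "big") + b"HPKE-v1" + suite + label + info
    return _expand(prk, labeled_info, length)


def key_schedule_base(kem_id: int, shared_secret: bytes, info: bytes):
    """Return (key, base_nonce); Base mode has empty psk and psk_id."""
    suite = suite_id(kem_id)
    psk_id_hash = labeled_extract(suite, b"", b"psk_id_hash", b"")
    info_hash = labeled_extract(suite, b"", b"info_hash", info)
    context = MODE_BASE + psk_id_hash + info_hash
    secret = labeled_extract(suite, shared_secret, b"secret", b"")
    key = labeled_expand(suite, secret, b"key", context, NK)
    base_nonce = labeled_expand(suite, secret, b"base_nonce", context, NN)
    return key, base_nonce
```

Since the KEM ID is part of `suite_id`, the same shared secret gives unrelated outputs under 0x1020 and 0x1120, which is one reason to keep the two custom KEM IDs distinct.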

$ComputeNonce$ : function from RFC 9180 ³.
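For reference, RFC 9180 defines ComputeNonce as the XOR of the base nonce with the sequence number encoded as a big-endian integer of nonce length (12 bytes for AES-256-GCM). A minimal Python sketch, where the function name and the range check are choices of this example:

```python
NN = 12  # AES-256-GCM nonce size in bytes


def compute_nonce(base_nonce: bytes, seq: int) -> bytes:
    """ComputeNonce from RFC 9180: base_nonce XOR I2OSP(seq, NN)."""
    if len(base_nonce) != NN:
        raise ValueError("base nonce must be 12 bytes")
    if not 0 <= seq < 2 ** 64:
        # MLA uses a u64 sequence number checked for overflow.
        raise ValueError("sequence number out of range")
    seq_bytes = seq.to_bytes(NN, "big")
    return bytes(a ^ b for a, b in zip(base_nonce, seq_bytes))
```

Two distinct sequence numbers always give two distinct nonces under the same base nonce, so nonce uniqueness reduces to never reusing a sequence number.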

Process

To encrypt n-byte data to a list of public keys $[(pk_{ecc}^{0}, pk_{mlkem}^{0}), ..., (pk_{ecc}^{n-1}, pk_{mlkem}^{n-1})]$:

Compute a shared secret and the corresponding ciphertext:

$ss_{recipients}, ct_{recipients} = MultiRecipientHybridKEM.Encapsulate([(pk_{ecc}^{0}, pk_{mlkem}^{0}), ..., (pk_{ecc}^{n-1}, pk_{mlkem}^{n-1})])$

Derive the key and base nonce using HPKE

$(key, base\_nonce) = KeySchedule_{hybrid}(shared\_secret = ss_{recipients}, info = \text{"MLA Encrypt Layer"})$

Ensure key-commitment

$keycommit = Encrypt_{AES256GCM}(key = key, nonce = ComputeNonce(base\_nonce, 0), data = KeyCommitmentChain)$

For each 128 KiB $chunk_{j}$ of data:

$enc_{j} = Encrypt_{AES256GCM}(key = key, nonce = ComputeNonce(base\_nonce, j + 1), data = chunk_{j})$

Note: $j$ starts at 0. $j + 1$ is used because sequence number 0 has already been used by the key commitment.

When the layer is finalized, the last chunk of data (with a length lower than or equal to 128 KiB) is encrypted the same way.
Finally, a final chunk with sequence number $n + 1$ (where $n$ is the number of data chunks) and special content and additional authenticated data is appended:

$final\_chunk = Encrypt_{AES256GCM}(key = key, nonce = ComputeNonce(base\_nonce, n + 1), data = \text{"FINALBLOCK"}, aad = \text{"FINALAAD"})$

The resulting layer is composed of:

header: $ct_{recipients}$
data: $keycommit \ . \ enc_{0} \ . \ \dots \ . \ enc_{n-1} \ . \ final\_chunk$

Special care must be taken not to reuse a sequence number in implementations as this would be catastrophic given GCM properties. For $n$ chunks of data:

sequence 0: key commitment
sequence 1 to $n$ : data
sequence $n + 1$: $final\_chunk$ with only the 10 bytes "FINALBLOCK" as content
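The sequence number assignment can be checked with a short Python sketch that only plans the layout and performs no encryption. The function names are hypothetical, and the chunk size is taken as 128 KiB (131072 bytes).

```python
CHUNK_SIZE = 128 * 1024                       # 128 KiB
KEY_COMMITMENT_CHAIN = b"-KEY COMMITMENT-" * 4  # 64 bytes
FINAL_BLOCK_CONTENT = b"FINALBLOCK"           # 10 bytes


def split_chunks(data: bytes) -> list:
    """Split data into 128 KiB chunks; only the last may be shorter."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def sequence_plan(data: bytes) -> list:
    """Return (sequence number, role, plaintext length) for each AEAD call."""
    chunks = split_chunks(data)
    plan = [(0, "key commitment", len(KEY_COMMITMENT_CHAIN))]
    # Data chunk j uses sequence number j + 1, since 0 is already taken.
    plan += [(j + 1, f"data chunk {j}", len(c)) for j, c in enumerate(chunks)]
    plan.append((len(chunks) + 1, "final chunk", len(FINAL_BLOCK_CONTENT)))
    sequences = [seq for seq, _, _ in plan]
    if len(set(sequences)) != len(sequences):
        # A repeated sequence number would mean nonce reuse under GCM.
        raise RuntimeError("sequence number reused")
    return plan
```

For 300 KiB of data this gives sequences 0 to 4: the key commitment, three data chunks of 131072, 131072 and 45056 bytes, and the final chunk.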

To decrypt the data at position $pos$:

Once for the whole session, get the cryptographic materials

$$
\begin{aligned}
ss_{recipients} &= MultiRecipientHybridKEM.Decapsulate((sk_{ecc}^{i}, sk_{mlkem}^{i}), ct_{recipients}) \\
(key, base\_nonce) &= KeySchedule_{hybrid}(shared\_secret = ss_{recipients}, info = \text{"MLA Encrypt Layer"})
\end{aligned}
$$

Once for the whole session, check the key commitment

$commit = Decrypt_{AES256GCM}(key = key, nonce = ComputeNonce(base\_nonce, 0), data = keycommit)$

$assert\ commit = KeyCommitmentChain$

Retrieve the encrypted chunk of data

$$
\begin{aligned}
start &= pos - sizeof(keycommit) \\
j &= pos \div 128\,KiB
\end{aligned}
$$

Where $\div$ is the Euclidean division.

Then: $chunk_{j} = Decrypt_{AES256GCM}(key = key, nonce = ComputeNonce(base\_nonce, j + 1), data = enc_{j})$

Arguments

Key commitment is always checked before returning clear-text data to the caller
AEAD tag of a chunk is always checked before returning the corresponding clear-text data to the caller
Arguments for HPKE use are very similar to the ones mentioned above. In particular, this is a standardized approach with existing analysis
As there are two kinds of custom KEM used ("Per-recipient KEM" and "Hybrid KEM"), two distinct KEM IDs are used. In addition, two distinct MLA-specific info strings are used to bind these derivations to MLA
As described in ⁵ and ²¹, AES in GCM mode does not ensure "key commitment". This property is added in the layer using the "padding fix" scheme from ⁵ with the recommended 512-bit size for 256-bit security
Key commitment is mainly used to ensure that two recipients will decrypt to the same plaintext if given the same ciphertext, i.e. an attacker modifying the header of an archive cannot provide two distinct plaintexts to two distinct recipients
AES-GCM is used as an industry standard AEAD
- the base nonce, and therefore each nonce used, are unique per archive because they are generated from the archive-specific shared secret, limiting the nonce-reuse risk to standard acceptability ³
- no more than $2^{64}$ chunks will be produced, as the sequence number type used in the MLA implementation is a u64 checked for overflow. This is a widely accepted limit of AES-GCM, and the value is also within the range provided by ³
- the tag size is 128-bits (standard one), avoiding attacks described in ²²
- 128KiB is lower than the maximum plaintext length for a single message in AES-GCM (64 GiB)²²

Seed derivation

The asymmetric encryption in MLA, particularly the KEMs, provides deterministic APIs.

These APIs are usually fed with cryptographically generated data, except for the regression tests and the "seed derivation" feature in the mlar CLI.

This feature is meant to let clients:

Implement a derivation tree
Keep the root secret in a safe place, and be able to recompute the derived secrets

The derivation scheme is based on the same ideas as mla::crypto::hybrid::combine:

A dual-PRF (HKDF-Extract with a uniform random salt ²⁰) to extract entropy from the private key
HKDF-Expand to derive along the given path component

From a private key ($sk_{ecc}^{i}$ and $sk_{mlkem}^{i}$), the secret is derived from the path component $pc$ through:

$$
\begin{aligned}
ecc\_rnd &= HKDF.Extract_{SHA512}(salt = 0, ikm = sk_{ecc}^{i}) \\
seed &= HKDF_{SHA512}(salt = ecc\_rnd, ikm = sk_{mlkem}^{i}, info = \text{"PATH DERIVATION"} \ . \ pc)
\end{aligned}
$$

To derive a key using a seed, a ChaCha20Rng is used. If a seed is provided, the ChaCha20Rng is seeded with the first 32 bytes of $SHA512(seed)$. Otherwise, the seed comes from OS cryptographic RNG sources.

A ChaCha20Rng is the ChaCha20²³ stream cipher fed with a seed as key and 8 null bytes as nonce.

The CSPRNG is then provided to MLA deterministic APIs.
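The derivation up to the RNG seed can be sketched in Python with the standard library. This is an illustration under assumptions, not the mlar implementation: the HKDF output length (32 bytes here) and the byte encoding of the private keys and of the path component are not specified in this document, and the function names are hypothetical. The ChaCha20Rng itself is not reproduced.

```python
import hashlib
import hmac

HASH_LEN = 64  # SHA-512 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869) with SHA-512."""
    return hmac.new(salt, ikm, hashlib.sha512).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869) with SHA-512."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_seed(sk_ecc: bytes, sk_mlkem: bytes, path_component: bytes,
                length: int = 32) -> bytes:
    """Derive the seed for one path component (length is an assumption)."""
    # Dual-PRF step: extract a uniformly random salt from the ECC key.
    ecc_rnd = hkdf_extract(bytes(HASH_LEN), sk_ecc)
    prk = hkdf_extract(ecc_rnd, sk_mlkem)
    return hkdf_expand(prk, b"PATH DERIVATION" + path_component, length)


def rng_seed(seed: bytes) -> bytes:
    """First 32 bytes of SHA-512(seed), used to seed the ChaCha20Rng."""
    return hashlib.sha512(seed).digest()[:32]
```

Since every step is deterministic, the holder of the root private key can recompute any derived secret from its path component alone.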

Implementation specificities

External dependencies

MLA relies on several external cryptographic libraries for its primitives. Below is a summary of each dependency, its role, review status, and documentation coverage:

Symmetric Encryption

RustCrypto AES-GCM (aes-gcm crate)
- Role: AES-256-GCM authenticated encryption for archive data.
- Review ²⁴.
- Documentation: Well documented and widely used in the Rust ecosystem.
- Note: GCM mode is re-implemented in MLA for chunked/partial decryption.

Traditional Signature

ed25519-dalek
- Role: Ed25519ph signatures (RFC 8032) for the traditional part of hybrid signatures.
- Review ²⁵.
- Documentation: Well documented, widely used, and considered production-grade.

Post-Quantum Signature

ml-dsa (ml-dsa crate)
- Role: ML-DSA-87 signatures (FIPS 204) for the post-quantum part of hybrid signatures.
- Review: No formal third-party audit known at time of writing; code is open-source and used in research and reference implementations.
- Documentation: Documented in FIPS 204 and crate docs, but less mature than ed25519-dalek.
- Note: MLA uses ML-DSA-87, not HashML-DSA.

Hybrid Signature Logic

Custom implementation in MLA (hybrid_signature.rs, mlakey.rs)
- Role: Combines Ed25519ph and ML-DSA-87 for hybrid PQ/T signatures.
- Review: No third-party audit. Covered by internal tests and verified against official test vectors.

Asymmetric Encryption (Hybrid KEM)

RustCrypto MLKEM (ml-kem crate)
- Role: MLKEM-1024 (Kyber) for post-quantum key encapsulation.
- Review: No formal third-party audit; code quality and auditability considered good by the author.
- Documentation: FIPS 203 and crate docs; less mature than AES-GCM or Dalek.
curve25519-dalek / x25519-dalek
- Role: X25519 for pre-quantum key encapsulation (DHKEM).
- Review: curve25519-dalek is widely used and reviewed; respects SafeCurves criteria.
- Documentation: Well documented.
rust-hpke
- Role: HPKE primitives (KDF, LabeledExtract, LabeledExpand) and test vectors.
- Review ²⁶ (version 0.8).
- Documentation: Good, but custom KEM IDs and key schedule logic are re-implemented in MLA.

Random Number Generation

rand / getrandom / OsRng
- Role: Cryptographically secure random number generation for keys and nonces.
- Review: Well documented and widely used; uses OS sources (getrandom() syscall, /dev/urandom, RtlGenRandom).
- Documentation: getrandom crate docs.
- Note: On Linux, uses the getrandom() syscall and falls back on /dev/urandom. On Windows, uses the RtlGenRandom API.
rand_chacha / ChaCha20Rng
- Role: CSPRNG for deterministic seed derivation and key generation.
- Review: Well documented and widely used.
- Note: A ChaCha20Rng is seeded from bytes generated by OsRng to build a CSPRNG. This provides the actual bytes used in key and nonce generation.

Other

hkdf
- Role: HKDF-SHA512 for key derivation and hybrid KEM combiners.
- Review: Part of RustCrypto; well documented.
zeroize
- Role: Securely zeroes sensitive data in memory.
- Review: Well documented and widely used.

Design Choices and Rationale

Elliptic curve cryptography (Curve25519/X25519) is used over RSA due to the lack of ready-for-production Rust-based RSA libraries, the availability of audited Curve25519 libraries, and its widespread use and security properties (SafeCurves, Trail of Bits arguments).
AES-GCM is chosen for authenticated encryption due to its common use, hardware acceleration support (e.g., AES-NI), and avoidance of a class of attacks.
MLA’s hybrid approach (combining traditional and post-quantum primitives) limits the impact of potential flaws in less mature or unaudited libraries (e.g., MLKEM, ML-DSA).
MLA includes regression tests and test vectors for all critical cryptographic operations.
Custom cryptographic logic (e.g., hybrid KEM, hybrid signature, chunked AES-GCM) is described in this document and tested against standards.

Summary Table

Dependency	Purpose	Review Status	Documentation
aes-gcm	AES-256-GCM encryption	NCC Group	Excellent
ed25519-dalek	Ed25519ph signature	Quarkslab	Excellent
ml-dsa	ML-DSA-87 signature	None (open-source, FIPS)	Good
ml-kem	MLKEM-1024 (Kyber) KEM	None (open-source, FIPS)	Good
curve25519-dalek	X25519 KEM	Community, SafeCurves	Excellent
rust-hpke	HPKE primitives	Cloudflare	Good
rand/getrandom	OS CSPRNG	Community	Excellent
rand_chacha	ChaCha20Rng CSPRNG	Community	Excellent
hkdf	Key derivation	Community	Excellent
zeroize	Memory zeroization	Community	Excellent

AES-GCM re-implementation

While the AES and GHash bricks come from RustCrypto, the GCM mode for AES-256 has been re-implemented in MLA.

Indeed, the recover mode must be able to only partially decrypt a data chunk, and decide whether the associated tag must be verified or not. This API is not provided by the RustCrypto project, for very understandable reasons.

To ensure the implementation follows the standard, it is tested against AES-256-GCM test vectors in MLA regression tests.

HPKE Key Schedule re-implementation

For several reasons described in the code, but mainly due to the availability of API, the possibility to add custom KEM ID and the relative few lines needed for re-implementation, the $KeySchedule$ method has been re-implemented in MLA.

It still uses some bricks from rust-hpke, such as the KDF, $LabeledExtract$ and $LabeledExpand$. It is tested against RFC 9180 ³ test vectors in MLA regression tests.

MLKEM implementation without a review

Thanks to the hybrid approach, a flawed implementation of MLKEM would have limited consequences. It satisfies ANSSI guidelines for the transition first phase to PQC hybridization ⁴. For this reason, MLA is eligible for a security visa evaluation.

For now, it is therefore accepted by the author (as a trade-off) to use a MLKEM implementation without existing review to bring as soon as possible a reasonable protection against "Harvest now, decrypt later" attacks.

If a reviewed implementation with acceptable dependency emerges in the future, it can be easily swapped in MLA. Thus, MLA would also satisfy the requirements to get a security visa evaluation in the second and third phases of these guidelines by including its PQC implementation.

Security considerations

Plaintext length

The Encrypt layer does not hide the plaintext length.

Usually, this layer is used with the Compress layer. If an attacker knows the original file size, they might learn information about the original data entropy.

Hidden recipient list

Only the owner of a recipient's private key can determine that they are a recipient of the archive. The recipient list therefore remains private, but the total number of recipients is still visible.

This is an intentional privacy feature.
