Multi Layer Archive (MLA)
MLA is an archive file format with the following features:
- Support for traditional and post-quantum encryption hybridation with asymmetric keys (HPKE with AES256-GCM and a KEM based on an hybridation of X25519 and post-quantum ML-KEM 1024)
- Support for traditional and post-quantum signing hybridation
- Support for compression (based on
rust-brotli
) - Streamable archive creation:
- An archive can be built even over a data-diode
- An entry can be added through chunks of data, without initially knowing the final size
- Entry chunks can be interleaved (one can add the beginning of an entry, start a second one, and then continue adding the first entry's parts)
- Architecture agnostic and portable to some extent (written entirely in Rust)
- Archive reading is seekable, even if compressed or encrypted. An entry can be accessed in the middle of the archive without reading from the beginning
- If truncated, archives can be repaired to some extent. Two modes are available:
- Authenticated repair (default): only authenticated (as in AEAD, there is no signature verification) encrypted chunks of data are retrieved
- Unauthenticated repair: authenticated and unauthenticated encrypted chunks of data are retrieved. Use at your own risk.
- Arguably less prone to bugs, especially while parsing an untrusted archive (Rust safety)
Repository
This repository contains:
mla
: the Rust library implementing MLA reader and writermlar
: a Rust cli utility wrappingmla
for common actions (create, list, extract...)doc
: advanced documentation related to MLA (e.g. format specification)bindings
: bindings for other languagessamples
: test assetsmla-fuzz-afl
: a Rust utility to fuzzmla
.github
: Continuous Integration needs
Quick command-line usage
Here are some commands to use mlar
in order to work with archives in MLA format.
# Generate MLA key pairs.
mlar keygen sender
mlar keygen receiver
# Create an archive with some files.
mlar create -k sender.mlapriv -p receiver.mlapub -o my_archive.mla /etc/./os-release /etc/security/../issue ../file.txt
# List the content of the archive.
# Note that order may vary, root dir are stripped,
# paths are normalized and listing is encoded as described in
# `doc/ESCAPING.md`.
# This outputs:
# ``
# etc/issue
# etc/os%2drelease
# file.txt
# ``
mlar list -k receiver.mlapriv -p sender.mlapub -i my_archive.mla
# Extract the content of the archive into a new directory.
# In this example, this creates two files:
# extracted_content/etc/issue and extracted_content/etc/os-release
mlar extract -k receiver.mlapriv -p sender.mlapub -i my_archive.mla -o extracted_content
# Display the content of a file in the archive
mlar cat -k receiver.mlapriv -p sender.mlapub -i my_archive.mla etc/os-release
# Convert the archive to a long-term one, removing encryption and using the best
# and slower compression level
mlar convert -k receiver.mlapriv -p sender.mlapub -i my_archive.mla -o longterm.mla -l compress -q 11
# Create an archive with multiple recipients and without signature nor compression
mlar create -l encrypt -p archive.mlapub -p client1.mlapub -o my_archive.mla ...
# List an archive containing an entry with a name that cannot be interpreted as path.
# This outputs:
# `c%3a%2f%00%3b%e2%80%ae%0ac%0dd%1b%5b1%3b31ma%3cscript%3eevil%5c..%2f%d8%01%c2%85%e2%88%95`
# corresponding to an entry name containing: ASCII chars, c:, /, .., \,
# NUL, RTLO, newline, terminal escape sequence, carriage return,
# HTML, surrogate code unit, U+0085 weird newline, fake unicode slash.
# Please note that some of these characters may appear in valid a path.
mlar list -k test_mlakey_archive_v2_receiver.mlapriv -p test_mlakey_archive_v2_sender.mlapub -i archive_weird.mla --raw-escaped-names
# Get its content.
# This displays:
# `' OR 1=1`
mlar cat -k test_mlakey_archive_v2_receiver.mlapriv -p test_mlakey_archive_v2_sender.mlapub -i archive_weird.mla --raw-escaped-names c%3a%2f%00%3b%e2%80%ae%0ac%0dd%1b%5b1%3b31ma%3cscript%3eevil%5c..%2f%d8%01%c2%85%e2%88%95
# Create an archive of a web file and utf-8 string, without encryption and without signature
(curl https://raw.githubusercontent.com/ANSSI-FR/MLA/refs/heads/master/README.md; echo "SEP"; echo "All Hail MLA!") | mlar create -l -o my_archive.mla --separator "SEP" --filenames great_readme.md -
mlar
can be obtained:
- through Cargo:
cargo install mlar
- using the latest release for supported operating systems
- The released binaries are built with
opt-level = 3
, enabling great performance
- The released binaries are built with
For even higher performance, you can build a native-optimized binary (not portable), for example on a Linux machine:
RUSTFLAGS="-Ctarget-cpu=native" cargo build --release --target x86_64-unknown-linux-musl
Note: Native builds are optimized for your machine's CPU and are not portable. Use them only when running on the same machine you build on.
API usage
Using MLA with others languages
Bindings are available for:
Security
- Please keep in mind, it is generally not safe to extract in a place where at least one ancestor is writable by others (symbolic link attacks).
- Even if encrypted with an authenticated cipher, if you receive an unsigned archive , it may have been crafted by anyone having your public key and thus can contain arbitrary data.
- Read API documentation and mlar help before using their functionnalities. They sometimes provide important security warnings.
doc/ENTRY_NAME.md
is also of particular interest. - mlar escapes entry names on output to avoid security issues.
- Except for symbolic link attacks, mlar will not extract outside given output directory.
FAQ
Is MLAArchiveWriter
Send
?
By default, MLAArchiveWriter
is not Send
. If the inner writable type is also Send
, one can enable the feature send
for mla
in Cargo.toml
, such as:
[dependencies]
mla = { version = "...", default-features = false, features = ["send"]}
Was a new format really required?
As existing archive formats are numerous, probably not.
But to the best of the authors' knowledge, none of them support the aforementioned features (but, of course, are better suitable for others purposes).
For instance (from the understanding of the author):
tar
format needs to know the size of files before adding them, and is not seekablezip
format could lose information about files if the footer is removed7zip
format requires to rebuild the entire archive while adding files to it (not streamable). It is also quite complex, and so harder to audit / trust when unpacking unknown archivejournald
format is not streamable. Also, one writter / multiple reader is not needed here, thus releasing some constraintsjournald
format have- any archive +
age
: age does not, as of MLA 2.0 release, support post quantum encryption nor signatures. - Backup formats are generally written to avoid things such as duplication, hence their need to keep bigger structures in memory, or not being streamable
Tweaking these formats would likely have resulted in similar properties. The choice has been made to keep a better control over what the format is capable of, and to (try to) KISS.
Performance
One can evaluate the performance through embedded benchmark, based on Criterion.
Several scenarios are already embedded, such as:
- File addition, with different size and layer configurations
- File addition, varying the compression quality
- File reading, with different size and layer configurations
- Random file read, with different size and layer configurations
- Linear archive extraction, with different size and layer configurations
On an "Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz":
$ cd mla/
$ cargo bench
...
multiple_layers_multiple_block_size/Layers ENCRYPT | COMPRESS | DEFAULT/1048576
time: [28.091 ms 28.259 ms 28.434 ms]
thrpt: [35.170 MiB/s 35.388 MiB/s 35.598 MiB/s]
...
chunk_size_decompress_mutilfiles_random/Layers ENCRYPT | COMPRESS | DEFAULT/4194304
time: [126.46 ms 129.54 ms 133.42 ms]
thrpt: [29.980 MiB/s 30.878 MiB/s 31.630 MiB/s]
...
linear_vs_normal_extract/LINEAR / Layers DEBUG | EMPTY/2097152
time: [145.19 us 150.13 us 153.69 us]
thrpt: [12.708 GiB/s 13.010 GiB/s 13.453 GiB/s]
...
Criterion.rs documentation explains how to get back HTML reports, compare results, etc.
The AES-NI extension is enabled in the compilation toolchain for the supported architectures, leading to massive performance gain for the encryption layer, especially in reading operations. Because the crate aesni
statically enables it, it might lead to errors if the user's architecture does not support it. It could be disabled at the compilation time, or by commenting the associated section in .cargo/config
.
Contributing
We appreciate your help! To contribute, please read our contributing instructions.