In some specialized applications it can be desirable, or even necessary, to use an immutable, read-only filesystem. This can be because of the underlying storage medium, e.g. a live DVD which cannot be altered, or flash memory in an embedded system, which degrades with every write cycle. An immutable filesystem also provides a well-known, well-defined and stable state, one to which the system can always return with a reboot.
A read-only filesystem image technically works more like a random access archive that is packed once, by something like a Yocto build process. A filesystem explicitly designed for this use case can provide highly optimized data structures and compression techniques. Data and meta data can be pre-sorted, arranged in an optimal fashion, de-duplicated, and so on.
On embedded Linux systems, SquashFS is currently the dominant filesystem for such applications. It was merged into mainline Linux in 2009, superseding the earlier cramfs. Unlike cramfs, SquashFS supports all the bells and whistles of a typical filesystem, such as
The SquashFS on-disk format is designed with high data density in mind. File data is compressed in chunks of typically 128KiB in size (up to 1MiB), with smaller files sharing a block.[1] Meta data (e.g. inodes, directory contents) is also compressed, in 8KiB chunks. File data, inodes and directory listings are stored separately from each other. All data structures are byte aligned and SquashFS even uses variable sized inodes to squeeze the last remaining bytes out.
This results in a very compact filesystem image, but comes at the cost of read speed. The granularity of meta data blocks means that the reader has a certain amount of overhead data it was not interested in, but still has to load and unpack. Since the compressed blocks are variable in size, they practically never align with disk blocks, causing a certain overhead in I/O. The reader may not even know ahead of time how much data it needs to read, first having to read a header preceding the variable sized block.
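The two-step read described above can be sketched roughly as follows. This is a minimal illustration, not kernel code. It assumes the meta data block layout as we understand it from the SquashFS format documentation (a 16-bit little-endian header whose lower 15 bits hold the on-disk size and whose top bit marks a block that is stored uncompressed) and zlib as the compressor. The helper names `pack_metadata_block` and `read_metadata_block` are hypothetical.

```python
import io
import struct
import zlib

UNCOMPRESSED_FLAG = 0x8000  # assumed: top bit set means "stored as-is"
SIZE_MASK = 0x7FFF          # assumed: lower 15 bits hold the on-disk size


def pack_metadata_block(data: bytes) -> bytes:
    """Hypothetical writer: header plus payload for up to 8 KiB of meta data."""
    assert len(data) <= 8192
    packed = zlib.compress(data)
    if len(packed) < len(data):
        return struct.pack("<H", len(packed)) + packed
    # Compression did not help, so store the data as-is and set the flag.
    return struct.pack("<H", len(data) | UNCOMPRESSED_FLAG) + data


def read_metadata_block(stream) -> bytes:
    """Reader: the payload size is unknown until the header has been read."""
    header = stream.read(2)        # first read: the 2-byte header
    if len(header) != 2:
        raise EOFError("truncated header")
    (word,) = struct.unpack("<H", header)
    size = word & SIZE_MASK
    payload = stream.read(size)    # second read: the variable sized block
    if len(payload) != size:
        raise EOFError("truncated block")
    if word & UNCOMPRESSED_FLAG:
        return payload
    return zlib.decompress(payload)


if __name__ == "__main__":
    image = io.BytesIO(pack_metadata_block(b"inode " * 1000))
    print(len(read_metadata_block(image)))
```

The point of the sketch is the pair of dependent reads: the second read cannot be issued before the first one has completed and been decoded.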
In 2019 EROFS was introduced, first available in kernel 5.4. EROFS is also a compressed, read-only filesystem, but prioritizes read performance in its design.
While SquashFS packs data with a fixed input block size, resulting in variable sized compressed chunks, EROFS uses fixed output compression. The amount of input data is allowed to be flexible, but the compressed chunks that are generated are fixed in size. This allows them to always be ideally aligned with disk I/O boundaries. The size is always known to the reader and they can be addressed by index rather than by offset.
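The difference between the two strategies can be illustrated with a small sketch. It is not how either filesystem is implemented. It uses zlib as a stand-in compressor, the function names are hypothetical, and the `max_input` cap and the binary search over the input length are our own simplifications.

```python
import zlib


def fixed_input_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Fixed input: equal-sized input blocks, variable-sized output."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def fixed_output_blocks(data: bytes, out_size: int,
                        max_input: int = 65536) -> list[tuple[int, bytes]]:
    """Fixed output: variable-sized input windows, equal-sized output.

    Returns (input_length, padded_block) pairs. The binary search assumes
    that the compressed size grows with the input length, which holds well
    enough for a sketch but is not guaranteed in general.
    """
    blocks = []
    pos = 0
    while pos < len(data):
        lo, hi = 1, min(max_input, len(data) - pos)
        best = 1  # a single byte fits for any sensible out_size
        while lo <= hi:
            mid = (lo + hi) // 2
            if len(zlib.compress(data[pos:pos + mid])) <= out_size:
                best, lo = mid, mid + 1   # fits, try a larger window
            else:
                hi = mid - 1              # too big, shrink the window
        packed = zlib.compress(data[pos:pos + best])
        # Pad to exactly out_size so that block i starts at i * out_size.
        blocks.append((best, packed.ljust(out_size, b"\0")))
        pos += best
    return blocks


if __name__ == "__main__":
    sample = b"".join(b"entry %d: the quick brown fox\n" % i
                      for i in range(5000))
    print(sorted({len(b) for b in fixed_input_blocks(sample, 4096)}))
    print([n for n, _ in fixed_output_blocks(sample, 4096)])
```

With fixed output, the reader can locate block `i` at byte offset `i * out_size` without consulting any size header, at the price of padding and a more expensive packing step.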
In EROFS, data and meta data are also stored interleaved, rather than separated, allowing for better locality and faster sequential reads in some cases.
To find out how well EROFS does in comparison to SquashFS, we did a size comparison and read speed benchmark.
For this benchmark, a SquashFS image was extracted from the Debian 10.2 Xfce live DVD and converted to various other formats. The image from the DVD is roughly 2GiB in size, using XZ compression with the default 128KiB block size.
EROFS originally only supported lz4 compression, but recently also added experimental support for lzma. This currently requires using a custom build of the also experimental xz-utils 5.3.2 alpha release.
SquashFS also supports lz4 compression. To compare the two, the SquashFS image was converted using the tar-conversion programs sqfs2tar and tar2sqfs:
sqfs2tar debian.sqfs | tar2sqfs -c <compressor> test.sqfs
For reference, tarballs with the same compression were also generated this way, and tar2sqfs was additionally modified to generate a completely uncompressed SquashFS image.
For better size comparison and to see the effect of the fixed output compression, additional SquashFS images were generated with the block size reduced from 128KiB down to 4KiB, the default output block size of EROFS.
EROFS images were built by unpacking the SquashFS image and then repacking it:
fakeroot rdsquashfs -XTCO -u / -p temp/ test.sqfs
fakeroot mkfs.erofs -z<compressor> test.erofs temp/
Let's take a look at the absolute image sizes first.
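| Compressor | Tarball | SquashFS (128KiB) | SquashFS (4KiB) | EROFS |
|---|---|---|---|---|
| none | 6.5 GiB | 6.1 GiB | 6.1 GiB | 6.4 GiB |
| LZ4 | 3.3 GiB | 3.1 GiB | 3.6 GiB | 3.9 GiB |
| LZ4 HC | 2.8 GiB | 2.7 GiB | 3.4 GiB | 3.6 GiB |
| XZ | 1.7 GiB | 2.0 GiB | 2.6 GiB | 2.7 GiB |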
We see immediately that SquashFS with default settings produces smaller images than EROFS. By reducing the block size to 4KiB, SquashFS compression clearly suffers from the reduced window size and the results tend closer to what EROFS produces, but are still smaller.
For the uncompressed case, both SquashFS and EROFS are close, but a bit smaller than the reference tarball. They both use more compact, binary data structures than tar and support data deduplication.
Moreover, EROFS has a little more overhead than SquashFS, since it uses larger but easier to decode data structures. It also cares more about data alignment for fast read access.
For LZ4, SquashFS even beats the compressed tarball. Only in the XZ case does stream compression with a big sliding window finally win over the fixed 128KiB block size.
The XZ compressed EROFS is pretty close in size to SquashFS with 4KiB block size. This is a result of the fixed output compression in EROFS. It becomes more apparent if we look at each format relative to its own uncompressed baseline, giving us better insight into how effectively each format makes use of data compression.
As we switch to more effective compressors, the gap between SquashFS with 4KiB block size and EROFS shrinks. That’s because for EROFS, better compression means it can increase the input window size, improving the compression rate.
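The relative sizes discussed here can be recomputed from the absolute numbers in the size table. The following sketch does just that. The assignment of the table columns to formats is our reading of the surrounding text and should be treated as an assumption. Because the table values are rounded to 0.1 GiB, the resulting ratios are only approximate.

```python
# Absolute image sizes in GiB, copied from the size table above.
# Assumption: column order is tarball, SquashFS 128KiB, SquashFS 4KiB, EROFS.
SIZES_GIB = {
    "tarball":         {"none": 6.5, "LZ4": 3.3, "LZ4 HC": 2.8, "XZ": 1.7},
    "SquashFS 128KiB": {"none": 6.1, "LZ4": 3.1, "LZ4 HC": 2.7, "XZ": 2.0},
    "SquashFS 4KiB":   {"none": 6.1, "LZ4": 3.6, "LZ4 HC": 3.4, "XZ": 2.6},
    "EROFS":           {"none": 6.4, "LZ4": 3.9, "LZ4 HC": 3.6, "XZ": 2.7},
}


def relative_sizes(sizes):
    """Express every size as a fraction of that format's uncompressed size."""
    return {fmt: {comp: size / column["none"]
                  for comp, size in column.items()}
            for fmt, column in sizes.items()}


def gap(rel, compressor, a="EROFS", b="SquashFS 4KiB"):
    """Difference in relative size between two formats for one compressor."""
    return rel[a][compressor] - rel[b][compressor]


if __name__ == "__main__":
    rel = relative_sizes(SIZES_GIB)
    for fmt, column in rel.items():
        print(fmt, {comp: f"{ratio:.1%}" for comp, ratio in column.items()})
    for comp in ("LZ4", "LZ4 HC", "XZ"):
        print(comp, f"{gap(rel, comp):+.3f}")
```

The `gap` helper makes the shrinking distance between EROFS and SquashFS with 4KiB blocks directly visible as the compressor gets stronger.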
For a speed comparison benchmark, a small Linux 5.19 kernel and busybox based initrd were built for a Raspberry Pi 3. The filesystem to be tested was put on a separate ext4 partition on the boot micro SD card. From the busybox shell, the image was then loop mounted and part of its contents packed into a tarball, causing a tree traversal and access to a number of files:
mkdir -p /mnt/sdcard/ /mnt/testfs/
mount /dev/mmcblk0p2 /mnt/sdcard/
mount /mnt/sdcard/<testfs> /mnt/testfs/
time /bin/tar cf - /mnt/testfs/usr/bin > /dev/null
Repeating the last line 5 times (flushing the page cache in between) and taking the worst case time gives us the following table:
The I/O performance is a bit easier to visualize and compare if we divide the size of the resulting tarball (~125.0 MiB) by the time it took to produce it:
| Compressor | SquashFS (128KiB) | SquashFS (4KiB) | EROFS |
|---|---|---|---|
| none | 18.5 MiB/s | 16.5 MiB/s | 19.6 MiB/s |
| LZ4 | 26.3 MiB/s | 18.9 MiB/s | 30.9 MiB/s |
| LZ4 HC | 28.4 MiB/s | 20.1 MiB/s | 34.8 MiB/s |
| XZ | 5.0 MiB/s | 3.9 MiB/s | 7.1 MiB/s |
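The conversion between elapsed time and throughput used for this table is a simple division. A minimal sketch, with hypothetical function names and the approximate tarball size from the text:

```python
TARBALL_MIB = 125.0  # approximate size of the resulting tarball


def throughput_mib_s(size_mib: float, seconds: float) -> float:
    """Throughput in MiB/s for a run that took the given wall clock time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return size_mib / seconds


def elapsed_seconds(size_mib: float, mib_per_s: float) -> float:
    """Inverse direction: the run time implied by a throughput value."""
    if mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mib / mib_per_s


if __name__ == "__main__":
    print(elapsed_seconds(TARBALL_MIB, 5.0))
```

For example, a throughput of 5.0 MiB/s corresponds to roughly 25 seconds for the ~125 MiB tarball.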
Going from uncompressed to LZ4 speeds up the operation for all 3 cases. Increasing the data density with HC mode gives us another speedup. The time saved by reading less data from the SD card outweighs the time it takes to recreate it from the compressed form. We can tell that LZ4 decompression is I/O bound in our test setup.
In contrast, with LZMA compression, read speed takes a drastic hit compared to reading uncompressed data. Unpacking data on the CPU is the bottleneck for all 3 cases.
Comparing EROFS against SquashFS with a 4KiB block size first, EROFS benefits from the aligned reads and achieves better read performance regardless of the compressor in use. The gain of EROFS over SquashFS even grows with increasing compression ratio.
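The growing advantage can be checked by dividing the throughput numbers of the two formats. The values below are copied from the throughput table. Which column belongs to which format is our reading of the text and should be treated as an assumption.

```python
# Throughput in MiB/s, copied from the throughput table above.
# Assumption: the middle column is SquashFS 4KiB and the last one is EROFS.
SPEED_MIB_S = {
    "SquashFS 4KiB": {"none": 16.5, "LZ4": 18.9, "LZ4 HC": 20.1, "XZ": 3.9},
    "EROFS":         {"none": 19.6, "LZ4": 30.9, "LZ4 HC": 34.8, "XZ": 7.1},
}


def speedup(speeds, fast="EROFS", slow="SquashFS 4KiB"):
    """Per-compressor throughput ratio of one format over the other."""
    return {comp: speeds[fast][comp] / speeds[slow][comp]
            for comp in speeds[fast]}


if __name__ == "__main__":
    for comp, ratio in speedup(SPEED_MIB_S).items():
        print(f"{comp}: {ratio:.2f}x")
```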
This can again be explained with the fixed output compression and alignment. SquashFS almost never has filesystem blocks aligned with device blocks, so when reading a block, there is a certain overhead of data that is transferred but actually belongs to a different block. The I/O transfers from the SD card are used inefficiently. This is not the case for EROFS, and the fixed output compression means that improving data density actually improves the utilization of the device blocks that are read.
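The alignment overhead can be made concrete with a little arithmetic. The sketch below counts how many device blocks a read has to touch and what fraction of the transferred bytes is actually wanted. The 4096 byte device block size and the function names are assumptions made for illustration only.

```python
def blocks_touched(offset: int, length: int, dev_block: int = 4096) -> int:
    """Number of device blocks covered by the byte range [offset, offset+length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // dev_block
    last = (offset + length - 1) // dev_block
    return last - first + 1


def transfer_efficiency(offset: int, length: int, dev_block: int = 4096) -> float:
    """Fraction of the transferred bytes that belong to the requested range."""
    return length / (blocks_touched(offset, length, dev_block) * dev_block)


if __name__ == "__main__":
    # Unaligned, variable sized chunk: 3000 bytes starting at offset 7000.
    print(blocks_touched(7000, 3000), transfer_efficiency(7000, 3000))
    # Aligned, fixed size chunk: 4096 bytes starting at offset 8192.
    print(blocks_touched(8192, 4096), transfer_efficiency(8192, 4096))
```

In the unaligned example, two device blocks (8192 bytes) are transferred to obtain 3000 useful bytes, while the aligned fixed size chunk uses every byte of the one block that is read.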
Similarly, we can see SquashFS gaining some performance when increasing the block size. A lot more files smaller than 128KiB are now packed into shared “fragment blocks”. SquashFS internally maintains a cache of 3 fragment blocks and with a bigger block size we now get more cache hits. For each cache miss, a much bigger chunk needs to be read from the SD card and the overhead caused by unaligned reads is now comparatively small.
So, is EROFS a better replacement for SquashFS? It depends.
Data compression is a classic example of a memory vs time trade-off. For instance, even in our own benchmark we can see that LZMA gives us an almost 60% reduction in space, but this comes at a performance cost when unpacking the data again. LZ4 is extremely fast in comparison, but doesn’t get anywhere near that data density.
Likewise, while SquashFS is designed to squeeze every last byte out of a filesystem, the goal of EROFS is to be fast despite being compressed, sacrificing space for speed gains.
According to our benchmark, both filesystems do reasonably well in achieving their respective goals.
If storage space is a concern, it is reasonable to stick with SquashFS. If file I/O is a bottleneck in your application, EROFS is definitely worth considering.
Please note that at the time of writing this article, LZMA compression for EROFS is still considered experimental.
On the EROFS side of things, Zstd might be an interesting compressor to add. As other SquashFS benchmarks have shown, Zstd has considerable gains over LZ4 in terms of data density, but is still a lot faster to unpack than LZMA.[2]