Reserve large fixed-size flat address spaces as files #692

Open
wants to merge 38 commits into
base: master
from

Conversation

kaetemi
Contributor

@kaetemi kaetemi commented Jun 1, 2022

The EVE graphics controller can render ASTC textures straight from SPI flash. We plan to integrate LittleFS to enable easy asset replacement. One requirement is that the ASTC texture is located in one contiguous area. The current file allocation strategy does not permit this due to the block header in files larger than one block.

To achieve our requirements, this PR adds an additional file record type LFS_TYPE_FLATSTRUCT flagged by LFS_F_FLAT. This record simply stores the first block number and the file size. During block traversal, all the blocks starting from the first block until the last are simply visited (without accessing storage). This allows us to effectively reserve flat address spaces in flash within the filesystem.
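As a rough illustration of the record described above, here is a minimal sketch of a flat record and a traversal that visits its blocks without touching storage. The struct layout and the names `flat_record`, `flat_block_count`, `flat_traverse`, and `count_block` are assumptions made for this example and are not taken from the PR's code; only the callback shape mirrors the usual littlefs traversal callback.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t block_t;

/* Hypothetical in-memory view of a flat record: first block and size in bytes. */
struct flat_record {
    block_t head;
    uint32_t size;
};

/* Number of blocks covered by the reservation, rounding up without overflow. */
static uint32_t flat_block_count(const struct flat_record *rec, uint32_t block_size) {
    return rec->size / block_size + (rec->size % block_size != 0);
}

/* Visit every block from the first to the last; no storage access is needed
 * because the block numbers follow directly from the record. */
static int flat_traverse(const struct flat_record *rec, uint32_t block_size,
                         int (*cb)(void *data, block_t block), void *data) {
    uint32_t count = flat_block_count(rec, block_size);
    for (uint32_t i = 0; i < count; i++) {
        int err = cb(data, rec->head + i);
        if (err) {
            return err;
        }
    }
    return 0;
}

/* Example callback: counts the visited blocks. */
static int count_block(void *data, block_t block) {
    (void)block;
    (*(uint32_t *)data)++;
    return 0;
}
```

With a 4096-byte block size, a record of 8193 bytes starting at block 10 would visit blocks 10, 11, and 12.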

A function lfs_file_reserve is added to turn a file into a reserved flat address space. Calling this function will discard the file and attempt to allocate the requested size as a contiguous block. Specifying size 0 will discard the reservation and make the file usable as a regular file again. As we are accessing the reserved storage areas directly from the GPU, neither read nor write is implemented for reserved files. The file is treated as an opaque storage area. Call lfs_file_reserved to get the first block, or to check whether a file is a reserved flat area.

To allocate the contiguous blocks, two strategies are used depending on the size of the file.

The first allocation strategy is to simply allocate blocks, and reset the allocation starting point whenever an expected block is skipped over, until the expected number of blocks has been allocated sequentially. The strategy is used when the file size is smaller than the lookahead cache, and is aborted when more blocks than are contained in the cache have been attempted.
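The first strategy can be sketched against a toy allocator. This is only an illustration under stated assumptions: `toy_alloc` stands in for the real lookahead allocator, `max_attempts` stands in for the lookahead cache size, and none of these names come from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_BLOCK UINT32_MAX

/* Toy allocator (assumption): hands out the next free block at or after
 * *cursor, or NO_BLOCK when the end of the device is reached. */
static uint32_t toy_alloc(const bool *used, uint32_t block_count, uint32_t *cursor) {
    while (*cursor < block_count) {
        uint32_t b = (*cursor)++;
        if (!used[b]) {
            return b;
        }
    }
    return NO_BLOCK;
}

/* Allocate blocks one at a time. Whenever the allocator skips over the
 * expected next block, restart the run at the block it returned. Give up
 * after max_attempts allocations. Returns the first block of the run. */
static uint32_t reserve_sequential(const bool *used, uint32_t block_count,
                                   uint32_t needed, uint32_t max_attempts) {
    if (needed == 0) {
        return NO_BLOCK;
    }
    uint32_t cursor = 0;
    uint32_t start = NO_BLOCK;
    uint32_t run = 0;
    for (uint32_t attempts = 0; attempts < max_attempts; attempts++) {
        uint32_t b = toy_alloc(used, block_count, &cursor);
        if (b == NO_BLOCK) {
            return NO_BLOCK;
        }
        if (run > 0 && b == start + run) {
            run++;          /* block continues the current run */
        } else {
            start = b;      /* expected block was skipped: reset the run */
            run = 1;
        }
        if (run == needed) {
            return start;
        }
    }
    return NO_BLOCK;        /* too many attempts, fall back to another strategy */
}
```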

The second allocation strategy allocates one block in the normal fashion. It then traverses the filesystem and, whenever it finds a block that collides with the expected allocation space, advances the allocation starting point past the colliding block. The traversal is repeated until a complete pass finishes with no collisions, and is aborted if the attempted starting point has looped around the address space.
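A simplified sketch of the traversal-based strategy follows. It makes two assumptions for brevity: the filesystem traversal is modelled as a plain array of in-use block numbers in arbitrary order, and the search aborts when the candidate range runs off the end of the device instead of wrapping around. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_BLOCK UINT32_MAX

/* in_use models what a filesystem traversal would report, in any order.
 * first is the block obtained from the normal allocator. */
static uint32_t reserve_by_traversal(const uint32_t *in_use, size_t in_use_len,
                                     uint32_t block_count, uint32_t first,
                                     uint32_t needed) {
    if (needed == 0 || needed > block_count) {
        return NO_BLOCK;
    }
    uint32_t start = first;
    for (;;) {
        if (start > block_count - needed) {
            return NO_BLOCK;            /* candidate range no longer fits */
        }
        bool collided = false;
        for (size_t i = 0; i < in_use_len; i++) {
            uint32_t b = in_use[i];
            if (b >= start && b - start < needed) {
                start = b + 1;          /* move past the colliding block */
                collided = true;
                if (start > block_count - needed) {
                    return NO_BLOCK;
                }
            }
        }
        if (!collided) {
            return start;               /* a full pass found no collisions */
        }
    }
}
```

Each collision strictly increases the starting point, so the loop terminates either with a collision-free pass or with an abort.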

When calling the reserve function on an existing already reserved file, a new allocation is made. As the updated file record is only committed to the storage upon closing the file, and when no error is flagged, this mechanism seems suitable for atomically upgrading large assets on flash.

Usage

Writing a large file

lfs_file_open(&lfs, &file, filename, LFS_O_WRONLY | LFS_O_CREAT);
err = lfs_file_reserve(&lfs, &file, size, 0);

if (!err) {
  // Erase, write, and verify from GPU RAM directly to SPI flash
  lfs_block_t block;
  err = lfs_file_reserved(&lfs, &file, &block);
  if (!err) {
    EVE_CoCmd_flashUpdate(phost, EVE_FLASH_BLOB_SIZE + (block * lfs.cfg->block_size), gpuSrcAddr, size);
    if (!EVE_Cmd_waitFlush(phost)) {
      // Do not commit the new reservation in case of write error
      lfs_file_reserve(&lfs, &file, 0, LFS_R_ERRED);
    }
  }
}

if (err) {
  // Failed to reserve, optionally remove reservation and fallback to normal file behavior
  lfs_file_reserve(&lfs, &file, 0, 0);
  // ... lfs_file_write(...)
}

lfs_file_close(&lfs, &file);

Reading a large file

lfs_file_open(&lfs, &file, filename, LFS_O_RDONLY);

// Read from SPI flash directly to GPU RAM
lfs_block_t block;
if (lfs_file_reserved(&lfs, &file, &block) == LFS_ERR_OK) {
  EVE_CoCmd_flashRead(phost, gpuDstAddr, EVE_FLASH_BLOB_SIZE + (block * lfs.cfg->block_size), size);
  // (Or, alternatively, use the flash address to render ASTC textures directly from SPI flash)
} else {
  // Not a reserved flat file, optionally fallback to normal behavior and read as a file instead
  // ... lfs_file_read(...)
}

lfs_file_close(&lfs, &file);

Flags

  • LFS_R_ERRED: Flag a writing error so the reservation will not be committed on close
  • LFS_R_GOBBLE: Force to only use the lookahead allocation strategy, for testing purposes
  • LFS_R_FRONT: Use only the traversal allocation strategy, start allocating in front, for tooling flash images
  • LFS_R_TRUNCATE: Attempt to use truncate to shrink or extend the existing reservation, otherwise allocate normally
  • LFS_R_OVERWRITE: Allow overwriting the previous reservation, useful with LFS_R_FRONT and LFS_R_COPY to repack
  • LFS_R_COPY: Copy existing blocks from the previous area to the new one, no padding

@kaetemi
Contributor Author

kaetemi commented Jun 1, 2022

No idea why the automated test is failing. This shouldn't affect anything existing since all the behavior is behind the file type flags. Unless there's some corrupt data existing in the test that happens to hit those flags?

EDIT: Tried putting an LFS_ASSERT(false) in all the new code paths, and none of those got triggered in the existing tests.

====== results ======
tests/test_exhaustion.toml:353:failure: test_exhaustion#5#1 (LFS_BLOCK_CYCLES=5) failed
tests/test_exhaustion.toml:449:warn: max wear: 19 cycles
tests/test_exhaustion.toml:450:warn: avg wear: 0 cycles
tests/test_exhaustion.toml:451:warn: min wear: 0 cycles
tests/test_exhaustion.toml:462:warn: std dev^2: 8
tests/test_exhaustion.toml:463:assert: assert failed with 8, expected lt 8
tests/test_exhaustion.toml:463:assert: assert failed with 8, expected lt 8
    assert(dev2 < 8);

tests passed 873/874 (99.9%)
tests failed 1/874 (0.1%)

@dpgeorge
Contributor

dpgeorge commented Jun 2, 2022

This feature would also be interesting to MicroPython, where we need a filesystem which can hold contiguous (read-only) files that can be memory mapped. Eg they can contain resources, bytecode and machine code which can be executed in place (XIP). Essentially we would use littlefs (with this PR) as a container format to store a set of assets in the layout of a filesystem.

@kaetemi
Contributor Author

kaetemi commented Jun 2, 2022

====== results ======
tests/test_exhaustion.toml:353:failure: test_exhaustion#5#1 (LFS_BLOCK_CYCLES=5) failed
tests/test_exhaustion.toml:449:warn: max wear: 19 cycles
tests/test_exhaustion.toml:450:warn: avg wear: 0 cycles
tests/test_exhaustion.toml:451:warn: min wear: 0 cycles
tests/test_exhaustion.toml:462:warn: std dev^2: 8
tests/test_exhaustion.toml:463:assert: assert failed with 8, expected lt 8
tests/test_exhaustion.toml:463:assert: assert failed with 8, expected lt 8
    assert(dev2 < 8);

Huh. Somehow incrementing the LFS_DISK_VERSION is causing this test to fail on the powerpc and mips nand test targets.

Minor disk version does need to be upped, as the new file record is not understood by library versions preceding this PR.

@geky added the "needs minor version" label (new functionality only allowed in minor versions) Sep 7, 2022
@geky
Member

geky commented Sep 8, 2022

Hi @kaetemi, sorry about such a large delay in getting to this. This is really great work! Working around the lookahead buffer for large files is especially clever.

A bit of unfortunate timing, but I've been looking into some larger changes to the file data-structure for performance reasons. In particular I'm planning to move from the existing CTZ skip-list to a more traditional B-tree. Given that a B-tree would not require invasive pointers in the data-blocks, it seems a B-tree would allow you to create the lfs_file_reserve without needing a new data-structure. This would have the additional benefit of flat files appearing as normal files on littlefs drivers that don't support flat files.

Thoughts?


It's worth highlighting that this is a ~11% code size increase (15722 B -> 17462 B on thumb). This isn't a deal-breaker, just that we should either put this behind #ifdefs to let users not compile it in, or at least put an additional measurement into CI to keep track of the minimal littlefs code size.


Huh. Somehow incrementing the LFS_DISK_VERSION is causing this test to fail on the powerpc and mips nand test targets.

Ah, that test measures wear distribution and asserts that the standard deviation is within reasonable values. I've also noticed it's flaky on NAND, it writes a fixed amount of data and this currently isn't enough for inter-block wear-leveling to kick in with NAND's large block sizes. It's up to chance if the same block gets selected twice for metadata which will make it fail the check.

I wouldn't worry about it, we can disable the test for NAND for now and create an issue to make the test more robust (which probably just means throwing more data at it).

@kaetemi
Contributor Author

kaetemi commented Sep 15, 2022

Given that a B-tree would not require invasive pointers in the data-blocks, it seems a B-tree would allow you to create the lfs_file_reserve without needing a new data-structure. This would have the additional benefit of flat files appearing as normal files on littlefs drivers that don't support flat files.

That sounds interesting. Would it be better in that case to mark the relevant files with a flag to specify them as "flat", or should it just run through the B-tree to check if the file qualifies whenever it is needed? (Performance / complexity trade off when opening the files.)

It's worth highlighting that this is a ~11% code size increase (15722 B -> 17462 B on thumb). This isn't a deal-breaker, just that we should either put this behind #ifdefs to let users not compile it in, or at least put an additional measurement into CI to keep track of the minimal littlefs code size.

Yes, agreed. Better to put it behind a macro so it can be taken out when not needed.

@geky
Member

geky commented Oct 24, 2022

That sounds interesting. Would it be better in that case to mark the relevant files with a flag to specify them as "flat", or should it just run through the B-tree to check if the file qualifies whenever it is needed? (Performance / complexity trade off when opening the files.)

I think a flag would be worth it in this case, but it depends on your use case. If you're going to read the whole file anyways it may not cost anything extra to scan the whole B-tree.

In this hypothetical you could at least store if a file is contiguous in a custom attribute, without requiring in-filesystem changes.

That being said I've been considering including offset+length information in each pointer in the B-tree. This helps take advantage of unused-erased parts of the inner-nodes in the B-tree and as a side effect would let you encode an entire flat file as a single "pointer". Though the intention would be that LittleFS itself remains block-based for allocation, etc.

Sorry I don't have more information, this is all just in the untested design phase at the moment.

@kaetemi
Contributor Author

kaetemi commented Jan 18, 2023

I think a flag would be worth it in this case, but it depends on your use case. If you're going to read the whole file anyways it may not cost anything extra to scan the whole B-tree.

A flag would be ideal then. In our case the flat files are generally not read by the microcontroller, they're used directly by the graphics chip. And preferably bootup times of the application should be fast. :)

As long as we can create files that effectively just reserve a flat address space that can be accessed directly outside of the filesystem library, it suits our requirements.

@X-Ryl669

X-Ryl669 commented May 2, 2023

Being able to effectively mmap a file would be a great benefit, since it would save a ton of precious RAM (a lot of code copies the file from flash to RAM in order to parse the data in place; mmap would skip this copy). Is there any progress on this?

@BrianPugh
Contributor

Mirroring @X-Ryl669, I'd find this very useful in one of my projects (adding LittleFS to the Game & Watch homebrew scene). We memory-map ROM files, allowing us to run games that we couldn't ever fit in RAM. Currently ROMs are just compiled into the firmware, which is what allows this memory mapping. But it would be so much nicer if these were part of the filesystem (and could still be memory mapped).

@X-Ryl669

X-Ryl669 commented Dec 5, 2023

@BrianPugh You can have a look at FrogFS, which supports mmap'ing files via its faccess function. It's read-only, but it would fit perfectly for your use case (and you can update it OTA, so that's a very good plus here).

@BrianPugh
Contributor

thanks for the reference! I'd say FrogFS almost fits my usecases, but has the drawbacks:

  1. I then have to juggle 2 filesystems (I still need littlefs for writing game-saves).
  2. A super nice-to-have is the ability to incrementally add/delete files (i.e. a normal filesystem). Users commonly have 64MB flash chips which takes ~15 minutes to flash, making adding/removing games less of a casual process.

These are not absolute dealbreakers, but they definitely don't make it seem like the obvious choice. A nice part, if we were to use FrogFS, is that it separates the ROM data from the compilation process, allowing us to make pre-compiled firmware releases (and thus simplifying the user experience).

@X-Ryl669

X-Ryl669 commented Dec 11, 2023

This is not an issue page, but it might be interesting to note:

  1. ESP32 supports mmap'ing non-contiguous flash pages into a virtual contiguous address space (via spi_flash_mmap_pages). The only requirement is that each mapped page is a contiguous 64KB block.
  2. If littlefs had a way to set the "don't split a file below 64KB" mode, then it could be somehow hacked around (by adding a littefs_get_file_page_map function that would return an ordered list of the page indices the file is using) and you could then call the former function to map it into the virtual address space and use it in your game. I'm pretty sure it would be somehow possible, maybe by misusing a file attribute to prevent littlefs from splitting a 64KB hugepage in 16x1K blocks (a bit like this PR is doing).

With the recent addition of WAMR for ESP32, you can now run XIP's WASM binary from flash, which you could update dynamically without having to reboot, a bit like a true OS. Without mmap, the code needs to be loaded into RAM, and this is usually a lot slower (since you'll read from flash to SPI RAM and then back from SPI RAM to execute it).

@geky
Member

geky commented Jan 19, 2024

If littlefs had a way to set the "don't split a file below 64KB" mode

If you want bad ideas, you could set the block_size to 64KiB. Then 64KiB files would always be contiguous.


Sorry, I've been holding off on updating this issue until things are more actionable.

This is currently blocked by me behind B-tree support, as with B-trees we should be able to provide contiguous files without an additional file structure. And the fewer file structures we have floating around, the less code we need for backwards compatibility.

This work is mostly complete, but mostly complete is still a bit far from ready to stabilize.

Contiguous files are also messing with plans for future features in complicated and messy ways. I think the general issue is the filesystem wants control of data location for filesystem purposes and contiguous files limit that:

  1. Error-correction - Contiguous files and byte-level error correction are basically incompatible. Most schemes involve interspersing error-correcting codes into the data, which violates the contiguity of contiguous files.

  2. Data-level redundancy - Data redundancy via parity blocks is more doable but presents problems when trying to repair data blocks. Like any other write, repairs need to be copy-on-write, which means we need to copy blocks somewhere else. But if we copy a block in a contiguous file, this violates contiguity. We could copy-on-write the entire contiguous file, but this adds code cost and is more likely to fail due to space/fragmentation issues.

    My current thinking is that contiguous files would be unavailable for data redundancy. They could still work with data-level checksumming though, leaving it up to the user to handle errors.

  3. Static wear-leveling - Again, the filesystem is copy-on-write, and this conflicts with contiguous files. We would need extra code to handle the special case of contiguous files, and is more likely to fail due to space/fragmentation issues.

    There is also the issue of static wear-leveling invalidating external references to the contiguous data.

    So contiguous files would probably also be unavailable for any potential static wear-leveling.

    This effectively makes contiguous files more like a hole in the filesystem, rather than actual files.

All of these special cases invalidate earlier comments of mine. Contiguous files are going to need a flag, and must be recognized by littlefs to at least avoid corrupting their contiguousness.


For a while I was considering tying contiguous files to the future-planned optional on-disk block-map.

So instead of allowing filesystem-maintained contiguous files, allow users to reserve contiguous blocks that are tracked as reserved in the block-map.

Users could then store info about the allocated blocks in a file if they wanted the file to appear in the filesystem tree.

This would add more work to users, but it would make it clear contiguous files don't benefit from other filesystem features.

But then I realized this creates an unavoidable race condition. There is no way for users to avoid leaking storage if power is lost between the block allocation and the writing of the tracking file to disk.

So it seems contiguous files need to be implemented in the filesystem in order for the allocation + tracking to be atomic.


So the best plan to me still seems to be:

  1. Add an API to allow requesting a contiguous file of a given size, using @kaetemi's scheme for contiguous allocation.
  2. Mark contiguous files as contiguous on-disk so the filesystem knows to work around them
  3. Allow access to the physical address for things like XIP, mmap, etc.

But this all depends on B-tree support, so I don't think it's actionable yet.

There's some other questions, like should we even support write on contiguous files vs letting users call the bd_erase/bd_prog directly when they have the physical address, but figuring out the details can probably wait until after B-trees...


And it's not really that this feature has low-priority, I realize this is useful for users, just that other features have higher priority. Sorry about that.

@geky
Member

geky commented Jan 19, 2024

  1. Mark contiguous files as contiguous on-disk so the filesystem knows to work around them

Actually the on-disk information you need is more like "pinned". That a file's data blocks should not be moved.

That the file is contiguous is only needed at allocation time.

@geky
Member

geky commented Jan 19, 2024

With the recent addition of WAMR for ESP32, you can now run XIP's WASM binary from flash, which you could update dynamically without having to reboot, a bit like a true OS.

I think it is worth mentioning that if you only have one binary, you can always partition your disk to store that binary outside of the filesystem while still storing its metadata in the filesystem. The partitioning could be an actual partition table or hardcoded.

You could extend this to a sort of storage pool for contiguous files, though this path could end up hitting rough edges.

@dpgeorge
Contributor

like should we even support write on contiguous files

For our use case writing/updating contiguous files is not necessary. We would be happy with a purely read-only filesystem (after it's created), although the ability to append/add new files to this "read only" filesystem would be nice.

@geky
Member

geky commented Jan 22, 2024

For our use case writing/updating contiguous files is not necessary. We would be happy with a purely read-only filesystem (after it's created), although the ability to append/add new files to this "read only" filesystem would be nice.

I'm curious, if this is a read-only image, could you not allocate blocks for the contiguous files you need from the end before creating the filesystem, and store "fake" files in the filesystem that describe where they live on disk? This is pretty much equivalent to what you would get with contiguous files, since contiguous files also can't benefit from inlining.

(Maybe contiguous files could be inlined, but that would force their metadata block to become sort of frozen, which would get really messy with littlefs's design).

This would also allow your contiguous files to share blocks, which unfortunately I don't think will be possible with contiguous files in littlefs.

From what I understand the benefit of supporting contiguous files inside the filesystem is specifically to allow rewriting and dynamic allocation from littlefs's pool of blocks.
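To make the "fake file" idea concrete, here is one possible shape for such a descriptor: a small fixed-size record, stored as the contents of an ordinary file, that names the first block and byte length of data living outside the filesystem. The 8-byte little-endian layout and all names below are assumptions for illustration only; nothing in this thread specifies a format.

```c
#include <assert.h>
#include <stdint.h>

#define EXTENT_DESC_SIZE 8

/* Hypothetical descriptor for externally stored contiguous data. */
struct extent_desc {
    uint32_t first_block;   /* first block of the external region */
    uint32_t size;          /* length of the data in bytes */
};

static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize to a fixed little-endian layout so the image is portable. */
static void extent_encode(const struct extent_desc *d, uint8_t out[EXTENT_DESC_SIZE]) {
    put_le32(out, d->first_block);
    put_le32(out + 4, d->size);
}

static void extent_decode(const uint8_t in[EXTENT_DESC_SIZE], struct extent_desc *d) {
    d->first_block = get_le32(in);
    d->size = get_le32(in + 4);
}
```

The encoded buffer would then be written and read back with the regular file API, while the data itself is accessed at the physical address derived from the descriptor.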

@BrianPugh
Contributor

if this is a read-only image, could you not allocate blocks for the contiguous files you need from the end before creating the filesystem, and store "fake" files in the filesystem that describe where they live on disk?

I've actually proposed a similar thing for my project, but was waiting a little bit to see how this PR would evolve. I'd be willing to volunteer a small C/python project that does this. There are a few complications, but nothing unsolvable.

The primary issue with this, is that we've essentially partitioned the storage, which could be an inefficient use. E.g. How do I balance how much of my flash I should allocate to LittleFS vs this secondary contiguous filesystem? This is especially imbalanced for my use-case (game ROMs in contiguous storage, compressed savestates and other smaller files in LittleFS). It would be much nicer if they could all share the same partition.

@X-Ryl669

While thinking about this, I think there are numerous intermediate steps that can be done that would closely fit the need:

  1. First, the need for contiguous space is mainly required for reading. We could live without contiguous write support, IMHO.
  2. On my system, there's a MMU unit, so it's possible to map the flash address to the µc unit's address space. I think the need for this feature is mainly for XIP or large data mapping. In the former case, there's a MMU somehow on the system.
  3. So, given the fact there's a MMU, the only requirement becomes: have a "virtual" block that's the same size as the MMU unit, because using the MMU can make any block appear at linear space.
  4. In a B-Tree like structure, I think it's possible to have multiple type of nodes, for different "block" sizes, without too much impact on the logic of the code (this requires tests for the node type where size is computed)
  5. So, it might be possible to keep the new B-Tree structure to deal with some "super" 64kB blocks that can't be split anymore, yet, being able to move them
  6. In the end, a mapping function could be added (via a callback). It'll be called for each block address in logical data index so that the MMU unit can be programmed to make a virtual, contiguous, address space.
  7. Additionally, a "memory mapped" bit could be added to the file or B-Tree node, so when this bit is set, no move of these blocks is allowed. They only need to be in RAM (not flash), since after reset, the mapping is gone. These bits are removed in the unmap function.
  8. If the previous requirement can't be met, maybe a "force unmap" callback is possible too, to force the CPU to unmap the blocks in case littlefs needs to move them. This could allow the CPU to retry mapping the files once they have changed on disk.
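The mapping callback from point 6 could look roughly like the sketch below. Everything here is hypothetical: the callback type, `map_file_blocks`, and the toy MMU table are stand-ins for whatever the filesystem and the target's MMU driver would actually provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: called once per block, in logical order. */
typedef int (*map_cb_t)(void *ctx, uint32_t logical_index, uint32_t physical_block);

/* Walk a file's block list in logical order and let the callback program
 * the mapping. blocks would come from the filesystem's own bookkeeping. */
static int map_file_blocks(const uint32_t *blocks, size_t count,
                           map_cb_t cb, void *ctx) {
    for (size_t i = 0; i < count; i++) {
        int err = cb(ctx, (uint32_t)i, blocks[i]);
        if (err) {
            return err;
        }
    }
    return 0;
}

/* Toy stand-in for an MMU: virtual page i maps to page_to_block[i]. */
#define TOY_MMU_PAGES 4
struct toy_mmu {
    uint32_t page_to_block[TOY_MMU_PAGES];
    size_t mapped;
};

static int toy_mmu_map(void *ctx, uint32_t logical_index, uint32_t physical_block) {
    struct toy_mmu *mmu = ctx;
    if (logical_index >= TOY_MMU_PAGES) {
        return -1;              /* out of virtual pages */
    }
    mmu->page_to_block[logical_index] = physical_block;
    mmu->mapped++;
    return 0;
}
```

After a successful call, the scattered physical blocks appear in logical order in the toy table, which is the effect a real MMU would give in the virtual address space.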

@geky
Member

geky commented Jan 22, 2024

The primary issue with this, is that we've essentially partitioned the storage, which could be an inefficient use. E.g. How do I balance how much of my flash I should allocate to LittleFS vs this secondary contiguous filesystem?

Fortunately this goes away when your image is read-only. At creation time you know exactly how much storage you need for the contiguous files, so that's how much you allocate and you can give LittleFS the rest.

But I do see where in-filesystem support is useful for dynamic contiguous files. Unfortunately navigating how to fit it into the filesystem gets messy.


@X-Ryl669 I need to think on your comment. These are all interesting/valuable comments in this thread.

@BrianPugh
Contributor

For my exact use-case (and I don't think this is super rare), my read-only assets are... "mostly" read-only. Basically, under all normal operating conditions, the data is mmap'd and read-only. However, the assets may infrequently be updated over a slow-data-transfer-medium. In other projects, this could be updating some graphics as part of an OS-update or something.

I certainly understand that this introduces complexities and may not be feasible. If so, I'll pursue the "put contiguous data in another partition and have a littlefs file describe it" idea.

@geky
Member

geky commented Jan 22, 2024

On my system, there's a MMU unit, so it's possible to map the flash address to the µc unit's address space.

Out of curiosity what µc are you using? Some devices I've seen have only ~1, 2, 3 XIP regions, which places a significant constraint on how you store things.

In a B-Tree like structure, I think it's possible to have multiple type of nodes, for different "block" sizes, without too much impact on the logic of the code (this requires tests for the node type where size is computed)

You're right this is easy on the B-Tree side, but it creates problems on the alloc side. If you alloc different sizes you risk fragmentation, where a small block sits in a large block and prevents it from being used in larger block allocations.

You can mitigate this with defragmentation (lfs_fs_defrag?), which moves small pieces of data together, but this adds more janitorial work to the system.

So, it might be possible to keep the new B-Tree structure to deal with some "super" 64kB blocks that can't be split anymore, yet, being able to move them

An annoying issue with fragmentation is that it can force larger allocations to fail, even if there is in theory enough storage available.

I guess you could make allocation failure trigger defragmentation, though this would quite severely increase the potential runtime spike of write operations.
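The fragmentation problem can be shown with a toy free map: the total number of free blocks can be large while the longest contiguous free run is too short for the request. The helper names below are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Total number of free blocks in a toy usage map. */
static uint32_t free_blocks(const bool *used, uint32_t n) {
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (!used[i]) {
            total++;
        }
    }
    return total;
}

/* Length of the longest run of adjacent free blocks. */
static uint32_t longest_free_run(const bool *used, uint32_t n) {
    uint32_t best = 0;
    uint32_t run = 0;
    for (uint32_t i = 0; i < n; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best) {
            best = run;
        }
    }
    return best;
}

/* A contiguous request only succeeds if a single run is long enough. */
static bool can_alloc_contiguous(const bool *used, uint32_t n, uint32_t needed) {
    return needed <= longest_free_run(used, n);
}
```

On a map where every other block is in use, half the storage is free, yet a two-block contiguous request already fails.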

First, the need for contiguous space is mainly required for reading. We could live without contiguous write support, IMHO.

But how do you create the contiguous file? Something somewhere needs to write? I suppose this is mostly an API question and not too important.

Additionally, a "memory mapped" bit could be added to the file or B-Tree node, so when this bit is set, no move of these blocks is allowed.

I was reaching the same conclusion, we really only need a "pin" bit on files to indicate that their data shouldn't be moved. This would allow for both contiguous files and more flexible MMU/virtual-mapped files. Though I don't know if mixed pinned/unpinned B-tree nodes is really useful.

They only need to be in RAM (not flash), since after reset, the mapping is gone.

Ah, this would be nice. But I think you need this bit to persist on flash so that contiguous files don't get broken up after a remount. Though this may not be a problem for systems with granular virtual mapping.

@geky
Member

geky commented Jan 22, 2024

However, the assets may infrequently be updated over a slow-data-transfer-medium. In other projects, this could be updating some graphics as part of an OS-update or something.

I think this is very common. I guess whether or not to completely rewrite the filesystem in this situation is a bit of a user decision.

It would be nice to not need to reformat the filesystem, but with contiguous files this creates increasing fragmentation. Maybe lfs_fs_defrag is the unavoidable conclusion...

I certainly understand that this introduces complexities and may not be feasible. If so, I'll pursue the "put contiguous data in another partition and have a littlefs file describe it" idea.

To be honest I think this feature is going to be at least ~1/2 year out in terms of timescales.

@kaetemi
Contributor Author

kaetemi commented Jan 22, 2024

For my exact use-case (and I don't think this is super rare), my read-only assets are... "mostly" read-only. Basically, under all normal operating conditions, the data is mmap'd and read-only. However, the assets may infrequently be updated over a slow-data-transfer-medium. In other projects, this could be updating some graphics as part of an OS-update or something.

Yeah, same for our use case. Hence, this PR is designed to simply allow reserving a large contiguous block in one shot and giving direct access to the storage address. I expect most updates are infrequent and will not change size of assets either. Fragmentation may likely happen if there's too many assets, but it's not a killing issue. We can still fall back to regular fragmented files and load them into memory as regular assets instead of directly accessing from flash if that ever occurs.

@X-Ryl669

X-Ryl669 commented Jan 22, 2024

I'm not sure fragmentation is induced here.

In order to simplify the work, maybe "just" having the ability to create a littlefs partition with a super block size (that is, logical blocks that are a fixed concatenation of physical blocks that can't be split) would work. In that case, no special modifications are required, only the ability to pin and unpin a block (to prevent erasing or moving it) and a "give the physical addresses of the logical blocks of this file" function.

If that was possible, I think we could just have 2 partitions: one with logical block size = physical block size for regular files and the other with large logical blocks. For large assets, then the second partition would be used, and mmap'ed in a linear address space via the MMU. Even if there isn't a MMU on the µc, I think it's completely possible to change the code using the asset to work in large page via a "virtual" address array mapping.

I'm mainly using an ESP32, but also ARM Cortex-A chips, all of which have an MMU (albeit the former's is rather dumb).

But how do you create the contiguous file? Something somewhere needs to write? I suppose this is mostly an API question and not too important.

I don't think we need to write a contiguous file (it would be better if we could, but as you said, there are numerous obstacles like interleaved error correction, checksum blocks and so on). Let's say we don't write a contiguous file but a randomly stored file whose blocks are next to each other (with some granularity), so it reduces the load on the asset loader (no need to load into memory first). I think there's a trade-off between chunk size and performance. If the chunk size is too small, then it's easier to load the asset into RAM to concatenate it (provided we have the RAM for this). If the chunk size is large enough, then the overhead of accessing it through a mapping is lower than copying it. Some µc have a hardware MMU, so the overhead is only at file open time instead of file reading time. If there's no MMU but mmap access is possible, then the code reading the asset will need to be "wrapped" in a pseudo memory-mapping layer. In that case, it won't be efficient to map each 512-byte block, since many sub-assets can't fit in such a small space, which means having to copy them again to a contiguous memory area. By contrast, a 64kB block (or even larger) can avoid this copying.

But I think you need this bit to persist on flash so that contiguous files don't get broken up after a remount. Though this may not be a problem for systems with granular virtual mapping.

The "pin" bit isn't there to tell if a file is contiguous or not. I was thinking more about maintaining a table of "pinned" open files in RAM. When a file is opened to be mmap'd, the table would store the addresses of all the blocks used by this file (or a bitmap over all blocks; I don't know which is more efficient). Then, when littlefs needs to work on any blocks, it would search this table and fail if a block required for the transaction is in the table (so that the memory mapping keeps working).
Upon reset, we don't care about the previous mapping anyway, so there's no need to store this mapping on the flash.

So if someone needs to move/update/replace the file, they must first unmap it, move/update/replace it (this hits the flash), and then re-map it.
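A rough sketch of the bitmap variant of that in-RAM pin table, to show how small it could be. All names here (`pin_table`, `pin_blocks`, `pin_check_transaction`, `PIN_BLOCK_COUNT`) are made up for illustration and are not part of littlefs; the real hook points inside the allocator and garbage collector are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not littlefs API. */
#define PIN_BLOCK_COUNT 1024u /* assumed number of blocks in the filesystem */

struct pin_table {
    uint8_t bits[PIN_BLOCK_COUNT / 8]; /* one bit per block, RAM only */
};

void pin_table_init(struct pin_table *t) {
    memset(t->bits, 0, sizeof t->bits);
}

bool pin_is_pinned(const struct pin_table *t, uint32_t block) {
    if (block >= PIN_BLOCK_COUNT) {
        return false;
    }
    return (t->bits[block / 8] >> (block % 8)) & 1u;
}

/* Pin (pinned = true) or unpin (pinned = false) every block of a file.
 * Out-of-range block numbers are ignored. */
void pin_blocks(struct pin_table *t, const uint32_t *blocks, size_t n,
                bool pinned) {
    for (size_t i = 0; i < n; i++) {
        if (blocks[i] >= PIN_BLOCK_COUNT) {
            continue;
        }
        uint8_t mask = (uint8_t)(1u << (blocks[i] % 8));
        if (pinned) {
            t->bits[blocks[i] / 8] |= mask;
        } else {
            t->bits[blocks[i] / 8] &= (uint8_t)~mask;
        }
    }
}

/* Called before a transaction erases or moves blocks.
 * Returns 0 if none of the blocks are pinned, -1 otherwise. */
int pin_check_transaction(const struct pin_table *t, const uint32_t *blocks,
                          size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (pin_is_pinned(t, blocks[i])) {
            return -1;
        }
    }
    return 0;
}
```

A plain bitmap cannot tell how many mappings hold a block, so it assumes each file is mapped at most once at a time. A per-file block list would lift that restriction at the cost of more RAM and a slower lookup.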

@geky
Member

geky commented Jan 22, 2024

Fragmentation may likely happen if there's too many assets, but it's not a killing issue. We can still fall back to regular fragmented files and load them into memory as regular assets instead of directly accessing from flash if that ever occurs.

This is an interesting point, and clever. But I don't think it works in the XIP use case, since you generally need the binary to be contiguous for the system to function.

And it's not something you can just throw storage at to solve. You would need enough storage for every file in the filesystem to take up the same size as the contiguous file before you can guarantee that the contiguous allocation won't fail.

The "pin" bit isn't there to tell if a file is contiguous or not. I was thinking more about maintaining a table of "pinned" open files in RAM.

A purely in-RAM flag is a nice idea, and would minimize on-disk changes. But for contiguous files I think you still need to persist this info on-disk. It's not an issue for the MMU use-case, but for contiguous files I assume there would be some application logic that expects certain files to be contiguous (boot bios -> mount fs -> xip "/image.bin").

If some work is done before the contiguous file is opened and pinned, say lfs_fs_gc() is called and that triggers wear-leveling and moves parts of "/image.bin" around, then the system could stop working.

The nice thing about the in-RAM pin is it can always be added as an optional feature without on-disk changes, but it only solves one of the use-cases presented here.

I'm mainly using an ESP32, but also ARM Cortex-A chips, all of which have an MMU (albeit the former's is rather dumb).

The ESP32 is a common target for littlefs. And they even get "dumber", heck, support for 8-bit MCUs is on the roadmap at some point : )

In order to simplify the work, maybe "just" having the ability to create a littlefs partition with a super block size (that is, logical blocks that are a fixed concatenation of physical blocks that can't be split) would work. In that case, no special modifications are required, only the ability to pin & unpin a block (to prevent erasing or moving it) and a "give me the physical addresses of the logical blocks of this file" function.

If that were possible, I think we could just have 2 partitions: one with logical block size = physical block size for regular files, and the other with large logical blocks. Large assets would then go in the second partition and be mmap'ed into a linear address space via the MMU. Even if there isn't an MMU on the µc, I think it's completely possible to change the code using the asset to work with large pages via a "virtual" address array mapping.

If I'm understanding this proposal correctly, can it mostly already be done? With the exception of pinning and getting the underlying block address?

  1. littlefs already supports any block_size that is a multiple of the physical block size; you should just need to change the configuration.

  2. You can have two littlefs instances, one on the small block pool and one on the big block pool.

The downside is you need to choose what size to make the pools, but AFAICT this is unavoidable without introducing fragmentation in some way.
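As a rough illustration of the pool sizing decision, the sketch below splits one flash device into a small-block pool and a big-block pool and checks that the sizes line up. The `pool_geometry` struct and `split_pools` function are hypothetical helpers, not littlefs API; the only assumption about littlefs is that the resulting `block_size` and `block_count` values would be copied into the corresponding fields of each instance's `struct lfs_config`, with the block device callbacks adding `base` to every address.

```c
#include <stdint.h>

/* Hypothetical helper types, not part of littlefs. */
struct pool_geometry {
    uint32_t base;        /* byte offset of the pool within the flash */
    uint32_t block_size;  /* logical block size used by this pool */
    uint32_t block_count; /* number of logical blocks in this pool */
};

/* Split a flash of flash_size bytes into two pools:
 *  - small: the first small_bytes, logical block = physical block
 *  - large: the remainder, logical block = phys_block * multiple
 * Returns 0 on success, -1 if the sizes do not divide evenly. */
int split_pools(uint32_t flash_size, uint32_t phys_block,
                uint32_t small_bytes, uint32_t multiple,
                struct pool_geometry *small, struct pool_geometry *large) {
    if (phys_block == 0 || multiple == 0) {
        return -1;
    }
    if (multiple > UINT32_MAX / phys_block) {
        return -1; /* large block size would overflow */
    }
    if (small_bytes > flash_size || small_bytes % phys_block != 0) {
        return -1;
    }
    uint32_t large_block = phys_block * multiple;
    uint32_t large_bytes = flash_size - small_bytes;
    if (large_bytes % large_block != 0) {
        return -1;
    }

    small->base = 0;
    small->block_size = phys_block;
    small->block_count = small_bytes / phys_block;

    large->base = small_bytes;
    large->block_size = large_block;
    large->block_count = large_bytes / large_block;
    return 0;
}
```

For example, a 16 MiB flash with 4 KiB erase blocks, 4 MiB given to the small pool, and a multiple of 16 yields 1024 small blocks and 192 large blocks of 64 KiB each. The split point is fixed at format time, which is exactly the sizing decision mentioned above.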

@geky
Member

geky commented Jan 22, 2024

Yeah, same for our use case. Hence, this PR is designed to simply allow reserving a large contiguous block in one shot and giving direct access to the storage address. I expect most updates are infrequent and will not change size of assets either.

I wonder if the answer for XIP is to include optional on-demand defrag during contiguous allocations. This would get expensive ($O(FS)$), but if these files are infrequently updated, that might be ok.

Humorously, if contiguous files don't participate in error-correction or wear-leveling, I think contiguous allocations end up functionally isomorphic to a hypothetical lfs_fs_shrink().

@BrianPugh
Contributor

Humorously, if contiguous files don't participate in error-correction or wear-leveling, I think contiguous allocations end up functionally isomorphic to a hypothetical lfs_fs_shrink().

I think you're correct. I'm a bit lost in this conversation, but here is how I imagine it would go:

  1. When a file is created/written, it has the option of being a contiguous mmap'd file or not. Under this circumstance, it doesn't participate in error-correction or wear-leveling.
  2. Having several contiguous files throughout the littlefs partition could lead to fragmentation that prevents a file from fitting, despite there supposedly being enough space for it. To solve this, a defragmentation function would be necessary (and would rarely ever be called). This would make all the contiguous files sequential at the beginning of the partition, with all the "free-flowing" normal blocks immediately after and no free blocks intermingled.
  3. Since all the data is now at the beginning of the littlefs partition without any gaps, the filesystem could be safely shrunk. So this is a "two-birds-one-stone" feature.

@X-Ryl669

If I'm understanding this proposal correctly, can it mostly already be done?

Yes. And you're completely right about contiguous blocks and fragmentation issues.

If I understand correctly, there are several steps that can be taken toward the goal of a contiguous area in flash:

Step 1: (With physical random order for contiguous files)

Deal with it at the app level; only possible for µc with an MMU. Create 2 littlefs partitions, as described above.
This requires an API in littlefs to pin/lock a list of blocks and get their addresses. The management of the contiguous area is then done by the application, virtually via the MMU. No need to have a permanent pin stored in flash.

Pros

If I understand correctly, it might also be the fastest option, since if the "logical" block size is the size of an erase block (or a multiple of it), you can skip erasing each page sequentially.

Cons

Incurs the overhead of 2 filesystems. Won't work on systems without an MMU (or requires some changes in the app code to deal with a manual virtual mapping).

Step 2: (With physical random order for contiguous files)

Move this logic inside a single littlefs partition. littlefs deals with 2 areas in the partition, one made for small blocks and another that deals with large blocks only. No data can be exchanged between those areas, so no fragmentation can occur.
This would somehow require a way for the app to specify what kind of file to open (maybe using an O_HUGEFILE flag or something similar to refer to the latter area?)

Pros
Same as above, and it should reduce the filesystem overhead to that of a single partition.

Cons
Requires changing the on-disk format. Still requires an MMU or a modification of the app code to deal with randomly placed large blocks.

Step 3: (Path toward contiguous file on the flash)

Start implementing a defragmentation process. I think having 2 pools here makes defragmenting much easier. In effect, the "small" pool doesn't actually need to be defragmented, but any free space in it can be used as scratch space to move blocks from the "large" pool while reordering and sorting them.

If done correctly, the writing/updating process would be:

  1. (Optional) Delete previous (large) file
  2. Launch a defragmentation step. littlefs is unusable while this process is running (easier)
  3. Defragmentation reorders the files in the large pool so that they are contiguous and compact
  4. Write the new (large) file in the large pool by first allocating its size and then mmap'ing it or sequentially filling it
  5. If, for some reason, the large file can't fit in the free space (because of a bad block, or whatever), fail here
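The bookkeeping side of steps 3 to 5 could look roughly like the sketch below. It is only a sketch under strong assumptions: the `extent` struct and both functions are hypothetical (not littlefs API), files in the large pool are described as non-overlapping extents sorted by start block, and the actual flash copies, bad-block handling, and power-loss safety are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a file in the large pool, in large blocks. */
struct extent {
    uint32_t start; /* first large block of the file */
    uint32_t len;   /* number of large blocks */
};

/* Step 3: slide every extent toward block 0 so that no gaps remain.
 * `files` must be sorted by ascending start and must not overlap, so each
 * extent only ever moves to a lower or equal address and never lands on
 * data that has not been moved yet.
 * Rewrites the start fields in place, stores the number of extents that
 * had to be relocated in *moved, and returns the first free block. */
uint32_t compact_extents(struct extent *files, size_t n, size_t *moved) {
    uint32_t next = 0;
    *moved = 0;
    for (size_t i = 0; i < n; i++) {
        if (files[i].start != next) {
            files[i].start = next; /* a real implementation copies blocks here */
            (*moved)++;
        }
        next += files[i].len;
    }
    return next;
}

/* Steps 4 and 5: after compaction, reserve `need` blocks at the end of the
 * used area. Returns 0 and fills *out on success, -1 if it cannot fit. */
int reserve_after_compaction(struct extent *files, size_t n,
                             uint32_t pool_blocks, uint32_t need,
                             struct extent *out) {
    size_t moved;
    uint32_t first_free = compact_extents(files, n, &moved);
    if (first_free > pool_blocks || need > pool_blocks - first_free) {
        return -1;
    }
    out->start = first_free;
    out->len = need;
    return 0;
}
```

Because every extent moves toward lower addresses, copying block by block in ascending order would not overwrite unread data even when source and destination ranges overlap. Surviving a power loss in the middle of a move is a separate problem that this sketch does not address.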

Pro:
This is a truly contiguous area, so it can be used on µc without an MMU (like Cortex-M).

Con:
The number of modifications is (very) limited, and it's a long and painful process. It might be OK for rare system updates, though.

@BrianPugh
Contributor

BrianPugh commented Jan 28, 2024

if this is a read-only image, could you not allocate blocks for the contiguous files you need from the end before creating the filesystem, and store "fake" files in the filesystem that describe where they live on disk?

It's nowhere near a working state, but I began writing some python code implementing this idea.

https://github.com/BrianPugh/gnwmanager/blob/rom-fs/gnwmanager/romfs.py

This ends up being similar to FrogFS, but with:

  1. The references being in littlefs rather than self-contained.
  2. Minimal-move defragmentation logic (doesn't exist in FrogFS since it's purely read-only).
  3. No path-hashing; unnecessary for my application.

Labels
enhancement needs minor version new functionality only allowed in minor versions