Multiple simultaneous mounts #83

Vadiml1024 · 2022-04-23T15:10:41Z

I'm working on a project where we need to simultaneously mount a LOT of archives, and I'm talking about thousands of them.
For the moment, while experimenting with ratarmount, we launch a separate instance of ratarmount for each archive.
This generates ENORMOUS memory pressure on the system (especially when ratarmount is distributed as an AppImage, because each invocation has a separate, non-shared instance of the Python interpreter).
So we are considering adding an option to read a file containing pairs:
"filename" "mountpoint"
and have a single instance of ratarmount create all FuseMount objects for them.

Do you have any feedback on this idea?

mxmlnkn · 2022-04-23T19:08:16Z

I'm not even sure whether FUSE and/or fusepy allows such a setup, but I didn't try. For the mount point, it forks off a background process that keeps running. Maybe it works to do that multiple times, but I kinda doubt it.

You could use union mounting (if each archive has a top-level directory) or recursive mounting for that, or are there problems?

In the worst case, if you really want specific mount point locations, you could use symbolic links like this:

# Create two archives at random positions
tar1=$( mktemp --suffix=.tar )
tar2=$( mktemp --suffix=.tar )
sample=$( mktemp )
echo foo > "$sample"
tar -cf "$tar1" "$sample"
tar -cf "$tar2" "$sample"
# Create an intermediary structure to mount both of them
folder=$( mktemp -d )
# Unfortunately, recursive mounting does not follow symlinks, so use hardlinks. I'd almost categorize this as a bug.
ln "$tar1" "$folder/$( basename -- "$tar1" )"
ln "$tar2" "$folder/$( basename -- "$tar2" )"
# Mount recursively
ratarmount --recursive "$folder" mountpoint
# Create links for desired mountpoint locations
ln -s "mountpoint/$( basename -- "$tar1" )" mountpoint1
ln -s "mountpoint/$( basename -- "$tar2" )" mountpoint2

One problem I see with this is that there is no option to specify the recursion depth, but that seems easier to implement than the pair-wise parsing.

Btw what kind of archives are you using and what are your performance requirements? If all file system calls go through the same ratarmount instance, it might bottleneck at some point if there are too many. And when using compressed archives, especially bz2 archives, the default is to use parallel decoding, which increases the memory usage. Use -P 1 to change that.

Vadiml1024 · 2022-04-23T22:36:37Z

It is an interesting idea, but I'm afraid it will not work in our case, because the various archives (and directories) reside on different disks, so hardlinks are a no-go. However, it makes me think about another possibility: a *VirtualMountSource* class that would serve the role of **$folder** from your example. It should not be too complicated to implement. ratarmount would receive (somehow) a list of filenames, create a *VirtualMountSource*, populate it with the supplied filenames, and create a *FuseMount* with this *VirtualMountSource*.


mxmlnkn · 2022-04-24T06:27:43Z

> It is an interesting idea, but I'm afraid will not work in our case, because the various archives (and directories) are residing on different disks so hardlinks are nogo....

I forgot to update the code and comment. Symbolic links, and therefore multi-disk setups, also work! The problem in my tests was that I created the test files without any extension, which makes the recursive mounting fail because it only looks at extensions first to avoid expensive "disk" or archive/decompressor accesses.

> However it makes me think about another possibility - VirtualMountSource class which will serve the role of $folder from your example.... Should be not too complicated to implement.... So ratarmount will receive (somehow) a list of filenames will create a VirtualMountSource and populate it with the supplied filenames and will create FuseMount with this VirtualMountSource...

I think the UnionMounting feature is already very similar. You can do:

ratarmount foo1.tar foo2.tar foo3.tar mountpoint

But this will mount the contents of each archive directly into the mount point. To be more generic, you would want to mount the contents of each archive under a different folder under the mount point. I think that should be doable with a command line flag, and I would prefer this to a new pairwise command line syntax. It would be similar in semantics to 7-zip's "Extract here" vs "Extract to Folder" options.
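
A minimal sketch of the "Extract to Folder" layout, in Python because ratarmount is written in it. The helper name `subfolder_layout` is hypothetical and not part of ratarmount; it only assumes that each archive would appear under a subfolder named after its file name, and it shows where that naming breaks down when archives on different disks share a name.

```python
import os
from typing import Dict, List


def subfolder_layout(archives: List[str]) -> Dict[str, str]:
    """Map each subfolder name (the archive's base name) to the archive path."""
    layout: Dict[str, str] = {}
    for archive in archives:
        name = os.path.basename(archive)
        if name in layout:
            # Two archives with the same file name cannot share one subfolder.
            raise ValueError(f"Duplicate subfolder name: {name}")
        layout[name] = archive
    return layout


layout = subfolder_layout(["/disk1/foo1.tar", "/disk2/foo2.tar"])
```

With thousands of archives spread over several disks, the duplicate-name case is probably the first thing to hit, so any real implementation would need a rule for it (for example, a numeric suffix).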

mxmlnkn · 2022-05-28T11:44:34Z

> This generates ENORMOUS memory pressure on the system (especially when ratarmount distributed as AppImage because each invocation has separate nonshared instance of python interpreter)

Are you sure the only problem is the Python interpreter binary? I assume your hands are bound to use that AppImage :/

One other thing I can think of might be the SQLite cache size. There currently is no option for that but as you have already forked, you can add it yourself inside SQLiteIndexedTar._openSqlDb: PRAGMA CACHE_SIZE 16 and maybe repeat for the PAGE_CACHE_SIZE. But if #85 is responsible for the memory usage, then this might also not help much.
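
A rough sketch of what limiting the cache could look like with Python's standard `sqlite3` module. This is not ratarmount's actual `_openSqlDb` code; the function name `open_with_small_cache` is made up for illustration. It uses SQLite's documented `PRAGMA cache_size = N` form, where a positive N is a number of pages. PAGE_CACHE_SIZE is left out because it is not clear which SQLite setting was meant.

```python
import sqlite3


def open_with_small_cache(path: str, cache_pages: int = 16) -> sqlite3.Connection:
    """Open a database and cap its page cache at the given number of pages."""
    connection = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as parameters, so format a validated int.
    connection.execute(f"PRAGMA cache_size = {int(cache_pages)}")
    return connection


connection = open_with_small_cache(":memory:")
# Reading the pragma back confirms the per-connection setting took effect.
cache_size = connection.execute("PRAGMA cache_size").fetchone()[0]
```

The setting applies per connection, so it would have to be issued each time an index database is opened.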

Vadiml1024 · 2022-05-28T12:27:34Z

> Are you sure the only problem is the Python interpreter binary? I assume your hands are bound to use that AppImage :/

Actually, you are right, I can't be sure about it. It was simply the first thing that came to mind as I saw the machine brought to its knees with 1.5K ratarmount instances running.

Seeing you mention CACHE_SIZE and PAGE_CACHE_SIZE makes me understand why I've seen (and continue to see, even with a single instance of ratarmount) 160G of virtual memory usage on a 16G machine.
These are probably zero-filled pages without physical memory backing, preallocated by SQLite.

mxmlnkn · 2023-02-20T19:41:11Z

I'm not sure whether you are still interested. I have two ideas for realizing something like that:

1. --disable-union-mount or maybe --mount-in-subfolders. This option would simply mount each archive in an identically named subfolder. I guess that it does not scale up if there are multiple archives with the same name.
2. --batch-mount <file>. Each line in the given file would be a set of arguments just as if calling ratarmount. All given mountpoints are limited to the actual mount point subfolder.

Example for 2:

# Create a.zip containing foo.txt
# Create /tmp/b.zip containing bar.txt
ratarmount --batch-mount <<HEREDOC all-mounted
--recursive a.zip mounted-a
b.zip mounted-b
a.zip b.zip union-mounted
HEREDOC
tree all-mounted

Expected output:

all-mounted
+- mounted-a
   +- foo.txt
+- mounted-b
   +- bar.txt
+- union-mounted
   +- foo.txt
   +- bar.txt

The --recursive in the first line is only to exemplify that in the ideal case all mount-relevant options can be specified per submount. Of course some command line arguments are not applicable for these submounts, I'll have to make a list of applicable options.
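
To make the proposed file format concrete, here is a small Python sketch of how such lines could be split. It assumes, as in the heredoc example, that the last argument on each line is the submount's mount point and everything before it is options and archives. `parse_batch_lines` is a hypothetical name; none of this exists in ratarmount.

```python
import shlex
from typing import List, Tuple


def parse_batch_lines(text: str) -> List[Tuple[List[str], str]]:
    """Return (options and archives, mount point) for each non-empty line."""
    submounts = []
    for line in text.splitlines():
        # shlex applies shell-like quoting rules, so quoted paths with spaces work.
        arguments = shlex.split(line)
        if not arguments:
            continue
        *sources_and_options, mount_point = arguments
        submounts.append((sources_and_options, mount_point))
    return submounts


submounts = parse_batch_lines(
    "--recursive a.zip mounted-a\n"
    "b.zip mounted-b\n"
    "a.zip b.zip union-mounted\n"
)
```

Each resulting argument list could then be handed to the same argument parser that handles a normal ratarmount call, which is what would let options like --recursive apply per submount.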

As for file names containing newlines, I guess I'll need a second option, --batch-mount0, similar to find's -print0. In that case, I might take this one step further and have each argument be separated with one \0 and each submount with \0\0. That should cover all kinds of problematic input.
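
A possible reading of that zero-delimited format, sketched in Python. The function name `parse_batch_mount0` is hypothetical, and the sketch assumes that empty arguments never occur, since a lone \0 followed by an empty argument would be indistinguishable from the \0\0 submount separator.

```python
import os
from typing import List


def parse_batch_mount0(data: bytes) -> List[List[str]]:
    """Split on \\0\\0 into submounts, then on \\0 into arguments."""
    submounts = []
    for record in data.split(b"\0\0"):
        # Drop empty pieces, e.g. from a trailing separator.
        arguments = [os.fsdecode(piece) for piece in record.split(b"\0") if piece]
        if arguments:
            submounts.append(arguments)
    return submounts


parsed = parse_batch_mount0(
    b"--recursive\0a.zip\0mounted-a\0\0new\nline.zip\0mounted-b\0\0"
)
```

Working on bytes and decoding with `os.fsdecode` keeps file names intact even when they are not valid in the current text encoding.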

Vadiml1024 · 2023-02-20T19:47:50Z

Sure, it will be pretty useful in our use case.

mxmlnkn · 2023-02-20T19:49:22Z

3. Same as 2 but, instead of specifying a file, each -- in the command line would start a new submount.

Same example for 3:

ratarmount all-mounted --recursive a.zip mounted-a -- b.zip mounted-b -- a.zip b.zip union-mounted

Normally -- is used to stop command line argument parsing and take all remaining arguments verbatim. I'm not sure whether I use that or need that. If there is an archive starting with -- it could also be specified as ./--archive-name.zip instead. Or alternatively, I could use --- to separate submounts. So something like this would work:

ratarmount --batch-mount all-mounted --- --recursive a.zip mounted-a --- --recursive -- --b.zip mounted-b --- a.zip b.zip union-mounted
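
A short Python sketch of how the --- variant could be split before normal argument parsing. `split_submounts` is a hypothetical helper, not existing ratarmount code. Because only an exact --- token is treated as a separator, a plain -- inside a group keeps its usual meaning for that submount.

```python
import shlex
from typing import List, Tuple


def split_submounts(argv: List[str], separator: str = "---") -> Tuple[List[str], List[List[str]]]:
    """Return the outer arguments and one argument list per submount."""
    groups: List[List[str]] = [[]]
    for argument in argv:
        if argument == separator:
            groups.append([])
        else:
            groups[-1].append(argument)
    outer, *submounts = groups
    # Ignore empty groups caused by doubled or trailing separators.
    return outer, [group for group in submounts if group]


command = (
    "--batch-mount all-mounted --- --recursive a.zip mounted-a "
    "--- --recursive -- --b.zip mounted-b --- a.zip b.zip union-mounted"
)
outer, submounts = split_submounts(shlex.split(command))
```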

I'm not yet sure how to specify the outer mount folder location.

Vadiml1024 · 2023-02-20T19:53:14Z

Outer mount folder location: --target mountpoint

mxmlnkn · 2023-02-20T21:55:31Z

4. ratarmount --batch-mount-listen. Instead of specifying a single file, ratarmount can be used like a server. The format would be the same as proposed for the files, but (zero-delimited) lines can be piped to ratarmount after startup, that is, after the mount point has been created. This also includes lines for unmounting. My first idea was communication via stdin, but it might be more conventional to use a socket, which I don't have much experience with. Or maybe something like named pipes?
```
ratarmount --batch-mount-listen special-file all-mounted
echo "a.zip mounted-a" >> special-file
echo "--unmount mounted-a" >> special-file
...
```
Heck, this doesn't even need a socket or named pipe, ratarmount could simply offer a writable special file in the root of the mountpoint that it monitors similar to some files in /sys/.
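
However the lines arrive (stdin, socket, named pipe, or special file), the handling could look roughly like this Python sketch. It only tracks a table of submounts in memory and does no actual mounting. The name `apply_control_line` and the rule that the last argument is the mount point are assumptions taken from the example above, not existing ratarmount behavior.

```python
import shlex
from typing import Dict, List


def apply_control_line(mounts: Dict[str, List[str]], line: str) -> None:
    """Update the submount table according to one control line."""
    arguments = shlex.split(line)
    if not arguments:
        return
    if arguments[0] == "--unmount":
        # Remove every listed submount; unknown names are ignored.
        for mount_point in arguments[1:]:
            mounts.pop(mount_point, None)
        return
    *sources_and_options, mount_point = arguments
    mounts[mount_point] = sources_and_options


mounts: Dict[str, List[str]] = {}
apply_control_line(mounts, "a.zip mounted-a")
after_mount = dict(mounts)
apply_control_line(mounts, "--unmount mounted-a")
```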

Vadiml1024 · 2023-02-21T01:59:51Z

Great idea

mxmlnkn · 2023-02-21T22:07:38Z

ratarmount-manylinux2014_x86_64.AppImage.zip

For now, I have implemented the simplest solution, the --disable-union-mount. I would be very interested if this actually reduces the memory usage you observed. The usage would be:

ratarmount-manylinux2014_x86_64.AppImage --disable-union-mount denormal-paths.zip large.zip mountPoint
tree mountPoint

Possible output:

mountPoint/
├── denormal-paths.zip
│   ├── foo
│   ├── root
│   │   └── bar
│   └── ufo
└── large.zip
    └── 10k-1MiB-files.tar.gz

3 directories, 4 files

mxmlnkn changed the title ~~Multiple simulatanious mounts~~ Multiple simultaneous mounts May 4, 2022

mxmlnkn mentioned this issue May 27, 2022

An option to limit a recursion depth on when using -r #84

Closed
