supermaven-nvim sends the entire buffer to the server even when ignore_filetypes is configured to skip the file #85
Comments
seems like this is a dupe of https://github.com/supermaven-inc/supermaven-nvim/pull/35/files from 3 months ago which isn't even merged. what an incredible lack of urgency for a huge privacy issue. |
https://github.com/supermaven-inc/supermaven-nvim/pull/35/files does not address the issue being raised here, as the sm-agent binary automatically includes files in the git repo as part of the context, even if they are not opened. If a file contains sensitive information it should be included in This isn't clear from documentation, so I think we could change that... |
this also needs to be clearly documented.
this is untenable for large or internal repos. imo there should be a way (allowlist and blocklist) to configure which repos to enable. |
I think this #58 could solve it in a programmable way. Check the path of the file, and disable supermaven when needed. Because |
I've merged both PRs mentioned, as they are useful in their own rights, and could seemingly help address some of the privacy concerns here, though as I mentioned earlier these don't address the underlying issue involving |
Thank you for confirming because I was wondering the same thing. I believe your point is that the full context of the repository is sent at startup via the sm binary which has nothing to do with the neovim plugin / config? And the only way to prevent things is either in the Do I have this correct @sm-victorw ? |
@sm-victorw this brings up two further questions I have been wondering about:
Thanks in advance for clearing these things up! |
Yes this is roughly what is happening, though depending on how large the repository is, the context might not include everything. Also note that the context is kept on the server for up to 7 days, as mentioned in the code policy (https://supermaven.com/code-policy)
There isn't any way currently to see exactly what is being included in the context, if you are interested in what files are eligible to be included the If you are interested in whether or not a file is being ignored,
Whenever a file is sent to the binary, the only The lack of control for non-git files is unfortunate, and should have a robust solution. |
Thank you for answering all my questions. Exactly what I needed. I think the biggest "risk" is the files outside of the git repo: personal markdown notes, internal docs, etc. Is this something handled in the nvim plugin? If so, I wonder whether, for the time being, a super conservative approach would work: prompt the user in nvim for any file outside of the git repository, asking if they want it uploaded. Typically these will just be one-off files opened ad hoc. Another option that would be nice is a config option to just blanket disable uploading any files outside the git repo (if that's possible). |
Wouldn't a single
|
@ahmedelgabri thanks for the response.
Is there any official documentation on using supermavenignore? |
Is there a way for supermaven to just not do this in the first place out of the box or does it have to have this behavior? No one wants their personal information leaked. |
@leet0rz Could you specify which behavior you are referring to? The uploading of non-repository files? Or the repository based indexing that the binary performs? We could probably give the option to have the plugin disabled by default, and require a call to the api ( |
I mean not entirely sure how this works but this does seem like a major privacy concern, as stated before obviously people will run this in all sorts of notes and would never want their personal information uploaded or leaked in any way and supermaven should not be uploading this sort of information in any way to anything ever. What I heard is that it uploads the entire buffer and I guess sources or creates information or "AI responses" or inputs that we can accept from that? If that is the case, is it possible to do this locally instead of uploading it (which is the privacy concern). I hope I am doing an ok job explaining this and have actually understood what's going on? |
@leet0rz the power comes from uploading. Most laptops are not powerful enough to do the type of processing it does and even if it could our laptops would be burning up high cpu/gpu/ram resources constantly. Also to be clear, this is how most of these AI code tools work including GitHub copilot. The difference is Supermaven is more powerful sending your entire repository to its models (more context). None of those things are the main problem. The main problem really is files that are not in your git repository but that you open in a buffer because those also are being sent up to the servers. |
@sm-victorw I think this would be great as step 1. But I think the other important thing should be changing the default of any files that are not part of your current opened git repository should be opt-in instead of opt-out. By default files outside git repo are not sent to servers unless you white list them... preferably a glob / glob array, or even better a callback function we can configure to return true if we want a file sent to servers (with the file path as an input parameter to the cb function). Thoughts? |
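The opt-in idea above (files outside a git repo are not sent unless allowlisted, decided by a callback that receives the file path) could be sketched roughly as follows. This is only an illustration of the policy, not anything supermaven-nvim or sm-agent provides: `should_send`, `in_git_work_tree`, and `ALLOW_GLOBS` are hypothetical names, and "inside a repo" is approximated here as "inside any git work tree".

```shell
#!/bin/sh
# Hypothetical opt-in policy; none of these names exist in supermaven-nvim or sm-agent.
# ALLOW_GLOBS: space-separated shell patterns for files outside any git work tree.
ALLOW_GLOBS="${ALLOW_GLOBS:-}"

# Succeeds when the file's directory is inside a git work tree.
in_git_work_tree() {
  dir=$(dirname "$1")
  [ "$(git -C "$dir" rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]
}

# Returns 0 if the file may be sent, 1 otherwise (deny by default outside a repo).
should_send() {
  if in_git_work_tree "$1"; then
    return 0
  fi
  set -f                      # keep the patterns literal while word-splitting
  for glob in $ALLOW_GLOBS; do
    case "$1" in
      $glob) set +f; return 0 ;;
    esac
  done
  set +f
  return 1
}

# Example (illustrative paths):
# ALLOW_GLOBS="$HOME/scratch/*.lua"
# should_send "$HOME/notes/todo.md" || echo "skip upload"
```

Note that in a `case` pattern `*` also matches `/`, so the globs here are looser than gitignore-style patterns. A real implementation would also need to live where the upload decision is actually made, which this sketch does not address.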
What about usage outside of github when you just use neovim to open personal files, which a lot of us do. Will that still not upload the entire buffer and cause a privacy concern? I mean I use neovim to open any file I want to edit outside of github related things too and if a file with sensitive information I open out of some text document and with supermaven being enabled by default will that not cause said privacy concern? |
@leet0rz yes that is the concern we have been discussing in this thread. It is definitely a concern. I was just explaining why the idea of doing anything local just on your machine is not an option. |
@leet0rz Yes, both the pull requests mentioned earlier in this issue can help mitigate this issue, but as I mentioned earlier we are going to want a robust and clear approach for letting users specify which files they would like to exclude |
@GitMurf @sm-victorw Cool thanks guys. |
Another side of the problem is that if I create some temporary file, I should first update I mean, normally, it is the opposite: I work in a project-local directory which is "safe", and only when committing do I think about what should be committed, what should be gitignored and what should be deleted. I mean now, if I create any temporary and/or scratch file with a possible secret inside the repo folder, even when nvim runs in a different window, e.g. as a script output (I usually do some Which is an even worse problem, because many tools "expect" to run from the project folder to pick up configuration. Atm, I think I might do:
# .supermavenignore
*
!*.js
!*.jsx
...
This at least might prevent some surprises. |
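The intent of the deny-everything-then-allow ignore file above can be illustrated with a small shell check. It assumes `.supermavenignore` follows gitignore-style rules, where `*` ignores everything and `!pattern` re-includes matches; whether sm-agent honors negated patterns this way is not confirmed in this thread. The function name `would_be_sent` is hypothetical and the check looks at file names only.

```shell
#!/bin/sh
# Hypothetical helper mirroring the intent of:
#   *
#   !*.js
#   !*.jsx
# i.e. ignore everything, then re-include JS/JSX files.
would_be_sent() {
  case "$1" in
    *.js|*.jsx) return 0 ;;  # re-included by the negated patterns
    *)          return 1 ;;  # everything else stays ignored
  esac
}

# Try it on a few sample file names.
for f in app.js component.jsx .env notes.md; do
  if would_be_sent "$f"; then
    echo "$f: eligible"
  else
    echo "$f: ignored"
  fi
done
```

One caveat with real gitignore semantics: a file cannot be re-included if one of its parent directories is excluded, so a bare `*` may also need directory re-inclusion rules for files in subfolders. The sketch does not model that.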
As well what might be useful - a GLOBAL IGNORE, somewhere in |
I have seen this issue again and again. I stopped using it for a while as it's a big issue. |
For me the issue is having to add files to ignore; I don't want to do that. I want non-code files to be ignored by default. I don't want to keep track of and ignore every file except for my code files; that should be the default behavior if it's not. |
Can you elaborate on what you mean it 'skips'? As in you get completions on files which are included in |
There is a way on macOS to guarantee that the binary uses only permitted files: sandboxing. This is a native OS feature, thus highly secure, and only a couple of text files are needed. How to do it:
#!/bin/sh
sandbox-exec -f /.../supermaven.sb /.../.supermaven/binary/v15/macosx-aarch64/sm-agent "$@"
Here the first group of rules is needed for the binary to start correctly (including all shared system libs); the following ones let it read its own folder, then read the ignore files, and then restrict it to ruby/lua.
This ^^^ is a fully working template, which I wanted to improve, but I don't have time atm. So I decided to post it AS IS so that somebody may pick it up. When/if I have more time to work on this, I will post an updated version. The beauty of this approach is that compliance is guaranteed by OS sandboxing (at least for the binary); the plugin is another story, as it may send whatever it wants directly. The system libs restrictions should definitely be fine-tuned more, but overall I don't care that much about that part, as this is the "normal binary way" of doing things and doesn't relate much to personal sensitive info. |
Yes, it doesn't work: I get completions on files added to .gitignore. This happens in nvim and GoLand; when I use VS Code it ignores the files, but when I open the same project in GoLand I get completions. Sometimes it works and sometimes it just doesn't; it's hit and miss, roughly 50% of the time it ignores the files and the other 50% it autocompletes. I made sure I opened the project root, same as in VS Code. Checked .gitignore, put both |
I've been on the pro trial of Supermaven for the past week and came across this issue. If I were to have a preference as a vim user, I think it would be for file groups. Sort of like how prettier works with lazyvim. Named groups that can be 1 to N file extensions. Then you can give people presets, but they can also just explicitly enable what they need. You'll also likely need a local override for repos where you want to enable "more" types. E.g. you might have an infrastructure repo with both terraform and pulumi in it, or k8s manifests you want to manage where you want to use the YAML integration. But I won't want supermaven to pay attention in the repo I'm in. Then it becomes DENY by default, with explicit ALLOW, over the current status quo of ALLOW by default, with explicit DENY. In my case I tend to only work in a handful of languages, and always configure vim to only really care about those languages over enabling everything. If this is explicit on setup, and there are concrete examples and definitions of each "group", then it should be easy for a user to enable what they need. It would also put the risks front and centre by making them part of the setup. A great place to educate users on the risks associated with enabling copilots. |
Let's suppose we have this structure. ~/.config 1) If I add a .supermavenignore in ~ (so in the root), is it going to make sm-agent stop working for all the child folders? 2) If I add !./repos/my-custom-project to the .supermavenignore, will supermaven work only for that folder?
Can we use .supermavenignore as a whitelist? There are tons of professionals who could use and pay for these AI features, but we're afraid of sending "important" repos to the servers. Why does nobody go for the "whitelist" approach? |
@lucax88x Currently the Based on the discussion taken place so far it sounds like a |
if in this
then yes, I think it would be great, imho! |
I guess we could just only launch supermaven if a certain filetype is open too and have it disabled otherwise, that would be a bit better for my usage and I could just do that myself with an autocmd. |
@leet0rz even with an autocmd, wouldn't you need to pass these preferences to sm-agent? I think what @sm-victorw is pointing out is whatever is done, it's the agent that needs to know about inclusion rules. I also might be missing something with how sm-agent works. So I'm not even sure writing a custom function in the setup like so
would stop files being added to the prompt context, I think this just stops the sm-agent from being started when the condition is met? @sm-victorw if I were to add a rule to say "ignore js files" as a condition function, and then open up a markdown file that starts the agent, would files "ignored" this way in setup still be sent to the server if it's not in one of the ignore files? |
I just set one up and it seems to work fine, I am getting the completion at least for the filetypes of my choosing and not in other files like txt or md where I do not want it running. |
That's not the main problem, the main problem is that Hence the Having a native way of doing this through the agent binary itself when it runs will eliminate these issues, it could be something like this:
ignore = {
  patterns = {'^.env.*'},
  filetypes = {'md', 'sh'},
}
Then passing these to the binary as flags, for example:
sm-agent --ignore-pattern '^.env.*' --ignore-filetype md --ignore-filetype sh |
So even if I open it only on a certain filetype, sm will scan anything in that directory and all the subdirs? I could see that being bad if that is the case and in my opinion that should not be a default ever. |
supermaven-nvim adds a TextChanged autocmd here which calls binary:on_update
supermaven-nvim/lua/supermaven-nvim/document_listener.lua
Line 20 in d71257f
BinaryLifecycle:on_update sends everything to stdin which i assume ends up writing to the server (it's a closed source binary that is fetched so i can't easily check):
supermaven-nvim/lua/supermaven-nvim/binary/binary_handler.lua
Line 77 in d71257f
this code path never seems to hit poll_once which is the only place where ignore_filetypes seems to be checked:
supermaven-nvim/lua/supermaven-nvim/binary/binary_handler.lua
Line 293 in d71257f
it seems misleading that ignore_filetypes doesn't actually ignore files of that filetype and instead will send everything in every buffer backed by a file.