How to push in-memory object directly to remote store? #5068
Hi @BikashShaw! There is no native interface for that in dvc itself right now :( It seems like that might be an API extension of the straight-to-remote feature for the CLI (#4520), but that would only solve pushing and generating the local *.dvc file. You would still need to push the git changes somehow. Could you elaborate on your scenario, please?
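To make the gap concrete, here is a minimal sketch of what a straight-to-remote write of an in-memory object would have to do: hash the bytes, store them under a content-addressed path in the remote, and produce metadata pointing at that hash for git to track. Everything here is an assumption for illustration. `push_bytes_to_remote` is a hypothetical helper, not part of DVC. A local directory stands in for the S3 remote. The `<first 2 hex chars>/<rest of md5>` layout and the `outs`/`md5`/`size`/`path` fields only loosely mirror DVC's cache and `.dvc` format and may not match what any given DVC version actually writes.

```python
import hashlib
import os
import tempfile


def push_bytes_to_remote(data: bytes, remote_dir: str, out_name: str) -> str:
    """Hypothetical helper: store `data` in a content-addressed layout under
    `remote_dir` and return the text of a .dvc-style metadata file for it."""
    md5 = hashlib.md5(data).hexdigest()
    # Assumed layout: <remote>/<first two hex chars>/<remaining hex chars>
    target_dir = os.path.join(remote_dir, md5[:2])
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, md5[2:])
    # Identical bytes map to the same path, so re-pushing is a no-op
    if not os.path.exists(target):
        with open(target, "wb") as f:
            f.write(data)
    # Assumed .dvc-style YAML, built by hand to avoid a YAML dependency
    return (
        "outs:\n"
        f"- md5: {md5}\n"
        f"  size: {len(data)}\n"
        f"  path: {out_name}\n"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as remote:
        # The bytes never touch the workspace; only the "remote" sees them
        print(push_bytes_to_remote(b"serialized model bytes", remote,
                                   "model_expt_x.joblib"))
```

The returned metadata would still have to be written into the repo and committed with git, which is the part a straight-to-remote feature would not solve on its own.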
Hi @efiop, thanks for getting back on this. I am working with @BikashShaw on this, so let me elaborate on the use case. While we are making do with this (somewhat hacky) workflow for models, we don't really want to do the same with large data files, as it uses the disk as an intermediary. So we are looking to see if there is something more elegant. Here is the workflow we currently use for models:

```python
import os
import subprocess

import git  # GitPython
import joblib


def get_git_root(path):
    """Return the top-level directory of the git repo that contains `path`."""
    git_repo = git.Repo(path, search_parent_directories=True)
    return git_repo.git.rev_parse("--show-toplevel")


def _run(cmd):
    """Run `cmd`, print its output under a short label, and return its stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(" ".join(cmd[:2]) + "... \n", result.stdout, result.stderr)
    return result.stdout


def commit_model_with_msg(model_info,
                          path="path/to/somewhere/in/models/dir/of/project/repo",
                          name="model_expt_x",
                          commit_msg="Adding model to vc"):
    """Start tracking a model using dvc/s3 and git

    Parameters
    ----------
    model_info (dict)
        A `dict` containing keys `'pipeline'`, `'features'`, and `'explainer'`.
    path (str)
        A path under `models/` dir of the project where the model's `.dvc` metadata will live
    name (str)
        A unique identifier for the model
    commit_msg (str)
        The git commit message

    Returns
    -------
    None
    """
    # create directory if needed
    directory = os.path.join(get_git_root(os.getcwd()), path)
    os.makedirs(directory, exist_ok=True)
    fname = os.path.join(directory, name) + ".joblib"
    # dump model to disk
    joblib.dump(model_info, fname)
    # dvc add
    stdout = _run(["dvc", "add", fname])
    # dvc add prints a hint such as "git add <file>.dvc .gitignore";
    # take the file list from the rest of that line
    try:
        files = stdout.split("git add")[1].splitlines()[0].split()
    except IndexError:
        print("Model name already under vc...no changes to the repo")
        return
    # git add
    _run(["git", "add"] + files)
    # git commit
    _run(["git", "commit", "-m", commit_msg])
    # dvc push
    _run(["dvc", "push"])
    # remove the workspace copy; dvc push has uploaded it to the remote
    os.remove(fname)
```

Thanks for helping us!
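In that workflow the model only lands on disk because `joblib.dump(model_info, fname)` writes it there for `dvc add` to pick up. A minimal sketch of serializing into memory instead: `joblib.dump` and `joblib.load` also accept a file object, so an `io.BytesIO` buffer can take the place of `fname`. The sketch uses the standard library's `pickle` in place of joblib only so it runs without dependencies, and `serialize_in_memory`/`deserialize_in_memory` are hypothetical names. Since `dvc add` still needs a file in the workspace, this removes the serialization round-trip but not the disk dependency itself.

```python
import io
import pickle


def serialize_in_memory(obj) -> bytes:
    """Hypothetical helper: serialize `obj` into an in-memory buffer."""
    buffer = io.BytesIO()
    # joblib.dump(obj, buffer) would also accept this file object
    pickle.dump(obj, buffer)
    return buffer.getvalue()


def deserialize_in_memory(data: bytes):
    """Hypothetical helper: inverse of serialize_in_memory."""
    # joblib.load(io.BytesIO(data)) is the joblib equivalent
    return pickle.load(io.BytesIO(data))


model_info = {"pipeline": None, "features": ["f1", "f2"], "explainer": None}
blob = serialize_in_memory(model_info)
```

The resulting bytes are what a write API would need to accept in place of a workspace path.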
@jayant91089 Thanks for the example code! Makes sense! OK, I definitely see this as #4520, but for the API. All the internals needed for your feature request will most likely be implemented in that ticket. Does the current workflow work for you for now? If it does, I would advise sticking with it until #4520 is implemented.
Closing as stale.
Please consider this a naive question rather than a bug report or improvement request.
Is there an equivalent "write" function, like `dvc.api.read()`? We want to push an in-memory object directly to the remote store via the Python API, without saving it to local storage and running bash commands.
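For contrast, the read direction already has an in-memory form. A minimal sketch, assuming the object was stored as a plain pickle: `dvc.api.read` with `mode="rb"` returns the tracked file's bytes, which can be unpickled directly (for files written with `joblib.dump`, `joblib.load(io.BytesIO(data))` would be the counterpart). `load_tracked_object` and its `read` parameter are hypothetical; `read` exists only so the sketch can be exercised without a DVC repository.

```python
import pickle


def load_tracked_object(path, repo=None, rev=None, read=None):
    """Hypothetical helper: load a pickled, DVC-tracked object into memory."""
    if read is None:
        import dvc.api  # imported lazily so the sketch runs without DVC
        read = dvc.api.read
    # mode="rb" asks for raw bytes instead of decoded text
    data = read(path, repo=repo, rev=rev, mode="rb")
    return pickle.loads(data)
```

There is no matching `write` call, which is the asymmetry this issue asks about.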
Thanks!