How to deal with 20 million+ files #10450
alita-moore asked this question in Help (Unanswered)
Replies: 1 comment · 7 replies
-
I want to use DVC to manage 20 million+ small files, but it seems quite slow when dealing with that many files. Is there a common way of handling cases like this, such as using an intermediate zip file or something to that effect? Is 20 million files beyond the intended scope of the tool, or should I use a different tool instead?
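(As a rough illustration of the "intermediate zip file" idea raised in the question, and not an approach proposed in this thread: one option is to pack the small files into a few large tar shards and let DVC track only the shards. A minimal sketch follows; the directory names `data/raw` and `data/shards` and the shard size are assumptions, and the only DVC command used is `dvc add`.)

```python
# Minimal sketch (assumptions: data/raw holds the small files, data/shards is new,
# and a DVC repo is already initialized). Pack files into tar shards, then track
# only the shards with DVC so it sees a handful of archives instead of ~20M files.
import subprocess
import tarfile
from pathlib import Path

SRC_DIR = Path("data/raw")      # hypothetical directory containing the small files
OUT_DIR = Path("data/shards")   # archives that DVC will actually track
FILES_PER_SHARD = 100_000       # tune so each shard stays a manageable size

OUT_DIR.mkdir(parents=True, exist_ok=True)
files = sorted(p for p in SRC_DIR.rglob("*") if p.is_file())

for start in range(0, len(files), FILES_PER_SHARD):
    shard = OUT_DIR / f"shard_{start // FILES_PER_SHARD:05d}.tar"
    with tarfile.open(shard, "w") as tar:
        for f in files[start : start + FILES_PER_SHARD]:
            # store paths relative to SRC_DIR so shards unpack cleanly anywhere
            tar.add(f, arcname=str(f.relative_to(SRC_DIR)))

# One `dvc add` on the shard directory instead of tracking millions of entries.
subprocess.run(["dvc", "add", str(OUT_DIR)], check=True)
```

The trade-off is that individual files are no longer addressable through DVC; consumers have to unpack or stream from the shards.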
-
Hi @alita-moore! Thanks for reaching out! Yes, DVC can be slow at that dataset scale. We have a new tool that we've been building for exactly this purpose, and we will be releasing it on June 25th. You can learn more in this recent talk our CEO, @dmpetrov, gave at OSS4AI (at the 1:02 mark). The tool will process images, text, video, and audio data at scale for computer vision, LLM, or multimodal applications. More info can be found at https://dvc.ai, and if you'd like to talk about your use case and see a demo, you can book a meeting here: https://calendly.com/dmitry-at-iterative/dmitry-petrov-30-minutes