add: taking more than 20min when multithreaded vs 20s with one job #8008
Labels: `A: data-management` (dvc add/checkout/commit/move/remove), `fs: nfs`, `performance`
# Bug Report

## Description
Following iterative/dvc-objects#99, I initialized a local repository and tried adding a 74 MB folder of 564 files to DVC. I ran the command on a cluster node with 32 CPUs, with no remote configured:

```console
$ time dvc add data
```

Forcing the number of workers to 1 with iterative/dvc-objects#99, the command finished in about 20 seconds. Leaving the number of workers at the default (32 × 4), it took more than 20 minutes and I aborted the run, although it had run to completion before, just slowly.
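For clarity, here is a minimal sketch of the default worker count referred to above (the "32 × 4" figure); `default_workers` is an illustrative helper, not dvc-objects' actual API:

```python
import os

def default_workers(cpu_count: int) -> int:
    """Illustrative helper: the observed default of four workers per CPU."""
    return cpu_count * 4

# On the 32-CPU cluster node from this report:
print(default_workers(32))  # 128 worker threads

# On the current machine (os.cpu_count() may return None in odd setups):
print(default_workers(os.cpu_count() or 1))
```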
I made sure to clear the cache, as well as any generated files, before each execution. The blocking part of the `add` command seems to be happening here: https://github.com/iterative/dvc-data/blob/main/src/dvc_data/transfer.py#L180-L186, and the `core.checksum_jobs` option does not affect this operation. The `--jobs` option is only available together with `--to-remote`, so there is no easy way to disable multithreading. I suspect that parallelizing the local copy operations might be the cause. However, if I run `dvc add data --to-remote` with a local folder configured as the remote, no blocking occurs no matter the number of workers, and the cache fills up as expected. I could not pinpoint precisely why the behavior of the two commands differs, as they both write to a local folder.
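For reference, the non-blocking path described above can be sketched as follows (this assumes a working DVC install; `/tmp/dvc-local-remote` is a placeholder path, not part of my actual setup):

```shell
# Configure a local directory as the default remote (placeholder path)
dvc remote add -d localremote /tmp/dvc-local-remote

# This code path does not block, regardless of the number of workers
dvc add data --to-remote
```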
## Reproduce

Running

```console
$ dvc add data
```

takes from 20 minutes to hours in my setup with vanilla DVC, but only a few seconds after modifying the source files as in iterative/dvc-objects#99.

Also, if this might help:

```console
$ dvc add data --to-remote
```

takes a few seconds, while still performing local copy operations and being multithreaded.
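The timing comparison can be scripted as a rough sketch (assumes the repository root contains the 74 MB `data/` folder; cache and generated files are cleared between runs, as described earlier):

```shell
# Clean slate before each timing run (paths assume the repository root)
rm -rf .dvc/cache data.dvc

# Vanilla DVC: 20 min to hours on NFS in my setup
time dvc add data

rm -rf .dvc/cache data.dvc

# Same command with workers forced to 1 at the source level
# (as in iterative/dvc-objects#99): ~20 s
time dvc add data
```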
## Expected

I expect the `add` command to take a few seconds, when it can currently take up to hours. In any case, being able to configure the number of jobs globally (not only `core.checksum_jobs`) would be great.

## Environment information
## Additional Information (if any)
Due to some restrictions, I could export the profiling information neither from VizTracer nor from cProfile, my apologies.