mnist tutorial failing #15
Comments
Probably it means that it's out of date as well. Needs some care.
I think it's not the age of the packages; the container is limited in CPU and memory, and compiling Pandas there takes practically forever. I don't think a newer version will solve the problem. A precompiled version from …
I installed … We can have some SVM parameterization and use this as an experiments tutorial. The downloaded data is not images, though; it's a CSV file extracted from images. I can add featurization as well (a sketch of what that could look like follows). All the content and commands would have to change, so actually removing it may be OK. It needs a total rewrite. I can create a new one using https://github.com/iterative/dvc-checkpoints-mnist
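For illustration, a minimal sketch of a parameterized SVM baseline on the CSV data, assuming a Kaggle-style layout (a `label` column followed by 784 pixel columns); the file name and hyperparameters are placeholders, and `kernel`/`C`/`gamma` would be the natural knobs to expose as DVC params:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical file name; assumes a "label" column plus 784
# pixel-intensity columns, as in the Kaggle MNIST CSV.
df = pd.read_csv("train.csv")
X = df.drop(columns=["label"]).to_numpy() / 255.0  # simple featurization
y = df["label"].to_numpy()

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# kernel/C/gamma are the hyperparameters an experiments tutorial
# would expose as DVC params.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("val acc:", accuracy_score(y_val, clf.predict(X_val)))
```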
I'm fine with removing it and starting from Dave's one.
I tested it: Katacoda has 1.5 GB of RAM. I can create a Docker environment to simulate this (a sketch follows). BTW, I read the Docker discussions in iterative/dvc.org#811 and iterative/dvc#2844. Instead of a general-purpose Docker image, the docs could provide a container that downloads the data and sets up the example project. We can use it for the tests, and if people like it, they may build on top of it or create their own Docker environments.
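As a sketch of that simulation, not a tested setup: Docker's `--memory`/`--memory-swap` flags can cap a container at Katacoda-like limits. The base image and exact numbers below are assumptions:

```python
import subprocess

# Illustrative only: run a shell in a container capped at ~1.5 GB
# of RAM plus ~1 GB of swap, roughly matching Katacoda's limits.
subprocess.run(
    [
        "docker", "run", "--rm", "-it",
        "--memory", "1.5g",        # hard RAM cap
        "--memory-swap", "2.5g",   # RAM + swap total
        "python:3.8-slim", "bash",
    ],
    check=True,
)
```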
@iesahin do we know what takes all the memory? It's a bit unexpected that MNIST requires that much RAM.
@shcheklein I didn't profile it thoroughly, but the culprit seems to be the line in training that builds the prediction. There may be some engineering fixes, like increasing the swap space or manual gc, to reduce the required memory, but Torch itself is a rather expensive library to run with 1-1.5 GB of RAM + 1 GB of swap. There could be several versions of the classifier (random forest, SVM, NB, CNN, MLP, etc.) to test and experiment with, selected via parameters in DVC. We can use the modest ones in Katacoda, and users may try all of them in their own environments. (A sketch of the gc/inference idea follows.)
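For illustration, one way the manual-gc idea could combine with autograd-free inference; `model` and `inputs` are placeholders for whatever the tutorial script uses, and this is a guess at a mitigation, not a profiled fix:

```python
import gc
import torch

def evaluate(model, inputs):
    """Predict without autograd bookkeeping to cut peak memory."""
    model.eval()
    # no_grad() skips building the autograd graph, which is often
    # the hidden memory cost of the line that builds the prediction.
    with torch.no_grad():
        preds = model(inputs).argmax(dim=1)
    gc.collect()  # the "manual gc" mentioned above; usually a minor win
    return preds
```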
I can (and probably should) load mini-batches of data in the example, which could help, but maybe not if PyTorch already uses almost all the available memory (a mini-batch sketch follows below). We could also try a more lightweight deep learning framework. I'm also curious which branch you're using, @iesahin?
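For reference, a minimal mini-batch sketch with torchvision's `DataLoader`; the dataset path and batch size are illustrative:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Stream the data in small batches instead of holding the whole
# 60k-image tensor in memory at once.
train_set = datasets.MNIST(
    "data", train=True, download=True, transform=transforms.ToTensor()
)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

for images, labels in train_loader:
    ...  # forward/backward pass on one 64-image batch at a time
```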
Let me first profile the script. I doubt mini-batches will solve the memory problem (the PyTorch download alone is around 700 MB), but they may help it converge faster; it takes around 100 epochs to reach >0.90 accuracy. I tested on several branches but traced on the `basic` branch. Thank you.
I tested the dogs-and-cats data and model versioning tutorial on Katacoda in a Docker container: https://dvc.org/doc/use-cases/versioning-data-and-model-files/tutorial. TensorFlow runs, but creating the model takes a long time, and I'm not sure it can be made near-instant. We may still need smaller datasets/models for Katacoda. I'll also test the MNIST dataset with TF on Katacoda; TF seems more suitable for low-memory environments, and MNIST is better known. It's like the Hello World of ML tutorials.
I tested the MNIST example from the TF site: https://gist.github.com/iesahin/f3a22ebca5b52579748dc7d724047c8d. The whole script finishes in less than a minute on Katacoda. The model is quite simple (no CNN, a single 128-unit Dense layer, 97% validation accuracy), but at least we now know it's possible to use MNIST on Katacoda. A sketch of that shape follows.
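For reference, a model of that description looks roughly like the standard TF beginner quickstart. This is a sketch from the comment's description, not the gist itself, so details such as dropout or epoch count may differ:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale to [0, 1]

# No CNN: a single 128-unit Dense hidden layer, as described above.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```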
In Step 1, `pip install -r requirements.txt` fails to run.