
Write a new Experiments scenario with MNIST/Tensorflow #60

Closed
iesahin opened this issue Apr 10, 2021 · 10 comments · Fixed by #63

iesahin commented Apr 10, 2021

The current experiments scenario uses the example-get-started project to present Experiments. However, most of the train.py commands fail due to out-of-memory errors on Katacoda.

We need a new way to present experiments concepts:

  • It should depend on MNIST. In our previous discussion in mnist tutorial failing #15, I tested a Tensorflow/MNIST example and it worked in Docker on Katacoda. This is a good way to build and test the new example dataset.

  • Although there are other (and possibly more interesting) datasets like [Fashion MNIST][fashion] or [Credit Card Fraud Detection][ccfraud], their requirements may exceed what Katacoda can provide. Hence using the standard MNIST is preferable to venturing into these.

Related to epic iterative/dvc.org#1400

This is also related to #55


iesahin commented Apr 10, 2021

The structure of the project will be similar to the current example-get-started.

The MNIST dataset is usually distributed (on LeCun's website and deep.ai) in a binary format that is not directly usable from Python. Our prepare.py script will receive this data and convert it into a usable format.

The data will be kept in the data/ directory, as in the current project. The files will be downloaded from a DVC remote; unlike the current single file, there are 4 of them.
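
A rough sketch of what the conversion in prepare.py could look like (file names and the data/ layout here are placeholders, not the final structure):

import gzip
import struct

import numpy as np


def read_idx(path):
    """Read an (optionally gzipped) IDX file into a NumPy array."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        # IDX header: 2 zero bytes, 1 dtype byte, 1 byte for the number of dims
        _zero, _dtype, ndim = struct.unpack(">HBB", f.read(4))
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)


if __name__ == "__main__":
    # The 4 raw files (train/test images and labels) come from the DVC remote
    images = read_idx("data/raw/train-images-idx3-ubyte.gz")
    labels = read_idx("data/raw/train-labels-idx1-ubyte.gz")
    np.savez_compressed("data/prepared/train.npz", images=images, labels=labels)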

The featurize step will be replaced by normalize. It will normalize, shuffle, and prepare the vectors, similar to the current script. It can have seed and ratio parameters. We can also see the effects of normalization.
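
Something like the following could be the core of the normalize stage (parameter names are tentative and may not match the final params.yaml):

import numpy as np


def preprocess(images, labels, seed=20210428, normalize=True, shuffle=False):
    """Scale pixel values to [0, 1] and optionally shuffle the examples."""
    images = images.astype("float32")
    if normalize:
        images /= 255.0
    if shuffle:
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(images))
        images, labels = images[order], labels[order]
    return images, labels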

There will be a model dependency for train that just returns the model. We'll have Dense layers with parameterized units and activation, and a CNN layer with various parameters.

The training will also receive various loss and optimizer parameters.

The evaluation will use checkpoint: true for the experimentation.
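
As a sketch, the model dependency and the parameterized compile step could look like this (layer structure and argument names are illustrative, not final):

import tensorflow as tf


def get_model(name="mlp", units=128, activation="relu",
              conv_units=32, conv_kernel_size=3, dropout=0.5):
    """Return either a small MLP or a small CNN for 28x28 grayscale inputs."""
    if name == "mlp":
        return tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(units, activation=activation),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
    return tf.keras.Sequential([
        tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
        tf.keras.layers.Conv2D(conv_units, conv_kernel_size, activation=activation),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(units, activation=activation),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])


# The optimizer and loss names from params.yaml can be resolved by name
model = get_model("cnn")
loss_cls = getattr(tf.keras.losses, "SparseCategoricalCrossentropy")
model.compile(optimizer=tf.keras.optimizers.get("Adam"),
              loss=loss_cls(),
              metrics=["accuracy"])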


iesahin commented Apr 13, 2021

A preliminary params.yaml looks like this:

prepare:
  seed: 20210428
  remix: True
  remix_split: 0.20

preprocess:
  seed: 20210428
  normalize: True
  shuffle: False
  add_noise: False
  noise_amount: 0.0004
  noise_s_vs_p: 0.5

train:
  seed: 20210428
  validation_split: 0.2
  model: "mlp"
  optimizer: "Adam"
  loss: "SparseCategoricalCrossentropy" 
  epochs: 10
  batch_size: 128

model_mlp:
  units: 128
  activation: "relu"

model_cnn:
  dense_units: 128
  conv_kernel_size: 3
  conv_units: 32
  dropout: 0.5

There may be some others; I think the more parameters we have, the more freedom we'll have to experiment while writing the documentation.
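
For reference, a minimal way train.py could consume this structure (assuming PyYAML; the key names follow the draft above):

import yaml

with open("params.yaml") as f:
    params = yaml.safe_load(f)

train_params = params["train"]
# "model" selects which model_* section applies, e.g. model_mlp or model_cnn
model_params = params["model_" + train_params["model"]]

print(train_params["optimizer"], train_params["loss"], model_params)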


iesahin commented Apr 20, 2021

The project is in https://github.com/iesahin/example-get-started-mnist/

You can install the requirements and run with dvc exp run or dvc repro.

The params file contains various parameters to test with dvc exp run -S, and I can add other possible optimizers, activations, and loss functions. I'll add a Readme that details possible values for the parameters. I can also add some deeper NNs for users. We can't use them on Katacoda, but they can be used to devise more experiments locally.

I'm preparing a Docker container for this project and will provide it along with a Katacoda scenario tomorrow.

I can integrate checkpoint examples into this project. WDYT @dberenbaum?

@shcheklein

shcheklein commented

@iesahin does it have checkpoints now?

can we follow the same pattern as we do in the regular get started and keep introducing things per commit? it means that we'll have tags to refer to.

also, we need to codify the project

also, @dberenbaum and @flippedcoder have been discussing how to make it more realistic (use drop-in dataset replacement). Have you been able to solve this problem here?


iesahin commented Apr 21, 2021

@iesahin does it have checkpoints now?

It has implicit checkpoints, as all pipelines do; e.g., dvc exp run -S model.name=cnn produces a checkpoint with the CNN as the model. I'll provide explicit checkpoints, but I intended this as a replacement for example-get-started, and some dvc exp features are incompatible with dvc repro.

can we follow the same pattern as we do in the regular get started and keep introducing things per commit? it means that we'll have tags to refer to.

Once the final project looks OK, I'll begin writing the script that produces it. But I plan to maintain the code in another (possibly private) repository, rather than downloading it from S3.

also, we need to codify the project

codify?

also, @dberenbaum and @flippedcoder have been discussing how to make it more realistic (use drop-in dataset replacement). Have you been able to solve this problem here?

We can use Fashion-MNIST as a drop-in replacement. It has the same structure as the standard MNIST. I can look for other MNIST-like (28x28 image) datasets; I remember seeing something about letters recently. The project is actually general enough to be used with any image set; we can just use some image processing to reduce the size of the images.
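
To illustrate the drop-in idea (this sketch uses the Keras dataset loaders for brevity; the actual project downloads the raw files from a DVC remote):

import tensorflow as tf

# Both datasets ship as 28x28 grayscale images with labels 0-9, so swapping
# them requires no other pipeline changes.
DATASETS = {
    "mnist": tf.keras.datasets.mnist,
    "fashion_mnist": tf.keras.datasets.fashion_mnist,
}


def load_dataset(name="mnist"):
    (x_train, y_train), (x_test, y_test) = DATASETS[name].load_data()
    return (x_train, y_train), (x_test, y_test)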

But if you would like to go more realistic, we can build a model zoo and try different models to test transfer learning. I can build something around face recognition that uses various models like VGG, Inception, etc. These won't work on Katacoda and probably need a GPU for decent performance, but it may be a better way to show "replaceable parts" in pipelines: "Here you change this parameter in dvc exp run and you get a whole new model for your task."


iesahin commented Apr 21, 2021

You can use docker run -it emresult/get-started-mnist to run the project. I'll move it to the dvcorg namespace after the review.

Katacoda seems to have some issues. It doesn't even start the containers I set up two weeks ago, and this one doesn't work there either. Looking into it.

@shcheklein @dberenbaum @flippedcoder

dberenbaum commented

Thanks, @iesahin!

IMO having more realistic models is not that important for getting started, as long as there are ways to tune them and see noticeable performance improvements. I would vote to err on the side of simplicity in terms of models, parameters, etc.

Checkpoints are still an immature feature IMO, and I agree that they may have some different requirements that make them hard to integrate into the current get started scenario, and they may not be compatible with other features. Issues that come to mind:

  • The current scenario has separate stages for training and evaluation, and there's no way to inject checkpoints now to iterate over more than one stage.
  • Parameter tuning workflow is more complex because users can either train on top of the previous parameters' checkpoints or start from scratch.
  • Navigating and managing experiments becomes more complex because each experiment has multiple commits.


iesahin commented Apr 21, 2021

Thank you @dberenbaum

I agree with your points. For the GS pages, it's a bit overwhelming to talk about checkpoints in complete detail. Instead, I plan to show -S for quickly setting parameter values and getting results.

For an advanced Experiments tutorial, I plan to show

  • checkpoint: in dvc.yaml to iterate over a set of values
  • make_checkpoint in Python code, in train.py, to set dense_units of the MLP between 16 and 256 incrementally. The normal end point of the pipeline is evaluate.py; this example will show that we can use make_checkpoint in loops (see the sketch after this list).
  • I also plan to use -S in a for p in 8 16 32 64 128 ; do dvc exp run -S param=$p ; done loop as an example of iterating from the shell.
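
A rough sketch of the make_checkpoint loop from the second item (the dense_units sweep and file names are illustrative only):

import tensorflow as tf
from dvc.api import make_checkpoint


def sweep_dense_units(x_train, y_train):
    units = 16
    while units <= 256:
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(units, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=1)
        model.save("model.h5")
        make_checkpoint()  # record a checkpoint for this iteration
        units *= 2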

These are effectively the same, but users may have different requirements, and having options to use dvc exp features in different settings is useful. Personally, I'd use the last one, but shell scripts don't come naturally to everyone.

Regarding complexity, you're right. With more than one parameter it becomes increasingly difficult to track the changes, but we can compare it with manual (non-DVC) setups. Hyperparameter search is always tricky and one can quickly drown in it. IMHO we have tools to manage this complexity and we want users to know them, but these tools may become a hindrance if not used correctly (like Git, for example).


iesahin commented Apr 21, 2021

BTW, the Katacoda issues seem to be solved. Even the CNN version of the project runs fine. It looks like we have achieved the goal of a single get-started project that runs on all platforms with modest requirements. I'll update the scenario text for the new project tomorrow.

You can see the container here: https://www.katacoda.com/iex/courses/get-started/experiments

Just issue dvc repro or dvc exp run to run the CNN version, and dvc exp run -S model.name=mlp to run the MLP version.

@shcheklein

dberenbaum commented

I think we should keep parameter tuning and checkpoints separate. An advanced section combining them makes sense, but there are still plenty of people who will only want to use one or the other.

As for the complexity of params, I think it quickly becomes hard for users to follow and hard to visualize in exp show. I agree that having multiple parameters (and maybe having them in multiple stages) is important to show the value dvc provides here, but it might be better to figure out the lowest number of params that can accomplish that and go with it.
