Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plots/metrics: accept any viable target #4446

Closed
pared opened this issue Aug 21, 2020 · 20 comments · Fixed by #5339
Closed

plots/metrics: accept any viable target #4446

pared opened this issue Aug 21, 2020 · 20 comments · Fixed by #5339
Assignees
Labels
feature request Requesting a new feature p1-important Important, aka current backlog of things to do product: VSCode Integration with VSCode extension ui user interface / interaction

Comments

@pared
Copy link
Contributor

pared commented Aug 21, 2020

Running:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
dvc init --quiet


echo "{'metric':0.95}" >> metric.json
git add -A
git commit -am "initial"
git tag v1


echo "{'metric':0.96}" >> metric.json
git add -A
git commit -am "second"
git tag v2

dvc metrics diff v1 v2 -v --targets metric.json

yields no result. metrics diff/show should be able to handle non-metrics targets that can be interpreted by our parsers.

@pared pared added feature request Requesting a new feature p2-medium Medium priority, should be done, but less important labels Aug 21, 2020
@efiop efiop changed the title Make metrics accept any viable target Make metrics/plots accept any viable target Aug 21, 2020
@efiop efiop changed the title Make metrics/plots accept any viable target plots/metrics: accept any viable target Aug 21, 2020
@efiop
Copy link
Contributor

efiop commented Aug 21, 2020

For the record: same for plots

@dmpetrov
Copy link
Member

dvc params diff should follow the same logic. But first, --targets needs to be introduced for params.

@dmpetrov
Copy link
Member

Metrics/plots/params difference for Git-files is a useful scenario even outside of the data-files or pipeline context. It feels like it is a separate functionality in DVC that needs to be fully supported.

For example, it might be used in CI/CD to compare metrics/plots/params with master or previous commits in a repo when DVC is not used to generate reports like this - iterative/cml_dvc_case#4

Additional scenarios to support:

  1. metrics/plots/params diff for not-DVC repos (with --targets specified).
  2. metrics/plots/params diff for DVC repos without any pipelines (without --targets specified). So, a dummy\phony stage might be needed.

@dmpetrov dmpetrov added the product: VSCode Integration with VSCode extension label Sep 1, 2020
@pared pared added p1-important Important, aka current backlog of things to do and removed p2-medium Medium priority, should be done, but less important labels Sep 1, 2020
@dmpetrov dmpetrov mentioned this issue Sep 7, 2020
@efiop efiop added p0-critical Critical issue. Needs to be fixed ASAP. and removed p1-important Important, aka current backlog of things to do labels Sep 8, 2020
@efiop efiop assigned efiop and unassigned efiop Sep 8, 2020
@pared pared self-assigned this Sep 10, 2020
@pared
Copy link
Contributor Author

pared commented Sep 11, 2020

@dmpetrov I see here following scenarios (lets take plots as example command):

  1. DvcRepo, file that is not marked as plot, but we are able to dvc plots show and dvc plots diff - in this case providing targets is required - because we need to know what we are trying to parse.

  2. DvcRepo, some file is marked as plot, but not using dvc run --plots (but, for example dvc plots add {file}). Later we are able to dvc plots show/diff without targets, and dvc is able to find it by itself.

  3. GitRepo, we are able to dvc plots show/diff - in this case there is no way to mark file as plot beforehand, and that's why we need to provide targets.

  4. Not a repo, we are able to dvc plots show - targets required

Is there any prioritization between those scenarios?

@dmpetrov dmpetrov added the ui user interface / interaction label Sep 15, 2020
@dmpetrov
Copy link
Member

dmpetrov commented Sep 15, 2020

@pared All looks good!

Re (4). It is not a high-priority scenario. But it would be amazing to support it if not a big implementation overhead. dvc plots show makes sense for just a file outside of any repo. The diffs - probably not (we can imaging a scenario with two files diff, but cmd signature needs to be changed. it is probably do not worth it).

Re (1)&(2) - it should support data files even not plot\metrics ones.

An important thing about this issue - we need to pay attention to the UI part. All the commands should be unified including params one.

@pared
Copy link
Contributor Author

pared commented Sep 22, 2020

So, starting this issue I made a mistake and wanted to start with moving and unifying collection for plots/metrics/diff into the SCM. It will be probably required at some point, thought its definitely not a good thing to start with as it is a big change.

So my plan is as follows:

@pared
Copy link
Contributor Author

pared commented Dec 14, 2020

Summary on current state of the issue, and what still needs to be done here:

  • metrics accept non-metrics file - doesn't matter if we are in DVC repo, Git repo, or just a folder - targets required
  • plots accept non-plots file - doesn't matter if we are in DVC repo, Git repo, or just a folder - targets required
  • collection of metrics plots and params between branches is unified - we will get the same log message if given file is not found at particular revision, no matter command used

TODO:
4 scenarios of unified error handling:

(They should be fixed, leaving this list to double check)

  • introduce --targets for params command
  • params accept non-params file - not matter DVC, GIT, or no repo case

@pared
Copy link
Contributor Author

pared commented Dec 28, 2020

@dmpetrov
There are 2 things left here to be handled:

  1. In the beginning of this issue you mentioned following use case:
metrics/plots/params diff for DVC repos without any pipelines (without --targets specified). So, a dummy\phony stage might be needed

How would that work? If we don't have pipelines we don't know which files are metrics/plots/params (with one exception, the default params.yaml file) How are we going to recognize which files in repo are metrics/plots/params?

  1. Scenario 2 - when show non existing file on current revision:

result for plots:

WARNING: 'data.json' was not found at: 'workspace'.
ERROR: File: 'data.json' does not exist.

result for metrics:

WARNING: 'data.json' was not found at: ''.                            
ERROR: File: 'data.json' does not exist.

In case of metrics we display empty string, in case of plots it is workspace - It seems to me that we should unify that too. I think in both cases we should display not found at :'workspace' as no string at all is confusing. Or we could even make this "placeholder" string for current repo state even more obvious (eg current_repo).

@dmpetrov
Copy link
Member

dmpetrov commented Jan 9, 2021

If we don't have pipelines we don't know which files are metrics/plots/params (with one exception, the default params.yaml file) How are we going to recognize which files in repo are metrics/plots/params?

@pared good point! By "without any pipelines" I meant without stages (that contain commands) - "dummy\phony stage might be needed".
I see this as a dummy stage with no command and deps but with params and metrics files as outputs.

In case of metrics we display empty string, in case of plots it is workspace - It seems to me that we should unify that too.

Absolutely! Do we really need the warning messages? Why not just ERROR: File 'data.json' does not exist. (without semicolumn in File:)

@pared
Copy link
Contributor Author

pared commented Jan 26, 2021

Absolutely! Do we really need the warning messages? Why not just ERROR: File 'data.json' does not exist. (without semicolumn in File:)

So, the error will occur when we don't collect any other plots/params/metrics. The problem is that this warning is happening before we are done with collecting. So we need to take into account situation when we are, for example dvc plots diff 10 different rev's, and one of them does not contain given target. It doesn't make sense to error out - as we are able to produce the plot for all the other 9, but we still need to tell the user we were not able to find the target on this single rev. Hence the warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requesting a new feature p1-important Important, aka current backlog of things to do product: VSCode Integration with VSCode extension ui user interface / interaction
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants