Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convenient FilterPath functions #75

Open
dnephin opened this issue Mar 5, 2018 · 9 comments
Open

Convenient FilterPath functions #75

dnephin opened this issue Mar 5, 2018 · 9 comments

Comments

@dnephin
Copy link

dnephin commented Mar 5, 2018

Hello, I've been using go-cmp for a bit now and I've found that the two most common cases for needing an Option are:

  1. AllowUnexported/IgnoreUnexported - which is already easy to use
  2. A FilterPath for fields that have values set from time.Now() (time.Time, or time.Duration)

It wasn't immediately obvious to me how to create a correct FilterPath filter function. After a quick scan of the godoc my first approach was something like this:

func(path gocmp.Path) bool {		
    return path.Last().String() == ".Elapsed"		
}

This worked, but it seems to me like it could potentially match fields that I don't want it to match if I used it with a large nested structure where multiple types had an Elapsed field.

Digging into the godoc further I noticed that a bunch of the types embedded PathStep, which made me realize that a more correct way of using Path would probably be to iterate over the steps, type assert to the expected type, and compare to Name() or Key().

My next iteration (gotestyourself/gotest.tools#62) looks something like this:

func fieldPathWithTypes(pathspec string) func(gocmp.Path) bool {
	return func(path gocmp.Path) bool {
		spec := strings.Split(pathspec, ".")
		index := 0

		for _, step := range path {
			fieldStep, ok := step.(gocmp.StructField)
			if !ok {
				continue
			}
			if index >= len(spec) || spec[index] != fieldStep.Name() {
				return false
			}
			index++
		}
		return index == len(spec)
	}
}

Which could be made a lot simpler if it's expected that PathStep.String() is stable and wont change between minor versions:

func fieldPathSimple(pathspec string) func(gocmp.Path) bool {
	return func(path gocmp.Path) bool {
		return path.String() == pathspec
	}
}

The same idea could be applied using GoString() if that is considered stable. The idea behind both of these functions is that it's much easier for a reader to understand a dotted path string, than it is to read a long function with a bunch of type assertions.

Questions:

  1. What is the expected/"best practice" way of using PathStep? Is it similar to fieldPathWithTypes ?
  2. Would you be interested in a PR that adds a convenience function for building the PathFilter filter function from a simple string? This could be something like fieldPathWithTypes or it could be more verbose (to match against something like GoString(). I believe the only type that can't be easily translated from a string representation would be Transform.
  3. Would you be interested in a PR that adds one or more examples for using FilterPath (based on feedback from these other questions) ?
@dsnet
Copy link
Collaborator

dsnet commented Mar 5, 2018

A FilterPath for fields that have values set from time.Now() (time.Time, or time.Duration)

If you're just trying to ignore a single struct field, does cmpopts.IgnoreFields suit your needs?

What is the expected/"best practice" way of using PathStep? Is it similar to fieldPathWithTypes ?

Iterating over the steps is the correct approach. Path.String is probably stable (hard to imagine how it would change), but Path.GoString can and has slightly changed over time.

Would you be interested in a PR that adds a convenience function for building the PathFilter filter function from a simple string?

Interestingly, when I first built cmp, I had a helper function that did exactly that. The rationale was that filtering by struct field names alone would be a very common use-case. That's actually why Path.String only returns struct fields names. However, after discussions with others and playing around with it, I decided that the lack of type safety was not worth the ease of use it brought. For the time being, I still prefer to have more type safety (e.g., the cmpopts.IgnoreFields function requires you to pass in a zero-value of the struct so that we can verify that the field specifier really does exist).

Would you be interested in a PR that adds one or more examples for using FilterPath (based on feedback from these other questions) ?

It's been on my radar for a while to write a tutorial for how to use cmp. The examples today are not great. The tutorial would either be in the form of a GitHub wiki article (with a link from godoc) or in the form of a series of godoc Examples that go from incredibly simple to more complex. I haven't figure out what I'm going to do there.

If you think you have a good example to contribute, I'm willing to take a look.

@dsnet dsnet added the question label Mar 5, 2018
@dnephin
Copy link
Author

dnephin commented Mar 5, 2018

If you're just trying to ignore a single struct field, does cmpopts.IgnoreFields suit your needs?

I would like to avoid having to ignore it. I want to use a Comparer, but only for specific fields (ex: https://github.com/gotestyourself/gotestyourself/pull/62/files#diff-3c98f0bec66867cf0fc114dece491b4aR80). The Comparer will verify that both values are non-zero and within some threshold of each other. If the field were just ignored the test would pass even if the field had a zero value, which is bad if you want the field to have a value of time.Now() .

I decided that the lack of type safety was not worth the ease of use it brought

I think it would be possible to make something that is type safe, ex: dnephin/gotest.tools@002bdd8

Every PathStep type has a specific notation, so the type of each path is checked, not just the string matching a name.

Edit: I think a similar approach could be done with a set of functions instead of parsing a string. Each step would be matched using something like:

pathFilter := Path(IsField("Summary"), IsSlice(), IsField("Elapsed"))

which would match a path of []PathStep{StructField{Name: "Summary"}, SliceIndex{}, StructField{Name: "Elapsed"}}

I'll try to put together a clean example of FilterPath.

@dsnet
Copy link
Collaborator

dsnet commented Mar 5, 2018

What I mean by lack of type safety is that:

type Foo struct { Hello string }
var x, y Foo
cmp.Equal(x, y, cmp.FilterPath(Spec("Helllo"), cmp.Ignore()))

does not detect that "Helllo" is misspelled. That is because Spec has no type information regarding the target type specifier is for.

@dnephin
Copy link
Author

dnephin commented Mar 6, 2018

Ah yes, I see what you mean. I think generally these failures to match would be obvious when the test fails, but probably not easy to debug.

Don't you still have that problem with the custom path filters? You still have to compare step.Name() == name, or is there some other way? You could just compare the type, not the field name, but that would match multiple fields.

I created a prototype of the second approach (using functions instead of parsing a string): gotestyourself/gotest.tools@master...dnephin:gocmp-opts-2

With this approach a path would be:

	path := Path(
		Type(node{}),
		Step(Field("Ref"), Type(&node{})),
		Indirect,
		Step(Field("Children"), Type([]node{})),
		Slice,
		Step(Field("Labels"), Type(map[string]node{})),
		MapKey("first"),
		Step(Field("Value"), Type(0)),
	)

It seems to me like this is unnecessarily verbose. If you just checked the type of one step was a reference, slice, or map, then you know the next will need to be Indirect, Slice, or Map.

With less strict matching logic this could be (the naming needs some work):

	partial := PathPartial(
		Step(Field("Ref"), Type(&node{})),
		Step(Field("Children"), Type([]node{})),
		Step(Field("Labels"), Type(map[string]node{})),
		Step(Field("Value"), Type(0)),
	)

With this approach a user can check all the types if they want to, or reduce it to just field matching. Does something like this look more viable?

@dnephin
Copy link
Author

dnephin commented Mar 8, 2018

I looked around cmpopts a bit and found filterField, which would perfect:

// filterField returns a new Option where opt is only evaluated on paths that
// include a specific exported field on a single struct type.
// The struct type is specified by passing in a value of that type.
//
// The name may be a dot-delimited string (e.g., "Foo.Bar") to select a
// specific sub-field that is embedded or nested within the parent struct.
func filterField(typ interface{}, name string, opt cmp.Option) cmp.Option {
// TODO: This is currently unexported over concerns of how helper filters
// can be composed together easily.
// TODO: Add tests for FilterField.
sf := newStructFilter(typ, name)
return cmp.FilterPath(sf.filter, opt)
}

This is much better than having to define the entire path, and should be type safe. What's the status of "concerns of how helper filters can be composed together easily" ?

Another option, which I'm also fond of, would be to expose just the filter part:

func FieldFilter(typ interface{}, name string) func(p cmp.Path) bool {
	sf := newStructFilter(typ, name)
	return sf.filter
}

I'd like to open a PR that exports one of these.

@dsnet
Copy link
Collaborator

dsnet commented Mar 13, 2018

I remember now some of my concerns. I wished I had expanded more on them when I wrote that TODO.

1. Returning only func(cmp.Path) bool and func(interface{}) bool doesn't compose well
Suppose you have a cmpopts helper that returns a filter that is a function of both the values and the path, you can't easily compose those two together. You would need to return both function signatures and require the user to manually call FilterPath and FilterValues on them. It's clunky to use:

pf, vf := cmpopts.HelperFilters(...)
opt := cmp.FilterPath(pf, cmp.FilterValues(vf, ...))

For that reason, I'm not sure a helper that returns a func(cmp.Path) bool is setting a good precedence.

2. Should FilterField(MyStruct{}, "MyField", opts) apply exactly on MyStruct.MyField or the entire sub-tree beneath it?
In the case of IgnoreFields it doesn't matter since there is no sub-tree underneath. However, once you generalize it, you have to answer this question. Sometimes you want the exact node, other-times you want the entire sub-tree.

@dnephin
Copy link
Author

dnephin commented Mar 13, 2018

Thanks for the explanation.

My initial reaction is:

1. Returning only func(cmp.Path) bool and func(interface{}) bool doesn't compose well

To provide a filter of both path and values why not just expose that as an cmp.Option directly (instead of returning separately filters) ?

func HelperFilters(..., opt cmp.Option) cmp.Option { 
    ...
    return cmp.FilterPath(pf, cmp.FilterValues(vf, ...))
}

This helper can be combined with any other Option. This is already the pattern used by IgnoreFields and IgnoreTypes.

I think that is a good argument against the "alternate option" from my previous comment.

2. Should FilterField(MyStruct{}, "MyField", opts) apply exactly on MyStruct.MyField or the entire sub-tree beneath it?

I was thinking about this as well. FilterField is definitely not a complete solution. To match against anything that isn't a field (slice index, map keys, indirect, or transform) FilterField would not work and you'd need something like my Path prototype, or a custom path matching function.

For the cases where FilterField is appropriate the field is either a scalar type or compound type. For scalar types there is no ambiguity, both prefix path matching and exact path matching would be equivalent.

For the case where the field has a slice or map value an exact match is appropriate because trying to match against specified indexes or keys isn't support by FilterField (as mentioned above). For fields which are struct values FilterField would need to be used with the most specific struct. If that isn't possible (because there are multiple fields with the same struct type, but you only want to match one) then once again FilterField is not appropriate.

So for all the cases where FilterField is appropriate and useful it seems like exact match is correct.

@dsnet
Copy link
Collaborator

dsnet commented Mar 21, 2018

I apologize for the delay in replying. Adding Filter helpers in cmpopts should probably come after I figure out what the right approach is in #36 as I strongly suspect that will have an impact on the API of Filter. It overlaps with first-class Filter helpers in that it needs to address the issue of single-node vs. sub-tree as well.

@dnephin
Copy link
Author

dnephin commented Mar 28, 2018

Fair enough. I've read over and commented on #36 to see if I understand the problem correctly.

For now I've implemented PathField: https://github.com/gotestyourself/gotestyourself/pull/86/files#diff-3c98f0bec66867cf0fc114dece491b4aR99

@dsnet dsnet removed the question label Aug 10, 2020
@dsnet dsnet changed the title question: convenient FilterPath functions Convenient FilterPath functions Apr 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants