Provide demo reference dataset from WorldCereal to setup extractions workflow #40

Closed
kvantricht opened this issue Feb 2, 2024 · 1 comment

Comments

kvantricht commented Feb 2, 2024

We need a representative dataset from which we can start setting up the extractions workflow.

@kvantricht kvantricht self-assigned this Feb 2, 2024
@kvantricht kvantricht added the enhancement New feature or request label Feb 2, 2024
@kvantricht
Collaborator Author

@GriffinBabe I prepared a first demo dataset as requested. Keep in mind that in the future this data will come from requests to the API, but for the time being you can find the file here: /vitodata/worldcereal/tmp/kristof/GFMAP/2021_EUR_DEMO_POLY_110.gpkg

The data comes from EUROCROPS and hence spans multiple countries, which is useful for testing the spatial splitter.
The following attributes are present (they can be renamed in the future):

  • sample_id: can be considered a unique ID across the entire RDM, identifying a field or a point
  • landcover_label: worldcereal landcover label to be rasterized in the ground truth
  • croptype_label: worldcereal croptype label to be rasterized in the ground truth
  • irrigation_label: legacy, ignore for now
  • confidence: indication of the confidence of this ground truth label; needs to be added to the metadata
  • extract: boolean flag indicating if a sample needs to be extracted or not
  • valid_date: date for which the sample is valid; forms the basis for defining the extraction start/end time range (for now we should take maybe 1.5 years centered around valid_date)
  • ref_id: ID of the original dataset the sample belongs to; needs to be exported to the metadata
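
As an illustration of the valid_date attribute, here is a minimal sketch of how an extraction time range could be derived. It assumes that "1.5 years centered around valid_date" means nine months on either side, which is only one possible reading. The helper names `shift_months` and `extraction_window` are hypothetical and do not come from any existing codebase.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the end of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def extraction_window(valid_date: date, half_width_months: int = 9) -> tuple:
    """Return (start, end) centered on valid_date.

    Assumption: a 1.5 year window is taken as 9 months before and 9 months after.
    """
    start = shift_months(valid_date, -half_width_months)
    end = shift_months(valid_date, half_width_months)
    return start, end


if __name__ == "__main__":
    start, end = extraction_window(date(2021, 6, 1))
    print(start.isoformat(), end.isoformat())
```

The half width is a parameter so the window can be adjusted once the final time range is agreed upon.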

So when extract==True, the field must be used for extracting a patch. Remember that when rasterizing the ground truth, we want all fields inside the patch, regardless of whether they are flagged for extraction themselves. We just need to make sure these fields share (more or less) the same valid_date, so that they are valid for the actual extraction. In this case, all data comes from the same ref_id, so we should be safe.
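
The selection logic described here could be sketched roughly as follows. This is a simplified illustration using plain bounding boxes instead of real geometries. The `Sample` class, the patch construction in `patch_around`, the `half_size` value and the `max_days` tolerance are all assumptions made for the example, not part of the dataset or of any existing workflow code.

```python
from dataclasses import dataclass
from datetime import date

# Bounding box as (minx, miny, maxx, maxy); a stand-in for real field geometries.
BBox = tuple


@dataclass(frozen=True)
class Sample:
    sample_id: str
    extract: bool
    valid_date: date
    ref_id: str
    bbox: BBox


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def patch_around(sample: Sample, half_size: float) -> BBox:
    """Hypothetical square patch centered on the sample's bounding box."""
    cx = (sample.bbox[0] + sample.bbox[2]) / 2
    cy = (sample.bbox[1] + sample.bbox[3]) / 2
    return (cx - half_size, cy - half_size, cx + half_size, cy + half_size)


def ground_truth_fields(seed, samples, half_size, max_days=183):
    """All fields inside the seed's patch, whatever their extract flag,
    as long as their valid_date is close enough to the seed's valid_date."""
    patch = patch_around(seed, half_size)
    return [
        s for s in samples
        if intersects(s.bbox, patch)
        and abs((s.valid_date - seed.valid_date).days) <= max_days
    ]


samples = [
    Sample("a", True, date(2021, 6, 1), "demo", (0, 0, 10, 10)),
    Sample("b", False, date(2021, 6, 1), "demo", (12, 0, 20, 10)),
    Sample("c", False, date(2021, 6, 1), "demo", (1000, 1000, 1010, 1010)),
    Sample("d", False, date(2019, 6, 1), "demo", (0, 12, 10, 20)),
]

# Only samples flagged with extract==True seed a patch extraction.
seeds = [s for s in samples if s.extract]
for seed in seeds:
    fields = ground_truth_fields(seed, samples, half_size=32)
    print(seed.sample_id, [f.sample_id for f in fields])
```

In a real workflow the bounding boxes would presumably be replaced by the polygon geometries read from the GeoPackage, and the date tolerance would need to be agreed upon.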
