
Investigate European wildfire probability model for physrisk #319

Open · joemoorhouse opened this issue Jul 8, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@joemoorhouse
Collaborator

Hi @jmcano-arfima, please feel free to adopt and edit this issue!

The paper 'Climate change-related statistical indicators' presents an approach based on calculating fire probability.
https://www.ecb.europa.eu/pub/pdf/scpsps/ecb.sps48~e3fd21dd5a.en.pdf

This issue is to investigate whether it is possible to onboard such a dataset into physrisk, potentially in collaboration with the authors.

@joemoorhouse added the enhancement label on Jul 8, 2024
@jmcano-arfima

Hi @joemoorhouse, indeed we could try collaborating with the authors (which would probably be the fastest solution).

We were also contemplating replicating the methodology ourselves, but we still don't fully understand the implications of becoming a "data provider"...

I take this opportunity to tag @csanmillan, who has just joined us on the quant side and will be working closely on this issue.

@csanmillan

Hi @joemoorhouse,

I wanted to update you on the progress I've made over the past few weeks. I've been reading the ECB article to understand how they create the probability map using machine learning, and I've been acquiring the necessary datasets.

The regression requires four inputs, each of which needs preprocessing:

  1. Copernicus FWI databases (baseline until 2005, extended with RCP projections until 2100; these data have already been converted to .csv format).

  2. Distance to city, railway, and road (maps in .tif format, which I have already converted to .csv; see the sketch after this list).

These first two inputs have been downloaded and are ready for processing.
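
For the .tif → .csv step, something along these lines is a minimal sketch, assuming rioxarray is available (file and column names are purely illustrative):

```python
import rioxarray as rxr

# Hypothetical file and column names, for illustration only.
da = rxr.open_rasterio("distance_to_road.tif", masked=True).squeeze("band", drop=True)
df = da.to_dataframe(name="dist_road").reset_index().dropna()  # columns: y, x, dist_road
df.to_csv("distance_to_road.csv", index=False)
```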

I am currently preparing the remaining inputs:

  3. Burnt Area: MCD64A1.061 (MODIS)
  4. Land Cover Type: MCD12Q1.061 (MODIS)

These last two are the most challenging because they are in the Google Earth Engine database and have a native resolution of 500 meters. However, we are working at a 2500-meter resolution to reproduce the results of the paper, and there will be time to improve this later.

Both datasets are images, with one per year over the past 20 years, making it difficult to obtain all of them for an entire continent over such a period. Therefore, I am developing a Python script to download them as efficiently as possible. Below is a brief example for Spain, at a resolution of 25000 meters (which simplifies refining the code).
[image: example burnt-area download for Spain]
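
For reference, a minimal sketch of the kind of Earth Engine export loop involved (the dataset and boundary asset IDs are real GEE catalogue entries; the region, years, and naming are illustrative, and this is not necessarily the script mentioned above):

```python
import ee

ee.Initialize()  # may require a cloud project argument depending on your setup

# Country geometry from the FAO GAUL boundaries (Spain as the test region).
spain = (ee.FeatureCollection("FAO/GAUL/2015/level0")
         .filter(ee.Filter.eq("ADM0_NAME", "Spain"))
         .geometry())

for year in range(2004, 2024):
    # Annual burnt-area composite: BurnDate > 0 in any month means the cell burned.
    burned = (ee.ImageCollection("MODIS/061/MCD64A1")
              .filterDate(f"{year}-01-01", f"{year}-12-31")
              .select("BurnDate")
              .max()
              .gt(0)
              .unmask(0))
    task = ee.batch.Export.image.toDrive(
        image=burned.clip(spain),
        description=f"burnt_area_spain_{year}",
        region=spain,
        scale=25000,      # coarse test resolution, as in the example above
        maxPixels=1e9,
    )
    task.start()
```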

Hope to have the data input ready by next week once I have optimized the download process.

@csanmillan

csanmillan commented Oct 2, 2024

I am working on a Jupyter notebook (.ipynb) to prepare the data for machine learning, using files retrieved about a month ago. The data input has the following structure:
[image: data input structure]

Variables/datasets currently prepared/downloaded:

  1. FWI_mean and FWI_max_to_mean: data downloaded from Copernicus about a month ago in .csv format (see the sketch after this list).
    [image: FWI data sample]

  2. Land Cover Type and Burnt Area: data extracted country by country from MODIS using Google Earth Engine. All the countries have been combined into a single Europe map with two layers.
    [image: combined Europe land cover / burnt area map]
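
For the FWI features, the yearly aggregation can be sketched roughly like this with xarray (file and variable names are assumptions; the actual Copernicus variable name may differ):

```python
import xarray as xr

ds = xr.open_dataset("fwi_daily.nc")              # daily FWI on a lat/lon grid
fwi = ds["fwi"]                                   # variable name assumed

fwi_mean = fwi.groupby("time.year").mean("time")  # FWI_mean per year and cell
fwi_max = fwi.groupby("time.year").max("time")
features = xr.Dataset({
    "FWI_mean": fwi_mean,
    "FWI_max_to_mean": fwi_max / fwi_mean,        # ratio of annual max to mean
})
features.to_dataframe().reset_index().to_csv("fwi_features.csv", index=False)
```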


Datasets pending download and/or processing:

  1. Country codes: I still need to obtain these, which is straightforward using an external database.

  2. Critical infrastructure: the challenging part will be the roads, railways, and urban centers dataset. Calculating the distance to roads and railways is simple since they are stored explicitly (see the distance-transform sketch after this list). The urban centers will be harder, since they are deduced from two .tif maps: hospitals and educational centers.
    [image: hospitals and educational centers maps]
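
A common way to turn a rasterised infrastructure layer into a distance map is a Euclidean distance transform; a minimal sketch, assuming a binary road mask in a GeoTIFF with square cells in a projected CRS (file name hypothetical):

```python
import rasterio
from scipy.ndimage import distance_transform_edt

# Hypothetical rasterised road mask: 1 = road cell, 0 = background.
with rasterio.open("roads_mask.tif") as src:
    roads = src.read(1)
    cell_size = src.transform.a  # pixel width in map units

# distance_transform_edt gives each non-zero cell its distance to the nearest
# zero cell, so feed it the inverted mask: background cells get the distance
# to the nearest road, and road cells get 0.
dist_to_road = distance_transform_edt(roads == 0) * cell_size
```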

Next I will work on the input parquet file and then start on the XGBoost model with monotonic constraints.
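
For reference, monotonic constraints in XGBoost are just a per-feature sign vector; a minimal sketch, assuming the parquet layout above (file name, feature names, constraint signs, and hyperparameters are illustrative, not taken from the ECB paper):

```python
import pandas as pd
import xgboost as xgb

df = pd.read_parquet("wildfire_inputs.parquet")   # hypothetical file name

features = ["FWI_mean", "FWI_max_to_mean", "dist_road", "dist_rail", "dist_city"]
X, y = df[features], df["burned"]                 # burned: 1 if the cell burned that year

# One constraint per feature, in order: +1 = prediction non-decreasing in that
# feature, -1 = non-increasing, 0 = unconstrained. Signs here are illustrative.
model = xgb.XGBClassifier(
    monotone_constraints="(1,1,-1,-1,-1)",
    n_estimators=500,
    learning_rate=0.05,
    objective="binary:logistic",
)
model.fit(X, y)
p_fire = model.predict_proba(X)[:, 1]             # per-cell fire probability
```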
We're using xarray for multidimensional data handling, Dask for parallel processing, GeoPandas for spatial analysis, and Zarr for efficient storage of large datasets in the notebook.
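As a small illustration of that tooling combination (file names hypothetical):

```python
import rioxarray as rxr
import xarray as xr

# Open a large GeoTIFF lazily with Dask chunks, then persist it as Zarr.
da = rxr.open_rasterio("landcover_europe.tif", chunks={"x": 2048, "y": 2048})
ds = da.squeeze("band", drop=True).to_dataset(name="landcover")
ds.to_zarr("landcover_europe.zarr", mode="w")     # chunked, compressed on disk

# The store reopens lazily, so downstream steps run out-of-core via Dask.
ds2 = xr.open_zarr("landcover_europe.zarr")
```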
