The code was build with replicability on mind. It uses packrat
to manage packages and dependences and drake
to define on the data workflow.
The easiest way to get the code to run in your computer is:
- Get the code. Two ways to to this
- Download github, login & clone the repository in your computer
- Just download the code from the repo
- Open the project in Rstudio. Wait until Rstudio installs the required packages
- Run the code. There are three ways to do this
- Go to the Terminal tab in Rstudio and type
make
- Go to the Build tab in Rstudio and click "Build All"
- Open
main.R
and source the code
- Go to the Terminal tab in Rstudio and type
The data provided by the NSW Office of Environment and Heritage contains water height readings from instruments deployed at different locations. The accuracy of these readings is influenced by noise—usually caused by the limitations on the instrument accuracy and other factors that determine water height but are not caused by tides, like swell or water being pushed by wind. However, During periods of high or low tide, the variation on water level caused by tides are often smaller than the variations caused by noise. The result is that raw data doesn't show clear peaks but a series of noisy spikes around the high or low tide marks. Therefore, raw data must be processed such that the noise is taken out of the water height values. To do that we use a tidal model.
This code fits a tidal model for each of the files in the data/tides
directory. Files must be in ".csv"" format.
The models are fit using the ftide
function of the TideHarmonics
package. It uses 114 constituents of the tides and hence a very accurate approximation to the tide levels.
Parameters are set at the beginning of the main.R
script:
resolution_predicted_tides
: the time resolution of the the corrected tides in hours. For example 1/6 mean tides will be calculated every (60/6) 10 mins.aggregation_bin
: the length of the bins at which data is being aggregated
The output of the code is four csv files in the folder data/processed
.
predictions.csv
: contains tidal water height in each of the locations. The resolution of the predictions is given by theresolution_predicted_tides
parameter. From this file should be OK to use code that finds the maxima and minima to infer times of high/low tide.metadata.csv
: contains information of each of the sites.high_low.csv
: contains the date-times at which high and low tides. Resolution the same as that of the predictions. Therefore is given by the parameterresolution_predicted_tides
.aggregated_tides
: contains the mean tide height for each bin defined by theaggretation_bin
parameter (for example "1 hour") in each locations.
Generating new tidal models for different sites or different dates is straightforward.
Just update the csv files in data/tides
and re-run the code as explained in the point #3 above.