Picture: Looking at the globe. Credit: Nastya Sensei.
by Kimberley Reid
In 2020, I contributed to an international project where I had to apply my object identification algorithm to a few different datasets and submit the results to be compared with other identification algorithms. It was a steep learning curve: creating netCDF files, putting data into a required format, and using CDO (Climate Data Operators) and shell scripting.
Well after I had completed my submission, I received an email from one of the project leaders asking whether I had tracked the objects across the dateline…
No, I hadn’t.
I had applied my algorithm to the dataset as provided, with a longitude domain of -180° to 180°. Of course, once it had been pointed out, it was bleedingly obvious that any object crossing the dateline would not be identified, leading to weird artefacts in the results. What I was supposed to have done was pad the dataset, i.e. append some of the westernmost grid columns onto the easternmost side, and then clip the grid once I had run the algorithm.
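A minimal sketch of the pad-and-clip idea, using plain Python lists so it runs without any libraries. The function names (`pad_east`, `clip_east`, `longest_run`) and the toy grid are illustrative assumptions, not the code used in the project, and `longest_run` is only a stand-in for a real identification algorithm.

```python
def pad_east(field, n_pad):
    """Append the n_pad westernmost columns of each row onto the eastern edge."""
    return [row + row[:n_pad] for row in field]


def clip_east(padded, n_pad):
    """Drop the n_pad padded columns again, restoring the original width."""
    return [row[:len(row) - n_pad] for row in padded]


def longest_run(row):
    """Toy stand-in for an identification algorithm: longest run of 1s in a row."""
    best = current = 0
    for cell in row:
        current = current + 1 if cell == 1 else 0
        best = max(best, current)
    return best


# One latitude row of a -180 to 180 grid; the object (1s) straddles the dateline,
# so it sits partly at the western edge and partly at the eastern edge.
field = [[1, 1, 0, 0, 0, 0, 1, 1]]
padded = pad_east(field, n_pad=2)

print(longest_run(field[0]))                # the object is cut in two at the grid edge
print(longest_run(padded[0]))               # the object is contiguous in the padded grid
print(clip_east(padded, n_pad=2) == field)  # clipping restores the original grid
```

In a real workflow the pad needs to be at least as wide as the largest object expected, and anything detected inside the padded columns has to be mapped back to its true longitude so it is not counted twice.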
I’ve noticed there are quite a few of these titbits that everyone seems to know…except me. They are rarely mentioned in the methods section of papers. These titbits are so seemingly obvious that I think senior scientists forget to mention them to students, but we must learn them somewhere and, for me, it’s usually through making mistakes and having to repeat the analysis.
I’ve put together a list of “Common Unspoken Knowledge” for postgraduate students in the climate sciences. Hopefully, this will save you some time and mistakes in the future, or at least make you feel better about yourself by reading all the silly mistakes I’ve made as a Masters and PhD student.
1. Spatial averages must be weighted by latitude. If the grid you are using is defined in degrees of latitude/longitude, then the grid cells get smaller in area as you move from the equator to the poles. This means that if you are calculating a spatial average (e.g. mean annual rainfall over Australia), you have to weight each grid cell by the cosine of its latitude.
2. European datasets typically define the longitude domain from 0° to 360°, while American datasets typically use -180° to 180°, so always check the metadata before indexing.
3. Tracking across a discontinuity (see above)
4. Station data is usually in local time, which is a huge pet peeve of mine (this is why universal time was invented!!!)
5. If you are working with pressure-level data and your dataset or model is doing weird things, 90% of the time it’s because you are trying to analyse an atmospheric variable on a pressure level that, at that location, is below ground or underwater.
6. If you are comparing datasets (e.g. for model evaluation), everything needs to be at the same resolution unless, of course, you are analysing the effects of resolution.
7. When calculating correlations with temperature or calculating climate indices, you must detrend the temperature to account for global warming; otherwise, you will get erroneously high correlations.
8. Different organisations calculate climate indices slightly differently.
9. In relation to 7, the warming over the Niño region is non-linear, so you can’t just do a simple linear detrend.
10. This is the most important one because a lot of Masters and Honours students don’t actually know this and it’s a huge factor in deciding whether or not to do a PhD: You often get paid to do a PhD.
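To make items 1 and 2 concrete, here is a minimal sketch in plain Python. The function names (`area_weighted_mean`, `to_180`, `to_360`) and the toy numbers are illustrative assumptions, and the sketch assumes a regular latitude/longitude grid with no missing values.

```python
import math


def area_weighted_mean(field, lats_deg):
    """Mean of a lat x lon grid, weighting each cell by the cosine of its latitude.

    field    : list of rows, one row of values per latitude
    lats_deg : latitude of each row in degrees
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))
        for value in row:
            weighted_sum += weight * value
            weight_total += weight
    return weighted_sum / weight_total


def to_180(lon):
    """Convert a longitude from the 0 to 360 convention to -180 to 180."""
    return ((lon + 180.0) % 360.0) - 180.0


def to_360(lon):
    """Convert a longitude from the -180 to 180 convention to 0 to 360."""
    return lon % 360.0


# Toy grid: one row at the equator and one row at 60 degrees north.
field = [[10.0, 10.0], [0.0, 0.0]]
lats = [0.0, 60.0]

unweighted = sum(sum(row) for row in field) / 4
print(unweighted)                       # simple mean treats every cell as equal in area
print(area_weighted_mean(field, lats))  # equatorial cells count roughly twice as much
print(to_180(190.0), to_360(-170.0))    # the same meridian in both conventions
```

Because cos(60°) is 0.5, the cells at 60°N carry half the weight of the equatorial cells, so the weighted mean is pulled towards the equatorial values compared with the simple mean.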