Data wrangling

Data wrangling#

Summary: This notebook combines the datasets used in the analysis into a single netCDF-4 file, preserving the descriptive attributes of each dataset. Each dataset is regridded to the ICESat-2 grid, which has shape [304, 448] (x,y). The datasets used in this notebook are listed below. Data are available on an AWS S3 Bucket as netCDF-4 or S3.

Version history: Version 1 (01/01/2022)

Details on each dataset#

Detailed information about each of the datasets used to compile the final data product is provided below, in the order the datasets appear in the notebook workflow. The information provided here was last updated 08-21-2021.

ATLAS/ICESat-2 Monthly Gridded Sea Ice Freeboard#

Product Type: Northern hemisphere gridded monthly means
Download link:
- NSIDC (recommended method): https://nsidc.org/data/ATL20
- Our Google Storage bucket (provided for compatibility with this Jupyter Book): https://console.cloud.google.com/storage/browser/sea-ice-thickness-data
Reference: Petty, A. A., R. Kwok, M. Bagnardi, A. Ivanoff, N. Kurtz, J. Lee, J. Wimert, and D. Hancock. 2021. ATLAS/ICESat-2 L3B Daily and Monthly Gridded Sea Ice Freeboard, Version 2. Northern hemisphere gridded monthly means. Boulder, Colorado USA. NASA National Snow and Ice Data Center Distributed Active Archive Center. doi: https://doi.org/10.5067/ATLAS/ATL20.002. 08-21-2021.

NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration#

Variables used: NOAA/NSIDC sea ice concentration CDR
Download link: https://nsidc.org/data/g02202
Reference: Meier, W. N., F. Fetterer, A. K. Windnagel, and S. Stewart. 2021. NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration, Version 4. Mean monthly aggregated, northern hemisphere. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center https://doi.org/10.7265/efmz-2t65. 08-21-2021.
NOTE: Sea ice concentration is provided as a data variable in the ICESat-2 monthly gridded product, so it does not need to be downloaded separately.

ERA5 monthly averaged data on single levels#

Variables used: 2m temperature; Mean surface downward long-wave radiation flux
Product type: Monthly averaged reanalysis
Download link: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-monthly-means
Reference: Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2019): ERA5 monthly averaged data on single levels from 1979 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). (Accessed on 16-08-2021), 10.24381/cds.f17050d7

PIOMAS mean monthly ice thickness#

Product Type: Sea ice thickness (Volume per unit Area), monthly mean
Variables used: Sea ice thickness (Volume per unit Area) monthly mean; Grid lon and lat for scalar fields (txt file)
Download link: http://psc.apl.uw.edu/research/projects/arctic-sea-ice-volume-anomaly/data/model_grid
Reference: Zhang, Jinlun and D.A. Rothrock: Modeling global sea ice with a thickness and enthalpy distribution model in generalized curvilinear coordinates, Mon. Wea. Rev. 131(5), 681-697, 2003.
NOTE: You’ll want to download the heff format data product, not the text file. On my computer, clicking on the gzipped file to unzip it raises an error saying the archive is empty (which is not true!). If you see the same error, unzip the file from the command line instead, e.g. gzip -d file.gz
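
If you would rather stay in Python, the same decompression can be sketched with the standard library. The helper name `gunzip_file` is hypothetical and not part of the original notebook; unlike `gzip -d`, this sketch leaves the compressed file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dst=None):
    """Decompress a .gz file and return the path of the decompressed copy.

    Hypothetical helper: `src` is the path to the gzipped file and `dst` is an
    optional output path (defaults to `src` without its trailing ".gz").
    """
    src = Path(src)
    # Default output name: drop the final ".gz" suffix
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # Stream the contents in chunks rather than loading the whole file
        shutil.copyfileobj(f_in, f_out)
    return dst
```

For example, `gunzip_file("file.gz")` would write the decompressed contents to `file` in the same directory.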

Global Low Resolution Sea Ice Drifts from the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) Ocean and Sea Ice Satellite Application Facility (OSI SAF)#

Product Type: Global Low Resolution Sea Ice Drift
Download link: https://osi-saf.eumetsat.int/products/osi-405-c
Reference: Lavergne, T., Eastwood, S., Teffah, Z., Schyberg, H., and Breivik, L.-A.: Sea ice motion from low-resolution satellite sensors: An alternative method and its validation in the Arctic, J. Geophys. Res., 115, C10032, https://doi.org/10.1029/2009JC005958, 2010.

Note

Although you’ll see an option to run this notebook in Binder, this notebook is NOT configured to run in Binder. If you want to wrangle the data yourself, each dataset used to compile the final data product can be downloaded from the links above. The final data product produced by this notebook can be downloaded from the Google Storage bucket associated with this Jupyter Book.

Import notebook dependencies#

import os
import numpy as np
import xarray as xr
import pandas as pd
from datetime import date
import pyproj 
import scipy.interpolate 
from glob import glob
import matplotlib.pyplot as plt
from utils.read_data_utils import read_book_data # Lets us read the ICESat-2 data directly from the Google Storage bucket
import matplotlib as mpl

# Ignore warnings in the notebook to improve display
import warnings
warnings.filterwarnings('ignore')

# Set some plotting parameters
mpl.rcParams.update({
    "text.usetex": False,  # Do not use an external LaTeX install for text rendering
    "font.family": "sans-serif",
    'mathtext.fontset': 'stixsans',
    "lines.linewidth": 1.,
    "font.size": 8,
    #"lines.alpha": 0.8,
    "axes.labelsize": 8,
    "xtick.labelsize": 8,
    "ytick.labelsize": 8,
    "legend.fontsize": 8
})
mpl.rcParams['font.sans-serif'] = ['Arial']

Regrid all datasets to ICESat-2 grid#

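In order to merge all the datasets into a single netCDF-4 file, they need to be on the same grid. Each dataset is projected to the ICESat-2 map projection (EPSG:3411) and then resampled onto the ICESat-2 grid using nearest-neighbor interpolation.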

# Initialize map projection and project data to it
out_proj = 'EPSG:3411'
out_lons = is2_ds.longitude.values
out_lats = is2_ds.latitude.values

mapProj = pyproj.Proj("+init=" + out_proj)
xptsIS2, yptsIS2 = mapProj(out_lons, out_lats)

def regridToICESat2(dataArrayNEW, xptsNEW, yptsNEW, xptsIS2, yptsIS2):  
    """ Regrid new data to ICESat-2 grid 
    
    Args: 
        dataArrayNEW (xarray DataArray): DataArray to be gridded to ICESat-2 grid 
        xptsNEW (numpy array): x-values of dataArrayNEW projected to ICESat-2 map projection 
        yptsNEW (numpy array): y-values of dataArrayNEW projected to ICESat-2 map projection 
        xptsIS2 (numpy array): ICESat-2 longitude projected to ICESat-2 map projection
        yptsIS2 (numpy array): ICESat-2 latitude projected to ICESat-2 map projection
    
    Returns: 
        gridded (numpy array): data regridded to ICESat-2 map projection
    
    """
    gridded = []
    # Flatten the source coordinates once; they are the same for every time step
    srcPoints = (xptsNEW.flatten(), yptsNEW.flatten())
    for i in range(len(dataArrayNEW.values)): 
        monthlyGridded = scipy.interpolate.griddata(srcPoints, 
                                                    dataArrayNEW.values[i].flatten(), 
                                                    (xptsIS2, yptsIS2), 
                                                    method = 'nearest')
        gridded.append(monthlyGridded)
    gridded = np.array(gridded)
    return gridded
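
The nearest-neighbor step can be tried without downloading any data. The sketch below applies the same `scipy.interpolate.griddata` call to a single synthetic 2D field; the grid coordinates and values are made up for illustration and assume the data have already been projected to x/y coordinates.

```python
import numpy as np
import scipy.interpolate

# Hypothetical coarse source grid (3 x 3) in projected x/y coordinates
src_x, src_y = np.meshgrid(np.array([0.0, 10.0, 20.0]), np.array([0.0, 10.0, 20.0]))
src_vals = np.arange(9, dtype=float).reshape(3, 3)

# Hypothetical finer target grid (4 x 4) covering the same area
tgt_x, tgt_y = np.meshgrid(np.linspace(1.0, 19.0, 4), np.linspace(1.0, 19.0, 4))

# Each target cell takes the value of the closest source point
regridded = scipy.interpolate.griddata(
    (src_x.flatten(), src_y.flatten()),
    src_vals.flatten(),
    (tgt_x, tgt_y),
    method="nearest",
)

print(regridded.shape)
```

Because the method is nearest-neighbor, every output value is copied from a source cell and no new values are interpolated. The output takes the shape of the target grid, which is how each dataset ends up on the ICESat-2 grid.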

ERA5 climate reanalysis#

# Choose data variables of interest 
ERA5Vars = ['t2m','msdwlwrf']

#initialize map projection and project data to it
mapProj = pyproj.Proj("+init=" + out_proj)
xptsERA, yptsERA = mapProj(*np.meshgrid(ERA5.longitude.values, ERA5.latitude.values))
xptsIS2, yptsIS2 = mapProj(out_lons, out_lats)

ERA5_list = []
for var in ERA5Vars: 
    ERA5gridded = regridToICESat2(ERA5[var], xptsERA, yptsERA, xptsIS2, yptsIS2) 
    ERAArray = xr.DataArray(data = ERA5gridded, 
                            dims = ['time', 'y', 'x'], 
                            coords = {'latitude': (('y','x'), out_lats), 'longitude': (('y','x'), out_lons), 'time':ERA5.time.values}, 
                            name = var)
    ERAArray.attrs = ERA5[var].attrs # Maintain descriptive attributes
    ERAArray = ERAArray.assign_attrs(ERA5.attrs)
    ERA5_list.append(ERAArray)
ERA5_regridded = xr.merge(ERA5_list)

PIOMAS sea ice thickness#

#project data to ICESat-2 map projection
xptsPIO, yptsPIO = mapProj(pio_da.longitude.values, pio_da.latitude.values)

#regrid data 
pio_regridded = regridToICESat2(pio_da, xptsPIO, yptsPIO, xptsIS2, yptsIS2)
pio_regridded = xr.DataArray(data = pio_regridded, 
                             dims = ['time', 'y', 'x'], 
                             coords = {'latitude': (('y','x'), out_lats), 'longitude': (('y','x'), out_lons), 'time': pio_da.time.values}, 
                             name = pio_da.name)
pio_regridded = pio_regridded.assign_attrs(pio_da.attrs)
pio_regridded = pio_regridded.to_dataset()

OSI-SAF Sea Ice Drifts#

#project data to ICESat-2 map projection
xptsDRIFTS, yptsDRIFTS = mapProj(monthlyDrifts_proj.lon.values, monthlyDrifts_proj.lat.values)

# Loop through variables of interest and regrid 
drifts_list = []
for var in ["x_vel","y_vel"]: 
    driftsGridded = regridToICESat2(monthlyDrifts_proj[var], xptsDRIFTS, yptsDRIFTS, xptsIS2, yptsIS2)

    driftsArray = xr.DataArray(data = driftsGridded, 
                               dims = ['time', 'y', 'x'], 
                               coords = {'latitude': (('y','x'), out_lats), 'longitude': (('y','x'), out_lons), "time": monthlyDrifts_proj.time.values}, 
                               name = var)

    driftsArray.attrs = monthlyDrifts_proj[var].attrs
    driftsArray = driftsArray.assign_attrs(monthlyDrifts_proj.attrs)
    drifts_list.append(driftsArray)

drifts_regridded = xr.merge(drifts_list)

Compile and save final dataset#

Now that all the data are on the same grid, we can use xarray to merge the datasets into a single dataset.

Combine datasets#

final_ds = xr.merge([is2_ds, pio_regridded, ERA5_regridded, drifts_regridded])
final_ds = final_ds.sel(time=slice("Nov 2018",final_ds.time.values[-1])) # Remove Sep & Oct 2018, which have no data from ICESat-2

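
The merge-and-trim pattern can be sketched on small synthetic datasets. The variable names `var_a` and `var_b`, the grid size, and the dates below are placeholders chosen for illustration, not values from the real data product.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-ins for two regridded datasets sharing a (time, y, x) grid
time = pd.date_range("2018-09-01", periods=4, freq="MS")  # Sep 2018 to Dec 2018
shape = (len(time), 2, 3)

ds_a = xr.Dataset({"var_a": (("time", "y", "x"), np.zeros(shape))}, coords={"time": time})
ds_b = xr.Dataset({"var_b": (("time", "y", "x"), np.ones(shape))}, coords={"time": time})

# Merge into one dataset holding both variables
merged = xr.merge([ds_a, ds_b])

# Keep November 2018 onward, dropping the first two months
subset = merged.sel(time=slice("2018-11", None))

print(list(subset.data_vars), subset.sizes["time"])
```

The merge works cleanly here because both datasets share the same dimensions and time coordinate, which is the reason for the regridding step earlier in the notebook.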

Save data to local machine as a netcdf4 file#

We also uploaded this same file to the Google Storage bucket.

filename = './data/IS2_jbook_dataset_201811-202104.nc'
save_file = True

if save_file:
    try: 
        # mode='w' overwrites any existing file with the same name
        final_ds.to_netcdf(path=filename, format='NETCDF4', mode='w')
        print('File "%s" saved to directory "%s"' % (filename, os.getcwd()))
    except OSError as err: 
        # Typical causes: the ./data directory does not exist, or the file is open elsewhere
        print('Cannot save file "%s": %s' % (filename, err))
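
A save can fail when the output directory is missing, and an existing file may be something you do not want to overwrite silently. One way to guard against both is sketched below; the helper name `prepare_output_path` and its `overwrite` flag are hypothetical and not part of the original notebook.

```python
import os


def prepare_output_path(filename, overwrite=False):
    """Create the parent directory if needed and report whether writing is safe.

    Hypothetical helper: returns False if `filename` already exists and
    `overwrite` is False, otherwise True.
    """
    parent = os.path.dirname(filename)
    if parent:
        # Create ./data (or any nested parent directory) if it is missing
        os.makedirs(parent, exist_ok=True)
    if os.path.exists(filename) and not overwrite:
        return False
    return True
```

With this in place, the save step could be guarded by `if prepare_output_path(filename, overwrite=True):` before calling `to_netcdf`.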
