A couple of days back I was pointed (at stackoverflow.com) to xray package. I've been playing with it for about couple of hours now, and what can I say? Its's AWESOME! Here's what I am loving about xray so far:
- It brings pandas functionality to gridded data. (Very few Pandas's Panel functions are currently implemented). So functions like converting to annual/ weekly/daily (resample()), taking climatology (groupby()) etc are now possible on GRIDDED datasets!
- It loads netcdf's easily with ALL the metadata as well. This makes further manipulation extremely streamlined and fast.
- It works well with basemap. So plotting gridded is super easy and streamlined.
- It allows selecting by dimensions. It sound like a big deal, but its actually very effective.
- Say If you need to multiply lat dim. of your (time,lat,lon) data, one has to do lat[np.newaxis,:,np.newaxis] as per numpy's broadcasting. Not if using xray.
- Writing as netcdf is very easy, and again, preserves all metadata.
- Not to mention that it is easy to install (conda install xray).
Lets load the usual modules, and the xray
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import xray
import matplotlib.pyplot as plt
import matplotlib
from mpl_toolkits import basemap
Here are some quick and dirty demonstrations of a few of its abilities that I discovered. I am using GPCP precipitation data. xray loads the entire netcdf file with variables as well as metadata as a dataset object:
In [2]:
nc=xray.open_dataset('../../../InputData/Precipitation/GPCP/precip.mon.mean.nc')
#'nc' will have all the actual variables and metadata as object instances
#Print nc's info. Similar to ncdump
print nc.dump
#Load preciptation data. precip will have all the dims and associated metadata with it
precip=nc.precip 
Cool. xray lets you perform operations by specifying dimension names. To demonstrate this, take a mean across time dimension, and in the same line, make a quick and dirty plot (and display the units as well- to show how handy it is to retain the metadata)
In [11]:
plt.imshow(precip.mean(dim='time'))
plt.colorbar(orientation='horizontal',pad=0.06).set_label(precip.units)
But wait! xray also plots for you. And seems to work well with basemap!
In this particular data, the longitudes don't start from -180ish but from 0. Hence specifying lon_0 is not as straightforward as latitudes.
Here I specified the middle element of longitude dimension to be the center of the plot. 
Also, by default, xray.plot() uses variable's name as colorbar label. So I assigned the entire plot as a colorbar's label.
Also, by default, xray.plot() uses variable's name as colorbar label. So I assigned the entire plot as a colorbar's label.
In [4]:
m=basemap.Basemap(projection='cyl',lat_0=precip.lat[0],lon_0=precip.lon[len(precip.lon)/2])
m.drawcoastlines()
precip.mean(dim='time').plot(cmap='jet',
                             vmin=0,vmax=10).colorbar.set_label(precip.units)
Cool. Now lets create gridded climatology.
In [5]:
cPrecip=precip.groupby('time.month').mean('time')
print cPrecip.shape
#Create climatology of time series of total global preciptation
cPrecipTs=precip.sum(dim=('lat','lon')).groupby('time.month').mean('time')
print cPrecipTs.shape
Cool. Now lets resample precip data from monthly to annual scale.
In [6]:
precipA=precip.resample(dim='time',freq='A')
print precip.shape
print precipA.shape
plt.figure(figsize=(15,5))
plt.plot(precip.time,precip.sum(dim=('lat','lon')),'r',label='data')
plt.plot(precipA.time,precipA.sum(dim=('lat','lon')),'go',label='resample(A)')
plt.plot(precipA.time,precip.sum(dim=('lat','lon')).groupby('time.year').mean('time')
         ,'k',label='groupby(year).mean(time)')
plt.legend(ncol=3).draw_frame(False)
The plot above demonstrates that resample(), at times, is same as groupby().mean() i.e. averaging at a given frequency. Plus it can resample to monthly start ('MS') etc.
Cool. Now lets slice data along all three dimensions! I'm slicing a random region now.
In [7]:
precipA1990_2000=precipA.sel(time=slice('1990','2000'), 
                             lat=slice(88.75,1.25),
                             lon=slice(1.25,91.25))
#the slice boundary numbers have to be in the data. You can't slice by giving lons in degrees.
print precipA.shape 
print precipA1990_2000.shape 
m.drawcoastlines()
precipA1990_2000.mean(dim='time').plot(cmap='jet',
                             vmin=0,vmax=10).colorbar.set_label(precip.units)
Now lets weigh preciptation by the cosine of latitudes. 
AFAIK, this would have needed adding new blank axes as per numpy's broadcasting
In [8]:
precipW=precip*np.cos(np.deg2rad(precip.lat))
#Plot difference
m.drawcoastlines()
(precip-precipW).mean(dim='time').plot(cmap='jet').colorbar.set_label(precip.units)
Finally, lets write a variable to a netcdf file.
In [9]:
precipW.to_dataset(name='precipW').to_netcdf('precipW.nc')
Now lets display it again to ensure all metadata associated with precipW are intact.
In [10]:
!ncdump -h precipW.nc
 
