About Me

My photo
Pune/ India -> Irvine/ CA -> Boulder/ CO -> Pasadena/CA
Welcome to my blog! I'm Hrishi from Pune, India. I am an earth system scientist currently working as a postdoctoral researcher at NASA Jet Propulsion Laboratory. These blogs are mostly about my travels, landscape photography, scientific computing, book and film reviews, fitness, cooking, and science communication. Feel free to navigate based on the labels below. My website: hrishikeshac.wix.com/hchandan

Labels

Friday, October 16, 2015

xray: A brilliant little tool for climate data analysis


A couple of days back I was pointed (at stackoverflow.com) to xray package. I've been playing with it for about couple of hours now, and what can I say? Its's AWESOME! Here's what I am loving about xray so far:
  1. It brings pandas functionality to gridded data. (Very few Pandas's Panel functions are currently implemented). So functions like converting to annual/ weekly/daily (resample()), taking climatology (groupby()) etc are now possible on GRIDDED datasets!
  2. It loads netcdf's easily with ALL the metadata as well. This makes further manipulation extremely streamlined and fast.
  3. It works well with basemap. So plotting gridded is super easy and streamlined.
  4. It allows selecting by dimensions. It sound like a big deal, but its actually very effective.
  5. Say If you need to multiply lat dim. of your (time,lat,lon) data, one has to do lat[np.newaxis,:,np.newaxis] as per numpy's broadcasting. Not if using xray.
  6. Writing as netcdf is very easy, and again, preserves all metadata.
  7. Not to mention that it is easy to install (conda install xray).
Lets load the usual modules, and the xray
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import xray
import matplotlib.pyplot as plt
import matplotlib
from mpl_toolkits import basemap
Here are some quick and dirty demonstrations of a few of its abilities that I discovered. I am using GPCP precipitation data. xray loads the entire netcdf file with variables as well as metadata as a dataset object:
In [2]:
nc=xray.open_dataset('../../../InputData/Precipitation/GPCP/precip.mon.mean.nc')
#'nc' will have all the actual variables and metadata as object instances

#Print nc's info. Similar to ncdump
print nc.dump

#Load preciptation data. precip will have all the dims and associated metadata with it
precip=nc.precip 
<bound method Dataset.to_netcdf of <xray.Dataset>
Dimensions:  (lat: 72, lon: 144, time: 437)
Coordinates:
  * lat      (lat) float32 88.75 86.25 83.75 81.25 78.75 76.25 73.75 71.25 ...
  * lon      (lon) float32 1.25 3.75 6.25 8.75 11.25 13.75 16.25 18.75 21.25 ...
  * time     (time) datetime64[ns] 1979-01-01 1979-02-01 1979-03-01 ...
Data variables:
    precip   (time, lat, lon) float64 0.6615 0.5773 0.6175 0.6579 0.5897 ...
Attributes:
    Conventions: COARDS
    title: GPCP Version 2.2 Combined Precipitation Dataset (Final)
    platform: Observation
    source: GPCP Polar Satellite Precipitation Data Centre - Emission (SSM/I emission estimates).
 GPCP Polar Satellite Precipitation Data Centre - Scattering (SSM/I scattering estimates).
GPCP Geostationary Satellite Precipitation Data Centre (GPI and OPI estimates and rain gauge analyses).
NASA/GSFC Satellite Applications Office (TOVS estimates
GPCP Global Precipitation Climatology Centre (rain gauge analyses)
NASA ftp://precip.gsfc.nasa.gov/pub/gpcp-v2.2/
    documentation: http://www.esrl.noaa.gov/psd/data/gridded/data.gpcp.html
    version: V2.2
    references: http://www.esrl.noaa.gov/psd/data/gridded/data.gpcp.html
    comments: Please redownload if you obtained this file before Aug 1 2012
    Acknowledgement: 
,Please cite the original source of the data.
Please email the citation to george.j.huffman@nasa.gov or david.t.bolvin@nasa.gov

    history: Converted from netCDF3 to chunked, deflated NetCDF4 Aug 2014>
Cool. xray lets you perform operations by specifying dimension names. To demonstrate this, take a mean across time dimension, and in the same line, make a quick and dirty plot (and display the units as well- to show how handy it is to retain the metadata)
In [11]:
plt.imshow(precip.mean(dim='time'))
plt.colorbar(orientation='horizontal',pad=0.06).set_label(precip.units)
But wait! xray also plots for you. And seems to work well with basemap! In this particular data, the longitudes don't start from -180ish but from 0. Hence specifying lon_0 is not as straightforward as latitudes. Here I specified the middle element of longitude dimension to be the center of the plot.
Also, by default, xray.plot() uses variable's name as colorbar label. So I assigned the entire plot as a colorbar's label.
In [4]:
m=basemap.Basemap(projection='cyl',lat_0=precip.lat[0],lon_0=precip.lon[len(precip.lon)/2])
m.drawcoastlines()
precip.mean(dim='time').plot(cmap='jet',
                             vmin=0,vmax=10).colorbar.set_label(precip.units)
Cool. Now lets create gridded climatology.
In [5]:
cPrecip=precip.groupby('time.month').mean('time')
print cPrecip.shape

#Create climatology of time series of total global preciptation
cPrecipTs=precip.sum(dim=('lat','lon')).groupby('time.month').mean('time')
print cPrecipTs.shape
(12, 72, 144)
(12,)
Cool. Now lets resample precip data from monthly to annual scale.
In [6]:
precipA=precip.resample(dim='time',freq='A')
print precip.shape
print precipA.shape

plt.figure(figsize=(15,5))
plt.plot(precip.time,precip.sum(dim=('lat','lon')),'r',label='data')
plt.plot(precipA.time,precipA.sum(dim=('lat','lon')),'go',label='resample(A)')
plt.plot(precipA.time,precip.sum(dim=('lat','lon')).groupby('time.year').mean('time')
         ,'k',label='groupby(year).mean(time)')
plt.legend(ncol=3).draw_frame(False)
(437, 72, 144)
(37, 72, 144)
The plot above demonstrates that resample(), at times, is same as groupby().mean() i.e. averaging at a given frequency. Plus it can resample to monthly start ('MS') etc.
Cool. Now lets slice data along all three dimensions! I'm slicing a random region now.
In [7]:
precipA1990_2000=precipA.sel(time=slice('1990','2000'), 
                             lat=slice(88.75,1.25),
                             lon=slice(1.25,91.25))
#the slice boundary numbers have to be in the data. You can't slice by giving lons in degrees.
print precipA.shape 
print precipA1990_2000.shape 
m.drawcoastlines()
precipA1990_2000.mean(dim='time').plot(cmap='jet',
                             vmin=0,vmax=10).colorbar.set_label(precip.units)
(37, 72, 144)
(11, 36, 37)
Now lets weigh preciptation by the cosine of latitudes. AFAIK, this would have needed adding new blank axes as per numpy's broadcasting
In [8]:
precipW=precip*np.cos(np.deg2rad(precip.lat))
#Plot difference
m.drawcoastlines()
(precip-precipW).mean(dim='time').plot(cmap='jet').colorbar.set_label(precip.units)
Finally, lets write a variable to a netcdf file.
In [9]:
precipW.to_dataset(name='precipW').to_netcdf('precipW.nc')
Now lets display it again to ensure all metadata associated with precipW are intact.
In [10]:
!ncdump -h precipW.nc
netcdf precipW {
dimensions:
 time = 437 ;
 lat = 72 ;
 lon = 144 ;
variables:
 double precipW(time, lat, lon) ;
 float lat(lat) ;
  string lat:units = "degrees_north" ;
  lat:actual_range = 88.75f, -88.75f ;
  string lat:long_name = "Latitude" ;
  string lat:standard_name = "latitude" ;
  string lat:axis = "Y" ;
 float lon(lon) ;
  string lon:units = "degrees_east" ;
  string lon:long_name = "Longitude" ;
  lon:actual_range = 1.25f, 358.75f ;
  string lon:standard_name = "longitude" ;
  string lon:axis = "X" ;
 double time(time) ;
  string time:long_name = "Time" ;
  string time:delta_t = "0000-01-00 00:00:00" ;
  string time:avg_period = "0000-01-00 00:00:00" ;
  string time:standard_name = "time" ;
  string time:axis = "T" ;
  time:actual_range = 65378., 78647. ;
  string time:units = "days since 1800-01-01" ;
  time:calendar = "proleptic_gregorian" ;
}