Loading SHIFT Data with Intake
The SHIFT Python Utilities library has a custom intake driver which provides a simple interface for reading in SHIFT data. The custom driver allows for common preprocessing operations such as orthorectifying and subseting with a bounding box or shapefile.
This notebook demonstrates some of the basic xarray functionality.
Open an interactive notebook:
Sign into the SHIFT SMCE Daskhub and select an instance size in a different tab
Follow this link
Select a kernel
import math
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import pandas as pd
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import hvplot.pandas
import sys
sys.path.append('/efs/SHIFT-Python-Utilities/')
from shift_python_utilities.intake_shift import shift_catalog
import holoviews as hv
hv.extension('bokeh')
import hvplot.xarray
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import warnings
warnings.filterwarnings("ignore")
# Supporting Functions
def gamma_adjust(array):
# Rescale Values using gamma to adjust brightness
gamma = math.log(0.2)/math.log(np.nanmean(array)) # Create exponent for gamma scaling - can be adjusted by changing 0.2
scaled = np.power(array,gamma).clip(0,1) # Apply scaling and clip to 0-1 range
scaled = np.nan_to_num(scaled, nan = 1) #Assign NA's to 1 so they appear white in plots
return scaled
def generate_rgb_plot(ds):
# Retreive red, green and blue wavelengths and convert them to numpy arrays
red = ds.sel(wavelength=650, method="nearest").reflectance
green = ds.sel(wavelength=560, method="nearest").reflectance
blue = ds.sel(wavelength=470, method="nearest").reflectance
# Scale the Bands
r = gamma_adjust(red)
g = gamma_adjust(green)
b = gamma_adjust(blue)
# Stack Bands and make an index
rgb = np.stack([r,g,b])
bds = np.array([0,1,2])
rgb.shape
# Pull x and y values
y = ds['lat'].values
x = ds['lon'].values
y.shape
x.shape
rgb.shape
# Create new rgb xarray data array.
data_vars = {'RGB':(['wavelength', 'lat', 'lon'], rgb)}
coords = {'wavelength':(['wavelength'],bds), 'lat':(['lat'],y), 'lon':(['lon'],x)}
attrs = ds.attrs
ds_rgb = xr.Dataset(data_vars=data_vars, coords=coords, attrs=attrs)
ds_rgb.coords['lon'].attrs = ds['lon'].attrs
ds_rgb.coords['lat'].attrs = ds['lat'].attrs
return ds_rgb
# Open the data catalog
cat = shift_catalog()
# List available datasets
cat.datasets
[‘aviris_v1_gridded’, ‘L2a’, ‘L1’]
# List child datasets
cat.L1.datasets
[‘glt’, ‘igm’, ‘obs’, ‘rdn’]
Reading in Data
Data can be read in using two function calls: read and read_chunked. The read method will pull all the data in the dataset into memory. It is highly recommended to use the read_chunked method unless you are sure your data can fit into memory.
data = cat.aviris_v1_gridded.read_chunked()
data
Working with L1 and L2a Data
The L1 and L2a data products can be accessed via the catalog. To load a specific file, use date and the time of the flight. A list of possible dates and times can be retrieved using the following code. The data is located /efs/efs-data-curated. In addition to date and time there are several other arguments you can use to preprocess the data.
Optional Arguments:
ortho (bool, default: False): If set to True, the associated GLT file will be used to orthorectify the dataset. Warning orthorectification is a very memory intensive process and depending on the size of the file the Daskhub may not have enough memory to complete the task. In this case you can use the SHIFT cluster or use the subsetting argument to orthorecticy a section of the dataset
filter_bands (bool, default: False)(np.ndarray or list): For this argument you can pass a boolean and it will use a set mask to filter out bands. Additionally, if you would like to use a custom mask, you can pass an array or list with the indicies of bands you would like filtered.
Here is a list of the default filtered bands:
[1, 2, 3, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425]
subset (dict, GeoDataFrame, default: None): This argument allows you to subset the dataset by index(x, and y), latitude and longitude, or with a shapefile formatted as a GeoPandas Dataframe. If using a shapefile, the ortho argument must be set to True.
chunks (dict, default: {‘y’: 1}): This argument controls how dask chunks up your array. I recommend using the default of chunking along the y dimension however, depending on what your computing, chunking differently my increase the performance.
# List the available dates
cat.dates
[‘20220224’, ‘20220228’, ‘20220308’, ‘20220316’, ‘20220318’, ‘20220322’, ‘20220405’, ‘20220412’, ‘20220420’, ‘20220429’, ‘20220503’,’20220511’,’20220512’, ‘20220517’, ‘20220529’, ‘20220914’, ‘20220915’]
# Use a date to get the available times
cat.times["20220228"]
[‘183924’, ‘185150’, ‘185720’, ‘190702’, ‘192104’, ‘193333’, ‘194708’, ‘195958’, ‘201833’, ‘202944’, ‘204228’, ‘205624’, ‘210940’, ‘212724’, ‘214527’, ‘215349’]
# Use a date and a time to retrieve the L2a file
cat.L2a(date="20220228", time="183924").read_chunked()
# Use a date and a time to retrieve the radiance file
cat.L1.rdn(date="20220228", time="183924").read_chunked()
Orthorectifying a Dataset
Using the GLT file
In order to orthorectify a dataset using the GLT file, all you need to do is set the ortho argument to True.
ds = cat.L2a(date="20220228", time="183924", ortho=True, filter_bands=False).read_chunked()
ds
Now that the data is orthorectified we can plot an RGB image of the scene.
ds_rgb = generate_rgb_plot(ds)
rgb_image = ds_rgb.hvplot.rgb(x='lon', y='lat', bands='wavelength', aspect = 'equal', frame_width=600).opts(tools=["hover"])
rgb_image
Orthorectifying a Subset of a File
Many of the SHIFT reflectance and radiance files are too large to orthorectify the entire scene. Using the subset argument a portion of the scene can be selected and orthorectified. As described above the subset argument can be used with:
x and y indicies
lat and lon values (using the correct CRS)
A shapefile formated as a GeoPandas dataframe (using the correct CRS)
Orthorectifying using x and y indicies
ds = cat.L2a(date="20220228", time="183924", ortho=True, filter_bands=True, subset={'x':slice(29, 200), 'y':slice(34, 500)}).read_chunked()
ds
ds_rgb = generate_rgb_plot(ds)
rgb_image = ds_rgb.hvplot.rgb(x='lon', y='lat', bands='wavelength', aspect = 'equal', frame_width=600).opts(tools=["hover"])
rgb_image
Orthorectifying using lat and lon
eastings = np.array([228610.68861488, 237298.11871802])
northings = np.array([3812959.0852389 , 3810526.08057343])
ds = cat.L2a(date=20220224, time=200332, ortho=True, filter_bands=True, subset={'lat':northings, "lon": eastings}).read_chunked()
ds
ds_rgb = generate_rgb_plot(ds)
rgb_image = ds_rgb.hvplot.rgb(x='lon', y='lat', bands='wavelength', aspect = 'equal', frame_width=600).opts(tools=["hover"])
rgb_image
Orthorectifying using a shapefile
shp = gpd.read_file("~/SHIFT-Python-Utilities/shift_python_utilities/tests/test_data/shp/test.shp")
ds = cat.L2a(date=20220224, time=200332, ortho=True, filter_bands=True, subset=shp).read_chunked()
ds
ds_rgb = generate_rgb_plot(ds)
rgb_image = ds_rgb.hvplot.rgb(x='lon', y='lat', bands='wavelength', aspect = 'equal', frame_width=600).opts(tools=["hover"])
rgb_image
Using the IGM File
For this section we set the ortho argument to false and use the lat and lon data from the igm file to orthorectify outputs.
# Retrieve a subset of a scene
ds = cat.L2a(date="20220228", time="183924", ortho=False, filter_bands=True, subset={'x':slice(29, 200), 'y':slice(34, 500)}).read_chunked()
ds
Use the Lat and Lon values to plot an orthorectified elevation map.
# Load the lat, lon and elevation values into memory
x = ds.lon.values
y = ds.lat.values
z = ds.elevation.values
# Create a contour plot using matplotlib
fig,ax=plt.subplots(1,1, figsize=(20, 6))
cp = ax.contourf(x, y, z, levels=15)
fig.colorbar(cp) # Add a colorbar to a plot
ax.set_title('Elevation Map')
plt.show()
Use the lat and lon values to plot a classification map of the scene.
# Combine the x and y dimension to format data for the PCA
ds = ds.stack(combined=('x', 'y'))
pca = PCA(n_components=3).fit(ds.reflectance.values)
# Perform clustering using the PCA outputs
kmeans = KMeans(n_clusters=6, init = 'k-means++', random_state=42)
kmeans = kmeans.fit(pca.components_.T)
# Add the clustering labels to the dataset as a new variable
ds = ds.assign({'k_means_labels':(['combined'], kmeans.labels_)})
# Return the dataset to its original shape by unstacking the x and y dimensions
ds = ds.unstack('combined')
ds
# Use the hvplot quadmesh plot to create the classification map
classification_map = ds.k_means_labels.hvplot.quadmesh(x='lon', y='lat')
classification_map