Working with HLS datasets: cloud cover, GeoTIFFs, and the QA band¶

This guide shows how to use some of the utilities implemented in nasa_hls for working with NASA’s Harmonized Landsat and Sentinel-2 Project (https://hls.gsfc.nasa.gov/) data. It covers the follwoing topics:

Derive the cloud cover percentage from the metadata of a HDF file.
Convert HDF datasets to single layer GeoTIFFs.
Convert the binary QA layer in a clear-sky mask.

[1]:

%load_ext autoreload
%autoreload 2
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pandas as pd
import rasterio

import nasa_hls

The sample data¶

We assume the data that has been created wiht the ‘Query and download HLS datasets with nasa_hls’ guide.

[2]:

hdf_dir = Path("./xxx_uncontrolled_hls/downloads")
geotiff_dir = Path("./xxx_uncontrolled_hls/geotiffs")

import pandas as pd
df_downloaded = pd.read_csv("./xxx_uncontrolled_hls/downloads/df_downloads.csv", index_col="id")
display(df_downloaded)

	product	tile	date	url	year	month	day	path
id
HLS.L30.T32UNU.2018092.v1.4	L30	32UNU	2018-04-02	https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...	2018	4	2	./xxx_uncontrolled_hls/downloads/HLS.L30.T32UN...
HLS.S30.T32UNU.2018092.v1.4	S30	32UNU	2018-04-02	https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3...	2018	4	2	./xxx_uncontrolled_hls/downloads/HLS.S30.T32UN...
HLS.L30.T32UPU.2018092.v1.4	L30	32UPU	2018-04-02	https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...	2018	4	2	./xxx_uncontrolled_hls/downloads/HLS.L30.T32UP...
HLS.S30.T32UPU.2018092.v1.4	S30	32UPU	2018-04-02	https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3...	2018	4	2	./xxx_uncontrolled_hls/downloads/HLS.S30.T32UP...
HLS.L30.T32UPU.2018099.v1.4	L30	32UPU	2018-04-09	https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...	2018	4	9	./xxx_uncontrolled_hls/downloads/HLS.L30.T32UP...
HLS.S30.T32UPU.2018099.v1.4	S30	32UPU	2018-04-09	https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3...	2018	4	9	./xxx_uncontrolled_hls/downloads/HLS.S30.T32UP...
HLS.L30.T32UNU.2018115.v1.4	L30	32UNU	2018-04-25	https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...	2018	4	25	./xxx_uncontrolled_hls/downloads/HLS.L30.T32UN...
HLS.S30.T32UNU.2018115.v1.4	S30	32UNU	2018-04-25	https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3...	2018	4	25	./xxx_uncontrolled_hls/downloads/HLS.S30.T32UN...

Derive metadata from HDF file¶

The HDF files come with rich metadata that can be read from the files with gdalinfo. The HLS v1.4 UserGuide lists the metadata fields with a short description in Section 6.6.

The full outcome of gdalinfo for the sample scene looks as follows:

[3]:

%%bash
gdalinfo ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf

Driver: HDF4/Hierarchical Data Format Release 4
Files: ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf
Size is 512, 512
Metadata:
  ACCODE=LaSRCL8V3.5.5; LaSRCL8V3.5.5
  arop_ave_xshift(meters)=-10.5, -17.0999999999767
  arop_ave_yshift(meters)=-10.5, -10.5
  arop_ncp=156, 364
  arop_rmse(meters)=4.32, 3.91
  arop_s2_refimg=/nobackupp6/jju/S2REFIMG/32UNU_2016273A.S2A_OPER_MSI_L1C_TL_SGS__20160929T154211_A006640_T32UNU_B08.jp2
  cloud_coverage=24
  DATA_TYPE=L1TP; L1TP
  HLS_PROCESSING_TIME=2018-05-22T01:14:35Z
  HORIZONTAL_CS_NAME=UTM, WGS84, UTM ZONE 32; UTM, WGS84, UTM ZONE 32
  L1_PROCESSING_TIME=2018-04-16T22:40:45Z; 2018-04-16T22:42:02Z
  LANDSAT_PRODUCT_ID=LC08_L1TP_193026_20180402_20180416_01_T1; LC08_L1TP_193027_20180402_20180416_01_T1
  LANDSAT_SCENE_ID=LC81930262018092LGN00; LC81930272018092LGN00
  MEAN_SUN_AZIMUTH_ANGLE=154.21429006, 152.99684032
  MEAN_SUN_ZENITH_ANGLE=46.61961515, 45.48706472
  NBAR_Solar_Zenith=50.3345670872353
  NCOLS=3660
  NROWS=3660
  SENSING_TIME=2018-04-02T10:03:11.3268150Z; 2018-04-02T10:03:35.2051460Z
  SENSOR=OLI_TIRS; OLI_TIRS
  SENTINEL2_TILEID=32UNU
  spatial_coverage=29
  SPATIAL_RESOLUTION=30
  TIRS_SSM_MODEL=FINAL; FINAL
  TIRS_SSM_POSITION_STATUS=ESTIMATED; ESTIMATED
  ULX=499980
  ULY=5400000
  USGS_SOFTWARE=LPGS_13.0.0
Subdatasets:
  SUBDATASET_1_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band01
  SUBDATASET_1_DESC=[3660x3660] band01 Grid (16-bit integer)
  SUBDATASET_2_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band02
  SUBDATASET_2_DESC=[3660x3660] band02 Grid (16-bit integer)
  SUBDATASET_3_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band03
  SUBDATASET_3_DESC=[3660x3660] band03 Grid (16-bit integer)
  SUBDATASET_4_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band04
  SUBDATASET_4_DESC=[3660x3660] band04 Grid (16-bit integer)
  SUBDATASET_5_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band05
  SUBDATASET_5_DESC=[3660x3660] band05 Grid (16-bit integer)
  SUBDATASET_6_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band06
  SUBDATASET_6_DESC=[3660x3660] band06 Grid (16-bit integer)
  SUBDATASET_7_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band07
  SUBDATASET_7_DESC=[3660x3660] band07 Grid (16-bit integer)
  SUBDATASET_8_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band09
  SUBDATASET_8_DESC=[3660x3660] band09 Grid (16-bit integer)
  SUBDATASET_9_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band10
  SUBDATASET_9_DESC=[3660x3660] band10 Grid (16-bit integer)
  SUBDATASET_10_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:band11
  SUBDATASET_10_DESC=[3660x3660] band11 Grid (16-bit integer)
  SUBDATASET_11_NAME=HDF4_EOS:EOS_GRID:"./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf":Grid:QA
  SUBDATASET_11_DESC=[3660x3660] QA Grid (8-bit unsigned integer)
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  512.0)
Upper Right (  512.0,    0.0)
Lower Right (  512.0,  512.0)
Center      (  256.0,  256.0)

The values for the fields separated from its values with a ‘=’ derived given a HDF file.

[4]:

print(df_downloaded["path"][0])
nasa_hls.get_metadata_from_hdf(df_downloaded["path"][0],
                               fields=["cloud_cover", "spatial_coverage"])

./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf

[4]:

{'cloud_cover': 24.0, 'spatial_coverage': 29.0}

Note that you will still get a dictionary back even if a specified field cannot be found. However you should get a warning for each field that could not be found.

[5]:

nasa_hls.get_metadata_from_hdf(df_downloaded["path"][0],
                               fields=["cloud_cover", "ZZZZZ", "DUBIDUBIDUUU"])

/home/ben/Devel/Repos_Py/nasa_hls/nasa_hls/utils.py:210: UserWarning: Could not find metadata for field 'ZZZZZ'.
  warnings.warn(f"Could not find metadata for field '{field}'.")
/home/ben/Devel/Repos_Py/nasa_hls/nasa_hls/utils.py:210: UserWarning: Could not find metadata for field 'DUBIDUBIDUUU'.
  warnings.warn(f"Could not find metadata for field '{field}'.")

[5]:

{'cloud_cover': 24.0}

Here we derive the cloud cover and spatial coverage for all the downloaded scenes.

[6]:

df_downloaded["cloud_cover"] = np.NaN
df_downloaded["spatial_coverage"] = np.NaN

for i in df_downloaded.index:
    _ = nasa_hls.get_metadata_from_hdf(df_downloaded.loc[i, "path"],
                                       fields=["cloud_cover",
                                               "spatial_coverage"])
    df_downloaded.loc[i, "cloud_cover"] = _["cloud_cover"]
    df_downloaded.loc[i, "spatial_coverage"] = _["spatial_coverage"]
df_downloaded[["cloud_cover", "spatial_coverage"]]

[6]:

	cloud_cover	spatial_coverage
id
HLS.L30.T32UNU.2018092.v1.4	24.0	29.0
HLS.S30.T32UNU.2018092.v1.4	85.0	100.0
HLS.L30.T32UPU.2018092.v1.4	11.0	100.0
HLS.S30.T32UPU.2018092.v1.4	66.0	100.0
HLS.L30.T32UPU.2018099.v1.4	100.0	50.0
HLS.S30.T32UPU.2018099.v1.4	100.0	72.0
HLS.L30.T32UNU.2018115.v1.4	28.0	100.0
HLS.S30.T32UNU.2018115.v1.4	47.0	58.0

Or as a plot:

[7]:

df_downloaded["product_date"] = df_downloaded["product"] + "-" + df_downloaded["date"]
df_downloaded["date_tile"] = df_downloaded["date"] + "-" + df_downloaded["tile"]
#df_downloaded.pivot(index="product_date", columns="tile", values="cloud_cover").plot.bar(figsize=(16, 6))
df_downloaded.pivot(index="date_tile",
                    columns="product",
                    values="cloud_cover").plot.bar(figsize=(16, 6))

[7]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f5f705a8b80>

../_images/tutorials_Working_with_HLS_datasets_and_nasa_hls_13_1.png

Convert HDF to single-layer GeoTIFFs¶

Note: Starting from nasa_hls version 0.1.1 you can also convert the data to Cloud Optimized GeoTIFFs (COG). This also requires GDAL >= 3.1.

To create COGs use teh option gdal_translate_options="-of COG" in convert_hdf2tiffs.

Single scene¶

[8]:

# L30
nasa_hls.convert_hdf2tiffs(Path("./xxx_uncontrolled_hls/downloads/HLS.L30.T32UNU.2018092.v1.4.hdf"),
                           geotiff_dir,
                           bands=None, # use this to convert only a subset of the bands
                           max_cloud_coverage=100,  # use this to only convert scenes that have lower than specified cloud cover
                          )

[8]:

PosixPath('xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UNU.2018092.v1.4')

[9]:

# S30
nasa_hls.convert_hdf2tiffs(Path("./xxx_uncontrolled_hls/downloads/HLS.S30.T32UNU.2018092.v1.4.hdf"),
                           geotiff_dir,
                           bands=None,
                           max_cloud_coverage=100,
                          )

[9]:

PosixPath('xxx_uncontrolled_hls/geotiffs/HLS.S30.T32UNU.2018092.v1.4')

As a result you get all the (desired) bands in a subfolder of the destination directory. Note the differences between the L30 and S30 files.

[10]:

%%bash
tree xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UNU.2018092.v1.4
tree xxx_uncontrolled_hls/geotiffs/HLS.S30.T32UNU.2018092.v1.4

xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UNU.2018092.v1.4
├── HLS.L30.T32UNU.2018092.v1.4__Blue.tif
├── HLS.L30.T32UNU.2018092.v1.4__Cirrus.tif
├── HLS.L30.T32UNU.2018092.v1.4__Coastal_Aerosol.tif
├── HLS.L30.T32UNU.2018092.v1.4__Green.tif
├── HLS.L30.T32UNU.2018092.v1.4__NIR.tif
├── HLS.L30.T32UNU.2018092.v1.4__QA.tif
├── HLS.L30.T32UNU.2018092.v1.4__Red.tif
├── HLS.L30.T32UNU.2018092.v1.4__SWIR1.tif
├── HLS.L30.T32UNU.2018092.v1.4__SWIR2.tif
├── HLS.L30.T32UNU.2018092.v1.4__TIRS1.tif
└── HLS.L30.T32UNU.2018092.v1.4__TIRS2.tif

0 directories, 11 files
xxx_uncontrolled_hls/geotiffs/HLS.S30.T32UNU.2018092.v1.4
├── HLS.S30.T32UNU.2018092.v1.4__Blue.tif
├── HLS.S30.T32UNU.2018092.v1.4__Cirrus.tif
├── HLS.S30.T32UNU.2018092.v1.4__Coastal_Aerosol.tif
├── HLS.S30.T32UNU.2018092.v1.4__Green.tif
├── HLS.S30.T32UNU.2018092.v1.4__NIR_Broad.tif
├── HLS.S30.T32UNU.2018092.v1.4__NIR_Narrow.tif
├── HLS.S30.T32UNU.2018092.v1.4__QA.tif
├── HLS.S30.T32UNU.2018092.v1.4__Red_Edge1.tif
├── HLS.S30.T32UNU.2018092.v1.4__Red_Edge2.tif
├── HLS.S30.T32UNU.2018092.v1.4__Red_Edge3.tif
├── HLS.S30.T32UNU.2018092.v1.4__Red.tif
├── HLS.S30.T32UNU.2018092.v1.4__SWIR1.tif
├── HLS.S30.T32UNU.2018092.v1.4__SWIR2.tif
└── HLS.S30.T32UNU.2018092.v1.4__Water_Vapor.tif

0 directories, 14 files

Batch of scene¶

[11]:

nasa_hls.convert_hdf2tiffs_batch(df_downloaded[df_downloaded["date"]=="2018-04-02"]["path"].values,
                                 geotiff_dir)

100%|██████████| 4/4 [00:29<00:00,  7.28s/it]

[11]:

[PosixPath('xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UNU.2018092.v1.4'),
 PosixPath('xxx_uncontrolled_hls/geotiffs/HLS.S30.T32UNU.2018092.v1.4'),
 PosixPath('xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UPU.2018092.v1.4'),
 PosixPath('xxx_uncontrolled_hls/geotiffs/HLS.S30.T32UPU.2018092.v1.4')]

Create a clear-sky mask from the QA layer¶

The HLS v1.4 UserGuide helps to understanding and use the QA layer.

We can read in Appendix A. How to decode the bit-packed QA:

Quality Assessment (QA) encoded at the bit level provides concise presentation but is less convenient for users new to this format.

This is really true especially if compare the difficulty in handling the QA layer to handling the fmask or sen2cor SCL layer layer. Because of that and since the masking of invalid pixels / selection of clear sky pixels is so important we will go in a bit more detail here.

The rest of this section shows how to create the default mask and custom masks from the bit encoded AQ layer.

Sample QA layer¶

For illustration purposes we use the QA layer of the HLS.L30.T32UNU.2018092.v1.4 scene.

From the metadata we got the following information about the HDF file:

[12]:

df_downloaded.loc["HLS.L30.T32UNU.2018092.v1.4", ["cloud_cover", "spatial_coverage"]]

[12]:

cloud_cover         24
spatial_coverage    29
Name: HLS.L30.T32UNU.2018092.v1.4, dtype: object

Understand the QA layer¶

The metadata also gives us a QA description. We also find that information in the HLS v1.4 UserGuide:

title

Appendix A of the HLS v1.4 UserGuide shows how to decode the QA bits with simple integer arithmetic. Here we show a way how to do this in Python.

Lets start with the example from the Appendix A:

Suppose we get a decimal QA value 100, which translates into binary 01100100, indicating that the aerosol level is low (bits 6-7), it is water (bit 5), and adjacent to cloud (bit 2).

Derive clear-sky mask from the QA layer¶

The guide gives some advice on how to use the QA layer:

Users are advised to mask out Cirrus, Cloud and Adjacent cloud pixels.

We want to find out here, which rules have been used to calculate the cloud cover we find in in the metadata. There is no explicit information about the used rules.

Therefore we first create a dataframe where the information of the table above is expanded for easier accessability. The dataframe can be created with get_qa_look_up_table.

It contains the QA value (i.e. the integer in the raster), the binary string and a boolean column which indicates if for a given QA value a given QA case (i.e. a row in the table above) is met.

[13]:

lut_qa = nasa_hls.get_qa_look_up_table()
display(lut_qa.head())
display(lut_qa.tail())

	qa_value	binary_string	a_clima	a_low	a_avg	a_high	water	snow	cloud_shadow	adj_cloud	cloud	cirrus	no_water	no_snow	no_cloud_shadow	no_adj_cloud	no_cloud	no_cirrus
0	0	00000000	True	False	False	False	False	False	False	False	False	False	True	True	True	True	True	True
1	1	00000001	True	False	False	False	False	False	False	False	False	True	True	True	True	True	True	False
2	2	00000010	True	False	False	False	False	False	False	False	True	False	True	True	True	True	False	True
3	3	00000011	True	False	False	False	False	False	False	False	True	True	True	True	True	True	False	False
4	4	00000100	True	False	False	False	False	False	False	True	False	False	True	True	True	False	True	True

	qa_value	binary_string	a_clima	a_low	a_avg	a_high	water	snow	cloud_shadow	adj_cloud	cloud	cirrus	no_water	no_snow	no_cloud_shadow	no_adj_cloud	no_cloud	no_cirrus
251	251	11111011	False	False	False	True	True	True	True	False	True	True	False	False	False	True	False	False
252	252	11111100	False	False	False	True	True	True	True	True	False	False	False	False	False	False	True	True
253	253	11111101	False	False	False	True	True	True	True	True	False	True	False	False	False	False	True	False
254	254	11111110	False	False	False	True	True	True	True	True	True	False	False	False	False	False	False	True
255	255	11111111	False	False	False	True	True	True	True	True	True	True	False	False	False	False	False	False

Based on that dataframe it is easy to get the QA integer values for creating a specific mask from it.

Find masking rules resulting in metadata cloud cover¶

As an example let us see if we can calculate the cloud cover from the QA layer by finding the right QA attributes to be masked.

Let’s get the QA values of the sample scene.

[14]:

qa_path = './xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UNU.2018092.v1.4/HLS.L30.T32UNU.2018092.v1.4__QA.tif'
with rasterio.open(qa_path) as qa:
    meta = qa.meta
    qa_array = qa.read()

Let’s first visualize the distribution of the values:

[15]:

qa_counts = pd.DataFrame(np.unique(qa_array), columns=["integer"])
qa_counts["binary"] = qa_counts.integer.apply(lambda i: "{0:08b}".format(i))
qa_counts["counts"] = qa_counts.integer.apply(lambda i: (qa_array == i).sum())

n_pix = qa_counts["counts"].sum()
n_obs = qa_counts[qa_counts["integer"] != 255].counts.sum()
print("number of observations (not 255) :", n_obs)
qa_counts.plot.bar(x="binary", y="counts", logy=True, figsize=(18, 6))

number of observations (not 255) : 3930335

[15]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f5f7029ee50>

../_images/tutorials_Working_with_HLS_datasets_and_nasa_hls_30_2.png

Let’s also calculate the percentage of pixels with respect to all pixels and with respect to the observed pixels (i.e. not 255):

[16]:

qa_counts["percent"] = ((qa_counts["counts"] / n_pix) * 100).round(2)
qa_counts["percent_obs"] = ((qa_counts["counts"] / n_obs) * 100).round(2)
qa_counts.loc[qa_counts["integer"] == 255, "percent_obs"] = -0.01
qa_counts.tail()

qa_counts_top_22 = qa_counts.sort_values("percent_obs", ascending=False).head(22)
qa_counts_top_22

[16]:

	integer	binary	counts	percent	percent_obs
19	128	10000000	1389745	10.37	35.36
42	192	11000000	991982	7.41	25.24
0	64	01000000	535713	4.00	13.63
43	193	11000001	228818	1.71	5.82
45	195	11000011	185779	1.39	4.73
44	194	11000010	170090	1.27	4.33
27	136	10001000	79370	0.59	2.02
50	200	11001000	57441	0.43	1.46
51	201	11001001	53104	0.40	1.35
23	132	10000100	44661	0.33	1.14
53	203	11001011	38433	0.29	0.98
49	199	11000111	28472	0.21	0.72
52	202	11001010	28155	0.21	0.72
46	196	11000100	26763	0.20	0.68
8	72	01001000	18869	0.14	0.48
4	68	01000100	10947	0.08	0.28
28	137	10001001	7772	0.06	0.20
20	129	10000001	7303	0.05	0.19
47	197	11000101	5277	0.04	0.13
48	198	11000110	5087	0.04	0.13
25	134	10000110	4114	0.03	0.10
57	207	11001111	1991	0.01	0.05

[17]:

df_downloaded.loc["HLS.L30.T32UNU.2018092.v1.4", ["cloud_cover", "spatial_coverage"]]

[17]:

cloud_cover         24
spatial_coverage    29
Name: HLS.L30.T32UNU.2018092.v1.4, dtype: object

Spatial coverage

[18]:

print("spatial coverage from QA: ", 100 - qa_counts.loc[qa_counts["integer"] == 255, "percent"].values[0])

spatial coverage from QA:  29.340000000000003

Cloud cover

I could not find in the docs how the AQ layer was used to calculate the cloud cover.

With the percentages in qa_counts and the lut_qa dataframe it is now easy to see how different masking decisions influence the percentage of masked pixels for a given QA layer.

The following shows my guesses to find a match for a cloud cover of 24 %.

[19]:

valid = lut_qa.loc[
        ~((lut_qa.cirrus) | (lut_qa.cloud) | (lut_qa.adj_cloud)), "qa_value"
    ].values
percent_valid = qa_counts.loc[qa_counts.integer.isin(valid), "percent_obs"].sum()
100 - percent_valid

[19]:

21.810000000000002

[20]:

valid = lut_qa.loc[
        ~((lut_qa.cloud) | (lut_qa.adj_cloud) | (lut_qa.cloud_shadow)), "qa_value"
    ].values # => 19.8
percent_valid = qa_counts.loc[qa_counts.integer.isin(valid), "percent_obs"].sum()
100 - percent_valid

[20]:

19.75999999999999

[21]:

valid = lut_qa.loc[
        ~((lut_qa.cirrus) | (lut_qa.cloud) | (lut_qa.adj_cloud) | (lut_qa.cloud_shadow)), "qa_value"
    ].values # => 25.8
percent_valid = qa_counts.loc[qa_counts.integer.isin(valid), "percent_obs"].sum()
100 - percent_valid

[21]:

25.769999999999996

[22]:

valid = lut_qa.loc[
        ~((lut_qa.cirrus) | (lut_qa.cloud) | (lut_qa.cloud_shadow)), "qa_value"
    ].values # => 23.7
percent_valid = qa_counts.loc[qa_counts.integer.isin(valid), "percent_obs"].sum()
100 - percent_valid

[22]:

23.67

The winner is: cirrus, cloud and cloud shadow, which somehow makes sense.

That means that we consider the following QA values as clear sky or valid pixels:

[23]:

valid

[23]:

array([  0,   4,  16,  20,  32,  36,  48,  52,  64,  68,  80,  84,  96,
       100, 112, 116, 128, 132, 144, 148, 160, 164, 176, 180, 192, 196,
       208, 212, 224, 228, 240, 244])

Convert the QA layer into a mask¶

Given we know the QA values that we want to consider as valid or invalid pixels we can derive a mask array or GeoTIFF from a GeoTIFF QA layer.

Here we use the function hls_qa_layer_to_mask to quickly test if the masking with the valid values give us the same cloud cover as the mateadata.

Note that you have the following additional options:

keep_255 (bool): If True this keeps the value 255 for the non sensed pixels.
mask_path (str): If a path is given the function will create a raster of the mask there and not output an array.
overwrite (bool): Defines if a file existing at mask_path should be overwritten.

[24]:

for i, row in df_downloaded.iterrows():
    sceneid = Path(row.path).stem
    qa_path = f'xxx_uncontrolled_hls/geotiffs/{sceneid}/{sceneid}__QA.tif'
    if not Path(qa_path).exists():
        continue

    clear_arr = nasa_hls.hls_qa_layer_to_mask(qa_path,
                                              qa_valid=valid,
                                              keep_255=True,
                                              mask_path=None,
                                              overwrite=False)
    n_pix = clear_arr.size
    n_obs = (clear_arr != 255).sum()
    n_cloudy = (clear_arr == 0).sum()
    n_clear = (clear_arr == 1).sum()
    cloud_cover = n_cloudy / (n_clear + n_cloudy)

    print("*" * 90)
    print(qa_path)
    print(f"Number of pixels             : {n_pix}")
    print(f"Number of obs.               : {n_obs}")
    print(f"Number of clear obs.         : {n_clear}")
    print(f"Number of cloudy obs.        : {n_cloudy}")

    print(f"Cloud cover - metadata       : {row.cloud_cover}")
    print(f"Cloud cover - calculated     : {round(cloud_cover * 100, 1) }")

    print(f"Spatial coverage - metadata  : {row.spatial_coverage}")
    print(f"Spatial coverage - calculated: {round((n_obs / n_pix) * 100, 1) }")

******************************************************************************************
xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UNU.2018092.v1.4/HLS.L30.T32UNU.2018092.v1.4__QA.tif
Number of pixels             : 13395600
Number of obs.               : 3930335
Number of clear obs.         : 2999963
Number of cloudy obs.        : 930372
Cloud cover - metadata       : 24.0
Cloud cover - calculated     : 23.7
Spatial coverage - metadata  : 29.0
Spatial coverage - calculated: 29.3
******************************************************************************************
xxx_uncontrolled_hls/geotiffs/HLS.S30.T32UNU.2018092.v1.4/HLS.S30.T32UNU.2018092.v1.4__QA.tif
Number of pixels             : 13395600
Number of obs.               : 13272703
Number of clear obs.         : 1976262
Number of cloudy obs.        : 11296441
Cloud cover - metadata       : 85.0
Cloud cover - calculated     : 85.1
Spatial coverage - metadata  : 100.0
Spatial coverage - calculated: 99.1
******************************************************************************************
xxx_uncontrolled_hls/geotiffs/HLS.L30.T32UPU.2018092.v1.4/HLS.L30.T32UPU.2018092.v1.4__QA.tif
Number of pixels             : 13395600
Number of obs.               : 13395365
Number of clear obs.         : 11980540
Number of cloudy obs.        : 1414825
Cloud cover - metadata       : 11.0
Cloud cover - calculated     : 10.6
Spatial coverage - metadata  : 100.0
Spatial coverage - calculated: 100.0
******************************************************************************************
xxx_uncontrolled_hls/geotiffs/HLS.S30.T32UPU.2018092.v1.4/HLS.S30.T32UPU.2018092.v1.4__QA.tif
Number of pixels             : 13395600
Number of obs.               : 12915488
Number of clear obs.         : 4495212
Number of cloudy obs.        : 8420276
Cloud cover - metadata       : 66.0
Cloud cover - calculated     : 65.2
Spatial coverage - metadata  : 100.0
Spatial coverage - calculated: 96.4

!!! NOTE AND WARNING !!!

There are some small irregularities between the metadata and the calculated values that should be investigated further.

Feel free to do so… (;

Visualize the mask¶

[25]:

sceneid = "HLS.L30.T32UNU.2018092.v1.4"
src_nir = rasterio.open(f'xxx_uncontrolled_hls/geotiffs/{sceneid}/{sceneid}__NIR.tif')
src_qa = rasterio.open(f'xxx_uncontrolled_hls/geotiffs/{sceneid}/{sceneid}__QA.tif')
qa_path = f'xxx_uncontrolled_hls/geotiffs/{sceneid}/{sceneid}__QA.tif'

clear_arr = nasa_hls.hls_qa_layer_to_mask(qa_path,
                                          qa_valid=valid,
                                          keep_255=True,
                                          mask_path=None,
                                          overwrite=False)

fig, axes = plt.subplots(1, 3, figsize=(16, 8))

axes[0].imshow(src_nir.read(1), vmin=0, vmax=6000, cmap='pink')
axes[0].set_title(sceneid + " - " + "NIR")
axes[1].imshow(src_qa.read(1), vmin=0, vmax=255, cmap='pink')
axes[1].set_title(sceneid + " - " + "QA")
axes[2].imshow(clear_arr, vmin=0, vmax=3, cmap='pink')
axes[2].set_title(sceneid + " - " + "clear")
fig.tight_layout()

../_images/tutorials_Working_with_HLS_datasets_and_nasa_hls_46_0.png