Query and download HLS datasets with nasa_hls

This guide shows how the nasa_hls package can be used to query and download datasets from NASA's Harmonized Landsat and Sentinel-2 (HLS) project (https://hls.gsfc.nasa.gov/).

[1]:
%load_ext autoreload
%autoreload 2

import nasa_hls

Query

Available tiles

Get a list of the available tiles (see https://hls.gsfc.nasa.gov/test-sites/ for a map representation). The list is downloaded from https://hls.gsfc.nasa.gov/wp-content/uploads/2018/10/HLS_Sentinel2_Granule.csv.

[2]:
available_tiles = nasa_hls.get_available_tiles_from_url()
print("Total number of tiles: ", len(available_tiles))
print("First tiles: ", available_tiles[:3])
print("Last tiles: ", available_tiles[-3:])
Total number of tiles:  4090
First tiles:  ['01UDT', '01UET', '01UFT']
Last tiles:  ['56JMN', '56JMP', '56JMQ']
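Tile IDs follow the MGRS naming scheme (UTM zone, latitude band, grid square), so you can narrow the list down with plain string operations. A minimal sketch with a hand-written sample list; in practice use the list returned by get_available_tiles_from_url:

```python
# Sample tile IDs, hard-coded for illustration only
available_tiles = ["01UDT", "01UET", "01UFT", "32UNU", "32UPU", "56JMN"]

# Keep only tiles in UTM zone 32 (the first two characters of the MGRS ID)
zone_32 = [tile for tile in available_tiles if tile.startswith("32")]
print(zone_32)  # ['32UNU', '32UPU']
```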

Available datasets

Get the available datasets matching the user-given

  • products (currently L30 and S30, i.e. 30m resolution Landsat and Sentinel-2),

  • tiles and

  • years of interest.

The result can be provided as

  • a list of URLs or

  • a dataframe with the URLs and the corresponding product, tile and date information.

[3]:
# returns list
urls_datasets = nasa_hls.get_available_datasets(products=["L30", "S30"],
                                                years=[2018],
                                                tiles=["32UNU", "32UPU"])
print("Number of datasets: ", len(urls_datasets))
print("First datasets:\n -", "\n - ".join(urls_datasets[:3]))
print("Last datasets:\n -", "\n - ".join(urls_datasets[-3:]))
100%|██████████| 4/4 [00:04<00:00,  1.19s/it]
Number of datasets:  373
First datasets:
 - https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018003.v1.4.hdf
 - https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018010.v1.4.hdf
 - https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018012.v1.4.hdf
Last datasets:
 - https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/32/U/P/U/HLS.S30.T32UPU.2018359.v1.4.hdf
 - https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/32/U/P/U/HLS.S30.T32UPU.2018362.v1.4.hdf
 - https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/32/U/P/U/HLS.S30.T32UPU.2018364.v1.4.hdf

[4]:
# returns dataframe
df_datasets = nasa_hls.get_available_datasets(products=["L30", "S30"],
                                              years=[2018],
                                              tiles=["32UNU", "32UPU"],
                                              return_list=False)
print("Number of datasets: ", df_datasets.shape[0])
display(df_datasets.head(3))
display(df_datasets.tail(3))
100%|██████████| 4/4 [00:04<00:00,  1.17s/it]
Number of datasets:  373

product tile date url
0 L30 32UNU 2018-01-03 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...
1 L30 32UNU 2018-01-10 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...
2 L30 32UNU 2018-01-12 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...
product tile date url
370 S30 32UPU 2018-12-25 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3...
371 S30 32UPU 2018-12-28 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3...
372 S30 32UPU 2018-12-30 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3...

You can also create the dataframe from the list of URLs with the following function:

[5]:
nasa_hls.dataframe_from_urls(urls_datasets).head(3)
[5]:
product tile date url
0 L30 32UNU 2018-01-03 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...
1 L30 32UNU 2018-01-10 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...
2 L30 32UNU 2018-01-12 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3...
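The product, tile and date columns are derived from the HLS filename convention HLS.&lt;product&gt;.T&lt;tile&gt;.&lt;year&gt;&lt;day-of-year&gt;.&lt;version&gt;.hdf. For a single URL this can be reproduced with standard-library string and date operations (a sketch, not the package's internal implementation):

```python
import datetime

url = "https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018003.v1.4.hdf"

# "HLS.L30.T32UNU.2018003.v1.4.hdf" -> ["HLS", "L30", "T32UNU", "2018003", "v1", "4", "hdf"]
parts = url.rsplit("/", 1)[1].split(".")
product = parts[1]                  # "L30"
tile = parts[2][1:]                 # drop the leading "T" -> "32UNU"
date = datetime.datetime.strptime(parts[3], "%Y%j").date()  # year + day of year
print(product, tile, date)  # L30 32UNU 2018-01-03
```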

Download

Download a single dataset

If you need the URL of a single dataset, you can construct it as follows. Note that you only get the URL of the *.hdf file and that for each of these files there is a corresponding *.hdf.hdr file.

[6]:
url = nasa_hls.parse_url(date="2018-04-02",
                         tile="32UNU",
                         product="L30",
                         version="v1.4")
url
[6]:
'https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018092.v1.4.hdf'
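Note that the date in the filename is encoded as year plus day of year, so 2018092 corresponds to 2018-04-02. This can be verified with the standard library, independently of nasa_hls:

```python
import datetime

# 2018-04-02 is day 92 of the year, hence the "2018092" in the filename
date = datetime.date(2018, 4, 2)
doy = date.timetuple().tm_yday
print(f"{date.year}{doy:03d}")  # 2018092
```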

You can also download the product directly. Note that both the .hdf and the .hdf.hdr file will be downloaded. With overwrite=False, files are only downloaded if they do not already exist at the destination location.

[7]:
nasa_hls.download(dstdir="./xxx_uncontrolled_hls/downloads",
                  date="2018-04-02",
                  tile="32UNU",
                  product="L30",
                  overwrite=False)
[7]:
0

Note that you get an HTTPError if the URL does not exist. For example:

nasa_hls.download(dstdir="./xxx_uncontrolled_hls/downloads",
                  date="2017-01-08",
                  tile="32UNU",
                  product="L30")
ERROR DURING DOWNLOAD: hls/HLS.L30.T32UNU.2017008.v1.4.hdf FROM https://hls.gsfc.nasa.gov/data/v1.4/L30/2017/32/U/N/U/HLS.L30.T32UNU.2017008.v1.4.hdf.
Traceback (most recent call last):
  ...
urllib.error.HTTPError: HTTP Error 404: Not Found

ERROR DURING DOWNLOAD: hls/HLS.L30.T32UNU.2017008.v1.4.hdf.hdr FROM https://hls.gsfc.nasa.gov/data/v1.4/L30/2017/32/U/N/U/HLS.L30.T32UNU.2017008.v1.4.hdf.hdr.
Traceback (most recent call last):
  ...
urllib.error.HTTPError: HTTP Error 404: Not Found

Download a batch of datasets

Given a dataframe like the one returned by dataframe_from_urls, it is possible to download a batch of datasets. That means we can filter the data and download exactly what we need.

For example, let us find April 2018 scenes where a Landsat and a Sentinel-2 acquisition of the same tile exist on the same day.

[8]:
df_datasets["year"] = df_datasets.date.dt.year
df_datasets["month"] = df_datasets.date.dt.month
df_datasets["day"] = df_datasets.date.dt.day

# keep=False flags all rows of each (tile, year, month, day) group that occurs more than once
ls_s2_acquisitions_same_day = df_datasets.duplicated(subset=["tile", "year", "month", "day"], keep=False)

df_download = df_datasets[(ls_s2_acquisitions_same_day) &
                          # (df_datasets["tile"] == "32UNU") &
                          (df_datasets["date"].dt.year == 2018) &
                          (df_datasets["date"].dt.month == 4)]
df_download = df_download.sort_values(["date", "tile", "product"])
df_download
[8]:
product tile date url year month day
17 L30 32UNU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 2
154 S30 32UNU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 2
84 L30 32UPU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 2
272 S30 32UPU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 2
85 L30 32UPU 2018-04-09 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 9
275 S30 32UPU 2018-04-09 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 9
21 L30 32UNU 2018-04-25 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 25
163 S30 32UNU 2018-04-25 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 25
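The key step above is DataFrame.duplicated(..., keep=False), which flags every member of a duplicate group rather than only the repeats. A small self-contained sketch with synthetic data (assuming pandas is installed):

```python
import pandas as pd

# Synthetic acquisitions: tile 32UNU was imaged by both sensors on 2018-04-02
df = pd.DataFrame({
    "product": ["L30", "S30", "L30"],
    "tile":    ["32UNU", "32UNU", "32UPU"],
    "date":    pd.to_datetime(["2018-04-02", "2018-04-02", "2018-04-09"]),
})

# keep=False marks *all* rows of each (tile, date) combination that occurs twice
same_day = df.duplicated(subset=["tile", "date"], keep=False)
print(df[same_day]["product"].tolist())  # ['L30', 'S30']
```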

And download the matching datasets:

[9]:
nasa_hls.download_batch(dstdir="./xxx_uncontrolled_hls/downloads",
                        datasets=df_download,
                        version="v1.4",
                        overwrite=False)
100%|██████████| 8/8 [07:48<00:00, 58.61s/it]
[10]:
# the dataset id is the filename (the 12th "/"-separated URL element) without the ".hdf" extension
df_download["id"] = df_download["url"].str.split("/", expand=True)[11].str[:-4]

# local path of each downloaded .hdf file
df_download["path"] = "./xxx_uncontrolled_hls/downloads" + "/" + df_download["id"] + ".hdf"
df_download
[10]:
product tile date url year month day id path
17 L30 32UNU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 2 HLS.L30.T32UNU.2018092.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UN...
154 S30 32UNU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 2 HLS.S30.T32UNU.2018092.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UN...
84 L30 32UPU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 2 HLS.L30.T32UPU.2018092.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UP...
272 S30 32UPU 2018-04-02 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 2 HLS.S30.T32UPU.2018092.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UP...
85 L30 32UPU 2018-04-09 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 9 HLS.L30.T32UPU.2018099.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UP...
275 S30 32UPU 2018-04-09 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 9 HLS.S30.T32UPU.2018099.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UP...
21 L30 32UNU 2018-04-25 https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... 2018 4 25 HLS.L30.T32UNU.2018115.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UN...
163 S30 32UNU 2018-04-25 https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... 2018 4 25 HLS.S30.T32UNU.2018115.v1.4 ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UN...
[11]:
df_download.to_csv("./xxx_uncontrolled_hls/downloads/df_downloads.csv", index=False)
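When you read this CSV back in a later session, remember that read_csv does not parse dates by default; pass parse_dates so the date column becomes a datetime again. A sketch using a tiny in-memory CSV that stands in for df_downloads.csv (assuming pandas is installed):

```python
import io
import pandas as pd

# Minimal stand-in for the saved df_downloads.csv
csv_text = "product,tile,date\nL30,32UNU,2018-04-02\n"

# parse_dates restores the datetime dtype, so .dt accessors work again
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(df["date"].dt.year.iloc[0])  # 2018
```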