Query and download HLS datasets with nasa_hls¶
This guide shows how to use the nasa_hls package to query and download datasets from NASA’s Harmonized Landsat and Sentinel-2 (HLS) project (https://hls.gsfc.nasa.gov/).
[1]:
%load_ext autoreload
%autoreload 2
import nasa_hls
Query¶
Available tiles¶
Get a list of available tiles (see https://hls.gsfc.nasa.gov/test-sites/ for a map representation). The list is downloaded from https://hls.gsfc.nasa.gov/wp-content/uploads/2018/10/HLS_Sentinel2_Granule.csv.
[2]:
available_tiles = nasa_hls.get_available_tiles_from_url()
print("Total number of tiles: ", len(available_tiles))
print("First tiles: ", available_tiles[:3])
print("Last tiles: ", available_tiles[-3:])
Total number of tiles: 4090
First tiles: ['01UDT', '01UET', '01UFT']
Last tiles: ['56JMN', '56JMP', '56JMQ']
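The tile IDs follow the Sentinel-2 tiling scheme, which is based on MGRS: a two-digit UTM zone, a latitude band letter and a two-letter 100 km grid square. As a minimal sketch, the hypothetical helpers below (they are not part of nasa_hls) split an ID into these parts and select the tiles of one UTM zone. A small hard-coded sample stands in for the downloaded list.

```python
from typing import List, NamedTuple


class TileParts(NamedTuple):
    utm_zone: int
    lat_band: str
    grid_square: str


def split_tile_id(tile: str) -> TileParts:
    """Split a 5-character tile ID such as '32UNU' into its MGRS parts."""
    if len(tile) != 5 or not tile[:2].isdigit() or not tile[2:].isalpha():
        raise ValueError(f"not a valid tile ID: {tile!r}")
    return TileParts(int(tile[:2]), tile[2], tile[3:])


def tiles_in_zone(tiles: List[str], utm_zone: int) -> List[str]:
    """Return the tiles that lie in the given UTM zone."""
    return [t for t in tiles if split_tile_id(t).utm_zone == utm_zone]


# Hard-coded sample; in practice this would be `available_tiles`.
sample_tiles = ["01UDT", "32UNU", "32UPU", "56JMQ"]
print(split_tile_id("32UNU"))
print(tiles_in_zone(sample_tiles, 32))
```

Passing `available_tiles` instead of `sample_tiles` would narrow the full list to a region of interest before querying datasets.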
Available datasets¶
Get the available datasets matching the user-given
products (currently L30 and S30, i.e. 30 m resolution Landsat and Sentinel-2),
tiles and
years of interest.
The result can be provided as
a list of URLs or
a dataframe with the URLs and the corresponding product, tile and date information.
[3]:
# returns list
urls_datasets = nasa_hls.get_available_datasets(products=["L30", "S30"],
                                                years=[2018],
                                                tiles=["32UNU", "32UPU"])
print("Number of datasets: ", len(urls_datasets))
print("First datasets:\n -", "\n - ".join(urls_datasets[:3]))
print("Last datasets:\n -", "\n - ".join(urls_datasets[-3:]))
100%|██████████| 4/4 [00:04<00:00, 1.19s/it]
Number of datasets: 373
First datasets:
- https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018003.v1.4.hdf
- https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018010.v1.4.hdf
- https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018012.v1.4.hdf
Last datasets:
- https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/32/U/P/U/HLS.S30.T32UPU.2018359.v1.4.hdf
- https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/32/U/P/U/HLS.S30.T32UPU.2018362.v1.4.hdf
- https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/32/U/P/U/HLS.S30.T32UPU.2018364.v1.4.hdf
[4]:
# returns dataframe
df_datasets = nasa_hls.get_available_datasets(products=["L30", "S30"],
                                              years=[2018],
                                              tiles=["32UNU", "32UPU"],
                                              return_list=False)
print("Number of datasets: ", df_datasets.shape[0])
display(df_datasets.head(3))
display(df_datasets.tail(3))
100%|██████████| 4/4 [00:04<00:00, 1.17s/it]
Number of datasets: 373
| | product | tile | date | url |
---|---|---|---|---|
0 | L30 | 32UNU | 2018-01-03 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... |
1 | L30 | 32UNU | 2018-01-10 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... |
2 | L30 | 32UNU | 2018-01-12 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... |
| | product | tile | date | url |
---|---|---|---|---|
370 | S30 | 32UPU | 2018-12-25 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... |
371 | S30 | 32UPU | 2018-12-28 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... |
372 | S30 | 32UPU | 2018-12-30 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... |
You can also create the dataframe from the list of URLs with the following function:
[5]:
nasa_hls.dataframe_from_urls(urls_datasets).head(3)
[5]:
| | product | tile | date | url |
---|---|---|---|---|
0 | L30 | 32UNU | 2018-01-03 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... |
1 | L30 | 32UNU | 2018-01-10 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... |
2 | L30 | 32UNU | 2018-01-12 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... |
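The product, tile and date columns can all be read off the file name at the end of each URL, where the date is encoded as the year followed by the day of the year. The sketch below illustrates that idea with the standard library only. `DatasetInfo` and `info_from_url` are hypothetical names, and this is not necessarily how `dataframe_from_urls` is implemented.

```python
import datetime as dt
from typing import NamedTuple


class DatasetInfo(NamedTuple):
    product: str
    tile: str
    acquired: dt.date
    url: str


def info_from_url(url: str) -> DatasetInfo:
    """Extract product, tile and acquisition date from an HLS v1.4 URL."""
    filename = url.rsplit("/", 1)[-1]
    # File names look like HLS.L30.T32UNU.2018003.v1.4.hdf
    _, product, tile, yeardoy = filename.split(".")[:4]
    # %j is the zero-padded day of the year, so 2018003 is 3 January 2018.
    acquired = dt.datetime.strptime(yeardoy, "%Y%j").date()
    return DatasetInfo(product, tile[1:], acquired, url)  # drop the leading "T"


example = info_from_url(
    "https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018003.v1.4.hdf"
)
print(example.product, example.tile, example.acquired)
```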
Download¶
Download a single dataset¶
If you need the URL of a dataset, you can build it with parse_url as follows. Note that this returns only the URL of the *.hdf file; each of these files has a corresponding *.hdf.hdr file.
[6]:
url = nasa_hls.parse_url(date="2018-04-02",
                         tile="32UNU",
                         product="L30",
                         version="v1.4")
url
[6]:
'https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018092.v1.4.hdf'
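The URL layout can be inferred from the examples in this guide: version, product, year, the tile ID split into four path components, and a file name that repeats product, tile, year plus day of year, and version. The sketch below rebuilds such a URL by hand. `build_hls_url` and `BASE_URL` are hypothetical names used for illustration, and the pattern is an assumption based only on the v1.4 URLs shown here.

```python
import datetime as dt

BASE_URL = "https://hls.gsfc.nasa.gov/data"  # taken from the URLs in this guide


def build_hls_url(date: str, tile: str, product: str, version: str = "v1.4") -> str:
    """Assemble the *.hdf URL for an ISO date, a tile ID and a product."""
    day = dt.date.fromisoformat(date)
    yeardoy = day.strftime("%Y%j")  # year followed by zero-padded day of year
    zone, band, square_1, square_2 = tile[:2], tile[2], tile[3], tile[4]
    filename = f"HLS.{product}.T{tile}.{yeardoy}.{version}.hdf"
    return (
        f"{BASE_URL}/{version}/{product}/{day.year}/"
        f"{zone}/{band}/{square_1}/{square_2}/{filename}"
    )


built = build_hls_url(date="2018-04-02", tile="32UNU", product="L30")
print(built)
```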
You can also download the product directly. Note that both the .hdf and .hdf.hdr files will be downloaded and, as long as overwrite=False,
the files will only be downloaded if they do not already exist at the destination location.
[7]:
nasa_hls.download(dstdir="./xxx_uncontrolled_hls/downloads",
                  date="2018-04-02",
                  tile="32UNU",
                  product="L30",
                  overwrite=False)
[7]:
0
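To make the overwrite behaviour concrete, the following sketch lists which of the two files would still need to be fetched for a given destination directory. It only mirrors the behaviour described above. `pending_downloads` is a hypothetical helper, no network access takes place, and the internals of nasa_hls may differ.

```python
import os
import tempfile
from typing import List, Tuple


def pending_downloads(url: str, dstdir: str, overwrite: bool = False) -> List[Tuple[str, str]]:
    """Return (source URL, destination path) pairs that still need fetching."""
    pending = []
    for source in (url, url + ".hdr"):  # data file and its header file
        destination = os.path.join(dstdir, source.rsplit("/", 1)[-1])
        if overwrite or not os.path.exists(destination):
            pending.append((source, destination))
    return pending


url = "https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018092.v1.4.hdf"
with tempfile.TemporaryDirectory() as dstdir:
    # Pretend the .hdf file is already there, so only the .hdr file is pending.
    open(os.path.join(dstdir, "HLS.L30.T32UNU.2018092.v1.4.hdf"), "w").close()
    todo = pending_downloads(url, dstdir)
    todo_overwrite = pending_downloads(url, dstdir, overwrite=True)

print(len(todo), len(todo_overwrite))
```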
Note that an HTTPError is reported
if the URL does not exist. For example:
nasa_hls.download(dstdir="./xxx_uncontrolled_hls/downloads",
                  date="2017-01-08",
                  tile="32UNU",
                  product="L30")
ERROR DURING DOWNLOAD: hls/HLS.L30.T32UNU.2017008.v1.4.hdf FROM https://hls.gsfc.nasa.gov/data/v1.4/L30/2017/32/U/N/U/HLS.L30.T32UNU.2017008.v1.4.hdf.
Traceback (most recent call last):
...
urllib.error.HTTPError: HTTP Error 404: Not Found
ERROR DURING DOWNLOAD: hls/HLS.L30.T32UNU.2017008.v1.4.hdf.hdr FROM https://hls.gsfc.nasa.gov/data/v1.4/L30/2017/32/U/N/U/HLS.L30.T32UNU.2017008.v1.4.hdf.hdr.
Traceback (most recent call last):
...
urllib.error.HTTPError: HTTP Error 404: Not Found
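If you request dataset URLs yourself rather than through nasa_hls, a missing file can be handled by catching `urllib.error.HTTPError`. The sketch below is a generic standard-library pattern, not part of nasa_hls. `fetch_or_none` and `fake_opener` are hypothetical names, and the fake opener replaces a real request so that the example runs without network access.

```python
import io
import urllib.error
import urllib.request
from email.message import Message
from typing import Callable, Optional


def fetch_or_none(url: str, opener: Callable = urllib.request.urlopen) -> Optional[bytes]:
    """Return the response body, or None if the server answers 404."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are not silently ignored


def fake_opener(url):
    """Stand-in for urlopen that always reports a missing file."""
    raise urllib.error.HTTPError(url, 404, "Not Found", Message(), io.BytesIO())


missing = "https://hls.gsfc.nasa.gov/data/v1.4/L30/2017/32/U/N/U/HLS.L30.T32UNU.2017008.v1.4.hdf"
result = fetch_or_none(missing, opener=fake_opener)
print(result)
```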
Download a batch of datasets¶
Given a dataframe like the one returned by dataframe_from_urls
it is possible to download a batch of datasets. That means we can filter the data and download only what we need.
For example, let us find the April scenes for which a Landsat and a Sentinel-2 acquisition exist on the same day for the same tile. The filter for tile 32UNU is commented out in the code, so both tiles are included.
[8]:
# Split the date into parts so that same-day acquisitions can be detected.
df_datasets["year"] = df_datasets.date.dt.year
df_datasets["month"] = df_datasets.date.dt.month
df_datasets["day"] = df_datasets.date.dt.day
# Mark every row that shares its tile and date with at least one other row.
ls_s2_acquisitions_same_day = df_datasets.duplicated(subset=["tile", "year", "month", "day"], keep=False)
df_download = df_datasets[(ls_s2_acquisitions_same_day) &
                          #(df_datasets["tile"] == "32UNU") &
                          (df_datasets["date"].dt.year == 2018) &
                          (df_datasets["date"].dt.month == 4)]
df_download = df_download.sort_values(["date", "tile", "product"])
df_download
[8]:
| | product | tile | date | url | year | month | day |
---|---|---|---|---|---|---|---|
17 | L30 | 32UNU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 2 |
154 | S30 | 32UNU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 2 |
84 | L30 | 32UPU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 2 |
272 | S30 | 32UPU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 2 |
85 | L30 | 32UPU | 2018-04-09 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 9 |
275 | S30 | 32UPU | 2018-04-09 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 9 |
21 | L30 | 32UNU | 2018-04-25 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 25 |
163 | S30 | 32UNU | 2018-04-25 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 25 |
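The `duplicated` approach flags any two rows that share a tile and a date, regardless of their products. If it matters that both products are actually present, one alternative is to count the distinct products per tile and day. The sketch below shows this on a small made-up dataframe; the rows are illustrative and are not real query results.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "product": ["L30", "S30", "L30", "S30", "S30"],
        "tile": ["32UNU", "32UNU", "32UPU", "32UNU", "32UNU"],
        "date": pd.to_datetime(
            ["2018-04-02", "2018-04-02", "2018-04-09", "2018-04-12", "2018-04-12"]
        ),
    }
)

# Number of distinct products per tile and day, broadcast back to every row.
n_products = df.groupby(["tile", "date"])["product"].transform("nunique")
both_sensors = df[n_products == 2].sort_values(["date", "tile", "product"])

# For comparison: duplicated() also flags the two hypothetical S30 rows on 2018-04-12.
flagged_by_duplicated = df.duplicated(subset=["tile", "date"], keep=False)

print(both_sensors)
```

Here only the two rows from 2018-04-02 remain in `both_sensors`, whereas `flagged_by_duplicated` marks four rows.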
And download the matching datasets:
[9]:
nasa_hls.download_batch(dstdir="./xxx_uncontrolled_hls/downloads",
                        datasets=df_download,
                        version="v1.4",
                        overwrite=False)
100%|██████████| 8/8 [07:48<00:00, 58.61s/it]
[10]:
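# The dataset ID is the file name (12th URL component) without the ".hdf" suffix.
df_download["id"] = df_download["url"].str.split("/", expand=True)[11].str[0:-4]
# Local path of each downloaded *.hdf file.
df_download["path"] = "./xxx_uncontrolled_hls/downloads" + "/" + df_download["id"] + ".hdf"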
df_download
[10]:
| | product | tile | date | url | year | month | day | id | path |
---|---|---|---|---|---|---|---|---|---|
17 | L30 | 32UNU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 2 | HLS.L30.T32UNU.2018092.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UN... |
154 | S30 | 32UNU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 2 | HLS.S30.T32UNU.2018092.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UN... |
84 | L30 | 32UPU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 2 | HLS.L30.T32UPU.2018092.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UP... |
272 | S30 | 32UPU | 2018-04-02 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 2 | HLS.S30.T32UPU.2018092.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UP... |
85 | L30 | 32UPU | 2018-04-09 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 9 | HLS.L30.T32UPU.2018099.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UP... |
275 | S30 | 32UPU | 2018-04-09 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 9 | HLS.S30.T32UPU.2018099.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UP... |
21 | L30 | 32UNU | 2018-04-25 | https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/3... | 2018 | 4 | 25 | HLS.L30.T32UNU.2018115.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.L30.T32UN... |
163 | S30 | 32UNU | 2018-04-25 | https://hls.gsfc.nasa.gov/data/v1.4/S30/2018/3... | 2018 | 4 | 25 | HLS.S30.T32UNU.2018115.v1.4 | ./xxx_uncontrolled_hls/downloads/HLS.S30.T32UN... |
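Taking the component at position 11 of the split URL works only as long as the URL depth stays the same. A slightly more defensive sketch derives the ID from the last path component instead. `dataset_id` and `local_path` are hypothetical helpers built on the standard library, not nasa_hls functions.

```python
import posixpath
from urllib.parse import urlparse


def dataset_id(url: str) -> str:
    """Return the file name of an HLS URL without its '.hdf' extension."""
    filename = posixpath.basename(urlparse(url).path)
    stem, extension = posixpath.splitext(filename)
    if extension != ".hdf":
        raise ValueError(f"expected a .hdf URL, got {url!r}")
    return stem


def local_path(url: str, dstdir: str) -> str:
    """Return the path of the downloaded *.hdf file inside dstdir."""
    return posixpath.join(dstdir, dataset_id(url) + ".hdf")


url = "https://hls.gsfc.nasa.gov/data/v1.4/L30/2018/32/U/N/U/HLS.L30.T32UNU.2018092.v1.4.hdf"
print(dataset_id(url))
print(local_path(url, "./xxx_uncontrolled_hls/downloads"))
```

Applied to the dataframe, this could be written as `df_download["url"].map(dataset_id)`.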
[11]:
df_download.to_csv("./xxx_uncontrolled_hls/downloads/df_downloads.csv", index=False)
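CSV files store dates as text, so the date column has to be parsed again when the table is read back. The sketch below demonstrates the round trip on a small made-up dataframe, with an in-memory buffer standing in for the file on disk.

```python
import io

import pandas as pd

original = pd.DataFrame(
    {
        "product": ["L30", "S30"],
        "tile": ["32UNU", "32UNU"],
        "date": pd.to_datetime(["2018-04-02", "2018-04-02"]),
    }
)

buffer = io.StringIO()  # in-memory stand-in for the CSV file on disk
original.to_csv(buffer, index=False)

buffer.seek(0)
plain = pd.read_csv(buffer)  # "date" comes back as plain strings
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["date"])  # "date" is datetime again

print(plain["date"].dtype, restored["date"].dtype)
```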