Raw BigEarthNet Data#
Now that we know where the patches come from and how they were annotated, this section presents and discusses the files inside the archives.
BigEarthNet-S2#
The general content of the BigEarthNet-S2 archive looks as follows:
BigEarthNet-S2-Example
├── S2A_MSIL2A_20170613T101031_87_48
│   ├── S2A_MSIL2A_20170613T101031_87_48_B01.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B02.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B03.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B04.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B05.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B06.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B07.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B08.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B8A.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B09.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B11.tif
│   ├── S2A_MSIL2A_20170613T101031_87_48_B12.tif
│   └── S2A_MSIL2A_20170613T101031_87_48_labels_metadata.json
├── S2A_MSIL2A_20170617T113321_4_55
│   ├── S2A_MSIL2A_20170617T113321_4_55_B01.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B02.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B03.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B04.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B05.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B06.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B07.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B08.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B8A.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B09.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B11.tif
│   ├── S2A_MSIL2A_20170617T113321_4_55_B12.tif
│   └── S2A_MSIL2A_20170617T113321_4_55_labels_metadata.json
├── S2A_MSIL2A_20170617T113321_36_85
│   ├── S2A_MSIL2A_20170617T113321_36_85_B01.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B02.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B03.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B04.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B05.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B06.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B07.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B08.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B8A.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B09.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B11.tif
│   ├── S2A_MSIL2A_20170617T113321_36_85_B12.tif
│   └── S2A_MSIL2A_20170617T113321_36_85_labels_metadata.json
├── S2A_MSIL2A_20171221T112501_56_35
│   ├── S2A_MSIL2A_20171221T112501_56_35_B01.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B02.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B03.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B04.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B05.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B06.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B07.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B08.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B8A.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B09.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B11.tif
│   ├── S2A_MSIL2A_20171221T112501_56_35_B12.tif
│   └── S2A_MSIL2A_20171221T112501_56_35_labels_metadata.json
├── S2B_MSIL2A_20170924T93020_69_24
│   ├── S2B_MSIL2A_20170924T93020_69_24_B01.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B02.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B03.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B04.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B05.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B06.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B07.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B08.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B8A.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B09.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B11.tif
│   ├── S2B_MSIL2A_20170924T93020_69_24_B12.tif
│   └── S2B_MSIL2A_20170924T93020_69_24_labels_metadata.json
└── S2B_MSIL2A_20180204T94161_57_38
    ├── S2B_MSIL2A_20180204T94161_57_38_B01.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B02.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B03.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B04.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B05.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B06.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B07.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B08.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B8A.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B09.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B11.tif
    ├── S2B_MSIL2A_20180204T94161_57_38_B12.tif
    └── S2B_MSIL2A_20180204T94161_57_38_labels_metadata.json
With the following conventions:
- Each folder corresponds to a single patch.
- The `patch_name` is encoded as the name of the folder.
- Each patch folder contains a GeoTIFF file for each of the 12 bands.
- The name of the GeoTIFF file is encoded as `<patch_name>_<band>.tif`.
- The JSON file, named `<patch_name>_labels_metadata.json`, contains the metadata.
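Because of this naming convention, the patch name and the band can be recovered from a file name alone. The following is a minimal sketch that assumes only the `<patch_name>_<band>.tif` pattern described above; the helper name `split_band_filename` is hypothetical and not part of any BigEarthNet tooling.

```python
from pathlib import Path


def split_band_filename(tif_path):
    """Split '<patch_name>_<band>.tif' into (patch_name, band)."""
    stem = Path(tif_path).stem  # drop the '.tif' suffix
    # Patch names contain underscores themselves, so split only at the last one.
    patch_name, band = stem.rsplit("_", 1)
    return patch_name, band


patch_name, band = split_band_filename("S2A_MSIL2A_20170613T101031_87_48_B8A.tif")
print(patch_name, band)
```

Splitting at the last underscore keeps the sketch independent of the number of underscores inside the patch name.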
The prettified contents of a metadata file look as follows:
{
    "labels": [
        "Non-irrigated arable land",
        "Land principally occupied by agriculture, with significant areas of natural vegetation"
    ],
    "coordinates": {
        "ulx": 404400,
        "uly": 5342400,
        "lrx": 405600,
        "lry": 5341200
    },
    "projection": "PROJCS[\"WGS 84 / UTM zone 33N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"W...",
    "tile_source": "S2A_MSIL1C_20170613T101031_N0205_R022_T33UUP_20170613T101608.SAFE",
    "acquisition_date": "2017-06-13 10:10:31"
}
- `labels`: Lists the older CLC Level-3 nomenclature labels of the patch
- `tile_source`: Shows the source tile that was further processed with sen2cor to generate the atmospherically corrected L2A product tile
- `acquisition_date`: Encodes the acquisition date of the tile in the `YYYY-MM-DD hh:mm:ss` format
- `coordinates`: Encodes the upper left x/y (`ulx`/`uly`) and lower right x/y (`lrx`/`lry`) coordinates of the patch
- `projection`: Relates the values of the `coordinates` to the given coordinate reference system (CRS)
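A short sketch of reading these fields with the Python standard library is given here. It embeds a shortened copy of the example metadata (the `projection` entry is left out for brevity) instead of opening a file, and the variable names are illustrative only.

```python
import json
from datetime import datetime

# Shortened copy of the example metadata above (projection omitted for brevity).
raw = """
{
  "labels": ["Non-irrigated arable land",
             "Land principally occupied by agriculture, with significant areas of natural vegetation"],
  "coordinates": {"ulx": 404400, "uly": 5342400, "lrx": 405600, "lry": 5341200},
  "tile_source": "S2A_MSIL1C_20170613T101031_N0205_R022_T33UUP_20170613T101608.SAFE",
  "acquisition_date": "2017-06-13 10:10:31"
}
"""

metadata = json.loads(raw)
# BigEarthNet-S2 metadata stores the date as 'YYYY-MM-DD hh:mm:ss'.
acquired = datetime.strptime(metadata["acquisition_date"], "%Y-%m-%d %H:%M:%S")
print(len(metadata["labels"]), acquired.isoformat())
```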
The unshortened (prettified) `projection` entry looks as follows:
PROJCRS["WGS 84 / UTM zone 33N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 33N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",15,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    ID["EPSG",32633]]
The `projection` entry encodes the CRS information in the WKT format.
For most use cases, it is sufficient to know that the combination of the CRS and the `coordinates` values defines the exact location of a patch.
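To make the role of the `coordinates` values concrete, the sketch below turns the four values of the S2 example into a closed ring of corner points and derives the patch extent. It uses only the numbers shown in the example metadata; the units are metres because the CRS is a UTM zone. The helper name `corner_ring` is hypothetical, and reprojecting into another CRS (which would need a library such as pyproj) is not shown.

```python
coords = {"ulx": 404400, "uly": 5342400, "lrx": 405600, "lry": 5341200}


def corner_ring(c):
    """Return the patch corners as a closed ring, starting at the upper-left corner."""
    return [
        (c["ulx"], c["uly"]),  # upper left
        (c["lrx"], c["uly"]),  # upper right
        (c["lrx"], c["lry"]),  # lower right
        (c["ulx"], c["lry"]),  # lower left
        (c["ulx"], c["uly"]),  # back to upper left to close the ring
    ]


ring = corner_ring(coords)
width = coords["lrx"] - coords["ulx"]   # extent along the easting axis
height = coords["uly"] - coords["lry"]  # extent along the northing axis
print(width, height)  # prints "1200 1200"
```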
For more details about what coordinate reference systems are, feel free to take a look at one of the following introductory courses:
BigEarthNet-S1#
BigEarthNet-S1-Example
├── S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48
│   ├── S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48_VH.tif
│   ├── S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48_VV.tif
│   └── S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48_labels_metadata.json
├── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55
│   ├── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55_VH.tif
│   ├── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55_VV.tif
│   └── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55_labels_metadata.json
├── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85
│   ├── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85_VH.tif
│   ├── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85_VV.tif
│   └── S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85_labels_metadata.json
├── S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24
│   ├── S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24_VH.tif
│   ├── S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24_VV.tif
│   └── S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24_labels_metadata.json
├── S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35
│   ├── S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35_VH.tif
│   ├── S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35_VV.tif
│   └── S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35_labels_metadata.json
└── S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38
    ├── S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38_VH.tif
    ├── S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38_VV.tif
    └── S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38_labels_metadata.json
With the following conventions:
- Each folder corresponds to a single patch.
- The `patch_name` is encoded as the name of the folder.
- Both bands, `VH` and `VV`, are saved as individual GeoTIFF files.
- The name of the GeoTIFF file is encoded as `<patch_name>_<band>.tif`.
- The JSON file, named `<patch_name>_labels_metadata.json`, contains the metadata.
The prettified contents of a metadata file look as follows:
{
    "labels": [
        "Non-irrigated arable land",
        "Pastures"
    ],
    "coordinates": {
        "ulx": 643200,
        "uly": 5798040,
        "lrx": 644400,
        "lly": 5796840
    },
    "projection": "PROJCS[\"WGS 84 / UTM zone 29N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"W...",
    "corresponding_s2_patch": "S2A_MSIL2A_20170617T113321_36_85",
    "scene_source": "S1A_IW_GRDH_1SDV_20170617T064724_20170617T064749_017070_01C718_CED2",
    "acquisition_time": "2017-06-17T06:47:24"
}
Warning
Compared to the BigEarthNet-S2 metadata file, BigEarthNet-S1:
- calls the date field `acquisition_time` and not `acquisition_date` (S2).
- encodes the date with `YYYY-MM-DDThh:mm:ss` and not `YYYY-MM-DD hh:mm:ss` (S2).
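Code that handles both archives has to account for these two differences. One possible way to do so is sketched below; the function name `parse_acquisition` is hypothetical, and the sketch assumes that the presence of the `acquisition_time` key is enough to tell an S1 metadata file from an S2 one.

```python
from datetime import datetime


def parse_acquisition(metadata):
    """Return the acquisition timestamp from an S1 or S2 metadata dict."""
    if "acquisition_time" in metadata:  # BigEarthNet-S1: 'YYYY-MM-DDThh:mm:ss'
        return datetime.strptime(metadata["acquisition_time"], "%Y-%m-%dT%H:%M:%S")
    # BigEarthNet-S2: 'YYYY-MM-DD hh:mm:ss'
    return datetime.strptime(metadata["acquisition_date"], "%Y-%m-%d %H:%M:%S")


# Values taken from the S1 example and its corresponding S2 patch.
s1_time = parse_acquisition({"acquisition_time": "2017-06-17T06:47:24"})
s2_time = parse_acquisition({"acquisition_date": "2017-06-17 11:33:21"})
print(s2_time - s1_time)
```

Once both values are `datetime` objects, they can be compared directly, for example to check how far apart the S1 and S2 acquisitions of a patch pair are.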
Metadata Discussion#
The advantages of having a JSON metadata file in every patch folder are:
- JSON is a well-known data format and has excellent library support
- JSON is human-readable (not a binary format)
- Locating the metadata next to the images allows the end-user to easily select subsets of the archive without having to deal with the metadata separately
- Copying the patches of interest will always include the metadata
The main disadvantages are that each dataset (~80GB) has to be downloaded and that it is not easy to perform statistical analysis, because the metadata files have to be parsed and converted into a common data structure first. Usually, the metadata is converted into a tabular format to allow the use of data analysis tools, such as pandas, or the geographical extension, geopandas.
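The parsing step can be sketched with the standard library alone. The example below is only an illustration, not the approach used by any BigEarthNet tool: the function name `collect_metadata` is hypothetical, and the demo runs on a throwaway single-patch directory that mimics the folder layout described earlier.

```python
import json
import tempfile
from pathlib import Path


def collect_metadata(archive_dir):
    """Parse every '<patch_name>_labels_metadata.json' file into one list of dicts."""
    records = []
    for json_path in sorted(Path(archive_dir).glob("*/*_labels_metadata.json")):
        record = json.loads(json_path.read_text())
        record["name"] = json_path.parent.name  # the folder name is the patch name
        records.append(record)
    return records


# Demo on a temporary archive with a single, heavily shortened metadata file.
with tempfile.TemporaryDirectory() as tmp:
    patch_dir = Path(tmp) / "S2A_MSIL2A_20170617T113321_4_55"
    patch_dir.mkdir()
    meta_file = patch_dir / f"{patch_dir.name}_labels_metadata.json"
    meta_file.write_text(json.dumps({"labels": ["Pastures"]}))
    records = collect_metadata(tmp)

print(records)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(records)` to obtain the tabular format mentioned above.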
Pre-converted metadata#
Instead of writing yet another parsing script, we recommend using BigEarthNet GDF Builder. This library parses all JSON files from the archive and converts them to a common geopandas parquet file. See BigEarthNet GDF Builder for more information.
To make statistical analysis simpler, we provide pre-converted files. These files (and the links) may change in the future!
BigEarthNet-S2
- The original parquet file that is produced by parsing all metadata files and projecting to a common CRS
- An extended version of the `raw_ben_gdf.parquet` file with additional metadata:
  - 19-class nomenclature
  - Covered by seasonal snow
  - Covered by clouds or shadows
  - Original split
  - Country
  - Season
- The recommended subset of `extended_ben_gdf.parquet`, where no patch is covered by snow, clouds or shadows and every patch has at least one target label in the 19-class nomenclature
Example output#
from pathlib import Path

from bigearthnet_gdf_builder.builder import get_gdf_from_s2_patch_dir

# gdf_builder also has a CLI tool to convert the entire archive into a single
# parquet file!

# Assumed location of the example "raw" subset; adjust to your local copy
ben_s2_path = Path("BigEarthNet-S2-Example")
gdf = get_gdf_from_s2_patch_dir(ben_s2_path)
# display the resulting GeoDataFrame
gdf
| | labels | tile_source | acquisition_date | name | geometry |
|---|---|---|---|---|---|
| 0 | [Non-irrigated arable land, Land principally o... | S2A_MSIL1C_20170613T101031_N0205_R022_T33UUP_2... | 2017-06-13 10:10:31 | S2A_MSIL2A_20170613T101031_87_48 | POLYGON ((4598116.704 2796272.943, 4598037.799... |
| 1 | [Coniferous forest, Mixed forest, Transitional... | S2B_MSIL1C_20170924T93020_N0205_R136_T35VPK_20... | 2017-09-24 09:30:20 | S2B_MSIL2A_20170924T93020_69_24 | POLYGON ((5359104.563 4566290.008, 5358797.816... |
| 2 | [Complex cultivation patterns, Land principall... | S2A_MSIL1C_20171221T112501_N0206_R037_T29SND_2... | 2017-12-21 11:25:01 | S2A_MSIL2A_20171221T112501_56_35 | POLYGON ((2759565.095 1995691.950, 2759833.735... |
| 3 | [Pastures] | S2A_MSIL1C_20170617T113321_N0205_R080_T29UPU_2... | 2017-06-17 11:33:21 | S2A_MSIL2A_20170617T113321_4_55 | POLYGON ((3153761.287 3421649.939, 3154073.913... |
| 4 | [Non-irrigated arable land, Pastures] | S2A_MSIL1C_20170617T113321_N0205_R080_T29UPU_2... | 2017-06-17 11:33:21 | S2A_MSIL2A_20170617T113321_36_85 | POLYGON ((3181331.537 3376775.097, 3181642.981... |
| 5 | [Non-irrigated arable land, Coniferous forest,... | S2B_MSIL1C_20180204T94161_N0206_R036_T35VPK_20... | 2018-02-04 09:41:56 | S2B_MSIL2A_20180204T94161_57_38 | POLYGON ((5349435.382 4546596.083, 5349128.907... |
Parquet files allow for easy data-processing and visualization. These files work particularly well with geopandas:
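As a small illustration of the kind of analysis a tabular format enables, the sketch below counts how often each label occurs. To stay self-contained, it builds a two-row pandas DataFrame by hand from rows 3 and 4 of the example output. With one of the parquet files, the frame would instead be loaded with `geopandas.read_parquet`, assuming the file keeps the `labels` column shown in the example output.

```python
import pandas as pd

# Hand-copied from rows 3 and 4 of the example output; stands in for a loaded parquet file.
df = pd.DataFrame(
    {
        "name": [
            "S2A_MSIL2A_20170617T113321_4_55",
            "S2A_MSIL2A_20170617T113321_36_85",
        ],
        "labels": [
            ["Pastures"],
            ["Non-irrigated arable land", "Pastures"],
        ],
    }
)

# One row per (patch, label) pair, then count how often each label occurs.
label_counts = df["labels"].explode().value_counts()
print(label_counts)
```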
Important
Instead of writing another metadata loading script:
- Download one of the pre-converted files, or
- Use the BigEarthNet GDF Builder tool to convert the metadata into a tabular format