Raw BigEarthNet Data

Now that we understand where the patches come from and how they were annotated, this section presents and discusses the files inside the archives.

BigEarthNet-S2

The general structure of the BigEarthNet-S2 archive looks as follows:

📂 BigEarthNet-S2-Example
├── 📂 S2A_MSIL2A_20170613T101031_87_48
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B01.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B02.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B03.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B04.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B05.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B06.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B07.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B08.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B8A.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B09.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B11.tif
│   ├── 🗺️ S2A_MSIL2A_20170613T101031_87_48_B12.tif
│   └── 📄 S2A_MSIL2A_20170613T101031_87_48_labels_metadata.json
├── 📂 S2A_MSIL2A_20170617T113321_4_55
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B01.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B02.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B03.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B04.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B05.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B06.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B07.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B08.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B8A.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B09.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B11.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_4_55_B12.tif
│   └── 📄 S2A_MSIL2A_20170617T113321_4_55_labels_metadata.json
├── 📂 S2A_MSIL2A_20170617T113321_36_85
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B01.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B02.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B03.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B04.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B05.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B06.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B07.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B08.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B8A.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B09.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B11.tif
│   ├── 🗺️ S2A_MSIL2A_20170617T113321_36_85_B12.tif
│   └── 📄 S2A_MSIL2A_20170617T113321_36_85_labels_metadata.json
├── 📂 S2A_MSIL2A_20171221T112501_56_35
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B01.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B02.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B03.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B04.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B05.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B06.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B07.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B08.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B8A.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B09.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B11.tif
│   ├── 🗺️ S2A_MSIL2A_20171221T112501_56_35_B12.tif
│   └── 📄 S2A_MSIL2A_20171221T112501_56_35_labels_metadata.json
├── 📂 S2B_MSIL2A_20170924T93020_69_24
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B01.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B02.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B03.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B04.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B05.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B06.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B07.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B08.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B8A.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B09.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B11.tif
│   ├── 🗺️ S2B_MSIL2A_20170924T93020_69_24_B12.tif
│   └── 📄 S2B_MSIL2A_20170924T93020_69_24_labels_metadata.json
└── 📂 S2B_MSIL2A_20180204T94161_57_38
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B01.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B02.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B03.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B04.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B05.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B06.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B07.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B08.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B8A.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B09.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B11.tif
    ├── 🗺️ S2B_MSIL2A_20180204T94161_57_38_B12.tif
    └── 📄 S2B_MSIL2A_20180204T94161_57_38_labels_metadata.json

With the following conventions:

  • Each folder corresponds to a single patch

  • The patch_name is encoded as the name of the folder

  • Each patch folder contains a GeoTIFF file for each of the 12 bands.

    • The name of the GeoTIFF file is encoded as <patch_name>_<band>.tif.

  • The JSON file, named <patch_name>_labels_metadata.json, contains the metadata
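
The naming conventions above are enough to derive every file path of a patch from its folder alone. The following is a minimal sketch using only the standard library; the helper name `s2_patch_files` is hypothetical, and the band list is simply copied from the example listing above.

```python
from pathlib import Path

# Band identifiers as they appear in the example archive listing.
S2_BANDS = [
    "B01", "B02", "B03", "B04", "B05", "B06",
    "B07", "B08", "B8A", "B09", "B11", "B12",
]


def s2_patch_files(patch_dir):
    """Return ({band: GeoTIFF path}, metadata path) for one patch folder.

    The paths are built purely from the naming convention; this function
    does not check whether the files exist on disk.
    """
    patch_dir = Path(patch_dir)
    patch_name = patch_dir.name  # the folder name is the patch_name
    band_files = {
        band: patch_dir / f"{patch_name}_{band}.tif" for band in S2_BANDS
    }
    metadata_file = patch_dir / f"{patch_name}_labels_metadata.json"
    return band_files, metadata_file


bands, meta = s2_patch_files(
    "BigEarthNet-S2-Example/S2A_MSIL2A_20170613T101031_87_48"
)
print(bands["B8A"].name)  # S2A_MSIL2A_20170613T101031_87_48_B8A.tif
print(meta.name)          # S2A_MSIL2A_20170613T101031_87_48_labels_metadata.json
```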

The prettified contents of a metadata file are:

{
  "labels": [
    "Non-irrigated arable land",
    "Land principally occupied by agriculture, with significant areas of natural vegetation"
  ],
  "coordinates": {
    "ulx": 404400,
    "uly": 5342400,
    "lrx": 405600,
    "lry": 5341200
  },
  "projection": "PROJCS[\"WGS 84 / UTM zone 33N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"W...",
  "tile_source": "S2A_MSIL1C_20170613T101031_N0205_R022_T33UUP_20170613T101608.SAFE",
  "acquisition_date": "2017-06-13 10:10:31"
}
  • labels: Lists the older CLC Level-3 nomenclature labels of the patch

  • tile_source: Shows the source tile that was further processed with sen2cor to generate the atmospherically corrected L2A product tile

  • acquisition_date: Encodes the acquisition date of the tile in the YYYY-MM-DD hh:mm:ss format

  • coordinates: Encodes the upper left x/y (ulx/uly) and lower right x/y (lrx/lry) coordinates of the patch

  • projection: Relates the values of the coordinates to the given coordinate reference system (CRS)
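
As an illustration of the fields described above, the sketch below parses the example metadata with the standard library, converts the acquisition date into a `datetime` object, and derives the patch extent from the corner coordinates. The JSON string is copied from the example (the long projection entry is left out for brevity); the variable names are only illustrative.

```python
import json
from datetime import datetime

# Example metadata from above; the projection entry is omitted for brevity.
raw = """
{
  "labels": [
    "Non-irrigated arable land",
    "Land principally occupied by agriculture, with significant areas of natural vegetation"
  ],
  "coordinates": {"ulx": 404400, "uly": 5342400, "lrx": 405600, "lry": 5341200},
  "tile_source": "S2A_MSIL1C_20170613T101031_N0205_R022_T33UUP_20170613T101608.SAFE",
  "acquisition_date": "2017-06-13 10:10:31"
}
"""
meta = json.loads(raw)

# BigEarthNet-S2 stores the date as "YYYY-MM-DD hh:mm:ss".
acquired = datetime.strptime(meta["acquisition_date"], "%Y-%m-%d %H:%M:%S")

# The extent is expressed in the units of the CRS given by the projection entry.
coords = meta["coordinates"]
width = coords["lrx"] - coords["ulx"]   # extent along x
height = coords["uly"] - coords["lry"]  # extent along y (uly is the larger value)

print(acquired.isoformat())  # 2017-06-13T10:10:31
print(width, height)         # 1200 1200
```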

The unshortened (prettified) projection entry looks as follows:

PROJCRS["WGS 84 / UTM zone 33N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 33N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",15,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["northing",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    ID["EPSG",32633]]

The projection entry encodes the CRS information in the WKT format. For most use cases, it is sufficient to know that the combination of the CRS and the coordinates values defines the exact location of a patch. For more details about what coordinate reference systems are, feel free to take a look at one of the following introductory courses:
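
As a small worked example of what the WKT entry above encodes: for UTM zones, both the central meridian ("Longitude of natural origin") and the EPSG code of the WGS 84 variant follow from the zone number. The sketch below applies the standard UTM relations (central meridian = 6 × zone − 183 degrees; EPSG 326xx for northern and 327xx for southern zones). The function names are hypothetical and are not part of any BigEarthNet tooling.

```python
def utm_central_meridian(zone):
    """Longitude in degrees east of the central meridian of a UTM zone."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zones are numbered from 1 to 60")
    return 6 * zone - 183


def wgs84_utm_epsg(zone, northern=True):
    """EPSG code of the projected CRS 'WGS 84 / UTM zone <zone>N' (or S)."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zones are numbered from 1 to 60")
    return (32600 if northern else 32700) + zone


# Zone 33N, as in the projection entry shown above.
print(utm_central_meridian(33))  # 15
print(wgs84_utm_epsg(33))        # 32633
```

Both values agree with the `PARAMETER["Longitude of natural origin",15, ...]` and `ID["EPSG",32633]` entries of the WKT string.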

BigEarthNet-S1

📂 BigEarthNet-S1-Example
├── 📂 S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48_VH.tif
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48_VV.tif
│   └── 📄 S1A_IW_GRDH_1SDV_20170613T165043_33UUP_87_48_labels_metadata.json
├── 📂 S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55_VH.tif
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55_VV.tif
│   └── 📄 S1A_IW_GRDH_1SDV_20170617T064724_29UPU_4_55_labels_metadata.json
├── 📂 S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85_VH.tif
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85_VV.tif
│   └── 📄 S1A_IW_GRDH_1SDV_20170617T064724_29UPU_36_85_labels_metadata.json
├── 📂 S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24_VH.tif
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24_VV.tif
│   └── 📄 S1A_IW_GRDH_1SDV_20170925T043256_35VPK_69_24_labels_metadata.json
├── 📂 S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35_VH.tif
│   ├── 🗺️ S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35_VV.tif
│   └── 📄 S1A_IW_GRDH_1SDV_20171221T064238_29SND_56_35_labels_metadata.json
└── 📂 S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38
    ├── 🗺️ S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38_VH.tif
    ├── 🗺️ S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38_VV.tif
    └── 📄 S1A_IW_GRDH_1SDV_20180204T043253_35VPK_57_38_labels_metadata.json

With the following conventions:

  • Each folder corresponds to a single patch

  • The patch_name is encoded as the name of the folder

  • The two bands, VH and VV, are each saved as an individual GeoTIFF file

    • The name of the GeoTIFF file is encoded as <patch_name>_<band>.tif.

  • The JSON file, named <patch_name>_labels_metadata.json, contains the metadata

The prettified contents of a metadata file are:

{
  "labels": [
    "Non-irrigated arable land",
    "Pastures"
  ],
  "coordinates": {
    "ulx": 643200,
    "uly": 5798040,
    "lrx": 644400,
    "lly": 5796840
  },
  "projection": "PROJCS[\"WGS 84 / UTM zone 29N\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"W...",
  "corresponding_s2_patch": "S2A_MSIL2A_20170617T113321_36_85",
  "scene_source": "S1A_IW_GRDH_1SDV_20170617T064724_20170617T064749_017070_01C718_CED2",
  "acquisition_time": "2017-06-17T06:47:24"
}

Warning

Compared to the BigEarthNet-S2 metadata file, the BigEarthNet-S1 metadata file:

  • Calls the date field acquisition_time and not acquisition_date (S2)

  • Encodes the date as YYYY-MM-DDThh:mm:ss and not YYYY-MM-DD hh:mm:ss (S2)
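
Code that reads both archives has to account for these two differences. One possible way to handle them is sketched below with the standard library; the function name `parse_acquisition` is hypothetical, and the field names and formats are taken from the two metadata examples above.

```python
from datetime import datetime

# (field name, strptime format) pairs taken from the two metadata examples.
_DATE_FIELDS = [
    ("acquisition_date", "%Y-%m-%d %H:%M:%S"),  # BigEarthNet-S2
    ("acquisition_time", "%Y-%m-%dT%H:%M:%S"),  # BigEarthNet-S1
]


def parse_acquisition(metadata):
    """Return the acquisition timestamp of an S1 or S2 metadata dict."""
    for field, fmt in _DATE_FIELDS:
        if field in metadata:
            return datetime.strptime(metadata[field], fmt)
    raise KeyError("metadata contains no known acquisition field")


s2_time = parse_acquisition({"acquisition_date": "2017-06-13 10:10:31"})
s1_time = parse_acquisition({"acquisition_time": "2017-06-17T06:47:24"})
print(s2_time)  # 2017-06-13 10:10:31
print(s1_time)  # 2017-06-17 06:47:24
```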

Metadata Discussion

The advantages of having a JSON metadata file in every patch folder are:

  1. JSON is a well-known data format and has excellent library support

  2. JSON is human-readable (not a binary format)

  3. Locating the metadata next to the images allows the end-user to easily select subsets of the archive without having to deal with the metadata separately

    • Copying the patches of interest will always include the metadata

The main disadvantages are that each dataset (~80GB) has to be downloaded and that statistical analysis is not straightforward: the metadata files first have to be parsed and converted into a common data structure. Usually, the metadata is converted into a tabular format so that data analysis tools such as pandas, or its geographical extension geopandas, can be used.
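
To make the parsing step concrete, the following sketch collects all metadata files of an extracted archive into a list of dictionaries, one per patch. It only relies on the folder and file naming conventions described above; the function name `collect_metadata` is hypothetical, and the small temporary archive exists only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def collect_metadata(archive_dir):
    """Parse every <patch_name>_labels_metadata.json into a list of dicts."""
    rows = []
    pattern = "*/*_labels_metadata.json"  # one JSON file per patch folder
    for json_path in sorted(Path(archive_dir).glob(pattern)):
        with json_path.open(encoding="utf-8") as f:
            row = json.load(f)
        row["name"] = json_path.parent.name  # patch_name is the folder name
        rows.append(row)
    return rows


# Tiny stand-in archive with a single patch folder.
with tempfile.TemporaryDirectory() as tmp:
    patch = Path(tmp) / "S2A_MSIL2A_20170617T113321_4_55"
    patch.mkdir()
    metadata_file = patch / f"{patch.name}_labels_metadata.json"
    metadata_file.write_text(json.dumps({"labels": ["Pastures"]}), encoding="utf-8")
    rows = collect_metadata(tmp)

print(rows)
```

A list of dictionaries like this can be passed directly to `pandas.DataFrame(rows)` to obtain the tabular format mentioned above.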

Pre-converted metadata

Instead of writing yet another parsing script, we recommend using the BigEarthNet GDF Builder. This library parses all JSON files from the archive and converts them into a common geopandas parquet file. See BigEarthNet GDF Builder for more information.

To make statistical analysis simpler, we provide pre-converted files. These files (and the links) may change in the future!

  • BigEarthNet-S2

    • raw_ben_gdf.parquet

      • The original parquet file that is produced by parsing all metadata files and projecting to a common CRS

    • extended_ben_gdf.parquet

      • An extended version of the raw_ben_gdf.parquet file with additional metadata:

        • 19-class nomenclature

        • Covered by seasonal snow

        • Covered by clouds or shadows

        • Original split

        • Country

        • Season

    • final_ben.parquet

      • The recommended subset of extended_ben_gdf.parquet, where no patch is covered by snow, clouds or shadows and every patch has at least one target label in the 19-class nomenclature

Example output

from bigearthnet_gdf_builder.builder import get_gdf_from_s2_patch_dir

# ben_s2_path is expected to point to the directory that contains the
# BigEarthNet-S2 patch folders (it is defined outside of this snippet).
# gdf_builder also has a CLI tool to convert the entire archive into a single
# parquet file!
# Example "raw" subset
gdf = get_gdf_from_s2_patch_dir(ben_s2_path)
# display the resulting GeoDataFrame
gdf
  | labels | tile_source | acquisition_date | name | geometry
0 | [Non-irrigated arable land, Land principally o... | S2A_MSIL1C_20170613T101031_N0205_R022_T33UUP_2... | 2017-06-13 10:10:31 | S2A_MSIL2A_20170613T101031_87_48 | POLYGON ((4598116.704 2796272.943, 4598037.799...
1 | [Coniferous forest, Mixed forest, Transitional... | S2B_MSIL1C_20170924T93020_N0205_R136_T35VPK_20... | 2017-09-24 09:30:20 | S2B_MSIL2A_20170924T93020_69_24 | POLYGON ((5359104.563 4566290.008, 5358797.816...
2 | [Complex cultivation patterns, Land principall... | S2A_MSIL1C_20171221T112501_N0206_R037_T29SND_2... | 2017-12-21 11:25:01 | S2A_MSIL2A_20171221T112501_56_35 | POLYGON ((2759565.095 1995691.950, 2759833.735...
3 | [Pastures] | S2A_MSIL1C_20170617T113321_N0205_R080_T29UPU_2... | 2017-06-17 11:33:21 | S2A_MSIL2A_20170617T113321_4_55 | POLYGON ((3153761.287 3421649.939, 3154073.913...
4 | [Non-irrigated arable land, Pastures] | S2A_MSIL1C_20170617T113321_N0205_R080_T29UPU_2... | 2017-06-17 11:33:21 | S2A_MSIL2A_20170617T113321_36_85 | POLYGON ((3181331.537 3376775.097, 3181642.981...
5 | [Non-irrigated arable land, Coniferous forest,... | S2B_MSIL1C_20180204T94161_N0206_R036_T35VPK_20... | 2018-02-04 09:41:56 | S2B_MSIL2A_20180204T94161_57_38 | POLYGON ((5349435.382 4546596.083, 5349128.907...

Parquet files allow for easy data processing and visualization. These files work particularly well with geopandas.

Important

Instead of writing another metadata loading script:

  • Download one of the pre-converted files or

  • Use the BigEarthNet GDF Builder tool to convert the metadata into a tabular format