Motivation
Two of the most outstanding features of the UP42 platform are:

- Container as a Service: bring your own algorithm and/or data set. Just package it in an OCI compatible container format and push it to the UP42 container registry. You can then use your algorithm and/or your data in any workflow you desire.
- Tree-like workflows, where a piece of data can be shared across multiple processing tasks. For example, compute a set of vegetation indexes from a reflectance product and derive the display product from the same reflectance product. There is no need for separate workflows and jobs.
Possible workflows in UP42
In a tree-like workflow, each child node can itself be the parent of a further subtree. For the sake of simplicity, the figure omits that and shows a tree with just two branches coming out of the data block. In theory, the tree can be as wide and as deep as you want.
This article is the first in a series exploring these aspects. Here, we focus on how simple it is to bring a processing algorithm to UP42 and use it with the diverse data sources available on the platform.
The context could be a cheap proof of concept, a concrete project, or a product integration. UP42 allows both quick experimentation with ideas and a more definitive product integration where you leverage the available compute and data functionalities.
This is a foundational article that establishes concepts we will rely on in subsequent articles.
Requirements
It is assumed that you have a decent understanding of Linux containers in general and are familiar with Dockerfile as a container build frontend. If that is not the case, we hope you will feel motivated to learn more about it.
Equally assumed is that you have a minimal understanding of remote sensing, e.g., what a reflectance product and a visual product (also referred to as a display product) are.
Problem statement
As a developer at my company, I have a customer project or a product feature that requires us to compute a set of vegetation indexes quickly and easily. This is just a proof of concept, and the idea is to get something up and running quickly and iteratively to improve it.
We also attempt to build useful algorithms (often used or requested by customers) with the minimum amount of code possible, while leveraging existing utilities available as free software.
For the purpose of this article, we'll build a custom block: an implementation of the vegetation indexes computation we need using the Container as a Service (CaaS) functionality offered by UP42. Let's get right into it.
Building a custom block from 0 to 100 (gently)
A custom block is a container packaging your algorithm or data source that gets built and pushed into the UP42 container registry, which you can then use in any workflow of your choosing, provided that the block satisfies the compatibility requirements of what comes before (and after) in a workflow.
The first thing to consider is what data the block receives and what data it will deliver after running its algorithm.
In our case, we want to compute a set of vegetation indexes for both commercial and open data: Pléiades 1A/1B and SPOT 6/7 on the commercial side, and Sentinel-2A/2B on the open data side.
How do we make sure that our block accepts these data sources? It's now time to introduce the block manifest.
Step 1: the block manifest
The block manifest is a format for specifying how a particular block (called "task" in a workflow context) fits into a given workflow.
A workflow is a Directed Acyclic Graph (DAG) that defines how data flows across a set of tasks, i.e., blocks. The terms task and block are used interchangeably; they mean the same thing, but task is more indicative of the role a block plays in a workflow. In a nutshell: what we create is a block that runs as a task in a given workflow.
Currently, UP42 supports two types of DAGs as workflows:
- A graph representing a singly linked list: the nodes and edges are arranged in a linear, chain-like structure.
- A polytree, or directed tree.
In this article, we will focus on the first, a singly linked list. Tree-like workflows will be discussed in an upcoming article.
What is inside a manifest?
The manifest is a JSON document that follows a JSON Schema, specifying how a given block can be inserted into the workflow. It defines the possible adjacent nodes of the graph: what can come before and what can come after the block, and the type of machine where the block will be run.
Enough theory. Let's get into practice. Here is the manifest for our custom block that computes vegetation indexes for Pléiades, SPOT and Sentinel-2 satellite imagery.
{
"_up42_specification_version": 2,
"name": "vegetation-indices-omnibus",
"type": "processing",
"tags": [
"imagery",
"processing",
"vegetation index"
],
"display_name": "Vegetation Indices Omnibus",
"description": "Calculates NDVI, GNDVI, WDRVI, EVI, EVI2, EVI22, SIPI, SIPI3, CVI, CIG, ReCI, NDRE, ARVI, VARI, SAVI, OSAVI, MSAVI, NDSI, NDWI, NDWI2, BAI and NBR for Pléiades, SPOT and Sentinel-2.",
"parameters": {
"indexes": {
"type": "array",
"default": ["evi", "ndvi"]
},
"ram": {
"type": "integer",
"default": 256
},
"arvi_y": {
"type": "number",
"default": 0.106
},
"wdrvi_a": {
"type": "number",
"default": 0.1
}
},
"machine": {
"type": "xxlarge"
},
"input_capabilities": {
"raster": {
"up42_standard": {
"format": "GTiff",
"bands": [
"red",
"nir"
],
"sensor": {
"or": [
"Pleiades",
"SPOT",
"Sentinel2"
]
},
"dtype": `uint16`
}
}
},
"output_capabilities": {
"raster": {
"up42_standard": {
"format": "GTiff",
"sensor": ">",
"dtype": "float"
}
}
}
}
All the operators and fields are described in the documentation, so we won't go into details here. In broad strokes, the manifest specifies:

- The JSON Schema specification version.
- The block type: data or processing.
- The block name (machine name).
- The input/output data types and file formats that the block consumes/delivers.
- The input/output electromagnetic bands that the block consumes/delivers.
- The input/output satellite sensors (if applicable) that the block consumes/delivers.
- The block options/parameters.
- The size of the machine that the block requires to run.
- Metadata such as the description, tags, and human-readable name (display_name).
Some are mandatory, and some are optional. The more detailed we are in specifying the inputs and outputs, sensors, and bands, the simpler it will be for the block code, as the inputs and outputs are more constrained.
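Before going further, it is worth at least checking that the manifest is syntactically valid JSON. Since jq is part of the toolchain we install later in the Dockerfile, a quick local check could look like this (the manifest filename is an assumption; use whatever name you gave the file):

# Exit status 0 means the file parses as valid JSON; any syntax error
# is reported with a line number.
jq empty UP42Manifest.json && echo "manifest is valid JSON"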
Our manifest
In this case, we decided:
- The block machine name is vegetation-indices-omnibus. This is the name that will be used in the console when building a workflow.
- To accept GeoTIFF as the input and output file format.
- Input data is 16-bit unsigned integer (uint16), and output is float (64-bit).
- The block is a processing block.
- Input sensors are Pléiades, SPOT and Sentinel-2.
- Input bands are Near InfraRed (NIR) and Red. This is the minimal requirement. We could have defined a much more complex band set, e.g., Blue and other bands from Sentinel-2, but that would be too complex for a block that aims to be as simple as possible and to motivate custom block building.
- To run the block on an xxlarge machine. This means it will get up to 8 cores and up to 40 GB of memory when the container with the block is run.
- As output sensors, we just forward what is defined as input, specified by the > operator.
For the parameters, we have 4 items:

- indexes: an array with the index names (strings) in lowercase: "ndvi", "evi", etc.
- ram: the amount of RAM that the code can use for computing the indexes.
- arvi_y: the value of the y coefficient for computing ARVI.
- wdrvi_a: the value of the a coefficient for computing WDRVI.
Defaults are defined for all.
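As an illustration, a set of job parameters overriding some of these defaults could look like the following (a hypothetical selection of indexes; any key that is omitted falls back to its default):

{
  "indexes": ["ndvi", "savi", "evi"],
  "ram": 512,
  "arvi_y": 0.106,
  "wdrvi_a": 0.2
}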
There is also some metadata, like the block description, tags, and the human-readable name (display_name).
Now that we have the manifest, let's move on to the code that computes the indexes.
Step 2: the algorithm (code)
Outline
For the algorithm, we rely on the OrfeoToolbox BandMath utility to compute a set of vegetation indexes:
- NDVI - Normalized Difference Vegetation Index
- GNDVI - Green Normalized Difference Vegetation Index
- WDRVI - Wide Dynamic Range Vegetation Index
- EVI - Enhanced Vegetation Index
- EVI2 - Enhanced Vegetation Index 2
- EVI2-2 - Enhanced Vegetation Index 2 - 2
- SIPI - Structure Intensive Pigment Index
- SIPI3 - Structure Intensive Pigment Index 3
- CVI - Chlorophyll Vegetation Index
- CIG - Chlorophyll Index Green
- ReCI - Red Edge Chlorophyll Index
- NDRE - Normalized Difference Red Edge
- ARVI - Atmospherically Resistant Vegetation Index
- VARI - Visible Atmospherically Resistant Index
- SAVI - Soil-Adjusted Vegetation Index
- OSAVI - Optimized Soil Adjusted Vegetation Index
- MSAVI - Modified Soil Adjusted Vegetation Index
- NDSI - Normalized Difference Snow Index
- NDWI/NDWI2 - Normalized Difference Water Index for water body detection (NDWI) and drought and irrigation management (NDWI2)
- BAI - Burn Area Index
- NBR - Normalized Burn Ratio
The code in some detail
The code itself is quite simple and is written in Bash. We want a Docker image that is as small as possible, and Bash is simple and portable across multiple environments.
The software requirements are given in the README.
Written in C++, the OrfeoToolbox (OTB) is maintained by the French Space Agency — Centre National d'Études Spatiales (CNES). It consists of a series of command-line utilities to work with remote sensing data sets from a user point of view, and a toolkit/library to program remote sensing algorithms in C++.
It also has a Python API and a QGIS interface.
Performance-wise, OTB offers Streaming and Threading for handling large datasets effectively.
The biggest chunk of code corresponds to the computation of each index. For most of the considered indexes, the formulas for Sentinel-2 are different from the formulas for Pléiades or SPOT. This results from the higher spectral resolution (more bands) available to Sentinel-2. The larger number of bands translates into more informed choices when computing a given index.
The band mathematical operations are written as muparser formulas: muparser is the C++ library used by the OTB band math command-line utility to do the calculations. Under the hood, it invokes standard C++ mathematical library operations.
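As a concrete illustration (not the exact expression used by the block), a hypothetical NDVI computation with otbcli_BandMath, assuming Red is band 1 and NIR is band 4 of the input image, would look like this:

# NDVI = (NIR - Red) / (NIR + Red); in muparser syntax im1bN refers to
# band N of the first input image.
otbcli_BandMath \
  -il input_image.tif \
  -out ndvi.tif double \
  -exp "(im1b4 - im1b1) / (im1b4 + im1b1)"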
OTB's main building blocks are the Geospatial Data Abstraction Library (GDAL) and the Insight Toolkit (ITK), with GDAL handling mostly the I/O functionality and the algorithmic part being implemented using ITK.
The Python API that OTB provides is fairly low level and may not be ideal for everyone. New initiatives like pyotb and otbtf (TensorFlow and OTB integration) try to address this issue by providing simpler interfaces.
OTB is widely used in research and industry, where it is integrated into remote sensing image processing pipelines, e.g., for atmospheric correction, cloud masking, and classification.
The code is minimally documented. A dispatching function invokes a calculation function for each of the indexes. A GeoJSON output file (data.json) is written, enumerating the output directory for each image. Output GeoTIFF files are named by appending the index acronym (ndvi, evi, etc.) in lowercase, prefixed by an underscore (_), to the input file name. E.g., my_input_image_evi.tif is the output file created when computing the EVI for an input file named my_input_image.tif.
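A minimal Bash sketch of that naming convention, as an illustration rather than the actual block code, could be:

# Derive each output name from the input name by inserting the
# lowercase index acronym before the .tif extension.
input="my_input_image.tif"
for index in ndvi evi cig savi; do
  output="${input%.tif}_${index}.tif"
  echo "would write ${output}"
done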
Level-2 images are required to compute vegetation indexes correctly, i.e., we need atmospherically corrected reflectance values, usually called Bottom-of-Atmosphere (BOA) reflectance. Non-corrected reflectance values will shift the index values to varying degrees depending on the specific index.
Command-line arguments
The script accepts up to four optional command-line arguments:

- -p: a JSON document with the parameters accepted by the block.
- -i: the input directory, i.e., where the data.json file describing the input imagery is located.
- -o: the output directory, i.e., where the data.json file describing the output imagery is written.
- -t: the path to the otbcli_BandMath utility from OTB.
If any optional arguments are not provided, the script uses default values. These are:

- block parameters (-p): extracted from the string given as the value of the UP42_TASK_PARAMETERS environment variable.
- input directory (-i): /tmp/input.
- output directory (-o): /tmp/output.
- path to otbcli_BandMath (-t): obtained from the operating system, using command -v from Bash to find the path.
We will see below that these defaults correspond to the conventions set by the UP42 platform and are used inside the container.
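As an illustration, invoking the script locally from a checkout of the repository might look like the following (the parameter values and paths are illustrative; the script lives in the shell directory of the repository):

# Parameters are picked up from UP42_TASK_PARAMETERS when -p is not given.
export UP42_TASK_PARAMETERS='{"indexes": ["ndvi", "evi"], "ram": 512}'

# Input and output directories default to /tmp/input and /tmp/output;
# we pass them explicitly here for clarity.
./shell/indexes.sh -i /tmp/input -o /tmp/output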
Step 3: the Dockerfile
Dockerfile preamble
Now, let's look at the Dockerfile. First, a definition.
A Dockerfile is a text file that uses a Domain Specific Language (DSL) to define how a container image is to be assembled. Syntactically, the DSL used by Dockerfiles is inspired by the UNIX shell, but it differs considerably in terms of semantics. In fact, if you are familiar with shell programming, you might be tempted to treat the Dockerfile as a shell script; if you do, you will soon find out that it behaves rather differently from a shell.
It is beyond the scope of this article to delve into the details of the Dockerfile specification. There are plenty of materials available on the web about it, as well as quite an extensive published literature. It is worth noting that Dockerfile is the most widely used frontend for BuildKit, which is part of the Moby project; the goal of the Moby project is to build a free software framework for container tooling.
The Dockerfile line by line
Here is our Dockerfile.
1 FROM debian:testing-slim
# Required packages.
4 RUN apt-get update && \
5 apt-get install -y --no-install-recommends jq moreutils coreutils bash otb-bin
# Include the manifest.
8 ARG manifest
9 LABEL "up42_manifest"=$manifest
# Run the script as non-root.
12 ARG OTB_USERNAME="otbuser"
13 RUN useradd -ms /bin/bash $OTB_USERNAME
14 USER $OTB_USERNAME
15 WORKDIR /home/$OTB_USERNAME
# Get the script for computing the indexes.
18 COPY --chown=$OTB_USERNAME shell/indexes.sh .
19 RUN chmod 755 indexes.sh
# OrfeoToolbox environment variables.
22 ENV ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=8
23 ENV GDAL_NUM_THREADS=8
# Invoking the script to compute the indexes.
26 CMD ["./indexes.sh"]
- Line 1: We use the Debian testing (slim) distribution as the base image. You might be wondering why we did not use Alpine instead of Debian to reduce the image size. We tried that, but due to the very large size of the OTB binary, we ended up with an image that is 400 MB larger than the one based on Debian (slim).
- Lines 4-5: Installing the required Debian packages. Bear in mind that we rely on OTB as provided by Debian. It installs quite a few extra packages needed by the OTB GUI but not by the command-line utilities. This results from how Debian has organized the packages and their respective dependencies. To further optimize the image size, we would need to compile OTB ourselves and leave out all GUI-related components.
- Lines 8-9: Including the manifest. The manifest is included in the image as metadata, via the LABEL instruction; this metadata is then extracted on the UP42 platform and used to build the block compatibility graph. There is currently no limit to the length of the string assigned to a LABEL (Docker relies on strings.SplitN from the Go strings library to parse it), so the whole manifest fits comfortably. A sketch of how the manifest is passed in at build time is shown after this list.
- Lines 12-14: It is always good practice to ensure that the software packaged in a container runs with as few privileges as possible. In this case, we set up a user (otbuser) that will invoke the script to compute the indexes. Additional security-related constraints, such as dropping capabilities at runtime, are handled internally by the UP42 platform.
- Line 15: We set the working directory to be the home directory of otbuser.
- Lines 18-19: We copy the script that computes the indexes to the working directory, change the file ownership to otbuser, and make the script executable.
- Lines 22-23: We hint OTB to use a number of threads matching the machine size (8 cores on an xxlarge machine) through the environment variables ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS and GDAL_NUM_THREADS.
- Line 26: Run the script to compute the indexes. Note that we rely on the default values for the command-line arguments.
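To make the ARG/LABEL mechanism concrete, a build invocation could look like the following; the manifest filename and image tag are assumptions, and the README has the exact commands used for this block:

# Pass the manifest JSON as a build argument so it ends up as the value
# of the up42_manifest label in the image metadata.
docker build \
  --build-arg manifest="$(cat UP42Manifest.json)" \
  -t vegetation-indexes .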
Step 4: test & run the code locally
There are currently no tests for the code. The reasons for this are:

- This experimental block is intended to motivate developers to experiment with processing algorithm ideas on the cheap, so it acts as a disposable processing block.
- Since we rely on OTB, we are leveraging all the testing done upstream by the OTB developers.
- We want to keep the ecosystem around this block as small as possible, avoiding testing frameworks that would bring in external libraries and utilities.
That said, testing this code would mostly amount to exploring properties of the algorithm. This could mean creating generators for arbitrary images that expose certain properties of each of the indexes, e.g., looking for invariants. Another option, somewhat related to property-based testing, is randomized testing, which would allow us to randomize images, inspect the output, and compare it with what was initially expected.
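As a sketch of what such an invariant check could look like, assuming the GDAL command-line utilities (gdal-bin) are available alongside OTB, one could generate a tiny synthetic two-band image and verify that the resulting NDVI stays within [-1, 1]:

# Create a 32x32 synthetic image with two constant bands
# (band 1, playing the role of red, = 100; band 2, NIR, = 300).
gdal_create -outsize 32 32 -bands 2 -ot UInt16 -burn 100 -burn 300 synthetic.tif

# Compute NDVI from the two synthetic bands.
otbcli_BandMath -il synthetic.tif -out ndvi.tif double \
  -exp "(im1b2 - im1b1) / (im1b2 + im1b1)"

# The reported minimum and maximum should lie within [-1, 1].
gdalinfo -stats ndvi.tif | grep -i 'minimum\|maximum'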
Step 5: Bringing it all together
Prerequisite: UP42 account
You will need an UP42 account to bring it all together, i.e., run your processing block with all manner of satellite imagery, in this case Pléiades, SPOT and Sentinel-2.
Creating an account is free, and you get 100 EUR of credits to experiment with on the platform. If you sign up as an individual, you can only access non-commercial data out of the box; if you sign up as an organization, you immediately have access to commercial data. Individuals can eventually access commercial data too, but you have to request it explicitly in the console.
Building, running and pushing the image
The README on GitHub details all the steps, from the software requirements to building, running and pushing the image to the UP42 docker registry. This entails:
- Clone the repository.
- Build the container image.
- Get sample data.
- Run the container locally with the sample data.
- Push the container image to the UP42 container registry.
- Use the container (block) in workflows.
- Done.
We assume that you have read and followed all the steps enumerated above. Here are some checkpoints for the process:
- After building the container, you should see the image in your list of images:
docker image ls | grep indexes
registry.up42.com/d39fe05a-400c-44f6-b770-86990f64b005/simple-vegetation-indexes latest 5a532f542165 9 hours ago 694MB
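Pushing then follows the standard Docker registry workflow; consult the README and the UP42 documentation for the exact login procedure and naming scheme, as the identifiers below are placeholders:

# Log in to the UP42 registry and push the tagged image.
docker login registry.up42.com
docker push registry.up42.com/<your-user-or-project-id>/simple-vegetation-indexes:latest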
- After pushing the block to the UP42 container registry, you should see the block in your custom blocks on the UP42 console:
Building a workflow and running our block on UP42
Possible workflows
There are four simple workflows for you to use with the custom block you just built and pushed to UP42. We classify these workflows below by data source:
Sentinel-2 L2A Analytical
Sentinel-2 based workflow
Pléiades Reflectance
Pléiades based workflow
SPOT Reflectance
SPOT based workflow
Processing from storage: re-use any previously acquired Pléiades or SPOT products
Processing from storage based workflow
End-to-end example
We will showcase one of these workflows, running an example that uses Pléiades as the data source.
The original imagery is shown here after being converted to a bit depth of 8 through a histogram (linear) stretch. The original files are provided with 16 bits per pixel, and the image sensor's linear response to light intensity makes the raw data unfit for human viewing.
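If you want to inspect the imagery locally, one way to do such a conversion yourself is with gdal_translate; this is an assumption about tooling, as the stretch can equally be done in QGIS or any other GIS:

# Convert the 16-bit product to an 8-bit display version with a linear
# stretch; with no bounds given, -scale uses the source minimum/maximum.
gdal_translate -ot Byte -scale original_16bit.tif display_8bit.tif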
We will look at an Area of Interest (AOI) located in Baie-Comeau, Québec, Canada.
Showing some results
Original Pléiades image.
Baie-Comeau, Québec, October 5th, 2021
There are some clouds over this area; the yellow spots on the image are artifacts created by the refraction of light caused by the clouds.
Below are the images resulting from the computation of four of the vegetation indexes, rendered with a green-based colormap.
NDVI
NDVI for Baie-Comeau, Québec, October 5th, 2021
EVI
EVI for Baie-Comeau, Québec, October 5th, 2021
CIG
CIG for Baie-Comeau, Québec, October 5th, 2021
SAVI
SAVI for Baie-Comeau, Québec, October 5th, 2021
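A similar green-based rendering of a single-band index raster can be reproduced locally with, for example, gdaldem color-relief; this is again an assumption about tooling, and the color table values below are purely illustrative:

# greens.txt maps index values to RGB colors: low values in pale green,
# high values in dark green.
cat > greens.txt <<'EOF'
-1.0 247 252 245
0.2 199 233 192
0.5 116 196 118
1.0 0 68 27
EOF

gdaldem color-relief ndvi.tif greens.txt ndvi_colored.tif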
Conclusions
UP42 not only lets you easily experiment with ideas for remote sensing processing algorithms, but also allows you to go further and materialize those ideas.
Leveraging existing tools, as we did here with OTB, should be your first impulse. Typically, you should only resort to developing something from scratch if no free software option exists.
If you are not familiar with Docker or with containers in general, use the development of custom blocks on UP42 as an opportunity to become familiar with them, exploring these topics to the depth you deem appropriate.
We're just scratching the surface of what is possible. In upcoming articles, we'll delve even deeper and illustrate more complex possibilities.
Next in this series
In the next article in this series, we will be creating a custom processing block using Xarray-spatial.
If you have any questions or comments, please feel free to contact us at [email protected] or reach out to António Almeida at https://twitter.com/perusio.
Header image: Baie-Comeau, Québec, Canada by Hussein Abdallah.