Data recipes 101

This blog post inaugurates a new series that explores the UP42 data platform through simple recipes.

The concept of recipes borrows from cookbooks: the idea is to give developers pragmatic how-tos. As in cooking, we combine multiple ingredients, in this case technologies. The recipes keep things as simple as possible while exploring the full range of possibilities offered by the UP42 data platform.

In this first data recipe, we go through the following steps:

  1. Searching for data.
  2. Placing an order to acquire the data selected in step 1.
  3. Getting notified via a webhook when the data ordered in step 2 is delivered.
  4. Repeating from step 1 to order more data.

  • Technical skill level required: beginner to intermediate.

Requirements (Ingredients)

  • A Pipedream account: there are several ways to create one. One of the easiest is to authorize the Pipedream application in your GitHub account; your GitHub identity is then used throughout Pipedream.

  • An ngrok account: as with Pipedream, there are several ways to create one. Connecting your ngrok account to your GitHub account is one of the easiest ways to get started.

  • Minimal knowledge of Linux containers in general and Docker in particular.

  • Minimal knowledge of nginx.

Installation

Let's clone the repository first:

git clone https://github.com/up42/data-recipes.git

Setting up nginx locally

Launching our nginx instance

After cloning the repository and making sure that the requirements are met, we can launch nginx with the command:

make run

This make target will do the following:

  • Fetch the nginx mainline Docker image based on Alpine Linux (nginx:mainline-alpine), if it is not already in your local image cache.

  • Bind mount the nginx/nginx.conf file from the host to /etc/nginx/nginx.conf in the container.

  • Publish port 80 of the container on port 9898 of the host.

If the image is not available locally, it is pulled from Docker Hub first; the container then runs in the background (detached mode).
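The repository's Makefile is not reproduced in this post, so as a hedged illustration, the three steps of the make target map onto a single docker run call roughly like the one built below. The helper name docker_run_args and the absolute-path resolution are our assumptions, not the repository's code.

```python
from pathlib import Path
from typing import List


def docker_run_args(conf_path: str, host_port: int = 9898) -> List[str]:
    """Build a docker run command that mirrors what `make run` is described to do.

    Hypothetical helper: the actual Makefile target may use different flags.
    """
    # Docker bind mounts expect an absolute host path.
    host_conf = str(Path(conf_path).resolve())
    return [
        "docker", "run",
        "--detach",                                         # run in the background
        "--publish", f"{host_port}:80",                     # host port -> container port 80
        "--volume", f"{host_conf}:/etc/nginx/nginx.conf",   # bind mount the nginx config
        "nginx:mainline-alpine",                            # pulled from Docker Hub if missing
    ]
```

Passing the result to subprocess.run(docker_run_args("nginx/nginx.conf"), check=True) from the directory that contains nginx/nginx.conf would start a comparable container. docker run pulls a missing image on its own, which is why no separate pull step is needed.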

If you now list the running containers, you should be able to see the container up and running.

make list

The output should be something like:

docker ps -f "publish=9898"
CONTAINER ID   IMAGE                   COMMAND                  CREATED      STATUS      PORTS                  NAMES
037016f673cf   nginx:mainline-alpine   "/docker-entrypoint.…"   3 days ago   Up 3 days   0.0.0.0:9898->80/tcp   adoring_driscoll

If you open the URL http://localhost:9898 in your browser, you should see the default nginx page:

(Screenshot: nginx default page)

Highlights of the nginx configuration [optional]

Let's have a look at the relevant parts of the nginx configuration.


http {
  ## For storing webhooks only allow GET requests.
  map $request_method $not_allowed_method {
    GET 0;
    default 1;
  }

  ## Deal with the information provided in the header relative to the UP42 data order.
  map $http_up42_order_info $valid_order_info {
    default 0;
    ~*^(?<order_id>[[:xdigit:]\-]+),[[:space:]]?(?<order_status>(fulfilled|failed_permanently))$ '"$order_id","$order_status"';
  }

  ## Log all order information as JSONL (https://jsonlines.org).
  log_format order_logs '["$date_gmt", "$order_id", "$order_status"]';

  server {
    listen 80;

    ## Only GET is allowed here.
    if ($not_allowed_method) {
      return 405 '{"code": 405, "msg": "method not allowed"}';
    }

    location / {
      root /usr/share/nginx/html;
      index index.html index.htm;
    }

    location /input {
      default_type application/json;

      ## Check to see if the received order information is valid.
      if ($valid_order_info = 0) {
        return 400 '{"code": 400, "msg": "Incorrect order ID and/or status"}';
      }

      ## Log all orders that are properly communicated.
      access_log /var/log/nginx/up42_order_log.jsonl order_logs if=$valid_order_info;

      ## Echo back the received order information.
      return 200 '{"code": 200, "msg": [$valid_order_info]}';
    }

    ## Get rid of the log polluting favicon.ico 404 error.
    location = /favicon.ico {
      access_log off;

      root /usr/local/openresty/nginx/html;
      try_files /favicon.ico =204;
    }

    # Ping location to find out if the server is up and running.
    location ~* /ping {
      default_type text/plain;
      return 200 'PONG';
    }
  }
}

HTTP context

  map $request_method $not_allowed_method {
    GET 0;
    default 1;
  }

The map directive sets $not_allowed_method to 0 for GET requests and to 1 for any other method, so that GET is the only accepted HTTP method.

  map $http_up42_order_info $valid_order_info {
    default 0;
    ~*^(?<order_id>[[:xdigit:]\-]+),[[:space:]]?(?<order_status>(fulfilled|failed_permanently))$ '"$order_id","$order_status"';
  }

We intercept the UP42-Order-Info header field (which nginx exposes as $http_up42_order_info), a comma-separated list of the order ID and the order status, and store its parts in two named captures: $order_id and $order_status. The names are self-explanatory in terms of what they represent for the UP42 webhook. Bear in mind that we are tracking order execution, so we need both the order ID and the corresponding status. The match is case-insensitive (~*) and only accepts the statuses fulfilled and failed_permanently. If the header is missing or malformed, $valid_order_info is 0; otherwise it holds the string "$order_id","$order_status", with both values quoted.
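To see which header values pass this check, the map's regular expression can be mirrored in Python. This is a hedged approximation: Python's re module has no POSIX classes, so [[:xdigit:]\-] becomes [0-9A-Fa-f-], [[:space:]] becomes \s, and the ~* modifier becomes re.IGNORECASE. The function name valid_order_info is ours.

```python
import re
from typing import Optional

# Python translation of the nginx map regex (assumed equivalent for ASCII input).
ORDER_INFO_RE = re.compile(
    r"^(?P<order_id>[0-9A-Fa-f-]+),\s?(?P<order_status>(fulfilled|failed_permanently))$",
    re.IGNORECASE,
)


def valid_order_info(header_value: Optional[str]) -> str:
    """Return the value nginx would assign to $valid_order_info for this header."""
    if header_value is None:
        return "0"  # missing header: the map falls back to its default
    match = ORDER_INFO_RE.match(header_value)
    if match is None:
        return "0"  # malformed header: the map falls back to its default
    # Same shape as the map's value: '"$order_id","$order_status"'.
    return f'"{match["order_id"]}","{match["order_status"]}"'
```

For instance, valid_order_info("abc123, FULFILLED") returns '"abc123","FULFILLED"' (the captured status keeps its original case), while a status such as RUNNING yields "0" and would lead to the 400 response described below.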

  log_format order_logs '["$date_gmt", "$order_id", "$order_status"]';

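The log format writes each entry as a JSON array holding the GMT date, the order ID and the order status. With one such array per line, the log file follows the JSON Lines format.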
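Because every logged line is a JSON array, the log can be read back with a plain JSON parser, one line at a time. A minimal sketch, assuming the log has been copied out of the container first (for example with docker cp); the function name and the dictionary keys it produces are our own.

```python
import json
from typing import Dict, Iterable, Iterator


def read_order_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one record per non-empty JSON Lines entry of the order log."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        # Each line is ["$date_gmt", "$order_id", "$order_status"].
        date_gmt, order_id, order_status = json.loads(line)
        yield {"date_gmt": date_gmt, "order_id": order_id, "order_status": order_status}
```

Usage would be along the lines of iterating over read_order_log(f) for an opened up42_order_log.jsonl file. The log_format directive also offers escape=json, but it is not strictly required here: the map only lets hex digits, dashes and the two status words through, so no quotes or backslashes can end up in the logged values.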

Server context

    ## Only GET is allowed here.
    if ($not_allowed_method) {
      return 405 '{"code": 405, "msg": "method not allowed"}';
    }

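Any request whose method is not GET is rejected with a 405 (Method Not Allowed) status and a short JSON error body. Since the check sits in the server context, it applies to every location.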

    location /input {
      default_type application/json;

      ## Check to see if the received order information is valid.
      if ($valid_order_info = 0) {
        return 400 '{"code": 400, "msg": "Incorrect order ID and/or status"}';
      }

      ## Log all orders that are properly communicated.
      access_log /var/log/nginx/up42_order_log.jsonl order_logs if=$valid_order_info;

      ## Echo back the received order information.
      return 200 '{"code": 200, "msg": [$valid_order_info]}';
    }

If the header with the order information is either missing or malformed, we signal an error with a 400 status.

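If the order information is valid, nginx writes a line to /var/log/nginx/up42_order_log.jsonl using the order_logs format (the if= parameter of access_log skips logging whenever $valid_order_info is 0) and echoes $valid_order_info back in the response body, e.g. {"code": 200, "msg": ["<order ID>","<status>"]}.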

    location ~* /ping {
      default_type text/plain;
      return 200 'PONG';
    }

The /ping location replies with a 200 status and the plain-text body PONG, which confirms that the server is up and running.
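With these locations in place, the server's behavior can be checked from Python using only the standard library. A hedged sketch: BASE_URL assumes the local port 9898 used throughout this post, the helper names send and smoke_test are hypothetical, and the order ID in the last check is made up.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple

# Assumption: the local nginx container is published on port 9898.
BASE_URL = "http://localhost:9898"


def send(path: str, method: str = "GET", headers: Optional[Dict[str, str]] = None,
         base_url: str = BASE_URL) -> Tuple[int, str]:
    """Send one request and return (status code, body), including 4xx/5xx replies."""
    req = urllib.request.Request(base_url + path, method=method, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; the error body is still readable.
        return err.code, err.read().decode()


def smoke_test(base_url: str = BASE_URL) -> None:
    """Raise AssertionError if the server does not behave as configured."""
    assert send("/ping", base_url=base_url) == (200, "PONG")
    # Non-GET methods are rejected in the server context.
    assert send("/ping", method="POST", base_url=base_url)[0] == 405
    # /input without the UP42-Order-Info header gets a 400.
    assert send("/input", base_url=base_url)[0] == 400
    # A well-formed header (hypothetical order ID) is accepted.
    status, _ = send("/input", headers={"UP42-Order-Info": "abc123, FULFILLED"},
                     base_url=base_url)
    assert status == 200
```

Calling smoke_test() while the container is running should return silently. Pointing base_url at the ngrok URL instead would also require sending the basic-auth credentials configured for the tunnel.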

Setting up ngrok

The free version of ngrok allows only one active tunnel at a time.

Setting up ngrok was covered in detail in a previous blog post.

Creating a tunnel from the ngrok server to our nginx instance

With ngrok set up, we open the tunnel with:

ngrok http --basic-auth='<username>:<password>' http://localhost:9898

Where:

  • username: username for the basic authentication.
  • password: password for the basic authentication.

The last argument is the address of the local nginx instance, which in this case listens on localhost, port 9898.

Once launched, ngrok takes over the terminal and shows a console view of the running tunnel, including the public forwarding URL that points to the local nginx instance.
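Because the tunnel is protected with HTTP Basic authentication, any client calling the public ngrok URL has to send an Authorization header. The Pipedream snippet lets httpx_auth build that header; the sketch below spells out what it contains, following RFC 7617 (the base64 encoding of username:password). Both function names are hypothetical, and public_url stands for the forwarding URL printed by ngrok.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def ping_through_tunnel(public_url: str, username: str, password: str) -> str:
    """GET /ping through the ngrok tunnel and return the response body."""
    req = urllib.request.Request(
        f"{public_url}/ping",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode()
```

With the example credentials from RFC 7617, user Aladdin and password open sesame, basic_auth_header returns Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==. A correct tunnel setup should make ping_through_tunnel return PONG.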

For additional details on running ngrok, please refer to a previous blog post.

Setting up Pipedream

In a previous article, we explained in detail how to set up a Pipedream workflow. Building on that explanation, we now set up a workflow that talks to the local nginx instance we launched earlier.

Setup of the required Pipedream workflow and environment variables

Setup of the Pipedream workflow is covered extensively in the previous blog post.

The setup done there carries over, except for the environment variable holding the webhook secret, which is named up42_webhook_secret_drei.

(Screenshot: webhook secret as a Pipedream environment variable)
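Before running the workflow, it can help to confirm that every environment variable the snippets rely on is defined. The secret name comes from the paragraph above and the other three names from the forwarder snippet shown further down; the helper itself is hypothetical and not part of the repository.

```python
import os
from typing import List, Mapping

# Variable names used in this post; adjust if your workflow differs.
REQUIRED_VARS = (
    "up42_webhook_secret_drei",  # webhook secret, presumably read by the validation snippet
    "local_dev_host",            # base URL of the local server, reached through ngrok
    "ngrok_basic_auth_user",     # username passed to ngrok --basic-auth
    "ngrok_basic_auth_pass",     # password passed to ngrok --basic-auth
)


def missing_env_vars(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

An empty list means the workflow has every variable it reads.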

Pipedream HTTP client to forward UP42 order information to ngrok

The Python code to run on Pipedream is in the snippets directory.

The snippet named pipedream_up42_webhook_order_status_handler.py validates the webhook request, as described elsewhere.

The snippet named pipedream_order_info_http_forwarder.py creates the UP42-Order-Info header field and forwards the request to our local nginx instance.

# Simple HTTP client that forwards order-related webhook information,
# as a custom HTTP header, to our development server running
# locally. It receives the webhook body from the previous step, which
# validates the webhook.

# pipedream serialized objects.
from pipedream.script_helpers import (steps, export)

import os

from contextlib import suppress
# httpx and friends.
import httpx
from httpx_auth import Basic

def get_url(u):
    """Return the local development url.

    """
    return f"{os.environ['local_dev_host']}{u}"

# The user and password for the ngrok basic authentication
# are set as pipedream environment variables.
ngrok_basic_auth = Basic(os.environ["ngrok_basic_auth_user"], os.environ["ngrok_basic_auth_pass"])

# Test that the local server is up and running.
r = httpx.get(get_url("/ping"), auth=ngrok_basic_auth)
# If we get the proper response proceed.
assert r.status_code == 200 and r.text == "PONG", f"{os.environ['local_dev_host']} is not reachable."

# Issue the GET request within a client context.
with httpx.Client() as client:
    try:
        # Build a header that packs the order ID and status.
        body = steps["trigger"]["event"]["body"]["body"]
        up42_order_info_header = {"UP42-Order-Info": f"{body['orderId']}, {body['status']}"}
        # Request the URL from the local server that will log the order status.
        r = client.get(get_url("/input"), auth=ngrok_basic_auth, headers=up42_order_info_header)
        # Raise an exception for any non 2XX status code.
        r.raise_for_status()
    except httpx.HTTPStatusError as exc:
        print(f"Error {exc.response.status_code} while requesting {exc.request.url!r}.")
    except httpx.HTTPError as exc:
        print(f"Error while requesting {exc.request.url!r}.")

    # Export the response body as JSON. If the body is not JSON
    # (ValueError), skip the export instead of failing the step.
    with suppress(ValueError):
        export("ngrok_response", r.json())
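The part of the forwarder that matters to nginx is the header it builds. Pulling that step into a plain function makes it testable outside Pipedream. A hedged sketch: the steps["trigger"]["event"]["body"]["body"] path and the orderId and status keys come from the snippet above, while the function name order_info_header is our own.

```python
from typing import Any, Dict


def order_info_header(steps: Dict[str, Any]) -> Dict[str, str]:
    """Build the UP42-Order-Info header from the Pipedream trigger step."""
    body = steps["trigger"]["event"]["body"]["body"]
    # Same "<order ID>, <status>" shape that the nginx map parses.
    return {"UP42-Order-Info": f"{body['orderId']}, {body['status']}"}
```

Keep in mind that the nginx map only accepts the statuses fulfilled and failed_permanently. For any other status, the forwarder's request gets a 400, raise_for_status() makes the snippet print an error, and the exported ngrok_response then holds nginx's 400 JSON body.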

Carrying on with the SDK and a Jupyter notebook

From here on, we continue this recipe exclusively in the Jupyter notebook.

Image credits

Kitchen counter by Andy Chilton on Unsplash.


António Almeida

Senior Tech Evangelist
