This blog post inaugurates a new series that explores the UP42 data platform through simple recipes.
The concept of recipes borrows from cookbooks: the idea is to provide developers with pragmatic how-tos. As in cooking, we draw on multiple ingredients, in this case technologies. The recipes keep things as simple as possible while exploring the full range of possibilities offered by the UP42 data platform.
In this first data recipe, we go through the following steps:
- Searching for data.
- Placing an order to acquire the data selected in step 1.
- Getting notified via a webhook when the data ordered in step 2 is delivered.
- Repeating from step 1 to order more data.
Technical skill level required: beginner to intermediate.
Requirements (Ingredients)
- A Pipedream account: there are several ways to create one. One of the easiest options is to authorize the application with your GitHub account; your GitHub identity is then used throughout Pipedream.
- An ngrok account: as with Pipedream, there are several ways to create an account. Connecting it to your GitHub account is one of the easiest ways to get started.
- Minimal knowledge of Linux containers in general and Docker in particular.
- Minimal knowledge of nginx.
Installation
Let's clone the repository first:
git clone https://github.com/up42/data-recipes.git
Setting up nginx locally
Launching our nginx instance
After cloning the repository and making sure that the requirements are met, we can now launch nginx. To do so, we issue the command:
make run
This make target does the following:
- Fetches the nginx mainline Docker image based on Alpine Linux, if it's not already present in your local image cache.
- Bind mounts the nginx/nginx.conf file from the host to /etc/nginx/nginx.conf in the container.
- Publishes port 80 of the container, mapped to port 9898 on the host.
If the image isn't available locally, it is pulled from Docker Hub; in either case, the container then runs in the background (detached).
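For readers curious about what happens under the hood, here is a minimal Python sketch that assembles a roughly equivalent docker run command line. It is an assumption based on the three steps listed above, not a copy of the repository's Makefile; the helper name docker_run_argv and the read-only :ro suffix on the bind mount are our own choices.

```python
import os
import shlex
from typing import List


def docker_run_argv(conf_path: str, host_port: int = 9898,
                    image: str = "nginx:mainline-alpine") -> List[str]:
    """Build a docker run command line resembling the make run target (hypothetical)."""
    return [
        "docker", "run",
        "-d",                                  # detached: run in the background
        "-p", f"{host_port}:80",               # publish container port 80 on host_port
        # Bind mount the config; read-only (:ro) is our assumption, not the Makefile's.
        "-v", f"{os.path.abspath(conf_path)}:/etc/nginx/nginx.conf:ro",
        image,                                 # docker pulls it from Docker Hub if not cached
    ]


if __name__ == "__main__":
    # Print a shell-ready version of the command, e.g. to compare with the Makefile.
    print(shlex.join(docker_run_argv("nginx/nginx.conf")))
```

The host path is made absolute because docker run -v expects an absolute path for a bind mount; passing the list to subprocess.run(..., check=True) would execute the command directly.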
If you now list the running containers, you should see the nginx container:
make list
The output should be something like:
docker ps -f "publish=9898"
CONTAINER ID   IMAGE                   COMMAND                  CREATED      STATUS      PORTS                  NAMES
037016f673cf   nginx:mainline-alpine   "/docker-entrypoint.…"   3 days ago   Up 3 days   0.0.0.0:9898->80/tcp   adoring_driscoll
If you open the URL http://localhost:9898 in your browser, you should see the default nginx page:
nginx default page
Highlights of the nginx configuration [optional]
Let's have a look at the relevant parts of the nginx configuration.
http {
    ## For storing webhooks only allow GET requests.
    map $request_method $not_allowed_method {
        GET 0;
        default 1;
    }
    ## Deal with the information provided in the header relative to the UP42 data order.
    map $http_up42_order_info $valid_order_info {
        default 0;
        ~*^(?<order_id>[[:xdigit:]\-]+),[[:space:]]?(?<order_status>(fulfilled|failed_permanently))$ '"$order_id","$order_status"';
    }
    ## Log all order information as JSONL (https://jsonlines.org).
    log_format order_logs '["$date_gmt", "$order_id", "$order_status"]';
    server {
        listen 80;
        ## Only GET is allowed here.
        if ($not_allowed_method) {
            return 405 '{"code": 405, "msg": "method not allowed"}';
        }
        location / {
            root /usr/share/nginx/html;
            index index.html index.htm;
        }
        location /input {
            default_type application/json;
            ## Check to see if the received order information is valid.
            if ($valid_order_info = 0) {
                return 400 '{"code": 400, "msg": "Incorrect order ID and/or status"}';
            }
            ## Log all orders that are properly communicated.
            access_log /var/log/nginx/up42_order_log.jsonl order_logs if=$valid_order_info;
            ## Echo back the received order information.
            return 200 '{"code": 200, "msg": [$valid_order_info]}';
        }
        ## Get rid of the log-polluting favicon.ico 404 errors.
        location = /favicon.ico {
            access_log off;
            root /usr/share/nginx/html;
            try_files /favicon.ico =204;
        }
        ## Ping location to find out if the server is up and running.
        location ~* /ping {
            default_type text/plain;
            return 200 'PONG';
        }
    }
}
HTTP context
map $request_method $not_allowed_method {
GET 0;
default 1;
}
This map directive sets $not_allowed_method to 0 for GET requests and to 1 for any other method, which lets the server accept GET as the only HTTP method.
map $http_up42_order_info $valid_order_info {
default 0;
~*^(?<order_id>[[:xdigit:]\-]+),[[:space:]]?(?<order_status>(fulfilled|failed_permanently))$ '"$order_id","$order_status"';
}
We intercept the UP42-Order-Info header field, whose value is a comma-separated pair, and store its two parts in the named captures $order_id and $order_status. The names are self-explanatory in terms of what they represent for the UP42 webhook: since we are tracking order execution, we need both the order ID and the corresponding status. The order ID may contain only hexadecimal digits and dashes, and the status must be either fulfilled or failed_permanently, matched case-insensitively. If the header is either missing or malformed, $valid_order_info is 0; otherwise, its value is the string "$order_id","$order_status".
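To see which header values the map accepts, here is a small Python port of the same regular expression. Python's re module does not support POSIX classes such as [[:xdigit:]], so they are translated by hand (an assumption on our side): [[:xdigit:]\-] becomes [0-9a-f-], [[:space:]] becomes \s with re.ASCII, and nginx's case-insensitive ~* becomes re.IGNORECASE. The function name valid_order_info is ours; it returns what nginx would assign to $valid_order_info.

```python
import re
from typing import Optional, Union

# Hand translation of the nginx map regex to Python's re syntax (assumption).
ORDER_INFO_RE = re.compile(
    r"^(?P<order_id>[0-9a-f-]+),\s?(?P<order_status>(fulfilled|failed_permanently))$",
    re.IGNORECASE | re.ASCII,  # ~* in nginx means a case-insensitive match
)


def valid_order_info(header_value: Optional[str]) -> Union[int, str]:
    """Return the value nginx would store in $valid_order_info for this header."""
    if header_value is None:           # header missing: the map falls back to default 0
        return 0
    match = ORDER_INFO_RE.match(header_value)
    if match is None:                  # header malformed: default 0 as well
        return 0
    return f'"{match["order_id"]}","{match["order_status"]}"'
```

For example, valid_order_info("abc-123, fulfilled") gives '"abc-123","fulfilled"', while a status such as running, or an ID containing non-hexadecimal characters, gives 0.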
log_format order_logs '["$date_gmt", "$order_id", "$order_status"]';
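Each log entry is a JSON array holding the date, the order ID and the order status, written one entry per line, which makes the log file a JSON Lines file.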
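Since the order ID can only contain hexadecimal digits and dashes and the status is one of two fixed words, each logged line should be a valid JSON array that can be read back with json.loads. The sketch below assumes you have copied the log out of the container first (for example with docker cp); the names read_order_log and OrderLogEntry are hypothetical, and the date is kept as a plain string because its exact format is not parsed here.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class OrderLogEntry(NamedTuple):
    date_gmt: str       # kept verbatim; the $date_gmt format is not parsed here
    order_id: str
    order_status: str


def read_order_log(lines: Iterable[str]) -> Iterator[OrderLogEntry]:
    """Yield one entry per non-empty JSON Lines record written with order_logs."""
    for line in lines:
        line = line.strip()
        if not line:                   # skip blank lines, e.g. a trailing newline
            continue
        date_gmt, order_id, order_status = json.loads(line)
        yield OrderLogEntry(date_gmt, order_id, order_status)


# Usage with a log file copied out of the container (hypothetical local path):
# with open("up42_order_log.jsonl") as f:
#     for entry in read_order_log(f):
#         print(entry.order_id, entry.order_status)
```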
Server context
## Only GET is allowed here.
if ($not_allowed_method) {
return 405 '{"code": 405, "msg": "method not allowed"}';
}
Any request that is not GET is declined with a 405 status.
location /input {
default_type application/json;
## Check to see if the received order information is valid.
if ($valid_order_info = 0) {
return 400 '{"code": 400, "msg": "Incorrect order ID and/or status"}';
}
## Log all orders that are properly communicated.
access_log /var/log/nginx/up42_order_log.jsonl order_logs if=$valid_order_info;
## Echo back the received order information.
return 200 '{"code": 200, "msg": [$valid_order_info]}';
}
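If the header with the order information is either missing or malformed, we signal an error with a 400 status. If the order information is valid, the request is logged as one JSON line in /var/log/nginx/up42_order_log.jsonl using the order_logs format, and the value of $valid_order_info is echoed back in the response body.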
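The order of these checks matters: the server-level if rejects non-GET methods before any location is selected, and only then does /input inspect the header. The Python function below is a rough model of that decision flow, written for illustration only; it is a simplification of nginx (it ignores the / and /favicon.ico locations), and the name route_request is hypothetical.

```python
import re
from typing import Optional, Tuple

# Hand translation of the map regex ([[:xdigit:]\-] -> [0-9a-f-], ~* -> IGNORECASE).
ORDER_INFO_RE = re.compile(
    r"^([0-9a-f-]+),\s?(fulfilled|failed_permanently)$", re.IGNORECASE | re.ASCII
)


def route_request(method: str, path: str,
                  order_info: Optional[str] = None) -> Tuple[int, str]:
    """Return (status, body) roughly as the nginx server block would (simplified)."""
    # Server context: the method check runs before any location is selected.
    if method != "GET":
        return 405, '{"code": 405, "msg": "method not allowed"}'
    # Regex locations (~* /ping) take precedence over plain prefix locations like /input.
    if re.search(r"/ping", path, re.IGNORECASE):
        return 200, "PONG"
    if path.startswith("/input"):
        match = ORDER_INFO_RE.match(order_info or "")
        if match is None:
            return 400, '{"code": 400, "msg": "Incorrect order ID and/or status"}'
        # This is the point where nginx also appends the JSON line to the order log.
        return 200, f'{{"code": 200, "msg": ["{match[1]}","{match[2]}"]}}'
    raise NotImplementedError("the / and /favicon.ico locations are not modelled here")
```

For instance, route_request("POST", "/input", "abc-123, fulfilled") yields a 405 even though the header is valid, because the method check comes first.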
location ~* /ping {
default_type text/plain;
return 200 'PONG';
}
We send a 200 status with PONG
to confirm that the server is up and running.
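Before opening the tunnel, it can be handy to check the ping location from a script rather than a browser. Here is a minimal standard-library sketch, assuming nginx is published on localhost port 9898 as set up earlier; the function name is_server_up is ours.

```python
import urllib.error
import urllib.request


def is_server_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/ping answers 200 with the body PONG."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode() == "PONG"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2XX answer (HTTPError is a URLError).
        return False
```

With the container running, is_server_up("http://localhost:9898") should return True; it returns False once the container is stopped.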
Setting up ngrok
With the free version of ngrok, you can have only one tunnel active at a time.
Setting up ngrok was covered in detail in a previous blog post.
Creating a tunnel from the ngrok server to our nginx instance
Once ngrok is set up, we run:
ngrok http --basic-auth='<username>:<password>' http://localhost:9898
Where:
- username: the username for basic authentication.
- password: the password for basic authentication.
The last argument is the address of the local nginx instance, which in this case listens on localhost, port 9898.
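Basic authentication sends the username and password, joined by a colon and Base64-encoded, in the Authorization header (RFC 7617); any client that calls the tunnel, Pipedream included, has to supply it. A minimal standard-library sketch that prepares such a request follows; the tunnel URL and credentials are placeholders, and the helper names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def ping_request(tunnel_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for /ping through the tunnel."""
    req = urllib.request.Request(f"{tunnel_url}/ping", method="GET")
    req.add_header("Authorization", basic_auth_header(username, password))
    return req


# Placeholder values: replace with your ngrok forwarding URL and credentials.
req = ping_request("https://example.ngrok.app", "<username>", "<password>")
# urllib.request.urlopen(req) would send it; the expected body is PONG.
```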
Once launched, ngrok takes over the terminal and shows a console view of the currently running tunnel.
For additional details on running ngrok, please refer to a previous blog post.
Setting up Pipedream
In a previous article, we explained in detail how to set up a Pipedream workflow. Building on that explanation, we now set up a workflow for the local nginx instance we launched earlier.
Setup of the required Pipedream workflow and environment variables
Setting up the Pipedream workflow is covered extensively in the previous blog post. The setup done there carries over, except for the environment variable holding the webhook secret, which is named up42_webhook_secret_drei.
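Besides up42_webhook_secret_drei, the forwarding snippet in the next section reads local_dev_host, ngrok_basic_auth_user and ngrok_basic_auth_pass from the environment. A quick way to catch a typo in one of these names before the first webhook arrives is a check like the sketch below; the helper name missing_env_vars is hypothetical, and treating local_dev_host as the ngrok forwarding URL is our reading of the snippet.

```python
import os
from typing import Iterable, List, Mapping, Optional

REQUIRED_VARS = (
    "up42_webhook_secret_drei",   # webhook secret used by this recipe
    "local_dev_host",             # base URL of the ngrok tunnel (assumed)
    "ngrok_basic_auth_user",      # ngrok basic authentication user
    "ngrok_basic_auth_pass",      # ngrok basic authentication password
)


def missing_env_vars(names: Iterable[str] = REQUIRED_VARS,
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    print(missing_env_vars() or "All required variables are set.")
```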
Webhook secret Pipedream environment variable
Pipedream HTTP client to forward UP42 order information to ngrok
The Python code to run on Pipedream is in the snippets directory.
The snippet named pipedream_up42_webhook_order_status_handler.py validates the webhook request, as described elsewhere.
The snippet named pipedream_order_info_http_forwarder.py creates the UP42-Order-Info header field and forwards the request to our local nginx instance.
# Simple HTTP client to forward order-related webhook information,
# as a custom HTTP header, to our development server running
# locally. It receives the webhook body from the previous step,
# which validates the webhook.
# Pipedream serialized objects.
from pipedream.script_helpers import (steps, export)
import os
from contextlib import suppress
# httpx and friends.
import httpx
from httpx_auth import Basic


def get_url(u):
    """Return the local development URL for the path u.
    """
    return f"{os.environ['local_dev_host']}{u}"


# The user and password for the ngrok basic authentication
# are set as Pipedream environment variables.
ngrok_basic_auth = Basic(os.environ["ngrok_basic_auth_user"], os.environ["ngrok_basic_auth_pass"])
# Test that the local server is up and running.
ping = httpx.get(get_url("/ping"), auth=ngrok_basic_auth)
# Only proceed if we get the expected response.
assert ping.status_code == 200 and ping.text == "PONG", f"{os.environ['local_dev_host']} is not reachable."
# Stays None if the request below fails before a response arrives.
r = None
# Issue the GET request in the given client context.
with httpx.Client() as client:
    try:
        # Build a header that packs the order status information.
        body = steps["trigger"]["event"]["body"]["body"]
        up42_order_info_header = {"UP42-Order-Info": f"{body['orderId']}, {body['status']}"}
        # Request the URL from the local server that will log the order status.
        r = client.get(get_url("/input"), auth=ngrok_basic_auth, headers=up42_order_info_header)
        # Raise an exception for any non-2XX status code.
        r.raise_for_status()
    except httpx.HTTPStatusError as exc:
        print(f"Error {exc.response.status_code} while requesting {exc.request.url!r}.")
    except httpx.HTTPError as exc:
        print(f"Error while requesting {exc.request.url!r}.")
# Export the response body, if any. Export nothing when there is no
# response or when the body is not JSON (ValueError).
if r is not None:
    with suppress(ValueError):
        export("ngrok_response", r.json())
Carrying on with the SDK and a Jupyter notebook
From here on, we continue this recipe exclusively in the Jupyter notebook.
Image credits
Kitchen counter by Andy Chilton on Unsplash.