When working with geospatial data, especially satellite imagery, some basic tasks, such as time series analysis or mosaicking, can become quite cumbersome at scale. Imagine we want to apply an analysis to data from different times, different places, or both. Running the cases one after another is certainly an option as long as there are only a few of them. However, as their number grows, repeating the same step becomes very time consuming and can introduce errors into the analysis. Furthermore, accessing and analyzing data can be expensive, and without being able to check the availability and quality of data beforehand, you might spend money without achieving the outcome you wanted.
As data scientists, we have some tricks up our sleeve that can alleviate some of the friction in these processes. For example, we can access data, run analyses, or execute tasks, such as applying a preprocessing algorithm, in parallel. Furthermore, where possible, we use tools that let us validate the availability and quality of the data needed for our analysis before we purchase it.
Both capabilities, parallelizing tasks and validating the availability and quality of data, are now available natively through the UP42 Python SDK with the launch of our Parallelization and Quicklooks features, respectively. If you are new to the UP42 platform and haven't heard about the UP42 Python SDK, you can have a look at this blog post.
In a nutshell, the UP42 Python SDK will allow you to customize, automate, and integrate your geospatial analysis easily while leveraging UP42's robust infrastructure.
The Fast: Parallelization of Jobs
Generally, you might only be interested in running one or two workflows at a time. In some cases, however, you might want to run many jobs in parallel, for instance when you want to do time series analysis on images of one location or apply the same processing algorithm to images from different locations.
In these cases, running each job one by one can become quite a pain. Enter the new feature of our Python SDK, called Parallelization.
As the name suggests, this allows you to run more than one job at a time. You can think of it as a batch of jobs that run simultaneously. You set the maximum number of jobs per batch in your project settings, either via the UP42 console or by updating it through the SDK (see below). If you set the maximum to ten, the first ten jobs are executed simultaneously. As soon as these ten jobs finish, the next ten jobs are executed, and so on.
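To make the batching behavior concrete, here is a minimal plain-Python sketch of the scheduling described above. It does not call the UP42 SDK: the split_into_batches helper and the job names are hypothetical and only illustrate how a limit of ten would split 25 jobs into batches.

```python
from typing import List, Sequence


def split_into_batches(jobs: Sequence[str], max_concurrent_jobs: int) -> List[List[str]]:
    """Group jobs into consecutive batches of at most max_concurrent_jobs."""
    if max_concurrent_jobs < 1:
        raise ValueError("max_concurrent_jobs must be at least 1")
    return [
        list(jobs[start:start + max_concurrent_jobs])
        for start in range(0, len(jobs), max_concurrent_jobs)
    ]


# Hypothetical job names, used only for illustration.
jobs = [f"job_{i}" for i in range(25)]
batches = split_into_batches(jobs, max_concurrent_jobs=10)

for number, batch in enumerate(batches, start=1):
    print(f"Batch {number}: {len(batch)} jobs")
```

With 25 jobs and a limit of ten, this yields two full batches and a final batch with the remaining five jobs.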
Below is an example of running two jobs using parallelization:
import up42
up42.authenticate(project_id="12345", project_api_key="6889")
project = up42.initialize_project()
# Do the following step to be able to use the parallelization feature
project.update_project_settings(max_concurrent_jobs=10)
workflow = project.create_workflow("workflow_airports", use_existing=True)
selected_block = "sobloo-s2-l1c-aoiclipped"
workflow.add_workflow_tasks([selected_block])
workflow.get_workflow_tasks(basic=True)
# geom is assumed to be a GeoDataFrame with one geometry per job;
# the file name below is a placeholder, not part of the original example
geom = up42.read_vector_file("airports.geojson", as_dataframe=True)
# Construct the parameters for the parallel jobs
input_parameters_list = workflow.construct_parameters_parallel(
    geometries=geom.geometry.to_list(),
    interval_dates=[("2018-01-01", "2020-12-31")],
    geometry_operation="bbox",
)
# Run the jobs in parallel and download the results
real_jobs = workflow.run_jobs_parallel(input_parameters_list=input_parameters_list)
real_jobs.download_results()
print("finished")
To run several jobs simultaneously, you need to create a list with one set of parameters per job (using construct_parameters_parallel) and pass this list to the run_jobs_parallel function.
Using this, you can save time, especially when you need to run a large number of analyses. Please be aware that while a batch of parallel jobs is running, you cannot start another job. Attempting to do so raises an error saying that you have reached the maximum number of concurrent jobs.
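To illustrate what a list of per-job parameters can look like, the plain-Python sketch below builds one parameter dictionary per combination of bounding box and date interval. It does not use the SDK: build_parameter_list, the dictionary keys, and the sample bounding boxes are hypothetical, and pairing every geometry with every interval is an assumption made for the example, not a description of the SDK's internals.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]
Interval = Tuple[str, str]


def build_parameter_list(
    block_name: str,
    bboxes: Sequence[BBox],
    intervals: Sequence[Interval],
) -> List[Dict]:
    """Return one parameter dictionary per (bounding box, interval) pair."""
    return [
        {block_name: {"bbox": list(bbox), "time": f"{start}/{end}"}}
        for bbox, (start, end) in product(bboxes, intervals)
    ]


# Made-up bounding boxes and intervals, for illustration only.
bboxes = [(13.0, 52.0, 13.1, 52.1), (2.3, 48.8, 2.4, 48.9)]
intervals = [("2018-01-01", "2018-12-31"), ("2019-01-01", "2019-12-31")]

parameter_list = build_parameter_list("sobloo-s2-l1c-aoiclipped", bboxes, intervals)
print(len(parameter_list))  # 2 bounding boxes x 2 intervals = 4 parameter sets
```

Each entry in the list corresponds to one job, so the length of the list tells you how many jobs would be started.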
The Fabulous: Quicklooks On The Map
When working with imagery, such as satellite or drone-based imagery, you generally want to validate image quality and ensure all your analysis requirements are met. For instance, you might need to have a cloud-free image, or you want to make sure that the selected image fully covers your area of interest (AOI). This validation step is crucial, especially when you are paying for the images.
Until now, the UP42 Python SDK offered the option to plot the extent of the images that intersect your AOI. This is useful when you want to check how an image overlaps your AOI. But what if you want a quick overview of the quality of the images?
This is why we have developed a new feature for the UP42 Python SDK to map quicklooks. This feature allows you to show a preview of all the quicklooks for the images that cover your AOI. This can be used within Jupyter Notebook to create an interactive map. You can also export an HTML file, which you will be able to open in your browser and have an overview of the selected images.
One example of the power of quicklooks is when you want to do mosaicking (for more information on what mosaicking is and what you can achieve by using it, have a look at this blog post). For mosaicking, you are interested in images that cover your AOI and have the best quality (for instance, minimum cloud cover, haze, or shadow). Here, a map of quicklooks gives you a quick overview of potential images and lets you check whether one has drastically different light, incidence angle, or color, which might affect the analysis, before you spend credits on purchasing them.
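The metadata part of this selection can be sketched in plain Python. The example below is a minimal illustration, not SDK code: the scene records, their field names (id, cloud_cover, covers_aoi), and the shortlist_scenes helper are made up for the example. It keeps scenes that fully cover the AOI and stay under a cloud cover threshold, then sorts them so the clearest candidates come first.

```python
from typing import Dict, List


def shortlist_scenes(scenes: List[Dict], max_cloud_cover: float) -> List[Dict]:
    """Keep scenes that cover the AOI and are clear enough, clearest first."""
    candidates = [
        scene
        for scene in scenes
        if scene["covers_aoi"] and scene["cloud_cover"] <= max_cloud_cover
    ]
    return sorted(candidates, key=lambda scene: scene["cloud_cover"])


# Made-up scene metadata, for illustration only.
scenes = [
    {"id": "scene_a", "cloud_cover": 12.0, "covers_aoi": True},
    {"id": "scene_b", "cloud_cover": 3.5, "covers_aoi": True},
    {"id": "scene_c", "cloud_cover": 1.0, "covers_aoi": False},
    {"id": "scene_d", "cloud_cover": 45.0, "covers_aoi": True},
]

shortlist = shortlist_scenes(scenes, max_cloud_cover=20)
print([scene["id"] for scene in shortlist])
```

A shortlist like this only narrows down the candidates. The quicklook map is still needed to spot haze, shadow, or color differences that the metadata does not capture.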
To use this, you can search the UP42 image archive to get all possible images covering your AOI. Below you will find a code snippet demonstrating how you can get a quicklook map via the UP42 Python SDK:
import up42
up42.authenticate(project_id="1234", project_api_key="ABCD")
catalog = up42.initialize_catalog()
# Read the area of interest (AOI)
aoi = up42.read_vector_file("dakar.geojson", as_dataframe=False)
# Search the archive for the ten least cloudy Pleiades scenes over the AOI
search_parameters = catalog.construct_parameters(
    geometry=aoi,
    start_date="2018-01-01",
    end_date="2020-12-31",
    sensors=["pleiades"],
    max_cloudcover=20,
    sortby="cloudCoverage",
    limit=10,
)
search_results = catalog.search(search_parameters=search_parameters)
# Download the quicklooks of the scenes that were found
catalog.download_quicklooks(image_ids=search_results.id.to_list(), sensor="pleiades")
# Show the quicklooks on an interactive map and export it as HTML
quicklook_map = catalog.map_quicklooks(scenes=search_results, aoi=aoi, save_html=".")
Here is how the quicklook map would look either in your Jupyter Notebook or browser:
Quicklook of the coast of Dakar
To view this quicklook in an interactive map in your browser, download the HTML file
Outlook
This blog post covers two new tools from the UP42 Python SDK: Parallelization and Quicklooks. It has been one year since the release of the Python SDK, and we are hard at work adding more useful and exciting features that help you develop more complex analyses on the UP42 infrastructure. Here are two improvements we have on the roadmap for the SDK:
Mock test objects for integration tests
This will enable developers to mock SDK objects that interact with the UP42 infrastructure (in the style of the moto library, which is used for mocking AWS infrastructure in tests).
Price estimation for jobs
This will help users to have a rough estimation of the cost of running a workflow. This will enable users to run analysis with the confidence that they will not exceed their available credits on the UP42 platform.