We are very proud to share that work by the UP42 Data Science team has resulted in a scientific publication for the first time.
Not only that, our work on super-resolution will be presented at the virtual event of the XXIV ISPRS Congress, where the 2020 presentations run from August 31 to September 2, 2020. The session on our paper will take place at 18:30 on August 31.
A link to the full paper can be found at the end of this article, but for those who are interested in the topic and cannot attend the virtual event, we want to share a short summary. The described algorithms are available via the Super-resolution Pleiades/SPOT block on UP42, so you can try them out yourself.
Super-resolution (SR) is the process of deriving images of higher resolution (HR) by applying an algorithm to a low resolution (LR) image. Single image SR approaches do so by using a single LR image; this is considered a classical problem in computer vision.
As with many other computer vision problems, SR approaches employing deep convolutional neural networks (CNNs) have outperformed other techniques over the last few years. SR can be used to improve the results of CNN-based detection of objects such as ships, airplanes, or cars.
To be of any value, a CNN-based solution has to be better than a standard upsampling operation. Below you will find, for comparison, the same image upsampled using bicubic resampling and using a CNN called Super-Resolution Residual Network (SRResNet, Ledig et al. 2017).
Almost all SR approaches coming out of the computer vision domain create HR/LR image pairs by simply downsampling the HR images. Visual comparison is one evaluation method, but only metrics allow us to compare results in a reliable manner. SR approaches are mostly evaluated using the metrics Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), which are also displayed at the top of the images above.
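To make the two ideas above concrete, here is a minimal, dependency-free sketch of creating an LR image by downsampling and of computing PSNR. It is an illustration only, not the code used in the paper: the function names `downsample` and `psnr` are our own, block averaging is just one possible downsampling scheme, and a real pipeline would work on arrays rather than nested lists.

```python
import math


def downsample(image, factor):
    """Create an LR image by averaging non-overlapping factor x factor blocks.

    `image` is a 2D list of numbers (one band). Block averaging is an
    assumption made for this sketch; other resampling kernels are common.
    """
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image size must be divisible by the factor")
    return [
        [
            sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ) / factor ** 2
            for c in range(cols // factor)
        ]
        for r in range(rows // factor)
    ]


def psnr(reference, estimate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized 2D images."""
    if len(reference) != len(estimate) or any(
        len(ref_row) != len(est_row)
        for ref_row, est_row in zip(reference, estimate)
    ):
        raise ValueError("images must have the same shape")
    squared_errors = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR means the super-resolved image is closer to the HR reference. Note that `max_value` has to match the data range of the images being compared, for example 255 for 8-bit data.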
As deep learning algorithms for super-resolution originated in the computer vision domain, they are primarily developed on RGB images in 8-bit color depth, where the distance from the sensor to the imaged object is several meters.
When applying these algorithms to satellite images, several challenges need to be addressed:
- Multispectral and hyperspectral data have a higher dimensionality, ranging from four to dozens of bands.
- Analytic satellite image products are calibrated so that they represent a physical unit, either surface reflectance or absolute radiance. These are encoded in 12 bits.
- Atmospheric conditions, haze, clouds and cloud shadows introduce additional variation in the measured values.
- Land cover characteristics vary globally to a high degree. A model trained on images of temperate forest areas in Europe might completely fail when applied to images of tropical forests in, for example, South-East Asia.
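The bit-depth point is easy to overlook: code written for 8-bit RGB often assumes a maximum value of 255. The sketch below shows one way to scale a band to the range [0, 1] based on its bit depth. This is an assumption for illustration, not necessarily the preprocessing used in the paper, and `scale_to_unit_range` is a hypothetical helper name.

```python
def scale_to_unit_range(band, bit_depth=12):
    """Scale integer pixel values of one band to floats in [0, 1].

    `band` is a 2D list of non-negative integers. Values outside the
    nominal range of the given bit depth are clipped before scaling.
    """
    max_value = 2 ** bit_depth - 1  # 4095 for 12-bit, 255 for 8-bit data
    return [
        [min(max(value, 0), max_value) / max_value for value in row]
        for row in band
    ]


# A multispectral image is then simply a list of bands scaled one by one.
four_band_image = [[[0, 2048], [4095, 1024]] for _ in range(4)]
scaled = [scale_to_unit_range(band) for band in four_band_image]
```

Scaling by the nominal range rather than by per-image minimum and maximum keeps the relation to the calibrated physical values intact across images.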
To tackle the problems mentioned above we came up with two guiding ideas:
- Use pan-sharpened imagery for training. Instead of using pairs of downsampled/original imagery we use multispectral/pan-sharpened pairs from the Pléiades sensor (at 2m/0.5m resolution).
- Adjust the best computer vision algorithms to work with satellite imagery. An extensive literature review was done to identify the best performing models used in the computer vision domain.
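The first idea implies a fixed scale factor between the two images of a training pair: 2 m divided by 0.5 m gives a factor of 4. The following sketch shows how matching LR and HR tile windows could be derived under the assumption that both images are co-registered and cover exactly the same footprint. The tile size and the function names are hypothetical and not taken from the paper.

```python
LR_RESOLUTION_M = 2.0  # multispectral Pléiades resolution, from the text
HR_RESOLUTION_M = 0.5  # pan-sharpened Pléiades resolution, from the text


def scale_factor(lr_resolution=LR_RESOLUTION_M, hr_resolution=HR_RESOLUTION_M):
    """Return the integer upscaling factor between LR and HR pixel sizes."""
    ratio = lr_resolution / hr_resolution
    factor = round(ratio)
    if abs(ratio - factor) > 1e-9:
        raise ValueError("resolutions must differ by an integer factor")
    return factor


def paired_windows(lr_height, lr_width, tile_size, factor):
    """Yield matching (row, col, size) windows for the LR and the HR image.

    Tiles do not overlap; incomplete tiles at the image border are skipped.
    """
    for row in range(0, lr_height - tile_size + 1, tile_size):
        for col in range(0, lr_width - tile_size + 1, tile_size):
            lr_window = (row, col, tile_size)
            hr_window = (row * factor, col * factor, tile_size * factor)
            yield lr_window, hr_window


# Example: a 64 x 64 pixel LR image cut into 32 x 32 tiles (assumed size).
pairs = list(paired_windows(64, 64, tile_size=32, factor=scale_factor()))
```

Each LR tile of 32 x 32 pixels then corresponds to an HR tile of 128 x 128 pixels covering the same area on the ground.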
Here are some examples of the LR/HR image pairs that we used:
CNNs have been used for SR since 2014, and a wide variety of network architectures have been applied to the problem since then. We identified the three most commonly used classes of CNN architectures: “standard” CNNs, residual networks and autoencoders.
In the end, we decided to use four different network architectures representing these three classes:
- SRCNN: shallow standard CNN (Dong et al. 2014)
- SRResNet: deep residual network (Ledig et al. 2017)
- RedNet: deep autoencoder network (Mao et al. 2016)
- AESR: shallow autoencoder network (our own design)
The first three architectures represent the general classes mentioned above. We added AESR because our literature review indicated that autoencoder networks are particularly well suited for SR tasks. By including AESR, we wanted to find out whether a much simpler autoencoder would produce results similar to those of a very deep and complex one.
This is the architecture of the autoencoder network (AESR). Yellow rectangles represent convolutional layers with kernel size k, n feature maps and stride s. Purple rectangles represent max pooling and green rectangles upsampling layers. Arrows indicate skip connections.
We applied all four models to a number of images representing different land cover classes and computed the already mentioned metrics PSNR and SSIM.
For comparison purposes, we also did the same using simple bicubic resampling. Unfortunately, the findings were not conclusive, so we decided to include two additional, more modern image similarity measures: the Feature Similarity Index (FSIM) and the Information theoretic-based Statistic Similarity Measure (ISSM).
Here the results became clearer: the two autoencoder models produced the best results, with RedNet slightly ahead of AESR. The simple SRCNN model still produced good results, while SRResNet, interestingly, fell behind.
We made the code for metric calculation open source (available here), as we thought it might be of interest to others as well. The metrics can be used for different kinds of image improvement and restoration approaches.
Here are a few examples showing what the results of the best-performing model, RedNet, look like. From left to right: multispectral (2 m resolution), pan-sharpened (0.5 m resolution) and super-resolved (0.125 m resolution).
In summary: we introduced a new method for the super-resolution of multispectral satellite images that takes advantage of the panchromatic band by using pan-sharpening to create training data. We also compared different CNN architectures and found that autoencoder-based models perform best.
The full paper can be found here:
Müller, M. U., Ekhtiari, N., Almeida, R. M., and Rieke, C.: SUPER-RESOLUTION OF MULTISPECTRAL SATELLITE IMAGES USING CONVOLUTIONAL NEURAL NETWORKS, ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci., V-1-2020, 33–40, https://doi.org/10.5194/isprs-annals-V-1-2020-33-2020, 2020.
Dong, C., Loy, C.C., He, K., Tang, X., 2014. Image Super-Resolution Using Deep Convolutional Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 295–307.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., Shi, W., 2017. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 105–114.
Mao, X., Shen, C., Yang, Y.-B., 2016. Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections, in: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp. 2802–2810.