Satellite Image Classification using semantic segmentation methods in deep learning
Satellite Image Classification, InterIIT Techmeet 2018, IIT Bombay.
This repository contains implementations of two algorithms, U-Net: Convolutional Networks for Biomedical Image Segmentation and Pyramid Scene Parsing Network (PSPNet), both modified for the problem of satellite image classification.
main_unet.py: Python code for training the algorithm with U-Net architecture including the encoding of the ground truths.
unet.py: Contains our implementation of U-Net layers.
test_unet.py: Code for testing, computing accuracies and confusion matrices for training and validation, and saving the U-Net model's predictions on training, validation and testing images.
Inter-IIT-CSRE: Contains all the training, validation and testing data.
Comparison_Test.pdf: Side by side comparison of the test data with the U-Net model predictions on the data.
train_predictions: U-Net Model predictions on training and validation images.
plots: Accuracy and loss plots for training and validation for U-Net architecture.
Test_outputs: Contains test images and their predictions by the U-Net model.
images_for_doc: Contains several images for documentation.
PSPNet: Contains training files for our implementation of the PSPNet algorithm for satellite image classification.
Clone the repository and change your present working directory to the cloned directory. Create folders named train_predictions and test_outputs to save the model's predicted outputs on training and testing images (not required now, as the repo already contains these folders).
$ git clone https://github.com/manideep2510/eye-in-the-sky.git
$ cd eye-in-the-sky
$ mkdir train_predictions
$ mkdir test_outputs
To train the U-Net model and save the weights, run the command below:
$ python3 main_unet.py
To test the U-Net model, compute accuracies and confusion matrices for training and validation, and save the model's predictions on training, validation and testing images, run:
$ python3 test_unet.py
You might get the error xrange is not defined while running our code. This error is not caused by our code but by an outdated Python package, libtiff (some parts of its source code are written in Python 2 and some in Python 3), which we used to read the dataset, whose images are in .tif format. We were not able to use other libraries such as OpenCV or PIL to read the images, as they do not properly support reading 4-channel .tif images.
This error can be resolved by editing the source code of the libtiff library.
Go to the file in the library's source code where the error arises (the file name is displayed in the terminal along with the error) and replace all the xrange() (Python 2) calls in the file with range() (Python 3).
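If editing the installed library is inconvenient, a possible workaround is to make the name xrange available before the library is imported. This is only a sketch of an alternative, not the fix described above, and it assumes the failing code uses xrange purely as a Python 2 spelling of range:

```python
import builtins

# Assumption: this runs before importing the library that still calls xrange().
# Python looks up unknown names in the builtins module, so the alias becomes
# visible to every module imported afterwards.
if not hasattr(builtins, "xrange"):
    builtins.xrange = range

# After the alias is installed, Python 2 style loops work again.
values = list(xrange(3))
```

Editing the library source as described above remains the cleaner fix, since the alias changes behaviour for the whole interpreter session.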
We are providing some reasonably good pre-trained weights here so that the users don't need to train from scratch.
| Description | Task | Dataset | Model |
| ------------- | ----------------- | ------------------- | ------------------------------------------------------------ |
| UNet Architecture | Satellite Image Classification | IITB dataset (Refer | download (.h5) |
Let's now discuss
1. What this project is about,
2. Architectures we have used and experimented with and
3. Some novel training strategies we have used in the project
Remote sensing is the science of obtaining information about objects or areas from a distance, typically from aircraft or satellites.
We framed the problem of satellite image classification as a semantic segmentation problem and built deep learning semantic segmentation models to tackle it.
The ground truths provided are 3 channel RGB images. In the current dataset, there are only 9 unique RGB values in the ground truths as there are 9 classes that are to be classified. These 9 different RGB values are one-hot encoded to generate a 9 channel encoded ground truth with each channel representing a particular class.
Below is the encoding scheme
Realisation of each channel in the encoded ground truth as a class
So instead of training on the RGB values of the ground truth, we converted them into one-hot values for the different classes. This approach yielded a validation accuracy of 85% and a training accuracy of 92%, compared to 71% validation accuracy and 65% training accuracy when we used the RGB ground truth values for training.
This might be due to a decrease in the variance and mean of the ground truth of the training data, as the encoding acts as an effective normalization technique. The better performance may also be because the model outputs 9 feature maps, each indicating one class, i.e., this training technique acts to some extent as if the model were trained on each of the 9 classes separately (although the prediction on one channel, which corresponds to a particular class, still depends on the others).
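The one-hot encoding of the RGB ground truths described above can be sketched as follows. This is a minimal illustration, not the code in main_unet.py: the function name one_hot_encode and the three-colour palette are hypothetical, and the real dataset uses 9 colours.

```python
import numpy as np

def one_hot_encode(mask_rgb, palette):
    """Turn an (H, W, 3) RGB ground truth into an (H, W, n_classes) one-hot array."""
    mask_rgb = np.asarray(mask_rgb)
    encoded = np.zeros(mask_rgb.shape[:2] + (len(palette),), dtype=np.uint8)
    for class_idx, color in enumerate(palette):
        # A pixel belongs to this class when all three channels match the colour.
        matches = np.all(mask_rgb == np.array(color, dtype=mask_rgb.dtype), axis=-1)
        encoded[..., class_idx] = matches
    return encoded

# Hypothetical palette for illustration only (the dataset has 9 class colours).
palette = [(0, 0, 0), (255, 255, 255), (0, 255, 0)]
mask = np.array([[[0, 0, 0], [255, 255, 255]],
                 [[0, 255, 0], [0, 0, 0]]], dtype=np.uint8)
encoded = one_hot_encode(mask, palette)
```

Each channel of encoded is a binary map for one class, which is the form of target a softmax output with one feature map per class is trained against.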
Our results on PSPNet for Satellite image classification:
Training Accuracy - 49%
Validation Accuracy - 60%
For training we used the first 13 images in the dataset, and for validation we used the 14th image.
We chose only one image (the 14th) as the validation set because it is one of the smallest images in the dataset and, as the dataset is quite small, we did not want to leave less data for training. The validation image does not contain 3 of the classes (Bare soil, Rail, Swimming pool), all of which have fairly high training accuracies. The validation accuracy would likely have been better with an image containing all the classes, but no image in the dataset contains all the classes; at least one class is missing in every image.
The Strided Cropping:
To obtain sufficient training data from the given high-definition images, cropping is required to train the classifier, our U-Net implementation, which has about 31M parameters. With a crop size of 64x64 we found that individual classes were under-represented and that the geometry and continuity of the objects were lost, decreasing the field of view of the convolutions.
Using a cropping window of 128x128 pixels with a stride of 32 resulted in 15887 training and 414 validation images.
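Strided cropping with a 128x128 window and a stride of 32 can be sketched as below. The function name strided_crops and the dummy image size are assumptions for illustration, and the repository's own cropping code may differ in detail.

```python
import numpy as np

def strided_crops(image, crop=128, stride=32):
    """Return all crop x crop windows of an (H, W, C) image, moving by `stride` pixels."""
    height, width = image.shape[:2]
    crops = []
    for top in range(0, height - crop + 1, stride):
        for left in range(0, width - crop + 1, stride):
            crops.append(image[top:top + crop, left:left + crop])
    return crops

# Dummy 4-channel image, standing in for one of the .tif satellite images.
image = np.zeros((256, 192, 4), dtype=np.uint16)
crops = strided_crops(image)
```

For an image of height H and width W (both at least the crop size), this yields ((H - 128) // 32 + 1) * ((W - 128) // 32 + 1) overlapping crops, which is how a handful of large images turns into thousands of training samples.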
Before cropping, the dimensions of training images are converted into multiples of stride for convenience during strided cropping.
For the cases where the image dimensions are not a multiple of the stride, we initially tried zero padding, but we realised that padding adds unwanted artefacts in the form of black pixels in the training and test images, leading to training on false data and a false image boundary.
Instead, we changed the image dimensions by adding extra pixels on the right-most side and the bottom of the image: we filled the deficit at the right end with pixels from the left-most part of the image, and similarly for the top and bottom of the image.
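One way to read the padding scheme above is as wrap-around padding on the right and bottom edges, which NumPy provides through np.pad with mode="wrap". The sketch below follows that interpretation. It is an assumption about the intent rather than the repository's actual code, and pad_to_stride is a hypothetical name.

```python
import numpy as np

def pad_to_stride(image, stride=32):
    """Extend an (H, W, C) image on the right and bottom so H and W are multiples of `stride`.

    The added columns are copied from the left-most part of the image and the
    added rows from the top, so no artificial black pixels are introduced.
    """
    height, width = image.shape[:2]
    pad_h = (-height) % stride
    pad_w = (-width) % stride
    pad_width = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_width, mode="wrap")

# Dummy 4-channel image whose sides are not multiples of 32.
image = np.arange(70 * 50 * 4).reshape(70, 50, 4)
padded = pad_to_stride(image)
```

Here the 70x50 image becomes 96x64, with the 14 new columns repeating the first 14 columns and the 26 new rows repeating the first 26 rows.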
Training Example 1: Image '2.tif' from training data
Training Example 2: Image '4.tif' from training data
Validation Example: Image '14.tif' from dataset
Our model is able to predict some classes which a human annotator wasn't able to. The unidentifiable classes in the images are labeled as white pixels by the human annotator. Our model is able to predict some of these white pixels correctly as some class, but this caused a decrease in the overall accuracy, as the white pixels are considered a separate class by the model.
Here the model is able to predict the white pixels as a building, which is correct and can be clearly seen in the input image.
Kappa Coefficients With and Without considering the unclassified pixels
Overall Accuracy With and Without considering the unclassified pixels
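Both metrics can be computed from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and it assumes that "without the unclassified pixels" means dropping that class's row and column. The original evaluation in test_unet.py may handle this differently, and the matrix values are made up for illustration.

```python
import numpy as np

def overall_accuracy(cm):
    """Fraction of correctly classified pixels: trace divided by total count."""
    cm = np.asarray(cm, dtype=np.float64)
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    observed = np.trace(cm) / total
    expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total ** 2
    return (observed - expected) / (1.0 - expected)

def drop_class(cm, class_idx):
    """Remove one class (e.g. the unclassified pixels) from the confusion matrix."""
    cm = np.asarray(cm)
    keep = [i for i in range(cm.shape[0]) if i != class_idx]
    return cm[np.ix_(keep, keep)]

# Made-up 3-class matrix; the last class stands in for the unclassified pixels.
cm = np.array([[20, 5, 1],
               [10, 15, 2],
               [3, 4, 0]])
cm_known = drop_class(cm, class_idx=2)
```

Calling overall_accuracy and kappa on cm and on cm_known gives the "with" and "without" variants of each metric.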
Add regularization methods such as L2 regularization and dropout, and check the performance.
Implement an algorithm to automatically detect all the unique RGB values in the ground truths and one-hot encode them, instead of manually finding the RGB values.
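A possible starting point for the automatic detection is np.unique over the flattened pixels, as sketched below. This is not implemented in the repository, and the function name encode_with_detected_palette is hypothetical.

```python
import numpy as np

def encode_with_detected_palette(mask_rgb):
    """Detect the unique RGB values in a ground truth and one-hot encode it.

    Returns (colors, encoded): colors is (n_classes, 3), sorted lexicographically,
    and encoded is (H, W, n_classes) with exactly one 1 per pixel.
    """
    mask_rgb = np.asarray(mask_rgb)
    height, width, channels = mask_rgb.shape
    pixels = mask_rgb.reshape(-1, channels)
    colors, inverse = np.unique(pixels, axis=0, return_inverse=True)
    # Flatten explicitly, since the shape of `inverse` varies between NumPy versions.
    inverse = inverse.reshape(-1)
    one_hot = np.eye(len(colors), dtype=np.uint8)[inverse]
    return colors, one_hot.reshape(height, width, len(colors))

mask = np.array([[[0, 0, 0], [255, 255, 255]],
                 [[0, 255, 0], [0, 0, 0]]], dtype=np.uint8)
colors, encoded = encode_with_detected_palette(mask)
```

Since no single image in the dataset contains every class, the palette would need to be collected over all ground truth images first, so that channel indices stay consistent across images.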
 U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, and Thomas Brox
 Pyramid Scene Parsing Network, Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
 A 2017 Guide to Semantic Segmentation with Deep Learning, Sasank Chilamkurthy