Object localization in images using simple CNNs and Keras
This project shows how to localize objects in images using simple convolutional neural networks.
Before getting started, we have to download a dataset and generate a CSV file containing the annotations (bounding boxes).
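How the CSV is generated depends on the dataset's annotation format. As a sketch, assuming Pascal VOC-style XML annotations with one box per file (the function names and column order here are illustrative, not the repository's exact code):

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

def xml_to_row(xml_path):
    """Parse one Pascal VOC-style annotation into (filename, xmin, ymin, xmax, ymax)."""
    root = ET.parse(xml_path).getroot()
    box = root.find("object/bndbox")
    return (
        root.findtext("filename"),
        int(box.findtext("xmin")),
        int(box.findtext("ymin")),
        int(box.findtext("xmax")),
        int(box.findtext("ymax")),
    )

def build_csv(annotation_dir, out_csv):
    """Collect all annotations in a directory into a single CSV file."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "xmin", "ymin", "xmax", "ymax"])
        for xml_file in sorted(Path(annotation_dir).glob("*.xml")):
            writer.writerow(xml_to_row(xml_file))
```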
First, let's look at YOLOv2's approach:
We proceed in the same way to build the object detector:
The code in this repository uses MobileNetV2, because it is faster than other models and its accuracy/speed trade-off can be adapted. For example, if alpha = 0.35 with a 96x96 input is not accurate enough, one can simply increase both values (see here for a comparison). If you use another architecture, change
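A minimal sketch of such a localizer on top of a MobileNetV2 backbone (the function name, the regression head, and the output encoding are assumptions for illustration, not the repository's exact code):

```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

def build_localizer(alpha=0.35, size=96):
    # Backbone; weights=None keeps this sketch offline, in practice one
    # would start from weights="imagenet".
    base = MobileNetV2(input_shape=(size, size, 3), alpha=alpha,
                       include_top=False, weights=None)
    x = GlobalAveragePooling2D()(base.output)
    # Four outputs for one bounding box (e.g. xmin, ymin, xmax, ymax),
    # scaled to [0, 1] via the sigmoid.
    box = Dense(4, activation="sigmoid", name="box")(x)
    return Model(base.input, box)

model = build_localizer(alpha=0.35, size=96)
model.compile(optimizer="adam", loss="mse")
```

Increasing `alpha` or `size` here directly trades speed for accuracy, which is the adjustment described above.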
In the following images, red is the predicted box and green is the ground truth:
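A common way to quantify how well the predicted (red) box matches the ground truth (green) is intersection over union (IoU). A small helper, not part of the repository, could look like this:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```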
This time we have to run the scripts
In order to distinguish between classes, we have to modify the loss function. I'm using here

`w_1 * log((y_hat - y)^2 + 1) + w_2 * FL(p_hat, p)`

where `w_1 = w_2 = 1` are two weights and

`FL(p_hat, p) = -(0.9 * (1 - p_hat)^2 * p * log(p_hat) + 0.1 * p_hat^2 * (1 - p) * log(1 - p_hat))`

is the focal loss.
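As a NumPy reference implementation of this loss (a sketch for clarity; the actual training code would express the same formula with Keras backend ops so it can be differentiated):

```python
import numpy as np

def focal_loss(p_hat, p, eps=1e-7):
    """FL(p_hat, p) as defined above, with the 0.9 / 0.1 class weights."""
    p_hat = np.clip(p_hat, eps, 1 - eps)  # avoid log(0)
    return -(0.9 * (1 - p_hat) ** 2 * p * np.log(p_hat)
             + 0.1 * p_hat ** 2 * (1 - p) * np.log(1 - p_hat))

def combined_loss(y_hat, y, p_hat, p, w1=1.0, w2=1.0):
    """w1 * log((y_hat - y)^2 + 1) + w2 * FL(p_hat, p)."""
    box_term = np.log((y_hat - y) ** 2 + 1.0)
    return w1 * box_term + w2 * focal_loss(p_hat, p)
```

The `log((y_hat - y)^2 + 1)` term behaves like a squared error near zero but grows more slowly for large errors, so outlier boxes dominate the loss less.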
Instead of using all 37 classes, the code will only output class 0 (the image contains only class 0) or class 1 (the image contains one of the classes 1 to 36). However, it is easy to extend this to more classes (use categorical cross-entropy instead of focal loss and try out different weights).
In this example, we use a skip-net architecture similar to U-Net. For an in-depth explanation see my blog post.
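As a sketch of what such a skip connection looks like in Keras (a hypothetical minimal encoder-decoder with a single skip, not the repository's exact architecture):

```python
from tensorflow.keras.layers import (Concatenate, Conv2D, Input,
                                     MaxPooling2D, UpSampling2D)
from tensorflow.keras.models import Model

inp = Input((96, 96, 3))
# Encoder: one high-resolution and one downsampled stage.
e1 = Conv2D(16, 3, padding="same", activation="relu")(inp)                 # 96x96
e2 = Conv2D(32, 3, padding="same", activation="relu")(MaxPooling2D()(e1))  # 48x48
# Decoder: upsample back and concatenate the high-resolution features.
up = UpSampling2D()(e2)                                                    # 96x96
merged = Concatenate()([up, e1])  # the skip connection
out = Conv2D(1, 1, activation="sigmoid")(merged)  # per-pixel heatmap
model = Model(inp, out)
```

The skip connection lets the decoder reuse fine spatial detail that would otherwise be lost to the pooling step, which is the core idea of U-Net.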
This example is based on the three YOLO papers. For an in-depth explanation see this blog post.
See `example_4`; the same code can be added to the other examples.