Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
  • Loading branch information
MahmudulAlam committed Oct 17, 2019
2 parents f424f87 + b377436 commit 8052bd9
Showing 1 changed file with 65 additions and 0 deletions.
65 changes: 65 additions & 0 deletions SOLO/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
## SOLO
SOLO stands for [```single object localization```](https://github.com/MahmudulAlam/Fingertip-Mixed-Reality/tree/master/SOLO)
which has been developed for fast and efficient multi-class single object detection.
In this case, for the purpose of localizing hand for gesture recognition and fingertip detection, it is employed for the purpose of
hand detection. SOLO divides the input image into grid cells and predicts the probability of having an object in each grid cell.
Each grid cell is mapped in the output matrix of the CNN using binary representation. Grid cells that have at least ```50%```
overlapping with the hand bounding box are labeled as binary :one: and the rest of the cells are labeled as binary :zero:


<p align="center">
<img src="https://user-images.githubusercontent.com/37298971/65219825-3b5aa500-dadb-11e9-92f3-a1f75eef1b5a.jpg" width="550">
</p>

## Network Architecture
For prediction, a fully convolutional network (FCN) is designed based on the common knowledge of the field. The network architecture is
inspired by the visual geometry group (VGG) model for image classification.

```
input = Input(shape=(416, 416, 3))
# Block 01
x = Conv2D(64, (3, 3), strides=(2, 2), activation='relu', padding='same')(input)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
# Block 02
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
# Block 03
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
# Block 04
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
# Block 05
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(512, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(1, (1, 1), activation='sigmoid')(x)
output = Reshape((13, 13), name='output')(x)
model = Model(input, output)
```

Here, in each of the convolutional layer ```3 x 3``` filters are used and followed by a rectified linear unit (ReLU) activation
function except for the final convolutional layer where ```1 x 1``` filter is used to keep the output size same as the desired output
size and the sigmoid activation function is applied for normalized output.

## Loss Function
To train the network, following binary cross-entropy loss function is defined where ![H](https://latex.codecogs.com/gif.latex?%7B%5Cmathbb%7BH%7D%7D)
and ![H^hat](https://latex.codecogs.com/gif.latex?%5Chat%7B%5Cmathbb%7BH%7D%7D) are the ground truth and the predicted
output matrix of the SOLO. Here, N and M represent the length of the output matrix and the batch size respectively.

<p align="center">
<img src="https://user-images.githubusercontent.com/37298971/65953360-9be7cb80-e465-11e9-8e9a-838d9cc75b1c.jpg" width="650">
</p>

The loss function is optimized using the ADAM optimizer with standard hyperparameter values.

0 comments on commit 8052bd9

Please sign in to comment.