Scene Segmentation and Object Detection
Before a robot can act, it has to know its environment, so it needs a sensor
to perceive its surroundings and identify what is in them. In our case, we use
a 2D camera to determine where the objects and the robots are and how they are
positioned. We use either Mask R-CNN, which performs instance segmentation and
object detection, or YOLO, which performs object detection only.
Mask R-CNN works in two stages. First, it estimates the regions of the input
image where objects may exist. Second, based on that initial estimate, it
identifies the class of each object and generates a pixel-level mask. In the
first stage, the RPN (Region Proposal Network) scans every level of the FPN
(Feature Pyramid Network) in a top-down manner and estimates where objects
exist in the input image. Each estimate is tied to an anchor (anchors are a set
of boxes with predefined locations): the RPN uses the anchors to decide where
on the feature map an object is likely to be and what its bounding box should
be. In most scenarios the feature map is downsized during processing, so we
need to up-sample it so that the features stay aligned with the objects in the
original image. In the second stage, the regions proposed by the first stage
are scanned on the feature map to produce object classes, bounding boxes and
masks. Figure 3 below illustrates the Mask R-CNN structure, showing the two
stages and how they process the image.
Figure 3: Illustration of the Mask R-CNN structure
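To make the idea of anchors more concrete, here is a minimal sketch in plain Python of how a grid of anchors could be laid out over a feature map and mapped back to input-image pixels. The function name `generate_anchors` and the values for `stride`, `scales` and `ratios` are illustrative assumptions, not settings taken from our Mask R-CNN configuration.

```python
from itertools import product


def generate_anchors(feature_h, feature_w, stride, scales, ratios):
    """Return anchors as (x1, y1, x2, y2) in input-image pixels.

    One anchor is created per (cell, scale, ratio). `stride` is the assumed
    downsampling factor between the input image and the feature map.
    """
    anchors = []
    for row, col in product(range(feature_h), range(feature_w)):
        # Map the feature-map cell back to its centre in the input image.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # ratio = width / height; the area stays about scale ** 2.
            w = scale * ratio ** 0.5
            h = scale / ratio ** 0.5
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# Illustrative values only: a 2 x 3 feature map with stride 16.
anchors = generate_anchors(feature_h=2, feature_w=3, stride=16,
                           scales=[32], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 2 * 3 cells * 1 scale * 3 ratios
print(anchors[1])    # the square anchor centred on the first cell
```

Multiplying by the stride is what keeps boxes found on the downsized feature map aligned with objects in the original image. In a real RPN, each of these anchors would additionally receive an objectness score and a box refinement from the network.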
YOLO
The YOLO (You Only Look Once) framework approaches object detection differently. It takes the entire image in a single pass and predicts the bounding box coordinates and the class probabilities for those boxes. The biggest advantage of YOLO is its speed: the original YOLO paper reports 45 frames per second for its base model on a Titan X GPU. YOLO also learns generalizable representations of objects. It is one of the best algorithms for object detection, with performance comparable to that of the R-CNN algorithms.
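As a rough illustration of what "bounding box coordinates and class probabilities" means in practice, the sketch below decodes a single prediction in the style of the original YOLO (v1) formulation, where the box centre is an offset inside a grid cell and the width and height are fractions of the whole image. The function names, the 7 x 7 grid, the 448 x 448 image size and the sample numbers are assumptions made for the example; later YOLO versions encode boxes differently.

```python
def decode_cell(pred, row, col, grid_size, img_w, img_h):
    """Convert one YOLOv1-style box prediction to pixel coordinates.

    pred = (x, y, w, h, objectness): x and y are offsets inside the grid
    cell (0..1); w and h are fractions of the whole image.
    """
    x, y, w, h, objectness = pred
    # Box centre: cell index plus in-cell offset, scaled to image pixels.
    cx = (col + x) / grid_size * img_w
    cy = (row + y) / grid_size * img_h
    bw, bh = w * img_w, h * img_h
    box = (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
    return box, objectness


def class_scores(objectness, class_probs):
    """Class-specific confidence = objectness * conditional class probability."""
    return {name: objectness * p for name, p in class_probs.items()}


# Made-up prediction for the centre cell of a 7 x 7 grid.
box, obj = decode_cell((0.5, 0.5, 0.25, 0.5, 0.8), row=3, col=3,
                       grid_size=7, img_w=448, img_h=448)
scores = class_scores(obj, {"book": 0.75, "pen": 0.25})
best = max(scores, key=scores.get)
print(box, best)
```

Because every cell is decoded from the same forward pass, there is no separate region-proposal stage, which is where most of the speed advantage over two-stage detectors comes from.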
The following link (the 9k.names file in the Darknet repository) lists the classes that YOLO can detect by default: https://github.com/pjreddie/darknet/blob/1e729804f61c8627eb257fba8b83f74e04945db7/data/9k.names
Examples on a few images:
In the next update, the detector will be streamlined to detect only the classes required for our project, such as pens, books and erasers.
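Until a dedicated model is available, one possible stopgap is to filter the output of the general detector down to the classes of interest. The sketch below assumes, purely for illustration, that detections arrive as dictionaries with `label`, `score` and `box` keys; the class names, the threshold and the sample data are hypothetical and not output from our system.

```python
# Hypothetical target classes, taken from the examples named in the post.
PROJECT_CLASSES = {"pen", "book", "eraser"}


def filter_detections(detections, wanted=PROJECT_CLASSES, min_score=0.5):
    """Keep detections whose label is wanted and whose score is high enough.

    Each detection is assumed to be a dict with "label", "score" and "box".
    """
    return [d for d in detections
            if d["label"] in wanted and d["score"] >= min_score]


# Made-up detections standing in for real detector output.
detections = [
    {"label": "book", "score": 0.91, "box": (40, 60, 200, 220)},
    {"label": "person", "score": 0.88, "box": (10, 10, 300, 400)},
    {"label": "pen", "score": 0.32, "box": (210, 90, 230, 180)},
]
kept = filter_detections(detections)
print([d["label"] for d in kept])
```

Filtering like this only hides unwanted classes. It does not improve accuracy on the remaining ones, which is why retraining on project-specific classes is still the planned next step.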
Reference: https://pjreddie.com/darknet/yolo/