0x561 Task

1. Classification
- 1.1. Localization
2. Detection
3. Segmentation
- 3.1. Semantic Segmentation
- 3.2. Instance Segmentation
4. Scene Understanding
- 4.1. Depth Estimation
5. Generation
- 5.1. Text-to-Image

1. Classification

1.1. Localization

The localization task is to identify the bounding box of a single given object.

2. Detection

Unlike the localization task, which handles a single object, the detection task must find multiple objects in an image.

Task (Object Detection) The task of object detection is defined as follows:

input: an RGB image
output: a set of detected objects; for each object we have a category label (from a fixed, known set of categories) and a bounding box (x, y, width, height)

The challenges of the object detection task are:

multiple outputs: need to output a variable number of objects per image
multiple output types: need to predict both what (the category) and where (the bounding box)
large images: detection often needs a higher resolution, around 800x600 (compared with 224x224 for classification)

A simple approach is to use a sliding window: apply a CNN to many different crops of the image and classify each crop as an object category or background. However, the number of possible boxes is far too large to evaluate exhaustively, so this is not feasible.
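To get a feel for why the exhaustive sliding window is infeasible, here is a rough count. It assumes every axis-aligned box with integer pixel corners is a candidate crop; the function name `count_windows` is made up for this sketch.

```python
def count_windows(height, width):
    """Count all axis-aligned boxes with integer corners in a height x width image.

    A box of size h x w can be placed at (height - h + 1) * (width - w + 1)
    positions; summing over all h and w gives the closed form below.
    """
    return (height * (height + 1) // 2) * (width * (width + 1) // 2)


if __name__ == "__main__":
    # Roughly 58 billion candidate boxes for a single 800x600 image.
    print(count_windows(600, 800))
```

Each candidate would need its own CNN forward pass, which is why practical detectors restrict the set of boxes they consider.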

Metric (IoU: Intersection over Union) Measures the overlap between the ground-truth box and the predicted box. As a rule of thumb, IoU > 0.5 is "decent", IoU > 0.7 is "pretty good", and IoU > 0.9 is "almost perfect".

\[\text{IoU} = \frac{\text{Area of Intersection}}{\text{Area of Union}}\]
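A minimal sketch of the IoU computation, assuming boxes use the (x, y, width, height) format from the task definition, with (x, y) taken to be the top-left corner. The function name `iou` and the corner convention are assumptions of this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Overlap along each axis; clamp to zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h

    # Union = sum of both areas minus the part counted twice.
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0, 0, 2, 2)
    prediction = (1, 1, 2, 2)
    print(iou(ground_truth, prediction))  # overlap area 1, union area 7
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold such as 0.5.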

Metric (mAP: Mean Average Precision) Compute the AP (area under the precision-recall curve) for each category, then take the mean over all categories. COCO mAP repeats this for several IoU thresholds (0.50 to 0.95 in steps of 0.05) and averages the results.
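The AP and mAP computation can be sketched as follows. This assumes each detection has already been marked as a true or false positive by matching it to ground truth at one IoU threshold, and it uses all-point interpolation of the precision-recall curve. The official COCO evaluation samples precision at fixed recall levels instead, so its numbers may differ slightly. The function and argument names are made up for this example.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one category."""
    if num_ground_truth == 0 or not scores:
        return 0.0

    # Walk through detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Interpolate: precision at a recall level is the best precision
    # achieved at that recall or any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_category):
    """Mean of per-category AP values, given as a dict of category -> AP."""
    return sum(ap_per_category.values()) / len(ap_per_category)


if __name__ == "__main__":
    # Three detections for one category, two ground-truth objects.
    ap = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
    print(ap)
    print(mean_average_precision({"cat": ap, "dog": 1.0}))
```

For a COCO-style score, the same procedure would be repeated with true/false positive labels recomputed at each IoU threshold, and the resulting mAP values averaged.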
