AirDet

Abstract

Few-shot object detection has attracted increasing attention and rapidly progressed in recent years. However, the requirement of an exhaustive offline fine-tuning stage in existing methods is time-consuming and significantly hinders their usage in online applications such as autonomous exploration of low-power robots.

We find that their major limitation is that the scarce but valuable information in the few support images is not fully exploited.

To solve this problem, we propose a new architecture, AirDet. Surprisingly, we find that by learning class-agnostic relations with the support images in all modules, including the cross-scale object proposal network, the shots aggregation module, and the localization network, AirDet without fine-tuning achieves comparable or even better results than many fine-tuned methods, with improvements of up to 30-40%. We also present solid results of onboard tests on real-world exploration data from the DARPA Subterranean Challenge, which strongly validate the feasibility of AirDet in robotics.

To the best of our knowledge, AirDet is the first feasible few-shot detection method for autonomous exploration of low-power robots. The code and pre-trained models are also released.

Contribution

We propose AirDet, the first feasible model for robotic exploration, which enables few-shot detection without fine-tuning.

We propose "class-agnostic relation", comprising spatial relation and channel relation, which are the basic building blocks of AirDet.

Exhaustive experiments on the COCO, Pascal VOC, and LVIS datasets, together with DARPA SubT tests, validate the feasibility and superiority of AirDet.

We also provide a ROS wrapper of AirDet for the robotics community.

Method

The pipeline of the autonomous exploration task and the framework of AirDet.

During exploration, a few prior raw images that potentially contain novel objects (e.g., a helmet) are first sent to a human user. Given the resulting online-annotated few-shot data, the robot explorer is able to detect those objects by observing its surrounding environment.

AirDet includes four modules: the shared backbone, the support-guided cross-scale (SCS) feature fusion module for region proposal, the global-local relation (GLR) module for shots aggregation, and the relation-based detection head, which are visualized in different colors.
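
The data flow through these stages can be illustrated with a toy sketch. This is not the AirDet implementation: it assumes that region and support features have already been extracted by one shared backbone, and it swaps in an element-wise mean for shots aggregation and cosine similarity for the learned relations. All function names, the `keep` and `threshold` parameters, and the feature values are hypothetical. What it does show is the key property of the pipeline: a new class is handled purely by comparing against its support shots, with no per-class training step.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Box = Tuple[int, int, int, int]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def aggregate_shots(shots: List[Vector]) -> Vector:
    """Stand-in for shots aggregation: element-wise mean of support features."""
    n = len(shots)
    return [sum(col) / n for col in zip(*shots)]


def propose(regions: Dict[Box, Vector], prototype: Vector, keep: int) -> List[Box]:
    """Stand-in for support-guided proposal: keep the regions closest to the prototype."""
    ranked = sorted(regions, key=lambda b: cosine(regions[b], prototype), reverse=True)
    return ranked[:keep]


def detect(
    regions: Dict[Box, Vector],
    support: Dict[str, List[Vector]],
    keep: int = 2,
    threshold: float = 0.9,
) -> List[Tuple[str, Box, float]]:
    """Run the toy pipeline for every support class, with no fine-tuning."""
    detections = []
    for label, shots in support.items():
        prototype = aggregate_shots(shots)
        for box in propose(regions, prototype, keep):
            # Stand-in for the relation-based head: score by similarity.
            score = cosine(regions[box], prototype)
            if score >= threshold:
                detections.append((label, box, score))
    return detections


# Hypothetical features for three candidate regions of one query image.
regions = {
    (0, 0, 10, 10): [1.0, 0.0, 0.0],
    (5, 5, 20, 20): [0.0, 1.0, 0.0],
    (8, 8, 30, 30): [0.9, 0.1, 0.0],
}
# Three annotated support shots (3-shot) for a novel class.
support = {"helmet": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.9, 0.1, 0.0]]}

results = detect(regions, support)
for label, box, score in results:
    print(label, box, round(score, 3))
```

Adding another novel class in this sketch only requires a new entry in `support`; the functions themselves never depend on the class label.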

Qualitative Results

3-shot Detection in SubT Challenge

The provided support images and examples of detection results in the real-world tests. AirDet is robust to distinct object scales and different illumination conditions.

Attention of Proposal Generation

Compared with A-RPN, AirDet can better notice and concentrate on the novel object region, which leads to more effective region proposals.

Attention of Detection Head

With similar proposals in red boxes, AirDet can focus more precisely on the most representative part of the object, resulting in more accurate box regression and classification.

BibTeX

@inproceedings{Li2022ECCV,
      author    = {Li, Bowen and Wang, Chen and Reddy, Pranay and Kim, Seungchan and Scherer, Sebastian},
      title     = {AirDet: Few-Shot Detection without Fine-tuning for Autonomous Exploration},
      booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
      year      = {2022}
  }

Acknowledgements

The work was done when Bowen Li and Pranay Reddy were interns at The Robotics Institute, CMU. The authors would like to thank all members of Team Explorer for providing data collected from the DARPA Subterranean Challenge. Our code is built upon FewX, for which we sincerely express our gratitude to the authors.