
Embodied Crowd Counting

Interactive crowd counting dataset and an MLLM-driven coarse-to-fine counting agent.

Authors

Runling Long¹, Yunlong Wang¹, Jia Wan¹*, Xiang Deng¹, Xinting Zhu², Weili Guan¹, Antoni B. Chan², Liqiang Nie¹

¹ Harbin Institute of Technology, Shenzhen
² City University of Hong Kong
* Corresponding author

Links

  • Paper: Paper Link
  • Hugging Face Dataset: We will open-source the dataset this May. Sorry for the delay.
  • Code Repository: GitHub

Updates

  • [04/2026] Initial release

Introduction

This is the official implementation for Embodied Crowd Counting.

Occlusion is a fundamental challenge in crowd counting. Various data-driven approaches have been developed to address it, yet their effectiveness is limited, mainly because most existing crowd counting datasets are captured with passive cameras, which restricts a method's ability to fully sense the environment.

Recently, embodied navigation methods have shown significant potential for precise object detection in interactive scenes. Their active camera settings hold promise for addressing this fundamental issue in crowd counting. However, most existing methods are designed for indoor navigation, and their performance on complex object distributions in large-scale scenes, such as crowds, is unknown. In addition, most existing embodied navigation datasets consist of indoor scenes with limited scale and object quantity, which prevents their use in dense crowd analysis.

Motivated by this, we propose a novel task, Embodied Crowd Counting (ECC): actively counting the number of persons in a large-scale scene. We build an interactive simulator, the Embodied Crowd Counting Dataset (ECCD), which supports large-scale scenes and large object quantities; crowds are generated from a prior probability distribution that approximates realistic crowd distributions. We then propose a zero-shot navigation method (ZECC) as a baseline. It combines an MLLM-driven coarse-to-fine navigation mechanism that enables active Z-axis exploration with a normal-line-based crowd distribution analysis method for fine-grained counting. Experimental results show that the proposed method achieves the best trade-off between counting accuracy and navigation cost.

We present the dataset and the method implementation on this page.
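To illustrate the idea of generating crowds from a prior probability distribution, the sketch below samples person positions from a Gaussian-mixture prior. This is a hypothetical stand-in (the function name, signature, and mixture form are our own), not ECCD's actual generator:

```python
import random

def sample_crowd(num_people, clusters, scene_size, seed=0):
    """Sample 2D person positions from a Gaussian-mixture prior.

    clusters: list of (cx, cy, sigma, weight) tuples; positions are
    clipped to the scene bounds. Illustrative only -- not the prior
    actually used by ECCD.
    """
    rng = random.Random(seed)
    total_weight = sum(w for *_, w in clusters)
    positions = []
    for _ in range(num_people):
        # Pick a cluster with probability proportional to its weight.
        r = rng.uniform(0, total_weight)
        for cx, cy, sigma, w in clusters:
            r -= w
            if r <= 0:
                break
        # Sample a position around the cluster center and clip to bounds.
        x = min(max(rng.gauss(cx, sigma), 0.0), scene_size[0])
        y = min(max(rng.gauss(cy, sigma), 0.0), scene_size[1])
        positions.append((x, y))
    return positions
```

A heavier-weighted cluster attracts proportionally more people, which mimics the uneven, clustered density patterns seen in real crowds.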


Highlights

  • We provide the full pipeline of our method.
  • The dataset will be released to support further studies.

Framework

Framework Figure

Figure 1. The proposed framework. First, ATE estimates the global crowd distribution efficiently. Then, NLBN generates fine observation points, alleviating crowd overlap. The final result is obtained by aggregating all fine detections.
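The two-stage aggregation in Figure 1 can be sketched as follows; all names and signatures here are hypothetical illustrations, not the repository's actual API:

```python
def coarse_to_fine_count(coarse_regions, fine_counter):
    """Aggregate a crowd count in two stages (illustrative sketch).

    coarse_regions: list of (region_id, rough_count) pairs from a
        global scan, e.g. a coarse distribution estimate.
    fine_counter: callable(region_id) -> count obtained from a fine
        observation point, e.g. one chosen to reduce overlap.
    Regions that looked empty in the coarse pass are skipped, which
    is where the navigation-cost saving comes from.
    """
    total = 0
    for region_id, rough_count in coarse_regions:
        if rough_count == 0:
            continue  # nothing seen coarsely; do not fly there
        total += fine_counter(region_id)
    return total
```

For example, with coarse regions `[("A", 12), ("B", 0), ("C", 5)]` and fine counts `{"A": 15, "C": 4}`, the aggregate is 19, and region B is never visited at close range.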


Project Structure

.
├── Config.yml
├── Main.py
├── Agent/
│   ├── Agent.py
│   ├── Prompts.py
│   ├── Prompts2.py
│   └── Prompts_count.py
├── assests/
├── Count/
│   ├── Count.py
│   ├── CountDraw.py
│   └── utils.py
├── Drone/
│   ├── Control.py
│   └── utils.py
├── Explore/
│   ├── Explore.py
│   ├── Frontier.py
│   ├── LowDG.py
│   ├── OurExplore.py
│   ├── path_3D.py
│   └── Target.py
├── Methods/
│   ├── FBE.py
│   ├── FBEConfig.yml
│   ├── FBEWithDG.py
│   ├── FBEWithDGConfig.yml
│   ├── OurMethod.py
│   └── OurMethodConfig.yml
├── Others/
│   ├── DensityGuided/
│   ├── DroneLift/
│   ├── IntuitionMap/
│   └── ValueMap/
├── Perception/
│   ├── capture_360.py
│   ├── GeneralizedLoss.py
│   ├── GPT.py
│   ├── GroundingDINO.py
│   └── Video.py
├── Point_cloud/
│   ├── Map_element.py
│   └── Point_cloud.py
├── Simulator/
│   ├── get_env_point_cloud.py
│   └── Simulator.py
├── utils/
│   ├── flight.py
│   ├── logger.py
│   ├── saver.py
│   └── video.py
├── Vision_models/
│   ├── GeneralizedLoss/
│   └── GroundingDINO/
├── Visualization/
│   └── Visulization.py
├── requirements.txt
└── LICENSE

Installation

Note that this project currently supports Windows only.

1. Clone the repository

git clone https://github.com/iLearn-Lab/NeurIPS25-Embodied-Crowd-Counting.git
cd NeurIPS25-Embodied-Crowd-Counting

2. Install dependencies

pip install -r requirements.txt
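The layout above suggests that runs are driven by Config.yml, with per-method overrides under Methods/ (e.g. OurMethodConfig.yml). The keys below are purely illustrative guesses at what such a config might contain; consult the actual config files in the repository:

```yaml
# Hypothetical sketch only — the real keys are defined by Config.yml in the repo.
method: OurMethod          # which entry under Methods/ to run
simulator:
  scene: example_scene     # placeholder scene name
perception:
  detector: GroundingDINO  # matches Perception/GroundingDINO.py
```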

TODO

  • Upload the dataset.
  • Complete the instructions.

Citation

@article{long2025embodied,
  title={Embodied Crowd Counting},
  author={Long, Runling and Wang, Yunlong and Wan, Jia and Deng, Xiang and Zhu, Xinting and Guan, Weili and Chan, Antoni B and Nie, Liqiang},
  journal={arXiv preprint arXiv:2503.08367},
  year={2025}
}

Acknowledgement

  • Thanks to our supervisor and collaborators for valuable support.

License

This project is released under the Apache License 2.0.

About

[NeurIPS25] Official implementation of Embodied Crowd Counting.
