Efficient and On-Device AI Agents

NeurIPS 2026 Workshop · Sydney, Australia

"Can AI agents run on your phone? Efficient architectures, on-device reasoning, and multi-agent systems for real-world edge deployment."

December 11–12, 2026

ondevice.agents.neurips2026@gmail.com

News

  • May 2026: Website launched. Stay tuned for updates on invited speakers, the submission portal, and the schedule.
  • Jul 15, 2026: Call for Papers opens. The submission portal will be available on OpenReview. See the Call for Papers section for topics and guidelines.
  • Jul 15, 2026: Call for Reviewers opens. We are looking for qualified reviewers to join the program committee; if you are interested, please sign up via the link in the Program Committee section.
  • Aug 29, 2026: Submission deadline. All papers are due by 11:59 PM AoE.
  • Dec 11–12, 2026: Workshop at NeurIPS 2026 in Sydney, Australia.

About

AI agents — systems that perceive, reason, plan, and act autonomously — have advanced dramatically with the rise of large language models. Yet the dominant paradigm relies on powerful cloud infrastructure, creating fundamental barriers to real-world deployment: latency, privacy exposure, connectivity dependence, and prohibitive energy costs. At the same time, the on-device ML community has developed a rich toolkit of compression, quantization, and efficient inference techniques — but largely without the agentic use case in mind.

This workshop brings together these two communities to address a timely and underexplored question: how do we build AI agents that run efficiently on resource-constrained hardware — smartphones, laptops, wearables, robots, and embedded systems?

Why Now?

  • Inflection point: Small language models (SLMs) such as Phi-3, Gemini Nano, and Llama 3.2 have crossed a capability threshold that makes on-device agentic behavior plausible for the first time.
  • On-device AI products: Apple Intelligence, Qualcomm AI Hub, and Google's on-device Gemini demonstrate strong industry momentum — but agentic capabilities remain largely cloud-dependent.
  • Regulatory pressure: The EU AI Act and emerging data-sovereignty regulations create strong incentives for local, privacy-preserving inference.

Key Open Problems

Architecture Design

How do we design agent architectures — memory, planning, tool use — that fit within the strict compute and memory budgets of edge devices?

Evaluation & Benchmarks

Existing benchmarks (WebArena, OSWorld, AgentBench) assume cloud-scale models. How do we evaluate agents under real hardware constraints?

Multi-Agent Coordination

When multiple small agents collaborate on a device or across a local network, how do we orchestrate them efficiently without a central cloud coordinator?

Privacy & Security

On-device agents handle sensitive personal data. What are the unique privacy and adversarial robustness challenges at the edge?

Topics of Interest

We welcome submissions on topics including, but not limited to, the following:

  • Efficient model architectures for on-device agents — quantization, pruning, knowledge distillation, and small language models (SLMs) tailored for agentic tasks on resource-constrained hardware.
  • Training and fine-tuning under constraints — reinforcement learning, instruction tuning, and parameter-efficient fine-tuning methods adapted for edge deployment scenarios.
  • Efficient reasoning and planning — chain-of-thought compression, early exit strategies, speculative decoding, and lightweight planning algorithms for on-device agents.
  • Memory management and context compression — KV-cache optimization, retrieval-augmented generation (RAG) at the edge, and long-context compression for memory-limited devices.
  • Multi-agent orchestration on-device — coordination protocols, task decomposition, and communication-efficient frameworks for multiple agents running on local hardware.
  • Tool use and function calling under latency constraints — efficient tool selection, API call batching, and latency-aware function calling under strict response-time budgets.
  • Benchmarks and evaluation — new benchmarks and evaluation protocols for on-device agents, extending WebArena, OSWorld, and AgentBench to hardware-constrained settings.
  • Privacy-preserving and secure on-device agents — differential privacy, federated learning, adversarial robustness, and data-sovereignty techniques for agents handling sensitive personal data.
  • Real-world applications — deployed systems and case studies in mobile, robotics, wearables, automotive, and IoT domains where on-device agents provide tangible value.

Call for Papers

We invite submissions on all aspects of efficient and on-device AI agents.

Key Dates

Date Milestone
Jul 15 CFP Opens
Aug 29 Submission Deadline
Sep 26 Notification of Acceptance
Oct 10 Camera-Ready Deadline
Dec 11–12 Workshop at NeurIPS 2026

Submission Formats

  • Short Paper: 4 pages + unlimited references
  • Long Paper: 9 pages + unlimited references

Submission Guidelines

  • Double-blind review: Submissions must be anonymized. Author names and affiliations should not appear in the paper.
  • Platform: All submissions must be made via OpenReview. Each paper will receive at least two reviews.
  • Non-archival: Workshop papers are non-archival. Accepted papers may be submitted to venues with archival proceedings.
  • Concurrent submissions: Papers under review at other venues are welcome, subject to those venues' policies.
  • LLM disclosure: Authors must disclose any use of large language models in the writing process, per NeurIPS 2026 guidelines.
  • Format: Use the official NeurIPS 2026 LaTeX style file.

Best Paper Award

The recipient of the Best Paper Award will be selected by the organizing committee based on reviewer feedback and presentation quality.

Contributed Spotlight Presentations

The best submissions will be invited for a contributed spotlight presentation at the workshop. Selection is based on novelty, technical quality, and relevance to the workshop's core themes.

Submit on OpenReview

Submission portal opens July 15, 2026

Invited Speakers

Leading researchers from the efficiency and agentic AI communities.

Subbarao Kambhampati
Arizona State University
Full Professor

Research interests: AI planning & multi-step reasoning, LLM-based agents, neuro-symbolic AI

Talk topic to be announced

Jianghao Lin
Shanghai Jiao Tong University
Assistant Professor

Research interests: LLM agents, efficient inference, sequential decision-making

Talk topic to be announced

Shiqi Jiang
Microsoft Research
Senior Researcher

Research interests: Edge AI efficiency, on-device systems

Talk topic to be announced

Yiming Yang
Carnegie Mellon University
Full Professor

Research interests: LLM agents, RL for agentic reasoning, tool use & long-horizon planning

Talk topic to be announced

Danqi Chen
Princeton University
Associate Professor

Research interests: LLM training & alignment, efficient deployment, agentic tasks, retrieval-augmented generation

Talk topic to be announced

Peter Belcak
NVIDIA Research
Senior Researcher

Research interests: Efficient deep learning, agentic systems, tool use & function calling

Talk topic to be announced

Workshop Schedule

Full-day workshop — December 11 or 12, 2026 · Sydney, Australia

Morning Session

Time Activity
09:00–09:10 Opening Remarks
09:10–09:35 Invited Talk 1
09:35–10:00 Invited Talk 2
10:00–11:30 Coffee Break + Poster Session 1
11:30–11:55 Invited Talk 3
11:55–12:20 Invited Talk 4
12:20–12:40 Best Paper Award Presentation

Afternoon Session

Time Activity
12:40–13:40 Lunch Break
13:40–14:05 Invited Talk 5
14:05–14:55 Contributed Spotlight Talks (5 × 10 min)
14:55–16:25 Coffee Break + Poster Session 2
16:25–16:50 Invited Talk 6
16:50–17:20 Structured Debate Panel
17:20–17:30 Closing Remarks

Organizers

A diverse team spanning industry and academia across three continents.

Davide Belli
Qualcomm AI Research, Europe
Staff Research Scientist

Staff Research Scientist at Qualcomm AI Research, working at the intersection of model efficiency and agentic AI for real-world deployment on mobile and edge hardware. His research spans LLM compression, tool-calling agents, and hybrid cloud-edge multi-agent systems.

Asim Munawar
IBM Research, US
Team Lead, Agentic AI

Technical Lead for Agentic AI at IBM Research. His research spans large language model agents, enterprise AI deployment, and efficient inference systems, with contributions to the Granite model family and IBM's agentic AI platform.

Weiwen Liu
Shanghai Jiao Tong University, China
Associate Professor

Associate Professor at Shanghai Jiao Tong University, with prior industry experience at Huawei Noah's Ark Lab. Her research focuses on post-training and reinforcement learning of AI agents, tool learning, and multi-agent collaboration, with contributions including ToolACE and ACEBench.

Yuanchun Li
Tsinghua University, China
Assistant Professor

Assistant Professor at Tsinghua University. His research focuses on on-device AI agents, resource-efficient large language models, and intelligent mobile systems, with contributions to premier academic conferences and industry white papers on mobile AI systems.

Ruisi Cai
University of Texas at Austin, US
PhD Student

PhD student at the University of Texas at Austin, advised by Prof. Atlas Wang. Her research focuses on LLM efficiency, model compression, and efficient inference, with multiple publications on pruning, quantization, and efficient training of large language models.

Weiwei Sun
Carnegie Mellon University, US
PhD Student

PhD student at Carnegie Mellon University. His research focuses on long-horizon reasoning agents, context folding for efficient multi-turn interactions, and human-agent collaboration, with the aim of building agents that can reason over extended horizons without prohibitive compute costs.

Program Committee

Call for Reviewers

We are looking for qualified reviewers to join the program committee. If you are interested in reviewing for this workshop, please sign up using the button below.

Join the Program Committee

We thank the following researchers who have agreed to serve on the program committee.

Basu, Kinjal · IBM Research
Battle, Alex · Qualcomm AI Research
Cesa, Gabriele · Qualcomm AI Research
Conchello Vendrell, Victor · Qualcomm AI Research
Dong, Yixin · Carnegie Mellon University
Du, Weihua · Carnegie Mellon University
Gangavarapu, Tushaar · University of Texas at Austin
Hehn, Thomas · Qualcomm AI Research
Jalalirad, Amir · Qualcomm AI Research
Jiang, Yuan-Hao · East China Normal University
Kong, Rui · Baidu
Kuzmin, Andrey · Qualcomm AI Research
Li, Pingzhi · University of North Carolina
Li, Sijie · Carnegie Mellon University
Li, Xiangyu · Tsinghua University
Liu, Guohong · Tsinghua University
Liu, Jiacheng · Peking University
Liu, Jiarui · Carnegie Mellon University
Liu, Jiaqi · Shanghai Jiao Tong University
Major, Bence · Qualcomm AI Research
Massoli, Fabio Valerio · Qualcomm AI Research
Orekondy, Tribhuvanesh · Qualcomm AI Research
Padres, Arnaud · Qualcomm AI Research
Pratik, Kumar · Qualcomm AI Research
Priya, Shriti · IBM Research
Rainone, Corrado · Qualcomm AI Research
Ro, Yeonju · University of Texas at Austin
Shao, Shuai · Shanghai Jiao Tong University
Shi, Zhengliang · Carnegie Mellon University
Song, Yuanyi · Shanghai Jiao Tong University
Sun, Haojia · Carnegie Mellon University
Swaminathan, Sarath · IBM Research
Tian, Shizuo · Tsinghua University
Torres, Aleix · Qualcomm AI Research
Wang, Kevin · University of Texas at Austin
Wang, Yichuan · UC Berkeley
Wang, Yuehao · University of Texas at Austin
Wen, Hao · Tsinghua University
Yan, Jerry · Carnegie Mellon University
Yuan, Yizhen · Tsinghua University
Zhang, Genghan · Stanford University
Zhang, Kangning · Shanghai Jiao Tong University
Zhang, Zijian · University of Minnesota
Zheng, Congmin · Shanghai Jiao Tong University
Zhou, Dan · ByteDance
Zhu, Jiajun · University of Texas at Austin
Zuo, Yushen · Hong Kong Polytechnic University