Efficient and On-Device AI Agents

Name: Efficient and On-Device AI Agents — NeurIPS 2026 Workshop
Start: 2026-12-11
End: 2026-12-12
Location: NeurIPS 2026

NeurIPS 2026 Workshop · Sydney, Australia

"Can AI agents run on your phone? Efficient architectures, on-device reasoning, and multi-agent systems for real-world edge deployment."

December 11–12, 2026


ondevice.agents.neurips2026@gmail.com

News

May 2026: Website launched. The workshop website is now live. Stay tuned for updates on invited speakers, the submission portal, and the schedule.
Jul 15, 2026: Call for Papers opens. The submission portal will be available on OpenReview. See the Call for Papers section for topics and guidelines.
Jul 15, 2026: Call for Reviewers opens. We are looking for qualified reviewers to join the program committee. If you are interested in reviewing for this workshop, please sign up via the link in the Program Committee section.
Aug 29, 2026: Submission deadline. All papers are due by 11:59 PM AoE.
Dec 11–12, 2026: Workshop at NeurIPS 2026 in Sydney, Australia.

About

AI agents — systems that perceive, reason, plan, and act autonomously — have advanced dramatically with the rise of large language models. Yet the dominant paradigm relies on powerful cloud infrastructure, creating fundamental barriers to real-world deployment: latency, privacy exposure, connectivity dependence, and prohibitive energy costs. At the same time, the on-device ML community has developed a rich toolkit of compression, quantization, and efficient inference techniques — but largely without the agentic use case in mind.

This workshop brings together these two communities to address a timely and underexplored question: how do we build AI agents that run efficiently on resource-constrained hardware — smartphones, laptops, wearables, robots, and embedded systems?

Why Now?

Inflection point: Small language models (SLMs) such as Phi-3, Gemini Nano, and Llama 3.2 have crossed a capability threshold that makes on-device agentic behavior plausible for the first time.
On-device AI products: Apple Intelligence, Qualcomm AI Hub, and Google's on-device Gemini demonstrate strong industry momentum — but agentic capabilities remain largely cloud-dependent.
Regulatory pressure: The EU AI Act and emerging data-sovereignty regulations create strong incentives for local, privacy-preserving inference.

Key Open Problems

Architecture Design

How do we design agent architectures — memory, planning, tool use — that fit within the strict compute and memory budgets of edge devices?

Evaluation & Benchmarks

Existing benchmarks (WebArena, OSWorld, AgentBench) assume cloud-scale models. How do we evaluate agents under real hardware constraints?

Multi-Agent Coordination

When multiple small agents collaborate on a device or across a local network, how do we orchestrate them efficiently without a central cloud coordinator?

Privacy & Security

On-device agents handle sensitive personal data. What are the unique privacy and adversarial robustness challenges at the edge?

Topics of Interest

We welcome submissions on topics including, but not limited to, the following:

Efficient model architectures for on-device agents — quantization, pruning, knowledge distillation, and small language models (SLMs) tailored for agentic tasks on resource-constrained hardware.
Training and fine-tuning under constraints — reinforcement learning, instruction tuning, and parameter-efficient fine-tuning methods adapted for edge deployment scenarios.
Efficient reasoning and planning — chain-of-thought compression, early exit strategies, speculative decoding, and lightweight planning algorithms for on-device agents.
Memory management and context compression — KV-cache optimization, retrieval-augmented generation (RAG) at the edge, and long-context compression for memory-limited devices.
Multi-agent orchestration on-device — coordination protocols, task decomposition, and communication-efficient frameworks for multiple agents running on local hardware.
Tool use and function calling under latency constraints — efficient tool selection, API call batching, and latency-aware function calling under strict response-time budgets.
Benchmarks and evaluation — new benchmarks and evaluation protocols for on-device agents, extending WebArena, OSWorld, and AgentBench to hardware-constrained settings.
Privacy-preserving and secure on-device agents — differential privacy, federated learning, adversarial robustness, and data-sovereignty techniques for agents handling sensitive personal data.
Real-world applications — deployed systems and case studies in mobile, robotics, wearables, automotive, and IoT domains where on-device agents provide tangible value.

Call for Papers

We invite submissions on all aspects of efficient and on-device AI agents.

Key Dates

Date	Milestone
Jul 15	CFP Opens
Aug 29	Submission Deadline
Sep 26	Notification of Acceptance
Oct 10	Camera-Ready Deadline
Dec 11–12	Workshop at NeurIPS 2026

Submission Formats

Short Paper: 4 pages, plus unlimited references
Long Paper: 9 pages, plus unlimited references

Submission Guidelines

Double-blind review: Submissions must be anonymized. Author names and affiliations should not appear in the paper.
Platform: All submissions must be made via OpenReview. Each paper will receive at least two reviews.
Non-archival: Workshop papers are non-archival. Accepted papers may be submitted to venues with archival proceedings.
Concurrent submissions: Papers under review at other venues are welcome, subject to those venues' policies.
LLM disclosure: Authors must disclose any use of large language models in the writing process, per NeurIPS 2026 guidelines.
Format: Use the official NeurIPS 2026 LaTeX style file.

Best Paper Award

The organizing committee will select the recipient of the Best Paper Award based on reviewer feedback and presentation quality.

Contributed Spotlight Presentations

The best submissions will be invited for a contributed spotlight presentation at the workshop. Selection is based on novelty, technical quality, and relevance to the workshop's core themes.

Submit on OpenReview

Submission portal opens July 15, 2026

Invited Speakers

Leading researchers from the efficiency and agentic AI communities.

Subbarao Kambhampati

Arizona State University

Full Professor

Research interests: AI planning & multi-step reasoning, LLM-based agents, neuro-symbolic AI

Talk topic to be announced

Jianghao Lin

Shanghai Jiao Tong University

Assistant Professor

Research interests: LLM agents, efficient inference, sequential decision-making

Talk topic to be announced

Shiqi Jiang

Microsoft Research

Senior Researcher

Research interests: Edge AI efficiency, on-device systems

Talk topic to be announced

Yiming Yang

Carnegie Mellon University

Full Professor

Research interests: LLM agents, RL for agentic reasoning, tool use & long-horizon planning

Talk topic to be announced

Danqi Chen

Princeton University

Associate Professor

Research interests: LLM training & alignment, efficient deployment, agentic tasks, retrieval-augmented generation

Talk topic to be announced

Peter Belcak

NVIDIA Research

Senior Researcher

Research interests: Efficient deep learning, agentic systems, tool use & function calling

Talk topic to be announced

Workshop Schedule

Full-day workshop — December 11 or 12, 2026 · Sydney, Australia

Morning Session

Time	Activity
09:00–09:10	Opening Remarks
09:10–09:35	Invited Talk 1
09:35–10:00	Invited Talk 2
10:00–11:30	Coffee Break + Poster Session 1
11:30–11:55	Invited Talk 3
11:55–12:20	Invited Talk 4
12:20–12:40	Best Paper Award Presentation

Afternoon Session

Time	Activity
12:40–13:40	Lunch Break
13:40–14:05	Invited Talk 5
14:05–14:55	Contributed Spotlight Talks (5 × 10 min)
14:55–16:25	Coffee Break + Poster Session 2
16:25–16:50	Invited Talk 6
16:50–17:20	Structured Debate Panel
17:20–17:30	Closing Remarks


Organizers

A diverse team spanning industry and academia across three continents.

Davide Belli

Qualcomm AI Research, Europe

Staff Research Scientist

Staff Research Scientist at Qualcomm AI Research, working at the intersection of model efficiency and agentic AI for real-world deployment on mobile and edge hardware. His research spans LLM compression, tool-calling agents, and hybrid cloud-edge multi-agent systems.

Asim Munawar

IBM Research, US

Team Lead, Agentic AI

Technical Lead for Agentic AI at IBM Research. His research spans large language model agents, enterprise AI deployment, and efficient inference systems, with contributions to the Granite model family and IBM's agentic AI platform.

Weiwen Liu

Shanghai Jiao Tong University, China

Associate Professor

Associate Professor at Shanghai Jiao Tong University, with prior industry experience at Huawei Noah's Ark Lab. Her research focuses on post-training and reinforcement learning of AI agents, tool learning, and multi-agent collaboration, with contributions including ToolACE and ACEBench.

Yuanchun Li

Tsinghua University, China

Assistant Professor

Assistant Professor at Tsinghua University. His research focuses on on-device AI agents, resource-efficient large language models, and intelligent mobile systems, with contributions to premier academic conferences and industry white papers on mobile AI systems.

Ruisi Cai

University of Texas at Austin, US

PhD Student

PhD student at the University of Texas at Austin, advised by Prof. Atlas Wang. Her research focuses on LLM efficiency, model compression, and efficient inference, with multiple publications on pruning, quantization, and efficient training of large language models.

Weiwei Sun

Carnegie Mellon University, US

PhD Student

PhD student at Carnegie Mellon University. His research focuses on long-horizon reasoning agents, context folding for efficient multi-turn interactions, and human-agent collaboration, with the aim of building agents that can reason over extended horizons without prohibitive compute costs.

Program Committee

Call for Reviewers

We are looking for qualified reviewers to join the program committee. If you are interested in reviewing for this workshop, please sign up using the button below.

Join the Program Committee

We thank the following researchers who have agreed to serve on the program committee.

Name	Affiliation
Basu, Kinjal	IBM Research
Battle, Alex	Qualcomm AI Research
Cesa, Gabriele	Qualcomm AI Research
Conchello Vendrell, Victor	Qualcomm AI Research
Dong, Yixin	Carnegie Mellon University
Du, Weihua	Carnegie Mellon University
Gangavarapu, Tushaar	University of Texas at Austin
Hehn, Thomas	Qualcomm AI Research
Jalalirad, Amir	Qualcomm AI Research
Jiang, Yuan-Hao	East China Normal University
Kong, Rui	Baidu
Kuzmin, Andrey	Qualcomm AI Research
Li, Pingzhi	University of North Carolina
Li, Sijie	Carnegie Mellon University
Li, Xiangyu	Tsinghua University
Liu, Guohong	Tsinghua University
Liu, Jiacheng	Peking University
Liu, Jiarui	Carnegie Mellon University
Liu, Jiaqi	Shanghai Jiao Tong University
Major, Bence	Qualcomm AI Research
Massoli, Fabio Valerio	Qualcomm AI Research
Orekondy, Tribhuvanesh	Qualcomm AI Research
Padres, Arnaud	Qualcomm AI Research
Pratik, Kumar	Qualcomm AI Research
Priya, Shriti	IBM Research
Rainone, Corrado	Qualcomm AI Research
Ro, Yeonju	University of Texas at Austin
Shao, Shuai	Shanghai Jiao Tong University
Shi, Zhengliang	Carnegie Mellon University
Song, Yuanyi	Shanghai Jiao Tong University
Sun, Haojia	Carnegie Mellon University
Swaminathan, Sarath	IBM Research
Tian, Shizuo	Tsinghua University
Torres, Aleix	Qualcomm AI Research
Wang, Kevin	University of Texas at Austin
Wang, Yichuan	UC Berkeley
Wang, Yuehao	University of Texas at Austin
Wen, Hao	Tsinghua University
Yan, Jerry	Carnegie Mellon University
Yuan, Yizhen	Tsinghua University
Zhang, Genghan	Stanford University
Zhang, Kangning	Shanghai Jiao Tong University
Zhang, Zijian	University of Minnesota
Zheng, Congmin	Shanghai Jiao Tong University
Zhou, Dan	ByteDance
Zhu, Jiajun	University of Texas at Austin
Zuo, Yushen	Hong Kong Polytechnic University