Efficient and On-Device AI Agents
NeurIPS 2026 Workshop · Sydney, Australia
"Can AI agents run on your phone? Efficient architectures, on-device reasoning, and multi-agent systems for real-world edge deployment."
News
- May 2026: Website launched. The workshop website is now live. Stay tuned for updates on invited speakers, the submission portal, and schedule.
- Jul 15, 2026: Call for Papers opens. The submission portal will be available on OpenReview. See the Call for Papers section for topics and guidelines.
- Jul 15, 2026: Call for Reviewers opens. We are looking for qualified reviewers to join the program committee. If you are interested in reviewing for this workshop, please sign up via the link in the Program Committee section.
- Aug 29, 2026: Submission deadline. All papers due by 11:59 PM AoE.
- Dec 11 or 12, 2026: Full-day workshop at NeurIPS 2026, Sydney, Australia.
About
AI agents — systems that perceive, reason, plan, and act autonomously — have advanced dramatically with the rise of large language models. Yet the dominant paradigm relies on powerful cloud infrastructure, creating fundamental barriers to real-world deployment: latency, privacy exposure, connectivity dependence, and prohibitive energy costs. At the same time, the on-device ML community has developed a rich toolkit of compression, quantization, and efficient inference techniques — but largely without the agentic use case in mind.
This workshop brings together these two communities to address a timely and underexplored question: how do we build AI agents that run efficiently on resource-constrained hardware — smartphones, laptops, wearables, robots, and embedded systems?
Why Now?
- Inflection point: Small language models (SLMs) such as Phi-3, Gemini Nano, and Llama 3.2 have crossed a capability threshold that makes on-device agentic behavior plausible for the first time.
- On-device AI products: Apple Intelligence, Qualcomm AI Hub, and Google's on-device Gemini demonstrate strong industry momentum — but agentic capabilities remain largely cloud-dependent.
- Regulatory pressure: The EU AI Act and emerging data-sovereignty regulations create strong incentives for local, privacy-preserving inference.
Key Open Problems
Architecture Design
How do we design agent architectures — memory, planning, tool use — that fit within the strict compute and memory budgets of edge devices?
Evaluation & Benchmarks
Existing benchmarks (WebArena, OSWorld, AgentBench) assume cloud-scale models. How do we evaluate agents under real hardware constraints?
Multi-Agent Coordination
When multiple small agents collaborate on a device or across a local network, how do we orchestrate them efficiently without a central cloud coordinator?
Privacy & Security
On-device agents handle sensitive personal data. What are the unique privacy and adversarial robustness challenges at the edge?
Topics of Interest
We welcome submissions on topics including, but not limited to, the following:
- Efficient model architectures for on-device agents — quantization, pruning, knowledge distillation, and small language models (SLMs) tailored for agentic tasks on resource-constrained hardware.
- Training and fine-tuning under constraints — reinforcement learning, instruction tuning, and parameter-efficient fine-tuning methods adapted for edge deployment scenarios.
- Efficient reasoning and planning — chain-of-thought compression, early exit strategies, speculative decoding, and lightweight planning algorithms for on-device agents.
- Memory management and context compression — KV-cache optimization, retrieval-augmented generation (RAG) at the edge, and long-context compression for memory-limited devices.
- Multi-agent orchestration on-device — coordination protocols, task decomposition, and communication-efficient frameworks for multiple agents running on local hardware.
- Tool use and function calling under latency constraints — efficient tool selection, API call batching, and latency-aware function calling under strict response-time budgets.
- Benchmarks and evaluation — new benchmarks and evaluation protocols for on-device agents, extending WebArena, OSWorld, and AgentBench to hardware-constrained settings.
- Privacy-preserving and secure on-device agents — differential privacy, federated learning, adversarial robustness, and data-sovereignty techniques for agents handling sensitive personal data.
- Real-world applications — deployed systems and case studies in mobile, robotics, wearables, automotive, and IoT domains where on-device agents provide tangible value.
Call for Papers
We invite submissions on all aspects of efficient and on-device AI agents.
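Key Dates
- Jul 15, 2026: Submission portal opens on OpenReview
- Aug 29, 2026: Submission deadline (11:59 PM AoE)
- Dec 11 or 12, 2026: Workshop at NeurIPS 2026, Sydney, Australia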
Submission Formats
- Short Paper: 4 pages, plus unlimited references
- Long Paper: 9 pages, plus unlimited references
Submission Guidelines
- Double-blind review: Submissions must be anonymized. Author names and affiliations should not appear in the paper.
- Platform: All submissions must be made via OpenReview. Each paper will receive at least two reviews.
- Non-archival: Workshop papers are non-archival. Accepted papers may be submitted to venues with archival proceedings.
- Concurrent submissions: Papers under review at other venues are welcome, subject to those venues' policies.
- LLM disclosure: Authors must disclose any use of large language models in the writing process, per NeurIPS 2026 guidelines.
- Format: Use the official NeurIPS 2026 LaTeX style file.
The Best Paper Award recipient will be selected by the organizing committee based on reviewer feedback and presentation quality.
The best submissions will be invited for a contributed spotlight presentation at the workshop. Selection is based on novelty, technical quality, and relevance to the workshop's core themes.
Submission portal opens July 15, 2026
Invited Speakers
Leading researchers from the efficiency and agentic AI communities.
Research interests: AI planning & multi-step reasoning, LLM-based agents, neuro-symbolic AI
Talk topic to be announced
Research interests: LLM agents, efficient inference, sequential decision-making
Talk topic to be announced
Research interests: Edge AI efficiency, on-device systems
Talk topic to be announced
Research interests: LLM agents, RL for agentic reasoning, tool use & long-horizon planning
Talk topic to be announced
Research interests: LLM training & alignment, efficient deployment, agentic tasks, retrieval-augmented generation
Talk topic to be announced
Research interests: Efficient deep learning, agentic systems, tool use & function calling
Talk topic to be announced
Workshop Schedule
Full-day workshop — December 11 or 12, 2026 · Sydney, Australia
Morning Session
| Time | Activity |
|---|---|
| 09:00–09:10 | Opening Remarks |
| 09:10–09:35 | Invited Talk 1 |
| 09:35–10:00 | Invited Talk 2 |
| 10:00–11:30 | Coffee Break + Poster Session 1 |
| 11:30–11:55 | Invited Talk 3 |
| 11:55–12:20 | Invited Talk 4 |
| 12:20–12:40 | Best Paper Award Presentation |
Afternoon Session
| Time | Activity |
|---|---|
| 12:40–13:40 | Lunch Break |
| 13:40–14:05 | Invited Talk 5 |
| 14:05–14:55 | Contributed Spotlight Talks (5 × 10 min) |
| 14:55–16:25 | Coffee Break + Poster Session 2 |
| 16:25–16:50 | Invited Talk 6 |
| 16:50–17:20 | Structured Debate Panel |
| 17:20–17:30 | Closing Remarks |
Organizers
A diverse team spanning industry and academia across three continents.
Staff Research Scientist at Qualcomm AI Research, working at the intersection of model efficiency and agentic AI for real-world deployment on mobile and edge hardware. His research spans LLM compression, tool-calling agents, and hybrid cloud-edge multi-agent systems.
Associate Professor at Shanghai Jiao Tong University, with prior industry experience at Huawei Noah's Ark Lab. Her research focuses on post-training and reinforcement learning of AI agents, tool learning, and multi-agent collaboration, with contributions including ToolACE and ACEBench.
Assistant Professor at Tsinghua University. His research focuses on on-device AI agents, resource-efficient large language models, and intelligent mobile systems, with contributions to premier academic conferences and industry white papers on mobile AI systems.
PhD student at the University of Texas at Austin, advised by Prof. Atlas Wang. Her research focuses on LLM efficiency, model compression, and efficient inference, with multiple publications on pruning, quantization, and efficient training of large language models.
PhD student at Carnegie Mellon University. His research focuses on long-horizon reasoning agents, context folding for efficient multi-turn interactions, and human-agent collaboration, with the aim of building agents that can reason over extended horizons without prohibitive compute costs.
Program Committee
We thank the following researchers who have agreed to serve on the program committee.
We are looking for qualified reviewers to join the program committee. If you are interested in reviewing for this workshop, please sign up using the button below.
Join the Program Committee