| title | layout | permalink |
|---|---|---|
| Accepted Papers | home | /papers/ |

| Title | Authors |
|---|---|
| A Benchmark for Language Models in Real-World System Building | Weilin Jin, Chenyu Zhao, Zeshun Huang, Chaoyun Zhang, Qingwei Lin, Chetan Bansal, Saravan Rajmohan, Shenglin Zhang, Yongqian Sun, Dan Pei, Yifan Wu, Tong Jia, Ying Li, Zhonghai Wu, Minghua Ma |
| A Spec-Driven Workflow for AI-Assisted Domain-Driven Development: Insights from Practice 📝 | Jefferson de Barros Santos |
| Achieving Productivity Gains with AI-based IDE features: A Journey at Google | Maxim Tabachnyk, Xu Shu, Alexander Frömmgen, Pavel Sychev, Vahid Meimand, Ilia Krets, Stanislav Pyatykh, Abner Araujo, Kristof Molnar, Satish Chandra |
| An Automated Methodology for Generating Labeled Datasets of Semantic Errors in Code | Mahmoud Kassem, Francisco Ribeiro, Sarah Nadi |
| An Empirical Study of C to Rust Translation using Local Large-Language Models | Nathan Rutherford, Dan O'Keeffe |
| An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code | Sophie Weidmann, Fernando Castor |
| Benchmarking LLM Commit Message Generation through a Developer-centric Pairwise Preference Framework | Lucas Aguiar, Matheus Freitas, Matheus Paixao, Rafael Carmo |
| Code Roulette: How Prompt Variability Affects LLM Code Generation | Andrei Paleyes, Diana Robinson, Radzim Sendyka, Christian Cabrera, Neil D. Lawrence |
| Code vs Serialized AST Inputs for LLM-Based Code Summarization: An Empirical Study | Shijia Dong, Haoruo Zhao, Paul Harvey |
| ContextPilot: Code Context Engineering with Memory-Augmented Exploration Agents 📝 | Shuzheng Gao, Chaozheng Wang, Shuqing Li, Yun Peng, Michael R. Lyu |
| Continuous Benchmark Generation for Evaluating Enterprise-scale LLM Agents 📝 | Divyanshu Saxena, Rishikesh Maurya, Xiaoxuan Ou, Gagan Somashekar, Shachee Mishra Gupta, Arun Iyer, Yu Kang, Chetan Bansal, Aditya Akella, Saravan Rajmohan |
| CP-Agent: Agentic Constraint Programming | Stefan Szeider |
| Diverse LLMs vs. Vulnerabilities: Who Detects and Fixes Them Better? | Arastoo Zibaeirad, Marco Vieira |
| Do LLMs Dream of Energy-Efficient Code? | Antimo Di Bernardo, Gianluca Capozzi, Pasquale De Rosa, Daniele Cono D'Elia, Leonardo Querzoni, Giuseppe Antonio Di Luna, Valerio Schiavoni |
| English or Chinese? Investigating the Impact of Prompt Language on Large Language Models for Code Summarization 📝 | Yijia Tang, Zhiqiu Huang, Jian Xie, Yaoshen Yu, Bowei Xia, Enya Shen, Yukun Cao |
| Evaluating LLMs-Driven Java Code Refactoring from a Developer’s Perspective 💬 | Javel Freitas, Guilherme Pereira, Lara Lima, Caio Rian de Sousa, Edivar Filho, José Cezar de Souza Filho, Paulo Henrique Maia, Carla Bezerra |
| Learning Functional Equivalence via Supervised Contrastive Code-Problem Alignment | Siu Wun Cheung, Harshitha Menon |
| LLM-Driven SQL Remediation: Towards Safe and Explainable Code for Automated Schema Refactoring | Antony Medeiros, Claudio Cavalcante, Nicolaas Ruberg, Sergio Lifschitz |
| LLM-Powered On-Demand Test Suites in Self-Graded Student Programming Assignments 💬 | Chang Liu |
| MAsFL: Data-Secure, Efficient and Accurate Fault Localization with Multi-Agent Small Language Models | Duong Pham Duc, Hiroshi Sato, Masao Kubo |
| Multi-task Code LLMs: Data Mix or Model Merge? | Mingzhi Zhu, Michele Merler, Stacy Patterson, Raju Pavuluri, Rahul Krishna, Boris Sobolev |
| Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures | Amirkia Rafiei Oskooei, S. Selcan Yukcu, Mehmet Cevheri Bozoglan, Mehmet S. Aktas |
| RAG Against the Machine: Zero-Shot Software Vulnerabilities Classification using LLMs | Edvin Nordqvist, Changjie Wang, Simone Ferlin, Mariano Scazzariello, Marco Chiesa |
| RubberDuckBench: A Benchmark for AI Coding Assistants | Elizabeth Dinella, Ferida Mohammed, Fatma Ayad, Satish Chandra, Petros Maniatis |
| SecRepoBench: Benchmarking Code Agents for Secure Code Completion in Real-World Repositories | Chihao Shen, Connor Dilgren, Purva Chiniya, Luke Griffith, Yu Ding, Yizheng Chen |
| Statistical Independence Aware Caching for LLM Workflows | Yihan Dai, Dimitrios Stamatios Bouras, Haoxiang Jia, Sergey Mechtaev |
| The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution | Norbert Tihanyi, Bilel Cherif, Mohamed Amine Ferrag, Richard A. Dubniczky, Tamas Bisztray |
| Towards Improving in-IDE Code Completion for Driver Development | Batuhan Raif Karagoz, Mahesh Jayasankar, Saurabh Bodhe, Subhayan Roy, Lejin Varghese, Max Kiehn, Yonas Bedasso |
| Towards LLM-guided Semantic Validation of Autonomous Driving Safety Policies 📝 | Qingzhao Zhang, Z. Morley Mao |
| TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization | Haonan Li, Keyu Man, Partha Kanuparthy, Hanning Chen, Wei Sun, Sreen Tallam, Chenguang Zhu, Kevin Zhu, Zhiyun Qian |
| Usage, Effects and Requirements for AI Coding Assistants in the Enterprise: An Empirical Study | Michele Merler, Rangeet Pan, Rahul Krishna, Tin Kam Ho, Raju Pavuluri, Maja Vukovic |

Papers are sorted alphabetically (📝 short paper, 💬 extended abstract).