| title (string) | abstract (string) | year (int64) | url (string) | pdf (string) | authors (list) | venue (string) | venueid (string) | invitation (string) | venue_type (string) | reviews (list) | num_reviews (int64) | _bibtex (string) | _bibkey (string) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Your Language Model Secretly Contains Personality Subnetworks | Large Language Models (LLMs) demonstrate remarkable flexibility in adopting different personas and behaviors. Existing approaches typically adapt such behavior through external knowledge such as prompting, retrieval-augmented generation (RAG), or fine-tuning. We ask: do LLMs really need external context or parameters to adapt to different behaviors, or do they already have such knowledge embedded in their parameters?
In this work, we show that LLMs already contain persona-specialized subnetworks in their parameter space. Using small calibration datasets, we identify distinct activation signatures associated with different personas. Guided by these statistics, we develop a masking strategy that isolates lightweight persona subnetworks. Building on these findings, we further ask: how can we discover opposing subnetworks in the model that lead to binary-opposing personas, such as introvert-extrovert?
To further enhance separation in binary opposition scenarios, we introduce a contrastive pruning strategy that identifies parameters responsible for the statistical divergence between opposing personas. Our method is entirely training-free and relies solely on the language model's existing parameter space. Across diverse evaluation settings, the resulting subnetworks exhibit significantly stronger persona alignment than baselines that require external knowledge, while being more efficient. Our findings suggest that diverse human-like behaviors are not merely induced in LLMs, but are already embedded in their parameter space—pointing toward a new perspective on controllable and interpretable personalization in large language models. Our code is available at https://anonymous.4open.science/r/C694. | 2,026 | https://openreview.net/forum?id=zzo3Sy3NSX | https://openreview.net/pdf/fe6fc58735330235254f4523254d472b1e04288d.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission4956/-/Full_Submission'] | poster | [
{
"confidence": 2,
"date": 0,
"rating": 4,
"review": "",
"review_id": "f8eJZxPaAh",
"reviewer": "ICLR.cc/2026/Conference/Submission4956/Reviewer_NkPg",
"strengths": "Compared to past prompt-based methods, this paper's approach of calculating a mask via pruning allows for the low-cost cre... | 4 | @inproceedings{
anonymous2025your,
title={Your Language Model Secretly Contains Personality Subnetworks},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zzo3Sy3NSX},
note={under review}
} | anonymous2025your |
Polychromic Objectives for Reinforcement Learning | Reinforcement learning fine-tuning (RLFT) is a dominant paradigm for improving pretrained policies for downstream tasks. These pretrained policies, trained on large datasets, produce generations with a broad range of promising but unrefined behaviors. Often, a critical failure mode of RLFT arises when policies lose this diversity and collapse into a handful of easily exploitable outputs. This convergence hinders exploration, which is essential for expanding the capabilities of the pretrained policy and for amplifying the benefits of test-time compute scaling. To address this, we introduce an objective for policy gradient methods that explicitly enforces the exploration and refinement of diverse generations, which we call a polychromic objective. We then show how proximal policy optimization (PPO) can be adapted to optimize this objective. Our method (1) employs vine sampling to collect on-policy rollouts and (2) modifies the advantage function to reflect the advantage under our new objective. Experiments on BabyAI, Minigrid, and Algorithmic Creativity show that our method improves success rates by reliably solving a larger set of environment configurations and generalizes better under large perturbations. Moreover, when given multiple attempts in pass@$n$ experiments, the policy achieves substantially higher coverage, demonstrating its ability to maintain and exploit a diverse repertoire of strategies. | 2,026 | https://openreview.net/forum?id=zzTQISAGUp | https://openreview.net/pdf/647c24c93d1ac3d8bfc1d3f206a448e32bd03f47.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission23782/-/Full_Submission', 'ICLR.cc/2026/Conference/Submission23782/-/Rebuttal_Revision'] | poster | [
{
"confidence": 3,
"date": 0,
"rating": 2,
"review": "",
"review_id": "DiRMNEHQhO",
"reviewer": "ICLR.cc/2026/Conference/Submission23782/Reviewer_Bmic",
"strengths": "The notion of set RL seems appealing and could inspire novel learning approaches that are distinct from existing classica... | 4 | @inproceedings{
anonymous2025polychromic,
title={Polychromic Objectives for Reinforcement Learning},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zzTQISAGUp},
note={under review}
} | anonymous2025polychromic |
vAttention: Verified Sparse Attention via Sampling | State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extension, top-$p$) and recently introduced sampling-based estimation. However, these approaches are fundamentally limited in their ability to approximate full attention: they fail to provide consistent approximations across heads and query vectors and, most critically, lack guarantees on approximation quality, limiting their practical deployment. We observe that top-$k$ and random sampling are complementary: top-$k$ performs well when attention scores are dominated by a few tokens, whereas random sampling provides better estimates when attention scores are relatively uniform. Building on this insight and leveraging the statistical guarantees of sampling, we introduce vAttention, the first practical sparse attention mechanism with user-specified $(\epsilon, \delta)$ guarantees on approximation accuracy. These guarantees make vAttention a compelling step toward practical, reliable deployment of sparse attention at scale. By unifying top-$k$ and sampling, vAttention outperforms both individually, delivering a superior quality–efficiency trade-off. Our experiments show that vAttention significantly improves the quality of sparse attention (e.g., $\sim$4.5 percentage points for Llama-3.1-8B-Inst and Deepseek-R1-Distill-Llama-8B on RULER-HARD), and effectively bridges the gap between full and sparse attention (e.g., across datasets, it matches full model quality at 10x–20x sparsity). We also demonstrate that it can be deployed in long-generation scenarios to achieve fast decoding without compromising model quality (e.g., vAttention achieves full model quality on AIME2024 at 10\% sparsity with up to 32K token generations). | 2,026 | https://openreview.net/forum?id=zzTDulLys0 | https://openreview.net/pdf/11280b5e6be148a1db3b7d2eaf3fc47eedcb4980.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission9335/-/Full_Submission', 'ICLR.cc/2026/Conference/Submission9335/-/Rebuttal_Revision'] | poster | [
{
"confidence": 5,
"date": 0,
"rating": 2,
"review": "",
"review_id": "yzZyhoNCDS",
"reviewer": "ICLR.cc/2026/Conference/Submission9335/Reviewer_rduG",
"strengths": "1. The paper is well-written, with the exception of some details. It is concise, to the point and effective at communicati... | 4 | @inproceedings{
anonymous2025vattention,
title={vAttention: Verified Sparse Attention via Sampling},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zzTDulLys0},
note={under review}
} | anonymous2025vattention |
Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals | Distribution Matching Distillation (DMD) distills score-based generative models into efficient one-step generators,
without requiring a one-to-one correspondence with the sampling trajectories of their teachers.
However, limited model capacity causes one-step distilled models to underperform on complex generative tasks, e.g.,
synthesizing intricate object motions in text-to-video generation.
Directly extending DMD to multi-step distillation increases memory usage and computational depth, leading to instability and reduced efficiency.
While prior works propose stochastic gradient truncation as a potential solution,
we observe that it substantially reduces the generation diversity of multi-step distilled models,
bringing it down to the level of their one-step counterparts.
To address these limitations, we propose **Phased DMD**, a multi-step distillation framework that bridges the idea of phase-wise distillation with Mixture-of-Experts (MoE), reducing learning difficulty while enhancing model capacity.
Phased DMD is built upon two key ideas: **progressive distribution matching** and **score matching within subintervals**.
First, our model divides the SNR range into subintervals, progressively refining the model to higher SNR levels, to better capture complex distributions.
Next, to ensure that the training objective within each subinterval is accurate, we conduct rigorous mathematical derivations.
We validate Phased DMD by distilling state-of-the-art (SOTA) image and video generation models, including Qwen-Image (20B parameters) and Wan2.2 (28B parameters).
Experimental results demonstrate that Phased DMD preserves output diversity better than DMD while retaining key generative capabilities.
We will release our code and models. | 2,026 | https://openreview.net/forum?id=zzJTo7ujql | https://openreview.net/pdf/e71773613d64368792595f5adf47cf22041311cc.pdf | [
"Xiangyu Fan",
"Zesong Qiu",
"Zhuguanyu Wu",
"Fanzhou Wang",
"Zhiqian Lin",
"Tianxiang Ren",
"Dahua Lin",
"Ruihao Gong",
"Lei Yang"
] | ICLR 2026 Conference Withdrawn Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission10813/-/Full_Submission', 'ICLR.cc/2026/Conference/-/Withdrawn_Submission'] | poster | [
{
"confidence": 4,
"date": 0,
"rating": 4,
"review": "",
"review_id": "us3Mj7Oiym",
"reviewer": "ICLR.cc/2026/Conference/Submission10813/Reviewer_PJDq",
"strengths": "- While the idea of progressive diffusion distillation under various criteria has been explored in previous studies such ... | 3 | @misc{
fan2025phased,
title={Phased {DMD}: Few-step Distribution Matching Distillation via Score Matching within Subintervals},
author={Xiangyu Fan and Zesong Qiu and Zhuguanyu Wu and Fanzhou Wang and Zhiqian Lin and Tianxiang Ren and Dahua Lin and Ruihao Gong and Lei Yang},
year={2025},
url={https://openreview.net/forum?id=zzJTo7ujql}
} | fan2025phased |
Learning activation functions with PCA on a set of diverse piecewise-linear self-trained mappings | This work explores a novel approach to learning activation functions, moving beyond the current reliance on human-engineered designs like the ReLU. Activation functions are crucial for the performance of deep neural networks, yet selecting an optimal one remains challenging. While recent efforts have focused on automatically searching for these functions using a parametric approach, our research does not assume any predefined functional form and lets the activation function be approximated by a subnetwork within a larger network, following the Network in Network (NIN) paradigm. We propose to train several networks on a range of problems to generate a diverse set of effective activation functions, and subsequently apply Principal Component Analysis (PCA) to this collection of functions to uncover their underlying structure. Our experiments show that only a few principal components are enough to explain most of the variance in the learned functions, and that these components have in general a simple, identifiable analytical form. Experiments using the analytical function form achieve state-of-the-art performance, highlighting the potential of this data-driven approach to activation function design. | 2,026 | https://openreview.net/forum?id=zz3El6hqbs | https://openreview.net/pdf/5c2083093945b12142ac89448a624de1f7279d3e.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission19895/-/Full_Submission'] | poster | [
{
"confidence": 4,
"date": 0,
"rating": 2,
"review": "",
"review_id": "6hd51Ytryy",
"reviewer": "ICLR.cc/2026/Conference/Submission19895/Reviewer_WARg",
"strengths": "- The topic of the submission is very interesting: many aspects of deep learning architectures are iteratively designed t... | 4 | @inproceedings{
anonymous2025learning,
title={Learning activation functions with {PCA} on a set of diverse piecewise-linear self-trained mappings},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zz3El6hqbs},
note={under review}
} | anonymous2025learning |
Sobolev acceleration for neural networks | $\textit{Sobolev training}$, which integrates target derivatives into the loss functions, has been shown to accelerate convergence and improve generalization compared to conventional $L^2$ training. However, the underlying mechanisms of this training method remain incompletely understood. In this work, we show that Sobolev training provably accelerates the convergence of Rectified Linear Unit (ReLU) networks and quantify such `Sobolev acceleration' within the student--teacher framework. Our analysis builds on an analytical formula for the population gradients and Hessians of ReLU networks under centered spherical Gaussian input. Extensive numerical experiments validate our theoretical findings and show that the benefits of Sobolev training extend to modern deep learning tasks, including diffusion models. | 2,026 | https://openreview.net/forum?id=zz06hwkH37 | https://openreview.net/pdf/c051d040c4fd039cab69daed99bece8b60144928.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission23675/-/Full_Submission', 'ICLR.cc/2026/Conference/Submission23675/-/Rebuttal_Revision'] | poster | [
{
"confidence": 3,
"date": 0,
"rating": 4,
"review": "",
"review_id": "Z9CKDs5NgD",
"reviewer": "ICLR.cc/2026/Conference/Submission23675/Reviewer_VVCF",
"strengths": "This paper presents several key strengths, most notably its establishment of the first rigorous theoretical framework for... | 4 | @inproceedings{
anonymous2025sobolev,
title={Sobolev acceleration for neural networks},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zz06hwkH37},
note={under review}
} | anonymous2025sobolev |
MINT: Causally Tracing Information Fusion in Multimodal Large Language Models | Multimodal Large Language Models (MLLMs) have demonstrated impressive performance on tasks that involve understanding and integrating information across different modalities, particularly vision and language. Despite their effectiveness, the internal representations of these Vision Language Models (VLMs) remain poorly understood, making it difficult to interpret their predictions or identify the causes of common errors. A crucial step toward improved interpretability is understanding how visual and textual signals fuse within the language decoder of these models. This integration process is particularly important since failures to properly combine modalities frequently lead to errors such as object hallucinations and incorrect spatial descriptions. In this paper, we systematically investigate the internal mechanisms of multimodal fusion in three representative VLMs: LLaVA-1.5-7B, DeepSeek-VL2-Tiny, and Qwen2-VL-7B. We propose MINT (Multimodal INtervention Tracing), a method that builds on the principle of hidden state patching to create a causal map of multimodal processing by systematically intervening at each layer of the language decoder. From these maps, we identify a critical region we term the `fusion band'—the decisive window of layers where visual and linguistic signals are actively fused to guide the model's output. Our analysis reveals that the location and width of this band are not uniform across models; they highlight fundamental differences in their fusion mechanisms that directly correlate with a model's ability to resolve contradictions, ground language, and perform complex spatial reasoning. This causal mapping offers a diagnostic framework to explain common VLM failures and inform future architectural design. | 2,026 | https://openreview.net/forum?id=zyu1tXMcbh | https://openreview.net/pdf/b8b86038e600dd05d4b796221a461ee4c688e0a4.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission22929/-/Full_Submission'] | poster | [
{
"confidence": 4,
"date": 0,
"rating": 6,
"review": "",
"review_id": "mRxJcajRUA",
"reviewer": "ICLR.cc/2026/Conference/Submission22929/Reviewer_qNLH",
"strengths": "1. The introduced probing method MINT, is systematic and causal method to trace multimodal fusion within VLMs, advancing ... | 4 | @inproceedings{
anonymous2025mint,
title={{MINT}: Causally Tracing Information Fusion in Multimodal Large Language Models},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zyu1tXMcbh},
note={under review}
} | anonymous2025mint |
DoMiNO: Down-scaling Molecular Dynamics with Neural Graph Ordinary Differential Equations | DoMiNO: Down-scaling Molecular Dynamics with Neural Graph Ordinary Differential Equations | 2,026 | https://openreview.net/forum?id=zyq1JIuIhL | https://openreview.net/pdf/99983c740e057ab5240b1e4426d5c4a9fe111da6.pdf | [
"Fang Sun",
"Zijie Huang",
"Yadi Cao",
"Xiao Luo",
"Wei Wang",
"Yizhou Sun"
] | ICLR 2026 Conference Withdrawn Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission13342/-/Full_Submission', 'ICLR.cc/2026/Conference/Submission13342/-/Rebuttal_Revision', 'ICLR.cc/2026/Conference/-/Withdrawn_Submission'] | poster | [
{
"confidence": 4,
"date": 0,
"rating": 2,
"review": "",
"review_id": "vzanZOtJ1N",
"reviewer": "ICLR.cc/2026/Conference/Submission13342/Reviewer_LGqt",
"strengths": "The authors tackle an important problem with a creative and, in principle, intuitive idea. The reduced scaling from O(T) ... | 4 | @misc{
sun2025domino,
title={DoMi{NO}: Down-scaling Molecular Dynamics with Neural Graph Ordinary Differential Equations},
author={Fang Sun and Zijie Huang and Yadi Cao and Xiao Luo and Wei Wang and Yizhou Sun},
year={2025},
url={https://openreview.net/forum?id=zyq1JIuIhL}
} | sun2025domino |
Learning with Interaction: Agentic Distillation for Large Language Model Reasoning | Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning abilities to solve complex tasks. However, these gains come with significant computational costs, limiting their practical deployment. A promising direction is to distill reasoning skills from larger teacher models into smaller, more efficient student models, yet existing data-centric distillation approaches suffer from passive learning, over-learning on simple tasks, and persistent knowledge gaps. To overcome these limitations, we introduce Agentic Distillation, a novel framework for adaptive and active distillation. In Agentic Distillation, student LLMs interact with teacher LLMs modeled as environments, receiving feedback tokens to guide their reasoning process and selectively updating their capabilities when necessary. To address the off-policy and gradient vanishing challenges introduced by feedback tokens, we devise a tailored importance sampling and clipping strategy within a unified objective that both incentivizes reasoning and injects knowledge into student LLMs. Extensive experiments show that Agentic Distillation significantly enhances reasoning performance while improving efficiency, offering a scalable path for equipping compact LLMs with advanced reasoning abilities. | 2,026 | https://openreview.net/forum?id=zyp9QT5Gf1 | https://openreview.net/pdf/83e3c72f3b786cbec6676a0267401ad0cd12b8bd.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission17783/-/Full_Submission', 'ICLR.cc/2026/Conference/Submission17783/-/Rebuttal_Revision'] | poster | [
{
"confidence": 4,
"date": 0,
"rating": 4,
"review": "",
"review_id": "GBwzyKXich",
"reviewer": "ICLR.cc/2026/Conference/Submission17783/Reviewer_qKrD",
"strengths": "1. The detailed discussion of several issues when trying to inject teacher-generated tokens into the student LM is insigh... | 4 | @inproceedings{
anonymous2025learning,
title={Learning with Interaction: Agentic Distillation for Large Language Model Reasoning},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zyp9QT5Gf1},
note={under review}
} | anonymous2025learning |
LitePruner: A Lightweight Realtime Token Pruner before Large Language Models | Tokenization is one of the core steps of the language model pipeline. However, the tokenizer yields more tokens for the same context in non-English languages, especially in low-resource languages due to the shared multilingual settings, which results in unexpected fairness problems in terms of token fees, response latency, and long context processing. In this paper, we study the real-time computing problem, attempting to reduce the total number of tokens per query while maintaining decent performance in multilingual settings. We present a simple, training-free, CPU-based pruner model that reuses pre-trained weights from the first attention layer of small models to rank token importance, delivering only important tokens to the target larger models. This method is motivated by the fact that early layers in both small and large models latch onto similar shallow local signals because similar tokenization algorithms (e.g., BPE) produce identical local signals. Extensive in-context learning experiments on MGSM, Global-MMLU-Lite and ARC and RAG-based experiments on PubMedQA and MEMERAG show that our method can preserve decent performance across languages while reducing up to $30\%$ of the total number of tokens in both in-family and across-family model settings, where the pruner model and the target large model are or are not in the same model family. Our method is compatible with commercial LLM APIs and runs on CPUs, making it practical for real-life applications. | 2,026 | https://openreview.net/forum?id=zyTGgLUdCb | https://openreview.net/pdf/f1089989f30f9fb47778643e1c055836f291b1f3.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission16269/-/Full_Submission'] | poster | [
{
"confidence": 3,
"date": 0,
"rating": 2,
"review": "",
"review_id": "rIU4bPd3Xi",
"reviewer": "ICLR.cc/2026/Conference/Submission16269/Reviewer_KCka",
"strengths": "1. The paper addresses a real fairness issue where non-English users pay significantly more for LLM services due to token... | 4 | @inproceedings{
anonymous2025litepruner,
title={LitePruner: A Lightweight Realtime Token Pruner before Large Language Models},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zyTGgLUdCb},
note={under review}
} | anonymous2025litepruner |
Diffusion Bridge Variational Inference for Deep Gaussian Processes | Deep Gaussian processes (DGPs) enable expressive hierarchical Bayesian modeling but pose substantial challenges for posterior inference, especially over inducing variables. Denoising diffusion variational inference (DDVI) addresses this by modeling the posterior as a time-reversed diffusion from a simple Gaussian prior. However, DDVI’s fixed unconditional starting distribution remains far from the complex true posterior, resulting in inefficient inference trajectories and slow convergence. In this work, we propose Diffusion Bridge Variational Inference (DBVI), a principled extension of DDVI that initiates the reverse diffusion from a learnable, data-dependent initial distribution. This initialization is parameterized via an amortized neural network and progressively adapted using gradients from the ELBO objective, reducing the posterior gap and improving sample efficiency. To enable scalable amortization, we design the network to operate on the inducing inputs $\mathbf{Z}^{(l)}$, which serve as structured, low-dimensional summaries of the dataset and naturally align with the inducing variables' shape. DBVI retains the mathematical elegance of DDVI—including Girsanov-based ELBOs and reverse-time SDEs—while reinterpreting the prior via a Doob-bridged diffusion process. We derive a tractable training objective under this formulation and implement DBVI for scalable inference in large-scale DGPs. Across regression, classification, and image reconstruction tasks, DBVI consistently outperforms DDVI and other variational baselines in predictive accuracy, convergence speed, and posterior quality. | 2,026 | https://openreview.net/forum?id=zyRmy0Ch9a | https://openreview.net/pdf/53c9c6bc86a1153ef4a88043c1f49e49ce4cfb91.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission6981/-/Full_Submission', 'ICLR.cc/2026/Conference/Submission6981/-/Rebuttal_Revision'] | poster | [
{
"confidence": 3,
"date": 0,
"rating": 6,
"review": "",
"review_id": "8hAMzNMbA4",
"reviewer": "ICLR.cc/2026/Conference/Submission6981/Reviewer_vk1c",
"strengths": "Originality:\nThe paper proposes the novel idea of reinterpreting DDVI as a kind of diffusion bridge using Doob’s h-transf... | 4 | @inproceedings{
anonymous2025diffusion,
title={Diffusion Bridge Variational Inference for Deep Gaussian Processes},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zyRmy0Ch9a},
note={under review}
} | anonymous2025diffusion |
Preference-based Policy Optimization from Sparse-reward Offline Dataset | Offline reinforcement learning (RL) holds the promise of training effective policies from static datasets without the need for costly online interactions. However, offline RL faces key limitations, most notably the challenge of generalizing to unseen or infrequently encountered state-action pairs. When a value function is learned from limited data in sparse-reward environments, it can become overly optimistic about parts of the space that are poorly represented, leading to unreliable value estimates and degraded policy quality. To address these challenges, we introduce a novel approach based on contrastive preference learning that bypasses direct value function estimation. Our method trains policies by contrasting successful demonstrations with failure behaviors present in the dataset, as well as synthetic behaviors generated outside the support of the dataset distribution. This contrastive formulation mitigates overestimation bias and improves robustness in offline learning. Empirical results on challenging sparse-reward offline RL benchmarks show that our method substantially outperforms existing state-of-the-art baselines in both learning efficiency and final performance. | 2,026 | https://openreview.net/forum?id=zyLI9LEmry | https://openreview.net/pdf/4ef43b31950eff949a4099d4cb6f9c962b012a4a.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission10578/-/Full_Submission'] | poster | [
{
"confidence": 3,
"date": 0,
"rating": 6,
"review": "",
"review_id": "k0n2MAUUPo",
"reviewer": "ICLR.cc/2026/Conference/Submission10578/Reviewer_uvFa",
"strengths": "- This paper proposes a contrastive preference learning framework to bypass direct value function estimation.\n- This pap... | 4 | @inproceedings{
anonymous2025preferencebased,
title={Preference-based Policy Optimization from Sparse-reward Offline Dataset},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zyLI9LEmry},
note={under review}
} | anonymous2025preferencebased |
Teaching LLMs to Admit Uncertainty in OCR | Vision language models (VLMs) are increasingly replacing traditional OCR pipelines, but on visually degraded documents they often hallucinate, producing fluent yet incorrect text without signaling uncertainty. This occurs because current post-training emphasizes accuracy, which encourages models to guess even when uncertain. The problem persists in state-of-the-art systems and severely impacts OCR reliability. To improve the trustworthiness of OCR on degraded documents, we propose uncertainty-aware OCR. Rather than suppressing guesses, our model transcribes while explicitly bracketing spans it deems unreliable with uncertainty tags. To train our model, we use Group Relative Policy Optimization (GRPO). We define the usage rules for uncertainty tags and an evaluation protocol. We introduce a pseudo-labeled cold start and a multi-objective reward that balances transcription accuracy and uncertainty coverage while preventing reward hacking. We explore different combinations of cold start and reward granularity and verify the effect of reward parameters in preventing reward hacking and improving the corresponding metrics. We also introduce Blur-OCR, the benchmark for uncertainty-aware OCR on degraded documents. In detailed experiments, our model maintains strong transcription accuracy while achieving uncertainty tag F1 of 0.685, and it substantially outperforms both open- and closed-source baselines. | 2,026 | https://openreview.net/forum?id=zyCjizqOxB | https://openreview.net/pdf/e2a795c9abb1a38a8b9c19099e6e5c79caef476c.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission1052/-/Full_Submission', 'ICLR.cc/2026/Conference/Submission1052/-/Rebuttal_Revision'] | poster | [
{
"confidence": 4,
"date": 0,
"rating": 4,
"review": "",
"review_id": "SJjGbrxrVZ",
"reviewer": "ICLR.cc/2026/Conference/Submission1052/Reviewer_ixBu",
"strengths": "**Clear problem formulation**: The paper addresses a real problem—VLM-based OCR systems hallucinate on degraded documents ... | 4 | @inproceedings{
anonymous2025teaching,
title={Teaching {LLM}s to Admit Uncertainty in {OCR}},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zyCjizqOxB},
note={under review}
} | anonymous2025teaching |
Emergence of Machine Language in LLM-based Agent Communication | Language emergence is a hallmark of human intelligence, as well as a key indicator for assessing artificial intelligence. Unlike prior studies grounded in multi-agent reinforcement learning, this paper asks whether machine language, potentially not human-interpretable, can emerge between large language model (LLM) agents. We study this in the stylized paradigm of referential games, where a speaker encodes a target object into a message using a predefined alphabet, and a listener, given the message, must identify the target among distractors. We propose an agent design that enables the speaker to retrieve semantically similar words before composing a message, and the listener to decode the message based on structural proximity between words. We observe that even given a set of 541 objects, the two agents successfully develop a shared language: they acquire meanings for each object through only 4 rounds of communication, with at most 3 attempts per communication. Additionally, analyses reveal that the emergent language exhibits compositionality, generalizability, morphemes, and polysemy, which are defining features of human language. Our project can be accessed via the following link: https://anonymous.4open.science/r/ELofLLM-1746/ | 2,026 | https://openreview.net/forum?id=zy06mHNoO2 | https://openreview.net/pdf/dd385254607d317329de7f1ab96728b480363cb4.pdf | [] | ICLR 2026 Conference Submission | ICLR | ['ICLR.cc/2026/Conference/-/Submission', 'ICLR.cc/2026/Conference/-/Post_Submission', 'ICLR.cc/2026/Conference/Submission3748/-/Full_Submission'] | poster | [
{
"confidence": 4,
"date": 0,
"rating": 6,
"review": "",
"review_id": "0acJkXshT6",
"reviewer": "ICLR.cc/2026/Conference/Submission3748/Reviewer_LQV4",
"strengths": "## Strengths\n- The paper introduces an interesting and innovative approach for generating natural-like communication that... | 4 | @inproceedings{
anonymous2025emergence,
title={Emergence of Machine Language in {LLM}-based Agent Communication},
author={Anonymous},
booktitle={Submitted to The Fourteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=zy06mHNoO2},
note={under review}
} | anonymous2025emergence |
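Each row above carries a `reviews` column whose entries include a `rating` and a `confidence`. As a minimal, stdlib-only sketch of how this metadata can be aggregated, the snippet below hard-codes the (review_id, rating, confidence) triples visible in this preview — one review is shown per submission here, so these numbers summarize the preview only, not the full dataset:

```python
from statistics import mean

# (review_id, rating, confidence) transcribed from the preview rows above.
reviews = [
    ("f8eJZxPaAh", 4, 2), ("DiRMNEHQhO", 2, 3), ("yzZyhoNCDS", 2, 5),
    ("us3Mj7Oiym", 4, 4), ("6hd51Ytryy", 2, 4), ("Z9CKDs5NgD", 4, 3),
    ("mRxJcajRUA", 6, 4), ("vzanZOtJ1N", 2, 4), ("GBwzyKXich", 4, 4),
    ("rIU4bPd3Xi", 2, 3), ("8hAMzNMbA4", 6, 3), ("k0n2MAUUPo", 6, 3),
    ("SJjGbrxrVZ", 4, 4), ("0acJkXshT6", 6, 4),
]

# Aggregate over the preview: mean rating and mean reviewer confidence.
avg_rating = mean(r for _, r, _ in reviews)
avg_conf = mean(c for _, _, c in reviews)
print(f"reviews: {len(reviews)}, mean rating: {avg_rating:.2f}, "
      f"mean confidence: {avg_conf:.2f}")
# → reviews: 14, mean rating: 3.86, mean confidence: 3.57
```

On the full dataset, the same aggregation would run over the `reviews` list column after loading the repository with the Hugging Face `datasets` library's `load_dataset`.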