Papers: 250 | With code: 0 | Suggested Build/Watch: 144
Novelty / saturation by cluster (rare clusters indicate higher novelty potential)
A fully open-source search agent that makes high-performance frontier search widely accessible by releasing open data and code.
A dynamic-aware robotic manipulation system built on the PUMA architecture for improved adaptability in fast-paced environments.
Mixture-of-depths attention enhances large language models by improving feature recovery in deeper layers while maintaining efficiency.
Tri-Prompting offers a unified framework for customizable video content creation with precise control over scene, subject, and motion.
Fast SAM 3D Body speeds up full-body human mesh recovery to real-time rates for interactive applications.
PRIMO R1 transforms video MLLMs into active critics for enhanced robotic manipulation through process reasoning.
SmartSearch improves conversational memory retrieval with a deterministic pipeline that outperforms traditional LLM-based methods.
The PokeAgent Challenge is a competitive benchmark for AI decision-making in Pokemon battles and RPGs, fostering advancements in RL and LLM research.
InterveneBench benchmarks LLMs on intervention reasoning in social science, with the goal of improving causal study design.
SlovKE provides a large-scale dataset and LLM evaluation for keyphrase extraction in Slovak, addressing a critical gap in low-resource language processing.
A real-time oriented object detection transformer that improves angle representation and training stability for remote sensing images.
RSGen enhances layout-driven remote sensing image generation with diverse edge guidance for improved control and accuracy.
ViFeEdit is a video-free tuning framework that enables controllable video generation and editing using only 2D images.
EDA-PSeg enhances panoramic semantic segmentation by addressing geometric distortions and unseen classes through innovative attention mechanisms.
The RoCo Challenge benchmarks robotic collaborative manipulation for industrial assembly, providing a dataset and evaluation framework to enhance automation.
VoT leverages event-driven reasoning and multi-level alignment to enhance time series forecasting using multimodal information.
A taxonomy of invisible AI failures, intended to improve reliability in human-AI interactions.
DGS-Net is an end-to-end grasp prediction network that learns dense grasp configurations from single-view point clouds in multi-object scenes.
This research offers a practical approach to improving layer utilization in large language models through sparsity techniques.
NavGRPO is a robust reinforcement learning framework for goal-directed navigation in photo-realistic environments using natural language instructions.
IRIS enables efficient and interactive editing of 3D scenes using advanced ray-based techniques.
NavThinker offers a future-aware framework for social navigation using action-conditioned world models and reinforcement learning.
An interactive LLM framework that transforms natural language and imagery into optimized 3D interior designs, enhancing user engagement and design communication.
MeMix is a plug-and-play module that enhances streaming 3D reconstruction by mitigating catastrophic forgetting without the need for fine-tuning.
GVC1D improves video compression efficiency and substantially reduces bitrate by using a compact one-dimensional latent representation.
A few-shot visual anomaly detection tool for industrial quality assurance using graph attention networks.
GNIO is a novel learning-based framework that enhances inertial navigation accuracy by dynamically suppressing sensor noise and improving motion context understanding.
A framework that enhances medical object detection by utilizing existing labels at inference for improved accuracy and robustness.
MoE-ACT enhances robotic manipulation by integrating language-conditioned Mixture-of-Experts into a lightweight multi-task imitation learning framework.
A novel industrial e-commerce search framework that enhances user conversion by grounding search plans in real-time retrieval data.
SAGE is a self-evolving multi-agent framework that enhances reasoning in LLMs through closed-loop training with minimal human input.
A physics-informed vision-language model for robust anomaly detection in dynamic systems.
HYDRA-TOK unifies visual understanding and generation through a novel representation-harmonized approach.
PiGRAND leverages physics-informed graph neural diffusion to optimize heat transport in 3D printing applications.
A system that minimizes synchronization overhead in multi-agent LLM systems by adapting the MESI cache-coherence protocol.
ForceVLA2 enhances robotic manipulation by integrating hybrid force-position control with explicit force awareness for improved task performance.
TextOVSR leverages text prompts to enhance the super-resolution of degraded opera videos, outperforming existing methods.
Waypoint Diffusion Transformers (WiT) improve pixel-space image generation by resolving trajectory conflicts and accelerating training.
VAREX is a benchmark for evaluating multimodal structured data extraction from documents, offering insight into model performance.
A novel approach to improving generative writing in LLMs, using memory-augmented replay for better evaluation and optimization.
CrossADR enhances adverse drug reaction prediction for combination pharmacotherapy using advanced graph neural networks.
STALL is a training-free detector for synthetic videos that leverages spatial-temporal likelihoods for reliable detection.
MER-Bench enables the transformation of negative memes into constructive ones through emotion-controllable multimodal generation.
CycleRL is a sim-to-real deep reinforcement learning framework for robust autonomous bicycle control, leveraging advanced training techniques for real-world adaptability.
A novel approach to enhancing chemical reaction diagram parsing using visual prompts and reinforcement learning.
HorizonMath is an open-source benchmark for evaluating how well AI systems solve unsolved mathematical problems, with automated verification.
HSImul3R offers a novel framework for stable, simulation-ready 3D reconstruction of human-scene interactions using physics-informed optimization.
Code-A1 is an adversarial co-evolution framework that optimizes code and test generation using reinforcement learning.
Seoul World Model generates realistic urban videos by grounding generation in real city data.
PAP introduces a novel framework for affordance prediction using 360-degree imagery to enhance embodied AI.
LightCtrl enables precise single-image relighting by integrating physical priors for enhanced control over illumination changes.
Kimodo is a controllable motion generation model that synthesizes high-quality human motion from intuitive inputs.
A modular Cascaded Mixture of Experts model for efficient near-shortest path routing in complex networks.
Vib2ECG offers a novel dataset and benchmark for reconstructing ECG from low-cost vibrational signals, enabling mobile ECG monitoring.
CARS is a synthetic image generation framework that enhances chest X-ray models by improving robustness and clinical feature coverage.
Wonda is a data curation pipeline that uses small language models to improve training data for program verification.
FedBNN is a federated learning framework that enables low-cost inference with binary neural networks, optimizing memory and computational efficiency.
The TED framework enhances agent evaluation by incorporating user roles and automated error analysis for improved performance insights.
TabKD offers a novel approach to data-free knowledge distillation for tabular data by focusing on interaction diversity.
ALTK is an open-source toolkit that enhances the reliability of AI agents by providing modular middleware components to address failure modes.
A novel anchor-then-polish framework for superior low-light image enhancement.
A novel 3D counting approach for accurately counting stacked objects in industrial inspection using multi-view images.
A novel approach to generalizing motion policies in robotics using Gaussian Graphs for efficient task adaptation.
RAPO optimizes emotional support dialogue systems using user reactions for enhanced interaction outcomes.
PrismMirror enables real-time human frontal view synthesis from a single image, enhancing immersive 3D telepresence.
CLAG enhances small language models by organizing memory through agent-driven clustering, improving answer quality and robustness.
A hybrid modeling framework that enhances crop prediction accuracy through dynamic parameter calibration and multi-task learning.
SEA-Vision is a multilingual benchmark for enhancing document and scene text understanding across Southeast Asia's diverse languages.
Fusian enables precise, continuous personality control in large language models through a novel framework combining LoRA adapters and reinforcement learning.
SWE-Skills-Bench evaluates the effectiveness of agent skills in software engineering tasks using a structured benchmark.
SFCoT enhances the safety of large language models by proactively evaluating and calibrating reasoning steps to prevent jailbreak attacks.
GradCFA is a hybrid framework that enhances AI interpretability through optimized counterfactual explanations and feature attribution.
SKILLS enhances LLM-driven telecom operations by integrating structured knowledge for improved workflow execution.
Brain-Inspired Graph Multi-Agent Systems enhance reasoning in large language models through specialized agent coordination.
A novel conditional diffusion model for efficient remote sensing image compression with high perceptual performance.
CRASH is an LLM-based agent that automates reasoning over autonomous vehicle crash reports to enhance safety analysis.
A fast seismic inversion method leveraging Conditional Rectified Flow to enhance accuracy and efficiency in geophysical exploration.
A novel imaging spectrometer that maximizes light throughput for high-fidelity spectral reconstruction.
Dependency-Oriented Sampler enhances masked diffusion language models by leveraging inter-token dependencies for improved generation efficiency.
A perception module for exosuits that estimates walking modes using inertial data for enhanced user adaptation.
TAGARELA is a large-scale Portuguese speech dataset designed to enhance automatic speech recognition and text-to-speech technologies.
CCTU is a benchmark designed to evaluate large language models' tool use under complex constraints, revealing critical limitations in their performance.
An open-source game engine for Dragonchess that leverages evolutionary transfer learning to enhance AI performance.
NS-Mem enhances multimodal agent reasoning by integrating neuro-symbolic memory for better analytical decision making.
LOOM-CFM accelerates inference in flow-based generative models by optimizing noise-data coupling across minibatches.
A study on dataset diversity metrics and their impact on classification model performance.
A task-aware, training-free acceleration framework for unified multimodal models, aimed at real-world deployment.
A novel training framework that enhances LLMs for efficient and interpretable ICD coding using short evidence spans.
AGCD is a novel decoding paradigm that improves weather forecasting accuracy by integrating state-conditioned physics priors.
HalDec-Bench is a comprehensive benchmark for evaluating hallucination detectors in image captioning, enhancing the quality of vision-language models.
Lend an Ear is an AI-driven platform that enhances human empathic communication through personalized coaching.
SCAN offers a novel sparse editing framework for Large Language Models to prevent catastrophic forgetting during knowledge updates.
A novel method for robust affordance estimation in robotics using coupled particle filters.
INTERPOL is a model-driven framework that enhances identification accuracy of language models by learning deep stylistic patterns.
DART is an online OOD detection method that adapts to covariate shifts by tracking dual prototypes for improved performance.
ScoutGPT is a generative model that simulates football match events to enhance player transfer evaluations through counterfactual analysis.
A novel method for accelerating document parsing using parallel token prediction in vision-language models.
NavGSim is a high-fidelity simulator that enhances robot navigation through realistic environment rendering and collision simulation.
A flexible, scalable autonomous driving system built on a novel end-to-end architecture for improved closed-loop driving performance.
KiRAS is a framework for quadruped robots that enables robust skill learning and adaptability across complex terrains using keyframe-guided self-imitation.
A multimodal graph learning framework for improved classification of Autism Spectrum Disorder using integrated imaging data.
PriCoder enables LLMs to effectively use private library APIs for code generation by synthesizing data and enhancing code diversity and quality.
M2-ResiPolicy enhances robotic manipulation by integrating high-level guidance with low-level corrections for improved interaction safety.
Safe Flow Q-Learning offers a novel approach to offline safe reinforcement learning, ensuring safety in real-time control applications.
BodyGuards is a multi-robot escorting framework designed to protect human operators in unknown environments with limited communication.
PrototypeNAS automates the design of efficient deep neural networks tailored for microcontroller units, enabling rapid deployment on edge devices.
AeroGrab is an integrated pipeline for reliable aerial grasping in cluttered environments using language instructions.
HALO enables humanoid robots to effectively adapt to unknown payloads through a novel gradient-based system identification framework.
Open-source biomedical knowledge graphs that enable rapid, reproducible cross-referencing of fragmented biomedical data.
An attribute-aware face recognition architecture that enhances the discriminability of facial embeddings by focusing on identity-relevant attributes.
IA-KRC enhances multi-agent communication efficiency by optimizing partner selection under interference constraints.
AdaAnchor optimizes latent reasoning in LLMs by refining anchor vectors with adaptive halting for efficient computation.
SRL-MAD offers a novel approach to detect morphing attacks in biometric systems using structured residual Fourier representations.
AnoleVLA is a lightweight vision-language-action model designed for efficient robotic manipulation in resource-constrained environments.
MUNKEY enables direct zero-shot forgetting in machine learning models, addressing privacy and data error challenges.
FreeOmniMVS offers a novel reference-free framework for robust omnidirectional depth estimation using multi-view consistency maximization.
TrajFlow is a novel flow-matching-based model for generating nationwide pseudo-GPS trajectory data to enhance urban planning and traffic management.
A suite of pretrained Latvian-specific encoders that outperform existing models in NLP tasks.
A novel approach to adapting image editing models for video frame interpolation using few-shot learning.
A method for estimating errors in physics-informed neural networks to enhance trust and interpretability in their predictions.
FreeTalk enables emotion-driven 3D facial animation on arbitrary topology meshes without template constraints.
A pipeline for recognizing objects based on human pointing gestures using RGB images.
SpecDepth enhances monocular depth estimation in colonoscopy by adapting foundation models to address spectral mismatches.
PMAx is an autonomous agentic framework that democratizes process mining by enabling non-technical users to derive insights from data through natural language interactions while ensuring data privacy.
xplainfi is an R package that provides advanced feature importance methods for machine learning models.
An RL-based traffic signal control algorithm that adapts to varying traffic conditions and outperforms traditional methods.
A Transformer-based algorithm that efficiently approximates optimal consensus rankings for applications in recommendation systems and search engines.
A framework that enhances long-term video understanding through question-guided visual compression and memory feedback.
DAIT enables efficient knowledge transfer from large Vision-Language Models to lightweight classifiers for fine-grained visual categorization.
A multi-expert framework for robust COVID-19 CT classification leveraging source-aware modeling.
A novel Retinex-Guided Transformer model for stable low-light image enhancement through advanced decomposition techniques.
A reinforcement learning framework for efficient feature selection in predictive models.
ReactMotion generates naturalistic listener body motions in response to speaker utterances using a novel dataset and generative framework.
A novel classification framework using Euler Characteristic Surfaces for efficient and interpretable time series analysis.
VTC-Bench is a benchmark for evaluating the tool-use proficiency of Multimodal Large Language Models in complex visual tasks.
This research addresses moral indifference in LLMs by aligning latent representations with moral vectors to enhance moral reasoning.
A framework for improving short-term photovoltaic power forecasting by incorporating uncertainty from missing data.
RoSE enhances knowledge editing in LLMs by improving instruction-following capabilities through geometric alignment.
ViX-Ray is a specialized dataset aimed at enhancing vision-language models for Vietnamese chest X-ray analysis.
FuXiWeather2 delivers rapid, accurate AI-powered global weather forecasts.
PYTHEN is a Python-based framework that simplifies defeasible legal reasoning for developers and legal professionals.
A novel surrogate model using Kolmogorov-Arnold networks to enhance the efficiency of geochemical solvers for nuclear waste disposal.
A defense mechanism to enhance the safety and reliability of vision-language models against jailbreaking attacks.
A novel approach to enhancing semantic segmentation in satellite imagery using self-supervised pretraining techniques tailored for SAR data.
GlyphPrinter enhances visual text rendering accuracy using region-based preference optimization.
A method for releasing network connectedness indices while ensuring differential privacy.
A hybrid neural operator for efficient simulation of EUV electromagnetic wave diffraction from lithography masks.
A low-complexity graphon estimator that improves accuracy and efficiency in network analysis.
A study on improving safety in skeleton-based action recognition through uncertainty analysis and a novel gating mechanism.
A co-design framework for optimizing memory-storage systems using interpretable machine learning models.
Mamba-3 enhances sequence modeling efficiency with state space principles for improved LLM performance.
Bootleg is a self-supervised learning method that enhances feature extraction by predicting latent representations from multiple hidden layers.
A framework to enhance reasoning in LLMs by externalizing uncertainty for improved control actions.
EscapeCraft-4D is a customizable environment for assessing multimodal reasoning and time awareness in large models.
This paper discusses the vulnerabilities in evaluating AI agents by drawing parallels with malware analysis.
A novel dataset and comparative analysis for automatic classification of Nepali music genres using machine learning and deep learning techniques.
A physics-informed fine-tuning framework for adapting foundation models to partial differential equations with minimal data.
Adaptive Residual Context (ARC) enhances urban traffic monitoring by improving vehicle detection while preserving contextual knowledge.
A novel framework for generating adversarial patches that exploit vulnerabilities in facial identification systems.
A novel game-theoretic approach to optimize agent morphology and control for improved robotics efficiency.
NV-Bench provides a standardized benchmark for evaluating nonverbal vocalization synthesis in text-to-speech systems.
A novel data augmentation method that leverages causal knowledge to enhance predictive model accuracy.
CASHomon Sets enable efficient model selection across multiple model classes and hyperparameters, improving interpretability and performance.
Curated datasets for probing verb alternations in multiple languages to enhance LLM performance.
A new policy-iteration algorithm that ensures safety in sequential decision-making with improved runtime efficiency.
A self-supervised approach to grading corneal nerve fiber tortuosity without expensive segmentation maps.
IConE offers a novel approach to prevent representation collapse in self-supervised learning, enabling effective training with small batch sizes.
A dataset for improving machine translation of passive sentences between English and Chinese.
A framework that optimizes routing and model pruning for efficient decentralized federated learning in bandwidth-constrained environments.
CATFormer is a scalable framework that prevents catastrophic forgetting in spiking neural networks through dynamic thresholds.
A framework for robust predictor identification under latent shifts using imperfect proxies.
Byz-DM21 is a novel Byzantine-robust distributed learning algorithm that enhances communication efficiency through a double-momentum gradient estimator.
Context-aware sensor modeling enhances multi-sensor tracking performance in heterogeneous environments.
A multilingual approach to indirect question answering, addressing challenges in both high- and low-resource languages.
A novel approach to ultra-low-bitrate image compression using temporal priors for improved fidelity and speed.
This paper critiques the validity of LLM capability benchmarks and proposes a robust theoretical framework for assessment.
MMKU-Bench is a comprehensive evaluation benchmark for multimodal knowledge updating, addressing the need for consistent real-world knowledge in AI models.
PAKAN enhances pansharpening by introducing adaptive activation functions for improved spatial-spectral fusion.
A review of low-cost Edge AI and TinyML solutions for enhancing precision agriculture in resource-constrained environments.
IN-FOAMs are innovative inflatable artificial muscles designed for flexible and portable robotic movements.
A novel probabilistic forecasting method using stochastic feed-forward neural networks for spatio-temporal datasets.
GUI-CEval is a comprehensive benchmark designed to evaluate Chinese mobile GUI agents across various applications and capabilities.
A framework for enhancing CT imaging quality through uncertainty-guided manifold smoothing.
A systematic documentation approach for agentic AI systems that improves transparency and maintainability in industrial applications.
DeepVision-VLA enhances visual representations in Vision-Language-Action models for improved robotic manipulation.
This study critiques existing metrics for counterfactual explanations in AI, highlighting their misalignment with user perceptions.
A perception-aware framework for UAVs that enhances exploration efficiency in feature-limited environments.
Energy-Aware Autonomous Exploration (EAAE) optimizes UAV exploration by minimizing energy consumption while maintaining exploration efficiency.
AC-Foley is an audio-conditioned model for precise video-to-audio synthesis that overcomes text-based limitations.
This paper presents a new algorithm for linear contextual bandits that improves computational efficiency under adversarial conditions.
A novel distillation pipeline for xLSTM architectures that aims to match the performance of large language models without loss.
Lore transforms git commit messages into structured decision records for AI coding agents.
A framework for diagnosing hallucinations in Vision-Language Models through cognitive trajectory analysis.
This research analyzes how LLMs can generate plausible distractors for educational assessments by modeling student misconceptions.
DOT is an automated database tuning algorithm that optimizes performance by dynamically selecting influential parameters.
This paper critiques the evaluation methods in time-series forecasting, advocating for a more rigorous approach to benchmark datasets.
Gym-V is a unified platform for agentic vision research, providing a diverse set of procedurally generated environments for reinforcement learning.
MA-VLCM enhances multi-agent reinforcement learning by using a pretrained vision-language model as a centralized critic for improved sample efficiency.
A framework enhancing the reliability and security of quantized deep neural networks through a three-stage process.
TrinityGuard is a safety evaluation and monitoring framework for LLM-based multi-agent systems addressing unique security risks.
RieMind proposes a novel framework for enhancing spatial reasoning in indoor scenes through explicit 3D scene graph grounding.
PRISM is a simulation-based encoder-decoder for scalable model selection in scientific applications.
HapticVLA enables contact-rich manipulation without the need for tactile sensors during inference.
This paper argues that the most valuable capabilities of LLMs lie in their unexplainable aspects, challenging traditional expert systems.
A novel adversarial training framework for enhancing robustness in autonomous driving systems against rare, safety-critical scenarios.
This research explores the challenges of improving LLM performance in underrepresented English dialects due to data scarcity.
A novel framework for causal mediation analysis using sequential transport and optimal transport methods.
SNCE introduces a novel training objective to enhance discrete image generation by optimizing large VQ codebooks.
CAICL enhances vision-language models for improved robustness in robotic manipulation tasks involving confusable objects.
This paper explores a new security vulnerability in LLM agents related to memory control flow attacks.
An integrated framework for comparative law studies using Japanese legal data.
Attention Residuals enhance layer contributions in LLMs through selective aggregation.
A framework for generating human motion using Riemannian geometry.
ClueNet enhances video question answering by improving visual clue extraction and reasoning alignment.
A theoretical framework for modeling artificial general intelligence based on cognitive architecture.
This paper presents new variance-reduction techniques for solving stochastic composite inclusions using forward-reflected-backward splitting methods.
A new framework for estimating staged tree models using hierarchical clustering on the probability simplex.
This paper explores optimal control strategies for underactuated robots to mitigate oscillations during trajectory tracking.
This paper explores the complexities of aligning large language models amidst conflicting priorities and proposes a verification mechanism to enhance robustness.
This paper explores the dynamics of AdamW in relation to grokking and generalization in optimization.
A mathematical formulation for tightly-coupled LiDAR-Inertial Odometry using VoxelMap.
MV2UV offers a novel approach to generating high-quality UV texture maps by addressing multiview inconsistencies and unseen parts.
This paper analyzes various formalisms for specifying robotic missions to aid in selecting appropriate modeling approaches.
This paper explores safety vulnerabilities in test-time training methods for large language models.
AnyCrowd is a novel framework for scalable multi-character animation that addresses identity entanglement and pose binding challenges.
A new topological complexity measure for classification problems in metric spaces.
This paper explores the internal mechanisms of LLMs in understanding tabular data, providing insights for future research.
This paper presents a theoretical advancement in topological machine learning through persistence spheres for optimal transport.
This paper explores a new learning architecture inspired by cognitive science to enhance autonomous learning in AI systems.
This paper analyzes the impact of beam width selection on LLM output quality, revealing potential overestimation bias.
This paper explores the theoretical aspects of deep residual networks' approximation capacity in dynamical systems.
This paper presents a method for unsupervised anomaly detection in mobile core networks using multi-embedding models.
A theoretical framework for recovering unknown orderings through adaptive querying of pairwise similarities.
This study evaluates the temperature susceptibility of SRAM PUFs in embedded systems.
A novel chaos-based approach to improve classification accuracy through enhanced training processes.
A methodology for identifying dynamic parameters in 3-DOF parallel robots to enhance model-based control.
This paper explores in-context symbolic regression techniques to enhance the robustness of Kolmogorov-Arnold Networks.
This paper explores mechanistic interpretability in embodied control systems using infant motor learning as a model.
This paper explores the decomposition of probabilistic scores to analyze calibration and uncertainty in predictors.
vCause offers a secure and efficient system for causality analysis in cloud-based endpoint auditing.
This paper explores a method for reducing computational costs in online learning through sparse gradient transport.
This paper presents a theoretical framework for identifying Condorcet winners in dueling bandits, focusing on sample complexity.
A framework to evaluate AI-generated research ideas based on their future impact and citation potential.
A novel calibration method for integrating camera and laser tracker measurements in mobile robots.
A multimodal deep learning framework for predicting pathological response in non-small cell lung cancer using limited data.
Muon is an optimizer designed for stable training in the presence of heavy-tailed noise.
This paper explores the error sources in global feature effect estimation for machine learning models.
A framework for assessing the maturity and readiness of prompt engineering assets in generative AI.
This study replicates and extends a system for authorship attribution of machine-generated texts using multilingual models and stylometric features.
This paper explores the risks of catastrophic outcomes from AIs with misspecified objectives in complex environments.