VectorWorld: Efficient Streaming World Model via Diffusion Flow on Vector Graphs Watch
VectorWorld offers real-time, high-fidelity autonomous driving simulation using novel vector graph diffusion flows.
Autonomous Driving Mar 18 High viability
Caging the Agents: A Zero Trust Security Architecture for Autonomous AI in Healthcare Watch
Zero Trust Security Architecture for AI agents in healthcare, protecting sensitive data from vulnerabilities.
Healthcare Security Mar 18 High viability
S-VGGT: Structure-Aware Subscene Decomposition for Scalable 3D Foundation Models Watch
Optimize 3D foundation models with S-VGGT's structure-aware subscene decomposition for enhanced scalability and efficiency.
3D Modeling Mar 18 High viability
Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition Build Now
Zipper-LoRA enhances multilingual speech recognition by dynamically optimizing language-specific and shared model parameters.
Multilingual Speech Recognition Mar 18 Pending High viability
SafeLand: Safe Autonomous Landing in Unknown Environments with Bayesian Semantic Mapping Build Now
SafeLand is a vision-based system for safe autonomous landing of UAVs in dynamic environments without prior information.
Autonomous UAV Landing Mar 18 Pending High viability
MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval Build Now
MCoT-MVS enhances composed image retrieval by integrating multi-level vision features with multi-modal reasoning for improved semantic accuracy.
Image Retrieval Mar 18 Pending High viability
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models Watch
Loc3R-VLM enhances Vision-Language Models with advanced 3D understanding for improved spatial reasoning.
Vision-Language Models Mar 18 High viability
GMT: Goal-Conditioned Multimodal Transformer for 6-DOF Object Trajectory Synthesis in 3D Scenes Watch
GMT is a multimodal transformer that generates realistic 6-DOF object manipulation trajectories for robots in complex 3D environments.
Robotics Mar 18 High viability
Versatile Editing of Video Content, Actions, and Dynamics without Training Watch
DynaEdit enables versatile, training-free video editing using pretrained models for complex scene interactions.
Generative Video Mar 18 High viability
AHOY! Animatable Humans under Occlusion from YouTube Videos with Gaussian Splatting and Video Diffusion Priors Watch
AHOY reconstructs animatable 3D avatars from occluded YouTube videos using advanced Gaussian splatting techniques.
3D Reconstruction Mar 18 High viability
Interpretable Traffic Responsibility from Dashcam Video via Legal Multi Agent Reasoning Watch
C-TRAIL transforms dashcam video evidence into legal responsibility assessments using a multimodal approach.
Legal AI Mar 18 High viability
A practical artificial intelligence framework for legal age estimation using clavicle computed tomography scans Watch
A robust AI framework for legal age estimation using clavicle CT scans, enhancing forensic decision-making.
Medical AI Mar 18 High viability
DexViTac: Collecting Human Visuo-Tactile-Kinematic Demonstrations for Contact-Rich Dexterous Manipulation Watch
DexViTac is a portable data collection system that captures high-fidelity visuo-tactile-kinematic demonstrations for improving robot dexterity in contact-rich environments.
Robotics Mar 18 High viability
ProbeFlow: Training-Free Adaptive Flow Matching for Vision-Language-Action Models Watch
ProbeFlow accelerates action decoding in Vision-Language-Action models for responsive robotic control.
Robotics Control Mar 18 High viability
M2P: Improving Visual Foundation Models with Mask-to-Point Weakly-Supervised Learning for Dense Point Tracking Watch
M2P enhances visual foundation models for dense point tracking using weakly-supervised learning with video object segmentation.
Video Understanding Mar 18 High viability
Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients Build Now
A fine-grained quantization strategy for large vision language models that enhances accuracy while reducing computational overhead.
Vision Language Models Mar 18 Pending High viability
TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos Watch
TAPESTRY generates high-fidelity turntable videos from 3D models, enabling automated creation of production-ready assets.
3D Content Generation Mar 18 High viability
VolumeDP: Modeling Volumetric Representation for Manipulation Policy Learning Watch
VolumeDP enhances robotic manipulation through advanced 3D spatial reasoning for improved imitation learning.
Robotic Manipulation Mar 18 High viability
Parameter-Efficient Modality-Balanced Symmetric Fusion for Multimodal Remote Sensing Semantic Segmentation Build Now
MoBaNet is a parameter-efficient framework for multimodal remote sensing semantic segmentation that balances modality contributions while minimizing computational overhead.
Multimodal Remote Sensing Mar 18 Pending High viability
Learning Transferable Temporal Primitives for Video Reasoning via Synthetic Videos Build Now
SynRL is a post-training framework that enhances video understanding by teaching models fundamental temporal primitives through synthetic video generation.
Video Reasoning Mar 18 Pending High viability
Post-Training Local LLM Agents for Linux Privilege Escalation with Verifiable Rewards Watch
A local LLM agent for Linux privilege escalation that achieves high success rates with verifiable rewards.
Security AI Mar 18 High viability
FINER: MLLMs Hallucinate under Fine-grained Negative Queries Watch
FINER addresses hallucinations in multimodal large language models through innovative fine-grained negative queries and tuning techniques.
Multimodal AI Mar 18 High viability
From Isolated Scoring to Collaborative Ranking: A Comparison-Native Framework for LLM-Based Paper Evaluation Watch
A novel framework for collaborative ranking of scientific papers using LLMs to enhance evaluation accuracy.
LLM Evaluation Mar 18 High viability
PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery Watch
PanoVGGT is a Transformer framework for accurate 3D reconstruction from panoramic imagery, leveraging a unique dataset and innovative training strategies.
3D Reconstruction Mar 18 High viability
KA2L: A Knowledge-Aware Active Learning Framework for LLMs Watch
KA2L is a framework that enhances LLMs' performance through targeted active learning by identifying knowledge gaps.
Active Learning for LLMs Mar 18 High viability
Prompt-Free Universal Region Proposal Network Build Now
A novel object detection network that identifies potential objects without relying on external prompts, enhancing flexibility across various applications.
Object Detection Mar 18 Pending High viability
Deploying Semantic ID-based Generative Retrieval for Large-Scale Podcast Discovery at Spotify Watch
GLIDE is a generative recommender system that enhances podcast discovery by combining semantic reasoning with user context at Spotify.
Generative Retrieval Mar 18 High viability
A Unified Language Model for Large Scale Search, Recommendation, and Reasoning Watch
NEO is a unified language model that enhances recommendation, search, and reasoning across large catalogs with language-steerable capabilities.
Recommendation Systems Mar 18 High viability
AirDDE: Multifactor Neural Delay Differential Equations for Air Quality Forecasting Build Now
AirDDE leverages neural delay differential equations for improved air quality forecasting by integrating delay modeling with physical guidance.
Environmental AI Mar 18 Pending High viability
Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation Build Now
Omni-I2C is a benchmark for evaluating Large Multimodal Models in generating executable code from complex digital graphics.
Image-to-Code Generation Mar 18 Pending High viability
Inducing Epistemological Humility in Large Language Models: A Targeted SFT Approach to Reducing Hallucination Watch
A targeted fine-tuning approach to reduce hallucinations in large language models by teaching epistemological humility.
LLM Training Mar 18 High viability
Argument Reconstruction as Supervision for Critical Thinking in LLMs Watch
A framework that enhances LLMs' critical thinking by teaching them to reconstruct arguments.
NLP Mar 18 High viability
Towards Motion-aware Referring Image Segmentation Build Now
A novel approach to improve referring image segmentation for motion-related queries using multimodal learning.
Image Segmentation Mar 18 Pending High viability
Harnessing the Power of Foundation Models for Accurate Material Classification Watch
A novel framework leveraging foundation models to enhance material classification accuracy through innovative dataset generation and prior incorporation.
Material Classification Mar 18 High viability
Universal Skeleton Understanding via Differentiable Rendering and MLLMs Watch
SkeletonLLM translates human skeleton sequences into visual representations for enhanced multimodal reasoning.
Multimodal Learning Mar 18 High viability
EchoGen: Cycle-Consistent Learning for Unified Layout-Image Generation and Understanding Watch
EchoGen is a unified framework that enhances layout-to-image generation and image grounding through innovative training strategies.
Generative Image Mar 18 High viability
The Unreasonable Effectiveness of Text Embedding Interpolation for Continuous Image Steering Build Now
A training-free framework for continuous and controllable image editing using text embeddings.
Image Editing Mar 18 Pending High viability
LoST: Level of Semantics Tokenization for 3D Shapes Watch
LoST revolutionizes 3D shape generation by introducing a semantic tokenization method that enhances autoregressive modeling efficiency.
3D Shape Generation Mar 18 High viability
AdaRadar: Rate Adaptive Spectral Compression for Radar-based Perception Watch
AdaRadar offers an adaptive compression solution for radar data in autonomous driving, enhancing data transmission efficiency.
Radar Data Compression Mar 18 High viability
Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training Watch
MUD optimizes transformer training by enhancing momentum updates for faster convergence.
Optimizer Improvements Mar 18 High viability
Specification-Aware Distribution Shaping for Robotics Foundation Models Watch
A framework for optimizing action distributions in robotics models to ensure compliance with complex temporal specifications.
Robotics Mar 18 High viability
Robust-ComBat: Mitigating Outlier Effects in Diffusion MRI Data Harmonization Watch
Robust-ComBat enhances diffusion MRI data harmonization by effectively mitigating outlier effects in patients with neurological disorders.
Medical AI Mar 18 High viability
TransText: Transparency Aware Image-to-Video Typography Animation Watch
TransText enables high-fidelity, layer-aware text animation for dynamic visual design.
Image-to-Video Animation Mar 18 High viability
Efficient Training-Free Multi-Token Prediction via Embedding-Space Probing Watch
A training-free method for efficient multi-token prediction in large language models that enhances throughput and accuracy.
LLM Training Mar 18 High viability
SegFly: A 2D-3D-2D Paradigm for Aerial RGB-Thermal Semantic Segmentation at Scale Watch
An aerial RGB-thermal semantic segmentation tool for drone-based surveillance and monitoring.
Aerial Imaging Mar 18 High viability
A Creative Agent is Worth a 64-Token Template Watch
CAT is a framework that enhances creativity in text-to-image models by generating reusable token templates for fuzzy prompts.
Creative AI Mar 18 High viability
scicode-lint: Detecting Methodology Bugs in Scientific Python Code with LLM-Generated Patterns Build Now
Scicode-lint automates the detection of methodology bugs in scientific Python code using AI-generated patterns.
Code Analysis Mar 18 Pending High viability
Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation Watch
A scalable framework for personalized audio-video generation that allows fine-grained control over identity attributes.
Generative Video Mar 18 High viability
DebugLM: Learning Traceable Training Data Provenance for LLMs Watch
DebugLM enhances LLMs with data provenance for improved debugging and targeted remediation.
LLM Training Mar 18 High viability
Mitigating LLM Hallucinations through Domain-Grounded Tiered Retrieval Watch
A domain-grounded retrieval system that enhances the reliability of LLMs by mitigating hallucinations through a structured verification process.
LLM Reliability Mar 18 High viability
Procedural Generation of Algorithm Discovery Tasks in Machine Learning Watch
DiscoGen automates the creation of diverse algorithm discovery tasks to enhance machine learning optimization.
Algorithm Discovery Mar 18 High viability
Revisiting foundation models for cell instance segmentation Build Now
A new instance segmentation strategy for improving microscopy cell segmentation models.
Medical AI Mar 18 Pending High viability
Omni-3DEdit: Generalized Versatile 3D Editing in One-Pass Watch
Omni-3DEdit is a unified model for efficient and versatile 3D editing tasks in one pass.
3D Editing Mar 18 High viability
Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control Watch
A novel adaptive and robust control solution for robotics using generative models to optimize flow matching.
Robotics Control Mar 18 High viability
RPMS: Enhancing LLM-Based Embodied Planning through Rule-Augmented Memory Synergy Watch
RPMS enhances LLM-based embodied planning by integrating rule-augmented memory to improve action feasibility and success rates.
Agents Mar 18 High viability
FailureMem: A Failure-Aware Multimodal Framework for Autonomous Software Repair Watch
FailureMem is a multimodal framework that enhances automated program repair by integrating visual reasoning and reusable knowledge from past failures.
Automated Program Repair Mar 18 High viability
EVA: Aligning Video World Models with Executable Robot Actions via Inverse Dynamics Rewards Watch
EVA aligns video world models with executable robot actions to enhance robotic control through reinforcement learning.
Robotics Mar 18 High viability
RangeAD: Fast On-Model Anomaly Detection Watch
RangeAD offers an efficient on-model approach for anomaly detection that reduces inference costs while improving performance.
Anomaly Detection Mar 18 High viability
Exploring parameter-efficient fine-tuning (PEFT) of billion-parameter vision models with QLoRA and DoRA: insights into generalization for limited-data image classification under a 98:1 test-to-train regime Build Now
A parameter-efficient fine-tuning approach for billion-parameter vision models to enhance behavior classification in precision livestock farming.
Agricultural AI Mar 18 Pending High viability
Facts as First Class Objects: Knowledge Objects for Persistent LLM Memory Watch
Introducing Knowledge Objects for efficient and accurate memory management in large language models.
LLM Memory Mar 18 High viability
CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image Watch
CrowdGaussian reconstructs high-fidelity 3D models of human crowds from single images using advanced self-supervised learning techniques.
3D Reconstruction Mar 18 High viability
CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution Watch
CoVerRL enhances label-free reinforcement learning by using a generator-verifier co-evolution framework to improve reasoning accuracy.
Reinforcement Learning Mar 18 High viability
Evidence Packing for Cross-Domain Image Deepfake Detection with LVLMs Watch
A training-free framework for detecting deepfakes using evidence-driven reasoning with large vision-language models.
Image Deepfake Detection Mar 18 High viability
Harm or Humor: A Multimodal, Multilingual Benchmark for Overt and Covert Harmful Humor Watch
A benchmark for detecting harmful humor across multiple languages and modalities.
Multimodal AI Mar 18 High viability
PC-CrossDiff: Point-Cluster Dual-Level Cross-Modal Differential Attention for Unified 3D Referring and Segmentation Watch
PC-CrossDiff enhances 3D visual grounding by improving localization in complex scenes using dual-level differential attention.
3D Visual Grounding Mar 18 High viability
Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow Watch
A testbed for testing Connected and Autonomous Vehicles in mixed traffic environments using a human-in-the-loop approach.
Autonomous Vehicles Mar 18 High viability
SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition Watch
SARE is a training-free framework that enhances Fine-Grained Visual Recognition by adapting reasoning based on sample difficulty.
Visual Recognition Mar 18 High viability
AERR-Nav: Adaptive Exploration-Recovery-Reminiscing Strategy for Zero-Shot Object Navigation Watch
AERR-Nav is an adaptive framework for zero-shot object navigation in complex multi-floor environments.
Robotics Navigation Mar 18 High viability
DancingBox: A Lightweight MoCap System for Character Animation from Physical Proxies Watch
DancingBox is a lightweight motion capture system that enables intuitive character animation using everyday objects and a single webcam.
Character Animation Mar 18 High viability
Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization Watch
An anonymization-first framework for LLM trading agents that enhances trustworthiness and performance in portfolio optimization.
Financial AI Mar 18 High viability
Flow Matching Policy with Entropy Regularization Watch
FMER is an efficient online RL framework that enhances exploration through principled maximum-entropy optimization.
Reinforcement Learning Mar 18 High viability
Does YOLO Really Need to See Every Training Image in Every Epoch? Watch
A novel sampling strategy for YOLO detectors that optimizes training efficiency by selectively resampling images based on learning sufficiency.
Computer Vision Mar 18 High viability
Sensi: Learn One Thing at a Time -- Curriculum-Based Test-Time Learning for LLM Game Agents Watch
Sensi is an LLM agent that learns efficiently in unknown environments through structured test-time learning.
Agents Mar 18 High viability
Consistency-Driven Dual LSTM Models for Kinematic Control of a Wearable Soft Robotic Arm Watch
A dual LSTM framework for enhancing the kinematic control of wearable soft robotic arms.
Robotics Mar 18 High viability
Few-Step Diffusion Sampling Through Instance-Aware Discretizations Watch
An instance-aware discretization framework that enhances the performance of diffusion models by adapting timestep allocations based on input-dependent priors.
Generative Models Mar 18 High viability
Anchoring and Rescaling Attention for Semantically Coherent Inbetweening Watch
A generative inbetweening method that synthesizes realistic intermediate frames with enhanced semantic and temporal guidance.
Generative Video Mar 18 High viability
Part-Aware Open-Vocabulary 3D Affordance Grounding via Prototypical Semantic and Geometric Alignment Watch
A novel framework for enhancing semantic and geometric representations in 3D affordance grounding.
3D Computer Vision Mar 18 High viability
VeriGrey: Greybox Agent Validation Build Now
VeriGrey is a grey-box testing framework that uncovers security risks in LLM agents through dynamic prompt mutation.
Agents Mar 18 Pending High viability
DSS-GAN: Directional State Space GAN with Mamba backbone for Class-Conditional Image Synthesis Watch
DSS-GAN introduces a novel conditioning mechanism for enhanced class-conditional image synthesis using a Mamba backbone.
Generative Image Synthesis Mar 18 High viability
ARES: Scalable and Practical Gradient Inversion Attack in Federated Learning through Activation Recovery Build Now
ARES is a novel gradient inversion attack that reconstructs training samples in federated learning without architectural modifications.
Federated Learning Security Mar 18 Pending High viability
VeriAgent: A Tool-Integrated Multi-Agent System with Evolving Memory for PPA-Aware RTL Code Generation Watch
A multi-agent framework for PPA-aware RTL code generation that integrates EDA tools for continuous optimization.
RTL Code Generation Mar 18 High viability
ReLaGS: Relational Language Gaussian Splatting Watch
A novel framework for efficient open-vocabulary 3D perception and reasoning without scene-specific training.
3D Perception and Reasoning Mar 18 High viability
Trust the Unreliability: Inward Backward Dynamic Unreliability Driven Coreset Selection for Medical Image Classification Watch
Dynamic Unreliability-Driven Coreset Selection enhances medical image classification by focusing on informative unreliable samples.
Medical AI Mar 18 High viability
A Contextual Help Browser Extension to Assist Digital Illiterate Internet Users Watch
A browser extension that provides contextual help for digitally illiterate users by delivering on-demand definitions of technical terms.
Digital Literacy Tools Mar 18 High viability
Edit-As-Act: Goal-Regressive Planning for Open-Vocabulary 3D Indoor Scene Editing Watch
Edit-As-Act enables precise 3D indoor scene editing from natural language instructions through goal-regressive planning.
3D Scene Editing Mar 18 High viability
Unsupervised Symbolic Anomaly Detection Watch
SYRAN offers interpretable unsupervised anomaly detection through symbolic regression, providing human-readable equations for anomaly scoring.
Anomaly Detection Mar 18 High viability
HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness Watch
HeiSD accelerates robot control through hybrid speculative decoding for improved inference speeds.
Robotics Mar 18 High viability
FoMo-X: Modular Explainability Signals for Outlier Detection Foundation Models Watch
FoMo-X enhances outlier detection models with intrinsic explainability for safer decision-making.
Outlier Detection Mar 18 High viability
Face anonymization preserving facial expressions and photometric realism Watch
A framework for face anonymization that preserves facial expressions and photometric realism for privacy-sensitive applications.
Face Anonymization Mar 18 High viability
FrescoDiffusion: 4K Image-to-Video with Prior-Regularized Tiled Diffusion Watch
FrescoDiffusion enables high-quality, coherent video generation from complex images using a novel tiled diffusion approach.
Generative Video Mar 18 High viability
ProGVC: Progressive-based Generative Video Compression via Auto-Regressive Context Modeling Watch
ProGVC is a novel video compression framework that combines progressive transmission and efficient entropy coding for enhanced perceptual quality at low bitrates.
Video Compression Mar 18 High viability
Rel-Zero: Harnessing Patch-Pair Invariance for Robust Zero-Watermarking Against AI Editing Watch
Rel-Zero offers a robust zero-watermarking solution that maintains visual fidelity against AI editing.
Digital Watermarking Mar 18 High viability
AdapTS: Lightweight Teacher-Student Approach for Multi-Class and Continual Visual Anomaly Detection Watch
AdapTS is a lightweight Teacher-Student framework for efficient multi-class and continual visual anomaly detection optimized for edge deployment.
Visual Anomaly Detection Mar 18 High viability
MM-OVSeg: Multimodal Optical-SAR Fusion for Open-Vocabulary Segmentation in Remote Sensing Watch
MM-OVSeg is a multimodal framework for resilient open-vocabulary segmentation in remote sensing, leveraging optical and SAR data.
Remote Sensing Mar 18 High viability
PCA-Seg: Revisiting Cost Aggregation for Open-Vocabulary Semantic and Part Segmentation Build Now
PCA-Seg enhances open-vocabulary semantic and part segmentation by improving cost aggregation through a parallel structure.
Computer Vision Mar 18 Pending High viability
Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality Build Now
XBridge enhances multilingual capabilities of LLMs by integrating pretrained encoder-decoder translation models for improved performance on low-resource languages.
Multilingual LLMs Mar 18 Pending High viability
Interpreting Context-Aware Human Preferences for Multi-Objective Robot Navigation Watch
A pipeline that enables robots to adapt their navigation based on context-aware human preferences using advanced language and reinforcement learning models.
Robotics Mar 18 High viability
QuantFL: Sustainable Federated Learning for Edge IoT via Pre-Trained Model Quantisation Watch
QuantFL is a sustainable federated learning framework that reduces energy costs for IoT devices through efficient model quantization.
Federated Learning Mar 18 High viability
From Optimizable to Interactable: Mixed Digital Twin-Empowered Testing of Vehicle-Infrastructure Cooperation Systems Watch
IMPACT is an interactive testing framework for vehicle-infrastructure cooperation systems that enhances corner-case generation through human interaction.
Vehicle-Infrastructure Cooperation Systems Mar 18 High viability
UAV-CB: A Complex-Background RGB-T Dataset and Local Frequency Bridge Network for UAV Detection Watch
A novel dataset and network for robust UAV detection in complex backgrounds.
UAV Detection Mar 18 High viability
UniSAFE: A Comprehensive Benchmark for Safety Evaluation of Unified Multimodal Models Watch
UniSAFE provides a comprehensive benchmark for evaluating the safety of unified multimodal models across various tasks.
Safety Benchmarking Mar 18 High viability
Revisiting Cross-Attention Mechanisms: Leveraging Beneficial Noise for Domain-Adaptive Learning Watch
A framework that enhances domain-adaptive learning through beneficial noise in cross-attention mechanisms.
Domain Adaptation Mar 18 High viability
VirPro: Visual-referred Probabilistic Prompt Learning for Weakly-Supervised Monocular 3D Detection Watch
VirPro enhances weakly-supervised monocular 3D detection by integrating visual-referred prompts for improved scene-aware representations.
3D Object Detection Mar 18 High viability
Efficient Soft Actor-Critic with LLM-Based Action-Level Guidance for Continuous Control Watch
GuidedSAC enhances reinforcement learning efficiency by integrating LLMs for action-level guidance in continuous control tasks.
Reinforcement Learning Mar 18 High viability
AR-CoPO: Align Autoregressive Video Generation with Contrastive Policy Optimization Watch
AR-CoPO enhances autoregressive video generation by aligning it with contrastive policy optimization for improved quality and generalization.
Generative Video Mar 18 High viability
P$^{3}$Nav: End-to-End Perception, Prediction and Planning for Vision-and-Language Navigation Watch
P$^{3}$Nav is an end-to-end framework that enhances Vision-and-Language Navigation by integrating perception, prediction, and planning.
Vision-and-Language Navigation Mar 18 High viability
VLM2Rec: Resolving Modality Collapse in Vision-Language Model Embedders for Multimodal Sequential Recommendation Watch
VLM2Rec enhances multimodal sequential recommendation by balancing modality utilization in Vision-Language Models.
Multimodal Recommendation Mar 18 High viability
AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement Watch
AdaZoom-GUI enhances GUI grounding accuracy through adaptive zoom and instruction refinement for vision-language models.
GUI Grounding Mar 18 High viability
Baguan-TS: A Sequence-Native In-Context Learning Model for Time Series Forecasting with Covariates Watch
Baguan-TS is a novel Transformer model that enhances time series forecasting through in-context learning and raw-sequence representation.
Time Series Forecasting Mar 18 High viability
FloorPlan-VLN: A New Paradigm for Floor Plan Guided Vision-Language Navigation Watch
FloorPlan-VLN revolutionizes navigation by integrating concise instructions with structured floor plans for enhanced spatial reasoning.
Vision-Language Navigation Mar 18 High viability
TimeAPN: Adaptive Amplitude-Phase Non-Stationarity Normalization for Time Series Forecasting Build Now
TimeAPN is a novel framework that enhances time series forecasting by addressing non-stationarity through adaptive amplitude-phase normalization.
Time Series Forecasting Mar 18 Pending High viability
ECHO: Towards Emotionally Appropriate and Contextually Aware Interactive Head Generation Watch
ECHO synthesizes lifelike avatar head videos with emotionally appropriate and contextually aware facial behaviors.
Interactive Head Generation Mar 18 High viability
Joint Degradation-Aware Arbitrary-Scale Super-Resolution for Variable-Rate Extreme Image Compression Watch
ASSR-EIC is a novel image compression framework that enables variable-rate extreme image compression with high fidelity and realism.
Image Compression Mar 18 High viability
Motion-Adaptive Temporal Attention for Lightweight Video Generation with Stable Diffusion Build Now
A lightweight video generation system that adapts temporal attention based on motion content using Stable Diffusion.
Video Generation Mar 18 Pending High viability
Gesture-Aware Pretraining and Token Fusion for 3D Hand Pose Estimation Watch
A two-stage framework for accurate 3D hand pose estimation using gesture-aware pretraining and token fusion.
3D Hand Pose Estimation Mar 18 High viability
VisionNVS: Self-Supervised Inpainting for Novel View Synthesis under the Virtual-Shift Paradigm Watch
VisionNVS revolutionizes Novel View Synthesis for autonomous driving by transforming it into a self-supervised inpainting task.
Novel View Synthesis Mar 18 High viability
SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction Watch
SCALE is a specialized model for predicting virtual cell responses to perturbations, enhancing both speed and biological accuracy.
Biological AI Mar 18 High viability
Efficient Exploration at Scale Watch
An online learning algorithm that enhances data efficiency in reinforcement learning from human feedback by significantly reducing the number of required labels.
Reinforcement Learning Mar 18 High viability
Stereo World Model: Camera-Guided Stereo Video Generation Watch
StereoWorld is a camera-conditioned stereo world model for efficient stereo video generation.
Generative Video Mar 18 High viability
Shot-Aware Frame Sampling for Video Understanding Build Now
InfoShot is a shot-aware frame sampler that enhances long-video understanding by intelligently selecting keyframes to retain critical information.
Video Understanding Mar 18 Pending High viability
Material Magic Wand: Material-Aware Grouping of 3D Parts in Untextured Meshes Watch
Material Magic Wand simplifies the tedious process of material assignment in 3D modeling by automating part grouping based on material properties.
3D Modeling Tools Mar 18 High viability
Towards Safer Large Reasoning Models by Promoting Safety Decision-Making before Chain-of-Thought Generation Watch
A novel safety alignment method for large reasoning models that enhances safety decision-making before chain-of-thought generation.
Safety in AI Models Mar 18 High viability
Public Profile Matters: A Scalable Integrated Approach to Recommend Citations in the Wild Watch
Profiler enhances citation recommendation systems by efficiently capturing human citation patterns without bias.
Citation Recommendation Mar 18 High viability
TDAD: Test-Driven Agentic Development - Reducing Code Regressions in AI Coding Agents via Graph-Based Impact Analysis Watch
Reduce code regressions in AI coding agents with Test-Driven Agentic Development using graph-based impact analysis.
AI Development Tools Mar 18
VideoAtlas: Navigating Long-Form Video in Logarithmic Compute Watch
VideoAtlas offers a lossless, scalable environment for navigating long-form video using hierarchical grids.
Video Understanding Mar 18
IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia Watch
IndicSafe is a benchmark for evaluating the safety of multilingual LLMs in culturally diverse South Asian contexts.
LLM Safety Evaluation Mar 18
RAMP: Reinforcement Adaptive Mixed Precision Quantization for Efficient On Device LLM Inference Watch
RAMP optimizes large language model inference on resource-constrained devices through adaptive mixed precision quantization.
LLM Optimization Mar 18
Edit Spillover as a Probe: Do Image Editing Models Implicitly Understand World Relations? Watch
EditSpilloverProbe leverages edit spillover in image editing models to assess their implicit understanding of world relations.
Image Editing Models Mar 18
VISER: Visually-Informed System for Enhanced Robustness in Open-Set Iris Presentation Attack Detection Watch
A novel approach to enhance robustness in iris presentation attack detection using human perceptual priors.
Computer Vision Mar 18
Event-Centric Human Value Understanding in News-Domain Texts: An Actor-Conditioned, Multi-Granularity Benchmark Watch
NEVU is a benchmark for actor-conditioned, event-centric human value recognition in news articles, enabling fine-grained evaluation of value cues.
NLP Benchmarking Mar 18
CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents Watch
CodeScout enhances developer productivity by using reinforcement learning to optimize code search.
AI for Developer Tools Mar 18 Pending
Process Supervision for Chain-of-Thought Reasoning via Monte Carlo Net Information Gain Watch
A novel method for generating step-level labels to enhance multi-step reasoning in large language models.
LLM Training Mar 18
Towards Infinitely Long Neural Simulations: Self-Refining Neural Surrogate Models for Dynamical Systems Watch
A self-refining neural surrogate model that enhances the simulation of dynamical systems by ensuring long-time consistency.
Dynamical Systems Simulation Mar 18
Predicting Trajectories of Long COVID in Adult Women: The Critical Role of Causal Disentanglement Watch
A predictive framework for assessing long COVID severity in women by integrating clinical and wearable data.
Medical AI Mar 18
DiffVP: Differential Visual Semantic Prompting for LLM-Based CT Report Generation Watch
Automate CT report generation using AI-enhanced visual semantic prompting for radiologists.
Healthcare AI Mar 18
MALLES: A Multi-agent LLMs-based Economic Sandbox with Consumer Preference Alignment Watch
MALLES is a multi-agent economic simulation framework leveraging LLMs for consumer preference alignment.
Economic Simulation Mar 18
Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment Watch
A method to enhance local vision-language alignment in few-shot learning for better interpretability in medical diagnosis.
Cross-Domain Learning Mar 18
A Multi-Agent System for Building-Age Cohort Mapping to Support Urban Energy Planning Watch
A multi-agent LLM system that fuses data to accurately determine urban building age for energy planning.
Urban Energy Planning Mar 18
CLeAN: Continual Learning Adaptive Normalization in Dynamic Environments Watch
CLeAN is an adaptive normalization technique that enhances continual learning in dynamic environments by mitigating catastrophic forgetting.
Continual Learning Mar 18
KineVLA: Towards Kinematics-Aware Vision-Language-Action Models with Bi-Level Action Decomposition Watch
KineVLA enhances robotic manipulation through a novel vision-language-action framework that integrates detailed kinematic attributes.
Vision-Language-Action Mar 18
EI: Early Intervention for Multimodal Imaging based Disease Recognition Watch
A novel framework for multimodal medical imaging that enhances disease recognition by leveraging early intervention techniques.
Medical Imaging Mar 18 Pending
The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle Watch
The Phasor Transformer introduces a novel approach to time-series forecasting by leveraging phase-native representations for efficient token mixing.
Time-Series Forecasting Mar 18
CRE-T1 Preview Technical Report: Beyond Contrastive Learning for Reasoning-Intensive Retrieval Watch
T1 is a generative retrieval model that enhances reasoning-intensive retrieval by dynamically generating intermediate reasoning trajectories.
Reasoning-Intensive Retrieval Mar 18
A 3D Reconstruction Benchmark for Asset Inspection Watch
A new dataset for benchmarking 3D reconstruction methods in asset inspection, addressing critical gaps in existing datasets.
3D Reconstruction Mar 18
AgentFactory: A Self-Evolving Framework Through Executable Subagent Accumulation and Reuse Watch
Develop a framework that accumulates and reuses executable subagents for self-evolving AI systems.
AI Framework Mar 18 Pending
Toward Scalable Automated Repository-Level Datasets for Software Vulnerability Detection Watch
Automated benchmark generator for scalable software vulnerability detection datasets.
Software Security Mar 18
ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation Watch
ConGA provides a framework for gender annotation in machine translation to reduce bias and improve accuracy.
NLP Bias Mitigation Mar 18
Noise-Aware Misclassification Attack Detection in Collaborative DNN Inference Watch
A noise-aware anomaly detection framework for securing collaborative DNN inference against misclassification attacks.
Security in AI Mar 18
Pretrained Multilingual Transformers Reveal Quantitative Distance Between Human Languages Watch
A method leveraging pretrained multilingual models to quantitatively measure language distance for linguistic analysis.
NLP Mar 18
Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs Watch
A framework for analyzing and optimizing privacy in AI agents using differential privacy techniques.
Differential Privacy Mar 18
How do LLMs Compute Verbal Confidence Watch
A study revealing how LLMs compute verbal confidence, enhancing our understanding of model uncertainty.
LLM Calibration Mar 18
Text-to-Stage: Spatial Layouts from Long-form Narratives Watch
Automating stage-play layout generation from long-form narratives using advanced language models.
NLP Applications Mar 18
TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models Watch
TINA is a novel attack method that uncovers erased concepts in text-to-image diffusion models by leveraging visual-only probes.
Adversarial Attacks Mar 18
Dropout Robustness and Cognitive Profiling of Transformer Models via Stochastic Inference Watch
This research benchmarks dropout robustness in transformer models, providing insights for selecting models in uncertainty-aware applications.
Transformer Robustness Mar 18
Concept-to-Pixel: Prompt-Free Universal Medical Image Segmentation Watch
Develop a universal, prompt-free AI for medical image segmentation.
Medical Imaging AI Mar 18
Embedding World Knowledge into Tabular Models: Towards Best Practices for Embedding Pipeline Design Watch
A systematic approach to designing effective LLM-based embedding pipelines for tabular prediction.
Tabular Data Embeddings Mar 18
Adaptive Guidance for Retrieval-Augmented Masked Diffusion Models Watch
Adaptive Retrieval-Augmented Masked Diffusion enhances QA performance by dynamically calibrating guidance based on retrieved context reliability.
Retrieval-Augmented Generation Mar 18
DeepCORO-CLIP: A Multi-View Foundation Model for Comprehensive Coronary Angiography Video-Text Analysis and External Validation Watch
AI system for analyzing coronary angiography videos with video-text context understanding.
Healthcare AI Mar 18
AgentVLN: Towards Agentic Vision-and-Language Navigation Watch
Develop an agentic vision-and-language navigation tool for enhanced autonomous systems.
Vision-and-Language Mar 18 Pending
AURORA Model of Formant-to-Tongue Inversion for Didactic and Clinical Applications Watch
AURORA is a model that predicts tongue displacement in vowel sounds, serving as a didactic tool and biofeedback application.
Speech Technology Mar 18
Informative Semi-Factuals for XAI: The Elaborated Explanations that People Prefer Watch
A novel algorithm for generating informative semi-factual explanations in XAI that enhances user understanding of automated decisions.
Explainable AI Mar 18
Detecting the Machine: A Comprehensive Benchmark of AI-Generated Text Detectors Across Architectures, Domains, and Adversarial Conditions Watch
A comprehensive benchmark for evaluating AI-generated text detectors across various models and conditions.
AI Detection Mar 18
TRiMS: Real-Time Tracking of Minimal Sufficient Length for Efficient Reasoning via RL Watch
TRiMS optimizes reasoning efficiency in language models by minimizing token usage while maintaining accuracy.
NLP Mar 18
Physics-informed Deep Mixture-of-Koopmans Vehicle Dynamics Model with Dual-branch Encoder for Distributed Electric-drive Trucks Watch
A data-driven vehicle dynamics modeling method for distributed electric-drive trucks using Koopman operator theory.
Vehicle Dynamics Mar 18
PJB: A Reasoning-Aware Benchmark for Person-Job Retrieval Watch
PJB is a reasoning-aware benchmark designed to enhance person-job retrieval systems by diagnosing performance failures.
Recruitment AI Mar 18
A Single-Fiber Optical Frequency Domain Reflectometry (OFDR)-Based Shape Sensing of Concentric Tube Steerable Drilling Robots Ignore
A novel shape-sensing method for steerable drilling robots using Optical Frequency Domain Reflectometry.
Robotics Mar 18
Gender Disambiguation in Machine Translation: Diagnostic Evaluation in Decoder-Only Architectures Ignore
A novel framework for evaluating and mitigating gender bias in machine translation models.
NLP Bias Evaluation Mar 18
AI-Assisted Goal Setting Improves Goal Progress Through Social Accountability Ignore
An AI career coach that enhances goal progress through social accountability.
AI Coaching Mar 18
Differential Attention-Augmented BiomedCLIP with Asymmetric Focal Optimization for Imbalanced Multi-Label Video Capsule Endoscopy Classification Ignore
A novel framework for multi-label classification in video capsule endoscopy that tackles class imbalance using advanced attention mechanisms and optimization strategies.
Medical AI Mar 18 Pending
RHYME-XT: A Neural Operator for Spatiotemporal Control Systems Ignore
RHYME-XT is a neural operator framework for efficient surrogate modeling of spatiotemporal control systems.
Neural Operators Mar 18
Symmetry-Reduced Physics-Informed Learning of Tensegrity Dynamics Ignore
SymPINN leverages symmetry in tensegrity structures to enhance the efficiency and accuracy of physics-informed neural networks for dynamic predictions.
Physics-Informed Learning Mar 18
Huddle: Parallel Shape Assembly using Decentralized, Minimalistic Robots Ignore
A decentralized algorithm for efficient shape assembly using minimalistic robots.
Robotics Mar 18
On Securing the Software Development Lifecycle in IoT RISC-V Trusted Execution Environments Ignore
A toolkit for enhancing the software development lifecycle in RISC-V Trusted Execution Environments for IoT and automotive applications.
IoT Security Mar 18
Machine Learning for Network Attacks Classification and Statistical Evaluation of Adversarial Learning Methodologies for Synthetic Data Generation Ignore
A unified multi-modal dataset and machine learning approach for effective network intrusion detection and synthetic data generation.
Network Security Mar 18
Eye image segmentation using visual and concept prompts with Segment Anything Model 3 (SAM3) Ignore
A comparative study of eye image segmentation using the Segment Anything Model 3 with visual and concept prompts.
Medical AI Mar 18
WeatherReasonSeg: A Benchmark for Weather-Aware Reasoning Segmentation in Visual Language Models Ignore
WeatherReasonSeg benchmarks the performance of vision-language models under adverse weather conditions to enhance robust reasoning segmentation.
Visual Language Models Mar 18
Illumination-Aware Contactless Fingerprint Spoof Detection via Paired Flash-Non-Flash Imaging Ignore
A novel approach to enhance contactless fingerprint spoof detection using paired flash-non-flash imaging.
Biometric Security Mar 18 Pending
Automated Grammar-based Algebraic Multigrid Design With Evolutionary Algorithms Ignore
This paper presents a novel approach to optimize multigrid methods using evolutionary algorithms and genetic programming.
Optimization Algorithms Mar 18
rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks Ignore
rSDNet offers a robust neural learning framework to combat label noise and adversarial attacks in classification tasks.
Robust Neural Learning Mar 18
Complementary Reinforcement Learning Ignore
Complementary RL enhances sample efficiency in LLM-based agents by optimizing experience extraction alongside policy learning.
Reinforcement Learning Mar 18
Temporal Narrative Monitoring in Dynamic Information Environments Ignore
A framework for monitoring evolving narratives in crisis information environments using semantic embeddings and clustering.
Crisis Information Management Mar 18
AdaMuS: Adaptive Multi-view Sparsity Learning for Dimensionally Unbalanced Data Ignore
AdaMuS is a framework that addresses dimensional imbalances in multi-view learning through adaptive sparsity and self-supervised learning.
Multi-view Learning Mar 18
End-to-end data-driven prediction of urban airflow and pollutant dispersion Ignore
A data-driven framework for predicting urban airflow and pollutant dispersion to aid decision-makers in environmental management.
Urban Airflow Prediction Mar 18
Identifying Latent Actions and Dynamics from Offline Data via Demonstrator Diversity Ignore
A method to recover latent actions and dynamics from offline trajectories using demonstrator diversity.
Reinforcement Learning Mar 18
In Trust We Survive: Emergent Trust Learning Ignore
Emergent Trust Learning enhances AI agents' cooperation in competitive environments through a lightweight trust-based control algorithm.
Agents Mar 18
Conditional Inverse Learning of Time-Varying Reproduction Numbers Inference Ignore
A framework for estimating time-varying reproduction numbers in infectious disease surveillance using conditional inverse learning.
Epidemiological Modeling Mar 18
Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models Ignore
This research explores the impact of video-based fine-tuning on multimodal large language models, revealing trade-offs in visual performance.
Multimodal Learning Mar 18
Translation Invariance of Neural Operators for the FitzHugh-Nagumo Model Ignore
This study benchmarks Neural Operators for modeling excitable cell dynamics in the FitzHugh-Nagumo model.
Neural Operators Mar 18
Bringing Network Coding into Multi-Robot Systems: Interplay Study for Autonomous Systems over Wireless Communications Ignore
This research proposes adaptive network coding to enhance communication reliability in multi-robot systems operating over wireless channels.
Multi-Robot Systems Mar 18
DDH-based schemes for multi-party Function Secret Sharing Ignore
A novel DDH-based technique for reducing key sizes in multi-party Function Secret Sharing schemes.
Cryptography Mar 18
Mutually Causal Semantic Distillation Network for Zero-Shot Learning Ignore
MSDN++ enhances zero-shot learning by distilling intrinsic semantic representations through mutually causal attention.
Zero-Shot Learning Mar 18
Causal Representation Learning on High-Dimensional Data: Benchmarks, Reproducibility, and Evaluation Metrics Ignore
A framework for improving causal representation learning through enhanced dataset evaluation and reproducibility.
Causal Representation Learning Mar 18
SafeTutors: Benchmarking Pedagogical Safety in AI Tutoring Systems Ignore
SafeTutors benchmarks the pedagogical safety of AI tutoring systems to enhance learning outcomes.
Educational AI Mar 18
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs Ignore
A token pruning technique that improves the efficiency of vision-language models on video tasks.
Video VLM Optimization Mar 18
Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding Ignore
Motion-MLLM enhances spatial reasoning in 3D scenes using egomotion data for improved efficiency.
3D Scene Understanding Mar 18
LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition Ignore
LaDe enables the creation of fully editable layered design documents from natural language prompts.
Media Generation Mar 18
Unified Policy Value Decomposition for Rapid Adaptation Ignore
A framework for rapid adaptation in reinforcement learning using shared low-dimensional goal embeddings.
Reinforcement Learning Mar 18
CARE: Covariance-Aware and Rank-Enhanced Decomposition for Enabling Multi-Head Latent Attention Ignore
CARE enhances multi-head latent attention conversion for improved inference efficiency.
NLP Optimization Mar 18
RoboForge: Physically Optimized Text-guided Whole-Body Locomotion for Humanoids Ignore
A unified framework for optimizing text-guided locomotion in humanoid robots through physics-based motion generation.
Humanoid Robotics Mar 18
Only relative ranks matter in weight-clustered large language models Ignore
A novel approach to compress large language models by focusing on the relative rank of weights.
LLM Compression Mar 18
A Noise Sensitivity Exponent Controls Large Statistical-to-Computational Gaps in Single- and Multi-Index Models Ignore
This research identifies a Noise Sensitivity Exponent that links noise robustness and computational hardness in high-dimensional learning models.
Statistical Learning Theory Mar 18
SoK: From Silicon to Netlist and Beyond - Two Decades of Hardware Reverse Engineering Research Ignore
A comprehensive analysis of hardware reverse engineering research to enhance security practices.
Hardware Security Mar 18
Physics-Aware Machine Learning for Seismic and Volcanic Signal Interpretation Ignore
A machine learning approach to enhance the reliability of seismic and volcanic monitoring through improved signal interpretation.
Seismic Monitoring Mar 18
ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation Ignore
ChopGrad introduces a memory-efficient method for fine-tuning video diffusion models using truncated backpropagation.
Video Generation Mar 18
Governed Memory: A Production Architecture for Multi-Agent Workflows Ignore
A novel architecture to enhance multi-agent workflows with governed memory.
AI Infrastructure Mar 18
ResNet-50 with Class Reweighting and Anatomy-Guided Temporal Decoding for Gastrointestinal Video Analysis Ignore
A gastrointestinal video analysis pipeline using ResNet-50 for multi-label classification.
Medical AI Mar 18
Modeling Overlapped Speech with Shuffles Ignore
A novel algorithm for single-pass alignment of multi-talker recordings using finite-state automata.
Speech Processing Mar 18
Facial Movement Dynamics Reveal Workload During Complex Multitasking Ignore
A novel approach to monitor cognitive workload using facial movement dynamics from standard webcams.
Cognitive Load Monitoring Mar 18
From Virtual Environments to Real-World Trials: Emerging Trends in Autonomous Driving Ignore
A comprehensive review of synthetic data and simulation technologies for advancing autonomous driving.
Autonomous Driving Mar 18
Objective Mispricing Detection for Shortlisting Undervalued Football Players via Market Dynamics and News Signals Ignore
A framework for identifying undervalued football players using market dynamics and news signals.
Sports Analytics Mar 18
REAL: Robust Extreme Agility via Spatio-Temporal Policy Learning and Physics-Guided Filtering Ignore
REAL is an advanced framework for reliable legged parkour that adapts to sensory corruption.
Robotics Mar 18
Benchmarking Reinforcement Learning via Stochastic Converse Optimality: Generating Systems with Known Optimal Policies Ignore
A rigorous benchmarking framework for evaluating Reinforcement Learning algorithms through controlled environments.
Reinforcement Learning Mar 18
LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation Ignore
LoGSAM is a parameter-efficient framework for MRI tumor segmentation using radiologist dictation as prompts.
Medical AI Mar 18
Learning Coordinate-based Convolutional Kernels for Continuous SE(3) Equivariant and Efficient Point Cloud Analysis Ignore
ECKConv introduces a novel kernel architecture for efficient SE(3) equivariant learning in point cloud tasks.
Point Cloud Analysis Mar 18
Anisotropic Permeability Tensor Prediction from Porous Media Microstructure via Physics-Informed Progressive Transfer Learning with Hybrid CNN-Transformer Ignore
A physics-informed deep learning framework for predicting permeability tensors from porous media microstructure images.
Physics-Informed Learning Mar 18
Learning When to Attend: Conditional Memory Access for Long-Context LLMs Ignore
L2A introduces a conditional memory access layer for long-context language models, optimizing attention usage and improving training efficiency.
LLM Training Mar 18
Auto-Unrolled Proximal Gradient Descent: An AutoML Approach to Interpretable Waveform Optimization Ignore
An AutoML approach that optimizes wireless beamforming and waveforms using interpretable deep unfolding techniques.
AutoML Optimization Mar 18 Pending
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression Ignore
ZipServ optimizes LLM inference through innovative lossless compression techniques tailored for GPU architectures.
LLM Inference Optimization Mar 18
Proactive Knowledge Inquiry in Doctor-Patient Dialogue: Stateful Extraction, Belief Updating, and Path-Aware Action Planning Ignore
A framework for proactive knowledge inquiry in doctor-patient dialogues to enhance EMR generation.
Medical AI Mar 18
Large-Scale 3D Ground-Motion Synthesis with Physics-Inspired Latent Operator Flow Matching Ignore
GMFlow is a physics-inspired framework for rapid generation of realistic ground-motion time histories for earthquake hazard analysis.
Earthquake Hazard Analysis Mar 18
Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis Ignore
This paper explores a generative model for 3D avatar sign language motion using phonological attributes.
Sign Language Generation Mar 18
Cohomological Obstructions to Global Counterfactuals: A Sheaf-Theoretic Foundation for Generative Causal Models Ignore
A novel framework for generative causal models that addresses global counterfactuals using sheaf theory.
Causal Inference Mar 18
Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift Ignore
This paper analyzes jailbreak behavior in vision-language models and proposes a defense method to enhance safety.
VLM Safety Mar 18
ShapleyLaw: A Game-Theoretic Approach to Multilingual Scaling Laws Ignore
ShapleyLaw introduces a game-theoretic approach to optimize multilingual pretraining by quantifying cross-lingual transfer effects.
Multilingual NLP Mar 18
Multi-Armed Sequential Hypothesis Testing by Betting Ignore
A theoretical framework for optimizing multi-armed sequential hypothesis testing through betting strategies.
Statistical Testing Mar 18
SpiderCam: Low-Power Snapshot Depth from Differential Defocus Ignore
SpiderCam is a low-power FPGA-based camera that captures real-time depth maps using differential defocus.
Computer Vision Mar 18
Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs Ignore
This paper presents a theoretical framework for optimizing Markov decision processes using linear operators.
Reinforcement Learning Mar 18
Video Understanding: From Geometry and Semantics to Unified Models Ignore
This survey provides a comprehensive overview of video understanding models and their evolution.
Video Understanding Mar 18
Steering Video Diffusion Transformers with Massive Activations Ignore
This paper explores a method to enhance video generation quality using Massive Activations in video diffusion transformers.
Video Generation Mar 18
Discovering Decoupled Functional Modules in Large Language Models Ignore
A framework for discovering functional modules in Large Language Models to enhance interpretability.
LLM Interpretability Mar 18 Pending
Federated Distributional Reinforcement Learning with Distributional Critic Regularization Ignore
A novel approach to federated reinforcement learning that enhances safety by preserving distributional information.
Federated Learning Mar 18
Attention Sinks Induce Gradient Sinks Ignore
This paper explores the relationship between attention sinks and gradient sinks in Transformer models during training.
NLP Mar 18
Data Obfuscation for Secure Use of Classical Values in Quantum Computation Ignore
This paper presents a novel data obfuscation technique for securing classical values in quantum computation.
Quantum Security Mar 18
Do Language Models Encode Semantic Relations? Probing and Sparse Feature Analysis Ignore
This research investigates how large language models encode semantic relationships through probing and interpretability techniques.
NLP Research Mar 18
One-Step Sampler for Boltzmann Distributions via Drifting Ignore
A novel framework for efficient sampling from Boltzmann distributions using a one-step neural generator.
Sampling Methods Mar 18
Gaussian Process Limit Reveals Structural Benefits of Graph Transformers Ignore
This paper theoretically analyzes the structural benefits of graph transformers over traditional graph convolutional networks.
Graph Transformers Mar 18
Consistency of the k-Nearest Neighbor Regressor under Complex Survey Designs Ignore
This paper explores the consistency of the k-nearest neighbor regressor in complex survey designs, addressing a gap in existing literature.
Statistical Learning Mar 18
Per-Domain Generalizing Policies: On Learning Efficient and Robust Q-Value Functions (Extended Version with Technical Appendix) Ignore
This paper explores a novel approach to learning Q-value functions for efficient planning in reinforcement learning.
Reinforcement Learning Mar 18
PCA-Based Interpretable Knowledge Representation and Analysis of Geometric Design Parameters Ignore
This paper explores the limitations of PCA in estimating design parameters for CAD applications.
Geometric Design Optimization Mar 18
Mirror Descent on Riemannian Manifolds Ignore
This paper presents a generalized Mirror Descent method for optimization on Riemannian manifolds.
Optimization Algorithms Mar 18
UniSem: Generalizable Semantic 3D Reconstruction from Sparse Unposed Images Ignore
UniSem enhances 3D reconstruction accuracy and semantic generalization from sparse images using innovative error-aware techniques.
3D Reconstruction Mar 18
Proof-of-Authorship for Diffusion-based AI Generated Content Ignore
A framework for proving authorship of AI-generated content using cryptographic methods.
Intellectual Property in AI Mar 18
Humans and transformer LMs: Abstraction drives language learning Ignore
This paper explores how transformer LMs learn linguistic categories, shedding light on language acquisition processes.
NLP Mar 18
FACE-net: Factual Calibration and Emotion Augmentation for Retrieval-enhanced Emotional Video Captioning Ignore
FACE-net enhances emotional video captioning by calibrating factual content and augmenting emotional cues.
Video Captioning Mar 18
When Only the Final Text Survives: Implicit Execution Tracing for Multi-Agent Attribution Ignore
A framework for token-level attribution in multi-agent language systems to enhance accountability.
Multi-Agent Systems Mar 18
SHIFT: Motion Alignment in Video Diffusion Models with Adversarial Hybrid Fine-Tuning Ignore
SHIFT introduces a novel fine-tuning framework to enhance motion fidelity in video diffusion models.
Video Diffusion Models Mar 18
From Digital Twins to World Models: Opportunities, Challenges, and Applications for Mobile Edge General Intelligence Ignore
This paper surveys the transition from digital twins to world models for enhancing edge general intelligence in 6G networks.
Edge AI Mar 18
Bootstrapping Coding Agents: The Specification Is the Program Ignore
A theoretical exploration of self-bootstrapping coding agents based on specifications.
Agents Mar 18
The Causal Uncertainty Principle: Manifold Tearing and the Topological Limits of Counterfactual Interventions Ignore
A theoretical exploration of the limits of causal interventions in generative models.
Causal Inference Mar 18
Variational Kernel Design for Internal Noise: Gaussian Chaos Noise, Representation Compatibility, and Reliable Deep Learning Ignore
A theoretical framework for understanding and designing internal noise mechanisms in deep learning.
Deep Learning Noise Mechanisms Mar 18