Papers: 250
With code: 0
Suggested Build
Suggested Watch
154
Mock watchlist output. This is what your daily delivery layer would look like.
ExpressMind: A Multimodal Pretrained Large Language Model for Expressway Operation
Needs sharper wedge before committing
Saved thesis
Find deployable AI papers with public code, a proof pass, and a wedge that can ship inside 6 weeks.
Novelty / saturation by cluster
Rare clusters = higher novelty potential
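One way to read "rare clusters = higher novelty potential" is as an inverse-frequency score over cluster labels. The sketch below is only an illustration under that assumption: the function name `novelty_by_cluster`, the scoring formula, and the sample labels are all hypothetical and are not taken from the dashboard.

```python
from collections import Counter


def novelty_by_cluster(cluster_labels):
    """Score each cluster by rarity: 1 - (cluster size / total papers).

    Assumed formula for illustration; the dashboard's real scoring is unknown.
    """
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return {cluster: 1 - n / total for cluster, n in counts.items()}


# Hypothetical cluster labels, one per paper (not real dashboard data).
labels = ["robotics", "robotics", "robotics", "llm", "llm", "cryptography"]

scores = novelty_by_cluster(labels)

# Rarest clusters first, i.e. highest assumed novelty potential.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these sample labels the single-paper cluster ranks first and the most crowded cluster ranks last, which matches the heuristic stated above.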
ExpressMind is a pioneering multimodal AI solution optimizing expressway operations through advanced reasoning and real-time decision-making.
Boost lightweight 3D hand reconstruction on mobile and VR devices with Fast-HaMeR.
Build cutting-edge NLP models for Pashto using the largest available Pashto language corpus, PashtoCorp.
Transform aerial imagery into seamless, topology-consistent vector maps with ACPV-Net.
OmniSONAR offers an unprecedented omnilingual cross-modal embedding solution for multilingual translation and search applications.
A lightweight approach to enable efficient reasoning in small LLMs for mobile devices using LoRA adapters and reinforcement learning.
SegviGen repurposes 3D generative models for efficient part segmentation with minimal training data.
MessyKitchens offers a novel dataset and advanced methods for accurate 3D scene reconstruction in cluttered environments.
SparkVSR offers an interactive framework for video super-resolution that allows users to control output quality through keyframe selection.
MolmoBot enables effective zero-shot manipulation in robotics using large-scale simulated data.
DreamPlan enhances Vision-Language Models for robotic manipulation through efficient reinforcement fine-tuning using video world models.
BRICKSIM offers a real-time simulator for realistic robotic manipulation of interlocking brick assemblies, integrating effortlessly with robotic workflows.
GIST is a novel graph transformer architecture that achieves scalable, gauge-invariant learning for graph-structured data.
Automate professional slide deck creation using LLMs and a novel inverse specification reward system.
DexGrasp-Zero enables zero-shot dexterous grasping across diverse robotic hands using a novel morphology-aligned policy.
InCoder-32B is a specialized code foundation model designed to enhance programming tasks in industrial scenarios.
CIRCLES enhances vision-language models by using counterfactual examples for improved in-context learning and causal reasoning.
CritiSense boosts digital literacy to combat misinformation through a multilingual mobile app with interactive challenges.
Fast-WAM optimizes embodied control by eliminating test-time future imagination while maintaining competitive performance.
A fine-tuned language model that automates the evaluation of research pitches, enhancing decision-making in scientific publishing.
Omanic provides a structured approach to evaluate multi-hop reasoning in large language models through detailed annotations and a challenging benchmark.
HeBA adapts Vision-Language Models efficiently with innovative architectural biases for enhanced downstream task performance.
BUSSARD leverages normalizing flows for efficient and robust anomaly detection in scene graphs.
BATQuant optimizes quantization for multi-modal large language models, achieving state-of-the-art performance while minimizing outlier impact.
REFORGE is a black-box red-teaming framework that enhances the robustness of image generation model unlearning against adversarial attacks.
A deep learning model predicts cancer cell fate from raw video data, enhancing treatment strategies with explainable insights.
DynHD offers a novel approach to detect hallucinations in diffusion large language models by analyzing token-level uncertainty and denoising dynamics.
DISCOVER is a model-agnostic solver that enhances distributional counterfactual explanations for non-differentiable models in tabular data.
Transform natural language trading intents into executable option strategies using a novel query language and LLMs.
A self-supervised cross-modal approach for efficient plankton recognition using minimal labeled data.
HGP-Mamba integrates histology and generated protein features for advanced cancer survival risk prediction.
PlotTwist is a framework that empowers small language models to generate high-quality plots efficiently.
Fanar 2.0 is a sovereign Arabic generative AI platform that delivers advanced language and multimodal capabilities.
DermaFlux generates synthetic skin lesion images to enhance classification accuracy in dermatology.
Rotated Robustness offers a training-free defense against bit-flip attacks on Large Language Models, ensuring reliability and accuracy.
A lightweight, real-time underwater image enhancement framework that restores color accuracy for underwater missions.
A study revealing the hidden critique ability in Large Reasoning Models to enhance error detection and self-correction.
Omnilingual MT offers high-quality machine translation for over 1,600 languages, significantly expanding multilingual capabilities.
VisBrowse-Bench is a benchmark for evaluating visual reasoning in multimodal browsing agents.
WorldCam revolutionizes interactive gaming by using camera pose for precise action control and long-term 3D consistency.
ManiTwin automates the generation of 3D digital assets for scalable robotic manipulation data.
Chronos enhances conversational AI with structured temporal memory for improved long-term interaction.
SocialOmni is a benchmark for evaluating audio-visual social interactivity in omni-modal large language models.
SOMA unifies diverse parametric human body models for seamless reconstruction and animation.
FedAOT is a novel defense mechanism for Byzantine-robust federated learning that dynamically weights client updates to enhance model resilience against adversarial attacks.
M³ enhances monocular SLAM with precise pose estimation and dynamic area suppression for superior scene reconstruction.
A deep reinforcement learning framework for optimizing edge offloading in latency-sensitive XR applications.
SurgΣ offers a comprehensive multimodal data foundation for enhancing surgical intelligence across diverse clinical tasks.
CABTO automates the construction of reliable behavior tree systems for robot manipulation using large models and contextual feedback.
RaDAR enhances recommendation systems by addressing data sparsity and noise through innovative graph contrastive learning techniques.
A novel approach using adaptive moment estimation to enhance guided diffusion sampling for image restoration and generation.
V-Co enhances visual representation alignment in generative models through effective co-denoising techniques.
SpokenUS is a spoken user simulator designed to enhance task-oriented dialogue agents with realistic user behaviors.
IOSVLM is a 3D vision-language model that enhances dental diagnosis using intraoral scans for improved clinical outcomes.
TraceR1 enhances multimodal AI agents with anticipatory reasoning for improved planning and execution.
A novel algorithm for contextual bandits that adapts to non-stationary environments, enhancing recommendation systems.
Thermopneumatic pixels provide low-voltage, rapid tactile feedback for interactive devices.
A tool for probing cultural biases in large language models through author profiling from song lyrics.
A novel method for generating high-fidelity textile patterns from clothing images using a semi-supervised latent diffusion model.
A novel label-free 3D perception system for self-driving cars leveraging city infrastructure as unsupervised teachers.
A foundation model for advanced EEG decoding using a novel Gaussian-smoothed masking scheme.
A study on how mental health disclosure impacts the safety of personalized LLM agents in task completion.
Federated learning models for predicting major postoperative complications using multicenter data while preserving patient privacy.
An IoT-based framework for real-time monitoring of student emotions to enhance classroom engagement.
Search2Motion offers a training-free solution for precise object-level motion control in video generation.
RARRL optimizes embodied robotic decision-making by adaptively managing reasoning and action execution to enhance efficiency and reliability.
x²-Fusion unifies multimodal data for superior 2D and 3D motion estimation.
Kinema4D is a 4D generative robotic simulator that enhances robot-world interaction modeling for embodied AI.
Kestrel is a training-free framework that mitigates hallucinations in large vision-language models through visual grounding and self-refinement.
A novel spectral property-driven data augmentation technique for enhancing hyperspectral image classification robustness.
StyleExpert is a semantic-aware framework for diverse image stylization using a Mixture of Experts architecture.
FlowComposer enhances compositional zero-shot learning by explicitly fusing visual features with text embeddings for improved generalization.
A revolutionary soft robotic arm that achieves high-performance control without sacrificing compliance.
A framework for generating reliable textual explanations for face recognition decisions using MLLMs.
TCATSeg is a novel framework for accurate semantic segmentation of 3D dental models, enhancing digital dentistry applications.
A teleoperation-based framework for efficient data augmentation in robotic grasping using fingertip contact-aware sampling.
Proxy-GRM enhances reward modeling by generating transferable rubrics verified through proxy agents, reducing training data needs and maintaining performance.
FSMC-Pose revolutionizes cattle farm management by automating estrus detection using advanced pose estimation.
A scalable Mixed Integer Linear Programming solution for optimizing robot inspection paths in various applications.
V-DyKnow is a benchmark for evaluating and improving time-sensitive knowledge in Vision-Language Models.
Face2Scene leverages facial degradation to enhance full-scene image restoration using a novel two-stage framework.
A method to detect and mitigate object hallucinations in large vision-language models using segmentation-based attention entropy.
EmoLLM enhances dialogue by integrating emotional intelligence with cognitive reasoning for improved user interactions.
CompDiff enhances fairness and zero-shot generalization in medical image synthesis, enabling high-quality intersectional demographics from limited data.
ASCENT is a lightweight transformer model for real-time aircraft trajectory prediction to enhance aviation safety.
SAMSEM is a robust tool for segmenting metal lines in SEM images to ensure the integrity of integrated circuits.
Customizing smaller language models for domain-specific text-to-code generation using synthetic datasets.
AdaMem is an adaptive memory framework designed to enhance long-horizon dialogue agents with user-centric understanding.
GAP-MLLM enhances 3D spatial perception in multimodal large language models through geometry-aligned pre-training.
TinyGLASS is a lightweight, real-time anomaly detection system tailored for resource-constrained edge platforms.
TRUST-SQL revolutionizes Text-to-SQL parsing by enabling agents to dynamically identify relevant database schemas without pre-loaded metadata.
A novel pipeline for removing raindrops and reflections from images using a diffusion-based framework.
CD-FKD enhances object detection robustness by leveraging cross-domain feature knowledge distillation.
IRIS provides a comprehensive benchmark for unsupervised physical parameter estimation from real-world video data.
SlideFormer enables efficient fine-tuning of large language models on a single GPU, democratizing access to advanced AI capabilities.
SF-Mamba rethinks the scan operation for vision to enhance computational efficiency and performance in visual tasks.
IndexRAG transforms multi-hop question answering by enabling offline indexing for cross-document reasoning.
RECOVER is an agentic correction framework that enhances entity recognition in ASR by leveraging multiple hypotheses and LLM correction.
A real-time control pipeline for maritime cranes that suppresses payload sway using MuJoCo-based model predictive control.
A method to control fish schools using virtual agents trained with reinforcement learning.
SemTok is a novel semantic one-dimensional tokenizer that enhances image reconstruction and generation through compact token representation.
InViC enhances medical visual question answering by integrating intent-aware visual cues into large language models.
FederatedFactory revolutionizes federated learning by enabling generative one-shot learning for non-IID distributed scenarios.
A high-fidelity monocular depth estimation framework that balances speed and quality for remote sensing applications.
Automated identification of Ichneumonoidea wasps using a YOLO-based deep learning framework for biodiversity assessment.
A framework for robust 3D human pose estimation from LiDAR point clouds leveraging human-object interactions.
PKINet-v2 is an advanced backbone for remote sensing object detection that efficiently combines multiple kernel types for superior accuracy and speed.
DriveFix is a multi-view restoration framework that ensures spatio-temporal coherence for driving scenes in autonomous driving applications.
Micro-AU CLIP enhances micro-expression detection by modeling local independence and global dependency of action units.
PyPhonPlan is an open-source toolkit for simulating phonetic planning and speech dynamics using dynamic neural fields.
A novel framework for improving Spoken Question Answering by grounding evidence through attention mechanisms in SpeechLLMs.
A framework that mitigates hallucinations in Large Vision-Language Models by applying targeted feature steering based on layer relevance.
A competitive reinforcement learning solution for intercepting agile drones using trained policies.
A physics-integrated neural framework for efficient long-horizon prediction of fluid flows near solid boundaries.
GenZ-LIO is a robust LiDAR-inertial odometry framework that adapts to both indoor and outdoor environments for autonomous navigation.
VIGOR enhances video generation by using a geometry-based reward model for improved consistency and robustness.
MG-Grasp is a depth-free 6-DoF grasping framework that enhances robotic manipulation using sparse RGB observations.
A brain-controlled exoskeleton that enables precise start-stop movements for rehabilitation therapy using EEG signals.
TurnWise introduces a benchmark and data pipeline to enhance multi-turn conversation capabilities in language models.
A framework for automated microscopy that actively discovers new behaviors in target spaces using deep-kernel learning.
HMAR is an adaptive medical image retrieval framework that enhances clinical diagnosis through precise lesion-region retrieval.
A novel approach to enhance reasoning in masked discrete diffusion models by enabling self-correction through a learned Markov transition kernel.
A deep learning approach to automate brood cell detection in layer trap nests, reducing manual labeling effort and improving species classification.
A multi-robot framework for coordinated oil-spill cleanup using autonomous surface vehicles.
A novel computer vision approach leveraging the collinearity principle to enhance defect detection in industrial applications.
A novel VAE-EM framework for automated calibration of electron microscopes, reducing estimation error and improving efficiency.
Kamino is a GPU-based physics solver enabling high-throughput simulations of complex robotic systems with challenging topologies.
Plaza6G is an Experiment-as-a-Service platform that simplifies AI-assisted trials in next-generation wireless networks.
SpikeCLR leverages self-supervised learning to enhance spiking neural networks for event-based vision in low-data environments.
The eAP dataset enhances visual perception in autonomous driving by leveraging event camera data for improved 3D vehicle detection and object time-to-contact estimation.
OGScene3D enables incremental open-vocabulary 3D scene understanding for robotic applications.
EverTale is a story world simulator that enables continuous character customization and integration for enhanced visual storytelling.
A novel traffic forecasting model that incorporates incident-aware spatio-temporal dynamics for improved accuracy.
This research addresses positional bias in Vision Transformers, enhancing their applicability in material science imaging.
This research proposes novel metrics for improving the reliability of RAG-based LLMs through conformal factuality filtering.
A digitalized pipeline for optimizing inventory forecasting and costs in supply chains.
pADAM is a unified generative framework for multi-physics learning that enables accurate inference and uncertainty quantification across diverse physical laws.
Ember is a serverless, peer-to-peer encrypted messaging system designed for decentralized communication over IPv6 mesh networks.
This research explores using linguistically related languages to enhance LLM translation in low-resource settings without extensive fine-tuning.
A novel Gaussian process model that enhances multi-class classification by utilizing Aitchison geometry for improved predictive probabilities.
Tarab is the largest open Arabic corpus of song lyrics and poetry, enabling comprehensive linguistic and cultural analysis.
A novel trajectory-optimized time reparameterization method enhances the learnability of reduced-order models for stiff dynamical systems.
A comprehensive security analysis tool for AI agent skills that reduces false positives in malicious behavior classification.
A model-agnostic tool to enhance representations of deep tabular models without altering their parameters.
DanceHA is a multi-agent framework that enhances document-level aspect-based sentiment analysis through collaborative AI.
SympFormer introduces accelerated attention blocks for faster convergence in NLP tasks.
VIEW2SPACE offers a novel benchmark for advancing multi-view visual reasoning through scalable data generation and evaluation.
UOT-Unlearn offers a novel framework for safe unlearning in one-step generative models using Unbalanced Optimal Transport.
A mobile manipulator system for autonomous inspection of cluttered pipe networks in hazardous environments.
EngGPT2 is an efficient Italian LLM designed for high-performance NLP tasks with reduced resource requirements.
A proof-of-concept study demonstrating persistent memory integration in frozen LLMs for enhanced conversational learning.
A novel approach to neural network initialization that leverages spectral data properties for improved convergence and interpretability.
A machine learning framework for personalized lung cancer treatment analysis using genetic data.
A framework for automated fault diagnostics in vehicles using advanced event sequence modeling and causal discovery.
A novel approach to understanding and enhancing reasoning in video generation models through emergent behaviors.
This research identifies critical anchor selection methods to enhance the reliability of LLM evaluations.
A systematic analysis of data-centric methods for identifying and isolating label noise in remote sensing datasets.
WildDepth is a multimodal dataset designed to enhance depth estimation and 3D reconstruction for wildlife perception.
A replay-driven validation methodology for CPU-GPU integration in chiplet architectures.
A framework for optimizing treatment plans using conservative stochastic control based on patient trajectory data.
A novel framework for recognizing true emotions from masked expressions using apex-frame classification.
SuCor offers a novel method for correcting geometric distortions in EPI imaging using optimal transport.
A generative AI model that selects statements reflecting common ground across diverse population preferences using a novel sampling-based algorithm.
A Bayesian model for predicting mental health symptoms from neural and behavioral data using Implicit Association Tests.
MedCL-Bench offers a standardized benchmark for evaluating continual learning in biomedical NLP models to prevent catastrophic forgetting.
A method for generating 3D-consistent worlds from video frames using non-rigid alignment techniques.
GeMA offers a novel framework for benchmarking complex systems using a latent manifold approach.
Leveraging large language models for advanced Arabic morphosyntactic tagging and dependency parsing.
SynthChain provides a synthetic benchmark and dataset for analyzing software supply chain attacks, enhancing detection capabilities.
This research explores how transformer models develop geometric representations for optimal prediction in constrained random walks.
vAccSOL optimizes AI vision workloads for mobile robots, enhancing performance and reducing power consumption.
Analyzing harmful interactions between users and LLM chatbots to mitigate psychological risks.
RecencyQA provides a dataset for improving question answering systems by categorizing questions based on how often their answers change.
A climbing robot that uses compliant pin-array grippers for stable locomotion on steep and rocky terrain.
A novel method for conservative offline robot policy learning that improves adaptation to heterogeneous datasets.
A novel relocalization framework that enhances pose refinement in 3D Gaussian Splatting by addressing pose and geometric uncertainties.
LIMBERO is a quadrupedal climbing robot designed for lunar and planetary exploration, capable of ascending steep rocky surfaces.
FEAT is a linear-complexity foundation model designed to efficiently handle extremely large structured data across various domains.
A curved-link tensegrity robot that enhances rolling locomotion while maintaining stability for space exploration.
DistriTTRL optimizes reward signals in Reinforcement Learning by leveraging model confidence distribution to enhance performance and mitigate reward hacking.
A novel dataset for high-frequency time series data to enhance time series foundation models.
This research evaluates the effectiveness of emotion understanding models in synthesized speech, highlighting significant gaps in current methodologies.
DST-Net enhances low-light images using a novel dual-stream transformer architecture for improved visibility.
A causal evaluation protocol to assess the faithfulness of LLMs to intermediate structures in decision-making.
A multi-agent reinforcement learning algorithm to enhance satellite communication by optimizing channel state information.
RetailBench is a benchmark for evaluating long-horizon decision-making of LLM agents in dynamic retail environments.
ProgressiveAvatars offers a dynamic 3D avatar representation that adapts to network conditions for real-time XR applications.
A multimodal benchmark for evaluating moral reasoning in AI systems using visual inputs.
This research proposes a shift in AI alignment strategies by emphasizing negative constraints over positive preferences for training large language models.
This paper critiques LLM benchmarking methods for Icelandic, highlighting flaws in synthetic data usage.
BADSEG uncovers vulnerabilities in semantic segmentation models to backdoor attacks, paving the way for improved defenses.
A novel calibration method for transforming DMSP nighttime light data into VIIRS format using contrastive learning.
A model for age prediction that enhances out-of-distribution generalization and mitigates bias through adversarial representation learning.
Lightweight deep learning-based intrusion detection systems enhance IoT network security against cyber threats.
NeSy-Route is a neuro-symbolic benchmark designed to enhance route planning capabilities in remote sensing applications.
A surrogate-assisted genetic programming approach to enhance decision-making in dynamic project scheduling.
Laya is an innovative EEG foundation model that enhances brain signal representation through latent predictive learning.
FG-SGL enhances micro-gesture recognition by integrating fine-grained and category-level semantics.
A framework for language models to improve continuously from real-world deployment experiences.
LEAFE enhances long-horizon agency in autonomous agents by internalizing recovery from reflective experiences.
This research explores prompt programming to enhance cultural alignment in large language models.
A low-cost, modular syringe pump system designed to enhance soft robotics applications.
An on-demand quadrupedal assistance robot system designed to enhance the independence of individuals with limited mobility.
SOMP is a scalable framework for recovering private training text from aggregated gradients in large language models.
A RADIUS-based framework for maintaining persistent device identity in network access control amidst MAC address randomization.
IQuest-Coder-V1 introduces a new family of code LLMs with a multi-stage training paradigm for enhanced code intelligence.
A learning-based controller for salamander robots that enables stable locomotion in complex environments.
This paper explores unsupervised reinforcement learning to enhance mathematical reasoning in large language models through intrinsic rewards.
A method for generating physically-based materials for 3D shapes using a video diffusion transformer.
A novel graph-based algorithm for precise segmentation of detonation cells from 3D pressure traces.
HyDRA enhances multimodal emotion recognition through a novel reasoning architecture that reconciles diverse emotional cues.
Evo-Retriever enhances multimodal document retrieval through LLM-guided curriculum evolution and viewpoint-pathway collaboration.
VQKV is a training-free method for high-fidelity KV cache compression using vector quantization.
A novel energy-safe iterative coupling method for real-time parallel simulation of port-Hamiltonian robotic systems.
A novel linear solution for near-light photometric stereo using symmetric light arrangements.
A modular framework for optimizing robot motion based on task ambiguity.
DynamicGate-MLP introduces a novel framework for efficient conditional computation in neural networks.
FactorEngine is a program-level framework for automated discovery of predictive signals in quantitative investment.
This study analyzes the impact of file-open hook points on the effectiveness of real-time backup systems against ransomware.
SseRex is a symbolic execution tool designed to detect vulnerabilities in Solana smart contracts.
ADAPT enhances humanoid locomotion in complex 3D environments through adaptive spatial sensing.
This paper explores stochastic resetting as a mechanism to accelerate policy convergence in reinforcement learning.
A novel method for estimating and testing conditional distributional treatment effects in statistical analysis.
This paper explores the complexities of Gaussian mean estimation under a contamination model, revealing trade-offs in sample size and runtime.
This research investigates the variability in results produced by AI coding agents in empirical research.
This paper explores the effects of quantizing optimizer states in LLM pre-training and proposes a method for effective state resets.
This paper explores how chain-of-thought reasoning affects uncertainty quantification in vision-language models.
This paper presents a novel Finsler metric for trajectory inference that integrates geometric and classification priors.
This paper explores optimal methods for matrix inversion updates in online outlier detection.
Exploring emergent learning behaviors in AI agent communities to enhance human-AI partnerships in education.
A proposed pipeline for developing norm-compliant reinforcement learning agents using argumentation-based supervision.
This paper proposes a novel integration of constraint propagation into dynamic programming for improved combinatorial problem solving.
This paper explores how reasoning in LLMs can both mitigate and mask sycophancy.
This paper explores AI's reasoning capabilities in analyzing unfolding geopolitical conflicts.
This paper proposes a method for aligning language models with target models through domain mixture design.
A framework for runtime governance of AI agents to manage compliance and risk.
This paper critiques the cognitive adequacy of transformers in modeling human sentence processing.
A novel unsupervised regularization scheme for autoencoders that aligns latent space distances with input data distances.
BenchPreS evaluates the context-aware application of user preferences in persistent-memory LLMs.
A framework for managing assistance allocation in LLM-enabled robots to address value disagreements.
This paper presents a new method for improving uncertainty bounds in multivariate kernel regression using Gaussian processes.
A theoretical framework for capability-aware compression in large language models.
HGFNet offers a novel approach to hyperspectral image classification by integrating 3D convolutional feature extraction with advanced frequency-domain filtering.
This research critiques the decentralisation paradox in digital identity systems.
Iris enhances monocular depth estimation by integrating real-world priors into a diffusion model.
This study explores the challenges of engaging users with a service robot in a real-world setting.
This research explores behavioral steering in a large language model using sparse autoencoders to enhance agentic traits.
This paper proposes a human-centred architecture for integrating cognitive assistants into quality management systems in manufacturing.
A comprehensive analysis of payment system designs for Central Bank Digital Currencies (CBDCs).
A novel CRT-based secret sharing scheme that aims to enhance security and flexibility in share sizes.