LLM, PyTorch
Top papers
- Predictive Batch Scheduling: Accelerating Language Model Training Through Loss-Aware Sample Prioritization (5.0)
- Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models (4.0)
- Breaking the Overscaling Curse: Thinking Parallelism Before Parallel Thinking (4.0)
- Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models (3.0)