203 Episodes

  1. Test-Time RL: Self-Evolving LLMs via Majority Voting Rewards

    Published: 25.4.2025
  2. Tina: Tiny LoRA Reasoning Models

    Published: 25.4.2025
  3. Evaluating large language models in theory of mind tasks

    Published: 25.4.2025
  4. QUEST: Quality Sampling for Machine Translation

    Published: 24.4.2025
  5. Offline Preference Learning via Simulated Trajectory Feedback

    Published: 24.4.2025
  6. Reasoning Elicitation in Language Models via Counterfactual Feedback

    Published: 24.4.2025
  7. Eliciting Human Preferences with Language Models

    Published: 24.4.2025
  8. Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

    Published: 24.4.2025
  9. γ-Bench: Evaluating LLMs in Multi-Agent Games

    Published: 24.4.2025
  10. DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement

    Published: 24.4.2025
  11. Optimal Prediction Sets for Enhanced Human-AI Accuracy

    Published: 24.4.2025
  12. Self-Correction via Reinforcement Learning for Language Models

    Published: 24.4.2025
  13. Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

    Published: 24.4.2025
  14. Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

    Published: 24.4.2025
  15. Iterative Nash Policy Optimization for Language Model Alignment

    Published: 24.4.2025
  16. SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine

    Published: 23.4.2025
  17. Stack AI: Democratizing Enterprise AI Development

    Published: 22.4.2025
  18. Evaluating Modern Recommender Systems: Challenges and Future Directions

    Published: 22.4.2025
  19. AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI

    Published: 22.4.2025
  20. Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Published: 21.4.2025


Men know other men best. Women know other women best. And yes, perhaps AIs know other AIs best. AI explains what you should know about this week's AI research progress.
