203 Episodes

  1. Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

    Published: 12 May 2025
  2. Leaked Claude Sonnet 3.7 System Instruction tuning

    Published: 12 May 2025
  3. Converging Predictions with Shared Information

    Published: 11 May 2025
  4. Test-Time Alignment Via Hypothesis Reweighting

    Published: 11 May 2025
  5. Rethinking Diverse Human Preference Learning through Principal Component Analysis

    Published: 11 May 2025
  6. Active Statistical Inference

    Published: 10 May 2025
  7. Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

    Published: 10 May 2025
  8. AI-Powered Bayesian Inference

    Published: 10 May 2025
  9. Can Unconfident LLM Annotations Be Used for Confident Conclusions?

    Published: 9 May 2025
  10. Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI

    Published: 9 May 2025
  11. Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

    Published: 9 May 2025
  12. How to Evaluate Reward Models for RLHF

    Published: 9 May 2025
  13. LLMs as Judges: Survey of Evaluation Methods

    Published: 9 May 2025
  14. The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs

    Published: 9 May 2025
  15. Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data

    Published: 9 May 2025
  16. Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

    Published: 9 May 2025
  17. Accelerating Unbiased LLM Evaluation via Synthetic Feedback

    Published: 9 May 2025
  18. Prediction-Powered Statistical Inference Framework

    Published: 9 May 2025
  19. Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

    Published: 9 May 2025
  20. RM-R1: Reward Modeling as Reasoning

    Published: 9 May 2025

Men know other men best. Women know other women best. And yes, perhaps AIs know other AIs best. AI explains what you should know about this week's AI research progress.
