AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact


83 Episodes

  1. Constitutional AI: Harmlessness from AI Feedback (Published: 19.7.2024)
  2. Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (Published: 19.7.2024)
  3. Illustrating Reinforcement Learning from Human Feedback (RLHF) (Published: 19.7.2024)
  4. Deep Double Descent (Published: 17.6.2024)
  5. Chinchilla’s Wild Implications (Published: 17.6.2024)
  6. Eliciting Latent Knowledge (Published: 17.6.2024)
  7. Intro to Brain-Like-AGI Safety (Published: 17.6.2024)
  8. Toy Models of Superposition (Published: 17.6.2024)
  9. Low-Stakes Alignment (Published: 17.6.2024)
  10. ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation (Published: 17.6.2024)
  11. Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions (Published: 17.6.2024)
  12. Gradient Hacking: Definitions and Examples (Published: 17.6.2024)
  13. Imitative Generalisation (AKA ‘Learning the Prior’) (Published: 17.6.2024)
  14. Discovering Latent Knowledge in Language Models Without Supervision (Published: 17.6.2024)
  15. Least-To-Most Prompting Enables Complex Reasoning in Large Language Models (Published: 17.6.2024)
  16. An Investigation of Model-Free Planning (Published: 17.6.2024)
  17. Empirical Findings Generalize Surprisingly Far (Published: 17.6.2024)
  18. Compute Trends Across Three Eras of Machine Learning (Published: 13.6.2024)
  19. Worst-Case Thinking in AI Alignment (Published: 29.5.2024)
  20. How to Get Feedback (Published: 12.5.2024)


Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment
