AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact


83 Episodes

  1. Yudkowsky Contra Christiano on AI Takeoff Speeds

    Published: 13.5.2023
  2. Why AI Alignment Could Be Hard With Modern Deep Learning

    Published: 13.5.2023
  3. Least-To-Most Prompting Enables Complex Reasoning in Large Language Models

    Published: 13.5.2023
  4. Measuring Progress on Scalable Oversight for Large Language Models

    Published: 13.5.2023
  5. Supervising Strong Learners by Amplifying Weak Experts

    Published: 13.5.2023
  6. Summarizing Books With Human Feedback

    Published: 13.5.2023
  7. Robust Feature-Level Adversaries Are Interpretability Tools

    Published: 13.5.2023
  8. Debate Update: Obfuscated Arguments Problem

    Published: 13.5.2023
  9. High-Stakes Alignment via Adversarial Training [Redwood Research Report]

    Published: 13.5.2023
  10. AI Safety via Debate

    Published: 13.5.2023
  11. Takeaways From Our Robust Injury Classifier Project [Redwood Research]

    Published: 13.5.2023
  12. Introduction to Logical Decision Theory for Computer Scientists

    Published: 13.5.2023
  13. Red Teaming Language Models With Language Models

    Published: 13.5.2023
  14. Toy Models of Superposition

    Published: 13.5.2023
  15. Understanding Intermediate Layers Using Linear Classifier Probes

    Published: 13.5.2023
  16. Acquisition of Chess Knowledge in AlphaZero

    Published: 13.5.2023
  17. Feature Visualization

    Published: 13.5.2023
  18. Discovering Latent Knowledge in Language Models Without Supervision

    Published: 13.5.2023
  19. Progress on Causal Influence Diagrams

    Published: 13.5.2023
  20. Careers in Alignment

    Published: 13.5.2023


Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment
