AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact


83 Episodes

  1. Constitutional AI: Harmlessness from AI Feedback (Published: 19.7.2024)
  2. Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback (Published: 19.7.2024)
  3. Illustrating Reinforcement Learning from Human Feedback (RLHF) (Published: 19.7.2024)
  4. Deep Double Descent (Published: 17.6.2024)
  5. Chinchilla’s Wild Implications (Published: 17.6.2024)
  6. Eliciting Latent Knowledge (Published: 17.6.2024)
  7. Intro to Brain-Like-AGI Safety (Published: 17.6.2024)
  8. Toy Models of Superposition (Published: 17.6.2024)
  9. Low-Stakes Alignment (Published: 17.6.2024)
  10. ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation (Published: 17.6.2024)
  11. Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions (Published: 17.6.2024)
  12. Gradient Hacking: Definitions and Examples (Published: 17.6.2024)
  13. Imitative Generalisation (AKA ‘Learning the Prior’) (Published: 17.6.2024)
  14. Discovering Latent Knowledge in Language Models Without Supervision (Published: 17.6.2024)
  15. Least-To-Most Prompting Enables Complex Reasoning in Large Language Models (Published: 17.6.2024)
  16. An Investigation of Model-Free Planning (Published: 17.6.2024)
  17. Empirical Findings Generalize Surprisingly Far (Published: 17.6.2024)
  18. Compute Trends Across Three Eras of Machine Learning (Published: 13.6.2024)
  19. Worst-Case Thinking in AI Alignment (Published: 29.5.2024)
  20. How to Get Feedback (Published: 12.5.2024)


Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment
