“Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap” by Molly Hickman

EA Forum Podcast (All audio) - A podcast by the EA Forum Team

In Q4 we ran the second tournament in the AI Benchmarking Series, which aims to assess how the best bots compare to the best humans on real-world forecasting questions, like those found on Metaculus. Over the quarter, 44 bots competed for $30,000 on 402 questions, with a team of ten Pros serving as a human benchmark on 122 of those 402.

We found that:

Metaculus Pro Forecasters were better than the top bot “team” (a team of one, this quarter), but not with statistical significance (p = 0.079), using log scoring with a weighted t-test.

Top bot performance improved relative to the Pro benchmark to a -8.9 head-to-head score in Q4 2024, compared to a -11.3 head-to-head score in Q3 2024, although this improvement is not statistically significant. (A higher score indicates greater relative accuracy. A score of 0 corresponds to equal accuracy.)

These main results [...]

---

Outline:

(03:55) Selecting a Bot Team
(08:17) Comparing the Bots to Pros
(12:26) Calibration
(14:08) Discrimination
(15:45) Comparing Individual Bots to the Pros
(16:19) Metaculus Bots
(17:56) The best bot: pgodzinai
(21:04) Trends in Winners
(23:29) Discussion
(25:12) Can you make a winning bot?

The original text contained 6 images which were described by AI.

---

First published: February 19th, 2025

Source: https://forum.effectivealtruism.org/posts/TG2zCDCozMcDLgoJ5/metaculus-q4-ai-benchmarking-bots-are-closing-the-gap

---

Narrated by TYPE III AUDIO.
