Claude is a filthy cheater #programming #coding #softwareengineer #ai #artificialintelligence
@thecodingsloth
Transcript
Cheating is bad, and this AI model cheated. "Splash news, yo!" Anthropic just dropped Claude Opus 4.8, and it's now ranked the number one AI model for programming. Except not really. The programming benchmarks for AI models are kind of broken. They're contaminated, too easy, and unreliable. The answers are public, and the automated graders they were using were wrong on about 32% of the trials. The last Claude model literally cheated on a benchmark by running git log to get the answers. If the old model cheated, the new one probably did too. So a startup called Data Curve built a new benchmark that models can't cheat on, and now all of a sudden, ChatGPT is better than this new Claude model. Hmm. If you want more news like this, you can check out my free newsletter.