7 Comments
Adam M.J

Thanks, Karo. I have to say, your launch deep dives are some of the most interesting ones I read. You always help me understand what to really pay attention to.

Karo (Product with Attitude)

That makes me very happy, thank you so much, Adam! 🤗

ToxSec

Awesome stuff, Karo. I’m really liking the idea of fewer shortcuts and more honesty. I'll have to test more, but my initial impression is that they are heading in the right direction with this one.

Law

Really interesting. I'm not a dev, so excuse the question: do you know how many lines of code it originally was before the 750k, and how much it cost or what the token count was?

Karo (Product with Attitude)

Great question! No, I don't know. But I'll try to find out! 🤗

Dr Jim Polk

Thanks for this. I love using AI for a multitude of tasks, routine and creative.

But I don’t understand why we go so insane and give so much attention to these weekly benchmark improvements. They’re not life-changing or revolutionary. Just my take. Thanks.

Karo (Product with Attitude)

Thank you so much for taking the time to read and comment, Dr. Jim! 🤗 Yes, the benchmark theater can get ridiculous, hehe. Especially when benchmarks are cited without context.

But for builders, they're worth watching for two reasons:

1. Even small improvements can matter if they make previously annoying workflows reliable enough to use.

2. Those small jumps sometimes reveal a product shift underneath: less babysitting, better verification, etc.

And for me personally, benchmarks are useful because they give me a way to discuss critical AI literacy.

A benchmark is a receipt from one narrow test environment, so I always encourage everyone to run their own tests. That’s where it becomes interesting.