@rajistics: AI agents used to shut down mid-task or hallucinate vending empires. Now? They're beating humans at long-horizon business simulations. From 8% task success with GPT‑4o to 30%+ with Claude and Gemini, benchmarks like TheAgentCompany and Vending-Bench show agents aren't just smarter — they're starting to work.
TheAgentCompany Benchmark (CMU): https://arxiv.org/abs/2412.14161
Vending-Bench (Andon Labs): https://arxiv.org/abs/2502.15840
Project Vend (Anthropic): https://www.anthropic.com/research/project-vend-1
Claude/Gemini benchmark updates: https://x.com/andonlabs/status/1805322416206078341
Honestly, the more complexity there is, the worse these things get at the moment. It would be a good exercise to benchmark again later, once the technology improves.
2025-07-06 16:13:10
1
mon :
My code base gets ruined after using AI, even without any integration, just using web copy-paste and search advice…
2025-07-06 15:51:21
0
ArcaMutant :
For context, the Claudius bot also used Claude Opus 4, and it performed very badly in a real-world test. (It didn't have any extra features though; from what I can tell, it was a raw LLM with the ability to send emails.)
2025-07-08 23:33:27
0