Standard vs Custom AI Benchmark

The AI Benchmark: The Most Important Clause You’ve Never Used (Part 2)

In Part 1 of this post, we discussed why artificial intelligence (AI) benchmark testing belongs in every contract you negotiate involving AI, why benchmarking is important for every kind of AI system, ...

What AI benchmarks miss about real-world performance

Enterprise AI teams have spent years solving for compute, securing GPU allocations, negotiating cloud capacity, and ...

TechCrunch

A new AI benchmark tests whether chatbots protect human well-being

AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human well-being or just maximize for engagement. A ...

Hosted on MSN

Exclusive: National AI project faces fairness criticism over Naver's custom benchmarks

The government’s first evaluation of the “National Representative AI” has sparked controversy over fairness after it was revealed that the assessment included not only a common benchmark but also ...

Network World

Cisco research finds standard AI safety benchmarks miss the real threat

Enterprises deploying closed AI models have generally relied on published safety benchmarks to assess risk before procurement and deployment decisions. New research from Cisco’s AI Threat Intelligence ...

VentureBeat

The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI

There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction following ...

Hosted on MSN

Magic Hour benchmark sets new AI video quality standard

The Magic Hour AI Video Benchmark, launched in April 2026, offers a standardized framework to evaluate generative video tools on metrics like prompt adherence, scene stability, and lip-sync accuracy.

MIT Technology Review

AI benchmarks are broken. Here’s what we need instead.

One-off tests don’t measure AI’s true impact. We’re better off shifting to more human-centered, context-specific methods. For decades, artificial intelligence has been evaluated through the question ...
