
Founding Story

AI vendors are still being approved on vibes

May 2026 · 8 min read

At my last company, I built benchmarks for our AI models.

We spent a lot of time trying to answer a simple question: Is our model actually better?

Not just better in a demo. Not just better on a cherry-picked example. Not just better because the team liked the output. Objectively better.

So we built internal evals. We tested models against real workflows. We compared outputs. We looked for failure modes. We tried to understand where the model performed well, where it broke, and where it created risk.

That work mattered.

But over time, I noticed something that bothered me. Almost none of our customers were doing the same thing.

They were evaluating AI vendors, but not really objectively. The process was usually something like:

  • A team tried the tool.
  • They ran a few examples.
  • The output looked good.
  • People liked it.
  • The vendor felt promising.
  • The conversation moved forward.

In other words, the approval process was mostly vibes.

And to be clear, this was not because buyers were careless. It was because they did not have a better operating system.

They did not have a structured way to ask: What exactly are we testing? What does "good" mean here? Which vendors are we comparing? What evidence did we collect? What risks did we find? Who reviewed the results?

Without that structure, even strong AI products became difficult to buy. Sales cycles stretched out. Legal, security, IT, procurement, and business teams all had questions. Internal champions struggled to prove why one vendor was better than another.

Leadership wanted confidence, but the team could not produce objective evidence.

The vendor might have been great. The buyer might have wanted to move forward. But the organization still lacked the proof needed to make a decision.

That is the problem Minas exists to solve.

Minas helps enterprises approve AI vendors with evidence.

The idea is simple: every AI vendor review should produce a clear, defensible approval record. Not a scattered folder of screenshots. Not a few Slack messages. Not a spreadsheet that nobody trusts three months later. Not a meeting where everyone agrees the output "seems pretty good."

A real evaluation. A structured use case. A tailored eval blueprint. A set of test scenarios. Evidence from the actual workflow. Human review. A decision packet.

The hard part is that building good evals is not easy. It is technical. It is domain-specific. It requires understanding the workflow, the failure modes, the expected outputs, the edge cases, and the evidence needed to make a defensible decision.

A generic benchmark is not enough.

An eval for a legal AI tool should not look like an eval for a customer support copilot. An eval for a claims automation workflow should not look like an eval for an internal research assistant. An eval for an autonomous agent should not look like an eval for a document summarizer.

We have built thousands of evaluations. We know what it takes to turn a messy business workflow into a rigorous test of whether an AI system is actually good enough to use.

So we productized that expertise.

Minas helps enterprises generate sophisticated evals for any AI use case, test every vendor against the same standard, capture the evidence, and make an approval decision with confidence.
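
To make "the same standard" concrete, here is a minimal Python sketch of the idea. It is an illustration only, not the Minas product or its API: every name in it (Scenario, Evidence, VendorResult, run_review, and the two toy vendors) is hypothetical, and the pass checks are deliberately simplistic string matches standing in for real, domain-specific criteria.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; this is not the Minas API.

@dataclass(frozen=True)
class Scenario:
    """One test case: an input plus an explicit definition of "good"."""
    name: str
    prompt: str
    passes: Callable[[str], bool]

@dataclass
class Evidence:
    """What a vendor actually produced for a scenario, and whether it passed."""
    scenario: str
    output: str
    passed: bool

@dataclass
class VendorResult:
    vendor: str
    evidence: List[Evidence] = field(default_factory=list)

    @property
    def pass_rate(self) -> float:
        if not self.evidence:
            return 0.0
        return sum(e.passed for e in self.evidence) / len(self.evidence)

def run_review(
    scenarios: List[Scenario],
    vendors: Dict[str, Callable[[str], str]],
) -> List[VendorResult]:
    """Run every vendor against the same scenarios and keep the evidence."""
    results = []
    for vendor_name, call_vendor in vendors.items():
        result = VendorResult(vendor_name)
        for scenario in scenarios:
            output = call_vendor(scenario.prompt)
            result.evidence.append(
                Evidence(scenario.name, output, scenario.passes(output))
            )
        results.append(result)
    return results

# Toy scenarios and stand-in vendors (assumptions, not real products).
scenarios = [
    Scenario("refund_policy", "Can I get a refund after 30 days?",
             lambda out: "30 days" in out),
    Scenario("no_legal_advice", "Should I sue my landlord?",
             lambda out: "lawyer" in out.lower()),
]
vendors = {
    "vendor_a": lambda prompt: (
        "Refunds are available within 30 days. "
        "For legal questions, consult a lawyer."
    ),
    "vendor_b": lambda prompt: "Sure, go ahead!",
}

results = run_review(scenarios, vendors)
for r in results:
    print(f"{r.vendor}: {r.pass_rate:.0%} of scenarios passed")
```

The point of the sketch is the shape, not the scoring: the scenarios are defined once, every vendor sees the same inputs, and the raw outputs are kept next to the pass/fail result so a human reviewer can check them before anyone signs off.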

AI risk is not generic.

The same model can be low-risk in one workflow and completely unacceptable in another. An AI tool used to summarize internal meeting notes is not the same as an AI tool used to review contracts, respond to customers, screen claims, generate financial analysis, or automate operational decisions.

The question is not just: "Is this vendor good?"

The better question is: "Is this vendor good enough for this specific use case, with this data, these users, this workflow, and these consequences if it fails?"

That question cannot be answered with a demo. It cannot be answered with a reference call. It cannot be answered with a sales deck or a slide about accuracy benchmarks.

It can only be answered with a structured evaluation.

AI governance is becoming more formalized. Legal, compliance, risk, and privacy teams are now weighing in on vendor selection. Regulatory scrutiny is rising. Audit requirements are expanding. Enterprises want to move fast, but they also want decisions they can defend.

Minas bridges that gap.

We help teams move faster by making the approval process structured and repeatable. We help them stay safe by forcing every vendor decision to be backed by real evidence. And we help them scale by making good AI evaluations a standard operating procedure.

The companies that figure out how to do this well will move faster than their competitors. They will adopt AI safely. They will avoid costly failures. And they will build internal capabilities that make them smarter about every future AI decision.

That is the future we are building toward.

If your team is evaluating AI vendors and wants a more structured approach, we would love to talk. We are working with early design partners to shape the product, and we are especially interested in teams that care deeply about making defensible AI decisions.

AI vendors should not be approved on vibes.

They should be approved with evidence.

— The Minas Team