AI readiness

How do I know which model to pick for my use case? Weigh four factors: intelligence, speed/cost, open vs. closed source, and vibe.

Intelligence

The standard way to measure LLM intelligence is benchmarks. A good place to get the most up-to-date benchmark numbers for different LLMs is Artificial Analysis: https://artificialanalysis.ai/#intelligence.

The “Intelligence Index” is derived from a combination of benchmarks. The relative rankings give a rough estimate of how models compare on overall intelligence.

You can also look at individual benchmark rankings. For example:

GDPval evaluates AI models on real-world, economically valuable tasks across a wide range of occupations.
Omniscience measures factual recall and hallucination across various economically relevant domains.

Use these benchmarks as guidance to pick the top-performing model for your domain/use case.

Speed and cost

More intelligent models are generally more expensive and slower. So if your use case requires, e.g., very fast responses, the slowest models are probably off the table, however smart they are.
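One way to turn this trade-off into a decision is to treat speed and cost as hard limits and then take the most intelligent model that fits. Here is a minimal sketch of that idea in Python. The model names, scores, speeds, and prices are made-up placeholders (not real benchmark numbers), and `Model` and `pick_model` are hypothetical helpers written for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    intelligence: float            # benchmark index score, higher is better
    tokens_per_second: float       # output speed, higher is faster
    usd_per_million_tokens: float  # price, lower is cheaper


def pick_model(
    models: list[Model],
    min_tokens_per_second: float,
    max_usd_per_million_tokens: float,
) -> Optional[Model]:
    """Return the most intelligent model that meets the speed and cost limits."""
    eligible = [
        m
        for m in models
        if m.tokens_per_second >= min_tokens_per_second
        and m.usd_per_million_tokens <= max_usd_per_million_tokens
    ]
    if not eligible:
        return None  # nothing fits: relax a limit or revisit the use case
    return max(eligible, key=lambda m: m.intelligence)


# Placeholder data for illustration only; look up real numbers before deciding.
CANDIDATES = [
    Model("big-slow", intelligence=70.0, tokens_per_second=40.0, usd_per_million_tokens=30.0),
    Model("mid", intelligence=55.0, tokens_per_second=120.0, usd_per_million_tokens=5.0),
    Model("small-fast", intelligence=40.0, tokens_per_second=300.0, usd_per_million_tokens=0.5),
]

choice = pick_model(CANDIDATES, min_tokens_per_second=100.0, max_usd_per_million_tokens=10.0)
print(choice.name if choice else "no model fits")
```

With these placeholder numbers, the smartest model is ruled out for being too slow and too pricey, so the middle one wins. Loosen the limits and the pick changes, which is the whole point: decide your limits first, then maximize intelligence.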

See LLM pricing comparisons and speed measurements.

(Artificial Analysis is pretty useful eh?)

Note: if you are using, e.g., ChatGPT, cost is usually not a concern, since it's a flat fee per month (all-you-can-eat style TOKENMAXX) no matter how much you use. That said, some products, e.g. Claude Chat, limit how many times you can use their most intelligent models (Opus).

Open vs. closed source

See Open vs Closed LLMs. If there's a hard requirement, e.g. only use open source models, welp, you gotta follow that.

"Vibe"

If all other factors are equal, honestly, just pick a model, test it, and get a "vibe" of whether you like it or not. That’s a valid reason. For example, some people simply prefer the communication style of Claude models (although if this is for a company project, you probably want to find a more concrete reason...).