Modelbench Tutorial Texture

Run safety benchmarks against AI models and view detailed reports showing how well they performed.

The current public practice benchmark uses LlamaGuard to evaluate the safety of responses. For now you will need a Together AI account to use it. For 1.0, we test models on a variety of services; if ...
