Evaluating LLM Benchmarks for React


I previously wrote about writing React code with the Deepseek-coder 33B model, and whether we could improve some of its shortcomings with the latest research in the LLM space.

So in this post, I'm going to evaluate existing benchmarks that specifically measure LLM coding capabilities.

The first framework worth looking at is OpenAI Evals. However, they don't accept "custom code" evals, meaning only simple match types (Exact, Includes, Fuzzy Match) are possible test evaluations to run.
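To make this concrete, here is roughly what one of these simple match evals looks like in the Evals registry. The eval name `react-hooks` and the file paths below are hypothetical, but the YAML shape and the built-in `Match` class come from the openai/evals repo:

```yaml
# registry/evals/react-hooks.yaml (hypothetical name and path)
react-hooks:
  id: react-hooks.dev.v0
  description: Checks that sampled answers exactly match the expected output.
  metrics: [accuracy]

react-hooks.dev.v0:
  # Built-in exact-match eval; Includes and FuzzyMatch classes also exist.
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: react-hooks/samples.jsonl
```

Each line of the samples file is a JSON object with an `input` chat prompt and an `ideal` answer, and that string comparison is all these evals can grade against. That clearly isn't enough to judge whether a generated React component actually works.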

Even though OpenAI doesn't accept custom-code evals, it's worth noting that we can simply fork the repo and write our own.

The framework allows you to build a custom eval as well as a custom completion function, and it comes with a nice cookbook tutorial.
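As a sketch of what that could look like, the cookbook pattern is to subclass `evals.Eval`, implement `eval_sample` and `run`, and record a match per sample. Everything React-specific below is a placeholder of mine (the class name, and the naive substring check standing in for a real render/test harness), and the recorder API details may vary between versions of the repo:

```python
import random

import evals
import evals.metrics
import evals.record


class ReactComponentEval(evals.Eval):
    """Hypothetical custom eval: prompt for a React snippet and grade it."""

    def eval_sample(self, sample, rng: random.Random):
        del rng  # unused; grading here is deterministic

        # completion_fn can be the OpenAI API or any custom completion
        # function registered with the framework.
        result = self.completion_fn(prompt=sample["input"])
        sampled = result.get_completions()[0]

        # Placeholder grading: a real React benchmark would compile,
        # render, and run tests against the sampled code instead.
        correct = sample["ideal"].strip() in sampled
        evals.record.record_match(
            correct, expected=sample["ideal"], sampled=sampled
        )

    def run(self, recorder):
        samples = self.get_samples()
        self.eval_all_samples(recorder, samples)
        events = recorder.get_events("match")
        return {"accuracy": evals.metrics.get_accuracy(events)}
```

Point a registry YAML entry's `class:` at this module and you can run it with the `oaieval` CLI like any built-in eval.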

👍 - This could work for building a React benchmark. It might be a bit hard to get off the ground, though, and may limit customization.

The APPS benchmark contains 10,000 code generation problems of varying difficulty. It covers simple introductory problems, interview-level problems, and coding competition challenges.
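A minimal sketch of loading it for inspection, assuming the `codeparrot/apps` mirror on the Hugging Face Hub and its `question` / `difficulty` fields:

```python
from datasets import load_dataset

# Assumed dataset id: codeparrot/apps (a Hub mirror of the APPS benchmark).
# It ships a loading script, hence trust_remote_code.
apps = load_dataset("codeparrot/apps", split="test", trust_remote_code=True)

# Problems are tagged introductory / interview / competition and pair a
# natural-language spec with reference solutions and test cases.
sample = apps[0]
print(sample["difficulty"])
print(sample["question"][:300])
```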
