Near-Instant Full-File Edits

Submitted by
Style Pass
2024-05-16 04:30:06

This is a weakness visible in coding agents. Accurately editing hundreds of lines can take multiple model calls, at times trapping the agent in an infinite loop. Even small, isolated edits are plagued with bugs:

In Cursor, the planning phase takes the form of a chat interface with a powerful frontier model. Applying the change to the current file should be straightforward and instant.

Figure 2: A toy example of a change we want to "apply". It cannot easily be copy/pasted since it sketches out the change over a full class.

Our fast-apply model surpasses both GPT-4 and GPT-4o and pushes the Pareto frontier on the accuracy/latency curve. We achieve speeds of >1000 tokens/s (just under 4000 char/s) on our 70B model using a speculative-decoding variant tailored for code edits, called speculative edits.

This means a ~13x speedup over vanilla inference using Llama-3-70b and a ~9x speedup over our previous GPT-4 speculative edits deployment.
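The idea behind using the original file as a draft can be illustrated with a toy sketch. This is not the deployed algorithm, whose details are not given here; it only assumes that most of an edited file is unchanged, so chunks of the original can be proposed as draft tokens and checked by the model in a single pass. The names `speculative_edit`, `next_token`, `chunk`, and the realignment heuristic are all hypothetical, and the stand-in "model" simply replays a known target.

```python
from typing import Callable, List, Tuple

EOS = "<eos>"


def speculative_edit(
    original: List[str],
    next_token: Callable[[List[str]], str],
    chunk: int = 8,
) -> Tuple[List[str], int]:
    """Toy speculative edit: draft tokens come from the original file.

    `next_token(prefix)` stands in for a greedy model (hypothetical).
    Returns the generated tokens and the number of verification passes.
    """
    out: List[str] = []
    cursor = 0   # where the next draft chunk starts in `original`
    passes = 0
    while True:
        draft = original[cursor:cursor + chunk]
        passes += 1
        matched = 0
        last = EOS
        # A real system would score every draft position in one forward
        # pass; this simulation asks the stand-in model token by token.
        for i in range(len(draft) + 1):
            last = next_token(out)
            if last == EOS:
                return out, passes
            out.append(last)
            if i < len(draft) and last == draft[i]:
                matched += 1   # draft token accepted
            else:
                break          # first mismatch, or the extra token after a full match
        cursor += matched
        # Naive realignment (an assumption): if the last emitted token occurs
        # later in the original, resume drafting right after it. Otherwise
        # treat it as an insertion and keep the cursor where it is.
        try:
            cursor = original.index(last, cursor) + 1
        except ValueError:
            pass


# Usage: the "model" below just replays a known edited version.
original = "def f ( x ) : return x + 1".split()
target = "def f ( x , y ) : return x + y".split()


def replay(prefix: List[str]) -> str:
    return target[len(prefix)] if len(prefix) < len(target) else EOS


out, passes = speculative_edit(original, replay, chunk=4)
print(out == target, passes, "passes for", len(target), "tokens")
```

The output always equals what the model would have generated on its own, because every emitted token comes from `next_token`; the draft only determines how many tokens are confirmed per pass. A poor realignment guess costs speed, not correctness.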
