Reviewing Post-Training Techniques from Recent Open LLMs

Submitted by
Style Pass
2025-01-07 16:30:03

Whenever a new technical report is released for an open LLM, I like to give it a skim to see if there are any novel post-training techniques, as that's what I've been working on lately. When these techniques are used in large-scale models available to the public, that's more convincing to me than when they're demonstrated in a standalone paper or a small-scale model. This post is a roundup of some of the techniques I've seen in recent reports, with a brief overview of how each one works.

Unfortunately, none of these model reports contain ablation metrics for the techniques reviewed. At their scale, running those ablations may have been prohibitively expensive, but it leaves open the question of how effective each technique is in isolation [1]. I'll also be skipping details not related to post-training, so this won't be a full paper review; for more in-depth reviews in that vein, I'd suggest checking Sebastian Raschka's blog.

Detailed in section 4 of the Phi-4 paper, pivotal token search (PTS) is a way of building preference data for DPO. It accounts for the fact that, in verifiable use cases such as code generation or math reasoning, a few specific tokens in a completion can derail the model from a correct answer. The paper proposes that for each token you can estimate how much it changes the conditional probability that the whole completion succeeds. Applied across the entire completion, this surfaces a handful of 'pivotal' tokens that radically increase or decrease the probability of success. The authors also note a weakness of applying DPO to whole completions: a token with low probability in the chosen completion can still contribute positively to the loss (that is, be reinforced), even when that token itself massively reduces the likelihood of success.
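To make the per-token idea concrete, here is a minimal sketch of how pivotal tokens could be located. It is not the procedure from the Phi-4 report: it does a simple left-to-right scan rather than any smarter search, and the function name `find_pivotal_tokens`, the `p_success` callback, the `threshold` value, and the toy probabilities are all assumptions made up for illustration. In a real setup, `p_success(prefix)` would presumably be estimated by sampling many continuations of the prefix from the model and scoring them with a verifier.

```python
from typing import Callable, List, Sequence, Tuple


def find_pivotal_tokens(
    tokens: Sequence[str],
    p_success: Callable[[Sequence[str]], float],
    threshold: float = 0.2,  # hypothetical cutoff, not a value from the paper
) -> List[Tuple[int, str, float]]:
    """Return (index, token, delta) for every token whose inclusion shifts
    the estimated success probability by at least `threshold`."""
    pivotal = []
    prev = p_success(tokens[:0])  # estimate before any completion token
    for i, tok in enumerate(tokens):
        cur = p_success(tokens[: i + 1])  # estimate after appending this token
        delta = cur - prev
        if abs(delta) >= threshold:
            pivotal.append((i, tok, delta))
        prev = cur
    return pivotal


# Toy stand-in for a sampled estimate: success probability by prefix length.
# These numbers are invented for the example.
TOY_PROBS = [0.50, 0.52, 0.50, 0.12, 0.15, 0.14]


def toy_p_success(prefix: Sequence[str]) -> float:
    return TOY_PROBS[len(prefix)]


completion = ["x", "=", "(a", "+", "b)"]
pivots = find_pivotal_tokens(completion, toy_p_success)
for index, token, delta in pivots:
    print(f"token {index} {token!r} shifts p(success) by {delta:+.2f}")
```

In this toy run only the third token crosses the threshold, and it does so in the negative direction. A preference pair could then be built around that single position, with the shared prefix as the prompt and two alternative next tokens as the chosen and rejected sides, so that the DPO signal is concentrated on the token that actually matters.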
