

Submitted by Style Pass, 2024-06-30 11:00:03

Language models can generate text, but when a precise output format is required, they do not always follow instructions. Various prompt engineering techniques have been introduced to improve the robustness of the generated text, but they are not always sufficient. This project solves the problem by filtering the tokens that the language model is allowed to generate at every timestep, ensuring that the output format is respected while minimizing the constraints placed on the language model.
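The filtering idea above can be illustrated with a toy sketch (this is NOT the library's actual API; all names, the character-level "vocabulary", and the format rule are invented for illustration): at each decoding step a format rule reports which tokens are currently legal, the model's candidate scores are filtered to that set, and the best legal token is emitted.

```python
# Toy sketch of constrained decoding (hypothetical code, not the library's
# API). The "vocabulary" is single characters and the required format is a
# quoted two-digit string such as "42".

def allowed_tokens(prefix: str) -> set[str]:
    """Hypothetical format rule: output must match '"DD"' (two digits in quotes)."""
    if len(prefix) == 0:
        return {'"'}                 # must open with a quote
    if len(prefix) in (1, 2):
        return set("0123456789")     # exactly two digits
    if len(prefix) == 3:
        return {'"'}                 # must close the quote
    return set()                     # format complete: nothing more is legal

def fake_model(prefix: str) -> dict[str, float]:
    """Stand-in for a language model: scores every token, preferring letters."""
    prefs = {"a": 1.0, "b": 1.0, "c": 1.0, '"': 0.5, "4": 0.9, "2": 0.8}
    return {t: prefs.get(t, 0.1) for t in set('abc0123456789"')}

def constrained_decode(model, max_len: int = 8) -> str:
    """Greedy decoding, restricted at every step to format-legal tokens."""
    out = ""
    for _ in range(max_len):
        legal = allowed_tokens(out)
        if not legal:
            break
        # Keep only the scores of tokens the format currently allows.
        scores = {t: s for t, s in model(out).items() if t in legal}
        out += max(scores, key=scores.get)
    return out

print(constrained_decode(fake_model))  # → "44"
```

Even though the fake model prefers letters, the filter guarantees a well-formed quoted number; the model only chooses among tokens the format permits.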

We created a Google Colab Notebook which contains a full example of how to use this library to enforce the output format of Llama 2, including interpreting the intermediate results. The notebook can run on a free GPU-backed runtime in Colab.

LM Format Enforcer is integrated into the vLLM inference server. vLLM includes an OpenAI compatible server with added capabilities that allow using LM Format Enforcer without writing custom inference code.
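As a sketch of the server-level setup (the model name is a placeholder and the flag spelling follows recent vLLM releases; verify against your installed version's `--help` output), the OpenAI-compatible server can be started with LM Format Enforcer selected as the default guided decoding backend:

```shell
# Hypothetical launch command: start vLLM's OpenAI-compatible server with
# LM Format Enforcer as the default guided-decoding backend. Model name
# and flag names assume a recent vLLM release.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --guided-decoding-backend lm-format-enforcer
```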

Alternatively, it can be enabled on a per-request basis by adding the guided_decoding_backend parameter to the request, together with the guided decoding parameters:
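For example, a request body might look like the following (a hypothetical sketch: the model name and JSON schema are placeholders, and the parameter names follow recent vLLM releases):

```json
{
  "model": "meta-llama/Llama-2-7b-chat-hf",
  "messages": [
    {"role": "user", "content": "Give the name and age of a fictional person as JSON."}
  ],
  "guided_json": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "age": {"type": "integer"}
    },
    "required": ["name", "age"]
  },
  "guided_decoding_backend": "lm-format-enforcer"
}
```

Here guided_json constrains the completion to match the given JSON schema, and guided_decoding_backend selects LM Format Enforcer for this request only.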
