Panza is an automated email assistant customized to your writing style and past email history.
Its main features are as follows: For most email cli

Search code, repositories, users, issues, pull requests...

submited by

Style Pass

2024-05-02 21:30:03

Panza is an automated email assistant customized to your writing style and past email history. Its main features are as follows:

For most email clients, it is possible to download a user's past emails in a machine-friendly .mbox format. For example, GMail allows you to do this via Google Takeout, whereas Thunderbird allows one to do this via various plugins.

One key part of Panza is a dataset-generation technique we call data playback: Given some of your past emails in .mbox format, we automatically create a training set for Panza by using a pretrained LLM to summarize the emails in instruction form; each email becomes a (synthetic instruction, real email) pair. Given a dataset consisting of all pairs, we use these pairs to "play back" your sent emails: the LLM receives only the instruction, and has to generate the "ground truth" email as a training target.

We then use parameter-efficient finetuning to train the LLM on this dataset, locally. We found that we get the best results with the RoSA method, which combines low-rank (LoRA) and sparse finetuning. If parameter efficiency is not a concern, that is, you have a more powerful GPU, then regular, full-rank/full-parameter finetuning can also be used. We find that a moderate amount of further training strikes the right balance between matching the writer's style without memorizing irrelevant details in past emails.