

submitted by
Style Pass
2024-10-21 20:00:09

nanoGCG is a lightweight but full-featured implementation of the GCG (Greedy Coordinate Gradient) algorithm. This implementation can be used to optimize adversarial strings on causal Hugging Face models.

The GCG algorithm was introduced in Universal and Transferable Adversarial Attacks on Aligned Language Models [1] by Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, Zico Kolter, and Matt Fredrikson. nanoGCG implements the original algorithm and supports several modifications that can improve performance, including multi-position token swapping [2], a historical attack buffer [2][3], and the mellowmax loss function [4][5].
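For readers unfamiliar with mellowmax, the sketch below computes the standard mellowmax operator, mm_alpha(x) = (1/alpha) * log(mean(exp(alpha * x_i))), in plain Python. It is only an illustration of the operator itself; the function name, the `alpha` parameter name, and its default are assumptions here, and how nanoGCG applies mellowmax to its per-token losses is not shown.

```python
import math


def mellowmax(values, alpha=1.0):
    """Standard mellowmax: (1/alpha) * log(mean(exp(alpha * x_i))).

    Illustrative helper only (hypothetical name and defaults); this is
    not nanoGCG's own code.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    scaled = [alpha * v for v in values]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return (log_sum_exp - math.log(len(values))) / alpha
```

For positive `alpha`, the result lies between the mean and the maximum of the inputs: small values of `alpha` behave like an average, while large values approach the maximum.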

optim_str_init: str = "x x x x x x x x x x x x x x x x x x x x" - the starting point for the adversarial string that will be optimized

batch_size: int = None - can be used to manually specify how many of the search_width candidate sequences are evaluated at a time in a single GCG iteration
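To make the relationship between search_width and batch_size concrete, here is a minimal sketch of evaluating a pool of candidates in chunks. The helper name `iter_batches` is hypothetical, and the sketch assumes that leaving batch_size as None means all candidates are evaluated in a single batch; it is not taken from the nanoGCG source.

```python
def iter_batches(candidates, batch_size=None):
    """Yield consecutive slices of `candidates` of at most `batch_size` items.

    Hypothetical helper for illustration. Assumption: batch_size=None
    means "evaluate every candidate in one batch".
    """
    if batch_size is None:
        batch_size = len(candidates) or 1
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(candidates), batch_size):
        yield candidates[start:start + batch_size]


# Example: 10 candidate sequences evaluated 4 at a time.
candidates = list(range(10))
batch_sizes = [len(batch) for batch in iter_batches(candidates, batch_size=4)]
```

A smaller batch size lowers peak memory use at the cost of more forward passes per iteration, which is the usual reason to set it manually.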
