last_layer is a security library designed to protect LLM applications from prompt injection attacks, jailbreaks and exploits. It acts as a robust filt

Search code, repositories, users, issues, pull requests...

submited by
Style Pass
2024-04-03 11:00:05

last_layer is a security library designed to protect LLM applications from prompt injection attacks, jailbreaks and exploits. It acts as a robust filtering layer to scrutinize prompts before they are processed by LLMs, ensuring that only safe and appropriate content is allowed through.

Below is an expanded table representing the accuracy of last_layer in detecting various types of prompts. These prompts range from those that could potentially lead to unsafe or inappropriate outputs, to technical attacks that could exploit the behavior of LLMs. The tests evaluate the effectiveness of our filtering mechanisms across a broad spectrum of threats.

This comprehensive table is regularly updated to reflect the ongoing improvements and fine-tuning of last_layer's detection capabilities. We aim to maintain and improve the highest standards of safety

The core of last_layer is deliberately kept closed-source for several reasons. Foremost among these is the concern over reverse engineering. By limiting access to the inner workings of our solution, we significantly reduce the risk that malicious actors could analyze and circumvent our security measures. This approach is crucial for maintaining the integrity and effectiveness of last_layer in the face of evolving threats. Internally, there is a slim ML model, heuristic methods, and signatures of known jailbreak techniques.

Leave a Comment