LangExtract is a Python library that uses LLMs to extract structured information from unstructured text documents based on user-defined instructions. It processes materials such as clinical notes or reports, identifying and organizing key details while ensuring the extracted data corresponds to the source text.
Note: Using cloud-hosted models like Gemini requires an API key. See the API Key Setup section for instructions on how to get and configure your key.
First, create a prompt that clearly describes what you want to extract. Then, provide a high-quality example to guide the model.
Model Selection: gemini-2.5-flash is the recommended default, offering an excellent balance of speed, cost, and quality. For highly complex tasks requiring deeper reasoning, gemini-2.5-pro may provide superior results. For large-scale or production use, a Tier 2 Gemini quota is suggested to increase throughput and avoid rate limits. See the rate-limit documentation for details.
Model Lifecycle: Note that Gemini models have a lifecycle with defined retirement dates. Users should consult the official model version documentation to stay informed about the latest stable and legacy versions.