submitted by

Style Pass

2024-04-17 13:30:05

This change upstreams llamafile's CPU matrix multiplication kernels, which improve image and prompt evaluation speed. For starters, Q4_0 and Q8_0 weights should go ~40% faster on CPU. The biggest benefits are with data types like f16 / f32, which process prompts 2x faster, making them faster than quantized data types for prompt evals.

This change also introduces bona fide AVX512 support, since tinyBLAS is able to exploit the larger register file. For example, on my CPU, llama.cpp llava-cli processes an image prompt at 305 tokens/second using the Q4_K and Q4_0 types, which has always been faster than using f16 LLaVA weights, which at HEAD go 188 tokens/second. With this change, f16 LLaVA performance leapfrogs to 464 tokens/second.

On Intel Core i9-14900K this change improves F16 prompt perf by 5x. For example, llama.cpp at HEAD with Mistral 7b f16 processes a 215 token prompt at 13 tok/sec. With the fixes in this change it goes 52 tok/sec. It's mostly thanks to my vectorized outer product kernels, but also because I added support for correctly counting the number of cores on Alder Lake, so the default thread count discounts Intel's new efficiency cores. Core counting currently works only on Linux.
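
To give a rough idea of what an outer product kernel looks like, here is a minimal scalar sketch in C. It is not the code from this change and not the tinyBLAS API: the function name `matmul_outer_tiled`, the 4x4 tile size, and the row-major layout are all assumptions made for illustration. The idea is to keep a small tile of the output in local accumulators (which a compiler can hold in registers) and update the whole tile with one outer product per step along the shared dimension, so each loaded value of A and B is reused several times. A real kernel would use SIMD intrinsics and handle edge tiles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size for illustration; real kernels pick this
 * based on how many vector registers the target CPU has. */
enum { TILE_M = 4, TILE_N = 4 };

/* Computes C = A * B where A is m x k, B is k x n, C is m x n,
 * all row-major floats. Assumes m and n are multiples of the tile
 * size to keep the sketch short (no edge handling). */
void matmul_outer_tiled(const float *A, const float *B, float *C,
                        size_t m, size_t n, size_t k) {
    assert(m % TILE_M == 0 && n % TILE_N == 0);
    for (size_t i = 0; i < m; i += TILE_M) {
        for (size_t j = 0; j < n; j += TILE_N) {
            /* Accumulators for one output tile. */
            float acc[TILE_M][TILE_N] = {{0}};
            for (size_t l = 0; l < k; ++l) {
                /* Outer product of a column slice of A and a row
                 * slice of B, added into the tile. */
                for (size_t ii = 0; ii < TILE_M; ++ii) {
                    float a = A[(i + ii) * k + l];
                    for (size_t jj = 0; jj < TILE_N; ++jj) {
                        acc[ii][jj] += a * B[l * n + (j + jj)];
                    }
                }
            }
            /* Write the finished tile back to C once. */
            for (size_t ii = 0; ii < TILE_M; ++ii) {
                for (size_t jj = 0; jj < TILE_N; ++jj) {
                    C[(i + ii) * n + (j + jj)] = acc[ii][jj];
                }
            }
        }
    }
}
```

With a larger register file, as on AVX512, the tile can be made bigger, which increases the reuse of each loaded value; that is the general reason a kernel of this shape benefits from more registers.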
