si-pbc / hertz-dev (Standard Intelligence)

2024-11-16 05:30:02

Hertz-dev is an open-source, first-of-its-kind base model for full-duplex conversational audio. It is an 8.5B parameter transformer trained on 20 million unique hours of high-quality audio data. This repo contains code for both mono- and full-duplex generation; we expect to do a full Transformers library integration in the near future.

Hertz-dev is a base model, without fine-tuning, RLHF, or instruction-following behavior. It can be fine-tuned for almost any audio modeling task, from live translation to classification. Base models excel at faithfully modeling their training set, and accurate maps come from contact with reality.

Trained on the world’s largest known dataset of high-quality real-world conversational audio, hertz-dev exhibits state-of-the-art ability in human-like speech patterns such as pauses and emotional inflections. Hertz-dev has an 80ms theoretical average latency and benchmarks at 120ms real-world latency on a single RTX 4090, which is 1.5-2x lower than the previous state of the art. Low latency is necessary for natural audio, and we're proud to move the field in this direction.
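As a quick sanity check on the latency claim, the short sketch below works out what "1.5-2x lower" implies for the previous state of the art. It assumes the figure is a ratio of previous latency to the 120ms real-world measurement; the function name and constants are illustrative and do not come from the hertz-dev repo.

```python
# Assumption: "1.5-2x lower" means previous_latency / current_latency is in [1.5, 2.0].
REAL_WORLD_LATENCY_MS = 120      # measured on a single RTX 4090, per the text
SPEEDUP_RANGE = (1.5, 2.0)       # stated improvement over the previous state of the art


def implied_previous_latency_ms(current_ms, speedup_range):
    """Return the (low, high) previous latency implied by a speedup range."""
    low, high = speedup_range
    return current_ms * low, current_ms * high


if __name__ == "__main__":
    low, high = implied_previous_latency_ms(REAL_WORLD_LATENCY_MS, SPEEDUP_RANGE)
    print(f"Implied previous state of the art: {low:.0f}-{high:.0f} ms")
```

Under that reading, the previous state of the art would sit at roughly 180-240ms.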

Inference is known to work on Python 3.10 and CUDA 12.1. Other versions have not been tested as thoroughly. If you want to use CUDA 12.1, you'll need to install torch with:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

All three scripts will automatically download the models you need.
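Before running inference, it can help to confirm that the local environment matches the tested configuration. The following is a minimal sketch, not part of the hertz-dev repo: the helper name `describe_environment` and the report keys are hypothetical, and it relies only on the standard library plus PyTorch's `torch.version.cuda` and `torch.cuda.is_available()` when torch is installed.

```python
import importlib.util
import sys

# Tested configuration stated in the text above.
TESTED_PYTHON = (3, 10)
TESTED_CUDA = "12.1"


def describe_environment():
    """Compare the running environment with the tested Python/CUDA versions."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_matches": tuple(sys.version_info[:2]) == TESTED_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda": None,            # CUDA version torch was built against, if any
        "cuda_matches": False,
        "gpu_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without torch

        report["cuda"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_matches"] = report["cuda"] == TESTED_CUDA
        report["gpu_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

A mismatch does not necessarily mean inference will fail, since other versions are simply less thoroughly tested.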
