Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services


Jiachen Liu, Zhiyu Wu, Jae-Won Chung (University of Michigan), Fan Lai (UIUC), Myungjin Lee (Cisco Systems), Mosharaf Chowdhury (University of Michigan).

TL;DR: Large language models (LLMs) have revolutionized text-based interactions, enabling services from real-time translation to AI-driven chatbots. By streaming tokens to users, akin to video streaming, such text streaming services allow users to digest content incrementally, whether in text or speech form. However, existing serving systems primarily optimize server-side aggregated metrics while ignoring individual user experience, leading to poor Quality-of-Experience (QoE) under high or bursty load.

In this project, we first formally define QoE in text streaming services by considering the end-to-end token delivery process. We then propose Andes, a QoE-aware serving system that enhances user experience. Andes achieves this by strategically scheduling multiple requests on contended GPU resources, prioritizing them based on their resource demands and the service they have already received. Our evaluations demonstrate that, compared to state-of-the-art LLM serving systems like vLLM, Andes improves average QoE by up to 3.2× under high request rates, or alternatively sustains up to 1.6× higher request rates while preserving high QoE.
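The scheduling idea described above can be sketched in a few lines. The sketch below is a hypothetical illustration, not Andes' actual policy: it prioritizes requests by how far each user has fallen behind their expected token-delivery pace, normalized by a proxy for the request's GPU resource demand. All names, fields, and the priority formula are illustrative assumptions.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    priority: float
    rid: str = field(compare=False)
    tokens_sent: int = field(compare=False, default=0)      # service received so far
    tokens_expected: int = field(compare=False, default=0)  # user's expected pace
    context_len: int = field(compare=False, default=0)      # proxy for GPU demand

def priority(req):
    # More negative = more urgent (heapq is a min-heap).
    deficit = req.tokens_expected - req.tokens_sent  # how far behind the user is
    demand = max(req.context_len, 1)                 # resource-cost proxy
    return -deficit / demand

def schedule(requests, batch_slots):
    """Pick up to batch_slots requests with the most urgent QoE deficit."""
    heap = []
    for r in requests:
        r.priority = priority(r)
        heapq.heappush(heap, r)
    return [heapq.heappop(heap).rid for _ in range(min(batch_slots, len(heap)))]
```

For example, a request 20 tokens behind its expected pace outranks one only 10 tokens behind at equal context length, while a request that is fully caught up is deprioritized regardless of its size.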

Imagine three different scenarios where text is streamed to users. Despite all having the same token generation throughput, their user experiences vary dramatically:
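One way to make this intuition concrete is with a simple on-time-delivery metric: score a stream by the fraction of tokens that arrive no later than the user's expected timeline, i.e., a time-to-first-token target plus a per-token reading pace. The metric and all parameter values below are illustrative assumptions, not the paper's exact QoE definition.

```python
def expected_deadline(i, ttft_target=1.0, tokens_per_sec=4.0):
    """Latest acceptable arrival time (seconds) for the i-th token (0-indexed)."""
    return ttft_target + i / tokens_per_sec

def qoe(arrival_times, ttft_target=1.0, tokens_per_sec=4.0):
    """Fraction of tokens delivered on or before their expected deadline."""
    on_time = sum(
        t <= expected_deadline(i, ttft_target, tokens_per_sec)
        for i, t in enumerate(arrival_times)
    )
    return on_time / len(arrival_times)

# Two streams with identical throughput (8 tokens, last token at t = 2.0 s):
steady = [0.25 * (i + 1) for i in range(8)]  # one token every 250 ms
burst  = [2.0] * 8                           # nothing, then everything at once

print(qoe(steady))  # 1.0 -> every token lands on time
print(qoe(burst))   # 0.5 -> the first half of the tokens arrive late
```

Both streams finish at the same instant and deliver the same number of tokens, yet the bursty stream scores half the QoE because the user stares at an empty screen for two seconds.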
