Latency in LLM Serving
Preface There have been many excellent works on LLM serving, mainly focusing on improving the throughput. Meanwhile, in practical applications, latency is equally important for LLM serving. However, currently few works focus on improvement of LLM serving latency, especially the latency optimization under SLA constraint. This blog attempts to summarize the basic concepts and problems in this direction, and give some novel research directions based on some analysis of latency in LLM serving. ...