ModelScope的11g显存跑千问1.5-1.8怎么也够了，为什么还报错?

"ModelScope的11g显存跑千问1.5-1.8怎么也够了，为什么还报错? torch.empty(kv_cache_shape,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 252.00 MiB. GPU 0 has a total capacty of 10.75 GiB of which 231.94 MiB is free. Including non-PyTorch memory, this process has 9.54 GiB memory in use. Of the allocated memory 9.13 GiB is allocated by PyTorch, and 69.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF 可复现的代码 #!/bin/bash

Command to run the VLLM OpenAI API server

python3 -m vllm.entrypoints.openai.api_server \
--model=/home/Qwen1.5-7b-chat/Qwen1.5-1.8B \
--served-model-name=Qwen1.5 \
--dtype=half \
--tensor-parallel-size=1 \
--trust-remote-code \
--gpu-memory-utilization=0.90 \
--host=0.0.0.0 \
--port=8001 \
--max-model-len=500 \
--max-num-seqs=1"

展开

收起

Lucidly 2024-05-01 08:57:32 27 0

1 条回答

写回答

取消提交回答

为了利利

2000元阿里云代金券免费领取，2核4G云服务器仅664元/3年，新老用户都有优惠，立即抢购>>>
参考以下代码 VLLM_USE_MODELSCOPE=True python -m vllm.entrypoints.openai.api_server --model="qwen/Qwen1.5-1.8B-Chat" --revision="master" 内存分的多了不稳定改成0.75就好了。此回答整理自钉群“魔搭ModelScope开发者联盟群 ①”

2024-05-01 16:00:15

赞同 1 展开评论打赏

ModelScope模型即服务

ModelScope旨在打造下一代开源的模型即服务共享平台，为泛AI开发者提供灵活、易用、低成本的一站式模型服务产品，让模型应用更简单！欢迎加入技术交流群：微信公众号：魔搭ModelScope社区，钉钉群号：44837352

我要提问

ModelScope的11g显存跑千问1.5-1.8怎么也够了，为什么还报错?

Command to run the VLLM OpenAI API server

ModelScope模型即服务

热门讨论

热门文章

相关课程

相关电子书

ModelScope的11g显存跑千问1.5-1.8怎么也够了，为什么还报错?

Command to run the VLLM OpenAI API server

ModelScope模型即服务

热门讨论

热门文章

相关课程

相关文章

相关电子书