
Tmp Directory Locked #2675

Closed
sidjha1 opened this issue Jan 30, 2024 · 5 comments

sidjha1 commented Jan 30, 2024

When multiple users are using vLLM on the same machine, we get the following permission denied error regarding a .lock file:

Permission denied: '/tmp/meta-llama-Llama-2-70b-chat-hf.lock'

This was also mentioned in #2232 and #2179.

@asimmunawar

You can add this to resolve the lock issue:
--download-dir "LOCAL-PATH"
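
For context: when no download directory is given, vLLM appears to derive the lock-file path under the world-shared /tmp from the model name alone, so every user on the machine points at the same .lock file and only the user who first created it can open it again. A rough sketch of that logic, paraphrased from the get_lock/prepare_hf_model_weights frames in the traceback below (not the verbatim vLLM source):

import os
import filelock

def get_lock(model_name_or_path, cache_dir=None):
    # With no cache/download dir, the lock lands in the shared /tmp, named only
    # after the model (e.g. /tmp/meta-llama-Llama-2-70b-chat-hf.lock) -- the
    # same path for every user on the machine.
    lock_dir = cache_dir if cache_dir else "/tmp"
    lock_file_name = model_name_or_path.replace("/", "-") + ".lock"
    return filelock.FileLock(os.path.join(lock_dir, lock_file_name))

Passing --download-dir (or download_dir= in Python) makes cache_dir non-empty, so the lock file is created inside your own directory instead of /tmp.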


Mor-Li commented Jan 31, 2024

Same problem; I think this needs attention. It seems to appear randomly!
When I run this script:

from vllm import LLM, SamplingParams
prompts = [
    "<s><|im_start|>user\nHello, my name is<|im_end|>\n<|im_start|>assistant\n",
    "<s><|im_start|>user\nThe president of the United States is<|im_end|>\n<|im_start|>assistant\n",
    "<s><|im_start|>user\nThe capital of France is<|im_end|>\n<|im_start|>assistant\n",
    "<s><|im_start|>user\nThe future of AI is<|im_end|>\n<|im_start|>assistant\n",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="WizardLM/WizardLM-70B-V1.0", trust_remote_code=True,
          tensor_parallel_size=4,)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

The output is sometimes like this:

INFO 01-31 14:22:40 llm_engine.py:70] Initializing an LLM engine with config: model='WizardLM/WizardLM-70B-V1.0', tokenizer='WizardLM/WizardLM-70B-V1.0', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=4, quantization=None, enforce_eager=False, seed=0)
Traceback (most recent call last):
  File "/mnt/hwfile/limo/opencompass_fork/configs/needleinahaystack/wizard_debug.py", line 10, in <module>
    llm = LLM(model="WizardLM/WizardLM-70B-V1.0", trust_remote_code=True,
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 105, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 309, in from_engine_args
    engine = cls(*engine_configs,
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 109, in __init__
    self._init_workers_ray(placement_group)
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 249, in _init_workers_ray
    self._run_workers(
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 795, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/worker/worker.py", line 81, in load_model
    self.model_runner.load_model()
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 64, in load_model
    self.model = get_model(self.model_config)
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/model_executor/model_loader.py", line 72, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 313, in load_weights
    for name, loaded_weight in hf_model_weights_iterator(
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/model_executor/weight_utils.py", line 198, in hf_model_weights_iterator
    hf_folder, hf_weights_files, use_safetensors = prepare_hf_model_weights(
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/vllm/model_executor/weight_utils.py", line 154, in prepare_hf_model_weights
    with get_lock(model_name_or_path, cache_dir):
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/filelock/_api.py", line 297, in __enter__
    self.acquire()
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/filelock/_api.py", line 255, in acquire
    self._acquire()
  File "/mnt/petrelfs/limo/miniconda3/envs/opencompass_fork/lib/python3.10/site-packages/filelock/_unix.py", line 39, in _acquire
    fd = os.open(self.lock_file, open_flags, self._context.mode)
PermissionError: [Errno 13] Permission denied: '/tmp/WizardLM-WizardLM-70B-V1.0.lock'

and sometimes like this:

(opencompass_fork) [limo@HOST-10-140-60-209 opencompass_fork]$ srun -p llm_dev2 --quotatype=auto --gres=gpu:4 -N1 -u python3 configs/needleinahaystack/wizard_debug.py
srun: job 3348453 queued and waiting for resources
srun: job 3348453 has been allocated resources
srun: Job 3348453 scheduled successfully!
Current QUOTA_TYPE is [reserved], which means the job has occupied quota in RESERVED_TOTAL under your partition.
Current PHX_PRIORITY is normal

2024-01-31 14:29:37,548 INFO worker.py:1724 -- Started a local Ray instance.
INFO 01-31 14:29:42 llm_engine.py:70] Initializing an LLM engine with config: model='WizardLM/WizardLM-70B-V1.0', tokenizer='WizardLM/WizardLM-70B-V1.0', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=4, quantization=None, enforce_eager=False, seed=0)
INFO 01-31 14:33:36 llm_engine.py:275] # GPU blocks: 30322, # CPU blocks: 3276
INFO 01-31 14:33:38 model_runner.py:501] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-31 14:33:38 model_runner.py:505] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode.
(RayWorkerVllm pid=102741) INFO 01-31 14:33:38 model_runner.py:501] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
(RayWorkerVllm pid=102741) INFO 01-31 14:33:38 model_runner.py:505] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode.
[W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
(RayWorkerVllm pid=102741) [W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
INFO 01-31 14:34:13 model_runner.py:547] Graph capturing finished in 35 secs.
Processed prompts:   0%|          | 0/4 [00:00<?, ?it/s](RayWorkerVllm pid=102741) INFO 01-31 14:34:13 model_runner.py:547] Graph capturing finished in 35 secs.
(RayWorkerVllm pid=103012) INFO 01-31 14:33:38 model_runner.py:501] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. 
[repeated 2x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(RayWorkerVllm pid=103012) INFO 01-31 14:33:38 model_runner.py:505] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. [repeated 2x across cluster]
Processed prompts: 100%|██████████| 4/4 [00:00<00:00,  6.71it/s]
Prompt: '<s><|im_start|>user\nHello, my name is<|im_end|>\n<|im_start|>assistant\n', Generated text: "Hi there! I'm here to assist you with any questions or tasks you"
Prompt: '<s><|im_start|>user\nThe president of the United States is<|im_end|>\n<|im_start|>assistant\n', Generated text: 'The president of the United States is currently Joe Biden. He was inaugurated'
Prompt: '<s><|im_start|>user\nThe capital of France is<|im_end|>\n<|im_start|>assistant\n', Generated text: 'Paris\n<|im_end|>'
Prompt: '<s><|im_start|>user\nThe future of AI is<|im_end|>\n<|im_start|>assistant\n', Generated text: '<|im_end|>'
(RayWorkerVllm pid=103012) INFO 01-31 14:34:13 model_runner.py:547] Graph capturing finished in 35 secs. [repeated 2x across cluster]
(RayWorkerVllm pid=103012) [W CUDAGraph.cpp:145] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator()) [repeated 2x across cluster]


sidjha1 commented Feb 3, 2024

Specifying the download directory worked. Thanks!

sidjha1 closed this as completed Feb 3, 2024
@TanmayParekh

@sidjha1 How and where did you specify the download directory? I don't see any argument like this for the LLM class here - https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py


sidjha1 commented Feb 9, 2024

Hey @TanmayParekh, I'm pasting the vLLM quickstart with download_dir specified below. IIRC, the parameter is passed through **kwargs to the engine arguments, so it is not explicitly listed in the LLM signature.

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="facebook/opt-125m", download_dir="vllm-download-dir")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
