[Bug]: Phi-3-mini does not work when using Ray #6607

Closed
baughmann opened this issue Jul 20, 2024 · 21 comments · Fixed by #6751
Labels
bug (Something isn't working), ray (anything related with ray)

Comments

@baughmann

baughmann commented Jul 20, 2024

Your current environment

PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Fedora Linux 40 (Workstation Edition) (x86_64)
GCC version: (GCC) 14.1.1 20240701 (Red Hat 14.1.1-7)
Clang version: 18.1.6 (Fedora 18.1.6-3.fc40)
CMake version: version 3.30.0
Libc version: glibc-2.39

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.9.9-200.fc40.x86_64-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 4090

Nvidia driver version: 555.58.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               32
On-line CPU(s) list:                  0-31
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen 9 7950X 16-Core Processor
CPU family:                           25
Model:                                97
Thread(s) per core:                   2
Core(s) per socket:                   16
Socket(s):                            1
Stepping:                             2
CPU(s) scaling MHz:                   51%
CPU max MHz:                          5881.0000
CPU min MHz:                          545.0000
BogoMIPS:                             8999.97
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
Virtualization:                       AMD-V
L1d cache:                            512 KiB (16 instances)
L1i cache:                            512 KiB (16 instances)
L2 cache:                             16 MiB (16 instances)
L3 cache:                             64 MiB (2 instances)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-31
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Mitigation; Safe RET
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] onnx==1.16.1
[pip3] onnxruntime==1.18.1
[pip3] onnxruntime-gpu==1.18.1
[pip3] sentence-transformers==3.0.1
[pip3] torch==2.3.0
[pip3] torchvision==0.18.0
[pip3] transformers==4.42.4
[pip3] triton==2.3.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
[conda] sentence-transformers     3.0.1                    pypi_0    pypi
[conda] torch                     2.3.0                    pypi_0    pypi
[conda] torchvision               0.18.0                   pypi_0    pypi
[conda] transformers              4.42.4                   pypi_0    pypi
[conda] triton                    2.3.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.5.1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PHB     0-31    0               N/A
GPU1    PHB      X      0-31    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

Feel free to use this gist with a minimal Jupyter notebook.

When attempting to load any Phi-3 mini/small model using the AsyncLLMEngine and specifying ray as the distributed backend, Ray throws a:

ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'

A pip list in my main project shows:

sentence-transformers    3.0.1
transformers             4.42.4

although it sounds like this is likely not a bug in my project.

I highly encourage you to look at the Jupyter notebook, but for completeness, here's how I'm trying to load the model:

from vllm import AsyncEngineArgs, AsyncLLMEngine

# model source: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct

# this config works
mp_args = AsyncEngineArgs(
    model="../../models/microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    distributed_executor_backend="mp",
    max_model_len=8000, # limit mem utilization for this example
    disable_sliding_window=True, # needed in order to use flash-attn
)

# this config does not work. it just sits at 
#   "INFO worker.py:1779 -- Started a local Ray instance. View the dashboard at..."
# The actor dies with a `ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'`
ray_args = AsyncEngineArgs(
    model="../../models/microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    max_model_len=8000,
    engine_use_ray=True,
    distributed_executor_backend="ray",
)

# engine = AsyncLLMEngine.from_engine_args(mp_args)
engine = AsyncLLMEngine.from_engine_args(ray_args)

Additionally, here's the full system log from the dead actor:

[2024-07-20 10:54:21,234 I 193920 193920] core_worker_process.cc:107: Constructing CoreWorkerProcess. pid: 193920
[2024-07-20 10:54:21,235 I 193920 193920] io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2024-07-20 10:54:21,236 I 193920 193920] grpc_server.cc:134: worker server started, listening on port 32921.
[2024-07-20 10:54:21,238 I 193920 193920] core_worker.cc:275: Initializing worker at address: 192.168.88.7:32921, worker ID 8251a92f00f8164eafa28810f113655f1ff265d396be3f6ab41f0ba5, raylet b0e273bf47f964aa3b48176dcb0fc921baec9887c10c232502e64270
[2024-07-20 10:54:21,238 I 193920 193920] task_event_buffer.cc:177: Reporting task events to GCS every 1000ms.
[2024-07-20 10:54:21,239 I 193920 193920] core_worker.cc:704: Adjusted worker niceness to 15
[2024-07-20 10:54:21,239 I 193920 193957] core_worker.cc:643: Event stats:


Global stats: 13 total (9 active)
Queueing time: mean = 6.636 us, max = 45.069 us, min = 6.370 us, total = 86.269 us
Execution time:  mean = 21.033 us, total = 273.426 us
Event stats:
	PeriodicalRunner.RunFnPeriodically - 7 total (5 active, 1 running), Execution time: mean = 1.971 us, total = 13.800 us, Queueing time: mean = 11.414 us, max = 45.069 us, min = 34.830 us, total = 79.899 us
	CoreWorker.ExitIfParentRayletDies - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberCommandBatch - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::WorkerInfoGcsService.grpc_client.AddWorkerInfo - 1 total (0 active), Execution time: mean = 229.476 us, total = 229.476 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::InternalPubSubGcsService.grpc_client.GcsSubscriberPoll - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::WorkerInfoGcsService.grpc_client.AddWorkerInfo.OnReplyReceived - 1 total (0 active), Execution time: mean = 30.150 us, total = 30.150 us, Queueing time: mean = 6.370 us, max = 6.370 us, min = 6.370 us, total = 6.370 us
	Publisher.CheckDeadSubscribers - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s

-----------------
Task Event stats:

IO Service Stats:

Global stats: 4 total (1 active)
Queueing time: mean = 4.330 us, max = 12.240 us, min = 5.080 us, total = 17.320 us
Execution time:  mean = 71.079 us, total = 284.315 us
Event stats:
	PeriodicalRunner.RunFnPeriodically - 1 total (0 active), Execution time: mean = 58.839 us, total = 58.839 us, Queueing time: mean = 12.240 us, max = 12.240 us, min = 12.240 us, total = 12.240 us
	ray::rpc::TaskInfoGcsService.grpc_client.AddTaskEventData - 1 total (0 active), Execution time: mean = 218.797 us, total = 218.797 us, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	CoreWorker.deadline_timer.flush_task_events - 1 total (1 active), Execution time: mean = 0.000 s, total = 0.000 s, Queueing time: mean = 0.000 s, max = -0.000 s, min = 9223372036.855 s, total = 0.000 s
	ray::rpc::TaskInfoGcsService.grpc_client.AddTaskEventData.OnReplyReceived - 1 total (0 active), Execution time: mean = 6.679 us, total = 6.679 us, Queueing time: mean = 5.080 us, max = 5.080 us, min = 5.080 us, total = 5.080 us
Other Stats:
	grpc_in_progress:0
	current number of task status events in buffer: 0
	current number of profile events in buffer: 0
	current number of dropped task attempts tracked: 0
	total task events sent: 0 MiB
	total number of task attempts sent: 0
	total number of task attempts dropped reported: 0
	total number of sent failure: 0
	num status task events dropped: 0
	num profile task events dropped: 0


[2024-07-20 10:54:21,239 I 193920 193920] event.cc:234: Set ray event level to warning
[2024-07-20 10:54:21,239 I 193920 193920] event.cc:342: Ray Event initialized for CORE_WORKER
[2024-07-20 10:54:21,239 I 193920 193957] accessor.cc:668: Received notification for node id = b0e273bf47f964aa3b48176dcb0fc921baec9887c10c232502e64270, IsAlive = 1
[2024-07-20 10:54:21,239 I 193920 193957] core_worker.cc:4735: Number of alive nodes:1
[2024-07-20 10:54:21,240 I 193920 193920] direct_actor_task_submitter.cc:36: Set max pending calls to -1 for actor 926791420e82ba35c48a118601000000
[2024-07-20 10:54:21,240 I 193920 193920] direct_actor_task_submitter.cc:237: Connecting to actor 926791420e82ba35c48a118601000000 at worker 8251a92f00f8164eafa28810f113655f1ff265d396be3f6ab41f0ba5
[2024-07-20 10:54:21,240 I 193920 193920] core_worker.cc:3010: Creating actor: 926791420e82ba35c48a118601000000
[2024-07-20 10:54:22,071 I 193920 193920] core_worker.cc:878: Exit signal received, this process will exit after all outstanding tasks have finished, exit_type=USER_ERROR, detail=Worker exits because there was an exception in the initialization method (e.g., __init__). Fix the exceptions from the initialization to resolve the issue. Exception raised from an actor init method. Traceback: The actor died because of an error raised in its creation task, ray::_AsyncLLMEngine.__init__() (pid=193920, ip=192.168.88.7, actor_id=926791420e82ba35c48a118601000000, repr=<vllm.engine.async_llm_engine._AsyncLLMEngine object at 0x7fbb4bd12650>)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'
traceback: Traceback (most recent call last):
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          ^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'transformers_modules'
[2024-07-20 10:54:22,071 W 193920 193920] direct_actor_transport.cc:189: Actor creation task finished with errors, task_id: ffffffffffffffff926791420e82ba35c48a118601000000, actor_id: 926791420e82ba35c48a118601000000, status: CreationTaskError: Exception raised from an actor init method. Traceback: The actor died because of an error raised in its creation task, ray::_AsyncLLMEngine.__init__() (pid=193920, ip=192.168.88.7, actor_id=926791420e82ba35c48a118601000000, repr=<vllm.engine.async_llm_engine._AsyncLLMEngine object at 0x7fbb4bd12650>)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'
traceback: Traceback (most recent call last):
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          ^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'transformers_modules'
[2024-07-20 10:54:22,079 I 193920 193920] core_worker.cc:856: Try killing all child processes of this worker as it exits. Child process pids: 
[2024-07-20 10:54:22,080 I 193920 193920] core_worker.cc:815: Disconnecting to the raylet.
[2024-07-20 10:54:22,080 I 193920 193920] raylet_client.cc:161: RayletClient::Disconnect, exit_type=USER_ERROR, exit_detail=Worker exits because there was an exception in the initialization method (e.g., __init__). Fix the exceptions from the initialization to resolve the issue. Exception raised from an actor init method. Traceback: The actor died because of an error raised in its creation task, ray::_AsyncLLMEngine.__init__() (pid=193920, ip=192.168.88.7, actor_id=926791420e82ba35c48a118601000000, repr=<vllm.engine.async_llm_engine._AsyncLLMEngine object at 0x7fbb4bd12650>)
  At least one of the input arguments for this task could not be computed:
ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'
traceback: Traceback (most recent call last):
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          ^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'transformers_modules', has creation_task_exception_pb_bytes=1
[2024-07-20 10:54:22,080 I 193920 193920] core_worker.cc:723: Shutting down a core worker.
[2024-07-20 10:54:22,080 I 193920 193920] task_event_buffer.cc:188: Shutting down TaskEventBuffer.
[2024-07-20 10:54:22,080 I 193920 193978] task_event_buffer.cc:170: Task event buffer io service stopped.
[2024-07-20 10:54:22,080 I 193920 193920] core_worker.cc:749: Disconnecting a GCS client.
[2024-07-20 10:54:22,080 I 193920 193920] core_worker.cc:753: Waiting for joining a core worker io thread. If it hangs here, there might be deadlock or a high load in the core worker io service.
[2024-07-20 10:54:22,080 I 193920 193957] core_worker.cc:986: Core worker main io service stopped.
[2024-07-20 10:54:22,084 I 193920 193920] core_worker.cc:766: Core worker ready to be deallocated.
[2024-07-20 10:54:22,084 I 193920 193920] core_worker_process.cc:245: Task execution loop terminated. Removing the global worker.
[2024-07-20 10:54:22,084 I 193920 193920] core_worker.cc:714: Core worker is destructed
[2024-07-20 10:54:22,084 I 193920 193920] task_event_buffer.cc:188: Shutting down TaskEventBuffer.
[2024-07-20 10:54:22,084 I 193920 193920] core_worker_process.cc:148: Destructing CoreWorkerProcessImpl. pid: 193920
[2024-07-20 10:54:22,084 I 193920 193920] io_service_pool.cc:47: IOServicePool is stopped.
[2024-07-20 10:54:22,239 I 193920 193920] stats.h:120: Stats module has shutdown.

Also, thank you guys for such a great library. It's very easy and fun to use, and bugs like this are few and far between 😄

Edit: I've also tried this with 0.5.2 and the 0.5.3 prerelease, per @rkooo567's question.

baughmann added the bug label on Jul 20, 2024
@youkaichao
Member

does distributed_executor_backend="mp" work?

@youkaichao
Member

cc @rkooo567 @richardliaw for the ray error.

@rkooo567
Collaborator

I feel like I have seen this before, and it may have been fixed in the latest version. have you tried 0.5.2?

@baughmann
Author

baughmann commented Jul 23, 2024

I feel like I have seen this before, and it may have been fixed in the latest version. have you tried 0.5.2?

Oh dang I didn't even realize it came out. Let me upgrade and report back.

Edit:
Same result with both 0.5.2 and 0.5.3 using the notebook posted in my OP. The actor dies immediately with ModuleNotFoundError: No module named 'transformers_modules'. Good catch though, @rkooo567.

@baughmann
Author

baughmann commented Jul 23, 2024

does distributed_executor_backend="mp" work?

Yes. I didn't explicitly set it in the original notebook because, as I understand it, mp is the default if there are enough GPUs on the local system to satisfy the tensor-parallelism requirement. I'm updating the original notebook to include that in the working engine args. I'm also adding disable_sliding_window to the working args because otherwise it doesn't use flash-attn.

@rkooo567
Collaborator

I assume it is the same issue as #4286.

@baughmann is there a repro I can try?

@rkooo567
Collaborator

Maybe we need a more fundamental solution for this case.

@baughmann
Author

baughmann commented Jul 23, 2024

@rkooo567 Thanks for looking into this.

I put the content of the Jupyter notebook that I used to produce the error in the OP, but I'll also put it here for your convenience :)

The model used is just the official one.

from vllm import AsyncEngineArgs, AsyncLLMEngine

# model source: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct

# this config works
mp_args = AsyncEngineArgs(
    model="../../models/microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    distributed_executor_backend="mp",
    max_model_len=8000, # limit mem utilization for this example
    disable_sliding_window=True, # needed in order to use flash-attn
)

# this config does not work. it just sits at 
#   "INFO worker.py:1779 -- Started a local Ray instance. View the dashboard at..."
# The actor dies with a `ray.exceptions.RaySystemError: System error: No module named 'transformers_modules'`
ray_args = AsyncEngineArgs(
    model="../../models/microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    max_model_len=8000,
    engine_use_ray=True,
    distributed_executor_backend="ray",
)

# engine = AsyncLLMEngine.from_engine_args(mp_args)
engine = AsyncLLMEngine.from_engine_args(ray_args)

@tjohnson31415
Contributor

tjohnson31415 commented Jul 24, 2024

Hello. I've been investigating the same error but in the context of multi-node inference with Ray. I created #6751 which fixes the issue for me.

Perhaps my fix will help in this scenario as well. I attempted to reproduce the error raised here using the code examples in this issue, but was unable to (using the latest vLLM code); the AsyncLLMEngine is created without error for me 🤔

@baughmann
Author

@tjohnson31415 In my case I'm running only on a single node.

And oh, wow, the notebook didn't give you problems using the ray args? How strange.

@tjohnson31415
Contributor

tjohnson31415 commented Jul 24, 2024

In my case I'm running only on a single node.

Yeah, that makes it interesting. In my understanding, #4286 should be the most recent fix for the single node case, but that fix looks like it was included in release v0.4.1...
I just tested in my env with vllm==v0.5.1 and it worked with that too 🤔

Some other thoughts/questions:

  • Does it work for you if you use engine_use_ray=False? Maybe that is a new piece of the puzzle.
  • You said you run this in a Jupyter notebook. If you run your code just as a python script, do you get the same error?
    • I'm not running in Jupyter in my test...
  • Do the transformers_modules dynamic modules exist in ~/.cache/huggingface/modules/transformers_modules/ (assuming HF_HOME and HF_MODULES_HOME are unset)? And are they regenerated by running your code? (A quick check is sketched just after this list.)
    • I rm -rf ~/.cache/huggingface/modules/ each time I try to run the code to confirm that the modules do get regenerated.
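
Regarding that last check, here is a minimal sketch (assuming HF_HOME and HF_MODULES_HOME are unset, so the default cache location applies) for listing whatever dynamic modules transformers has generated for trust_remote_code:

from pathlib import Path

# Default location of the dynamic modules generated for trust_remote_code.
modules_dir = Path.home() / ".cache" / "huggingface" / "modules" / "transformers_modules"

if modules_dir.is_dir():
    for entry in sorted(modules_dir.iterdir()):
        print(entry.name)  # e.g. a Phi-3-mini-128k-instruct directory
else:
    print(f"{modules_dir} does not exist yet")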

@rkooo567
Collaborator

Let me also take a look at it quickly. I am a little busy with other high-priority tasks on our end.

@baughmann
Author

@tjohnson31415 What I'll do is make a minimal conda env with minimal requirements and then perform the troubleshooting on that.

I will upload and post that repo when I'm able, but it may not be until later today.

@baughmann
Author

baughmann commented Jul 25, 2024

@tjohnson31415 @rkooo567 Here is a repro repo with a conda environment for you all. I also created a basic readme for your convenience.

@tjohnson31415 Here's the update regarding your suggestions:

Does it work for you if you use engine_use_ray=False?

Yes, it does. As I understand it, with engine_use_ray=False it's not using Ray; it's using multiprocessing (I could be wrong about this, though). I was able to use Phi-3 with mp before, but I need to use it with Ray.

You said you run this in a Jupyter notebook. If you run your code just as a python script, do you get the same error?

Yes, I'm afraid so. That's not surprising, though, as I run Jupyter notebooks in the exact same development environment I'm writing my application in. For me, it's just a quick and easy way to experiment with specific parts of my application.

Do the transformers_modules dynamic modules exist in...

Yes, they do. I added a note about this in the readme of the repo I posted. I see a Phi-3-mini-128k-instruct directory in that folder. If I delete .cache/huggingface/modules/transformers_modules/, it gets re-created the next time I try to run it with Ray.

@tjohnson31415
Contributor

tjohnson31415 commented Jul 26, 2024

@baughmann Ah, thanks for creating the repro-repo! I didn't realize that engine_use_ray=True would not print out any logs from the main engine loop. I can see the System error: No module named 'transformers_modules' error by looking in the logs in the Ray dashboard, as you stated in your repro steps.

As I understand it, with engine_use_ray=False, it's not using Ray

To make the workers executing the model use Ray, distributed_executor_backend="ray" is sufficient. engine_use_ray is a separate configuration that puts the engine execution itself into a Ray process separate from the server process.

The error occurs because the Ray worker spawned for the engine loop with engine_use_ray does not have its python path updated to include the dynamic modules generated for trust_remote_code. The failure occurs when communicating the ModelConfig from the main process to the Ray engine worker here.

The way the (non-engine) Ray workers currently handle this is that the WorkerWrapperBase is initialized first and runs init_cached_hf_modules before the args containing the ModelConfig are passed into init_worker. A wrapper/base like that could be added for the engine worker too... Or I think something similar to my fix in #6751, passing the model configuration as a simple class instead of as an instance of a dynamic class generated in transformers_modules, would work too.
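
A rough sketch of that wrapper idea (not the actual vLLM code; it assumes vllm.utils.init_cached_hf_modules exists in your version, since that is the helper WorkerWrapperBase uses): construct the engine actor with plain arguments only, prime the dynamic-module cache in __init__, and hand over the config that references transformers_modules in a second call.

import ray

@ray.remote
class EngineWrapper:
    def __init__(self):
        # Make the HF dynamic-module cache importable in this worker before any
        # trust_remote_code objects are deserialized here.
        from vllm.utils import init_cached_hf_modules  # assumed helper, mirrors WorkerWrapperBase
        init_cached_hf_modules()

    def init_engine(self, model_config):
        # model_config may reference classes generated under transformers_modules;
        # by this point the cache directory is on sys.path inside this actor.
        self.model_config = model_config
        return True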

But the quickest fix is engine_use_ray=False.
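
For completeness, a minimal sketch of that workaround using the engine args from the repro above: keep Ray as the distributed executor backend but leave the engine loop in the server process.

from vllm import AsyncEngineArgs, AsyncLLMEngine

ray_args = AsyncEngineArgs(
    model="../../models/microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    max_model_len=8000,
    engine_use_ray=False,                # keep the engine loop out of a Ray actor
    distributed_executor_backend="ray",  # model workers still run on Ray
)

engine = AsyncLLMEngine.from_engine_args(ray_args)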

@baughmann
Author

baughmann commented Jul 27, 2024

@tjohnson31415 That most certainly did it! Thank you for the detailed explanation, that makes a lot of sense!

However, I would still expect feature parity among the supported models. Should we leave this ticket open, even though there is that workaround?

@justinthelaw

justinthelaw commented Aug 1, 2024

I am having this issue as well, and the workaround works. I am also curious when a fix will be implemented in the engine. If there is an open branch or fork, can someone link it here?

EDIT: NVM I found it! Thank you all!

rkooo567 self-assigned this on Aug 1, 2024
rkooo567 added the ray label on Aug 1, 2024
@tjohnson31415
Contributor

Just to note it here, there is a new RFC to remove --engine-use-ray altogether:
#7045

If the RFC is accepted, a fix for this issue may not be relevant for very long.

@nightflight-dk

It would appear the mp executor is now also affected in 0.6.1.post2.
These two arguments to LLM don't help:

              distributed_executor_backend="mp",
              worker_use_ray=False,
ModuleNotFoundError: No module named 'transformers_modules.Phi-3'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'transformers_modules.Phi-3'

@nightflight-dk

nightflight-dk commented Sep 17, 2024

The workaround for me was to remove '.' from the model name (path) before instantiating the engine, e.g. "weights/Phi3.5mini-instruct" -> "weights/Phi35mini-instruct" (on top of disabling Ray).
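
A minimal sketch of that rename (paths are illustrative): strip '.' from the directory name before handing it to the engine.

import os
import shutil

src = "weights/Phi3.5mini-instruct"                          # illustrative local path
dst = os.path.join(os.path.dirname(src),
                   os.path.basename(src).replace(".", ""))   # -> "weights/Phi35mini-instruct"
shutil.move(src, dst)
# then pass dst as the model path when instantiating the engine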

@youkaichao
Member

@nightflight-dk That makes sense. The name might be used for the import, and '.' has special meaning in Python's import system.
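
To illustrate with a made-up cached-module name consistent with the traceback above: the import machinery splits module paths on '.', so a dot in the directory name is treated as a package separator.

name = "transformers_modules.Phi-3.5-mini-instruct"  # illustrative dynamic-module path
print(name.split("."))
# ['transformers_modules', 'Phi-3', '5-mini-instruct']
# The importer looks for a package 'transformers_modules.Phi-3', which does not
# exist, hence: No module named 'transformers_modules.Phi-3'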
