I cloned the gpt-fast repo and tried it out with Llama-3: after the setup steps I ran generate with speculative decoding, and the acceptance rate was high, which makes sense. While playing around with the model, I added just one line in Transformer's forward method. Now when I run generate with the same command, I get a really low acceptance rate:
```
Acceptance probs: [0.5620253164556962, 0.3632911392405063, 0.07088607594936709, 0.0037974683544303796, 0.0, 0.0]
Mean Accepted: 0.5164556962025316
Average tokens/sec: 24.87
Memory used: 22.12 GB
```
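As a side note on the numbers (my interpretation, not something printed by the logs): "Mean Accepted" looks like the expectation of the acceptance-count distribution, i.e. Σ i·pᵢ over the printed `Acceptance probs`, where index i is the number of draft tokens accepted. A quick check against the compiled run:

```python
# Assumed relationship: Mean Accepted = sum(i * p_i) over Acceptance probs,
# where index i is how many draft tokens were accepted in a decoding step.
def mean_accepted(acceptance_probs):
    return sum(i * p for i, p in enumerate(acceptance_probs))

compiled_run = [0.5620253164556962, 0.3632911392405063, 0.07088607594936709,
                0.0037974683544303796, 0.0, 0.0]
print(mean_accepted(compiled_run))  # ~0.5165, matching the reported Mean Accepted
```

The same formula reproduces the non-compiled run's figure as well, so the metric is consistent; only the distribution itself collapses under compile.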
But if I don't pass `--compile`, I get the same acceptance rate as before:
```
Acceptance probs: [0.07142857142857142, 0.05102040816326531, 0.04591836734693878, 0.05102040816326531, 0.07142857142857142, 0.7091836734693877]
Mean Accepted: 4.127551020408164
Average tokens/sec: 24.03
Memory used: 22.10 GB
```
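(For context, these acceptance probabilities come from speculative decoding's verification step. Here is a minimal sketch of the standard accept/reject rule from the speculative-sampling literature, not gpt-fast's exact implementation; the toy distributions are made up for illustration:)

```python
import random

def verify_draft(draft_probs, target_probs, draft_tokens, rng):
    """Count how many speculated tokens the target model accepts.

    Standard speculative-sampling rule: accept draft token t_i with
    probability min(1, q_i(t_i) / p_i(t_i)), where p is the draft model's
    distribution and q the target model's; stop at the first rejection.
    (Sketch only -- resampling a replacement token on rejection is omitted.)
    """
    accepted = 0
    for i, tok in enumerate(draft_tokens):
        p = draft_probs[i][tok]   # draft model's probability of its own token
        q = target_probs[i][tok]  # target model's probability of that token
        if rng.random() < min(1.0, q / p):
            accepted += 1
        else:
            break
    return accepted

# Toy example: the target agrees with the first two draft tokens but assigns
# zero probability to the third, so exactly two tokens are accepted.
draft  = [{7: 0.5}, {3: 0.8}, {9: 0.4}]
target = [{7: 0.9}, {3: 0.9}, {9: 0.0}]
print(verify_draft(draft, target, [7, 3, 9], random.Random(0)))  # prints 2
```

The key point for this issue: acceptance depends on the target model's logits matching the draft's proposals, so anything that perturbs the compiled target model's numerics will show up directly in these probabilities.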
My question is: why does this one line cause such a drastic decline in quality when using compile? Here is the commit with the change in my fork: kalradivyanshu@20bd673. Any insights would be much appreciated. Thank you!