Add LLama CPP Support #335

bayedieng · 2024-10-12T13:09:31Z

This PR adds llama.cpp support using the ggml inference engine. For simplicity, it accepts GGUF files in Q8 (8-bit) format, implements the LLaMA model, and runs inference from the weights in those files. The weights are dequantized to 32-bit floats for computation and re-quantized to Q8 for memory efficiency.
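To make the dequantize/re-quantize round trip concrete, here is a minimal sketch in Python. It assumes the "Q8" above refers to ggml's Q8_0 block layout (blocks of 32 values, each stored as one half-precision scale plus 32 signed 8-bit integers); the PR does not say which Q8 variant it targets, and the function names below are hypothetical, not taken from this PR's code.

```python
import struct

BLOCK_SIZE = 32  # assumption: Q8_0 groups weights into blocks of 32 values


def to_f16(x: float) -> float:
    """Round a float to half precision, mimicking a scale stored as f16."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(values):
    """Quantize floats into a list of (scale, [int8 values]) blocks."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of the block size")
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        chunk = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in chunk)
        # One scale per block, chosen so the largest magnitude maps to 127.
        scale = to_f16(amax / 127.0)
        if scale == 0.0:
            quants = [0] * BLOCK_SIZE
        else:
            # Note: Python's round() uses banker's rounding; a C
            # implementation using roundf may differ on exact ties.
            quants = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Expand (scale, quants) blocks back into floats."""
    out = []
    for scale, quants in blocks:
        out.extend(scale * q for q in quants)
    return out
```

The round trip is lossy: each restored value can differ from the original by up to about half of its block's scale, which is the trade-off accepted in exchange for the smaller memory footprint.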

Steps

Parse model weights
Implement Tensor Operations to Perform Inference
Implement Full Model
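
As an illustration of the first step, the sketch below reads the fixed-size header at the start of a GGUF file: the 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata key/value counts, all little-endian (the layout used by GGUF version 2 and later). It is not the parser added in this PR; `parse_gguf_header` and the returned dictionary keys are hypothetical names chosen for the example.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (kv count)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the first bytes of a file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too small to hold a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic")
    # "<" selects little-endian with no padding: uint32, uint64, uint64.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

A caller would typically read the first 24 bytes of the model file, pass them to this function, and then continue with the metadata key/value pairs and tensor descriptions that follow the header.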

Closes #167

bayedieng added 4 commits October 11, 2024 10:51

initial setup and download

2cef457

add reminder to support more models later

5f5073a

initial model

96bbb0c

add gguf parser

b1da514
