SulRash/README.md

Hi there! I'm Sultan 👋

Homepage

🤖 AI Engineer working on multilingual NLP, smol language modelling, huge language models, and educational AI! Previously helped build ALLaM, a state-of-the-art Arabic-English language model :)

📚 Recent Publications

  • SmolTulu - Highest-performing sub-2B model on reasoning benchmarks - An investigation into learning rate to batch size ratios
  • Fineweb-Edu-Ar - Largest open-source machine translated Arabic educational dataset
  • ALLaM - State-of-the-art Arabic-English LLM
  • When Benchmarks are Targets - Analysis of LLM evaluation sensitivity (ACL 2024)

🚀 Projects

๐ŸŒ Let's connect! Find me on LinkedIn!

Pinned Repositories

  1. minLLMTrain

     Minimal yet high-performance code for pretraining LLMs. Implements some SOTA features and supports training through DeepSpeed, Megatron-LM, and FSDP. WIP.

     Python

  2. Cheatsheet

     An attempt at improving facial recognition performance by appending a 'cheatsheet' to an image, with one positive sample and multiple negatives, during training.

     Python

  3. huggingface-text-data-analyzer

     Analyzes text datasets from Hugging Face for training LLMs!

     Python

  4. envenc

     Repository for environment encoder, an attempt at improving reinforcement learning agents' generalisability by learning to act on universal multimodal embeddings generated by a vision-lang…

     Python

  5. AnshulSood11/Engagement-Level-Prediction

     Engagement Intensity Prediction in Real Time

     C++

  6. microsoft/Megatron-DeepSpeed

     Forked from NVIDIA/Megatron-LM

     Ongoing research training transformer language models at scale, including BERT & GPT-2.

     Python