Artifacts of EVT ASPLOS'24

Python 20 2 Updated Mar 6, 2024

Tile primitives for speedy kernels

Cuda 1,789 82 Updated Dec 23, 2024

An open-source, efficient deep learning framework/compiler, written in Python.

Python 663 53 Updated Dec 24, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,439 305 Updated Oct 19, 2024

The Tensor Algebra SuperOptimizer for Deep Learning

C++ 693 91 Updated Jan 26, 2023

CUDA GPU Benchmark

Cuda 17 5 Updated Jun 27, 2024

Papers and their code for AI systems

222 13 Updated Nov 28, 2024

Efficient Triton Kernels for LLM Training

Python 3,959 231 Updated Dec 23, 2024

Fast low-bit matmul kernels in Triton

Python 178 14 Updated Dec 19, 2024

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 295 45 Updated Nov 28, 2021

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 685 39 Updated Dec 23, 2024

Fast CUDA matrix multiplication from scratch

Cuda 545 66 Updated Dec 28, 2023

CUDA Library Samples

Cuda 1,684 355 Updated Dec 22, 2024

CUDA checkpoint and restore utility

Cuda 247 13 Updated Apr 17, 2024

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 354 42 Updated Sep 11, 2024

Open weights LLM from Google DeepMind.

Python 2,531 323 Updated Dec 23, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,609 160 Updated Dec 24, 2024

Development repository for the Triton language and compiler

C++ 13,771 1,687 Updated Dec 24, 2024

Python 3 4 Updated Nov 6, 2023

MLX: An array framework for Apple silicon

C++ 17,971 1,036 Updated Dec 23, 2024

Mamba SSM architecture

Python 13,591 1,161 Updated Dec 6, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 32,374 4,937 Updated Dec 23, 2024

Sparsity-aware deep learning inference runtime for CPUs

Python 3,058 176 Updated Jul 19, 2024

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining

Python 683 40 Updated Apr 10, 2024

Universal LLM Deployment Engine with ML Compilation

Python 19,453 1,598 Updated Dec 19, 2024

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Python 11,870 3,489 Updated Dec 16, 2024