Hackathon 2020 Project Summary

No.	Team	Topic/Program Description	Project Design/RFC	GitHub Repo
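First Prize	' or 0=0 or '	We will implement a user-defined function (UDF) engine for TiDB. With UDFs, users can write complex custom function logic and run the computation directly in the database. Our UDF implementation has the following properties: high performance; sandboxed and secure, so it can be used on TiDB Cloud; rich execution logic, including machine learning and controlled network access; users can write function logic in multiple programming languages (for example, the one they know best); the same UDF can run on TiDB, TiKV, and TiFlash, making full use of distributed resources; the same UDF works across platform architectures such as x86 and ARM, so TiDB Cloud can switch to ARM without extra burden; UDFs written for MySQL can run on TiDB; compliant with the People's Bank of China evaluation standard for distributed databases.	https://docs.google.com/document/d/1TdZJR9vkVgfvpiirhruQKnu7pLYfK6jEhZWkZctAov4/edit	https://github.com/tidb-hackathon-2020-wasm-udf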
Second Prize	B.A.D	Develop a Visual Studio Code extension that greatly improves the experience of developing and debugging TiDB projects.	https://gist.github.com/dragonly/dfe1d5c9ecb50eb4fc460a663a5d5076	https://github.com/dragonly/ticode
Second Prize	TiGraph	Integrate a graph mode into TiDB, making TiDB a hybrid-mode distributed database.	https://docs.google.com/document/d/1KHOAYJLim-A7YMOVZvZUc1PIxXDmT6nk0fQ5GNiU0T8/edit	https://github.com/tigraph/tidb
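Third Prize	zh.md	This proposal provides a style-checking and automated management solution focused on Chinese technical documentation, including: a complete Chinese documentation style guide, giving authors and reviewers a unified writing reference; Chinese documentation analysis and style-checking tools that detect style problems in existing or newly added documents; a GitHub-based documentation management bot that automatically opens issues or PRs to fix style problems in existing or newly added documents; and machine-learning-assisted writing that helps generate documents in a consistent style. It can noticeably reduce documentation review workload, improve overall documentation quality, and encourage engineers to take part in writing documentation. The proposal applies to any project that needs professional technical documentation.	https://github.com/WPH95/zh.md/blob/main/PROPOSAL.md	https://github.com/zh-md/zh.md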
Third Prize & Yunqi Partners Most Market Potential Award	Mods	GPU Accelerated TiDB for Analytical Queries: empower TiDB with GPU acceleration techniques to improve the performance of CPU-intensive analytical query processing, such as joins and aggregations. Beyond scaling out, which TiDB already supports, this extends TiDB with the ability to scale up by embracing new-generation hardware.	https://gist.github.com/zanmato1984/520400e0426ca18ef3711355ac930e86	https://github.com/zanmato1984/cura https://github.com/windtalker/tidb/tree/tidb_cura
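China Growth Capital Most Market Potential Award	Ti-Improve	AWS native testing framework. As a cloud database, TiDB has cloud features that are closely tied to the features cloud vendors provide. Fully automated testing of TiDB on the infrastructure of a leading cloud vendor, AWS, has long been both difficult and important. This hackathon project tests TiDB entirely on AWS infrastructure: S3 stores backup files, Lambda runs the individual test tasks, AWS CodePipeline serves as the scheduling and execution engine, CloudWatch observes TiDB's behavior, and Chaos Mesh can be used for fault injection.	https://docs.google.com/document/d/1bgbI0hVkWRtIe_jFSc_Kx0IfbPu15MOyRY7TtW2e6xI/edit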
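CNCF Special Award	森海飞霞	Dynamic copysets proposal: in a distributed storage system, multi-replica mechanisms protect data safety. Usually, however, the replica count of most data does not grow as the cluster grows. As a result, once a cluster reaches hundreds or thousands of nodes, the probability that the number of failed nodes equals or exceeds the replica count (usually 3) rises as well. For a 3-replica cluster in which 3 nodes go down, the probability and scope of data loss differ across scheduling algorithms. We hope dynamic copysets will lower the probability of data loss when this happens.	https://docs.google.com/document/d/1emQUIJ_ayczFCWNvVVgzdGY8n6LSyE7fDC5tXb5ZjmI/edit	https://github.com/Yisaer/pd/tree/support_elastic_copyset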
GGV Most Market Potential Award	hundundm	TiBI: integrate a Metabase-like BI analysis tool into the TiDB ecosystem and display it directly in TiDB Dashboard, giving TiDB BI analysis capabilities. It helps analysts run data visualization analysis directly on TiDB, and helps DBAs analyze cluster status directly, such as table data distribution and index distribution.	https://docs.google.com/document/d/1-yuJ3GLrrRqFC1YFBAzQ56ppTGvbtVNJgIf_jPvXa6c/edit	TiDB Dashboard visualization UI: https://github.com/HunDunDM/tidb-dashboard/tree/TiBI/master RocksDB SST-level statistics: https://github.com/CheneyDing/rocksdb/tree/sst-status Rust-RocksDB statistics interface: https://github.com/nolouch/rust-rocksdb/tree/sst-status TiKV API: https://github.com/nolouch/tikv/tree/sst-status TiDB memory table: https://github.com/lhy1024/tidb/tree/sst
Most Popular Award	TiFlink	Implement a better TiKV Source, Sink, and TiDB Catalog Reader for Flink, supporting snapshot reads, incremental change log reads, and two-phase commit writes, so that users can quickly create materialized views in TiDB and easily write batch/stream jobs in Flink that read and write TiKV data.	https://docs.google.com/document/d/1ksdPxk4WLIzTJmy7pPh37JrsoBSuJy8uK28rABWQYa0/edit	github.com/TiFlink/TiFlink
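10	TIS	Generate inverted indexes for TiDB tables with one click. TIS integrates seamlessly with TiDB DDL statements to automatically generate Lucene-based inverted indexes, supporting fuzzy queries on TiDB with tokenizers such as ngram or IK.	https://docs.google.com/document/d/1ENHISp-I5edo_BDS-YBVMTlry9WpkVxobjJ8Eh2YZ-o/edit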
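11	队长负责划水	Use eBPF in the Linux kernel to profile TiDB working processes or threads online in a distributed DB cluster. eBPF is used to trace, inspect, and sniff how TiDB processes behave in the cluster; whether on bare metal or in containers, the distributed system can dynamically perceive changes in those processes. Trick: add CUDA and some ML/DL model to build targeted mathematical models of the behavior of TiDB processes or threads, or of container processes or threads.	https://github.com/devillove084/TiDBHackathon/blob/main/RFC.md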
12	平头哥	Reduce cross-AZ traffic costs for TiDB in the cloud.	https://docs.google.com/document/d/1mYF57vUeK572Lf-RKq-iwlIfsetGAQs8Tqj3KscEYbk/edit	https://github.com/zhangjinpeng1987/tidb
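13	滑滑蛋	Inspired by https://disksing.com/hackathon-idea/, which argues: "According to Maslow's hierarchy of needs, once basic physiological and safety needs are met, people seek to satisfy higher-level spiritual needs. I therefore boldly assert that, for TiDB DBAs today, the biggest problem with operating TiDB is that it is not cool enough, not exciting enough!" This project uses VR to make operating TiDB more immersive, giving DBAs an immersive operations experience.	https://github.com/fantastic-things/ticockpit-proposal	https://github.com/fantastic-things/ticockpit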
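14	T4	Today, with everything connected, data volumes of all kinds are growing rapidly. Not only traditional offline analytics but also online transaction systems now face data approaching big-data scale. How to support larger business data traffic at the most reasonable cost is a question TiDB must consider in this era. The core of the question is better matching TiDB with the value of the data it carries: the more the data's value exceeds TiDB's cost, the more value TiDB creates for the business. The problem of data value versus TiDB cost can be approached from two directions. One is to reduce TiDB's cost by various means, for example using tiered storage to place cold data on cheaper storage media. The other is to raise the value density of the data, applying TiDB's capabilities to the most valuable data. This project aims to use the universally applicable time dimension to distill data automatically, so that every resource invested in TiDB goes to highly refined, high-value data. T4 adds TTL capability to TiDB tables, letting data expire automatically under a specified TTL policy and reclaiming the corresponding resources, thereby distilling data value in real time and retaining fresh, high-value data.	https://docs.google.com/document/d/1iG1sf_gKYiXn_WCpFdBta2zJrCx8WuURjfFeq9Ljm_w/edit	https://github.com/tidb-hackathon/t4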
15	zhangyushao	Boost bug fix efficiency. In this project, we identified an inefficiency in how TiDB bug issues are currently processed. After a bug issue is submitted, a developer always tries to manually reproduce the case with the provided test cases. It is not rare for an issue to be automatically tagged as a bug before another developer finally classifies it as not a bug/cannot reproduce. Our project aims to reduce manual effort during this process and provide the developer with more precise and concise bug information. We plan to implement a Continuous Integration (CI) tool that smartly reproduces the bug issue and pre-tags it before any manual effort. We also expect to link each issue with an environment to facilitate the later manual recheck by the developer.	https://docs.google.com/document/d/1s8zTdHZnohcrDWNRzk8Y1zAk7CwCeM9CtzK_rko_y9o/edit	https://github.com/ti-community-infra/tichi/tree/ching-wei (final code branch; our code will first be submitted as a PR)
16	Uchiha clan	Flexible-Raft allows us to replicate the state log safely, without having to replicate the state log to the majority of the nodes in the cluster to achieve quorum. This idea is drawn from Flexible-Paxos. The submitted RFC contains the detailed information.	https://docs.google.com/document/d/1t5XcdJj8do5dv8uRdmgS2px7Ts0qWivPn9P8pe4IR40/edit	https://github.com/akashsharma95/flexible-raft-benchmark https://github.com/akashsharma95/raft-rs
17	tidb-mysql-injection-attack	Design a native SQL-injection detection system at the database's SQL parsing layer to improve the database's application security.
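18	Phantom Ensemble	Deterministic on TiDB. 1. Implement a deterministic transaction model on TiDB, expected to greatly improve performance in autocommit scenarios, with a design compatible with the existing 2PC transaction model. 2. We rethink how 2PC achieves atomicity and propose a new scheme for guaranteeing transaction atomicity that chooses a different trade-off between performance and fault tolerance.	https://docs.google.com/document/d/1AgUGvInmM5o_pw8UBlaxJUmGCFUJUUx0LoX1fhr7ph0/edit	https://github.com/you06/tidb/tree/support-deterministic, https://github.com/MyonKeminta/tikv/tree/deterministic, https://github.com/you06/kvproto/tree/support-deterministic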
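19	TiDB Trace	Full SQL capture and performance analysis for TiDB. Capture and record the SQL statements that access TiDB, along with each statement's performance metrics, to analyze statement-level performance bottlenecks at a finer granularity. Implementation approach: 1. The collector's data source is the information_schema.cluster_statements_summary table. On the TiDB Server node it fetches data every 10 ms and records in memory the differences between the result sets of two consecutive fetches. For SQL statements whose execution time is shorter than the interval, this approximately yields the performance metrics of that execution. 2. The collector writes its output files to tmpfs, a memory-based file system. 3. The importer reads the temporary trace files in tmpfs, assembles them into batched SQL, and inserts them directly into a ClickHouse database. 4. Some features require modifying the source code: QUERY_SAMPLE_TEXT currently records the original SQL statement (one sample of the statement with its parameters) and could be changed to the details of the most recently executed statement; the table currently does not record the client IP, which could be added; the time precision of the LAST_SEEN field could be raised to milliseconds; and so on.	https://github.com/wangdong58/tidb_trace/blob/main/README.md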
20	TIDB flashback	Data-related misoperations in production easily trigger RCA, so a flashback tool is extremely important for data rollback. As TiDB adoption grows, a TiDB flashback tool is needed to quickly roll back misoperations on TiDB.	https://github.com/cenkore/tidbflashback/blob/master/TiDB%202020%20Hackathon%20RFC.md
21	blue	Relieve the pressure that hot data puts on a single TiKV node in the cluster.
22	龙姐姐说的都队	TiFS: a FUSE based on TiKV	https://github.com/Hexilee/tifs
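23	2021~~	TiDB-Operator support for canary-capable in-place Pod upgrades. 1. For TiDB clusters running on Kubernetes, provide a way to upgrade by restarting only the container process, without deleting and recreating the Pod. 2. By supporting multiple TiDB-Operator deployments in a single Kubernetes cluster, each managing its own TiDB clusters, the feature can be thoroughly validated through canary rollout before release.	https://docs.google.com/document/d/1wRNuRe2qyLJwmyPENg9szh6ymTghj4jOKbIxxdPDVPg/edit	#N/A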
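24	进击的韭菜	TiDB on IPFS blockchain technology. TiDB currently requires TiKV or TiFlash as storage. Scenarios such as supply-chain traceability and data trading need decentralization and tamper resistance. The IPFS protocol leverages the Bitcoin blockchain protocol and network infrastructure to store immutable data. We will run TiDB on the IPFS chain.	https://shimo.im/docs/8twhTX6jTKcrPWTy/	https://github.com/harryge00/tidb-hackthon-2021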
25	CAAS	Chaos Engineering as a Service	https://github.com/wuntun/rfcs/blob/main/text/2020-12-31-chaos-engineering-as-a-service.md	https://github.com/wuntun
26	接化发	An implementation of a general-purpose coprocessor for TiKV		https://github.com/baiyuqing/jiehuafa.git
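27	转转队	walleKV is an enterprise-grade KV storage solution whose goal is to make KV storage easier to use. By integrating multiple storage engines (Redis, TiKV, etc.), it centrally solves the problem of persisting massive KV data, makes full use of different types of storage media (memory, SSD, hard disk, S3), accelerates data writes and hot-data processing, and automatically migrates data across storage media. For enterprises, it minimizes development and operations complexity, substantially raises server utilization, and effectively cuts resource and staffing costs.	https://github.com/walletech/wallekv	https://github.com/walletech/wallekv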
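28	️️	Make TiKV use user-space drivers and storage technology to improve TiKV's overall performance. Kernel features such as IOMMU make user-space drivers possible, and in the network stack DPDK already uses user-space drivers to greatly improve program performance. Many cloud vendors have also used user-space storage device drivers to improve cloud disk performance. In this competition we will try to use the tools SPDK provides to run TiKV on user-space storage, hoping for a significant performance gain. We will also provide an easy-to-use migration mechanism so users can move from plain operating-system reads and writes to user-space storage reads and writes. This work will improve TiKV's performance, make scheduling more controllable and flexible, and prepare for the coming era of NVDIMM/DAX storage: three benefits at once.	https://github.com/YangKeao/tikv/blob/hackathon-pingcap/rfc.md	https://github.com/YangKeao/tikv/tree/hackathon-pingcap, https://github.com/YangKeao/rocksdb-spdk/tree/hackathon-pingcap, https://github.com/YangKeao/spdk/tree/hackathon-pingcap, https://github.com/YangKeao/rust-rocksdb/tree/hackathon-pingcap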
29	XuanyuanDB	Automatically and continuously tune database knobs to increase performance. We do this by developing a machine learning model using Deep Reinforcement Learning (DRL) to tune TiDB (/TiKV/RocksDB) settings in order to decrease latency and increase throughput. The proposal is based on QTune's model, which vectorizes SQL query information and uses an actor-critic model to predict which knobs need to be adjusted.	https://gist.github.com/DallasC/08c5b5861fe660b5a2a1ca418c2442af	https://github.com/DallasC/TiAi
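30	评委说的都	This issue aims to provide a unified SQL entry point for online data queries, offline data queries, and stream processing. On top of TiDB's existing SQL entry point for online data, it adds an SQL entry point for offline data queries and an SQL entry point for stream processing. Using TiDB and Flink, it combines TiDB's strengths in online data with Flink's strengths in offline data and stream processing to build a unified SQL entry point. Users no longer need to be aware of whether they are using TiDB, Flink, Spark, or another compute engine; through a single entry point and a single SQL syntax they can handle online and offline data queries, stream processing, and more.	https://github.com/marsishandsome/tidb-hackathon-2021-flink/blob/master/RCF.md	https://github.com/marsishandsome/tidb-hackathon-2021-flink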
31	Mouse Tail Juice	Support event scheduler (CREATE EVENT) in TiDB.	https://docs.google.com/document/d/1eyozEEG66G-Wwem1OTmbpERIfUkzE1_7gKkNa9KbfOQ/edit
32	团团圆圆	Build a Jupyter kernel that allows users to interact with TiDB using SQL in Jupyter Notebooks. It will enable us to leverage all the capabilities of the Jupyter community.	https://github.com/wangfenjin/xeus-tidb	https://github.com/wangfenjin/xeus-tidb
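33	CloudSearch	A full-text search engine based on TiKV. TiDB uses TiKV as its distributed storage layer and builds a stateless SQL query layer on top. Likewise, the core of full-text indexing is the inverted index, a mapping from tokens to document IDs, which can also be served by KV storage. On this basis, we can try to build a new full-text indexing engine.	https://shimo.im/docs/6whKWtJRpCk6pjw9/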
34	Xteam	Region-level follower read optimization	https://docs.google.com/document/d/1gOqxOEwkW2v6FG2cXA3epRKAIg-SIXZjwQDmfn0EHoQ/edit	https://github.com/shonge/pd https://github.com/slpslpslp/tidb https://github.com/shonge/tikv
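35	中年先锋队	Automatically analyze performance bottlenecks in a TiDB cluster. Once started, the program runs a specified workload (such as sysbench), automatically adjusts the concurrency, analyzes how the system's metrics change, and then shows the system's performance bottlenecks on a visualization page to guide DBAs in tuning configuration and improving cluster performance.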
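36	young队	Components of a TiDB cluster such as PD and TiKV, as well as tools such as TiDB Binlog and backup, all need storage that persists data. Kubernetes currently supports only statically provisioned PVs for local storage, and TiDB uses the third-party local storage provisioner local-static-provisioner, which creates one PV for each mount point under the discovery directory. A large-scale cloud-native TiDB cluster usually needs many PVs, so many mount points must be created in advance under the discovery directory. When several clusters share mount points under the discovery directory, their capacity cannot be managed as a whole, and there is no automated way to expand a mount point. To address these problems with local storage for cloud-native TiDB on Kubernetes, this proposal presents a dynamic local storage solution based on CSI + LVM: it decouples local storage from Kubernetes through the CSI that Kubernetes provides, uses LVM for dynamic management and online expansion of local storage, and adds location and capacity scheduling factors to the native Kubernetes scheduler to optimize scheduling of local storage.	https://github.com/shipx123/diskmanager/blob/main/doc/design/rfc.md	https://github.com/xuegang/Smart-Localdisk-Management https://github.com/xuegang/LocalDisk-Schedule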
37	Fox	Implement distributed, table/database-level IOPS quotas on TiDB.		https://github.com/oofox/drl
38	名字，有用吗	Graph Database on TiKV	https://github.com/leiysky/tidb-hackathon-2020	https://github.com/leiysky/tigraph
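39	克兰德曼	In 2020, many cloud providers experienced service unavailability and outages. In response, many companies choose approaches such as hybrid cloud to reduce their dependence on the cloud. Against this background, we want to simulate failures of cloud provider services and observe their impact, in order to improve application resilience, observability, and so on.	https://docs.google.com/document/d/1XfitU-_HUDOKXAxloVJNkHjljIXay3pOPo8CPzSNinY/edit	https://github.com/cloudman2077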
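40	Hackthon Fix Typo Team	In small-table scenarios, TiDB's request latency lags behind MySQL's because an index read must be followed by a separate table lookup. We can deliberately schedule a small table's index and data onto the same Store and then complete the Index Scan and Table Scan in a single request, eliminating the table lookup overhead.	https://hackmd.io/lzGLcGNNQbCkYNdJQvxNOA?both	https://github.com/hftt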
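41	鸽了爽	An FDW (foreign data wrapper) lets a database access different remote data stores. This proposal strictly follows SQL/MED and uses means such as injecting hooks into the table lifecycle (reusing the TiDB plan and applying custom operator logic when building the executor) and TiDB Plugin to make TiDB support FDW. This greatly lowers the difficulty of supporting remote data sources in TiDB and boosts TiDB community activity.	https://gist.github.com/iguoyr/c1fa54fdbe2c335e3b998afddb685e85	https://github.com/iguoyr/tidb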
42	好自为之队	TiKV weakly consistent reads
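43	dddd, indeed	Implement materialized views for TiDB based on the raft log. By implementing a raft learner similar to TiFlash Proxy and applying pre-aggregation and other processing to the exported raft log, we obtain an effect similar to materialized views in TiDB. The pre-aggregated results carry a schema and can satisfy strongly consistent snapshot isolation transactions. Beyond pre-aggregation, more features, such as streaming joins, can be achieved by integrating Flink and similar tools.	https://github.com/LittleFall/tidb-hackathon-2020/blob/main/RFC.md	https://github.com/LittleFall/tidb-hackathon-2020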
44	Bourbon	As described in http://pages.cs.wisc.edu/~yifann/bourbon-osdi20.pdf , try learned index on	https://docs.google.com/document/d/1bgbI0hVkWRtIe_jFSc_Kx0IfbPu15MOyRY7TtW2e6xI/edit	https://github.com/spongedu/badger/tree/hackathon_use_mph_v3 https://github.com/liufuyang/tbadger https://github.com/liufuyang/badger/tree/plt-based-on-master
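
Entry 19 (TiDB Trace) describes sampling a cumulative statement summary at a fixed interval and recording the difference between two consecutive result sets. A minimal sketch of that diffing step might look like the following, assuming each snapshot is a dict keyed by statement digest holding cumulative execution counts and latencies. The `Counters` type, its field names, and the `diff_snapshots` helper are illustrative assumptions, not the project's actual code or the table's exact schema.

```python
from typing import Dict, NamedTuple


class Counters(NamedTuple):
    """Cumulative per-statement counters (hypothetical field names)."""
    exec_count: int   # total executions seen so far
    sum_latency: int  # total latency so far, in arbitrary time units


def diff_snapshots(prev: Dict[str, Counters],
                   curr: Dict[str, Counters]) -> Dict[str, Counters]:
    """Return per-digest deltas between two consecutive snapshots."""
    deltas: Dict[str, Counters] = {}
    for digest, now in curr.items():
        # A digest absent from the previous snapshot starts from zero.
        before = prev.get(digest, Counters(0, 0))
        d_count = now.exec_count - before.exec_count
        if d_count <= 0:
            # No new executions, or the counters were reset: record nothing.
            continue
        deltas[digest] = Counters(d_count, now.sum_latency - before.sum_latency)
    return deltas


prev = {"a": Counters(10, 5000)}
curr = {"a": Counters(12, 6200), "b": Counters(1, 300)}
deltas = diff_snapshots(prev, curr)
```

When a digest's `exec_count` delta is 1, the latency delta approximates that single execution's latency, which is the approximation the entry relies on for statements shorter than the sampling interval.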
