This GitHub repository contains an updated list of Large Language Model papers as of November 29, 2024.
- The resources are collected from various sources, including arXiv, NeurIPS, ICML, ICLR, ACL, EMNLP, AAAI, IJCAI, KDD, CVPR, ICCV, ECCV, IEEE, ACM, Springer, ScienceDirect, Wiley, Nature, Science, and other top AI/ML conferences and journals.
- For a better reading experience, visit the Shinyapps website.
Explore additional research papers on the following topics:
- For Large Language Models papers, please visit the LLM Repository.
- For Backdoor Learning papers, please visit the Backdoor Learning Repository.
- For Federated Learning papers, please visit the Federated Learning Repository.
- For Machine Unlearning papers, please visit the Machine Unlearning Repository.
For contributions, inquiries, or suggestions, feel free to reach out via email.
If you find this application helpful and would like to support its development, you can buy me a coffee using one of the following methods:
- Techcombank (Vietnam): 5877 5555 55 (Nguyen Thi Lan Phuong)
- PayPal or Credit/Debit Card: https://ko-fi.com/miutheladycat
Due to GitHub repository limitations, this section includes only those papers that provide accompanying code, sorted by publication date (newest first). For access to the full list of papers, please visit the Shinyapps website.
No. | Title | Authors | Publish Date | Venue | Code | URL |
---|---|---|---|---|---|---|
1 | A Survey on Large Language Model based Autonomous Agents | Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen | 2024-12-01 | arXiv | https://github.com/Paitesanshi/LLM-Agent-Survey | https://doi.org/10.48550/arXiv.2308.11432 |
2 | The Two Word Test: A Semantic Benchmark for Large Language Models | Nicholas Riccardi, Xuan Yang, Rutvik H. Desai | 2024-12-01 | arXiv | https://github.com/NickRiccardi/two-word-test | https://doi.org/10.48550/arXiv.2306.04610 |
3 | Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study | Hao Hao, Xiaoqun Zhang, Aimin Zhou | 2024-12-01 | arXiv | https://github.com/hhyqhh/LAEA | https://doi.org/10.48550/arXiv.2406.10675 |
4 | DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization | Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu | 2024-11-22 | arXiv:2411.14055, 2024 | https://github.com/hexuandeng/DRPruning | http://arxiv.org/abs/2411.14055v1 |
5 | SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model | Christopher Nguyen, William Nguyen, Atsushi Suzuki, Daisuke Oku, Hong An Phan, Sang Dinh, Zooey Nguyen, Anh Ha, Shruti Raghavan, Huy Vo, Thang Nguyen, Lan Nguyen, Yoshikuni Hirayama | 2024-11-22 | arXiv …, 2024 | https://github.com/aitomatic/semikong | http://arxiv.org/abs/2411.13802v2 |
6 | UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | Bethel Melesse Tessema, Akhil Kedia, Tae-Sun Chung | 2024-11-22 | arXiv:2411.14343, 2024 | https://github.com/bethelmelesse/unifiedcrawl | http://arxiv.org/abs/2411.14343v1 |
7 | Disentangling Memory and Reasoning Ability in Large Language Models | Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang | 2024-11-21 | arXiv …, 2024 | https://github.com/MingyuJ666/Disentangling-Memory-and-Reasoning | http://arxiv.org/abs/2411.13504v2 |
8 | DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Xianda Guo, Ruijun Zhang, Yiqun Duan, Yuhang He, Chenming Zhang, Shuai Liu, Long Chen | 2024-11-21 | arXiv …, 2024 | https://github.com/XiandaGuo/Drive-MLLM | http://arxiv.org/abs/2411.13112v1 |
9 | On the Consistency of Video Large Language Models in Temporal Comprehension | Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao | 2024-11-21 | arXiv:2411.12951, 2024 | https://github.com/minjoong507/Consistency-of-Video-LLM | http://arxiv.org/abs/2411.12951v1 |
10 | Does Unlearning Truly Unlearn? A Black Box Evaluation of LLM Unlearning Methods | Jai Doshi, Asa Cooper Stickland | 2024-11-18 | arXiv | https://github.com/JaiDoshi/Knowledge-Erasure | http://arxiv.org/abs/2411.12103v2 |
11 | FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training | Anjia Cao, Xing Wei, Zhiheng Ma | 2024-11-18 | arXiv | https://github.com/MIV-XJTU/FLAME | http://arxiv.org/abs/2411.11927v1 |
12 | BianCang: A Traditional Chinese Medicine Large Language Model | Sibo Wei, Xueping Peng, Yi-fei Wang, Jiasheng Si, Weiyu Zhang, Wenpeng Lu, Xiaoming Wu, Yinglong Wang | 2024-11-18 | arXiv …, 2024 | https://github.com/QLU-NLP/BianCang | http://arxiv.org/abs/2411.11027v1 |
13 | Multilingual Large Language Models: A Systematic Survey | Shaolin Zhu, Supryadi, Shaoyang Xu, Haoran Sun, Leiyu Pan, Menglong Cui, Jiangcun Du, Renren Jin, António Branco, Deyi Xiong | 2024-11-17 | arXiv | https://github.com/tjunlp-lab/Awesome-Multilingual-LLMs-Papers | http://arxiv.org/abs/2411.11072v1 |
14 | DART-LLM: Dependency-Aware Multi-Robot Task Decomposition and Execution using Large Language Models | Yongdong Wang, Runze Xiao, Jun Younes Louhi Kasahara, Ryosuke Yajima, Keiji Nagatani, Atsushi Yamashita, Hajime Asama | 2024-11-17 | arXiv e …, 2024 | https://wyd0817.github.io/project-dart-llm/ | http://arxiv.org/abs/2411.09022v1 |
15 | LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation | Zhenshi Li, Dilxat Muhtar, Feng Gu, Xueliang Zhang, Pengfeng Xiao, Guangjun He, Xiaoxiang Zhu | 2024-11-17 | arXiv e …, 2024 | https://github.com/NJU-LHRS/LHRS-Bot | http://arxiv.org/abs/2411.09301v1 |
16 | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | Tingyu Qu, Mingxiao Li, Tinne Tuytelaars, Marie-Francine Moens | 2024-11-17 | arXiv | https://github.com/tingyu215/TS-LLaVA | http://arxiv.org/abs/2411.11066v1 |
17 | Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering | Zeping Yu, Sophia Ananiadou | 2024-11-17 | arXiv | https://github.com/zepingyu0512/llava-mechanism | http://arxiv.org/abs/2411.10950v1 |
18 | Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model | Ting Liu, Liangtao Shi, Richang Hong, Yue Hu, Quanjun Yin, Linfeng Zhang | 2024-11-16 | arXiv | https://github.com/liuting20/MustDrop | http://arxiv.org/abs/2411.10803v1 |
19 | Large Language Models Can Self-Improve in Long-context Reasoning | Siheng Li, Cheng Yang, Zesen Cheng, Lemao Liu, Mo Yu, Yujiu Yang, Wai Lam | 2024-11-16 | arXiv e …, 2024 | https://github.com/SihengLi99/SEALONG | http://arxiv.org/abs/2411.08147v1 |
20 | Evaluating Creativity and Deception in Large Language Models: A Simulation Framework for Multi-Agent Balderdash | Parsa Hejabi, Elnaz Rahmati, Alireza S. Ziabari, Preni Golazizian, Jesse Thomason, Morteza Dehghani | 2024-11-15 | arXiv | https://github.com/ParsaHejabi/Simulation-Framework-for-Multi-Agent-Balderdash | http://arxiv.org/abs/2411.10422v1 |
21 | Orca: Enhancing Role-Playing Abilities of Large Language Models by Integrating Personality Traits | Yuxuan Huang | 2024-11-15 | arXiv | https://github.com/Aipura/Orca | http://arxiv.org/abs/2411.10006v1 |
22 | MM-Eval: A Hierarchical Benchmark for Modern Mongolian Evaluation in LLMs | Mengyuan Zhang, Ruihui Wang, Bo Xia, Yuan Sun, Xiaobing Zhao | 2024-11-15 | arXiv:2411.09492, 2024 | https://github.com/joenahm/MM-Eval | http://arxiv.org/abs/2411.09492v1 |
23 | Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era | Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen | 2024-11-15 | arXiv | https://github.com/tamlhp/awesome-instruction-editing | http://arxiv.org/abs/2411.09955v1 |
24 | CorrectBench: Automatic Testbench Generation with Functional Self-Correction using LLMs for HDL Design | Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li | 2024-11-15 | arXiv e …, 2024 | https://github.com/AutoBench/CorrectBench | http://arxiv.org/abs/2411.08510v1 |
25 | Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Haojie Zheng, Tianyang Xu, Hanchi Sun, Shu Pu, Ruoxi Chen, Lichao Sun | 2024-11-15 | arXiv | https://github.com/Terry-Xu-666/visual_inference_chain | http://arxiv.org/abs/2411.12591v1 |
26 | InfiCoder-Eval: Systematically Evaluating the Question-Answering Capabilities of Code Large Language Models | Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang | 2024-11-14 | arXiv | https://infi-coder.github.io/infibench | https://doi.org/10.48550/arXiv.2404.07940 |
27 | MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models | Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju | 2024-11-14 | The Thirty-eight …, 2024 | https://github.com/AI4LIFE-GROUP/med-safety-bench | http://arxiv.org/abs/2403.03744v4 |
28 | TourSynbio-Search: A Large Language Model Driven Agent Framework for Unified Search Method for Protein Engineering | Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen | 2024-11-14 | arXiv e-prints, 2024 | https://github.com/tsynbio/Toursynbio-Search | http://arxiv.org/abs/2411.06024v1 |
29 | When LLMs Meet Cunning Questions: A Fallacy Understanding Benchmark for Large Language Models | Yinghui Li, Qingyu Zhou, Yuanzhen Luo, Shirong Ma, Yangning Li, Hai-Tao Zheng, Xuming Hu, Philip S. Yu | 2024-11-14 | arXiv | https://github.com/THUKElab/FLUB | https://doi.org/10.48550/arXiv.2402.11100 |
30 | DROJ: A Prompt-Driven Attack against Large Language Models | Leyang Hu, Boran Wang | 2024-11-14 | arXiv | https://github.com/Leon-Leyang/LLM-Safeguard | http://arxiv.org/abs/2411.09125v1 |
31 | Verbosity | Yusen Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang | 2024-11-12 | arXiv | https://github.com/psunlpgroup/VerbosityLLM | http://arxiv.org/abs/2411.07858v1 |
32 | ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? | Canyu Chen, Jian Yu, Shan Chen, Che Liu, Zhongwei Wan, Danielle Bitterman, Fei Wang, Kai Shu | 2024-11-10 | GoogleScholar | https://clinicalbench.github.io | http://arxiv.org/abs/2411.06469v1 |
33 | AutoProteinEngine: A Large Language Model Driven Agent Framework for Multimodal AutoML in Protein Engineering | Yungeng Liu, Zan Chen, Yu Guang Wang, Yiqing Shen | 2024-11-10 | arXiv e-prints, 2024 | https://github.com/tsynbio/AutoPE | http://arxiv.org/abs/2411.04440v1 |
34 | Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation | Ayan Sengupta, Vaibhav Seth, Arinjay Pathak, Natraj Raman, Sriram Gopalakrishnan, Tanmoy Chakraborty | 2024-11-10 | arXiv e …, 2024 | https://github.com/LCS2-IIITD/MonteCLoRA | http://arxiv.org/abs/2411.04358v2 |
35 | Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | Young-Jun Lee, Dokyong Lee, Junyoung Youn, Kyeongjin Oh, Ho-Jin Choi | 2024-11-10 | arXiv e-prints, 2024 | https://github.com/passing2961/Thanos | http://arxiv.org/abs/2411.04496v1 |
36 | LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models | Yuxuan Wan, Wenxuan Wang, Yiliu Yang, Youliang Yuan, Jen-tse Huang, Pinjia He, Wenxiang Jiao, Michael R. Lyu | 2024-11-09 | EMNLP | https://github.com/yxwan123/LogicAsker | https://aclanthology.org/2024.emnlp-main.128 |
37 | On Fake News Detection with LLM Enhanced Semantics Mining | X Ma, Y Zhang, K Ding, J Yang, J Wu… | 2024-11-09 | OpenReview | https://github.com/LEG4FD/LEG4FD | https://openreview.net/pdf/ee5cbff327c79c172ad3b32ba4e6aaf163010f2f.pdf |
38 | Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment | Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen | 2024-11-09 | EMNLP | https://github.com/RUCAIBox/ALLO | https://aclanthology.org/2024.emnlp-main.857 |
39 | A User-Centric Benchmark for Evaluating Large Language Models | Jiayin Wang, Fengran Mo, Weizhi Ma, Peijie Sun, Min Zhang, Jian-Yun Nie | 2024-11-09 | arXiv | https://github.com/Alice1998/URS | https://doi.org/10.48550/arXiv.2404.13940 |
40 | Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models | Xiaojun Wu, Junxi Liu, Huanyi Su, Zhouchi Lin, Yiyan Qi, Chengjin Xu, Jiajun Su, Jiajie Zhong, Fuwei Wang, Saizhuo Wang, Fengrui Hua, Jia Li, Jian Guo | 2024-11-09 | arXiv | https://github.com/IDEA-FinAI/Golden-Touchstone | http://arxiv.org/abs/2411.06272v1 |
41 | LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit | Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chengtao Lv, Yunchen Zhang, Dacheng Tao, Xianglong Liu | 2024-11-09 | EMNLP | https://github.com/ModelTC/llmc | https://aclanthology.org/2024.emnlp-industry.12 |
42 | Game-theoretic LLM: Agent Workflow for Negotiation Games | Wenyue Hua, Ollie Liu, Lingyao Li, Alfonso Amayuelas, Julie Chen, Lucas Jiang, Mingyu Jin, Lizhou Fan, Fei Sun, William Wang, Xintong Wang, Yongfeng Zhang | 2024-11-08 | arXiv | https://github.com/Wenyueh/game_theory | http://arxiv.org/abs/2411.05990v2 |
43 | Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation | Dong Shu, Bingbing Duan, Kai Guo, Kaixiong Zhou, Jiliang Tang, Mengnan Du | 2024-11-08 | arXiv | https://github.com/Tizzzzy/LLM-GDM-alignment | http://arxiv.org/abs/2411.05316v1 |
44 | WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | Shengda Fan, Xin Cong, Yuepeng Fu, Zhong Zhang, Shuyan Zhang, Yuanwei Liu, Yesai Wu, Yankai Lin, Zhiyuan Liu, Maosong Sun | 2024-11-08 | arXiv | https://github.com/OpenBMB/WorkflowLLM | http://arxiv.org/abs/2411.05451v1 |
45 | FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? | Eric Wu, Kevin Wu, James Zou | 2024-11-07 | arXiv | https://github.com/kevinwu23/StanfordFineTuneBench | http://arxiv.org/abs/2411.05059v1 |
46 | Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities | Shengzhi Li, Kittipat Kampa, Rongyu Lin, Bohang Li, Shichao Pei | 2024-11-07 | arXiv | https://github.com/findalexli/Abstract2Appendix | http://arxiv.org/abs/2411.05232v1 |
47 | Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models | Zhijian Zhuo, Ya Wang, Yutao Zeng, Xiaoqing Li, Xun Zhou, Jinwen Ma | 2024-11-06 | arXiv | https://github.com/BryceZhuo/PolyCom | http://arxiv.org/abs/2411.03884v1 |
48 | QUILL: Quotation Generation Enhancement of Large Language Models | Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao | 2024-11-06 | arXiv | https://github.com/GraceXiaoo/QUILL | http://arxiv.org/abs/2411.03675v1 |
49 | FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Zhanwei Zhang, Shizhao Sun, Wenxiao Wang, Deng Cai, Jiang Bian | 2024-11-05 | arXiv | https://github.com/microsoft/CADGeneration/FlexCAD | http://arxiv.org/abs/2411.05823v1 |
50 | Change Is the Only Constant: Dynamic LLM Slicing based on Layer Redundancy | Razvan-Gabriel Dumitru, Paul-Ioan Clotan, Vikas Yadav, Darius Peteleaza, Mihai Surdeanu | 2024-11-05 | arXiv | https://github.com/RazvanDu/DynamicSlicing | http://arxiv.org/abs/2411.03513v1 |
51 | Leveraging Large Language Models in Code Question Answering: Baselines and Issues | Georgy Andryushchenko, Vladimir Ivanov, Vladimir Makharev, Elizaveta Tukhtina, Aidar Valeev | 2024-11-05 | arXiv | https://github.com/IU-AES-AI4Code/CodeQuestionAnswering | http://arxiv.org/abs/2411.03012v1 |
52 | SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents | Dawei Li, Zhen Tan, Peijia Qian, Yifan Li, Kumar Satvik Chaudhary, Lijie Hu, Jiayi Shen | 2024-11-05 | arXiv | https://github.com/David-Li0406/SMoA | http://arxiv.org/abs/2411.03284v1 |
53 | Culinary Class Wars: Evaluating LLMs using ASH in Cuisine Transfer Task | Hoonick Lee, Mogan Gim, Donghyeon Park, Donghee Choi, Jaewoo Kang | 2024-11-04 | arXiv | http://github.com/dmis-lab/CulinaryASH | http://arxiv.org/abs/2411.01996v1 |
54 | Eurekaverse: Environment Curriculum Generation via Large Language Models | William Liang, Sam Wang, Hung-Ju Wang, Osbert Bastani, Dinesh Jayaraman, Yecheng Jason Ma | 2024-11-04 | arXiv | https://eureka-research.github.io/eurekaverse | http://arxiv.org/abs/2411.01775v1 |
55 | SQL Injection Jailbreak: a structural disaster of large language models | Jiawei Zhao, Kejiang Chen, Weiming Zhang, Nenghai Yu | 2024-11-03 | arXiv | https://github.com/weiyezhimeng/SQL-Injection-Jailbreak | http://arxiv.org/abs/2411.01565v1 |
56 | TODO: Enhancing LLM Alignment with Ternary Preferences | Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang | 2024-11-02 | arXiv | https://github.com/XXares/TODO | http://arxiv.org/abs/2411.02442v1 |
57 | Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis | Shijia Liao, Yuxuan Wang, Tianyu Li, Yifan Cheng, Ruoyi Zhang, Rongzhi Zhou, Yijin Xing | 2024-11-02 | arXiv | https://github.com/fishaudio/fish-speech | http://arxiv.org/abs/2411.01156v1 |
58 | Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection | Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das | 2024-11-02 | arXiv | https://github.com/apple-yinhan/Noise-robust-SED | http://arxiv.org/abs/2411.01174v1 |
59 | Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM | Xiong Wang, Yangze Li, Chaoyou Fu, Yunhang Shen, Lei Xie, Ke Li, Xing Sun, Long Ma | 2024-11-01 | arXiv | https://freeze-omni.github.io/ | http://arxiv.org/abs/2411.00774v1 |
60 | Beyond Utility: Evaluating LLM as Recommender | Chumeng Jiang, Jiayin Wang, Weizhi Ma, Charles L. A. Clarke, Shuai Wang, Chuhan Wu, Min Zhang | 2024-11-01 | arXiv | https://github.com/JiangDeccc/EvaLLMasRecommender | http://arxiv.org/abs/2411.00331v1 |
61 | LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham | 2024-11-01 | arXiv | https://fsoft-aic.github.io/fsoft-LibMoE.github.io | http://arxiv.org/abs/2411.00918v1 |
62 | Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling | Yiwen Ding, Zhiheng Xi, Wei He, Zhuoyuan Li, Yitao Zhai, Xiaowei Shi, Xunliang Cai, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-11-01 | arXiv | https://github.com/Yiwen-Ding/Guided-Self-Improvement | http://arxiv.org/abs/2411.00750v1 |
63 | MoD: A Distribution-Based Approach for Merging Large Language Models | Quy-Anh Dang, Chris Ngo | 2024-11-01 | arXiv | https://github.com/knovel-eng/mod | http://arxiv.org/abs/2411.00406v1 |
64 | PILL: Plug Into LLM with Adapter Expert and Attention Gate | Fangyuan Zhang, Tingting Liang, Zhengyuan Wu, Yuyu Yin | 2024-11-01 | Applied Soft Computing | https://github.com/DsaltYfish/PILL | http://arxiv.org/abs/2311.02126v1 |
65 | Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging | Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, Jinyuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang | 2024-11 | LAMPS '24: Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis | https://github.com/ThuCCSLab/MergeGuard | https://dl.acm.org/doi/10.1145/3689217.3690614 |
66 | SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing | Zhiyuan Zhang, DongDong Chen, Jing Liao | 2024-11 | ACM Transactions on Graphics (TOG), Volume 43, Issue 6 | https://bestzzhang.github.io/SGEdit | https://dl.acm.org/doi/10.1145/3687957 |
67 | Large Language Models for Anomaly Detection in Computational Workflows: From Supervised Fine-Tuning to In-Context Learning | Hongwei Jin, George Papadimitriou, Krishnan Raghavan, Pawel Zuk, Prasanna Balaprakash, Cong Wang, Anirban Mandal, Ewa Deelman | 2024-11 | SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis | https://github.com/PoSeiDon-Workflows/LLM_AD | https://dl.acm.org/doi/10.1109/SC41406.2024.00098 |
68 | Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models | Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen | 2024-11 | Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12 | https://github.com/eosphoros-ai/DB-GPT | https://dl.acm.org/doi/10.14778/3685800.3685876 |
69 | EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Unified Compression and Adaptive Layer Voting | Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin | 2024-11 | DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference | https://github.com/GATECH-EIC/Edge-LLM | https://dl.acm.org/doi/10.1145/3649329.3658473 |
70 | End-to-End Ontology Learning with Large Language Models | Andy Lo, Albert Q. Jiang, Wenda Li, Mateja Jamnik | 2024-10-31 | arXiv | https://github.com/andylolu2/ollm | http://arxiv.org/abs/2410.23584v1 |
71 | What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | Ming Li, Yanhong Li, Tianyi Zhou | 2024-10-31 | arXiv | https://github.com/MingLiiii/Layer_Gradient | http://arxiv.org/abs/2410.23743v1 |
72 | Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models | Haritz Puerto, Martin Gubri, Sangdoo Yun, Seong Joon Oh | 2024-10-31 | arXiv | https://github.com/parameterlab/mia-scaling | http://arxiv.org/abs/2411.00154v1 |
73 | LLaMo: Large Language Model-based Molecular Graph Assistant | Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim | 2024-10-31 | arXiv | https://github.com/mlvlab/LLaMo | http://arxiv.org/abs/2411.00871v1 |
74 | LLM4Mat-Bench: Benchmarking Large Language Models for Materials Property Prediction | Andre Niyongabo Rubungo, Kangming Li, Jason Hattrick-Simpers, Adji Bousso Dieng | 2024-10-31 | arXiv | https://github.com/vertaix/LLM4Mat-Bench | http://arxiv.org/abs/2411.00177v1 |
75 | BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments | Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu | 2024-10-31 | arXiv | https://github.com/xinghaow99/BitStack | http://arxiv.org/abs/2410.23918v1 |
76 | DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios | Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao | 2024-10-31 | arXiv | https://github.com/NLP2CT/DetectRL | http://arxiv.org/abs/2410.23746v1 |
77 | BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference | Junqi Zhao, Zhijin Fang, Shu Li, Shaohui Yang, Shichao He | 2024-10-30 | arXiv | https://github.com/JunqiZhao888/buzz-llm | http://arxiv.org/abs/2410.23079v1 |
78 | Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation | Yang Zhang, Juntao You, Yimeng Bai, Jizhi Zhang, Keqin Bao, Wenjie Wang, Tat-Seng Chua | 2024-10-30 | arXiv | https://github.com/itsmeyjt/CFT | http://arxiv.org/abs/2410.22809v1 |
79 | Comparative Analysis of Demonstration Selection Algorithms for LLM In-Context Learning | Dong Shu, Mengnan Du | 2024-10-30 | arXiv | https://github.com/Tizzzzy/Demonstration_Selection_Overview | http://arxiv.org/abs/2410.23099v1 |
80 | On Memorization of Large Language Models in Logical Reasoning | Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar | 2024-10-30 | arXiv | https://memkklogic.github.io | http://arxiv.org/abs/2410.23123v1 |
81 | Real-Time Personalization for LLM-based Recommendation with Customized In-Context Learning | Keqin Bao, Ming Yan, Yang Zhang, Jizhi Zhang, Wenjie Wang, Fuli Feng, Xiangnan He | 2024-10-30 | arXiv | https://github.com/ym689/rec_icl | http://arxiv.org/abs/2410.23136v1 |
82 | SciPIP: An LLM-based Scientific Paper Idea Proposer | Wenxiao Wang, Lihui Gu, Liye Zhang, Yunxiang Luo, Yi Dai, Chen Shen, Liang Xie, Binbin Lin, Xiaofei He, Jieping Ye | 2024-10-30 | arXiv | https://github.com/cheerss/SciPIP | http://arxiv.org/abs/2410.23166v1 |
83 | ReasoningRec: Bridging Personalized Recommendations and Human-Interpretable Explanations through LLM Reasoning | Millennium Bismay, Xiangjue Dong, James Caverlee | 2024-10-30 | arXiv | https://github.com/millenniumbismay/reasoningrec | http://arxiv.org/abs/2410.23180v1 |
84 | Distinguishing Ignorance from Error in LLM Hallucinations | Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov | 2024-10-29 | arXiv | https://github.com/technion-cs-nlp/hallucination-mitigation | http://arxiv.org/abs/2410.22071v1 |
85 | Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach | Qingchuan Li, Jiatong Li, Tongxuan Liu, Yuting Zeng, Mingyue Cheng, Weizhe Huang, Qi Liu | 2024-10-29 | arXiv | https://github.com/wufeiwuwoshihua/nshy | http://arxiv.org/abs/2410.21779v1 |
86 | Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance | Dongmin Park, Sebin Kim, Taehong Moon, Minkyu Kim, Kangwook Lee, Jaewoong Cho | 2024-10-29 | arXiv | https://github.com/krafton-ai/Rare2Frequent | http://arxiv.org/abs/2410.22376v1 |
87 | Scaling LLM Inference with Optimized Sample Compute Allocation | Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei Li | 2024-10-29 | arXiv | https://github.com/LeiLiLab/OSCA | http://arxiv.org/abs/2410.22480v1 |
88 | LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment | Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu | 2024-10-28 | arXiv | https://github.com/AboveParadise/LLMCBench | http://arxiv.org/abs/2410.21352v2 |
89 | SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Wanhua Li, Zibin Meng, Jiawei Zhou, Donglai Wei, Chuang Gan, Hanspeter Pfister | 2024-10-28 | arXiv | https://mengzibin.github.io/SocialGPT.github.io/ | http://arxiv.org/abs/2410.21411v1 |
90 | Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models | Yilun Jin, Zheng Li, Chenwei Zhang, Tianyu Cao, Yifan Gao, Pratik Jayarao, Mao Li, Xin Liu, Ritesh Sarkhel, Xianfeng Tang, Haodong Wang, Zhengyang Wang, Wenju Xu, Jingfeng Yang, Qingyu Yin, Xian Li, Priyanka Nigam, Yi Xu, Kai Chen, Qiang Yang, Meng Jiang, Bing Yin | 2024-10-28 | arXiv | https://github.com/KL4805/ShoppingMMLU | http://arxiv.org/abs/2410.20745v2 |
91 | ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, Beidi Chen | 2024-10-28 | arXiv | https://github.com/bytedance/ShadowKV | http://arxiv.org/abs/2410.21465v1 |
92 | NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates | Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu | 2024-10-28 | arXiv | https://github.com/hexuandeng/NewTerm | http://arxiv.org/abs/2410.20814v1 |
93 | Instruction-Tuned LLMs Succeed in Document-Level MT Without Fine-Tuning -- But BLEU Turns a Blind Eye | Yirong Sun, Dawei Zhu, Yanjun Chen, Erjia Xiao, Xinghao Chen, Xiaoyu Shen | 2024-10-28 | arXiv | https://github.com/EIT-NLP/BLEUless_DocMT | http://arxiv.org/abs/2410.20941v2 |
94 | Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation | Mufei Li, Siqi Miao, Pan Li | 2024-10-28 | arXiv | https://github.com/Graph-COM/SubgraphRAG | http://arxiv.org/abs/2410.20724v1 |
95 | Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks | Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese | 2024-10-28 | arXiv | https://github.com/pasquini-dario/project_mantis | http://arxiv.org/abs/2410.20911v1 |
96 | Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data | Xinhong Xie, Tao Li, Quanyan Zhu | 2024-10-27 | arXiv | https://github.com/XXXinhong/Detoxification_LLM | http://arxiv.org/abs/2410.20298v1 |
97 | LLMs Can Evolve Continually on Modality for X-Modal Reasoning | Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen | 2024-10-26 | arXiv | https://github.com/JiazuoYu/PathWeave | http://arxiv.org/abs/2410.20178v1 |
98 | Developing Retrieval Augmented Generation (RAG) based LLM Systems from PDFs: An Experience Report | Ayman Asad Khan, Md Toufique Hasan, Kai Kristian Kemell, Jussi Rasku, Pekka Abrahamsson | 2024-10-26 | arXiv e …, 2024 | https://github.com/GPT-Laboratory/RAG-LLM-Development-Guidebook-from-PDFs | http://arxiv.org/abs/2410.15944v1 |
99 | Enhancing Inflation Nowcasting with LLM: Sentiment Analysis on News | Marc-Antoine Allard, Paul Teiletche, Adam Zinebi | 2024-10-26 | arXiv | https://github.com/paultltc/InflaBERT | http://arxiv.org/abs/2410.20198v1 |
100 | Language Agents Meet Causality -- Bridging LLMs and Causal World Models | John Gkountouras, Matthias Lindemann, Phillip Lippe, Efstratios Gavves, Ivan Titov | 2024-10-25 | arXiv | https://j0hngou.github.io/LLMCWM/ | http://arxiv.org/abs/2410.19923v1 |
101 | Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities | Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao | 2024-10-25 | arXiv …, 2024 | https://github.com/SunChungEn/ADV-LLM | http://arxiv.org/abs/2410.18469v1 |
102 | APRICOT: Active Preference Learning and Constraint-Aware Task Planning with LLMs | Huaxiaoyue Wang, Nathaniel Chin, Gonzalo Gonzalez-Pumariega, Xiangwan Sun, Neha Sunkara, Maximus Adrian Pace, Jeannette Bohg, Sanjiban Choudhury | 2024-10-25 | arXiv | https://portal-cornell.github.io/apricot/ | http://arxiv.org/abs/2410.19656v1 |
103 | Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Wei He, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-10-24 | arXiv | https://github.com/hewei2001/ReachQA | http://arxiv.org/abs/2410.18798v1 |
104 | GCoder: Improving Large Language Model for Generalized Graph Problem Solving | Qifan Zhang, Xiaobin Hong, Jianheng Tang, Nuo Chen, Yuhan Li, Wenzhong Li, Jing Tang, Jia Li | 2024-10-24 | arXiv | https://github.com/Bklight999/WWW25-GCoder/tree/master | http://arxiv.org/abs/2410.19084v1 |
105 | Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, Zhangyang Wang | 2024-10-24 | arXiv | https://github.com/VITA-Group/READ-ME | http://arxiv.org/abs/2410.19123v1 |
106 | AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | Kim Sung-Bin, Oh Hyun-Bin, JungMok Lee, Arda Senocak, Joon Son Chung, Tae-Hyun Oh | 2024-10-23 | arXiv | https://github.com/AVHBench/AVHBench | http://arxiv.org/abs/2410.18325v1 |
107 | CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation | Qinsi Wang, Saeed Vahidian, Hancheng Ye, Jianyang Gu, Jianyi Zhang, Yiran Chen | 2024-10-23 | arXiv | https://wangqinsi1.github.io/coreinfer_page/ | http://arxiv.org/abs/2410.18311v1 |
108 | Cross-model Control: Improving Multiple Large Language Models in One-time Training | Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao | 2024-10-23 | arXiv | https://github.com/wujwyi/CMC | http://arxiv.org/abs/2410.17599v1 |
109 | VoiceBench: Benchmarking LLM-Based Voice Assistants | Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li | 2024-10-22 | arXiv | https://github.com/MatthewCYM/VoiceBench | http://arxiv.org/abs/2410.17196v1 |
110 | CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing | Chen Yang, Chenyang Zhao, Quanquan Gu, Dongruo Zhou | 2024-10-22 | arXiv | https://github.com/uclaml/COPS | http://arxiv.org/abs/2410.16670v1 |
111 | Improving Causal Reasoning in Large Language Models: A Survey | Longxuan Yu, Delin Chen, Siheng Xiong, Qingyang Wu, Qingzhen Liu, Dawei Li, Zhikai Chen, Xiaoze Liu, Liangming Pan | 2024-10-22 | arXiv | https://github.com/chendl02/Awesome-LLM-causal-reasoning | http://arxiv.org/abs/2410.16676v3 |
112 | Large Language Models Empowered Personalized Web Agents | Hongru Cai, Yongqi Li, Wenjie Wang, Fengbin Zhu, Xiaoyu Shen, Wenjie Li, Tat-Seng Chua | 2024-10-22 | arXiv | https://hongrucai.github.io/PersonalWAB/ | http://arxiv.org/abs/2410.17236v1 |
113 | AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration | Bradley McDanel | 2024-10-22 | arXiv | https://github.com/BradMcDanel/AMUSD/ | http://arxiv.org/abs/2410.17375v1 |
114 | ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage | Taewhoo Lee, Chanwoong Yoon, Kyochul Jang, Donghyeon Lee, Minju Song, Hyunjae Kim, Jaewoo Kang | 2024-10-22 | arXiv | https://github.com/dmis-lab/ETHIC | http://arxiv.org/abs/2410.16848v1 |
115 | Automated Spinal MRI Labelling from Reports Using a Large Language Model | Robin Y. Park, Rhydian Windsor, Amir Jamaludin, Andrew Zisserman | 2024-10-22 | MICCAI | https://github.com/robinyjpark/AutoLabelClassifier | https://doi.org/10.1007/978-3-031-72086-4_10 |
116 | DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models | Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao | 2024-10-22 | arXiv | https://github.com/ChnQ/DEAN | http://arxiv.org/abs/2410.16672v1 |
117 | Boosting Jailbreak Transferability for Large Language Models | Hanqing Liu, Lifeng Zhou, Huanqian Yan | 2024-10-21 | arXiv | https://github.com/HqingLiu/SI-GCG | http://arxiv.org/abs/2410.15645v1 |
118 | CausalGraph2LLM: Evaluating LLMs for Causal Queries | Ivaxi Sheth, Bahare Fatemi, Mario Fritz | 2024-10-21 | arXiv | https://github.com/ivaxi0s/CausalGraph2LLM | http://arxiv.org/abs/2410.15939v1 |
119 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | Yuxuan Cai, Jiangning Zhang, Haoyang He, Xinwei He, Ao Tong, Zhenye Gan, Chengjie Wang, Xiang Bai | 2024-10-21 | arXiv | https://github.com/Fantasyele/LLaVA-KD | http://arxiv.org/abs/2410.16236v2 |
120 | MagicPIG: LSH Sampling for Efficient LLM Generation | Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye, Yang Zhou, Jianyu Zhang, Niklas Nolte, Yuandong Tian, Matthijs Douze, Leon Bottou, Zhihao Jia, Beidi Chen | 2024-10-21 | arXiv | https://github.com/Infini-AI-Lab/MagicPIG | http://arxiv.org/abs/2410.16179v1 |
121 | Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs | Xin Ma, Yang Liu, Jingjing Liu, Xiaoxu Ma | 2024-10-21 | arXiv | https://github.com/soacker/Mesa-Extrapolation | http://arxiv.org/abs/2410.15859v3 |
122 | RAC: Efficient LLM Factuality Correction with Retrieval Augmentation | Changmao Li, Jeffrey Flanigan | 2024-10-21 | arXiv | https://github.com/jlab-nlp/Retrieval-Augmented-Correction | http://arxiv.org/abs/2410.15667v1 |
123 | A Comprehensive Evaluation of Cognitive Biases in LLMs | Simon Malberg, Roman Poletukhin, Carolin M. Schuster, Georg Groh | 2024-10-20 | arXiv | https://github.com/simonmalberg/cognitive-biases-in-llms | http://arxiv.org/abs/2410.15413v1 |
124 | Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction | Yinhan He, Zaiyi Zheng, Patrick Soga, Yaozhen Zhu, Yushun Dong, Jundong Li | 2024-10-19 | EMNLP 2024 (Findings) | https://github.com/YinhanHe123/new_LLM4GNNExplanation | http://arxiv.org/abs/2410.15165v1 |
125 | Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization | Zihui Wu, Haichang Gao, Ping Wang, Shudong Zhang, Zhaoxiang Liu, Shiguo Lian | 2024-10-19 | arXiv | https://github.com/wooozihui/GlitchMiner | http://arxiv.org/abs/2410.15052v2 |
126 | Imprompter: Tricking LLM Agents into Improper Tool Use | Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, Earlence Fernandes | 2024-10-19 | arXiv | https://github.com/Reapor-Yurnero/imprompter | http://arxiv.org/abs/2410.14923v2 |
127 | MCCoder: Streamlining Motion Control with LLM-Assisted Code Generation and Rigorous Verification | Yin Li, Liangwei Wang, Shiyuan Piao, Boo-Ho Yang, Ziyue Li, Wei Zeng, Fugee Tsung | 2024-10-19 | arXiv | https://github.com/MCCodeAI/MCCoder | http://arxiv.org/abs/2410.15154v1 |
128 | Are LLMs Good Zero-Shot Fallacy Classifiers? | Fengjun Pan, Xiaobao Wu, Zongrui Li, Anh Tuan Luu | 2024-10-19 | arXiv | https://github.com/panFJCharlotte98/Fallacy_Detection | http://arxiv.org/abs/2410.15050v1 |
129 | Evaluating Deep Unlearning in Large Language Models | Ruihan Wu, Chhavi Yadav, Russ Salakhutdinov, Kamalika Chaudhuri | 2024-10-19 | arXiv | https://github.com/wrh14/deep_unlearning | http://arxiv.org/abs/2410.15153v1 |
130 | GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization | Zihui Wu, Haichang Gao, Ping Wang, Shudong Zhang, Zhaoxiang Liu, Shiguo Lian | 2024-10-19 | arXiv | https://github.com/wooozihui/GlitchMiner | http://arxiv.org/abs/2410.15052v4 |
131 | SRAP-Agent: Simulating and Optimizing Scarce Resource Allocation Policy with LLM-based Agent | Jiarui Ji, Yang Li, Hongtao Liu, Zhicheng Du, Zhewei Wei, Weiran Shen, Qi Qi, Yankai Lin | 2024-10-18 | arXiv | https://github.com/jijiarui-cather/SRAPAgent_Framework | http://arxiv.org/abs/2410.14152v1 |
132 | Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation | Shuo Tang, Xianghe Pang, Zexi Liu, Bohan Tang, Rui Ye, Xiaowen Dong, Yanfeng Wang, Siheng Chen | 2024-10-18 | arXiv | https://github.com/ShuoTang123/MATRIX-Gen | http://arxiv.org/abs/2410.14251v1 |
133 | Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models | Wei Jie Yeo, Ranjan Satapathy, Erik Cambria | 2024-10-18 | arXiv | https://github.com/wj210/Causal-Faithfulness | http://arxiv.org/abs/2410.14155v2 |
134 | REEF: Representation Encoding Fingerprints for Large Language Models | Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao | 2024-10-18 | arXiv | https://github.com/tmylla/REEF | http://arxiv.org/abs/2410.14273v1 |
135 | Enabling Scalable Evaluation of Bias Patterns in Medical LLMs | Hamed Fayyaz, Raphael Poulain, Rahmatollah Beheshti | 2024-10-18 | arXiv | https://github.com/healthylaife/autofair | http://arxiv.org/abs/2410.14763v1 |
136 | CoMAL: Collaborative Multi-Agent Large Language Models for Mixed-Autonomy Traffic | Huaiyuan Yao, Longchao Da, Vishnu Nandam, Justin Turnau, Zhiwei Liu, Linsey Pang, Hua Wei | 2024-10-18 | arXiv | https://github.com/Hyan-Yao/CoMAL | http://arxiv.org/abs/2410.14368v1 |
137 | Retrieval-Augmented Personalization for Multimodal Large Language Models | Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue | 2024-10-17 | arXiv | https://github.com/Hoar012/RAP-MLLM | http://arxiv.org/abs/2410.13360v2 |
138 | Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models | Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu | 2024-10-17 | EMNLP | https://github.com/yyhappier/ShortcutSuite | https://aclanthology.org/2024.emnlp-main.679 |
139 | Data Defenses Against Large Language Models | William Agnew, Harry H. Jiang, Cella Sum, Maarten Sap, Sauvik Das | 2024-10-17 | arXiv | https://github.com/wagnew3/LLMDataDefenses | http://arxiv.org/abs/2410.13138v1 |
140 | FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs | Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad | 2024-10-17 | arXiv | https://github.com/vectara/FaithBench | http://arxiv.org/abs/2410.13210v1 |
141 | SLM-Mod: Small Language Models Surpass LLMs at Content Moderation | Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, Koustuv Saha | 2024-10-17 | arXiv | https://github.com/AGoyal0512/SLM-Mod | http://arxiv.org/abs/2410.13155v1 |
142 | aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion | Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Gen Wang, Yihong Dong, Kechi Zhang, Ge Li | 2024-10-17 | arXiv | https://github.com/aixcoder-plugin/aiXcoder-7B/tree/main | http://arxiv.org/abs/2410.13187v1 |
143 | Self-Pluralising Culture Alignment for Large Language Models | Shaoyang Xu, Yongqi Leng, Linhao Yu, Deyi Xiong | 2024-10-16 | arXiv | https://github.com/shaoyangxu/CultureSPA | http://arxiv.org/abs/2410.12971v1 |
144 | Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models | Iaroslav Chelombitko, Egor Safronov, Aleksey Komissarov | 2024-10-16 | arXiv | https://github.com/nup-csai/Qtok/ | http://arxiv.org/abs/2410.12989v1 |
145 | Neuron-based Personality Trait Induction in Large Language Models | Jia Deng, Tianyi Tang, Yanbin Yin, Wenhao Yang, Wayne Xin Zhao, Ji-Rong Wen | 2024-10-16 | arXiv | https://github.com/RUCAIBox/NPTI | http://arxiv.org/abs/2410.12327v1 |
146 | Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors | Weixuan Wang, Jingyuan Yang, Wei Peng | 2024-10-16 | arXiv | https://github.com/weixuan-wang123/SADI | http://arxiv.org/abs/2410.12299v1 |
147 | POROver: Improving Safety and Reducing Overrefusal in Large Language Models with Overgeneration and Preference Optimization | Batuhan K. Karaman, Ishmam Zabir, Alon Benhaim, Vishrav Chaudhary, Mert R. Sabuncu, Xia Song | 2024-10-16 | arXiv | https://github.com/batuhankmkaraman/POROver | http://arxiv.org/abs/2410.12999v1 |
148 | ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs | Jingming Zhuo, Songyang Zhang, Xinyu Fang, Haodong Duan, Dahua Lin, Kai Chen | 2024-10-16 | arXiv | https://github.com/open-compass/ProSA | http://arxiv.org/abs/2410.12405v1 |
149 | Hypothesis Testing the Circuit Hypothesis in LLMs | Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei | 2024-10-16 | arXiv | https://github.com/blei-lab/circuitry | http://arxiv.org/abs/2410.13032v1 |
150 | DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs | Yingsong Luo, Ling Chen | 2024-10-16 | arXiv | https://github.com/LuoYingSong/DAQ | http://arxiv.org/abs/2410.12187v2 |
151 | Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights | Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha | 2024-10-16 | arXiv | https://github.com/IBM/codellm-devkit | http://arxiv.org/abs/2410.13007v1 |
152 | HerO at AVeriTeC: The Herd of Open Large Language Models for Verifying Real-World Claims | Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park | 2024-10-16 | arXiv | https://github.com/ssu-humane/HerO | http://arxiv.org/abs/2410.12377v1 |
153 | Exploring Model Kinship for Merging Large Language Models | Yedi Hu, Yunzhi Yao, Ningyu Zhang, Shumin Deng, Huajun Chen | 2024-10-16 | arXiv | https://github.com/zjunlp/ModelKinship | http://arxiv.org/abs/2410.12613v1 |
154 | Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention | Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch | 2024-10-16 | arXiv | https://github.com/weixuan-wang123/INCLINE | http://arxiv.org/abs/2410.12462v1 |
155 | Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models | Kai Yao, Penglei Gao, Lichun Li, Yuan Zhao, Xiaofeng Wang, Wei Wang, Jianke Zhu | 2024-10-15 | EMNLP | https://github.com/Kaiseem/IST | https://aclanthology.org/2024.findings-emnlp.109 |
156 | Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models | Zhongye Liu, Hongbin Liu, Yuepeng Hu, Zedian Shao, Neil Zhenqiang Gong | 2024-10-15 | arXiv | https://github.com/lycheeefish/VHExpansion | http://arxiv.org/abs/2410.11242v1 |
157 | LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs | Volker Strobel, Marco Dorigo, Mario Fritz | 2024-10-15 | arXiv | https://github.com/Pold87/LLM2Swarm | http://arxiv.org/abs/2410.11387v2 |
158 | Subspace Optimization for Large Language Models with Convergence Guarantees | Yutong He, Pengrui Li, Yipeng Hu, Chuyan Chen, Kun Yuan | 2024-10-15 | arXiv | https://github.com/pkumelon/Golore | http://arxiv.org/abs/2410.11289v1 |
159 | Zero-shot Model-based Reinforcement Learning using Large Language Models | Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat, Oussama Zekri, Albert Thomas, Giuseppe Paolo, Maurizio Filippone, Ievgen Redko, Balázs Kégl | 2024-10-15 | arXiv | https://github.com/abenechehab/dicl | http://arxiv.org/abs/2410.11711v1 |
160 | DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Guangxuan Xiao, Jiaming Tang, Jingwei Zuo, Junxian Guo, Shang Yang, Haotian Tang, Yao Fu, Song Han | 2024-10-14 | arXiv | https://github.com/mit-han-lab/duo-attention | http://arxiv.org/abs/2410.10819v1 |
161 | MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media | Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui Fu | 2024-10-14 | arXiv | https://github.com/zwzzzQAQ/MentalGLM | http://arxiv.org/abs/2410.10323v1 |
162 | Locking Down the Finetuned LLMs Safety | Minjun Zhu, Linyi Yang, Yifan Wei, Ningyu Zhang, Yue Zhang | 2024-10-14 | arXiv | https://github.com/zhu-minjun/SafetyLock | http://arxiv.org/abs/2410.10343v1 |
163 | Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | Ziyue Li, Tianyi Zhou | 2024-10-14 | arXiv | https://github.com/tianyi-lab/MoE-Embedding | http://arxiv.org/abs/2410.10814v2 |
164 | Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues | Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao | 2024-10-14 | arXiv | https://github.com/renqibing/ActorAttack | http://arxiv.org/abs/2410.10700v1 |
165 | AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models | Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang | 2024-10-14 | arXiv | https://github.com/haiquanlu/AlphaPruning | https://doi.org/10.48550/arXiv.2410.10912 |
166 | Large Language Model Evaluation via Matrix Nuclear-Norm | Yahan Li, Tingyu Xia, Yi Chang, Yuan Wu | 2024-10-14 | arXiv | https://github.com/MLGroupJLU/MatrixNuclearNorm | https://doi.org/10.48550/arXiv.2410.10672 |
167 | One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Fangru Lin, Shaoguang Mao, Emanuele La Malfa, Valentin Hofmann, Adrian de Wynter, Jing Yao, Si-Qing Chen, Michael Wooldridge, Furu Wei | 2024-10-14 | arXiv | https://github.com/fangru-lin/redial_dialect_robustness_fairness | https://doi.org/10.48550/arXiv.2410.11005 |
168 | RMB: Comprehensively Benchmarking Reward Models in LLM Alignment | Enyu Zhou, Guodong Zheng, Binghai Wang, Zhiheng Xi, Shihan Dou, Rong Bao, Wei Shen, Limao Xiong, Jessica Fan, Yurong Mou, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang | 2024-10-13 | arXiv | https://github.com/Zhou-Zoey/RMB-Reward-Model-Benchmark | http://arxiv.org/abs/2410.09893v1 |
169 | LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Han Qiu, Jiaxing Huang, Peng Gao, Qin Qi, Xiaoqin Zhang, Ling Shao, Shijian Lu | 2024-10-13 | arXiv | https://github.com/hanqiu-hq/LongHalQA | http://arxiv.org/abs/2410.09962v2 |
170 | LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models | Zihan Zhou, Chong Li, Xinyi Chen, Shuo Wang, Yu Chao, Zhili Li, Haoyu Wang, Rongqiao An, Qi Shi, Zhixing Tan, Xu Han, Xiaodong Shi, Zhiyuan Liu, Maosong Sun | 2024-10-12 | arXiv | https://github.com/thunlp/LLMxMapReduce | http://arxiv.org/abs/2410.09342v1 |
171 | ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models | Nandan Kumar Jha, Brandon Reagen | 2024-10-12 | arXiv | https://github.com/Nandan91/relu-revival-normfree | http://arxiv.org/abs/2410.09637v1 |
172 | OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models | Jun Wang, Meng Fang, Ziyu Wan, Muning Wen, Jiachen Zhu, Anjie Liu, Ziqin Gong, Yan Song, Lei Chen, Lionel M. Ni, Linyi Yang, Ying Wen, Weinan Zhang | 2024-10-12 | arXiv | https://openreasoner.github.io | http://arxiv.org/abs/2410.09671v1 |
173 | Skipping Computations in Multimodal LLMs | Mustafa Shukor, Matthieu Cord | 2024-10-12 | arXiv | https://github.com/mshukor/ima-lmms | http://arxiv.org/abs/2410.09454v1 |
174 | FlatQuant: Flatness Matters for LLM Quantization | Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, Xin Jiang, Wulong Liu, Jun Yao | 2024-10-12 | arXiv | https://github.com/ruikangliu/FlatQuant | http://arxiv.org/abs/2410.09426v1 |
175 | FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback | Youquan Li, Miao Zheng, Fan Yang, Guosheng Dong, Bin Cui, Weipeng Chen, Zenan Zhou, Wentao Zhang | 2024-10-12 | arXiv | https://github.com/PKU-Baichuan-MLSystemLab/FB-Bench | http://arxiv.org/abs/2410.09412v1 |
176 | ELICIT: LLM Augmentation via External In-Context Capability | Futing Wang, Jianhao Yan, Yue Zhang, Tao Lin | 2024-10-12 | arXiv | https://github.com/LINs-lab/ELICIT | http://arxiv.org/abs/2410.09343v1 |
177 | MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection | Xi Jiang, Jian Li, Hanqiu Deng, Yong Liu, Bin-Bin Gao, Yifeng Zhou, Jialin Li, Chengjie Wang, Feng Zheng | 2024-10-12 | arXiv | https://github.com/jam-cc/MMAD | http://arxiv.org/abs/2410.09453v1 |
178 | Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking | Wei Zhang, Pengfei Li, Junli Wang, Bingchuan Sun, Qihao Jin, Guangjun Bao, Shibo Rui, Yang Yu, Wenchao Ding, Peng Li, Yilun Chen | 2024-10-11 | arXiv | https://github.com/ChipsICU/Dual-AEB | https://doi.org/10.48550/arXiv.2410.08616 |
179 | AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation | Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie | 2024-10-11 | arXiv | https://github.com/UCSC-VLAA/AttnGCG-attack | http://arxiv.org/abs/2410.09040v1 |
180 | QEFT: Quantization for Efficient Fine-Tuning of LLMs | Changhun Lee, Jun-gyu Jin, Younghyun Cho, Eunhyeok Park | 2024-10-11 | arXiv | https://github.com/xvyaward/qeft | http://arxiv.org/abs/2410.08661v1 |
181 | Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models | Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang | 2024-10-10 | arXiv | https://github.com/sitaocheng/Knowledge_Interplay | https://doi.org/10.48550/arXiv.2410.08414 |
182 | VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models | Lisa Dunlap, Krishna Mandal, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez | 2024-10-10 | arXiv | https://github.com/lisadunlap/VibeCheck | http://arxiv.org/abs/2410.12851v1 |
183 | Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond | Qi Wang, Jindong Li, Shiqi Wang, Qianli Xing, Runliang Niu, He Kong, Rui Li, Guodong Long, Yi Chang, Chengqi Zhang | 2024-10-10 | arXiv | https://github.com/jindongli-Ai/Next-Generation-LLM-based-Recommender-Systems-Survey | http://arxiv.org/abs/2410.19744v1 |
184 | StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs | Yuanqing Yu, Zhefan Wang, Weizhi Ma, Zhicheng Guo, Jingtao Zhan, Shuai Wang, Chuhan Wu, Zhiqiang Guo, Min Zhang | 2024-10-10 | arXiv | https://github.com/yuyq18/StepTool | http://arxiv.org/abs/2410.07745v1 |
185 | Reward-Augmented Data Enhances Direct Preference Alignment of LLMs | Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang | 2024-10-10 | arXiv | https://github.com/shenao-zhang/reward-augmented-preference | http://arxiv.org/abs/2410.08067v1 |
186 | Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System | Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun | 2024-10-10 | arXiv | https://chenweize1998.github.io/optima-project-page | http://arxiv.org/abs/2410.08115v1 |
187 | Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | Wenting Tan, Dongxiao Chen, Jieting Xue, Zihao Wang, Taijie Chen | 2024-10-10 | arXiv | https://github.com/SallyTan13/Teaching-Inspired-Prompting | https://doi.org/10.48550/arXiv.2410.08068 |
188 | GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps | Muhammad Umair Nasir, Steven James, Julian Togelius | 2024-10-10 | arXiv | https://github.com/umair-nasir14/Game-Traversal-Benchmark | https://doi.org/10.48550/arXiv.2410.07765 |
189 | Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models | Zhipeng Chen, Liang Song, Kun Zhou, Wayne Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen | 2024-10-10 | arXiv | https://github.com/RUCAIBox/MAET | https://doi.org/10.48550/arXiv.2410.07825 |
190 | A Closer Look at Machine Unlearning for Large Language Models | Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin | 2024-10-10 | arXiv | https://github.com/sail-sg/closer-look-LLM-unlearning | https://doi.org/10.48550/arXiv.2410.08109 |
191 | Privately Learning from Graphs with Applications in Fine-tuning Large Language Models | Haoteng Yin, Rongzhe Wei, Eli Chien, Pan Li | 2024-10-10 | arXiv | https://github.com/Graph-COM/PvGaLM | https://doi.org/10.48550/arXiv.2410.08299 |
192 | IterGen: Iterative Structured LLM Generation | Shubham Ugare, Rohan Gumaste, Tarun Suresh, Gagandeep Singh, Sasa Misailovic | 2024-10-09 | arXiv | https://github.com/uiuc-arc/itergen | http://arxiv.org/abs/2410.07295v1 |
193 | WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents | Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang | 2024-10-09 | arXiv | https://github.com/elated-sawyer/WALL-E | http://arxiv.org/abs/2410.07484v2 |
194 | Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning | Chongyu Fan, Jiancheng Liu, Licong Lin, Jinghan Jia, Ruiqi Zhang, Song Mei, Sijia Liu | 2024-10-09 | arXiv | https://github.com/OPTML-Group/Unlearn-Simple | http://arxiv.org/abs/2410.07163v1 |
195 | Weak-eval-Strong: Evaluating and Eliciting Lateral Thinking of LLMs with Situation Puzzles | Qi Chen, Bowen Zhang, Gang Wang, Qi Wu | 2024-10-09 | arXiv | https://github.com/chenqi008/LateralThinking | http://arxiv.org/abs/2410.06733v1 |
196 | Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang | 2024-10-09 | arXiv | https://video-salmonn-2.github.io | http://arxiv.org/abs/2410.06682v1 |
197 | Dissecting Fine-Tuning Unlearning in Large Language Models | Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang | 2024-10-09 | EMNLP | https://github.com/yihuaihong/Dissecting-FT-Unlearning | https://aclanthology.org/2024.emnlp-main.228 |
198 | CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models | Zi Gong, Hang Yu, Cong Liao, Bingchang Liu, Chaoyu Chen, Jianguo Li | 2024-10-09 | EMNLP | https://github.com/codefuse-ai/MFTCoder | https://aclanthology.org/2024.emnlp-main.459 |
199 | AgentSquare: Automatic LLM Agent Search in Modular Design Space | Yu Shang, Yu Li, Keyu Zhao, Likai Ma, Jiahe Liu, Fengli Xu, Yong Li | 2024-10-08 | arXiv | https://github.com/tsinghua-fib-lab/AgentSquare | http://arxiv.org/abs/2410.06153v1 |
200 | GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | Muhammad Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogério Feris, Leonid Karlinsky, James R. Glass | 2024-10-08 | arXiv | https://github.com/jmiemirza/GLOV | https://doi.org/10.48550/arXiv.2410.06154 |
201 | Enhancing Temporal Modeling of Video LLMs via Time Gating | Zi-Yuan Hu, Yiwu Zhong, Shijia Huang, Michael R. Lyu, Liwei Wang | 2024-10-08 | arXiv | https://github.com/LaVi-Lab/TG-Vid | http://arxiv.org/abs/2410.05714v1 |
202 | MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment | Amir Hossein Kargaran, Ali Modarressi, Nafiseh Nikeghbal, Jana Diesner, François Yvon, Hinrich Schütze | 2024-10-08 | arXiv | https://github.com/cisnlp/Mexa | http://arxiv.org/abs/2410.05873v1 |
203 | ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities | Zhenchao Jin, Mengchen Liu, Dongdong Chen, Lingting Zhu, Yunsheng Li, Lequan Yu | 2024-10-08 | arXiv | https://github.com/CharlesPikachu/ToolBridge | http://arxiv.org/abs/2410.10872v1 |
204 | Aligning LLMs to Be Robust Against Prompt Injection | Sizhe Chen, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, Chuan Guo | 2024-10-07 | arXiv | https://github.com/facebookresearch/SecAlign | http://arxiv.org/abs/2410.05451v1 |
205 | PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | Mengzhao Chen, Yi Liu, Jiahao Wang, Yi Bin, Wenqi Shao, Ping Luo | 2024-10-07 | arXiv | https://github.com/ChenMnZ/PrefixQuant | http://arxiv.org/abs/2410.05265v1 |
206 | Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Xinyu Zhao, Guoheng Sun, Ruisi Cai, Yukun Zhou, Pingzhi Li, Peihao Wang, Bowen Tan, Yexiao He, Li Chen, Yi Liang, Beidi Chen, Binhang Yuan, Hongyi Wang, Ang Li, Zhangyang Wang, Tianlong Chen | 2024-10-07 | arXiv | https://github.com/Model-GLUE/Model-GLUE | http://arxiv.org/abs/2410.05357v1 |
207 | Can LLMs Understand Time Series Anomalies? | Zihao Zhou, Rose Yu | 2024-10-07 | arXiv | https://github.com/Rose-STL-Lab/AnomLLM/ | http://arxiv.org/abs/2410.05440v2 |
208 | Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback | Sanjiban Choudhury, Paloma Sodhi | 2024-10-07 | arXiv | https://leap-llm.github.io | http://arxiv.org/abs/2410.05434v1 |
209 | Synthesizing Interpretable Control Policies through Large Language Model Guided Search | Carlo Bosio, Mark W. Mueller | 2024-10-07 | arXiv | https://github.com/muellerlab/synthesizing_interpretable_control_policies | https://doi.org/10.48550/arXiv.2410.05406 |
210 | Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality | Guanyu Zhou, Yibo Yan, Xin Zou, Kun Wang, Aiwei Liu, Xuming Hu | 2024-10-07 | arXiv | https://github.com/The-Martyr/CausalMM | https://doi.org/10.48550/arXiv.2410.04780 |
211 | Intriguing Properties of Large Language and Vision Models | Young-Jun Lee, Byungsoo Ko, Han-Gyu Kim, Yechan Hwang, Ho-Jin Choi | 2024-10-07 | arXiv | https://github.com/passing2961/IP-LLVM | https://doi.org/10.48550/arXiv.2410.04751 |
212 | Narrative-of-Thought: Improving Temporal Reasoning of Large Language Models via Recounted Narratives | Xinliang Frederick Zhang, Nicholas Beauchamp, Lu Wang | 2024-10-07 | EMNLP | https://github.com/launchnlp/NoT | https://aclanthology.org/2024.findings-emnlp.963 |
213 | Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | Fei Wang, Ninareh Mehrabi, Palash Goyal, Rahul Gupta, Kai-Wei Chang, Aram Galstyan | 2024-10-07 | EMNLP | https://feiwang96.github.io/DataAdvisor/ | https://aclanthology.org/2024.emnlp-main.461 |
214 | CogDevelop2K: Reversed Cognitive Development in Multimodal Large Language Models | Yijiang Li, Qingying Gao, Haoran Sun, Haiyun Lyu, Dezhi Luo, Hokin Deng | 2024-10-06 | arXiv | https://growing-ai-like-a-child.github.io/ | https://doi.org/10.48550/arXiv.2410.10855 |
215 | Leveraging Large Language Models for Suicide Detection on Social Media with Limited Labels | Vy Nguyen, Chau Pham | 2024-10-06 | arXiv | https://github.com/khanhvynguyen/Suicide_Detection_LLMs | https://doi.org/10.48550/arXiv.2410.04501 |
216 | MindScope: Exploring Cognitive Biases in Large Language Models Through Multi-Agent Systems | Zhentao Xie, Jiabao Zhao, Yilei Wang, Jinxin Shi, Yanhong Bai, Xingjiao Wu, Liang He | 2024-10-06 | ECAI | https://github.com/2279072142/MindScope | https://doi.org/10.3233/FAIA240879 |
217 | CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints | Anirudh Atmakuru, Jatin Nainani, Rohith Siddhartha Reddy Bheemreddy, Anirudh Lakkaraju, Zonghai Yao, Hamed Zamani, Haw-Shiuan Chang | 2024-10-05 | arXiv | https://github.com/anirudhlakkaraju/cs4_benchmark | https://doi.org/10.48550/arXiv.2410.04197 |
218 | Neuron-Level Sequential Editing for Large Language Models | Houcheng Jiang, Junfeng Fang, Tianyu Zhang, An Zhang, Ruipeng Wang, Tao Liang, Xiang Wang | 2024-10-05 | arXiv | https://github.com/jianghoucheng/NSE | https://doi.org/10.48550/arXiv.2410.04045 |
219 | Self-Powered LLM Modality Expansion for Large Speech-Text Models | Tengfei Yu, Xuebo Liu, Zhiyi Hou, Liang Ding, Dacheng Tao, Min Zhang | 2024-10-04 | arXiv | https://github.com/ytf-philp/Self-powered-LSM | http://arxiv.org/abs/2410.03798v2 |
220 | Leveraging Social Determinants of Health in Alzheimer's Research Using LLM-Augmented Literature Mining and Knowledge Graphs | Tianqi Shang, Shu Yang, Weiqing He, Tianhua Zhai, Dawei Li, Bojian Hou, Tianlong Chen, Jason H. Moore, Marylyn D. Ritchie, Li Shen | 2024-10-04 | arXiv | https://github.com/hwq0726/SDoHenPKG | http://arxiv.org/abs/2410.09080v1 |
221 | LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity | Selim Furkan Tekin, Fatih Ilhan, Tiansheng Huang, Sihao Hu, Ling Liu | 2024-10-04 | arXiv | https://github.com/git-disl/llm-topla | http://arxiv.org/abs/2410.03953v1 |
222 | GraphRouter: A Graph-based Router for LLM Selections | Tao Feng, Yanzhen Shen, Jiaxuan You | 2024-10-04 | arXiv | https://github.com/ulab-uiuc/GraphRouter | http://arxiv.org/abs/2410.03834v1 |
223 | Aligning LLMs with Individual Preferences via Interaction | Shujin Wu, May Fung, Cheng Qian, Jeonghwan Kim, Dilek Hakkani-Tur, Heng Ji | 2024-10-04 | arXiv | https://github.com/ShujinWu-0814/ALOE | http://arxiv.org/abs/2410.03642v1 |
224 | Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents | Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang | 2024-10-04 | arXiv | https://github.com/agiresearch/ASB | http://arxiv.org/abs/2410.02644v1 |
225 | ARB-LLM: Alternating Refined Binarizations for Large Language Models | Zhiteng Li, Xianglong Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Zhongchao Shi, Linghe Kong, Yulun Zhang, Xiaokang Yang | 2024-10-04 | arXiv | https://github.com/ZHITENGLI/ARB-LLM | https://doi.org/10.48550/arXiv.2410.03129 |
226 | Steering Large Language Models between Code Execution and Textual Reasoning | Yongchao Chen, Harsh Jhamtani, Srinagesh Sharma, Chuchu Fan, Chi Wang | 2024-10-04 | arXiv | https://yongchao98.github.io/CodeSteer/ | https://doi.org/10.48550/arXiv.2410.03524 |
227 | A Probabilistic Perspective on Unlearning and Alignment for Large Language Models | Yan Scholten, Stephan Günnemann, Leo Schwinn | 2024-10-04 | arXiv | https://github.com/yascho/probabilistic-unlearning | https://doi.org/10.48550/arXiv.2410.03523 |
228 | Output Scouting: Auditing Large Language Models for Catastrophic Responses | Andrew Bell, João Fonseca | 2024-10-04 | arXiv | https://github.com/joaopfonseca/outputscouting | https://doi.org/10.48550/arXiv.2410.05305 |
229 | PersonalSum: A User-Subjective Guided Personalized Summarization Dataset for Large Language Models | Lemei Zhang, Peng Liu, Marcus Tiedemann Oekland Henriksboe, Even W. Lauvrak, Jon Atle Gulla, Heri Ramampiaro | 2024-10-04 | arXiv | https://github.com/SmartmediaAI/PersonalSum | https://doi.org/10.48550/arXiv.2410.03905 |
230 | CommonIT: Commonality-Aware Instruction Tuning for Large Language Models via Data Partitions | Jun Rao, Xuebo Liu, Lian Lian, Shengjun Cheng, Yunjie Liao, Min Zhang | 2024-10-04 | arXiv | https://github.com/raojay7/CommonIT | https://doi.org/10.48550/arXiv.2410.03077 |
231 | PersoBench: Benchmarking Personalized Response Generation in Large Language Models | Saleh Afzoon, Usman Naseem, Amin Beheshti, Zahra Jamali | 2024-10-04 | arXiv | https://github.com/salehafzoon/PersoBench | https://doi.org/10.48550/arXiv.2410.03198 |
232 | FactAlign: Long-form Factuality Alignment of Large Language Models | Chao-Wei Huang, Yun-Nung Chen | 2024-10-03 | arXiv | https://github.com/MiuLab/FactAlign | https://doi.org/10.48550/arXiv.2410.01691 |
233 | POSIX: A Prompt Sensitivity Index For Large Language Models | Anwoy Chatterjee, H. S. V. N. S. Kowndinya Renduchintala, Sumit Bhatia, Tanmoy Chakraborty | 2024-10-03 | arXiv | https://github.com/kowndinya-renduchintala/POSIX | https://doi.org/10.48550/arXiv.2410.02185 |
234 | Traffic Light or Light Traffic? Investigating Phrasal Semantics in Large Language Models | Rui Meng, Ye Liu, Lifu Tu, Daqing He, Yingbo Zhou, Semih Yavuz | 2024-10-03 | arXiv | https://github.com/memray/llm_phrase_semantics | https://doi.org/10.48550/arXiv.2410.02308 |
235 | Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language | Anthony Costarelli, Mat Allen, Severin Field | 2024-10-03 | arXiv | https://github.com/acostarelli/meta-models-public | http://arxiv.org/abs/2410.02472v2 |
236 | Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective | Zeyu Gan, Yong Liu | 2024-10-03 | arXiv | https://github.com/ZyGan1999/Towards-a-Theoretical-Understanding-of-Synthetic-Data-in-LLM-Post-Training | http://arxiv.org/abs/2410.01720v2 |
237 | Open-RAG: Enhanced Retrieval Augmented Reasoning with Open-Source Large Language Models | Shayekh Bin Islam, Md Asib Rahman, K. S. M. Tozammel Hossain, Enamul Hoque, Shafiq Joty, Md Rizwan Parvez | 2024-10-02 | EMNLP | https://openragmoe.github.io/ | https://aclanthology.org/2024.findings-emnlp.831 |
238 | Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression | Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang | 2024-10-02 | arXiv | https://github.com/TUDa-HWAI/Basis_Sharing | https://doi.org/10.48550/arXiv.2410.03765 |
239 | DLP-LoRA: Efficient Task-Specific LoRA Fusion with a Dynamic, Lightweight Plugin for Large Language Models | Yuxuan Zhang, Ruizhe Li | 2024-10-02 | arXiv | https://github.com/MeCuping/DLP-LoRA | https://doi.org/10.48550/arXiv.2410.01497 |
240 | StringLLM: Understanding the String Processing Capability of Large Language Models | Xilong Wang, Hao Fu, Jindong Wang, Neil Zhenqiang Gong | 2024-10-02 | arXiv | https://github.com/wxl-lxw/StringLLM | https://doi.org/10.48550/arXiv.2410.01208 |
241 | TypedThinker: Typed Thinking Improves Large Language Model Reasoning | Danqing Wang, Jianxin Ma, Fei Fang, Lei Li | 2024-10-02 | arXiv | https://github.com/dqwang122/ThinkHub | https://doi.org/10.48550/arXiv.2410.01952 |
242 | EMMA: Efficient Visual Alignment in Multi-Modal LLMs | Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami | 2024-10-02 | arXiv | https://github.com/SaraGhazanfari/EMMA | http://arxiv.org/abs/2410.02080v1 |
243 | Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? | Xi Chen, Kaituo Feng, Changsheng Li, Xunhao Lai, Xiangyu Yue, Ye Yuan, Guoren Wang | 2024-10-02 | arXiv | https://github.com/xichen-fy/Fira | http://arxiv.org/abs/2410.01623v2 |
244 | PneumoLLM: Harnessing the power of large language model for pneumoconiosis diagnosis | Meiyue Song, Jiarui Wang, Zhihua Yu, Jiaxin Wang, Le Yang, Yuting Lu, Baicun Li, Xue Wang, Xiaoxu Wang, Qinghua Huang, Zhijun Li, Nikolaos I. Kanellakis, Jiangfeng Liu, Jing Wang, Binglu Wang, Juntao Yang | 2024-10-01 | Medical Image Anal. | https://github.com/CodeMonsterPHD/PneumoLLM/tree/main | https://doi.org/10.1016/j.media.2024.103248 |
245 | Style-Specific Neurons for Steering LLMs in Text Style Transfer | Wen Lai, Viktor Hangya, Alexander Fraser | 2024-10-01 | arXiv | https://github.com/wenlai-lavine/sNeuron-TST | http://arxiv.org/abs/2410.00593v1 |
246 | Insight: A Multi-Modal Diagnostic Pipeline using LLMs for Ocular Surface Disease Diagnosis | Chun-Hsiao Yeh, Jiayun Wang, Andrew D. Graham, Andrea J. Liu, Bo Tan, Yubei Chen, Yi Ma, Meng C. Lin | 2024-10-01 | arXiv | https://danielchyeh.github.io/MDPipe/ | http://arxiv.org/abs/2410.00292v1 |
247 | Dynamic Planning for LLM-based Graphical User Interface Automation | Shaoqing Zhang, Zhuosheng Zhang, Kehai Chen, Xinbei Ma, Muyun Yang, Tiejun Zhao, Min Zhang | 2024-10-01 | OpenReview | https://github.com/sqzhang-lazy/D-PoT | http://arxiv.org/abs/2410.00467v1 |
248 | mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model | Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/PaperOwl | https://dl.acm.org/doi/10.1145/3664647.3681294 |
249 | WorldGPT: Empowering LLM as Multimodal World Model | Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/DCDmllm/WorldGPT | https://dl.acm.org/doi/10.1145/3664647.3681488 |
250 | Semantic Alignment for Multimodal Large Language Models | Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://mccartney01.github.io/SAM | https://dl.acm.org/doi/10.1145/3664647.3681014 |
251 | Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiaoyong Wei, Chang Wen Chen, Qing Li | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/fletcherjiang/LLMEPET | https://dl.acm.org/doi/10.1145/3664647.3681115 |
252 | Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval | Yabing Wang, Le Wang, Qiang Zhou, Zhibin Wang, Hao Li, Gang Hua, Wei Tang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/LiJiaBei-7/leccr | https://dl.acm.org/doi/10.1145/3664647.3680886 |
253 | MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors | Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/TangYuan96/MiniGPT-3D | https://dl.acm.org/doi/10.1145/3664647.3681257 |
254 | Making Large Language Models Perform Better in Knowledge Graph Completion | Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Wen Zhang, Huajun Chen | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/zjukg/KoPA | https://dl.acm.org/doi/10.1145/3664647.3681327 |
255 | MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models | Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/LuminosityX/MM-Forecast | https://dl.acm.org/doi/10.1145/3664647.3681593 |
256 | Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/suay1113/HMLLM | https://dl.acm.org/doi/10.1145/3664647.3680810 |
257 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Peng Ding, Jingyu Wu, Jun Kuang, Dan Ma, Xuezhi Cao, Xunliang Cai, Shi Chen, Jiajun Chen, Shujian Huang | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/NJUNLP/Hallu-PI | https://dl.acm.org/doi/10.1145/3664647.3681251 |
258 | Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation | Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji | 2024-10 | MM '24: Proceedings of the 32nd ACM International Conference on Multimedia | https://github.com/xjjxmu/QSLAW | https://dl.acm.org/doi/10.1145/3664647.3680838 |
259 | UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models | Qi Liu, Yongyi He, Defu Lian, Zhi Zheng, Tong Xu, Liu Che, Enhong Chen | 2024-10 | CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management | https://github.com/Javkonline/UniMEL | https://dl.acm.org/doi/10.1145/3627673.3679793 |
260 | Fairness in Large Language Models in Three Hours | Thang Viet Doan, Zichong Wang, Nhat Nguyen Minh Hoang, Wenbin Zhang | 2024-10 | CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management | https://github.com/LavinWong/Fairness-in-Large-Language-Models | https://dl.acm.org/doi/10.1145/3627673.3679090 |
261 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding, Hanglei Hu, Jie Zhou, Qin Chen, Bo Jiang, Liang He | 2024-10 | CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management | https://github.com/ECNU-ICALK/SocraticMath | https://dl.acm.org/doi/10.1145/3627673.3679881 |
262 | VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp | 2024-09-30 | arXiv | https://github.com/mayhugotong/VideoINSTA | http://arxiv.org/abs/2409.20365v2 |
263 | RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models | Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang | 2024-09-30 | arXiv | https://github.com/shuhao02/RouterDC | https://doi.org/10.48550/arXiv.2409.19886 |
264 | LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models | Haitao Li, You Chen, Qingyao Ai, Yueyue Wu, Ruizhe Zhang, Yiqun Liu | 2024-09-30 | arXiv | https://github.com/CSHaitao/LexEval | https://doi.org/10.48550/arXiv.2409.20288 |
265 | LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation | Ziyao Zhang, Yanlin Wang, Chong Wang, Jiachi Chen, Zibin Zheng | 2024-09-30 | arXiv | https://github.com/DeepSoftwareAnalytics/LLMCodingHallucination | http://arxiv.org/abs/2409.20550v1 |
266 | Do Influence Functions Work on Large Language Models? | Zhe Li, Wei Zhao, Yige Li, Jun Sun | 2024-09-30 | arXiv | https://github.com/plumprc/Failures-of-Influence-Functions-in-LLMs | https://doi.org/10.48550/arXiv.2409.19998 |
267 | Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models | Luohe Shi, Yao Yao, Zuchao Li, Lefei Zhang, Hai Zhao | 2024-09-30 | arXiv | https://github.com/ShiLuohe/ReferenceTrustableDecoding | https://doi.org/10.48550/arXiv.2409.20181 |
268 | A multimodal LLM for the non-invasive decoding of spoken text from brain recordings | Youssef Hmamouche, Ismail Chihab, Lahoucine Kdouri, Amal El Fallah Seghrouchni | 2024-09-29 | arXiv | https://github.com/Hmamouche/brain_decode | http://arxiv.org/abs/2409.19710v1 |
269 | BuildingView: Constructing Urban Building Exteriors Databases with Street View Imagery and Multimodal Large Language Mode | Zongrong Li, Yunlei Su, Chenyuan Zhu, Wufan Zhao | 2024-09-29 | arXiv | https://github.com/Jasper0122/BuildingView | https://doi.org/10.48550/arXiv.2409.19527 |
270 | Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models | Xin Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang | 2024-09-29 | arXiv | https://github.com/BUPT-GAMMA/ProGraph | https://doi.org/10.48550/arXiv.2409.19667 |
271 | Identifying Knowledge Editing Types in Large Language Models | Xiaopeng Li, Shangwen Wang, Shezheng Song, Bin Ji, Huijun Liu, Shasha Li, Jun Ma, Jie Yu | 2024-09-29 | arXiv | https://github.com/xpq-tech/KETI | https://doi.org/10.48550/arXiv.2409.19663 |
272 | OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation | Tanvir Mahmud, Diana Marculescu | 2024-09-28 | arXiv | https://github.com/tanvir-utexas/OpenSep | https://doi.org/10.48550/arXiv.2409.19270 |
273 | Enhancing text-based knowledge graph completion with zero-shot large language models: A focus on semantic enhancement | Rui Yang, Jiahao Zhu, Jianping Man, Li Fang, Yi Zhou | 2024-09-27 | Knowl. Based Syst. | https://github.com/sjlmg/CP-KGC | https://doi.org/10.1016/j.knosys.2024.112155 |
274 | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | Gongfan Fang, Hongxu Yin, Saurav Muralidharan, Greg Heinrich, Jeff Pool, Jan Kautz, Pavlo Molchanov, Xinchao Wang | 2024-09-27 | arXiv | https://github.com/NVlabs/MaskLLM | https://doi.org/10.48550/arXiv.2409.17481 |
275 | HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection | Xuefeng Du, Chaowei Xiao, Yixuan Li | 2024-09-27 | arXiv | https://github.com/deeplearningwisc/haloscope | http://arxiv.org/abs/2409.17504v1 |
276 | From News to Forecast: Integrating Event Analysis in LLM-Based Time Series Forecasting with Reflection | Xinlei Wang, Maike Feng, Jing Qiu, Jinjin Gu, Junhua Zhao | 2024-09-27 | arXiv | https://github.com/ameliawong1996/From_News_to_Forecast | http://arxiv.org/abs/2409.17515v2 |
277 | AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment | Nan Sun, Bo Mao, Yongchang Li, Lumeng Ma, Di Guo, Huaping Liu | 2024-09-27 | arXiv | https://assistantx-agent.github.io/AssistantX/ | http://arxiv.org/abs/2409.17655v1 |
278 | CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Kanghyun Ryu, Qiayuan Liao, Zhongyu Li, Koushil Sreenath, Negar Mehr | 2024-09-27 | arXiv | https://github.com/labicon/CurricuLLM | https://doi.org/10.48550/arXiv.2409.18382 |
279 | Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation | Hongzhe Huang, Zhewen Yu, Jiang Liu, Li Cai, Dian Jiao, Wenqiao Zhang, Siliang Tang, Juncheng Li, Hao Jiang, Haoyuan Li, Yueting Zhuang | 2024-09-27 | arXiv | https://github.com/DCDmllm/Align2LLaVA | https://doi.org/10.48550/arXiv.2409.18541 |
280 | A Survey on the Honesty of Large Language Models | Siheng Li, Cheng Yang, Taiqiang Wu, Chufan Shi, Yuji Zhang, Xinyu Zhu, Zesen Cheng, Deng Cai, Mo Yu, Lemao Liu, Jie Zhou, Yujiu Yang, Ngai Wong, Xixin Wu, Wai Lam | 2024-09-27 | arXiv | https://github.com/SihengLi99/LLM-Honesty-Survey | https://doi.org/10.48550/arXiv.2409.18786 |
281 | Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, Yuelin Bai, Run Luo, Longze Chen, Min Yang | 2024-09-27 | arXiv | https://github.com/Geaming2002/Ruler | https://doi.org/10.48550/arXiv.2409.18943 |
282 | Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models | Georg Ahnert, Max Pellert, David Garcia, Markus Strohmaier | 2024-09-26 | arXiv | https://github.com/dess-mannheim/temporal-adapters | http://arxiv.org/abs/2409.17990v1 |
283 | RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking | Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee | 2024-09-26 | arXiv | https://github.com/kriti-hippo/red_queen | https://doi.org/10.48550/arXiv.2409.17458 |
284 | LLM-CARD: Towards a Description and Landscape of Large Language Models | Shengwei Tian, Lifeng Han, Erick Mendez Guzman, Goran Nenadic | 2024-09-25 | arXiv | https://github.com/shengwei-tian/dependency-parser-visualization | https://doi.org/10.48550/arXiv.2409.17011 |
285 | Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness | Shixuan Ma, Quan Wang | 2024-09-25 | arXiv | https://github.com/Shixuan-Ma/TOCSIN | http://arxiv.org/abs/2409.16914v1 |
286 | Search for Efficient Large Language Models | Xuan Shen, Pu Zhao, Yifan Gong, Zhenglun Kong, Zheng Zhan, Yushu Wu, Ming Lin, Chao Wu, Xue Lin, Yanzhi Wang | 2024-09-25 | arXiv | https://github.com/shawnricecake/search-llm | https://doi.org/10.48550/arXiv.2409.17372 |
287 | DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | Kyuheon Jung, Yongdeuk Seo, Seongwoo Cho, Jaeyoung Kim, Hyun-seok Min, Sungchul Choi | 2024-09-25 | arXiv | https://github.com/kkyuhun94/dalda | http://arxiv.org/abs/2409.16949v1 |
288 | HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows | Wenlin Yao, Haitao Mi, Dong Yu | 2024-09-25 | arXiv | https://github.com/wenlinyao/HDFlow | http://arxiv.org/abs/2409.17433v1 |
289 | EventHallusion: Diagnosing Event Hallucinations in Video LLMs | Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang | 2024-09-25 | arXiv | https://github.com/Stevetich/EventHallusion | http://arxiv.org/abs/2409.16597v1 |
290 | Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | Zhenmei Shi, Yifei Ming, Xuan-Phi Nguyen, Yingyu Liang, Shafiq Joty | 2024-09-25 | arXiv | https://github.com/SalesforceAIResearch/GemFilter | http://arxiv.org/abs/2409.17422v1 |
291 | CHBench: A Chinese Dataset for Evaluating Health in Large Language Models | Chenlu Guo, Nuo Xu, Yi Chang, Yuan Wu | 2024-09-24 | arXiv | https://github.com/TracyGuo2001/CHBench | https://doi.org/10.48550/arXiv.2409.15766 |
292 | HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen | 2024-09-24 | arXiv | https://github.com/Quehry/HelloBench | https://doi.org/10.48550/arXiv.2409.16191 |
293 | XTRUST: On the Multilingual Trustworthiness of Large Language Models | Yahan Li, Yi Wang, Yi Chang, Yuan Wu | 2024-09-24 | arXiv | https://github.com/LluckyYH/XTRUST | https://doi.org/10.48550/arXiv.2409.15762 |
294 | Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method | Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | 2024-09-23 | arXiv | https://github.com/zhang-wei-chao/DC-PDD | https://doi.org/10.48550/arXiv.2409.14781 |
295 | COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models | Kehui Liu, Zixin Tang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li | 2024-09-23 | arXiv | https://github.com/MrKeee/COHERENT | https://doi.org/10.48550/arXiv.2409.15146 |
296 | Phantom of Latent for Large Language and Vision Models | Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro | 2024-09-23 | arXiv | https://github.com/ByungKwanLee/Phantom | https://doi.org/10.48550/arXiv.2409.14713 |
297 | Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses | Hung-Ting Su, Ya-Ching Hsu, Xudong Lin, Xiang Qian Shi, Yulei Niu, Han-Yuan Hsu, Hung-yi Lee, Winston H. Hsu | 2024-09-22 | arXiv | https://github.com/Shelley1214/Trope | https://doi.org/10.48550/arXiv.2409.14324 |
298 | PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL | Ruilin Luo, Liyuan Wang, Binghuai Lin, Zicheng Lin, Yujiu Yang | 2024-09-21 | arXiv | https://github.com/lrlbbzl/PTD-SQL | http://arxiv.org/abs/2409.14082v1 |
299 | StateAct: State Tracking and Reasoning for Acting and Planning with Large Language Models | Nikolai Rozanov, Marek Rei | 2024-09-21 | arXiv | https://github.com/ai-nikolai/StateAct | https://doi.org/10.48550/arXiv.2410.02810 |
300 | ProcessTBench: An LLM Plan Generation Dataset for Process Mining | Andrei Cosmin Redis, Mohammadreza Fani Sani, Bahram Zarrin, Andrea Burattin | 2024-09-20 | arXiv | https://github.com/microsoft/ProcessTBench | http://arxiv.org/abs/2409.09191v2 |
301 | ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources | Shuting Yang, Zehui Liu, Wolfgang Mayer | 2024-09-20 | arXiv | https://github.com/Zaiwen/CropGPT | https://doi.org/10.48550/arXiv.2409.13537 |
302 | Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Jing Qin | 2024-09-20 | arXiv | https://github.com/zhangpeii/Edu-Values | https://doi.org/10.48550/arXiv.2409.12739 |
303 | CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information | Yuxin Wang, Minghua Ma, Zekun Wang, Jingchang Chen, Huiming Fan, Liping Shan, Qing Yang, Dongliang Xu, Ming Liu, Bing Qin | 2024-09-20 | arXiv | https://github.com/wyxscir/CFSP | http://arxiv.org/abs/2409.13199v1 |
304 | HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | Junyi Chen, Lu Chi, Bingyue Peng, Zehuan Yuan | 2024-09-19 | arXiv | https://github.com/bytedance/HLLM | https://doi.org/10.48550/arXiv.2409.12740 |
305 | Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models | Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang G. Cai | 2024-09-19 | arXiv | https://github.com/ChenDelong1999/Linguistic-Similarity | https://doi.org/10.48550/arXiv.2409.12435 |
306 | CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | Tsung-Han Wu, Joseph E. Gonzalez, Trevor Darrell, David M. Chan | 2024-09-19 | arXiv | https://github.com/DavidMChan/clair-a | https://doi.org/10.48550/arXiv.2409.12962 |
307 | Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources | Issey Sukeda | 2024-09-19 | arXiv | https://github.com/stardust-coder/japanese-lm-med-harness | https://doi.org/10.48550/arXiv.2409.11783 |
308 | BodyShapeGPT: SMPL Body Shape Manipulation with LLMs | Baldomero R. Árbol, Dan Casas | 2024-09-18 | arXiv | https://github.com/baldoarbol/BodyShapeGPT | http://arxiv.org/abs/2410.03556v1 |
309 | Large Language Models Are Strong Audio-Visual Speech Recognition Learners | Umberto Cappellazzo, Minsu Kim, Honglie Chen, Pingchuan Ma, Stavros Petridis, Daniele Falavigna, Alessio Brutti, Maja Pantic | 2024-09-18 | arXiv | https://github.com/umbertocappellazzo/AVSR-LLMs | https://doi.org/10.48550/arXiv.2409.12319 |
310 | Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | Fatemeh Haji, Mazal Bethany, Maryam Tabar, Jason Chiang, Anthony Rios, Peyman Najafirad | 2024-09-17 | arXiv | https://github.com/SecureAIAutonomyLab/MA-ToT | http://arxiv.org/abs/2409.11527v1 |
311 | Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Dingjie Song, Wenjun Wang, Shunian Chen, Xidong Wang, Michael Guan, Benyou Wang | 2024-09-17 | arXiv | https://github.com/FreedomIntelligence/TRIM | http://arxiv.org/abs/2409.10994v1 |
312 | NVLM: Open Frontier-Class Multimodal LLMs | Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuoling Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping | 2024-09-17 | arXiv | https://nvlm-project.github.io/ | http://arxiv.org/abs/2409.11402v2 |
313 | Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges | Vinay Samuel, Yue Zhou, Henry Peng Zou | 2024-09-16 | arXiv | https://github.com/vsamuel2003/data-contamination | https://doi.org/10.48550/arXiv.2409.09927 |
314 | HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making | Sumera Anjum, Hanzhi Zhang, Wenjun Zhou, Eun Jin Paek, Xiaopeng Zhao, Yunhe Feng | 2024-09-16 | arXiv | https://github.com/ResponsibleAILab/HALO | http://arxiv.org/abs/2409.10011v2 |
315 | Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models | Weihao Ye, Qiong Wu, Wenhao Lin, Yiyi Zhou | 2024-09-16 | arXiv | https://github.com/ywh187/FitPrune | https://doi.org/10.48550/arXiv.2409.10197 |
316 | Do Large Language Models Need a Content Delivery Network? | Yihua Cheng, Kuntai Du, Jiayi Yao, Junchen Jiang | 2024-09-16 | arXiv | https://github.com/LMCache/LMCache | https://doi.org/10.48550/arXiv.2409.13761 |
317 | Benchmarking Large Language Model Uncertainty for Prompt Optimization | Pei-Fu Guo, Yun-Da Tsai, Shou-De Lin | 2024-09-16 | arXiv | https://github.com/0Frett/PO-Uncertainty-Benchmarking | https://doi.org/10.48550/arXiv.2409.10044 |
318 | AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs | Madhusudan Ghosh, Shrimon Mukherjee, Asmit Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar, Debasis Ganguly | 2024-09-15 | arXiv | https://github.com/shrimonmuke0202/AlpaPICO | http://arxiv.org/abs/2409.09704v1 |
319 | Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition | Zongyou Yu, Qiang Qu, Xiaoming Chen, Chen Wang | 2024-09-15 | arXiv | https://github.com/ChrisYu-Zz/Pure-event-based-recognition-based-LLM | https://doi.org/10.48550/arXiv.2409.09628 |
320 | Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model | Bo-Kai Ruan, Hao-Tang Tsui, Yung-Hui Li, Hong-Han Shuai | 2024-09-15 | arXiv | https://basiclab.github.io/TTSG | https://doi.org/10.48550/arXiv.2409.09575 |
321 | LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach | Kunlong Chen, Junjun Wang, Zhaoqun Chen, Kunjin Chen, Yitian Chen | 2024-09-14 | arXiv | https://github.com/Cklwanfifa/KDDCUP2024-PST | http://arxiv.org/abs/2409.09383v2 |
322 | PeriGuru: A Peripheral Robotic Mobile App Operation Assistant based on GUI Image Understanding and Prompting with LLM | Kelin Fu, Yang Tian, Kaigui Bian | 2024-09-14 | arXiv | https://github.com/Z2sJ4t/PeriGuru | http://arxiv.org/abs/2409.09354v1 |
323 | L3Cube-IndicQuest: A Benchmark Question Answering Dataset for Evaluating Knowledge of LLMs in Indic Context | Pritika Rohera, Chaitrali Ginimav, Akanksha Salunke, Gayatri Sawant, Raviraj Joshi | 2024-09-13 | arXiv | https://github.com/l3cube-pune/indic-nlp | http://arxiv.org/abs/2409.08706v1 |
324 | FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition | Zhenhua Xu, Wenpeng Xing, Zhebo Wang, Chang Hu, Chen Jie, Meng Han | 2024-09-13 | arXiv | https://fingerprintvector.github.io | https://doi.org/10.48550/arXiv.2409.08846 |
325 | Intelligent LiDAR Navigation: Leveraging External Information and Semantic Maps with LLM as Copilot | Fujing Xie, Jiajie Zhang, Sören Schwertfeger | 2024-09-13 | arXiv | https://github.com/xiexiexiaoxiexie/Intelligent-LiDAR-Navigation-LLM-as-Copilot | http://arxiv.org/abs/2409.08493v1 |
326 | TAIiST CPS-UAV at the SBFT Tool Competition 2024 | T. Zhu, W. Newton, S. Embury, Y. Sun | 2024-09-12 | 2024 IEEE/ACM International Workshop on Search-Based and Fuzz Testing (SBFT) | https://github.com/Trusted-AI-in-System-Test | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10699540 |
327 | Fine-tuning Large Language Models for Entity Matching | Aaron Steiner, Ralph Peeters, Christian Bizer | 2024-09-12 | arXiv | https://github.com/wbsg-uni-mannheim/TailorMatch | https://doi.org/10.48550/arXiv.2409.08185 |
328 | LLM Detectors Still Fall Short of Real World: Case of LLM-Generated Short News-Like Posts | Henrique Da Silva Gameiro, Andrei Kucharavy, Ljiljana Dolamic | 2024-09-11 | arXiv | https://github.com/Reliable-Information-Lab-HEVS/dynamic_llm_detector_benchmark | http://arxiv.org/abs/2409.03291v1 |
329 | Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation | SeongYeub Chu, JongWoo Kim, MunYong Yi | 2024-09-11 | arXiv | https://github.com/BBeeChu/InteractEval | http://arxiv.org/abs/2409.07355v1 |
330 | LLaMA-Omni: Seamless Speech Interaction with Large Language Models | Qingkai Fang, Shoutao Guo, Yan Zhou, Zhengrui Ma, Shaolei Zhang, Yang Feng | 2024-09-11 | arXiv | https://github.com/ictnlp/LLaMA-Omni | https://doi.org/10.48550/arXiv.2409.06666 |
331 | Understanding Knowledge Drift in LLMs through Misinformation | Alina Fastowski, Gjergji Kasneci | 2024-09-11 | arXiv | https://github.com/afastowski/knowledge_drift | http://arxiv.org/abs/2409.07085v1 |
332 | Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models | Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu | 2024-09-11 | arXiv | https://github.com/allen4747/Ferret | https://doi.org/10.48550/arXiv.2409.06277 |
333 | DrLLM: Prompt-Enhanced Distributed Denial-of-Service Resistance Method with Large Language Models | Zhenyu Yin, Shang Liu, Guangyuan Xu | 2024-09-11 | arXiv | https://github.com/liuup/DrLLM | https://doi.org/10.48550/arXiv.2409.10561 |
334 | AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs | Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu | 2024-09-11 | arXiv | https://github.com/Yummy416/AdaPPA | http://arxiv.org/abs/2409.07503v1 |
335 | Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Meng Zhou, Surajsinh Parmar, Anubhav Bhatti | 2024-09-10 | arXiv | https://github.com/SpassMed/Med-Llama3 | https://doi.org/10.48550/arXiv.2409.05732 |
336 | What is the Role of Small Models in the LLM Era: A Survey | Lihu Chen, Gaël Varoquaux | 2024-09-10 | arXiv | https://github.com/tigerchen52/role_of_small_models | http://arxiv.org/abs/2409.06857v2 |
337 | Benchmarking Chinese Knowledge Rectification in Large Language Models | Tianhe Lu, Jizhan Fang, Yunzhi Yao, Xin Xu, Ningyu Zhang, Huajun Chen | 2024-09-09 | arXiv | https://github.com/zjunlp/EasyEdit | https://doi.org/10.48550/arXiv.2409.05806 |
338 | FLoRA: Federated Fine-Tuning Large Language Models with Heterogeneous Low-Rank Adaptations | Ziyao Wang, Zheyu Shen, Yexiao He, Guoheng Sun, Hongyi Wang, Lingjuan Lyu, Ang Li | 2024-09-09 | arXiv | https://github.com/ATP-1010/FederatedLLM | https://doi.org/10.48550/arXiv.2409.05976 |
339 | Rome was Not Built in a Single Step: Hierarchical Prompting for LLM-based Chip Design | Andre Nakkab, Sai Qian Zhang, Ramesh Karri, Siddharth Garg | 2024-09-09 | MLCAD '24: Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD | https://github.com/ajn313/ROME-LLM | https://dl.acm.org/doi/10.1145/3670474.3685964 |
340 | OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs | Jintian Zhang, Cheng Peng, Mengshu Sun, Xiang Chen, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen, Ningyu Zhang | 2024-09-08 | arXiv | https://github.com/zjunlp/OneGen | http://arxiv.org/abs/2409.05152v1 |
341 | Multi-Programming Language Ensemble for Code Generation in Large Language Model | Tengfei Xue, Xuefeng Li, Tahir Azim, Roman Smirnov, Jianhui Yu, Arash Sadrieh, Babak Pahlavan | 2024-09-06 | arXiv | https://github.com/NinjaTech-AI/MPLE | https://doi.org/10.48550/arXiv.2409.04114 |
342 | Sirius: Contextual Sparsity with Correction for Efficient LLMs | Yang Zhou, Zhuoming Chen, Zhaozhuo Xu, Victoria Lin, Beidi Chen | 2024-09-05 | arXiv | https://github.com/Infini-AI-Lab/Sirius | http://arxiv.org/abs/2409.03856v1 |
343 | Sketch: A Toolkit for Streamlining LLM Operations | Xin Jiang, Xiang Li, Wenjia Ma, Xuezhi Fang, Yiqun Yao, Naitong Yu, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang | 2024-09-05 | arXiv | https://github.com/cofe-ai/Sketch | http://arxiv.org/abs/2409.03346v1 |
344 | Attention Heads of Large Language Models: A Survey | Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, Zhiyu Li | 2024-09-05 | arXiv | https://github.com/IAAR-Shanghai/Awesome-Attention-Heads | https://doi.org/10.48550/arXiv.2409.03752 |
345 | Planning In Natural Language Improves LLM Search For Code Generation | Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang | 2024-09-05 | arXiv | https://github.com/scaleapi/plansearch | http://arxiv.org/abs/2409.03733v1 |
346 | Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models | Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, Lizhen Cui | 2024-09-05 | arXiv | https://github.com/reml-group/DoG | https://doi.org/10.48550/arXiv.2409.03155 |
347 | Alignment-Aware Model Extraction Attacks on Large Language Models | Zi Liang, Qingqing Ye, Yanyun Wang, Sen Zhang, Yaxin Xiao, Ronghua Li, Jianliang Xu, Haibo Hu | 2024-09-04 | arXiv | https://github.com/liangzid/alignmentExtraction | https://doi.org/10.48550/arXiv.2409.02718 |
348 | Hypothesizing Missing Causal Variables with LLMs | Ivaxi Sheth, Sahar Abdelnabi, Mario Fritz | 2024-09-04 | arXiv | https://github.com/ivaxi0s/hypothesizing-causal-variable-llm | http://arxiv.org/abs/2409.02604v1 |
349 | Large Language Model-Based Agents for Software Engineering: A Survey | Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, Yiling Lou | 2024-09-04 | arXiv | https://github.com/FudanSELab/Agent4SE-Paper-List | https://doi.org/10.48550/arXiv.2409.02977 |
350 | Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? | Yixuan Tang, Yi Yang | 2024-09-04 | arXiv | https://github.com/yixuantt/PoolingAndAttn | http://arxiv.org/abs/2409.02727v1 |
351 | MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs | Saeid Asgari Taghanaki, Aliasghar Khani, Amir Khasahmadi | 2024-09-03 | arXiv | https://github.com/asgsaeid/mmlu-pro-plus | http://arxiv.org/abs/2409.02257v1 |
352 | Foundations of Large Language Model Compression - Part 1: Weight Quantization | Sean I. Young | 2024-09-03 | arXiv | https://github.com/seannz/cvxq | https://doi.org/10.48550/arXiv.2409.02026 |
353 | Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor | Abdullah Arafat Miah, Yu Bi | 2024-09-03 | arXiv | https://github.com/SiSL-URI/Arch_Backdoor_LLM | https://doi.org/10.48550/arXiv.2409.01952 |
354 | Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation | Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | 2024-09-03 | arXiv | https://github.com/git-disl/Booster | https://doi.org/10.48550/arXiv.2409.01586 |
355 | Agentic Society: Merging skeleton from real world and texture from Large Language Model | Yuqi Bai, Kun Sun, Huishi Yin | 2024-09-02 | arXiv | https://github.com/baiyuqi/agentic-society | https://doi.org/10.48550/arXiv.2409.10550 |
356 | FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment | Ran Yan, Youhe Jiang, Wangcheng Tao, Xiaonan Nie, Bin Cui, Binhang Yuan | 2024-09-02 | arXiv | https://github.com/Relaxed-System-Lab/FlashFlex | https://doi.org/10.48550/arXiv.2409.01143 |
357 | Large Language Models versus Classical Machine Learning: Performance in COVID-19 Mortality Prediction Using High-Dimensional Tabular Data | Mohammadreza Ghaffarzadeh-Esfahani, Mahdi Ghaffarzadeh-Esfahani, Arian Salahi-Niri, Hossein Toreyhi, Zahra Atf, Amirali Mohsenzadeh-Kermani, Mahshad Sarikhani, Zohreh Tajabadi, Fatemeh Shojaeian, Mohammad Hassan Bagheri, Aydin Feyzi, Mohammadamin Tarighatpayma, Narges Gazmeh, Fateme Heydari, Hossein Afshar, Amirreza Allahgholipour, Farid Alimardani, Ameneh Salehi, Naghmeh Asadimanesh, Mohammad Amin Khalafi, Hadis Shabanipour, Ali Moradi, Sajjad Hossein Zadeh, Omid Yazdani, Romina Esbati, Moozhan Maleki, Danial Samiei Nasr, Amirali Soheili, Hossein Majlesi, Saba Shahsavan, Alireza Soheilipour, Nooshin Goudarzi, Erfan Taherifard, Hamidreza Hatamabadi, Jamil S. Samaan, Thomas Savage, Ankit Sakhuja, Ali Soroush, Girish N. Nadkarni, Ilad Alavi Darazam, Mohamad Amin Pourhoseingholi, Seyed Amir Ahmad Safavi-Naini | 2024-09-02 | arXiv | https://github.com/mohammad-gh009/Large-Language-Models-vs-Classical-Machine-learning | https://doi.org/10.48550/arXiv.2409.02136 |
358 | Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke | 2024-09-02 | arXiv | https://github.com/Workday/cpc | http://arxiv.org/abs/2409.01227v2 |
359 | Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models | Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang | 2024-09-01 | arXiv | https://github.com/umd-huang-lab/FalseRefusal | https://doi.org/10.48550/arXiv.2409.00598 |
360 | Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering | Derian Boer, Fabian Koch, Stefan Kramer | 2024-09-01 | arXiv | https://github.com/kramerlab/4StepFocus | http://arxiv.org/abs/2409.00861v1 |
361 | Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges | Mohammed Elhenawy, Ahmad Abutahoun, Taqwa I. Alhadidi, Ahmed Jaber, Huthaifa I. Ashqar, Shadi Jaradat, Ahmed Abdelhay, Sebastien Glaser, Andry Rakotonirainy | 2024-09-01 | arXiv | https://github.com/ahmed-abdulhuy/Solving-TSP-and-mTSP-Combinatorial-Challenges-using-Visual-Reasoning-and-Multi-Agent-Approach-MLLMs- | https://doi.org/10.48550/arXiv.2407.00092 |
362 | Large Language Models for Software Engineering: A Systematic Literature Review | Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John C. Grundy, Haoyu Wang | 2024-09 | ACM Transactions on Software Engineering and Methodology (TOSEM), Just Accepted | https://github.com/xinyi-hou/LLM4SE_SLR | https://dl.acm.org/doi/10.1145/3695988 |
363 | AskIt: Unified Programming Interface for Programming with Large Language Models | Katsumi Okuda, Saman P. Amarasinghe | 2024-09 | 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) | https://github.com/katsumiok/ts-askit | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10444830 |
364 | LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi | 2024-08-31 | arXiv | https://github.com/zhiyuanhubj/LongRecipe | https://doi.org/10.48550/arXiv.2409.00509 |
365 | MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models | Shuai Peng, Di Fu, Liangcai Gao, Xiuqin Zhong, Hongguang Fu, Zhi Tang | 2024-08-30 | arXiv | https://github.com/pengshuai-rin/MultiMath | https://doi.org/10.48550/arXiv.2409.00147 |
366 | A Survey on Evaluation of Large Language Models | Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Kaijie Zhu, Hao Chen, Linyi Yang, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie | 2024-08-28 | ACM Transactions on Intelligent Systems and Technology (TIST), Volume 15, Issue 3 | https://llm-eval.github.io/ | https://dl.acm.org/doi/10.1145/3641289 |
367 | CBF-LLM: Safe Control for LLM Alignment | Yuya Miyaoka, Masaki Inoue | 2024-08-28 | arXiv | https://github.com/Mya-Mya/CBF-LLM | http://arxiv.org/abs/2408.15625v1 |
368 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu | 2024-08-28 | arXiv | https://github.com/NVlabs/Eagle | http://arxiv.org/abs/2408.15998v1 |
369 | Efficient LLM Scheduling by Learning to Rank | Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang | 2024-08-28 | arXiv | https://github.com/hao-ai-lab/vllm-ltr | http://arxiv.org/abs/2408.15792v1 |
370 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu | 2024-08-28 | arXiv | https://github.com/Yaphabates/Rocket | https://doi.org/10.48550/arXiv.2408.15915 |
371 | RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | Junyao Ge, Yang Zheng, Kaitai Guo, Jimin Liang | 2024-08-27 | arXiv | https://github.com/SlytherinGe/RSTeller | https://doi.org/10.48550/arXiv.2408.14744 |
372 | PAT: Pruning-Aware Tuning for Large Language Models | Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du | 2024-08-27 | arXiv | https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning | https://doi.org/10.48550/arXiv.2408.14721 |
373 | LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models | Haven Kim, Kahyun Choi | 2024-08-27 | arXiv | https://github.com/havenpersona/lycon | https://doi.org/10.48550/arXiv.2408.14750 |
374 | AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework | Jie Feng, Yuwei Du, Jie Zhao, Yong Li | 2024-08-26 | arXiv | https://github.com/tsinghua-fib-lab/AgentMove | https://doi.org/10.48550/arXiv.2408.13986 |
375 | CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation | Muhammad Fawi | 2024-08-26 | arXiv | https://github.com/MNoorFawi/curlora | http://arxiv.org/abs/2408.14572v1 |
376 | Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System | Sein Kim, Hongseok Kang, Seungyoon Choi, Donghyun Kim, Min-Chul Yang, Chanyoung Park | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/ghdtjr/A-LLMRec | https://dl.acm.org/doi/10.1145/3637528.3671931 |
377 | Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Seyed Amir Ahmad Safavi-Naini, Shuhaib Ali, Omer Shahab, Zahra Shahhoseini, Thomas Savage, Sara Rafiee, Jamil S. Samaan, Reem Al Shabeeb, Farah Ladak, Jamie O. Yang, Juan Echavarria, Sumbal Babar, Aasma Shaukat, Samuel Margolis, Nicholas P. Tatonetti, Girish N. Nadkarni, Bara El Kurdi, Ali Soroush | 2024-08-25 | arXiv | https://github.com/Sdamirsa/LLM-VLM-in-Gastroenterology | https://doi.org/10.48550/arXiv.2409.00084 |
378 | Understanding the Weakness of Large Language Model Agents within a Complex Android Environment | Mingzhe Xing, Rongkai Zhang, Hui Xue, Qi Chen, Fan Yang, Zhen Xiao | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/AndroidArenaAgent/AndroidArena | https://dl.acm.org/doi/10.1145/3637528.3671650 |
379 | RecExplainer: Aligning Large Language Models for Explaining Recommendation Models | Yuxuan Lei, Jianxun Lian, Jing Yao, Xu Huang, Defu Lian, Xing Xie | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/microsoft/RecAI | https://dl.acm.org/doi/10.1145/3637528.3671802 |
380 | R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models | Shangqing Tu, Yuanchun Wang, Jifan Yu, Yuyang Xie, Yaran Shi, Xiaozhi Wang, Jing Zhang, Lei Hou, Juanzi Li | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/THU-KEG/R-Eval | https://dl.acm.org/doi/10.1145/3637528.3671564 |
381 | OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning | Rui Ye, Wenhao Wang, Jingyi Chai, Dihan Li, Zexi Li, Yinda Xu, Yaxin Du, Yanfeng Wang, Siheng Chen | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/rui-ye/OpenFedLLM | https://dl.acm.org/doi/10.1145/3637528.3671582 |
382 | A Survey of Large Language Models for Graphs | Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh V. Chawla, Chao Huang | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/HKUDS/Awesome-LLM4Graph-Papers | https://dl.acm.org/doi/10.1145/3637528.3671460 |
383 | Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network | Lin Chen, Fengli Xu, Nian Li, Zhenyu Han, Meng Wang, Yong Li, Pan Hui | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/LinChen-65/ReStruct | https://dl.acm.org/doi/10.1145/3637528.3671965 |
384 | Bias and Unfairness in Information Retrieval Systems: New Challenges in the LLM Era | Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, Jun Xu | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://llm-ir-bias-fairness.github.io/ | https://dl.acm.org/doi/10.1145/3637528.3671458 |
385 | AutoWebGLM: A Large Language Model-based Web Navigating Agent | Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://github.com/THUDM/AutoWebGLM | https://dl.acm.org/doi/10.1145/3637528.3671620 |
386 | A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models | Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li | 2024-08-25 | KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining | https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/ | https://dl.acm.org/doi/10.1145/3637528.3671470 |
387 | ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models | Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang | 2024-08-25 | arXiv | https://github.com/yejipark-m/ConVis | https://doi.org/10.48550/arXiv.2408.13906 |
388 | HRGraph: Leveraging LLMs for HR Data Knowledge Graphs with Information Propagation-based Job Recommendation | Azmine Toushik Wasi | 2024-08-24 | Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024), Association for Computational Linguistics 2024 | https://github.com/azminewasi/HRGraph | http://arxiv.org/abs/2408.13521v1 |
389 | LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs | Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang, Sunghun Kim | 2024-08-24 | arXiv | https://github.com/deep-diver/llamaduo | http://arxiv.org/abs/2408.13467v2 |
390 | vitaLITy 2: Reviewing Academic Literature Using Large Language Models | Hongye An, Arpit Narechania, Emily Wall, Kai Xu | 2024-08-24 | arXiv | https://vitality-vis.github.io | https://doi.org/10.48550/arXiv.2408.13450 |
391 | LIMP: Large Language Model Enhanced Intent-aware Mobility Prediction | Songwei Li, Jie Feng, Jiawei Chi, Xinyuan Hu, Xiaomeng Zhao, Fengli Xu | 2024-08-23 | arXiv | https://github.com/tsinghua-fib-lab/LIMP | https://doi.org/10.48550/arXiv.2408.12832 |
392 | MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan | 2024-08-23 | arXiv | https://mme-realworld.github.io/ | http://arxiv.org/abs/2408.13257v1 |
393 | LLM-PBE: Assessing Data Privacy in Large Language Models | Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song | 2024-08-23 | Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11 | https://llm-pbe.github.io/ | https://dl.acm.org/doi/10.14778/3681954.3681994 |
394 | Generating Analytic Specifications for Data Visualization from Natural Language Queries using Large Language Models | Subham Sah, Rishab Mitra, Arpit Narechania, Alex Endert, John T. Stasko, Wenwen Dou | 2024-08-23 | arXiv | https://nl4dv.github.io | https://doi.org/10.48550/arXiv.2408.13391 |
395 | IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities | Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin | 2024-08-23 | arXiv | https://github.com/360CVGroup/Inner-Adaptor-Architecture | https://doi.org/10.48550/arXiv.2408.12902 |
396 | BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models | Yige Li, Hanxun Huang, Yunhan Zhao, Xingjun Ma, Jun Sun | 2024-08-23 | arXiv | https://github.com/bboylyg/BackdoorLLM | https://doi.org/10.48550/arXiv.2408.12798 |
397 | GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models | Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu | 2024-08-22 | arXiv | https://github.com/kstanghere/GenderCARE-ccs24 | https://doi.org/10.48550/arXiv.2408.12494 |
398 | Towards Evaluating and Building Versatile Large Language Models for Medicine | Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie | 2024-08-22 | arXiv | https://henrychur.github.io/MedS-Bench/ | https://doi.org/10.48550/arXiv.2408.12547 |
399 | Reasoning Factual Knowledge in Structured Data with Large Language Models | Sirui Huang, Yanggan Gu, Xuming Hu, Zhonghao Li, Qing Li, Guandong Xu | 2024-08-22 | arXiv | https://github.com/EganGu/StructFact | https://doi.org/10.48550/arXiv.2408.12188 |
400 | MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents | Congchi Yin, Feng Li, Shu Zhang, Zike Wang, Jun Shao, Piji Li, Jianhua Chen, Xun Jiang | 2024-08-22 | arXiv | https://github.com/lemonsis/MDD-5k | http://arxiv.org/abs/2408.12142v1 |
401 | Controllable Text Generation for Large Language Models: A Survey | Xun Liang, Hanyu Wang, Yezhaohui Wang, Shichao Song, Jiawei Yang, Simin Niu, Jie Hu, Dan Liu, Shunyu Yao, Feiyu Xiong, Zhiyu Li | 2024-08-22 | arXiv | https://github.com/IAAR-Shanghai/CTGSurvey | https://doi.org/10.48550/arXiv.2408.12599 |
402 | Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs | Ronit Singhal, Pransh Patwa, Parth Patwa, Aman Chadha, Amitava Das | 2024-08-22 | arXiv | https://github.com/ronit-singhal/evidence-backed-fact-checking-using-rag-and-few-shot-in-context-learning-with-llms | http://arxiv.org/abs/2408.12060v1 |
403 | Enhanced Fine-Tuning of Lightweight Domain-Specific Q&A Model Based on Large Language Models | Shenglin Zhang, Pengtian Zhu, Minghua Ma, Jiagang Wang, Yongqian Sun, Dongwen Li, Jingyu Wang, Qianying Guo, Xiaolei Hua, Lin Zhu, Dan Pei | 2024-08-22 | arXiv | https://github.com/Zero-Pointer/Self-Evolution | https://doi.org/10.48550/arXiv.2408.12247 |
404 | Aligning (Medical) LLMs for (Counterfactual) Fairness | Raphael Poulain, Hamed Fayyaz, Rahmatollah Beheshti | 2024-08-22 | arXiv | https://github.com/healthylaife/FairAlignmentLLM | http://arxiv.org/abs/2408.12055v1 |
405 | MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing | Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen | 2024-08-21 | arXiv | https://github.com/zjwang21/MoE-LPR | https://doi.org/10.48550/arXiv.2408.11396 |
406 | Personality Alignment of Large Language Models | Minjun Zhu, Linyi Yang, Yue Zhang | 2024-08-21 | arXiv | https://github.com/zhu-minjun/PAlign | https://doi.org/10.48550/arXiv.2408.11779 |
407 | SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models | Yang Cao | 2024-08-21 | arXiv | https://github.com/Gunale0926/SORSA | https://doi.org/10.48550/arXiv.2409.00055 |
408 | SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins | Jingquan Wang, Harry Zhang, Huzaifa Mustafa Unjhawala, Peter Negrut, Shu Wang, Khailanii Slaton, Radu Serban, Jin-Long Wu, Dan Negrut | 2024-08-21 | arXiv | https://github.com/uwsbel/SimBench | http://arxiv.org/abs/2408.11987v1 |
409 | Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models | Yuzhou Huang, Yiran Qin, Shunlin Lu, Xintao Wang, Rui Huang, Ying Shan, Ruimao Zhang | 2024-08-21 | arXiv | https://yuzhou914.github.io/Story3D-Agent/ | https://doi.org/10.48550/arXiv.2408.11801 |
410 | LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models | Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu | 2024-08-20 | arXiv | https://github.com/YupengSu/LLM-Barber | https://doi.org/10.48550/arXiv.2408.10631 |
411 | SysBench: Can Large Language Models Follow System Messages? | Yanzhao Qin, Tao Zhang, Tao Zhang, Yanjun Shen, Wenjing Luo, Haoze Sun, Yan Zhang, Yujing Qiao, Weipeng Chen, Zenan Zhou, Wentao Zhang, Bin Cui | 2024-08-20 | arXiv | https://github.com/PKU-Baichuan-MLSystemLab/SysBench | https://doi.org/10.48550/arXiv.2408.10943 |
412 | Large Language Models for Multimodal Deformable Image Registration | Mingrui Ma, Weijie Wang, Jie Ning, Jianfeng He, Nicu Sebe, Bruno Lepri | 2024-08-20 | arXiv | https://github.com/ninjannn/LLM-Morph | https://doi.org/10.48550/arXiv.2408.10703 |
413 | Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter | Junhao Chen, Bowen Wang, Zhouqiang Jiang, Yuta Nakashima | 2024-08-20 | arXiv | https://github.com/3244we/Question-Rewriter | http://arxiv.org/abs/2408.10573v1 |
414 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | Yunzhe Xu, Yiyuan Pan, Zhe Liu, Hesheng Wang | 2024-08-20 | arXiv | https://flame-sjtu.github.io | http://arxiv.org/abs/2408.11051v1 |
415 | Beyond Labels: Aligning Large Language Models with Human-like Reasoning | Muhammad Rafsan Kabir, Rafeed Mohammad Sultan, Ihsanul Haque Asif, Jawad Ibn Ahad, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman | 2024-08-20 | arXiv | https://github.com/apurba-nsu-rnd-lab/DFAR | https://doi.org/10.48550/arXiv.2408.11879 |
416 | Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model | Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou | 2024-08-20 | arXiv | https://github.com/chenhan97/Otter | https://doi.org/10.48550/arXiv.2408.10764 |
417 | CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding? | Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma | 2024-08-20 | arXiv | https://github.com/CodeLLM-Research/CodeJudge-Eval | https://doi.org/10.48550/arXiv.2408.10718 |
418 | FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Zhengchao Huang, Bin Xia, Zicheng Lin, Zhun Mou, Wenming Yang, Jiaya Jia | 2024-08-19 | arXiv | https://ffaa-vl.github.io | https://doi.org/10.48550/arXiv.2408.10072 |
419 | AutoML-guided Fusion of Entity and LLM-based representations | Boshko Koloski, Senja Pollak, Roberto Navigli, Blaž Škrlj | 2024-08-19 | arXiv | https://github.com/bkolosk1/bablfusion | http://arxiv.org/abs/2408.09794v1 |
420 | CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models | Linhao Yu, Yongqi Leng, Yufei Huang, Shang Wu, Haixin Liu, Xinmeng Ji, Jiahui Zhao, Jinwang Song, Tingting Cui, Xiaoqing Cheng, Liutao Liutao, Deyi Xiong | 2024-08-19 | ACL | https://github.com/tjunlp-lab/CMoralEval | https://aclanthology.org/2024.findings-acl.703 |
421 | Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework | Jiandong Jin, Xiao Wang, Qian Zhu, Haiyang Wang, Chenglong Li | 2024-08-19 | arXiv | https://github.com/Event-AHU/OpenPAR | https://doi.org/10.48550/arXiv.2408.09720 |
422 | R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation | Xiao Wang, Yuehang Li, Fuling Wang, Shiao Wang, Chuanfu Li, Bo Jiang | 2024-08-19 | arXiv | https://github.com/Event-AHU/Medical_Image_Analysis | https://doi.org/10.48550/arXiv.2408.09743 |
423 | PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Dawei Dai, Yuanhui Zhang, Long Xu, Qianlan Yang, Xiaojing Shen, Shuyin Xia, Guoyin Wang | 2024-08-18 | arXiv | https://github.com/ddw2AIGROUP2CQUPT/PA-LLaVA | https://doi.org/10.48550/arXiv.2408.09530 |
424 | Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning | Tiansheng Huang, Gautam Bhattacharya, Pratik Joshi, Josh Kimball, Ling Liu | 2024-08-18 | arXiv | https://huangtiansheng.github.io/Antidote_gh_page/ | https://doi.org/10.48550/arXiv.2408.09600 |
425 | HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model | Mengkang Hu, Tianxing Chen, Qiguang Chen, Yao Mu, Wenqi Shao, Ping Luo | 2024-08-18 | arXiv | https://github.com/HiAgent2024/HiAgent | https://doi.org/10.48550/arXiv.2408.09559 |
426 | TC-RAG: Turing-Complete RAG's Case study on Medical LLM Systems | Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen, Wentao Zhang, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang | 2024-08-17 | arXiv | https://github.com/Artessay/SAMA | http://arxiv.org/abs/2408.09199v1 |
427 | Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges | Baixiang Huang, Canyu Chen, Kai Shu | 2024-08-16 | arXiv | https://llm-authorship.github.io | http://arxiv.org/abs/2408.08946v1 |
428 | Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program | Alejandro Carrasco, Victor Rodriguez-Fernandez, Richard Linares | 2024-08-16 | arXiv | https://github.com/ARCLab-MIT/kspdg | http://arxiv.org/abs/2408.08676v1 |
429 | MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector | Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang | 2024-08-16 | arXiv | https://github.com/wjfu99/MIA-Tuner | https://doi.org/10.48550/arXiv.2408.08661 |
430 | Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks | Jiawei Zhao, Kejiang Chen, Xiaojian Yuan, Weiming Zhang | 2024-08-15 | arXiv | https://github.com/weiyezhimeng/Prefix-Guidance | https://doi.org/10.48550/arXiv.2408.08924 |
431 | Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images | Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Tom Weidong Cai | 2024-08-15 | arXiv | https://github.com/Zhiyuan-Li-John/MuCR | https://doi.org/10.48550/arXiv.2408.08105 |
432 | Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models | Tianyu Wang, Haitao Lin, Junqiu Yu, Yanwei Fu | 2024-08-15 | arXiv | https://star-uu-wang.github.io/Polaris/ | https://doi.org/10.48550/arXiv.2408.07975 |
433 | Can Large Language Models Understand Symbolic Graphics Programs? | Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf | 2024-08-15 | arXiv | https://sgp-bench.github.io/ | https://doi.org/10.48550/arXiv.2408.08313 |
434 | FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models | Zhongyu Zhao, Menghang Dong, Rongyu Zhang, Wenzhao Zheng, Yunpeng Zhang, Huanrui Yang, Dalong Du, Kurt Keutzer, Shanghang Zhang | 2024-08-15 | arXiv | https://github.com/zhenwuweihe/FactorLLM | https://doi.org/10.48550/arXiv.2408.11855 |
435 | ArabLegalEval: A Multitask Benchmark for Assessing Arabic Legal Knowledge in Large Language Models | Faris Hijazi, Somayah AlHarbi, Abdulaziz AlHussein, Harethah Abu Shairah, Reem Alzahrani, Hebah AlShamlan, George Turkiyyah, Omar Knio | 2024-08-15 | ArabicNLP | https://github.com/Thiqah/ArabLegalEval | https://aclanthology.org/2024.arabicnlp-1.20 |
436 | Evaluating Large Language Model based Personal Information Extraction and Countermeasures | Yupei Liu, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong | 2024-08-14 | arXiv | https://github.com/liu00222/LLM-Based-Personal-Profile-Extraction | https://doi.org/10.48550/arXiv.2408.07291 |
437 | Knowledge in Superposition: Unveiling the Failures of Lifelong Knowledge Editing for Large Language Models | Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao | 2024-08-14 | arXiv | https://github.com/ChenhuiHu/knowledge_in_superposition | https://doi.org/10.48550/arXiv.2408.07413 |
438 | Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | Enneng Yang, Li Shen, Guibing Guo, Xingwei Wang, Xiaochun Cao, Jie Zhang, Dacheng Tao | 2024-08-14 | arXiv | https://github.com/EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications | http://arxiv.org/abs/2408.07666v3 |
439 | LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li | 2024-08-13 | arXiv | https://github.com/THUDM/LongWriter | http://arxiv.org/abs/2408.07055v1 |
440 | Kov: Transferable and Naturalistic Black-Box LLM Attacks using Markov Decision Processes and Tree Search | Robert J. Moss | 2024-08-11 | arXiv | https://github.com/sisl/Kov.jl | http://arxiv.org/abs/2408.08899v1 |
441 | Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models | Yupeng Chang, Yi Chang, Yuan Wu | 2024-08-09 | arXiv | https://github.com/cyp-jlu-ai/BA-LoRA | https://doi.org/10.48550/arXiv.2408.04556 |
442 | Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | Qirui Jiao, Daoyuan Chen, Yilun Huang, Yaliang Li, Ying Shen | 2024-08-09 | arXiv | https://github.com/modelscope/data-juicer/tree/ImgDiff | https://doi.org/10.48550/arXiv.2408.04594 |
443 | Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | Junde Wu, Jiayuan Zhu, Yunli Qi, Jingkun Chen, Min Xu, Filippo Menolascina, Vicente Grau | 2024-08-09 | arXiv | https://github.com/MedicineToken/Medical-Graph-RAG/tree/main | https://doi.org/10.48550/arXiv.2408.04187 |
444 | Open-domain Implicit Format Control for Large Language Model Generation | Yiqun Yao, Wenjia Ma, Xuezhi Fang, Xin Jiang, Xiang Li, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang | 2024-08-09 | arXiv | https://github.com/cofe-ai/OIFC | https://doi.org/10.48550/arXiv.2408.04392 |
445 | Revisiting Multi-Modal LLM Evaluation | Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan | 2024-08-09 | arXiv | https://kevinlujian.github.io/MLLM_Evaluations/ | http://arxiv.org/abs/2408.05334v1 |
446 | SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply Chain Disruptions | Zhi-Qi Cheng, Yifei Dong, Aike Shi, Wei Liu, Yuzhi Hu, Jason O'Connor, Alexander G. Hauptmann, Kate S. Whitefoot | 2024-08-09 | arXiv | https://fly1113.github.io/MFI/ | http://arxiv.org/abs/2408.05357v2 |
447 | Tabular Transfer Learning via Prompting LLMs | Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, Kyu Hwan Oh, Jinwoo Shin | 2024-08-09 | arXiv | https://github.com/jaehyun513/P2T | http://arxiv.org/abs/2408.11063v1 |
448 | VITA: Towards Open-Source Interactive Omni Multimodal LLM | Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Shaoqi Dong, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun | 2024-08-09 | arXiv | https://vita-home.github.io | http://arxiv.org/abs/2408.05211v2 |
449 | ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang | 2024-08-08 | arXiv | https://github.com/apple/ToolSandbox | http://arxiv.org/abs/2408.04682v1 |
450 | BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models | Yupeng Chang, Yi Chang, Yuan Wu | 2024-08-08 | arXiv | https://github.com/cyp-jlu-ai/BA-LoRA | http://arxiv.org/abs/2408.04556v2 |
451 | NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time | Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu | 2024-08-07 | arXiv | https://github.com/PaddlePaddle/Research/tree/master/NLP/ACL2024-NACL | http://arxiv.org/abs/2408.03675v2 |
452 | WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | Prannaya Gupta, Le Qi Yau, Hao Han Low, I-Shiang Lee, Hugo Maximus Lim, Yu Xin Teoh, Jia Hng Koh, Dar Win Liew, Rishabh Bhardwaj, Rajat Bhardwaj, Soujanya Poria | 2024-08-07 | arXiv | https://github.com/walledai/walledeval | https://doi.org/10.48550/arXiv.2408.03837 |
453 | CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | Xiangyan Liu, Bo Lan, Zhiyuan Hu, Yang Liu, Zhicheng Zhang, Fei Wang, Michael Shieh, Wenmeng Zhou | 2024-08-07 | arXiv | https://github.com/modelscope/modelscope-agent/tree/master/apps/codexgraph_agent | https://doi.org/10.48550/arXiv.2408.03910 |
454 | Citekit: A Modular Toolkit for Large Language Model Citation Generation | Jiajun Shen, Tong Zhou, Suifeng Zhao, Yubo Chen, Kang Liu | 2024-08-06 | arXiv | https://github.com/SjJ1017/Citekit | https://doi.org/10.48550/arXiv.2408.04662 |
455 | OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs | Hasan Iqbal, Yuxia Wang, Minghan Wang, Georgi Georgiev, Jiahui Geng, Iryna Gurevych, Preslav Nakov | 2024-08-06 | arXiv | https://github.com/hasaniqbal777/openfactcheck | http://arxiv.org/abs/2408.11832v1 |
456 | StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun | 2024-08-06 | ACL | https://github.com/c-box/StructEval | https://aclanthology.org/2024.findings-acl.314 |
457 | Topic Modeling with Fine-tuning LLMs and Bag of Sentences | Johannes Schneider | 2024-08-06 | arXiv | https://github.com/JohnTailor/FT-Topic | http://arxiv.org/abs/2408.03099v1 |
458 | ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning | Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Thien Huu Nguyen | 2024-08-06 | arXiv | https://github.com/nlp-uoregon/ullme | https://doi.org/10.48550/arXiv.2408.03402 |
459 | RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak | 2024-08-05 | arXiv | https://github.com/IntelLabs/RAGFoundry | http://arxiv.org/abs/2408.02545v1 |
460 | ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems | Andrew Zhu, Liam Dugan, Chris Callison-Burch | 2024-08-05 | arXiv | https://github.com/zhudotexe/redel | http://arxiv.org/abs/2408.02248v1 |
461 | UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model | Zhaowei Li, Wei Wang, Yiqing Cai, Qi Xu, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang | 2024-08-05 | arXiv | https://github.com/lzw-lzw/UnifiedMLLM | https://doi.org/10.48550/arXiv.2408.02503 |
462 | Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models | Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Haoyang Li | 2024-08-05 | arXiv | https://github.com/liangzid/PromptExtractionEval | https://doi.org/10.48550/arXiv.2408.02416 |
463 | Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models | Mingxin Huang, Yuliang Liu, Dingkang Liang, Lianwen Jin, Xiang Bai | 2024-08-04 | arXiv | https://github.com/Yuliang-Liu/Monkey | https://doi.org/10.48550/arXiv.2408.02034 |
464 | MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | Jihye Choi, Nils Palumbo, Prasad Chalasani, Matthew M. Engelhard, Somesh Jha, Anivarya Kumar, David Page | 2024-08-03 | arXiv | https://github.com/jihyechoi77/malade | http://arxiv.org/abs/2408.01869v1 |
465 | PLUGH: A Benchmark for Spatial Understanding and Reasoning in Large Language Models | Alexey Tikhonov | 2024-08-03 | arXiv | https://github.com/altsoph/PLUGH | https://doi.org/10.48550/arXiv.2408.04648 |
466 | Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses | Gabriele Sarti, Tommaso Caselli, Malvina Nissim, Arianna Bisazza | 2024-08-02 | arXiv | https://github.com/gsarti/verbalized-rebus | https://doi.org/10.48550/arXiv.2408.00584 |
467 | Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs | Yilun Hua, Yoav Artzi | 2024-08-02 | arXiv | https://github.com/lil-lab/ICCA | http://arxiv.org/abs/2408.01417v1 |
468 | CFBench: A Comprehensive Constraints-Following Benchmark for LLMs | Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou | 2024-08-02 | arXiv | https://github.com/PKU-Baichuan-MLSystemLab/CFBench | http://arxiv.org/abs/2408.01122v1 |
469 | Agentic LLM Workflows for Generating Patient-Friendly Medical Reports | Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih | 2024-08-02 | arXiv | http://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation | http://arxiv.org/abs/2408.01112v2 |
470 | ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models | Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji | 2024-08-01 | arXiv | https://github.com/mrwu-mac/ControlMLLM | https://doi.org/10.48550/arXiv.2407.21534 |
471 | ArcheType: A Novel Framework for Open-Source Column Type Annotation Using Large Language Models | Benjamin Feuer, Yurong Liu, Chinmay Hegde, Juliana Freire | 2024-08 | Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 9 | https://github.com/penfever/ArcheType | https://dl.acm.org/doi/10.14778/3665844.3665857 |
472 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng | 2024-07-29 | arXiv | https://github.com/zengxingchen/ChartQA-MLLM | https://doi.org/10.48550/arXiv.2407.20174 |
473 | Can Editing LLMs Inject Harm? | Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu | 2024-07-29 | arXiv | https://llm-editing.github.io | http://arxiv.org/abs/2407.20224v2 |
474 | CollectiveSFT: Scaling Large Language Models for Chinese Medical Benchmark with Collective Instructions in Healthcare | Jingwei Zhu, Minghuan Tan, Min Yang, Ruixue Li, Hamid Alinejad-Rokny | 2024-07-29 | arXiv | https://github.com/CAS-SIAT-XinHai/CollectiveSFT | https://doi.org/10.48550/arXiv.2407.19705 |
475 | rLLM: Relational Table Learning with LLMs | Weichen Li, Xiaotong Huang, Jianwu Zheng, Zheng Wang, Chaokun Wang, Li Pan, Jianhua Li | 2024-07-29 | arXiv | https://github.com/rllm-project/rllm | http://arxiv.org/abs/2407.20157v1 |
476 | A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation | Laiyi Fu, Binbin Fan, Hongkai Du, Yanxiang Feng, Chunhua Li, Huping Song | 2024-07-26 | arXiv | https://github.com/sperfu/EyeDoc | https://doi.org/10.48550/arXiv.2407.18483 |
477 | The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models | Zihui Wu, Haichang Gao, Jianping He, Ping Wang | 2024-07-25 | arXiv | https://github.com/wooozihui/jailbreakfunction | https://doi.org/10.48550/arXiv.2407.17915 |
478 | Exploring Bengali Religious Dialect Biases in Large Language Models with Evaluation Perspectives | Azmine Toushik Wasi, Raima Islam, Mst Rafia Islam, Taki Hasan Rafi, Dong-Kyu Chae | 2024-07-25 | arXiv | https://heal-workshop.github.io/#:~:text=Exploring%20Bengali%20Religious%20Dialect%20Biases%20in%20Large%20Language%20Models%20with%20Evaluation%20Perspectives | https://doi.org/10.48550/arXiv.2407.18376 |
479 | Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance | Ao Shen, Qiang Wang, Zhiquan Lai, Xionglve Li, Dong-sheng Li | 2024-07-24 | arXiv | https://github.com/xiaocaigou/qbaraqahira | https://doi.org/10.48550/arXiv.2407.17029 |
480 | Scalify: scale propagation for efficient low-precision LLM training | Paul Balança, Sam Hosegood, Carlo Luschi, Andrew Fitzgibbon | 2024-07-24 | arXiv | https://github.com/graphcore-research/jax-scalify | http://arxiv.org/abs/2407.17353v1 |
481 | Enhancing LLM's Cognition via Structurization | Kai Liu, Zhihang Fu, Chao Chen, Wei Zhang, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye | 2024-07-23 | arXiv | https://github.com/alibaba/struxgpt | http://arxiv.org/abs/2407.16434v1 |
482 | Figure it Out: Analyzing-based Jailbreak Attack on Large Language Models | Shi Lin, Rongchang Li, Xun Wang, Changting Lin, Wenpeng Xing, Meng Han | 2024-07-23 | arXiv | https://github.com/theshi-1128/ABJ-Attack | https://doi.org/10.48550/arXiv.2407.16205 |
483 | INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji | 2024-07-23 | arXiv | https://github.com/WeihuangLin/INF-LLaVA | https://doi.org/10.48550/arXiv.2407.16198 |
484 | LawLuo: A Chinese Law Firm Co-run by LLM Agents | Jingyun Sun, Chengxiao Dai, Zhongze Luo, Yangbo Chang, Yang Li | 2024-07-23 | arXiv | https://github.com/NEFUJing/LawLuo | http://arxiv.org/abs/2407.16252v1 |
485 | Structure-aware Domain Knowledge Injection for Large Language Models | Kai Liu, Ze Chen, Zhihang Fu, Rongxin Jiang, Fan Zhou, Yaowu Chen, Yue Wu, Jieping Ye | 2024-07-23 | arXiv | https://github.com/alibaba/struxgpt | http://arxiv.org/abs/2407.16724v2 |
486 | LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Xi Chen, Songyang Zhang, Qibing Bai, Kai Chen, Satoshi Nakamura | 2024-07-22 | ACL | https://github.com/openaudiolab/LLaST | https://aclanthology.org/2024.findings-acl.416 |
487 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu, Mingfei Gao, Zhe Gan, Hong-You Chen, Zhengfeng Lai, Haiming Gang, Kai Kang, Afshin Dehghan | 2024-07-22 | arXiv | https://github.com/apple/ml-slowfast-llava | https://doi.org/10.48550/arXiv.2407.15841 |
488 | Counter Turing Test ( | Ishan Kavathekar, Anku Rani, Ashmit Chamoli, Ponnurangam Kumaraguru, Amit Sheth, Amitava Das | 2024-07-22 | OpenReview | https://github.com/ishank31/Counter_Turing_Test | http://arxiv.org/abs/2407.15694v1 |
489 | Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models | Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, Qianying Wang, Yaqiang Wu, Guang Dai, Ping Chen | 2024-07-22 | arXiv | https://github.com/Lackel/DKA | https://doi.org/10.48550/arXiv.2407.15346 |
490 | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu, Zhenmei Shi, Yingyu Liang | 2024-07-22 | arXiv | https://github.com/OliverXUZY/LLM_Compose | https://doi.org/10.48550/arXiv.2407.15720 |
491 | BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM | Hanjun Luo, Haoyu Huang, Ziye Deng, Xuecheng Liu, Ruizhe Chen, Zuozhu Liu | 2024-07-21 | arXiv | https://github.com/BIGbench2024/BIGbench2024/ | http://arxiv.org/abs/2407.15240v2 |
492 | Large Language Model for Verilog Generation with Golden Code Feedback | Ning Wang, Bingkun Yao, Jie Zhou, Xi Wang, Zhe Jiang, Nan Guan | 2024-07-21 | arXiv | https://github.com/CatIIIIIIII/veriseek | https://doi.org/10.48550/arXiv.2407.18271 |
493 | Navigation Instruction Generation with BEV Perception and Large Language Models | Sheng Fan, Rui Liu, Wenguan Wang, Yi Yang | 2024-07-21 | arXiv | https://github.com/FanScy/BEVInstructor | https://doi.org/10.48550/arXiv.2407.15087 |
494 | SynCPKL: Harnessing LLMs to Generate Synthetic Data for Commonsense Persona Knowledge Linking | Kuan-Yen Lin | 2024-07-21 | arXiv | https://github.com/irislin1006/CPKL | http://arxiv.org/abs/2407.15281v1 |
495 | On the Design and Analysis of LLM-Based Algorithms | Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou | 2024-07-20 | arXiv | https://github.com/modelscope/agentscope/tree/main/examples/paper_llm_based_algorithm | http://arxiv.org/abs/2407.14788v1 |
496 | Beyond Code Generation: Assessing Code LLM Maturity with Postconditions | Fusen He, Juan Zhai, Minxue Pan | 2024-07-19 | arXiv | https://github.com/MatureModel/PostcondGen | http://arxiv.org/abs/2407.14118v1 |
497 | Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models | Xuenan Xu, Pingyue Zhang, Ming Yan, Ji Zhang, Mengyue Wu | 2024-07-19 | arXiv | https://www.github.com/wsntxxn/AttrEnhZsAc | https://doi.org/10.48550/arXiv.2407.14355 |
498 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Peng Cheng, Zhonghao Wang, Feiyu Xiong, Zhiyu Li | 2024-07-19 | arXiv | https://github.com/IAAR-Shanghai/ICSFSurvey | https://doi.org/10.48550/arXiv.2407.14507 |
499 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He, Henghui Ding, Xudong Jiang, Bihan Wen | 2024-07-18 | arXiv | https://heshuting555.github.io/SegPoint | https://doi.org/10.48550/arXiv.2407.13761 |
500 | ViLLa: Video Reasoning Segmentation with Large Language Model | Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao | 2024-07-18 | arXiv | https://github.com/rkzheng99/ViLLa | https://doi.org/10.48550/arXiv.2407.14500 |
501 | Leveraging Environment Interaction for Automated PDDL Generation and Planning with Large Language Models | Sadegh Mahdavi, Raquel Aoki, Keyi Tang, Yanshuai Cao | 2024-07-17 | arXiv | https://github.com/BorealisAI/llm-pddl-planning | https://doi.org/10.48550/arXiv.2407.12979 |
502 | E5-V: Universal Embeddings with Multimodal Large Language Models | Ting Jiang, Minghui Song, Zihan Zhang, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang | 2024-07-17 | arXiv | https://github.com/kongds/E5-V | https://doi.org/10.48550/arXiv.2407.12580 |
503 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie | 2024-07-17 | arXiv | https://github.com/JiuTian-VL/MoME | https://doi.org/10.48550/arXiv.2407.12709 |
504 | Patch-Level Training for Large Language Models | Chenze Shao, Fandong Meng, Jie Zhou | 2024-07-17 | arXiv | https://github.com/shaochenze/PatchTrain | https://doi.org/10.48550/arXiv.2407.12665 |
505 | Robust Utility-Preserving Text Anonymization Based on Large Language Models | Tianyu Yang, Xiaodan Zhu, Iryna Gurevych | 2024-07-16 | arXiv | https://github.com/UKPLab/arxiv2024-rupta | https://doi.org/10.48550/arXiv.2407.11770 |
506 | VISA: Reasoning Video Object Segmentation via Large Language Models | Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves | 2024-07-16 | arXiv | https://github.com/cilinyan/VISA | https://doi.org/10.48550/arXiv.2407.11325 |
507 | LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices | Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee | 2024-07-16 | arXiv | https://github.com/onliwad101/FlexRound_LRQ | https://doi.org/10.48550/arXiv.2407.11534 |
508 | NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | Mo Li, Songyang Zhang, Yunxin Liu, Kai Chen | 2024-07-16 | arXiv | https://github.com/open-compass/opencompass | http://arxiv.org/abs/2407.11963v1 |
509 | Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Jiasheng Zheng, Boxi Cao, Zhengzhao Ma, Ruotong Pan, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun | 2024-07-16 | arXiv | https://github.com/jszheng21/RACE | https://doi.org/10.48550/arXiv.2407.11470 |
510 | By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting | Hyungjun Yoon, Biniyam Aschalew Tolera, Taesik Gong, Kimin Lee, Sung-Ju Lee | 2024-07-15 | arXiv | https://github.com/diamond264/ByMyEyes | https://doi.org/10.48550/arXiv.2407.10385 |
511 | Evaluating Large Language Models with fmeval | Pola Schwöbel, Luca Franceschi, Muhammad Bilal Zafar, Keerthan Vasist, Aman Malhotra, Tomer Shenhar, Pinal Tailor, Pinar Yilmaz, Michael Diamond, Michele Donini | 2024-07-15 | arXiv | https://github.com/aws/fmeval | https://doi.org/10.48550/arXiv.2407.12872 |
512 | IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization | Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang | 2024-07-15 | arXiv | https://github.com/DCDmllm/IDEAL_Summary | https://doi.org/10.48550/arXiv.2407.10486 |
513 | Learning Dynamics of LLM Finetuning | Yi Ren, Danica J. Sutherland | 2024-07-15 | arXiv | https://github.com/Joshua-Ren/Learning_dynamics_LLM | http://arxiv.org/abs/2407.10490v1 |
514 | Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models | Louis Abraham, Charles Arnal, Antoine Marie | 2024-07-15 | arXiv | https://prompt-ultra.github.io/ | https://doi.org/10.48550/arXiv.2407.10645 |
515 | Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang | 2024-07-15 | arXiv | https://github.com/qcznlp/uncertainty_attack | https://doi.org/10.48550/arXiv.2407.11282 |
516 | VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation | Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee | 2024-07-15 | arXiv | https://vgbench.github.io | https://doi.org/10.48550/arXiv.2407.10972 |
517 | When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments | Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang | 2024-07-15 | arXiv | https://github.com/MingyuJ666/Stockagent | https://doi.org/10.48550/arXiv.2407.18957 |
518 | Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models | Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo | 2024-07-14 | arXiv | https://github.com/Yuchen413/AnomalyRuler | https://doi.org/10.48550/arXiv.2407.10299 |
519 | ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning | Zhongsheng Wang, Jiamou Liu, Qiming Bao, Hongfei Rong, Jingfeng Zhang | 2024-07-14 | arXiv | https://github.com/Strong-AI-Lab/ChatLogic | https://doi.org/10.48550/arXiv.2407.10162 |
520 | LLMatic: Neural Architecture Search Via Large Language Models And Quality Diversity Optimization | Muhammad Umair Nasir, Sam Earle, Julian Togelius, Steven James, Christopher W. Cleghorn | 2024-07-14 | GECCO '24: Proceedings of the Genetic and Evolutionary Computation Conference | https://github.com/umair-nasir14/LLMatic | https://dl.acm.org/doi/10.1145/3638529.3654017 |
521 | Refusing Safe Prompts for Multi-modal Large Language Models | Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong | 2024-07-12 | arXiv | https://github.com/Sadcardation/MLLM-Refusal | https://doi.org/10.48550/arXiv.2407.09050 |
522 | Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu | 2024-07-12 | arXiv | https://github.com/RobustNLP/DeRTa | http://arxiv.org/abs/2407.09121v1 |
523 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu | 2024-07-12 | arXiv | https://github.com/oneal2000/EntityHallucination | https://doi.org/10.48550/arXiv.2407.09417 |
524 | Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection | Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu | 2024-07-12 | arXiv | https://github.com/GradiusTwinbee/GLIS | http://arxiv.org/abs/2407.08931v1 |
525 | Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors | Nico Daheim, Jakub Macina, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan | 2024-07-12 | arXiv | https://github.com/eth-lre/verify-then-generate | https://doi.org/10.48550/arXiv.2407.09136 |
526 | GLBench: A Comprehensive Benchmark for Graph with Large Language Models | Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li | 2024-07-12 | arXiv | https://github.com/NineAbyss/GLBench | https://doi.org/10.48550/arXiv.2407.07457 |
527 | Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | Yung-Sung Chuang, Linlu Qiu, Cheng-Yu Hsieh, Ranjay Krishna, Yoon Kim, James R. Glass | 2024-07-11 | arXiv | https://github.com/voidism/Lookback-Lens | https://doi.org/10.48550/arXiv.2407.07071 |
528 | FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Liqun Ma, Mingjie Sun, Zhiqiang Shen | 2024-07-11 | arXiv | https://github.com/LiqunMa/FBI-LLM | http://arxiv.org/abs/2407.07093v1 |
529 | Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility | Yuchen Xia, Jize Zhang, Nasser Jazdi, Michael Weyrich | 2024-07-11 | arXiv | https://github.com/YuchenXia/GPT4IndustrialAutomation | https://doi.org/10.48550/arXiv.2407.08550 |
530 | Metron: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov | 2024-07-11 | arXiv | https://github.com/project-metron/metron | http://arxiv.org/abs/2407.07000v1 |
531 | Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing | Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang | 2024-07-11 | arXiv | https://github.com/lucywang720/model-surgery | http://arxiv.org/abs/2407.08770v1 |
532 | SEED-Story: Multimodal Long Story Generation with Large Language Model | Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen | 2024-07-11 | arXiv | https://github.com/TencentARC/SEED-Story | https://doi.org/10.48550/arXiv.2407.08683 |
533 | The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective | Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng | 2024-07-11 | arXiv | https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md | https://doi.org/10.48550/arXiv.2407.08583 |
534 | RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization | Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng | 2024-07-10 | arXiv | https://github.com/HuangOwen/RoLoRA | http://arxiv.org/abs/2407.08044v1 |
535 | Large Language Models are Learnable Planners for Long-Term Recommendation | Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/jizhi-zhang/BiLLP | https://dl.acm.org/doi/10.1145/3626772.3657683 |
536 | OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems | Shuyuan Xu, Wenyue Hua, Yongfeng Zhang | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/agiresearch/OpenP5 | https://dl.acm.org/doi/10.1145/3626772.3657883 |
537 | PromptLink: Leveraging Large Language Models for Cross-Source Biomedical Concept Linking | Yuzhang Xie, Jiaying Lu, Joyce Ho, Fadi B. Nahab, Xiao Hu, Carl Yang | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/constantjxyz/PromptLink | https://dl.acm.org/doi/10.1145/3626772.3657904 |
538 | iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Aoyu Pang, Maonan Wang, Man-On Pun, Chung Shue Chen, Xi Xiong | 2024-07-10 | arXiv | https://github.com/Traffic-Alpha/iLLM-TSC | https://doi.org/10.48550/arXiv.2407.06025 |
539 | TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision | Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/skyriver-2000/TRAD-Official | https://dl.acm.org/doi/10.1145/3626772.3657788 |
540 | USimAgent: Large Language Models for Simulating Search Users | Erhan Zhang, Xingzhu Wang, Peiyuan Gong, Yankai Lin, Jiaxin Mao | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/Meow-E/USimAgent | https://dl.acm.org/doi/10.1145/3626772.3657963 |
541 | Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs | Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low | 2024-07-10 | arXiv | https://github.com/aoi3142/Waterfall | http://arxiv.org/abs/2407.04411v2 |
542 | LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages | Yinquan Lu, Wenhao Zhu, Lei Li, Yu Qiao, Fei Yuan | 2024-07-10 | arXiv | https://github.com/CONE-MT/LLaMAX/ | http://arxiv.org/abs/2407.05975v1 |
543 | LLaRA: Large Language-Recommendation Assistant | Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, Xiangnan He | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/ljy0ustc/LLaRA | https://dl.acm.org/doi/10.1145/3626772.3657690 |
544 | Inference Performance Optimization for Large Language Models on CPUs | Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie | 2024-07-10 | arXiv | https://github.com/intel/xFasterTransformer | https://doi.org/10.48550/arXiv.2407.07304 |
545 | LLMBox: A Comprehensive Library for Large Language Models | Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen | 2024-07-10 | arXiv | https://github.com/RUCAIBox/LLMBox | https://doi.org/10.48550/arXiv.2407.05563 |
546 | IDGenRec: LLM-RecSys Alignment with Textual ID Learning | Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, Yongfeng Zhang | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/agiresearch/IDGenRec | https://dl.acm.org/doi/10.1145/3626772.3657821 |
547 | GraphGPT: Graph Instruction Tuning for Large Language Models | Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/HKUDS/GraphGPT | https://dl.acm.org/doi/10.1145/3626772.3657775 |
548 | GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing | Zhenyu Wang, Aoxue Li, Zhenguo Li, Xihui Liu | 2024-07-10 | arXiv | https://zhenyuw16.github.io/GenArtist_page | http://arxiv.org/abs/2407.05600v1 |
549 | EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo | 2024-07-10 | arXiv | https://github.com/OpenGVLab/EfficientQAT | https://doi.org/10.48550/arXiv.2407.11062 |
550 | ChatUniTest: A Framework for LLM-Based Test Generation | Yinghao Chen, Zehao Hu, Chen Zhi, Junxiao Han, Shuiguang Deng, Jianwei Yin | 2024-07-10 | FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering | https://github.com/ZJU-ACES-ISE/ChatUniTest | https://dl.acm.org/doi/10.1145/3663529.3663801 |
551 | Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Ankit Satpute, Noah Giessing, Andre Greiner-Petter, Moritz Schubotz, Olaf Teschke, Akiko Aizawa, Bela Gipp | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/gipplab/LLM-Investig-MathStackExchange | https://dl.acm.org/doi/10.1145/3626772.3657945 |
552 | Are Large Language Models Good at Utility Judgments? | Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/ict-bigdatalab/utility_judgments | https://dl.acm.org/doi/10.1145/3626772.3657784 |
553 | A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models | Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon | 2024-07-10 | SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval | https://github.com/ielab/llm-rankers | https://dl.acm.org/doi/10.1145/3626772.3657813 |
554 | Etalon: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov | 2024-07-09 | arXiv | https://github.com/project-etalon/etalon | http://arxiv.org/abs/2407.07000v2 |
555 | DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations | Luke Yoffe, Alfonso Amayuelas, William Yang Wang | 2024-07-08 | arXiv | https://github.com/lukeyoffe/debunc | https://doi.org/10.48550/arXiv.2407.06426 |
556 | KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions | Yanxu Zhu, Jinlin Xiao, Yuhang Wang, Jitao Sang | 2024-07-08 | arXiv | https://github.com/yanxuzhu/KG-FPQ | http://arxiv.org/abs/2407.05868v1 |
557 | Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression | Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar | 2024-07-06 | arXiv | https://github.com/zhichaoxu-shufe/Beyond-Perplexity-Compression-Safety-Eval | http://arxiv.org/abs/2407.04965v2 |
558 | LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang | 2024-07-06 | arXiv | https://github.com/Yijia-Xiao/LogicVista | http://arxiv.org/abs/2407.04973v1 |
559 | When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions | Jérémy Perez, Corentin Léger, Grgur Kovač, Cédric Colas, Gaia Molinaro, Maxime Derex, Pierre-Yves Oudeyer, Clément Moulin-Frier | 2024-07-05 | arXiv | https://github.com/jeremyperez2/TelephoneGameLLM | http://arxiv.org/abs/2407.04503v1 |
560 | Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs | Mihir Parmar, Hanieh Deilamsalehy, Franck Dernoncourt, Seunghyun Yoon, Ryan A. Rossi, Trung Bui | 2024-07-05 | arXiv | https://github.com/Mihir3009/Extract-AI | http://arxiv.org/abs/2407.04855v1 |
561 | BiosERC: Integrating Biography Speakers Supported by LLMs for ERC Tasks | Jieying Xue, Minh Phuong Nguyen, Blake Matheny, Le Minh Nguyen | 2024-07-05 | arXiv | https://github.com/yingjie7/BiosERC | http://arxiv.org/abs/2407.04279v1 |
562 | Automating Venture Capital: Founder assessment using LLM-powered segmentation, feature engineering and automated labeling techniques | Ekin Ozince, Yiğit Ihlamur | 2024-07-05 | arXiv | https://github.com/velapartners/moneyball-LLM-based-founder-features | http://arxiv.org/abs/2407.04885v1 |
563 | AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | Petr Anokhin, Nikita Semenov, Artyom Sorokin, Dmitry Evseev, Mikhail Burtsev, Evgeny Burnaev | 2024-07-05 | arXiv | https://github.com/AIRI-Institute/AriGraph | http://arxiv.org/abs/2407.04363v1 |
564 | TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models | Jiahuan Cao, Dezhi Peng, Peirong Zhang, Yongxin Shi, Yang Liu, Kai Ding, Lianwen Jin | 2024-07-04 | arXiv | https://github.com/SCUT-DLVCLab/TongGu-LLM | https://doi.org/10.48550/arXiv.2407.03937 |
565 | NutriBench: A Dataset for Evaluating Large Language Models in Carbohydrate Estimation from Meal Descriptions | Andong Hua, Mehak Preet Dhaliwal, Ryan Burke, Laya Pullela, Yao Qin | 2024-07-04 | arXiv | https://mehak126.github.io/nutribench.html | https://doi.org/10.48550/arXiv.2407.12843 |
566 | AutoBench: Automatic Testbench Generation and Evaluation Using LLMs for HDL Design | Ruidi Qiu, Grace Li Zhang, Rolf Drechsler, Ulf Schlichtmann, Bing Li | 2024-07-04 | arXiv | https://github.com/AutoBench/AutoBench | http://arxiv.org/abs/2407.03891v1 |
567 | Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs | Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland | 2024-07-04 | arXiv | https://github.com/sbp354/Future_triggered_backdoors | http://arxiv.org/abs/2407.04108v1 |
568 | Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation | Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu, Bo An | 2024-07-04 | arXiv | https://github.com/mansicer/Q-Adapter | http://arxiv.org/abs/2407.03856v2 |
569 | The Price of Prompting: Profiling Energy Use in Large Language Models Inference | Erik Johannes Husom, Arda Goknil, Lwin Khin Shar, Sagar Sen | 2024-07-04 | arXiv | https://github.com/ejhusom/MELODI | https://doi.org/10.48550/arXiv.2407.16893 |
570 | GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models | Zike Yuan, Ming Liu, Hui Wang, Bing Qin | 2024-07-03 | arXiv | https://github.com/ZIKEYUAN/GraCoRe | https://doi.org/10.48550/arXiv.2407.02936 |
571 | Improving LLM Abilities in Idiomatic Translation | Sundesh Donthi, Maximilian Spencer, Om Patel, Joon Doh, Eid Rodan | 2024-07-03 | arXiv | https://github.com/ANON13222/ITR | http://arxiv.org/abs/2407.03518v1 |
572 | Enabling Discriminative Reasoning in LLMs for Legal Judgment Prediction | Chenlong Deng, Kelong Mao, Yuyao Zhang, Zhicheng Dou | 2024-07-02 | arXiv | https://github.com/ChenlongDeng/ADAPT | http://arxiv.org/abs/2407.01964v3 |
573 | TokenPacker: Efficient Visual Projector for Multimodal LLM | Wentong Li, Yuqian Yuan, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang | 2024-07-02 | arXiv | https://github.com/CircleRadon/TokenPacker | http://arxiv.org/abs/2407.02392v1 |
574 | Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation | Pablo Messina, René Vidal, Denis Parra, Alvaro Soto, Vladimir Araujo | 2024-07-02 | ACL | https://github.com/PabloMessina/CXR-Fact-Encoder | https://aclanthology.org/2024.findings-acl.236 |
575 | To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang | 2024-07-02 | arXiv | https://github.com/zjunlp/KnowUnDo | https://doi.org/10.48550/arXiv.2407.01920 |
576 | CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao | 2024-07-02 | arXiv | https://cfinbench.github.io/ | https://doi.org/10.48550/arXiv.2407.02301 |
577 | Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis | Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu | 2024-07-02 | arXiv | https://github.com/chahatraj/breakingbias | http://arxiv.org/abs/2407.02030v1 |
578 | Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Zihan Wang, Deli Chen, Damai Dai, Runxin Xu, Zhuoshu Li, Y. Wu | 2024-07-02 | arXiv | https://github.com/deepseek-ai/ESFT | https://doi.org/10.48550/arXiv.2407.01906 |
579 | Fine-grained, Multi-dimensional Summarization Evaluation with LLMs | Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour | 2024-07-01 | arXiv | https://github.com/DISL-Lab/FineSurE-ACL24 | http://arxiv.org/abs/2407.00908v2 |
580 | SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models | Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao | 2024-07-01 | arXiv | https://fduinc.github.io/splitlora/ | https://doi.org/10.48550/arXiv.2407.00952 |
581 | RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models | Liangliang Chen, Yutian Lei, Shiyu Jin, Ying Zhang, Liangjun Zhang | 2024-07-01 | IEEE Robotics and Automation Letters | https://rlingua.github.io | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10529514 |
582 | MIRAI: Evaluating LLM Agents for Event Forecasting | Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang | 2024-07-01 | arXiv | https://mirai-llm.github.io/ | http://arxiv.org/abs/2407.01231v1 |
583 | FineSurE: Fine-grained Summarization Evaluation using LLMs | Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour | 2024-07-01 | arXiv | https://github.com/DISL-Lab/FineSurE-ACL24 | http://arxiv.org/abs/2407.00908v1 |
584 | MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education | Shashank Sonkar, Naiming Liu, MyCo Le, Richard G. Baraniuk | 2024-07-01 | EMNLP | https://github.com/luffycodes/MalAlgoQA-Dataset | https://aclanthology.org/2024.findings-emnlp.913 |
585 | Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement | Zisu Huang, Xiaohua Wang, Feiran Zhang, Zhibo Xu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang | 2024-07-01 | arXiv | https://github.com/Huangzisu/query-refinement | https://doi.org/10.48550/arXiv.2407.01461 |
586 | EconNLI: Evaluating Large Language Models on Economics Reasoning | Yue Guo, Yi Yang | 2024-07-01 | arXiv | https://github.com/Irenehere/EconNLI | https://doi.org/10.48550/arXiv.2407.01212 |
587 | DiscoveryBench: Towards Data-Driven Discovery with Large Language Models | Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark | 2024-07-01 | arXiv | https://github.com/allenai/discoverybench | https://doi.org/10.48550/arXiv.2407.01725 |
588 | AutoFlow: Automated Workflow Generation for Large Language Model Agents | Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang | 2024-07-01 | arXiv | https://github.com/agiresearch/AutoFlow | https://doi.org/10.48550/arXiv.2407.12821 |
589 | Exploring Advanced Large Language Models with LLMsuite | Giorgio Roffo | 2024-07-01 | arXiv | https://github.com/giorgioroffo/large_language_models_open_suite | https://doi.org/10.48550/arXiv.2407.12036 |
590 | LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation | Mushui Liu, Yuhang Ma, Yang Zhen, Jun Dan, Yunlong Yu, Zeng Zhao, Zhipeng Hu, Bai Liu, Changjie Fan | 2024-06-30 | arXiv | https://xiaobul.github.io/LLM4GEN/ | http://arxiv.org/abs/2407.00737v1 |
591 | GraphArena: Benchmarking Large Language Models on Graph Computational Problems | Jianheng Tang, Qifan Zhang, Yuhan Li, Jia Li | 2024-06-29 | arXiv | https://github.com/squareRoot3/GraphArena | https://doi.org/10.48550/arXiv.2407.00379 |
592 | LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement | Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuanjing Huang, Shuicheng Yan | 2024-06-29 | arXiv | https://yingjiahao14.github.io/LLMs-as-Instructors-pages/ | http://arxiv.org/abs/2407.00497v1 |
593 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen | 2024-06-28 | arXiv | https://mbzuai-llm.github.io/webpage2code/ | http://arxiv.org/abs/2406.20098v1 |
594 | Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring | Jiazheng Li, Hainiu Xu, Zhaoyue Sun, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He | 2024-06-28 | arXiv | https://github.com/lijiazheng99/thought_tree_assessment | http://arxiv.org/abs/2406.19949v1 |
595 | MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics? | Jinming Li, Yichen Zhu, Zhiyuan Xu, Jindong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang | 2024-06-28 | arXiv | https://mm-robobench.github.io/ | http://arxiv.org/abs/2406.19693v1 |
596 | YuLan: An Open-source Large Language Model | Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ze-Feng Gao, Yueguo Chen, Weizheng Lu, Ji-Rong Wen | 2024-06-28 | arXiv | https://github.com/RUC-GSAI/YuLan-Chat | https://doi.org/10.48550/arXiv.2406.19853 |
597 | Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models | Yang Yan, Lizhi Ma, Anqi Li, Jingsong Ma, Zhenzhong Lan | 2024-06-27 | arXiv | https://github.com/kuri-leo/BigFive-LLM-Predictor | https://doi.org/10.48550/arXiv.2406.17287 |
598 | Large Language Models are Interpretable Learners | Ruochen Wang, Si Si, Felix Yu, Dorothea Wiesmann, Cho-Jui Hsieh, Inderjit S. Dhillon | 2024-06-27 | arXiv | https://github.com/ruocwang/llm-symbolic-program | https://doi.org/10.48550/arXiv.2406.17224 |
599 | STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis | Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi | 2024-06-27 | arXiv | https://github.com/LwbXc/STBench | https://doi.org/10.48550/arXiv.2406.19065 |
600 | Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li | 2024-06-27 | arXiv | https://github.com/MozerWang/Loong | http://arxiv.org/abs/2406.17419v1 |
601 | DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model | Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao | 2024-06-27 | arXiv | https://github.com/season1blue/DIM | https://doi.org/10.48550/arXiv.2407.12019 |
602 | Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning | Miyoung Ko, Sue Hyun Park, Joonsuk Park, Minjoon Seo | 2024-06-27 | arXiv | https://github.com/kaistAI/knowledge-reasoning | https://doi.org/10.48550/arXiv.2406.19502 |
603 | Dual-Space Knowledge Distillation for Large Language Models | Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, Jinan Xu | 2024-06-27 | arXiv | https://github.com/songmzhang/DSKD | https://doi.org/10.48550/arXiv.2406.17328 |
604 | Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Utilization | Miyoung Ko, Sue Hyun Park, Joonsuk Park, Minjoon Seo | 2024-06-27 | arXiv | https://github.com/kaistAI/knowledge-reasoning | http://arxiv.org/abs/2406.19502v2 |
605 | A Review of Large Language Models and Autonomous Agents in Chemistry | Mayk Caldas Ramos, Christopher J. Collison, Andrew D. White | 2024-06-26 | arXiv | https://github.com/ur-whitelab/LLMs-in-science | https://doi.org/10.48550/arXiv.2407.01603 |
606 | Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs | Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang | 2024-06-26 | arXiv | https://github.com/Hambaobao/HCP-Coder | http://arxiv.org/abs/2406.18294v2 |
607 | Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | Guanting Dong, Yutao Zhu, Chenghao Zhang, Zechen Wang, Zhicheng Dou, Ji-Rong Wen | 2024-06-26 | arXiv | https://github.com/dongguanting/DPA-RAG | http://arxiv.org/abs/2406.18676v1 |
608 | The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Boris Meinardus, Anil Batra, Anna Rohrbach, Marcus Rohrbach | 2024-06-26 | arXiv | https://github.com/sudo-Boris/mr-Blip | https://doi.org/10.48550/arXiv.2406.18113 |
609 | Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia | 2024-06-26 | arXiv | https://github.com/dvlab-research/Step-DPO | http://arxiv.org/abs/2406.18629v1 |
610 | Selective Prompting Tuning for Personalized Conversations with LLMs | Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang | 2024-06-26 | OpenReview | https://github.com/hqsiswiliam/SPT | http://arxiv.org/abs/2406.18187v1 |
611 | IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons | Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong | 2024-06-26 | arXiv | https://github.com/danshi777/IRCAN | http://arxiv.org/abs/2406.18406v1 |
612 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen | 2024-06-26 | arXiv | https://charxiv.github.io/ | http://arxiv.org/abs/2406.18521v1 |
613 | ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa | 2024-06-26 | arXiv | http://github.com/ahmedheakl/arazn-llm | http://arxiv.org/abs/2406.18120v1 |
614 | A Closer Look into Mixture-of-Experts in Large Language Models | Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu | 2024-06-26 | arXiv | https://github.com/kamanphoebe/Look-into-MoEs | https://doi.org/10.48550/arXiv.2406.18219 |
615 | BADGE: BADminton report Generation and Evaluation with LLM | Shang-Hsuan Chiang, Lin-Wei Chao, Kuang-Da Wang, Chih-Chuan Wang, Wen-Chih Peng | 2024-06-26 | arXiv | https://github.com/AndyChiangSH/BADGE | http://arxiv.org/abs/2406.18116v1 |
616 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake, Eunsol Choi, Greg Durrett | 2024-06-25 | arXiv | https://github.com/thomlake/investigating-alignment | https://doi.org/10.48550/arXiv.2406.17692 |
617 | TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot | Kaiqi Zhang, Shuai Yuan, Honghan Zhao | 2024-06-25 | arXiv | https://github.com/zlkqz/auto_eval | http://arxiv.org/abs/2407.10999v1 |
618 | T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang | 2024-06-25 | arXiv | https://github.com/microsoft/T-MAC | http://arxiv.org/abs/2407.00088v1 |
619 | Retrieval Augmented Instruction Tuning for Open NER with Large Language Models | Tingyu Xie, Jian Zhang, Yan Zhang, Yuanyuan Liang, Qi Li, Hongwei Wang | 2024-06-25 | arXiv | https://github.com/Emma1066/Retrieval-Augmented-IT-OpenNER | https://doi.org/10.48550/arXiv.2406.17305 |
620 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models | Rishabh Maheshwary, Vikas Yadav, Hoang Nguyen, Khyati Mahajan, Sathwik Tejaswi Madhusudhan | 2024-06-25 | arXiv | https://github.com/ServiceNow/M2Lingual | https://doi.org/10.48550/arXiv.2406.16783 |
621 | Large Language Models Are Cross-Lingual Knowledge-Free Reasoners | Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang | 2024-06-25 | arXiv | https://github.com/NJUNLP/Knowledge-Free-Reasoning | https://doi.org/10.48550/arXiv.2406.16655 |
622 | Improving Arithmetic Reasoning Ability of Large Language Models through Relation Tuples, Verification and Dynamic Feedback | Zhongtao Miao, Kaiyan Zhao, Yoshimasa Tsuruoka | 2024-06-25 | arXiv | https://github.com/gpgg/art | https://doi.org/10.48550/arXiv.2406.17873 |
623 | Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels | Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu | 2024-06-25 | arXiv | https://github.com/RazvanDu/LayerwiseQuant | http://arxiv.org/abs/2406.17415v2 |
624 | DemoRank: Selecting Effective Demonstrations for Large Language Models in Ranking Task | Wenhan Liu, Yutao Zhu, Zhicheng Dou | 2024-06-25 | arXiv | https://github.com/8421BCD/DemoRank | https://doi.org/10.48550/arXiv.2406.16332 |
625 | ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models | Yash Akhauri, Ahmed F. AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M. Rush, Safeen Huda, Mohamed S. Abdelfattah | 2024-06-25 | arXiv | https://github.com/abdelfattah-lab/shadow_llm/ | https://doi.org/10.48550/arXiv.2406.16635 |
626 | AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | Jiale Cheng, Yida Lu, Xiaotao Gu, Pei Ke, Xiao Liu, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang | 2024-06-25 | arXiv | https://github.com/thu-coai/AutoDetect | https://doi.org/10.48550/arXiv.2406.16714 |
627 | Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee | 2024-06-25 | arXiv | https://github.com/HZQ950419/Math-LLaVA | https://doi.org/10.48550/arXiv.2406.17294 |
628 | Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models | Nisarg Patel, Mohith Kulkarni, Mihir Parmar, Aashna Budhiraja, Mutsumi Nakamura, Neeraj Varshney, Chitta Baral | 2024-06-25 | arXiv | https://github.com/Mihir3009/Multi-LogiEval | https://doi.org/10.48550/arXiv.2406.17169 |
629 | Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients | Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith | 2024-06-25 | arXiv | https://github.com/aashiqmuhamed/GRASS | http://arxiv.org/abs/2406.17660v1 |
630 | Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers | Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre | 2024-06-25 | arXiv | https://github.com/CLAIRE-Labo/StructuredFFN/tree/main | http://arxiv.org/abs/2406.16450v1 |
631 | Crafting Customisable Characters with LLMs: Introducing SimsChat, a Persona-Driven Role-Playing Agent Framework | Bohao Yang, Dong Liu, Chen Tang, Chenghao Xiao, Kun Zhao, Chao Li, Lin Yuan, Guang Yang, Lanxiao Huang, Chenghua Lin | 2024-06-25 | arXiv | https://github.com/Bernard-Yang/SimsChat | http://arxiv.org/abs/2406.17962v3 |
632 | DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph | Zhehao Zhang, Jiaao Chen, Diyi Yang | 2024-06-25 | arXiv | https://github.com/SALT-NLP/DARG | https://doi.org/10.48550/arXiv.2406.17271 |
633 | FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models | Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko | 2024-06-24 | arXiv | https://github.com/IAAR-Shanghai/FastMem | https://doi.org/10.48550/arXiv.2406.16069 |
634 | Can LLM Graph Reasoning Generalize beyond Pattern Memorization? | Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, Yulia Tsvetkov | 2024-06-24 | arXiv | https://github.com/MatthewYZhang/NLGift | http://arxiv.org/abs/2406.15992v1 |
635 | Can LLM be a Personalized Judge? | Yijiang River Dong, Tiancheng Hu, Nigel Collier | 2024-06-24 | arXiv | https://github.com/dong-river/Personalized-Judge | http://arxiv.org/abs/2406.11657v1 |
636 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, Zihan Wang | 2024-06-24 | arXiv | https://github.com/THUDM | https://doi.org/10.48550/arXiv.2406.12793 |
637 | Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models | Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang | 2024-06-24 | arXiv | https://github.com/google-research/crosslingual-knowledge-barriers | https://doi.org/10.48550/arXiv.2406.16135 |
638 | EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models | Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li | 2024-06-24 | arXiv | https://sais-fuxi.github.io/projects/evalalign/ | http://arxiv.org/abs/2406.16562v2 |
639 | Efficient Evolutionary Search Over Chemical Space with Large Language Models | Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Streith-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang | 2024-06-24 | arXiv | http://github.com/zoom-wang112358/MOLLEO | https://doi.org/10.48550/arXiv.2406.16976 |
640 | FS-RAG: A Frame Semantics Based Approach for Improved Factual Accuracy in Large Language Models | Harish Tayyar Madabushi | 2024-06-24 | arXiv | https://github.com/H-TayyarMadabushi/A-Frame-Semantics-based-approach-for-Improved-Factual-Accuracy-in-Large-Language-Models | https://doi.org/10.48550/arXiv.2406.16167 |
641 | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal | 2024-06-24 | arXiv | https://github.com/kiddyboots216/lottery-ticket-adaptation | http://arxiv.org/abs/2406.16797v2 |
642 | Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models | Yichen Sun, Zhixuan Chu, Zhan Qin, Kui Ren | 2024-06-24 | arXiv | https://github.com/TruthAI-Lab/PCIG | http://arxiv.org/abs/2406.16333v1 |
643 | AudioBench: A Universal Benchmark for Audio Large Language Models | Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen | 2024-06-24 | arXiv | https://github.com/AudioLLMs/AudioBench | https://doi.org/10.48550/arXiv.2406.16020 |
644 | The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models | Jiajia Li, Lu Yang, Mingni Tang, Chenchong Chenchong, Zuchao Li, Ping Wang, Hai Zhao | 2024-06-23 | ACL | https://github.com/zcli-charlie/ZIQI-Eval | https://aclanthology.org/2024.findings-acl.194 |
645 | Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level | Zhaopeng Feng, Ruizhe Chen, Yan Zhang, Zijie Meng, Zuozhu Liu | 2024-06-23 | arXiv | https://github.com/fzp0424/Ladder | http://arxiv.org/abs/2406.15741v1 |
646 | RuleR: Improving LLM Controllability by Rule-based Data Recycling | Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou | 2024-06-23 | arXiv | https://github.com/MingLiiii/RuleR | http://arxiv.org/abs/2406.15938v1 |
647 | Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | Qi Liu, Bo Wang, Nan Wang, Jiaxin Mao | 2024-06-22 | arXiv | https://github.com/liuqi6777/pe_rank | https://doi.org/10.48550/arXiv.2406.14848 |
648 | Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph | Roman Vashurin, Ekaterina Fadeeva, Artem Vazhentsev, Lyudmila Rvanova, Akim Tsvigun, Daniil Vasilev, Rui Xing, Abdelrahman Boda Sadallah, Kirill Grishchenkov, Sergey Petrakov, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov, Artem Shelmanov | 2024-06-22 | arXiv | https://github.com/IINemo/lm-polygraph | https://doi.org/10.48550/arXiv.2406.15627 |
649 | video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang | 2024-06-22 | arXiv | https://github.com/bytedance/SALMONN/ | https://doi.org/10.48550/arXiv.2406.15704 |
650 | SS-GEN: A Social Story Generation Framework with Large Language Models | Yi Feng, Mingyang Song, Jiaqi Wang, Zhuang Chen, Guanqun Bi, Minlie Huang, Liping Jing, Jian Yu | 2024-06-22 | arXiv | https://github.com/MIMIFY/SS-GEN | http://arxiv.org/abs/2406.15695v2 |
651 | MT-Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level | Zhaopeng Feng, Yan Zhang, Ruizhe Chen, Zijie Meng, Zuozhu Liu | 2024-06-22 | arXiv | https://github.com/fzp0424/Ladder | http://arxiv.org/abs/2406.15741v2 |
652 | Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration | Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan Celine Lin | 2024-06-22 | arXiv | https://github.com/GATECH-EIC/ACT | https://doi.org/10.48550/arXiv.2406.15765 |
653 | InternLM-Law: An Open Source Chinese Legal Large Language Model | Zhiwei Fei, Songyang Zhang, Xiaoyu Shen, Dawei Zhu, Xiao Wang, Maosong Cao, Fengzhe Zhou, Yining Li, Wenwei Zhang, Dahua Lin, Kai Chen, Jidong Ge | 2024-06-22 | arXiv | https://github.com/InternLM/InternLM-Law | https://doi.org/10.48550/arXiv.2406.14887 |
654 | Identifying Inaccurate Descriptions in LLM-generated Code Comments via Test Execution | Sungmin Kang, Louis Milliken, Shin Yoo | 2024-06-22 | arXiv | https://smkang96.github.io/assets/pdf/doctest_supplementary_arxiv.pdf | http://arxiv.org/abs/2406.14836v1 |
655 | ICLEval: Evaluating In-Context Learning Ability of Large Language Models | Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen | 2024-06-22 | arXiv | https://github.com/yiye3/ICLEval | https://doi.org/10.48550/arXiv.2406.14955 |
656 | GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models | Leyan Wang, Yonggang Jin, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He | 2024-06-22 | arXiv | https://github.com/GIEBench/GIEBench | https://doi.org/10.48550/arXiv.2406.14903 |
657 | Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation | Keqin Bao, Jizhi Zhang, Yang Zhang, Xinyue Huo, Chong Chen, Fuli Feng | 2024-06-22 | arXiv | https://github.com/SAI990323/DecodingMatters | http://arxiv.org/abs/2406.14900v1 |
658 | ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models | Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Jian Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang | 2024-06-21 | arXiv | https://github.com/haidequanbu/ESC-Eval | https://doi.org/10.48550/arXiv.2406.14952 |
659 | GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians | Haoyang Liu, Haohan Wang | 2024-06-21 | arXiv | https://github.com/Liu-Hy/GenoTex | http://arxiv.org/abs/2406.15341v1 |
660 | MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang | 2024-06-21 | arXiv | https://github.com/thu-nics/MoA | https://doi.org/10.48550/arXiv.2406.14909 |
661 | OATH-Frames: Characterizing Online Attitudes Towards Homelessness with LLM Assistants | Jaspreet Ranjit, Brihi Joshi, Rebecca Dorn, Laura Petry, Olga Koumoundouros, Jayne Bottarini, Peichen Liu, Eric Rice, Swabha Swayamdipta | 2024-06-21 | arXiv | https://dill-lab.github.io/oath-frames/ | http://arxiv.org/abs/2406.14883v1 |
662 | FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving | Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang | 2024-06-20 | arXiv | https://fveler.github.io/ | https://doi.org/10.48550/arXiv.2406.14408 |
663 | Taxonomy-Guided Zero-Shot Recommendations with LLMs | Yueqing Liang, Liangwei Yang, Chen Wang, Xiongxiao Xu, Philip S. Yu, Kai Shu | 2024-06-20 | arXiv | https://github.com/yueqingliang1/TaxRec | http://arxiv.org/abs/2406.14043v1 |
664 | ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation | Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu | 2024-06-20 | arXiv | https://github.com/openpsi-project/ReaLHF | https://doi.org/10.48550/arXiv.2406.14088 |
665 | QPaug: Question and Passage Augmentation for Open-Domain Question Answering of LLMs | Minsang Kim, Cheoneum Park, Seungjun Baek | 2024-06-20 | arXiv | https://github.com/kmswin1/QPaug | http://arxiv.org/abs/2406.14277v2 |
666 | MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models | Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia | 2024-06-20 | arXiv | https://randolph-zeng.github.io/Mr-Ben.github.io/ | https://doi.org/10.48550/arXiv.2406.13975 |
667 | CityBench: Evaluating the Capabilities of Large Language Model as World Model | Jie Feng, Jun Zhang, Junbo Yan, Xin Zhang, Tianjian Ouyang, Tianhui Liu, Yuwei Du, Siqi Guo, Yong Li | 2024-06-20 | arXiv | https://github.com/tsinghua-fib-lab/CityBench | https://doi.org/10.48550/arXiv.2406.13945 |
668 | Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective | Yuchen Wen, Keping Bi, Wei Chen, Jiafeng Guo, Xueqi Cheng | 2024-06-20 | arXiv | https://github.com/wen112358/ImplicitBiasPsychometricEvaluation | https://doi.org/10.48550/arXiv.2406.14023 |
669 | CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li | 2024-06-20 | arXiv | https://github.com/tsinghua-fib-lab/CityGPT | https://doi.org/10.48550/arXiv.2406.13948 |
670 | CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines | Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai | 2024-06-20 | arXiv | https://github.com/amademicnoboday12/CEBench | http://arxiv.org/abs/2407.12797v1 |
671 | APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking | Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas | 2024-06-20 | arXiv | https://github.com/jincan333/APEER | https://doi.org/10.48550/arXiv.2406.14449 |
672 | BeHonest: Benchmarking Honesty of Large Language Models | Steffi Chern, Zhulin Hu, Yuqing Yang, Ethan Chern, Yuan Guo, Jiahe Jin, Binjie Wang, Pengfei Liu | 2024-06-19 | arXiv | https://github.com/GAIR-NLP/BeHonest | https://doi.org/10.48550/arXiv.2406.13261 |
673 | Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators | Matéo Mahaut, Laura Aina, Paula Czarnowska, Momchil Hardalov, Thomas Müller, Lluís Màrquez | 2024-06-19 | OpenReview | https://github.com/amazon-science/factual-confidence-of-llms | http://arxiv.org/abs/2406.13415v1 |
674 | Finding Blind Spots in Evaluator LLMs with Interpretable Checklists | Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra | 2024-06-19 | arXiv | https://github.com/AI4Bharat/FBI | http://arxiv.org/abs/2406.13439v1 |
675 | Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata | Mykhailo Poliakov, Nadiya Shvai | 2024-06-19 | arXiv | https://github.com/mxpoliakov/Multi-Meta-RAG | http://arxiv.org/abs/2406.13213v1 |
676 | Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou | 2024-06-19 | arXiv | https://github.com/QwenLM/AutoIF | https://doi.org/10.48550/arXiv.2406.13542 |
677 | Low-Redundant Optimization for Large Language Model Alignment | Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen | 2024-06-18 | arXiv | https://github.com/RUCAIBox/ALLO | https://doi.org/10.48550/arXiv.2406.12606 |
678 | TourLLM: Enhancing LLMs with Tourism Knowledge | Qikai Wei, Mingzhi Yang, Jinqiang Wang, Wenwei Mao, Jiabo Xu, Huansheng Ning | 2024-06-18 | arXiv | https://github.com/mrweiqk/Cultour | http://arxiv.org/abs/2407.12791v1 |
679 | Stealth edits to large language models | Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Y. Tyukin | 2024-06-18 | arXiv | https://github.com/qinghua-zhou/stealth-edits | http://arxiv.org/abs/2406.12670v2 |
680 | Stealth edits for provably fixing or attacking large language models | Oliver J. Sutton, Qinghua Zhou, Wei Wang, Desmond J. Higham, Alexander N. Gorban, Alexander Bastounis, Ivan Yu. Tyukin | 2024-06-18 | arXiv | https://github.com/qinghua-zhou/stealth-edits | https://doi.org/10.48550/arXiv.2406.12670 |
681 | SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation | Xiaoze Liu, Ting Sun, Tianyang Xu, Feijie Wu, Cunxiang Wang, Xiaoqian Wang, Jing Gao | 2024-06-18 | arXiv | https://github.com/xz-liu/SHIELD | http://arxiv.org/abs/2406.12975v1 |
682 | MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction | Yuyan Liu, Sirui Ding, Sheng Zhou, Wenqi Fan, Qiaoyu Tan | 2024-06-18 | arXiv | https://github.com/NYUSHCS/MolecularGPT | https://doi.org/10.48550/arXiv.2406.12950 |
683 | CollabStory: Multi-LLM Collaborative Story Generation and Authorship Analysis | Saranya Venkatraman, Nafis Irtiza Tripto, Dongwon Lee | 2024-06-18 | arXiv | https://github.com/saranya-venkatraman/multi_llm_story_writing | http://arxiv.org/abs/2406.12665v1 |
684 | IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, Hongfei Lin | 2024-06-18 | arXiv | https://ipeval.github.io/ | https://doi.org/10.48550/arXiv.2406.12386 |
685 | Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on Large Language Models | Eldar Kurtic, Amir Moeini, Dan Alistarh | 2024-06-18 | arXiv | https://github.com/IST-DASLab/Mathador-LM | https://doi.org/10.48550/arXiv.2406.12572 |
686 | CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework | Shaohuang Wang, Lun Wang, Yunhan Bu, Tianwei Huang | 2024-06-18 | arXiv | https://github.com/xxxxxx | http://arxiv.org/abs/2406.12243v1 |
687 | AgentReview: Exploring Peer Review Dynamics with LLM Agents | Yiqiao Jin, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, Jindong Wang | 2024-06-18 | arXiv | https://agentreview.github.io/ | http://arxiv.org/abs/2406.12708v1 |
688 | TroL: Traversal of Layers for Large Language and Vision Models | Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro | 2024-06-18 | arXiv | https://github.com/ByungKwanLee/TroL | https://doi.org/10.48550/arXiv.2406.12246 |
689 | Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Chuchu Han, Xiaonan Huang, Changxin Gao, Yuehuan Wang, Nong Sang | 2024-06-18 | arXiv | https://github.com/pipixin321/HolmesVAD | http://arxiv.org/abs/2406.12235v1 |
690 | DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models | Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen | 2024-06-17 | arXiv | https://github.com/eosphoros-ai/DB-GPT-Hub | https://doi.org/10.48550/arXiv.2406.11434 |
691 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou | 2024-06-17 | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | https://showlab.github.io/videollm-online | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10657274 |
692 | Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models | Yuqing Wang, Yun Zhao, Sara Alessandra Keller, Anne A. H. de Hond, Marieke M. van Buchem, Malvika Pillai, Tina Hernandez-Boussard | 2024-06-17 | arXiv | https://github.com/EternityYW/BiasEval-LLM-MentalHealth | https://doi.org/10.48550/arXiv.2406.12033 |
693 | Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models | Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang | 2024-06-17 | arXiv | https://github.com/FsFrancis15/BrainLLM | https://doi.org/10.48550/arXiv.2406.11568 |
694 | Soft Prompting for Unlearning in Large Language Models | Karuna Bhaila, Minh-Hao Van, Xintao Wu | 2024-06-17 | arXiv | https://github.com/karuna-bhaila/llm_unlearning | https://doi.org/10.48550/arXiv.2406.12038 |
695 | Probing the Decision Boundaries of In-context Learning in Large Language Models | Siyan Zhao, Tung Nguyen, Aditya Grover | 2024-06-17 | arXiv | https://github.com/siyan-zhao/ICL_decision_boundary | https://doi.org/10.48550/arXiv.2406.11233 |
696 | Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, Hao Wang | 2024-06-17 | arXiv | https://github.com/Wang-ML-Lab/multimodal-needle-in-a-haystack | https://doi.org/10.48550/arXiv.2406.11230 |
697 | Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | Fangzhi Xu, Qiushi Sun, Kanzhi Cheng, Jun Liu, Yu Qiao, Zhiyong Wu | 2024-06-17 | arXiv | https://github.com/xufangzhi/ENVISIONS | https://doi.org/10.48550/arXiv.2406.11736 |
698 | ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking | Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang | 2024-06-17 | Proceedings of Machine Learning Research | https://github.com/Gaffey/ExCP | http://arxiv.org/abs/2406.11257v1 |
699 | Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging | Priyanka Kargupta, Ishika Agarwal, Dilek Hakkani-Tur, Jiawei Han | 2024-06-17 | arXiv | http://github.com/agarwalishika/TreeInstruct | http://arxiv.org/abs/2406.11709v2 |
700 | AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval | Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou | 2024-06-17 | arXiv | https://github.com/zou-group/avatar | http://arxiv.org/abs/2406.11200v2 |
701 | mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen | 2024-06-17 | arXiv | https://feiwang96.github.io/mDPO | https://doi.org/10.48550/arXiv.2406.11839 |
702 | Tokenization Falling Short: On Subword Robustness in Large Language Models | Yekun Chai, Yewei Fang, Qiwei Peng, Xuhong Li | 2024-06-17 | EMNLP | https://github.com/FloatAI/TKEval | https://aclanthology.org/2024.findings-emnlp.86 |
703 | MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model | Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu | 2024-06-17 | arXiv | https://github.com/Z1zs/MMNeuron | https://doi.org/10.48550/arXiv.2406.11193 |
704 | GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation | Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng | 2024-06-17 | arXiv | https://github.com/Lanyu0303/GeoGPT4V_Project | https://doi.org/10.48550/arXiv.2406.11503 |
705 | Problematic Tokens: Tokenizer Bias in Large Language Models | Jin Yang, Zhiqiang Wang, Yanbin Lin, Zunduo Zhao | 2024-06-17 | arXiv | https://github.com/yeyimilk/LLMGPT4o | http://arxiv.org/abs/2406.11214v3 |
706 | Investigating Annotator Bias in Large Language Models for Hate Speech Detection | Amit Das, Zheng Zhang, Najib Hasan, Souvika Sarkar, Fatemeh Jamshidi, Tathagata Bhattacharya, Mostafa Rahgouy, Nilanjana Raychawdhary, Dongji Feng, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals | 2024-06-17 | arXiv | https://github.com/AmitDasRup123/HateSpeechCorpus | https://doi.org/10.48550/arXiv.2406.11109 |
707 | LLaNA: Large Language and NeRF Assistant | Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano | 2024-06-17 | arXiv | https://andreamaduzzi.github.io/llana/ | https://doi.org/10.48550/arXiv.2406.11840 |
708 | AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning | Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou | 2024-06-17 | arXiv | https://github.com/zou-group/avatar | http://arxiv.org/abs/2406.11200v3 |
709 | RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models | Yuqing Wang, Yun Zhao | 2024-06-16 | arXiv | https://github.com/EternityYW/RUPBench | https://doi.org/10.48550/arXiv.2406.11020 |
710 | Toward Optimal LLM Alignments Using Two-Player Games | Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, Xiaojun Xu, Zhaoran Wang, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu | 2024-06-16 | arXiv | https://github.com/ruizheng20/gpo | http://arxiv.org/abs/2406.10977v1 |
711 | SCAR: Efficient Instruction-Tuning for Large Language Models via Style Consistency-Aware Response Ranking | Zhuang Li, Yuncheng Hua, Thuy-Trang Vu, Haolan Zhan, Lizhen Qu, Gholamreza Haffari | 2024-06-16 | arXiv | https://github.com/zhuang-li/SCAR | https://doi.org/10.48550/arXiv.2406.10882 |
712 | RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models | Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao | 2024-06-16 | arXiv | http://rwku-bench.github.io | https://doi.org/10.48550/arXiv.2406.10890 |
713 | A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery | Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han | 2024-06-16 | arXiv | https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models | https://doi.org/10.48550/arXiv.2406.10833 |
714 | Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference | Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han | 2024-06-16 | Proceedings of Machine Learning Research | http://github.com/mit-han-lab/Quest | http://arxiv.org/abs/2406.10774v1 |
715 | Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies | Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu | 2024-06-16 | arXiv | https://ander1119.github.io/TiM | https://doi.org/10.48550/arXiv.2406.10923 |
716 | GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun | 2024-06-16 | arXiv | https://gui-world.github.io/ | http://arxiv.org/abs/2406.10819v1 |
717 | A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth | 2024-06-16 | arXiv | https://github.com/bowen-upenn/llm_token_bias | https://doi.org/10.48550/arXiv.2406.11050 |
718 | StructBench: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding | Zhouhong Gu, Haoning Ye, Zeyang Zhou, Hongwei Feng, Yanghua Xiao | 2024-06-15 | arXiv | https://github.com/MikeGu721/StructBench | http://arxiv.org/abs/2406.10621v1 |
719 | StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text | Zhouhong Gu, Haoning Ye, Xingzhou Chen, Zeyang Zhou, Hongwei Feng, Yanghua Xiao | 2024-06-15 | arXiv | https://github.com/MikeGu721/StrucText-Eval | http://arxiv.org/abs/2406.10621v3 |
720 | StrucText-Eval: An Autogenerated Benchmark for Evaluating Large Language Model's Ability in Structure-Rich Text Understanding | Zhouhong Gu, Haoning Ye, Zeyang Zhou, Hongwei Feng, Yanghua Xiao | 2024-06-15 | arXiv | https://github.com/MikeGu721/StrucText-Eval | https://doi.org/10.48550/arXiv.2406.10621 |
721 | Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox | Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu | 2024-06-15 | arXiv | https://github.com/TsingmaoAI/MI-optimize | http://arxiv.org/abs/2406.12928v1 |
722 | What is the best model? Application-driven Evaluation for Large Language Models | Shiguo Lian, Kaikai Zhao, Xinhui Liu, Xuejiao Lei, Bikun Yang, Wenjing Zhang, Kai Wang, Zhaoxiang Liu | 2024-06-14 | arXiv | https://github.com/UnicomAI/DataSet/tree/main/TestData/GeneralAbility | https://doi.org/10.48550/arXiv.2406.10307 |
723 | BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages | Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh | 2024-06-14 | arXiv | https://github.com/nlee0212/BLEnD | http://arxiv.org/abs/2406.09948v1 |
724 | Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, Tom Goldstein | 2024-06-14 | arXiv | https://github.com/ahans30/goldfish-loss | http://arxiv.org/abs/2406.10209v1 |
725 | CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models | Wenjing Zhang, Xuejiao Lei, Zhaoxiang Liu, Meijuan An, Bikun Yang, Kaikai Zhao, Kai Wang, Shiguo Lian | 2024-06-14 | arXiv | https://github.com/UnicomAI/DataSet/tree/main/TestData/Safety | https://doi.org/10.48550/arXiv.2406.10311 |
726 | CliBench: A Multifaceted and Multigranular Evaluation of Large Language Models for Clinical Decision Making | Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S. Chang, Wei Wang | 2024-06-14 | arXiv | https://clibench.github.io | http://arxiv.org/abs/2406.09923v2 |
727 | CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions | Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S. Chang, Wei Wang | 2024-06-14 | arXiv | https://clibench.github.io | https://doi.org/10.48550/arXiv.2406.09923 |
728 | First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models | Enming Zhang, Ruobing Yao, Huanyong Liu, Junhui Yu, Jiale Wang | 2024-06-14 | arXiv | https://github.com/360AILAB-NLP/FlowCE | https://doi.org/10.48550/arXiv.2406.10057 |
729 | JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models | Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang | 2024-06-13 | arXiv | https://github.com/ThuCCSLab/JailbreakEval | https://doi.org/10.48550/arXiv.2406.09321 |
730 | Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs | Weixuan Wang, Barry Haddow, Minghao Wu, Wei Peng, Alexandra Birch | 2024-06-13 | arXiv | https://github.com/weixuan-wang123/multilingual-neurons | http://arxiv.org/abs/2406.09265v2 |
731 | Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? | Zhaochen Su, Juntao Li, Jun Zhang, Tong Zhu, Xiaoye Qu, Pan Zhou, Yan Bowen, Yu Cheng, Min Zhang | 2024-06-13 | arXiv | https://github.com/zhaochen0110/Cotempqa | https://doi.org/10.48550/arXiv.2406.09072 |
732 | LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models | Xiaohao Yang, He Zhao, Dinh Q. Phung, Wray L. Buntine, Lan Du | 2024-06-13 | arXiv | https://github.com/Xiaohao-Yang/Topic_Model_Evaluation | https://doi.org/10.48550/arXiv.2406.09008 |
733 | LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living | Rajatsubhra Chakraborty, Arkaprava Sinha, Dominick Reilly, Manish Kumar Govind, Pu Wang, François Brémond, Srijan Das | 2024-06-13 | arXiv | https://adl-x.github.io/ | https://doi.org/10.48550/arXiv.2406.09390 |
734 | SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models | Kehua Feng, Keyan Ding, Weijie Wang, Xiang Zhuang, Zeyuan Wang, Ming Qin, Yu Zhao, Jianhua Yao, Qiang Zhang, Huajun Chen | 2024-06-13 | arXiv | https://github.com/hicai-zju/sciknoweval | https://doi.org/10.48550/arXiv.2406.09098 |
735 | Investigating the translation capabilities of Large Language Models trained on parallel data only | Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca de Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero | 2024-06-13 | arXiv | https://github.com/projecte-aina/Plume | https://doi.org/10.48550/arXiv.2406.09140 |
736 | DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation | A B M Ashikur Rahman, Saeed Anwar, Muhammad Usman, Ajmal Mian | 2024-06-13 | arXiv | https://github.com/ashikiut/DefAn | http://arxiv.org/abs/2406.09155v1 |
737 | Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs | Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin | 2024-06-13 | arXiv | https://github.com/sail-sg/CPO | http://arxiv.org/abs/2406.09136v1 |
738 | Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Zhao Xu, Fan Liu, Hao Liu | 2024-06-13 | arXiv | https://github.com/usail-hkust/Bag_of_Tricks_for_LLM_Jailbreaking | http://arxiv.org/abs/2406.09324v1 |
739 | Enhancing Psychotherapy Counseling: A Data Augmentation Pipeline Leveraging Large Language Models for Counseling Conversations | Jun-Woo Kim, Ji-Eun Han, Jun-Seok Koh, Hyeon-Tae Seo, Du-Seong Chang | 2024-06-13 | arXiv | https://github.com/jwkim-chat/A-Data-Augmentation-Pipeline-Leveraging-Large-Language-Models-for-Counseling-Conversations | https://doi.org/10.48550/arXiv.2406.08718 |
740 | Large Language Models Must Be Taught to Know What They Don't Know | Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine M. Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson | 2024-06-12 | arXiv | https://github.com/activatedgeek/calibration-tuning | https://doi.org/10.48550/arXiv.2406.08391 |
741 | TasTe: Teaching Large Language Models to Translate through Self-Reflection | Yutong Wang, Jiali Zeng, Xuebo Liu, Fandong Meng, Jie Zhou, Min Zhang | 2024-06-12 | arXiv | https://github.com/YutongWang1216/ReflectionLLMMT | https://doi.org/10.48550/arXiv.2406.08434 |
742 | Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference | Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Rao Kompella, Sijia Liu, Shiyu Chang | 2024-06-12 | arXiv | https://github.com/UCSB-NLP-Chang/ULD | http://arxiv.org/abs/2406.08607v1 |
743 | Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin | 2024-06-12 | arXiv | https://magpie-align.github.io/ | http://arxiv.org/abs/2406.08464v1 |
744 | MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents | Luyuan Wang, Yongyu Deng, Yiwei Zha, Guodong Mao, Qinmin Wang, Tianchen Min, Wei Chen, Shoufa Chen | 2024-06-12 | arXiv | https://MobileAgentBench.github.io | http://arxiv.org/abs/2406.08184v1 |
745 | Large Language Model Unlearning via Embedding-Corrupted Prompts | Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, Yang Liu | 2024-06-12 | arXiv | https://github.com/chrisliu298/llm-unlearn-eco | https://doi.org/10.48550/arXiv.2406.07933 |
746 | CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery | Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma Gongque, Jianing Yu, Qiuna Tan, Weiran Xu | 2024-06-12 | arXiv | https://github.com/csbench/csbench | https://doi.org/10.48550/arXiv.2406.08587 |
747 | Are Large Language Models Good Statisticians? | Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan Tang | 2024-06-12 | arXiv | https://statqa.github.io/ | https://doi.org/10.48550/arXiv.2406.07815 |
748 | Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning | Jaehyun Nam, Kyuyoung Kim, Seunghyuk Oh, Jihoon Tack, Jaehyung Kim, Jinwoo Shin | 2024-06-12 | arXiv | https://github.com/jaehyun513/OCTree | http://arxiv.org/abs/2406.08527v1 |
749 | When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models | Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin | 2024-06-11 | ICML | https://github.com/GATECH-EIC/Linearized-LLM | https://openreview.net/forum?id=7mFSaP6IiN |
750 | VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models | Yu Liu, Lang Gao, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen | 2024-06-11 | arXiv | https://github.com/Sweetaroo/VulDetectBench | https://doi.org/10.48550/arXiv.2406.07595 |
751 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing | 2024-06-11 | arXiv | https://github.com/DAMO-NLP-SG/VideoLLaMA2 | http://arxiv.org/abs/2406.07476v2 |
752 | MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations | Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu | 2024-06-11 | arXiv | https://github.com/ShiningSord/MoreauPruner | https://doi.org/10.48550/arXiv.2406.07017 |
753 | Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond | Dewu Zheng, Yanlin Wang, Ensheng Shi, Ruikai Zhang, Yuchi Ma, Hongyu Zhang, Zibin Zheng | 2024-06-11 | arXiv | https://github.com/DeepSoftwareAnalytics/EvoEval | http://arxiv.org/abs/2406.06918v1 |
754 | Scaling Large-Language-Model-based Multi-Agent Collaboration | Chen Qian, Zihao Xie, Yifei Wang, Wei Liu, Yufan Dang, Zhuoyun Du, Weize Chen, Cheng Yang, Zhiyuan Liu, Maosong Sun | 2024-06-11 | arXiv | https://github.com/OpenBMB/ChatDev | https://doi.org/10.48550/arXiv.2406.07155 |
755 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li, Han Shi, Xin Jiang, Zhenguo Li, Hong Xu, Jiaya Jia | 2024-06-11 | arXiv | https://github.com/dvlab-research/Q-LLM | https://doi.org/10.48550/arXiv.2406.07528 |
756 | Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Aidar Myrzakhan, Sondos Mahmoud Bsharat, Zhiqiang Shen | 2024-06-11 | arXiv | https://github.com/VILA-Lab/Open-LLM-Leaderboard | http://arxiv.org/abs/2406.07545v1 |
757 | Instruct Large Language Models to Drive like Humans | Ruijun Zhang, Xianda Guo, Wenzhao Zheng, Chenming Zhang, Kurt Keutzer, Long Chen | 2024-06-11 | arXiv | https://github.com/bonbon-rj/InstructDriver | https://doi.org/10.48550/arXiv.2406.07296 |
758 | Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models | Zhenyi Lu, Jie Tian, Wei Wei, Xiaoye Qu, Yu Cheng, Wenfeng Xie, Dangyang Chen | 2024-06-11 | arXiv | https://github.com/Chuge0335/PC-CoT | https://doi.org/10.48550/arXiv.2406.07001 |
759 | Entropy-Reinforced Planning with Large Language Models for Drug Discovery | Xuefeng Liu, Chih-chan Tien, Peng Ding, Songhao Jiang, Rick L. Stevens | 2024-06-11 | arXiv | https://github.com/xuefeng-cs/ERP | https://doi.org/10.48550/arXiv.2406.07025 |
760 | MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs | Vera Neplenbroek, Arianna Bisazza, Raquel Fernández | 2024-06-11 | arXiv | https://github.com/Veranep/MBBQ | http://arxiv.org/abs/2406.07243v2 |
761 | Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study | Yichi Zhang, Yao Huang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Yifan Wang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu | 2024-06-11 | arXiv | https://multi-trust.github.io/ | https://doi.org/10.48550/arXiv.2406.07057 |
762 | Evolving Subnetwork Training for Large Language Models | Hanqi Li, Lu Chen, Da Ma, Zijian Wu, Su Zhu, Kai Yu | 2024-06-11 | arXiv | https://github.com/OpenDFM/EST | https://doi.org/10.48550/arXiv.2406.06962 |
763 | Limited Out-of-Context Knowledge Reasoning in Large Language Models | Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang | 2024-06-11 | arXiv | https://github.com/NJUNLP/ID-OCKR | https://doi.org/10.48550/arXiv.2406.07393 |
764 | LUNAR: Unsupervised LLM-based Log Parsing | Junjie Huang, Zhihan Jiang, Zhuangbin Chen, Michael R. Lyu | 2024-06-11 | arXiv | https://github.com/Jun-jie-Huang/LUNAR | http://arxiv.org/abs/2406.07174v2 |
765 | ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization | Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Celine Lin | 2024-06-10 | arXiv | https://github.com/GATECH-EIC/ShiftAddLLM | http://arxiv.org/abs/2406.05981v3 |
766 | Aligning Large Language Models with Representation Editing: A Control Perspective | Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang | 2024-06-10 | arXiv | https://github.com/Lingkai-Kong/RE-Control | https://doi.org/10.48550/arXiv.2406.05954 |
767 | AutoSurvey: Large Language Models Can Automatically Write Surveys | Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang | 2024-06-10 | arXiv | https://github.com/AutoSurveys/AutoSurvey | https://doi.org/10.48550/arXiv.2406.10252 |
768 | How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark | Ruizhong Qiu, Weiliang Will Zeng, Hanghang Tong, James Ezick, Christopher Lott | 2024-06-10 | arXiv | https://github.com/q-rz/enamel | http://arxiv.org/abs/2406.06647v2 |
769 | LLM Dataset Inference: Did you train on my dataset? | Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic | 2024-06-10 | arXiv | https://github.com/pratyushmaini/llm_dataset_inference/ | http://arxiv.org/abs/2406.06443v1 |
770 | Low-Rank Quantization-Aware Training for LLMs | Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel | 2024-06-10 | arXiv | https://github.com/qualcomm-ai-research/LR-QAT | http://arxiv.org/abs/2406.06385v2 |
771 | Recurrent Context Compression: Efficiently Expanding the Context Window of LLM | Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang | 2024-06-10 | arXiv | https://github.com/WUHU-G/RCC_Transformer | http://arxiv.org/abs/2406.06110v1 |
772 | Data-Juicer: A One-Stop Data Processing System for Large Language Models | Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou | 2024-06-09 | SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data | https://github.com/alibaba/data-juicer | https://dl.acm.org/doi/10.1145/3626246.3653385 |
773 | Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin | 2024-06-09 | arXiv | https://github.com/Yu-Fangxu/FoR | http://arxiv.org/abs/2406.05673v2 |
774 | Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples | Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin | 2024-06-09 | arXiv | https://github.com/Yu-Fangxu/FoR | http://arxiv.org/abs/2406.05673v3 |
775 | Hello Again! LLM-powered Personalized Agent for Long-term Dialogue | Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua | 2024-06-09 | arXiv | https://github.com/leolee99/LD-Agent | http://arxiv.org/abs/2406.05925v1 |
776 | How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States | Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Yongbin Li | 2024-06-09 | arXiv | https://github.com/ydyjya/LLM-IHS-Explanation | http://arxiv.org/abs/2406.05644v2 |
777 | Soundscape Captioning using Sound Affective Quality Network and Large Language Model | Yuanbo Hou, Qiaoqiao Ren, Andrew Mitchell, Wenwu Wang, Jian Kang, Tony Belpaeme, Dick Botteldooren | 2024-06-09 | arXiv | https://github.com/Yuanbo2020/SoundSCaper | https://doi.org/10.48550/arXiv.2406.05914 |
778 | On the Worst Prompt Performance of Large Language Models | Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, Wai Lam | 2024-06-08 | arXiv | https://github.com/cbwbuaa/On-the-Worst-Prompt- | https://doi.org/10.48550/arXiv.2406.10248 |
779 | Large Language Model Assisted Adversarial Robustness Neural Architecture Search | Rui Zhong, Yang Cao, Jun Yu, Masaharu Munetomo | 2024-06-08 | arXiv | https://github.com/RuiZhong961230/LLMO | https://doi.org/10.48550/arXiv.2406.05433 |
780 | NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | Minghao Shao, Sofija Jancheska, Meet Udeshi, Brendan Dolan-Gavitt, Haoran Xi, Kimberly Milner, Boyuan Chen, Max Yin, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique | 2024-06-08 | arXiv | https://github.com/NYU-LLM-CTF/LLM_CTF_Database | http://arxiv.org/abs/2406.05590v1 |
781 | 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai | 2024-06-07 | arXiv | https://3d-grand.github.io | http://arxiv.org/abs/2406.05132v2 |
782 | An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Víctor Gutiérrez-Basulto, Jeff Z. Pan | 2024-06-07 | arXiv | https://github.com/alenai97/PEFT-MLLM | https://doi.org/10.48550/arXiv.2406.05130 |
783 | CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models | Ling Shi, Deyi Xiong | 2024-06-07 | arXiv | https://github.com/lingshi6565/Risk_eval | https://doi.org/10.48550/arXiv.2406.04752 |
784 | FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models | Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen | 2024-06-07 | arXiv | https://github.com/rui-ye/FedLLM-Bench | https://doi.org/10.48550/arXiv.2406.04845 |
785 | LLM-Enhanced Bayesian Optimization for Efficient Analog Layout Constraint Generation | Guojin Chen, Keren Zhu, Seunggeun Kim, Hanqing Zhu, Yao Lai, Bei Yu, David Z. Pan | 2024-06-07 | arXiv | https://github.com/dekura/LLANA | http://arxiv.org/abs/2406.05250v2 |
786 | LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model | Zhi Zhou, Jiang-Xin Shi, Peng-Xiao Song, Xiao-Wen Yang, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li | 2024-06-07 | arXiv | https://github.com/pengxiao-song/LaWGPT | https://doi.org/10.48550/arXiv.2406.04614 |
787 | LogiCode: An LLM-Driven Framework for Logical Anomaly Detection | Yiheng Zhang, Yunkang Cao, Xiaohao Xu, Weiming Shen | 2024-06-07 | IEEE Transactions on Automation Science and Engineering | https://github.com/22strongestme/LOCO-Annotations | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10710633 |
788 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan | 2024-06-07 | arXiv | https://chocowu.github.io/SeTok-web/ | http://arxiv.org/abs/2406.05127v2 |
789 | MoralBench: Moral Evaluation of LLMs | Jianchao Ji, Yutong Chen, Mingyu Jin, Wujiang Xu, Wenyue Hua, Yongfeng Zhang | 2024-06-06 | arXiv | https://github.com/agiresearch/MoralBench | http://arxiv.org/abs/2406.04428v1 |
790 | ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models | Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song | 2024-06-06 | arXiv | https://github.com/Value4AI/ValueBench | https://doi.org/10.48550/arXiv.2406.04214 |
791 | Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models | Phat Nguyen, Tsun-Hsuan Wang, Zhang-Wei Hong, Sertac Karaman, Daniela Rus | 2024-06-06 | arXiv | https://text-to-drive.github.io/ | https://doi.org/10.48550/arXiv.2406.04300 |
792 | TESTEVAL: Benchmarking Large Language Models for Test Case Generation | Wenhan Wang, Chenyuan Yang, Zhijie Wang, Yuheng Huang, Zhaoyang Chu, Da Song, Lingming Zhang, An Ran Chen, Lei Ma | 2024-06-06 | arXiv | https://llm4softwaretesting.github.io | https://doi.org/10.48550/arXiv.2406.04531 |
793 | PaCE: Parsimonious Concept Engineering for Large Language Models | Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Darshan Thaker, Aditya Chattopadhyay, Chris Callison-Burch, René Vidal | 2024-06-06 | arXiv | https://github.com/peterljq/Parsimonious-Concept-Engineering | https://doi.org/10.48550/arXiv.2406.04331 |
794 | Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs | Shang Zhou, Feng Yao, Chengyu Dong, Zihan Wang, Jingbo Shang | 2024-06-06 | arXiv | https://github.com/ShangDataLab/Smooth-Control | http://arxiv.org/abs/2406.04460v1 |
795 | LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification | Chun Liu, Hongguang Zhang, Kainan Zhao, Xinghai Ju, Lin Yang | 2024-06-06 | Proceedings of the Annual Meeting of the Association for Computational Linguistics | https://github.com/ChunLiu-cs/LLMEmbed-ACL2024 | http://arxiv.org/abs/2406.03725v1 |
796 | Aligning Agents like Large Language Models | Adam Jelley, Yuhan Cao, David Bignell, Sam Devlin, Tabish Rashid | 2024-06-06 | arXiv | https://adamjelley.github.io/aligning-agents-like-llms | https://doi.org/10.48550/arXiv.2406.04208 |
797 | AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang | 2024-06-06 | arXiv | https://agentgym.github.io | https://doi.org/10.48550/arXiv.2406.04151 |
798 | Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | Ling Yang, Zhaochen Yu, Tianjun Zhang, Shiyi Cao, Minkai Xu, Wentao Zhang, Joseph E. Gonzalez, Bin Cui | 2024-06-06 | arXiv | https://github.com/YangLing0818/buffer-of-thought-llm | https://doi.org/10.48550/arXiv.2406.04271 |
799 | DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning | Shangqing Tu, Kejian Zhu, Yushi Bai, Zijun Yao, Lei Hou, Juanzi Li | 2024-06-06 | arXiv | https://github.com/THU-KEG/DICE | http://arxiv.org/abs/2406.04197v1 |
800 | Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models | Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu | 2024-06-05 | arXiv | https://github.com/pprp/Pruner-Zero | https://doi.org/10.48550/arXiv.2406.02924 |
801 | MultifacetEval: Multifaceted Evaluation to Probe LLMs in Mastering Medical Knowledge | Yuxuan Zhou, Xien Liu, Chen Ning, Ji Wu | 2024-06-05 | arXiv | https://github.com/THUMLP/MultifacetEval | http://arxiv.org/abs/2406.02919v1 |
802 | Text-like Encoding of Collaborative Information in Large Language Models for Recommendation | Yang Zhang, Keqin Bao, Ming Yang, Wenjie Wang, Fuli Feng, Xiangnan He | 2024-06-05 | ACL | https://github.com/zyang1580/BinLLM | https://aclanthology.org/2024.acl-long.497 |
803 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, Maosong Sun | 2024-06-05 | arXiv | https://github.com/MayDomine/Seq1F1B | https://doi.org/10.48550/arXiv.2406.03488 |
804 | PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs | Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar | 2024-06-05 | arXiv | https://github.com/houcharlie/PrE-Text | http://arxiv.org/abs/2406.02958v1 |
805 | PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM | Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen | 2024-06-05 | arXiv | https://github.com/posterllava/PosterLLaVA | http://arxiv.org/abs/2406.02884v1 |
806 | Llumnix: Dynamic Scheduling for Large Language Model Serving | Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin | 2024-06-05 | arXiv | https://github.com/AlibabaPAI/llumnix | https://doi.org/10.48550/arXiv.2406.03243 |
807 | HYDRA: Model Factorization Framework for Black-Box LLM Personalization | Yuchen Zhuang, Haotian Sun, Yue Yu, Rushi Qiang, Qifan Wang, Chao Zhang, Bo Dai | 2024-06-05 | arXiv | https://github.com/night-chen/HYDRA | http://arxiv.org/abs/2406.02888v2 |
808 | Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation | Tingjia Shen, Hao Wang, Jiaqing Zhang, Sirui Zhao, Liangyue Li, Zulong Chen, Defu Lian, Enhong Chen | 2024-06-05 | arXiv | https://github.com/TingJShen/URLLM | https://doi.org/10.48550/arXiv.2406.03085 |
809 | CSS: Contrastive Semantic Similarity for Uncertainty Quantification of LLMs | Shuang Ao, Stefan Rueger, Advaith Siddharthan | 2024-06-05 | arXiv | https://github.com/AoShuang92/css_uq_llms | http://arxiv.org/abs/2406.03158v1 |
810 | BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents | Yifei Wang, Dizhan Xue, Shengjie Zhang, Shengsheng Qian | 2024-06-05 | Proceedings of the Annual Meeting of the Association for Computational Linguistics | https://github.com/DPamK/BadAgent | http://arxiv.org/abs/2406.03007v1 |
811 | XRec: Large Language Models for Explainable Recommendation | Qiyao Ma, Xubin Ren, Chao Huang | 2024-06-04 | arXiv | https://github.com/HKUDS/XRec | https://doi.org/10.48550/arXiv.2406.02377 |
812 | Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, Jenia Jitsev | 2024-06-04 | arXiv | https://github.com/LAION-AI/AIW | https://doi.org/10.48550/arXiv.2406.02061 |
813 | Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature | Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren | 2024-06-04 | arXiv | https://github.com/Tongzhou0101/Bileve-official | https://doi.org/10.48550/arXiv.2406.01946 |
814 | Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion | Jakub Hoscilowicz, Pawel Popiolek, Jan Rudkowski, Jedrzej Bieniasz, Artur Janicki | 2024-06-04 | arXiv | https://github.com/j-hoscilowic/zurek-stegano | https://doi.org/10.48550/arXiv.2406.02481 |
815 | Large Language Models as Carriers of Hidden Messages | Jakub Hoscilowicz, Pawel Popiolek, Jan Rudkowski, Jedrzej Bieniasz, Artur Janicki | 2024-06-04 | arXiv | https://github.com/j-hoscilowic/zurek-stegano | http://arxiv.org/abs/2406.02481v2 |
816 | Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller | Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Dan Zhang, Difan Zou, Yisong Yue, Ziniu Hu | 2024-06-04 | arXiv | https://llm-self-control.github.io/ | http://arxiv.org/abs/2406.02721v3 |
817 | SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM | Quandong Wang, Yuxuan Yuan, Xiaoyu Yang, Ruike Zhang, Kang Zhao, Wei Liu, Jian Luan, Daniel Povey, Bin Wang | 2024-06-03 | arXiv | https://github.com/XiaoMi/subllm | http://arxiv.org/abs/2406.06571v2 |
818 | Sparsity-Accelerated Training for Large Language Models | Da Ma, Lu Chen, Pengyu Wang, Hongshen Xu, Hanqi Li, Liangtai Sun, Su Zhu, Shuai Fan, Kai Yu | 2024-06-03 | arXiv | https://github.com/OpenDFM/SAT | https://doi.org/10.48550/arXiv.2406.01392 |
819 | The Geometry of Categorical and Hierarchical Concepts in Large Language Models | Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch | 2024-06-03 | arXiv | https://github.com/KihoPark/LLM_Categorical_Hierarchical_Representations | https://doi.org/10.48550/arXiv.2406.01506 |
820 | VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model | Jinze Yang, Haoran Wang, Zining Zhu, Chenglong Liu, Meng Wymond Wu, Zeke Xie, Zhong Ji, Jungong Han, Mingming Sun | 2024-06-03 | arXiv | https://github.com/ucasyjz/VIP | https://doi.org/10.48550/arXiv.2406.01059 |
821 | Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization | Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, Yun-Nung Chen | 2024-06-03 | arXiv | https://github.com/MiuLab/PersonaLLM-Survey | http://arxiv.org/abs/2406.01171v2 |
822 | REvolve: Reward Evolution with Large Language Models using Human Feedback | Rishi Hazra, Alkis Sygkounas, Andreas Persson, Amy Loutfi, Pedro Zuidberg Dos Martires | 2024-06-03 | arXiv | https://rishihazra.github.io/REvolve | http://arxiv.org/abs/2406.01309v2 |
823 | Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs | Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei | 2024-06-03 | arXiv | https://github.com/Hsu1023/DuQuant | http://arxiv.org/abs/2406.01721v1 |
824 | Towards Scalable Automated Alignment of LLMs: A Survey | Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu | 2024-06-03 | arXiv | https://github.com/cascip/awesome-auto-alignment | http://arxiv.org/abs/2406.01252v1 |
825 | REvolve: Reward Evolution with Large Language Models for Autonomous Driving | Rishi Hazra, Alkis Sygkounas, Andreas Persson, Amy Loutfi, Pedro Zuidberg Dos Martires | 2024-06-03 | arXiv | https://rishihazra.github.io/REvolve | https://doi.org/10.48550/arXiv.2406.01309 |
826 | LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback | Wen Lai, Mohsen Mesgar, Alexander Fraser | 2024-06-03 | arXiv | https://github.com/boschresearch/ACL24-MLLM | http://arxiv.org/abs/2406.01771v1 |
827 | Knowledge Graph in Astronomical Research with Large Language Models: Quantifying Driving Forces in Interdisciplinary Scientific Discovery | Zechang Sun, Yuan-Sen Ting, Yaobo Liang, Nan Duan, Song Huang, Zheng Cai | 2024-06-03 | arXiv | https://astrokg.github.io/ | https://doi.org/10.48550/arXiv.2406.01391 |
828 | DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs | Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei | 2024-06-03 | arXiv | https://duquant.github.io | http://arxiv.org/abs/2406.01721v2 |
829 | Demystifying Platform Requirements for Diverse LLM Inference Use Cases | Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna | 2024-06-03 | arXiv | https://github.com/abhibambhaniya/GenZ-LLM-Analyzer | http://arxiv.org/abs/2406.01698v1 |
830 | Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models | Cheng-Hsun Hsueh, Paul Kuo-Ming Huang, Tzu-Han Lin, Che-Wei Liao, Hung-Chieh Fang, Chao-Wei Huang, Yun-Nung Chen | 2024-06-03 | arXiv | https://github.com/MiuLab/EditLLM-Survey | https://doi.org/10.48550/arXiv.2406.01436 |
831 | LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation | Yongjing Yin, Jiali Zeng, Yafu Li, Fandong Meng, Yue Zhang | 2024-06-03 | arXiv | https://github.com/ARIES-LM/Lexmatcher-MT | http://arxiv.org/abs/2406.01441v1 |
832 | Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection | Chentao Cao, Zhun Zhong, Zhanke Zhou, Yang Liu, Tongliang Liu, Bo Han | 2024-06-02 | arXiv | https://github.com/tmlr-group/EOE | https://doi.org/10.48550/arXiv.2406.00806 |
833 | Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction | Xiaoyuan Li, Wenjie Wang, Moxin Li, Junrong Guo, Yang Zhang, Fuli Feng | 2024-06-02 | arXiv | https://github.com/LittleCirc1e/EIC | https://doi.org/10.48550/arXiv.2406.00755 |
834 | A Closer Look at Logical Reasoning with LLMs: The Choice of Tool Matters | Long Hei Matthew Lam, Ramya Keerthy Thatikonda, Ehsan Shareghi | 2024-06-01 | arXiv | https://github.com/Mattylam/Logic_Symbolic_Solvers_Experiment | http://arxiv.org/abs/2406.00284v1 |
835 | Phased Instruction Fine-Tuning for Large Language Models | Wei Pang, Chuan Zhou, Xiao-Hua Zhou, Xiaojie Wang | 2024-06-01 | arXiv | https://github.com/xubuvd/PhasedSFT | https://doi.org/10.48550/arXiv.2406.04371 |
836 | Knowledge-Aware Code Generation with Large Language Models | Tao Huang, Zhihong Sun, Zhi Jin, Ge Li, Chen Lyu | 2024-06 | 2024 IEEE/ACM 32nd International Conference on Program Comprehension (ICPC) | https://github.com/CodeGeneration3/KareCoder.CCS | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10556459 |
837 | Investigating the Efficacy of Large Language Models for Code Clone Detection | Mohamad Khajezade, Jie JW Wu, Fatemeh Hendijani Fard, Gema Rodríguez-Pérez, Mohamed Sami Shehata | 2024-06 | 2024 IEEE/ACM 32nd International Conference on Program Comprehension (ICPC) | https://github.com/mkhfring/llm-for-ccd | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10556419 |
838 | Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies | Yilun Liu, Shimin Tao, Weibin Meng, Jingyu Wang, Wenbing Ma, Yanqing Zhao, Yuhang Chen, Hao Yang, Yanfei Jiang, Xun Chen | 2024-06 | 2024 IEEE/ACM 32nd International Conference on Program Comprehension (ICPC) | https://github.com/lunyiliu/LogPrompt.CCS | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10556497 |
839 | Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin | 2024-05-31 | arXiv | https://github.com/jiaxiaojunQAQ/I-GCG | https://doi.org/10.48550/arXiv.2405.21018 |
840 | LLM-ESR: Large Language Models Enhancement for Long-tailed Sequential Recommendation | Qidong Liu, Xian Wu, Yejing Wang, Zijian Zhang, Feng Tian, Yefeng Zheng, Xiangyu Zhao | 2024-05-31 | arXiv | https://github.com/Applied-Machine-Learning-Lab/LLM-ESR | http://arxiv.org/abs/2405.20646v2 |
841 | Ovis: Structural Embedding Alignment for Multimodal Large Language Model | Shiyin Lu, Yang Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Han-Jia Ye | 2024-05-31 | arXiv | https://github.com/AIDC-AI/Ovis | https://doi.org/10.48550/arXiv.2405.20797 |
842 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao | 2024-05-31 | arXiv | https://github.com/xu1868/SaySelf | http://arxiv.org/abs/2405.20974v2 |
843 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun | 2024-05-31 | arXiv | https://video-mme.github.io | http://arxiv.org/abs/2405.21075v2 |
844 | One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models | Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen | 2024-05-30 | arXiv | https://github.com/DaoD/SPRING/ | https://doi.org/10.48550/arXiv.2405.19670 |
845 | Xwin-LM: Strong and Scalable Alignment Practice for LLMs | Bolin Ni, JingCheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, Gaofeng Meng, Han Hu | 2024-05-30 | arXiv | https://github.com/Xwin-LM/Xwin-LM | http://arxiv.org/abs/2405.20335v1 |
846 | Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning | Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo | 2024-05-30 | arXiv | https://github.com/guozix/LLM-catalyst | http://arxiv.org/abs/2405.19732v3 |
847 | TAIA: Large Language Models are Out-of-Distribution Data Learners | Shuyang Jiang, Yusheng Liao, Ya Zhang, Yu Wang, Yanfeng Wang | 2024-05-30 | arXiv | https://github.com/pixas/TAIA_LLM | https://doi.org/10.48550/arXiv.2405.20192 |
848 | PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations | Jiatong Li, Renjun Hu, Kunzhe Huang, Yan Zhuang, Qi Liu, Mengxiao Zhu, Xing Shi, Wei Lin | 2024-05-30 | arXiv | https://github.com/aigc-apps/PertEval | http://arxiv.org/abs/2405.19740v1 |
849 | Designing an Evaluation Framework for Large Language Models in Astronomy Research | John F. Wu, Alina Hyk, Kiera McCormick, Christine Ye, Simone Astarita, Elina Baral, Jo Ciuca, Jesse Cranney, Anjalie Field, Kartheik Iyer, Philipp Koehn, Jenn Kotler, Sandor Kruk, Michelle Ntampaka, Charles O'Neill, Joshua E. G. Peek, Sanjib Sharma, Mikaeel Yunus | 2024-05-30 | arXiv | https://github.com/jsalt2024-evaluating-llms-for-astronomy/astro-arxiv-bot | https://doi.org/10.48550/arXiv.2405.20389 |
850 | NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models | Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang | 2024-05-30 | arXiv | https://kaiwu5.github.io/noiseboost | https://doi.org/10.48550/arXiv.2405.20081 |
851 | Large Language Models as Planning Domain Generators (Student Abstract) | James T. Oswald, Kavitha Srinivas, Harsha Kokel, Junkyu Lee, Michael Katz, Shirin Sohrabi | 2024-05-30 | AAAI | https://github.com/IBM/NL2PDDL | https://doi.org/10.1609/aaai.v38i21.30491 |
852 | Evaluating Large Language Model Biases in Persona-Steered Generation | Andy Liu, Mona Diab, Daniel Fried | 2024-05-30 | arXiv | https://github.com/andyjliu/persona-steered-generation-bias | https://doi.org/10.48550/arXiv.2405.20253 |
853 | Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach | Ernesto Quevedo, Jorge Yero, Rachel Koerner, Pablo Rivas, Tomás Cerný | 2024-05-30 | arXiv | https://github.com/Baylor-AI/HalluDetect | https://doi.org/10.48550/arXiv.2405.19648 |
854 | PATIENT-ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals | Ruiyi Wang, Stephanie Milani, Jamie C. Chiu, Jiayin Zhi, Shaun M. Eack, Travis Labrum, Samuel M. Murphy, Nev Jones, Kate Hardy, Hong Shen, Fei Fang, Zhiyu Zoey Chen | 2024-05-30 | EMNLP | https://github.com/ruiyiw/patient-psi | https://aclanthology.org/2024.emnlp-main.711 |
855 | AutoDroid: LLM-powered Task Automation in Android | Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, Yunxin Liu | 2024-05-29 | ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking | https://autodroid-sys.github.io/ | https://dl.acm.org/doi/10.1145/3636534.3649379 |
856 | Compressing Large Language Models using Low Rank and Low Precision Decomposition | Rajarshi Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci | 2024-05-29 | arXiv | https://github.com/pilancilab/caldera | https://doi.org/10.48550/arXiv.2405.18886 |
857 | Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer | Zengqun Zhao, Yu Cao, Shaogang Gong, Ioannis Patras | 2024-05-29 | arXiv | https://github.com/zengqunzhao/Exp-CLIP | http://arxiv.org/abs/2405.19100v2 |
858 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen | 2024-05-29 | arXiv | https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation | http://arxiv.org/abs/2405.19334v2 |
859 | MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Y. Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, Noah Wang, Quehry Que, Ruibo Liu, Sine Liu, Shawn Guo, Soren Gao, Wangchunshu Zhou, Xinyue Zhang, Yizhi Zhou, Yubo Wang, Yuelin Bai, Yuhan Zhang, Yuxiang Zhang, Zenith Wang, Zhenzhu Yang, Zijian Zhao, Jiajun Zhang, Wanli Ouyang, Wenhao Huang, Wenhu Chen | 2024-05-29 | arXiv | https://map-neo.github.io/ | https://doi.org/10.48550/arXiv.2405.19327 |
860 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal | 2024-05-29 | arXiv | https://videotree2024.github.io/ | http://arxiv.org/abs/2405.19209v1 |
861 | ORLM: Training Large Language Models for Optimization Modeling | Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, Benyou Wang | 2024-05-28 | arXiv | https://github.com/Cardinal-Operations/ORLM | https://doi.org/10.48550/arXiv.2405.17743 |
862 | Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model | Haogeng Liu, Quanzeng You, Xiaotian Han, Yongfei Liu, Huaibo Huang, Ran He, Hongxia Yang | 2024-05-28 | arXiv | https://github.com/liuhaogeng/Anchor-Former | https://doi.org/10.48550/arXiv.2405.17815 |
863 | TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models | Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim | 2024-05-28 | arXiv | https://ahnjaewoo.github.io/timechara | https://doi.org/10.48550/arXiv.2405.18027 |
864 | Pipette: Automatic Fine-Grained Large Language Model Training Configurator for Real-World Clusters | Jinkyu Yim, Jaeyong Song, Yerim Choi, Jaebeen Lee, Jaewon Jung, Hongsun Jang, Jinho Lee | 2024-05-28 | 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE) | https://github.com/yimjinkyu1/date2024_pipette | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10546826 |
865 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu | 2024-05-28 | arXiv | https://github.com/pixeli99/OwLore | http://arxiv.org/abs/2405.18380v1 |
866 | Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing | Wei Zhao, Zhe Li, Yige Li, Ye Zhang, Jun Sun | 2024-05-28 | arXiv | https://github.com/ledllm/ledllm | https://doi.org/10.48550/arXiv.2405.18166 |
867 | Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack | Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | 2024-05-28 | arXiv | https://github.com/git-disl/Lisa | http://arxiv.org/abs/2405.18641v5 |
868 | Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning | Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu | 2024-05-28 | arXiv | https://github.com/git-disl/Lisa | https://doi.org/10.48550/arXiv.2405.18641 |
869 | Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan | 2024-05-28 | arXiv | https://github.com/hmarkc/parallel-prompt-decoding | http://arxiv.org/abs/2405.18628v2 |
870 | C$^3$Bench: A Comprehensive Classical Chinese Understanding Benchmark for Large Language Models | Jiahuan Cao, Yongxin Shi, Dezhi Peng, Yang Liu, Lianwen Jin | 2024-05-28 | arXiv | https://github.com/SCUT-DLVCLab/C3bench | http://arxiv.org/abs/2405.17732v2 |
871 | LLMs and Memorization: On Quality and Specificity of Copyright Compliance | Felix B Mueller, Rebekka Görge, Anna K Bernzen, Janna C Pirk, Maximilian Poretschkin | 2024-05-28 | Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 984-996, 2024 | https://github.com/felixbmuller/llms-memorization-copyright | http://arxiv.org/abs/2405.18492v1 |
872 | CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs | Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian | 2024-05-27 | arXiv | https://github.com/fayuge/CLAQ | http://arxiv.org/abs/2405.17233v2 |
873 | Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning | Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Shichao Song, Hanyu Wang, Jiawei Yang, Feiyu Xiong, Bo Tang, Chenyang Xi | 2024-05-27 | arXiv | https://github.com/IAAR-Shanghai/PGRAG | https://doi.org/10.48550/arXiv.2405.16933 |
874 | Entity Alignment with Noisy Annotations from Large Language Models | Shengyuan Chen, Qinggang Zhang, Junnan Dong, Wen Hua, Qing Li, Xiao Huang | 2024-05-27 | arXiv | https://github.com/chensyCN/llm4ea_official | https://doi.org/10.48550/arXiv.2405.16806 |
875 | LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | Haoyu Zhao, Wenhang Ge, Ying-Cong Chen | 2024-05-27 | arXiv | https://haoyu-zhao.github.io/LLM-Optic.github.io/ | https://doi.org/10.48550/arXiv.2405.17104 |
876 | Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching | Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Hao Wang, Zhenyu Zeng, Le Sun | 2024-05-27 | arXiv | https://github.com/tshu-w/LLM4EM | https://doi.org/10.48550/arXiv.2405.16884 |
877 | MotionLLM: Multimodal Motion-Language Learning with Large Language Models | Qi Wu, Yubo Zhao, Yifan Wang, Yu-Wing Tai, Chi-Keung Tang | 2024-05-27 | arXiv | https://knoxzhao.github.io/MotionLLM | https://doi.org/10.48550/arXiv.2405.17013 |
878 | Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | Shengyun Peng, Pin-Yu Chen, Matthew Hull, Duen Horng Chau | 2024-05-27 | arXiv | https://github.com/ShengYun-Peng/llm-landscape | https://doi.org/10.48550/arXiv.2405.17374 |
879 | ReMoDetect: Reward Models Recognize Aligned LLM's Generations | Hyunseok Lee, Jihoon Tack, Jinwoo Shin | 2024-05-27 | arXiv | https://github.com/hyunseoklee-ai/reward_llm_detect | http://arxiv.org/abs/2405.17382v1 |
880 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Kuan-Chih Huang, Xiangtai Li, Lu Qi, Shuicheng Yan, Ming-Hsuan Yang | 2024-05-27 | arXiv | https://KuanchihHuang.github.io/project/reason3d | https://doi.org/10.48550/arXiv.2405.17427 |
881 | Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | Xijie Huang, Xinyuan Wang, Hantao Zhang, Yinghao Zhu, Jiawen Xi, Jingkun An, Hao Wang, Hao Liang, Chengwei Pan | 2024-05-26 | arXiv | https://github.com/dirtycomputer/O2M_attack | http://arxiv.org/abs/2405.20775v2 |
882 | Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs | Mustafa Shukor, Matthieu Cord | 2024-05-26 | arXiv | https://ima-lmms.github.io/ | http://arxiv.org/abs/2405.16700v1 |
883 | Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models | Xijie Huang, Xinyuan Wang, Hantao Zhang, Jiawen Xi, Jingkun An, Hao Wang, Chengwei Pan | 2024-05-26 | arXiv | https://github.com/dirtycomputer/O2M_attack | https://doi.org/10.48550/arXiv.2405.20775 |
884 | Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | Sunhao Dai, Weihao Liu, Yuqi Zhou, Liang Pang, Rongju Ruan, Gang Wang, Zhenhua Dong, Jun Xu, Ji-Rong Wen | 2024-05-26 | Proceedings of the Annual Meeting of the Association for Computational Linguistics | https://github.com/KID-22/Cocktail | http://arxiv.org/abs/2405.16546v1 |
885 | Automatically Generating Numerous Context-Driven SFT Data for LLMs across Diverse Granularity | Shanghaoran Quan | 2024-05-26 | arXiv | https://github.com/quanshr/AugCon | http://arxiv.org/abs/2405.16579v1 |
886 | AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning | Minghao Chen, Yihang Li, Yanting Yang, Shiyu Yu, Binbin Lin, Xiaofei He | 2024-05-25 | arXiv | https://github.com/minghchen/automanual | http://arxiv.org/abs/2405.16247v4 |
887 | SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models | Xudong Lu, Aojun Zhou, Yuhui Xu, Renrui Zhang, Peng Gao, Hongsheng Li | 2024-05-25 | arXiv | https://github.com/Lucky-Lance/SPP | https://doi.org/10.48550/arXiv.2405.16057 |
888 | CulturePark: Boosting Cross-cultural Understanding in Large Language Models | Cheng Li, Damien Teney, Linyi Yang, Qingsong Wen, Xing Xie, Jindong Wang | 2024-05-24 | arXiv | https://github.com/Scarelette/CulturePark | https://doi.org/10.48550/arXiv.2405.15145 |
889 | A Solution-based LLM API-using Methodology for Academic Information Seeking | Yuanchun Wang, Jifan Yu, Zijun Yao, Jing Zhang, Yuyang Xie, Shangqing Tu, Yiyang Fu, Youhe Feng, Jinkai Zhang, Jingyao Zhang, Bowen Huang, Yuanyao Li, Huihui Yuan, Lei Hou, Juanzi Li, Jie Tang | 2024-05-24 | arXiv | https://github.com/RUCKBReasoning/SoAy | http://arxiv.org/abs/2405.15165v1 |
890 | Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs | Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong | 2024-05-24 | arXiv | https://github.com/tjunlp-lab/Lexical-Unit-Decoding-LUD- | http://arxiv.org/abs/2405.15208v1 |
891 | Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models | Simon Chi Lok Yu, Jie He, Pasquale Minervini, Jeff Z. Pan | 2024-05-24 | arXiv | https://github.com/simonucl/adv-retreival-icl | https://doi.org/10.48550/arXiv.2405.15984 |
892 | LM4LV: A Frozen Large Language Model for Low-level Vision Tasks | Boyang Zheng, Jinjin Gu, Shijun Li, Chao Dong | 2024-05-24 | arXiv | https://github.com/bytetriper/LM4LV | https://doi.org/10.48550/arXiv.2405.15734 |
893 | Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | Byung-Kwan Lee, Chae Won Kim, Beomchan Park, Yong Man Ro | 2024-05-24 | arXiv | https://github.com/ByungKwanLee/Meteor | https://doi.org/10.48550/arXiv.2405.15574 |
894 | WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen | 2024-05-23 | arXiv | https://github.com/zjunlp/EasyEdit | https://doi.org/10.48550/arXiv.2405.14768 |
895 | Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration | Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li | 2024-05-23 | arXiv | https://read-llm.github.io/ | http://arxiv.org/abs/2405.14314v2 |
896 | RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models | Xiangkun Hu, Dongyu Ru, Lin Qiu, Qipeng Guo, Tianhang Zhang, Yang Xu, Yun Luo, Pengfei Liu, Yue Zhang, Zheng Zhang | 2024-05-23 | arXiv | https://github.com/amazon-science/RefChecker | https://doi.org/10.48550/arXiv.2405.14486 |
897 | Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs | Jaewoo Yang, Hayun Kim, Younghoon Kim | 2024-05-23 | arXiv | https://github.com/onnoo/activation-spikes | http://arxiv.org/abs/2405.14428v1 |
898 | HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su | 2024-05-23 | arXiv | https://github.com/OSU-NLP-Group/HippoRAG | https://doi.org/10.48550/arXiv.2405.14831 |
899 | FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | Hongyang Yang, Boyu Zhang, Neng Wang, Cheng Guo, Xiaoli Zhang, Likun Lin, Junlin Wang, Tianyu Zhou, Mao Guan, Runjia Zhang, Christina Dan Wang | 2024-05-23 | arXiv | https://github.com/AI4Finance-Foundation/FinRobot | https://doi.org/10.48550/arXiv.2405.14767 |
900 | DuanzAI: Slang-Enhanced LLM with Prompt for Humor Understanding | Yesian Rohn | 2024-05-23 | arXiv | https://github.com/YesianRohn/DuanzAI | http://arxiv.org/abs/2405.15818v1 |
901 | Dissociation of Faithful and Unfaithful Reasoning in LLMs | Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen | 2024-05-23 | arXiv | https://github.com/CoTErrorRecovery/CoTErrorRecovery | http://arxiv.org/abs/2405.15092v1 |
902 | ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | Jingnan Zheng, Han Wang, An Zhang, Tai D. Nguyen, Jun Sun, Tat-Seng Chua | 2024-05-23 | NeurIPS 2024 | https://github.com/SophieZheng998/ALI-Agent | http://arxiv.org/abs/2405.14125v2 |
903 | Large Language Models Can Self-Correct with Key Condition Verification | Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang | 2024-05-23 | EMNLP | https://wzy6642.github.io/proco.github.io/ | https://aclanthology.org/2024.emnlp-main.714 |
904 | AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | Bin Lei, Yuchen Li, Qiuwu Chen | 2024-05-23 | arXiv | https://github.com/bin123apple/AutoCoder | https://doi.org/10.48550/arXiv.2405.14906 |
905 | AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | Fei Zhao, Taotian Pang, Chunhui Li, Zhen Wu, Junjie Guo, Shangyu Xing, Xinyu Dai | 2024-05-23 | arXiv | https://aligngpt-vl.github.io/ | https://doi.org/10.48550/arXiv.2405.14129 |
906 | PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery | Runlong He, Mengya Xu, Adrito Das, Danyal Z. Khan, Sophia Bano, Hani J. Marcus, Danail Stoyanov, Matthew J. Clarkson, Mobarakol Islam | 2024-05-22 | arXiv | https://github.com/mobarakol/PitVQA | http://arxiv.org/abs/2405.13949v1 |
907 | VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | Yongxin Guo, Jingyu Liu, Mingda Li, Xiaoying Tang, Xi Chen, Bo Zhao | 2024-05-22 | arXiv | https://github.com/gyxxyg/VTG-LLM | http://arxiv.org/abs/2405.13382v1 |
908 | Prompt-Time Ontology-Driven Symbolic Knowledge Capture with Large Language Models | Tolga Çöplü, Arto Bendiken, Andrii Skomorokhov, Eduard Bateiko, Stephen Cobb, Joshua J. Bouw | 2024-05-22 | arXiv | https://github.com/HaltiaAI/paper-PTODSKC | https://doi.org/10.48550/arXiv.2405.14012 |
909 | Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards | Xiaoyu Yang, Jie Lu, En Yu | 2024-05-22 | arXiv | https://github.com/Anonymous0Knight/ConceptDriftMLLMs | http://arxiv.org/abs/2405.13459v2 |
910 | LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework | Yiran Qiao, Xiang Ao, Yang Liu, Jiarong Xu, Xiaoqian Sun, Qing He | 2024-05-22 | arXiv | https://github.com/QiaoYRan/LOGIN | https://doi.org/10.48550/arXiv.2405.13902 |
911 | Adapting Multi-modal Large Language Model to Concept Drift in the Long-tailed Open World | Xiaoyu Yang, Jie Lu, En Yu | 2024-05-22 | arXiv | https://github.com/Anonymous0Knight/ConceptDriftMLLMs | https://doi.org/10.48550/arXiv.2405.13459 |
912 | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation | Zhiyu Tan, Mengping Yang, Luozheng Qin, Hao Yang, Ye Qian, Qiang Zhou, Cheng Zhang, Hao Li | 2024-05-21 | arXiv | https://github.com/llm-conditioned-diffusion/llm-conditioned-diffusion | https://doi.org/10.48550/arXiv.2405.12914 |
913 | SirLLM: Streaming Infinite Retentive LLM | Yao Yao, Zuchao Li, Hai Zhao | 2024-05-21 | Proceedings of the Annual Meeting of the Association for Computational Linguistics | https://github.com/Zoeyyao27/SirLLM | http://arxiv.org/abs/2405.12528v1 |
914 | Large Language Models Meet NL2Code: A Survey | Libo Qin, Qiguang Chen, Xiachong Feng, Yang Wu, Yongheng Zhang, Yinghui Li, Min Li, Wanxiang Che, Philip S. Yu | 2024-05-21 | ACL | https://nl2code.github.io | https://doi.org/10.18653/v1/2023.acl-long.411 |
915 | CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models | Tong Zhang, Peixin Qin, Yang Deng, Chen Huang, Wenqiang Lei, Junhong Liu, Dingnan Jin, Hongru Liang, Tat-Seng Chua | 2024-05-20 | arXiv | https://github.com/zt991211/CLAMBER | https://doi.org/10.48550/arXiv.2405.12063 |
916 | DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | Hao Chen, Biaojie Zeng, Xin Lin, Liang He, Aimin Zhou | 2024-05-20 | arXiv | https://github.com/ChenhaoEcnuCS/Reason-Correct | https://doi.org/10.48550/arXiv.2405.12100 |
917 | MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Hongwei Liu, Zilong Zheng, Yuxuan Qiao, Haodong Duan, Zhiwei Fei, Fengzhe Zhou, Wenwei Zhang, Songyang Zhang, Dahua Lin, Kai Chen | 2024-05-20 | OpenReview | https://github.com/open-compass/MathBench | http://arxiv.org/abs/2405.12209v1 |
918 | MBIAS: Mitigating Bias in Large Language Models While Retaining Context | Shaina Raza, Ananya Raval, Veronica Chatrath | 2024-05-18 | arXiv | https://github.com/shainarazavi/MBIAS/tree/main | https://doi.org/10.48550/arXiv.2405.11290 |
919 | Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang | 2024-05-18 | arXiv | https://uni-moe.github.io/ | http://arxiv.org/abs/2405.11273v1 |
920 | RDRec: Rationale Distillation for LLM-based Recommendation | Xinfeng Wang, Jin Cui, Yoshimi Suzuki, Fumiyo Fukumoto | 2024-05-17 | Proceedings of the Annual Meeting of the Association for Computational Linguistics | https://github.com/WangXFng/RDRec | http://arxiv.org/abs/2405.10587v2 |
921 | Surgical Feature-Space Decomposition of LLMs: Why, When and How? | Arnav Chavan, Nahush Lele, Deepak Gupta | 2024-05-17 | OpenReview | https://github.com/nyunAI/SFSD-LLM | http://arxiv.org/abs/2405.13039v1 |
922 | Layer-Condensed KV Cache for Efficient Inference of Large Language Models | Haoyi Wu, Kewei Tu | 2024-05-17 | arXiv | https://github.com/whyNLP/LCKV | https://doi.org/10.48550/arXiv.2405.10637 |
923 | Benchmarking Large Language Models on CFLUE - A Chinese Financial Language Understanding Evaluation Dataset | Jie Zhu, Junhui Li, Yalong Wen, Lifan Guo | 2024-05-17 | arXiv | https://github.com/aliyun/cflue | https://doi.org/10.48550/arXiv.2405.10542 |
924 | DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues | Xiang Luo, Zhiwen Tang, Jin Wang, Xuejie Zhang | 2024-05-16 | LREC/COLING | https://github.com/suntea233/DuetSim | https://aclanthology.org/2024.lrec-main.481 |
925 | LFED: A Literary Fiction Evaluation Dataset for Large Language Models | Linhao Yu, Qun Liu, Deyi Xiong | 2024-05-16 | LREC/COLING | https://github.com/tjunlp-lab/LFED | https://aclanthology.org/2024.lrec-main.915 |
926 | Libra: Building Decoupled Vision System on Large Language Models | Yifan Xu, Xiaoshan Yang, Yaguang Song, Changsheng Xu | 2024-05-16 | arXiv | https://github.com/YifanXu74/Libra | https://doi.org/10.48550/arXiv.2405.10140 |
927 | MarkLLM: An Open-Source Toolkit for LLM Watermarking | Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Hanlin Zhang, Xuming Hu, Lijie Wen, Irwin King | 2024-05-16 | arXiv | https://github.com/THU-BPM/MarkLLM | http://arxiv.org/abs/2405.10051v3 |
928 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H. S. Torr, Marc Pollefeys, Matthias Nießner, Ian D. Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu | 2024-05-16 | arXiv | https://github.com/ActiveVisionLab/Awesome-LLM-3D | https://doi.org/10.48550/arXiv.2405.10255 |
929 | Towards Next-Generation Steganalysis: LLMs Unleash the Power of Detecting Steganography | Minhao Bai, Jinshuai Yang, Kaiyi Pang, Huili Wang, Yongfeng Huang | 2024-05-15 | arXiv | https://github.com/ba0z1/Linguistic-Steganalysis-with-LLMs | http://arxiv.org/abs/2405.09090v1 |
930 | Evaluating LLMs at Evaluating Temporal Generalization | Chenghao Zhu, Nuo Chen, Yufei Gao, Benyou Wang | 2024-05-14 | arXiv | https://github.com/FreedomIntelligence/FreshBench | http://arxiv.org/abs/2405.08460v1 |
931 | Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring | Tiantian Zhang, Manxi Lin, Hongda Guo, Xiaofan Zhang, Ka Fung Peter Chiu, Aasa Feragen, Qi Dou | 2024-05-14 | arXiv | https://github.com/med-air/PICG2scoring | https://doi.org/10.48550/arXiv.2405.08786 |
932 | Is Your LLM Outdated? Evaluating LLMs at Temporal Generalization | Chenghao Zhu, Nuo Chen, Yufei Gao, Yunyi Zhang, Prayag Tiwari, Benyou Wang | 2024-05-14 | arXiv | https://github.com/FreedomIntelligence/FreshBench | http://arxiv.org/abs/2405.08460v2 |
933 | Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform | Mingyue Cheng, Hao Zhang, Jiqian Yang, Qi Liu, Li Li, Xin Huang, Liwei Song, Zhi Li, Zhenya Huang, Enhong Chen | 2024-05-13 | WWW '24: Companion Proceedings of the ACM on Web Conference 2024 | https://github.com/Mingyue-Cheng/Bingjian | https://dl.acm.org/doi/10.1145/3589335.3651243 |
934 | Representation Learning with Large Language Models for Recommendation | Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, Chao Huang | 2024-05-13 | WWW '24: Proceedings of the ACM on Web Conference 2024 | https://github.com/HKUDS/RLMRec | https://dl.acm.org/doi/10.1145/3589334.3645458 |
935 | RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems | Jianxun Lian, Yuxuan Lei, Xu Huang, Jing Yao, Wei Xu, Xing Xie | 2024-05-13 | WWW '24: Companion Proceedings of the ACM on Web Conference 2024 | https://github.com/microsoft/RecAI | https://dl.acm.org/doi/10.1145/3589335.3651242 |
936 | ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation | Jianghao Lin, Rong Shan, Chenxu Zhu, Kounianhua Du, Bo Chen, Shigang Quan, Ruiming Tang, Yong Yu, Weinan Zhang | 2024-05-13 | WWW '24: Proceedings of the ACM on Web Conference 2024 | https://github.com/LaVieEnRose365/ReLLa | https://dl.acm.org/doi/10.1145/3589334.3645467 |
937 | News Recommendation with Category Description by a Large Language Model | Yuki Yada, Hayato Yamana | 2024-05-13 | arXiv | https://github.com/yamanalab/gpt-augmented-news-recommendation | https://doi.org/10.48550/arXiv.2405.13007 |
938 | Item-side Fairness of Large Language Model-based Recommendation System | Meng Jiang, Keqin Bao, Jizhi Zhang, Wenjie Wang, Zhengyi Yang, Fuli Feng, Xiangnan He | 2024-05-13 | WWW '24: Proceedings of the ACM on Web Conference 2024 | https://github.com/JiangM-C/IFairLRS | https://dl.acm.org/doi/10.1145/3589334.3648158 |
939 | Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering | Hongda Sun, Yuxuan Liu, Chengwei Wu, Haiyu Yan, Cheng Tai, Xin Gao, Shuo Shang, Rui Yan | 2024-05-13 | WWW '24: Proceedings of the ACM on Web Conference 2024 | https://github.com/EthanLeo-LYX/LLMQA | https://dl.acm.org/doi/10.1145/3589334.3645670 |
940 | Collaborative Large Language Model for Recommender Systems | Yaochen Zhu, Liang Wu, Qi Guo, Liangjie Hong, Jundong Li | 2024-05-13 | WWW '24: Proceedings of the ACM on Web Conference 2024 | https://github.com/yaochenzhu/llm4rec | https://dl.acm.org/doi/10.1145/3589334.3645347 |
941 | A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen | 2024-05-13 | arXiv | https://github.com/webis-de/msmarco-llm-distillation | https://doi.org/10.48550/arXiv.2405.07920 |
942 | GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks | Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, Chuan Shi | 2024-05-13 | WWW '24: Proceedings of the ACM on Web Conference 2024 | https://github.com/alibaba/GraphTranslator | https://dl.acm.org/doi/10.1145/3589334.3645682 |
943 | EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models | Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang | 2024-05-13 | arXiv | https://github.com/niyunsheng/EMS-SD | https://doi.org/10.48550/arXiv.2405.07542 |
944 | FashionReGen: LLM-Empowered Fashion Report Generation | Yujuan Ding, Yunshan Ma, Wenqi Fan, Yige Yao, Tat-Seng Chua, Qing Li | 2024-05-13 | WWW '24: Companion Proceedings of the ACM on Web Conference 2024 | https://github.com/CompFashion/FashionReGen | https://dl.acm.org/doi/10.1145/3589335.3651232 |
945 | AIOS Compiler: LLM as Interpreter for Natural Language Programming and Flow Programming of AI Agents | Shuyuan Xu, Zelong Li, Kai Mei, Yongfeng Zhang | 2024-05-11 | arXiv | https://github.com/agiresearch/CoRE | http://arxiv.org/abs/2405.06907v2 |
946 | ChaCha: Leveraging Large Language Models to Prompt Children to Share Their Emotions about Personal Events | Woosuk Seo, Chanmo Yang, Young-Ho Kim | 2024-05-11 | CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems | https://naver-ai.github.io/chacha/ | https://dl.acm.org/doi/10.1145/3613904.3642152 |
947 | LaMI: Large Language Models for Multi-Modal Human-Robot Interaction | Chao Wang, Stephan Hasler, Daniel Tanneberg, Felix Ocker, Frank Joublin, Antonello Ceravola, Joerg Deigmoeller, Michael Gienger | 2024-05-11 | CHI EA '24: Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems | https://hri-eu.github.io/Lami/ | https://dl.acm.org/doi/10.1145/3613905.3651029 |
948 | Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models | Hyung-Kwon Ko, Hyeon Jeon, Gwanmo Park, Dae Hyun Kim, Nam Wook Kim, Juho Kim, Jinwook Seo | 2024-05-11 | CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems | https://github.com/hyungkwonko/chart-llm | https://dl.acm.org/doi/10.1145/3613904.3642943 |
949 | LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun | 2024-05-10 | arXiv | https://github.com/lawraa/LLM-Discussion | https://doi.org/10.48550/arXiv.2405.06373 |
950 | Linearizing Large Language Models | Jean Mercat, Igor Vasiljevic, Sedrick Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar | 2024-05-10 | arXiv | https://github.com/TRI-ML/linear_open_lm | https://doi.org/10.48550/arXiv.2405.06640 |
951 | PLeak: Prompt Leaking Attacks against Large Language Model Applications | Bo Hui, Haolin Yuan, Neil Zhenqiang Gong, Philippe Burlina, Yinzhi Cao | 2024-05-10 | arXiv | https://github.com/BHui97/PLeak | https://doi.org/10.48550/arXiv.2405.06823 |
952 | Pruning as a Domain-specific LLM Extractor | Nan Zhang, Yanchi Liu, Xujiang Zhao, Wei Cheng, Runxue Bao, Rui Zhang, Prasenjit Mitra, Haifeng Chen | 2024-05-10 | Findings of the Association for Computational Linguistics: NAACL 2024 | https://github.com/psunlpgroup/D-Pruner | http://arxiv.org/abs/2405.06275v1 |
953 | LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models | Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Yunchen Zhang, Xianglong Liu, Dacheng Tao | 2024-05-09 | arXiv | https://github.com/ModelTC/llmc | https://doi.org/10.48550/arXiv.2405.06001 |
954 | Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning | Artem Lykov, Miguel Altamirano Cabrera, Koffivi Fidèle Gbagbe, Dzmitry Tsetserukou | 2024-05-09 | arXiv | https://github.com/TemaLykov/robots_can_feel | http://arxiv.org/abs/2405.05824v1 |
955 | Probing Multimodal LLMs as World Models for Driving | Shiva Sreeram, Tsun-Hsuan Wang, Alaa Maalouf, Guy Rosman, Sertac Karaman, Daniela Rus | 2024-05-09 | arXiv | https://github.com/sreeramsa/DriveSim | http://arxiv.org/abs/2405.05956v1 |
956 | Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference | Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji | 2024-05-09 | arXiv | https://github.com/lzhxmu/VTW | https://doi.org/10.48550/arXiv.2405.05803 |
957 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen | 2024-05-09 | arXiv | https://github.com/SHI-Labs/CuMo | http://arxiv.org/abs/2405.05949v1 |
958 | Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models | Sander Land, Max Bartolo | 2024-05-08 | arXiv | https://github.com/cohere-ai/magikarp/ | https://doi.org/10.48550/arXiv.2405.05417 |
959 | DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature | Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen | 2024-05-08 | arXiv | https://github.com/David-Li0406/DALK | http://arxiv.org/abs/2405.04819v2 |
960 | Vidur: A Large-Scale Simulation Framework For LLM Inference | Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov | 2024-05-08 | arXiv | https://github.com/microsoft/vidur | http://arxiv.org/abs/2405.05465v2 |
961 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han | 2024-05-07 | arXiv | https://github.com/mit-han-lab/qserve | http://arxiv.org/abs/2405.04532v2 |
962 | MedDoc-Bot: A Chat Tool for Comparative Analysis of Large Language Models in the Context of the Pediatric Hypertension Guideline | Mohamed Yaseen Jabarulla, Steffen Oeltze-Jafra, Philipp Beerbaum, Theodor Uden | 2024-05-06 | arXiv | https://github.com/yaseen28/MedDoc-Bot | https://doi.org/10.48550/arXiv.2405.03359 |
963 | Word2World: Generating Stories and Worlds through Large Language Models | Muhammad Umair Nasir, Steven James, Julian Togelius | 2024-05-06 | arXiv | https://github.com/umair-nasir14/Word2World | https://doi.org/10.48550/arXiv.2405.06686 |
964 | When LLMs Meet Cybersecurity: A Systematic Literature Review | Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu | 2024-05-06 | arXiv | https://github.com/tmylla/Awesome-LLM4Cybersecurity | http://arxiv.org/abs/2405.03644v1 |
965 | Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs | Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang | 2024-05-06 | arXiv | https://github.com/cornell-zhang/llm-datatypes | http://arxiv.org/abs/2405.03103v2 |
966 | Large Language Models Synergize with Automated Machine Learning | Jinglue Xu, Jialong Li, Zhen Liu, Nagar Anthel Venkatesh Suryanarayanan, Guoyuan Zhou, Jia Guo, Hitoshi Iba, Kenji Tei | 2024-05-06 | arXiv | https://github.com/JLX0/llm-automl | https://doi.org/10.48550/arXiv.2405.03727 |
967 | Language Evolution for Evading Social Media Regulation via LLM-Based Multi-Agent Simulation | Jinyu Cai, Jialong Li, Mingyue Zhang, Munan Li, Chen-Shu Wang, Kenji Tei | 2024-05-05 | 2024 IEEE Congress on Evolutionary Computation (CEC) | https://github.com/BlueLinkX/GA-MAS | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10612015 |
968 | NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli | Xu Wang, Cheng Li, Yi Chang, Jindong Wang, Yuan Wu | 2024-05-05 | arXiv | https://github.com/wangxu0820/NegativePrompt | https://doi.org/10.48550/arXiv.2405.02814 |
969 | EDA Corpus: A Large Language Model Dataset for Enhanced Interaction with OpenROAD | Bing-Yue Wu, Utsav Sharma, Sai Rahul Dhanvi Kankipati, Ajay Yadav, Bintu Kappil George, Sai Ritish Guntupalli, Austin Rovinski, Vidya A. Chhabria | 2024-05-04 | 2024 IEEE LLM Aided Design Workshop (LAD) | https://github.com/OpenROAD-Assistant/EDA-Corpus | https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10691774 |
970 | Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding | Zheng Zhao, Emilio Monti, Jens Lehmann, Haytham Assem | 2024-05-04 | arXiv | https://github.com/amazon-science/ContextualUnderstanding-ContrastiveDecoding | https://doi.org/10.48550/arXiv.2405.02750 |
971 | PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning | Hyeong Kyu Choi, Yixuan Li | 2024-05-03 | arXiv | https://github.com/deeplearning-wisc/picle | https://doi.org/10.48550/arXiv.2405.02501 |
972 | ProFLingo: A Fingerprinting-based Copyright Protection Scheme for Large Language Models | Heng Jin, Chaoyu Zhang, Shanghao Shi, Wenjing Lou, Y. Thomas Hou | 2024-05-03 | arXiv | https://github.com/hengvt/ProFLingo | https://doi.org/10.48550/arXiv.2405.02466 |
973 | Learning Multiple Object States from Actions via Large Language Models | Masatoshi Tateno, Takuma Yagi, Ryosuke Furuta, Yoichi Sato | 2024-05-02 | arXiv | https://masatate.github.io/ObjStatefromAction.github.io/ | http://arxiv.org/abs/2405.01090v2 |
974 | Analyzing the Role of Semantic Representations in the Era of Large Language Models | Zhijing Jin, Yuen Chen, Fernando Gonzalez Adauto, Jiarui Liu, Jiayi Zhang, Julian Michael, Bernhard Schölkopf, Mona T. Diab | 2024-05-02 | NAACL-HLT | https://github.com/causalNLP/amr_llm | https://doi.org/10.18653/v1/2024.naacl-long.209 |
975 | Creative Problem Solving in Large Language and Vision Models - What Would it Take? | Lakshmi Nair, Evana Gizzi, Jivko Sinapov | 2024-05-02 | arXiv | https://github.com/lnairGT/creative-problem-solving-LLMs | https://doi.org/10.48550/arXiv.2405.01453 |
976 | A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law | Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian J. McAuley, Linda R. Petzold, William Yang Wang | 2024-05-02 | arXiv | https://github.com/czyssrs/LLM_X_papers | https://doi.org/10.48550/arXiv.2405.01769 |
977 | Characterising the Creative Process in Humans and Large Language Models | Surabhi S. Nath, Peter Dayan, Claire Stevenson | 2024-05-01 | arXiv | https://github.com/surabhisnath/Creative_Process | https://doi.org/10.48550/arXiv.2405.00899 |
978 | Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation | Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, Jingren Zhou | 2024-05 | Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 5 | https://github.com/BeachWang/DAIL-SQL | https://dl.acm.org/doi/10.14778/3641204.3641221 |
979 | CodeHalu: Code Hallucinations in LLMs Driven by Execution-based Verification | Yuchen Tian, Weixiang Yan, Qian Yang, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma | 2024-04-30 | arXiv | https://github.com/yuchen814/CodeHalu | http://arxiv.org/abs/2405.00253v2 |
980 | CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification | Yuchen Tian, Weixiang Yan, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma, Dawn Song | 2024-04-30 | arXiv | https://github.com/yuchen814/CodeHalu | http://arxiv.org/abs/2405.00253v3 |
981 | Do Large Language Models Understand Conversational Implicature - A case study with a Chinese sitcom | Shisen Yue, Siyuan Song, Xinyuan Cheng, Hai Hu | 2024-04-30 | arXiv | https://github.com/sjtu-compling/llm-pragmatics | https://doi.org/10.48550/arXiv.2404.19509 |
982 | Transcrib3D: 3D Referring Expression Resolution through Large Language Models | Jiading Fang, Xiangshan Tan, Shengjie Lin, Igor Vasiljevic, Vitor Guizilini, Hongyuan Mei, Rares Ambrus, Gregory Shakhnarovich, Matthew R. Walter | 2024-04-30 | arXiv | https://ripl.github.io/Transcrib3D | https://doi.org/10.48550/arXiv.2404.19221 |
983 | LLM-SR: Scientific Equation Discovery via Programming with Large Language Models | Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K. Reddy | 2024-04-29 | arXiv | https://github.com/deep-symbolic-mathematics/LLM-SR | https://doi.org/10.48550/arXiv.2404.18400 |
984 | Benchmarking Benchmark Leakage in Large Language Models | Ruijie Xu, Zengzhi Wang, Run-Ze Fan, Pengfei Liu | 2024-04-29 | arXiv | https://gair-nlp.github.io/benbench | https://doi.org/10.48550/arXiv.2404.18824 |
985 | Hallucination of Multimodal Large Language Models: A Survey | Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou | 2024-04-29 | arXiv | https://github.com/showlab/Awesome-MLLM-Hallucination | https://doi.org/10.48550/arXiv.2404.18930 |
986 | SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning | Jinghan Jia, Yihua Zhang, Yimeng Zhang, Jiancheng Liu, Bharat Runwal, James Diffenderfer, Bhavya Kailkhura, Sijia Liu | 2024-04-28 | arXiv | https://github.com/OPTML-Group/SOUL | http://arxiv.org/abs/2404.18239v4 |
987 | SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification | Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia | 2024-04-27 | ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 | https://github.com/flexflow/FlexFlow/ | https://dl.acm.org/doi/10.1145/3620666.3651335 |
988 | A Unified Debugging Approach via LLM-Based Multi-Agent Synergy | Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, Michael R. Lyu | 2024-04-26 | arXiv | https://github.com/AcceptePapier/UniDebugger | http://arxiv.org/abs/2404.17153v1 |
989 | When to Trust LLMs: Aligning Confidence with Response Quality | Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding | 2024-04-26 | arXiv | https://github.com/TaoShuchang/CONQORD | http://arxiv.org/abs/2404.17287v2 |
990 | SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan | 2024-04-25 | arXiv | https://github.com/AILab-CVC/SEED-Bench | https://doi.org/10.48550/arXiv.2404.16790 |
991 | List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, Jianfeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang | 2024-04-25 | arXiv | https://github.com/zzxslp/SoM-LLaVA | http://arxiv.org/abs/2404.16375v1 |
992 | Large Language Models in the Clinic: A Comprehensive Benchmark | Fenglin Liu, Zheng Li, Hongjian Zhou, Qingyu Yin, Jingfeng Yang, Xianfeng Tang, Chen Luo, Ming Zeng, Haoming Jiang, Yifan Gao, Priyanka Nigam, Sreyashi Nag, Bing Yin, Yining Hua, Xuan Zhou, Omid Rohanian, Anshul Thakur, Lei Clifton, David A. Clifton | 2024-04-25 | arXiv | https://github.com/AI-in-Health/ClinicBench | http://arxiv.org/abs/2405.00716v3 |
993 | Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models | Bradley P. Allen, Paul T. Groth | 2024-04-25 | arXiv | https://github.com/bradleypallen/evaluating-kg-class-memberships-using-llms | https://doi.org/10.48550/arXiv.2404.17000 |
994 | Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models | Xu Ji, Jianyi Zhang, Ziyin Zhou, Zhangchi Zhao, Qianqian Qiao, Kaiying Han, Md Imran Hossen, Xiali Hei | 2024-04-25 | arXiv | https://github.com/cistineup/CantCounter | http://arxiv.org/abs/2405.00718v1 |
995 | Continual Learning of Large Language Models: A Comprehensive Survey | Haizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Zifeng Wang, Sayna Ebrahimi, Hao Wang | 2024-04-25 | arXiv | https://github.com/Wang-ML-Lab/llm-continual-learning-survey | https://doi.org/10.48550/arXiv.2404.16789 |
996 | Attacks on Third-Party APIs of Large Language Models | Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane | 2024-04-24 | arXiv | https://github.com/vk0812/Third-Party-Attacks-on-LLMs | https://doi.org/10.48550/arXiv.2404.16891 |
997 | Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model | Yongqi Zhao, Wenbo Xiao, Tomislav Mihalj, Jia Hu, Arno Eichberger | 2024-04-24 | arXiv | https://github.com/ftgTUGraz/Chat2Scenario | https://doi.org/10.48550/arXiv.2404.16147 |
998 | ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea | 2024-04-24 | arXiv | https://github.com/HenryPengZou/ImplicitAVE | http://arxiv.org/abs/2404.15592v1 |
999 | Rethinking LLM Memorization through the Lens of Adversarial Compression | Avi Schwarzschild, Zhili Feng, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter | 2024-04-23 | arXiv | https://locuslab.github.io/acr-memorization | http://arxiv.org/abs/2404.15146v1 |
1000 | LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Mihir Parmar, Nisarg Patel, Neeraj Varshney, Mutsumi Nakamura, Man Luo, Santosh Mashetty, Arindam Mitra, Chitta Baral | 2024-04-23 | ACL | https://github.com/Mihir3009/LogicBench | https://aclanthology.org/2024.acl-long.739 |
1001 | Think-Program-reCtify: 3D Situated Reasoning with Large Language Models | Qingrong He, Kejun Lin, Shizhe Chen, Anwen Hu, Qin Jin | 2024-04-23 | arXiv | https://qingrongh.github.io/LLM-TPC/ | https://doi.org/10.48550/arXiv.2404.14705 |
1002 | MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Boning Zhang, Chengxi Li, Kai Fan | 2024-04-22 | arXiv | https://github.com/MARIO-Math-Reasoning/math_evaluation | http://arxiv.org/abs/2404.13925v1 |