------------------------------------------------------------------------------
------------------------------------------------------------------------------
Send any comments regarding submissions directly to submitter.
------------------------------------------------------------------------------
Archives at http://arxiv.org/
To unsubscribe, e-mail To: cs@arXiv.org, Subject: cancel
------------------------------------------------------------------------------
Submissions to:
Artificial Intelligence
Computational Geometry
Computers and Society
Machine Learning
received from  Wed 13 Mar 24 18:00:00 GMT  to  Thu 14 Mar 24 18:00:00 GMT
------------------------------------------------------------------------------
------------------------------------------------------------------------------
\\
arXiv:2403.08802
Date: Mon, 5 Feb 2024 14:20:19 GMT   (482kb)

Title: Governance of Generative Artificial Intelligence for Companies
Authors: Johannes Schneider, Rene Abraham, Christian Meske
Categories: cs.AI cs.CY cs.LG
\\
 Generative Artificial Intelligence (GenAI), specifically large language
models like ChatGPT, has swiftly entered organizations without adequate
governance, posing both opportunities and risks. Despite extensive debates on
GenAI's transformative nature and regulatory measures, limited research
addresses organizational governance, encompassing technical and business
perspectives. This review paper fills this gap by surveying recent works. It
goes beyond mere summarization by developing a framework for GenAI governance
within companies. Our framework outlines the scope, objectives, and governance
mechanisms tailored to harness business opportunities and mitigate risks
associated with GenAI integration. This research contributes a focused approach
to GenAI governance, offering practical insights for companies navigating the
challenges of responsible AI adoption. It is also valuable for a technical
audience, helping them broaden their perspective, as ethical and business
concerns become increasingly prevalent, and identify novel research
directions.
\\ ( https://arxiv.org/abs/2403.08802 ,  482kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08843
Date: Wed, 13 Mar 2024 14:45:54 GMT   (943kb,D)

Title: Fuzzy Fault Trees Formalized
Authors: Thi Kim Nhung Dang, Milan Lopuha\"a-Zwakenberg, Mari\"elle Stoelinga
Categories: cs.AI
Comments: 14 pages
\\
 Fault tree analysis is a vital method of assessing safety risks. It helps to
identify potential causes of accidents, assess their likelihood and severity,
and suggest preventive measures. Quantitative analysis of fault trees is often
done via dependability metrics that compute the system's failure behaviour
over time. However, the lack of precise data is a major obstacle to
quantitative analysis, and so to reliability analysis. Fuzzy logic is a popular
framework for dealing with ambiguous values and has applications in many
domains. A number of fuzzy approaches to fault tree analysis have been
proposed, but -- to the best of our knowledge -- none of them provide rigorous
definitions or algorithms for computing fuzzy unreliability values. In this
paper, we define a rigorous framework for fuzzy unreliability values. In
addition, we provide a bottom-up algorithm to efficiently calculate fuzzy
reliability for a system. The algorithm incorporates the $\alpha$-cut method,
that is, it performs binary algebraic operations on intervals of horizontally
discretised $\alpha$-cut representations of fuzzy
numbers. The method preserves the nonlinearity of fuzzy unreliability. Finally,
we illustrate the results obtained from two case studies.
\\ ( https://arxiv.org/abs/2403.08843 ,  943kb)
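To make the $\alpha$-cut idea concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes independent basic events whose failure probabilities are triangular fuzzy numbers (low, mode, high), a hypothetical tree TOP = OR(AND(e1, e2), e3), and the standard crisp gate formulas. Because both gate formulas are increasing in every input on [0, 1], each gate maps interval endpoints directly to interval endpoints.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Triangular = Tuple[float, float, float]  # (low, mode, high), each in [0, 1]


def alpha_cut(tri: Triangular, alpha: float) -> Interval:
    """Horizontal slice of a triangular fuzzy number at membership level alpha."""
    low, mode, high = tri
    return (low + alpha * (mode - low), high - alpha * (high - mode))


def and_gate(a: Interval, b: Interval) -> Interval:
    """Crisp AND is p_a * p_b, increasing in both inputs on [0, 1]."""
    return (a[0] * b[0], a[1] * b[1])


def or_gate(a: Interval, b: Interval) -> Interval:
    """Crisp OR is 1 - (1 - p_a)(1 - p_b), also increasing in both inputs."""
    return (1 - (1 - a[0]) * (1 - b[0]), 1 - (1 - a[1]) * (1 - b[1]))


def fuzzy_top_event(events: Dict[str, Triangular], levels: int = 5) -> List[Tuple[float, Interval]]:
    """Bottom-up evaluation of the hypothetical tree TOP = OR(AND(e1, e2), e3)
    at levels + 1 evenly spaced alpha values from 0 to 1."""
    result = []
    for i in range(levels + 1):
        alpha = i / levels
        cuts = {name: alpha_cut(tri, alpha) for name, tri in events.items()}
        top = or_gate(and_gate(cuts["e1"], cuts["e2"]), cuts["e3"])
        result.append((alpha, top))
    return result


if __name__ == "__main__":
    basic_events = {
        "e1": (0.01, 0.02, 0.04),
        "e2": (0.05, 0.10, 0.20),
        "e3": (0.001, 0.002, 0.005),
    }
    for alpha, (lo, hi) in fuzzy_top_event(basic_events):
        print(f"alpha={alpha:.1f}: [{lo:.6f}, {hi:.6f}]")
```

Evaluating each $\alpha$ level separately, instead of forcing the output back into a triangular shape, is what keeps the nonlinearity of the result. Plain interval arithmetic like this is only exact when every basic event appears once in the tree; repeated events would need extra care.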
------------------------------------------------------------------------------
\\
arXiv:2403.08910
Date: Wed, 13 Mar 2024 19:00:36 GMT   (31kb)

Title: Meta-operators for Enabling Parallel Planning Using Deep Reinforcement
 Learning
Authors: \'Angel Aso-Mollar, Eva Onaindia
Categories: cs.AI
Comments: 9 pages. Submitted to PRL workshop at ICAPS 2023
\\
 There is a growing interest in the application of Reinforcement Learning (RL)
techniques to AI planning with the aim of obtaining general policies.
Typically, the mapping of the transition model of AI planning to the state
transition system of a Markov Decision Process is established by assuming a
one-to-one correspondence of the respective action spaces. In this paper, we
introduce the concept of meta-operator as the result of simultaneously applying
multiple planning operators, and we show that including meta-operators in the
RL action space enables new planning perspectives to be addressed using RL,
such as parallel planning. Our research aims to analyze the performance and
complexity of including meta-operators in the RL process, specifically in domains
where satisfactory outcomes have not previously been achieved using the usual
generalized planning models. The main objective of this article is thus to pave
the way towards a redefinition of the RL action space in a manner that is more
closely aligned with the planning perspective.
\\ ( https://arxiv.org/abs/2403.08910 ,  31kb)
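A minimal Python sketch of what a meta-operator could look like, assuming STRIPS-style grounded operators (precondition, add and delete sets) and a Graphplan-style non-interference test. Both the representation and the interference rule are illustrative assumptions rather than the paper's definitions; the RL action space then holds the single operators plus every mutually non-interfering group.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, Iterable, List, Sequence


@dataclass(frozen=True)
class Operator:
    """STRIPS-style grounded operator (assumed representation)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def make_op(name: str, pre: Iterable[str], add: Iterable[str], delete: Iterable[str]) -> Operator:
    return Operator(name, frozenset(pre), frozenset(add), frozenset(delete))


def non_interfering(a: Operator, b: Operator) -> bool:
    """Neither operator deletes a precondition or an add effect of the other."""
    return not (a.delete & (b.pre | b.add)) and not (b.delete & (a.pre | a.add))


def meta_operator(ops: Sequence[Operator]) -> Operator:
    """Simultaneous application: union of the preconditions and of the effects."""
    return Operator(
        name="+".join(o.name for o in ops),
        pre=frozenset().union(*(o.pre for o in ops)),
        add=frozenset().union(*(o.add for o in ops)),
        delete=frozenset().union(*(o.delete for o in ops)),
    )


def action_space(ops: Sequence[Operator], max_size: int = 2) -> List[Operator]:
    """Single operators plus all mutually non-interfering groups up to max_size."""
    space = list(ops)
    for k in range(2, max_size + 1):
        for group in combinations(ops, k):
            if all(non_interfering(a, b) for a, b in combinations(group, 2)):
                space.append(meta_operator(group))
    return space


def apply_op(op: Operator, state: FrozenSet[str]) -> FrozenSet[str]:
    """Progress a state; the caller must check op.pre <= state first."""
    return (state - op.delete) | op.add


if __name__ == "__main__":
    # Hypothetical logistics-style operators: two trucks and one loading action.
    move1 = make_op("move-t1-A-B", {"at-t1-A"}, {"at-t1-B"}, {"at-t1-A"})
    move2 = make_op("move-t2-C-D", {"at-t2-C"}, {"at-t2-D"}, {"at-t2-C"})
    load = make_op("load-p-t1-A", {"at-t1-A", "at-p-A"}, {"in-p-t1"}, {"at-p-A"})
    for action in action_space([move1, move2, load]):
        print(action.name)
```

In this toy example, moving truck t1 and loading a package into t1 at the same location interfere (the move deletes a precondition of the load), so that pair is not offered as a meta-operator, while the two independent truck moves are.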
------------------------------------------------------------------------------
\\
arXiv:2403.09232
Date: Thu, 14 Mar 2024 09:56:35 GMT   (625kb,D)

Title: Generating Feasible and Plausible Counterfactual Explanations for
 Outcome Prediction of Business Processes
Authors: Alexander Stevens, Chun Ouyang, Johannes De Smedt, Catarina Moreira
Categories: cs.AI
Comments: Journal Submission
\\
 In recent years, various machine and deep learning architectures have been
successfully introduced to the field of predictive process analytics.
Nevertheless, the inherent opacity of these algorithms poses a significant
challenge for human decision-makers, hindering their ability to understand the
reasoning behind the predictions. This growing concern has sparked the
introduction of counterfactual explanations, designed as human-understandable
'what if' scenarios, to provide clearer insights into the decision-making process
behind undesirable predictions. The generation of counterfactual explanations,
however, encounters specific challenges when dealing with the sequential nature
of the (business) process cases typically used in predictive process analytics.
Our paper tackles this challenge by introducing a data-driven approach,
REVISEDplus, to generate more feasible and plausible counterfactual
explanations. First, we restrict the counterfactual algorithm to generate
counterfactuals that lie within a high-density region of the process data,
ensuring that the proposed counterfactuals are realistic and feasible within
the observed process data distribution. Additionally, we ensure plausibility by
learning sequential patterns between the activities in the process cases,
utilising Declare language templates. Finally, we evaluate the properties that
define the validity of counterfactuals.
\\ ( https://arxiv.org/abs/2403.09232 ,  625kb)
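The plausibility filter can be pictured with two common Declare templates. In this minimal sketch the constraints are written by hand, whereas the paper learns them from the process data, and the activity names are hypothetical; it only illustrates how candidate counterfactual traces that violate a sequential pattern can be discarded.

```python
from typing import Callable, List, Sequence

Trace = Sequence[str]
Constraint = Callable[[Trace], bool]


def response(a: str, b: str) -> Constraint:
    """Declare response(a, b): every occurrence of a is eventually followed by b."""
    def check(trace: Trace) -> bool:
        pending = False
        for activity in trace:
            if activity == a:
                pending = True
            elif activity == b:
                pending = False
        return not pending
    return check


def precedence(a: str, b: str) -> Constraint:
    """Declare precedence(a, b): b may only occur after a has occurred."""
    def check(trace: Trace) -> bool:
        seen_a = False
        for activity in trace:
            if activity == a:
                seen_a = True
            elif activity == b and not seen_a:
                return False
        return True
    return check


def plausible(candidates: List[Trace], constraints: List[Constraint]) -> List[Trace]:
    """Keep only candidate counterfactual traces that satisfy every constraint."""
    return [t for t in candidates if all(c(t) for c in constraints)]


if __name__ == "__main__":
    # Hypothetical activities and hand-written constraints, for illustration only.
    constraints = [precedence("submit", "approve"), response("approve", "notify")]
    candidates = [["submit", "approve", "notify"], ["approve", "notify"], ["submit", "approve"]]
    print(plausible(candidates, constraints))  # only the first trace survives
```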
------------------------------------------------------------------------------
\\
arXiv:2403.09249
Date: Thu, 14 Mar 2024 10:16:57 GMT   (1713kb,D)

Title: Leveraging Constraint Programming in a Deep Learning Approach for
 Dynamically Solving the Flexible Job-Shop Scheduling Problem
Authors: Imanol Echeverria, Maialen Murua, Roberto Santana
Categories: cs.AI
\\
 Recent advances in solving the flexible job-shop scheduling problem (FJSSP) are
primarily based on deep reinforcement learning (DRL) due to its ability to
generate high-quality, real-time solutions. However, DRL approaches often fail
to fully harness the strengths of existing techniques such as exact methods or
constraint programming (CP), which can excel at finding optimal or near-optimal
solutions for smaller instances. This paper aims to integrate CP within a deep
learning (DL)-based methodology, leveraging the benefits of both. Specifically,
we introduce a method that involves training a DL model using optimal
solutions generated by CP, ensuring the model learns from high-quality data,
thereby eliminating the need for the extensive exploration typical in DRL and
enhancing overall performance. Further, we integrate CP into our DL framework
to jointly construct solutions, utilizing DL for the initial complex stages and
transitioning to CP for optimal resolution as the problem is simplified. Our
hybrid approach has been extensively tested on three public FJSSP benchmarks,
demonstrating superior performance over five state-of-the-art DRL approaches
and a widely used CP solver. Additionally, with the objective of exploring the
application to other combinatorial optimization problems, promising preliminary
results are presented on applying our hybrid approach to the traveling salesman
problem, combining an exact method with a well-known DRL method.
\\ ( https://arxiv.org/abs/2403.09249 ,  1713kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09289
Date: Thu, 14 Mar 2024 11:22:51 GMT   (525kb,D)

Title: Silico-centric Theory of Mind
Authors: Anirban Mukherjee, Hannah Hanwen Chang
Categories: cs.AI
\\
 Theory of Mind (ToM) refers to the ability to attribute mental states, such
as beliefs, desires, intentions, and knowledge, to oneself and others, and to
understand that these mental states can differ from one's own and from reality.
We investigate ToM in environments with multiple, distinct, independent AI
agents, each possessing unique internal states, information, and objectives.
Inspired by human false-belief experiments, we present an AI ('focal AI') with
a scenario where its clone undergoes a human-centric ToM assessment. We prompt
the focal AI to assess whether its clone would benefit from additional
instructions. Concurrently, we give its clones the ToM assessment, both with
and without the instructions, thereby engaging the focal AI in higher-order
counterfactual reasoning akin to human mentalizing--with respect to humans in
one test and to other AI in another. We uncover a discrepancy: Contemporary AI
demonstrates near-perfect accuracy on human-centric ToM assessments. Since
information embedded in one AI is identically embedded in its clone, additional
instructions are redundant. Yet, we observe AI crafting elaborate instructions
for their clones, erroneously anticipating a need for assistance. An
independent referee AI agrees with these unsupported expectations. Neither the
focal AI nor the referee demonstrates ToM in our 'silico-centric' test.
\\ ( https://arxiv.org/abs/2403.09289 ,  525kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09361
Date: Thu, 14 Mar 2024 13:11:30 GMT   (307kb,D)

Title: A Multi-population Integrated Approach for Capacitated Location Routing
Authors: Pengfei He, Jin-Kao Hao, Qinghua Wu
Categories: cs.AI
\\
 The capacitated location-routing problem involves determining the depots from
a set of candidate capacitated depot locations and finding the required routes
from the selected depots to serve a set of customers while minimizing a cost
function that includes the cost of opening the chosen depots, the fixed
utilization cost per vehicle used, and the total cost (distance) of the routes.
This paper presents a multi-population integrated framework in which a
multi-depot edge assembly crossover generates promising offspring solutions
from the perspective of both depot location and route edge assembly. The method
includes an effective neighborhood-based local search, a feasibility-restoring
procedure and a diversification-oriented mutation. Of particular interest is
the multi-population scheme which organizes the population into multiple
subpopulations based on depot configurations. Extensive experiments on 281
benchmark instances from the literature show that the algorithm performs
remarkably well, improving 101 best-known results (new upper bounds) and
matching 84 best-known results. Additional experiments are presented to gain
insight into the role of the key elements of the algorithm.
\\ ( https://arxiv.org/abs/2403.09361 ,  307kb)
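The objective described above can be written as a small evaluation function. This sketch only scores a given solution under assumed Euclidean distances and hypothetical identifiers; it is not the multi-population algorithm, and the feasibility check covers only the basic constraints implied by the problem statement (vehicle and depot capacity, each customer served exactly once).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Route = Tuple[str, List[str]]  # (depot id, customers in visiting order)


def route_length(depot: str, customers: Sequence[str], coords: Dict[str, Point]) -> float:
    """Closed tour depot -> customers -> depot, using Euclidean distances."""
    stops = [depot, *customers, depot]
    return sum(math.dist(coords[u], coords[v]) for u, v in zip(stops, stops[1:]))


def clrp_cost(routes: Sequence[Route], open_cost: Dict[str, float],
              vehicle_cost: float, coords: Dict[str, Point]) -> float:
    """Opening cost of used depots + fixed cost per vehicle (one per route) + distance."""
    opened = {depot for depot, _ in routes}
    return (sum(open_cost[d] for d in opened)
            + vehicle_cost * len(routes)
            + sum(route_length(d, cs, coords) for d, cs in routes))


def is_feasible(routes: Sequence[Route], demand: Dict[str, float],
                vehicle_capacity: float, depot_capacity: Dict[str, float]) -> bool:
    visits = Counter(c for _, cs in routes for c in cs)
    if set(visits) != set(demand) or any(n != 1 for n in visits.values()):
        return False  # every customer must be served exactly once
    depot_load: Dict[str, float] = {}
    for depot, customers in routes:
        load = sum(demand[c] for c in customers)
        if load > vehicle_capacity:
            return False
        depot_load[depot] = depot_load.get(depot, 0.0) + load
    return all(load <= depot_capacity[d] for d, load in depot_load.items())


if __name__ == "__main__":
    coords = {"D1": (0.0, 0.0), "c1": (3.0, 4.0), "c2": (0.0, 4.0)}
    routes = [("D1", ["c1", "c2"])]
    print(clrp_cost(routes, {"D1": 100.0}, 10.0, coords))  # 100 + 10 + (5 + 3 + 4)
```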
------------------------------------------------------------------------------
\\
arXiv:2403.09404
Date: Thu, 14 Mar 2024 13:53:05 GMT   (436kb,D)

Title: Heuristic Reasoning in AI: Instrumental Use and Mimetic Absorption
Authors: Anirban Mukherjee, Hannah Hanwen Chang
Categories: cs.AI
\\
 We propose a novel program of heuristic reasoning within artificial
intelligence (AI) systems. Through a series of innovative experiments,
including variations of the classic Linda problem and a novel application of
the Beauty Contest game, we uncover trade-offs between accuracy maximization
and effort reduction that shape the conditions under which AIs transition
between exhaustive logical processing and the use of cognitive shortcuts
(heuristics). We distinguish between the 'instrumental' use of heuristics to
match resources with objectives, and 'mimetic absorption,' whereby heuristics
are learned from humans, and manifest randomly and universally. We provide
evidence that AI, despite lacking intrinsic goals or self-awareness, manifests
an adaptive balancing of precision and efficiency, consistent with principles
of resource-rational human cognition as explicated in classical theories of
bounded rationality and dual-process theory.
\\ ( https://arxiv.org/abs/2403.09404 ,  436kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09481
Date: Thu, 14 Mar 2024 15:25:23 GMT   (4223kb,D)

Title: Clinical Reasoning over Tabular Data and Text with Bayesian Networks
Authors: Paloma Rabaey, Johannes Deleu, Stefan Heytens, Thomas Demeester
Categories: cs.AI
Comments: 10 pages, 2 figures
\\
 Bayesian networks are well-suited for clinical reasoning on tabular data, but
are less compatible with natural language data, for which neural networks
provide a successful framework. This paper compares and discusses strategies to
augment Bayesian networks with neural text representations, both in a
generative and discriminative manner. This is illustrated with simulation
results for a primary care use case (diagnosis of pneumonia) and discussed in a
broader clinical context.
\\ ( https://arxiv.org/abs/2403.09481 ,  4223kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09510
Date: Thu, 14 Mar 2024 15:56:39 GMT   (8782kb,D)

Title: Trust AI Regulation? Discerning users are vital to build trust and
 effective AI regulation
Authors: Zainab Alalawi, Paolo Bova, Theodor Cimpeanu, Alessandro Di Stefano,
 Manh Hong Duong, Elias Fernandez Domingos, The Anh Han, Marcus Krellner,
 Bianca Ogbo, Simon T. Powers, and Filippo Zimmaro
Categories: cs.AI cs.CY cs.GT cs.MA math.DS
\\
 There is general agreement that some form of regulation is necessary both for
AI creators to be incentivised to develop trustworthy systems, and for users to
actually trust those systems. But there is much debate about what form these
regulations should take and how they should be implemented. Most work in this
area has been qualitative, and has not been able to make formal predictions.
Here, we propose that evolutionary game theory can be used to quantitatively
model the dilemmas faced by users, AI creators, and regulators, and provide
insights into the possible effects of different regulatory regimes. We show
that creating trustworthy AI and user trust requires regulators to be
incentivised to regulate effectively. We demonstrate the effectiveness of two
mechanisms that can achieve this. The first is where governments can recognise
and reward regulators that do a good job. In that case, if the AI system is not
too risky for users then some level of trustworthy development and user trust
evolves. We then consider an alternative solution, where users can condition
their trust decision on the effectiveness of the regulators. This leads to
effective regulation, and consequently the development of trustworthy AI and
user trust, provided that the cost of implementing regulations is not too high.
Our findings highlight the importance of considering the effect of different
regulatory regimes from an evolutionary game theoretic perspective.
\\ ( https://arxiv.org/abs/2403.09510 ,  8782kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09580
Date: Thu, 14 Mar 2024 17:14:53 GMT   (16kb)

Title: Algorithmic syntactic causal identification
Authors: Dhurim Cakiqi and Max A. Little
Categories: cs.AI cs.LG stat.OT
Comments: 11 pages, 2 TikZ figures
\\
 Causal identification in causal Bayes nets (CBNs) is an important tool in
causal inference allowing the derivation of interventional distributions from
observational distributions where this is possible in principle. However, most
existing formulations of causal identification using techniques such as
d-separation and do-calculus are expressed within the mathematical language of
classical probability theory on CBNs. Yet there are many causal settings
where probability theory, and hence current causal identification techniques, are
inapplicable, such as relational databases, dataflow programs such as hardware
description languages, distributed systems and most modern machine learning
algorithms. We show that this restriction can be lifted by replacing the use of
classical probability theory with the alternative axiomatic foundation of
symmetric monoidal categories. In this alternative axiomatization, we show how
an unambiguous and clean distinction can be drawn between the general syntax of
causal models and any specific semantic implementation of that causal model.
This allows a purely syntactic algorithmic description of general causal
identification by a translation of recent formulations of the general ID
algorithm through fixing. Our description is given entirely in terms of the
non-parametric ADMG structure specifying a causal model and the algebraic
signature of the corresponding monoidal category, to which a sequence of
manipulations is then applied so as to arrive at a modified monoidal category
in which the desired purely syntactic interventional causal model is
obtained. We use this idea to derive purely syntactic analogues of classical
back-door and front-door causal adjustment, and illustrate an application to a
more complex causal model.
\\ ( https://arxiv.org/abs/2403.09580 ,  16kb)
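For reference, the classical probabilistic forms of the two adjustments whose syntactic analogues the paper derives are given below, with $Z$ a set satisfying the back-door criterion and $M$ a set satisfying the front-door criterion relative to $(X, Y)$. These are the textbook formulas, not the paper's categorical formulation.

```latex
% Back-door adjustment
P(y \mid \mathrm{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z)

% Front-door adjustment
P(y \mid \mathrm{do}(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
```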
------------------------------------------------------------------------------
\\
arXiv:2403.08977
Date: Wed, 13 Mar 2024 21:56:40 GMT   (128kb,D)

Title: On maximum-sum matchings of bichromatic points
Authors: Oscar Chac\'on-Rivera, Pablo P\'erez-Lantero
Categories: cs.CG cs.DM
\\
 Huemer et al. (Discrete Math, 2019) proved that for any two finite point sets
$R$ and $B$ in the plane with $|R| = |B|$, the perfect matching that matches
points of $R$ with points of $B$, and maximizes the total squared Euclidean
distance of the matched pairs, has the property that all the disks induced by
the matching have a nonempty common intersection. A pair of matched points
induces the disk that has the segment connecting the points as diameter. In
this note, we characterize these maximum-sum matchings for any continuous
(semi)metric, focusing on both the Euclidean distance and squared Euclidean
distance. Using this characterization, we give a different but simpler proof
for the common intersection property proved by Huemer et al.
\\ ( https://arxiv.org/abs/2403.08977 ,  128kb)
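The objects in this result are easy to experiment with on tiny inputs. The sketch below is a brute-force checker, not the authors' method: it enumerates all perfect matchings between equal-sized point sets $R$ and $B$, keeps one that maximizes the total squared Euclidean distance, and builds the disk induced by each matched pair. The pairwise-overlap test is only a necessary condition for a common intersection; by Helly's theorem in the plane, a full check would need every triple of disks to share a point.

```python
from itertools import combinations, permutations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Disk = Tuple[Point, float]  # (center, radius)


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def max_sum_matching(red: Sequence[Point], blue: Sequence[Point]) -> Tuple[float, List[Tuple[Point, Point]]]:
    """Exhaustive search over all |R|! matchings; only practical for a handful of points."""
    if len(red) != len(blue):
        raise ValueError("|R| must equal |B|")
    best_total, best_pairs = -1.0, []
    for perm in permutations(range(len(blue))):
        total = sum(sq_dist(red[i], blue[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_pairs = total, [(red[i], blue[j]) for i, j in enumerate(perm)]
    return best_total, best_pairs


def induced_disk(p: Point, q: Point) -> Disk:
    """Disk having segment pq as a diameter."""
    center = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    return center, dist(p, q) / 2


def pairwise_intersecting(disks: Sequence[Disk]) -> bool:
    """Every two disks overlap (necessary, not sufficient, for a common point)."""
    return all(dist(c1, c2) <= r1 + r2 for (c1, r1), (c2, r2) in combinations(disks, 2))


if __name__ == "__main__":
    total, pairs = max_sum_matching([(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (5.0, 0.0)])
    disks = [induced_disk(p, q) for p, q in pairs]
    print(total, pairs, pairwise_intersecting(disks))
```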
------------------------------------------------------------------------------
\\
arXiv:2403.09197
Date: Thu, 14 Mar 2024 09:09:15 GMT   (645kb,D)

Title: MetroGNN: Metro Network Expansion with Reinforcement Learning
Authors: Hongyuan Su, Yu Zheng, Jingtao Ding, Depeng Jin, Yong Li
Categories: cs.CY
Comments: WWW24 short
MSC-class: 68T09
DOI: 10.1145/3589335.3651536
\\
 Selecting urban regions for metro network expansion to meet maximal
transportation demands is crucial for urban development, yet computationally
challenging to solve. The expansion process relies not only on complicated
features like urban demographics and origin-destination (OD) flow but is also
constrained by the existing metro network and urban geography. In this paper,
we introduce a reinforcement learning framework to address a Markov decision
process within an urban heterogeneous multi-graph. Our approach employs an
attentive policy network that intelligently selects nodes based on information
captured by a graph neural network. Experiments on real-world urban data
demonstrate that our proposed methodology substantially improves the satisfied
transportation demands by over 30% when compared with state-of-the-art
methods. Code is available at https://github.com/tsinghua-fib-lab/MetroGNN.
\\ ( https://arxiv.org/abs/2403.09197 ,  645kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09208
Date: Thu, 14 Mar 2024 09:22:16 GMT   (425kb)

Title: Older adults' safety and security online: A post-pandemic exploration of
 attitudes and behaviors
Authors: Edgar Pacheco
Categories: cs.CY
Comments: 20 pages, 7 tables
\\
 Older adults' growing use of the internet and related technologies, further
accelerated by the COVID-19 pandemic, has prompted not only a critical
examination of their behaviors and attitudes about online threats but also a
need to better understand the roles of specific characteristics within this
population group. Based on survey data and using descriptive and inferential
statistics, this empirical study delves into this matter. The behaviors and
attitudes of a group of older adults aged 60 years and older (n=275) regarding
different dimensions of online safety and cybersecurity are investigated. The
results show that older adults report a discernible degree of concern about the
security of their personal information. Despite the varied precautions taken,
most of them do not know where to report online threats. What is more,
regarding key demographics, the study found some significant differences in
terms of gender and age group, but not disability status. This implies that
older adults do not seem to constitute a homogeneous group when it comes to
attitudes and behaviors regarding safety and security online. The study
concludes that support systems should include older adults in the development
of protective measures and acknowledge their diversity. The implications of the
results are discussed and some directions for future research are proposed.
\\ ( https://arxiv.org/abs/2403.09208 ,  425kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09216
Date: Thu, 14 Mar 2024 09:31:20 GMT   (432kb)

Title: Unlocking the Potential of Open Government Data: Exploring the
 Strategic, Technical, and Application Perspectives of High-Value Datasets
 Opening in Taiwan
Authors: Hsien-Lee Tseng, Anastasija Nikiforova
Categories: cs.CY
Comments: This paper has been accepted for publication in Proceedings of the
 25th Annual International Conference on Digital Government Research and this
 is a pre-print version of the manuscript. It is posted here for your personal
 use. Not for redistribution
\\
 Today, data has an unprecedented value as it forms the basis for data-driven
decision-making, including serving as an input for AI models, where the latter
is highly dependent on the availability of the data. However, making data
available in an open format creates little added value by itself; what is key
is the value of these data, i.e., their relevance to the real needs of the end user.
This is where the concept of high-value dataset (HVD) comes into play, which
has become popular in recent years. Defining and opening HVD is an ongoing
process consisting of a set of interrelated steps, the implementation of which
may vary from one country or region to another. Therefore, there has recently
been a call for research, set in a specific country or region, on the datasets
considered to be of greatest national value. So far, only a few studies have been conducted at
the regional or national level, most of which consider only one step of the
process, such as identifying HVD or measuring their impact. With this study, we
answer this call and examine the national case of Taiwan by exploring the
entire lifecycle of HVD opening. The aim of the paper is to understand and
evaluate the lifecycle of high-value dataset publishing in one of the world's
leading producers of information and communication technology (ICT) products -
Taiwan. To do this, we conduct a qualitative study with exploratory interviews
with representatives from government agencies in Taiwan responsible for HVD
opening, exploring the HVD opening lifecycle. Specifically, we examine (1) strategic
aspects related to the HVD determination process, (2) technical aspects, and
(3) application aspects.
\\ ( https://arxiv.org/abs/2403.09216 ,  432kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08818
Date: Mon, 19 Feb 2024 23:48:40 GMT   (647kb,D)

Title: Multimodal Fusion of EHR in Structures and Semantics: Integrating
 Clinical Records and Notes with Hypergraph and LLM
Authors: Hejie Cui, Xinyu Fang, Ran Xu, Xuan Kan, Joyce C. Ho, Carl Yang
Categories: cs.LG cs.AI cs.CL
\\
 Electronic Health Records (EHRs) have become increasingly popular for supporting
clinical decision-making and healthcare in recent decades. EHRs usually contain
heterogeneous information, such as structured data in tabular form and
unstructured data in textual notes. Different types of information in EHRs can
complement each other and provide a more complete picture of the health status
of a patient. While there has been a lot of research on representation learning
of structured EHR data, the fusion of different types of EHR data (multimodal
fusion) is not well studied. This is mostly because of the complex medical
coding systems used and the noise and redundancy present in the written notes.
In this work, we propose a new framework called MINGLE, which integrates both
structures and semantics in EHR effectively. Our framework uses a two-level
infusion strategy to combine medical concept semantics and clinical note
semantics into hypergraph neural networks, which learn the complex interactions
between different types of data to generate visit representations for
downstream prediction. Experimental results on two EHR datasets, the public
MIMIC-III and private CRADLE, show that MINGLE can effectively improve
predictive performance by a relative 11.83%, enhancing semantic integration as
well as multimodal fusion for structured and textual EHR data.
\\ ( https://arxiv.org/abs/2403.08818 ,  647kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08819
Date: Tue, 20 Feb 2024 04:13:48 GMT   (2161kb,D)

Title: Thermometer: Towards Universal Calibration for Large Language Models
Authors: Maohao Shen, Subhro Das, Kristjan Greenewald, Prasanna Sattigeri,
 Gregory Wornell, Soumya Ghosh
Categories: cs.LG cs.CL stat.ML
\\
 We consider the issue of calibration in large language models (LLMs). Recent
studies have found that common interventions such as instruction tuning often
result in poorly calibrated LLMs. Although calibration is well-explored in
traditional applications, calibrating LLMs is uniquely challenging. These
challenges stem as much from the severe computational requirements of LLMs as
from their versatility, which allows them to be applied to diverse tasks.
Addressing these challenges, we propose THERMOMETER, a calibration approach
tailored to LLMs. THERMOMETER learns an auxiliary model, given data from
multiple tasks, for calibrating an LLM. It is computationally efficient,
preserves the accuracy of the LLM, and produces better-calibrated responses for
new tasks. Extensive empirical evaluations across various benchmarks
demonstrate the effectiveness of the proposed method.
\\ ( https://arxiv.org/abs/2403.08819 ,  2161kb)
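THERMOMETER's auxiliary model is not reproduced here, but the mechanism it builds on, temperature scaling, can be sketched in a few lines. In this illustration the temperature is fixed by hand (an assumption; THERMOMETER learns to predict it from data), and the expected calibration error (ECE) uses equal-width confidence bins. Because dividing all logits by the same $T > 0$ does not change their order, the predicted class, and hence accuracy, is preserved.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; T > 1 softens, T < 1 sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted mean |accuracy - confidence| over equal-width bins (lo, hi]."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        members = [i for i, c in enumerate(confidences)
                   if lo < c <= hi or (b == 0 and c == 0.0)]
        if not members:
            continue
        accuracy = sum(1 for i in members if correct[i]) / len(members)
        confidence = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(accuracy - confidence)
    return ece


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print(softmax(logits), softmax(logits, temperature=2.0))
```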
------------------------------------------------------------------------------
\\
arXiv:2403.08820
Date: Wed, 21 Feb 2024 19:36:24 GMT   (15522kb,D)

Title: Diet-ODIN: A Novel Framework for Opioid Misuse Detection with
 Interpretable Dietary Patterns
Authors: Zheyuan Zhang, Zehong Wang, Shifu Hou, Evan Hall, Landon Bachman,
 Vincent Galassi, Jasmine White, Nitesh V. Chawla, Chuxu Zhang, Yanfang Ye
Categories: cs.LG cs.AI cs.SI
\\
 The opioid crisis has been one of the most critical societal concerns in the
United States. Although medication-assisted treatment (MAT) is recognized
as the most effective treatment for opioid misuse and addiction, its various
side effects can trigger opioid relapse. In addition to MAT, dietary
nutrition intervention has demonstrated its importance in opioid misuse
prevention and recovery. However, research on the alarming connections between
dietary patterns and opioid misuse remains under-explored. In response to this
gap, we establish, for the first time, a large-scale multifaceted dietary
benchmark dataset related to opioid users and then develop
a novel framework, Opioid Misuse Detection with Interpretable
Dietary Patterns (Diet-ODIN), to bridge heterogeneous graph (HG) and large
language model (LLM) for the identification of users with opioid misuse and the
interpretation of their associated dietary patterns. Specifically, in
Diet-ODIN, we first construct an HG to comprehensively incorporate both dietary
and health-related information, and then we devise a holistic graph learning
framework with noise reduction to fully capitalize on both users' individual
dietary habits and shared dietary patterns for the detection of users with
opioid misuse. To further delve into the intricate correlations between dietary
patterns and opioid misuse, we exploit an LLM by utilizing the knowledge
obtained from the graph learning model for interpretation. The extensive
experimental results based on our established benchmark with quantitative and
qualitative measures demonstrate the outstanding performance of Diet-ODIN in
exploring the complex interplay between opioid misuse and dietary patterns, by
comparison with state-of-the-art baseline methods.
\\ ( https://arxiv.org/abs/2403.08820 ,  15522kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08822
Date: Wed, 28 Feb 2024 06:50:10 GMT   (661kb)

Title: LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient
 Fine-Tuning of Large Language Models
Authors: Yichao Wu, Yafei Xiang, Shuning Huo, Yulu Gong, Penghao Liang
Categories: cs.LG cs.CL
\\
 In addressing the computational and memory demands of fine-tuning Large
Language Models (LLMs), we propose LoRA-SP (Streamlined Partial Parameter
Adaptation), a novel approach utilizing randomized half-selective parameter
freezing within the Low-Rank Adaptation (LoRA) framework. This method efficiently
balances pre-trained knowledge retention and adaptability for task-specific
optimizations. Through a randomized mechanism, LoRA-SP determines which
parameters to update or freeze, significantly reducing computational and memory
requirements without compromising model performance. We evaluated LoRA-SP
across several benchmark NLP tasks, demonstrating its ability to achieve
competitive performance with substantially lower resource consumption compared
to traditional full-parameter fine-tuning and other parameter-efficient
techniques. LoRA-SP's innovative approach not only facilitates the deployment of
advanced NLP models in resource-limited settings but also opens new research
avenues into effective and efficient model adaptation strategies.
\\ ( https://arxiv.org/abs/2403.08822 ,  661kb)
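The abstract does not spell out the granularity of the randomized freezing, so the following pure-Python sketch assumes one possible reading: a random half of the entries of a LoRA factor is frozen, and an SGD step only updates the unfrozen half. The helper names, the element-level masking and the scaling factor are illustrative assumptions, not LoRA-SP's published design.

```python
import random
from typing import List

Matrix = List[List[float]]
Mask = List[List[bool]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(B: Matrix, A: Matrix, scale: float) -> Matrix:
    """Low-rank update scale * (B @ A) that is added to the frozen pretrained weight W."""
    return [[scale * v for v in row] for row in matmul(B, A)]


def random_half_mask(rows: int, cols: int, rng: random.Random) -> Mask:
    """True = trainable; exactly floor(rows * cols / 2) entries are chosen at random."""
    n = rows * cols
    chosen = set(rng.sample(range(n), n // 2))
    return [[(r * cols + c) in chosen for c in range(cols)] for r in range(rows)]


def masked_sgd_step(param: Matrix, grad: Matrix, mask: Mask, lr: float) -> Matrix:
    """Update only the trainable entries; frozen entries are returned unchanged."""
    return [[p - lr * g if m else p for p, g, m in zip(prow, grow, mrow)]
            for prow, grow, mrow in zip(param, grad, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # hypothetical rank-2 factor
    mask = random_half_mask(2, 3, rng)
    print(masked_sgd_step(A, [[1.0] * 3, [1.0] * 3], mask, lr=0.01))
```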
------------------------------------------------------------------------------
\\
arXiv:2403.08834
Date: Wed, 13 Mar 2024 08:04:00 GMT   (1308kb,D)

Title: Predictive Analysis of Tuberculosis Treatment Outcomes Using Machine
 Learning: A Karnataka TB Data Study at a Scale
Authors: SeshaSai Nath Chinagudaba, Darshan Gera, Krishna Kiran Vamsi Dasu, Uma
 Shankar S, Kiran K, Anil Singarajpure, Shivayogappa.U, Somashekar N, Vineet
 Kumar Chadda and Sharath B N
Categories: cs.LG cs.AI
\\
 Tuberculosis (TB) remains a global health threat, ranking among the leading
causes of mortality worldwide. In this context, machine learning (ML) has
emerged as a transformative force, providing innovative solutions to the
complexities associated with TB treatment. This study explores how machine
learning, especially with tabular data, can be used to predict TB treatment
outcomes more accurately. It transforms this prediction task
into a binary classification problem, generating risk scores from patient data
sourced from NIKSHAY, India's national TB control program, which includes over
500,000 patient records.
 Data preprocessing is a critical component of the study. The model
achieved a recall of 98% and an AUC-ROC score of 0.95 on the validation set,
which includes 20,000 patient records. We also explore the use of Natural
Language Processing (NLP) for improved model learning. Our results,
corroborated by various metrics and ablation studies, validate the
effectiveness of our approach. The study concludes by discussing the potential
ramifications of our research on TB eradication efforts and proposing potential
avenues for future work. This study marks a significant stride in the battle
against TB, showcasing the potential of machine learning in healthcare.
\\ ( https://arxiv.org/abs/2403.08834 ,  1308kb)
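Illustrative sketch (not from the paper). The abstract reports recall and AUC-ROC for a binary classifier that outputs risk scores. The pure-Python sketch below shows how both metrics follow from risk scores; the 0.5 decision threshold and the toy labels and scores are assumptions, and AUC-ROC is computed in its pairwise (Mann-Whitney) form.

```python
def recall(labels, scores, threshold=0.5):
    """Fraction of true positives (label 1) whose risk score is at or above threshold."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    if not positives:
        raise ValueError("recall is undefined without positive examples")
    return sum(s >= threshold for s in positives) / len(positives)

def auc_roc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]   # toy risk scores
print(recall(labels, scores), auc_roc(labels, scores))
```

The pairwise AUC loop is O(P x N); on a validation set of 20,000 records, a sort-based rank formulation would be the practical choice.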
------------------------------------------------------------------------------
\\
arXiv:2403.08835
Date: Wed, 13 Mar 2024 08:10:18 GMT   (894kb)

Title: Stacking-based deep neural network for player scouting in football 1
Authors: Simon Lacan (IMT Nord Europe)
Categories: cs.LG cs.AI
\\
 Datascouting is one of the best-known data applications in professional
sport, and specifically football. Its objective is to analyze huge databases of
players in order to detect high-potential players who can then be individually
considered by human scouts. In this paper, we propose a stacking-based deep
learning model to detect high-potential football players. Applied to an
open-source database, our model obtains significantly better results than
classical statistical methods.
\\ ( https://arxiv.org/abs/2403.08835 ,  894kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08836
Date: Wed, 13 Mar 2024 08:15:18 GMT   (1135kb,D)

Title: Structural Positional Encoding for knowledge integration in
 transformer-based medical process monitoring
Authors: Christopher Irwin, Marco Dossena, Giorgio Leonardi, Stefania Montani
Categories: cs.LG cs.AI
\\
 Predictive process monitoring is a process mining task aimed at forecasting
information about a running process trace, such as the most appropriate next
activity to be executed. In medical domains, predictive process monitoring can
provide valuable decision support in atypical and nontrivial situations.
Decision support and quality assessment in medicine cannot ignore domain
knowledge if they are to be grounded in all the available information (which is
not limited to data) and to be truly acceptable to end users.
 In this paper, we propose a predictive process monitoring approach relying on
the use of a {\em transformer}, a deep learning architecture based on the
attention mechanism. A major contribution of our work lies in the incorporation
of ontological domain-specific knowledge, carried out through a graph
positional encoding technique. The paper presents and discusses the encouraging
experimental results we are collecting in the domain of stroke management.
\\ ( https://arxiv.org/abs/2403.08836 ,  1135kb)
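Illustrative sketch (not from the paper). The abstract does not name the graph positional encoding it uses. One common choice is to take eigenvectors of the normalized graph Laplacian as node coordinates. The NumPy sketch below assumes that technique, a hypothetical five-concept ontology, and a random projection standing in for learned weights.

```python
import numpy as np

def laplacian_positional_encoding(adjacency, k):
    """Return k-dim encodings from eigenvectors of the symmetric normalized Laplacian.

    adjacency: symmetric (n, n) 0/1 array describing the ontology graph.
    The trivial eigenvector (eigenvalue ~0) is skipped.
    """
    adj = np.asarray(adjacency, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                # shape (n, k)

edges = [(0, 1), (0, 2), (1, 3), (1, 4)]      # hypothetical toy ontology (is-a links)
n = 5
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
pe = laplacian_positional_encoding(adj, k=2)

rng = np.random.default_rng(0)
d_model = 8
token_emb = rng.normal(size=(n, d_model))     # one embedding per activity/concept
proj = rng.normal(size=(2, d_model))          # learned in practice; random here
trace = [0, 1, 3]                             # a running trace as concept ids
x = token_emb[trace] + pe[trace] @ proj       # transformer input, shape (3, d_model)
```

Laplacian eigenvectors are only defined up to sign, so implementations of this kind of encoding often flip signs randomly during training.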
------------------------------------------------------------------------------
\\
arXiv:2403.08837
Date: Wed, 13 Mar 2024 08:39:21 GMT   (318kb,D)

Title: Cyclic Data Parallelism for Efficient Parallelism of Deep Neural
 Networks
Authors: Louis Fournier (MLIA), Edouard Oyallon
Categories: cs.LG cs.AI cs.DC cs.NE stat.ML
\\
 Training large deep learning models requires parallelization techniques to
scale. In existing methods such as Data Parallelism or ZeRO-DP, micro-batches
of data are processed in parallel, which creates two drawbacks: the total
memory required to store the model's activations peaks at the end of the
forward pass, and gradients must be simultaneously averaged at the end of the
backpropagation step. We propose Cyclic Data Parallelism, a novel paradigm
shifting the execution of the micro-batches from simultaneous to sequential,
with a uniform delay. At the cost of a slight gradient delay, the total memory
taken by activations is constant, and the gradient communications are balanced
during the training step. With Model Parallelism, our technique reduces the
number of GPUs needed by sharing GPUs across micro-batches. Within the ZeRO-DP
framework, our technique allows communication of the model states with
point-to-point operations rather than a collective broadcast operation. We
illustrate the strength of our approach on the CIFAR-10 and ImageNet datasets.
\\ ( https://arxiv.org/abs/2403.08837 ,  318kb)
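Illustrative sketch (not from the paper). The memory claim can be checked with a toy unit-cost model: each micro-batch stores one activation per layer during the forward pass and frees one per step of the backward pass. The pure-Python sketch below compares the simultaneous schedule with one where micro-batch i is shifted by i * 2L/N steps; it assumes N is even and divides 2L, and ignores communication and the gradient delay. Function names are hypothetical.

```python
def activations_held(step, num_layers):
    """Layer activations one micro-batch holds after `step` (0-based) of its
    forward+backward cycle of length 2 * num_layers (toy unit-cost model)."""
    if step < num_layers:                    # forward: layer `step` just stored
        return step + 1
    return 2 * num_layers - 1 - step         # backward: one activation freed per step

def total_memory(num_micro, num_layers, delay):
    """Steady-state total activations per time step when micro-batch i is
    shifted by i * delay steps (delay=0 is the simultaneous schedule)."""
    period = 2 * num_layers
    return [
        sum(activations_held((t + i * delay) % period, num_layers)
            for i in range(num_micro))
        for t in range(period)
    ]

num_micro, num_layers = 4, 6
simultaneous = total_memory(num_micro, num_layers, delay=0)
cyclic = total_memory(num_micro, num_layers, delay=2 * num_layers // num_micro)
print(max(simultaneous), simultaneous)
print(max(cyclic), cyclic)
```

Under these assumptions, micro-batches half a period apart always hold L activations between them, so the cyclic total stays at N*L/2 instead of peaking at N*L.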
------------------------------------------------------------------------------
\\
arXiv:2403.08838
Date: Wed, 13 Mar 2024 12:05:02 GMT   (3375kb,D)

Title: Predictive Clustering of Vessel Behavior Based on Hierarchical
 Trajectory Representation
Authors: Rui Zhang, Hanyue Wu, Zhenzhong Yin, Zhu Xiao, Yong Xiong, and Kezhong
 Liu
Categories: cs.LG cs.AI
\\
 Vessel trajectory clustering, which aims to find similar trajectory patterns,
has been widely leveraged in maritime applications. Most traditional methods
use predefined rules and thresholds to identify discrete vessel behaviors. They
aim for high-quality clustering and conduct clustering on entire sequences,
whether the original trajectory or its sub-trajectories, failing to represent
their evolution. To resolve this problem, we propose Predictive Clustering of
Hierarchical Vessel Behavior (PC-HiV). PC-HiV first uses hierarchical
representations to transform every trajectory into a behavioral sequence. Then,
it predicts evolution at each timestamp of the sequence based on the
representations. By applying predictive clustering and latent encoding, PC-HiV
improves clustering and predictions simultaneously. Experiments on real AIS
datasets demonstrate PC-HiV's superiority over existing methods, showcasing its
effectiveness in capturing behavioral evolution discrepancies between vessel
types (tramp vs. liner) and within emission control areas. Results show that
our method outperforms NN-Kmeans and Robust DAA by 3.9% and 6.4%,
respectively, in purity score.
\\ ( https://arxiv.org/abs/2403.08838 ,  3375kb)
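Illustrative sketch. Purity, the metric behind the reported gains, credits each cluster with its most frequent true label (here, vessel type) and measures the fraction of points that match it. The cluster assignments and labels below are toy values, and the function name is hypothetical; the abstract does not say whether the 3.9% and 6.4% gains are absolute or relative.

```python
from collections import Counter, defaultdict

def purity(cluster_ids, true_labels):
    """Each cluster is credited with its most common true label; return the
    fraction of points that match their cluster's majority label."""
    if len(cluster_ids) != len(true_labels) or not cluster_ids:
        raise ValueError("need equal-length, non-empty inputs")
    members = defaultdict(list)
    for c, y in zip(cluster_ids, true_labels):
        members[c].append(y)
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority_total / len(true_labels)

clusters = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["tramp", "tramp", "liner", "liner", "liner", "liner", "tramp", "liner"]
print(purity(clusters, labels))
```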
------------------------------------------------------------------------------
\\
arXiv:2403.08839
Date: Wed, 13 Mar 2024 12:08:27 GMT   (537kb,D)

Title: Learning-Enhanced Neighborhood Selection for the Vehicle Routing Problem
 with Time Windows
Authors: Willem Feijen, Guido Sch\"afer, Koen Dekker, Seppo Pieterse
Categories: cs.LG
MSC-class: 90-05
\\
 Large Neighborhood Search (LNS) is a universal approach that is broadly
applicable and has proven to be highly efficient in practice for solving
optimization problems. We propose to integrate machine learning (ML) into LNS
to assist in deciding which parts of the solution should be destroyed and
repaired in each iteration of LNS. We refer to our new approach as
Learning-Enhanced Neighborhood Selection (LENS for short). Our approach is
universally applicable, i.e., it can be applied to any LNS algorithm to amplify
the workings of the destroy algorithm. In this paper, we demonstrate the
potential of LENS on the fundamental Vehicle Routing Problem with Time Windows
(VRPTW). We implemented an LNS algorithm for VRPTW and collected data on
novel training instances generated from well-known, widely used
benchmark datasets. We trained our LENS approach with this data and
compared the experimental results of our approach with two benchmark
algorithms: a random neighborhood selection method to show that LENS learns to
make informed choices and an oracle neighborhood selection method to
demonstrate the potential of our LENS approach. With LENS, we obtain results
that significantly improve the quality of the solutions.
\\ ( https://arxiv.org/abs/2403.08839 ,  537kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08845
Date: Wed, 13 Mar 2024 16:30:57 GMT   (25270kb,D)

Title: Bifurcated Attention for Single-Context Large-Batch Sampling
Authors: Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda,
 Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen,
 Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang
Categories: cs.LG cs.AI
\\
 In our study, we present bifurcated attention, a method developed for
language model inference in single-context batch sampling settings. This
approach aims to reduce redundant memory IO costs, a significant factor in
latency for high batch sizes and long context lengths. Bifurcated attention
achieves this by dividing the attention mechanism during incremental decoding
into two distinct GEMM operations, one over the KV cache from prefill and one
over the KV cache from the decoding process. This method ensures exact
computation and maintains the usual computational load (FLOPs) of standard
attention mechanisms, but with reduced memory IO. Bifurcated attention is also
compatible with the multi-query attention mechanism, known for reduced memory
IO for the KV cache, further enabling higher batch sizes and context lengths.
The resulting efficiency leads to lower latency, improving suitability for
real-time applications. For example, it enables massively parallel answer
generation without substantially increasing latency, which enhances
performance when integrated with postprocessing techniques such as reranking.
\\ ( https://arxiv.org/abs/2403.08845 ,  25270kb)
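Illustrative sketch (not the paper's kernel). The NumPy sketch below checks the exactness claim for a single attention head with one query token per sample: the shared prefill keys and values are multiplied once for the whole batch, each sample's decoded keys and values separately, and a joint softmax over both score blocks reproduces standard attention. Shapes and function names are hypothetical; masking and multiple heads are omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(q, k_ctx, v_ctx, k_dec, v_dec):
    """Reference: copy the shared context KV into every batch row, then attend."""
    batch = q.shape[0]
    k = np.concatenate([np.broadcast_to(k_ctx, (batch,) + k_ctx.shape), k_dec], axis=1)
    v = np.concatenate([np.broadcast_to(v_ctx, (batch,) + v_ctx.shape), v_dec], axis=1)
    scores = np.einsum("bd,bmd->bm", q, k) / np.sqrt(q.shape[-1])
    return np.einsum("bm,bmd->bd", softmax(scores), v)

def bifurcated_attention(q, k_ctx, v_ctx, k_dec, v_dec):
    """Two GEMMs: one against the shared prefill KV (read once for the whole
    batch) and one against each sample's own decoded KV."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    s_ctx = (q @ k_ctx.T) * scale                          # (batch, m_ctx)
    s_dec = np.einsum("bd,bmd->bm", q, k_dec) * scale      # (batch, m_dec)
    p = softmax(np.concatenate([s_ctx, s_dec], axis=1))    # joint softmax keeps it exact
    m_ctx = k_ctx.shape[0]
    return p[:, :m_ctx] @ v_ctx + np.einsum("bm,bmd->bd", p[:, m_ctx:], v_dec)

rng = np.random.default_rng(0)
batch, d, m_ctx, m_dec = 4, 8, 16, 3
q = rng.normal(size=(batch, d))
k_ctx, v_ctx = rng.normal(size=(m_ctx, d)), rng.normal(size=(m_ctx, d))
k_dec, v_dec = rng.normal(size=(batch, m_dec, d)), rng.normal(size=(batch, m_dec, d))
out_ref = standard_attention(q, k_ctx, v_ctx, k_dec, v_dec)
out_bif = bifurcated_attention(q, k_ctx, v_ctx, k_dec, v_dec)
```

In the reference version, concatenation copies the context KV into every batch row; the bifurcated version reads it once per step, which is where the memory-IO saving comes from.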
------------------------------------------------------------------------------
\\
arXiv:2403.08879
Date: Wed, 13 Mar 2024 18:05:16 GMT   (10655kb,D)

Title: Multi-Objective Optimization Using Adaptive Distributed Reinforcement
 Learning
Authors: Jing Tan, Ramin Khalili, Holger Karl
Categories: cs.LG cs.AI cs.MA
\\
 The Intelligent Transportation System (ITS) environment is known to be
dynamic and distributed, where participants (vehicle users, operators, etc.)
have multiple, changing and possibly conflicting objectives. Although
Reinforcement Learning (RL) algorithms are commonly applied to optimize ITS
applications such as resource management and offloading, most RL algorithms
focus on single objectives. In many situations, converting a multi-objective
problem into a single-objective one is impossible, intractable or insufficient,
making such RL algorithms inapplicable. We propose a multi-objective,
multi-agent reinforcement learning (MARL) algorithm with high learning
efficiency and low computational requirements, which automatically triggers
adaptive few-shot learning in a dynamic, distributed and noisy environment with
sparse and delayed reward. We test our algorithm in an ITS environment with
edge cloud computing. Empirical results show that the algorithm is quick to
adapt to new environments and outperforms the state-of-the-art benchmark on
all individual and system metrics. Our algorithm also
addresses various practical concerns with its modularized and asynchronous
online training method. In addition to the cloud simulation, we test our
algorithm on a single-board computer and show that it can perform inference in
6 milliseconds.
\\ ( https://arxiv.org/abs/2403.08879 ,  10655kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08880
Date: Wed, 13 Mar 2024 18:06:43 GMT   (4011kb,D)

Title: REFRESH: Responsible and Efficient Feature Reselection Guided by SHAP
 Values
Authors: Shubham Sharma, Sanghamitra Dutta, Emanuele Albini, Freddy Lecue,
 Daniele Magazzeni, Manuela Veloso
Categories: cs.LG
DOI: 10.1145/3600211.3604706
\\
 Feature selection is a crucial step in building machine learning models. This
process is often achieved with accuracy as an objective, and can be cumbersome
and computationally expensive for large-scale datasets. Several additional
model performance characteristics such as fairness and robustness are of
importance for model development. As regulations are driving the need for more
trustworthy models, deployed models need to be corrected for model
characteristics associated with responsible artificial intelligence. When
feature selection is done with respect to one model performance characteristic
(e.g., accuracy), feature selection with secondary model performance
characteristics (e.g., fairness and robustness) as objectives would require going
through the computationally expensive selection process from scratch. In this
paper, we introduce the problem of feature \emph{reselection}, so that features
can be selected with respect to secondary model performance characteristics
efficiently even after a feature selection process has been done with respect
to a primary objective. To address this problem, we propose REFRESH, a method
to reselect features so that additional constraints that are desirable towards
model performance can be achieved without having to train several new models.
REFRESH's underlying algorithm is a novel technique using SHAP values and
correlation analysis that can approximate the predictions of a model
without having to train these models. Empirical evaluations on three datasets,
including a large-scale loan-default dataset, show that REFRESH can help find
alternate models with better model characteristics efficiently. We also discuss
the need for reselection and REFRESH based on regulation desiderata.
\\ ( https://arxiv.org/abs/2403.08880 ,  4011kb)
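Illustrative sketch (not the REFRESH algorithm). The core idea the abstract describes, approximating a model's predictions after dropping features without retraining, can be shown with a linear model, whose SHAP values under feature independence are exactly w_j * (x_j - E[x_j]). The NumPy sketch below drops the SHAP contributions of candidate features and scores each candidate on a secondary characteristic. It omits REFRESH's correlation analysis; the synthetic data, the demographic-parity metric, and all function names are assumptions.

```python
import numpy as np

def linear_shap(weights, x, background_mean):
    """Exact SHAP values of a linear model under feature independence:
    phi_ij = w_j * (x_ij - E[x_j])."""
    return weights * (x - background_mean)

def approx_predictions_without(base_value, shap_values, removed):
    """Approximate a model without the `removed` features by dropping their
    SHAP contributions (additive approximation, no retraining)."""
    keep = [j for j in range(shap_values.shape[1]) if j not in removed]
    return base_value + shap_values[:, keep].sum(axis=1)

def parity_gap(scores, group, threshold=0.0):
    """Demographic-parity gap: difference in positive rates between two groups."""
    pos = scores > threshold
    return abs(pos[group == 1].mean() - pos[group == 0].mean())

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, size=n)
x = np.column_stack([rng.normal(size=n), rng.normal(size=n),
                     group + 0.3 * rng.normal(size=n)])   # feature 2 tracks the group
weights = np.array([1.0, 0.5, 2.0])
mean = x.mean(axis=0)
base = float(weights @ mean)          # E[f(X)] for a linear model without intercept
phi = linear_shap(weights, x, mean)
full_scores = base + phi.sum(axis=1)
for removed in [(), (0,), (1,), (2,)]:
    s = approx_predictions_without(base, phi, set(removed))
    print(removed, round(parity_gap(s, group, threshold=base), 3))
```

For a linear model the approximation is exact up to the intercept; for nonlinear models or correlated features it is only an approximation.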
------------------------------------------------------------------------------
\\
arXiv:2403.08896
Date: Wed, 13 Mar 2024 18:37:16 GMT   (42kb)

Title: One-Shot Averaging for Distributed TD($\lambda$) Under Markov Sampling
Authors: Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky
Categories: cs.LG
\\
 We consider a distributed setup for reinforcement learning, where each agent
has a copy of the same Markov Decision Process but transitions are sampled from
the corresponding Markov chain independently by each agent. We show that in
this setting, we can achieve a linear speedup for TD($\lambda$), a family of
popular methods for policy evaluation, in the sense that $N$ agents can
evaluate a policy $N$ times faster provided the target accuracy is small
enough. Notably, this speedup is achieved by ``one shot averaging,'' a
procedure where the agents run TD($\lambda$) with Markov sampling independently
and only average their results after the final step. This significantly reduces
the amount of communication required to achieve a linear speedup relative to
previous work.
\\ ( https://arxiv.org/abs/2403.08896 ,  42kb)
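Illustrative sketch (not from the paper). The NumPy sketch below runs tabular TD(lambda) with accumulating eligibility traces and a constant step size on a hypothetical three-state Markov reward process. Each agent samples its own trajectory, and the estimates are averaged once at the end, as in one-shot averaging. The paper's exact setting (function approximation, step sizes) may differ, and the function names are hypothetical.

```python
import numpy as np

def td_lambda(P, r, gamma, lam, steps, alpha, rng):
    """Tabular TD(lambda) with accumulating traces along one Markov trajectory."""
    n = len(r)
    cum = np.cumsum(P, axis=1)
    cum[:, -1] = 1.0                      # guard against rounding in row sums
    v = np.zeros(n)
    trace = np.zeros(n)
    s = int(rng.integers(n))
    for _ in range(steps):
        s_next = int(np.searchsorted(cum[s], rng.random(), side="right"))
        delta = r[s] + gamma * v[s_next] - v[s]
        trace *= gamma * lam
        trace[s] += 1.0
        v += alpha * delta * trace
        s = s_next
    return v

def one_shot_average(P, r, gamma, lam, steps, alpha, num_agents, seed=0):
    """Agents sample independently and communicate once: average after the last step."""
    seeds = np.random.SeedSequence(seed).spawn(num_agents)
    estimates = [td_lambda(P, r, gamma, lam, steps, alpha, np.random.default_rng(s))
                 for s in seeds]
    return np.mean(estimates, axis=0), estimates

P = np.array([[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.4, 0.0, 0.6]])
r = np.array([1.0, 0.0, 2.0])
gamma, lam = 0.8, 0.5
avg, ests = one_shot_average(P, r, gamma, lam, steps=5000, alpha=0.02, num_agents=16)
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)   # V = r + gamma * P V
print(np.round(avg, 2), np.round(v_true, 2))
```

Because the max-norm is convex, the averaged estimate's error never exceeds the agents' mean error; how fast it shrinks with N is what the paper's linear-speedup result addresses.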
------------------------------------------------------------------------------
\\
arXiv:2403.08946
Date: Wed, 13 Mar 2024 20:25:27 GMT   (5156kb,D)

Title: Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM
 Era
Authors: Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang,
 Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du, Ninghao Liu
Categories: cs.LG cs.CL cs.CY
Comments: 38 pages, 4 figures
\\
 Explainable AI (XAI) refers to techniques that provide human-understandable
insights into the workings of AI models. Recently, the focus of XAI has been
extended towards Large Language Models (LLMs), which are often criticized for
their lack of transparency. This extension calls for a significant
transformation in XAI methodologies for two reasons. First, many
existing XAI methods cannot be directly applied to LLMs due to their complexity
and advanced capabilities. Second, as LLMs are increasingly deployed across diverse
industry applications, the role of XAI shifts from merely opening the "black
box" to actively enhancing the productivity and applicability of LLMs in
real-world settings. Meanwhile, unlike traditional machine learning models that
are passive recipients of XAI insights, the distinct abilities of LLMs can
reciprocally enhance XAI. Therefore, in this paper, we introduce Usable XAI in
the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems,
and (2) how LLMs can contribute to the advancement of XAI. We present 10
strategies, introducing the key techniques for each and discussing their
associated challenges. We also provide case studies to demonstrate how to
obtain and leverage explanations. The code used in this paper can be found at:
https://github.com/JacksonWuxs/UsableXAI_LLM.
\\ ( https://arxiv.org/abs/2403.08946 ,  5156kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08955
Date: Wed, 13 Mar 2024 20:50:49 GMT   (999kb,D)

Title: Towards Efficient Risk-Sensitive Policy Gradient: An Iteration
 Complexity Analysis
Authors: Rui Liu, Erfaun Noorani, Pratap Tokekar, John S. Baras
Categories: cs.LG cs.AI
\\
 Reinforcement Learning (RL) has shown exceptional performance across various
applications, enabling autonomous agents to learn optimal policies through
interaction with their environments. However, traditional RL frameworks often
face challenges in terms of iteration complexity and robustness. Risk-sensitive
RL, which balances expected return and risk, has been explored for its
potential to yield probabilistically robust policies, yet its iteration
complexity analysis remains underexplored. In this study, we conduct a thorough
iteration complexity analysis for the risk-sensitive policy gradient method,
focusing on the REINFORCE algorithm and employing the exponential utility
function. We obtain an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to
reach an $\epsilon$-approximate first-order stationary point (FOSP). We
investigate whether risk-sensitive algorithms can achieve better iteration
complexity compared to their risk-neutral counterparts. Our theoretical
analysis demonstrates that risk-sensitive REINFORCE can require fewer
iterations to converge. This leads to improved iteration
complexity, as employing the exponential utility does not entail additional
computation per iteration. We characterize the conditions under which
risk-sensitive algorithms can achieve better iteration complexity. Our
simulation results also validate that risk-averse cases can converge and
stabilize more quickly after approximately half of the episodes compared to
their risk-neutral counterparts.
\\ ( https://arxiv.org/abs/2403.08955 ,  999kb)
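For reference, the exponential-utility objective and its likelihood-ratio (REINFORCE) gradient are usually written as follows, where $\beta < 0$ is risk-averse, $\beta > 0$ risk-seeking, and $\beta \to 0$ recovers the expected return; the paper's exact normalization may differ.

$$
J_\beta(\theta) = \frac{1}{\beta}\log \mathbb{E}_{\tau\sim p_\theta}\left[e^{\beta R(\tau)}\right],
\qquad
\nabla_\theta J_\beta(\theta) = \frac{\mathbb{E}_{\tau\sim p_\theta}\left[e^{\beta R(\tau)} \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\right]}{\beta\, \mathbb{E}_{\tau\sim p_\theta}\left[e^{\beta R(\tau)}\right]}
$$

A second-order expansion, $J_\beta(\theta) \approx \mathbb{E}[R(\tau)] + \tfrac{\beta}{2}\mathrm{Var}[R(\tau)]$, shows how $\beta$ trades expected return against variance. The weights $e^{\beta R(\tau)}$ use returns that REINFORCE already computes, so the estimator needs no extra rollouts per iteration.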
------------------------------------------------------------------------------
\\
arXiv:2403.08980
Date: Wed, 13 Mar 2024 22:10:42 GMT   (137kb,D)

Title: Architectural Implications of Neural Network Inference for High
 Data-Rate, Low-Latency Scientific Applications
Authors: Olivia Weng, Alexander Redding, Nhan Tran, Javier Mauricio Duarte,
 Ryan Kastner
Categories: cs.LG cs.AR
\\
 With more scientific fields relying on neural networks (NNs) to process data
incoming at extreme throughputs and latencies, it is crucial to develop NNs
with all their parameters stored on-chip. In many of these applications, there
is not enough time to go off-chip and retrieve weights. Even more so, off-chip
memory such as DRAM does not have the bandwidth required to process these NNs
as fast as the data is being produced (e.g., every 25 ns). As such, these
extreme latency and bandwidth requirements have architectural implications for
the hardware intended to run these NNs: 1) all NN parameters must fit on-chip,
and 2) codesigning custom/reconfigurable logic is often required to meet these
latency and bandwidth constraints. In our work, we show that many scientific NN
applications must run fully on chip, in the extreme case requiring a custom
chip to meet such stringent constraints.
\\ ( https://arxiv.org/abs/2403.08980 ,  137kb)
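Illustrative back-of-the-envelope calculation (not from the paper). Assume a hypothetical network of 100,000 8-bit weights that must be re-read for every input, with a new input arriving every 25 ns and no on-chip weight reuse. The required bandwidth then follows directly. The 25.6 GB/s reference is the peak of a single 64-bit DDR4-3200 channel. Function names are hypothetical.

```python
def weight_bytes(num_params, bits_per_param):
    """Storage needed for all weights, in bytes."""
    return num_params * bits_per_param / 8

def weight_bandwidth_gb_per_s(num_params, bits_per_param, interval_ns):
    """Bandwidth (GB/s) needed to re-read every weight once per inference
    when a new input arrives every `interval_ns` nanoseconds."""
    return weight_bytes(num_params, bits_per_param) / (interval_ns * 1e-9) / 1e9

# Hypothetical network: 100k parameters quantized to 8 bits, new input every 25 ns.
need = weight_bandwidth_gb_per_s(100_000, 8, 25)
ddr4_3200_channel = 25.6   # GB/s, peak of one 64-bit DDR4-3200 channel
print(f"{weight_bytes(100_000, 8) / 1024:.1f} KiB of weights, "
      f"{need:.0f} GB/s needed, about {need / ddr4_3200_channel:.0f}x one DDR4 channel")
```

Even this small network would need on the order of a hundred DRAM channels if its weights lived off-chip, which is the intuition behind keeping all parameters on-chip.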
------------------------------------------------------------------------------
\\
arXiv:2403.09035
Date: Thu, 14 Mar 2024 02:11:38 GMT   (6378kb,D)

Title: DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers
Authors: Xiao Ma, Shengfeng He, Hezhe Qiao, Dong Ma
Categories: cs.LG
\\
 Enabling efficient and accurate deep neural network (DNN) inference on
microcontrollers is non-trivial due to the constrained on-chip resources.
Current methodologies primarily focus on compressing larger models, often at the
expense of model accuracy. In this paper, we rethink the problem from the
inverse perspective by constructing small/weak models directly and improving
their accuracy. Thus, we introduce DiTMoS, a novel DNN training and inference
framework with a selector-classifiers architecture, where the selector routes
each input sample to the appropriate classifier for classification. DiTMoS is
grounded in a key insight: a composition of weak models can exhibit high
diversity and the union of them can significantly boost the accuracy upper
bound. To approach the upper bound, DiTMoS introduces three strategies
including diverse training data splitting to increase the classifiers'
diversity, adversarial selector-classifiers training to ensure synergistic
interactions, thereby maximizing their complementarity, and heterogeneous
feature aggregation to improve the capacity of classifiers. We further propose
a network slicing technique to alleviate the extra memory overhead incurred by
feature aggregation. We deploy DiTMoS on the Nucleo STM32F767ZI board and
evaluate it on three time-series datasets for human activity recognition,
keyword spotting, and emotion recognition, respectively. The experimental
results show that: (a) DiTMoS achieves up to 13.4% accuracy improvement
compared to the best baseline; (b) network slicing almost completely eliminates
the memory overhead incurred by feature aggregation with a marginal increase of
latency.
\\ ( https://arxiv.org/abs/2403.09035 ,  6378kb)
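Illustrative sketch (not from the paper). The selector-classifiers idea and the "union" upper bound can be shown with precomputed outputs: each sample is classified only by the tiny classifier its selector picks, while the upper bound counts a sample as correct if any classifier gets it right. The NumPy sketch below uses toy selector scores and predictions and hypothetical function names; it omits DiTMoS's training strategies and network slicing.

```python
import numpy as np

def route_and_classify(selector_logits, classifier_preds):
    """selector_logits: (n, k) scores, one per tiny classifier.
    classifier_preds: (k, n) label predicted by each classifier for each sample.
    Each sample is classified only by the classifier its selector picks."""
    chosen = selector_logits.argmax(axis=1)                  # (n,)
    return classifier_preds[chosen, np.arange(len(chosen))]

def oracle_accuracy(classifier_preds, labels):
    """Accuracy upper bound of the union: correct if ANY classifier is correct."""
    return float((classifier_preds == labels[None, :]).any(axis=0).mean())

labels = np.array([0, 1, 2, 1])
classifier_preds = np.array([[0, 1, 0, 0],      # weak classifier 0
                             [1, 1, 2, 1]])     # weak classifier 1
selector_logits = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 3.0], [0.0, 1.0]])

individual = (classifier_preds == labels[None, :]).mean(axis=1)
routed = float((route_and_classify(selector_logits, classifier_preds) == labels).mean())
print(individual, routed, oracle_accuracy(classifier_preds, labels))
```

Here a well-trained selector reaches the union's upper bound, which neither weak classifier reaches alone.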
------------------------------------------------------------------------------
\\
arXiv:2403.09039
Date: Thu, 14 Mar 2024 02:26:10 GMT   (10026kb,D)

Title: Spatial-temporal Memories Enhanced Graph Autoencoder for Anomaly
 Detection in Dynamic Graphs
Authors: Jie Liu, Xuequn Shang, Xiaolin Han, Wentao Zhang, Hongzhi Yin
Categories: cs.LG cs.AI
\\
 Anomaly detection in dynamic graphs presents a significant challenge due to
the temporal evolution of graph structures and attributes. The conventional
approaches that tackle this problem typically employ an unsupervised learning
framework, capturing normality patterns using only normal data during
training and identifying deviations as anomalies during testing. However, these
methods face critical drawbacks: they either only depend on proxy tasks for
general representation without directly pinpointing normal patterns, or they
neglect to differentiate between spatial and temporal normality patterns,
leading to diminished efficacy in anomaly detection. To address these
challenges, we introduce a novel Spatial-Temporal memories-enhanced graph
autoencoder (STRIPE). Initially, STRIPE employs Graph Neural Networks (GNNs)
and gated temporal convolution layers to extract spatial features and temporal
features, respectively. Then STRIPE incorporates separate spatial and temporal
memory networks, which capture and store prototypes of normal patterns, thereby
preserving the uniqueness of spatial and temporal normality. Next,
through a mutual attention mechanism, these stored patterns are retrieved
and integrated with encoded graph embeddings. Finally, the integrated features
are fed into the decoder to reconstruct the graph streams, which serves as the
proxy task for anomaly detection. This comprehensive approach not only
minimizes reconstruction errors but also refines the model by emphasizing the
compactness and distinctiveness of the embeddings in relation to the nearest
memory prototypes. Through extensive testing, STRIPE has demonstrated a
superior capability to discern anomalies by effectively leveraging the distinct
spatial and temporal dynamics of dynamic graphs, significantly outperforming
existing methodologies, with an average improvement of 15.39% on AUC values.
\\ ( https://arxiv.org/abs/2403.09039 ,  10026kb)
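Illustrative sketch (not the paper's architecture). The NumPy sketch below shows a memory read and a compactness term: embeddings attend over a bank of prototypes, the retrieved vectors are concatenated with the embeddings before decoding, and each embedding is pulled toward its nearest prototype. It uses plain scaled dot-product attention in place of STRIPE's mutual attention, omits the distinctiveness term, and uses random embeddings, hypothetical prototype counts, and hypothetical function names.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def read_memory(embeddings, memory):
    """Attention read: each embedding retrieves a convex combination of the
    stored normal-pattern prototypes. Returns (retrieved, weights)."""
    scores = embeddings @ memory.T / np.sqrt(embeddings.shape[-1])
    weights = softmax(scores, axis=1)                   # (n, num_prototypes)
    return weights @ memory, weights

def compactness_loss(embeddings, memory):
    """Mean squared distance from each embedding to its nearest prototype."""
    d2 = ((embeddings[:, None, :] - memory[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean())

rng = np.random.default_rng(0)
d = 16
spatial_emb = rng.normal(size=(5, d))      # stand-in for GNN node embeddings
temporal_emb = rng.normal(size=(5, d))     # stand-in for gated temporal conv outputs
spatial_memory = rng.normal(size=(4, d))   # separate spatial and temporal memories
temporal_memory = rng.normal(size=(3, d))

s_read, s_w = read_memory(spatial_emb, spatial_memory)
t_read, t_w = read_memory(temporal_emb, temporal_memory)
fused = np.concatenate([spatial_emb, s_read, temporal_emb, t_read], axis=1)  # decoder input
loss = (compactness_loss(spatial_emb, spatial_memory)
        + compactness_loss(temporal_emb, temporal_memory))
```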
------------------------------------------------------------------------------
\\
arXiv:2403.09048
Date: Thu, 14 Mar 2024 02:36:16 GMT   (2596kb,D)

Title: Taming Cross-Domain Representation Variance in Federated Prototype
 Learning with Heterogeneous Data Domains
Authors: Lei Wang, Jieming Bian, Letian Zhang, Chen Chen, Jie Xu
Categories: cs.LG cs.CV
Comments: 16 pages
\\
 Federated learning (FL) allows collaborative machine learning training
without sharing private data. While most FL methods assume identical data
domains across clients, real-world scenarios often involve heterogeneous data
domains. Federated Prototype Learning (FedPL) addresses this issue, using mean
feature vectors as prototypes to enhance model generalization. However,
existing FedPL methods create the same number of prototypes for each client,
leading to cross-domain performance gaps and disparities for clients with
varied data distributions. To mitigate cross-domain feature representation
variance, we introduce FedPLVM, which establishes variance-aware dual-level
prototype clustering and employs a novel $\alpha$-sparsity prototype loss. The
dual-level prototype clustering strategy creates local clustered prototypes
based on private data features, then performs global prototype clustering to
reduce communication complexity and preserve local data privacy. The
$\alpha$-sparsity prototype loss aligns samples from underrepresented domains,
enhancing intra-class similarity and reducing inter-class similarity.
Evaluations on Digit-5, Office-10, and DomainNet datasets demonstrate our
method's superiority over existing approaches.
\\ ( https://arxiv.org/abs/2403.09048 ,  2596kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09053
Date: Thu, 14 Mar 2024 02:42:19 GMT   (129kb,D)

Title: Towards a theory of model distillation
Authors: Enric Boix-Adsera
Categories: cs.LG cs.AI cs.NE
Comments: 47 pages, 5 figures. Please reach out with comments! Feedback is
 welcome
\\
 Distillation is the task of replacing a complicated machine learning model
with a simpler model that approximates the original [BCNM06,HVD15]. Despite
many practical applications, basic questions about the extent to which models
can be distilled, and the runtime and amount of data needed to distill, remain
largely open.
 To study these questions, we initiate a general theory of distillation,
defining PAC-distillation in an analogous way to PAC-learning [Val84]. As
applications of this theory: (1) we propose new algorithms to extract the
knowledge stored in the trained weights of neural networks -- we show how to
efficiently distill neural networks into succinct, explicit decision tree
representations when possible by using the ``linear representation
hypothesis''; and (2) we prove that distillation can be much cheaper than
learning from scratch, and make progress on characterizing its complexity.
\\ ( https://arxiv.org/abs/2403.09053 ,  129kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09054
Date: Thu, 14 Mar 2024 02:42:42 GMT   (1504kb,D)

Title: Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient
 Generative Inference
Authors: Muhammad Adnan and Akhil Arunkumar and Gaurav Jain and Prashant J.
 Nair and Ilya Soloveychik and Purushotham Kamath
Categories: cs.LG cs.AI cs.AR
Comments: A collaborative effort by d-matrix and the University of British
 Columbia
MSC-class: 68U35
ACM-class: I.2.7; C.0
Journal-ref: Proceedings of the 7th Annual Conference on Machine Learning and
 Systems (MLSys), 2024
\\
 Transformers have emerged as the underpinning architecture for Large Language
Models (LLMs). In generative language models, the inference process involves
two primary phases: prompt processing and token generation. Token generation,
which constitutes the majority of the computational workload, primarily entails
vector-matrix multiplications and interactions with the Key-Value (KV) Cache.
This phase is constrained by memory bandwidth due to the overhead of
transferring weights and KV cache values from the memory system to the
computing units. This memory bottleneck becomes particularly pronounced in
applications that require long-context and extensive text generation, both of
which are increasingly crucial for LLMs.
 This paper introduces "Keyformer", an innovative inference-time approach to
mitigate the challenges associated with KV cache size and memory bandwidth
utilization. Keyformer leverages the observation that approximately 90% of the
attention weight in generative inference focuses on a specific subset of
tokens, referred to as "key" tokens. Keyformer retains only the key tokens in
the KV cache by identifying these crucial tokens using a novel score function.
This approach effectively reduces both the KV cache size and memory bandwidth
usage without compromising model accuracy. We evaluate Keyformer's performance
across three foundational models: GPT-J, Cerebras-GPT, and MPT, which employ
various positional embedding algorithms. Our assessment encompasses a variety
of tasks, with a particular emphasis on summarization and conversation tasks
involving extended contexts. By reducing the KV cache, Keyformer reduces
inference latency by 2.1x and improves token generation throughput by 2.4x,
while preserving the model's accuracy.
\\ ( https://arxiv.org/abs/2403.09054 ,  1504kb)
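Illustrative sketch (not Keyformer's score function). The abstract describes keeping only "key" tokens in the KV cache according to a score. The NumPy sketch below substitutes a simple stand-in score, the attention each cached position has accumulated over past decoding steps, keeps a window of recent tokens plus the top-scoring older ones within a fixed budget, and drops the rest. Function names and the toy attention values are hypothetical.

```python
import numpy as np

def select_key_tokens(attn_history, budget, recent):
    """Choose which cached positions to keep.

    attn_history: (steps, seq_len) attention weights that past decoding steps
    put on each cached position (a stand-in score; Keyformer defines its own).
    Keeps the `recent` latest positions plus the highest-scoring older ones,
    `budget` positions in total, returned in original order.
    """
    seq_len = attn_history.shape[1]
    if budget >= seq_len:
        return np.arange(seq_len)
    recent = min(recent, budget)
    recent_idx = np.arange(seq_len - recent, seq_len)
    older_scores = attn_history[:, : seq_len - recent].sum(axis=0)
    n_keep = budget - recent
    top_older = np.argsort(older_scores)[::-1][:n_keep]
    return np.sort(np.concatenate([top_older, recent_idx]))

def prune_kv(keys, values, keep_idx):
    """Drop every cached key/value row not listed in keep_idx."""
    return keys[keep_idx], values[keep_idx]

attn = np.full((3, 10), 0.05)      # toy scores: 3 past steps, 10 cached tokens
attn[:, 2] = 0.3
attn[:, 5] = 0.2
keep = select_key_tokens(attn, budget=4, recent=2)
rng = np.random.default_rng(0)
keys, values = rng.normal(size=(10, 8)), rng.normal(size=(10, 8))
keys_kept, values_kept = prune_kv(keys, values, keep)
print(keep)
```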
------------------------------------------------------------------------------
\\
arXiv:2403.09066
Date: Thu, 14 Mar 2024 03:13:01 GMT   (551kb,D)

Title: Hyperparameters in Continual Learning: a Reality Check
Authors: Sungmin Cha and Kyunghyun Cho
Categories: cs.LG cs.CV
Comments: Preprint
\\
 Various algorithms for continual learning (CL) have been designed with the
goal of effectively alleviating the trade-off between stability and plasticity
during the CL process. To achieve this goal, tuning appropriate hyperparameters
for each algorithm is essential. As an evaluation protocol, it has been common
practice to train a CL algorithm using diverse hyperparameter values on a CL
scenario constructed with a benchmark dataset. Subsequently, the best
performance attained with the optimal hyperparameter value serves as the
criterion for evaluating the CL algorithm. In this paper, we contend that this
evaluation protocol is not only impractical but also incapable of effectively
assessing the CL capability of a CL algorithm. Returning to the fundamental
principles of model evaluation in machine learning, we propose an evaluation
protocol that involves Hyperparameter Tuning and Evaluation phases. Those
phases consist of different datasets but share the same CL scenario. In the
Hyperparameter Tuning phase, each algorithm is iteratively trained with
different hyperparameter values to find the optimal hyperparameter values.
Subsequently, in the Evaluation phase, the optimal hyperparameter values are
directly applied for training each algorithm, and their performance in the
Evaluation phase serves as the criterion for evaluating them. Through
experiments on CIFAR-100 and ImageNet-100 based on the proposed protocol in
class-incremental learning, we observe not only that the existing evaluation
method fails to properly assess the CL capability of each algorithm but also
that some recently proposed state-of-the-art algorithms, which reported
superior performance, actually exhibit inferior performance compared to
earlier algorithms.
\\ ( https://arxiv.org/abs/2403.09066 ,  551kb)
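The two-phase protocol described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: `two_phase_protocol` and `toy_train` are hypothetical names, and `toy_train` is a toy stand-in whose best hyperparameter differs between the two datasets, so the score obtained with the tuned value can differ from the best score attainable by tuning directly on the evaluation data.

```python
from typing import Callable, Dict, Sequence

# Hypothetical signature: train a CL algorithm on the CL scenario built from
# `dataset` with hyperparameter `h`, and return its final accuracy.
TrainFn = Callable[[str, float], float]


def two_phase_protocol(train_fn: TrainFn, candidates: Sequence[float],
                       tuning_dataset: str, eval_dataset: str) -> Dict[str, float]:
    # Hyperparameter Tuning phase: search only on the tuning dataset.
    tuning_scores = {h: train_fn(tuning_dataset, h) for h in candidates}
    best_h = max(tuning_scores, key=tuning_scores.get)
    # Evaluation phase: same CL scenario, different dataset, best_h reused as-is.
    return {"best_h": best_h, "eval_score": train_fn(eval_dataset, best_h)}


def toy_train(dataset: str, h: float) -> float:
    # Toy stand-in: accuracy peaks at a dataset-specific hyperparameter value.
    optimum = {"tuning": 1.0, "evaluation": 0.5}[dataset]
    return 1.0 - abs(h - optimum)


if __name__ == "__main__":
    grid = [0.1, 0.5, 1.0]
    result = two_phase_protocol(toy_train, grid, "tuning", "evaluation")
    # The conventional protocol would instead report the best score found by
    # tuning directly on the evaluation data.
    conventional = max(toy_train("evaluation", h) for h in grid)
    print(result, conventional)
```

In this toy setup the protocol reports an evaluation score of 0.5, while tuning directly on the evaluation data would report 1.0, which mirrors the gap the abstract argues the conventional protocol hides.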
------------------------------------------------------------------------------
\\
arXiv:2403.09086
Date: Thu, 14 Mar 2024 04:06:45 GMT   (5578kb,D)

Title: Learning from straggler clients in federated learning
Authors: Andrew Hard, Antonious M. Girgis, Ehsan Amid, Sean Augenstein, Lara
 McConnaughey, Rajiv Mathews, Rohan Anil
Categories: cs.LG
\\
 How well do existing federated learning algorithms learn from client devices
that return model updates with a significant time delay? Is it even possible to
learn effectively from clients that report back minutes, hours, or days after
being scheduled? We answer these questions by developing Monte Carlo
simulations of client latency that are guided by real-world applications. We
study synchronous optimization algorithms like FedAvg and FedAdam as well as
the asynchronous FedBuff algorithm, and observe that all these existing
approaches struggle to learn from severely delayed clients. To improve upon
this situation, we experiment with modifications, including distillation
regularization and exponential moving averages of model weights. Finally, we
introduce two new algorithms, FARe-DUST and FeAST-on-MSG, based on distillation
and averaging, respectively. Experiments with the EMNIST, CIFAR-100, and
StackOverflow benchmark federated learning tasks demonstrate that our new
algorithms outperform existing ones in terms of accuracy for straggler clients,
while also providing better trade-offs between training time and total
accuracy.
\\ ( https://arxiv.org/abs/2403.09086 ,  5578kb)
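The abstract names two generic building blocks: server-side averaging of client updates (as in FedAvg) and exponential moving averages of model weights. A minimal NumPy sketch of both, assuming each model is flattened into a single parameter vector; it does not implement FARe-DUST or FeAST-on-MSG, and the `decay` value is an arbitrary illustration.

```python
import numpy as np


def fedavg(client_params, num_examples):
    """Average client parameter vectors, weighted by local example counts."""
    weights = np.asarray(num_examples, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(client_params)  # shape: (num_clients, num_params)
    return weights @ stacked


def ema_update(ema_params, new_params, decay=0.9):
    """Exponential moving average of server weights across rounds."""
    return decay * ema_params + (1.0 - decay) * new_params


if __name__ == "__main__":
    server = fedavg([np.array([0.0, 0.0]), np.array([1.0, 2.0])], [1, 3])
    ema = ema_update(np.zeros(2), server)
    print(server, ema)
```

A larger `decay` makes the averaged model change more slowly, which is one way late-arriving (stale) contributions can be smoothed rather than overwriting recent progress.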
------------------------------------------------------------------------------
\\
arXiv:2403.09101
Date: Thu, 14 Mar 2024 04:48:31 GMT   (4854kb,D)

Title: Soften to Defend: Towards Adversarial Robustness via Self-Guided Label
 Refinement
Authors: Daiwei Yu and Zhuorong Li and Lina Wei and Canghong Jin and Yun Zhang
 and Sixian Chan
Categories: cs.LG cs.CV
Comments: Accepted to CVPR 2024
\\
 Adversarial training (AT) is currently one of the most effective ways to
improve the robustness of deep neural networks against adversarial attacks.
However, most AT methods suffer from robust overfitting, i.e., a significant
generalization gap in adversarial robustness between the training and testing
curves. In this paper, we first identify a connection between robust
overfitting and the excessive memorization of noisy labels in AT from the viewpoint of
gradient norm. As such label noise is mainly caused by a distribution mismatch
and improper label assignments, we are motivated to propose a label refinement
approach for AT. Specifically, our Self-Guided Label Refinement first
self-refines a more accurate and informative label distribution from
over-confident hard labels, and then it calibrates the training by dynamically
incorporating knowledge from self-distilled models into the current model and
thus requiring no external teachers. Empirical results demonstrate that our
method can simultaneously boost the standard accuracy and robust performance
across multiple benchmark datasets, attack types, and architectures. In
addition, we provide a set of analyses from the perspective of
information theory to examine our method and suggest the importance of soft
labels for robust generalization.
\\ ( https://arxiv.org/abs/2403.09101 ,  4854kb)
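A common way to soften over-confident hard labels is to interpolate the one-hot target with a model's temperature-scaled predictions. The NumPy sketch below shows that generic interpolation; it is an assumption for illustration, not the exact Self-Guided Label Refinement rule, and `alpha` and `temperature` are hypothetical defaults.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def refine_labels(hard_labels, teacher_logits, num_classes, alpha=0.5, temperature=2.0):
    """Blend one-hot labels with softened predictions from a self-distilled model."""
    one_hot = np.eye(num_classes)[hard_labels]
    soft = softmax(teacher_logits, temperature)
    return (1.0 - alpha) * one_hot + alpha * soft


if __name__ == "__main__":
    targets = refine_labels(np.array([0, 1]), np.array([[2.0, 0.0], [0.0, 1.0]]), 2)
    print(targets)  # each row is still a valid probability distribution
```

With `alpha=0` the targets reduce to the original hard labels; increasing `alpha` moves probability mass toward the classes the model itself considers plausible.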
------------------------------------------------------------------------------
\\
arXiv:2403.09107
Date: Thu, 14 Mar 2024 05:00:29 GMT   (15850kb,D)

Title: S^2MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering
Authors: Zhen Long, Qiyuan Wang, Yazhou Ren, Yipeng Liu, Ce Zhu
Categories: cs.LG cs.CV
Comments: Accepted by CVPR2024
\\
 Anchor-based large-scale multi-view clustering has attracted considerable
attention for its effectiveness in handling massive datasets. However, current
methods mainly seek the consensus embedding feature for clustering by exploring
global correlations between anchor graphs or projection matrices. In this paper,
we propose a simple yet efficient scalable multi-view tensor clustering
(S^2MVTC) approach, where our focus is on learning correlations of embedding
features within and across views. Specifically, we first construct the
embedding feature tensor by stacking the embedding features of different views
into a tensor and rotating it. Additionally, we build a novel tensor
low-frequency approximation (TLFA) operator, which incorporates graph
similarity into embedding feature learning, efficiently achieving smooth
representation of embedding features within different views. Furthermore,
consensus constraints are applied to embedding features to ensure inter-view
semantic consistency. Experimental results on six large-scale multi-view
datasets demonstrate that S^2MVTC significantly outperforms state-of-the-art
algorithms in terms of clustering performance and CPU execution time,
especially when handling massive data. The code of S^2MVTC is publicly
available at https://github.com/longzhen520/S2MVTC.
\\ ( https://arxiv.org/abs/2403.09107 ,  15850kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09110
Date: Thu, 14 Mar 2024 05:17:39 GMT   (22377kb,D)

Title: SINDy-RL: Interpretable and Efficient Model-Based Reinforcement Learning
Authors: Nicholas Zolman, Urban Fasel, J. Nathan Kutz, Steven L. Brunton
Categories: cs.LG cs.SY eess.SY math.DS math.OC
Comments: 24 pages + 14 appendices (45 pages total). 25 figures, 7 tables. For
 code, see https://github.com/nzolman/sindy-rl
\\
 Deep reinforcement learning (DRL) has shown significant promise for
uncovering sophisticated control policies that interact in environments with
complicated dynamics, such as stabilizing the magnetohydrodynamics of a tokamak
fusion reactor or minimizing the drag force exerted on an object in a fluid
flow. However, these algorithms require an abundance of training examples and
may become prohibitively expensive for many applications. In addition, the
reliance on deep neural networks often results in an uninterpretable, black-box
policy that may be too computationally expensive to use with certain embedded
systems. Recent advances in sparse dictionary learning, such as the sparse
identification of nonlinear dynamics (SINDy), have shown promise for creating
efficient and interpretable data-driven models in the low-data regime. In this
work we introduce SINDy-RL, a unifying framework for combining SINDy and DRL to
create efficient, interpretable, and trustworthy representations of the
dynamics model, reward function, and control policy. We demonstrate the
effectiveness of our approaches on benchmark control environments and
challenging fluids problems. SINDy-RL achieves comparable performance to
state-of-the-art DRL algorithms using significantly fewer interactions in the
environment and results in an interpretable control policy orders of magnitude
smaller than a deep neural network policy.
\\ ( https://arxiv.org/abs/2403.09110 ,  22377kb)
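SINDy identifies sparse dynamics by regressing measured time derivatives onto a library of candidate functions. The sketch below implements sequentially thresholded least squares, the sparse regression used in the original SINDy method, in plain NumPy; it is not the SINDy-RL code (see the linked repository), and the library `[1, x, x^2]` and the `threshold` value are illustrative choices.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares.

    theta: (n_samples, n_features) candidate functions evaluated on the data
    dxdt:  (n_samples,) time derivative of one state variable
    """
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune small coefficients
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving library terms.
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi


if __name__ == "__main__":
    t = np.linspace(0.0, 2.0, 200)
    x = np.exp(-2.0 * t)  # trajectory of dx/dt = -2x
    theta = np.column_stack([np.ones_like(x), x, x ** 2])
    print(stlsq(theta, -2.0 * x))  # coefficients for [1, x, x^2]
```

The recovered coefficient vector is sparse and readable as an equation (here, only the `x` term survives), which is the interpretability property the abstract builds on.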
------------------------------------------------------------------------------
\\
arXiv:2403.09123
Date: Thu, 14 Mar 2024 06:14:07 GMT   (2059kb,D)

Title: Optimal Top-Two Method for Best Arm Identification and Fluid Analysis
Authors: Agniv Bandyopadhyay, Sandeep Juneja, Shubhada Agrawal
Categories: cs.LG cs.IT math.IT stat.ML
\\
 Top-$2$ methods have become popular in solving the best arm identification
(BAI) problem. The best arm, or the arm with the largest mean amongst finitely
many, is identified through an algorithm that at any sequential step
independently pulls the empirical best arm, with a fixed probability $\beta$,
and pulls the best challenger arm otherwise. The probability of incorrect
selection is guaranteed to lie below a specified $\delta >0$.
Information-theoretic lower bounds on sample complexity are well known for the
BAI problem and are matched asymptotically as $\delta \rightarrow 0$ by
computationally demanding plug-in methods. The above top-2 algorithm for any
$\beta \in (0,1)$
has sample complexity within a constant of the lower bound. However,
determining the optimal $\beta$ that matches the lower bound has proven
difficult. In this paper, we address this and propose an optimal top-2 type
algorithm. We consider a function of allocations anchored at a threshold. If it
exceeds the threshold then the algorithm samples the empirical best arm.
Otherwise, it samples the challenger arm. We show that the proposed algorithm
is optimal as $\delta \rightarrow 0$. Our analysis relies on identifying a
limiting fluid dynamics of allocations that satisfy a series of ordinary
differential equations pasted together and that describe the asymptotic path
followed by our algorithm. We rely on the implicit function theorem to show
existence and uniqueness of these fluid ODEs and to show that the proposed
algorithm remains close to the ODE solution.
\\ ( https://arxiv.org/abs/2403.09123 ,  2059kb)
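For reference, the classical fixed-beta top-two rule that the abstract starts from can be sketched as follows for Gaussian arms with unit variance. This is not the authors' optimal threshold-anchored algorithm: the challenger is chosen by one common empirical transportation-cost criterion, and the loop runs for a fixed budget with no delta-dependent stopping rule. `top_two_step` and `run` are hypothetical names.

```python
import numpy as np


def top_two_step(means_hat, counts, beta, rng):
    """Pick the next arm to pull with a fixed-beta top-two rule."""
    leader = int(np.argmax(means_hat))
    if rng.random() < beta:
        return leader  # pull the empirical best arm with probability beta
    # Otherwise pull the challenger: the arm that is empirically "cheapest"
    # to confuse with the leader (unit-variance Gaussian transportation cost).
    costs = np.full(len(means_hat), np.inf)
    for j in range(len(means_hat)):
        if j != leader:
            gap = means_hat[leader] - means_hat[j]
            costs[j] = gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[j]))
    return int(np.argmin(costs))


def run(true_means, beta=0.5, budget=2000, seed=0):
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    k = len(true_means)
    counts = np.ones(k)
    sums = rng.normal(true_means, 1.0)  # one initial pull of every arm
    for _ in range(budget - k):
        arm = top_two_step(sums / counts, counts, beta, rng)
        sums[arm] += rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
    return int(np.argmax(sums / counts)), counts


if __name__ == "__main__":
    best, counts = run([0.0, 0.5, 1.0])
    print(best, counts)
```

The fixed `beta` is exactly the knob the abstract says is hard to set optimally; the paper replaces it with a threshold test on a function of the allocations.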
------------------------------------------------------------------------------
\\
arXiv:2403.09171
Date: Thu, 14 Mar 2024 08:31:39 GMT   (1081kb,D)

Title: ADEdgeDrop: Adversarial Edge Dropping for Robust Graph Neural Networks
Authors: Zhaoliang Chen, Zhihao Wu, Ylli Sadikaj, Claudia Plant, Hong-Ning Dai,
 Shiping Wang, Wenzhong Guo
Categories: cs.LG cs.AI
\\
 Although Graph Neural Networks (GNNs) have exhibited a powerful ability to
gather graph-structured information from neighborhood nodes via various
message-passing mechanisms, the performance of GNNs is limited by poor
generalization and fragile robustness caused by noisy and redundant graph data.
As a prominent solution, Graph Augmentation Learning (GAL) has recently
received increasing attention. Among prior GAL approaches, edge-dropping
methods that randomly remove edges from a graph during training are effective
techniques to improve the robustness of GNNs. However, randomly dropping edges
often results in bypassing critical edges, consequently weakening the
effectiveness of message passing. In this paper, we propose a novel adversarial
edge-dropping method (ADEdgeDrop) that leverages an adversarial edge predictor
guiding the removal of edges, which can be flexibly incorporated into diverse
GNN backbones. Employing an adversarial training framework, the edge predictor
utilizes the line graph transformed from the original graph to estimate the
edges to be dropped, which improves the interpretability of the edge-dropping
method. The proposed ADEdgeDrop is optimized alternately by stochastic gradient
descent and projected gradient descent. Comprehensive experiments on six graph
benchmark datasets demonstrate that the proposed ADEdgeDrop outperforms
state-of-the-art baselines across various GNN backbones, demonstrating improved
generalization and robustness.
\\ ( https://arxiv.org/abs/2403.09171 ,  1081kb)
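The line-graph view used by the edge predictor can be illustrated in a few lines of Python: each original edge becomes a node, and two such nodes are linked when the edges share an endpoint. In the sketch below, the line-graph degree is only a hypothetical placeholder for the learned adversarial edge scores; ADEdgeDrop trains its predictor adversarially, which is not shown.

```python
from itertools import combinations


def line_graph(edges):
    """Map each undirected edge to the set of edges it shares an endpoint with."""
    edges = [tuple(sorted(e)) for e in edges]
    adjacency = {e: set() for e in edges}
    for a, b in combinations(edges, 2):
        if set(a) & set(b):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def drop_edges(edges, scores, drop_ratio=0.25):
    """Remove the top-scoring fraction of edges (ties keep input order).

    Edges are expected as sorted tuples so they match the keys of `scores`.
    """
    k = int(len(edges) * drop_ratio)
    ranked = sorted(edges, key=lambda e: scores[e], reverse=True)
    dropped = set(ranked[:k])
    return [e for e in edges if e not in dropped]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
    lg = line_graph(edges)
    # Hypothetical placeholder score: degree of each edge in the line graph.
    scores = {e: len(nbrs) for e, nbrs in lg.items()}
    print(drop_edges(edges, scores))
```

Working on the line graph lets a node-level model score edges directly, which is what makes the choice of dropped edges inspectable rather than random.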
------------------------------------------------------------------------------
\\
arXiv:2403.09188
Date: Thu, 14 Mar 2024 09:03:51 GMT   (418kb)

Title: Design of an basis-projected layer for sparse datasets in deep learning
 training using gc-ms spectra as a case study
Authors: Yu Tang Chang and Shih Fang Chen
Categories: cs.LG
Comments: 5 pages, 2 figures, 2 tables, conference
MSC-class: 68-06
ACM-class: I.2.4; J.2
\\
 Deep learning (DL) models encompass millions or even billions of parameters
and learn complex patterns from big data. However, not all data are initially
stored in a format suitable for effectively training a DL model, e.g., gas
chromatography-mass spectrometry (GC-MS) spectra and DNA sequences. These
datasets commonly contain many zero values, and the sparse data format
causes difficulties in optimizing DL models. A DL module called the
basis-projected layer (BPL) was proposed to mitigate the issue by transforming
the sparse data into a dense representation. The transformed data is expected
to facilitate the gradient calculation and the fine-tuning process during DL
training. The dataset, an example of a sparse dataset, contained 362 specialty
coffee odorant spectra detected from GC-MS. The BPL layer was placed at the
beginning of the DL model. The tunable parameters in the layer were learnable
projected axes that were the bases of a new representation space. The layer
rotated these bases when its parameters were updated. When the number of
bases was the same as the original dimension, the F1 score increased by 8.56%.
Furthermore, when the number was set to 768 (the original dimension was 490),
the F1 score increased by 11.49%. The layer not only maintained the model
performance but also constructed a better representation space for analyzing
sparse datasets.
\\ ( https://arxiv.org/abs/2403.09188 ,  418kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09215
Date: Thu, 14 Mar 2024 09:28:28 GMT   (553kb,D)

Title: On the Laplace Approximation as Model Selection Criterion for Gaussian
 Processes
Authors: Andreas Besginow, Jan David H\"uwel, Thomas Pawellek, Christian
 Beecks, Markus Lange-Hegermann
Categories: cs.LG cs.AI
\\
 Model selection aims to find the best model in terms of accuracy,
interpretability or simplicity, preferably all at once. In this work, we focus
on evaluating model performance of Gaussian process models, i.e. finding a
metric that provides the best trade-off between all those criteria. While
previous work considers metrics like the likelihood, AIC or dynamic nested
sampling, they either lack performance or have significant runtime issues,
which severely limits applicability. We address these challenges by introducing
multiple metrics based on the Laplace approximation, where we overcome a severe
inconsistency occurring during naive application of the Laplace approximation.
Experiments show that our metrics are comparable in quality to the gold
standard dynamic nested sampling without compromising computational speed.
Our model selection criteria allow significantly faster, high-quality model
selection for Gaussian process models.
\\ ( https://arxiv.org/abs/2403.09215 ,  553kb)
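For orientation, the standard Laplace approximation of the log evidence that such metrics build on is shown below, with $\hat{\theta}$ the MAP estimate of the $d$ GP hyperparameters and $p(\mathbf{y} \mid \theta)$ the GP marginal likelihood. This is the textbook form only; the paper's correction of the inconsistency that arises in its naive application is not reproduced here.

```latex
\log p(\mathbf{y}) \approx \log p(\mathbf{y} \mid \hat{\theta}) + \log p(\hat{\theta})
  + \frac{d}{2} \log(2\pi) - \frac{1}{2} \log \det \mathbf{H},
\qquad
\mathbf{H} = -\nabla_{\theta}^{2} \left[ \log p(\mathbf{y} \mid \theta) + \log p(\theta) \right] \Big|_{\theta = \hat{\theta}}
```

The first two terms reward fit at the optimum, while the $-\tfrac{1}{2}\log\det\mathbf{H}$ term penalizes sharply peaked posteriors, which is how the approximation trades accuracy against model complexity.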
------------------------------------------------------------------------------
\\
arXiv:2403.09223
Date: Thu, 14 Mar 2024 09:43:07 GMT   (3346kb,D)

Title: MCformer: Multivariate Time Series Forecasting with Mixed-Channels
 Transformer
Authors: Wenyong Han, Tao Zhu Member, Liming Chen, Huansheng Ning, Yang Luo,
 Yaping Wan
Categories: cs.LG
\\
 The massive generation of time-series data by large-scale Internet of Things
(IoT) devices necessitates the exploration of more effective models for
multivariate time-series forecasting. Earlier models predominantly used the
Channel Dependence (CD) strategy (where each channel represents a univariate
sequence). Current state-of-the-art (SOTA) models
primarily rely on the Channel Independence (CI) strategy. The CI strategy
treats all channels as a single channel, expanding the dataset to improve
generalization performance and avoiding inter-channel correlation that disrupts
long-term features. However, the CI strategy faces the challenge of
inter-channel correlation forgetting. To address this issue, we propose an
innovative Mixed Channels strategy, combining the data expansion advantages of
the CI strategy with the ability to counteract inter-channel correlation
forgetting. Based on this strategy, we introduce MCformer, a multivariate
time-series forecasting model with mixed channel features. The model blends a
specific number of channels, leveraging an attention mechanism to effectively
capture inter-channel correlation information when modeling long-term features.
Experimental results demonstrate that the Mixed Channels strategy outperforms
pure CI strategy in multivariate time-series forecasting tasks.
\\ ( https://arxiv.org/abs/2403.09223 ,  3346kb)
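The difference between the Channel Independence strategy and a channel-mixing variant can be seen from how a batch is reshaped. The NumPy sketch below assumes inputs of shape (batch, length, channels); grouping consecutive channels into blocks of `group_size` is a hypothetical stand-in for MCformer's mixing scheme, whose details the abstract does not give.

```python
import numpy as np


def channel_independent(x):
    """CI: (batch, length, channels) -> (batch * channels, length, 1)."""
    b, l, c = x.shape
    return x.transpose(0, 2, 1).reshape(b * c, l, 1)


def mixed_channels(x, group_size):
    """Hypothetical mixing: each sample keeps `group_size` channels together."""
    b, l, c = x.shape
    assert c % group_size == 0, "channels must divide evenly into groups"
    g = c // group_size
    # (b, l, g, group_size) -> (b, g, l, group_size) -> (b * g, l, group_size)
    return x.reshape(b, l, g, group_size).transpose(0, 2, 1, 3).reshape(b * g, l, group_size)


if __name__ == "__main__":
    x = np.arange(2 * 5 * 4, dtype=float).reshape(2, 5, 4)
    print(channel_independent(x).shape, mixed_channels(x, 2).shape)
```

Both variants multiply the number of training samples (the data-expansion benefit of CI), but only the mixed version leaves several channels inside one sample where an attention layer can relate them.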
------------------------------------------------------------------------------
\\
arXiv:2403.09228
Date: Thu, 14 Mar 2024 09:48:48 GMT   (522kb,D)

Title: Uncertainty Quantification for cross-subject Motor Imagery
 classification
Authors: Prithviraj Manivannan, Ivo Pascal de Jong, Matias Valdenegro-Toro,
 Andreea Ioana Sburlea
Categories: cs.LG
\\
 Uncertainty Quantification aims to determine when the prediction from a
Machine Learning model is likely to be wrong. Computer Vision research has
explored methods for determining epistemic uncertainty (also known as model
uncertainty), which should correspond with generalisation error. These methods
theoretically make it possible to predict misclassifications due to inter-subject
variability. We applied a variety of Uncertainty Quantification methods to
predict misclassifications for a Motor Imagery Brain Computer Interface. Deep
Ensembles performed best, both in terms of classification performance and
cross-subject Uncertainty Quantification performance. However, we found that
standard CNNs with Softmax output performed better than some of the more
advanced methods.
\\ ( https://arxiv.org/abs/2403.09228 ,  522kb)
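Deep Ensembles estimate uncertainty by averaging the softmax outputs of independently trained members; disagreement between members (the mutual information term) is a common proxy for epistemic uncertainty. A minimal NumPy sketch, assuming the member probabilities are already computed; the function names are hypothetical, and choosing a threshold for flagging likely misclassifications is not shown.

```python
import numpy as np


def _entropy(p, axis=-1):
    return -np.sum(p * np.log(np.clip(p, 1e-12, None)), axis=axis)


def ensemble_uncertainty(member_probs):
    """member_probs: (n_members, n_samples, n_classes) softmax outputs."""
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)
    total = _entropy(mean_probs)                    # predictive entropy
    expected = _entropy(member_probs).mean(axis=0)  # average member entropy
    return mean_probs, total, total - expected      # last term: mutual information


if __name__ == "__main__":
    # Two members agree on sample 0 and disagree on sample 1.
    probs = [[[0.9, 0.1], [0.9, 0.1]], [[0.9, 0.1], [0.1, 0.9]]]
    print(ensemble_uncertainty(probs))
```

A plain CNN with softmax output corresponds to the single-member case, where the mutual information term is always zero and only the predictive entropy is available.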
------------------------------------------------------------------------------
\\
arXiv:2403.09284
Date: Thu, 14 Mar 2024 11:12:10 GMT   (27423kb,D)

Title: DA-PFL: Dynamic Affinity Aggregation for Personalized Federated Learning
Authors: Xu Yang, Jiyuan Feng, Songyue Guo, Ye Wang, Ye Ding, Binxing Fang,
 Qing Liao
Categories: cs.LG cs.DC
\\
 Personalized federated learning, which learns a personalized model for each
client, has become a popular research topic. Existing personalized federated
learning models tend to aggregate clients with similar data distributions to
improve the performance of learning models. However, similarity-based
personalized federated learning methods may exacerbate the class imbalance
problem. In this paper, we propose a novel Dynamic Affinity-based
Personalized Federated Learning model (DA-PFL) to alleviate the class
imbalance problem during federated learning. Specifically, we build an
affinity metric from a complementary perspective to guide which clients should
be aggregated. Then we design a dynamic aggregation strategy to dynamically
aggregate clients based on the affinity metric in each round to reduce the
class imbalance risk. Extensive experiments show that the proposed DA-PFL
model can significantly improve the accuracy of each client on three
real-world datasets compared with state-of-the-art methods.
\\ ( https://arxiv.org/abs/2403.09284 ,  27423kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09300
Date: Thu, 14 Mar 2024 11:46:25 GMT   (63kb)

Title: Recursive Causal Discovery
Authors: Ehsan Mokhtarian, Sepehr Elahi, Sina Akbari, Negar Kiyavash
Categories: cs.LG stat.ML
Comments: 50 pages, 5 tables, 11 algorithms, 5 figures
\\
 Causal discovery, i.e., learning the causal graph from data, is often the
first step toward the identification and estimation of causal effects, a key
requirement in numerous scientific domains. Causal discovery is hampered by two
main challenges: limited data results in errors in statistical testing and the
computational complexity of the learning task is daunting. This paper builds
upon and extends four of our prior publications (Mokhtarian et al., 2021;
Akbari et al., 2021; Mokhtarian et al., 2022, 2023a). These works introduced
the concept of removable variables, which are the only variables that can be
removed recursively for the purpose of causal discovery. Presence and
identification of removable variables allow recursive approaches for causal
discovery, a promising solution that helps to address the aforementioned
challenges by reducing the problem size successively. This reduction not only
minimizes conditioning sets in each conditional independence (CI) test, leading
to fewer errors, but also significantly decreases the number of required CI
tests. The worst-case performances of these methods nearly match the lower
bound. In this paper, we present a unified framework for the proposed
algorithms, refined with additional details and enhancements for a coherent
presentation. A comprehensive literature review is also included, comparing the
computational complexity of our methods with existing approaches, showcasing
their state-of-the-art efficiency. Another contribution of this paper is the
release of RCD, a Python package that efficiently implements these algorithms.
This package is designed for practitioners and researchers interested in
applying these methods in practical scenarios. The package is available at
github.com/ban-epfl/rcd, with comprehensive documentation provided at
rcdpackage.com.
\\ ( https://arxiv.org/abs/2403.09300 ,  63kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09303
Date: Thu, 14 Mar 2024 11:51:01 GMT   (205kb,D)

Title: Rethinking Autoencoders for Medical Anomaly Detection from A Theoretical
 Perspective
Authors: Yu Cai, Hao Chen, Kwang-Ting Cheng
Categories: cs.LG cs.CV
\\
 Medical anomaly detection aims to identify abnormal findings using only
normal training data, playing a crucial role in health screening and
recognizing rare diseases. Reconstruction-based methods, particularly those
utilizing autoencoders (AEs), are dominant in this field. They work under the
assumption that AEs trained on only normal data cannot reconstruct unseen
abnormal regions well, thereby enabling anomaly detection based on
reconstruction errors. However, this assumption does not always hold due to the
mismatch between the reconstruction training objective and the anomaly
detection task objective, rendering these methods theoretically unsound. This
study focuses on providing a theoretical foundation for AE-based reconstruction
methods in anomaly detection. By leveraging information theory, we elucidate
the principles of these methods and reveal that the key to improving AE in
anomaly detection lies in minimizing the information entropy of latent vectors.
Experiments on four datasets with two image modalities validate the
effectiveness of our theory. To the best of our knowledge, this is the first
effort to theoretically clarify the principles and design philosophy of AE for
anomaly detection. Code will be available upon acceptance.
\\ ( https://arxiv.org/abs/2403.09303 ,  205kb)
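The reconstruction-error principle these methods rely on can be shown with the simplest possible autoencoder, a linear one, whose optimum is given in closed form by principal component analysis. This NumPy sketch is an illustration under that simplification, not the paper's method; `latent_dim` plays the role of the bottleneck size.

```python
import numpy as np


def fit_linear_autoencoder(normal_data, latent_dim):
    """Closed-form linear autoencoder via PCA on normal training data."""
    mean = normal_data.mean(axis=0)
    _, _, vt = np.linalg.svd(normal_data - mean, full_matrices=False)
    return mean, vt[:latent_dim]  # rows span the learned latent subspace


def anomaly_score(x, mean, components):
    z = (x - mean) @ components.T            # encode
    recon = z @ components + mean            # decode
    return np.sum((x - recon) ** 2, axis=1)  # per-sample reconstruction error


if __name__ == "__main__":
    t = np.linspace(-1.0, 1.0, 50)
    normal = np.column_stack([t, t])  # normal data lies on the line y = x
    mean, comps = fit_linear_autoencoder(normal, latent_dim=1)
    print(anomaly_score(np.array([[0.5, 0.5], [0.5, -0.5]]), mean, comps))
```

The off-line point cannot be represented in the one-dimensional latent space and therefore reconstructs poorly; the abstract's point is that this behaviour is not guaranteed for deep AEs unless the latent representation is suitably constrained.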
------------------------------------------------------------------------------
\\
arXiv:2403.09415
Date: Thu, 14 Mar 2024 14:04:37 GMT   (1715kb,D)

Title: User Identification via Free Roaming Eye Tracking Data
Authors: Rishabh Vallabh Varsha Haria, Amin El Abed, Sebastian Maneth
Categories: cs.LG cs.HC
\\
 We present a new dataset of "free roaming" (FR) and "targeted roaming" (TR):
a pool of 41 participants is asked to walk around a university campus (FR) or
is asked to find a particular room within a library (TR). Eye movements are
recorded using a commodity wearable eye tracker (Pupil Labs Neon at 200Hz). On
this dataset we investigate the accuracy of user identification using a
previously known machine learning pipeline where a Radial Basis Function
Network (RBFN) is used as classifier. Our highest accuracies are 87.3% for FR
and 89.4% for TR. This should be compared to 95.3% which is the (corresponding)
highest accuracy we are aware of (achieved in a laboratory setting using the
"RAN" stimulus of the BioEye 2015 competition dataset). To the best of our
knowledge, our results are the first to study user identification in a
non-laboratory setting; such settings are often more feasible than laboratory
settings and may include further advantages. The minimum duration of each
recording is 263s for FR and 154s for TR. Our best accuracies are obtained when
restricting to 120s and 140s for FR and TR respectively, always cut from the
end of the trajectories (both for the training and testing sessions). If we cut
the same length from the beginning, then accuracies are 12.2% lower for FR and
around 6.4% lower for TR. On the full trajectories, accuracies are lower by 5%
for FR and 52% for TR. We also investigate the impact of including higher-order
velocity derivatives (such as acceleration, jerk, or jounce).
\\ ( https://arxiv.org/abs/2403.09415 ,  1715kb)
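Higher-order velocity derivatives of a gaze trajectory can be approximated by repeated finite differences, and the cropping experiments amount to slicing the recording. A NumPy sketch, assuming a uniformly sampled signal at the 200 Hz rate mentioned above; the function names are hypothetical and the RBFN classifier is not shown.

```python
import numpy as np


def derivative_features(gaze, fs=200.0, order=4):
    """Finite-difference derivatives of a gaze signal sampled at fs Hz.

    Returns [velocity, acceleration, jerk, jounce][:order]; each successive
    array is one sample shorter than the previous one.
    """
    feats = []
    current = np.asarray(gaze, dtype=float)
    for _ in range(order):
        current = np.diff(current, axis=0) * fs  # divide by dt = 1 / fs
        feats.append(current)
    return feats


def crop_from_end(gaze, seconds, fs=200.0):
    """Keep only the last `seconds` of a recording (whole recording if shorter)."""
    n = int(round(seconds * fs))
    start = max(len(gaze) - n, 0)
    return gaze[start:]


if __name__ == "__main__":
    t = np.arange(400) / 200.0
    gaze = 3.0 * t  # constant-velocity toy trace
    velocity, acceleration, _, _ = derivative_features(gaze)
    print(velocity[:3], acceleration[:3], len(crop_from_end(gaze, 1.0)))
```

Note that each differencing step amplifies sensor noise by roughly a factor of `fs`, so in practice jerk and jounce are usually computed on smoothed signals.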
------------------------------------------------------------------------------
\\
arXiv:2403.09428
Date: Thu, 14 Mar 2024 14:19:48 GMT   (8012kb,D)

Title: Borrowing Treasures from Neighbors: In-Context Learning for Multimodal
 Learning with Missing Modalities and Data Scarcity
Authors: Zhuo Zhi, Ziquan Liu, Moe Elbadawi, Adam Daneshmend, Mine Orlu, Abdul
 Basit, Andreas Demosthenous, Miguel Rodrigues
Categories: cs.LG
\\
 Multimodal machine learning with missing modalities is an increasingly
relevant challenge arising in various applications such as healthcare. This
paper extends the current research into missing modalities to the low-data
regime, i.e., a downstream task has both missing modalities and limited sample
size issues. This problem setting is particularly challenging and also
practical as it is often expensive to get full-modality data and sufficient
annotated training samples. We propose to use retrieval-augmented in-context
learning to address these two crucial issues by unleashing the potential of a
transformer's in-context learning ability. Diverging from existing methods,
which primarily belong to the parametric paradigm and often require sufficient
training samples, our work exploits the value of the available full-modality
data, offering a novel perspective on resolving the challenge. The proposed
data-dependent framework exhibits a higher degree of sample efficiency and is
empirically demonstrated to enhance the classification model's performance on
both full- and missing-modality data in the low-data regime across various
multimodal learning tasks. When only 1% of the training data are available, our
proposed method demonstrates an average improvement of 6.1% over a recent
strong baseline across various datasets and missing states. Notably, our method
also reduces the performance gap between full-modality and missing-modality
data compared with the baseline.
\\ ( https://arxiv.org/abs/2403.09428 ,  8012kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09441
Date: Thu, 14 Mar 2024 14:34:25 GMT   (1126kb,D)

Title: Adversarial Fine-tuning of Compressed Neural Networks for Joint
 Improvement of Robustness and Efficiency
Authors: Hallgrimur Thorsteinsson, Valdemar J Henriksen, Tong Chen, Raghavendra
 Selvan
Categories: cs.LG
Comments: 22 pages, 4 figures, 6 tables
\\
 As deep learning (DL) models are increasingly being integrated into our
everyday lives, ensuring their safety by making them robust against adversarial
attacks has become increasingly critical. DL models have been found to be
susceptible to adversarial attacks which can be achieved by introducing small,
targeted perturbations to disrupt the input data. Adversarial training has been
presented as a mitigation strategy which can result in more robust models. This
adversarial robustness comes with additional computational costs required to
design adversarial attacks during training. The two objectives -- adversarial
robustness and computational efficiency -- then appear to be in conflict with
each other. In this work, we explore the effects of two different model
compression methods -- structured weight pruning and quantization -- on
adversarial robustness. We specifically explore the effects of fine-tuning on
compressed models, and present the trade-off between standard fine-tuning and
adversarial fine-tuning. Our results show that compression does not inherently
lead to loss in model robustness and adversarial fine-tuning of a compressed
model can yield large improvements in the robustness performance of models. We
present experiments on two benchmark datasets showing that adversarial
fine-tuning of compressed models can achieve robustness performance comparable
to adversarially trained models, while also improving computational efficiency.
\\ ( https://arxiv.org/abs/2403.09441 ,  1126kb)
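Adversarial fine-tuning repeatedly perturbs training inputs in the direction that increases the loss and then updates the model on the perturbed batch. The sketch below shows one such step with the fast gradient sign method (FGSM) on a logistic-regression model, chosen only because its input gradient has a closed form; the paper fine-tunes pruned and quantized deep networks, and `epsilon` and `lr` here are arbitrary.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(x, y, w, b):
    p = sigmoid(x @ w + b)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fgsm(x, y, w, b, epsilon):
    """FGSM for logistic regression with labels y in {0, 1}."""
    p = sigmoid(x @ w + b)
    # d(loss)/d(x) per sample, up to the 1/n factor that sign() ignores.
    grad_x = np.outer(p - y, w)
    return x + epsilon * np.sign(grad_x)


def adversarial_finetune_step(x, y, w, b, epsilon=0.1, lr=0.1):
    """One gradient-descent step on an FGSM-perturbed batch."""
    x_adv = fgsm(x, y, w, b, epsilon)
    residual = sigmoid(x_adv @ w + b) - y
    grad_w = x_adv.T @ residual / len(y)
    grad_b = residual.mean()
    return w - lr * grad_w, b - lr * grad_b, x_adv


if __name__ == "__main__":
    x, y = np.array([[1.0, 2.0]]), np.array([1.0])
    w, b = np.array([0.5, -0.25]), 0.0
    w2, b2, x_adv = adversarial_finetune_step(x, y, w, b)
    print(bce(x_adv, y, w, b), bce(x_adv, y, w2, b2))
```

Generating the perturbation costs an extra forward and backward pass per step, which is the computational overhead the abstract proposes to offset by fine-tuning an already compressed model.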
------------------------------------------------------------------------------
\\
arXiv:2403.09450
Date: Thu, 14 Mar 2024 14:48:37 GMT   (2425kb,D)

Title: Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative
 Privacy Risk
Authors: Zhangheng Li, Junyuan Hong, Bo Li, Zhangyang Wang
Categories: cs.LG
\\
 While diffusion models have recently demonstrated remarkable progress in
generating realistic images, privacy risks also arise: published models or APIs
could generate training images and thus leak privacy-sensitive training
information. In this paper, we reveal a new risk, Shake-to-Leak (S2L), in which
fine-tuning the pre-trained models with manipulated data can amplify the
existing privacy risks. We demonstrate that S2L could occur in various standard
fine-tuning strategies for diffusion models, including concept-injection
methods (DreamBooth and Textual Inversion) and parameter-efficient methods
(LoRA and Hypernetwork), as well as their combinations. In the worst case, S2L
can amplify the state-of-the-art membership inference attack (MIA) on diffusion
models by $5.4\%$ (absolute difference) AUC and can increase extracted private
samples from almost $0$ samples to $16.3$ samples on average per target domain.
This discovery underscores that the privacy risk with diffusion models is even
more severe than previously recognized. Code is available at
https://github.com/VITA-Group/Shake-to-Leak.
\\ ( https://arxiv.org/abs/2403.09450 ,  2425kb)
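Membership inference attacks are typically evaluated by how well a per-sample score (for example, a negative training loss) separates training members from non-members, summarized as ROC AUC. A minimal NumPy sketch of that AUC computation via the Mann-Whitney statistic; computing the membership scores themselves and the S2L fine-tuning are outside its scope.

```python
import numpy as np


def auc(member_scores, nonmember_scores):
    """ROC AUC: probability a member outscores a non-member (ties count 1/2)."""
    m = np.asarray(member_scores, dtype=float)[:, None]
    n = np.asarray(nonmember_scores, dtype=float)[None, :]
    return float(np.mean((m > n) + 0.5 * (m == n)))


if __name__ == "__main__":
    print(auc([0.9, 0.4], [0.5, 0.1]))  # 3 of 4 member/non-member pairs ranked correctly
```

An AUC of 0.5 means the attack cannot distinguish members from non-members, so an absolute increase of 5.4% in AUC is a direct measure of additional leakage.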
------------------------------------------------------------------------------
\\
arXiv:2403.09454
Date: Thu, 14 Mar 2024 14:53:18 GMT   (546kb,D)

Title: Machine learning for structural design models of continuous beam systems
 via influence zones
Authors: Adrien Gallet, Andrew Liew, Iman Hajirasouliha, Danny Smyl
Categories: cs.LG
Comments: 30 pages, 16 figures, 8 tables
DOI: 10.1088/1361-6420/ad3334
\\
 This work develops a machine learned structural design model for continuous
beam systems from the inverse problem perspective. After demarcating between
forward, optimisation and inverse machine learned operators, the investigation
proposes a novel methodology based on the recently developed influence zone
concept which represents a fundamental shift in approach compared to
traditional structural design methods. The aim of this approach is to
conceptualise a non-iterative structural design model that predicts
cross-section requirements for continuous beam systems of arbitrary system
size. After generating a dataset of known solutions, an appropriate neural
network architecture is identified, trained, and tested against unseen data.
The results show a mean absolute percentage testing error of 1.6% for
cross-section property predictions, along with a good ability of the neural
network to generalise to structural systems of variable size. The CBeamXP
dataset generated in this work and an associated Python-based neural network
training script are available at an open-source data repository to allow for
the reproducibility of results and to encourage further investigations.
\\ ( https://arxiv.org/abs/2403.09454 ,  546kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09472
Date: Thu, 14 Mar 2024 15:12:38 GMT   (715kb,D)

Title: Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Authors: Zhiqing Sun, Longhui Yu, Yikang Shen, Weiyang Liu, Yiming Yang, Sean
 Welleck, Chuang Gan
Categories: cs.LG cs.CL
\\
 Current AI alignment methodologies rely on human-provided demonstrations or
judgments, and the learned capabilities of AI systems would be upper-bounded by
human capabilities as a result. This raises a challenging research question:
How can we keep improving the systems when their capabilities have surpassed
the levels of humans? This paper answers this question in the context of
tackling hard reasoning tasks (e.g., level 4-5 MATH problems) via learning from
human annotations on easier tasks (e.g., level 1-3 MATH problems), which we
term \textit{easy-to-hard generalization}. Our key insight is that an
evaluator (reward model) trained with supervision on easier tasks can be
effectively used for scoring candidate solutions of harder tasks and hence
facilitate easy-to-hard generalization over different levels of tasks. Based
on this insight, we propose a novel approach to scalable alignment, which
first trains process-supervised reward models on easy problems (e.g.,
level 1-3), and then uses them to evaluate the performance of policy models on
hard problems. We show that such \textit{easy-to-hard generalization from
evaluators} can enable \textit{easy-to-hard generalizations in generators}
either through re-ranking or reinforcement learning (RL). Notably, our
process-supervised 7B RL model achieves an accuracy of 34.0\% on MATH500,
despite only using human supervision on easy problems. Our approach suggests a
promising path toward AI systems that advance beyond the frontier of human
supervision.
\\ ( https://arxiv.org/abs/2403.09472 ,  715kb)
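The re-ranking route can be pictured as best-of-N selection: sample several candidate solutions from the policy, score each step with the process reward model, aggregate the step scores, and keep the top candidate. The sketch below assumes min-aggregation over steps and a caller-supplied `step_scorer`; both are illustrative assumptions, not the paper's exact implementation, and `toy_scorer` is a hypothetical stand-in for a trained reward model.

```python
from typing import Callable, List, Sequence

# A candidate solution is a list of reasoning steps.
Solution = List[str]
# Hypothetical step scorer: (previous steps, current step) -> score in [0, 1].
StepScorer = Callable[[Sequence[str], str], float]


def solution_score(steps: Solution, step_scorer: StepScorer) -> float:
    """Aggregate per-step rewards; a solution is only as good as its weakest step (min)."""
    if not steps:
        return float("-inf")
    return min(step_scorer(steps[:i], step) for i, step in enumerate(steps))


def rerank(candidates: List[Solution], step_scorer: StepScorer) -> Solution:
    """Best-of-N: return the candidate with the highest aggregated score."""
    return max(candidates, key=lambda s: solution_score(s, step_scorer))


def toy_scorer(prefix: Sequence[str], step: str) -> float:
    """Hypothetical stand-in for a trained PRM: distrusts steps flagged with '?'."""
    return 0.1 if "?" in step else 0.9


candidates = [["x = 3", "x^2 = 8?"], ["x = 3", "x^2 = 9"]]
print(rerank(candidates, toy_scorer))  # ['x = 3', 'x^2 = 9']
```

Min-aggregation reflects the intuition that one wrong step invalidates a derivation; a product or mean of step scores are other plausible choices.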
------------------------------------------------------------------------------
\\
arXiv:2403.09479
Date: Thu, 14 Mar 2024 15:20:54 GMT   (359kb,D)

Title: Laying the Foundation First? Investigating the Generalization from
 Atomic Skills to Complex Reasoning Tasks
Authors: Yuncheng Huang, Qianyu He, Yipei Xu, Jiaqing Liang and Yanghua Xiao
Categories: cs.LG
\\
 Current language models have demonstrated their capability to develop basic
reasoning, but struggle in more complicated reasoning tasks that require a
combination of atomic skills, such as math word problems requiring skills like
arithmetic and unit conversion. Previous methods either do not improve the
inherent atomic skills of models or do not attempt to generalize the atomic
skills to complex reasoning tasks. In this paper, we first propose a probing
framework to investigate whether atomic skills can spontaneously generalize to
complex reasoning tasks. Then, we introduce a hierarchical curriculum learning
training strategy to achieve better skill generalization. In our experiments,
we find that atomic skills cannot spontaneously generalize to compositional
tasks. By leveraging hierarchical curriculum learning, we successfully induce
generalization, significantly improving the performance of open-source LMs on
complex reasoning tasks. Promisingly, the skill generalization proves
effective in cross-dataset and cross-domain scenarios. Complex reasoning can
also help enhance atomic skills. Our findings offer valuable guidance for
designing better training strategies for complex reasoning tasks.
\\ ( https://arxiv.org/abs/2403.09479 ,  359kb)
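One generic way to realise a staged curriculum is to train on atomic-skill data first and then move to harder stages while replaying a fraction of earlier examples so that the basic skills are not forgotten. The sketch below only builds the per-stage training pools; the stage contents, `replay_fraction`, and the replay idea itself are illustrative assumptions rather than the authors' hierarchical curriculum recipe.

```python
import random
from typing import List


def build_curriculum(stages: List[List[str]], replay_fraction: float = 0.5, seed: int = 0) -> List[List[str]]:
    """Return one shuffled training pool per stage, easiest stage first.

    Each pool holds the stage's own examples plus a random sample of
    earlier-stage examples (replay_fraction * number of examples seen so far).
    """
    rng = random.Random(seed)
    seen: List[str] = []
    pools: List[List[str]] = []
    for stage in stages:
        k = int(len(seen) * replay_fraction)
        pool = list(stage) + rng.sample(seen, k)  # replay keeps earlier skills active
        rng.shuffle(pool)
        pools.append(pool)
        seen.extend(stage)
    return pools


# Hypothetical stages: atomic arithmetic, atomic unit conversion, then composite word problems.
stages = [["add_1", "add_2"], ["conv_1", "conv_2"], ["word_1", "word_2"]]
for i, pool in enumerate(build_curriculum(stages)):
    print(i, pool)
```

Each pool would then be used for one training phase in order; the fixed `seed` keeps the sampling reproducible.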
------------------------------------------------------------------------------
\\
arXiv:2403.09491
Date: Thu, 14 Mar 2024 15:32:25 GMT   (5719kb,D)

Title: On using Machine Learning Algorithms for Motorcycle Collision Detection
Authors: Philipp Rodegast, Steffen Maier, Jonas Kneifl, J\"org Fehr
Categories: cs.LG math.DS
\\
 Motorcycles have a large and varied user base worldwide. However, since the
rate of severe injury and fatality in motorcycle accidents far exceeds that in
passenger car accidents, efforts have been directed toward improving passive
safety systems. Impact simulations show that the risk of severe injury or death
in the event of a motorcycle-to-car impact can be greatly reduced if the
motorcycle is equipped with passive safety measures such as airbags and seat
belts. For the passive safety systems to be activated, a collision must be
detected within milliseconds for a wide variety of impact configurations, but
under no circumstances may it be falsely triggered. To address the challenge of
reliably detecting impending collisions, this paper investigates the
applicability of machine learning algorithms. First, a series of simulations of
accidents and driving operation is introduced to collect data to train machine
learning classification models. Their performance is then assessed and
compared via multiple representative and application-oriented criteria.
\\ ( https://arxiv.org/abs/2403.09491 ,  5719kb)
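The two application requirements named above, firing within milliseconds of an impact and never firing during normal driving, can be checked per simulated run once a classifier's first trigger time is known. The sketch below is an illustrative evaluation helper; the `Run` record, its fields, and the deadline value are hypothetical and are not the criteria defined in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    is_crash: bool               # simulated accident (True) or normal driving (False)
    impact_ms: Optional[float]   # time of first contact; None for non-crash runs
    trigger_ms: Optional[float]  # time the classifier first fired; None if it never fired


def trigger_report(runs: List[Run], deadline_ms: float) -> dict:
    """Count false triggers on non-crash runs and late or missed detections on crash runs."""
    false_triggers = sum(1 for r in runs if not r.is_crash and r.trigger_ms is not None)
    crashes = [r for r in runs if r.is_crash]
    late_or_missed = sum(
        1
        for r in crashes
        if r.trigger_ms is None or r.trigger_ms - r.impact_ms > deadline_ms
    )
    return {"crashes": len(crashes), "late_or_missed": late_or_missed, "false_triggers": false_triggers}


runs = [
    Run(True, 100.0, 95.0),    # fired 5 ms before contact
    Run(True, 100.0, 130.0),   # fired 30 ms after contact: too late
    Run(False, None, None),    # normal driving, correctly silent
    Run(False, None, 50.0),    # normal driving, false trigger
]
print(trigger_report(runs, deadline_ms=10.0))  # {'crashes': 2, 'late_or_missed': 1, 'false_triggers': 1}
```

Reporting false triggers separately, instead of folding them into a single accuracy number, mirrors the asymmetric requirement that an airbag must never deploy during normal riding.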
------------------------------------------------------------------------------
\\
arXiv:2403.09499
Date: Thu, 14 Mar 2024 15:42:26 GMT   (3808kb,D)

Title: A Reinforcement Learning Approach to Dairy Farm Battery Management using
 Q Learning
Authors: Nawazish Ali, Abdul Wahid, Rachael Shaw, Karl Mason
Categories: cs.LG cs.AI
\\
 Dairy farming consumes a significant amount of energy, making it an
energy-intensive sector within agriculture. Integrating renewable energy
generation into dairy farming could help address this challenge. Effective
battery management is important for integrating renewable energy generation.
Managing battery charging and discharging poses significant challenges because
of fluctuations in electrical consumption, the intermittent nature of renewable
energy generation, and fluctuations in energy prices. Artificial Intelligence
(AI) has the potential to significantly improve the use of renewable energy in
dairy farming; however, limited research has been conducted in this particular
domain. This research considers Ireland as a case study as it works towards
attaining its 2030 energy strategy centered on the utilization of renewable
sources. This study proposes a Q-learning-based algorithm for scheduling
battery charging and discharging in a dairy farm setting. This research also
explores the effect of the proposed algorithm by adding wind generation data
and considering additional case studies. The proposed algorithm reduces the
cost of imported electricity from the grid by 13.41\%, peak demand by 2\%, and
24.49\% when utilizing wind generation. These results underline how
reinforcement learning is highly effective in managing batteries in the dairy
farming sector.
\\ ( https://arxiv.org/abs/2403.09499 ,  3808kb)
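For readers new to Q-learning, the core of such a scheduler is the tabular update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), combined with epsilon-greedy action selection. The sketch below shows only that generic update. The three actions, the state built from hour of day and a battery charge bucket, and the reward defined as negative grid-import cost are illustrative assumptions, not the paper's exact formulation.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Tuple

ACTIONS = ("charge", "idle", "discharge")
QTable = DefaultDict[Tuple[Hashable, str], float]


def choose_action(Q: QTable, state: Hashable, epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy: explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_update(Q: QTable, state, action, reward, next_state, alpha=0.1, gamma=0.95) -> float:
    """One Q-learning step toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical state: (hour of day, battery state-of-charge bucket).
Q: QTable = defaultdict(float)
Q[((13, 2), "idle")] = 10.0
# Charging at 12:00 cost 2 units of imported electricity (reward = -cost).
print(q_update(Q, (12, 1), "charge", -2.0, (13, 2), alpha=0.5, gamma=0.9))  # 3.5
print(choose_action(Q, (13, 2), epsilon=0.0, rng=random.Random(0)))        # idle
```

A `defaultdict` keeps unseen state-action pairs at zero, which is convenient when the state space (hours times charge buckets) is only partly visited.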
------------------------------------------------------------------------------
\\
arXiv:2403.09502
Date: Thu, 14 Mar 2024 15:44:19 GMT   (2029kb,D)

Title: EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Authors: Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho, Junmo Kim and Joon Son
 Chung
Categories: cs.LG cs.AI
Comments: 14 pages, 3 figures
\\
 Recent advancements in self-supervised audio-visual representation learning
have demonstrated its potential to capture rich and comprehensive
representations. However, despite the advantages of data augmentation verified
in many learning methods, audio-visual learning has struggled to fully harness
these benefits, as augmentations can easily disrupt the correspondence between
input pairs. To address this limitation, we introduce EquiAV, a novel framework
that leverages equivariance for audio-visual contrastive learning. Our approach
begins by extending equivariance to audio-visual learning, facilitated by a
shared attention-based transformation predictor, which enables the aggregation of
features from diverse augmentations into a representative embedding, providing
robust supervision. Notably, this is achieved with minimal computational
overhead. Extensive ablation studies and qualitative results verify the
effectiveness of our method. EquiAV outperforms previous works across various
audio-visual benchmarks.
\\ ( https://arxiv.org/abs/2403.09502 ,  2029kb)
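EquiAV builds on audio-visual contrastive learning, whose basic ingredient is an InfoNCE-style loss: the audio and visual embeddings of the same clip form the positive pair, and the other clips in the batch act as negatives. The pure-Python sketch below shows only that generic audio-to-visual term with cosine similarity and a temperature. It does not model the paper's equivariant transformation predictor or its augmentation aggregation, and the default temperature is an assumption.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(audio: List[Vector], visual: List[Vector], temperature: float = 0.07) -> float:
    """Audio-to-visual InfoNCE averaged over the batch.

    Row i of `audio` and row i of `visual` come from the same clip (positive pair);
    every other visual embedding in the batch is treated as a negative.
    """
    a = [_normalize(v) for v in audio]
    b = [_normalize(v) for v in visual]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(a)


audio = [[1.0, 0.0], [0.0, 1.0]]
visual_aligned = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce(audio, visual_aligned), info_nce(audio, visual_aligned[::-1]))
```

The loss is small when each clip's audio is closest to its own visual embedding and grows sharply when the pairing is scrambled, which is the signal that augmentations can corrupt if they break audio-visual correspondence.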
------------------------------------------------------------------------------
\\
arXiv:2403.09548
Date: Thu, 14 Mar 2024 16:35:43 GMT   (4146kb,D)

Title: Breast Cancer Classification Using Gradient Boosting Algorithms Focusing
 on Reducing the False Negative and SHAP for Explainability
Authors: Jo\~ao Manoel Herrera Pinheiro, Marcelo Becker
Categories: cs.LG q-bio.QM
Comments: 9 pages, 16 figures
\\
 Cancer is one of the leading causes of death among women worldwide, with
breast cancer accounting for the highest number of cancer cases and,
consequently, deaths. However, its outcomes can be improved by early detection
and, consequently, early treatment. Any advance in the detection or prediction
of this kind of cancer is therefore important for better health. Many studies
focus on models with high accuracy in cancer prediction, but accuracy alone is
not always a reliable metric. This study investigates the performance of
different boosting-based machine learning algorithms for predicting breast
cancer, focusing on the recall metric. Boosting algorithms have proven to be
effective tools for detecting medical diseases. A dataset from the University
of California, Irvine (UCI) repository, containing the relevant attributes, is
used to train and test the classifiers. The main objective of this study is to
use state-of-the-art boosting algorithms such as AdaBoost, XGBoost, CatBoost
and LightGBM to predict and diagnose breast cancer and to identify the most
effective one with respect to recall, ROC-AUC, and the confusion matrix.
Furthermore, our study is the first to use these four boosting algorithms with
Optuna, a library for hyperparameter optimization, and the SHAP method to
improve the interpretability of our model, which can serve as a support tool
for identifying and predicting breast cancer. We improved AUC or recall for all
models and reduced false negatives for AdaBoost and LightGBM; the final AUC
exceeded 99.41\% for all models.
\\ ( https://arxiv.org/abs/2403.09548 ,  4146kb)
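Because the study prioritises recall (sensitivity), it helps to see how recall and false negatives fall out of a confusion matrix, and how lowering a classifier's decision threshold trades extra false positives for fewer false negatives. The sketch below is a generic illustration with made-up probabilities; threshold tuning is shown only as one common lever and is not claimed to be part of the authors' Optuna-based pipeline.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels where 1 = malignant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN); every false negative is a missed cancer case."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def threshold(probs: Sequence[float], t: float) -> List[int]:
    """Turn predicted malignancy probabilities into hard labels."""
    return [1 if p >= t else 0 for p in probs]


y_true = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.38, 0.2]  # hypothetical model outputs
for t in (0.5, 0.35):
    y_pred = threshold(probs, t)
    print(t, confusion_counts(y_true, y_pred), recall(y_true, y_pred))
```

At 0.5 one malignant case is missed (recall 0.5); at 0.35 recall reaches 1.0 at the cost of one false positive, the trade-off a recall-focused evaluation deliberately accepts.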
------------------------------------------------------------------------------
\\
arXiv:2403.09549
Date: Thu, 14 Mar 2024 16:38:02 GMT   (14640kb,D)

Title: Generalizing Denoising to Non-Equilibrium Structures Improves
 Equivariant Force Fields
Authors: Yi-Lun Liao, Tess Smidt, Abhishek Das
Categories: cs.LG cs.AI physics.comp-ph
\\
 Understanding interactions between atoms, such as forces, in 3D atomistic
systems is fundamental to many applications like molecular dynamics and
catalyst design. However, simulating these interactions requires
compute-intensive ab initio calculations and thus results in limited data for
training neural networks. In this paper, we propose to use denoising
non-equilibrium structures (DeNS) as an auxiliary task to better leverage
training data and improve performance. For training with DeNS, we first corrupt
a 3D structure by adding noise to its 3D coordinates and then predict the
noise. Different from previous works on denoising, which are limited to
equilibrium structures, the proposed method generalizes denoising to a much
larger set of non-equilibrium structures. The main difference is that a
non-equilibrium structure does not correspond to a local energy minimum and has
non-zero forces, and therefore it can have many possible atomic positions
compared to an equilibrium structure. This makes denoising non-equilibrium
structures an ill-posed problem since the target of denoising is not uniquely
defined. Our key insight is to additionally encode the forces of the original
non-equilibrium structure to specify which non-equilibrium structure we are
denoising. Concretely, given a corrupted non-equilibrium structure and the
forces of the original one, we predict the non-equilibrium structure satisfying
the input forces instead of an arbitrary one. Since DeNS requires
encoding forces, DeNS favors equivariant networks, which can easily incorporate
forces and other higher-order tensors in node embeddings. We study the
effectiveness of training equivariant networks with DeNS on OC20, OC22 and MD17
datasets and demonstrate that DeNS can achieve new state-of-the-art results on
OC20 and OC22 and significantly improve training efficiency on MD17.
\\ ( https://arxiv.org/abs/2403.09549 ,  14640kb)
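The DeNS data preparation described above can be sketched as: perturb every atom's 3D coordinates with Gaussian noise, keep the forces of the original (uncorrupted) structure as an extra input, and use the added noise as the regression target. The pure-Python sketch below covers only that preparation and a mean-squared-error loss. The function names and the noise scale `sigma` are hypothetical, and the equivariant network that would consume the inputs is not shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def make_dens_example(
    positions: Sequence[Vec3], forces: Sequence[Vec3], sigma: float, rng: random.Random
) -> Tuple[Dict[str, List[Vec3]], List[Vec3]]:
    """Hypothetical helper: corrupt coordinates with Gaussian noise; return (model inputs, noise target)."""
    noise = [tuple(rng.gauss(0.0, sigma) for _ in range(3)) for _ in positions]
    corrupted = [tuple(p + e for p, e in zip(pos, eps)) for pos, eps in zip(positions, noise)]
    # Forces of the ORIGINAL structure are encoded so the denoising target is well defined.
    inputs = {"positions": corrupted, "forces": [tuple(f) for f in forces]}
    return inputs, noise


def denoising_loss(pred_noise: Sequence[Vec3], true_noise: Sequence[Vec3]) -> float:
    """Mean squared error over all atoms and Cartesian components."""
    sq = [(p - t) ** 2 for pv, tv in zip(pred_noise, true_noise) for p, t in zip(pv, tv)]
    return sum(sq) / len(sq)


# Hypothetical two-atom non-equilibrium structure (non-zero forces).
positions = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
forces = [(0.3, 0.0, 0.0), (-0.3, 0.0, 0.0)]
inputs, target = make_dens_example(positions, forces, sigma=0.05, rng=random.Random(0))
print(inputs["positions"], denoising_loss(target, target))
```

Without the `forces` entry, many different original structures could explain the same corrupted input; supplying them is what makes the target unique, as the abstract argues.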
------------------------------------------------------------------------------
\\
arXiv:2403.09560
Date: Thu, 14 Mar 2024 16:52:57 GMT   (381kb,D)

Title: Self-Consistency Training for Hamiltonian Prediction
Authors: He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng,
 Bin Shao, Tie-Yan Liu
Categories: cs.LG
\\
 Hamiltonian prediction is a versatile formulation to leverage machine
learning for solving molecular science problems. Yet, its applicability is
limited by insufficient labeled data for training. In this work, we highlight
that Hamiltonian prediction possesses a self-consistency principle, based on
which we propose an exact training method that does not require labeled data.
This merit addresses the data scarcity difficulty, and distinguishes the task
from other property prediction formulations with unique benefits: (1)
self-consistency training enables the model to be trained on a large amount of
unlabeled data, hence substantially enhancing generalization; (2)
self-consistency training is more efficient than labeling data with DFT for
supervised training, since it is an amortization of DFT calculation over a set
of molecular structures. We empirically demonstrate improved generalization
in data-scarce and out-of-distribution scenarios, and improved efficiency
from the amortization. These benefits extend the applicability of
Hamiltonian prediction to ever larger scales.
\\ ( https://arxiv.org/abs/2403.09560 ,  381kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09570
Date: Thu, 14 Mar 2024 17:00:01 GMT   (1216kb,D)

Title: Multi-Fidelity Bayesian Optimization With Across-Task Transferable
 Max-Value Entropy Search
Authors: Yunchuan Zhang, Sangwoo Park, Osvaldo Simeone
Categories: cs.LG cs.IT eess.SP math.IT
Comments: submitted to IEEE for review
\\
 In many applications, ranging from logistics to engineering, a designer is
faced with a sequence of optimization tasks for which the objectives are in the
form of black-box functions that are costly to evaluate. For example, the
designer may need to tune the hyperparameters of neural network models for
different learning tasks over time. Rather than evaluating the objective
function for each candidate solution, the designer may have access to
approximations of the objective functions, for which higher-fidelity
evaluations entail a larger cost. Existing multi-fidelity black-box
optimization strategies select candidate solutions and fidelity levels with the
goal of maximizing the information accrued about the optimal value or solution
for the current task. Assuming that successive optimization tasks are related,
this paper introduces a novel information-theoretic acquisition function that
balances the need to acquire information about the current task with the goal
of collecting information transferable to future tasks. The proposed method
includes shared inter-task latent variables, which are transferred across tasks
by implementing particle-based variational Bayesian updates. Experimental
results across synthetic and real-world examples reveal that the proposed
provident acquisition strategy that caters to future tasks can significantly
improve the optimization efficiency as soon as a sufficient number of tasks is
processed.
\\ ( https://arxiv.org/abs/2403.09570 ,  1216kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09588
Date: Thu, 14 Mar 2024 17:26:00 GMT   (4292kb,D)

Title: Iterative Forgetting: Online Data Stream Regression Using
 Database-Inspired Adaptive Granulation
Authors: Niket Kathiriya, Hossein Haeri, Cindy Chen, Kshitij Jerath
Categories: cs.LG cs.DB
\\
 Many modern systems, such as financial, transportation, and
telecommunications systems, are time-sensitive in the sense that they demand
low-latency predictions for real-time decision-making. Such systems often have
to contend with continuous unbounded data streams as well as concept drift,
which are challenging requirements that traditional regression techniques are
unable to cater to. There exists a need to create novel data stream regression
methods that can handle these scenarios. We present a database-inspired
data stream regression model that (a) uses inspiration from R*-trees to create
granules from incoming data streams such that relevant information is retained,
(b) iteratively forgets granules whose information is deemed to be outdated,
thus maintaining a list of only recent, relevant granules, and (c) uses the
recent data and granules to provide low-latency predictions. The
R*-tree-inspired approach also makes the algorithm amenable to integration with
database systems. Our experiments demonstrate that the ability of this method
to discard data produces a significant order-of-magnitude improvement in
latency and training time when evaluated against the most accurate
state-of-the-art algorithms, while the R*-tree-inspired granulation technique
provides competitively accurate predictions.
\\ ( https://arxiv.org/abs/2403.09588 ,  4292kb)
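To make the idea of iteratively forgetting outdated granules concrete, here is a deliberately simplified stream regressor. It substitutes fixed-size grid cells for the paper's R*-tree-inspired granules and forgets a granule once it has not been updated for `max_age` samples. The class name, both parameters, and the nearest-granule fallback are illustrative assumptions, not the authors' algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple


class ForgettingGranuleRegressor:
    """Hypothetical toy: grid-cell granules with age-based forgetting (not the paper's R*-tree method)."""

    def __init__(self, cell_size: float, max_age: int):
        self.cell_size = cell_size
        self.max_age = max_age
        self.t = 0  # number of samples seen so far
        self.granules: Dict[Tuple[int, ...], List[float]] = {}  # key -> [sum_y, count, last_seen]

    def _key(self, x: Sequence[float]) -> Tuple[int, ...]:
        return tuple(math.floor(v / self.cell_size) for v in x)

    def update(self, x: Sequence[float], y: float) -> None:
        self.t += 1
        g = self.granules.setdefault(self._key(x), [0.0, 0, self.t])
        g[0] += y
        g[1] += 1
        g[2] = self.t
        self._forget()

    def _forget(self) -> None:
        """Drop granules that have not been refreshed within max_age samples."""
        stale = [k for k, g in self.granules.items() if self.t - g[2] > self.max_age]
        for k in stale:
            del self.granules[k]

    def predict(self, x: Sequence[float]) -> float:
        if not self.granules:
            raise ValueError("no granules yet")
        key = self._key(x)
        if key in self.granules:
            g = self.granules[key]
        else:  # fall back to the nearest surviving granule by cell-index distance
            g = min(self.granules.items(), key=lambda kv: sum((a - b) ** 2 for a, b in zip(kv[0], key)))[1]
        return g[0] / g[1]


model = ForgettingGranuleRegressor(cell_size=1.0, max_age=2)
model.update((0.5,), 1.0)
model.update((0.6,), 3.0)
print(model.predict((0.2,)))  # 2.0
```

Because stale granules are deleted outright, memory and prediction cost stay bounded by the number of recently active regions, which is the latency benefit the abstract attributes to discarding data.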
------------------------------------------------------------------------------
\\
arXiv:2403.09613
Date: Thu, 14 Mar 2024 17:51:54 GMT   (632kb,D)

Title: Reawakening knowledge: Anticipatory recovery from catastrophic
 interference via structured training
Authors: Yanlai Yang, Matt Jones, Michael C. Mozer, Mengye Ren
Categories: cs.LG cs.CL
Comments: 19 pages, 18 figures
\\
 We explore the training dynamics of neural networks in a structured non-IID
setting where documents are presented cyclically in a fixed, repeated sequence.
Typically, networks suffer from catastrophic interference when training on a
sequence of documents; however, we discover a curious and remarkable property
of LLMs fine-tuned sequentially in this setting: they exhibit anticipatory
behavior, recovering from the forgetting on documents before encountering them
again. The behavior emerges and becomes more robust as the architecture scales
up its number of parameters. Through comprehensive experiments and
visualizations, we uncover new insights into training over-parameterized
networks in structured environments.
\\ ( https://arxiv.org/abs/2403.09613 ,  632kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09621
Date: Thu, 14 Mar 2024 17:55:10 GMT   (52kb)

Title: Minimax Optimal and Computationally Efficient Algorithms for
 Distributionally Robust Offline Reinforcement Learning
Authors: Zhishuai Liu, Pan Xu
Categories: cs.LG cs.AI stat.ML
Comments: 53 pages, 1 figure, 1 table
\\
 Distributionally robust offline reinforcement learning (RL), which seeks
robust policy training against environment perturbation by modeling dynamics
uncertainty, calls for function approximations when facing large state-action
spaces. However, the consideration of dynamics uncertainty introduces essential
nonlinearity and computational burden, posing unique challenges for analyzing
and practically employing function approximation. Focusing on a basic setting
where the nominal model and perturbed models are linearly parameterized, we
propose minimax optimal and computationally efficient algorithms realizing
function approximation and initiate the study of instance-dependent
suboptimality analysis in the context of robust offline RL. Our results uncover
that function approximation in robust offline RL is essentially distinct from
and probably harder than that in standard offline RL. Our algorithms and
theoretical results crucially depend on a variety of new techniques, involving
a novel function approximation mechanism incorporating variance information, a
new procedure of suboptimality and estimation uncertainty decomposition, a
quantification of the robust value function shrinkage, and a meticulously
designed family of hard instances, which might be of independent interest.
\\ ( https://arxiv.org/abs/2403.09621 ,  52kb)
%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-%-
------------------------------------------------------------------------------
\\
arXiv:2310.10404 (*cross-listing*)
Date: Mon, 16 Oct 2023 13:49:46 GMT   (4667kb,D)
Date (revised v2): Wed, 18 Oct 2023 04:05:40 GMT   (4667kb,D)
Date (revised v3): Thu, 19 Oct 2023 04:01:41 GMT   (4667kb,D)
Date (revised v4): Fri, 20 Oct 2023 01:12:52 GMT   (4667kb,D)
Date (revised v5): Mon, 27 Nov 2023 11:41:32 GMT   (6418kb,D)

Title: LLM4SGG: Large Language Model for Weakly Supervised Scene Graph
 Generation
Authors: Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon, Yeonjun In, Jinyoung Moon,
 Donghyun Kim, Chanyoung Park
Categories: cs.CV cs.AI
Comments: 21 pages, Preprint
\\
 Weakly-Supervised Scene Graph Generation (WSSGG) research has recently
emerged as an alternative to the fully-supervised approach that heavily relies
on costly annotations. In this regard, studies on WSSGG have utilized image
captions to obtain unlocalized triplets while primarily focusing on grounding
the unlocalized triplets over image regions. However, they have overlooked
two issues involved in the triplet formation process from the captions: 1)
Semantic over-simplification issue arises when extracting triplets from
captions, where fine-grained predicates in captions are undesirably converted
into coarse-grained predicates, resulting in a long-tailed predicate
distribution, and 2) Low-density scene graph issue arises when aligning the
triplets in the caption with entity/predicate classes of interest, where many
triplets are discarded and not used in training, leading to insufficient
supervision. To tackle the two issues, we propose a new approach, i.e., Large
Language Model for weakly-supervised SGG (LLM4SGG), where we mitigate the two
issues by leveraging the LLM's in-depth understanding of language and reasoning
ability during the extraction of triplets from captions and alignment of
entity/predicate classes with target data. To further engage the LLM in these
processes, we adopt the idea of Chain-of-Thought and the in-context few-shot
learning strategy. To validate the effectiveness of LLM4SGG, we conduct
extensive experiments on Visual Genome and GQA datasets, showing significant
improvements in both Recall@K and mean Recall@K compared to the
state-of-the-art WSSGG methods. A further appeal is that LLM4SGG is
data-efficient, enabling effective model training with a small number of
training images.
\\ ( https://arxiv.org/abs/2310.10404 ,  6418kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08773 (*cross-listing*)
Date: Thu, 18 Jan 2024 12:45:25 GMT   (5132kb,D)

Title: Veagle: Advancements in Multimodal Representation Learning
Authors: Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam,
 Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola
Categories: cs.CV cs.AI cs.CL cs.MM
\\
 Recently, researchers in artificial intelligence have shown strong interest in
how language and vision come together, giving rise to the development of
multimodal models that aim to seamlessly integrate textual and visual
information. Multimodal models, an extension of Large Language Models (LLMs),
have exhibited remarkable capabilities in addressing a diverse array of tasks,
ranging from image captioning and visual question answering (VQA) to visual
grounding. While these models have showcased significant advancements,
challenges persist in accurately interpreting images and answering questions
about them, a common requirement in real-world scenarios. This paper introduces a
novel approach to enhance the multimodal capabilities of existing models. In
response to the limitations observed in current Vision Language Models (VLMs)
and Multimodal Large Language Models (MLLMs), our proposed model Veagle
incorporates a unique mechanism inspired by the successes and insights of
previous works. Veagle leverages a dynamic mechanism to project encoded visual
information directly into the language model. This dynamic approach allows for
a more nuanced understanding of intricate details present in visual contexts.
To validate the effectiveness of Veagle, we conduct comprehensive experiments
on benchmark datasets, emphasizing tasks such as visual question answering and
image understanding. Our results indicate an improvement of 5-6\% in
performance, with Veagle outperforming existing models by a notable margin. The
outcomes underscore the model's versatility and applicability beyond
traditional benchmarks.
\\ ( https://arxiv.org/abs/2403.08773 ,  5132kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08774 (*cross-listing*)
Date: Sun, 21 Jan 2024 15:03:17 GMT   (1490kb)

Title: Discussion of Loop Expansion and Introduction of Series Cutting
 Functions to Local Potential Approximation: Complexity Analysis Using Green's
 Functions, Cutting Of Nth-Order Social Interactions For Progressive Safety
Authors: Yasuko Kawahata
Categories: physics.soc-ph cs.AI
Comments: In this study, we focus on the aforementioned paper, "Examination
 Kubo-Matsubara Green's Function Of The Edwards-Anderson Model: Extreme Value
 Information Flow Of Nth-Order Interpolated Extrapolation Of Zero Phenomena
 Using The Replica Method (2024)"
\\
 In this study, we focus on the paper "Examination
Kubo-Matsubara Green's Function Of The Edwards-Anderson Model: Extreme Value
Information Flow Of Nth-Order Interpolated Extrapolation Of Zero Phenomena
Using The Replica Method (2024)". This paper also applies theoretical physics
methods to better understand the filter bubble phenomenon, focusing in
particular on loop expansions and truncation functions. Using the loop
expansion method, we discuss the complexity of social interactions during the
occurrence of filter bubbles in order to introduce series expansions, express
these interactions mathematically, and evaluate their impact. We analyze the
interactions between agents and their time evolution using a variety of Green's
functions, including retarded (delayed) Green's functions, advanced Green's
functions, and causal Green's functions, to capture the dynamic response of the
system through local potential approximations. In addition, we apply truncation
functions and truncation techniques to ensure incremental safety and evaluate
the long-term stability of the system. This approach is intended to enable a
better understanding of the mechanisms of filter bubble generation and
dissolution and to offer insights into their prevention and management. This
research explores the
possibilities of applying theoretical physics frameworks to social science
problems and examines methods for analyzing the complex dynamics of information
flow and opinion formation in digital society.
\\ ( https://arxiv.org/abs/2403.08774 ,  1490kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08775 (*cross-listing*)
Date: Sun, 21 Jan 2024 21:57:22 GMT   (1687kb,D)

Title: Constrained Reinforcement Learning for Adaptive Controller
 Synchronization in Distributed SDN
Authors: Ioannis Panitsas, Akrit Mudvari, Leandros Tassiulas
Categories: cs.NI cs.AI
\\
 In software-defined networking (SDN), the implementation of distributed SDN
controllers, with each controller responsible for managing a specific
sub-network or domain, plays a critical role in achieving a balance between
centralized control, scalability, reliability, and network efficiency. These
controllers must be synchronized to maintain a logically centralized view of
the entire network. While there are various approaches for synchronizing
distributed SDN controllers, most tend to prioritize goals such as optimization
of communication latency or load balancing, often neglecting to address both
aspects simultaneously. This limitation becomes particularly significant
when considering applications like Augmented and Virtual Reality (AR/VR), which
demand constrained network latencies and substantial computational resources.
Additionally, many existing studies in this field predominantly rely on
value-based reinforcement learning (RL) methods, overlooking the potential
advantages offered by state-of-the-art policy-based RL algorithms. To bridge
this gap, our work focuses on examining deep reinforcement learning (DRL)
techniques, encompassing both value-based and policy-based methods, to
guarantee an upper latency threshold for AR/VR task offloading within SDN
environments, while selecting the most cost-effective servers for AR/VR task
offloading. Our evaluation results indicate that while value-based methods
excel in optimizing individual network metrics such as latency or load
balancing, policy-based approaches exhibit greater robustness in adapting to
sudden network changes or reconfiguration.
\\ ( https://arxiv.org/abs/2403.08775 ,  1687kb)
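A common way to turn a latency guarantee like this into something an RL agent can optimise is Lagrangian relaxation: subtract a multiplier times the constraint slack (latency minus cap) from the reward, and raise the multiplier by dual ascent while the constraint is violated on average. The sketch below illustrates only that generic mechanism. The function names, the cost and latency units, and the learning rate are hypothetical, and the abstract does not state that this is the formulation used.

```python
def lagrangian_reward(server_cost: float, latency_ms: float, lam: float, latency_cap_ms: float) -> float:
    """Reward to maximise: negative server cost minus lambda times the latency slack."""
    return -server_cost - lam * (latency_ms - latency_cap_ms)


def update_multiplier(lam: float, mean_latency_ms: float, latency_cap_ms: float, lr: float) -> float:
    """Projected dual ascent: lambda grows while average latency exceeds the cap, never below 0."""
    return max(0.0, lam + lr * (mean_latency_ms - latency_cap_ms))


lam = 0.5
print(lagrangian_reward(2.0, 30.0, lam, 20.0))    # -7.0: 10 ms over the cap is penalised
print(update_multiplier(lam, 30.0, 20.0, 0.01))   # about 0.6: violation, so lambda increases
print(update_multiplier(0.05, 10.0, 20.0, 0.01))  # 0.0: constraint satisfied, lambda clipped at zero
```

The same shaped reward can be fed to either a value-based or a policy-based learner, which is what makes this relaxation a convenient baseline when comparing the two families.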
------------------------------------------------------------------------------
\\
arXiv:2403.08776 (*cross-listing*)
Date: Mon, 22 Jan 2024 13:54:40 GMT   (37041kb,D)

Title: Leveraging Chat-Based Large Vision Language Models for Multimodal
 Out-Of-Context Detection
Authors: Fatma Shalabi, Hichem Felouat, Huy H. Nguyen, and Isao Echizen
Categories: cs.CV cs.AI
Comments: 13 pages, 6 figures, conference
\\
 Out-of-context (OOC) detection is a challenging task involving identifying
images and texts that are irrelevant to the context in which they are
presented. Large vision-language models (LVLMs) are effective at various tasks,
including image classification and text generation. However, the extent of
their proficiency in multimodal OOC detection tasks is unclear. In this paper,
we investigate the ability of LVLMs to detect multimodal OOC and show that
these models cannot achieve high accuracy on OOC detection tasks without
fine-tuning. However, we demonstrate that fine-tuning LVLMs on multimodal OOC
datasets can further improve their OOC detection accuracy. To evaluate the
performance of LVLMs on OOC detection tasks, we fine-tune MiniGPT-4 on the
NewsCLIPpings dataset, a large dataset of multimodal OOC. Our results show that
fine-tuning MiniGPT-4 on the NewsCLIPpings dataset significantly improves the
OOC detection accuracy on this dataset. This suggests that fine-tuning can
significantly improve the performance of LVLMs on OOC detection tasks.
\\ ( https://arxiv.org/abs/2403.08776 ,  37041kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08782 (*cross-listing*)
Date: Sun, 28 Jan 2024 14:22:27 GMT   (5904kb,D)

Title: Procedural terrain generation with style transfer
Authors: Fabio Merizzi
Categories: cs.CV cs.AI
\\
 In this study, we introduce a new technique for the generation of terrain
maps, exploiting a combination of procedural generation and Neural Style
Transfer. We consider our approach to be a viable alternative to competing
generative models, with our technique achieving greater versatility, lower
hardware requirements and greater integration in the creative process of
designers and developers. Our method involves generating procedural noise maps
using either multi-layered smoothed Gaussian noise or the Perlin algorithm. We
then employ an enhanced Neural Style Transfer technique, drawing style from
real-world height maps. This fusion of algorithmic generation and neural
processing holds the potential to produce terrains that are not only diverse
but also closely aligned with the morphological characteristics of real-world
landscapes, with our process yielding consistent terrain structures with low
computational cost and offering the capability to create customized maps.
Numerical evaluations further validate our model's enhanced ability to
accurately replicate terrain morphology, surpassing traditional procedural
methods.
\\ ( https://arxiv.org/abs/2403.08782 ,  5904kb)
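The first stage of the method, multi-layered smoothed Gaussian noise, can be sketched in pure Python: draw white Gaussian noise for several layers, blur each with a Gaussian kernel (wide kernels for broad landforms, narrow ones for detail), sum the layers with decreasing weights, and normalise to [0, 1]. The layer widths, the `persistence` weight decay, and the wrap-around boundary are illustrative assumptions; the Perlin alternative and the Neural Style Transfer stage are not shown.

```python
import math
import random
from typing import List, Sequence

Grid = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalised 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur_1d(row: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve one row with the kernel, wrapping around the edges (tileable output)."""
    r = len(kernel) // 2
    n = len(row)
    return [sum(kernel[j] * row[(i + j - r) % n] for j in range(len(kernel))) for i in range(n)]


def blur_2d(grid: Grid, sigma: float) -> Grid:
    """Separable Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_1d(row, kernel) for row in grid]
    cols = [blur_1d(col, kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def layered_gaussian_noise(size: int, sigmas=(8.0, 4.0, 2.0), persistence: float = 0.5, seed: int = 0) -> Grid:
    """Sum of blurred white-noise layers, coarse to fine, normalised to [0, 1]."""
    rng = random.Random(seed)
    height = [[0.0] * size for _ in range(size)]
    amplitude = 1.0
    for sigma in sigmas:
        white = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        smooth = blur_2d(white, sigma)
        for y in range(size):
            for x in range(size):
                height[y][x] += amplitude * smooth[y][x]
        amplitude *= persistence  # finer layers contribute less relief
    lo = min(min(row) for row in height)
    hi = max(max(row) for row in height)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in height]


heightmap = layered_gaussian_noise(32, seed=42)
print(min(min(r) for r in heightmap), max(max(r) for r in heightmap))  # 0.0 1.0
```

The resulting height map would serve as the content input for the style-transfer stage, which then imposes the morphology of real-world terrain.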
------------------------------------------------------------------------------
\\
arXiv:2403.08783 (*cross-listing*)
Date: Mon, 29 Jan 2024 11:55:14 GMT   (7409kb,D)

Title: Image-Text Out-Of-Context Detection Using Synthetic Multimodal
 Misinformation
Authors: Fatma Shalabi, Huy H. Nguyen, Hichem Felouat, Ching-Chun Chang, and
 Isao Echizen
Categories: cs.CV cs.AI cs.CL
Comments: 8 pages, 2 figures, conference
DOI: 10.1109/APSIPAASC58517.2023.10317336
\\
 Misinformation has become a major challenge in the era of increasing digital
information, requiring the development of effective detection methods. We have
investigated a novel approach to Out-Of-Context detection (OOCD) that uses
synthetic data generation. We created a dataset specifically designed for OOCD
and developed an efficient detector for accurate classification. Our
experimental findings validate the use of synthetic data generation and
demonstrate its efficacy in addressing the data limitations associated with
OOCD. The dataset and detector should serve as valuable resources for future
research and the development of robust misinformation detection systems.
\\ ( https://arxiv.org/abs/2403.08783 ,  7409kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08786 (*cross-listing*)
Date: Tue, 30 Jan 2024 02:00:28 GMT   (15563kb)

Title: One-Spike SNN: Single-Spike Phase Coding with Base Manipulation for
 ANN-to-SNN Conversion Loss Minimization
Authors: Sangwoo Hwang and Jaeha Kung
Categories: cs.NE cs.AI
Comments: 11 pages, 10 figures
MSC-class: 68T07
\\
 Because spiking neural networks (SNNs) are event-driven, they are more
energy-efficient than conventional artificial neural networks (ANNs). Since SNNs
deliver data through discrete spikes, gradient methods are difficult to use for
training, which limits their accuracy. To keep the accuracy of SNNs close to that
of their ANN counterparts, pre-trained ANNs are converted to SNNs (ANN-to-SNN
conversion). During the conversion, encoding the activations of ANNs into a set
of spikes in SNNs is crucial for minimizing the conversion loss. In this work, we
propose single-spike phase coding as an encoding scheme that minimizes the number
of spikes needed to transfer data between SNN layers. To minimize the encoding
error due to the single-spike approximation in phase coding, threshold shift and
base manipulation are proposed. Without any additional retraining or
architectural constraints on ANNs, the proposed conversion method loses little
inference accuracy (0.58% on average), as verified on three convolutional neural
networks (CNNs) with the CIFAR and ImageNet datasets. In addition, graph
convolutional networks (GCNs) are converted to SNNs successfully with an average
accuracy loss of 0.90%. Most importantly, the energy efficiency of our SNN
improves by 4.6 to 17.3 times compared to the ANN baseline.
\\ ( https://arxiv.org/abs/2403.08786 ,  15563kb)
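The abstract does not define threshold shift or base manipulation precisely, so the following is only a hypothetical illustration of why the choice of base matters for a single-spike code. It assumes that a spike at phase k stands for the weight base**-k and that an activation in (0, 1] must be carried by exactly one spike, so it is rounded to the nearest representable weight. A base closer to 1 packs the representable values more densely in the mid range (lower error there) at the cost of covering fewer small values for the same number of phases.

```python
def single_spike_encode(x, base=2.0, num_phases=8):
    """Approximate x in (0, 1] by one spike at phase k, which carries weight base**-k.

    Returns (phase, decoded_value). Hypothetical illustration only: a multi-spike
    phase code would sum several such weights, while a single spike keeps just one.
    """
    best_phase = min(range(num_phases), key=lambda k: abs(x - base ** -k))
    return best_phase, base ** -best_phase


def mean_abs_error(base, num_phases=8):
    """Average single-spike approximation error over activations 0.10, 0.11, ..., 1.00."""
    xs = [i / 100 for i in range(10, 101)]
    return sum(abs(x - single_spike_encode(x, base, num_phases)[1]) for x in xs) / len(xs)


for b in (2.0, 1.5):
    print(b, round(mean_abs_error(b), 4))
```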
------------------------------------------------------------------------------
\\
arXiv:2403.08788 (*cross-listing*)
Date: Tue, 30 Jan 2024 09:05:38 GMT   (6237kb,D)

Title: Verification for Object Detection -- IBP IoU
Authors: No\'emie Cohen, M\'elanie Ducoffe, Ryma Boumazouza (CRIL), Christophe
 Gabreau, Claire Pagetti, Xavier Pucel, Audrey Galametz
Categories: cs.CV cs.AI cs.NE
\\
 We introduce a novel Interval Bound Propagation (IBP) approach for the formal
verification of object detection models, specifically targeting the
Intersection over Union (IoU) metric. The approach has been implemented in
open-source code, named IBP IoU, that is compatible with popular verification
tools based on abstract interpretation. The resulting verifier is evaluated on
landing approach runway detection and handwritten digit recognition case
studies. Comparisons against a baseline (Vanilla IBP IoU) highlight the
superior performance of IBP IoU in ensuring accuracy and stability,
contributing to more secure and robust machine learning applications.
\\ ( https://arxiv.org/abs/2403.08788 ,  6237kb)
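To make the verification target concrete, here is a naive interval bound on IoU. It is not the IBP IoU method from the paper: it assumes each predicted box coordinate (x1, y1, x2, y2) is only known to lie in an interval (for example, the output of interval bound propagation through a detector) and bounds intersection and union separately. The result is sound but loose, because it ignores that intersection and union move together.

```python
def area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(box, gt):
    """Exact IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    iy = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = ix * iy
    return inter / (area(box) + area(gt) - inter)


def naive_iou_bounds(lo, hi, gt):
    """Sound but loose IoU bounds when predicted coordinate i lies in [lo[i], hi[i]]."""
    # Overlap width grows with x2 and shrinks with x1, so pair the opposite interval ends.
    ix_lo = max(0.0, min(lo[2], gt[2]) - max(hi[0], gt[0]))
    ix_hi = max(0.0, min(hi[2], gt[2]) - max(lo[0], gt[0]))
    iy_lo = max(0.0, min(lo[3], gt[3]) - max(hi[1], gt[1]))
    iy_hi = max(0.0, min(hi[3], gt[3]) - max(lo[1], gt[1]))
    inter_lo, inter_hi = ix_lo * iy_lo, ix_hi * iy_hi
    # Bounds on the predicted box's own area.
    area_lo = max(0.0, lo[2] - hi[0]) * max(0.0, lo[3] - hi[1])
    area_hi = max(0.0, hi[2] - lo[0]) * max(0.0, hi[3] - lo[1])
    gt_area = area(gt)
    # Union = pred + gt - inter, and it can never be smaller than the ground-truth box.
    union_lo = max(gt_area, area_lo + gt_area - inter_hi)
    union_hi = area_hi + gt_area - inter_lo
    return inter_lo / union_hi, min(1.0, inter_hi / union_lo)


gt = (1.0, 1.0, 3.0, 3.0)
lower, upper = naive_iou_bounds((-0.2, -0.2, 1.8, 1.8), (0.2, 0.2, 2.2, 2.2), gt)
print(round(lower, 4), round(upper, 4))
```

When every interval collapses to a point, both bounds equal the exact IoU; the wider the intervals, the larger the gap a tighter method has room to close.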
------------------------------------------------------------------------------
\\
arXiv:2403.08789 (*cross-listing*)
Date: Tue, 30 Jan 2024 09:13:49 GMT   (7690kb,D)

Title: Bridging Human Concepts and Computer Vision for Explainable Face
 Verification
Authors: Miriam Doh (UMons, IRIDIA), Caroline Mazini Rodrigues (LRDE, LIGM),
 Nicolas Boutry (LRDE), Laurent Najman (LIGM), Matei Mancas (UMONS), Hugues
 Bersini (IRIDIA)
Categories: cs.CV cs.AI cs.HC cs.LG
\\
 With Artificial Intelligence (AI) influencing the decision-making process of
sensitive applications such as Face Verification, it is fundamental to ensure
the transparency, fairness, and accountability of decisions. Although
Explainable Artificial Intelligence (XAI) techniques exist to clarify AI
decisions, it is equally important to provide interpretability of these
decisions to humans. In this paper, we present an approach that combines
computer and human vision to make the explanations of a face verification
algorithm more interpretable. In particular, taking inspiration from the human
perceptual process, we study how machines perceive the human-semantic areas of a
face during face comparison tasks. We use Mediapipe, which provides a segmentation
technique that identifies distinct human-semantic facial regions, enabling the
machine's perception analysis. Additionally, we adapted two model-agnostic
algorithms to provide human-interpretable insights into the decision-making
processes.
\\ ( https://arxiv.org/abs/2403.08789 ,  7690kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08790 (*cross-listing*)
Date: Tue, 30 Jan 2024 10:29:01 GMT   (968kb,D)

Title: Using Sequential Runtime Distributions for the Parallel Speedup
 Prediction of SAT Local Search
Authors: Alejandro Arbelaez and Charlotte Truchet and Philippe Codognet
Categories: cs.DC cs.AI
Journal-ref: Theory and Practice of Logic Programming. 2013;13(4-5):625-639
DOI: 10.1017/S1471068413000392
\\
 This paper presents a detailed analysis of the scalability and
parallelization of local search algorithms for the Satisfiability problem. We
propose a framework to estimate the parallel performance of a given algorithm
by analyzing the runtime behavior of its sequential version. Indeed, by
approximating the runtime distribution of the sequential process with
statistical methods, the runtime behavior of the parallel process can be
predicted by a model based on order statistics. We apply this approach to study
the parallel performance of two SAT local search solvers, namely Sparrow and
CCASAT, and compare the predicted performance with the results of actual
experiments on parallel hardware with up to 384 cores. We show that the model is
accurate and predicts performance close to the empirical data. Moreover, as we
study different types of instances (random and crafted), we observe that the
local search solvers exhibit different behaviors and that their runtime
distributions can be approximated by two types of distributions: exponential
(shifted and non-shifted) and lognormal.
\\ ( https://arxiv.org/abs/2403.08790 ,  968kb)
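The order-statistics idea can be illustrated with a minimal sketch. Assuming n independent copies of a local search solver run in parallel and the first to finish wins, the parallel runtime is the minimum of n draws from the sequential runtime distribution. For a shifted exponential this minimum has a closed form; for a lognormal, E[min] = integral of S(t)**n dt (S is the survival function) is integrated numerically here. The parameter values are placeholders, not fits from the paper.

```python
import math


def speedup_shifted_exponential(x0, rate, n):
    """Predicted speedup when sequential runtimes follow X = x0 + Exp(rate).

    The minimum of n independent copies is x0 + Exp(n * rate), so the
    expected parallel runtime is x0 + 1 / (n * rate).
    """
    return (x0 + 1.0 / rate) / (x0 + 1.0 / (n * rate))


def expected_min_lognormal(mu, sigma, n, steps=4000):
    """E[min of n i.i.d. LogNormal(mu, sigma)] via E[Y] = integral of S(t)**n dt.

    Substituting t = exp(u) gives the integral of S(exp(u))**n * exp(u) du,
    evaluated with the trapezoidal rule over mu +/- 10 sigma.
    """
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        survival = 0.5 * math.erfc((u - mu) / (sigma * math.sqrt(2.0)))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * survival ** n * math.exp(u)
    return total * h


def speedup_lognormal(mu, sigma, n):
    return expected_min_lognormal(mu, sigma, 1) / expected_min_lognormal(mu, sigma, n)


for cores in (4, 16, 64, 384):
    print(cores,
          round(speedup_shifted_exponential(1.0, 0.01, cores), 1),
          round(speedup_lognormal(0.0, 1.0, cores), 1))
```

With no shift (x0 = 0) the exponential model predicts exactly linear speedup n; any positive shift caps the speedup at (x0 + 1/rate) / x0 no matter how many cores are added.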
------------------------------------------------------------------------------
\\
arXiv:2403.08797 (*cross-listing*)
Date: Thu, 1 Feb 2024 19:22:02 GMT   (1239kb,D)

Title: Evolutionary Algorithms Simulating Molecular Evolution: A New Field
 Proposal
Authors: James S. L. Browning Jr., Daniel R. Tauritz, John Beckmann
Categories: cs.NE cs.AI
Comments: 7 pages, 2 figures
ACM-class: I.2.1
\\
 The genetic blueprint for the essential functions of life is encoded in DNA,
which is translated into proteins -- the engines driving most of our metabolic
processes. Recent advancements in genome sequencing have unveiled a vast
diversity of protein families, but compared to the massive search space of all
possible amino acid sequences, the set of known functional families is very small.
One could say nature has a limited protein "vocabulary." The major question for
computational biologists, therefore, is whether this vocabulary can be expanded
to include useful proteins that went extinct long ago, or maybe never evolved
in the first place. We outline a computational approach to solving this
problem. By merging evolutionary algorithms, machine learning (ML), and
bioinformatics, we can facilitate the development of completely novel proteins
which have never existed before. We envision this work forming a new sub-field
of computational evolution we dub evolutionary algorithms simulating molecular
evolution (EASME).
\\ ( https://arxiv.org/abs/2403.08797 ,  1239kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08807 (*cross-listing*)
Date: Tue, 6 Feb 2024 11:53:44 GMT   (313kb,D)

Title: Effective anytime algorithm for multiobjective combinatorial
 optimization problems
Authors: Miguel \'Angel Dom\'inguez-R\'ios, Francisco Chicano, Enrique Alba
Categories: cs.NE cs.AI
Journal-ref: Miguel \'Angel Dom\'inguez-R\'ios, Francisco Chicano, Enrique
 Alba: Effective anytime algorithm for multiobjective combinatorial
 optimization problems. Inf. Sci. 565: 210-228 (2021)
DOI: 10.1016/j.ins.2021.02.074
\\
 In multiobjective optimization, the result of an optimization algorithm is a
set of efficient solutions from which the decision maker selects one. It is
common that not all the efficient solutions can be computed in a short time and
the search algorithm has to be stopped prematurely to analyze the solutions
found so far. A set of efficient solutions that are well-spread in the
objective space is preferred to provide the decision maker with a great variety
of solutions. However, only a few exact algorithms in the literature can
provide such a well-spread set of solutions at any moment: we call them anytime
algorithms. We propose a new exact anytime algorithm for
multiobjective combinatorial optimization combining three novel ideas to
enhance the anytime behavior. We compare the proposed algorithm with those in
the state-of-the-art for anytime multiobjective combinatorial optimization
using a set of 480 instances from different well-known benchmarks and four
different performance measures: the overall non-dominated vector generation
ratio, the hypervolume, the general spread and the additive epsilon indicator.
A comprehensive experimental study reveals that our proposal outperforms the
previous algorithms in most of the instances.
\\ ( https://arxiv.org/abs/2403.08807 ,  313kb)
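Two of the four performance measures named above have standard textbook definitions that can be sketched for the bi-objective case. The sketch assumes both objectives are minimised and uses made-up points; it is not the paper's evaluation code.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (both objectives minimised) and bounded by `ref`."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # points dominated by an earlier one add no area
            volume += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return volume


def additive_epsilon(approx, reference):
    """Smallest eps such that every reference point is weakly dominated by some
    approximation point after subtracting eps from all of its objectives."""
    return max(
        min(max(a_i - r_i for a_i, r_i in zip(a, r)) for a in approx)
        for r in reference
    )


found = [(2.0, 3.0), (3.0, 2.0), (4.0, 1.0)]
reference_front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(found, (5.0, 5.0)), additive_epsilon(found, reference_front))
```

Higher hypervolume is better, while an additive epsilon of 0 means the approximation already covers the reference front.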
------------------------------------------------------------------------------
\\
arXiv:2403.08808 (*cross-listing*)
Date: Tue, 6 Feb 2024 13:20:56 GMT   (1641kb,D)

Title: A Bionic Data-driven Approach for Long-distance Underwater Navigation
 with Anomaly Resistance
Authors: Songnan Yang, Xiaohui Zhang, Shiliang Zhang, Xuehui Ma, Wenqi Bai,
 Yushuai Li, Tingwen Huang
Categories: cs.RO cs.AI
\\
 Various animals navigate accurately using environmental cues. The
Earth's magnetic field has been shown to be a reliable information source in
long-distance fauna migration. Inspired by animal navigation, this work
proposes a bionic and data-driven approach for long-distance underwater
navigation. The proposed approach uses measured geomagnetic data for
navigation and requires no GPS systems or geographical maps. In particular, we
construct and train a Temporal Attention-based Long Short-Term Memory (TA-LSTM)
network to predict the heading angle during navigation. To mitigate the
impact of geomagnetic anomalies, we develop a mechanism to detect and
quantify the anomalies based on Maximum Likelihood Estimation. We integrate this
mechanism with the TA-LSTM and calibrate the predicted heading angles to gain
resistance against geomagnetic anomalies. Using data retrieved from the World
Magnetic Model (WMM), we conduct numerical simulations under diverse
navigation conditions to test our approach. The simulation results show that
our approach navigates resiliently in the presence of geomagnetic anomalies,
with precise and stable underwater navigation in single- and
multiple-destination missions.
\\ ( https://arxiv.org/abs/2403.08808 ,  1641kb)
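The abstract only says that anomalies are detected and quantified "based on Maximum Likelihood Estimation". One generic reading, sketched below as an assumption rather than the authors' mechanism, is to fit a Gaussian to residuals collected under normal conditions by MLE and then score new residuals by how many standard deviations they lie from the fitted mean. The residual values are hypothetical.

```python
import math


def fit_gaussian_mle(samples):
    """Maximum-likelihood estimates of a Gaussian's mean and variance (divides by n)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var


def flag_anomalies(residuals, mean, var, threshold=3.0):
    """Return (index, z-score) for residuals lying more than `threshold` std devs from the mean."""
    std = math.sqrt(var)
    flagged = []
    for i, r in enumerate(residuals):
        z = (r - mean) / std  # the signed score also quantifies how strong the anomaly is
        if abs(z) > threshold:
            flagged.append((i, z))
    return flagged


# Hypothetical residuals (measured minus expected field intensity, arbitrary units).
calibration = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0]
mean, var = fit_gaussian_mle(calibration)
print(flag_anomalies([0.05, -0.1, 4.0, 0.2], mean, var))
```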
------------------------------------------------------------------------------
\\
arXiv:2403.08810 (*cross-listing*)
Date: Wed, 7 Feb 2024 21:15:18 GMT   (16797kb)

Title: Comparison of edge computing methods in Internet of Things architectures
 for efficient estimation of indoor environmental parameters with Machine
 Learning
Authors: Jose-Carlos Gamazo-Real, Raul Torres Fernandez, Adrian Murillo Armas
Categories: cs.NI cs.AI cs.AR cs.DC cs.IT cs.LG math.IT
Journal-ref: Engineering Applications of Artificial Intelligence, 2023, vol.
 126, Part D, no. 107149, pp. 1-27, ISSN 0952-1976
DOI: 10.1016/j.engappai.2023.107149
\\
 The large increase in the number of Internet of Things (IoT) devices has
revolutionised the way data is processed. Together with the current shift from
cloud to edge computing, this has created a need for efficient and reliable
data processing near the data sources using energy-efficient devices. Two
methods based on low-cost edge-IoT architectures are proposed to implement
lightweight Machine Learning (ML) models, such as Multilayer Perceptron
Artificial Neural Networks, that estimate indoor environmental quality (IEQ)
parameters. Their implementation is based on centralised and distributed
parallel IoT architectures, connected via wireless, which share commercial
off-the-shelf modules for data acquisition and sensing, such as sensors for
temperature, humidity, illuminance, CO2, and other gases. The centralised
method uses a Graphics Processing Unit and the Message Queuing Telemetry
Transport protocol, whereas the distributed method uses low-performance
ARM-based devices and the Message Passing Interface protocol. Although multiple
IEQ parameters are measured, the ML models are trained and tested in
experiments focused on small temperature and illuminance datasets, obtained
from sudden-spike, square-profile and sawtooth test cases, to reduce the data
processing load. The results show high estimation performance, with F-score and
Accuracy values close to 0.95, and an almost theoretical speedup with a
reduction in power consumption close to 37% in the distributed parallel
approach. In addition, similar or slightly better performance is achieved
compared to equivalent IoT architectures from related research, with an error
reduction of 35 to 76% and an adequate balance between performance and energy
efficiency.
\\ ( https://arxiv.org/abs/2403.08810 ,  16797kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08813 (*cross-listing*)
Date: Sat, 10 Feb 2024 10:34:20 GMT   (2312kb,D)

Title: Federated Deep Q-Learning and 5G load balancing
Authors: Hsin Lin, Yi-Kang Su, Hong-Qi Chen, La-Fei Ko
Categories: cs.NI cs.AI cs.LG cs.MA
Comments: 5 pages, in Chinese language. 8 figures. Presented at 2022 Taiwan
 telecommunications annual symposium
\\
 Despite advances in cellular network technology, base station (BS) load
balancing remains a persistent problem. Although centralized resource
allocation methods can address load balancing, the problem remains NP-hard.
In this research, we study how federated deep Q-learning can be used to inform
each user equipment (UE) of each BS's load conditions. Federated deep
Q-learning load balancing enables intelligent UEs to independently select the
best BS while also limiting the amount of private information exposed to the
network.
 In this study, we propose and analyze a federated deep Q-learning load
balancing system, which is implemented using the Open-RAN xApp framework and
the near-Real-Time RAN Intelligent Controller (near-RT RIC). Our simulation
results indicate that, compared to the maximum
signal-to-interference-plus-noise ratio (MAX-SINR) method currently used by
UEs, our proposed deep Q-learning model consistently provides higher average UE
quality of service.
\\ ( https://arxiv.org/abs/2403.08813 ,  2312kb)
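The abstract does not specify the aggregation rule or network details, so the sketch below assumes the common FedAvg scheme: each UE trains a local Q-network on its own observations (using the usual one-step deep Q-learning target), and only the model weights, never the raw data, are averaged on the server, weighted by how many samples each UE contributed. The weight vectors and sample counts are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: per-parameter mean weighted by each client's sample count."""
    total = sum(client_sizes)
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    ]


def dqn_target(reward, next_q_values, gamma=0.9, done=False):
    """One-step deep Q-learning target: r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


# Hypothetical: two UEs train local Q-networks (flattened to short weight vectors here)
# and a server merges them without seeing the UEs' raw observations.
ue_a = [0.2, -0.4, 1.0]
ue_b = [0.6, 0.0, 0.5]
global_weights = federated_average([ue_a, ue_b], [100, 300])
print(global_weights)
print(dqn_target(1.0, [0.3, 0.8], gamma=0.9))
```

In a load-balancing setting each action could correspond to choosing one candidate BS, so the Q-network's output layer would have one entry per reachable BS.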
------------------------------------------------------------------------------
\\
arXiv:2403.08824 (*cross-listing*)
Date: Sat, 9 Mar 2024 11:16:09 GMT   (12695kb,D)

Title: Measuring Non-Typical Emotions for Mental Health: A Survey of
 Computational Approaches
Authors: Puneet Kumar, Alexander Vedernikov, Xiaobai Li
Categories: cs.HC cs.AI cs.MM
Comments: Under review in IEEE Transactions on Affective Computing
\\
 Analysis of non-typical emotions, such as stress, depression and engagement,
is less common and more complex than that of frequently discussed
emotions like happiness, sadness, fear, and anger. The importance of these
non-typical emotions has been increasingly recognized due to their implications
for mental health and well-being. Stress and depression impact engagement in
daily tasks, highlighting the need to understand their interplay. This survey
is the first to simultaneously explore computational methods for analyzing
stress, depression, and engagement. We discuss the most commonly used datasets,
input modalities, data processing techniques, and information fusion methods
used for the computational analysis of stress, depression and engagement. A
timeline and taxonomy of non-typical emotion analysis approaches along with
their generic pipeline and categories are presented. Subsequently, we describe
state-of-the-art computational approaches for non-typical emotion analysis,
including a performance summary on the most commonly used datasets. Following
this, we explore the applications, along with the associated challenges,
limitations, and future research directions.
\\ ( https://arxiv.org/abs/2403.08824 ,  12695kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08828 (*cross-listing*)
Date: Mon, 11 Mar 2024 11:48:50 GMT   (2263kb,D)

Title: People Attribute Purpose to Autonomous Vehicles When Explaining Their
 Behavior
Authors: Balint Gyevnar and Stephanie Droop and Tadeg Quillien
Categories: cs.HC cs.AI cs.RO
\\
 A hallmark of a good XAI system is explanations that users can understand and
act on. In many cases, this requires a system to offer causal or counterfactual
explanations that are intelligible. Cognitive science can help us understand
what kinds of explanations users might expect, and in which format to frame
these explanations. We briefly review relevant literature from the cognitive
science of explanation, particularly as it concerns teleology, the tendency to
explain a decision in terms of the purpose it was meant to achieve. We then
report empirical data on how people generate explanations for the behavior of
autonomous vehicles, and how they evaluate these explanations. In the first
survey, participants (n=54) were shown videos of a road scene and asked to
generate either mechanistic, counterfactual, or teleological verbal
explanations for a vehicle's actions. In the second survey, a different set of
participants (n=356) rated these explanations along various metrics including
quality, trustworthiness, and how much each explanatory mode was emphasized in
the explanation. Participants deemed mechanistic and teleological explanations
as significantly higher quality than counterfactual explanations. In addition,
perceived teleology was the best predictor of perceived quality and
trustworthiness. Neither perceived teleology nor quality ratings were affected
by whether the car whose actions were being explained was an autonomous vehicle
or was being driven by a person. The results show people use and value
teleological concepts to evaluate information about both other people and
autonomous vehicles, indicating they find the 'intentional stance' a convenient
abstraction. We make our dataset of annotated video situations with
explanations, called Human Explanations for Autonomous Driving Decisions
(HEADD), publicly available, which we hope will prompt further research.
\\ ( https://arxiv.org/abs/2403.08828 ,  2263kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08833 (*cross-listing*)
Date: Wed, 13 Mar 2024 05:22:39 GMT   (15740kb,D)

Title: TINA: Think, Interaction, and Action Framework for Zero-Shot Vision
 Language Navigation
Authors: Dingbang Li, Wenzhou Chen, Xin Lin
Categories: cs.CV cs.AI
\\
 Zero-shot navigation is a critical challenge in Vision-Language Navigation
(VLN) tasks, where the ability to adapt to unfamiliar instructions and to act
in unknown environments is essential. Existing supervised learning-based
models, trained using annotated data through reinforcement learning, exhibit
limitations in generalization capabilities. Large Language Models (LLMs), with
their extensive knowledge and emergent reasoning abilities, present a potential
pathway for achieving zero-shot navigation. This paper presents a VLN agent
based on LLMs, exploring approaches to the zero-shot navigation problem. To
compensate for the shortcomings of LLMs in environmental perception, we propose
the Thinking, Interacting, and Action (TINA) framework. TINA enables the agent
to scrutinize perceptual information and autonomously query key clues within
the environment through an introduced question-answering module, thereby
aligning instructions with specific perceptual data. The navigation agent's
perceptual abilities are enhanced through the TINA framework, while the
explicit thought and query processes also improve the navigational procedure's
explainability and transparency. We evaluate the performance of our method on
the Room-to-Room dataset. The experiment results indicate that our approach
improves the navigation performance of LLM-based agents. Our approach also
outperformed some supervised learning-based methods, highlighting its efficacy
in zero-shot navigation.
\\ ( https://arxiv.org/abs/2403.08833 ,  15740kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08840 (*cross-listing*)
Date: Wed, 13 Mar 2024 12:32:25 GMT   (41815kb,D)

Title: NoiseDiffusion: Correcting Noise for Image Interpolation with Diffusion
 Models beyond Spherical Linear Interpolation
Authors: PengFei Zheng, Yonggang Zhang, Zhen Fang, Tongliang Liu, Defu Lian, Bo
 Han
Categories: cs.CV cs.AI
Comments: ICLR 2024
\\
 Image interpolation based on diffusion models is a promising way to create
fresh and interesting images. Advanced interpolation methods mainly focus on
spherical linear interpolation, where images are encoded into the noise space
and then interpolated for denoising to images. However, existing methods face
challenges in effectively interpolating natural images (not generated by
diffusion models), thereby restricting their practical applicability. Our
experimental investigations reveal that these challenges stem from the
invalidity of the encoding noise, which may no longer obey the expected noise
distribution, e.g., a normal distribution. To address these challenges, we
propose a novel approach to correct noise for image interpolation,
NoiseDiffusion. Specifically, NoiseDiffusion brings the invalid noise closer to
the expected distribution by introducing subtle Gaussian noise, and it imposes a
constraint to suppress noise with extreme values. In this context, promoting
noise validity contributes to mitigating image artifacts, but the constraint
and introduced exogenous noise typically lead to a reduction in signal-to-noise
ratio, i.e., loss of original image information. Hence, NoiseDiffusion performs
interpolation within the noisy image space and injects raw images into these
noisy counterparts to address the challenge of information loss. Consequently,
NoiseDiffusion enables us to interpolate natural images without causing
artifacts or information loss, thus achieving the best interpolation results.
\\ ( https://arxiv.org/abs/2403.08840 ,  41815kb)
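Spherical linear interpolation (slerp), the baseline named in the title, has a standard closed form, sketched below in plain Python. The `correct_noise` helper is a hypothetical reading of the two corrections the abstract mentions (adding subtle Gaussian noise and suppressing extreme values, here by clipping); its parameters are assumptions, and the injection of raw images into the noisy counterparts is not shown.

```python
import math
import random


def slerp(a, b, t):
    """Spherical linear interpolation between two latent noise vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if sin_theta < 1e-8:  # (anti)parallel vectors: fall back to linear interpolation
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    w_a = math.sin((1.0 - t) * theta) / sin_theta
    w_b = math.sin(t * theta) / sin_theta
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def correct_noise(z, extra_std, bound, rng):
    """Hypothetical correction: add small Gaussian noise, then clip extreme values."""
    return [max(-bound, min(bound, v + rng.gauss(0.0, extra_std))) for v in z]


rng = random.Random(0)
z0 = [rng.gauss(0.0, 1.0) for _ in range(16)]
z1 = [rng.gauss(0.0, 1.0) for _ in range(16)]
mid = correct_noise(slerp(z0, z1, 0.5), extra_std=0.1, bound=3.0, rng=rng)
print(len(mid), max(abs(v) for v in mid) <= 3.0)
```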
------------------------------------------------------------------------------
\\
arXiv:2403.08844 (*cross-listing*)
Date: Wed, 13 Mar 2024 15:54:49 GMT   (3925kb,D)

Title: AcademiaOS: Automating Grounded Theory Development in Qualitative
 Research with Large Language Models
Authors: Thomas \"Ubellacker
Categories: cs.HC cs.AI cs.IR
Comments: Live version: https://academia-os.org Source code:
 https://github.com/thomasuebi/academia-os
\\
 AcademiaOS is a first attempt to automate grounded theory development in
qualitative research with large language models. Using recent large language
models' language understanding, generation, and reasoning capabilities,
AcademiaOS codes curated qualitative raw data such as interview transcripts and
develops themes and dimensions to further develop a grounded theoretical model,
affording novel insights. A user study (n=19) suggests that the system finds
acceptance in the academic community and exhibits the potential to augment
humans in qualitative research. AcademiaOS has been made open-source for others
to build upon and adapt to their use cases.
\\ ( https://arxiv.org/abs/2403.08844 ,  3925kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08882 (*cross-listing*)
Date: Wed, 13 Mar 2024 18:11:17 GMT   (11190kb,D)

Title: Cultural evolution in populations of Large Language Models
Authors: J\'er\'emy Perez, Corentin L\'eger, Marcela Ovando-Tellez, Chris
 Foulon, Joan Dussauld, Pierre-Yves Oudeyer, Cl\'ement Moulin-Frier
Categories: cs.MA cs.AI q-bio.PE
Comments: 17 pages, 20 figures. Open-source code available at
 https://github.com/jeremyperez2/LLM-Culture
MSC-class: 68T50
ACM-class: I.2.7
\\
 Research in cultural evolution aims to provide causal explanations for the
change of culture over time. Over the past decades, this field has generated an
important body of knowledge, using experimental, historical, and computational
methods. While computational models have been very successful at generating
testable hypotheses about the effects of several factors, such as population
structure or transmission biases, some phenomena have so far been more complex
to capture using agent-based and formal models. This is in particular the case
for the effect of the transformations of social information induced by evolved
cognitive mechanisms. We here propose that leveraging the capacity of Large
Language Models (LLMs) to mimic human behavior may be fruitful to address this
gap. On top of being a useful approximation of human cultural dynamics,
multi-agent models featuring generative agents are also important to study for
their own sake. Indeed, as artificial agents are bound to participate
increasingly in the evolution of culture, it is crucial to better understand the
dynamics of machine-generated cultural evolution. We here present a framework
for simulating cultural evolution in populations of LLMs, allowing the
manipulation of variables known to be important in cultural evolution, such as
network structure, personality, and the way social information is aggregated
and transformed. The software we developed for conducting these simulations is
open-source and features an intuitive user-interface, which we hope will help
to build bridges between the fields of cultural evolution and generative
artificial intelligence.
\\ ( https://arxiv.org/abs/2403.08882 ,  11190kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08885 (*cross-listing*)
Date: Wed, 13 Mar 2024 18:12:53 GMT   (3674kb,D)

Title: SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion
 using a 3D Recurrent U-Net
Authors: Helin Cao, Sven Behnke
Categories: cs.CV cs.AI cs.RO
Comments: 2024 IEEE International Conference on Robotics and Automation
 (ICRA2024), Yokohama, Japan, May 2024
\\
 We introduce SLCF-Net, a novel approach for the Semantic Scene Completion
(SSC) task that sequentially fuses LiDAR and camera data. It jointly estimates
missing geometry and semantics in a scene from sequences of RGB images and
sparse LiDAR measurements. The images are semantically segmented by a
pre-trained 2D U-Net and a dense depth prior is estimated from a
depth-conditioned pipeline fueled by Depth Anything. To associate the 2D image
features with the 3D scene volume, we introduce Gaussian-decay Depth-prior
Projection (GDP). This module projects the 2D features into the 3D volume along
the line of sight with a Gaussian-decay function, centered around the depth
prior. Volumetric semantics is computed by a 3D U-Net. We propagate the hidden
3D U-Net state using the sensor motion and design a novel loss to ensure
temporal consistency. We evaluate our approach on the SemanticKITTI dataset and
compare it with leading SSC approaches. The SLCF-Net excels in all SSC metrics
and shows great temporal consistency.
\\ ( https://arxiv.org/abs/2403.08885 ,  3674kb)
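The Gaussian-decay Depth-prior Projection (GDP) step described above can be illustrated for a single pixel. This is a simplified sketch under stated assumptions, not the SLCF-Net implementation: the ray is sampled at a few hypothetical voxel depths, the weight is an unnormalised Gaussian exp(-(d - d_prior)**2 / (2 * sigma**2)) centred on the depth prior, and sigma is a free parameter chosen here.

```python
import math


def gaussian_decay_weights(depth_bins, depth_prior, sigma):
    """Weight for each voxel on a camera ray: 1 at the depth prior, decaying away from it."""
    return [math.exp(-((d - depth_prior) ** 2) / (2.0 * sigma ** 2)) for d in depth_bins]


def project_feature(feature, depth_bins, depth_prior, sigma):
    """Copy one pixel's 2D feature vector into every voxel along its ray, scaled by the weights."""
    weights = gaussian_decay_weights(depth_bins, depth_prior, sigma)
    return [[w * f for f in feature] for w in weights]


# Hypothetical ray sampled at 1 m intervals, depth prior 3.2 m, sigma 0.5 m.
bins = [1.0, 2.0, 3.0, 4.0, 5.0]
voxels = project_feature([0.8, -0.3], bins, depth_prior=3.2, sigma=0.5)
for d, v in zip(bins, voxels):
    print(d, [round(x, 3) for x in v])
```

A smaller sigma trusts the depth prior more and concentrates the feature near one voxel; a larger sigma spreads it along the line of sight when the prior is uncertain.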
------------------------------------------------------------------------------
\\
arXiv:2403.08906 (*cross-listing*)
Date: Wed, 13 Mar 2024 18:54:27 GMT   (417kb,D)

Title: Strategizing against Q-learners: A Control-theoretical Approach
Authors: Yuksel Arslantas, Ege Yuceel and Muhammed O. Sayin
Categories: cs.GT cs.AI math.OC
\\
 In this paper, we explore the susceptibility of the Q-learning algorithm (a
classical and widely used reinforcement learning method) to strategic
manipulation by sophisticated opponents in games. We quantify how much a
strategically sophisticated agent can exploit a naive Q-learner if she knows
the opponent's Q-learning algorithm. To this end, we formulate the strategic
actor's problem as a Markov decision process (with a continuum state space
encompassing all possible Q-values) as if the Q-learning algorithm is the
underlying dynamical system. We also present a quantization-based approximation
scheme to tackle the continuum state space and analyze its performance both
analytically and numerically.
\\ ( https://arxiv.org/abs/2403.08906 ,  417kb)
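The setting above treats the learner's Q-values as the state that a strategic agent steers. The sketch below shows the ingredients under simplifying assumptions chosen here: a stateless Q-learner in a repeated 2x2 game, a Boltzmann (softmax) policy, a greedy pick so the demo is deterministic, a fixed action sequence standing in for the strategic agent's plan, and rounding to a grid as one possible quantization of the continuous Q-space. The game payoffs and step size are hypothetical.

```python
import math


def boltzmann_policy(q_values, temperature=1.0):
    """Softmax action probabilities, a common exploration rule for Q-learners in games."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q_values, action, reward, alpha=0.1):
    """Stateless Q-learning step for a repeated game: Q[a] += alpha * (r - Q[a])."""
    new_q = list(q_values)
    new_q[action] += alpha * (reward - new_q[action])
    return new_q


def quantize(q_values, step):
    """Snap a continuous Q-vector to a grid, giving the strategic agent finitely many states."""
    return tuple(step * round(q / step) for q in q_values)


# Hypothetical 2x2 game: learner_payoff[learner_action][strategic_action].
learner_payoff = [[3.0, 0.0], [5.0, 1.0]]
q = [0.0, 0.0]
for strategic_action in (0, 0, 1):  # a fixed sequence standing in for a planned strategy
    probs = boltzmann_policy(q)
    learner_action = max(range(len(q)), key=lambda a: probs[a])  # greedy pick keeps the demo deterministic
    q = q_update(q, learner_action, learner_payoff[learner_action][strategic_action], alpha=0.5)
print(q, quantize(q, 0.5))
```

Because the update rule is known, the strategic agent can predict exactly how each of its actions moves the learner's Q-vector, which is what makes the learner's dynamics usable as the state transition of an MDP.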
------------------------------------------------------------------------------
\\
arXiv:2403.08915 (*cross-listing*)
Date: Wed, 13 Mar 2024 19:11:58 GMT   (9129kb,D)

Title: Cross-Modal Learning of Housing Quality in Amsterdam
Authors: Alex Levering, Diego Marcos, Devis Tuia
Categories: cs.CV cs.AI
Comments: Presented at SIGSpatial GeoAI workshop '21
DOI: 10.1145/3486635.3491067
\\
 In our research we test data and models for the recognition of housing
quality in the city of Amsterdam from ground-level and aerial imagery. For
ground-level images we compare Google StreetView (GSV) to Flickr images. Our
results show that GSV predicts the most accurate building quality scores,
approximately 30% better than using only aerial images. However, we find that
through careful filtering and by using the right pre-trained model, Flickr
image features combined with aerial image features are able to halve the
performance gap to GSV features from 30% to 15%. Our results indicate that
there are viable alternatives to GSV for liveability factor prediction, which
is encouraging as GSV images are more difficult to acquire and not always
available.
\\ ( https://arxiv.org/abs/2403.08915 ,  9129kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08933 (*cross-listing*)
Date: Wed, 13 Mar 2024 19:56:30 GMT   (805kb,D)

Title: Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images
Authors: Giuseppe Cartella, Vittorio Cuculo, Marcella Cornia, Rita Cucchiara
Categories: cs.CV cs.AI
Comments: Accepted to IEEE Signal Processing Letters 2024
\\
 Creating high-quality and realistic images is now possible thanks to the
impressive advancements in image generation. A description in natural language
of your desired output is all you need to obtain breathtaking results. However,
as the use of generative models grows, so do concerns about the propagation of
malicious content and misinformation. Consequently, the research community is
actively working on the development of novel fake detection techniques,
primarily focusing on low-level features and possible fingerprints left by
generative models during the image generation process. Taking a different
approach, we investigate whether human semantic knowledge can be incorporated
into fake image detection frameworks. To achieve this, we
collect a novel dataset of partially manipulated images using diffusion models
and conduct an eye-tracking experiment to record the eye movements of different
observers while viewing real and fake stimuli. A preliminary statistical
analysis is conducted to explore the distinctive patterns in how humans
perceive genuine and altered images. Statistical findings reveal that, when
perceiving counterfeit samples, humans tend to focus on more confined regions
of the image, in contrast to the more dispersed pattern observed when viewing
genuine images. Our dataset is publicly available at:
https://github.com/aimagelab/unveiling-the-truth.
\\ ( https://arxiv.org/abs/2403.08933 ,  805kb)
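One simple way to quantify whether gaze is "confined" or "dispersed" is the mean Euclidean distance of fixation points from their centroid. This statistic is an assumption for illustration. The abstract does not say which dispersion measure the authors used, and the coordinates below are made up.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def fixation_dispersion(fixations: Sequence[Point]) -> float:
    """Mean Euclidean distance of fixations from their centroid (smaller = more confined)."""
    if not fixations:
        raise ValueError("need at least one fixation")
    n = len(fixations)
    cx = sum(x for x, _ in fixations) / n
    cy = sum(y for _, y in fixations) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in fixations) / n


# Hypothetical fixation coordinates in pixels, not data from the paper.
confined = [(100.0, 100.0), (102.0, 98.0), (99.0, 101.0), (101.0, 103.0)]
dispersed = [(20.0, 30.0), (300.0, 40.0), (150.0, 250.0), (280.0, 260.0)]
print(fixation_dispersion(confined) < fixation_dispersion(dispersed))  # True
```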
------------------------------------------------------------------------------
\\
arXiv:2403.08936 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:11:20 GMT   (26613kb,D)

Title: Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient
 Multi-Agent Reinforcement Learning
Authors: Peihong Yu, Manav Mishra, Alec Koppel, Carl Busart, Priya Narayan,
 Dinesh Manocha, Amrit Bedi, Pratap Tokekar
Categories: cs.MA cs.AI cs.RO
\\
 Multi-Agent Reinforcement Learning (MARL) algorithms face the challenge of
efficient exploration due to the exponential increase in the size of the joint
state-action space. While demonstration-guided learning has proven beneficial
in single-agent settings, its direct applicability to MARL is hindered by the
practical difficulty of obtaining joint expert demonstrations. In this work, we
introduce a novel concept of personalized expert demonstrations, tailored for
each individual agent or, more broadly, each individual type of agent within a
heterogeneous team. These demonstrations pertain solely to single-agent
behaviors, i.e., how each agent can achieve its personal goals, and contain no
cooperative elements; naively imitating them will therefore not achieve
cooperation because of potential conflicts. To this end, we propose
personalized expert-guided MARL (PegMARL), an approach that selectively uses
personalized expert demonstrations as guidance while allowing agents to learn
to cooperate. This algorithm utilizes two discriminators: the first provides
incentives based on the alignment of policy behavior with demonstrations, and
the second regulates incentives based on whether the behavior leads to the
desired objective. We evaluate PegMARL using personalized demonstrations in
both discrete and continuous environments. The results demonstrate that PegMARL
learns near-optimal policies even when provided with suboptimal demonstrations,
and outperforms state-of-the-art MARL algorithms in solving coordinated tasks.
We also showcase PegMARL's capability to leverage joint demonstrations in the
StarCraft scenario and converge effectively even with demonstrations from
non-co-trained policies.
\\ ( https://arxiv.org/abs/2403.08936 ,  26613kb)
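The abstract describes two discriminators but not how their outputs enter the reward. The sketch below is a hypothetical combination, not PegMARL's actual formula: a GAIL-style log-odds bonus from the demonstration discriminator (`d_demo`) is gated by the goal discriminator's output (`d_goal`) and added to the environment reward with an assumed weight `beta`.

```python
import math


def shaped_reward(env_reward: float, d_demo: float, d_goal: float, beta: float = 0.1) -> float:
    """Hypothetical per-agent reward combining two discriminator outputs.

    d_demo: probability in (0, 1) that the agent's (state, action) matches its
            personalized demonstrations.
    d_goal: weight in [0, 1] from a second discriminator judging whether the
            behavior leads toward the desired objective.
    """
    if not 0.0 < d_demo < 1.0:
        raise ValueError("d_demo must lie strictly between 0 and 1")
    if not 0.0 <= d_goal <= 1.0:
        raise ValueError("d_goal must lie in [0, 1]")
    imitation_bonus = math.log(d_demo) - math.log(1.0 - d_demo)  # log-odds of "looks like the demo"
    # The goal discriminator scales (regulates) the imitation incentive.
    return env_reward + beta * d_goal * imitation_bonus
```

With `d_goal` near 0, imitating a demonstration that conflicts with the team objective earns almost no bonus. This matches the abstract's intuition for using personalized demonstrations only selectively.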
------------------------------------------------------------------------------
\\
arXiv:2403.08937 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:12:01 GMT   (1323kb,D)

Title: Bugs in Large Language Models Generated Code
Authors: Florian Tambon, Arghavan Moradi Dakhel, Amin Nikanjam, Foutse Khomh,
 Michel C. Desmarais, Giuliano Antoniol
Categories: cs.SE cs.AI
Comments: 47 pages, 7 figures
\\
 Large Language Models (LLMs) for code have gained significant attention
recently. They can generate code in different programming languages based on
provided prompts, fulfilling a long-standing dream in Software Engineering (SE),
i.e., automatic code generation. Similar to human-written code, LLM-generated
code is prone to bugs, and these bugs have not yet been thoroughly examined by
the community. Given the increasing adoption of LLM-based code generation tools
(e.g., GitHub Copilot) in SE activities, it is critical to understand the
characteristics of bugs contained in code generated by LLMs. This paper
examines a sample of 333 bugs collected from code generated using three leading
LLMs (i.e., CodeGen, PanGu-Coder, and Codex) and identifies the following 10
distinctive bug patterns: Misinterpretations, Syntax Error, Silly Mistake,
Prompt-biased code, Missing Corner Case, Wrong Input Type, Hallucinated Object,
Wrong Attribute, Incomplete Generation, and Non-Prompted Consideration. The bug
patterns are presented in the form of a taxonomy. The identified bug patterns
are validated using an online survey with 34 LLM practitioners and researchers.
The surveyed participants generally affirmed the significance and prevalence of
the bug patterns. Researchers and practitioners can leverage these findings to
develop effective quality assurance techniques for LLM-generated code. This
study sheds light on the distinctive characteristics of LLM-generated code.
\\ ( https://arxiv.org/abs/2403.08937 ,  1323kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08944 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:21:20 GMT   (424kb)

Title: Language-based game theory in the age of artificial intelligence
Authors: Valerio Capraro, Roberto Di Paolo, Matjaz Perc, Veronica Pizziol
Categories: cs.GT cs.AI cs.CY econ.TH
Journal-ref: Journal of the Royal Society Interface, 21, 20230720 (2024)
\\
 Understanding human behaviour in decision problems and strategic interactions
has wide-ranging applications in economics, psychology, and artificial
intelligence. Game theory offers a robust foundation for this understanding,
based on the idea that individuals aim to maximize a utility function. However,
the exact factors influencing strategy choices remain elusive. While
traditional models try to explain human behaviour as a function of the outcomes
of available actions, recent experimental research reveals that linguistic
content significantly impacts decision-making, thus prompting a paradigm shift
from outcome-based to language-based utility functions. This shift is more
urgent than ever, given the advancement of generative AI, which has the
potential to support humans in making critical decisions through language-based
interactions. We propose sentiment analysis as a fundamental tool for this
shift and take an initial step by analyzing 61 experimental instructions from
the dictator game, an economic game capturing the balance between self-interest
and the interest of others, which is at the core of many social interactions.
Our meta-analysis shows that sentiment analysis can explain human behaviour
beyond economic outcomes. We discuss future research directions. We hope this
work sets the stage for a novel game theoretical approach that emphasizes the
importance of language in human decisions.
\\ ( https://arxiv.org/abs/2403.08944 ,  424kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08950 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:32:32 GMT   (286kb,D)

Title: Exploring Prompt Engineering Practices in the Enterprise
Authors: Michael Desmond and Michelle Brachman
Categories: cs.HC cs.AI
\\
 Interaction with Large Language Models (LLMs) is primarily carried out via
prompting. A prompt is a natural language instruction designed to elicit
certain behaviour or output from a model. In theory, natural language prompts
enable non-experts to interact with and leverage LLMs. However, for complex
tasks and tasks with specific requirements, prompt design is not trivial.
Creating effective prompts requires skill and knowledge, as well as significant
iteration to determine model behavior and guide the model toward a particular
goal. We hypothesize that the way in which users
iterate on their prompts can provide insight into how they think prompting and
models work, as well as the kinds of support needed for more efficient prompt
engineering. To better understand prompt engineering practices, we analyzed
sessions of prompt editing behavior, categorizing the parts of prompts users
iterated on and the types of changes they made. We discuss design implications
and future directions based on these prompt engineering practices.
\\ ( https://arxiv.org/abs/2403.08950 ,  286kb)
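The abstract does not describe the paper's categorization scheme. As a hypothetical starting point, Python's standard `difflib` can extract word-level changes between two successive prompt versions, which an analyst could then label by prompt part and change type. The helper name and example prompts are illustrative only.

```python
import difflib
from typing import List, Tuple


def prompt_edits(old: str, new: str) -> List[Tuple[str, str, str]]:
    """Word-level changes between two prompt versions as (kind, old_text, new_text)."""
    a, b = old.split(), new.split()
    edits = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete', or 'insert'
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


v1 = "Summarize the text in three sentences."
v2 = "Summarize the text in two short sentences."
print(prompt_edits(v1, v2))  # [('replace', 'three', 'two short')]
```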
------------------------------------------------------------------------------
\\
arXiv:2403.08956 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:51:21 GMT   (728kb)

Title: AI coach for badminton
Authors: Dhruv Toshniwal, Arpit Patil, Nancy Vachhani
Categories: cs.HC cs.AI
Comments: 7 pages, 11 figures. https://ieeexplore.ieee.org/document/9825164
Journal-ref: 2022 3rd International Conference for Emerging Technology (INCET),
 Belgaum, India, 2022, pp. 1-7
DOI: 10.1109/INCET54531.2022.9825164
\\
 In the competitive realm of sports, optimal performance necessitates rigorous
management of nutrition and physical conditioning. Specifically, in badminton,
the agility and precision required make it an ideal candidate for motion
analysis through video analytics. This study leverages advanced neural network
methodologies to dissect video footage of badminton matches, aiming to extract
detailed insights into player kinetics and biomechanics. Through the analysis
of stroke mechanics, including hand-hip coordination, leg positioning, and the
execution angles of strokes, the research aims to derive predictive models that
can suggest improvements in stance, technique, and muscle orientation. These
recommendations are designed to mitigate erroneous techniques, reduce the risk
of joint fatigue, and enhance overall performance. Utilizing a vast array of
data available online, this research correlates players' physical attributes
with their in-game movements to identify muscle activation patterns during
play. The goal is to offer personalized training and nutrition strategies that
align with the specific biomechanical demands of badminton, thereby
facilitating targeted performance enhancements.
\\ ( https://arxiv.org/abs/2403.08956 ,  728kb)
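Stroke execution angles and joint positions are typically measured from 2D pose keypoints. The following is a minimal sketch. It assumes the keypoints have already been extracted by some pose estimator (the abstract does not name one) and computes the angle at a joint from three points via the dot product, θ = arccos(v1·v2 / (|v1||v2|)).

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at keypoint b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("keypoints must be distinct")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


# Hypothetical 2D keypoints: shoulder, elbow, wrist.
print(round(joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)), 1))  # 90.0
```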
------------------------------------------------------------------------------
\\
arXiv:2403.08962 (*cross-listing*)
Date: Wed, 13 Mar 2024 21:05:34 GMT   (34280kb,D)

Title: Using Deep Learning for Morphological Classification in Pigs with a
 Focus on Sanitary Monitoring
Authors: Eduardo Bedin, Junior Silva Souza, Gabriel Toshio Hirokawa Higa,
 Alexandre Pereira, Charles Kiefer, Newton Loebens, Hemerson Pistori
Categories: cs.CV cs.AI
\\
 The aim of this paper is to evaluate the use of D-CNN (Deep Convolutional
Neural Networks) algorithms to classify pig body conditions as normal or
abnormal, focusing on characteristics observed in sanitary monitoring; six
different algorithms were used for this task. The study focused on five pig
characteristics, namely caudophagy, ear hematoma, scratches on the body,
redness, and natural stains (brown or black).
The results of the study showed that D-CNN was effective in classifying
deviations in pig body morphologies related to skin characteristics. The
evaluation was conducted by analyzing the performance metrics Precision,
Recall, and F-score, as well as the statistical analyses ANOVA and the
Scott-Knott test. The contribution of this article is characterized by the
proposal of using D-CNN networks for morphological classification in pigs, with
a focus on characteristics identified in sanitary monitoring. Among the best
results, an average Precision of 80.6\% for classifying caudophagy was
achieved by the InceptionResNetV2 network, indicating the potential of this
technology for the proposed task. Additionally, a new image database was
created, containing a variety of distinct pig body characteristics, which can
serve as data for future research.
\\ ( https://arxiv.org/abs/2403.08962 ,  34280kb)
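For reference, the Precision, Recall, and F-score metrics named above follow from per-class counts of true positives, false positives, and false negatives. The counts in this sketch are hypothetical, not the paper's results.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Per-class Precision, Recall, and F1 (F-score with beta = 1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for one class (e.g. caudophagy vs. normal).
print(precision_recall_f1(tp=6, fp=2, fn=6))  # (0.75, 0.5, 0.6)
```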
------------------------------------------------------------------------------
\\
arXiv:2403.08967 (*cross-listing*)
Date: Wed, 13 Mar 2024 21:19:12 GMT   (5653kb,D)

Title: PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for
 Whole Slide Image Classification and Captioning
Authors: Qifeng Zhou, Wenliang Zhong, Yuzhi Guo, Michael Xiao, Hehuan Ma,
 Junzhou Huang
Categories: cs.CV cs.AI
\\
 In the field of computational histopathology, both whole slide images (WSIs)
and diagnostic captions provide valuable insights for making diagnostic
decisions. However, aligning WSIs with diagnostic captions presents a
significant challenge. This difficulty arises from two main factors: 1)
Gigapixel WSIs are unsuitable for direct input into deep learning models, and
the redundancy and correlation among the patches demand more attention; and 2)
Authentic WSI diagnostic captions are extremely limited, making it difficult to
train an effective model. To overcome these obstacles, we present PathM3, a
multimodal, multi-task, multiple instance learning (MIL) framework for WSI
classification and captioning. PathM3 adapts a query-based transformer to
effectively align WSIs with diagnostic captions. Given that histopathology
visual patterns are redundantly distributed across WSIs, we aggregate the patch
features with an MIL method that considers the correlations among instances.
Furthermore, PathM3 overcomes the scarcity of WSI-level captions by leveraging
the limited WSI diagnostic caption data through multi-task joint learning.
Extensive experiments demonstrate the effectiveness of our method on both WSI
classification and captioning, with improved classification accuracy and
caption generation.
\\ ( https://arxiv.org/abs/2403.08967 ,  5653kb)
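The abstract does not give the form of PathM3's aggregation, which also models correlations among patches. As a simpler, swapped-in illustration of MIL aggregation, the sketch below shows softmax-attention pooling in the spirit of attention-based MIL (Ilse et al., 2018). It is reduced to a linear scoring vector `w`, which is an assumption, and it scores each patch independently.

```python
import math
from typing import List, Sequence


def attention_mil_pool(patches: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Pool patch features into one slide-level vector with softmax attention.

    Patch k gets score dot(w, h_k); weights are a softmax over the scores; the
    bag (slide) embedding is the attention-weighted sum of patch features.
    """
    if not patches:
        raise ValueError("a bag needs at least one patch")
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in patches]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(patches[0])
    return [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]


# With w = 0 every patch gets equal weight, so pooling reduces to the mean.
print(attention_mil_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0]))  # [2.0, 1.0]
```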
------------------------------------------------------------------------------
\\
arXiv:2403.08974 (*cross-listing*)
Date: Wed, 13 Mar 2024 21:43:24 GMT   (10344kb,D)

Title: Representing Anatomical Trees by Denoising Diffusion of Implicit Neural
 Fields
Authors: Ashish Sinha and Ghassan Hamarneh
Categories: cs.CV cs.AI
Comments: Preprint. In review. The code will be available at:
 https://github.com/sinAshish/TreeDiffusion
\\
 Anatomical trees play a central role in clinical diagnosis and treatment
planning. However, accurately representing anatomical trees is challenging due
to their varying and complex topology and geometry. Traditional representations
of tree structures captured using medical imaging, while invaluable for
visualizing vascular and bronchial networks, suffer from limited resolution,
flexibility, and efficiency. Recently, implicit neural
representations (INRs) have emerged as a powerful tool for representing shapes
accurately and efficiently. We propose a novel approach for representing
anatomical trees using INR, while also capturing the distribution of a set of
trees via denoising diffusion in the space of INRs. We accurately capture the
intricate geometries and topologies of anatomical trees at any desired
resolution. Through extensive qualitative and quantitative evaluation, we
demonstrate high-fidelity tree reconstruction with arbitrary resolution yet
compact storage, and versatility across anatomical sites and tree complexities.
\\ ( https://arxiv.org/abs/2403.08974 ,  10344kb)
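The abstract does not specify the diffusion formulation. For orientation only, the sketch below shows the standard DDPM forward (noising) step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied to a flattened vector of INR weights. The linear β schedule and its endpoints are common defaults assumed here, and the weight values are made up.

```python
import math
import random
from typing import List, Sequence


def alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: Sequence[float], abar_t: float, rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    a, s = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [a * v + s * rng.gauss(0.0, 1.0) for v in x0]


# x0 stands in for a flattened vector of INR weights (hypothetical values).
x0 = [0.5, -1.2, 0.3, 0.8]
abar = alpha_bars(1000)
xt = q_sample(x0, abar[-1], random.Random(0))  # nearly pure noise at the last step
```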
------------------------------------------------------------------------------
\\
arXiv:2403.08984 (*cross-listing*)
Date: Wed, 13 Mar 2024 22:19:06 GMT   (9216kb,D)

Title: Safe Road-Crossing by Autonomous Wheelchairs: a Novel Dataset and its
 Experimental Evaluation
Authors: Carlo Grigioni, Franca Corradini, Alessandro Antonucci, J\'er\^ome
 Guzzi and Francesco Flammini
Categories: cs.RO cs.AI cs.MA
Comments: 14 pages, 8 figures
MSC-class: 68T45
ACM-class: I.2.10; C.4; I.2.9; I.4.8
\\
 Safe road-crossing by self-driving vehicles is a crucial problem to address
in smart-cities. In this paper, we introduce a multi-sensor fusion approach to
support road-crossing decisions in a system composed of an autonomous
wheelchair and a flying drone featuring a robust sensory system made of diverse
and redundant components. To this end, we designed an analytical danger
function based on explainable physical conditions evaluated by individual sensors,
including those using machine learning and artificial vision. As a
proof-of-concept, we provide an experimental evaluation in a laboratory
environment, showing the advantages of using multiple sensors, which can
improve decision accuracy and effectively support safety assessment. We made
the dataset available to the scientific community for further experimentation.
This work was developed within a European project named
REXASI-PRO, which aims to develop trustworthy artificial intelligence for
social navigation of people with reduced mobility.
\\ ( https://arxiv.org/abs/2403.08984 ,  9216kb)
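The abstract does not give the paper's danger function. The following hypothetical sketch illustrates the idea. Each sensor estimates an approaching vehicle's distance and speed, and a per-sensor danger in [0, 1] compares the wheelchair's crossing time with the vehicle's time to arrival. The fused value takes the most pessimistic sensor. All names, the formula, and the max fusion rule are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Reading = Tuple[float, float]  # (distance_m, speed_mps) of an approaching vehicle


def sensor_danger(distance_m: float, speed_mps: float, crossing_time_s: float) -> float:
    """Danger in [0, 1]: reaches 1 when the vehicle could arrive before the crossing ends."""
    if speed_mps <= 0.0:  # vehicle stopped or moving away
        return 0.0
    time_to_arrival = distance_m / speed_mps
    if time_to_arrival <= 0.0:
        return 1.0
    return min(1.0, crossing_time_s / time_to_arrival)


def fused_danger(readings: Sequence[Optional[Reading]], crossing_time_s: float) -> float:
    """Conservative fusion: maximum danger over sensors that produced a reading."""
    values: List[float] = []
    for reading in readings:
        if reading is not None:  # a sensor may fail or lack a view of the road
            distance_m, speed_mps = reading
            values.append(sensor_danger(distance_m, speed_mps, crossing_time_s))
    if not values:
        return 1.0  # no information: treat the crossing as unsafe
    return max(values)


# Hypothetical readings: wheelchair camera, failed sensor, drone camera.
print(fused_danger([(100.0, 10.0), None, (200.0, 10.0)], crossing_time_s=8.0))  # 0.8
```

Taking the maximum keeps the decision safe when sensors disagree. The cost is that a single noisy sensor can block a crossing, which is one reason the abstract stresses redundant components.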
------------------------------------------------------------------------------
\\
arXiv:2403.09024 (*cross-listing*)
Date: Thu, 14 Mar 2024 01:28:13 GMT   (1343kb,D)

Title: Semiparametric Token-Sequence Co-Supervision
Authors: Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang,
 Kyoung-Woon On, Minjoon Seo
Categories: cs.CL cs.AI
\\
 In this work, we introduce a semiparametric token-sequence co-supervision
training method. It trains a language model by simultaneously leveraging
supervision from the traditional next token prediction loss which is calculated
over the parametric token embedding space and the next sequence prediction loss
which is calculated over the nonparametric sequence embedding space. The
nonparametric sequence embedding space is constructed by a separate language
model tasked to condense an input text into a single representative embedding.
Our experiments demonstrate that a model trained via both supervisions
consistently surpasses models trained via each supervision independently.
Analysis suggests that this co-supervision encourages broader generalization
capability across the model. In particular, the robustness of the parametric
token space established during pretraining tends to enhance the stability of
the nonparametric sequence embedding space, a new space established by another
language model.
\\ ( https://arxiv.org/abs/2403.09024 ,  1343kb)
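The following is a minimal sketch of the combined objective, L = L_NTP + λ·L_NSP, for a single position. Two details are assumptions: next-sequence prediction is treated as a softmax over dot-product similarities between a query vector and candidate sequence embeddings (produced by the separate embedding model), and the weight `lam` is illustrative. The abstract only states that the two losses are computed over the parametric token space and the nonparametric sequence space.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of `target` (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def co_supervision_loss(
    token_logits: Sequence[float],  # scores over the token vocabulary (parametric space)
    token_target: int,  # index of the true next token
    query: Sequence[float],  # LM hidden state used to predict the next sequence
    candidate_seq_embs: Sequence[Sequence[float]],  # embeddings from the separate LM
    seq_target: int,  # index of the true next sequence among the candidates
    lam: float = 1.0,  # hypothetical weight between the two losses
) -> float:
    ntp_loss = cross_entropy(token_logits, token_target)
    sims = [sum(q * e for q, e in zip(query, emb)) for emb in candidate_seq_embs]
    nsp_loss = cross_entropy(sims, seq_target)
    return ntp_loss + lam * nsp_loss
```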
------------------------------------------------------------------------------
\\
arXiv:2403.09029 (*cross-listing*)
Date: Thu, 14 Mar 2024 01:40:40 GMT   (2891kb,D)

Title: Unlocking the conversion of Web Screenshots into HTML Code with the
 WebSight Dataset
Authors: Hugo Lauren\c{c}on, L\'eo Tronchon, Victor Sanh
Categories: cs.HC cs.AI cs.CV
\\
 Using vision-language models (VLMs) in web development presents a promising
strategy to increase efficiency and unblock no-code solutions: by providing a
screenshot or a sketch of a UI, a VLM could generate the code to reproduce it,
for instance in a language like HTML. Despite the advancements in VLMs for
various tasks, the specific challenge of converting a screenshot into a
corresponding HTML code has been minimally explored. We posit that this is
mainly due to the absence of a suitable, high-quality dataset. This work
introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML
code and
their corresponding screenshots. We fine-tune a foundational VLM on our dataset
and show proficiency in converting webpage screenshots to functional HTML code.
To accelerate the research in this area, we open-source WebSight.
\\ ( https://arxiv.org/abs/2403.09029 ,  2891kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09057 (*cross-listing*)
Date: Thu, 14 Mar 2024 02:55:37 GMT   (3521kb,D)

Title: A Continued Pretrained LLM Approach for Automatic Medical Note
 Generation
Authors: Dong Yuan, Eti Rastogi, Gautam Naik, Jai Chintagunta, Sree Prasanna
 Rajagopal, Fen Zhao, Sagar Goyal, Jeff Ward
Categories: cs.CL cs.AI
\\
 LLMs are revolutionizing NLP tasks. However, the most powerful LLMs, such as
GPT-4, are too costly for most domain-specific scenarios. We present the first
continuously trained 13B Llama2-based LLM that is purpose-built for medical
conversations and measured on automated scribing. Our results show that our
model outperforms GPT-4 in PubMedQA with 76.6\% accuracy and matches its
performance in summarizing medical conversations into SOAP notes. Notably, our
model exceeds GPT-4 in capturing a higher number of correct medical concepts
and outperforms human scribes with higher correctness and completeness.
\\ ( https://arxiv.org/abs/2403.09057 ,  3521kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09063 (*cross-listing*)
Date: Thu, 14 Mar 2024 03:07:58 GMT   (11797kb,D)

Title: Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery
Authors: Jerrin Bright, Bavesh Balaji, Harish Prakash, Yuhao Chen, David A
 Clausi and John Zelek
Categories: cs.CV cs.AI
Comments: Submitted to 21st International Conference on Robots and Vision
 (CRV'24), Guelph, Ontario, Canada
\\
 Precise Human Mesh Recovery (HMR) with in-the-wild data is a formidable
challenge and is often hindered by depth ambiguities and reduced precision.
Existing works resort to either pose priors or multi-modal data such as
multi-view or point cloud information, though their methods often overlook the
valuable scene-depth information inherently present in a single image.
Moreover, achieving robust HMR for out-of-distribution (OOD) data is
exceedingly challenging due to inherent variations in pose, shape and depth.
Consequently, understanding the underlying distribution becomes a vital
subproblem in modeling human forms. Motivated by the need for unambiguous and
robust human modeling, we introduce Distribution and depth-aware human mesh
recovery (D2A-HMR), an end-to-end transformer architecture meticulously
designed to minimize the disparity between distributions and incorporate
scene depth by leveraging prior depth information. Our approach demonstrates
superior performance in handling OOD data in certain scenarios while
consistently achieving competitive results against state-of-the-art HMR methods
on controlled datasets.
\\ ( https://arxiv.org/abs/2403.09063 ,  11797kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09072 (*cross-listing*)
Date: Thu, 14 Mar 2024 03:29:58 GMT   (2255kb,D)

Title: UniCode: Learning a Unified Codebook for Multimodal Large Language
 Models
Authors: Sipeng Zheng, Bohan Zhou, Yicheng Feng, Ye Wang, Zongqing Lu
Categories: cs.CV cs.AI cs.CL
Comments: 14 pages, 2 figures, 11 tables
\\
 In this paper, we propose \textbf{UniCode}, a novel approach within the
domain of multimodal large language models (MLLMs) that learns a unified
codebook to efficiently tokenize visual, text, and potentially other types of
signals. This innovation addresses a critical limitation in existing MLLMs:
their reliance on a text-only codebook, which restricts MLLMs' ability to
generate images and texts in a multimodal context. Towards this end, we propose
a language-driven iterative training paradigm, coupled with an in-context
pre-training task we term ``image decompression'', enabling our model to
interpret compressed visual data and generate high-quality images. The unified
codebook empowers our model to extend visual instruction tuning to
non-linguistic generation tasks. Moreover, UniCode is adaptable to diverse
stacked quantization approaches in order to compress visual signals into a more
compact token representation. Despite using significantly fewer parameters and
less data during training, UniCode demonstrates promising capabilities in
visual reconstruction and generation. It also achieves performance comparable
to leading MLLMs across a spectrum of VQA benchmarks.
\\ ( https://arxiv.org/abs/2403.09072 ,  2255kb)
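The abstract mentions "stacked quantization" without detail. Residual vector quantization is one well-known stacked scheme. Each level quantizes what the previous levels left over, so a single vector becomes a short stack of codes. The sketch below assumes one codebook shared across levels (in the spirit of a unified codebook), and the codebook values are made up.

```python
from typing import List, Sequence, Tuple


def nearest(codebook: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the codebook entry with the smallest squared distance to x."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)),
    )


def residual_quantize(
    x: Sequence[float], codebook: Sequence[Sequence[float]], depth: int
) -> Tuple[List[int], List[float]]:
    """Encode x as `depth` stacked codes; return (codes, reconstruction)."""
    residual = list(x)
    recon = [0.0] * len(x)
    codes: List[int] = []
    for _ in range(depth):
        k = nearest(codebook, residual)  # quantize what is still unexplained
        codes.append(k)
        recon = [r + c for r, c in zip(recon, codebook[k])]
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, recon


codebook = [[0.0, 0.0], [1.0, 0.0], [0.25, 0.0]]
print(residual_quantize([1.25, 0.0], codebook, depth=2))  # ([1, 2], [1.25, 0.0])
```

Deeper stacks trade more tokens per visual patch for a smaller reconstruction error.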
------------------------------------------------------------------------------
\\
arXiv:2403.09085 (*cross-listing*)
Date: Thu, 14 Mar 2024 04:06:13 GMT   (5732kb,D)

Title: Meaningful Learning: Advancing Abstract Reasoning in Large Language
 Models via Generic Fact Guidance
Authors: Kai Xiong, Xiao Ding, Ting Liu, Bing Qin, Dongliang Xu, Qing Yang,
 Hongtao Liu, Yixin Cao
Categories: cs.CL cs.AI
\\
 Large language models (LLMs) have demonstrated impressive performance and strong
explainability across various reasoning scenarios, marking a significant stride
towards mimicking human-like intelligence. Despite this, when tasked with
simple questions supported by a generic fact, LLMs often fail to provide
consistent and precise answers, indicating a deficiency in abstract reasoning
abilities. This has sparked a vigorous debate about whether LLMs are genuinely
reasoning or merely memorizing. In light of this, we design a preliminary study
to quantify and delve into the abstract reasoning abilities of existing LLMs.
Our findings reveal a substantial discrepancy between their general reasoning
and abstract reasoning performance. To alleviate this problem, we tailor an
abstract reasoning dataset (AbsR) together with a meaningful learning paradigm
to teach LLMs how to leverage generic facts for reasoning purposes. The results
show that our approach not only boosts the general reasoning performance of
LLMs but also makes considerable strides towards their capacity for abstract
reasoning, moving beyond simple memorization or imitation to a more nuanced
understanding and application of generic facts.
\\ ( https://arxiv.org/abs/2403.09085 ,  5732kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09092 (*cross-listing*)
Date: Thu, 14 Mar 2024 04:32:13 GMT   (6495kb,D)

Title: MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection
Authors: Yupeng Li, Haorui He, Jin Bai, and Dacheng Wen
Categories: cs.CL cs.AI
Comments: Accepted by the ACM Web Conference 2024 (WWW 2024) oral, dataset
 available: https://github.com/TrustworthyComp
DOI: 10.1145/3589334.3645385
\\
 The prevalence of fake news across various online sources has had a
significant influence on the public. Existing Chinese fake news detection
datasets are limited to news sourced solely from Weibo. However, fake news
originating from multiple sources exhibits diversity in various aspects,
including its content and social context. Methods trained on a single news
source are hardly applicable to real-world scenarios. Our pilot
experiment demonstrates that the F1 score of the state-of-the-art method that
learns from a large Chinese fake news detection dataset, Weibo-21, drops
significantly from 0.943 to 0.470 when the test data is changed to multi-source
news data, failing to identify more than one-third of the multi-source fake
news. To address this limitation, we constructed the first multi-source
benchmark dataset for Chinese fake news detection, termed MCFEND, which is
composed of news we collected from diverse sources such as social platforms,
messaging apps, and traditional online news outlets. Notably, such news has
been fact-checked by 14 authoritative fact-checking agencies worldwide. In
addition, various existing Chinese fake news detection methods are thoroughly
evaluated on our proposed dataset in cross-source, multi-source, and
unseen-source settings. MCFEND, as a benchmark dataset, aims to advance Chinese fake news
detection approaches in real-world scenarios.
\\ ( https://arxiv.org/abs/2403.09092 ,  6495kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09113 (*cross-listing*)
Date: Thu, 14 Mar 2024 05:29:35 GMT   (129kb,D)

Title: AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based
 on Meta Learning
Authors: Ruiyi Zhang, Rushi Qiang, Sai Ashish Somayajula, Pengtao Xie
Categories: cs.CL cs.AI cs.LG
\\
 Large-scale pretraining followed by task-specific finetuning has achieved
great success in various NLP tasks. Since finetuning all parameters of large
pretrained models poses substantial computational and memory challenges,
several efficient finetuning methods have been developed. Among them, low-rank
adaptation (LoRA), which finetunes low-rank incremental update matrices on top
of frozen pretrained weights, has proven particularly effective. Nonetheless,
LoRA's uniform rank assignment across all layers, along with its reliance on an
exhaustive search to find the best rank, leads to high computation costs and
suboptimal finetuning performance. To address these limitations, we introduce
AutoLoRA, a meta learning based framework for automatically identifying the
optimal rank of each LoRA layer. AutoLoRA associates each rank-1 matrix in a
low-rank update matrix with a selection variable, which determines whether the
rank-1 matrix should be discarded. A meta learning based method is developed to
learn these selection variables. The optimal rank is determined by thresholding
the values of these variables. Our comprehensive experiments on natural
language understanding, generation, and sequence labeling demonstrate the
effectiveness of AutoLoRA.
\\ ( https://arxiv.org/abs/2403.09113 ,  129kb)
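This minimal sketch covers only the thresholding step. The low-rank update is written as a sum of rank-1 terms, ΔW = Σ_j α_j u_j v_jᵀ, and components whose selection variable α_j falls below a threshold are discarded, which fixes the layer's rank. The meta-learning that produces α is not shown. The α values, the threshold, and keeping α as a scale on the retained terms are assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def thresholded_lora_update(
    us: Sequence[Sequence[float]],  # u_j column vectors (length = output dim)
    vs: Sequence[Sequence[float]],  # v_j row vectors (length = input dim)
    alphas: Sequence[float],  # selection variables, one per rank-1 term
    threshold: float,  # hypothetical cut-off
) -> Tuple[Matrix, int]:
    """Return (delta_W, rank), keeping only rank-1 terms with alpha_j >= threshold."""
    kept = [j for j, a in enumerate(alphas) if a >= threshold]
    rows, cols = len(us[0]), len(vs[0])
    delta = [[0.0] * cols for _ in range(rows)]
    for j in kept:
        for r in range(rows):
            for c in range(cols):
                delta[r][c] += alphas[j] * us[j][r] * vs[j][c]
    return delta, len(kept)


us = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vs = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
delta, rank = thresholded_lora_update(us, vs, alphas=[0.9, 0.05, 0.4], threshold=0.1)
print(rank)  # 2
```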
------------------------------------------------------------------------------
\\
arXiv:2403.09131 (*cross-listing*)
Date: Thu, 14 Mar 2024 06:49:16 GMT   (312kb,D)

Title: ProSwitch: Knowledge-Guided Language Model Fine-Tuning to Generate
 Professional and Non-Professional Styled Text
Authors: Chang Zong, Yuyan Chen, Weiming Lu, Jian Shao, Yueting Zhuang
Categories: cs.CL cs.AI
Comments: 8 pages
MSC-class: 68T50
ACM-class: I.2.7
\\
 Large Language Models (LLMs) have demonstrated efficacy in various linguistic
applications, including text summarization and controlled text generation.
However, their capacity to switch between styles via fine-tuning remains
underexplored. This study concentrates on textual
professionalism and introduces a novel methodology, named ProSwitch, which
equips a language model with the ability to produce both professional and
non-professional responses through knowledge-guided instruction tuning.
ProSwitch unfolds across three phases: data preparation for gathering domain
knowledge and training corpus; instruction tuning for optimizing language
models with multiple levels of instruction formats; and comprehensive
evaluation for assessing the professionalism discrimination and reference-based
quality of generated text. Comparative analysis of ProSwitch against both
general and specialized language models reveals that our approach outperforms
baselines in switching between professional and non-professional text
generation.
\\ ( https://arxiv.org/abs/2403.09131 ,  312kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09141 (*cross-listing*)
Date: Thu, 14 Mar 2024 07:40:32 GMT   (733kb)

Title: Uncertainty Estimation in Multi-Agent Distributed Learning for
 AI-Enabled Edge Devices
Authors: Gleb Radchenko and Victoria Andrea Fill
Categories: cs.DC cs.AI cs.LG
ACM-class: I.2.11
\\
 Initially considered as low-power units with limited autonomous processing,
Edge IoT devices have seen a paradigm shift with the introduction of FPGAs and
AI accelerators. This advancement has vastly amplified their computational
capabilities, emphasizing the practicality of edge AI. Such progress introduces
new challenges of optimizing AI tasks for the limitations of energy and network
resources typical in Edge computing environments. Our study explores methods
that enable distributed data processing through AI-enabled edge devices,
enhancing collaborative learning capabilities. A key focus of our research is
the challenge of determining confidence levels in learning outcomes,
considering the spatial and temporal variability of data sets encountered by
independent agents. To address this issue, we investigate the application of
Bayesian neural networks, proposing a novel approach to manage uncertainty in
distributed learning environments.
\\ ( https://arxiv.org/abs/2403.09141 ,  733kb)
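The abstract does not detail how agents' uncertainties are combined. One common option, shown here as a hedged illustration rather than the authors' approach, is inverse-variance (precision) weighting of each agent's Bayesian predictive mean and variance. More confident agents count more, and the fused variance is never larger than any single agent's.

```python
from typing import Sequence, Tuple


def fuse_gaussian_predictions(preds: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (mean, variance) predictions from several agents by inverse-variance weighting."""
    if not preds:
        raise ValueError("need at least one prediction")
    precisions = []
    for _, var in preds:
        if var <= 0.0:
            raise ValueError("variances must be positive")
        precisions.append(1.0 / var)
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, preds)) / total
    return mean, 1.0 / total


print(fuse_gaussian_predictions([(1.0, 1.0), (3.0, 1.0)]))  # (2.0, 0.5)
```

This rule ignores disagreement between agents' means. That matters under the spatial and temporal data variability the abstract highlights, so a practical system may need to inflate the fused variance when agents disagree.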
------------------------------------------------------------------------------
\\
arXiv:2403.09142 (*cross-listing*)
Date: Thu, 14 Mar 2024 07:40:54 GMT   (201kb,D)

Title: USimAgent: Large Language Models for Simulating Search Users
Authors: Erhan Zhang, Xingzhu Wang, Peiyuan Gong, Yankai Lin, Jiaxin Mao
Categories: cs.IR cs.AI
\\
 Owing to its cost-efficiency and reproducibility, user
simulation has become a promising solution for the user-centric evaluation of
information retrieval systems. Nonetheless, accurately simulating user search
behaviors has long been a challenge, because users' actions in search are
highly complex and driven by intricate cognitive processes such as learning,
reasoning, and planning. Recently, Large Language Models (LLMs) have
demonstrated remarkable potential in simulating human-level intelligence and have
been used in building autonomous agents for various tasks. However, the
potential of using LLMs in simulating search behaviors has not yet been fully
explored. In this paper, we introduce an LLM-based user search behavior
simulator, USimAgent. The proposed simulator can simulate users' querying,
clicking, and stopping behaviors during search, and thus, is capable of
generating complete search sessions for specific search tasks. Empirical
investigation on a real user behavior dataset shows that the proposed simulator
outperforms existing methods in query generation and is comparable to
traditional methods in predicting user clicks and stopping behaviors. These
results not only validate the effectiveness of using LLMs for user simulation
but also shed light on the development of more robust and generic user
simulators.
\\ ( https://arxiv.org/abs/2403.09142 ,  201kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09184 (*cross-listing*)
Date: Thu, 14 Mar 2024 08:54:19 GMT   (120kb)

Title: Learning Algorithms for Verification of Markov Decision Processes
Authors: Tom\'a\v{s} Br\'azdil and Krishnendu Chatterjee and Martin Chmelik and
 Vojt\v{e}ch Forejt and Jan K\v{r}et\'insk\'y and Marta Kwiatkowska and Tobias
 Meggendorfer and David Parker and Mateusz Ujma
Categories: eess.SY cs.AI cs.SY
\\
 We present a general framework for applying learning algorithms and
heuristic guidance to the verification of Markov decision processes (MDPs),
based on the ideas of Br\'azdil, T. et al. (2014), "Verification of Markov
Decision Processes Using Learning Algorithms". The primary goal of the
techniques presented in that work is to improve performance by avoiding an
exhaustive exploration of the state space, guided by heuristics. This approach
is significantly extended in this work. Several details of the base theory are
refined and errors are fixed. Section 1.3 provides an overview of all
differences.
 The presented framework focuses on probabilistic reachability, which is a
core problem in verification, and is instantiated in two distinct scenarios.
The first assumes that full knowledge of the MDP is available, in particular
precise transition probabilities. It performs a heuristic-driven partial
exploration of the model, yielding precise lower and upper bounds on the
required probability. The second tackles the case where we may only sample the
MDP without knowing the exact transition dynamics. Here, we obtain
probabilistic guarantees, again in terms of both the lower and upper bounds,
which provide efficient stopping criteria for the approximation. In
particular, the latter is an extension of statistical model-checking (SMC) for
unbounded properties in MDPs. In contrast to other related approaches, we do
not restrict our attention to time-bounded (finite-horizon) or discounted
properties, nor assume any particular structural properties of the MDP.
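The first scenario (full knowledge of the MDP, lower and upper bounds on a
reachability probability) can be illustrated with plain interval iteration, a
textbook technique swapped in here for illustration only: it is not the
paper's learning-based, heuristic-guided partial exploration, which avoids
visiting every state. The toy MDP, state names, and tolerance below are
hypothetical.

```python
# Toy MDP (hypothetical): state -> action -> list of (probability, successor).
# "goal" and "sink" are absorbing targets and are not listed as keys.
MDP = {
    "s0": {"a": [(0.5, "s1"), (0.5, "s2")], "b": [(1.0, "s2")]},
    "s1": {"a": [(0.9, "goal"), (0.1, "sink")]},
    "s2": {"a": [(0.3, "goal"), (0.7, "sink")], "b": [(0.6, "s1"), (0.4, "sink")]},
}


def bellman(mdp, values):
    """One synchronous Bellman update for maximal reachability probability."""
    new = dict(values)
    for state, actions in mdp.items():
        new[state] = max(
            sum(p * values[succ] for p, succ in outcomes)
            for outcomes in actions.values()
        )
    return new


def interval_iteration(mdp, goal="goal", sink="sink", eps=1e-6):
    """Iterate a lower bound up from 0 and an upper bound down from 1.

    The upper bound is only guaranteed to converge to the true value when the
    MDP has no end components other than goal/sink, which holds for MDP above.
    """
    lower = {s: 0.0 for s in mdp}
    upper = {s: 1.0 for s in mdp}
    lower.update({goal: 1.0, sink: 0.0})
    upper.update({goal: 1.0, sink: 0.0})
    while max(upper[s] - lower[s] for s in mdp) > eps:
        lower, upper = bellman(mdp, lower), bellman(mdp, upper)
    return lower, upper


lower, upper = interval_iteration(MDP)
print(lower["s0"], upper["s0"])  # both bounds meet at about 0.72
```

Stopping once the gap between the two bounds drops below eps is what makes the
bounds usable as a stopping criterion.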
\\ ( https://arxiv.org/abs/2403.09184 ,  120kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09190 (*cross-listing*)
Date: Thu, 14 Mar 2024 09:05:25 GMT   (2518kb,D)

Title: Intention-aware Denoising Diffusion Model for Trajectory Prediction
Authors: Chen Liu, Shibo He, Haoyu Liu, Jiming Chen
Categories: cs.CV cs.AI
Comments: 14 pages, 9 figures
\\
 Trajectory prediction is an essential component in autonomous driving,
particularly for collision avoidance systems. Considering the inherent
uncertainty of the task, numerous studies have utilized generative models to
produce multiple plausible future trajectories for each agent. However, most of
them suffer from restricted representation ability or unstable training issues.
To overcome these limitations, we propose utilizing the diffusion model to
generate the distribution of future trajectories. Two key problems must be
solved to realize this idea. First, the diversity of intention is intertwined with
the uncertain surroundings, making the true distribution hard to parameterize.
Second, the diffusion process is time-consuming during the inference phase,
rendering it impractical for a real-time driving system. We propose
an Intention-aware denoising Diffusion Model (IDM), which tackles the above two
problems. We decouple the original uncertainty into intention uncertainty and
action uncertainty and model them with two dependent diffusion processes. To
decrease the inference time, we reduce the variable dimensions in the
intention-aware diffusion process and restrict the initial distribution of the
action-aware diffusion process, which leads to fewer diffusion steps. To
validate our approach, we conduct experiments on the Stanford Drone Dataset
(SDD) and ETH/UCY datasets. Our method achieves state-of-the-art results, with
an FDE of 13.83 pixels on the SDD dataset and 0.36 meters on the ETH/UCY
dataset. Compared with the original diffusion model, IDM reduces inference time
by two-thirds. Interestingly, our experiments further reveal that introducing
intention information is beneficial when modeling a diffusion process with
fewer steps.
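FDE is the Euclidean distance between the predicted and ground-truth final
positions. Multimodal predictors on SDD and ETH/UCY are commonly scored
best-of-K (minADE/minFDE over K sampled trajectories); the sketch below assumes
that convention, and the value of K and the exact evaluation protocol used for
IDM are not stated in the abstract. The trajectories are hypothetical.

```python
import math


def min_ade_fde(samples, ground_truth):
    """Best-of-K displacement errors.

    samples: K predicted trajectories, each a list of (x, y) points.
    ground_truth: the true trajectory, same length as each sample.
    Returns (minADE, minFDE); each minimum is taken over samples independently.
    """
    def ade(pred):
        # Average Euclidean error over all predicted time steps.
        return sum(math.dist(p, g) for p, g in zip(pred, ground_truth)) / len(ground_truth)

    def fde(pred):
        # Euclidean error at the final time step only.
        return math.dist(pred[-1], ground_truth[-1])

    return min(map(ade, samples)), min(map(fde, samples))


gt = [(0, 0), (1, 0), (2, 0)]
preds = [[(0, 0), (1, 1), (2, 2)], [(0, 0), (1, 0), (2, 1)]]
print(min_ade_fde(preds, gt))  # minADE = 1/3, minFDE = 1.0
```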
\\ ( https://arxiv.org/abs/2403.09190 ,  2518kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09193 (*cross-listing*)
Date: Thu, 14 Mar 2024 09:07:14 GMT   (3749kb,D)

Title: Are Vision Language Models Texture or Shape Biased and Can We Steer
 Them?
Authors: Paul Gavrikov, Jovita Lukasik, Steffen Jung, Robert Geirhos, Bianca
 Lamm, Muhammad Jehanzeb Mirza, Margret Keuper, Janis Keuper
Categories: cs.CV cs.AI cs.LG q-bio.NC
\\
 Vision language models (VLMs) have drastically changed the computer vision
model landscape in only a few years, opening an exciting array of new
applications ranging from zero-shot image classification to image captioning and
visual question answering. Unlike pure vision models, they offer an intuitive
way to access visual content through language prompting. The wide applicability
of such models encourages us to ask whether they also align with human vision:
specifically, how far they adopt human-induced visual biases through multimodal
fusion, or whether they simply inherit biases from pure vision models. One
important visual bias is the texture vs. shape bias, or the dominance of local
over global information. In this paper, we study this bias in a wide range of
popular VLMs. Interestingly, we find that VLMs are often more shape-biased than
their vision encoders, indicating that visual biases are modulated to some
extent through text in multimodal models. If text does indeed influence visual
biases, this suggests that we may be able to steer visual biases not just
through visual input but also through language: a hypothesis that we confirm
through extensive experiments. For instance, we are able to steer shape bias
from as low as 49% to as high as 72% through prompting alone. For now, the
strong human bias towards shape (96%) remains out of reach for all tested VLMs.
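Shape-bias percentages like those above are usually computed on cue-conflict
images, where shape bias is the fraction of shape decisions among all trials in
which the model chose either the shape class or the texture class. The sketch
below assumes that standard definition; the labels are hypothetical.

```python
def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of shape decisions among trials decided by shape or texture.

    Trials where shape and texture agree carry no conflict and are skipped;
    predictions matching neither cue are ignored.
    """
    shape_hits = texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
    decided = shape_hits + texture_hits
    return shape_hits / decided if decided else float("nan")


preds = ["cat", "elephant", "dog", "car"]
shapes = ["cat", "cat", "dog", "bird"]
textures = ["elephant", "elephant", "clock", "boat"]
print(shape_bias(preds, shapes, textures))  # 2 shape vs 1 texture decision -> 0.666...
```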
\\ ( https://arxiv.org/abs/2403.09193 ,  3749kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09199 (*cross-listing*)
Date: Thu, 14 Mar 2024 09:13:51 GMT   (11817kb,D)

Title: Customizing Segmentation Foundation Model via Prompt Learning for
 Instance Segmentation
Authors: Hyung-Il Kim, Kimin Yun, Jun-Seok Yun, Yuseok Bae
Categories: cs.CV cs.AI
Comments: 11 pages, 10 figures
\\
 Recently, foundation models trained on massive datasets to adapt to a wide
range of domains have attracted considerable attention and are actively being
explored within the computer vision community. Among these, the Segment
Anything Model (SAM) stands out for its remarkable progress in generalizability
and flexibility for image segmentation tasks, achieved through prompt-based
object mask generation. However, despite its strength, SAM faces two key
limitations when applied to customized instance segmentation that segments
specific objects or those in unique environments not typically present in the
training data: 1) the ambiguity inherent in input prompts and 2) the necessity
for extensive additional training to achieve optimal segmentation. To address
these challenges, we propose a novel method, customized instance segmentation
via prompt learning tailored to SAM. Our method involves a prompt learning
module (PLM), which adjusts input prompts into the embedding space to better
align with user intentions, thereby enabling more efficient training.
Furthermore, we introduce a point matching module (PMM) to enhance the feature
representation for finer segmentation by ensuring detailed alignment with
ground truth boundaries. Experimental results on various customized instance
segmentation scenarios demonstrate the effectiveness of the proposed method.
\\ ( https://arxiv.org/abs/2403.09199 ,  11817kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09206 (*cross-listing*)
Date: Thu, 14 Mar 2024 09:19:50 GMT   (231kb)

Title: Upper Bound of Bayesian Generalization Error in Partial Concept
 Bottleneck Model (CBM): Partial CBM outperforms naive CBM
Authors: Naoki Hayashi and Yoshihide Sawada
Categories: stat.ML cs.AI cs.LG math.ST stat.TH
Comments: 17 pages, 1 figure, submitted to TMLR
MSC-class: 62F15, 62R01, 68T07
\\
 The Concept Bottleneck Model (CBM) is a method for explaining neural networks.
In CBM, concepts that correspond to the reasons for outputs are inserted in the
last intermediate layer as observed values. It is expected that we can
interpret the relationship between the output and the concepts, similar to linear
regression. However, this interpretation requires observing all concepts and
decreases the generalization performance of neural networks. Partial CBM
(PCBM), which uses partially observed concepts, has been devised to resolve
these difficulties. Although some numerical experiments suggest that the
generalization performance of PCBMs is almost as high as that of the original
neural networks, the theoretical behavior of its generalization error has not
yet been clarified, since PCBM is a singular statistical model. In this paper,
we reveal the Bayesian generalization error in PCBM with a three-layered,
linear architecture. The result indicates that the structure of partially
observed concepts decreases the Bayesian generalization error compared with
that of CBM (fully observed concepts).
\\ ( https://arxiv.org/abs/2403.09206 ,  231kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09209 (*cross-listing*)
Date: Thu, 14 Mar 2024 09:22:17 GMT   (4111kb,D)

Title: LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection
Authors: Xiangrui Cai, Yang Wang, Sihan Xu, Hao Li, Ying Zhang, Xiaojie Yuan
Categories: cs.CR cs.AI cs.LG
Comments: 13 pages
\\
 Enterprises and organizations are faced with potential threats from insider
employees that may lead to serious consequences. Previous studies on insider
threat detection (ITD) mainly focus on detecting abnormal users or abnormal
time periods (e.g., a week or a day). However, a user may have hundreds of
thousands of activities in the log, and even within a day there may exist
thousands of activities for a user, requiring a high investigation budget to
verify abnormal users or activities given the detection results. On the other
hand, existing works are mainly post-hoc methods rather than real-time
detection, which cannot report insider threats before they cause losses.
In this paper, we conduct the first study towards real-time ITD at activity
level, and present a fine-grained and efficient framework LAN. Specifically,
LAN simultaneously learns the temporal dependencies within an activity sequence
and the relationships between activities across sequences with graph structure
learning. Moreover, to mitigate the data imbalance problem in ITD, we propose a
novel hybrid prediction loss, which integrates self-supervision signals from
normal activities and supervision signals from abnormal activities into a
unified loss for anomaly detection. We evaluate the performance of LAN on two
widely used datasets, i.e., CERT r4.2 and CERT r5.2. Extensive and comparative
experiments demonstrate the superiority of LAN, outperforming 9
state-of-the-art baselines by at least 9.92% and 6.35% in AUC for real-time ITD
on CERT r4.2 and r5.2, respectively. Moreover, LAN can also be applied to
post-hoc ITD, surpassing 8 competitive baselines by at least 7.70% and 4.03% in
AUC on two datasets. Finally, the ablation study, parameter analysis, and
compatibility analysis evaluate the impact of each module and hyper-parameter
in LAN.
\\ ( https://arxiv.org/abs/2403.09209 ,  4111kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09227 (*cross-listing*)
Date: Thu, 14 Mar 2024 09:48:36 GMT   (13987kb,D)

Title: BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday
 Activities and Realistic Simulation
Authors: Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana
 Srivastava, Roberto Mart\'in-Mart\'in, Chen Wang, Gabrael Levine, Wensi Ai,
 Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano
 Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari,
 Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R
 Matthews, Ivan Villa-Renteria, Jerry Huayang Tang, Claire Tang, Fei Xia,
 Yunzhu Li, Silvio Savarese, Hyowon Gweon, C. Karen Liu, Jiajun Wu, Li Fei-Fei
Categories: cs.RO cs.AI
Comments: A preliminary version was published at 6th Conference on Robot
 Learning (CoRL 2022)
\\
 We present BEHAVIOR-1K, a comprehensive simulation benchmark for
human-centered robotics. BEHAVIOR-1K includes two components, guided and
motivated by the results of an extensive survey on "what do you want robots to
do for you?". The first is the definition of 1,000 everyday activities,
grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more
than 9,000 objects annotated with rich physical and semantic properties. The
second is OMNIGIBSON, a novel simulation environment that supports these
activities via realistic physics simulation and rendering of rigid bodies,
deformable bodies, and liquids. Our experiments indicate that the activities in
BEHAVIOR-1K are long-horizon and dependent on complex manipulation skills, both
of which remain a challenge for even state-of-the-art robot learning solutions.
To calibrate the simulation-to-reality gap of BEHAVIOR-1K, we provide an
initial study on transferring solutions learned with a mobile manipulator in a
simulated apartment to its real-world counterpart. We hope that BEHAVIOR-1K's
human-grounded nature, diversity, and realism make it valuable for embodied AI
and robot learning research. Project website: https://behavior.stanford.edu.
\\ ( https://arxiv.org/abs/2403.09227 ,  13987kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09288 (*cross-listing*)
Date: Thu, 14 Mar 2024 11:22:06 GMT   (2319kb,D)

Title: Adversarial Training with OCR Modality Perturbation for Scene-Text
 Visual Question Answering
Authors: Zhixuan Shen, Haonan Luo, Sijia Li, Tianrui Li
Categories: cs.CV cs.AI
Comments: 6 pages, 3 figures, accepted by 2024 IEEE International Conference on
 Multimedia and Expo
\\
 Scene-Text Visual Question Answering (ST-VQA) aims to understand scene text
in images and answer questions related to the text content. Most existing
methods heavily rely on the accuracy of Optical Character Recognition (OCR)
systems, and aggressive fine-tuning based on limited spatial location
information and erroneous OCR text information often leads to inevitable
overfitting. In this paper, we propose a multimodal adversarial training
architecture with spatial awareness capabilities. Specifically, we introduce an
Adversarial OCR Enhancement (AOE) module, which leverages adversarial training
in the embedding space of OCR modality to enhance fault-tolerant representation
of OCR texts, thereby reducing noise caused by OCR errors. Simultaneously, we
add a Spatial-Aware Self-Attention (SASA) mechanism to help the model better
capture the spatial relationships among OCR tokens. Various experiments
demonstrate that our method achieves significant performance improvements on
both the ST-VQA and TextVQA datasets and provides a novel paradigm for
multimodal adversarial training.
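The abstract does not spell out how AOE perturbs the OCR embeddings. A common
way to do adversarial training in an embedding space is the fast gradient
method (FGM), which moves an embedding by a step of norm epsilon along the
normalized loss gradient. The sketch below shows only that step, on plain
Python lists, and is an assumed stand-in rather than the AOE module itself; in
a real model the gradient would come from autograd, and the loss on the
perturbed embedding would be added to the clean loss. The embedding and
gradient values are hypothetical.

```python
import math


def fgm_perturb(embedding, grad, epsilon=1.0):
    """Return embedding + epsilon * grad / ||grad||_2 (fast gradient method).

    embedding, grad: equal-length lists of floats; grad is dLoss/dEmbedding.
    A zero gradient leaves the embedding unchanged.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


# Hypothetical OCR-token embedding and its loss gradient.
print(fgm_perturb([0.0, 0.0], [3.0, 4.0], epsilon=1.0))  # [0.6, 0.8]
```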
\\ ( https://arxiv.org/abs/2403.09288 ,  2319kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09290 (*cross-listing*)
Date: Thu, 14 Mar 2024 11:23:39 GMT   (9542kb,D)

Title: SELECTOR: Heterogeneous graph network with convolutional masked
 autoencoder for multimodal robust prediction of cancer survival
Authors: Liangrui Pan, Yijun Peng, Yan Li, Xiang Wang, Wenjuan Liu, Liwen Xu,
 Qingchun Liang and Shaoliang Peng
Categories: cs.CV cs.AI cs.LG
Comments: Accepted on Computers in Biology and Medicine
\\
 Accurately predicting the survival rate of cancer patients is crucial for
aiding clinicians in planning appropriate treatment, reducing cancer-related
medical expenses, and significantly enhancing patients' quality of life.
Multimodal prediction of cancer patient survival offers a more comprehensive
and precise approach. However, existing methods still grapple with challenges
related to missing multimodal data and information interaction within
modalities. This paper introduces SELECTOR, a heterogeneous graph-aware network
based on convolutional mask encoders for robust multimodal prediction of cancer
patient survival. SELECTOR comprises feature edge reconstruction, convolutional
mask encoder, feature cross-fusion, and multimodal survival prediction modules.
Initially, we construct a multimodal heterogeneous graph and employ the
meta-path method for feature edge reconstruction, ensuring comprehensive
incorporation of feature information from graph edges and effective embedding
of nodes. To mitigate the impact of missing features within the modality on
prediction accuracy, we devised a convolutional masked autoencoder (CMAE) to
process the heterogeneous graph post-feature reconstruction. Subsequently, the
feature cross-fusion module facilitates communication between modalities,
ensuring that output features encompass all features of the modality and
relevant information from other modalities. Extensive experiments and analysis
on six cancer datasets from TCGA demonstrate that our method significantly
outperforms state-of-the-art methods in both modality-missing and
intra-modality information-confirmed cases. Our code is available at
https://github.com/panliangrui/Selector.
\\ ( https://arxiv.org/abs/2403.09290 ,  9542kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09313 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:03:28 GMT   (1255kb,D)

Title: Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection
Authors: Martin Aubard, L\'aszl\'o Antal, Ana Madureira, Erika \'Abrah\'am
Categories: cs.CV cs.AI
\\
 In this paper we present YOLOX-ViT, a novel object detection model, and
investigate the efficacy of knowledge distillation for model size reduction
without sacrificing performance. Focused on underwater robotics, our research
addresses key questions about the viability of smaller models and the impact of
the visual transformer layer in YOLOX. Furthermore, we introduce a new
side-scan sonar image dataset, and use it to evaluate our object detector's
performance. Results show that knowledge distillation effectively reduces false
positives in wall detection. Additionally, the introduced visual transformer
layer significantly improves object detection accuracy in the underwater
environment. The source code of the knowledge distillation in the YOLOX-ViT is
at https://github.com/remaro-network/KD-YOLOX-ViT.
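The abstract does not give the distillation objective used for YOLOX-ViT. As
background only, the classic soft-target formulation (Hinton et al.) trains the
student to match the teacher's temperature-softened class distribution through
a KL divergence scaled by T^2. In an object detector such a term would be
applied per prediction, and the paper may well use a different objective; the
temperature of 4.0 is a hypothetical default.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target distillation loss: T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


print(kd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # 0.0: identical outputs
print(kd_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0]) > 0)  # True
```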
\\ ( https://arxiv.org/abs/2403.09313 ,  1255kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09317 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:08:44 GMT   (23627kb,D)

Title: SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D
 Pose Estimation In Bin-picking Scenarios
Authors: Ding-Tao Huang, En-Te Lin, Lipeng Chen, Li-Fu Liu, Long Zeng
Categories: cs.CV cs.AI
\\
 Despite the success of 6D pose estimation in bin-picking scenarios, existing
methods still struggle to produce accurate predictions for symmetric objects
and in real-world scenarios. The primary bottlenecks are 1) ambiguous
keypoints caused by object symmetries and 2) the domain gap between real and
synthetic data. To circumvent these problems, we propose a new 6D pose
estimation network with symmetric-aware keypoint prediction and self-training
domain adaptation (SD-Net). SD-Net builds on pointwise keypoint regression and
deep Hough voting to perform reliable keypoint detection under clutter and
occlusion. Specifically, at the keypoint prediction stage, we design a robust
3D keypoint selection strategy that considers the symmetry class of objects and
equivalent keypoints, which facilitates locating 3D keypoints even in highly
occluded scenes. Additionally, we build an effective filtering algorithm on
predicted keypoints to dynamically eliminate multiple ambiguous and outlier
keypoint candidates. At the domain adaptation stage, we propose a
self-training framework using a student-teacher training scheme. To carefully
distinguish reliable predictions, we harness a tailored heuristic for 3D
geometry pseudo-labelling based on the semi-chamfer distance. On the public
Sil\'eane dataset, SD-Net achieves state-of-the-art results, obtaining an
average precision of 96%. When testing learning and generalization abilities on
the public Parametric datasets, SD-Net scores 8% higher than the
state-of-the-art method. The code is available at
https://github.com/dingthuang/SD-Net.
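The abstract does not define the semi-chamfer distance. One reading, assumed
here, is a one-directional chamfer distance: the mean distance from each point
in one set to its nearest neighbour in the other set, without the reverse term.
A one-sided term would let a partially observed scene be compared against a
full object model without penalising occluded model points; that motivation,
the direction used, and the hypothetical accept_pseudo_label threshold are our
assumptions, not details from the paper.

```python
import math


def semi_chamfer(source, target):
    """Mean distance from each source point to its nearest target point.

    Only the source -> target direction is used, so the measure is asymmetric.
    Points are (x, y, z) tuples.
    """
    return sum(min(math.dist(p, q) for q in target) for p in source) / len(source)


def accept_pseudo_label(observed_points, posed_model_points, threshold=0.01):
    """Hypothetical filter: keep a pose only if observed points lie near the posed model."""
    return semi_chamfer(observed_points, posed_model_points) < threshold


observed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(semi_chamfer(observed, model))  # (0 + 1) / 2 = 0.5
print(semi_chamfer(model, observed))  # (0 + 4 + 1) / 3, about 1.667
```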
\\ ( https://arxiv.org/abs/2403.09317 ,  23627kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09326 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:15:23 GMT   (32531kb,D)

Title: HeadEvolver: Text to Head Avatars via Locally Learnable Mesh Deformation
Authors: Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin
 Wang, Mingming Fan, Ying Shan, Xiaohang Zhan, Zeyu Wang
Categories: cs.GR cs.AI
Comments: 12 pages, 15 figures
ACM-class: I.2.6; I.3.8
\\
 We present HeadEvolver, a novel framework to generate stylized head avatars
from text guidance. HeadEvolver uses locally learnable mesh deformation from a
template head mesh, producing high-quality digital assets for detail-preserving
editing and animation. To address the lack of fine-grained and
semantic-aware local shape control in global deformation through Jacobians, we
introduce a trainable parameter as a weighting factor for the Jacobian at each
triangle to adaptively change local shapes while maintaining global
correspondences and facial features. Moreover, to ensure the coherence of the
resulting shape and appearance from different viewpoints, we use pretrained
image diffusion models for differentiable rendering with regularization terms
to refine the deformation under text guidance. Extensive experiments
demonstrate that our method can generate diverse head avatars with an
articulated mesh that can be edited seamlessly in 3D graphics software,
facilitating downstream applications such as more efficient animation with
inherited blend shapes and semantic consistency.
\\ ( https://arxiv.org/abs/2403.09326 ,  32531kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09333 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:21:37 GMT   (2550kb,D)

Title: Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling
 and Visual-Language Co-Referring
Authors: Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao
 Wang
Categories: cs.CV cs.AI
Comments: Tech report, work in progress. Code, models and datasets will be
 released at https://github.com/jefferyZhan/Griffon
\\
 Large Vision Language Models have achieved fine-grained object perception,
but limited image resolution remains a significant obstacle to surpassing
the performance of task-specific experts in complex and dense scenarios. This
limitation further restricts the model's potential to achieve nuanced visual
and language referring in domains such as GUI agents, counting, etc. To
address this issue, we introduce a unified high-resolution generalist model,
Griffon v2, enabling flexible object referring with visual and textual
prompts. To efficiently scale up image resolution, we design a simple and
lightweight down-sampling projector to overcome the input token constraint in
Large Language Models. This design inherently preserves the
complete contexts and fine details, and significantly improves multimodal
perception ability especially for small objects. Building upon this, we further
equip the model with visual-language co-referring capabilities through a
plug-and-play visual tokenizer. It enables user-friendly interaction with
flexible target images, free-form texts and even coordinates. Experiments
demonstrate that Griffon v2 can localize any objects of interest with visual
and textual referring, achieve state-of-the-art performance on REC, phrase
grounding, and REG tasks, and outperform expert models in object detection and
object counting. Data, codes and models will be released at
https://github.com/jefferyZhan/Griffon.
\\ ( https://arxiv.org/abs/2403.09333 ,  2550kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09338 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:32:40 GMT   (1872kb,D)

Title: LocalMamba: Visual State Space Model with Windowed Selective Scan
Authors: Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu
Categories: cs.CV cs.AI
\\
 Recent advancements in state space models, notably Mamba, have demonstrated
significant progress in modeling long sequences for tasks like language
understanding. Yet, their application in vision tasks has not markedly
surpassed the performance of traditional Convolutional Neural Networks (CNNs)
and Vision Transformers (ViTs). This paper posits that the key to enhancing
Vision Mamba (ViM) lies in optimizing scan directions for sequence modeling.
Traditional ViM approaches, which flatten spatial tokens, overlook the
preservation of local 2D dependencies, thereby elongating the distance between
adjacent tokens. We introduce a novel local scanning strategy that divides
images into distinct windows, effectively capturing local dependencies while
maintaining a global perspective. Additionally, acknowledging the varying
preferences for scan patterns across different network layers, we propose a
dynamic method to independently search for the optimal scan choices for each
layer, substantially improving performance. Extensive experiments across both
plain and hierarchical models underscore our approach's superiority in
effectively capturing image representations. For example, our model
significantly outperforms Vim-Ti by 3.1% on ImageNet with the same 1.5G FLOPs.
Code is available at: https://github.com/hunto/LocalMamba.
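The difference between flattening a token grid row by row and scanning it
window by window shows up in the order in which token indices are visited. The
sketch below computes both orders for an H x W grid; it covers only the scan
ordering, not the selective state-space scan itself, and the 2 x 2 window is a
hypothetical choice (the paper additionally searches scan directions per
layer).

```python
def row_major_order(h, w):
    """Global flattening: visit tokens row by row across the whole image."""
    return [r * w + c for r in range(h) for c in range(w)]


def windowed_order(h, w, win):
    """Local scan: visit win x win windows in row-major order,
    flattening tokens row by row inside each window."""
    if h % win or w % win:
        raise ValueError("grid size must be divisible by the window size")
    order = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            for r in range(top, top + win):
                order.extend(r * w + c for c in range(left, left + win))
    return order


def sequence_gap(order, a, b):
    """How far apart two tokens end up in the 1D scan sequence."""
    return abs(order.index(a) - order.index(b))


# Token 0 and its vertical neighbour, token 4, on a 4 x 4 grid:
print(sequence_gap(row_major_order(4, 4), 0, 4))    # 4
print(sequence_gap(windowed_order(4, 4, 2), 0, 4))  # 2
```

The gap between vertical neighbours grows with image width under row-major
flattening but stays bounded by the window size under the local scan.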
\\ ( https://arxiv.org/abs/2403.09338 ,  1872kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09344 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:49:29 GMT   (9072kb,D)

Title: SketchINR: A First Look into Sketches as Implicit Neural Representations
Authors: Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury,
 Aneeshan Sain, Tao Xiang, Timothy Hospedales, Yi-Zhe Song
Categories: cs.CV cs.AI
Comments: CVPR 2024
\\
 We propose SketchINR to advance the representation of vector sketches with
implicit neural models. A variable length vector sketch is compressed into a
latent space of fixed dimension that implicitly encodes the underlying shape as
a function of time and strokes. The learned function predicts the $xy$ point
coordinates in a sketch at each time and stroke. Despite its simplicity,
SketchINR outperforms existing representations at multiple tasks: (i) Encoding
an entire sketch dataset into a fixed size latent vector, SketchINR gives
$60\times$ and $10\times$ data compression over raster and vector sketches,
respectively. (ii) SketchINR's auto-decoder provides a much higher-fidelity
representation than other learned vector sketch representations, and is
uniquely able to scale to complex vector sketches such as FS-COCO. (iii)
SketchINR supports parallelisation that can decode/render $\sim$$100\times$
faster than other learned vector representations such as SketchRNN. (iv)
SketchINR, for the first time, emulates the human ability to reproduce a sketch
with varying abstraction in terms of number and complexity of strokes. As a
first look at implicit sketches, SketchINR's compact high-fidelity
representation will support future work in modelling long and complex sketches.
\\ ( https://arxiv.org/abs/2403.09344 ,  9072kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09346 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:51:07 GMT   (1092kb,D)

Title: AVIBench: Towards Evaluating the Robustness of Large Vision-Language
 Model on Adversarial Visual-Instructions
Authors: Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao,
 Kaipeng Zhang
Categories: cs.CV cs.AI
\\
 Large Vision-Language Models (LVLMs) have shown significant progress in
responding well to visual-instructions from users. However, these instructions,
encompassing images and text, are susceptible to both intentional and
inadvertent attacks. Despite the critical importance of LVLMs' robustness
against such threats, current research in this area remains limited. To bridge
this gap, we introduce AVIBench, a framework designed to analyze the robustness
of LVLMs when facing various adversarial visual-instructions (AVIs), including
four types of image-based AVIs, ten types of text-based AVIs, and nine types of
content bias AVIs (such as gender, violence, cultural, and racial biases, among
others). We generate 260K AVIs encompassing five categories of multimodal
capabilities (nine tasks) and content bias. We then conduct a comprehensive
evaluation involving 14 open-source LVLMs to assess their performance. AVIBench
also serves as a convenient tool for practitioners to evaluate the robustness
of LVLMs against AVIs. Our findings and extensive experimental results shed
light on the vulnerabilities of LVLMs, and highlight that inherent biases exist
even in advanced closed-source LVLMs like GeminiProVision and GPT-4V. This
underscores the importance of enhancing the robustness, security, and fairness
of LVLMs. The source code and benchmark will be made publicly available.
\\ ( https://arxiv.org/abs/2403.09346 ,  1092kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09359 (*cross-listing*)
Date: Thu, 14 Mar 2024 13:05:43 GMT   (11138kb,D)

Title: D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap
 for Domain-Adaptive Object Detection
Authors: Dinh Phat Do, Taehoon Kim, Jaemin Na, Jiwon Kim, Keonho Lee, Kyunghwan
 Cho and Wonjun Hwang
Categories: cs.CV cs.AI
Comments: Accepted by CVPR 2024. Link: https://github.com/EdwardDo69/D3T
\\
 Domain adaptation for object detection typically entails transferring
knowledge from one visible domain to another visible domain. However, few
studies address adaptation from the visible to the thermal domain, because the
domain gap between the visible and thermal domains is much larger than
expected, and traditional domain adaptation cannot successfully facilitate
learning in this situation. To overcome this challenge, we propose a
Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training
paradigms for each domain. Specifically, we segregate the source and target
training sets to build dual teachers and successively transfer the student
model's weights to the individual teacher of each domain via an exponential
moving average (EMA). The framework further incorporates a zigzag learning
method between the two teachers,
facilitating a gradual transition from the visible to thermal domains during
training. We validate the superiority of our method through newly designed
experimental protocols with well-known thermal datasets, i.e., FLIR and KAIST.
Source code is available at https://github.com/EdwardDo69/D3T .
\\ ( https://arxiv.org/abs/2403.09359 ,  11138kb)
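A minimal sketch of the teacher side of this scheme follows. It assumes flat NumPy parameter vectors, a momentum of 0.999, and a hypothetical fixed-period `zigzag_schedule` that alternates which domain's teacher receives the student's EMA update. The real D3T schedule, detectors, and pseudo-labelling losses are not reproduced, and the student step is a random placeholder.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.999):
    """Exponential moving average of teacher and student weights."""
    return momentum * teacher + (1.0 - momentum) * student

def zigzag_schedule(step, period=100):
    """Hypothetical schedule: switch the active domain teacher every `period` steps."""
    return "rgb" if (step // period) % 2 == 0 else "thermal"

def train(num_steps=400, dim=4, seed=0):
    rng = np.random.default_rng(seed)
    student = rng.normal(size=dim)
    # One teacher per domain, both initialised from the student.
    teachers = {"rgb": student.copy(), "thermal": student.copy()}
    updates = {"rgb": 0, "thermal": 0}
    for step in range(num_steps):
        active = zigzag_schedule(step)
        # Placeholder for a real gradient step on pseudo-labels from the active teacher.
        student = student - 0.01 * rng.normal(size=dim)
        # Only the active domain's teacher absorbs the student's weights.
        teachers[active] = ema_update(teachers[active], student)
        updates[active] += 1
    return teachers, updates
```

Keeping separate teachers means each one averages only the student states from its own domain phase. The zigzag alternation then determines how the student's knowledge flows between them.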
------------------------------------------------------------------------------
\\
arXiv:2403.09407 (*cross-listing*)
Date: Thu, 14 Mar 2024 13:59:04 GMT   (4038kb,D)

Title: LM2D: Lyrics- and Music-Driven Dance Synthesis
Authors: Wenjie Yin, Xuejiao Zhao, Yi Yu, Hang Yin, Danica Kragic, M{\aa}rten
 Bj\"orkman
Categories: cs.SD cs.AI cs.LG cs.MM eess.AS
\\
 Dance typically involves professional choreography with complex movements
that follow a musical rhythm and can also be influenced by lyrical content. The
integration of lyrics, in addition to the auditory dimension, enriches the
foundational tone and makes motion generation more responsive to the semantic
meaning of the lyrics. However, existing dance synthesis methods tend to model
motions conditioned only on audio signals. In this work, we make two
contributions to bridge
this gap. First, we propose LM2D, a novel probabilistic architecture that
incorporates a multimodal diffusion model with consistency distillation,
designed to create dance conditioned on both music and lyrics in one diffusion
generation step. Second, we introduce the first 3D dance-motion dataset that
encompasses both music and lyrics, obtained with pose estimation technologies.
We evaluate our model against music-only baseline models with objective metrics
and human evaluations involving dancers and choreographers. The results
demonstrate that LM2D produces realistic and diverse dances matching both
lyrics and music. A video summary can be accessed at:
https://youtu.be/4XCgvYookvA.
\\ ( https://arxiv.org/abs/2403.09407 ,  4038kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09409 (*cross-listing*)
Date: Thu, 14 Mar 2024 14:01:26 GMT   (532kb)

Title: "Like a Nesting Doll": Analyzing Recursion Analogies Generated by CS
 Students using Large Language Models
Authors: Seth Bernstein, Paul Denny, Juho Leinonen, Lauren Kan, Arto Hellas,
 Matt Littlefield, Sami Sarsa, Stephen MacNeil
Categories: cs.HC cs.AI cs.CL
Comments: 7 pages, 2 figures, ITiCSE 2024 preprint
\\
 Grasping complex computing concepts often poses a challenge for students who
struggle to anchor these new ideas to familiar experiences and understandings.
To help with this, a good analogy can bridge the gap between unfamiliar
concepts and familiar ones, providing an engaging way to aid understanding.
However, creating effective educational analogies is difficult even for
experienced instructors. We investigate to what extent large language models
(LLMs), specifically ChatGPT, can provide access to personally relevant
analogies on demand. Focusing on recursion, a challenging threshold concept, we
analyzed the analogies generated by more than 350
first-year computing students. They were provided with a code snippet and
tasked with generating their own recursion-based analogies using ChatGPT,
optionally including personally relevant topics in their prompts. We observed a
great deal of diversity in the analogies produced with student-prescribed
topics, in contrast to the otherwise generic analogies, highlighting the value
of student creativity when working with LLMs. Not only did students enjoy the
activity and report an improved understanding of recursion, but they described
more easily remembering analogies that were personally and culturally relevant.
\\ ( https://arxiv.org/abs/2403.09409 ,  532kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09410 (*cross-listing*)
Date: Thu, 14 Mar 2024 14:02:01 GMT   (2485kb,D)

Title: XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via
 Concept-guided Context Optimization
Authors: Yequan Bie, Luyang Luo, Zhixuan Chen, Hao Chen
Categories: cs.CV cs.AI
\\
 Utilizing the powerful representations of large vision-language models (VLMs)
to accomplish various downstream tasks has attracted increasing attention.
Within this research field, soft prompt learning has become a representative
approach for efficiently adapting VLMs such as CLIP to tasks like image
classification. However, most existing prompt learning methods learn text
tokens that are unexplainable, which cannot satisfy the stringent
interpretability requirements of Explainable Artificial Intelligence (XAI) in
high-stakes scenarios like healthcare. To address this issue, we propose a
novel explainable prompt learning framework that leverages medical knowledge by
aligning the semantics of images, learnable prompts, and clinical
concept-driven prompts at multiple granularities. Moreover, our framework
addresses the lack of valuable concept annotations by eliciting knowledge from
large language models and offers both visual and textual explanations for the
prompts. Extensive experiments and explainability analyses conducted on various
datasets, with and without concept labels, demonstrate that our method
simultaneously achieves superior diagnostic performance, flexibility, and
interpretability, shedding light on the effectiveness of foundation models in
facilitating XAI. The code will be made publicly available.
\\ ( https://arxiv.org/abs/2403.09410 ,  2485kb)
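For readers unfamiliar with soft prompt learning, the sketch below shows the CoOp-style mechanism that this abstract builds on. Learnable context vectors shared by all classes are prepended to each class-name embedding, the resulting prompts are encoded, and each encoding is compared with the image feature by cosine similarity. The text encoder is a hypothetical mean-pool-and-project stub standing in for CLIP's transformer. `concept_alignment_loss` is an assumed and heavily simplified stand-in for XCoOp's multi-granularity alignment with concept-driven prompts.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, FEAT, N_CTX, N_CLASSES = 16, 8, 4, 3

# Frozen stand-ins for CLIP: class-name token embeddings and a text projection.
class_tokens = rng.normal(size=(N_CLASSES, 1, EMB))
text_proj = rng.normal(size=(EMB, FEAT))

# The only trainable parameters in this kind of prompt learning: shared context vectors.
context = rng.normal(scale=0.02, size=(N_CTX, EMB))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def encode_text(token_seqs):
    """Hypothetical text encoder: mean-pool the tokens of each prompt, then project."""
    return l2_normalize(token_seqs.mean(axis=-2) @ text_proj)

def prompt_features(ctx):
    """Encode "[ctx_1] ... [ctx_M] [CLASS]" for every class; returns (N_CLASSES, FEAT)."""
    shared = np.broadcast_to(ctx, (N_CLASSES, N_CTX, EMB))
    return encode_text(np.concatenate([shared, class_tokens], axis=1))

def class_probs(image_feat, ctx, temperature=0.01):
    """Softmax over cosine similarities between the image feature and each class prompt."""
    logits = prompt_features(ctx) @ l2_normalize(image_feat) / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def concept_alignment_loss(ctx, concept_feats):
    """Assumed stand-in: mean (1 - cosine) between learnable and concept-driven prompt features."""
    cos = np.sum(prompt_features(ctx) * l2_normalize(concept_feats), axis=-1)
    return float(np.mean(1.0 - cos))
```

The learnable `context` vectors are not tied to words, which is why they are hard to interpret. An alignment term that pulls them toward concept-driven prompts is one way to anchor them to clinical vocabulary.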
------------------------------------------------------------------------------
\\
arXiv:2403.09412 (*cross-listing*)
Date: Thu, 14 Mar 2024 14:03:29 GMT   (7761kb,D)

Title: OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in
 Large-Scale Outdoor Environments
Authors: Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi
 Yang, Yufeng Yue
Categories: cs.CV cs.AI cs.RO
\\
 Environment maps endowed with sophisticated semantics are pivotal for
facilitating seamless interaction between robots and humans, enabling them to
effectively carry out various tasks. Open-vocabulary maps, powered by
Visual-Language models (VLMs), possess inherent advantages, including
multimodal retrieval and open-set classes. However, existing open-vocabulary
maps are constrained to closed indoor scenarios and VLM features, thereby
diminishing their usability and inference capabilities. Moreover, the absence
of topological relationships further complicates the accurate querying of
specific instances. In this work, we propose OpenGraph, a representation of
open-vocabulary hierarchical graph structure designed for large-scale outdoor
environments. OpenGraph initially extracts instances and their captions from
visual images using 2D foundation models, encoding the captions with features
to enhance textual reasoning. Subsequently, 3D incremental panoramic mapping
with feature embedding is achieved by projecting images onto LiDAR point
clouds. Finally, the environment is segmented based on lane graph connectivity
to construct a hierarchical graph. Validation results on the real-world public
dataset SemanticKITTI demonstrate that, even without fine-tuning the models,
OpenGraph generalizes to novel semantic classes and achieves the highest
segmentation and query accuracy. The source code of OpenGraph is
publicly available at https://github.com/BIT-DYN/OpenGraph.
\\ ( https://arxiv.org/abs/2403.09412 ,  7761kb)
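Attaching 2D features to LiDAR points is commonly done with a pinhole projection. The sketch below assumes a known 4x4 LiDAR-to-camera extrinsic matrix, a 3x3 intrinsic matrix `K`, and a per-pixel feature map. It illustrates only the projection step and is not OpenGraph's implementation, which also performs incremental panoramic mapping.

```python
import numpy as np

def project_features_to_points(points, feature_map, K, T_cam_from_lidar):
    """Attach per-pixel features to LiDAR points via pinhole projection.

    points: (N, 3) LiDAR-frame coordinates.
    feature_map: (H, W, C) image-aligned features (e.g. from a 2D foundation model).
    Returns (N, C) features (zeros where a point cannot be projected) and an (N,) mask.
    """
    n = points.shape[0]
    h, w, c = feature_map.shape
    homogeneous = np.hstack([points, np.ones((n, 1))])
    cam = (T_cam_from_lidar @ homogeneous.T).T[:, :3]  # camera-frame coordinates
    in_front = cam[:, 2] > 1e-6
    pix = (K @ cam.T).T
    # Avoid dividing by zero for points behind the camera; they are masked out below.
    z = np.where(in_front, pix[:, 2], 1.0)
    u = np.floor(pix[:, 0] / z).astype(int)
    v = np.floor(pix[:, 1] / z).astype(int)
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    feats = np.zeros((n, c))
    feats[valid] = feature_map[v[valid], u[valid]]
    return feats, valid
```

Points that fall behind the camera or outside the image receive no feature. A panoramic setup would repeat this step for each camera and merge the results.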
------------------------------------------------------------------------------
\\
arXiv:2403.09422 (*cross-listing*)
Date: Thu, 14 Mar 2024 14:14:47 GMT   (2668kb,D)

Title: Mitigating attribute amplification in counterfactual image generation
Authors: Tian Xia, M\'elanie Roschewitz, Fabio De Sousa Ribeiro, Charles Jones,
 Ben Glocker
Categories: cs.CV cs.AI
\\
 Causal generative modelling is gaining interest in medical imaging due to its
ability to answer interventional and counterfactual queries. Most work focuses
on generating counterfactual images that look plausible, using auxiliary
classifiers to enforce effectiveness of simulated interventions. We investigate
pitfalls in this approach, discovering the issue of attribute amplification,
where unrelated attributes are spuriously affected during interventions,
leading to biases across protected characteristics and disease status. We show
that attribute amplification is caused by the use of hard labels in the
counterfactual training process and propose soft counterfactual fine-tuning to
mitigate this issue. Our method substantially reduces the amplification effect
while maintaining effectiveness of generated images, demonstrated on a large
chest X-ray dataset. Our work makes an important advancement towards more
faithful and unbiased causal modelling in medical imaging.
\\ ( https://arxiv.org/abs/2403.09422 ,  2668kb)
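The following illustrates why hard labels can cause amplification, under our reading of the abstract. Suppose the fine-tuning loss asks an auxiliary classifier to output a hard label of 1 for an attribute that was not intervened on. Binary cross-entropy then keeps decreasing as the classifier's output on the counterfactual approaches 1, which rewards exaggerating that attribute. With a soft target, here assumed to be the classifier's probability on the factual image, the loss is minimised at that probability instead. The numbers are illustrative.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy of predicted probability p against a (possibly soft) target."""
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

def loss_minimiser(target):
    """Classifier output on the counterfactual that minimises the loss (grid search)."""
    grid = np.linspace(0.001, 0.999, 999)
    return round(float(grid[np.argmin(bce(grid, target))]), 3)

factual_prob = 0.7   # classifier's belief that the unrelated attribute is present
hard_target = 1.0    # hard label: "attribute present"
soft_target = factual_prob

print(loss_minimiser(hard_target))  # 0.999 (grid's upper end: the loss keeps falling as p -> 1)
print(loss_minimiser(soft_target))  # 0.7
```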
------------------------------------------------------------------------------
\\
arXiv:2403.09439 (*cross-listing*)
Date: Thu, 14 Mar 2024 14:31:22 GMT   (6998kb,D)

Title: 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Authors: Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao,
 Weiwei Xu, Changqing Zou
Categories: cs.CV cs.AI
Comments: 11 pages, 7 figures
\\
 Text-driven 3D scene generation techniques have made rapid progress in recent
years. Their success is mainly attributed to using existing generative models
to iteratively perform image warping and inpainting to generate 3D scenes.
However, these methods heavily rely on the outputs of existing models, leading
to error accumulation in geometry and appearance that prevents the models from
being used in various scenarios (e.g., outdoor and unreal scenarios). To
address this limitation, we generatively refine the newly generated local views
by querying and aggregating global 3D information, and then progressively
generate the 3D scene. Specifically, we employ a tri-plane features-based NeRF
as a unified representation of the 3D scene to constrain global 3D consistency,
and propose a generative refinement network to synthesize new content with
higher quality by exploiting the natural image prior from a 2D diffusion model as
well as the global 3D information of the current scene. Our extensive
experiments demonstrate that, in comparison to previous methods, our approach
supports a wide variety of scenes and arbitrary camera trajectories
with improved visual quality and 3D consistency.
\\ ( https://arxiv.org/abs/2403.09439 ,  6998kb)
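A tri-plane stores three axis-aligned 2D feature grids (xy, xz, yz). A 3D point is queried by projecting it onto each plane, bilinearly sampling each grid, and aggregating the three features. The sketch assumes coordinates normalised to [-1, 1] and aggregates by summation; concatenation is also common. The NeRF decoder MLP and rendering are omitted, and this is not the authors' code.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Sample an (R, R, C) feature plane at continuous coords u, v in [-1, 1]."""
    r = plane.shape[0]
    # Map [-1, 1] to pixel coordinates [0, R - 1].
    x = (u + 1) * 0.5 * (r - 1)
    y = (v + 1) * 0.5 * (r - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, r - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, r - 2)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = plane[y0, x0] * (1 - wx) + plane[y0, x0 + 1] * wx
    bottom = plane[y0 + 1, x0] * (1 - wx) + plane[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bottom * wy

def triplane_features(planes, points):
    """Aggregate features for (N, 3) points from the xy, xz and yz planes by summation."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))
```

Because every point in the scene reads from the same three shared grids, refinements of one view also update the features that other views query. This shared representation is what lets a tri-plane act as a global 3D constraint.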
------------------------------------------------------------------------------
\\
arXiv:2403.09442 (*cross-listing*)
Date: Thu, 14 Mar 2024 14:35:53 GMT   (1105kb,D)

Title: LLM-based agents for automating the enhancement of user story quality:
 An early report
Authors: Zheying Zhang, Maruf Rayhan, Tomas Herda, Manuel Goisauf, Pekka
 Abrahamsson
Categories: cs.SE cs.AI
Comments: 16 pages, 5 figures, 2 tables
\\
 In agile software development, maintaining high-quality user stories is
crucial, but also challenging. This study explores the use of large language
models to automatically improve user story quality in Austrian Post Group
IT agile teams. We developed a reference model for an Autonomous LLM-based
Agent System and implemented it at the company. The quality of user stories in
the study and the effectiveness of these agents for user story quality
improvement were assessed by 11 participants across six agile teams. Our
findings demonstrate the potential of LLMs in improving user story quality,
contributing to research on the role of AI in agile development and providing a
practical example of the transformative impact of AI in an industry setting.
\\ ( https://arxiv.org/abs/2403.09442 ,  1105kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09480 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:22:33 GMT   (4770kb,D)

Title: What Sketch Explainability Really Means for Downstream Tasks
Authors: Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia,
 Aneeshan Sain, Tao Xiang, Yi-Zhe Song
Categories: cs.CV cs.AI
Comments: CVPR 2024
\\
 In this paper, we explore the unique modality of sketch for explainability,
emphasising the profound impact of human strokes compared to conventional
pixel-oriented studies. Beyond explanations of network behavior, we discern the
genuine implications of explainability across diverse downstream sketch-related
tasks. We propose a lightweight and portable explainability solution -- a
seamless plugin that integrates effortlessly with any pre-trained model,
eliminating the need for re-training. Demonstrating its adaptability, we
present four applications: the well-studied tasks of retrieval and generation,
and the entirely novel tasks of assisted drawing and sketch adversarial
attacks. The centrepiece of our solution is a stroke-level attribution map that takes
different forms when linked with downstream tasks. By addressing the inherent
non-differentiability of rasterisation, we enable explanations at both coarse
stroke level (SLA) and partial stroke level (P-SLA), each with its advantages
for specific downstream tasks.
\\ ( https://arxiv.org/abs/2403.09480 ,  4770kb)
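Stroke-level attribution scores each stroke rather than each pixel. The paper obtains such maps by working around the non-differentiability of rasterisation. As a simpler, gradient-free illustration of what a stroke-level map contains, the sketch below uses leave-one-stroke-out occlusion instead. The toy rasteriser and scoring function are hypothetical.

```python
import numpy as np

def rasterise(strokes, size=32):
    """Toy rasteriser: mark every (row, col) point of every stroke on a binary canvas."""
    canvas = np.zeros((size, size))
    for stroke in strokes:
        for r, c in stroke:
            canvas[r, c] = 1.0
    return canvas

def stroke_attribution(strokes, score_fn, size=32):
    """Score drop when each stroke is removed; a larger drop means a more important stroke."""
    full = score_fn(rasterise(strokes, size))
    drops = []
    for i in range(len(strokes)):
        without = strokes[:i] + strokes[i + 1:]
        drops.append(full - score_fn(rasterise(without, size)))
    return np.array(drops)
```

Occlusion needs one forward pass per stroke. Gradient-based attribution through a differentiable rasteriser, as the paper pursues, avoids that cost and can also resolve partial strokes.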
------------------------------------------------------------------------------
\\
arXiv:2403.09488 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:30:14 GMT   (2853kb,D)

Title: Rectifying Demonstration Shortcut in In-Context Learning
Authors: Joonwon Jang, Sanghwan Jang, Wonbin Kweon, Minjin Jeon and Hwanjo Yu
Categories: cs.CL cs.AI
\\
 Large language models (LLMs) are able to solve various tasks with only a few
demonstrations utilizing their in-context learning (ICL) abilities. However,
LLMs often rely on their pre-trained semantic priors of demonstrations rather
than on the input-label relationships when making ICL predictions. In this
work, we term this phenomenon the `Demonstration Shortcut'. While previous
works have primarily focused on improving ICL prediction results for predefined
tasks, we aim to rectify the Demonstration Shortcut, thereby enabling the LLM
to effectively learn new input-label relationships from demonstrations. To
achieve this, we introduce In-Context Calibration, a demonstration-aware
calibration method. We evaluate the effectiveness of the proposed method in two
settings: (1) the Original ICL Task using the standard label space and (2) the
Task Learning setting, where the label space is replaced with semantically
unrelated tokens. In both settings, In-Context Calibration demonstrates
substantial improvements, with results that generalize across three LLM families
(OPT, GPT, and Llama2) under various configurations.
\\ ( https://arxiv.org/abs/2403.09488 ,  2853kb)
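As background on what calibration means here, the sketch below follows the general recipe of contextual calibration (Zhao et al., 2021). It estimates the model's label prior under the given demonstrations, divides that prior out, and renormalises. How In-Context Calibration estimates the prior in a demonstration-aware way is not reproduced: `prior` is simply an input here, and the numbers are made up.

```python
import numpy as np

def calibrate(label_probs, prior):
    """Divide out an estimated label prior and renormalise (a diagonal calibration)."""
    scores = np.asarray(label_probs, dtype=float) / np.asarray(prior, dtype=float)
    return scores / scores.sum(axis=-1, keepdims=True)

# Model's label distribution under these demonstrations, e.g. for a content-free input like "N/A".
prior = np.array([0.8, 0.2])
# Raw predictions for two test inputs; both raw argmaxes are label 0.
raw = np.array([[0.70, 0.30],
                [0.90, 0.10]])
calibrated = calibrate(raw, prior)
print(calibrated.argmax(axis=-1))  # [1 0]
```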
------------------------------------------------------------------------------
\\
arXiv:2403.09498 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:40:13 GMT   (860kb,D)

Title: From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward
 Fake News
Authors: Yuhan Liu, Xiuying Chen, Xiaoqing Zhang, Xing Gao, Ji Zhang, Rui Yan
Categories: cs.SI cs.AI cs.CL
\\
 In the digital era, the rapid propagation of fake news and rumors via social
networks brings notable societal challenges and impacts public opinion
regulation. Traditional fake news modeling typically forecasts the general
popularity trends of different groups or represents opinion shifts numerically.
However, these methods often oversimplify real-world complexities and overlook
the rich semantic information of news text. The advent of large language models
(LLMs) makes it possible to model subtle dynamics of opinion.
Consequently, in this work, we introduce a Fake news Propagation Simulation
framework (FPS) based on LLMs, which studies the trends and control of fake news
propagation in detail. Specifically, each agent in the simulation represents an
individual with a distinct personality. They are equipped with both short-term
and long-term memory, as well as a reflective mechanism to mimic human-like
thinking. Every day, they engage in random opinion exchanges, reflect on their
thinking, and update their opinions. Our simulation results uncover patterns in
fake news propagation related to topic relevance and individual traits,
aligning with real-world observations. Additionally, we evaluate various
intervention strategies and demonstrate that early and appropriately frequent
interventions strike a balance between governance cost and effectiveness,
offering valuable insights for practical applications. Our study underscores
the significant utility and potential of LLMs in combating fake news.
\\ ( https://arxiv.org/abs/2403.09498 ,  860kb)
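The daily cycle described above (random pairwise exchanges, reflection into long-term memory, and an opinion update) can be organised as the loop below. In FPS these steps are LLM calls over text. Here a hypothetical numeric `stub_update` replaces them so that the skeleton runs. That stub is exactly the kind of simplification the paper argues against, so read the sketch only as control flow.

```python
import random
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    belief: float  # hypothetical scalar stand-in for a textual opinion: 0 rejects, 1 accepts
    short_term: deque = field(default_factory=lambda: deque(maxlen=5))
    long_term: list = field(default_factory=list)

def stub_update(agent, heard):
    """Hypothetical stand-in for an LLM reflection call: move halfway toward what was heard."""
    return agent.belief + 0.5 * (heard - agent.belief)

def simulate(agents, days, seed=0):
    rng = random.Random(seed)
    for day in range(days):
        order = agents[:]
        rng.shuffle(order)
        # Random pairwise opinion exchanges go into short-term memory.
        for a, b in zip(order[::2], order[1::2]):
            a.short_term.append(b.belief)
            b.short_term.append(a.belief)
        # Reflection: summarise recent exchanges into long-term memory, then update.
        for agent in agents:
            if agent.short_term:
                heard = sum(agent.short_term) / len(agent.short_term)
                agent.long_term.append((day, heard))
                agent.belief = stub_update(agent, heard)
    return agents
```

Interventions such as injecting corrective messages on particular days would slot into this loop as extra entries in agents' memories.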
------------------------------------------------------------------------------
\\
arXiv:2403.09506 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:53:04 GMT   (1783kb,D)

Title: Don't Judge by the Look: A Motion Coherent Augmentation for Video
 Recognition
Authors: Yitian Zhang, Yue Bai, Huan Wang, Yizhou Wang, Yun Fu
Categories: cs.CV cs.AI cs.LG
Comments: Accepted by ICLR2024
\\
 Current training pipelines in object recognition neglect Hue Jittering when
doing data augmentation, as it not only brings appearance changes that are
detrimental to classification but is also inefficient to implement in
practice. In this study, we investigate the effect of hue variance in the
context of video recognition and find this variance to be beneficial since
static appearances are less important in videos that contain motion
information. Based on this observation, we propose a data augmentation method
for video recognition, named Motion Coherent Augmentation (MCA), that
introduces appearance variation in videos and implicitly encourages the model
to prioritize motion patterns, rather than static appearances. Concretely, we
propose an operation SwapMix to efficiently modify the appearance of video
samples, and introduce Variation Alignment (VA) to resolve the distribution
shift caused by SwapMix, forcing the model to learn appearance-invariant
representations. Comprehensive empirical evaluation across various
architectures and different datasets solidly validates the effectiveness and
generalization ability of MCA, and the application of VA in other augmentation
methods. Code is available at https://github.com/BeSpontaneous/MCA-pytorch.
\\ ( https://arxiv.org/abs/2403.09506 ,  1783kb)
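The sketch below shows an appearance-only, motion-preserving augmentation in the spirit of SwapMix. We assume that the operation permutes the RGB channels and linearly interpolates the result with the original clip. The exact SwapMix definition and the Variation Alignment loss should be taken from the paper's repository. The key property illustrated is that one permutation and one mixing weight are shared by every frame, so appearance changes while motion is untouched.

```python
import numpy as np

def swapmix(video, rng, alpha_range=(0.0, 1.0)):
    """Assumed SwapMix-style augmentation for a (T, H, W, 3) clip with values in [0, 1].

    One random RGB channel permutation and one mixing weight are shared by all frames.
    """
    perm = rng.permutation(3)
    while np.array_equal(perm, np.arange(3)):  # make sure the channels actually move
        perm = rng.permutation(3)
    swapped = video[..., perm]
    lam = rng.uniform(*alpha_range)
    return lam * video + (1.0 - lam) * swapped, lam
```

Channel indexing plus a single interpolation is cheap compared with a per-pixel hue rotation in HSV space, which matches the abstract's point about efficiency.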
------------------------------------------------------------------------------
\\
arXiv:2403.09513 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:57:13 GMT   (3353kb,D)

Title: AdaShield: Safeguarding Multimodal Large Language Models from
 Structure-based Attack via Adaptive Shield Prompting
Authors: Yu Wang, Xiaogeng Liu, Yu Li, Muhao Chen, Chaowei Xiao
Categories: cs.CR cs.AI
Comments: Multimodal Large Language Models Defense, 25 Pages
\\
 With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly
pronounced. However, with the integration of additional modalities, MLLMs are
exposed to new vulnerabilities, rendering them prone to structure-based
jailbreak attacks, where semantic content (e.g., "harmful text") has been
injected into the images to mislead MLLMs. In this work, we aim to defend
against such threats. Specifically, we propose \textbf{Ada}ptive
\textbf{Shield} Prompting (\textbf{AdaShield}), which prepends inputs with
defense prompts to defend MLLMs against structure-based jailbreak attacks
without fine-tuning MLLMs or training additional modules (e.g., post-stage
content detector). Initially, we present a manually designed static defense
prompt, which thoroughly examines the image and instruction content step by
step and specifies response methods to malicious queries. Furthermore, we
introduce an adaptive auto-refinement framework, consisting of a target MLLM
and an LLM-based defense prompt generator (Defender). These components
collaboratively and iteratively communicate to generate a defense prompt.
Extensive experiments on the popular structure-based jailbreak attacks and
benign datasets show that our methods can consistently improve MLLMs'
robustness against structure-based jailbreak attacks without compromising the
model's general capabilities evaluated on standard benign tasks. Our code is
available at https://github.com/rain305f/AdaShield.
\\ ( https://arxiv.org/abs/2403.09513 ,  3353kb)
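The auto-refinement idea can be sketched as a loop. The defense prompt is prepended to each malicious query, and whenever the target model's response is judged to be jailbroken, a Defender proposes a new prompt. `target_mllm`, `is_jailbroken` and `defender` are hypothetical callables; in AdaShield the target and the Defender are an MLLM and an LLM. Images are omitted for brevity.

```python
from typing import Callable, List

def shield(defense_prompt: str, user_query: str) -> str:
    """Prepend the defense prompt to the user's text instruction."""
    return f"{defense_prompt}\n\n{user_query}"

def refine_defense_prompt(
    initial_prompt: str,
    attack_queries: List[str],
    target_mllm: Callable[[str], str],
    is_jailbroken: Callable[[str], bool],
    defender: Callable[[str, List[str]], str],
    max_iters: int = 5,
) -> str:
    """Ask the Defender to rewrite the prompt until no attack query succeeds (or iterations run out)."""
    prompt = initial_prompt
    for _ in range(max_iters):
        failures = [q for q in attack_queries if is_jailbroken(target_mllm(shield(prompt, q)))]
        if not failures:
            break
        prompt = defender(prompt, failures)
    return prompt
```

Because the defense lives entirely in the prompt, neither the MLLM nor any extra detector is fine-tuned, which is consistent with the abstract's claim.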
------------------------------------------------------------------------------
\\
arXiv:2403.09530 (*cross-listing*)
Date: Thu, 14 Mar 2024 16:13:00 GMT   (9985kb,D)

Title: VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision
 Understanding
Authors: Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang,
 Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou
Categories: cs.CV cs.AI cs.CL cs.GR
Comments: 12 pages, 7 figures, pending conference
\\
 The evolution of text-to-visual components facilitates people's daily lives,
for example by generating images and videos from text and by identifying desired
elements within images. Earlier computer vision models with multimodal abilities
focused on image detection and classification of well-defined objects. Large
language models (LLMs) introduce the transformation from natural language to
visual objects, which presents the visual layout for text contexts. OpenAI GPT-4
has emerged as the pinnacle of LLMs, while the computer vision (CV) domain
boasts a plethora of state-of-the-art (SOTA) models and algorithms to convert 2D
images to their 3D representations. However, a mismatch between an algorithm
and the problem could lead to undesired results. In response to this challenge,
we propose a unified VisionGPT-3D framework to consolidate the state-of-the-art
vision models, thereby facilitating the development of vision-oriented AI.
VisionGPT-3D provides a versatile multimodal framework building upon the
strengths of multimodal foundation models. It seamlessly integrates various
SOTA vision models, automates the selection of SOTA vision models, identifies
suitable 3D mesh creation algorithms based on 2D depth map analysis, and
generates optimal results based on diverse multimodal inputs such as text
prompts.
 Keywords: VisionGPT-3D, 3D vision understanding, Multimodal agent
\\ ( https://arxiv.org/abs/2403.09530 ,  9985kb)
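Lifting a 2D depth map to 3D points is a usual first step before any mesh-creation algorithm is applied. The pinhole back-projection below assumes known intrinsics (`fx`, `fy`, `cx`, `cy`) and metric depth. How VisionGPT-3D analyses depth maps to choose a mesh algorithm is not shown.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H*W, 3) camera-frame point cloud."""
    h, w = depth.shape
    # v indexes rows (image y), u indexes columns (image x).
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```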
------------------------------------------------------------------------------
\\
arXiv:2403.09539 (*cross-listing*)
Date: Thu, 14 Mar 2024 16:27:49 GMT   (1651kb,D)

Title: Logits of API-Protected LLMs Leak Proprietary Information
Authors: Matthew Finlayson, Swabha Swayamdipta, Xiang Ren
Categories: cs.CL cs.AI cs.CR cs.LG
MSC-class: 68T50
ACM-class: I.2.7
\\
 The commercialization of large language models (LLMs) has led to the common
practice of high-level API-only access to proprietary models. In this work, we
show that even with a conservative assumption about the model architecture, it
is possible to learn a surprisingly large amount of non-public information
about an API-protected LLM from a relatively small number of API queries (e.g.,
costing under $1,000 for OpenAI's gpt-3.5-turbo). Our findings are centered on
one key observation: most modern LLMs suffer from a softmax bottleneck, which
restricts the model outputs to a linear subspace of the full output space. We
show that this lends itself to a model image or a model signature which unlocks
several capabilities with affordable cost: efficiently discovering the LLM's
hidden size, obtaining full-vocabulary outputs, detecting and disambiguating
different model updates, identifying the source LLM given a single full LLM
output, and even estimating the output layer parameters. Our empirical
investigations show the effectiveness of our methods, which allow us to
estimate the embedding size of OpenAI's gpt-3.5-turbo to be about 4,096.
Lastly, we discuss ways that LLM providers can guard against these attacks, as
well as how these capabilities can be viewed as a feature (rather than a bug)
by allowing for greater transparency and accountability.
\\ ( https://arxiv.org/abs/2403.09539 ,  1651kb)
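The softmax-bottleneck observation can be checked on a simulated model. Once the per-output normalising constant is removed, the full-vocabulary log-probability vectors of a model with hidden size d span at most a d-dimensional subspace, so the numerical rank of enough stacked outputs reveals d. The sketch assumes that full-vocabulary log-probabilities are available. The paper describes how to obtain them from a real API, which is not reproduced here, and a random output matrix stands in for a real model.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def estimate_hidden_size(logprob_outputs, rel_tol=1e-6):
    """Estimate d from the numerical rank of stacked full-vocabulary log-probability vectors.

    Centering each vector over the vocabulary removes the per-output normalising constant,
    leaving vectors that lie in a subspace of dimension at most d.
    """
    centered = logprob_outputs - logprob_outputs.mean(axis=-1, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return int(np.sum(singular_values > rel_tol * singular_values[0]))

# Simulated model: vocabulary size V, hidden size d, output embedding matrix W (V x d).
rng = np.random.default_rng(0)
V, d, n_queries = 500, 32, 100
W = rng.normal(size=(V, d))
hidden_states = rng.normal(size=(n_queries, d))  # one final hidden state per API query
outputs = log_softmax(hidden_states @ W.T)       # what a full-vocabulary API would return
print(estimate_hidden_size(outputs))             # 32
```

The number of queries must exceed d for the rank to saturate. With fewer outputs, the rank simply equals the number of queries.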
------------------------------------------------------------------------------
\\
arXiv:2403.09565 (*cross-listing*)
Date: Thu, 14 Mar 2024 16:56:52 GMT   (261kb,D)

Title: Welcome Your New AI Teammate: On Safety Analysis by Leashing Large
 Language Models
Authors: Ali Nouri, Beatriz Cabrero-Daniel, Fredrik T\"orner, H{\aa}kan
 Sivencrona, Christian Berger
Categories: cs.SE cs.AI
Comments: Accepted in CAIN 2024, 6 pages, 1 figure
DOI: 10.1145/3644815.3644953
\\
 DevOps is a necessity in many industries, including the development of
Autonomous Vehicles. In those settings, there are iterative activities that
reduce the speed of SafetyOps cycles. One of these activities is "Hazard
Analysis & Risk Assessment" (HARA), which is an essential step to start the
safety requirements specification. As a potential approach to increase the
speed of this step in SafetyOps, we have delved into the capabilities of Large
Language Models (LLMs).
 Our objective is to systematically assess their potential for application in
the field of safety engineering. To that end, we propose a framework to support
a higher degree of automation of HARA with LLMs. Despite our endeavors to
automate as much of the process as possible, expert review remains crucial to
ensure the validity and correctness of the analysis results, with necessary
modifications made accordingly.
\\ ( https://arxiv.org/abs/2403.09565 ,  261kb)
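Some context on what HARA produces: under ISO 26262, each hazardous event is rated for severity (S0-S3), exposure (E0-E4) and controllability (C0-C3), and an ASIL is read from a fixed table. Whatever ratings an LLM proposes, this final lookup is deterministic and easy for an expert reviewer to check. The function below reproduces the standard table via the well-known sum shortcut. It is our illustration and not part of the paper's framework.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """ASIL from ISO 26262-3 class indices S0-S3, E0-E4, C0-C3.

    Any class 0 yields QM; otherwise the table is reproduced by the sum S + E + C.
    """
    if not (0 <= severity <= 3 and 0 <= exposure <= 4 and 0 <= controllability <= 3):
        raise ValueError("expected S0-S3, E0-E4, C0-C3")
    if 0 in (severity, exposure, controllability):
        return "QM"
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(severity + exposure + controllability, "QM")
```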
------------------------------------------------------------------------------
\\
arXiv:2403.09567 (*cross-listing*)
Date: Thu, 14 Mar 2024 16:57:18 GMT   (1680kb,D)

Title: Enhancing Trust in Autonomous Agents: An Architecture for Accountability
 and Explainability through Blockchain and Large Language Models
Authors: Laura Fern\'andez-Becerra, Miguel \'Angel Gonz\'alez-Santamarta,
 \'Angel Manuel Guerrero-Higueras, Francisco Javier Rodr\'iguez-Lera and
 Vicente Matell\'an Olivera
Categories: cs.RO cs.AI
Comments: 21 pages, 12 figures
\\
 The deployment of autonomous agents in environments involving human
interaction has increasingly raised security concerns. Consequently,
understanding the circumstances behind an event becomes critical, requiring the
development of capabilities that allow agents to justify their behavior to non-expert users.
Such explanations are essential in enhancing trustworthiness and safety, acting
as a preventive measure against failures, errors, and misunderstandings.
Additionally, they contribute to improving communication, bridging the gap
between the agent and the user, thereby improving the effectiveness of their
interactions. This work presents an accountability and explainability
architecture implemented for ROS-based mobile robots. The proposed solution
consists of two main components. The first is a black-box-like element that
provides accountability, featuring anti-tampering properties achieved through
blockchain technology. The second is a component that generates natural
language explanations by harnessing the capabilities of Large Language Models
(LLMs) over the data stored in that black box. The study
evaluates the performance of our solution in three different scenarios, each
involving autonomous agent navigation functionalities. This evaluation includes
a thorough examination of accountability and explainability metrics,
demonstrating the effectiveness of our approach in using accountable data from
robot actions to obtain coherent, accurate and understandable explanations,
even when facing challenges inherent in the use of autonomous agents in
real-world scenarios.
\\ ( https://arxiv.org/abs/2403.09567 ,  1680kb)
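The anti-tampering property that a blockchain gives the black box comes from hash chaining. Each record commits to the hash of the previous one, so editing any past record breaks verification from that point on. The single-node sketch below shows only that property (there is no distribution or consensus) and uses hypothetical record fields. It is not the paper's ROS implementation.

```python
import hashlib
import json

def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class HashChainLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = _digest(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        prev = self.GENESIS
        for record, h in self.entries:
            if _digest(prev, record) != h:
                return False
            prev = h
        return True
```

A full blockchain additionally replicates the chain across nodes, so that an attacker cannot simply recompute every hash after an edit.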
------------------------------------------------------------------------------
\\
arXiv:2403.09603 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:44:35 GMT   (1505kb,D)

Title: Optimistic Verifiable Training by Controlling Hardware Nondeterminism
Authors: Megha Srivastava, Simran Arora, Dan Boneh
Categories: cs.CR cs.AI cs.LG
Comments: 11 pages, 5 figures, preprint
\\
 The increasing compute demands of AI systems have led to the emergence of
services that train models on behalf of clients lacking necessary resources.
However, ensuring correctness of training and guarding against potential
training-time attacks, such as data poisoning, poses challenges. Existing works
on verifiable training largely fall into two classes: proof-based systems,
which struggle to scale because they require cryptographic techniques, and
"optimistic" methods that rely on a trusted third-party auditor who replicates
the training process. A key challenge with the latter is that hardware
nondeterminism between GPU types during training prevents an auditor from
replicating the training process exactly, and such schemes are therefore
non-robust. We propose a method that combines training in a higher precision
than the target model, rounding after intermediate computation steps, and
storing rounding decisions based on an adaptive thresholding procedure, to
successfully control for nondeterminism. Across three different NVIDIA GPUs
(A40, Titan XP, RTX 2080 Ti), we achieve exact training replication at FP32
precision for both full-training and fine-tuning of ResNet-50 (23M) and GPT-2
(117M) models. Our verifiable training scheme significantly decreases the
storage and time costs compared to proof-based systems.
\\ ( https://arxiv.org/abs/2403.09603 ,  1505kb)
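Here is a simplified sketch of the rounding idea. The trainer rounds each intermediate value to a coarser grid and logs a one-bit decision only for values that fall within a threshold `tau` of a rounding boundary. The auditor's values differ slightly because of hardware nondeterminism, so it reuses those logged decisions and otherwise rounds normally. Several assumptions apply: a fixed-point grid with spacing `step` stands in for rounding to FP32, `tau` is fixed rather than adaptively chosen, and replication is exact only if the hardware discrepancy is smaller than `tau` (with `tau` below `step / 4`).

```python
import numpy as np

def round_with_log(x, step, tau):
    """Trainer: round to multiples of `step`, logging decisions for values near a tie."""
    scaled = x / step
    lower = np.floor(scaled)
    frac = scaled - lower
    round_up = frac >= 0.5
    near_tie = np.abs(frac - 0.5) * step < tau
    decisions = {int(i): bool(round_up[i]) for i in np.flatnonzero(near_tie)}
    return (lower + round_up) * step, decisions

def round_from_log(x, step, decisions):
    """Auditor: round normally, but reuse the trainer's logged decision where one exists."""
    scaled = x / step
    lower = np.floor(scaled)
    round_up = (scaled - lower) >= 0.5
    for i, up in decisions.items():
        round_up[i] = up
    return (lower + round_up) * step

step, tau = 2.0 ** -10, 2.0 ** -13
rng = np.random.default_rng(0)
x_trainer = rng.uniform(-1.0, 1.0, size=1000)
x_trainer[0] = 3.51 * step                         # just above a rounding tie
noise = rng.uniform(-tau / 2, tau / 2, size=1000)  # simulated hardware discrepancy
noise[0] = -0.02 * step                            # pushes the auditor's value below the tie
x_auditor = x_trainer + noise

trainer_out, decisions = round_with_log(x_trainer, step, tau)
auditor_out = round_from_log(x_auditor, step, decisions)
naive_out = round_from_log(x_auditor, step, {})
print(np.array_equal(auditor_out, trainer_out))  # True
print(np.array_equal(naive_out, trainer_out))    # False
```

Only values near a tie cost storage, which is why logging decisions is much cheaper than storing full intermediate results.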
------------------------------------------------------------------------------
\\
arXiv:2403.09605 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:47:01 GMT   (6639kb,D)

Title: Counterfactual contrastive learning: robust representations via causal
 image synthesis
Authors: Melanie Roschewitz, Fabio De Sousa Ribeiro, Tian Xia, Galvin Khara,
 Ben Glocker
Categories: cs.CV cs.AI
Comments: Code available at
 https://github.com/biomedia-mira/counterfactual-contrastive
\\
 Contrastive pretraining is well-known to improve downstream task performance
and model generalisation, especially in limited label settings. However, it is
sensitive to the choice of augmentation pipeline. Positive pairs should
preserve semantic information while destroying domain-specific information.
Standard augmentation pipelines emulate domain-specific changes with
pre-defined photometric transformations, but what if we could simulate
realistic domain changes instead? In this work, we show how to utilise recent
progress in counterfactual image generation to this effect. We propose
CF-SimCLR, a counterfactual contrastive learning approach which leverages
approximate counterfactual inference for positive pair creation. Comprehensive
evaluation across five datasets, on chest radiography and mammography,
demonstrates that CF-SimCLR substantially improves robustness to acquisition
shift with higher downstream performance on both in- and out-of-distribution
data, particularly for domains which are under-represented during training.
\\ ( https://arxiv.org/abs/2403.09605 ,  6639kb)
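CF-SimCLR keeps the SimCLR objective and changes how positive pairs are built. The sketch below is the standard NT-Xent loss, in which row i of `z1` and row i of `z2` form a positive pair. In a counterfactual variant, one view would come from an image and the other from a counterfactual of that image under a different acquisition domain, produced by a causal image model that is not shown here. The `counterfactual(...)` helper named in the comment is hypothetical.

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.1):
    """SimCLR NT-Xent loss; row i of z1 and row i of z2 form a positive pair.

    Hypothetical counterfactual use:
    z1 = encoder(augment(x)); z2 = encoder(augment(counterfactual(x, new_scanner)))
    """
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z1.shape[0]
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # an embedding is never its own negative
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Row-wise log-softmax, then pick out each row's positive entry.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positives].mean())
```

Every other embedding in the batch acts as a negative. Pairing an image with its counterfactual therefore asks the encoder to ignore exactly the domain change that the counterfactual simulates.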
------------------------------------------------------------------------------
\\
arXiv:2403.09606 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:47:20 GMT   (95kb)

Title: Large Language Models and Causal Inference in Collaboration: A
 Comprehensive Survey
Authors: Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang
 Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei
 Ai and Furong Huang
Categories: cs.CL cs.AI
\\
 Causal inference has shown potential in enhancing the predictive accuracy,
fairness, robustness, and explainability of Natural Language Processing (NLP)
models by capturing causal relationships among variables. The emergence of
generative Large Language Models (LLMs) has significantly impacted various NLP
domains, particularly through their advanced reasoning capabilities. This
survey focuses on evaluating and improving LLMs from a causal view in the
following areas: understanding and improving the LLMs' reasoning capacity,
addressing fairness and safety issues in LLMs, complementing LLMs with
explanations, and handling multimodality. Meanwhile, LLMs' strong reasoning
capacities can in turn contribute to the field of causal inference by aiding
causal relationship discovery and causal effect estimations. This review
explores the interplay between causal inference frameworks and LLMs from both
perspectives, emphasizing their collective potential to further the development
of more advanced and equitable artificial intelligence systems.
\\ ( https://arxiv.org/abs/2403.09606 ,  95kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09629 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:58:16 GMT   (510kb,D)

Title: Quiet-STaR: Language Models Can Teach Themselves to Think Before
 Speaking
Authors: Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber,
 Noah D. Goodman
Categories: cs.CL cs.AI cs.LG
\\
 When writing and talking, people sometimes pause to think. Although
reasoning-focused works have often framed reasoning as a method of answering
questions or completing agentic tasks, reasoning is implicit in almost all
written text. For example, this applies to the steps not stated between the
lines of a proof or to the theory of mind underlying a conversation. In the
Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned
by inferring rationales from few-shot examples in question-answering and
learning from those that lead to a correct answer. This is a highly constrained
setting -- ideally, a language model could instead learn to infer unstated
rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR
in which LMs learn to generate rationales at each token to explain future text,
improving their predictions. We address key challenges, including 1) the
computational cost of generating continuations, 2) the fact that the LM does
not initially know how to generate or use internal thoughts, and 3) the need to
predict beyond individual next tokens. To resolve these, we propose a tokenwise
parallel sampling algorithm, using learnable tokens indicating a thought's
start and end, and an extended teacher-forcing technique. Encouragingly,
generated rationales disproportionately help model difficult-to-predict tokens
and improve the LM's ability to directly answer difficult questions. In
particular, after continued pretraining of an LM on a corpus of internet text
with Quiet-STaR, we find zero-shot improvements on GSM8K
(5.9%$\rightarrow$10.9%) and CommonsenseQA (36.3%$\rightarrow$47.2%) and
observe a perplexity improvement of difficult tokens in natural text.
Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR
marks a step towards LMs that can learn to reason in a more general and
scalable way.
\\ ( https://arxiv.org/abs/2403.09629 ,  510kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09631 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:58:41 GMT   (39710kb,D)

Title: 3D-VLA: A 3D Vision-Language-Action Generative World Model
Authors: Haoyu Zhen and Xiaowen Qiu and Peihao Chen and Jincheng Yang and Xin
 Yan and Yilun Du and Yining Hong and Chuang Gan
Categories: cs.CV cs.AI cs.CL cs.RO
Comments: Project page: https://vis-www.cs.umass.edu/3dvla/
\\
 Recent vision-language-action (VLA) models rely on 2D inputs, lacking
integration with the broader realm of the 3D physical world. Furthermore, they
perform action prediction by learning a direct mapping from perception to
action, neglecting the vast dynamics of the world and the relations between
actions and dynamics. In contrast, human beings are endowed with world models
that depict imagination about future scenarios to plan actions accordingly. To
this end, we propose 3D-VLA by introducing a new family of embodied foundation
models that seamlessly link 3D perception, reasoning, and action through a
generative world model. Specifically, 3D-VLA is built on top of a 3D-based
large language model (LLM), and a set of interaction tokens is introduced to
engage with the embodied environment. Furthermore, to inject generation
abilities into the model, we train a series of embodied diffusion models and
align them into the LLM for predicting the goal images and point clouds. To
train our 3D-VLA, we curate a large-scale 3D embodied instruction dataset by
extracting vast 3D-related information from existing robotics datasets. Our
experiments on held-in datasets demonstrate that 3D-VLA significantly improves
the reasoning, multimodal generation, and planning capabilities in embodied
environments, showcasing its potential in real-world applications.
\\ ( https://arxiv.org/abs/2403.09631 ,  39710kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09635 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:59:14 GMT   (684kb,D)

Title: Transformers Get Stable: An End-to-End Signal Propagation Theory for
 Language Models
Authors: Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia, Jungho Jung, Harshith
 Goka, Haejun Lee
Categories: cs.CL cs.AI cs.CV cs.LG
Comments: Akhil Kedia, Mohd Abbas Zaidi, Sushil Khyalia equal contribution.
 Source code is available at
 https://github.com/akhilkedia/TranformersGetStable
ACM-class: I.2.7; I.2.10
\\
 In spite of their huge success, transformer models remain difficult to scale
in depth. In this work, we develop a unified signal propagation theory and
provide formulae that govern the moments of the forward and backward signal
through the transformer model. Our framework can be used to understand and
mitigate vanishing/exploding gradients, rank collapse, and instability
associated with high attention scores. We also propose DeepScaleLM, an
initialization and scaling scheme that conserves unit output/gradient moments
throughout the model, enabling the training of very deep models with 100s of
layers. We find that transformer models could be much deeper - our deep models
with fewer parameters outperform shallow models in Language Modeling, Speech
Translation, and Image Classification, across Encoder-only, Decoder-only and
Encoder-Decoder variants, for both Pre-LN and Post-LN transformers, for
multiple datasets and model sizes. These improvements also translate into
improved performance on downstream Question Answering tasks and improved
robustness for image classification.
\\ ( https://arxiv.org/abs/2403.09635 ,  684kb)
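The abstract does not spell out the DeepScaleLM formulae. One common way to conserve unit variance through a residual stack, assumed here only for illustration, is to mix the skip path and the block output as x <- alpha*x + beta*f(x) with alpha^2 + beta^2 = 1. The variance then stays at 1 when both terms have unit variance and are roughly uncorrelated. The NumPy sketch compares this scheme against an unscaled residual stack. The random linear branch, beta^2 = 1/depth, and the function name `propagate` are hypothetical choices and are not taken from the paper.

```python
import numpy as np

def propagate(depth: int, width: int, scaled: bool, seed: int = 0) -> float:
    """Return the signal variance after `depth` residual blocks.

    Each block's branch is a random linear map whose output is rescaled to
    unit variance, an illustrative stand-in for attention or FFN sublayers.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(1024, width))                  # unit-variance input
    beta2 = 1.0 / depth                                 # hypothetical branch weight
    alpha, beta = (np.sqrt(1 - beta2), np.sqrt(beta2)) if scaled else (1.0, 1.0)
    for _ in range(depth):
        w = rng.normal(scale=1 / np.sqrt(width), size=(width, width))
        branch = x @ w
        branch = branch / branch.std()                  # unit-variance branch output
        x = alpha * x + beta * branch                   # residual update
    return float(x.var())

unscaled_var = propagate(100, 64, scaled=False)   # grows roughly linearly with depth
scaled_var = propagate(100, 64, scaled=True)      # stays close to 1
```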
------------------------------------------------------------------------------
\\
arXiv:2403.09163 (*cross-listing*)
Date: Thu, 14 Mar 2024 08:19:41 GMT   (1236kb)

Title: Caveat Lector: Large Language Models in Legal Practice
Authors: Eliza Mik
Categories: cs.CL cs.CY
Comments: Vol 19 Rutgers Bus L R 2 2024 (forthcoming)
\\
 The current fascination with large language models, or LLMs, derives from the
fact that many users lack the expertise to evaluate the quality of the
generated text. LLMs may therefore appear more capable than they actually are.
The dangerous combination of fluency and superficial plausibility leads to the
temptation to trust the generated text and creates the risk of overreliance.
Who would not trust perfect legalese? Relying on recent findings in both technical
and legal scholarship, this Article counterbalances the overly optimistic
predictions as to the role of LLMs in legal practice. Integrating LLMs into
legal workstreams without a better comprehension of their limitations will
create inefficiencies, if not outright risks. Notwithstanding their
unprecedented ability to generate text, LLMs do not understand text. Without
the ability to understand meaning, LLMs will remain unable to use language, to
acquire knowledge and to perform complex reasoning tasks. Trained to model
language on the basis of stochastic word predictions, LLMs cannot distinguish
fact from fiction. Their knowledge of the law is limited to word strings
memorized in their parameters. It is also incomplete and largely incorrect.
LLMs operate at the level of word distributions, not at the level of verified
facts. The resulting propensity to hallucinate, to produce statements that are
incorrect but appear helpful and relevant, is alarming in high-risk areas like
legal services. At present, lawyers should beware of relying on text generated
by LLMs.
\\ ( https://arxiv.org/abs/2403.09163 ,  1236kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09219 (*cross-listing*)
Date: Thu, 14 Mar 2024 09:37:54 GMT   (1235kb)

Title: An Extensive Comparison of Static Application Security Testing Tools
Authors: Matteo Esposito, Valentina Falaschi, Davide Falessi
Categories: cs.SE cs.CR cs.CY
\\
 Context: Static Application Security Testing Tools (SASTTs) identify software
vulnerabilities to support the security and reliability of software
applications. Interestingly, several studies have suggested that alternative
solutions may be more effective than SASTTs due to their tendency to generate
false alarms, commonly referred to as low Precision. Aim: We aim to
comprehensively evaluate SASTTs, setting a reliable benchmark for assessing and
finding gaps in vulnerability identification mechanisms based on SASTTs or
alternatives. Method: Our SASTTs evaluation is based on a controlled, though
synthetic, Java codebase. It involves an assessment of 1.5 million test
executions, and it features innovative methodological features such as
effort-aware accuracy metrics and method-level analysis. Results: Our findings
reveal that SASTTs detect a tiny range of vulnerabilities. In contrast to
prevailing wisdom, SASTTs exhibit high Precision while falling short in Recall.
Conclusions: The paper suggests that enhancing Recall, alongside expanding the
spectrum of detected vulnerability types, should be the primary focus for
improving SASTTs or alternative approaches, such as machine learning-based
vulnerability identification solutions.
\\ ( https://arxiv.org/abs/2403.09219 ,  1235kb)
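The headline result contrasts Precision and Recall, which follow the standard definitions below. The counts in the example are invented to mirror the qualitative finding (few alarms, mostly correct, many misses). They are not figures from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts: a tool that raises few alarms, almost all of them real,
# while missing most of the vulnerabilities present in the codebase.
p, r = precision_recall(tp=18, fp=2, fn=82)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.90 recall=0.18
```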
------------------------------------------------------------------------------
\\
arXiv:2403.09254 (*cross-listing*)
Date: Thu, 14 Mar 2024 10:22:01 GMT   (1570kb,D)

Title: Gun Culture in Fringe Social Media
Authors: Fatemeh Tahmasbi, Aakarsha Chug, Barry Bradlyn, Jeremy Blackburn
Categories: cs.SI cs.CY
\\
 The increasing frequency of mass shootings in the United States has,
unfortunately, become a norm. While the issue of gun control in the US involves
complex legal concerns, there are also societal issues at play. One such social
issue is so-called "gun culture," i.e., a general set of beliefs and actions
related to gun ownership. However, relatively little is known about gun culture,
and even less is known when it comes to fringe online communities. This is
especially worrying considering the aforementioned rise in mass shootings and
numerous instances of shooters being radicalized online.
 To address this gap, we explore gun culture on /k/, 4chan's weapons board.
More specifically, using a variety of quantitative techniques, we examine over
4M posts on /k/ and position their discussion within the larger body of
theoretical understanding of gun culture. Among other things, our findings
suggest that gun culture on /k/ covers a relatively diverse set of topics (with
a particular focus on legal discussion), some of which are signals of
fetishism.
\\ ( https://arxiv.org/abs/2403.09254 ,  1570kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08787 (*cross-listing*)
Date: Tue, 30 Jan 2024 02:03:18 GMT   (3195kb)

Title: Multi-view Subspace Clustering via An Adaptive Consensus Graph Filter
Authors: Lai Wei, Shanshan Song
Categories: cs.CV cs.LG
\\
 Multiview subspace clustering (MVSC) has attracted an increasing amount of
attention in recent years. Most existing MVSC methods first collect
complementary information from different views and subsequently derive a
consensus reconstruction coefficient matrix to indicate the subspace structure
of a multi-view data set. In this paper, we initially assume the existence of a
consensus reconstruction coefficient matrix and then use it to build a
consensus graph filter. In each view, the filter is employed for smoothing the
data and designing a regularizer for the reconstruction coefficient matrix.
Finally, the obtained reconstruction coefficient matrices from different views
are used to create constraints for the consensus reconstruction coefficient
matrix. Therefore, in the proposed method, the consensus reconstruction
coefficient matrix, the consensus graph filter, and the reconstruction
coefficient matrices from different views are interdependent. We provide an
optimization algorithm to obtain their optimal values. Extensive experiments on
diverse multi-view data sets demonstrate that our approach outperforms some
state-of-the-art methods.
\\ ( https://arxiv.org/abs/2403.08787 ,  3195kb)
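The abstract does not give the form of the consensus graph filter. One widely used low-pass graph filter, assumed here, is H = (I - L/2)^k, where L is the symmetric normalised Laplacian of an affinity matrix. In this sketch the affinity is built as (|C| + |C|^T)/2 from a consensus coefficient matrix C, and the filter is used to smooth one view X with samples as rows. The random C and X are placeholders, and the paper's actual filter and regulariser may differ.

```python
import numpy as np

def graph_filter(C: np.ndarray, k: int = 2) -> np.ndarray:
    """Low-pass filter H = (I - L/2)^k, with L the normalised Laplacian of W = (|C| + |C|^T) / 2."""
    W = (np.abs(C) + np.abs(C).T) / 2                   # symmetric non-negative affinity
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))    # guard against isolated nodes
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigenvalues of L lie in [0, 2], so those of I - L/2 lie in [0, 1] (low-pass)
    return np.linalg.matrix_power(np.eye(len(W)) - L / 2, k)

rng = np.random.default_rng(0)
n = 6
C = rng.random((n, n))            # hypothetical consensus coefficient matrix
np.fill_diagonal(C, 0.0)
X = rng.normal(size=(n, 3))       # one view: n samples x 3 features
H = graph_filter(C)
X_smooth = H @ X                  # filtered (smoothed) view
```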
------------------------------------------------------------------------------
\\
arXiv:2403.08792 (*cross-listing*)
Date: Tue, 30 Jan 2024 16:12:20 GMT   (5981kb,D)

Title: Realtime Facial Expression Recognition: Neuromorphic Hardware vs. Edge
 AI Accelerators
Authors: Heath Smith, James Seekings, Mohammadreza Mohammadi, Ramtin Zand
Categories: cs.CV cs.LG cs.NE cs.PF
\\
 The paper focuses on real-time facial expression recognition (FER) systems as
an important component in various real-world applications such as social
robotics. We investigate two hardware options for the deployment of FER machine
learning (ML) models at the edge: neuromorphic hardware versus edge AI
accelerators. Our study includes exhaustive experiments providing comparative
analyses between the Intel Loihi neuromorphic processor and four distinct edge
platforms: Raspberry Pi-4, Intel Neural Compute Stick (NCS), Jetson Nano, and
Coral TPU. The results obtained show that Loihi can achieve approximately two
orders of magnitude reduction in power dissipation and one order of magnitude
energy savings compared to the Coral TPU, which happens to be the least
power-intensive and energy-consuming edge AI accelerator. These reductions in
power and energy are achieved while the neuromorphic solution maintains a
comparable level of accuracy with the edge accelerators, all within the
real-time latency requirements.
\\ ( https://arxiv.org/abs/2403.08792 ,  5981kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08793 (*cross-listing*)
Date: Tue, 30 Jan 2024 17:21:28 GMT   (3360kb,D)

Title: Neural Loss Function Evolution for Large-Scale Image Classifier
 Convolutional Neural Networks
Authors: Brandon Morgan and Dean Hougen
Categories: cs.CV cs.LG
\\
 For classification, neural networks typically learn by minimizing
cross-entropy, but are evaluated and compared using accuracy. This disparity
suggests neural loss function search (NLFS), the search for a drop-in
replacement for the cross-entropy loss in neural networks. We apply NLFS
to image classifier convolutional neural networks. We propose a new search
space for NLFS that encourages more diverse loss functions to be explored, and
a surrogate function that accurately transfers to large-scale convolutional
neural networks. We search the space using regularized evolution, a
mutation-only aging genetic algorithm. After evolution and a proposed loss
function elimination protocol, we transferred the final loss functions across
multiple architectures, datasets, and image augmentation techniques to assess
generalization. In the end, we discovered three new loss functions, called
NeuroLoss1, NeuroLoss2, and NeuroLoss3 that were able to outperform
cross-entropy in terms of a higher mean test accuracy as a simple drop-in
replacement loss function across the majority of experiments.
\\ ( https://arxiv.org/abs/2403.08793 ,  3360kb)
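Regularized evolution, as named in the abstract, keeps a fixed-size population, picks a parent by tournament, mutates it, appends the child, and then discards the oldest individual rather than the worst. The sketch below shows that loop on a toy problem. Individuals are bit strings and fitness is the number of ones, a hypothetical stand-in for the validation accuracy a candidate loss function would reach. Population size, tournament size, and the one-bit mutation are illustrative assumptions.

```python
import collections
import random

def regularized_evolution(fitness, length=20, population_size=20,
                          sample_size=5, cycles=500, seed=0):
    """Aging evolution: tournament selection, mutation only, remove the oldest."""
    rng = random.Random(seed)
    population = collections.deque()       # oldest individual on the left
    history = []                           # every individual ever evaluated
    for _ in range(population_size):
        ind = [rng.randint(0, 1) for _ in range(length)]
        population.append((ind, fitness(ind)))
        history.append(population[-1])
    for _ in range(cycles):
        # tournament: sample candidates and take the fittest as parent
        parent = max(rng.sample(list(population), sample_size), key=lambda p: p[1])
        child = list(parent[0])
        i = rng.randrange(length)
        child[i] = 1 - child[i]            # mutation only: flip one bit
        population.append((child, fitness(child)))
        history.append(population[-1])
        population.popleft()               # aging: drop the oldest, not the worst
    return max(history, key=lambda p: p[1])

best, score = regularized_evolution(sum)
```

Removing the oldest rather than the worst individual is what distinguishes this scheme from plain elitist evolution: even a lucky high-fitness candidate must keep producing good offspring to stay represented.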
------------------------------------------------------------------------------
\\
arXiv:2403.08804 (*cross-listing*)
Date: Tue, 6 Feb 2024 09:07:12 GMT   (863kb,D)

Title: Forward Direct Feedback Alignment for Online Gradient Estimates of
 Spiking Neural Networks
Authors: Florian Bacho, Dominique Chu
Categories: cs.NE cs.LG
\\
 There is interest in finding energy-efficient alternatives to current
state-of-the-art neural network training algorithms. Spiking neural networks
(SNNs) are a promising approach because they can be simulated energy-efficiently
on neuromorphic hardware platforms. However, these platforms come with
limitations on the design of the training algorithm. Most importantly,
backpropagation cannot be implemented on them. We propose a novel neuromorphic
algorithm, the \textit{Spiking Forward Direct Feedback Alignment} (SFDFA)
algorithm, an adaptation of \textit{Forward Direct Feedback Alignment} to train SNNs. SFDFA
estimates the weights between output and hidden neurons as feedback
connections. The main contribution of this paper is to describe how exact local
gradients of spikes can be computed in an online manner while taking into
account the intra-neuron dependencies between post-synaptic spikes and derive a
dynamical system for neuromorphic hardware compatibility. We compare the SFDFA
algorithm with a number of competitor algorithms and show that the proposed
algorithm achieves higher performance and convergence rates.
\\ ( https://arxiv.org/abs/2403.08804 ,  863kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08806 (*cross-listing*)
Date: Tue, 6 Feb 2024 11:35:05 GMT   (145kb,D)

Title: Adversarially Robust Deepfake Detection via Adversarial Feature
 Similarity Learning
Authors: Sarwar Khan
Categories: cs.CV cs.LG cs.MM
Comments: MMM 2024 Accepted
DOI: 10.1007/978-3-031-53311-2_37
\\
 Deepfake technology has raised concerns about the authenticity of digital
content, necessitating the development of effective detection methods. However,
the widespread availability of deepfakes has given rise to a new challenge in
the form of adversarial attacks. Adversaries can manipulate deepfake videos
with small, imperceptible perturbations that can deceive the detection models
into producing incorrect outputs. To tackle this critical issue, we introduce
Adversarial Feature Similarity Learning (AFSL), which integrates three
fundamental deep feature learning paradigms. By optimizing the similarity
between samples and weight vectors, our approach aims to distinguish between
real and fake instances. Additionally, we aim to maximize the similarity
between both adversarially perturbed examples and unperturbed examples,
regardless of their real or fake nature. Moreover, we introduce a
regularization technique that maximizes the dissimilarity between real and fake
samples, ensuring a clear separation between these two categories. With
extensive experiments on popular deepfake datasets, including FaceForensics++,
FaceShifter, and DeeperForensics, the proposed method outperforms other
standard adversarial training-based defense methods significantly. This further
demonstrates the effectiveness of our approach to protecting deepfake detectors
from adversarial attacks.
\\ ( https://arxiv.org/abs/2403.08806 ,  145kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08812 (*cross-listing*)
Date: Fri, 9 Feb 2024 18:16:17 GMT   (4209kb,D)

Title: Gore Diffusion LoRA Model
Authors: Ayush Thakur and Ashwani Kumar Dubey
Categories: cs.HC cs.GR cs.LG
\\
 The emergence of Artificial Intelligence (AI) has significantly impacted our
engagement with violence, sparking ethical deliberations regarding the
algorithmic creation of violent imagery. This paper scrutinizes the "Gore
Diffusion LoRA Model," an innovative AI model proficient in generating
hyper-realistic visuals portraying intense violence and bloodshed. Our
exploration encompasses the model's technical intricacies, plausible
applications, and the ethical quandaries inherent in its utilization. We
contend that the creation and implementation of such models warrant a
meticulous discourse concerning the convergence of AI, art, and violence.
Furthermore, we advocate for a structured framework promoting responsible
development and ethical deployment of these potent technologies.
\\ ( https://arxiv.org/abs/2403.08812 ,  4209kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08821 (*cross-listing*)
Date: Sat, 24 Feb 2024 05:48:05 GMT   (2777kb,D)

Title: Effective Gradient Sample Size via Variation Estimation for Accelerating
 Sharpness aware Minimization
Authors: Jiaxin Deng, Junbiao Pang, Baochang Zhang, Tian Wang
Categories: cs.CV cs.LG
\\
 Sharpness-aware Minimization (SAM) has been proposed recently to improve
model generalization ability. However, SAM calculates the gradient twice in
each optimization step, thereby doubling the computation costs compared to
stochastic gradient descent (SGD). In this paper, we propose a simple yet
efficient sampling method to significantly accelerate SAM. Concretely, we
discover that the gradient of SAM is a combination of the gradient of SGD and
the Projection of the Second-order gradient matrix onto the First-order
gradient (PSF). PSF exhibits a gradually increasing frequency of change during
the training process. To leverage this observation, we propose an adaptive
sampling method based on the variation of PSF, and we reuse the sampled PSF for
non-sampling iterations. Extensive empirical results illustrate that the
proposed method achieved state-of-the-art accuracies comparable to SAM on
diverse network architectures.
\\ ( https://arxiv.org/abs/2403.08821 ,  2777kb)
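For intuition on the decomposition described in the abstract, the sketch below computes the standard SAM gradient on a toy quadratic loss. SAM takes an ascent step of radius rho along the normalised gradient and then evaluates the gradient a second time. For a quadratic with Hessian A, the SAM gradient equals the SGD gradient g plus rho * A g / ||g||, which is the Hessian projected onto the gradient direction. This is the role the abstract assigns to PSF; for general losses the identity holds only to first order in rho. The matrix A, the vector b, and rho = 0.05 are arbitrary choices, and the paper's adaptive sampling of PSF is not implemented.

```python
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # Hessian of the toy quadratic (arbitrary)
b = np.array([1.0, -2.0])

def grad(w: np.ndarray) -> np.ndarray:
    """Gradient of L(w) = 0.5 * w^T A w - b^T w."""
    return A @ w - b

def sam_gradient(w: np.ndarray, rho: float = 0.05) -> np.ndarray:
    g = grad(w)                             # first gradient evaluation
    eps = rho * g / np.linalg.norm(g)       # ascent step toward the worst-case neighbour
    return grad(w + eps)                    # second gradient evaluation

w = np.zeros(2)
g = grad(w)
psf = A @ g / np.linalg.norm(g)             # Hessian projected onto the gradient direction
print(np.allclose(sam_gradient(w), g + 0.05 * psf))  # True
```

Reusing a previously computed PSF, as the abstract proposes for non-sampling iterations, would replace the second `grad` call with a cheap vector addition.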
------------------------------------------------------------------------------
\\
arXiv:2403.08826 (*cross-listing*)
Date: Sun, 10 Mar 2024 16:00:41 GMT   (1524kb,D)

Title: A Dataset for the Validation of Truth Inference Algorithms Suitable for
 Online Deployment
Authors: Fei Wang, Haoyu Liu, Haoyang Bi, Xiangzhuang Shen, Renyu Zhu, Runze
 Wu, Minmin Lin, Tangjie Lv, Changjie Fan, Qi Liu, Zhenya Huang, Enhong Chen
Categories: cs.HC cs.LG
\\
 For the purpose of efficient and cost-effective large-scale data labeling,
crowdsourcing is increasingly being utilized. To guarantee the quality of data
labeling, multiple annotations need to be collected for each data sample, and
truth inference algorithms have been developed to accurately infer the true
labels. Despite previous studies having released public datasets to evaluate
the efficacy of truth inference algorithms, these have typically focused on a
single type of crowdsourcing task and neglected the temporal information
associated with workers' annotation activities. These limitations significantly
restrict the practical applicability of these algorithms, particularly in the
context of long-term and online truth inference. In this paper, we introduce a
substantial crowdsourcing annotation dataset collected from a real-world
crowdsourcing platform. This dataset comprises approximately two thousand
workers, one million tasks, and six million annotations. The data was gathered
over a period of approximately six months from various types of tasks, and the
timestamps of each annotation were preserved. We analyze the characteristics of
the dataset from multiple perspectives and evaluate the effectiveness of
several representative truth inference algorithms on this dataset. We
anticipate that this dataset will stimulate future research on tracking
workers' abilities over time in relation to different types of tasks, as well
as enhancing online truth inference.
\\ ( https://arxiv.org/abs/2403.08826 ,  1524kb)
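The simplest truth inference baseline is majority voting over each task's annotations. Because the dataset preserves timestamps, the same baseline can be run online: annotations are replayed in time order and each task's estimate is updated as votes arrive. The sketch assumes a hypothetical (timestamp, worker, task, label) record layout. The dataset's actual schema may differ, and the representative algorithms evaluated in the paper are more sophisticated.

```python
from collections import Counter, defaultdict

def online_majority_vote(annotations):
    """Yield (timestamp, task, current_label) as annotations arrive in time order.

    `annotations` is an iterable of (timestamp, worker, task, label) tuples,
    a hypothetical record layout assumed for illustration.
    """
    counts = defaultdict(Counter)
    for ts, worker, task, label in sorted(annotations):
        counts[task][label] += 1
        # most_common orders equal counts by first occurrence, so ties keep the earlier label
        yield ts, task, counts[task].most_common(1)[0][0]

stream = [
    (1, "w1", "t1", "cat"),
    (2, "w2", "t1", "dog"),
    (3, "w3", "t1", "dog"),
    (4, "w1", "t2", "cat"),
]
trace = [label for _, _, label in online_majority_vote(stream)]
final = {task: label for _, task, label in online_majority_vote(stream)}
print(final)  # {'t1': 'dog', 't2': 'cat'}
```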
------------------------------------------------------------------------------
\\
arXiv:2403.08829 (*cross-listing*)
Date: Mon, 11 Mar 2024 12:08:08 GMT   (1155kb,D)

Title: Mitigating Biases in Collective Decision-Making: Enhancing Performance
 in the Face of Fake News
Authors: Axel Abels, Elias Fernandez Domingos, Ann Now\'e, Tom Lenaerts
Categories: cs.HC cs.LG cs.SI
\\
 Individual and social biases undermine the effectiveness of human advisers by
inducing judgment errors which can disadvantage protected groups. In this
paper, we study the influence these biases can have on the pervasive problem of
fake news by evaluating human participants' capacity to identify false
headlines. By focusing on headlines involving sensitive characteristics, we
gather a comprehensive dataset to explore how human responses are shaped by
their biases. Our analysis reveals recurring individual biases and their
permeation into collective decisions. We show that demographic factors,
headline categories, and the manner in which information is presented
significantly influence errors in human judgment. We then use our collected
data as a benchmark problem on which we evaluate the efficacy of adaptive
aggregation algorithms. In addition to their improved accuracy, our results
highlight the interactions between the emergence of collective intelligence and
the mitigation of participant biases.
\\ ( https://arxiv.org/abs/2403.08829 ,  1155kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08831 (*cross-listing*)
Date: Tue, 12 Mar 2024 18:01:30 GMT   (40kb)

Title: Majority-of-Three: The Simplest Optimal Learner?
Authors: Ishaq Aden-Ali, Mikael M{\o}ller H{\o}gsgaard, Kasper Green Larsen,
 Nikita Zhivotovskiy
Categories: stat.ML cs.LG math.ST stat.TH
Comments: 22 pages
\\
 Developing an optimal PAC learning algorithm in the realizable setting, where
empirical risk minimization (ERM) is suboptimal, was a major open problem in
learning theory for decades. The problem was finally resolved by Hanneke a few
years ago. Unfortunately, Hanneke's algorithm is quite complex as it returns
the majority vote of many ERM classifiers that are trained on carefully
selected subsets of the data. It is thus a natural goal to determine the
simplest algorithm that is optimal. In this work we study the arguably simplest
algorithm that could be optimal: returning the majority vote of three ERM
classifiers. We show that this algorithm achieves the optimal in-expectation
bound on its error which is provably unattainable by a single ERM classifier.
Furthermore, we prove a near-optimal high-probability bound on this algorithm's
error. We conjecture that a better analysis will prove that this algorithm is
in fact optimal in the high-probability regime.
\\ ( https://arxiv.org/abs/2403.08831 ,  40kb)
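To make the construction concrete, the sketch below applies majority-of-three to one-dimensional threshold classifiers h_t(x) = [x >= t] in the realizable setting. The abstract does not say how the three ERM classifiers obtain their training data; splitting the sample into three disjoint thirds is an assumption here, as are the target threshold and the sample sizes. The example illustrates the algorithm only, not the paper's error bounds.

```python
import random

def erm_threshold(sample):
    """ERM for h_t(x) = [x >= t]: in the realizable case the smallest positive point is consistent."""
    positives = [x for x, y in sample if y == 1]
    return min(positives) if positives else float("inf")

def majority_of_three(sample):
    """Train ERM on three disjoint parts of the sample (assumed split) and return their majority vote."""
    k = len(sample) // 3
    parts = [sample[:k], sample[k:2 * k], sample[2 * k:]]
    thresholds = [erm_threshold(p) for p in parts]
    return lambda x: int(sum(x >= t for t in thresholds) >= 2)

rng = random.Random(0)
target = lambda x: int(x >= 0.3)          # hypothetical true concept
train = [(x, target(x)) for x in (rng.random() for _ in range(300))]
h = majority_of_three(train)
test = [rng.random() for _ in range(10_000)]
error = sum(h(x) != target(x) for x in test) / len(test)
```

For thresholds, the vote reduces to using the median of the three learned thresholds, since x exceeds at least two of them exactly when it exceeds the median.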
------------------------------------------------------------------------------
\\
arXiv:2403.08847 (*cross-listing*)
Date: Wed, 13 Mar 2024 16:50:04 GMT   (10kb)

Title: JAXbind: Bind any function to JAX
Authors: Jakob Roth, Martin Reinecke, Gordian Edenhofer
Categories: astro-ph.IM cs.LG stat.CO
Comments: 4 pages, Github: https://github.com/NIFTy-PPL/JAXbind
\\
 JAX is widely used in machine learning and scientific computing, the latter
of which often relies on existing high-performance code that we would ideally
like to incorporate into JAX. Reimplementing the existing code in JAX is often
impractical and the existing interface in JAX for binding custom code requires
deep knowledge of JAX and its C++ backend. The goal of JAXbind is to
drastically reduce the effort required to bind custom functions implemented in
other programming languages to JAX. Specifically, JAXbind provides an
easy-to-use Python interface for defining custom so-called JAX primitives that
support arbitrary JAX transformations.
\\ ( https://arxiv.org/abs/2403.08847 ,  10kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08851 (*cross-listing*)
Date: Wed, 13 Mar 2024 18:00:00 GMT   (21003kb,D)

Title: PAPERCLIP: Associating Astronomical Observations and Natural Language
 with Multi-Modal Models
Authors: Siddharth Mishra-Sharma, Yiding Song, and Jesse Thaler
Categories: astro-ph.IM cs.CL cs.CV cs.IR cs.LG
Comments: 17+6 pages, 3+1 figures, 5+2 tables
Report-no: MIT-CTP/5690
\\
 We present PAPERCLIP (Proposal Abstracts Provide an Effective Representation
for Contrastive Language-Image Pre-training), a method which associates
astronomical observations imaged by telescopes with natural language using a
neural network model. The model is fine-tuned from a pre-trained Contrastive
Language-Image Pre-training (CLIP) model using successful observing proposal
abstracts and corresponding downstream observations, with the abstracts
optionally summarized via guided generation using large language models (LLMs).
Using observations from the Hubble Space Telescope (HST) as an example, we show
that the fine-tuned model embodies a meaningful joint representation between
observations and natural language through tests targeting image retrieval
(i.e., finding the most relevant observations using natural language queries)
and description retrieval (i.e., querying for astrophysical object classes and
use cases most relevant to a given observation). Our study demonstrates the
potential for using generalist foundation models rather than task-specific
models for interacting with astronomical data by leveraging text as an
interface.
\\ ( https://arxiv.org/abs/2403.08851 ,  21003kb)
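Image retrieval with a CLIP-style model ranks observations by the cosine similarity between the embedding of a text query and the embedding of each image. The sketch shows only that ranking step. The embeddings are random placeholders rather than outputs of the fine-tuned PAPERCLIP model, and the function name `retrieve` is hypothetical. Swapping the roles of text and images gives description retrieval.

```python
import numpy as np

def retrieve(query_emb: np.ndarray, image_embs: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k images ranked by cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ q                       # cosine similarity per image
    return np.argsort(-scores)[:top_k]      # highest similarity first

rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 64))                      # placeholder image embeddings
query_emb = image_embs[42] + 0.1 * rng.normal(size=64)       # placeholder text embedding near image 42
top = retrieve(query_emb, image_embs)
```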
------------------------------------------------------------------------------
\\
arXiv:2403.08854 (*cross-listing*)
Date: Wed, 13 Mar 2024 18:00:01 GMT   (10339kb,D)

Title: Moments of Clarity: Streamlining Latent Spaces in Machine Learning using
 Moment Pooling
Authors: Rikab Gambhir, Athis Osathapan, and Jesse Thaler
Categories: hep-ph cs.LG stat.ML
Comments: 15+7 pages, 14 figures, 7 tables. Code available at
 https://github.com/athiso/moment and https://github.com/rikab/MomentAnalysis
Report-no: MIT-CTP 5689
\\
 Many machine learning applications involve learning a latent representation
of data, which is often high-dimensional and difficult to directly interpret.
In this work, we propose "Moment Pooling", a natural extension of Deep Sets
networks, which drastically decreases the latent space dimensionality of these
networks while maintaining or even improving performance. Moment Pooling
generalizes the summation in Deep Sets to arbitrary multivariate moments, which
enables the model to achieve a much higher effective latent dimensionality for
a fixed latent dimension. We demonstrate Moment Pooling on the collider physics
task of quark/gluon jet classification by extending Energy Flow Networks (EFNs)
to Moment EFNs. We find that Moment EFNs with latent dimensions as small as 1
perform similarly to ordinary EFNs with higher latent dimension. This small
latent dimension allows for the internal representation to be directly
visualized and interpreted, which in turn enables the learned internal jet
representation to be extracted in closed form.
\\ ( https://arxiv.org/abs/2403.08854 ,  10339kb)
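In Energy Flow Networks, a per-particle latent map Phi is pooled as the energy-weighted sum over particles, sum_i z_i Phi(p_i). Moment Pooling additionally keeps higher-order moments such as sum_i z_i Phi_a(p_i) Phi_b(p_i). The NumPy sketch computes these pooled features up to a chosen order from given z and Phi values. The random inputs and the function name `moment_pool` are assumptions, and the released code linked above may organise the computation differently.

```python
import numpy as np
from itertools import combinations_with_replacement

def moment_pool(z: np.ndarray, phi: np.ndarray, order: int = 2) -> np.ndarray:
    """Energy-weighted moments of per-particle latents.

    z: (n,) energy fractions; phi: (n, L) per-particle latent vectors.
    Returns sum_i z_i * prod_{a in S} phi_i[a] for every multiset S of size 1..order.
    """
    feats = []
    L = phi.shape[1]
    for k in range(1, order + 1):
        for idx in combinations_with_replacement(range(L), k):
            feats.append(np.sum(z * np.prod(phi[:, list(idx)], axis=1)))
    return np.array(feats)

rng = np.random.default_rng(0)
n = 10
z = rng.random(n)
z /= z.sum()                       # energy fractions sum to 1
phi = rng.normal(size=(n, 2))      # hypothetical latent map output, L = 2
print(moment_pool(z, phi, order=2).shape)  # (5,)
```

With latent dimension L and maximum order k, the pooled vector has sum over j = 1..k of C(L+j-1, j) entries (5 for L = 2, k = 2). This is how a small latent dimension can still yield a richer pooled representation; order 1 recovers the ordinary Deep Sets sum.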
------------------------------------------------------------------------------
\\
arXiv:2403.08901 (*cross-listing*)
Date: Wed, 13 Mar 2024 18:45:51 GMT   (39481kb,D)

Title: A Framework for Strategic Discovery of Credible Neural Network Surrogate
 Models under Uncertainty
Authors: Pratyush Kumar Singh, Kathryn A. Farrell-Maupin, Danial Faghihi
Categories: cs.CE cs.LG
\\
 The widespread integration of deep neural networks in developing data-driven
surrogate models for high-fidelity simulations of complex physical systems
highlights the critical necessity for robust uncertainty quantification
techniques and credibility assessment methodologies, ensuring the reliable
deployment of surrogate models in consequential decision-making. This study
presents the Occam Plausibility Algorithm for surrogate models
(OPAL-surrogate), providing a systematic framework to uncover predictive neural
network-based surrogate models within the large space of potential models,
including various neural network classes and choices of architecture and
hyperparameters. The framework is grounded in hierarchical Bayesian inference
and employs model validation tests to evaluate the credibility and prediction
reliability of the surrogate models under uncertainty. Leveraging these
principles, OPAL-surrogate introduces a systematic and efficient strategy for
balancing the trade-off between model complexity, accuracy, and prediction
uncertainty. The effectiveness of OPAL-surrogate is demonstrated through two
modeling problems: the deformation of porous materials for building
insulation and turbulent combustion flow for the ablation of solid fuels within
hybrid rocket motors.
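Occam-style Bayesian model selection compares candidate models through their plausibility, meaning the posterior probability of each model given the data. The snippet below is a generic sketch of that formula (prior times evidence, normalized over the candidates). It assumes the log evidences have already been computed. The helper name is hypothetical, and the abstract does not say how OPAL-surrogate orders, validates, or prunes its candidate networks, so none of that is shown.

```python
import numpy as np


def model_plausibilities(log_evidences, log_priors=None):
    """Posterior model plausibilities p(M_j | D) from log evidences log p(D | M_j)."""
    log_ev = np.asarray(log_evidences, dtype=float)
    if log_priors is None:
        # Uniform prior over the candidate models.
        log_priors = np.full_like(log_ev, -np.log(len(log_ev)))
    logp = log_ev + np.asarray(log_priors, dtype=float)
    logp -= logp.max()  # subtract the max before exponentiating, for stability
    p = np.exp(logp)
    return p / p.sum()
```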
\\ ( https://arxiv.org/abs/2403.08901 ,  39481kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08917 (*cross-listing*)
Date: Wed, 13 Mar 2024 19:19:19 GMT   (277kb,D)

Title: Efficiently Computing Similarities to Private Datasets
Authors: Arturs Backurs, Zinan Lin, Sepideh Mahabadi, Sandeep Silwal, Jakub
 Tarnawski
Categories: cs.CR cs.DS cs.LG
Comments: To appear at ICLR 2024
\\
 Many methods in differentially private model training rely on computing the
similarity between a query point (such as public or synthetic data) and private
data. We abstract out this common subroutine and study the following
fundamental algorithmic problem: Given a similarity function $f$ and a large
high-dimensional private dataset $X \subset \mathbb{R}^d$, output a
differentially private (DP) data structure which approximates $\sum_{x \in X}
f(x,y)$ for any query $y$. We consider the cases where $f$ is a kernel
function, such as $f(x,y) = e^{-\|x-y\|_2^2/\sigma^2}$ (also known as DP kernel
density estimation), or a distance function such as $f(x,y) = \|x-y\|_2$, among
others.
 Our theoretical results improve upon prior work and give better
privacy-utility trade-offs as well as faster query times for a wide range of
kernels and distance functions. The unifying approach behind our results is
leveraging `low-dimensional structures' present in the specific functions $f$
that we study, using tools such as provable dimensionality reduction,
approximation theory, and one-dimensional decomposition of the functions. Our
algorithms empirically exhibit improved query times and accuracy over prior
state of the art. We also present an application to DP classification. Our
experiments demonstrate that the simple methodology of classifying based on
average similarity is orders of magnitude faster than prior DP-SGD based
approaches for comparable accuracy.
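For intuition about the problem statement, here is a simple baseline for DP kernel density estimation with the Gaussian kernel f(x,y) = exp(-||x-y||^2/sigma^2). It sums random Fourier features over the private data and then perturbs that sum once with the classical Gaussian mechanism. This is a swapped-in, textbook-style technique, not the authors' algorithm, and all names are hypothetical. Once the noisy sketch is released, any number of queries y can be answered without further privacy cost.

```python
import numpy as np


def private_rff_kde(X, sigma, num_features, epsilon, delta, rng):
    """DP sketch answering y -> sum_x exp(-||x - y||^2 / sigma^2) approximately.

    Uses random Fourier features and the classical Gaussian mechanism
    (its noise calibration is stated for 0 < epsilon < 1).
    """
    _, d = X.shape
    # exp(-||u||^2 / sigma^2) has spectral density N(0, (2 / sigma^2) I).
    W = rng.normal(scale=np.sqrt(2.0) / sigma, size=(num_features, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def features(A):
        return np.sqrt(2.0 / num_features) * np.cos(A @ W.T + b)

    # Every feature vector has L2 norm <= sqrt(2), so adding or removing one
    # private point changes the summed sketch by at most sqrt(2) in L2 norm.
    sensitivity = np.sqrt(2.0)
    noise_std = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    sketch = features(X).sum(axis=0) + rng.normal(scale=noise_std, size=num_features)

    def query(Y):
        # W, b and the noisy sketch are data-independent or already private.
        return features(np.atleast_2d(Y)) @ sketch

    return query
```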
\\ ( https://arxiv.org/abs/2403.08917 ,  277kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08938 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:12:03 GMT   (206kb,D)

Title: A non-asymptotic theory of Kernel Ridge Regression: deterministic
 equivalents, test error, and GCV estimator
Authors: Theodor Misiakiewicz, Basil Saeed
Categories: stat.ML cs.LG math.ST stat.TH
Comments: 131 pages, 4 figures
\\
 We consider learning an unknown target function $f_*$ using kernel ridge
regression (KRR) given i.i.d. data $(u_i,y_i)$, $i\leq n$, where $u_i \in U$ is
a covariate vector and $y_i = f_* (u_i) +\varepsilon_i \in \mathbb{R}$. A
recent line of work has empirically shown that the test error of KRR can be
well approximated by a closed-form estimate derived from an `equivalent'
sequence model that only depends on the spectrum of the kernel operator.
However, a theoretical justification for this equivalence has so far relied
either on restrictive assumptions, such as subgaussian independent
eigenfunctions, or on asymptotic derivations for specific kernels in high
dimensions.
 In this paper, we prove that this equivalence holds for a general class of
problems satisfying some spectral and concentration properties on the kernel
eigendecomposition. Specifically, we establish in this setting a non-asymptotic
deterministic approximation for the test error of KRR -- with explicit
non-asymptotic bounds -- that only depends on the eigenvalues and the target
function alignment to the eigenvectors of the kernel. Our proofs rely on a
careful derivation of deterministic equivalents for random matrix functionals
in the dimension-free regime pioneered by Cheng and Montanari (2022).
 We apply this setting to several classical examples and show an excellent
agreement between theoretical predictions and numerical simulations. These
results rely on having access to the eigendecomposition of the kernel operator.
Alternatively, we prove that, under this same setting, the generalized
cross-validation (GCV) estimator concentrates on the test error uniformly over
a range of ridge regularization parameters that includes zero (the interpolating
solution). As a consequence, the GCV estimator can be used to estimate from
data the test error and optimal regularization parameter for KRR.
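The GCV estimator mentioned at the end of the abstract is cheap to compute for KRR. The sketch below assumes the common convention in which the smoother matrix is S = K(K + n*lambda*I)^{-1}; the paper's scaling of lambda may differ. It also assumes lambda > 0, because the lambda -> 0 interpolating limit discussed in the abstract needs separate handling. The score is GCV(lambda) = (1/n)||(I - S)y||^2 / (1 - tr(S)/n)^2, evaluated here with one eigendecomposition of K.

```python
import numpy as np


def krr_gcv(K, y, lambdas):
    """GCV scores for kernel ridge regression over a grid of lambda > 0."""
    n = len(y)
    evals, evecs = np.linalg.eigh(K)
    evals = np.clip(evals, 0.0, None)  # drop tiny negative round-off
    yt = evecs.T @ y  # y expressed in the eigenbasis of K
    scores = []
    for lam in lambdas:
        shrink = evals / (evals + n * lam)  # eigenvalues of S
        resid = (1.0 - shrink) * yt  # (I - S) y in the eigenbasis
        num = np.mean(resid ** 2)
        den = (1.0 - shrink.sum() / n) ** 2
        scores.append(num / den)
    return np.array(scores)
```

Picking the lambda with the smallest score is the data-driven tuning rule the abstract refers to.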
\\ ( https://arxiv.org/abs/2403.08938 ,  206kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08939 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:13:25 GMT   (991kb,D)

Title: FogGuard: guarding YOLO against fog using perceptual loss
Authors: Soheil Gharatappeh, Sepideh Neshatfar, Salimeh Yasaei Sekeh and Vikas
 Dhiman
Categories: cs.CV cs.LG
Comments: 8 pages, 1 figure, submitted to Robotics and Automation Letters
\\
 In this paper, we present a novel fog-aware object detection network called
FogGuard, designed to address the challenges posed by foggy weather conditions.
Autonomous driving systems heavily rely on accurate object detection
algorithms, but adverse weather conditions can significantly impact the
reliability of deep neural networks (DNNs).
 Existing approaches fall into two main categories: (1) image enhancement, such
as IA-YOLO, and (2) domain adaptation. Image enhancement based
techniques attempt to generate a fog-free image. However, retrieving a fogless
image from a foggy image is a much harder problem than detecting objects in a
foggy image. Domain-adaptation based approaches, on the other hand, do not make
use of labelled datasets in the target domain. Both categories of approaches
are attempting to solve a harder version of the problem. Our approach builds
over fine-tuning on the
 Our framework is specifically designed to compensate for foggy conditions
present in the scene, ensuring robust performance. We adopt YOLOv3 as the
baseline object detection algorithm and introduce a novel Teacher-Student
Perceptual loss to achieve high-accuracy object detection in foggy images.
 Through extensive evaluations on common datasets such as PASCAL VOC and RTTS,
we demonstrate the performance improvement achieved by our network: FogGuard
achieves 69.43\% mAP, compared to 57.78\% for YOLOv3 on the RTTS dataset.
 Furthermore, we show that while our training method increases time
complexity, it does not introduce any additional overhead during inference
compared to the regular YOLO network.
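The abstract does not define the Teacher-Student Perceptual loss. One common form, assumed here purely for illustration, runs a teacher on the clear image and a student on the foggy version. It then penalizes the mean-squared difference between matching intermediate feature maps. The layer choice, the per-layer weights, and how the term is combined with the YOLOv3 detection loss are all assumptions. Because the loss is only used during training, it adds nothing at inference.

```python
import numpy as np


def perceptual_loss(teacher_feats, student_feats, weights=None):
    """Weighted mean-squared distance between matching feature maps (sketch).

    teacher_feats: list of arrays from the teacher run on the clear image
    student_feats: list of arrays, same shapes, from the student on the foggy image
    """
    if weights is None:
        weights = [1.0] * len(teacher_feats)
    loss = 0.0
    for w, t, s in zip(weights, teacher_feats, student_feats):
        if t.shape != s.shape:
            raise ValueError("feature maps must have matching shapes")
        loss += w * np.mean((t - s) ** 2)
    return loss
```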
\\ ( https://arxiv.org/abs/2403.08939 ,  991kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08941 (*cross-listing*)
Date: Wed, 13 Mar 2024 20:16:21 GMT   (7584kb,D)

Title: Towards Model-Agnostic Posterior Approximation for Fast and Accurate
 Variational Autoencoders
Authors: Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez
Categories: stat.ML cs.LG
\\
 Inference for Variational Autoencoders (VAEs) consists of learning two
models: (1) a generative model, which transforms a simple distribution over a
latent space into the distribution over observed data, and (2) an inference
model, which approximates the posterior of the latent codes given data. The two
components are learned jointly via a lower bound to the generative model's log
marginal likelihood. In early phases of joint training, the inference model
poorly approximates the latent code posteriors. Recent work showed that this
leads optimization to get stuck in local optima, negatively impacting the
learned generative model. As such, recent work suggests ensuring a high-quality
inference model via iterative training: maximizing the objective function
relative to the inference model before every update to the generative model.
Unfortunately, iterative training is inefficient, requiring heuristic criteria
for reverting from iterative to joint training for speed. Here, we suggest an
inference method that trains the generative and inference models independently.
It approximates the posterior of the true model a priori; fixing this posterior
approximation, we then maximize the lower bound relative to only the generative
model. Conventional wisdom suggests that this approach would need the true prior
and likelihood of the true model, which are unknown, to approximate its posterior.
However, we show that we can compute a deterministic, model-agnostic posterior
approximation (MAPA) of the true model's posterior. We then use MAPA to develop
a proof-of-concept inference method. We present preliminary results on
low-dimensional synthetic data showing that (1) MAPA captures the trend of the
true posterior, and (2) our MAPA-based inference achieves better density
estimation with less computation than baselines. Lastly, we present a roadmap for scaling
the MAPA-based inference method to high-dimensional data.
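A toy linear-Gaussian model, z ~ N(0,1) and x | z ~ N(a z, s2), shows why the quality of the posterior approximation matters, because everything in it is available in closed form. The Monte Carlo lower bound below equals log p(x) exactly when q is the true posterior and falls below it otherwise. That gap is what a poor inference model introduces when the generative model is trained against it. This illustrates the ELBO in general, not MAPA, and the model and function names are hypothetical.

```python
import numpy as np


def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def elbo(x, a, s2, q_mean, q_var, num_samples, rng):
    """Monte Carlo ELBO for z ~ N(0,1), x | z ~ N(a z, s2), q(z) = N(q_mean, q_var)."""
    z = q_mean + np.sqrt(q_var) * rng.normal(size=num_samples)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, a * z, s2)
    return np.mean(log_joint - log_normal(z, q_mean, q_var))
```

In this model, the exact posterior is N(a x / (a^2 + s2), s2 / (a^2 + s2)) and log p(x) is log N(x; 0, a^2 + s2).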
\\ ( https://arxiv.org/abs/2403.08941 ,  7584kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08965 (*cross-listing*)
Date: Wed, 13 Mar 2024 21:11:58 GMT   (2221kb,D)

Title: Deep Learning Based Dynamics Identification and Linearization of Orbital
 Problems using Koopman Theory
Authors: George Nehma, Madhur Tiwari, Manasvi Lingam
Categories: math-ph astro-ph.EP cs.LG math.MP physics.space-ph
\\
 The study of the Two-Body and Circular Restricted Three-Body Problems in the
field of aerospace engineering and sciences is deeply important because they
help describe the motion of both celestial and artificial satellites. With the
growing demand for satellites and satellite formation flying, fast and
efficient control of these systems is becoming ever more important. Global
linearization of these systems allows engineers to apply control methods to
achieve these goals. We propose a data-driven framework for
simultaneous system identification and global linearization of both the
Two-Body Problem and Circular Restricted Three-Body Problem via deep
learning-based Koopman Theory, i.e., a framework that can identify the
underlying dynamics and globally linearize it into a linear time-invariant
(LTI) system. The linear Koopman operator is discovered through purely
data-driven training of a Deep Neural Network with a custom architecture. We
show that the learned Koopman operator generalizes to various other Two-Body
systems without the need for retraining. We also demonstrate that the same
architecture can accurately learn a
Koopman operator that approximates the Circular Restricted Three-Body Problem.
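The core idea is to lift the state into observables whose evolution is linear, and this can be sketched without a neural network. Below, a toy nonlinear map becomes exactly linear in the hand-picked observables (x1, x2, x1^2). The map is hypothetical and is not an orbital problem. A least-squares fit over snapshot pairs, in the style of extended dynamic mode decomposition, then recovers the Koopman matrix. In the paper, a deep network learns the observables instead, so this is a swapped-in classical technique.

```python
import numpy as np


def step(x, a=0.9, b=0.5, c=0.3):
    """Toy nonlinear map: x1 -> a*x1, x2 -> b*x2 + c*x1^2."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])


def lift(x):
    """Hand-picked observables; a deep Koopman model would learn these instead."""
    return np.array([x[0], x[1], x[0] ** 2])


def fit_koopman(pairs):
    """Least-squares K with lift(x_next) ~= K @ lift(x) over all snapshot pairs."""
    G = np.column_stack([lift(x) for x, _ in pairs])
    G_next = np.column_stack([lift(y) for _, y in pairs])
    return G_next @ np.linalg.pinv(G)
```

For this map, the exact Koopman matrix on (x1, x2, x1^2) is [[a, 0, 0], [0, b, c], [0, 0, a^2]].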
\\ ( https://arxiv.org/abs/2403.08965 ,  2221kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08969 (*cross-listing*)
Date: Wed, 13 Mar 2024 21:30:01 GMT   (10081kb,D)

Title: The Full-scale Assembly Simulation Testbed (FAST) Dataset
Authors: Alec G. Moore, Tiffany D. Do, Nayan N. Chawla, Antonia Jimenez
 Iriarte, and Ryan P. McMahan
Categories: cs.HC cs.LG
\\
 In recent years, numerous researchers have begun investigating how virtual
reality (VR) tracking and interaction data can be used for a variety of machine
learning purposes, including user identification, predicting cybersickness, and
estimating learning gains. One constraint for this research area is the dearth
of open datasets. In this paper, we present a new open dataset captured with
our VR-based Full-scale Assembly Simulation Testbed (FAST). This dataset
consists of data collected from 108 participants (50 females, 56 males, 2
non-binary) learning how to assemble two distinct full-scale structures in VR.
In addition to explaining how the dataset was collected and describing the data
included, we discuss how the dataset may be used by future researchers.
\\ ( https://arxiv.org/abs/2403.08969 ,  10081kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08978 (*cross-listing*)
Date: Wed, 13 Mar 2024 22:06:03 GMT   (17578kb,D)

Title: AutoGuide: Automated Generation and Selection of State-Aware Guidelines
 for Large Language Model Agents
Authors: Yao Fu, Dong-Ki Kim, Jaekyeom Kim, Sungryull Sohn, Lajanugen
 Logeswaran, Kyunghoon Bae, Honglak Lee
Categories: cs.CL cs.LG
\\
 The primary limitation of large language models (LLMs) is their restricted
understanding of the world. This poses significant difficulties for LLM-based
agents, particularly in domains where pre-trained LLMs lack sufficient
knowledge. In this paper, we introduce a novel framework, called AutoGuide,
that bridges the knowledge gap in pre-trained LLMs by leveraging implicit
knowledge in offline experiences. Specifically, AutoGuide extracts the
knowledge embedded in offline data as a set of state-aware
guidelines. Importantly, each state-aware guideline is expressed in concise
natural language and follows a conditional structure, clearly describing the
state where it is applicable. As such, the resulting guidelines enable a
principled way to provide helpful knowledge pertinent to an agent's current
decision-making process. We show that our approach outperforms competitive
LLM-based baselines by a large margin in sequential decision-making benchmarks.
\\ ( https://arxiv.org/abs/2403.08978 ,  17578kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09030 (*cross-listing*)
Date: Thu, 14 Mar 2024 01:46:30 GMT   (559kb)

Title: An AI-Driven Approach to Wind Turbine Bearing Fault Diagnosis from
 Acoustic Signals
Authors: Zhao Wang, Xiaomeng Li, Na Li, Longlong Shu
Categories: cs.SD cs.LG eess.AS
\\
 This study aimed to develop a deep learning model for the classification of
bearing faults in wind turbine generators from acoustic signals. A
convolutional LSTM model was successfully constructed and trained by using
audio data from five predefined fault types for both training and validation.
To create the dataset, raw audio signal data was collected and processed in
frames to capture time- and frequency-domain information. The model exhibited
outstanding accuracy on the training samples and generalized well during
validation. On the test samples, the model achieved remarkable
classification performance, with an overall accuracy exceeding 99.5%, and a
false positive rate of less than 1% for normal status. The findings of this
study provide essential support for the diagnosis and maintenance of bearing
faults in wind turbine generators, with the potential to enhance the
reliability and efficiency of wind power generation.
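The framing step described above can be sketched as follows. The signal is cut into overlapping Hann-windowed frames, and each frame is paired with its magnitude spectrum, so a sequence model such as a convolutional LSTM sees both time-domain and frequency-domain features. The frame length, hop size, and window are assumptions because the abstract does not give them, and the function name is hypothetical.

```python
import numpy as np


def frame_spectrogram(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames and their magnitude spectra.

    Returns (frames, spectra) with shapes (n_frames, frame_len) and
    (n_frames, frame_len // 2 + 1).
    """
    if len(signal) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]
    spectra = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return frames, spectra
```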
\\ ( https://arxiv.org/abs/2403.09030 ,  559kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09032 (*cross-listing*)
Date: Thu, 14 Mar 2024 01:51:35 GMT   (999kb,D)

Title: CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language
 Models to Coding Preferences
Authors: Martin Weyssow, Aton Kamanda, and Houari Sahraoui
Categories: cs.SE cs.CL cs.LG
\\
 Evaluating the alignment of large language models (LLMs) with user-defined
coding preferences is a challenging endeavour that requires assessing intricate
textual outputs from LLMs. By relying on automated metrics and static analysis
tools, existing benchmarks fail to assess nuances in user instructions and LLM
outputs, highlighting the need for large-scale datasets and benchmarks for LLM
preference alignment. In this paper, we introduce CodeUltraFeedback, a
preference dataset of 10,000 complex instructions to tune and align LLMs to
coding preferences through AI feedback. We generate responses to the
instructions using a pool of 14 diverse LLMs, which we then annotate according
to their alignment with five coding preferences using the LLM-as-a-Judge
approach with GPT-3.5, producing both numerical and textual feedback. We also
present CODAL-Bench, a benchmark for assessing LLM alignment with these coding
preferences. Our results show that CodeLlama-7B-Instruct, aligned through
reinforcement learning from AI feedback (RLAIF) with direct preference
optimization (DPO) using CodeUltraFeedback's AI feedback data, outperforms 34B
LLMs on CODAL-Bench, validating the utility of CodeUltraFeedback for preference
tuning. Furthermore, we show our DPO-aligned CodeLlama model improves
functional correctness on HumanEval+ compared to the unaligned base model.
Therefore, our contributions bridge the gap in preference tuning of LLMs for
code and set the stage for further advancements in model alignment and RLAIF
for code intelligence. Our code and data are available at
https://github.com/martin-wey/CodeUltraFeedback.
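For reference, the standard DPO objective measures how much the policy raises the log-probability of the preferred response relative to a frozen reference model, compared with the rejected response. Below is a minimal sketch that works on precomputed sequence log-probabilities. The value of beta and the input format are assumptions, and this is not the authors' training code. When the policy equals the reference, the loss is log 2.

```python
import numpy as np


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Mean DPO loss from summed sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((np.asarray(pi_chosen) - np.asarray(ref_chosen))
                     - (np.asarray(pi_rejected) - np.asarray(ref_rejected)))
    # -log(sigmoid(m)) = log(1 + exp(-m)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -margin)))
```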
\\ ( https://arxiv.org/abs/2403.09032 ,  999kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09090 (*cross-listing*)
Date: Thu, 14 Mar 2024 04:26:00 GMT   (444kb,D)

Title: Dissipative Gradient Descent Ascent Method: A Control Theory Inspired
 Algorithm for Min-max Optimization
Authors: Tianqi Zheng, Nicolas Loizou, Pengcheng You and Enrique Mallada
Categories: math.OC cs.LG
\\
 Gradient Descent Ascent (GDA) methods for min-max optimization problems
typically produce oscillatory behavior that can lead to instability, e.g., in
bilinear settings. To address this problem, we introduce a dissipation term
into the GDA updates to dampen these oscillations. The proposed Dissipative GDA
(DGDA) method can be seen as performing standard GDA on a state-augmented and
regularized saddle function that does not strictly introduce additional
convexity/concavity. We theoretically show the linear convergence of DGDA in
the bilinear and strongly convex-strongly concave settings and assess its
performance by comparing DGDA with other methods such as GDA, Extra-Gradient
(EG), and Optimistic GDA. Our findings demonstrate that DGDA surpasses these
methods, achieving superior convergence rates. We support our claims with two
numerical examples that showcase DGDA's effectiveness in solving saddle point
problems.
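One possible reading of "standard GDA on a state-augmented and regularized saddle function" adds anchor states z and w and runs plain GDA on f(x, y) + (rho/2)||x - z||^2 - (rho/2)||y - w||^2. This sketch is an assumption about the construction, not necessarily the paper's exact update, and the values of rho and the step size are illustrative. On the bilinear f(x, y) = x y, plain GDA spirals outward, while this dissipative version contracts to the saddle point at the origin.

```python
import numpy as np


def run(grad_x, grad_y, x, y, eta, steps, rho=0.0):
    """Plain GDA (rho = 0) or an assumed dissipative variant (rho > 0).

    The dissipative updates are simultaneous GDA steps on
    L(x, z, y, w) = f(x, y) + rho/2 ||x - z||^2 - rho/2 ||y - w||^2,
    descending in (x, z) and ascending in (y, w).
    """
    x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    z, w = x.copy(), y.copy()  # anchor states start at the initial point
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x_new = x - eta * (gx + rho * (x - z))
        y_new = y + eta * (gy - rho * (y - w))
        z = z + eta * rho * (x - z)  # anchors trail the iterates (old x, y)
        w = w + eta * rho * (y - w)
        x, y = x_new, y_new
    return x, y
```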
\\ ( https://arxiv.org/abs/2403.09090 ,  444kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09100 (*cross-listing*)
Date: Thu, 14 Mar 2024 04:48:06 GMT   (1245kb)

Title: Virtual birefringence imaging and histological staining of amyloid
 deposits in label-free tissue using autofluorescence microscopy and deep
 learning
Authors: Xilin Yang, Bijie Bai, Yijie Zhang, Musa Aydin, Sahan Yoruc Selcuk,
 Zhen Guo, Gregory A. Fishbein, Karine Atlan, William Dean Wallace, Nir
 Pillar, Aydogan Ozcan
Categories: physics.med-ph cs.CV cs.LG physics.optics
Comments: 20 Pages, 5 Figures
\\
 Systemic amyloidosis is a group of diseases characterized by the deposition
of misfolded proteins in various organs and tissues, leading to progressive
organ dysfunction and failure. Congo red stain is the gold standard chemical
stain for the visualization of amyloid deposits in tissue sections, as it forms
complexes with the misfolded proteins and shows a birefringence pattern under
polarized light microscopy. However, Congo red staining is tedious and costly
to perform, and prone to false diagnoses due to variations in the amount of
amyloid, staining quality and expert interpretation through manual examination
of tissue under a polarization microscope. Here, we report the first
demonstration of virtual birefringence imaging and virtual Congo red staining
of label-free human tissue to show that a single trained neural network can
rapidly transform autofluorescence images of label-free tissue sections into
brightfield and polarized light microscopy equivalent images, matching the
histochemically stained versions of the same samples. We demonstrate the
efficacy of our method with blind testing and pathologist evaluations on
cardiac tissue where the virtually stained images agreed well with the
histochemically stained ground truth images. Our virtually stained polarization
and brightfield images highlight amyloid birefringence patterns in a
consistent, reproducible manner while mitigating diagnostic challenges due to
variations in the quality of chemical staining and manual imaging processes as
part of the clinical workflow.
\\ ( https://arxiv.org/abs/2403.09100 ,  1245kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09117 (*cross-listing*)
Date: Thu, 14 Mar 2024 05:40:23 GMT   (3607kb,D)

Title: Randomized Principal Component Analysis for Hyperspectral Image
 Classification
Authors: Mustafa Ustuner
Categories: eess.IV cs.CV cs.LG
Comments: 5 pages, submitted to M2GARSS 2024, 2024 IEEE
 Mediterranean and Middle-East Geoscience and Remote Sensing Symposium
\\
 The high-dimensional feature space of the hyperspectral imagery poses major
challenges to the processing and analysis of the hyperspectral data sets. In
such a case, dimensionality reduction is necessary to decrease the
computational complexity. Random projections open up new ways of
dimensionality reduction, especially for large data sets. In this paper, the
principal component analysis (PCA) and randomized principal component analysis
(R-PCA) for the classification of hyperspectral images using support vector
machines (SVM) and light gradient boosting machines (LightGBM) have been
investigated. In this experimental research, the number of features was reduced
to 20 and 30 for classification of two hyperspectral datasets (Indian Pines and
Pavia University). The experimental results showed that PCA outperformed
R-PCA with SVM on both datasets, whereas the two yielded similar accuracies with
LightGBM. The highest classification accuracies, 0.9925 and 0.9639, were
obtained by LightGBM with the original features for Pavia University and Indian
Pines, respectively.
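Randomized PCA replaces the full SVD with three cheaper steps: a random projection onto a few more directions than needed, a QR factorization, and a small SVD. The sketch below follows the randomized range finder of Halko, Martinsson and Tropp. The oversampling and power-iteration counts are assumptions, since the abstract does not say which R-PCA implementation was used. For hyperspectral data, rows would be pixels, columns would be spectral bands, and k would be 20 or 30 as in the experiments.

```python
import numpy as np


def randomized_pca(X, k, oversample=10, n_iter=2, rng=None):
    """Top-k principal components via a randomized range finder.

    Returns (scores, components, singular_values) of the centered data.
    """
    rng = np.random.default_rng() if rng is None else rng
    Xc = X - X.mean(axis=0)  # center the data as in ordinary PCA
    omega = rng.normal(size=(Xc.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Xc @ omega)  # orthonormal basis approximating range(Xc)
    for _ in range(n_iter):  # power iterations sharpen the spectrum
        Q, _ = np.linalg.qr(Xc @ (Xc.T @ Q))
    _, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)  # small SVD
    components = Vt[:k]
    return Xc @ components.T, components, s[:k]
```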
\\ ( https://arxiv.org/abs/2403.09117 ,  3607kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09183 (*cross-listing*)
Date: Thu, 14 Mar 2024 08:53:01 GMT   (4584kb)

Title: Generalized Relevance Learning Grassmann Quantization
Authors: M. Mohammadi, M. Babai and M.H.F. Wilkinson
Categories: cs.CV cs.LG
\\
 Due to advancements in digital cameras, it is easy to gather multiple images
(or videos) of an object under different conditions. Therefore, image-set
classification has attracted growing attention, and various solutions have been
proposed to model image sets. A popular approach represents image sets as
subspaces, which form a manifold called the Grassmann manifold. In this
contribution, we extend Generalized Relevance Learning Vector Quantization to
the Grassmann manifold. The proposed model returns a set of prototype
subspaces and a relevance vector. While prototypes model typical behaviours
within classes, the relevance factors specify the most discriminative principal
vectors (or images) for the classification task. They both provide insights
into the model's decisions by highlighting influential images and pixels for
predictions. Moreover, because it learns prototypes, the inference complexity of
the new method is independent of dataset size, unlike previous
works. We applied it to several recognition tasks, including handwritten digit
recognition, face recognition, activity recognition, and object recognition.
Experiments demonstrate that it outperforms previous works with lower
complexity and can successfully model the variation, such as handwritten style
or lighting conditions. Moreover, the relevance factors make the model
robust to the selection of subspaces' dimensionality.
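Two ingredients implied above can be sketched. First, an image set becomes a point on the Grassmann manifold through an orthonormal basis of its top singular vectors. Second, two subspaces are compared through principal angles, whose cosines are the singular values of A^T B. The relevance-weighted distance sum_i r_i sin^2(theta_i) is an assumption for illustration, because the abstract does not give the exact GRLGQ distance. Both function names are hypothetical.

```python
import numpy as np


def orthonormal_basis(images, dim):
    """Subspace spanned by the top-`dim` left singular vectors of a
    (pixels x n_images) matrix whose columns are vectorized images."""
    U, _, _ = np.linalg.svd(images, full_matrices=False)
    return U[:, :dim]


def weighted_grassmann_distance(A, B, relevance):
    """A, B: orthonormal bases (D x d); relevance: d weights summing to 1.

    cos(theta_i) are the singular values of A^T B, sorted in descending order.
    Returns sum_i r_i * (1 - cos^2 theta_i) = sum_i r_i * sin^2 theta_i.
    """
    cosines = np.linalg.svd(A.T @ B, compute_uv=False)
    cosines = np.clip(cosines, 0.0, 1.0)  # guard against round-off above 1
    return float(np.sum(relevance * (1.0 - cosines ** 2)))
```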
\\ ( https://arxiv.org/abs/2403.09183 ,  4584kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09259 (*cross-listing*)
Date: Thu, 14 Mar 2024 10:33:28 GMT   (8232kb,D)

Title: To Label or Not to Label: Hybrid Active Learning for Neural Machine
 Translation
Authors: Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza
Categories: cs.CL cs.LG
Comments: 11 pages, 3 figures
\\
 Active learning (AL) techniques reduce labeling costs for training neural
machine translation (NMT) models by selecting smaller representative subsets
from unlabeled data for annotation. Diversity sampling techniques select
heterogeneous instances, while uncertainty sampling methods select instances
with the highest model uncertainty. Both approaches have limitations:
diversity methods may extract varied but trivial examples, while uncertainty
sampling can yield repetitive, uninformative instances. To bridge this gap, we
propose HUDS, a hybrid AL strategy for domain adaptation in NMT that combines
uncertainty and diversity for sentence selection. HUDS computes uncertainty
scores for unlabeled sentences and subsequently stratifies them. It then
clusters sentence embeddings within each stratum using k-means and computes
diversity scores by distance to the centroid. A weighted hybrid score that
combines uncertainty and diversity is then used to select the top instances for
annotation in each AL iteration. Experiments on multi-domain German-English
datasets show that HUDS outperforms other strong AL
baselines. We analyze the sentence selection with HUDS and show that it
prioritizes diverse instances having high model uncertainty for annotation in
early AL iterations.
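The selection loop described above might look like the following sketch. Several details are assumptions not stated in the abstract. Strata are uncertainty quantiles, and a small k-means runs inside each stratum. A larger distance to the assigned centroid counts as more diverse. Both scores are min-max normalized, and beta weights uncertainty against diversity. All function names are hypothetical.

```python
import numpy as np


def kmeans(E, k, iters, rng):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    centers = E[rng.choice(len(E), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = np.linalg.norm(E[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    dist = np.linalg.norm(E[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers


def minmax(v):
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span


def huds_select(uncertainty, emb, budget, n_strata=4, k=3, beta=0.5, rng=None):
    """Indices of the `budget` sentences with the highest hybrid score."""
    rng = np.random.default_rng(0) if rng is None else rng
    diversity = np.zeros(len(uncertainty))
    # Assumed stratification: equal-mass bins of the uncertainty scores.
    edges = np.quantile(uncertainty, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, uncertainty, side="right") - 1,
                     0, n_strata - 1)
    for s in range(n_strata):
        idx = np.flatnonzero(strata == s)
        if len(idx) == 0:
            continue
        labels, centers = kmeans(emb[idx], min(k, len(idx)), 10, rng)
        # Assumed: distance to the assigned centroid measures diversity.
        diversity[idx] = minmax(np.linalg.norm(emb[idx] - centers[labels], axis=1))
    score = beta * minmax(uncertainty) + (1 - beta) * diversity
    return np.argsort(-score)[:budget]
```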
\\ ( https://arxiv.org/abs/2403.09259 ,  8232kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09267 (*cross-listing*)
Date: Thu, 14 Mar 2024 10:44:10 GMT   (5403kb,D)

Title: Deep Limit Order Book Forecasting
Authors: Antonio Briola, Silvia Bartolucci, Tomaso Aste
Categories: q-fin.TR cs.LG
Comments: 43 pages, 14 figures, 12 Tables
\\
 We exploit cutting-edge deep learning methodologies to explore the
predictability of high-frequency Limit Order Book mid-price changes for a
heterogeneous set of stocks traded on the NASDAQ exchange. In so doing, we
release `LOBFrame', an open-source code base, to efficiently process
large-scale Limit Order Book data and quantitatively assess state-of-the-art
deep learning models' forecasting capabilities. Our results are twofold. We
demonstrate that the stocks' microstructural characteristics influence the
efficacy of deep learning methods and that their high forecasting power does
not necessarily correspond to actionable trading signals. We argue that
traditional machine learning metrics fail to adequately assess the quality of
forecasts in the Limit Order Book context. As an alternative, we propose an
innovative operational framework that assesses predictions' practicality by
focusing on the probability of accurately forecasting complete transactions.
This work offers academics and practitioners an avenue to make informed and
robust decisions on the application of deep learning techniques, their scope
and limitations, effectively exploiting emergent statistical properties of the
Limit Order Book.
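Predicting mid-price changes first requires labels. A common convention in the Limit Order Book literature, assumed here and not necessarily the one LOBFrame uses, starts from the mid-price (best bid + best ask) / 2. Each event is then labeled up, down, or stationary according to the relative change over a fixed horizon and a threshold. The function name and parameters are hypothetical.

```python
import numpy as np


def mid_price_labels(best_bid, best_ask, horizon, threshold):
    """Label each event by its relative mid-price change over `horizon` events:
    +1 (up), -1 (down), or 0 (stationary, within +/- threshold)."""
    mid = (np.asarray(best_bid, dtype=float) + np.asarray(best_ask, dtype=float)) / 2.0
    change = (mid[horizon:] - mid[:-horizon]) / mid[:-horizon]
    labels = np.zeros(len(change), dtype=int)
    labels[change > threshold] = 1
    labels[change < -threshold] = -1
    return labels
```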
\\ ( https://arxiv.org/abs/2403.09267 ,  5403kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09298 (*cross-listing*)
Date: Thu, 14 Mar 2024 11:37:02 GMT   (28kb)

Title: More than words: Advancements and challenges in speech recognition for
 singing
Authors: Anna Kruspe
Categories: cs.SD cs.CL cs.IR cs.LG eess.AS
Comments: Conference on Electronic Speech Signal Processing (ESSV) 2024,
 Keynote
\\
 This paper addresses the challenges and advancements in speech recognition
for singing, a domain distinctly different from standard speech recognition.
Singing encompasses unique challenges, including extensive pitch variations,
diverse vocal styles, and background music interference. We explore key areas
such as phoneme recognition, language identification in songs, keyword
spotting, and full lyrics transcription. I will describe some of my own
experiences when performing research on these tasks just as they were starting
to gain traction, but will also show how recent developments in deep learning
and large-scale datasets have propelled progress in this field. My goal is to
illuminate the complexities of applying speech recognition to singing, evaluate
current capabilities, and outline future research directions.
\\ ( https://arxiv.org/abs/2403.09298 ,  28kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09302 (*cross-listing*)
Date: Thu, 14 Mar 2024 11:49:43 GMT   (44106kb,D)

Title: StainFuser: Controlling Diffusion for Faster Neural Style Transfer in
 Multi-Gigapixel Histology Images
Authors: Robert Jewsbury, Ruoyu Wang, Abhir Bhalerao, Nasir Rajpoot, Quoc Dang
 Vu
Categories: eess.IV cs.CV cs.LG
\\
 Stain normalization algorithms aim to transform the color and intensity
characteristics of a source multi-gigapixel histology image to match those of a
target image, mitigating inconsistencies in the appearance of stains used to
highlight cellular components in the images. We propose a new approach,
StainFuser, which treats this problem as a style transfer task using a novel
Conditional Latent Diffusion architecture, eliminating the need for handcrafted
color components. With this method, we curate SPI-2M, the largest stain
normalization dataset to date, comprising over 2 million histology images and
using neural style transfer for high-quality transformations. Trained on this data,
StainFuser outperforms current state-of-the-art GAN and handcrafted methods in
terms of the quality of normalized images. Additionally, compared to existing
approaches, it improves the performance of nuclei instance segmentation and
classification models when used as a test time augmentation method on the
challenging CoNIC dataset. Finally, we apply StainFuser to multi-gigapixel
Whole Slide Images (WSIs) and demonstrate improved performance in terms of
computational efficiency, image quality and consistency across tiles over
current methods.
\\ ( https://arxiv.org/abs/2403.09302 ,  44106kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09318 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:09:36 GMT   (3694kb,D)

Title: A Hierarchical Fused Quantum Fuzzy Neural Network for Image
 Classification
Authors: Sheng-Yao Wu, Run-Ze Li, Yan-Qi Song, Su-Juan Qin, Qiao-Yan Wen, Fei
 Gao
Categories: quant-ph cs.LG
\\
 Neural networks are a powerful paradigm for learning data features in the era
of big data. However, most neural network models are deterministic and ignore
the uncertainty of data. Fuzzy neural networks were proposed to address this
problem. FDNN is a hierarchical deep neural network that derives information
from both fuzzy and neural representations; these representations are then
fused into a single representation to be classified. FDNN performs well on
uncertain data classification tasks. In this paper, we propose a novel
hierarchical fused quantum fuzzy neural network (HQFNN). Unlike the classical
FDNN, HQFNN uses quantum neural networks to learn the fuzzy membership
functions of the fuzzy neural network. We conducted simulated experiments on two
types of datasets (Dirty-MNIST and 15-Scene), and the results show that the
proposed model can outperform several existing methods. In addition, we
demonstrate the robustness of the proposed quantum circuit.
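Here is a classical sketch of an FDNN-style fused representation. A fuzzy branch maps each input feature to Gaussian membership degrees, and a neural branch applies a ReLU layer. A fusion layer then combines the concatenated outputs for the classifier. The Gaussian membership functions and the single-layer branches are assumptions. HQFNN's distinguishing step, learning the membership functions with a quantum circuit, is not shown. The function names and weight matrices are hypothetical.

```python
import numpy as np


def gaussian_membership(x, centers, widths):
    """Membership degree of each input feature in each fuzzy set.

    x: (n, d); centers, widths: (d, m). Returns (n, d, m) values in (0, 1].
    """
    return np.exp(-((x[:, :, None] - centers[None]) ** 2) / (2.0 * widths[None] ** 2))


def fused_representation(x, centers, widths, W_nn, W_fuse):
    """Concatenate the fuzzy and neural branches, then fuse with a ReLU layer."""
    fuzzy = gaussian_membership(x, centers, widths).reshape(len(x), -1)  # fuzzy branch
    neural = np.maximum(0.0, x @ W_nn)  # neural branch: one ReLU layer
    both = np.concatenate([fuzzy, neural], axis=1)
    return np.maximum(0.0, both @ W_fuse)  # fused representation for the classifier
```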
\\ ( https://arxiv.org/abs/2403.09318 ,  3694kb)
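A minimal classical sketch of the fuzzy/neural fusion described above may help. It assumes Gaussian membership functions, a product t-norm as the fuzzy rule layer, and fusion by concatenation; these choices, and all function names, are illustrative assumptions. In HQFNN the membership functions are learned by quantum neural networks, which this sketch does not model.

```python
import numpy as np

def gaussian_membership(x, centers, widths):
    """Degree to which each input feature belongs to each fuzzy set.

    x: (batch, d); centers, widths: (d, m). Returns values in (0, 1],
    shape (batch, d, m).
    """
    diff = x[:, :, None] - centers[None, :, :]
    return np.exp(-(diff ** 2) / (2.0 * widths[None, :, :] ** 2))

def fuzzy_branch(x, centers, widths):
    # Product t-norm ("fuzzy AND") across the d features for each of the m rules.
    return np.prod(gaussian_membership(x, centers, widths), axis=1)  # (batch, m)

def neural_branch(x, w, b):
    # Ordinary deterministic hidden layer.
    return np.tanh(x @ w + b)  # (batch, h)

def fused_logits(x, params):
    """Fuse the fuzzy and neural representations by concatenation, then classify."""
    fused = np.concatenate(
        [fuzzy_branch(x, params["centers"], params["widths"]),
         neural_branch(x, params["w_n"], params["b_n"])],
        axis=1,
    )  # (batch, m + h)
    return fused @ params["w_out"] + params["b_out"]  # (batch, n_classes)

# Example with random (untrained) parameters.
rng = np.random.default_rng(0)
d, m, h, n_classes = 4, 3, 5, 2
params = {
    "centers": rng.normal(size=(d, m)),
    "widths": np.ones((d, m)),
    "w_n": rng.normal(size=(d, h)),
    "b_n": np.zeros(h),
    "w_out": rng.normal(size=(m + h, n_classes)),
    "b_out": np.zeros(n_classes),
}
x = rng.normal(size=(8, d))
logits = fused_logits(x, params)  # shape (8, 2)
```

In a trained model the centers, widths and weights would be fitted jointly by gradient descent; here they are random only to show the data flow.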
------------------------------------------------------------------------------
\\
arXiv:2403.09347 (*cross-listing*)
Date: Thu, 14 Mar 2024 12:51:58 GMT   (4076kb,D)

Title: BurstAttention: An Efficient Distributed Attention Framework for
 Extremely Long Sequences
Authors: Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi,
 Maosong Sun, Shengnan Wang, Teng Su
Categories: cs.DC cs.LG
Comments: 13 pages, 7 figures
\\
 Effective attention modules have played a crucial role in the success of
Transformer-based large language models (LLMs), but the quadratic time and
memory complexities of these attention modules also pose a challenge when
processing long sequences. One potential solution for the long sequence problem
is to utilize distributed clusters to parallelize the computation of attention
modules across multiple devices (e.g., GPUs). However, adopting a distributed
approach inevitably introduces extra memory overheads to store local attention
results and incurs additional communication costs to aggregate local results
into global ones. In this paper, we propose a distributed attention framework
named ``BurstAttention'' to optimize memory access and communication operations
at both the global cluster and local device levels. In our experiments, we
compare BurstAttention with other competitive distributed attention solutions
for long sequence processing. The experimental results under different length
settings demonstrate that BurstAttention offers significant advantages for
processing long sequences compared with these competitive baselines, reducing
communication overheads by 40% and achieving a 2x speedup when training with a
32K sequence length on 8 A100 GPUs.
\\ ( https://arxiv.org/abs/2403.09347 ,  4076kb)
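The step of aggregating local attention results into global ones can be illustrated with the standard log-sum-exp merge used by blockwise and ring-style attention. This is a sketch under that assumption, not BurstAttention's implementation: "devices" are simulated as array shards of the keys and values, and no communication or memory optimization is modeled.

```python
import numpy as np

def local_attention(q, k, v):
    """Softmax attention of all queries over one shard of keys/values.

    Returns the shard-normalised output and the per-query log-sum-exp of the
    scores; together they are enough to merge shards exactly later.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n_q, n_k_local)
    m = scores.max(axis=-1, keepdims=True)         # for numerical stability
    p = np.exp(scores - m)
    s = p.sum(axis=-1, keepdims=True)
    return (p @ v) / s, m + np.log(s)              # out: (n_q, d_v), lse: (n_q, 1)

def merge_partials(partials):
    """Combine (out, lse) pairs from all shards into the global attention output."""
    outs, lses = zip(*partials)
    lse = np.concatenate(lses, axis=-1)            # (n_q, n_shards)
    m = lse.max(axis=-1, keepdims=True)
    global_lse = m + np.log(np.exp(lse - m).sum(axis=-1, keepdims=True))
    weights = np.exp(lse - global_lse)             # each row sums to 1
    return sum(weights[:, [i]] * out for i, out in enumerate(outs))

# Simulate 4 "devices", each holding a quarter of the keys and values.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
partials = [local_attention(q, k_i, v_i)
            for k_i, v_i in zip(np.split(k, 4), np.split(v, 4))]
merged = merge_partials(partials)
reference, _ = local_attention(q, k, v)            # single-device attention
```

Each shard only has to exchange its output plus one scalar per query, rather than its full score matrix, which is why this kind of merge is attractive for distributed long-sequence attention.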
------------------------------------------------------------------------------
\\
arXiv:2403.09383 (*cross-listing*)
Date: Thu, 14 Mar 2024 13:34:30 GMT   (3059kb,D)

Title: Pantypes: Diverse Representatives for Self-Explainable Models
Authors: Rune Kj{\ae}rsgaard, Ahc\`ene Boubekki, Line Clemmensen
Categories: stat.ML cs.LG
\\
 Prototypical self-explainable classifiers have emerged to meet the growing
demand for interpretable AI systems. These classifiers are designed to
incorporate high transparency in their decisions by basing inference on
similarity with learned prototypical objects. While these models are designed
with diversity in mind, the learned prototypes often do not sufficiently
represent all aspects of the input distribution, particularly those in low
density regions. This lack of sufficient data representation, known as
representation bias, has been associated with various detrimental properties
related to machine learning diversity and fairness. In light of this, we
introduce pantypes, a new family of prototypical objects designed to capture
the full diversity of the input distribution through a sparse set of objects.
We show that pantypes can empower prototypical self-explainable models by
occupying divergent regions of the latent space and thus fostering high
diversity, interpretability and fairness.
\\ ( https://arxiv.org/abs/2403.09383 ,  3059kb)
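To make "basing inference on similarity with learned prototypical objects" concrete, here is a minimal sketch of a prototype classifier. It assumes the ProtoPNet-style log similarity activation and a linear read-out; the toy prototypes are hand-picked, and `min_pairwise_distance` is a hypothetical diversity diagnostic, not the pantype training objective.

```python
import numpy as np

def prototype_similarities(z, prototypes, eps=1e-4):
    """Similarity between latent codes z (batch, dim) and prototypes (P, dim).

    Uses the log((d^2 + 1) / (d^2 + eps)) activation popularised by ProtoPNet:
    large when z is close to a prototype, near zero when it is far away.
    """
    d2 = ((z[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)  # (batch, P)
    return np.log((d2 + 1.0) / (d2 + eps))

def classify(z, prototypes, class_weights):
    # Logits are a linear read-out of prototype similarities, so each decision
    # can be explained by pointing at the prototypes that z resembles most.
    return prototype_similarities(z, prototypes) @ class_weights  # (batch, n_classes)

def min_pairwise_distance(prototypes):
    # Crude diversity diagnostic: prototypes spread over divergent regions of
    # the latent space have a large minimum pairwise distance.
    diff = prototypes[:, None, :] - prototypes[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)
    return d.min()

prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])   # one prototype per class
class_weights = np.eye(2)
z = np.array([[0.1, 0.0], [4.9, 5.1]])
predictions = classify(z, prototypes, class_weights).argmax(axis=1)  # [0, 1]
```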
------------------------------------------------------------------------------
\\
arXiv:2403.09389 (*cross-listing*)
Date: Thu, 14 Mar 2024 13:40:26 GMT   (199kb,D)

Title: Learning to optimize with convergence guarantees using nonlinear system
 theory
Authors: Andrea Martin and Luca Furieri
Categories: eess.SY cs.LG cs.SY
\\
 The increasing reliance on numerical methods for controlling dynamical
systems and training machine learning models underscores the need to devise
algorithms that dependably and efficiently navigate complex optimization
landscapes. Classical gradient descent methods offer strong theoretical
guarantees for convex problems; however, they demand meticulous hyperparameter
tuning for non-convex ones. The emerging paradigm of learning to optimize (L2O)
automates the discovery of algorithms with optimized performance by leveraging
learning models and data; yet it lacks a theoretical framework to analyze the
convergence and robustness of the learned algorithms. In this paper, we fill
this gap by harnessing nonlinear system theory. Specifically, we propose an
unconstrained parametrization of all convergent algorithms for smooth
non-convex objective functions. Notably, our framework is directly compatible
with automatic differentiation tools, ensuring convergence by design while
learning to optimize.
\\ ( https://arxiv.org/abs/2403.09389 ,  199kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09429 (*cross-listing*)
Date: Thu, 14 Mar 2024 14:20:22 GMT   (815kb,D)

Title: Variational Inference with Sequential Sample-Average Approximations
Authors: Heiko Zimmermann, Christian A. Naesseth, Jan-Willem van de Meent
Categories: stat.ML cs.LG
\\
 We present variational inference with sequential sample-average approximation
(VISA), a method for approximate inference in computationally intensive models,
such as those based on numerical simulations. VISA extends importance-weighted
forward-KL variational inference by employing a sequence of sample-average
approximations, which are considered valid inside a trust region. This makes it
possible to reuse model evaluations across multiple gradient steps, thereby
reducing computational cost. We perform experiments on high-dimensional
Gaussians, Lotka-Volterra dynamics, and a Pickover attractor, which demonstrate
that VISA can achieve comparable approximation accuracy to standard
importance-weighted forward-KL variational inference with computational savings
of a factor of two or more for conservatively chosen learning rates.
\\ ( https://arxiv.org/abs/2403.09429 ,  815kb)
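The idea of reusing model evaluations inside a trust region can be sketched for a one-dimensional Gaussian family. Everything here is an illustrative assumption, not the authors' VISA code: the target N(3, 1), the family N(mu, 1), the learning rate, and a trust region defined simply as a maximum distance of mu from the mean at which the current samples were drawn.

```python
import numpy as np

def log_target(z):
    # Unnormalised log density of an illustrative target, N(3, 1).
    # Each call stands in for an expensive simulator evaluation.
    return -0.5 * (z - 3.0) ** 2

def log_q(z, mu):
    # Variational family N(mu, 1), up to an additive constant.
    return -0.5 * (z - mu) ** 2

def fit_with_sample_reuse(mu=0.0, n=2000, lr=0.5, radius=0.5, steps=200, seed=0):
    """Forward-KL VI where one set of weighted samples is reused for many steps.

    Samples are redrawn (and the model re-evaluated) only when mu leaves a
    trust region of the given radius around the mean they were drawn at.
    Returns the fitted mean and the number of model evaluations used.
    """
    rng = np.random.default_rng(seed)
    evaluations, mu_ref = 0, None
    for _ in range(steps):
        if mu_ref is None or abs(mu - mu_ref) > radius:
            mu_ref = mu
            z = rng.normal(mu_ref, 1.0, size=n)
            log_w = log_target(z) - log_q(z, mu_ref)   # needs n model evaluations
            evaluations += n
            w = np.exp(log_w - log_w.max())
            w /= w.sum()                                # self-normalised weights
        # Gradient of the sample-average objective -sum_i w_i log q(z_i; mu).
        grad = -np.sum(w * (z - mu))
        mu -= lr * grad
    return mu, evaluations

mu_hat, evaluations = fit_with_sample_reuse()
```

Drawing fresh samples at every step would cost `steps * n` model evaluations; with reuse, new evaluations are only needed when the parameters move out of the trust region.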
------------------------------------------------------------------------------
\\
arXiv:2403.09465 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:04:45 GMT   (105kb,D)

Title: Outlier Robust Multivariate Polynomial Regression
Authors: Vipul Arora, Arnab Bhattacharyya, Mathews Boban, Venkatesan Guruswami,
 Esty Kelman
Categories: cs.DS cs.LG
\\
 We study the problem of robust multivariate polynomial regression: let
$p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of
degree at most $d$ in each variable. We are given as input a set of random
samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy
versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each
$\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on
$[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an
outlier) with probability at most $\rho < 1/2$, and otherwise satisfies
$|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial
$\hat{p}$, of degree at most $d$ in each variable, within an
$\ell_\infty$-distance of at most $O(\sigma)$ from $p$.
 Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We
generalize their results to the $n$-variate setting, showing an algorithm that
achieves a sample complexity of $O_n(d^n\log d)$, where the hidden constant
depends on $n$, if $\chi$ is the $n$-dimensional Chebyshev distribution. The
sample complexity is $O_n(d^{2n}\log d)$ if the samples are drawn from the
uniform distribution instead. The approximation error is guaranteed to be at
most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting
where each $\mathbf{x}_i$ and $y_i$ are known up to $N$ bits of precision, the
run-time's dependence on $N$ is linear. We also show that our sample
complexities are optimal in terms of $d^n$. Furthermore, we show that it is
possible to have the run-time be independent of $1/\sigma$, at the cost of a
higher sample complexity.
\\ ( https://arxiv.org/abs/2403.09465 ,  105kb)
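The problem statement can be made concrete with a small data generator. This sketch does not implement the authors' algorithm. It only draws samples under the stated model, with outliers as arbitrary values and inliers within sigma of the true polynomial. It also approximates the $\ell_\infty$ error on a grid. The example polynomial, the uniform inlier noise, and the outlier range are assumptions chosen for illustration.

```python
import numpy as np

def sample_chebyshev(num, n, rng):
    # Chebyshev (arcsine) distribution on [-1, 1]^n: x_j = cos(pi * U_j), U_j ~ Uniform[0, 1].
    return np.cos(np.pi * rng.random((num, n)))

def make_noisy_samples(p, num, n, sigma, rho, rng, outlier_scale=100.0):
    """Draw (x_i, y_i) following the noise model in the abstract.

    Each y_i is, independently with probability rho, an outlier (here an
    arbitrary large value); otherwise |y_i - p(x_i)| <= sigma (here uniform noise).
    """
    x = sample_chebyshev(num, n, rng)
    y = p(x) + rng.uniform(-sigma, sigma, size=num)
    is_outlier = rng.random(num) < rho
    y[is_outlier] = rng.uniform(-outlier_scale, outlier_scale, size=int(is_outlier.sum()))
    return x, y, is_outlier

def linf_distance_on_grid(p, q, n, points_per_axis=51):
    # Approximates sup_{x in [-1,1]^n} |p(x) - q(x)| by the maximum over a grid.
    axes = [np.linspace(-1.0, 1.0, points_per_axis)] * n
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, n)
    return np.max(np.abs(p(grid) - q(grid)))

def p(x):
    # Example bivariate polynomial of degree at most d = 2 in each variable.
    return x[:, 0] ** 2 * x[:, 1] - 3.0 * x[:, 1] + 1.0

rng = np.random.default_rng(0)
x, y, is_outlier = make_noisy_samples(p, num=5000, n=2, sigma=0.1, rho=0.2, rng=rng)
```

A robust regression method would receive only `x` and `y`; `is_outlier` is kept here solely to check the generator.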
------------------------------------------------------------------------------
\\
arXiv:2403.09477 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:19:19 GMT   (6869kb,D)

Title: VIRUS-NeRF -- Vision, InfraRed and UltraSonic based Neural Radiance
 Fields
Authors: Nicolaj Schmid, Cornelius von Einem, Cesar Cadena, Roland Siegwart,
 Lorenz Hruby, Florian Tschopp
Categories: cs.RO cs.LG
\\
 Autonomous mobile robots are an increasingly integral part of modern factory
and warehouse operations. Obstacle detection, avoidance and path planning are
critical safety-relevant tasks, which are often solved using expensive LiDAR
sensors and depth cameras. We propose to use cost-effective low-resolution
ranging sensors, such as ultrasonic and infrared time-of-flight sensors, by
developing VIRUS-NeRF - Vision, InfraRed, and UltraSonic based Neural Radiance
Fields. Building upon Instant Neural Graphics Primitives with a Multiresolution
Hash Encoding (Instant-NGP), VIRUS-NeRF incorporates depth measurements from
ultrasonic and infrared sensors and utilizes them to update the occupancy grid
used for ray marching. Experimental evaluation in 2D demonstrates that
VIRUS-NeRF achieves comparable mapping performance to LiDAR point clouds
regarding coverage. Notably, in small environments, its accuracy aligns with
that of LiDAR measurements, while in larger ones, it is bounded by the utilized
ultrasonic sensors. An in-depth ablation study reveals that adding ultrasonic
and infrared sensors is highly effective when dealing with sparse data and low
view variation. Further, the proposed occupancy grid of VIRUS-NeRF improves the
mapping capabilities and increases the training speed by 46% compared to
Instant-NGP. Overall, VIRUS-NeRF presents a promising approach for
cost-effective local mapping in mobile robotics, with potential applications in
safety and navigation tasks. The code can be found at
https://github.com/ethz-asl/virus nerf.
\\ ( https://arxiv.org/abs/2403.09477 ,  6869kb)
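Updating an occupancy grid from a low-resolution range reading can be illustrated with the textbook log-odds update: cells along the beam are marked more likely free, and the cell at the return is marked more likely occupied. This is a generic sketch, not VIRUS-NeRF's own update rule. The increments `L_OCC`/`L_FREE`, the half-cell ray sampling, and the single-beam model are assumptions; real ultrasonic sensors have wide beams.

```python
import numpy as np

L_OCC, L_FREE = 0.85, -0.4  # log-odds increments (illustrative values)

def update_with_range(log_odds, origin, angle, measured_range, max_range, cell_size=1.0):
    """Integrate one range reading into a 2D log-odds occupancy grid.

    log_odds[i, j] holds the cell whose x index is i and y index is j.
    Cells the beam passes through before the measured range become more
    likely free; the cell containing the return becomes more likely occupied.
    A reading at or beyond max_range is treated as 'no return' and only clears space.
    """
    rows, cols = log_odds.shape
    direction = np.array([np.cos(angle), np.sin(angle)])
    hit_cell = None
    if measured_range < max_range:
        hx, hy = np.floor((origin + measured_range * direction) / cell_size).astype(int)
        hit_cell = (int(hx), int(hy))
    free_cells = set()
    for r in np.arange(0.0, min(measured_range, max_range), cell_size / 2.0):
        i, j = np.floor((origin + r * direction) / cell_size).astype(int)
        free_cells.add((int(i), int(j)))
    free_cells.discard(hit_cell)  # never clear the cell that produced the return
    for i, j in free_cells:
        if 0 <= i < rows and 0 <= j < cols:
            log_odds[i, j] += L_FREE
    if hit_cell is not None and 0 <= hit_cell[0] < rows and 0 <= hit_cell[1] < cols:
        log_odds[hit_cell] += L_OCC
    return log_odds

def occupancy_probability(log_odds):
    return 1.0 / (1.0 + np.exp(-log_odds))

grid = update_with_range(np.zeros((10, 10)), origin=np.array([0.5, 5.5]),
                         angle=0.0, measured_range=5.0, max_range=10.0)
```

Cells with high occupancy probability can then be skipped or sampled densely during ray marching, which is the role the abstract assigns to the occupancy grid.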
------------------------------------------------------------------------------
\\
arXiv:2403.09509 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:56:02 GMT   (2380kb,D)

Title: On STPA for Distributed Development of Safe Autonomous Driving: An
 Interview Study
Authors: Ali Nouri, Christian Berger, Fredrik T\"orner
Categories: cs.SE cs.LG
Comments: Accepted at SEAA. 8 pages, 2 figures
Journal-ref: A. Nouri, C. Berger and F. Torner, "On STPA ... Interview Study,"
 2023 49th Euromicro Conference on Software Engineering and Advanced
 Applications (SEAA), Durres, Albania, 2023, pp. 5-12
DOI: 10.1109/SEAA60479.2023.00011
\\
 Safety analysis is used to identify hazards and build knowledge during the
design phase of safety-relevant functions. This is especially true for complex
AI-enabled and software intensive systems such as Autonomous Drive (AD).
System-Theoretic Process Analysis (STPA) is a novel method applied in
safety-related fields like defense and aerospace, which is also becoming
popular in the automotive industry. However, STPA assumes prerequisites that
are not fully valid in automotive system engineering, with its distributed
system development and multi-abstraction design levels. This would inhibit
software developers from using STPA to analyze their software as part of a
bigger system, resulting in a lack of traceability. This can be seen as a
maintainability challenge in continuous development and deployment (DevOps). In
this paper, different STPA guidelines for the automotive industry, e.g.,
J3187/ISO 21448/STPA handbook, are first compared to assess their
applicability to the distributed development of complex AI-enabled systems like
AD. Further, an approach to overcome the challenges of using STPA in a
multi-level design context is proposed. By conducting an interview study with
automotive industry experts for the development of AD, the challenges are
validated and the effectiveness of the proposed approach is evaluated.
\\ ( https://arxiv.org/abs/2403.09509 ,  2380kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09516 (*cross-listing*)
Date: Thu, 14 Mar 2024 15:58:36 GMT   (8192kb,D)

Title: Leveraging Prototypical Representations for Mitigating Social Bias
 without Demographic Information
Authors: Shadi Iskander, Kira Radinsky, Yonatan Belinkov
Categories: cs.CL cs.LG
\\
 Mitigating social biases typically requires identifying the social groups
associated with each data sample. In this paper, we present DAFair, a novel
approach to address social bias in language models. Unlike traditional methods
that rely on explicit demographic labels, our approach does not require any
such information. Instead, we leverage predefined prototypical demographic
texts and incorporate a regularization term during the fine-tuning process to
mitigate bias in the model's representations. Our empirical results across two
tasks and two models demonstrate the effectiveness of our method compared to
previous approaches that do not rely on labeled data. Moreover, with limited
demographic-annotated data, our approach outperforms common debiasing
approaches.
\\ ( https://arxiv.org/abs/2403.09516 ,  8192kb)
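One plausible form of a regularization term built on predefined prototypical demographic texts is sketched below. It assumes that the term pushes each sample representation to be equally similar to every demographic prototype, measured as the KL divergence from uniform to a softmax over cosine similarities. The function names, the temperature, and this exact loss are hypothetical, not taken from the paper. During fine-tuning it would be added to the task loss with some weight.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def demographic_regularizer(reps, prototype_reps, temperature=1.0):
    """KL(uniform || softmax(similarities)), averaged over the batch (hypothetical form).

    reps: (batch, dim) sample representations; prototype_reps: (groups, dim)
    encodings of predefined prototypical demographic texts. The term is zero
    when every sample is equally similar to all demographic prototypes.
    """
    probs = softmax(cosine(reps, prototype_reps) / temperature)
    k = prototype_reps.shape[0]
    uniform = np.full_like(probs, 1.0 / k)
    kl = np.sum(uniform * (np.log(uniform) - np.log(probs)), axis=-1)
    return kl.mean()

prototype_reps = np.array([[1.0, 0.0], [-1.0, 0.0]])  # e.g. encodings of two demographic texts
neutral = demographic_regularizer(np.array([[0.0, 1.0]]), prototype_reps)  # equally similar
skewed = demographic_regularizer(np.array([[1.0, 0.0]]), prototype_reps)   # closer to group 0
```

Because the term only needs the prototype texts, not a demographic label per sample, it fits the label-free setting the abstract describes.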
------------------------------------------------------------------------------
\\
arXiv:2403.09543 (*cross-listing*)
Date: Thu, 14 Mar 2024 16:30:52 GMT   (1747kb,D)

Title: Explorations in Texture Learning
Authors: Blaine Hoak, Patrick McDaniel
Categories: cs.CV cs.LG
Comments: Accepted to ICLR 2024, Tiny Papers Track
\\
 In this work, we investigate \textit{texture learning}: the identification of
textures learned by object classification models, and the extent to which they
rely on these textures. We build texture-object associations that uncover new
insights about the relationships between texture and object classes in CNNs and
find three classes of results: associations that are strong and expected,
strong and not expected, and expected but not present. Our analysis
demonstrates that investigations in texture learning enable new methods for
interpretability and have the potential to uncover unexpected biases.
\\ ( https://arxiv.org/abs/2403.09543 ,  1747kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09547 (*cross-listing*)
Date: Thu, 14 Mar 2024 16:35:39 GMT   (2328kb)

Title: How do Machine Learning Projects use Continuous Integration Practices?
 An Empirical Study on GitHub Actions
Authors: Jo\~ao Helis Bernardo, Daniel Alencar da Costa, S\'ergio Queiroz de
 Medeiros, Uir\'a Kulesza
Categories: cs.SE cs.LG
Comments: 10 pages, Mining Software Repositories, MSR 2024
\\
 Continuous Integration (CI) is a well-established practice in traditional
software development, but its nuances in the domain of Machine Learning (ML)
projects remain relatively unexplored. Given the distinctive nature of ML
development, understanding how CI practices are adopted in this context is
crucial for tailoring effective approaches. In this study, we conduct a
comprehensive analysis of 185 open-source projects on GitHub (93 ML and 92
non-ML projects). Our investigation comprises both quantitative and qualitative
dimensions, aiming to uncover differences in CI adoption between ML and non-ML
projects. Our findings indicate that ML projects often require longer build
durations, and medium-sized ML projects exhibit lower test coverage compared to
non-ML projects. Moreover, small and medium-sized ML projects show a higher
prevalence of increasing build duration trends compared to their non-ML
counterparts. Additionally, our qualitative analysis illuminates the
discussions around CI in both ML and non-ML projects, encompassing themes like
CI Build Execution and Status, CI Testing, and CI Infrastructure. These
insights shed light on the unique challenges faced by ML projects in adopting
CI practices effectively.
\\ ( https://arxiv.org/abs/2403.09547 ,  2328kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09571 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:00:29 GMT   (7297kb,D)

Title: Are you a robot? Detecting Autonomous Vehicles from Behavior Analysis
Authors: Fabio Maresca, Filippo Grazioli, Antonio Albanese, Vincenzo
 Sciancalepore, Gianpiero Negri, Xavier Costa-Perez
Categories: cs.RO cs.LG
\\
 The tremendous hype around autonomous driving is creating strong demand for
emerging and novel technologies to support advanced mobility use cases. As car
manufacturers keep developing SAE level 3+ systems to improve the safety and
comfort of passengers, traffic authorities need to establish new procedures to
manage the transition from human-driven to fully-autonomous vehicles while
providing a feedback-loop mechanism to fine-tune envisioned autonomous systems.
Thus, a way to automatically profile autonomous vehicles and differentiate
those from human-driven ones is a must. In this paper, we present a
fully-fledged framework that monitors active vehicles using camera images and
state information in order to determine whether vehicles are autonomous,
without requiring any active notification from the vehicles themselves.
Essentially, it builds on cooperation among vehicles, which share the data they
acquire on the road to feed a machine learning model that identifies autonomous
cars. We extensively tested our solution and created the NexusStreet dataset by
means of the CARLA simulator, employing an autonomous driving control agent and
a steering wheel maneuvered by licensed drivers. Experiments show that it is
possible to discriminate the two behaviors by analyzing video clips with an
accuracy of 80%, which improves to 93% when the target's state information is
available. Lastly, we deliberately degraded the state to observe
how the framework performs under non-ideal data collection conditions.
\\ ( https://arxiv.org/abs/2403.09571 ,  7297kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09579 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:13:37 GMT   (3212kb,D)

Title: uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with
 Unsupervised Audio Mixtures
Authors: Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito
 Koishida
Categories: cs.SD cs.LG eess.AS
Comments: 5 pages, 6 figures, 4 tables. To appear in ICASSP'2024
\\
 Masked Autoencoders (MAEs) learn rich low-level representations from
unlabeled data but require substantial labeled data to effectively adapt to
downstream tasks. Conversely, Instance Discrimination (ID) emphasizes
high-level semantics, offering a potential solution to alleviate annotation
requirements in MAEs. Although combining these two approaches can address
downstream tasks with limited labeled data, naively integrating ID into MAEs
leads to extended training times and high computational costs. To address this
challenge, we introduce uaMix-MAE, an efficient ID tuning strategy that
leverages unsupervised audio mixtures. Utilizing contrastive tuning, uaMix-MAE
aligns the representations of pretrained MAEs, thereby facilitating effective
adaptation to task-specific semantics. To optimize the model with small amounts
of unlabeled data, we propose an audio mixing technique that manipulates audio
samples in both input and virtual label spaces. Experiments in low/few-shot
settings demonstrate that uaMix-MAE achieves 4-6% accuracy improvements over
various benchmarks when tuned with limited unlabeled data, such as
AudioSet-20K. Code is available at https://github.com/PLAN-Lab/uamix-MAE
\\ ( https://arxiv.org/abs/2403.09579 ,  3212kb)
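Mixing audio samples "in both input and virtual label spaces" can be sketched in the style of instance-mixing contrastive methods such as i-Mix. Each clip is mixed with a partner from the same unlabeled batch, and the soft target over instance identities is mixed with the same coefficient. The Beta-distributed coefficient, the soft cross-entropy loss, the temperature, and the random linear "encoder" standing in for a pretrained MAE are all assumptions, not the uaMix-MAE implementation.

```python
import numpy as np

def mix_batch(waveforms, rng, alpha=1.0):
    """Mix every clip with a partner clip from the same unlabeled batch.

    Returns the mixed waveforms and 'virtual' soft labels over instance
    identities: row i puts weight lam_i on clip i and 1 - lam_i on its partner.
    """
    b = waveforms.shape[0]
    lam = rng.beta(alpha, alpha, size=b)[:, None]   # (b, 1) mixing coefficients
    perm = rng.permutation(b)                        # partner index for each clip
    mixed = lam * waveforms + (1.0 - lam) * waveforms[perm]
    identity = np.eye(b)
    virtual = lam * identity + (1.0 - lam) * identity[perm]
    return mixed, virtual

def soft_contrastive_loss(z_mixed, z_clean, virtual, tau=0.1):
    # Cross-entropy between the virtual labels and the softmax over cosine
    # similarities of each mixed embedding to all clean embeddings in the batch.
    z_mixed = z_mixed / np.linalg.norm(z_mixed, axis=1, keepdims=True)
    z_clean = z_clean / np.linalg.norm(z_clean, axis=1, keepdims=True)
    logits = z_mixed @ z_clean.T / tau
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(virtual * log_probs).sum(axis=1).mean()

rng = np.random.default_rng(0)
waveforms = rng.normal(size=(4, 16000))          # random stand-ins for 1 s clips at 16 kHz
encoder = rng.normal(size=(16000, 32)) / 100.0   # stand-in for a pretrained MAE encoder
mixed, virtual = mix_batch(waveforms, rng)
loss = soft_contrastive_loss(mixed @ encoder, waveforms @ encoder, virtual)
```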
------------------------------------------------------------------------------
\\
arXiv:2403.09598 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:39:14 GMT   (291kb,D)

Title: Mixture of Mixups for Multi-label Classification of Rare Anuran Sounds
Authors: Ilyass Moummad, Nicolas Farrugia, Romain Serizel, Jeremy Froidevaux,
 Vincent Lostanlen
Categories: cs.SD cs.LG eess.AS
\\
 Multi-label imbalanced classification poses a significant challenge in
machine learning, particularly evident in bioacoustics where animal sounds
often co-occur, and certain sounds are much less frequent than others. This
paper focuses on the specific case of classifying anuran species sounds using
the AnuraSet dataset, which contains both class imbalance and multi-label
examples. To address these challenges, we introduce Mixture of Mixups (Mix2), a
framework that leverages mixing regularization methods Mixup, Manifold Mixup,
and MultiMix. Experimental results show that these methods, individually, may
lead to suboptimal results; however, when applied randomly, with one selected
at each training iteration, they prove effective in addressing the mentioned
challenges, particularly for rare classes with few occurrences. Further
analysis reveals that Mix2 is also proficient in classifying sounds across
various levels of class co-occurrences.
\\ ( https://arxiv.org/abs/2403.09598 ,  291kb)
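The core Mix2 rule of applying one randomly selected mixing method at each training iteration can be sketched with NumPy. The one-layer "network", the Beta and Dirichlet parameters, and the simplified MultiMix (Dirichlet-weighted convex combinations of the whole batch) are illustrative assumptions. Multi-hot labels are mixed with the same weights as the features, which yields soft multi-label targets for a binary cross-entropy loss.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def mixup(x, y, rng, alpha=0.4):
    # Interpolate each example (and its multi-hot label) with a random partner.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1.0 - lam) * x[perm], lam * y + (1.0 - lam) * y[perm]

def multimix(x, y, rng, alpha=1.0):
    # Simplified MultiMix: each output is a convex combination of the whole
    # batch, with weights drawn from a Dirichlet distribution.
    w = rng.dirichlet(alpha * np.ones(len(x)), size=len(x))  # rows sum to 1
    return w @ x, w @ y

def mix2_step(x, y, w_hidden, rng):
    """Pick one mixing method uniformly at random for this training iteration.

    Returns hidden features, the matching soft multi-label targets, and the
    method name. Manifold Mixup mixes hidden features instead of inputs.
    """
    method = str(rng.choice(["mixup", "manifold_mixup", "multimix"]))
    if method == "mixup":
        xm, ym = mixup(x, y, rng)
        return relu(xm @ w_hidden), ym, method
    if method == "manifold_mixup":
        hm, ym = mixup(relu(x @ w_hidden), y, rng)
        return hm, ym, method
    xm, ym = multimix(x, y, rng)
    return relu(xm @ w_hidden), ym, method

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 20))                    # e.g. pooled spectrogram features
y = (rng.random((8, 5)) < 0.3).astype(float)    # multi-hot labels, 5 species
w_hidden = rng.normal(size=(20, 16))
steps = [mix2_step(x, y, w_hidden, rng) for _ in range(30)]
```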
------------------------------------------------------------------------------
\\
arXiv:2403.09611 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:51:32 GMT   (14464kb,D)

Title: MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Authors: Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen
 Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers,
 Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu H\`e, Max
 Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan
 Du, Tao Lei, Sam Wiseman, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch,
 Alexander Toshev, Yinfei Yang
Categories: cs.CV cs.CL cs.LG
\\
 In this work, we discuss building performant Multimodal Large Language Models
(MLLMs). In particular, we study the importance of various architecture
components and data choices. Through careful and comprehensive ablations of the
image encoder, the vision language connector, and various pre-training data
choices, we identified several crucial design lessons. For example, we
demonstrate that for large-scale multimodal pre-training, using a careful mix of
image-caption, interleaved image-text, and text-only data is crucial for
achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks,
compared to other published pre-training results. Further, we show that the
image encoder together with image resolution and the image token count has
a substantial impact, while the vision-language connector design is of
comparatively negligible importance. By scaling up the presented recipe, we
build MM1, a family of multimodal models up to 30B parameters, consisting of
both dense models and mixture-of-experts (MoE) variants, that are SOTA in
pre-training metrics and achieve competitive performance after supervised
fine-tuning on a range of established multimodal benchmarks. Thanks to
large-scale pre-training, MM1 enjoys appealing properties such as enhanced
in-context learning and multi-image reasoning, enabling few-shot
chain-of-thought prompting.
\\ ( https://arxiv.org/abs/2403.09611 ,  14464kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09612 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:51:38 GMT   (13043kb,D)

Title: Compute-first optical detection for noise-resilient visual perception
Authors: Jungmin Kim, Nanfang Yu, Zongfu Yu
Categories: physics.optics cs.LG
Comments: Main 9 pages, 5 figures, Supplementary information 5 pages
\\
 In the context of visual perception, the optical signal from a scene is
transferred into the electronic domain by detectors in the form of image data,
which are then processed for the extraction of visual information. In noisy and
weak-signal environments such as thermal imaging for night vision applications,
however, the performance of neural computing tasks faces a significant
bottleneck due to the inherent degradation of data quality upon noisy
detection. Here, we propose a concept of optical signal processing before
detection to address this issue. We demonstrate that spatially redistributing
optical signals through a properly designed linear transformer can enhance the
detection noise resilience of visual perception tasks, as benchmarked with the
MNIST classification. Our idea is supported by a quantitative analysis
detailing the relationship between signal concentration and noise robustness,
as well as its practical implementation in an incoherent imaging system. This
compute-first detection scheme can pave the way for advancing infrared machine
vision technologies widely used for industrial and defense applications.
\\ ( https://arxiv.org/abs/2403.09612 ,  13043kb)
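A toy calculation shows why concentrating optical signal before detection can improve noise resilience. It assumes the same total signal, lossless optical concentration, and independent, signal-independent Gaussian read noise per detector pixel, with shot noise ignored. Under those assumptions, spreading the signal over N pixels and summing after detection gives SNR = S / (sigma * sqrt(N)), while concentrating it into one pixel gives S / sigma. The numbers are illustrative and not taken from the paper.

```python
import numpy as np

def readout_snr(total_signal, n_signal_pixels, noise_std, trials=20000, seed=0):
    """Monte Carlo SNR of a summed readout.

    total_signal is spread evenly over n_signal_pixels detector pixels; each
    pixel adds independent Gaussian read noise of std noise_std, and the pixel
    values are summed electronically after detection.
    """
    rng = np.random.default_rng(seed)
    per_pixel = total_signal / n_signal_pixels
    readings = per_pixel + rng.normal(0.0, noise_std, size=(trials, n_signal_pixels))
    total = readings.sum(axis=1)
    return total.mean() / total.std()

# Same optical power, either spread over 64 pixels or concentrated into 1 before detection.
spread = readout_snr(10.0, 64, 1.0)        # analytically 10 / (1.0 * sqrt(64)) = 1.25
concentrated = readout_snr(10.0, 1, 1.0)   # analytically 10 / 1.0 = 10
```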
------------------------------------------------------------------------------
\\
arXiv:2403.09625 (*cross-listing*)
Date: Thu, 14 Mar 2024 17:57:04 GMT   (6768kb,D)

Title: Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
Authors: Fangfu Liu, Hanyang Wang, Weiliang Chen, Haowen Sun, Yueqi Duan
Categories: cs.CV cs.LG
Comments: Project page: https://liuff19.github.io/Make-Your-3D
\\
 Recent years have witnessed the growing power of 3D generation models, which
offer a new level of creative flexibility by allowing users to guide the 3D
content generation process through a single image or natural language. However,
it remains challenging for existing 3D generation methods to create
subject-driven 3D content across diverse prompts. In this paper, we introduce a
novel 3D customization method, dubbed Make-Your-3D, that can personalize
high-fidelity and consistent 3D content from only a single image of a subject
and a text description within 5 minutes. Our key insight is to harmonize the
distributions of a multi-view diffusion model and an identity-specific 2D
generative model, aligning them with the distribution of the desired 3D
subject. Specifically, we design a co-evolution framework to reduce the
variance of distributions, where each model undergoes a process of learning
from the other through identity-aware optimization and subject-prior
optimization, respectively. Extensive experiments demonstrate that our method
can produce high-quality, consistent, and subject-specific 3D content with
text-driven modifications that are unseen in the subject image.
\\ ( https://arxiv.org/abs/2403.09625 ,  6768kb)
%%--%%--%%--%%--%%--%%--%%--%