------------------------------------------------------------------------------
------------------------------------------------------------------------------
Send any comments regarding submissions directly to submitter.
------------------------------------------------------------------------------
Archives at http://arxiv.org/
To unsubscribe, e-mail To: [email protected], Subject: cancel
------------------------------------------------------------------------------
Submissions to:
Artificial Intelligence
Computational Geometry
Computers and Society
Machine Learning
received from Wed 13 Mar 24 18:00:00 GMT to Thu 14 Mar 24 18:00:00 GMT
------------------------------------------------------------------------------
------------------------------------------------------------------------------
\\
arXiv:2403.08802
Date: Mon, 5 Feb 2024 14:20:19 GMT (482kb)
Title: Governance of Generative Artificial Intelligence for Companies
Authors: Johannes Schneider, Rene Abraham, Christian Meske
Categories: cs.AI cs.CY cs.LG
\\
Generative Artificial Intelligence (GenAI), specifically large language
models like ChatGPT, has swiftly entered organizations without adequate
governance, posing both opportunities and risks. Despite extensive debates on
GenAI's transformative nature and regulatory measures, limited research
addresses organizational governance, encompassing technical and business
perspectives. This review paper fills this gap by surveying recent works. It
goes beyond mere summarization by developing a framework for GenAI governance
within companies. Our framework outlines the scope, objectives, and governance
mechanisms tailored to harness business opportunities and mitigate risks
associated with GenAI integration. This research contributes a focused approach
to GenAI governance, offering practical insights for companies navigating the
challenges of responsible AI adoption. It is also valuable for a technical
audience, helping them broaden their perspective as ethical and business
concerns become increasingly prevalent and identify novel research
directions.
\\ ( https://arxiv.org/abs/2403.08802 , 482kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08843
Date: Wed, 13 Mar 2024 14:45:54 GMT (943kb,D)
Title: Fuzzy Fault Trees Formalized
Authors: Thi Kim Nhung Dang, Milan Lopuha\"a-Zwakenberg, Mari\"elle Stoelinga
Categories: cs.AI
Comments: 14 pages
\\
Fault tree analysis is a vital method of assessing safety risks. It helps to
identify potential causes of accidents, assess their likelihood and severity,
and suggest preventive measures. Quantitative analysis of fault trees is often
done via the dependability metrics that compute the system's failure behaviour
over time. However, the lack of precise data is a major obstacle to
quantitative analysis, and so to reliability analysis. Fuzzy logic is a popular
framework for dealing with ambiguous values and has applications in many
domains. A number of fuzzy approaches have been proposed to fault tree
analysis, but -- to the best of our knowledge -- none of them provide rigorous
definitions or algorithms for computing fuzzy unreliability values. In this
paper, we define a rigorous framework for fuzzy unreliability values. In
addition, we provide a bottom-up algorithm to efficiently calculate fuzzy
reliability for a system. The algorithm uses the $\alpha$-cut method: it
performs binary algebraic operations on intervals taken from horizontally
discretised $\alpha$-cut representations of fuzzy numbers. The method
preserves the nonlinearity of fuzzy unreliability. Finally,
we illustrate the results obtained from two case studies.
\\ ( https://arxiv.org/abs/2403.08843 , 943kb)
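The $\alpha$-cut idea mentioned in the abstract above can be illustrated with
a minimal sketch. It is not the authors' algorithm: it assumes triangular fuzzy
failure probabilities, independent basic events, and a tree in which no basic
event is shared between gates. The names Triangular, evaluate and
fuzzy_unreliability are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Triangular:
    """Triangular fuzzy probability (assumed shape): support [low, high], peak at mode."""
    low: float
    mode: float
    high: float

    def alpha_cut(self, alpha):
        # Interval of values whose membership degree is at least alpha.
        return (self.low + alpha * (self.mode - self.low),
                self.high - alpha * (self.high - self.mode))


def and_gate(intervals):
    # Product of independent probabilities; monotone, so endpoints map to endpoints.
    return (prod(lo for lo, _ in intervals), prod(hi for _, hi in intervals))


def or_gate(intervals):
    # 1 - prod(1 - p) is increasing in every p, so endpoints map to endpoints.
    return (1 - prod(1 - lo for lo, _ in intervals),
            1 - prod(1 - hi for _, hi in intervals))


def evaluate(node, alpha):
    """Bottom-up interval evaluation of a tree given as ("AND" | "OR", [children])."""
    if isinstance(node, Triangular):
        return node.alpha_cut(alpha)
    gate, children = node
    cuts = [evaluate(child, alpha) for child in children]
    return and_gate(cuts) if gate == "AND" else or_gate(cuts)


def fuzzy_unreliability(tree, levels=5):
    """Map each discretised alpha level to the top event's interval."""
    return {k / levels: evaluate(tree, k / levels) for k in range(levels + 1)}
```

Calling fuzzy_unreliability on a nested tuple such as
("OR", [("AND", [a, b]), c]) returns one interval per $\alpha$ level; stacking
these intervals gives a discretised membership function for the top event.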
------------------------------------------------------------------------------
\\
arXiv:2403.08910
Date: Wed, 13 Mar 2024 19:00:36 GMT (31kb)
Title: Meta-operators for Enabling Parallel Planning Using Deep Reinforcement
Learning
Authors: \'Angel Aso-Mollar, Eva Onaindia
Categories: cs.AI
Comments: 9 pages. Submitted to PRL workshop at ICAPS 2023
\\
There is a growing interest in the application of Reinforcement Learning (RL)
techniques to AI planning with the aim to come up with general policies.
Typically, the mapping of the transition model of AI planning to the state
transition system of a Markov Decision Process is established by assuming a
one-to-one correspondence of the respective action spaces. In this paper, we
introduce the concept of meta-operator as the result of simultaneously applying
multiple planning operators, and we show that including meta-operators in the
RL action space enables new planning perspectives to be addressed using RL,
such as parallel planning. Our research aims to analyze the performance and
complexity of including meta-operators in the RL process, concretely in domains
where satisfactory outcomes have not been previously achieved using usual
generalized planning models. The main objective of this article is thus to pave
the way towards a redefinition of the RL action space in a manner that is more
closely aligned with the planning perspective.
\\ ( https://arxiv.org/abs/2403.08910 , 31kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09232
Date: Thu, 14 Mar 2024 09:56:35 GMT (625kb,D)
Title: Generating Feasible and Plausible Counterfactual Explanations for
Outcome Prediction of Business Processes
Authors: Alexander Stevens, Chun Ouyang, Johannes De Smedt, Catarina Moreira
Categories: cs.AI
Comments: Journal Submission
\\
In recent years, various machine and deep learning architectures have been
successfully introduced to the field of predictive process analytics.
Nevertheless, the inherent opacity of these algorithms poses a significant
challenge for human decision-makers, hindering their ability to understand the
reasoning behind the predictions. This growing concern has sparked the
introduction of counterfactual explanations, designed as human-understandable
what-if scenarios, to provide clearer insights into the decision-making process
behind undesirable predictions. The generation of counterfactual explanations,
however, encounters specific challenges when dealing with the sequential nature
of the (business) process cases typically used in predictive process analytics.
Our paper tackles this challenge by introducing a data-driven approach,
REVISEDplus, to generate more feasible and plausible counterfactual
explanations. First, we restrict the counterfactual algorithm to generate
counterfactuals that lie within a high-density region of the process data,
ensuring that the proposed counterfactuals are realistic and feasible within
the observed process data distribution. Additionally, we ensure plausibility by
learning sequential patterns between the activities in the process cases,
utilising Declare language templates. Finally, we evaluate the properties that
define the validity of counterfactuals.
\\ ( https://arxiv.org/abs/2403.09232 , 625kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09249
Date: Thu, 14 Mar 2024 10:16:57 GMT (1713kb,D)
Title: Leveraging Constraint Programming in a Deep Learning Approach for
Dynamically Solving the Flexible Job-Shop Scheduling Problem
Authors: Imanol Echeverria, Maialen Murua, Roberto Santana
Categories: cs.AI
\\
Recent advancements in the flexible job-shop scheduling problem (FJSSP) are
primarily based on deep reinforcement learning (DRL) due to its ability to
generate high-quality, real-time solutions. However, DRL approaches often fail
to fully harness the strengths of existing techniques such as exact methods or
constraint programming (CP), which can excel at finding optimal or near-optimal
solutions for smaller instances. This paper aims to integrate CP within a deep
learning (DL) based methodology, leveraging the benefits of both. In this
paper, we introduce a method that involves training a DL model using optimal
solutions generated by CP, ensuring the model learns from high-quality data,
thereby eliminating the need for the extensive exploration typical in DRL and
enhancing overall performance. Further, we integrate CP into our DL framework
to jointly construct solutions, utilizing DL for the initial complex stages and
transitioning to CP for optimal resolution as the problem is simplified. Our
hybrid approach has been extensively tested on three public FJSSP benchmarks,
demonstrating superior performance over five state-of-the-art DRL approaches
and a widely-used CP solver. Additionally, with the objective of exploring the
application to other combinatorial optimization problems, promising preliminary
results are presented on applying our hybrid approach to the traveling salesman
problem, combining an exact method with a well-known DRL method.
\\ ( https://arxiv.org/abs/2403.09249 , 1713kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09289
Date: Thu, 14 Mar 2024 11:22:51 GMT (525kb,D)
Title: Silico-centric Theory of Mind
Authors: Anirban Mukherjee, Hannah Hanwen Chang
Categories: cs.AI
\\
Theory of Mind (ToM) refers to the ability to attribute mental states, such
as beliefs, desires, intentions, and knowledge, to oneself and others, and to
understand that these mental states can differ from one's own and from reality.
We investigate ToM in environments with multiple, distinct, independent AI
agents, each possessing unique internal states, information, and objectives.
Inspired by human false-belief experiments, we present an AI ('focal AI') with
a scenario where its clone undergoes a human-centric ToM assessment. We prompt
the focal AI to assess whether its clone would benefit from additional
instructions. Concurrently, we give its clones the ToM assessment, both with
and without the instructions, thereby engaging the focal AI in higher-order
counterfactual reasoning akin to human mentalizing--with respect to humans in
one test and to other AI in another. We uncover a discrepancy: Contemporary AI
demonstrates near-perfect accuracy on human-centric ToM assessments. Since
information embedded in one AI is identically embedded in its clone, additional
instructions are redundant. Yet, we observe AI crafting elaborate instructions
for their clones, erroneously anticipating a need for assistance. An
independent referee AI agrees with these unsupported expectations. Neither the
focal AI nor the referee demonstrates ToM in our 'silico-centric' test.
\\ ( https://arxiv.org/abs/2403.09289 , 525kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09361
Date: Thu, 14 Mar 2024 13:11:30 GMT (307kb,D)
Title: A Multi-population Integrated Approach for Capacitated Location Routing
Authors: Pengfei He, Jin-Kao Hao, Qinghua Wu
Categories: cs.AI
\\
The capacitated location-routing problem involves determining the depots from
a set of candidate capacitated depot locations and finding the required routes
from the selected depots to serve a set of customers while minimizing a cost
function that includes the cost of opening the chosen depots, the fixed
utilization cost per vehicle used, and the total cost (distance) of the routes.
This paper presents a multi-population integrated framework in which a
multi-depot edge assembly crossover generates promising offspring solutions
from the perspective of both depot location and route edge assembly. The method
includes an effective neighborhood-based local search, a feasibility-restoring
procedure and a diversification-oriented mutation. Of particular interest is
the multi-population scheme which organizes the population into multiple
subpopulations based on depot configurations. Extensive experiments on 281
benchmark instances from the literature show that the algorithm performs
remarkably well, by improving 101 best-known results (new upper bounds) and
matching 84 best-known results. Additional experiments are presented to gain
insight into the role of the key elements of the algorithm.
\\ ( https://arxiv.org/abs/2403.09361 , 307kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09404
Date: Thu, 14 Mar 2024 13:53:05 GMT (436kb,D)
Title: Heuristic Reasoning in AI: Instrumental Use and Mimetic Absorption
Authors: Anirban Mukherjee, Hannah Hanwen Chang
Categories: cs.AI
\\
We propose a novel program of heuristic reasoning within artificial
intelligence (AI) systems. Through a series of innovative experiments,
including variations of the classic Linda problem and a novel application of
the Beauty Contest game, we uncover trade-offs between accuracy maximization
and effort reduction that shape the conditions under which AIs transition
between exhaustive logical processing and the use of cognitive shortcuts
(heuristics). We distinguish between the 'instrumental' use of heuristics to
match resources with objectives, and 'mimetic absorption,' whereby heuristics
are learned from humans, and manifest randomly and universally. We provide
evidence that AI, despite lacking intrinsic goals or self-awareness, manifests
an adaptive balancing of precision and efficiency, consistent with principles
of resource-rational human cognition as explicated in classical theories of
bounded rationality and dual-process theory.
\\ ( https://arxiv.org/abs/2403.09404 , 436kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09481
Date: Thu, 14 Mar 2024 15:25:23 GMT (4223kb,D)
Title: Clinical Reasoning over Tabular Data and Text with Bayesian Networks
Authors: Paloma Rabaey, Johannes Deleu, Stefan Heytens, Thomas Demeester
Categories: cs.AI
Comments: 10 pages, 2 figures
\\
Bayesian networks are well-suited for clinical reasoning on tabular data, but
are less compatible with natural language data, for which neural networks
provide a successful framework. This paper compares and discusses strategies to
augment Bayesian networks with neural text representations, both in a
generative and discriminative manner. This is illustrated with simulation
results for a primary care use case (diagnosis of pneumonia) and discussed in a
broader clinical context.
\\ ( https://arxiv.org/abs/2403.09481 , 4223kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09510
Date: Thu, 14 Mar 2024 15:56:39 GMT (8782kb,D)
Title: Trust AI Regulation? Discerning users are vital to build trust and
effective AI regulation
Authors: Zainab Alalawi, Paolo Bova, Theodor Cimpeanu, Alessandro Di Stefano,
Manh Hong Duong, Elias Fernandez Domingos, The Anh Han, Marcus Krellner,
Bianca Ogbo, Simon T. Powers, and Filippo Zimmaro
Categories: cs.AI cs.CY cs.GT cs.MA math.DS
\\
There is general agreement that some form of regulation is necessary both for
AI creators to be incentivised to develop trustworthy systems, and for users to
actually trust those systems. But there is much debate about what form these
regulations should take and how they should be implemented. Most work in this
area has been qualitative, and has not been able to make formal predictions.
Here, we propose that evolutionary game theory can be used to quantitatively
model the dilemmas faced by users, AI creators, and regulators, and provide
insights into the possible effects of different regulatory regimes. We show
that creating trustworthy AI and user trust requires regulators to be
incentivised to regulate effectively. We demonstrate the effectiveness of two
mechanisms that can achieve this. The first is where governments can recognise
and reward regulators that do a good job. In that case, if the AI system is not
too risky for users then some level of trustworthy development and user trust
evolves. We then consider an alternative solution, where users can condition
their trust decision on the effectiveness of the regulators. This leads to
effective regulation, and consequently the development of trustworthy AI and
user trust, provided that the cost of implementing regulations is not too high.
Our findings highlight the importance of considering the effect of different
regulatory regimes from an evolutionary game theoretic perspective.
\\ ( https://arxiv.org/abs/2403.09510 , 8782kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09580
Date: Thu, 14 Mar 2024 17:14:53 GMT (16kb)
Title: Algorithmic syntactic causal identification
Authors: Dhurim Cakiqi and Max A. Little
Categories: cs.AI cs.LG stat.OT
Comments: 11 pages, 2 TikZ figures
\\
Causal identification in causal Bayes nets (CBNs) is an important tool in
causal inference allowing the derivation of interventional distributions from
observational distributions where this is possible in principle. However, most
existing formulations of causal identification using techniques such as
d-separation and do-calculus are expressed within the mathematical language of
classical probability theory on CBNs. Yet there are many causal settings
where probability theory and hence current causal identification techniques are
inapplicable such as relational databases, dataflow programs such as hardware
description languages, distributed systems and most modern machine learning
algorithms. We show that this restriction can be lifted by replacing the use of
classical probability theory with the alternative axiomatic foundation of
symmetric monoidal categories. In this alternative axiomatization, we show how
an unambiguous and clean distinction can be drawn between the general syntax of
causal models and any specific semantic implementation of that causal model.
This allows a purely syntactic algorithmic description of general causal
identification by a translation of recent formulations of the general ID
algorithm through fixing. Our description is given entirely in terms of the
non-parametric ADMG structure specifying a causal model and the algebraic
signature of the corresponding monoidal category, to which a sequence of
manipulations is then applied so as to arrive at a modified monoidal category
in which the desired, purely syntactic interventional causal model, is
obtained. We use this idea to derive purely syntactic analogues of classical
back-door and front-door causal adjustment, and illustrate an application to a
more complex causal model.
\\ ( https://arxiv.org/abs/2403.09580 , 16kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08977
Date: Wed, 13 Mar 2024 21:56:40 GMT (128kb,D)
Title: On maximum-sum matchings of bichromatic points
Authors: Oscar Chac\'on-Rivera, Pablo P\'erez-Lantero
Categories: cs.CG cs.DM
\\
Huemer et al. (Discrete Math, 2019) proved that for any two finite point sets
$R$ and $B$ in the plane with $|R| = |B|$, the perfect matching that matches
points of $R$ with points of $B$, and maximizes the total squared Euclidean
distance of the matched pairs, has the property that all the disks induced by
the matching have a nonempty common intersection. A pair of matched points
induces the disk that has the segment connecting the points as diameter. In
this note, we characterize these maximum-sum matchings for any continuous
(semi)metric, focusing on both the Euclidean distance and squared Euclidean
distance. Using this characterization, we give a different but simpler proof
for the common intersection property proved by Huemer et al.
\\ ( https://arxiv.org/abs/2403.08977 , 128kb)
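The objects described in the abstract above can be made concrete with a small
brute-force sketch. It is not the authors' method: it simply enumerates all
perfect matchings (feasible only for a handful of points), keeps the one with
the largest total squared Euclidean distance, and builds the disk induced by
each matched pair. The function names are hypothetical.

```python
from itertools import permutations
from math import dist


def max_sum_matching(red, blue, cost=lambda p, q: dist(p, q) ** 2):
    """Return (total, matching) maximising the summed cost over perfect matchings.

    The matching is a list of (red_index, blue_index) pairs. Brute force over
    all permutations, so only suitable for very small inputs.
    """
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(len(blue))):
        total = sum(cost(red[i], blue[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))


def induced_disks(red, blue, matching):
    """Disk of a matched pair: the segment between the two points is a diameter."""
    disks = []
    for i, j in matching:
        (x1, y1), (x2, y2) = red[i], blue[j]
        center = ((x1 + x2) / 2, (y1 + y2) / 2)
        disks.append((center, dist(red[i], blue[j]) / 2))
    return disks


def in_all_disks(point, disks, eps=1e-9):
    """Check whether a candidate point lies in every (closed) disk."""
    return all(dist(point, center) <= radius + eps for center, radius in disks)
```

Given a maximum-sum matching, in_all_disks can be used to test whether a
candidate point witnesses the common intersection property on a small instance.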
------------------------------------------------------------------------------
\\
arXiv:2403.09197
Date: Thu, 14 Mar 2024 09:09:15 GMT (645kb,D)
Title: MetroGNN: Metro Network Expansion with Reinforcement Learning
Authors: Hongyuan Su, Yu Zheng, Jingtao Ding, Depeng Jin, Yong Li
Categories: cs.CY
Comments: WWW24 short
MSC-class: 68T09
DOI: 10.1145/3589335.3651536
\\
Selecting urban regions for metro network expansion to meet maximal
transportation demands is crucial for urban development, while computationally
challenging to solve. The expansion process relies not only on complicated
features like urban demographics and origin-destination (OD) flow but is also
constrained by the existing metro network and urban geography. In this paper,
we introduce a reinforcement learning framework to address a Markov decision
process within an urban heterogeneous multi-graph. Our approach employs an
attentive policy network that intelligently selects nodes based on information
captured by a graph neural network. Experiments on real-world urban data
demonstrate that our proposed methodology substantially improves the satisfied
transportation demands by over 30\% when compared with state-of-the-art
methods. Codes are published at https://github.com/tsinghua-fib-lab/MetroGNN.
\\ ( https://arxiv.org/abs/2403.09197 , 645kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09208
Date: Thu, 14 Mar 2024 09:22:16 GMT (425kb)
Title: Older adults' safety and security online: A post-pandemic exploration of
attitudes and behaviors
Authors: Edgar Pacheco
Categories: cs.CY
Comments: 20 pages, 7 tables
\\
Older adults' growing use of the internet and related technologies, further
accelerated by the COVID-19 pandemic, has prompted not only a critical
examination of their behaviors and attitudes about online threats but also a
greater understanding of the roles of specific characteristics within this
population group. Based on survey data and using descriptive and inferential
statistics, this empirical study delves into this matter. The behaviors and
attitudes of a group of older adults aged 60 years and older (n=275) regarding
different dimensions of online safety and cybersecurity are investigated. The
results show that older adults report a discernible degree of concern about the
security of their personal information. Despite the varied precautions taken,
most of them do not know where to report online threats. What is more,
regarding key demographics, the study found some significant differences in
terms of gender and age group, but not disability status. This implies that
older adults do not seem to constitute a homogeneous group when it comes to
attitudes and behaviors regarding safety and security online. The study
concludes that support systems should include older adults in the development
of protective measures and acknowledge their diversity. The implications of the
results are discussed and some directions for future research are proposed.
\\ ( https://arxiv.org/abs/2403.09208 , 425kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09216
Date: Thu, 14 Mar 2024 09:31:20 GMT (432kb)
Title: Unlocking the Potential of Open Government Data: Exploring the
Strategic, Technical, and Application Perspectives of High-Value Datasets
Opening in Taiwan
Authors: Hsien-Lee Tseng, Anastasija Nikiforova
Categories: cs.CY
Comments: This paper has been accepted for publication in Proceedings of the
25th Annual International Conference on Digital Government Research and this
is a pre-print version of the manuscript. It is posted here for your personal
use. Not for redistribution
\\
Today, data has an unprecedented value as it forms the basis for data-driven
decision-making, including serving as an input for AI models, where the latter
is highly dependent on the availability of the data. However, availability of
data in an open data format alone creates little added value; the value of
these data, i.e., their relevance to the real needs of the end user, is key.
This is where the concept of high-value dataset (HVD) comes into play, which
has become popular in recent years. Defining and opening HVD is an ongoing
process consisting of a set of interrelated steps, the implementation of which
may vary from one country or region to another. Therefore, there has recently
been a call to conduct research in a country or region setting considered to be
of greatest national value. So far, only a few studies have been conducted at
the regional or national level, most of which consider only one step of the
process, such as identifying HVD or measuring their impact. With this study, we
answer this call and examine the national case of Taiwan by exploring the
entire lifecycle of HVD opening. The aim of the paper is to understand and
evaluate the lifecycle of high-value dataset publishing in one of the world's
leading producers of information and communication technology (ICT) products -
Taiwan. To do this, we conduct a qualitative study with exploratory interviews
with representatives from government agencies in Taiwan responsible for HVD
opening, exploring the HVD opening lifecycle. As such, we examine (1) strategic
aspects related to the HVD determination process, (2) technical aspects, and
(3) application aspects.
\\ ( https://arxiv.org/abs/2403.09216 , 432kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08818
Date: Mon, 19 Feb 2024 23:48:40 GMT (647kb,D)
Title: Multimodal Fusion of EHR in Structures and Semantics: Integrating
Clinical Records and Notes with Hypergraph and LLM
Authors: Hejie Cui, Xinyu Fang, Ran Xu, Xuan Kan, Joyce C. Ho, Carl Yang
Categories: cs.LG cs.AI cs.CL
\\
Electronic Health Records (EHRs) have become increasingly popular to support
clinical decision-making and healthcare in recent decades. EHRs usually contain
heterogeneous information, such as structural data in tabular form and
unstructured data in textual notes. Different types of information in EHRs can
complement each other and provide a more complete picture of the health status
of a patient. While there has been a lot of research on representation learning
of structured EHR data, the fusion of different types of EHR data (multimodal
fusion) is not well studied. This is mostly because of the complex medical
coding systems used and the noise and redundancy present in the written notes.
In this work, we propose a new framework called MINGLE, which integrates both
structures and semantics in EHR effectively. Our framework uses a two-level
infusion strategy to combine medical concept semantics and clinical note
semantics into hypergraph neural networks, which learn the complex interactions
between different types of data to generate visit representations for
downstream prediction. Experiment results on two EHR datasets, the public
MIMIC-III and private CRADLE, show that MINGLE can effectively improve
predictive performance by a relative 11.83%, enhancing semantic integration as
well as multimodal fusion for structural and textual EHR data.
\\ ( https://arxiv.org/abs/2403.08818 , 647kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08819
Date: Tue, 20 Feb 2024 04:13:48 GMT (2161kb,D)
Title: Thermometer: Towards Universal Calibration for Large Language Models
Authors: Maohao Shen, Subhro Das, Kristjan Greenewald, Prasanna Sattigeri,
Gregory Wornell, Soumya Ghosh
Categories: cs.LG cs.CL stat.ML
\\
We consider the issue of calibration in large language models (LLMs). Recent
studies have found that common interventions such as instruction tuning often
result in poorly calibrated LLMs. Although calibration is well-explored in
traditional applications, calibrating LLMs is uniquely challenging. These
challenges stem as much from the severe computational requirements of LLMs as
from their versatility, which allows them to be applied to diverse tasks.
Addressing these challenges, we propose THERMOMETER, a calibration approach
tailored to LLMs. THERMOMETER learns an auxiliary model, given data from
multiple tasks, for calibrating an LLM. It is computationally efficient,
preserves the accuracy of the LLM, and produces better-calibrated responses for
new tasks. Extensive empirical evaluations across various benchmarks
demonstrate the effectiveness of the proposed method.
\\ ( https://arxiv.org/abs/2403.08819 , 2161kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08820
Date: Wed, 21 Feb 2024 19:36:24 GMT (15522kb,D)
Title: Diet-ODIN: A Novel Framework for Opioid Misuse Detection with
Interpretable Dietary Patterns
Authors: Zheyuan Zhang, Zehong Wang, Shifu Hou, Evan Hall, Landon Bachman,
Vincent Galassi, Jasmine White, Nitesh V. Chawla, Chuxu Zhang, Yanfang Ye
Categories: cs.LG cs.AI cs.SI
\\
The opioid crisis has been one of the most critical societal concerns in the
United States. Although medication-assisted treatment (MAT) is recognized
as the most effective treatment for opioid misuse and addiction, its various
side effects can trigger opioid relapse. In addition to MAT, dietary
nutrition intervention has demonstrated its importance in opioid misuse
prevention and recovery. However, research on the alarming connections between
dietary patterns and opioid misuse remains under-explored. In response to this
gap, we first establish, as a first attempt, a large-scale multifaceted dietary
benchmark dataset related to opioid users and then develop
a novel framework, Opioid Misuse Detection with Interpretable
Dietary Patterns (Diet-ODIN), to bridge a heterogeneous graph (HG) and a large
language model (LLM) for the identification of users with opioid misuse and the
interpretation of their associated dietary patterns. Specifically, in
Diet-ODIN, we first construct an HG to comprehensively incorporate both dietary
and health-related information, and then we devise a holistic graph learning
framework with noise reduction to fully capitalize on both users' individual
dietary habits and shared dietary patterns for the detection of users with
opioid misuse. To further delve into the intricate correlations between dietary
patterns and opioid misuse, we exploit an LLM by utilizing the knowledge
obtained from the graph learning model for interpretation. The extensive
experimental results based on our established benchmark with quantitative and
qualitative measures demonstrate the outstanding performance of Diet-ODIN in
exploring the complex interplay between opioid misuse and dietary patterns, by
comparison with state-of-the-art baseline methods.
\\ ( https://arxiv.org/abs/2403.08820 , 15522kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08822
Date: Wed, 28 Feb 2024 06:50:10 GMT (661kb)
Title: LoRA-SP: Streamlined Partial Parameter Adaptation for Resource-Efficient
Fine-Tuning of Large Language Models
Authors: Yichao Wu, Yafei Xiang, Shuning Huo, Yulu Gong, Penghao Liang
Categories: cs.LG cs.CL
\\
In addressing the computational and memory demands of fine-tuning Large
Language Models (LLMs), we propose LoRA-SP (Streamlined Partial Parameter
Adaptation), a novel approach utilizing randomized half-selective parameter
freezing within the Low-Rank Adaptation (LoRA) framework. This method efficiently
balances pre-trained knowledge retention and adaptability for task-specific
optimizations. Through a randomized mechanism, LoRA-SP determines which
parameters to update or freeze, significantly reducing computational and memory
requirements without compromising model performance. We evaluated LoRA-SP
across several benchmark NLP tasks, demonstrating its ability to achieve
competitive performance with substantially lower resource consumption compared
to traditional full-parameter fine-tuning and other parameter-efficient
techniques. LoRA-SP's innovative approach not only facilitates the deployment of
advanced NLP models in resource-limited settings but also opens new research
avenues into effective and efficient model adaptation strategies.
\\ ( https://arxiv.org/abs/2403.08822 , 661kb)
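One possible reading of "randomized half-selective parameter freezing" is sketched below on a toy LoRA layer: a random boolean mask marks half of the entries of each low-rank factor as trainable, and the gradient update is applied only there. The abstract does not specify the granularity of the selection, so the per-entry mask, the toy squared-error loss, the non-zero initialization of `B`, and every name here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 6, 2

W = rng.normal(size=(d_out, d_in))             # frozen pretrained weight
A = rng.normal(scale=0.1, size=(rank, d_in))   # LoRA down-projection
# LoRA usually initializes B to zero; a small random B is used here
# (assumption) so that the first gradient for A is non-zero.
B = rng.normal(scale=0.1, size=(d_out, rank))  # LoRA up-projection

def random_half_mask(shape, rng):
    # True marks a trainable entry; exactly half (rounded down) are chosen.
    size = int(np.prod(shape))
    flat = np.zeros(size, dtype=bool)
    flat[rng.permutation(size)[: size // 2]] = True
    return flat.reshape(shape)

mask_A = random_half_mask(A.shape, rng)
mask_B = random_half_mask(B.shape, rng)

# One gradient step on the toy loss 0.5 * ||(W + B A) x - y||^2.
x = rng.normal(size=d_in)
y = rng.normal(size=d_out)
err = (W + B @ A) @ x - y
grad_B = np.outer(err, A @ x)      # dL/dB
grad_A = np.outer(B.T @ err, x)    # dL/dA

lr = 0.01
A_new = A - lr * grad_A * mask_A   # frozen entries receive a zero update
B_new = B - lr * grad_B * mask_B

print("trainable entries:", int(mask_A.sum() + mask_B.sum()),
      "of", A.size + B.size)
```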
------------------------------------------------------------------------------
\\
arXiv:2403.08834
Date: Wed, 13 Mar 2024 08:04:00 GMT (1308kb,D)
Title: Predictive Analysis of Tuberculosis Treatment Outcomes Using Machine
Learning: A Karnataka TB Data Study at a Scale
Authors: SeshaSai Nath Chinagudaba, Darshan Gera, Krishna Kiran Vamsi Dasu, Uma
Shankar S, Kiran K, Anil Singarajpure, Shivayogappa.U, Somashekar N, Vineet
Kumar Chadda and Sharath B N
Categories: cs.LG cs.AI
\\
Tuberculosis (TB) remains a global health threat, ranking among the leading
causes of mortality worldwide. In this context, machine learning (ML) has
emerged as a transformative force, providing innovative solutions to the
complexities associated with TB treatment. This study explores how machine
learning, especially with tabular data, can be used to predict Tuberculosis
(TB) treatment outcomes more accurately. It transforms this prediction task
into a binary classification problem, generating risk scores from patient data
sourced from NIKSHAY, India's national TB control program, which includes over
500,000 patient records.
Data preprocessing is a critical component of the study, and the model
achieved a recall of 98% and an AUC-ROC score of 0.95 on the validation set,
which includes 20,000 patient records. We also explore the use of Natural
Language Processing (NLP) for improved model learning. Our results,
corroborated by various metrics and ablation studies, validate the
effectiveness of our approach. The study concludes by discussing the potential
ramifications of our research on TB eradication efforts and proposing potential
avenues for future work. This study marks a significant stride in the battle
against TB, showcasing the potential of machine learning in healthcare.
\\ ( https://arxiv.org/abs/2403.08834 , 1308kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08835
Date: Wed, 13 Mar 2024 08:10:18 GMT (894kb)
Title: Stacking-based deep neural network for player scouting in football
Authors: Simon Lacan (IMT Nord Europe)
Categories: cs.LG cs.AI
\\
Datascouting is one of the best-known data applications in professional
sport, and specifically football. Its objective is to analyze huge databases of
players in order to detect high-potential players who can then be individually
considered by human scouts. In this paper, we propose a stacking-based deep
learning model to detect high-potential football players. Applied to an
open-source database, our model obtains significantly better results than
classical statistical methods.
\\ ( https://arxiv.org/abs/2403.08835 , 894kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08836
Date: Wed, 13 Mar 2024 08:15:18 GMT (1135kb,D)
Title: Structural Positional Encoding for knowledge integration in
transformer-based medical process monitoring
Authors: Christopher Irwin, Marco Dossena, Giorgio Leonardi, Stefania Montani
Categories: cs.LG cs.AI
\\
Predictive process monitoring is a process mining task aimed at forecasting
information about a running process trace, such as the most correct next
activity to be executed. In medical domains, predictive process monitoring can
provide valuable decision support in atypical and nontrivial situations.
Decision support and quality assessment in medicine cannot ignore domain
knowledge, in order to be grounded on all the available information (which is
not limited to data) and to be really acceptable by end users.
In this paper, we propose a predictive process monitoring approach relying on
the use of a {\em transformer}, a deep learning architecture based on the
attention mechanism. A major contribution of our work lies in the incorporation
of ontological domain-specific knowledge, carried out through a graph
positional encoding technique. The paper presents and discusses the encouraging
experimental results we are collecting in the domain of stroke management.
\\ ( https://arxiv.org/abs/2403.08836 , 1135kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08837
Date: Wed, 13 Mar 2024 08:39:21 GMT (318kb,D)
Title: Cyclic Data Parallelism for Efficient Parallelism of Deep Neural
Networks
Authors: Louis Fournier (MLIA), Edouard Oyallon
Categories: cs.LG cs.AI cs.DC cs.NE stat.ML
\\
Training large deep learning models requires parallelization techniques to
scale. In existing methods such as Data Parallelism or ZeRO-DP, micro-batches
of data are processed in parallel, which creates two drawbacks: the total
memory required to store the model's activations peaks at the end of the
forward pass, and gradients must be simultaneously averaged at the end of the
backpropagation step. We propose Cyclic Data Parallelism, a novel paradigm
shifting the execution of the micro-batches from simultaneous to sequential,
with a uniform delay. At the cost of a slight gradient delay, the total memory
taken by activations is constant, and the gradient communications are balanced
during the training step. With Model Parallelism, our technique reduces the
number of GPUs needed, by sharing GPUs across micro-batches. Within the ZeRO-DP
framework, our technique allows communication of the model states with
point-to-point operations rather than a collective broadcast operation. We
illustrate the strength of our approach on the CIFAR-10 and ImageNet datasets.
\\ ( https://arxiv.org/abs/2403.08837 , 318kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08838
Date: Wed, 13 Mar 2024 12:05:02 GMT (3375kb,D)
Title: Predictive Clustering of Vessel Behavior Based on Hierarchical
Trajectory Representation
Authors: Rui Zhang, Hanyue Wu, Zhenzhong Yin, Zhu Xiao, Yong Xiong, and Kezhong
Liu
Categories: cs.LG cs.AI
\\
Vessel trajectory clustering, which aims to find similar trajectory patterns,
has been widely leveraged in overwater applications. Most traditional methods
use predefined rules and thresholds to identify discrete vessel behaviors. They
aim for high-quality clustering and conduct clustering on entire sequences,
whether the original trajectory or its sub-trajectories, failing to represent
their evolution. To resolve this problem, we propose a Predictive Clustering of
Hierarchical Vessel Behavior (PC-HiV). PC-HiV first uses hierarchical
representations to transform every trajectory into a behavioral sequence. Then,
it predicts evolution at each timestamp of the sequence based on the
representations. By applying predictive clustering and latent encoding, PC-HiV
improves clustering and predictions simultaneously. Experiments on real AIS
datasets demonstrate PC-HiV's superiority over existing methods, showcasing its
effectiveness in capturing behavioral evolution discrepancies between vessel
types (tramp vs. liner) and within emission control areas. Results show that
our method outperforms NN-Kmeans and Robust DAA by 3.9% and 6.4%, respectively,
in purity score.
\\ ( https://arxiv.org/abs/2403.08838 , 3375kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08839
Date: Wed, 13 Mar 2024 12:08:27 GMT (537kb,D)
Title: Learning-Enhanced Neighborhood Selection for the Vehicle Routing Problem
with Time Windows
Authors: Willem Feijen, Guido Sch\"afer, Koen Dekker, Seppo Pieterse
Categories: cs.LG
MSC-class: 90-05
\\
Large Neighborhood Search (LNS) is a universal approach that is broadly
applicable and has proven to be highly efficient in practice for solving
optimization problems. We propose to integrate machine learning (ML) into LNS
to assist in deciding which parts of the solution should be destroyed and
repaired in each iteration of LNS. We refer to our new approach as
Learning-Enhanced Neighborhood Selection (LENS for short). Our approach is
universally applicable, i.e., it can be applied to any LNS algorithm to amplify
the workings of the destroy algorithm. In this paper, we demonstrate the
potential of LENS on the fundamental Vehicle Routing Problem with Time Windows
(VRPTW). We implemented an LNS algorithm for VRPTW and collected data on
generated novel training instances derived from well-known, extensively
utilized benchmark datasets. We trained our LENS approach with this data and
compared the experimental results of our approach with two benchmark
algorithms: a random neighborhood selection method to show that LENS learns to
make informed choices and an oracle neighborhood selection method to
demonstrate the potential of our LENS approach. With LENS, we obtain results
that significantly improve the quality of the solutions.
\\ ( https://arxiv.org/abs/2403.08839 , 537kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08845
Date: Wed, 13 Mar 2024 16:30:57 GMT (25270kb,D)
Title: Bifurcated Attention for Single-Context Large-Batch Sampling
Authors: Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda,
Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen,
Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang
Categories: cs.LG cs.AI
\\
In our study, we present bifurcated attention, a method developed for
language model inference in single-context batch sampling contexts. This
approach aims to reduce redundant memory IO costs, a significant factor in
latency for high batch sizes and long context lengths. Bifurcated attention
achieves this by dividing the attention mechanism during incremental decoding
into two distinct GEMM operations, focusing on the KV cache from prefill and
the decoding process. This method ensures precise computation and maintains the
usual computational load (FLOPs) of standard attention mechanisms, but with
reduced memory IO. Bifurcated attention is also compatible with the multi-query
attention mechanism, known for reduced memory IO for the KV cache, further enabling
higher batch size and context length. The resulting efficiency leads to lower
latency, improving suitability for real-time applications, e.g., enabling
massively-parallel answer generation without substantially increasing latency,
enhancing performance when integrated with postprocessing techniques such as
reranking.
\\ ( https://arxiv.org/abs/2403.08845 , 25270kb)
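The split described in the abstract can be illustrated with a small NumPy sketch for a single attention head and one decoding step. The shared-context keys and values are stored once (no batch axis) and multiplied against all queries in one matrix product, while the per-sample decoded keys and values are handled separately; the two parts of the logits are joined before the softmax, so the result matches standard attention over the concatenated cache. Shapes, names, and the single-head simplification are assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def standard_attention(q, K_c, V_c, K_d, V_d):
    # Materialize the shared context once per batch element, then attend.
    b, d = q.shape
    K = np.concatenate([np.broadcast_to(K_c, (b,) + K_c.shape), K_d], axis=1)
    V = np.concatenate([np.broadcast_to(V_c, (b,) + V_c.shape), V_d], axis=1)
    w = softmax(np.einsum("bd,bmd->bm", q, K) / np.sqrt(d))
    return np.einsum("bm,bmd->bd", w, V)

def bifurcated_attention(q, K_c, V_c, K_d, V_d):
    # Context part: one matrix product against the single shared copy.
    # Decoding part: batched product against per-sample keys and values.
    d = q.shape[-1]
    m_c = K_c.shape[0]
    logits_c = q @ K_c.T                            # (b, m_c)
    logits_d = np.einsum("bd,bmd->bm", q, K_d)      # (b, m_d)
    w = softmax(np.concatenate([logits_c, logits_d], axis=1) / np.sqrt(d))
    w_c, w_d = w[:, :m_c], w[:, m_c:]
    return w_c @ V_c + np.einsum("bm,bmd->bd", w_d, V_d)

rng = np.random.default_rng(0)
b, m_c, m_d, d = 4, 16, 3, 8          # batch, context length, decoded length, head dim
q = rng.normal(size=(b, d))
K_c, V_c = rng.normal(size=(m_c, d)), rng.normal(size=(m_c, d))
K_d, V_d = rng.normal(size=(b, m_d, d)), rng.normal(size=(b, m_d, d))

out_std = standard_attention(q, K_c, V_c, K_d, V_d)
out_bif = bifurcated_attention(q, K_c, V_c, K_d, V_d)
print("max abs difference:", np.abs(out_std - out_bif).max())
```

Both functions perform the same multiply-adds; the difference in this sketch is only that the bifurcated version never builds a per-sample copy of the context cache.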
------------------------------------------------------------------------------
\\
arXiv:2403.08879
Date: Wed, 13 Mar 2024 18:05:16 GMT (10655kb,D)
Title: Multi-Objective Optimization Using Adaptive Distributed Reinforcement
Learning
Authors: Jing Tan, Ramin Khalili, Holger Karl
Categories: cs.LG cs.AI cs.MA
\\
The Intelligent Transportation System (ITS) environment is known to be
dynamic and distributed, where participants (vehicle users, operators, etc.)
have multiple, changing and possibly conflicting objectives. Although
Reinforcement Learning (RL) algorithms are commonly applied to optimize ITS
applications such as resource management and offloading, most RL algorithms
focus on single objectives. In many situations, converting a multi-objective
problem into a single-objective one is impossible, intractable or insufficient,
making such RL algorithms inapplicable. We propose a multi-objective,
multi-agent reinforcement learning (MARL) algorithm with high learning
efficiency and low computational requirements, which automatically triggers
adaptive few-shot learning in a dynamic, distributed and noisy environment with
sparse and delayed reward. We test our algorithm in an ITS environment with
edge cloud computing. Empirical results show that the algorithm is quick to
adapt to new environments and performs better in all individual and system
metrics compared to the state-of-the-art benchmark. Our algorithm also
addresses various practical concerns with its modularized and asynchronous
online training method. In addition to the cloud simulation, we test our
algorithm on a single-board computer and show that it can perform inference in 6
milliseconds.
\\ ( https://arxiv.org/abs/2403.08879 , 10655kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08880
Date: Wed, 13 Mar 2024 18:06:43 GMT (4011kb,D)
Title: REFRESH: Responsible and Efficient Feature Reselection Guided by SHAP
Values
Authors: Shubham Sharma, Sanghamitra Dutta, Emanuele Albini, Freddy Lecue,
Daniele Magazzeni, Manuela Veloso
Categories: cs.LG
DOI: 10.1145/3600211.3604706
\\
Feature selection is a crucial step in building machine learning models. This
process is often achieved with accuracy as an objective, and can be cumbersome
and computationally expensive for large-scale datasets. Several additional
model performance characteristics such as fairness and robustness are of
importance for model development. As regulations are driving the need for more
trustworthy models, deployed models need to be corrected for model
characteristics associated with responsible artificial intelligence. When
feature selection is done with respect to one model performance characteristic
(e.g., accuracy), feature selection with secondary model performance
characteristics (e.g., fairness and robustness) as objectives would require going
through the computationally expensive selection process from scratch. In this
paper, we introduce the problem of feature \emph{reselection}, so that features
can be selected with respect to secondary model performance characteristics
efficiently even after a feature selection process has been done with respect
to a primary objective. To address this problem, we propose REFRESH, a method
to reselect features so that additional constraints that are desirable towards
model performance can be achieved without having to train several new models.
REFRESH's underlying algorithm is a novel technique using SHAP values and
correlation analysis that can approximate the predictions of a model
without having to train these models. Empirical evaluations on three datasets,
including a large-scale loan defaulting dataset, show that REFRESH can help find
alternate models with better model characteristics efficiently. We also discuss
the need for reselection and REFRESH based on regulation desiderata.
\\ ( https://arxiv.org/abs/2403.08880 , 4011kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08896
Date: Wed, 13 Mar 2024 18:37:16 GMT (42kb)
Title: One-Shot Averaging for Distributed TD($\lambda$) Under Markov Sampling
Authors: Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky
Categories: cs.LG
\\
We consider a distributed setup for reinforcement learning, where each agent
has a copy of the same Markov Decision Process but transitions are sampled from
the corresponding Markov chain independently by each agent. We show that in
this setting, we can achieve a linear speedup for TD($\lambda$), a family of
popular methods for policy evaluation, in the sense that $N$ agents can
evaluate a policy $N$ times faster provided the target accuracy is small
enough. Notably, this speedup is achieved by ``one shot averaging,'' a
procedure where the agents run TD($\lambda$) with Markov sampling independently
and only average their results after the final step. This significantly reduces
the amount of communication required to achieve a linear speedup relative to
previous work.
\\ ( https://arxiv.org/abs/2403.08896 , 42kb)
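A minimal sketch of the one-shot averaging procedure follows, assuming a tabular value function (equivalently, one-hot linear features) on a three-state Markov reward process. Each agent runs TD($\lambda$) with accumulating eligibility traces along its own independently sampled trajectory, and the estimates are averaged a single time at the end. The chain, rewards, step size, number of agents, and all names are illustrative assumptions; the sketch does not reproduce the paper's analysis or its conditions for linear speedup.

```python
import numpy as np

# Toy Markov reward process under a fixed policy (assumed values).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([0.0, 1.0, 2.0])
gamma, lam = 0.9, 0.5

# Exact value function, used only to measure error.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

def td_lambda(seed, steps=3000, alpha=0.05):
    # One agent: TD(lambda) with accumulating traces on its own trajectory.
    rng = np.random.default_rng(seed)
    v = np.zeros(3)
    z = np.zeros(3)
    s = int(rng.integers(3))
    for _ in range(steps):
        s_next = int(rng.choice(3, p=P[s]))
        z = gamma * lam * z
        z[s] += 1.0
        delta = r[s] + gamma * v[s_next] - v[s]
        v = v + alpha * delta * z
        s = s_next
    return v

# N agents run independently; the only communication is one final average.
estimates = np.stack([td_lambda(seed) for seed in range(8)])
v_avg = estimates.mean(axis=0)

err_each = np.linalg.norm(estimates - v_true, axis=1)
err_avg = np.linalg.norm(v_avg - v_true)
print("mean single-agent error:", err_each.mean())
print("error after one-shot averaging:", err_avg)
```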
------------------------------------------------------------------------------
\\
arXiv:2403.08946
Date: Wed, 13 Mar 2024 20:25:27 GMT (5156kb,D)
Title: Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM
Era
Authors: Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang,
Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du, Ninghao Liu
Categories: cs.LG cs.CL cs.CY
Comments: 38 pages, 4 figures
\\
Explainable AI (XAI) refers to techniques that provide human-understandable
insights into the workings of AI models. Recently, the focus of XAI is being
extended towards Large Language Models (LLMs) which are often criticized for
their lack of transparency. This extension calls for a significant
transformation in XAI methodologies for two reasons. First, many
existing XAI methods cannot be directly applied to LLMs due to their complexity
and advanced capabilities. Second, as LLMs are increasingly deployed across diverse
industry applications, the role of XAI shifts from merely opening the "black
box" to actively enhancing the productivity and applicability of LLMs in
real-world settings. Meanwhile, unlike traditional machine learning models that
are passive recipients of XAI insights, the distinct abilities of LLMs can
reciprocally enhance XAI. Therefore, in this paper, we introduce Usable XAI in
the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems,
and (2) how LLMs can contribute to the advancement of XAI. We introduce 10
strategies, introducing the key techniques for each and discussing their
associated challenges. We also provide case studies to demonstrate how to
obtain and leverage explanations. The code used in this paper can be found at:
https://github.com/JacksonWuxs/UsableXAI_LLM.
\\ ( https://arxiv.org/abs/2403.08946 , 5156kb)
------------------------------------------------------------------------------
\\
arXiv:2403.08955
Date: Wed, 13 Mar 2024 20:50:49 GMT (999kb,D)
Title: Towards Efficient Risk-Sensitive Policy Gradient: An Iteration
Complexity Analysis
Authors: Rui Liu, Erfaun Noorani, Pratap Tokekar, John S. Baras
Categories: cs.LG cs.AI
\\
Reinforcement Learning (RL) has shown exceptional performance across various
applications, enabling autonomous agents to learn optimal policies through
interaction with their environments. However, traditional RL frameworks often
face challenges in terms of iteration complexity and robustness. Risk-sensitive
RL, which balances expected return and risk, has been explored for its
potential to yield probabilistically robust policies, yet its iteration
complexity analysis remains underexplored. In this study, we conduct a thorough
iteration complexity analysis for the risk-sensitive policy gradient method,
focusing on the REINFORCE algorithm and employing the exponential utility
function. We obtain an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to
reach an $\epsilon$-approximate first-order stationary point (FOSP). We
investigate whether risk-sensitive algorithms can achieve better iteration
complexity compared to their risk-neutral counterparts. Our theoretical
analysis demonstrates that risk-sensitive REINFORCE can have a reduced number
of iterations required for convergence. This leads to improved iteration
complexity, as employing the exponential utility does not entail additional
computation per iteration. We characterize the conditions under which
risk-sensitive algorithms can achieve better iteration complexity. Our
simulation results also validate that risk-averse cases can converge and
stabilize more quickly after approximately half of the episodes compared to
their risk-neutral counterparts.
\\ ( https://arxiv.org/abs/2403.08955 , 999kb)
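For readers unfamiliar with the exponential utility, the sketch below uses one common convention, $J_\beta = \frac{1}{\beta}\log \mathbb{E}[\exp(\beta R)]$, under which $\beta < 0$ is risk-averse for return maximization and $\beta \to 0$ recovers the expected return. The paper's exact parameterization and sign convention may differ. The second function gives the per-trajectory weights that multiply the score function $\nabla \log p(\tau)$ in a sample-based REINFORCE-style gradient of this objective. Function names and the synthetic returns are assumptions.

```python
import numpy as np

def exponential_utility_objective(returns, beta):
    # (1 / beta) * log(mean(exp(beta * R))), computed stably.
    a = beta * np.asarray(returns, dtype=float)
    m = a.max()
    return (m + np.log(np.mean(np.exp(a - m)))) / beta

def reinforce_weights(returns, beta):
    # Weight on each trajectory's score function in the sample gradient:
    # exp(beta * R_i) / (beta * mean_j exp(beta * R_j)).
    a = beta * np.asarray(returns, dtype=float)
    e = np.exp(a - a.max())
    return e / (beta * e.mean())

rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=10_000)   # synthetic episode returns

risk_averse = exponential_utility_objective(returns, beta=-0.5)
risk_seeking = exponential_utility_objective(returns, beta=0.5)
print("mean return:", returns.mean())
print("beta=-0.5:", risk_averse, " beta=+0.5:", risk_seeking)
```

By Jensen's inequality the risk-averse value never exceeds the sample mean and the risk-seeking value never falls below it, which is a quick sanity check on the sign convention.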
------------------------------------------------------------------------------
\\
arXiv:2403.08980
Date: Wed, 13 Mar 2024 22:10:42 GMT (137kb,D)
Title: Architectural Implications of Neural Network Inference for High
Data-Rate, Low-Latency Scientific Applications
Authors: Olivia Weng, Alexander Redding, Nhan Tran, Javier Mauricio Duarte,
Ryan Kastner
Categories: cs.LG cs.AR
\\
With more scientific fields relying on neural networks (NNs) to process data
incoming at extreme throughputs and latencies, it is crucial to develop NNs
with all their parameters stored on-chip. In many of these applications, there
is not enough time to go off-chip and retrieve weights. Even more so, off-chip
memory such as DRAM does not have the bandwidth required to process these NNs
as fast as the data is being produced (e.g., every 25 ns). As such, these
extreme latency and bandwidth requirements have architectural implications for
the hardware intended to run these NNs: 1) all NN parameters must fit on-chip,
and 2) codesigning custom/reconfigurable logic is often required to meet these
latency and bandwidth constraints. In our work, we show that many scientific NN
applications must run fully on chip, in the extreme case requiring a custom
chip to meet such stringent constraints.
\\ ( https://arxiv.org/abs/2403.08980 , 137kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09035
Date: Thu, 14 Mar 2024 02:11:38 GMT (6378kb,D)
Title: DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers
Authors: Xiao Ma, Shengfeng He, Hezhe Qiao, Dong Ma
Categories: cs.LG
\\
Enabling efficient and accurate deep neural network (DNN) inference on
microcontrollers is non-trivial due to the constrained on-chip resources.
Current methodologies primarily focus on compressing larger models yet at the
expense of model accuracy. In this paper, we rethink the problem from the
inverse perspective by constructing small/weak models directly and improving
their accuracy. Thus, we introduce DiTMoS, a novel DNN training and inference
framework with a selector-classifiers architecture, where the selector routes
each input sample to the appropriate classifier for classification. DiTMoS is
grounded on a key insight: a composition of weak models can exhibit high
diversity and the union of them can significantly boost the accuracy upper
bound. To approach the upper bound, DiTMoS introduces three strategies
including diverse training data splitting to increase the classifiers'
diversity, adversarial selector-classifiers training to ensure synergistic
interactions thereby maximizing their complementarity, and heterogeneous
feature aggregation to improve the capacity of classifiers. We further propose
a network slicing technique to alleviate the extra memory overhead incurred by
feature aggregation. We deploy DiTMoS on the Nucleo STM32F767ZI board and
evaluate it on three time-series datasets for human activity recognition,
keyword spotting, and emotion recognition, respectively. The experimental
results show that: (a) DiTMoS achieves up to 13.4% accuracy improvement
compared to the best baseline; (b) network slicing almost completely eliminates
the memory overhead incurred by feature aggregation with a marginal increase of
latency.
\\ ( https://arxiv.org/abs/2403.09035 , 6378kb)
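The "accuracy upper bound" of a union of weak models can be made concrete with a short sketch: given a boolean matrix recording which classifier is correct on which sample, the union accuracy is the fraction of samples that at least one classifier gets right, which is what a perfect selector would achieve. The synthetic correctness matrix (independent classifiers, each right 60% of the time) and the random selector are assumptions for illustration only; they are not DiTMoS's training procedure or results.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classifiers = 1000, 4

# correct[i, j] is True when weak classifier j labels sample i correctly.
# Assumption: independent classifiers, each correct with probability 0.6.
correct = rng.random((n_samples, n_classifiers)) < 0.6

individual_acc = correct.mean(axis=0)

# Upper bound: a perfect selector always routes to a correct classifier
# whenever one exists.
union_acc = correct.any(axis=1).mean()

# Any concrete selector (here: uniformly random routing) stays at or
# below that bound.
choice = rng.integers(n_classifiers, size=n_samples)
routed_acc = correct[np.arange(n_samples), choice].mean()

print("individual:", individual_acc)
print("random selector:", routed_acc, " union upper bound:", union_acc)
```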
------------------------------------------------------------------------------
\\
arXiv:2403.09039
Date: Thu, 14 Mar 2024 02:26:10 GMT (10026kb,D)
Title: Spatial-temporal Memories Enhanced Graph Autoencoder for Anomaly
Detection in Dynamic Graphs
Authors: Jie Liu, Xuequn Shang, Xiaolin Han, Wentao Zhang, Hongzhi Yin
Categories: cs.LG cs.AI
\\
Anomaly detection in dynamic graphs presents a significant challenge due to
the temporal evolution of graph structures and attributes. The conventional
approaches that tackle this problem typically employ an unsupervised learning
framework, capturing normality patterns with exclusively normal data during
training and identifying deviations as anomalies during testing. However, these
methods face critical drawbacks: they either only depend on proxy tasks for
general representation without directly pinpointing normal patterns, or they
neglect to differentiate between spatial and temporal normality patterns,
leading to diminished efficacy in anomaly detection. To address these
challenges, we introduce a novel Spatial-Temporal memories-enhanced graph
autoencoder (STRIPE). Initially, STRIPE employs Graph Neural Networks (GNNs)
and gated temporal convolution layers to extract spatial features and temporal
features, respectively. Then STRIPE incorporates separate spatial and temporal
memory networks, which capture and store prototypes of normal patterns, thereby
preserving the uniqueness of spatial and temporal normality. After that,
through a mutual attention mechanism, these stored patterns are then retrieved
and integrated with encoded graph embeddings. Finally, the integrated features
are fed into the decoder to reconstruct the graph streams which serve as the
proxy task for anomaly detection. This comprehensive approach not only
minimizes reconstruction errors but also refines the model by emphasizing the
compactness and distinctiveness of the embeddings in relation to the nearest
memory prototypes. Through extensive testing, STRIPE has demonstrated a
superior capability to discern anomalies by effectively leveraging the distinct
spatial and temporal dynamics of dynamic graphs, significantly outperforming
existing methodologies, with an average improvement of 15.39% on AUC values.
\\ ( https://arxiv.org/abs/2403.09039 , 10026kb)
------------------------------------------------------------------------------
\\
arXiv:2403.09048
Date: Thu, 14 Mar 2024 02:36:16 GMT (2596kb,D)
Title: Taming Cross-Domain Representation Variance in Federated Prototype
Learning with Heterogeneous Data Domains
Authors: Lei Wang, Jieming Bian, Letian Zhang, Chen Chen, Jie Xu
Categories: cs.LG cs.CV
Comments: 16 pages
\\
Federated learning (FL) allows collaborative machine learning training
without sharing private data. While most FL methods assume identical data
domains across clients, real-world scenarios often involve heterogeneous data
domains. Federated Prototype Learning (FedPL) addresses this issue, using mean
feature vectors as prototypes to enhance model generalization. However,
existing FedPL methods create the same number of prototypes for each client,
leading to cross-domain performance gaps and disparities for clients with
varied data distributions. To mitigate cross-domain feature representation
variance, we introduce FedPLVM, which establishes variance-aware dual-level
prototypes clustering and employs a novel $\alpha$-sparsity prototype loss. The
dual-level prototypes clustering strategy creates local clustered prototypes
based on private data features, then performs global prototypes clustering to
reduce communication complexity and preserve local data privacy. The
$\alpha$-sparsity prototype loss aligns samples from underrepresented domains,
enhancing intra-class similarity and reducing inter-class similarity.
Evaluations on Digit-5, Office-10, and DomainNet datasets demonstrate our
method's superiority over existing approaches.
\\ ( https://arxiv.org/abs/2403.09048 , 2596kb)
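The basic prototype idea mentioned in the abstract, mean feature vectors used as class prototypes, is sketched below together with nearest-prototype prediction. This covers only the single-prototype-per-class baseline; the variance-aware dual-level clustering and the $\alpha$-sparsity loss of FedPLVM are not reproduced. The synthetic features and all function names are assumptions.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    # One prototype per class: the mean feature vector of that class.
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def nearest_prototype(features, prototypes):
    # Assign each sample to the class of its closest prototype (squared L2).
    d2 = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 5, 50

# Synthetic features for one client (assumption): Gaussian blobs per class.
centers = rng.normal(scale=5.0, size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
features = centers[labels] + rng.normal(size=(num_classes * per_class, dim))

prototypes = class_prototypes(features, labels, num_classes)
predictions = nearest_prototype(features, prototypes)
print("nearest-prototype accuracy:", (predictions == labels).mean())
```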
------------------------------------------------------------------------------
\\
arXiv:2403.09053
Date: Thu, 14 Mar 2024 02:42:19 GMT (129kb,D)