
feat(route): add cool paper first-author information #17857

Merged
merged 4 commits on Dec 11, 2024

Conversation

Muyun99
Contributor

@Muyun99 Muyun99 commented Dec 10, 2024

Involved Issue / 该 PR 相关 Issue

Close #

Example for the Proposed Route(s) / 路由地址示例

/papers/arxiv/cs.RO

New RSS Route Checklist / 新 RSS 路由检查表

  • New Route / 新的路由
  • Anti-bot or rate limit / 反爬/频率限制
    • If yes, does your code reflect this? / 如果有, 是否有对应的措施?
  • Date and time / 日期和时间
    • Parsed / 可以解析
    • Correct time zone / 时区正确
  • New package added / 添加了新的包
  • Puppeteer

Note / 说明

Added author information for the Cool Papers site. Because const feed = await parser.parseURL(feedUrl); only parses the first author, the current implementation shows only the first author.

cc @nczitzk — could you take a look and see whether there is any way to retrieve all of the authors? Example RSS: https://papers.cool/arxiv/cs.AI/feed
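
For reference, a hypothetical sketch (not the code in this PR) of one way rss-parser could keep every author, assuming the upstream feed emits one dc:creator element per author (an unverified assumption); the creators field name and the fetchAuthors helper are illustrative only:

```typescript
// Hypothetical sketch, not this PR's implementation: collect all authors with rss-parser,
// assuming papers.cool emits a separate <dc:creator> element per author (unverified).
import Parser from 'rss-parser';

// Custom item field populated by the keepArray option below.
type CustomItem = { creators?: string[] };

const parser = new Parser<Record<string, unknown>, CustomItem>({
    customFields: {
        // keepArray preserves every <dc:creator> element instead of only the first one.
        item: [['dc:creator', 'creators', { keepArray: true }]],
    },
});

async function fetchAuthors(feedUrl: string) {
    const feed = await parser.parseURL(feedUrl);
    return feed.items.map((item) => ({
        title: item.title,
        // Fall back to rss-parser's default single-author field when the array is absent.
        authors: item.creators?.join(', ') ?? item.creator,
    }));
}

// Example usage against the sample feed mentioned above:
// fetchAuthors('https://papers.cool/arxiv/cs.AI/feed').then((items) => console.log(items[0]));
```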

@github-actions github-actions bot added the Route label Dec 10, 2024
Contributor

Successfully generated as following:

http://localhost:1200/papers/arxiv/cs.RO - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Robotics</title>
    <link>https://papers.cool/arxiv/cs.RO</link>
    <atom:link href="http://localhost:1200/papers/arxiv/cs.RO" rel="self" type="application/rss+xml"></atom:link>
    <description>Robotics - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>[email protected] (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Tue, 10 Dec 2024 17:55:10 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06784.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06784&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06784&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Mara Levy&lt;/p&gt; &lt;p&gt;Developing generalizable robot policies that can robustly handle varied environmental conditions and object instances remains a fundamental challenge in robot learning. While considerable efforts have focused on collecting large robot datasets and developing policy architectures to learn from such data, naively learning from visual inputs often results in brittle policies that fail to transfer beyond the training data. This work presents Prescriptive Point Priors for Policies or P3-PO, a novel framework that constructs a unique state representation of the environment leveraging recent advances in computer vision and robot learning to achieve improved out-of-distribution generalization for robot manipulation. This representation is obtained through two steps. First, a human annotator prescribes a set of semantically meaningful points on a single demonstration frame. These points are then propagated through the dataset using off-the-shelf vision models. The derived points serve as an input to state-of-the-art policy architectures for policy learning. Our experiments across four real-world tasks demonstrate an overall 43% absolute improvement over prior methods when evaluated in identical settings as training. Further, P3-PO exhibits 58% and 80% gains across tasks for new object instances and more cluttered environments respectively. Videos illustrating the robot&#39;s performance are best viewed at point-priors.github.io.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06784</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06784</guid>
      <pubDate>Mon, 09 Dec 2024 18:59:42 GMT</pubDate>
      <author>Mara Levy</author>
      <enclosure url="https://arxiv.org/pdf/2412.06784.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06782.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06782&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06782&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Zhefei Gong&lt;/p&gt; &lt;p&gt;In robotic visuomotor policy learning, diffusion-based models have achieved significant success in improving the accuracy of action trajectory generation compared to traditional autoregressive models. However, they suffer from inefficiency due to multiple denoising steps and limited flexibility from complex constraints. In this paper, we introduce Coarse-to-Fine AutoRegressive Policy (CARP), a novel paradigm for visuomotor policy learning that redefines the autoregressive action generation process as a coarse-to-fine, next-scale approach. CARP decouples action generation into two stages: first, an action autoencoder learns multi-scale representations of the entire action sequence; then, a GPT-style transformer refines the sequence prediction through a coarse-to-fine autoregressive process. This straightforward and intuitive approach produces highly accurate and smooth actions, matching or even surpassing the performance of diffusion-based policies while maintaining efficiency on par with autoregressive policies. We conduct extensive evaluations across diverse settings, including single-task and multi-task scenarios on state-based and image-based simulation benchmarks, as well as real-world tasks. CARP achieves competitive success rates, with up to a 10% improvement, and delivers 10x faster inference compared to state-of-the-art policies, establishing a high-performance, efficient, and flexible paradigm for action generation in robotic tasks.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06782</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06782</guid>
      <pubDate>Mon, 09 Dec 2024 18:59:18 GMT</pubDate>
      <author>Zhefei Gong</author>
      <enclosure url="https://arxiv.org/pdf/2412.06782.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06779.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06779&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06779&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Guanxing Lu&lt;/p&gt; &lt;p&gt;Performing general language-conditioned bimanual manipulation tasks is of great importance for many applications ranging from household service to industrial assembly. However, collecting bimanual manipulation data is expensive due to the high-dimensional action space, which poses challenges for conventional methods to handle general bimanual manipulation tasks. In contrast, unimanual policy has recently demonstrated impressive generalizability across a wide range of tasks because of scaled model parameters and training data, which can provide sharable manipulation knowledge for bimanual systems. To this end, we propose a plug-and-play method named AnyBimanual, which transfers pre-trained unimanual policy to general bimanual manipulation policy with few bimanual demonstrations. Specifically, we first introduce a skill manager to dynamically schedule the skill representations discovered from pre-trained unimanual policy for bimanual manipulation tasks, which linearly combines skill primitives with task-oriented compensation to represent the bimanual manipulation instruction. To mitigate the observation discrepancy between unimanual and bimanual systems, we present a visual aligner to generate soft masks for visual embedding of the workspace, which aims to align visual input of unimanual policy model for each arm with those during pretraining stage. AnyBimanual shows superiority on 12 simulated tasks from RLBench2 with a sizable 12.67% improvement in success rate over previous methods. Experiments on 9 real-world tasks further verify its practicality with an average success rate of 84.62%.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06779</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06779</guid>
      <pubDate>Mon, 09 Dec 2024 18:58:43 GMT</pubDate>
      <author>Guanxing Lu</author>
      <enclosure url="https://arxiv.org/pdf/2412.06779.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Enhancing Robotic System Robustness via Lyapunov Exponent-Based Optimization</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06776.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06776&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06776&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; G. Fadini&lt;/p&gt; &lt;p&gt;We present a novel approach to quantifying and optimizing stability in robotic systems based on the Lyapunov exponents addressing an open challenge in the field of robot analysis, design, and optimization. Our method leverages differentiable simulation over extended time horizons. The proposed metric offers several properties, including a natural extension to limit cycles commonly encountered in robotics tasks and locomotion. We showcase, with an ad-hoc JAX gradient-based optimization framework, remarkable power, and flexi-bility in tackling the robustness challenge. The effectiveness of our approach is tested through diverse scenarios of varying complexity, encompassing high-degree-of-freedom systems and contact-rich environments. The positive outcomes across these cases highlight the potential of our method in enhancing system robustness.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06776</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06776</guid>
      <pubDate>Mon, 09 Dec 2024 18:58:02 GMT</pubDate>
      <author>G. Fadini</author>
      <enclosure url="https://arxiv.org/pdf/2412.06776.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>FlexEvent: Event Camera Object Detection at Arbitrary Frequencies</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06708.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06708&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06708&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Dongyue Lu&lt;/p&gt; &lt;p&gt;Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to their microsecond-level temporal resolution and asynchronous operation. Existing event-based object detection methods, however, are limited by fixed-frequency paradigms and fail to fully exploit the high-temporal resolution and adaptability of event cameras. To address these limitations, we propose FlexEvent, a novel event camera object detection framework that enables detection at arbitrary frequencies. Our approach consists of two key components: FlexFuser, an adaptive event-frame fusion module that integrates high-frequency event data with rich semantic information from RGB frames, and FAL, a frequency-adaptive learning mechanism that generates frequency-adjusted labels to enhance model generalization across varying operational frequencies. This combination allows our method to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments. Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, our method maintains robust performance when scaling from 20 Hz to 90 Hz and delivers accurate detection up to 180 Hz, proving its effectiveness in extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06708</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06708</guid>
      <pubDate>Mon, 09 Dec 2024 17:57:14 GMT</pubDate>
      <author>Dongyue Lu</author>
      <enclosure url="https://arxiv.org/pdf/2412.06708.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>CHOICE: Coordinated Human-Object Interaction in Cluttered Environments for Pick-and-Place Actions</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06702.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06702&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06702&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jintao Lu&lt;/p&gt; &lt;p&gt;Animating human-scene interactions such as pick-and-place tasks in cluttered, complex layouts is a challenging task, with objects of a wide variation of geometries and articulation under scenarios with various obstacles. The main difficulty lies in the sparsity of the motion data compared to the wide variation of the objects and environments as well as the poor availability of transition motions between different tasks, increasing the complexity of the generalization to arbitrary conditions. To cope with this issue, we develop a system that tackles the interaction synthesis problem as a hierarchical goal-driven task. Firstly, we develop a bimanual scheduler that plans a set of keyframes for simultaneously controlling the two hands to efficiently achieve the pick-and-place task from an abstract goal signal such as the target object selected by the user. Next, we develop a neural implicit planner that generates guidance hand trajectories under diverse object shape/types and obstacle layouts. Finally, we propose a linear dynamic model for our DeepPhase controller that incorporates a Kalman filter to enable smooth transitions in the frequency domain, resulting in a more realistic and effective multi-objective control of the character.Our system can produce a wide range of natural pick-and-place movements with respect to the geometry of objects, the articulation of containers and the layout of the objects in the scene.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06702</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06702</guid>
      <pubDate>Mon, 09 Dec 2024 17:49:00 GMT</pubDate>
      <author>Jintao Lu</author>
      <enclosure url="https://arxiv.org/pdf/2412.06702.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Neo-FREE: Policy Composition Through Thousand Brains And Free Energy Optimization</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06636.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06636&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06636&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Francesca Rossi&lt;/p&gt; &lt;p&gt;We consider the problem of optimally composing a set of primitives to tackle control tasks. To address this problem, we introduce Neo-FREE: a control architecture inspired by the Thousand Brains Theory and Free Energy Principle from cognitive sciences. In accordance with the neocortical (Neo) processes postulated by the Thousand Brains Theory, Neo-FREE consists of functional units returning control primitives. These are linearly combined by a gating mechanism that minimizes the variational free energy (FREE). The problem of finding the optimal primitives&#39; weights is then recast as a finite-horizon optimal control problem, which is convex even when the cost is not and the environment is nonlinear, stochastic, non-stationary. The results yield an algorithm for primitives composition and the effectiveness of Neo-FREE is illustrated via in-silico and hardware experiments on an application involving robot navigation in an environment with obstacles.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06636</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06636</guid>
      <pubDate>Mon, 09 Dec 2024 16:28:27 GMT</pubDate>
      <author>Francesca Rossi</author>
      <enclosure url="https://arxiv.org/pdf/2412.06636.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>A switching Kalman filter approach to online mitigation and correction sensor corruption for inertial navigation</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06601.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06601&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06601&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Artem Mustaev&lt;/p&gt; &lt;p&gt;This paper introduces a novel approach to detect and address faulty or corrupted external sensors in the context of inertial navigation by leveraging a switching Kalman Filter combined with parameter augmentation. Instead of discarding the corrupted data, the proposed method retains and processes it, running multiple observation models simultaneously and evaluating their likelihoods to accurately identify the true state of the system. We demonstrate the effectiveness of this approach to both identify the moment that a sensor becomes faulty and to correct for the resulting sensor behavior to maintain accurate estimates. We demonstrate our approach on an application of balloon navigation in the atmosphere and shuttle reentry. The results show that our method can accurately recover the true system state even in the presence of significant sensor bias, thereby improving the robustness and reliability of state estimation systems under challenging conditions. We also provide a statistical analysis of problem settings to determine when and where our method is most accurate and where it fails.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06601</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06601</guid>
      <pubDate>Mon, 09 Dec 2024 15:49:56 GMT</pubDate>
      <author>Artem Mustaev</author>
      <enclosure url="https://arxiv.org/pdf/2412.06601.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>PPT: Pre-Training with Pseudo-Labeled Trajectories for Motion Forecasting</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06491.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06491&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06491&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Yihong Xu&lt;/p&gt; &lt;p&gt;Motion forecasting (MF) for autonomous driving aims at anticipating trajectories of surrounding agents in complex urban scenarios. In this work, we investigate a mixed strategy in MF training that first pre-train motion forecasters on pseudo-labeled data, then fine-tune them on annotated data. To obtain pseudo-labeled trajectories, we propose a simple pipeline that leverages off-the-shelf single-frame 3D object detectors and non-learning trackers. The whole pre-training strategy including pseudo-labeling is coined as PPT. Our extensive experiments demonstrate that: (1) combining PPT with supervised fine-tuning on annotated data achieves superior performance on diverse testbeds, especially under annotation-efficient regimes, (2) scaling up to multiple datasets improves the previous state-of-the-art and (3) PPT helps enhance cross-dataset generalization. Our findings showcase PPT as a promising pre-training solution for robust motion forecasting in diverse autonomous driving contexts.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06491</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06491</guid>
      <pubDate>Mon, 09 Dec 2024 13:48:15 GMT</pubDate>
      <author>Yihong Xu</author>
      <enclosure url="https://arxiv.org/pdf/2412.06491.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>An Efficient Scene Coordinate Encoding and Relocalization Method</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06488.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06488&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06488&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Kuan Xu&lt;/p&gt; &lt;p&gt;Scene Coordinate Regression (SCR) is a visual localization technique that utilizes deep neural networks (DNN) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often face challenges in handling repetitive textures and meaningless areas due to their reliance on implicit triangulation. In this paper, we propose an efficient scene coordinate encoding and relocalization method. Compared with the existing SCR methods, we design a unified architecture for both scene encoding and salient keypoint detection, enabling our system to focus on encoding informative regions, thereby significantly enhancing efficiency. Additionally, we introduce a mechanism that leverages sequential information during both map encoding and relocalization, which strengthens implicit triangulation, particularly in repetitive texture environments. Comprehensive experiments conducted across indoor and outdoor datasets demonstrate that the proposed system outperforms other state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56Hz to 90Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06488</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06488</guid>
      <pubDate>Mon, 09 Dec 2024 13:39:18 GMT</pubDate>
      <author>Kuan Xu</author>
      <enclosure url="https://arxiv.org/pdf/2412.06488.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Adaptive Graph Learning from Spatial Information for Surgical Workflow Anticipation</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06454.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06454&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06454&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Francis Xiatian Zhang&lt;/p&gt; &lt;p&gt;Surgical workflow anticipation is the task of predicting the timing of relevant surgical events from live video data, which is critical in Robotic-Assisted Surgery (RAS). Accurate predictions require the use of spatial information to model surgical interactions. However, current methods focus solely on surgical instruments, assume static interactions between instruments, and only anticipate surgical events within a fixed time horizon. To address these challenges, we propose an adaptive graph learning framework for surgical workflow anticipation based on a novel spatial representation, featuring three key innovations. First, we introduce a new representation of spatial information based on bounding boxes of surgical instruments and targets, including their detection confidence levels. These are trained on additional annotations we provide for two benchmark datasets. Second, we design an adaptive graph learning method to capture dynamic interactions. Third, we develop a multi-horizon objective that balances learning objectives for different time horizons, allowing for unconstrained predictions. Evaluations on two benchmarks reveal superior performance in short-to-mid-term anticipation, with an error reduction of approximately 3% for surgical phase anticipation and 9% for remaining surgical duration anticipation. These performance improvements demonstrate the effectiveness of our method and highlight its potential for enhancing preparation and coordination within the RAS team. This can improve surgical safety and the efficiency of operating room usage.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06454</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06454</guid>
      <pubDate>Mon, 09 Dec 2024 12:53:08 GMT</pubDate>
      <author>Francis Xiatian Zhang</author>
      <enclosure url="https://arxiv.org/pdf/2412.06454.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Foresee and Act Ahead: Task Prediction and Pre-Scheduling Enabled Efficient Robotic Warehousing</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06425.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06425&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06425&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; B. Cao&lt;/p&gt; &lt;p&gt;In warehousing systems, to enhance logistical efficiency amid surging demand volumes, much focus is placed on how to reasonably allocate tasks to robots. However, the robots labor is still inevitably wasted to some extent. In response to this, we propose a pre-scheduling enhanced warehousing framework that predicts task flow and acts in advance. It consists of task flow prediction and hybrid tasks allocation. For task prediction, we notice that it is possible to provide a spatio-temporal representation of task flow, so we introduce a periodicity-decoupled mechanism tailored for the generation patterns of aggregated orders, and then further extract spatial features of task distribution with novel combination of graph structures. In hybrid tasks allocation, we consider the known tasks and predicted future tasks simultaneously and optimize the allocation dynamically. In addition, we consider factors such as predicted task uncertainty and sector-level efficiency evaluation in warehousing to realize more balanced and rational allocations. We validate our task prediction model across actual datasets derived from real factories, achieving SOTA performance. Furthermore, we implement our compelte scheduling system in a real-world robotic warehouse for months of lifelong validation, demonstrating large improvements in key metrics of warehousing, such as empty running rate, by more than 50%.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06425</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06425</guid>
      <pubDate>Mon, 09 Dec 2024 12:03:29 GMT</pubDate>
      <author>B. Cao</author>
      <enclosure url="https://arxiv.org/pdf/2412.06425.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Parameter Adjustments in POMDP-Based Trajectory Planning for Unsignalized Intersections</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06405.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06405&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06405&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Adam Kollarčík adn Zdeněk Hanzálek&lt;/p&gt; &lt;p&gt;This paper investigates the problem of trajectory planning for autonomous vehicles at unsignalized intersections, specifically focusing on scenarios where the vehicle lacks the right of way and yet must cross safely. To address this issue, we have employed a method based on the Partially Observable Markov Decision Processes (POMDPs) framework designed for planning under uncertainty. The method utilizes the Adaptive Belief Tree (ABT) algorithm as an approximate solver for the POMDPs. We outline the POMDP formulation, beginning with discretizing the intersection&#39;s topology. Additionally, we present a dynamics model for the prediction of the evolving states of vehicles, such as their position and velocity. Using an observation model, we also describe the connection of those states with the imperfect (noisy) available measurements. Our results confirmed that the method is able to plan collision-free trajectories in a series of simulations utilizing real-world traffic data from aerial footage of two distinct intersections. Furthermore, we studied the impact of parameter adjustments of the ABT algorithm on the method&#39;s performance. This provides guidance in determining reasonable parameter settings, which is valuable for future method applications.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06405</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06405</guid>
      <pubDate>Mon, 09 Dec 2024 11:36:13 GMT</pubDate>
      <author>Adam Kollarčík adn Zdeněk Hanzálek</author>
      <enclosure url="https://arxiv.org/pdf/2412.06405.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Sparse Identification of Nonlinear Dynamics-based Model Predictive Control for Multirotor Collision Avoidance</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06388.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06388&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06388&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jayden Dongwoo Lee&lt;/p&gt; &lt;p&gt;This paper proposes a data-driven model predictive control for multirotor collision avoidance considering uncertainty and an unknown model from a payload. To address this challenge, sparse identification of nonlinear dynamics (SINDy) is used to obtain the governing equation of the multirotor system. The SINDy can discover the equations of target systems with low data, assuming that few functions have the dominant characteristic of the system. Model predictive control (MPC) is utilized to obtain accurate trajectory tracking performance by considering state and control input constraints. To avoid a collision during operation, MPC optimization problem is again formulated using inequality constraints about an obstacle. In simulation, SINDy can discover a governing equation of multirotor system including mass parameter uncertainty and aerodynamic effects. In addition, the simulation results show that the proposed method has the capability to avoid an obstacle and track the desired trajectory accurately.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06388</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06388</guid>
      <pubDate>Mon, 09 Dec 2024 11:13:57 GMT</pubDate>
      <author>Jayden Dongwoo Lee</author>
      <enclosure url="https://arxiv.org/pdf/2412.06388.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06359.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06359&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06359&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jesse Hagenaars&lt;/p&gt; &lt;p&gt;Event cameras provide low-latency perception for only milliwatts of power. This makes them highly suitable for resource-restricted, agile robots such as small flying drones. Self-supervised learning based on contrast maximization holds great potential for event-based robot vision, as it foregoes the need to high-frequency ground truth and allows for online learning in the robot&#39;s operational environment. However, online, onboard learning raises the major challenge of achieving sufficient computational efficiency for real-time learning, while maintaining competitive visual perception performance. In this work, we improve the time and memory efficiency of the contrast maximization learning pipeline. Benchmarking experiments show that the proposed pipeline achieves competitive results with the state of the art on the task of depth estimation from events. Furthermore, we demonstrate the usability of the learned depth for obstacle avoidance through real-world flight experiments. Finally, we compare the performance of different combinations of pre-training and fine-tuning of the depth estimation networks, showing that on-board domain adaptation is feasible given a few minutes of flight.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06359</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06359</guid>
      <pubDate>Mon, 09 Dec 2024 10:23:03 GMT</pubDate>
      <author>Jesse Hagenaars</author>
      <enclosure url="https://arxiv.org/pdf/2412.06359.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Vision-Based Deep Reinforcement Learning of UAV Autonomous Navigation Using Privileged Information</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06313.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06313&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06313&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Junqiao Wang&lt;/p&gt; &lt;p&gt;The capability of UAVs for efficient autonomous navigation and obstacle avoidance in complex and unknown environments is critical for applications in agricultural irrigation, disaster relief and logistics. In this paper, we propose the DPRL (Distributed Privileged Reinforcement Learning) navigation algorithm, an end-to-end policy designed to address the challenge of high-speed autonomous UAV navigation under partially observable environmental conditions. Our approach combines deep reinforcement learning with privileged learning to overcome the impact of observation data corruption caused by partial observability. We leverage an asymmetric Actor-Critic architecture to provide the agent with privileged information during training, which enhances the model&#39;s perceptual capabilities. Additionally, we present a multi-agent exploration strategy across diverse environments to accelerate experience collection, which in turn expedites model convergence. We conducted extensive simulations across various scenarios, benchmarking our DPRL algorithm against the state-of-the-art navigation algorithms. The results consistently demonstrate the superior performance of our algorithm in terms of flight efficiency, robustness and overall success rate.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06313</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06313</guid>
      <pubDate>Mon, 09 Dec 2024 09:05:52 GMT</pubDate>
      <author>Junqiao Wang</author>
      <enclosure url="https://arxiv.org/pdf/2412.06313.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>A Scalable Decentralized Reinforcement Learning Framework for UAV Target Localization Using Recurrent PPO</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06231.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06231&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06231&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Leon Fernando&lt;/p&gt; &lt;p&gt;The rapid advancements in unmanned aerial vehicles (UAVs) have unlocked numerous applications, including environmental monitoring, disaster response, and agricultural surveying. Enhancing the collective behavior of multiple decentralized UAVs can significantly improve these applications through more efficient and coordinated operations. In this study, we explore a Recurrent PPO model for target localization in perceptually degraded environments like places without GNSS/GPS signals. We first developed a single-drone approach for target identification, followed by a decentralized two-drone model. Our approach can utilize two types of sensors on the UAVs, a detection sensor and a target signal sensor. The single-drone model achieved an accuracy of 93%, while the two-drone model achieved an accuracy of 86%, with the latter requiring fewer average steps to locate the target. This demonstrates the potential of our method in UAV swarms, offering efficient and effective localization of radiant targets in complex environmental conditions.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06231</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06231</guid>
      <pubDate>Mon, 09 Dec 2024 06:08:23 GMT</pubDate>
      <author>Leon Fernando</author>
      <enclosure url="https://arxiv.org/pdf/2412.06231.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06224.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06224&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06224&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jiazhao Zhang&lt;/p&gt; &lt;p&gt;A practical navigation agent must be capable of handling a wide range of interaction demands, such as following instructions, searching objects, answering questions, tracking people, and more. Existing models for embodied navigation fall short of serving as practical generalists in the real world, as they are often constrained by specific task configurations or pre-defined maps with discretized waypoints. In this work, we present Uni-NaVid, the first video-based vision-language-action (VLA) model designed to unify diverse embodied navigation tasks and enable seamless navigation for mixed long-horizon tasks in unseen real-world environments. Uni-NaVid achieves this by harmonizing the input and output data configurations for all commonly used embodied navigation tasks and thereby integrating all tasks in one model. For training Uni-NaVid, we collect 3.6 million navigation data samples in total from four essential navigation sub-tasks and foster synergy in learning across them. Extensive experiments on comprehensive navigation benchmarks clearly demonstrate the advantages of unification modeling in Uni-NaVid and show it achieves state-of-the-art performance. Additionally, real-world experiments confirm the model&#39;s effectiveness and efficiency, shedding light on its strong generalizability.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06224</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06224</guid>
      <pubDate>Mon, 09 Dec 2024 05:55:55 GMT</pubDate>
      <author>Jiazhao Zhang</author>
      <enclosure url="https://arxiv.org/pdf/2412.06224.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Discrete-Time Distribution Steering using Monte Carlo Tree Search</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06220.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06220&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06220&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Alexandros E. Tzikas&lt;/p&gt; &lt;p&gt;Optimal control problems with state distribution constraints have attracted interest for their expressivity, but solutions rely on linear approximations. We approach the problem of driving the state of a dynamical system in distribution from a sequential decision-making perspective. We formulate the optimal control problem as an appropriate Markov decision process (MDP), where the actions correspond to the state-feedback control policies. We then solve the MDP using Monte Carlo tree search (MCTS). This renders our method suitable for any dynamics model. A key component of our approach is a novel, easy to compute, distance metric in the distribution space that allows our algorithm to guide the distribution of the state. We experimentally test our algorithm under both linear and nonlinear dynamics.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06220</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06220</guid>
      <pubDate>Mon, 09 Dec 2024 05:30:50 GMT</pubDate>
      <author>Alexandros E. Tzikas</author>
      <enclosure url="https://arxiv.org/pdf/2412.06220.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Modeling, Planning, and Control for Hybrid UAV Transition Maneuvers</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06197.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06197&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06197&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Spencer Folk&lt;/p&gt; &lt;p&gt;Small unmanned aerial vehicles (UAVs) have become standard tools in reconnaissance and surveying for both civilian and defense applications. In the future, UAVs will likely play a pivotal role in autonomous package delivery, but current multi-rotor candidates suffer from poor energy efficiency leading to insufficient endurance and range. In order to reduce the power demands of package delivery UAVs while still maintaining necessary hovering capabilities, companies like Amazon are experimenting with hybrid Vertical Take-Off and Landing (VTOL) platforms. Tailsitter VTOLs offer a mechanically simple and cost-effective solution compared to other hybrid VTOL configurations, and while advances in hardware and microelectronics have optimized the tailsitter for package delivery, the software behind its operation has largely remained a critical barrier to industry adoption. Tailsitters currently lack a generic, computationally efficient method of control that can provide strong safety and robustness guarantees over the entire flight domain. Further, tailsitters lack a closed-form method of designing dynamically feasible transition maneuvers between hover and cruise. In this paper, we survey the modeling and control methods currently implemented on small-scale tailsitter UAVs, and attempt to leverage a nonlinear dynamic model to design physically realizable, continuous-pitch transition maneuvers at constant altitude. Primary results from this paper isolate potential barriers to constant-altitude transition, and a novel approach to bypassing these barriers is proposed. While initial results are unsuccessful at providing feasible transition, this work acts as a stepping stone for future efforts to design new transition maneuvers that are safe, robust, and computationally efficient.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06197</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06197</guid>
      <pubDate>Mon, 09 Dec 2024 04:26:43 GMT</pubDate>
      <author>Spencer Folk</author>
      <enclosure url="https://arxiv.org/pdf/2412.06197.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>PoLaRIS Dataset: A Maritime Object Detection and Tracking Dataset in Pohang Canal</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06192.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06192&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06192&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jiwon Choi&lt;/p&gt; &lt;p&gt;Maritime environments often present hazardous situations due to factors such as moving ships or buoys, which become obstacles under the influence of waves. In such challenging conditions, the ability to detect and track potentially hazardous objects is critical for the safe navigation of marine robots. To address the scarcity of comprehensive datasets capturing these dynamic scenarios, we introduce a new multi-modal dataset that includes image and point-wise annotations of maritime hazards. Our dataset provides detailed ground truth for obstacle detection and tracking, including objects as small as 10$\times$10 pixels, which are crucial for maritime safety. To validate the dataset&#39;s effectiveness as a reliable benchmark, we conducted evaluations using various methodologies, including \ac{SOTA} techniques for object detection and tracking. These evaluations are expected to contribute to performance improvements, particularly in the complex maritime environment. To the best of our knowledge, this is the first dataset offering multi-modal annotations specifically tailored to maritime environments. Our dataset is available at https://sites.google.com/view/polaris-dataset.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06192</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06192</guid>
      <pubDate>Mon, 09 Dec 2024 04:18:07 GMT</pubDate>
      <author>Jiwon Choi</author>
      <enclosure url="https://arxiv.org/pdf/2412.06192.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>AgentAlign: Misalignment-Adapted Multi-Agent Perception for Resilient Inter-Agent Sensor Correlations</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06142.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06142&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06142&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Zonglin Meng&lt;/p&gt; &lt;p&gt;Cooperative perception has attracted wide attention given its capability to leverage shared information across connected automated vehicles (CAVs) and smart infrastructures to address sensing occlusion and range limitation issues. However, existing research overlooks the fragile multi-sensor correlations in multi-agent settings, as the heterogeneous agent sensor measurements are highly susceptible to environmental factors, leading to weakened inter-agent sensor interactions. The varying operational conditions and other real-world factors inevitably introduce multifactorial noise and consequentially lead to multi-sensor misalignment, making the deployment of multi-agent multi-modality perception particularly challenging in the real world. In this paper, we propose AgentAlign, a real-world heterogeneous agent cross-modality feature alignment framework, to effectively address these multi-modality misalignment issues. Our method introduces a cross-modality feature alignment space (CFAS) and heterogeneous agent feature alignment (HAFA) mechanism to harmonize multi-modality features across various agents dynamically. Additionally, we present a novel V2XSet-noise dataset that simulates realistic sensor imperfections under diverse environmental conditions, facilitating a systematic evaluation of our approach&#39;s robustness. Extensive experiments on the V2X-Real and V2XSet-Noise benchmarks demonstrate that our framework achieves state-of-the-art performance, underscoring its potential for real-world applications in cooperative autonomous driving. The controllable V2XSet-Noise dataset and generation pipeline will be released in the future.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06142</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06142</guid>
      <pubDate>Mon, 09 Dec 2024 01:51:18 GMT</pubDate>
      <author>Zonglin Meng</author>
      <enclosure url="https://arxiv.org/pdf/2412.06142.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>How Accurate is the Positioning in VR? Using Motion Capture and Robotics to Compare Positioning Capabilities of Popular VR Headsets</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06116.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06116&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06116&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Adam Banaszczyk&lt;/p&gt; &lt;p&gt;In this paper, we introduce a new methodology for assessing the positioning accuracy of virtual reality (VR) headsets, utilizing a cooperative industrial robot to simulate user head trajectories in a reproducible manner. We conduct a comprehensive evaluation of two popular VR headsets, i.e., Meta Quest 2 and Meta Quest Pro. Using head movement trajectories captured from realistic VR game scenarios with motion capture, we compared the performance of these headsets in terms of precision and reliability. Our analysis revealed that both devices exhibit high positioning accuracy, with no significant differences between them. These findings may provide insights for developers and researchers seeking to optimize their VR experiences in particular contexts such as manufacturing.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06116</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06116</guid>
      <pubDate>Mon, 09 Dec 2024 00:37:50 GMT</pubDate>
      <author>Adam Banaszczyk</author>
      <enclosure url="https://arxiv.org/pdf/2412.06116.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Self-supervised cost of transport estimation for multimodal path planning</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06101.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06101&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06101&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Vincent Gherold&lt;/p&gt; &lt;p&gt;Autonomous robots operating in real environments are often faced with decisions on how best to navigate their surroundings. In this work, we address a particular instance of this problem: how can a robot autonomously decide on the energetically optimal path to follow given a high-level objective and information about the surroundings? To tackle this problem we developed a self-supervised learning method that allows the robot to estimate the cost of transport of its surroundings using only vision inputs. We apply our method to the multi-modal mobility morphobot (M4), a robot that can drive, fly, segway, and crawl through its environment. By deploying our system in the real world, we show that our method accurately assigns different cost of transports to various types of environments e.g. grass vs smooth road. We also highlight the low computational cost of our method, which is deployed on an Nvidia Jetson Orin Nano robotic compute unit. We believe that this work will allow multi-modal robotic platforms to unlock their full potential for navigation and exploration tasks.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06101</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06101</guid>
      <pubDate>Sun, 08 Dec 2024 23:02:35 GMT</pubDate>
      <author>Vincent Gherold</author>
      <enclosure url="https://arxiv.org/pdf/2412.06101.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06080.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06080&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06080&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Karlo Koledic&lt;/p&gt; &lt;p&gt;Generalizing metric monocular depth estimation presents a significant challenge due to its ill-posed nature, while the entanglement between camera parameters and depth amplifies issues further, hindering multi-dataset training and zero-shot accuracy. This challenge is particularly evident in autonomous vehicles and mobile robotics, where data is collected with fixed camera setups, limiting the geometric diversity. Yet, this context also presents an opportunity: the fixed relationship between the camera and the ground plane imposes additional perspective geometry constraints, enabling depth regression via vertical image positions of objects. However, this cue is highly susceptible to overfitting, thus we propose a novel canonical representation that maintains consistency across varied camera setups, effectively disentangling depth from specific parameters and enhancing generalization across datasets. We also propose a novel architecture that adaptively and probabilistically fuses depths estimated via object size and vertical image position cues. A comprehensive evaluation demonstrates the effectiveness of the proposed approach on five autonomous driving datasets, achieving accurate metric depth estimation for varying resolutions, aspect ratios and camera setups. Notably, we achieve comparable accuracy to existing zero-shot methods, despite training on a single dataset with a single-camera setup.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06080</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06080</guid>
      <pubDate>Sun, 08 Dec 2024 22:04:34 GMT</pubDate>
      <author>Karlo Koledic</author>
      <enclosure url="https://arxiv.org/pdf/2412.06080.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Teleoperation of Continuum Instruments: Investigation of Linear vs. Angular Commands through Task-Priority Analysis</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06035.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06035&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06035&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Ehsan Nasiri&lt;/p&gt; &lt;p&gt;This paper addresses the challenge of teleoperating continuum instruments for minimally invasive surgery (MIS). We develop and adopt a novel task-priority-based kinematic formulation to quantitatively investigate teleoperation commands for continuum instruments under remote center of motion (RCM) constraints. Using redundancy resolution methods, we investigate the kinematic performance during teleoperation, comparing linear and angular commands within a task-priority scheme. For experimental validation, an instrument module (IM) was designed and integrated with a 7-DoF manipulator. Assessments, simulations, and experimental validations demonstrated the effectiveness of the proposed framework. The experiments involved several tasks: trajectory tracking of the IM tip along multiple paths with varying priorities for linear and angular teleoperation commands, pushing a ball along predefined paths on a silicon board, following a pattern on a pegboard, and guiding the continuum tip through rings on a ring board using a standard surgical kit.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06035</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06035</guid>
      <pubDate>Sun, 08 Dec 2024 19:18:56 GMT</pubDate>
      <author>Ehsan Nasiri</author>
      <enclosure url="https://arxiv.org/pdf/2412.06035.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Digital Modeling of Massage Techniques and Reproduction by Robotic Arms</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.05940.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05940&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05940&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Yuan Xu&lt;/p&gt; &lt;p&gt;This paper explores the digital modeling and robotic reproduction of traditional Chinese medicine (TCM) massage techniques. We adopt an adaptive admittance control algorithm to optimize force and position control, ensuring safety and comfort. The paper analyzes key TCM techniques from kinematic and dynamic perspectives, and designs robotic systems to reproduce these massage techniques. The results demonstrate that the robot successfully mimics the characteristics of TCM massage, providing a foundation for integrating traditional therapy with modern robotics and expanding assistive therapy applications.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.05940</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.05940</guid>
      <pubDate>Sun, 08 Dec 2024 13:46:25 GMT</pubDate>
      <author>Yuan Xu</author>
      <enclosure url="https://arxiv.org/pdf/2412.05940.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>AC-LIO: Towards Asymptotic and Consistent Convergence in LiDAR-Inertial Odometry</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.05873.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05873&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05873&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Tianxiang Zhang&lt;/p&gt; &lt;p&gt;Existing LiDAR-Inertial Odometry (LIO) frameworks typically utilize prior state trajectories derived from IMU integration to compensate for the motion distortion within LiDAR frames, and demonstrate outstanding accuracy and stability in regular low-speed and smooth scenes. However, in high-speed or intense motion scenarios, the residual distortion may increase due to the limitation of IMU&#39;s accuracy and frequency, which will degrade the consistency between the LiDAR frame with its represented geometric environment, leading pointcloud registration to fall into local optima and consequently increasing the drift in long-time and large-scale localization. To address the issue, we propose a novel asymptotically and consistently converging LIO framework called AC-LIO. First, during the iterative state estimation, we backwards propagate the update term based on the prior state chain, and asymptotically compensate the residual distortion before next iteration. Second, considering the weak correlation between the initial error and motion distortion of current frame, we propose a convergence criteria based on pointcloud constraints to control the back propagation. The approach of guiding the asymptotic distortion compensation based on convergence criteria can promote the consistent convergence of pointcloud registration and increase the accuracy and robustness of LIO. Experiments show that our AC-LIO framework, compared to other state-of-the-art frameworks, effectively promotes consistent convergence in state estimation and further improves the accuracy of long-time and large-scale localization and mapping.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.05873</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.05873</guid>
      <pubDate>Sun, 08 Dec 2024 09:37:32 GMT</pubDate>
      <author>Tianxiang Zhang</author>
      <enclosure url="https://arxiv.org/pdf/2412.05873.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>DiTer++: Diverse Terrain and Multi-modal Dataset for Multi-Robot SLAM in Multi-session Environments</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.05839.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05839&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05839&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Juwon Kim&lt;/p&gt; &lt;p&gt;We encounter large-scale environments where both structured and unstructured spaces coexist, such as on campuses. In this environment, lighting conditions and dynamic objects change constantly. To tackle the challenges of large-scale mapping under such conditions, we introduce DiTer++, a diverse terrain and multi-modal dataset designed for multi-robot SLAM in multi-session environments. According to our datasets&#39; scenarios, Agent-A and Agent-B scan the area designated for efficient large-scale mapping day and night, respectively. Also, we utilize legged robots for terrain-agnostic traversing. To generate the ground-truth of each robot, we first build the survey-grade prior map. Then, we remove the dynamic objects and outliers from the prior map and extract the trajectory through scan-to-map matching. Our dataset and supplement materials are available at https://sites.google.com/view/diter-plusplus/.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.05839</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.05839</guid>
      <pubDate>Sun, 08 Dec 2024 07:21:21 GMT</pubDate>
      <author>Juwon Kim</author>
      <enclosure url="https://arxiv.org/pdf/2412.05839.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.05789.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05789&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05789&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Pengzhen Ren&lt;/p&gt; &lt;p&gt;Realizing scaling laws in embodied AI has become a focus. However, previous work has been scattered across diverse simulation platforms, with assets and models lacking unified interfaces, which has led to inefficiencies in research. To address this, we introduce InfiniteWorld, a unified and scalable simulator for general vision-language robot interaction built on Nvidia Isaac Sim. InfiniteWorld encompasses a comprehensive set of physics asset construction methods and generalized free robot interaction benchmarks. Specifically, we first built a unified and scalable simulation framework for embodied learning that integrates a series of improvements in generation-driven 3D asset construction, Real2Sim, automated annotation framework, and unified 3D asset processing. This framework provides a unified and scalable platform for robot interaction and learning. In addition, to simulate realistic robot interaction, we build four new general benchmarks, including scene graph collaborative exploration and open-world social mobile manipulation. The former is often overlooked as an important task for robots to explore the environment and build scene knowledge, while the latter simulates robot interaction tasks with different levels of knowledge agents based on the former. They can more comprehensively evaluate the embodied agent&#39;s capabilities in environmental understanding, task planning and execution, and intelligent interaction. We hope that this work can provide the community with a systematic asset interface, alleviate the dilemma of the lack of high-quality assets, and provide a more comprehensive evaluation of robot interactions.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.05789</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.05789</guid>
      <pubDate>Sun, 08 Dec 2024 02:59:04 GMT</pubDate>
      <author>Pengzhen Ren</author>
      <enclosure url="https://arxiv.org/pdf/2412.05789.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Asymptotically Optimal Sampling-Based Path Planning Using Bidirectional Guidance Heuristic</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.05754.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05754&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05754&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Yi Wang&lt;/p&gt; &lt;p&gt;This paper introduces Bidirectional Guidance Informed Trees (BIGIT*),~a new asymptotically optimal sampling-based motion planning algorithm. Capitalizing on the strengths of \emph{meet-in-the-middle} property in bidirectional heuristic search with a new lazy strategy, and uniform-cost search, BIGIT* constructs an implicitly bidirectional preliminary motion tree on an implicit random geometric graph (RGG). This efficiently tightens the informed search region, serving as an admissible and accurate bidirectional guidance heuristic. This heuristic is subsequently utilized to guide a bidirectional heuristic search in finding a valid path on the given RGG. Experiments show that BIGIT* outperforms the existing informed sampling-based motion planners both in faster finding an initial solution and converging to the optimum on simulated abstract problems in $\mathbb{R}^{16}$. Practical drone flight path planning tasks across a campus also verify our results.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.05754</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.05754</guid>
      <pubDate>Sat, 07 Dec 2024 21:53:53 GMT</pubDate>
      <author>Yi Wang</author>
      <enclosure url="https://arxiv.org/pdf/2412.05754.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Constrained Control for Autonomous Spacecraft Rendezvous: Learning-Based Time Shift Governor</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.05748.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05748&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.05748&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Taehyeun Kim&lt;/p&gt; &lt;p&gt;This paper develops a Time Shift Governor (TSG)-based control scheme to enforce constraints during rendezvous and docking (RD) missions in the setting of the Two-Body problem. As an add-on scheme to the nominal closed-loop system, the TSG generates a time-shifted Chief spacecraft trajectory as a target reference for the Deputy spacecraft. This modification of the commanded reference trajectory ensures that constraints are enforced while the time shift is reduced to zero to effect the rendezvous. Our approach to TSG implementation integrates an LSTM neural network which approximates the time shift parameter as a function of a sequence of past Deputy and Chief spacecraft states. This LSTM neural network is trained offline from simulation data. We report simulation results for RD missions in the Low Earth Orbit (LEO

@github-actions github-actions bot added the Auto: Route Test Complete label Dec 10, 2024
@Muyun99 Muyun99 changed the title from "add cool paper first-author infomation" to "feat(route): add cool paper first-author infomation" Dec 11, 2024
Contributor

Successfully generated as following:

http://localhost:1200/papers/arxiv/cs.RO - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Robotics</title>
    <link>https://papers.cool/arxiv/cs.RO</link>
    <atom:link href="http://localhost:1200/papers/arxiv/cs.RO" rel="self" type="application/rss+xml"></atom:link>
    <description>Robotics - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>[email protected] (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Wed, 11 Dec 2024 05:13:34 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07773.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07773&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07773&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Chenhao Lu&lt;/p&gt; &lt;p&gt;Humanoid robots require both robust lower-body locomotion and precise upper-body manipulation. While recent Reinforcement Learning (RL) approaches provide whole-body loco-manipulation policies, they lack precise manipulation with high DoF arms. In this paper, we propose decoupling upper-body control from locomotion, using inverse kinematics (IK) and motion retargeting for precise manipulation, while RL focuses on robust lower-body locomotion. We introduce PMP (Predictive Motion Priors), trained with Conditional Variational Autoencoder (CVAE) to effectively represent upper-body motions. The locomotion policy is trained conditioned on this upper-body motion representation, ensuring that the system remains robust with both manipulation and locomotion. We show that CVAE features are crucial for stability and robustness, and significantly outperforms RL-based whole-body control in precise manipulation. With precise upper-body motion and robust lower-body locomotion control, operators can remotely control the humanoid to walk around and explore different environments, while performing diverse manipulation tasks.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07773</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07773</guid>
      <pubDate>Tue, 10 Dec 2024 18:59:50 GMT</pubDate>
      <author>Chenhao Lu</author>
      <enclosure url="https://arxiv.org/pdf/2412.07773.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>SAT: Spatial Aptitude Training for Multimodal Language Models</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07755.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07755&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07755&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Arijit Ray&lt;/p&gt; &lt;p&gt;Spatial perception is a fundamental component of intelligence. While many studies highlight that large multimodal language models (MLMs) struggle to reason about space, they only test for static spatial reasoning, such as categorizing the relative positions of objects. Meanwhile, real-world deployment requires dynamic capabilities like perspective-taking and egocentric action recognition. As a roadmap to improving spatial intelligence, we introduce SAT, Spatial Aptitude Training, which goes beyond static relative object position questions to the more dynamic tasks. SAT contains 218K question-answer pairs for 22K synthetic scenes across a training and testing set. Generated using a photo-realistic physics engine, our dataset can be arbitrarily scaled and easily extended to new actions, scenes, and 3D assets. We find that even MLMs that perform relatively well on static questions struggle to accurately answer dynamic spatial questions. Further, we show that SAT instruction-tuning data improves not only dynamic spatial reasoning on SAT, but also zero-shot performance on existing real-image spatial benchmarks: $23\%$ on CVBench, $8\%$ on the harder BLINK benchmark, and $18\%$ on VSR. When instruction-tuned on SAT, our 13B model matches larger proprietary MLMs like GPT4-V and Gemini-3-1.0 in spatial reasoning. Our data/code is available at http://arijitray1993.github.io/SAT/ .&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07755</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07755</guid>
      <pubDate>Tue, 10 Dec 2024 18:52:45 GMT</pubDate>
      <author>Arijit Ray</author>
      <enclosure url="https://arxiv.org/pdf/2412.07755.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07746.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07746&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07746&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Ziqi Lu&lt;/p&gt; &lt;p&gt;Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks. However, due to the high-dimensional nature of the problem space and scarcity of high-quality 3D data, these pre-trained models still struggle to generalize to many challenging circumstances, such as limited view overlap or low lighting. To address this, we propose LoRA3D, an efficient self-calibration pipeline to $\textit{specialize}$ the pre-trained models to target scenes using their own multi-view predictions. Taking sparse RGB images as input, we leverage robust optimization techniques to refine multi-view predictions and align them into a global coordinate frame. In particular, we incorporate prediction confidence into the geometric optimization process, automatically re-weighting the confidence to better reflect point estimation accuracy. We use the calibrated confidence to generate high-quality pseudo labels for the calibrating views and use low-rank adaptation (LoRA) to fine-tune the models on the pseudo-labeled data. Our method does not require any external priors or manual labels. It completes the self-calibration process on a $\textbf{single standard GPU within just 5 minutes}$. Each low-rank adapter requires only $\textbf{18MB}$ of storage. We evaluated our method on $\textbf{more than 160 scenes}$ from the Replica, TUM and Waymo Open datasets, achieving up to $\textbf{88% performance improvement}$ on 3D reconstruction, multi-view pose estimation and novel-view rendering.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07746</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07746</guid>
      <pubDate>Tue, 10 Dec 2024 18:45:04 GMT</pubDate>
      <author>Ziqi Lu</author>
      <enclosure url="https://arxiv.org/pdf/2412.07746.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Optimizing Sensor Redundancy in Sequential Decision-Making Problems</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07686.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07686&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07686&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jonas Nüßlein&lt;/p&gt; &lt;p&gt;Reinforcement Learning (RL) policies are designed to predict actions based on current observations to maximize cumulative future rewards. In real-world applications (i.e., non-simulated environments), sensors are essential for measuring the current state and providing the observations on which RL policies rely to make decisions. A significant challenge in deploying RL policies in real-world scenarios is handling sensor dropouts, which can result from hardware malfunctions, physical damage, or environmental factors like dust on a camera lens. A common strategy to mitigate this issue is the use of backup sensors, though this comes with added costs. This paper explores the optimization of backup sensor configurations to maximize expected returns while keeping costs below a specified threshold, C. Our approach uses a second-order approximation of expected returns and includes penalties for exceeding cost constraints. We then optimize this quadratic program using Tabu Search, a meta-heuristic algorithm. The approach is evaluated across eight OpenAI Gym environments and a custom Unity-based robotic environment (RobotArmGrasping). Empirical results demonstrate that our quadratic program effectively approximates real expected returns, facilitating the identification of optimal sensor configurations.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07686</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07686</guid>
      <pubDate>Tue, 10 Dec 2024 17:20:44 GMT</pubDate>
      <author>Jonas Nüßlein</author>
      <enclosure url="https://arxiv.org/pdf/2412.07686.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>RRT-GPMP2: A Motion Planner for Mobile Robots in Complex Maze Environments</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07683.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07683&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07683&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jiawei Meng&lt;/p&gt; &lt;p&gt;With the development of science and technology, mobile robots are playing a significant important role in the new round of world revolution. Further, mobile robots might assist or replace human beings in a great number of areas. To increase the degree of automation for mobile robots, advanced motion planners need to be integrated into them to cope with various environments. Complex maze environments are common in the potential application scenarios of different mobile robots. This article proposes a novel motion planner named the rapidly exploring random tree based Gaussian process motion planner 2, which aims to tackle the motion planning problem for mobile robots in complex maze environments. To be more specific, the proposed motion planner successfully combines the advantages of a trajectory optimisation motion planning algorithm named the Gaussian process motion planner 2 and a sampling-based motion planning algorithm named the rapidly exploring random tree. To validate the performance and practicability of the proposed motion planner, we have tested it in several simulations in the Matrix laboratory and applied it on a marine mobile robot in a virtual scenario in the Robotic operating system.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07683</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07683</guid>
      <pubDate>Tue, 10 Dec 2024 17:16:57 GMT</pubDate>
      <author>Jiawei Meng</author>
      <enclosure url="https://arxiv.org/pdf/2412.07683.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Dynamic Obstacle Avoidance of Unmanned Surface Vehicles in Maritime Environments Using Gaussian Processes Based Motion Planning</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07664.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07664&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07664&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jiawei Meng&lt;/p&gt; &lt;p&gt;During recent years, unmanned surface vehicles are extensively utilised in a variety of maritime applications such as the exploration of unknown areas, autonomous transportation, offshore patrol and others. In such maritime applications, unmanned surface vehicles executing relevant missions that might collide with potential static obstacles such as islands and reefs and dynamic obstacles such as other moving unmanned surface vehicles. To successfully accomplish these missions, motion planning algorithms that can generate smooth and collision-free trajectories to avoid both these static and dynamic obstacles in an efficient manner are essential. In this article, we propose a novel motion planning algorithm named the Dynamic Gaussian process motion planner 2, which successfully extends the application scope of the Gaussian process motion planner 2 into complex and dynamic environments with both static and dynamic obstacles. First, we introduce an approach to generate safe areas for dynamic obstacles using modified multivariate Gaussian distributions. Second, we introduce an approach to integrate real-time status information of dynamic obstacles into the modified multivariate Gaussian distributions. Therefore, the multivariate Gaussian distributions with real-time statuses of dynamic obstacles can be innovatively added into the optimisation process of factor graph to generate an optimised trajectory. The proposed Dynamic Gaussian process motion planner 2 algorithm has been validated in a series of benchmark simulations in the Matrix laboratory and a dynamic obstacle avoidance mission in a high-fidelity maritime environment in the Robotic operating system to demonstrate its functionality and practicability.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07664</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07664</guid>
      <pubDate>Tue, 10 Dec 2024 16:50:39 GMT</pubDate>
      <author>Jiawei Meng</author>
      <enclosure url="https://arxiv.org/pdf/2412.07664.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Bayesian Data Augmentation and Training for Perception DNN in Autonomous Aerial Vehicles</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07655.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07655&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07655&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Ashik E Rasul&lt;/p&gt; &lt;p&gt;Learning-based solutions have enabled incredible capabilities for autonomous systems. Autonomous vehicles, both aerial and ground, rely on DNN for various integral tasks, including perception. The efficacy of supervised learning solutions hinges on the quality of the training data. Discrepancies between training data and operating conditions result in faults that can lead to catastrophic incidents. However, collecting vast amounts of context-sensitive data, with broad coverage of possible operating environments, is prohibitively difficult. Synthetic data generation techniques for DNN allow for the easy exploration of diverse scenarios. However, synthetic data generation solutions for aerial vehicles are still lacking. This work presents a data augmentation framework for aerial vehicle&#39;s perception training, leveraging photorealistic simulation integrated with high-fidelity vehicle dynamics. Safe landing is a crucial challenge in the development of autonomous air taxis, therefore, landing maneuver is chosen as the focus of this work. With repeated simulations of landing in varying scenarios we assess the landing performance of the VTOL type UAV and gather valuable data. The landing performance is used as the objective function to optimize the DNN through retraining. Given the high computational cost of DNN retraining, we incorporated Bayesian Optimization in our framework that systematically explores the data augmentation parameter space to retrain the best-performing models. The framework allowed us to identify high-performing data augmentation parameters that are consistently effective across different landing scenarios. Utilizing the capabilities of this data augmentation framework, we obtained a robust perception model. The model consistently improved the perception-based landing success rate by at least 20% under different lighting and weather conditions.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07655</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07655</guid>
      <pubDate>Tue, 10 Dec 2024 16:41:19 GMT</pubDate>
      <author>Ashik E Rasul</author>
      <enclosure url="https://arxiv.org/pdf/2412.07655.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>POMDP-Based Trajectory Planning for On-Ramp Highway Merging</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07567.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07567&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07567&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Adam Kollarčík&lt;/p&gt; &lt;p&gt;This paper addresses the trajectory planning problem for automated vehicle on-ramp highway merging. To tackle this challenge, we extend our previous work on trajectory planning at unsignalized intersections using Partially Observable Markov Decision Processes (POMDPs). The method utilizes the Adaptive Belief Tree (ABT) algorithm, an approximate sampling-based approach to solve POMDPs efficiently. We outline the POMDP formulation process, beginning with discretizing the highway topology to reduce problem complexity. Additionally, we describe the dynamics and measurement models used to predict future states and establish the relationship between available noisy measurements and predictions. Building on our previous work, the dynamics model is expanded to account for lateral movements necessary for lane changes during the merging process. We also define the reward function, which serves as the primary mechanism for specifying the desired behavior of the automated vehicle, combining multiple goals such as avoiding collisions or maintaining appropriate velocity. Our simulation results, conducted on three scenarios based on real-life traffic data from German highways, demonstrate the method&#39;s ability to generate safe, collision-free, and efficient merging trajectories. This work shows the versatility of this POMDP-based approach in tackling various automated driving problems.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07567</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07567</guid>
      <pubDate>Tue, 10 Dec 2024 14:57:35 GMT</pubDate>
      <author>Adam Kollarčík</author>
      <enclosure url="https://arxiv.org/pdf/2412.07567.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Optimization-Driven Design of Monolithic Soft-Rigid Grippers</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07556.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07556&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07556&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Pierluigi Mansueto&lt;/p&gt; &lt;p&gt;Sim-to-real transfer remains a significant challenge in soft robotics due to the unpredictability introduced by common manufacturing processes such as 3D printing and molding. These processes often result in deviations from simulated designs, requiring multiple prototypes before achieving a functional system. In this study, we propose a novel methodology to address these limitations by combining advanced rapid prototyping techniques and an efficient optimization strategy. Firstly, we employ rapid prototyping methods typically used for rigid structures, leveraging their precision to fabricate compliant components with reduced manufacturing errors. Secondly, our optimization framework minimizes the need for extensive prototyping, significantly reducing the iterative design process. The methodology enables the identification of stiffness parameters that are more practical and achievable within current manufacturing capabilities. The proposed approach demonstrates a substantial improvement in the efficiency of prototype development while maintaining the desired performance characteristics. This work represents a step forward in bridging the sim-to-real gap in soft robotics, paving the way towards a faster and more reliable deployment of soft robotic systems.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07556</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07556</guid>
      <pubDate>Tue, 10 Dec 2024 14:47:09 GMT</pubDate>
      <author>Pierluigi Mansueto</author>
      <enclosure url="https://arxiv.org/pdf/2412.07556.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07544.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07544&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07544&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Amin Abyaneh&lt;/p&gt; &lt;p&gt;Imitation learning is a data-driven approach to learning policies from expert behavior, but it is prone to unreliable outcomes in out-of-sample (OOS) regions. While previous research relying on stable dynamical systems guarantees convergence to a desired state, it often overlooks transient behavior. We propose a framework for learning policies using modeled by contractive dynamical systems, ensuring that all policy rollouts converge regardless of perturbations, and in turn, enable efficient OOS recovery. By leveraging recurrent equilibrium networks and coupling layers, the policy structure guarantees contractivity for any parameter choice, which facilitates unconstrained optimization. Furthermore, we provide theoretical upper bounds for worst-case and expected loss terms, rigorously establishing the reliability of our method in deployment. Empirically, we demonstrate substantial OOS performance improvements in robotics manipulation and navigation tasks in simulation.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07544</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07544</guid>
      <pubDate>Tue, 10 Dec 2024 14:28:18 GMT</pubDate>
      <author>Amin Abyaneh</author>
      <enclosure url="https://arxiv.org/pdf/2412.07544.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>A Real-time Degeneracy Sensing and Compensation Method for Enhanced LiDAR SLAM</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07513.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07513&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07513&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Zongbo Liao&lt;/p&gt; &lt;p&gt;LiDAR is widely used in Simultaneous Localization and Mapping (SLAM) and autonomous driving. The LiDAR odometry is of great importance in multi-sensor fusion. However, in some unstructured environments, the point cloud registration cannot constrain the poses of the LiDAR due to its sparse geometric features, which leads to the degeneracy of multi-sensor fusion accuracy. To address this problem, we propose a novel real-time approach to sense and compensate for the degeneracy of LiDAR. Firstly, this paper introduces the degeneracy factor with clear meaning, which can measure the degeneracy of LiDAR. Then, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering method adaptively perceives the degeneracy with better environmental generalization. Finally, the degeneracy perception results are utilized to fuse LiDAR and IMU, thus effectively resisting degeneracy effects. Experiments on our dataset show the method&#39;s high accuracy and robustness and validate our algorithm&#39;s adaptability to different environments and LiDAR scanning modalities.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07513</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07513</guid>
      <pubDate>Tue, 10 Dec 2024 13:50:46 GMT</pubDate>
      <author>Zongbo Liao</author>
      <enclosure url="https://arxiv.org/pdf/2412.07513.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Ontology-driven Prompt Tuning for LLM-based Task and Motion Planning</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07493.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07493&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07493&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Muhayy Ud Din&lt;/p&gt; &lt;p&gt;Performing complex manipulation tasks in dynamic environments requires efficient Task and Motion Planning (TAMP) approaches, which combine high-level symbolic plan with low-level motion planning. Advances in Large Language Models (LLMs), such as GPT-4, are transforming task planning by offering natural language as an intuitive and flexible way to describe tasks, generate symbolic plans, and reason. However, the effectiveness of LLM-based TAMP approaches is limited due to static and template-based prompting, which struggles in adapting to dynamic environments and complex task contexts. To address these limitations, this work proposes a novel ontology-driven prompt-tuning framework that employs knowledge-based reasoning to refine and expand user prompts with task contextual reasoning and knowledge-based environment state descriptions. Integrating domain-specific knowledge into the prompt ensures semantically accurate and context-aware task plans. The proposed framework demonstrates its effectiveness by resolving semantic errors in symbolic plan generation, such as maintaining logical temporal goal ordering in scenarios involving hierarchical object placement. The proposed framework is validated through both simulation and real-world scenarios, demonstrating significant improvements over the baseline approach in terms of adaptability to dynamic environments, and the generation of semantically correct task plans.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07493</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07493</guid>
      <pubDate>Tue, 10 Dec 2024 13:18:45 GMT</pubDate>
      <author>Muhayy Ud Din</author>
      <enclosure url="https://arxiv.org/pdf/2412.07493.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Stereo Hand-Object Reconstruction for Human-to-Robot Handover</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07487.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07487&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07487&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Yik Lung Pang&lt;/p&gt; &lt;p&gt;Jointly estimating hand and object shape ensures the success of the robot grasp in human-to-robot handovers. However, relying on hand-crafted prior knowledge about the geometric structure of the object fails when generalising to unseen objects, and depth sensors fail to detect transparent objects such as drinking glasses. In this work, we propose a stereo-based method for hand-object reconstruction that combines single-view reconstructions probabilistically to form a coherent stereo reconstruction. We learn 3D shape priors from a large synthetic hand-object dataset to ensure that our method is generalisable, and use RGB inputs instead of depth as RGB can better capture transparent objects. We show that our method achieves a lower object Chamfer distance compared to existing RGB based hand-object reconstruction methods on single view and stereo settings. We process the reconstructed hand-object shape with a projection-based outlier removal step and use the output to guide a human-to-robot handover pipeline with wide-baseline stereo RGB cameras. Our hand-object reconstruction enables a robot to successfully receive a diverse range of household objects from the human.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07487</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07487</guid>
      <pubDate>Tue, 10 Dec 2024 13:12:32 GMT</pubDate>
      <author>Yik Lung Pang</author>
      <enclosure url="https://arxiv.org/pdf/2412.07487.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Performance Evaluation of ROS2-DDS middleware implementations facilitating Cooperative Driving in Autonomous Vehicle</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07485.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07485&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07485&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Sumit Paul&lt;/p&gt; &lt;p&gt;In the autonomous vehicle and self-driving paradigm, cooperative perception or exchanging sensor information among vehicles over wireless communication has added a new dimension. Generally, an autonomous vehicle is a special type of robot that requires real-time, highly reliable sensor inputs due to functional safety. Autonomous vehicles are equipped with a considerable number of sensors to provide different required sensor data to make the driving decision and share with other surrounding vehicles. The inclusion of Data Distribution Service(DDS) as a communication middleware in ROS2 has proved its potential capability to be a reliable real-time distributed system. DDS comes with a scoping mechanism known as domain. Whenever a ROS2 process is initiated, it creates a DDS participant. It is important to note that there is a limit to the number of participants allowed in a single domain. The efficient handling of numerous in-vehicle sensors and their messages demands the use of multiple ROS2 nodes in a single vehicle. Additionally, in the cooperative perception paradigm, a significant number of ROS2 nodes can be required when a vehicle functions as a single ROS2 node. These ROS2 nodes cannot be part of a single domain due to DDS participant limitation; thus, different domain communication is unavoidable. Moreover, there are different vendor-specific implementations of DDS, and each vendor has their configurations, which is an inevitable communication catalyst between the ROS2 nodes. The communication between vehicles or robots or ROS2 nodes depends directly on the vendor-specific configuration, data type, data size, and the DDS implementation used as middleware; in our study, we evaluate and investigate the limitations, capabilities, and prospects of the different domain communication for various vendor-specific DDS implementations for diverse sensor data type.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07485</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07485</guid>
      <pubDate>Tue, 10 Dec 2024 13:07:26 GMT</pubDate>
      <author>Sumit Paul</author>
      <enclosure url="https://arxiv.org/pdf/2412.07485.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulation for Time-Efficient Fine-Resolution Policy Learning</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07477.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07477&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07477&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Yuki Kadokawa&lt;/p&gt; &lt;p&gt;In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, requiring skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution can be defined by the particle size/number in the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant calculation time and challenging sample collection, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations for pre-training in fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through some middle-resolution simulations with conservative policy transfer to avoid domain gaps that could lead to policy transfer failure. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than 1/7 while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07477</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07477</guid>
      <pubDate>Tue, 10 Dec 2024 12:50:25 GMT</pubDate>
      <author>Yuki Kadokawa</author>
      <enclosure url="https://arxiv.org/pdf/2412.07477.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07392.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07392&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07392&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Muhayy Ud Din&lt;/p&gt; &lt;p&gt;Vision-based target tracking is crucial for unmanned surface vehicles (USVs) to perform tasks such as inspection, monitoring, and surveillance. However, real-time tracking in complex maritime environments is challenging due to dynamic camera movement, low visibility, and scale variation. Typically, object detection methods combined with filtering techniques are commonly used for tracking, but they often lack robustness, particularly in the presence of camera motion and missed detections. Although advanced tracking methods have been proposed recently, their application in maritime scenarios is limited. To address this gap, this study proposes a vision-guided object-tracking framework for USVs, integrating state-of-the-art tracking algorithms with low-level control systems to enable precise tracking in dynamic maritime environments. We benchmarked the performance of seven distinct trackers, developed using advanced deep learning techniques such as Siamese Networks and Transformers, by evaluating them on both simulated and real-world maritime datasets. In addition, we evaluated the robustness of various control algorithms in conjunction with these tracking systems. The proposed framework was validated through simulations and real-world sea experiments, demonstrating its effectiveness in handling dynamic maritime conditions. The results show that SeqTrack, a Transformer-based tracker, performed best in adverse conditions, such as dust storms. Among the control algorithms evaluated, the linear quadratic regulator controller (LQR) demonstrated the most robust and smooth control, allowing for stable tracking of the USV.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07392</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07392</guid>
      <pubDate>Tue, 10 Dec 2024 10:35:17 GMT</pubDate>
      <author>Muhayy Ud Din</author>
      <enclosure url="https://arxiv.org/pdf/2412.07392.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Virtual Reflections on a Dynamic 2D Eye Model Improve Spatial Reference Identification</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07344.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07344&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07344&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Matti Krüger&lt;/p&gt; &lt;p&gt;The visible orientation of human eyes creates some transparency about people&#39;s spatial attention and other mental states. This leads to a dual role for the eyes as a means of sensing and communication. Accordingly, artificial eye models are being explored as communication media in human-machine interaction scenarios. One challenge in the use of eye models for communication consists of resolving spatial reference ambiguities, especially for screen-based models. Here, we introduce an approach for overcoming this challenge through the introduction of reflection-like features that are contingent on artificial eye movements. We conducted a user study with 30 participants in which participants had to use spatial references provided by dynamic eye models to advance in a fast-paced group interaction task. Compared to a non-reflective eye model and a pure reflection mode, their combination in the new approach resulted in a higher identification accuracy and user experience, suggesting a synergistic benefit.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07344</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07344</guid>
      <pubDate>Tue, 10 Dec 2024 09:37:25 GMT</pubDate>
      <author>Matti Krüger</author>
      <enclosure url="https://arxiv.org/pdf/2412.07344.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Model predictive control-based trajectory generation for agile landing of unmanned aerial vehicle on a moving boat</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07332.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07332&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07332&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Ondřej Procházka&lt;/p&gt; &lt;p&gt;This paper proposes a novel trajectory generation method based on Model Predictive Control (MPC) for agile landing of an Unmanned Aerial Vehicle (UAV) onto an Unmanned Surface Vehicle (USV)&#39;s deck in harsh conditions. The trajectory generation exploits the state predictions of the USV to create periodically updated trajectories for a multirotor UAV to precisely land on the deck of a moving USV even in cases where the deck&#39;s inclination is continuously changing. We use an MPC-based scheme to create trajectories that consider both the UAV dynamics and the predicted states of the USV up to the first derivative of position and orientation. Compared to existing approaches, our method dynamically modifies the penalization matrices to precisely follow the corresponding states with respect to the flight phase. Especially during the landing maneuver, the UAV synchronizes attitude with the USV&#39;s, allowing for fast landing on a tilted deck. Simulations show the method&#39;s reliability in various sea conditions up to Rough sea (wave height 4 m), outperforming state-of-the-art methods in landing speed and accuracy, with twice the precision on average. Finally, real-world experiments validate the simulation results, demonstrating robust landings on a moving USV, while all computations are performed in real-time onboard the UAV.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07332</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07332</guid>
      <pubDate>Tue, 10 Dec 2024 09:23:37 GMT</pubDate>
      <author>Ondřej Procházka</author>
      <enclosure url="https://arxiv.org/pdf/2412.07332.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>ArtFormer: Controllable Generation of Diverse 3D Articulated Objects</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07237.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07237&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07237&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Jiayi Su&lt;/p&gt; &lt;p&gt;This paper presents a novel framework for modeling and conditional generation of 3D articulated objects. Troubled by flexibility-quality tradeoffs, existing methods are often limited to using predefined structures or retrieving shapes from static datasets. To address these challenges, we parameterize an articulated object as a tree of tokens and employ a transformer to generate both the object&#39;s high-level geometry code and its kinematic relations. Subsequently, each sub-part&#39;s geometry is further decoded using a signed-distance-function (SDF) shape prior, facilitating the synthesis of high-quality 3D shapes. Our approach enables the generation of diverse objects with high-quality geometry and varying number of parts. Comprehensive experiments on conditional generation from text descriptions demonstrate the effectiveness and flexibility of our method.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07237</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07237</guid>
      <pubDate>Tue, 10 Dec 2024 07:00:05 GMT</pubDate>
      <author>Jiayi Su</author>
      <enclosure url="https://arxiv.org/pdf/2412.07237.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>RoboMM: All-in-One Multimodal Large Model for Robotic Manipulation</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07215.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07215&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07215&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Feng Yan&lt;/p&gt; &lt;p&gt;In recent years, robotics has advanced significantly through the integration of larger models and large-scale datasets. However, challenges remain in applying these models to 3D spatial interactions and managing data collection costs. To address these issues, we propose the multimodal robotic manipulation model, RoboMM, along with the comprehensive dataset, RoboData. RoboMM enhances 3D perception through camera parameters and occupancy supervision. Building on OpenFlamingo, it incorporates Modality-Isolation-Mask and multimodal decoder blocks, improving modality fusion and fine-grained perception. RoboData offers the complete evaluation system by integrating several well-known datasets, achieving the first fusion of multi-view images, camera parameters, depth maps, and actions, and the space alignment facilitates comprehensive learning from diverse robotic datasets. Equipped with RoboData and the unified physical space, RoboMM is the generalist policy that enables simultaneous evaluation across all tasks within multiple datasets, rather than focusing on limited selection of data or tasks. Its design significantly enhances robotic manipulation performance, increasing the average sequence length on the CALVIN from 1.7 to 3.3 and ensuring cross-embodiment capabilities, achieving state-of-the-art results across multiple datasets.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07215</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07215</guid>
      <pubDate>Tue, 10 Dec 2024 06:11:59 GMT</pubDate>
      <author>Feng Yan</author>
      <enclosure url="https://arxiv.org/pdf/2412.07215.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Crack-EdgeSAM Self-Prompting Crack Segmentation System for Edge Devices</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07205.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07205&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07205&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Yingchu Wang&lt;/p&gt; &lt;p&gt;Structural health monitoring (SHM) is essential for the early detection of infrastructure defects, such as cracks in concrete bridge pier. but often faces challenges in efficiency and accuracy in complex environments. Although the Segment Anything Model (SAM) achieves excellent segmentation performance, its computational demands limit its suitability for real-time applications on edge devices. To address these challenges, this paper proposes Crack-EdgeSAM, a self-prompting crack segmentation system that integrates YOLOv8 for generating prompt boxes and a fine-tuned EdgeSAM model for crack segmentation. To ensure computational efficiency, the method employs ConvLoRA, a Parameter-Efficient Fine-Tuning (PEFT) technique, along with DiceFocalLoss to fine-tune the EdgeSAM model. Our experimental results on public datasets and the climbing robot automatic inspections demonstrate that the system achieves high segmentation accuracy and significantly enhanced inference speed compared to the most recent methods. Notably, the system processes 1024 x 1024 pixels images at 46 FPS on our PC and 8 FPS on Jetson Orin Nano.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07205</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07205</guid>
      <pubDate>Tue, 10 Dec 2024 05:50:50 GMT</pubDate>
      <author>Yingchu Wang</author>
      <enclosure url="https://arxiv.org/pdf/2412.07205.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Unified Vertex Motion Estimation for Integrated Video Stabilization and Stitching in Tractor-Trailer Wheeled Robots</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07154.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07154&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07154&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Hao Liang&lt;/p&gt; &lt;p&gt;Tractor-trailer wheeled robots need to perform comprehensive perception tasks to enhance their operations in areas such as logistics parks and long-haul transportation. The perception of these robots face three major challenges: the relative pose change between the tractor and trailer, the asynchronous vibrations between the tractor and trailer, and the significant camera parallax caused by the large size. In this paper, we propose a novel Unified Vertex Motion Video Stabilization and Stitching framework designed for unknown environments. To establish the relationship between stabilization and stitching, the proposed Unified Vertex Motion framework comprises the Stitching Motion Field, which addresses relative positional change, and the Stabilization Motion Field, which tackles asynchronous vibrations. Then, recognizing the heterogeneity of optimization functions required for stabilization and stitching, a weighted cost function approach is proposed to address the problem of camera parallax. Furthermore, this framework has been successfully implemented in real tractor-trailer wheeled robots. The proposed Unified Vertex Motion Video Stabilization and Stitching method has been thoroughly tested in various challenging scenarios, demonstrating its accuracy and practicality in real-world robot tasks.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07154</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07154</guid>
      <pubDate>Tue, 10 Dec 2024 03:22:39 GMT</pubDate>
      <author>Hao Liang</author>
      <enclosure url="https://arxiv.org/pdf/2412.07154.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>A Powered Prosthetic Hand with Vision System for Enhancing the Anthropopathic Grasp</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.07105.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07105&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.07105&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Yansong Xu&lt;/p&gt; &lt;p&gt;The anthropomorphism of grasping process significantly benefits the experience and grasping efficiency of prosthetic hand wearers. Currently, prosthetic hands controlled by signals such as brain-computer interfaces (BCI) and electromyography (EMG) face difficulties in precisely recognizing the amputees&#39; grasping gestures and executing anthropomorphic grasp processes. Although prosthetic hands equipped with vision systems enables the objects&#39; feature recognition, they lack perception of human grasping intention. Therefore, this paper explores the estimation of grasping gestures solely through visual data to accomplish anthropopathic grasping control and the determination of grasping intention within a multi-object environment. To address this, we propose the Spatial Geometry-based Gesture Mapping (SG-GM) method, which constructs gesture functions based on the geometric features of the human hand grasping processes. It&#39;s subsequently implemented on the prosthetic hand. Furthermore, we propose the Motion Trajectory Regression-based Grasping Intent Estimation (MTR-GIE) algorithm. This algorithm predicts pre-grasping object utilizing regression prediction and prior spatial segmentation estimation derived from the prosthetic hand&#39;s position and trajectory. The experiments were conducted to grasp 8 common daily objects including cup, fork, etc. The experimental results presented a similarity coefficient $R^{2}$ of grasping process of 0.911, a Root Mean Squared Error ($RMSE$) of 2.47\degree, a success rate of grasping of 95.43$\%$, and an average duration of grasping process of 3.07$\pm$0.41 s. Furthermore, grasping experiments in a multi-object environment were conducted. The average accuracy of intent estimation reached 94.35$\%$. Our methodologies offer a groundbreaking approach to enhance the prosthetic hand&#39;s functionality and provides valuable insights for future research.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.07105</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.07105</guid>
      <pubDate>Tue, 10 Dec 2024 01:45:14 GMT</pubDate>
      <author>Yansong Xu</author>
      <enclosure url="https://arxiv.org/pdf/2412.07105.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Ground Perturbation Detection via Lower-Limb Kinematic States During Locomotion</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06985.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06985&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06985&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Maria T. Tagliaferri&lt;/p&gt; &lt;p&gt;Falls during daily ambulation activities are a leading cause of injury in older adults due to delayed physiological responses to disturbances of balance. Lower-limb exoskeletons have the potential to mitigate fall incidents by detecting and reacting to perturbations before the user. Although commonly used, the standard metric for perturbation detection, whole-body angular momentum, is poorly suited for exoskeleton applications due to computational delays and additional tunings. To address this, we developed a novel ground perturbation detector using lower-limb kinematic states during locomotion. To identify perturbations, we tracked deviations in the kinematic states from their nominal steady-state trajectories. Using a data-driven approach, we further optimized our detector with an open-source ground perturbation biomechanics dataset. A pilot experimental validation with five able-bodied subjects demonstrated that our model detected ground perturbations with 97.8% accuracy and only a delay of 23.1% within the gait cycle, outperforming the benchmark by 46.8% in detection accuracy. The results of our study offer exciting promise for our detector and its potential utility to enhance the controllability of robotic assistive exoskeletons.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06985</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06985</guid>
      <pubDate>Mon, 09 Dec 2024 20:49:26 GMT</pubDate>
      <author>Maria T. Tagliaferri</author>
      <enclosure url="https://arxiv.org/pdf/2412.06985.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Collision-inclusive Manipulation Planning for Occluded Object Grasping via Compliant Robot Motions</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06983.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06983&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06983&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Kejia Ren&lt;/p&gt; &lt;p&gt;Traditional robotic manipulation mostly focuses on collision-free tasks. In practice, however, many manipulation tasks (e.g., occluded object grasping) require the robot to intentionally collide with the environment to reach a desired task configuration. By enabling compliant robot motions, collisions between the robot and the environment are allowed and can thus be exploited, but more physical uncertainties are introduced. To address collision-rich problems such as occluded object grasping while handling the involved uncertainties, we propose a collision-inclusive planning framework that can transition the robot to a desired task configuration via roughly modeled collisions absorbed by Cartesian impedance control. By strategically exploiting the environmental constraints and exploring inside a manipulation funnel formed by task repetitions, our framework can effectively reduce physical and perception uncertainties. With real-world evaluations on both single-arm and dual-arm setups, we show that our framework is able to efficiently address various realistic occluded grasping problems where a feasible grasp does not initially exist.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06983</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06983</guid>
      <pubDate>Mon, 09 Dec 2024 20:43:56 GMT</pubDate>
      <author>Kejia Ren</author>
      <enclosure url="https://arxiv.org/pdf/2412.06983.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Non-Prehensile Tool-Object Manipulation by Integrating LLM-Based Planning and Manoeuvrability-Driven Controls</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06931.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06931&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06931&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Hoi-Yin Lee&lt;/p&gt; &lt;p&gt;The ability to wield tools was once considered exclusive to human intelligence, but it&#39;s now known that many other animals, like crows, possess this capability. Yet, robotic systems still fall short of matching biological dexterity. In this paper, we investigate the use of Large Language Models (LLMs), tool affordances, and object manoeuvrability for non-prehensile tool-based manipulation tasks. Our novel method leverages LLMs based on scene information and natural language instructions to enable symbolic task planning for tool-object manipulation. This approach allows the system to convert the human language sentence into a sequence of feasible motion functions. We have developed a novel manoeuvrability-driven controller using a new tool affordance model derived from visual feedback. This controller helps guide the robot&#39;s tool utilization and manipulation actions, even within confined areas, using a stepping incremental approach. The proposed methodology is evaluated with experiments to prove its effectiveness under various manipulation scenarios.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06931</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06931</guid>
      <pubDate>Mon, 09 Dec 2024 19:21:05 GMT</pubDate>
      <author>Hoi-Yin Lee</author>
      <enclosure url="https://arxiv.org/pdf/2412.06931.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Haptics in Micro- and Nano-Manipulation</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06917.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06917&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06917&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Ahmet Fatih Tabak&lt;/p&gt; &lt;p&gt;One of the motivations for the development of wirelessly guided untethered magnetic devices (UMDs), such as microrobots and nanorobots, is the continuous demand to manipulate, sort, and assemble micro-objects with high level of accuracy and dexterity. UMDs can function as microgrippers or manipulators and move micro-objects with or without direct contact. In this case, the UMDs can be directly teleoperated by an operator using haptic tele-manipulation systems. The aim of this chapter is threefold: first, to provide a mathematical framework to design a scaled bilateral tele-manipulation system to achieve wireless actuation of micro-objects using magnetically-guided UMDs; second, to demonstrate closed-loop stability based on absolute stability theory; third, to provide experimental case studies performed on haptic devices to manipulate microrobots and assemble micro-objects. In this chapter, we are concerned with some fundamental concepts of electromagnetics and low-Reynolds number hydrodynamics to understand the stability and performance of haptic devices in micro- and nano-manipulation applications.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06917</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06917</guid>
      <pubDate>Mon, 09 Dec 2024 19:04:36 GMT</pubDate>
      <author>Ahmet Fatih Tabak</author>
      <enclosure url="https://arxiv.org/pdf/2412.06917.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Bio-Inspired Pneumatic Modular Actuator for Peristaltic Transport</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06823.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06823&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06823&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Brian Ye&lt;/p&gt; &lt;p&gt;While its biological significance is well-documented, its application in soft robotics, particularly for the transport of fragile and irregularly shaped objects, remains underexplored. This study presents a modular soft robotic actuator system that addresses these challenges through a scalable, adaptable, and repairable framework, offering a cost-effective solution for versatile applications. The system integrates optimized donut-shaped actuation modules and utilizes real-time pressure feedback for synchronized operation, ensuring efficient object grasping and transport without relying on intricate sensing or control algorithms. Experimental results validate the system`s ability to accommodate objects with varying geometries and material characteristics, balancing robustness with flexibility. This work advances the principles of peristaltic actuation, establishing a pathway for safely and reliably manipulating delicate materials in a range of scenarios.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06823</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06823</guid>
      <pubDate>Fri, 06 Dec 2024 05:21:15 GMT</pubDate>
      <author>Brian Ye</author>
      <enclosure url="https://arxiv.org/pdf/2412.06823.pdf" type="application/pdf"></enclosure>
    </item>
    <item>
      <title>Effect of Adaptive Communication Support on Human-AI Collaboration</title>
      <description>&lt;a href=&quot;https://arxiv.org/pdf/2412.06808.pdf&quot;&gt;[PDF]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06808&quot;&gt;[Site]&lt;/a&gt; &lt;a href=&quot;https://papers.cool/arxiv/2412.06808&quot;&gt;[Kimi]&lt;/a&gt; &lt;p&gt;&lt;b&gt;Authors:&lt;/b&gt; Shipeng Liu&lt;/p&gt; &lt;p&gt;Effective human-AI collaboration requires agents to adopt their roles and levels of support based on human needs, task requirements, and complexity. Traditional human-AI teaming often relies on a pre-determined robot communication scheme, restricting teamwork adaptability in complex tasks. Leveraging the strong communication capabilities of Large Language Models (LLMs), we propose a Human-Robot Teaming Framework with Multi-Modal Language feedback (HRT-ML), a framework designed to enhance human-robot interaction by adjusting the frequency and content of language-based feedback. The HRT-ML framework includes two core modules: a Coordinator for high-level, low-frequency strategic guidance and a Manager for task-specific, high-frequency instructions, enabling passive and active interactions with human teammates. To assess the impact of language feedback in collaborative scenarios, we conducted experiments in an enhanced Overcooked-AI game environment with varying levels of task complexity (easy, medium, hard) and feedback frequency (inactive, passive, active, superactive). Our results show that as task complexity increases relative to human capabilities, human teammates exhibited stronger preferences toward robotic agents that can offer frequent, proactive support. However, when task complexities exceed the LLM&#39;s capacity, noisy and inaccurate feedback from superactive agents can instead hinder team performance, as it requires human teammates to increase their effort to interpret and respond to the large amount of communications, with limited performance return. Our results offer a general principle for robotic agents to dynamically adjust their levels and frequencies of communication to work seamlessly with humans and achieve improved teaming performance.&lt;/p&gt; </description>
      <link>https://papers.cool/arxiv/2412.06808</link>
      <guid isPermaLink="false">https://papers.cool/arxiv/2412.06808</guid>
      <pubDate>Tue, 26 Nov 2024 00:06:47 GMT</pubDate>
      <author>Shipeng Liu</author>
      <enclosure url="https://arxiv.org/pdf/2412.06808.pdf" type="application/pdf"></enclosure>
    </item>
  </channel>
</rss>
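
For reference on the single <author> element emitted per <item> above, here is a minimal sketch of how one (first) author can be read from an upstream feed with rss-parser. This is not the route's actual implementation: the upstream field names (dc:creator or a plain <author> element) and the feed URL pattern in the usage comment are assumptions.

```ts
// Minimal sketch only; the upstream feed's field names are assumptions.
import Parser from 'rss-parser';

// Map a plain <author> element (if present) onto item.author via customFields;
// rss-parser already exposes <dc:creator> as item.creator by default.
type CoolPaperItem = { author?: string };

const parser = new Parser<{ [key: string]: unknown }, CoolPaperItem>({
    customFields: { item: ['author'] },
});

async function firstAuthors(feedUrl: string) {
    const feed = await parser.parseURL(feedUrl);
    return feed.items.map((item) => ({
        title: item.title,
        link: item.link,
        // At most one name per entry, mirroring the single <author> per <item>
        // in the generated output shown above.
        firstAuthor: item.creator ?? item.author ?? '',
    }));
}

// Hypothetical usage (the category feed URL pattern is a placeholder):
// firstAuthors('https://papers.cool/arxiv/<category>/feed').then(console.log);
```

The sketch intentionally returns a single author per entry, matching the test output above.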

@TonyRL merged commit 7c5a04f into DIYgod:master on Dec 11, 2024
37 of 38 checks passed
artefaritaKuniklo pushed a commit to artefaritaKuniklo/RSSHub that referenced this pull request Dec 13, 2024
* [update] update cool paper

* [add] add author of cool paper