Publications

Voice-controlled user interfaces for effortless navigation in medical virtual reality environments

Jan Hombeck, Henrik Voigt, Kai Lawonn

In various situations, such as clinical environments with sterile conditions or when the hands are occupied with multiple devices, conventional methods of navigation and scene adjustment are impractical or even impossible. We explore a new solution that uses voice control to facilitate interaction in virtual worlds and to avoid the need for additional controllers. To this end, we investigate three scenarios (object orientation, visualization adjustment, and analytical tasks) and evaluate whether natural language interaction is feasible and promising in each of them. In our quantitative user study, participants were able to control virtual environments effortlessly using verbal instructions. This resulted in rapid orientation adjustments, adaptive visual aids, and precise data analyses. In addition, surveys on user satisfaction and usability showed consistently high levels of acceptance and ease of use. In summary, our study shows that natural language can be a promising alternative for improving user interaction in virtual environments. It enables intuitive interaction in virtual spaces, especially in situations where conventional controls reach their limits.

Beyond buttons: A user-centric approach to hands-free locomotion in Virtual Reality via voice commands

Jan Hombeck, Henrik Voigt, Kai Lawonn

Using speech for hands-free interaction in Virtual Reality (VR) is gaining popularity, supported by advances in natural language processing that enable accurate speech-to-text transcription and intent inference. Prior work has shown that voice-based navigation is effective when hand use is restricted or occupied, particularly when visually salient points in the scene, known as landmarks, can be identified and articulated. However, in many virtual environments, such landmarks may be absent or not easily recognizable, limiting the applicability of voice commands. To address this, we propose three landmark-free, user-centered coordinate systems to support speech-based locomotion in VR: Cartesian, cylindrical, and spherical. Each system uses a distinct encoding and interaction style. The Cartesian system employs a simple three-digit code and requires low cognitive effort. The spherical system prioritizes precision but demands higher mental effort. The cylindrical system combines elements of both, offering a balance between usability and accuracy. We evaluated these systems in a quantitative user study with 24 participants, all with backgrounds in information technology. The study assessed the systems’ effectiveness for object positioning and navigation, comparing them to teleportation, the standard non-voice locomotion method in VR. Results show that the Cartesian system enables faster and more intuitive navigation compared to the spherical system. Distinct usage patterns were observed, with users predominantly focusing on the central field of view during navigation. Our findings indicate that user-centered coordinate systems are practical to implement and present a viable alternative for speech-driven, hands-free navigation in VR.
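As a rough illustration of how user-centered coordinates could be turned into a target position, the following Python sketch converts cylindrical and spherical coordinates into a Cartesian offset relative to the user and then into world space. The axis convention (x right, y up, z forward), the angle definitions, and all function names are assumptions made for this example; the abstract does not specify the encoding or the spoken codes used in the study.

```python
import math

def cylindrical_to_local(radius, azimuth_deg, height):
    """Cylindrical (radius, azimuth, height) -> local (x, y, z).

    Assumed convention: azimuth 0 is straight ahead (+z), positive to the right.
    """
    az = math.radians(azimuth_deg)
    return (radius * math.sin(az), height, radius * math.cos(az))

def spherical_to_local(radius, azimuth_deg, elevation_deg):
    """Spherical (radius, azimuth, elevation) -> local (x, y, z).

    Assumed convention: elevation 0 is the horizontal plane, 90 is straight up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = radius * math.cos(el)
    return (horizontal * math.sin(az), radius * math.sin(el), horizontal * math.cos(az))

def local_to_world(local, user_pos, user_yaw_deg):
    """Rotate a local offset by the user's yaw (about the y axis) and translate."""
    x, y, z = local
    yaw = math.radians(user_yaw_deg)
    world_x = x * math.cos(yaw) + z * math.sin(yaw)
    world_z = -x * math.sin(yaw) + z * math.cos(yaw)
    return (user_pos[0] + world_x, user_pos[1] + y, user_pos[2] + world_z)

# Hypothetical usage: "two meters ahead, 90 degrees to the right, one meter up".
target = local_to_world(cylindrical_to_local(2.0, 90.0, 1.0), (0.0, 0.0, 0.0), 0.0)
```

A Cartesian target needs no conversion step under this convention, since it is already an (x, y, z) offset that can be passed to `local_to_world` directly.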

AortaAnalyzer: Interactive, integrated CTA aorta segmentation and quantitative analysis platform

Fabienne von Deylen, Pepe Eulzer, Kai Lawonn

The diagnosis of aortic diseases could be significantly enhanced with modern advances in model-based vessel visualization, objective parameter quantification, and information gained through numerical blood flow simulation. Most state-of-the-art methods, however, require heavy processing and are often split across various frameworks that require setting up complex workflows, making many clinical applications unrealistic and hindering research on large datasets. We present the AortaAnalyzer, a unified, end-to-end pipeline for processing computed-tomography angiography (CTA) of the aorta, integrating a state-of-the-art 3D segmentation network (Dice 0.95 ± 0.01, HD95 5.25 ± 5.73 mm), interactive correction tools, automated surface extraction, robust centerline computation, inlet/outlet capping for numerical hemodynamics, and clinical metric quantification. All modules share a single GUI, use standard formats (nrrd, STL, OBJ, CSV), and propagate changes automatically, eliminating complex multi-tool workflows. We developed the framework in an iterative process based on evaluations with seven independent experts: two numerical hemodynamics researchers, two vessel visualization researchers, two cardiac surgeons, and one radiologist. The framework received high usefulness ratings, and feature requests drove the addition of surface capping and extended metric measurements. To assess efficiency, we compared processing time against 3D Slicer and SimVascular. The AortaAnalyzer demonstrated increased robustness and required substantially less manual interaction and overall processing time. AortaAnalyzer supports both clinical assessment and research purposes by providing rapid visualization of the vessel morphology, reproducible diameter, volume, and landmark analysis, and accelerated pre-processing for blood-flow simulation. It is open access and serves as an extendable platform.
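For readers unfamiliar with the Dice score reported for the segmentation network, the following Python sketch computes the standard Dice coefficient between two binary masks. It is a generic illustration and not code from the AortaAnalyzer; the function name, the flat-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
def dice_coefficient(prediction, reference):
    """Dice = 2 * |A and B| / (|A| + |B|) for two equally long binary masks."""
    if len(prediction) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(prediction, reference) if p and r)
    total = sum(1 for p in prediction if p) + sum(1 for r in reference if r)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * overlap / total

# Hypothetical flattened masks (1 = vessel voxel, 0 = background).
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 0, 0]
score = dice_coefficient(predicted_mask, reference_mask)
```

A value of 1.0 means the two masks overlap perfectly, and 0.0 means they share no foreground voxels.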

Proof of concept of a static approach to determine mechanical tissue properties during tumor surgery

Max Jäger, Katja Uhrhan, Christine Mucha, María Alejandra Guzmán Alfaro, Hartmut Witte

The mechanical properties of tumor tissue differ from those of healthy tissue. Therefore, surgeons palpate accessible surgical sites to determine tumor boundaries prior to resection. However, palpation is not possible during minimally invasive surgery, so instrumented palpation is required instead. This study investigates the suitability of an engineering method that combines mechanical object scanning and indentation to determine Young’s modulus of soft, tissue-like materials. To establish a defined reference, we tested our concept on silicone phantoms containing stiff tumor-like inclusions. We used a sensor consisting of a load cell connected to a rigid probe with a spherical indenter tip. Young’s modulus was calculated from the measured force, the indentation depth, and the indenter geometry. These results were compared with those of a palpation experiment on the same specimens, conducted with surgeons. In validation, the error in the estimated Young’s modulus was 6.7% for the soft material and 44.9% for the stiff material. Repeatability is high, with a standard deviation below 7%. By scanning a phantom and creating a stiffness image, we were able to identify the location and shape of the inclusion more clearly than experienced surgeons could using manual palpation. Looking ahead, the prospect of miniaturizing the presented technique for localizing tumor boundaries during surgery seems promising.
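To make the relationship between force, indentation depth, and indenter geometry concrete, the following Python sketch estimates Young's modulus with the classical Hertz contact model for a rigid sphere pressed into an elastic half-space, F = (4/3) · E/(1 − ν²) · √R · d^(3/2). Using the Hertz model is an assumption made for this example, as the abstract does not name the contact model; the function names and the default Poisson's ratio of 0.5 (a nearly incompressible material) are likewise hypothetical.

```python
import math

def hertz_force(youngs_modulus, depth, radius, poisson=0.5):
    """Force (N) on a rigid sphere of given radius (m) at indentation depth (m)."""
    reduced_modulus = youngs_modulus / (1.0 - poisson ** 2)
    return (4.0 / 3.0) * reduced_modulus * math.sqrt(radius) * depth ** 1.5

def hertz_youngs_modulus(force, depth, radius, poisson=0.5):
    """Invert the Hertz relation to estimate Young's modulus (Pa)."""
    if depth <= 0 or radius <= 0:
        raise ValueError("depth and radius must be positive")
    return 3.0 * force * (1.0 - poisson ** 2) / (4.0 * math.sqrt(radius) * depth ** 1.5)

def stiffness_image(force_grid, depth, radius, poisson=0.5):
    """Map a 2D grid of measured forces at a fixed depth to a grid of moduli."""
    return [[hertz_youngs_modulus(f, depth, radius, poisson) for f in row]
            for row in force_grid]

def relative_error(estimate, reference):
    """Relative estimation error as a fraction of the reference value."""
    return abs(estimate - reference) / reference
```

In a scan, each grid cell would hold the force measured at one probe position, and cells with a higher estimated modulus would indicate a stiffer inclusion. The Hertz model assumes small indentations relative to the indenter radius, which may not hold in every measurement.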

Gradient Extrapolation for Debiased Representation Learning

Machine learning classification models trained with empirical risk minimization (ERM) often inadvertently rely on spurious correlations. When absent in the test data, these unintended associations between non-target attributes and target labels lead to poor generalization. This paper addresses this problem from a model optimization perspective and proposes a novel method, Gradient Extrapolation for Debiased Representation Learning (GERNE), designed to learn debiased representations in both known and unknown attribute training cases. GERNE uses two distinct batches with different amounts of spurious correlations and defines the target gradient as a linear extrapolation of the gradients computed from each batch’s loss. Our analysis shows that when the extrapolated gradient points toward the batch gradient with fewer spurious correlations, it effectively guides training toward learning a debiased model. GERNE serves as a general framework for debiasing, encompassing ERM and resampling methods as special cases. We derive the theoretical upper and lower bounds of the extrapolation factor employed by GERNE. By tuning this factor, GERNE can adapt to maximize either Group-Balanced Accuracy (GBA) or Worst-Group Accuracy (WGA). We validate GERNE on five vision benchmarks and one NLP benchmark, demonstrating competitive and often superior performance compared to state-of-the-art baselines. The project page is available at: https://gerne-debias.github.io/.
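The idea of linearly extrapolating between two batch gradients, and the two group metrics named above, can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the authors' implementation: the parameterization g = g_lb + c · (g_lb − g_b), the function names, and the plain-list gradient representation are all chosen for this example. GBA is computed as the mean of the per-group accuracies and WGA as their minimum.

```python
def extrapolate_gradient(grad_biased, grad_less_biased, factor):
    """Assumed form: g = g_lb + factor * (g_lb - g_b), applied element-wise."""
    return [lb + factor * (lb - b) for b, lb in zip(grad_biased, grad_less_biased)]

def group_accuracies(predictions, labels, groups):
    """Accuracy per group, where groups[i] identifies the group of sample i."""
    correct, counts = {}, {}
    for pred, label, group in zip(predictions, labels, groups):
        counts[group] = counts.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (1 if pred == label else 0)
    return {g: correct[g] / counts[g] for g in counts}

def group_balanced_accuracy(predictions, labels, groups):
    """Mean of the per-group accuracies."""
    accs = group_accuracies(predictions, labels, groups)
    return sum(accs.values()) / len(accs)

def worst_group_accuracy(predictions, labels, groups):
    """Minimum of the per-group accuracies."""
    return min(group_accuracies(predictions, labels, groups).values())

# Hypothetical gradients from a more biased and a less biased batch.
g_b, g_lb = [1.0, 0.0], [0.5, 0.5]
g_target = extrapolate_gradient(g_b, g_lb, 1.0)  # steps past g_lb, away from g_b
```

Under this assumed parameterization, a factor of 0 reproduces the less biased batch gradient and a factor of −1 reproduces the more biased one, while positive factors push the update further away from the biased direction.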