09:00-09:50 Session Keynote2-ENGAGE: Prof. Alyn Rockwood, Boulder Graphics

Zoom Link:         Meeting ID: 825 6139 3464, Password: cgi2023


Prof. Alyn Rockwood, Chief Scientist, Boulder Graphics

Talk Title: TBD

Abstract: TBD

Bio: Alyn Rockwood is Chief Scientist at Boulder Graphics, developing 3D computer graphics. Until recently, he was Professor of Applied Mathematics and Associate Director of the Geometric Modeling and Scientific Visualization Research Center at King Abdullah University of Science & Technology (KAUST) in Saudi Arabia. Dr. Rockwood has been involved with computer graphics and related research for more than 35 years. At the pioneering graphics company Evans and Sutherland, he led a team that first achieved certification for a pilot training simulator, which allowed pilots to train completely for new aircraft on a simulator. At Silicon Graphics, Inc., he developed the method for rendering curved surfaces in real time that is integral to OpenGL today. He was SIGGRAPH Papers Chair in 1999, Conference Chair in 2003 and SIGGRAPH Asia Papers Chair in 2013. Before moving to KAUST, Dr. Rockwood held academic positions at both Arizona State University and Colorado School of Mines. He has received several teaching awards, the COFES 2007 Innovation in Technology Award, the CAD Society “Heroes of Engineering” Award, and the SIGGRAPH Outstanding Service Award (2017). He received his PhD in applied mathematics from Cambridge University, UK.

Eckhard Hitzer (International Christian University, Japan)
09:00-09:30 Session WC-3DMed-Keynote1: Special Session - 3D Medical Image Processing, Quality Enhancement and Analysis, Keynote Speaker 1: Yudong Zhang, Chair Professor, School of Computing and Mathematical Sciences, University of Leicester

Zoom Link:     Meeting ID: 847 8718 3153, Password: cgi2023


Title: Recent Advances in Medical Image Processing and Analysis


The medical image processing and analysis field has witnessed remarkable advancements in recent years, largely attributed to the incredible potential of artificial intelligence and deep learning theories and techniques. This talk aims to provide an overview of our group’s advancements in artificial intelligence in medical image processing and analysis. The talk will begin with an introduction to deep learning and its vital variants, such as convolutional neural networks, advanced pooling networks, graph convolutional networks, attention neural networks, weakly supervised networks, vision transformers, etc. We will explore how these neural networks can be tailored and applied to various medical imaging modalities, including magnetic resonance imaging, computed tomography, and histopathology slides. Furthermore, we will discuss the challenges faced in medical image processing and analysis, such as limited labeled data, class imbalance, and interpretability, and delve into the theories and techniques employed to mitigate these issues.


Prof. Yudong Zhang is a Chair Professor at the School of Computing and Mathematical Sciences, University of Leicester, UK. His research interests include deep learning and medical image analysis. He is a Fellow of IET, EAI, and BCS, a Senior Member of IEEE and ACM, and a Distinguished Speaker of ACM. He was a 2019, 2021 and 2022 recipient of the Clarivate Highly Cited Researcher designation. He has (co)authored over 400 peer-reviewed articles, including more than 60 ESI Highly Cited Papers and 6 ESI Hot Papers. His citations have reached 27,567 on Google Scholar (h-index 91). He is an editor of Neural Networks, IEEE TITS, IEEE TCSVT, IEEE JBHI, etc. He has conducted many successful industrial projects and academic grants from NIH, Royal Society, British Council, GCRF, EPSRC, MRC, BBSRC, Hope, and NSFC. He has served as (Co-)Chair for more than 60 international conferences (including more than 20 IEEE or ACM conferences). More than 70 news outlets have reported on his research outputs, including Reuters, BBC, Telegraph, Mirror, Physics World, and UK Today News.

Xiaohong Liu (Shanghai Jiao Tong University, China)
09:30-10:00 Session WC-3DMed-Keynote2: Special Session - 3D Medical Image Processing, Quality Enhancement and Analysis, Keynote Speaker 2: Lichi Zhang, Associate Professor, School of Biomedical Engineering, Shanghai Jiao Tong University

Zoom Link:     Meeting ID: 847 8718 3153, Password: cgi2023

Title: Intelligent Medical Image Analysis and Computer-aided Diagnosis

Abstract: Medical image analysis and computer-aided diagnosis are in high demand nowadays, as they can help doctors reduce the diagnostic burden and resolve subjectivity in the interpretation of medical images. Recently there have been significant advancements in these fields with the integration of deep learning techniques, which have developed rapidly over the last decade. However, several challenges need to be addressed in actual clinical scenarios, including the high variability and complex anatomical structures of medical images, the lack of interpretability in deep learning models, and limitations in data collection for model training. This talk will introduce our recent research in the field of medical image analysis and computer-aided diagnosis, which consists of three parts: brain MR image processing, computer-aided diagnosis for knee osteoarthritis (OA), and TCT histopathology image processing and high-throughput screening. I will also present the methods relevant to these topics, such as image segmentation, object detection, and image reconstruction, and how they can overcome the aforementioned challenges.


Lichi Zhang is an Associate Professor at the School of Biomedical Engineering, Shanghai Jiao Tong University. He received a Ph.D. degree in computer science from the University of York, UK, and a BS degree in network engineering from Beijing University of Posts and Telecommunications, China. From 2014 to 2017, he was a postdoctoral researcher at the University of North Carolina at Chapel Hill, US, and Shanghai Jiao Tong University, China. He was selected for the Shanghai Pujiang Talent Program, and has also hosted and participated in National Natural Science Foundation of China grants, a National Key Research and Development Program of China grant, and others. His research interests include medical image analysis, computer-aided diagnosis and computer vision. He has published more than 90 academic papers in Medical Image Analysis, IEEE TMI, Pattern Recognition, NPJ Digital Medicine, MICCAI and other journals and conferences renowned in the fields of medical image analysis and computer vision. He also serves as Junior Editor of the journal Aging and Disease and Guest Associate Editor of Frontiers in Neuroscience.

Xiaohong Liu (Shanghai Jiao Tong University, China)
09:50-10:30 Session WC1-ENGAGE1: Empowering Novel Geometric Algebra for Graphics & Engineering Workshop 1

Zoom Link:         Meeting ID: 825 6139 3464, Password: cgi2023

Eckhard Hitzer (International Christian University, Japan)
Clément Chomicki (Université Gustave Eiffel, LIGM, France)
Stéphane Breuils (LAMA, France)
Venceslas Biri (Université Gustave Eiffel, LIGM, France)
Vincent Nozick (Université Gustave Eiffel, LIGM, France)
Intersection of conic sections using geometric algebra

ABSTRACT. Conic sections are extensively encountered in a wide range of disciplines, including optics, physics, and various other fields. Consequently, the geometric algebra community is actively engaged in developing frameworks that enable efficient support and manipulation of conic sections. Conic-conic intersection objects are known and supported by algebras specialized in conic section representation, but there is as yet no elegant formula to extract the intersection points from them. This paper proposes a method for point extraction from a conic intersection through the concept of pencils. It is based on QC2GA, the 2D version of QCGA, which we also prove to be equivalent to GAC.
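As a rough illustration of the pencil concept mentioned in this abstract (plain linear algebra, not the paper's QC2GA formulation), the sketch below stores two conics as symmetric 3x3 matrices and checks that every member of the pencil lam*C1 + mu*C2 passes through their common points. The two circles and the helper names `conic_value` and `pencil` are assumptions chosen for the example.

```python
import math

def conic_value(C, point):
    """Evaluate p^T C p for the homogeneous point p = (x, y, 1)."""
    v = (point[0], point[1], 1.0)
    return sum(v[i] * C[i][j] * v[j] for i in range(3) for j in range(3))

def pencil(C1, C2, lam, mu):
    """Return the pencil member lam*C1 + mu*C2 as a 3x3 matrix."""
    return [[lam * C1[i][j] + mu * C2[i][j] for j in range(3)] for i in range(3)]

# Illustrative conics: unit circle x^2 + y^2 - 1 = 0 and
# the circle of radius 1 centred at (1, 0): x^2 + y^2 - 2x = 0.
C1 = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
C2 = [[1, 0, -1], [0, 1, 0], [-1, 0, 0]]

# Their real intersection points are (1/2, +sqrt(3)/2) and (1/2, -sqrt(3)/2).
common_points = [(0.5, math.sqrt(3) / 2), (0.5, -math.sqrt(3) / 2)]

# Every member of the pencil vanishes at the common points.
residuals = [
    abs(conic_value(pencil(C1, C2, lam, mu), p))
    for lam, mu in [(1, 0), (0, 1), (1, -1), (2, 3)]
    for p in common_points
]
print(max(residuals))
```

The member with (lam, mu) = (1, -1) reduces to 2x - 1 = 0, a degenerate conic whose finite part is the line through both intersection points; degenerate members of this kind are what make a pencil useful for point extraction.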

Jiyi Zhang (Nantong University, China)
Tianzi Wei (School of Business, Nantong Institute of Technology, Nantong, China)
Fan Yang (School of Geographical Science, Nantong University, Nantong, China)
Yingying Wei (School of Geographical Science, Nantong University, Nantong, China)
Jingyu Wang (School of Geographical Science, Nantong University, Nantong, China)
A multi-dimensional unified concavity and convexity detection method based on geometric algebra

ABSTRACT. The detection of concavity and convexity of vertices and edges of three-dimensional (3D) geometric objects is a classic problem in the field of computer graphics. Because it is the foundation of other related graphics algorithms and operations, scholars have proposed many algorithms for determining the concavity and convexity of vertices and edges. However, existing concavity and convexity detection algorithms mainly focus on vertices, and there is little research on concavity and convexity detection methods for edges of 3D geometric objects. In addition, existing algorithms often require different detection methods for two-dimensional (2D) planar geometric objects and 3D spatial geometric objects. This means that the structure of those algorithms becomes very complex when dealing with concavity and convexity judgments involving both planar polygon vertices and 3D geometric object edges. To solve the above problems, this paper designs a multi-dimensional unified concavity and convexity detection framework for geometric objects, taking advantage of geometric algebra's multi-dimensional unified expression and calculation. The proposed method can not only detect the concavity and convexity of planar polygon vertices and 3D geometric object vertices based on unified rules, but also detect the concavity and convexity of 3D geometric object edges on this basis. By unifying the framework and detection rules of concavity detection algorithms for geometric objects of different dimensions, the complexity of simultaneously detecting the concavity of planar polygon vertices and of 3D geometric object vertices and edges can be effectively reduced.
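The abstract does not spell out its unified detection rule, so the following is only a minimal sketch of the familiar 2D case: the sign of the outer (wedge) product of the incoming and outgoing edge vectors at a vertex tells whether the vertex is convex or concave. It assumes a simple polygon with counter-clockwise vertex order; the function names and the sample polygon are hypothetical.

```python
def wedge2d(a, b):
    """Scalar coefficient of the 2D outer product a ^ b."""
    return a[0] * b[1] - a[1] * b[0]

def classify_vertices(polygon):
    """Label each vertex of a counter-clockwise simple polygon.

    A positive wedge of (incoming edge, outgoing edge) means a left turn,
    i.e. a convex vertex; a negative wedge means a concave vertex.
    """
    labels = []
    n = len(polygon)
    for i in range(n):
        prev, cur, nxt = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        e_in = (cur[0] - prev[0], cur[1] - prev[1])
        e_out = (nxt[0] - cur[0], nxt[1] - cur[1])
        w = wedge2d(e_in, e_out)
        labels.append("convex" if w > 0 else "concave" if w < 0 else "flat")
    return labels

# Counter-clockwise polygon with a notch at (2, 1).
polygon = [(0, 0), (4, 0), (4, 4), (2, 1), (0, 4)]
labels = classify_vertices(polygon)
print(labels)
```

Only the notch vertex (2, 1) produces a negative wedge, so it is the single vertex labelled concave.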

10:00-10:30 Session WC2-3DMed1

Zoom Link:     Meeting ID: 847 8718 3153, Password: cgi2023



Xiaohong Liu (Shanghai Jiao Tong University, China)
Qiuhui Yang (School of Faculty of Applied Sciences, Macao Polytechnic University, China)
Hao Chen (Jiangsu JITRI Sioux Technologies Co., Ltd., China)
Mingfeng Jiang (School of Computer Sciences and Technology, Zhejiang Sci-Tech University, China)
Mingwei Wang (Department of Cardiovascular Medicine, Affiliated Hospital of Hangzhou Normal University, China)
Jiong Zhang (Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, China)
Yue Sun (School of Faculty of Applied Sciences, Macao Polytechnic University, China)
Tao Tan (School of Faculty of Applied Sciences, Macao Polytechnic University, China)
A hybrid supervised fusion deep learning framework for microscope multi-focus images

ABSTRACT. The quality of multi-focus microscopic image fusion hinges upon the precision of the image registration technology. However, registration algorithms tailored specifically for multi-focus microscopic images are lacking. Due to the fuzzy regions and weak textures of multi-focus microscope images, the registration of patches is suboptimal. To address these problems, this paper formulates a hybrid supervised deep learning model that improves the accuracy of registration and fusion. The model's ability to generalize to the actual deformation field is enhanced by an artificial deformation field. A patch movement simulation step is employed to blur the multi-focus microscopic images and generate synthetic flow, thus emulating distinct fuzzy regions in the two images to be registered and consequently enhancing the model's generalization ability. The experiments demonstrate that our proposed approach is superior to existing registration algorithms and improves the accuracy of image fusion.

Yu Feng (East China Normal University, China)
Tai Ma (East China Normal University, China)
Hao Zeng (East China Normal University, China)
Zhengke Xu (East China Normal University, China)
Suwei Zhang (East China Normal University, China)
Ying Wen (East China Normal University, China)
ScaleNet: Rethinking Feature Interaction From A Scale-wise Perspective For Medical Image Segmentation

ABSTRACT. Recently, vision transformers have become outstanding segmentation structures for their remarkable global modeling capability. In current transformer-based models for medical image segmentation, convolutional layers are often replaced by transformers, or transformers are added to the deepest layer of the encoder to learn the global context. However, for the extracted multi-scale feature information, most existing methods tend to ignore the multi-scale dependencies, which leads to inadequate feature learning and fails to produce rich feature representations. In this paper, we propose ScaleNet, which approaches the problem from the perspective of feature interaction at different scales and can alleviate the aforementioned problems. Specifically, our approach consists of two multi-scale feature interaction modules: the spatial scale interaction (SSI) and the channel scale interaction (CSI). SSI uses a transformer to aggregate patches from different scale features to enhance the feature representations at the spatial scale. CSI uses a 1D convolutional layer and a fully connected layer to perform a global fusion of multi-level features at the channel scale. The combination of CSI and SSI enables ScaleNet to emphasize multi-scale dependencies and effectively resolve complex scale variations. Extensive experiments on multi-organ (Synapse, ACDC) and skin lesion segmentation tasks (ISIC 2018) demonstrate the superior performance of ScaleNet compared to previous works.

10:30-11:00 Coffee Break
11:00-12:30 Session WC3-ENGAGE2: Empowering Novel Geometric Algebra for Graphics & Engineering Workshop 2

Zoom Link:         Meeting ID: 825 6139 3464, Password: cgi2023

Dmitry Shirokov (HSE University, Russia)
Kapila Attele (Chicago State University, United States)
Foundations of Geometric Algebras

ABSTRACT. In a foundational expository paper, geometric algebra objects are rigorously established, paying particular attention to complications caused by the presence of null vectors.

Jian Wang (Nanjing Normal University, China)
Ziqiang Wang (Nanjing Normal University, China)
Han Wang (Nanjing Normal University, China)
Wen Luo (Nanjing Normal University, China)
Linwang Yuan (Nanjing Normal University, China)
Guonian Lü (Nanjing Normal University, China)
Zhaoyuan Yu (Nanjing Normal University, China)
Large Language Model for Geometric Algebra: A preliminary attempt

ABSTRACT. Considering geometric algebra’s status as the unified language of mathematics, physics, and engineering in the 21st century, which coincides with the era of artificial intelligence, the utilization of a Large Language Model (LLM) can greatly benefit the learning and application of geometric algebra. This article aims to explore the fusion of geometric algebra and a large-scale language model from multiple perspectives, including concepts, operators, equations, and computer programming. This study developed a representative system based on the ggml-ggml-nous-gpt4-vicuna-13b model. The study collected 20,711 papers and books from sources such as arXiv and AACA and extensively explored key terms in geometric algebra, such as affine space, homogeneous space, and conformal space, establishing a repository of geometric algebra knowledge. Through the process of data extraction and transformation, mathematical and code languages are converted into text suitable for model learning, thus establishing a knowledge base for geometric algebra. Furthermore, this system has the capability of iterative refinement, strengthening its understanding and reasoning of geometric algebra knowledge. It has accomplished the textual summarization of research content, methods, innovations, and conclusions. It offers interactive question-and-answer sessions about geometric algebra concepts, such as vectors, points, lines, planes, stations, projections, and operations, including inner and outer products. Additionally, it facilitates the development of tailored learning plans for students from diverse fields to acquire knowledge of geometric algebra in their respective domains.

Ed Saribatir (Intelligent Computing and Systems Lab, Australian Artificial Intelligence Institute, University of Technology Sydney, Australia)
Niko Zurstrassen (RWTH Aachen University, Germany)
Dietmar Hildenbrand (Technische Universitaet Darmstadt, Germany)
Florian Stock (Technische Universitaet Darmstadt, Germany)
Atilio Morillo Piña (Applied Math Research Center (CIMA), Engineering School, The University of Zulia, Maracaibo, Venezuela)
Frederic von Wegner (School of Biomedical Sciences, The University of New South Wales, Sydney, Australia)
Zheng Yan (Intelligent Computing and Systems Lab, Australian Artificial Intelligence Institute, University of Technology Sydney, Australia)
Shiping Wen (Intelligent Computing and Systems Lab, Australian Artificial Intelligence Institute, University of Technology Sydney, Australia)
Matthew Arnold (School of Mathematical and Physical Sciences, University of Technology Sydney, Australia)
Game Physics Engine Using Optimised Geometric Algebra RISC-V Vector Extensions Code Using Fourier Series Data
PRESENTER: Ed Saribatir

ABSTRACT. We describe an example of using a Geometric Algebra algorithm to compute motion in a game physics engine. We optimise the Geometric Algebra algorithm using GAALOP and utilise RISC-V Vector Extension (RVV) operations to perform computations on vectors. We combine this with vectors that represent a number of Fourier series modelling the x, y and z components of gravity, wind and surface friction.

Eckhard Hitzer (International Christian University, Japan)
Quadratic Phase Quaternion Domain Fourier Transform

ABSTRACT. Based on the quaternion domain Fourier transform (QDFT) of 2016 and the quadratic-phase Fourier transform of 2018, we introduce the quadratic-phase quaternion domain Fourier transform (QPQDFT) and study some of its properties, like its representation in terms of the QDFT, linearity, Riemann-Lebesgue lemma, shift and modulation, scaling, inversion, Parseval type identity, Plancherel theorem, directional uncertainty principle, and the (direction-independent) uncertainty principle. The generalization thus achieved includes the special cases of QDFT, a quaternion domain (QD) fractional Fourier transform, and a QD linear canonical transform.

11:00-12:30 Session WC4-3DMed2

Zoom Link:     Meeting ID: 847 8718 3153, Password: cgi2023



Xiaohong Liu (Shanghai Jiao Tong University, China)
Zhengke Xu (East China Normal University, China)
Xinxin Shan (East China Normal University, China)
Ying Wen (East China Normal University, China)
HMINet: A Hierarchical Multi-scale Interconnection Network For Medical Image Segmentation

ABSTRACT. In this work, an improved end-to-end U-Net structure, a hierarchical multi-scale interconnection network (HMINet), is proposed to make full use of the information contained in different feature maps in encoders and decoders to improve the accuracy of medical image segmentation. The network consists of two main components: a multi-scale fusion unit (MSF) and a multi-head feature enhancement unit (MFE). In the encoder part, the multi-scale fusion unit is used to fuse the information between the feature maps of different scales. By using convolution at different levels, a wider range of context information can be captured and fused into a more comprehensive representation of features. This helps to reduce information loss and improve the accuracy of medical image segmentation tasks. In the decoder part, multiple feature enhancement units can fully pay attention to the coordinates and channel information between feature maps, and then splice the encoded feature maps step by step to maximize the use of information from different feature maps. These feature maps are joined by a well-designed skip connection mechanism to retain more feature information and minimize information loss. The proposed method is tested on four public medical datasets and compared with other classical image segmentation models. The results show that HMINet can significantly improve the accuracy of medical image segmentation tasks and exceed the performance of other models in most cases.

Xinxin Zhang (East China Normal University, China)
Hang Liu (East China Normal University, China)
Xinru Chen (East China Normal University, China)
Rui Qin (East China Normal University, China)
Yan Zhu (Shanghai Changzheng Hospital, China)
Wenfang Li (Shanghai Changzheng Hospital, China)
Menghan Hu (East China Normal University, China)
Jian Zhang (East China Normal University, China)
CASCO: A Contactless Cough Screening System based on Audio Signal Processing

ABSTRACT. Cough is a common symptom of respiratory disease and produces a specific sound. Cough detection is of great significance for preventing, assessing, and controlling epidemics. This paper proposes CASCO (Cough Analysis System using Short-Time Fourier Transform (STFT) and Convolutional Neural Networks (CNN) in the WeChat mini Program), a cough detection system capable of quantifying the number of coughs through an audio division algorithm. This system combines STFT with CNN, achieving accuracy, precision, recall, and F1-score of 97.0%, 95.6%, 98.7%, and 0.97, respectively, in cough detection. The model is embedded into the WeChat mini program to make it feasible to apply cough detection on smartphones and realize large-scale and contactless cough screening. Future research can combine audio and video signals to further improve the accuracy of large-scale cough screening.
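As a quick consistency check on the figures reported in this abstract, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the stated precision (95.6%) and recall (98.7%); the function name is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.956, 0.987)
print(round(f1, 2))
```

The result rounds to 0.97, which agrees with the F1-score quoted in the abstract.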

Fan Wu (Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, China)
Yumeng Qian (Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, China)
Haozhun Zheng (Tsinghua University, China)
Yan Zhang (Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, China)
Xiawu Zheng (Peng Cheng Laboratory, China)
Exploring a Novel Neighbor Aggregation Function for Accurate Analysis of 3D Medical Point Clouds

ABSTRACT. Point cloud analysis is a technique that performs analysis and processing of point cloud data, and it has been widely used in the medical field. However, the common neighbor aggregation module in existing point cloud analysis networks can only aggregate some of the neighbor features. This omits valid information and degrades the performance of point cloud analysis, which may have serious consequences in the medical diagnosis process. In this paper, we improve the ability of point cloud analysis networks to extract complex biological structures by improving the neighbor aggregation module. Specifically, we enable the module to efficiently extract more adequate information by softening the max pooling function commonly used in the neighbor aggregation module. In particular, we improve IoU by 2.18% on the IntrA dataset compared to the previous state-of-the-art method, and we also surpass the previous state-of-the-art method on the S3DIS dataset.
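The abstract does not give the exact form of the softened max pooling, so the sketch below shows one common way to soften it, assumed here purely for illustration: a softmax-weighted average over neighbor features with a temperature parameter `beta`. With `beta = 0` it reduces to mean pooling, and as `beta` grows it approaches hard max pooling, so every neighbor contributes rather than only the maximum. The function name and sample features are hypothetical.

```python
import math

def soft_max_pool(features, beta=1.0):
    """Softmax-weighted pooling over neighbors, per feature channel.

    features: list of neighbor feature vectors of equal length.
    beta: temperature; 0 gives the mean, large values approach the max.
    """
    pooled = []
    for c in range(len(features[0])):
        column = [f[c] for f in features]
        m = max(column)  # subtract the max for numerical stability
        weights = [math.exp(beta * (v - m)) for v in column]
        total = sum(weights)
        pooled.append(sum(w * v for w, v in zip(weights, column)) / total)
    return pooled

# Three neighbors with two feature channels each.
neighbors = [[1.0, 0.0], [2.0, 4.0], [3.0, 2.0]]
mean_like = soft_max_pool(neighbors, beta=0.0)
soft = soft_max_pool(neighbors, beta=1.0)
max_like = soft_max_pool(neighbors, beta=50.0)
print(mean_like, soft, max_like)
```

Intermediate values of `beta` give a pooled feature between the mean and the max, which is the sense in which non-maximal neighbors still carry information.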

Kuo Yang (East China Normal University, China)
Wenhao Jiang (East China Normal University, China)
Yiqiao Shi (East China Normal University, China)
Rui Qin (East China Normal University, China)
Wanli Bai (East China Normal University, China)
Duo Li (DiDi Chuxing, China)
Yue Wu (Ninth People’s Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, China)
Menghan Hu (East China Normal University, China)
Cup-disk ratio segmentation joint with key retinal vascular information under diagnostic and screening scenarios

ABSTRACT. Glaucoma is one of the leading causes of irreversible blindness worldwide. Numerous studies have shown that a larger vertical Cup-to-Disc Ratio (CDR) is closely associated with the diagnosis of glaucoma. CDR is highly useful in the clinical practice and evaluation of glaucoma. However, the determination of CDR varies among clinicians and is highly dependent on the doctor's subjectivity. Existing methods only segment the cup and disc features without considering the nearby vascular information. Based on guidance and criteria from experienced clinicians in diagnosing glaucoma, we incorporate segmented essential vascular information to constrain CDR segmentation. We add key vessel information to the network as prior knowledge to better guide the model in distinguishing the boundary of the optic cup. The effectiveness of incorporating essential vascular information has been demonstrated through experiments conducted on the public dataset REFUGE as well as a home-made dataset. The home-made dataset consists of high-quality CDR images and remade CDR images, corresponding to the diagnosis scenario and the screening scenario, in which the patient uploads a fundus image by taking a photo. The model is deployed in a WeChat mini-program for practical glaucoma diagnostic and screening applications.
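For readers unfamiliar with the vertical CDR, the following minimal sketch shows how it could be computed once binary cup and disc masks are available: the vertical extent of the cup divided by the vertical extent of the disc. The mask layout, function names, and toy data are assumptions for illustration and are not taken from the paper's pipeline.

```python
def vertical_extent(mask):
    """Number of rows spanned by the nonzero region of a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio from two binary masks of equal size."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height

# Toy 8x8 masks: the disc spans rows 1-6, the cup spans rows 2-4.
disc_mask = [[1 if 1 <= r <= 6 else 0 for _ in range(8)] for r in range(8)]
cup_mask = [[1 if 2 <= r <= 4 and 2 <= c <= 5 else 0 for c in range(8)]
            for r in range(8)]
cdr = vertical_cdr(cup_mask, disc_mask)
print(cdr)
```

Because the ratio depends directly on where the cup boundary is drawn, small segmentation errors at that boundary change the CDR, which is why the abstract focuses on guiding the cup boundary.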

Wenzhuo Zheng (Shanghai Jiao Tong University, China)
Junhao Zhao (Shanghai Jiao Tong University, China)
Yongyang Pan (Shanghai Jiao Tong University, China)
Zhenghao Gan (Shanghai Jiao Tong University, China)
Haozhe Han (Shanghai Jiao Tong University, China)
Xiaohong Liu (Shanghai Jiao Tong University, China)
Ning Liu (Shanghai Jiao Tong University, China)
FLAME-based Multi-view 3D Face Reconstruction

ABSTRACT. At present, 3D face reconstruction has broad application prospects in various fields, but research on it is still developing. In this paper, we aim to achieve better 3D face reconstruction quality by combining a multi-view training framework with the parametric face model FLAME, and we propose a multi-view training and testing model, MFNet (Multi-view FLAME Network). We build a self-supervised training framework and implement constraints such as a multi-view optical flow loss and a face landmark loss, and finally obtain a complete MFNet. We propose innovative implementations of the multi-view optical flow loss and the covisible mask. We test our model on the AFLW and FaceScape datasets and also take pictures of our own faces to reconstruct 3D faces while simulating actual scenarios as closely as possible, achieving good results. Our work mainly addresses the problem of combining parametric face models with multi-view 3D face reconstruction and explores the implementation of a FLAME-based multi-view training and testing framework, contributing to the field of 3D face reconstruction.

12:30-13:30 Lunch Break
13:30-15:30 Session VRIH1

Zoom Link:         Meeting ID: 825 6139 3464, Password: cgi2023

Rui Yao (China University of Mining and Technology, China)
Semir Elezovikj (Temple University, United States)
Jianqing Jia (Syracuse University, United States)
Chiu Tan (Temple University, United States)
Haibin Ling (Stony Brook University, United States)
PartLabeling: A Label Management Framework in 3D Space
PRESENTER: Semir Elezovikj

ABSTRACT. In this work, we focus on the label layout problem: specifying the positions of overlaid virtual annotations in Virtual/Augmented Reality scenarios. Designing a layout of labels that does not violate domain-specific design requirements, while at the same time satisfying aesthetic and functional principles of good design, can be a daunting task even for skilled visual designers. Presenting the annotations in 3D object space instead of projection space allows for the preservation of spatial and depth cues. This results in stable layouts in dynamic environments, since the annotations are anchored in 3D space. In this paper we make two major contributions. First, we propose a technique for managing the layout and rendering of annotations in Virtual/Augmented Reality scenarios by manipulating the annotations directly in 3D space. For this, we make use of Artificial Potential Fields and use 3D geometric constraints to adapt them in 3D space. Second, we introduce PartLabeling: an open source platform in the form of a web application that acts as a much-needed generic framework, making it easy to add labeling algorithms and 3D models. This serves as a catalyst for researchers in this field to make their algorithms and implementations publicly available, as well as to ensure research reproducibility. The PartLabeling framework relies on a dataset that we generate as a subset of the original PartNet dataset, consisting of models suitable for the label management task. The dataset consists of 1,000 3D models with part annotations.
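To give a sense of how Artificial Potential Fields can drive label placement in 3D, here is a minimal sketch of a single update step: each label is pulled toward its anchor by an attractive force and pushed away from nearby labels by the classic repulsive potential-field force. This is a generic textbook formulation, not the authors' method, and it omits the 3D geometric constraints the abstract mentions; all names and parameter values (`k_att`, `k_rep`, `radius`, `step`) are assumptions.

```python
import math

def apf_step(labels, anchors, k_att=0.5, k_rep=1.0, radius=2.0, step=0.1):
    """Move each 3D label one step along the net potential-field force."""
    updated = []
    for i, (p, a) in enumerate(zip(labels, anchors)):
        # Attractive force pulls the label toward its own anchor.
        force = [k_att * (a[d] - p[d]) for d in range(3)]
        for j, q in enumerate(labels):
            if i == j:
                continue
            diff = [p[d] - q[d] for d in range(3)]
            dist = math.sqrt(sum(x * x for x in diff))
            # Repulsion acts only inside the influence radius.
            if 0.0 < dist < radius:
                magnitude = k_rep * (1.0 / dist - 1.0 / radius) / dist ** 2
                for d in range(3):
                    force[d] += magnitude * diff[d] / dist
        updated.append(tuple(p[d] + step * force[d] for d in range(3)))
    return updated

# Two labels that start exactly on anchors placed one unit apart.
anchors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
labels = list(anchors)
moved = apf_step(labels, anchors)
gap_before = math.dist(labels[0], labels[1])
gap_after = math.dist(moved[0], moved[1])
print(gap_before, gap_after)
```

Iterating the step lets the labels settle where attraction and repulsion balance, so they stay near their anchors without crowding each other.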

Nick Vitsas (Athens University of Economics and Business, Greece)
Iordanis Evangelou (Athens University of Economics and Business, Greece)
Georgios Papaioannou (Athens University of Economics and Business, Greece)
Anastasios Gkaravelis (Athens University of Economics and Business, Greece)
Opening Design using Bayesian Optimization
PRESENTER: Nick Vitsas

ABSTRACT. Opening design is a major consideration in architectural buildings during early structural layout specification. Decisions regarding the geometric characteristics of windows, skylights, hatches, etc., greatly impact the overall energy efficiency, airflow and appearance of a building, both internally and externally. In this work, we employ a goal-based, illumination-driven approach to opening design using a Bayesian Optimization approach, based on Gaussian Processes. A method is proposed that allows a designer to easily set lighting intentions along with qualitative and quantitative characteristics of desired openings. All parameters are optimized within a cost minimization framework to calculate geometrically feasible, architecturally admissible and aesthetically pleasing openings of any desired shape, while respecting the designer's lighting constraints.

Simon Seibt (Nuremberg Institute of Technology, Germany)
Bastian Kuth (Coburg University of Applied Sciences and Arts, Germany)
Bartosz von Rymon Lipinski (Nuremberg Institute of Technology, Germany)
Thomas Chang (Nuremberg Institute of Technology, Germany)
Marc Erich Latoschik (University of Wuerzburg, Germany)
Multidimensional Image Morphing - Fast Image-based Rendering of Open 3D and VR Environments
PRESENTER: Simon Seibt

ABSTRACT. The demand for interactive photorealistic 3D environments has increased in recent years and in various fields such as architecture, engineering and entertainment. Nevertheless, achieving a balance between quality and performance for high-performance 3D applications and Virtual Reality (VR) remains a challenge. This paper addresses this issue by revisiting and extending view interpolation for image-based rendering, enabling the exploration of spacious open environments in 3D and VR. Therefore, we introduce multi-morphing, a novel rendering method based on a spatial data structure of 2D image patches, called the image graph. With this approach, novel views can be rendered with up to six degrees of freedom using only a sparse set of views. The rendering process does not require 3D reconstruction of geometry, nor per-pixel depth information: All relevant data for output is extracted from local morphing cells of the image graph. Detection of parallax image regions during preprocessing reduces rendering artifacts by extrapolating image patches from adjacent cells in real-time. Additionally, a GPU-based solution to resolve exposure inconsistencies within a dataset is presented, enabling seamless transitions of brightness when moving between areas with varying light intensities. Experiments on multiple real-world and synthetic scenes demonstrate that the presented method achieves high “VR-compatible” frame rates, even on mid-range and legacy hardware, respectively.

Ron Vanderfeesten (Utrecht University, Netherlands)
Real-time Vascular Networks for Dynamic Skin Texture Animation

ABSTRACT. One of the aspects often missing from real-time character animation is change in color of the skin due to emotions (e.g., reddening due to embarrassment) or exerted pressure (e.g., whitening due to pressure on the skin, or stretching of skin during motion). These effects, while subtle, are important for creating truly convincing virtual characters. These changes in skin color are due to a change in blood volume. Hence, modeling skin color in animations can be done using a vascular network that models the subcutaneous blood vessels which then determines the resulting skin color. This paper presents a method that allows real-time, realistic modeling of skin color due to blood volume change, providing a next step towards realistic virtual characters. Our method includes a procedure to dynamically generate blood vessel networks based on medically accurate models, and an algorithm to render these as a set of textures accepted by modern (real-time) rendering engines. Moreover, these textures can be altered in real-time. This allows for the simulation of change in skin color due to blood volume changes, all while maintaining biological plausibility.

Zizhuo Wang (Tsinghua Shenzhen International Graduate School, China)
Kun Hu (Tsinghua Shenzhen International Graduate School, China)
Zhaoyangfan Huang (Tsinghua Shenzhen International Graduate School, China)
Zixuan Hu (Tsinghua Shenzhen International Graduate School, China)
Shuo Yang (Tsinghua Shenzhen International Graduate School, China)
Xingjun Wang (Tsinghua Shenzhen International Graduate School, China)
Robust Blind Image Watermarking Based on Interest Points

ABSTRACT. Digital watermarking technology plays an essential role in anti-counterfeiting and traceability. However, image watermarking algorithms are weak against hybrid attacks, especially geometric attacks such as cropping and rotation. We propose a robust blind image watermarking algorithm that combines stable interest points and deep learning networks to further improve the robustness of watermarking. First, to extract sparser and more stable interest points, we use the SuperPoint algorithm for generation and design a two-step screening procedure. We first keep the points with the highest probability in a given region to ensure the sparsity of the points and then filter the robust interest points by hybrid attacks to ensure high stability. The message is embedded in sub-blocks centered on stable interest points using a deep learning-based framework. Different kinds of attacks and simulated noise are added to the adversarial training to guarantee the robustness of embedded blocks. We use the ConvNeXt network for watermark extraction and determine the division threshold based on the decoded values of the unembedded sub-blocks. Through extensive experiments, we demonstrate that our proposed algorithm can improve the accuracy of the network in extracting information while ensuring high invisibility between the embedded image and the original cover image. Comparison with previous SOTA work reveals that our algorithm can achieve better visual and numerical results on hybrid and geometric attacks.

Heng Zhang (Tongji University, Shanghai, China)
Zhihua Wei (Tongji University, Shanghai, China)
Guanming Liu (Tongji University, Shanghai, China)
Ruibin Mu (Tongji University, Shanghai, China & Alibaba Group, China)
Rui Wang (Tongji University, Shanghai, China)
Chuan Bao Liu (Alibaba Group, Shanghai, China)
Aiquan Yuan (Alibaba Group, Shanghai, China)
Guodong Cao (Alibaba Group, Shanghai, China)
Ning Hu (Alibaba Group, Shanghai, China)
MKEAH: Multimodal Knowledge Extraction and Accumulation Based on Hyperplane Embedding for Knowledge-based Visual Question Answering

ABSTRACT. External knowledge representations play an essential role in knowledge-based visual question answering to better understand complex scenarios in the open world. Recent entity-relationship embedding approaches are deficient in representing some complex relations, resulting in a lack of topic-related knowledge and a redundancy of topic-irrelevant information. To this end, we propose MKEAH: Multimodal Knowledge Extraction and Accumulation on Hyperplanes. To ensure that the lengths of the feature vectors projected onto the hyperplane are comparable and to filter out enough topic-irrelevant information, two losses are proposed to learn the triplet representations from complementary views: range loss and orthogonal loss. To interpret the capability of extracting topic-related knowledge, we present Topic Similarity (TS) between topic and entity-relation. Experimental results demonstrate the effectiveness of hyperplane embedding for knowledge representation in knowledge-based visual question answering. Our model outperforms the state-of-the-art methods by 2.12% and 3.24%, respectively, on two challenging knowledge-required datasets: OK-VQA and KRVQA. The clear advantage of our model on TS shows that using hyperplane embedding to represent multimodal knowledge can improve the ability of the model to extract topic-related knowledge.

Dvir Ginzburg (Tel Aviv University, Israel)
Dan Raviv (Tel Aviv University, Israel)
Selective Sampling with Gromov-Hausdorff Metric - Efficient Dense Shape Correspondence via Confidence-Based Sample Consensus
PRESENTER: Dvir Ginzburg

ABSTRACT. We present a novel method for dense shape correspondence that combines spatial information transformed by neural networks with their projection on spectral maps. This approach builds on the proven efficiency of the original "functional mapping" method, but addresses a major challenge faced by all such methods: a "Chicken or the egg" scenario, where poor spatial features lead to inadequate spectral alignment and vice versa during training. This often results in slow convergence, high computational costs, and failure to learn, particularly when working with small datasets.

To overcome this challenge, we propose a new method that selectively samples only those points with high confidence in their alignment. These points then participate in the alignment and spectral loss terms, boosting training and accelerating convergence by a factor of 5. To ensure full, unsupervised learning, we use the Gromov-Hausdorff distance metric to choose the best set of confident points with the maximal alignment score.

Our approach offers significant advantages over current methods, including faster convergence, improved accuracy, and reduced computational costs. We demonstrate the effectiveness of our approach on several benchmark datasets and report superior results compared to spectral and spatial based methods. Overall, our method provides a promising new approach to dense shape correspondence that addresses key challenges in the field.

Xuefei Tian (Shanghai Jiao Tong University, China)
Xiaoju Dong (Shanghai Jiao Tong University, China)
Zhiyu Wu (Shanghai Jiao Tong University, China)
Yinuo Liu (Shanghai Jiao Tong University, China)
Shengtao Chen (Shanghai Jiao Tong University, China)
ILIDViz: An Incremental Learning-Based Visual Analysis System for Network Anomaly Detection
PRESENTER: Xuefei Tian

ABSTRACT. With the development of information technology, network traffic logs mixed with various kinds of cyber-attacks have grown explosively. Traditional intrusion detection systems (IDS) have limited ability to discover new, changing patterns and identify malicious traffic traces in real-time. More effective intrusion detection technologies are urgently needed to protect computer security. In this paper, we design a hybrid IDS, combining our incremental learning model (KAN-SOINN) and active learning, to learn new log patterns and detect various network anomalies in real-time. The experimental results on the NSL-KDD dataset show that KAN-SOINN can be improved continuously and detect malicious logs more effectively. Meanwhile, the comparative experiments show that using a hybrid query strategy in active learning can improve the model learning efficiency.

13:30-15:30 Session WC6-ENGAGE3: Empowering Novel Geometric Algebra for Graphics & Engineering Workshop 3

Zoom Link:     Meeting ID: 847 8718 3153, Password: cgi2023

Vincent Nozick (Université Gustave Eiffel, LIGM, France)
Leo Dorst (University of Amsterdam, Netherlands)
Paraxial Geometric Optics in 3D through Point-based Geometric Algebra

ABSTRACT. The versors of the homogeneous-point-based version R(d,0,1) (dubbed HGA) are related to the basic operations in geometric paraxial optics. Odd versors represent reflections in spherical mirrors (be they concave or convex) and even versors implement the lens equation. We extend the results to arbitrarily positioned optical elements by embedding R(d,0,1) into CGA R(d+1,1). The total transformation through a paraxial optical system now consists of successive teleportation (by CGA dot and outer product) to the next optical center, and then applying its local versors.

The result is a straightforward sequence of operations that implements a total system of arbitrarily placed paraxial lenses and mirrors in 3D (or any dimension), parameterized by the CGA tangent vectors (from each optical center to the corresponding focal point) for each optical component. This can be used to compile the homogeneous transformation matrices of a total paraxial system in terms of those geometric parameters.
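
As a rough illustration of what "homogeneous transformation matrices of a total paraxial system" can look like, the sketch below composes thin lenses as plain 4x4 projective matrices. It does not use geometric algebra versors and is not the authors' formulation. The function names, the choice of the z-axis as optical axis, and the Cartesian sign convention 1/z' - 1/z = 1/f are all assumptions made for this example.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(dz):
    """Homogeneous translation by dz along the optical (z) axis."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, dz], [0, 0, 0, 1]]


def thin_lens(f, center_z=0):
    """Paraxial thin lens of focal length f centered at z = center_z.

    Assumed convention: in lens-centered coordinates a point (x, y, z)
    maps to (f*x, f*y, f*z) / (z + f), which satisfies 1/z' - 1/z = 1/f.
    """
    f = Fraction(f)
    lens = [[f, 0, 0, 0], [0, f, 0, 0], [0, 0, f, 0], [0, 0, 1, f]]
    # Move to the optical center, apply the lens, move back.
    return matmul(translation(center_z), matmul(lens, translation(-center_z)))


def apply(m, point):
    """Apply a homogeneous matrix to a 3D point; None means image at infinity."""
    h = [sum(row[j] * v for j, v in enumerate((*point, 1))) for row in m]
    if h[3] == 0:
        return None
    return tuple(c / h[3] for c in h[:3])


# A two-lens system: f = 2 at z = 0 followed by f = 1 at z = 5.
system = matmul(thin_lens(1, 5), thin_lens(2, 0))
image = apply(system, (1, 0, -6))
```

Because projective maps compose by matrix multiplication, the whole system collapses into the single matrix `system`, which is the matrix-level counterpart of chaining the per-element operations described above.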

Danail Brezov (Department of Mathematics, UACEG, Bulgaria)
Michael Werman (Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel)
Static Object Surveillance by Moving Cameras

ABSTRACT. We study the geometry and kinematics of surveillance of static objects with moving cameras both in 2D and 3D space via geometric algebras. The former case is easily handled using polar representation of complex numbers, while for the latter we resort to dual projective quaternions and Plücker line geometry. Hence, a brief preliminary section is provided, after which we approach the task naively, searching for exact solutions in the case of pre-determined trajectory. The infinitesimal setting is of particular interest for practical reasons, so we pay attention to it as well. Finally, some ideas for generalizations are given, along with curious examples.

Rolf Sören Kraußhar (Universität Erfurt, Germany)
Dmitrii Legatiuk (Universität Erfurt, Germany)
Discretisation of octonionic analysis: Weyl calculus perspective
PRESENTER: Dmitrii Legatiuk

ABSTRACT. Octonions are 8-dimensional hypercomplex numbers which form the largest normed division algebra (in the wider sense of admitting non-associativity) over the real numbers. Motivated by applications in theoretical physics, continuous octonionic analysis has become a very active area of research in recent years. Looking at possible practical applications, it is beneficial to work directly with discrete structures rather than approximating continuous objects. Therefore, in previous papers, we have proposed some ideas towards the development of a discrete octonionic analysis. It is well known that there are several possibilities to discretise the continuous setting. The Weyl calculus approach, which is typically used in the associative discrete Clifford analysis setting, has not been studied in the octonionic setting yet. The aim of this paper is to close this gap. We succeed in presenting the discretisation of octonionic analysis based on the Weyl calculus.
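
To make the non-associativity mentioned above concrete, here is a minimal sketch that builds octonion multiplication with the Cayley-Dickson doubling rule (a, b)(c, d) = (ac - d*b, da + bc*). It only illustrates the algebra itself, not the discretisation or the Weyl calculus of the paper. The coefficient-list representation and the helper names are assumptions chosen for this example.

```python
def conj(x):
    """Cayley-Dickson conjugate of a coefficient list of length 1, 2, 4 or 8."""
    if len(x) == 1:
        return list(x)
    h = len(x) // 2
    return conj(x[:h]) + [-v for v in x[h:]]


def mul(x, y):
    """Cayley-Dickson product: (a, b)(c, d) = (ac - conj(d) b, d a + b conj(c))."""
    if len(x) == 1:
        return [x[0] * y[0]]
    h = len(x) // 2
    a, b = x[:h], x[h:]
    c, d = y[:h], y[h:]
    ac, db = mul(a, c), mul(conj(d), b)
    da, bc = mul(d, a), mul(b, conj(c))
    return [p - q for p, q in zip(ac, db)] + [p + q for p, q in zip(da, bc)]


def basis(i, dim=8):
    """Basis element e_i as a coefficient list (e_0 is the real unit)."""
    return [1 if k == i else 0 for k in range(dim)]


def norm_sq(x):
    """Squared Euclidean norm of the coefficient list."""
    return sum(v * v for v in x)


e1, e2, e4 = basis(1), basis(2), basis(4)
left = mul(mul(e1, e2), e4)    # (e1 e2) e4
right = mul(e1, mul(e2, e4))   # e1 (e2 e4)
```

With this indexing, `left` and `right` differ by a sign, so the product is not associative, while `norm_sq(mul(x, y)) == norm_sq(x) * norm_sq(y)` still holds, which is the normed division property.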

Dimiter Prodanov (Imec, Belgium)
Algorithmic computation of multivector inverses and characteristic polynomials in non-degenerate Clifford algebras

ABSTRACT. Clifford algebras provide the natural generalizations of complex numbers, dual numbers, and quaternions into non-commutative Clifford numbers. The paper demonstrates an algorithm for the computation of inverses of such numbers in a non-degenerate Clifford algebra of an arbitrary dimension. The algorithm is a variation of the Faddeev-LeVerrier-Souriau algorithm and is implemented in the open-source Computer Algebra System Maxima. Symbolic and numerical examples in different Clifford algebras are presented.
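
For readers unfamiliar with the Faddeev-LeVerrier recursion, the sketch below shows its classical matrix form, which yields the characteristic polynomial and the inverse in one pass. It is only the matrix analogue, written in Python with exact rational arithmetic; the paper's variant operates on multivectors and is implemented in Maxima. All function names here are assumptions for illustration.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def faddeev_leverrier(a):
    """Return (coeffs, inverse) for a square matrix a.

    coeffs[k] is the coefficient of lambda**k in det(lambda*I - a).
    inverse is None when a is singular (coeffs[0] == 0).
    """
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    for k in range(1, n + 1):
        # M_k = a M_{k-1} + c_{n-k+1} I
        am = matmul(a, m)
        m = [[am[i][j] + (coeffs[n - k + 1] if i == j else 0)
              for j in range(n)] for i in range(n)]
        # c_{n-k} = -trace(a M_k) / k
        prod = matmul(a, m)
        coeffs[n - k] = -sum(prod[i][i] for i in range(n)) / k
    if coeffs[0] == 0:
        return coeffs, None
    # a^{-1} = -M_n / c_0
    return coeffs, [[-x / coeffs[0] for x in row] for row in m]


coeffs, inverse = faddeev_leverrier([[2, 1], [1, 3]])
```

The recursion needs only products, traces, and one final division by the constant coefficient, which is why it carries over to algebras where Gaussian elimination is awkward.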

Dmitry Shirokov (HSE University, Moscow, Russia; IITP RAS, Moscow, Russia)
On Singular Value Decomposition and Polar Decomposition in Geometric Algebras

ABSTRACT. This paper is a brief note on the natural implementation of singular value decomposition (SVD) and polar decomposition of an arbitrary multivector in nondegenerate real (Clifford) geometric algebras of arbitrary dimension and signature. We naturally define these and other related structures (operation of Hermitian conjugation, Euclidean space, and Lie groups) in geometric algebras. The results can be used in various applications of geometric algebras in computer graphics, computer vision, data analysis, computer science, engineering, physics, big data, machine learning, etc.

15:30-16:00 Coffee Break
16:00-18:30 Session Displays

Zoom Link:     Meeting ID: 847 8718 3153, Password: cgi2023

Yanci Zhang (College of Computer Science, SiChuan University, China)
Rui Yao (China University of Mining and Technology, China)
Xiangbin Zhu (China University of Mining and Technology, China)
Yong Zhou (China University of Mining and Technology, China)
Zhiwen Shao (China University of Mining and Technology, China)
Fuyuan Hu (Suzhou University of Science and Technology, China)
Yanning Zhang (Northwestern Polytechnical University, China)
Unsupervised cycle-consistent adversarial attacks for visual object tracking

ABSTRACT. Adversarial attacks on visual object tracking have attracted increasing attention as a way to evaluate and improve the robustness and security of object tracking models. Most adversarial attack methods for object tracking are fully supervised, i.e., labels are required. It is impractical to require labels for every tracked video in order to mount an attack. To this end, this paper proposes an unsupervised attack method against visual object tracking models, which uses the cycle consistency principle of the tracking model to make its forward tracking and backward tracking as inconsistent as possible, resulting in effective adversarial examples. In addition, this paper proposes a contextual attack method, which uses the information of the attacked object region and its surrounding context regions to attack both simultaneously and reduce the response score. The proposed attack method is evaluated on different types of deep learning-based object trackers, and the experimental results on multiple benchmarks show that the proposed method has competitive attack results.

Fangchuan Li (College of Computer Science, SiChuan University, China)
Shuangjia Liu (College of Computer Science, SiChuan University, China)
Ning Ma (College of Computer Science, SiChuan University, China)
Yanli Liu (College of Computer Science, SiChuan University, China)
Guanyu Xing (College of Computer Science, SiChuan University, China)
Yanci Zhang (College of Computer Science, SiChuan University, China)
A GPU-Friendly Hybrid Occlusion Culling Algorithm for Large Scenes
PRESENTER: Fangchuan Li

ABSTRACT. In this paper, we present a novel hybrid occlusion culling method for large-scale scenes. The basic idea is to use an iterative hierarchical Z-buffer occlusion culling algorithm to execute a coarse-grained culling via compute shader, followed by a fine-grained culling via rasterization. We also propose a forward warping method to generate a low-resolution approximate depth map to accelerate the culling process. Our solution requires only one indirect multidraw command and runs entirely on the GPU without any CPU read-back operations. Our experimental results indicate that our algorithm outperforms the existing solutions in both performance and culling rate.
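
The general idea of a hierarchical Z-buffer test can be sketched on the CPU as follows. This is a simplified illustration, not the authors' GPU pipeline: it assumes a square power-of-two depth map where larger values are farther away, and the function names and the inclusive pixel-rectangle format are made up for this example.

```python
def build_hiz(depth):
    """Build a depth pyramid; each coarser texel keeps the farthest (max) depth.

    depth is a square list of lists whose side is a power of two.
    Returns [level0, level1, ...] down to a single texel.
    """
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def is_occluded(levels, rect, nearest_depth, level):
    """Conservative test of a screen-space bounding rectangle.

    rect is (x0, y0, x1, y1) in level-0 pixels, inclusive.
    The object is reported occluded only if its nearest depth lies behind
    the farthest occluder depth of every texel it touches at this level.
    """
    x0, y0, x1, y1 = rect
    mip = levels[level]
    for ty in range(y0 >> level, (y1 >> level) + 1):
        for tx in range(x0 >> level, (x1 >> level) + 1):
            if nearest_depth <= mip[ty][tx]:
                return False  # may be visible through this texel
    return True


# Left half of the screen is covered by a near wall, right half is empty.
depth = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]
levels = build_hiz(depth)
```

Testing at a coarser level touches fewer texels but can only become more conservative: an object hidden behind the wall is culled at level 1, yet at the single-texel top level the empty right half forces a "possibly visible" answer.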

Yi Xiao (Beijing Institute of Technology, China)
Hao Sha (Beijing Institute of Technology, China)
Huaying Hao (Beijing Institute of Technology, China)
Yue Liu (Beijing Institute of Technology, China)
Yongtian Wang (Beijing Institute of Technology, China)
3D Hand Mesh Recovery through Inverse Kinematics from a Monocular RGB Image

ABSTRACT. Recovering a 3D hand mesh from a monocular RGB image has a wide range of application scenarios such as VR/AR. The parametric hand model provides a good geometric prior for the shape of the hand, and is commonly used to recover the 3D hand mesh. However, the rotation parameters of the hand model are not easy to learn, which limits the accuracy of model-based methods. To address this problem, we take advantage of the inverse kinematic chains of the hand to derive an analytical method, which can convert hand joint locations into rotation parameters. By integrating this analytical method into the neural network, we propose an end-to-end learnable model named IKHand to recover the 3D hand mesh. IKHand comprises a detection module and a mesh generation module. The detection module predicts the 3D hand keypoints, while the mesh generation module takes these keypoints to generate the 3D hand mesh. Experimental results show that our proposed method can generate impressive and robust 3D hand meshes under several challenging conditions, and can achieve superior accuracy among model-based methods.

Jiahui Huang (Adobe Research, UBC, Canada)
Leonid Sigal (UBC, Vector Institute, Canada)
Kwang Moo Yi (UBC, Canada)
Oliver Wang (Adobe Research, United States)
Joon-Young Lee (Adobe Research, United States)
INVE: Interactive Neural Video Editing
PRESENTER: Jiahui Huang

ABSTRACT. We present Interactive Neural Video Editing (INVE), a real-time video editing solution, which can assist the video editing process by consistently propagating sparse frame edits to the entire video clip.

Our method is inspired by the recent work Layered Neural Atlas (LNA). LNA, however, suffers from two major drawbacks: (1) the method is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases, including direct frame editing and rigid texture tracking.

To address these challenges, we adopt highly efficient network architectures, powered by hash-grid encoding, to substantially improve processing speed. In addition, we learn bi-directional functions between image and atlas and introduce vectorized editing, which collectively enable a much greater variety of edits in both the atlas and the frames directly.

Compared to LNA, INVE reduces the learning and inference time by a factor of 5, and supports various video editing operations that LNA cannot. We showcase the superiority of INVE over LNA in interactive video editing through a comprehensive quantitative and qualitative analysis, highlighting its numerous advantages and improved performance.

Zhaoyi Jiang (College of Computer Science & Information Engineering, Zhejiang Gongshang University, China)
Guoliang Wang (College of Computer Science & Information Engineering, Zhejiang Gongshang University, China)
Gary Kl Tam (Swansea University, UK)
Chao Song (Zhejiang Gongshang University, China)
Bailin Yang (College of Computer & Information Engineering, Zhejiang Gongshang University, China)
Frederick W. B. Li (University of Durham, UK)
An End-to-end Dynamic Point Cloud Geometry Compression in Latent Space

ABSTRACT. Dynamic point clouds are widely used for 3D data representation in various applications such as immersive and mixed reality, robotics and autonomous driving. However, their irregularity and large scale make efficient compression and transmission a challenge. Existing methods require high bitrates to encode point clouds since temporal correlation is not well considered. This paper proposes an end-to-end dynamic point cloud compression network that operates in latent space, resulting in more accurate motion estimation and more effective motion compensation. Specifically, a multi-scale motion estimation network is introduced to obtain accurate motion vectors. Motion information computed at a coarser level is upsampled and warped to the finer level based on cost volume analysis for motion compensation. Additionally, a residual compression network is designed to mitigate the effects of noise and inaccurate predictions by encoding latent residuals, resulting in smaller conditional entropy and better results. The proposed method achieves an average 12.09% BD-Rate gain over state-of-the-art Deep Dynamic Point Cloud Compression (D-DPCC) in experimental results. The novelty of our method lies in the use of latent space for all major operations and the introduction of the multi-scale motion estimation network to improve motion estimation accuracy.

Wuzhen Shi (Shenzhen University, China)
Zhijie Liu (Shenzhen University, China)
Yingxiang Li (Shenzhen University, China)
Yang Wen (Shenzhen University, China)
Light-weight 3D Mesh Generation Networks based on Multi-stage and Progressive Knowledge Distillation
PRESENTER: Yingxiang Li

ABSTRACT. Due to the high data dimensionality and the complexity of the problem, existing 3D mesh reconstruction models often require significant computational resources to achieve satisfactory results. While lightweight models based on knowledge distillation have been explored in many fields such as image classification, training a lightweight 3D mesh reconstruction model remains a challenging task. In this paper, we propose a method to learn a lightweight 3D mesh reconstruction network using knowledge distillation. Specifically, we introduce a novel approach called multi-stage and progressive knowledge distillation, which effectively enhances the guidance from the teacher network to the student network, thereby improving reconstruction performance. Additionally, we propose a projection-based spatial feature unpooling method to provide more accurate spatial features for the increased spatial points. Experimental results show that our lightweight 3D mesh reconstruction network has comparable performance to existing complex models while greatly reducing the number of parameters. Specifically, our method achieves 98.97% of the teacher network's accuracy while reducing the number of graph neural network parameters to 71.42% of the teacher network.

Lan Wei (University of Science and Technology of China, China)
Nikolaos Freris (University of Science and Technology of China, China)
Multi-scale Graph Neural Network for Physics-informed Fluid Simulation

ABSTRACT. Learning-based fluid simulation has proliferated due to its ability to replicate the dynamics with substantial computational savings over traditional numerical solvers. To this end, Graph Neural Networks (GNNs) are a suitable tool to capture fluid dynamics through local particle interactions. Nonetheless, it remains challenging to model long-range behaviors. To tackle this, this paper models the fluid flow via graphs at different scales, taking both scalability and physical constraints into account. We propose a novel multi-scale GNN for physics-informed fluid simulation (MSG) by introducing a non-parametric sampling and aggregation method to combine features from graphs with different resolutions. Our design reduces the size of the learnable model and accelerates model inference. In addition, zero velocity divergence is explicitly incorporated as a physical constraint through the training loss function. Finally, a fusion mechanism of consecutive predictions is incorporated to alleviate the inductive bias caused by the Markovian assumption. Extensive experiments corroborate the merits over leading particle-based neural network models in terms of both one-step accuracy (+6.7%) and long trajectory prediction (+16.9%). This comes with a run-time reduction of 2.8% over the best baseline method.

Hongjin Lyu (Cardiff University, UK)
Paul Rosin (Cardiff University, UK)
Yukun Lai (Cardiff University, UK)
WCGAN: Robust Portrait Watercolorization with Adaptive Hierarchical Localized Constraints
PRESENTER: Hongjin Lyu

ABSTRACT. Deep learning has enabled image style transfer to make great strides forward. However, unlike many other styles, transferring the watercolor style to portraits is significantly challenging in image synthesis and style transfer. Pixel-correlation-based methods do not produce satisfactory watercolors. This is because portrait watercolors exhibit a sophisticated fusion of various painting techniques in local areas, which makes it difficult for convolutional neural networks to accurately handle fine-grained features. Moreover, the common but problematic multi-scale challenge greatly impedes the performance of existing style transfer methods with fixed receptive fields. Although it is possible to develop an image processing pipeline mimicking various watercolor effects, such algorithms are slow and fragile, especially for inputs of different scales. As a remedy, this paper proposes WCGAN, a generative adversarial network (GAN) architecture dedicated to watercolorization of portraits. Specifically, a novel localized style loss suitable for watercolorization is proposed to deal with local details. To handle portraits of different scales and improve robustness, a novel discriminator architecture with three parallel branches of varying receptive fields is introduced. In addition, the application of WCGAN is expanded to video style transfer where a novel kind of video training data based on random crops is developed to efficiently capture temporal consistency. Extensive experimental results from qualitative and quantitative analysis demonstrate that WCGAN generates state-of-the-art, high quality watercolors from portraits.

Tao Peng (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Wenjie Wu (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Junping Liu (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Zili Zhang (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Li Li (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Xinrong Hu (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Ruhan He (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
PGN-Cloth: Physics-based Graph Network model for 3D cloth animation

ABSTRACT. Graph neural networks have been used in the learning-based simulation of cloth and have received a lot of attention recently. Some learning-based graph networks lack information on cloth structure, so the generated results are not plausible and penetration is unavoidable. We present the PGN-Cloth model, which uses a mesh to represent the state of the cloth system and computes the dynamics through graph neural networks. Our contributions include: (1) PGN-Cloth combines the ideas of physics-based deep learning and mass-spring models, adding EdgeLoss, CosLoss, and BendLoss to generate more plausible results; (2) an additional penetration loss is added to reduce the penetration problem that exists in the current state-of-the-art method; (3) our approach can significantly improve the training speed of the network with only a small increase in computational cost, and has better stability than traditional learning methods. The experimental results are better than the state of the art on several indicators, and there is no significant penetration in the generated results.

16:00-18:30 Session VRIH2

Zoom Link:         Meeting ID: 825 6139 3464, Password: cgi2023

Guihua Shan (Computer Network Information Center, Chinese Academy of Sciences, China)
Biao Dong (School of Computer Science and Engineering, Northeastern University, China)
Wenjun Tan (School of Computer Science and Engineering, Northeastern University, China)
Weichao Chang (School of Computer Science and Engineering, Northeastern University, China)
Baoting Li (School of Computer Science and Engineering, Northeastern University, China)
Yanliang Guo (School of Computer Science and Engineering, Northeastern University, China)
Quanxing Hu (School of Computer Science and Engineering, Northeastern University, China)
Guangwei Liu (Liaoning Technical University, China)
Yongfeng Qiao (Dandong Dongfang Measurement& Control Technolo, China)
Research on the integrated modeling method of real scene and underground geological model on open-pit mine

ABSTRACT. With the advancement and popularization of information technology, open-pit mines are also rapidly developing towards integration and digitization. Three-dimensional reconstruction technology has been successfully applied to surface scene modeling and geological structure reconstruction in open-pit mines. However, there is no open-pit mine integration model that can fuse above-ground scene information with underground geological information. In this paper, we propose an integrated modeling method for open-pit mines, which integrates the real scene on the ground with the underground geological model. Based on oblique photography technology, the real scene model above ground was established. Based on the surface suture method proposed in this paper, a three-dimensional underground geological model is constructed, and the above-ground and underground models are registered and fused to establish an integrated model of open-pit mine. The auxiliary planning system of open-pit mine is designed and implemented, and the functions of mining planning and output calculation are carried out based on the integrated model of open-pit mine, which assists users in mining planning and operation management, improving production efficiency and management level.

Mingkang Wang (Guangdong University of Technology, China)
Min Meng (Guangdong University of Technology, China)
Jigang Liu (Ping An Life Insurance of China, China)
Jigang Wu (Guangdong University of Technology, China)
Learning Adequate Alignment and Interaction for Cross-Modal Retrieval
PRESENTER: Mingkang Wang

ABSTRACT. Cross-modal retrieval has attracted widespread attention in many cross-media similarity search applications, especially image-text retrieval in the fields of computer vision and natural language processing. Recently, visual and semantic embedding (VSE) learning has shown promising improvements on image-text retrieval tasks. Most existing VSE models employ two unrelated encoders to extract features, then use complex methods to contextualize and aggregate those features into holistic embeddings. Despite recent advances, existing approaches still suffer from two limitations: 1) without considering intermediate interaction and adequate alignment between different modalities, these models cannot guarantee the discriminative ability of representations; 2) existing feature aggregators are susceptible to certain noisy regions, which may lead to unreasonable pooling coefficients and affect the quality of the final aggregated features. To address these challenges, we propose a novel cross-modal retrieval model containing a well-designed alignment module and a novel multimodal fusion encoder, which aims to learn adequate alignment and interaction on aggregated features for effectively bridging the modality gap. Experiments on the Microsoft COCO and Flickr30k datasets demonstrate the superiority of our model over the state-of-the-art methods.

Shiyu Cheng (Computer Network Information Center, Chinese Academy of Sciences, China)
Guihua Shan (Computer Network Information Center, Chinese Academy of Sciences, China)
Beifang Niu (Computer Network Information Center, Chinese Academy of Sciences, China)
Yang Wang (Computer Network Information Center, Chinese Academy of Sciences, China)
RobotExplorer: Visual Analytics for DRL-based Robot Control
PRESENTER: Shiyu Cheng

ABSTRACT. Deep reinforcement learning (DRL) has demonstrated superior performance in playing video games and chess. However, applying DRL to complicated tasks like robot control is still challenging. Trial-and-error learning introduces additional uncertainty, while statistics such as rewards obscure detailed information, making retrospective analysis difficult. In this work, we propose RobotExplorer, a visual analytics system to help domain experts extract keyframes for diagnosis, understand the semantics of continuous actions, and summarize the impact of features on decision-making. Through case studies and user interviews conducted with deep learning experts, we demonstrate the effectiveness of RobotExplorer.

Minghua Jiang (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Zhangyuan Tian (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Chenyu Yu (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Yankang Shi (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Li Liu (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Tao Peng (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Xinrong Hu (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Feng Yu (School of Computer and Artificial Intelligence, Wuhan Textile University, China)
Intelligent 3D Garment System of Human Body based on Deep Spiking Neural Network

ABSTRACT. Intelligent garments, as emerging intelligent wearable devices, have been widely applied in fields such as sports training and medical rehabilitation. However, current research on smart wearables mainly focuses on sensor functionality and quantity, while neglecting user experience and interaction. To address this issue, this study proposes a real-time 3D interactive system based on intelligent garments. The system uses lightweight sensor modules to collect human motion data and introduces a dual-stream fusion network based on spiking neural units to classify and recognize human movements, achieving real-time interaction between users and sensors. Additionally, the system incorporates 3D human visualization, which renders the sensor data and recognized human actions as 3D models in real time, providing accurate and comprehensive visual feedback to help users better understand and analyze the details and features of human motion. The system holds great potential for applications in motion detection, medical monitoring, virtual reality, and other fields. Accurate classification of human actions contributes to the development of personalized training plans and injury prevention strategies. In summary, this research has significant implications for intelligent garments, human motion monitoring, and digital twin visualization. The development of this system will drive advancements in wearable technology and enhance our understanding of human motion.

Lichao Niu (Hefei University of Technology, China)
Wenjun Xie (Hefei University of Technology, China)
Dong Wang (Hefei University of Technology, China)
Zhongrui Cao (Hefei University of Technology, China)
Xiaoping Liu (Hefei University of Technology, China)
Audio2AB: A Study of Audio-Driven Collaborative Generation of Virtual Character Animation

ABSTRACT. A great deal of research has been conducted on audio-driven virtual character gestures and facial animation, with some degree of success. However, there are few methods for generating full-body animations, and the portability of virtual character gestures and facial animations has not received enough attention. Therefore, we propose Audio2AB, a deep learning-based method that generates gesture animations and the blendshape weights for ARKit's 52 facial expression parameters from audio, the text corresponding to the audio, emotion labels, and semantic relevance labels, producing parametric data for full-body animations. This parameterization can be used to drive full-body animations of virtual characters and improves portability. In the experiment, we first downsample the gesture and facial data so that the input data and the output gesture and face data share the same temporal resolution, allowing the synthesized gesture and face animations to be processed frame by frame by a sequence model. Second, our network is trained on a dataset comprising audio and the other inputs. The Audio2AB network encodes audio, the corresponding text, emotion labels, and semantic relevance labels, then fuses the text, emotion labels, and semantic relevance labels into the audio to obtain better audio features. Then, we establish links between the body, gesture, and facial decoders, and generate the corresponding animation sequences through our proposed GAN-GF loss function. Finally, taking audio, the corresponding text, and emotion and semantic relevance labels as input, the trained Audio2AB network can generate gesture animation data and facial data containing blendshape weights. Therefore, different 3D virtual character animations can be driven through parameterization. The experimental results show that our proposed method can generate significant gesture and facial animations.

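Yujie Liu (China University of Petroleum (East China), China)
Xiaorui Sun (China University of Petroleum (East China), China)
Wenbin Shao (China University of Petroleum (East China), China)
Yafu Yuan (China University of Petroleum (East China), China)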
S2ANet: Combining local spectral and spatial point grouping to point cloud processing
PRESENTER: Xiaorui Sun

ABSTRACT. Despite recent progress in 3D point cloud processing with deep CNNs, the inability to extract local features remains a challenging problem. In addition, current methods consider only the spatial domain during feature extraction. In this paper, we therefore propose a graph convolutional network, S2ANet, which combines spectral and spatial features for point cloud processing. First, we calculate the local frequency of the point cloud in the spectral domain. Then we use the local frequency to group points, and we provide a spectral aggregation convolution module to extract the features of the points grouped by local frequency. At the same time, we also extract local features in the spatial domain to supplement the final features. S2ANet was benchmarked on several point cloud analysis tasks, where it achieved state-of-the-art classification accuracy of 93.8%, 88.0%, and 83.1% on the ModelNet40, ShapeNetCore, and ScanObjectNN datasets, respectively. For indoor scene segmentation, training and testing are performed on the S3DIS dataset, and the mIoU is 62.4%. Our code is released on

Zhi Li (South China Normal University, China)
Xiongwen Pang (South China Normal University, China)
Yiyue Jiang (South China Normal University, China)
Yujie Wang (South China Normal University, China)
RealFuVSR: Feature Enhanced Real-World Video Super-Resolution

ABSTRACT. Recurrent recovery is a common method for video super-resolution (VSR) that models the correlation between frames via hidden states. However, applying this structure in real-world scenarios can lead to unsatisfactory artifacts. We found that in real-world VSR training, the use of unknown and complex degradation can better simulate the degradation process in the real world. Based on this, we propose the RealFuVSR model, which simulates real-world degradation and mitigates artifacts caused by VSR. Specifically, we propose a multiscale feature extraction (MSF) module that extracts and fuses features from multiple scales, thereby facilitating the elimination of hidden state artifacts. To improve the accuracy of the hidden state alignment information, RealFuVSR uses an advanced optical flow-guided deformable convolution. Moreover, a cascaded residual upsampling module is used to eliminate noise caused by the upsampling process. Experiments demonstrate that the RealFuVSR model not only recovers high-quality videos but also outperforms the state-of-the-art RealBasicVSR and RealESRGAN models.

Junjie Tao (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)
Yinghui Wang (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)
Haomiao Ma (School of Computer Science, Shaanxi Normal University, Xi’an 710119, China)
Tao Yan (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)
Lingyu Ai (School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China)
Shaojie Zhang (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)
Wei Li (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)
An image defocus deblurring method based on gradient difference of boundary neighborhood

ABSTRACT. For static scenes with multiple depth layers, existing defocused image deblurring methods suffer from edge ringing artifacts or insufficient deblurring due to inaccurate estimation of the blur amount. In addition, the prior knowledge used in non-blind deconvolution is weak, which makes recovering image detail difficult. To this end, this paper proposes a blur map estimation method for defocused images based on the gradient difference of the boundary neighborhood, which uses this gradient difference to accurately obtain the blur amount, thus preventing boundary ringing artifacts. Then, the obtained blur map is used for blur detection to determine whether the image needs to be deblurred, thereby improving the efficiency of deblurring without manual intervention and judgment. Finally, a non-blind deconvolution algorithm is designed to achieve image deblurring based on the blur amount selection strategy and a sparse prior. Experimental results show that our method improves PSNR and SSIM by an average of 4.6% and 7.3%, respectively, compared to existing methods.
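For readers unfamiliar with the PSNR metric cited above, the following is a minimal sketch of its standard definition, 10 * log10(MAX^2 / MSE), in plain Python. The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration; this is not the authors' evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    Images are nested lists of pixel intensities (an illustrative assumption);
    max_value is the largest possible intensity (255 for 8-bit images).
    """
    if len(reference) != len(restored) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(reference, restored)
    ):
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels.
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(reference, restored)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)

    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the restored image is closer to the reference, so a deblurring method would typically be scored by comparing its output against a sharp ground-truth image.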

Senhua Xue (College of Intelligence and Computing, Tianjin University, China)
Liqing Gao (College of Intelligence and Computing, Tianjin University, China)
Liang Wan (College of Intelligence and Computing, Tianjin University, China)
Wei Feng (College of Intelligence and Computing, Tianjin University, China)
Multi-Scale Context-Aware Network for Continuous Sign Language Recognition

ABSTRACT. The hands and face in a sign language video are the most important parts for expressing sign language morphemes. However, we find that existing Continuous Sign Language Recognition (CSLR) works either fail to mine hand and face information in their CNN backbones or apply expensive and time-consuming external extractors to explore this information. Besides, signs have different lengths, while previous CSLR methods usually use a fixed-length window to segment the video to capture sequential features, which disturbs the perception of complete signs. In this paper, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the problems above. Our MSCA-Net contains two main modules: 1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive information about the hands and face at multiple spatial scales, replacing the heavy feature extractors; 2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments on three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, which demonstrates the effectiveness of our approach.
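The idea of using differences among frames at multiple spatial scales can be illustrated with a small sketch in plain Python. It is not the MSMA module itself: it only computes absolute differences between consecutive grayscale frames and average-pools them at a few block sizes. The function names, the nested-list frame representation, and the choice of scales are all assumptions made for illustration.

```python
def frame_difference(prev_frame, curr_frame):
    """Per-pixel absolute difference between two equally sized grayscale frames."""
    return [
        [abs(c - p) for p, c in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(prev_frame, curr_frame)
    ]


def average_pool(image, size):
    """Non-overlapping average pooling with a size x size window."""
    height, width = len(image), len(image[0])
    pooled = []
    for top in range(0, height - size + 1, size):
        row = []
        for left in range(0, width - size + 1, size):
            block = [
                image[y][x]
                for y in range(top, top + size)
                for x in range(left, left + size)
            ]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def multi_scale_motion(frames, scales=(1, 2)):
    """For each consecutive frame pair, return {scale: pooled difference map}."""
    maps = []
    for prev_frame, curr_frame in zip(frames, frames[1:]):
        diff = frame_difference(prev_frame, curr_frame)
        maps.append({scale: average_pool(diff, scale) for scale in scales})
    return maps
```

Regions that move between frames, such as hands, produce large values in these maps; a learned model could turn such maps into attention weights, though how MSCA-Net does so is not specified in the abstract.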