Second IEEE Workshop on Coding for Machines
July 15, 2024, Niagara Falls, Canada
in conjunction with IEEE ICME 2024
Technical Program
1:00pm - 1:05pm Welcome and opening
1:05pm - 2:00pm Keynote lecture: Enabling Collaborative Intelligence in Dynamic Environments via Learnt Compression [Slides]
Dr. Nilesh A. Ahuja, Intel Labs
2:00pm - 2:05pm Break
2:05pm - 3:00pm Invited lecture: Visual Data Compression in the AI Era [Slides]
Prof. Fengqing Maggie Zhu, Purdue University
3:00pm - 3:20pm Break
3:20pm - 5:00pm Technical session: Visual Coding for Machines
Region-Of-Interest-Based Video Coding for Machines [Slides]
O. Stankiewicz, T. Grajek, S. Maćkowiak, J. Stankowski, S. Różek, M. Lorkiewicz, M. Wawrzyniak, M. Domański (Poznan University of Technology)
Compression Without Compromise: Optimizing Point Cloud Object Detection With Bottleneck Architectures for Split Computing [Slides] [paper]
N. A. Ahuja, O. Tickoo, and V. Kashyap (Intel Labs)
AFC: Asymmetrical Feature Coding for Multi-Task Machine Intelligence [Slides] [paper]
Y. Zhang, H. Wang, Y. Li (China Telecom) and L. Yu (Zhejiang University)
Towards Task-Compatible Compressible Representations [Slides] [arXiv]
A. de Andrade and I. V. Bajić (Simon Fraser University)
Compressive Feature Selection for Remote Visual Multi-Task Inference [Slides] [arXiv]
S. Ranjbar Alvar and I. V. Bajić (Simon Fraser University)
Keynote lecture
Enabling Collaborative Intelligence in Dynamic Environments via Learnt Compression
Edge computing enables real-time analysis and decision making close to the source of data. Low-power IoT and client devices can leverage the power of the edge by streaming data to a nearby edge server where AI-based analytics can be deployed. Split computing is an emerging paradigm for such usages, wherein a deep neural network (DNN) or other AI model is partitioned into a client-side front-end and a server-side back-end. Task-specific intermediate representations are compressed via end-to-end learned compression and transmitted from the front-end to the back-end. This has been shown to achieve far superior rate-accuracy performance compared to compressing and transmitting raw data using standard image or video codecs. In most split-computing approaches, however, the parameters of the DNN need to be retrained if either the compression level or the split point needs to be changed. This is a serious limitation for deployment in realistic environments where the operating network and platform conditions change dynamically. We will present methods to design lightweight, rate-distortion-optimized, trainable neural network layers, commonly known as 'bottleneck units', that compress DNN features. These can be inserted at any split point of the DNN without modifying its original weights. We demonstrate on a variety of image analytics tasks that this approach achieves state-of-the-art performance and enables the adaptivity required in dynamic operating conditions. We also extend this approach to video analytics by introducing flow-based prediction in the feature space. This further improves rate-accuracy performance by exploiting the temporal correlations inherent in video data, while simultaneously reducing compute complexity. Finally, we present early results of extending our approach to 3D visual AI with point-cloud data.
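The bottleneck-unit idea in the abstract can be illustrated with a minimal numpy sketch: a small trainable pair of layers compresses and restores features at a split point, while the front-end and back-end weights stay frozen. The linear front-end/back-end here are toy stand-ins for a real DNN, and all names, dimensions, and the quantization step are hypothetical, not the speaker's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy frozen backbone (stand-in for a real DNN split into two halves).
W_front = rng.standard_normal((8, 32))
W_back = rng.standard_normal((32, 4))

def frontend(x):
    return np.tanh(x @ W_front)  # client-side half; weights never retrained

def backend(f):
    return f @ W_back            # server-side half; weights never retrained

# Bottleneck unit inserted at the split point. In practice only E and D
# would be trained, with a rate-distortion loss; the backbone is untouched.
E = rng.standard_normal((32, 8)) * 0.1   # compression (channel-reduction) layer
D = rng.standard_normal((8, 32)) * 0.1   # restoration layer

def bottleneck(f, step=0.5):
    z = f @ E                        # reduce feature dimensionality
    z_q = np.round(z / step) * step  # uniform quantization before entropy coding
    return z_q @ D                   # restore original feature shape

x = rng.standard_normal((2, 8))
f = frontend(x)                      # features produced on the client
y = backend(bottleneck(f))           # server-side inference on decoded features
print(y.shape)  # (2, 4)
```

Because the bottleneck is self-contained, swapping the split point or the compression level only requires swapping (or retraining) the small E/D pair, not the backbone.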
Dr. Nilesh A. Ahuja
Intel Labs
Nilesh A. Ahuja is an AI Research Scientist at Intel Labs. His current research focuses on adaptive AI systems for the Edge. This includes the development of efficient and reliable methods for uncertainty estimation in AI systems; their application to real-world problems such as out-of-distribution detection for industrial anomaly detection and novelty detection for continual learning systems; and efficient and adaptive deployments on Edge systems via split computing. His other research interests include 3D computer vision; odometry and SLAM; super-resolution; image and video processing; and AI methods for video compression. He received his Ph.D. degree in Electrical Engineering from the Pennsylvania State University in 2008. His work has been published in top-tier journals and conferences, and he has over 20 issued or pending US and international patents.
Invited lecture
Visual Data Compression in the AI Era
This talk will delve into the evolution of visual data compression in recent years, spotlighting how compression techniques and AI models are integrated. Traditionally, image and video codecs such as JPEG, HEVC, and AV1 have been designed primarily for accurate pixel reconstruction. However, the advancement of AI technologies has begun transforming these frameworks to meet modern application demands.
This talk will discuss this shift through three lenses:
Compression with AI: enhancing the coding performance of compression algorithms with AI techniques. I will discuss the use of generative AI models, particularly variational autoencoders, in lossy image compression, and their resemblance to traditional coding concepts such as transform coding and wavelet transforms.
Compression for AI: designing compression systems for AI-based recognition and processing (as opposed to human viewing only). I will showcase potential real-world applications of coding for machines and present several recent methods targeting mobile-cloud systems.
Compression of AI: efficiently compressing AI models to reduce computational costs. I will present pruning and quantization methods to address the challenges of compressing neural network-based codecs.
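As a concrete illustration of the "Compression of AI" lens, the following is a minimal sketch of post-training symmetric uniform quantization of a weight matrix, one of the standard techniques the talk refers to. The function names and the 8-bit setting are illustrative assumptions, not the speaker's specific method.

```python
import numpy as np

def quantize_weights(w, bits=8):
    # Symmetric uniform quantization: map float weights to signed integers
    # with a single per-tensor scale factor.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for inference.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64)).astype(np.float32)

q, s = quantize_weights(w)       # int8 storage: 4x smaller than float32
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()    # bounded by half a quantization step
print(q.dtype, err <= s / 2 + 1e-6)
```

Storing int8 instead of float32 cuts model size by roughly 4x; pruning would additionally zero out low-magnitude weights before quantization.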
Prof. Fengqing Maggie Zhu
Purdue University
Fengqing Maggie Zhu is an Associate Professor in the Elmore Family School of Electrical and Computer Engineering at Purdue University, West Lafayette, Indiana. Dr. Zhu received the B.S.E.E. (with highest distinction), M.S., and Ph.D. degrees in Electrical and Computer Engineering from Purdue University in 2004, 2006, and 2011, respectively. Prior to joining Purdue in 2015, she was a Staff Researcher at Futurewei Technologies, where she received a Certification of Recognition for Core Technology Contribution in 2012. She is the recipient of an NSF CISE Research Initiation Initiative (CRII) award in 2017, a Google Faculty Research Award in 2019, and an ESI and trainee poster award at the NIH Precision Nutrition workshop in 2021. Her research interests include smart health with a focus on image-based dietary assessment and wearable sensor data analysis, visual coding for machines, and application-driven visual data analytics. Dr. Zhu is a senior member of the IEEE. She is an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology and serves as chair of the IEEE Multimedia Signal Processing Technical Committee award subcommittee. She has served on the organizing and program committees of major conferences in her field and has received recognition such as the Outstanding Area Chair award at ICME 2021.
Workshop scope
Multimedia signals – speech, audio, images, video, point clouds, light fields, … – have traditionally been acquired, processed, and compressed for human use. However, it is estimated that in the near future, the majority of Internet connections will be machine-to-machine (M2M). Increasingly, the data communicated across networks is intended primarily for automated machine analysis. Applications include remote monitoring, surveillance, and diagnostics; autonomous driving and navigation; smart homes, buildings, neighborhoods, and cities; and so on. This necessitates a rethinking of traditional compression and pre-/post-processing methods to facilitate efficient machine-based analysis of multimedia signals. As a result, standardization efforts such as MPEG VCM (Video Coding for Machines), MPEG FCM (Feature Coding for Machines), and JPEG AI have been launched.
Both theory and early design examples have shown that, compared to traditional human-oriented coding approaches, significant bit savings are possible for a given inference accuracy. However, a number of open issues remain. These include a thorough understanding of the tradeoffs involved in coding for machines; coding for multiple machine tasks, as well as for combined human-machine use; model architectures; software and hardware optimization; error resilience; privacy; security; and others. The workshop is intended to bring together researchers from academia, industry, and government who are working on related problems, provide a snapshot of the current research and standardization efforts in the area, and generate ideas for future work. We welcome papers on the following and related topics:
Theories and frameworks for coding for machines
Methods for feature compression
End-to-end approaches for coding for machines
Compression for human-and-machine use
Compressed-domain multimedia analysis (understanding, translation, classification, object detection, segmentation, pose estimation, etc.)
Compressed-domain multimedia processing (denoising, super-resolution, enhancement, etc.)
Datasets for coding for machines
Error resilience in coding for machines
Privacy and security in coding for machines
Important dates
Paper submission: 6 Apr 2024
Acceptance notification: 2 May 2024
Camera-ready papers: 16 May 2024
Workshop date: 15 July 2024
Organizers
Fengqing Maggie Zhu, Purdue University, USA
Heming Sun, Yokohama National University, Japan
Hyomin Choi, InterDigital, USA
Ivan V. Bajić, Simon Fraser University, Canada
Technical Program Committee
Balu Adsumilli, Google/YouTube, USA
Nilesh Ahuja, Intel Labs, USA
João Ascenso, Instituto Superior Técnico, Portugal
Zhihao Duan, Purdue University, USA
Yuxing (Erica) Han, Tsinghua University, China
Wei Jiang, Futurewei, USA
Hari Kalva, Florida Atlantic University, USA
André Kaup, Friedrich-Alexander University Erlangen-Nuremberg, Germany
Xiang Li, Google, USA
Weisi Lin, Nanyang Technological University, Singapore
Jiaying Liu, Peking University, China
Saeed Ranjbar Alvar, Huawei, Canada
Shiqi Wang, City University of Hong Kong
Shurun Wang, Alibaba DAMO Academy, China
Li Zhang, ByteDance, USA