Multimodal large language models have shown powerful abilities to understand and reason across text and images, but their ...
The quadratic computational complexity of the Transformer attention mechanism makes large context LLM inferences prohibitively costly. This project, inspired by the research of DeepSeek-OCR, ...
This repository contains the codebase for our paper on Lossy Image Compression with Conditional Diffusion Models. We provide an off-the-shelf test code for both x-parameterization and ...
Abstract: 3D models have become an essential element of multimedia applications because they provide visual effects that permit interactive exploration. As in other media types, efficient compression ...