interpolation strategy to achieve a balanced cognition-distortion trade-off. Extensive experiments demonstrate that our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition control.

"Rate-Distortion-Cognition Controllable Versatile Neural Image Compression"
Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin
Computer Vision - ECCV 2024, Part LVI, LNCS vol. 15114, pp. 329-348. DOI: 10.1007/978-3-031-72992-8_19. arXiv: https://arxiv.org/abs/2407.11700
Keywords: Neural Image Compression; Image Compression for Machine; Variable-bitrate Compression; Cognition-distortion Trade-off
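The excerpt above mentions an interpolation strategy for trading off cognition (machine-task performance) against distortion. As a rough illustration only, and not the paper's actual architecture, the sketch below blends a reconstruction-oriented and a task-oriented decoding branch with a scalar beta; all module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical illustration (not the paper's actual design): a shared latent y
# is decoded by two lightweight branches, one tuned for pixel fidelity
# (distortion) and one for downstream machine vision (cognition).  A scalar
# beta in [0, 1] interpolates between the two outputs, so a single bitstream
# can serve both human viewing and machine analysis.

class ToyRDCDecoder(nn.Module):
    def __init__(self, latent_ch=192, out_ch=3):
        super().__init__()
        self.branch_distortion = nn.ConvTranspose2d(latent_ch, out_ch, 4, stride=4)
        self.branch_cognition = nn.ConvTranspose2d(latent_ch, out_ch, 4, stride=4)

    def forward(self, y, beta):
        x_d = self.branch_distortion(y)   # reconstruction-oriented output
        x_c = self.branch_cognition(y)    # task/cognition-oriented output
        return (1.0 - beta) * x_d + beta * x_c  # cognition-distortion trade-off

y = torch.randn(1, 192, 16, 16)           # stand-in for a decoded latent
dec = ToyRDCDecoder()
for beta in (0.0, 0.5, 1.0):
    print(beta, dec(y, beta).shape)
```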
"MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration"
Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen
Computer Vision - ECCV 2024, Part IX, LNCS vol. 15067, pp. 116-134. DOI: 10.1007/978-3-031-72673-6_7. arXiv: http://arxiv.org/abs/2407.10833
Keywords: Compressed Image Restoration; Mixture-of-Experts; Prompt Learning; Stable Diffusion
Abstract: We present MoE-DiffIR, a universal compressed image restoration (CIR) method with task-customized diffusion priors. It addresses two pivotal challenges of existing CIR methods: (i) the lack of adaptability and universality across different image codecs, e.g., JPEG and WebP; and (ii) poor texture generation, particularly at low bitrates. Specifically, MoE-DiffIR develops a mixture-of-experts (MoE) prompt module, in which a set of basic prompts cooperates to excavate task-customized diffusion priors from Stable Diffusion (SD) for each compression task, and a degradation-aware routing mechanism enables flexible assignment of the basic prompts. To activate and reuse the cross-modality generation prior of SD, we design a visual-to-text adapter that maps the embedding of the low-quality image from the visual domain to the textual domain as textual guidance for SD, enabling more consistent and reasonable texture generation. We also construct a comprehensive benchmark for universal CIR, covering 21 types of degradations from 7 popular traditional and learned codecs. Extensive experiments on universal CIR demonstrate the strong robustness and texture restoration capability of MoE-DiffIR. Project page: https://renyulin-f.github.io/MoE-DiffIR.github.io/.
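To make the mixture-of-experts prompt idea concrete, here is a minimal PyTorch sketch of K learnable basic prompts mixed by a degradation-aware router; the class name, dimensions, and router design are assumptions for illustration, not the released MoE-DiffIR code.

```python
import torch
import torch.nn as nn

# Hedged sketch: K learnable "basic prompts" are combined by a degradation-aware
# router into one task-customized prompt that would condition a diffusion model.

class MoEPromptModule(nn.Module):
    def __init__(self, num_prompts=8, prompt_len=16, dim=768, degrade_dim=128):
        super().__init__()
        self.basic_prompts = nn.Parameter(torch.randn(num_prompts, prompt_len, dim))
        self.router = nn.Linear(degrade_dim, num_prompts)  # degradation-aware routing

    def forward(self, degrade_embed):                       # (B, degrade_dim)
        weights = self.router(degrade_embed).softmax(dim=-1)          # (B, K)
        # Weighted combination of the basic prompts -> (B, prompt_len, dim)
        return torch.einsum("bk,kld->bld", weights, self.basic_prompts)

module = MoEPromptModule()
prompt = module(torch.randn(4, 128))   # degradation embedding of a low-quality image
print(prompt.shape)                    # torch.Size([4, 16, 768])
```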
"End-to-End Rate-Distortion Optimized 3D Gaussian Representation"
Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen
Computer Vision - ECCV 2024, Part LVIII, LNCS vol. 15116, pp. 76-92. DOI: 10.1007/978-3-031-73636-0_5. arXiv: http://arxiv.org/abs/2406.01597
Abstract: 3D Gaussian Splatting (3DGS) has emerged as a technique with remarkable potential for 3D representation and image rendering, but its substantial storage overhead significantly impedes practical applications. In this work, we formulate compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian, which achieves flexible and continuous rate control. RDO-Gaussian addresses two main issues in current schemes: 1) unlike prior endeavors that minimize the rate under a fixed distortion, we introduce dynamic pruning and entropy-constrained vector quantization (ECVQ) to optimize rate and distortion simultaneously; 2) whereas previous works treat the colors of all Gaussians equally, we model the colors of different regions and materials with learnable numbers of parameters. We verify our method on both real and synthetic scenes, showing that RDO-Gaussian reduces the size of the 3D Gaussian representation by over 40x and surpasses existing methods in rate-distortion performance.
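The entropy-constrained vector quantization (ECVQ) step mentioned above can be illustrated with a small NumPy sketch: each vector is assigned to the codeword minimizing distortion plus lambda times an estimated rate. The codebook size, lambda, and probabilities are illustrative, not the paper's settings.

```python
import numpy as np

# Minimal ECVQ assignment sketch: cost = distortion + lmbda * rate,
# with rate = -log2 p(code).  Probabilities would normally be learned.

def ecvq_assign(x, codebook, probs, lmbda=0.1):
    # x: (N, d) vectors, codebook: (K, d), probs: (K,) codeword probabilities
    dist = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (N, K)
    rate = -np.log2(probs)[None, :]                                # (1, K)
    cost = dist + lmbda * rate                                     # RD cost per codeword
    return cost.argmin(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))
codebook = rng.normal(size=(16, 8))
probs = np.full(16, 1 / 16)
idx = ecvq_assign(x, codebook, probs)
print("avg bits per vector:", -np.log2(probs[idx]).mean())
```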
"LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents"
Bingchen Li, Xin Li, Yiting Lu, Zhibo Chen
arXiv preprint arXiv:2412.04090, 2024. http://arxiv.org/abs/2412.04090
Abstract: We present the first loss agent, dubbed LossAgent, for low-level image processing tasks such as image super-resolution and restoration, aiming to satisfy arbitrary customized optimization objectives in different practical applications. Notably, not all optimization objectives, such as complex hand-crafted perceptual metrics, text descriptions, and intricate human feedback, can be instantiated with existing low-level losses such as MSE, which poses a crucial challenge for optimizing image processing networks end-to-end. To address this, LossAgent employs a large language model (LLM) as the loss agent, whose rich textual prior knowledge lets it interpret complex optimization objectives, trajectories, and state feedback from the external environment during optimization of the low-level image processing network. In particular, we establish a loss repository of existing loss functions that support end-to-end optimization for low-level image processing, and design optimization-oriented prompt engineering so that the agent actively decides the compositional weight of each loss in the repository at every optimization interaction, thereby steering the optimization trajectory toward any customized objective. Extensive experiments on three typical low-level image processing tasks and multiple optimization objectives show the effectiveness and applicability of LossAgent. Code and pre-trained models will be available at https://github.com/lbc12345/LossAgent.
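Below is a minimal sketch of the loss-repository idea, with the LLM decision stubbed out by a placeholder function (in LossAgent that decision would come from an LLM prompted with the objective and optimization feedback); the function names and weights here are hypothetical.

```python
import torch
import torch.nn.functional as F

# Sketch of composing a repository of existing losses with agent-chosen weights.

loss_repository = {
    "mse": lambda pred, target: F.mse_loss(pred, target),
    "l1":  lambda pred, target: F.l1_loss(pred, target),
}

def agent_decide_weights(objective: str) -> dict:
    # Placeholder for the LLM call; returns fixed weights for illustration.
    return {"mse": 0.7, "l1": 0.3}

def composed_loss(pred, target, weights):
    return sum(w * loss_repository[name](pred, target) for name, w in weights.items())

pred, target = torch.rand(2, 3, 32, 32), torch.rand(2, 3, 32, 32)
weights = agent_decide_weights("maximize perceptual quality on restored images")
print(composed_loss(pred, target, weights).item())
```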
"Video Quality Assessment Based on Swin TransformerV2 and Coarse to Fine Strategy"
Zihao Yu, Fengbin Guan, Yiting Lu, Xin Li, Zhibo Chen
2024 Data Compression Conference (DCC), p. 601. DOI: 10.1109/dcc58796.2024.00118. arXiv: http://arxiv.org/abs/2401.08522
Keywords: Quality Assessment; Level of Quality; Bitrate; Ranking Loss; Video Compression
Abstract: The objective of no-reference video quality assessment (NR-VQA) is to evaluate the quality of distorted videos without access to high-definition references. In this study, we introduce an enhanced spatial perception module, pre-trained on multiple image quality assessment datasets, together with a lightweight temporal fusion module to address the NR-VQA task. The model uses Swin Transformer V2 as a local-level spatial feature extractor and fuses the resulting multi-stage representations through a series of transformer layers; a temporal transformer then performs spatiotemporal feature fusion across the video. To accommodate compressed videos of varying bitrates, we incorporate a coarse-to-fine contrastive strategy that strengthens the model's ability to discriminate features from videos of different bitrates. (This is an expanded version of the one-page abstract.)
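A toy sketch of the spatial-then-temporal pipeline described above; a small convolutional stand-in replaces the Swin Transformer V2 extractor so the example stays self-contained, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

# Rough sketch: per-frame spatial features -> temporal transformer -> quality score.

class ToyNRVQA(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.spatial = nn.Sequential(              # stand-in for Swin-V2 features
            nn.Conv2d(3, feat_dim, 8, stride=8), nn.AdaptiveAvgPool2d(1), nn.Flatten())
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(feat_dim, 1)         # regress a quality score

    def forward(self, video):                      # (B, T, 3, H, W)
        b, t = video.shape[:2]
        f = self.spatial(video.flatten(0, 1)).view(b, t, -1)   # per-frame features
        f = self.temporal(f)                                   # spatiotemporal fusion
        return self.head(f.mean(dim=1)).squeeze(-1)            # one score per video

model = ToyNRVQA()
print(model(torch.rand(2, 8, 3, 64, 64)).shape)    # torch.Size([2])
```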
"Towards Defining an Efficient and Expandable File Format for AI-Generated Contents"
Yixin Gao, Runsen Feng, Xin Li, Weiping Li, Zhibo Chen
arXiv preprint arXiv:2410.09834 (CoRR), 2024. http://arxiv.org/abs/2410.09834
Abstract: Recently, AI-generated content (AIGC) has gained significant traction due to its powerful creation capability. However, storing and transmitting large amounts of high-quality AIGC images inevitably poses new challenges for existing file formats. To overcome this, we define a new file format for AIGC images, named AIGIF, which enables ultra-low-bitrate coding of AIGC images. Unlike existing file formats, which compress AIGC images in pixel space, AIGIF instead compresses the generation syntax. This raises a crucial question: which generation syntax elements, e.g., text prompt, device configuration, etc., are necessary for compression and transmission? To answer it, we systematically investigate the effects of three essential factors: platform, generative model, and data configuration. We experimentally find that a well-designed composable bitstream structure incorporating these three factors can achieve a compression ratio of up to 1/10,000 while still ensuring high fidelity. We also introduce an expandable syntax in AIGIF to support more advanced generation models developed in the future.
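To illustrate the "compress the generation syntax rather than the pixels" idea, the sketch below serializes a hypothetical syntax record (the field names and layout are made up, not the actual AIGIF specification) and compares its size with an uncompressed image of the stated resolution.

```python
import json, zlib

# Illustrative only: pack platform, model, and data configuration plus the text
# prompt into a compact payload instead of storing pixels.

syntax = {
    "platform": "example-webui",          # platform factor
    "model": "sd-xl-base-1.0",            # generative model factor
    "config": {"steps": 30, "cfg": 7.0, "seed": 1234, "size": [1024, 1024]},
    "prompt": "a watercolor painting of a lighthouse at dawn",
}
payload = zlib.compress(json.dumps(syntax, separators=(",", ":")).encode())

pixel_bytes = 1024 * 1024 * 3             # uncompressed RGB image of the same size
print(len(payload), "bytes vs", pixel_bytes, "bytes,",
      f"ratio ~1/{pixel_bytes // len(payload)}")
```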
"PriorFormer: A UGC-VQA Method with Content and Distortion Priors"
Yajing Pei, Shiyu Huang, Yiting Lu, Xin Li, Zhibo Chen
2024 IEEE International Conference on Image Processing (ICIP), pp. 1246-1252. DOI: 10.1109/icip51287.2024.10647869. arXiv: http://arxiv.org/abs/2406.16297
Keywords: User Generated Content; video quality assessment; Transformer
Abstract: User-generated content (UGC) videos are subject to complicated and varied degradations and contents, which prevents existing blind video quality assessment (BVQA) models from performing well because they lack adaptability to diverse distortions and contents. To mitigate this, we propose PriorFormer, a prior-augmented perceptual vision transformer for UGC BVQA that boosts adaptability and representation capability for divergent contents and distortions. Concretely, we introduce two powerful priors, the content and distortion priors, by extracting content and distortion embeddings from two pre-trained feature extractors. These embeddings are then used as adaptive prior tokens and fed to the vision transformer backbone jointly with implicit quality features. With this strategy, PriorFormer achieves state-of-the-art performance on three public UGC VQA datasets: KoNViD-1K, LIVE-VQC, and YouTube-UGC.
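A hedged sketch of the prior-token idea: content and distortion embeddings (assumed to come from pre-trained extractors) are projected into two extra tokens and processed together with patch tokens; the module layout and sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy illustration: prepend projected content/distortion prior tokens to the
# patch tokens of a transformer backbone and regress a quality score.

class ToyPriorFormer(nn.Module):
    def __init__(self, dim=256, content_dim=512, distort_dim=512):
        super().__init__()
        self.proj_content = nn.Linear(content_dim, dim)
        self.proj_distort = nn.Linear(distort_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 1)

    def forward(self, patch_tokens, content_emb, distort_emb):
        priors = torch.stack([self.proj_content(content_emb),
                              self.proj_distort(distort_emb)], dim=1)  # (B, 2, dim)
        tokens = torch.cat([priors, patch_tokens], dim=1)              # prepend priors
        return self.head(self.backbone(tokens)[:, 0]).squeeze(-1)      # quality score

model = ToyPriorFormer()
score = model(torch.rand(2, 49, 256), torch.rand(2, 512), torch.rand(2, 512))
print(score.shape)   # torch.Size([2])
```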
"CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion"
Xingrui Wang, Xin Li, Zhibo Chen
arXiv preprint arXiv:2406.05082 (CoRR), 2024. http://arxiv.org/abs/2406.05082
Abstract: Tuning-free long video diffusion has been proposed to generate extended-duration videos with enriched content by reusing the knowledge of a pre-trained short-video diffusion model without retraining. However, most works overlook fine-grained long-term video consistency modeling, resulting in limited scene consistency (i.e., unreasonable object or background transitions), especially with multiple text inputs. To mitigate this, we propose Consistency Noise Injection, dubbed CoNo, which introduces a "look-back" mechanism to enhance fine-grained scene transitions between different video clips, and designs a long-term consistency regularization to eliminate content shifts when extending video content through noise prediction. In particular, the "look-back" mechanism breaks the noise scheduling process into three essential parts, where one internal noise-prediction part is injected into two video-extending parts to achieve a fine-grained transition between two video clips. The long-term consistency regularization explicitly minimizes the pixel-wise distance between the noises predicted for the extended video clip and for the original one, thereby preventing abrupt scene transitions. Extensive experiments demonstrate the effectiveness of both strategies for long-video generation under single- and multi-text prompt conditions. Project page: https://wxrui182.github.io/CoNo.github.io/.
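The long-term consistency regularization is described as a pixel-wise distance between the noises predicted for the extended clip and the original clip; a minimal sketch (with random tensors standing in for the outputs of a video diffusion denoiser, which is not included here) is:

```python
import torch
import torch.nn.functional as F

# Penalize the pixel-wise distance between the noise predicted for the extended
# clip and the noise predicted for the original clip.

def consistency_regularization(eps_extended, eps_original):
    return F.mse_loss(eps_extended, eps_original)

eps_ext = torch.randn(1, 16, 4, 32, 32)    # (B, frames, latent channels, H, W)
eps_org = eps_ext + 0.05 * torch.randn_like(eps_ext)
print(consistency_regularization(eps_ext, eps_org).item())
```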
"Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization"
Zhibo Chen, Heming Sun, Li Zhang, Fan Zhang
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 14, no. 2, pp. 149-171, 2024. DOI: 10.1109/jetcas.2024.3403524. arXiv: http://arxiv.org/abs/2405.14221
Keywords: Generative models; visual signal coding; visual signal processing; optimization
Abstract: This paper surveys the latest developments in visual signal coding and processing with generative models, focusing on the advancement of generative models and their influence on research in this domain. The survey begins with a brief introduction to well-established generative models, including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Autoregressive (AR) models, Normalizing Flows, and Diffusion models. It then reviews advances in generative-model-based visual signal coding as well as the ongoing international standardization activities. In the realm of visual signal processing, the focus is on the application and development of generative models in visual signal restoration, together with the latest developments in generative visual signal synthesis and editing, visual signal quality assessment using generative models, and quality assessment for generative models. Since the practical implementation of these studies is closely linked to fast optimization, the paper additionally presents the latest advancements in fast optimization for visual signal coding and processing with generative models. We hope to advance this field by providing researchers and practitioners with a comprehensive literature review on the topic.
"Bayesian Graph Convolutional Network for Traffic Prediction"
Jun Fu, Wei Zhou, Zhibo Chen
Neurocomputing, vol. 582, 2024. DOI: 10.1016/j.neucom.2024.127507. arXiv: https://arxiv.org/abs/2104.00488
Keywords: Traffic prediction; Bayesian; Generative model
Abstract: Recently, adaptive graph convolutional network based traffic prediction methods, which learn a latent graph structure from traffic data via attention-based mechanisms, have achieved impressive performance. However, they remain limited in describing the spatial relationships between traffic conditions because they (1) ignore the prior given by the observed road network topology, (2) neglect the presence of negative spatial relationships, and (3) do not investigate the uncertainty of the graph structure. In this paper, we propose a Bayesian Graph Convolutional Network (BGCN) framework to alleviate these issues. Under this framework, the graph structure is viewed as a random realization of a parametric generative model, and its posterior is inferred from the observed road network topology and traffic data. Specifically, the parametric generative model comprises two parts: (1) a constant adjacency matrix that discovers potential spatial relationships from the observed physical connections between roads using a Bayesian approach, and (2) a learnable adjacency matrix that learns globally shared spatial correlations from traffic data in an end-to-end fashion and can model negative spatial correlations. The posterior of the graph structure is then approximated by performing Monte Carlo dropout on the parametric graph structure. We verify the effectiveness of our method on five real-world datasets, and the experimental results demonstrate that BGCN attains superior performance compared with state-of-the-art methods. The source code is available at https://github.com/JunFu1995/BGCN.git.
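A hedged sketch of the two-part parametric graph described above: a fixed adjacency derived from the road topology plus a learnable adjacency that may take negative values, with Monte Carlo dropout applied to the combined graph to draw approximate posterior samples. The layer design, shapes, and dropout placement are simplified assumptions, not the released BGCN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy graph-convolution layer with a fixed + learnable adjacency and MC dropout
# over the graph structure; repeated stochastic forward passes give samples
# from an approximate posterior over predictions.

class ToyBGCNLayer(nn.Module):
    def __init__(self, num_nodes, in_dim, out_dim, p_drop=0.3):
        super().__init__()
        self.learnable_adj = nn.Parameter(torch.zeros(num_nodes, num_nodes))
        self.drop = nn.Dropout(p_drop)        # MC dropout on the graph structure
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, fixed_adj):          # x: (B, N, in_dim)
        adj = self.drop(fixed_adj + self.learnable_adj)   # sampled graph realization
        return F.relu(adj @ self.lin(x))

num_nodes = 20
fixed_adj = (torch.rand(num_nodes, num_nodes) > 0.8).float()   # stand-in topology
layer = ToyBGCNLayer(num_nodes, in_dim=8, out_dim=16)
layer.train()                                   # keep dropout active for MC sampling
x = torch.rand(4, num_nodes, 8)
samples = torch.stack([layer(x, fixed_adj) for _ in range(5)])  # 5 posterior samples
print(samples.mean(0).shape, samples.var(0).mean().item())
```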