interpolation strategy to achieve a balanced cognition-distortion trade-off. Extensive experiments demonstrate that our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition controlling.","authors":[{"id":"646c1f5d4e9afab66afa096f","name":"Jinming Liu","org":"Shanghai Jiao Tong University","orgid":"5f71b54b1c455f439fe502b0"},{"id":"635100b2ac95af67f73c2151","name":"Ruoyu Feng","org":"University of Science and Technology of China","orgid":"5f71b2c81c455f439fe3e380"},{"name":"Yunpeng Qi","org":"University of Science and Technology of China","orgid":"5f71b2c81c455f439fe3e380"},{"name":"Qiuyu Chen","org":"Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China","orgid":"5f71b44b1c455f439fe48fe1"},{"id":"561669a545ce1e596399e480","name":"Zhibo Chen","org":"University of Science and Technology of China","orgid":"5f71b2c81c455f439fe3e380"},{"id":"53f360a0dabfae4b349845ab","name":"Wenjun Zeng","org":"Eastern Institute of Technology, Ningbo","orgid":"5f71b44b1c455f439fe48fe1"},{"id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin","org":"Eastern Institute of Technology, Ningbo","orgid":"5f71b44b1c455f439fe48fe1"}],"create_time":"2024-07-17T04:47:18.714Z","doi":"10.1007\u002F978-3-031-72992-8_19","id":"669729b401d2a3fbfc78709a","issn":"0302-9743","keywords":["Neural Image Compression","Image Compression for Machine","Variable-bitrate Compression","Cognition-distortion Trade-off"],"lang":"en","num_citation":0,"pages":{"end":"348","start":"329"},"pdf":"A36A602BA12B186D8B04583637824F54.pdf","title":"Rate-Distortion-Cognition Controllable Versatile Neural Image Compression","update_times":{"u_a_t":"2024-12-25T23:55:09Z","u_v_t":"2024-12-25T23:55:09Z"},"urls":["https:\u002F\u002Farxiv.org\u002Fabs\u002F2407.11700"],"venue":{"info":{"name":"COMPUTER VISION - ECCV 2024, PT LVI"},"volume":"15114"},"versions":[{"id":"669729b401d2a3fbfc78709a","sid":"2407.11700","src":"arxiv","vsid":"COMPUTER VISION - ECCV 2024, PT LVI","year":2025},{"id":"66e0233b01d2a3fbfc2a3b08","sid":"journals\u002Fcorr\u002Fabs-2407-11700","src":"dblp","vsid":"journals\u002Fcorr","year":2024},{"id":"67091ff601d2a3fbfcdbd271","sid":"1802","src":"conf_eccv","vsid":"7338","year":2024},{"id":"6721eeca01d2a3fbfc3aab0a","sid":"10.1007\u002F978-3-031-72992-8_19","src":"crossref","year":2024},{"id":"674e82e3ae8580e7ffc6c17e","sid":"conf\u002Feccv\u002FLiuFQCCZJ24","src":"dblp","vsid":"conf\u002Feccv","year":2024},{"id":"676ba70dae8580e7ff125906","sid":"WOS:001352862700019","src":"wos","vsid":"COMPUTER VISION - ECCV 2024, PT LVI","year":2025}],"year":2025},{"abstract":"Language-image pre-training largely relies on how precisely and thoroughly a text describes its paired image. In practice, however, the contents of an image can be so rich that well describing them requires lengthy captions (e.g., with 10 sentences), which are usually missing in existing datasets. Consequently, there is currently no clear evidence on whether and how language-image pre-training could benefit from long captions. To figure this out, we first re-caption 30M images with detailed descriptions using a pre-trained Multi-modality Large Language Model (MLLM), and then study the usage of the resulting captions under a contrastive learning framework. We observe that, each sentence within a long caption is very likely to describe the image partially (e.g., an object). 
Motivated by this, we propose to dynamically sample sub-captions from the text label to construct multiple positive pairs, and introduce a grouping loss to match the embeddings of each sub-caption with its corresponding local image patches in a self-supervised manner. Experimental results on a wide range of downstream tasks demonstrate the consistent superiority of our method, termed DreamLIP, over previous alternatives, highlighting its fine-grained representational capacity. It is noteworthy that, on the tasks of image-text retrieval and semantic segmentation, our model trained with 30M image-text pairs achieves on par or even better performance than CLIP trained with 400M pairs. Project page is available at https:\u002F\u002Fzyf0619sjtu.github.io\u002Fdream-lip.","authors":[{"id":"61831d388672f1a6df29485d","name":"Kecheng Zheng","org":"ant group","orgid":"5f71b6501c455f439fe570b9"},{"id":"64ec47318a47b66603e6fcbd","name":"Yifei Zhang","org":"Shanghai Jiao Tong University","orgid":"5f71b54b1c455f439fe502b0"},{"id":"542a1e73dabfae61d4956d48","name":"Wei Wu","org":"University of Science and Technology of China","orgid":"5f71b2c81c455f439fe3e380"},{"id":"65ef1e44c5a2072dc1944a98","name":"Fan Lu","org":"University of Science and Technology of China","orgid":"5f71b2c81c455f439fe3e380"},{"id":"646c1f5d4e9afab66afa0a9c","name":"Shuailei Ma","org":"Northeastern University","orgid":"61e698836896273465736757"},{"id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin","org":"Eastern Institute of Technology, Ningbo","orgid":"5f71b44b1c455f439fe48fe1"},{"id":"6597b4f37129e2381dc27a09","name":"Wei Chen","org":"State Key Lab of CAD&CG, Zhejiang University"},{"id":"542f9743dabfae3f1d59b2b2","name":"Yujun Shen","org":"Ant Research"}],"create_time":"2024-05-17T19:28:10.403Z","doi":"10.1007\u002F978-3-031-72649-1_5","id":"66023c0d13fb2c6cf6ab6937","issn":"0302-9743","keywords":["Language-image pre-training","Long caption","Multi-modal learning"],"lang":"en","num_citation":0,"pages":{"end":"90","start":"73"},"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002F5E\u002F98\u002F46\u002F5E98462EC2A2B6160F808C0C24FF66BC.pdf","title":"DreamLIP: Language-Image Pre-training with Long Captions","update_times":{"u_a_t":"2024-12-26T03:04:34Z","u_v_t":"2024-12-26T03:04:34Z"},"urls":["http:\u002F\u002Farxiv.org\u002Fabs\u002F2403.17007"],"venue":{"info":{"name":"COMPUTER VISION-ECCV 2024, PT XVIII"},"volume":"15076"},"versions":[{"id":"66023c0d13fb2c6cf6ab6937","sid":"2403.17007","src":"arxiv","vsid":"COMPUTER VISION-ECCV 2024, PT XVIII","year":2025},{"id":"6646ca7701d2a3fbfc1f659e","sid":"journals\u002Fcorr\u002Fabs-2403-17007","src":"dblp","vsid":"journals\u002Fcorr","year":2024},{"id":"666b872401d2a3fbfcce199b","sid":"W4393213852","src":"openalex","vsid":"S4306400194","year":2024},{"id":"66f9297301d2a3fbfc522343","sid":"10.1007\u002F978-3-031-72649-1_5","src":"crossref","year":2024},{"id":"67091ff401d2a3fbfcdbc709","sid":"1125","src":"conf_eccv","vsid":"2666","year":2024},{"id":"67205d3b01d2a3fbfc1b96b1","sid":"conf\u002Feccv\u002FZhengZWLMJCS24","src":"dblp","vsid":"conf\u002Feccv","year":2024},{"id":"676bdb97ae8580e7ffb0e341","sid":"WOS:001346386900005","src":"wos","vsid":"COMPUTER VISION-ECCV 2024, PT XVIII","year":2025}],"year":2025},{"abstract":"The fine-grained attribute descriptions can significantly supplement the valuable semantic information for person images, which is vital to the success of the person re-identification (ReID) task. 
However, current ReID algorithms typically fail to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. Recent advances in artificial intelligence generated content have made it possible to automatically generate plentiful fine-grained attribute descriptions and make full use of them. Thereby, this paper explores the potential of using the generated multiple person attributes as prompts in ReID tasks with off-the-shelf (large) models for more accurate retrieval results. To this end, we present a new framework called Multi-Prompts ReID (MP-ReID), based on prompt learning and language models, to fully exploit fine attributes to assist the ReID task. Specifically, MP-ReID first learns to hallucinate diverse, informative, and promptable sentences for describing the query images. This procedure includes (i) explicit prompts of which attributes a person has and furthermore (ii) implicit learnable prompts for adjusting\u002Fconditioning the criteria used towards this person identity matching. Explicit prompts are obtained by ensembling generation models, such as ChatGPT and VQA models. Moreover, an alignment module is designed to fuse multi-prompts (i.e., explicit and implicit ones) progressively and mitigate the cross-modal gap. Extensive experiments on the existing attribute-involved ReID datasets, namely, Market1501 and DukeMTMC-reID, demonstrate the effectiveness and rationality of the proposed MP-ReID solution.","authors":[{"email":"yajingzhai9@gmail.com","id":"6625d411d9a27592a933ca31","name":"Yajing Zhai","org":"Hunan Univ, Coll Comp Sci & Elect Engn, Changsha, Peoples R China","orgid":"5f71b2ae1c455f439fe3d7a3"},{"email":"yawenzeng11@gmail.com","id":"54095566dabfae450f4785b5","name":"Yawen Zeng","org":"Hunan Univ, Coll Comp Sci & Elect Engn, Changsha, Peoples R China","orgid":"5f71b2ae1c455f439fe3d7a3"},{"email":"huangzy@comp.nus.edu.sg","id":"54058bbcdabfae91d3feff5b","name":"Zhiyong Huang","org":"Natl Univ Singapore, NUS Res Inst Chongqing, Singapore, Singapore","orgid":"5f71b2971c455f439fe3cecb"},{"email":"zqin@hnu.edu.cn","id":"562fb78f45cedb339979f8c8","name":"Zheng Qin","org":"Hunan Univ, Coll Comp Sci & Elect Engn, Changsha, Peoples R China","orgid":"5f71b2ae1c455f439fe3d7a3"},{"email":"jinxin@eitech.edu.cn","id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin","org":"Eastern Inst Technol, Ningbo Inst Digital Twin, Ningbo, Peoples R China"},{"email":"caoda0721@gmail.com","id":"53f44265dabfaedf435bf1f0","name":"Da Cao","org":"Hunan Univ, Coll Comp Sci & Elect Engn, Changsha, Peoples R China","orgid":"5f71b2ae1c455f439fe3d7a3"}],"create_time":"2024-04-19T06:58:00.108Z","doi":"10.1609\u002Faaai.v38i7.28524","id":"658e4adc939a5f4082dbe496","issn":"2159-5399","keywords":["Feature Learning","Person Re-identification","Multiple Object Tracking","Metric Learning","Cross-View Recognition"],"lang":"en","num_citation":0,"pages":{"end":"6987","start":"6979"},"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002FBF\u002FC1\u002F11\u002FBFC111776C877A77511F447825C26254.pdf","title":"Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification","update_times":{"u_a_t":"2024-12-25T16:48:20Z","u_v_t":"2024-12-25T16:48:20Z"},"urls":["http:\u002F\u002Farxiv.org\u002Fabs\u002F2312.16797"],"venue":{"info":{"name":"THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 
7"}},"venue_hhb_id":"5ea18efeedb6e7d53c00a01c","versions":[{"id":"658e4adc939a5f4082dbe496","sid":"2312.16797","src":"arxiv","vsid":"THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7","year":2024},{"id":"65bc9f64939a5f408227b898","sid":"journals\u002Fcorr\u002Fabs-2312-16797","src":"dblp","vsid":"journals\u002Fcorr","year":2023},{"id":"65d81a06939a5f4082951f7e","sid":"W4390437616","src":"openalex","vsid":"S4306400194","year":2023},{"id":"66027af713fb2c6cf6154f66","sid":"10.1609\u002Faaai.v38i7.28524","src":"crossref","year":2024},{"id":"6613d9be13fb2c6cf64584dd","sid":"conf\u002Faaai\u002FZhaiZH0JC24","src":"dblp","vsid":"conf\u002Faaai","year":2024},{"id":"666b863b01d2a3fbfccbe75c","sid":"W4393153810","src":"openalex","vsid":"S4210191458","year":2024},{"id":"66ec5daf01d2a3fbfc316cff","sid":"WOS:001239937300054","src":"wos","vsid":"THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7","year":2024}],"year":2024},{"abstract":"This article describes the 2023 IEEE Low-Power Computer Vision Challenge\n(LPCVC). Since 2015, LPCVC has been an international competition devoted to\ntackling the challenge of computer vision (CV) on edge devices. Most CV\nresearchers focus on improving accuracy, at the expense of ever-growing sizes\nof machine models. LPCVC balances accuracy with resource requirements. Winners\nmust achieve high accuracy with short execution time when their CV solutions\nrun on an embedded device, such as Raspberry PI or Nvidia Jetson Nano. The\nvision problem for 2023 LPCVC is segmentation of images acquired by Unmanned\nAerial Vehicles (UAVs, also called drones) after disasters. The 2023 LPCVC\nattracted 60 international teams that submitted 676 solutions during the\nsubmission window of one month. This article explains the setup of the\ncompetition and highlights the winners' methods that improve accuracy and\nshorten execution time.","authors":[{"name":"Leo Chen"},{"name":"Benjamin Boardley"},{"name":"Ping Hu"},{"id":"54480ee8dabfae84fd193230","name":"Yiru Wang"},{"id":"6551c204c2bd4f93db219621","name":"Yifan Pu"},{"id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin"},{"id":"561d831945cedb33980881c7","name":"Yongqiang Yao"},{"id":"53f456dfdabfaee0d9bf6028","name":"Ruihao Gong"},{"id":"5617171545cedb3397bb9f04","name":"Bo Li"},{"id":"540835d9dabfae44f0870362","name":"Gao Huang"},{"id":"53f4993fdabfaedce5623eae","name":"Xianglong Liu"},{"id":"6661f2993e8f949bd38e9d14","name":"Zifu Wan"},{"name":"Xinwang Chen"},{"id":"5405ac72dabfae91d3000792","name":"Ning Liu"},{"name":"Ziyi Zhang"},{"name":"Dongping Liu"},{"name":"Ruijie Shan"},{"id":"53f32489dabfae9a84469680","name":"Zhengping Che"},{"id":"66987c63f53397aa74b9b043","name":"Fachao Zhang"},{"id":"65d456ebc136ef133166a2ac","name":"Xiaofeng Mou"},{"id":"53f4af12dabfaeb22f5762e8","name":"Jian Tang"},{"name":"Maxim Chuprov"},{"name":"Ivan Malofeev"},{"id":"63734124ec88d95668d6480b","name":"Alexander Goncharenko"},{"name":"Andrey Shcherbin"},{"name":"Arseny Yanchenko"},{"id":"637275e5ec88d95668ce4213","name":"Sergey Alyamkin"},{"id":"63270baca95e4d1d0552cbd3","name":"Xiao Hu"},{"id":"548caaa9dabfae9b401353f7","name":"George K. 
Thiruvathukal"},{"id":"54850773dabfae9b401332c7","name":"Yung Hsiang Lu"}],"create_time":"2024-05-17T20:03:37.428Z","hashs":{"h1":"2lcvc","h3":"ls"},"id":"65f108e413fb2c6cf6acd29a","num_citation":0,"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002F19\u002F4E\u002F3F\u002F194E3F25AA2989132E5C260517703803.pdf","title":"2023 Low-Power Computer Vision Challenge (LPCVC) Summary","urls":["http:\u002F\u002Farxiv.org\u002Fabs\u002F2403.07153","db\u002Fjournals\u002Fcorr\u002Fcorr2403.html#abs-2403-07153","https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2403.07153"],"venue":{"info":{"name":"CoRR"},"volume":"abs\u002F2403.07153"},"versions":[{"id":"65f108e413fb2c6cf6acd29a","sid":"2403.07153","src":"arxiv"},{"id":"6646ca7701d2a3fbfc1f65e2","sid":"journals\u002Fcorr\u002Fabs-2403-07153","src":"dblp"}],"year":2024},{"authors":[{"id":"65ed4e720b6735f4855ea396","name":"Wei Zhao"},{"id":"6411567751f0c0b04736ed41","name":"Yijun Wang"},{"id":"542d4bb3dabfae48d123283d","name":"Tianyu He"},{"id":"66612222ca8dc092e69d3bfc","name":"Lianying Yin"},{"id":"5d50c3ba7390bff0db2a8713","name":"Jianxin Lin"},{"id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin"}],"create_time":"2024-03-15T10:57:47.054Z","doi":"10.2139\u002Fssrn.4758069","hashs":{"h1":"blfs3","h2":"pfa","h3":"nhpds"},"id":"65f279f113fb2c6cf6407a13","num_citation":0,"title":"Breathing Life into Faces: Speech-Driven 3d Parametric Facial Animation with Natural Head Pose and Detailed Shapes","urls":["http:\u002F\u002Fdx.doi.org\u002F10.2139\u002Fssrn.4758069"],"venue":{"info":{"publisher":"Elsevier BV"}},"versions":[{"id":"65f279f113fb2c6cf6407a13","sid":"10.2139\u002Fssrn.4758069","src":"crossref"}],"year":2024},{"abstract":"While learned image compression methods have achieved impressive results in either human visual perception or machine vision tasks, they are often specialized only for one domain. This drawback limits their versatility and generalizability across scenarios and also requires retraining to adapt to new applications, a process that adds significant complexity and cost in real-world scenarios. In this study, we introduce an innovative semantics DISentanglement and COmposition VERsatile codec (DISCOVER) to simultaneously enhance human-eye perception and machine vision tasks. The approach derives a set of labels per task through multimodal large models, to which grounding models are then applied for precise localization, enabling a comprehensive understanding and disentanglement of image components at the encoder side. At the decoding stage, a comprehensive reconstruction of the image is achieved by leveraging these encoded components alongside priors from generative models, thereby optimizing performance for both human visual perception and machine-based analytical tasks. 
Extensive experimental evaluations substantiate the robustness and effectiveness of DISCOVER, demonstrating superior performance in fulfilling the dual objectives of human and machine vision requirements.","authors":[{"id":"646c1f5d4e9afab66afa096f","name":"Jinming Liu"},{"name":"Yuntao Wei"},{"name":"Junyan Lin"},{"name":"Shengyang Zhao"},{"id":"542f4508dabfaed7c7c3ee04","name":"Heming Sun"},{"id":"53f4532edabfaedf435ff266","name":"Zhibo Chen"},{"id":"53f360a0dabfae4b349845ab","name":"Wenjun Zeng"},{"id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin"}],"create_time":"2024-12-25T16:56:53.967Z","id":"676b7234ae8580e7ff87baf2","lang":"en","num_citation":0,"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002F63\u002F1A\u002F93\u002F631A934E38E66400340828D146AB597F.pdf","title":"Semantics Disentanglement and Composition for Versatile Codec Toward Both Human-eye Perception and Machine Vision Task","update_times":{"u_a_t":"2024-12-25T16:50:49Z","u_v_t":"2024-12-25T16:50:49Z"},"urls":["http:\u002F\u002Farxiv.org\u002Fabs\u002F2412.18158"],"versions":[{"id":"676b7234ae8580e7ff87baf2","sid":"2412.18158","src":"arxiv","year":2024}],"year":2024},{"abstract":"Semantic segmentation in the context of 3D point clouds for the railway environment holds significant economic value, but its development is severely hindered by the lack of suitable and specific datasets. Additionally, the models trained on existing urban road point cloud datasets demonstrate poor generalisation on railway data due to a large domain gap caused by non‐overlapping special\u002Frare categories, for example, rail track, track bed, etc. To harness the potential of supervised learning methods in the domain of 3D railway semantic segmentation, we introduce RailPC, a new point cloud benchmark. RailPC provides a large‐scale dataset with rich annotations for semantic segmentation in the railway environment. Notably, RailPC contains twice the number of annotated points compared to the largest available mobile laser scanning (MLS) point cloud dataset and is the first railway‐specific 3D dataset for semantic segmentation. It covers a total of nearly 25 km of railway in two different scenes (urban and mountain), with 3 billion points that are finely labelled as the 16 most typical railway-related classes, and the data acquisition process was completed in China by MLS systems. Through extensive experimentation, we evaluate the performance of advanced scene understanding methods on the annotated dataset and present a synthetic analysis of semantic segmentation results. Based on our findings, we establish some critical challenges towards railway‐scale point cloud semantic segmentation. 
The dataset is available at https:\u002F\u002Fgithub.com\u002FNNU‐GISA\u002FGISA‐RailPC, and we will continuously update it based on community feedback.","authors":[{"id":"62e4c09cd9f204418d72618c","name":"Tengping Jiang","org":"Nanjing Normal Univ, Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China","orgid":"5f71b2c81c455f439fe3e34a"},{"name":"Shiwei Li","org":"Nanjing Normal Univ, Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China","orgid":"5f71b2c81c455f439fe3e34a"},{"id":"651be3748a47b61c50d0e88b","name":"Qinyu Zhang","org":"Nanjing Normal Univ, Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China","orgid":"5f71b2c81c455f439fe3e34a"},{"name":"Guangshuai Wang","org":"Tianjin Key Lab Rail Transit Nav Positioning & Spa, Tianjin, Peoples R China"},{"id":"653778f250dee4c422822312","name":"Zequn Zhang","org":"Northwest Normal Univ, Coll Comp Sci & Engn, Lanzhou, Peoples R China","orgid":"5f71b4601c455f439fe498bd"},{"name":"Fankun Zeng","org":"Washington Univ St Louis, McKelvey Sch Engn, St Louis, MO USA","orgid":"5f71b2871c455f439fe3c7e0"},{"name":"Peng An","org":"Ningbo Univ Technol, Sch Elect & Informat Engn, Ningbo, Peoples R China","orgid":"5f71b4e31c455f439fe4d3d7"},{"email":"jinxin@eitech.edu.cn","id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin","org":"Eastern Inst Technol EIT, Ningbo, Peoples R China"},{"email":"liushan@njnu.edu.cn","id":"53f465bcdabfaeee22a5103c","name":"Shan Liu","org":"Nanjing Normal Univ, Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China","orgid":"5f71b2c81c455f439fe3e34a"},{"email":"wangyongjun@njnu.edu.cn","id":"560b1e1745cedb3397266127","name":"Yongjun Wang","org":"Nanjing Normal Univ, Jiangsu Ctr Collaborat Innovat Geog Informat Resou, Nanjing, Peoples R China","orgid":"5f71b2c81c455f439fe3e34a"}],"create_time":"2024-09-10T11:05:00.585Z","doi":"10.1049\u002Fcit2.12349","id":"667264b301d2a3fbfc169c56","issn":"2468-2322","keywords":["3-D","data acquisition","scene understanding","segmentation"],"lang":"en","num_citation":0,"title":"RailPC: A Large‐scale Railway Point Cloud Semantic Segmentation Dataset","update_times":{"u_a_t":"2024-12-31T02:10:26Z","u_v_t":"2024-12-31T02:10:26Z"},"urls":["https:\u002F\u002Fapi.crossref.org\u002Fworks\u002F10.1049\u002Fcit2.12349","http:\u002F\u002Fdx.doi.org\u002F10.1049\u002Fcit2.12349","https:\u002F\u002Fietresearch.onlinelibrary.wiley.com\u002Fdoi\u002F10.1049\u002Fcit2.12349"],"venue":{"info":{"name":"CAAI Transactions on Intelligence Technology","publisher":"Institution of Engineering and Technology"}},"versions":[{"id":"667264b301d2a3fbfc169c56","sid":"10.1049\u002Fcit2.12349","src":"crossref","vsid":"S2898415742","year":2024},{"id":"668f842701d2a3fbfc454551","sid":"WOS:001248494400001","src":"wos","vsid":"CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY","year":2024},{"id":"67725339ae8580e7ff410996","sid":"7ab3c84133df4097bfe8343096e924dc","src":"doaj","year":2024},{"id":"66ddd26001d2a3fbfc862b60","sid":"W4399760810","src":"openalex","vsid":"S2898415742","year":2024}],"year":2024},{"abstract":"Humans constantly interact with their surrounding environments. Current human-centric generative models mainly focus on synthesizing humans plausibly interacting with static scenes and objects, while the dynamic human action-reaction synthesis for ubiquitous causal human-human interactions is less explored. Human-human interactions can be regarded as asymmetric with actors and reactors in atomic interaction periods. 
In this paper, we comprehensively analyze the asymmetric, dynamic, synchronous, and detailed nature of human-human interactions and propose the first multi-setting human action-reaction synthesis benchmark to generate human reactions conditioned on given human actions. To begin with, we propose to annotate the actor-reactor order of the interaction sequences for the NTU120, InterHuman, and Chi3D datasets. Based on them, a diffusion-based generative model with a Transformer decoder architecture called ReGenNet together with an explicit distance-based interaction loss is proposed to predict human reactions in an online manner, where the future states of actors are unavailable to reactors. Quantitative and qualitative results show that our method can generate instant and plausible human reactions compared to the baselines, and can generalize to unseen actor motions and viewpoint changes.","authors":[{"id":"562121e245cedb33982f3235","name":"Liang Xu","org":"Shanghai Jiao Tong University","orgid":"5f71b54b1c455f439fe502b0"},{"id":"54056999dabfae8faa5cf31f","name":"Yizhou Zhou","org":"WeChat AI"},{"id":"560b7f7745cedb3397335491","name":"Yichao Yan","org":"Shanghai Jiao Tong University","orgid":"5f71b54b1c455f439fe502b0"},{"id":"562cd5d345cedb3398cdb3ef","name":"Xin Jin","org":"Eastern Institute for Advanced Study","orgid":"61e6981668962734657358c8"},{"id":"56275a9345ce1e596592f063","name":"Wenhan Zhu"},{"id":"53f39fc7dabfae4b34ab28f6","name":"Fengyun Rao","org":"WeChat, Tencent Inc.","orgid":"5f71b4341c455f439fe48558"},{"id":"542a9df6dabfae646d5782d2","name":"Xiaokang Yang","org":"Shanghai Jiao Tong University, China","orgid":"5f71b54b1c455f439fe502b0"},{"id":"53f360a0dabfae4b349845ab","name":"Wenjun Zeng"}],"create_time":"2024-05-17T22:17:08.197Z","doi":"10.1109\u002Fcvpr52733.2024.00173","id":"65f9019c13fb2c6cf673c594","keywords":["Human Reaction Generation","Human-Human Interaction","Human Motion Generation"],"lang":"en","num_citation":0,"pages":{"end":"1769","start":"1759"},"pdf":"https:\u002F\u002Fcz5waila03cyo0tux1owpyofgoryroob.aminer.cn\u002F48\u002FC3\u002F16\u002F48C3167E18AF3CF4B2C7C356C25B3F9B.pdf","title":"ReGenNet: Towards Human Action-Reaction Synthesis","update_times":{"u_a_t":"2024-10-29T14:10:14Z","u_v_t":"2024-10-29T14:10:14Z"},"urls":["http:\u002F\u002Farxiv.org\u002Fabs\u002F2403.11882"],"venue":{"info":{"name":"Computer Vision and Pattern Recognition"}},"venue_hhb_id":"5ea18efeedb6e7d53c00a01c","versions":[{"id":"65f9019c13fb2c6cf673c594","sid":"2403.11882","src":"arxiv","vsid":"conf\u002Fcvpr","year":2024},{"id":"65fc055b13fb2c6cf6df1d77","sid":"cvpr2024#113","src":"conf_cvpr","vsid":"CVPR 2024","year":2024},{"id":"6646ca7801d2a3fbfc1f7a47","sid":"journals\u002Fcorr\u002Fabs-2403-11882","src":"dblp","vsid":"journals\u002Fcorr","year":2024},{"id":"666b855b01d2a3fbfcc9fc43","sid":"W4392971638","src":"openalex","vsid":"S4306400194","year":2024},{"id":"66e8bec301d2a3fbfc8ac59b","sid":"10654934","src":"ieee","vsid":"10654794","year":2024},{"id":"672078e601d2a3fbfc3b63f2","sid":"conf\u002Fcvpr\u002FXuZY0ZRYZ24","src":"dblp","vsid":"conf\u002Fcvpr","year":2024}],"year":2024}],"profilePubsTotal":121,"profilePatentsPage":0,"profilePatents":null,"profilePatentsTotal":null,"profilePatentsEnd":false,"profileProjectsPage":0,"profileProjects":null,"profileProjectsTotal":null,"newInfo":null,"checkDelPubs":[]}};