This repository focuses on Image Captioning, Video Captioning, Seq-to-Seq Learning, and NLP.
Image Captioning
- Partially Non-Autoregressive Image Captioning
- Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network. [paper]
- Object Relation Attention for Image Paragraph Captioning
- Dual-Level Collaborative Transformer for Image Captioning
- Memory-Augmented Image Captioning
- Image Captioning with Context-Aware Auxiliary Guidance. [paper]
- Consensus Graph Representation Learning for Better Grounded Image Captioning
- FixMyPose: Pose Correctional Captioning and Retrieval
- VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning [paper]
Video Captioning
- Non-Autoregressive Coarse-to-Fine Video Captioning. [paper]
- Semantic Grouping Network for Video Captioning
- Augmented Partial Mutual Learning with Frame Masking for Video Captioning
Image Captioning
- Structural Semantic Adversarial Active Learning for Image Captioning. oral [paper]
Video Captioning
- Controllable Video Captioning with an Exemplar Sentence. oral [paper]
Image Captioning
- Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets. oral [paper]
- In-Home Daily-Life Captioning Using Radio Signals. oral [paper] [website]
- TextCaps: a Dataset for Image Captioning with Reading Comprehension. oral [paper] [website] [code]
- SODA: Story Oriented Dense Video Captioning Evaluation Framework. [paper]
- Towards Unique and Informative Captioning of Images. [paper]
- Learning Visual Representations with Caption Annotations. [paper] [website]
- Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards. [paper]
- Length Controllable Image Captioning. [paper] [code]
- Comprehensive Image Captioning via Scene Graph Decomposition. [paper] [website]
- Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning. [paper]
- Captioning Images Taken by People Who Are Blind. [paper]
- Learning to Generate Grounded Visual Captions without Localization Supervision. [paper] [code]
Video Captioning
- Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos. Spotlight [paper] [code]
- Character Grounding and Re-Identification in Story of Videos and Text Descriptions. Spotlight [paper] [code]
- Identity-Aware Multi-Sentence Video Description. [paper]
Image Captioning
Video Captioning
Object Relational Graph With Teacher-Recommended Learning for Video Captioning [paper]
Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha
Spatio-Temporal Graph for Video Captioning With Knowledge Distillation [paper] [code]
Boxiao Pan, Haoye Cai, De-An Huang, Kuan-Hui Lee, Adrien Gaidon, Ehsan Adeli, Juan Carlos Niebles
Better Captioning With Sequence-Level Exploration [paper]
Jia Chen, Qin Jin
Syntax-Aware Action Targeting for Video Captioning [code]
Qi Zheng, Chaoyue Wang, Dacheng Tao
Image Captioning
Clue: Cross-modal Coherence Modeling for Caption Generation [paper]
Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut and Matthew Stone
Improving Image Captioning Evaluation by Considering Inter References Variance [paper]
Yanzhi Yi, Hangyu Deng and Jinglu Hu
Improving Image Captioning with Better Use of Caption [paper] [code]
Zhan Shi, Xu Zhou, Xipeng Qiu and Xiaodan Zhu
Video Captioning
Image Captioning
Unified VLP: Unified Vision-Language Pre-Training for Image Captioning and VQA [paper]
Luowei Zhou (University of Michigan); Hamid Palangi (Microsoft Research); Lei Zhang (Microsoft); Houdong Hu (Microsoft AI and Research); Jason Corso (University of Michigan); Jianfeng Gao (Microsoft Research)
OffPG: Reinforcing an Image Caption Generator using Off-line Human Feedback [paper]
Paul Hongsuck Seo (POSTECH); Piyush Sharma (Google Research); Tomer Levinboim (Google); Bohyung Han (Seoul National University); Radu Soricut (Google)
MemCap: Memorizing Style Knowledge for Image Captioning [paper]
Wentian Zhao (Beijing Institute of Technology); Xinxiao Wu (Beijing Institute of Technology); Xiaoxun Zhang (Alibaba Group)
C-R Reasoning: Joint Commonsense and Relation Reasoning for Image and Video Captioning [paper]
Jingyi Hou (Beijing Institute of Technology); Xinxiao Wu (Beijing Institute of Technology); Xiaoxun Zhang (Alibaba Group); Yayun Qi (Beijing Institute of Technology); Yunde Jia (Beijing Institute of Technology); Jiebo Luo (University of Rochester)
MHTN: Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption [paper]
Wei Zhang (East China Normal University); Yue Ying (East China Normal University); Pan Lu (The University of California, Los Angeles); Hongyuan Zha (GEORGIA TECH)
Show, Recall, and Tell: Image Captioning with Recall Mechanism [paper]
Li Wang (MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China); Zechen Bai (Institute of Software, Chinese Academy of Sciences, China); Yonghua Zhang (Bytedance); Hongtao Lu (Shanghai Jiao Tong University)
Interactive Dual Generative Adversarial Networks for Image Captioning
Junhao Liu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Kai Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Chunpu Xu (Huazhong University of Science and Technology); Zhou Zhao (Zhejiang University); Ruifeng Xu (Harbin Institute of Technology (Shenzhen)); Ying Shen (Peking University Shenzhen Graduate School); Min Yang (Chinese Academy of Sciences)
FDM-net: Feature Deformation Meta-Networks in Image Captioning of Novel Objects [paper]
Tingjia Cao (Fudan University); Ke Han (Fudan University); Xiaomei Wang (Fudan University); Lin Ma (Tencent AI Lab); Yanwei Fu (Fudan University); Yu-Gang Jiang (Fudan University); Xiangyang Xue (Fudan University)
Video Captioning
Informative Image Captioning with External Sources of Information [paper]
Sanqiang Zhao, Piyush Sharma, Tomer Levinboim and Radu Soricut
Dense Procedure Captioning in Narrated Instructional Videos [paper]
Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu and Ming Zhou
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning [paper]
Zhihao Fan, Zhongyu Wei, Siyuan Wang and Xuanjing Huang
Generating Question Relevant Captions to Aid Visual Question Answering [paper]
Jialin Wu, Zeyuan Hu and Raymond Mooney
Image Captioning
Video Captioning
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research [paper] [challenge]
Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
ICCV 2019 Oral
POS+CG: Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network [paper]
Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu
POS: Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning [paper]
Jingyi Hou, Xinxiao Wu, Wentian Zhao, Jiebo Luo, Yunde Jia
Image Captioning
DUDA: Robust Change Captioning [paper]
Dong Huk Park, Trevor Darrell, Anna Rohrbach
ICCV 2019 Oral
AoANet: Attention on Attention for Image Captioning [paper]
Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
ICCV 2019 Oral
MaBi-LSTMs: Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style [paper]
Hongwei Ge, Zehang Yan, Kai Zhang, Mingde Zhao, Liang Sun
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment [paper]
Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran
GCN-LSTM+HIP: Hierarchy Parsing for Image Captioning [paper]
Ting Yao, Yingwei Pan, Yehao Li, Tao Mei
IR+Tdiv: Generating Diverse and Descriptive Image Captions Using Visual Paraphrases [paper]
Lixin Liu, Jiajun Tang, Xiaojun Wan, Zongming Guo
CNM+SGAE: Learning to Collocate Neural Modules for Image Captioning [paper]
Xu Yang, Hanwang Zhang, Jianfei Cai
Seq-CVAE: Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning [paper]
Jyoti Aneja, Harsh Agrawal, Dhruv Batra, Alexander Schwing
Towards Unsupervised Image Captioning With Shared Multimodal Embeddings [paper]
Iro Laina, Christian Rupprecht, Nassir Navab
Human Attention in Image Captioning: Dataset and Analysis [paper]
Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault
RDN: Reflective Decoding Network for Image Captioning [paper]
Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai
PSST: Joint Optimization for Cooperative Image Captioning [paper]
Gilad Vered, Gal Oren, Yuval Atzmon, Gal Chechik
MUTAN: Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning [paper]
Tanzila Rahman, Bicheng Xu, Leonid Sigal
ETA: Entangled Transformer for Image Captioning [paper]
Guang Li, Linchao Zhu, Ping Liu, Yi Yang
nocaps: novel object captioning at scale [paper]
Harsh Agrawal, Karan Desai, Yufei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection [paper]
Keren Ye, Mingda Zhang, Adriana Kovashka, Wei Li, Danfeng Qin, Jesse Berent
Graph-Align: Unpaired Image Captioning via Scene Graph Alignments [paper]
Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang
: Learning to Caption Images Through a Lifetime by Asking Questions [paper]
Tingke Shen, Amlan Kar, Sanja Fidler
Image Captioning
SGAE: Auto-Encoding Scene Graphs for Image Captioning [paper] [code]
Xu Yang (Nanyang Technological University); Kaihua Tang (Nanyang Technological University); Hanwang Zhang (Nanyang Technological University); Jianfei Cai (Nanyang Technological University)
CVPR 2019 Oral
POS: Fast, Diverse and Accurate Image Captioning Guided by Part-Of-Speech [paper]
Aditya Deshpande (University of Illinois at UC); Jyoti Aneja (University of Illinois, Urbana-Champaign); Liwei Wang (Tencent AI Lab); Alexander Schwing (UIUC); David Forsyth (University of Illinois at Urbana-Champaign)
CVPR 2019 Oral
Unsupervised Image Captioning [paper] [code]
Yang Feng (University of Rochester); Lin Ma (Tencent AI Lab); Wei Liu (Tencent); Jiebo Luo (U. Rochester)
Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables [paper]
Yan Xu (UESTC); Baoyuan Wu (Tencent AI Lab); Fumin Shen (UESTC); Yanbo Fan (Tencent AI Lab); Yong Zhang (Tencent AI Lab); Heng Tao Shen (University of Electronic Science and Technology of China (UESTC)); Wei Liu (Tencent)
Describing like Humans: On Diversity in Image Captioning [paper]
Qingzhong Wang (Department of Computer Science, City University of Hong Kong); Antoni Chan (City University of Hong Kong, Hong Kong)
MSCap: Multi-Style Image Captioning With Unpaired Stylized Text [paper]
Longteng Guo (Institute of Automation, Chinese Academy of Sciences); Jing Liu (National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences); Peng Yao (University of Science and Technology Beijing); Jiangwei Li (Huawei); Hanqing Lu (NLPR, Institute of Automation, CAS)
CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection [paper] [code]
Lu Zhang (Dalian University of Technology); Huchuan Lu (Dalian University of Technology); Zhe Lin (Adobe Research); Jianming Zhang (Adobe Research); You He (Naval Aviation University)
Context and Attribute Grounded Dense Captioning [paper]
Guojun Yin (University of Science and Technology of China); Lu Sheng (The Chinese University of Hong Kong); Bin Liu (University of Science and Technology of China); Nenghai Yu (University of Science and Technology of China); Xiaogang Wang (Chinese University of Hong Kong, Hong Kong); Jing Shao (Sensetime)
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning [paper]
Dong-Jin Kim (KAIST); Jinsoo Choi (KAIST); Tae-Hyun Oh (MIT CSAIL); In So Kweon (KAIST)
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [paper]
Marcella Cornia (University of Modena and Reggio Emilia); Lorenzo Baraldi (University of Modena and Reggio Emilia); Rita Cucchiara (Universita Di Modena E Reggio Emilia)
Self-Critical N-step Training for Image Captioning [paper]
Junlong Gao (Peking University Shenzhen Graduate School); Shiqi Wang (CityU); Shanshe Wang (Peking University); Siwei Ma (Peking University, China); Wen Gao (PKU)
Look Back and Predict Forward in Image Captioning [paper]
Yu Qin (Shanghai Jiao Tong University); Jiajun Du (Shanghai Jiao Tong University); Hongtao Lu (Shanghai Jiao Tong University); Yonghua Zhang (Bytedance)
Intention Oriented Image Captions with Guiding Objects [paper]
Yue Zheng (Tsinghua University); Ya-Li Li (THU); Shengjin Wang (Tsinghua University)
Adversarial Semantic Alignment for Improved Image Captions [paper]
Pierre Dognin (IBM); Igor Melnyk (IBM); Youssef Mroueh (IBM Research); Jarret Ross (IBM); Tom Sercu (IBM Research AI)
Good News, Everyone! Context driven entity-aware captioning for news images [paper] [code]
Ali Furkan Biten (Computer Vision Center); Lluis Gomez (Universitat Autónoma de Barcelona); Marçal Rusiñol (Computer Vision Center, UAB); Dimosthenis Karatzas (Computer Vision Centre)
Pointing Novel Objects in Image Captioning [paper]
Yehao Li (Sun Yat-Sen University); Ting Yao (JD AI Research); Yingwei Pan (JD AI Research); Hongyang Chao (Sun Yat-sen University); Tao Mei (AI Research of JD.com)
Engaging Image Captioning via Personality [paper]
Kurt Shuster (Facebook); Samuel Humeau (Facebook); Hexiang Hu (USC); Antoine Bordes (Facebook); Jason Weston (FAIR)
Video Captioning
SDVC: Streamlined Dense Video Captioning [paper]
Jonghwan Mun (POSTECH); Linjie Yang (ByteDance AI Lab); Zhou Ren (Snap Inc.); Ning Xu (Snap); Bohyung Han (Seoul National University)
CVPR 2019 Oral
GVD: Grounded Video Description [paper]
Luowei Zhou (University of Michigan); Yannis Kalantidis (Facebook Research); Xinlei Chen (Facebook AI Research); Jason J Corso (University of Michigan); Marcus Rohrbach (Facebook AI Research)
CVPR 2019 Oral
HybridDis: Adversarial Inference for Multi-Sentence Video Description [paper]
Jae Sung Park (UC Berkeley); Marcus Rohrbach (Facebook AI Research); Trevor Darrell (UC Berkeley); Anna Rohrbach (UC Berkeley)
CVPR 2019 Oral
OA-BTG: Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning [paper]
Junchao Zhang (Peking University); Yuxin Peng (Peking University)
MARN: Memory-Attended Recurrent Network for Video Captioning [paper]
Wenjie Pei (Tencent); Jiyuan Zhang (Tencent YouTu); Xiangrong Wang (Delft University of Technology); Lei Ke (Tencent); Xiaoyong Shen (Tencent); Yu-Wing Tai (Tencent)
GRU-EVE: Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning [paper]
Nayyer Aafaq (The University of Western Australia); Naveed Akhtar (The University of Western Australia); Wei Liu (University of Western Australia); Syed Zulqarnain Gilani (The University of Western Australia); Ajmal Mian (University of Western Australia)
Image Captioning
AAAI 2019 Oral
Video Captioning
TAMoE: Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning [paper] [code]
Xin Wang (University of California, Santa Barbara); Jiawei Wu (University of California, Santa Barbara); Da Zhang (UC Santa Barbara); Yu Su (OSU); William Wang (UC Santa Barbara)
AAAI 2019 Oral
TDConvED: Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning [paper]
Jingwen Chen (Sun Yat-sen University); Yingwei Pan (JD AI Research); Yehao Li (Sun Yat-Sen University); Ting Yao (JD AI Research); Hongyang Chao (Sun Yat-sen University); Tao Mei (AI Research of JD.com)
AAAI 2019 Oral
FCVC-CF&IA: Fully Convolutional Video Captioning with Coarse-to-Fine and Inherited Attention [paper]
Kuncheng Fang (Fudan University); Lian Zhou (Fudan University); Cheng Jin (Fudan University); Yuejie Zhang (Fudan University); Kangnian Weng (Shanghai University of Finance and Economics); Tao Zhang (Shanghai University of Finance and Economics); Weiguo Fan (University of Iowa)
MGSA: Motion Guided Spatial Attention for Video Captioning [paper]
Shaoxiang Chen (Fudan University); Yu-Gang Jiang (Fudan University)