Table of Contents
- CVPR2023
- I. Vision and Language / Multimodal
CVPR2023
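According to official statistics, the conference received 9,155 submissions this year, a 12% increase over last year and a new record. Of these, 2,360 papers were accepted, an acceptance rate of 25.78%. For comparison, last year saw more than 8,100 valid submissions, of which 2,067 were accepted, an acceptance rate of about 25%.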
https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers
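As a quick sanity check on the figures above, the acceptance rate can be recomputed from the submission and acceptance counts. This is only a minimal sketch; the function name `acceptance_rate` is a hypothetical helper introduced for illustration.

```python
# Figures quoted in the text for CVPR 2023.
submissions = 9155
accepted = 2360


def acceptance_rate(accepted: int, submissions: int) -> float:
    """Return the acceptance rate as a percentage, rounded to two decimals."""
    return round(100 * accepted / submissions, 2)


rate_2023 = acceptance_rate(accepted, submissions)
print(f"CVPR 2023 acceptance rate: {rate_2023}%")
```

The result agrees with the 25.78% quoted above. Last year's rate is not recomputed here because the text gives only an approximate submission count.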
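Below, the accepted papers are sorted and grouped by keyword into the directions I am interested in (a filtered selection, not the full list).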
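The keyword-based grouping could be sketched roughly as follows. The post does not say how the grouping was actually done, so the keyword map, the category names, and the `group_by_keyword` helper are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical keyword map: categories and keywords are illustrative assumptions.
KEYWORDS = {
    "Vision-Language": ["vision-language", "vision and language"],
    "Multimodal": ["multimodal", "cross-modal"],
}


def group_by_keyword(titles, keywords=KEYWORDS):
    """Group paper titles by case-insensitive keyword match on the title."""
    groups = defaultdict(list)
    for title in titles:
        lowered = title.lower()
        for category, needles in keywords.items():
            # A title is added at most once per category.
            if any(needle in lowered for needle in needles):
                groups[category].append(title)
    return dict(groups)


titles = [
    "Task Residual for Tuning Vision-Language Models",
    "Efficient Multimodal Fusion via Interactive Prompting",
    "Enhanced Multimodal Representation Learning with Cross-modal KD",
]
groups = group_by_keyword(titles)
for category, matched in groups.items():
    print(category, len(matched))
```

Matching on title substrings is crude: a paper whose title avoids the keyword is missed, so a manual pass over the results is still needed.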
I. Vision and Language / Multimodal
| Paper | Summary |
|---|---|
| Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles | |
| Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training | |
| Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | |
| Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks | |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | |
| Task Residual for Tuning Vision-Language Models | |
| Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! | |
| FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks | |
| VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining | |
| Open-set Fine-grained Retrieval via Prompting Vision-Language Evaluator | |
| Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | |
| FashionSAP: Symbols and Attributes Prompt for Fine-grained Fashion Vision-Language Pre-training | |
| Accelerating Vision-Language Pretraining with Free Language Modeling | |
| Leveraging per Image-Token Consistency for Vision-Language Pre-training | |
| Position-guided Text Prompt for Vision-Language Pre-training | |
| IFSeg: Image-free Semantic Segmentation via Vision-Language Model | |
| Enhanced Multimodal Representation Learning with Cross-modal KD | |
| Efficient Multimodal Fusion via Interactive Prompting | |
| Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data | |
| Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens | |
| Align and Attend: Multimodal Summarization with Dual Contrastive Losses | |
| Multimodal Prompting with Missing Modalities for Visual Recognition | |
