This repo collects our existing work on contextual, explainable video representation.
Our works on Temporal Action Proposals Generation are summarized as follows:
- Agent-Environment Network (AEN, published in ICASSP 2021):
- Agent-Aware Boundary Network (ABN, published in IEEE Access):
- Actors-Environment Interaction (AEI, published in BMVC 2021, Oral Session):
- Actors-Objects-Environment Network (AOE-Net, published in International Journal of Computer Vision):
Our works on Video Paragraph Captioning are summarized as follows:
- Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning (VLCap, published in ICIP 2022):
- VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning (VLTinT, published in AAAI 2023):
  - ArXiv: https://arxiv.org/abs/2211.15103
  - Source code: https://github.com/UARK-AICV/VLTinT