BELCAVELLO, F.; VIRIDIANO, M.; DINIZ DA COSTA, A.; MATOS, E. E.; TORRENT, T. T. (2020). Frame-Based Annotation of Multimodal Corpora: Tracking (A)Synchronies in Meaning Construction. In: Proceedings of the LREC International FrameNet Workshop 2020. Marseille, France: ELRA, p. 23-30.
BELCAVELLO, F.; DINIZ DA COSTA, A.; ALMEIDA, V.; VIRIDIANO, M.; TORRENT, T. T. (2019). Multimodal Analysis for Building Semantic Representations in the Tourism Domain Using Frames and Qualia. In: Proceedings of the 4th Bremen Conference on Multimodality (BreMM19). Bremen, Germany.
Study the fundamentals first by reading Speech and Language Processing, 2nd Edition, by Jurafsky and Martin. The 3rd edition is in progress and some chapters are available as PDFs.
Also...
- BENDER, Emily M. Linguistic fundamentals for natural language processing: 100 essentials from morphology and syntax. Synthesis Lectures on Human Language Technologies, v. 6, n. 3, p. 1-184, 2013. DOI: 10.2200/S00493ED1V01Y201303HLT020
- BENDER, Emily M.; LASCARIDES, Alex. Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics. Synthesis Lectures on Human Language Technologies, v. 12, n. 3, p. 1-268, 2019. DOI: 10.2200/S00935ED1V02Y201907HLT043
- GOLDBERG, Yoav. Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, v. 10, n. 1, p. 1-309, 2017. DOI: 10.2200/S00762ED1V01Y201703HLT037
- HUTCHINS, William John; SOMERS, Harold L. An introduction to machine translation. London: Academic Press, 1992. [download pdf]
- MANNING, Christopher D.; SCHÜTZE, Hinrich. Foundations of statistical natural language processing. MIT Press, 1999. [download pdf]
- KOEHN, Philipp. Neural machine translation. arXiv preprint arXiv:1709.07809, 2017. [download pdf]
- KOEHN, Philipp. Statistical machine translation. Cambridge University Press, 2009. DOI: 10.1017/CBO9780511815829
The following texts are useful, but not required. All of them can be read free online.
- Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning
If you have no background in neural networks, you may find one of these books helpful:
- Michael A. Nielsen. Neural Networks and Deep Learning
- Eugene Charniak. Introduction to Deep Learning
For learning about Deep Learning for NLP, take the Stanford cs224n online course or watch the Stanford cs224n Lecture collection on NLP with Deep Learning.
- Deep Learning Drizzle has the most comprehensive database of online courses on NLP and ML.
Also...
- NLP Pandect – a fantastically detailed, curated collection of NLP resources, covering everything from general information resources to frameworks, podcasts, and YouTube channels
- NLP Tutorial – lots of minimal walk-throughs of NLP models, each implemented in fewer than 100 lines of code
- NLP Roadmap 2019 – a roadmap and keywords for students interested in learning Natural Language Processing
- NLP Progress – Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks, by @sebastianruder
- Image Data Labelling and Annotation – Image annotation types, formats and tools
- Object Detection and Tracking in 2020
- A complete overview of ML online courses – Every single Machine Learning course on the internet, ranked by your reviews
- An overview of semantic image segmentation – how to use convolutional neural networks for the task of semantic image segmentation
- Going beyond the bounding box with semantic segmentation
- Semantic Image Segmentation with DeepLab in TensorFlow
- SRITagging
- ImageGraph – Visual computing made easy: computer vision, image processing, and data visualization, all drag-and-drop in the browser
- YOLOv3 – Real-Time Object Detection
- MakeSense.AI – An open-source, free-to-use annotation tool released under GPLv3
- ScaLabel – A scalable open-source web annotation tool
- RectLabel – An image annotation tool to label images for bounding box object detection and segmentation
- labelme – Image Polygonal Annotation with Python
- LabelImg – A graphical image annotation tool written in Python; it saves labels in the Pascal VOC XML format (see the parsing sketch after this list)
- VGG Image Annotator – A standalone image annotator application packaged as a single HTML file (< 400 KB) that runs on most modern web browsers
- Figure Eight – A data-annotation service: you upload unlabeled data together with the labeling rules your machine learning project needs, and a distributed network of human annotators combined with machine learning models annotates the data at enterprise scale
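Several of the tools above, LabelImg among them, save bounding boxes in the Pascal VOC XML format, so a small parser is usually enough to feed their output into a training pipeline. Below is a minimal sketch, assuming a single VOC-style XML file; the file path and helper name are made up for illustration.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Parse a Pascal VOC-style XML file (e.g., written by LabelImg) and
    return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return boxes

# Hypothetical path to an exported annotation file
for label, xmin, ymin, xmax, ymax in read_voc_boxes("annotations/img_0001.xml"):
    print(label, (xmin, ymin), (xmax, ymax))
```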
Dataset (Download) | Paper | Description |
---|---|---|
Multi 30K | [Elliott et al. 2016] arXiv:1605.00459 | Extends the Flickr30K dataset with German translations created by professional translators over a subset of the English descriptions |
Flickr 30K Entities | [Plummer et al. 2015] arXiv:1505.04870 | 244k coreference chains and 276k manually annotated bounding boxes for each of the 31,783 images and 158,915 English captions (five per image) in the original dataset |
Flickr 30K | [Young et al. 2014] 10.1162/tacl_a_00166 | Standard benchmark for sentence-based image description |
MS COCO | [Lin et al. 2014] arXiv:1405.0312 | Large-scale object detection, segmentation, and captioning dataset (a loading sketch follows this table) |
AVA | [Roth et al. 2019] arXiv:1901.01342 | Spatio-temporal audiovisual annotations of human actions in movies, suitable for training localized action recognition systems |
Open Images | [Kuznetsova et al. 2018] arXiv:1811.00982 | ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives |
Google's Conceptual Captions | [SHARMA, Piyush et al. 2018] 10.18653/v1/P18-1238 | ~3.3M images annotated with captions. In contrast to the curated style of other image-caption datasets, Conceptual Captions images and their raw descriptions are harvested from the web and therefore represent a wider variety of styles; the raw descriptions come from the Alt-text HTML attribute associated with web images. The authors developed an automatic pipeline that extracts, filters, and transforms candidate image/caption pairs, aiming for a balance of cleanliness, informativeness, fluency, and learnability in the resulting captions. |
VCR | [ZELLERS, Rowan et al. 2019] arXiv:1811.10830 | A dataset consisting of 290k multiple choice QA problems derived from 110k movie scenes. |
VisualCOMET | [PARK, Jae Sung et al. 2020] arXiv:2004.10796 | A large-scale repository of Visual Commonsense Graphs consisting of over 1.4 million textual descriptions of visual commonsense inferences, carefully annotated over a diverse set of 60,000 images. Each image is paired with short video summaries of what happens before and after, plus person-grounding (i.e., co-reference links) between people appearing in the image and people mentioned in the textual commonsense descriptions, allowing for tighter integration between images and text. |
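For caption-style datasets such as MS COCO, the pycocotools API is a convenient way to inspect annotations without writing a custom JSON reader. A minimal sketch, assuming pycocotools is installed and the val2017 captions file has been downloaded; the path is a placeholder.

```python
# pip install pycocotools
from pycocotools.coco import COCO

# Placeholder path to the downloaded COCO captions annotation file
coco_caps = COCO("annotations/captions_val2017.json")

# Pick one image id and print its reference captions (typically five per image)
img_id = coco_caps.getImgIds()[0]
ann_ids = coco_caps.getAnnIds(imgIds=img_id)
for ann in coco_caps.loadAnns(ann_ids):
    print(ann["caption"])
```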
Also...
- The Big Bad NLP Database
- YouTube BoundingBoxes – A large-scale dataset of video URLs with densely sampled, high-quality, single-object bounding-box annotations. All video segments were human-annotated with high-precision classifications and bounding boxes at 1 frame per second.
- What's Cookin' – A list of cooking-related YouTube video IDs, along with time stamps marking the (estimated) start and end of various events.
- PASCAL VOC – Standardised image datasets for object class recognition, with a common set of tools for accessing the data and annotations (a loading sketch follows this list)
- PASCAL Context – Indoor and outdoor scenes with 400+ classes
- MPII Human Pose Dataset – State of the art benchmark for evaluation of articulated human pose estimation
- Cityscapes Dataset – benchmark suite and evaluation server for pixel-level and instance-level semantic labeling
- Mapillary Vistas Dataset – a diverse street-level imagery dataset with pixel‑accurate and instance‑specific human annotations for understanding street scenes around the world
- ApolloScape Scene Parsing – RGB videos with high-resolution image sequences and per-pixel annotation, plus survey-grade dense 3D points with semantic segmentation
- Stanford Background Dataset – A set of outdoor scenes with at least one foreground object
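Several of the benchmarks above ship with ready-made torchvision loaders, which saves writing custom readers. A minimal sketch for PASCAL VOC and Cityscapes, assuming torchvision is installed; the root paths are placeholders, and Cityscapes must be downloaded manually (registration required) before the loader will find it.

```python
# pip install torch torchvision
from torchvision import datasets

# PASCAL VOC 2012 detection split; download=True fetches the archive on first use
voc = datasets.VOCDetection(root="data/voc", year="2012",
                            image_set="train", download=True)
image, target = voc[0]  # PIL image + dict parsed from the VOC XML annotation
print(target["annotation"]["object"][0]["name"])

# Cityscapes (fine annotations, semantic masks); expects the archives unpacked under root
cityscapes = datasets.Cityscapes(root="data/cityscapes", split="train",
                                 mode="fine", target_type="semantic")
image, seg_mask = cityscapes[0]  # PIL image + per-pixel class-label mask
```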
- SEMAFOR – Automatically processes English sentences according to the form of semantic analysis in Berkeley FrameNet (see the FrameNet lookup sketch after this list)
- Google Sling – Natural language frame semantics parser
- Open Sesame – Frame-semantic parsing system based on a softmax-margin SegRNN
- PathLSTM – Neural SRL model
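Several of the parsers above (SEMAFOR, Open Sesame) output analyses in terms of Berkeley FrameNet frames and frame elements. A quick way to inspect those target structures is NLTK's FrameNet corpus reader; a minimal sketch, assuming nltk is installed and the framenet_v17 corpus has been downloaded, using Commerce_buy as an example frame.

```python
# pip install nltk ; the FrameNet data must be fetched once:
import nltk
nltk.download("framenet_v17")

from nltk.corpus import framenet as fn

# The frame a FrameNet parser would typically assign to "buy"
frame = fn.frame("Commerce_buy")
print(frame.name)                        # Commerce_buy
print(sorted(frame.FE.keys()))           # frame elements, e.g. Buyer, Goods, Seller
print(sorted(frame.lexUnit.keys())[:5])  # lexical units evoking the frame, e.g. buy.v
```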