Resolving Language and Vision Ambiguities Together:
Joint Segmentation & Prepositional Attachment Resolution
in Captioned Scenes

Gordon Christie*, Ankit Laddha*, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra

* denotes equal contribution

@InProceedings{holistic_emnlp16,
author = {Gordon Christie and Ankit Laddha and Aishwarya Agrawal and Stanislaw Antol and Yash Goyal and Kevin Kochersberger and Dhruv Batra},
title = {{Resolving Language and Vision Ambiguities Together: Joint Segmentation \& Prepositional Attachment Resolution in Captioned Scenes}},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = {2016},
}

Pascal-Context-50S

Prepositional Phrase Attachment Data Format

The .mat file contains a cell array of size N x 3, where N is the number of (caption, image) pairs in the dataset. Each (caption, image) pair has 3 elements:
  1. The first element is a 10 x 1 struct array consisting of Prepositional Phrase Attachment Data for each of the 10 parses.
  2. The second element is the image name.
  3. The third element is the caption.
Each struct in the 10 x 1 struct array (the first element) has the following fields:

"ID": double
"score": double
"depend": cell array
"preps": cell array
"prepsNew": cell array

ID: Rank of the corresponding parse from the Stanford Parser.
score: Log probability of the corresponding parse from the Stanford Parser.
depend: Each element in the cell array is a dependency of the form dependency_type(word1, word2) for the corresponding parse from the Stanford Parser.
preps: Each element in the cell array is a prepositional dependency (e.g., "prep_on(woman-8, couch-11)") for the corresponding parse from the Stanford Parser.
prepsNew: Each element in the cell array is a modified prepositional dependency (e.g., "prep_on(woman-8, couch-11)") for the corresponding parse from the Stanford Parser.
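The depend, preps, and prepsNew strings use the Stanford Parser's typed-dependency notation, relation(governor-index, dependent-index). The Python sketch below shows one way to split such a string into its parts. The helper names (parse_dependency, preposition) are hypothetical and are not part of the released data. The sketch assumes that each token is written as word-index, with the index after the last hyphen, so hyphenated words such as "t-shirt" survive intact. It also assumes that any trailing apostrophes, which the parser may use to mark copied nodes, can be discarded.

```python
import re
from typing import NamedTuple, Optional

# relation(governor-idx, dependent-idx); the word is everything before the
# last "-<digits>". Trailing apostrophes on an index are dropped (assumption).
_DEP_RE = re.compile(r"(\w+)\((.+)-(\d+)'*, (.+)-(\d+)'*\)")


class Dependency(NamedTuple):
    relation: str   # e.g. "prep_on"
    governor: str   # e.g. "woman"
    gov_index: int  # token position in the caption (ROOT is 0)
    dependent: str  # e.g. "couch"
    dep_index: int


def parse_dependency(text: str) -> Dependency:
    """Split a string such as 'prep_on(woman-8, couch-11)' into its parts."""
    m = _DEP_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognised dependency: {text!r}")
    rel, gov, gi, dep, di = m.groups()
    return Dependency(rel, gov, int(gi), dep, int(di))


def preposition(dep: Dependency) -> Optional[str]:
    """Return the preposition of a collapsed prep_* relation, else None."""
    prefix = "prep_"
    return dep.relation[len(prefix):] if dep.relation.startswith(prefix) else None
```

For example, parse_dependency("prep_on(woman-8, couch-11)") yields relation "prep_on", governor "woman" at position 8, and dependent "couch" at position 11. Calling preposition on that result returns "on".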


Prepositional Phrase Attachment Annotations Format

Each line in the text file corresponds to one (caption, image) pair and contains the Prepositional Phrase Attachment accuracies of the caption's 10 parses, separated by semicolons. The (caption, image) pairs appear in the same order as in the Prepositional Phrase Attachment Data.

Below is the MATLAB code to read the accuracies from the text file:

accuracies = dlmread(filename, ';');

The accuracy of the j-th parse of the i-th caption is given by accuracies(i,j).
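Readers working in Python instead of MATLAB can use the following sketch, which roughly mirrors the dlmread call. It also computes two summaries: the mean accuracy in the first column, and the mean of each row's best value, which is an oracle over the 10 parses. The function names are hypothetical. The sketch assumes that column j refers to the j-th parse in the 10 x 1 struct array, and that an empty field, for example one left by a trailing semicolon, can be skipped. Unlike dlmread, it does not zero-pad ragged rows.

```python
from pathlib import Path
from typing import List, Tuple


def parse_accuracies(text: str) -> List[List[float]]:
    """Split semicolon-separated accuracies into one row per (caption, image) pair."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Empty fields (e.g. from a trailing ';') are ignored -- an assumption.
        rows.append([float(x) for x in line.split(";") if x.strip()])
    return rows


def read_accuracies(path: str) -> List[List[float]]:
    """Rough Python counterpart of MATLAB's dlmread(filename, ';')."""
    return parse_accuracies(Path(path).read_text())


def summarize(rows: List[List[float]]) -> Tuple[float, float]:
    """Return (mean of the first column, mean of each row's maximum)."""
    if not rows:
        raise ValueError("no rows to summarize")
    first = sum(r[0] for r in rows) / len(rows)
    oracle = sum(max(r) for r in rows) / len(rows)
    return first, oracle
```

Python indexing is 0-based, so MATLAB's accuracies(i,j) corresponds to rows[i-1][j-1] here.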