- Software Technology Department, Undergraduate Thesis Program
- System Overview
- How to run the system
- Research Overview
- Coref API Documentation
- Coref Class Documentation
- Main API Documentation
- View the landing page
- Generate the screenplay of a story
- View the results of the element extraction module
- View the results of the story understanding module
- View the text file of a story
- View the annotation page for a story
- View the extraction results page for a story
- View the generated screenplay for a story
- Download the generated screenplay for a story as a PDF file
- Download the generated screenplay for a story as a TeX file
- Main Class Documentation
AnnotationHelper
ConceptNet
CorefResolver
DialogueExtractor
EntityExtractor
ActionExtractor
ScreenplayGenerator
SpacyUtil
StoryPresenter
ExtractionEvaluator
evaluate_extraction()
evaluate_dialogue_speaker(file)
evaluate_dialogue_content(file)
evaluate_characters(file)
evaluate_props(file)
evaluate_actions(file)
evaluate_transitions(file)
count(prediction, annotation)
count_bianca(prediction, annotation)
evaluate(tp, fp, fn)
evaluate(perfect, missing, lacking, excess, missing, wrong)
UnderstandingEvaluator
- A system for screenwriters to generate a first draft of a screenplay adaptation from a short story
- Extracts story elements from a short story text file using Natural Language Processing
- Represents the story elements as abstract data structures
- Generates a screenplay from the abstract story representation, downloadable in PDF and TeX formats
- Built using Django, SQLite, spaCy, and TeX Live
Convert a story to a screenplay in two simple steps:
- Provide the title, author, and the story .txt file.
- View and download the screenplay.
For evaluation purposes, the system can be used to annotate story elements.
The story representation results can be viewed. Metrics on the left require the story to be annotated first.
- Python 3.7.9
- virtualenv
- TeX distribution software, preferably TeX Live
- screenplay package for your chosen TeX distribution software
- If it is your first time running the project, run install.bat from the root directory. Initial setup may take a while.
- Run run.bat from the root directory.
- The project webpage will be shown after a few seconds.
- If the webpage is unresponsive, refresh after 30 seconds.
- If the webpage is still unresponsive, try the manual setup.
- Go to the coref directory using the command prompt
- Create and activate a Python 3.7.9 virtual environment
py -m venv env
.\env\Scripts\activate
- Install dependencies
py -m pip install -r requirements.txt
- Download spaCy pre-trained models
py -m spacy download en_core_web_sm
- Run
py manage.py runserver
- Go to the main directory using the command prompt
- Create and activate a Python 3.7.9 virtual environment
py -m venv env
.\env\Scripts\activate
- Install dependencies
py -m pip install -r requirements.txt
- Download spaCy pre-trained models
py -m spacy download en_core_web_sm
- Set up database
py manage.py makemigrations
py manage.py migrate
- Run
py manage.py runserver
- Open your browser and enter the URL localhost:8000
- If an error occurs during the setup, please take a screenshot and contact the developers. Thank you.
A story is a series of events that can be represented in many ways. Stories are diverse and follow no strict format. The screenplay is a medium for telling stories clearly and straightforwardly; it focuses on vital story elements, ordered so that the story's meaning is retained. Converting stories into screenplays is currently expensive in both time and human resources because of the research and creativity needed to make a faithful adaptation. However, some strategies for converting stories to screenplays are repeatable for screenwriters. Thus, we created a system that automatically translates short stories into screenplays. Story elements were extracted and classified simultaneously, then mapped to screenplay elements through an abstract story representation. Of the story elements extracted, the system performed best on dialogue content and action lines, with precision, recall, and f1 scores above 60%. Screenplay readers were able to understand the screenplays across both corpora, with above 60% similarity to story readers across all story elements, as measured by the Simple Matching Coefficient.
The research document can be found here. Please request access if prompted.
A lightweight REST API for coreference resolution in a text. Uses the neuralcoref extension for spaCy.
text
: string
The text to resolve coreferences from.
coref
: JSON
A JSON object containing the coreference clusters found in the text. coref[entity_start][entity_end] returns a list of mentions, where each mention is a 2-item list [mention_start, mention_end]. *_start and *_end are integers that represent the indices of the spaCy Token objects for the start and end of the noun phrase, respectively.
text
: string
The text to resolve coreferences from.
coref
: dict
A dictionary containing the coreference clusters found in the text. coref[entity_start][entity_end] returns a list of mentions, where each mention is a 2-item list [mention_start, mention_end]. *_start and *_end are integers that represent the indices of the spaCy Token objects for the start and end of the noun phrase, respectively.
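As a rough illustration of the nested structure described above, the sketch below walks a coref dictionary and flattens it into a mention-to-entity lookup. The helper name flatten_coref and the sample data are hypothetical, and keys are passed through int() on the assumption that a JSON response delivers them as strings.

```python
def flatten_coref(coref):
    """Map each (mention_start, mention_end) pair to its (entity_start, entity_end)."""
    mention_to_entity = {}
    for entity_start, ends in coref.items():
        for entity_end, mentions in ends.items():
            # Assumption: JSON object keys arrive as strings, so convert them.
            entity = (int(entity_start), int(entity_end))
            for mention_start, mention_end in mentions:
                mention_to_entity[(mention_start, mention_end)] = entity
    return mention_to_entity


# Hypothetical data: the entity at tokens 0-2 is mentioned again at 5-6 and 9-10.
sample_coref = {"0": {"2": [[5, 6], [9, 10]]}}
lookup = flatten_coref(sample_coref)
```

A flat lookup like this makes it cheap to ask which entity a given mention span refers to, which is the kind of question the resolution steps in the main application need answered.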
No parameters.
An HTML response of the landing page for the application.
title
: string
The title of the story
author
: string
The author of the story
text_file
: file
The text file of the story
An HTML response of the screenplay page for the story.
No parameters.
An HTML response of the page for the element extraction results.
No parameters.
An HTML response of the page for the story understanding results.
id
: string
Unique identifier for the story.
A plaintext HTTP response of the text file of the specified story.
id
: string
Unique identifier for the story.
An HTML response of the annotation page for the specific story.
id
: string
Unique identifier for the story.
An HTML response of the extraction results page for the specific story.
id
: string
Unique identifier for the story.
An HTML response of the screenplay page for the specific story.
id
: string
Unique identifier for the story.
A downloadable .pdf file of the generated screenplay.
id
: string
Unique identifier for the story.
A downloadable .tex file of the generated screenplay.
Splits the text into tokens and sentences for the annotation page.
text
: string
The text to process.
No return values.
Checks if a noun is a prop or a character using ConceptNet.
possibleCharacter
: string
The noun to check if it's a prop or not.
verb
: string
The verb to check if the noun can perform this action.
flag
: boolean
If flag == True
, then possibleCharacter
is a prop. Otherwise, possibleCharacter
is a character.
Checks if a noun is a named location or not using ConceptNet.
pobj
: string
The noun to check if it's a named location or not.
flag
: boolean
If flag == True
, then pobj
is a named location. Otherwise, pobj
is not a named location.
Checks for an adpositional phrase or verb to determine a location change.
adp
: string
The adpositional phrase to check if there's a location change.
verb
: string
The verb to check if there's a location change.
flag
: boolean
If flag == True
, then a location change might have happened. Otherwise, there was no location change.
Builds a dictionary of coreferences from the JSON response from the Coref API.
doc
: spaCy.Doc
The story represented by spaCy's Doc
object.
data
: dict
The dictionary built from the JSON response from the Coref API.
No return values.
Prints the dictionary of coreferences.
No parameters.
No return values.
Extracts the dialogue content, and then extracts the dialogue speakers.
doc
: spaCy.Doc
The story represented by spaCy's Doc
object.
story
: Story
The story represented by the Story
object.
dialogues
: List<Dialogue>
The list of dialogues extracted from the story.
Prints the speaker and the content of a dialogue.
dialogue
: Dialogue
The dialogue to be printed.
No return values.
Gets the Entity
object that starts at start
and ends at end
.
start
: integer
The token index of the start of the noun phrase.
end
: integer
The token index of the end of the noun phrase.
speaker
: Entity
The Entity
object that starts at start
and ends at end
. speaker == None
if no Entity
is found.
Extracts dialogue content using spaCy's Matcher
class. Words enclosed in double quotes are considered for dialogue content.
No parameters.
No return values.
Extracts the speakers of the extracted dialogue contents. Three scenarios are considered:
- Speaker said, "Hi."
- "Hi," said Speaker.
- "Hi."
No parameters.
No return values.
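The three scenarios above can be sketched with plain regular expressions. This is a simplified stand-in, not the project's spaCy-based extraction: the pattern names and the attribute helper are hypothetical, only the verb "said" is handled, and an unattributed quote is assumed to yield no speaker.

```python
import re

# Scenario 1: Speaker said, "Hi."
SPEAKER_FIRST = re.compile(r'(?P<speaker>[A-Z]\w*) said, "(?P<content>[^"]+)"')
# Scenario 2: "Hi," said Speaker.
SPEAKER_LAST = re.compile(r'"(?P<content>[^"]+)" said (?P<speaker>[A-Z]\w*)')
# Scenario 3: "Hi." with no attribution in the sentence
CONTENT_ONLY = re.compile(r'"(?P<content>[^"]+)"')


def attribute(sentence):
    """Return (speaker, content); speaker is None when no attribution is found."""
    for pattern in (SPEAKER_FIRST, SPEAKER_LAST):
        match = pattern.search(sentence)
        if match:
            return match.group("speaker"), match.group("content")
    match = CONTENT_ONLY.search(sentence)
    if match:
        return None, match.group("content")
    return None, None


examples = ['Anna said, "Hi."', '"Hi," said Anna.', '"Hi."']
results = [attribute(sentence) for sentence in examples]
```

In the third scenario the speaker has to come from surrounding context, which a sentence-level pattern cannot see; the sketch simply leaves it as None.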
Resolves coreferences in the extracted dialogues using the dictionary from the CorefResolver
class.
mention_entity_dict
: dict
The dictionary from the CorefResolver
class.
No return values.
Prints all the dialogues.
No parameters.
No return values.
Extracts the entities from the story. Uses spaCy's DependencyMatcher
class to extract noun subject and action verb pairs, and classifies the noun subject as a character or prop.
doc
: spaCy.Doc
The story represented by spaCy's Doc
object.
story
: Story
The story represented by the Story
object.
speakers
: List<Entity>
The list of speakers extracted from the DialogueExtractor
.
No return values.
entities
: List<Entity>
The total list of entities extracted from the story.
doc
: spaCy.Doc
The story represented by spaCy's Doc
object.
distinct_entities
: List<Entity>
The list of entities where no two entities have the same string representation.
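One way to obtain a list in which no two entities share a string representation is sketched below. The helper name distinct_by_text is hypothetical, and keeping the first occurrence with a case-sensitive comparison is an assumption rather than the project's confirmed behavior.

```python
def distinct_by_text(entities):
    """Keep the first entity seen for each distinct string representation."""
    seen = set()
    distinct = []
    for entity in entities:
        key = str(entity)  # assumption: str(entity) yields the entity's text
        if key not in seen:
            seen.add(key)
            distinct.append(entity)
    return distinct


# Plain strings stand in for Entity objects in this illustration.
unique_entities = distinct_by_text(["Anna", "door", "Anna", "key", "door"])
```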
Prints the extracted characters from the story.
No parameters.
No return values.
Prints the extracted props from the story.
No parameters.
No return values.
entity
: Entity
The entity to be printed.
No return values.
Gets the Character
object that starts at start
and ends at end
.
start
: integer
The token index of the start of the noun phrase.
end
: integer
The token index of the end of the noun phrase.
character
: Character
The Character
object that starts at start
and ends at end
. character == None
if no Character
is found.
Gets the Prop
object that starts at start
and ends at end
.
start
: integer
The token index of the start of the noun phrase.
end
: integer
The token index of the end of the noun phrase.
prop
: Prop
The Prop
object that starts at start
and ends at end
. prop == None
if no Prop
is found.
Resolves coreferences in the extracted characters using the dictionary from the CorefResolver
class.
mention_entity_dict
: dict
The dictionary from the CorefResolver
class.
No return values.
Resolves coreferences in the extracted props using the dictionary from the CorefResolver
class.
mention_entity_dict
: dict
The dictionary from the CorefResolver
class.
No return values.
sentence
: spaCy.Span
The sentence to determine the event type of.
sent_characters
: List<Character>
The characters found in the sentence.
sent_props
: List<Prop>
The props found in the sentence.
The event type of the sentence, either a scene transition or an action event.
Instantiates and returns an ActionEvent
with a scene transition classification.
sentence
: spaCy.Span
The sentence to determine the event type of.
idx
: integer
The index of the sentence relative to all sentences in the spaCy Doc
.
sent_characters
: List<Character>
The characters found in the sentence.
sent_props
: List<Prop>
The props found in the sentence.
A complete ActionEvent
object that's classified as a scene transition and contains the characters and props found in the sentence.
Instantiates and returns an ActionEvent
.
sentence
: spaCy.Span
The sentence to determine the event type of.
idx
: integer
The index of the sentence relative to all sentences in the spaCy Doc
.
sent_characters
: List<Character>
The characters found in the sentence.
sent_props
: List<Prop>
The props found in the sentence.
A complete ActionEvent
object that's not classified as a scene transition and contains the characters and props found in the sentence.
Iterates through all of the sentences in doc
and instantiates Scene
and ActionEvent
objects based on the classification of each sentence.
doc
: spaCy.Doc
The story represented by spaCy's Doc
object.
story
: Story
The story represented by the Story
object.
dialogue_events
: List<Dialogue>
The dialogue events extracted by the DialogueExtractor
class.
character_list
: List<Character>
The characters extracted by the EntityExtractor
class.
prop_list
: List<Prop>
The props extracted by the EntityExtractor
class.
No return values.
Prints the extracted Scene
and Event
objects.
No parameters.
No return values.
Generates a .tex file from the abstract story representation, and then generates a .pdf file from the .tex file.
No parameters.
No return values.
Generates a .tex file from the abstract story representation.
No parameters.
No return values.
Generates a .pdf file from the generated .tex file.
No parameters.
No return values.
Generates the string representation of the title page for the screenplay.
No parameters.
No return values.
Generates the string representation of the main body for the screenplay.
No parameters.
No return values.
Generates the string representation of a scene transition.
transition_event
: TransitionEvent
The transition event to be generated.
No return values.
Generates the string representation of an action event.
action_event
: ActionEvent
The action event to be generated.
No return values.
Generates the string representation of a dialogue.
dialogue_event
: DialogueEvent
The dialogue event to be generated.
No return values.
token
: spaCy.Token
The token in question.
previous_token
: spaCy.Token
The first non-whitespace and non-newline token before token
.
token
: spaCy.Token
The token in question.
next_token
: spaCy.Token
The first non-whitespace and non-newline token after token
.
token
: spaCy.Token
The token in question.
previous_word
: spaCy.Token
The first word before token
.
token
: spaCy.Token
The token in question.
next_word
: spaCy.Token
The first word after token
.
token
: spaCy.Token
The token in question.
anchor
: spaCy.Token
The syntactic anchor of token.
anchor
: spaCy.Token
The syntactic anchor of a sentence.
subject
: spaCy.Token
The noun subject of anchor
.
anchor
: spaCy.Token
The syntactic anchor of a sentence.
direct_object
: spaCy.Token
The direct object of anchor
.
noun
: spaCy.Token
The noun in question.
noun_chunk
: spaCy.Span
The Span
noun chunk that contains the Token
noun
.
sent
: spaCy.Span
The sentence in question.
idx
: integer
The index of the sentence with respect to the story Doc
.
Transforms the abstract story representation into a list of sentences and tokens for presentation.
No parameters.
No return values.
Evaluates the precision, recall, and f1-score of each story element.
No parameters.
No return values.
Evaluates the precision, recall, and f1-score of extracted dialogue speakers.
file
: File
The annotation .txt file to base the ground truth from.
score
: tuple
The evaluation score of the extraction for dialogue speakers formatted as a tuple (precision, recall, f1-score)
Evaluates the precision, recall, and f1-score of extracted dialogue content.
file
: File
The annotation .txt file to base the ground truth from.
score
: tuple
The evaluation score of the extraction for dialogue content formatted as a tuple (precision, recall, f1-score)
Evaluates the precision, recall, and f1-score of extracted characters.
file
: File
The annotation .txt file to base the ground truth from.
score
: tuple
The evaluation score of the extraction for characters formatted as a tuple (precision, recall, f1-score)
Evaluates the precision, recall, and f1-score of extracted props.
file
: File
The annotation .txt file to base the ground truth from.
score
: tuple
The evaluation score of the extraction for props formatted as a tuple (precision, recall, f1-score)
Evaluates the precision, recall, and f1-score of extracted action lines.
file
: File
The annotation .txt file to base the ground truth from.
score
: tuple
The evaluation score of the extraction for action lines formatted as a tuple (precision, recall, f1-score)
Evaluates the precision, recall, and f1-score of extracted scene transitions.
file
: File
The annotation .txt file to base the ground truth from.
score
: tuple
The evaluation score of the extraction for scene transitions formatted as a tuple (precision, recall, f1-score)
.
Counts the number of true positives, false positives, and false negatives in the prediction.
prediction
: List<integer>
The predicted results of the system.
annotation
: List<integer>
The annotated results.
score
: tuple
The count score of the prediction formatted as a tuple (true positives, false positives, false negatives)
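A minimal sketch of this kind of counting follows, assuming that prediction and annotation are lists of integer indices (for example, token positions) and that an exact index match counts as a true positive. The function name count_matches is hypothetical, and the project's own matching rule may differ.

```python
def count_matches(prediction, annotation):
    """Return (true positives, false positives, false negatives)."""
    predicted = set(prediction)
    annotated = set(annotation)
    tp = len(predicted & annotated)  # predicted and annotated
    fp = len(predicted - annotated)  # predicted but not annotated
    fn = len(annotated - predicted)  # annotated but not predicted
    return tp, fp, fn


counts = count_matches([1, 2, 3, 7], [2, 3, 4])
```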
Implements Bianca's algorithm to count the number of perfect, missing, lacking, excess, missing, and wrong predictions.
prediction
: List<integer>
The predicted results of the system.
annotation
: List<integer>
The annotated results.
score
: tuple
The count score of the prediction formatted as a tuple (perfect, missing, lacking, excess, missing, wrong)
Calculates and returns the precision, recall, and f1-score given the count score.
tp
: integer
The number of true positives.
fp
: integer
The number of false positives.
fn
: integer
The number of false negatives.
score
: tuple
The evaluation score formatted as a tuple (precision, recall, f1-score)
.
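For reference, the standard definitions are precision = tp / (tp + fp), recall = tp / (tp + fn), and f1 = 2 * precision * recall / (precision + recall). The sketch below applies them; the function name prf_scores is hypothetical, and returning 0.0 when a denominator is zero is an assumption about edge-case handling.

```python
def prf_scores(tp, fp, fn):
    """Return (precision, recall, f1-score) from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # assumption: score undefined cases as zero
    return precision, recall, f1


scores = prf_scores(6, 2, 2)
```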
Implements Bianca's algorithm to calculate and return the precision, recall, and f1-score given the count score.
perfect
: integer
The number of perfect predictions.
missing
: integer
The number of missing predictions.
lacking
: integer
The number of lacking predictions.
excess
: integer
The number of excess predictions.
missing
: integer
The number of missing predictions.
wrong
: integer
The number of wrong predictions.
score
: tuple
The evaluation score formatted as a tuple (precision, recall, f1-score)
.
Calculates the simple matching coefficient, Jaccard's coefficient, and cosine similarity between the story questionnaire responses and the screenplay questionnaire responses.
No parameters.
story_understanding
: dict
The evaluation results of the story understanding module.
Calculates the simple matching coefficient of sets x
and y
.
x
: List<integer>
The first set.
y
: List<integer>
The second set.
smc
: double
The simple matching coefficient of the two sets.
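The simple matching coefficient is the share of positions at which two equal-length binary vectors agree, counting both 1-1 and 0-0 matches. The sketch below assumes x and y are lists of 0/1 values; the function name and the choice to return 0.0 for empty input are assumptions.

```python
def simple_matching_coefficient(x, y):
    """Fraction of positions where x and y hold the same value."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not x:
        return 0.0  # assumption: empty input scores zero
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


smc = simple_matching_coefficient([1, 0, 1, 1], [1, 1, 1, 0])
```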
Calculates the Jaccard's coefficient of sets x
and y
.
x
: List<integer>
The first set.
y
: List<integer>
The second set.
jc
: double
The Jaccard's coefficient of the two sets.
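Unlike the simple matching coefficient, Jaccard's coefficient ignores positions where both values are 0: it divides the number of 1-1 matches by the number of positions where at least one value is 1. The sketch below assumes binary lists of equal length; the function name and the 0.0 result for all-zero input are assumptions.

```python
def jaccard_coefficient(x, y):
    """Number of 1-1 matches divided by positions where either value is 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0  # assumption for all-zero input


jc = jaccard_coefficient([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```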
Calculates the cosine similarity of vectors x
and y
.
x
: List<integer>
The first vector.
y
: List<integer>
The second vector.
cs
: double
The cosine similarity of the two vectors.
Calculates the dot product of vectors x
and y
.
x
: List<integer>
The first vector.
y
: List<integer>
The second vector.
dot_product
: double
The dot product of the two vectors.
Calculates the length of vectors x
and y
.
x
: List<integer>
The first vector.
y
: List<integer>
The second vector.
length
: double
The length of the two vectors.
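The dot product, vector length, and cosine similarity described above fit together as cosine = (x · y) / (|x| |y|). The sketch below is one possible arrangement, not the project's code: here vector_length takes a single vector, whereas the documented helper accepts both x and y, and a zero-length vector is assumed to give a similarity of 0.0.

```python
import math


def dot_product(x, y):
    """Sum of element-wise products of x and y."""
    return sum(a * b for a, b in zip(x, y))


def vector_length(x):
    """Euclidean length of a single vector."""
    return math.sqrt(dot_product(x, x))


def cosine_similarity(x, y):
    """Cosine of the angle between x and y."""
    denominator = vector_length(x) * vector_length(y)
    return dot_product(x, y) / denominator if denominator else 0.0


cs = cosine_similarity([1, 0, 1, 1], [1, 1, 1, 0])
```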
Reads a csv file and returns its 2D array representation.
csv_file
: File
The csv file to be read.
result
: List<List<string>>
The 2D array representation of the csv file.
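Reading a csv file into a 2D array can be done with Python's standard csv module, as in the sketch below. The function name read_rows and the sample content are hypothetical; an in-memory buffer stands in for a real file.

```python
import csv
import io


def read_rows(csv_file):
    """Return the rows of an open csv file as a list of lists of strings."""
    return [row for row in csv.reader(csv_file)]


# Hypothetical questionnaire data held in memory instead of on disk.
sample_file = io.StringIO("question,answer\nWho is the hero?,Anna\n")
rows = read_rows(sample_file)
```

When reading from disk, the csv module documentation recommends opening the file with newline="" so that embedded line breaks are handled correctly.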
Transforms string responses into binary responses.
responses
: dict
The aggregated responses of the story and screenplay questionnaires.
result
: dict
responses, with the string responses converted to binary values.