HW3: Vector Space Models
Your task is to develop a sentiment analyzer trained on the Stanford Sentiment Treebank:
Create a vector_space_models.py file in the src/homework/ directory.
Define a function named sentiment_analyzer() that takes two parameters, a list of training documents and a list of test documents for classification, and returns the predicted sentiment labels along with the respective similarity scores.
Use the k-nearest neighbors algorithm for the classification. Find the optimal value of k using the development set, and then hardcode this value into your function before submission.
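The requirements above can be sketched roughly as follows. This is only a minimal illustration, not the expected solution. It assumes that each training document is a (label, text) pair and each test document is a plain string, that documents are tokenized on whitespace, that raw term frequencies serve as vector weights, and that the reported score is the mean cosine similarity of the neighbors that voted for the winning label. The helper names vectorize and cosine and the constant K are hypothetical, and the value of K is a placeholder rather than a tuned result.

```python
import math
from collections import Counter

# Placeholder value; the assignment asks you to tune k on the development set.
K = 3


def vectorize(text: str) -> Counter:
    """Bag-of-words term-frequency vector (assumes whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(v1: Counter, v2: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * v2[t] for t, w in v1.items() if t in v2)
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def sentiment_analyzer(trn_docs, tst_docs):
    """
    trn_docs: list of (label, text) pairs (assumed input shape).
    tst_docs: list of text strings (assumed input shape).
    Returns a list of (predicted_label, similarity_score) pairs.
    """
    trn = [(label, vectorize(text)) for label, text in trn_docs]
    output = []
    for text in tst_docs:
        v = vectorize(text)
        # Rank all training documents by similarity and keep the K nearest.
        sims = [(cosine(v, u), label) for label, u in trn]
        nearest = sorted(sims, key=lambda x: x[0], reverse=True)[:K]
        # Majority vote among the K nearest neighbors.
        votes = Counter(label for _, label in nearest)
        best_label = votes.most_common(1)[0][0]
        # Score: mean similarity of the neighbors that share the winning label.
        scores = [s for s, label in nearest if label == best_label]
        output.append((best_label, sum(scores) / len(scores)))
    return output
```

A toy call such as sentiment_analyzer([(4, "great wonderful movie"), (0, "terrible boring movie")], ["great film"]) returns one (label, score) pair per test document. Other weighting schemes (for example TF-IDF) or similarity metrics could be swapped in without changing the overall structure.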
Data
The sentiment_treebank directory contains the following two files:
sst_trn.tst: a training set consisting of 8,544 labeled documents.
sst_dev.tst: a development set consisting of 1,101 labeled documents.
Each line is a document, which is formatted as follows:
Below are the explanations of what each label signifies:
0: Very negative
1: Negative
2: Neutral
3: Positive
4: Very positive
Submission
Commit and push the vector_space_models.py file to your GitHub repository.
Extra Credit
Define a function named sentiment_analyzer_extra() that gives an improved sentiment analyzer.
Rubric
Code Submission (1 point)
Program Execution (1 point)
Development Set Accuracy (4 points)
Evaluation Set Accuracy (4 points)
Concept Quiz (2 points)
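Since part of the rubric depends on accuracy, here is one possible way to measure accuracy and to pick k on the development set. It is a sketch under assumptions: the names accuracy and best_k are hypothetical, and predict stands for any callable you supply that maps a candidate k to the list of labels predicted for the development documents.

```python
from typing import Callable, Sequence, Tuple


def accuracy(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def best_k(
    predict: Callable[[int], Sequence[int]],
    gold: Sequence[int],
    candidates: Sequence[int],
) -> Tuple[int, float]:
    """
    Evaluate each candidate k and return (k, accuracy) for the best one.
    predict(k) is a hypothetical callable returning the predicted labels
    for the development set; ties go to the earliest candidate listed.
    """
    scored = [(accuracy(gold, predict(k)), k) for k in candidates]
    best_acc, k = max(scored, key=lambda x: x[0])
    return k, best_acc
```

Once a value has been chosen this way, it can be hardcoded into the submitted function as the assignment requires.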