
HW3: Vector Space Models

Your task is to develop a sentiment analyzer trained on the Stanford Sentiment Treebank:

  • Create a vector_space_models.py file in the src/homework/ directory.

  • Define a function named sentiment_analyzer() that takes two parameters, a list of training documents and a list of test documents to classify, and returns the predicted sentiment label for each test document along with its similarity score.

  • Use the k-nearest neighbors algorithm for the classification. Find the optimal value of k using the development set, and then hardcode this value into your function before submission.
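The steps above can be sketched as follows. This is a minimal, non-authoritative sketch that assumes each document is a (label, text) tuple, represents documents as lowercased bag-of-words count vectors compared with cosine similarity, and reports as the similarity score the highest similarity among the neighbors that carry the predicted label. The helper names (read_sst, vectorize, cosine, rank_neighbors, vote, tune_k) and the placeholder K = 5 are hypothetical, not part of the assignment.

```python
import heapq
import math
from collections import Counter

# Hypothetical placeholder: replace with the k chosen on the development set.
K = 5


def read_sst(path):
    """Read '[Label]\\t[Document]' lines into a list of (label, text) tuples."""
    docs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                label, text = line.split("\t", 1)
                docs.append((int(label), text))
    return docs


def vectorize(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    dot = sum(c * v[t] for t, c in u.items())  # missing terms count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_neighbors(train_vecs, query, n):
    """Return the n most similar training documents as (similarity, label), best first."""
    return heapq.nlargest(n, ((cosine(vec, query), label) for label, vec in train_vecs))


def vote(neighbors):
    """Majority vote over neighbors; ties broken by summed similarity.

    Returns (label, score), where score is the best similarity of the winning label.
    """
    votes, mass = Counter(), Counter()
    for sim, label in neighbors:
        votes[label] += 1
        mass[label] += sim
    best = max(votes, key=lambda lab: (votes[lab], mass[lab]))
    return best, max(sim for sim, label in neighbors if label == best)


def sentiment_analyzer(trn_docs, tst_docs):
    """k-NN classification; both arguments are lists of (label, text) tuples.

    Labels in tst_docs are ignored. Returns a list of (predicted_label, similarity).
    """
    train_vecs = [(label, vectorize(text)) for label, text in trn_docs]
    return [vote(rank_neighbors(train_vecs, vectorize(text), K)) for _, text in tst_docs]


def tune_k(trn_docs, dev_docs, candidates=range(1, 31)):
    """Return (best_k, dev_accuracy), ranking each dev document only once."""
    train_vecs = [(label, vectorize(text)) for label, text in trn_docs]
    ranked = [rank_neighbors(train_vecs, vectorize(text), max(candidates))
              for _, text in dev_docs]
    best_k, best_acc = None, -1.0
    for k in candidates:
        correct = sum(vote(nbrs[:k])[0] == gold for nbrs, (gold, _) in zip(ranked, dev_docs))
        acc = correct / len(dev_docs)
        if acc > best_acc:  # keep the smallest k on ties
            best_k, best_acc = k, acc
    return best_k, best_acc
```

For example, tune_k(read_sst("sentiment_treebank/sst_trn.tst"), read_sst("sentiment_treebank/sst_dev.tst")) would return the best k from the candidate range (the relative paths here are an assumption about where the data lives), which you would then copy into K. Ranking each development document once and slicing its neighbor list avoids recomputing roughly 9.4 million (1,101 × 8,544) similarities for every candidate k.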

hashtag
Data

The sentiment_treebank directory contains the following two files:

  • sst_trn.tst: a training set consisting of 8,544 labeled documents.

  • sst_dev.tst: a development set consisting of 1,101 labeled documents.

Each line is one document, formatted as follows, with the label and the document text separated by a tab:

[Label]\t[Document]

Below are the explanations of what each label signifies:

  • 0: Very negative

  • 1: Negative

  • 2: Neutral

  • 3: Positive

  • 4: Very positive

Submission

Commit and push the vector_space_models.py file to your GitHub repository.

Extra Credit

Define a function named sentiment_analyzer_extra() that implements an improved sentiment analyzer.

Rubric

  • Code Submission (1 point)

  • Program Execution (1 point)

  • Development Set Accuracy (4 points)

  • Evaluation Set Accuracy (4 points)

  • Concept Quiz (2 points)