NLP Essentials

Copyright © 2023 All rights reserved


Homework

HW3: Vector Space Models


Your task is to develop a sentiment analyzer trained on the Stanford Sentiment Treebank:

  • Create a vector_space_models.py file in the src/homework/ directory.

  • Define a function named sentiment_analyzer() that takes two parameters, a list of training documents and a list of test documents for classification, and returns the predicted sentiment labels along with the respective similarity scores.

  • Use the k-nearest neighbors algorithm for the classification. Find the optimal value of k using the development set, and then hardcode this value into your function before submission.
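
The steps above can be sketched as follows. This is only an illustrative outline, not the expected solution: it assumes training documents are (label, text) pairs and test documents are plain strings, uses whitespace tokens with raw term counts and cosine similarity, and reports the mean similarity of the winning neighbors as the score. The helper names (vectorize, cosine, knn_predict, select_k) and the value of K are hypothetical placeholders.

```python
import math
from collections import Counter

K = 3  # placeholder only; replace with the value found on the development set


def vectorize(text):
    """Bag-of-words term counts from lowercased whitespace tokens (an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, test, k):
    """train: list of (label, text); test: list of text.
    Returns one (predicted_label, similarity_score) pair per test document."""
    vectors = [(label, vectorize(text)) for label, text in train]
    predictions = []
    for text in test:
        query = vectorize(text)
        # Keep the k training documents most similar to the query.
        neighbors = sorted(
            ((cosine(query, vec), label) for label, vec in vectors),
            reverse=True,
        )[:k]
        votes, sims = Counter(), Counter()
        for score, label in neighbors:
            votes[label] += 1
            sims[label] += score
        # Majority vote; ties are broken by the summed similarity.
        best = max(votes, key=lambda lab: (votes[lab], sims[lab]))
        predictions.append((best, sims[best] / votes[best]))
    return predictions


def select_k(train, dev, candidates=(1, 3, 5)):
    """Return the candidate k with the highest accuracy on labeled dev pairs."""
    gold = [label for label, _ in dev]
    texts = [text for _, text in dev]

    def accuracy(k):
        predicted = knn_predict(train, texts, k)
        return sum(p == g for (p, _), g in zip(predicted, gold)) / len(gold)

    return max(candidates, key=accuracy)


def sentiment_analyzer(train, test):
    """Two-parameter entry point with k hardcoded, as the task requires."""
    return knn_predict(train, test, K)
```

Scoring every test document against every training document is slow on the full training set, so a real submission would likely precompute vector norms or use a term weighting scheme instead of raw counts.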

Data

The sentiment_treebank directory contains the following two files:

  • sst_trn.tst: a training set consisting of 8,544 labeled documents.

  • sst_dev.tst: a development set consisting of 1,101 labeled documents.

Each line is a document, which is formatted as follows:

[Label]\t[Document]

Below are the explanations of what each label signifies:

  • 0: Very negative

  • 1: Negative

  • 2: Neutral

  • 3: Positive

  • 4: Very positive
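
A minimal sketch of reading this format, assuming UTF-8 files with exactly one tab between the label and the document. The names LABELS, parse_line, and read_corpus are hypothetical helpers, not part of the assignment.

```python
# Label meanings as listed above.
LABELS = {
    0: "Very negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very positive",
}


def parse_line(line):
    """Split one '[Label]\\t[Document]' line into (int label, document text)."""
    label, document = line.rstrip("\n").split("\t", 1)
    return int(label), document


def read_corpus(path):
    """Read a whole file into a list of (label, document) pairs, skipping blank lines."""
    with open(path, encoding="utf-8") as fin:
        return [parse_line(line) for line in fin if line.strip()]
```

Splitting on the first tab only keeps any later tab characters inside the document text intact.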

Submission

Commit and push the vector_space_models.py file to your GitHub repository.

Extra Credit

Define a function named sentiment_analyzer_extra() that gives an improved sentiment analyzer.

Rubric

  • Code Submission (1 point)

  • Program Execution (1 point)

  • Development Set Accuracy (4 points)

  • Evaluation Set Accuracy (4 points)

  • Concept Quiz (2 points)
