Frequency Analysis

Frequency analysis examines how often each word appears in a corpus. These counts help reveal patterns and structure in language.

Word Counting

Consider the text from Wikipedia about Emory University (as of 2023-10-18), reproduced at the end of this section.

Our task is to determine the number of word tokens and unique word types in this text.

Q1: What is the difference between a word token and a word type?

A simple way of accomplishing this task is to split the text by whitespace and count the resulting strings:

L1: Import the Counter class, a special type of a dictionary, from the collections package.
L3: Use type hints to indicate the parameter type (str) and the return type (Counter).
L4: Open the corpus file.
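The listing these notes describe is not shown here, so the following is only a minimal sketch of what such a function might look like. The function name count_words and the variable names are assumptions made for illustration; only Counter, the type hints, and the whitespace split come from the notes above.

```python
from collections import Counter

def count_words(corpus: str) -> Counter:
    # Open the corpus file and read its full contents.
    with open(corpus, encoding='utf-8') as fin:
        # str.split() with no arguments splits on any run of whitespace.
        words = fin.read().split()
    # Counter maps each distinct string to the number of times it occurs.
    return Counter(words)
```

Because Counter is a subclass of dict, the result supports the usual dictionary operations such as keys(), values(), and items().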

L1: The corpus can be found .
L4: Save the total number of word tokens in the corpus, which is the sum of the values in counts.
L5: Save the number of unique word types in the corpus, which is the length of counts.
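As a rough illustration of these two quantities, the sketch below builds counts from a short made-up sentence instead of the corpus file, so it runs without any data on disk. The sentence and the names n_tokens and n_types are assumptions, not part of the original program.

```python
from collections import Counter

# Hypothetical stand-in for the Counter produced by the word-counting step.
counts = Counter('the cat saw the other cat'.split())

n_tokens = sum(counts.values())  # every occurrence counts: 6
n_types = len(counts)            # distinct strings only: the, cat, saw, other

print(n_tokens, n_types)  # prints "6 4"
```

The sentence has six tokens but only four types, because "the" and "cat" each occur twice.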

When running this program, you may encounter FileNotFoundError. To fix this error, follow these steps to set up your working directory:

Go to [Run] > [Edit Configurations] in the menu.

Word Frequency

In this task, we aim to retrieve the top-k most or least frequently occurring word types in the text:

L1: Sort words in counts in descending order and save them into dec as a list of (word, count) tuples, sorted from the most frequent to the least frequent words.
L2: Sort words in counts in ascending order and save them into asc as a list of (word, count) tuples.
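One plausible way to write the two sorts is shown below, using the built-in sorted() with a key function on the count. The sample sentence and the value of k are assumptions chosen so the example runs on its own; dec and asc follow the names used in the notes.

```python
from collections import Counter

# Hypothetical stand-in for the corpus counts.
counts = Counter('the cat saw the other cat and the dog'.split())

# Each item is a (word, count) tuple; sort on the count at index 1.
dec = sorted(counts.items(), key=lambda x: x[1], reverse=True)
asc = sorted(counts.items(), key=lambda x: x[1])

k = 2
print(dec[:k])  # [('the', 3), ('cat', 2)]
print(asc[:k])  # [('saw', 1), ('other', 1)]
```

sorted() is stable, so words with equal counts keep the order in which they were first seen. Counter also provides most_common(k), which returns the k most frequent items directly.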

Notice that the top-10 least-frequent word list contains unnormalized words such as "Atlanta," (with the comma) and "Georgia." (with the period). This occurs because the text was split only on whitespace, without considering punctuation. Consequently, these strings are counted separately from the word types "Atlanta" and "Georgia". The token and type counts computed above therefore do not necessarily represent the distribution of words in the text accurately.
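The effect can be reproduced on a single made-up sentence. The sketch below compares raw whitespace splitting with a simple clean-up that strips leading and trailing punctuation from each string. This clean-up is an assumption added for illustration, not a method given in this section, and it is deliberately crude.

```python
import string
from collections import Counter

# Hypothetical sentence that mimics the punctuation problem.
text = 'Emory is in Atlanta, Georgia. Atlanta is the capital of Georgia'

# Whitespace split only: "Atlanta," and "Atlanta" are different strings.
raw = Counter(text.split())

# Strip punctuation from both ends of each string, dropping empty results.
words = [w.strip(string.punctuation) for w in text.split()]
norm = Counter(w for w in words if w)

print(raw['Atlanta'], raw['Atlanta,'])   # 1 1
print(norm['Atlanta'], norm['Georgia'])  # 2 2
```

Stripping is not a full solution: an abbreviation such as "U.S." would lose its final period and become "U.S", so proper tokenization needs more care than this.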

Q2: How can we interpret the most frequent words in a text?

Save Output

Finally, let us save all word types in alphabetical order to a file:

L2: Open outfile in write mode (w).
L4: Iterate over unique word types (keys) of counts in alphabetical order.
L5: Write each word followed by a newline character to fout.

L1: Creates the file if it does not exist; otherwise, its previous contents will be completely overwritten.
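The steps above can be sketched as follows. The function name save_output is an assumption; outfile, counts, and fout follow the names used in the notes. The line positions may not match the original listing exactly.

```python
from collections import Counter

def save_output(counts: Counter, outfile: str) -> None:
    # Mode 'w' creates the file, or truncates it if it already exists.
    with open(outfile, 'w', encoding='utf-8') as fout:
        # sorted() orders the keys by Unicode code point,
        # so capitalized words come before lowercase ones.
        for word in sorted(counts.keys()):
            fout.write(f'{word}\n')
```

A call might look like save_output(counts, 'word_types.txt'), where the output path is a hypothetical example. Using a with block closes the file even if writing fails partway through.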


Emory University is a private research university in Atlanta, Georgia. Founded in 1836 as Emory College by the Methodist Episcopal Church and named in honor of Methodist bishop John Emory.[18]

Emory University has nine academic divisions. Emory Healthcare is the largest healthcare system in the state of Georgia[19] and comprises seven major hospitals, including Emory University Hospital and Emory University Hospital Midtown.[20] The university operates the Winship Cancer Institute, Yerkes National Primate Research Center, and many disease and vaccine research centers.[21][22] Emory University is the leading coordinator of the U.S. Health Department's National Ebola Training and Education Center.[23] The university is one of four institutions involved in the NIAID's Tuberculosis Research Units Program.[24] The International Association of National Public Health Institutes is headquartered at the university.[25]

Emory University has the 15th-largest endowment among U.S. colleges and universities.[9] The university is classified among "R1: Doctoral Universities - Very high research activity"[26] and is cited for high scientific performance and citation impact in the CWTS Leiden Ranking.[27] The National Science Foundation ranked the university 36th among academic institutions in the United States for research and development (R&D) expenditures.[28][29] In 1995 Emory University was elected to the Association of American Universities, an association of the 65 leading research universities in the United States and Canada.[5]

Emory faculty and alumni include 2 Prime Ministers, 9 university presidents, 11 members of the United States Congress, 2 Nobel Peace Prize laureates, a Vice President of the United States, a United States Speaker of the House, and a United States Supreme Court Justice. Other notable alumni include 21 Rhodes Scholars and 6 Pulitzer Prize winners, as well as Emmy Award winners, MacArthur Fellows, CEOs of Fortune 500 companies, heads of state and other leaders in foreign government.[30] Emory has more than 149,000 alumni, with 75 alumni clubs established worldwide in 20 countries.[31][32][33]