This section describes how to implement a trie.
Let us define a class, TrieNode
:
L1
: the generic type T
represents the type of value
in L6
.
L2
: Map
is used to store the children.
L4
: true
if this node represents the last character of a certain word; otherwise, false
.
What other types of maps are implemented in Java?
Let us define getter methods:
L10
: returns the map consisting of children's characters as keys and their nodes as values.
Is there any risk of returning
this.children
inL11
?
Let us then define setter methods:
L7
: returns the previously assigned value of this node.
L13
: returns the child with the specific key if exists; otherwise, a new child with the key.
What does the
removeChild()
method return ifkey
does not exist in this node?
Finally, let us define checker methods:
L1
: returns true
if this node represents the last character of a certain word; false
.
Let us create the class, Trie
:
L1
: defines a generic type T
that can be any type.
L5
: initializes the root with the null
character.
Let us define the find()
method that takes a specific key in string and returns the node corresponding the last character of the key:
L5
: iterates through every character in the string.
L6
: finds the node whose key is the current character.
L7
: the child does not exist, implying that key
is not introduced in this trie.
Given the find()
method, let us define the get()
method that takes a string key and returns the value of the node corresponding to the key in this trie:
L3
: checks if both the key exists and the key is a word.
We then define the put() method that inserts a key-value pair to the trie:
L4-5
: iterates through all characters in the key and adds children as necessary.
L7
: sets the end state of the node corresponding the last character to true
.
L8
: returns the value of the previously key if exists; otherwise, null
.
What does
node.addChild(c)
return if the child with the keyc
already exists?
The following demonstrates how the put()
method works:
Finally, let us define the remove()
method that removes all appropriate nodes corresponding to the key:
L2
: retrieves the node corresponding to the last character in the key.
L4
: returns false
the key does not exist or the key is not a word.
L7-10
: if the retrieved node has children, sets its end-state to false
and returns true
.
L12-23
: removes ancestors who have descendants only related to this key.
The following demonstrates how the remove()
method works:
L9
: HashMap
gives the complexity for searching a key.
This chapter discusses tries, also known as prefix trees.
This section provides exercises for better understanding in tries.
Let L
be a list of country names in String
where spaces are allowed (e.g., "United States"
, "South Korea"
). Consider a trie where all country names in L
and their unique IDs are put as key-value pairs as follows:
Write a report that briefly describe your approach and explains the worst-case complexity of your approach, and save it as quiz5.pdf
.
Substring matching of country names is not required. If your dictionary has "United States"
as a country name and the input string contains "United Nation"
, the "United"
should not be matched as a country name.
Substring matching within input words are expected. If your dictionary has "America"
as a country name and the input contains "American"
, the first 7 characters, "America"
should be recognized as a country name.
The worst-case complexity needs to be explained in terms of the number of characters n
in the input string.
Create the class under the package that inherits by passing Integer
as the generic type.
Update the getEntities()
method that takes an input string and returns the list of , where each entity consists of the begin index (inclusive) and end index (exclusive) of the first and the last characters of the corresponding country name respectively as well as the ID of the country name.
See for an example of how to unit-test your approach.
This section explains the homework assignment regarding Autocomplete.
Most virtual keyboards provide the option of auto-complete, which gives a list of candidate words that you wish to type given a prefix you enter. For instance, when you type "sh", it gives a list of candidate words such as "she", "shell", "ship", etc.
If you select a word from the candidates, it should learn your selection as the top candidate for that prefix. For instance, if you select "ship" from the example above, the next time you enter the same prefix "sh", it should give a list of candidates where the first item is "ship" instead of "she".
Your task is to write a program that gives a list of candidate words for any prefix and learns the selected candidates.
Create a class called AutocompleteHW
under the autocomplete
package:
This class extends the abstract class Autocomplete
, which extends Trie
.
The value type of the generic T
can be a collection of strings (e.g., List<String>
).
You must create a constructor that takes two parameters, dict_file
and max
, and calls its super constructor, which reads all words in the dictionary file (e.g., dict.txt
).
Override the getCandidates()
method that takes a prefix and returns a list of candidate words matching the prefix:
The max-number of candidates in the list must be the return value of the getMax()
method.
The most recently selected candidate should appear as the 1st item, the 2nd most recently selected candidate should appear as the 2nd item, and so on.
The rest of the candidate list should be filled with the shortest words matching the prefix.
If there are more than one candidate with the same lengths, they should be sorted alphabetically.
Make sure the same candidate does not appear more than once.
Override the pickCandidate()
method that takes a prefix and a selected candidate, and saves the information:
This is the most recently selected candidate for that particular prefix. It must appear as the first item in the candidate list when the same prefix is entered next time.
Submit AutocompleteHW.java
to Canvas.
Create a class called AutocompleteHWExtra
under the autocomplete
package.
Feel free to change the generic type, representing the value type of each TrieNode
, to anything other than List<String>
(List<something else>
).
The getCandidates()
method:
Must show the most frequently picked candidate first, the 2nd-most frequently picked candidate next, and so on.
If there are more than one candidate with the same frequency, sort them by recency.
The rest of candidates should be filled as in the original task.
You must submit both AutocompleteHW
and AutocompleteHWExtra
to earn the extra credit.
Do not change the dictionary file. If you find anything peculiar about the dictionary file, please let me know so everyone works on the same copy of the dictionary file.
Please test your program yourself. We will evaluate your program using our unit tests and measure the performance (both speed and accuracy).
Take a look at Map if you are not familiar with methods in the standard library.
If you are having trouble with implementing getCandidates()
, think about how to traverse the trie given any node.
If you are having trouble with implementing pickCandidate()
, take a look at Trie#put
.
You are not allowed to use any type of Map to store candidates for this homework.
Your program should be able to handle prefixes or candidates that do not exist in the dictionary.
All picked candidates should be introduced as real words to the trie.
Whitespaces are not allowed as input; thus, you should trim all input strings.
This section gives an overview of tries.
A trie is a tree where each node has:
0-to-many children,
A key whose type is character,
A value that can be any type of object, and
An end-state that indicates if the node and its ancestors together represent the end of a certain word.
Let us consider the following list of strings:
Given the strings as keys, a binary search tree can be constructed as follows using a balancing algorithm:
How many character comparisons does it need to make to search a string in a balanced BST?
With the same list of strings, a trie can be constructed as follows:
1st cell: key (a character).
2nd cell: value (the index of the string that the node represents).
3rd cell: end-state (T
for true, F
for false).
What is the worst-case complexity of searching a string in a trie?