Monday, December 6, 2010

How to use a sorted map in Java - TreeMap

Sometimes in Java, you want the convenience of a map, with its simple get() and put() calls, but you also want the keys kept in sorted order. For complex objects you pretty much have to implement your own Comparator (or make the key class Comparable). But most of the time you want a map with simple objects like Integers or Strings as keys.

Enter TreeMap. This nifty class is part of the Java Collections Framework and implements SortedMap: unless you pass a Comparator to its constructor, the keys are sorted according to their natural ordering. Simply put, this means that for simple keys like Strings and Integers, you have to do nothing more than declare your map.
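
As a minimal sketch of what natural ordering means in practice, the snippet below fills two TreeMaps in arbitrary order and reads the keys back sorted. The class name NaturalOrderDemo, its helper methods and the sample keys are made up for illustration; only TreeMap and its standard methods come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original example.
class NaturalOrderDemo
{
    // Returns the String keys in the order the TreeMap iterates them.
    static List<String> sortedWords()
    {
        TreeMap<String, Integer> map = new TreeMap<String, Integer>();
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("mango", 2);
        return new ArrayList<String>(map.keySet());
    }

    // Same idea with Integer keys: numeric order, not insertion order.
    static List<Integer> sortedNumbers()
    {
        TreeMap<Integer, String> map = new TreeMap<Integer, String>();
        map.put(42, "forty-two");
        map.put(7, "seven");
        map.put(19, "nineteen");
        return new ArrayList<Integer>(map.keySet());
    }

    public static void main(String[] args)
    {
        System.out.println(sortedWords());   // [apple, mango, pear]
        System.out.println(sortedNumbers()); // [7, 19, 42]
    }
}
```

No Comparator appears anywhere: String and Integer both implement Comparable, which is all TreeMap needs.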

Consider the following text in a text file, taken from here:

Overtones of guilt about teenage hormonal episodes aside, this graphic novel holds your attention. The monochromatic format uses its high contrast to draw your eye to the stark emotions on the characters’ faces. The heads are large, often oversized and the usual emotions of angst, betrayal, anger, humiliation, terror, shame, guilt and apathy that make up the pre-pubescent psyche are portrayed realistically here. This is no Fuuuuu comic.

If you have to count the number of times each word occurs in a text file and print out the totals, sorted by the words themselves, this is easy with the following code:

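import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

public class WordCount
{
    // Keys (the words) are kept in their natural String ordering.
    public TreeMap<String, Integer> wordMap = new TreeMap<String, Integer>();

    public static void main(String[] args) throws IOException
    {
        WordCount w = new WordCount();
        w.countWords();
        // entrySet() iterates in ascending key order.
        for (Map.Entry<String, Integer> entry : w.wordMap.entrySet())
        {
            System.out.println(entry.getKey() + " occurred " + entry.getValue() + " times");
        }
    }

    void countWords() throws IOException
    {
        BufferedReader br = new BufferedReader(new FileReader("/tmp/a.txt"));
        try
        {
            String line;
            while ((line = br.readLine()) != null)
            {
                // Split on runs of whitespace; punctuation stays attached to the word.
                String[] tokens = line.split("\\s+");
                for (int i = 0; i < tokens.length; i++)
                {
                    if (tokens[i].isEmpty()) continue; // blank line or leading whitespace
                    Integer count = wordMap.get(tokens[i]);
                    wordMap.put(tokens[i], count == null ? 1 : count + 1);
                }
            }
        }
        finally
        {
            br.close(); // release the file handle even if reading fails
        }
    }
}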


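Your output, sorted by the words in ascending order, is shown below. Natural String ordering compares character codes, so capitalized words sort before lowercase ones, and punctuation stays attached to a word because the code splits only on whitespace: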



Fuuuuu occurred 1 times
Overtones occurred 1 times
The occurred 2 times
This occurred 1 times
about occurred 1 times
and occurred 2 times
anger, occurred 1 times
angst, occurred 1 times
apathy occurred 1 times
are occurred 2 times
aside, occurred 1 times
attention. occurred 1 times
betrayal, occurred 1 times
characters’ occurred 1 times
comic. occurred 1 times
contrast occurred 1 times
draw occurred 1 times
emotions occurred 2 times
episodes occurred 1 times
eye occurred 1 times
faces. occurred 1 times
format occurred 1 times
graphic occurred 1 times
guilt occurred 2 times
heads occurred 1 times
here. occurred 1 times
high occurred 1 times
holds occurred 1 times
hormonal occurred 1 times
humiliation, occurred 1 times
is occurred 1 times
its occurred 1 times
large, occurred 1 times
make occurred 1 times
monochromatic occurred 1 times
no occurred 1 times
novel occurred 1 times
of occurred 2 times
often occurred 1 times
on occurred 1 times
oversized occurred 1 times
portrayed occurred 1 times
pre-pubescent occurred 1 times
psyche occurred 1 times
realistically occurred 1 times
shame, occurred 1 times
stark occurred 1 times
teenage occurred 1 times
terror, occurred 1 times
that occurred 1 times
the occurred 4 times
this occurred 1 times
to occurred 2 times
up occurred 1 times
uses occurred 1 times
usual occurred 1 times
your occurred 2 times
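
In the output above, "The" and "This" land ahead of "about" because natural String ordering is case-sensitive. If that is not what you want, one option is to pass a Comparator to the TreeMap constructor. The sketch below uses the JDK's String.CASE_INSENSITIVE_ORDER; the class CaseInsensitiveCount and its countTokens helper are hypothetical names, and the input is a made-up sentence rather than the file from the example.

```java
import java.util.TreeMap;

// Hypothetical variant of the word counter, for illustration only.
class CaseInsensitiveCount
{
    // Counts whitespace-separated tokens in a map whose keys are ordered ignoring case.
    static TreeMap<String, Integer> countTokens(String text)
    {
        // The Comparator passed here replaces the natural String ordering.
        TreeMap<String, Integer> counts =
            new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
        for (String token : text.trim().split("\\s+"))
        {
            if (token.isEmpty()) continue; // empty input yields one empty token
            Integer count = counts.get(token);
            counts.put(token, count == null ? 1 : count + 1);
        }
        return counts;
    }

    public static void main(String[] args)
    {
        TreeMap<String, Integer> counts = countTokens("This is the end The end");
        System.out.println(counts); // {end=2, is=1, the=2, This=1}
    }
}
```

Keep in mind the side effect: keys that the Comparator treats as equal are the same key to the map, so "the" and "The" share one counter, stored under whichever spelling was seen first.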


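Printing the most-occurring words first, instead of sorting by the words, takes a little more than switching the keys and values around: a TreeMap keyed on the counts would keep only one word per count. Sorting the map's entries by value (or mapping each count to a list of words) gets you there. Good luck!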
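
One possible way to get the most frequent words first is sketched below, assuming Java 8 or later (for List.sort and lambdas). The class SortByCount and the method byCountDescending are hypothetical names, and the sample counts are a small subset typed in by hand. It copies the entries into a list and sorts that list by value, which avoids losing words that share the same count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class SortByCount
{
    // Returns the entries ordered by count, highest first.
    // ArrayList's sort is stable, so words with equal counts keep
    // the alphabetical order they had in the TreeMap.
    static List<Map.Entry<String, Integer>> byCountDescending(Map<String, Integer> wordMap)
    {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(wordMap.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        return entries;
    }

    public static void main(String[] args)
    {
        Map<String, Integer> wordMap = new TreeMap<String, Integer>();
        wordMap.put("guilt", 2);
        wordMap.put("the", 4);
        wordMap.put("and", 2);
        wordMap.put("comic.", 1);
        for (Map.Entry<String, Integer> entry : byCountDescending(wordMap))
        {
            System.out.println(entry.getKey() + " occurred " + entry.getValue() + " times");
        }
    }
}
```

With these four sample entries the loop prints "the" first, then "and" and "guilt" (tied at 2, in alphabetical order), then "comic.".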
