In this article, I will show how to calculate the word frequency
of a given list of strings using Java 8. This is helpful when analyzing
big data: for example, we can find the phrases that most users
are searching for.
Tools Used :
1) Eclipse Luna 4.4.1
2) Maven 3.3.3
3) JDK 1.8
Simple steps to follow are :
1) Create a simple Maven project.
2) Write a simple Java program to calculate word frequency in a list of strings.
3) Run the program.
Write a simple Java program to calculate word frequency in a list of strings :
WordFrequency.java is the main class. Its wordFrequency() method
calculates the frequency of each word in the given list of strings, as a percentage of all counted words.
WordFrequency.java
package com.devjavasource.java8;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordFrequency {

    public static void main(String[] args) {
        WordFrequency obj = new WordFrequency();
        final Map<String, Double> map = obj.wordFrequency(Arrays.asList(
                "Hotels in Oakland",
                "Resorts near Sanfrancisco",
                "Restorants near Bay area",
                "Software Jobs in Oakland",
                "Hotels in Oakland Airport",
                "Resorts near Abudabi Airport",
                "Restorants near Australia",
                "Software Jobs in USA"));
        System.out.println("JAVA 8 : WordFrequency Example");
        System.out.println("===============================");
        // printit is static, so it is referenced directly rather than through obj
        map.entrySet().stream().forEach(printit);
    }

    /**
     * Print a Map.Entry as "word: X score:Y".
     */
    private static Consumer<Map.Entry<String, Double>> printit = w -> System.out
            .printf("word: %s score:%.2f%n", w.getKey(), w.getValue());

    /**
     * Build a Map of all words found in the list of strings along with their
     * relative frequency in the list.
     *
     * @param strings the input strings, for example search queries
     * @return a map from each word to its percentage of all counted words,
     *         ordered from least to most frequent
     */
    public Map<String, Double> wordFrequency(List<String> strings) {
        Stream<String> streams = strings.stream();

        // map each word to its total number of occurrences
        Map<String, Long> wordCount = streams
                // split each string on spaces: Stream<String[]>
                .map(w -> w.split(" "))
                // flatten to Stream<String>
                .flatMap(Arrays::stream)
                // strip non-alphanumerics and uppercase all
                .map(trimit)
                // keep purely alphabetic words longer than two characters
                .filter(isalpha)
                // map word strings to count
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.counting()));

        // total number of words in list
        Long wordTotal = wordCount.values().stream()
                .reduce(0L, (a, b) -> a + b);

        // convert total occurrences to a percentage of total words
        Map<String, Double> wordFreq = wordCount
                .entrySet()
                .stream()
                .collect(Collectors.toMap(e -> e.getKey(),
                        e -> (100 * (e.getValue().doubleValue())) / wordTotal));

        // sort by ascending frequency
        List<Map.Entry<String, Double>> sorted = wordFreq.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .collect(Collectors.toList());

        // LinkedHashMap preserves the sorted insertion order
        Map<String, Double> sortedMap = new LinkedHashMap<String, Double>();
        sorted.forEach(e -> sortedMap.put(e.getKey(), e.getValue()));
        return sortedMap;
    }

    // remove every non-alphanumeric character, then uppercase
    private Function<String, String> trimit = s -> s.replaceAll("[^A-Za-z0-9]",
            "").toUpperCase();

    // true for words made only of letters and longer than two characters
    private Predicate<String> isalpha = s -> s.matches("[a-zA-Z]+")
            && s.length() > 2;
}
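wordFrequency() returns words from least to most frequent. Since the goal stated at the start is to find what users search for most, a descending "top N" view may be more convenient. The following is only a minimal sketch and not part of the original project: the class name TopWords, the method topWords() and the alphabetical tie-break are assumptions made for illustration. It applies the same cleanup and filter rules as trimit and isalpha, and returns raw counts instead of percentages.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original project.
class TopWords {

    // Returns the n most frequent words, most frequent first.
    // Ties are broken alphabetically so the result is deterministic.
    static Map<String, Long> topWords(List<String> strings, int n) {
        // count occurrences using the same rules as trimit and isalpha
        Map<String, Long> counts = strings.stream()
                .flatMap(s -> Arrays.stream(s.split(" ")))
                .map(w -> w.replaceAll("[^A-Za-z0-9]", "").toUpperCase())
                .filter(w -> w.matches("[a-zA-Z]+") && w.length() > 2)
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.counting()));

        // sort by descending count, then by word, and keep that order
        // by collecting straight into a LinkedHashMap
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList(
                "Hotels in Oakland",
                "Resorts near Sanfrancisco",
                "Restorants near Bay area",
                "Software Jobs in Oakland",
                "Hotels in Oakland Airport",
                "Resorts near Abudabi Airport",
                "Restorants near Australia",
                "Software Jobs in USA");
        // prints {NEAR=4, OAKLAND=3, AIRPORT=2}
        System.out.println(topWords(queries, 3));
    }
}
```

Collecting with the four-argument Collectors.toMap and LinkedHashMap::new avoids the intermediate list used in wordFrequency(); the merge function (a, b) -> a is never actually called here because the keys are already unique.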
Run the program :
Select WordFrequency, Run As -> Java Application.
Output :
JAVA 8 : WordFrequency Example
===============================
word: SANFRANCISCO score:4.00
word: AUSTRALIA score:4.00
word: USA score:4.00
word: AREA score:4.00
word: BAY score:4.00
word: ABUDABI score:4.00
word: HOTELS score:8.00
word: AIRPORT score:8.00
word: SOFTWARE score:8.00
word: JOBS score:8.00
word: RESTORANTS score:8.00
word: RESORTS score:8.00
word: OAKLAND score:12.00
word: NEAR score:16.00
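The scores can be checked by hand. After filtering, the eight input strings contain 25 counted words, so a word seen once scores 100 * 1 / 25 = 4.00 and NEAR, seen four times, scores 100 * 4 / 25 = 16.00. The word IN never appears because the isalpha predicate drops words of two characters or fewer. The sketch below reproduces both steps in isolation; the class ScoreCheck and its methods keptWords() and score() are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical demo class, written only to illustrate the filter and the score arithmetic.
class ScoreCheck {

    // Applies the same cleanup and filter as trimit and isalpha to one string.
    static List<String> keptWords(String query) {
        return Arrays.stream(query.split(" "))
                .map(w -> w.replaceAll("[^A-Za-z0-9]", "").toUpperCase())
                .filter(w -> w.matches("[a-zA-Z]+") && w.length() > 2)
                .collect(Collectors.toList());
    }

    // Percentage of all counted words that one word accounts for.
    static double score(long occurrences, long totalWords) {
        return 100.0 * occurrences / totalWords;
    }

    public static void main(String[] args) {
        // "in" has only two letters, so it is dropped: prints [HOTELS, OAKLAND]
        System.out.println(keptWords("Hotels in Oakland"));
        // NEAR occurs 4 times out of 25 counted words: prints 16.00
        System.out.printf(Locale.ROOT, "%.2f%n", score(4, 25));
    }
}
```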
You can download the complete project here.
*** Venkat – Happy learning ***