In this article, I will show how to calculate the word frequency
of a given list of strings using Java 8. This is helpful when analyzing
large data sets: for example, we can find the words that most users
type when searching.
Tools Used :
1) Eclipse Luna 4.4.1
2) Maven 3.3.3
3) JDK 1.8
Simple steps to follow are :
1) Create a simple Maven project.
2) Write a simple Java program to calculate word frequency in a list of strings.
3) Run the program.
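Write a simple Java program to calculate word frequency in a list of strings :
WordFrequency.java is the main class. Its wordFrequency() method
calculates the frequency of each word in the given list of strings.
Each word is stripped of non-alphanumeric characters and upper-cased,
words of fewer than three letters are dropped, and each remaining word
is scored as a percentage of all counted words, sorted from least to
most frequent.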
WordFrequency.java
package com.devjavasource.java8;

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordFrequency {

    public static void main(String[] args) {
        WordFrequency obj = new WordFrequency();
        final Map<String, Double> map = obj.wordFrequency(Arrays.asList(
                "Hotels in Oakland",
                "Resorts near Sanfrancisco",
                "Restorants near Bay area",
                "Software Jobs in Oakland",
                "Hotels in Oakland Airport",
                "Resorts near Abudabi Airport",
                "Restorants near Australia",
                "Software Jobs in USA"));

        System.out.println("JAVA 8 : WordFrequency Example");
        System.out.println("===============================");
        map.entrySet().stream().forEach(printit);
    }

    /**
     * Print a Map.Entry.
     */
    private static Consumer<Map.Entry<String, Double>> printit = w -> System.out
            .printf("word: %s score:%.2f%n", w.getKey(), w.getValue());

    /**
     * Build a Map of all words found in the list of strings along with their
     * relative frequency in the list.
     *
     * @param strings the strings to analyze
     * @return each word mapped to its percentage of all counted words,
     *         in ascending order of frequency
     */
    public Map<String, Double> wordFrequency(List<String> strings) {
        Stream<String> streams = strings.stream();

        // map each word to its total number of occurrences
        Map<String, Long> wordCount = streams
                .map(w -> w.split(" "))   // returns Stream<String[]>
                .flatMap(Arrays::stream)  // flatten to Stream<String>
                .map(trimit)              // strip non-alphanumerics and uppercase all
                .filter(isalpha)          // keep alphabetic words longer than two letters
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.counting())); // map word strings to count

        // total number of words in list
        Long wordTotal = wordCount.values().stream()
                .reduce(0L, (a, b) -> a + b);

        // convert total occurrences to a percentage of total words
        Map<String, Double> wordFreq = wordCount.entrySet().stream()
                .collect(Collectors.toMap(e -> e.getKey(),
                        e -> (100 * (e.getValue().doubleValue())) / wordTotal));

        // sort the entries by ascending frequency
        List<Map.Entry<String, Double>> sorted = wordFreq.entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .collect(Collectors.toList());

        // a LinkedHashMap keeps the sorted insertion order
        Map<String, Double> sortedMap = new LinkedHashMap<String, Double>();
        sorted.forEach(e -> sortedMap.put(e.getKey(), e.getValue()));
        return sortedMap;
    }

    private Function<String, String> trimit =
            s -> s.replaceAll("[^A-Za-z0-9]", "").toUpperCase();

    private Predicate<String> isalpha =
            s -> s.matches("[a-zA-Z]+") && s.length() > 2;
}
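Since the goal is usually to find the most frequent words, it may be handy to sort in descending order and keep only the top few. The following is a minimal sketch of that idea, not part of the original project: the class TopWords, the method topWords() and its parameter n are hypothetical names chosen for illustration. It reuses the same cleaning rules as WordFrequency, splits on any whitespace instead of a single space, breaks ties alphabetically so the order is repeatable, and collects directly into a LinkedHashMap.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TopWords {

    // Hypothetical helper: the n most frequent words, highest count first.
    static Map<String, Long> topWords(List<String> strings, int n) {
        // count occurrences using the same cleaning rules as WordFrequency
        Map<String, Long> counts = strings.stream()
                .flatMap(s -> Arrays.stream(s.split("\\s+")))              // split on whitespace
                .map(w -> w.replaceAll("[^A-Za-z0-9]", "").toUpperCase())  // strip and uppercase
                .filter(w -> w.matches("[A-Z]+") && w.length() > 2)        // alphabetic, 3+ letters
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // sort by count descending, then by word so that ties are deterministic
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new)); // LinkedHashMap keeps the sorted order
    }

    public static void main(String[] args) {
        Map<String, Long> top = topWords(Arrays.asList(
                "Hotels in Oakland",
                "Hotels in Oakland Airport",
                "Software Jobs in Oakland"), 2);
        System.out.println(top); // prints {OAKLAND=3, HOTELS=2}
    }
}
```

The four-argument Collectors.toMap() is used here because the two-argument form does not promise any particular Map implementation, so the sorted order could be lost.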
Run the program :
Select WordFrequency, then choose Run As -> Java Application.
Output :
JAVA 8 : WordFrequency Example
===============================
word: SANFRANCISCO score:4.00
word: AUSTRALIA score:4.00
word: USA score:4.00
word: AREA score:4.00
word: BAY score:4.00
word: ABUDABI score:4.00
word: HOTELS score:8.00
word: AIRPORT score:8.00
word: SOFTWARE score:8.00
word: JOBS score:8.00
word: RESTORANTS score:8.00
word: RESORTS score:8.00
word: OAKLAND score:12.00
word: NEAR score:16.00
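To see where these scores come from: the eight input strings contain 25 words that pass the filter (the word "in" is dropped because it has only two letters), so a word that occurs once scores 100 * 1 / 25 = 4.00 and NEAR, which occurs four times, scores 16.00. All the scores add up to 100. The small sketch below repeats that arithmetic. ScoreCheck and score() are hypothetical names used only for this illustration, and the counts are read off the sample input above.

```java
public class ScoreCheck {

    // Hypothetical helper mirroring the percentage formula used in wordFrequency().
    static double score(long count, long total) {
        return (100 * (double) count) / total;
    }

    public static void main(String[] args) {
        long total = 25; // words left after the filter in the sample input

        System.out.printf("NEAR: %.2f%n", score(4, total));    // 4 occurrences
        System.out.printf("OAKLAND: %.2f%n", score(3, total)); // 3 occurrences
        System.out.printf("HOTELS: %.2f%n", score(2, total));  // 2 occurrences
        System.out.printf("USA: %.2f%n", score(1, total));     // 1 occurrence
    }
}
```

Note that words with equal scores are listed in whatever order the intermediate map happens to return them, so the order within the 4.00 and 8.00 groups may differ from run to run or between Java versions.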
You can download the complete project here.
*** Venkat – Happy learning ***