-
DataSet WordCount 예제flink 2021. 5. 15. 15:08
문제
input 파일(wc.txt) 를 읽어 'N' 으로 시작하는 단어만 word counting 해 output 파일(wc.out) 에 저장한다.
Input(wc.txt)
Noman Joyce Noman Isidore Nipun Rebekah Nipun
Expected Output(wc.out)
Nipun 2 Noman 2
Code
import org.apache.flink.api.java.utils.ParameterTool import org.apache.flink.api.scala._ object WordCountExample { case class WordCount(word: String, count: Int) def main(args: Array[String]): Unit = { val env = ExecutionEnvironment.getExecutionEnvironment env.setParallelism(1) val params = ParameterTool.fromArgs(args) env.getConfig.setGlobalJobParameters(params) val text = env.readTextFile(params.get("input")) val filtered = text.filter(_.startsWith("N")) val tokenized = filtered.map(x => WordCount(x, 1)) val counts = tokenized.groupBy("word").sum("count") if (params.has("output")) { counts.writeAsCsv(params.get("output"), "\n", " ") env.execute("WordCount Example") } } }
실행
bin/flink run wc.jar --input file:///work/wc.txt --output file:///work/wc.out
참고: https://www.udemy.com/course/apache-flink-a-real-time-hands-on-course-on-flink/
'flink' 카테고리의 다른 글
Iterate 예제 (0) 2021.05.17 Stream Split 예제 (0) 2021.05.17 Stream Reduce, Min, MinBy, Max, MaxBy 예제 (0) 2021.05.16 Stream WordCount 예제 (0) 2021.05.15 DataSet Join 예제 (0) 2021.05.15