JVM options and log4j.properties settings for running Spark in local and cluster mode (spark, 2021. 7. 10. 20:15)
log4j.properties
Note the use of log4j.appender.file.File=${spark.yarn.app.container.log.dir}/${logfile.name}.log !!
# Set everything to be logged to the console
log4j.rootCategory=WARN, console

# define console appender
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# application log
log4j.logger.guru.learningjournal.spark.examples=INFO, console, file
log4j.additivity.guru.learningjournal.spark.examples=false

# define rolling file appender
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${spark.yarn.app.container.log.dir}/${logfile.name}.log
# define following in Java System properties
# -Dlog4j.configuration=file:log4j.properties
# -Dlogfile.name=hello-spark
# -Dspark.yarn.app.container.log.dir=app-logs
log4j.appender.file.ImmediateFlush=true
log4j.appender.file.Append=false
log4j.appender.file.MaxFileSize=500MB
log4j.appender.file.MaxBackupIndex=2
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Recommendations from Spark template
log4j.logger.org.apache.spark.repl.Main=WARN
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
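For log lines to reach these appenders, the application has to log through a Logger whose name falls under guru.learningjournal.spark.examples. A minimal sketch of such an entry point (only the package and class name come from the spark-submit command below; the body is illustrative):

package guru.learningjournal.spark.examples

import org.apache.log4j.Logger
import org.apache.spark.sql.SparkSession

object HelloSpark {
  // logger is named after this class, so it falls under the
  // guru.learningjournal.spark.examples logger configured above
  @transient lazy val logger: Logger = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HelloSpark")
      .getOrCreate()

    logger.info("spark session started")
    // ... read args(0), e.g. /user/root/data/sample.csv, and process it ...
    spark.stop()
    logger.info("spark session stopped")
  }
}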
When running locally, set the JVM options as follows.
-Dlog4j.configuration=file:log4j.properties -Dlogfile.name=hello-spark -Dspark.yarn.app.container.log.dir=logs
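If you launch through sbt instead of the IDE, one way to pass these system properties is a forked run (a build.sbt sketch; fork := true is required for javaOptions to reach the launched JVM):

fork := true

javaOptions ++= Seq(
  "-Dlog4j.configuration=file:log4j.properties",
  "-Dlogfile.name=hello-spark",
  "-Dspark.yarn.app.container.log.dir=logs"
)

In IntelliJ, the same option string goes into the run configuration's VM options field.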
Spark submit command for running in cluster mode
spark-submit --verbose \
  --class guru.learningjournal.spark.examples.HelloSpark \
  --files log4j.properties,spark.conf \
  --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlogfile.name=hello-spark-driver' \
  --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlogfile.name=hello-spark-executor' \
  --master yarn --deploy-mode cluster \
  hellospark_2.11-0.1.jar /user/root/data/sample.csv
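--files ships log4j.properties and spark.conf to the working directory of every YARN container, which is why -Dlog4j.configuration can reference the file by its bare name; ${spark.yarn.app.container.log.dir} is set by YARN inside each container. spark.conf is an application-level file here, not something Spark loads on its own (Spark only auto-loads spark-defaults.conf). A sketch of how the driver might read it, assuming a plain key=value properties format and a hypothetical ConfLoader helper:

package guru.learningjournal.spark.examples

import java.util.Properties
import scala.collection.JavaConverters._
import scala.io.Source
import org.apache.spark.SparkConf

object ConfLoader {
  // spark.conf was shipped via --files, so in cluster mode it sits in the
  // container's current working directory
  def load(fileName: String = "spark.conf"): SparkConf = {
    val conf = new SparkConf()
    val props = new Properties()
    val src = Source.fromFile(fileName)
    try props.load(src.bufferedReader()) finally src.close()
    props.asScala.foreach { case (k, v) => conf.set(k, v) }
    conf
  }
}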
Checking logs in cluster mode
# view driver log only
yarn logs -applicationId application_1585061770249_0004 -log_files hello-spark-driver.log

# view executor log only
yarn logs -applicationId application_1585061770249_0004 -log_files hello-spark-executor.log
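To see everything the application wrote (all containers, all log files), run the same command without the -log_files filter:

yarn logs -applicationId application_1585061770249_0004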
Reference: https://www.udemy.com/course/apache-spark-programming-in-scala