  • JVM options and log4j.properties settings for running Spark in local and cluster mode
    spark 2021. 7. 10. 20:15

    log4j.properties

    Note the log4j.appender.file.File=${spark.yarn.app.container.log.dir}/${logfile.name}.log entry: log4j resolves the ${...} placeholders from Java system properties at startup.

    # Set everything to be logged to the console
    log4j.rootCategory=WARN, console
    
    # define console appender
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.out
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    #application log
    log4j.logger.guru.learningjournal.spark.examples=INFO, console, file
    log4j.additivity.guru.learningjournal.spark.examples=false
    
    #define rolling file appender
    log4j.appender.file=org.apache.log4j.RollingFileAppender
    log4j.appender.file.File=${spark.yarn.app.container.log.dir}/${logfile.name}.log
    # define the following as Java system properties:
    # -Dlog4j.configuration=file:log4j.properties
    # -Dlogfile.name=hello-spark
    # -Dspark.yarn.app.container.log.dir=app-logs
    log4j.appender.file.ImmediateFlush=true
    log4j.appender.file.Append=false
    log4j.appender.file.MaxFileSize=500MB
    log4j.appender.file.MaxBackupIndex=2
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    
    # Recommendations from Spark template
    log4j.logger.org.apache.spark.repl.Main=WARN
    log4j.logger.org.spark_project.jetty=WARN
    log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
    log4j.logger.org.apache.parquet=ERROR
    log4j.logger.parquet=ERROR
    log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
    log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

     

    For local execution, set the JVM options as follows.

    -Dlog4j.configuration=file:log4j.properties
    -Dlogfile.name=hello-spark
    -Dspark.yarn.app.container.log.dir=logs
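    When launching with spark-submit in local mode rather than from an IDE, the same three properties can be passed through the --driver-java-options flag. A sketch, reusing the jar, class, and input path from the cluster example in this post; the local[3] thread count is an illustrative assumption:

    ```shell
    # pass the log4j system properties to the local driver JVM
    spark-submit \
    --class guru.learningjournal.spark.examples.HelloSpark \
    --master 'local[3]' \
    --driver-java-options "-Dlog4j.configuration=file:log4j.properties -Dlogfile.name=hello-spark -Dspark.yarn.app.container.log.dir=logs" \
    hellospark_2.11-0.1.jar /user/root/data/sample.csv
    ```

    In local mode there is no YARN container directory, so spark.yarn.app.container.log.dir must be supplied explicitly; here the log file would land under logs/hello-spark.log.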

     

    spark-submit command for running in cluster mode

    spark-submit \
    --verbose \
    --class guru.learningjournal.spark.examples.HelloSpark \
    --files log4j.properties,spark.conf \
    --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlogfile.name=hello-spark-driver' \
    --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlogfile.name=hello-spark-executor' \
    --master yarn \
    --deploy-mode cluster \
    hellospark_2.11-0.1.jar /user/root/data/sample.csv

     

    Checking logs in cluster mode

    # fetch the driver log only
    yarn logs -applicationId application_1585061770249_0004 -log_files hello-spark-driver.log
    
    # fetch the executor log only
    yarn logs -applicationId application_1585061770249_0004 -log_files hello-spark-executor.log


    Reference: https://www.udemy.com/course/apache-spark-programming-in-scala
