  • JVM options and log4j.properties settings for running Spark in local and cluster mode
    spark 2021. 7. 10. 20:15

    log4j.properties

    Note the use of log4j.appender.file.File=${spark.yarn.app.container.log.dir}/${logfile.name}.log !!

    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console, file
    
    # define console appender
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.out
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}: %m%n
    
    #application log
    #log4j.logger.guru.learningjournal.spark.examples=INFO, console, file
    #log4j.additivity.guru.learningjournal.spark.examples=false
    
    #define rolling file appender
    log4j.appender.file=org.apache.log4j.RollingFileAppender
    log4j.appender.file.File=${spark.yarn.app.container.log.dir}/${logfile.name}.log
    log4j.appender.file.ImmediateFlush=true
    log4j.appender.file.Append=false
    log4j.appender.file.MaxFileSize=500MB
    log4j.appender.file.MaxBackupIndex=2
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}: %m%n
    
    
    # Recommendations from Spark template
    log4j.logger.org.apache.spark.repl.Main=WARN
    log4j.logger.org.spark_project.jetty=WARN
    log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
    log4j.logger.org.apache.parquet=ERROR
    log4j.logger.parquet=ERROR
    log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
    log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

     

    When running locally, set the JVM options as follows:

    -Dlog4j.configuration=file:log4j.properties
    -Dlogfile.name=hello-spark
    -Dspark.yarn.app.container.log.dir=logs
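
    These options can go on the IDE's VM-options line or onto spark-submit. A sketch of the spark-submit route (local master and data path are assumptions, not from the post):

    ```shell
    # Local-mode run: the same -D options are passed via --driver-java-options.
    # spark.yarn.app.container.log.dir has no value outside YARN, so it is
    # pointed at a local "logs" directory for the file appender.
    spark-submit \
      --master 'local[*]' \
      --class guru.learningjournal.spark.examples.HelloSpark \
      --driver-java-options '-Dlog4j.configuration=file:log4j.properties -Dlogfile.name=hello-spark -Dspark.yarn.app.container.log.dir=logs' \
      hellospark_2.11-0.1.jar data/sample.csv
    ```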

     

    Spark Submit for running in cluster mode

    In yarn cluster mode the driver runs on a cluster node, so the log4j.properties file must be shipped explicitly.

    The file must be distributed with --files log4j.properties

    spark-submit \
    --verbose \
    --class guru.learningjournal.spark.examples.HelloSpark \
    --files log4j.properties,spark.conf \
    --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlogfile.name=hello-spark-driver' \
    --conf 'spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties -Dlogfile.name=hello-spark-executor' \
    --master yarn \
    --deploy-mode cluster \
    hellospark_2.11-0.1.jar /user/root/data/sample.csv

     

    Checking logs in cluster mode

    # view the driver log only
    yarn logs -applicationId application_1585061770249_0004 -log_files hello-spark-driver.log
    
    # view the executor log only
    yarn logs -applicationId application_1585061770249_0004 -log_files hello-spark-executor.log
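
    Dropping the -log_files filter returns everything (standard yarn behavior, not shown in the post):

    ```shell
    # Dump stdout, stderr and the custom *.log files for every container
    yarn logs -applicationId application_1585061770249_0004
    ```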


    Reference: https://www.udemy.com/course/apache-spark-programming-in-scala
