  • Extracting a DataFrame schema from a dataclass
    pyspark 2024. 5. 29. 17:55

    AI-generated code

    import dataclasses
    from dataclasses import fields
    
    from pyspark.sql.types import StructType, StructField, IntegerType, StringType
    
    # Defined before it is used when building the schema below
    def type_to_spark_type(pytype):
        # field.type is the annotated class itself (e.g. int), not an instance,
        # so compare by identity instead of calling isinstance()
        if pytype is int:
            return IntegerType()
        elif pytype is str:
            return StringType()
        # Add more conditions for other types as needed
        else:
            raise ValueError(f"Unsupported type: {pytype}")
    
    # Assuming you have a dataclass named User
    @dataclasses.dataclass
    class User:
        id: int
        name: str
        age: int
    
    # Get the fields of the User class
    user_fields = fields(User)
    
    # Create a StructField for each field and wrap them in a StructType
    user_schema = StructType([
        StructField(field.name, type_to_spark_type(field.type), True)
        for field in user_fields
    ])
    
    print(user_schema)
    Output (shown schematically; the exact repr depends on the PySpark version):

    StructType(List(
        StructField(id, IntegerType, True),
        StructField(name, StringType, True),
        StructField(age, IntegerType, True)
    ))
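
    As a variation on the code above, here is a minimal sketch that builds a DDL-style schema string (for example "id INT, name STRING") instead of a StructType. PySpark accepts such strings wherever a schema is expected, such as the schema argument of spark.createDataFrame. The helper name dataclass_to_ddl and the _DDL_TYPES mapping are hypothetical and only cover a few basic types. Optional[T] is unwrapped to T under the assumption that every column is nullable, as in the original code.

```python
import dataclasses
import typing

# Hypothetical mapping from Python annotations to Spark SQL type names.
# int maps to INT to match the IntegerType used in the post; use BIGINT
# if the values may exceed 32 bits.
_DDL_TYPES = {int: "INT", str: "STRING", float: "DOUBLE", bool: "BOOLEAN"}


def dataclass_to_ddl(cls):
    """Build a DDL schema string such as 'id INT, name STRING'."""
    # get_type_hints also resolves annotations that are stored as strings
    hints = typing.get_type_hints(cls)
    columns = []
    for field in dataclasses.fields(cls):
        pytype = hints[field.name]
        # Unwrap typing.Optional[T] (i.e. Union[T, None]) to plain T
        if typing.get_origin(pytype) is typing.Union:
            args = [a for a in typing.get_args(pytype) if a is not type(None)]
            if len(args) == 1:
                pytype = args[0]
        if pytype not in _DDL_TYPES:
            raise ValueError(f"Unsupported type: {pytype}")
        columns.append(f"{field.name} {_DDL_TYPES[pytype]}")
    return ", ".join(columns)


@dataclasses.dataclass
class User:
    id: int
    name: str
    age: typing.Optional[int] = None


ddl = dataclass_to_ddl(User)
print(ddl)  # id INT, name STRING, age INT
```

    The resulting string can then be passed as a schema, e.g. spark.createDataFrame(rows, schema=ddl), assuming an active SparkSession named spark.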
