All posts
-
Using a Semaphore to limit execution to at most maxConcurrent Futures (scala/basic, 2025. 3. 13. 10:28)
My own version:

package com.naver.search.web

import java.util.concurrent.Semaphore
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.{Await, Future}

object SemaphoreFutureTest {
  def semFuture[T](f: => T, n: Int): Future[T] = {
    val sem = new Semaphore(n)
    Future {
      try {
        sem.acquire()
        f
      } finally {
        sem.release()
      }
    }
  }

  def f(x: Int): Int = {
    Thread.sleep(1000)
    pr..
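The preview is cut off above. For reference, here is a minimal sketch of the same idea, not taken from the post: it differs in that a single Semaphore is shared across all calls (a semaphore created per call never limits anything across Futures), and the names maxConcurrent and slowTask are illustrative.

import java.util.concurrent.Semaphore
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}

object SemaphoreFutureSketch {
  // One shared semaphore: at most maxConcurrent permits are held at any time.
  val maxConcurrent = 3
  val sem = new Semaphore(maxConcurrent)

  // Wrap a computation so it waits for a permit before running.
  def semFuture[T](f: => T): Future[T] = Future {
    sem.acquire()
    try f
    finally sem.release()
  }

  def slowTask(x: Int): Int = {
    Thread.sleep(1000)
    println(s"done $x on ${Thread.currentThread().getName}")
    x * 2
  }

  def main(args: Array[String]): Unit = {
    val futures = (1 to 10).map(i => semFuture(slowTask(i)))
    println(Await.result(Future.sequence(futures), Duration.Inf))
  }
}

Because acquire() blocks an executor thread while waiting, a dedicated thread pool (or batching the work instead of a semaphore) may be preferable on the default global ExecutionContext.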
-
Session Kill (graph database/nebula graph, 2025. 3. 10. 20:45)
Sessions can be checked and killed with the nGQL below.

SHOW SESSIONS;
KILL SESSION <SessionId>;

However, when there are too many connected sessions to kill one by one, you can write code to do it instead.

import com.vesoft.nebula.client.graph.SessionPool
import com.vesoft.nebula.client.graph.data.ResultSet
import scala.collection.JavaConverters._

object KillAllSessions {
  def main(args: Array[String]): Unit = {
    val sessionPool: SessionPool = ...
    val resultSet: ResultSet = se..
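The preview is truncated, so here is a rough sketch of how the rest might look with the nebula-java client. The connection settings are placeholders, and it assumes the SessionPoolConfig constructor that takes the graphd addresses, space name, user, and password, plus a SessionId column in the SHOW SESSIONS output.

import com.vesoft.nebula.client.graph.{SessionPool, SessionPoolConfig}
import com.vesoft.nebula.client.graph.data.{HostAddress, ResultSet}
import scala.collection.JavaConverters._

object KillAllSessions {
  def main(args: Array[String]): Unit = {
    // Placeholder connection settings; replace with the real graphd address, space, and credentials.
    val config = new SessionPoolConfig(
      List(new HostAddress("graphd.example.com", 9669)).asJava,
      "sites", "root", "nebula")
    val sessionPool = new SessionPool(config)
    if (!sessionPool.init()) sys.error("failed to init session pool")

    try {
      // SHOW SESSIONS returns one row per session, including a SessionId column.
      val sessions: ResultSet = sessionPool.execute("SHOW SESSIONS;")
      val sessionIds = sessions.colValues("SessionId").asScala.map(_.asLong())
      sessionIds.foreach { id =>
        sessionPool.execute(s"KILL SESSION $id;")
        println(s"killed session $id")
      }
    } finally {
      sessionPool.close()
    }
  }
}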
-
nGQL Basic (graph database/nebula graph, 2025. 3. 10. 18:16)
https://docs.nebula-graph.io/3.8.0/2.quick-start/4.nebula-graph-crud/

Creating a space

CREATE SPACE `sites` (partition_num = 1000, replica_factor = 3, vid_type = FIXED_STRING(64)) COMMENT = "site relations";

Creating a TAG and an EDGE

CREATE TAG site(url string);
CREATE EDGE follow(weight int);

Inserting VERTEX and EDGE data

INSERT VERTEX site(url) VALUES "64f3134d1e65568291bac977cf60019a2e1d2e1e537231a88f4ab90c6ab0b9c9":("http://ho..
-
Expressing pandas rolling as a pyspark window function (pyspark, 2025. 3. 5. 21:39)
Pandas rolling

import pandas as pd

df = pd.read_csv('./UDEMY_TSA_FINAL/Data/starbucks.csv', index_col='Date', parse_dates=True)
df.rolling(window=7).mean()

Pyspark window function

import pyspark
from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as F

spark = SparkSession.builder.appName('spark_test').master("local[*]").getOrCreate()
df = spark.read.csv('./UDEMY_TSA_FINAL/Data/star..
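The preview stops before the window-function part. As a sketch of the same idea, here it is in Spark's Scala API rather than PySpark (the post itself uses PySpark): rolling(window=7).mean() corresponds to an average over the current row and the 6 preceding rows, i.e. rowsBetween(-6, 0). The Date and Close column names are assumptions about starbucks.csv, and unlike pandas (which yields NaN until 7 rows are available) this computes partial averages for the first rows.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object RollingMeanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("spark_test").master("local[*]").getOrCreate()

    // Assumed layout of starbucks.csv: a Date column plus a numeric Close column.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("./UDEMY_TSA_FINAL/Data/starbucks.csv")

    // Average over the current row and the 6 rows before it, ordered by date.
    val w = Window.orderBy("Date").rowsBetween(-6, 0)
    df.withColumn("close_rolling_7", avg(col("Close")).over(w)).show(10)

    spark.stop()
  }
}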
-
kittens example (scala/cats2, 2025. 3. 5. 20:56)
https://github.com/typelevel/kittens

"org.typelevel" %% "kittens" % "3.4.0"

Example 1

import cats.Show
import cats.derived.semiauto
import cats.implicits.*

case class Name(value: String)
case class Person(name: Name, age: Int)

object Person {
  given Show[Person] = semiauto.show
}

object Test {
  def main(args: Array[String]): Unit = {
    val person = Person(Name("KJM"), 20)
    println(person.show) // Per..
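The output comment is cut off above. As an additional sketch that is not from the post, kittens also supports Scala 3 derives clauses (via import cats.derived.*), which keeps the instances next to the case classes; giving the nested Name its own derives clause mirrors the nested-case-class examples in the kittens README and is an assumption here.

import cats.{Eq, Show}
import cats.derived.*
import cats.implicits.*

// Each case class derives its own instances, so the derivation for Person
// can find Show[Name] and Eq[Name].
case class Name(value: String) derives Show, Eq
case class Person(name: Name, age: Int) derives Show, Eq

object DerivesTest {
  def main(args: Array[String]): Unit = {
    val a = Person(Name("KJM"), 20)
    val b = Person(Name("KJM"), 21)
    println(a.show)               // e.g. Person(name = Name(value = KJM), age = 20)
    println(Eq[Person].eqv(a, b)) // false
  }
}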
-
Pandas dataframe <-> Spark dataframe conversion (pyspark, 2025. 1. 10. 11:55)
Spark dataframe -> Pandas dataframe

You can convert with the toPandas() method.

spark_df = spark.read.parquet("hdfs://...").limit(10)
pandas_df = spark_df.toPandas()

Pandas dataframe -> Spark dataframe

pandas_df = pd.read_csv("data.csv", header=0)
# Depending on the spark and pandas versions, the line below may be needed.
# https://stackoverflow.com/a/76404841/5137193
pd.DataFrame.iteritems = pd.DataFrame.items
spark_df = spark.createDataFrame(pandas_df..
-
Practicing with the Recommendations dataset (graph database/neo4j, 2025. 1. 9. 12:27)
Docs
graphdb concepts
cypher style guide

Setup
Go to https://sandbox.neo4j.com/ and log in
Start "Getting started with Neo4j Browser"
Select the Movie Recommendations dataset

Practice data

Query practice

CALL db.schema.visualization()

MATCH (m:Movie)
RETURN m
LIMIT 1

MATCH (g:Genre)
RETURN g
LIMIT 1

MATCH (g:Genre)
RETURN g.name

// Print the titles of Movies whose Genre name is Comedy
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
WHERE g.name = 'Comedy'
RETURN m.title..
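The post runs these queries in Neo4j Browser. For comparison, a hedged sketch of running the same Comedy query from Scala with the official Neo4j Java driver; the bolt URI and credentials are placeholders to be copied from the sandbox's connection details.

import org.neo4j.driver.{AuthTokens, GraphDatabase}
import scala.collection.JavaConverters._

object ComedyTitles {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details for the sandbox instance.
    val driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "password"))
    val session = driver.session()
    try {
      val result = session.run(
        """MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
          |WHERE g.name = 'Comedy'
          |RETURN m.title AS title
          |LIMIT 10""".stripMargin)
      // Result iterates over Records; print the title column of each.
      result.asScala.foreach(record => println(record.get("title").asString()))
    } finally {
      session.close()
      driver.close()
    }
  }
}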
-
Keeping only N records per group after groupBy (spark, 2025. 1. 6. 21:19)
Suppose we want to keep at most N=300 records per group after a groupBy.

Method 1: using groupByKey + mapGroups

val urlLineDs: Dataset[(String, Seq[String])] = spark.read.text(path)
  .toDF("url", "item")
  .groupByKey(row => row.getAs[String]("url"))
  .mapGroups { case (url, rowIter: Iterator[Row]) =>
    val itemList: Seq[String] = rowIter.take(300).map(_.getAs[String]("item")).toList
    (url, itemList)
  }

Method 2: using a window function

val window = Windo..
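Method 2 is cut off in the preview. A sketch of how the window-function variant is commonly written (column names follow the example above; the exact code in the post may differ): number the rows within each url partition with row_number and keep only the first N.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Keep at most n rows per url. Assumes a DataFrame with "url" and "item" columns.
def keepTopNPerGroup(df: DataFrame, n: Int = 300): DataFrame = {
  val window = Window.partitionBy("url").orderBy("item")
  df.withColumn("rn", row_number().over(window))
    .filter(col("rn") <= n)
    .drop("rn")
}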