Apache Spark SQL: Calculating Rank and writing to file | Baby Names Part 2

Standard
Reading Time: < 1 minute

Lets do something interesting and useful with the data like figure out the rankings and write results to a file.

First lets learn how to filter and only get female names.

...
sparkSession = SparkSession.builder.appName("FilterFemaleNames").getOrCreate()

schema = StructType([
    StructField('name', StringType(), True),
    StructField('gender', StringType(), True),
    StructField('amount', IntegerType(), True)])

namesDF = sparkSession.read.schema(schema).csv(data_file)

femaleNamesDF = namesDF.filter(namesDF.gender == 'F')

femaleNamesDF.sort('name').show(truncate=False)
...

Pretty straight forward.

Calculate ranking for female names and write to file.

...
namesDF = sparkSession.read.schema(schema).csv(data_file)

femaleNamesDF = namesDF.filter(namesDF.gender == 'F')
nameSpec = Window.orderBy(functions.desc("amount"))

results = femaleNamesDF.withColumn(
    "rank", functions.dense_rank().over(nameSpec))

results.coalesce(1).write.format("csv").mode('overwrite').save(
    "results/female_rank.csv", header="true")
...

Calculate rank and partition by gender and write to file.

...
namesDF = sparkSession.read.schema(schema).csv(data_file)

nameSpec = Window.partitionBy("gender").orderBy(functions.desc("amount"))

results = namesDF.withColumn(
    "rank", functions.dense_rank().over(nameSpec))

results.coalesce(1).write.format("csv").mode('overwrite').save(
    "results/rank_names_partitioned.csv", header="true")
...

Filter rank by partition by name.

...
namesDF = sparkSession.read.schema(schema).csv(data_file)

nameSpec = Window.partitionBy("gender").orderBy(functions.desc("amount"))

results = namesDF.withColumn(
    "rank", functions.dense_rank().over(nameSpec))

results.filter(results.name == "Zyan").show()
...

Results:

+----+------+------+----+
|name|gender|amount|rank|
+----+------+------+----+
|Zyan|     F|    11| 942|
|Zyan|     M|    87| 832|
+----+------+------+----+

Not to many kids named Zyan.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.