Getting Started

Scautable: one-line CSV import and dataframe utilities based on Scala's NamedTuple.
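Named tuples are a standard language feature from Scala 3.7 onward. Each element has a name and a type, so a row's fields can be read like case class members. The snippet below uses only the standard library to show the kind of row type scautable builds on. The field names and values are made up for illustration.

```scala
// Requires Scala 3.7+, where named tuples are a standard language feature.
// Hypothetical row type: field names and values are illustrative only.
type Cereal = (name: String, sugars: Int, sodium: Int)

val rows: List[Cereal] = List(
  (name = "Cereal A", sugars = 6, sodium = 180),
  (name = "Cereal B", sugars = 1, sodium = 0)
)

// Fields are accessed by name, and their types are checked at compile time.
val names: List[String] = rows.map(_.name)
val totalSugar: Int = rows.map(_.sugars).sum
```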

Scala CLI

//> using dep io.github.quafadas::scautable::0.0.28

Here's a screencap of a tiny, self-contained example.

Example

Quickstart, using the cereals dataset (source: Kaggle):

//> using scala 3.7.2
//> using dep io.github.quafadas::scautable::0.0.28
//> using resourceDir resources

import io.github.quafadas.table.*

@main def run(): Unit =
  val df = CSV.resource("cereals.csv", TypeInferrer.FromAllRows)

  val data = LazyList.from(
    df
      .addColumn["double_the_sugar", Double](_.sugars * 2)
      .dropColumn["fiber"] // no one cares about the healthy bit
      .mapColumn["name", String](_.toUpperCase)
      .renameColumn["mfr", "manufacturer"]
  )

  data.take(20).ptbln

  println("Hot cereals: ")
  data.collect{
    case row if row.`type` == "H" =>
      (name = row.name, made_by = row.manufacturer, sugar = row.sugars, salt = row.sodium)
  }.ptbln
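To make the column operations above concrete, the sketch below writes out roughly what they amount to by hand, using standard-library named tuples. It is an illustration, not how scautable implements addColumn, dropColumn, mapColumn or renameColumn. The two sample rows are invented, and only a few of the cereal columns are modelled.

```scala
// Requires Scala 3.7+. Invented sample rows with a subset of the cereal columns.
type Raw = (name: String, mfr: String, sugars: Int, fiber: Double)

val raw: List[Raw] = List(
  (name = "Cereal A", mfr = "K", sugars = 6, fiber = 1.0),
  (name = "Cereal B", mfr = "Q", sugars = 1, fiber = 2.5)
)

// Build one new named tuple per row:
// upper-case name, rename mfr -> manufacturer, drop fiber, add double_the_sugar.
val transformed = raw.map { r =>
  (
    name = r.name.toUpperCase,
    manufacturer = r.mfr,
    sugars = r.sugars,
    double_the_sugar = r.sugars * 2.0
  )
}
```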

Mill

mvn"io.github.quafadas::scautable::0.0.28"

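Then put the same code as above in src/Example.scala and run it. The //> using lines are Scala CLI directives; to the compiler they are ordinary comments, so they can be dropped.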
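A minimal build.mill for that layout might look like the sketch below. It assumes Mill 1.x, where dependencies are declared with mvn"..." inside mvnDeps, and a single root module. The root-module syntax has changed between Mill releases, so check the Mill documentation for your version. With a root module, Mill looks for sources in src/ and resources (such as cereals.csv) in resources/, which takes the place of the Scala CLI resourceDir directive.

```scala
package build
import mill.*, scalalib.*

// Assumes Mill 1.x: a single root module, so sources live in src/
// and classpath resources (e.g. cereals.csv) in resources/.
object `package` extends ScalaModule {
  def scalaVersion = "3.7.2"
  def mvnDeps = Seq(mvn"io.github.quafadas::scautable::0.0.28")
}
```

With this in place, ./mill run should compile and run the example.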

Goals

5-second CSV quickstart

import io.github.quafadas.table.*
val data = CSV.resource("titanic.csv", TypeInferrer.FromAllRows)

// This doesn't display well on a website because of the ANSI escape codes in its output...
data.toSeq.describe
// But these lines should be all you need to get an overview of the data.
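TypeInferrer.FromAllRows suggests that each column's type is chosen by looking at every row rather than a sample. One way to do that, sketched below with the standard library only, is to try the narrowest type first and widen whenever any value fails to parse. The widening order (Int, then Double, then Boolean, then String) and the ColType enum are assumptions for illustration; scautable's actual rules may differ. A real inferrer also has to decide how to treat blank cells. This sketch simply counts them as unparseable.

```scala
// Hypothetical inference rules, for illustration only.
enum ColType:
  case IntCol, DoubleCol, BooleanCol, StringCol

// A column gets a type only if every value in it parses as that type.
def inferFromAllRows(values: Seq[String]): ColType =
  if values.forall(_.toIntOption.isDefined) then ColType.IntCol
  else if values.forall(_.toDoubleOption.isDefined) then ColType.DoubleCol
  else if values.forall(_.toBooleanOption.isDefined) then ColType.BooleanCol
  else ColType.StringCol
```

Because every row is checked, a single "2.5" among integers is enough to make the whole column a Double.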



// For output that renders cleanly on a website, compute the summary tables directly
val (numerics, categoricals) = LazyList.from(
  CSV.resource("titanic.csv", TypeInferrer.FromAllRows)
).summary

Round the mean and quartile columns to two decimal places and turn off ANSI styling (fansi = false) so the tables render as plain text:

println(
    numerics
      .mapColumn["mean", String](s => "%.2f".format(s))
      .mapColumn["0.25", String](s => "%.2f".format(s))
      .mapColumn["0.75", String](s => "%.2f".format(s))
      .consoleFormatNt(fansi = false)
)
// | |       name|   typ|  mean| min|  0.25|            median|  0.75|     max|
// +-+-----------+------+------+----+------+------------------+------+--------+
// |0|PassengerId|   Int|446.00| 1.0|223.25|             446.0|668.75|   891.0|
// |1|     Pclass|   Int|  2.31| 1.0|  1.77|               3.0|  3.00|     3.0|
// |2|        Age|Double| 29.70|0.42| 20.37|           28.2952| 38.35|    80.0|
// |3|      SibSp|   Int|  0.52| 0.0|  0.00|               0.0|  1.00|     8.0|
// |4|      Parch|   Int|  0.38| 0.0|  0.00|               0.0|  0.09|     6.0|
// |5|       Fare|Double| 32.20| 0.0|  7.91|14.302549127640036| 31.04|512.3292|
// +-+-----------+------+------+----+------+------------------+------+--------+
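The numeric table reports a mean, a minimum and maximum, and three quantiles per column. The sketch below computes the same statistics using one common exact definition of a quantile: linear interpolation between the two nearest order statistics. scautable may use a different, possibly approximate, estimator, so this is for intuition rather than for reproducing the numbers above. The numericSummary helper is hypothetical.

```scala
// Exact quantile by linear interpolation between order statistics, q in [0, 1].
def quantile(sorted: IndexedSeq[Double], q: Double): Double =
  require(sorted.nonEmpty && q >= 0.0 && q <= 1.0)
  val pos = q * (sorted.length - 1)
  val lo = math.floor(pos).toInt
  val hi = math.ceil(pos).toInt
  sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))

// Hypothetical helper returning one summary row as a named tuple.
def numericSummary(values: Seq[Double]) =
  val s = values.sorted.toIndexedSeq
  (
    mean = s.sum / s.length,
    min = s.head,
    q25 = quantile(s, 0.25),
    median = quantile(s, 0.5),
    q75 = quantile(s, 0.75),
    max = s.last
  )
```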

println(
  categoricals
    .mapColumn["sample", String](_.take(20))
    .consoleFormatNt(fansi = false)
)
// | |    name|uniqueEntries|            mostFrequent|frequency|              sample|
// +-+--------+-------------+------------------------+---------+--------------------+
// |0|Survived|            2|                   false|      549|         false, true|
// |1|    Name|          891|Young, Miss. Marie Grice|        1|Wick, Miss. Mary Nat|
// |2|     Sex|            2|                    male|      577|        male, female|
// |3|  Ticket|          681|                  347082|        7|11967, 372622, 13568|
// |4|   Cabin|          147|                 B96 B98|        4|C126, C54, D28, A23,|
// |5|Embarked|            3|                       S|      644|             S, Q, C|
// +-+--------+-------------+------------------------+---------+--------------------+
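For non-numeric columns, the table reports the number of distinct values, the most frequent value with its count, and a short sample of values. The sketch below computes those statistics for a single column using the standard library. The categoricalSummary helper is hypothetical. Ties for most frequent are broken arbitrarily, and the sample is simply the first few distinct values in order of appearance. Both choices are assumptions, and scautable's may differ.

```scala
// Hypothetical helper: distinct count, mode with its frequency, and a short sample.
def categoricalSummary(values: Seq[String], sampleSize: Int = 3) =
  require(values.nonEmpty)
  // Count occurrences of each distinct value.
  val counts: Map[String, Int] = values.groupMapReduce(identity)(_ => 1)(_ + _)
  val (mostFrequent, frequency) = counts.maxBy(_._2)
  (
    uniqueEntries = counts.size,
    mostFrequent = mostFrequent,
    frequency = frequency,
    sample = values.distinct.take(sampleSize).mkString(", ")
  )
```

For example, an Embarked-like column Seq("S", "C", "S", "Q", "S") gives 3 unique entries, "S" as the most frequent value with frequency 3, and the sample "S, C, Q".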