CSV
Getting started
Our first move is to tell the compiler where your CSV data may be found. CSV.fromString is a macro which reads the column headers and injects them into the compiler's type system. Here, we inline a string for the compiler to analyze.
import io.github.quafadas.table.*
val csv : CsvIterator[("col1", "col2", "col3"), (Int, Int, Int)] = CSV.fromString("col1,col2,col3\n1,2,7\n3,4,8\n5,6,9")
// csv: CsvIterator[Tuple3["col1", "col2", "col3"], Tuple3[Int, Int, Int]] = empty iterator
val asList = LazyList.from(csv)
// asList: LazyList[NamedTuple[Tuple3["col1", "col2", "col3"], Tuple3[Int, Int, Int]]] = LazyList(
// (1, 2, 7),
// (3, 4, 8),
// (5, 6, 9)
// )
asList.take(2).consoleFormatNt(fansi = false)
// res0: String = """| |col1|col2|col3|
// |-|----|----|----|
// |0| 1| 2| 7|
// |1| 3| 4| 8|
// |-|----|----|----|"""
This is the key point of the whole library: note the take(2) method. It comes from Scala's standard library. In case it's not clear, you get all the other stdlib machinery too - .filter, .groupMapReduce, and friends - and its use is strongly typed, because CsvIterator is merely an Iterator of NamedTuples: you access the columns via their column names.
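A minimal sketch of that idea, using the asList value from above: ordinary collection methods compose with named, typed field access.

```scala
// col1 and col3 are Ints in the type, so comparison and arithmetic
// are checked at compile time; a misspelt column name would not compile.
val sums = asList.filter(_.col1 > 1).map(r => r.col1 + r.col3)
// sums.toList == List(11, 14)
```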
Reading CSV files
Reading CSVs from strings is relatively uncommon - normally a CSV lives in a file.
The CSV object has a few methods for reading CSV files. Inside the macro, it is fundamentally based on scala.io.Source.
import io.github.quafadas.table.*
val csv_resource = CSV.resource("simple.csv")
val csv_abs = CSV.absolutePath("/users/simon/absolute/path/simple.csv")
val csv_url = CSV.url("https://example.com/simple.csv")
val opts = CsvOpts(typeInferrer = TypeInferrer.FirstN(1000), delimiter = ';')

/**
 * Note: CSV.pwd reads from the _compiler's_ current working directory. If you are
 * compiling via bloop through scala-cli, for example, this will read the temporary
 * directory _bloop_ is running in, _not_ your project directory.
 */
val csv_pwd = CSV.pwd("file.csv", opts)
For customisation options, look at CsvOpts and supply an instance as the second argument to any of the above methods.
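For instance, assuming the delimiter option shown above, a semicolon-separated string could be read like this (a sketch):

```scala
// fromString accepts a CsvOpts as its second argument, as with the
// file-based methods; here we override the delimiter.
val semi = CSV.fromString("a;b\n1;2\n3;4", CsvOpts(delimiter = ';'))
```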
Columnar Reading
By default, CSV data is read as an iterator of rows (CsvIterator). For analytical workloads, you can read CSV data directly into a columnar format using ReadAs.Columns:
import io.github.quafadas.table.*
// Read as columns - returns NamedTuple of Arrays
val columnar = CSV.fromString("name,age,score\nAlice,30,95.5\nBob,25,87.3", CsvOpts(readAs = ReadAs.Columns))
// columnar: NamedTuple[Tuple3["name", "age", "score"], *:[Array[String], *:[Array[Int], *:[Array[Double], EmptyTuple]]]] = (
// Array("Alice", "Bob"),
// Array(30, 25),
// Array(95.5, 87.3)
// )
// Access columns directly as typed arrays
val names: Array[String] = columnar.name
// names: Array[String] = Array("Alice", "Bob")
val ages: Array[Int] = columnar.age
// ages: Array[Int] = Array(30, 25)
val scores: Array[Double] = columnar.score
// scores: Array[Double] = Array(95.5, 87.3)
println(s"Average age: ${ages.sum.toDouble / ages.length}")
// Average age: 27.5
println(s"Max score: ${scores.max}")
// Max score: 95.5
Columnar reading:
- Loads all data into memory at once
- Provides direct array access to columns
- More efficient for column-oriented analytics
- Works with all CSV reading methods (resource, absolutePath, fromString, etc.)
- Supports all type inference options
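Because the columns are ordinary typed arrays, standard collection operations compose directly. A sketch using the columnar value from above:

```scala
// Pair each name with its score using plain Array zipping;
// the element types (String, Double) come from the inferred column types.
val scoreByName: Map[String, Double] = columnar.name.zip(columnar.score).toMap
// scoreByName == Map("Alice" -> 95.5, "Bob" -> 87.3)
```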
For advanced use cases requiring a single dense array with stride information (e.g., for BLAS/LAPACK interop), see ReadAs.ArrayDenseColMajor[T]() and ReadAs.ArrayDenseRowMajor[T]() in the Column Orient cookbook.
Strongly Typed CSVs
Scautable analyzes the CSV data and provides types and names for the columns. That means you get IDE support, autocomplete, error messages for nonsensical code, and so on.
import io.github.quafadas.table.*
val experiment = asList
.mapColumn["col1", Double](_.toDouble)
.mapColumn["col2", Boolean](_.toInt > 3)
// experiment: LazyList[NamedTuple[Tuple3["col1", "col2", "col3"], *:[Double, *:[Boolean, *:[Int, EmptyTuple]]]]] = LazyList(
// (1.0, false, 7),
// (3.0, true, 8),
// (5.0, true, 9)
// )
println(experiment.consoleFormatNt(fansi = false))
// | |col1| col2|col3|
// |-|----|-----|----|
// |0| 1.0|false| 7|
// |1| 3.0| true| 8|
// |2| 5.0| true| 9|
// |-|----|-----|----|
For example, one cannot make column-name typos, because the names are embedded in the type system:
val nope = experiment.mapColumn["not_col1", Double](_.toDouble)
// error:
// value toDouble is not a member of EmptyTuple
// val nope = experiment.mapColumn["not_col1", Double](_.toDouble)
// ^^^^^^^^^^
// error:
// Column ("not_col1" : String) not found
// val nope = experiment.mapColumn["not_col1", Double](_.toDouble)
// ^
Column Operations
Let's have a look at some of the column manipulation helpers:
- dropColumn
- addColumn
- renameColumn
- mapColumn
val colmanipuluation = experiment
.dropColumn["col2"]
.addColumn["col4", Double](x => x.col1 * 2 + x.col3.toDouble)
.renameColumn["col4", "col4_renamed"]
.mapColumn["col4_renamed", Double](_ * 2)
// colmanipuluation: LazyList[NamedTuple[*:["col1", *:["col3", *:["col4_renamed", EmptyTuple]]], *:[Double, *:[Int, *:[Double, EmptyTuple]]]]] = LazyList(
// (1.0, 7, 18.0),
// (3.0, 8, 28.0),
// (5.0, 9, 38.0)
// )
colmanipuluation.consoleFormatNt(fansi = false)
// res5: String = """| |col1|col3|col4_renamed|
// |-|----|----|------------|
// |0| 1.0| 7| 18.0|
// |1| 3.0| 8| 28.0|
// |2| 5.0| 9| 38.0|
// |-|----|----|------------|"""
println(colmanipuluation.column["col4_renamed"].foldLeft(0.0)(_ + _))
// 84.0
// and select a subset of columns
colmanipuluation.columns[("col4_renamed", "col1")].consoleFormatNt(fansi = false)
// res7: String = """| |col4_renamed|col1|
// |-|------------|----|
// |0| 18.0| 1.0|
// |1| 28.0| 3.0|
// |2| 38.0| 5.0|
// |-|------------|----|"""
Accumulating, slicing, etc.
We can delegate all such concerns to the standard library in the usual way, as we have everything inside the type system!
colmanipuluation.filter(_.col4_renamed > 20).groupMapReduce(_.col1)(_.col4_renamed)(_ + _)
// res8: Map[Double, Double] = Map(3.0 -> 28.0, 5.0 -> 38.0)
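Other stdlib combinators work the same way; for instance, sorting by a typed column (a sketch using the colmanipuluation value from above):

```scala
// sortBy is plain stdlib; the Ordering is resolved from the column's
// Double type, so the whole expression is type-checked.
val byScore = colmanipuluation.sortBy(_.col4_renamed)
```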