CSV
Getting started
Our first move is to tell the compiler where your CSV data may be found. Each method on the CSV object is a macro which reads the column headers and injects them into the compiler's type system. Here, we inline a string for the compiler to analyze.
import io.github.quafadas.table.*
val csv : CsvIterator[("col1", "col2", "col3"), (String, String, String)] = CSV.fromString("col1,col2,col3\n1,2,7\n3,4,8\n5,6,9")
// csv: CsvIterator[Tuple3["col1", "col2", "col3"], Tuple3[String, String, String]] = empty iterator
val asList = LazyList.from(csv)
// asList: LazyList[NamedTuple[Tuple3["col1", "col2", "col3"], Tuple3[String, String, String]]] = LazyList(
// ("1", "2", "7"),
// ("3", "4", "8"),
// ("5", "6", "9")
// )
asList.take(2).consoleFormatNt(fansi = false)
// res0: String = """| |col1|col2|col3|
// +-+----+----+----+
// |0| 1| 2| 7|
// |1| 3| 4| 8|
// +-+----+----+----+"""
The key point of the whole library: note the take(2) method. It comes from Scala's standard library. In case it's not clear, you get all the other stdlib methods too: filter, groupMapReduce and friends, which are powerful. Their use is strongly typed, because CsvIterator is merely an Iterator of NamedTuples, so you access the columns via their column names.
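To make that point concrete with the standard library alone (a minimal sketch, no scautable involved), any Iterator of tuples already carries these combinators; the data below simply mirrors the example above:

```scala
// Plain Scala stdlib: an Iterator of tuples already has take, filter,
// groupMapReduce, etc. scautable adds typed, named columns on top.
val rows = Iterator(("1", "2", "7"), ("3", "4", "8"), ("5", "6", "9"))

// take(2) is the same stdlib method used in the example above.
val firstTwo = rows.take(2).toList
// firstTwo == List(("1", "2", "7"), ("3", "4", "8"))
```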
Reading CSV files
Reading CSVs from strings is relatively uncommon; normally .csv data lives in a file.
The CSV object has a few methods for reading CSV files. Inside the macro, reading is fundamentally based on scala.io.Source.
import io.github.quafadas.table.*
val csv_resource = CSV.resource("simple.csv")
val csv_abs = CSV.absolutePath("/users/simon/absolute/path/simple.csv")
val csv_url = CSV.url("https://example.com/simple.csv")
/**
 * Note: this reads from the _compiler's_ current working directory. If you are
 * compiling via bloop through scala-cli, for example, then this will read the
 * temporary directory _bloop_ is running in, _not_ your project directory.
 */
val csv_pwd = CSV.pwd("file.csv")
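As a rough mental model only (a hedged sketch, not scautable's actual implementation), Source-based CSV reading in plain Scala looks something like this; readCsv is a hypothetical helper:

```scala
import scala.io.Source

// Hypothetical sketch: split a CSV into headers and rows using
// scala.io.Source. scautable's macro performs this analysis at compile
// time and lifts the headers into the type system.
def readCsv(csv: String): (Array[String], Iterator[Array[String]]) =
  val lines = Source.fromString(csv).getLines()
  val headers = lines.next().split(',')
  (headers, lines.map(_.split(',')))

val (headers, rows) = readCsv("col1,col2,col3\n1,2,7\n3,4,8")
// headers.toList == List("col1", "col2", "col3")
```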
Strongly Typed CSVs
Scautable analyzes the CSV file and provides types and names for the columns. That means you should get IDE support: auto-completion, error messages for nonsensical code, etc.
import io.github.quafadas.table.*
val experiment = asList
.mapColumn["col1", Double](_.toDouble)
.mapColumn["col2", Boolean](_.toInt > 3)
// experiment: LazyList[NamedTuple[Tuple3["col1", "col2", "col3"], *:[Double, *:[Boolean, *:[String, EmptyTuple]]]]] = LazyList(
// (1.0, false, "7"),
// (3.0, true, "8"),
// (5.0, true, "9")
// )
println(experiment.consoleFormatNt(fansi = false))
// | |col1| col2|col3|
// +-+----+-----+----+
// |0| 1.0|false| 7|
// |1| 3.0| true| 8|
// |2| 5.0| true| 9|
// +-+----+-----+----+
For example, one cannot make column-name typos, because the column names are embedded in the type system.
val nope = experiment.mapColumn["not_col1", Double](_.toDouble)
// error:
// value toDouble is not a member of EmptyTuple
// val nope = experiment.mapColumn["not_col1", Double](_.toDouble)
// ^^^^^^^^^^
// error:
// Column ("not_col1" : String) not found
// val nope = experiment.mapColumn["not_col1", Double](_.toDouble)
// ^
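Conceptually, mapColumn is an ordinary map that transforms one field of each tuple. A plain-stdlib analogue of the two mapColumn calls above (hypothetical, selecting by position rather than by type-level name):

```scala
// Plain-stdlib analogue of the mapColumn calls above. scautable selects
// the column by its type-level name; here we pattern match by position.
val raw = List(("1", "2", "7"), ("3", "4", "8"), ("5", "6", "9"))
val typed = raw.map { case (c1, c2, c3) => (c1.toDouble, c2.toInt > 3, c3) }
// typed == List((1.0, false, "7"), (3.0, true, "8"), (5.0, true, "9"))
```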
Column Operations
Let's have a look at some column manipulation helpers:
dropColumn
addColumn
renameColumn
mapColumn
val colmanipuluation = experiment
.dropColumn["col2"]
.addColumn["col4", Double](x => x.col1 * 2 + x.col3.toDouble)
.renameColumn["col4", "col4_renamed"]
.mapColumn["col4_renamed", Double](_ * 2)
// colmanipuluation: LazyList[NamedTuple[*:["col1", *:["col3", *:["col4_renamed", EmptyTuple]]], *:[Double, *:[String, *:[Double, EmptyTuple]]]]] = LazyList(
// (1.0, "7", 18.0),
// (3.0, "8", 28.0),
// (5.0, "9", 38.0)
// )
colmanipuluation.consoleFormatNt(fansi = false)
// res3: String = """| |col1|col3|col4_renamed|
// +-+----+----+------------+
// |0| 1.0| 7| 18.0|
// |1| 3.0| 8| 28.0|
// |2| 5.0| 9| 38.0|
// +-+----+----+------------+"""
println(colmanipuluation.column["col4_renamed"].foldLeft(0.0)(_ + _))
// 84.0
// and select a subset of columns
colmanipuluation.columns[("col4_renamed", "col1")].consoleFormatNt(fansi = false)
// res5: String = """| |col4_renamed|col1|
// +-+------------+----+
// |0| 18.0| 1.0|
// |1| 28.0| 3.0|
// |2| 38.0| 5.0|
// +-+------------+----+"""
Accumulating, slicing etc
We can delegate all such concerns to the standard library in the usual way, as we have everything inside the type system!
colmanipuluation.filter(_.col4_renamed > 20).groupMapReduce(_.col1)(_.col4_renamed)(_ + _)
// res6: Map[Double, Double] = Map(3.0 -> 28.0, 5.0 -> 38.0)