Subset

What if we wanted to extract a subset of the data?

import io.github.quafadas.table.*
import scala.compiletime.constValueTuple
val datarator  =
  CSV.resource("titanic_short.csv", TypeInferrer.FromAllRows)
  .zipWithIndex.map{case (r, idx) => (origIdx = idx ) ++ r}

val data = LazyList.from(datarator)

type myCols = ("Name", "Pclass", "Ticket")

val subset = data
  .filter(_.Sex == "female")
  .columns[myCols]
val csv = subset.toCsv(includeHeaders = true, delimiter = ',', quote = '"')
// csv: Iterator[String] = non-empty iterator
os.write.over(os.pwd / "subset.csv", csv)

Because data is a LazyList, which caches every row it reads, this approach ends up holding the entire CSV in memory. The same constructs can also be applied directly to the iterator, which gives a simple streaming transformation.
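To make the memory trade-off concrete, here is a minimal sketch using only the Scala standard library, with no CSV involved. CountingSource is a hypothetical stand-in for the row source; it is not part of the table library and exists only to count how many rows are pulled.

```scala
// Standalone sketch: plain Scala collections, no CSV library involved.

/** A hypothetical row source that counts how many rows it has produced. */
final class CountingSource(n: Int):
  var produced: Int = 0
  def rows(): Iterator[Int] =
    Iterator.range(0, n).map { i => produced += 1; i }

@main def lazyListVsIterator(): Unit =
  val source = CountingSource(5)

  // LazyList memoises every cell it evaluates, so a second traversal
  // re-reads the cached rows instead of pulling from the source again.
  val cached = LazyList.from(source.rows())
  val evens  = cached.filter(_ % 2 == 0).toList
  val scaled = cached.map(_ * 10).toList
  println(s"evens=$evens scaled=$scaled produced=${source.produced}")

  // A bare Iterator retains nothing: rows are pulled on demand,
  // one at a time, and cannot be revisited afterwards.
  val streamed = source.rows().filter(_ % 2 == 0)
  val first    = streamed.next()
  println(s"first=$first produced=${source.produced}")
```

The cached rows are what allow the LazyList to be traversed twice, and they are also why it grows with the size of the file for as long as data is referenced.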

Streaming

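To stream the transformation into another file, apply the same operations directly to the iterator and write each line as it is produced, so that only one row is held in memory at a time.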

val csvStrings: Iterator[String] = datarator
  .filter(_.Sex == "female")
  .columns[myCols]
  .toCsv(includeHeaders = true, delimiter = ',', quote = '"')

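val fileStream = os.write.outputStream(os.pwd / "test.csv")

// Close the stream even if a write fails, so the output is flushed.
try
  csvStrings.foreach{s =>
    fileStream.write(s.getBytes("UTF-8"))
    fileStream.write('\n')
  }
finally fileStream.close()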
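The same write loop can be wrapped in a small reusable helper. What follows is a sketch under stated assumptions: writeLines is a hypothetical name, not part of os-lib or the table library, and it uses java.nio and scala.util.Using from the standard libraries so that the writer is closed automatically. The rows in the demo are made-up stand-ins for the output of toCsv.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.util.Using

/** Writes each line followed by "\n", pulling from `lines` one at a time.
  * Hypothetical helper for illustration; returns the number of lines written.
  */
def writeLines(target: Path, lines: Iterator[String]): Long =
  Using.resource(Files.newBufferedWriter(target, StandardCharsets.UTF_8)) { out =>
    var written = 0L
    lines.foreach { line =>
      out.write(line)
      out.write("\n")
      written += 1
    }
    written
  } // the writer is flushed and closed here, even if a write throws

@main def streamDemo(): Unit =
  val target = Files.createTempFile("subset", ".csv")
  // Made-up stand-in rows; in the example above these would come from toCsv.
  val lines = Iterator("Name,Pclass", "a,1", "b,3")
  val n     = writeLines(target, lines)
  println(s"wrote $n lines to $target")
```

Because the helper only ever holds the current line, memory use should stay flat regardless of how many rows the iterator yields.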