Subset

What if we wanted to extract a subset of the data, for example the name, class and ticket of every female passenger, and write it to a new CSV file?

import io.github.quafadas.table.*
import scala.compiletime.constValueTuple
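// Read the CSV (inferring column types from all rows) and prepend each row's original index as `origIdx`
val datarator =
  CSV.resource("titanic_short.csv", TypeInferrer.FromAllRows)
    .zipWithIndex.map { case (r, idx) => (origIdx = idx) ++ r }

// LazyList memoises rows as they are evaluated, so `data` can be traversed more than once
val data = LazyList.from(datarator)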

type myCols = ("Name", "Pclass", "Ticket")

val subset = data
  .filter(_.Sex == "female")
  .columns[myCols]
val csv = subset.toCsv(includeHeaders = true, delimiter = ',', quote = '"')
// csv: String = """Name,Pclass,Ticket
// "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",1,PC 17599
// "Heikkinen, Miss. Laina",3,STON/O2. 3101282
// "Futrelle, Mrs. Jacques Heath (Lily May Peel)",1,113803
// "Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",3,347742
// "Nasser, Mrs. Nicholas (Adele Achem)",2,237736
// "Sandstrom, Miss. Marguerite Rut",3,PP 9549
// "Bonnell, Miss. Elizabeth",1,113783
// "Vestrom, Miss. Hulda Amanda Adolfina",3,350406
// "Hewlett, Mrs. (Mary D Kingcome) ",2,248706
// "Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",3,345763
// "Masselmani, Mrs. Fatima",3,2649"""
os.write.over(os.pwd / "subset.csv", csv)

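This materialises the entire CSV in memory: the `LazyList` memoises every row it evaluates, and `toCsv` builds the whole output as a single `String`. For larger files, the same `filter` / `columns` / `toCsv` pipeline can be applied directly to an iterator, so that rows are processed as they are read.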
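To make the memory trade-off concrete, here is a small sketch using only the Scala standard library (no scautable API is involved; `source`, `reads` and the sample values are illustrative names, not part of the library). It shows that an `Iterator` is single-pass, while a `LazyList` memoises the elements it has evaluated: that is what lets `data` be traversed repeatedly, and also why it ends up holding every row in memory.

```scala
// Counts how many elements have actually been pulled from the source.
var reads = 0

// Illustrative stand-in for a CSV reader: a fresh single-pass iterator of 0..4.
def source(): Iterator[Int] =
  Iterator.range(0, 5).map { i => reads += 1; i }

// An Iterator is single-pass: once traversed, it is exhausted.
val it = source()
val firstPass = it.filter(_ % 2 == 0).toList // List(0, 2, 4)
val secondPass = it.toList                   // List(), already consumed

// A LazyList memoises every element it evaluates, so it can be traversed
// again without re-reading the source, at the cost of keeping those elements.
reads = 0
val ll = LazyList.from(source())
val a = ll.filter(_ % 2 == 0).toList // List(0, 2, 4)
val b = ll.toList                    // List(0, 1, 2, 3, 4), served from the memo
```

After both traversals of `ll`, `reads` is still 5: the second pass did not touch the source.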

Streaming

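One may stream a transformation to another file with relative ease: because the pipeline operates on an iterator, each row is read, filtered and written in turn, without holding the whole dataset in memory.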

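// `datarator` is a single-pass Iterator and has already been consumed by `data` above,
// so re-open the resource to get a fresh iterator.
val rows = CSV.resource("titanic_short.csv", TypeInferrer.FromAllRows)

val csvStrings: Iterator[String] = rows
  .filter(_.Sex == "female")
  .columns[myCols]
  .toCsv(includeHeaders = true, delimiter = ',', quote = '"')

val fileStream = os.write.outputStream(os.pwd / "test.csv")

// Write one line per string, and always close the stream, even on failure
try
  csvStrings.foreach { s =>
    fileStream.write(s.getBytes(java.nio.charset.StandardCharsets.UTF_8))
    fileStream.write('\n')
  }
finally fileStream.close()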
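The write loop can also be factored into a small reusable function. The sketch below is a hypothetical helper, `writeLines`, built only on `java.nio` and `scala.util.Using` rather than os-lib; it writes each string from an iterator as one UTF-8 line, holds only the current line in memory, and closes the file even if the iterator throws part-way through.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.util.Using

// Hypothetical helper: stream `lines` to `path`, one line each, and
// return how many lines were written. Using.resource closes the writer
// whether the loop completes normally or throws.
def writeLines(path: Path, lines: Iterator[String]): Long =
  Using.resource(
    new BufferedWriter(
      new OutputStreamWriter(Files.newOutputStream(path), StandardCharsets.UTF_8)
    )
  ) { w =>
    var count = 0L
    lines.foreach { line =>
      w.write(line)
      w.write("\n") // '\n' explicitly, rather than the platform line separator
      count += 1
    }
    count
  }
```

Assuming os-lib's `os.Path#toNIO`, it could be called as `writeLines((os.pwd / "test.csv").toNIO, csvStrings)` on a fresh, not yet consumed iterator.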