Subset
Suppose we want to extract a subset of the data: only some of the rows, and only some of the columns.
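import io.github.quafadas.table.*
import scala.compiletime.constValueTuple
// Read the CSV resource and prepend each row's original index.
val datarator =
  CSV.resource("titanic_short.csv", TypeInferrer.FromAllRows)
    .zipWithIndex.map { case (r, idx) => (origIdx = idx) ++ r }
// Wrap the row iterator in a LazyList so rows are cached as they are read.
val data = LazyList.from(datarator)
// The columns to keep, as a tuple of column-name literal types.
type myCols = ("Name", "Pclass", "Ticket")
// Keep only the female passengers, then project onto the chosen columns.
val subset = data
  .filter(_.Sex == "female")
  .columns[myCols]
val csv = subset.toCsv(includeHeaders = true, delimiter = ',', quote = '"')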
// csv: String = """Name,Pclass,Ticket
// "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",1,PC 17599
// "Heikkinen, Miss. Laina",3,STON/O2. 3101282
// "Futrelle, Mrs. Jacques Heath (Lily May Peel)",1,113803
// "Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",3,347742
// "Nasser, Mrs. Nicholas (Adele Achem)",2,237736
// "Sandstrom, Miss. Marguerite Rut",3,PP 9549
// "Bonnell, Miss. Elizabeth",1,113783
// "Vestrom, Miss. Hulda Amanda Adolfina",3,350406
// "Hewlett, Mrs. (Mary D Kingcome) ",2,248706
// "Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)",3,345763
// "Masselmani, Mrs. Fatima",3,2649"""
os.write.over(os.pwd / "subset.csv", csv)
This materialises the entire CSV as a single String in memory before writing it. For larger files, the same constructs can be used to write a simple streaming transformation instead.
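The example above wraps the row iterator in a LazyList before transforming it. As a minimal sketch of why that choice affects memory, using only the Scala standard library and stand-in data (the names `rows` and `cached` are illustrative and not part of the library): an Iterator can be walked once, whereas a LazyList keeps every element it has evaluated for as long as it is referenced.

```scala
// Stand-in rows; any Iterator behaves the same way.
val rows = Iterator("a", "b", "c")
val firstPass = rows.toList    // consumes the iterator
val exhausted = !rows.hasNext  // nothing is left for a second pass

// A LazyList caches each element once it has been evaluated.
val cached = LazyList.from(Iterator("a", "b", "c"))
val passOne = cached.toList
val passTwo = cached.toList    // the same elements again, served from the cache
```

The cache is what makes the LazyList reusable, and it is also why the whole dataset ends up in memory once it has been fully traversed.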
Streaming
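Applying the same operations directly to the iterator yields the CSV one line at a time, so a transformation can be streamed to another file without holding the whole output in memory.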
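// Note: an Iterator can be traversed only once. If `datarator` has already been
// consumed (for example by the LazyList above), recreate it before this step.
val csvStrings: Iterator[String] = datarator
  .filter(_.Sex == "female")
  .columns[myCols]
  .toCsv(includeHeaders = true, delimiter = ',', quote = '"')
val fileStream = os.write.outputStream(os.pwd / "test.csv")
try
  csvStrings.foreach { s =>
    // Encode explicitly rather than relying on the platform default charset.
    fileStream.write(s.getBytes(java.nio.charset.StandardCharsets.UTF_8))
    fileStream.write('\n')
  }
finally fileStream.close() // flush and release the file handle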
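The write loop above can also be factored into a small reusable helper. The following is a minimal sketch using only the JDK and `scala.util.Using`; `writeLines`, `demoLines` and `demoTarget` are hypothetical names introduced for illustration and are not part of the library. It assumes the input is an `Iterator[String]` with one CSV line per element, as `csvStrings` is in the example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.util.Using

// Hypothetical helper (not part of the library): writes each line as it is
// pulled from the iterator, and closes the writer even if a write fails.
def writeLines(lines: Iterator[String], target: Path): Unit =
  Using.resource(Files.newBufferedWriter(target, StandardCharsets.UTF_8)) { writer =>
    lines.foreach { line =>
      writer.write(line)
      writer.write("\n")
    }
  }

// Stand-in data; in the example above this would be `csvStrings`.
val demoLines = Iterator("Name,Pclass", "\"Bonnell, Miss. Elizabeth\",1")
val demoTarget = Files.createTempFile("subset", ".csv")
writeLines(demoLines, demoTarget)
```

`Using.resource` plays the role of a try/finally here, so the file handle is released whether or not the iteration completes.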