Column Orient
"Vector" style computation is beyond the scope of scautable itself. However, a row-oriented representation of the data is not always the right construct, particularly for analysis tasks.
To note again: statistics is beyond the scope of scautable.
You are encouraged to bring in a separate mathematics / statistics library (entirely at your own discretion and risk).
Reading CSV directly as columns
Scautable can read CSV data directly into a columnar format using the ReadAs.Columns option. This is more efficient than reading rows and then converting, as it only requires a single pass through the data.
This will fire up a REPL with the necessary imports:
scala-cli repl --dep io.github.quafadas::scautable::0.0.35 --dep io.github.quafadas::vecxt:0.0.35 --java-opt "--add-modules=jdk.incubator.vector" --scalac-option -Xmax-inlines --scalac-option 2048 --java-opt -Xss4m --repl-init-script 'import io.github.quafadas.table.{*, given}; import vecxt.all.{*, given}'
import io.github.quafadas.table.*
// Read directly as columns - returns NamedTuple of Arrays
// lazy - avoids printing the whole result in the REPL
lazy val simpleCols = CSV.resource("simple.csv", CsvOpts(readAs = ReadAs.Columns))
// Access columns directly as typed arrays
val col1: Array[Int] = simpleCols.col1
// col1: Array[Int] = Array(1, 3, 5)
val col2: Array[Int] = simpleCols.col2
// col2: Array[Int] = Array(2, 4, 6)
val col3: Array[Int] = simpleCols.col3
// col3: Array[Int] = Array(7, 8, 9)
// With vecxt, we get optimised vector operations too.
// simpleCols.col1 + simpleCols.col2
// Works with type inference
val titanicCols = CSV.resource("titanic.csv", CsvOpts(TypeInferrer.FromAllRows, ReadAs.Columns))
// titanicCols: NamedTuple[Tuple12["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"], *:[Array[Int], *:[Array[Boolean], *:[Array[Int], *:[Array[String], *:[Array[String], *:[Array[Option[Double]], *:[Array[Int], *:[Array[Int], *:[Array[String], *:[Array[Double], *:[Array[Option[String]], *:[Array[Option[String]], EmptyTuple]]]]]]]]]]]]] = Tuple12(
// _1 = Array(
// 1,
// 2,
// 3,
// 4,
// 5,
// 6,
// 7,
// 8,
// 9,
// 10,
// 11,
// 12,
// 13,
// 14,
// 15,
// 16,
// 17,
// 18,
// 19,
// 20,
// 21,
// 22,
// 23,
// 24,
// 25,
// 26,
// 27,
// 28,
// 29,
// 30,
// 31,
// 32,
// 33,
// 34,
// 35,
// 36,
// 37,
// 38,
// 39,
// 40,
// 41,
// 42,
// 43,
// 44,
// 45,
// 46,
// 47,
// ...
val ages: Array[Option[Double]] = titanicCols.Age
// ages: Array[Option[Double]] = Array(
// Some(22.0),
// Some(38.0),
// Some(26.0),
// Some(35.0),
// Some(35.0),
// None,
// Some(54.0),
// Some(2.0),
// Some(27.0),
// Some(14.0),
// Some(4.0),
// Some(58.0),
// Some(20.0),
// Some(39.0),
// Some(14.0),
// Some(55.0),
// Some(2.0),
// None,
// Some(31.0),
// None,
// Some(35.0),
// Some(34.0),
// Some(15.0),
// Some(28.0),
// Some(8.0),
// Some(38.0),
// None,
// Some(19.0),
// None,
// None,
// Some(40.0),
// None,
// None,
// Some(66.0),
// Some(28.0),
// Some(42.0),
// None,
// Some(21.0),
// Some(18.0),
// Some(14.0),
// Some(40.0),
// Some(27.0),
// None,
// Some(3.0),
// Some(19.0),
// None,
// None,
// None,
// ...
val survived: Array[Boolean] = titanicCols.Survived
// survived: Array[Boolean] = Array(
// false,
// true,
// true,
// true,
// false,
// false,
// false,
// false,
// true,
// true,
// true,
// true,
// false,
// false,
// false,
// true,
// false,
// true,
// false,
// true,
// false,
// true,
// true,
// true,
// false,
// true,
// false,
// false,
// true,
// false,
// false,
// true,
// true,
// false,
// false,
// false,
// true,
// false,
// false,
// true,
// false,
// false,
// false,
// true,
// true,
// false,
// false,
// true,
// ...
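Columns with missing values come back as Array[Option[...]], as with Age above. Below is a minimal sketch of summarising such a column with the standard library alone. It assumes a hand-copied sample of the first six ages rather than the real titanicCols value, and the names sampleAges, knownAges, missingCount and meanKnownAge are made up for this illustration.

```scala
// First six values of the Age column shown above, copied by hand (an assumption,
// so that the snippet runs without the CSV resource).
val sampleAges: Array[Option[Double]] =
  Array(Some(22.0), Some(38.0), Some(26.0), Some(35.0), Some(35.0), None)

// Keep only the defined values; missing entries are dropped rather than defaulted.
val knownAges: Array[Double] = sampleAges.collect { case Some(age) => age }

// How many entries were missing.
val missingCount: Int = sampleAges.count(_.isEmpty)

// Mean over the defined values only.
val meanKnownAge: Double = knownAges.sum / knownAges.length
```

Dropping the missing values explicitly avoids calling .get on an Option, which throws if a None is present.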
Converting row-oriented data to columns
Alternatively, you can read data as rows (the default) and then convert to columnar format:
//> using dep io.github.quafadas::vecxt:0.0.31
import io.github.quafadas.table.*
import vecxt.all.cumsum
import vecxt.BoundsCheck.DoBoundsCheck.yes
type ColSubset = ("Name", "Sex", "Age")
val data = CSV.resource("titanic.csv", TypeInferrer.FromAllRows)
.take(3)
.columns[ColSubset]
// data: Iterator[NamedTuple[ColSubset, *:[String, *:[String, *:[Option[Double], EmptyTuple]]]]] = empty iterator
val colData = LazyList.from(data).toColumnOrientedAs[Array]
// colData: NamedTuple[Tuple3["Name", "Sex", "Age"], *:[Array[String], *:[Array[String], *:[Array[Option[Double]], EmptyTuple]]]] = (
// Array(
// "Braund, Mr. Owen Harris",
// "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
// "Heikkinen, Miss. Laina"
// ),
// Array("male", "female", "female"),
// Array(Some(22.0), Some(38.0), Some(26.0))
// )
colData.Age
// res0: Array[Option[Double]] = Array(Some(22.0), Some(38.0), Some(26.0))
colData.Age.map(_.get).cumsum
// res1: Array[Double] = Array(22.0, 60.0, 86.0)
The direct columnar reading (first approach) is recommended when you know upfront that you need columnar access, as it's more efficient.
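To illustrate why a single pass is cheaper, here is a rough sketch of column-wise accumulation in plain Scala. It is not scautable's implementation: readColumnsOnce is a hypothetical helper, and it assumes every cell is an Int, the separator is a comma, and the header line has already been removed.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical helper: builds one array per column while visiting each line once.
def readColumnsOnce(lines: List[String], numCols: Int): Array[Array[Int]] = {
  // One growable builder per column.
  val builders: Array[ArrayBuilder[Int]] = Array.fill(numCols)(ArrayBuilder.make[Int])
  for (line <- lines) {
    val cells = line.split(",")
    var j = 0
    while (j < numCols) {
      builders(j).addOne(cells(j).trim.toInt)
      j += 1
    }
  }
  builders.map(_.result())
}

// The three data rows of simple.csv, as implied by the column output shown earlier.
val builtCols: Array[Array[Int]] = readColumnsOnce(List("1,2,7", "3,4,8", "5,6,9"), 3)
```

No intermediate row objects are created here, whereas the row-then-convert route has to materialise every row before the columns can be assembled.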
Reading CSV as Dense Arrays
For interoperability with numerical libraries (e.g., BLAS, LAPACK) or when you need a single contiguous memory layout, scautable provides dense array reading modes. These modes read all CSV data into a single flat array with stride information for accessing rows and columns.
Column-Major Dense Arrays
Column-major layout stores data column-by-column in memory, which is the standard layout for Fortran and mathematical libraries like BLAS/LAPACK.
import io.github.quafadas.table.*
// Read as column-major dense array
val colMajor = CSV.resource("simple.csv", CsvOpts(readAs = ReadAs.ArrayDenseColMajor[Int]()))
// colMajor: NamedTuple[Tuple5["data", "rowStride", "colStride", "rows", "cols"], Tuple5[Array[Int], Int, Int, Int, Int]] = (
// Array(1, 3, 5, 2, 4, 6, 7, 8, 9),
// 3,
// 1,
// 3,
// 3
// )
// Access the fields
val cmData: Array[Int] = colMajor.data // The flat array containing all data
// cmData: Array[Int] = Array(1, 3, 5, 2, 4, 6, 7, 8, 9)
val cmRowStride: Int = colMajor.rowStride // Step between adjacent elements of a row = numRows
// cmRowStride: Int = 3
val cmColStride: Int = colMajor.colStride // Step between adjacent elements of a column = 1
// cmColStride: Int = 1
val cmRows: Int = colMajor.rows // Number of rows
// cmRows: Int = 3
val cmCols: Int = colMajor.cols // Number of columns
// cmCols: Int = 3
// Access element at row i, col j
def getElementColMajor(i: Int, j: Int): Int =
cmData(j * cmRowStride + i * cmColStride)
// Example: get element at row 1, col 1
val cmElement = getElementColMajor(1, 1)
// cmElement: Int = 4
In column-major layout:
- colStride = 1 (step to the next element in the same column)
- rowStride = numRows (step to the next element in the same row, i.e. into the next column)
- Data is stored:
[col0_row0, col0_row1, ..., col1_row0, col1_row1, ...]
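The stride arithmetic above can be sketched without the library. The flat array below is copied from the colMajor output. The helper names columnOf and rowOf are made up for this illustration and are not part of scautable.

```scala
// Values copied from the column-major output above (simple.csv, 3 x 3).
val flatColMajor: Array[Int] = Array(1, 3, 5, 2, 4, 6, 7, 8, 9)
val cmNumRows = 3
val cmNumCols = 3
val rowStrideCM = cmNumRows // step between neighbours within a row
val colStrideCM = 1         // step between neighbours within a column

// Column j occupies a contiguous run of the flat array.
def columnOf(j: Int): Array[Int] =
  Array.tabulate(cmNumRows)(i => flatColMajor(j * rowStrideCM + i * colStrideCM))

// Row i is gathered by hopping rowStrideCM elements at a time.
def rowOf(i: Int): Array[Int] =
  Array.tabulate(cmNumCols)(j => flatColMajor(j * rowStrideCM + i * colStrideCM))
```

For example, columnOf(1) reads indices 3, 4, 5 and rowOf(1) reads indices 1, 4, 7 of the flat array.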
Row-Major Dense Arrays
Row-major layout stores data row-by-row in memory, which is the standard layout for C and most programming languages.
import io.github.quafadas.table.*
// Read as row-major dense array
val rowMajor = CSV.resource("simple.csv", CsvOpts(readAs = ReadAs.ArrayDenseRowMajor[Double]()))
// rowMajor: NamedTuple[Tuple5["data", "rowStride", "colStride", "rows", "cols"], Tuple5[Array[Double], Int, Int, Int, Int]] = (
// Array(1.0, 2.0, 7.0, 3.0, 4.0, 8.0, 5.0, 6.0, 9.0),
// 1,
// 3,
// 3,
// 3
// )
// Access the fields
val rmData: Array[Double] = rowMajor.data // The flat array containing all data
// rmData: Array[Double] = Array(1.0, 2.0, 7.0, 3.0, 4.0, 8.0, 5.0, 6.0, 9.0)
val rmRowStride: Int = rowMajor.rowStride // Step between adjacent elements of a row = 1
// rmRowStride: Int = 1
val rmColStride: Int = rowMajor.colStride // Step between adjacent elements of a column = numCols
// rmColStride: Int = 3
val rmRows: Int = rowMajor.rows // Number of rows
// rmRows: Int = 3
val rmCols: Int = rowMajor.cols // Number of columns
// rmCols: Int = 3
// Access element at row i, col j
def getElementRowMajor(i: Int, j: Int): Double =
rmData(i * rmColStride + j * rmRowStride)
// Example: get element at row 0, col 2
val rmElement = getElementRowMajor(0, 2)
// rmElement: Double = 7.0
In row-major layout:
- rowStride = 1 (step to the next element in the same row)
- colStride = numCols (step to the next element in the same column, i.e. into the next row)
- Data is stored:
[row0_col0, row0_col1, ..., row1_col0, row1_col1, ...]
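The two layouts hold the same values in a different order. A small sketch of converting one to the other in plain Scala follows. rowMajorToColMajor is a hypothetical helper written for this page, not a scautable function, and the input array is copied from the rowMajor output above.

```scala
// Hypothetical helper: copies a row-major flat array into column-major order.
def rowMajorToColMajor(src: Array[Double], rows: Int, cols: Int): Array[Double] = {
  require(src.length == rows * cols, "flat array length must equal rows * cols")
  val out = new Array[Double](rows * cols)
  var i = 0
  while (i < rows) {
    var j = 0
    while (j < cols) {
      // Row-major index is i * cols + j; column-major index is j * rows + i.
      out(j * rows + i) = src(i * cols + j)
      j += 1
    }
    i += 1
  }
  out
}

// Values copied from the row-major output above (simple.csv, 3 x 3).
val flatRowMajor: Array[Double] = Array(1.0, 2.0, 7.0, 3.0, 4.0, 8.0, 5.0, 6.0, 9.0)
val convertedColMajor: Array[Double] = rowMajorToColMajor(flatRowMajor, 3, 3)
```

The result holds the same values, in the same order, as the column-major read of simple.csv shown earlier (as Double rather than Int).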
Type Safety
The dense array modes require a type parameter specifying the array element type:
// Strongly typed as Array[Int]
val intArray = CSV.resource("data.csv", CsvOpts(readAs = ReadAs.ArrayDenseColMajor[Int]()))
// Strongly typed as Array[Double]
val doubleArray = CSV.resource("data.csv", CsvOpts(readAs = ReadAs.ArrayDenseRowMajor[Double]()))
// Strongly typed as Array[String]
val stringArray = CSV.fromString("a,b\nfoo,bar", CsvOpts(readAs = ReadAs.ArrayDenseColMajor[String]()))
The type conversion is handled automatically using scautable's ColumnDecoder infrastructure, which supports Int, Long, Double, Boolean, String, and Option types.
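For readers unfamiliar with type-class based decoding, the general idea can be sketched as follows. This is only an illustration of the pattern in Scala 3. CellDecoder and decodeColumn are hypothetical names, and nothing here reflects the actual definition or signatures of scautable's ColumnDecoder.

```scala
import scala.reflect.ClassTag

// Hypothetical type class: how to turn one raw CSV cell into an A.
trait CellDecoder[A] {
  def decode(raw: String): A
}

object CellDecoder {
  given intDecoder: CellDecoder[Int] = new CellDecoder[Int] {
    def decode(raw: String): Int = raw.trim.toInt
  }
  given doubleDecoder: CellDecoder[Double] = new CellDecoder[Double] {
    def decode(raw: String): Double = raw.trim.toDouble
  }
  // An empty cell becomes None (an assumption made for this sketch);
  // anything else is delegated to the decoder for A.
  given optionDecoder[A](using d: CellDecoder[A]): CellDecoder[Option[A]] =
    new CellDecoder[Option[A]] {
      def decode(raw: String): Option[A] =
        if (raw.trim.isEmpty) None else Some(d.decode(raw))
    }
}

// The element type picks the decoder at compile time.
def decodeColumn[A: ClassTag](cells: Seq[String])(using d: CellDecoder[A]): Array[A] =
  cells.map(d.decode).toArray

val decodedInts: Array[Int] = decodeColumn[Int](Seq("1", "3", "5"))
val decodedAges: Array[Option[Double]] = decodeColumn[Option[Double]](Seq("22.0", "", "38.0"))
```

The design point is that the requested element type selects the conversion, so no runtime type switch is needed.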
Use Cases
Dense arrays are particularly useful for:
- Numerical computing: Passing data to BLAS/LAPACK or other numerical libraries
- Machine learning: Preparing data for algorithms that expect contiguous arrays
- Performance: Single memory allocation and cache-friendly access patterns
- Interop: Integration with libraries expecting specific memory layouts (column-major for Fortran/R, row-major for C/Python)
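As a rough picture of the numerical-computing case, here is a plain Scala matrix-vector product over a column-major flat array. It stands in for what a BLAS gemv routine would do. gemvColMajor is a hypothetical helper for this page, not a scautable or BLAS call, and the matrix values are those of simple.csv written as Double.

```scala
// Hypothetical helper: y = A * x, with A stored column-major in a flat array.
def gemvColMajor(a: Array[Double], rows: Int, cols: Int, x: Array[Double]): Array[Double] = {
  require(a.length == rows * cols, "flat array length must equal rows * cols")
  require(x.length == cols, "vector length must equal the number of columns")
  val y = new Array[Double](rows)
  var j = 0
  while (j < cols) {
    val xj = x(j)
    var i = 0
    // The inner loop walks one column, which is contiguous in memory.
    while (i < rows) {
      y(i) += a(j * rows + i) * xj
      i += 1
    }
    j += 1
  }
  y
}

// simple.csv in column-major order, as Double.
val aColMajor: Array[Double] = Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0, 7.0, 8.0, 9.0)

// Multiplying by a vector of ones yields the row sums.
val rowSums: Array[Double] = gemvColMajor(aColMajor, 3, 3, Array(1.0, 1.0, 1.0))
```

Because the matrix is a single flat array plus its dimensions, the same data could be handed to a native routine without copying into nested arrays first.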