Tuesday, 29 November 2016


The data.table package allows R to handle very large data sets, typically tens or hundreds of millions of rows, efficiently. This covers both importing the data and aggregating it.
To import a flat file with a very large number of rows, data.table provides the fread function.
library(data.table)
Data <- fread("data.csv", sep = ",", header = TRUE)
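For a quick way to try fread without a large file at hand, the sketch below writes R's built-in iris data to a temporary CSV and reads it back. The temporary file is only a hypothetical stand-in for "data.csv"; fwrite is data.table's companion writer.

```r
library(data.table)

# Write a small sample file to a temporary location
# (hypothetical stand-in for "data.csv")
tmp <- tempfile(fileext = ".csv")
fwrite(as.data.table(iris), tmp)

# fread can usually detect the separator and header on its own;
# they are spelled out here to mirror the call above
Data <- fread(tmp, sep = ",", header = TRUE)

# The result is a data.table, which is also a data.frame
print(class(Data))
print(dim(Data))
```

Because the result is a data.table, the aggregation syntax shown next can be applied to it directly.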

To aggregate the data set, here R's built-in iris data converted with as.data.table:
Agg <- as.data.table(iris)[, list(Avg_Sepal_Length = mean(Sepal.Length)), by = "Species"]
To aggregate multiple columns at the same time:
AggMC <- as.data.table(iris)[, list(Avg_Sepal_Length = mean(Sepal.Length), Avg_Petal_Length = mean(Petal.Length)), by = "Species"]
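As a self-contained sketch of the multi-column aggregation, the snippet below assumes the built-in iris data and a hypothetical table name iris_dt. It uses `.()`, which data.table accepts as shorthand for `list()`.

```r
library(data.table)

# iris_dt is a hypothetical name; any data.table with these columns would do
iris_dt <- as.data.table(iris)

# One output row per Species, one output column per named expression
AggMC <- iris_dt[, .(Avg_Sepal_Length = mean(Sepal.Length),
                     Avg_Petal_Length = mean(Petal.Length)),
                 by = "Species"]

print(AggMC)
```

Each named expression inside `.()` becomes one column of the result, so adding another summary is a matter of adding another name = expression pair.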
To aggregate all columns other than the grouping column:
AggAC <- as.data.table(iris)[, lapply(.SD, mean), by = "Species"]
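Applying mean to every column in .SD only makes sense when all non-grouping columns are numeric; for other column types mean returns NA with a warning. One way to handle this, sketched here on the built-in iris data with a hypothetical table name iris_dt, is to restrict .SD with the .SDcols argument.

```r
library(data.table)

# iris_dt is a hypothetical name used for illustration
iris_dt <- as.data.table(iris)

# .SDcols limits .SD to the listed columns, so only these two are averaged
AggSub <- iris_dt[, lapply(.SD, mean),
                  by = "Species",
                  .SDcols = c("Sepal.Length", "Petal.Length")]

print(AggSub)
```

The result keeps the original column names, since lapply returns a list named after the columns in .SD.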
To aggregate by multiple grouping columns, here using R's built-in CO2 data:
AggMCMG <- as.data.table(CO2)[, list(Avg_Conc = mean(conc), Total_Uptake = sum(uptake)), by = c("Plant", "Type")]
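A variation on the multi-group aggregation, again assuming the built-in CO2 data and a hypothetical table name co2_dt: grouping by Type and Treatment instead, and adding the special symbol .N to count the rows in each group.

```r
library(data.table)

# co2_dt is a hypothetical name; as.data.frame first drops the extra
# groupedData classes that the built-in CO2 object carries
co2_dt <- as.data.table(as.data.frame(CO2))

# One output row per distinct (Type, Treatment) combination
AggTT <- co2_dt[, .(Avg_Conc = mean(conc),
                    Total_Uptake = sum(uptake),
                    N = .N),
                by = c("Type", "Treatment")]

print(AggTT)
```

Using keyby instead of by would additionally sort the result by the grouping columns.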