Chunkwise support for read.fst?
#269
Hi @hope-data-science, thanks for posting your request. I'm interested in learning your specific use case for chunked data: are your calculations memory constrained? Because … (the same applies to any function where some ordering of the column data is needed)
But you can indeed use the from and to arguments of read_fst() to process a file chunk by chunk:
library(dplyr)
library(fst)
tmp_file <- tempfile(fileext = ".fst")
# write sample fst file
x <- data.frame(
X = sample(1:100, 1000, replace = TRUE)
) %>%
write_fst(tmp_file)
# determine chunks
nr_of_chunks <- 8
chunk_size <- metadata_fst(tmp_file)$nrOfRows / nr_of_chunks
# custom function to run on each chunk
my_funct <- function(tbl, chunk) {
tbl %>%
summarise(
Mean = mean(X)
) %>%
mutate(
Chunk = chunk
)
}
# run custom function on each chunk
z <- lapply(1:nr_of_chunks, function(chunk, custom_function) {
y <- read_fst(
tmp_file,
from = 1 + (chunk - 1) * chunk_size,
to = chunk * chunk_size
)
custom_function(y, chunk)
}, my_funct) %>%
bind_rows()
print(z)
#> Mean Chunk
#> 1 51.680 1
#> 2 46.936 2
#> 3 51.304 3
#> 4 47.824 4
#> 5 52.000 5
#> 6 53.712 6
#> 7 55.440 7
#> 8 51.256 8
From there, how you combine the chunks depends on the custom function used. In this case every chunk has the same number of rows (125), so the overall mean is simply the mean of the chunk means:
z %>%
summarise(
Mean = mean(Mean)
)
#> Mean
#> 1 51.269
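Averaging the chunk means only gives the exact answer because every chunk above has the same size. A minimal sketch of a more general combine step, assuming the same dplyr and fst setup: keep each chunk's sum and row count, then divide the totals at the end. The row count of 1003 and the chunk size of 125 are illustrative choices that make the last chunk shorter. Order-dependent statistics such as the median cannot be merged from per-chunk results this way.

```r
library(dplyr)
library(fst)

# Illustrative file with 1003 rows, so the last chunk is shorter than the others
tmp_file <- tempfile(fileext = ".fst")
write_fst(data.frame(X = sample(1:100, 1003, replace = TRUE)), tmp_file)

nr_of_rows <- metadata_fst(tmp_file)$nrOfRows
chunk_size <- 125
starts <- seq(1, nr_of_rows, by = chunk_size)

# Per chunk, keep sufficient statistics (sum and count) instead of the mean
partials <- lapply(starts, function(from) {
  to <- min(from + chunk_size - 1, nr_of_rows)
  read_fst(tmp_file, from = from, to = to) %>%
    summarise(Sum = sum(X), N = n())
}) %>%
  bind_rows()

# Total sum divided by total count is the exact overall mean,
# even when chunks differ in size
overall <- partials %>%
  summarise(Mean = sum(Sum) / sum(N))

# Sanity check against reading the whole file at once
stopifnot(isTRUE(all.equal(overall$Mean, mean(read_fst(tmp_file)$X))))
```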
Recently, I learned about a package named chunked. Since fst supports row access by row number, I suggest that the read.fst function could support this sort of chunkwise operation. Would you consider including this as a new feature, or are there other possible solutions? Thanks.
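Until something like this exists in fst itself, the from/to pattern from the reply above can be wrapped in a small reusable function. The sketch below defines a hypothetical helper, read_fst_chunked(), which is not part of the fst API; it relies only on metadata_fst() and the columns, from and to arguments of read_fst(), and lets the last chunk be shorter when the row count is not a multiple of chunk_size.

```r
library(fst)

# Hypothetical helper (NOT part of fst): apply `fun(chunk, index)` to
# consecutive row ranges of an fst file and return the list of results
read_fst_chunked <- function(path, chunk_size, fun, columns = NULL) {
  nr_of_rows <- metadata_fst(path)$nrOfRows
  if (nr_of_rows == 0) return(list())  # seq() below would fail on 0 rows
  starts <- seq(1, nr_of_rows, by = chunk_size)
  lapply(seq_along(starts), function(i) {
    from <- starts[i]
    # the last chunk may contain fewer than chunk_size rows
    to <- min(from + chunk_size - 1, nr_of_rows)
    chunk <- read_fst(path, columns = columns, from = from, to = to)
    fun(chunk, i)
  })
}

# Usage: 10 rows read in chunks of 4 rows
tmp_file <- tempfile(fileext = ".fst")
write_fst(data.frame(X = 1:10), tmp_file)
sizes <- read_fst_chunked(tmp_file, 4, function(tbl, i) nrow(tbl))
sums <- read_fst_chunked(tmp_file, 4, function(tbl, i) sum(tbl$X))
unlist(sizes)  # 4 4 2
unlist(sums)   # 10 26 19
```

Returning a list leaves the combine step to the caller, since, as noted in the reply, the right way to merge chunk results depends on the function applied.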