Feature Request: Speed improvement #32
Comments
@ramiromagno, I've been looking into those packages. Any suggestions for faster write performance? rcpp.simdjson has read functions, but I didn't see much for writing.
You're right, it seems rcpp.simdjson only has read functions. yyjsonr looks promising, though.
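Using my yyjson_switch branch with 2 cores and 16 GB of RAM in a container:

```r
library(datasetjson)  # assumes the yyjson_switch branch is installed
library(testthat)     # for test_path(); run from the package's tests directory

ae <- read_dataset_json(test_path("testdata", "ae.json"))

# Stack 100,000 copies of the AE data to build a large test dataset
ae_100 <- dplyr::bind_rows(rep(list(ae), 100000))

# Rebuild the column metadata from each column's attributes
ds_metadata <- dplyr::bind_rows(purrr::map(ae, \(x) attributes(x)))
ds_metadata['name'] <- names(ae)

ds_json <- dataset_json(ae_100, "SDTM.AE", "AE", "Adverse Events", ds_metadata)

# Time the write
start <- Sys.time()
write_dataset_json(ds_json, file = "test.json")
print(Sys.time() - start)
```

```
Time difference of 42.58133 secs
```

In total that's 7,400,000 rows and 37 columns, and the output file is 1.8 GB. A quick test against the current dev branch using jsonlite took 2.141051 mins (about 128 seconds, roughly 3x slower).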
Feature Idea
Depend on rcpp.simdjson or yyjsonr instead of jsonlite. The link contains a nice benchmark.
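As a rough sketch of what swapping the write backend could look like, the snippet below assumes that yyjsonr and/or jsonlite are installed. `write_json_fast()` is a hypothetical helper, not part of datasetjson: it calls `yyjsonr::write_json_file()` when yyjsonr is available and falls back to `jsonlite::write_json()` otherwise. The two packages' default serialization options (for example, jsonlite rounds numbers to 4 digits by default) are not guaranteed to match, so output should be compared before switching.

```r
# Hypothetical helper (not part of datasetjson): write `x` as JSON using
# yyjsonr when it is installed, otherwise fall back to jsonlite.
write_json_fast <- function(x, path) {
  if (requireNamespace("yyjsonr", quietly = TRUE)) {
    yyjsonr::write_json_file(x, path)
    backend <- "yyjsonr"
  } else {
    jsonlite::write_json(x, path)
    backend <- "jsonlite"
  }
  # Return the backend name invisibly so callers can log which one was used
  invisible(backend)
}

# Usage: time a write of a synthetic data frame to a temporary file
df <- data.frame(id = seq_len(1e5), value = runif(1e5))
tmp <- tempfile(fileext = ".json")
print(system.time(backend <- write_json_fast(df, tmp)))
print(backend)
```

Keeping the fallback lets jsonlite remain an optional dependency while the faster writer is evaluated.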
Relevant Input
No response
Relevant Output
No response
Reproducible Example/Pseudo Code
No response