
Why is the first read slower? #277

Open
tentacles-from-outer-space opened this issue May 24, 2023 · 2 comments

tentacles-from-outer-space commented May 24, 2023

I notice that in a fresh session the first call to read_fst is slower than subsequent calls: the same file, the same number of cores, just running the same command again.

RStudio on a server (80 cores), ~4 GB file.

The first read is roughly twice as slow as the later ones.

```r
example_file <- "*****.fst"
library(fst)
library(fstcore)
threads_fstlib(10)
system.time(read_fst(example_file))
#>    user  system elapsed 
#>  19.383  13.449   4.730
system.time(read_fst(example_file))
#>    user  system elapsed 
#>  13.354  10.911   2.538
system.time(read_fst(example_file))
#>    user  system elapsed 
#>  12.953  11.057   2.652
system.time(read_fst(example_file))
#>    user  system elapsed 
#>  12.632  11.213   2.522
```

Created on 2023-05-24 with reprex v2.0.2

Is it some kind of caching?

I run into this when I try to compare different thread-count settings.
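
For context, this is roughly the kind of comparison I'm trying to run (a minimal sketch; the file path and thread counts below are placeholders, not taken from the timings above):

```r
library(fst)
library(fstcore)

example_file <- "example.fst"  # placeholder path

# Time the same read under different thread counts; in a fresh session the
# very first timing comes out noticeably slower than all of the later ones.
for (n_threads in c(1, 10, 40, 80)) {
  threads_fstlib(n_threads)
  timing <- system.time(read_fst(example_file))
  cat("threads:", n_threads, "elapsed:", timing[["elapsed"]], "\n")
}
```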

@tentacles-from-outer-space (Author)

Could it be the same issue as in #218?

@MarcusKlik (Collaborator)

Hi @tentacles-from-outer-space, thanks for your question!

Modern storage devices have increasingly large RAM caches that can serve repeated identical requests. That is why a correct benchmark should read a unique file from disk on every iteration, to avoid reading from the SSD cache instead of the actual stored file.

Files can also be cached by the OS, and some storage devices have an initial startup delay while they come out of a lower-power mode, which adds to the first read as well.

For your benchmark, you might consider first writing a large number (e.g. 100) of unique fst files to benchmark writing, and then reading each of those unique files once to get a correct benchmark of read performance. With such a setup it's safe to assume there are no caching effects :-)
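
A minimal sketch of that setup (the synthetic data, file count, and temporary paths here are just placeholders, not part of the fst API):

```r
library(fst)
library(fstcore)

threads_fstlib(10)

n_files <- 100  # placeholder count
paths <- file.path(tempdir(), sprintf("bench_%03d.fst", seq_len(n_files)))

# Write n_files unique fst files, each with different data (note: this
# timing also includes the runif() data generation).
write_time <- system.time(
  for (p in paths) write_fst(data.frame(a = runif(1e6), b = runif(1e6)), p)
)

# Read every file exactly once, so no iteration can be served from a cache
# warmed by an earlier read of the same file.
read_time <- system.time(
  for (p in paths) read_fst(p)
)

cat("total write elapsed:", write_time[["elapsed"]], "seconds\n")
cat("total read elapsed: ", read_time[["elapsed"]], "seconds\n")
```

Note that files written immediately before the read loop may still sit in the OS page cache, so for a strictly cold read benchmark you would also want to drop or bypass that cache between the write and read phases.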
