Binary format for data files: FASLs! #5

Open
aartaka opened this issue Aug 16, 2022 · 11 comments
@aartaka
Contributor

aartaka commented Aug 16, 2022

Our data files take time to load. Even though we often load them asynchronously or optimize their reading in other ways, there's only so much we can speed up without changing the format. And while SQLite or some other opaque format may be nice and performant, implementation-native FASLs may be even faster and need no foreign interfaces. The only challenge is: how do we put data into a FASL and read it back, portably?

Solution

One way I'm thinking about is

  • serializing the data into s-expressions, and then writing (defvar *data* ...s-expressions) into some file,
  • calling compile-file on this newly written file,
  • and, when we need our data, loading it to get *data* variable set to the right value. Which is probably faster and more overflow-resistant than read-ing the whole file.
  • Bonus point: we can somehow leverage ASDF to compile and cache all of our data files in some smart way, much like with Nyxt configuration systems generated on the fly.
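
The first three steps above could be sketched roughly as follows. This is only a minimal sketch: `*data*`, `write-data-source`, `compile-data`, and `load-data` are hypothetical names, and it uses `setf` instead of `defvar` (an assumption on my part, since `defvar` does not overwrite a variable that is already bound, so a second load would not refresh the value).

```lisp
;; Hypothetical names throughout; not part of any existing library.
(defvar *data* nil
  "Set by the compiled data file when it is loaded.")

(defun write-data-source (data source-path)
  "Write a (setf *data* '<data>) form to SOURCE-PATH and return the path."
  (with-open-file (out source-path :direction :output
                                   :if-exists :supersede)
    (with-standard-io-syntax
      ;; Printing with *PACKAGE* bound to KEYWORD forces package prefixes
      ;; on all non-keyword symbols, so the file reads back the same way
      ;; regardless of the package current at compile time.
      (let ((*package* (find-package :keyword)))
        (prin1 `(setf *data* ',data) out)
        (terpri out))))
  source-path)

(defun compile-data (data source-path)
  "Serialize DATA to SOURCE-PATH, compile it, and return the FASL path."
  (values (compile-file (write-data-source data source-path))))

(defun load-data (fasl-path)
  "Load FASL-PATH and return the value it stored in *DATA*."
  (load fasl-path)
  *data*)
```

This only works for data that prints readably (lists, strings, numbers, symbols and the like), which is the case for an s-expression serialization.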

Alternative solutions

I'm pretty sure there are other ways, like using compiled function bodies or eval-ing the data fetched from FASL, and I'm conscious that my approach is not the most performant, but it skips at least some roadbumps.

The perfect approach would be to compile the Lisp data itself (and not its cl-prevalence-serialized representation) into FASLs.

Additional context

I've done some of this compilation magic in Sade, and here is the relevant bit:

;; Resolve the input and output paths relative to the current directory.
(let* ((in (uiop:merge-pathnames* (uiop:parse-native-namestring (second args))
                                  (uiop:getcwd)))
       (out (uiop:merge-pathnames*
             (or (uiop:parse-native-namestring (third args)) (pathname-name in))
             (uiop:getcwd))))
  ;; ECL: print the form returned by BF to a temporary file, compile that
  ;; file to an object file, and link it into a standalone program.
  #+ecl
  (uiop:with-temporary-file
      (:stream f :pathname p :type "lisp" :keep t)
    (print (with-open-file (i in) (bf i)) f)
    (print '(si:quit) f)
    :close-stream
    (compile-file p :system-p t)
    (c:build-program
     out :lisp-files (list (uiop:merge-pathnames*
                            (concatenate 'string (pathname-name p) ".o") p))))
  ;; Other implementations: compile under a fresh name, make that the image
  ;; entry point, and dump an executable image.
  #-ecl
  (let ((tmpname (gensym "TMP")))
    (bf-compile-from-file tmpname in)
    (setf uiop:*image-entry-point* (lambda () (funcall tmpname)))
    (uiop:dump-image out :executable t)))

That code is concerned with making executable files rather than FASLs, but it still sets the tone for how we might approach the problem.

@Ambrevar
Member

Yes, serializing to .fasls makes total sense!
Note that it would never be portable though. But that's OK, since in practice this would just be a "fast cache" and you'd fall back on the original file if the fasl is not available for your current implementation.
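
A rough sketch of that "fast cache with fallback" logic, assuming the original file holds a single readable s-expression and the FASL sets a hypothetical `*data*` variable when loaded. All function names here are made up for illustration.

```lisp
(defvar *data* nil
  "Hypothetical variable that a compiled data file sets when loaded.")

(defun read-source-data (source-path)
  "Read one form from the original data file, with read-time eval disabled."
  (with-open-file (in source-path)
    (with-standard-io-syntax
      (let ((*read-eval* nil))
        (read in)))))

(defun load-fasl-data (fasl-path)
  "Return the data set by FASL-PATH, or NIL if the FASL cannot be loaded
\(for instance because another implementation or version produced it)."
  (handler-case
      (let ((*data* nil))
        (load fasl-path)
        *data*)
    (error () nil)))

(defun load-cached-data (source-path fasl-path)
  "Prefer an up-to-date FASL; otherwise fall back on the original file."
  (or (and (probe-file fasl-path)
           (>= (file-write-date fasl-path)
               (file-write-date source-path))
           (load-fasl-data fasl-path))
      (read-source-data source-path)))
```

One limitation of this sketch: data that is legitimately `NIL` is indistinguishable from a failed load, so it is always re-read from the source.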

It's also not so clear where to store the fasl. I suppose in the usual ~/.cache/common-lisp folder? Is there a function to expand the cache path? ASDF can do this I think, need to find the right API point.
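
One candidate API point is `asdf:apply-output-translations`, which maps a pathname to the location where ASDF would put compiled output (by default somewhere under the user cache directory on most setups). A minimal sketch, where `cached-fasl-path` is a hypothetical helper name:

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))

(defun cached-fasl-path (data-file)
  "Return where ASDF's output translations would place the FASL for DATA-FILE.
DATA-FILE is best given as an absolute pathname."
  (asdf:apply-output-translations (compile-file-pathname data-file)))
```

The result could then be passed as the `:output-file` argument of `compile-file`, after making sure the directory exists (for example with `ensure-directories-exist`).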

So to implement this, what we need to do is simply to extend the lisp-file methods. We could add a slot to the class to let the user choose the cache path for instance.

  • and, when we need our data, loading it to get *data* variable set to the right value. Which is probably faster and more overflow-resistant than read-ing the whole file.

I didn't understand this. Can you give an example?

@aartaka
Contributor Author

aartaka commented Aug 19, 2022

Cool!

  • and, when we need our data, loading it to get *data* variable set to the right value. Which is probably faster and more overflow-resistant than read-ing the whole file.

I didn't understand this. Can you give an example?

I mean that, when we load a file, we can't get its contents directly through load. We need to either call some contained function or access some variable that's defined in this file. So my suggestion here is to use some magic variable that's being set to a new value after every loaded FASL. But still, that's not nice and if you have an idea for how to avoid all the function/variable hacks here, I'll be glad to know :)
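
To illustrate the point: `load` only returns a success flag, so the data has to travel through a side effect. One way to keep the hack contained (a sketch, with the hypothetical names `*data*` and `load-data-file`) is to rebind the variable around the `load` call, so the global value is never touched:

```lisp
(defvar *data* nil
  "Hypothetical 'magic' variable that every data file sets when loaded.")

(defun load-data-file (path)
  "Load PATH (source or FASL) and return whatever it stored in *DATA*.
The dynamic rebinding keeps the global value of *DATA* unchanged."
  (let ((*data* nil))
    (load path)
    *data*))
```

The loaded file is assumed to contain a toplevel `(setf *data* ...)` form; the variable is still there, but callers only ever see the return value of `load-data-file`.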

@Ambrevar
Member

Oh, I see.

Same here, I'd like to know...

@aartaka
Contributor Author

aartaka commented Apr 20, 2023

Here's some progress and a complete-ish prototype (in the comments) for loading data from FASLs: https://www.reddit.com/r/Common_Lisp/comments/12dxdic/dumping_objects_into_compiled_files/

@Ambrevar
Member

I'm not sure I get the macro trick... What's special about it beside binding to a global variable?

@aartaka
Contributor Author

aartaka commented Apr 25, 2023

I'm not sure I get the macro trick... What's special about it beside binding to a global variable?

So the logic is:

  • We can store arbitrary objects in FASL files.
  • We cannot print literal objects to files, because printing them would produce an unreadable representation such as #<foo 1332>.
  • But! We can use macroexpansion to inject the literal object into the file:
    • We create a macro that returns a form with the object injected (because macro-returned forms can contain anything, even objects, literal arrays, streams—basically anything!)
    • We put the macro invocation inside a file.
    • We call compile-file on the file with the macro.
    • At compile-time, the macro expands to the form with this object injected into it.
    • And the file compiler creates (because, by the standard, it has to) an object that will be restore-able from the FASL at loading.
  • Now, the last thing to do is figuring out how to actually return this object:
    • load does not return the object loaded into the image.
    • But loading the file alters the state of the image, if there are toplevel setfs, defuns etc.
    • So we can add a (setf *data* ...) inside a compiled file so that some variable is set to the object we compile.
  • Having this variable modification and literal object storage, we can:
    • Inject a literal object into a file with a macro expanding to (setf *data-variable* ,object).
    • compile-file it.
    • And load it to get a new'n'shiny object equivalent to the serialized object, stored in *data-variable*.
  • Et voilà! We've got FASLs as a way to persist arbitrary Lisp objects!
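
The steps above might look roughly like this. It is a minimal sketch and not the prototype from the linked thread: all names (`*loaded-data*`, `*object-to-dump*`, `dumped-object`, `dump-to-fasl`, `load-from-fasl`) are hypothetical. It also assumes the object is externalizable: the standard covers conses, arrays, strings, numbers, hash tables and so on, while structure and CLOS instances need a `make-load-form` method.

```lisp
(defvar *loaded-data* nil
  "Set by the compiled file at load time.")

(defvar *object-to-dump* nil
  "Bound to the object being dumped while COMPILE-FILE runs.")

(defmacro dumped-object ()
  "Expand into a form with the object injected as a literal."
  `(setf *loaded-data* ',*object-to-dump*))

(defun dump-to-fasl (object source-path)
  "Write a file holding only the macro call, compile it with OBJECT
injected at macroexpansion time, and return the FASL path."
  (with-open-file (out source-path :direction :output
                                   :if-exists :supersede)
    (with-standard-io-syntax
      ;; Force a package prefix on the macro name so the file reads
      ;; back correctly whatever the current package is.
      (let ((*package* (find-package :keyword)))
        (prin1 '(dumped-object) out)
        (terpri out))))
  (let ((*object-to-dump* object))
    (values (compile-file source-path))))

(defun load-from-fasl (fasl-path)
  "Load FASL-PATH and return the object it reconstructs."
  (let ((*loaded-data* nil))
    (load fasl-path)
    *loaded-data*))
```

For example, a hash table (which prints unreadably) could be round-tripped with `(load-from-fasl (dump-to-fasl table "/tmp/table.lisp"))`, where the path is just a placeholder. The loaded object is a fresh one that is similar to the original, not `eq` to it.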

@Ambrevar
Member

OK, got the macro trick now, it's super smart!

@Ambrevar
Member

All that said, does this really belong in Nfiles? I believe Nfiles should leave the user with the option to choose their preferred serialization format.

If we don't want to create a dedicated library just for this, I suggest creating a dedicated package at least, to decouple regular file management from specialized serialization.

@Ambrevar
Member

@aartaka Wanna work on it?

@Ambrevar
Member

We would also need some benchmarks to compare cl-prevalence with this approach.
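
A tiny timing helper could be a starting point for such a comparison. This is only a sketch: `average-seconds` is a hypothetical name, and the two loaders being compared would have to be supplied as thunks by the caller.

```lisp
(defun average-seconds (thunk &key (runs 10))
  "Call THUNK RUNS times and return the average wall-clock time per call,
in seconds, as a rational number."
  (let ((start (get-internal-real-time)))
    (dotimes (i runs)
      (funcall thunk))
    (/ (- (get-internal-real-time) start)
       (* runs internal-time-units-per-second))))
```

One would call it once with a thunk that deserializes the existing data file and once with a thunk that does `(load "data.fasl")` (placeholder path), on the same data and in the same image, and compare the two numbers.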

@aartaka
Contributor Author

aartaka commented Apr 26, 2023

@aartaka Wanna work on it?

Yes, but not necessarily soon enough :)
