Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds modular SERDE approach #175

Merged
merged 2 commits into from
May 31, 2024
Merged

Adds modular SERDE approach #175

merged 2 commits into from
May 31, 2024

Conversation

skrawcz
Copy link
Contributor

@skrawcz skrawcz commented May 11, 2024

This PR introduces a modular SERDE approach.

  1. Singledispatch to register type serializers.
  2. Any complex object deserialization, needs to be accompanied with a "type field" that will map to the name
    of a "string dispatcher function". i.e. like single dispatch, but you register a string value for it. This approach
    means that most people will get around the need for field level deserialization / having to tell Burr about state types explicitly.

This skips field level serde registration for now. But leaves room for it.

Otherwise this custom serde for:

  • langchain docs & messages
  • pydantic objects
  • some more complex serialization, e.g. pandas dataframe serialization
  • ability to register classes to be pickled

TODOs:

  • decide on module structure and where it should live
  • wire changes through to tracker serialization
  • wire changes through to persister serialization
  • wire changes through to UI reading local tracker logs Not required.
  • update LC examples
  • write tests
  • add documentation

Changes

How I tested this

  • unit tests
  • running lcel example

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

skrawcz added a commit to skrawcz/Scrapegraph-ai that referenced this pull request May 11, 2024
skrawcz added a commit to skrawcz/Scrapegraph-ai that referenced this pull request May 11, 2024
burr/tracking/common/models.py Outdated Show resolved Hide resolved
burr/tracking/common/models.py Outdated Show resolved Hide resolved
@skrawcz skrawcz changed the title Adds support for lc documents & similar Adds modular SERDE approach May 13, 2024
@skrawcz skrawcz marked this pull request as draft May 13, 2024 05:56
@skrawcz
Copy link
Contributor Author

skrawcz commented May 29, 2024

Update: just need to test more, but I think I have a good design (and update libraries to latest locally...)

core.serde has functions.
core.state uses it and imports integrations
state.State knows how to serialize & deserialize using it.

People can then register their own for their application -- I haven't tested overriding but I assume it'll work.
Added example of pandas serializing somewhere and just showing link back...

Tracker models should be dumb -- so instead I push serialization calling into hooks.
Also persisters updated to use serialization.

Once I merge and rebase off of #215 can add persister tests that will also then check for this...

Copy link
Contributor

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely looks good, I'm confused about the serde kwargs TBH.

Obviously needs docs but I imagine that'll come later.

burr/core/persistence.py Show resolved Hide resolved
burr/core/state.py Show resolved Hide resolved
burr/integrations/__init__.py Outdated Show resolved Hide resolved
burr/integrations/serde/pandas.py Show resolved Hide resolved
burr/integrations/serde/pandas.py Show resolved Hide resolved
burr/integrations/serde/pandas.py Show resolved Hide resolved
burr/integrations/persisters/postgresql.py Show resolved Hide resolved
burr/tracking/common/models.py Outdated Show resolved Hide resolved
burr/tracking/server/run.py Show resolved Hide resolved
@DAGWorks-Inc DAGWorks-Inc deleted a comment from elijahbenizzy May 30, 2024
@skrawcz skrawcz marked this pull request as ready for review May 30, 2024 06:34
from functools import singledispatch
from typing import Any, Union

KEY = "__bur_serde__"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

burr_serde

Copy link
Contributor

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, docs + small items

It was lost somehow / bad before.
docs/concepts/serde.rst Outdated Show resolved Hide resolved
docs/concepts/serde.rst Show resolved Hide resolved
This code introduces a core module for SERDE.
State relies on this to know how to serialize and deserialize itself.

A design decision made was that we will use singledispatch style
registration based on types for serialization, and a string key value
for deserialization. Serialization always outputs a dictionary that
can then be JSON serialized somewhere. A key is added to that
to know how to deserialize things.

Field level SERDE specification is still possible, but hasn’t been
implemented yet.

Otherwise people interacting with State object should use the methods
defined on it. The core.serde module can be used in other
contexts easily too — so if trackers and persisters want to utilize it they can.

One constraint is that all SERDE functions take in **kwargs. That way people
could wire through some interesting args to do more interesting serialization,
e.g. like for pickle, or pandas. The kwargs are really a dict of kwargs, where
each serde function/module should namespace its kwargs to avoid clashes.

If in the future we need to change what is registered as SERDE, that’s also possible
to adjust. The only limitations this design imposes is a global way to do SERDE.
Field level will come, and make it even more customizeable, otherwise we think
this approach should cover most use cases.

This change also fixed a bug in the tracker when looking for sequence ids, because
it hadn’t anticipated spans…

Squashed commits:
[6cbc9ee] Flips print to debug
[b9b7dda] Fixes broken tests due to deps
[9daf325] Updates code and adds tests

This adds tests and removes code.
[20579ac] Fixes bugs with incomplete file log state

File logs could be incomplete. So this checks to find
the last entry that is valid for that particular context.
[47d3288] WIP: Adds more docs and adds pickle

This commit will be cleaned up.

But adds more docs, and a way to register
classes to be pickled!
[1a1efb2] Refactors serde to proper place and adds POC

This is WIP -- but it seems to work via unit tests.

Uses **kwargs for future proofing, but also allowing
one to pass in values to serde -- idea is that each
would have some namespace of kwargs.

Just now need to test it for the tracker.

TODO: allow one to register field level state serde.
[3762aa9] Adds prototype module to handle serde

Singledispatch for type serialization. For non-primitive types, make them return a dict with a serde key.

Then use a string value based mechanism to deserialize. This way we can be "field agnostic" during deserialization.

Otherwise deserialization without knowing what it is, is a challenge.
Thus the idea to include a key in a dict for it.

Otherwise the only other way for deserialization is to field
registries...

TODO:
- determine module naming and location
- actually wire in
- proper tests
- how to override these
- think about any other edge/error cases
@skrawcz skrawcz merged commit a5d6159 into main May 31, 2024
11 checks passed
@skrawcz skrawcz deleted the serialize_lc_docs branch May 31, 2024 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants