Merge pull request #529 from graphistry/dev/gfql-serialization
Dev/gfql serialization
lmeyerov authored Dec 23, 2023
2 parents 3fd294a + 87cb5c7 commit 7f0d32f
Showing 26 changed files with 678 additions and 101 deletions.
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,27 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm

## [Development]

## [0.32.0 - 2023-12-22]

### Added

* GFQL `Chain` AST object
* GFQL query serialization - `Chain`, `ASTObject`, and `ASTPredicate` implement `ASTSerializable`
  - Ex: `Chain.from_json(Chain([n(), e(), n()]).to_json())`
* GFQL predicate `is_year_end`

### Docs

* GFQL in readme.md

### Changes

* Refactor `ASTEdge`, `ASTNode` field naming convention to match other `ASTSerializable`s

### Breaking 🔥

* GFQL `e()` now aliases `e_undirected` instead of the base class `ASTEdge`

## [0.31.1 - 2023-12-05]

### Docs
21 changes: 16 additions & 5 deletions README.md
@@ -147,9 +147,9 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit
g2.plot()
```

* Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb))
* GFQL: Cypher-style graph pattern mining queries on dataframes ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb))

Run Cypher-style graph queries natively on dataframes without going to a database or Java:
Run Cypher-style graph queries natively on dataframes with GFQL, without going to a database or Java:

```python
from graphistry import n, e_undirected, is_in
@@ -1133,7 +1133,7 @@ g2.plot() # nodes are values from cols s, d, k1
destination_node_match={"k2": 2},
destination_node_query='k2 == 2 or k2 == 4',
)
.chain([ # filter to subgraph
.chain([ # filter to subgraph with Cypher-style GFQL
n(),
n({'k2': 0, "m": 'ok'}), #specific values
n({'type': is_in(["type1", "type2"])}), #multiple valid values
@@ -1156,7 +1156,7 @@ g2.plot() # nodes are values from cols s, d, k1
.collapse(node='some_id', column='some_col', attribute='some val')
```

Both `hop()` and `chain()` match dictionary expressions support dataframe series *predicates*. The above examples show `is_in([x, y, z, ...])`. Additional predicates include:
The match dictionary expressions of both `hop()` and `chain()` (GFQL) support dataframe series *predicates*. The above examples show `is_in([x, y, z, ...])`. Additional predicates include:

* categorical: is_in, duplicated
* temporal: is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end
@@ -1233,7 +1233,7 @@ assert 'pagerank' in g2._nodes.columns

#### Graph pattern matching

PyGraphistry supports a PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java
PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java.

See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb)

@@ -1316,6 +1316,17 @@ print('# end edges: ', len(g3._edges[ g3._edges.final_edge ]))

See table above for more predicates like `is_in()` and `gt()`

Queries can be serialized and deserialized, such as for saving and remote execution:

```python
from graphistry.compute.chain import Chain

pattern = Chain([n(), e(), n()])
pattern_json = pattern.to_json()
pattern2 = Chain.from_json(pattern_json)
g.chain(pattern2).plot()
```

#### Pipelining

```python
4 changes: 4 additions & 0 deletions docs/source/conf.py
@@ -48,6 +48,8 @@
('py:class', '3'),
('py:class', "<class 'dict'>"),
('py:class', "<class 'str'>"),
('py:class', "graphistry.compute.ASTSerializable.ASTSerializable"),
('py:class', "graphistry.compute.chain.Chain"),
('py:class', "graphistry.compute.predicates.ASTPredicate.ASTPredicate"),
('py:class', 'graphistry.compute.predicates.categorical.Duplicated'),
('py:class', 'graphistry.compute.predicates.is_in.IsIn'),
@@ -60,6 +62,7 @@
('py:class', 'graphistry.compute.predicates.numeric.LT'),
('py:class', 'graphistry.compute.predicates.numeric.NE'),
('py:class', 'graphistry.compute.predicates.numeric.NotNA'),
('py:class', 'graphistry.compute.predicates.numeric.NumericASTPredicate'),
('py:class', 'graphistry.compute.predicates.str.Contains'),
('py:class', 'graphistry.compute.predicates.str.Endswith'),
('py:class', 'graphistry.compute.predicates.str.IsAlnum'),
@@ -81,6 +84,7 @@
('py:class', 'graphistry.compute.predicates.temporal.IsQuarterEnd'),
('py:class', 'graphistry.compute.predicates.temporal.IsQuarterStart'),
('py:class', 'graphistry.compute.predicates.temporal.IsYearStart'),
('py:class', 'graphistry.compute.predicates.temporal.IsYearEnd'),
('py:class', 'graphistry.Engine.Engine'),
('py:class', 'graphistry.gremlin.CosmosMixin'),
('py:class', 'graphistry.gremlin.GremlinMixin'),
2 changes: 2 additions & 0 deletions graphistry/__init__.py
@@ -51,6 +51,7 @@

from graphistry.compute import (
    n, e_forward, e_reverse, e_undirected,
    Chain,

    is_in, IsIn,

@@ -61,6 +62,7 @@
    is_quarter_start, IsQuarterStart,
    is_quarter_end, IsQuarterEnd,
    is_year_start, IsYearStart,
    is_year_end, IsYearEnd,
    is_leap_year, IsLeapYear,

    gt, GT,
40 changes: 40 additions & 0 deletions graphistry/compute/ASTSerializable.py
@@ -0,0 +1,40 @@
from abc import ABC, abstractmethod
from typing import Dict
import pandas as pd

from graphistry.utils.json import JSONVal, serialize_to_json_val


class ASTSerializable(ABC):
    """
    Internal, not intended for use outside of this module.
    Class name becomes o['type'], and all non-reserved fields become JSON-typed keys
    """

    reserved_fields = ['type']

    def validate(self) -> None:
        pass

    def to_json(self, validate=True) -> Dict[str, JSONVal]:
        """
        Returns JSON-compatible dictionary {"type": "ClassName", "arg1": val1, ...}
        Emits all non-reserved instance fields
        """
        if validate:
            self.validate()
        data: Dict[str, JSONVal] = {'type': self.__class__.__name__}
        for key, value in self.__dict__.items():
            if key not in self.reserved_fields:
                data[key] = serialize_to_json_val(value)
        return data

    @classmethod
    def from_json(cls, d: Dict[str, JSONVal]) -> 'ASTSerializable':
        """
        Given c.to_json(), hydrate back c
        Corresponding c.__class__.__init__ must accept all non-reserved instance fields
        """
        constructor_args = {k: v for k, v in d.items() if k not in cls.reserved_fields}
        return cls(**constructor_args)
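
The round-trip contract described in the docstrings above can be illustrated with a standalone sketch. It does not import graphistry: `Serializable` is a simplified stand-in for `ASTSerializable`, `Threshold` is a hypothetical subclass invented for illustration, and values are stored as-is instead of passing through `serialize_to_json_val`. The point it tries to show is that `from_json` only works when instance field names match the `__init__` parameter names.

```python
from abc import ABC
from typing import Any, Dict, List
import json


class Serializable(ABC):
    """Simplified stand-in for the ASTSerializable pattern (not the graphistry class)."""

    reserved_fields: List[str] = ['type']

    def to_json(self) -> Dict[str, Any]:
        # The class name becomes the 'type' tag; every other instance field is emitted.
        data: Dict[str, Any] = {'type': self.__class__.__name__}
        for key, value in self.__dict__.items():
            if key not in self.reserved_fields:
                data[key] = value  # assumption: values are already JSON-compatible
        return data

    @classmethod
    def from_json(cls, d: Dict[str, Any]) -> 'Serializable':
        # Drop reserved keys and pass the rest to the constructor as keyword arguments.
        return cls(**{k: v for k, v in d.items() if k not in cls.reserved_fields})


class Threshold(Serializable):
    """Hypothetical subclass: field names mirror the __init__ parameter names."""

    def __init__(self, column: str, val: float) -> None:
        self.column = column
        self.val = val


original = Threshold(column='score', val=0.5)
payload = original.to_json()

# Pass through a JSON string, as one might for saving or sending over the wire.
wire = json.dumps(payload)
restored = Threshold.from_json(json.loads(wire))
```

Note that in this pattern `from_json` is called on a known class and the `'type'` key is simply dropped, so choosing the right class for a given payload is left to the caller.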
2 changes: 2 additions & 0 deletions graphistry/compute/__init__.py
@@ -2,6 +2,7 @@
from .ast import (
    n, e_forward, e_reverse, e_undirected
)
from .chain import Chain
from .predicates.is_in import (
    is_in, IsIn
)
@@ -14,6 +15,7 @@
    is_quarter_start, IsQuarterStart,
    is_quarter_end, IsQuarterEnd,
    is_year_start, IsYearStart,
    is_year_end, IsYearEnd,
    is_leap_year, IsLeapYear
)
from .predicates.numeric import (