Dataset and Action Object Deletion APIs #9004

Merged
merged 48 commits into from
Jul 18, 2024
Commits
67d9503
[syft/action_obj] - also delete the corresponding blob entry for an a…
khoaguin Jul 2, 2024
cc4665d
[syft/unit-tests] add tests for deleting action objects
khoaguin Jul 2, 2024
b796964
[syft/dataset_service] fix `unhashable type: 'list'` when deleting a …
khoaguin Jul 2, 2024
921c754
Merge branch 'save-small-variables-without-blob-storage' into dataset…
khoaguin Jul 3, 2024
8b9eb84
[syft/dataset_service] delete assets of a dataset when deleting the d…
khoaguin Jul 3, 2024
f12b519
[syft/unit-tests] fix `test_delete_datasets`
khoaguin Jul 3, 2024
281cd61
Merge branch 'save-small-variables-without-blob-storage' into dataset…
khoaguin Jul 3, 2024
45d41b4
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 3, 2024
6ebe48c
[syft/dataset] adding properties to get private data and mock's blob …
khoaguin Jul 4, 2024
11e1d22
[tests/unit] - add property methods to retrieve blob entries for Asse…
khoaguin Jul 4, 2024
5fbeb38
[syft/chore] fix linting
khoaguin Jul 4, 2024
fccc2e4
[syft/dataset] return None if can't get mock / data
khoaguin Jul 4, 2024
d08fafd
[syft/dataset] add back error handling for `Asset.data` and `Asset.mock`
khoaguin Jul 4, 2024
8c32f9d
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 4, 2024
f915e74
[tests/integration] change gateway_test to reflect new changes in Asset
khoaguin Jul 4, 2024
41ad01a
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 4, 2024
351ffc4
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 5, 2024
5d62639
[tests/unit] delete dataset after depositing results in `test_diff_st…
khoaguin Jul 5, 2024
449d0fb
[tests/unit] delete low side dataset in `test_diff_state_with_dataset…
khoaguin Jul 5, 2024
ef881b6
[tests/unit] improve assert message for clarity
khoaguin Jul 5, 2024
25c2d9d
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 8, 2024
a4284d9
cleanup message for the blob storage entry delete
shubham3121 Jul 8, 2024
9205371
Merge branch 'dev' into dataset-actionobj-delete-apis
shubham3121 Jul 8, 2024
204d262
add a marked as deleted flag to datasets
shubham3121 Jul 8, 2024
2657114
[syft/dataset_service] - remove `mock_blob` and `data_blob` propertie…
khoaguin Jul 9, 2024
ca1abef
[syft/client] remove `_check_asset_must_contain_mock` when upload dat…
khoaguin Jul 9, 2024
f8b43fe
[tests/integration] rename `dataset.delete_by_uid` to `dataset.delete`
khoaguin Jul 9, 2024
e759846
[syft/dataset] soft deleting a dataset by updating `marked_as_deleted…
khoaguin Jul 9, 2024
db85699
[syft/dataset] remove dataset from the `get_all` if it is marked as d…
khoaguin Jul 9, 2024
ea5edc4
[syft/dataset] exclude a dataset from get / search services if it is …
khoaguin Jul 9, 2024
510a67e
Merge branch 'dev' into dataset-actionobj-delete-apis
kiendang Jul 15, 2024
3b50968
Rename deleted dataset
kiendang Jul 15, 2024
d96392d
Rename Dataset.marked_as_deleted to Dataset.to_be_deleted
kiendang Jul 15, 2024
aa99cd6
Rename node to server
kiendang Jul 15, 2024
f7ca621
Rename root_domain_client to root_client
kiendang Jul 15, 2024
d925942
Create a None ActionObject to replace the one linked with the deleted…
kiendang Jul 15, 2024
dd3cba3
Rename node_uid to server_uid
kiendang Jul 15, 2024
a2de957
Update protocol version
kiendang Jul 15, 2024
24050bd
[syft/action_service] soft deleting an action object / twin object fr…
khoaguin Jul 15, 2024
7b8fc80
[syft/action_service] moving logic to soft delete a dataset's assets …
khoaguin Jul 15, 2024
b8d1e55
[syft/dataset_service] add an option to delete a dataset object witho…
khoaguin Jul 15, 2024
f17816a
[syft/action_service] stop deleting the action object but just settin…
khoaguin Jul 15, 2024
694cd17
Merge branch 'dev' into dataset-actionobj-delete-apis
shubham3121 Jul 15, 2024
553d3f2
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 16, 2024
95341f9
[tests/integration] debugging `test_delete_idle_worker`
khoaguin Jul 16, 2024
efa3beb
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 17, 2024
47c9b49
Merge branch 'dev' into dataset-actionobj-delete-apis
khoaguin Jul 18, 2024
ea9983e
Merge branch 'dev' into dataset-actionobj-delete-apis
shubham3121 Jul 18, 2024
@@ -103,7 +103,8 @@
"metadata": {},
"outputs": [],
"source": [
"assert training_images.data is None"
"assert isinstance(training_images.data, sy.SyftError)\n",
"training_labels.data"
]
},
{
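The notebook hunk above replaces an `is None` check with an `isinstance(..., sy.SyftError)` check, so a caller can no longer treat "no data" as `None`. A minimal sketch of the caller-side pattern, assuming hypothetical `FakeError` and `FakeAsset` stand-ins (neither is part of Syft, and the real `Asset.data` logic may differ):

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class FakeError:
    """Hypothetical stand-in for an error object such as sy.SyftError."""

    message: str


class FakeAsset:
    """Hypothetical asset whose `data` property returns an error object,
    rather than None, when the private data cannot be read."""

    def __init__(self, private_data: Any = None, allowed: bool = False) -> None:
        self._private_data = private_data
        self._allowed = allowed

    @property
    def data(self) -> Any:
        if not self._allowed:
            return FakeError(message="Could not access private data.")
        return self._private_data


def read_data(asset: FakeAsset) -> Tuple[Optional[Any], Optional[str]]:
    """Return (value, error_message); the error message is None on success."""
    result = asset.data
    if isinstance(result, FakeError):
        return None, result.message
    return result, None
```

Checking with `isinstance` rather than `is None` keeps the error message available to the caller instead of discarding it.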
2 changes: 1 addition & 1 deletion packages/syft/src/syft/client/datasite_client.py
@@ -108,7 +108,7 @@ def upload_dataset(self, dataset: CreateDataset) -> SyftSuccess | SyftError:
asset = dataset.asset_list[i]
dataset.asset_list[i] = add_default_uploader(user, asset)

dataset._check_asset_must_contain_mock()
# dataset._check_asset_must_contain_mock()
dataset_size: float = 0.0

# TODO: Refactor so that object can also be passed to generate warnings
117 changes: 108 additions & 9 deletions packages/syft/src/syft/service/action/action_service.py
@@ -2,6 +2,7 @@
import importlib
import logging
from typing import Any
from typing import cast

# third party
import numpy as np
@@ -289,10 +290,6 @@ def _get(
resolve_nested: bool = True,
) -> Result[ActionObject, str]:
"""Get an object from the action store"""
# stdlib

# relative

result = self.store.get(
uid=uid, credentials=context.credentials, has_permission=has_permission
)
@@ -945,12 +942,114 @@ def exists(

@service_method(path="action.delete", name="delete", roles=ADMIN_ROLE_LEVEL)
def delete(
self, context: AuthedServiceContext, uid: UID
self, context: AuthedServiceContext, uid: UID, soft_delete: bool = False
) -> SyftSuccess | SyftError:
res = self.store.delete(context.credentials, uid)
if res.is_err():
return SyftError(message=res.err())
return SyftSuccess(message="Great Success!")
get_res = self.store.get(uid=uid, credentials=context.credentials)
if get_res.is_err():
return SyftError(message=get_res.err())
obj: ActionObject | TwinObject = get_res.ok()
return_msg = []

# delete any associated blob storage entry object to the action object
blob_del_res = self._delete_blob_storage_entry(context=context, obj=obj)
if isinstance(blob_del_res, SyftError):
return SyftError(message=blob_del_res.message)
return_msg.append(blob_del_res.message)

# delete the action object from the action store
store_del_res = self._delete_from_action_store(
context=context, uid=obj.id, soft_delete=soft_delete
)
if isinstance(store_del_res, SyftError):
return SyftError(message=store_del_res.message)
return_msg.append(store_del_res.message)

return SyftSuccess(message="\n".join(return_msg))

def _delete_blob_storage_entry(
self,
context: AuthedServiceContext,
obj: TwinObject | ActionObject,
) -> SyftSuccess | SyftError:
deleted_blob_ids = []
blob_store_service = cast(
BlobStorageService, context.server.get_service(BlobStorageService)
)

if isinstance(obj, ActionObject) and obj.syft_blob_storage_entry_id:
blob_del_res = blob_store_service.delete(
context=context, uid=obj.syft_blob_storage_entry_id
)
if isinstance(blob_del_res, SyftError):
return SyftError(message=blob_del_res.message)
deleted_blob_ids.append(obj.syft_blob_storage_entry_id)

if isinstance(obj, TwinObject):
if obj.private.syft_blob_storage_entry_id:
blob_del_res = blob_store_service.delete(
context=context, uid=obj.private.syft_blob_storage_entry_id
)
if isinstance(blob_del_res, SyftError):
return SyftError(message=blob_del_res.message)
deleted_blob_ids.append(obj.private.syft_blob_storage_entry_id)

if obj.mock.syft_blob_storage_entry_id:
blob_del_res = blob_store_service.delete(
context=context, uid=obj.mock.syft_blob_storage_entry_id
)
if isinstance(blob_del_res, SyftError):
return SyftError(message=blob_del_res.message)
deleted_blob_ids.append(obj.mock.syft_blob_storage_entry_id)

message = f"Deleted blob storage entries: {', '.join(str(blob_id) for blob_id in deleted_blob_ids)}"

return SyftSuccess(message=message)

def _delete_from_action_store(
self,
context: AuthedServiceContext,
uid: UID,
soft_delete: bool = False,
) -> SyftSuccess | SyftError:
if soft_delete:
get_res = self.store.get(uid=uid, credentials=context.credentials)
if get_res.is_err():
return SyftError(message=get_res.err())
obj: ActionObject | TwinObject = get_res.ok()

if isinstance(obj, TwinObject):
res = self._soft_delete_action_obj(
context=context, action_obj=obj.private
)
if res.is_err():
return SyftError(message=res.err())
res = self._soft_delete_action_obj(context=context, action_obj=obj.mock)
if res.is_err():
return SyftError(message=res.err())

if isinstance(obj, ActionObject):
res = self._soft_delete_action_obj(context=context, action_obj=obj)
if res.is_err():
return SyftError(message=res.err())
else:
res = self.store.delete(credentials=context.credentials, uid=uid)
if res.is_err():
return SyftError(message=res.err())

return SyftSuccess(message=f"Action object with uid '{uid}' deleted.")

def _soft_delete_action_obj(
self, context: AuthedServiceContext, action_obj: ActionObject
) -> Result[ActionObject, str]:
action_obj.syft_action_data_cache = None
res = action_obj._save_to_blob_storage()
if isinstance(res, SyftError):
return Err(res.message)
set_result = self._set(
context=context,
action_object=action_obj,
)
return set_result


def resolve_action_args(
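The reworked `delete` above runs in two steps: it first removes any blob storage entry linked to the object, then either hard-deletes the record or, with `soft_delete=True`, keeps the record and empties its data. A simplified sketch of that flow, assuming hypothetical `ToyObject` and `ToyService` classes backed by plain dicts (this is not the Syft store API, and clearing `blob_id` is a simplification of this sketch):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyObject:
    """Hypothetical record with an optional link to a blob."""

    uid: str
    payload: Optional[bytes] = None
    blob_id: Optional[str] = None


@dataclass
class ToyService:
    """Hypothetical service holding records and blobs in plain dicts."""

    store: Dict[str, ToyObject] = field(default_factory=dict)
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def delete(self, uid: str, soft_delete: bool = False) -> Tuple[bool, str]:
        obj = self.store.get(uid)
        if obj is None:
            return False, f"No object with uid '{uid}'"
        messages: List[str] = []

        # Step 1: remove the blob linked to the object, if there is one.
        if obj.blob_id is not None:
            self.blobs.pop(obj.blob_id, None)
            messages.append(f"Deleted blob storage entry: {obj.blob_id}")
            obj.blob_id = None  # simplification: drop the dangling reference

        # Step 2: a soft delete keeps the record but empties its data;
        # a hard delete removes the record from the store.
        if soft_delete:
            obj.payload = None
        else:
            del self.store[uid]
        messages.append(f"Object with uid '{uid}' deleted.")
        return True, "\n".join(messages)
```

After a soft delete the uid still resolves to a record, so anything that references it keeps working and simply sees empty data.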
46 changes: 26 additions & 20 deletions packages/syft/src/syft/service/blob_storage/service.py
@@ -352,30 +352,36 @@ def mark_write_complete(
def delete(
self, context: AuthedServiceContext, uid: UID
) -> SyftSuccess | SyftError:
result = self.stash.get_by_uid(context.credentials, uid=uid)
if result.is_ok():
obj = result.ok()
get_res = self.stash.get_by_uid(context.credentials, uid=uid)
if get_res.is_err():
return SyftError(message=get_res.err())

if obj is None:
return SyftError(
message=f"No blob storage entry exists for uid: {uid}, or you have no permissions to read it"
)

try:
with context.server.blob_storage_client.connect() as conn:
file_unlinked_result = conn.delete(obj.location)
except Exception as e:
return SyftError(message=f"Failed to delete file: {e}")
obj = get_res.ok()
if obj is None:
return SyftError(
message=f"No blob storage entry exists for uid: {uid}, "
f"or you have no permissions to read it"
)

if isinstance(file_unlinked_result, SyftError):
return file_unlinked_result
blob_storage_entry_deleted = self.stash.delete(
context.credentials, UIDPartitionKey.with_obj(uid), has_permission=True
try:
with context.server.blob_storage_client.connect() as conn:
file_unlinked_result = conn.delete(obj.location)
if isinstance(file_unlinked_result, SyftError):
return file_unlinked_result
except Exception as e:
return SyftError(
message=f"Failed to delete blob file with id '{uid}'. Error: {e}"
)
if blob_storage_entry_deleted.is_ok():
return file_unlinked_result

return SyftError(message=result.err())
blob_entry_delete_res = self.stash.delete(
context.credentials, UIDPartitionKey.with_obj(uid), has_permission=True
)
if blob_entry_delete_res.is_err():
return SyftError(message=blob_entry_delete_res.err())

return SyftSuccess(
message=f"Blob storage entry with id '{uid}' deleted successfully."
)


TYPE_TO_SERVICE[BlobStorageEntry] = BlobStorageEntry
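The blob storage `delete` above is flattened into early returns: look up the entry, remove the file, then remove the index entry, returning an error at the first step that fails. A minimal sketch of the same guard-clause shape, assuming a hypothetical `delete_entry` helper that works on a plain dict and an `unlink` callable (neither exists in Syft):

```python
from typing import Callable, Dict, Optional


def delete_entry(
    entries: Dict[str, str],
    unlink: Callable[[str], object],
    uid: str,
) -> str:
    """Delete the file behind `uid`, then its index entry.

    Returns a status message prefixed with "ok:" or "error:".
    """
    location: Optional[str] = entries.get(uid)
    if location is None:
        return f"error: no entry exists for uid: {uid}"

    try:
        unlink(location)
    except Exception as e:
        # The index entry is left in place when the file removal fails.
        return f"error: failed to delete file with id '{uid}'. Error: {e}"

    del entries[uid]
    return f"ok: entry with id '{uid}' deleted successfully."
```

Because each failure returns immediately, the success path reads top to bottom without nesting, and the index entry is only removed once the file is gone.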
39 changes: 25 additions & 14 deletions packages/syft/src/syft/service/dataset/dataset.py
@@ -20,14 +20,17 @@
from typing_extensions import Self

# relative
from ...client.api import APIRegistry
from ...serde.serializable import serializable
from ...store.document_store import PartitionKey
from ...types.datetime import DateTime
from ...types.dicttuple import DictTuple
from ...types.syft_object import PartialSyftObject
from ...types.syft_object import SYFT_OBJECT_VERSION_1
from ...types.syft_object import SyftObject
from ...types.transforms import TransformContext
from ...types.transforms import generate_id
from ...types.transforms import make_set_default
from ...types.transforms import transform
from ...types.transforms import validate_url
from ...types.uid import UID
@@ -239,9 +242,6 @@ def __eq__(self, other: object) -> bool:

@property
def pointer(self) -> Any:
# relative
from ...client.api import APIRegistry

api = APIRegistry.api_for(
server_uid=self.server_uid,
user_verify_key=self.syft_client_verify_key,
@@ -251,16 +251,15 @@ def pointer(self) -> Any:

@property
def mock(self) -> SyftError | Any:
# relative
from ...client.api import APIRegistry

api = APIRegistry.api_for(
server_uid=self.server_uid,
user_verify_key=self.syft_client_verify_key,
)
if api is None:
return SyftError(message=f"You must login to {self.server_uid}")
result = api.services.action.get_mock(self.action_id)
if isinstance(result, SyftError):
return result
try:
if isinstance(result, SyftObject):
return result.syft_action_data
@@ -282,7 +281,6 @@ def has_permission(self, data_result: Any) -> bool:
@property
def data(self) -> Any:
# relative
from ...client.api import APIRegistry

api = APIRegistry.api_for(
server_uid=self.server_uid,
@@ -291,6 +289,8 @@ def data(self) -> Any:
if api is None or api.services is None:
return None
res = api.services.action.get(self.action_id)
if isinstance(res, str):
return SyftError(message=f"Could not access private data. {str(res)}")
if self.has_permission(res):
return res.syft_action_data
else:
@@ -471,6 +471,7 @@ class Dataset(SyftObject):
created_at: DateTime = DateTime.now()
uploader: Contributor
summary: str | None = None
to_be_deleted: bool = False

__attr_searchable__ = [
"name",
@@ -523,6 +524,8 @@ def _repr_html_(self) -> Any:
"""
else:
description_info_message = ""
if self.to_be_deleted:
return "This dataset has been marked for deletion. The underlying data may be not available."
return f"""
<style>
{FONT_CSS}
@@ -549,8 +552,7 @@ def _repr_html_(self) -> Any:
"""

def action_ids(self) -> list[UID]:
data = [asset.action_id for asset in self.asset_list if asset.action_id]
return data
return [asset.action_id for asset in self.asset_list if asset.action_id]

@property
def assets(self) -> DictTuple[str, Asset]:
@@ -639,9 +641,6 @@ class CreateDataset(Dataset):

model_config = ConfigDict(validate_assignment=True, extra="forbid")

def _check_asset_must_contain_mock(self) -> None:
_check_asset_must_contain_mock(self.asset_list)

@field_validator("asset_list")
@classmethod
def __assets_must_contain_mock(
@@ -650,6 +649,13 @@ def __assets_must_contain_mock(
_check_asset_must_contain_mock(asset_list)
return asset_list

@field_validator("to_be_deleted")
@classmethod
def __to_be_deleted_must_be_false(cls, v: bool) -> bool:
if v is True:
raise ValueError("to_be_deleted must be False")
return v

def set_description(self, description: str) -> None:
self.description = MarkdownDescription(text=description)

@@ -876,8 +882,13 @@ def createdataset_to_dataset() -> list[Callable]:
validate_url,
convert_asset,
add_current_date,
make_set_default("to_be_deleted", False), # explicitly set it to False
]


class DatasetUpdate:
pass
class DatasetUpdate(PartialSyftObject):
__canonical_name__ = "DatasetUpdate"
__version__ = SYFT_OBJECT_VERSION_1

name: str
to_be_deleted: bool
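The dataset changes above add a `to_be_deleted` flag, reject it at creation time, and (per the commit messages) hide flagged datasets from listing. A rough sketch of how these pieces could fit together, assuming hypothetical `ToyDataset`, `create_dataset`, and `ToyDatasetStash` names built on the standard library rather than Syft's object and stash classes:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyDataset:
    """Hypothetical dataset record carrying a soft-delete flag."""

    name: str
    to_be_deleted: bool = False


def create_dataset(name: str, to_be_deleted: bool = False) -> ToyDataset:
    # A newly created dataset may not start out flagged for deletion.
    if to_be_deleted:
        raise ValueError("to_be_deleted must be False")
    return ToyDataset(name=name)


class ToyDatasetStash:
    """Hypothetical in-memory stash keyed by dataset name."""

    def __init__(self) -> None:
        self._items: Dict[str, ToyDataset] = {}

    def add(self, dataset: ToyDataset) -> None:
        self._items[dataset.name] = dataset

    def mark_deleted(self, name: str) -> None:
        # Soft delete: flip the flag and keep the record itself.
        self._items[name].to_be_deleted = True

    def get_all(self) -> List[ToyDataset]:
        # Flagged datasets are excluded from listings.
        return [d for d in self._items.values() if not d.to_be_deleted]
```

Keeping the record and only flipping a flag means the deletion can be applied to the underlying data separately from the moment the dataset disappears from listings.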