Error when calculating L2 distances #19

Open
MisaOgura opened this issue Aug 10, 2023 · 0 comments

Hello and thank you for making the code to run cell painting VAE available!

I've encountered a problem executing the notebook cell-painting/3.application/experiment_level5.ipynb and wanted to get your thoughts on whether the modification I applied is a correct one.

The function calculate_L2_distances, defined in code block no. 11 and executed in code block no. 44, throws the following error:-

TypeError: Could not convert DNA inhibitornorepinephrine reputake inhibitormaternal embryonic leucine zipper kinase inhibitordopamine receptor antagonist|serotonin receptor agonistdopamine reuptake inhibitorDMSO to numeric

This is presumably because the first line of the function takes the mean over every column of the grouped dataframe, which still contains non-feature columns with non-numeric values such as moa.
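A minimal sketch of how this cause could be checked, using a toy dataframe rather than the notebook's data. The column Metadata_broad_sample and all values are hypothetical stand-ins. As far as I know, pandas 2.0 and later raise a TypeError here, whereas older releases silently dropped the non-numeric columns with a warning, so the snippet handles both outcomes:

```python
import pandas as pd

# Toy stand-in for meta_moa_complete_df; column names and values are hypothetical.
toy_df = pd.DataFrame({
    "Metadata_broad_sample": ["BRD-1", "BRD-2", "DMSO"],  # text metadata
    "moa": ["inhibitor", "inhibitor", "DMSO"],
    "Cells_AreaShape_FormFactor": [0.2, 0.4, 0.8],
})

# List the columns that a plain .mean() cannot average.
non_numeric = toy_df.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric columns:", non_numeric)

# Try the original call. Whether it raises depends on the installed pandas version.
try:
    toy_df.groupby(["moa"]).mean()
    print("mean() succeeded (older pandas drops non-numeric columns)")
except TypeError as err:
    print(f"TypeError: {err}")
```

Running the same select_dtypes check on meta_moa_complete_df should show which metadata columns trigger the error.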

Below is my best guess at achieving the desired functionality:-

# BEFORE
...
def calculate_L2_distances(predictions, predictions_random):
    mean_of_moas = meta_moa_complete_df.groupby(['moa']).mean().loc[:,'Cells_AreaShape_FormFactor':]
    ...

# AFTER
...
def calculate_L2_distances(predictions, predictions_random):
    mean_of_moas = (
        meta_moa_complete_df
        .groupby(['moa'])[complete_features_df.columns]
        .mean()
        .loc[:,'Cells_AreaShape_FormFactor':]
    )
    ...

Retaining only the feature columns makes it possible to calculate the mean over them. complete_features_df is defined in code block no. 3 of the same notebook as a dataframe that includes only feature columns:-

# code block #3
data_dict = load_data(["complete"])
meta_features = infer_cp_features(data_dict["complete"], metadata=True)
cp_features = infer_cp_features(data_dict["complete"])

complete_features_df = data_dict["complete"].reindex(cp_features, axis="columns")
complete_meta_df = data_dict["complete"].reindex(meta_features, axis="columns")
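To sanity-check the modification, here is a small sketch on toy data. It compares selecting the feature columns before taking the mean with an alternative, passing numeric_only=True to mean(). The toy columns, and the prefix-based feature_columns list standing in for complete_features_df.columns, are assumptions for illustration and not taken from the notebook:

```python
import pandas as pd

# Toy stand-in for meta_moa_complete_df; all names and values are hypothetical.
toy_df = pd.DataFrame({
    "Metadata_broad_sample": ["BRD-1", "BRD-2", "BRD-3", "DMSO"],
    "Metadata_dose": [1.0, 2.0, 1.0, 0.0],  # numeric metadata, not a feature
    "moa": ["inhibitor", "inhibitor", "agonist", "DMSO"],
    "Cells_AreaShape_FormFactor": [0.2, 0.4, 0.6, 0.8],
    "Nuclei_Intensity_MeanIntensity_DNA": [1.0, 3.0, 5.0, 7.0],
})

# Assumed stand-in for complete_features_df.columns: pick features by prefix.
feature_columns = [
    col for col in toy_df.columns
    if col.startswith(("Cells_", "Cytoplasm_", "Nuclei_"))
]

# Option 1 (the AFTER version): select feature columns, then average per moa.
mean_by_selection = (
    toy_df
    .groupby(["moa"])[feature_columns]
    .mean()
    .loc[:, "Cells_AreaShape_FormFactor":]
)

# Option 2: let pandas skip non-numeric columns, then slice off the metadata.
mean_numeric_only = (
    toy_df
    .groupby(["moa"])
    .mean(numeric_only=True)
    .loc[:, "Cells_AreaShape_FormFactor":]
)

print(mean_by_selection)
print(mean_numeric_only)
```

On this toy data both options give the same per-moa means. Option 2 relies on every feature column being numeric and on all metadata columns sitting to the left of Cells_AreaShape_FormFactor, so selecting the feature columns explicitly is arguably the safer choice.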

Look forward to your input.

Thanks so much,
Misa
