Using pymoo to find the optimal index of compounds #581
It is difficult for me to follow your code how you posted it. But are you just trying to implement non-dominated sorting?
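If the goal really is just non-dominated sorting, the first front can be computed directly with a few lines of NumPy, without setting up an optimization run at all. This is an illustrative sketch (not from the thread); it assumes minimization of every column, so properties to be maximized would first be negated:

```python
import numpy as np

def non_dominated_front(F):
    """Return indices of rows of F not dominated by any other row
    (minimization of every column)."""
    n = len(F)
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            # j dominates i: no worse in all objectives, strictly better in one
            if i != j and np.all(F[j] <= F[i]) and np.any(F[j] < F[i]):
                keep[i] = False
                break
    return np.where(keep)[0]

F = np.array([[1.0, 9.0], [2.0, 3.0], [4.0, 2.0], [5.0, 5.0], [3.0, 8.0]])
print(non_dominated_front(F))  # → [0 1 2]
```

pymoo also ships a `NonDominatedSorting` utility that returns all fronts at once, which scales better than this O(n²) sketch for large compound sets.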
Hello,
I have multiple files containing the index of each compound, its SMILES string, and two properties. I would like to perform Pareto optimization to find the compounds that maximize both properties. I defined the problem as below, but I get an error that x is not an integer. I was wondering if you have any suggestions for this issue.
Thank you
Best Regards
Soodabeh
from pymoo.termination import get_termination
from pymoo.optimize import minimize
from pymoo.visualization.scatter import Scatter
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
import numpy as np
import pandas as pd
import gc
import matplotlib.pyplot as plt
# Define the problem
class CompoundProblem(ElementwiseProblem):
    def __init__(self, chunk_paths):
        super().__init__(n_var=1, n_obj=2, n_constr=0, type_var=int)
        self.chunk_paths = chunk_paths
        self.cumulative_sizes = [0]
        for path in chunk_paths:
            chunk = pd.read_csv(path)
            # Filter out compounds based on uncertainty and toxicity
            chunk = chunk[(chunk['Uncertainty']*100 > 50) & (chunk['Toxic']*100 < 50)]
            self.cumulative_sizes.append(self.cumulative_sizes[-1] + len(chunk))
        self.xl = np.array([0])  # Lower bound (index of first compound)
        self.xu = np.array([self.cumulative_sizes[-1] - 1])  # Upper bound (index of last compound)
        print(self.xl)
        print(self.xu)

    def _evaluate(self, x, out, *args, **kwargs):
        print(f"x[0] type: {type(x[0])}, value: {x[0]}")  # Debug print statement
        compound_index = x[0]
        chunk_index = np.searchsorted(self.cumulative_sizes, compound_index, side='right') - 1
        index_in_chunk = compound_index - self.cumulative_sizes[chunk_index]
        chunk = pd.read_csv(self.chunk_paths[chunk_index])
        # Filter out compounds based on uncertainty and toxicity
        chunk = chunk[(chunk['Uncertainty']*100 > 50) & (chunk['Toxic']*100 < 50)]
        compound = chunk.iloc[index_in_chunk]
        out["F"] = np.array([compound['Uncertainty'] * 100, compound['Toxic'] * 100])

# List of chunk paths
chunk_paths = [f'/scratch/gpfs/sg6615/retraining/zinc/chunk_uncertainty_toxicity_{i}.csv' for i in range(5)]

# Initialize the problem
problem = CompoundProblem(chunk_paths)

# Define the algorithm
algorithm = NSGA2(pop_size=100)

# Define the termination criterion
termination = get_termination("n_gen", 100)

# Run the optimization
res = minimize(problem,
               algorithm,
               termination,
               save_history=True,
               verbose=True)

# Plot the results
plot = Scatter()
plot.add(res.F, color="red")
plot.show()
plot.save("pareto_front.png")

# Plot the hypervolume vs. number of generations
hv = [algorithm.pop.get("F").hv(ref_point=np.array([10000, 10000])) for algorithm in res.history]