How do I generate Arch features of new datasets from GLRM predict function #160
Comments
James: Thank you for bringing this issue to me. @us8945 has also raised a good question about how to score a new dataset using a trained GLRM model, so let me answer that first. Given a training dataset, the purpose of GLRM is to extract a set of basis vectors that span the subspace from which the training dataset is drawn. The GLRM model therefore generates a set of archetypes, which are equivalent to basis vectors. Each row vector in the training dataset can then be written as a linear combination of the archetypes: yi = x1*archetype1 + x2*archetype2 + x3*archetype3 + ... Here x1, x2, x3, ... are the coefficients returned for each row when we call predict on the training dataset. During training, we derive the archetypes and the coefficients together by alternating between them. Now, given a new dataset drawn from the same subspace as the training dataset, the job of the predict function is to find the coefficients for each data row using the archetypes derived earlier. We already know the archetypes and only need to find the coefficients. This is achieved by initializing the coefficients to random values and then using simple gradient descent to minimize the objective function and obtain the correct coefficients.
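To make the description above concrete, here is a minimal sketch of the idea in plain Python: hold the archetypes Y fixed, start each row's coefficients at random values, and run gradient descent on the reconstruction error. It assumes a plain quadratic loss with no regularization, which may not match the loss or regularizers a given GLRM model was trained with. The function name `fit_coefficients` and its parameters are made up for illustration and are not part of the H2O API.

```python
import random

def fit_coefficients(a_new, y, steps=500, lr=0.1, seed=0):
    """Find X_new that minimizes ||A_new - X_new * Y||^2 with Y held fixed.

    a_new: list of n rows, each with d values (the new dataset).
    y:     list of k archetypes, each with d values (fixed).
    Assumption: quadratic loss, no regularization. The step size lr must be
    small enough for the given Y, otherwise the iteration can diverge.
    """
    rng = random.Random(seed)
    k, d = len(y), len(y[0])
    x_new = []
    for row in a_new:
        # Random starting coefficients for this row.
        x = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        for _ in range(steps):
            # Residual of the current reconstruction: x * Y - row.
            resid = [sum(x[j] * y[j][c] for j in range(k)) - row[c]
                     for c in range(d)]
            # Gradient of 0.5 * ||resid||^2 with respect to each coefficient.
            grad = [sum(resid[c] * y[j][c] for c in range(d))
                    for j in range(k)]
            x = [x[j] - lr * grad[j] for j in range(k)]
        x_new.append(x)
    return x_new

# Toy data: two archetypes in three dimensions, and two new rows that were
# built as 2*y1 + 3*y2 and -1*y1 + 0.5*y2.
archetypes = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
new_rows = [[2.0, 3.0, 5.0], [-1.0, 0.5, -0.5]]
coeffs = fit_coefficients(new_rows, archetypes)
print([[round(v, 3) for v in r] for r in coeffs])
```

On this toy data the recovered coefficients should be approximately 2 and 3 for the first row and -1 and 0.5 for the second, since the rows lie exactly in the span of the archetypes.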
Thanks Wendy, but unfortunately this is beyond my depth. "In GLRM, you decompose a matrix A = XY and you perform clustering on X. For a new dataset, ANew, you need to get your new XNew. To do this, you perform XNew = ANew * inverse(Y)." How do I implement this ANew * inverse(Y) in R so that I can cluster on the features from XNew? Apologies, I'm quite a novice in this space.
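One way to read "ANew * inverse(Y)" when Y is not square is as a least-squares solve: for each new row a, find the coefficients x that satisfy the normal equations (Y Yᵀ) xᵀ = Y aᵀ. The sketch below does this in plain Python rather than R, again assuming quadratic loss and no regularization, so the result may differ from what a GLRM model trained with other settings would produce. The helper names `solve_linear` and `least_squares_scores` are hypothetical.

```python
def solve_linear(m, b):
    """Solve m * z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("archetypes are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares_scores(a_new, y):
    """Return X_new minimizing ||A_new - X_new * Y||^2 (quadratic loss only)."""
    k = len(y)
    # Gram matrix Y * Y^T, shared by every row of a_new.
    gram = [[sum(p * q for p, q in zip(y[i], y[j])) for j in range(k)]
            for i in range(k)]
    scores = []
    for row in a_new:
        # Right-hand side Y * row^T for this row.
        rhs = [sum(v * w for v, w in zip(row, y[j])) for j in range(k)]
        scores.append(solve_linear(gram, rhs))
    return scores

# Toy data: rows built as 2*y1 + 3*y2 and -1*y1 + 0.5*y2.
archetypes = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
new_rows = [[2.0, 3.0, 5.0], [-1.0, 0.5, -0.5]]
print(least_squares_scores(new_rows, archetypes))
```

The rows of the returned matrix play the role of XNew and could then be passed to a clustering routine in place of the original high-dimensional rows.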
James: I know what you are looking for: the X for a new dataset. Luckily, Uri (@us8945) has raised this issue with me. I will write a new function so that you can get the new X. The predict function returns Anew, but you are looking for the new X. I will get this done for you. Thank you for bringing this to our attention. W
Here is the JIRA: https://h2oai.atlassian.net/browse/PUBDEV-8750
I raised the same question here:
https://stackoverflow.com/questions/72753783/how-do-i-generate-the-archetypes-of-new-dataset-from-the-glrm-predict-function
I have used these sites as references, and though they have been helpful, I'm unable to regenerate the reduced dimensions of new datasets via the GLRM predict function.
I work in the sparklyr environment with H2O. I'm keen to use the GLRM function to reduce dimensions before clustering. Although I am able to access the PCAs or Arch from the model, I would like to generate the Arch for new datasets from the GLRM predict function.
Appreciate your help.
Here is the training of the GLRM model on the training dataset:
The Arch types from the training dataset:
But when I use the trained GLRM model on a new dataset to regenerate these Arch types,
I get the full dimensions instead of the Arch types, as in X above.
I'm using these Arch as features for clustering purposes.
Thank you.