Add a Poisson regression model to Shogun #5000

Open
Khalifa1997 opened this issue Apr 7, 2020 · 16 comments

Comments

@Khalifa1997
Contributor

As needed for my GSoC project, it would be quite beneficial for me and future Shogun users as well to have a Poisson regression model, especially for count-based problems. If possible, I'd like to work on this issue.

@karlnapf
Member

karlnapf commented Apr 7, 2020

As said on IRC, this is very welcome and useful. You will need to write a new machine class (potentially inheriting from LinearMachine, but not necessarily). I think it might be nice to make this modular wrt likelihood (and link) functions, although we can certainly start without that (as long as the likelihood code inside the class is separate from the rest). As you will need to use gradient descent to fit this model, make sure to check out Shogun's optimization algorithms, which you should use. Re gradient, you could start by implementing it by hand, and then make that also modular wrt the likelihood function. I suggest you start by sending class headers and a diagram of how things are intended to interact, then some pseudo-code, and then we can take it from there. Check how other libs are dealing with design/modularity/optimization. You could also start by writing a Python (object-oriented, roughly following Shogun's API) draft of everything if that helps.
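The Python draft suggested above might look something like the following sketch. All names here are illustrative, not actual Shogun API: a machine-style class with `train`/`apply`, a pluggable inverse link function, and the likelihood gradient kept in its own method, separate from the fitting loop.

```python
import numpy as np

class PoissonRegressionDraft:
    """Hypothetical Poisson regression draft with a pluggable inverse link."""

    def __init__(self, inverse_link=np.exp, lr=0.01, n_iter=500):
        # The canonical link for Poisson is log, so its inverse is exp.
        self.inverse_link = inverse_link
        self.lr = lr
        self.n_iter = n_iter
        self.w = None

    def _gradient(self, X, y):
        # Gradient of the (averaged) negative log-likelihood under the
        # log link: d/dw [-(y^T Xw - sum exp(Xw))] = X^T (exp(Xw) - y)
        mu = self.inverse_link(X @ self.w)
        return X.T @ (mu - y) / len(y)

    def train(self, X, y):
        # Plain gradient descent; in Shogun this would be delegated to an
        # existing optimization algorithm rather than hand-rolled.
        self.w = np.zeros(X.shape[1])
        for _ in range(self.n_iter):
            self.w -= self.lr * self._gradient(X, y)
        return self

    def apply(self, X):
        # Predicted mean counts: inverse link applied to the linear predictor.
        return self.inverse_link(X @ self.w)
```

Because the inverse link and the gradient are isolated, swapping in another likelihood later only means replacing those two pieces.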

@karlnapf
Member

karlnapf commented Apr 7, 2020


Here is a list of common link functions and likelihoods: https://www.statsmodels.org/stable/glm.html
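To make the family/link pairing from that list concrete, here is a small illustrative table of a few canonical links and their inverses (hypothetical helper names, not statsmodels or Shogun API). The inverse link maps the linear predictor eta = Xw to the response mean mu.

```python
import numpy as np

# A few common GLM families with their canonical link g and inverse g^{-1}.
CANONICAL_LINKS = {
    "poisson":  {"link": np.log,                              # g(mu) = log(mu)
                 "inverse": np.exp},                          # mu = exp(eta)
    "gaussian": {"link": lambda mu: mu,                       # identity
                 "inverse": lambda eta: eta},
    "binomial": {"link": lambda mu: np.log(mu / (1 - mu)),    # logit
                 "inverse": lambda eta: 1 / (1 + np.exp(-eta))},
}

def mean_from_eta(family, eta):
    """Apply the family's inverse link to the linear predictor."""
    return CANONICAL_LINKS[family]["inverse"](eta)
```

A modular design would let the model look up (or be injected with) one of these pairs instead of hard-coding exp/log.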

@geektoni
Contributor

geektoni commented Apr 7, 2020

As I mentioned in the email, this library could also be used as inspiration: https://github.com/glm-tools/pyglmnet

@Khalifa1997
Contributor Author

Khalifa1997 commented Apr 7, 2020

[class diagram attached: "Untitled Diagram"]
This is a class diagram of what I propose. We'd inherit from the Machine class since we are using the train() and apply() functions anyway. I have added a question mark on exposure since I am not sure whether it's applicable to all GLMs. Also, I don't think that different GLMs would have anything unique to them, so there is no need to implement a separate class per GLM (they would all just share the same data members, namely the betas/weights).

@vigsterkr
Member

linearmachine?

@Khalifa1997
Contributor Author

linearmachine?

If you mean inheriting from LinearMachine, the issue is that it already has members like bias that, to my knowledge, aren't applicable to GLMs at all.

@geektoni
Contributor

geektoni commented Apr 7, 2020

linearmachine?

If you mean inheriting from LinearMachine, the issue is that it already has members like bias that, to my knowledge, aren't applicable to GLMs at all.

GLMs can have a bias :) they are just another type of "linear model", after all. Have a look at this page for the math: http://glm-tools.github.io/pyglmnet/

@Khalifa1997
Contributor Author

linearmachine?

If you mean inheriting from LinearMachine, the issue is that it already has members like bias that, to my knowledge, aren't applicable to GLMs at all.

GLMs can have a bias :) they are just another type of "linear model", after all. Have a look at this page for the math: http://glm-tools.github.io/pyglmnet/

Do you mean the epsilon is the bias in these examples?

@geektoni
Contributor

geektoni commented Apr 7, 2020

The bias is usually represented by beta_0. The epsilon is a noise term.

@Khalifa1997
Contributor Author

Ahh, this makes sense. In my model I added all betas, including the bias, to one array; splitting them into a bias variable and the remaining "weights", both inherited from LinearMachine, would solve this problem.

@karlnapf
Member

karlnapf commented Apr 8, 2020

  • I agree on inheriting from linear machine to reuse weights/bias
  • I am not sure the optimizer should be part of the class state; it could (should?) interact from outside, and the same goes for things like the learning rate. The model would only specify the gradients.
  • link function should also be modular
  • no need for get methods, we have them for all parameters, rather be more precise about class members
  • not sure what "Distribution" here does?
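The optimizer-outside-the-model separation from the second point could be sketched as follows (names are hypothetical, not Shogun classes): the model only supplies a gradient function, while a separate optimizer owns the learning rate and update rule and can be swapped without touching the model.

```python
import numpy as np

class GradientDescent:
    """Stand-alone optimizer: owns learning rate and iteration count."""

    def __init__(self, lr=0.05, n_iter=300):
        self.lr, self.n_iter = lr, n_iter

    def minimize(self, grad_fn, w0):
        # The optimizer only ever sees a gradient callable, never the model.
        w = w0.copy()
        for _ in range(self.n_iter):
            w -= self.lr * grad_fn(w)
        return w

def poisson_nll_gradient(X, y):
    """Model-side gradient of the averaged negative log-likelihood
    for a log-link Poisson GLM."""
    def grad(w):
        mu = np.exp(X @ w)
        return X.T @ (mu - y) / len(y)
    return grad
```

With this split, replacing plain gradient descent with momentum or L-BFGS only means passing `poisson_nll_gradient(X, y)` to a different optimizer object.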

@Khalifa1997
Contributor Author

@karlnapf Yeah, the Distribution class doesn't belong here; I was confused about that part. But I don't understand points 2 and 3 you mentioned. Shouldn't there be some gradient update algorithm for Poisson to update the weights? And for 3, what do you mean by the link function being modular?

@karlnapf
Member

karlnapf commented Apr 9, 2020

link function modular -> different link functions possible.

About the optimizer, never mind for now. I suggest we move to some actual code to discuss this further.

@Khalifa1997
Contributor Author

Khalifa1997 commented Apr 11, 2020

Hey @karlnapf
I have made an initial header file, but I wanted to get your opinion on it beforehand: https://pastebin.com/VbeNDyM2
Here are some things I had thought of:
Since the link functions and distributions/families are meant to be modular, I thought of using an enum for link functions and one for distributions/families, instead of adding an extra layer of unneeded complexity where we'd have a class for link functions and another for families.
I would also like to know which type of Features I should be using; I saw some regression classes using DenseFeatures and others using DotFeatures. Which one would you recommend here?

Also, one last point: I am not sure how LikelihoodModels and DescendUpdaters would fit in here.
I used a pastebin instead of a PR so that I could hear your thoughts on the points I mentioned before opening a PR.
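For reference, the enum-based dispatch proposed above could look roughly like this (illustrative Python, not the actual pastebin header). The enum selects a link at runtime; the trade-off is that adding a new link means editing the dispatch function rather than adding a new class.

```python
from enum import Enum
import numpy as np

class LinkFunction(Enum):
    LOG = "log"
    IDENTITY = "identity"
    LOGIT = "logit"

def apply_inverse_link(link, eta):
    """Map the linear predictor eta to the mean via the selected link."""
    if link is LinkFunction.LOG:
        return np.exp(eta)
    if link is LinkFunction.IDENTITY:
        return eta
    if link is LinkFunction.LOGIT:
        return 1.0 / (1.0 + np.exp(-eta))
    raise ValueError(f"unsupported link: {link}")
```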

@karlnapf
Member

It is really hard to give feedback from a pastebin because I cannot comment inline.

However

  • things like the learning rate should not be part of the class, as they belong to the inference algorithm/gradient descent and as such should be stored in a class related to that
  • I don't like the enums for the modularity, but you can keep it like this for now; we can work on this later
  • please check how to document C++ code with doxygen, it is different from Python docstrings
  • please follow Shogun standards in terms of code formatting, variable naming, whitespace, etc.
  • don't accept features in the ctor; they are passed to the train method (which is inherited from Machine and accepts Features)
  • the code doesn't show how you would fit the model; you should write a train_machine function (read the base classes for the exact methods that you need to override, and check other simple algorithms to see how to do it)
  • your class needs a get_name
  • related to that, please always make sure your code compiles before you ask for feedback; the compiler will tell you many things that you need to do (like adding get_name, as it is purely virtual in the base class)
  • in the train method, assume that you have a gradient descent class that accepts parameters (a state) and a gradient function, and does the updating for you
  • try to make something work; this draft covers only an extremely small fraction of what you need to do to write this model, so code that actually runs and does something is what you need to aim for
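The override pattern described in these points can be mocked up in Python (the real Machine base class is C++; the base class here is a hypothetical stand-in): train_machine does the fitting, get_name satisfies the pure-virtual requirement, and features are passed to train() rather than the constructor.

```python
import numpy as np

class Machine:
    """Hypothetical stand-in for the Shogun base class."""

    def train(self, features, labels):
        # The public entry point delegates to the subclass override.
        return self.train_machine(features, labels)

    def get_name(self):
        raise NotImplementedError  # "purely virtual" in the base class

class PoissonRegression(Machine):
    def __init__(self, lr=0.05, n_iter=300):
        # No features in the ctor, only hyperparameters and state.
        self.lr, self.n_iter, self.w = lr, n_iter, None

    def get_name(self):
        return "PoissonRegression"

    def train_machine(self, X, y):
        # Gradient descent on the Poisson negative log-likelihood (log link);
        # in the real class this loop would be handed to an optimizer object.
        self.w = np.zeros(X.shape[1])
        for _ in range(self.n_iter):
            mu = np.exp(X @ self.w)
            self.w -= self.lr * (X.T @ (mu - y)) / len(y)
        return self
```

This is only a shape sketch: it shows which methods the subclass must provide, not how the actual C++ header should read.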
