Survival analysis #248

Open
1 of 3 tasks
tucnak opened this issue Nov 21, 2022 · 1 comment

Comments

tucnak commented Nov 21, 2022

I'm submitting a

  • bug report.
  • improvement.
  • feature request.

What is survival analysis?

The objective in survival analysis — also referred to as reliability analysis in engineering — is to establish a connection between covariates and the time of an event. The name survival analysis originates from clinical research, where predicting the time to death, i.e., survival, is often the main objective. Survival analysis is a type of regression problem (one wants to predict a continuous value), but with a twist. It differs from traditional regression by the fact that parts of the training data can only be partially observed – they are censored.
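
To make the effect of censoring concrete, here is a minimal sketch of the Kaplan-Meier estimator, a standard non-parametric way to estimate a survival curve from partially observed data. It is not taken from scikit-survival or Smartcore; the function name `kaplan_meier` and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  -- observed time per subject (event time or censoring time)
    events -- True if the event was observed, False if the subject was censored
    (Hypothetical helper for illustration only.)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every subject leaves the risk set at its observed time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Only observed events lower the survival estimate.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored subjects shrink the risk set without counting as events.
        at_risk -= exits[t]
    return curve


# Toy data: five subjects, two of them censored (at t=3 and t=8).
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

On this toy data the estimate drops to roughly 0.8, 0.6 and 0.3 at times 2, 3 and 5. The censored subjects never trigger a drop themselves, but they do count towards the number at risk until they leave, which is exactly the "partially observed" information an ordinary regression would throw away.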

re: scikit-survival

Given the popularity of time series data and time series databases, survival analysis provides a straightforward way to reason about time-to-event predictions. We learnt about Smartcore from PostgresML (machine learning for Postgres), which recently switched to Smartcore for performance reasons. We're trying to estimate how hard it would be to perform time-to-event prediction in our Postgres; it would seem that doing so would require a Smartcore implementation of survival analysis (originally implemented in scikit-survival, e.g. sksurv.linear_model).

Exactly how hard would it be, how much would have to be done from scratch, and where would we even begin to approach this if we don't have the expertise in the subject? Honestly, we would love to pay somebody to do this but where would we look to hire for this?

Best regards

Collaborator

Mec-iS commented Nov 21, 2022

Hi, thanks for using smartcore. We really like to hear from organizations leveraging the library; the engineers at PostgresML have been very kind to contribute useful code recently.

I have never heard of scikit-survival; I will take a look. I have worked a little with SCM (structural causal models; you can read some examples in my blog posts). At first sight the two look like they tackle similar problems, but I don't know of any formal relation between the two approaches. New day, new things to learn.

As usual, scikit provides a nice API and we can always try to work backwards from it. A scientist who knows the math in depth is sometimes hard to find; fortunately, we can usually collect a good number of tests from existing libraries, experiments, or papers to feel confident enough about numerical accuracy. Becoming an expert in a particular framework does take years, and so does making a good library. In my experience, the effort depends mostly on how many layers of nested functions are needed. In very simple terms: implementing a pipeline with multiple pre-processing and processing steps takes longer (usually months, even when existing code is available, because there are many "classes" and methods that call each other) than a simple regression, which usually means implementing only a few methods. In smartcore most of the classifiers fit in one file; multiple files usually mean compounded effort.
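
For a rough sense of what the "simple regression" end of that scale could look like here: the core objective of a Cox proportional hazards model, one of the linear survival models scikit-survival offers, is a fairly small function. The sketch below is an assumption-laden illustration in plain Python, not code from Smartcore or scikit-survival; the function name is hypothetical, tied event times are handled with the Breslow convention, and the optimizer that would minimise this quantity is left out.

```python
import math


def cox_neg_log_partial_likelihood(beta, X, times, events):
    """Negative Cox partial log-likelihood (Breslow convention for ties).

    beta   -- coefficient per covariate
    X      -- list of covariate rows, one per subject
    times  -- observed time per subject
    events -- True if the event was observed, False if censored
    (Hypothetical helper for illustration only.)
    """
    # Linear predictor for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    nll = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk)
    return nll


# Toy data: one covariate, five subjects, two of them censored.
X = [[0.5], [1.2], [0.3], [0.9], [0.1]]
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
value_at_zero = cox_neg_log_partial_likelihood([0.0], X, times, events)
```

With all coefficients at zero, every subject has the same hazard, so each observed event simply contributes the log of its risk-set size. That gives a hand-checkable value (log 5 + log 4 + log 2 for the toy data), which is the kind of small reference test that could later be cross-checked against an existing library.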
