Releases: AnotherSamWilson/miceforest

Major Update 6

27 Jul 14:28

Major Update 6 comes with improvements in API usability.

  • Native support for imputing numpy arrays has been removed. It made the code too complex.
  • Mean match customization is now much simpler, and is handled entirely through parameters instead of custom classes. The parameters mean_match_strategy and mean_match_candidates are all that is needed to control mean matching.
  • The saving and loading of kernels was modernized to make use of __getstate__ and __setstate__, without the need for a load_kernel method.
  • Major improvements to the test suite.
  • Plotting was moved to plotnine.
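
The getstate/setstate hooks mentioned above correspond to Python's standard __getstate__ and __setstate__ pickling methods. As a minimal sketch, assuming a hypothetical ToyKernel class (this is not miceforest's actual implementation), an object can choose what gets serialized and rebuild the rest on load, so a plain pickle round trip is enough and no dedicated loader function is needed:

```python
import pickle


class ToyKernel:
    """Hypothetical stand-in for a fitted kernel; not miceforest's real class."""

    def __init__(self, data):
        self.data = data
        # Derived cache that is cheap to rebuild, so it is not worth storing.
        self._column_means = self._compute_means()

    def _compute_means(self):
        return {name: sum(col) / len(col) for name, col in self.data.items()}

    def __getstate__(self):
        # Called by pickle when saving: leave the cache out of the saved state.
        state = self.__dict__.copy()
        del state["_column_means"]
        return state

    def __setstate__(self, state):
        # Called by pickle when loading: restore attributes, then rebuild the cache.
        self.__dict__.update(state)
        self._column_means = self._compute_means()


kernel = ToyKernel({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]})
restored = pickle.loads(pickle.dumps(kernel))
```

The same round trip works with pickle.dump and pickle.load on a file opened in binary mode.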

Release for Zenodo DOI

12 Dec 14:01

This release will generate a DOI for this project.

Stable v5.6.0

29 Jul 19:33
8a5ecac

This release implemented some major changes:

  • Implemented MeanMatchScheme
  • Implemented mean matching on shap values
  • Tighter controls and warnings around categorical levels
  • Included type hints for major functions.

This release is marked as stable because the API will not see significant changes in the future.

v5.0.0

15 Oct 19:33
  • New main classes (ImputationKernel, ImputedData) replace (KernelDataSet, MultipleImputedKernel, ImputedDataSet, MultipleImputedDataSet).
  • Data can now be referenced and imputed in place. This saves a lot of memory allocation and is much faster.
  • Data can now be completed in place. This allows for only a single copy of the dataset to be in memory at any given time, even if performing multiple imputation.
  • mean_match_subset parameter has been replaced with data_subset. This subsets the data used to build the model as well as the candidates.
  • More performance improvements around when data is copied and where it is stored.
  • Raw data is now stored in its original form. Both pandas DataFrame and numpy ndarray are handled.
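
To illustrate the difference between completing a copy and completing in place, here is a minimal sketch using plain Python containers. The complete function, its inplace flag, and the data layout are hypothetical and chosen only for illustration; they are not the library's API.

```python
import copy


def complete(data, imputations, inplace=False):
    """Fill missing cells (None) using a {(column, row): value} mapping."""
    # Work on the caller's object directly, or on a deep copy of it.
    target = data if inplace else copy.deepcopy(data)
    for (column, row), value in imputations.items():
        target[column][row] = value
    return target


raw = {"age": [31, None, 45], "height": [None, 170, 182]}
fills = {("age", 1): 38, ("height", 0): 175}

copied = complete(raw, fills)               # raw is left untouched
same = complete(raw, fills, inplace=True)   # raw itself is modified
```

With inplace=True no second copy of the dataset is allocated, which is where the memory saving described above would come from.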

Major update

28 Sep 20:31

This release improved a number of areas:

  • Huge performance improvements, especially if categorical variables were being imputed. These come from not predicting candidate data if we don't need to, using a much faster neighbors search, using numpy internally for indexing instead of pandas, and others.
  • Ability to tune parameters of models, and use best parameters for mice.
  • Improvements to code layout - got rid of ImputationSchema.
  • Raw data is now stored as a numpy array to save space and improve indexing.
  • Numpy arrays can be imputed, if you want to avoid pandas.
  • A choice of several built-in mean matching functions.
  • Mean matching functions can handle most lightgbm objectives.
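
For readers unfamiliar with mean matching, the idea can be sketched in a few lines: each missing row borrows the observed value of a candidate row whose model prediction is close to its own. The function below is a brute-force illustration with hypothetical names; it is not one of the library's built-in functions and does not use the faster neighbors search mentioned above.

```python
import random


def mean_match(candidate_preds, candidate_values, missing_preds, k, rng):
    """Impute each missing row by sampling from its k nearest candidates.

    candidate_preds: model predictions for rows where the value is observed
    candidate_values: the observed values for those same rows
    missing_preds: model predictions for rows where the value is missing
    k: number of nearest candidates to sample from (hypothetical name)
    rng: a random.Random instance, passed in for reproducibility
    """
    imputed = []
    for pred in missing_preds:
        # Rank candidates by how close their prediction is to this row's.
        nearest = sorted(
            range(len(candidate_preds)),
            key=lambda i: abs(candidate_preds[i] - pred),
        )[:k]
        # Borrow the observed value of one randomly chosen neighbour.
        imputed.append(candidate_values[rng.choice(nearest)])
    return imputed


rng = random.Random(0)
candidate_preds = [1.0, 2.0, 3.0, 10.0]
candidate_values = [1.1, 1.9, 3.2, 9.7]
result = mean_match(candidate_preds, candidate_values, [2.1, 9.0], k=2, rng=rng)
```

Because imputed values are always drawn from observed ones, the result stays within the range of the real data. With k=1 the choice is deterministic and each row simply takes its nearest candidate's value.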

Switch to lightgbm

03 Sep 21:46

This is a major release, with breaking API changes:

  • The random forest package is now lightgbm
    • Much more lightweight (serialized kernels tend to be 5x smaller or more)
    • Much faster on big datasets (for comparable parameters)
    • More flexible in general. For example, gbdt can now be used if we wish.
  • Added a mean_match_subset parameter. This will help greatly speed up many processes.
  • mean_match_candidates now lazily accepts dicts as long as the keys are a subset of parameters in variable_schema.
  • Model parameters can be specified by variable, or globally.
  • Mean matching function can be overwritten if the user wishes.
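
One way to picture settings that may be given either globally or per variable is a small resolver that expands both forms into one value per variable. This is a sketch under stated assumptions: resolve_per_variable, its default argument, and the fallback behaviour for variables missing from the dict are hypothetical and not taken from the library.

```python
def resolve_per_variable(setting, variables, default):
    """Expand a global value or a partial dict into one value per variable."""
    if isinstance(setting, dict):
        # Keys must be a subset of the known variables.
        unknown = set(setting) - set(variables)
        if unknown:
            raise ValueError(f"Unknown variables: {sorted(unknown)}")
        # Variables missing from the dict fall back to the default (assumption).
        return {var: setting.get(var, default) for var in variables}
    # A single value applies to every variable.
    return {var: setting for var in variables}


variables = ["age", "height", "weight"]
global_setting = resolve_per_variable(5, variables, default=3)
partial_setting = resolve_per_variable({"age": 10}, variables, default=3)
```

Here global_setting assigns 5 to every variable, while partial_setting assigns 10 to age and the default of 3 to the others.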

Major Update

08 Sep 15:34
  • Models from all iterations can be saved with save_models == 2.
  • Kernel classes inherit from base imputed classes, which allows methods to be called on imputed datasets obtained from impute_new_data().
  • A time log was added.
  • MultipleImputedDataSet is now a collection of ImputedDataSets with methods for comparing them. Subscripting gives the desired dataset.
  • Tests updated to be much more comprehensive
  • Datasets can now be added and removed from a MultipleImputedDataSet/MultipleImputedKernel.

Stable Release

31 Aug 01:35

Automatic testing, coverage, and formatting have been implemented. Code is (reasonably) bug free.