Merge pull request #50 from flennerhag/dev
0.1.6
flennerhag authored Aug 20, 2017
2 parents 815256d + 604e880 commit a88bc6a
Showing 4 changed files with 17 additions and 29 deletions.
30 changes: 8 additions & 22 deletions docs/gotchas.rst
@@ -12,30 +12,16 @@ problem is not addressed here.
Bad interaction with third-party packages
-----------------------------------------

Parallel processing with generic Python objects is a difficult task, and while
ML-Ensemble is routinely tested to function seamlessly with Scikit-learn, other machine
learning libraries can cause bad behaviour during parallel estimations. This
is unfortunately a fundamental problem rooted in how `Python runs processes in parallel`_,
and in particular that Python is not thread-safe. ML-Ensemble is configured by default
to avoid such issues to the greatest extent possible, but issues can occur.
ML-Ensemble is designed to work with any estimator that implements a minimal API, and is specifically unit tested to work with Scikit-learn. When using estimators from other libraries, the estimation can stall and fail to complete. A clear sign of this is that no Python process shows high CPU usage.

In particular, ensemble can run either on multiprocessing or multithreading.
For standard Scikit-learn use cases, the GIL_ can be released and
multithreading used. This will speed up estimation and consume less memory.
However, Python is not inherently thread-safe, so this strategy is not stable.
For this reason, the safest choice to avoid corrupting the estimation process
is to use multiprocessing instead. This requires creating a sub-process to run
each job, and so adds overhead both in terms of job management
and sharing memory. As of this writing, the default setting in ML-Ensemble is
'multiprocessing', but you can change this variable globally: see :ref:`configs`.
Due to how `Python runs processes in parallel`_, child workers can receive a corrupted thread state that causes the worker to try to acquire more threads than are available, resulting in a deadlock. If this happens, raise an issue at the Github repository.
There are a few things to try that might alleviate the problem:

In Python 3.4+, ML-Ensemble defaults to ``'forkserver'`` on unix systems
and ``'spawn'`` on Windows for generating sub-processes. These require more
overhead than the default ``'fork'`` method, but avoid corrupting the thread
state and as such are much more stable against third-party conflicts. These
conflicts are caused by each worker thinking they have more threads available
than they actually do, leading to deadlocks and race conditions. For more
information on this issue see the `Scikit-learn FAQ`_.
#. ensure that all estimators in the ensemble or evaluator have ``n_jobs`` or ``nthread`` equal to ``1``,
#. change the ``backend`` parameter to either ``threading`` or ``multiprocessing`` depending on what the current setting is,
#. try using ``multiprocessing`` together with a fork method (see :ref:`configs`).

For more information on this issue see the `Scikit-learn FAQ`_.
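
As a rough illustration of the first suggestion above (not part of this commit, and not an ML-Ensemble API), the following sketch forces any ``n_jobs`` or ``nthread`` parameter to ``1`` on estimators that expose Scikit-learn style ``get_params`` and ``set_params`` methods. ``DummyEstimator`` and ``force_single_worker`` are hypothetical names used only for this example; a real estimator from a third-party library would take the place of the stand-in class.

```python
class DummyEstimator:
    """Hypothetical stand-in for a third-party estimator with a
    Scikit-learn style parameter API."""

    def __init__(self, n_jobs=-1):
        self.n_jobs = n_jobs

    def get_params(self, deep=True):
        return {"n_jobs": self.n_jobs}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


def force_single_worker(estimators, names=("n_jobs", "nthread")):
    """Set every parallelism parameter found on each estimator to 1.

    Estimators that expose none of the listed parameter names are
    left untouched.
    """
    for est in estimators:
        params = est.get_params()
        updates = {name: 1 for name in names if name in params}
        if updates:
            est.set_params(**updates)
    return estimators


estimators = force_single_worker(
    [DummyEstimator(n_jobs=-1), DummyEstimator(n_jobs=4)]
)
print([est.n_jobs for est in estimators])
```

Applying such a helper before passing estimators to an ensemble leaves parallelism to the ensemble's own backend, which is the situation the first suggestion aims for.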

Array copying during fitting
----------------------------
6 changes: 1 addition & 5 deletions docs/index.rst
@@ -10,11 +10,7 @@ estimator. By leveraging API elements from deep learning libraries like Keras_
for building ensembles, it is straightforward to build deep ensembles
with complex interactions.

ML-Ensemble is open for contributions at all levels. There is
some low-hanging fruit in building introductory examples, use cases and
general benchmarks. If you would like to get involved, reach out to the
project's Github_ repository. We are currently in beta testing, so please do
report any bugs or issues by creating an issue_. If you are interested in
ML-Ensemble is open for contributions at all levels. If you would like to get involved, reach out to the project's Github_ repository. We are currently in beta testing, so please report any bugs or issues by creating an issue_. If you are interested in
contributing to development, see :ref:`dev` for a quick introduction to
ensemble implementation, or check out the issue tracker.

8 changes: 7 additions & 1 deletion docs/updates.rst
@@ -27,7 +27,13 @@ Change log

* 07/2017 Release_ of version 0.1.5.1 and 0.1.5.2
- Bug fixes
- ```clear_cache`` function to check for residual caches. Safeguard against old caches not being killed.
- ``clear_cache`` function to check for residual caches. Safeguard against old caches not being killed.

* 08/2017 Release_ of version 0.1.6
- Propagate sparse input features
- On the fly prediction array generation
- Threading as default backend, ``fork`` as default fork method
- Bug fixes

.. _Release: https://github.com/flennerhag/mlens/releases
.. _Feature propagation:
2 changes: 1 addition & 1 deletion mlens/__init__.py
@@ -11,7 +11,7 @@
import mlens.config
from mlens.config import clear_cache

__version__ = "0.1.5.dev0"
__version__ = "0.1.6"

__all__ = ['base',
'utils',
