Cluster Analysis of Unsupervised Political Opinions - Undergraduate senior thesis @ Universidad Simón Bolívar
To reproduce the workflow of the project's initial experiments, you can follow this suggested order (each experiment is independent of the others, so you can also run just one of them, or several in any order you want):
measure_embeddings.py
measure_proper_k_for_embeddings.py
measure_possible_eps_values_for_embeddings.py
measure_hyperparameters_for_dbscan.py
measure_optics_clusters.py
measure_hdbscan_clusters.py
measure_mean_shift_clusters.py
measure_affinity_propagation_clusters.py
Alternatively, you can run the following script, which was added to the repository to make each measurement run easier and to handle some extra details, such as turning off Twitter fetching while the tests run:
nohup nice -n -19 ./run_initial_experiments.sh &
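If you only want to run the experiments in sequence from Python, here is a minimal sketch that runs the scripts listed above one after another with python -u. It assumes it is launched from the directory that contains the scripts and that they need no command-line arguments. It does not perform the extra steps of run_initial_experiments.sh (such as turning off Twitter fetching). The names EXPERIMENTS, build_command and run_in_order are hypothetical and not part of the repository.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Suggested order from this README (hypothetical constant). Each experiment
# is independent, so any subset or order also works.
EXPERIMENTS = (
    "measure_embeddings.py",
    "measure_proper_k_for_embeddings.py",
    "measure_possible_eps_values_for_embeddings.py",
    "measure_hyperparameters_for_dbscan.py",
    "measure_optics_clusters.py",
    "measure_hdbscan_clusters.py",
    "measure_mean_shift_clusters.py",
    "measure_affinity_propagation_clusters.py",
)


def build_command(script: str) -> list[str]:
    """Use the current interpreter with -u so output reaches logs immediately."""
    return [sys.executable, "-u", script]


def run_in_order(scripts=EXPERIMENTS, workdir: str = ".") -> dict[str, int | None]:
    """Run each script sequentially from `workdir`.

    Returns a mapping from script name to its exit code, or None if the
    script was not found and was therefore skipped.
    """
    results: dict[str, int | None] = {}
    for script in scripts:
        if not (Path(workdir) / script).exists():
            print(f"skipping {script}: not found in {workdir}", file=sys.stderr)
            results[script] = None
            continue
        # Blocks until this experiment finishes before starting the next one.
        completed = subprocess.run(build_command(script), cwd=workdir)
        results[script] = completed.returncode
    return results


if __name__ == "__main__":
    run_in_order()
```

Missing scripts are skipped instead of aborting the run, since each experiment is independent; a non-zero exit code in the returned mapping points to the experiment whose log needs checking.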
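To run Python scripts that are costly or resource-intensive:
- Use nohup so that internet interruptions (for example, a dropped SSH connection) don't... interrupt your script (delete previous nohup.out files first, since nohup appends to them).
- Use nice -n -19 to give your script a very high CPU scheduling priority (unless you don't need to). Note that -20 is the highest priority and that negative niceness values require root privileges.
- Use python -u to disable stdout buffering, so that output, exceptions and errors are written to the log files immediately.
- Use & to run your script in the background.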
Example:
nohup nice -n -19 python -u measure_embeddings.py &
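If you prefer to launch a long experiment from Python (for example, from a notebook), the sketch below approximates the same combination with the standard subprocess module. It is not identical to nohup: instead of ignoring SIGHUP, it starts the child in a new session, which detaches it from the terminal. The function name launch_in_background is hypothetical, and the sketch only works on Unix-like systems.

```python
import os
import subprocess
import sys


def launch_in_background(script, log_path="nohup.out", niceness=0):
    """Start `python -u <script>` detached from the current terminal.

    Rough equivalent of `nohup nice -n <niceness> python -u <script> &`:
    output is appended to `log_path`, and the child gets its own session so
    closing the terminal (SIGHUP) does not stop it. Unix only.
    """
    def adjust_priority():
        # Runs in the child just before exec. Negative values (higher
        # priority) require root; otherwise the launch fails with an error.
        if niceness:
            os.nice(niceness)

    with open(log_path, "ab") as log:  # append, as nohup does
        proc = subprocess.Popen(
            [sys.executable, "-u", script],
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,  # interleave errors with regular output
            start_new_session=True,    # setsid(): detach from the terminal's session
            preexec_fn=adjust_priority,
        )
    # The child holds its own copy of the log descriptor, so closing ours is safe.
    return proc  # returns immediately, like `&`
```

For instance, launch_in_background("measure_embeddings.py", niceness=-19) mirrors the example command when run as root; with a positive niceness it works for regular users too.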
- MongoDB cheat sheet: https://gist.github.com/bradtraversy/f407d642bdc3b31681bc7e56d95485b6
To run the API (back-end), use this command:
gunicorn -w 2 --threads 2 --preload --bind 0.0.0.0:5000 backend.app:app
Adjust the port and other parameters of Gunicorn as needed.
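The same settings can also live in a Gunicorn configuration file, which is plain Python, so they don't have to be retyped each time. A minimal sketch, assuming a file named gunicorn.conf.py at the repository root (it is not part of the original repository) and a hypothetical API_PORT environment variable for changing the port:

```python
# gunicorn.conf.py: the command-line flags above expressed as a Gunicorn
# configuration file (a sketch; API_PORT is a hypothetical variable).
import os

# --bind: listen on all interfaces; the port can be overridden per machine.
bind = "0.0.0.0:" + os.environ.get("API_PORT", "5000")

# -w: number of worker processes.
workers = 2

# --threads: threads per worker. With more than one thread, Gunicorn's
# default sync worker is replaced by the threaded (gthread) worker.
threads = 2

# --preload: import the application once in the master process before
# forking the workers, which can save memory and speed up startup.
preload_app = True
```

Then start the server with gunicorn -c gunicorn.conf.py backend.app:app; flags given on the command line still take precedence over the values in the file.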