Crystal Ball for Startups: Predictive Modeling Meets Empirical Analysis to Unearth the Key to Startup Success!
Welcome aboard the voyage of discovery that is "Demystifying Startup Success"! This Final Year Project (FYP), which scored full marks, uses Crunchbase's rich dataset and the power of machine learning to provide insights into the journey of startups. By examining why 90% of startups fail, this project aims to steer entrepreneurs and investors towards better decisions, mitigating pitfalls and maximizing opportunities. The models developed through this research promise a high degree of reliability, as the backbone of our claims is data, an asset that seldom lies. The project has two main stages, data preparation and predictive modeling, which are implemented as separate but interconnected modules.
No additional libraries are needed to run this project beyond the Anaconda distribution of Python. To run the code yourself, make a copy of the notebooks and edit them in Google Colab. The code should run with no issues on Python 3.x.
The startup ecosystem has seen an explosion of activity in recent years. However, the high failure rate (90%) casts a shadow of uncertainty, creating a challenge for investors and entrepreneurs. A lack of financial records and proven track records further exacerbates this uncertainty. Through this research, we aim to harness data to navigate this uncertain world and guide investors and entrepreneurs alike. Covering multiple continents, we aspire to deliver universally applicable insights.
There are two primary scripts in this repository:
crunchbase_cleanup.ipynb
- This script performs the data cleaning, feature engineering, and variable selection on the Crunchbase dataset, transforming numerous tables into a single, manageable dataframe.
startup_predictor.ipynb
- This script takes the clean, prepared data from the previous script and leverages it to construct and train our predictive models.
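As a rough illustration of what the data preparation stage involves, the sketch below collapses two small tables into one row per startup using pandas. It is not the code from crunchbase_cleanup.ipynb: the table names, column names, sample values, and the rule that labels a "closed" startup as a failure are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical miniature stand-ins for Crunchbase tables. The real dataset
# has many more tables and columns; every name and value here is an assumption.
organizations = pd.DataFrame({
    "org_id": ["a", "b", "c"],
    "country_code": ["USA", "GBR", None],
    "founded_on": ["2015-03-01", "2018-07-15", "2012-01-10"],
    "status": ["operating", "closed", "acquired"],
})
funding_rounds = pd.DataFrame({
    "org_id": ["a", "a", "b"],
    "raised_usd": [1_000_000, 5_000_000, 250_000],
})


def build_startup_frame(orgs, rounds):
    """Collapse the two tables into a single dataframe, one row per startup."""
    # Aggregate the one-to-many funding table down to one row per organization.
    funding = (
        rounds.groupby("org_id")["raised_usd"]
        .agg(total_raised_usd="sum", num_rounds="count")
        .reset_index()
    )
    # A left join keeps startups that have no recorded funding round.
    df = orgs.merge(funding, on="org_id", how="left")

    # Cleaning: no recorded rounds is treated as zero funding, not as missing.
    funding_cols = ["total_raised_usd", "num_rounds"]
    df[funding_cols] = df[funding_cols].fillna(0)
    df["country_code"] = df["country_code"].fillna("unknown")

    # Feature engineering: turn the founding date into a numeric year.
    df["founded_year"] = pd.to_datetime(df["founded_on"]).dt.year

    # Illustrative target only: "closed" counts as failure, anything else as success.
    df["success"] = (df["status"] != "closed").astype(int)

    # Variable selection: drop the raw columns the features were derived from.
    return df.drop(columns=["founded_on", "status"])


startups = build_startup_frame(organizations, funding_rounds)
print(startups)
```

A frame shaped like `startups`, with one row per company, numeric features, and a binary target, is the kind of input a modeling notebook such as startup_predictor.ipynb would consume.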
To get a comprehensive understanding of this project, please visit [Link to Project]. You are invited to explore, fork, clone, or provide feedback on this project.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Created by Marc Violides
This project was inspired by the entrepreneurial spirit and the quest to better understand the world of startups. The work is also an academic endeavor that seeks to add to the existing body of knowledge in the field of startup success prediction. Special thanks to Crunchbase for providing the dataset through its educational license.
This FYP is meticulously organized to guide you through the research journey:
- The introduction: Defines the scope, context, research questions, and objectives.
- The literature review: Offers an overview of current research on startups, machine learning, and predictive modeling for startup success.
- The methodology: Details the research design, data collection and processing techniques, feature selection, and modeling approaches.
- Results and analysis: Presents the univariate, bivariate, and regression analyses, followed by the modeling phase and optimization of the selected models.
- Discussion: Interprets the findings, their practical implications, and comparisons with similar studies while also discussing the study's limitations.
- Conclusion: Highlights the key takeaways and main results.
Please check the project link above for the complete exploration of this work. Let me know if you have any suggestions or questions!
The motivation behind this project was to understand why 90% of startups fail. By identifying the key factors for success, we can provide a roadmap for entrepreneurs and startups to avoid common pitfalls.