Hello World! My name is Luis Vinicius, also known as Tito! Make yourself at home and find content on Data Science, Machine Learning and Python here.👋🏾
My Data Science Repository | Linkedin
A data science nut, fearless young man and rap listener, I research ethics, bias and justice in the use of algorithms and the development of machine learning models. I graduated in Information Management at UFPE and work as a Data Engineer at Fundação 1bi while studying for a master's degree in Computer Science at the UFPE IT Center (researching ethics, bias and algorithmic justice in machine learning). I have also been studying data engineering and creating content at el tito lab.
As a data engineer at the 1BI Foundation, I am responsible for building the data architecture and pipelines behind the foundation's products and for developing data analyses for an educational platform that aims to support thousands of teachers in preparing classes.
As a Junior Data Scientist at Data Rudder, I worked on projects in the areas of fraud, finance, health, credit and collection with large clients.
KNOWLEDGE IN:
- Programming languages: Python, R and JavaScript.
- Analytics: SQL, PostgreSQL, MySQL, Power BI, Data Studio and Qlik.
- What I usually use for DS and ML: Pandas, Scikit-learn, Plotly, PySpark, PyCaret, SciPy, NumPy, regression and classification algorithms.
- Extra knowledge: Agile methodologies, TensorFlow, building/deploying ML models and Deep Learning.
KNOWLEDGE ABOUT DATA ENGINEERING:
- Pipeline orchestration with: Apache Airflow and Prefect
- Data processing with Spark on Databricks
- Big data with Hadoop and Hive
- Understanding of concepts: ETL, ELT, Data lakes, Data Warehouse, Batch and Streaming
- Data visualization integration with Metabase
- AWS databases: AWS RDS (Aurora) and AWS DynamoDB
- Data lakehouse with AWS: S3 and Athena
- Data ingestion with Airbyte and AWS Kinesis
- Data transformation with dbt
- Other AWS tools: SageMaker, Amazon EMR and Glue (as a data crawler)
For those who don't know me, my name is Luis Vinicius, also known as Tito, and I'm 23 years old. Although I have always admired technology, my interest intensified in 2016, when I wanted to work at Riot Games, the company that created League of Legends. In 2018, I finished high school and already had the technology background needed to pursue the Information Management course at the Federal University of Pernambuco, where I am currently completing my degree. In 2019, I became interested in the Python programming language for data science and artificial intelligence, and since then I have been working toward becoming a good professional and a reference in the field.
Here you can find the notebooks of my projects in the area of Data Science and Machine Learning.
- Machine learning project to predict possible outcomes of 2022 World Cup matches: A project driven by curiosity and machine learning study, with the aim of developing a model capable of predicting possible outcomes of the 2022 World Cup matches, all the way to the champion of the tournament.
- League of Legends and Data Science – Predicting match results: This end-to-end Machine Learning project went from collecting match data to building a machine learning model that predicts the chances that the team playing on the blue side of the map wins. The steps included pre-processing and data analysis, dimensionality reduction and variable selection, and building both a final model with XGBClassifier and a logistic regression model based on the results obtained from AutoML with PyCaret.
- Spotify & Python and Data Science – Data Analysis of Artist NexoAnexo Albums: The objective of the project was to analyze the Spotify albums of the artist NexoAnexo, going through the main steps of a data analysis: data collection, pre-processing, exploration and visualization. Once the analysis was complete, an application/dashboard was built with Python and made available on the web through Heroku. A second objective was to use the conclusions drawn from the album and song data to identify factors that help an album or song become more successful and how they could be used in future releases.
Repository/Application source code
After I publicized the project and the application developed with Streamlit on LinkedIn, Ted Ricks, from Product Marketing at Streamlit, found my application and said, in his words, "Really enjoyed your app- wanted to let you know it was included in this week's Weekly Roundup on our community forum Streamlit". The application was therefore included in the [weekly summary 29/11/2020](https://discuss.streamlit.io/t/weekly-roundup-agraph-components-streambackmachines-text-generation-tutorials-and-more/7640) from the Streamlit community, in the apps of the week topic.
- What they didn't tell you about the coronavirus: An analysis of covid-19 data: During the month of March, in China, the number of recovered cases was already greater than the number of confirmed cases. However, countries such as the United States and South Korea still had more deaths than recovered cases, and for countries like Canada and Brazil the outbreak was still very new. In this brief analysis of covid-19 data from 01/22 to 03/09, I was able to identify and flag the rising number of deaths in countries like the United States, even before the peak of the virus.
- Machine Learning for breast cancer detection: In this Machine Learning project, a simple Machine Learning model was built in order to detect the presence of breast cancer.
- Exploratory Data Analysis with Streamlit: This application was built with Streamlit, a Python framework for creating applications/dashboards. The application performs an initial exploratory analysis of the data through statistical methods and data visualization. I also took the opportunity to include some statistical explanations in the application.
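As a rough illustration of the kind of initial statistical summary an exploratory analysis like the one above might compute, here is a minimal sketch that uses only Python's standard library. The function name `summarize` and the sample values are hypothetical and are not taken from the application itself.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "max": ordered[-1],
        "iqr": q3 - q1,  # interquartile range
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration
    for name, value in summarize(sample).items():
        print(f"{name}: {value}")
```

In a dashboard, a summary like this would typically be shown next to a histogram or box plot of the same column.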
These are the technology events and meetups where I have spoken about Data Science.
- Speaker (Northeast Python 2024) - Data balancing with Python to mitigate sampling and algorithmic biases: The lecture presents an experiment that builds a classification model with data balancing, balancing data from groups such as gender and race with techniques such as SMOTE, undersampling and oversampling on unfair datasets that have imbalanced classes, and finally evaluates the model with and without balancing.
- Speaker (AfroPython 2023) - How to build an ML model capable of predicting possible results of the 2022 World Cup: The talk covered how the model was built and how it is possible to build a model capable of predicting the possible results and the winner of the 2022 World Cup using Python, its data science libraries and data extracted from FIFA.
- Speaker (Global Latinx tech Conference by GitHub 2022) - How to build an ML model capable of predicting possible results of the 2022 World Cup: The talk covered how the model was built and how it is possible to build a model capable of predicting the possible results and the winner of the 2022 World Cup using Python, its data science libraries and data extracted from FIFA.
- The first three months of 2020 - Python analysis of the first impact of covid-19 - Northeast Python 2022: The lecture presented the results of an analysis carried out with Python on covid-19 data from 01/22/2020 to 03/09/2020, a period in which some countries such as China had already suffered a major impact from covid-19.
- What happens if you join Spotify and Python and Data Science - Python Brazil 2020: The lecture presented the Spotify API and how it can be useful to developers and especially to data professionals. It then showed how to collect and analyze Spotify data using only the Python programming language.
- Data Science and How to Maintain Good Practices When Building Data Visualizations - Technology Week - University Uninassau: The lecture, presented at the Uninassau Technology Week, aimed to introduce students to the world of Data Science. It covered data science today, the fact that it is an interdisciplinary area not tied only to technology, how students could take their first steps into the data science market, and how to maintain good practices when building data visualizations.
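The data balancing talk listed above mentions oversampling as one way to handle imbalanced classes. Here is a minimal sketch of plain random oversampling with only the standard library; it is an illustration, not the code used in the talk. The function name `random_oversample` and the toy data are hypothetical. Note that SMOTE differs from this sketch: it synthesizes new minority points by interpolation, while the code below only duplicates existing rows.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


if __name__ == "__main__":
    rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8]]
    labels = [0, 0, 0, 0, 0, 0, 1, 1]  # made-up, imbalanced toy labels
    new_rows, new_labels = random_oversample(rows, labels)
    print(Counter(labels), "->", Counter(new_labels))
```

To compare a model with and without balancing fairly, the resampling is usually applied only to the training split, so the test split keeps the original class distribution.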
These are some of the articles I wrote on LinkedIn. My articles there tend to be more opinionated and informal, with more insights from the projects, curiosities and doubts about data science and machine learning, and the pains faced when building projects.
- Recommendation System and Netflix: With the advancement of technology, we are increasingly familiar with this type of system, even without knowing how it works. Recommendation systems have become very important in the online sales industry, on sites such as Amazon and Submarino, and in the entertainment business, more specifically in streaming services like Youtube and Netflix, because of their ability to influence users' choices.
- Spotify & Python and Data Science - How to Collect Spotify Data with Python: Data collection is the first step of every Data Science project. If we choose not to use a ready-made dataset, we must look for data sources that can be useful to build our database and extract the data from those sources.
- Spotify and Python and Data Science - Building an Application with Python + Deploy with Heroku: This article is an excerpt from my project Spotify & Python & Data Science - Analyzing Artist Data NexoAnexo. After finishing the project, I set out to build an application with Streamlit in which I could present the charts and insights from the project in a more presentable way than Google Colab and Jupyter Notebook allow.
More articles on LinkedIn
Made with 💖 by Luis Vinicius