🤖
Automating everything!
  • São José dos Campos, SP

My name is Celso, I'm a data scientist. 📡

🏗 What am I working on?

You can see the pinned projects below, but let me tell you why I pinned them:

  1. tccENAP: capstone project for my graduate course on data analysis for public policy, where I examine the driving distance between municipalities without a Brazilian IRS assistance center and municipalities with one, in order to highlight possible opportunities for opening or closing centers.
  2. iusNominatim: combines OSM Nominatim running in Docker, libpostal, and some data wrangling with Brazilian geodata from geobr to create a geocoding system that delivers better results than vanilla Nominatim.
  3. pypelineDeals: Python wrapper for the API of PipelineDeals.
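The core measurement behind tccENAP can be framed as a nearest-facility lookup. The block below is a minimal Python sketch of that idea, not the project's actual code: it assumes driving distances have already been computed by some routing step, and the `drive_km` table, the function name `nearest_center` and the municipality names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical precomputed driving distances in km, keyed by (origin, destination).
DriveMatrix = Dict[Tuple[str, str], float]


def nearest_center(
    municipalities_without: List[str],
    municipalities_with: List[str],
    drive_km: DriveMatrix,
) -> Dict[str, Tuple[str, float]]:
    """For each municipality without a center, return (closest municipality with a center, km)."""
    result: Dict[str, Tuple[str, float]] = {}
    for origin in municipalities_without:
        # Only consider destinations for which a driving distance is known.
        candidates = [
            (dest, drive_km[(origin, dest)])
            for dest in municipalities_with
            if (origin, dest) in drive_km
        ]
        if candidates:
            # Pick the destination with the smallest driving distance.
            result[origin] = min(candidates, key=lambda pair: pair[1])
    return result


# Illustrative numbers only, not real data.
drive = {("A", "X"): 40.0, ("A", "Y"): 55.0, ("B", "X"): 70.0, ("B", "Y"): 15.0}
print(nearest_center(["A", "B"], ["X", "Y"], drive))
```

Municipalities with no known route are simply left out of the result, so unreachable places can be flagged separately rather than silently assigned a distance.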

You can also check my Kaggle profile for some interesting projects outside of GitHub.

🤐 What am I working on outside of GitHub?

Due to business strategy and NDAs, I can't put everything I do on GitHub, or we have to keep it there but private. I'm currently working for Quero Educação, a marketplace for private education in Brazil. Think Booking, but for private colleges, schools and other courses. I work as the data lead for the K12 branch. So, here's what takes up most of my time:

  • I have experience leading people! Before the pandemic, I used to lead two BI analysts, but now I lead only one;
  • A looot of SQL. Like, a lot. Mostly quick analyses in Spark SQL on Databricks to support fast business decisions. Sometimes we run more complex analyses, like regressions, classifications or even quasi-experimental studies, to inform strategy pivots when needed;
  • I build a lot of datamarts in Databricks, mostly as Databricks jobs in Spark SQL and PySpark. I used to write them in R/SparkR as well, but since the community is stronger on Python/PySpark, I get fewer headaches and more support by keeping everything in Python. I'm the R guy at the company, and there isn't much on Stack Overflow for SparkR issues, so...
  • Some dashboards, mostly in Data Studio. I can build them in Power BI too, but I try to keep dashboard tasks to a minimum in our team's backlog. Dashboards are dead, people!
  • Every semester, we forecast revenue growth using Facebook Prophet. We're thinking of using it for other kinds of time-series forecasting, like B2C leads and visits;
  • We have a Bayesian A/B testing framework in the company, based on this white paper, which we're refactoring into a Python package; I'm contributing to this cross-sector project;
  • And finally, we've been dabbling in Natural Language Processing (NLP) to better understand the K12 market in Brazil. Public data is sparse and scattered, so we have to gather it from several public sources, where school names, addresses and owners often don't match exactly. Hence a lot of fuzzy matching that we're constantly improving.
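For readers unfamiliar with Bayesian A/B testing, here is a minimal Python sketch of the common Beta-Binomial approach. It is not the company's framework and does not reproduce the white paper mentioned above; the function name `bayesian_ab`, the uniform Beta(1, 1) prior and the Monte Carlo sample size are assumptions chosen for illustration. It estimates the probability that variant B converts better than A, and the expected loss (conversion rate given up) if B is chosen.

```python
import random
from typing import Optional, Tuple


def bayesian_ab(
    conv_a: int, n_a: int, conv_b: int, n_b: int,
    samples: int = 100_000,
    prior: Tuple[float, float] = (1.0, 1.0),  # assumed uniform Beta(1, 1) prior
    seed: Optional[int] = None,
) -> Tuple[float, float]:
    """Return (P(rate_B > rate_A), expected loss of choosing B) via Monte Carlo."""
    rng = random.Random(seed)
    alpha0, beta0 = prior
    wins = 0
    loss_b = 0.0
    for _ in range(samples):
        # Posterior of each conversion rate: Beta(alpha0 + successes, beta0 + failures).
        p_a = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        p_b = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        if p_b > p_a:
            wins += 1
        # Loss of picking B: how much rate we give up in draws where A is better.
        loss_b += max(p_a - p_b, 0.0)
    return wins / samples, loss_b / samples


prob, loss = bayesian_ab(100, 1000, 150, 1000, seed=42)
print(f"P(B > A) = {prob:.4f}, expected loss of choosing B = {loss:.6f}")
```

A typical decision rule is to stop the test once the expected loss of the leading variant drops below a small, business-defined threshold, rather than waiting for a fixed p-value.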
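To illustrate the kind of fuzzy matching described in the last bullet, here is a minimal sketch using only the Python standard library. The actual matching pipeline is not described here, so `difflib.SequenceMatcher` is a stand-in technique, and the helper names (`normalize`, `best_match`), the 0.85 threshold and the school names are hypothetical. Normalization strips Portuguese accents, punctuation and extra whitespace before comparing.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize(name: str) -> str:
    """Lowercase, strip accents (e.g. 'ã' -> 'a'), drop punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def best_match(
    name: str, candidates: List[str], threshold: float = 0.85
) -> Optional[Tuple[str, float]]:
    """Return (best candidate, similarity) if the similarity reaches the threshold, else None."""
    target = normalize(name)
    best, best_score = None, 0.0
    for cand in candidates:
        # ratio() is 2 * matches / total length, in [0, 1].
        score = SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None


print(best_match("Escola Estadual São José", ["ESCOLA ESTADUAL SAO JOSE", "Colégio Objetivo"]))
```

Comparing every name against every candidate is quadratic, so on larger sources one would usually block first (for example, only compare schools within the same municipality) before scoring.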

📫 Get in touch!

The best way to reach me is through LinkedIn.

💻 Check my most used languages below!

Top Langs

Pinned

  1. tccENAP (Public)

    Capstone project for the graduate course in data analysis for public policy at ENAP

    HTML

  2. iusNominatim (Public)

    Docker container to parse addresses and return coordinates, checking them against IPEA/IBGE vectorized maps of Brazil

    R 1

  3. pypelineDeals (Public)

    Unofficial Python wrapper for Pipeline Deals API

    Python 1