Refactor/optimize pipeline #57

Merged on Nov 27, 2023 (19 commits)

MariaGabrielaReis (Member)

Optimizing steps 4 to 6 plus an extra step: preprocessing, sentiment analysis, model testing, and topic modeling

PR Type

What kind of change does this PR introduce?

  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)

Describe the change

New files were created to better organize the pipeline steps: one file only for the classification model and its functions, another dedicated to preprocessing, and another dedicated to topic modeling, plus a separate file in the utils folder for evaluating the classification model (metrics generation).
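As an illustration of what a metrics helper in the utils folder could look like, here is a minimal sketch in plain Python. The function name `classification_metrics` and the returned dictionary layout are assumptions made for this example; they are not taken from the PR's code, which may rely on a library instead.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus per-label precision and recall.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    actual = Counter(y_true)      # label -> number of true occurrences
    predicted = Counter(y_pred)   # label -> number of times it was predicted
    correct = Counter()           # label -> number of correct predictions
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            correct[truth] += 1

    per_label = {}
    for label in sorted(set(actual) | set(predicted)):
        per_label[label] = {
            # Precision: of everything predicted as `label`, how much was right.
            "precision": correct[label] / predicted[label] if predicted[label] else 0.0,
            # Recall: of everything that truly is `label`, how much was found.
            "recall": correct[label] / actual[label] if actual[label] else 0.0,
        }

    return {
        "accuracy": sum(correct.values()) / len(y_true),
        "per_label": per_label,
    }
```

Keeping this logic in its own module lets the classifier file stay focused on training and prediction, and makes the metrics easy to unit test in isolation.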

Refactoring and optimization of the model, code formatting, and function organization, among other changes, allowed the entire database to be preprocessed in an average of 9 to 12 minutes, as shown in the terminal screenshot below:

image

Note: After these general refactorings and optimizations, we will assess whether the chunking technique is still needed and whether there are other ways to improve the sentiment classification model.
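For context on the chunking technique mentioned in the note, the sketch below shows one common way to process a large collection in fixed-size batches. All names here (`iter_chunks`, `preprocess_text`, `preprocess_in_chunks`) and the default chunk size are hypothetical, and the cleaning step is only a placeholder; the PR does not show how the project implements it.

```python
from itertools import islice


def iter_chunks(items, chunk_size):
    """Yield successive lists of at most `chunk_size` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def preprocess_text(text):
    """Placeholder cleaning step: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def preprocess_in_chunks(reviews, chunk_size=1000):
    """Preprocess reviews one chunk at a time.

    Only one chunk of raw input is materialized at a time, which helps
    when `reviews` is a lazy source such as a file or database cursor.
    """
    processed = []
    for chunk in iter_chunks(reviews, chunk_size):
        processed.extend(preprocess_text(review) for review in chunk)
    return processed
```

Whether this is still worthwhile depends on the measurement the note describes: if the whole database already preprocesses in 9 to 12 minutes without memory pressure, chunking may add complexity without a clear benefit.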

sonarcloud bot commented Nov 26, 2023

SonarCloud Quality Gate failed.

  • 0 Bugs (rating A)
  • 0 Vulnerabilities (rating A)
  • 0 Security Hotspots (rating A)
  • 3 Code Smells (rating A)
  • 16.4% Coverage
  • 0.0% Duplication

@JoaoM-py JoaoM-py mentioned this pull request Nov 27, 2023
7 tasks
@MariaGabrielaReis MariaGabrielaReis merged commit ad1f1ed into develop Nov 27, 2023
2 of 3 checks passed
JoaoM-py added a commit that referenced this pull request Dec 1, 2023
* hotfix names

* Refactor: Format date (#27)

* refactor: capitalize sentiments (#24)

Co-authored-by: JoaoM-py <[email protected]>

* fix: translate sentiments (#28)

* Feat/separate training reviews (#35)

* feat: create function to get random data

* feat: create function to get training data

* feat: get training data

* chore: Manual sentiment classification

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>

* Feat/#0303 create classification model (#37)

* feat: update manual classification

* fix: translate topics and sentiments

* feat: add seaborn lib

* fix: translate topics, update reviews count

* feat: create classification model

* feat: training, test and apply classification model

* refactor: Update stars name

* refactor: Update training method

---------

Co-authored-by: JoaoM-py <[email protected]>

* Feat/#42 test coverage (#38)

* feat: create pipeline tests

* refactor: Update files names

* feat: Training data

* feat: Creating pipeline tests

* merge: Merge develop

* feat: Coverage and sonarcloud config

* refactor: Update processing and remove comments

* chore: update python version

* chore: update sonarcloud

* chore: update tests

* chore: update tests

* chore: update tests

* Feat/#0106 bring birth year and gender (#50)

* Tests and Sentment Analysis (#39)

* hotfix names

* Refactor: Format date (#27)

* refactor: capitalize sentiments (#24)

Co-authored-by: JoaoM-py <[email protected]>

* fix: translate sentiments (#28)

* Feat/separate training reviews (#35)

* feat: create function to get random data

* feat: create function to get training data

* feat: get training data

* chore: Manual sentiment classification

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>

* Feat/#0303 create classification model (#37)

* feat: update manual classification

* fix: translate topics and sentiments

* feat: add seaborn lib

* fix: translate topics, update reviews count

* feat: create classification model

* feat: training, test and apply classification model

* refactor: Update stars name

* refactor: Update training method

---------

Co-authored-by: JoaoM-py <[email protected]>

* Feat/#42 test coverage (#38)

* feat: create pipeline tests

* refactor: Update files names

* feat: Training data

* feat: Creating pipeline tests

* merge: Merge develop

* feat: Coverage and sonarcloud config

* refactor: Update processing and remove comments

* chore: update python version

* chore: update sonarcloud

* chore: update tests

* chore: update tests

* chore: update tests

---------

Co-authored-by: GabrielCamargoL <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>

* feat: Bring informations of the client

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>
Co-authored-by: GabrielCamargoL <[email protected]>

* Feat/#11 new runtime tracker (#49)

* Tests and Sentment Analysis (#39)

* hotfix names

* Refactor: Format date (#27)

* refactor: capitalize sentiments (#24)

Co-authored-by: JoaoM-py <[email protected]>

* fix: translate sentiments (#28)

* Feat/separate training reviews (#35)

* feat: create function to get random data

* feat: create function to get training data

* feat: get training data

* chore: Manual sentiment classification

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>

* Feat/#0303 create classification model (#37)

* feat: update manual classification

* fix: translate topics and sentiments

* feat: add seaborn lib

* fix: translate topics, update reviews count

* feat: create classification model

* feat: training, test and apply classification model

* refactor: Update stars name

* refactor: Update training method

---------

Co-authored-by: JoaoM-py <[email protected]>

* Feat/#42 test coverage (#38)

* feat: create pipeline tests

* refactor: Update files names

* feat: Training data

* feat: Creating pipeline tests

* merge: Merge develop

* feat: Coverage and sonarcloud config

* refactor: Update processing and remove comments

* chore: update python version

* chore: update sonarcloud

* chore: update tests

* chore: update tests

* chore: update tests

---------

Co-authored-by: GabrielCamargoL <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>

* feat: New time metric, storage of metrics and

* feat: increase test coverage

* feat: Coverage

* feat: increase test coverage

* feature: Update sonar yml

* feat: Update test url

* feat: Update test url

* feat: Update test url

* feat: Update test url

* feat: Update test url

* feat: Config env

* refactor:Update utils imports

* refactor: Update env

* refactor: Update url and env

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>
Co-authored-by: GabrielCamargoL <[email protected]>

* Feat/#48 unit tests (#51)

* Tests and Sentment Analysis (#39)

* hotfix names

* Refactor: Format date (#27)

* refactor: capitalize sentiments (#24)

Co-authored-by: JoaoM-py <[email protected]>

* fix: translate sentiments (#28)

* Feat/separate training reviews (#35)

* feat: create function to get random data

* feat: create function to get training data

* feat: get training data

* chore: Manual sentiment classification

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>

* Feat/#0303 create classification model (#37)

* feat: update manual classification

* fix: translate topics and sentiments

* feat: add seaborn lib

* fix: translate topics, update reviews count

* feat: create classification model

* feat: training, test and apply classification model

* refactor: Update stars name

* refactor: Update training method

---------

Co-authored-by: JoaoM-py <[email protected]>

* Feat/#42 test coverage (#38)

* feat: create pipeline tests

* refactor: Update files names

* feat: Training data

* feat: Creating pipeline tests

* merge: Merge develop

* feat: Coverage and sonarcloud config

* refactor: Update processing and remove comments

* chore: update python version

* chore: update sonarcloud

* chore: update tests

* chore: update tests

* chore: update tests

---------

Co-authored-by: GabrielCamargoL <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>

* feat: increase test coverage

* feat: Coverage

* feat: increase test coverage

* feature: Update sonar yml

* feat: Update test url

* feat: Update test url

* feat: Update test url

* feat: Update test url

* feat: Update test url

* feat: Config env

* refactor:Update utils imports

* refactor: Update env

* refactor: Update url and env

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>
Co-authored-by: GabrielCamargoL <[email protected]>

* Feat: new classification model (#52)

* feat: add new model to sentiment analysis

* fix: use correct training and treat exceptions

* fix: remove console logs

* feat: increment reviews quantity

* Refactor/optimize pipeline (#56)

* feat: add new model to sentiment analysis

* fix: use correct training and treat exceptions

* fix: remove console logs

* feat: increment reviews quantity

* feat: select only necessary columns and apply your types

* feat: optimize clear data step

* Delete .env

* Refactor/optimize pipeline (#57)

* feat: add new model to sentiment analysis

* fix: use correct training and treat exceptions

* fix: remove console logs

* feat: increment reviews quantity

* feat: select only necessary columns and apply your types

* feat: optimize clear data step

* Delete .env

* style: format and remove unused files

* feat: create a file for pre processing step

* feat: create file to classification model

* feat: create file to topic model

* feat: get metrics from classification model

* feat: use new steps and adjust details

* fix: remove some code smells and run processing

* fix: update training and test data visualization

* Fix/unit tests (#59)

* Tests and Sentment Analysis (#39)

* hotfix names

* Refactor: Format date (#27)

* refactor: capitalize sentiments (#24)

Co-authored-by: JoaoM-py <[email protected]>

* fix: translate sentiments (#28)

* Feat/separate training reviews (#35)

* feat: create function to get random data

* feat: create function to get training data

* feat: get training data

* chore: Manual sentiment classification

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>

* Feat/#0303 create classification model (#37)

* feat: update manual classification

* fix: translate topics and sentiments

* feat: add seaborn lib

* fix: translate topics, update reviews count

* feat: create classification model

* feat: training, test and apply classification model

* refactor: Update stars name

* refactor: Update training method

---------

Co-authored-by: JoaoM-py <[email protected]>

* Feat/#42 test coverage (#38)

* feat: create pipeline tests

* refactor: Update files names

* feat: Training data

* feat: Creating pipeline tests

* merge: Merge develop

* feat: Coverage and sonarcloud config

* refactor: Update processing and remove comments

* chore: update python version

* chore: update sonarcloud

* chore: update tests

* chore: update tests

* chore: update tests

---------

Co-authored-by: GabrielCamargoL <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>

* feat: add new model to sentiment analysis

* fix: use correct training and treat exceptions

* fix: remove console logs

* feat: increment reviews quantity

* fix: tests of the updated pipeline

* feat: select only necessary columns and apply your types

* feat: optimize clear data step

* Delete .env

* style: format and remove unused files

* feat: create a file for pre processing step

* feat: create file to classification model

* feat: create file to topic model

* feat: get metrics from classification model

* feat: use new steps and adjust details

* fix: remove some code smells and run processing

* fix: update training and test data visualization

* feat: create and fix unit tests

* fix: remove .coverage

* reafacto: remove unused test

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>
Co-authored-by: GabrielCamargoL <[email protected]>

* Feat/logs and alerts (#60)

* Tests and Sentment Analysis (#39)

* hotfix names

* Refactor: Format date (#27)

* refactor: capitalize sentiments (#24)

Co-authored-by: JoaoM-py <[email protected]>

* fix: translate sentiments (#28)

* Feat/separate training reviews (#35)

* feat: create function to get random data

* feat: create function to get training data

* feat: get training data

* chore: Manual sentiment classification

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>

* Feat/#0303 create classification model (#37)

* feat: update manual classification

* fix: translate topics and sentiments

* feat: add seaborn lib

* fix: translate topics, update reviews count

* feat: create classification model

* feat: training, test and apply classification model

* refactor: Update stars name

* refactor: Update training method

---------

Co-authored-by: JoaoM-py <[email protected]>

* Feat/#42 test coverage (#38)

* feat: create pipeline tests

* refactor: Update files names

* feat: Training data

* feat: Creating pipeline tests

* merge: Merge develop

* feat: Coverage and sonarcloud config

* refactor: Update processing and remove comments

* chore: update python version

* chore: update sonarcloud

* chore: update tests

* chore: update tests

* chore: update tests

---------

Co-authored-by: GabrielCamargoL <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>

* feat: add new model to sentiment analysis

* fix: use correct training and treat exceptions

* fix: remove console logs

* feat: increment reviews quantity

* fix: tests of the updated pipeline

* feat: select only necessary columns and apply your types

* feat: optimize clear data step

* Delete .env

* style: format and remove unused files

* feat: create a file for pre processing step

* feat: create file to classification model

* feat: create file to topic model

* feat: get metrics from classification model

* feat: use new steps and adjust details

* fix: remove some code smells and run processing

* fix: update training and test data visualization

* feat: create and fix unit tests

* fix: remove .coverage

* reafacto: remove unused test

* feat: Organizing metrics, collect of logs and alerts

* fix: Pipeline exec time

* fix: code format and increase reviews quantity

* feat: increase reviews quantity

* fix: Pipeline exec time

* fix: Pipeline exec time

* fix: Pipeline exec time

* fix: Pipeline exec time format

---------

Co-authored-by: Maria Gabriela Reis <[email protected]>
Co-authored-by: GabrielCamargoL <[email protected]>

---------

Co-authored-by: GabrielCamargoL <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>
Co-authored-by: JoaoM-py <[email protected]>
Development

Successfully merging this pull request may close these issues.

[#0058] Improve code quality
[#0057] Process the entire database