
databricks-industry-solutions/toxicity-detection-in-gaming


Overview

Toxicity can have a large impact on player engagement and satisfaction, and game companies are actively working to address it on their platforms. Toxicity most commonly surfaces in chat boxes and in-game messaging systems. As companies become more data driven, they have the opportunity to detect toxicity using the data already at hand, but doing so is technically challenging. This solution accelerator is a head start on deploying an ML-enhanced data pipeline to address toxic messages in real time.



This series of notebooks is intended to help you use multi-label classification to detect and analyze toxicity in your data. In support of this goal, we will:

  • Load toxic-comment training data from Jigsaw and in-game chat data from Dota 2.

  • Create a single pipeline for both streaming and batch that detects toxicity in near real time and/or on an ad hoc basis. This pipeline can then feed managed tables for reporting, ad hoc queries, and/or decision support.

  • Label text chat data using multi-label classification.

  • Create a dashboard for monitoring the impact of toxicity.
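The multi-label step above means each message can carry several toxicity labels at once, in the style of the Jigsaw dataset. As a minimal sketch only — the accelerator itself uses Spark NLP on Databricks, while this example substitutes scikit-learn with toy placeholder data — a one-vs-rest multi-label classifier looks like:

```python
# Hedged sketch: multi-label toxicity classification over Jigsaw-style labels.
# The texts and label assignments below are toy placeholders, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Jigsaw's six labels; a single message may activate several of them.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

texts = [
    "you are awful and stupid",       # toxic, insult
    "have a nice game",               # none
    "i will find you and hurt you",   # toxic, threat
    "that was an obscene gesture",    # obscene
    "you people do not belong here",  # toxic, identity_hate
    "absolutely vile, the worst",     # toxic, severe_toxic
]
# One column per label, one row per message (multi-label indicator matrix).
y = [
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
]

# TF-IDF features feeding one binary classifier per label.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
model.fit(texts, y)

preds = model.predict(["nice game everyone"])
print(preds.shape)  # one row, one column per label
```

The same one-classifier-per-label structure carries over to the Spark NLP pipeline in the notebooks; only the featurization and runtime differ.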

© 2022 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.

| Library Name | Library License | Library License URL | Library Source URL |
|---|---|---|---|
| Spark-nlp | Apache-2.0 License | https://nlp.johnsnowlabs.com/license.html | https://www.johnsnowlabs.com/ |
| Kaggle | Apache-2.0 License | https://github.com/Kaggle/kaggle-api/blob/master/LICENSE | https://github.com/Kaggle/kaggle-api |
| Python | Python Software Foundation (PSF) | https://github.com/python/cpython/blob/master/LICENSE | https://github.com/python/cpython |
| Spark | Apache-2.0 License | https://github.com/apache/spark/blob/master/LICENSE | https://github.com/apache/spark |

To run this accelerator, clone this repo into a Databricks workspace. Attach the RUNME notebook to any cluster running DBR 11.0 or later, and execute the notebook via Run All. A multi-step job describing the accelerator pipeline will be created, and a link to it will be provided. Execute the multi-step job to see how the pipeline runs.

The job configuration is written in the RUNME notebook in JSON format. The cost associated with running the accelerator is the user's responsibility.
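For illustration only, a multi-task job configuration of the kind RUNME generates follows the Databricks Jobs API shape below; the task keys and notebook paths here are hypothetical, and the actual definition lives in the RUNME notebook.

```json
{
  "name": "toxicity-detection-in-gaming",
  "tasks": [
    {
      "task_key": "ingest",
      "notebook_task": { "notebook_path": "01_ingest" }
    },
    {
      "task_key": "classify",
      "depends_on": [ { "task_key": "ingest" } ],
      "notebook_task": { "notebook_path": "02_classify" }
    }
  ]
}
```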

About

Build a lakehouse for all your gamer data and use natural language processing techniques to flag questionable comments for moderation.
