
Collapse repeated characters (possibly using regex) #8

Open
finnbear opened this issue Oct 9, 2020 · 1 comment
Labels
good first issue Good for newcomers

Comments

@finnbear
Contributor

finnbear commented Oct 9, 2020

The program

```go
package main

import (
	"fmt"

	"github.com/TwinProduction/go-away"
)

func main() {
	test("shit")
	test("shiit")
	test("shiiiiiit")
}

// test prints whether go-away's IsProfane flags the given word.
func test(word string) {
	fmt.Printf("\"%s\" profane? %t\n", word, goaway.IsProfane(word))
}
```

prints

```
"shit" profane? true
"shiit" profane? false
"shiiiiiit" profane? false
```

I suggest including a list of regular expressions to match cases like these.
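As a rough sketch of the regex idea, assuming one pattern is generated per dictionary word: Go's `regexp` package (RE2) does not support backreferences, so a generic `(.)\1+` pattern is unavailable, but each letter can instead be followed by `+`. The helper `repeatTolerantPattern` is hypothetical and not part of go-away.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// repeatTolerantPattern is a hypothetical helper (not part of go-away) that
// turns a word into a regex in which every letter may repeat one or more
// times, e.g. "shit" -> `(?i)s+h+i+t+`.
func repeatTolerantPattern(word string) *regexp.Regexp {
	var b strings.Builder
	b.WriteString("(?i)") // case-insensitive matching
	for _, r := range word {
		b.WriteString(regexp.QuoteMeta(string(r))) // escape any regex metacharacters
		b.WriteString("+")                         // allow the letter to repeat
	}
	return regexp.MustCompile(b.String())
}

func main() {
	pattern := repeatTolerantPattern("shit")
	for _, word := range []string{"shit", "shiit", "shiiiiiit", "shirt"} {
		fmt.Printf("%q matches? %t\n", word, pattern.MatchString(word))
	}
}
```

Because the pattern is unanchored, `MatchString` also reports matches inside longer strings, and it tolerates repeats of every letter (e.g. "sshhiitt"), which may or may not be desirable.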

An alternative would be to collapse all repeated characters before matching, although this would break detection of any profanity that legitimately contains a repeated character, since the collapsed form no longer matches the dictionary entry.
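A minimal sketch of collapsing every run of identical characters into one, using a hypothetical helper `collapseRepeats`. The second call illustrates the caveat: a word with a legitimate double letter loses it.

```go
package main

import "fmt"

// collapseRepeats is a hypothetical helper that replaces every run of the
// same character with a single occurrence, e.g. "shiiiiiit" -> "shit".
func collapseRepeats(s string) string {
	out := make([]rune, 0, len(s))
	for _, r := range s {
		if len(out) > 0 && out[len(out)-1] == r {
			continue // skip a character identical to the previous one
		}
		out = append(out, r)
	}
	return string(out)
}

func main() {
	fmt.Println(collapseRepeats("shiiiiiit")) // shit
	fmt.Println(collapseRepeats("poop"))      // pop: the double "o" is lost
}
```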

@TwiN
Owner

TwiN commented Oct 9, 2020

Completely agree with you on this. Tackling this is already in the list of TODOs on the README:

[TODO] All words that have the same character repeated more than twice in a row are removed (e.g. poooop -> poop)

  • NOTE: This is obviously not a perfect approach, as words like fuuck wouldn't be detected, but it's better than nothing.
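The README TODO above (runs longer than two shortened to two) might be sketched as follows; `collapseToTwo` is a hypothetical name. As the note says, "shiiiiiit" only shrinks to "shiit", and "fuuck" is unchanged.

```go
package main

import "fmt"

// collapseToTwo is a hypothetical helper that shortens any run of more than
// two identical characters to exactly two, e.g. "poooop" -> "poop".
func collapseToTwo(s string) string {
	out := make([]rune, 0, len(s))
	run := 0 // length of the current run of identical characters
	var prev rune
	for i, r := range s {
		if i > 0 && r == prev {
			run++
		} else {
			run = 1
		}
		prev = r
		if run <= 2 {
			out = append(out, r) // keep at most two copies per run
		}
	}
	return string(out)
}

func main() {
	for _, word := range []string{"poooop", "shiiiiiit", "fuuck"} {
		fmt.Printf("%s -> %s\n", word, collapseToTwo(word))
	}
}
```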

I think one possible solution is to allow only specific characters to appear twice in a row, such as o (e.g. poop) and t (e.g. letter), and to collapse letters that aren't usually doubled, like i (e.g. shiit) or u (e.g. fuuck), into a single letter.
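A rough sketch of that allowlist idea, assuming a hypothetical `allowedDoubles` set containing only the two example letters from the comment (o and t); a real list would need tuning against the dictionary.

```go
package main

import "fmt"

// allowedDoubles is a hypothetical set of characters that commonly appear
// doubled in ordinary words; o and t are only the examples from the comment.
var allowedDoubles = map[rune]bool{'o': true, 't': true}

// normalizeRepeats is a hypothetical helper: runs of an allowed character are
// capped at two, and runs of any other character are collapsed to one.
func normalizeRepeats(s string) string {
	out := make([]rune, 0, len(s))
	run := 0 // length of the current run of identical characters
	var prev rune
	for i, r := range s {
		if i > 0 && r == prev {
			run++
		} else {
			run = 1
		}
		prev = r
		limit := 1
		if allowedDoubles[r] {
			limit = 2
		}
		if run <= limit {
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	for _, word := range []string{"shiit", "fuuck", "poooop", "letter"} {
		fmt.Printf("%s -> %s\n", word, normalizeRepeats(word))
	}
}
```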

I'm not sure regex is the best solution for this either, because Go's regexp package is known for not being the fastest, but it would be worth benchmarking multiple strings.Replace(...) calls against a single regex.
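Before timing the two approaches, it helps to implement the same normalization both ways and confirm they agree. This sketch assumes a hypothetical set of collapsible letters (i and u); the function names are illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// collapsible is a hypothetical set of letters that are rarely doubled.
var collapsible = []string{"i", "u"}

// collapseWithReplace applies strings.Replace ("ii" -> "i", "uu" -> "u")
// repeatedly until no doubled collapsible letter remains.
func collapseWithReplace(s string) string {
	for {
		before := s
		for _, c := range collapsible {
			s = strings.Replace(s, c+c, c, -1)
		}
		if s == before {
			return s
		}
	}
}

// collapseRegex must cover the same letters as collapsible; RE2 supports
// counted repetition, so `i{2,}` matches a run of two or more i's.
var collapseRegex = regexp.MustCompile(`i{2,}|u{2,}`)

// collapseWithRegexp does the same normalization in a single regex pass.
func collapseWithRegexp(s string) string {
	return collapseRegex.ReplaceAllStringFunc(s, func(m string) string {
		return m[:1] // keep one copy of the (ASCII) repeated letter
	})
}

func main() {
	for _, word := range []string{"shiit", "shiiiiiit", "fuuuck", "letter"} {
		fmt.Printf("%s -> %s / %s\n", word, collapseWithReplace(word), collapseWithRegexp(word))
	}
}
```

To actually measure the difference, each function could be wrapped in a `func BenchmarkX(b *testing.B)` in a `_test.go` file and run with `go test -bench=.`.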

Anyways, thanks for creating an issue for this.
I can't promise when I'll be able to tackle the issue, but if you want to give it a try, you're welcome to :)

TwiN added the good first issue label on Oct 9, 2020