⊧ dupi

Dupi is an engine for identifying and exploring duplicative text in sets of documents.

Status

Dupi is in an alpha/early-beta stage of development. Please give it a try and file issues. We have run it successfully on several document sets, but it needs more testing.

Input

Throw hundreds of thousands of text documents at it, or extract text from other kinds of documents and send that to dupi.

Output

Find and query for repeated chunks of text.
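
To make "repeated chunks of text" concrete, here is a minimal, generic sketch of one common way to detect them: hash every window of n consecutive words (a "shingle") and report the hashes that show up in more than one document. This is an illustration only. It is not dupi's API, and it is not claimed to be the algorithm dupi uses; the names `shingleHashes` and `repeated`, the window size, and the sample documents are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// shingleHashes returns one 64-bit FNV-1a hash per window of n consecutive
// lowercased, whitespace-separated tokens in text. It returns nil when the
// text has fewer than n tokens. (Hypothetical helper, not part of dupi.)
func shingleHashes(text string, n int) []uint64 {
	toks := strings.Fields(strings.ToLower(text))
	if n <= 0 || len(toks) < n {
		return nil
	}
	out := make([]uint64, 0, len(toks)-n+1)
	for i := 0; i+n <= len(toks); i++ {
		h := fnv.New64a()
		h.Write([]byte(strings.Join(toks[i:i+n], " ")))
		out = append(out, h.Sum64())
	}
	return out
}

// repeated maps each shingle hash that occurs in at least two documents
// to the sorted names of those documents. (Hypothetical helper.)
func repeated(docs map[string]string, n int) map[uint64][]string {
	seen := make(map[uint64]map[string]bool)
	for name, text := range docs {
		for _, h := range shingleHashes(text, n) {
			if seen[h] == nil {
				seen[h] = make(map[string]bool)
			}
			seen[h][name] = true
		}
	}
	out := make(map[uint64][]string)
	for h, names := range seen {
		if len(names) < 2 {
			continue // the chunk appears in only one document
		}
		for name := range names {
			out[h] = append(out[h], name)
		}
		sort.Strings(out[h])
	}
	return out
}

func main() {
	// Made-up sample documents, keyed by a made-up file name.
	docs := map[string]string{
		"a.txt": "the quick brown fox jumps over the lazy dog",
		"b.txt": "a quick brown fox jumps over a sleeping cat",
		"c.txt": "nothing in common here at all",
	}
	for h, names := range repeated(docs, 4) {
		fmt.Printf("%016x %v\n", h, names)
	}
}
```

With these sample inputs, the two four-word windows shared by a.txt and b.txt ("quick brown fox jumps" and "brown fox jumps over") are reported, and c.txt is not. Keeping everything in an in-memory map is fine for a toy example but would not scale to hundreds of thousands of documents.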

Tutorial

Design

Design Document

Library Reference
