DocToIdoit

Description

Import ordered products from a delivery note directly to i-doit.

You only have to define your products once in the appsettings.json file. Everything after that runs autonomously. The PDF document is converted to searchable text with OCR. DocToIdoit then filters out all products which are set up in the appsettings.json and imports them with order date, serial number and delivery note number to i-doit.

IronOCR license

DocToIdoit uses IronOCR as its OCR engine. If you want to use it in a production environment without a debugger, you need a license for it. There is also a trial license for 30 days.

Delivery note document layout

Make sure that your documents meet these requirements to be processed correctly.

Exactly one date must appear on the document (the regex for date detection can be modified in the appsettings.json)
A delivery note number must appear on the document (the regex for delivery note detection can be modified in the appsettings.json)
The ordered products on the document have to be in this layout:
- The serial numbers have to begin on the line directly after the product name
- The serial numbers have to be preceded by an indicator, e.g. "S/N:" or "Serials:", and the indicator must be at the beginning of the line
Example:
Dell Inspiron 15 3505
S/N: EG3439898, EG34898478
Dell Inspiron 14 7400 Core i7 16GB RAM
S/N: EF456898, EF45558478
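
The layout rules can be illustrated with a small Python sketch. This is not DocToIdoit's actual implementation (DocToIdoit is a .NET application); the function name extract_serials and its parameters are hypothetical and only mirror the Ocr.SerialIndicators and Ocr.SerialDelimiter settings.

```python
from typing import Dict, List, Sequence


def extract_serials(
    lines: Sequence[str],
    product_names: Sequence[str],
    indicators: Sequence[str] = ("S/N:",),
    delimiter: str = ",",
) -> Dict[str, List[str]]:
    """Map each known product name to the serials listed on the following line."""
    found: Dict[str, List[str]] = {}
    for i, line in enumerate(lines[:-1]):
        # A product line is any line containing one of the configured search strings.
        product = next((p for p in product_names if p in line), None)
        if product is None:
            continue
        following = lines[i + 1].strip()
        # The indicator only counts at the beginning of the line.
        indicator = next((ind for ind in indicators if following.startswith(ind)), None)
        if indicator is None:
            continue
        rest = following[len(indicator):]
        serials = [s.strip() for s in rest.split(delimiter) if s.strip()]
        found.setdefault(product, []).extend(serials)
    return found


sample = [
    "Dell Inspiron 15 3505",
    "S/N: EG3439898, EG34898478",
    "Dell Inspiron 14 7400 Core i7 16GB RAM",
    "S/N: EF456898, EF45558478",
]
result = extract_serials(sample, ["Dell Inspiron 15 3505", "Dell Inspiron 14 7400"])
```

Note that the second product is found although the line contains extra text ("Core i7 16GB RAM"), because the sketch treats the product name as a search string.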

Requirements

i-doit

Running i-doit installation (>=1.17)
Modified i-doit API Addon to use object templates
IronOCR license

PDF

PDFs should have a resolution of at least 300 dpi
PDFs should not be compressed too heavily. The better the quality of the PDF and the source document, the more reliable the OCR result.

Tesseract language file

The default OCR language is German. If you want to use another language, you need to set a custom language file in the appsettings.json (Ocr.CustomLanguageFile).

Installation (Docker)

Install docker and docker-compose
Get the docker-compose file
Edit the docker-compose file to your needs: set your volume paths for the input scan folder and your appsettings.json
Replace latest in kleinrotti/doctoidoit:latest with the version you want to use
Run docker-compose -p doctoidoit up -d

Installation (bare metal on Ubuntu or Debian)

DocToIdoit

Install these packages: dotnet-runtime-6.0 apt-utils libgdiplus libc6-dev tesseract-ocr libtesseract-dev
Copy the compiled files to a folder on the server, e.g. /usr/local/bin/DocToIdoit
Create a new user which will be used to run DocToIdoit (Do NOT use root as user)
Edit the appsettings.json to your needs
Run dotnet DocToIdoit.dll with the user you just created
(Optional) Use the sample doctoidoit.service file to create a systemd service

Documents folder

Make sure that your folder for scanned documents has read and write permissions for the DocToIdoit user
Make sure that new files which are written via SMB or FTP are locked while they are being written. Otherwise DocToIdoit may try to process a new file before the write has finished
DocToIdoit automatically processes new files in that folder every 30 seconds (see Watcher.ScanInterval)
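
If you control the process that delivers the files, a common alternative to file locking is to write the file somewhere else first and then move it into the scan folder in one step. The following Python sketch shows that different technique; it is not part of DocToIdoit. The function name deliver_atomically and the staging folder are hypothetical, and the staging folder is assumed to be outside the scan folder but on the same filesystem, so that the final rename is atomic.

```python
import os
import tempfile
from pathlib import Path


def deliver_atomically(data: bytes, scan_dir: Path, name: str, staging_dir: Path) -> Path:
    """Write data to a staging file, then move it into the scan folder in one step."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    # The partial file lives in the staging folder, so the watcher never sees it.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".part")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())
    target = scan_dir / name
    # os.replace is atomic when source and target are on the same filesystem.
    os.replace(tmp_path, target)
    return target
```

With this approach the scan folder only ever contains complete files, regardless of when the next scan interval starts.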

Performance

I recommend at least 2 CPU cores and 2 GB RAM. Plan for around 100 MB of RAM per processed page. If you have large PDFs with many pages or if you turn on parallel processing, make sure you have enough RAM and CPU cores.
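
As a rough back-of-the-envelope check, the rule of thumb can be written as a tiny Python helper. The function estimate_ram_mb is hypothetical. It assumes that all pages of a file are held in memory at the same time and that files processed in parallel simply add up; neither assumption is confirmed for DocToIdoit, so treat the result as a generous estimate.

```python
def estimate_ram_mb(pages_per_file: int, parallel_files: int = 1, mb_per_page: int = 100) -> int:
    """Estimate RAM in MB from the ~100 MB per processed page rule of thumb."""
    return pages_per_file * parallel_files * mb_per_page


single = estimate_ram_mb(20)                      # one 20-page PDF
parallel = estimate_ram_mb(20, parallel_files=3)  # three such PDFs at once
```

Under these assumptions a single 20-page PDF needs about 2 GB, and three of them in parallel about 6 GB.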

appsettings.json Matrix

Property	Description	Required	Default	Type
IronOcr.LicenseKey	IronOCR license key	Yes	-	string
Watcher.ScanPath	Listen for new files in this directory	Yes	-	string
Watcher.ProcessingPath	Files will be moved there while processing	Yes	-	string
Watcher.ErrorScanPath	Files will be moved there on errors	Yes	-	string
Watcher.OcrResultPath	After OCR has finished, a text file will be created there	Yes	-	string
Watcher.ProcessAsync	Process multiple files in parallel. More RAM and CPU needed.	Yes	False	bool
Watcher.ScanInterval	Interval to scan for new files, in milliseconds	Yes	30000	int
Idoit.Server	URL to your i-doit Server API	Yes	-	string
Idoit.ApiKey	API key of the i-doit API add-on	Yes	-	string
Smtp.MailOnError	Send an email on processing errors	No	False	bool
Smtp.Server	SMTP server IP	No	-	string
Smtp.Port	SMTP server port	No	25	int
Smtp.Subject	Subject of the email	No	-	string
Smtp.From	Sender of the email	No	-	string
Smtp.To	Recipient of the email	No	-	string
Smtp.Username	Username for the SMTP server	No	-	string
Smtp.Password	Password for the SMTP server	No	-	string
Smtp.SSL	Use SSL for the SMTP connection	No	False	bool
Ocr.Scale	Target scaling of the PDF	Yes	300	int
Ocr.Deskew	Correct rotation of the PDF	Yes	True	bool
Ocr.DeliveryNoteDetectionRegex	Regex to detect the delivery note number	Yes	(LIEF).\d*	string
Ocr.DateDetectionRegex	Regex to detect the date	Yes	\d{2}.\d{2}.\d{4}	string
Ocr.TicketIdDetectionRegex	Regex to detect the ticket id	No	(?<=Ticket# )[^.\\s]*	string
Ocr.DateFormat	Format of the dates	Yes	dd.MM.yyyy	string
Ocr.SerialDelimiter	Delimiter between serial numbers	Yes	,	string
Ocr.SerialIndicators	Indicators to detect the lines where serial numbers are listed	Yes	S/N:	string[]
Ocr.SupportedProducts	Array of supported products (see matrix below)	No	-	object[]
Ocr.CustomLanguageFile	Path to custom tesseract language file	No	-	string
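
To get a feeling for the default detection patterns, you can try them in any regex tool. The Python sketch below uses the defaults from the table; the sample text is made up, and DocToIdoit itself uses the .NET regex engine, which behaves the same for these simple patterns. The .NET date format dd.MM.yyyy corresponds to %d.%m.%Y in Python.

```python
import re
from datetime import datetime

# Defaults from the matrix above.
DELIVERY_NOTE_REGEX = r"(LIEF).\d*"
DATE_REGEX = r"\d{2}.\d{2}.\d{4}"
DATE_FORMAT = "%d.%m.%Y"  # Python equivalent of dd.MM.yyyy

# Made-up OCR text for illustration only.
text = "Delivery note LIEF-20481 dated 14.03.2023"

note = re.search(DELIVERY_NOTE_REGEX, text).group(0)
date_text = re.search(DATE_REGEX, text).group(0)
order_date = datetime.strptime(date_text, DATE_FORMAT)

# The dots in the default date regex are not escaped, so they match any
# character. A date written as 14-03-2023 is detected too, but it would
# then not fit the dd.MM.yyyy date format.
loose_match = re.search(DATE_REGEX, "14-03-2023") is not None
```

If your delivery notes use a different date style, adjust Ocr.DateDetectionRegex and Ocr.DateFormat together so that every detected date can also be parsed.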

appsettings.json SupportedProducts

Property	Description	Required	Default	Type
ProductName	Search string to find the product in the document	Yes	-	string
Type	i-doit object type	Yes	C__OBJTYPE__MONITOR	string
IdoitPrefix	Prefix for the i-doit object, after the prefix a timestamp will be added	Yes	-	string
Template	Template ID from i-doit which should be used	Yes	-	int
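
The two matrices can be combined into a starting point for your own appsettings.json. The Python sketch below builds such a file and prints it as JSON. It assumes that the dotted property names map to nested JSON sections (e.g. Watcher.ScanPath becomes "Watcher": {"ScanPath": ...}), which is the usual .NET configuration convention but is not confirmed here. All paths, keys, the product name, the prefix and the template ID are placeholders.

```python
import json

settings = {
    "IronOcr": {"LicenseKey": "YOUR-IRONOCR-LICENSE-KEY"},  # placeholder
    "Watcher": {
        "ScanPath": "/data/scan",              # placeholder paths
        "ProcessingPath": "/data/processing",
        "ErrorScanPath": "/data/error",
        "OcrResultPath": "/data/ocr",
        "ProcessAsync": False,
        "ScanInterval": 30000,
    },
    "Idoit": {
        "Server": "https://idoit.example.com/api",  # placeholder URL
        "ApiKey": "YOUR-IDOIT-API-KEY",             # placeholder
    },
    "Ocr": {
        "Scale": 300,
        "Deskew": True,
        "DeliveryNoteDetectionRegex": r"(LIEF).\d*",
        "DateDetectionRegex": r"\d{2}.\d{2}.\d{4}",
        "DateFormat": "dd.MM.yyyy",
        "SerialDelimiter": ",",
        "SerialIndicators": ["S/N:"],
        "SupportedProducts": [
            {
                "ProductName": "Example Monitor 24",  # hypothetical product
                "Type": "C__OBJTYPE__MONITOR",
                "IdoitPrefix": "MON-",                # hypothetical prefix
                "Template": 123,                      # hypothetical template ID
            }
        ],
    },
}

# json.dumps takes care of escaping the backslashes in the regex strings.
rendered = json.dumps(settings, indent=2)
print(rendered)
```

Compare the output with the sample appsettings.json shipped with DocToIdoit before using it, since the real file is the authoritative reference for the structure.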
