doc2txt

doc2txt is a tool that extracts text from the .doc files in a specified directory. This can be useful for recovering the text from corrupted .doc files.

Features

File Handling

Each file is first read into a 1024-byte buffer. Files smaller than 1024 bytes cannot fill this buffer, so they are logged as errors and skipped.
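The size check can be sketched in Rust as follows. This is a minimal sketch, not doc2txt's actual source: `BUFFER_SIZE`, `read_initial_buffer`, and `load_file` are hypothetical names, and it assumes the check is done by filling a fixed buffer with `Read::read_exact`, which fails with `ErrorKind::UnexpectedEof` when fewer than 1024 bytes are available.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Size of the initial read buffer described in the README.
const BUFFER_SIZE: usize = 1024;

/// Fill a fixed 1024-byte buffer from `reader` (hypothetical helper).
/// `read_exact` returns an `UnexpectedEof` error when fewer than
/// 1024 bytes are available, which is how a too-small file is detected here.
fn read_initial_buffer<R: Read>(reader: &mut R) -> io::Result<[u8; BUFFER_SIZE]> {
    let mut buf = [0u8; BUFFER_SIZE];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}

/// Open a file and read its first 1024 bytes (hypothetical helper).
/// On error, the caller would log the file and skip it.
fn load_file(path: &Path) -> io::Result<[u8; BUFFER_SIZE]> {
    let mut file = File::open(path)?;
    read_initial_buffer(&mut file)
}
```

Making the helper generic over `Read` lets the same check run on a `File` or, in tests, on an in-memory `std::io::Cursor`.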

Signature Verification

Reads the bytes at offset 0x200 (512) to verify the .doc file signature, so that only files with the expected signature are processed.
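A signature check at this offset might look like the sketch below. The README does not say which bytes doc2txt compares, so `DOC_SIGNATURE` is an assumption: it uses `EC A5 C1 00`, the Word File Information Block identifier 0xA5EC followed by nFib 0x00C1 (both little-endian), which commonly appears at offset 512 in Word 97 and later .doc files. `has_doc_signature` is a hypothetical name.

```rust
/// Offset of the signature check described in the README (0x200 = 512).
const SIGNATURE_OFFSET: usize = 0x200;

/// ASSUMED signature: Word FIB identifier 0xA5EC followed by nFib 0x00C1,
/// stored little-endian. Not taken from doc2txt's source.
const DOC_SIGNATURE: [u8; 4] = [0xEC, 0xA5, 0xC1, 0x00];

/// Return true if `data` holds the assumed signature at offset 0x200.
/// Data too short to reach the offset is treated as "no signature".
fn has_doc_signature(data: &[u8]) -> bool {
    data.get(SIGNATURE_OFFSET..SIGNATURE_OFFSET + DOC_SIGNATURE.len())
        .map_or(false, |bytes| bytes == &DOC_SIGNATURE[..])
}
```

Because 0x200 plus 4 bytes lies well inside the first 1024 bytes, this check can run on the initial buffer described under File Handling.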

Text Extraction

Reads the byte stream starting at offset 0xA00 (2560), where the document's text content is located, and stops when null bytes are encountered.
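The extraction step can be sketched as a slice operation: skip to 0xA00, then keep bytes up to the first null byte. This is a sketch under stated assumptions, not the tool's code: `extract_text` is a hypothetical name, and it assumes the text is stored as 8-bit characters, decoding them with `String::from_utf8_lossy` (exact only for ASCII; other bytes become U+FFFD).

```rust
/// Offset where the README says the text content begins (0xA00 = 2560).
const TEXT_OFFSET: usize = 0xA00;

/// Return the bytes from 0xA00 up to (not including) the first null byte,
/// or to the end of the data if no null byte follows.
/// Returns None if the data does not reach the text offset.
fn extract_text(data: &[u8]) -> Option<String> {
    let tail = data.get(TEXT_OFFSET..)?;
    let end = tail.iter().position(|&b| b == 0).unwrap_or(tail.len());
    // ASSUMPTION: 8-bit text; bytes that are not valid UTF-8 become U+FFFD.
    Some(String::from_utf8_lossy(&tail[..end]).into_owned())
}
```

Since 0xA00 (2560) lies beyond the first 1024 bytes, this step needs more of the file than the initial buffer, for example the full contents returned by `std::fs::read`.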

Output Management

Extracted text is written to .txt files in the specified output directory.
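Writing the result could look like the sketch below. The README does not say how output files are named, so this assumes `report.doc` becomes `report.txt` in the output directory; `output_path` and `write_output` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Map `<input>.doc` to `<output_dir>/<input>.txt`.
/// ASSUMPTION: naming scheme chosen for illustration; the README does not specify it.
/// The ".txt" suffix is appended to the stem, so "report.v2.doc" keeps its "v2".
fn output_path(output_dir: &Path, input: &Path) -> PathBuf {
    let stem = input.file_stem().unwrap_or_else(|| input.as_os_str());
    let mut name = stem.to_os_string();
    name.push(".txt");
    output_dir.join(name)
}

/// Create the output directory if needed and write the extracted text,
/// returning the path that was written.
fn write_output(output_dir: &Path, input: &Path, text: &str) -> io::Result<PathBuf> {
    fs::create_dir_all(output_dir)?;
    let path = output_path(output_dir, input);
    fs::write(&path, text)?;
    Ok(path)
}
```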

Logs

The process generates two types of log files for monitoring and troubleshooting:

error_log.txt: Records expected errors, such as files too small to fill the initial 1024-byte buffer, to aid debugging.
successfully_recovered_log.txt: Lists every file whose text was successfully extracted.
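The two logs can be kept with simple append-mode writes, as in this sketch. Only the file names come from the README; the log location (a `log_dir` passed in), the one-line-per-entry format, and the names `append_log` and `log_result` are assumptions for illustration.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Log file names listed in the README.
const ERROR_LOG: &str = "error_log.txt";
const SUCCESS_LOG: &str = "successfully_recovered_log.txt";

/// Append one line to `log_dir/file_name`, creating the file if needed.
/// ASSUMPTION: logs live in `log_dir` and hold one plain-text line per entry.
fn append_log(log_dir: &Path, file_name: &str, line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .create(true)
        .append(true)
        .open(log_dir.join(file_name))?;
    writeln!(file, "{line}")
}

/// Record one processed file: success goes to the recovery log,
/// failure (with its reason) goes to the error log.
fn log_result(log_dir: &Path, input: &Path, result: &Result<(), String>) -> io::Result<()> {
    match result {
        Ok(()) => append_log(log_dir, SUCCESS_LOG, &input.display().to_string()),
        Err(reason) => append_log(log_dir, ERROR_LOG, &format!("{}: {}", input.display(), reason)),
    }
}
```

Opening in append mode means repeated runs add to existing logs rather than overwriting them.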

Usage

doc2txt -d {directory of .doc files} -o {output directory}

Options

-h: Displays help information, listing all available options.
-d: Specifies the directory containing the .doc files to be processed.
-o: Defines the destination directory for the extracted text files.
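The three options above can be parsed with the standard library alone, as sketched here. This is a hand-rolled illustration, not necessarily how doc2txt parses its arguments (it may use a crate such as clap); `Command` and `parse_args` are hypothetical names, and only the `-h`, `-d`, and `-o` flags from the README are handled.

```rust
use std::path::PathBuf;

/// What the command line asked for (hypothetical type).
#[derive(Debug, PartialEq)]
enum Command {
    Help,
    Run { input_dir: PathBuf, output_dir: PathBuf },
}

/// Parse the options listed in the README (-h, -d, -o).
/// `args` excludes the program name, e.g. `std::env::args().skip(1)`.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Command, String> {
    let mut input_dir = None;
    let mut output_dir = None;
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            // -h short-circuits: help is shown regardless of other options.
            "-h" => return Ok(Command::Help),
            // -d and -o each consume the following argument as a directory.
            "-d" => input_dir = Some(PathBuf::from(iter.next().ok_or("-d needs a directory")?)),
            "-o" => output_dir = Some(PathBuf::from(iter.next().ok_or("-o needs a directory")?)),
            other => return Err(format!("unknown option: {other}")),
        }
    }
    match (input_dir, output_dir) {
        (Some(input_dir), Some(output_dir)) => Ok(Command::Run { input_dir, output_dir }),
        _ => Err("both -d and -o are required".to_string()),
    }
}
```

In a `main` function this would be called as `parse_args(std::env::args().skip(1))`.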

Getting Started

Ensure that your environment is set up to run executable files.
Place the doc2txt executable in a convenient location.
Open a command-line interface and navigate to the directory containing doc2txt.
Run the command shown under Usage to start extracting text from your .doc files.

Contribution

Contributions are welcome! If you have suggestions for improvements or have identified bugs, please feel free to submit an issue or pull request on the project's GitHub page.

GitHub repository: c-sleuth/doc2txt