Theoretical exam scraper

This repository scrapes exam questions and their answers from www.praktycznyegzamin.pl using gocolly.

Prerequisites

Go programming language installed on your machine.
Access to the internet to download dependencies.

Setup

Clone or download the Go web scraper repository.
Navigate to the root directory of the project.

Installation

Run one of the following commands to download the necessary dependencies:

make init

or

go mod download

Usage

Run the web scraper with the following command:

make run

Makefile Variables for `make run`

1. `OUTPUTFILE`

Description: Defines the output directory where the scraped data will be stored.
Example: OUTPUTFILE=_out

2. `URL`

Description: Specifies the URL of the website that will be scraped.
Example: URL=https://www.praktycznyegzamin.pl/inf04/teoria/wszystko/

3. `REMOVETITLEPREFIX`

Description: Determines whether the scraper should remove title prefixes during scraping.
- true: Title prefixes will be removed.
- false: Title prefixes will not be removed.
Example: REMOVETITLEPREFIX=true

4. `REMOVEANSWERPREFIX`

Description: Controls whether the scraper should remove answer prefixes during scraping.
- true: Answer prefixes will be removed.
- false: Answer prefixes will not be removed.
Example: REMOVEANSWERPREFIX=true
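
The effect of the two prefix options can be illustrated with a small sketch. It assumes, purely for illustration, that a title prefix looks like `12. ` and an answer prefix looks like `A. `. The real prefix format on the site, and the names `titlePrefix`, `answerPrefix` and `stripPrefix`, are assumptions and are not taken from this repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed prefix shapes (hypothetical, not confirmed by the repository):
//   title:  "12. What is HTML?"  -> leading number followed by a dot
//   answer: "A. Some answer"     -> one leading letter A-D followed by a dot
var (
	titlePrefix  = regexp.MustCompile(`^\s*\d+\.\s*`)
	answerPrefix = regexp.MustCompile(`^\s*[A-D]\.\s*`)
)

// stripPrefix removes a leading match of re from s when enabled is true,
// mirroring a true/false switch such as REMOVETITLEPREFIX.
func stripPrefix(s string, re *regexp.Regexp, enabled bool) string {
	if !enabled {
		return s
	}
	return re.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripPrefix("12. What is HTML?", titlePrefix, true))
	fmt.Println(stripPrefix("A. A markup language", answerPrefix, true))
	fmt.Println(stripPrefix("A. A markup language", answerPrefix, false))
}
```

Because both patterns are anchored at the start of the string, only a leading prefix is removed and the rest of the text is left untouched.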

Output Structure

The scraper will generate the following structure in the specified output directory:

_out
├── images
│   ├── <image1>.jpg
│   ├── <image2>.jpg
│   └── ...
├── questions.json
└── videos
    ├── <video1>.mp4
    ├── <video2>.mp4
    └── ...

Questions data

The questions.json file contains a list of Question entries, each with a title, answers, and sometimes an image or video for additional context.
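
For consumers of the output, a Question entry might be modelled roughly as below. The field names and JSON keys (`title`, `answers`, `image`, `video`) are assumptions made for this sketch; check the generated questions.json for the real keys before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Question is a hypothetical shape for one entry in questions.json.
// Field names and JSON keys are assumptions, not taken from the repository.
type Question struct {
	Title   string   `json:"title"`
	Answers []string `json:"answers"`
	Image   string   `json:"image,omitempty"` // optional, empty when absent
	Video   string   `json:"video,omitempty"` // optional, empty when absent
}

// decodeQuestions parses a JSON array of questions.
func decodeQuestions(data []byte) ([]Question, error) {
	var qs []Question
	if err := json.Unmarshal(data, &qs); err != nil {
		return nil, fmt.Errorf("decode questions: %w", err)
	}
	return qs, nil
}

func main() {
	sample := []byte(`[{"title":"Sample question?","answers":["first","second"],"image":"images/1.jpg"}]`)
	qs, err := decodeQuestions(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, q := range qs {
		fmt.Printf("%s (%d answers, image=%q, video=%q)\n", q.Title, len(q.Answers), q.Image, q.Video)
	}
}
```

Marking the media fields with `omitempty` keeps them out of the encoded JSON when a question has no image or video, which matches the "sometimes" wording above.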
