PySpark-Zip-Unzip-Multi-Part-Files

Overview

This repository contains Python scripts that zip and unzip multi-part files with PySpark. They are intended for large datasets that are split across multiple files.
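As a rough illustration of the idea, the sketch below bundles a directory of part files into one archive and restores it again. It is not the code in zip_unzip_manager.py: the function names `zip_part_files` and `unzip_part_files` are hypothetical, it uses only the standard-library `zipfile` module, and it assumes the Spark convention of writing one `part-*` file per partition into an output directory.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def zip_part_files(output_dir: Path, archive_path: Path) -> list[str]:
    """Bundle the part-* files in output_dir into a single zip archive.

    Hypothetical helper: assumes Spark-style names such as part-00000.
    Marker files like _SUCCESS do not match the pattern and are skipped.
    """
    part_files = sorted(p for p in output_dir.glob("part-*") if p.is_file())
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for part in part_files:
            # Store only the file name so the archive has no directory prefix.
            archive.write(part, arcname=part.name)
    return [p.name for p in part_files]


def unzip_part_files(archive_path: Path, target_dir: Path) -> list[str]:
    """Extract every member of the archive into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())
```

After a job has written its output, something like `zip_part_files(Path("output"), Path("output.zip"))` would collect the parts, and `unzip_part_files(Path("output.zip"), Path("restored"))` would unpack them into a directory that Spark can read back as a single dataset. The paths here are placeholders.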

Features

  • zip_unzip_manager.py: A script to manage the zipping and unzipping of files.
  • xml_to_json_converter.py: A script to convert XML files to JSON format.
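To give a sense of what an XML to JSON conversion involves, here is a minimal standard-library sketch. It does not reproduce xml_to_json_converter.py: the names `element_to_dict` and `xml_to_json` are hypothetical, and the mapping rules (attributes under `@attributes`, mixed text under `#text`, child elements always collected in lists) are assumptions chosen for the example.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET


def element_to_dict(element: ET.Element) -> dict | str:
    """Recursively convert an XML element into JSON-friendly data.

    Assumed mapping (illustrative only):
      - attributes go under "@attributes"
      - child elements are grouped by tag, always as lists
      - an element with only text becomes a plain string
    """
    node: dict = {}
    if element.attrib:
        node["@attributes"] = dict(element.attrib)
    for child in element:
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (element.text or "").strip()
    if text:
        if not node:
            return text  # leaf element: keep just the text
        node["#text"] = text  # text alongside attributes or children
    return node


def xml_to_json(xml_text: str, indent: int = 2) -> str:
    """Parse an XML string and serialize it as a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=indent)
```

Keeping repeated and single child elements in lists makes the output shape predictable, which matters when the resulting JSON is later loaded with a fixed schema.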

Requirements

  • PySpark
  • Python 3.x

Installation

Clone the repository to your local machine using the following command: git clone https://github.com/DevStrikerTech/PySpark-Zip-Unzip-Multi-Part-Files.git

Usage

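To use the scripts, navigate to the cloned repository's directory and run the desired script with spark-submit, which is installed alongside PySpark. The pyspark command starts an interactive shell and does not run script files.

For example: spark-submit zip_unzip_manager.py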

Contributing

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.

License

This project is open-sourced under the MIT License. See the LICENSE file for more details.