This repository contains Python scripts for zipping and unzipping multi-part files with PySpark. It is designed for large datasets that are split across multiple files. It includes the following scripts:
- zip_unzip_manager.py: A script to manage the zipping and unzipping of files.
- xml_to_json_converter.py: A script to convert XML files to JSON format.
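The README does not describe how xml_to_json_converter.py maps XML onto JSON. The following is only a minimal sketch of one common mapping, not the script's actual logic. It assumes a convention that is entirely hypothetical here: attributes become "@name" keys, text under a parent that also has attributes or children becomes "#text", and repeated child tags are collected into lists. The function names element_to_dict and xml_string_to_json are illustrative, not taken from the repository.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(elem: ET.Element) -> Dict[str, Any]:
    """Recursively convert an XML element into a JSON-friendly dict.

    Hypothetical convention (not from xml_to_json_converter.py):
    attributes -> "@name", mixed text -> "#text", repeated tags -> lists.
    """
    node: Dict[str, Any] = {f"@{key}": value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)[child.tag]
        if child.tag in node:
            # A tag seen more than once is collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if node:
            node["#text"] = text
        else:
            # A leaf holding only text collapses to a plain string.
            return {elem.tag: text}
    # Empty elements map to null.
    return {elem.tag: node or None}


def xml_string_to_json(xml_text: str) -> str:
    """Parse one XML document and serialize it as indented JSON."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)), indent=2)
```

In a PySpark job, a function like this could be applied per file, for example with sc.wholeTextFiles("input/*.xml").mapValues(xml_string_to_json), where the input path is a placeholder. wholeTextFiles loads each file as a single record, so every XML file must fit in executor memory.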
Requirements:
- PySpark
- Python 3.x
Clone the repository to your local machine using the following command:
git clone https://github.com/DevStrikerTech/PySpark-Zip-Unzip-Multi-Part-Files.git
To use the scripts, navigate to the cloned repository's directory and run the desired script with PySpark.
For example:
spark-submit zip_unzip_manager.py
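The internals of zip_unzip_manager.py are not shown in this README. As a rough sketch of one way a PySpark job could pack and unpack archives, the helpers below use only Python's standard zipfile module. They assume that "multi-part" means a set of part files (such as part-00000, part-00001, as Spark writes them), not split archives like .z01/.z02, which zipfile cannot read. The names zip_parts and unzip_archive are hypothetical.

```python
import io
import zipfile
from typing import Iterable, List, Tuple


def zip_parts(parts: Iterable[Tuple[str, bytes]]) -> bytes:
    """Pack (entry_name, content) pairs into a single in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in parts:
            zf.writestr(name, content)
    return buffer.getvalue()


def unzip_archive(path: str, content: bytes) -> List[Tuple[str, str, bytes]]:
    """Return (archive_path, entry_name, data) for every file in a ZIP archive.

    The (path, content) arguments match the records produced by
    SparkContext.binaryFiles, so this can be passed to flatMap.
    """
    with zipfile.ZipFile(io.BytesIO(content)) as zf:
        return [
            (path, info.filename, zf.read(info.filename))
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries
        ]
```

With a SparkContext sc, the unzip step could look like sc.binaryFiles("input/*.zip").flatMap(lambda kv: unzip_archive(*kv)), where the input path is a placeholder. binaryFiles reads each archive as one record, so this approach suits many moderately sized archives rather than a single archive larger than executor memory.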
Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.
This project is open-sourced under the MIT License. See the LICENSE file for more details.