DevStrikerTech/PySpark-Zip-Unzip-Multi-Part-Files

PySpark-Zip-Unzip-Multi-Part-Files

Overview

This repository contains Python scripts for managing zip and unzip operations of multi-part files using PySpark. It is designed to handle large datasets that are distributed across multiple files.
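As a rough illustration of the idea, the sketch below bundles a directory of part files (named `part-*`, as Spark typically names its output files) into one zip archive and restores them again. It uses only the Python standard library and is not taken from `zip_unzip_manager.py`; the function names, the `pattern` parameter, and the paths are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_part_files(source_dir, archive_path, pattern="part-*"):
    """Bundle every file in source_dir matching pattern into one zip archive.

    Hypothetical helper for illustration; returns the archived file names.
    """
    source = Path(source_dir)
    # Collect the part files first, in sorted order, so the archive is deterministic.
    parts = sorted(p for p in source.glob(pattern) if p.is_file())
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            # Store only the file name so the archive does not embed local paths.
            archive.write(part, arcname=part.name)
    return [p.name for p in parts]


def unzip_part_files(archive_path, target_dir):
    """Extract every member of the archive into target_dir and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

Calling `zip_part_files("output", "parts.zip")` and then `unzip_part_files("parts.zip", "restored")` would round-trip the part files. Marker files such as `_SUCCESS` are skipped because they do not match the default pattern.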

Features

  • zip_unzip_manager.py: A script to manage the zipping and unzipping of files.
  • xml_to_json_converter.py: A script to convert XML files to JSON format.
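To give a sense of what an XML to JSON conversion involves, here is a minimal sketch built on the standard library's `xml.etree.ElementTree` and `json` modules. It is not the code in `xml_to_json_converter.py`; the function names and the mapping conventions (attributes prefixed with `@`, repeated child tags collected into lists) are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert one XML element into a JSON-friendly Python value (hypothetical helper)."""
    # Attributes get an "@" prefix so they cannot collide with child tag names.
    node = {"@" + key: value for key, value in element.attrib.items()}
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        if not node:
            # A leaf with no attributes collapses to its text content.
            return text
        if text:
            node["#text"] = text
        return node
    # Text mixed in between child elements is ignored in this sketch.
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag becomes a list of values.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return the equivalent JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)
```

Other conventions are equally valid, for example always emitting lists for child elements so that consumers see a stable schema regardless of how many children appear.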

Requirements

  • PySpark
  • Python 3.x

Installation

Clone the repository to your local machine using the following command:

    git clone https://github.com/DevStrikerTech/PySpark-Zip-Unzip-Multi-Part-Files.git

Usage

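To use the scripts, navigate to the cloned repository's directory and submit the desired script to Spark. Since Spark 2.0, the pyspark launcher no longer runs Python application files, so use spark-submit instead.

For example: spark-submit zip_unzip_manager.py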

Contributing

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.

License

This project is open-sourced under the MIT License. See the LICENSE file for more details.
