MaveDataHolic/PySpark-Zip-Unzip-Multi-Part-Files

 
 


PySpark-Zip-Unzip-Multi-Part-Files

Overview

This repository contains Python scripts for managing zip and unzip operations of multi-part files using PySpark. It is designed to handle large datasets that are distributed across multiple files.
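As a rough illustration of what zipping and unzipping multi-part files can look like, here is a minimal sketch using only Python's standard `zipfile` module. It is not taken from `zip_unzip_manager.py`; the function names `zip_part_files` and `unzip_archive`, and the assumption that the files follow Spark's usual `part-*` naming in a local directory, are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_part_files(source_dir, archive_path, pattern="part-*"):
    """Bundle every file in source_dir matching `pattern` into one zip archive.

    Hypothetical helper: assumes the part files live on the local filesystem.
    Returns the names of the files that were added.
    """
    source = Path(source_dir)
    part_files = sorted(p for p in source.glob(pattern) if p.is_file())
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for part in part_files:
            # Store only the file name so the archive carries no absolute paths.
            archive.write(part, arcname=part.name)
    return [p.name for p in part_files]


def unzip_archive(archive_path, target_dir):
    """Extract all members of the archive into target_dir and return their names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

Marker files such as `_SUCCESS` do not match the default `part-*` pattern, so they are left out of the archive in this sketch.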

Features

  • zip_unzip_manager.py: A script to manage the zipping and unzipping of files.
  • xml_to_json_converter.py: A script to convert XML files to JSON format.
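To give a sense of what an XML-to-JSON conversion involves, the sketch below uses only the standard library (`xml.etree.ElementTree` and `json`). It is an assumption-laden illustration, not the code in `xml_to_json_converter.py`: the function names, the `@` prefix for attributes, and the `#text` key are conventions chosen here for the example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into plain Python data.

    Conventions assumed for this sketch: attributes are stored under
    "@name" keys, mixed text under "#text", and repeated child tags
    are collected into a list.
    """
    node = {f"@{key}": value for key, value in element.attrib.items()}
    text = (element.text or "").strip()
    children = list(element)
    if not children and not node:
        # Leaf element without attributes: represent it by its text alone.
        return text
    if text:
        node["#text"] = text
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Second or later occurrence of this tag: promote to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_json(xml_text, indent=2):
    """Parse an XML string and return its JSON representation as a string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=indent)
```

For instance, `xml_to_json('<book id="1"><title>Spark</title></book>')` produces a JSON object with a single `book` key holding the `@id` attribute and the `title` text.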

Requirements

  • PySpark
  • Python 3.x

Installation

Clone the repository to your local machine using the following command: git clone https://github.com/DevStrikerTech/PySpark-Zip-Unzip-Multi-Part-Files.git

Usage

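To use the scripts, navigate to the cloned repository's directory and run the desired script with spark-submit (as of Spark 2.0, the pyspark command starts an interactive shell and does not accept script files).

For example: spark-submit zip_unzip_manager.py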

Contributing

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes.

License

This project is open-sourced under the MIT License. See the LICENSE file for more details.
