
Oracle-to-S3 data uploader.

Ground to cloud data integration tool.
Lets you stream your Oracle table/query data to Amazon-S3 from the Windows command line (CLI).

NOTE: You need Oracle Instant Client in order to use it (sqlplus.exe has to be in the system PATH).

For a client-less cx_Oracle version, check the Oracle-to-S3-Backup script or Snowpipe-for-Oracle.

Features:

  • Streams Oracle table (query) data to Amazon-S3.
  • No need to create CSV extracts before upload to S3.
  • Data stream is compressed during upload to S3.
  • No need for Amazon AWS CLI.
  • AWS Access Keys are not passed as arguments.
  • Written using Python/boto/PyInstaller.


Purpose

  • Stream (upload) Oracle table data to Amazon-S3.

How it works

  • The tool connects to the source Oracle DB and opens a data pipe for reading (see the sketch after this list).
  • Data is pumped to S3 using multipart upload.
  • Optional upload to Reduced Redundancy storage (standard storage by default).
  • Optional "make it public" after upload (private by default).
  • If the target bucket does not exist, it is created.
  • You can control the region where the new bucket is created.
  • Streamed data can be tee'd (dumped to disk) during upload.
  • If not set, the S3 key defaults to the query file name.
  • It's a Python/boto script.
  • The executable is created using [PyInstaller](http://www.pyinstaller.org/).
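
Below is a minimal sketch of the read-and-compress side of that flow, assuming sqlplus is on the PATH and ORACLE_LOGIN is set as shown under Usage; the function name and structure are illustrative, not the tool's actual internals:

# Illustrative only: spawn sqlplus on a query file, gzip-compress its CSV
# output on the fly, and yield ~20 MB chunks ready for a multipart upload.
import os
import subprocess
import zlib

CHUNK_SIZE = 10 << 21  # ~20 MB, the documented default for --s3_write_chunk_size

def compressed_chunks(query_file, chunk_size=CHUNK_SIZE):
    login = os.environ["ORACLE_LOGIN"]                    # e.g. tiger/scott@orcl
    proc = subprocess.Popen(["sqlplus", "-S", login, "@" + query_file],
                            stdout=subprocess.PIPE)
    compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)  # gzip stream
    buf = b""
    for line in proc.stdout:
        buf += compressor.compress(line)
        if len(buf) >= chunk_size:
            yield buf
            buf = b""
    buf += compressor.flush()
    if buf:
        yield buf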

Audience

Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps, DataOps

Designated Environment

Pre-Prod (UAT/QA/DEV)

Usage

c:\Python35-32\PROJECTS\Ora2S3>dist\oracle_to_s3_uploader.exe
#############################################################################
#Oracle to S3 Data Uploader (v1.2, beta, 04/05/2016 15:11:53) [64bit]
#Copyright (c): 2016 Alex Buzunov, All rights reserved.
#Agreement: Use this tool at your own risk. Author is not liable for any damages
#           or losses related to the use of this software.
################################################################################
Usage:
  set AWS_ACCESS_KEY_ID=<your access key>
  set AWS_SECRET_ACCESS_KEY=<your secret key>
  set ORACLE_LOGIN=tiger/scott@orcl
  set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1

  oracle_to_s3_uploader.exe [<ora_query_file>] [<ora_col_delim>] [<ora_add_header>]
                            [<s3_bucket_name>] [<s3_key_name>] [<s3_use_rr>] [<s3_public>]

        --ora_query_file -- SQL query to execute in the source Oracle DB.
        --ora_col_delim  -- CSV column delimiter (|).
        --ora_add_header -- Add header line to CSV file (False).
        --ora_lame_duck  -- Limit rows for trial upload (1000).
        --create_data_dump -- Use it if you want to persist streamed data on your filesystem.

        --s3_bucket_name -- S3 bucket name (always set it).
        --s3_location    -- New bucket location name (us-west-2).
                                Set it if you are creating a new bucket.
        --s3_key_name    -- CSV file name (to store query results on S3).
                if <s3_key_name> is not specified, the oracle query filename (ora_query_file) will be used.
        --s3_use_rr -- Use reduced redundancy storage (False).
        --s3_write_chunk_size -- Chunk size for multipart upload to S3 (10<<21, ~20 MB).
        --s3_public -- Make uploaded file public (False).

        Oracle data uploaded to S3 is always compressed (gzip).
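
For reference, the default chunk size 10<<21 is a bit-shift expression; it works out to about 20 MB, comfortably above S3's 5 MB minimum part size for multipart uploads:

# 10 << 21 means 10 * 2**21 bytes
chunk_size = 10 << 21
print(chunk_size)        # 20971520 bytes, i.e. ~20 MB per multipart part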

Example

Environment variables

  • Set the following environment variables (used by all tests) in set_env.bat:
set AWS_ACCESS_KEY_ID=<your access key>
set AWS_SECRET_ACCESS_KEY=<your secret key>

set ORACLE_LOGIN=tiger/scott@orcl
set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1
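
boto reads the two AWS variables straight from the environment, which is why the tool never takes access keys as arguments. A minimal way to confirm they are visible (assumes boto 2 is installed):

# Confirm boto can see the credentials without passing keys explicitly.
import boto

conn = boto.connect_s3()                      # keys come from the environment
print([b.name for b in conn.get_all_buckets()])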

Test upload with data dump.

In this example the complete table test2 gets uploaded to Amazon-S3 as a compressed CSV file.

Contents of the file table_query.sql:

SELECT * FROM test2;

A temporary dump file is also created for analysis (by default no files are created). Use -s, --create_data_dump to dump streamed data.

If the target bucket does not exist, it will be created in a user-controlled region. Use the -t, --s3_location argument to set the target region name.
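
A sketch of what the upload side could look like with boto 2, reusing the compressed_chunks() generator from the "How it works" section; the -t/--s3_location, -r/--s3_use_rr and -p/--s3_public options map to the parameters shown here (illustrative, not the tool's actual code):

# Illustrative upload side: look up or create the bucket in the requested
# region, then push the compressed chunks with a multipart upload.
import io
import boto

def upload(chunks, bucket_name, key_name, location="us-west-2",
           use_rr=False, make_public=False):
    conn = boto.connect_s3()
    bucket = conn.lookup(bucket_name)
    if bucket is None:
        bucket = conn.create_bucket(bucket_name, location=location)   # -t / --s3_location

    mp = bucket.initiate_multipart_upload(key_name, reduced_redundancy=use_rr)  # -r
    try:
        for part_num, chunk in enumerate(chunks, start=1):
            mp.upload_part_from_file(io.BytesIO(chunk), part_num=part_num)
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()
        raise

    if make_public:                                                    # -p / --s3_public
        bucket.get_key(key_name).make_public()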

Contents of the file test.bat:

dist\oracle_to_s3_uploader.exe ^
	-q table_query.sql ^
	-d "|" ^
	-e ^
	-b test_bucket ^
	-k oracle_table_export ^
	-r ^
	-p ^
	-s

Make sure you can call sqlplus from the command line:

C:\Users\alex_buz>sqlplus.exe 
SQL*Plus: Release 12.1.0.2.0 Production on Sat Oct 29 09:24:43 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter user-name:

Executing test.bat:

c:\Python35-32\PROJECTS\Ora2S3>dist\oracle_to_s3_uploader.exe   -q table_query.sql      -d "|"  -e      -b test_bucket       -k oracle_table_export  -r      -p      -s
Uploading results of "table_query.sql" to existing bucket "test_bucket"
Dumping data to: c:\Python35-32\PROJECTS\Ora2S3\data_dump\table_query\test_bucket\oracle_table_export.20160405_235310.gz
1 chunk 10.0 GB [8.95 sec]
2 chunk 5.94 GB [5.37 sec]
Uncompressed data size: 15.94 GB
Compressed data size: 63.39 MB
Upload complete (17.58 sec).
Your PUBLIC upload is at: https://s3-us-west-2.amazonaws.com/test_bucket/oracle_table_export.gz

Download

  • git clone https://github.com/alexbuz/Oracle_To_S3_Data_Uploader
  • Master Release -- oracle_to_s3_uploader 1.2

FAQ

I'm getting the error "WindowsError: [Error 2] The system cannot find the file specified"

Call sqlplus from the command line to make sure it's in the system PATH and the tool can find it when it opens the data extract shell:

C:\Users\alex_buz>sqlplus.exe 
SQL*Plus: Release 12.1.0.2.0 Production on Sat Oct 29 09:24:43 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Enter user-name:
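
If you prefer to check programmatically before running the uploader, this short check (Python 3.3+) verifies that sqlplus is resolvable on the PATH:

# Fail fast if sqlplus.exe cannot be found on the PATH.
import shutil

if shutil.which("sqlplus") is None:
    raise SystemExit("sqlplus not found on PATH; add the Oracle client's bin directory")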

Can it load Oracle data into an Amazon S3 file?

Yes, that is the main purpose of this tool.

Can developers integrate Oracle_To_S3_Data_Uploader into their ETL pipelines?

Yes, assuming they are doing it on Windows.

How fast is the data upload using Oracle-to-S3 Data Uploader?

As fast as any multipart upload implementation written in Python with boto.

How to increase upload speed?

The input data stream is compressed before upload to S3, so not much can be done there. You may want to run the tool closer to the source or the target for better performance.

What are the other ways to move large amounts of data from Oracle to S3?

You can write a Sqoop script that can be scheduled as an 'EMR Activity' under AWS Data Pipeline.

Does it create temporary data file to facilitate data load to S3?

No. Data is streamed and compressed on the fly; nothing is written to disk unless you explicitly request a dump with -s, --create_data_dump.

Can I extract data from RDS Oracle to Amazon S3 using this tool?

Yes, but watch where you invoke it. If you execute it outside of AWS you may incur data transfer charges. You should spin up a Windows EC2 instance in the same Availability Zone as your RDS Oracle instance, log in to the new EC2 instance using Remote Desktop, download Oracle_To_S3_Data_Uploader, and run it in a CLI window.

Can I log transferred data for analysis?

Yes. Use -s, --create_data_dump to dump streamed data.
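
Conceptually the dump is just a tee on the compressed stream; a sketch, reusing the chunked stream from the earlier examples (the dump path layout shown in the sample output above is the tool's own):

# Illustrative tee: persist each compressed chunk locally while still
# handing it to the multipart upload unchanged.
def tee_chunks(chunks, dump_path):
    with open(dump_path, "wb") as dump:
        for chunk in chunks:
            dump.write(chunk)    # -s / --create_data_dump
            yield chunk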

What happens in the first step of the data transfer?

The query file you provide is used to select data from the source Oracle server. The stream is compressed before it is loaded to S3.

What happens in the second step of the data transfer?

The compressed data is uploaded to S3 using the multipart upload protocol.

What technology was used to create this tool?

I used SQL*Plus, Python, and boto to write it. Boto is used to upload the file to S3; SQL*Plus is used to spool data to the compressor pipe.

Where are the sources?

You can download the sources from the GitHub repository listed in the Download section above.

Can you modify functionality and add features?

Yes. Please ask me for new features.

Teardown

https://github.com/pydemo/teardown

Snowpipe

https://github.com/pydemo/Snowpipe-For-SQLServer
