Export CSV To Influx: Process CSV data and write it to InfluxDB.
- Influx 0.x, 1.x
- Influx 2.x: supported since ExportCsvToInflux 0.2.0
Important Note: Influx 2.x has a built-in, more powerful CSV write feature: https://docs.influxdata.com/influxdb/v2.1/write-data/developer-tools/csv/
Use pip to install the library; the export_csv_to_influx binary is then ready to use.
pip install ExportCsvToInflux
- [Highlight 🌟🎉😍] Run the exporter with the export_csv_to_influx binary
- [Highlight 🌟🎉😍] Process dozens of CSV files in a folder
- [Highlight 🌟🎉😍🎊🍀🎈] Auto-convert CSV data to int/float/string in Influx
- [Highlight 🌟🎉😍] Match or filter the data using strings or regexes
- [Highlight 🌟🎉😍] Count matched rows and generate a count measurement
- Limit string length in Influx
- Detect whether the CSV has new data
- Use the latest file modification time as the time column
- Auto-create the database if it does not exist
- Drop the database before inserting data
- Drop measurements before inserting data
Run export_csv_to_influx -h to see the help guide.
Note:
- You could pass * to --field_columns to match all the fields: --field_columns=* or --field_columns '*'
- CSV data won't be inserted into Influx again if there is no update. Use --force_insert_even_csv_no_update to force the insert; the default is True: --force_insert_even_csv_no_update=True or --force_insert_even_csv_no_update True
- If some CSV cells have no value, the exporter auto-fills them in Influx based on the column data type: int: -999, float: -999.0, string: -
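For example, a minimal sketch combining the notes above could look like the following (the CSV path, database name, measurement, and server address are placeholders; see the option table below for details):

```bash
# Illustrative sketch: export every CSV column as a field and force the
# insert even when the CSV has not changed since the last run.
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --field_columns '*' \
  --force_insert_even_csv_no_update True \
  --server 127.0.0.1:8086
```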
# | Option | Mandatory | Default | Description
---|---|---|---|---
1 | -c, --csv | Yes | | CSV file path, or the folder path
2 | -db, --dbname | For 0.x, 1.x only: Yes | | InfluxDB database name
3 | -u, --user | For 0.x, 1.x only: No | admin | InfluxDB user name
4 | -p, --password | For 0.x, 1.x only: No | admin | InfluxDB password
5 | -org, --org | For 2.x only: No | my-org | For 2.x only, the org name
6 | -bucket, --bucket | For 2.x only: No | my-bucket | For 2.x only, the bucket name
7 | -http_schema, --http_schema | For 2.x only: No | http | For 2.x only, the InfluxDB HTTP schema: http or https
8 | -token, --token | For 2.x only: Yes | | For 2.x only, the token
9 | -m, --measurement | Yes | | Measurement name
10 | -fc, --field_columns | Yes | | List of CSV columns to use as fields, separated by commas
11 | -tc, --tag_columns | No | None | List of CSV columns to use as tags, separated by commas
12 | -d, --delimiter | No | , | CSV delimiter
13 | -lt, --lineterminator | No | \n | CSV line terminator
14 | -s, --server | No | localhost:8086 | InfluxDB server address
15 | -t, --time_column | No | timestamp | Timestamp column name. If there is no timestamp column, the timestamp is set to the file's last modification time for all CSV rows. Note: pure epoch timestamps like 1517587275 are also supported and auto-detected
16 | -tf, --time_format | No | %Y-%m-%d %H:%M:%S | Timestamp format, see: https://strftime.org/
17 | -tz, --time_zone | No | UTC | Timezone of the supplied data
18 | -b, --batch_size | No | 500 | Batch size when inserting data into Influx
19 | -lslc, --limit_string_length_columns | No | None | Columns whose string length should be limited, separated by commas
20 | -ls, --limit_length | No | 20 | The string length limit
21 | -dd, --drop_database | Compatible with 2.x: No | False | Drop the database or bucket before inserting data
22 | -dm, --drop_measurement | No | False | Drop the measurement before inserting data
23 | -mc, --match_columns | No | None | Columns to match on, separated by commas. Match rule: a row is kept only if all matches succeed
24 | -mbs, --match_by_string | No | None | Match by string, separated by commas
25 | -mbr, --match_by_regex | No | None | Match by regex, separated by commas
26 | -fic, --filter_columns | No | None | Columns to filter on, separated by commas. Filter rule: a row is filtered out if any filter succeeds
27 | -fibs, --filter_by_string | No | None | Filter by string, separated by commas
28 | -fibr, --filter_by_regex | No | None | Filter by regex, separated by commas
29 | -ecm, --enable_count_measurement | No | False | Enable the count measurement
30 | -fi, --force_insert_even_csv_no_update | No | True | Force inserting data into Influx even if the CSV has no update
31 | -fsc, --force_string_columns | No | None | Force columns to string type, separated by commas
32 | -fintc, --force_int_columns | No | None | Force columns to int type, separated by commas
33 | -ffc, --force_float_columns | No | None | Force columns to float type, separated by commas
34 | -uniq, --unique | No | False | Write duplicated points
35 | --csv_charset | No | None | The CSV charset. Default: None, which auto-detects it
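As a quick illustration of the timestamp and tag options above, a sketch like the following could write the demo.csv shown later in this README (the database, measurement, and server values are placeholders):

```bash
# Illustrative sketch: explicit time column, format, and timezone,
# with url written as a tag and response_time as a field.
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --time_column timestamp \
  --time_format '%Y-%m-%d %H:%M:%S' \
  --time_zone UTC \
  --server 127.0.0.1:8086
```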
You can also run the exporter programmatically:
from ExportCsvToInflux import ExporterObject
exporter = ExporterObject()
exporter.export_csv_to_influx(...)
# You could get the export_csv_to_influx parameter details by:
print(exporter.export_csv_to_influx.__doc__)
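For reference, a programmatic call might look like the sketch below. The keyword argument names here are assumptions that mirror the CLI long options; verify the exact parameter names against print(exporter.export_csv_to_influx.__doc__) for your installed version.

```python
from ExportCsvToInflux import ExporterObject

# Illustrative sketch only: the keyword names below are assumed to mirror
# the CLI long options and must be checked against export_csv_to_influx.__doc__.
exporter = ExporterObject()
exporter.export_csv_to_influx(
    csv_file='demo.csv',              # assumed name for -c, --csv
    db_server_name='127.0.0.1:8086',  # assumed name for -s, --server
    db_name='demo',                   # assumed name for -db, --dbname
    db_measurement='demo',            # assumed name for -m, --measurement
    field_columns='response_time',    # assumed name for -fc, --field_columns
    tag_columns='url',                # assumed name for -tc, --tag_columns
    time_column='timestamp',          # assumed name for -t, --time_column
)
```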
- Here is the demo.csv:
timestamp,url,response_time
2022-03-08 02:04:05,https://jmeter.apache.org/,1.434
2022-03-08 02:04:06,https://jmeter.apache.org/,2.434
2022-03-08 02:04:07,https://jmeter.apache.org/,1.200
2022-03-08 02:04:08,https://jmeter.apache.org/,1.675
2022-03-08 02:04:09,https://jmeter.apache.org/,2.265
2022-03-08 02:04:10,https://sample-demo.org/,1.430
2022-03-08 03:54:13,https://sample-show.org/,1.300
2022-03-07 04:06:00,https://sample-7.org/,1.289
2022-03-07 05:45:34,https://sample-8.org/,2.876
- Command samples
# | Description | Influx 0.x, 1.x | Influx 2.x
---|---|---|---
1 | Write the whole CSV into Influx | export_csv_to_influx \ ... | export_csv_to_influx \ ...
2 | Write the whole CSV into Influx, but drop the database or bucket first | export_csv_to_influx \ ... | export_csv_to_influx \ ... Note: a Read/Write API token cannot create a bucket; before using --drop_database, make sure your token has the required access
3 | Write part of the data: timestamp matches 2022-03-07 and url matches sample-\d+ | export_csv_to_influx \ ... | export_csv_to_influx \ ...
4 | Filter part of the data and write the rest into Influx: rows whose url matches sample are filtered out | export_csv_to_influx \ ... | export_csv_to_influx \ ...
5 | Enable the count measurement. A new measurement named demo.count is generated, with match: timestamp matches 2022-03-07 and url matches sample-\d+ | export_csv_to_influx \ ... | export_csv_to_influx \ ...

Illustrative sketches of commands like these, built from the options above, appear after the count-measurement example below.
If the count measurement is enabled, the generated count measurement looks like this:
    // Influx 0.x, 1.x
    select * from "demo.count"

    name: demo.count
    time                match_timestamp match_url total
    ----                --------------- --------- -----
    1562957134000000000 3               2         9

    // Influx 2.x: For more info about Flux, see https://docs.influxdata.com/influxdb/v2.1/query-data/flux/
    influx query 'from(bucket:"my-bucket") |> range(start:-100h) |> filter(fn: (r) => r._measurement == "demo.count")' --raw

    #group,false,false,true,true,false,false,true,true
    #datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,long,string,string
    #default,_result,,,,,,,
    ,result,table,_start,_stop,_time,_value,_field,_measurement
    ,,2,2022-03-04T09:51:49.7425566Z,2022-03-08T13:51:49.7425566Z,2022-03-07T05:45:34Z,2,match_timestamp,demo.count
    ,,3,2022-03-04T09:51:49.7425566Z,2022-03-08T13:51:49.7425566Z,2022-03-07T05:45:34Z,2,match_url,demo.count
    ,,4,2022-03-04T09:51:49.7425566Z,2022-03-08T13:51:49.7425566Z,2022-03-07T05:45:34Z,9,total,demo.count
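As a rough sketch of what a command like sample 5 above could look like, built only from the options documented in this README (the server, org, bucket, and token values are placeholders, and the exact mapping of comma-separated match regexes to match columns should be verified with export_csv_to_influx -h):

```bash
# Influx 0.x, 1.x -- illustrative sketch, not the original sample command
export_csv_to_influx \
  --csv demo.csv \
  --dbname demo \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --match_columns timestamp,url \
  --match_by_regex '2022-03-07,sample-\d+' \
  --enable_count_measurement True \
  --server 127.0.0.1:8086

# Influx 2.x -- illustrative sketch, not the original sample command
export_csv_to_influx \
  --csv demo.csv \
  --org my-org \
  --bucket my-bucket \
  --token YourToken \
  --measurement demo \
  --tag_columns url \
  --field_columns response_time \
  --match_columns timestamp,url \
  --match_by_regex '2022-03-07,sample-\d+' \
  --enable_count_measurement True \
  --server 127.0.0.1:8086
```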
This library is inspired by: https://github.com/fabio-miranda/csv-to-influxdb