go-numcalc

Read this in other languages: 简体中文

go-numcalc is a Go package for basic numerical processing of numeric data, such as data conversion, data grouping, and data smoothing. All functions support only two data types: int32 and float32.

Tip

This package is written in Go and includes some C++ code, which is called through cgo.


Installation

Use go get to install go-numcalc:

go get github.com/dingyuqi/go-numcalc

Dependencies

go-numcalc depends on two main external libraries:

Go: Gonum
C++: Armadillo

Armadillo does not need to be installed separately; it is linked through cgo as a static library.

Usage

The following example calls the LogInt32() method from the conversion package. Other functions are used in the same way.

1. Call NewCalculator() in the conversion package to create a numerical conversion object.
2. Call LogInt32(). It returns a slice in which each element is the logarithm of the input element at the same index.

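package main

import (
	"log"

	"github.com/dingyuqi/go-numcalc/src/conversion"
)

func main() {
	data := []int32{1, 2, 3, 4, 5}

	// Create the calculator and check the error before using it.
	c, err := conversion.NewCalculator()
	if err != nil {
		log.Fatal(err)
	}

	// LogInt32 returns the logarithm of each element of data.
	result, err := c.LogInt32(data)
	if err != nil {
		log.Fatal(err)
	}
	log.Println("result is:", result)
}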

Tip

This test code is in example/example_test.go and can be run directly.

Project structure

src: source code
- binning: data binning functions
- conversion: data conversion functions
- outlier: data outlier functions
- smoothness: data smoothness functions
pkg: static library files and related header files for the C++ Armadillo library.
example: Cgo calling samples and Go calling samples.
test: unit test data and test cases.

Tip

To test Cgo, run the following commands in the example folder:

go build
./example.exe

example_test.go contains a pure Go implementation of the same function (logarithmic calculation), used as a baseline for comparing calculation speed against the Cgo version.
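
As a rough illustration of what a pure Go logarithm baseline could look like, here is a minimal sketch. It is not the code from example_test.go: the function name logInt32, the float64 return type, the choice of the natural logarithm, and the decision to reject non-positive inputs are all assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// logInt32 is a hypothetical pure-Go helper (not part of go-numcalc).
// It returns the natural logarithm of each element of data and
// rejects non-positive values, for which the logarithm is undefined.
func logInt32(data []int32) ([]float64, error) {
	out := make([]float64, len(data))
	for i, v := range data {
		if v <= 0 {
			return nil, errors.New("logInt32: input values must be positive")
		}
		out[i] = math.Log(float64(v))
	}
	return out, nil
}

func main() {
	result, err := logInt32([]int32{1, 2, 3, 4, 5})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("result is:", result)
}
```

A function of this shape can be timed against the Cgo call with Go's standard benchmarking support (testing.B) to compare the two paths.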

Development

As of August 2023, only the Phase I functions have been implemented, and they are implemented entirely in Go.

Phase I functions

Serial number	Type	Function	Detailed description of function	Remarks
1	Data conversion	Min-max normalization	Apply a linear transformation to the data series so that all processed values fall within the interval [0, 1]
2	Data conversion	Z-score standardization	Subtract the mean from each data point in the data series and divide by the standard deviation, so that the processed data has a mean of 0 and a standard deviation of 1
3	Data conversion	Logarithmic transformation	$y = \log_{base}{x}$	1. Negative value processing 2. base value
4	Data conversion	Square root transformation	$y = \sqrt{x}$	Negative value processing
5	Data Grouping	Cluster Grouping	Use cluster analysis methods to group data points into clusters with similar characteristics. Cluster grouping can be used to discover clustering patterns and categories in the data, which is useful for data mining and classification tasks.	1. Clustering method (random_subset, static_subset, etc.) 2. Number of clusters
6	Data Grouping	Equal Width Grouping	Divide the value range of the data into intervals of equal width. This method is simple and intuitive, but may not reflect the distribution characteristics of the data well, especially when there are imbalanced data or outliers.	Group Width
7	Data Grouping	Equal Frequency Grouping	Divide the data into groups containing the same number of data points. This method can better consider the distribution characteristics of the data, but for data containing a large number of repeated values, it may cause some groups to have the same values.	Number of Groups
8	Data Grouping	Grouping Based on Statistics	Divide the data into groups based on the quantiles of the data. Common methods include quartile grouping, decile grouping, etc. This method can divide data into groups with the same data density, which is more effective for skewed distribution data.	Grouping conditions (Not implemented temporarily due to overlap with the equal frequency grouping function)
9	Outlier judgment	Standard deviation	Measure how far each data point lies from the mean, in units of the standard deviation; values beyond a certain threshold are considered outliers. Usually, values more than 3 standard deviations from the mean are considered outliers	1. true indicates an outlier 2. false indicates a non-outlier 3. Threshold for outlier judgment
10	Outlier judgment	Box plot	According to the quartiles and outlier range of the data, values beyond the upper and lower boundaries are considered outliers.	1. true indicates an outlier 2. false indicates a non-outlier
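
The min-max, Z-score, and standard deviation rows above reduce to a few lines of arithmetic. The following is a minimal pure-Go sketch of those three formulas, not the package's API: the names meanStd, minMaxNormalize, zScore, and sigmaOutliers are hypothetical, and the sketch assumes the population standard deviation and maps a constant series to all zeros to avoid dividing by zero.

```go
package main

import (
	"fmt"
	"math"
)

// meanStd returns the mean and population standard deviation of xs.
func meanStd(xs []float32) (mean, std float64) {
	if len(xs) == 0 {
		return 0, 0
	}
	for _, x := range xs {
		mean += float64(x)
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		d := float64(x) - mean
		std += d * d
	}
	std = math.Sqrt(std / float64(len(xs)))
	return mean, std
}

// minMaxNormalize linearly rescales xs into [0, 1]:
// y = (x - min) / (max - min).
func minMaxNormalize(xs []float32) []float32 {
	out := make([]float32, len(xs))
	if len(xs) == 0 {
		return out
	}
	lo, hi := xs[0], xs[0]
	for _, x := range xs {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	if hi == lo {
		return out // constant series: all zeros (assumption)
	}
	for i, x := range xs {
		out[i] = (x - lo) / (hi - lo)
	}
	return out
}

// zScore computes y = (x - mean) / std for every element.
func zScore(xs []float32) []float32 {
	out := make([]float32, len(xs))
	mean, std := meanStd(xs)
	if std == 0 {
		return out // constant series: all zeros (assumption)
	}
	for i, x := range xs {
		out[i] = float32((float64(x) - mean) / std)
	}
	return out
}

// sigmaOutliers flags points that lie more than threshold
// standard deviations away from the mean (true = outlier).
func sigmaOutliers(xs []float32, threshold float64) []bool {
	out := make([]bool, len(xs))
	mean, std := meanStd(xs)
	if std == 0 {
		return out
	}
	for i, x := range xs {
		out[i] = math.Abs(float64(x)-mean) > threshold*std
	}
	return out
}

func main() {
	data := []float32{1, 2, 3, 4, 5}
	fmt.Println(minMaxNormalize(data))
	fmt.Println(zScore(data))
	fmt.Println(sigmaOutliers(data, 3))
}
```

Sharing meanStd between zScore and sigmaOutliers makes the relationship explicit: a point is flagged exactly when the absolute value of its Z-score exceeds the threshold.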

Phase II functions (TODO)

Serial number	Type	Function	Function description	Remarks
1	Data smoothing	Wavelet filtering	Decompose and reconstruct signals by applying wavelet transform to remove noise or mutations and retain important features in the signal. Wavelet filtering provides better analysis and processing capabilities in both time and frequency domains.	Different base functions have a great impact on the results. Different data need to choose different base functions and frequency ranges according to the analysis requirements. 1. Wavelet basis function (Daubechies wavelet, Haar wavelet, Morlet wavelet) 2. Scale parameter (determines the scaling factor of each wavelet basis function in the wavelet transform; a smaller scale parameter can capture higher frequency and detailed signal characteristics, while a larger scale parameter can capture lower frequency and overall trend signal characteristics) 3. Decomposition level (determines the order of wavelet transform; a higher decomposition level can provide more detailed frequency and scale information) 4. Threshold processing method (keep/discard)
2	Data smoothing	Moving average	Smooths the data by averaging the values within a window of a given size around each data point. The window size determines the degree of smoothing: a larger window smooths out more fluctuations. Common variants include the simple moving average and the weighted moving average.	Where a full window cannot be constructed at the boundaries, those boundary points are usually dropped from the output, so the output is shorter than the input. 1. Window size 2. Weights 3. Boundary handling method
3	Data smoothing	Exponential smoothing	A recursive smoothing method that gives higher weight to recent data. The weight of past observations is controlled by a smoothing coefficient: the larger the coefficient, the greater the influence of recent data. Exponential smoothing is often used to smooth time series data.	Smoothing factor
4	Data smoothing	Savitzky-Golay smoothing	A smoothing method based on polynomial fitting, which smooths data by fitting the neighboring points around each data point to a polynomial curve. It retains the overall shape and trend of the data and suppresses noise well.	1. Smoothing window 2. Polynomial order 3. Derivative order (optional)
5	Data smoothing	Loess smoothing	Similar to Lowess smoothing, Loess smoothing is also a nonparametric local regression method. It smooths data by fitting a polynomial to the neighboring data around the data point. Unlike Lowess smoothing, Loess smoothing uses adaptive weighted least squares to better handle nonlinear relationships in the data.	1. Smoothing coefficient (controls the weight given to past observations) 2. Weighting function (default in the library)
6	Data smoothing	Lowess smoothing	Lowess smoothing is a nonparametric local regression method that smooths data by fitting a local linear regression model. The method uses weighted least squares to estimate the smoothed value of a data point, with weights assigned based on how far away the data point is.	1. Smoothing coefficient (controls the weight given to past observations) 2. Weighting function (default in the library)
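
Since the Phase II functions are not implemented yet, the following is only a minimal sketch of how the two simplest entries in the table, the simple moving average and exponential smoothing, might look in pure Go. The names movingAverage and expSmooth are hypothetical. The sketch assumes that incomplete boundary windows are dropped (so the output is shorter than the input, as the Remarks column describes) and that the smoothing factor alpha lies in [0, 1].

```go
package main

import "fmt"

// movingAverage returns the simple moving average of xs.
// Only full windows are emitted, so the result has
// len(xs)-window+1 points; invalid window sizes return nil.
func movingAverage(xs []float32, window int) []float32 {
	if window <= 0 || window > len(xs) {
		return nil
	}
	out := make([]float32, 0, len(xs)-window+1)
	var sum float32
	for i, x := range xs {
		sum += x
		if i >= window {
			sum -= xs[i-window] // drop the value that left the window
		}
		if i >= window-1 {
			out = append(out, sum/float32(window))
		}
	}
	return out
}

// expSmooth applies simple exponential smoothing:
// s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1].
// A larger alpha gives recent data more influence.
func expSmooth(xs []float32, alpha float32) []float32 {
	out := make([]float32, len(xs))
	for i, x := range xs {
		if i == 0 {
			out[i] = x
			continue
		}
		out[i] = alpha*x + (1-alpha)*out[i-1]
	}
	return out
}

func main() {
	data := []float32{1, 2, 3, 4, 5}
	fmt.Println(movingAverage(data, 3))
	fmt.Println(expSmooth(data, 0.5))
}
```

The running-sum form of movingAverage does constant work per element regardless of window size; for very long float32 series, accumulating in float64 would reduce rounding drift.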

License

MIT License
