Parallelize unary tanh on cpu, generalize ADD to allow more shapes #580

Open
wants to merge 4 commits into
base: master

Conversation

audiovention

I'm working on a project that needed these operations.
tanh is parallelized in the same manner as the other unary ops.

ADD is generalized to require only the ggml_can_repeat constraint instead of ggml_can_repeat_rows.
This was done by adding two extra branches to the function. One of them handles the most general case and is likely very slow. The other is optimized specifically for my project's needs (adding MxN and 1xP tensors) and uses ggml_vec_add1_f32.
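To make the difference between the two constraints concrete, here is a minimal sketch on plain shape arrays. It assumes that the relaxed constraint means "every dimension of the larger tensor is a whole multiple of the matching dimension of the smaller one" and that the row variant additionally requires equal row lengths. The function names `can_repeat` and `can_repeat_rows` are hypothetical stand-ins, not ggml's own functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-in for the relaxed constraint: `small` can be tiled
// to fill `big` if every dimension of `big` is a whole multiple of the
// matching dimension of `small`.
static bool can_repeat(const int64_t small[4], const int64_t big[4]) {
    for (int d = 0; d < 4; ++d) {
        assert(small[d] > 0); // avoid a modulo by zero
        if (big[d] % small[d] != 0) {
            return false;
        }
    }
    return true;
}

// Hypothetical stand-in for the stricter constraint: same as above, but
// the row length (dimension 0) must match exactly.
static bool can_repeat_rows(const int64_t small[4], const int64_t big[4]) {
    return small[0] == big[0] && can_repeat(small, big);
}
```

Under these assumptions, a shape of {2, 1, 1, 1} can be repeated into {4, 3, 1, 1} under the relaxed check but not under the row check, because the row lengths differ.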

Comment on lines +9383 to 9402
} else {
    // general case: no contiguity is assumed for dst, src0, or src1
    for (int ie = ie0; ie < ie1; ++ie) {
        // unflatten the linear element index ie into (i03, i02, i01, i00)
        const int64_t i03 = ie/(ne02*ne01*ne00);
        const int64_t i02 = (ie - i03*ne02*ne01*ne00)/(ne01*ne00);
        const int64_t i01 = (ie - i03*ne02*ne01*ne00 - i02*ne01*ne00)/ne00;
        const int64_t i00 = (ie - i03*ne02*ne01*ne00 - i02*ne01*ne00 - i01*ne00);

        // src1 is broadcastable across src0 and dst in i0, i1, i2, i3
        const int64_t i13 = i03 % ne13;
        const int64_t i12 = i02 % ne12;
        const int64_t i11 = i01 % ne11;
        const int64_t i10 = i00 % ne10;

        float * dst_ptr  = (float *) ((char *) dst->data  + i03*nb3  + i02*nb2  + i01*nb1  + i00*nb0 );
        float * src0_ptr = (float *) ((char *) src0->data + i03*nb03 + i02*nb02 + i01*nb01 + i00*nb00);
        float * src1_ptr = (float *) ((char *) src1->data + i13*nb13 + i12*nb12 + i11*nb11 + i10*nb10);

        *dst_ptr = *src0_ptr + *src1_ptr;
    }
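As a standalone illustration of the index arithmetic, the sketch below splits a flat element index into four coordinates and joins them back. The helper names `unflatten_index` and `flatten_index` are hypothetical and not part of ggml. It assumes dimension 0 varies fastest, as in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: split a flat element index `ie` into 4-D
// coordinates idx[0..3] for a tensor of shape ne[0..3], where ne[0]
// is the fastest-varying dimension.
static void unflatten_index(int64_t ie, const int64_t ne[4], int64_t idx[4]) {
    assert(ie >= 0 && ie < ne[0]*ne[1]*ne[2]*ne[3]);
    idx[3] = ie/(ne[2]*ne[1]*ne[0]);
    ie    -= idx[3]*ne[2]*ne[1]*ne[0];
    idx[2] = ie/(ne[1]*ne[0]);
    ie    -= idx[2]*ne[1]*ne[0];
    idx[1] = ie/ne[0];
    idx[0] = ie - idx[1]*ne[0];
}

// Hypothetical inverse: join 4-D coordinates back into a flat index.
static int64_t flatten_index(const int64_t idx[4], const int64_t ne[4]) {
    return ((idx[3]*ne[2] + idx[2])*ne[1] + idx[1])*ne[0] + idx[0];
}
```

A round trip through both helpers should return the original index for every element, which is a cheap way to check the divisions and remainders.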
Owner


This branch will probably be very slow: most of the time will likely go into computing the indices rather than into the additions themselves.
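One possible way to reduce that overhead, sketched here under stated assumptions rather than taken from the PR, is to unflatten the index once per row instead of once per element, and to step through dimension 0 with a wrapping counter. The `view_f32` struct and the function names are hypothetical. The sketch assumes F32 data and that dst has the same shape as src0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical strided view of an F32 tensor (not a ggml type).
typedef struct {
    char   *data;   // base pointer
    int64_t ne[4];  // elements per dimension, ne[0] varies fastest
    size_t  nb[4];  // stride in bytes per dimension
} view_f32;

// Byte address of the start of row (i1, i2, i3) in view v.
static char *row_ptr(view_f32 v, int64_t i1, int64_t i2, int64_t i3) {
    return v.data + (size_t) i1*v.nb[1] + (size_t) i2*v.nb[2] + (size_t) i3*v.nb[3];
}

// Broadcast-add src1 into src0 for rows [ir0, ir1), writing to dst.
// The divisions and modulos run once per row; the inner loop only
// multiplies by strides and wraps a counter.
static void add_bcast_rows_f32(view_f32 dst, view_f32 src0, view_f32 src1,
                               int64_t ir0, int64_t ir1) {
    const int64_t ne00 = src0.ne[0], ne01 = src0.ne[1], ne02 = src0.ne[2];
    const int64_t ne10 = src1.ne[0];
    assert(ir0 >= 0 && ir1 <= ne01*ne02*src0.ne[3]);

    for (int64_t ir = ir0; ir < ir1; ++ir) {
        // unflatten the row index once per row
        const int64_t i03 = ir/(ne02*ne01);
        const int64_t i02 = (ir - i03*ne02*ne01)/ne01;
        const int64_t i01 = ir - i03*ne02*ne01 - i02*ne01;

        char       *d  = row_ptr(dst,  i01, i02, i03);
        const char *s0 = row_ptr(src0, i01, i02, i03);
        const char *s1 = row_ptr(src1, i01 % src1.ne[1], i02 % src1.ne[2], i03 % src1.ne[3]);

        int64_t i10 = 0; // position inside the src1 row, wraps at ne10
        for (int64_t i00 = 0; i00 < ne00; ++i00) {
            *(float *) (d + (size_t) i00*dst.nb[0]) =
                *(const float *) (s0 + (size_t) i00*src0.nb[0]) +
                *(const float *) (s1 + (size_t) i10*src1.nb[0]);
            if (++i10 == ne10) {
                i10 = 0;
            }
        }
    }
}
```

Each thread could be handed its own [ir0, ir1) range of rows, much like the [ie0, ie1) element range in the hunk, at the cost of coarser load balancing when there are few rows.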

2 participants