Do filters support batch processing, and how do I set batch_size? #285
Comments
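Hi, batching here is mainly meant for the case where a mapper generates multiple samples from a single sample, so the result has to be wrapped as a batch on return. Currently only mappers support batching, and the input batch size is fixed at 1. It would indeed be more reasonable for every type of op to support batching, with the batch size exposed to the user. However, users would then need to weigh the overhead of building batches: if the speedup from the batched op is not enough to cover that overhead, processing may end up slower.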
What overhead does batching incur? Memory usage?
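Yes, memory is one factor. At the same degree of parallelism, a larger batch size means more data is being processed at once, so memory usage may be higher. At present most Filter operators only support processing samples one at a time, so the speedup available from a larger batch size is relatively small; if memory and other resources allow, increasing the parallelism np is the better option. In addition, some Mappers are batched OPs mainly because they are used for data augmentation or data generation. Unlike an ordinary Mapper's 1->1 mapping, they need a 1->N mapping, and we use batching to support this new type.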
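To make the 1->N case concrete, here is a minimal sketch in plain Python. It is not Data-Juicer's actual API: the function name `augment_batch`, the `text` key, and the upper-casing "augmentation" are all hypothetical, and the batch is assumed to be a dict of lists (one list per field).

```python
from typing import Dict, List


def augment_batch(samples: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Hypothetical 1->N mapper: keep each text and add an upper-cased copy."""
    out: Dict[str, List[str]] = {"text": []}
    for text in samples["text"]:
        out["text"].append(text)          # keep the original sample
        out["text"].append(text.upper())  # add one generated variant
    return out


# Input batch of size 1; the returned batch holds 2 samples.
result = augment_batch({"text": ["hello"]})
print(result)  # {'text': ['hello', 'HELLO']}
```

Because the output can contain more rows than the input, it has to be returned as a batch even when the input batch holds a single sample.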
This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add new comments, or this issue will be closed in 3 days.
Close this stale issue.
Before Asking
I have read the README carefully.
I have pulled the latest code of the main branch and run it again, and the problem still exists.
Search before asking
Question
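Using the latest Docker image: v0.2.0
Do filter operators support batch processing? Following the documentation, I set self._batched_op = True, but the samples read in compute_stats are not lists. Comparing the classes, I found that the Mapper class defines an is_batched_op method while the Filter class does not. After adding an is_batched_op method modeled on Mapper's, compute_stats does receive a list, but its length is 1, so this still does not improve the efficiency of my custom operator. How do I set the batch size?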
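For illustration only, the kind of batched filter the question is after might look like the sketch below. It is standalone Python and does not use Data-Juicer's classes: the function names, the `text` and `text_len` keys, and the `min_len` threshold are hypothetical, and the batch is assumed to arrive as a dict of lists.

```python
from typing import Any, Dict, List


def compute_stats_batched(samples: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Hypothetical batched stats step: record the length of every text in the batch."""
    # One pass over the whole batch; a real op could issue a single
    # vectorized or model call here instead of one call per sample.
    samples["text_len"] = [len(text) for text in samples["text"]]
    return samples


def process_batched(samples: Dict[str, List[Any]], min_len: int = 3) -> List[bool]:
    """Return one keep/drop flag per sample in the batch."""
    return [length >= min_len for length in samples["text_len"]]


batch = {"text": ["hi", "hello", "hey"]}
keep = process_batched(compute_stats_batched(batch))
print(keep)  # [False, True, True]
```

A batch of length 1 gives this function nothing to amortize, which is why the fixed input batch size matters for custom operators.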
Additional
No response