Big Data Problem #33

xljhtq · 2018-03-21T11:17:31Z

When I load a file with a lot of data, I run into a problem: free memory keeps shrinking because of the sorting algorithm in the preprocessing step. What should I do to optimize it?

zhiguowang · 2018-04-02T17:01:22Z

I think one solution is to modify the "InstanceBatch" class in "SentenceMatchDataStream.py".
Right now, my code loads all the data into memory and pads every variable beforehand (https://github.com/zhiguowang/BiMPM/blob/master/src/SentenceMatchDataStream.py#L165). That padding step costs a lot of memory, because every sequence is stored at its padded length for the whole run.

One way to fix this is to skip padding while loading the data and instead pad each batch right before you use it. This line (https://github.com/zhiguowang/BiMPM/blob/master/src/SentenceMatchTrainer.py#L92) may be a good place to insert your padding function.
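
A minimal sketch of that idea, for illustration only: it is not the code from the BiMPM repository, and the names pad_batch and iter_padded_batches, the pad_id value, and the (sentence1, sentence2) tuple layout are all assumptions. The unpadded id lists stay in memory, and only the batch currently being consumed is padded.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical layout: each instance is a pair of token-id lists.
Instance = Tuple[List[int], List[int]]


def pad_batch(seqs: Sequence[Sequence[int]], pad_id: int = 0) -> List[List[int]]:
    """Pad every sequence to the longest length in this batch only."""
    max_len = max((len(s) for s in seqs), default=0)
    return [list(s) + [pad_id] * (max_len - len(s)) for s in seqs]


def iter_padded_batches(
    instances: Sequence[Instance], batch_size: int, pad_id: int = 0
) -> Iterator[Tuple[List[List[int]], List[List[int]]]]:
    """Yield one padded batch at a time; earlier batches can be garbage collected."""
    for start in range(0, len(instances), batch_size):
        chunk = instances[start:start + batch_size]
        sent1 = pad_batch([a for a, _ in chunk], pad_id)
        sent2 = pad_batch([b for _, b in chunk], pad_id)
        yield sent1, sent2


# Toy data standing in for the loaded, still unpadded, dataset.
instances = [([1, 2, 3], [4]), ([5], [6, 7]), ([8, 9], [10, 11, 12, 13])]
batches = list(iter_padded_batches(instances, batch_size=2))
```

In a training loop you would iterate over iter_padded_batches(...) directly instead of collecting the result in a list, so that only one padded batch exists at any time.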
