
Getting gibberish predictions when using recurrent LSTM and arrays of strings as output training data #799

Open
chrisvel opened this issue May 20, 2022 · 2 comments

@chrisvel

I have a bunch of sentences and I want to generate tags for each one. My data is simple:

[
  { input: 'Buy tickets for opera', output: ['errands', 'orders'] },
  { input: 'Clean garage', output: ['errands', 'home'] },
  ....
]

I am training the model simply with:

const network = new brain.recurrent.LSTM();

network.train(trainingData, {
  log: (error) => console.log(error),
  iterations: 1000,
});

When I run:

network.run('Some random text');

sometimes I get a correct tag, but other times it returns gibberish with random characters, or the tags joined together into one string. For example, the sentence "Service my XBox dvd drive" returns this output:

["sco comhermer fililys.AAouto afamily.shopping"] 

I read somewhere that an LSTM cannot do classification, so I am OK with that, but what do you suggest?

Would something like a matching table between numbers and tag words work? Something like:

1: orders
2: errands
3: family 
4: personal 
.....

and then using those numbers as the outputs in my training data?

Is this simply not expected to work the way I am hoping, or is it a bug?
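The matching table idea above can be sketched in plain JavaScript. This is a minimal sketch; the helper names `buildTagIndex` and `encodeTags` are hypothetical, not part of brain.js, and a feed-forward network would still need the input sentence encoded as numbers too (e.g. bag-of-words):

```javascript
// Build a lookup table over all tags seen in the training data,
// e.g. { errands: 1, orders: 2, home: 3 }.
function buildTagIndex(trainingData) {
  const index = {};
  let next = 1;
  for (const { output } of trainingData) {
    for (const tag of output) {
      if (!(tag in index)) index[tag] = next++;
    }
  }
  return index;
}

// Encode one example's tags as numbers using the table.
function encodeTags(tags, index) {
  return tags.map((tag) => index[tag]);
}

const trainingData = [
  { input: 'Buy tickets for opera', output: ['errands', 'orders'] },
  { input: 'Clean garage', output: ['errands', 'home'] },
];

const index = buildTagIndex(trainingData);
console.log(index);                           // { errands: 1, orders: 2, home: 3 }
console.log(encodeTags(['errands', 'home'], index)); // [ 1, 3 ]
```

Note that mapping tags to arbitrary integers makes the network treat them as ordered quantities; for classification, a one-hot style output (one 0/1 value per tag) is usually a better fit.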

@nilooy

nilooy commented Jun 11, 2022

Same for me. I'm passing strings in both the input and the output objects, but when I run it, it gives mixed/gibberish output.

@purplnay

purplnay commented Jan 2, 2023

The gibberish is probably due to two things: the model learns to read and write character by character, not word by word, and 1,000 iterations seems quite low for training.
For now I would recommend splitting your input into words, and doing the same for your output: join the tags with a space so they don't collapse into "errandsorders".
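The space-joining suggestion above can be sketched as a small pre/post-processing step (a sketch, assuming the tags are joined into one space-separated string for training and split back apart after `run`; the helper names are made up):

```javascript
// Join each example's tags with spaces so the LSTM learns
// "errands orders" rather than the collapsed "errandsorders".
function toStringOutput(example) {
  return { input: example.input, output: example.output.join(' ') };
}

// Split the model's raw string prediction back into tags.
function toTags(prediction) {
  return prediction.trim().split(/\s+/);
}

const example = { input: 'Buy tickets for opera', output: ['errands', 'orders'] };
console.log(toStringOutput(example).output); // 'errands orders'
console.log(toTags('errands orders'));       // [ 'errands', 'orders' ]

// Usage with brain.js (untested sketch, more iterations than the original):
// const net = new brain.recurrent.LSTM();
// net.train(trainingData.map(toStringOutput), { iterations: 5000 });
// const tags = toTags(net.run('Some random text'));
```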

I've filed a feature request here, #871, since I think this is a recurring issue that is not well documented.

@robertleeplummerjr robertleeplummerjr self-assigned this Apr 25, 2023