How to get 'sbert_x.pt' and other files for a new dataset? #19

naskk1 opened this issue Oct 3, 2024 · 5 comments

naskk1 commented Oct 3, 2024

If I want to train the model on another dataset, how can I obtain files such as 'sbert_x.pt' for that dataset? There seems to be no code for this.

@ChenRunjin (Collaborator)

Hi, you can find the code to generate the SBERT embeddings in the get_sbert_embedding() function in utils/data_process.py.
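For reference, here is a minimal sketch of that step using the standard sentence-transformers API. The model name, batching, and output filename below are assumptions rather than the repo's exact settings, so check get_sbert_embedding() in utils/data_process.py for what is actually used:

```python
# Minimal sketch (assumed settings, not the repo's exact code): encode each
# node's raw text with Sentence-BERT and save the embedding matrix as sbert_x.pt.
import torch
from sentence_transformers import SentenceTransformer

def build_sbert_embeddings(raw_texts, out_path="sbert_x.pt",
                           model_name="all-MiniLM-L6-v2"):
    model = SentenceTransformer(model_name)  # model choice is an assumption
    # encode() returns a (num_nodes, hidden_dim) tensor when convert_to_tensor=True
    emb = model.encode(raw_texts, convert_to_tensor=True,
                       batch_size=64, show_progress_bar=True)
    torch.save(emb.cpu(), out_path)
    return emb
```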


naskk1 commented Oct 4, 2024

Thank you for your reply, but what is the input text template for the embeddings? I think it differs from the Q&A paradigm. I have tried a few templates, but they don't work well.

@ManuelSerna

Hi,

May I ask how one would obtain the .jsonl files, as well as the processed_data.pt file, for a new dataset?

Thank you for your time.

@ChenRunjin (Collaborator)

Hi, due to variations in the raw data formats across different datasets, we don't have a single unified function for generating processed_data.pt. To create processed_data.pt for a new dataset, you only need to build a Data instance in PyG format and make sure edge_index is included in it. Also make sure data.label_texts contains all label names, and data.raw_texts contains the node text features if you want to train on the node description task.
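A minimal sketch of what that looks like (the attribute names edge_index, raw_texts, and label_texts follow the description above; the optional data.y field and the function itself are assumptions for illustration):

```python
# Minimal sketch: build a PyG Data instance for a new dataset and save it as
# processed_data.pt. Only edge_index, raw_texts, and label_texts are described
# above; data.y is an assumed extra field for label supervision.
import torch
from torch_geometric.data import Data

def build_processed_data(edge_index, raw_texts, label_texts, labels=None,
                         out_path="processed_data.pt"):
    data = Data(edge_index=edge_index)   # long tensor of shape [2, num_edges]
    data.raw_texts = raw_texts           # list[str], one text per node
    data.label_texts = label_texts       # list[str], all label names
    if labels is not None:
        data.y = labels                  # per-node integer labels (assumed field)
    torch.save(data, out_path)
    return data
```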

In general, we follow the guidelines from this repo to generate the edge_index.

To create the *.jsonl files, the main task is to generate the node sequence that represents the structure surrounding each node, following the template used in the existing *.jsonl files. You can use our get_fix_shape_subgraph_sequence_fast function in utils/data_process.py to generate the node sequence; a rough sketch of the writing step is below.
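The record keys in this sketch are placeholders, not the repo's exact schema; match them to the fields in the released *.jsonl files, and obtain each node sequence from get_fix_shape_subgraph_sequence_fast:

```python
# Rough sketch: write one JSON record per node to a *.jsonl file. The key names
# ("id", "graph", "conversations") are assumptions -- mirror the released files.
import json

def write_jsonl(records, out_path="sampled_2_10_train.jsonl"):
    # records: iterable of dicts, e.g.
    #   {"id": node_id, "graph": node_sequence, "conversations": [...]}
    # where node_sequence comes from get_fix_shape_subgraph_sequence_fast.
    with open(out_path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```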


honey0219 commented Oct 16, 2024

After obtaining the processed_data.pt, sampled_2_10_test.jsonl, sampled_2_10_train.jsonl, and sampled_2_10_val.jsonl files for a new dataset, could you please let me know what additional steps I should take to run experiments in the "single focus" setting for the node classification task on the new dataset?
Thank you for your time!
