What does the clean_data function in preprocessing do? #30

Open
so-coolboy opened this issue Jun 28, 2021 · 2 comments

Comments

@so-coolboy

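Teacher Ma, what does the clean_data function in convert_raw_data.py under the preprocess folder do? Is it that some samples contain multiple trigger words, so each one has to be extracted separately, with the window set to 40 characters before and after the trigger word?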

@wjy3326

wjy3326 commented Nov 12, 2021

Same question. Did you ever find the answer?

@ItGirls

ItGirls commented Feb 18, 2022

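It is really just completing arguments. Some annotations are flawed and incomplete: they are missing the book-title marks (《》), or missing part of the content along with the marks. For example, one argument in the raw data is 互联网财险市场分析报告》, and clean_data can complete it to 《2014-2019年互联网财险市场分析报告》. Of course, this operation sometimes causes problems of its own, for example with 《山东省人民政府-中国科学院推进山东新旧动能转换重大工程合作协议》. Search for it in the data and you will see why.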
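
For anyone who wants to see the idea in code, here is a rough sketch of this kind of argument completion. It is not the repository's actual clean_data: the function name `complete_argument`, the `max_extend` parameter, and its default of 40 are all assumptions made for illustration. It only handles the case where an argument has an unbalanced 《 or 》 and the matching mark can be found nearby in the source text.

```python
def complete_argument(text: str, argument: str, max_extend: int = 40) -> str:
    """Extend a truncated argument to the nearest matching 《 or 》 in text.

    Hypothetical helper, not the project's clean_data implementation.
    max_extend is an assumed search window, measured in characters.
    """
    start = text.find(argument)  # uses the first occurrence only
    if start == -1:
        return argument  # argument not in text: leave it unchanged
    end = start + len(argument)

    # Closing mark without an opening one: look left for 《.
    if argument.count("》") > argument.count("《"):
        left = text.rfind("《", max(0, start - max_extend), start)
        if left != -1:
            start = left

    # Opening mark without a closing one: look right for 》.
    if argument.count("《") > argument.count("》"):
        right = text.find("》", end, end + max_extend)
        if right != -1:
            end = right + 1

    return text[start:end]


# The surrounding sentence here is made up; only the argument strings
# come from the example in this thread.
text = "根据《2014-2019年互联网财险市场分析报告》显示"
print(complete_argument(text, "互联网财险市场分析报告》"))
# 《2014-2019年互联网财险市场分析报告》
```

A sketch like this has obvious weak spots: it trusts the first occurrence of the argument in the text, and it takes the nearest mark within the window whether or not that mark really belongs to the argument, so it can produce wrong completions on some samples.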
