Spark NLP (Python) returns only the first date in a sentence #14091

Open
1 task done
andreyversh opened this issue Dec 13, 2023 · 0 comments

Is there an existing issue for this?

  • I have searched the existing issues and did not find a match.

Who can help?

No response

What are you working on?

    from pyspark.sql.types import StringType

    # `spark` is an active Spark NLP session. `pipeline` is the pipeline from the blog post
    # linked under "Steps To Reproduce" (not shown here); it yields `date` and `multi_date`.
    text_list = [
        "See you on next monday.",
        "She was born on 02/03/1966.",
        "The project started yesterday, will finish next year.",
        "She will graduate by July 2023.",
        "Raffles Medical Group Ltd wishes to announce results ended 30 June 2022 will be released in the morning of 1 August 2022 (Monday) .",
        "She will visit doctor tomorrow and next month again.",
    ]

    spark_df = spark.createDataFrame(text_list, StringType()).toDF("text")

    result = pipeline.fit(spark_df).transform(spark_df)
    result.selectExpr("text", "date.result as date", "multi_date.result as multi_date").show(truncate=False)

Current Behavior

+----------------------------------------------------------------------------------+-------------+--------------------+
|text                                                                              |date         |multi_date          |
+----------------------------------------------------------------------------------+-------------+--------------------+
|See you on next monday.                                                           |[2023/12 /18]|[12/18/23]          |
|She was born on 02/03/1966.                                                       |[1966/02 /03]|[02/03/66]          |
|The project started yesterday, will finish next year.                             |[2024/12 /13]|[12/13/24, 12/12/23]|
|She will graduate by July 2023.                                                   |[2023/07 /01]|[07/01/23]          |
|Raffles wishes ended 30 June 2022 and will be released 1 August 2022 (Monday) .   |[2022/06 /30]|[06/30/22]          |
|She will visit doctor tomorrow and next month again.                              |[2024/01 /13]|[01/13/24, 12/14/23]|
+----------------------------------------------------------------------------------+-------------+--------------------+

Expected Behavior

+----------------------------------------------------------------------------------+-------------+--------------------+
|text                                                                              |date         |multi_date          |
+----------------------------------------------------------------------------------+-------------+--------------------+
|See you on next monday.                                                           |[2023/12 /18]|[12/18/23]          |
|She was born on 02/03/1966.                                                       |[1966/02 /03]|[02/03/66]          |
|The project started yesterday, will finish next year.                             |[2024/12 /13]|[12/13/24, 12/12/23]|
|She will graduate by July 2023.                                                   |[2023/07 /01]|[07/01/23]          |
|Raffles wishes ended 30 June 2022 and will be released 1 August 2022 (Monday) .   |[2022/06 /30]|[06/30/22, 08/01/22]|
|She will visit doctor tomorrow and next month again.                              |[2024/01 /13]|[01/13/24, 12/14/23]|
+----------------------------------------------------------------------------------+-------------+--------------------+
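
To make the expected `multi_date` value for the Raffles sentence concrete, here is a minimal standard-library sketch that collects every explicit "day month year" date in a sentence and formats it as `MM/dd/yy`, like the other `multi_date` entries. It is not Spark NLP's implementation, and it does not handle relative expressions such as "tomorrow"; the names `DATE_PATTERN` and `find_all_dates` are hypothetical and exist only for this illustration.

```python
import re
from datetime import datetime

# Matches explicit "day month-name year" dates such as "30 June 2022".
# Hypothetical helper pattern for illustration; not part of Spark NLP.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2}) (January|February|March|April|May|June|July|August|"
    r"September|October|November|December) (\d{4})\b"
)


def find_all_dates(text, output_format="%m/%d/%y"):
    """Return every explicit 'D Month YYYY' date in text, in order of appearance."""
    dates = []
    for day, month, year in DATE_PATTERN.findall(text):
        # %B parses the full English month name under the default locale.
        parsed = datetime.strptime(f"{day} {month} {year}", "%d %B %Y")
        dates.append(parsed.strftime(output_format))
    return dates


sentence = (
    "Raffles Medical Group Ltd wishes to announce results ended 30 June 2022 "
    "will be released in the morning of 1 August 2022 (Monday) ."
)
print(find_all_dates(sentence))  # ['06/30/22', '08/01/22']
```

Both dates are returned here, which is what the report expects in `multi_date` for that row; the current output contains only the first.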

Steps To Reproduce

Please see the code in the "What are you working on?" section. It is the official example from
https://www.johnsnowlabs.com/extracting-exact-dates-from-natural-language-text/
with only one sentence added.

Spark NLP version and Apache Spark

5.2.0

Type of Spark Application

Python Application

Java Version

openjdk version "11.0.21" 2023-10-17
OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu123.10)
OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu123.10, mixed mode, sharing)

Java Home Directory

No response

Setup and installation

No response

Operating System and Version

Distributor ID: Ubuntu
Description: Ubuntu 23.10
Release: 23.10
Codename: mantic

Link to your project (if available)

No response

Additional Information

No response
