Lida Summarizer, data type convertion error #117

Open
Dejian0328 opened this issue May 7, 2024 · 2 comments

Comments

@Dejian0328

Is anyone else facing this issue?
I am trying to summarize a DataFrame and end up with a data type error.
Can you please advise on this?

df = pd.DataFrame.from_records(data, columns=columns)
data_summary = lida.summarize(df, summary_method="llm", textgen_config=textgen_config)

df:
   ContributionID  MemberID  EmployerID  ContributionMonth  EmployeeShare
0               1        27          15                May         883.43
1               2        44           2           December         626.79
2               3         1          17            January         732.94
3               4        28          15          September         149.57
4               5        49          15          September         616.06
5               6        45           8           February         154.46
6               7        41          16             August         941.70
7               8         2           3               July         707.85
8               9         2           8                May         186.81
9              10        22           7               June         558.11

   EmployerShare  TotalContribution  ContributionDate
0         536.68            1420.11        2021-05-13
1         368.82             995.61        2024-12-23
2         716.15            1449.09        2021-01-03
3         258.10             407.67        2022-09-27
4         519.45            1135.51        2022-09-09
5         840.50             994.96        2022-02-25
6         990.86            1932.56        2020-08-17
7         960.77            1668.62        2021-07-08
8         349.01             535.82        2021-05-16
9         585.05            1143.16        2022-06-30

error log:

\lida\components\manager.py:131, in Manager.summarize(self, data, file_name, n_samples, summary_method, textgen_config)
[128] data = read_dataframe(data)
[130] self.data = data
--> [131] return self.summarizer.summarize(
[132] data=self.data, text_gen=self.text_gen, file_name=file_name, n_samples=n_samples,
[133] summary_method=summary_method, textgen_config=textgen_config)

\lida\components\summarizer.py:130, in Summarizer.summarize(self, data, text_gen, file_name, n_samples, textgen_config, summary_method, encoding)
[128] # modified to include encoding
[129] data = read_dataframe(data, encoding=encoding)
--> [130] data_properties = self.get_column_properties(data, n_samples)
[132] # default single stage summary construction
...
File tslib.pyx:596, in pandas._libs.tslib.array_to_datetime()

File tslib.pyx:588, in pandas._libs.tslib.array_to_datetime()

TypeError: <class 'decimal.Decimal'> is not convertible to datetime, at position 0

@skyprince999

skyprince999 commented May 9, 2024

Can you share a copy of the data? Is it a TSV file?

Typically, while summarizing, the function uses pandas.to_datetime to try to convert column values to datetime objects. If a value is not in a format it can convert, it raises an error.
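
The failure mode described above can probably be reproduced without LIDA at all. The following is a minimal sketch, assuming the column reaches pandas.to_datetime as an object column of decimal.Decimal values; the helper name to_datetime_failure is made up for illustration, and the exact code path inside LIDA is not shown here.

```python
from decimal import Decimal

import pandas as pd


def to_datetime_failure(values):
    """Return (exception name, message) if pd.to_datetime rejects the values, else None.

    Hypothetical helper for illustration; it is not part of LIDA or pandas.
    """
    try:
        pd.to_datetime(pd.Series(values, dtype=object))
    except (TypeError, ValueError) as exc:
        return type(exc).__name__, str(exc)
    return None


# Two EmployeeShare values from the DataFrame in this issue, as Decimal objects.
result = to_datetime_failure([Decimal("883.43"), Decimal("626.79")])
print(result)
```

If this reports a failure similar to the traceback in the issue, the problem is the Decimal objects in the column rather than the file format.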

@Dejian0328
Author

I extract the data from an Azure SQL DB using a pyodbc cursor.
The conversion raises an error when the data is of the decimal data type. Once I manually convert those columns to float in the Azure DB, the summarize function works fine.

The error is raised when I do not exclude the EmployeeShare, EmployerShare and TotalContribution columns.
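
The same conversion can likely be done on the pandas side instead of in the database. This is a minimal sketch, assuming the decimal columns arrive as object columns holding decimal.Decimal values; decimals_to_float is a hypothetical helper name, and the sample rows reuse values from the DataFrame above.

```python
from decimal import Decimal

import pandas as pd


def decimals_to_float(df):
    """Return a copy of df with all-Decimal object columns cast to float.

    Hypothetical helper for illustration; it is not part of LIDA or pandas.
    """
    out = df.copy()
    for col in out.columns:
        if out[col].dtype != object:
            continue  # only object columns can hold Decimal instances
        non_null = out[col].dropna()
        if len(non_null) > 0 and non_null.map(lambda v: isinstance(v, Decimal)).all():
            out[col] = out[col].astype(float)
    return out


columns = ["ContributionID", "ContributionMonth", "EmployeeShare",
           "EmployerShare", "TotalContribution"]
data = [
    (1, "May", Decimal("883.43"), Decimal("536.68"), Decimal("1420.11")),
    (2, "December", Decimal("626.79"), Decimal("368.82"), Decimal("995.61")),
]
df = pd.DataFrame.from_records(data, columns=columns)

clean = decimals_to_float(df)
print(clean.dtypes)
# clean could then be passed to lida.summarize(...) in place of df.
```

Casting to float loses exact decimal precision, which is usually acceptable for a summary but may not be for downstream financial calculations.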
