Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicite default column width #131

Open
buhtz opened this issue Jun 28, 2022 · 4 comments
Open

Explicite default column width #131

buhtz opened this issue Jun 28, 2022 · 4 comments

Comments

@buhtz
Copy link
Contributor

buhtz commented Jun 28, 2022

Different from pandas.to_excel() it seems to me that StyelFrame.to_excel() does explicit set the width of each column based on its values. I am not sure here it is just a hypothesis.

Please see the full code at the end of that report.

The excel result looks like this
image

I assume that column "B" was set to a default minimum with because there is much space before and after the values.
I also assume that column "C" was set to the with of its longest value (33123).
I assume that the with of headers are ignored.

Now I do a "auto fit column width" in Excel (in German "Spaltenbreite automatisch anpassen"):
image

The index column A is the only that was fit to the with of its values/contents.
Column B and C does not change. That is why I assume there an explicit width.

The default Excel behavior would be to set the width that it fit to the content/values of a column:
image

Even you format that table as a "table with header" in Excel the auto-fit function create this
image

#!/usr/bin/env python3
import os
import sys
import pathlib
import pandas
import styleframe
import styleframe.utils

print(styleframe._versions_)

df = pandas.DataFrame(
    data={
        'idx1': list('AABB'),
        'column with long caption': [1234, 345, 33123, 2],
        }
    )
print(df)

file_path = pathlib.Path.cwd() / 'test.xlsx'

default_style = styleframe.Styler(font_size=14)
sf = styleframe.StyleFrame(df, styler_obj=default_style)

sf.to_excel(file_path, index=True).save()

os.system(str(file_path))

Changing this would break the default behaviour of your users. I am aware of this.
But maybe there could be switch or something to avoid the setting of an explicit width.
IMHO better default behaviour would be to auto-fit the width.

@DeepSpace2
Copy link
Owner

DeepSpace2 commented Jul 8, 2022

We do have best_fit option for to_excel, it just does not take the column names into account, just the column content. That is probably an unexpected behavior, at least with how the documentation is currently worded (the note says "... correct column-width based on the longest value in each provided column"):

Parameters
best_fit (None or str or list or tuple or set) –
single column, list, set or tuple of columns names to attempt to best fit the width for.

Note
best_fit will attempt to calculate the correct column-width based on the longest value in each provided column. However this isn’t guaranteed to work for all fonts (works best with monospaced fonts). The formula used to calculate a column’s width is equivalent to

(len(longest_value_in_column) + A_FACTOR) * P_FACTOR
The default values for A_FACTOR and P_FACTOR are 13 and 1.3 respectively, and can be modified before calling StyleFrame.to_excel by directly modifying StyleFrame.A_FACTOR and StyleFrame.P_FACTOR

Consider this:

from styleframe import StyleFrame

long_string = 'a' * 100
data_with_long_column_name = {long_string: ['a']}
data_with_long_content = {'a': [long_string]}
long_column_name_sf = StyleFrame(data_with_long_column_name)
long_content_sf = StyleFrame(data_with_long_content)

long_column_name_sf.to_excel('long_column_name_sf_without_best_fit.xlsx').save()
long_content_sf.to_excel('long_content_sf_without_best_fit.xlsx').save()
long_column_name_sf.to_excel('long_column_name_sf_with_best_fit.xlsx', best_fit=long_string).save()
long_content_sf.to_excel('long_content_sf_with_best_fit.xlsx', best_fit='a').save()

long_column_name_sf_without_best_fit.xlsx:
image

long_content_sf_without_best_fit.xlsx:
image

long_column_name_sf_with_best_fit.xlsx:
image

long_content_sf_with_best_fit.xlsx:
image

The fix should be fairly easy,

self.set_column_width_dict({column: (max(self.data_df[column].astype(str).str.len()) + self.A_FACTOR) * self.P_FACTOR
should be changed to

self.set_column_width_dict({column: (max(self.data_df[column].astype(str).str.len().max(), len(column)) + self.A_FACTOR) * self.P_FACTOR
                                        for column in best_fit})

image

compared to before that fix:

image

I'll gladly accept a PR for this, otherwise I'll fix that myself :)

@buhtz
Copy link
Contributor Author

buhtz commented Jul 10, 2022

I can not provide a PR for this because I don't understand the details here. I understand your examples. But I would say that they doesn't matter for my problem.
You mentioned best_fit. But I don't use this argument and its default is None. So IMHO StyleFrame` shouldn't touch the width options in that case.

@DeepSpace2
Copy link
Owner

StyleFrame shouldn't touch the width options in that case.

I disagree. Pandas' defaults are pretty useless. How is this useful in any way? You can't even tell on what column the headers is:

import pandas as pd

pd.DataFrame({'a' * 100: []}).to_excel('output.xlsx')

image

If the long content is in a row (ie not in the header) it looks a bit better, but still messy. I don't want StyleFrame's output to look this way, not even by default.

image

@DeepSpace2 DeepSpace2 self-assigned this Jul 10, 2022
@buhtz
Copy link
Contributor Author

buhtz commented Jul 12, 2022

I see your point and agree with you.

You say "Pandas' defaults are pretty useless". From the viewpoint of StyleFrame you are right.
But from Pandas viewpoint it is not. There are no "defaults" in the context of formating the Excel output with DataFrame.to_excel(). Pandas don't take care of the visual part. This is not a bug or missing feature. It is just by design because Pandas don't want to take care of visual part and formating.

That is where your StyleFrame comes in. It is its job.

My initial problem was

  • The "auto fit column width" in Excel (in German "Spaltenbreite automatisch anpassen") doesn't work anymore when I use StyleFrame. I assume this is because you manipulate the width of columns. But I don't understand all the details of the technical reasons here.

I understand that StyleFrame tries and have to set some defaults for column widths. That is fine.
But that defaults shouldn't "block" Excel default behavior ("auto fit column width").

I don't know how you manipulate the column width. But isn't it possible to manipulate the column width without making it a fixed width so that the "auto fit column width" feature doesn't work anymore?

EDIT:
There are some topics that should be thought of. I would like to have StyleFrame features like this

  • Fit column width to content and header.
  • Fit column width to header but not content. Break (line wrap?) the content if it is to long.
  • Fit column width to content but not header. Line wrap the header if needed.

StyleFrame default behavior? Maybe this should be the same as Excels "auto fit": Fit column width to content and header.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants