Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hexcodes for colors are strange/incorrect #89

Open
JohnHendrickx opened this issue Jul 12, 2023 · 4 comments
Open

Hexcodes for colors are strange/incorrect #89

JohnHendrickx opened this issue Jul 12, 2023 · 4 comments

Comments

@JohnHendrickx
Copy link

Thanks very much for your great work with tidyxl! I often have to deal with Excel files where key information is in the color coding and tidyxl finally solves that problem. I do have a question though about the hex strings for fill colors. I can identify cells based on coloring but the colors returned by xlsx_formats are incorrect (or I don't understand them).

I'm working with Excel files that are generated by an analysis apparatus. The Excel files use a grey background (#A6A6A6) for "unselected" and pink (#F2DCDB) for "assigned", where "unselected" values are to be ignored. However, xlsx_formats returns "FFFFFFFF" for "unselected" and "FFC0504D" for "assigned". I understand that the leading "FF" should be ignored and then the colors returned would be #FFFFFF (white) and #C0504D (reddish-brown).

Is there an explanation for these unexpected hex strings? As said, I can identify the "unselected" cells using the returned hex string but I'd like to understand why they're so different from what I expect.

@nacnudus
Copy link
Owner

Thank you for your kind remarks. Please could you attach an example spreadsheet with the two colours you mention?

@JohnHendrickx
Copy link
Author

JohnHendrickx commented Jul 12, 2023

test.xlsx
I used this Excel file for a test this morning, The values in row 1 are copy/pasted from the original Excel file, values in row 3 were created by me.

This is the R program I used. Strangely, I now get "FFED7D31" for pink.

# https://higgi13425.github.io/medical_r/posts/2021-01-13-extracting-highlighting-as-data-from-excel/

library(tidyxl)
library(readxl)
library(dplyr)
library(stringr)
library(tidyr)
library(here)

Excel_fn <- paste0(here(),"/test.xlsx")

# Extract formats as one large list
Group_Layout_formats <- xlsx_formats(Excel_fn)

# Extract the colours only from the list
Group_Layout_colours <- Group_Layout_formats$local$fill$patternFill$fgColor$rgb

These are the values I get now for Group_Layout_colours:

> Group_Layout_colours
[1] NA         "FF000000" "FFFFFFFF" "FFED7D31" "FFFFFFFF" "FFED7D31" "FF000000"

@nacnudus
Copy link
Owner

I think there are two things going on here.

The first is that it is possible (I think) to define the "office" colour theme elsewhere than in the Excel file itself. When you open the file in Excel, one of the definitions has to win. That might explain why FFC0504D became FFED7D31. In the file, the colour FFED7D31 is the "Office" theme colour called "accent 2".

The second is that there is a base RGB value, to which a tint (percentage) is applied. Tidyxl reports the RGB and the tint separately. I opened the file in Libreoffice (sadly I no longer have access to Excel), and it said the pink colour is FBE5D6.

I didn't implement tints in tidyxl because I never understood them, but it seems that it's straightforward to apply a tint to an RGB value and get an RGB value back. Note that, in the script below, I move the alpha channel from the beginning to the end, for compatibility with grDevices::col2rgb().

library(tidyxl)

filename <- "./test.xlsx"
f <- xlsx_formats(filename)
c <- xlsx_cells(filename)
i <- c[c$col==3, "local_format_id"]
rgb <- f$local$fill$patternFill$fgColor$rgb[i]
tint <- f$local$fill$patternFill$fgColor$tint[i]

rgb
# [1] "FFED7D31" "FFED7D31"

tint
# [1] 0.7999817 0.7999817

# https://rdrr.io/cran/MESS/src/R/colorfunctions.R
col.tint <- function(col, tint=.5) {
    if(missing(col))
        stop("a vector of colours is missing")
    if (tint<0 | tint>1)
        stop("shade must be between 0 and 1")
    mat <- t(col2rgb(col, alpha=TRUE)  +  c(rep(1-tint, 3), 0)*(255-col2rgb(col, alpha=TRUE)))
    rgb(mat, alpha=mat[,4], maxColorValue=255)
}

col.tint("#ED7D31FF", tint = 1 - 0.7999817)
#FBE4D5, which is very nearly FBE6D6 as in libreoffice

col.tint("#C0504DFF", tint = 1 - 0.7999817)
#F2DBDB, which is very nearly F2DCDB as expected

Does Excel show the untinted RGB value anywhere? If not, tidyxl ought to automatically apply the tint to the base RGB value and give a final RGB value. I'm afraid I don't have time to work on tidyxl any more, but I'd look at a pull request. It would have to be exact, rather than approximate as above.

@JohnHendrickx
Copy link
Author

If I hover over a color in the "Fill Color" menu, I see "Orange, Accent 2, Lighter 80%"
image

For grey, the popup text is "White, Background 1, Darker 25%"

So this does seem to be related to the "Theme Colors" and the tints applied. I'll experiment a bit with the col.tint and see if that clarifies things further. Thanks for your time and help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants