-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load file xlsx is very slow #3917
Comments
I have an XLSX that is 1.4MB and my code just hangs when I use the same method, doesn't even finish. |
@tvillafane please use https://github.com/aVadim483/fast-excel-reader is better faster |
Same problem. |
相同问题,2.2mb的文件,执行IOFactory::load()竟然需要四五十秒,有优化的方案吗? |
specialize $reader->setReadDataOnly(true) if you only read the file. $reader = \PhpOffice\PhpSpreadsheet\IOFactory::createReader(\PhpOffice\PhpSpreadsheet\IOFactory::identify($filename));
$reader->setReadDataOnly(true);
$spreadsheet = $reader->load($filename); |
My issue appeared for >2.1.0 versions. 2.1.0 – 0.29 sec I started to delete sheets from the workbook and left for you to judge the only one. The opening in 2.2.2 takes 2.71 sec, in 2.1.0 takes 0.08 sec. |
@DanielRuf Thank you for the information. Yes, it appears that might indeed be the cause. I'm flabbergasted. Our unit test suite reads tons of Xlsx files, and there didn't seem to be any significant change in run-time after that PR. I will see if I can improve it. |
Thanks for looking into this. My composer which upgraded to this version also indicated that version 2.2.2 seems to have caused a problem. |
@henno @tvillafane @nguyenthao1988 can you provide test files regarding your exact issue? Then I can check that. The commit identified by git My test setup:
<?php
namespace PhpOffice;
require 'vendor/autoload.php';
$filename = 'ru.xlsx';
$spreadsheet = \PhpOffice\PhpSpreadsheet\IOFactory::load($filename); test command: test steps: git checkout master
git bisect start
git bisect bad
git bisect good 2.1.0
time php test.php
# when result of "time php test.php" is high, run "git bisect bad"
git bisect bad
# when result of "time php test.php" is low, run "git bisect good"
git bisect good
# test again after each "git bisect good" / "git bisect bad"
time php test.php
# repeat "git bisect bad" / "git bisect good" + "time php test.php" until you see "... is the first bad commit" |
Please test against PR #4153 if possible. That should help a lot for the issue reported by bisect. |
Performance-wise it looks good after applying the changes from the linked PR: wget https://github.com/PHPOffice/PhpSpreadsheet/pull/4153.patch
git apply -v --index 4153.patch
# Checking patch src/PhpSpreadsheet/Worksheet/Worksheet.php...
# Hunk #1 succeeded at 3730 (offset 6 lines).
# Checking patch src/PhpSpreadsheet/Cell/Cell.php...
# Checking patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php...
# Checking patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php...
# Applied patch src/PhpSpreadsheet/Worksheet/Worksheet.php cleanly.
# Applied patch src/PhpSpreadsheet/Cell/Cell.php cleanly.
# Applied patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php cleanly.
# Applied patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php cleanly.
time php8.2 test.php
# real 0m0.303s
# user 0m0.126s
# sys 0m0.055s |
@DanielRuf Thank you for confirming. I will wait a few days before merging the PR in case somebody else supplies a spreadsheet that needs to be checked. |
@oleibman I probably have another spreadsheet which results in very long execution times, if no one else provides one. But it would be better if the other participants of this issue can provide a corresponding file. |
@oleibman , here is the file I used which was causing long delays. |
@sharat-rhinofish on 2.1.0 I get these times with the
On 2.2.0 and master I had to cancel:
Wit the patch from the PR I get these times (which is better but not yet ideal):
On 1.29.0 I get these times:
So there seem to be 2 changes affecting the performance: 1.29.0 => 2.1.0 an 2.1.0 => 2.2.0 and the latter is only partly improved by the PR. The 1.29.0 => 2.1.0 performance change seems to come from this: Commit: 49a04bd And the other performance change came from #3917 (comment), which is a bit improved by the PR but still has ~3 times the execution time compared to previous versions. |
@sharat-rhinofish Something interesting happens when I open your file in Excel. |
@DanielRuf I don't really see any potential performance problem in the commit you've newly identified. Not having a good sense of how you're arriving at your results, can you tell if there might have been a difference due to PR #3497? |
@DanielRuf @oleibman , thanks for sharing the findings. Interesting. FYI, this excel file comes from our vendors who are sending back order information. It is likely that they have unoptimized excel data ravaged by use over a period of time. Ideally, I want to be able to use it directly without having to necessarily optimize it. If we can get to 2.1.0 timings, I would be happy. With PR, the times are still 2.5 times higher than version 2.1.0, which is a little bit of a concern. |
With
Without
So the following factors seem to contribute to the performance changes:
We did not yet check our affected excel file regarding #3917 (comment). @oleibman in our case we moved from 1.26 to 1.27 due to Structured References in some excel file, which threw some error during processing. With that we also had much slower processing, but this is probably a different issue. 1.29.0 times for
1.27.0 times for
1.26.0 times for
So to me it seems PR #3497 had no effect on |
FWIW, I see no significant time difference between 2.1.0 and the new PR. I do see a big difference between 2.2.0 and 2.1.0, and between 1.29.0 and 2.1.0. |
@DanielRuf I found a small optimization I could make. Can you please re-test against the new PR. |
I've skipped one step during |
Without the latest PR changes: time php8.2 test.php
# real 0m57.377s
# user 0m56.828s
# sys 0m0.201s
# with \PhpOffice\PhpSpreadsheet\Reader\IReader::IGNORE_EMPTY_CELLS
time php8.2 test.php
# real 0m2.106s
# user 0m1.942s
# sys 0m0.144s
# with \PhpOffice\PhpSpreadsheet\Reader\IReader::READ_DATA_ONLY
time php8.2 test.php
# real 0m25.482s
# user 0m24.983s
# sys 0m0.260s
# with \PhpOffice\PhpSpreadsheet\Reader\IReader::IGNORE_EMPTY_CELLS | \PhpOffice\PhpSpreadsheet\Reader\IReader::READ_DATA_ONLY
time php8.2 test.php
# real 0m1.621s
# user 0m1.404s
# sys 0m0.160s With the latest PR changes: wget https://github.com/PHPOffice/PhpSpreadsheet/pull/4153.patch
git apply -v --index 4153.patch
# Checking patch src/PhpSpreadsheet/Worksheet/Worksheet.php...
# Hunk #1 succeeded at 3730 (offset 6 lines).
# Checking patch src/PhpSpreadsheet/Cell/Cell.php...
# Checking patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php...
# Checking patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php...
# Checking patch src/PhpSpreadsheet/Cell/Cell.php...
# Checking patch src/PhpSpreadsheet/Reader/Xlsx.php...
# Hunk #1 succeeded at 950 (offset 3 lines).
# Applied patch src/PhpSpreadsheet/Worksheet/Worksheet.php cleanly.
# Applied patch src/PhpSpreadsheet/Cell/Cell.php cleanly.
# Applied patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php cleanly.
# Applied patch tests/PhpSpreadsheetTests/Worksheet/ApplyStylesTest.php cleanly.
# Applied patch src/PhpSpreadsheet/Cell/Cell.php cleanly.
# Applied patch src/PhpSpreadsheet/Reader/Xlsx.php cleanly.
time php8.2 test.php
# real 0m9.529s
# user 0m9.264s
# sys 0m0.184s
# with \PhpOffice\PhpSpreadsheet\Reader\IReader::IGNORE_EMPTY_CELLS
time php8.2 test.php
# real 0m1.306s
# user 0m1.137s
# sys 0m0.168s
# with \PhpOffice\PhpSpreadsheet\Reader\IReader::READ_DATA_ONLY
time php8.2 test.php
# real 0m8.441s
# user 0m8.124s
# sys 0m0.216s
# with \PhpOffice\PhpSpreadsheet\Reader\IReader::IGNORE_EMPTY_CELLS | \PhpOffice\PhpSpreadsheet\Reader\IReader::READ_DATA_ONLY
time php8.2 test.php
# real 0m1.233s
# user 0m1.105s
# sys 0m0.125s If anyone needs the patched version (master + #4153): src-patched-4153-202408311248.zip |
@sharat-rhinofish Your Excel file has tons of empty cells defined in rows 56 through 76. This is probably what Excel was complaining about above. You can boost your performance by a huge amount through the use of IGNORE_EMPTY_CELLS: $spreadsheet = IOFactory::load($path, IReader::IGNORE_EMPTY_CELLS);
// or
$reader = new Reader\Xlsx();
$reader->setReadEmptyCells(false);
$spreadsheet = $reader->load($path); And now I really am going to bed. |
@oleibman , Thanks. This helps and does speed things up. Thanks for the useful tip. I have added this to and am back on version 2.2.2 |
Based on what Excel did with that spreadsheet, it could make sense for PhpSpreadsheet Xlsx Reader to ignore all cells with only attribute |
Oh well, no need to consider it for future after all. That change made it through our test suite with only one failure, but that failure unfortunately would also apply to the "bad" spreadsheet. (Row specifies a style, cell does not - cell should therefore get default style rather than row style; ironically, the row style is not applied to any cell in the row in the spreadsheet, but determining that will give back all our performance gains.) We will just have to rely on user specifying "ignore empty" when appropriate. |
@oleibman will this be available with 2.2.3 and is there some ETA? Also we need a new release of PhpPresentation: PHPOffice/PHPPresentation#812 So maybe there is some coordination needed between both projects (not sure about that). |
I hope to release 2.3.0 (without dynamic tables) and 3.0.0 (with) within the next few weeks. This change will be part of both. |
`<?php
require 'vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\IOFactory;
$filename = 'data/danhsach.xlsx';
//$filename = 'data/nhanvien.xlsx';
$spreadsheet = IOFactory::load($filename);`
I use this code load file "danhsach.xlsx" with 16k record, file size excel: 7MB.
+> this code load finish taken time: 32s
how to optimize this, i load full column and row, not limit.
I used:
PHP: 8.2.4
PhpSpreadsheet: 2.0
Thanks bro,
The text was updated successfully, but these errors were encountered: