Ingest-OptimizationCSVExportsToLogAnalytics Runbook with "consumptionexports" parameter failing consistently. #998
Comments
@davemilnercapg, thanks for reporting this issue. Can you please share what the "Exception" tab says for these failing jobs?
Error on the Exception tab: Thread failed to start. (Thread failed to start. (Exception of type 'System.OutOfMemoryException' was thrown.))
It's unusual to see such errors in this runbook. Are you using the latest version of AOE? Can you share the latest message written in the job output, and whether you see the same last message in every job failure?
We are using the latest version of AOE; it was upgraded recently. However, the issue with this runbook started prior to the upgrade and remained after it. The latest message written in the job output varies, depending on the last record processed before the out-of-memory error. Here are three examples of the last message written on three separate runs:
I see... There was probably some other issue a few months ago and AOE is now lagging behind, trying to process very old CSVs. As you can see by the dates, it is trying to process blobs from June! To work around this problem, if you don't mind, please identify the date/time of the last blob that was successfully processed. You can normally find this in the 5th line of the job output, something like: Processing blobs modified after <date/time> and ingesting them into the AzureOptimizationConsumptionV1_CL table...
Then open the AOE Storage Account, use the Storage Browser, navigate to the consumption exports container, and delete the blobs older than that date/time. After that, test the runbook again by starting it and passing the consumptionexports parameter.
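For reference, the same purge can be scripted instead of done through the Storage Browser. This is a minimal sketch only, assuming the Az.Storage module, a container named consumptionexports, and a cutoff date taken from the last successful job run; the account and container names are placeholders, not values confirmed in this thread.

```powershell
# Sketch: bulk-delete consumption export blobs older than the last successfully
# processed date. Storage account name, container name, and cutoff are assumptions.
$cutoff = [DateTimeOffset]::Parse('2024-06-30T00:00:00Z')
$ctx = New-AzStorageContext -StorageAccountName '<aoe-storage-account>' -UseConnectedAccount

Get-AzStorageBlob -Container 'consumptionexports' -Context $ctx |
    Where-Object { $_.LastModified -lt $cutoff } |
    ForEach-Object {
        Write-Output "Deleting $($_.Name) (modified $($_.LastModified))"
        Remove-AzStorageBlob -Blob $_.Name -Container 'consumptionexports' -Context $ctx
    }
```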
Thx, Helder. I started out with about 27k objects in that container. After purging 10k, down to 17k objects, I ran it again and it ran out of memory. Purge 2: down to 12k objects; currently running. For the long term, how should we manage the growth in these storage accounts? Is there a scheduled cleanup task in AOE, or do we need to write one? Is it possible to use forced garbage collection in the runbooks to avoid running out of memory, or should we manage large blob stores another way?
These Azure worker agents are allocated 400 MB of memory.
27k objects is really a huge number for the consumption exports. Can you confirm whether the AOE storage account has the blob lifecycle management rule in place? If it doesn't, please create one such rule and configure it to delete blobs at least 30 days after being modified. If it does, it means you have a large number of subscriptions in your environment, and you should maybe export consumption data at the EA/MCA level (which generates one large blob per day) instead of doing it per subscription (potentially hundreds of blobs per day). Can you confirm the number of subscriptions monitored by AOE and whether you are under an EA or an MCA?
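If the rule is missing, it can be created in the portal or scripted. Below is a minimal sketch using Az.Storage; the resource group, storage account name, and blob prefix are placeholders and not confirmed in this thread.

```powershell
# Sketch: lifecycle management rule that deletes export blobs 30+ days after
# their last modification. Adjust names and prefix to your environment.
$action = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction Delete `
    -DaysAfterModificationGreaterThan 30
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch 'consumptionexports' `
    -BlobType blockBlob
$rule = New-AzStorageAccountManagementPolicyRule -Name 'delete-old-exports' `
    -Action $action -Filter $filter
Set-AzStorageAccountManagementPolicy -ResourceGroupName '<aoe-rg>' `
    -StorageAccountName '<aoe-storage-account>' -Rule $rule
```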
To change the consumption export scope from subscription to EA billing account, you must do the following:
I guess that 2. and 3. are already done, because you said earlier that the other runbooks were running without issues. If all the Reservations and Savings Plans workbooks are loading correctly, then 2. and 3. are certainly done. After you complete the steps above, the next job should produce a single consumption export file for the whole EA. I hope this helps!
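For illustration only (not steps quoted from the thread): changing the export scope typically amounts to updating an Automation Account variable. The variable name below is an assumption and may differ in your AOE version; verify it in the Automation Account before changing anything.

```powershell
# Hypothetical example: switch the consumption export scope to the billing account.
# The variable name 'AzureOptimization_ConsumptionScope' is an assumption.
Set-AzAutomationVariable -ResourceGroupName '<aoe-rg>' `
    -AutomationAccountName '<aoe-automation-account>' `
    -Name 'AzureOptimization_ConsumptionScope' `
    -Value 'BillingAccount' -Encrypted $false
```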
Thx Helder! We set #1 and confirmed the other elevated setup. Running to obtain the larger single consumption export file. We also anticipate an out of memory exception and are in process of migrating this to a hybrid worker agent. That should get us covered - will update with results. |
OK, results after those modified settings: we are getting the following error: Exception calling "GetBytes" with "1" argument(s): "Array cannot be null. Parameter name: chars" (Exception calling "GetBytes" with "1" argument(s): "Array cannot be null. Parameter name: chars" (Array cannot be null. Parameter name: chars)). The exception seems to be happening on line 257 due to a null input. The last output in the logs is: Found 7074 new blobs to process... About to process 2024-08-07-e904450b-efb3-4723-add4-48176d3c50eb-1.csv... Items:
Please let me know if you have any insights.
The changes you made are influencing the outcome of the runbook. The log messages you are reporting are a symptom that old CSVs are still being processed.
Thanks for the additional details. The version of the runbook you are running seems to be outdated; please update it to the latest version. As an additional check, can you add the following instruction in line 220 of the runbook?
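The exact instruction is not reproduced above, but based on the later comments (the line-220 addition ends up filtering out zero-length blobs), a guard along these lines would have the described effect. This is an assumed sketch; the variable names are guesses, not code from the runbook.

```powershell
# Assumed sketch of a zero-length blob guard (not the actual instruction from the
# thread): skip blobs with no content before attempting to parse and ingest them.
if ($blob.Length -eq 0) {
    Write-Output "Skipping zero-length blob $($blob.Name)..."
    continue
}
```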
I updated the version of that runbook, thanks. I also added the additional check to line 220. I have also approached this from a different angle and created a new maintenance runbook, CleanUp-ZeroLengthBlobs, which tackles the problem from that direction. That also seems to be working...
Latest update: after the addition to line 220 and the script update, the script chugged through all of the older outstanding small files successfully and brought the last-updated date up to 9/22/2024. After setting the consumption scope to BillingAccount, the nightly export produced a 135 MB CSV file with 100,000 lines. The nightly processing for that file failed; logs: Processing blobs modified after 2024-09-23T13:05:57.000Z (line 83999) and ingesting them into the AzureOptimizationConsumptionV1_CL table... 2024-09-22-71446200-AmortizedCost-1-final.csv found (modified on 2024-09-25T12:17:49.000Z) Found 1 new blobs to process... About to process 2024-09-22-71446200-AmortizedCost-1-final.csv... From there I get an error message: Exception calling "GetBytes" with "1" argument(s): "Array cannot be null. Parameter name: chars" (Exception calling "GetBytes" with "1" argument(s): "Array cannot be null. Parameter name: chars" (Array cannot be null. Parameter name: chars)). So it looks to me like the addition to line 220 filters out all the small zero-byte files, but the zero-value lines within the bigger files are still causing problems. Troubleshooting...
Seemingly, log ingestion failed between line 83999 and line 89999. Can you find anything odd in those lines? Additionally, the only reason I can think of for GetBytes to receive a null argument is the CSV-to-JSON conversion producing nothing for those lines.
I am not seeing anything odd in those lines. I am seeing $jsonObject come through as $null, though. Still troubleshooting...
It seems you didn't change the essentials of the algorithm; you just added a couple more checks to ensure a correct, non-null payload is sent for ingestion. Now comes the moment of truth :-) All this effort is relevant only if the consumption-related workbooks load correctly. Can you confirm?
It is looking good now! If you want to backfill consumption data for older dates, you just have to trigger the consumption export runbook manually for the missing dates. Be careful, however, with the amount of data exported in each job run. You said earlier that a single day generates 135 MB of data, so it is probably better to export no more than 3-4 consecutive days per job. Also, if you want to keep more than 30 days of historical data, check the Log Analytics workspace retention, which is 30 days by default (free retention).
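For illustration, a backfill could be triggered from PowerShell along the lines below. The runbook name and the date parameter names are assumptions, not values confirmed in this thread; check the actual parameters of the consumption export runbook in your Automation Account first.

```powershell
# Hypothetical backfill call: export a small range of historical days per job.
# 'Export-ConsumptionToBlobStorage', 'TargetStartDate', and 'TargetEndDate' are guesses.
Start-AzAutomationRunbook -ResourceGroupName '<aoe-rg>' `
    -AutomationAccountName '<aoe-automation-account>' `
    -Name 'Export-ConsumptionToBlobStorage' `
    -Parameters @{ TargetStartDate = '2024-09-10'; TargetEndDate = '2024-09-13' }
```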
@davemilnercapg, can you confirm whether the issue is definitely resolved? Thanks.
@helderpinto - I can confirm that, with the changes I made and posted above, I have seen this script run successfully multiple days in a row. The run from last night processed a single 117k-row file successfully, without errors. Without those changes, as the script currently stands, it fails consistently with null references passed to GetBytes. IMO the root cause, both in EA mode (one large file) and with 27k smaller files, is that many subscriptions do not have current values to output, so a file, or a line, gets created with zero consumption values. These are not filtered out, so they cause either a null reference error or, for larger environments running at the billing account level with EA-level data coming in, an out-of-memory error. No, it's not resolved without the changes I specified; with them it is. To resolve this properly, incorporate those changes and test in a medium-to-large subscription environment.
The specific types of small accounts I see producing zero-value exports are MSDN subscriptions, VSPE subscriptions (which will likely be present in most customer tenants), and some other small ones.
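To make the failure mode concrete, here is a sketch of the kind of guard described above. It is not the code from the runbook or from the eventual fix; the variable names ($csvChunk, $jsonPayload) are assumptions used only to illustrate skipping empty input before the GetBytes call.

```powershell
# Sketch: skip empty CSV chunks before converting them to a byte array for
# Log Analytics ingestion, avoiding a null argument to GetBytes.
if ($null -eq $csvChunk -or $csvChunk.Count -eq 0) {
    Write-Output "Skipping empty chunk..."
}
else {
    $jsonPayload = $csvChunk | ConvertTo-Json
    $body = [System.Text.Encoding]::UTF8.GetBytes($jsonPayload)
    # ... send $body to the Log Analytics ingestion API ...
}
```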
Thanks for the feedback, @davemilnercapg. Let's keep this bug open; it will be closed once the suggested changes are incorporated. On a side note, MSDN subscriptions and the like do not appear in the single, EA-level file, because they are not part of the agreement, so the zero-byte lines must have a different cause. Nevertheless, let's simply discard those rare situations in the runbook code.
Hi @davemilnercapg, there is a PR (#1048) that will fix this issue. If you want to try out the fix before the release, make sure you are running AOE on the latest version (September release) and then update the Ingest-OptimizationCSVExportsToLogAnalytics runbook with the code from that PR. Please make sure you back up your current code, so that you're able to roll back in case the proposed fix isn't effective. If you are not on the latest AOE release, check the upgrade instructions.
🐛 Problem
During the daily scheduled run of the Ingest-OptimizationCSVExportsToLogAnalytics runbook, the runbook completes successfully for many other parameters but always fails on the "consumptionexports" parameter task.
This results in all of the recommendations having blank Cost components.
👣 Repro steps
Deploy Azure Optimization Engine to an Azure subscription and set it up to monitor many subscriptions; over time, the runbook starts failing.
🤔 Expected
Ingest-OptimizationCSVExportsToLogAnalytics completes successfully for all parameters and storage containers.
📷 Screenshots
ℹ️ Additional context
It seems to be failing while processing different blobs, not always the same one. Could this be related to an out-of-memory issue?
🙋♀️ Ask for the community
We could use your help: