Using this package to run tests with github actions is buggy #40

Closed
joewagner opened this issue Jun 6, 2023 · 11 comments · May be fixed by #30
Labels: bug (Something isn't working), linear (Sync issue with linear)

@joewagner
Contributor

What happens

Running the tests on a developer machine always works as expected, but when they run as part of CI via GitHub Actions they fail about half the time.

How to reproduce

For an example, see: https://github.com/tablelandnetwork/jeti/actions/runs/5191892282/jobs/9360322039
You will see that the network logs are turned on and the tests start running before the Validator has created the healthbot table. Some of the tests pass in this case, but the tests that create tables end up failing because the Validator's transaction receipt polling is aborted once the polling timeout is reached.
To explore why this is happening, I had the test setup wait a full 60 seconds after the local-tableland network signaled that it was ready; the same intermittent failures still occurred.
I have a few theories about why this is happening:

  • The Validator process is failing before the Node.js process error listener has been attached correctly.
  • The Validator startup process is still in progress when the local-tableland parent process signals that the network is ready.
  • The Validator is starting, but is being overwhelmed by polling requests and smart contract events.

What is expected

This package should work correctly when used for CI via GitHub Actions.

Additional context

The parent process signals that the network is ready by inspecting the Validator process's stdout and waiting for a specific message. That message is currently the string "processing height". This is brittle at best, and we should consider having the Validator log a specific message when it considers itself fully online.
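
A rough sketch of that stdout-watching approach, for illustration only (the command, arguments, and helper name are hypothetical, not the actual local-tableland implementation):

// readiness-sketch.ts -- illustrative only
import { spawn } from "node:child_process";

// Resolve once the child process prints a given marker string on stdout.
// Note: a naive includes() check can miss a marker that is split across
// stdout chunks, which is part of why string matching is brittle.
function waitForStdoutMarker(
  cmd: string,
  args: string[],
  marker: string
): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    child.stdout?.on("data", (chunk: Buffer) => {
      if (chunk.toString().includes(marker)) resolve();
    });
    child.on("error", reject);
    child.on("exit", (code) => {
      if (code !== 0) reject(new Error(`process exited with code ${code}`));
    });
  });
}

// Hypothetical usage: treat "processing height" as the readiness signal.
// await waitForStdoutMarker("tableland-validator", ["--chain", "local"], "processing height");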

@joewagner
Contributor Author

After some debugging, it seems this problem is caused by an unknown issue that prevents the registry contract from being deployed.

@joewagner
Contributor Author

@dtbuchholz unfortunately it looks like tablelandnetwork/local-tableland#433 did not fix this issue. It still looks like the deploy process is silently crashing, or never starting.
Here's a CI run with logging: https://github.com/tablelandnetwork/js-validator/actions/runs/5684229078/job/15406459448

@dtbuchholz
Contributor

@joewagner dang...this is frustrating! I can try tweaking some things next week and maybe configure a subset of separate tests to work with act for local debugging (last time I tried it, Docker wasn't working properly with the tests).

A couple of things came up through some ChatGPT convos. First, maybe the cache is out of date or corrupted? Since things start properly, I doubt this is the root cause, but figured I'd mention it. We could temporarily try flushing it by adding a version suffix to the cache key, like:

key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}-v2

The other point was on concurrency. Could there be an issue with shared resources? I'd also not expect this to be an issue since it's just a single validator running. But it might help if, for example, there were a way to catch the actual validator error that happens when you run two LT instances with the same config on different registry ports.

Last thought: when I was debugging, I threw in a shelljs echo command with lsof -i tcp:8545, during registry and validator process shutdown IIRC. Most of the time there were 3 PIDs logged, but every once in a while I'd unpredictably see 10-15. I don't recall any impact, though, so just sharing the observation.
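
A minimal sketch of that kind of port check, assuming shelljs is installed (illustrative only, not the exact command I used):

// port-check-sketch.ts -- illustrative only
import shell from "shelljs";

// Log which processes currently hold TCP port 8545 (hardhat's default).
const result = shell.exec("lsof -i tcp:8545", { silent: true });
shell.echo(result.stdout);

// Count open handles (lsof prints one header line, then one line per handle).
const lines = result.stdout.trim().split("\n");
shell.echo(`open handles on :8545: ${Math.max(lines.length - 1, 0)}`);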

Other than that, I wonder if there's something unknown going on with hardhat. A last-ditch effort could be to replace it with forge, which I've only heard great things about for DX.

@joewagner
Contributor Author

joewagner commented Jul 28, 2023

Definitely all good thoughts.
Just to reiterate what I've found in debugging:
The symptom is specifically the deploy process, i.e. npx hardhat run scripts/deploy.ts. When a GitHub Actions failure occurs, the deploy process (which is a synchronous child_process) logs the following:

[Contract Deploy] 
[Contract Deploy] Downloading compiler 0.8.19
[Contract Deploy] 

Then the process exits without an error, which is obviously not super helpful. The hardhat node starts without any issue, and the Validator starts and connects to hardhat. But since the contract doesn't exist, the Validator can't be of much use, and the tests fail (mostly with polling timeouts).

The main takeaway is that since hardhat is starting and the Validator connects to it, I don't think the issue has to do with the port already being in use. It could still be related, but it doesn't seem to be the problem in my tests.
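
As a rough illustration of surfacing a silent deploy failure (not the actual test setup; the script path and log prefix are taken from the log above), a synchronous spawn could check the exit status explicitly:

// deploy-check-sketch.ts -- illustrative only
import { spawnSync } from "node:child_process";

// Run the deploy script synchronously and surface its exit status, so a
// crash that produces no output is not silently swallowed.
const result = spawnSync("npx", ["hardhat", "run", "scripts/deploy.ts"], {
  encoding: "utf8",
});

console.log("[Contract Deploy]", result.stdout);
if (result.error || result.status !== 0) {
  console.error("[Contract Deploy] stderr:", result.stderr);
  throw new Error(`deploy exited with status ${result.status}`);
}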

@dtbuchholz
Contributor

@joewagner would CircleCI help with debugging this? I have some free credits that I can get if it's relevant.

@joewagner
Contributor Author

joewagner commented Aug 11, 2023

@dtbuchholz Maybe!? That definitely looks promising. I found an Action that starts an SSH server on the machine running the workflows; it seems like it would be helpful too: https://github.com/marketplace/actions/debugging-with-ssh
I used it to fix an unrelated issue with the monorepo.

@dtbuchholz
Contributor

Hmm, interesting. Kk, I'll get the credits and check that action out, too.

@joewagner
Contributor Author

I was able to get more information on the registry deploy process error.
After adding #22, the following is now being logged:

Couldn't download compiler version 0.8.19+commit.7dd6d404: Checksum verification failed.
@tableland/sdk: Please check your internet connection and try again.
@tableland/sdk: If this error persists, run "npx hardhat clean --global".
@tableland/sdk: HardhatError: HH503: Couldn't download compiler version 0.8.19+commit.7dd6d404: Checksum verification failed.

@dtbuchholz
Contributor

@joewagner no way! If the issue is due to proxying, we could try one of these out (via here):

// hardhat.config.ts
const { ProxyAgent, setGlobalDispatcher } = require("undici");
const proxyAgent = new ProxyAgent('http://127.0.0.1:7890'); // change to yours
setGlobalDispatcher(proxyAgent);

or via env vars:

export HTTP_PROXY=<username>:<password>@<ip_address>:<ip_port>
export HTTPS_PROXY=<username>:<password>@<ip_address>:<ip_port>

@joewagner
Contributor Author

if the issue is due to proxying, we could try one of these out (via here):

It looks like the error is HardhatError: HH503, which is described here. The suggested solution is to run hardhat clean --global. I'll try that in a PR; unfortunately, we can't really be sure it will fix anything since the failure is intermittent...
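
A sketch of the kind of workaround that could go in such a PR, purely illustrative and under the assumption that retrying plus cleaning the global compiler cache is acceptable (it would mask the root cause rather than fix it):

// deploy-retry-sketch.ts -- illustrative only; a workaround, not a fix
import { spawnSync } from "node:child_process";

function run(cmd: string, args: string[]): number {
  const { status } = spawnSync(cmd, args, { stdio: "inherit" });
  return status ?? 1;
}

// Retry the deploy a few times, clearing hardhat's global compiler cache
// between attempts, as the HH503 error message suggests.
const maxAttempts = 3;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
  if (run("npx", ["hardhat", "run", "scripts/deploy.ts"]) === 0) break;
  if (attempt === maxAttempts) {
    throw new Error("contract deploy failed after retries");
  }
  console.warn(`deploy attempt ${attempt} failed; cleaning compiler cache and retrying`);
  run("npx", ["hardhat", "clean", "--global"]);
}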

@dtbuchholz
Contributor

(moving to #96 for Linear syncing purposes)

@dtbuchholz closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 21, 2023