ananse network: distributed.core - ERROR - Exception while handling op shuffle_receive #216

Open
StevenBai97 opened this issue Jul 18, 2024 · 0 comments


Hi,
I got an error when running ananse network (v0.5.1) for non-model organisms.
ananse network -n 1 Cni_G_binding/binding.h5 -e G.mean.tpm -o Cni_G.ANANSE_network.tsv -g Cni.v2.assembly.chr.fasta -a Cni.v2.longest.gene.bed --include-promoter --include-enhancer

I have tried running it many times with different thread counts (1, 4, 12, 48), but every run resulted in the same error:
`2024-07-18 19:08:50 | INFO | Loading expression data
2024-07-18 19:08:50 | INFO | 100% of TFs found in both BED and expression file(s)
2024-07-18 19:08:51 | DEBUG | Loading tf binding activity data
2024-07-18 19:08:51 | INFO | Loading binding data
2024-07-18 19:08:51 | INFO | Using all 268 TFs.
2024-07-18 19:08:51 | INFO | Using all 184688 regions.

Aggregating: 0%| | 0/20 [00:00<?, ?contig/s]
Aggregating on Chr4. Overall progress: 0%| | 0/20 [00:00<?, ?contig/s]
Aggregating on Chr4. Overall progress: 5%|▌ | 1/20 [00:07<02:15, 7.15s/contig]
Aggregating on Chr2. Overall progress: 5%|▌ | 1/20 [00:07<02:15, 7.15s/contig]
Aggregating on Chr2. Overall progress: 10%|█ | 2/20 [00:13<01:56, 6.49s/contig]
Aggregating on Chr10. Overall progress: 10%|█ | 2/20 [00:13<01:56, 6.49s/contig]
Aggregating on Chr10. Overall progress: 15%|█▌ | 3/20 [00:18<01:39, 5.84s/contig]
Aggregating on Chr6. Overall progress: 15%|█▌ | 3/20 [00:18<01:39, 5.84s/contig]
Aggregating on Chr6. Overall progress: 20%|██ | 4/20 [00:24<01:38, 6.14s/contig]
Aggregating on fragScaff_342. Overall progress: 20%|██ | 4/20 [00:24<01:38, 6.14s/contig]
Aggregating on fragScaff_342. Overall progress: 25%|██▌ | 5/20 [00:28<01:17, 5.19s/contig]
Aggregating on fragScaff_111. Overall progress: 25%|██▌ | 5/20 [00:28<01:17, 5.19s/contig]
Aggregating on fragScaff_111. Overall progress: 30%|███ | 6/20 [00:31<01:03, 4.57s/contig]
Aggregating on Chr1. Overall progress: 30%|███ | 6/20 [00:31<01:03, 4.57s/contig]
Aggregating on Chr1. Overall progress: 35%|███▌ | 7/20 [00:38<01:08, 5.31s/contig]
Aggregating on fragScaff_9. Overall progress: 35%|███▌ | 7/20 [00:38<01:08, 5.31s/contig]
Aggregating on fragScaff_9. Overall progress: 40%|████ | 8/20 [00:41<00:55, 4.66s/contig]
Aggregating on fragScaff_67. Overall progress: 40%|████ | 8/20 [00:41<00:55, 4.66s/contig]2024-07-18 19:09:33 | DEBUG | No genes found on fragScaff_67

Aggregating on fragScaff_314. Overall progress: 40%|████ | 8/20 [00:41<00:55, 4.66s/contig]
Aggregating on fragScaff_314. Overall progress: 50%|█████ | 10/20 [00:45<00:32, 3.22s/contig]
Aggregating on Chr8. Overall progress: 50%|█████ | 10/20 [00:45<00:32, 3.22s/contig]
Aggregating on Chr8. Overall progress: 55%|█████▌ | 11/20 [00:49<00:32, 3.61s/contig]
Aggregating on Chr7. Overall progress: 55%|█████▌ | 11/20 [00:49<00:32, 3.61s/contig]
Aggregating on Chr7. Overall progress: 60%|██████ | 12/20 [00:55<00:33, 4.17s/contig]
Aggregating on fragScaff_50. Overall progress: 60%|██████ | 12/20 [00:55<00:33, 4.17s/contig]
Aggregating on fragScaff_50. Overall progress: 65%|██████▌ | 13/20 [00:58<00:27, 3.90s/contig]
Aggregating on Chr5. Overall progress: 65%|██████▌ | 13/20 [00:58<00:27, 3.90s/contig]
Aggregating on Chr5. Overall progress: 70%|███████ | 14/20 [01:05<00:27, 4.62s/contig]
Aggregating on fragScaff_356. Overall progress: 70%|███████ | 14/20 [01:05<00:27, 4.62s/contig]
Aggregating on fragScaff_356. Overall progress: 75%|███████▌ | 15/20 [01:08<00:21, 4.24s/contig]
Aggregating on fragScaff_6. Overall progress: 75%|███████▌ | 15/20 [01:08<00:21, 4.24s/contig]
Aggregating on fragScaff_6. Overall progress: 80%|████████ | 16/20 [01:11<00:15, 3.95s/contig]
Aggregating on fragScaff_1. Overall progress: 80%|████████ | 16/20 [01:11<00:15, 3.95s/contig]
Aggregating on fragScaff_1. Overall progress: 85%|████████▌ | 17/20 [01:14<00:11, 3.73s/contig]
Aggregating on fragScaff_15. Overall progress: 85%|████████▌ | 17/20 [01:14<00:11, 3.73s/contig]
Aggregating on fragScaff_15. Overall progress: 90%|█████████ | 18/20 [01:18<00:07, 3.58s/contig]
Aggregating on Chr9. Overall progress: 90%|█████████ | 18/20 [01:18<00:07, 3.58s/contig]
Aggregating on Chr9. Overall progress: 95%|█████████▌| 19/20 [01:23<00:04, 4.19s/contig]
Aggregating on Chr3. Overall progress: 95%|█████████▌| 19/20 [01:23<00:04, 4.19s/contig]
Aggregating on Chr3. Overall progress: 100%|██████████| 20/20 [01:30<00:00, 4.99s/contig]
Aggregating on Chr3. Overall progress: 100%|██████████| 20/20 [01:30<00:00, 4.53s/contig]
2024-07-18 19:10:41,093 - distributed.core - ERROR - Exception while handling op shuffle_receive
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 611, in shuffle_receive
await shuffle.receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 486, in receive
await self._receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 501, in _receive
groups = await self.offload(self._repartition_buffers, filtered)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 133, in offload
return await asyncio.get_running_loop().run_in_executor(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 510, in _repartition_buffers
groups = split_by_partition(table, self.column)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 995, in split_by_partition
assert len(partitions) == len(shards)
AssertionError
2024-07-18 19:10:41,572 - distributed.core - ERROR - Exception while handling op shuffle_receive
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 611, in shuffle_receive
await shuffle.receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 486, in receive
await self._receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 501, in _receive
groups = await self.offload(self._repartition_buffers, filtered)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 133, in offload
return await asyncio.get_running_loop().run_in_executor(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 510, in _repartition_buffers
groups = split_by_partition(table, self.column)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 995, in split_by_partition
assert len(partitions) == len(shards)
AssertionError
2024-07-18 19:10:41,585 - distributed.shuffle._comms - ERROR -
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_comms.py", line 72, in _process
await self.send(address, shards)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 124, in send
return await self.rpc(address).shuffle_receive(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1365, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1149, in send_recv
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 611, in shuffle_receive
await shuffle.receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 486, in receive
await self._receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 501, in _receive
groups = await self.offload(self._repartition_buffers, filtered)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 133, in offload
return await asyncio.get_running_loop().run_in_executor(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 510, in _repartition_buffers
groups = split_by_partition(table, self.column)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 995, in split_by_partition
assert len(partitions) == len(shards)
AssertionError
2024-07-18 19:10:41,619 - distributed.shuffle._comms - ERROR -
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_comms.py", line 72, in _process
await self.send(address, shards)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 124, in send
return await self.rpc(address).shuffle_receive(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1365, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1149, in send_recv
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 611, in shuffle_receive
await shuffle.receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 486, in receive
await self._receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 501, in _receive
groups = await self.offload(self._repartition_buffers, filtered)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 133, in offload
return await asyncio.get_running_loop().run_in_executor(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 510, in _repartition_buffers
groups = split_by_partition(table, self.column)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 995, in split_by_partition
assert len(partitions) == len(shards)
AssertionError
2024-07-18 19:10:41,699 - distributed.worker - WARNING - Compute Failed
Key: ('shuffle-transfer-88e0cfaa8489bdcfe8c2e364a0dbb424', 12)
Function: shuffle_transfer
args: ( tf_target weighted_binding _partitions
0 CNI_011695—CNI_029437 0.009343 8
1 CNI_028301—CNI_029437 0.008639 18
2 CNI_003268—CNI_029437 0.010407 3
3 CNI_007104—CNI_029437 0.004793 5
4 CNI_017922—CNI_029437 0.006409 12
.. ... ... ...
263 CNI_017527—CNI_029437 0.009470 12
264 CNI_008075—CNI_029437 0.011968 5
265 CNI_008091—CNI_029437 0.004803 5
266 CNI_025698—CNI_029437 0.005376 17
267 CNI_018002—CNI_029437 0.008188 13

[268 rows x 3 columns], '88e0cfaa8489bdcfe8c2e364a0dbb424', 12, 19, '_partitions', {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18})
kwargs: {}
Exception: "RuntimeError('shuffle_transfer failed during shuffle 88e0cfaa8489bdcfe8c2e364a0dbb424')"

2024-07-18 19:10:41,708 - distributed.core - ERROR - Exception while handling op shuffle_receive
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 610, in shuffle_receive
shuffle = await self._get_shuffle_run(shuffle_id, run_id)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 709, in _get_shuffle_run
raise shuffle._exception
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_shuffle.py", line 63, in shuffle_transfer
return _get_worker_extension().add_partition(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 652, in add_partition
shuffle = self.get_or_create_shuffle(shuffle_id, type=type, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 898, in get_or_create_shuffle
return sync(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 418, in sync
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 391, in f
result = yield future
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/tornado/gen.py", line 766, in run
value = future.result()
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 740, in _get_or_create_shuffle
raise shuffle._exception
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 611, in shuffle_receive
await shuffle.receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 486, in receive
await self._receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 501, in _receive
groups = await self.offload(self._repartition_buffers, filtered)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 133, in offload
return await asyncio.get_running_loop().run_in_executor(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 510, in _repartition_buffers
groups = split_by_partition(table, self.column)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 995, in split_by_partition
assert len(partitions) == len(shards)
AssertionError
2024-07-18 19:10:41,737 - distributed.shuffle._comms - ERROR -
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_comms.py", line 72, in _process
await self.send(address, shards)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 124, in send
return await self.rpc(address).shuffle_receive(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1365, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1149, in send_recv
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 610, in shuffle_receive
shuffle = await self._get_shuffle_run(shuffle_id, run_id)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 709, in _get_shuffle_run
raise shuffle._exception
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_shuffle.py", line 63, in shuffle_transfer
return _get_worker_extension().add_partition(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 652, in add_partition
shuffle = self.get_or_create_shuffle(shuffle_id, type=type, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 898, in get_or_create_shuffle
return sync(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 418, in sync
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 391, in f
result = yield future
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/tornado/gen.py", line 766, in run
value = future.result()
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 740, in _get_or_create_shuffle
raise shuffle._exception
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 611, in shuffle_receive
await shuffle.receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 486, in receive
await self._receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 501, in _receive
groups = await self.offload(self._repartition_buffers, filtered)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 133, in offload
return await asyncio.get_running_loop().run_in_executor(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 510, in _repartition_buffers
groups = split_by_partition(table, self.column)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 995, in split_by_partition
assert len(partitions) == len(shards)
AssertionError
2024-07-18 19:10:41,753 - distributed.core - ERROR - Exception while handling op shuffle_get
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 922, in _handle_comm
result = handler(**msg)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_scheduler_extension.py", line 143, in get
state = self.states[id]
KeyError: '88e0cfaa8489bdcfe8c2e364a0dbb424'
2024-07-18 19:10:41 | ERROR | An error has been caught in function '<module>', process 'MainProcess' (1953258), thread 'MainThread' (22995272511808):
Traceback (most recent call last):

File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_shuffle.py", line 63, in shuffle_transfer
return _get_worker_extension().add_partition(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 652, in add_partition
shuffle = self.get_or_create_shuffle(shuffle_id, type=type, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 898, in get_or_create_shuffle
return sync(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 418, in sync
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 391, in f
result = yield future
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/tornado/gen.py", line 766, in run
value = future.result()
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 740, in _get_or_create_shuffle
raise shuffle._exception
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 611, in shuffle_receive
await shuffle.receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 486, in receive
await self._receive(data)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 501, in _receive
groups = await self.offload(self._repartition_buffers, filtered)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 133, in offload
return await asyncio.get_running_loop().run_in_executor(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 510, in _repartition_buffers
groups = split_by_partition(table, self.column)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 995, in split_by_partition
assert len(partitions) == len(shards)

AssertionError: assert len(partitions) == len(shards)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

File "/gpfshddpool/home/.conda/envs/ananse/bin/ananse", line 609, in <module>
args.func(args)
│ │ └ Namespace(binding='Cni_G_binding/binding.h5', fin_expression=['G.mean.tpm'], genome='Cni.v2.assembly.chr.fasta', annotation='...
│ └ <function network at 0x14e8e01f8550>
└ Namespace(binding='Cni_G_binding/binding.h5', fin_expression=['G.mean.tpm'], genome='Cni.v2.assembly.chr.fasta', annotation='...
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/ananse/commands/network.py", line 37, in network
b.run_network(
│ └ <function Network.run_network at 0x14e9d3d4a550>
└ <ananse.network.Network object at 0x14e8c7a50550>
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/ananse/network.py", line 619, in run_network
df_binding = self.aggregate_binding(
│ └ <function Network.aggregate_binding at 0x14e9d3d4a3a0>
└ <ananse.network.Network object at 0x14e8c7a50550>
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/ananse/network.py", line 465, in aggregate_binding
df = ddf.compute()
│ └ <function DaskMethodsMixin.compute at 0x14e9fc7ac700>
└ Dask DataFrame Structure:
weighted_binding
npartitions=19
CNI_000236—CNI_000017...
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/dask/base.py", line 310, in compute
(result,) = compute(self, traverse=False, **kwargs)
│ │ └ {}
│ └ Dask DataFrame Structure:
│ weighted_binding
│ npartitions=19
│ CNI_000236—CNI_000017...
└ <function compute at 0x14e9fc7acb80>
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/dask/base.py", line 595, in compute
results = schedule(dsk, keys, **kwargs)
│ │ │ └ {}
│ │ └ [[('sort_index-db7e01245a05884dd414652652916111', 0), ('sort_index-db7e01245a05884dd414652652916111', 1), ('sort_index-db7e01...
│ └ HighLevelGraph with 3 layers.
│ <dask.highlevelgraph.HighLevelGraph object at 0x14e5ddc3b040>
│ 0. assign-3f2c55f95675be1e0c4e6a...
└ <bound method Client.get of <Client: 'tcp://127.0.0.1:40061' processes=1 threads=2, memory=0.98 TiB>>
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/client.py", line 3243, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
│ │ │ │ └ None
│ │ │ └ None
│ │ └ [[<Future: cancelled, key: ('sort_index-db7e01245a05884dd414652652916111', 0)>, <Future: cancelled, key: ('sort_index-db7e012...
│ └ <function Client.gather at 0x14e8c7b83c10>
└ <Client: 'tcp://127.0.0.1:40061' processes=1 threads=2, memory=0.98 TiB>
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/client.py", line 2368, in gather
return self.sync(
│ └ <function SyncMethodMixin.sync at 0x14e8d7fbd3a0>
└ <Client: 'tcp://127.0.0.1:40061' processes=1 threads=2, memory=0.98 TiB>
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 351, in sync
return sync(
└ <function sync at 0x14e8d7fbd430>
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 418, in sync
raise exc.with_traceback(tb)
│ │ └ <traceback object at 0x14e8b762ad40>
│ └ <method 'with_traceback' of 'BaseException' objects>
└ RuntimeError('shuffle_transfer failed during shuffle 88e0cfaa8489bdcfe8c2e364a0dbb424')
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/utils.py", line 391, in f
result = yield future
└ <Task finished name='Task-14748' coro=<Client._gather() done, defined at /gpfshddpool/home/.conda/envs/ananse/lib/p...
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/tornado/gen.py", line 766, in run
value = future.result()
└ None
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/client.py", line 2231, in _gather
raise exception.with_traceback(traceback)
│ │ └ <traceback object at 0x14e84dd39e40>
│ └ <method 'with_traceback' of 'BaseException' objects>
└ RuntimeError('shuffle_transfer failed during shuffle 88e0cfaa8489bdcfe8c2e364a0dbb424')
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_shuffle.py", line 73, in shuffle_transfer
raise RuntimeError(f"shuffle_transfer failed during shuffle {id}") from e

RuntimeError: shuffle_transfer failed during shuffle 88e0cfaa8489bdcfe8c2e364a0dbb424
2024-07-18 19:10:42,071 - distributed.core - ERROR - Exception while handling op shuffle_receive
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 610, in shuffle_receive
shuffle = await self._get_shuffle_run(shuffle_id, run_id)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 699, in _get_shuffle_run
shuffle = await self._refresh_shuffle(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 766, in _refresh_shuffle
result = await self.worker.scheduler.shuffle_get(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1365, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1149, in send_recv
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 922, in _handle_comm
result = handler(**msg)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_scheduler_extension.py", line 143, in get
state = self.states[id]
KeyError: '88e0cfaa8489bdcfe8c2e364a0dbb424'
2024-07-18 19:10:42,080 - distributed.shuffle._comms - ERROR - '88e0cfaa8489bdcfe8c2e364a0dbb424'
Traceback (most recent call last):
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_comms.py", line 72, in _process
await self.send(address, shards)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 124, in send
return await self.rpc(address).shuffle_receive(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1365, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1149, in send_recv
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 924, in _handle_comm
result = await result
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 610, in shuffle_receive
shuffle = await self._get_shuffle_run(shuffle_id, run_id)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 699, in _get_shuffle_run
shuffle = await self._refresh_shuffle(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_worker_extension.py", line 766, in _refresh_shuffle
result = await self.worker.scheduler.shuffle_get(
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1365, in send_recv_from_rpc
return await send_recv(comm=comm, op=key, **kwargs)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 1149, in send_recv
raise exc.with_traceback(tb)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/core.py", line 922, in _handle_comm
result = handler(**msg)
File "/gpfshddpool/home/.conda/envs/ananse/lib/python3.9/site-packages/distributed/shuffle/_scheduler_extension.py", line 143, in get
state = self.states[id]
KeyError: '88e0cfaa8489bdcfe8c2e364a0dbb424'
`
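
Since the traceback passes through `dask`, `distributed`, and `tornado`, the installed versions of those packages may be relevant. Below is a minimal sketch for collecting them, meant to be run inside the same conda environment. The helper name `report_versions` is hypothetical, and it assumes the distribution names match the import names seen in the traceback (in particular that ANANSE is installed under the name `ananse`).

```python
from importlib.metadata import PackageNotFoundError, version

# Packages whose paths appear in the traceback above.
# Assumption: distribution names equal the import names.
PACKAGES = ("ananse", "dask", "distributed", "tornado")


def report_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version.

    Packages that cannot be found are reported as 'not installed'.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    # Print one tab-separated "package<TAB>version" line per package.
    for name, ver in report_versions().items():
        print(f"{name}\t{ver}")
```

The output of this script can be pasted here if the maintainers need it.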
What should I do to resolve the problem?

Thanks in advance and all the best
