
Many "Trying to notify an import but the channel is full" during Spammening on Kusama #6600

Open
AndreiEres opened this issue Nov 21, 2024 · 6 comments

Comments

@AndreiEres
Contributor

[screenshot attached]

txpool produced many warnings on Kusama, cc @michalkucharczyk

Logs in Grafana
https://grafana.teleport.parity.io/goto/duhHadnNg?orgId=1

@michalkucharczyk
Contributor

The problem is with this channel:

pub fn import_notification_stream(&self) -> EventStream<ExtrinsicHash<B>> {
    const CHANNEL_BUFFER_SIZE: usize = 1024;
    let (sink, stream) = channel(CHANNEL_BUFFER_SIZE);
    self.import_notification_sinks.lock().push(sink);
    stream
}
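
For context, here is a minimal sketch of the producer side of these sinks (not the exact pool code; names and types are simplified). On every import the pool try_sends the hash into each registered sink, and when a sink's buffer is full all it can do is emit the warning from the issue title and drop that notification:

use futures::channel::mpsc::{channel, Sender};

type Hash = [u8; 32];

// Sketch only: push an imported hash into every registered sink. A full sink
// produces exactly the "channel is full" situation; a disconnected one is removed.
fn notify_imported(sinks: &mut Vec<Sender<Hash>>, hash: Hash) {
    sinks.retain_mut(|sink| match sink.try_send(hash) {
        Ok(()) => true,
        Err(e) if e.is_full() => {
            eprintln!("Trying to notify an import but the channel is full");
            true
        },
        // Receiver side was dropped: forget this sink.
        Err(_) => false,
    });
}

fn main() {
    let (sink, _stream) = channel::<Hash>(1024);
    let mut sinks = vec![sink];
    notify_imported(&mut sinks, [0u8; 32]);
}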

The resulting stream is later consumed by the transaction networking protocol:

pub async fn propagate_transaction_notifications<Block, ExPool>(
    transaction_pool: Arc<ExPool>,
    tx_handler_controller: sc_network_transactions::TransactionsHandlerController<
        <Block as BlockT>::Hash,
    >,
    telemetry: Option<TelemetryHandle>,
) where
    Block: BlockT,
    ExPool: MaintainedTransactionPool<Block = Block, Hash = <Block as BlockT>::Hash>,
{
    // transaction notifications
    transaction_pool
        .import_notification_stream()
        .for_each(move |hash| {
            tx_handler_controller.propagate_transaction(hash);
            let status = transaction_pool.status();
            telemetry!(
                telemetry;
                SUBSTRATE_INFO;
                "txpool.import";
                "ready" => status.ready,
                "future" => status.future,
            );
            ready(())
        })
        .await;
}

This probably gets clogged and may lead to #5486.

We would need some improvements here. I don't see an easy fix for this.

This issue is probably one more reason to start working on a new transaction protocol.

@bkchr
Member

bkchr commented Nov 21, 2024

I think this should fix the issue. Or at least reduce its likelihood.

@michalkucharczyk
Contributor

I think this should fix the issue. Or at least reduce its likelihood.

Reading the status is indeed done under the same lock as import, but the status operation itself is very cheap.

Also, the conflicting lock taken during import is relatively cheap (compared to the whole work done in submit).
It only protects inserting the transaction into the graph:

let imported = self.pool.write().import(tx)?;

which is here:
pub fn import(&mut self, tx: Transaction<Hash, Ex>) -> error::Result<Imported<Hash, Ex>> {

So by saying "both are cheap" I mean that they don't interfere with each other to the extent that import would block status.
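
A toy model of the locking relationship described above (types and names are illustrative, not the pool's actual ones): status and the graph import share one lock, but each critical section is short, so neither holds the other up for long.

use parking_lot::RwLock;

// Illustrative stand-ins for the transaction graph and the pool around it.
struct Graph {
    ready: usize,
    future: usize,
}

struct Pool {
    graph: RwLock<Graph>,
}

impl Pool {
    // Cheap: copies two counters while holding the read lock.
    fn status(&self) -> (usize, usize) {
        let g = self.graph.read();
        (g.ready, g.future)
    }

    // The expensive parts of submit (validation etc.) happen outside the lock;
    // only the actual insertion into the graph is guarded by it.
    fn submit(&self, _tx: Vec<u8>) {
        // ... validation, banning checks, priority computation ...
        self.graph.write().ready += 1;
    }
}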

I also quickly tested the proposed fix locally with zombienet (rpc node + 3 collators + spamming transactions), and I can still see the warning Trying to notify an import but the channel is full.

The fix still makes sense: there is no reason to update telemetry for every transaction. But from what I can tell, it will not solve this problem.
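
As an illustration of the "no telemetry per transaction" idea (this is not the code of the linked fix; the names below are made up), a simple counter-based throttle keeps the per-import work down to propagation only:

use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical throttle: report pool status to telemetry only every N imports.
const TELEMETRY_EVERY_N: u64 = 1_000;
static IMPORT_COUNTER: AtomicU64 = AtomicU64::new(0);

fn on_import_notification(propagate: impl Fn(), report_status: impl Fn()) {
    // Propagation still happens for every transaction ...
    propagate();
    // ... but the (lock-taking) status read and telemetry update are skipped
    // most of the time.
    if IMPORT_COUNTER.fetch_add(1, Ordering::Relaxed) % TELEMETRY_EVERY_N == 0 {
        report_status();
    }
}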

Maybe we should just remove the warning and accept that propagating single transactions on import notification won't happen for every tx. They will be propagated in a batch, which happens periodically here:

self.propagate_transactions();
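
For illustration, a hedged sketch of such a periodic batch path (the interval, names and types below are arbitrary, not the handler's actual ones): whatever accumulated since the last tick is announced in one go, so a dropped per-import notification only delays propagation until the next tick.

use std::sync::{Arc, Mutex};
use std::time::Duration;
use futures_timer::Delay;

type Hash = [u8; 32];

// Sketch only: every tick, drain the accumulated hashes and propagate them as one batch.
async fn propagation_loop(pending: Arc<Mutex<Vec<Hash>>>) {
    loop {
        Delay::new(Duration::from_secs(3)).await;
        let batch = std::mem::take(&mut *pending.lock().unwrap());
        if !batch.is_empty() {
            propagate_batch(&batch);
        }
    }
}

fn propagate_batch(batch: &[Hash]) {
    println!("propagating {} transactions in one batch", batch.len());
}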

@bkchr
Member

bkchr commented Nov 22, 2024

I also quickly tested the proposed fix locally with zombienet (rpc node + 3 collators + spamming transactions), and I can still see the warning Trying to notify an import but the channel is full.

Hmm :D

Basically we are just forwarding from one channel to another; it's weird that this is not able to keep up with the speed at which we are importing txs.

@michalkucharczyk
Contributor

michalkucharczyk commented Nov 22, 2024

I think it comes down to:

for to_send in to_send {
    let _ = self
        .notification_service
        .send_sync_notification(who, vec![to_send].encode());

and then to:
pub fn send_sync_notification(&self, message: impl Into<Vec<u8>>) {
    let mut lock = self.inner.sync_channel.lock();
    if let Some(tx) = lock.as_mut() {
        let message = message.into();
        let result = tx.try_send(NotificationsSinkMessage::Notification { message });
        if result.is_err() {
            // Cloning the `mpsc::Sender` guarantees the allocation of an extra spot in the
            // buffer, and therefore `try_send` will succeed.
            let _result2 = tx.clone().try_send(NotificationsSinkMessage::ForceClose);
            debug_assert!(_result2.map(|()| true).unwrap_or_else(|err| err.is_disconnected()));
            // Destroy the sender in order to not send more `ForceClose` messages.
            *lock = None;
        }
    }
}

So maybe, once the network channel to the other peer is closed, the other side of the sync_channel in notifications_service stops consuming events? And then the import_notification_sink can get clogged.

But I am speculating here, as I am not yet familiar with this part of the codebase.
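
To make the speculation concrete, here is a heavily simplified, synchronous toy model (not the real code paths; names are made up): once the channel towards the peer is full and nobody drains it, the forwarder stops consuming the import notification channel, and the pool's try_send starts failing with exactly the "channel is full" shape of error.

use futures::channel::mpsc::channel;

fn main() {
    // Stand-in for the sync channel towards a peer: small buffer, never consumed.
    let (mut to_peer, _peer_rx) = channel::<u32>(4);
    // Stand-in for the import notification channel the pool pushes into.
    let (mut import_tx, mut import_rx) = channel::<u32>(4);

    let mut peer_channel_full = false;
    for hash in 0..32u32 {
        // The pool notifies every import.
        if let Err(e) = import_tx.try_send(hash) {
            if e.is_full() {
                eprintln!("Trying to notify an import but the channel is full ({hash})");
            }
            continue;
        }
        // The forwarder drains the import channel only while the peer channel
        // still accepts notifications; once it is full, draining stops and the
        // import channel above starts overflowing as well.
        if !peer_channel_full {
            if let Ok(Some(h)) = import_rx.try_next() {
                if let Err(e) = to_peer.try_send(h) {
                    peer_channel_full = e.is_full();
                }
            }
        }
    }
}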

@michalkucharczyk
Contributor

michalkucharczyk commented Nov 22, 2024

And the size of the sync_channel buffer is relatively small compared to the number of txs imported during the spam test:

const SYNC_NOTIFICATIONS_BUFFER_SIZE: usize = 2048;
