Ethereum Consensus Layer Call #92
Ethereum core developers concluded their 92nd bi-weekly Zoom meeting on Thursday, July 28th. These calls are attended primarily by developers building the consensus layer (CL) software of Ethereum, while the fortnightly All Core Developers (ACD) calls are attended by developers building either the consensus or execution layer (EL) of Ethereum. For more information about what the EL and CL of Ethereum are, read this Galaxy Digital Report.
On Thursday, developers discussed:
EL client behavior around multiple terminal blocks received before the Merge transition
Enabling early EL/CL node configurations before timing around the Merge is finalized
Extending the definition of optimistic nodes in CL code specifications
Addition of a circuit breaker mechanism to validators running MEV-Boost post-Merge
EL client behavior around multiple terminal blocks
As raised by Ethereum developer Mikhail Kalinin, EL client implementations still disagree on the correct behavior when a node receives and must choose between multiple conflicting blocks around the time of the Merge transition. The issue was first noticed in Nethermind EL nodes during a prior Goerli shadow fork, where it caused Nethermind nodes to get stuck, meaning that the nodes were unable to continue syncing to the latest version of the blockchain. Developers reiterated on the call that it's important for EL client nodes to continue to process blocks mined through proof-of-work (PoW) consensus right up until the Merge is triggered by reaching the terminal total difficulty (TTD) threshold. For background on TTD, read this Galaxy Digital Research report about Merge execution risks. In the event of multiple conflicting blocks that meet TTD, EL nodes may need to retroactively execute a mini fork once the Merge upgrade is finalized.
EL client teams including Besu and Erigon are investigating how their nodes behave in these events and may need to issue fixes to their client implementations. Developer Mario Vega added that new hive tests are being developed to verify that EL clients handle multiple conflicting blocks correctly at Merge activation. Danny Ryan also emphasized on the call that additional clarification and detail would be added to the code specifications for the Merge.
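To make the "multiple terminal blocks" case concrete, here is a minimal Python sketch. It assumes the terminal-block condition from the Merge specification (a PoW block is terminal when its total difficulty reaches TTD while its parent's does not). The `PowBlock` type and helper names are hypothetical and not taken from any client.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PowBlock:
    """Hypothetical, simplified view of a proof-of-work block."""
    block_hash: str
    parent_hash: Optional[str]
    total_difficulty: int


def is_terminal_block(block: PowBlock, parent: Optional[PowBlock], ttd: int) -> bool:
    # Terminal: this block reaches TTD, but its parent had not reached it yet.
    # Assumption for the sketch: a missing parent counts as "below TTD".
    parent_below_ttd = parent is None or parent.total_difficulty < ttd
    return block.total_difficulty >= ttd and parent_below_ttd


def find_terminal_blocks(blocks: Dict[str, PowBlock], ttd: int) -> List[PowBlock]:
    """Return every known block that satisfies the terminal condition."""
    terminal = []
    for block in blocks.values():
        parent = blocks.get(block.parent_hash) if block.parent_hash else None
        if is_terminal_block(block, parent, ttd):
            terminal.append(block)
    return terminal


# Two sibling blocks (b and c) both cross TTD = 100, so both are terminal.
chain = {
    "a": PowBlock("a", None, 90),
    "b": PowBlock("b", "a", 100),
    "c": PowBlock("c", "a", 105),
    "d": PowBlock("d", "b", 110),  # child of a terminal block, not terminal itself
}
terminal_hashes = sorted(b.block_hash for b in find_terminal_blocks(chain, ttd=100))
```

Because more than one block can satisfy the condition at once, a node that stops processing PoW blocks after seeing the first terminal block may end up on a different branch from its peers.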
Enabling early EL/CL node configuration set-ups
Previously, developers discussed opening software endpoints to allow communication between CL and EL nodes through the Engine API before parameters such as the TTD are set. This would allow users to set up their EL/CL node configurations before timing around the Merge on mainnet is even confirmed. In addition to opening endpoints, Lighthouse developer Paul Hauner explained on the call that EL clients need to specify what value node operators should use before a real TTD is set. This dummy value for CL clients is already set as 2^256-2^10. Alternatively, a value of nil or null could be returned on the EL side. Core developers agreed to keep things simple and use the same value as what is currently specified in the CL code specifications. Hauner emphasized that EL nodes do not need to use the dummy value in any meaningful way. They simply return it as an automated response that CL nodes treat as a placeholder until the final Merge configuration, including the TTD, is released.
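As a quick illustration of the placeholder arithmetic, the short Python sketch below computes the dummy value 2^256-2^10 and checks whether a configured TTD is still that placeholder. The constant and function names are hypothetical, chosen for this example only.

```python
# Placeholder TTD from the CL specifications: 2^256 - 2^10.
PLACEHOLDER_TTD = 2**256 - 2**10


def is_ttd_configured(ttd: int) -> bool:
    """Hypothetical helper: True once a real TTD has replaced the placeholder."""
    return ttd != PLACEHOLDER_TTD


def to_hex_quantity(value: int) -> str:
    """Encode an integer as a 0x-prefixed hex string."""
    return hex(value)


placeholder_hex = to_hex_quantity(PLACEHOLDER_TTD)
```

The value fits in 256 bits and sits just below the maximum 256-bit integer, so no realistic PoW chain could reach it by accident.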
Extending the definition of optimistic nodes
Post-Merge, Ethereum CL nodes can choose to perform an "optimistic sync" of the blockchain, which involves processing blocks without verifying the execution payloads. Execution payloads refer to the blocks received from the EL nodes containing user transactions and smart contracts. Nodes performing an optimistic sync of the network are referred to as optimistic nodes, given that their view of the latest state of the canonical chain is partial and excludes verification of EL blocks. Mikhail Kalinin noted on the call that a node in this state can behave dangerously if certain execution payloads turn out to be invalid blocks, as this could trigger chain rollbacks or cause nodes to go offline. In addition, there is no general approach adopted by all CL client teams to address this edge case. Prysm developer Potuz explained on the call that invalid blocks may still be propagated by Teku and Lighthouse nodes in optimistic sync mode. Several approaches to how optimistic nodes should handle invalid blocks were discussed on the call. It was ultimately decided that further discussion would be needed in the Ethereum Discord chat. Kalinin also agreed to write out these edge cases with optimistic sync in more detail to share in a pull request on GitHub.
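The rollback risk can be pictured with a small Python sketch. It is an illustration only and not the behavior agreed on by client teams, which the call left open. It assumes that when an execution payload is reported invalid, that block and every optimistically imported descendant must be discarded. The data structures and the `invalidate_branch` name are hypothetical.

```python
from typing import Dict, Optional, Set


def invalidate_branch(
    parents: Dict[str, Optional[str]],
    optimistic: Set[str],
    invalid_root: str,
) -> Set[str]:
    """Return the optimistically imported blocks that are invalid_root
    or descend from it, by walking each block's ancestry."""
    invalid = set()
    for block in optimistic:
        cursor: Optional[str] = block
        while cursor is not None:
            if cursor == invalid_root:
                invalid.add(block)
                break
            cursor = parents.get(cursor)
    return invalid


# Block -> parent. Blocks c, d, and e were imported without payload verification.
parents = {"a": None, "b": "a", "c": "b", "d": "c", "e": "b"}
optimistic_blocks = {"c", "d", "e"}

# The EL later reports the payload of block c as invalid.
to_discard = invalidate_branch(parents, optimistic_blocks, invalid_root="c")
```

In this example, blocks c and d are rolled back while the sibling branch e survives. The longer a node has synced optimistically on top of a bad payload, the deeper the rollback.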
Circuit breaker on MEV-Boost
Developer Alex Stokes raised an "extreme failure case" for MEV-Boost on Thursday's call. For background on MEV-Boost, click here. There is the potential for a network liveness failure because of MEV-Boost software, Stokes explained. If a relay operator responsible for connecting validators to block builders fails to release blocks at the last minute, this may result in a series of missed slots and block proposals from validators running MEV-Boost in conjunction with their CL/EL nodes. This is because a validator who has failed to propose a block at a given slot because of bad behavior by a relay has no way of alerting other validators to this bad behavior. Hence, if the next validators assigned to propose a block are also connected to the same malicious relay, they would face the same issue of blocks being withheld. Eventually, attentive validator node operators who identify a malicious relay should disconnect from it and fall back on building blocks locally from the Ethereum mempool. However, the risk is that if it takes validator node operators a long time to identify a bad relay and avoid it, this could result in several hours of missed blocks and unconfirmed transactions on the network.
One proposal to mitigate the impact of a relay failure on network liveness is to introduce a circuit breaker condition to the MEV-Boost code run by validator node operators. The condition could be as simple as instructing validator nodes to automatically turn off MEV-Boost after noticing 5 missed slots on the network. However, such a solution would shift the risk from a malicious relay to malicious validators. Specifically, a malicious validator node operator controlling a large amount of staked ETH could abuse this circuit breaker condition by intentionally withholding block proposals in its slots to dupe all other validators into turning off their MEV-Boost. Then, the validator node operator could monopolize earnings from MEV for a period. As such, developers discussed potentially increasing the threshold of missed slots to a higher number such as 16 to make it harder for malicious actors to intentionally trigger the circuit breaker for their own gain. Another proposal was to leave it up to CL client teams to decide on their own threshold for disabling MEV-Boost. One of the main concerns from developers when discussing a potential solution was the added code complexity around the Merge, which is already considered a highly complex upgrade.
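The circuit breaker idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MEV-Boost implementation: it assumes the missed slots must be consecutive and that the breaker stays tripped until an operator resets it, neither of which was settled on the call. The `CircuitBreaker` class and its method names are hypothetical.

```python
class CircuitBreaker:
    """Hypothetical circuit breaker: stop using MEV-Boost after too many
    consecutive missed slots are observed on the network."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.consecutive_missed = 0
        self.tripped = False

    def observe_slot(self, has_block: bool) -> None:
        """Record whether the latest slot contained a block."""
        if has_block:
            # Assumption: any proposed block resets the counter.
            self.consecutive_missed = 0
            return
        self.consecutive_missed += 1
        if self.consecutive_missed >= self.threshold:
            self.tripped = True

    def use_mev_boost(self) -> bool:
        """False once tripped, meaning the validator builds blocks locally."""
        return not self.tripped

    def reset(self) -> None:
        """Assumption: an operator re-enables MEV-Boost manually."""
        self.consecutive_missed = 0
        self.tripped = False


# With the simple threshold of 5, five empty slots in a row trip the breaker.
low = CircuitBreaker(threshold=5)
for _ in range(5):
    low.observe_slot(has_block=False)

# With the higher threshold of 16, the same five empty slots do not.
high = CircuitBreaker(threshold=16)
for _ in range(5):
    high.observe_slot(has_block=False)
```

The threshold is the trade-off discussed on the call: a low value restores liveness quickly but is cheaper for a large validator to trigger on purpose, while a higher value such as 16 makes that manipulation more costly at the price of more missed slots before the fallback starts.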
In parallel, developers suggested creating a third-party dashboard that could monitor relays. The Flashbots team is already working on creating a public dashboard to monitor MEV activity. However, they will also be running the relay that the dashboard is built to track. As such, developers discussed creating an open source tool, easy for other network stakeholders to operate and maintain, that would alert validator node operators to suspicious relay activity. Micah Zoltu emphasized on Thursday's call that there is a great deal of trust being placed in relays behaving correctly in a post-Merge environment. To this end, developers are working on implementing a system for MEV called proposer-builder separation (PBS) that doesn't require trusted relays. However, PBS will not be ready in time for mainnet Merge activation, and it is unlikely that other temporary solutions to incentivize good relay behavior or penalize bad relay behavior will be ready either. As such, developers agreed on the call to continue discussing simpler solutions for this failure case of MEV-Boost in the Ethereum Discord chat.
Goerli Shadow Fork #5 was activated last week. No major issues were identified among EL or CL client implementations. MEV-Boost software implementations by Prysm, Lodestar, and Teku are being tested on the shadow fork.
Lighthouse and Nimbus implementations of MEV-Boost are close to being ready for testing.
There was another Ethereum mainnet shadow fork testing the Merge on Tuesday, July 26. The activation went well and there were no major client compatibility issues, a first for mainnet shadow forks.
The third and final major testnet to undergo the Merge before mainnet is Goerli. The Goerli Merge activation has been scheduled to start on August 4th and end around August 8th. All validators and node operators are encouraged to participate in this public testnet Merge activation. The relevant client releases for following along with the activation are all listed here.
EIP 4844 introduces a new transaction type on Ethereum that supports rollups and makes them much cheaper to operate on the network. There are ongoing breakout sessions for learning more about EIP 4844 development. The next session, which is open to anyone, will be on July 29th at 14:00 (UTC).