Ethereum All Core Developers Execution Call #159 Writeup
On April 13, Ethereum developers gathered for their 159th All Core Developers Execution (ACDE) call. Chaired by the Ethereum Foundation’s Tim Beiko, the ACDE calls are a bi-weekly meeting series where Ethereum developers discuss and coordinate changes to the execution layer (EL) of Ethereum. This week, developers discussed the Shanghai upgrade, which was activated on mainnet Ethereum a day prior on Wednesday, April 12. Developers discussed issues around missed slots and high CPU loads on node operators at the time of the upgrade. Other than Shanghai, developers also shared thoughts around Ethereum’s next EL upgrade, called Cancun, and a few other miscellaneous topics around Ethereum’s execution API specifications for payload ids and MEV-Boost relays.
Ethereum’s Shanghai upgrade was activated on Wednesday, April 12, at 6:27pm (ET). The upgrade went smoothly, with at least 75% of node operators having upgraded their computers in advance to support this hard fork. As explained in this Galaxy research report, Shanghai primarily enables staked ETH withdrawals. Since the activation of Shanghai, over 98,000 withdrawals have been processed and over 203,000 ETH distributed to validators. In addition, over 87,000 validators have successfully updated their withdrawal credentials so that rewards earned from Beacon Chain issuance can be automatically processed and deposited to their designated Ethereum EL account. The following table summarizes the main impacts of Shanghai in the hours leading up to and shortly following the activation of the upgrade:
Danny Ryan, Chair of the All Core Developer Consensus (ACDC) calls, noted that at the time of the upgrade, the network saw a dip in the number of block proposals. “I think one of the most interesting things that I observed is a number of validators were attesting [to blocks], but not proposing. That’s always kind of an interesting thing, because it means their nodes are up, and they’re following the chain, and they’re voting on things, but then this other component, block proposals, are borked,” said Ryan. The dip in the number of block proposals was caused by two factors. First, Ryan noted that there was an extremely high volume of withdrawal credential change messages, over 40,000, being gossiped by nodes at the time of the fork. This put high CPU loads on nodes, slowing block propagation and leading to more missed slots. Second, there was a bug in the Prysm client software, which is run by close to 40% of Ethereum node operators. Terence Tsao, a developer for Prysm, explained the bug was related to MEV-Boost, which is software that connects validators to third-party block builders.
Prysm bug post-mortem
Essentially, validators running Prysm consensus layer (CL) software would fail to propose blocks through an MEV relay due to incorrect block signature verification. Chris Hager from the Flashbots team said that he identified roughly 121 invalid block signatures across the Gnosis, Ultrasound, and Flashbots relays. The Flashbots and Prysm teams are both preparing a more detailed post-mortem of the bug for a forthcoming blog post. Tsao highlighted that when the bug was first discovered on mainnet, the MEV-Boost circuit breaker mechanism worked correctly, automatically switching Prysm validators from block production through MEV-Boost to local block production after five consecutive missed slots. Additionally, through coordination with the Flashbots team and other relay operators, developers were able to create a patch for relays to reject all subsequent blocks submitted by a Prysm validator. Hager explained that the patch for relays relies on identifying validators by a parameter known as the “user agent.” Validators that have since upgraded to the latest release put out by the Prysm team, which fixes known issues around MEV-Boost block signatures, are identified by a new user agent that will not be automatically rejected by relays.
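The two safeguards described above, the circuit breaker and the user-agent filter on relays, can be illustrated with a minimal Python sketch. This is not the actual Prysm or relay code: the class and function names are hypothetical, the user-agent strings in the usage note are placeholders, and real clients may apply additional conditions beyond a simple streak of missed slots. The threshold of five consecutive missed slots is the only value taken from the call.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CircuitBreaker:
    """Hypothetical sketch: disable the builder path after a streak of missed slots."""

    max_consecutive_missed: int = 5  # threshold mentioned on the call
    consecutive_missed: int = 0

    def record_slot(self, block_proposed: bool) -> None:
        # A proposed block resets the streak; a missed slot extends it.
        if block_proposed:
            self.consecutive_missed = 0
        else:
            self.consecutive_missed += 1

    def use_builder(self) -> bool:
        # True while the validator may still request blocks through MEV-Boost;
        # False once it should fall back to local block production.
        return self.consecutive_missed < self.max_consecutive_missed


def relay_accepts(user_agent: str, rejected_prefixes: Tuple[str, ...]) -> bool:
    """Hypothetical relay-side filter keyed on the validator's user agent."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return not user_agent.startswith(rejected_prefixes)
```

In this sketch, a validator that sees five missed slots in a row gets `use_builder() == False` until a block is proposed again, and a relay configured with `rejected_prefixes=("prysm-buggy/",)` (a made-up string) would turn away the affected release while accepting a client that reports a different user agent.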
Developers then discussed ways to improve testing of client software and its interaction with MEV-Boost ahead of future hard forks. Mario Vega, who is on the testing team at the Ethereum Foundation, said that new test cases would be created in the Hive test suite in light of the Prysm bug. Parithosh Jayanthi, a DevOps Engineer at the Ethereum Foundation, said the Prysm bug could have been caught if the interaction between Prysm validators, MEV-Boost relays, and builders had been tested earlier in the process of shadow forking and debugging. Hager noted that the circumstances triggering the Prysm bug were not being tested on earlier shadow forks and that more test coverage was needed to catch different “edge cases” in client interactions with MEV-Boost software.
A Prysm developer who goes by the name “Potuz” added: “We should set up a way in that all of our coding is oriented towards testing the builder, because most of our blocks are going through the builder. This is a ridiculous bug that should not have happened at all like there’s many places where this should have been tested and all of them failed at the same time. This is just impressive. On Goerli [testnet], for example, we went back and checked the day of the fork and we only had like three missed [block] proposals out of all of those that were due to this bug and they were just lost in random noise. This is something that Hive should have tested and we should have tested on a unit test and we failed to do it. I guess it’s because we’re set up to thinking not on builder but thinking on the happy local [block] execution path.”
In addition to the Prysm bug, a minor bug was identified in the Lighthouse (CL) client. It appears that, at the time of the upgrade, validators running Lighthouse client software were not correctly caching (that is, storing) data on validator exits. The Lighthouse team has since put out a new client release fixing this issue. Potuz mentioned that the Lighthouse error was in part triggered by the high number of missed slots caused by buggy Prysm validators.
On the topic of CL client bugs, Ben Edgington, a developer from the Teku (CL) client team, reported slow block processing times across some validators running Teku software. He explained that a block import usually takes roughly 100 milliseconds for a validator node to process but that since the activation of Shanghai, import times have occasionally shot up by a factor of 10, pushing them over a second and negatively impacting Teku validator block attestations. The issue appears to be growing less frequent across nodes as more time passes since Shanghai, said Edgington. However, the Teku team continues to monitor the situation closely and investigate the root cause.
With Shanghai complete, Parithosh Jayanthi asked client teams about launching new devnets and testnets with a genesis state that begins at the activation of Shanghai and Capella, rather than at the prior upgrade, Paris and Bellatrix. Jayanthi said that he would reach out to client teams about this minor change to client testing support. In addition, Jayanthi announced that the public Zhejiang testnet, which was created for testing withdrawals, would be deprecated next Wednesday, April 19. Any users or developers actively testing code on the Zhejiang testnet are encouraged to reach out to Jayanthi if more time is needed.
The successful completion of Shanghai also means that there will be a new release of CL specifications. Up until now, the code for the Capella upgrade, which is the name of the upgrade on the CL, has been recorded on GitHub as a “release candidate.” New CL specifications incorporating Capella code changes as a non-release candidate will be published in the coming days, said Danny Ryan. On the EL side, the Ethereum Improvement Proposals (EIPs) that were included in Shanghai should be marked as “final.” The authors of Shanghai EIPs are encouraged to open a pull request on GitHub to update these files and mark them as final.
Following the discussion on Shanghai, developers turned to EIP candidates for Cancun, in particular EIP 4788, which is a code change to expose data about the state root of the CL in the EL. There are a few key benefits to doing this. The main one is that proofs about the state of the Beacon Chain could be created and verified in a trust-minimized way by smart contracts, such as decentralized staking pools. Since the EIP was proposed, Ethereum Foundation researcher Alex Stokes has made several updates to the EIP draft, including but not limited to:
Exposing the Beacon Chain block root in the EVM rather than the state root,
Using a ring buffer to ensure that the storage used by this feature stays constant over time,
Using header timestamps to derive slot numbers rather than consuming additional header space.
Regarding the use of block header timestamps, Stokes asked EL client teams whether they would be comfortable implementing this logic, given that it creates greater interdependence between the EL and CL, which would otherwise be kept separate. Developers discussed a few alternatives to block header timestamps on the call and agreed to revisit the matter during next week’s CL call.
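To make the ring buffer and timestamp ideas from the EIP 4788 discussion more concrete, here is a rough Python sketch. It is not the EIP's specified storage layout: the buffer length and the class and method names are hypothetical. The only constants assumed from outside the call notes are the 12-second slot duration and the mainnet Beacon Chain genesis time of 1606824023 (Unix seconds).

```python
from typing import List, Optional

SECONDS_PER_SLOT = 12        # mainnet slot duration
GENESIS_TIME = 1606824023    # mainnet Beacon Chain genesis, Unix seconds
BUFFER_LENGTH = 8            # hypothetical ring size, chosen only for illustration


def slot_from_timestamp(timestamp: int, genesis_time: int = GENESIS_TIME) -> int:
    """Derive the slot number from a block header timestamp."""
    if timestamp < genesis_time:
        raise ValueError("timestamp precedes Beacon Chain genesis")
    return (timestamp - genesis_time) // SECONDS_PER_SLOT


class BeaconRootBuffer:
    """Fixed-size ring buffer: storage stays constant as new roots overwrite old ones."""

    def __init__(self, length: int = BUFFER_LENGTH) -> None:
        self._length = length
        self._slots: List[Optional[int]] = [None] * length
        self._roots: List[Optional[bytes]] = [None] * length

    def store(self, timestamp: int, root: bytes) -> None:
        slot = slot_from_timestamp(timestamp)
        index = slot % self._length  # wrap around instead of growing
        self._slots[index] = slot
        self._roots[index] = root

    def lookup(self, timestamp: int) -> Optional[bytes]:
        slot = slot_from_timestamp(timestamp)
        index = slot % self._length
        # Return the root only if this entry has not been overwritten by a later slot.
        return self._roots[index] if self._slots[index] == slot else None
```

Because each slot maps to `slot % length`, a root is available for lookup only until a later slot wraps around to the same position, which is the trade-off that keeps the storage footprint fixed.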
Alongside EIP 4788, there are at least nine other EIPs being considered for inclusion in the next Ethereum upgrade. Rather than decide on which to include for Cancun/Deneb one by one, Beiko suggested that developers review them all at a high level over the next few weeks and then evaluate them holistically in a single discussion. Beiko encouraged developers to reference the Cancun meta thread discussion forum on the Ethereum Magicians website for a complete list of proposed EIPs. Given the volume of proposals, only a handful of them will be prioritized for the next Ethereum upgrade. The single EIP that has been confirmed for inclusion in Cancun/Deneb is EIP 4844, which reduces the costs of Layer-2 transactions. For more information about EIP 4844, read prior call notes here.
Mikhail Kalinin from the Teku (CL) client team highlighted a minor change to the Ethereum execution API around block payload IDs. The change was prompted by a bug initially discovered in the Nimbus/Besu (CL/EL) client combination that also affects other clients such as the Nethermind (EL) client. The proposed change clarifies that block payload IDs should be unique values for each instance of the “payloadattributes” field. For more information on this change, read the GitHub pull request here.
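One way an EL client could satisfy the uniqueness property is to derive the payload ID from a hash over the head block and every field of the payload attributes. The Python sketch below illustrates that idea under stated assumptions: the hashing scheme, the `compute_payload_id` function, and the tuple layout for withdrawals are hypothetical and are not taken from the API specification or from any client. The field names mirror the Shanghai-era payload attributes (timestamp, prevRandao, suggested fee recipient, withdrawals), and the ID is truncated to 8 bytes.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple

# Hypothetical withdrawal layout: (index, validator_index, address, amount)
Withdrawal = Tuple[int, int, bytes, int]


@dataclass(frozen=True)
class PayloadAttributes:
    timestamp: int
    prev_randao: bytes
    suggested_fee_recipient: bytes
    withdrawals: Tuple[Withdrawal, ...] = ()


def compute_payload_id(head_block_hash: bytes, attrs: PayloadAttributes) -> bytes:
    """Hash every input so that any change in the attributes changes the ID."""
    h = hashlib.sha256()
    h.update(head_block_hash)
    h.update(attrs.timestamp.to_bytes(8, "big"))
    h.update(attrs.prev_randao)
    h.update(attrs.suggested_fee_recipient)
    for index, validator_index, address, amount in attrs.withdrawals:
        h.update(index.to_bytes(8, "big"))
        h.update(validator_index.to_bytes(8, "big"))
        h.update(address)
        h.update(amount.to_bytes(8, "big"))
    return h.digest()[:8]  # truncate the digest to an 8-byte identifier
```

With this approach, two requests that differ only in, say, the suggested fee recipient yield different IDs, while repeating an identical request yields the same ID.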
Alex Stokes, a researcher at the EF, highlighted that there is a custom JSON RPC implementation in the Geth (EL) client that enables verification of blocks from third-party builders in the Flashbots relay. For the sake of creating greater client diversity not only in the protocol layer of Ethereum but also among MEV relay operators, Stokes asked whether client teams would have an appetite for creating similar RPC implementations to support builder block verification across all EL clients. EL client teams were not particularly vocal about their support one way or the other, so Stokes agreed to work on building the specifications for the RPC implementation for now.