I've been playing around with various DRAM modules on a ZCU104 trying to get the setup to work, but I've been largely unsuccessful so far. I'm still trying different troubleshooting steps, but as I've dug deeper into how the whole system works, I've noticed that the clock frequency used on the ZCU104 results in a DRAM clock frequency of 500 MHz (tCK = 2 ns). This surprises me, since many DRAM chip data sheets specify maximum clock periods of 1.8 or 1.5 ns. Some of the DIMMs I'm testing only specify operation down to 1600 MT/s, which corresponds to an 800 MHz clock.
I've tried synthesizing with a higher clock frequency, but the LiteX design uses the ISERDESE3 blocks in the UltraScale+ I/O, which have a minimum clock period of 1.6 ns (a 625 MHz maximum). That doesn't even reach the 666 MHz nominally required for the 1333 MT/s speed grade. It seems that enabling realistic DDR4 speeds would require changes to the upstream LiteX PHY before timing errors from the rowhammer-tester framework itself even become relevant.
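To double-check the arithmetic, here is a minimal Python sketch of the conversions I'm using (the 1.6 ns figure is the ISERDESE3 limit mentioned above; the MT/s values are just examples):

```python
# DDR4 transfers data on both clock edges, so data rate (MT/s) = 2 * clock (MHz).
ISERDESE3_MIN_PERIOD_NS = 1.6  # UltraScale+ ISERDESE3 limit, i.e. 625 MHz max

def clock_mhz(mt_per_s):
    return mt_per_s / 2

def tck_ns(clk_mhz):
    return 1000.0 / clk_mhz

for mts in (1000, 1333, 1600):
    clk = clock_mhz(mts)
    tck = tck_ns(clk)
    ok = tck >= ISERDESE3_MIN_PERIOD_NS
    print(f"{mts} MT/s -> {clk:.1f} MHz, tCK = {tck:.3f} ns, within ISERDESE3 limit: {ok}")

# 1000 MT/s -> 500.0 MHz, tCK = 2.000 ns, within ISERDESE3 limit: True
# 1333 MT/s -> 666.5 MHz, tCK = 1.500 ns, within ISERDESE3 limit: False
# 1600 MT/s -> 800.0 MHz, tCK = 1.250 ns, within ISERDESE3 limit: False
```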
I'm still troubleshooting my setup, but at this point I cannot rule out the possibility that the setup is actually running too slow for my DIMMs to operate correctly. Does anybody have any comments on this? I know we normally worry about things running too fast, not too slow, but the fact that DRAM data sheets give maximum tCK specifications leads me to believe there is some reason for those limits. Perhaps there's a clock synchronizer on the other end that only works above certain frequencies? Perhaps it has to do with refresh?
An example of an error I'm seeing on one of my modules is intermittent memory training failure: about half the time, a single bit fails in the memtest at the end of memory training. I've checked and double-checked the module configuration I'm using to build, and I've tried both a Python class defined from the data sheet timings and building from the parameters in SPD. The single-bit failure in memory training translates to 1-bit errors in ~200 rows that should not experience errors (because they are far away from the rows being hammered) whenever I run the hw_rowhammer.py script with --experiment-no 1. Conspicuously, these 1-bit errors in distant rows do not seem to occur when I run the software rowhammer.py script. I haven't looked into why this might be the case, but my hypothesis is that the BIST logic runs the DRAM at its "capacity", while the Wishbone rowhammer reads allow much more slack in timing.
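For clarity, this is roughly how I'm deciding which flipped rows count as "distant" — a hypothetical helper, not part of the rowhammer-tester API; the row lists and the near_distance threshold below are my own assumptions:

```python
# Hypothetical helper: split rows with bit errors into "near" (plausible
# Rowhammer victims adjacent to the hammered pair) and "far" (rows that
# should be unaffected, so errors there suggest a setup problem instead).
def classify_errors(error_rows, hammered_rows, near_distance=2):
    near, far = [], []
    for row in error_rows:
        dist = min(abs(row - h) for h in hammered_rows)
        (near if dist <= near_distance else far).append(row)
    return near, far

# Example: hammering rows 100 and 102; a flip thousands of rows away
# points at training/timing issues rather than a Rowhammer effect.
near, far = classify_errors(error_rows=[99, 101, 5000, 12345],
                            hammered_rows=[100, 102])
print(f"{len(near)} near-victim rows, {len(far)} suspicious distant rows")
```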
I'd appreciate any thoughts or advice on why I might be seeing errors like this, and whether the fact that tCK is above any realistic DDR4 value could be the cause.
Thanks,
Jacob
> I'm still troubleshooting my setup, but at this point I cannot rule out the possibility that the setup is actually running too slow for my DIMMs to operate correctly. Does anybody have any comments on this?
Hi @jaccharrison, I believe that increasing the speed could potentially make the results more stable. I think that the ranges specified in the DDR4 spec are the ones that have been thoroughly tested, while speeds that fall outside those ranges, despite still working, may exhibit instability.
That said, lower speeds do not necessarily fail, at least in the experiments we ran. For instance, we have tested DDR4 RDIMMs using the data center board tester at speeds down to 800 MT/s, or even 400 MT/s, and memory training still passes correctly.
Increasing the MT/s would require increasing either the sys_clk frequency or the phase count, which, if I am not mistaken, is set to 4 by default when instantiating a DDR4 memory.
A higher sys_clk could also cause timing failures during place & route, but that is something that can be tested to verify whether increasing the clock frequency makes things more stable.
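To illustrate the relationship (a sketch assuming the usual LiteDRAM convention that the DRAM clock is sys_clk times the phase count, and that DDR4 performs two transfers per DRAM clock; the numbers below are only examples):

```python
# Assumed LiteDRAM-style relationship: the PHY handles `nphases` DRAM clock
# cycles per sys_clk cycle, so dram_clk = sys_clk * nphases, and DDR4 gives
# data_rate (MT/s) = 2 * dram_clk (MHz).
def data_rate_mts(sys_clk_mhz, nphases=4):
    return 2 * sys_clk_mhz * nphases

def required_sys_clk_mhz(target_mts, nphases=4):
    return target_mts / (2 * nphases)

print(data_rate_mts(125))          # 125 MHz sys_clk, 4 phases -> 1000 MT/s
print(required_sys_clk_mhz(1600))  # 1600 MT/s with 4 phases -> 200.0 MHz sys_clk
```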