Tales of FPGA - Road to SDRAM - E03

Posted on September 13, 2022 in hacks

This post is the third in a series that starts with Episode 1, in which we presented what FPGAs are and how to control them, and continues with Episode 2, in which we discovered the importance of time.

The ultimate goal of this series of posts is to provide a simple working example of a design that can reliably access the SDRAM chip on a Digilent Arty S7 FPGA development board.

In order to understand why we want to access that chip, let us first review the different types of RAM that can exist in a computer.

SRAM

The first and simplest memory type is Static Random Access Memory (SRAM). It is an array of memory cells and exposes an address bus, a data bus, a read/write (RW) signal, and a clock input.

On each clock tick, depending on the RW signal, either the cells selected by the address bus are updated with the value on the data bus, or the data bus is updated with the value held in those cells.

Each cell (i.e. each bit) is implemented with about six transistors, and is stable, or static: it keeps its value around for as long as power is supplied. Circuitry is minimal, and it is fast, as in: it can typically keep up with the CPU speed. In other words, the CPU can request a value on a clock tick, and read that value on the next clock tick.
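To make this behaviour concrete, here is a minimal Python sketch of such a memory. It is only a toy model, not a description of any real part: the class name `SyncSRAM`, the `tick` method and the bus widths are all made up for illustration.

```python
class SyncSRAM:
    """Toy model of a synchronous SRAM: one read or write per clock tick."""

    def __init__(self, address_bits: int, data_bits: int) -> None:
        self.data_mask = (1 << data_bits) - 1    # keeps values within the bus width
        self.cells = [0] * (1 << address_bits)   # one word per address
        self.data_out = 0                        # value driven on the data bus by a read

    def tick(self, address: int, write: bool, data_in: int = 0) -> None:
        """Simulate one clock tick with the given address, RW and data values."""
        if write:
            self.cells[address] = data_in & self.data_mask
        else:
            self.data_out = self.cells[address]


ram = SyncSRAM(address_bits=4, data_bits=8)
ram.tick(address=3, write=True, data_in=0xAB)  # tick 1: store 0xAB at address 3
ram.tick(address=3, write=False)               # tick 2: request address 3
print(hex(ram.data_out))                       # prints 0xab
```

The point of the sketch is that nothing else is needed: no refresh, no row selection, just one bus transaction per tick.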

(S)DRAM

Unfortunately, due to their transistor count, SRAM cells require a relatively large amount of silicon, which implies low density and high costs. For instance, a modern Intel Core i9 CPU carries only about 90 KB of SRAM L1 cache per core.

In order to increase density and lower costs, cells have to be simplified down to the simplest cell, which is composed of only one transistor and one capacitor: this type of cell is at the core of most modern memory types.

Its drawback is that it is not stable: its capacitor charge leaks and the cell loses its value after a while. Additional circuitry is therefore required to refresh (i.e. access) it periodically. Hence the name: Dynamic Random Access Memory (DRAM).
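The leak-and-refresh idea can be sketched with a toy model. Everything here is an assumption chosen for illustration: the `DRAMCell` class, the 10% leak per tick and the 0.5 read threshold do not correspond to any real device.

```python
from typing import Optional


class DRAMCell:
    """Toy one-transistor, one-capacitor cell whose charge leaks over time."""

    THRESHOLD = 0.5  # charge above this level reads back as 1 (arbitrary value)

    def __init__(self, leak_per_tick: float = 0.1) -> None:
        self.leak_per_tick = leak_per_tick  # fraction of charge lost per tick (made up)
        self.charge = 0.0

    def write(self, bit: int) -> None:
        self.charge = 1.0 if bit else 0.0

    def tick(self) -> None:
        self.charge *= 1.0 - self.leak_per_tick

    def read(self) -> int:
        return 1 if self.charge > self.THRESHOLD else 0

    def refresh(self) -> None:
        # A refresh reads the cell and writes the same value back at full charge.
        self.write(self.read())


def stored_bit_after(ticks: int, refresh_every: Optional[int]) -> int:
    """Write a 1, let time pass, and return what the cell reads back."""
    cell = DRAMCell()
    cell.write(1)
    for t in range(1, ticks + 1):
        cell.tick()
        if refresh_every is not None and t % refresh_every == 0:
            cell.refresh()
    return cell.read()


print(stored_bit_after(20, refresh_every=None))  # the 1 has leaked away
print(stored_bit_after(20, refresh_every=4))     # the 1 survives thanks to refreshes
```

With these made-up numbers the cell forgets its value after a handful of ticks unless it is refreshed often enough, which is exactly the constraint the extra circuitry has to satisfy.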

Due to their need to be refreshed, but also their density, DRAM chips have a different and more complex interface. Cells are organized in banks, rows, and columns, which must be selected before any read or write can take place. A read or a write now takes several clock ticks, and more control lines determine which operation to perform.
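As an illustration of the bank/row/column organization, the following sketch splits a linear address into the three fields. The bit widths and the bank:row:column ordering are hypothetical; a real memory controller may use different sizes and a different mapping.

```python
from typing import NamedTuple

# Hypothetical geometry, chosen only for this example.
BANK_BITS = 3
ROW_BITS = 14
COLUMN_BITS = 10


class DRAMAddress(NamedTuple):
    bank: int
    row: int
    column: int


def split_address(linear: int) -> DRAMAddress:
    """Split a linear address into bank, row and column fields."""
    column = linear & ((1 << COLUMN_BITS) - 1)
    row = (linear >> COLUMN_BITS) & ((1 << ROW_BITS) - 1)
    bank = (linear >> (COLUMN_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return DRAMAddress(bank, row, column)


# Two consecutive addresses land in the same bank and row and differ only by column,
# so with this mapping the second access would not need a new row selection.
print(split_address(0x1234567))
print(split_address(0x1234568))
```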

Simple, original DRAM operated asynchronously from the CPU's clock. In order to increase efficiency and throughput, it evolved into Synchronous DRAM (SDRAM), where everything is timed by a clock signal. This allows for more optimizations such as pipelining: for instance, selecting the rows and columns of the next read while receiving data for the previous read.
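A rough back-of-the-envelope model shows why pipelining matters. It assumes, as a simplification, that every read takes a fixed number of ticks and that a pipelined controller can start a new read on every tick; real SDRAM timing has many more constraints. The function names are made up for this sketch.

```python
def cycles_sequential(reads: int, latency: int) -> int:
    """Each read waits for the previous one to finish completely."""
    return reads * latency


def cycles_pipelined(reads: int, latency: int) -> int:
    """A new read is issued every tick while earlier ones are still in flight."""
    if reads == 0:
        return 0
    return latency + (reads - 1)


# Eight reads with a hypothetical latency of five ticks each.
print(cycles_sequential(8, 5))  # prints 40
print(cycles_pipelined(8, 5))   # prints 12
```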

DDR SDRAM

In order to further increase the memory bandwidth and reduce latency, the frequency of the data line would need to increase. However, a higher frequency means more energy consumption and heat, which is the issue faced by all CPUs today.

To mitigate these side effects, Double Data Rate (DDR) SDRAM was introduced: it keeps the same clock frequency but transfers data on both the rising and the falling edge of the clock, rather than once per clock cycle, thus doubling the bandwidth.
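The doubling is easy to check with a little arithmetic. The sketch below computes a theoretical peak bandwidth; the 100 MHz clock and the 16-bit data bus are example values picked for illustration, not the specifications of any particular chip.

```python
def peak_bandwidth(clock_hz: int, bus_width_bits: int, transfers_per_cycle: int) -> float:
    """Theoretical peak bandwidth in bytes per second."""
    return clock_hz * transfers_per_cycle * bus_width_bits / 8


CLOCK_HZ = 100_000_000  # example clock frequency: 100 MHz
BUS_WIDTH_BITS = 16     # example data bus width

sdr = peak_bandwidth(CLOCK_HZ, BUS_WIDTH_BITS, transfers_per_cycle=1)  # one edge
ddr = peak_bandwidth(CLOCK_HZ, BUS_WIDTH_BITS, transfers_per_cycle=2)  # both edges

print(sdr / 1e6, "MB/s")  # 200.0 MB/s
print(ddr / 1e6, "MB/s")  # 400.0 MB/s
```

These are peak figures only: refresh cycles and row selection reduce what can actually be sustained.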

DDR was followed by the DDR2, DDR3, etc. standards, which add more "tricks" to the SDRAM interface, such as burst reads and writes, all in order to increase memory speed.

Nevertheless, and despite all optimizations, SDRAM simply cannot keep up with modern CPU speeds anymore. The CPU cannot expect to request a value and get it on the next clock tick, as it did with SRAM when computers had only a few KB of RAM. Modern CPUs have to rely on various levels of caches and all sorts of optimizations in order to remain busy while waiting for memory reads and writes.

FPGA RAM

Now that we have reviewed the different types of memory, let us consider what exists on our FPGA board: SRAM blocks inside the FPGA itself, and a separate DDR3 SDRAM chip.

Accessing the SRAM blocks is relatively easy. Once they are instantiated in a design, one "just" needs to drive the address and data buses correctly. And it is fast. Alas, the total available capacity is quite small.

Which leads us to the SDRAM chip. However, accessing that chip is an entirely different story, and the DDR3 protocol is a complex beast to implement. That will be the topic of the next post.
