Friday, December 10, 2010

dual clock fifo / ram - gray codes

A long time ago I tried to write a dual clock fifo, it was fun, and I somewhat succeeded. I have now done it again, only this time with a lot more knowledge and expertise.

The main issues are how to create a dual clock ram that is not dependent on the existence or speeds of the clocks involved, and how to keep track of full/empty/... signals in the different clock domains.

Creating a dual clock ram / dpr (dual port ram) is an interesting challenge. To do this properly, you can't just transfer requests from one domain to the other, b/c that's ugly... and the other clock might be off, it might be really slow in comparison, and bunches of other reasons. Perhaps you could create one really high speed clock, but that's ugly too, and costs more power, and is wasteful and more difficult to implement in my opinion.

So how do you do it? Use the wea (write enable port a), and web (write enable port b) as the inputs of an or gate which will be your clock to the ram elements. My code samples the write enables in their individual clock domains at the falling edge, and outputs a clean wea and web. Make sure the address decode logic is an input to the d of the wea and web, and not after. If the decode logic is after you will have lots of glitchy writes to addresses you didn't mean to write to. Use the we (wea or'ed with web) to clock the ram elements. Of course you will need a separate we signal for each ram element. So far this seems to work fairly reliably. You will need to check timing too...

Now onto the fifo full/empty signals. Obviously calculate the pointers against each other. But how do you get the pointer from the other side? Use a gray code!!! Convert the pointers to gray code in the originating clock domain (and sample them), then sample them twice in the destination clock domain. Then convert them back. That's the whole story. Pretty easy I'd say.

The reason you need a gray code is so that for a given increment of 1, the change in the gray code is always guaranteed to be only 1 bit of difference. The gray code I use is:
gray = (binary >> 1) ^ binary.

For example
000 = 000 >> 1 ^ 000 = 000 ^ 000 = 000
001 = 001 >> 1 ^ 001 = 000 ^ 001 = 001
010 = 010 >> 1 ^ 010 = 001 ^ 010 = 011
011 = 011 >> 1 ^ 011 = 001 ^ 011 = 010

Note that from binary 1 (001) to binary 2 (010) there are 2 bits that must change. If they are sampled without converting to gray code then you might sample the change from 1 to 2 as 011 (3). This is because the second bit arrived at the destination clock domain right before the rising clock and the first bit arrived right after the rising clock.
0 0 0 0 0 0 0 0 0      bit 2
0 0 0 0 0 0 1 1 1      bit 1
1 1 1 1 1 1 1 1 0      bit 0
......................|...      clock

This is an example of how you could clock in a completely incorrect value. Of course if the clock had risen a few moments later, or even a few moments earlier, you wouldn't have gotten this value, but when dealing with unrelated clock domains, you have way of guaranteeing when the clock will rise. Also note that this issue will come up regardless of how close you make the wires, b/c there will ALWAYS be a point in the middle where one of the bits updated and the other hasn't.

When using gray code you get:
1 = 001
2 = 011

0 0 0 0 0 0 0 0 0      bit 2
0 0 0 0 0 0 1 1 1      bit 1
1 1 1 1 1 1 1 1 1      bit 0
......................|...      clock

No matter where the clock might rise in the change between value 1 and value 2, you will either get the previous (value 1) or the new (value 2) value. You will never sample a completely incorrect value since only one bit changes.

For those like me looking for an easy bit of code to create a gray code:

Gray Code Functions in VHDL: (from memory)

function to_gray ( b : std_logic_vector(3 downto 0)) return std_logic_vector(3 downto 0)
 variable g : std_logic_vector(3 downto 0);
begin
 g := b xor ('0' & b(3 downto 1));
 return g;
end function;

function from_gray ( g : std_logic_vector(3 downto 0)) return std_logic_vector(3 downto 0)
 variable b : std_logic_vector(3 downto 0);
begin
 b(3) := g(3);
 for i in 0 to 2 loop
  b(2 - i) := b(3 - i) xor g(2 - i);
 end loop;
 return b;
end function;

Although a more efficient way to do this would be to build a table of some sort...