Falcon v3 board. Plans for 128 universes

smeighan · Nov 23, 2016

I did not see whether Dave Pitts had posted here about this discussion of his new v3 board.

Dave Pitts wrote:

F16V3 latency and refresh rates. Open discussion.

In the new design I was trying to optimize the processor RAM allocation to allow for at least 100 universes. So I optimized the Ethernet DMA memory descriptors to limit the receive buffers to just over one E1.31/Art-Net universe and picked up another 70K of memory. I am now receiving 150 universes at 25 ms timing.

Not that I would make the controller receive/output 150 universes (maybe with a firmware update if needed), but 128 is the goal and I foresee no roadblocks ahead. I thought 64 universes was excessive, but since many users, new and old, care about stats on a spreadsheet when purchasing a controller, without factoring in what it would take to make a 1020 channel string, I am deciding to do it. The 1360 string will have a refresh rate of 50 ms (20 Hz) for both data being received and data being sent out, as any lower refresh rate is too slow for most use cases.
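A rough sanity check on those numbers, as a sketch only. The post does not name a pixel protocol, so this assumes WS2811-style pixels (24 bits per pixel at 800 kbit/s, roughly 50 µs reset gap) and the common practice of packing 510 channels (170 RGB pixels) into each universe. All constants and function names below are illustrative, not taken from the Falcon firmware.

```python
# Assumptions (not from the post): WS2811-style timing and 510 channels/universe.
BITS_PER_PIXEL = 24          # 3 colour channels x 8 bits
BIT_RATE_HZ = 800_000        # assumed pixel data rate
RESET_S = 50e-6              # assumed reset/latch gap between frames
CHANNELS_PER_UNIVERSE = 510  # 170 RGB pixels per universe (assumed packing)


def channels_per_output(universes: int, outputs: int = 16) -> int:
    """Channels each output carries if universes are spread evenly."""
    return universes * CHANNELS_PER_UNIVERSE // outputs


def frame_time_ms(pixels: int) -> float:
    """Time to clock one full frame out to a string of `pixels` pixels."""
    return (pixels * BITS_PER_PIXEL / BIT_RATE_HZ + RESET_S) * 1000.0


chans = channels_per_output(128)   # 4080 channels per output
pixels = chans // 3                # 1360 RGB pixels per output
print(chans, pixels, round(frame_time_ms(pixels), 2))
```

Under these assumptions, 128 universes over 16 outputs gives 1360 pixels per output, and one frame takes about 40.85 ms on the wire, which fits inside the 50 ms refresh period quoted above.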

When I get the DMA-driven 50 MHz, 16-bit bus going in the F16V3, the transfer time to the FPGA should be about 0.8 ms, which should reduce the latency by a factor of 16. The processing latency was eliminated long ago with a one-to-one mapping of output data to input data using a 65K lookup table.
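The 0.8 ms figure can be roughly reproduced with a back-of-envelope calculation. This sketch assumes one 16-bit word per bus clock with no setup or wait-state overhead, and full 512-slot universes; neither assumption is stated in the post.

```python
import math

BUS_CLOCK_HZ = 50_000_000   # 50 MHz, from the post
BUS_WIDTH_BITS = 16         # 16-bit bus, from the post
UNIVERSES = 128             # target universe count, from the post
SLOTS_PER_UNIVERSE = 512    # assumption: full DMX-size universes


def bus_transfer_ms(n_bytes: int,
                    clock_hz: int = BUS_CLOCK_HZ,
                    width_bits: int = BUS_WIDTH_BITS) -> float:
    """Ideal transfer time, assuming one bus word moves per clock cycle."""
    words = math.ceil(n_bytes * 8 / width_bits)
    return words / clock_hz * 1000.0


frame_bytes = UNIVERSES * SLOTS_PER_UNIVERSE   # 65536 bytes per frame
print(frame_bytes, round(bus_transfer_ms(frame_bytes), 3))
```

The ideal figure comes out near 0.66 ms for 65,536 bytes, so about 0.8 ms is plausible once real bus overhead is included. The 65,536-byte frame also lines up with the 65K lookup table mentioned above.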

So the data flow is as follows.

1) Receive data via Ethernet DMA to memory defined by buffer descriptors. (160 buffers, DMA without processor)
2) Transfer incoming data to large input buffer. (This is done to prevent new data from overwriting old data in the 160 receiving buffers.)
3) Use lookup table to create large buffer to send to FPGA. The lookup tables are pre-calculated right when controller boots up or when settings are changed.
This step is a loop that has one statement and no "ifs". Very quick and no processing. Latency is minimal. No figuring out RGB orders or zig zag; it has all been done already.
4) Output data to FPGA via 16-bit bus at 50 MHz (with DMA, without processor). (0.8 ms latency)

5) FPGA outputs all pixel data. The FPGA has embedded in it 16 parallel "Pixel Output Processes" that run independently of each other. They are each capable of outputting all the pixel protocols the F16V2/F16V3 can output. The processes are like 16 small processors dedicated to output pixels really well.
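Step 3 above can be illustrated with a minimal sketch. This is not the Falcon firmware, just one way to fold colour order and zig-zag into a precomputed table so the per-frame loop is a single indexed copy. The function names, the `order` tuple, and the `zigzag_every` parameter are all made up for illustration.

```python
def build_lut(num_pixels, order=(0, 1, 2), zigzag_every=0):
    """Precompute lut so that out[i] = inp[lut[i]] (done at boot or on config change).

    order: source channel offsets in output order, e.g. (1, 0, 2) for GRB.
    zigzag_every: if > 0, every second run of this many pixels is reversed.
    """
    lut = []
    for out_px in range(num_pixels):
        in_px = out_px
        if zigzag_every:
            run, pos = divmod(out_px, zigzag_every)
            if run % 2 == 1:
                # Length of this run (the last run may be shorter).
                run_len = min(zigzag_every, num_pixels - run * zigzag_every)
                in_px = run * zigzag_every + (run_len - 1 - pos)
        for src in order:
            lut.append(in_px * 3 + src)
    return lut


def remap(inp, lut):
    """Per-frame hot path: one indexed copy per channel, no branching."""
    return bytes(inp[i] for i in lut)


# 4 RGB pixels, GRB colour order, zig-zag every 2 pixels.
lut = build_lut(4, order=(1, 0, 2), zigzag_every=2)
frame_in = bytes(range(12))
print(lut)
print(list(remap(frame_in, lut)))
```

All the decisions (colour order, direction reversal) live in `build_lut`, which runs rarely. `remap` does no per-pixel decisions, which is the property the post describes.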

The whole path uses very little processor power as it is done mostly with DMA, lookup tables and a 16 "pixel processor" FPGA design.
Systems without lookup tables and with a limited, slow data path (SPI or similar minimal bus width, < 25 MHz) to the output stage will be subject to a large latency on the data path, and refresh rates will suffer.
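To put a number on that, here is a rough estimate for a narrow serial path. It assumes a single-lane, SPI-like link at 25 MHz with no protocol overhead, carrying the same 128 full universes; these are illustrative figures, not measurements of any particular controller.

```python
import math


def serial_transfer_ms(n_bytes: int, clock_hz: int, lanes: int = 1) -> float:
    """Ideal time to shift n_bytes over `lanes` data lines, one bit per lane per clock."""
    cycles = math.ceil(n_bytes * 8 / lanes)
    return cycles / clock_hz * 1000.0


FRAME_BYTES = 128 * 512     # assumption: 128 full 512-slot universes
FRAME_PERIOD_MS = 50.0      # 20 Hz refresh period, from the post

t = serial_transfer_ms(FRAME_BYTES, 25_000_000)
print(round(t, 2), round(100 * t / FRAME_PERIOD_MS, 1))
```

Under these assumptions the serial transfer alone takes about 21 ms, which is roughly 42% of a 50 ms frame period spent just moving data to the output stage.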

I am open to suggestions on how to lower latency as well as maximize refresh rate.

David Pitts
PixelController, LLC
PixelController.com

multicast · Nov 23, 2016

One of the problems, of course, is that your data rate on the wire is being pushed up. With these single-ended, non-shielded pixels, the amount of RFI that these things start making is not trivial.

marmalade · Nov 23, 2016

Of course, if extremely long strings are used the data rate has to be maxed out, and as you stated, the downside is EMI and sensitivity to cable impedance/signal degradation.

The benefit of this, though, is in the FPGA being able to control multiple smaller strings concurrently at good refresh rates while maintaining lower data rates, and therefore having the option to split longer strings up over multiple universes without CPU overhead or latency.

May only ever be useful for big matrices though! I'm not aware of many people with 1000-pixel-long strings that need to be split up.
