Large array crashes ghdl #812

Open
wsneijers opened this issue May 2, 2019 · 13 comments

@wsneijers

Hello,

I've been running into an issue with GHDL when using large arrays in VHDL (declared as a type). I have already seen previous issues on the same subject: #342, #471 and #611. In those issues the problem is marked as solved. However, I still have the same sort of issue, even with the latest master sources.

In my case I'm using a test setup combining cocotb and GHDL, where cocotb creates the test stimuli and monitors. I don't know if this makes a difference, but I thought I'd mention it for completeness. I'm trying to simulate the following file:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use IEEE.std_logic_unsigned.all;

entity cosim_test is
end cosim_test;

architecture rtl of cosim_test is

type ram_type is array(0 to (2**28)-1) of std_logic_vector(127 downto 0);
signal ram : ram_type := (others => (others => '0'));

begin

end rtl;

For me, this causes the following output in the testbench:

loading VPI module '/home/docker/cocotb/build/libs/x86_64/libvpi.so'
     -.--ns INFO     cocotb.gpi                                  gpi_embed.c:114  in embed_init_python               Did not detect virtual environment. Using system-wide Python interpreter.
     -.--ns INFO     cocotb.gpi                                GpiCommon.cpp:91   in gpi_print_registered_impl       VPI registered
VPI module loaded!
/home/docker/cocotb/makefiles/simulators/Makefile.ghdl:63: recipe for target 'results.xml' failed
make[4]: *** [results.xml] Error 255
/home/docker/cocotb/makefiles/Makefile.sim:71: recipe for target 'sim' failed
make[3]: *** [sim] Error 2
CMakeFiles/Simulation.dir/build.make:68: recipe for target 'run_simulation' failed
make[2]: *** [run_simulation] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Simulation.dir/all' failed
make[1]: *** [CMakeFiles/Simulation.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

When I decrease the size of the array I get to a point where it starts working again:

type ram_type is array(0 to (2**15)-1) of std_logic_vector(127 downto 0);
signal ram : ram_type := (others => (others => '0'));

However, the simulation is very slow! I already tried debugging with GDB, breaking at _exit and __ghdl_fatal. The breakpoint at __ghdl_fatal does not work. The breakpoint at _exit does; however, the output is not very useful:

Breakpoint 1, __GI__exit (status=status@entry=-1) at ../sysdeps/unix/sysv/linux/_exit.c:27
27	../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.
(gdb) backtrace
#0  __GI__exit (status=status@entry=-1) at ../sysdeps/unix/sysv/linux/_exit.c:27
#1  0x00007f6dbfcbdfab in __run_exit_handlers (status=-1, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:97
#2  0x00007f6dbfcbe045 in __GI_exit (status=<optimized out>) at exit.c:104
#3  0x00007f6dbfca4837 in __libc_start_main (main=0x406b72 <main>, argc=11, argv=0x7fffcfd60c28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffcfd60c18) at ../csu/libc-start.c:325
#4  0x0000000000406029 in _start ()

At this point I'm kind of lost as to what to do.

@umarcor
Member

umarcor commented May 2, 2019

This might be related to #752.

@wsneijers, did you try increasing the size of the stack as suggested in #611 (comment)?

@wsneijers
Author

@umarcor Yes, I did try increasing the stack size, but it did not work. I don't know if it is related to #752; it could definitely be. That one is using VUnit, however, and not cocotb.

@tgingold
Member

tgingold commented May 2, 2019

2**28 * 128 is a huge number of signals.
Can you use a variable instead?
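
A minimal sketch of what this suggestion could look like, assuming the memory is only accessed from a single process. The entity name, the ports and the reduced depth of 2**15 are hypothetical and chosen only for illustration; this is not taken from the reporter's design and has not been checked against the 2**28 case.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: name, ports and depth are assumptions for illustration.
entity ram_variable_sketch is
    port (
        clk   : in  std_logic;
        we    : in  std_logic;
        addr  : in  unsigned(14 downto 0);
        wdata : in  std_logic_vector(127 downto 0);
        rdata : out std_logic_vector(127 downto 0)
    );
end entity ram_variable_sketch;

architecture sim of ram_variable_sketch is
    type ram_type is array (0 to (2**15) - 1) of std_logic_vector(127 downto 0);
begin
    ram_proc : process (clk)
        -- The storage is a process variable rather than a signal, so the
        -- simulator only has to hold the values and does not create a
        -- signal for every element of the array.
        variable ram : ram_type := (others => (others => '0'));
    begin
        if rising_edge(clk) then
            if we = '1' then
                -- Variable assignment takes effect immediately.
                ram(to_integer(addr)) := wdata;
            end if;
            -- Because the write above is immediate, a read of the same
            -- address in the same cycle returns the newly written data.
            rdata <= ram(to_integer(addr));
        end if;
    end process ram_proc;
end architecture sim;
```

If several processes need access to the memory, a shared variable of a protected type would be one option in VHDL-2008, at the cost of more boilerplate.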

@wsneijers
Author

wsneijers commented May 3, 2019

@tgingold Good suggestion. I never thought of it, thanks! It does solve one problem, the speed: it is now as fast as normal. However, it still breaks with an array size of 2^28, though the error is a little more descriptive:

Starting program: /usr/local/bin/ghdl -r --std=08 --ieee=synopsys -O3 -Wno-binding -frelaxed-rules tb_shaping_dma_controller --vpi=/home/docker/cocotb/build/libs/x86_64/libvpi.so --wave=../wave.ghw --ieee-asserts=disable
warning: Error disabling address space randomization: Operation not permitted
loading VPI module '/home/docker/cocotb/build/libs/x86_64/libvpi.so'
     -.--ns INFO     cocotb.gpi                                  gpi_embed.c:114  in embed_init_python               Did not detect virtual environment. Using system-wide Python interpreter.
     -.--ns INFO     cocotb.gpi                                GpiCommon.cpp:91   in gpi_print_registered_impl       VPI registered
VPI module loaded!
./tb_shaping_dma_controller:error: NULL access dereferenced
./tb_shaping_dma_controller:error: error during elaboration

Breakpoint 1, __GI__exit (status=status@entry=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
27	../sysdeps/unix/sysv/linux/_exit.c: No such file or directory.
(gdb) bt
#0  __GI__exit (status=status@entry=1) at ../sysdeps/unix/sysv/linux/_exit.c:27
#1  0x00007f6bca28efab in __run_exit_handlers (status=1, listp=<optimized out>, run_list_atexit=run_list_atexit@entry=true) at exit.c:97
#2  0x00007f6bca28f045 in __GI_exit (status=<optimized out>) at exit.c:104
#3  0x00007f6bca275837 in __libc_start_main (main=0x406b72 <main>, argc=11, argv=0x7ffca88eeac8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca88eeab8)
    at ../csu/libc-start.c:325
#4  0x0000000000406029 in _start ()
(gdb) 

It does now mention a NULL access dereference. But the __ghdl_fatal breakpoint still does not work.

@go2sh

go2sh commented May 3, 2019

If you think about it: 2**28 * 128 equals 2**35 std_logic elements, which is 32 Gbit even at one bit each. I'm not sure about the internal representation of std_logic, but even with an efficient encoding it needs at least 4 bits per element. If it's implemented as a char (8 bits), you'll need 32 GB of memory just for the data storage, not including any overhead.
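
For reference, the arithmetic behind this estimate can be written out as follows. The 4-bit and 8-bit encodings are the assumptions stated in the comment, not confirmed details of GHDL's internal representation, and any per-element overhead is left out.

```latex
\begin{align*}
N &= 2^{28} \times 128 = 2^{28} \times 2^{7} = 2^{35} \approx 3.4 \times 10^{10} \text{ elements} \\
\text{at 4 bits per element:} \quad & 2^{35} \times 0.5\ \text{B} = 2^{34}\ \text{B} = 16\ \text{GiB} \\
\text{at 8 bits per element:} \quad & 2^{35} \times 1\ \text{B} = 2^{35}\ \text{B} = 32\ \text{GiB}
\end{align*}
```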

@lavovaLampa
Contributor

lavovaLampa commented May 13, 2019

I am having the same problem with a 2D array defined as:

subtype CCD_Width_Range is natural range 0 to 2752 - 1;
subtype CCD_Height_Range is natural range 0 to 2002 - 1;
subtype CCD_Pixel_Data_T is std_logic_vector(12 - 1 downto 0);

type CCD_Matrix_T is array (CCD_Height_Range, CCD_Width_Range) of CCD_Pixel_Data_T;

which should amount to approximately 65 MB. The problem is, as was said, that the array is being allocated on the stack. In the future, would it be possible to detect big allocations and allocate them on the heap instead?
I should have time in the next few months to at least write code to detect and report this condition instead of getting a segmentation fault, if that would be desirable.

However, what's puzzling is that when I tried raising the stack size limit, it didn't immediately crash, but it tried to allocate more than 16 GB of memory (the memory requirement does not seem to scale linearly?).
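
The figure of roughly 65 MB is consistent with storing one byte per std_logic element, which is an assumption here rather than a confirmed detail of GHDL. Under that assumption the raw data size works out as:

```latex
\begin{align*}
2752 \times 2002 &= 5\,509\,504 \text{ pixels} \\
5\,509\,504 \times 12 &= 66\,114\,048 \text{ std\_logic elements} \\
66\,114\,048 \times 1\ \text{B} &\approx 66\ \text{MB} \approx 63\ \text{MiB}
\end{align*}
```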

@tgingold
Are variables allocated on the heap, or are they just faster overall?

@tgingold
Member

tgingold commented May 14, 2019 via email

@lavovaLampa
Contributor

lavovaLampa commented May 15, 2019

Yes, this one is even more interesting. It runs with an 8 MB stack limit, but it still tries to allocate all of my RAM :). I'm running the LLVM flavour, but the same thing should happen under mcode. I know the code has problems; the point is that the memory usage is high even when instantiating the entity with all array elements set to all '0'.

test_pkg.vhd

library ieee;
use ieee.std_logic_1164.all;

package test_pkg is
    subtype CCD_Width_Range is natural range 0 to 2752 - 1;
    subtype CCD_Height_Range is natural range 0 to 2002 - 1;
    subtype CCD_Pixel_Data_T is std_logic_vector(12 - 1 downto 0);
    type CCD_Matrix_T is array (CCD_Height_Range, CCD_Width_Range) of CCD_Pixel_Data_T;
end package test_pkg;

array_test.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.test_pkg.CCD_Matrix_T;

entity array_test is
    port(
        clkIn, rstAsyncIn : in std_logic;
        ccdArray          : in CCD_Matrix_T
    );
end entity array_test;

architecture RTL of array_test is
begin
    ctrlProc : process(clkIn, rstAsyncIn)
        variable currWidth, currHeight : natural := 0;
    begin
        if rstAsyncIn = '1' then
            currWidth  := 0;
            currHeight := 0;
        elsif rising_edge(clkIn) then
            report "Current pixel: " & to_hstring(ccdArray(currHeight, currWidth));
            if currWidth >= ccdArray'high(2) then
                currWidth  := 0;
                currHeight := currHeight + 1;
            else
                currWidth := currWidth + 1;
            end if;
        end if;
    end process ctrlProc;
end architecture RTL;

array_test_tb.vhd

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.test_pkg.all;

entity tb_array_test is
end tb_array_test;

architecture tb of tb_array_test is
    signal clkIn      : std_logic;
    signal rstAsyncIn : std_logic;
    signal ccdArray   : ccd_matrix_t;

    constant TbPeriod : time      := 1000 ns;
    signal TbClock    : std_logic := '0';
    signal TbSimEnded : std_logic := '0';

begin

    dut : entity work.array_test
        port map(
            clkIn      => clkIn,
            rstAsyncIn => rstAsyncIn,
            ccdArray   => (others => (others => X"000"))
        );

    TbClock <= not TbClock after TbPeriod / 2 when TbSimEnded /= '1' else '0';
    clkIn   <= TbClock;

    stimuli : process
    begin
        TbSimEnded <= '1';
        wait;
    end process;

end tb;

@tgingold
Member

tgingold commented May 16, 2019 via email

@tgingold
Member

tgingold commented May 16, 2019 via email

@lavovaLampa
Contributor

lavovaLampa commented May 18, 2019

Where's the ~60E6 coming from? I see that my model needs:
2752 * 2002 * 144B ~= 760MB * 3 (driver, TB, model) ~= 2300MB

Where did I err in my calculations?

Otherwise, thanks. I will rewrite it to use variables 👍.

@tgingold
Member

tgingold commented May 18, 2019 via email

@lavovaLampa
Contributor

Ah, thanks again. It makes sense now.

@eine eine added the Bug label Sep 16, 2019
@eine eine added this to the v1.0 milestone May 8, 2020
@umarcor umarcor modified the milestones: v1.0, v2.0 Feb 3, 2021
@umarcor umarcor modified the milestones: v2.0, v3.0 Mar 1, 2022
@umarcor umarcor modified the milestones: v3.0, v4.0 Mar 8, 2023