This repository has been archived by the owner on Jan 4, 2021. It is now read-only.

ERROR: invalid memory alloc request size 18446744073709551613 #60

Open
bjne opened this issue Nov 19, 2017 · 18 comments

Comments

@bjne

bjne commented Nov 19, 2017

When using prepared statements stored in _U, I experience weird behavior. I am not able to create a test case
that reproduces it, but what I am seeing is:

  • I start a new psql session
  • call f1() (which prepares a statement in the do block via _U = server.prepare(...):save() and executes it)
  • call f2() (which prepares a different statement the same way and executes it)
  • f2 eventually reaches a row that, when accessed, causes the ERROR
  • the strange thing is that if I add more than 6 lines of commented-out Lua in f2, the bug goes away
  • also, subsequent queries to f1/f2 (in the same session) do not cause this error, only the first one (it seems)
  • the problem does not appear if I don't run f1() before f2()
  • in addition, this seems to happen only when using rows() and not getcursor (read-only or not), but given that adding comment lines "fixes" the rows() problem, this might not be true

Any idea what this is, or where I should look to start debugging it?
Running on pllua/pllua master and PostgreSQL 9.5.

@eugwne
Member

eugwne commented Nov 19, 2017

Is postgres compiled with --enable-debug?

@bjne
Author

bjne commented Nov 19, 2017

No, but I can try that.

@eugwne
Member

eugwne commented Nov 19, 2017

Yes, building postgres with debugging support might help

@golgote
Member

golgote commented Nov 19, 2017 via email

@bjne
Author

bjne commented Nov 19, 2017

Yeah, this is LuaJIT. I tried with plain Lua, and it gives the same problem, but at a slightly different point. Still,
when adding six lines of commented-out code at the top of the function makes such a big difference, something
is wrong here.

@bjne
Author

bjne commented Nov 19, 2017

What I do know is that the rows() iterator accesses perhaps the first 20 rows just fine, but then I hit
a row where accessing one of its columns (a text column) throws the error, so this might have something to do with __tostring (but then why the difference between rows and cursor?). Let me try to pin it down some more.

@bjne
Author

bjne commented Nov 19, 2017

So it seems that only accessing the text columns of the given row gives the error. I tried accessing
booleans and integers in the same row by printing them, and that works just fine, but printing the text column
causes: invalid memory alloc request size 18446744073709551613

@bjne
Author

bjne commented Nov 19, 2017

2017-11-19 19:06:34.192 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  server process (PID 1948) was terminated by signal 11: Segmentation fault
2017-11-19 19:06:34.192 GMT [1858] DETAIL:  Failed process was running: select module_schema('foo', 'bar')
2017-11-19 19:06:34.192 GMT [1858] LOG:  server process (PID 1948) was terminated by signal 11: Segmentation fault
2017-11-19 19:06:34.192 GMT [1858] DETAIL:  Failed process was running: select module_schema('foo', 'bar')
2017-11-19 19:06:34.192 GMT [1858] LOG:  terminating any other active server processes
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1865
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1861
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1860
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1862
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1863
2017-11-19 19:06:34.192 GMT [1858] DEBUG:  sending SIGQUIT to process 1864
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  writing stats file "pg_stat/global.stat"
2017-11-19 19:06:34.192 GMT [1861] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.192 GMT [1861] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.192 GMT [1861] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  writing stats file "pg_stat/db_12328.stat"
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  removing temporary stats file "pg_stat_tmp/db_12328.stat"
2017-11-19 19:06:34.192 GMT [1864] DEBUG:  writing stats file "pg_stat/db_17203.stat"
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  removing temporary stats file "pg_stat_tmp/db_17203.stat"
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  writing stats file "pg_stat/db_0.stat"
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  removing temporary stats file "pg_stat_tmp/db_0.stat"
2017-11-19 19:06:34.193 GMT [1863] WARNING:  terminating connection because of crash of another server process
2017-11-19 19:06:34.193 GMT [1863] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-11-19 19:06:34.193 GMT [1863] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-11-19 19:06:34.193 GMT [1863] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1863] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1863] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1864] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.193 GMT [1862] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1862] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.193 GMT [1862] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.193 GMT [1865] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1865] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1865] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.193 GMT [1860] DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1860] DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1860] DEBUG:  proc_exit(-1): 0 callbacks to make
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  reaping dead processes
2017-11-19 19:06:34.194 GMT [1858] LOG:  all server processes terminated; reinitializing
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  shmem_exit(1): 0 before_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
2017-11-19 19:06:34.194 GMT [1858] DEBUG:  cleaning up dynamic shared memory control segment with ID 877356333
2017-11-19 19:06:34.196 GMT [1858] DEBUG:  invoking IpcMemoryCreate(size=15237120)
2017-11-19 19:06:34.196 GMT [1858] DEBUG:  mmap(16777216) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2017-11-19 19:06:34.201 GMT [1858] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2017-11-19 19:06:34.201 GMT [1858] DEBUG:  removing file "pg_notify/0000"

@bjne
Author

bjne commented Nov 20, 2017

(gdb) b mcxt.c:858
Breakpoint 1 at 0x82dfb9: file mcxt.c, line 858.
(gdb) continue
Continuing.

Breakpoint 1, palloc (size=18446744073709551613) at mcxt.c:858
warning: Source file is more recent than executable.
858                     elog(ERROR, "palloc invalid memory alloc request size %zu", size);
(gdb) bt
#0  palloc (size=18446744073709551613) at mcxt.c:858
#1  0x00000000007e8e76 in text_to_cstring (t=0x1aacad0) at varlena.c:189
#2  0x000000000080d7ed in DirectFunctionCall1Coll (func=0x7e95a0 <textout>, collation=collation@entry=0, arg1=arg1@entry=27970256) at fmgr.c:717
#3  0x00007fc300e65c5b in luaP_pushdatum (L=L@entry=0x19876b0, dat=27970256, type=25) at plluaapi.c:891
#4  0x00007fc300e68cf5 in luaP_p_tupleindex (L=0x19876b0) at plluaspi.c:471
#5  0x00007fc300c12320 in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#6  0x00007fc300c12734 in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#7  0x00007fc300c1b5d6 in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#8  0x00007fc300c1c94e in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#9  0x00007fc300c1277d in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#10 0x00007fc300c11a5e in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#11 0x00007fc300c128eb in ?? () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#12 0x00007fc300c0e3a8 in lua_pcall () from /usr/lib/x86_64-linux-gnu/liblua5.1.so.0
#13 0x00007fc300e67a45 in luaP_callhandler (L=0x19876b0, fcinfo=0x7fff0e1fcac0) at plluaapi.c:1396
#14 0x00000000005ef599 in ExecMakeTableFunctionResult (setexpr=0x197e5d8, econtext=0x197e318, argContext=<optimized out>, expectedDesc=0x197f798, randomAccess=0 '\000') at execSRF.c:231
#15 0x00000000005fae13 in FunctionNext (node=node@entry=0x197e208) at nodeFunctionscan.c:94
#16 0x00000000005eea79 in ExecScanFetch (recheckMtd=0x5fab40 <FunctionRecheck>, accessMtd=0x5fab60 <FunctionNext>, node=0x197e208) at execScan.c:97
#17 ExecScan (node=0x197e208, accessMtd=0x5fab60 <FunctionNext>, recheckMtd=0x5fab40 <FunctionRecheck>) at execScan.c:147
#18 0x00000000005e8f7a in ExecProcNode (node=0x197e208) at ../../../src/include/executor/executor.h:250
#19 ExecutePlan (execute_once=<optimized out>, dest=0x19de918, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x197e208, estate=0x197dff8)
    at execMain.c:1722
#20 standard_ExecutorRun (queryDesc=0x1985218, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#21 0x000000000070e67c in PortalRunSelect (portal=portal@entry=0x1a16618, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x19de918) at pquery.c:932
#22 0x000000000070fa5e in PortalRun (portal=portal@entry=0x1a16618, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x19de918,
    altdest=altdest@entry=0x19de918, completionTag=0x7fff0e1fd1b0 "") at pquery.c:773
#23 0x000000000070b929 in exec_simple_query (query_string=0x19e2768 "select * from module_schema('foo','bar');") at postgres.c:1099
#24 0x000000000070cdbd in PostgresMain (argc=<optimized out>, argv=argv@entry=0x198f060, dbname=0x198c208 "webcfg", username=<optimized out>) at postgres.c:4088
#25 0x0000000000477d30 in BackendRun (port=0x1981f90) at postmaster.c:4357
#26 BackendStartup (port=0x1981f90) at postmaster.c:4029
#27 ServerLoop () at postmaster.c:1753
#28 0x00000000006a375c in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x195e9f0) at postmaster.c:1361
#29 0x0000000000478cc1 in main (argc=3, argv=0x195e9f0) at main.c:228
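
The requested size 18446744073709551613 is exactly 2^64 - 3, which fits the text_to_cstring frame above: text_to_cstring allocates VARSIZE_ANY_EXHDR(t) + 1 bytes, so if the 4-byte varlena length word of the text value reads as zero (for example because the memory it points into has already been freed and reused), the header subtraction wraps around on a 64-bit build. Below is a minimal sketch of that arithmetic; the zero header value is an assumption for illustration, not something read from this backtrace.

/*
 * Sketch only: how a clobbered varlena length header can turn into a palloc
 * request of 2^64 - 3 bytes.  VARHDRSZ mirrors PostgreSQL's 4-byte header
 * size; the "corrupted" header value of 0 is assumed for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define VARHDRSZ 4                       /* 4-byte varlena length header */

int
main(void)
{
    size_t varsize = 0;                  /* length word read from freed memory */
    size_t len = varsize - VARHDRSZ;     /* wraps to 2^64 - 4 */
    size_t request = len + 1;            /* text_to_cstring adds 1 for the NUL */

    printf("%zu\n", request);            /* prints 18446744073709551613 */
    return 0;
}

Only by-reference values (such as text) go through this path; by-value columns carry their contents in the Datum itself, which may be why the booleans and integers mentioned above print fine while the text column blows up.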

@bjne
Author

bjne commented Nov 20, 2017

This seems to reproduce the bug I am seeing:

create database pllua_bug;
\c pllua_bug

create extension if not exists pllua;

drop table if exists test;
create table test as select s, md5(random()::text) from generate_series(1,500) s;

create or replace function f1()
    returns void as $$
        for t in _U.get_test:rows() do
                print(t.md5)
        end
end
do
    _U = {
        get_test = server.prepare([[
            SELECT *
              FROM test
        ]]):save()
    }
$$ language plluau;

create or replace function f2()
    returns void as $$
        for t in _U.get_test:rows() do
                print(t.md5)
        end
end
do
    _U = {
        get_test = server.prepare([[
            SELECT *
              FROM test
        ]]):save()
    }
$$ language plluau;

select * from f1();
select * from f2();
...
INFO:  deaba1fc7c1890103fa3734a7a21cb2c
INFO:  effa1c3f13e@
INFO:  3ed5d52d6061147693064061091@
INFO:  12b83526047
ERROR:  palloc invalid memory alloc request size 1647392101

@bjne
Author

bjne commented Nov 20, 2017

The above test was run on postgres compiled with --enable-debug --enable-cassert:

Thread 1 "postgres" hit Breakpoint 1, palloc (size=18446744073709551613) at mcxt.c:858
858                     elog(ERROR, "palloc invalid memory alloc request size %zu", size);
(gdb) bt
#0  palloc (size=18446744073709551613) at mcxt.c:858
#1  0x0000000000840806 in text_to_cstring (t=0xfc015c) at varlena.c:189
#2  0x0000000000867e3d in DirectFunctionCall1Coll (func=0x841010 <textout>, collation=collation@entry=0, arg1=arg1@entry=16515420) at fmgr.c:717
#3  0x00007f0902538e6b in luaP_pushdatum (L=L@entry=0x40f83378, dat=16515420, type=25) at plluaapi.c:880
#4  0x00007f090253bf35 in luaP_p_tupleindex (L=0x40f83378) at plluaspi.c:471
#5  0x00007f0902296a77 in ?? () from /usr/local/lib/libluajit-5.1.so.2
#6  0x00007f09022a700d in lua_pcall () from /usr/local/lib/libluajit-5.1.so.2
#7  0x00007f090253abe5 in luaP_callhandler (L=0x40f83378, fcinfo=0x7ffea9cb6cd0) at plluaapi.c:1385
#8  0x0000000000622579 in ExecMakeTableFunctionResult (setexpr=0xf73068, econtext=0xf72d60, argContext=<optimized out>, expectedDesc=0xf73880, randomAccess=0 '\000') at execSRF.c:231
#9  0x000000000062f143 in FunctionNext (node=node@entry=0xf72c48) at nodeFunctionscan.c:94
#10 0x0000000000621869 in ExecScanFetch (recheckMtd=0x62ee80 <FunctionRecheck>, accessMtd=0x62ee90 <FunctionNext>, node=0xf72c48) at execScan.c:97
#11 ExecScan (node=0xf72c48, accessMtd=0x62ee90 <FunctionNext>, recheckMtd=0x62ee80 <FunctionRecheck>) at execScan.c:147
#12 0x000000000061b5ea in ExecProcNode (node=0xf72c48) at ../../../src/include/executor/executor.h:250
#13 ExecutePlan (execute_once=<optimized out>, dest=0xf93668, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0xf72c48, estate=0xf72a30)
    at execMain.c:1722
#14 standard_ExecutorRun (queryDesc=0xecfac0, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#15 0x000000000075caac in PortalRunSelect (portal=portal@entry=0xf6ea10, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0xf93668) at pquery.c:932
#16 0x000000000075e066 in PortalRun (portal=portal@entry=0xf6ea10, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0xf93668, altdest=altdest@entry=0xf93668,
    completionTag=0x7ffea9cb73c0 "") at pquery.c:773
#17 0x0000000000759b99 in exec_simple_query (query_string=0xf2af10 "select * from f2();") at postgres.c:1099
#18 0x000000000075afde in PostgresMain (argc=<optimized out>, argv=argv@entry=0xed7f78, dbname=0xed7df8 "pllua_bug", username=<optimized out>) at postgres.c:4088
#19 0x0000000000478819 in BackendRun (port=0xeccfa0) at postmaster.c:4357
#20 BackendStartup (port=0xeccfa0) at postmaster.c:4029
#21 ServerLoop () at postmaster.c:1753
#22 0x00000000006e4025 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xea99f0) at postmaster.c:1361
#23 0x0000000000479c41 in main (argc=3, argv=0xea99f0) at main.c:228
(gdb) up 4
#4  0x00007f090253bf35 in luaP_p_tupleindex (L=0x40f83378) at plluaspi.c:471
471                     luaP_pushdatum(L, t->value[i], tupleDesc->attrs[i]->atttypid);
(gdb) print t
$1 = (luaP_Tuple *) 0xfa5460
(gdb) print t->value
$2 = (Datum *) 0xfa5490
(gdb) print *t
$3 = {changed = -1, relid = 0, tuple = 0xfc0128, tupdesc = 0x0, value = 0xfa5490, null = 0xfa54a0 "", rtupdesc = 0xf96998}
(gdb) print t->value[0]
$4 = 296
(gdb) print t->value[1]
$5 = 16515420
(gdb) print *(t->tuple)
$6 = {t_len = 912680504, t_self = {ip_blkid = {bi_hi = 12646, bi_lo = 13154}, ip_posid = 14643}, t_tableOid = 929391921, t_data = 0xfc7e00}
(gdb)

@RhodiumToad
Member

I discussed this in the #postgresql IRC channel with bjne. It looks to me like much of pllua's memory management is wrong; this particular bug seems to be the result of shallow-copying Datum values without regard for whether the target of a by-reference value is being freed. In particular, you can't safely SPI_freetuptable a result set until you no longer hold any Datum values that were fetched from its tuples and not copied.
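
To make the point concrete, here is a minimal sketch of the safe ordering being described (this is not pllua's code; the function name, the resultcxt parameter, and the use of the first result row are made up for illustration): a by-reference Datum obtained with SPI_getbinval points into the SPI tuple table, so it must be deep-copied with datumCopy into a memory context that outlives the result set before SPI_freetuptable is called.

/*
 * Sketch, not pllua's implementation: copy a by-reference column value out
 * of an SPI result before freeing the tuple table.  Assumes we are already
 * inside an SPI_connect()/SPI_finish() pair and that SPI_execute() has just
 * filled SPI_tuptable.  "resultcxt" is any memory context that outlives the
 * tuple table (e.g. the caller's context).
 */
#include "postgres.h"
#include "executor/spi.h"
#include "utils/datum.h"
#include "utils/lsyscache.h"

static Datum
copy_column_before_free(MemoryContext resultcxt, int colno, bool *isnull)
{
    SPITupleTable *tuptable = SPI_tuptable;
    Datum       val;
    Datum       copy = (Datum) 0;
    int16       typlen;
    bool        typbyval;
    Oid         coltype;

    /* Fetch the raw Datum; for varlena types this points into the tuple. */
    val = SPI_getbinval(tuptable->vals[0], tuptable->tupdesc, colno, isnull);

    if (!*isnull)
    {
        MemoryContext oldcxt;

        coltype = SPI_gettypeid(tuptable->tupdesc, colno);
        get_typlenbyval(coltype, &typlen, &typbyval);

        /* Deep-copy while the source tuple is still valid. */
        oldcxt = MemoryContextSwitchTo(resultcxt);
        copy = datumCopy(val, typbyval, typlen);
        MemoryContextSwitchTo(oldcxt);
    }

    /* Only now is it safe to release the tuples the Datum pointed into. */
    SPI_freetuptable(tuptable);

    return copy;
}

The copy costs extra memory, but it makes the lifetime of the value handed to Lua independent of the SPI result set, which is the property the shallow-copying code above lacks.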

@bjne
Author

bjne commented Jan 22, 2018

...and for the record, this was the start of rhodiumtoad/pllua-ng

@anthony-hz

Many thanks for Pllua-ng

@ghost

ghost commented Jul 23, 2020

Hello everyone!
This strange and terrifying number showed up on my trainee MySQL setup too, but in a different place.
mysqli_get_client_stats();
active_connections: 18446744073709551613
... heh heh (in my language that is a sardonic laugh).
And after every new mysqli_ call - even mysqli_close($con) - it gets smaller... Sometimes it shows correctly as 0 or 1 for a moment, then goes vicious again; nothing helps. The database is working and the data isn't going freaky.
I did a small maths test - I stored some small values, unsigned, negative, etc. - and everything is fine; the numbers keep their values and types.
For now I have decided to ignore this - the db is only for exercises and playing, no relevant data - and I have no idea where it comes from...
Perhaps someone smarter than me will someday discover what devil drives this number; I just leave this message for investigators.
MySQL 5.7, utf-8/utf8mb4, general or unicode collation; the server default is latin1 with a Swedish collation.
The hosting has worked well for years and nothing changed recently; what else can I say? I did nothing weird to my db. The whole database is 5-7 tables, each with 0-20 rows, and I am its only user.
Wow: active_connections: 1, then right after (2 queries) 18446744073709551615, after a few minutes 0, and then after 2 more queries 18446744073709551614...
So I will ignore this; what else can I do?

@RhodiumToad
Member

2^64 is 18446744073709551616, so 2^64 - 1 is 18446744073709551615, etc.

Therefore these numbers come up very frequently as a result of underflowing a 64-bit unsigned counter.
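
For illustration only (the counter name is hypothetical), the same arithmetic in a few lines of C: decrementing an unsigned 64-bit value that is already at zero wraps to exactly the numbers quoted above.

/* Illustration only: underflow of a 64-bit unsigned counter. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint64_t active_connections = 0;   /* hypothetical stats counter */

    active_connections -= 1;           /* wraps to 2^64 - 1 */
    printf("%" PRIu64 "\n", active_connections);   /* 18446744073709551615 */

    active_connections -= 1;           /* 2^64 - 2 */
    printf("%" PRIu64 "\n", active_connections);   /* 18446744073709551614 */

    return 0;
}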

@ghost

ghost commented Jul 24, 2020

Thank you, but what could that mean in the case mentioned? The number makes some sense on a 64-bit server, but still - mysqli_get_client_stats() [active_connections] is a "check the list and tell me how many" kind of value; it should be 0 or 1 (an int).
Where does this freak value come from, and why? Is there a cure for it? Of course, the most welcome answer would be a (working) function like this:
function spell(){ wild x64 counter underflows GO HOME! }
...and they lived happily ever after.

Meanwhile - it still bothers me - is it somehow dangerous for the server or for the database? Should I keep digging, testing and asking, or just get used to it?

@RhodiumToad
Member

This is not the place to ask these questions. Take it somewhere more appropriate.

@pllua pllua locked as off-topic and limited conversation to collaborators Jul 24, 2020
@RhodiumToad RhodiumToad reopened this Jul 24, 2020