Get IPC protocol working #15

Open
everythingfunctional opened this issue Apr 13, 2023 · 10 comments

Comments

@everythingfunctional

I'd like to get the ipc protocol working so that we can make use of more Jupyter kernels in an environment where the tcp protocol is considered to not be sufficiently secure. I've managed to find where an exception is being caught (it's here). Would anybody know where it might be coming from, what the cause might be, or possibly how to fix the problem? I'm happy to make the fix if anybody could help point me in the right direction.

@JohanMabille
Member

This is a catch-all handler, so the exception can come from anywhere. Do you have more detail, such as the error message and/or the stack trace?

@everythingfunctional
Author

The only message is:

std::exception: Invalid argument

I commented out the catch block and ran it in gdb, and then the stacktrace is:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7204953 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff71b5ea8 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff719f53d in __GI_abort () at abort.c:79
#4  0x00007ffff74fe026 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#5  0x00007ffff74fc514 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#6  0x00007ffff74fb8b5 in __cxa_call_terminate (ue_header=0x555559980680) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#7  0x00007ffff74fbe5f in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=<optimized out>, exception_class=5138137972254386944, ue_header=0x555559980680, context=0x7fffffffd2f0) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:688
#8  0x00007ffff7ce0d79 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x555559980680, context=context@entry=0x7fffffffd2f0, frames_p=frames_p@entry=0x7fffffffd3e0) at ../../../libgcc/unwind.inc:64
#9  0x00007ffff7ce107c in _Unwind_RaiseException (exc=0x555559980680) at ../../../libgcc/unwind.inc:136
#10 0x00007ffff74fc74b in __cxxabiv1::__cxa_throw (obj=0x5555599806a0, tinfo=0x7ffff75f2910 <typeinfo for std::system_error>, dest=0x7ffff7519626 <std::system_error::~system_error()>) at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#11 0x00007ffff74f9469 in std::__throw_system_error (__i=22) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1666516830325/work/build/x86_64-conda-linux-gnu/libstdc++-v3/include/system_error:202
#12 0x00007ffff7519959 in std::thread::join (this=0x55555993ea68) at ../../../../../libstdc++-v3/src/c++11/thread.cc:115
#13 0x00007ffff7f530d0 in xeus::xserver_zmq_split::~xserver_zmq_split (this=0x55555993e9c0, __in_chrg=<optimized out>) at /home/brad/Repositories/GitHub/other/xeus-zmq/src/xserver_zmq_split.cpp:48
#14 0x00007ffff7f4e0f6 in xeus::xserver_shell_main::~xserver_shell_main (this=0x55555993e9c0, __in_chrg=<optimized out>) at /home/brad/Repositories/GitHub/other/xeus-zmq/src/xserver_shell_main.cpp:32
#15 0x00007ffff7f4e112 in xeus::xserver_shell_main::~xserver_shell_main (this=0x55555993e9c0, __in_chrg=<optimized out>) at /home/brad/Repositories/GitHub/other/xeus-zmq/src/xserver_shell_main.cpp:32
#16 0x00007ffff7bd3056 in std::default_delete<xeus::xserver>::operator() (this=0x7fffffffdb00, __ptr=0x55555993e9c0) at /usr/include/c++/12.2.1/bits/unique_ptr.h:95
#17 0x00007ffff7bd2690 in std::unique_ptr<xeus::xserver, std::default_delete<xeus::xserver> >::~unique_ptr (this=0x7fffffffdb00, __in_chrg=<optimized out>) at /usr/include/c++/12.2.1/bits/unique_ptr.h:396
#18 0x00007ffff7bd0e01 in xeus::xkernel::xkernel (this=0x7fffffffd960, config=..., user_name=..., context=..., interpreter=..., sbuilder=0x7ffff7f4e1e0 <xeus::make_xserver_shell_main(xeus::xcontext&, xeus::xconfiguration const&, nlohmann::json_abi_v3_11_2::detail::error_handler_t)>, history_manager=..., logger=..., 
    dbuilder=0x7ffff7b8ba5c <xeus::make_null_debugger(xeus::xcontext&, xeus::xconfiguration const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > > const&)>, debugger_config=..., eh=nlohmann::json_abi_v3_11_2::detail::error_handler_t::strict)
    at /home/brad/Repositories/GitHub/other/xeus/src/xkernel.cpp:94
#19 0x0000555556352b77 in LCompilers::LFortran::run_kernel (connection_filename=...) at /home/brad/Repositories/GitHub/other/lfortran/src/lfortran/fortran_kernel.cpp:512
#20 0x000055555609af66 in main (argc=4, argv=0x7fffffffe4b8) at /home/brad/Repositories/GitHub/other/lfortran/src/bin/lfortran.cpp:1762

@JohanMabille
Member

This is really weird; it means that the thread is not joinable. I've built and run an IPC kernel and an IPC client on my machine, and it seems to work (those added in #16).

Can you try #17 and tell me if this fixes the issue (and also the output it gives)?
Also, can you provide more information about your platform and compiler?
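
For context, frames #11 and #12 of the trace above show `std::thread::join` throwing `std::system_error` with code 22 (`EINVAL`, "Invalid argument"), which is what the standard library does when `join()` is called on a thread that is not joinable. A minimal sketch of that behaviour and of a guarded join follows. `join_throws_invalid_argument` and `join_if_joinable` are hypothetical helper names used for illustration only; they are not part of xeus-zmq.

```cpp
#include <cassert>
#include <system_error>
#include <thread>

// Returns true if calling join() on `t` throws std::system_error whose
// code compares equal to std::errc::invalid_argument (EINVAL).
bool join_throws_invalid_argument(std::thread& t)
{
    try
    {
        t.join();
    }
    catch (const std::system_error& e)
    {
        return e.code() == std::errc::invalid_argument;
    }
    return false;
}

// Hypothetical guard: join only when the thread is joinable.
// Returns true if a join was actually performed.
bool join_if_joinable(std::thread& t)
{
    if (!t.joinable())
    {
        return false;
    }
    t.join();
    assert(!t.joinable()); // a joined thread is no longer joinable
    return true;
}
```

A destructor that joins a worker thread could use a guard like `join_if_joinable` so that it does not throw during shutdown. Whether that matches the change proposed in #17 is not something this thread confirms.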

@JohanMabille
Member

JohanMabille commented Apr 17, 2023

OK, so there actually is an issue in the destructor implementation: it always throws when the kernel shuts down. However, it is unrelated to getting the IPC protocol working.

You can export XEUS_LOG=1 before starting your kernel; it should dump the type of each message the kernel receives to the console, and the complete messages to a file if you pass your kernel a more complete logger. This could help in understanding what happens.

In the meantime I'm gonna release a fix for the destructor issue.

@everythingfunctional
Author

@JohanMabille, thanks for the info. It's a busy week for me, but I'll start poking around at this a bit more next week.

@everythingfunctional
Author

@JohanMabille, sorry for the delay. I finally got a chance to poke at this a bit more. I wasn't able to get as far as a more complete logger, because the crash happens before that. I was able to comment out some catch blocks and get a stack trace, though.

terminate called after throwing an instance of 'zmq::error_t'
  what():  Invalid argument
Traceback (most recent call last):
  Binary file "/home/brad/.conda/envs/lfortran-dev/bin/lfortran", in _start()
  Binary file "/usr/lib/libc.so.6", in __libc_start_main()
  Binary file "/usr/lib/libc.so.6", in __libc_init_first()
  File "/home/brad/Repositories/GitHub/other/lfortran/src/bin/lfortran.cpp", line 1786, in main()
    return LCompilers::LFortran::run_kernel(arg_kernel_f);
  File "/home/brad/Repositories/GitHub/other/lfortran/src/lfortran/fortran_kernel.cpp", line 512, in LCompilers::LFortran::run_kernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    debugger_config);
  File "/home/brad/Repositories/GitHub/other/xeus/src/xkernel.cpp", line 93, in xeus::xkernel::xkernel(xeus::xconfiguration const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<xeus::xcontext, std::default_delete<xeus::xcontext> >, std::unique_ptr<xeus::xinterpreter, std::default_delete<xeus::xinterpreter> >, std::unique_ptr<xeus::xserver, std::default_delete<xeus::xserver> > (*)(xeus::xcontext&, xeus::xconfiguration const&, nlohmann::json_abi_v3_11_2::detail::error_handler_t), std::unique_ptr<xeus::xhistory_manager, std::default_delete<xeus::xhistory_manager> >, std::unique_ptr<xeus::xlogger, std::default_delete<xeus::xlogger> >, std::unique_ptr<xeus::xdebugger, std::default_delete<xeus::xdebugger> > (*)(xeus::xcontext&, xeus::xconfiguration const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > > const&), nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > >, nlohmann::json_abi_v3_11_2::detail::error_handler_t)
    init(sbuilder, dbuilder);
  File "/home/brad/Repositories/GitHub/other/xeus/src/xkernel.cpp", line 136, in xeus::xkernel::init(std::unique_ptr<xeus::xserver, std::default_delete<xeus::xserver> > (*)(xeus::xcontext&, xeus::xconfiguration const&, nlohmann::json_abi_v3_11_2::detail::error_handler_t), std::unique_ptr<xeus::xdebugger, std::default_delete<xeus::xdebugger> > (*)(xeus::xcontext&, xeus::xconfiguration const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > > const&))
    p_server->update_config(m_config);
  File "/home/brad/Repositories/GitHub/other/xeus/src/xserver.cpp", line 61, in xeus::xserver::update_config(xeus::xconfiguration&) const
    update_config_impl(config);
  File "/home/brad/Repositories/GitHub/other/xeus-zmq/src/xserver_zmq_split.cpp", line 119, in xeus::xserver_zmq_split::update_config_impl(xeus::xconfiguration&) const
    config.m_control_port = p_controller->get_port();
  File "/home/brad/Repositories/GitHub/other/xeus-zmq/src/xcontrol.cpp", line 42, in xeus::xcontrol::get_port[abi:cxx11]() const
    return get_socket_port(m_control);
  File "/home/brad/Repositories/GitHub/other/xeus-zmq/src/xmiddleware.cpp", line 93, in xeus::get_socket_port[abi:cxx11](zmq::socket_t const&)
    std::string end_point = socket.get(zmq::sockopt::last_endpoint, 32);
  File "/home/brad/.conda/envs/lfortran-dev/include/zmq.hpp", line 1832, in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > zmq::detail::socket_base::get<32, 1>(zmq::sockopt::array_option<32, 1>, unsigned long) const
    size_t size = get(sockopt::array_option<Opt>{}, buffer(str));
  File "/home/brad/.conda/envs/lfortran-dev/include/zmq.hpp", line 1814, in unsigned long zmq::detail::socket_base::get<32, 1>(zmq::sockopt::array_option<32, 1>, zmq::mutable_buffer) const
    get_option(Opt, buf.data(), &size);
  File "/home/brad/.conda/envs/lfortran-dev/include/zmq.hpp", line 2063, in zmq::detail::socket_base::get_option(int, void*, unsigned long*) const
    throw error_t();
  File "/home/conda/feedstock_root/build_artifacts/gcc_compilers_1666516830325/work/build/x86_64-conda-linux-gnu/libstdc++-v3/libsupc++/../../../../libstdc++-v3/libsupc++/eh_throw.cc", line 98, in __cxa_throw()
  File "/home/conda/feedstock_root/build_artifacts/gcc_compilers_1666516830325/work/build/x86_64-conda-linux-gnu/libstdc++-v3/libsupc++/../../../../libstdc++-v3/libsupc++/eh_terminate.cc", line 58, in std::terminate()
  File "/home/conda/feedstock_root/build_artifacts/gcc_compilers_1666516830325/work/build/x86_64-conda-linux-gnu/libstdc++-v3/libsupc++/../../../../libstdc++-v3/libsupc++/eh_terminate.cc", line 48, in __cxxabiv1::__terminate(void (*)())
  File "/home/conda/feedstock_root/build_artifacts/gcc_compilers_1666516830325/work/build/x86_64-conda-linux-gnu/libstdc++-v3/libsupc++/../../../../libstdc++-v3/libsupc++/vterminate.cc", line 95, in __gnu_cxx::__verbose_terminate_handler()
  Binary file "/usr/lib/libc.so.6", in abort()
  Binary file "/usr/lib/libc.so.6", in gsignal()
  Binary file "/usr/lib/libc.so.6", in pthread_key_delete()
  Binary file "/usr/lib/libc.so.6", in __sigaction()
Abort: Signal SIGABRT (abort) received

Does this help narrow anything down at all?

For more background, I'm trying to use the LFortran kernel, testing on an Arch Linux machine with the latest gcc (13.1.1). Let me know if there are additional details you'd like.

@JohanMabille
Member

@everythingfunctional sorry for the delay too, these last weeks have been quite intense.

It definitely helps narrow things down, although I really don't understand why it would fail here. I cannot reproduce such a failure with the tiny IPC kernel test I have added to the main branch. Can you dump the connection file passed to the kernel? Also, can you indicate which versions of cppzmq and zeromq you are using (although I guess these are automatically pulled in by xeus-zmq when you create your environment)?

@everythingfunctional
Author

Sorry for again taking so long to get back to this.

> Can you dump the connection file passed to the kernel?

It looks like this:

{
  "shell_port": 1,
  "iopub_port": 2,
  "stdin_port": 3,
  "control_port": 4,
  "hb_port": 5,
  "ip": "/home/brad/.local/share/jupyter/runtime/kernel-187920-ipc",
  "key": "a33beb30-0a4ad868b05a6ca13f87e6b7",
  "transport": "ipc",
  "signature_scheme": "hmac-sha256",
  "kernel_name": "fortran"
}

I'll note that I'm running Jupyter like this: jupyter console --kernel=fortran --transport=ipc
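
To get a feel for the endpoint string such a connection file might produce, here is a small sketch. It assumes the convention used by jupyter_client, where TCP endpoints look like `tcp://ip:port` and other transports look like `transport://ip-port`; whether xeus-zmq builds the string in exactly this way is an assumption. `make_endpoint` and `required_buffer_size` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds an endpoint string from connection-file fields, assuming the
// jupyter_client convention: ':' separates ip and port for tcp,
// '-' separates them for every other transport (for example ipc).
std::string make_endpoint(const std::string& transport,
                          const std::string& ip,
                          int port)
{
    assert(!transport.empty());
    const char sep = (transport == "tcp") ? ':' : '-';
    return transport + "://" + ip + sep + std::to_string(port);
}

// Number of bytes needed to hold the endpoint as a NUL-terminated C string.
std::size_t required_buffer_size(const std::string& endpoint)
{
    return endpoint.size() + 1;
}
```

Under that assumption, the `ip` and `control_port` values above give an IPC endpoint that needs far more than the 32 bytes requested by `socket.get(zmq::sockopt::last_endpoint, 32)` in the earlier traceback, whereas a typical TCP endpoint such as `tcp://127.0.0.1:5555` fits comfortably. That might be why the TCP transport does not hit the same error, though this is only a guess.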

> Also, can you indicate which versions of cppzmq and zeromq you are using

  • cppzmq 4.10.0 h7e20d1c_0 conda-forge
  • zeromq 4.3.4 h9c3ff4c_1 conda-forge

I'm building xeus and xeus-zmq from source (latest main) and going to try poking around a bunch more with the debugger this week.

@everythingfunctional
Author

OK, now I think I've gotten somewhere interesting, but I don't know what to look at next. I compiled libzmq from source too and, using the debugger, got all the way down into it, to here, which has:

int zmq::do_getsockopt (void *const optval_,
                        size_t *const optvallen_,
                        const void *value_,
                        const size_t value_len_)
{
    // TODO behaviour is inconsistent with options_t::getsockopt; there, an
    // *exact* length match is required except for string-like (but not the
    // CURVE keys!) (and therefore null-ing remaining memory is a no-op, see
    // comment below)
    if (*optvallen_ < value_len_) {
        return sockopt_invalid ();
    }
    memcpy (optval_, value_, value_len_);
    // TODO why is the remaining memory null-ed?
    memset (static_cast<char *> (optval_) + value_len_, 0,
            *optvallen_ - value_len_);
    *optvallen_ = value_len_;
    return 0;
}

with

(gdb) print value_len_
$12 = 64
(gdb) print *optvallen_
$13 = 32

and

(gdb) bt
#0  zmq::do_getsockopt (optval_=0x5555598afbe0, optvallen_=0x7fffffffcd20, value_=0x5555598b2290, value_len_=64) at /home/brad/lfortran-kernel-debug/libzmq/src/options.cpp:48
#1  0x00007ffff7c65853 in zmq::do_getsockopt (optval_=0x5555598afbe0, optvallen_=0x7fffffffcd20, value_=...) at /home/brad/lfortran-kernel-debug/libzmq/src/options.cpp:35
#2  0x00007ffff7c83ff1 in zmq::socket_base_t::getsockopt (this=0x5555598b6c90, option_=32, optval_=0x5555598afbe0, optvallen_=0x7fffffffcd20) at /home/brad/lfortran-kernel-debug/libzmq/src/socket_base.cpp:471
#3  0x00007ffff7cae7a4 in zmq_getsockopt (s_=0x5555598b6c90, option_=32, optval_=0x5555598afbe0, optvallen_=0x7fffffffcd20) at /home/brad/lfortran-kernel-debug/libzmq/src/zmq.cpp:266
#4  0x00007ffff7f00770 in zmq::detail::socket_base::get_option (this=0x5555598b0860, option_=32, optval_=0x5555598afbe0, optvallen_=0x7fffffffcd20) at /home/brad/.conda/envs/lf-kernel-debug/include/zmq.hpp:2087
#5  0x00007ffff7f477c9 in zmq::detail::socket_base::get<32, 1> (this=0x5555598b0860, buf=...) at /home/brad/.conda/envs/lf-kernel-debug/include/zmq.hpp:1840
#6  0x00007ffff7f472e6 in zmq::detail::socket_base::get<32, 1> (this=0x5555598b0860, init_size=32) at /home/brad/.conda/envs/lf-kernel-debug/include/zmq.hpp:1858
#7  0x00007ffff7f46a25 in xeus::get_socket_port[abi:cxx11](zmq::socket_t const&) (socket=...) at /home/brad/lfortran-kernel-debug/xeus-zmq/src/xmiddleware.cpp:93
#8  0x00007ffff7ef5184 in xeus::xcontrol::get_port[abi:cxx11]() const (this=0x5555598b0860) at /home/brad/lfortran-kernel-debug/xeus-zmq/src/xcontrol.cpp:42
#9  0x00007ffff7f4fe85 in xeus::xserver_zmq_split::update_config_impl (this=0x555559894450, config=...) at /home/brad/lfortran-kernel-debug/xeus-zmq/src/xserver_zmq_split.cpp:122
#10 0x00007ffff7e45562 in xeus::xserver::update_config (this=0x555559894450, config=...) at /home/brad/lfortran-kernel-debug/xeus/src/xserver.cpp:61
#11 0x00007ffff7e10ba2 in xeus::xkernel::init (this=0x7fffffffd210, sbuilder=0x7ffff7f4953f <xeus::make_xserver_shell_main(xeus::xcontext&, xeus::xconfiguration const&, nlohmann::json_abi_v3_11_2::detail::error_handler_t)>, 
    dbuilder=0x7ffff7dc9b9a <xeus::make_null_debugger(xeus::xcontext&, xeus::xconfiguration const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > > const&)>) at /home/brad/lfortran-kernel-debug/xeus/src/xkernel.cpp:136
#12 0x00007ffff7e10504 in xeus::xkernel::xkernel (this=0x7fffffffd210, config=..., user_name=..., context=..., interpreter=..., sbuilder=0x7ffff7f4953f <xeus::make_xserver_shell_main(xeus::xcontext&, xeus::xconfiguration const&, nlohmann::json_abi_v3_11_2::detail::error_handler_t)>, history_manager=..., logger=..., 
    dbuilder=0x7ffff7dc9b9a <xeus::make_null_debugger(xeus::xcontext&, xeus::xconfiguration const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nlohmann::json_abi_v3_11_2::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long, unsigned long, double, std::allocator, nlohmann::json_abi_v3_11_2::adl_serializer, std::vector<unsigned char, std::allocator<unsigned char> > > const&)>, debugger_config=..., eh=nlohmann::json_abi_v3_11_2::detail::error_handler_t::strict)
    at /home/brad/lfortran-kernel-debug/xeus/src/xkernel.cpp:93
#13 0x00005555562dd7f6 in LCompilers::LFortran::run_kernel (connection_filename=...) at /home/brad/lfortran-kernel-debug/lfortran/src/lfortran/fortran_kernel.cpp:514
#14 0x00005555560680c9 in main (argc=4, argv=0x7fffffffe408) at /home/brad/lfortran-kernel-debug/lfortran/src/bin/lfortran.cpp:2014

At that point it returns -1 up the calling chain until it gets here in cppzmq which just does throw error_t().

Now that I know where the exception/error originates from, any ideas how/what to look at in terms of why it's an error?
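
One possible reading of the values above: the caller supplies a 32-byte buffer (`*optvallen_ = 32`, matching `init_size=32` in frame #6), while the stored value needs 64 bytes (`value_len_ = 64`), so the length check fails and `EINVAL` is reported. The sketch below reproduces only that check with a stand-in function; `copy_option` and `read_endpoint` are hypothetical names, not libzmq or cppzmq APIs, and the 63-character string in the usage note is a placeholder rather than the real endpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the length check shown above: returns 0 on success and
// -1 when the caller's buffer is smaller than the stored value.
int copy_option(void* optval, std::size_t* optvallen,
                const void* value, std::size_t value_len)
{
    assert(optvallen != nullptr);
    if (*optvallen < value_len)
    {
        return -1; // buffer too small; the real code reports EINVAL here
    }
    std::memcpy(optval, value, value_len);
    // Zero the unused tail of the caller's buffer, as the original does.
    std::memset(static_cast<char*>(optval) + value_len, 0,
                *optvallen - value_len);
    *optvallen = value_len;
    return 0;
}

// Hypothetical helper: tries to read `stored` (plus its NUL terminator)
// into a buffer of `capacity` bytes. Returns true and fills `out` on success.
bool read_endpoint(const std::string& stored, std::size_t capacity,
                   std::string& out)
{
    std::vector<char> buf(capacity);
    std::size_t len = capacity;
    if (copy_option(buf.data(), &len, stored.c_str(), stored.size() + 1) != 0)
    {
        return false;
    }
    out.assign(buf.data(), len - 1); // drop the trailing NUL
    return true;
}
```

With a 63-character placeholder string, `read_endpoint(s, 32, out)` fails and `read_endpoint(s, 256, out)` succeeds. If this reading is right, asking for a larger initial size in the `socket.get(zmq::sockopt::last_endpoint, 32)` call in `xmiddleware.cpp` might be worth trying, but that is a guess rather than a confirmed fix.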

@everythingfunctional
Author

Any chance anyone has looked at this at all? @JohanMabille
