Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a few issues with the C generator (part 8) #20378

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

eafer
Copy link
Contributor

@eafer eafer commented Dec 26, 2024

This is the final patch from the original pull request at #14379. I left it for last because I think it might be controversial.

The problem is as follows. The C generator produces apparently matching *_create() and *_free() functions, but they can't actually be used together. *_create() only allocates memory for the object itself, not for its internal fields (it's a "shallow copy", so to speak), but *_free() will free everything. The caller may have allocated some of those fields statically, or on the stack, or they may still need them. Even if they used malloc(), there is no reason to believe that it's the exact same malloc() implementation from the library. All very confusing for the users, and potentially exploitable.

To deal with this as simply as possible (and without affecting other users of the generator if possible), this pull request includes the following changes:

  1. Mark *_create() as deprecated to discourage any callers. This function is useless: it just allocates the memory for the object and sets the fields, which the callers can do themselves without any extra work. The caller may even choose to allocate the whole object on the stack in many cases.
  2. Add a new _library_owned field to objects, that only gets set by *_create(). Use this to tell apart objects allocated by the library from those allocated by the caller; refuse to free the latter and print a warning. This isn't perfect because, depending on how the allocation took place, this field may have some stale data; but it's better than before and the user will see the warning eventually.
  3. Define an internal version of *_create() for use by the library itself. The point is to prevent the deprecation warnings from showing up during the library build.

I appreciate that this is a huge mess, but we believe it's necessary for security and to make life easier for our users. Any ideas on simpler ways to deal with the problem are of course very welcome.

@wing328 @ityuhui @zhemant @michelealbano

The behaviour of *_free() doesn't match *_create(), so the user should
avoid using them together. But they still need *_free() to clean up
library-allocated objects, so add a _library_owned flag to each struct
as an attempt to tell them apart. This isn't perfect though, because the
user may neglect to zero the field, but they would still see a warning
once in a while so it serves its purpose.

To prevent the new deprecation warnings (intended for the user) from
showing up during the library build itself, define a new family of
*_create_internal() functions, and turn *_create() into simple wrappers.
@imaami
Copy link
Contributor

imaami commented Dec 27, 2024

Can I jump in to discuss library API design choices? I would suggest a specific way to implement both initialization and allocation + their inverse. I'll try to summarize it briefly.

For every object type:

  • There should be an init/deinit function pair. These accept a pointer to an allocated object. They don't handle allocating or freeing the object itself. Their job is to init/deinit the internal data, including allocating and freeing of nested data pointers, if any. These are meant for on-stack object management and for users who choose their own allocation method.
  • There should be a create/destroy function pair. These are wrappers around the init/deinit functions which also manage allocation and deallocation.

object.h

#ifndef PROJECTNAME_OBJECT_H_
#define PROJECTNAME_OBJECT_H_

#include <stdbool.h>

// Object type definition.
struct object {
    int a;
    bool b;
};

// Initialize an allocated object. Returns an `errno` code on failure.
extern int
object_init (struct object *obj,
             int            some_param,
             bool           another_param);

// Deallocate any possible internal allocations and wipe all data.
extern void
object_fini (struct object *obj);

// Allocate a new object, initialize it by calling `object_init()`, and return a
// pointer to the new object. If something fails everything is rolled back
// (internal allocations are freed), an error code is saved to `*err` if `err`
// is not `nullptr`, and `nullptr` is returned.
//
// Note: don't read `err` if the return value is not `nullptr`. An uninitialized
// `err` passed to `object_create()` remains uninitialized if `object_create()`
// succeeds.
extern struct object *
object_create (int   some_param,
               bool  another_param,
               int  *err);

// Unilitialize and free the object pointed to by `*objp` and clear the user's
// object pointer. (Double frees are a common category of bugs in C code,
// and the library should ease that burden if possible.)
//
// Note: passing `nullptr` or a pointer to `nullptr` is a NOP.
extern void
object_destroy (struct object **objp);

#endif // PROJECTNAME_OBJECT_H_

object.c

#include <errno.h>
#include <stdlib.h>
#include <string.h>

#include "object.h"

// C23 compat hack
#undef HAVE_C23_NULLPTR
#if __STDC_VERSION__ >= 202000L
# if defined __clang_major__
#  if __clang_major__ >= 16
#   define HAVE_C23_NULLPTR 1
#  endif
# elif defined __GNUC__
#  if (__GNUC__ > 13) || (__GNUC__ == 13 && __GNUC_MINOR__ >= 1)
#   define HAVE_C23_NULLPTR 1
#  endif
# endif
#endif
#ifndef HAVE_C23_NULLPTR
# define nullptr NULL
#endif

int
object_init (struct object *obj,
             int            some_param,
             bool           another_param)
{
    if (!obj)
        return EFAULT;

    // memset() zeroes both data and internal padding bytes
    memset(obj, 0, sizeof *obj);

    obj->a = some_param;
    obj->b = another_param;

    return 0;
}

void
object_fini (struct object *obj)
{
    if (obj) {
        // no allocations to free this time
        memset(obj, 0, sizeof *obj);
    }
}

struct object *
object_create (int   some_param,
               bool  another_param,
               int  *err)
{
    struct object *obj = malloc(sizeof *obj);
    int e = errno;
    if (!obj)
        // allocation failure
        goto fail_alloc;

    e = object_init(obj, some_param, another_param);
    if (!e)
        // no errror
        return obj;

    // init failed
    free(obj);

fail_alloc:
    if (err)
        *err = e;

    return nullptr;
}

void
object_destroy (struct object **objp)
{
    if (objp) {
        struct object *obj = *objp;
        *objp = nullptr;
        object_fini(obj);
        free(obj);
    }
}

@imaami
Copy link
Contributor

imaami commented Dec 27, 2024

To clarify: that example design isn't necessarily meant as a change suggestion specifically for this exact PR. Implementing it across the board for the C generator would be a larger undertaking (although not a huge effort).

@eafer
Copy link
Contributor Author

eafer commented Dec 28, 2024

I don't disagree, but I'm trying to keep the api changes as small as possible to avoid annoying other users of the generator. Also, we sometimes allocate simple objects on the stack like this: https://github.com/ARM-software/avh-api/blob/c7a6b1f8a71c235a684766eac44adb3a9b9d80a1/c/example/example.c#L427, which is convenient. My fear is that having too many different ways to allocate/init the objects could confuse the users.

@wing328 wing328 added this to the 7.11.0 milestone Dec 30, 2024
@imaami
Copy link
Contributor

imaami commented Jan 1, 2025

It does seem like there's asymmetry between the current create and free interfaces. But what's the long-term plan for memory management? Leaving everything up to the library users is probably not going to help with security. Or maybe I misunderstood what you meant?

@eafer
Copy link
Contributor Author

eafer commented Jan 1, 2025

The library users still need to use free to deal with objects that were returned by api calls. Other than that, yes, I expect them to be responsible for the objects they allocate themselves. In the sample link I sent to you before, there's just no reasonable way for the library to handle memory.

@imaami
Copy link
Contributor

imaami commented Jan 1, 2025

The library users still need to use free to deal with objects that were returned by api calls.

The library allocates objects but the users need to call the libc free()? Or did you mean a library-provided deallocation function?

Other than that, yes, I expect them to be responsible for the objects they allocate themselves.

Of course, it's C after all. I'm mostly wondering about what APIs the library will provide for allocating and freeing its own objects.

In the sample link I sent to you before, there's just no reasonable way for the library to handle memory.

Yep. The _library_owned flag variable you added is a step in the right direction, it works as long as users only do aggregate initialization of stack objects and zero-initialize heap objects.

A great deal of unnecessary internal runtime allocations and strlen() churn could be eliminated by replacing plain char * use with a simple string object that has an internal state indicating whether the embedded pointer is a reference or owned, read only or writable, how much the allocation size is, and how long the currently stored string is.

@eafer
Copy link
Contributor Author

eafer commented Jan 2, 2025

The library allocates objects but the users need to call the libc free()? Or did you mean a library-provided deallocation function?

No no, I meant the *_free() functions from the library.

I'm mostly wondering about what APIs the library will provide for allocating and freeing its own objects.

For freeing, there is no change. The *_free() functions free the objects that the library returns. For allocating, I got rid of the function that allocates the whole object because it doesn't make much sense as it is, so the only library-allocated objects are the ones returned by api calls. IIRC there are other functions for allocating things such as lists; the user must free through the library anything allocated through the library, and they must free by themselves anything they allocated otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants