
Update DSS webserver architecture


Purpose

This document outlines a change to the webserver approach used by the InterUSS DSS. This change is motivated by the shortcomings of the current approach as outlined below, and the described change is intended to address those shortcomings.

Current approach

Overview

Currently, we use automated tooling to convert a canonical OpenAPI description (in YAML format) to a gRPC/Protocol Buffer definition (.proto file) after applying custom preprocessing to the OpenAPI YAML. Then, the protocol buffer compiler generates Go code describing both the objects and endpoints defined in the .proto file – the core-service implements handlers for these endpoints and attaches them to a gRPC server. Finally, the protocol buffer compiler generates Go code to translate HTTP requests into gRPC requests – the http-gateway calls this code to instruct the grpc-gateway module to listen for matching HTTP requests and fulfill them using the remote gRPC server (core-service).

Advantages

Fully automated YAML → Go

All API objects, endpoint mappings, and handler signatures are generated with one command and no other human involvement. This nearly eliminates the possibility of human error when translating the API specification into server behavior.

Fully built and proven

This approach is certain to work (because it already does), and has been proven in many deployments over the past 2+ years. We have proven there are no “gotchas” at demonstration and small scales.

Shortcomings

Cannot support some interfaces

This approach passes all data through an automatically-generated protobuf container, so only data types which can be described by an automatically-generated protobuf can be handled. InterUSS narrowly avoided a major problem when implementing the SCD API: GeoJSON was tentatively going to be an input geo data format, GeoJSON expresses areas as an array of arrays, and protobufs cannot represent an array of arrays. There is a substantial risk that a future API which the DSS must support will use GeoJSON, and in that case it will be difficult to fulfill the InterUSS DSS's core mission of implementing standards-defined interfaces.
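
For illustration, the GeoJSON shape in question looks roughly like the Go type below (a sketch, not taken from the DSS codebase). The nested slices map directly to Go and JSON, but a protobuf field cannot be declared as "repeated repeated", so an automatically-generated .proto would need hand-crafted wrapper messages for each nesting level.

// Sketch of a GeoJSON Polygon as a Go type (illustrative, not DSS code).
package geojson

// Polygon mirrors the GeoJSON structure: an array of rings, each of which is an
// array of [longitude, latitude] positions; that is, an array of arrays of arrays.
type Polygon struct {
	Type        string        `json:"type"`        // always "Polygon"
	Coordinates [][][]float64 `json:"coordinates"` // rings -> positions -> coordinates
}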

Off-nominal responses are difficult

Nominal responses are easy and convenient to return because the handler signatures include a return value of the nominal type (as well as a Go-standard error). Returning a response of a different type in an off-nominal circumstance, like when some OVNs are missing, is very difficult: a custom status proto must be constructed and attached to an error returned from the handler, then custom middleware must detect that special error and translate it into another special proto before dispatching the result back to the http-gateway, where it must be interpreted by a complex decoding routine.

For essentially the same reason, it is still difficult to return a nominal result with an HTTP code other than 200, even though standard APIs call for this behavior.

A gRPC server and HTTP gateway do not present most of these problems for APIs that are designed to use gRPC, and therefore respect gRPC style and limitations.  However, this architecture can make the core DSS mission of implementing standards-defined interfaces very difficult to achieve when those interfaces deviate from gRPC style and limitations.
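
The contrast can be sketched as follows (illustrative only, not code from the DSS; the error reason and response body shown are assumptions). In the gRPC architecture the handler can only signal an off-nominal response by attaching a detail proto to an error and relying on middleware to translate it, whereas a plain HTTP handler simply writes the alternate status code and body.

// Illustrative contrast between the two styles (not DSS code).
package example

import (
	"encoding/json"
	"net/http"

	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// gRPC style: an off-nominal result must travel as an error with attached details
// that custom middleware later detects and re-encodes for http-gateway.
func offNominalGRPC() error {
	st := status.New(codes.FailedPrecondition, "missing OVNs")
	st, _ = st.WithDetails(&errdetails.ErrorInfo{Reason: "MISSING_OVNS"})
	return st.Err()
}

// Plain HTTP style: the handler writes the alternate status code and body directly.
func offNominalHTTP(w http.ResponseWriter, missing []string) {
	w.WriteHeader(http.StatusConflict)
	json.NewEncoder(w).Encode(map[string]interface{}{"missing_operational_intents": missing})
}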

Errors are hard to trace

The complex decoding routine in the http-gateway attempts to distinguish between

  1. Intentional off-nominal results with special codes, like missing OVNs
  2. Errors with specific response codes, but following the usual error payload format
  3. Unintentional core-service errors that were properly handled by core-service, and thus have well-defined error IDs
  4. Unintentional core-service errors that were not properly handled by core-service, and thus need an error ID to be assigned by http-gateway (meaning these errors are much more difficult to track down in core-service based on the content of the result returned to the client)
  5. Errors in http-gateway while handling an otherwise-valid core-service result
  6. Errors in http-gateway while handling an error result from core-service

This off-nominal scheme is very complex to understand, maintain, and interpret. Tracking down errors was nearly impossible before the addition of error IDs, and even now we often mistake #3 errors for #4 errors and therefore must look up the core-service error ID in the http-gateway log message using the http-gateway error ID presented to the client.

Deployment is more complex

A DSS instance is logically composed of the webserver and the supporting database: two container types. When the webserver is split into two containers (core service & protocol translation), the number of containers increases by 50% and the amount of cross-container coordination doubles. This coordination is nontrivial, as can be observed in a short wall of confusing errors when a developer deploys a local DSS instance with run_locally.sh (when http-gateway tries to connect to core-service before core-service is ready to serve requests). Moreover, the orchestration of the current system is extremely complex: the deployment definitions (jsonnet for Tanka) are 57% larger than the entire DSS server, database migrator, and mock OAuth provider combined.

Sysops is more complex

With a single webserver container, that container is either working or not working. With two containers (gateway + core service), each container can be working or not working, so there are twice as many possible states for the system to be in when an operator is debugging a problem, and twice as many choices from which to select the correct action to fix the problem (not counting the additional problems made possible by the interconnection between the containers which was not previously present).

Performance is suboptimal

To handle an HTTP request (the only kind of request supported externally, as the gRPC interface is not exposed and no clients have expressed an interest in using it), the request must always be deserialized into memory, handled, and the response serialized for dispatch to the client. This architecture, however, adds a number of additional steps:

  • Request must be reserialized to gRPC format
  • Request must be transmitted from http-gateway container to core-service container
  • Request must be deserialized to memory in core-service container
  • Response must be transmitted from core-service container to http-gateway container
  • Response must be deserialized to memory in http-gateway container
  • Response must be reserialized to HTTP format

System is more exposed to vulnerabilities

The use of the gRPC layer introduces a very large tree of dependencies into the project, and there have been multiple instances where one of those indirect dependencies had a security vulnerability. Breaking changes in Go and that dependency tree have made keeping the dependencies fully up-to-date challenging, leaving security vulnerabilities unresolved for longer.

Proposed approach

Outline

The approach outlined in this document to address most of the shortcomings described above is to:

  • Remove the gRPC layer entirely
  • Have core-service directly expose an HTTP interface which will be automatically generated from OpenAPI YAML files, requiring implementation of a strongly-typed interface
  • Replace http-gateway with a standard HTTP reverse proxy (which also incidentally takes a step toward splitting core-service into rid-service, scd-service, and aux-service)

Alternatives considered

Use deepmap/oapi-codegen for server code generation

Tested by adding a Dockerfile in the oapi-codegen repo root folder with:

# Build stage: compile oapi-codegen from source
FROM golang:1.16-alpine AS build
RUN mkdir /app
WORKDIR /app
COPY go.mod .
COPY go.sum .

RUN go mod download

RUN mkdir -p cmd
COPY pkg pkg
COPY cmd/oapi-codegen cmd/oapi-codegen

RUN go install ./...

# Runtime stage: copy only the built binary into a minimal image
FROM alpine:latest
COPY --from=build /go/bin/oapi-codegen /usr/bin
ENTRYPOINT ["/usr/bin/oapi-codegen"]

then

docker image build -t deepmap/oapi-codegen .

Then, in the dss repo root folder:

docker container run -v $(pwd)/interfaces/astm-utm/Protocol/utm.yaml:/utm.yaml:ro deepmap/oapi-codegen /utm.yaml -generate types,server > scd_server.gen.go

HTTP server support:

  • Echo
  • Chi
  • net/http
  • Gin

Pros

  • Generated code seems simple, clear and easy to use
  • Documentation seems good

Cons

  • Does not correctly handle anyOf, which is used extensively to get around OpenAPI 3.0's shortcomings. This could be fairly easily partially resolved by preprocessing such anyOf constructs into direct $refs, but then the comments automatically generated from the description tag would match the underlying type rather than the instance.
  • Does not have premade Docker image or Dockerfile
  • Poor CI usage: many commits to master fail CI checks
  • Does not handle multiple authorization scopes correctly

Use OpenAPITools/openapi-generator for server code generation

Tested with:

docker run --rm -v "${PWD}:/local" openapitools/openapi-generator-cli generate -i https://raw.githubusercontent.com/astm-utm/Protocol/master/utm.yaml --skip-validate-spec -g go-server -o /local/out/go

HTTP server support:

  • Echo
  • Chi
  • net/http
  • Gin

Pros

  • Extremely wide usage of the general tool (though usage of Go server generation may be less common)
  • Multiple generator variants for a Go server

Cons

  • Produces a huge number of files (many irrelevant for our purposes, including duplicates due to OpenAPI tags), and control of that behavior is clunky
  • Appears intended to generate a one-time starting point rather than separating out just the interface, which could be regenerated after handler implementations are finished (requires post-processing patching to adapt to our use case)
  • Appears to fail to define certain types when anyOf is used (e.g., AnyOfTime), so will probably need the same anyOf -> $ref preprocessing as deepmap/oapi-codegen
  • Appears to have some bugs (e.g., err := AssertanyOf<Time>Required(*obj.TimeStart))

Use a more-capable webserver framework rather than net/http

For example: gorilla/mux, labstack/echo, go-chi/chi, gofiber/fiber, gin-gonic/gin

Any of these frameworks could be fairly easily adopted at a later time by modifying our custom server autogeneration code, but none seemed to offer any particularly compelling features since full custom autogeneration means we can write exactly what we need and we’re not constrained by the drive to avoid boilerplate.

Implementation plan

Overview

The required changes to the project are unfortunately difficult to implement incrementally because the interface exposed by core-service will change, and therefore the service consumed by http-gateway will change. Therefore, both http-gateway and core-service will need to change into their new forms in one commit on master (though certainly developed on a separate branch until fully verified, like the transition to SCD API 0.3.17).

It may be possible to change one service at a time within core-service (i.e., changing SCD to an HTTP interface while RID and aux are still gRPC interfaces), but doing so would require appreciable extra development on http-gateway which would be immediately discarded at the conclusion of the upgrade. Developing the changes on a separate branch and then merging that branch into master all at once should require substantially fewer developer resources.

Phase 1: openapi-to-go-server

This phase will be complete when a Python tool in the repository can generate a complete HTTP interface written in Go given only an OpenAPI YAML and a few high-level directives. This tool will be fully custom to ensure it meets all InterUSS needs. This phase does not need to be developed on a separate branch because it does not affect any of the current DSS architecture.

Phase 1a: Parse all data types

Data types are defined in components/schemas within the YAML, so the tool will simply read the information necessary to render those data types as Go data types. Some data types are also defined inline with individual fields; the tool will parse those data types in the same way and assign them reasonable names.

Phase 1b: Parse all endpoints

Endpoints are listed by path then by verb; the tool will parse all endpoints including defined parameters, request body, and responses along with their corresponding bodies.

Phase 1c: Render Go data types

The tool will use the parsed data types to auto-generate Go code defining all the data types used in the OpenAPI YAML.
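
As a concrete illustration (assumed names and field shapes, not actual tool output), a schema such as SubscriptionState might be rendered as:

// Illustrative rendering of a components/schemas entry as a Go type.
package api

// SubscriptionState corresponds to components/schemas/SubscriptionState; field
// comments would be generated from each property's description in the YAML.
type SubscriptionState struct {
	SubscriptionID    string `json:"subscription_id"`
	NotificationIndex int    `json:"notification_index"`
}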

Phase 1d: Render handler interface

The paradigm of the server generated by the tool will be that an Implementation will be provided to the auto-generated server; the Implementation will implement a strongly-typed interface including all the parameters and request body, and return an object capable of representing any of the defined responses. In this phase, that interface will be defined (that is, the tool will fully define type Implementation interface, which will include a method for each path:verb endpoint).
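
A rough sketch of what such a generated interface could look like is shown below (the endpoint, type, and field names are assumptions for illustration, not the actual generated code):

// Illustrative sketch of the generated implementation interface (assumed names).
package api

import "context"

// Placeholder body types that would normally be generated from components/schemas.
type GetOperationalIntentReferenceResponse struct{}
type ErrorResponse struct {
	Message string `json:"message"`
}

// GetOperationalIntentReferenceResponseSet can represent any response the OpenAPI
// YAML defines for this endpoint; exactly one field is expected to be populated.
type GetOperationalIntentReferenceResponseSet struct {
	Response200 *GetOperationalIntentReferenceResponse
	Response404 *ErrorResponse
	Response500 *ErrorResponse
}

// Implementation is supplied by core-service; the auto-generated server calls one
// strongly-typed method per path:verb endpoint declared in the OpenAPI YAML.
type Implementation interface {
	GetOperationalIntentReference(ctx context.Context, entityID string) GetOperationalIntentReferenceResponseSet
}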

Phase 1e: Render HTTP server

The tool will auto-generate an HTTP server – that is, an object capable of responding according to the provided OpenAPI when its func Handle(w http.ResponseWriter, r *http.Request) method is called.
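
Continuing the illustrative types from the Phase 1d sketch above, the generated server's dispatch could look roughly like this (the routing pattern and helper are assumptions, not actual generated code):

// Illustrative sketch of the generated HTTP server dispatching to the Implementation.
package api

import (
	"encoding/json"
	"net/http"
	"regexp"
)

type Server struct {
	Implementation     Implementation
	opIntentRefPattern *regexp.Regexp
}

func NewServer(impl Implementation) *Server {
	return &Server{
		Implementation:     impl,
		opIntentRefPattern: regexp.MustCompile(`^/dss/v1/operational_intent_references/([^/]+)$`),
	}
}

// Handle matches the request against the endpoints defined in the OpenAPI YAML,
// invokes the corresponding Implementation method, and serializes whichever
// response the Implementation populated.
func (s *Server) Handle(w http.ResponseWriter, r *http.Request) {
	if m := s.opIntentRefPattern.FindStringSubmatch(r.URL.Path); m != nil && r.Method == http.MethodGet {
		resp := s.Implementation.GetOperationalIntentReference(r.Context(), m[1])
		switch {
		case resp.Response200 != nil:
			writeJSON(w, http.StatusOK, resp.Response200)
		case resp.Response404 != nil:
			writeJSON(w, http.StatusNotFound, resp.Response404)
		default:
			writeJSON(w, http.StatusInternalServerError, resp.Response500)
		}
		return
	}
	http.NotFound(w, r)
}

func writeJSON(w http.ResponseWriter, code int, body interface{}) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	json.NewEncoder(w).Encode(body)
}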

Phase 2: Migrate DSS webserver

In this phase, the gRPC server scaffolding in core-service is replaced with the server generated by the tool in Phase 1, and http-gateway is changed to a simple HTTP-to-HTTP reverse proxy rather than an HTTP-to-gRPC gateway.

Phase 2a: Migrate core-service

When this phase is complete, running the core-service binary will host an HTTP interface implementing all supported interfaces rather than a gRPC interface. Starting with this phase, the DSS will not run via any technique (including build/dev/run_locally.sh) except running its Go code directly on a host machine. The dss image will not build.

All content in pkg/api will be replaced with the new auto-generated content, and this package will then include the auto-generated server as well as the data types and implementation interface. The code in pkg/{rid|scd}/server/_handler.go will be combined with the code in pkg/{rid|scd}/application/_handler.go, but the core logic in all the handler routines should remain the same. The function signatures of these handlers will change to match the auto-generated implementation interface.

Phase 2b: Replace http-gateway with reverse proxy

When this phase is complete, http-gateway will proxy HTTP requests to the HTTP server in core-service and running a local instance (via build/dev/run_locally.sh) will again work.
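
A minimal sketch of such a proxy using the standard library is shown below (the core-service address and listen port are assumptions; in practice both would come from configuration):

// Minimal sketch of http-gateway as a plain HTTP reverse proxy (illustrative).
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Assumed address of core-service within the deployment.
	target, err := url.Parse("http://core-service:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)
	log.Fatal(http.ListenAndServe(":8082", proxy))
}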

Phase 2c: Update deployment

When this phase is complete, the DSS should retain all its original functionality and be deployable via the standard process. At this point, the Phase 2 branch should be ready to merge to master.