Mnemosyne and GitHub Copilot

TLDR; see below for a video of Mnemosyne working with OpenAI's Codex to automatically generate reliable code.

GitHub Copilot is an "AI pair programmer" which on the surface sounds identical to Mnemosyne. The differences lie in what developer tasks each supports and in how each works. We'll address these in turn. However, the top-level takeaway is that given their different focuses Copilot and Mnemosyne can work productively together. Think of Copilot as a very productive sloppy coder prone to making correctness and security flaws, and Mnemosyne as an automated defensive coder following behind adding tests, assertions, type annotations, and fixing bugs and stability issues as they are noticed. See below for a video of Mnemosyne and Codex (the AI model that powers Copilot) working together.

Mnemosyne and GitHub Copilot connected to a developer's IDE.

What Mnemosyne and Copilot do

Copilot: As shown in their very impressive home page demos and youtube videos GitHub's Copilot powered by the underlying OpenAI Codex model do a great job of generating large amounts of running code very quickly. These tools go from code comments or documentation to running code.
Mnemosyne: As shown in our demo video, Mnemosyne focuses on adding tests, types, and assertions of invariants to existing code. It also updates existing code to fix minor flaws, fix failing tests, and improve other properties such as floating point stability. Mnemosyne operates by taking and returning argot (our extension to LSP which communicates code, tests, types, and prose). Note, in some cases Mnemosyne Muses do synthesize code, e.g. our Trinity muse can synthesize mathematical code against provided input/output sets as shown here.

A developer has to perform many disparate tasks in their daily work. Producing code is certainly an important one, but writing the structures that document, constrain, test, and defend that code are also essential elements.

How Mnemosyne and Copilot work

Copilot: GitHub Copilot is a front end to OpenAI Codex which builds on their GPT3 model. GPT3 is a massive machine learning model supporting a generic "text in, text out" interface. In the case of Copilot the text in is code comments and the text out is running code.
Mnemosyne: Mnemosyne is a system that coordinates many developer assistants, rather than a single assistant. The actual assistants currently incorporated into Mnemosyne (i.e., muses) vary widely in their method of implementation using techniques including formal methods, machine learning, evolutionary computation, and simple mechanical refactorings.

Mnemosyne and Codex working together

The following demo shows the integration of the OpenAI Codex model into Mnemosyne where it works synergistically with Mnemosyne's existing automated testing support to automate the development of reliable software.

Codex generates a method body from a documentation string. Unfortunately as is true of many ML-generated functions this implementation has a bug.
After the generated code is inserted, Mnemosyne's Hypothesis muse engages to test the new code.
Hypothesis finds an example illustrating an error in the generated code.
The developer updates the documentation to describe newly identified edge case.
Codex then generates a more correct function body in response to the enhanced documentation string.