TLDR; see below for a video
of Mnemosyne working with OpenAI's Codex to automatically generate
reliable code.
GitHub Copilot is an "AI pair programmer" which on the surface
sounds identical to Mnemosyne. The differences lie in what
developer tasks each supports and in how each works. We'll
address these in turn. However, the top-level takeaway is that given
their different focuses Copilot and Mnemosyne can work productively
together. Think of Copilot as a very productive sloppy coder prone to
making correctness and security flaws, and Mnemosyne as an
automated defensive coder following behind adding tests, assertions,
type annotations, and fixing bugs and stability issues as they are
noticed. See below for a video of Mnemosyne and Codex (the AI
model that powers Copilot) working together.
What Mnemosyne and Copilot do
Copilot
As shown in their very impressive home page demos and youtube
videos GitHub's Copilot powered by the underlying OpenAI Codex
model do a great job of generating large amounts of running
code very quickly. These tools go from code comments or
documentation to running code.
Mnemosyne
As shown in our demo video, Mnemosyne focuses on adding tests,
types, and assertions of invariants to existing code. It also
updates existing code to fix minor flaws, fix failing tests, and
improve other properties such as floating point stability.
Mnemosyne operates by taking and returning argot (our
extension to LSP which communicates code, tests, types, and
prose). Note, in some cases Mnemosyne Muses do synthesize code,
e.g. our Trinity muse can synthesize mathematical code against
provided input/output sets as shown here.
A developer has to perform many disparate tasks in their daily work.
Producing code is certainly an important one, but writing the
structures that document, constrain, test, and defend that code are
also essential elements.
How Mnemosyne and Copilot work
Copilot
GitHub Copilot is a front end to OpenAI Codex which builds on
their GPT3 model. GPT3 is a massive machine learning model
supporting a generic "text in, text out" interface. In the case
of Copilot the text in is code comments and the text out is
running code.
Mnemosyne
Mnemosyne is a system that coordinates many developer assistants,
rather than a single assistant. The actual assistants currently
incorporated into Mnemosyne (i.e., muses) vary widely in their
method of implementation using techniques including formal
methods, machine learning, evolutionary computation,
and simple mechanical refactorings.
Mnemosyne and Codex working together
The following demo shows the integration of the OpenAI Codex model
into Mnemosyne where it works synergistically with Mnemosyne's
existing automated testing support to automate the development of
reliable software.
Codex generates a method body from a documentation string.
Unfortunately as is true of many ML-generated functions this
implementation has a bug.
After the generated code is inserted, Mnemosyne's Hypothesis muse
engages to test the new code.
Hypothesis finds an example illustrating an error in the generated
code.
The developer updates the documentation to describe newly
identified edge case.
Codex then generates a more correct function body in response to
the enhanced documentation string.
NOTE: Mnemosyne is in early development and is not yet ready for production use.