Open source In-Memory database architecture



My friend and I are considering building an open source, in-memory relational database.

We have discussed the database architecture and our goals.

I’ll try to summarize them in this blog post.

We have a few objectives:

    • To create a database using C++.
    • To create a database which will work on Linux, Mac OS X and Windows.
    • To use as many open source components as we can.
    • To create a database which is very fast, has low memory consumption and needs little administration (no need to rebuild and optimize indices, handle deadlocks, etc.).
    • To be able to handle very large data sets.

 

In-Memory database architecture diagram

Here is a small diagram of our vision of the database architecture.

Architecture

1) Management studio and ODBC/JDBC driver

We plan to use an ODBC/JDBC driver for communication.

This way, many languages will be able to use the database without us having to write native drivers.

I also think that phpMyAdmin supports ODBC for connection to MySQL.

Our plan is to use an SQL dialect similar to MySQL's so we can reuse phpMyAdmin as the management studio for our database.

We plan to use phpMyAdmin as the management studio, at least for the initial database release.

2) WIRE protocol

We could use the same wire protocol as MySQL and reuse the ODBC/JDBC drivers that MySQL uses, but if we do that we will be tied to their standard.

Also, we plan to use WebSockets from the C++ REST SDK to handle communication between our database and its clients.

Using this library will let us easily offer a REST service or enable direct communication between the browser and our database.

However, if we use WebSockets, we won’t be able to reuse the existing ODBC/JDBC drivers.

So there is a question: should we use the existing wire protocol and ODBC/JDBC implementation, or create our own wire protocol and drivers?
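To make the trade-off a bit more concrete: being tied to the MySQL standard starts at the packet framing. Each MySQL packet begins with a 4-byte header, a 3-byte little-endian payload length followed by a 1-byte sequence id. Below is a minimal sketch of encoding and decoding that header. The names `PacketHeader`, `HeaderBytes`, `encode_header` and `decode_header` are hypothetical and exist only for this illustration; they are not part of any existing driver or library.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration only.
struct PacketHeader {
    std::uint32_t payload_length;  // must fit in 24 bits
    std::uint8_t sequence_id;      // packet counter within one command
};

using HeaderBytes = std::array<std::uint8_t, 4>;

// Serialize the header: 3 bytes of little-endian length, then the sequence id.
HeaderBytes encode_header(const PacketHeader& h) {
    assert(h.payload_length <= 0xFFFFFFu);
    return {static_cast<std::uint8_t>(h.payload_length & 0xFF),
            static_cast<std::uint8_t>((h.payload_length >> 8) & 0xFF),
            static_cast<std::uint8_t>((h.payload_length >> 16) & 0xFF),
            h.sequence_id};
}

// Parse the 4 header bytes back into length and sequence id.
PacketHeader decode_header(const HeaderBytes& b) {
    std::uint32_t length = static_cast<std::uint32_t>(b[0]) |
                           (static_cast<std::uint32_t>(b[1]) << 8) |
                           (static_cast<std::uint32_t>(b[2]) << 16);
    return {length, b[3]};
}
```

A custom protocol would let us pick our own framing, but every driver would then have to implement it from scratch.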

3) Memory Manager

We will need to create a custom memory manager so that we can use lock-free structures and fine-tune memory allocation.
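As a rough illustration of what a lock-free allocation path could look like, here is a minimal sketch of a bump (arena) allocator that reserves memory by advancing an atomic offset with compare-and-swap. This is only one possible approach and not our actual design; the class name `BumpArena` and its interface are assumptions made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical arena: hands out memory from one fixed buffer by bumping an
// atomic offset, so concurrent allocations do not need a mutex.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity)
        : buffer_(new std::byte[capacity]), capacity_(capacity), offset_(0) {}

    // Returns `size` bytes aligned to `align`, or nullptr when the arena is full.
    // `align` must be a power of two no larger than alignof(std::max_align_t).
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        std::size_t current = offset_.load(std::memory_order_relaxed);
        for (;;) {
            std::size_t aligned = (current + align - 1) & ~(align - 1);
            if (aligned > capacity_ || size > capacity_ - aligned) return nullptr;
            // On failure `current` is refreshed with the latest offset and we retry.
            if (offset_.compare_exchange_weak(current, aligned + size,
                                              std::memory_order_relaxed)) {
                return buffer_.get() + aligned;
            }
        }
    }

    std::size_t used() const { return offset_.load(std::memory_order_relaxed); }

    // Releases everything at once; only safe when no other thread uses the arena.
    void reset() { offset_.store(0, std::memory_order_relaxed); }

private:
    std::unique_ptr<std::byte[]> buffer_;
    std::size_t capacity_;
    std::atomic<std::size_t> offset_;
};
```

The trade-off in this sketch is that individual allocations cannot be freed; memory is reclaimed only by resetting the whole arena, which tends to fit short-lived, per-query data better than long-lived table data.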

4) Query processing

We plan to use the Lemon parser generator from SQLite for parsing queries.

We plan to create a custom compiler/optimizer and executor.

5) Transaction manager

Our database should support ACID transactions.

The transaction manager will handle all transaction operations.
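To illustrate just the atomicity part of ACID, here is a minimal sketch in which a transaction buffers its writes privately and applies them to the store only on commit. It deliberately ignores isolation between concurrent transactions and durability. The names `Store` and `Transaction` are hypothetical and are not taken from our code.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using Store = std::map<std::string, std::string>;

// Hypothetical transaction object: all changes stay in a private write set
// until commit(), so a rollback simply discards them.
class Transaction {
public:
    explicit Transaction(Store& store) : store_(store) {}

    void put(const std::string& key, const std::string& value) { writes_[key] = value; }
    void erase(const std::string& key) { writes_[key] = std::nullopt; }  // pending delete

    // Reads see this transaction's own pending writes before the shared store.
    std::optional<std::string> get(const std::string& key) const {
        if (auto it = writes_.find(key); it != writes_.end()) return it->second;
        if (auto it = store_.find(key); it != store_.end()) return it->second;
        return std::nullopt;
    }

    // Apply every buffered change to the store, then clear the write set.
    void commit() {
        for (const auto& [key, value] : writes_) {
            if (value) store_[key] = *value;
            else store_.erase(key);
        }
        writes_.clear();
    }

    void rollback() { writes_.clear(); }

private:
    Store& store_;
    std::map<std::string, std::optional<std::string>> writes_;
};
```

A real transaction manager would also need concurrency control and logging; this sketch only shows why buffered writes make rollback cheap.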

6) I/O Manager

This component will handle all reads from and writes to the hard drive.

7) Structures manager

We’ve created a new innovative structure for indices.

The main purpose of this component is to handle operations related to indices and data, as well as some metadata and statistics tables.

8) Recovery manager

This component will create snapshots from the transaction log and enable us to quickly recover from a system failure.
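The idea can be sketched roughly as follows: a snapshot remembers the last log sequence number (LSN) it covers, and recovery loads the snapshot and replays only the log records that come after it. This is a simplified in-memory illustration under our own assumptions (a key-value table, a log already sorted by LSN); the names `LogRecord`, `Snapshot`, `advance_snapshot` and `recover` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

using Table = std::map<std::string, std::string>;

// One committed change; an empty value means the key was deleted.
struct LogRecord {
    std::uint64_t lsn;
    std::string key;
    std::optional<std::string> value;
};

// Table contents as of `last_lsn`.
struct Snapshot {
    std::uint64_t last_lsn = 0;
    Table data;
};

// Build a newer snapshot by applying every log record the base has not seen yet.
// Assumes `log` is sorted by ascending LSN.
Snapshot advance_snapshot(Snapshot base, const std::vector<LogRecord>& log) {
    for (const LogRecord& record : log) {
        if (record.lsn <= base.last_lsn) continue;  // already in the snapshot
        if (record.value) base.data[record.key] = *record.value;
        else base.data.erase(record.key);
        base.last_lsn = record.lsn;
    }
    return base;
}

// Recovery: start from the latest snapshot and replay only the log tail.
Table recover(const Snapshot& snapshot, const std::vector<LogRecord>& log) {
    return advance_snapshot(snapshot, log).data;
}
```

The more often snapshots are taken, the shorter the log tail that has to be replayed after a crash, which is what makes recovery quick.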

Any feedback is welcome. Feel free to suggest improvements to our design or other open source components/projects we could use in this project.

 

  • Probably a fun project but this has been solved many times over. Why not just use an existing solution? Even just https://www.sqlite.org/inmemorydb.html ?

    • Hi robot. Thanks for the comment. Existing solutions do not have indexes and data optimized for in-memory access. Because of that, they are significantly slower than a DB tuned for in-memory use.

      • Granted, open-source solutions may not provide indexes, but in-memory queries may benefit far less from indexing, at least not enough to warrant the loss of space available for additional in-memory data. Maybe once we commonly have many TBs or more of RAM, indexes may become useful.

        • Actually, RAM will probably become much more important than you think very soon.
          http://www.theregister.co.uk/2014/06/11/hp_memristor_the_machine/

          • Hopefully 🙂
            And just a design thought: some applications that can benefit from an in-memory database might also want (close-to) direct access to table data, maybe just through APIs exposed at the storage-engine layer and maybe the executor layer. Essentially, I’ve had a desire to embed a database into my app, not just access it through some protocol.

          • Thanks for the feedback, NiceRobot.

  • Hi Radenko, is this project open-sourced now? Currently I’m looking for an open-source in-memory database that is implemented in C++, and I would like to contribute.

    • Hi Zhipeng.
      The project will be open-sourced, and it will start in the next 1-2 months.

      It needs to be developed from the beginning.
      If you are still interested in contributing, send me an email using the contact form so we can contact you with more details.