Apache Superset: Under The Hood

Gaurav Agrawal
5 min read · Jun 29, 2020


credits: Apache Software Foundation.

In this post, I will share some useful knowledge I gathered while working with Apache Superset’s codebase as a power user. As an active Superset community member, I hope this walkthrough helps other users gain a solid understanding of how Superset works, so they can take advantage of its features and utilities and help make it better.

Superset is an open-source business intelligence app built on top of technologies such as Flask App Builder, React, and SQLAlchemy.


The codebase is very mature and intuitive to understand.

Superset’s main building blocks are:

  1. Flask App Builder
  2. React
  3. SQLAlchemy
  4. Pandas

Superset’s Internal Design (A look from the top)

The FAB Within Superset

Flask App Builder (FAB) is an open-source application development framework built on top of Flask. It includes built-in detailed security management, auto CRUD generation for your models, and other useful features out of the box.

Superset uses FAB to enable its MVC architecture and security capabilities. It provides necessary admin views like:

  • User Profile
  • Password Reset views
  • User Management Views
  • Security Views
  • Home page

You can easily add your own models and views on top of these and theme the app as you like. FAB also allows highly granular security settings and comes with pre-built session management and a security manager. The security manager is the module that lets you define roles, assign permissions to roles, manage users and their roles, and so on.

Like most things in FAB, the security manager is customizable as well.

FAB supports multiple types of popular authentication methods out of the box, including:

  1. OAuth v1/v2
  2. LDAP
  3. Database
  4. OpenID
  5. REMOTE_USER

It’s really easy to set up your choice of authentication method with FAB.
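
As a rough illustration, the authentication method is picked with a single setting in the app's configuration. The snippet below is a minimal sketch of a `superset_config.py`, assuming a standard FAB/Superset install. The LDAP server address is a made-up placeholder, and the commented-out lines only hint at what switching methods might look like.

```python
# superset_config.py (sketch, not a complete configuration)
from flask_appbuilder.security.manager import AUTH_DB

# Database authentication: users and password hashes live in the
# metadata database. This is the method discussed in this post.
AUTH_TYPE = AUTH_DB

# Switching to LDAP would look roughly like this. The server URL is a
# hypothetical placeholder; replace it with your own.
# from flask_appbuilder.security.manager import AUTH_LDAP
# AUTH_TYPE = AUTH_LDAP
# AUTH_LDAP_SERVER = "ldap://ldap.example.com"
# AUTH_USER_REGISTRATION = True
# AUTH_USER_REGISTRATION_ROLE = "Gamma"
```

The other methods follow the same pattern: import the matching `AUTH_*` constant and add the settings that method needs.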

FAB’s database authentication method is one of the most popular choices and the default within Superset. It is simple to understand and works by using multiple tables joined through foreign keys.

Let’s see how it works from the top view in Superset:

  1. For every view and REST API, FAB automatically generates permission pairs of the form:
  • can `permission type` on `resource/view/api`
  • Example: can get on MyAPI (the user can make a GET request to the API “MyAPI”)

  2. The table “ab_view_menu” lists all the views and APIs on which permissions can be set.

  3. The table “ab_permission” lists the permission names, which describe the kind of action allowed and are later paired with a view or API.
  • Example: “can_post”

  4. The table “ab_permission_view” contains two foreign keys, permission_id and view_menu_id, which together form one complete permission.
  • Example: can_this_form_post on ResetPasswordView -> permission_id = 3 and view_menu_id = 6

  5. The table “ab_role” lists all roles, and the association table “ab_permission_view_role” maps a set of permissions to each role.

  6. Finally, the “ab_user” table defines a user, who is linked through “ab_user_role” to one or more roles and, through them, to all the permissions they have 😀
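
To make the chain of tables concrete, here is a small sketch that rebuilds a trimmed-down version of them in an in-memory SQLite database and resolves a user's permissions with joins. It is not FAB's code. The columns are cut down to the bare minimum, the names `ab_role`, `ab_user_role`, and `ab_permission_view_role` are what I understand FAB to use (check your own metadata database), and the user "alice" and the helper `permissions_for` are made up for the example.

```python
import sqlite3

# In-memory stand-in for FAB's security tables. Only the columns needed for
# the lookup are kept, so this is not the full real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ab_view_menu (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ab_permission (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ab_permission_view (
    id INTEGER PRIMARY KEY,
    permission_id INTEGER REFERENCES ab_permission(id),
    view_menu_id INTEGER REFERENCES ab_view_menu(id)
);
CREATE TABLE ab_role (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ab_permission_view_role (
    permission_view_id INTEGER REFERENCES ab_permission_view(id),
    role_id INTEGER REFERENCES ab_role(id)
);
CREATE TABLE ab_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE ab_user_role (
    user_id INTEGER REFERENCES ab_user(id),
    role_id INTEGER REFERENCES ab_role(id)
);

-- Sample rows mirroring the example in the list above.
INSERT INTO ab_view_menu VALUES (6, 'ResetPasswordView');
INSERT INTO ab_permission VALUES (3, 'can_this_form_post');
INSERT INTO ab_permission_view VALUES (1, 3, 6);
INSERT INTO ab_role VALUES (1, 'Gamma');
INSERT INTO ab_permission_view_role VALUES (1, 1);
INSERT INTO ab_user VALUES (1, 'alice');
INSERT INTO ab_user_role VALUES (1, 1);
""")


def permissions_for(username):
    """Walk user -> roles -> permission/view pairs and return readable strings."""
    rows = conn.execute(
        """
        SELECT p.name, vm.name
        FROM ab_user u
        JOIN ab_user_role ur ON ur.user_id = u.id
        JOIN ab_permission_view_role pvr ON pvr.role_id = ur.role_id
        JOIN ab_permission_view pv ON pv.id = pvr.permission_view_id
        JOIN ab_permission p ON p.id = pv.permission_id
        JOIN ab_view_menu vm ON vm.id = pv.view_menu_id
        WHERE u.username = ?
        ORDER BY p.name, vm.name
        """,
        (username,),
    ).fetchall()
    return [f"{perm} on {view}" for perm, view in rows]


print(permissions_for("alice"))
```

A similar query against a real Superset metadata database can be a handy way to debug why a user can or cannot see something.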

All this work is handled by FAB internally. It is possible to de-granularize the permissions to avoid clutter and keep a clean set of permissions. However, as of Superset 0.36.0, the permissions are yet to be de-granularized 😕

Superset’s code follows naming conventions so well that you can often just “read” the code to understand which part is responsible for which feature or behaviour. This is something naturally expected from open-source projects, and almost every good, popular open-source project already has these qualities. Superset is no exception.

Superset Codebase Organization

Now, let’s talk about what I learnt about Superset’s code organization and how you can use this information to do some hacking yourself.

Superset is organized into two main directories:

  • superset
  • superset-frontend

Superset’s backend code, which is built on top of FAB, resides in the “superset” directory.

Superset’s frontend (React components, charts, fonts, CSS, image assets, etc.) lives in the “superset-frontend” directory.

Let’s talk a bit about Superset’s backend, and how you can try to find the piece of code you are looking for.

Superset’s backend is MVC architecture-based, so there are a few main files and directories which we should be interested in:

  • superset/models
  • superset/views/core.py
  • superset/views

I haven’t listed every file and directory here, only the ones I was able to explore and found to be at the core of Superset’s functionality and its features.

  1. superset/models
  • This directory contains the database models behind views such as Dashboard and Slice.
  • If you want to add a new MVC view or customize existing models related to application data, this is one of the first places to look.

  2. superset/views/core.py
  • As the name suggests, this is the core view file for Superset. It contains the Superset view class and some model views related to CSS templates.
  • Most importantly, this is where most of Superset’s APIs are defined, such as “explore”, “save_dash”, “tables”, and “schemas”.
  • Exploring these APIs is particularly worthwhile, since they deal with chart visualization.

  3. superset/views
  • This is where Superset’s views and backend APIs can be found.
  • For example, if you want to know how the backend for the export-slices-to-CSV API, sql_lab, dashboards, or charts works, this is the place to start looking.

Similarly, most of the files and directories in Superset are named and organized in such a way that they can be found easily with a global search, and the naming conventions make sense.
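
If you prefer scripting such a search over using your editor, something along these lines could do the job. It is a minimal sketch using only the standard library. The helper name `find_definitions` is made up for this post, and it assumes you point it at a local checkout of the repository.

```python
import re
from pathlib import Path


def find_definitions(root, name):
    """Return (relative path, line number) pairs where a function or class
    called `name` is defined in any .py file under `root`."""
    pattern = re.compile(rf"^\s*(?:async\s+def|def|class)\s+{re.escape(name)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if pattern.match(line):
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

For instance, calling `find_definitions("superset", "explore")` from the repository root should point you to wherever the “explore” endpoint is defined in your checkout, which in the version discussed here would be superset/views/core.py.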

This becomes a great advantage for exploring the codebase and understanding it. That is the whole point of good naming conventions and standards. 😌

To wrap up, I would like to share a few tips on how to speed up your hacking journey with Superset.

  1. You will need to set up a development environment. I highly recommend https://github.com/apache/incubator-superset/blob/master/CONTRIBUTING.md, which is an all-in-one guide for setting one up. It really helps!
  2. If you are looking to add features related to existing ones, for example adding new charts, supporting a new database, or including a new export method, then I recommend you check out the code and commits of those existing features. For example, the PR at https://github.com/apache/incubator-superset/pull/3013 shows how to add a new visualization to Superset.
  3. Look into existing issues and PRs that are related to your use case.
  4. Open issues and PRs to seek reviews and help from community members (like me!).
  5. Join the Superset community Slack channel to ask for quick help with your questions.

Happy Hacking!
