Deploying Superset to the Cloud From Source

Gaurav Agrawal
3 min read · Jun 6, 2021

- “Apache Superset was designed from the ground up to be cloud native and scale out to large deployments. It was battle hardened at Airbnb, where it served 600+ weekly active users viewing over 100k charts.”

To handle that scale, Superset is deployed and run within Kubernetes. However, setting all of this infrastructure up in the cloud can be very heavy for a quick pilot or proof of concept. We want to empower data analysts and data scientists who are fluent in Python and SQL to deploy Superset end to end and to help evaluate whether Superset works for the teams they’re supporting.

In this post, we will walk you through how to deploy Superset without using Docker.

Here’s a map of the different services that are needed to run Superset:

  1. Nginx web server as reverse proxy
  2. Gunicorn as application server
  3. Superset as our BI app

(Diagram created using draw.io)

Here’s an overview of how traffic flows through Superset.

  1. Use Nginx to serve clients.
  2. Use Nginx to serve static resource requests.
  3. Use Gunicorn as the WSGI app server.

Installation:

Assuming that you have a Linux-based virtual machine ready, let’s dig into the installation process.

  1. Run the git command to check if you have git installed out of the box. If not, install it first (for example, through your distribution's package manager).
  2. Use git clone to pull down the source code from GitHub:
     $ git clone https://github.com/apache/incubator-superset.git
  3. Install the OS dependencies specific to your Linux distribution.
  4. Follow the Superset contributing guide to set up and build the Flask server and frontend assets.
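
The steps above can be sketched as a single script. This is only a rough sketch: the package names assume Ubuntu/Debian, the virtualenv name `venv` is arbitrary, and the build commands follow the contributing guide as it stood around the time of writing, so check them against the current guide. The `run` helper and the `DRY_RUN` switch are hypothetical conveniences (not part of Superset) that make the script print each command instead of executing it by default.

```shell
#!/usr/bin/env bash
# Sketch of the install steps. Prints each command unless DRY_RUN=0 is set.
set -e

DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper; echo the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Check that git is available.
run git --version

# Pull down the source code.
run git clone https://github.com/apache/incubator-superset.git
run cd incubator-superset

# OS dependencies (assumed Ubuntu/Debian package names; adjust for your distro).
run sudo apt-get install -y build-essential libssl-dev libffi-dev \
  python3-dev python3-pip python3-venv libsasl2-dev libldap2-dev

# Python environment and Flask backend.
run python3 -m venv venv
run . venv/bin/activate
run pip install -r requirements/local.txt
run pip install -e .
run superset db upgrade
run superset fab create-admin
run superset init

# Frontend assets (assumes Node.js and npm are already installed).
run cd superset-frontend
run npm ci
run npm run build
run cd ..
```

Run it once as-is to review the commands, then again with `DRY_RUN=0` to execute them.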

Deployment:

Superset as a Unix/Linux service

In this method, we use systemd to run Superset as a background service on the cloud VM, which helps keep Superset up and running.

Steps:

1. $ cd /etc/systemd/system
2. $ sudo nano superset.service
3. Paste in the contents of the superset.service unit file.
4. Start the service and check that it is up:
   $ sudo systemctl start superset
   $ sudo systemctl status superset
   $ sudo journalctl -xe (if there are any errors, check the logs this way)
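
The unit file itself is not shown in this text, so here is a minimal sketch of what one might look like. Everything specific in it is an assumption: a `superset` system user, the source checked out at /home/superset/incubator-superset with a virtualenv in `venv`, Gunicorn installed in that virtualenv, four workers, and port 6000. The block writes a draft to the current directory so you can review it before copying it into place.

```shell
# Write a draft unit file locally (user, paths and worker count are assumptions).
cat > superset.service <<'EOF'
[Unit]
Description=Apache Superset (Gunicorn)
After=network.target

[Service]
Type=simple
User=superset
WorkingDirectory=/home/superset/incubator-superset
Environment="PATH=/home/superset/incubator-superset/venv/bin:/usr/bin:/bin"
ExecStart=/home/superset/incubator-superset/venv/bin/gunicorn \
    --workers 4 \
    --timeout 120 \
    --bind 0.0.0.0:6000 \
    "superset.app:create_app()"
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# After reviewing the draft, install it and start the service:
#   sudo cp superset.service /etc/systemd/system/superset.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now superset
```

Binding to 0.0.0.0 lets you reach Gunicorn directly while testing; once Nginx sits in front, binding to 127.0.0.1 instead keeps the app server off the public interface.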

Systemd makes our lives easier: we can now simply start and stop Superset as a background process.

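You can also verify that the service works by visiting http://<your-vm-ip>:6000, but for this step you might need to open this port temporarily by tweaking your VM’s settings. Some browsers, Chrome among them, treat port 6000 as unsafe and may refuse to load it, so a curl request from a terminal is a more reliable check.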

Nginx Configuration for Superset

1. $ cd /etc/nginx/sites-enabled
2. Replace the content of the `default` file with the Nginx configuration for Superset.
3. Validate the configuration:
   $ sudo nginx -t
4. Restart Nginx and make sure it’s running by checking its status:
   $ sudo systemctl restart nginx
   $ sudo systemctl status nginx

Visit http://<your-ip>. You should be able to reach Superset at this point if no errors occurred.

$ sudo journalctl -xe (to get the logs if any errors occurred)
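
The Nginx configuration is not shown in this text either, so the following is a minimal sketch of a reverse-proxy setup that matches the traffic flow described earlier: Nginx serves static files itself and passes everything else to Gunicorn. The upstream name `superset_app`, the static path, and the seven-day expiry are assumptions; the port 6000 comes from the step above. As before, the block only writes a draft file for review.

```shell
# Write a draft Nginx site config locally (upstream name and paths are assumptions).
# The quoted 'EOF' keeps the shell from expanding Nginx variables such as $host.
cat > superset-nginx.conf <<'EOF'
upstream superset_app {
    server 127.0.0.1:6000;
}

server {
    listen 80 default_server;
    server_name _;

    # Serve static resources directly from the build output.
    location /static/ {
        alias /home/superset/incubator-superset/superset/static/;
        expires 7d;
    }

    # Everything else goes to Gunicorn.
    location / {
        proxy_pass http://superset_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF

# After reviewing the draft:
#   sudo cp superset-nginx.conf /etc/nginx/sites-enabled/default
#   sudo nginx -t
#   sudo systemctl restart nginx
#   curl -I http://localhost/health
```

The forwarded headers matter later: they are what Superset's proxy fix relies on to build correct redirect URLs.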

Extras:

  1. You can configure Nginx to do much more, such as HTTP caching, compression, and SSL.
  2. Configure Superset to use a database other than SQLite for the application data.
  3. Configure a Redis layer (for example, for caching).
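
For the second and third items, Superset reads overrides from a `superset_config.py` file found on PYTHONPATH (or at the path given in SUPERSET_CONFIG_PATH). Here is a minimal sketch that writes such a file. The PostgreSQL and Redis locations, the database name, and the credentials are all placeholders, and the matching Python drivers (for example psycopg2 and redis) are assumed to be installed in the virtualenv.

```shell
# Write a draft override file (hosts, names and credentials are placeholders).
cat > superset_config.py <<'EOF'
# Metadata database: PostgreSQL instead of the default SQLite file.
SQLALCHEMY_DATABASE_URI = "postgresql://superset:CHANGE_ME@localhost:5432/superset"

# Cache backed by Redis. Older Flask-Caching releases spell the type "redis".
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
}
EOF

# Make the file visible to Superset, then re-run the migrations:
#   export SUPERSET_CONFIG_PATH="$PWD/superset_config.py"
#   superset db upgrade && superset init
```

Switching the metadata database starts from an empty schema, so the admin user needs to be created again after the migration.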

Tips for troubleshooting:

  1. You might be missing an OS-level dependency. If so, try installing again from the OS dependencies step mentioned above. If something is still missing, you may hit errors at the pip install -r requirements/local.txt step; searching for the error message online helps in most cases.
  2. Some errors are caused by the Python version, so make sure you are using Python 3.8, which works well with the current code.
  3. There is a known issue with Nginx on Ubuntu 18.04 LTS. If you are using this specific version and face a similar issue, follow the linked thread.
  4. If visiting your service redirects you to localhost, try setting this flag: ENABLE_PROXY_FIX = True in config.py.
  5. Read config.py carefully and tweak things as per your requirements.
  6. Use Stack Overflow and the GitHub issue tracker to search for known issues you might have hit. If something unique pops up, you can also reach the community through Slack, the issue tracker, and the mailing lists.
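
Tips 2 and 4 can be checked from the shell. This is a small sketch, assuming you keep overrides in a `superset_config.py` file in the current directory rather than editing config.py in the source tree; the `CONFIG_FILE` variable is a hypothetical name used only here.

```shell
# Tip 2: print the interpreter version (run this inside the virtualenv).
if command -v python3 >/dev/null 2>&1; then
  python3 --version
else
  echo "python3 not found on PATH"
fi

# Tip 4: add the proxy fix flag to the override file, but only once.
CONFIG_FILE="${CONFIG_FILE:-superset_config.py}"
if ! grep -q '^ENABLE_PROXY_FIX' "$CONFIG_FILE" 2>/dev/null; then
  echo 'ENABLE_PROXY_FIX = True' >> "$CONFIG_FILE"
fi
```

Restart the Superset service afterwards so the new setting is picked up.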

Happy Dashboarding!
