An introduction to Redash
2020-08-01 / modified at 2023-08-08

Redash is a free, open source, lightweight business intelligence tool for connecting to data sources, querying them, and visualizing the results.

Why Redash

Comparison prerequisites

  • Focus on visualization only; ETL and OLAP are out of scope.
  • Free and open source.
  • Only SQL or ElasticSearch data sources are considered; TSDBs (Prometheus/Loki) are out of scope.

Metrics

For comparison, here is my experience with each tool:

| BI Tools | Metabase(OSS) | Redash(OSS) | Grafana(OSS) | superset |
|---|---|---|---|---|
| Demo site | N, but can be created by heroku | Link | link | link |
| License | AGPL(Restricted) | BSD2 | Apache | Apache |
| Frontend | react | react, redash@viz-library | react, angular | react, d3 |
| Backend | Clojure | Python(Flask) | Go | Python(FAB) |
| HA support | | | | |
| Cluster | N(In progress) | N(Buy SaaS) | Y(Session based) | N(Celery is not) |
| Depends | Postgres | Postgres, Redis(as MQ) | Postgres | Postgres, Redis(as MQ) |
| Datasource | | | | |
| Engine & Drivers | Self-developed | QRDS(SQLAlchemy/pystache, RQ-Queue) | Self-developed | SQLAlchemy, Celery |
| Caching results | Y | Y | N | Y |
| Oracle | Y | Y | N(Enterprise Only) | Y |
| ElasticSearch | N | Y | Y | Y |
| AuthZ | | | | |
| Folder & Group | Y | Hidden(multi_org) | Y | N |
| RBAC | Y | N | N | N |
| Row-level security | N(Commercial Only) | N | N | N |
| AuthN | | | | |
| Forward auth | N | Y | Y | Y |
| Visualization | | | | |
| Iframe embedding Token | Y | N(always public) | Y | Y |
| Pivot table | Y | Y(A little hard) | Y(Plugin required) | Y |

Here is my selection strategy for picking a BI tool:

  • Just want a standalone SQL-based analysis app: Metabase
  • Iframe embedding: Metabase, Redash, Grafana, superset
  • Secondary customization: Redash, Grafana, superset, Metabase
  • Log analysis: Grafana, Redash, superset, Metabase
  • More requirements? Buy the enterprise/SaaS version of one of them

In our scenario (a backend stack), these are the main reasons for choosing Redash:

  • Less code to maintain; no modification of the core source code is required.
  • Newcomer-friendly guide for a production setup.
  • Raw SQL support (but open to injection, since SQL is not precompiled).
  • Query definitions and low-code support.

Here are more discussions between different teams.

Redash Inside

Architecture

Redash fetches and caches query results asynchronously.

[Diagram: architecture. Engine: RQ-Scheduler, RQ-Worker * N, Redis. Other nodes: WebApp(Flask), Frontend, DataLake, Postgres. Edge labels: timer task, consume, scrape raw data, caching, RESTful API, submit task, cache_results.]
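To make the fetch-and-cache flow concrete, here is a minimal sketch in plain Python. It is not Redash code: a `queue.Queue` stands in for Redis/RQ, a dict stands in for the Postgres results cache, and `run_against_datalake`, `submit_query`, and `worker` are hypothetical names I made up for illustration.

```python
import hashlib
import queue
import threading

# Stand-ins (assumptions): the Queue plays the role of Redis/RQ and the dict
# plays the role of the Postgres table that holds cached query results.
task_queue = queue.Queue()
cached_results = {}
cache_lock = threading.Lock()


def query_hash(sql):
    """Key cached results by a hash of the query text."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def run_against_datalake(sql):
    """Hypothetical data source call; a real worker would execute the SQL."""
    return {"query": sql, "rows": [[1], [2], [3]]}


def worker():
    """Consume tasks, fetch the raw data, then store the result in the cache."""
    while True:
        sql = task_queue.get()
        if sql is None:  # sentinel value: stop the worker
            task_queue.task_done()
            break
        result = run_against_datalake(sql)
        with cache_lock:
            cached_results[query_hash(sql)] = result
        task_queue.task_done()


def submit_query(sql):
    """Web app side: return the cached result, or enqueue a task and return None."""
    with cache_lock:
        hit = cached_results.get(query_hash(sql))
    if hit is None:
        task_queue.put(sql)
    return hit


if __name__ == "__main__":
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    print(submit_query("SELECT 1"))  # None: nothing cached yet, task enqueued
    task_queue.join()                # wait until the worker has cached the result
    print(submit_query("SELECT 1"))  # now served from the cache
    task_queue.put(None)
    thread.join()
```

The point of the split is that the web app never blocks on the data source: it only submits a task and reads whatever the workers have already cached.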

Scalability & High availability

Liebig’s law of redash

  • python-rq uses Redis as its queue implementation but lacks cluster support, so the single Redis instance needs AOF/RDB persistence.
  • Session/authorization: multiple Flask apps require shared sessions (Redis/DB/Traefik).
  • Scheduler: only a single instance per Redis.
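The shared-session constraint can be illustrated with a toy model. This is only a sketch under my own assumptions: `SessionStore` stands in for a shared backend such as Redis or a database table, and `AppInstance` is a hypothetical web app replica; neither is a Redash or Flask class.

```python
import secrets


class SessionStore:
    """Stand-in for a shared backend such as Redis or a database table."""

    def __init__(self):
        self._sessions = {}

    def create(self, user):
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = user
        return session_id

    def lookup(self, session_id):
        return self._sessions.get(session_id)


class AppInstance:
    """Hypothetical web app replica sitting behind the load balancer."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        return self.store.create(user)

    def current_user(self, session_id):
        return self.store.lookup(session_id)


if __name__ == "__main__":
    # Per-instance stores: a session created on app-1 is unknown to app-2.
    a1 = AppInstance("app-1", SessionStore())
    a2 = AppInstance("app-2", SessionStore())
    sid = a1.login("alice")
    print(a2.current_user(sid))  # None

    # One shared store: either replica can serve the request.
    shared = SessionStore()
    b1 = AppInstance("app-1", shared)
    b2 = AppInstance("app-2", shared)
    sid = b1.login("alice")
    print(b2.current_user(sid))  # alice
```

With per-instance state, a user who logs in on one replica looks logged out whenever the load balancer routes them to another, which is why the session has to live somewhere all replicas can reach.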

Production

For production use, you must deploy the Redash instances behind a load balancer such as an ELB.

[Diagram: production layout. Engine: RQ-Scheduler * 1, RQ-Worker * N, Redis AOF. Other nodes: ELB, WebApp(Flask) * N, Session Impl or Forward auth, Frontend, DataLake, Postgres. Edge labels: timer task, consume, raw data, caching, RESTful API, submit task, cache_results.]

However, no OSS HA solution is available. Buying a SaaS version may be the better option: it means less maintenance and is also a way to contribute to open source.

One more thing

  • Redash fills query parameters in through a template engine (Jinja-style `{{ }}` placeholders) rather than bound parameters, so NO SQL injection protection is provided.
  • The Redash frontend is NOT a low-code solution. We compared the productivity of Redash against ECharts + Vue, and writing the JS by hand turned out to be faster than editing JSON files in Redash.
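To show why templated SQL is risky, here is a small self-contained sketch using Python's built-in `sqlite3`. It does not use Redash's actual template engine: `run_templated` is a hypothetical helper that does naive `{{name}}` text substitution, and `run_bound` is the bound-parameter alternative for comparison.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with two users (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def run_templated(conn, name):
    """Naive text substitution: the parameter becomes part of the SQL text."""
    template = "SELECT name FROM users WHERE name = '{{name}}'"
    sql = template.replace("{{name}}", name)
    return conn.execute(sql).fetchall()


def run_bound(conn, name):
    """Bound parameter: the driver treats the value as data, never as SQL."""
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "alice' OR '1'='1"
    print(len(run_templated(conn, payload)))  # 2: the filter was bypassed
    print(run_bound(conn, payload))           # []: no user has that literal name
```

If you expose templated queries to untrusted users, it is worth restricting parameters to safe types (numbers, dates, dropdown values) or validating them before they reach the template.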

Reference