Learn to scale UP

the right way

Implement powerful pipelines and reach scale with:
Cloud, SRE, Analytics, Machine Learning, Microservices
    Data Analytics

    know more about your business using key metrics

    + analytics pipeline for metric data on GCP

    + lambda / stream processing for log data

    + quickest setup for a modern BI pipeline

    + scalable batch processing for event data

    CI / CD

    increase productivity of code delivery with automated workflows

    + painless setup using circleci and ansible

    + scalable kubernetes on gcp with cloud build

    + use aws codepipeline and codedeploy to deploy a traditional rails application

    Site Reliability

    maintain infrastructure and maximize availability of workloads

    + monitoring / alerting

    + automate aws with ssm, config, cloudwatch

    + automation with opsworks (chef) and ansible

    + kubernetes

    + logging with elk stack

    + high availability on aws

    Cloud Infrastructure

    + high availability in aws

    + security in aws

    + networking in aws

    + storage in aws

    Security

    + distribution / decomposition

    + communication

    + monitoring

    + data management

    Microservices

    + kubernetes managed cluster on AWS EKS

    + nomad with less complex container cluster

    + easier Kubernetes management with GKE Autopilot

    + easier Kubernetes management with AWS Fargate

    Machine Learning

    + distribution / decomposition

    + communication

    + monitoring

    + data management

What Are We Building?


Data Analytics Pipelines

know more about your business using key metrics

Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. With today’s technology, it’s possible to analyze your data and get answers from it almost immediately – an effort that’s slower and less efficient with more traditional business intelligence solutions.
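As a toy illustration of "key metrics", a metric like daily active users can be computed directly from raw event records. The records and field names below are hypothetical stand-ins for what a tracking pipeline might emit:

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw event records, as a tracking pipeline might emit them.
events = [
    {"user_id": "u1", "ts": date(2021, 3, 1)},
    {"user_id": "u2", "ts": date(2021, 3, 1)},
    {"user_id": "u1", "ts": date(2021, 3, 1)},  # same user, same day: counted once
    {"user_id": "u1", "ts": date(2021, 3, 2)},
]

def daily_active_users(events):
    """Count distinct users per day -- a common key business metric."""
    users_by_day = defaultdict(set)
    for e in events:
        users_by_day[e["ts"]].add(e["user_id"])
    return {day: len(users) for day, users in users_by_day.items()}

print(daily_active_users(events))
# {datetime.date(2021, 3, 1): 2, datetime.date(2021, 3, 2): 1}
```

At scale the same grouping and distinct-count logic runs in Spark or a warehouse instead of in-process, but the metric definition is identical.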

Tools

Apache Storm: Distributed and fault-tolerant realtime computation

big data, stream processing, analytics, distributed, dag

Apache Spark: Fast and general engine for large-scale data processing

big data, stream processing, analytics, batch processing

Apache Kafka: Distributed event streaming platform

big data, stream processing, streaming

Apache Airflow: A platform to programmatically author, schedule and monitor workflows

big data, analytics, data engineering, etl, pipelines

Apache Flink: Fast and reliable large-scale data processing engine

big data, stream processing, analytics, distributed

Presto: Distributed SQL Query Engine for Big Data

big data, sql, analytics

Apache Hudi: Ingests & manages storage of large analytical datasets over DFS

big data, data lakes

Delta Lake: Reliable Data Lakes at Scale

big data, data lakes

Great Expectations: Always know what to expect from your data

Apache NiFi: A reliable system to process and distribute data

big data, message queue, etl, analytics

Apache Hive: Data warehouse software for reading, writing, and managing large datasets

big data

Resources

article . architecture, infrastructure, machine learning, sql, spark . 10/15/2020

by Matt Bornstein / Martin Casado / Jennifer Li from a16z.com

Five years ago, if you were building a system, it was a result of the code you wrote. Now, it's built around the data that is fed into that system, and a new class of tools and technologies has emerged to process data for both analytics and operational AI/ML.


article . data engineering, architecture, data lake, optimization, pipeline . 03/01/2020

by Satish Chandra Gupta from satishchandragupta.com

For deploying big-data analytics, data science, and machine learning (ML) applications in the real world, analytics tuning and model training is only around 25% of the work. Approximately 50% of the effort goes into making data ready for analytics and ML. This article gives an introduction to the data pipeline and an overview of big data architecture alternatives.


100 min read

documentation . spark, analytics, data, processing, etl

from databricks.com

The official page for common terms used in the Spark ecosystem


tutorial . spark, r, data, notebook, analytics . 01/02/2017

by Max Woolf from minimaxir.com

An example notebook using Spark and R to process and analyze Product Reviews on Amazon


tutorial . logs, spark, data, databricks, rdd . 04/21/2015

by Ion Stoica / Vida Ha from databricks.com

Databricks provides a powerful platform to process, analyze, and visualize big and small data in one place. In this blog, we will illustrate how to analyze access logs of an Apache HTTP web server using Notebooks. Notebooks allow users to write and run arbitrary Apache Spark code and interactively visualize the results. Currently, notebooks support three languages: Scala, Python, and SQL. In this blog, we will be using Python for illustration.


90 min read
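The access-log analysis described above reduces to parsing each log line into fields and aggregating them. A minimal pure-Python sketch of that idea (no Spark required; the regex covers the Apache Common Log Format only, and the sample lines are made up):

```python
import re
from collections import Counter

# Apache Common Log Format, e.g.:
# 127.0.0.1 - - [21/Apr/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Parse one access-log line into a dict of fields, or None if malformed."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

logs = [
    '127.0.0.1 - - [21/Apr/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.5 - - [21/Apr/2015:10:00:01 +0000] "GET /missing HTTP/1.1" 404 209',
    '127.0.0.1 - - [21/Apr/2015:10:00:02 +0000] "POST /api HTTP/1.1" 200 53',
]

records = [r for r in (parse_line(line) for line in logs) if r]
status_counts = Counter(r["status"] for r in records)
print(status_counts)  # Counter({'200': 2, '404': 1})
```

In the Databricks notebook version, the same parse function is mapped over an RDD or DataFrame of log lines, so the per-line logic carries over unchanged.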

course . streaming, aws, dataframes, rdd, sql, spark

from sparkbyexamples.com

In this Apache Spark tutorial, you will learn Spark with Scala code examples; every example shown here is available in the Spark Examples GitHub project for reference. All examples are basic, simple, and easy to practice for beginners enthusiastic to learn Spark, and each was tested in our development environment.


Use Cases

monitor company growth

optimize operational goals within teams by tracking key metrics

identify customer churn

turn critical user events from funnels into actionable insights

risk modeling

prevent costly errors and support investment decisions

security analysis

detect potential attacks or suspicious behavior by analysing logs
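To make the funnel use case concrete, here is a small sketch that counts how many users reach each step of a hypothetical funnel; the drop-off between steps is where churn shows up (step names and data are invented for illustration):

```python
# Hypothetical funnel steps, in order, and the events each user fired.
FUNNEL = ["visit", "signup", "purchase"]

user_events = {
    "u1": {"visit", "signup", "purchase"},
    "u2": {"visit", "signup"},
    "u3": {"visit"},
    "u4": {"visit", "signup"},
}

def funnel_counts(funnel, user_events):
    """Count users who reached each step, having completed all earlier steps."""
    counts = []
    for i, step in enumerate(funnel):
        needed = set(funnel[: i + 1])
        counts.append(sum(1 for ev in user_events.values() if needed <= ev))
    return counts

counts = funnel_counts(FUNNEL, user_events)
print(counts)  # [4, 3, 1]
# Drop-off between adjacent steps shows where users churn:
# visit -> signup loses 1 user, signup -> purchase loses 2.
```

The same subset logic scales to a GROUP BY over event tables in Spark or a warehouse; the point is that "actionable insight" here is just the delta between adjacent counts.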

Diagrams

Spark Pipeline

nifi log collector -> kafka queue -> spark processing / hive and hdfs -> tableau

AWS Pipeline

fluentd log collector -> kinesis stream -> emr spark processing -> redshift -> tableau
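Both diagrams follow the same shape: collector -> queue -> processor -> sink. A plain-Python sketch of that shape, using an in-memory queue as a stand-in for kafka/kinesis (every component here is a toy stand-in, not the real tool):

```python
import queue

def collect(lines, q):
    """nifi/fluentd role: ship raw records into the queue."""
    for line in lines:
        q.put(line)
    q.put(None)  # sentinel marking end of stream

def process(q, sink):
    """spark/emr role: transform each record and write it to the sink."""
    while (item := q.get()) is not None:
        sink.append(item.upper())

raw = ["event a", "event b"]
q = queue.Queue()   # kafka/kinesis role: decouple producer from consumer
sink = []           # hive/redshift role: durable store queried by tableau
collect(raw, q)
process(q, sink)
print(sink)  # ['EVENT A', 'EVENT B']
```

The value of the queue in the middle is that the collector and processor can run, fail, and scale independently; swapping in the real components changes the plumbing, not the shape.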

Guides

design a robust analytics system with batch processing using airflow, singer, spark, bigquery and tableau

facilitate real-time data flow using flink, kafka, s3 and presto to react to new events instantly

get most of what you need in a minimal yet powerful analytics system using only fivetran, snowflake and mode

ship, process and store large amounts of events using snowplow, fluentd, airflow, spark and redshift

Glossary

copyright upslug.com @ seattle