Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. With today’s technology, it’s possible to analyze your data and get answers from it almost immediately – an effort that’s slower and less efficient with more traditional business intelligence solutions.
Apache Storm: Distributed and fault-tolerant realtime computation
big data, stream processing, analytics, distributed, dag
Apache Spark: Fast and general engine for large-scale data processing
big data, stream processing, analytics, batch processing
Apache Airflow: A platform to programmatically author, schedule and monitor workflows
big data, analytics, data engineering, etl, pipelines
Apache Flink: Fast and reliable large-scale data processing engine
big data, stream processing, analytics, distributed
9 min read
article
architecture, infrastructure, machine learning, sql, spark
10/15/2020
by Matt Bornstein / Martin Casado / Jennifer Li from a16z.com
Five years ago, if you were building a system, it was a result of the code you wrote. Now, it's built around the data that is fed into that system, and a new class of tools and technologies has emerged to process data for both analytics and operational AI/ML.
article
data engineering, architecture, data lake, optimization, pipeline
03/01/2020
by Satish Chandra Gupta from satishchandragupta.com
For deploying big-data analytics, data science, and machine learning (ML) applications in the real world, analytics tuning and model training account for only around 25% of the work. Approximately 50% of the effort goes into making data ready for analytics and ML. This article gives an introduction to the data pipeline and an overview of big data architecture alternatives.
100 min read
documentation
spark, analytics, data, processing, etl
from databricks.com
The official page for common terms used in the Spark ecosystem
tutorial
spark, r, data, notebook, analytics
01/02/2017
by Max Woolf from minimaxir.com
An example notebook using Spark and R to process and analyze Amazon product reviews
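The notebook itself is written in R. For readers working in Python, a loosely equivalent PySpark sketch might look like the following; the file name reviews.json and its columns are illustrative assumptions, not taken from the notebook.

    # Loosely equivalent PySpark sketch; the original notebook uses Spark with R.
    # "reviews.json" and its column names are assumptions for illustration.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("amazon-reviews").getOrCreate()

    # Read one JSON review object per line into a DataFrame
    reviews = spark.read.json("reviews.json")

    # Average rating and review count per product, most-reviewed first
    (reviews.groupBy("product_id")
            .agg(F.avg("star_rating").alias("avg_rating"),
                 F.count("*").alias("n_reviews"))
            .orderBy(F.desc("n_reviews"))
            .show(10))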
5 min read
tutorial
logs, spark, data, databricks, rdd
04/21/2015
by Ion Stoica / Vida Ha from databricks.com
Databricks provides a powerful platform to process, analyze, and visualize big and small data in one place. In this blog, we will illustrate how to analyze access logs of an Apache HTTP web server using Notebooks. Notebooks allow users to write and run arbitrary Apache Spark code and interactively visualize the results. Currently, notebooks support three languages: Scala, Python, and SQL. In this blog, we will be using Python for illustration.
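To give a flavor of the approach, here is a minimal PySpark sketch that parses access logs with a Common Log Format regex and counts HTTP status codes; the file name access.log and the regex are assumptions for illustration, not code from the post.

    # Minimal PySpark sketch: parse Apache access logs, count status codes.
    # The path "access.log" and the Common Log Format regex are assumptions.
    import re
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("log-analysis").getOrCreate()

    LOG_PATTERN = r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\S+)'

    def parse_line(line):
        m = re.match(LOG_PATTERN, line)
        if m is None:
            return None  # skip malformed lines
        return (m.group(1), m.group(3), m.group(4), int(m.group(5)))

    logs = (spark.sparkContext.textFile("access.log")
            .map(parse_line)
            .filter(lambda row: row is not None))

    df = logs.toDF(["host", "method", "path", "status"])
    df.groupBy("status").count().orderBy("count", ascending=False).show()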
90 min read
course
streaming, aws, dataframes, rdd, sql, spark
from sparkbyexamples.com
In this Apache Spark tutorial, you will learn Spark with Scala code examples. Every sample explained here is available in the Spark Examples GitHub project for reference. All examples in this tutorial are basic and easy to practice for beginners who are enthusiastic to learn Spark, and each was tested in our development environment.
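The tutorial's samples are in Scala; as a rough PySpark equivalent of the kind of starter material it covers (RDDs, DataFrames, SQL), consider the sketch below. All names and data are illustrative.

    # Rough PySpark equivalent of typical starter examples from such a tutorial
    # (the Scala originals live in the Spark Examples GitHub project).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-basics").getOrCreate()

    # RDD word count
    words = spark.sparkContext.parallelize(["spark", "scala", "spark", "sql"])
    counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
    print(counts.collect())

    # DataFrame and SQL
    df = spark.createDataFrame([("Alice", 34), ("Bob", 45)], ["name", "age"])
    df.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 40").show()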
optimize operations within teams by tracking key metrics
turn critical user events from funnels into actionable insights
prevent costly errors and reduce support investment
be aware of potential attacks or suspicious behavior by analyzing logs
nifi log collector -> kafka queue -> spark processing / hive and hdfs -> tableau
fluentd log collector -> kinesis stream -> emr spark processing -> redshift -> tableau
design a robust analytics system with batch processing using airflow, singer, spark, bigquery and tableau (a DAG sketch follows this list)
facilitate real time data flow using flink, kafka, s3 and presto to instantly react on new events
get most of what you need in a minimal yet powerful analytics system using only fivetran, snowflake and mode
ship, process and store large amounts of events using snowplow, fluentd, airflow, spark and redshift
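As an illustration of how the airflow-based batch design above might be wired together, here is a minimal Airflow DAG sketch; every command, task id, and path is a placeholder assumption rather than a tested configuration.

    # Minimal Airflow DAG sketch for a singer -> spark -> bigquery batch pipeline.
    # All commands, ids, and paths are placeholders, not a tested configuration.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="batch_analytics",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = BashOperator(
            task_id="extract_with_singer",
            bash_command="tap-postgres --config tap.json | target-gcs --config gcs.json",
        )
        transform = BashOperator(
            task_id="transform_with_spark",
            bash_command="spark-submit transform_job.py",
        )
        load = BashOperator(
            task_id="load_to_bigquery",
            bash_command="bq load --source_format=PARQUET analytics.events 'gs://bucket/events/*.parquet'",
        )
        extract >> transform >> load

Tableau would then read from the resulting BigQuery tables; that step is configured in Tableau itself rather than in the DAG.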