Launch HN: Data Mechanics (YC S19) – The Simplest Way to Run Apache Spark

Hi HN,

We’re JY & Julien, co-founders of Data Mechanics (https://ift.tt/2Ll02Aw), a big data platform striving to offer the simplest way to run Apache Spark.

Apache Spark is an open-source distributed computing engine. It’s the most widely used technology in big data, for two reasons. First, it’s fast (10-100x faster than Hadoop MapReduce). Second, it offers simple, high-level APIs in Scala, Python, SQL, and R. In a few lines of code, data scientists and engineers can explore data, train machine learning models, and build batch or streaming pipelines over very large datasets (ranging in size from 10 GB to petabytes).

While writing Spark applications is pretty easy, managing their infrastructure, deploying them, and keeping them performant and stable in production over time is hard. You need to learn how Apache Spark works under the hood, become an expert with YARN and the JVM, manually choose dozens of infrastructure paramet...
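
To give a sense of the manual tuning mentioned above, here is a minimal sketch of the kind of settings a team typically has to pick by hand before submitting a Spark job to YARN. The configuration keys and `spark-submit` flags are real Spark options, but the values are illustrative assumptions (not recommendations), and `build_spark_submit` and `my_job.py` are hypothetical names introduced only for this example.

```python
import shlex

# Real Spark configuration keys; the values are assumptions chosen for
# illustration, not tuning advice for any particular workload.
SPARK_CONF = {
    "spark.executor.instances": "10",
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
    "spark.executor.memoryOverhead": "1g",
    "spark.driver.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
    "spark.dynamicAllocation.enabled": "false",
}


def build_spark_submit(app_path, conf, master="yarn", deploy_mode="cluster"):
    """Return the spark-submit argument list for the given settings.

    Hypothetical helper: it only assembles the command line and does not
    launch anything.
    """
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    # Sort keys so the generated command is deterministic.
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # "my_job.py" is a placeholder application file.
    print(shlex.join(build_spark_submit("my_job.py", SPARK_CONF)))
```

Even this short list covers only a handful of the available settings, and many of them interact (for example, executor memory, memory overhead, and cores per executor together determine how many executors fit on a node), which is part of why getting them right by hand takes expertise.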