
Launch HN: Hubble (YC S20) – Monitor data quality inside data warehouses https://ift.tt/3gno0Zj

Hey everyone! We’re Oliver and Hamzah from Hubble ( https://gethubble.io/hn ). Hubble runs tests on your data warehouse so you can identify data quality issues. You can test for things like missing values, uniqueness of data, or how frequently data is added or updated.

We worked together for the last 4 years at a startup where we built and managed data products for insurers and banks. A common pattern we saw was teams taking data from their internal tools (CRM, HR system, etc.), application databases, and 3rd-party sources and storing it in a warehouse for analysis. However, when analysts or data scientists used the data for reports, they would spot something suspicious, and the engineering team would have to go through the data pipelines manually to find the source of the problem. More often than not it was something simple, like a spike in missing values because an ETL job failed, or stale data because a 3rd-party data source hadn’t updated correctly.

We realised that the raw data had to be reliable and trustworthy before you could move on to more interesting tasks like analysis, insight or predictions. We wanted to do this without having to write and maintain lots of individual tests in our code.

So we built Hubble, which connects to a data warehouse and creates tests based on the type of data being stored (e.g. freshness of timestamps, cardinality of strings, max value of numbers, missing values). We’ve also added the ability to write custom tests using a built-in SQL editor. All the tests run on a schedule and you’ll get an email or Slack alert when they fail. We’re also building webhooks and an Airflow operator so you can run tests immediately after an ETL job or trigger a process to fix a failing test. Instead of asking users to send their data to us, the tests are run in the data warehouse and we track the test results over time.

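To make the kinds of checks described above concrete, here is a minimal sketch of missing-value, uniqueness, and freshness tests written as SQL that runs inside the database. This is an illustration only, not Hubble's implementation: it uses Python's built-in sqlite3 module as a stand-in for a warehouse, and the function name `run_checks`, the `trips` table, and its columns are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def run_checks(conn, table, key_col, ts_col, max_age):
    """Run three illustrative data-quality checks; return {check_name: passed}.

    Table and column names are interpolated directly into the SQL, so they
    must come from trusted configuration, never from user input.
    """
    cur = conn.cursor()

    # Missing values: how many rows have a NULL in the key column?
    nulls = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_col} IS NULL"
    ).fetchone()[0]

    # Uniqueness: non-NULL values minus distinct non-NULL values.
    dupes = cur.execute(
        f"SELECT COUNT({key_col}) - COUNT(DISTINCT {key_col}) FROM {table}"
    ).fetchone()[0]

    # Freshness: is the newest timestamp recent enough?
    # Assumes timestamps are stored as ISO 8601 strings with a UTC offset.
    latest = cur.execute(f"SELECT MAX({ts_col}) FROM {table}").fetchone()[0]
    fresh = (
        latest is not None
        and datetime.now(timezone.utc) - datetime.fromisoformat(latest) <= max_age
    )

    return {"no_missing": nulls == 0, "unique": dupes == 0, "fresh": fresh}


# Hypothetical demo table with one duplicate id and one missing id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, started_at TEXT)")
now = datetime.now(timezone.utc).isoformat()
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, now), (2, now), (2, now), (None, now)],
)

results = run_checks(conn, "trips", "id", "started_at", timedelta(days=1))
print(results)
```

On this demo table the missing-value and uniqueness checks fail while the freshness check passes. Only the pass/fail results leave the database, which mirrors the idea of running tests in the warehouse rather than exporting the data.
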
Today we support BigQuery, Snowflake and Rockset (which lets us work with MongoDB and DynamoDB) and are adding more on request. We’re planning on charging $200 a month for a few seats, and $30-50 for extra users after that.

We’re still at an early access stage but want the HN community’s feedback, so we’ve opened up access to the app for a few days. You can try it out here: https://gethubble.io/hn . We’ve added a demo data warehouse you can start with that has data on COVID-19 cases in Italy and bike-share trips in San Francisco.

Thanks, and looking forward to hearing your ideas, experiences and feedback!

August 20, 2020 at 08:38PM

