
Launch HN: Hubble (YC S20) – Monitor data quality inside data warehouses https://ift.tt/3gno0Zj

Hey everyone! We’re Oliver and Hamzah from Hubble (https://gethubble.io/hn). Hubble runs tests on your data warehouse so you can identify issues with data quality. You can test for things like missing values, uniqueness of data, or how frequently data is added or updated.

We worked together for the last 4 years at a startup where we built and managed data products for insurers and banks. A common pattern we saw was teams taking data from their internal tools (CRM, HR system, etc.), application databases, and 3rd party data and storing it in a warehouse for analysis. However, when analysts or data scientists used the data for reports, they would spot something suspicious, and the engineering team would have to manually go through the data pipelines to find the source of the problem. More often than not it was something simple, like a spike in missing values because an ETL job failed, or stale data because a 3rd party data source hadn’t updated correctly.

We realised that the raw data had to be reliable and trustworthy before you could build more interesting work on top of it, like analysis, insight or predictions. We wanted to get there without having to write and maintain lots of individual tests in our code.

So we built Hubble, which connects to a data warehouse and creates tests based on the type of data being stored (e.g. freshness of timestamps, cardinality of strings, max value of numbers, missing values). We’ve also added the ability to write custom tests using a built-in SQL editor. All the tests run on a schedule and you’ll get an email or Slack alert when they fail. We’re also building webhooks and an Airflow operator so you can run tests immediately after running an ETL job or trigger a process to fix a failing test. Instead of asking users to send their data to us, the tests are run in the data warehouse and we track the test results over time.
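As a rough illustration of the kinds of tests described above (missing values, uniqueness, freshness), here is a minimal Python sketch. It is not Hubble's implementation: it assumes an in-memory SQLite database standing in for the warehouse, and the `customers` table, the check names, and the helper functions are all hypothetical. Each check is a SQL query that runs inside the database and returns a single number, so only the result leaves the warehouse.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def run_checks(conn, checks):
    """Run each check's SQL in the database and compare the result to a threshold."""
    results = {}
    for name, (sql, params, max_allowed) in checks.items():
        value = conn.execute(sql, params).fetchone()[0]
        results[name] = {"value": value, "passed": value <= max_allowed}
    return results


def build_demo_warehouse(now):
    """Create a hypothetical table with one missing email, one duplicate id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, updated_at TEXT)")
    rows = [
        (1, "a@example.com", (now - timedelta(hours=2)).isoformat()),
        (2, None, (now - timedelta(hours=3)).isoformat()),
        (2, "c@example.com", (now - timedelta(days=3)).isoformat()),
    ]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def default_checks(now, max_age=timedelta(days=1)):
    """Hypothetical checks; each query returns a count of problems (0 is healthy)."""
    cutoff = (now - max_age).isoformat()
    return {
        # Missing values: rows where email is NULL.
        "email_missing_values": (
            "SELECT COUNT(*) FROM customers WHERE email IS NULL", (), 0),
        # Uniqueness: number of rows beyond the distinct ids.
        "id_uniqueness": (
            "SELECT COUNT(*) - COUNT(DISTINCT id) FROM customers", (), 0),
        # Freshness: 1 if the newest row is older than the cutoff, else 0.
        # ISO-8601 strings in one format compare correctly as text.
        "updated_at_freshness": (
            "SELECT CASE WHEN MAX(updated_at) >= ? THEN 0 ELSE 1 END FROM customers",
            (cutoff,), 0),
    }


# A fixed "now" keeps the example deterministic.
now = datetime(2020, 8, 20, 12, 0, tzinfo=timezone.utc)
conn = build_demo_warehouse(now)
results = run_checks(conn, default_checks(now))
for name, result in results.items():
    print(name, "PASS" if result["passed"] else "FAIL", result["value"])
```

In this sketch a custom test would just be another entry in the dictionary with its own SQL, and a scheduler or alerting step could act on any result whose `passed` flag is false.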
Today we support BigQuery, Snowflake and Rockset (which lets us work with MongoDB and DynamoDB) and are adding more on request. We’re planning on charging $200 a month for a few seats, and $30-50 for extra users after that.

We’re still at an early access stage but want the HN community’s feedback, so we’ve opened up access to the app for a few days. You can try it out here: https://gethubble.io/hn. We’ve added a demo data warehouse you can start with that has data on COVID-19 cases in Italy and bike-share trips in San Francisco.

Thanks, and we look forward to hearing your ideas, experiences and feedback!

August 20, 2020 at 08:38PM

