apachehudi @apachehudi
Official Twitter handle of Apache Hudi. We marry stream processing to petabytes of data. https://t.co/Ka1NABVHlw hudi.apache.org Joined January 2019
Tweets: 417
Followers: 3K
Following: 134
Likes: 250
Come & join the Apache Hudi community on May 11th for an in-person event in Bangalore, India 🇮🇳 We have 3 amazing talks that touch upon key challenges in real-time data processing, enhancing time-to-insight & a preview of Hudi 1.0. Register: docs.google.com/forms/d/e/1FAI…
Apache Hudi + Daft = 💜 We're thrilled to share that 'Daft Distributed DataFrames' now supports reading Hudi Tables directly from the data lake. For the first time, you can use the power of Hudi directly in Python — without the need for JVM or Spark tinyurl.com/dafthudi
🎉 Excited to bring the Apache Hudi Meetup to Bangalore! Join us at Navi Office to hear about the challenges in data ingestion, improving time-to-insight & innovations in Hudi 1.0. 📅 Date: May 11, 2024 📍 Location: Navi Office, Bangalore 🔗 Link: forms.gle/wpd9gbKmC99GxK…
Building a near Real-time Lakehouse with Apache Hudi using AWS Stack. Real-time data analytics on operational data is increasingly becoming a standard requirement. A 🧵
Bin Packing Algorithm for "Small File" Issue in Lakehouses. Small File problem is one of the critical problems in a data lake that impacts query performance when reading files using compute engines. The problem occurs when writing data in smaller chunks 🧵
Query Optimization with 'Clustering' in Apache Hudi. Today I presented how the clustering service in Hudi makes a huge impact on the overall query perf. To highlight the difference, I ran the same query using Presto once before clustering & once after, on a 1 TB TPC-DS dataset.
A few months back, I started this 10-post blog series: @apachehudi from Zero to One, with a goal to give a comprehensive deep-dive of Hudi designs. Happy to share the last post today: (10/10) Becoming "One" - the upcoming 1.0 highlights #apachehudi open.substack.com/pub/datumagic/…
Super excited to bring the Monthly Hudi newsletter to the community! 🎉 There's just so much momentum happening with Apache Hudi & the overall lakehouse space that we needed to bring this to one consolidated place! Link: hudinewsletter.substack.com/p/hudi-newslet…
Have you checked out our YouTube yet? It has some of the amazing videos from our Community sync and Hudi Live sessions! ✅ Notion’s journey through different stages of data scale ✅ Shaping a Database Experience within Data Lakes with Apache Hudi Link: youtube.com/@apachehudi
Have general Hudi questions? Wondering about Hudi's best practices or tips for troubleshooting? We are happy to start hosting additional 1-1 office hours every week! Book it now at calendly.com/apache-hudi/of…
Join us tomorrow to learn more about @doris_apache & Apache Hudi's integration! 🗓️ 13th March 2024 | 8 AM PT | 11 AM ET
Leonard Xu @Leonardxbj
2K Followers 700 Following Flink PMC Member & Flink CDC Lead, Flink Connector TL @alibaba_cloud, focus on Streaming SQL & Data Integration
Gwen (Chen) Shapira @gwenshap
26K Followers 9K Following Co-founder of @niledatabase. Making SaaS global, elastic and chill. Find me at: https://t.co/uyuHg400cp
Onehouse @Onehousehq
921 Followers 98 Following Onehouse is the universal data lakehouse, offering a cloud-native managed lakehouse built on @apachehudi, accessible across table formats, engines and clouds.
Jacek Laskowski @jace.. @jaceklaskowski
7K Followers 874 Following Freelance Data Engineer | #ApacheSpark #DeltaLake #Databricks #ApacheKafka #KafkaStreams | Java Champion | @theASF | #DatabricksBeacons
Mim @mim_djo
9K Followers 3K Following #Fabric Enthusiast, Small Data And self service, #Microsoftemployee since Nov 2023, but my tweets are my own
Robin Moffatt 🍻.. @rmoff
10K Followers 661 Following DevEx Engineer at @Decodableco. Doing fun stuff with data and open source. 🌐 https://t.co/WparjfmCF5 🔗 Mastodon: @[email protected]
Vinoth Chandar @byte_array
1K Followers 236 Following Founder @Onehousehq, Creator of @apachehudi. Distributed/Data Systems, Linkedin, Uber, Confluent alum. (views are mine)
ABC @Ubunta
3K Followers 3K Following Data & ML Infrastructure for Healthcare https://t.co/FwocCiCQAT Opinions are पड़ोसी' In 🇩🇪Berlin from 🇮🇳Kolkata/छत्तीसगढ़
Eric Sammer @esammer
13K Followers 716 Following ceo at @decodableco! prev: @splunk, @rocanainc (acq'd), @cloudera. open source / dist systems / data. o'reilly author. [email protected]
Trino @trinodb
5K Followers 92 Following Distributed SQL query engine for big data, formerly known as PrestoSQL
🕺💃🤟 Alexande.. @emaxerrno
4K Followers 2K Following Founder & CEO of @RedpandaData - A Kafka® replacement for mission critical systems. 10x Faster; Safe; API compatible. 🇨🇴
Decodable @Decodableco
3K Followers 2K Following Decodable is a serverless real-time data platform built on #ApacheFlink. No clusters to set up. No code to write. No PhD required.
Apache - The ASF @TheASF
67K Followers 211 Following Official feed: The Apache Software Foundation. The world's largest Open Source foundation provides $22B+ worth of software for the public good at 100% no cost.
Alex Merced | Open Da.. @AMdatalakehouse
853 Followers 2K Following Developer Advocate at Dremio helping get the word about disruptive Open Data Lakehouse technology using best-in-breed tools like Dremio.
Tim Spann @PaaSDev
4K Followers 5K Following Principal Developer Advocate 🥑 Cloudera https://t.co/ZpBW3t3IQN #NiFi xPivotal #Flink #Kafka #FLaNK xStreamNative 🐈⬛ 🇺🇦 https://t.co/lKExpMlKcu
Apache SeaTunnel @ASFSeaTunnel
487 Followers 104 Following A distributed, high-performance data integration platform for the synchronization and integration of massive data. Medium: https://t.co/CCcPqCHccq
Gary A. Stafford @GaryStafford
3K Followers 5K Following Area Principal Solutions Architect @AWSCloud | AWS Analytics Technical Field Community | 10x AWS Certified Pro | Former @ThoughtWorks & @Accenture
StreamNative @streamnativeio
2K Followers 29 Following StreamNative was founded by the original creators of Apache Pulsar and offers a fully managed Pulsar solution.
Avishek Halder @avishekhalder07
24 Followers 370 Following Full Stack Developer skilled in JavaScript,React, laravel Node.js, and Golang. Passionate about crafting efficient web solutions for seamless user experiences.
Fawad Awan @fawadawn66
6 Followers 142 Following
Jack Vanlightly @vanlightly
3K Followers 218 Following @confluentinc thinking about event streaming. Previously @Splunk, @VMware @[email protected] Credit: ESO/B. Tafreshi (https://t.co/DvCarvC9L3)
ALEX TSIGARIDAS @AlexTsigar52988
12 Followers 180 Following
Jayne Feehan @FeehaJay
86 Followers 5K Following
Danilo Soto @DaniloJSoto
32 Followers 1K Following
VIVEK ANANDH.K.M @VIVEKANANDHKM2
49 Followers 719 Following
DeltaStream @DeltaStreamInc
274 Followers 66 Following Unified stream processing, powered by Apache Flink®. Get a free trial 👇
Ajithkumar @ajith__ds
55 Followers 138 Following I'm a aspiring data scientist, have completed Data science program in 2020 and did MCA in 2019. Looking for Analytics / data scientist roles.
Joe Matusik @vagueadvice268
2 Followers 10 Following
[email protected].. @zkancs
951 Followers 2K Following ODDS| Data Product Developer. Passionate in software engineering, data engineering, and data science. ♥
TENTANANO @tentanano
8 Followers 125 Following
Mike Caine @mikey_caine
196 Followers 5K Following
Craig K @CKSolnEngineer
2 Followers 77 Following
Mohan Rajendran @MohanRajendran
75 Followers 2K Following Engineering@Amazon Photos | Previously Amazon Ads
kevinprice41 @k3v1nPr1c3
13 Followers 20 Following
Mauro Reinehr @MauroReinehr
69 Followers 1K Following
Alejandro Duarte @alejandro_du
3K Followers 799 Following #Java #SQL #Programming #RaspberryPi #Vaadin #MariaDB #DevRel Published Author · Software Engineer · Developer Relations Engineer at MariaDB
Eldrid Rensburg @EldridRensburg
19 Followers 1K Following In the beginning, the Universe implemented Unix (Linux) & C (C with Classes) & said: let there = vars; & saw that it was good . . ¯¯\_(ツ)_/¯¯ . . ʕつ•ᴥ•ʔつ
dinesh kr anand @DineshAnand30d
343 Followers 4K Following
Rolandas Ziukevicius @rolandaszz
26 Followers 827 Following
Kazimir Lyshchynski @k_lyshchynski
1 Followers 397 Following Individuum over socium, discipline over individuum
Kaiming @ AutoMQ @wan0573
58 Followers 556 Following Architect & Lead Evangelist @AutoMQ_lab. Formerly lead CDC Platform @alibaba_cloud & co-founder @CloudCanal. Interested in data streaming & CDC.
vinayde @vinayde
37 Followers 317 Following
jiwen liu @jiwen_liu57664
0 Followers 4 Following
Dai Mars @DaiMars3306
29 Followers 91 Following
Data Mentor @mentor_data
12 Followers 24 Following
Madhusudhanan Vri @MadhusudhananVr
5 Followers 31 Following
高级码农 @9OQ3QlckSsA5i1F
38 Followers 374 Following
Keith Kraus @keithjkraus
1K Followers 1K Following CTO and Co-Founder @VoltronData, @RAPIDSAI maintainer, @condaforge core. Previously @NVIDIA. My thoughts are my own.
Raxit @raxit65535
28 Followers 187 Following well I don't think that much About my self. there are lots of other interesting topics & problems to invest time in.
Venkateshkumar Siva @venkystweet
33 Followers 395 Following Bharat🇮🇳 Proud Farmer👨🏻🌾 CSKian🦁 Java Developer💻 #KonguNadu🏹🦚🐂🐏🐓🐅
Gunnar Morling 🌍 @gunnarmorling
51K Followers 302 Following Software engineer @Decodableco · Ex-lead of Debezium · Spec lead of Bean Validation 2.0 · Creator of JfrUnit, kcctl and MapStruct · Java Champion · 🚴
Leonard Xu @Leonardxbj
2K Followers 700 Following Flink PMC Member & Flink CDC Lead, Flink Connector TL @alibaba_cloud, focus on Streaming SQL & Data Integration
Gwen (Chen) Shapira @gwenshap
26K Followers 9K Following Co-founder of @niledatabase. Making SaaS global, elastic and chill. Find me at: https://t.co/uyuHg400cp
Onehouse @Onehousehq
921 Followers 98 Following Onehouse is the universal data lakehouse, offering a cloud-native managed lakehouse built on @apachehudi, accessible across table formats, engines and clouds.
Jacek Laskowski @jace.. @jaceklaskowski
7K Followers 874 Following Freelance Data Engineer | #ApacheSpark #DeltaLake #Databricks #ApacheKafka #KafkaStreams | Java Champion | @theASF | #DatabricksBeacons
Mim @mim_djo
9K Followers 3K Following #Fabric Enthusiast, Small Data And self service, #Microsoftemployee since Nov 2023, but my tweets are my own
Robin Moffatt 🍻.. @rmoff
10K Followers 661 Following DevEx Engineer at @Decodableco. Doing fun stuff with data and open source. 🌐 https://t.co/WparjfmCF5 🔗 Mastodon: @[email protected]
Vinoth Chandar @byte_array
1K Followers 236 Following Founder @Onehousehq, Creator of @apachehudi. Distributed/Data Systems, Linkedin, Uber, Confluent alum. (views are mine)
ABC @Ubunta
3K Followers 3K Following Data & ML Infrastructure for Healthcare https://t.co/FwocCiCQAT Opinions are पड़ोसी' In 🇩🇪Berlin from 🇮🇳Kolkata/छत्तीसगढ़
Trino @trinodb
5K Followers 92 Following Distributed SQL query engine for big data, formerly known as PrestoSQL
Apache - The ASF @TheASF
67K Followers 211 Following Official feed: The Apache Software Foundation. The world's largest Open Source foundation provides $22B+ worth of software for the public good at 100% no cost.
Gary A. Stafford @GaryStafford
3K Followers 5K Following Area Principal Solutions Architect @AWSCloud | AWS Analytics Technical Field Community | 10x AWS Certified Pro | Former @ThoughtWorks & @Accenture
StreamNative @streamnativeio
2K Followers 29 Following StreamNative was founded by the original creators of Apache Pulsar and offers a fully managed Pulsar solution.
Nilesh Mahajan @nilesh_mahajan
372 Followers 128 Following Micro-SAAS founder, Engineer and Writer. Building @walkthrough_so now. Former @uber @ebay
Amazon Web Services @awscloud
2.2M Followers 465 Following The official account for Amazon Web Services (#AWS). ☁️ For help, please contact: @AWSSupport
Dipankar Mazumdar🥑 @Dipankartnt
1K Followers 531 Following Staff Data Engineering Advocate @OnehouseHQ, prev DevRel @Dremio, R&D @Qlik, Data @OtisElevatorCo | Author (O’Reilly) | Research: https://t.co/AiDKzVJCGa
Alluxio @Alluxio
1K Followers 201 Following Data Orchestration for analytics and machine learning in the Cloud. @TachyonProject is now @Alluxio! [email protected]
Mindy Ferguson @woman_hattan
373 Followers 572 Following VP AWS, Messaging and Streaming, @awscloud Urban birdwatcher, photographer (Nikon Z9), squash and tennis fan/player, loves fly fishing. NYC/LA
Sid Anand @r39132
2K Followers 700 Following Dad, Hacker, Ambivert, Nature, History, & Science buff, Futbol fan (he/him) #TweetsMyOwn
soumil @soumil44145290
35 Followers 6 Following Hello! I’m Soumil shah | full stack python developer
Fred Pace @fpace
1K Followers 616 Following Dad, geek, tech junkie, data jockey, gadget freak, gamer, burger aficionado, pilot, Ducati owner. Retired... Former electron mover at MSFT and AMZN.
Haider Sabri @hsabri
861 Followers 474 Following Former Head of Product & Engineering @TrainWithTempo, Former Head of Engineering @UberEats
Andy Walner @andywalner
235 Followers 939 Following Product Manager @OnehouseHQ // prev @Google, @DashworksAI, @UMich
Anand Babu Periasamy @abperiasamy
2K Followers 150 Following MinIO, Gluster, Startups, Angel Investor. “Where there is love there is life.” ― Mahatma Gandhi
Sarah Krasnik Bedell @sarahmk125
3K Followers 1K Following Analytics & GTM. @PrefectIO @Perpay_inc @JohnsHopkins @NorthwesternU. Blogging at https://t.co/jyBE9wg5cs. Ski ⛷️ and sail ⛵ in Vermont 🌲
Simon Späti 🏔️ @sspaeti
3K Followers 1K Following Dad. Technical Author, Data Engineer and Educator https://t.co/49Ty3GXkqs, https://t.co/7r8pihWPQz. Tweets mostly: #dataengineering, #opensource, #writing, #pkm and #neovim
Apache Kyuubi @KyuubiApache
176 Followers 116 Following Apache Kyuubi: A Distributed and Multi-tenant Gateway to Provide Serverless SQL on Lakehouses. https://t.co/ETCmmPjkfX
iamvinoth @iamvinoth
42 Followers 56 Following
Tim Meehan @tdcmeehan1
21 Followers 0 Following
Adam Breindel @adbreind
654 Followers 332 Following
Bhaskar Ghosh @BGMusings
448 Followers 390 Following
Md Hishaam Akhtar @HishaamAkhtar
326 Followers 348 Following They're sharing a drink they call lonliness, but its better than drinking alone Opinions are my own and are subject to change.
Vaibhav Nivargi @vnivargi
770 Followers 2K Following Founder & CTO @moveworks. Previously: Founder @clearstorydata (@alteryx); Early engineer @asterdata (@teradata); @stanford CS
Mahdi Karabiben @MahdiKarabiben
455 Followers 2K Following Product @Siffletdata. Ex-Zendesk. I love hearing what the data has to say. Views are my own. he/him.
Denodo @denodo
6K Followers 6K Following #Denodo is the leader in #datamanagement – providing unmatched performance, unified access to the range of enterprise, #bigdata, #cloud and unstructured sources
Darragh Kennedy @darraghke
1K Followers 2K Following Director of Engineering @Zendesk - views are my own
Léo Biscassi @leobiscassi
72 Followers 174 Following Problem solver, lifelong learner, curious about data systems.
Apache Doris @doris_apache
1K Followers 2K Following An open-source real-time data warehouse. Github: https://t.co/8SplJcHxKH Slack: https://t.co/qOIgHkaZc0
Sagar Sumit @sagarsumit6
55 Followers 65 Following Database Engineering @Onehousehq | PMC Member & Committer @apachehudi | CS @gtcomputing
AWS Blogs (Unofficial.. @AWSBlogs
4K Followers 1 Following Unofficial feed of AWS blog posts across all categories. Built and maintained by @donkersgood.
Surya Prasanna @ThinkSurya
94 Followers 1K Following
Roshan Naik @naikrosh
97 Followers 33 Following Realtime Big Data. Architect of Apache Storm 2.0's high performance engine, the Kappa+ architecture and Castor. Physiology hacker.
Ananth Packkildurai @ananthdurai
2K Followers 2K Following Data @Zendesk, @SlackHQ | Author https://t.co/rvlBOXX0cy | Creator of https://t.co/XdMVrxUay6 | Angel Investor | Advisor for early stage data startups
Simon Whiteley @MrSiWhiteley
3K Followers 589 Following Director of Engineering / Owner of @AdvAnalyticsUK, Speaker & Consultant. Spark Nerd. Londoner, foodie & gamer! Microsoft MVP. Databricks Beacon. He/Him.
Eliad Gat 🇮🇱 @eliadgat
17 Followers 112 Following
Bob Haffner @bobhaffner
258 Followers 120 Following Data Engineer | Host of @EngSideOfData #dataengineering #dataengineer podcast: https://t.co/07UIkRSRci
The Magic of Hudi + Flink, Stream Processing on the Data Lakehouse x.com/i/broadcasts/1…
Building an Open Data Lakehouse on S3 with @apachehudi & @prestodb. Super excited about this new workshop that I am running with the Presto team on building an open lakehouse architecture & doing ad hoc analytics on top of it.
.@BrennaBuuck has previously explored how MinIO & Hudi can work together to build a modern #datalake. This blog post aims to build on that knowledge & offer an alternative implementation of @apachehudi & MinIO that leverages Hive Metastore Service (HMS). hubs.li/Q02pVqBX0
@apachehudi: From Zero To One (9/10) introducing HoodieStreamer - a Swiss Army knife for ingestion! #apachehudi #apachespark #apachekafka #distributedsystems #dataprocessing #cdc #dataengineering #databases #datalake #lakehouse blog.datumagic.com/p/apache-hudi-…
#apachehudi's #lakehouse offers a utility, "Hudi Streamer", for data ingestion: ✅ ingests data from sources like @apachekafka, #apachepulsar, etc. ✅ supports auto checkpoint management & integration with schema registries @confluentinc ✅ supports backfills, one-off runs & more
More good stuff from @grabengineering - this time writing about how they are building a realtime datalake with tools including @ApacheFlink, @apachehudi, @ApacheSpark and @trinodb engineering.grab.com/enabling-near-…
During COVID, Zoom adoption soared and eng rapidly redesigned their log analytics with MSK, EMR, #apachehudi + Athena. This led to 82% compute cost savings and 90% on storage, while perf went from 5h->5min. The Blog 👉 aws.amazon.com/blogs/big-data… #datalakehouse #apachespark #apacheflink
Addressing the small file issue is critical for optimizing query performance on data lakes. The problem occurs when writing data in smaller chunks, e.g. when stream processing engines like @ApacheFlink ingest continuous data streams into table formats like Apache Hudi. 🧵
Hudi's indexing vs. Multi-modal indexing - Catch the details around #apachehudi indexing here: hudi.apache.org/docs/indexing #dataengineering #datalakehouse #databases
Some common examples are Apache Iceberg (iceberg.apache.org), Apache Hudi (hudi.apache.org), and Azure Data Lake 1 (learn.microsoft.com/en-us/azure/da…)
Our latest blog explores how integrating #YugabyteDB with #ApacheHudi enhances data lakehouse capabilities through: ➡️real-time data processing ➡️efficient upserts and deletes ➡️improved consistency and scalability Find out more! ⬇️ hubs.la/Q02k0Tl00
🤿And for a deep dive into the value of the data lakehouse, download our free ebook, Building a Universal Data Lakehouse. onehouse.ai/whitepaper/one…
🛫The Big Data Show in Bengaluru this past weekend was a huge success. We ❤️ this session: Journey into the Data Lakehouse: Unveiling the Third Generation of Data Design. 💡Data engineering leaders from Onehouse, Visa, and Walmart discussed key questions - including the…
Concurrency Control in Apache Hudi. Table formats in a #datalake support concurrent access to data by multiple transactions. This is one of the most important problems tackled by a lakehouse architecture as opposed to data lakes. A 🧵
Apache Hudi 1.0 will bring database-like capabilities to the open source lakehouse architecture. Here’s an accessible write-up of the intro session from Open Source Data Summit, with a punchy pair of videos covering key points. Read, watch, and learn! 😏 onehouse.ai/blog/apache-hu…
Looking forward to this one @Dipankartnt 👏 datadaytexas.com/2024/sessions#… #OneTable #apacheiceberg #apachehudi #deltalake
Join @Dipankartnt at @DataDayTexas happening on 27th Jan 2024. He will be talking about OneTable & how it solves the interoperability challenges among #lakehouse table formats. Join him with some of the other amazing folks in the data space in Austin, TX!