Freelance Data Engineer and instructor, enjoy solving data problems with #ApacheSpark #AWS #GCP #Azure 👨🏭 | [email protected] · waitingforcode.com · remote · Joined January 2011
I have been busy the last few months writing a book for O'Reilly about how to build ML systems (batch, real-time, and LLMs), distilling much of what I have learnt from both working with customers as well as students. Why could the book interest you?
* Data Scientists - transition…
I don't want to start a flame war here, but IMO it is a mistake to jump straight to distributed databases (and 90% of the content below is distributed databases) without first learning fundamentals on single node databases.
Here's my 10 things to understand about databases:…
The early release of Delta Lake: The Definitive Guide is here! 🎉 The latest edition includes the addition of Chapter 12: Performance Tuning.
Download here ➡️ bit.ly/472DVY7
Authors @dennylee, Prashanth Babu, Tristen Wentling, & @newfront #opensource #deltalake #oss
Last week I spent some time understanding the #PySpark applyInPandasWithState. This week I'm refactoring the code, hoping to still understand it 2 months later ;) 👉 waitingforcode.com/apache-spark-s…
In the previous release, #PySpark got an interesting streaming feature -> arbitrary stateful processing. Its API differs from the Scala version but is better adapted to the Python world.
More 👉 waitingforcode.com/apache-spark-s…
[ANNOUNCEMENT] Congrats to the Apache Spark community and all the contributors! The Apache Spark 3.5.0 release is here. Try it out! spark.apache.org/releases/spark…
It's not a rebranding but more a regrouping 😉 All my additional #dataengineering content is now available from there waitingforcode.com/better (planning to add some stream processing materials soon)
If Delta Lake implemented commits only, I could have stopped exploring this transactional part after the previous article. But like an RDBMS, #DeltaLake implements other ACID-related concepts, such as isolation levels 👉 waitingforcode.com/delta-lake/tab…
One of the great features of table file formats is the ability to handle write conflicts. It wouldn't be possible without commits that are the topic of my #DeltaLake blog post. waitingforcode.com/delta-lake/tab…
Surprises may be hidden elsewhere, even in the provider-managed libraries. I got punished once for relying on them without verifying the ins and outs before. Lessons learned 👉 waitingforcode.com/data-engineeri…
OOM problems in #ApacheSpark Structured Streaming were often due to the infinitely growing metadata layer. There were a few workarounds but it's also possible to use a proper configuration, at least for file sink 👉 waitingforcode.com/apache-spark-s…
If you rely on the watermark for state expiration in #ApacheSpark arbitrary stateful processing, be careful. The first micro-batch doesn't contain the watermark yet! You can find some possible workarounds in the new blog post 👉 waitingforcode.com/apache-spark-s…
44K Followers · 1K Following · CTO at @Databricks and CS prof at @UCBerkeley. Working on data+AI, including @ApacheSpark, @DeltaLakeOSS, @MLflow, https://t.co/94gROE5Xa0. https://t.co/nmRYAKG0LZ
369 Followers · 1K Following · There is no god but Allah; Muhammad is the Messenger of Allah
Peace and blessings of Allah be upon him
Do not be afraid, Muslims, the glory of God still remains
Islam is still alive, the Quran still remains!!!
7 Followers · 63 Following · I'm a GCP Certified Data Engineer with 3.5+ years of experience building automated, scalable data pipelines using tools like Cloud Composer, BigQuery, and PySpark
575 Followers · 2K Following · I am interested in Stream Processing, Distributed Systems and Databases. Software Engineer by profession. Co-founded @FennelAI. Currently @Databricks
1.3M Followers · 2K Following · Follow along for how-tos, demos, product news, and more. For company updates, check out @GoogleCloud.
Watch #GoogleCloudNext on demand ⬇️
1K Followers · 258 Following · Founder at Irontools. Data Streaming Advocate. Building data platforms. Writing about data engineering, real-time data and distributed systems.
3K Followers · 509 Following · Staff Engineer @confluentinc; Committer and PMC Member @ApacheFlink; Co-Founded data Artisans; Member of the Apache Software Foundation.
20 Followers · 39 Following · ☝️ get it? I write code, I wrote about code, and I have a blast teaching and helping others fall in love with Engineering too.
3K Followers · 2K Following · geek, scribe, coffee snob, and wanna-be cyclist. Contributor to Apache Spark and Delta Lake maintainer. Developer Relations at Databricks (opinions r my own)
3K Followers · 1K Following · Co-founder and CEO @hopsworks. Organizer of the feature store summit. I am writing a book on Building ML Systems for O'Reilly.
1K Followers · 485 Following · I help companies extract value from data - https://t.co/i9aT6lR3kw. Data factory engineer. Now at Mastodon @social.linux.pizza@lalle & Bluesky @larsalbertsson.se
16K Followers · 268 Following · Penthouse Architect in the engine room💭Tree-hugging car nut💭Confused Sensemaker💭 Cloud immigrant. Opinions mine and plentiful.
9K Followers · 0 Following · Dask natively scales Python
Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
1K Followers · 1K Following · Data Engineer @CERN and @ATLASexperiment - I work with databases, data lakes, and data analytics - Oracle RDBMS - Apache Spark
10K Followers · 0 Following · I'm a software engineer, author of the book Hands-on Scala Programming https://t.co/jCMgqCKhpN
Bluesky https://t.co/2IJXLiM3YG
28K Followers · 193 Following · I worked at Microsoft. I built and sold 2 companies (IT & Longevity). Now https://t.co/wKnIFYoc6j. Check out the email newsletter about AI 👇.
2K Followers · 934 Following · Developer@heart; Advocate by nature; Communicator & Cosmopolitan by choice | Arsenal Fan by Inheritance| Love Reading, Writing, Coding | Tweets R Mine