The Spark 3.1.1 release is about to see the light of day, and it comes with a lot of new things!
Check out, for example, all the Kubernetes-related tasks done for this release.
As you can probably guess from the title of this post, we are not going to talk about Kubernetes but about nested fields.
If you have been working with Spark long enough, you have almost certainly had some nightmares with deeply nested fields.
Let’s start from the beginning.
What are nested fields?
Nested fields are fields that contain other fields or objects.
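To make that definition concrete, here is a minimal sketch in plain Python (not Spark code): a record whose address field holds other fields, plus a hypothetical helper, get_field, that resolves a dotted path in the same spirit as selecting address.city from a struct column. The record and every name in it are made up for illustration.

```python
from functools import reduce

# A made-up record: "address" is a nested field that contains other fields,
# and "geo" is nested one level deeper.
record = {
    "name": "Ada",
    "address": {"city": "Barcelona", "geo": {"lat": 41.39, "lon": 2.17}},
}

def get_field(row, path):
    """Follow a dotted path such as 'address.geo.lat' through nested dicts."""
    return reduce(lambda current, key: current[key], path.split("."), row)

print(get_field(record, "address.city"))     # Barcelona
print(get_field(record, "address.geo.lat"))  # 41.39
```

The deeper the nesting, the longer the path you have to spell out to read or update a single value, which is where the pain usually starts.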
Early last year (2020), I needed to sort an array and found two functions with very similar names but different behavior.
These are array_sort and sort_array.
Which one should you use? At first I was confused: why would there be two functions that do the same thing?
Well, the difference is this:
def array_sort(e: Column): Sorts the input array in ascending order; null elements are placed at the end of the returned array.
def sort_array(e: Column, asc: Boolean): Sorts the input array for the given column in ascending or…
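The practical difference is where nulls end up. Spark's function documentation states that array_sort puts nulls at the end, while sort_array puts them at the beginning when sorting ascending and at the end when sorting descending. The sketch below models that ordering in plain Python so it runs without a Spark session; array_sort_like and sort_array_like are hypothetical helpers, not Spark APIs, and None stands in for null.

```python
def array_sort_like(values):
    """Model of array_sort: ascending order, None values moved to the end."""
    present = sorted(v for v in values if v is not None)
    nulls = [v for v in values if v is None]
    return present + nulls

def sort_array_like(values, asc=True):
    """Model of sort_array: None first when ascending, last when descending."""
    present = sorted((v for v in values if v is not None), reverse=not asc)
    nulls = [None] * sum(v is None for v in values)
    return nulls + present if asc else present + nulls

data = [3, None, 1, 2]
print(array_sort_like(data))             # [1, 2, 3, None]
print(sort_array_like(data, asc=True))   # [None, 1, 2, 3]
print(sort_array_like(data, asc=False))  # [3, 2, 1, None]
```

So for an ascending sort the two functions only disagree when the array contains nulls, which is easy to miss on clean test data.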
In this post, I’ll break down the new output of the explain command. But first, let’s see how it was before 3.0.
Some people might have never used explain, but it is really useful when you want to know what’s really happening under the hood.
Sometimes we write beautiful code and expect Catalyst to solve it all for us, and sorry folks, but that’s not how it works. Catalyst makes our lives much easier, but unfortunately, it doesn’t do all the work. …
I’ve been working in Big Data for 5 years now, so when I decided to get the certification I was already familiar with terms like Data Warehouse, networks, caches, performance, etc. I also knew some AWS services, but mostly those related to the Big Data world.
Let’s start with a bit of my background. In 2016, I started working with Amazon and stayed for close to 2 years. After reading this you might think I had a lot of experience with AWS, but that’s far from the truth. I can summarize my experience as launching EC2 instances and configuring security groups…
Data Engineer at Typeform