How we simplified our data pipeline – Freetrade Blog


(Ian) #1

I wanted to share our recent blog post with the community. I'd love to hear people's thoughts and find out who this connects with. Are you working in this space, or interested to hear more?

Cheers!

Ian


(Chris) #2

Z shell = bae


(Chris) #3

Python & JavaScript developer here, so this is very interesting to me. Generally I’ve stayed away from GCP (big AWS fan, sorry!) but when it comes to data, Google generally rules over everything else.

I love the tech insight though, thanks for sharing this!


(Ian) #4

I can see why you’d like AWS. It’s definitely a more feature-rich platform. I like GCP products as they tend to have better documentation and be more refined. Long term we’ll likely leverage both.


#5

Good post @Ian - I like it. Wraps everything into a neat package.

So does Airflow provide locking functionality for multiple workers trying to access the same resource? And does HttpHook provide ratelimit/backoff/authentication strategies?

I trust this is okay, sorry if it’s too much :see_no_evil:


Way beyond the scope of this thread but I would be interested in learning about how you and the team write operations that are idempotent.


(Ian) #6

Hey @saf - great questions :slight_smile:

Airflow provides config for a number of use cases. You can use it to limit concurrency, which helps with contention. That said, jobs should be made idempotent, for example by keying writes on the job id, which helps avoid locking issues in several cases. Airflow’s tasks support retries with exponential backoff, so you can rely on the task management rather than the hook. As for the HttpHook config, you can provide auth headers, but you may want to handle this directly if you have more specific needs.
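
To make the idempotency and backoff points a bit more concrete, here is a rough sketch rather than our actual pipeline code. The `trades` table, the `load_batch` helper, the in-memory SQLite database and all the values are assumptions for illustration only. The keys in `default_args` are standard Airflow operator arguments, shown as a plain dict so the snippet runs without Airflow installed.

```python
import sqlite3
from datetime import timedelta

# Retry settings accepted by Airflow operators, usually passed to a DAG as
# default_args. The values are illustrative assumptions, not recommendations.
default_args = {
    "retries": 5,
    "retry_delay": timedelta(seconds=30),
    "retry_exponential_backoff": True,  # wait longer between successive retries
    "max_retry_delay": timedelta(minutes=10),
}


def load_batch(conn, job_id, rows):
    """Hypothetical idempotent load: replace whatever this job_id wrote before."""
    # Using the connection as a context manager wraps both statements in one
    # transaction, so the delete and the insert commit together or not at all.
    with conn:
        conn.execute("DELETE FROM trades WHERE job_id = ?", (job_id,))
        conn.executemany(
            "INSERT INTO trades (job_id, symbol, quantity) VALUES (?, ?, ?)",
            [(job_id, symbol, quantity) for symbol, quantity in rows],
        )


# Stand-in for the real warehouse: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (job_id TEXT, symbol TEXT, quantity INTEGER)")

rows = [("AAPL", 10), ("TSLA", 3)]
load_batch(conn, "2019-06-01", rows)
load_batch(conn, "2019-06-01", rows)  # simulated retry of the same job

row_count = conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
```

Because every write is scoped to the job id, a retried or duplicated run leaves the table in the same state as a single successful run, so the retry settings can be fairly aggressive without risking duplicate rows.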


#7

Thank you @Ian