Introduction to Dask: Scaling EDA & ML Workloads

While "Big Data" may be an overhyped buzzword, it's not uncommon for Python users to end up with more data than fits on their laptops. Sampling helps, but sometimes you need to process everything. In the past, Python users didn't have much choice beyond Spark (and since most data lakes were built on HDFS, it was the default option). Today, even the stodgiest enterprises have migrated much of their data to cheap blob storage in the cloud. This has freed Python users from the misery of the JVM (it's far nicer to read a Python traceback than a JVM stack trace). As a result, tools like Dask make it much easier to scale the libraries Python users already love: NumPy, pandas, and scikit-learn. In this talk, you'll learn how to scale your PyData workloads with minimal code changes using Dask, so you can focus on your work without having to learn a new API.
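As a taste of what "minimal code changes" means, here is a small sketch of the Dask DataFrame API mirroring pandas (the file path and column names are hypothetical):

    import dask.dataframe as dd

    # Read a directory of CSVs lazily; dd.read_csv mirrors pandas.read_csv
    # but partitions the data, so it need not fit in memory.
    # The glob path and columns below are placeholders.
    df = dd.read_csv("data/*.csv")

    # Familiar pandas-style operations build a task graph instead of
    # executing immediately.
    result = df.groupby("category")["value"].mean()

    # .compute() triggers execution, in parallel across partitions.
    print(result.compute())

Aside from the import and the final .compute(), the code reads like ordinary pandas, which is the point of the talk.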