PyData Dallas 2015
"Data are always messy and ill-formatted. We spend seemingly unnecessary amounts of hours writing software to convert between common formats, databases, and newer filesystems. Typically, we spend just enough mental energy to get the job done -- hopefully, giving us more time in the next stage of the data pipeline. This results in non-performant, non-reusable, non-extensible code. In this talk we present Odo, a new open-source software package which simplifies and eases common data migration tasks. Odo can seamlessly migrate between CSVs, JSON, Dataframes, and Databases, just as easily as it can migrate between NumPy Arrays, HDF5, HDFS, and S3 -- and everything in between and much more. When choosing a storage format we have to balance several features: size, performance (read/write), chunk-ability, shareability, multi-tenancy, computational target, etc. Odo lets us explore and evaluate various target data containers without much cost. Where possible, Odo takes advantage of performant and feature rich bulk loaders. With a lower cost to play and faster data conversion speeds, a once unfun and boring task can possibly engage us and lead to happier computing down the road. We will cover different real-world use cases and scenarios and compare these with the “common” answers repeated amongst us data mungers" 00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVi...
Ben Zaitlen - Odo - Shape Shifting Data—A Handy Tool to Guide You from CSV HDFS and Beyond. Uploaded to the PyData channel on 01 May 2015. Viewed 1,669 times, liked by 22 visitors.
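To make the abstract's starting point concrete, here is a minimal sketch of the kind of hand-written conversion code it describes: CSV to a SQLite table, then back out to JSON. This is not Odo's API; it uses only the Python standard library. The sample data, the table name `accounts`, and the helper names `csv_to_sqlite` and `sqlite_to_json` are hypothetical and chosen purely for illustration.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = "name,amount\nalice,100\nbob,250\n"


def csv_to_sqlite(csv_text, conn, table="accounts"):
    """Load CSV text into a new SQLite table; return the number of rows inserted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The column names and types are hard-coded here, which is exactly
    # the sort of one-off assumption that makes such code hard to reuse.
    rows = [(r["name"], int(r["amount"])) for r in reader]
    conn.execute(f"CREATE TABLE {table} (name TEXT, amount INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def sqlite_to_json(conn, table="accounts"):
    """Dump the table back out as a JSON array of objects, ordered by name."""
    cur = conn.execute(f"SELECT name, amount FROM {table} ORDER BY name")
    return json.dumps([{"name": n, "amount": a} for n, a in cur])


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
n_rows = csv_to_sqlite(CSV_TEXT, conn)
as_json = sqlite_to_json(conn)
print(n_rows, as_json)
```

Each new pair of formats needs another function like these, with its own schema assumptions. That repeated effort is the cost the talk says Odo aims to reduce.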