
Python Pandas for big data

Pandas vs Rainbow Query Language


My tool of choice for “big” text files that need viewing and maybe a bit of filtering is the Rainbow CSV extension in VS Code, along with some quick queries in its built-in query language. However, when the files get truly massive, Rainbow and VS Code give up, so I had to learn enough Pandas to work on the files in Pandas directly, or to break them down into something I can use with Rainbow. I believe the threshold for what is too big is currently in the 20-40MB range.

These are my notes on using Python Pandas to analyze or edit large (100MB+) files. Also available is a post on commonly used Rainbow queries.
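For the "break it down into something Rainbow can handle" route, one minimal sketch is to stream the big CSV in chunks and write each chunk to its own smaller file. It assumes a comma-separated file with a header row. `split_csv`, its `rows_per_file` default, and the file names are hypothetical, and the chunk size is only a starting guess you would tune until each part falls below the roughly 20-40MB limit mentioned above.

```python
from pathlib import Path

import pandas as pd


def split_csv(path, out_dir, rows_per_file=200_000):
    """Split the CSV at `path` into numbered part files of at most `rows_per_file` rows.

    split_csv and its default chunk size are hypothetical; pick rows_per_file
    so that each part lands under the size Rainbow CSV handles comfortably.
    """
    path, out_dir = Path(path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # With chunksize, read_csv yields DataFrames of that many rows one at a time,
    # so the full file never has to sit in memory.
    with pd.read_csv(path, chunksize=rows_per_file) as reader:
        for i, chunk in enumerate(reader):
            out_path = out_dir / f"{path.stem}_part{i:03d}.csv"
            chunk.to_csv(out_path, index=False)  # every part gets its own header row
            written.append(out_path)
    return written


# Example (hypothetical file names):
# parts = split_csv("big_export.csv", "big_export_parts", rows_per_file=100_000)
```

Because every part keeps the header row, each file opens in Rainbow CSV on its own with the same column names.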

Jupyter notebook loading a dataframe.

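Loading the dataframe comes down to a `pd.read_csv` call. A sketch of that step follows, assuming a CSV input; the helper name `load_frame` and the example column names are hypothetical. The `usecols` and `dtype` arguments are optional, but on 100MB+ files they can noticeably cut memory use by skipping unneeded columns and avoiding type inference.

```python
import pandas as pd


def load_frame(path, columns=None, dtypes=None):
    """Load a CSV into a DataFrame, optionally keeping only some columns.

    load_frame is a hypothetical helper; in a notebook the same thing is
    usually a single pd.read_csv call in a cell.
    """
    # usecols skips parsing columns you don't need, which saves memory on wide files.
    # dtype declares types up front, e.g. "category" for repetitive strings.
    df = pd.read_csv(path, usecols=columns, dtype=dtypes)
    # Report in-memory size in MB (deep=True counts the actual string contents).
    mb = df.memory_usage(deep=True).sum() / 1_000_000
    print(f"{len(df):,} rows x {df.shape[1]} columns, ~{mb:.1f} MB in memory")
    return df


# Example (hypothetical file and column names):
# df = load_frame("big_export.csv", columns=["status", "amount"], dtypes={"status": "category"})
```

Note that `usecols` keeps the columns in the file's order, not the order of the list you pass.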

Viewing the first few rows to determine the column name and value to split on.

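Besides the first rows, listing the distinct values of a candidate column can help pick the value to split on. A minimal sketch; `preview` and the `region` column in the example are hypothetical.

```python
import pandas as pd


def preview(df, column=None, n=5):
    """Print the first n rows and, optionally, the distinct values of one column.

    preview is a hypothetical helper for choosing a column/value to split on.
    """
    print(df.head(n))        # first n rows, like the notebook view
    print(list(df.columns))  # exact column names, handy for spotting stray spaces
    if column is not None:
        # Distinct values with their row counts; dropna=False also counts missing values.
        counts = df[column].value_counts(dropna=False)
        print(counts)
        return counts
    return None


# Example with a small made-up frame:
# preview(pd.DataFrame({"region": ["east", "west", "east"]}), "region")
```

In Jupyter, leaving `df.head()` as the last expression in a cell renders it as a table instead of printing it.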

Splitting the large dataset down to just the rows needed for further analysis.

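The split itself can be sketched as a boolean filter followed by an optional export back to CSV, so the smaller result can be reopened with Rainbow. `extract_subset`, its parameters, and the output path are hypothetical and not necessarily what the screenshot shows.

```python
import pandas as pd


def extract_subset(df, column, values, out_path=None):
    """Return only the rows whose `column` matches one of `values`, optionally saving them.

    extract_subset is a hypothetical helper; the column and values you pass are
    whatever inspecting the first rows turned up.
    """
    if isinstance(values, (str, int, float)):
        values = [values]  # accept a single value as well as a list
    # Boolean mask: True where the column value is in `values`.
    # .copy() gives an independent frame, so later edits don't trigger SettingWithCopyWarning.
    subset = df[df[column].isin(values)].copy()
    if out_path is not None:
        # index=False keeps the pandas row index out of the file
        subset.to_csv(out_path, index=False)
    return subset


# Example (hypothetical names):
# east = extract_subset(df, "region", "east", out_path="east_only.csv")
```

If the full file is too large to load at once, the same mask can be applied chunk by chunk: iterate over `pd.read_csv(path, chunksize=...)`, filter each chunk, and `pd.concat` the filtered pieces.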

This post is licensed under CC BY 4.0 by the author.