In this first post on pandas I just want to discuss why
pandas is cool rather than go into too much detail; after all, there are books and
documentation that go into much more depth than I can. Pandas comes with the
Anaconda distribution, so if you took that route to install Python you are good to go.
Here are some of the things you can do with pandas in Python:
- Deduplicate data
- Cleanse data
- Join data (though personally I find that simply importing into SQL Server and querying there is sometimes better)
- Easily import / export data to databases, Excel, CSV and HTML (all of which I have needed)
- Manipulate data and perform calculations
- Get unique values
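
To make a few of the items above concrete, here is a minimal sketch. The data, column names and variable names are all made up for illustration; only the pandas calls themselves (`str.strip`, `drop_duplicates`, `merge`, `groupby`, `unique`) are real.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
orders = pd.DataFrame({
    "customer": [" Alice", "Bob", "Bob", "alice ", "Carol"],
    "amount": [10.0, 20.0, 20.0, 5.0, 7.5],
})
regions = pd.DataFrame({
    "customer": ["Alice", "Bob"],
    "region": ["North", "South"],
})

# Cleanse: trim stray whitespace and normalise capitalisation.
orders["customer"] = orders["customer"].str.strip().str.title()

# Deduplicate: drop rows that exactly repeat an earlier row.
deduped = orders.drop_duplicates()

# Join: a left join keeps every order even if no region is known.
joined = deduped.merge(regions, on="customer", how="left")

# Calculate: total amount per customer.
totals = deduped.groupby("customer")["amount"].sum()

# Unique values: distinct customer names in order of first appearance.
customers = deduped["customer"].unique().tolist()
```

Note that cleansing comes before deduplicating here: the two "Bob" rows only count as duplicates if their values match exactly, so tidying the text first catches more repeats.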
The list is much, much longer, but the above are all
things that pandas is commonly used for. For the data analysis and report automation
that I have done, I have used pandas in every single case. If you Google what
you want to do you can probably find an example, and this wealth of information helps
make pandas so useful. Here is an example of some of the steps I have used pandas
for in a single report:
- Read an Excel file
- Import data to SQL Server
- Extract data from SQL Server
- Format data, mainly putting dates into the correct format
- Sort values
- Perform a join
- Get unique values
- Iterate through these values to create tabs on an Excel spreadsheet
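
As a rough sketch of how those steps can fit together, the example below walks through them in order. It is not the actual report: the table, columns and names are hypothetical, a small DataFrame stands in for what `pd.read_excel` would return, and an in-memory SQLite database stands in for SQL Server so that the code runs without a server.

```python
import sqlite3
import pandas as pd

# Stand-in for the result of pd.read_excel(...); the data is hypothetical.
raw = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sale_date": ["2020-03-01", "2020-01-15", "2020-02-10", "2020-01-20"],
    "amount": [100, 250, 75, 40],
})
managers = pd.DataFrame({
    "region": ["North", "South", "East"],
    "manager": ["Ann", "Raj", "Lee"],
})

# Assumption: in-memory SQLite is used here in place of SQL Server.
conn = sqlite3.connect(":memory:")
raw.to_sql("sales", conn, index=False)                   # import to the database
sales = pd.read_sql_query("SELECT * FROM sales", conn)   # extract it again
conn.close()

sales["sale_date"] = pd.to_datetime(sales["sale_date"])  # format dates
sales = sales.sort_values("sale_date")                   # sort values
report = sales.merge(managers, on="region", how="left")  # perform a join
regions = report["region"].unique().tolist()             # get unique values


def write_tabs(df, path):
    """Write one worksheet (tab) per region.

    Needs an Excel engine such as openpyxl to be installed.
    """
    with pd.ExcelWriter(path) as writer:
        for region in df["region"].unique():
            subset = df[df["region"] == region]
            subset.to_excel(writer, sheet_name=str(region), index=False)
```

Calling `write_tabs(report, "report.xlsx")` (a hypothetical file name) would then produce a workbook with one tab for each region.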
Over time each of these subjects will be covered, with examples, individually
and in more detail.