So one of the things that I was interested in doing was analysing some data from Twitter. I thought a good place for me to start was with data on Dupuytren's and then on Ledderhose. Why these? Because I am a trustee of the British Dupuytren's Society.
In this post I use a variety of tools and libraries. I probably don't use them in the most efficient way, as I was trying to achieve a few different things at once, mostly because I find Python and data analysis fun!
Twitter Scraping:
The first Python library that I am going to use is twitterscraper, a great tool for scraping information from Twitter. I have had a few issues with the errors it throws, but that is probably my fault. Overall it works well; see the link above and my code snippet below.
I did try adding retweets and likes, but the scraper did not seem to like that: the extraction success rate dropped to 10%, compared with 70% without them. The biggest issue is that I didn't want to cater for lots of different languages, so everything in, for example, Chinese fails.
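A minimal sketch of the scraping step, assuming twitterscraper's `query_tweets(query, limit=...)` function and that each returned tweet object has `timestamp` and `text` attributes. The two searches from this post are run separately and flattened into rows; `fetch_tweets`, `tweets_to_rows` and `SEARCH_TERMS` are hypothetical names, not taken from the original code.

```python
# Search terms from this post: Dupuytren's first, then Ledderhose.
SEARCH_TERMS = ["Dupuytren's", "Ledderhose"]


def fetch_tweets(query, limit=100):
    """Scrape roughly `limit` tweets matching `query` with twitterscraper.

    Assumes twitterscraper's query_tweets(query, limit=...) function. The
    import lives inside this function so the rest of the sketch runs even
    when the library is not installed.
    """
    from twitterscraper import query_tweets

    return query_tweets(query, limit=limit)


def tweets_to_rows(tweets, query):
    """Flatten tweet objects into (search term, timestamp, text) tuples.

    Assumes each tweet object exposes `timestamp` and `text` attributes.
    """
    return [(query, tweet.timestamp, tweet.text) for tweet in tweets]
```

Calling `tweets_to_rows(fetch_tweets("Ledderhose"), "Ledderhose")` would give one `(search term, timestamp, text)` tuple per tweet. Keeping the import inside `fetch_tweets` means the row-building code can be tried without the scraper installed.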
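The post doesn't say exactly why tweets in other languages fail. One plausible cause, assumed here rather than taken from the original code, is writing the text somewhere that uses a single-byte Windows code page such as cp1252, which cannot represent Chinese characters. A minimal sketch of skipping those tweets and reporting a success rate like the ones quoted above (`split_encodable` and `success_rate` are hypothetical names):

```python
def split_encodable(texts, encoding="cp1252"):
    """Split texts into those `encoding` can represent and those it cannot.

    cp1252 is an assumed target encoding, not one named in the post.
    """
    kept, failed = [], []
    for text in texts:
        try:
            text.encode(encoding)
        except UnicodeEncodeError:
            # e.g. Chinese characters have no cp1252 equivalent
            failed.append(text)
        else:
            kept.append(text)
    return kept, failed


def success_rate(kept, failed):
    """Percentage of texts that survived, or 0.0 if there were none."""
    total = len(kept) + len(failed)
    return 100.0 * len(kept) / total if total else 0.0
```

Catching `UnicodeEncodeError` per tweet means one unrepresentable tweet is skipped rather than stopping the whole run. The alternative, `text.encode(encoding, errors="replace")`, would keep every tweet but swap each unsupported character for `?`.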
Re (regular expressions):
This is Python's regular expression library. I have used regexes before and they are super powerful and cool (if you like that sort of thing). For more information see the following link. I only use it for one very simple task below: removing return characters and tabs from the tweets so that exporting to a text document works.
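A minimal sketch of that cleaning step, assuming the tweets are exported as one tweet per line with tab-separated columns. In that layout a stray return inside a tweet starts a new line and a stray tab adds a column, which is why both have to go. `clean_tweet` and `to_tsv_line` are hypothetical names.

```python
import re

# One or more carriage returns, newlines or tabs in a row.
BREAKS = re.compile(r"[\r\n\t]+")


def clean_tweet(text):
    """Replace each run of returns/tabs with a single space and trim the ends."""
    return BREAKS.sub(" ", text).strip()


def to_tsv_line(*fields):
    """Join cleaned fields with tabs so each tweet occupies exactly one line."""
    return "\t".join(clean_tweet(str(field)) for field in fields)
```

For example, `clean_tweet("Ledderhose\r\nupdate\tnow\n")` gives `"Ledderhose update now"`. Replacing with a space rather than an empty string stops words on either side of a line break from being glued together.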
I also use other packages that I have discussed elsewhere:
- pandas
- xlwings
- pyodbc / SQLAlchemy
- SQL scripting
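The post loads the data into a database it doesn't name via pyodbc / SQLAlchemy and then runs SQL over it. To keep this sketch runnable without a database server, it swaps in Python's built-in `sqlite3` module; the table and column names are hypothetical, as the post doesn't describe its schema.

```python
import sqlite3

# Hypothetical table layout for the scraped tweets.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    search_term TEXT NOT NULL,
    posted_at   TEXT,
    body        TEXT NOT NULL
);
"""


def load_tweets(conn, rows):
    """Create the table if needed and insert (search_term, posted_at, body) rows."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO tweets (search_term, posted_at, body) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def count_by_term(conn):
    """Return {search_term: number_of_tweets}, a simple SQL summary."""
    cur = conn.execute(
        "SELECT search_term, COUNT(*) FROM tweets "
        "GROUP BY search_term ORDER BY search_term"
    )
    return dict(cur.fetchall())
```

pyodbc cursors also use `?` placeholders and provide `executemany`, so the insert would carry over largely unchanged, but `executescript` is specific to `sqlite3`.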
Screenshots of the Output: (Yes, I could have made this look nicer with some of the formatting options I have posted about before, or even just by auto-fitting the columns, but I felt those commands would have got lost in this post.)
The Code: