Remember that if you want to run hypothesis tests on your data (sentiment, word frequencies, etc.), you can export the data to a CSV file using the write.csv function. For example, for sentiment you can extract the data matrix from R and then calculate a column mean for each of the sentiment attributes. If you do this for both of the companies or products you are investigating, you can then conduct a t-test or ANOVA on the means (make sure you use the appropriate values of n in your testing).

Row # | Positive | Negative | Angry… |
---|---|---|---|
001 | | | |
002 | | | |
… | | | |
Total number of tweets | Total positive | Total negative | … |
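As a minimal sketch of this step, the R code below builds a small per-tweet sentiment matrix like the one in the table, computes the column totals and means, and writes the matrix to a CSV file. The data frame `sentiment_a`, its column names, the values, and the output file name are all hypothetical and only for illustration; your own matrix will come from whatever sentiment function you ran.

```r
# Hypothetical per-tweet sentiment counts for one company (illustrative values only).
sentiment_a <- data.frame(
  positive = c(2, 0, 1, 3),
  negative = c(0, 1, 1, 0),
  anger    = c(0, 1, 0, 0)
)

# Column totals, and column means (each total divided by the number of tweets).
totals_a <- colSums(sentiment_a)
means_a  <- colMeans(sentiment_a)
n_a      <- nrow(sentiment_a)  # the n to carry into any later test

# Write the per-tweet matrix to a CSV file for use outside R.
# The file name is an assumption; tempdir() just keeps the example self-contained.
out_file <- file.path(tempdir(), "sentiment_company_a.csv")
write.csv(sentiment_a, out_file, row.names = FALSE)
```

Repeating the same steps for the second company gives you two sets of means and two values of n to compare.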

Taking the column totals and dividing them by the total number of tweets will give you the average per tweet, for example the average number of positive words per tweet. If you do this for each company, product, etc., you can then conduct a t-test to determine whether the average number of positive words per tweet for company A is significantly different from that for company B. You can do the same thing for word frequencies. Doing this will allow you to conduct a true statistical test, rather than just saying “the graph of company A looks more positive”.
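The comparison described above could be sketched in R as follows. The vectors `pos_a` and `pos_b` are hypothetical per-tweet positive-word counts with made-up values. Note that `t.test` with two vectors runs Welch's two-sample t-test by default, which does not assume equal variances; that choice is an assumption here, not a requirement of the assignment.

```r
# Hypothetical positive-word counts per tweet for two companies (illustrative values only).
pos_a <- c(2, 0, 1, 3, 2, 1, 0, 2)
pos_b <- c(0, 1, 0, 1, 2, 0, 1, 0)

# Welch two-sample t-test (R's default when two vectors are supplied).
# R takes n for each group from the length of each vector.
result <- t.test(pos_a, pos_b)

result$estimate  # the two sample means being compared
result$p.value   # compare against your chosen significance level
```

Passing the per-tweet values rather than only the two means lets R work out the sample sizes and variances for you. If you are working by hand from the means, you will need n and the standard deviation for each group as well.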

Remember, too, that you can write out the data from the “wrapper” of your tweets to create visualizations by day of week, time of day, geography, etc. using Tableau or Minitab.
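As a rough sketch, and assuming your tweet wrapper includes a timestamp, you could derive day of week and hour in R before exporting. The data frame `tweets`, its `created` column, the sample values, and the file name are all hypothetical; check what your own download actually calls these fields.

```r
# Hypothetical tweet wrapper data (illustrative values only).
tweets <- data.frame(
  created = as.POSIXct(
    c("2020-01-06 09:15:00", "2020-01-07 18:40:00", "2020-01-11 23:05:00"),
    tz = "UTC"
  ),
  text = c("example one", "example two", "example three")
)

# Break each timestamp into components.
lt <- as.POSIXlt(tweets$created, tz = "UTC")
tweets$weekday <- lt$wday  # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
tweets$hour    <- lt$hour  # 0 to 23

# Write the result to a CSV file that Tableau or Minitab can open.
wrapper_file <- file.path(tempdir(), "tweet_wrapper.csv")
write.csv(tweets, wrapper_file, row.names = FALSE)
```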