Twitter is a social media service that allows its users to send short messages (originally limited to 140 characters; the limit was raised to 280 in 2017) to the users who follow them. Since it began in 2006, Twitter has grown to the point where it handles hundreds of millions of messages (tweets) per day from its 302 million active users. It differs from email in that it broadcasts messages, and the recipients are self-selected.
The messages are entered by Twitter users, each of whom has an account. All messages become part of a stream, and the ones that a particular user wants to see are pulled from that stream and placed on that user's feed. A program can also watch the stream and examine messages as they are sent, collecting data or identifying patterns. Twitter allows access to the stream. From Python, this section uses a third-party module named tweepy for that purpose, which must be downloaded and installed separately (for example, with pip install tweepy).
A warning: setting up the authentication so that the Twitter stream can be accessed is not simple. A Twitter account is needed, an application has to be registered, and the app must be given permission to read, write, and access direct messages. Twitter then creates a unique set of keys that must be used for authentication: the consumer key and consumer secret, then the access token and access token secret.
A tweet's text is limited to 140 characters (280 since 2017), but that limit applies only to the content. The amount of data sent for a tweet is substantially larger, 6000 bytes or more, because of the large amount of metadata, or descriptive information, that accompanies it. Most people never see that, but a program that reads tweets and sifts them for information has to deal with it. The Twitter interface returns tweet data in JSON (JavaScript Object Notation), a standard text format for exchanging data, similar in purpose to XML. This format has to be parsed, but Python's standard json module does that, so no further discussion of JSON is necessary.
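To make the content-versus-metadata point concrete, here is a small sketch using the json module. The payload is a hypothetical, heavily trimmed stand-in for a real tweet (real ones carry far more fields); it only uses field names that also appear later in this section.

```python
import json

# Hypothetical, heavily trimmed tweet payload; a real one has many more fields.
raw = json.dumps({
    "id": 1,
    "text": "Watching Star Trek tonight",
    "lang": "en",
    "user": {"screen_name": "example_user", "location": "Toronto"},
})

tweet = json.loads(raw)                    # JSON text -> Python dict
content_chars = len(tweet["text"])         # what the reader sees
payload_bytes = len(raw.encode("utf-8"))   # what is actually transmitted
print(content_chars, "characters of content in", payload_bytes, "bytes of JSON")
print(tweet["user"]["screen_name"])        # nested JSON objects become nested dicts
```

Even with only four fields, the encoded payload is already several times larger than the message text.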
1. Example: Connect to the Twitter Stream and Print Specific Messages
This program examines the Twitter stream and prints messages that contain the term “Star Trek.” Once again, authentication is one of the first things to do. In the case of tweepy, an authentication object is created from the key strings.
import json
import tweepy
# Authentication details from dev.twitter.com
consumer_key = 'get your own'
consumer_secret = 'get your own'
access_token = 'get your own'
access_token_secret = 'get your own'
# Build the OAuth handler from the app's consumer key and secret,
# then attach the access token and its secret
authentication = tweepy.OAuthHandler(consumer_key, consumer_secret)
authentication.set_access_token(access_token, access_token_secret)
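Pasting the four keys into the source file works, but it makes them easy to leak if the file is shared. A common alternative is to read them from environment variables. The sketch below assumes hypothetical variable names (TWITTER_CONSUMER_KEY and so on) and a hypothetical helper, load_credentials(); neither is part of tweepy.

```python
import os

# Hypothetical environment-variable names; any names would do.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}

def load_credentials(environ=os.environ):
    """Return the four Twitter keys as a dict; raise if any is missing or empty."""
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned values would then be passed to tweepy.OAuthHandler() and set_access_token() exactly as in the code above, for example tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"]).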
Now something different is needed. Tweepy expects to be given an object whose class is a subclass of one that tweepy defines, StreamListener. As part of the deal made with tweepy, the subclass provides a method named on_data() and another named on_error(). Tweepy calls on_data() whenever there is data in the stream to be read, passing the data as a string in JSON format; it calls on_error() when an error occurs, passing the HTTP status code that the Twitter server returned. (StreamListener belongs to tweepy 3.x; tweepy 4.0 removed it and folded its role into the Stream class itself, so this example needs a 3.x release of tweepy.) Creating this subclass is described a little later; for now, assume that it is called tweet_listener. The next step is to create an instance of this class:
listener = tweet_listener()
Tweepy reads the stream on behalf of this instance, so it has to be told which instance to use. Creating a Stream object does that, and also supplies the authentication:
stream = tweepy.Stream(authentication, listener)
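The relationship between tweepy and tweet_listener is a callback arrangement: the program supplies an object, and the library calls that object's methods when something happens. The following toy model shows the idea without a network connection. ToyListener and toy_stream() are hypothetical stand-ins, not tweepy's own code; the rule that a callback returning False ends the stream mirrors the behavior of tweepy 3.x.

```python
import json

class ToyListener:
    """Hypothetical stand-in for a library base class such as StreamListener."""
    def on_data(self, data):
        return True            # default: keep going

    def on_error(self, status):
        return False           # default: stop on any error

def toy_stream(listener, messages):
    """Hypothetical driver: hand each message to the listener, as a streaming
    library would, and stop as soon as a callback returns False."""
    for message in messages:
        if isinstance(message, int):              # ints stand in for HTTP error codes
            if listener.on_error(message) is False:
                return
        elif listener.on_data(message) is False:  # strings stand in for JSON data
            return

class CollectingListener(ToyListener):
    """Subclass that overrides on_data(), just as tweet_listener does."""
    def __init__(self):
        self.texts = []

    def on_data(self, data):
        self.texts.append(json.loads(data)["text"])
        return len(self.texts) < 2                # ask to stop after two tweets

listener = CollectingListener()
toy_stream(listener, [json.dumps({"text": t}) for t in ("one", "two", "three")])
print(listener.texts)                             # prints ['one', 'two']
```

The driver never needs to know what CollectingListener does with the data; it only relies on the method names, which is exactly why tweepy insists on on_data() and on_error().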
Finally, tell tweepy what to extract from the Twitter stream. For this example, the call is:
stream.filter(track=['Star Trek'])
Other arguments to filter() select tweets by other criteria, such as follow (a list of user IDs), locations (geographic bounding boxes), and languages. Here the track argument looks for the terms “Star Trek” in the message, ignoring case; a phrase of several words matches when all of its words appear, in any order. Multiple phrases can be placed in the list, and a tweet that matches any of them is delivered: ['Star Trek', 'casablanca'].
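The matching rule for track can be approximated in a few lines. This is a simplified sketch, assuming the text is split into lowercase words and punctuation such as # is ignored; Twitter's actual matcher also checks fields such as links and mentions, so treat matches_track() (a hypothetical helper) as an illustration of the AND-within-a-phrase, OR-across-the-list rule, not a reimplementation.

```python
import re

def matches_track(text, track):
    """Simplified approximation of the track filter.

    A phrase matches when every one of its space-separated terms occurs in the
    text, ignoring case and order; a tweet matches when any phrase matches.
    """
    words = set(re.findall(r"\w+", text.lower()))   # '#casablanca' -> 'casablanca'
    for phrase in track:
        terms = phrase.lower().split()
        if terms and all(term in words for term in terms):
            return True
    return False

track = ["Star Trek", "casablanca"]
print(matches_track("Watching STAR TREK tonight", track))    # True
print(matches_track("A trek under the star field", track))   # True: order ignored
print(matches_track("Star Wars marathon", track))            # False
print(matches_track("Rewatching #Casablanca", track))        # True
```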
What about the tweet_listener class? It is a subclass of StreamListener. Its on_data() method needs to parse the JSON-formatted string it is passed and print the desired parts of the message. Since the filter() call restricts the messages to those containing the terms “Star Trek,” all this method has to do is print the body of the message, along with a little information about the sender. Here is the class; the explanation follows:
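class tweet_listener(tweepy.StreamListener):
    def on_data(self, data):
        # Twitter returns data in JSON format - decode it first
        # (the name dict shadows Python's built-in dict type here)
        dict = json.loads(data)
        # The stream also carries non-tweet notices with no user or text
        if 'user' in dict and 'text' in dict:
            print(dict['user']['location'])
            print(dict['user']['screen_name'], dict['text'])
        return True  # keep the stream open

    def on_error(self, status):
        print(status)  # the HTTP status code
        if status == 420:  # rate limited: disconnect instead of retrying
            return False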
The parameter data is in JSON format. To convert it into something usable, pass it to the json.loads() function. It returns a Python dictionary containing the data, indexed by field name. The data structure used by Twitter is complex, and is shown in small part in Table 13.1. The left side of the table shows the message field names, and the right lists some of the user fields; user is a field within the message that describes the sender. The variable dict is the resulting dictionary.
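Because the stream can also deliver messages that are not tweets (for example, a notice that a tweet was deleted, or that some matching tweets were withheld because of rate limits), it helps to keep the decoding logic in an ordinary function that can be tested without a network connection. The helper extract_fields() below is hypothetical; on_data() could call it and print whatever it returns.

```python
import json

def extract_fields(data):
    """Decode one raw stream message.

    Returns (screen_name, location, text) for a tweet, or None for any other
    message, i.e. one that lacks a 'user' or 'text' field.
    """
    message = json.loads(data)                 # JSON string -> dict
    if "user" not in message or "text" not in message:
        return None
    user = message["user"]                     # nested dict describing the sender
    # location is free-form text the user typed; JSON null arrives as None
    return user.get("screen_name"), user.get("location"), message["text"]

sample = '{"text": "Star Trek marathon!", "user": {"screen_name": "u1", "location": null}}'
print(extract_fields(sample))                  # ('u1', None, 'Star Trek marathon!')
print(extract_fields('{"limit": {"track": 5}}'))  # None
```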
To solve the problem posed, all that has to be printed is dict['text'], which is the message body. The value of dict['user'] is the data about the sender of the message. There is a lot of that, mostly not useful to anyone but an app developer (e.g., the background color of the user's profile), but dict['user']['screen_name'] is the Twitter identity of the sender, and dict['user']['location'] often indicates where they are (it is free-form text entered by the user, so it may be missing or vague). It would be possible to collect data on where the largest number of tweets are being sent from and what kind of information is being conveyed, and in this way perhaps develop an early warning system for events happening in the world.
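As a first step toward that kind of analysis, the locations seen by on_data() could be appended to a list and tallied. The sketch below uses collections.Counter; the helper top_locations() and the sample values are hypothetical. Since the location field is free-form, “Toronto” and “Toronto, ON” would be counted separately, so real analysis would need some normalization.

```python
from collections import Counter

def top_locations(locations, n=3):
    """Return the n most common self-reported locations as (name, count) pairs,
    skipping missing (None) or blank entries."""
    cleaned = (loc.strip() for loc in locations if loc and loc.strip())
    return Counter(cleaned).most_common(n)

seen = ["Toronto", "Paris", None, "Toronto", "  ", "Paris", "Toronto"]
print(top_locations(seen, 2))    # [('Toronto', 3), ('Paris', 2)]
```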
Source: Parker James R. (2021), Python: An Introduction to Programming, Mercury Learning and Information; Second edition.