Python Tutorial: Connect Government Data APIs by Using the Factory Pattern
Source Politician Data from Public Government Data APIs to Increase Allocation Accuracy
In this tutorial, we mainly do two things:
- First, we generate some Plotly diagrams from our collected data.
- Second, we connect a public government API to our program in order to retrieve government members and their party affiliation in a reliable way.
As you already know, we are interested in a generalized program that can work with multiple countries. To that end, we will introduce an abstract class and the factory pattern in the second part of this tutorial.
But let’s start with the first, fairly easy task: generating two charts from our tables.
Plotly Chart Generation
We create a bar chart as well as a pie chart. As a base, we take our Plotly table CH-tw-party-list.
The code is straightforward:
- In the bar chart, we visualize the accumulated friends count per party.
- In the pie chart, we show the number of Twitter accounts per party.
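import plotly.graph_objs as go
import plotly.plotly as py  # plotly < 4.0; this module moved to chart_studio.plotly in 4.0

def create_party_friends_count_bar_chart(self, df):
    # One bar per party, showing the accumulated friends count
    data = [
        go.Bar(
            x=df.Party,        # x axis: the dataframe column 'Party'
            y=df.FriendsCount  # y axis: the dataframe column 'FriendsCount'
        )
    ]
    py.plot(data, filename=self.__country_code + '-tw-party_friends_count')

def create_party_politicans_count_pie_chart(self, df):
    # One slice per party, sized by the number of Twitter accounts
    trace = go.Pie(labels=df.Party, values=df.PartyCount,
                   hoverinfo='label+percent', textinfo='value',
                   title="Twitter User per Party",
                   titlefont=dict(
                       family='Courier New, monospace',
                       size=14,
                       color='#7f7f7f'
                   ))
    data = [trace]
    py.plot(data, filename=self.__country_code + '-tw-party_politicans_count')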
As you can see in the code excerpt, various configuration parameters allow you to modify the layout of a chart. Head over to the Plotly Python Open Source Graphing Library to find out more about the many possibilities of charts, pandas and Plotly.
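Both chart methods expect a dataframe that already holds one row per party, with the columns Party, FriendsCount and PartyCount. The aggregation step itself is not shown in this excerpt. The following is a minimal, standard-library-only sketch of what it computes, assuming hypothetical per-account records with a party (missing for unidentified accounts) and a friends count; in the actual program a pandas groupby would be the natural equivalent.

```python
from collections import defaultdict


def aggregate_by_party(accounts):
    """Aggregate per-account records into one row per party.

    `accounts` is a list of hypothetical records such as
    {"party": "SP", "friends_count": 120}. Accounts without an identified
    party are grouped under "unknown".
    """
    friends = defaultdict(int)             # accumulated friends count per party (bar chart)
    accounts_per_party = defaultdict(int)  # number of Twitter accounts per party (pie chart)
    for account in accounts:
        party = account.get("party") or "unknown"
        friends[party] += account.get("friends_count", 0)
        accounts_per_party[party] += 1
    # One row per party, mirroring the Party / FriendsCount / PartyCount columns
    return [
        {"Party": party,
         "FriendsCount": friends[party],
         "PartyCount": accounts_per_party[party]}
        for party in sorted(accounts_per_party)
    ]


rows = aggregate_by_party([
    {"party": "SP", "friends_count": 120},
    {"party": "SVP", "friends_count": 300},
    {"party": "SP", "friends_count": 80},
    {"party": None, "friends_count": 50},
])
for row in rows:
    print(row)
```

The "unknown" bucket is exactly where accounts end up whose party could not be derived from Twitter data.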
As you can see, we have a lot of “unknowns”, i.e. politicians whose party we couldn’t identify by analyzing Twitter data alone. In the second part of this tutorial, we will connect another data source to address this issue.
Government Data API Factory
In recent years, the availability of so-called open government APIs has exploded. This stems from the idea that data should be open, as Wikipedia describes in its definition of Open Data:
“Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control….
One of the most important forms of open data is open government data (OGD), which is a form of open data created by ruling government institutions. Open government data’s importance is borne from it being a part of citizens' everyday lives, down to the most routine/mundane tasks that are seemingly far removed from government.”
A good starting point for getting an overview of government data APIs is the ProgrammableWeb directory, which lists over 20,000 different APIs.
The government API category can be found here:
Two examples of government APIs:
- the data API of the US government: https://www.data.gov/developers/apis
- the API of the Swiss Parliament: http://ws-old.parlament.ch/
We will use the Swiss Parliament API to extract personal data about the members of parliament (mainly their party affiliation) in order to increase the accuracy of our Twitter matching algorithm.
Councillors Data
To get the data in a machine-readable form, we have to append the query parameter format=json, which returns the content as a JSON document:
http://ws-old.parlament.ch/councillors?format=json
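Query parameters can be appended by hand, but letting a library encode them is less error-prone. A minimal sketch using only the standard library’s urllib.parse; councillors_url is a hypothetical helper, and pageNumber is the API’s paging parameter.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://ws-old.parlament.ch/"


def councillors_url(page_number=None, base_url=BASE_URL):
    """Build the councillors resource URL with format=json (hypothetical helper)."""
    params = {"format": "json"}
    if page_number is not None:
        params["pageNumber"] = page_number  # paging parameter of the Swiss Parliament API
    return urljoin(base_url, "councillors") + "?" + urlencode(params)


print(councillors_url())               # http://ws-old.parlament.ch/councillors?format=json
print(councillors_url(page_number=2))  # http://ws-old.parlament.ch/councillors?format=json&pageNumber=2
```

With requests, passing the parameters as a params dict to requests.get achieves the same encoding.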
As we explained in our second tutorial, a public API may return a lot of information. To limit the amount of data returned by a single request, APIs use a cursor or paging mechanism.
The Swiss Parliament API returns about 25 records per request. The last record of each response carries an attribute that tells you whether more data is available (hasMorePages=true). If it is set to true, you can fetch the next page by adding the query parameter pageNumber=2, and so on.
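The paging rule can be expressed independently of any HTTP library. The sketch below assumes a hypothetical fetch_page(page_number) callable that returns the list of records on one page; it keeps requesting pages until no record carries hasMorePages=true. Injecting the fetch function also makes the loop testable with canned pages instead of live requests.

```python
def fetch_all_records(fetch_page):
    """Collect records from all pages, following the hasMorePages flag.

    `fetch_page` is a hypothetical callable: page number -> list of records.
    """
    records = []
    page_number = 1
    while True:
        page = fetch_page(page_number)
        records.extend(page)
        # The last record of a page signals whether another page follows
        if not any(record.get("hasMorePages") for record in page):
            return records
        page_number += 1


# Canned pages standing in for the API responses
pages = {
    1: [{"id": 1}, {"id": 2, "hasMorePages": True}],
    2: [{"id": 3}],
}
all_records = fetch_all_records(lambda n: pages[n])
print([r["id"] for r in all_records])  # [1, 2, 3]
```

Against the real API, fetch_page could wrap a requests.get call with the format and pageNumber parameters and return its .json() result.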
You will normally find this kind of information in the API’s user documentation; the Swiss Parliament API, for example, has parameters to control the output format, language, etc.
With a basic understanding of the API, we can now enhance our program so that it can read data from country-specific government APIs. Let’s dig into the code.
Enhancing the Code - the UML Diagram
Introducing the government API in a general way requires some serious design work and enhancements to our program. The UML class diagram of our enhanced program looks as follows (don’t be overwhelmed by the complexity; all the details will be explained later in this article).
A quick summary of what we have done so far:
- In the second tutorial, we created the GovernmentSocialMediaAnalyzer class, which retrieves the Twitter account data of a country’s politicians. We used a configuration-driven approach, based on YAML, to move the country-specific data into a configuration file.
- We defined several methods that allowed us to create pandas data frames as well as Plotly tables and charts.
Now we will introduce three new classes, govAPIFactory, govAPI (an abstract class) and govAPI_CH, which together provide a generalized approach for connecting any kind of government API.
Factory Method Pattern
Software design patterns play an important role in software design, as described by Wikipedia:
“In software engineering, a software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. It is not a finished design that can be transformed directly into source or machine code. It is a description or template for how to solve a problem that can be used in many different situations. Design patterns are formalized best practices that the programmer can use to solve common problems when designing an application or system.”
In our design, we will use the Factory Method Pattern to generalize the connectivity to a government API, which Wikipedia explains as follows:
“In class-based programming, the factory method pattern is a creational pattern that uses factory methods to deal with the problem of creating objects without having to specify the exact class of the object that will be created. This is done by creating objects by calling a factory method—either specified in an interface and implemented by child classes,
or implemented in a base class and optionally overridden by derived classes
—rather than by calling a constructor.”
Our design is based on the following strategy:
- define a base class (parent: GovAPI), which is abstract;
- define a derived class (child: GovAPI_CH), which contains the country-specific implementation (i.e. Switzerland);
- in the future, introduce additional classes as needed; for the United Kingdom, for example, we would construct the implementation class GovAPI_UK.
Abstract Base Class “GovAPI”
govAPI is an abstract class that contains several abstract methods. An abstract method is a method that is declared but contains no implementation. In Python, an abstract class derives (inherits) from the class ABC and has one or more methods marked with @abstractmethod.
from abc import ABC, abstractmethod
...
class GovAPI(ABC):

    @abstractmethod
    def load_government_members(self):
        pass
    ...
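What @abstractmethod buys us is that Python refuses to instantiate a class that still has unimplemented abstract methods. A self-contained sketch with a simplified GovAPI (only one abstract method) and two hypothetical subclasses:

```python
from abc import ABC, abstractmethod


class GovAPI(ABC):
    """Simplified stand-in for the tutorial's abstract base class."""

    @abstractmethod
    def load_government_members(self):
        pass


class IncompleteGovAPI(GovAPI):
    """Hypothetical subclass that does not implement the abstract method."""


class DummyGovAPI(GovAPI):
    """Hypothetical subclass that fulfils the contract."""

    def load_government_members(self):
        return [{"id": 1, "lastName": "Muster"}]


# Instantiating a class with unimplemented abstract methods raises TypeError
try:
    IncompleteGovAPI()
    incomplete_error = None
except TypeError as error:
    incomplete_error = error
    print("cannot instantiate:", error)

print(DummyGovAPI().load_government_members())
```

The error surfaces as soon as the object is created, not later when the missing method would be called, so a forgotten getter in a new country class is caught early.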
So the abstract class provides a blueprint for any implementation class that inherits from it, in our case govAPI_CH. Which methods does govAPI_CH have to implement?
First of all, the implementation of the load_government_members() method has to fetch the politicians’ data records from the government API. Each fetched record, which represents the data of a single politician, must be passed to the method _add_person_record (which is already implemented by the govAPI base class).
The question now is: what the heck is the _add_person_record method doing? Well, look at the code below.
def _add_person_record(self, dict):
    person = {
        'id': self._get_id(dict),
        'active': self._get_active(dict),
        'lastName': self._get_last_name(dict),
        'firstName': self._get_first_name(dict),
        'middleName': self._get_middle_name(dict),
        'gender': self._get_gender(dict),
        'party': self._get_party(dict),
        'council': self._get_council(dict),
        'electedDate': self._get_elected_date(dict),
        'birthDate': self._get_birthdate(dict),
        'maritalStatus': self._get_marital_status(dict),
        'title': self._get_title(dict),
        'statePostalCode': self._get_state_postal_code(dict),
        'district': self._get_district(dict),
        'zip': self._get_zip(dict),
        'townName': self._get_town_name(dict)
    }
    self._members.append(person)
The method simply builds a target dictionary for our person record, i.e. the attribute names defined here (lastName, firstName, council, etc.) are the names we want to use for any GovAPI implementation.
This means that a record retrieved in the format of a specific government API (e.g. the Swiss Parliament API) has to be transformed by a set of getter methods.
Each of these getter methods is either abstract or returns an empty string by default. It is the responsibility of the implementer of an inheriting class (GovAPI_CH) to provide the correct getter implementations.
@abstractmethod
def _get_active(self, dict):
    pass

@abstractmethod
def _get_id(self, dict):
    pass

@abstractmethod
def _get_last_name(self, dict):
    pass

@abstractmethod
def _get_first_name(self, dict):
    pass

# Optional attributes: default to an empty string unless overridden
def _get_middle_name(self, dict):
    return ''

def _get_party(self, dict):
    return ''

def _get_council(self, dict):
    return ''
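The interplay between _add_person_record and the getters is what is commonly called the template method pattern: the base class fixes the target structure, and the subclass supplies the lookups. A condensed, runnable sketch with only four canonical attributes; SampleGovAPI and its source field names (memberId, surname, partyAbbreviation) are hypothetical.

```python
from abc import ABC, abstractmethod


class GovAPI(ABC):
    """Condensed base class: builds canonical person records via getters."""

    def __init__(self):
        self._members = []

    def _add_person_record(self, record):
        # The target (canonical) attribute names are fixed here, once
        person = {
            'id': self._get_id(record),
            'lastName': self._get_last_name(record),
            'middleName': self._get_middle_name(record),
            'party': self._get_party(record),
        }
        self._members.append(person)

    @abstractmethod
    def _get_id(self, record):
        pass

    @abstractmethod
    def _get_last_name(self, record):
        pass

    # Optional attributes default to an empty string
    def _get_middle_name(self, record):
        return ''

    def _get_party(self, record):
        return ''


class SampleGovAPI(GovAPI):
    """Hypothetical country implementation with its own source field names."""

    def _get_id(self, record):
        return record.get('memberId')

    def _get_last_name(self, record):
        return record.get('surname')

    def _get_party(self, record):
        return record.get('partyAbbreviation')


api = SampleGovAPI()
api._add_person_record({'memberId': 7, 'surname': 'Muster', 'partyAbbreviation': 'XY'})
print(api._members)  # [{'id': 7, 'lastName': 'Muster', 'middleName': '', 'party': 'XY'}]
```

SampleGovAPI does not override _get_middle_name, so the base class default fills in the empty string.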
Implementation Class “GovAPI_CH”
The getter implementations of GovAPI_CH are shown below. Each getter returns the required attribute value from the record.
class GovAPI_CH(GovAPI):
    ...
    def _get_active(self, dict):
        return dict.get('active')

    def _get_id(self, dict):
        return dict.get('id')

    def _get_last_name(self, dict):
        return dict.get('lastName')

    def _get_first_name(self, dict):
        return dict.get('firstName')

    def _get_middle_name(self, dict):
        return ''

    def _get_party(self, dict):
        return dict.get('party')
    ...
    def _get_birthdate(self, dict):
        return self._convert_utc_timestamp(dict.get('birthDate'))

    def _get_title(self, dict):
        return dict['salutationTitle']

    def _get_country(self, dict):
        return 'CH'

    def _get_state_postal_code(self, dict):
        return dict['cantonName']

    def _get_zip(self, dict):
        return dict.get('postalAddress').get('zip')

    def _get_town_name(self, dict):
        return dict.get('postalAddress').get('city')

    def _get_elected_date(self, dict):
        return self._convert_utc_timestamp(dict['councilMemberships'][0]['entryDate'])
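Several of these getters assume that nested data is always present: dict['salutationTitle'] raises a KeyError, dict.get('postalAddress').get('zip') raises an AttributeError when postalAddress is missing, and dict['councilMemberships'][0] fails on an empty list. If the API omits such fields for some councillors, a small defensive helper keeps the transformation going. get_nested below is a hypothetical helper, not part of the original program.

```python
def get_nested(record, *path, default=''):
    """Follow a path of dict keys / list indices.

    Returns `default` if any step is missing or the final value is None.
    """
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current if current is not None else default


# Sample record shaped after the fields used by the GovAPI_CH getters
record = {
    'postalAddress': {'zip': '3000', 'city': 'Bern'},
    'councilMemberships': [{'entryDate': '2015-11-30'}],
}
print(get_nested(record, 'postalAddress', 'zip'))                # 3000
print(get_nested(record, 'councilMemberships', 0, 'entryDate'))  # 2015-11-30
print(repr(get_nested(record, 'salutationTitle')))               # ''
print(repr(get_nested({'councilMemberships': []},
                      'councilMemberships', 0, 'entryDate')))    # ''
```

A getter such as _get_zip could then read return get_nested(dict, 'postalAddress', 'zip').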
Let’s drill down into the method load_government_members:
Our implementation uses the Python module requests, which describes itself as “an elegant and simple HTTP library for Python, built for human beings”. In the introduction section of this article, we provided an overview of the Swiss Parliament API. The code below fetches the data using the paging mechanism.
We placed the URL and its parameters in our YAML configuration file:
govAPIUrl: "http://ws-old.parlament.ch/"
govAPICouncillorsRes: "councillors"
govAPIParams:
  - format: "json"
- The first requests.get call fetches the councillors overview pages, e.g. http://ws-old.parlament.ch/councillors?format=json&pageNumber=1. If a data record is marked as active, its details record is fetched.
- The second request uses the id attribute of the record to construct the URL of the details record. In this example, we fetch the politician record with the id ‘1358’: http://ws-old.parlament.ch/councillors/1358?format=json
- We pass the retrieved details record to the method _add_person_record, which transforms it into the target record (by using the getters we have implemented).
- Finally, we check the hasMorePages attribute; once we have reached the last page, we break out of the loop.
import requests

def load_government_members(self):
    page_number = 1
    url = self.__cfg['govAPIUrl']
    politician_res = self.__cfg['govAPICouncillorsRes']
    par = self.__cfg['govAPIParams']
    # Send a browser-like User-Agent header with every request
    headers = requests.utils.default_headers()
    headers.update({'User-Agent': 'Mozilla/5.0'})
    while True:
        # Request the current overview page
        par[0]['pageNumber'] = str(page_number)
        politicians = requests.get(url + politician_res, params=par[0], headers=headers).json()
        has_more_pages = False
        for politician in politicians:
            # The last record of a page flags whether another page follows
            if politician.get('hasMorePages'):
                has_more_pages = True
            if politician['active']:
                # Fetch the details record of each active councillor
                politician_id = politician['id']
                details = requests.get(url + politician_res + "/" + str(politician_id),
                                       params=par[0], headers=headers).json()
                print(details)
                self._add_person_record(details)
        if not has_more_pages:
            break
        page_number += 1
    return self._members
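The control flow of load_government_members (overview pages, an active filter and one details request per active councillor) can be separated from the HTTP calls. The sketch below assumes a hypothetical fetch_json(path, params) callable; with the real API it would wrap requests.get(url + path, params=params, headers=headers).json(). Unlike the original, it builds a fresh params dict per request instead of mutating the configuration.

```python
def load_active_member_details(fetch_json, resource="councillors", base_params=None):
    """Return the details records of all active members, page by page.

    `fetch_json(path, params)` is a hypothetical callable returning parsed JSON.
    """
    base_params = dict(base_params or {"format": "json"})
    details = []
    page_number = 1
    while True:
        # Fresh params per request, so the configuration is never mutated
        overview = fetch_json(resource, {**base_params, "pageNumber": str(page_number)})
        for politician in overview:
            if politician.get("active"):
                # One details request per active councillor, e.g. councillors/1358
                details.append(fetch_json(f"{resource}/{politician['id']}", base_params))
        if not any(p.get("hasMorePages") for p in overview):
            return details
        page_number += 1


# Canned responses standing in for the API
responses = {
    ("councillors", "1"): [{"id": 1, "active": True},
                           {"id": 2, "active": False, "hasMorePages": True}],
    ("councillors", "2"): [{"id": 3, "active": True}],
    ("councillors/1", None): {"id": 1, "lastName": "A"},
    ("councillors/3", None): {"id": 3, "lastName": "C"},
}


def fake_fetch_json(path, params):
    return responses[(path, params.get("pageNumber"))]


members = load_active_member_details(fake_fetch_json)
print(members)  # [{'id': 1, 'lastName': 'A'}, {'id': 3, 'lastName': 'C'}]
```

Because the fetcher is injected, the paging and filtering logic can be checked without network access.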
The method above is called within the govAPI method create_politican_from_govapi_table (already implemented by the govAPI parent class), which transforms the list of politician records into a pandas DataFrame.
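from pandas import DataFrame

def create_politican_from_govapi_table(self):
    self.load_government_members()
    # One row per politician; the columns are the canonical attribute names
    df = DataFrame.from_records(self._members)
    print(df)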
It’s important to realize that the structure of this pandas DataFrame is the same for any government API, as long as we implement a specific class based on the govAPI abstract class. We have thus normalized our data so that we can process it afterwards in a standardized way.
Here we have again applied an important design pattern: our target structure (or model) is known as a Canonical Model. As Wikipedia describes it:
"A canonical model is a design pattern used to communicate between different data formats. Essentially: create a data model which is a superset of all the others ("canonical”), and create a “translator” module or layer to/from which all existing modules exchange data with other modules.”
Conceptually, we have built a mini data pipeline. For each government API, we have to implement a record-fetching function and transformation rules (the getters) that transform the data into our standardized format.
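To make the canonical-model idea concrete: two sources with different field names are translated into the same target structure, so everything downstream only sees the canonical keys. Both source formats and the translator functions below are hypothetical; in particular, no GovAPI_UK exists yet.

```python
CANONICAL_KEYS = ('id', 'lastName', 'firstName', 'party')


def translate_ch(record):
    # Hypothetical CH-style record, using the field names read by the GovAPI_CH getters
    return {
        'id': record.get('id'),
        'lastName': record.get('lastName'),
        'firstName': record.get('firstName'),
        'party': record.get('party', ''),
    }


def translate_uk(record):
    # Hypothetical UK-style record with different field names;
    # naive split of the full name on the first space, for illustration only
    given, _, family = record.get('name', '').partition(' ')
    return {
        'id': record.get('member_id'),
        'lastName': family,
        'firstName': given,
        'party': record.get('party_name', ''),
    }


members = [
    translate_ch({'id': 1358, 'lastName': 'Muster', 'firstName': 'Anna', 'party': 'XY'}),
    translate_uk({'member_id': 'uk-1', 'name': 'John Smith', 'party_name': 'Example Party'}),
]
# Downstream code can rely on one structure, whatever the source
assert all(tuple(m) == CANONICAL_KEYS for m in members)
for m in members:
    print(m)
```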
The whole pattern is visualized in a UML sequence diagram:
- the “consume” operation is represented by step 60,
- the “transform rules” operations are represented by steps 80-110,
- the “storeAs” operation is represented by step 120.
It’s important that you understand the responsibilities of the various classes. govAPI and govAPI_CH (red dots) appear to the outside world (govAPIFactory, gsma) as a single class instance. For the caller, it is irrelevant which class implements which method.
One final piece is missing: the govAPIFactory class, which is quite straightforward. Depending on the country_code, a corresponding implementation class instance is created and returned to the caller:
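from govAPI_CH import GovAPI_CH

class GovAPIFactory:

    @classmethod
    def create_country_gov_api(cls, country_code, cfg):
        # Return the implementation class instance for the requested country
        if country_code == "CH":
            return GovAPI_CH(cfg)
        # No implementation available for this country
        return None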
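The if chain grows by one branch per country. A common alternative is a registry that maps country codes to implementation classes, so adding a country means adding one entry. A self-contained sketch; the stub classes stand in for the real GovAPI_CH and a hypothetical GovAPI_UK, and the "UK" key is an assumption.

```python
class GovAPI_CH:
    """Stub standing in for the real Swiss implementation."""

    def __init__(self, cfg):
        self.cfg = cfg


class GovAPI_UK:
    """Stub for a hypothetical future UK implementation."""

    def __init__(self, cfg):
        self.cfg = cfg


class GovAPIFactory:
    # Country code -> implementation class
    _registry = {
        "CH": GovAPI_CH,
        "UK": GovAPI_UK,
    }

    @classmethod
    def create_country_gov_api(cls, country_code, cfg):
        """Return an instance for country_code, or None if no implementation exists."""
        impl = cls._registry.get(country_code)
        return impl(cfg) if impl is not None else None


api = GovAPIFactory.create_country_gov_api("CH", {"govAPIUrl": "http://ws-old.parlament.ch/"})
print(type(api).__name__)                              # GovAPI_CH
print(GovAPIFactory.create_country_gov_api("FR", {}))  # None
```

The caller’s contract stays the same as in the original factory: an instance for supported countries, None otherwise.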
Phew, that was quite a lot of content to absorb: we introduced two important design patterns, the factory method and the canonical data model, and showed how to generate a first pair of charts.
The lesson3.py program generates the table within Plotly under the name CH-govapi-member-list.
Exercise
You can find the exercise here: Link
Source Code
This blog entry was fully produced within Evernote and published using the Cloudburo Publishing Bot.