dev@cloudburo

Python Tutorial: Connect Government Data APIs by Using the Factory Pattern

Source Politician Data from Public Government Data APIs to Increase Allocation Accuracy

This tutorial consists of two main steps:
  • First, we generate some Plotly charts from the data we have collected so far.
  • Second, we connect a public government API to our program in order to retrieve government members and their party assignments in a reliable way.

As you already know, we are interested in a generalized program that can work with multiple countries. For that, we will introduce an abstract class and the factory pattern in the second part of this tutorial.

But let’s start with the first, quite easy task of generating two charts for our tables.

Plotly Chart Generation
We create a bar chart as well as a pie chart. As a base, we take our plotly table CH-tw-party-list.



The code is straightforward:
  • In the bar chart, we visualize the accumulated friends count per party.
  • In the pie chart, we show the number of Twitter accounts per party.


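  # Assumed names: go is plotly.graph_objs, py is Plotly's online plotting
  # module (plotly.plotly in Plotly 3, chart_studio.plotly from Plotly 4 on)
  def create_party_friends_count_bar_chart(self, df):
      # Bar chart: accumulated friends count per party
      data = [
          go.Bar(
              x=df.Party,        # x axis: dataframe column 'Party'
              y=df.FriendsCount  # y axis: dataframe column 'FriendsCount'
          )
      ]
      py.plot(data, filename=self.__country_code+'-tw-party_friends_count')

  def create_party_politicans_count_pie_chart(self, df):
      # Pie chart: number of Twitter accounts per party
      trace = go.Pie(labels=df.Party, values=df.PartyCount,
                     hoverinfo='label+percent', textinfo='value',
                     title="Twitter User per Party", titlefont=dict(
                          family='Courier New, monospace',
                          size=14,
                          color='#7f7f7f'
                      ))
      data = [trace]
      py.plot(data, filename=self.__country_code+'-tw-party_politicans_count')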


As the code excerpt shows, various configuration parameters allow you to modify the layout of a chart. Head over to the Plotly Python Open Source Graphing Library to find out more about the various possibilities with charts, pandas and Plotly.
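Both charts expect a dataframe that already holds one row per party. How these values are computed is not part of the excerpt, so the following is only a minimal sketch in plain Python of the two aggregations (summed friends count and number of accounts per party). The sample rows and the helper name aggregate_per_party are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the real program derives them from Twitter data.
accounts = [
    {"name": "a", "party": "SP", "friends_count": 120},
    {"name": "b", "party": "SP", "friends_count": 80},
    {"name": "c", "party": "FDP", "friends_count": 50},
    {"name": "d", "party": "unknown", "friends_count": 10},
]


def aggregate_per_party(rows):
    """Return (friends_per_party, accounts_per_party) as plain dicts."""
    friends = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        friends[row["party"]] += row["friends_count"]  # bar chart values
        counts[row["party"]] += 1                      # pie chart values
    return dict(friends), dict(counts)


friends_per_party, accounts_per_party = aggregate_per_party(accounts)
```

The first dictionary corresponds to the FriendsCount column used by the bar chart, the second one to the PartyCount column used by the pie chart.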


As one can see, we have a lot of “unknowns”, i.e. we couldn’t identify the corresponding party by just analyzing Twitter data elements. In the second part of this tutorial, we will connect another data source to address this issue.




Government Data API Factory



In recent years, the availability of so-called open government APIs has exploded. It stems from the idea that data should be open. Wikipedia describes the term Open Data as follows:

“Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control….
One of the most important forms of open data is open government data (OGD), which is a form of open data created by ruling government institutions. Open government data’s importance is borne from it being a part of citizens' everyday lives, down to the most routine/mundane tasks that are seemingly far removed from government.”

A good starting point for getting an overview of government data APIs is the ProgrammableWeb directory, which lists over 20,000 different APIs.

The government API category can be found here:


Two examples of government APIs:

The data API of the US government: https://www.data.gov/developers/apis


Or the API of the Swiss Parliament: http://ws-old.parlament.ch/

We will use the Swiss Parliament API to extract personal data of the parliament members (mainly the party allocation) in order to increase the accuracy of our Twitter matching algorithm.


Councillors Data

What we require are the councillors data objects: http://ws-old.parlament.ch/councillors


In order to get the data in a machine-readable form we have to attach the query parameter format=json, which returns the content as a JSON document: http://ws-old.parlament.ch/councillors?format=json

As we have already explained in our second tutorial, a public API may return a lot of information. In order to control the amount of data returned in one request, the concept of a cursor or paging mechanism is used.

The Swiss Parliament API returns about 25 records in one request. The last record of a response has an attribute attached which tells you whether more data is available (hasMorePages=true).

If it is set to ‘true’, you can fetch the next page by adding the query parameter pageNumber=2, and so on.
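The paging mechanism can be sketched independently of any HTTP library. The example below is a simplified illustration, not the code of our program: fetch_page is a hypothetical callable that stands in for the real request, and FAKE_PAGES is invented test data that mimics the hasMorePages flag.

```python
def fetch_all_pages(fetch_page):
    """Collect the records of all pages.

    fetch_page(page_number) must return the list of records of one page.
    """
    records = []
    page_number = 1
    while True:
        page = fetch_page(page_number)
        records.extend(page)
        # A record flagged with hasMorePages means another page exists.
        if not any(rec.get("hasMorePages") for rec in page):
            break
        page_number += 1
    return records


# Stand-in for the HTTP call: two fake pages, the first one flags more data.
FAKE_PAGES = {
    1: [{"id": 1}, {"id": 2, "hasMorePages": True}],
    2: [{"id": 3}],
}

all_records = fetch_all_pages(lambda n: FAKE_PAGES[n])
ids = [rec["id"] for rec in all_records]
```

Passing the fetch function as a parameter keeps the loop testable without network access; in the real program the callable would issue the request with the pageNumber query parameter.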


You will normally find this kind of information about an API in its user documentation, e.g. the Swiss Parliament API has some parameters to control the output format, language etc.



Now that we have a basic understanding of the API, we can enhance our program so that it is capable of reading data from country-specific government APIs. Let’s dig into the code.

Enhancing the Code - the UML Diagram

Introducing the government API in a general way needs some serious design and enhancement of our program. The UML class diagram of our enhanced program looks as follows (don’t be overwhelmed by the complexity, all the details will be explained later in this article).



A quick summary of what we have done until now:

  • We created the GovernmentSocialMediaAnalyzer class in the second tutorial, which is capable of retrieving the Twitter-relevant account data of the politicians of a country. We used a configuration-driven approach, based on YAML, to abstract the country-specific data into a configuration file.
  • Several methods were defined which allowed us to create pandas data frames, as well as Plotly-specific tables and charts.

Now we will introduce three new classes, GovAPIFactory, GovAPI (an abstract class) and GovAPI_CH, which together provide a generalized approach for connecting any kind of government API.

Factory Method Pattern

Software design patterns play an important role in software design, as described by Wikipedia:

“In software engineering, a software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. It is not a finished design that can be transformed directly into source or machine code. It is a description or template for how to solve a problem that can be used in many different situations. Design patterns are formalized best practices that the programmer can use to solve common problems when designing an application or system.”

In our design, we will use the Factory Method Pattern to generalise the connectivity to a government API, which is explained by Wikipedia as follows:

“In class-based programming, the factory method pattern is a creational pattern that uses factory methods to deal with the problem of creating objects without having to specify the exact class of the object that will be created. This is done by creating objects by calling a factory method—either specified in an interface and implemented by child classes, or implemented in a base class and optionally overridden by derived classes —rather than by calling a constructor.”

Our design is based on the strategy of defining
  • a base class (parent, GovAPI), which is abstract, and
  • a derived class (child, GovAPI_CH), which holds the country-specific implementation (i.e. Switzerland).
  • In the future, we can introduce additional classes; for the United Kingdom, for example, we would construct the implementation class GovAPI_UK.

Abstract Base Class “GovAPI”



GovAPI is an abstract class that contains several abstract methods. An abstract method is a method that is declared but contains no implementation.

In Python, an abstract class is derived (or inherited) from the class ABC and has one or more methods marked with @abstractmethod.


  from abc import ABC, abstractmethod 
  ... 


  class GovAPI(ABC): 
      @abstractmethod 
      def load_government_members(self): 
          pass 
      ... 


So the abstract class provides you with a build plan for any implementation class which inherits from it, in our case GovAPI_CH. Which methods does GovAPI_CH have to implement?
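To see what this build plan means in practice, here is a small self-contained sketch. BaseAPI and DemoAPI are hypothetical stand-ins for GovAPI and GovAPI_CH; the point is only that Python refuses to instantiate a class that still has unimplemented abstract methods.

```python
from abc import ABC, abstractmethod


class BaseAPI(ABC):  # hypothetical stand-in for GovAPI
    @abstractmethod
    def load_government_members(self):
        """Must be provided by every country-specific subclass."""


class DemoAPI(BaseAPI):  # hypothetical stand-in for GovAPI_CH
    def load_government_members(self):
        return ["member"]


# Instantiating the abstract class directly raises a TypeError.
try:
    BaseAPI()
    instantiated = True
except TypeError:
    instantiated = False

# The subclass implements the abstract method and can be instantiated.
members = DemoAPI().load_government_members()
```

A subclass that forgets to implement one of the abstract methods fails in the same way, which makes missing getters visible as soon as the object is created.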

First of all, the implementation of the load_government_members() method has to take care of fetching the politicians’ data records from the government API. Each fetched record, which represents the data of a single politician, must be passed to the method _add_person_record (which is already implemented by the GovAPI base class).

The question now is, what the heck is the _add_person_record method doing? Well, look at the code below.


  def _add_person_record(self, record):
      # Map the API-specific record onto our target attribute names
      person = {
          'id': self._get_id(record),
          'active': self._get_active(record),
          'lastName': self._get_last_name(record),
          'firstName': self._get_first_name(record),
          'middleName': self._get_middle_name(record),
          'gender': self._get_gender(record),
          'party': self._get_party(record),
          'council': self._get_council(record),
          'electedDate': self._get_elected_date(record),
          'birthDate': self._get_birthdate(record),
          'maritalStatus': self._get_marital_status(record),
          'title': self._get_title(record),
          'statePostalCode': self._get_state_postal_code(record),
          'district': self._get_district(record),
          'zip': self._get_zip(record),
          'townName': self._get_town_name(record)
      }
      self._members.append(person)


The method just prepares a target dictionary for our person record, i.e. the attribute names defined (lastName, firstName, council etc.) are the names we want to use for any GovAPI implementation.

That means that a record retrieved from a specific government API (here the Swiss Parliament API) has to be transformed into this target form by using a bunch of getter methods.

Each of these getter methods is either abstract or returns an empty string. It’s the responsibility of the implementer of an inherited class (GovAPI_CH) to provide the correct getter implementation.


  # Mandatory getters: every implementation class has to provide them
  @abstractmethod
  def _get_active(self, record):
      pass
  @abstractmethod
  def _get_id(self, record):
      pass
  @abstractmethod
  def _get_last_name(self, record):
      pass
  @abstractmethod
  def _get_first_name(self, record):
      pass
  # Optional getters: default to an empty string
  def _get_middle_name(self, record):
      return ''
  def _get_party(self, record):
      return ''
  def _get_council(self, record):
      return ''

Implementation Class “GovAPI_CH”

The getter method implementations of GovAPI_CH are shown below. Each getter returns the required attribute value out of the record.


  class GovAPI_CH(GovAPI):

      def _get_active(self, record):
          return record.get('active')
      def _get_id(self, record):
          return record.get('id')
      def _get_last_name(self, record):
          return record.get('lastName')
      def _get_first_name(self, record):
          return record.get('firstName')
      def _get_middle_name(self, record):
          return ''
      def _get_party(self, record):
          return record.get('party')
      ...
      def _get_birthdate(self, record):
          return self._convert_utc_timestamp(record.get('birthDate'))
      def _get_title(self, record):
          return record['salutationTitle']
      def _get_country(self, record):
          return 'CH'
      def _get_state_postal_code(self, record):
          return record['cantonName']
      def _get_zip(self, record):
          # Guard against a missing postalAddress entry
          return (record.get('postalAddress') or {}).get('zip')
      def _get_town_name(self, record):
          return (record.get('postalAddress') or {}).get('city')
      def _get_elected_date(self, record):
          # Entry date of the first council membership
          return self._convert_utc_timestamp(record['councilMemberships'][0]['entryDate'])
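The helper _convert_utc_timestamp is called by two of the getters but is not shown in this article. The following is only a hypothetical sketch of what such a helper could look like. It assumes that the API delivers timestamps as strings of the form YYYY-MM-DDTHH:MM:SSZ and that a plain date string is wanted as the result; both are assumptions, and the real implementation may differ.

```python
from datetime import datetime


def convert_utc_timestamp(value):
    """Turn 'YYYY-MM-DDTHH:MM:SSZ' into 'YYYY-MM-DD'.

    Hypothetical helper: the input format is an assumption.
    Returns an empty string if no value is given.
    """
    if not value:
        return ''
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.date().isoformat()


birth_date = convert_utc_timestamp("1960-04-12T00:00:00Z")
missing = convert_utc_timestamp(None)
```

Returning an empty string for a missing value matches the convention of the optional getters in the base class.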


Let’s drill down into the method load_government_members:

Our implementation uses the Python module requests, which is “an elegant and simple HTTP library for human-beings”. In the introduction section of this article, we provided an overview of the Swiss Parliament API. The code below fetches the data, using the paging mechanism.

We placed the URL and its parameters in our YAML configuration file.


  govAPIUrl: "http://ws-old.parlament.ch/" 
  govAPICouncillorsRes: "councillors" 
  govAPIParams: 
     - format : "json" 
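To make the role of these three settings more tangible, here is a small sketch of how the request URLs can be composed from them. The cfg dictionary mirrors what a YAML parser would produce for the snippet above (govAPIParams is a list with one dictionary). The helper build_url is hypothetical and only serves as an illustration; our program lets requests assemble the query string instead.

```python
from urllib.parse import urlencode

# The configuration as it would look after parsing the YAML snippet.
cfg = {
    "govAPIUrl": "http://ws-old.parlament.ch/",
    "govAPICouncillorsRes": "councillors",
    "govAPIParams": [{"format": "json"}],
}


def build_url(cfg, page_number=None, record_id=None):
    """Compose an overview or a details URL (hypothetical helper)."""
    url = cfg["govAPIUrl"] + cfg["govAPICouncillorsRes"]
    if record_id is not None:
        url += "/" + str(record_id)
    params = dict(cfg["govAPIParams"][0])  # copy, so cfg stays untouched
    if page_number is not None:
        params["pageNumber"] = str(page_number)
    return url + "?" + urlencode(params)


overview_url = build_url(cfg, page_number=1)
detail_url = build_url(cfg, record_id=1358)
```

Copying the parameter dictionary before adding pageNumber is a deliberate choice in this sketch: the configuration object is then never modified by a request.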


  • The first requests.get fetches one councillors overview page at a time, e.g.
    http://ws-old.parlament.ch/councillors?format=json&pageNumber=1. If a data record is marked as active, its details record is fetched.
  • The second request uses the id attribute of the record to construct the URL of the details record. In this example we fetch the politician record with the id ‘1358’:
    http://ws-old.parlament.ch/councillors/1358?format=json
  • The retrieved details record is passed to the method _add_person_record, which transforms it into the target record (by using the getters we have implemented).
  • Finally, we check for the hasMorePages attribute; if no record of the current page carries it, we have reached the last page and break the loop.


  import requests
  ...
  def load_government_members(self):
      page_number = 1
      url = self.__cfg['govAPIUrl']
      politician_res = self.__cfg['govAPICouncillorsRes']
      par = self.__cfg['govAPIParams']
      # Same headers for every request, with a browser-like User-Agent
      headers = requests.utils.default_headers()
      headers.update({'User-Agent': 'Mozilla/5.0'})
      while True:
          # Fetch one overview page of councillors
          par[0]['pageNumber'] = str(page_number)
          politicians = requests.get(url+politician_res, params=par[0], headers=headers).json()
          has_more_pages = False
          for politician in politicians:
              if politician.get('hasMorePages'):
                  has_more_pages = True
              if politician['active']:
                  # Fetch the details record of an active councillor
                  politician_id = politician['id']
                  details = requests.get(url+politician_res+"/"+str(politician_id), params=par[0], headers=headers).json()
                  print(details)
                  self._add_person_record(details)
          if not has_more_pages:
              break
          page_number += 1
      return self._members


The above method is called within the GovAPI method create_politican_from_govapi_table (already implemented by the GovAPI parent class), which transforms the list of politician records into a pandas dataframe.


  from pandas import DataFrame
  ...
  def create_politican_from_govapi_table(self):
      self.load_government_members()
      # One row per politician, one column per target attribute
      df = DataFrame.from_records(self._members)
      print(df)


It’s important to realize here that the structure of this pandas dataframe will be the same for any kind of government API, as long as we implement a specific class based on the GovAPI abstract class. We have normalized our data so that we can process it afterward in a standardized way.

Again we have touched on an important design pattern: our target structure (or model) is known under the name Canonical Model. As Wikipedia describes:

"A canonical model is a design pattern used to communicate between different data formats. Essentially: create a data model which is a superset of all the others ("canonical”), and create a “translator” module or layer to/from which all existing modules exchange data with other modules.”

Conceptually we have built a mini data pipeline. For each government API we have to implement a data record fetching function and transformation rules (the getters), which will transform the data to our standardised format.
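The idea of the canonical model can be reduced to a few lines. The sketch below is a simplified illustration and not our actual GovAPI code: Source, SourceA and SourceB are hypothetical classes, and the two input formats are invented. What matters is that both sources end up with records that use the same attribute names.

```python
from abc import ABC, abstractmethod


class Source(ABC):  # hypothetical, simplified counterpart of GovAPI
    def __init__(self):
        self.members = []

    def add_person_record(self, record):
        # Canonical attribute names, identical for every source.
        self.members.append({
            "lastName": self.get_last_name(record),
            "party": self.get_party(record),
        })

    @abstractmethod
    def get_last_name(self, record):
        """Mandatory transformation rule."""

    def get_party(self, record):
        return ''  # optional attribute, empty by default


class SourceA(Source):  # invented format: flat keys
    def get_last_name(self, record):
        return record["lastName"]

    def get_party(self, record):
        return record.get("party", '')


class SourceB(Source):  # invented format: nested name, no party
    def get_last_name(self, record):
        return record["name"]["family"]


a, b = SourceA(), SourceB()
a.add_person_record({"lastName": "Muster", "party": "SP"})
b.add_person_record({"name": {"family": "Smith"}})
canonical = a.members + b.members
```

Because both lists share one structure, they can be concatenated and processed by the same downstream code, which is exactly what we rely on when building the dataframe.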



The whole pattern is visualized in a UML sequence diagram.

  • The “consume” operation is represented by step 60.
  • The “transform rules” operations are represented by steps 80-110.
  • The “storeAs” operation is represented by step 120.


It’s important that you understand the responsibilities of the various classes. GovAPI and GovAPI_CH (red dots) are visible to the outside world (GovAPIFactory, gsma) as one class instance. For the caller it is irrelevant which class implements which method.

One final thing is missing: the GovAPIFactory class, which is quite straightforward. Depending on the country_code, a corresponding implementation class instance is created and returned to the caller:


  from govAPI_CH import GovAPI_CH 
  class GovAPIFactory: 
      @classmethod 
      def create_country_gov_api(cls, country_code,cfg): 
          if country_code == "CH": 
              return GovAPI_CH(cfg) 
          return None 
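Once more countries are supported, the if statement will grow with every new implementation class. One possible variation, shown here only as a sketch, is to keep the mapping in a dictionary. Note that this is a different technique from the code above, and that all class names in the example are hypothetical stubs so that it runs on its own.

```python
class GovAPIStub:  # hypothetical stand-in for the abstract GovAPI
    def __init__(self, cfg):
        self.cfg = cfg


class GovAPIStubCH(GovAPIStub):  # stand-in for GovAPI_CH
    country = "CH"


class GovAPIStubUK(GovAPIStub):  # stand-in for a future GovAPI_UK
    country = "UK"


class StubFactory:
    # Registry instead of an if-chain: a new country is one new entry.
    _registry = {"CH": GovAPIStubCH, "UK": GovAPIStubUK}

    @classmethod
    def create_country_gov_api(cls, country_code, cfg):
        impl = cls._registry.get(country_code)
        return impl(cfg) if impl else None


api = StubFactory.create_country_gov_api("CH", {"govAPIUrl": "..."})
unknown = StubFactory.create_country_gov_api("XX", {})
```

As in the original factory, an unknown country code yields None, so the caller has to check the result before using it.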


Uff, that was quite some content to absorb. We introduced two important design patterns, the factory method and the canonical data model, and showed how to generate a first pair of charts.

The lesson3.py program will generate the table within Plotly under the name CH-govapi-member-list.



Exercise

You can find the exercise here: Link

Source Code

The source code can be found here (lesson 3 directory): https://github.com/talfco/clb-sentiment

This blog entry was fully produced within Evernote and published using the Cloudburo Publishing Bot .
