Hacking M-Pesa - Transactions Data Mining

There is nothing wrong with data mining. Most serious companies hire a guy like me to crunch their data and give them new, non-obvious insights. They will get insights like:

  • How to target your products
  • Which to discontinue
  • Which to invest in
  • Whether an advertising campaign is working
  • Usage patterns
  • Purchase patterns
  • Customer patterns

Having said that, I object strongly to government mining our (Kenya) M-Pesa transactional information. You too should be concerned about this.

I would like to discuss data mining of M-Pesa transactions in a bit more detail, and only M-Pesa, because the volumes handled by the other mobile money providers in Kenya are negligible by comparison.

Every M-Pesa transaction contains the following information:

  • Date
  • Time
  • Sender phone number
  • Recipient phone number
  • Amount
  • M-Pesa Outlet

Now, to register for M-Pesa you need to provide some information about yourself, and the interesting bits of that data are:

  • Identity (ID) Number
  • Name
  • Date of birth
  • Gender

Let us now look at the M-Pesa outlet. An M-Pesa outlet needs to register itself too. Therefore the following information is available:

  • Outlet name
  • PIN Number
  • Physical Location
  • Owner name, ID number
  • GPS Co-ordinates *
  • Opening time *
  • Closing time *

Note: I am not certain whether Safaricom collects the starred items, but if I were them, I would.

Six months of collecting this data is data mining gold. I'd frankly be astonished if Safaricom did not mine this database. There are some quick and pretty obvious things that you can derive from this data to improve service delivery.

M-Pesa Outlet Opening and Closing Times

First would be to determine which outlets really open on time. If an outlet claims to open at 8 AM, yet its earliest M-Pesa transaction each day falls between 9 AM and 9.30 AM over a sustained period, it is highly likely that the outlet does not open when it claims to.

You could do the same with the latest transaction of each day to determine which outlets close on time.
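To make that concrete, here is a rough sketch in Python. The record layout (an outlet name plus a timestamp), the function names and the 30-minute tolerance are all my own assumptions for illustration; I have no idea what Safaricom's actual schema looks like.

```python
from collections import defaultdict
from datetime import datetime, time
from statistics import median


def first_transaction_minutes(transactions):
    """Map each outlet to the minute-of-day of its earliest transaction on each day.

    `transactions` is an iterable of (outlet_name, datetime) pairs (assumed layout).
    """
    earliest = {}
    for outlet, ts in transactions:
        key = (outlet, ts.date())
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    per_outlet = defaultdict(list)
    for (outlet, _day), ts in earliest.items():
        per_outlet[outlet].append(ts.hour * 60 + ts.minute)
    return per_outlet


def late_openers(transactions, claimed_opening, tolerance_minutes=30):
    """Return {outlet: typical first-transaction minute} for outlets whose
    median first transaction is later than the claimed opening time by more
    than `tolerance_minutes` (a hypothetical threshold)."""
    late = {}
    for outlet, minutes in first_transaction_minutes(transactions).items():
        claimed = claimed_opening[outlet]
        claimed_minute = claimed.hour * 60 + claimed.minute
        typical = median(minutes)
        if typical - claimed_minute > tolerance_minutes:
            late[outlet] = typical
    return late
```

Using the median rather than the mean means one unusually slow morning does not brand an outlet as a late opener. Swap the "earliest" comparison for "latest" and you have the closing-time check.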

Determining M-Pesa Outlets Optimum Locations

Given that the data provides the GPS co-ordinates of each outlet, you can plot the M-Pesa outlets on a map. If you find four next to each other, A, B, C and D, where A, B and C each average 30 transactions a day but D does just 5, you can probably close D.

In the same manner, suppose you discover that A, B, C and D process transactions continuously from opening to closing time, i.e. there are no hourly spikes. Cross-reference this with the average number of transactions processed across all outlets. If these four sit well above that average, they are likely working flat out, in which case you might need more outlets in the area to absorb the load.
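The "close D" decision can be sketched in a few lines of Python. This assumes you have already grouped nearby outlets into a cluster and computed each one's average daily transaction count; the function name and the 50% threshold are hypothetical choices of mine.

```python
from statistics import median


def underperforming_outlets(daily_averages, threshold=0.5):
    """Return the outlets whose average daily transaction count is below
    `threshold` times the median of their cluster.

    `daily_averages` maps outlet name -> average transactions per day
    (assumed to be precomputed for outlets that sit next to each other).
    """
    cluster_median = median(daily_averages.values())
    return sorted(
        name
        for name, average in daily_averages.items()
        if average < threshold * cluster_median
    )
```

For the example above, `underperforming_outlets({"A": 30, "B": 30, "C": 30, "D": 5})` singles out D. The same comparison, flipped around and run against the average across all outlets, would flag clusters that are overloaded.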

Demographic Profile of M-Pesa Outlets

This is more interesting. Since the data provides us with the sender's details, we can derive things like "what is the modal age of customers at a particular M-Pesa outlet?".

By modal age I mean the age that occurs most frequently among the senders at that outlet: take the age of each sender and count how often each age occurs.

In other words, you may find that at one outlet most visitors are 25-30 years old, at another most are 18-23, and at yet another 40-50.
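A minimal Python sketch of this, assuming each visit is reduced to an (outlet, sender age) pair. I group ages into 5-year bands, which is my own choice; the names here are made up for illustration.

```python
from collections import Counter, defaultdict


def age_band(age, width=5):
    """Label the band an age falls into, e.g. 27 -> '25-29' for width 5."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def modal_age_band(visits, width=5):
    """Return {outlet: most frequent age band among its senders}.

    `visits` is an iterable of (outlet_name, sender_age) pairs (assumed layout).
    If two bands tie, the one seen first wins.
    """
    counts = defaultdict(Counter)
    for outlet, age in visits:
        counts[outlet][age_band(age, width)] += 1
    return {outlet: bands.most_common(1)[0][0] for outlet, bands in counts.items()}
```

Banding matters because exact ages are too sparse: an outlet might see one 26-year-old, one 27-year-old and one 28-year-old, and no single age would stand out even though the age group clearly does.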

This is useful information for marketing people.

This can also assist with outlet design. For example, at an outlet where most visitors are 40-50 years old Safaricom can advise the outlet to get chairs for customers to sit as they wait.

M-Pesa Transactions Peak Times

This one is self explanatory.

You might find, for example, that an outlet averages 10 transactions an hour but spikes to 200 at lunch time, then drops back to 10. The outlet cannot handle the spike, so customers have to queue.

The dilemma from this data then becomes that if you open a second outlet, it will likely be idle. If you do nothing then there is a likelihood of growing customer dissatisfaction.

The solution could then be something like a portable M-Pesa outlet (e.g. van) that can go there at lunch time, absorb the load and then leave.
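Spotting such spikes is straightforward. Here is a hedged Python sketch that takes one outlet's transaction timestamps and flags the hours that are far busier than that outlet's typical hour. The factor of 3 is an arbitrary assumption, as is the function name.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def peak_hours(timestamps, factor=3):
    """Return the hours of the day whose transaction count exceeds
    `factor` times the median hourly count for this outlet.

    `timestamps` is an iterable of datetime objects for a single outlet.
    Hours with no transactions at all are not counted in the median.
    """
    per_hour = Counter(ts.hour for ts in timestamps)
    baseline = median(per_hour.values())
    return sorted(hour for hour, count in per_hour.items() if count > factor * baseline)
```

If the flagged hours are the same every weekday, that is your schedule for the van.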

Average Time to Complete M-Pesa Transactions

If you are from Kenya, you will remember that the initial forms you had to fill in collected a lot more information than they do now. It is likely that someone analyzed these numbers (average time to complete transactions) and optimized the process.

There are tons of other things that you can look out for but those examples should suffice.

Data Gold Mine

Let's move on to the transactions themselves. Remember the following information is at your disposal:

  • Sender name
  • Sender ID number
  • Sender gender
  • Sender age
  • Recipient name
  • Recipient ID number
  • Recipient gender
  • Recipient age
  • Amount
  • M-Pesa outlet name
  • M-Pesa outlet location

Safaricom is sitting on a gold mine next to oil and platinum deposits because they also have access to your call records.

In other words, they can cross-reference your call and your M-Pesa records and mine them still further. Add to this the SMS database and this is paradise. You can derive a treasure trove of information from this.

Over and above who you are sending the money to, there is a lot of context to be gleaned (using metadata) if we can estimate why you are sending the money.

Metadata Could Get You Killed

Let us take an example of how end to end mining would work.

Let me repeat that "data mining is premised on probability, and not certainty. Some of the assumptions may be wrong. But usually, you can derive pretty good confidence levels."

End to End Data Mining

0721 000000 sends 5,000 to 0722 000000 at 2.00 AM, via his phone.

First, we build a profile of both sender and recipient:

  • 0721 000000 maps to John Kamau, aged 37. He has been a customer since 2000.

  • 0722 000000 maps to Jamie Omondi, who is not a male as first thought, but a female, aged 32, a customer since 2003.

Next, we analyze the context.

A 2.00 AM transaction is unusual. It is unlikely to be payment for something. Let us hop over to the phone logs database. The logs reveal that John and Jamie have called each other in the past, so we can infer that they know each other. The transfer was therefore probably either an emergency or Jamie had a pressing bill that she needed to pay.

The next bit is to check if there are any subsequent transactions where Jamie is the sender.

It turns out that Lipa Na MPesa till number 000000 received a payment of 4,500 from Jamie 5 minutes after she got the money from John. Have there been any other payments from Jamie to that till number? Yes, on average, twice a month, over a 6 month period. From the till number we can determine the business it was registered to. It is Sky Lounge, a swanky bar.
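The "what did she do with the money?" step can be sketched as follows in Python. The dictionary keys, the 15-minute window and the function name are my own assumptions; the point is only to show how little it takes to chain two transactions together.

```python
from datetime import datetime, timedelta


def follow_on_payments(transactions, window=timedelta(minutes=15)):
    """Pair each incoming transfer with any payment its recipient makes
    within `window` afterwards.

    `transactions` is a list of dicts with the keys 'sender', 'recipient',
    'amount' and 'time' (assumed layout). Returns (incoming, outgoing) pairs.
    """
    ordered = sorted(transactions, key=lambda t: t["time"])
    pairs = []
    for i, incoming in enumerate(ordered):
        for outgoing in ordered[i + 1:]:
            # The list is sorted by time, so once we pass the window we can stop.
            if outgoing["time"] - incoming["time"] > window:
                break
            if outgoing["sender"] == incoming["recipient"]:
                pairs.append((incoming, outgoing))
    return pairs
```

Feed it John's 5,000 to Jamie and Jamie's 4,500 to the till five minutes later, and the pair drops out. Looking up who the till belongs to is then a single join against the merchant register.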

Let's check whether there have been any other payments to tills belonging to bars. Yes! 6 other bars / hotels over the same 6 month period. We can then infer that Jamie probably drinks. Given the profiles of the outlets she drinks at, she is more likely to drink spirits and cocktails.

So, if Safaricom were to decide to license targeted customer profile databases, and KBL requested and paid for one, guess whose details would be in that database?

Or suppose Safaricom decides to do context-sensitive advertising. Once Jamie logs in to her Gmail via her Bamba modem, Safaricom can tie her traffic to her number, and can therefore serve appropriate ads (Smirnoff, etc.).

Relax, I said "IF"!

Going back to John, what other transactions has he made?

John has made at least one transaction every month via Pay Bill to a hair salon. The average amount is 5,000 Ksh which means it is unlikely he is paying for himself. There is probably a lady in his life, who he accompanies to the salon.

It also turns out that he has used Lipa Karo to pay 3 different schools, ergo he is either a father with 3 children or he has 3 dependants he pays school fees for.

John also pays for DSTV via M-Pesa: the Premium package (7,000 KSh), without fail on the 3rd of each month. He also pays Access Kenya (10,400 KSh) for his home internet connection, also on the 3rd of each month. He also pays Kenya Power an average of 4,000 KSh a month for electricity, which says something about where he lives: he likely does not live alone.

His bills say a lot about his financial abilities. In fact, none of his bills is paid earlier than the 3rd.

Looking closer, on the morning of the 3rd of every month John makes a 30,000 Ksh deposit into his M-Pesa from his bank, which he uses to pay his bills. This suggests he has a regular income that clears on the 3rd.

John also makes many payments to Steers, as frequently as 3 times a week, averaging 700 Ksh. The payments are always in the evening. This suggests that John eats a lot of take-away. Thus it is unlikely he is living with children (no one feeds kids burgers 3 days a week). This is supported by his spending at Steers: 700 Ksh is pretty much a meal for one.

There is also a payment of 3,000 at the end of every month to a number that does not appear in any of his call logs. This same number also received the same 3,000 from 4 other numbers, with the same pattern. No calls. Who do you send money to but never call? Either some nefarious criminal enterprise or, much more likely, some sort of housekeeper.
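Cross-referencing the payments against the call logs is about as simple as data mining gets. A rough Python sketch, assuming payments are (sender, recipient) pairs and calls are (caller, callee) pairs keyed by phone number; the function name is made up.

```python
def paid_but_never_called(payments, calls):
    """Return {recipient: sorted list of senders} for every payment where
    the two numbers have never called each other, in either direction.

    `payments` is an iterable of (sender, recipient) pairs and `calls` an
    iterable of (caller, callee) pairs (assumed layouts).
    """
    # A frozenset ignores direction: A calling B counts the same as B calling A.
    contacts = {frozenset(pair) for pair in calls}
    silent = {}
    for sender, recipient in payments:
        if frozenset((sender, recipient)) not in contacts:
            silent.setdefault(recipient, set()).add(sender)
    return {recipient: sorted(senders) for recipient, senders in silent.items()}
```

A recipient who shows up here with five different senders, all paying the same amount at month end, is exactly the pattern described above.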

But let me not belabour the point. A lot of insight can be derived from data mining, and this is not necessarily a bad thing.

What does Safaricom do with the data?

Safaricom probably uses this number crunching to derive things like:

  • New products e.g. tariffs
  • Promotions e.g. free calls from minute x
  • Pricing & price adjustments
  • Optimization of infrastructure
  • Competition containment (what is the highest we can charge for inter-network connectivity while still making money, staying clear of the regulator and blacking the eye of other networks)

What horrifies me is the prospect of government having direct access to that information. That cannot be a good thing!

Cover Image Credit: Fiona Bradley
