One of the many hats I wear is that of a data scientist. In my opinion, the most important job of a data scientist is asking questions, and more specifically, asking the right question or forming the right hypothesis. Framing your research question correctly can be the difference between making a life-changing breakthrough and wasting years on a useless experiment.

Asking the right question may seem trivial. Yet too often, when we get an interesting dataset, we jump straight into exploratory data analysis (EDA) without stopping to think about the best questions to ask of the data.


We live in exciting times. Everywhere we look, we are surrounded by accelerating technological change. Back in the 1980s and 1990s, we considered ourselves lucky when there was one technology wave to hop on. A few of these waves were mainframe technology, relational databases, and the advent of the World Wide Web. We now live in an era where it’s hard to keep track of the hot technology trends. IoT, blockchain, microservices, AI/ML, and the cloud are just a few of the areas that are on fire today.

Companies like Amazon, Microsoft, and Google do not make it…


This article is about artificial intelligence and machine learning, but before we dive into the nitty-gritty, I would like to tell you a story about my amazing wife Karen. When we got married, we exchanged our vows. The words “love”, “cherish”, and “support” were used. (I think I also heard “obey” 😊.) In addition, and she probably regrets this decision, she also promised to make me breakfast every Sunday.

She has kept that promise with only one or two exceptions. She is a wonderful cook and can make anything. Sunday is my favorite day and…


Amazon recently announced that the certification exam for AWS Machine Learning Specialty is out of beta. I took it this morning at one of the AWS testing sites. Below are my thoughts on how to successfully tackle this new certification.

Amazon provides a study guide on what domains to expect in the exam and how many questions there will be for each particular domain. This should be your first stop to start framing the topics that you should study for the exam. By no means should it be the last.

From the guide, we learn that this will be the…


The cloud and AI/ML landscapes keep shifting and growing, and it’s hard to keep up. There is a constant stream of new products, technologies, and ideas. But when it comes to business and investment books, the pace of change is slower, and many books are a rehash of old concepts, so coming across big new ideas or insights can be a rare event. One book that does contain such a powerful nugget is more than 20 years old: “The Gorilla Game: Picking Winners in High Technology” by Geoffrey A. Moore, Paul Johnson, and Tom Kippola.

I…


Random forest is a powerful algorithm used for both classification (what group does an observation belong to?) and regression (what is its value?). It is a supervised algorithm, meaning that it must be trained and tested with datasets where both the features (the input variables used to form the predictive model) and the targets (the actual values) are known. ‘Forest’ indicates a collection of trees. First a definition of trees will be provided, followed by an explanation of why the trees are considered random.
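To make the idea of a collection of randomized trees concrete, here is a minimal sketch in plain Python. It is a toy, not a production implementation: each "tree" is only a one-split stump, the sketch covers classification only, and the data, labels, and function names (`train_stump`, `train_forest`, `predict`) are made up for illustration. The randomness comes from two places: each stump is trained on a bootstrap sample of the rows, and each stump looks at one randomly chosen feature.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature, choosing the threshold with the fewest errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        # Each side predicts its majority class; an empty right side reuses the left label.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw row indexes with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest


def predict(forest, row):
    """Classify a row by majority vote across all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical training data: features are known, and so are the targets.
rows = [(1.0, 1.2), (1.5, 0.8), (2.0, 1.0), (1.2, 1.9),
        (6.0, 6.5), (6.5, 7.0), (7.0, 6.2), (7.5, 7.1)]
labels = ["small"] * 4 + ["large"] * 4

forest = train_forest(rows, labels)
print(predict(forest, (0.5, 0.5)))
print(predict(forest, (9.0, 9.0)))
```

A real random forest grows much deeper trees and samples a subset of features at every split, but the voting step works the same way.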

The Trees in the Forest

Each tree represents a method for answering a question (what class does it belong to or…


One of the most useful and profitable applications of machine learning in finance is fraud detection, an age-old problem. Since the days of the Medici banking family, bad actors have been trying to beat the system and get money for nothing.

One challenge of this business problem is that such actors are motivated to hide their footprints. It’s a cat-and-mouse game where fraudsters continuously come up with innovative ways to beat the system.

For this reason, the state-of-the-art systems in this subject area are trained for anomaly detection rather than past patterns…
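As a rough illustration of what anomaly detection means here, the sketch below flags transactions that sit far from the typical amount instead of matching them against known fraud patterns. It is a simple statistical stand-in, not the kind of system the article refers to. The function name `flag_anomalies` and the transaction amounts are hypothetical; the 0.6745 constant and the 3.5 cutoff are the conventional choices for the modified z-score based on the median absolute deviation.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indexes of amounts whose modified z-score exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        return []  # at least half the values equal the median; this sketch flags nothing
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - center) / mad > threshold]


# Hypothetical card transactions; one is far outside the usual range.
amounts = [42.0, 38.5, 51.2, 47.9, 40.3, 44.1, 39.8, 4999.0, 45.6, 43.2]
print(flag_anomalies(amounts))  # [7]
```

The median and the median absolute deviation are used instead of the mean and standard deviation because a single extreme value would otherwise inflate the very yardstick used to judge it.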


The laundering of money has now become the leading source of compliance fines for North American and European financial institutions. In 2016, regulators and governmental agencies levied fines in excess of $42 billion. In addition, it is estimated that each year, money-laundering transactions account for around $3 trillion or about 5% of global GDP.

Even with these staggering numbers, estimates are that only about 1 percent of illicit global financial flows are ever seized by the authorities.

AML operations in banks consume an inordinate amount of manpower, resources, and capital to manage the process and comply with the regulations…


A catchy phrase making the rounds lately is “data is the new oil”, and there is a lot of truth in it. Up until 10 years ago or so, storage costs made it prohibitive to store more than a few years’ worth of enterprise and personal data. It was not uncommon for most application logs to be purged on a regular schedule. Moore’s Law has been doing what it does regarding storage costs, to the point that very little gathered information is ever purged. …


Machine learning algorithms can be classified in many ways. One of my favorite classifications is Pedro Domingos’s “five tribes”. Today I will be using a more basic and traditional classification. This classification method is as follows:

  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning

Supervised learning

With supervised learning, you have a defined target: a value or class that you want to predict. In other words, the “answer” to your prediction exists in a predefined set of possible answers.

Supervised learning uses a variety of techniques, all of which share the same principles:

The training dataset contains input data (your…
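A minimal sketch of this idea, using a one-nearest-neighbor classifier chosen only because it is short: the prediction for a new input is the known label of the closest training example, so the answer always comes from the predefined set of labels. The study-hours data and the function name `nearest_neighbor_predict` are hypothetical.

```python
import math


def nearest_neighbor_predict(train_inputs, train_labels, query):
    """Return the label of the training input closest to the query (Euclidean distance)."""
    distances = [math.dist(x, query) for x in train_inputs]
    return train_labels[distances.index(min(distances))]


# Hypothetical labeled data: (hours studied, hours slept) -> exam outcome.
train_inputs = [(1.0, 5.0), (2.0, 4.0), (7.0, 8.0), (8.0, 7.0)]
train_labels = ["fail", "fail", "pass", "pass"]

print(nearest_neighbor_predict(train_inputs, train_labels, (7.5, 7.5)))  # pass
print(nearest_neighbor_predict(train_inputs, train_labels, (1.5, 4.5)))  # fail
```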

Alberto Artasanchez

Data Scientist, Artificial Intelligence, Machine Learning, Author of “Artificial Intelligence with Python”
