Riya's Medley #10 - Markov Decision Process
A Markov chain is a sequence of states where your future state depends only on your current state. When you add decision-making to it, you get a Markov Decision Process (MDP).
The key idea in a Markov chain is that your future state depends only on your current state, not on how you got there. For example, your chance of landing your first job with a good package depends on your final-year results, projects, internships and so on. Earlier states, such as your high school achievements, have little direct bearing on this outcome.
When you add decision-making to a Markov chain, you get a Markov Decision Process (MDP), which helps you figure out the best action to take in each state to reach your goal efficiently.
Suppose you own a smartphone brand and you want users to switch to it. Let’s apply a Markov Decision Process to maximise this conversion.
Current State (S):
In the MDP for converting users to your smartphone brand, the state represents the user's current level of awareness of, or engagement with, your brand. The possible states might include:
State 1: "Unaware" (User has never heard of your brand)
State 2: "Aware" (User is aware of your brand but hasn't used it)
State 3: "Interested" (User is considering your brand)
State 4: "Trial" (User has tried your brand)
State 5: "Loyal" (User has become a loyal customer of your brand)
We can get a little mathematical and write this set of states like this:
S = {U, A, I, T, L}
The transition to the next state depends on the actions taken and the probabilities associated with those actions. For example, if a user is currently in the "Aware"(A) state, they may transition to the "Interested"(I) state with a certain probability if specific actions are taken, like advertising or product showcasing.
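To make this concrete, here is a small Python sketch of the state set (the short labels match S = {U, A, I, T, L}). It is purely illustrative scaffolding for the snippets that follow, not part of the model itself.

```python
# The five user states from S = {U, A, I, T, L}, with their meanings.
STATES = {
    "U": "Unaware",     # has never heard of the brand
    "A": "Aware",       # knows the brand but hasn't used it
    "I": "Interested",  # is considering the brand
    "T": "Trial",       # has tried the brand
    "L": "Loyal",       # has become a loyal customer
}
```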
Actions (A):
Define the actions that your marketing and customer engagement efforts can take to influence users' transitions between these states. Actions might include:
Action 1: "Advertising" (Running marketing campaigns)
Action 2: "Discounts" (Offering promotional pricing)
Action 3: "Product Showcase" (Highlighting product features)
Action 4: "Customer Support" (Providing excellent customer service)
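Continuing the sketch, the action set can be written as a simple list. The names are the ones above; how you encode them is entirely up to you.

```python
# Marketing and engagement actions available to influence state transitions.
ACTIONS = ["Advertising", "Discounts", "Product Showcase", "Customer Support"]
```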
Transition Probabilities (P):
Determine the probabilities of users transitioning from one state to another based on the actions you take. This requires analyzing historical data and conducting surveys or experiments to estimate these probabilities. For example:
P(Unaware, Advertising, Aware) = 0.2 (20% chance of moving from unaware to aware after advertising)
P(Interested, Discounts, Trial) = 0.3 (30% chance of moving from interested to trial after offering discounts)
By knowing the transition probabilities, you can allocate your marketing and engagement resources more efficiently. If certain actions have a higher probability of success, you can focus your resources on those actions to maximize user conversion.
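In code, the transition model can be kept as a dictionary keyed by (state, action). Only the two probabilities quoted above come from the example; every other number below is a made-up placeholder.

```python
# Transition probabilities P(s, a, s'): the chance of landing in state s'
# when action a is taken in state s.
P = {
    ("U", "Advertising"): {"U": 0.8, "A": 0.2},       # Unaware -> Aware: 0.2 (from the example)
    ("A", "Product Showcase"): {"A": 0.6, "I": 0.4},  # placeholder
    ("I", "Discounts"): {"I": 0.7, "T": 0.3},         # Interested -> Trial: 0.3 (from the example)
    ("T", "Customer Support"): {"T": 0.5, "L": 0.5},  # placeholder
}

def transition_prob(s, a, s_next):
    """Return P(s' | s, a), treating unlisted transitions as probability 0."""
    return P.get((s, a), {}).get(s_next, 0.0)
```

For instance, transition_prob("U", "Advertising", "A") returns 0.2, matching the first probability above.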
Reward (R):
Define the rewards associated with each state or state-action pair. The rewards are the benefits the company gains as users move through the conversion process. The objective is to make decisions that maximise these rewards.
R(Aware) = 10,000 user visits to your website per day
R(Trial) = 1500 smartphones sold
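As a rough sketch, the rewards can be folded into a single numeric score per state. The Aware and Trial figures echo the example above; the others are values I have assumed purely for illustration.

```python
# Rewards R(s): a numeric benefit attached to each state.
R = {
    "U": 0,
    "A": 10_000,  # e.g. daily website visits attributed to awareness
    "I": 0,
    "T": 1_500,   # e.g. smartphones sold to trial users
    "L": 5_000,   # assumed placeholder value of a loyal customer
}
```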
Policy (π):
Formulate a policy π(s), or more simply a strategy, that specifies which action to take in each state to maximise the expected cumulative reward over time.
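A policy is then just a lookup from state to action. The mapping below is one plausible strategy I have assumed for illustration, not necessarily the optimal one.

```python
# One possible policy pi(s): the action to take in each state.
policy = {
    "U": "Advertising",       # make unaware users aware
    "A": "Product Showcase",  # turn awareness into interest
    "I": "Discounts",         # nudge interested users into a trial
    "T": "Customer Support",  # convert trial users into loyal customers
    "L": "Customer Support",  # keep loyal customers happy
}
```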
Value Function (V):
The value function V(s) represents the expected cumulative reward when following a specific policy π starting from a particular state s. In simpler terms, it tells us how good it is to be in a particular state while following a given policy. It is calculated from the rewards and transition probabilities under that policy.
Our goal is to maximise the reward by choosing the best policy. A standard way to write the underlying update is the Bellman optimality equation:
V(s) = max over actions a of Σ over next states s' of P(s, a, s') × [ R(s') + γ · V(s') ]
where γ is a discount factor between 0 and 1 that controls how heavily future rewards count. The equation may seem daunting, but it simply calculates the expected overall benefit of taking a specific action (like advertising) in a certain situation (like when a user is aware of your brand), and then picks the action expected to yield the maximum cumulative reward when you're in that state.
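Here is a minimal value-iteration sketch that ties the pieces together by repeatedly applying the update above to estimate V(s). It assumes the STATES, ACTIONS, P and R dictionaries from the earlier snippets and an illustrative discount factor of 0.9.

```python
def value_iteration(states, actions, P, R, gamma=0.9, n_iter=100):
    """Estimate V(s) under the best policy by repeated Bellman updates."""
    V = {s: 0.0 for s in states}
    for _ in range(n_iter):
        for s in states:
            q_values = []
            for a in actions:
                outcomes = P.get((s, a))
                if not outcomes:
                    continue  # this action has no modelled effect in state s
                # Expected benefit of taking action a in s, then acting optimally.
                q = sum(p * (R[s_next] + gamma * V[s_next])
                        for s_next, p in outcomes.items())
                q_values.append(q)
            if q_values:  # states with no modelled actions keep V(s) = 0
                V[s] = max(q_values)
    return V

# Using the dictionaries sketched earlier:
# V = value_iteration(STATES, ACTIONS, P, R)
# The action achieving the max in each state gives the best policy pi(s).
```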
In summary, using a Markov Decision Process gives you a structured way to convert users to your smartphone brand by modelling user states, actions, transitions, rewards, and policies. It helps you make informed decisions that maximise the likelihood of users moving from unaware to loyal customers.