Product Data Science via ChatGPT - Identify product weaknesses

Improve Weaknesses




Most of the ChatGPT applications seen so far in this practical section of the course involve building apps, either B2B or B2C. Perhaps surprisingly, ChatGPT can also be used for classical product data science analyses. A typical data science task is coming up with product ideas based on data, and this is something ChatGPT can do as well.

In standard data science, the workflow is typically: build a model -> inspect the model -> identify positive and negative segments based on the model -> come up with product ideas based on those insights. Most likely, this is done via a logistic regression or a Random Forest plus partial dependence plots. Something very similar can be done with ChatGPT. The main difference is that instead of building a model to identify positive and negative segments, we can directly ask ChatGPT what customers like and dislike about us.

One way to look at this is that ChatGPT allows us to do user research at scale through code. By user research, I mean all sorts of qualitative data analysis, such as surveys and customer interviews. Most of this work is still fairly manual today, but it won't be for long. The first companies offering what is essentially user research via AI are already popping up, and it is very likely that in the future product data scientists (or whatever the new title will be) will be in charge of this.

In the example below, we will do a classical product data science analysis via ChatGPT. We work for a pet food company and our product manager asks us how we can improve the product. As standard as it gets. The underlying data for the analysis are online reviews for several pet food companies.



Improve product weaknesses


The three datasets can be downloaded by clicking here and then unzipping the file. our_reviews.csv contains the reviews for our company, others_reviews.csv contains the reviews for the competitors, and embeddings.csv contains the embeddings of our company's reviews (unlike in the rest of the course, we created the embeddings in advance because doing so is pretty time-consuming given the size of the data).



library(tidyverse)
library(openai)
library(lsa)
#kable() used below comes from knitr
library(knitr)

#read the API key from an environment variable (e.g. set OPENAI_API_KEY in .Renviron) rather than hard-coding it
openai_api_key = Sys.getenv("OPENAI_API_KEY")

#read the scraped reviews for our company
our_reviews = read.csv("/home/info/Downloads/data/our_reviews.csv")
#as simple as possible. Just the reviews, star rating, and company
kable(head(our_reviews, 1))
stars comments company
5 I see that they ship very quick. Good product as description Our Company
#we also have reviews for a few other companies
others_reviews = read.csv("/home/info/Downloads/data/others_reviews.csv")
#same structure as our_reviews. 
kable(head(others_reviews, 1))
stars comments company
3 Great price and got here super fast, But bag was split and I had to scoop out all the food and transfer it to another bag Competitor 1
#These are the companies in this dataset
unique(others_reviews$company)
[1] "Competitor 1" "Competitor 2"
#we also already created the embeddings for our reviews via openai get_embedding as usual
embeddings = read.csv("/home/info/Downloads/data/embeddings.csv")
#row number is the same as the number of our reviews (each review maps to one embedding) and col number is the usual 1536
dim(embeddings)
[1] 6580 1536



In product data science, a common way to get product ideas is to identify segments that perform poorly and then think about how to improve them. Something very similar can be done in the AI context, except that to identify the bad segments we can simply ask ChatGPT.



#standard question to identify the largest areas of improvement
question = "Based on these reviews, what are the worst characteristics of this product? Focus on the negative reviews."

#turn that question into embeddings
embeddings_question = create_embedding(model = "text-embedding-ada-002",
                                     input = question,
                                     openai_api_key = openai_api_key)$data$embedding[[1]]

#calculate cosine similarity between that question and our embeddings
cosine_similarity = apply(embeddings, 1, function(x) cosine(embeddings_question, as.numeric(x)))

#pick the most relevant reviews
reviews_selected = bind_cols(reviews = our_reviews$comments, cosine_similarity_col = cosine_similarity) %>%
                   #sort by cosine similarity
                   arrange(desc(cosine_similarity_col)) %>%
                   #do cumulative sum of tokens (approximated as characters/4) on the sorted reviews
                   mutate(sum_token = cumsum(nchar(reviews)/4)) %>%
                   #only keep top X until I get to the token limit I chose 
                   filter (sum_token<2500) %>%
                   #and select them
                   select(reviews)

#pass them to chatgpt and see what it says
context = "Reviews:"
prompt_string = paste(question, context, paste(reviews_selected$reviews, collapse="\n"), sep="\n\n")
chat_example = create_chat_completion(model = "gpt-3.5-turbo",
                                      temperature=0,
                                      openai_api_key = openai_api_key,
                                      #let's tell chatgpt what it has to do here
                                      messages = list(
                                                      list("role" = "user",  "content" = prompt_string)
                                      )
)
                                  
cat(gsub(pattern = "\n", replacement = "  \n", chat_example$choices$message.content))
Based on the negative reviews, the worst characteristics of this product are:  
  
1. Poor quality and trash found in the product.  
2. Moldy and old food that sticks to the can.  
3. Hard and dry texture that can potentially harm dogs' teeth.  
4. Dogs refusing to eat the product.  
5. Inconsistent quality and broken pieces in the packaging.  
6. Tiny crumbs found in the bag.  
7. Overpriced and low-quality ingredients.  
8. Unreliable packaging and delivery.  
9. Stale and unappetizing appearance of the food.  
10. Gravy-like consistency and unpleasant smell.



ChatGPT has returned interesting insights about areas of improvement. However, this is still not actionable: there is nothing here that a product manager can actually use. The next step is to pick any of these answers and dig deeper. For instance, let's pick the fact that some customers are complaining that the products are hard (in a real-world situation, you would want to investigate all of these results further).

#let's dig deeper
question = "Based on these reviews, why are customers complaining that the product is hard? Focus on the negative reviews."

#turn that question into embeddings
embeddings_question = create_embedding(model = "text-embedding-ada-002",
                                     input = question,
                                     openai_api_key = openai_api_key)$data$embedding[[1]]

#calculate cosine similarity between that question and our embeddings
cosine_similarity = apply(embeddings, 1, function(x) cosine(embeddings_question, as.numeric(x)))

#pick the most relevant reviews
reviews_selected = bind_cols(reviews = our_reviews$comments, cosine_similarity_col = cosine_similarity) %>%
                   #sort by cosine similarity
                   arrange(desc(cosine_similarity_col)) %>%
                   #do cumulative sum of tokens (approximated as characters/4) on the sorted reviews
                   mutate(sum_token = cumsum(nchar(reviews)/4)) %>%
                   #only keep top X until I get to the token limit I chose 
                   filter (sum_token<2500) %>%
                   #and select them
                   select(reviews)

#pass them to chatgpt and see what it says
context = "Reviews:"
prompt_string = paste(question, context, paste(reviews_selected$reviews, collapse="\n"), sep="\n\n")
chat_example = create_chat_completion(model = "gpt-3.5-turbo",
                                      temperature=0,
                                      openai_api_key = openai_api_key,
                                      #let's tell chatgpt what it has to do here
                                      messages = list(
                                                      list("role" = "user",  "content" = prompt_string)
                                      )
)
                                  
cat(gsub(pattern = "\n", replacement = "  \n", chat_example$choices$message.content))
Customers are complaining that the product is hard because it does not match the description of being soft. They mention that the treats are hard as a rock and difficult to break or chew. Some customers feel that the product is falsely advertised as soft when it is actually hard. They express disappointment and frustration with the hardness of the product, stating that it is not suitable for their pets and not worth the money.



And let's try to find out which product specifically has this issue:

#let's see if we can find out the culprits
question = "Based on these reviews, which product category is not soft as advertised?"

#turn that question into embeddings
embeddings_question = create_embedding(model = "text-embedding-ada-002",
                                     input = question,
                                     openai_api_key = openai_api_key)$data$embedding[[1]]

#calculate cosine similarity between that question and our embeddings
cosine_similarity = apply(embeddings, 1, function(x) cosine(embeddings_question, as.numeric(x)))

#pick the most relevant reviews
reviews_selected = bind_cols(reviews = our_reviews$comments, cosine_similarity_col = cosine_similarity) %>%
                   #sort by cosine similarity
                   arrange(desc(cosine_similarity_col)) %>%
                   #do cumulative sum of tokens (approximated as characters/4) on the sorted reviews
                   mutate(sum_token = cumsum(nchar(reviews)/4)) %>%
                   #only keep top X until I get to the token limit I chose 
                   filter (sum_token<2500) %>%
                   #and select them
                   select(reviews)

#pass them to chatgpt and see what it says
context = "Reviews:"
prompt_string = paste(question, context, paste(reviews_selected$reviews, collapse="\n"), sep="\n\n")
chat_example = create_chat_completion(model = "gpt-3.5-turbo",
                                      temperature=0,
                                      openai_api_key = openai_api_key,
                                      #let's tell chatgpt what it has to do here
                                      messages = list(
                                                      list("role" = "user",  "content" = prompt_string)
                                      )
)
                                  
cat(gsub(pattern = "\n", replacement = "  \n", chat_example$choices$message.content))
Based on these reviews, the product category that is not soft as advertised is treats.



Overall, we got a pretty good understanding of the issue. One thing left is to understand whether this is a common problem in the industry or specific to us. If all other companies had similar complaints, this would still be actionable but much harder to solve, so probably a lower priority. But if this is an issue just for us, then it can and should be fixed fast.

To get more quantitative results, it can be really useful to identify a keyword related to the issue and then get some descriptive statistics about it from the reviews. For instance, let's pick the word "soft" and compare it across the different companies. A useful thing to check is the average star rating for reviews that include the keyword vs. those that don't. That is, for each company, do reviews including "soft" have a higher star rating than those that don't? If the answer is yes, then we can treat softness as a positive characteristic. Otherwise, it is a negative one.

#append all reviews in just one dataset
data = rbind(our_reviews, others_reviews)

#to make it more refined, we could potentially take the stem, add synonyms, etc. 
#The concept is the same though. Here we simply match the keyword
#returns TRUE if the keyword is included, FALSE otherwise
data$word_included = grepl("soft", data$comments, ignore.case = TRUE)

#plot avg star rating for reviews including soft vs not including it, by company
ggplot(data,aes(x=company,y=stars,fill=word_included)) + 
      stat_summary(fun="mean", geom="bar", position="dodge") +
      ggtitle("Avg star rating for reviews including the word 'SOFT' vs not including it")



This is super interesting. As we can see, for every company except us, the word "soft" is associated with a positive trait: when reviews mention it, they have, on average, a better star rating. For us it is the opposite. Softness is something our customers aren't happy about, while in the rest of the industry customers are happy about it.

The last thing we should check is the relative frequency of this term, to get a sense of how widespread this issue is.

#word frequency
data_plot = data %>%
            group_by(company) %>%
            summarize(word_frequency = mean(word_included)*100)

#plot frequency of the word soft, by company
ggplot(data_plot,aes(x=company,y=word_frequency, group = 1)) + 
geom_line()+geom_point()+
ggtitle("Percentage of reviews including the keyword SOFT") 



The word frequency appears pretty high, especially compared to the other companies, implying that this is an important issue for customers. To sum up:

  • ChatGPT identified something that current customers are not happy about (lack of softness)

  • ChatGPT also told us that this problem is specific to a certain product (treats)

  • Old-school descriptive statistics told us that this problem is specific to us and that our competitors don't have it

Our suggestion for the product manager is, ideally, to make the treats softer, as this seems very important to our customers. In the short term, it could be good to change the advertising strategy and let customers know that the product is on the harder end of the spectrum, in order to attract different customers who might be happier with it.







import os
#pip3 install 'openai==0.27.0'   
import openai
from openai.embeddings_utils import get_embedding
from openai.embeddings_utils import cosine_similarity
import pandas
import sys
import numpy as np
import tiktoken
encoding = tiktoken.get_encoding("cl100k_base")
pandas.set_option('display.max_columns', 10)
pandas.set_option('display.width', 350)
#this can be important to print the entire text in the prompt
np.set_printoptions(threshold=sys.maxsize)

#read the API key from an environment variable rather than hard-coding it
openai.api_key = os.environ["OPENAI_API_KEY"]

#read the scraped reviews for our company
our_reviews = pandas.read_csv("/home/info/Downloads/data/our_reviews.csv")
#as simple as possible. Just the reviews, star rating, and company
print(our_reviews.head(1))
stars comments company
5 I see that they ship very quick. Good product as description Our Company
#we also have reviews for a few other companies
others_reviews = pandas.read_csv("/home/info/Downloads/data/others_reviews.csv")
#as simple as possible. Just the reviews, star rating, and company
print(others_reviews.head(1))
stars comments company
3 Great price and got here super fast, But bag was split and I had to scoop out all the food and transfer it to another bag Competitor 1
#These are the companies in this dataset
print(set(others_reviews.company))
{'Competitor 1', 'Competitor 2'}
#finally we also already created the embeddings for our company reviews via openai get_embedding as usual
embeddings = pandas.read_csv("/home/info/Downloads/data/embeddings.csv")

#row number is the same as the number of our_reviews (each review maps to one embedding) and col number is the usual 1536
print(embeddings.shape)
(6580, 1536)



In product data science, a common way to get product ideas is to identify segments that perform poorly and then think about how to improve them. Something very similar can be done in the AI context, except that to identify the bad segments we can simply ask ChatGPT.
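To make the retrieval idea concrete without calling the API, here is a minimal sketch of ranking reviews against a question by cosine similarity. The three-dimensional vectors and the review texts are made up for illustration (the real ada-002 embeddings have 1536 dimensions), and `cosine_similarity_matrix` is a hypothetical helper, not part of the openai package.

```python
import numpy as np

def cosine_similarity_matrix(matrix, vector):
    """Cosine similarity between every row of `matrix` and `vector`."""
    matrix = np.asarray(matrix, dtype=float)
    vector = np.asarray(vector, dtype=float)
    row_norms = np.linalg.norm(matrix, axis=1)
    return matrix @ vector / (row_norms * np.linalg.norm(vector))

# Toy data: made-up 3-d "embeddings" standing in for the real 1536-d ones
toy_reviews = ["treats are hard as a rock", "fast shipping", "my dog loves it"]
toy_embeddings = np.array([[0.9, 0.1, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.1, 0.2, 0.9]])
toy_question = np.array([1.0, 0.0, 0.0])

scores = cosine_similarity_matrix(toy_embeddings, toy_question)
# Indices of the reviews, most similar first
ranking = np.argsort(-scores)
for i in ranking:
    print(f"{scores[i]:.3f}  {toy_reviews[i]}")
```

Computing all similarities with one matrix product is usually faster than applying a function row by row, which can matter with several thousand reviews.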



#standard question to identify the largest areas of improvement
question = "Based on these reviews, what are the worst characteristics of this product? Focus on the negative reviews."

#create embeddings for the question
embeddings_question = get_embedding(question, engine="text-embedding-ada-002")

#calculate cosine similarity
cos_similarity = embeddings.T.apply(lambda x: cosine_similarity(x, embeddings_question))

#pick most relevant reviews
reviews_selected = pandas.concat([our_reviews['comments'], cos_similarity], axis=1)
#rename columns
reviews_selected.columns=['comments','cos_similarity']
#sort by cosine similarity
reviews_selected = reviews_selected.sort_values('cos_similarity', ascending=False)
#do cumulative sum of tokens
count_tokens = reviews_selected['comments'].apply(lambda x: len(encoding.encode(str(x))))
reviews_selected['cum_sum_tokens'] = np.cumsum(count_tokens)
#only keep top X until I get to the token limit I chose, 2500 here
reviews_selected = reviews_selected.query('cum_sum_tokens<2500')

#pass them to chatgpt and see what it says
context = "Reviews:"
reviews_prompt = '\n'.join(reviews_selected['comments'])
prompt_string = f'{question}\n\n{context}\n\n{reviews_prompt}'

chat_example = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    temperature = 0,
                    messages=[{"role": "user", "content": prompt_string}
                              ]
                 )

print(chat_example["choices"][0]["message"]["content"])
Based on the negative reviews, the worst characteristics of this product are:

- Poor quality and trash in the product
- Moldy and old food
- Hard and dry texture
- Dogs and cats not liking the taste
- Dented cans and damaged packaging
- Expensive price
- Inconsistent quality
- Maggots and mold in some cans
- Questionable ingredients
- Regurgitation and digestive issues in pets



ChatGPT has returned interesting insights about areas of improvement. However, this is still not actionable: there is nothing here that a product manager can actually use. The next step is to pick any of these answers and dig deeper. For instance, let's pick the fact that some customers are complaining that the products are hard (in a real-world situation, you would want to investigate all of these results further).
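The same select-then-ask steps are repeated for each follow-up question in this lesson, so it may help to see the selection logic in isolation. The sketch below keeps the most similar reviews until a token budget is reached and then builds the prompt. `select_within_budget` and `build_prompt` are hypothetical names, the inputs are made up, and tokens are approximated as characters divided by 4 so that the example runs without tiktoken.

```python
import pandas as pd

def select_within_budget(comments, similarities, token_budget):
    """Return the most similar comments whose cumulative token estimate stays under the budget."""
    df = pd.DataFrame({"comments": comments, "similarity": similarities})
    df = df.sort_values("similarity", ascending=False)
    # Rough estimate: about 4 characters per token (an approximation, not a real tokenizer)
    df["cum_tokens"] = (df["comments"].astype(str).str.len() / 4).cumsum()
    return df.loc[df["cum_tokens"] < token_budget, "comments"].tolist()

def build_prompt(question, selected_comments):
    """Assemble the question and the selected reviews into a single prompt string."""
    reviews_block = "\n".join(selected_comments)
    return f"{question}\n\nReviews:\n\n{reviews_block}"

# Toy inputs, made up for illustration
comments = ["hard as a rock, not soft at all", "arrived quickly", "too hard for my old dog"]
similarities = [0.91, 0.40, 0.88]

selected = select_within_budget(comments, similarities, token_budget=15)
prompt = build_prompt("Why are customers complaining that the product is hard?", selected)
print(prompt)
```

With helpers like these, each follow-up question only needs a new question string and its embedding; the rest of the pipeline stays the same.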

#let's dig deeper
question = "Based on these reviews, why are customers complaining that the product is hard? Focus on the negative reviews."

#create embeddings for the question
embeddings_question = get_embedding(question, engine="text-embedding-ada-002")

#calculate cosine similarity
cos_similarity = embeddings.T.apply(lambda x: cosine_similarity(x, embeddings_question))

#pick most relevant reviews
reviews_selected = pandas.concat([our_reviews['comments'], cos_similarity], axis=1)
#rename columns
reviews_selected.columns=['comments','cos_similarity']
#sort by cosine similarity
reviews_selected = reviews_selected.sort_values('cos_similarity', ascending=False)
#do cumulative sum of tokens
count_tokens = reviews_selected['comments'].apply(lambda x: len(encoding.encode(str(x))))
reviews_selected['cum_sum_tokens'] = np.cumsum(count_tokens)
#only keep top X until I get to the token limit I chose, 2500 here
reviews_selected = reviews_selected.query('cum_sum_tokens<2500')

#pass them to chatgpt and see what it says
context = "Reviews:"
reviews_prompt = '\n'.join(reviews_selected['comments'])
prompt_string = f'{question}\n\n{context}\n\n{reviews_prompt}'

chat_example = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    temperature = 0,
                    messages=[{"role": "user", "content": prompt_string}
                              ]
                 )

print(chat_example["choices"][0]["message"]["content"])
Customers are complaining that the product is hard and not as advertised. They mention that the treats are hard as a rock and difficult to break into smaller pieces for their dogs. Some customers also mention that the food is hard and dry, making it difficult for their dogs to eat. There are also complaints about receiving damaged cans and packages, as well as receiving fewer items than advertised. Some customers express disappointment with the quality of the product, mentioning that it is poor quality or contains trash. Additionally, there are complaints about the high price of the product and the lack of customer service or communication from the company.



And let's try to find out which product specifically has this issue:

#let's see if we can find out the culprits
question = "Based on these reviews, which product category is not soft as advertised?"

#create embeddings for the question
embeddings_question = get_embedding(question, engine="text-embedding-ada-002")

#calculate cosine similarity
cos_similarity = embeddings.T.apply(lambda x: cosine_similarity(x, embeddings_question))

#pick most relevant reviews
reviews_selected = pandas.concat([our_reviews['comments'], cos_similarity], axis=1)
#rename columns
reviews_selected.columns=['comments','cos_similarity']
#sort by cosine similarity
reviews_selected = reviews_selected.sort_values('cos_similarity', ascending=False)
#do cumulative sum of tokens
count_tokens = reviews_selected['comments'].apply(lambda x: len(encoding.encode(str(x))))
reviews_selected['cum_sum_tokens'] = np.cumsum(count_tokens)
#only keep top X until I get to the token limit I chose, 2500 here
reviews_selected = reviews_selected.query('cum_sum_tokens<2500')

#pass them to chatgpt and see what it says
context = "Reviews:"
reviews_prompt = '\n'.join(reviews_selected['comments'])
prompt_string = f'{question}\n\n{context}\n\n{reviews_prompt}'

chat_example = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    temperature = 0,
                    messages=[{"role": "user", "content": prompt_string}
                              ]
                 )

print(chat_example["choices"][0]["message"]["content"])
Based on these reviews, the product category that is not soft as advertised is dog treats.



Overall, we got a pretty good understanding of the issue. One thing left is to understand whether this is a common problem in the industry or specific to us. If all other companies had similar complaints, this would still be actionable but much harder to solve, so probably a lower priority. But if this is an issue just for us, then it can and should be fixed fast.

To get more quantitative results, it can be really useful to identify a keyword related to the issue and then get some descriptive statistics about it from the reviews. For instance, let's pick the word "soft" and compare it across the different companies. A useful thing to check is the average star rating for reviews that include the keyword vs. those that don't. That is, for each company, do reviews including "soft" have a higher star rating than those that don't? If the answer is yes, then we can treat softness as a positive characteristic. Otherwise, it is a negative one.
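The comparison described above can also be read as a small table instead of a plot. The following is a sketch on made-up toy reviews (not the course data); the column names `mentions_soft`, `with_soft`, and `without_soft` are illustrative choices.

```python
import pandas as pd

# Toy reviews, made up for illustration
toy = pd.DataFrame({
    "company": ["Our Company", "Our Company", "Our Company",
                "Competitor 1", "Competitor 1", "Competitor 1"],
    "stars": [1, 2, 5, 5, 4, 3],
    "comments": ["Not soft as advertised", "soft? more like a rock", "Great food",
                 "Soft and chewy", "so soft, dog loves it", "Bag arrived torn"],
})

# Flag reviews that mention the keyword (case-insensitive, missing comments count as False)
flag = toy["comments"].str.contains("soft", case=False, na=False)
toy["mentions_soft"] = flag.map({True: "with_soft", False: "without_soft"})

# Average stars per company, split by whether the keyword appears
avg_stars = toy.groupby(["company", "mentions_soft"])["stars"].mean().unstack()
# Positive values mean reviews mentioning the keyword are rated higher
avg_stars["difference"] = avg_stars["with_soft"] - avg_stars["without_soft"]
print(avg_stars)
```

A negative `difference` for a company suggests the keyword shows up mostly in complaints there, while a positive one suggests it shows up in praise.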

import matplotlib.pyplot as plt
import seaborn as sns

#append all reviews in just one dataset
data = pandas.concat([our_reviews, others_reviews], ignore_index = True)

#to make it more refined, we could potentially take the stem, add synonyms, etc. 
#The concept is the same though. Here we simply match the keyword
#returns True if the keyword is included, False otherwise (missing comments count as False)
data['word_included'] = data.comments.str.contains('soft', case=False, na=False)

sns.barplot(x='company', y='stars', hue='word_included', data=data, ci=None).set_title(
            "Avg star rating for reviews including the word 'SOFT' vs not including it")
plt.ylim([0, 5.99])
(0, 5.99)
plt.show()



This is super interesting. As we can see, for every company except us, the word "soft" is associated with a positive trait: when reviews mention it, they have, on average, a better star rating. For us it is the opposite. Softness is something our customers aren't happy about, while in the rest of the industry customers are happy about it.

The last thing we should check is the relative frequency of this term, to get a sense of how widespread this issue is.
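As a numeric counterpart to the frequency plot, the sketch below computes the share of reviews mentioning the keyword per company, together with the number of reviews behind each percentage. The toy data are made up, and `n_reviews` and `pct_with_keyword` are illustrative column names.

```python
import pandas as pd

# Toy reviews, made up for illustration (one comment is missing on purpose)
toy = pd.DataFrame({
    "company": ["Our Company"] * 4 + ["Competitor 1"] * 4,
    "comments": ["not soft", "Soft? no", "great", "too hard, not soft",
                 "soft and tasty", "fast delivery", "good price", None],
})

# Case-insensitive keyword match; a missing comment counts as not mentioning it
toy["word_included"] = toy["comments"].str.contains("soft", case=False, na=False)

# Number of reviews and percentage mentioning the keyword, by company
summary = toy.groupby("company")["word_included"].agg(
    n_reviews="count",
    pct_with_keyword=lambda s: 100 * s.mean(),
)
print(summary)
```

Reporting the review count next to the percentage makes it easier to judge whether a high frequency rests on enough reviews to be meaningful.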

#word frequency
#make it a percentage for the plot
data['word_included'] = data['word_included']*100
sns.barplot(x='company', y='word_included', data=data, ci=None).set_title(
            "Percentage of reviews including the keyword SOFT")
plt.show()



The word frequency appears pretty high, especially compared to the other companies, implying that this is an important issue for customers. To sum up:

  • ChatGPT identified something that current customers are not happy about (lack of softness)

  • ChatGPT also told us that this problem is specific to a certain product (dog treats)

  • Old-school descriptive statistics told us that this problem is specific to us and that our competitors don't have it

Our suggestion for the product manager is, ideally, to make the treats softer, as this seems very important to our customers. In the short term, it could be good to change the advertising strategy and let customers know that the product is on the harder end of the spectrum, in order to attract different customers who might be happier with it.




