Create an easy #AI enabled Lightning component in #Salesforce using #Gensim

<< If you would like to see the component live in my na30 org, please post your name and email ID in a comment on this blog and I will send you the org details >>

Salesforce is a major player in the CRM industry. Its key offerings include Marketing Cloud, Sales Cloud, Service Cloud, Community Cloud and others. Across all of these clouds sits a common platform called Files. There is huge potential to improve the way we use files through artificial intelligence.



Chatter is another prevalent part of Salesforce. It looks like a corporate version of Facebook: here you share your views, ask questions, update your groups, follow your peers and so on. It is another platform that is exposed to a wide variety of data, from which we can infer a lot of hidden meaning.


So what am I getting at? I mentioned artificial intelligence, so we are definitely going to talk about that. In this blog I want to show how easily we can create a Lightning component that does some interesting things with Files and Chatter feeds.

Problem Statement

First and foremost, I wrote down the issues that were affecting me on a daily basis. From these I picked the 2 scenarios which I felt would have the most beneficial impact on the customer if solved.

1. The first was file summarization. We upload multiple files on a daily basis, but getting users interested enough to read them is an issue. If I was @mentioned in a feed with a big PDF attached, I would normally skip reading it unless I felt it was really important. Even when it was important, most of the time I only felt like reading the first few paragraphs of the document. I always felt the need for a file summary that would highlight only the high-priority sentences. Solving this problem would not only save the user time and effort but would also have a great impact on recommending files.

2. The second, similar problem was in Chatter feeds. In a question-and-answer feed, the comments often grow beyond 5 or 10 posts. In such threads it is time-consuming for a reader to go through each and every post and figure out the answer. The solution I thought of was a summarized view of the feed which highlights the main posts and can also suggest which post the user should refer to.

Both of the above problems are flavors of a single text summarization feature. I felt that a solution to these 2 problems would be enough for my needs.

After finalizing the problem statement, my next step was to figure out which tool or utility to use for the solution.


Why Gensim?

Nowadays, thanks to the open source world, there are numerous tools and utilities which can help you do machine learning without detailed prior knowledge of the subject. Gensim is one of these tools.

Gensim has been used by many corporates because of its ease of use and free license (please note that not all features in Gensim may be freely licensed, so go through the documentation before you opt for it).

Gensim is written in Python. Because of its modular design it is easy to install and its APIs are easy to leverage. During this learning phase I also did a little research on Metamind and the ML libraries for Apache Spark.

Metamind would have been the preferred tool for this purpose because
a. It is a Salesforce product
b. Its APIs bind easily with Salesforce.
But there was a blocker when it came to text analysis. Metamind has only just released a sentiment analysis feature in beta. Apart from this it does not have any other text-related model.

MLlib in Apache Spark comes with a set of libraries which can be trained and used for different machine learning use cases. There was no real con here, except that you need a dataset on which to train your model, which was cumbersome. Moreover, the effort multiplies when we consider multi-language scenarios.

Gensim proved to be the best of the 3 tools for the following reasons
1. It is open source and free for commercial use
2. It can easily be deployed on Heroku. Heroku is owned by Salesforce, and customers normally don't have issues sharing their data with Heroku services
3. It has support for fastText. fastText, as you might know, is a library from Facebook's AI research lab which comes with pre-trained models for 90+ languages. So all the goodness of that comes with Gensim too.
4. You get a generic model which can be used right away instead of making the model learn from scratch with your data. For a hackday this felt like a boon.

Gensim leverages the TextRank algorithm, which builds a graph of the sentences in the document and ranks each sentence by how strongly it is connected to the others. It then returns the top-ranked sentences based on a weighing factor which will be discussed later. You can read more about it here.
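To make the ranking idea concrete, here is a minimal sketch of a TextRank-style sentence ranker in plain Python. It is not Gensim's code: Gensim does its own preprocessing and uses its own similarity function. The function names, the naive sentence splitter and the default damping value are my assumptions for illustration; the similarity measure is the word-overlap one described in the TextRank paper.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def similarity(words_a, words_b):
    """Shared words, normalised by the log of each sentence's vocabulary size."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids log(1) == 0 in the denominator
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def rank_sentences(text, damping=0.85, iterations=50):
    """Return (score, sentence) pairs, highest score first."""
    sentences = split_sentences(text)
    bags = [set(re.findall(r'\w+', s.lower())) for s in sentences]
    n = len(sentences)
    # Edge weight between every pair of distinct sentences.
    weights = [[similarity(bags[i], bags[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # PageRank-style iteration: a sentence inherits score from similar sentences.
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) + damping * incoming)
        scores = new_scores
    return sorted(zip(scores, sentences), reverse=True)
```

Calling rank_sentences on a document returns every sentence with its score, so a summary is simply the first few entries of that list. A sentence that shares no words with the rest of the text ends up at the bottom.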

After finalizing the tool, the next item was to design the system.

Architectural Design

Gensim can easily be wrapped in a microservice. The intent was to have a microservice which talks to a custom Lightning component. The custom Lightning component can easily be added to the user's page and helps the user retrieve the summary details.

To host Gensim as a microservice I explored all the options and figured out that we can easily host Gensim on Heroku.

The following diagram gives more details



Details

1. The Lightning component talks to the Heroku service.
2. The Heroku service talks to the Salesforce REST APIs.
3. Through the REST API we fetch Salesforce file content or Chatter feed data, based on the request.
4. The data is sent back to the Heroku service.
5. The data is sent to Gensim, which summarizes the content and sends back the result.
6. The summary is saved in a custom object.
7. The Heroku service fetches the summary data from the custom object.
8. The Heroku service returns the results to the Lightning component.
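The flow above can be sketched as one small function. This is only an illustration under my own assumptions: a plain dict stands in for the Salesforce custom object, and fetch_text and summarize_text are hypothetical callables standing in for the REST call and for Gensim. Unlike the service code in this post, the sketch skips summarization altogether when a summary is already stored.

```python
def get_or_create_summary(record_id, store, fetch_text, summarize_text):
    """Return the stored summary, creating and storing it on first request."""
    if record_id not in store:
        # Steps 2-4: pull the file or feed text from Salesforce.
        text = fetch_text(record_id)
        # Steps 5-6: summarize it and save the result.
        store[record_id] = summarize_text(text)
    # Steps 7-8: read the saved summary back and return it to the caller.
    return store[record_id]
```

Passing the fetch and summarize steps in as arguments keeps the function easy to try out without a Salesforce org or a Gensim install.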

Coding Time

Let's start with the microservice first.

Note that we will be deploying the Python application on Heroku using gunicorn and Flask. There are numerous examples on the internet which show you how. Here is a link to one such approach that you can refer to.


As per the documentation we need a requirements.txt file which lists all the prerequisites. For our case we need the following packages to be installed as part of setup. Note that we pin gensim==0.12.4 and simple-salesforce==0.72.2. Gensim will do the summarization while simple-salesforce will help us connect to Salesforce.



asn1crypto==0.22.0
certifi==2017.4.17
cffi==1.10.0
chardet==3.0.4
click==6.7
cryptography==1.9
enum34==1.1.6
Flask==0.12.2
gunicorn==19.7.1
idna==2.5
ipaddress==1.0.18
itsdangerous==0.24
Jinja2==2.9.6
MarkupSafe==1.0
pycparser==2.17
pyOpenSSL==17.0.0
requests==2.18.1
six==1.10.0
urllib3==1.21.1
Werkzeug==0.12.2
gensim==0.12.4
Cython==0.24
simple-salesforce==0.72.2


Here is the Python code for the same

import sys
import requests
from flask import Flask, request
from simple_salesforce import Salesforce
from gensim.summarization import summarize, keywords

We need to import Flask, which will help us create the GET and POST API endpoints. Note that along with summarize from gensim I am also importing keywords. The keywords API extracts the main topics from within the content.


app = Flask(__name__)
reload(sys)
sys.setdefaultencoding('utf-8')

Flask needs to be initialized, which gives us the app object. We have to set the default encoding to utf-8, else it will throw an exception during GET and POST transactions.


@app.route('/getfilesummary/', methods=['GET'])
def getFileSummary():
    docId = request.args.get('docId')
    print docId
    summaryObj = Summary()
    content = summaryObj.getcontentfromtextfile(docId)
    summary = summarize(content, word_count=50)
    topics = keywords(content, ratio=0.01)
    if summaryObj.verifyDocEntryExist(docId) is False:
        # Insert the summary with document in custom object
        summaryObj.insertSummaryInFileSummaryObject(docId, summary, topics)
    return summary

Believe me when I say this is all you need for the summarization logic. Now you can see how easy it is to integrate gensim with Salesforce. Let me explain what this module does. First and foremost, the endpoint is named getfilesummary and it is invoked with a GET request. Ex: http://herokuserviceurl/getfilesummary/?docId=<docId>

The endpoint expects a docId (which starts with the prefix 069). We create an object of the Summary class, which is defined later and wraps the Salesforce calls. It has a method called getcontentfromtextfile which gets the text content of that file. With the content in hand we just need to invoke the summarize method imported from gensim. When calling summarize we can pass a word_count which says roughly how many words the resulting summary should have. In my case I wanted to keep it short, so I set it to 50 words.

Once we have the summary, we get the topics for the same file by invoking the keywords method. Note that a different weighing factor is given to this method, namely ratio. Ratio lets you define the fraction you want to skim off the top as the output. You can play around with these 2 weighing factors. After getting the summary we need to insert it in our custom object.
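The difference between the two weighing factors is easier to see on a list that is already ranked. The sketch below is my own illustration of the idea, not Gensim's selection logic: select_by_ratio and select_by_word_count are hypothetical helpers, and details such as always keeping at least one item are assumptions.

```python
def select_by_ratio(ranked, ratio):
    """Keep the top `ratio` fraction of the ranked items (at least one)."""
    keep = max(1, int(round(len(ranked) * ratio)))
    return ranked[:keep]


def select_by_word_count(ranked, word_count):
    """Keep top-ranked sentences while the running word total fits the budget."""
    chosen = []
    total = 0
    for sentence in ranked:
        words = len(sentence.split())
        if total + words > word_count:
            break
        chosen.append(sentence)
        total += words
    return chosen
```

With a ratio the output grows with the size of the document, while a word count caps the output no matter how long the document is. That is why a fixed word count suits a short summary shown inside a component.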

The intent is to skip the insertion if a summary is already present in the custom object, and otherwise insert the data. Note: there should also be a path to update an existing summary. As it was a hackday project I didn't feel the need to make it foolproof for all scenarios. Finally we return the summary.

@app.route('/getfeedsummary/', methods=['GET'])
def getFeedSummary():
    feedId = request.args.get('feedId')
    summaryObj = Summary()
    summary = summaryObj.getdatafromfeed(feedId)
    if summaryObj.isFeedEntryExist(feedId) is False:
        # Insert the summary with feed in custom object
        summaryObj.insertInFeedSummaryObject(feedId, summary)
    print summary
    return summary

Similarly we can have the same flow for the feed summary, where you get the feed data from Salesforce, send the text to the gensim summarizer and get the summary. Finally the details are inserted in the Salesforce custom object.

@app.route('/getsummary/', methods=['GET', 'POST'])
def getSummary():
    text = request.values.get('text')
    summary = summarize(text, word_count=50)
    print "SUMMARY: " + summary
    return summary

I made a generic method just for fun: if you send text as an argument, it responds with the summary of that text.

Now comes the Summary class definition

class Summary:
    def getcontentfromtextfile(self, docId):
        sf = self.__getSalesforceObject()
        sessionId = sf.session_id

        versionobject = sf.query_all("SELECT VersionData from ContentVersion WHERE ContentDocumentId='%s'" % (docId))
        versionDataUrl = ''
        for record in versionobject['records']:
            versionDataUrl = record['VersionData']
        url = 'https://na30.salesforce.com' + versionDataUrl
        response = requests.get(url,
                                headers={'Content-Type': 'application/json', 'Authorization': 'Bearer %s' % sessionId})

        content = response.text
        return content

The first method of the Summary class is getcontentfromtextfile. Its intent is to get the text content of a file. We execute an sObject query using simple_salesforce.

Firstly we need to get the Salesforce object, which is nothing but an initialized simple_salesforce object. The docId is passed to the sObject query as an argument. Note that to get the text content we need to construct a URL from the VersionData of the ContentVersion pertaining to the content document Id.

Once the URL is constructed, requesting it gives you the text content.
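The URL construction can be pulled out into a small helper so that the instance host is not hard-coded. This is a sketch under my own assumptions: build_version_data_request is a hypothetical name, and the caller is expected to supply the instance URL of its own org together with the session id.

```python
def build_version_data_request(instance_url, version_data_path, session_id):
    """Return (url, headers) for downloading a file body from Salesforce.

    version_data_path is the value of the VersionData field returned by the
    ContentVersion query, which is a path relative to the instance.
    """
    url = instance_url.rstrip('/') + '/' + version_data_path.lstrip('/')
    headers = {'Authorization': 'Bearer %s' % session_id}
    return url, headers
```

The result can then be passed straight to requests, for example requests.get(url, headers=headers), exactly as the method above does.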

    def insertSummaryInFileSummaryObject(self, docId, summary, keywords):
        sf = self.__getSalesforceObject()
        sf.FileSummary__c.create({'ContentDocumentId__c': docId, 'FileSummaryText__c': summary, 'Topics__c': keywords})

    def insertInFeedSummaryObject(self, feedId, summary):
        sf = self.__getSalesforceObject()
        sf.FeedSummary__c.create({'FeedId__c': feedId, 'FeedSummaryText__c': summary})

    def deleteSummaryObject(self, docId):
        sf = self.__getSalesforceObject()
        # Note: delete() expects the Id of the FileSummary__c record itself
        sf.FileSummary__c.delete(docId)

    def verifyDocEntryExist(self, docId):
        totalRecords = ''
        sf = self.__getSalesforceObject()
        try:
            fileSummaries = sf.query_all("SELECT FileSummaryText__c from FileSummary__c WHERE ContentDocumentId__c='%s'" % (docId))
            totalRecords = fileSummaries['totalSize']
        except ValueError as err:
            # Format the exception instance; concatenating the ValueError class to a string fails
            print 'I see an exception: %s' % err
        if totalRecords == 0:
            return False
        else:
            return True

    def updateDocEntryExist(self, docId, summary):
        sf = self.__getSalesforceObject()
        result = sf.query_all("SELECT Id from FileSummary__c WHERE ContentDocumentId__c='%s' LIMIT 1" % (docId))
        # query_all returns a dict, so update the matching record through its Id
        for record in result['records']:
            sf.FileSummary__c.update(record['Id'], {'FileSummaryText__c': summary})

All the above methods are CRUD operations on the custom objects. They are pretty simple to understand and can be interpreted from the sObject queries.
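Since the queries above place the docId from the request directly into the query string, it is worth checking that the value really looks like a Salesforce Id first. Below is a minimal sketch of such a check. The helper names are hypothetical, and the rule it applies (15 or 18 alphanumeric characters, with an optional expected prefix such as 069) is my assumption of what counts as a valid Id.

```python
import re

# 15 alphanumeric characters, optionally followed by a 3-character suffix.
_SF_ID = re.compile(r'^[a-zA-Z0-9]{15}([a-zA-Z0-9]{3})?\Z')


def checked_id(value, prefix=None):
    """Return value unchanged if it looks like a Salesforce Id, else raise ValueError."""
    if not value or not _SF_ID.match(value):
        raise ValueError('not a Salesforce Id: %r' % (value,))
    if prefix and not value.startswith(prefix):
        raise ValueError('expected an Id starting with %s' % prefix)
    return value


def file_summary_query(doc_id):
    """Build the FileSummary__c lookup query only from a validated document Id."""
    return ("SELECT FileSummaryText__c FROM FileSummary__c "
            "WHERE ContentDocumentId__c='%s'" % checked_id(doc_id, prefix='069'))
```

Anything containing quotes or spaces is rejected before it reaches the query, so a malformed request fails early with a ValueError instead of sending unexpected text to Salesforce.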


    # Get the salesforce object
    def __getSalesforceObject(self):
        sf = Salesforce(username='<<username>>', password='<<password>>', security_token='<<token>>')
        return sf

Finally, here is the method that creates the simple_salesforce object. I used the username/password authentication approach as it was easy to implement and less risky for a hackathon. If you want to implement this in production, I would suggest you follow the OAuth approach.
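Whichever authentication approach you pick, the credentials do not have to live in the source file. Heroku exposes config vars to the application as environment variables, so a small helper can read them from there. This is a sketch with assumptions: the variable names SF_USERNAME, SF_PASSWORD and SF_SECURITY_TOKEN are made up for illustration, and salesforce_credentials is a hypothetical helper.

```python
import os


def salesforce_credentials(env=None):
    """Read the Salesforce login details from environment variables."""
    env = os.environ if env is None else env
    names = ('SF_USERNAME', 'SF_PASSWORD', 'SF_SECURITY_TOKEN')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('missing config vars: ' + ', '.join(missing))
    # Keys match the keyword arguments used in the Salesforce(...) call above.
    return {
        'username': env['SF_USERNAME'],
        'password': env['SF_PASSWORD'],
        'security_token': env['SF_SECURITY_TOKEN'],
    }
```

The method above could then create the object with Salesforce(**salesforce_credentials()), which keeps the secrets out of the repository.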

Conclusion


Note that I have not documented the feed summarization method here as it is pretty similar to file summarization.

Well, that's it folks: this is the only code you need for your Heroku service. I am not documenting the implementation of the Lightning component as there are multiple examples available on the net. In order to call this Heroku service endpoint from your Lightning component, set up a named credential and then create an HTTP request from inside the controller.

There is a huge opportunity to leverage this solution in multiple areas of Salesforce


The above figure highlights a few
1. New sales representatives can quickly digest files and feed information
2. Better file search if indexing happens on the summary field
3. The machine can now suggest which answer is likely to be correct in a question-and-answer feed. Humans can just vote the answer up or down
4. Support representatives can connect documents with cases without going through the entire document, just by reading the summary.


Screenshots

File Summary



For the above summary, here is the text of the actual document

Pomegranate stimulates fat reduction and prevent insulin resistance

Pomegranate is a rich source of punicic acid.

A study conducted Leiden University Medical Centre, Netherlands demonstrated that obese and insulin resistant mice after feeding with pomegranate seed oil for 12 weeks showed lower body weight, significant decrease in body fat and improved peripheral insulin sensitivity.

A research study was conducted at The Department of Biological Sciences and Biotechnology, Tsinghua University, China to investigate the anti-obesity effects of pomegranate leaf extract (PLE).

Obese mice were treated with this extract for 5 weeks. It was observed that obesity and hyperlipidaemia were inhibited by the PLE. Scientists concluded that PLE can be a novel appetite suppressant too.

Another study conducted to investigate the benefits of pomegranate seed oil (PSO) on hyperlipidaemic subjects showed that PSO has favourable effects on the lipid profiles.

Insulin resistance is associated with weight gain and obesity.

Pomegranate is a good source of anthocyanins. A scientific study demonstrated that anthocyanins reduce dyslipidaemia, enhances antioxidant capacity and prevents insulin resistance in diabetic patients.

A review conducted in the King Saud University, Saudi Arabia mentions pomegranate as a preventive agent against obesity. This is mainly because of the tannin, anthocyanin, antioxidants and flavonoid content in them.

What it means? Pomegranate through various mechanisms is found to promote fat reduction and prevents insulin resistance thereby averting the risk of obesity.
2. Pomegranate enhances satiety, appetite control and prevent over-eating

Dietary fiber has a great physiological effect on satiation because of its properties of bulk addition and viscosity impartment.

These fibers prolong the intestinal phase of digestion and absorption. This provides a good time for macronutrients to trigger the signals of satiation and thus prevent in over eating. This property of dietary fiber is highly beneficial in weight management.

Pomegranate is a great source of dietary fiber. They consist of around 11.3g per fruit providing 45% of the daily need of dietary fiber.

Dietary fiber withdraws water and holds on to it. This results in expanding of dietary fiber in stomach enhancing the satiety. As a result, food is absorbed slowly preventing the intake of unnecessary calories.

What it means? Pomegranate is a great source of dietary fiber. Dietary fiber enhances satiation, appetite control and thus helps in controlling body weight.



