The Network is the Computer: Mining Content in the Cloud

There’s an increasing need to mine content on social media and to find the significant words within it. You might not know it, but there are a whole lot of mining agents on the Internet trying to make some sense of your posts and your sentiment.

For law enforcement, the words of interest might be “bomb”, “explosion” and “attack”, but for hotel sites, it might be phrases such as “I hated the room”, “tear in carpet” and “such a lovely view”. And so we can turn to the Cloud to mine content and find the most significant words. One of the Cloud services around is the Microsoft Text Analytics Cognitive Service. So let’s take an example:

Theresa May has said proposed changes to social care funding in 
England will now include an absolute limit on the money people
will have to pay.

If we run this text through the key phrase extraction in Microsoft Cognitive Services, we get [link]:

Here is the code we are using:

# Python 3 version: http.client and urllib.parse replace httplib and urllib
import http.client
import json
import urllib.parse

headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': 'KEY GOES HERE',
}
params = urllib.parse.urlencode({
    # Request parameters (kept from the original request)
    'numberOfLanguagesToDetect': '1',
})
text = "This is an important message"
# Build the body with json.dumps so that it is valid JSON and any
# quotes inside the text are escaped properly
body = json.dumps({'documents': [{'id': 'test001', 'text': text}]})

try:
    conn = http.client.HTTPSConnection('westus.api.cognitive.microsoft.com')
    conn.request("POST", "/text/analytics/v2.0/keyPhrases?%s" % params,
                 body, headers)
    response = conn.getresponse()
    data = response.read()
    conn.close()
    d = json.loads(data)
    # Print each key phrase found in the first (and only) document
    for member in d['documents'][0]['keyPhrases']:
        print(member)
except Exception as e:
    # Not every exception carries errno/strerror, so print the exception itself
    print("Error: {0}".format(e))
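
The request above sends a single document, but the body holds a list, so several posts could go into one call. Here is a minimal sketch of that idea, assuming the response keeps the shape used above (a `documents` list with an `id` and `keyPhrases` for each entry) and also carries an `errors` list for documents the service rejects. The helper names `build_body` and `phrases_by_id` are hypothetical, and the sample response is hand-written for illustration, not real output from the service:

```python
import json


def build_body(texts):
    """Build a JSON request body with one document per text (ids are '1', '2', ...)."""
    docs = [{'id': str(i), 'text': t} for i, t in enumerate(texts, start=1)]
    return json.dumps({'documents': docs})


def phrases_by_id(raw):
    """Map each document id to its key phrases, and collect per-document errors."""
    d = json.loads(raw)
    phrases = {doc['id']: doc.get('keyPhrases', []) for doc in d.get('documents', [])}
    errors = {err['id']: err.get('message', '') for err in d.get('errors', [])}
    return phrases, errors


# Hand-written sample response (an assumption about the shape, not service output)
sample = json.dumps({
    'documents': [{'id': '1', 'keyPhrases': ['bed cover', 'linoleum']}],
    'errors': [{'id': '2', 'message': 'empty text'}],
})
phrases, errors = phrases_by_id(sample)
print(phrases)
print(errors)
```

The string returned by `build_body` could be passed as the `body` of the request shown earlier, and the raw response handed to `phrases_by_id`.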

If we take a hotel review [here]:

6 of us stayed here for the weekend. The first thing we noticed on 
entering our room was how small it was. Our rooms were clean, but the
bed cover was stained. The furniture was really outdated, especially
the bathroom, which had an old pink suite and linoleum on the floor.

we get:

Most significant phrases:
old pink suite
bed cover
linoleum
bathroom
floor
rooms
thing
furniture
weekend

We can now add sentiment analysis [here], where a sentiment score close to 1 indicates a positive sentiment, while one near zero indicates a negative sentiment:

We can see the sentiment is near zero, so it is a negative comment. For a positive review we see the score is near 1 [here]:

But sentiment analysis still has a way to go before it really understands us. Here the score is fairly near 1, yet it is a negative posting [here]:
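
The scoring convention described above (near 1 is positive, near zero is negative) can be turned into a simple label. This is a minimal sketch, assuming the sentiment endpoint sits alongside keyPhrases at `/text/analytics/v2.0/sentiment` and returns a `score` for each document. The function names, the `language` field and the 0.4 and 0.6 cut-offs are illustrative assumptions, not values defined by the service:

```python
import http.client
import json


def label(score, low=0.4, high=0.6):
    """Turn a 0..1 sentiment score into a label (cut-offs are assumptions)."""
    if score >= high:
        return 'positive'
    if score <= low:
        return 'negative'
    return 'neutral'


def fetch_score(text, key, host='westus.api.cognitive.microsoft.com'):
    """Request a sentiment score for one text (assumed endpoint and response shape)."""
    headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': key,
    }
    body = json.dumps({'documents': [{'id': '1', 'language': 'en', 'text': text}]})
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("POST", "/text/analytics/v2.0/sentiment", body, headers)
        data = conn.getresponse().read()
    finally:
        conn.close()
    return json.loads(data)['documents'][0]['score']


# Offline check of the labelling only; fetch_score needs a valid key to run
for s in (0.05, 0.5, 0.95):
    print(s, label(s))
```

A label like this inherits the model's mistakes: a negative posting that scores near 1 would still come out as positive.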

Conclusions

Well … humans are complex … and machines still don’t quite understand us. When they do, we should worry!