And Where Does Your Company Do Its Market/Cyber/Brand Analysis?

Forget Twitter, The Future of Open Data Analysis … Reddit

At one time you would have a whole team of people who cut out newspaper snippets about your company, and then distributed them around the company. But, as Bob Dylan would say, “Times Have Changed!”. The usage of Big Data is now at the heart of driving companies forward. It gives them the heartbeat of how their company is being perceived. What worries our customers? Who dislikes us? Why do they like us? How is our brand perceived? At one time, a company would pay a marketing agency to do this, and they would happily go out and survey some customers … “84% of customers prefer cat food to dog food (based on a survey of 83 people)”. But not any more. One of the easiest sells for an organisation is to buy a SIEM solution, as the data can be used for security analytics, marketing, sales, and so on. “How many people with iPhones visit our site?”, “When are our customers from Japan visiting our site?”, “Did that promotion we ran increase Web activity and sales?”. This is the Splunk approach … mine the data and then use it in whatever way you want.

But that is closed-source data. The world has so much more, and it’s open. And so open source intelligence (OSINT) is gripping the world with its ability to use the data for both cybersecurity analytics (who is going to hack us next?) and business analytics. Twitter has long been used as a place to monitor trends and market data.

But while Twitter, Facebook and LinkedIn have been the places that most organisations have gone to see what their customers think, it is now Reddit (a crowd-sourced content site) that provides the heartbeat of the Internet. In fact, it is now the 18th most popular site on the Internet.

A crowd-sourced content site involves people coming together to share ideas. In professionally focused sites such as Stack Overflow we often have high levels of trust in the users of the site. Reddit supports millions of communities, which are known as “subreddits” and whose names begin with “r/”. For Golang, for example, we have a subreddit of “r/golang/”.

Users tend to use pseudonyms for their IDs, so it can be difficult to trace their identity. Credibility signals for an identity, though, include “Date joined”, “Post karma”, “Comment karma” and a “Trophy case”. Any user with an account can create their own subreddit, but these are moderated by volunteer moderators, who can edit and delete content, and who have the right to remove users. Users can upvote or downvote posts, which increases or decreases the visibility of a posting. A key focus of the site is often to filter in terms of hot, new, controversial, and rising topics. There are some abbreviations that are common, including: OP (original poster), TIL (Today I learned), and AMA (Ask Me Anything).
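
The credibility signals above can be gathered into a simple record for each account. The following is a minimal sketch, not a definitive method: the function name credibility_summary and the stand-in account values are hypothetical, and it assumes an account object that exposes name, created_utc, link_karma (post karma) and comment_karma, which is how PRAW presents a Redditor.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def credibility_summary(redditor, now=None):
    """Collect the public credibility signals of an account into a dict.

    Hypothetical helper: it only reads name, created_utc, link_karma and
    comment_karma from whatever account object it is given.
    """
    now = now or datetime.now(timezone.utc)
    joined = datetime.fromtimestamp(redditor.created_utc, tz=timezone.utc)
    return {
        "name": redditor.name,
        "date_joined": joined.date().isoformat(),
        "account_age_days": (now - joined).days,
        "post_karma": redditor.link_karma,
        "comment_karma": redditor.comment_karma,
    }


# Stand-in account with made-up values, so the sketch runs without credentials.
example_user = SimpleNamespace(
    name="example_user",
    created_utc=1514764800,  # 2018-01-01 00:00:00 UTC
    link_karma=120,
    comment_karma=450,
)
summary = credibility_summary(
    example_user, now=datetime(2019, 1, 1, tzinfo=timezone.utc)
)
print(summary)
```

With a live connection, the same function should accept the object returned by reddit.redditor('some_name') in PRAW. How to weigh account age against karma is a judgment call that this sketch leaves open.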

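To create a data miner for Reddit, you go to https://www.reddit.com/prefs/apps and define a new app of type “script”. This then generates a client ID and a client secret for the app, which the code uses as client_id and client_secret.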

With the Reddit API we can search for keywords or list the top posts on a subreddit. In the following we list the top 10 posts of the subreddit named by the search term, which defaults to “cryptocurrency” [here]:

import sys

import praw

# Credentials come from the script app registered at reddit.com/prefs/apps
reddit = praw.Reddit(client_id='xxxxx',
                     client_secret='xxxxx',
                     user_agent='xxxx',
                     username='xxxxx',
                     password='xxxx')

# Subreddit name: first command-line argument, or 'cryptocurrency' by default
search_term = 'cryptocurrency'
if len(sys.argv) > 1:
    search_term = sys.argv[1]
print("Search term: ", search_term)

# Fetch the 10 highest-scoring posts of the subreddit and print a summary
subreddit = reddit.subreddit(search_term)
for submission in subreddit.top(limit=10):
    print("=ID:", submission.id)
    print(" Title:", submission.title)
    print(" Score:", submission.score)
    print(" URL:", submission.url)
    print(" Text:", submission.selftext[:100])  # first 100 characters only

A sample run is [here]:

Search term:  cryptocurrency
=ID: 7r0ftz
Title: CryptoNick is deleting all of his BitConnect videos, and so are his buddies. Please never forget what he and his cohorts did to so many people, and how much money those people lost in the process thanks to CryptoNick, Trevon James, and Craig Grant!
Score: 26503
URL: https://www.reddit.com/r/CryptoCurrency/comments/7r0ftz/cryptonick_is_deleting_all_of_his_bitconnect/
Text: We can't let these legendary affiliate scammers get away with what they did, and we have to show the
=ID: 7vga1y
Title: I will tell you exactly what is going on here, this is critical information to understand if you are going to make money in this space. How prices work, and what moves them - and it's not money invested/withdrawn.
Score: 20145
URL: https://www.reddit.com/r/CryptoCurrency/comments/7vga1y/i_will_tell_you_exactly_what_is_going_on_here/
Text: /edit: Hi /r/all. While I have your attention, I want to take 5 seconds of your time and bring some
=ID: 7sx5ze
Title: Robinhood is launching a Crypto Trading app to compete with Coinbase
Score: 19969
URL: http://blog.robinhood.com/news/2018/1/24/dont-sleep
Text:
=ID: 80xb4n
Title: Checkmate, Bill.
Score: 19626
URL: https://i.redd.it/vmcf9d93dzi01.jpg
Text:
=ID: 7qr6ky
Title: Delta's app store description seems appropriate today.
Score: 18473
URL: https://i.imgur.com/qcDbWMz.png
Text:
=ID: 7raztw
Title: Listen up folks, if you "did", or still do promote cryptocurrency related scams, you will be called out on it via this sub-Reddit. We don't care about you, or your ill-gotten gains, we care about the general well-being of our community first and foremost.
Score: 17890
URL: https://www.reddit.com/r/CryptoCurrency/comments/7raztw/listen_up_folks_if_you_did_or_still_do_promote/
Text: So apparently, some of you known scammers are getting a little butt-hurt about being added to our kn
=ID: 8eto2e
Title: Nasdaq is open to becoming cryptocurrency exchange, CEO says
Score: 17161
URL: https://www.cnbc.com/2018/04/25/nasdaq-is-open-to-becoming-cryptocurrency-exchange-ceo-says.html?__source=sharebar|twitter&par=sharebar
Text:
=ID: 7r4vlc
Title: Why we won't have a long term bear market, and how to systematically pick your future investments in crypto
Score: 14596
URL: https://www.reddit.com/r/CryptoCurrency/comments/7r4vlc/why_we_wont_have_a_long_term_bear_market_and_how/
Text: With so much uncertainty right now it would be a good time to take some time to go over what happene
=ID: c1zlny
Title: The true power of Bitcoin
Score: 14548
URL: https://i.redd.it/h4n3169pq2531.jpg
Text:
=ID: 7s9zmc
Title: Great news from Korea! Banks will allow cryptocurrency trading again from today and next week account registration is opened again.
Score: 13412
URL: http://m.news.naver.com/read.nhn?mode=LSD&mid=sec&sid1=101&oid=001&aid=0009829657
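
The script above lists the top posts of one subreddit. For a keyword search, PRAW also provides a search() method on a subreddit, and the results can be flattened into rows for later analysis. The following is a sketch under stated assumptions: the helper name to_rows and the stand-in post are hypothetical, and the live call is shown only as a comment because it needs real credentials.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def to_rows(submissions):
    """Flatten submission-like objects into plain dicts, one per post."""
    rows = []
    for s in submissions:
        created = datetime.fromtimestamp(s.created_utc, tz=timezone.utc)
        rows.append({
            "id": s.id,
            "title": s.title,
            "score": s.score,
            "url": s.url,
            "created": created.strftime("%Y-%m-%d"),
            "text": s.selftext[:100],  # same 100-character cut as the script
        })
    return rows


# With live credentials, a keyword search across all subreddits would be:
#   hits = reddit.subreddit("all").search("cryptocurrency", sort="new", limit=10)
#   rows = to_rows(hits)

# Offline stand-in with made-up values, so the sketch runs as-is.
fake_hits = [
    SimpleNamespace(id="abc123", title="Example post about cryptocurrency",
                    score=42, url="https://example.com/post",
                    created_utc=1514764800, selftext="x" * 250),
]
rows = to_rows(fake_hits)
print(rows[0]["id"], rows[0]["created"], len(rows[0]["text"]))
```

A list of dicts like this can be passed straight to pandas.DataFrame(rows) if you want to sort, filter or chart the posts.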

If you are interested, we are creating a Data Science and Cybersecurity course, in partnership with The Data Labs. We will be running workshops on data mining for open source intelligence (OSINT) in October, with practical hands-on work in Python and Go, so watch this space.

Whether you’re a threat hunter, a marketing executive, or a sales executive in your company, you need to make sure that you have your finger on the pulse, and Reddit is one place that will give you some idea of how your company is being perceived, and what people really like and dislike.

And, if you want to create the next amazing company of the future … I would recommend mining Reddit as the way to go … and Edinburgh is the place to do it (sorry for the plug, but it’s an amazing city!).