On a Friday night before COVID-19, I decided to stay in and study data science instead of attending a public celebration. I started with "Complete Data Science Training: Mathematics, Statistics, Python, Advanced Statistics in Python" and then decided to look into Google Cloud Platform (GCP) as well.

On Saturday, I woke up and decided to skip the festival again, preferring to keep studying data science and GCP. I had been wondering how many other locals would also rather stay in than join the celebration, so I decided to put my data science studies to use and get a better picture of general public opinion.

My first and only idea, based on previous work by my teammates at Avenue Code, was to capture Tweets using the Filtered Stream provided by Twitter and then analyze the sentiment of their content using the GCP Natural Language API.
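To make the analysis step concrete, here is a minimal sketch of what a sentiment request and its response could look like when the Natural Language API is called over REST. The helper names and the sample response are assumptions made for illustration, not code from the project; the request and response shapes follow the API's public `documents:analyzeSentiment` method.

```python
import json

# Public REST endpoint for document sentiment analysis (Natural Language API v1).
ANALYZE_SENTIMENT_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(tweet_text):
    """Return the JSON body for an analyzeSentiment call on one Tweet (hypothetical helper)."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": tweet_text},
        "encodingType": "UTF8",
    }
    return json.dumps(body, ensure_ascii=False)


def parse_sentiment_response(response_json):
    """Extract the document-level score and magnitude from a response (hypothetical helper)."""
    sentiment = json.loads(response_json)["documentSentiment"]
    return {"score": sentiment["score"], "magnitude": sentiment["magnitude"]}


# Hand-written sample response, for illustration only (not real data).
sample_response = '{"documentSentiment": {"score": 0.8, "magnitude": 0.8}}'
print(parse_sentiment_response(sample_response))
```

The sketch stops short of sending the request, since that step needs credentials; the body would be sent as an authenticated POST to `ANALYZE_SENTIMENT_URL`.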

I quickly designed an architecture that was simple but would do the job. Besides the two services mentioned, I also used GCP Pub/Sub and GCP Firestore, as shown below:

[Figure: Sentiments architecture]

I could have used GCP Functions to avoid running code directly on my machine. I did consider that while designing the solution (go all serverless, right?), but the time needed to learn one more service wasn't worth the payoff for this POC.
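The glue code that runs locally can be pictured roughly as follows. This is only a sketch under stated assumptions: `run_pipeline`, `fake_analyze`, and the in-memory stand-ins are hypothetical names, and each Tweet is assumed to arrive as a dict with `id` and `text` keys. In a real run, the list of messages would come from a Pub/Sub subscription, `analyze` would call the Natural Language API, and `store` would write to Firestore.

```python
def run_pipeline(tweets, analyze, store):
    """Analyze each Tweet and persist the result; return how many were stored."""
    stored = 0
    for tweet in tweets:
        text = tweet.get("text", "").strip()
        if not text:
            continue  # skip messages with no usable text
        sentiment = analyze(text)
        store(tweet["id"], {"text": text, **sentiment})
        stored += 1
    return stored


# In-memory stand-ins so the sketch runs without any cloud credentials.
incoming = [{"id": "1", "text": "Great parade!"}, {"id": "2", "text": "   "}]
documents = {}  # plays the role of the document store


def fake_analyze(text):
    """Placeholder for the sentiment call; returns made-up values."""
    return {"score": 0.0, "magnitude": 0.0}


count = run_pipeline(incoming, fake_analyze, documents.__setitem__)
print(count, documents)
```

Passing the analyze and store steps in as functions keeps the loop independent of any one service, which is also what would make a later move to a serverless function straightforward.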

Almost 10k Tweets later, I plotted the information using Chart.js. The x-axis represents the sentiment of the Tweet: dots closer to 1 represent a positive feeling, and dots closer to -1 represent a negative one. The y-axis represents the intensity of the sentiment: the higher the number, the more intense the feeling. The result was simple and effective:

[Figure: carnival-1, scatter plot of Tweet sentiment versus intensity]
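A plot like this one can be described to Chart.js as a scatter chart whose points use the sentiment score as `x` and the intensity (which the Natural Language API calls magnitude) as `y`. The sketch below builds such a configuration in Python and serializes it to JSON; the function name, axis titles, and sample values are assumptions for illustration, and the `options.scales` layout assumes Chart.js 3 or later.

```python
import json


def to_scatter_config(results):
    """Build a Chart.js scatter configuration from score/magnitude records (hypothetical helper)."""
    points = [{"x": r["score"], "y": r["magnitude"]} for r in results]
    return {
        "type": "scatter",
        "data": {"datasets": [{"label": "Tweets", "data": points}]},
        "options": {
            "scales": {
                # Scores range from -1 (negative) to 1 (positive).
                "x": {"min": -1, "max": 1, "title": {"display": True, "text": "Sentiment"}},
                # Magnitude is non-negative and has no fixed upper bound.
                "y": {"min": 0, "title": {"display": True, "text": "Intensity"}},
            }
        },
    }


# Made-up sample records, not data from the experiment.
sample_results = [
    {"score": 0.6, "magnitude": 1.4},
    {"score": -0.3, "magnitude": 0.3},
]
print(json.dumps(to_scatter_config(sample_results), indent=2))
```

The printed JSON could then be passed to `new Chart(ctx, config)` on the page that renders the plot.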

Since this was a purely experimental assessment, I didn't spend too much time analyzing the results. As the saying goes, "worse than no data is data without context." The circumstances in which I obtained the information are so specific that the results could easily mislead, for example through Simpson's paradox. That said, the premise of my study was relatively simple and had the potential to yield useful results.

The code is available in this repository under the MIT License.


Author

Marcio Viegas

Marcio Viegas is a Technical Manager at Avenue Code. He loves learning and is passionately curious. He enjoys studying different subjects, whether or not they relate to technology, and considers his most valuable professional skill to be his ability to combine diverse knowledge and creativity to propose solutions to complex problems.

