A computer science filtering project

Before I took a break from my train of thought about censoring to make room for my Friday This Week post, I was mulling over this week’s news that some people thought Facebook was censoring certain stories (or types of stories).  I took a look at Facebook and also at Twitter as news providers.  Certainly, when you go to traditional media, it’s right up front if they have a particular political lean.

But services that access everything and base their recommendations on trending topics operate differently.  We’ve all seen the concept of a trending topic – you might have been at an event where you were encouraged to “make our hashtag trend”.  I don’t know why – my mind wanders at times – but I thought about the Infinite Monkey Theorem.  You know the one – put enough monkeys in front of typewriters and allow them to type and they’ll eventually type the Bible.

Thanks, Wikimedia.

Of course, Facebook and Twitter don’t have a staff of monkeys!

If you watch trending topics long enough, you see the ebb and flow that goes by so quickly.  It couldn’t be handled by a person with a calculator either!  It has to be computer managed and generated.

Now it gets interesting.

One of the really engaging topics with computer science students is writing code for a problem that couldn’t be solved by hand in real life.

So, I started to mull it over.  It’s got to be manageable in class so I boiled it down to this.

How could I ask a program to read a news story, analyse the content, and then come to the conclusion “This is a good story about Sanders/Clinton/Trump” – let’s run with it.  Or, “This is not a good story about Sanders/Clinton/Trump” – let’s not publish it.

The solution wouldn’t rely on the number of references to the story – that would be relatively simple with a counter – it would be to ask the computer program to rip apart the text and make a recommendation based on content.  That does get really interesting.

The idea for a solution, it seems to me, is that you would have to set up a database of words/phrases that would be positive for a particular candidate and another of those that would be negative.  Then, you’d write a program that would parse the presented story and run it against the databases to see how many matches of each type there were.  I doubt that you’d ever be 100% correct so there would have to be a confidence level before you make your ultimate decision.

Now, actually writing it would be within the skillset of most Grade 12 students.  It’s generating the content for this mini-AI project that would be interesting, a challenge for students, and an opportunity for them to think deeply about current issues and also the social impact that their program could conceivably have if it was used in real life.  In particular, what would be the words/phrases that should be used?  Do we need to consider the actual source of the document?  It’s not only a matter of which words to use; the probability that a particular story would actually contain them is important too.  Do you reject a story where the political slant can’t be determined?
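Just to convince myself that the word-list idea is classroom sized, here is a minimal sketch in Python.  Everything in it is an assumption made for illustration: the two tiny word sets, the function name classify, and the 0.6 confidence threshold are all made up, and confidence here just means the share of matched words that fall on the winning side.  Coming up with word lists that actually work is the real project.

```python
import re

# Hypothetical word lists, invented for this sketch only.
# Building realistic lists is the part students would have to research.
POSITIVE = {"wins", "praised", "surges", "endorsed", "strong"}
NEGATIVE = {"scandal", "criticized", "slumps", "attacked", "weak"}


def classify(story, threshold=0.6):
    """Return (label, confidence) for a story.

    label is 'positive', 'negative', or 'undetermined'.
    confidence is the fraction of matched words on the winning side.
    """
    # Break the story into lowercase words, ignoring punctuation.
    words = re.findall(r"[a-z']+", story.lower())

    # Count how many words appear in each list.
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    matches = pos + neg

    # No matches at all: the slant cannot be determined.
    if matches == 0:
        return "undetermined", 0.0

    confidence = max(pos, neg) / matches

    # A tie or a weak majority is not enough to make a call.
    if pos == neg or confidence < threshold:
        return "undetermined", confidence

    return ("positive" if pos > neg else "negative"), confidence


if __name__ == "__main__":
    print(classify("The candidate was praised after a strong debate."))
    print(classify("Praised by some, criticized by others."))
```

Even this toy version forces the design questions: a story with no matches, or an even split, comes back as undetermined, and somebody has to decide what the program does with it.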

This article could be very helpful: How to Recognize Bias in a Newspaper Article.  For this example in particular, point number 7 would be important.

I think it’s a lovely concept: one with no clear and definable solution, but one that would let students realize just how much coding is behind their social media.  No monkeys involved.

Then, the Wall Street Journal trumped me.  (I can’t believe I said that.)

I ran across this resource: Blue Feed, Red Feed.

From a selected list of topics, it pulls stories from a Facebook feed and classifies them as liberal or conservative.

I gave it a shot.




Maybe there’s purpose to this computer science stuff after all.  Of course, there’s a disclaimer about the content generated but the methodology is interesting reading.  It paints a whole new picture of what it takes to generate this.

Monkeys indeed!

