Unlock the Magic: Data Science with R’s Enchanting Elixirs

Forget bubbling cauldrons and cryptic chants – the modern data scientist wields R, and their laboratory brims with potent packages. Today, I unveil three essential packages for deriving data-driven insights: e1071, ggplot2, and caret. Brace yourselves, fellow data scientists, for we’re about to transmute raw data into shimmering pure gold!

If you are just starting out with programming, consider looking into my intro to programming textbook using R.  If you prefer a video format, I also have a video series on the topic.

1. Elemental Essence: e1071

Think of e1071 as your alchemist’s cabinet, overflowing with potent algorithmic elixirs. From nimble naive Bayes classifiers to swirling support vector machines, it offers a dizzying array of tools to unravel the mysteries of your data. Whether you seek to predict customer churn with the precision of a crystal ball or cluster market segments like constellations, e1071 fuels your analytical fire.
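To make that a little more concrete, here is a minimal sketch of fitting a support vector machine with e1071. It assumes the package is installed and uses R's built-in iris data rather than any real churn data; the 30-row holdout and the seed are arbitrary choices for illustration.

```r
# Assumes e1071 is installed: install.packages("e1071")
library(e1071)

set.seed(42)

# Hold out 30 rows of the built-in iris data for testing (arbitrary split)
test_idx <- sample(nrow(iris), 30)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit a support vector machine with a radial kernel
fit <- svm(Species ~ ., data = train, kernel = "radial")

# Predict the species of the held-out flowers and compare with the truth
pred <- predict(fit, newdata = test)
print(table(predicted = pred, actual = test$Species))

# Share of held-out rows classified correctly
accuracy <- mean(pred == test$Species)
print(accuracy)
```

The same pattern (fit on one subset, predict on another) carries over to the other classifiers in the package, such as naiveBayes().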

If you are interested in getting started modeling with R, I would suggest the Introduction to Statistical Learning with R (ISLR 2nd Edition Affiliate Link, Non-Affiliate Free PDF Link).  If you prefer a video format, I created an intro to machine and statistical learning video series.

2. Crystallize Clarity: ggplot2

Data may whisper its secrets, but ggplot2 amplifies them into dazzling visual tapestries. This package is your potion for transmuting numbers into breathtaking graphs, charts, and maps. With its intuitive incantations and boundless flexibility, ggplot2 isn’t just for eye candy – it’s about weaving narratives from data that captivate both the scientist and your broader audiences.
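As a small taste of those incantations, the sketch below builds a scatter plot layer by layer. It assumes ggplot2 is installed and uses the built-in mtcars data; the labels and theme are just one possible styling.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

# Map weight and fuel economy to the axes, and cylinder count to colour
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    title = "Fuel economy versus weight"
  ) +
  theme_minimal()

# Printing the object draws the plot
print(p)
```

Each `+` adds one piece (data mapping, geometry, labels, theme), which is what makes it easy to adjust a plot one layer at a time.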

3. The Crucible of Model Curation: caret

Crafting the perfect machine learning model can be a chaotic art. But fear not, aspiring alchemists – caret brings order to the chaos. This package orchestrates the entire process, from data preprocessing to model training and tuning. With caret, you can experiment with algorithms like alchemical ingredients, optimize hyperparameters with practiced precision, and ultimately declare the champion model, ready to unlock the secrets of your data.
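Here is a rough sketch of what that workflow can look like: 5-fold cross-validation over a few candidate values of k for a k-nearest-neighbours classifier. It assumes caret is installed and again uses the built-in iris data; the choice of model, grid, and number of folds is purely illustrative.

```r
# Assumes caret is installed: install.packages("caret")
library(caret)

set.seed(42)

# Resampling scheme: 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# Train k-nearest neighbours, centering and scaling the predictors first,
# and let caret compare the candidate values of k
fit <- train(
  Species ~ .,
  data = iris,
  method = "knn",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneGrid = data.frame(k = c(3, 5, 7, 9))
)

# Cross-validated accuracy for each k, and the value caret selected
print(fit$results[, c("k", "Accuracy")])
print(fit$bestTune)
```

Swapping the `method` argument is usually all it takes to try a different algorithm under the same resampling scheme, although some methods need additional packages installed.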

So, how do these three reagents form the Data Alchemist’s ultimate elixir?

  • e1071 provides the raw power of algorithmic transmutation.
  • ggplot2 crystallizes insights into mesmerizing visual clarity.
  • caret stirs the cauldron of model creation with masterful efficiency.

Mastering these tools equips you to tackle real-world problems with the wisdom of Merlin himself. Predict stock market fluctuations, optimize resource allocation, or discover hidden patterns in social media – the possibilities are endless.

This is just the first step on our data scientist journey. Stay tuned for deeper dives into each package, secret spells for data wrangling, and thrilling adventures in the uncharted lands of data science. Now, grab your beakers, fire up R, and let’s transform the world with the alchemy of code!

Are there additional topics regarding data science you would like me to cover next? Consider reaching out to let me know what I should talk about next time!

Note: Bard was used to help write this article.  Midjourney was used to help create the image(s) presented in this article.

A Perspective on Googling “Health Care”: From 2008 to Now

A few days ago, Nate Silver stated here the following:

“We see that Google searches for “health care” — although not a perfect proxy for media coverage — have spiked for about a week at a time, only to fall back down again. Which could reflect the media’s short attention span for the story, or the public’s.”

This got me thinking: what has been the relative interest in the current Republican attempt at health care reform?  So I extended the time frame analyzed to run from February 1, 2008, to July 6, 2017.   I recreated the plot below in R using ggplot2 (and provide the code to create it at the end of the post).

 

The figure shows the relative interest in searching “health care” in Google over time.  The x axis is the date.  The y axis is the interest relative to the most popular time “health care” was searched.  In this case, that was March 2010.  The scale goes from 0 to 100, where a value near 0 means the term was barely searched relative to the most popular point, 50 means that the term was only half as popular, and 100 means that it was just as popular.  We can discuss if this is a good or bad metric, but let’s table that for another time (since it’s a long discussion).   In short, sometimes it’s good, others it’s bad.
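To illustrate how a 0 to 100 scale like this works, here is a simplified sketch in R. The raw counts are made up for illustration, and dividing by the maximum is only an approximation of the idea; Google's own normalization also accounts for total search volume.

```r
# Hypothetical monthly search counts (made-up numbers, not Google data)
raw_counts <- c("2010-01" = 400, "2010-02" = 650,
                "2010-03" = 1000, "2010-04" = 500)

# Scale every month relative to the busiest month, which becomes 100
rel <- round(100 * raw_counts / max(raw_counts))

print(rel)
# 2010-01 -> 40, 2010-02 -> 65, 2010-03 -> 100, 2010-04 -> 50
```

Here April comes out at 50 because it had half as many searches as the peak month of March.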

The blue dot with a white triangle indicates the month where President Obama announced to a joint session of Congress he would actively pursue health care reform. The green dot with a white center dot indicates when Congress went on recess in August 2009. It was during this recess when a particularly large number of members of Congress first encountered the Tea Party. The blue dot with the cyan center indicates when the Affordable Care Act (ACA), aka Obamacare, passed in Congress. The green point with a white triangle is when the United States Supreme Court stated that the ACA was constitutional since it was considered a tax. The red dot with an orange center indicates when the House failed to pass health care reform in March 2017. The red dot with a white triangle indicates when the House passed the American Health Care Act (AHCA) to repeal and replace the ACA in May 2017.

I pointed out these events to give an idea of how popular searching “health care” was at each of them.  Note that the most popular moment for searching was when the ACA passed.  However, what’s interesting is that people seemed much more engaged and interested in finding out more about health care leading up to the passing of the ACA.  This does not appear to be the case for the GOP’s attempt at passing health care.  The event that appears most similar in interest to the GOP’s attempts is the Supreme Court’s ruling on the constitutionality of the ACA.

In short, this means that the public has been pretty disengaged with the GOP’s attempts at health care reform!

This raises a lot of interesting questions.  Why is it that people appear less interested this time around?  Here are three (possible) ideas I have:

  1. Health care is messy.  Passing health care is complicated and confusing.  People do not want to think about reworking the health care system again! (I’m not aware of data to support this claim.  So it’s a complete shot in the dark.)
  2. There are a lot more distractions this time around, with contention between Trump and the media and recent missile tests from North Korea, just to name two.  (Again, can’t find data.)
  3. There’s simply too little information available for the public to easily digest on the GOP’s attempts at reforming health care.  While the House’s bill is very unpopular according to Nate Silver, there is also a sizeable chunk of undecideds according to the YouGov poll.  When the ACA was in the works, the process was long and time consuming.  This attempt has been much faster (since it has come, died, and then been resurrected).  This has prevented the public from really thinking about it. (Yay! Data!)

If you have any ideas (and/or data) to investigate this further, I’d love to hear about it!  You can tweet it at me!

library(ggplot2)

dat<-read.csv(file="multiTimeline.csv", sep=",", header=FALSE) #note that I did remove the top of the csv file downloaded directly from Google

colnames(dat)<-c("Date", "Rel")

df<-as.data.frame(dat)

aca<- data.frame( Dat = "2010-03", Rel = 100 ) #when ACA was passed
oba<-data.frame(Dat = "2008-02", Rel = 29 ) #when Obama announced
                                          #intention to pass health care

house1<-data.frame(Dat="2017-03", Rel = 32) #Rep House fails to vote on AHCA
house2<-data.frame(Dat="2017-05", Rel = 29) #House passes AHCA

sc<-data.frame(Dat="2012-06", Rel = 33) #SCOTUS decision on ACA and taxes

tea<-data.frame(Dat="2009-08", Rel = 57)# congress recess of aug 2009

mytheme<-theme(

	plot.title = element_text(lineheight=1.5, size=35, face="bold"),
	axis.text.x=element_text(size=23),
	axis.text.y=element_text(size=23),
	axis.title.x=element_text(size=28, face='bold'),
	axis.title.y=element_text(size=28, face='bold'),
	strip.background=element_rect(fill="gray80"),
	panel.background=element_rect(fill="gray80"),
	axis.text=element_text(colour="black")

	)

#general setup
p<-ggplot(data=df, aes(x=Date, y=Rel, group=1))+geom_line()+
   geom_point(data=df, aes( x=Date, y=Rel ))+
   xlab("Date")+
   ylab("Relative Interest")+
   ggtitle("Relative Interest in\nHealth Care Over Time")+
   theme(plot.title = element_text(hjust = 0.5) )

#important points
p<-p + geom_point(data=aca, aes(x=Dat, y=Rel), color="blue", size=4 )+
       geom_point(data=aca, aes(x=Dat, y=Rel), color="cyan" )+

       geom_point(data=oba, aes(x=Dat, y=Rel), color="blue", size=4 )+
       geom_point(data=oba, aes(x=Dat, y=Rel), color="white", shape=17)+

       geom_point(data=house1, aes(x=Dat, y=Rel), color="red", size=4 )+
       geom_point(data=house1, aes(x=Dat, y=Rel), color="orange")+

       geom_point(data=house2, aes(x=Dat, y=Rel), color="red", size=4)+
       geom_point(data=house2, aes(x=Dat, y=Rel), color="white", shape=17)+

       geom_point(data=sc, aes(x=Dat, y=Rel), color="forestgreen", size=4)+
       geom_point(data=sc, aes(x=Dat, y=Rel), color="white", shape=17)+

       geom_point(data=tea, aes(x=Dat, y=Rel), color="forestgreen", size=4)+
       geom_point(data=tea, aes(x=Dat, y=Rel), color="white")


#adding general layout
p<- p + mytheme + scale_x_discrete(breaks = c("2009-01", "2011-01",
                                               "2013-01", "2015-01", "2017-01"),
                                   labels = c("2009", "2011",
                                              "2013", "2015",  "2017")
                                              )

p

ggsave("rel_health.png")
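
One possible variation on the code above is to treat the months as real dates rather than discrete labels, so that ggplot2 spaces the axis by time and picks the year breaks itself. The sketch below assumes only the six annotated points from this post (not the full multiTimeline.csv series), and the `Month` column name is my own.

```r
library(ggplot2)

# Only the six annotated points from the post, not the full series
df <- data.frame(
  Date = c("2008-02", "2009-08", "2010-03", "2012-06", "2017-03", "2017-05"),
  Rel  = c(29, 57, 100, 33, 32, 29)
)

# Turn "YYYY-MM" strings into Date objects by pinning each to the 1st
df$Month <- as.Date(paste0(df$Date, "-01"))

p <- ggplot(df, aes(x = Month, y = Rel)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") +
  xlab("Date") +
  ylab("Relative Interest")

print(p)
```

With a date axis, the manual `breaks` and `labels` passed to scale_x_discrete are no longer needed.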