shiny server on Amazon AWS EC2 in 5 minutes

Introducing the shiny web application framework to the R community is like telling NFL fans who Jerry Rice is. So I won’t do it. However, even though almost every R developer has heard of shiny or even played around with it locally, far fewer have actually deployed their apps on a web server. Deploying the entire stack is not overly complicated for a seasoned web developer, but it is not entirely trivial for the wide array of non-computer science researchers who use R.

Admittedly, there is a sorrow-free hosted solution offered by the creators of shiny. But then you won’t have full control over your own web server to test out things like a proxy server etc. If you want to set it up on your own and really understand what’s going on server side, I’ll give you a kick start with an install script that installs the entire stack on a butt naked Ubuntu 14.04 LTS server.

Ok, so obviously you need a server first. There are tons of local providers with attractive conditions and more personal support than Amazon, but if you don’t know which provider you should use, Amazon is a good choice. In most parts of the world it’s one of the easiest ways to get a server. Just go to the website of the Amazon AWS EC2 cloud computing program and get it done. Servers there come in all sizes, core and RAM wise, and are set up for various uses. Just get a t2.micro and the first year will be free. After one year Amazon will charge you about 10 USD / month (as of April 2016 prices). Doesn’t sound like a super computer, but don’t worry, you won’t be playing video games. It will do for your shiny sandbox.

Click through their menu (choose Ubuntu 14.04 LTS as the operating system and a location near you) and within a few minutes you’ll have a server up and running. Once you’ve got your fresh web server, SSH into it and get the install script below to run on it. One way to do this is to create an empty text file on the server. Then, on the console, open the file you just created with a text editor (vim is my choice below, but any other editor will do). If you are new to vim, make sure to RTFM, otherwise you won’t even know how to leave it. Btw here’s a shorter vim commands cheat sheet. I guess nano should be pre-installed on AWS Ubuntu too, if you really want another editor.

sudo vim

Paste the following content into it.

Last but not least you need to make the file executable and run it.

sudo chmod +x

By now your server should be installing:

  • updates
  • R
  • nginx (I will describe in another post how this can be useful as a proxy server)
  • several R packages
  • shiny server
  • git version control

I guess that’s enough for one post. By default your nginx server listens on port 80, while the shiny server listens on port 3838. Basically you are ready to access the sample content that ships with nginx and shiny server via your favorite web browser! Next time I will describe how to use nginx as a proxy with shiny server behind it, so you can have a password-protected shiny app alongside a public one.


-> View comments and comment on this posts using disqus. <-

Document the Right Way - Getting Started With Sphinx on OSX

Though I do understand that proper documentation is essential, writing it up remains an unpleasant task to me. And, unfortunately, many developers feel the same way. Yet ‘new’, efficient toys that auto-generate shiny documentation rendered as either .pdf or .html help you trick yourself into actually liking to write up documentation for your latest piece of software.

One of those static documentation generators, named sphinx, has crossed my path several times. The fact that its templates have gotten sweeter than Ben & Jerry’s, plus sphinx still being there after all these years, has made me set it up on my local machine. It’s not particularly difficult to set up, but there are a couple of pitfalls along the way. So I documented how to set up the Python-driven sphinx documentation tool on OSX (I did it on a Macbook Pro (11,1) with Yosemite 10.10.5, but this should work for other OSX setups as well).

RTD theme

I have no idea why sphinx is called sphinx, but sphinx seems to be a popular name for a piece of software. I know of at least 3 programs that go by that name and I am sure there are several more. So when you look up sphinx on Google or when you use a package manager / installer like homebrew, make sure you are actually downloading the right package. Btw don’t use homebrew in this case: it will install another piece of software also called sphinx…

Getting your python setup right

If you do not have python’s pip install manager yet - go get it! It’s super useful in general. Even if you got it, you might want to run the second line…

sudo easy_install pip
pip install --upgrade pip

Once you got pip, it might be time to update your python. If you are not a regular python user and still have whatever python shipped with your OS, it might be about time to

brew reinstall python
brew unlink python
brew link --overwrite python

Getting the right sphinx

Once python is updated and linked again, use pip to install sphinx. If you see a sphinx version number and help, you succeeded. Congrats!

pip install sphinx
sphinx-build -h

Find a folder in which you want to start documenting and run the quickstart. But don’t forget to get a nice template before you render your test. I really like the Read the Docs style template.

Get the read the docs template and test it

pip install sphinx_rtd_theme

Then run sphinx-quickstart and follow the procedure. Quickstart will lead you through several questions, helping you set up the folder structure. I recommend separating source and build, so you will end up with two folders. The source folder will contain your .rst files and the conf.py, the root folder will contain a Makefile. To activate the theme, set html_theme = "sphinx_rtd_theme" in conf.py. Then just run

make html
open build/html/index.html

to build the HTML documentation and open it in your favorite (default) browser. You can also build several other formats, most notably PDF, in case you have LaTeX installed. Simply run

make latexpdf

Note, just running

make latex

would give you the .tex files without the .pdf. Admittedly, documentation like that makes software look professional…


RTD theme on github

sphinx documentation

Pip install (guess what the documentation looks like…)


Insert R data.frames to PostgreSQL

Now that I have digested one of the few online-free times of the year, I’d like to share an R function that inserts R data.frames into existing PostgreSQL tables. The function suggested below is an alternative to the dbWriteTable implementations that rather focus on creating new tables based on data.frames (such as CRAN packages RPostgreSQL or RMySQL). Besides, a more exotic extension of dbWriteTable can be found in the caroline package. It’s creatively called dbWriteTable2 and provides improvement when appending data to existing non-empty tables. Recently I experienced some trouble related to the use of the nchar function in the internals of dbWriteTable2 which I found difficult to hunt down.

So I put together a small piece of code to wrap up standard INSERT INTO behavior in an R function. This basic version of my insert rows function is designed to work with PostgreSQL, but adapting it to other SQL dialects should not be a major challenge. Also, for now I assume that there’s a single-column primary key.

library(DBI) # dbGetQuery / dbSendQuery; con comes from a driver such as RPostgreSQL

insert_rows <- function(dframe,con,
                        tbl = "a_sql_table",id_col="your_pk"){
  # check available columns
  acols <- dbGetQuery(con,
                      sprintf("SELECT column_name 
                               FROM information_schema.columns
                               WHERE table_name='%s'",tbl))
  okcols <- names(dframe)[names(dframe) %in% acols$column_name]
  # this is only relevant for debugging
  invalid_cols <- names(dframe)[!(names(dframe) %in% acols$column_name)]
  # only keep columns that exist in the target table
  dframe <- dframe[,okcols,drop=FALSE]
  # cast it to character before creating characters, 
  # otherwise pasting can lead to really strange behavior
  dframe[] <- lapply(dframe,as.character)
  # one INSERT statement per row
  for(i in seq_len(nrow(dframe))){
    query <- sprintf("INSERT INTO %s (%s) VALUES (%s)",
                     tbl,
                     paste(okcols,collapse=","),
                     paste0("'",unlist(dframe[i,]),"'",collapse=","))
    # replace NA with NULL
    query <- gsub("'NA'",'NULL',query)
    tryCatch({dbSendQuery(con,query)},
             error=function(e) cat(paste0(dframe[i,id_col]," could not be entered. \n")))
  }
  cat("These columns are not part of the database: \n")
  print(invalid_cols)
}

Note that pasting together a single INSERT query string that contains the entire statement, instead of looping, will considerably speed things up. Rewriting the function in such a way is useful when large tables need to be updated using R on a regular basis. If your query string gets too large you will need to consider the C_Stack size and split your query string into chunks, but that’s beyond the scope of this post and irrelevant to many users looking for a quick and easy way to store R data.frames in PostgreSQL.
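A minimal sketch of that single-statement idea, assuming every value can be sent as a quoted string literal (as in the looping version above). The helper name build_insert and the toy data.frame are made up for illustration; only base R is used to build the string.

```r
# Hypothetical helper: builds ONE multi-row INSERT statement as a string.
build_insert <- function(dframe, tbl) {
  cols <- paste(names(dframe), collapse = ",")
  # quote every value as a SQL string literal, doubling embedded single
  # quotes; NA becomes an unquoted NULL
  quoted <- lapply(dframe, function(x) {
    x <- as.character(x)
    ifelse(is.na(x), "NULL", paste0("'", gsub("'", "''", x, fixed = TRUE), "'"))
  })
  # one "v1,v2,..." string per row
  tuples <- do.call(paste, c(unname(quoted), sep = ","))
  sprintf("INSERT INTO %s (%s) VALUES %s",
          tbl, cols, paste0("(", tuples, ")", collapse = ","))
}

# toy example data, not from a real table
df <- data.frame(id = 1:2, name = c("a", NA), stringsAsFactors = FALSE)
query <- build_insert(df, "a_sql_table")
cat(query, "\n")
# INSERT INTO a_sql_table (id,name) VALUES ('1','a'),('2',NULL)
```

The resulting string would then be sent once, e.g. with dbSendQuery(con, query). For very large data.frames, one way to chunk is split(df, ceiling(seq_len(nrow(df)) / 1000)) and one call to build_insert per chunk; the chunk size of 1000 is an arbitrary assumption.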


On the Radar: Steph Curry vs. ’93 Michael Jordan

Just when we thought we shook off the last pesky next-MJ-writer, this Steph Curry kid shows up to check the wingspan of our jaws. With Kobe Bryant announcing retirement recently and LeBron playing too much freight train style to draw comparisons we thought we were finally done with the search for the next MJ that annoyed basketball fans for more than a decade. If it’s just for the lack of words the basketball world has for #30’s play, the ultimate benchmark of modern basketball seems appropriate. Obviously, given Jordan’s overall achievement and legacy which is light years ahead of Curry despite all the hype, this cannot be a career or legacy comparison. It rather has to be a particular Jordan season vs. Curry’s current season.

plot of chunk jordancurry

To me, drawing this comparison seems to confirm what was rather a gut feeling when I was watching the Dubs in April last year: Steph Curry is the most unique talent since Michael Jordan. He is still unlikely to be like Mike in the long run but he has got the unique once-in-a-generation talent to influence the way the game is played.

Despite his subpar frame he has leveled his weaknesses and optimized his handle, stroke and court vision to become ridiculously efficient in many aspects of modern basketball. Very much like Jordan he has studied the game, its environment and himself to find ways to be ahead of everyone else. Offensively, Curry even manages to be ahead of the game’s GOAT: 1) he is a substantially better free throw shooter, 2) he has a way better 3-point field goal percentage despite racking up more 3s, 3) he even has a slightly better field goal percentage while scoring about the same ppg (while sitting out fourth quarters) and is two years younger than Jordan was in his stellar ’93 season. Yet, his second-in-the-league steals per game cannot trick us into believing he is anywhere near Jordan as a one-on-one defender.

How to create this chart

It doesn’t always have to be D3 or highcharts. Sometimes out-the-gates-quick is also nice. Sure, javascript based charts aren’t that big of a deal anymore, but they still require some additional knowledge, particularly when setting things up. Plus, the charts presented above can easily be embedded in markdown or LaTeX based pdf reports. The radar charts are plain R and can be easily created with Minato Nakazawa’s fmsb package, which is available from CRAN. Note the nice pdensity parameter to fill the polygon.
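A minimal sketch of such a radar chart with fmsb, not the code behind the plot above. All numbers are placeholders on an arbitrary 0 to 100 scale and NOT real player statistics; the column names are made up as well. radarchart() expects the first row to hold the axis maxima and the second row the minima.

```r
library(fmsb)  # provides radarchart()

# PLACEHOLDER values, not real box-score stats.
# Row 1 = axis maximum, row 2 = axis minimum, then one row per player.
stats <- data.frame(
  FT_pct  = c(100, 0, 90, 80),
  FG3_pct = c(100, 0, 45, 35),
  FG_pct  = c(100, 0, 50, 49),
  Points  = c(100, 0, 75, 80),
  Steals  = c(100, 0, 55, 70),
  row.names = c("max", "min", "Curry", "Jordan 93")
)

cols <- c("blue", "red")
radarchart(stats,
           axistype = 0,            # no axis labels
           pcol = cols,             # polygon border colors
           pfcol = cols,            # color of the shading lines
           pdensity = c(20, 40),    # shading line density per polygon
           pangle = c(45, 135),     # shading line angle per polygon
           plty = 1, plwd = 2,
           title = "Placeholder radar chart")
legend("topright", legend = rownames(stats)[3:4],
       col = cols, lty = 1, lwd = 2, bty = "n")
```

Different pangle values keep the two hatched polygons distinguishable where they overlap.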


How the Golden State Warriors Pile Up Points

19 games into the new NBA season, the Golden State Warriors have yet to lose a game. Sportscasters are searching for words to express their surprise and fascination. The NBA regular season finally has the kind of story it has not had in years. The reigning champs play a brand of basketball that makes many of us believe we are witnessing a paradigm shift in pro basketball. Plus the Dubs got Steph Curry, who might become the first MVP to earn an MIP award.

Since it’s hard to find the right words, I thought the Warriors’ recent dominance might be a nice opportunity to say something using waffle charts. Some people describe waffle charts as rectangular pie charts. The waffle chart below shows how the Warriors pile up points: first and foremost they score a lot and they hit an unprecedented number of threes. Their 3-point shooting accounts for more than one third(!!) of their total points…

gsw + theme(legend.position="bottom") +
  ggtitle("How do the Warriors Score?")

plot of chunk waffle1

Last year’s finals foe (no stranger to the trey either), who continues to be among the NBA’s elite teams, cannot keep up by any means: fewer total points, fewer 3s, fewer 2s and fewer freebies. Plus, 3-point shots account for less than 30 percent of the Cavs’ points. Picture this: if the Warriors had not made a single free throw they would still have scored more points than the Cavs.

cle + theme(legend.position="bottom")

plot of chunk waffle2

How to create this chart

Actually I had seen such charts before, but admittedly learned just recently that people call them waffle charts. However, I always thought they were a nice alternative to the boredom and limited information of regular pie charts. Plus, they come with extra creative potential as the single squares can be replaced by icons, a technique we often encounter in well designed magazines. But let me stay with the most basic example for now.

After I knew the name it was easy to find out the obvious: there is an R package for that. Actually Bob Rudis’ waffle R package is available from CRAN and is thus easy to install. Also the package itself is very intuitive to use and even beginneRs can create waffle charts within minutes even if they have never done it before. The waffle package is built on top of Hadley’s ggplot2, which means waffle charts can be customized using the mighty ggplot2 syntax and they benefit from ggplot’s great documentation. And here’s the source code to reproduce my little example:
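A minimal sketch of such a chart follows. The point totals are made-up placeholders rather than real box-score numbers, the category labels are assumptions, and the object name gsw simply mirrors the snippet further up.

```r
library(waffle)   # Bob Rudis' package, built on ggplot2
library(ggplot2)  # for theme() and ggtitle()

# PLACEHOLDER point totals (not real stats); one square = one point
gsw_points <- c("3-pointers" = 40, "2-pointers" = 55, "Free throws" = 20)

# waffle() takes a named numeric vector and returns a ggplot object
gsw <- waffle(gsw_points, rows = 5, xlab = "1 square = 1 point")

# customize with regular ggplot2 syntax
gsw + theme(legend.position = "bottom") +
  ggtitle("How do the Warriors Score?")
```

Because waffle() returns a ggplot object, a second team’s chart can be built the same way from its own named vector and styled with the same theme() call.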

What would an internal point distribution look like?



When I started my PhD in applied economics, I thought I knew a thing or two about data juggling and statistical computing. Now I know that I didn’t; I just wish that I did.

What would it have been like had I known everything I know now about R, knitr, PostgreSQL, Git and software carpentry in general when I started? Jeeeez, let alone having a fully configured sublime text editor would have made me feel like the rabbit who all of a sudden got the gun. Not to mention all the great resources like Hadley’s documentations or a 10 million (!) question strong stackoverflow (100K+ (!) of which are questions tagged R). Or just think of visualization.

This simple thought experiment has motivated me to come up with yet another blog, despite the abundance of data related resources on the web. What’s more striking to me than the extreme number of blogs, twitter accounts and libraries that discuss data is how many of them are actually useful. Yes, there’s a library or two for data visualization. But somehow there is plenty of room for the likes of d3, chart.js, ggplot2, morris.js, rcharts, raphael and many more to peacefully co-exist. Or think about all the different approaches that bring R to the web, from opencpu and RApache to shiny, from Rstudio Server to smaller projects like plumber. And I haven’t even mentioned iPython Notebook. The point is, every single resource seems to add another perspective, help a particular use case or simply aggregate information in a field that hasn’t stopped emerging.

The aim of this blog is to be a web log in its very traditional sense. I log what I encounter on my journey through the data jungle. Gotchas and pitfalls alike. From small snippets to comprehensive manuals and discussions. From programming to statistics. From software carpentry to data driven wizardry. From serious economics to how I beat my pals in fantasy sports leagues.

I hope you enjoy the thoughts, tips, links, code examples and the crap, too. Thanks for reading!


P.S.: Thanks to @mdo for Hyde and making github look like it does.
