metaforr

Authors

Bhargav Kantheti (bk2899)

Ryuichiro Sonoda (rs4493)

Published

December 14, 2023

1 Introduction

R is fun. Why?

R is considered fun primarily due to its community and extensive package ecosystem. The active and supportive community fosters a very collaborative environment, making learning enjoyable. The wealth of packages available in R facilitates efficient data analysis, visualization, and modeling, enabling users to tackle complex tasks with ease. Moreover, the language’s flexibility encourages creativity and customization, allowing users to create tailored solutions. Its intuitive syntax and widespread use across academia and various industries further contribute to its appeal, making the journey of learning and using R an engaging and rewarding experience.

Python might let you fly, but R will make your data dance the tango.

Code
library("cowsay")
say("when in doubt, puRRR it out!", "cat")

 -------------- 
when in doubt, puRRR it out! 
 --------------
    \
      \
        \
            |\___/|
          ==) ^Y^ (==
            \  ^  /
             )=*=(
            /     \
            |     |
           /| | | |\
           \| | |_|/\
      jgs  //_// ___/
               \_)
  

Like the cat said, the innate advantage with R is how quickly we can visualize ideas.

[BUT]

Finding a good R library is demanding. ggplot2 is might not be the answer for everything. So, we have tasked ourselves to use statistical methods to explore and visualize entirety of CRAN packages using what we learnt from STATGR5702 at Columbia University.


1.1 Our Work

Results (our findings) will influence our builds.

  • Data: Find appropriate parameters to judge a package.
    1. Clean the available package’s data.
    2. Retain features that well describe a package’s relevance.
  • Results: Research the data.
    1. Relevance is measure via trending usage data and a package usage history available via pkgsearch package.
    2. Provide a comprehensive understanding of factors influencing a package’s popularity and characteristics within the ecosystem.
  • Interactive Graphs:
    1. Build a keystroke animation using d3 which animates dependency graph of queried package.
    2. Build a package suggester by tokenizing keywords associated with each package’s metadata.
    3. Build a package backlink tracer, which shows how many times another package found this package useful.

1.1.1 Where?


1.2 Outcome

We hope to make this website a one-stop-shop for anyone trying to find good R packages!


1.3 Data Sources

R packages used (for Results):

  • available [CRAN]: This package let us “Check if the Title of a Package is Available, Appropriate and Interesting”.
  • pkgsearch [CRAN]: This package helped us “Search CRAN metadata about packages by keyword, popularity, recent activity, package name and more. Uses the ‘R-hub’ search server, see https://r-pkg.org and the CRAN metadata database, that contains information about CRAN packages.”

Since our data sources themselves are packages, we think the repository will always be upto date thanks to github workflows! Here, let’s check number of packages our dataset has:

Code
suppressMessages(library(dplyr))
options(repos = 'https://cloud.r-project.org/')
available.packages() %>%
    nrow()
[1] 20200