Excel (PowerPivot)
R – Open Source
Python – Open Source
Ruby
SQL
SAS
Tableau
Crystal Ball (by Oracle)
Hadoop
Data Science Links
Business Intelligence Software Guide (by ConsumerAffairs.com)
Language/Tool: Excel (PowerPivot)
Main Purpose: the most basic (and best-known) way to process data for simple analysis
Benefits of this resource: the best-known and most widely used tool
Limitations: limits the number of observations; the add-on works with Microsoft Excel 2010 only (Excel 2013 support is in progress)
Compatible with (file types?): XLS(X)
Level of Expertise Required: Low
Where to Access Tool/Software: free add-on, download at http://www.microsoft.com/en-us/download/details.aspx?id=7609
Sources to Learn: http://technet.microsoft.com/en-us/library/gg399093.aspx
Language/Tool: R
Main Purpose: statistical analysis of datasets of any size, including large ones (1M+ rows)
Benefits of this resource:
the ggplot2 package creates many different types of useful data visualizations
robust ecosystem: whatever you want to do, there is almost certainly a package that makes it easy
open source and free
Limitations:
no GUI; some programming is required
slow on very large datasets, which it handles less easily than SAS or pandas (Python)
Compatible with (file types?):
most common formats, including CSV, XLS(X), TXT, and PSV
Level of Expertise Required: High
comfort with programming (R itself is not hard to learn, but you need to be willing to type commands instead of working through a GUI)
Where to Access Tool/Software: R itself is free at http://cran.rstudio.com; RStudio is free at http://www.rstudio.com/products/rstudio/download/
Sources to learn: https://www.coursera.org/course/rprog
Language/Tool: Python
Main Purpose: web/app development, automation, data mining and analytics, data visualization
Benefits of this resource: more readable than many programming languages, and widely used by amateur and self-taught programmers. There are also countless add-on libraries; often someone else has already written a tricky piece of code, and all you have to do is import it.
Limitations: slower than compiled languages such as C; because many libraries are open source and community-maintained, you may occasionally run into bugs
Compatible with (file types?): .py
Level of Expertise Required: High
Where to Access Tool/Software: free at https://www.python.org/downloads/; for a more complete offering with multiple libraries and a choice of command shells for easier coding and debugging, see Anaconda: https://store.continuum.io/cshop/anaconda/
Sources to learn:
Codecademy (http://www.codecademy.com/learn)
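A minimal sketch of the "just import it" point above, using only Python's standard library: the csv module parses the data and the statistics module supplies the mean and median, so neither has to be written by hand. The column names and figures are invented for the example.

```python
import csv
import io
import statistics

# A small CSV held in memory so the example needs no external file;
# in practice you would open() a file on disk instead.
RAW = """region,sales
East,120
West,90
East,80
West,105
"""


def summarize(csv_text):
    """Return the mean and median of the 'sales' column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    sales = [float(row["sales"]) for row in rows]
    # mean() and median() come from the standard library,
    # so there is no need to implement them yourself.
    return statistics.mean(sales), statistics.median(sales)


if __name__ == "__main__":
    mean, median = summarize(RAW)
    print(f"mean={mean}, median={median}")
```

Third-party libraries such as pandas follow the same pattern: install once, import, and call.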
Language/Tool: Ruby
Main Purpose: mostly web and application development with the Rails framework
Benefits of this resource: arguably the most intuitive of the popular programming languages; Rails makes it quick to build full-featured web applications
Limitations: slow; for non-Rails tasks such as scripting, Python has a larger ecosystem
Compatible with (file types?): N/A
Level of Expertise Required: High
Where to Access Tool/Software: free at http://rubyonrails.org/download/
Sources to learn:
Codecademy (http://www.codecademy.com/learn)
Language/Tool: SQL
Main Purpose: pulling data from databases and, in limited cases, manipulating that data
Benefits of this resource:
Process extremely large datasets
Widely implemented: a large share of organizations use SQL databases to store their data
Limitations: very limited tools for data analysis beyond querying
Compatible with (file types?): .sql
Level of Expertise Required: Moderate
Where to Access Tool/Software: MySQL Community Server free at http://dev.mysql.com/downloads/mysql/
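phpMyAdmin, a web-based interface for managing MySQL databases, is a convenient place to practice SQL
Sources to learn: http://sql.learncodethehardway.org (this book is a work in progress)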
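The entry names MySQL; to keep this example self-contained, the sketch below instead uses SQLite through Python's built-in sqlite3 module, since a basic SELECT with GROUP BY reads much the same in either database. The sales table and its columns are hypothetical. It shows the typical pattern: load rows into a table, then let a query aggregate and sort them.

```python
import sqlite3


def totals_by_region(rows):
    """Load (region, amount) rows into an in-memory SQLite database and
    return total sales per region, largest first."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        # Parameter placeholders (?) let the driver quote values safely.
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # The query does the "pulling": group and sort inside the database.
        cur = conn.execute(
            """
            SELECT region, SUM(amount) AS total
            FROM sales
            GROUP BY region
            ORDER BY total DESC
            """
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [("East", 120.0), ("West", 90.0), ("East", 80.0), ("West", 105.0)]
    for region, total in totals_by_region(data):
        print(region, total)
```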
Language/Tool: SAS
Main Purpose: Clean and combine large datasets for easier use; perform statistical analysis of the data
Benefits of this resource:
Process extremely large datasets
Easy to clean, manipulate, and merge datasets
Limitations: Poor visualization
Compatible with (file types?): XLS(X), CSV, TXT, ACCDB, ...
Level of Expertise Required: High
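Where to Access Tool/Software: paid license (expensive); a free SAS University Edition for students and learners is available at http://www.sas.com/en_us/software/university-edition.html#for-students-learners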
Sources to learn:
The Little SAS Book: A Primer by Lora Delwiche and Susan Slaughter
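To make "clean, manipulate, and merge" concrete, here is a plain-Python sketch (not SAS code) of the two steps: tidy a join key (trim spaces, fix case, drop blanks), then combine two datasets on that key. It keeps only keys found in both datasets, like an inner join; the function names and sample records are hypothetical. In SAS the same job would typically be a DATA step with MERGE and BY statements.

```python
def clean(rows, key):
    """Trim whitespace and upper-case the join key; drop rows whose key is blank."""
    cleaned = []
    for row in rows:
        value = (row.get(key) or "").strip().upper()
        if value:
            cleaned.append({**row, key: value})
    return cleaned


def merge_by_key(left, right, key):
    """Inner join: combine each left row with every right row sharing its key."""
    # Index the right-hand dataset by key so each lookup is a dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged


if __name__ == "__main__":
    customers = [
        {"id": " a1", "name": "Ann"},
        {"id": "B2 ", "name": "Bo"},
        {"id": "", "name": "No ID"},  # dropped by clean()
    ]
    orders = [
        {"id": "A1", "total": 50},
        {"id": "b2", "total": 20},
        {"id": "C3", "total": 5},  # no matching customer, dropped by the join
    ]
    for row in merge_by_key(clean(customers, "id"), clean(orders, "id"), "id"):
        print(row)
```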
Language/Tool: Tableau
Main Purpose: Visualize data
Benefits of this resource: easy to use, especially if you already know how to use pivot tables; makes a wide variety of dynamic visuals, including location-based (map) visuals
Limitations: some calculated variables are hard to create
Compatible with (file types?): XLS(X), CSV
Level of Expertise Required: Low
Where to Access Tool/Software: free for students at http://www.tableausoftware.com/academic/students
Sources to learn:
http://www.tableausoftware.com/learn/training
Language/Tool: Crystal Ball (by Oracle)
Main Purpose: spreadsheet-based application for predictive modeling, forecasting, simulation, and optimization
Benefits of this resource:
Builds on Monte Carlo simulation and predictive modeling tools
Provides optimization and calculation capabilities
Limitations: focused on predictive and optimization analytics; runs inside Excel
Compatible with (file types?): XLS(X)
Level of Expertise Required: Low
Where to Access Tool/Software: free for NYU Stern students at https://apps.stern.nyu.edu (requires a Citrix plugin download)
Sources to Learn: http://www.oracle.com/us/products/applications/crystalball/resources/index.html (under the resources tab)
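Crystal Ball itself runs inside Excel, so this is not its API. It is a plain-Python sketch of the Monte Carlo idea the entry refers to: replace fixed spreadsheet inputs with probability distributions, recompute the output thousands of times, and read off the spread of results. The profit model and every distribution parameter below are invented for illustration.

```python
import random
import statistics


def simulate_profit(trials, seed=0):
    """Monte Carlo sketch: draw uncertain inputs many times and
    collect the resulting profit for each draw."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    results = []
    for _ in range(trials):
        units = rng.gauss(1000, 100)  # demand: normal, mean 1000, sd 100 (assumed)
        price = rng.uniform(9.0, 11.0)  # price: uniform between 9 and 11 (assumed)
        unit_cost = rng.triangular(5.0, 8.0, 6.0)  # cost: low 5, high 8, mode 6 (assumed)
        results.append(units * (price - unit_cost))
    return results


if __name__ == "__main__":
    profits = simulate_profit(10_000)
    mean = statistics.mean(profits)
    # Share of trials in which profit falls below a hypothetical target of 3,000.
    p_below = sum(p < 3000 for p in profits) / len(profits)
    print(f"mean profit about {mean:,.0f}; P(profit < 3,000) about {p_below:.1%}")
```

The output of interest is the whole distribution of profits, not a single number, which is what a Monte Carlo forecast adds over a one-scenario spreadsheet.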
Language/Tool: Hadoop
Main Purpose: Apache™ Hadoop® is an open source software project that enables distributed processing of large datasets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines with a very high degree of fault tolerance. Rather than relying on high-end hardware, these clusters get their resilience from the software's ability to detect and handle failures at the application layer.
Benefits of this resource:
Very robust for large datasets
Does not require high-end hardware
Open source; commercial vendors such as Cloudera and IBM (BigInsights) offer more sophisticated software built on the Hadoop platform
Limitations: built for storing and batch-processing big data rather than for visualizing it; core Hadoop is not SQL-based (SQL-style querying comes from add-on projects such as Apache Hive)
Level of Expertise Required: Moderate
Where to Access Tool/Software:
Apache Hadoop: http://hadoop.apache.org/releases.html
Cloudera: http://www.cloudera.com/content/cloudera/en/downloads.html
SAS: http://www.sas.com/en_us/insights/big-data/hadoop.html
IBM: http://www-01.ibm.com/software/data/infosphere/hadoop/products.html
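Hadoop's classic processing model, MapReduce, splits a job into a map step that runs in parallel on chunks of the data and a reduce step that combines the partial results. The sketch below simulates the map, shuffle, and reduce stages of a word count in a single Python process; it uses no Hadoop API and only shows the data flow that Hadoop distributes across machines. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, each mapper would process a different block of the input
    # on a different machine; here everything runs in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "commodity servers handle big data"]
    print(word_count(docs))
```

Because each mapper only sees its own lines and each reducer only sees one key's values, both stages can be spread over many machines, which is where Hadoop's scalability comes from.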
SPSS
SAS – staff and PhD students only
http://www.sas.com/en_us/software/university-edition.html
Both are expensive.
Both are available in NYU Bobst Library, 5th floor.
NYU Data Services publishes a list of available software, and its helpers can answer questions about how to use it.
R is open source and can be downloaded free at R-project.org.
Crystal Ball
All are available on:
https://apps.stern.nyu.edu
David Frederick
Data Science:
Dataquest, a Codecademy-style course for data scientists
https://dataquest.io
Open Source Data Science Masters
http://datasciencemasters.org/
Johns Hopkins - Data Science Specialization Certification
https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage
Duke University course on Data Analysis and Statistical Inference
https://www.coursera.org/course/statistics
Cognitir Webinar Series - Data Science for Finance
http://www.cognitir.com/webinars/big-data-finance/watch
Excel Power Pivot
Free Add-on for download
http://www.microsoft.com/en-us/download/details.aspx?id=7609
Where to learn:
http://technet.microsoft.com/en-us/library/gg399093.aspx
R
Free for Stern students (requires a Citrix plugin download) at:
https://apps.stern.nyu.edu
Other free sources:
http://www.r-project.org
http://cran.rstudio.com
http://www.rstudio.com/products/rstudio/download/
Where to learn:
https://www.coursera.org/course/rprog
Sample Data Sets:
http://datasciencemasters.org
Python
Free sources:
https://www.python.org/downloads/
For a more complete offering with multiple libraries and a few choices of command shells for easier coding and debugging:
https://store.continuum.io/cshop/anaconda/
http://davebackus.github.io/Data_Bootcamp/install.html#python-on-your-computer
Where to learn:
http://www.codecademy.com
https://developers.google.com/edu/python/
https://dataquest.io
http://learnpythonthehardway.org
Sample Data Sets:
http://datasciencemasters.org
Ruby
Free sources:
https://www.ruby-lang.org/en/
Sources to learn:
http://www.codecademy.com
http://learnrubythehardway.org (This book is a work in progress)
SQL
Free sources and datasets:
http://dev.mysql.com/downloads/mysql/
http://www.phpmyadmin.net/home_page/index.php
Sources to learn:
http://sql.learncodethehardway.org (This book is a work in progress)
SAS
Free sources:
http://www.sas.com/en_us/software/university-edition.html#for-students-learners
Sources to learn:
The Little SAS Book: A Primer by Lora Delwiche and Susan Slaughter
Tableau
Free sources:
http://www.tableausoftware.com/academic/students
Sources to learn:
http://www.tableausoftware.com/learn/training
Crystal Ball
Free for Stern students (requires a Citrix plugin download) at:
https://apps.stern.nyu.edu
Sources to learn:
http://www.oracle.com/us/products/applications/crystalball/resources/index.html (under the resources tab)
General resources for coders and data scientists
www.coursera.org
www.codecademy.com