Borough-by-borough analysis of 2012 SAT scores
The R script for this analysis can be found here
The data are the most recent school-level SAT results for New York City, from Datazar.
The data set contains the following columns:
- Unique school identifier
- School Name
- Number of SAT Test Takers
- SAT Critical Reading Average Score
- SAT Math Average Score
- SAT Writing Average Score
In this analysis, I remove the rows with missing (or incomplete) data; these rows have an ‘s’ in place of the SAT information. The borough can be obtained from the unique school identifier. With the borough attached, we can visually compare the distribution of each SAT score across boroughs. Finally, we use t-tests to compare the borough means and determine which scores are statistically the same (the null hypothesis is that the difference in means is 0).
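The cleaning and borough-extraction steps can be sketched roughly as follows. This is a minimal Python illustration, not the author's R script. It assumes the unique school identifier follows the NYC DBN layout (two-digit district, one borough letter, three-digit school number, so the borough letter is the third character) and that the letters map as M, X, K, Q, R. The column names and every row in `SAMPLE_CSV` are made up for illustration.

```python
import csv
import io

# Assumed borough letters in the third character of a DBN such as "01M000".
BOROUGH_BY_CODE = {
    "M": "Manhattan",
    "X": "Bronx",
    "K": "Brooklyn",
    "Q": "Queens",
    "R": "Staten Island",
}

# Made-up sample rows for illustration only; these are not real results.
SAMPLE_CSV = """DBN,School Name,Num Test Takers,Reading,Math,Writing
01M000,Example High School A,50,400,410,390
02M001,Example High School B,s,s,s,s
07X002,Example High School C,30,380,385,370
31R003,Example High School D,120,450,460,445
"""

SCORE_COLUMNS = ("Reading", "Math", "Writing")


def load_clean_rows(text):
    """Parse CSV text, drop suppressed rows, and attach a borough name."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Suppressed results carry the placeholder 's' instead of a number.
        if any(row[col].strip().lower() == "s" for col in SCORE_COLUMNS):
            continue
        cleaned = {
            "dbn": row["DBN"],
            "school": row["School Name"],
            # Third character of the identifier is the assumed borough letter.
            "borough": BOROUGH_BY_CODE[row["DBN"][2].upper()],
        }
        for col in SCORE_COLUMNS:
            cleaned[col.lower()] = int(row[col])
        rows.append(cleaned)
    return rows


rows = load_clean_rows(SAMPLE_CSV)
for r in rows:
    print(r["dbn"], r["borough"], r["reading"], r["math"], r["writing"])
```

Dropping a school when any of its three scores is suppressed keeps the same set of schools in every per-score comparison.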
- We find that 12 of the pairwise score comparisons between the 5 boroughs are statistically similar.
- All 3 of the scores (Reading, Math, Writing) are statistically similar among Manhattan, Queens and Staten Island.
  - This accounts for 9 of the 12 (M = Q, M = SI, Q = SI for each of the 3 scores).
- All 3 of the scores (Reading, Math, Writing) are statistically similar between Brooklyn and the Bronx, which accounts for the remaining 3.
- The scores of the Manhattan group (Manhattan, Queens, Staten Island) are higher than those of the Brooklyn group (Brooklyn, the Bronx).
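To make the comparison of means concrete, here is a minimal Python sketch of a two-sample t statistic. The write-up does not say which variant of the t-test was used, so this assumes Welch's unequal-variance form. The function name `welch_t` and both score lists are hypothetical and made up for illustration. Turning the statistic into a p-value requires the t distribution, which a statistics library would normally supply.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 denominator);
    # dividing by n gives the squared standard error of each mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # Null hypothesis: the difference in means is 0.
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Made-up school averages for two boroughs; not real results.
borough_a_math = [400, 420, 410, 430, 440]
borough_b_math = [380, 390, 400, 385, 395]

t_stat, df = welch_t(borough_a_math, borough_b_math)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A large absolute t value relative to the t distribution with `df` degrees of freedom is evidence against the null hypothesis. Two boroughs would be called statistically similar on a score when the test fails to reject it.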
Potential Future Projects
I think this dataset could be used in conjunction with other data to understand the different boroughs. In the future we could map each school to its zip code and use information about that zip code to help understand the effects of the community on SAT scores. For example, if we could link average income or the number of businesses per zip code, we could build a model to continue comparing boroughs or do more exploratory analysis.
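The linking step could look something like the sketch below. It is a hypothetical Python illustration only: the three lookup tables, the function name `join_scores_with_income`, and every value in them are placeholders made up for this example, not data from the analysis.

```python
# Hypothetical lookups; every value below is made up for illustration.
school_zip = {"01M000": "10002", "07X002": "10451", "31R003": "10301"}
zip_income = {"10002": 40000, "10451": 30000, "10301": 60000}
school_math = {"01M000": 410, "07X002": 385, "31R003": 460}


def join_scores_with_income(scores, zips, incomes):
    """Return (dbn, zip, income, score) tuples for schools found in all lookups."""
    joined = []
    for dbn, score in scores.items():
        zip_code = zips.get(dbn)
        income = incomes.get(zip_code)
        if zip_code is None or income is None:
            # Skip schools that cannot be placed in a zip code with income data.
            continue
        joined.append((dbn, zip_code, income, score))
    return joined


joined = join_scores_with_income(school_math, school_zip, zip_income)
for dbn, zip_code, income, score in joined:
    print(dbn, zip_code, income, score)
```

The joined rows could then feed a model or a scatter plot of score against income.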
I would like to thank @CoolDatasets on Twitter for initially telling me about this dataset.
I would appreciate any feedback on the analysis and code. You can email me here
The data was first found here