cs/ - Contains all results with the query "computer*" - 27,822 results all/ - Contains all results with the query "u*" (experimentally the most you can get with one query) - 271,807 results
The schema of the all.csv
file which contains all the content on GradCafe and the allgrad
table in all.sql
is:
Column Name | Type | Description |
---|---|---|
rowid | INTEGER PRIMARY KEY | A unique integer ID identifying the row. There are 271,807 rows. |
uni_name | TEXT | The name of the university. The uncleaned field is user-supplied, and very noisy, containing 10,297 distinct strings. The cleaned version reduces this number to 2708. 98.5% of university names are clean. |
major | TEXT | The intended major. This field isn't cleaned and is user-supplied and also noisy. It contains 18,957 distinct strings, the most common of which are "Computer Science", "Economics" and "English". |
degree | TEXT(5) | The degree to be earned. This field is cleaned and takes the following values: "PhD", "MS", "MEng", "MBA", "MFA", "MA", and "Other". The top 3 are "PhD", "MS" and "Other". |
season | TEXT(3) | The season is a three letter string of the form [SF][0-9]{2}. "S" is for admission into the Spring semester and "F" represents Fall. The two numbers represent the year for which admission is being sought. |
decision | TEXT(15) | The decision being reported. This field takes the following values: "Accepted", "Rejected", "Wait listed", "Interview" and "Other". |
decision_method | TEXT(15) | The method in which the decision was reported. The field takes the following values: "E-mail", "Website", "Phone", "Postal Service" and "Other". |
decision_date | TEXT(10) | The date the decision was made in the form "dd-mm-yyyy". |
decision_timestamp | INTEGER | The timestamp since epoch that the decision was made. |
ugrad_gpa | FLOAT | The candidate's self-reported undergraduate GPA. Typically on a 4.0 scale, but often scores on 10.0 scales are reported with no clear disambiguation. |
gre_verbal | INTEGER | The candidate's self-reported GRE Verbal score. If is_new_gre is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive. |
gre_quant | INTEGER | The candidate's self-reported GRE Quantitative score. If is_new_gre is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive. |
gre_writing | FLOAT | The candidate's self-reported GRE Writing score. It is on a scale of 0.0 to 6.0. |
is_new_gre | INTEGER | Whether or not the candidate took the new GRE examination (where scores range from 130 to 170) or not. |
gre_subject | INTEGER | The candidate's self-reported GRE Subject Test score. It can range in the 900s. Presumably, given that this is a CS dataset, I'd assume the subject in question is Computer Science. |
status | TEXT(28) | The status of the candidate. Can take on 4 different values - "American", "International", "International with US degree" and "Other". |
post_data | TEXT(10) | The date on which this report was posted by the candidate in the form "dd-mm-yyyy". |
post_timestamp | INTEGER | The timestamp since epoch that the post was made. |
comments | BLOB | All user added comments to the post he submitted |
The schema of the cs.csv
file, which contain all results that have word beginning with "computer", and the computer
table in cs.sql
is:
Column Name | Type | Description |
---|---|---|
rowid | INTEGER PRIMARY KEY | A unique integer ID identifying the row |
uni_name | TEXT | The name of the university. The uncleaned field is user-supplied, and very noisy, containing 2325 distinct strings. The cleaned version reduces this number to 415. |
major | TEXT | The intended major. This field is cleaned and takes the following values: "CS", "ECE", "HCI", "IS" and "Other". |
degree | TEXT(5) | The degree to be earned. This field is cleaned and takes the following values: "PhD", "MS", "MEng", "MBA", "MFA" and "Other". |
season | TEXT(3) | The season is a three letter string of the form [SF][0-9]{2}. "S" is for admission into the Spring semester and "F" represents Fall. The two numbers represent the year for which admission is being sought. |
decision | TEXT(15) | The decision being reported. This field takes the following values: "Accepted", "Rejected", "Wait listed", "Interview" and "Other". |
decision_method | TEXT(15) | The method in which the decision was reported. The field takes the following values: "E-mail", "Website", "Phone", "Postal Service" and "Other". |
decision_date | TEXT(10) | The date the decision was made in the form "dd-mm-yyyy". |
decision_timestamp | INTEGER | The timestamp since epoch that the decision was made. |
ugrad_gpa | FLOAT | The candidate's self-reported undergraduate GPA. Typically on a 4.0 scale, but often scores on 10.0 scales are reported with no clear disambiguation. |
gre_verbal | INTEGER | The candidate's self-reported GRE Verbal score. If is_new_gre is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive. |
gre_quant | INTEGER | The candidate's self-reported GRE Quantitative score. If is_new_gre is 1, this field should be between 130 and 170 inclusive. If 0, then it should be between 200 and 800 exclusive. |
gre_writing | FLOAT | The candidate's self-reported GRE Writing score. It is on a scale of 0.0 to 6.0. |
is_new_gre | INTEGER | Whether or not the candidate took the new GRE examination (where scores range from 130 to 170) or not. |
gre_subject | INTEGER | The candidate's self-reported GRE Subject Test score. It can range in the 900s. Presumably, given that this is a CS dataset, I'd assume the subject in question is Computer Science. |
status | TEXT(28) | The status of the candidate. Can take on 4 different values - "American", "International", "International with US degree" and "Other". |
post_data | TEXT(10) | The date on which this report was posted by the candidate in the form "dd-mm-yyyy". |
post_timestamp | INTEGER | The timestamp since epoch that the post was made. |
comments | BLOB | All user added comments to the post he submitted |