SPSS IBM Text Mining GradPack

Having recently completed the course “IBM SPSS Modeler & Data Mining” offered by Global Knowledge, I was looking for more opportunities to do some modeling with SPSS Modeler. So, when I recently read in the news about college recruiters using predictive techniques to determine the probability of a particular recruit graduating on time, I thought it would be interesting to explore that idea.

My college wants to determine whether a recruit will graduate on time. The institution can draw a sample from its historical data and, using this sample, possibly predict whether a particular recruit would graduate on time. Typically, such a dataset will include a field that indicates the behavior, here: has the student graduated on time? Yes or no.

The college recruiters have a hunch that on-time graduation differs between students who are athletes and students who are not, and between students who participate in collegiate activities and those who do not. Based on this hunch, they investigate whether there are any differences in the graduating-on-time statistics by cross-tabulating on “athlete”.

In IBM SPSS Modeler, it is very simple to cross-tabulate data using the Matrix node. You can simply drop it into your stream, connect it to your source data and set some parameters. For example, I set “Rows” to the field in my file “graduate on time” and “Columns” to “athlete”. (I also went to the “Appearance” tab and selected “Counts” and “Percentage of column” for my “Cross-tabulation cell contents”.) After clicking “Run”, the output is ready for review.

SPSS Modeler also provides the Distribution node, which lets you show the occurrence of symbolic (non-numeric) values, in this case “graduated on time” or “athlete”, in our dataset. A typical use of the Distribution node is to show imbalances in the data (which can be rectified by using a Balance node before creating a model). What I did was use the node to plot “athlete” overlaid with “graduate on time” for an interesting perspective.

Looking at my cross-tabulation output, it appears that 93% of the non-athlete students did not graduate on time, while only 13% of the students who were athletes did not graduate on time. The question now is: can this difference be attributed to chance (because only a sample was drawn), or does the difference in the sample reflect a true difference in the population of all students? The Chi-Square test is a statistical test used to answer this question. This test gives the probability that the difference between athletes and non-athletes can be attributed to chance.

CHAID, or Chi-squared Automatic Interaction Detection, is a classification method for building decision trees by using chi-square statistics to identify optimal splits. Again, SPSS Modeler offers the “CHAID node” that can be dropped into a stream and configured. In my exercise, I set my CHAID target to “graduate on time” and my predictors to “activities” and “athlete”.

My results are presented in the viewer, which shows a “tree” to present the data. The initial node shows the breakdown of graduate on-time vs. not on-time, and then Modeler broke out the next level as students who did not participate in activities and those who did. The exercise found the probability (P-Value) to be 0, so the probability is 0 that the difference between students involved in activities vs. those who are not can be attributed to chance.
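For readers who want to see the arithmetic behind the cross-tabulation and the Chi-Square test described in this post, here is a minimal Python sketch. It is not how the Matrix or CHAID nodes are implemented in SPSS Modeler. The records are made up for illustration and are not my dataset, and the names `records`, `crosstab` and `chi_square_2x2` are my own. The p-value shortcut via `erfc` only holds for a 2x2 table (one degree of freedom), and no continuity correction is applied.

```python
from collections import Counter
from math import erfc, sqrt

# Hypothetical (athlete, graduated_on_time) records, invented for illustration.
# These are NOT the counts from the exercise in this post.
records = (
    [("yes", "yes")] * 40 + [("yes", "no")] * 10
    + [("no", "yes")] * 15 + [("no", "no")] * 35
)


def crosstab(rows):
    """Count how often each (athlete, graduated_on_time) pair occurs."""
    return Counter(rows)


def chi_square_2x2(counts, levels=("yes", "no")):
    """Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    total = sum(counts.values())
    # Marginal totals per athlete level and per graduation level.
    athlete_totals = {a: sum(counts[(a, g)] for g in levels) for a in levels}
    grad_totals = {g: sum(counts[(a, g)] for a in levels) for g in levels}

    stat = 0.0
    for a in levels:
        for g in levels:
            # Count expected in this cell if athlete and graduation were independent.
            expected = athlete_totals[a] * grad_totals[g] / total
            stat += (counts[(a, g)] - expected) ** 2 / expected

    # For one degree of freedom, P(chi2 > stat) equals erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


counts = crosstab(records)
stat, p_value = chi_square_2x2(counts)

for a in ("yes", "no"):
    column_total = counts[(a, "yes")] + counts[(a, "no")]
    share_late = counts[(a, "no")] / column_total
    print(f"athlete={a}: {share_late:.0%} did not graduate on time")
print(f"chi-square = {stat:.2f}, p-value = {p_value:.2g}")
```

A very small p-value, like the 0 that Modeler reported in my exercise, says that a difference this large would be very unlikely to show up in a sample if there were no real difference in the full population.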