
Hey Folks,

At the moment I am using PROC GLMSELECT as a first-stage investigation of potentially valid and stable regressors (out-of-sample validation and ASE being the key features I want to leverage here). My dependent is a 1/0 binomial, but I was hoping I could get some syntax, or use another PROC (perhaps TRANSREG?), to transform it into the logit input. My problem is that I am using a cross section and I am not too sure about the best syntax or PROC to use. Your thoughts and suggestions are much appreciated!


3 Answers


Hello,

You could use PROC LOGISTIC, which lets you perform stepwise selection for a binary dependent with both quantitative and categorical independent variables.

Here is a small example:

PROC LOGISTIC DATA=test;
CLASS z;  /* z is a categorical regressor */
MODEL x(EVENT='1') = Y z / SELECTION=BACKWARD;  /* x is the 1/0 dependent */
RUN;

Does it help?

Kind regards,
CC


Hello Toloc, that is good to know indeed, and PROC LOGISTIC also offers stepwise selection with a ROC chart as an output. However, the key reason why I am keen on processing this through GLMSELECT is that it is the only PROC I know of which can base the selection on the average error on the validation sample. The risk of overfitting is therefore somewhat reduced, even more so when I also include a third sample which has never been used. It would be great to be able to use PROC LOGISTIC or PROC PROBIT for such a modelling process, but I don't have access to such a macro (feel free to fill that blank!). Thus the intermediate solution is to use PROC GLMSELECT, but that means I need to directly input the maximum likelihood formula as an objective function. How does one code this? Thanks again everyone, I feel this is going to be an interesting thread!
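
For reference, the validation-based selection described above could be set up roughly as follows. This is only a minimal sketch: the data set mydata, the dependent y, the regressors x1-x10, the seed and the split fractions are all hypothetical placeholders. Note that GLMSELECT fits by least squares, so with a 1/0 dependent this amounts to a linear probability model rather than a logit.

```sas
/* Hypothetical names: mydata, y (1/0 dependent), x1-x10 (candidate regressors) */
proc glmselect data=mydata seed=12345;
   /* Randomly hold out 30% for validation and 20% as an untouched test sample */
   partition fraction(validate=0.3 test=0.2);
   /* Stepwise search; the final model is the step with the lowest validation ASE */
   model y = x1-x10 / selection=stepwise(choose=validate);
run;
```

The test fraction plays no part in the selection, so its ASE gives a check on a sample that has never been used.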


Hello Olivier,

To select the model directly on the average error of the validation sample, you can use the Regression node in Enterprise Miner, if you have it.

Personally, for selection purposes, I prefer to visualize the ROC performance for the training, validation and test sets on a chart for the different models tested by the algorithm. You can then check where the performance on the test set (the one not used) is at or close to its peak with fewer drivers. This can also be done with PROC LOGISTIC and some macro coding.
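
As a rough sketch of the PROC LOGISTIC route, assuming the data have already been split into three data sets: the names train, valid and test, the dependent y and the regressors x1-x10 are hypothetical. The SCORE statement can write ROC coordinates for held-out data through OUTROC=, and FITSTAT prints fit statistics, including the AUC, for each scored set.

```sas
/* Hypothetical data sets: train, valid, test; y is the 1/0 dependent */
proc logistic data=train;
   /* Stepwise selection is run on the training sample only */
   model y(event='1') = x1-x10 / selection=stepwise;
   /* Score the held-out samples with the selected model;       */
   /* FITSTAT needs y to be present in the scored data sets     */
   score data=valid out=valid_scored outroc=roc_valid fitstat;
   score data=test  out=test_scored  outroc=roc_test  fitstat;
run;
```

The roc_valid and roc_test data sets hold the sensitivity (_SENSIT_) and 1 - specificity (_1MSPEC_) at each cutoff, so the curves can be overlaid on one chart. Comparing several candidate models would mean wrapping this step in a macro loop.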

Regards,
Toloc

Hello Toloc, unfortunately I don't have EM (not rich enough!). The ROC idea rocks :-). But I still wonder what the model line code for the maximum likelihood formula is. If anyone has any ideas that would be great. Meanwhile I am going to dig in my old textbook (if I can find it!) – Olivier Nov 25 at 10:39
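
For reference, the textbook log-likelihood that a logit fit maximizes for a 1/0 dependent can be written as below. The symbols are introduced here for illustration and do not come from the thread: y_i is the 1/0 dependent for observation i, x_i the vector of regressors, beta the coefficients and n the number of observations.

```latex
\ell(\beta) = \sum_{i=1}^{n} \Bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Bigr],
\qquad
p_i = \frac{1}{1 + \exp(-x_i^{\top} \beta)}
```

PROC LOGISTIC maximizes this function internally, whereas GLMSELECT minimizes a sum of squared errors, which is why the two procedures do not give the same fit on a 1/0 dependent.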
