
You might want to look at Boltzmann/softmax exploration if you want to weight the probability of selecting each option as a function of its current estimated value. One tricky bit is figuring out a good setting for the temperature parameter. Another poster alluded to softmax. In my experience it doesn't really perform better than a simple epsilon-greedy approach, but maybe it has worked well for others?
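For anyone who wants to compare the two side by side, here is a rough sketch in Python. The function names (`softmax_probs`, `softmax_select`, `epsilon_greedy_select`) and the example values are my own for illustration, not from any particular library, and it assumes you already keep a list of estimated values, one per option.

```python
import math
import random


def softmax_probs(values, temperature):
    """Return Boltzmann/softmax selection probabilities for the estimated values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max before exponentiating to avoid overflow;
    # this does not change the resulting probabilities.
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(values, temperature, rng=random):
    """Sample an option index with probability given by softmax_probs."""
    probs = softmax_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def epsilon_greedy_select(values, epsilon, rng=random):
    """With probability epsilon pick uniformly at random, else pick the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    estimates = [0.2, 0.5, 0.1]  # hypothetical estimated values
    for t in (0.05, 0.5, 5.0):
        print(t, [round(p, 3) for p in softmax_probs(estimates, t)])
```

The temperature is what makes this fiddly: a small value makes the choice nearly greedy, a large value makes it nearly uniform, and the useful range depends on the scale of your value estimates.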

