My thought: start simple, then maybe try adding more variables. Does performance get better? Can you tell which variables really helped the machine learning algorithm predict better?
At first, we decided to go with age, sex, country, outcome, and all of the chronic diseases and symptoms. But we may have other ideas along the way...
We did what Dr. Stonedahl suggested: we started with only age, sex, and country, then added more variables. Performance only changed significantly when we added symptoms of pneumonia.
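One way to make the "start simple, then add variables" comparison systematic is to retrain after each added group of variables and record the score on held-out data. The sketch below is a minimal illustration on made-up data, not the model or dataset used here. The column names (`age_group`, `fever`, `pneumonia`), the toy data generator, and the majority-vote lookup classifier are all hypothetical stand-ins for the real data and machine learning algorithm.

```python
import random
from collections import Counter, defaultdict


def make_toy_data(n, seed):
    """Hypothetical toy records; the outcome is built to depend only on pneumonia."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        pneumonia = rng.randint(0, 1)
        # Assumed rule for illustration: outcome follows pneumonia, with 10% of labels flipped as noise.
        outcome = pneumonia if rng.random() > 0.1 else 1 - pneumonia
        rows.append({
            "age_group": rng.choice(["<40", "40-65", ">65"]),
            "sex": rng.choice(["F", "M"]),
            "country": rng.choice(["A", "B", "C"]),
            "fever": rng.randint(0, 1),
            "pneumonia": pneumonia,
            "outcome": outcome,
        })
    return rows


def fit_lookup(rows, features, target="outcome"):
    """Deliberately simple classifier: majority outcome for each combination of feature values."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[f] for f in features)][row[target]] += 1
    # Fall back to the overall majority outcome for combinations never seen in training.
    overall = Counter(row[target] for row in rows).most_common(1)[0][0]
    table = {key: counts.most_common(1)[0][0] for key, counts in groups.items()}
    return lambda row: table.get(tuple(row[f] for f in features), overall)


def accuracy(model, rows, target="outcome"):
    """Fraction of rows whose predicted outcome matches the true outcome."""
    return sum(model(row) == row[target] for row in rows) / len(rows)


def compare_feature_sets(train, test, base, additions):
    """Score the base feature set, then re-score after adding each group of extra features."""
    features = list(base)
    results = [(tuple(features), accuracy(fit_lookup(train, features), test))]
    for extra in additions:
        features = features + list(extra)
        results.append((tuple(features), accuracy(fit_lookup(train, features), test)))
    return results


if __name__ == "__main__":
    train, test = make_toy_data(600, seed=1), make_toy_data(200, seed=2)
    for feats, acc in compare_feature_sets(
        train, test, ["age_group", "sex", "country"], [["fever"], ["pneumonia"]]
    ):
        print(f"{', '.join(feats)}: accuracy {acc:.2f}")
```

Because the toy outcome is generated from `pneumonia`, accuracy should only jump once that column is added; this shows the mechanics of the comparison, not evidence about the real data. With the real dataset, the same loop makes it easier to see which added variables actually change the score.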
We used age, sex, country, and several specific symptoms. That was the best-performing combination of all the ones we tried.