How to deal with No Data Issues?

posted in   KDD CUP of Fresh Air

2018-05-30 06:40



2018-05-31 06:02


2018-05-31 07:49




2018-05-31 09:14



Jack Strong

2018-05-31 09:51


<p>I agree with the suggestion to extend the evaluation period. The last ten days are the most important part of this competition: they show what the final models can do, and the final period is usually the most competitive because teams are catching up. I believe this is why the host set up a separate award for the last 10 days. Teams have been working on their last-10-days models since May 21. As I see it, there are three options for now:</p> <ol> <li><p>Evaluate on the days with damaged data.<br>Those three days have a large influence on the last-ten-days score. Evaluating on damaged data is clearly not consistent with the competition rules, and the result would be biased and random.</p> </li><li><p>Simply discard the three damaged days.<br>The evaluation would then no longer cover the full 31 days. For the last-10-days award, 7 days are insufficient and not consistent with the competition rules, especially since the other 7 days were affected by very unusual sandstorm weather and long delays in the weather forecast. Using data from before May 20 could bring the total back up to 10 days, but then the last-days evaluation would rely on results from the middle period, which is inconsistent with the rules and would not reflect the ability of the final models.</p> </li><li><p>Discard the three damaged days and extend the evaluation by three days. The 10-day evaluation is then complete and unbiased, and the full 31-day evaluation also uses correct data.</p> </li></ol> <p>If the three damaged days are either used for evaluation or simply discarded, it is hard to say that the awards, especially the last-ten-days award, carry the same value as those of the KDD Cup over the previous 20 years.</p> <p>Thanks to the chairs for kindly considering our situation. I hope this year’s winner is well qualified and that the KDD Cup stays respected in the future.</p>
  • unique reply Jack Strong

    2018-05-31 17:08


    <p>Thanks, Jack, for the analysis. Using the damaged data or just discarding the three days is certainly not reasonable. Besides extending the competition, another option is to use London only, so that the data is clean and the 31-day scoring is also kept as expected.</p>

Christo Palaskas

2018-05-31 17:31


<p>I don’t agree with changing anything that has been agreed upon from the beginning. These things happen in the real world: sometimes you have data, sometimes you don’t. This will also show who has developed better tolerance mechanisms for missing data. We missed a couple of days, so what? Everybody missed those days. The competition is supposed to end today, not drag on because of reasons that come up later.</p>


2018-05-31 18:33


<p>I don’t agree with changing the rules. As long as the rules apply to everyone, it is a fair game.</p> <p>There is a lot of missing data in your training data too; we all knew the data quality was not perfect when we started working on the competition.</p>


2018-06-01 02:31


