Why not code binary inputs as 0 and 1?
If the input data are distributed over a range of [1,2] as shown at lines1to2.gif, the situation is even worse. If the input data are distributed over a range of [9,10] as shown at lines9to10.gif, very few of the initial hyperplanes pass through the region at all, and it will be difficult to learn any but the simplest classifications or functions. It is also bad to have the data confined to a very narraw range such as [-0.1,0.1], as shown at lines-0.1to0.1.gif, since most of the initial hyperplanes will miss such a small region. Thus it is easy to see that you will get better initializations if the data are centered near zero and if most of the data are distributed over an interval of roughly [-1,1] or [-2,2]. If you are firmly opposed to the idea of standardizing the input variables, you can compensate by transforming the initial weights, but this is much more complicated than standardizing the input variables. Standardizing input variables has different effects on different training