# Machine learning: how to use the past 20 rows as an input for X for each Y value

Jake Shakeswor 2022-5-6

I have a very simple machine learning code here:

```python
import pandas

# load dataset
dataframe = pandas.read_csv("USDJPY,5.csv", header=None)
dataset = dataframe.values
X = dataset[:, 0:59]
Y = dataset[:, 59]

# fit Dense Keras model (model, x and y_test are defined elsewhere)
model.fit(X, Y, validation_data=(x, y_test), epochs=150, batch_size=10)
```

My X values are 59 features with the 60th column being my Y value, a simple 1 or 0 classification label.

Since I am using financial data, I would like to look back over the past 20 X values in order to predict the Y value.

So how could I make my algorithm use the past 20 rows as an input for X for each Y value?

I'm relatively new to machine learning and have spent a lot of time looking online for a solution, yet I could not find anything as simple as my case.

### Answer

This is typically done with Recurrent Neural Networks (RNNs), which retain some memory of the previous input when the next input is received. That's a very brief explanation of what goes on, but there are plenty of sources on the internet that explain how they work in more depth.
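To make the "memory" idea concrete, here is a minimal NumPy sketch of a plain recurrent cell. It is not what Keras runs internally (an `LSTM` adds gates on top of this idea), and the weights `W_x`, `W_h`, `b` and the sizes are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 5, 3

# Hypothetical, randomly initialised weights; a real network learns these.
W_x = rng.normal(size=(n_hidden, n_features))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

def rnn_forward(sequence):
    """Run a simple recurrent cell over a (timesteps, features) array."""
    h = np.zeros(n_hidden)  # the "memory", empty before the first row
    for x_t in sequence:
        # the new state depends on the current row AND the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h

sequence = rng.normal(size=(20, n_features))  # 20 past rows
final_state = rnn_forward(sequence)
print(final_state.shape)  # (3,)
```

The state `h` is the only thing carried from one row to the next, which is why the rows must be fed in time order.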

Let's break this down with a simple example. Say you have 5 samples and 5 features of data, and you want to stagger the data by 2 rows instead of 20. Here is your data (assuming 1 stock, with the oldest price value first). We can think of each row as a day of the week:

```python
import numpy as np

ar = np.random.randint(10, 100, (5, 5))
# Example values (yours will differ, since they are random):
# [[43, 79, 67, 20, 13],   <--- Monday
#  [80, 86, 78, 76, 71],   <--- Tuesday
#  [35, 23, 62, 31, 59],   <--- Wednesday
#  [67, 53, 92, 80, 15],   <--- Thursday
#  [60, 20, 10, 45, 47]]   <--- Friday
```

To use an `LSTM` in Keras, your data needs to be 3-D rather than its current 2-D structure, and the notation for each dimension is `(samples, timesteps, features)`. Currently you only have `(samples, features)`, so you would need to restructure the data.

```python
# Stack every pair of consecutive rows, then reshape to (samples, timesteps, features)
a2 = np.concatenate([ar[x:x+2, :] for x in range(ar.shape[0] - 1)])
a2 = a2.reshape(4, 2, 5)
# [[[43, 79, 67, 20, 13],    See Monday first
#   [80, 86, 78, 76, 71]],   See Tuesday second ---> predict value originally set for Tuesday
#  [[80, 86, 78, 76, 71],    See Tuesday first
#   [35, 23, 62, 31, 59]],   See Wednesday second ---> predict value originally set for Wednesday
#  [[35, 23, 62, 31, 59],    See Wednesday first
#   [67, 53, 92, 80, 15]],   See Thursday second ---> predict value originally set for Thursday
#  [[67, 53, 92, 80, 15],    And so on
#   [60, 20, 10, 45, 47]]]
```

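Notice how the data is staggered and 3-dimensional. Y does not need a time dimension, since this is a many-to-one structure; however, you need to drop its first value so that each remaining label lines up with the window that ends on the same row (with a 20-row lookback you would drop the first 19). Now just make an `LSTM` network: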

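```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

hidden_dims = 32  # number of LSTM units; an arbitrary placeholder to tune

model = Sequential()
model.add(LSTM(hidden_dims, input_shape=(a2.shape[1], a2.shape[2])))
model.add(Dense(1))
```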
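For the 20-row lookback in the question, the same staggering can be sketched as a small helper. `make_windows` is a hypothetical name, and the random arrays below only stand in for the 59-feature dataset; this is one possible way to build the windows, not the only one.

```python
import numpy as np

def make_windows(X, Y, lookback):
    """Stack overlapping windows of `lookback` consecutive rows.

    Returns X_seq with shape (samples, lookback, features) and the
    labels aligned with the last row of each window.
    """
    n_windows = X.shape[0] - lookback + 1
    X_seq = np.stack([X[i:i + lookback] for i in range(n_windows)])
    Y_seq = Y[lookback - 1:]  # drop labels that lack a full history
    return X_seq, Y_seq

# Stand-in data with the question's layout: 59 features, binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 59))
Y = rng.integers(0, 2, size=100)

X_seq, Y_seq = make_windows(X, Y, lookback=20)
print(X_seq.shape, Y_seq.shape)  # (81, 20, 59) (81,)
```

With 100 rows and a lookback of 20, the first 19 labels have no complete history, which leaves 81 samples. `X_seq` can then be passed to a network whose `input_shape` is `(20, 59)`.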

This is just a brief example to get you moving. There are many different setups that will work (including ones that do not use an RNN); you need to find the right one for your data.
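As one illustration of a setup without an RNN, each window of past rows can be flattened into a single long feature vector and fed to an ordinary Dense model. The sketch below assumes windowed data of shape `(samples, timesteps, features)`; the random array is only a stand-in for real data.

```python
import numpy as np

lookback, n_features = 20, 59

# Stand-in for windowed data of shape (samples, timesteps, features).
rng = np.random.default_rng(0)
X_seq = rng.normal(size=(81, lookback, n_features))

# Flatten each window into one long row: (samples, timesteps * features).
X_flat = X_seq.reshape(X_seq.shape[0], -1)
print(X_flat.shape)  # (81, 1180)
```

The Dense model would then take 1180 inputs instead of 59. Unlike an RNN, it treats the 20 rows as unordered extra features, so whether this works well depends on the data.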

