Question

Willie asked · Willie edited

DQN only uses a discrete action space

I am using a MultiDiscrete action space, and the PPO algorithm runs correctly.

However, when I used DQN, it seems that DQN does not support MultiDiscrete action space.

Is it possible to convert MultiDiscrete to Discrete action space?

Thank you in advance for your advice, or for any other method.




FlexSim 23.0.15
reinforcement learning · dqn · action space · multidiscrete
1740568160325.png (31.5 KiB)
1740569324259.png (6.1 KiB)
smalldemo.fsm (87.4 KiB)


1 Answer

Nil Ns answered · Nil Ns commented

Hey Willie,

Yes, it's definitely possible to convert a MultiDiscrete action space into a Discrete one by combining OP1 and OP2 into a single value. Basically, you treat each combination of OP1 and OP2 as a unique action in a Discrete space.

For example, if OP1 has 10 possible values (0, 1, 2, ... 9) and OP2 has 5 possible values (0 to 4), you can map each pair (OP1, OP2) to a single discrete index using:


// MultiDiscrete to Discrete
int Op1 = 7;  // chosen value of the first action (0 to 9)
int Op2 = 4;  // chosen value of the second action (0 to 4)
int nOp2 = 5; // number of possible values of Op2

return Op1 * nOp2 + Op2; // the Discrete input (here 7 * 5 + 4 = 39)

// Discrete to MultiDiscrete
int answer = 39; // the Discrete action returned by the model
int nOp2 = 5;    // number of possible values of Op2

int Op1 = Math.floor(answer / nOp2); // recover the first action (39 / 5 -> 7)
int Op2 = Math.fmod(answer, nOp2);   // recover the second action (39 mod 5 -> 4)
return Op1 + " " + Op2;              // the two original values again
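If you ever add a third decision, the same idea generalizes: treat the values as digits of a mixed-radix number. Here is a minimal sketch in the same style, where Op3 and nOp3 are hypothetical names for that extra decision:

// MultiDiscrete (Op1, Op2, Op3) to Discrete
int Op1 = 7; int Op2 = 4; int Op3 = 2;
int nOp2 = 5; int nOp3 = 3; // sizes of the second and third decisions

return (Op1 * nOp2 + Op2) * nOp3 + Op3; // here (7 * 5 + 4) * 3 + 2 = 119

// Discrete back to MultiDiscrete: peel the digits off, last decision first
int answer = 119;
int Op3 = Math.fmod(answer, nOp3);                    // 119 mod 3 -> 2
int Op2 = Math.fmod(Math.floor(answer / nOp3), nOp2); // 39 mod 5 -> 4
int Op1 = Math.floor(answer / (nOp2 * nOp3));         // 119 / 15 -> 7

Note that the Discrete space size is the product of the individual sizes (10 * 5 * 3 = 150 here), so it grows quickly.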


This makes it possible to use DQN, but to be honest, it's not the most efficient way. DQN learns one Q-value per discrete action, so combining OP1 and OP2 into one index turns the problem into 10 × 5 = 50 seemingly unrelated actions, which makes it harder for the model to recognize patterns shared between them. This often leads to longer training times and more complex learning.

Since PPO is already working well for you, I'd recommend sticking with it if possible. But if DQN is a must, this method will get the job done; it just might take more effort to train.

Hope this helps!


Willie commented:

Hello @Nil Ns, thank you for your advice, this is very helpful for me.

But where should I put this code? In FlexSim, or in the VS Code training file?



Nil Ns replied to Willie:

The best approach would be to modify the OnObservation event so the conversion happens before the data is passed to the RL model. You can use the first piece of code there.

Then, after the RL model returns an action (a single Discrete value), you'd need to convert it back to MultiDiscrete format in the OnRequestAction event. That's where you can apply the second piece of code, after the original OnRequestAction code.
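For a concrete picture, here is a rough sketch, assuming your two decisions are exposed as model parameters named Op1 and Op2 (those names are placeholders, adjust them to your model). The end of the OnRequestAction code could then look something like:

// "answer" is the single Discrete action index chosen by the DQN model
int nOp2 = 5; // number of possible values of Op2

// split the flattened index back into the two original decisions
Model.parameters["Op1"].value = Math.floor(answer / nOp2);
Model.parameters["Op2"].value = Math.fmod(answer, nOp2);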
