January 25, 2025
This blog post describes our attempt to use reinforcement learning to grow plants by learning a lighting schedule from data.
Markov Decision Process (MDP)
We first frame the problem as a Markov Decision Process (MDP).
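As a rough illustration of the framing (the exact state and action spaces are assumptions here: we take the state to be the measured plant area and the action to be the next lighting decision), one transition of the MDP might look like:

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """One hypothetical MDP transition for the plant-growing problem."""
    area: float        # state: measured plant area at time t
    light_on: bool     # action: lighting decision for the next interval
    next_area: float   # next state: measured plant area at time t + 1
    reward: float      # reward computed from the two areas (see below)

t = Transition(area=10.0, light_on=True, next_area=12.0, reward=0.0)
```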
Reward function
Difference in log area
The reward at each step is the difference in log area,

$r_t = \log A_{t+1} - \log A_t,$

where $A_t$ is the plant area at time $t$. The rewards are invariant to the size of the plant. This is desirable because, when training the neural network, the error should not be dominated by the size of the plant.

The return telescopes to the log-ratio of the final and initial area,

$G = \sum_{t=0}^{T-1} r_t = \log \frac{A_T}{A_0},$

so it is invariant to the initial area. With base-2 logarithms, we can interpret the return as the number of doublings of the plant area. The exponential of the return is the ratio of the final and initial area. In practice we use $\log(A_t + \epsilon)$, with a small $\epsilon > 0$, to avoid issues with zero area.
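A minimal sketch of this reward and its telescoping return (the offset `eps` is an assumption standing in for whatever the actual implementation uses to avoid taking the log of zero):

```python
import math

def log_area_reward(area_t, area_next, eps=1e-6):
    """Difference-in-log-area reward; eps (an assumption) avoids log(0)."""
    return math.log(area_next + eps) - math.log(area_t + eps)

def episode_return(areas, eps=1e-6):
    """Sum of per-step rewards; telescopes to log(final area / initial area)."""
    return sum(log_area_reward(a, b, eps) for a, b in zip(areas, areas[1:]))

# Two trajectories with the same growth factor but different initial areas
# receive the same return: the return is invariant to the initial area.
small = [1.0, 2.0, 4.0]
large = [100.0, 200.0, 400.0]
```

Both trajectories quadruple in area, so both returns are approximately $\log 4$.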
Alternative reward functions
Several alternative reward functions were considered.
Difference in area
Here the reward is $r_t = A_{t+1} - A_t$, so the return, $A_T - A_0$, is not invariant to the initial area. Because plants grow roughly exponentially, the rewards also grow exponentially with the size of the plant.
Percentage change in area
While this reward function, $r_t = (A_{t+1} - A_t) / A_t$, induces a return that is invariant to the initial area, it can favour volatility in area sequences.
For example, a plant with area $A$ that grows to $2A$ and then shrinks back to $A$ receives rewards of $+1$ and $-\tfrac{1}{2}$, for a return of $+\tfrac{1}{2}$. The plant did not grow, so the return should be zero.
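The volatility issue above can be checked in a few lines (the concrete areas are illustrative):

```python
def pct_change_return(areas):
    """Return under the percentage-change reward: sum of (A_next - A) / A."""
    return sum((b - a) / a for a, b in zip(areas, areas[1:]))

# A plant that doubles and then shrinks back to its starting area has not
# grown, yet the percentage-change return is positive:
# +1.0 for the doubling step, -0.5 for the halving step.
areas = [10.0, 20.0, 10.0]
result = pct_change_return(areas)  # 0.5, even though net growth is zero
```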
Difference in area divided by initial area
Here the reward is $r_t = (A_{t+1} - A_t) / A_0$. While this reward function induces a return, $(A_T - A_0)/A_0$, that is invariant to the initial area and equals the total fractional growth, the individual rewards still scale with the plant's current size, which grows exponentially over the episode.
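To compare the candidates side by side, the following sketch evaluates all four returns on a trajectory and on a scaled copy of it (the trajectories are illustrative):

```python
import math

def returns(areas):
    """Episode return under each candidate reward function."""
    pairs = list(zip(areas, areas[1:]))
    return {
        "diff_log_area": sum(math.log(b) - math.log(a) for a, b in pairs),
        "diff_area": sum(b - a for a, b in pairs),
        "pct_change": sum((b - a) / a for a, b in pairs),
        "diff_over_initial": sum((b - a) / areas[0] for a, b in pairs),
    }

# Scaling the whole trajectory by 10x leaves the log-area, percentage-change,
# and initial-normalised returns unchanged, but scales the raw-difference
# return by 10x -- the invariance properties discussed above.
r1 = returns([1.0, 2.0, 4.0])
r2 = returns([10.0, 20.0, 40.0])
```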
Methods
- Offline RL
Deployment
How do we do inference? Using the mean plant embedding seems to be a reasonable choice (perhaps averaging only the plants that are still alive).
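A sketch of that inference-time choice, under the assumption that each plant has an embedding vector and an alive/dead flag (the embedding source and downstream policy are not specified here):

```python
import numpy as np

def mean_embedding(embeddings: np.ndarray, alive: np.ndarray) -> np.ndarray:
    """Average per-plant embeddings, keeping only plants flagged as alive.

    embeddings: (n_plants, dim) array of per-plant embeddings.
    alive: (n_plants,) boolean mask of plants still alive.
    """
    return embeddings[alive].mean(axis=0)

# Three plants, one dead; the dead plant's embedding is excluded from the mean.
embeddings = np.array([[0.1, 0.2], [0.3, 0.4], [9.0, 9.0]])
alive = np.array([True, True, False])
state = mean_embedding(embeddings, alive)  # mean of the first two rows
```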