In a paper published in Applied Sciences, the researchers introduce the ROSMA dataset and an accompanying methodology for instrument detection and gesture segmentation in robotic surgical tasks. Using manually annotated data, they train a neural network model that combines YOLOv4 and an LSTM, and they report high accuracy and good generalization, with potential applications in surgical data science and beyond.
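The summary does not describe how the detector and the recurrent network are connected. As a rough illustration only, one plausible arrangement is to turn each frame's detections into a fixed-length feature vector and then group consecutive frames into windows that a sequence model such as an LSTM could consume. The sketch below assumes this arrangement; the names `Detection`, `frame_features`, and `sliding_windows`, the box layout, and the window size are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical detection layout (an assumption, not the paper's format):
# (class_id, x_center, y_center, width, height), coordinates normalized to [0, 1].
Detection = Tuple[int, float, float, float, float]


def frame_features(detections: Sequence[Detection], num_classes: int) -> List[float]:
    """Flatten one frame's detections into a vector with 4 values per class.

    Classes without a detection stay at zero. If a class is detected more
    than once in a frame, the last box overwrites the earlier ones.
    """
    feats = [0.0] * (4 * num_classes)
    for class_id, x, y, w, h in detections:
        if not 0 <= class_id < num_classes:
            raise ValueError(f"class_id {class_id} outside [0, {num_classes})")
        feats[4 * class_id: 4 * class_id + 4] = [x, y, w, h]
    return feats


def sliding_windows(frames: Sequence, window: int, stride: int = 1) -> List[list]:
    """Group consecutive frames into overlapping windows of length `window`.

    Returns an empty list when there are fewer frames than `window`.
    """
    return [list(frames[i:i + window])
            for i in range(0, len(frames) - window + 1, stride)]


# Toy video with three frames and two instrument classes (made-up values).
video = [
    [(0, 0.50, 0.50, 0.10, 0.10)],
    [],
    [(1, 0.20, 0.30, 0.10, 0.10)],
]
features = [frame_features(dets, num_classes=2) for dets in video]
windows = sliding_windows(features, window=2)
```

Each element of `windows` is a short sequence of per-frame vectors. In a setup like this, a recurrent model would take one window at a time and predict a gesture label for it, though the actual inputs and labels used by the authors may differ.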