“The embodiments of the disclosure generally relate to systems and methods that use audio and stacked computer vision algorithms to detect user environment and user behavior. In some embodiments, ...