The present specification is related generally to a radar-based gestural interface.
Presently, 8 people every day on average are killed in the United States in crashes reported to involve distracted drivers, and 1,161 injured. Driver distraction is increasingly becoming problematic with the ubiquity of smartphones, the preference for texting over calling, the need for GPS navigation via mobile phones, and the plethora of new notifications.
Dashboard interfaces for tertiary tasks have been the main focus of conventional in-car interfaces for years and are generally the standard for cars. The dashboard interfaces are predictable and provide feedback in the form of mechanical switches and buttons, but are both away from the driver's line of sight and fixed in functionality. The use of touchscreens has increased in recent years to become standard in higher end cars. Although touchscreens allow many more configurable controls, they require the driver's full visual attention and provide no tangible feedback.
In some implementations, a system for providing a gestural interface in vehicles uses radar to enable user interaction and control of a mobile phone or other in-vehicle processing system through gestures. These gestures can be three-dimensional spatial gestures that occur between the car dashboard and the driver's chest. The gestures can be performed by the user while the hands of the user remain in contact with or in proximity to the steering wheel of the vehicle. The system can include a radar transmitter and a radar receiver for detecting the gestures of the user. The system can further include a processing module that is configured to determine commands corresponding to the detected gestures of the user. The commands corresponding to the detected gestures can be used to place calls, select music, send texts, and enable GPS or navigation.
By allowing the user to control certain functionality of a mobile phone via the gestural interface, the user can remain focused on the operation of the vehicle. In particular, the user can control the interface without looking away from the road, and without removing the user's hands from the steering wheel. The gestural interface can include contextually limited features to limit the gestures at any given point in time to only those that are relevant to the user's need at that moment. Thus, the system can be limited to gestures and corresponding mobile device actions that are safe during driving. Further, the system for providing a gestural interface can leverage symbolic language that is already familiar to users. The leveraging of common symbols enables users to easily recall gestures without creating distractions while driving.
One innovative aspect of the subject matter described in this specification is embodied in systems that include a radar transmitter located in a vehicle, the radar transmitter being arranged to transmit radio waves toward a driver of a vehicle, and a radar receiver located in the vehicle. The radar receiver can be arranged to detect changes in radio waves corresponding to hand movements of the driver of the vehicle while the driver's hands remain in contact with or in proximity to the steering wheel of the vehicle. The systems can further include a processing module configured to determine commands corresponding to the hand movements detected by the radar receiver.
Other implementations of this and other aspects include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods encoded on computer storage devices.
Implementations may each optionally include one or more of the following features. For instance, the radar transmitter and the radar receiver can be located on a single chip. The processing module can be in communication with a mobile device. The radar transmitter and the radar receiver can be removably mounted within the vehicle. The radar transmitter can transmit millimeter-wave frequencies and the radar receiver can receive millimeter-wave frequencies. In some aspects, the radar receiver, the radar transmitter, and the processing module can be located in a housing. The radar transmitter and the radar receiver can be positioned behind the steering wheel of the vehicle. The radar transmitter and the radar receiver can be oriented in a direction of an opening in the steering wheel of the vehicle. Further, the system can perform more than 100 measurements per second at the radar receiver. The system can be used to control an automotive infotainment system, and can be configured to adjust one or more settings responsive to the determined commands.
In another general aspect, a method performed by one or more computing devices includes: receiving, from a radar receiver arranged to detect movement at an interior of a vehicle, movement data corresponding to a gesture of a driver of the vehicle; determining, based on the movement data from the radar receiver, that the gesture represents a particular gesture from among a first predetermined set of gestures for selecting an operating mode of a computing device; and in response to determining that the gesture represents the particular gesture: (i) causing a computing device to enter an operating mode corresponding to the particular gesture; and (ii) determining, based on data from the radar receiver, whether a subsequent movement of the driver represents a gesture from a second predetermined set of gestures that is different from the first predetermined set of gestures.
Implementations may optionally include one or more of the following features. For example, determining that the gesture represents a particular gesture from among a first predetermined set of gestures includes determining that the gesture represents the particular gesture based on output from a first classifier trained to recognize the gestures in the first predetermined set of gestures. Determining that the subsequent movement represents the gesture from the second predetermined set of gestures includes determining that the subsequent movement represents the gesture from the second predetermined set of gestures based on output from a second classifier that is trained to recognize the gestures in the second predetermined set of gestures, the second predetermined set of gestures being different from the first predetermined set of gestures.
In some implementations, determining that the gesture represents a particular gesture includes: accessing context data indicating a current operating mode of the computing device; and determining that the gesture should be selected from among a first predetermined set of gestures based on the context data indicating the current operating mode of the computing device. In some implementations, determining that the gesture represents a particular gesture includes: determining feature scores based on output of the radar receiver; providing the feature scores to each of multiple classifiers, the multiple classifiers having been trained to indicate likelihoods of occurrence of gestures in different predetermined sets of gestures; selecting one of the multiple classifiers based on context data indicating a current mode of operation of the computing device; and determining that the gesture represents the particular gesture based on output from the selected classifier.
In some implementations, determining that the gesture represents a particular gesture includes processing input representing features of the movement data sensed by the radar receiver with multiple machine learning classifiers that operate in parallel, each of the multiple machine learning classifiers being configured to recognize gestures in a different predetermined set of gestures.
In some implementations, the multiple machine learning classifiers are decision trees.
In some implementations, the decision trees are random forest decision trees.
In some implementations, causing the computing device to enter the operating mode corresponding to the particular mode selection gesture includes sending an indication of a user selection corresponding to the particular gesture to a mobile phone in the vehicle over a wired or wireless interface.
In some implementations, causing the computing device to enter an operating mode corresponding to the particular mode selection gesture includes causing a mobile phone to enter a mode for initiating calls, selecting music, text messaging, or navigation.
In some implementations, the second set of predetermined gestures corresponds to a shared gestural vocabulary between a plurality of devices.
In some implementations, the radar receiver is arranged within the vehicle to detect movements in a volume that includes a space between a steering wheel and the driver's chest, and the movement data indicates movements of the driver's hands or fingers in the volume.
In some implementations, the radar receiver is arranged within the vehicle to detect movements of the driver's fingers on and around at least a portion of a steering wheel of the vehicle, and the movement data indicates movements of the driver's fingers on and around the steering wheel of the vehicle.
In some implementations, the radar receiver is arranged to transmit and receive radar signals through an opening in the steering wheel.
In some implementations, the computing device is integrated with the vehicle.
In some implementations, the computing device is a mobile device that is not integrated with the vehicle.
In some implementations, the method further includes communicating with the computing device over a wireless interface.
In some implementations, the radar transmitter and the radar receiver are located on a single chip.
In some implementations, the radar transmitter and the radar receiver are removably mounted within the vehicle.
In some implementations, the radar transmitter transmits millimeter-wave frequencies and the radar receiver receives millimeter-wave frequencies.
In some implementations, the radar receiver, the radar transmitter, and the processing module are located in a single housing.
In some implementations, the radar transmitter and the radar receiver are positioned behind the steering wheel of the vehicle.
In some implementations, the radar transmitter and the radar receiver are oriented in a direction of an opening in the steering wheel of the vehicle.
In some implementations, the radar receiver is configured to perform more than 100 measurements per second.
In some implementations, the processing module is further configured to adjust, in response to the determined gestures or corresponding commands, one or more settings of an automotive infotainment system.
In another general aspect, a method performed by one or more computing devices includes: receiving data from a radar receiver within a vehicle, the data from the radar receiver indicating movement of a driver of the vehicle; using a plurality of classifiers to classify the movement of the driver, each of the classifiers being trained to recognize a different set of gestures; selecting, from among the outputs of the classifiers, a classification for the movement based on information about an operating state of a computing device; and providing a command corresponding to the selected classification.
In some implementations, each of the plurality of classifiers includes a random forest classifier.
In some implementations, selecting the classification for the movement includes: determining whether a mobile phone is in (i) a first state in which one of a plurality of user-selectable modes is active, (ii) a second state in which none of the plurality of user-selectable modes is active, or (iii) a third state in which an incoming call is being received, the first state corresponding to a first classifier, the second state corresponding to a second classifier, and the third state corresponding to a third classifier; and selecting the output from the classifier corresponding to the state that the mobile phone is determined to be in when the movement of the driver was detected.
In another general aspect, a method performed by one or more computing devices includes: obtaining a set of candidate gestures for a command; determining, for each of the candidate gestures, a detectability score; determining, for each of the candidate gestures, a uniqueness score indicating a level of difference from gestures in another set of gestures; determining, for each of the candidate gestures, a memorability score indicating a measure of the ability of human users to remember the gesture after a period of time; and assigning a gesture for the command, from among the set of candidate gestures, based on the detectability scores, the uniqueness scores, and the memorability scores.
In some implementations, the detectability score indicates a measure of accuracy or repeatability with which the candidate gesture is detected.
In some implementations, the set of candidate gestures corresponds to a shared gestural vocabulary between a plurality of devices.
In some implementations, the method includes training a classifier to recognize a set of gestures that includes the assigned gesture. Different classifiers are trained to recognize different sets of gestures, where each classifier and associated set of gestures is used for recognizing gestures for a different operating mode or state of a computing device.
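As a minimal sketch of the gesture assignment described above, the three scores can be combined into a single ranking value. The specification does not state how the scores are combined, so the weighted sum below, the 0-to-1 score range, and the names `CandidateGesture` and `assign_gesture` are all illustrative assumptions rather than the disclosed method.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CandidateGesture:
    """Hypothetical record holding the three scores for one candidate gesture."""
    name: str
    detectability: float  # accuracy or repeatability of detection (assumed 0..1)
    uniqueness: float     # difference from gestures in another set (assumed 0..1)
    memorability: float   # how well users recall the gesture later (assumed 0..1)


def assign_gesture(
    candidates: Sequence[CandidateGesture],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> CandidateGesture:
    """Pick the candidate with the highest weighted sum of the three scores.

    The weighted sum is an assumption made for illustration only.
    """
    if not candidates:
        raise ValueError("at least one candidate gesture is required")
    w_detect, w_unique, w_memory = weights

    def combined(c: CandidateGesture) -> float:
        return (w_detect * c.detectability
                + w_unique * c.uniqueness
                + w_memory * c.memorability)

    return max(candidates, key=combined)
```

Changing the weights shifts the trade-off, for example toward memorability when a gesture is expected to be used rarely.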
Other implementations of this and other aspects include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods encoded on computer storage devices. Implementations may optionally include additional features described below, and subcombinations thereof.
Advantageous implementations can include one or more of the following features. The system can include a processing module that is connected to a mobile phone via Bluetooth connection. As such, the processing module may determine commands corresponding to the hand movements of the user and then transmit the determined commands for execution at the mobile phone. The system can use the radar transmitter and the radar receiver to determine a respiratory rate and a heart rate of the user. The respiratory rate and the heart rate of the user may be used to determine a relative stress level of the user while operating the vehicle. The system can further include a visual interface and an auditory interface to supplement the gestural interface. In some aspects, the visual and auditory interfaces may provide audio and visual cues to indicate the detection and/or execution of commands that are determined by the processing module. Further, the processing module of the system can be configured to adjust one or more settings of an automotive infotainment system in response to a determined gesture. The settings that can be changed can include, for example, an audio volume setting, a radio channel selection setting, a song or playlist selection setting, a media source setting, GPS navigation settings, and other settings. In this manner, the gestural interface may control a mobile device located in a vehicle, such as a mobile phone, or an in-dash or integrated system of a vehicle, or both.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will become apparent from the description, the drawings, and the claims.
BRIEF DESCRIPTIONS OF DRAWINGS
FIG. 1 is a system diagram for a system for providing a radar-based gestural interface.
FIG. 2 is an exemplary illustration of classification accuracy tests for a system providing a radar-based gestural interface.
FIG. 3 is an exemplary illustration of primary symbolic gestures.
FIG. 4 is an exemplary illustration of secondary vocabulary gestures.
FIG. 5 illustrates an exemplary system providing a gestural interface in vehicles.
FIG. 6 illustrates an exemplary graph of a memorability of gestures.
FIG. 7 illustrates an exemplary graph of a volume of training data.
Like reference numbers and designations in the various drawings indicate like elements.
In some implementations, a system for providing a gestural interface in vehicles includes an automotive gestural interface that uses radar to enable sub-millimeter interaction and control for in-car mobile phones through open-air and hands-on-the-steering-wheel gestures. The system can use a gestural vocabulary for in-car mobile devices to reduce driver distraction using radar-based hardware.
The system providing the gestural interface can offer capabilities that were not previously possible or reasonable with cameras for in-car gestural interfaces, including: stable usability independent of lighting and atmosphere, a small 9×9 mm form factor, resistance to optical occlusions, and increased privacy, which is typically problematic for camera technologies.
The system can use a gestural language that is intuitive to learn and easy to perform in a driving scenario with minimal cognitive load. In certain aspects, the gestural language is based on a survey to find preferred mobile services while driving. As such, the gestural language can be iteratively designed using interface gestures that are both recognizable and memorable. In certain aspects, the gestural language can be contextually aware and include modalities achieving greater than 97% accuracy.
The system providing a gestural interface in vehicles can be a wireless-enabled aftermarket device that can be installed in a car for hands-on-the-steering-wheel interactions to decrease driver distraction.
This alone may not be a problem, but compounded with the exponential growth of mobile phone usage in all aspects of life, from entertainment (including music and videos) to professional necessity for work and general communication, issues with distracted driving arise. It was reported in 2014 that the average employed adult has “over 31 hours of activity in a day.” The figure of 31 hours of activity was reported because, in tracking the activity of the participants, the days of the participants were found to be highly multitasked, with multiple activities performed at once, a majority of which involved mobile phones or computers.
The new attachment and ubiquity of mobile phones can ultimately affect driving positively (due to instant access to navigation information and entertainment), as well as negatively (when the driver's eyes are distracted from the road). It was reported by the NHTSA that at any given daylight moment across America, approximately 660,000 drivers are using cell phones or manipulating electronic devices while driving, a number that has held steady since 2010.
Until self-driving cars become the norm, drivers are faced with a series of either antiquated or transitional technologies that cause significant challenges. For example, drivers face challenges posed by antiquated head units. Antiquated head units can include older dashboards that may be functional in terms of natural haptic feedback from knobs and buttons, but can fail when users need to look away from the road to find them. Additionally, the antiquated head units may not meet user needs for music selection, phone calls, or dynamic functionality in a time of ultimate customizability. In another instance, original equipment manufacturer (OEM) interfaces can pose a challenge to drivers. Over the average lifespan of a new automobile (8 years or 150,000 miles), drivers are faced with once-innovative hardware and software solutions that are difficult to upgrade at the pace of mainstream technology adoption. Further, OEM touch screen units can provide challenges to drivers. In-car touch screens provide dynamic controls and content, but require the most visual attention because of their complete lack of touch or haptic feedback. Additionally, mobile phones can pose challenges for drivers. Often, drivers opt to use their mobile phone for a majority of tasks while driving. These tasks, such as music selection, navigation, and phone calls, can all create distractions when a driver is operating a vehicle.
The system providing a gestural interface in vehicles contributes to and extends the effort to reduce driver distraction through alternative user interfaces for driving. The system can use radar as a gestural input device inside automobiles. The use of radar as a gestural input device can include an entirely radar-based standalone sensor for full interaction and be user-centric by design.
FIG. 1 is a system diagram for a system 100 for providing a radar-based gestural interface. As shown in the figure, the system includes a radar gesture system 110 that can be used in a vehicle 120. The radar system 110 can be used to detect and interpret gestures of a person in the vehicle. For example, at least portions of the radar gesture system 110 can be mounted to or integrated into the steering wheel or dashboard area of a car, truck, or other vehicle. The radar components of the system can be positioned so that finger and hand movements of the driver in the area of the steering wheel can be detected as control gestures for a computing system. In particular, the radar system 110 can be configured to detect movements made while the driver's hands are still in contact with the steering wheel. The radar system 110 detects the gestures, and then interprets the detected movements to select a command for a computing device 130, which can be a mobile device separate from the vehicle, such as a user's phone, tablet computer, laptop computer, wearable device, navigation device, etc., or can be a computing system integrated into the vehicle 120, such as an in-dashboard navigation unit, control system or control panel of the vehicle 120, or other information or entertainment (infotainment) system of the vehicle 120.
The radar system 110, or at least the radar subsystem 130, can be placed or mounted in the interior or cabin of the vehicle, arranged to detect movement and gesture in an area between the steering wheel and the user's chest. The volume in which the radar system 110 detects gesture input can include some or all of the steering wheel, to include input of finger and hand movements on or around the steering wheel. The radar subsystem 130 can be oriented to transmit and receive radar signals through an opening in the steering wheel, and around at least a top portion of the steering wheel.
In some implementations, the system 100 providing a gestural interface in vehicles can include the radar system 110 as a standalone radar-based sensing module for gestural control of a mobile phone or other user device, and/or for a built-in infotainment system (e.g., navigation system, stereo system, etc.) of the vehicle 120. The gestural control can be based on a gestural language inspired by sign language symbols for in-car mobile use. The gestural language can be derived from a shared gestural vocabulary to be used across various applications between different devices such as smartphones, tablets, computers, and the like. The system providing a gestural interface in vehicles can include a select set of modalities necessary for driving, as surveyed by users. The system can ultimately balance a gestural UI with technologically feasible radar-based sensing.
The radar system 110 can include a radar transmitter 132 and a radar sensor 134 to provide the gestural interface. In some implementations, the radar transmitter 132 and the radar sensor 134 can be integrated into a single chip 130. The radar transmitter 132 and the radar sensor 134 can provide a sensing mechanism that provides robust, high-resolution, low power, miniature gesture sensing technology based on millimeter wave radar. When implemented at millimeter wave RF frequencies, the entire sensing mechanism can be designed as a radar chip: a miniature, low power device that has no moving parts and can be manufactured inexpensively at scale.
In some implementations, the radar system 110 emits electromagnetic waves in a broad beam. Objects within the beam scatter this energy, reflecting some portion back towards the radar antenna. Properties of the reflected signal, such as energy, time delay, and frequency shift capture rich information about the object's characteristics and dynamics, including size, shape, orientation, material, distance, and velocity.
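As an illustration of how two of these signal properties map to object dynamics, the standard monostatic radar relations give range from round-trip time delay and radial velocity from Doppler frequency shift. The sketch below is not taken from the specification: the function names are hypothetical, and the 60 GHz default carrier is an assumed example of a millimeter-wave frequency.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_from_delay(round_trip_delay_s: float) -> float:
    """Distance to a reflector: the signal travels out and back, hence the /2."""
    return SPEED_OF_LIGHT_M_S * round_trip_delay_s / 2.0


def radial_velocity_from_doppler(doppler_shift_hz: float,
                                 carrier_hz: float = 60e9) -> float:
    """Radial velocity from Doppler shift: v = f_d * wavelength / 2.

    A positive shift is treated as motion toward the sensor.
    The 60 GHz default is an assumed millimeter-wave carrier.
    """
    wavelength_m = SPEED_OF_LIGHT_M_S / carrier_hz
    return doppler_shift_hz * wavelength_m / 2.0
```

At millimeter-wave carriers the wavelength is only a few millimeters, so even slow finger motion produces a measurable frequency shift.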
In some implementations, the radar system 110 tracks and recognizes dynamic gestures expressed by fine motions of the fingers and hand. This can be done with a single chip sensor 130. Unlike traditional radar sensors, this type of sensing mechanism does not require large bandwidth and high spatial resolution. In fact, the spatial resolution can be coarser than the scale of most fine finger gestures. Instead, the radar system 110 uses sensing principles that rely on motion resolution by extracting subtle changes in the received signal over time. By processing these temporal signal variations, the radar system 110 can distinguish complex finger movements and deforming hand shapes within its field. Thus, while spatial resolution may be relatively low, the temporal resolution may be high to accurately indicate velocity and changes in position.
In some implementations, the software for the radar system 110 includes a gesture recognition pipeline which is hardware agnostic and can work with different types of radar. The pipeline implements several stages of signal abstraction: (i) from the raw radar data to signal transformations, (ii) core and abstract machine learning features, (iii) detection and tracking, (iv) gesture probabilities, and finally (v) UI tools to interpret gesture controls.
The radar system 110 can include a processing system 140 that includes one or more processors, one or more data storage devices storing executable instructions, and/or other processing components. This processing system 140, along with other hardware and software if desired, can implement the functions described and illustrated for a signal processing module 142, classifiers 144a-144c, and a selection module 146.
The signal processing module 142 can obtain real-time signals from the radar subsystem 130. The signal processing module 142 can apply signal transformations and output the results of the transformations, for example, to generate high-precision position and motion data. For example, the signal processing module 142 may apply transformations to each radar measurement to determine shapes and positions of objects relative to the radar sensor. Similarly, the signal processing module 142 may compare different measurements and track changes over many different measurements to determine the direction and speed of movement of detected objects. Thus, the signal processing module 142 can extract, from the raw radar data stream, features representing the position of objects detected, the size of objects detected, the shape of objects detected, the rate and direction of movement of objects detected, and so on. The signal processing module 142 may use Doppler processing techniques, Fourier transforms, and other techniques to generate feature values representing radar signals that have been reflected back to the sensor and detected. Thus, in response to the incoming stream of radar sensor measurements, the processing system 140 can output the results of signal transformations, including high-precision position and motion data. This may be provided as a stream of feature values, where each set of feature values represents one or more measurements of the radar sensor 134. In some implementations, a set of predetermined core radar features are used, and the output features represent the mean, standard deviation, sum, and absolute sum for each of the core radar features. For example, with 8 core radar features, a total of 32 feature value outputs may be provided for each measurement. The feature values determined for a given measurement or frame can be based on a set of measurements within a predetermined window of time, e.g., the prior 5, 10, or 50 measurements.
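The windowed statistics described above can be sketched as follows. This is an illustrative example only: the class name `FeatureWindow` is hypothetical, the core radar features are treated as opaque numbers, and the use of the population standard deviation is an assumption because the specification does not say which variant is used.

```python
import statistics
from collections import deque
from typing import Deque, List, Sequence, Tuple


class FeatureWindow:
    """Keeps the most recent frames of core radar features and derives,
    for each core feature, its mean, standard deviation, sum, and absolute sum."""

    def __init__(self, num_core_features: int = 8, window_size: int = 10) -> None:
        self.num_core_features = num_core_features
        # deque with maxlen drops the oldest frame automatically.
        self._frames: Deque[Tuple[float, ...]] = deque(maxlen=window_size)

    def push(self, core_values: Sequence[float]) -> List[float]:
        """Add one measurement and return the feature vector for the window."""
        if len(core_values) != self.num_core_features:
            raise ValueError("unexpected number of core features")
        self._frames.append(tuple(core_values))
        return self.features()

    def features(self) -> List[float]:
        """Return 4 statistics per core feature (8 core features -> 32 values)."""
        out: List[float] = []
        for i in range(self.num_core_features):
            series = [frame[i] for frame in self._frames]
            out.extend([
                statistics.fmean(series),
                statistics.pstdev(series),    # population std dev (assumed)
                sum(series),
                sum(abs(v) for v in series),  # absolute sum
            ])
        return out
```

With the default of 8 core features, each call to `push` yields the 32 feature values mentioned above, computed over at most the last `window_size` measurements.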
The sensor 134 and the signal processing module can operate at any appropriate frame rate, for example, frame rates from 100 to 10,000 frames per second.
The feature values or other output data of the signal processing module 142 can be provided to one or more machine learning classifiers. In some implementations, multiple classifiers 144a-144c are used in parallel to maximize responsiveness and accuracy. For example, each of the multiple classifiers can be trained to detect a different set of gestures. For example, a first classifier 144a may be trained to recognize a specific set of gestures in a first gesture set 145a, a second classifier 144b may be trained to recognize a different set of gestures shown as a second gesture set 145b, and a third classifier 144c may be trained to recognize yet a different set of gestures shown as a third gesture set 145c. By limiting each classifier 144a-144c to a specific set of gestures, the classifier can learn to accurately distinguish among the small set of gestures with a small and computationally efficient model. These classifiers 144a-144c can each receive features from the signal processing module 142 in parallel and process the features concurrently. In some implementations, each classifier 144a-144c receives the same set of features even though each is configured to recognize a different set of gestures 145a-145c. The different gesture sets may be completely distinct and non-overlapping, or in some implementations may include one or more gestures in common.
In some implementations, each gesture set 145a-145c and its corresponding classifier 144a-144c corresponds to a particular task, context, or operating mode. For example, in a hierarchical interface, an initial or primary state may have one set of options for a user to select, and each of these options may have a different corresponding gesture assigned. Selection of one option from the primary state may enter a mode or portion of the interface where a secondary set of options are available, and each of these options may correspond to a different gesture. In this manner one gesture set and classifier may represent the gestures for the primary state, and a second gesture set and classifier represent the gestures for the secondary state. In this manner, each state of an interface having a different set of options available for the user may have a corresponding gesture set and classifier.
For example, the classifiers 144a-144c may be implemented as three decision trees or random forest classifiers used in the classification of the gestures. The classifiers 144a-144c can respectively correspond to primary symbolic gestures, phone call gestures, and secondary vocabulary gestures. The primary symbolic gestures may represent operating modes for the computing device or types of tasks that the user may want to perform, e.g., initiate phone call, send a text message, play music, and start GPS. Each of these options may correspond to a distinct, predetermined gesture movement. Once the user performs the appropriate gesture to select one of these options, secondary vocabulary gestures become available, e.g., controls to move left in a list, move right in a list, select a current item, or navigate back (e.g., to the primary interface state). Another set of gestures and a corresponding classifier may be used for gestures in another mode of a computing device, such as a mode in which an incoming call is being received. In this mode, a set of gestures corresponding to answering the call or dismissing the call may be the available options. Depending on the interface and configuration of the computing device, different selection or navigation gestures may be defined, and these gestures (e.g., the hand motions and positions representing the gestures) may be the same or may be different for the various interface states.
As noted above, the three classifiers 144a-144c can run concurrently. The outputs of each classifier 144a-144c can be an indication of the most likely gesture from among the gesture set for the classifier. This may be expressed as an indication of a specific gesture being recognized, or as probabilities for each of the gestures in the corresponding gesture set, or in another form. A selection module 146 receives the outputs of the classifiers 144a-144c and also receives context data 150 about the current state or mode of the computing device 130. Based on the context data 150, the radar system 110 selects the output from one of the classifiers 144a-144c to use in interpreting the user input gesture. The selection module 146 then determines an indication of the user input gesture. For example, as shown in FIG. 1, the selection module 146 can output a gesture indication 152 indicating the identified gesture. In addition, or as an alternative, the radar system 110 may indicate a command corresponding to the identified gesture, for example, by sending a control instruction to change the operation of the computing device 130 based on the identified gesture.
The radar system 110 may communicate with the computing device 130 using a wired interface or a wireless interface (e.g., Bluetooth, Wi-Fi, etc.). The computing device 130 may periodically provide data indicating the current mode or interface state of the computing device 130. For example, each time the mode of the computing device 130 changes, the computing device 130 may indicate the change to the radar system 110. As another example, the mode may be indicated at regular intervals, e.g., each second, every 5 seconds, etc. As another example, the radar system 110 may query the computing device 130 to obtain information indicating the current operating mode of the device.
The operating modes indicated may correspond to the different gesture sets 145a-145c. For example, a mode allowing selection from among music playback, navigation, initiating a call, or other options may be considered a first or primary mode of operation, corresponding to gesture set 145a. When the context data 150 indicates that the computing device 130 is in this mode, the selection module 146 will use the output of the first classifier 144a and ignore the outputs of classifiers 144b, 144c. The result is that the user gestures will only be identified from among the gestures in the first gesture set 145a, which are the set of gestures relevant to the primary operating mode. By limiting the set of gestures that are expected to those actually relevant to the computing device, and using a classifier 144a specifically trained to distinguish among that specific set of gestures, the radar system 110 can avoid false identification of gestures that are not applicable and can use a simple, fast, and computationally efficient model to distinguish among the small set of gestures.
As another example, when the computing device 130 is in a mode for navigation, or a mode for music playback, the computing device 130 can indicate this in the context data 150. In some implementations, the set of gestures relevant to multiple modes can be the same. In other words, whether in music playback mode or navigation mode, the same gesture set 145b may represent the full set of options available to the user. Accordingly, the context data 150 may simply indicate that one of multiple secondary modes of operation is currently in use, without specifically indicating which of the multiple secondary modes is being used. When the context data 150 indicates that a secondary mode is the current operating mode, the selection module 146 uses the output from the second classifier 144b, which recognizes a different set of gestures than the other classifiers 144a, 144c. The selection module 146 thus determines the most likely gesture from among the set of gestures in gesture set 145b from the output of classifier 144b. In a similar manner, if the context data 150 indicates that a third operating mode is active on the computing device 130, such as when a phone call is incoming, then the selection module 146 will use the output of the classifier 144c to select a gesture from among gesture set 145c as the identified gesture.
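The context-based selection described above can be sketched as follows. This is a minimal illustration only, assuming that each classifier reports a probability for every gesture in its gesture set. The mode names, the dictionary layout, and the function name select_gesture are hypothetical and are not taken from this specification.

```python
# Hypothetical mapping from the device mode reported in the context data
# to the classifier whose output should be used.
MODE_TO_CLASSIFIER = {
    "primary": "144a",        # menu of modes (call, text, music, GPS)
    "secondary": "144b",      # navigation within a mode
    "incoming_call": "144c",  # answer or dismiss
}


def select_gesture(context_mode, classifier_outputs):
    """Return the most likely gesture from the classifier matching the mode.

    classifier_outputs maps a classifier id to a dict of gesture -> probability.
    The outputs of the other classifiers are ignored.
    """
    classifier_id = MODE_TO_CLASSIFIER[context_mode]
    probabilities = classifier_outputs[classifier_id]
    return max(probabilities, key=probabilities.get)


# Example outputs for one frame (made-up values).
outputs = {
    "144a": {"call": 0.1, "text": 0.2, "music": 0.6, "gps": 0.1},
    "144b": {"left": 0.7, "right": 0.1, "select": 0.1, "back": 0.1},
    "144c": {"answer": 0.8, "dismiss": 0.2},
}
```

With these example values, the same frame of classifier outputs yields a different identified gesture depending only on the reported mode, which is the effect of limiting recognition to the gesture set relevant to the current interface state.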
In some implementations, the selection module 146 performs other processing of the classifier outputs. For example, the selection module 146 or another post processing module may smooth the data of the classifiers by skipping frames or adding a buffer of time to the beginning of each frame, so that each determined gesture may be cross-checked for accuracy. As another example, the selection module 146 may average probabilities or other outputs over many frames, e.g., over a particular number of frames or over a particular amount of time, to increase accuracy and reliability. As noted above, radar measurements can be made at frame rates from 100 to 10,000 frames per second. A stream of gesture indications 152, e.g., the gesture labels or instructions indicated, can be provided at the same rate, e.g., one indication per sensor data frame, or at a lower rate. For example, 10 or 100 frames of data and corresponding classifier outputs may be used to generate each gesture indication 152 provided to the computing device 130. In addition, when the user is not performing a gesture, or when a probability score or confidence score for all gestures is less than a threshold, the radar system 110 may indicate that no gesture is currently being performed, or may simply not provide any gesture indication 152 until a recognizable gesture is detected.
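As one possible illustration of the averaging and thresholding described above, the following sketch averages per-gesture probabilities over a sliding window of frames and reports a gesture only when the averaged score reaches a threshold. The class name GestureSmoother, the window length, and the threshold value are assumptions chosen for the example, and every frame is assumed to report the same set of gestures.

```python
from collections import deque


class GestureSmoother:
    """Averages per-gesture probabilities over a sliding window of frames."""

    def __init__(self, window=10, threshold=0.6):
        self.window = window        # assumed number of frames per indication
        self.threshold = threshold  # assumed minimum averaged probability
        self.frames = deque(maxlen=window)

    def update(self, probabilities):
        """Add one frame of classifier output.

        Returns a gesture label, or None when the window is not yet full
        or when no gesture reaches the threshold.
        """
        self.frames.append(probabilities)
        if len(self.frames) < self.window:
            return None
        averaged = {
            gesture: sum(frame[gesture] for frame in self.frames) / len(self.frames)
            for gesture in probabilities
        }
        best = max(averaged, key=averaged.get)
        return best if averaged[best] >= self.threshold else None
```

Returning None corresponds to the case in which no gesture indication is provided until a recognizable gesture is detected.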
In response to receiving the gesture indication 152, the computing device 130 can perform an associated action. For example, the computing device 130 can change operating modes, make selections, traverse items in a list, or perform other operations corresponding to the identified gestures.
In some implementations, the radar sensor chip 130 is a fully integrated, low-power radar operating in the 60-GHz ISM band. Different modulation architectures can be used, for example, a Frequency Modulated Continuous Wave (FMCW) radar, and/or a Direct-Sequence Spread Spectrum (DSSS) radar. For either technique, the entire radar system can be integrated into the package, including multiple beamforming antennas that enable 3D tracking and imaging with no moving parts.
Potentially the fastest growing option for distractionless driving within cars is the voice-controlled interface. Although voice-controlled interfaces currently appear to be one of the better options for safer driving, they have limitations, including difficulty with specific control of settings such as volume, as well as limited contextual understanding. While voice technology is improving rapidly, it is most commonly built into the car itself. With the average lifespan of cars at 8 years or 150,000 miles, users are quickly stuck with out-of-date technology or are required to keep updating their built-in systems. Although efforts like Android Auto aim to move voice technologies into the mobile phone with driving focused interfaces, these efforts are not yet widely available.
Another potential disadvantage of voice based interfaces is the social aspect of using them. Typically voice technologies are used in private, as it would be strange to use a voice-controlled assistant interface while with friends and disturb conversations. Socially speaking, the misunderstanding of words (especially for users with accents) is particularly frustrating.
In certain aspects, the radar chip 130 can measure between 9×9 mm and 12×12 mm. Radar chips can be built specifically for gesture interaction sensing. They are small enough to promise truly ubiquitous gesture interaction across a very broad range of applications. They can be used in many types of environments, including but not limited to traditional devices (such as mobile phones, tablets and laptops), Internet of Things (IoT) devices and car interiors.
The radar sensor of the system 110 can be uniquely suitable for the automotive context, as a radar-based technology allows for sensing and stability features that improve upon past technologies. Unlike camera based systems, the sensor may not be affected by the extreme lighting and atmospheric conditions found in car interiors. The sensors can be small in size so that they can be placed in almost any location in the car interior without obstructing the driver or adding visible components to the car interior. The sensor works through non-metallic materials, meaning that there is more freedom for placement of the sensor without worry of obstruction (a major problem with camera sensing). Privacy concerns inherent in camera based systems do not apply to the sensor. Overall the controlled environment of the car interior, in which a fixed location of the driver can be assumed, lends itself to robust gesture detection. To further optimize the robustness of the system, multiple sensors can be placed around the space in which gestures are performed.
The radar sensor chip 130 can be fast and accurate enough to track inputs at up to 10,000 frames per second and detect submillimeter motion. Range, velocity and motion are key to radar-based gesture interaction: the sensor does not build a map of its environment, but it can accurately detect and track multiple motion components caused by a hand moving within its field. For sensor enabled interactions, gestures with a clear motion component are determined by the system rather than gestures that are expressed as hand shapes.
The system providing the gestural interface can use gestures based on virtual buttons rather than physical buttons. In some aspects, the gesture can be performed when the hands of the driver remain in contact with the steering wheel of the car. The system can use a mobile phone located in the car rather than outdated dashboards or other interfaces built into the vehicle. Additionally, the use of the mobile phone allows for an entirely new market of drivers not wanting to replace an entire head unit if they already owned a car.
The system can include contextually limited features to keep the interface functional. Thus, by not overloading users with extra gestures, the number of gestures can be limited at any given time to only permit gestures that are relevant to the user's needs at a particular moment in time. Further, the system can include a shared gestural vocabulary that can be similar to gestures used on other devices such as mobile phones or tablets. The shared gestural vocabulary can be similar to swipes, zooms, clicks, and the like. Additionally, the shared gestural vocabulary can include a new but equally literate form of radar vocabulary gestures.
The system can include an intuitive interface that does not force users to learn a new abstract gestural language, by leveraging a symbolic language that users already know culturally. The symbolic language can be limited in its functionality so that it remains safe during driving and easily recallable without distraction. The system can also limit phone functionality so that drivers are less distracted by unnecessary features, streamline the menu system of the gestural interface to improve usability, and include a relatively small classification set to be technically functional.
In defining the classification set, otherwise known as feature set, a survey may be conducted. In the survey, 200 drivers who are 21 years or older and own or have access to an automobile and drive 3-4 times a week may be asked: “When driving, what three mobile services are most important to you? Navigation, music, phone calling, SMS/texting, search, email, or none of the above.” The results of the survey from most popular to least popular can include: navigation, music, phone calling, SMS/texting, search, email, none of the above.
Thus, in accordance with the survey, the system providing a gestural interface in vehicles can include a limited feature set including modalities such as navigation, music, phone calling, and texting. Each of the modes may be accessible from a menu screen of the gestural user interface of the system providing a gestural interface in vehicles. The modes may be accessible from the menu screen at any time, in which the user can provide a particular gesture to enter one of the modes, and ultimately complete a desired task.
The gesture set of the system can include different types of gestures such as primary symbolic gestures and secondary vocabulary gestures. The primary symbolic gestures can be defined as a hand gesture that may already be known culturally for one reason or another and is already in circulation and use by users. One example of the primary symbolic gesture can be the “I'm on the phone gesture.” On the other hand, secondary vocabulary gestures can include a set of gestures that work cross app as a ubiquitous gesture set including gestures such as: swipe left, swipe right, select, back, etc.
The system is designed so that a user enters a specific mode, such as navigation, by performing a primary symbolic gesture, and then once inside a mode can navigate using secondary vocabulary gestures. The distinction between primary symbolic gestures and secondary vocabulary gestures is important as a system design solution for UI navigation, and it equally simplifies many technical hurdles. By limiting the gestural functionality of the system at any given time, the classification accuracy can also be drastically improved, because the classification set is reduced to a smaller range of gestures at any given time, such as two to four gestures rather than eight to ten gestures overall.
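The two-level navigation described above can be pictured as a small state machine. The following is a minimal sketch under stated assumptions: the gesture labels, the mode names, and the class name GestureInterface are hypothetical and serve only to show how the set of accepted gestures shrinks to the few that are relevant in the current state.

```python
# Hypothetical gesture labels and mode names, for illustration only.
PRIMARY_GESTURES = {
    "call": "phone",
    "text": "texting",
    "music": "music",
    "gps": "navigation",
}
SECONDARY_GESTURES = {"left", "right", "select", "back"}


class GestureInterface:
    """Tracks the interface state and the gestures accepted in that state."""

    def __init__(self):
        self.mode = "menu"

    def active_gestures(self):
        """Return the gestures that are relevant in the current state."""
        if self.mode == "menu":
            return set(PRIMARY_GESTURES)
        return SECONDARY_GESTURES

    def handle(self, gesture):
        """Apply a gesture; return True if it was accepted in this state."""
        if gesture not in self.active_gestures():
            return False
        if self.mode == "menu":
            self.mode = PRIMARY_GESTURES[gesture]  # enter the selected mode
        elif gesture == "back":
            self.mode = "menu"  # return to the primary interface state
        return True
```

In this sketch only four gestures are candidates at any time, even though eight gestures exist overall, which mirrors the reduction of the classification set described above.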
The primary symbolic gestures can be based on colloquial gestures. By focusing on a limited modality defined by users, a set of hand gestures may be determined that symbolically represent each of the determined modes while still appearing unique in terms of radar signature. The radar signature of the gestures will be discussed further herein.
Using a predetermined set of gestures, a survey may be conducted to determine preferred gestures. The survey can include four possible gestures for each determined mode. The survey may provide animations of the possible gestures to participants of a particular age category and ask, “From the animations below, the simplest hand gesture symbolizing a ‘phone call’ is which of the following?” In certain aspects, the survey may find a majority preference for a specific gesture in each category. As such, the preferred gesture corresponding to each mode may then be determined to be the primary symbolic gesture for that particular mode.
In order to determine the gestures as they are detected by the sensor or radar chip, signal transformations such as range dopplers, micro-dopplers, spectrograms, and the like, can be examined. The examination of the signal transformations can be used to determine if each of the gestures are recognizable and/or unique to one another.
FIG. 2 is an exemplary illustration of classification accuracy tests of a system providing a gestural interface in vehicles. The classification accuracy tests portray how each gesture may be unique in relation to one another by distinguishing features such as: velocity, acceleration, distance from sensor, magnitude of change in motion, length of gesture, and approaching/receding directions. Classification accuracy tests are used to gauge if the movements corresponding to the gestures are detectable. Additionally, the classification accuracy tests are important in providing an understanding of how the gestures remain unique in movements, time, directionality, and proximity to the sensor or radar chip. The classification accuracy tests of FIG. 2 further define the gesture space and the uniqueness of each designed gesture, aiding both technical feasibility and system designs/mental models for the user.
In FIG. 2, two examples of gesture sets are illustrated, with their spatial positions indicated within the volume 210 near the steering wheel 220 where the radar system 110 can detect gestures. The first set of gestures includes four gestures 230a-230d, each having a different length, position, and trajectory within the volume 210. These gestures represent one gesture set, recognized by one classifier of the radar system 110. The second set of gestures includes two gestures 240a-240b, which represent a different gesture set identified using a different classifier. For each of these gestures, the dot along the path represents the terminal point of the gesture, e.g., where the user's hand pauses, or the speed of movement decreases below a threshold speed.
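The notion of a terminal point, where the speed of movement decreases below a threshold speed, can be illustrated with a short sketch. It assumes that a per-frame speed estimate for the hand is already available; the function name terminal_frame, the input format, and the threshold value used in the test are hypothetical.

```python
def terminal_frame(speeds, speed_threshold):
    """Return the index of the frame at which a gesture ends.

    speeds: per-frame hand speed magnitudes (assumed to be available).
    The gesture is taken to end at the first frame whose speed is below
    the threshold after the hand has been moving at or above it.
    Returns None if no such frame exists.
    """
    moving = False
    for index, speed in enumerate(speeds):
        if speed >= speed_threshold:
            moving = True   # the hand has started (or continues) to move
        elif moving:
            return index    # the hand has slowed below the threshold
    return None
```

Requiring motion before looking for the slowdown keeps an idle hand from being reported as a completed gesture.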
FIG. 3 is an exemplary illustration of primary symbolic gestures 310, 320, 330, 340. Each of the gestures is shown relative to the position of the steering wheel 220, and within the volume 210 detectable by the radar system 110. The primary symbolic gestures can be determined based on user evaluation of gestures as well as the feasibility of each gesture's radar signature. For example, the gesture 340 corresponding to the music mode can include rhythmically tapping on the steering wheel, as one would while listening to music. Thus, the music mode would be initiated. In another example, the gesture 310 corresponding to the phone call mode can include making the universal “call me” gesture. As such, the user may be able to answer an incoming call, or make a call out to a particular contact. In certain aspects, upon entering the call mode, a user can either select a favorite contact and call that particular contact, or the user can speak verbally to identify the particular contact. In another example, the gesture 330 corresponding to texting can include making a quack or “yada yada yada” hand gesture. Thus, the system may be configured to initiate the texting mode and compose a new text or open “reply mode” if a recent text just came in. Further, the gesture 320 corresponding to GPS mode can include holding a hand up as if looking at a map on the hand. As such, the GPS or navigation mode will be initiated.
FIG. 4 is an exemplary illustration of secondary vocabulary gestures 410, 420, 430, 440. The secondary vocabulary gestures can be determined based on symbolic relevance as well as technical feasibility. Within any given application it is important to have a shared gestural language, therefore reducing cognitive strain of memorizing extra hand gestures. Similar to a mobile phone's current zoom and pinch gestures, the secondary vocabulary gestures can include an equivalent set for driving control over mobile phones. For example, the gesture 410 corresponding to panning left or right can include flicking two index fingers while holding onto the steering wheel of a car. Thus, the gestural interface can be panned to the left and to the right via such gesture. In another example, the gesture 420 corresponding to selecting a particular entry can include flicking both index fingers forward with both hands simultaneously. In a further example, the gesture 430 corresponding to going back/dismissing an option can include flicking away from the steering wheel. In this instance, a gesture of disapproval, such as flicking away from the steering wheel, may be used to reject calls or go back to the home screen of the gestural user interface. Another example may be the gesture 440 corresponding to volume adjustment. This gesture can include gently moving index and thumb fingers back and forth, as if over a knob, to turn music or phone call volume up or down.
FIG. 5 is an exemplary illustration of a system providing a gestural interface in vehicles. The system providing a gestural interface in vehicles can include the sensor or radar chip 130 of the radar system 110 placed behind the steering wheel 210 of a car. The placement of this sensor can be determined based on the interior of the car, such as a sports car interior or a conventional sedan interior. While performing random hand gestures and movements inside the car, the sensor can be monitored at various locations to gauge signal responsiveness, and therefore determine the sensor location most suitable to the particular car type. The sensor can be placed centered five inches behind the steering wheel, “pointing” through the opening of the steering wheel. From this vantage point the sensor points directly at the user's chest and is able to read both gestures from hands on the steering wheel as well as in open air between the steering wheel and the user's chest. The steering wheel may be located 30 inches from the chest of the user. In some aspects, the sensor may be placed ad hoc in this position by the user and connected via BLE to a mobile phone on the dashboard of the car. By connecting the sensor to the mobile phone via BLE, there would be no need to embed the gesture interface technology within the car.
FIG. 6 illustrates an exemplary graph of the memorability of gestures. It can be important for the system to include symbolic gestures that are either known inherently or learnable in a way that they become immediately memorable. By reducing the cognitive strain when mentally associating gestures to UI controls, users remain less distracted and are able to place more focus on driving. To test the memorability of a particular gesture set, the initial participants used to determine the particular gesture set may be shown the results of the survey and taught each gesture that was chosen to be associated with the modes. The initial participants may then be asked at a later date to identify the mode corresponding to each of the chosen gestures. The results may indicate which gestures are intuitive to the users as well as which gestures are not as easy to remember.
The data may be collected from participants who are seated at a driving simulator and who are introduced briefly to the setup of the system and the corresponding gestures. The participants may be instructed how to do each gesture in the set, and given time to practice until they feel comfortable performing the gestures while holding the steering wheel. The participants may then be asked to perform the gestures and visually recorded at regular intervals. In some aspects, the gestures may be requested in random order so as to capture the gestures more organically.
The cross-user accuracy of each gesture may be tested by collecting a predetermined number of repetitions from each participant per gesture set. Each repetition can include a sample such as a buffered recording of 700 milliseconds to capture the length of the entire gesture. The samples may be used by a gesture recognition pipeline. The gesture recognition pipeline can be performed by the radar chip, the mobile phone, processing circuitry of the car, or any combination thereof. The gesture recognition pipeline can include a series of stages of signal abstractions and transformations. In an aspect, the radar chip performs the gesture recognition pipeline and a predetermined number of radar features are extracted from the pipeline. The radar features can be used to compute the mean, standard deviation, sum and absolute sum of each radar feature. As such, if eight core radar features are chosen, 32 total radar features will be collected after the computations are completed. Multiple random forest classifiers, e.g., multiple different sets of decision trees, can be used for classification of these features. The collected data of the features can be used to train and test the classifier on the various rounds of data. In certain aspects, the classification accuracy of the primary symbolic gestures, the phone call gestures, and the secondary gestures can surpass 97%.
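The expansion from eight core radar features to 32 summary features can be illustrated as follows. This is a sketch only: the function name summarize_features and the input layout are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the specification does not say which is used.

```python
import statistics


def summarize_features(frames):
    """Compute mean, standard deviation, sum, and absolute sum per feature.

    frames: list of per-frame feature lists of equal length, e.g., the
    frames buffered for one gesture sample (assumed layout).
    Returns a flat list with four values for each core feature.
    """
    summary = []
    for values in zip(*frames):  # one tuple of values per core feature
        summary.extend([
            statistics.fmean(values),
            statistics.pstdev(values),  # population std. dev. (assumption)
            sum(values),
            sum(abs(v) for v in values),
        ])
    return summary
```

For a sample whose frames each carry eight core features, the returned list has 4 × 8 = 32 entries, which would then form the input vector to a classifier.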
In certain aspects, three random forest classifiers may be used in the classification of the gestures. The three random forest classifiers can respectively correspond to primary symbolic gestures, phone call gestures, and secondary vocabulary gestures. The three random forest classifiers can run simultaneously, however, the current context of the gestural user interface may dictate which algorithm to pull data from at a given point in time. For example, if the UI has determined that the system is in music mode, the UI may pull data from the random forest classifier corresponding to the secondary vocabulary gestures to further determine the user's selection of music. In another example, if the UI has not determined an initial mode (according to the primary symbolic gestures), the UI may pull data from the random forest classifier corresponding to the primary symbolic gestures to determine which mode the user wishes to enter. The results of the random forest classifiers may be input to a post processing module. The post processing module may smooth the data of the random forest classifiers by skipping frames or adding a buffer of time to the beginning of each frame, so that each determined gesture may be cross-checked for accuracy.
FIG. 7 illustrates an exemplary graph of a volume of training data. The volume of training data includes a collection of data corresponding to three separate portions of the gestural user interface: primary symbolic gestures (call, send text, play music, start GPS), phone call gestures (answer, dismiss), and secondary vocabulary gestures (left, right, select, back). To account for differences in car dashboards, automobile sizes, and general variability, the random forest classifiers were run multiple times to understand how much training data is needed to create a reliable classifier.
The system providing a gestural interface in vehicles may use large training datasets to improve real time analysis. The system can be tailored to the type of car in which sensor placement may improve radar signatures corresponding to the provided gestures of the user. In some aspects, the training set of gestures may need to be altered according to the physical structure of each car dashboard. By placing the sensor in a location proximate to the steering wheel and in the vicinity of the chest of the user, the sensor may be able to determine a heart rate and respiratory rate of the user. The heart rate and respiratory rate of the user can further be used to determine stress levels of the user while driving. The system can be implemented with visual and audio cues that refrain from distracting the user while driving. For example, the audio cues may indicate to the user that contacts are being scrolled over when placing a call via secondary gestures in the phone call mode.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the disclosure. For example, various forms of the flows shown above may be used, with steps re-ordered, added, or removed.
Embodiments of the invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the invention can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a tablet computer, a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the invention can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
Embodiments of the invention can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
While this specification contains many specifics, these should not be construed as limitations on the scope of the invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of the invention. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
In each instance where an HTML file is mentioned, other file types or formats may be substituted. For instance, an HTML file may be replaced by an XML, JSON, plain text, or other types of files. Moreover, where a table or hash table is mentioned, other data structures (such as spreadsheets, relational databases, or structured files) may be used.
Particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims. For example, the steps recited in the claims can be performed in a different order and still achieve desirable results.