In 1987 Apple released this concept video of what they called a Network Navigator, an integrated agent to manage data from different sources, communicate with people across the network, and share information regardless of the source and its location.
It may seem quaint to us now, 30 years later but the real question is whether the tools have evolved to the point where it is possible to build such technology.
This post will explore some of the current technologies that would make such a project feasible and discuss potential future implementations. This is a theoretical exploration rather than a working example. There is no code in this post 🙂
Three things that were impossible to predict in ’87 are how powerful mobile phones would become, how much would agents become a common day occurrence, and the growth of home automation with tools like Alexa Home or Google Nest.
Tools like Google Assistant, Apple Siri, and Amazon Alexa have created large markets for voice-enabled applications so, in theory, it should be possible to do one of three things:
- Use default tools from the agent
- Load existing actions for what we need
- Build custom actions when actions are not available or they will not do what we need
For example, Google Assistant can already do voice searches in Wikipedia and Gooogle search so we have the first layer of research tools available.
We can also leverage existing voice interfaces from universities or commercial research databases.
If there are no existing actions we can develop our actions to interact with data sources and libraries. We may also want to consolidate all searches into a single command.
Search for information
Today I can tell Google Assistant to
search Wikipedia for bionics and it will display the results from Wikipedia for the requested term. How difficult would it be to trigger the same search on multiple sites/engines from the same command?
Machine Learning could help us to provide a context-sensitive search interface where results are more closely aligned with the meaning of our search.
Text to Speech
Most browsers now have text-to-speech capabilities and so do agents. It shouldn’t be too much of a stretch to use the agent’s built-in speech capabilities or piggyback on the Web Speech API to provide a more generic solution that has the browser or agent read the page to us without us having to “make” an agent to do it.
WebRTC provides both audiovisual and data channel point to point and multicasting communication.
Agents could leverage the low-level APIs or they could piggyback into existing tools like Google Meet or similar tools that allow communication and data exchange.