Second Half 2022 Technical Outlook for Data and Artificial Intelligence


With the first half of 2022 behind us, it’s time to take stock of where we are this year in big data, advanced analytics, and AI, and to assess where we’re likely to go next.

Based on where we have been so far in 2022, Datanami feels confident in making these five predictions for the remainder of the year.

Data observability keeps growing

The first half of the year was huge for data observability, which gives customers better insight into, and metrics about, what’s happening with their data streams. As data becomes more important for decision-making, the validity and usability of that data become more important as well.

We’ve seen a number of data observability startups raise hundreds of millions of dollars in venture funding, including Cribl (Series D, $150 million), Monte Carlo (Series D, $135 million), and Coralogix (Series D, $142 million). Others making news include Bigeye, which rolled out metadata metrics; StreamSets, acquired by Software AG for $580 million; and IBM, which bought observability startup Databand last month.

This momentum will continue into the second half of 2022, as more data observability startups come out of the woodwork and established companies seek to stake their claims in this emerging market.

Is real-time data ready for a boom? (Blue Planet Studio/Shutterstock)

Real-time data pops

Real-time data has been on the cusp for years, serving some niche use cases but never seeing widespread adoption among ordinary businesses. But thanks to the COVID pandemic and the associated changes in business plans over the past two years, conditions are now ripe for real-time data to move into the mainstream.

“I think streaming is finally happening,” Ali Ghodsi, CEO of Databricks, said at the recent Data+AI Summit, noting 2.5x growth in streaming workloads on the company’s cloud data platform. “They have more and more AI use cases that just need to be in real time.”

In-memory databases and in-memory data grids are also poised to take advantage of the real-time renaissance (if that is what it is). RocksDB, a fast key-value store often used alongside event-based systems like Kafka, now has a speedy alternative called Speedb. SingleStore, which combines OLTP and OLAP capabilities in a single relational database, reached a valuation of $1.3 billion in a funding round last month.

There’s also StarRocks, which recently raised funding for its fast OLAP database based on Apache Doris; Imply, which landed a $100 million Series D in May to grow its real-time analytics business based on Apache Druid; and DataStax, which added Apache Pulsar to its Apache Cassandra portfolio and raised $115 million to drive real-time application development. Datanami expects this focus on real-time data analysis to continue.

Governance grows

It’s been four years since the General Data Protection Regulation (GDPR) took effect, putting cavalier big data users on notice and accelerating the rise of data governance as a necessary component of responsible data programs. In the US, the task of regulating access to data falls to the states, and California is leading the way with the CCPA, which in many ways mimics the GDPR. But more states are likely to follow suit, complicating the data privacy equation for US companies.

But the GDPR and CCPA are just the start. We are also in the midst of the death of the third-party cookie, which is making it harder for companies to track what users are doing online. Google’s decision to delay the end of third-party cookies on its platform until January 1, 2023, has given marketers some extra time to adjust, but the information that cookies provide will be difficult to replicate.

In addition to data regulations, we are on the cusp of new regulations on the use of artificial intelligence. The European Union proposed its AI Act in 2021, and experts predict it could become law by the end of 2022 or early 2023.

Battle of table formats

A classic technology battle is taking shape over new table formats that will determine how data is stored in big data systems, who can access it, and what users can do with it.

Apache Iceberg has been gaining traction in recent months as a potential new standard table format. Cloud data warehouse giants Snowflake and AWS came out early this year in support of Iceberg, which provides transactions and other controls over data and emerged from work at Netflix and Apple. Cloudera, the former Hadoop distributor, also backed Iceberg in June.

But the folks at Databricks offer an alternative in the Delta Lake table format, which provides capabilities similar to Iceberg’s. The Apache Spark backer originally developed the Delta Lake table format in a proprietary manner, leading to accusations that Databricks was setting customers up for lock-in. But at the Data+AI Summit in June, the company announced that it was committing the entire format to open source, allowing anyone to use it.

Lost in the shuffle is Apache Hudi, which also provides data consistency for data sitting in large data lakes and keeps it accessible to various compute engines. Onehouse, a company founded by the creators of Apache Hudi, launched a Hudi-based lakehouse platform earlier this year.

The big data ecosystem loves competition, so it will be interesting to watch these formats evolve and jockey for position over the remainder of 2022.

Language AI keeps advancing

The cutting edge of AI gets sharper every month, and today the tip of the AI spear is the large language model, which continues to improve. In fact, large language models have become so good that a Google engineer claimed in June that the company’s LaMDA conversational system had become sentient.

AI isn’t sentient yet, but that doesn’t mean it isn’t useful to the enterprise. We were recently reminded that Salesforce has a large language model (LLM) project called CodeGen, which seeks to understand source code and even generate its own code in different programming languages.

Last month, Meta (Facebook’s parent company) unveiled a large language model that can translate between 200 languages. We’ve also seen efforts to democratize AI through projects like the BigScience Large Open-science Open-access Multilingual Language Model, or BLOOM.

What are your expectations for the rest of 2022? Drop us a line and let us know.
