Covering Scientific & Technical AI | Wednesday, November 27, 2024

Rangers Redux: Can Texas Repeat with Data, Analytics, and AI? 

Hope springs eternal on Opening Day. Every team starts with a perfect record and dreams of winning the World Series in six months. For the Texas Rangers, defending their championship will require the proper mix of hard work, determination, and luck. Oh, and data–lots and lots of data.

The Texas Rangers worked hard, both on and off the field, in winning the franchise’s first World Series last year. Scouts spent years scouring the world for talent, the front office made personnel moves that put the team in playoff contention, and the players came through with timely plays down the stretch. Luck also factored in, with an unheard-of 11-game road winning streak through the playoffs.

So what ultimately pushed the Rangers over the top? One theory is that the team’s commitment to and investment in data, analytics, and AI had something to do with it. Alexander Booth, the Rangers’ assistant director of research and development, discussed the team’s use of the Databricks data platform and its adoption of AI, including generative AI, at the Data + AI Summit last June.

Following the Rangers’ World Series win, Booth sat down with Datanami to share some of the lessons from the 2023 season, and how the R&D department will look to improve its data, analytics, and AI game in preparation for the 2024 season.

“One of our core tenets in the research and development department is that investing in technology and investing in data gives us a competitive advantage,” Booth said in an early December interview. “We don’t ever want to be chasing other teams in a catch-up mode, especially when it comes to technology and data.”

 

Booth characterized the Rangers’ use of data, analytics, and AI as aggressive and expansive, but also balanced. The team tries to employ data, analytics, and AI to optimize as many decisions as possible, while still leaving room for the gut feel of baseball lifers like Manager Bruce Bochy.

“Obviously with a guy like Bochy or CY [General Manager Chris Young], they have a lot of domain expertise in the game. They’ve been around for a while, and that’s super valuable,” Booth said. “But at the end of the day, we want to make a decision. [Whether it’s] a decision on alignment for our defensive positioning, whether or not we’re going to add this guy to the roster to protect him from the Rule 5 draft, or who we’re going to start in pivotal playoff games–to make a decision, especially a high-leverage decision like that, they want to see as many data points as possible.”

The combination of the Databricks platform, AWS compute and storage, and data tools like Prophecy gives the Rangers R&D team the capability to amass a lot of data in a single place for analysis and modeling. What they do with the data depends on where they can make an impact on the game.

The breadth of the Rangers’ data, analytics, and AI systems is impressive, with many different systems designed to inform decision-makers. From tracking player development at the amateur level, to using physics-based models to fine-tune defensive positioning, to running simulations to optimize pitcher-hitter match-ups, the Rangers are fully enmeshed in data, analytics, and AI.

Here’s a peek into some of the Rangers’ systems for data, analytics, and AI:

Scouting with GenAI

The Rangers were among the first MLB clubs to adopt generative AI, which burst into being with the launch of ChatGPT in late 2022 and took the world by storm in 2023.

“You know it’s a crazy technological revolution when these guys that are old players who just live and breathe baseball are asking about ChatGPT and how can we kind of integrate this into the Rangers somehow,” Booth said.

 

Much of the information scouts use is of the unstructured variety–scouting reports, newspaper articles, video interviews. GenAI helps the Rangers’ scouts filter out the noise and focus on information that matters.

“I talk to them, and they say ‘I do Ctrl-F.’ They have these key phrases that they look for,” Booth said. “For our stakeholders who are reading dozens and dozens of scouting reports and articles, consuming a ton of media about these players, watching a lot of video–it can get really hard to dig through the noise.”

Natural language processing (NLP) is also helping the Rangers identify intangibles about the players themselves. By pairing speech-to-text capabilities with language models, they can quickly process many videos to get an idea of what a college or high school player’s mental makeup is and how well they respond to adversity.

“That’s something that happens in baseball all the time. You get injured. You fail. You have a bad week. You have a bad two weeks. But how do you pick yourself up? How do you try and strive to make yourself better?” Booth said. “We’re able to identify certain key phrases and sentiment with natural language processing.”

The Rangers have developed their own language model that knows how baseball people talk. So when a scout says something like “this guy throws gas” or “this guy is built like a truck,” the model knows that those are positive sentiments.

“So trying to tune the models to fit to that natural language expectation has been an interesting problem to solve,” Booth said, “but I think we’ve done a pretty good job of approaching it.”
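To make the idea concrete, here is a minimal sketch of how baseball jargon could be mapped to sentiment. It is not the Rangers' model, which Booth describes as a tuned language model; this stand-in uses a simple phrase lexicon instead, and every phrase, score, and function name below is a hypothetical chosen for illustration (only "throws gas" and "built like a truck" come from the article).

```python
import re

# Hypothetical lexicon mapping scouting jargon to a sentiment score.
# Scores are illustrative assumptions, not values from any real system.
BASEBALL_LEXICON = {
    "throws gas": 1.0,                # very hard fastball: positive
    "built like a truck": 1.0,        # strong frame: positive
    "struggles with command": -0.8,   # assumed negative phrase
}

def score_report(text, lexicon=BASEBALL_LEXICON):
    """Return (total score, matched phrases) for one scouting note."""
    # Lowercase and collapse whitespace so phrases match across line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    matches = [phrase for phrase in lexicon if phrase in normalized]
    return sum(lexicon[phrase] for phrase in matches), matches
```

A general-purpose sentiment tool would likely read "throws gas" as neutral or negative, which is the gap a domain-specific vocabulary, or a model tuned on scouting language, is meant to close.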

Player Tracking and Biomechanics Data

One of the biggest revolutions in baseball analytics is the widespread availability of tracking data. Every pitch and every play is meticulously tracked by Statcast at 30 frames per second, with some limb movement tracked at 300 frames per second by the Hawk-Eye high-frame-rate cameras introduced in 2023. But not every team is equally capable of taking advantage of it.

 

“In baseball, it’s been this explosion of new technology,” Booth said. “We’ve been getting this data for a little while now, and we knew that without a cloud platform, that we weren’t going to be able to process that. And there are clubs that can’t process it–straight up, they have no way of having the technology to be able to analyze bio-mechanics data to get an advantage. So we wanted to build something future-resistant and future-proof.”

The good news for MLB teams is that high schools and colleges are now investing in the more basic, 30-frame-per-second tracking technology too. That cranks up the volume of biomechanical data available on prospects, which all goes into the pot to help MLB teams like the Rangers predict which players have a future in the Big Leagues.

“At the end of the day, that’s what we’re doing,” Booth said. “We’re going to have AI models that are going to be predicting the likelihood that this high school or college player’s going to need surgery, predicting the expected round this guy is going to be taken in, predicting things like bonuses.”

Weather Data

Another source of massive data is the weather. Whether the wind is blowing in or blowing out on a given day will help inform a range of on-field decisions, such as what kind of pitch-mix to use, how to compose the batting order, and where outfielders will play.

 

How a field plays is impacted by weather (Image courtesy Statcast)

“The weather data is insane,” Booth said. “It’s a lot of data coming in that we’d never had before. Fluid dynamics, physics-based models predicting how balls would fly in different kinds of atmospheric conditions, given different wind speeds, and things like that.”

The science says wind blowing toward home plate will tend to amplify breaking balls, which will impact the mix that a pitcher might use. When the wind is blowing toward the outfield, it might incline a manager to put in the big boppers, or move them up in the lineup, in the hopes of getting home runs.
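As a rough illustration of why wind direction matters for fly balls, the sketch below integrates a batted ball's flight with air drag computed relative to the moving air. It is a toy model, not the Rangers' physics: it ignores spin entirely (so it says nothing about breaking balls), and the drag constant `k`, the function name, and the parameters are all assumptions made for the example.

```python
import math

def carry_distance(exit_speed, launch_deg, wind_out=0.0, k=0.006, dt=0.001):
    """Horizontal distance (m) flown before the ball returns to launch height.

    exit_speed: speed off the bat in m/s
    launch_deg: launch angle in degrees above horizontal
    wind_out:   wind along the line of flight in m/s (positive = blowing out)
    k:          assumed lumped drag constant in 1/m (rough guess for a baseball)
    """
    g = 9.81
    angle = math.radians(launch_deg)
    vx, vy = exit_speed * math.cos(angle), exit_speed * math.sin(angle)
    x = y = 0.0
    while True:
        # Drag depends on the ball's velocity relative to the air, not the ground.
        rel_x, rel_y = vx - wind_out, vy
        rel_speed = math.hypot(rel_x, rel_y)
        vx += -k * rel_speed * rel_x * dt
        vy += (-g - k * rel_speed * rel_y) * dt
        x += vx * dt
        y += vy * dt
        if y < 0:
            return x
```

In this simplified model, a ball hit at 45 m/s and 30 degrees carries farther with `wind_out=5` than in still air, and shorter with `wind_out=-5`, which matches the intuition behind stacking the lineup with power hitters when the wind is blowing out.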

The availability of weather data also helps the Rangers normalize hitting, pitching, and fielding statistics for players and prospects. The Rangers play in a ballpark with a retractable roof, which minimizes weather impacts, but the R&D team can use data to see what kind of stats a player or prospect will put up at Globe Life Field.

“If we didn’t really have a tech stack to look at that, or the people or the AI or the products, like Prophecy to process that at scale, we would be stuck,” Booth said. “So building out the strategy to allow us to be a first mover on weather data, is the advantage.”

In-Season Modeling

Baseball has always been a game of numbers and statistics. What’s changed since the Moneyball era started about 20 years ago is the amount of data that teams use for analysis, and the types of analyses they’re doing.

 

Rangers second baseman Marcus Semien (Conor-P.-Fitzgerald/Shutterstock)

For instance, the Rangers used machine learning and AI models to help with all sorts of player development decisions, including whether to sign particular free agents. During the 2023 season, the team had models that tried to predict what kind of season various free agent pitchers would have.

“We had models that said, alright we’re going to sign Jacob deGrom in the offseason and now let’s predict the likelihood of injury,” Booth said. “Unfortunately, he did get injured fairly early this season, but knowing uncertainty and probability theory, that was a risk we were willing to take at that time.”

At the trade deadline, the Rangers used models to predict the future performance of pitchers Jordan Montgomery and Max Scherzer, weighing the likelihood of a good contribution against the odds of an injury and the salary hit the Rangers would take. The models play a part, but aren’t the only factor in these decisions, Booth said.

“The decision was not made purely because of the AI model,” Booth said. “The decision is a holistic, organizational decision, and CY really has a culture where he listens to everybody and he really gets that point of view across.”

Game Modeling and Simulation

The Rangers also use modeling and simulation to see how changes in the lineup or defensive positioning can help them win. According to Booth, it’s not much different from MLB The Show, a popular video game.

“You can kind of plug in a lineup and see what happens during the game, and now I want to run that 10,000 times,” he said. “Or maybe I want to look at every possible permutation of a lineup and see what is going to perform the best.”
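The "run it 10,000 times" idea is a Monte Carlo simulation. Below is a minimal sketch under heavy simplifying assumptions: each hitter is reduced to four hypothetical outcome weights (out, single, double, home run), every runner advances exactly as many bases as the batter, and there are no walks, steals, or pitching changes. The function names and data layout are illustrative, not anything the Rangers have described.

```python
import random
from statistics import mean

# Bases gained on each plate-appearance outcome: out, single, double, home run.
OUTCOMES = (0, 1, 2, 4)

def simulate_game(lineup, rng, innings=9):
    """Play one simplified game and return runs scored.

    lineup: list of weight tuples aligned with OUTCOMES, one per hitter.
    The weight for an out must be positive, or an inning would never end.
    """
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]  # first, second, third
        while outs < 3:
            weights = lineup[batter % len(lineup)]
            batter += 1
            hit = rng.choices(OUTCOMES, weights=weights)[0]
            if hit == 0:
                outs += 1
                continue
            # Assumption: every runner advances as many bases as the batter.
            advanced = [False, False, False]
            for base, occupied in enumerate(bases):
                if occupied:
                    if base + hit >= 3:
                        runs += 1
                    else:
                        advanced[base + hit] = True
            if hit == 4:
                runs += 1  # the batter scores on a home run
            else:
                advanced[hit - 1] = True
            bases = advanced
    return runs

def average_runs(lineup, n_games=10_000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return mean(simulate_game(lineup, rng) for _ in range(n_games))
```

Comparing two batting orders then comes down to calling `average_runs` on each and looking at the difference. Checking every permutation of a nine-man lineup means 362,880 orders, so an exhaustive search at 10,000 games apiece gets expensive quickly without a lot of compute.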

 

The Rangers use simulation to help make decisions (Image courtesy MLB The Show)

On the pitching side, the Rangers have the capability to determine the odds of things happening in certain situations, such as whether a certain hitter is likely to hit a sinkerball in a one- or two-strike count. “We can simulate that out and say, in how many situations has that groundball happened? What is the probability that it actually gets through the infield? What’s the probability that he gets on base or comes around and scores a run?”

The simulations work hand-in-hand with their AI models to help the Rangers understand what the results are really saying.

“A lot of traditional ML models, it’s really hard to understand the certainty of their outputs and predictions,” Booth said. “So coupling AI outputs and recommendations with some of the outputs of simulations gives an uncertainty estimate to some of these point predictions and point estimations, which again goes back to the motif of the more information, the more data, the more techniques and models that you have to kind of analyze the situation, the more confident you’ll be in the recommendation at the end of the day.”
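One simple way to attach uncertainty to a point prediction is to simulate many outcomes from it and report a range rather than a single number. The sketch below assumes a model has produced a per-plate-appearance hit probability and asks how many hits that implies over a stretch of plate appearances. The function names, the 90% coverage level, and the independence between plate appearances are all assumptions made for the example.

```python
import random

def simulate_hits(p_hit, plate_appearances, n_sims=10_000, seed=0):
    """Simulate hit totals, treating each plate appearance as independent."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_hit for _ in range(plate_appearances))
        for _ in range(n_sims)
    ]

def interval(samples, coverage=0.9):
    """Return the (low, high) values bracketing the central share of samples."""
    ordered = sorted(samples)
    tail = (1 - coverage) / 2
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1 - tail) * last)]
```

With a predicted probability of 0.3 over 50 plate appearances, the point estimate is 15 hits, and `interval(simulate_hits(0.3, 50))` gives the band around it. A wide band is a signal to treat the point prediction with caution.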

Prepping for What’s Next

 

Alexander Booth is the assistant director of R&D for the Texas Rangers

The Rangers may enjoy a competitive advantage in the data, analytics, and AI department right now, but that lead won’t last forever. Other teams will emulate their World Series-winning approach. The technology is also evolving extremely quickly, which gives other teams the opportunity to catch up and leapfrog the Rangers.

If the Rangers are going to repeat as World Series champions, they will need to beat complacency. Booth said the team is determined not to rest on the laurels of a championship, and to keep finding new ways to exploit data, analytics, and AI for competitive advantage.

“I don’t think that this is going to give us a competitive advantage forever,” he said. “But I think there’s always going to be a next thing, and if we can build something that’s future-resistant [that allows us] to get new data sources to make decisions quicker, or new innovative machine learning and artificial intelligence techniques–if we have a platform in place to be a first mover in that space, that’s going to be what continuously gives that edge.”
