32.Lock Code Circular

Data scientists, who often choose open source packages without considering security, increasingly face concerns over the unvetted use of those components, new study shows.

Vulnerabilities in open source components — such as the widespread flaws revealed 10 months ago in Log4j 2.0 — have forced data scientists to reevaluate the open source code frequently used in analysis and the creation of machine learning models.

According to a report by Anaconda, a data-science platform firm, in the past year, 40% of surveyed data scientists, business analysts, and students have scaled back their use of open source components, while a third remained steady, and only 7% incorporated more open source code into their projects. The majority of those surveyed do not report to the information technology department (18%), but work within their own data science or research and development group (47%), according to Anaconda's "2022 State of Data Science" report, released last week.