Orbital Insight closes $50M Series C led by Sequoia

This morning, Orbital Insight, a geospatial analytics startup, announced it had completed raising a $50 million Series C round of financing from Sequoia. The fresh capital brings the company’s total fundraising to $78.7 million.

Orbital Insight generates analytics for businesses using satellite imagery and other data. A product of deep learning and more readily available satellite imagery, the geospatial analytics space as a whole has drawn a lot of attention for the role it has played in the hedge fund industry.

With images of retailer parking lots, companies like Orbital Insight have proven their ability to extrapolate accurate revenue predictions for businesses by counting cars as a proxy in advance of official earnings reports. But as data and technology commoditize, startups have been racing to service new markets and simplify their offerings to avoid becoming clunky consultancies.

“Neural networks are very generalizable,” said Bill Coughran, a partner at Sequoia Capital. “Some adaptation is needed depending on the use case, but it’s the same basic underlying tech.”

With the news of today’s Series C, Orbital Insight has grown to become one of the most capitalized startups in the geospatial analytics space. Some of its competitors, Descartes Labs and Spaceknow, have raised $8.28 million and $5.2 million, respectively.

“The company is seeing strong uptake commercially and with the government,” added Coughran. “We felt this was a good time to hire for engineering and sales.”

New investors Envision Ventures, Balyasny Asset Management, Geodesic Capital, ITOCHU Corporation and Intellectus Partners joined Sequoia in today’s round. And, of course, earlier investors GV, Lux Capital and CME Ventures maintained their involvement with participation.

The diverse collection of investors hints heavily at ambitions of international expansion. Geodesic Capital is regularly involved in helping startups access Asian markets and ITOCHU itself is a Japanese firm.

Your headphones aren’t spying on you, but your apps are. Here’s why.

Lawyers in the US are claiming that headphone and speaker company Bose is secretly collecting information about what users listen to when they use its Bluetooth wireless headphones.

Edelson, the lawyers acting on behalf of customer Kyle Zak of Illinois, claim that information about what Zak has been listening to through his Bose headphones was being collected without his knowledge or explicit consent every time he used a Bose companion mobile app called Bose Connect.

The app allows customers to interact with the headphones, updating software and also managing which device is connected at any time with the headphones. If the headphones are being used to listen to something, details about what is being played will show up in the Connect App.

This information is then collected by Bose and sent to third parties, including companies like Segment, who facilitate the collection of data from web and mobile applications and make it available for further analysis.

The lawyers are contending that Bose’s actions amount to illegal wire tapping and that the information being collected could reveal a great deal of personal information about customers. Allegedly, Kyle Zak would not have bought Bose headphones if he had known that this information would be collected and he further claims that he never gave his consent for this information to be collected.

Bose has denied the allegations and pointed to the privacy policy in the Connect App that is explicit about the fact that it collects de-identified data for Bose’s use only and does not sell identified data for any purpose including “behavioral advertising”. Bose also points out that what a customer listens to on the headphones is only visible to Bose if the customer is using the Connect App and has it open and running.

Given the app’s limited functionality, it is really unclear why anyone would use the Connect App for this purpose on a continuous basis.

Most software uses tracking

The majority of apps installed on a phone will be collecting data about its usage and sending it back, de-identified, for analysis. This data may well be aggregated without giving any detail about any individual user. So, it would not be possible for example to say whether people who use an app every day are more likely to use particular features. Of course, some companies do collect this level of detail.

So what is this tracking data used for?

Developers use this information to track a range of things including statistics about usage of the app. Companies usually track how many daily and monthly active users they have and how many users stop using the app after opening for the first time.

Developers are also interested in finding out whether the app experiences problems, such as crashes. They also want to know which features of the app customers use, in what sequence, and for how long.
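As a rough illustration of the usage statistics described above, the sketch below computes daily active users, monthly active users, and the users who never returned after their first day. The event log, the user IDs, and the function names are all hypothetical; real analytics services collect and store this data in their own ways.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, de-identified event log: (anonymous user ID, day the app was opened).
events = [
    ("u1", date(2017, 4, 1)),
    ("u1", date(2017, 4, 2)),
    ("u2", date(2017, 4, 1)),
    ("u3", date(2017, 4, 2)),
    ("u3", date(2017, 4, 20)),
    ("u4", date(2017, 4, 20)),
]

def daily_active_users(events):
    """Map each day to the number of distinct users who opened the app."""
    users_by_day = defaultdict(set)
    for user, day in events:
        users_by_day[day].add(user)
    return {day: len(users) for day, users in users_by_day.items()}

def monthly_active_users(events, year, month):
    """Count distinct users with at least one open in the given month."""
    return len({u for u, d in events if (d.year, d.month) == (year, month)})

def one_time_users(events):
    """List users seen on exactly one day, i.e. who did not come back."""
    days_by_user = defaultdict(set)
    for user, day in events:
        days_by_user[user].add(day)
    return sorted(u for u, days in days_by_user.items() if len(days) == 1)

dau = daily_active_users(events)
mau = monthly_active_users(events, 2017, 4)
dropped = one_time_users(events)
```

Note that the first two figures are aggregates, while the last one needs a stable per-user identifier. That is the difference between aggregated statistics and the more detailed tracking that only some companies do.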

A range of companies, including Apple and Google, provide means of collecting anonymous statistics from users. The data is sent back to a server and made available for analysis. This type of tracking is very different from the tracking that is done for advertising purposes. In the advertising case, information is collected that is identifiable and used to personalise ads to be delivered either directly through the app, or through other services.

Hidden privacy statements are not enough

Privacy statements for apps, websites and other software should make it clear, before the user starts using the app, what information the software is collecting, who it will be shared with, and for what purposes. Most software, however, does not do this. Companies simply skip showing a user the privacy statement and make reference to the fact that the statement can be accessed somewhere on a website or in the app, at a later time.

Another problem with a great number of privacy policies is that they are written in legal language and do not make explicit what information is being collected and for what purpose.

It is not only the companies that treat privacy as an afterthought. Customers also struggle with understanding the basics of their rights to privacy and what a privacy statement actually does. In 2014, Pew Research found that 52% of Americans surveyed wrongly believed that simply having a privacy policy at all meant that companies kept confidential all the information they collected on users.

In another survey, only 20% of users who read any part of a privacy policy felt they fully understood what they had read.

Ironically enough, the website of legal firm Edelson does not feature a clear link to its privacy policy. Its privacy statement is buried in a “Disclaimer” which helpfully says: “PLEASE READ THE FOLLOWING TERMS OF SERVICES & LEGAL NOTICES (“THIS AGREEMENT”) CAREFULLY BEFORE USING THE EDELSON.COM WEBSITE”. Somewhat hard to do if you have to visit the site to get to it.

Privacy should be treated as a fundamental driver of design in software. This situation has been changing, especially as companies have focused on protecting customers’ privacy, not from the companies themselves, but from law enforcement agencies, secret services and the government in general. 

Perhaps the threat of legal action by companies like Edelson will also prove another incentive to do the right thing.

This article originally published at The Conversation

The future is being able to monitor the heart rate of your favorite NFL player

Defensive back Marcus Maye runs through a drill during Florida's NFL Pro Day in Gainesville, Florida.

Image: AP/REX/Shutterstock

In the perhaps not-so-distant future, NFL players will be able to monetize intimate data about their body. 

The NFL Players Association recently closed a deal with Whoop, a wearable tech company that makes wristbands that track heart rate, sleep quality, and more. 

Whoop will now hand out wristbands to NFL players, who will own the data produced by their wearables and can sell it. As Bloomberg wrote, the buyer could be a network that then broadcasts a player’s heart rate throughout a particular game.

Players don’t have to sell their data, and they don’t even have to wear the wristbands. But if they do both, it could provide some pretty fascinating insights.

Does that one quarterback look out of shape? Well, now you can look up his data. Is that injured player ready to return to the field? Check his profile to find out. 

Bloomberg reports that the NFLPA and Whoop will examine the data to learn more about how NFL players recover during the season, in hopes of better protecting players from injury. In exchange, Whoop gets to sell wristbands designed by NFL players, and of course now has the endorsement of a player organization in the most popular sports league in the United States. 

The effects of this type of data availability, while unknown, are somewhat creepy to contemplate. Whoop has worked with college athletes and not given those athletes control of the data on them, allowing coaches to know whether the student-athletes are, for example, getting “enough” rest on Friday and Saturday nights. 

NFL athletes seem to have significantly more control—but, if wearing these wristbands becomes widespread, we’ll see how much pressure is put on athletes to conform. 

Immuta adds accountability and control for project-based data science

Fresh off $8 million in Series A financing, Immuta is releasing the second version of its data science governance platform. With the democratization of machine learning come new risks for businesses that have too many workers manipulating data sets and models without oversight. The Immuta platform helps companies maintain an understanding of how digital assets are applied and shared across a company.

Immuta’s fully Dockerized platform is something of a “Google Docs” for data scientists. To ensure privacy compliance and proper implementation, managers can effectively track changes and set permissions and sharing settings — in addition to more complex auditing features. The version being launched today adds the concept of “Projects” to the platform. This enables data management at the granularity of a specific task that brings together multiple people and data sources.

Matthew Carroll, CEO of Immuta, explained to me in an interview that the goal of Immuta is to provide transparency and accountability. The startup is targeting customers in the middle of the road in their machine learning adoption. This only really excludes companies that live under a rock and companies that have their own internal full-stack machine learning infrastructure, like Facebook or Google.

Founded in 2014, the College Park, Maryland-based startup has grown to 24 employees. Some of its early customers in the private sector include General Electric, Capital One, and Orange. Immuta also services public sector clients including the U.S. Government.

It’s a particularly hot time for data governance startups given recent developments with the European Union’s General Data Protection Regulation (GDPR). As time goes on, data will continue to become less and less of a free-for-all and companies will be expected to ensure data is only put to use for specified purposes. Businesses have an explicit legal interest in ensuring data compliance, something Immuta is more than willing to support.

Users of Immuta can now also take advantage of a virtual Hadoop Distributed File System (HDFS) Layer for distributed batch processing without sacrificing control. Future versions of Immuta will take into account the ever-changing policy and regulatory landscape and offer new features for further automating data governance.

A conversation about digital copyright reform

The European Union is in the process of reforming copyright laws that date back to 2001, as part of a wider strategy to establish a Digital Single Market across the 28 Member States of the bloc, aiming to break down regional barriers to ecommerce.

Earlier this year an agreement was reached on ending geoblocks on travelers’ digital subscriptions by 2018. And EU consumers are set to say adios to mobile roaming fees from this June. So far so good, you could say.

But when the European Commission’s draft proposals for digital copyright reform were published last September they were criticized by tech companies as regressive, and by copyright reformists as a missed opportunity to modernize ill-fitting laws to make them fit for purpose in the Internet age.

There have also been warnings of the potential impact on startups of the copyright reform, although it’s fair to say that the loudest complaints are coming from big US tech companies who appear to be core targets in the European Commission’s draft proposal, on account of the size and power of their content sharing platforms.

In the supporters’ camp, EU sources argue that the Commission’s proposals will help European creative and copyright-centered industries flourish in a Digital Single Market, and European authors reach new audiences — while making regional works widely accessible to EU citizens and across borders.

“The aim is to ensure a good balance between copyright and relevant public policy objectives such as education, research, innovation and the needs of persons with disabilities,” one EU source told us. “We trust that the discussions in the Council and the European Parliament will aim to maintain this ambition and will facilitate access to and use of copyright-protected content online and ensuring a well-functioning copyright marketplace.”

At the launch of the draft proposals last year, Andrus Ansip, VP for the Digital Single Market, summed up the balance the EC is seeking to strike thus: “Europe’s creative content should not be locked-up, but it should also be highly protected, in particular to improve the remuneration possibilities for our creators.”

Neighboring rights for news

Among the most controversial elements of the proposals is an extra copyright provision for using snippets of journalistic content online: a so-called ‘neighboring right’ for news sites, which critics describe as an attack on the hyperlink.

This could apply to link previews generated by news aggregators like Google News, for example, or social network sites like Facebook linking out to news articles. But there are also suggestions it may disproportionately impact startups in the news aggregation and/or media monitoring space.

EU sources emphasize, however, that there is no requirement that publishers levy a charge for their content. Rather, it is up to publishers to decide on conditions for use of their content, the argument being that the neighboring right would give publishers a stronger legal basis to negotiate with third parties.

A similar law was enacted in Germany in 2013, but uncertainty remains about what actually constitutes a snippet — and local publishers ended up offering Google free consent to display their snippets after they saw traffic fall substantially when Google stopped showing their content rather than pay for using them.

Spain also enacted a similar ancillary copyright law for publishers in 2014 but its implementation required publishers to charge for using their snippets — leading Google to permanently close its news aggregation service in the country. A subsequent economic study found the significant drop in traffic associated with the shuttering of Google News in Spain mostly affected smaller, niche or newcomer publishers. But even large media entities there have come out against the law.

Content monitoring

Another highly controversial portion of the copyright proposal is a requirement on websites that host large amounts of user generated content to monitor user behavior to identify and prevent copyright infringement. In other words, to shift from reviewing reported content after it’s been published to proactively scanning at the point of upload to try to prevent copyright infringements happening in the first place.

Critics complain this approach would compel private companies to police the Internet on behalf of rights holders. They also suggest it’s a surveillance risk and that requiring indiscriminate monitoring of citizens’ online activity is disproportionate and therefore potentially violates fundamental EU privacy rights.

Countering these criticisms, EU sources emphasize that the Commission’s proposals are specifically targeted at services that store and give access to “large amounts of copyright protected content” — pointing out that such platforms have become important players on the content market.

“Due to the nature and significance of these services for the distribution of copyright protected content, they are required to take certain measures to allow a better functioning of the content market,” one source told us. 

Measures taken must also be “proportionate”, and should not be “unnecessarily complicated or costly for the service providers”, according to the source. Nor are specific technologies or solutions imposed.

“It is for the services to find the appropriate and proportionate measures, which could be developed either internally or using for example third party services, as done by a number of services already today,” the source added, arguing that the proposal “strikes a balance between different interests”.

“It imposes obligations on platforms with large amounts of copyright protected content, which can be expected, due to their role on the content market, to have certain responsibilities. It also introduces safeguards for businesses and users. It does not introduce a general obligation to monitor content.”
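To make the idea of scanning at the point of upload concrete, here is a minimal sketch of the simplest possible check: comparing a cryptographic hash of each upload against a catalogue of known works. It is an assumption-laden illustration and not a description of any platform’s actual system. The catalogue, the sample byte strings and the function names are hypothetical, and exact hashing is swapped in here for the far more elaborate fingerprinting that real content recognition tools rely on.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the raw bytes; identical files give identical digests."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical catalogue of fingerprints supplied by rights holders.
protected_work = b"example recording bytes"
known_fingerprints = {fingerprint(protected_work)}

def allow_upload(data: bytes) -> bool:
    """Upload-time check: reject only byte-for-byte copies of catalogued works."""
    return fingerprint(data) not in known_fingerprints

exact_copy = b"example recording bytes"
altered_copy = b"example recording bytes, re-encoded"
unrelated = b"a user's own holiday video"

results = {
    "exact_copy": allow_upload(exact_copy),
    "altered_copy": allow_upload(altered_copy),
    "unrelated": allow_upload(unrelated),
}
```

In this sketch only the byte-identical copy is rejected, and any change to the file slips past the check. The check also knows nothing about context, so it cannot tell a licensed or otherwise lawful use of a catalogued file from an infringing one.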

Text and data mining

The copyright reform also proposes to establish a new EU-wide exception for text and data mining, but only for research institutions conducting scientific research, which has raised questions over whether commercial data mining activity might suddenly be considered to lie outside the law.

Responding to this concern, EU sources argue that the proposal does not regulate or extend access to data for any stakeholders, nor does it change the current situation for other users of text and data mining — adding that these users “can continue exercising their activities under the same conditions as today”.

There has also been disappointment among copyright reformists that the EC has not sought to harmonize rules across the EU to recognize and put beyond legal doubt digital remix culture, such as the ability to create GIFs, memes, supercuts etc — types of digital content which may currently at least technically be copyright infringements in some EU Member States.

The European Parliament has been debating the copyright reform proposals for the past few months, as it formulates its official reaction to the draft proposals and seeks to push for specific changes. And this week Members of the European Parliament submitted their amendments to the Commission’s proposals, as part of that process.

TechCrunch spoke to MEP Julia Reda, a long-running proponent of copyright reform, who has called for bold and ambitious reforms yet instead finds herself fighting a set of proposals that she argues could usher in additional restrictions on web users while also disadvantaging regional startups.

TC: What was the original impetus for the EU’s digital copyright reform, and what did the Commission eventually propose?
Reda: When the EU first announced the copyright reform it was saying that the purpose was to really make life easier for everybody – businesses who wanted to scale up throughout the entire EU but also for citizens or consumers who wanted to access different services across borders. And what would be needed for that would be a more European copyright. We’re currently stuck with 28 different national laws that are often contradictory and that’s often causing problems in the online environment. So when the Commission came out with this proposal there was very little of this ambition to be found. There are a few exceptions that the Commission is proposing to make mandatory across the EU when it comes to teaching – so use of digital content in teaching, and preservation copies being made by libraries and archives but this is really – while it’s a step in the right direction it doesn’t really do much more for the market.

At the same time when it comes to the measures that are proposed on the marketplace I think they are actively harmful, so on the one hand you have a provision that would force any company or not even company any host provider that is basically giving users the possibility to upload content on their own an obligation to monitor what the users are doing – and this is not only extremely costly for all the providers, it could be anyone from Wikipedia to Github to photo communities, but it’s also a violation of fundamental rights. In the past the European Court of Justice has made it very clear that Member States are not allowed to impose a general obligation on Internet providers to monitor what users are doing. And this is exactly what this law would do. But this is the one big criticism that I think is relevant when it comes to how this would affect the Internet ecosystem.

The other one is the proposal to extend copyright for press publishers and allow them to ask for licence fees for the reproduction of even the smallest snippets of content – so for example the headline of a news article. This directly interferes with the possibility to link to content on the Internet because of course if you’re linking to something you want the link to be meaningful, and at the very least to include the title of the article you’re linking to.

TC: How have we arrived here? Who most stands to benefit from the most controversial proposals?
Reda: I think both of these proposals are examples of really blatant industry lobbying. So in the case of these content monitoring provisions, this has been very clearly pushed for by the music industry. And it’s actually a parallel development to the discussions that are going on in the US. So the music industry has quite successfully convinced a lot of lawmakers that they basically need to be paid more by YouTube. The entire purpose of this article is really to settle a fight between music labels and YouTube. The problem with this proposal is of course that its effects would go far beyond YouTube. And in fact probably YouTube would be one of the only hosting websites that could easily comply with this website because they already have a content monitoring facility in place. So even though it’s intended to strengthen the position of the music industry when it’s negotiating with YouTube, probably the collateral damage on other hosting websites would be a lot higher. But this is simply not something that the Commission has been thinking about when it was drafting this law. It’s very clear that they had a very specific type of website and a very specific type of content in mind, where such automated filtering may be more realistically possible.

Because if you’re trying to find a music recording, at least technologically this is comparatively simple because a music recording is more or less unique. But copyrighted content is a lot more than that. And if, for example, software would have to detect any type of copyright infringement – which is basically what this law is saying – the technology for that doesn’t even exist. So it could be things like being able to transfer to detect translations of a text that can be a copyright infringement, or pictures of a sculpture from different angles. It can be compositions rather than just musical recordings. So it’s really a huge technological challenge and it’s very clear from the fact that in all its reporting documents the Commission is only talking about the music industry that this is really what they had in mind. And there has been quite clear lobbying from the industry for this.

And in the case of the extra copyright for press publishers, it’s not even the publishing industry in general that’s in favor of this. It’s a relatively small number of – in particular two German publishing houses – that want to have this. And everybody else is a bit more puzzled by it. But because we had a German commissioner at the time that this proposal was being produced, they had very easy access to the highest levels of the Commission. But there are a lot of publishers who are actually quite critical of this proposal because they are saying that being able to be found on news aggregators and being able to be linked to by people on social media, is absolutely crucial to their business model and to finding their audience. So it’s not like the entire publishing industry is in favor of this either.

[In Germany an ancillary copyright] was passed into law in 2013, and since then there has been court battles going on about what it actually means. Like how many words are you allowed to use before it becomes an infringement? And none of these questions has been solved by now. But a number of startups who have been doing media monitoring and stuff like that have had to go out of business because of the legal uncertainty, and they just can’t get funding — if they don’t know whether what they are doing is legal. And they’re probably not going to find out for several years.

TC: Setting aside the problem of a lack of ambition in the reform, it sounds like it has been overly broadly drafted – could the Commission fix what it has, or do you think it should be scrapped entirely?
Reda: I think it should be scrapped because there’s not one problem with the proposal but several ones. So I think it’s a fundamentally bad idea to write content recognition technology into law. Not just because it’s extremely invasive but because it systematically ignores users’ rights. So the way that copyright is designed in Europe is that we have exclusive rights, and then have a list of specific exceptions under which users are allowed to use copyrighted content. So, for example, in most member states of the EU you are allowed to use works for purposes of quotation, within certain limits of course. The technology is not able to distinguish between a lawful use of copyrighted content under an exception, and an unlawful use – so it simply takes down every use of the content that is not licensed. And this of course leads to takedowns of lots of EU content and it systematically undermines the purpose of the exception which is usually the protection of freedom of expression. So I think as long as this proposal talks about forcing anyone to use content recognition technologies it’s systematically undermining the copyright exceptions and it’s basically throwing the copyright system even more out of balance. So I find it very difficult to imagine how this could be fixed.

The other problem is that it’s trying to misrepresent the legal status of hosting providers in the EU. Because at the moment, if a user uploads something to a platform it’s primarily the user who is responsible for it, so they are the ones who have to check whether the content they are uploading is legal and so on – and this makes sense because otherwise it wouldn’t be possible to run a platform that has a lot of user uploaded content. If you had to check every YouTube video before it’s uploaded or every picture before it can be used on Wikipedia, these platforms simply wouldn’t work the way that they work today. And so that’s why there is a limited liability for these host providers that no they don’t have to pro-actively check everything that is uploaded. But in return they have to take down content once they’re informed, or once they learn that there’s something illegal there. And they’re doing this. So I think that as long as the proposal first of all doesn’t recognize this legal regime and this limited liability, and at the same time speaks about content recognition, I don’t see how it can be fixed.

TC: At the moment in the EU there’s a lot of political pressure on social platforms to get better and faster at taking down problem content such as hate speech, terrorist propaganda and child abuse imagery — including governments talking about wanting the tech companies to build tools to help automate this process. Might this sort of thinking be feeding into the Commission’s proposals on copyright too?
Reda: I think the problems associated with copyright infringement, with hate speech and with images of child abuse are fundamentally different. So first of all with hate speech the biggest problem is that according to numbers by the Council of Europe, only 15 per cent of hate speech is even illegal in the first place. So the companies are often being asked to take down content that is technically legal. And then of course it’s extremely difficult because then the problem is not that the companies are not complying with their obligations under the limited liability regime, but the problem is that the laws are not fit for purpose to actually address hate speech – so there we have a problem, and it’s the problem with the criminal provisions in the Member States and not with the enforcement of the law by the platforms.

Then in the case of images of child abuse, it’s relatively clear – the legal situation is essentially the same all around the world. These images are illegal to spread and therefore if you have an exact copy of the same content then it’s very easy for a platform to say this is illegal, this needs to be taken down. And there I think the use of automated recognition of these images can be justified. And then it can be taken down at the source. The problem is this doesn’t work for copyright because with the copyright exceptions, just because something is using copyrighted content does not mean that it is actually infringing. And the problem is of course if you start putting in place infrastructure for one type of content – perhaps it’s justified with terrorism – then there will invariably be a strong push to use it for all types of other content where it is not justified. And I think – well, there are lots of examples for this – but I think for copyrighted content these automated tools simply undermine copyright exceptions. And they are not proportionate. I mean we are not talking about violent crimes here in the way that terrorism or child abuse are. We’re talking about something that is a really widespread phenomenon and that’s dealt with by providing attractive legal offers to people. And not by treating them as criminals.

TC: How do you believe startups might be disadvantaged by the current proposals for the EU copyright reforms? Big companies like Google have some clear risks but also big resources to respond to new laws. What specific risks do you see for startups?
Reda: There’s a certain cognitive dissonance among a lot of the regulators in Europe because on the one hand they are kind of upset about the fact there are so few European startups and they’re wondering how we can better compete with the US, but at the same time they’re putting in place laws that are targeted at the big US tech giants but that actually end up hitting the domestic startups a lot harder because they have to comply with pretty strict regulations from the start that they’re not equipped to actually deal with, and that often hampers their possibility to get funding.

I mean something that an investor certainly does not want to have is legal uncertainty. And a big flaw of the proposals that are put on the table by the Commission is that they are unclear. If you took, for example, the neighboring right for press publishers at its word you would have to conclude that taking a single word, or even a single letter, from a publication would be an infringement because, unlike copyright, neighboring rights do not have a threshold of originality. But at the same time of course common sense dictates that you cannot have an exclusive right on a single word or a single letter. So it’s clear that interpreting what exactly this law protects would be up to the courts. And probably the courts in different countries would come to different conclusions. So this is a huge source of legal uncertainty and it’s particularly hitting those who are trying to create new and innovative business models. And I think this is quite tragic. It’s precisely startups that have the possibility to actually find the new business models that the cultural sector so dearly needs. It’s just that the large incumbents – such as those two publishing houses that are behind the press publishers’ right – don’t have a particular interest in having new competitors on the market that might be more efficient at bringing the news to people. So they have a clear interest in introducing this law, even if they don’t think that they’re actually going to get any money from Google for using their snippets – it’s simply about making it more difficult for new market entrants to compete with them.

For the neighboring right [the biggest impact will be on] news startups, everybody who is dealing with news analysis. We had a couple of examples of startups like that that are, for example, trying to find ways to detect fake news, or to give people different sources or propose different sources to try to corroborate a story. Things like that would be extremely difficult with the neighboring right. It would also affect companies that are engaged in big data mining, because there is a new exception in the proposal that explicitly allows text and data mining for research organizations but not for anybody else. So this is an area where it’s currently quite unclear whether big data mining constitutes copyright infringement in the first place. But if you explicitly allow it for some then it kind of implies that it’s forbidden for others.

And I think the third kind of startup that is particularly affected by this is any kind of platform for sharing user generated content. For example we had the case of a Belgian startup called MuseScore, which is quite a popular platform for people to exchange sheet music – and it’s usually people simply sharing their own compositions. But of course there is no software that could automatically detect copyright infringements in sheet music because it’s not simply somebody copying the sheet music one to one. Rather, whenever a composition to which the person who uploads the sheet music doesn’t have the rights is included there, this would constitute a copyright infringement. So you would have to somehow technologically make the leap from a particular melody to that melody being expressed in sheet music, and that technology is not available.

TC: Could this reform mean companies using large amounts of data for building AI models might technically be committing a copyright infringement — if they’re using copyrighted data to train a machine learning algorithm?
Reda: Yeah, if they’re, for example, learning to detect cats in pictures and using a bunch of cat pictures from the Internet to train their algorithms then the argument goes that by copying these images they are using a copyrighted work and they would need a license for that. In most countries it’s kind of clarified either that this kind of use is fair use or that there are specific exceptions for text and data mining – for example Japan has introduced a text and data mining exception that clarifies that it’s not a copyright infringement. But there’s also the question: should this be covered by copyright in the first place? Because you are not using the work as an intellectual creation, you are just using the data in the work. For example if you’re mining text and you’re looking for particular patterns, you’re not really interested in what the text means, you’re interested in how often a particular word is used or something like that. So arguably this is not actually a use of the work as such but rather just of the data that’s carrying this work. So if we introduce a text and data mining exception only for certain organizations and startups are not included in that, then we’re basically saying that any kind of startup that is using copyrighted content for training its AI would be committing a copyright infringement.

TC: On the flip side, you could argue that while algorithms may not be using the work itself there is a kind of value exchange going on, based on extracting something useful (and potentially profitable) from the data…
Reda: Copyright law was never designed to be based on whether or not you are commercially benefiting from the use – I mean if this were the case then all non-commercial use of copyrighted works would be legal, but it’s not. It’s always based on whether or not you’re performing certain protected uses such as making a copy. And in the digital world you just need to make copies a lot more than in the analogue world. Something that would have been perfectly legal in an analogue context – such as reading a book and counting the number of times a certain word is used – is not a copyright-relevant act in any way. But when you’re using a computer to do the same thing then it suddenly is.

The other issue is that it only makes sense to require people to get a license if it’s actually possible to get a license. But how would this work? If somebody’s just scraping loads of images off social media, for example, the rights holders of those images are spread all over the world – there are millions of them, and if you actually contacted them and said hey I want to use your cat picture that you posted on Twitter for training my AI can you please give me a license, they would not know what the hell you’re talking about. The transaction cost of actually trying to do this legally would be so high that it would simply not pay to do this kind of research anymore. So basically by saying this is something that requires a license you are guaranteeing that it is simply not going to be done legally. But you’re not actually creating new business opportunities for anyone.

TC: I haven’t personally heard many European startups voicing concerns about the EU copyright reform – do you think there’s an awareness problem here? Or maybe they don’t yet realize the potential implications down the line?
Reda: I have a somewhat different impression. Because when we invited some startups to come to Brussels to speak about their experience it was extremely easy to find startups that were concerned about this, and had very specific concerns about either the neighboring right or the content monitoring. Of course if you’re a startup founder you probably don’t have the resources to lobby in the same way that a large company does because you’re basically spending all of your time on developing your product, but nevertheless there are a number of startups that are actually coming to Brussels and talking to policymakers. They have formed a business association – Allied for Startups – which is also organizing their activities. And they focus quite a lot on copyright – so for example Allied for Startups has done this startup manifesto – scale up manifesto – that they have presented to the European Commission where they are extremely critical of these proposals. So of course I don’t expect every startup founder in the EU to know about this because it is still quite a complex legislative process. But I wouldn’t share the impression that they’re not concerned about this. My impression is more that if they know about it they are concerned.

TC: What arguments are you hearing from larger tech companies – like Apple, Google, Facebook, Spotify — about the copyright reform?
Reda: Apple, I have to say, has not been particularly active on this. And neither has Google. They’re mostly active through their business associations. So it’s extremely difficult to say what exactly is the position of which particular players. Google was invited to one of the hearings that we had in the legal affairs committee. And they were basically spending their time explaining how Content ID works, what they’re already doing voluntarily, and kind of also explaining the limits of what the technology can do – so, for example, they were quite open about the fact that it’s not capable of interpreting copyright exceptions and limitations.

Generally I would say the tech companies have been most concerned about the content monitoring provision. Because it really affects a very broad range of companies, whereas the neighboring right is more targeted at a specific kind of company that is active in the news sector in some way.

I met with Apple this week but they were more concerned about the Electronic Communications Code, so the telecoms review that is going on at the moment. They did have concerns about the content monitoring provision… I’ve spoken to Soundcloud and they are really quite concerned about this, and they were quite open in saying that if this kind of provision had existed when they started out, they would have never managed to survive. And nevertheless they are kind of a licensed service nowadays and are able to work with the rights holders. So they’ve been quite active on this… I’ve met with Facebook at some point. And I mean they were just reiterating their concerns about the content monitoring and the neighboring right. It’s certainly on their radar.

I think generally [the big tech companies are] trying to emphasize that they’re already doing a lot of things on a voluntary basis.

TC: You’ve personally been pushing for copyright reform for years – and made it your legislative priority. Why is that? And what would you really like to see happen? What would be your ideal copyright reform?
Reda: I think that copyright reform is absolutely crucial for access to knowledge and empowerment of people. I think the cultural sector is just one small element of this. I think where the negative effects of the copyright system are much more apparent is the academic sector where basically you have a small number of extremely powerful publishing companies, that have profit margins of upwards of 30 per cent, that are basically living off getting articles for free from researchers at universities and then selling them back to the universities at astronomical prices. And I think this is an extremely unhealthy system, it’s contributing to global inequality because basically universities in developing countries and increasingly also in industrialized countries are not able to afford access to the content that is actually necessary to get a good education. So this is really what my motivation behind this copyright reform is.

I’ve worked as a student assistant at a university – and I know first hand the problems that exist with simply being able to access the knowledge that has been produced with public money because of the way that the copyright system is set up. What I would really like to see – I think where a huge mistake has been made in translating the copyright system to the digital world is that copies that are made in a digital environment should not be treated the same way as copies in the analogue age. If you have 20,000 copies of a book in your basement it’s very clear that your intention is to distribute them and so it’s kind of a short cut of the law to simply make the copies themselves illegal, and not just the distribution. But with digital technology that’s completely different because any kind of use of digital technologies requires the making of copies and it is not implied that just because you’re making copies your intention is to give those copies to somebody else.

Just to give you an example, a friend of mine has a digital hearing aid – a cochlear implant which is basically implanted into his brain and it translates an audio signal into a digital signal, and that’s why he’s able to hear again. And if there were no exceptions to copyright that allow for example this copy from analogue to digital then he would be committing a copyright infringement every time he’s listening to music. And this obviously doesn’t make any sense. So what I would really like to see would be a reform that simply does not take digital copies as the basis for what is considered to be a copyright infringement anymore.

TC: What do you see as the likely result of the copyright reform process – are you hopeful of being able to make substantial changes to the proposals?
Reda: I’m quite optimistic that we’re going to be able to defeat the neighboring right. It’s a wildly unpopular measure wherever it has been introduced, in Germany and in Spain. The Parliament has already voted against it several times. I’m of course concerned about the really intense lobbying from some publishers who are trying to shift the position of the Parliament. But so far most of the Parliament reports that have come out, including from the legal affairs committee, have been proposing to get rid of the neighboring right.

I am more pessimistic when it comes to the content monitoring provision because there it’s extremely difficult to change this proposal into something that is not harmful. It’s a very complex ecosystem and I think not everybody is aware of the problems associated with content recognition technologies. And as you were saying it’s kind of mixed up with the discussions around terrorism and hate speech. And I think that’s always a very bad starting point for having a really targeted copyright reform that is not mixing up a lot of different issues. So there I’m a lot more skeptical.

TC: What happens next? What’s the timeline from here?
Reda: The European parliament has presented its report, and the deadline for amendments to that is actually today [last Wednesday]. So after everybody has tabled their amendments the person who wrote the report, the rapporteur, is going to take those amendments and form them into compromises. Then we’re going to vote on it in the committee, probably in June or July, and then it will go to the plenary vote and to negotiations with the Council. So a final text could be expected maybe in a year or so.

TC: So there’s still a chance for substantial amendments?
Reda: Basically so far the proposal from the Commission is only the starting point. And nobody is bound by what the Commission has proposed. And actually in the Council as well there are a lot of national governments who are completely unconvinced by the neighboring right and are asking a lot of critical questions. So it’s very possible that we can get rid of these proposals if we keep up the public pressure and also convince national governments that this is not in their interest.

This interview has been lightly edited and condensed for clarity.