Category Archives: Paper Summary

YouTube’s AI-based video recommendations skew, obviously, to what results in more views

YouTube’s goal, obviously, is to increase the share of your time spent watching YouTube, along with the ads it serves every few minutes:

New research published today by Mozilla backs that notion up, suggesting YouTube’s AI continues to puff up piles of ‘bottom-feeding’/low grade/divisive/disinforming content — stuff that tries to grab eyeballs by triggering people’s sense of outrage, sewing division/polarization or spreading baseless/harmful disinformation — which in turn implies that YouTube’s problem with recommending terrible stuff is indeed systemic; a side-effect of the platform’s rapacious appetite to harvest views to serve ads.

Source: YouTube’s recommender AI still a horrorshow, finds major crowdsourced study | TechCrunch

Machine learning-based recommendation systems are constantly seeking patterns and associations – person X has watched several videos of type Y, therefore, we should recommend more videos that are similar to Y.

But YouTube defines “similar” broadly, which may lead you to barely related videos that encourage outrage or conspiracy theories, or to what they term “disinformation”. Much of that depends, of course, on how you define “disinformation” – the writer of the article, for example, thinks that when a user watches a video on “software rights” it is a mistake to then recommend a video on “gun rights”, and implies (in most examples given) that this has biased recommendations towards right-leaning topics.

News reports also highlighted viewers being inappropriately steered to “sexualized” content, but that was a small part of the recommendations. This too might happen based on the long-standing marketing maxim that “sex sells”.

What seems more likely is that the algorithms identify patterns – even weak associations – and use them to make recommendations. In a way, user behavior drives the pattern matching that ultimately leads to the recommendations. The goal of the algorithm (think of it like an Excel Solver goal) is to maximize viewing minutes.
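To make the pattern-matching idea concrete, here is a minimal sketch of a recommender whose only objective is watch time. It is not YouTube’s actual algorithm, which is not public; the watch history, the video titles, and the `recommend` function are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical watch history: user -> {video: minutes watched}.
# All names and numbers are invented for illustration.
history = {
    "ann": {"software rights": 12, "gun rights": 9},
    "bob": {"software rights": 8, "open source": 15},
    "cat": {"software rights": 10, "gun rights": 20},
    "dan": {"cooking": 30},
}

def recommend(seed, history, top_n=2):
    """Rank other videos by the total minutes they earned from users
    who also watched `seed`. There is no notion of topic or politics
    here; the only thing being maximized is watch time."""
    minutes = defaultdict(float)
    for watched in history.values():
        if seed not in watched:
            continue
        for video, mins in watched.items():
            if video != seed:
                minutes[video] += mins
    # Most minutes first; ties broken alphabetically for stable output.
    ranked = sorted(minutes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:top_n]]

print(recommend("software rights", history))
```

In this toy data, “gun rights” ranks first after “software rights” simply because the people who watched one spent the most minutes on the other. A weak association in viewing behavior is enough; no editorial intent is needed.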

Ultimately, what the Mozilla research actually finds is that the recommendations are not that good – many people were given recommendations that they regretted watching.

Yet the researchers and the TechCrunch writer spin this into an evil conspiracy of algorithms forcing you to watch “right wing” disinformation. But the reality seems far less nefarious. It’s just pattern matching on what people watch. Their suggestion of turning off these content matches is what is nefarious – they want YouTube to forcefully control what you see for political purposes, not simply to increase viewing minutes.

Which is more evil? Controlling what you see for political purposes or controlling what you see to maximize viewing minutes?

Increases in student loan availability lead to increases in tuition and fees

Stated another way, the more money poured into student loan programs, the higher the tuition charged. Tuition goes up because of student loans, rather than student loans going up in response to higher tuition.

Consistent with the model, we find that even when universities price-discriminate, a credit expansion will raise tuition paid by all students and not only by those at the federal loan caps because of pecuniary demand externalities. Such pricing externalities are often conjectured in the context of the effects of expanded subprime borrowing on housing prices leading up to the financial crisis, and our study can be seen as complementary evidence in the student loan market.

From: Lucca, D., Nadauld, T., Shen, K. (2015, 2017). Credit supply and the rise in college tuition: Evidence from the expansion in Federal student aid programs. Staff Report no. 733. Federal Reserve Bank of New York.

As the authors note, this is similar to other areas where a third party supply of money causes prices to rise – such as the effect of cheap mortgages causing home prices to rise.

A similar effect occurs in health care where third party “insurance” benefits are an enabler of higher priced health care services.

Whenever the cost of goods or services is subsidized such that the immediate direct cost to the buyer is lower than the market clearing price, demand for those goods and services will increase. As demand increases relative to supply, the price charged rises to a new, higher market clearing price.
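A toy calculation shows the mechanism. This is a textbook linear supply-and-demand sketch, not the model used in the Lucca, Nadauld and Shen paper; the function `clearing_price` and all coefficients are assumptions chosen only to make the arithmetic easy to follow.

```python
def clearing_price(a, b, c, d, subsidy=0.0):
    """Listed price at which demand equals supply.

    Demand: Qd = a - b * (price - subsidy)  # buyers react to their out-of-pocket cost
    Supply: Qs = c + d * price
    Setting Qd = Qs and solving for price gives the expression below.
    """
    return (a - c + b * subsidy) / (b + d)

# Hypothetical coefficients, chosen only for round numbers.
base = clearing_price(a=100, b=2, c=10, d=1)
subsidized = clearing_price(a=100, b=2, c=10, d=1, subsidy=15)

print("price without subsidy:", base)
print("price with subsidy:   ", subsidized)
print("buyer pays out of pocket:", subsidized - 15)
```

With these made-up numbers the listed price rises from 30 to 40 once a subsidy of 15 is introduced. The buyer’s immediate cost falls to 25, so demand rises, while part of the subsidy is captured by the seller as a higher price.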

Student loan programs are a major cause of tuition hikes. Cheap mortgages are a major cause of rising home prices. Health “insurance” is a major cause of higher prices charged in health care.

The problems with models versus the real world

Model-land is a hypothetical world in which our simulations are perfect, an attractive fairy-tale state of mind in which optimising a simulation invariably reflects desirable pathways in the real world. Decision-support in model-land implies taking the output of model simulations at face value (perhaps using some form of statistical post-processing to account for blatant inconsistencies), and then interpreting frequencies in model-land to represent probabilities in the real-world.

The following is something I see nearly every day in the media – where the model output is presented as the real world (even when the real world is different):

As a trivial example, when writing about forecasts of household consumption, energy prices, or global average surface temperature, many authors will use the same name and the same phrasing to refer to effects seen in the simulation as those used for the real world. It may not be the case that these authors are actually confused about which is which, the point is that readers of conclusions would benefit from a clear distinction being made, especially where such results are presented as if they have relevance to real-world phenomena and decision-making.

For what we term “climate-like” models, the realms of sophisticated statistical processing which variously “identify the best model”, “calibrate the parameters of the model”, “form a probability distribution from the ensemble”, “calculate the size of the discrepancy” etc., are castles in the air built on a single assumption which is known to be incorrect: that the model is perfect.

….

It is not clear why multi-model ensembles are taken to represent a probability distribution at all; the distributions from each imperfect model in the ensemble will differ from the desired perfect model probability distribution (if such a thing exists); it is not clear how combining them might lead to a relevant, much less precise, distribution for the real-world target of interest.

Source: 1662970102.pdf

The last paragraph quoted above is something that has long bothered me about model ensembles.
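A small sketch of why this bothers me. Suppose three imperfect models each give a forecast (mean and standard deviation) for the same quantity, and we pool them with equal weight as if the pool were a probability distribution. The forecasts, the “true” value, and the `pooled` function are all invented for illustration and are not taken from the paper.

```python
import math

# Hypothetical ensemble: (mean, standard deviation) per model. Invented numbers.
models = [(2.0, 0.3), (2.4, 0.2), (2.2, 0.4)]
truth = 3.5  # assumed real-world value that every model misses

def pooled(models):
    """Mean and standard deviation of an equal-weight mixture of the models."""
    n = len(models)
    mean = sum(m for m, _ in models) / n
    # Mixture second moment = average of (variance + mean squared) per model.
    second_moment = sum(s * s + m * m for m, s in models) / n
    return mean, math.sqrt(second_moment - mean * mean)

mean, sd = pooled(models)
z = (truth - mean) / sd  # distance of the truth from the pooled mean, in pooled SDs
print(round(mean, 2), round(sd, 2), round(z, 1))
```

In this made-up case the pooled distribution puts the assumed true value more than three standard deviations away from its mean. The spread of the ensemble measures how much the models disagree with each other, which is not the same thing as the probability of outcomes in the real world.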

This paper is a good read. The full text is available at the link in the citation below.

Thompson, Erica L.; Smith, Leonard A. (2019): Escape from model-land, Economics Discussion Papers, No. 2019-23, Kiel Institute for the World Economy (IfW), Kiel. Retrieved from: https://www.econstor.eu/bitstream/10419/194875/1/1662970102.pdf

“Enabling High-level Application Development for the Internet of Things” (2015)

Patel, P., & Cassou, D. (2015). Enabling high-level application development for the Internet of Things. Journal of Systems and Software, 103, 62-84.

Retrieved from: http://arxiv.org/pdf/1501.05080

File: 1501.05080.pdf

The paper proposes a software engineering methodology for IoT applications, noting that IoT development spans a wide range of concerns, from sensors and wireless connectivity to design techniques and much more, requiring domain expertise in various phases of the life cycle. The paper then proposes an approach (a model or models described using new high level languages) for the specification, design, and development of comprehensive IoT applications. There is a lot of detail in this really interesting paper, which summarizes the PhD thesis of the first author.

“Privacy of big data in the internet of things era.” (2015)

Perera, C., Ranjan, R., Wang, L., Khan, S. U., & Zomaya, A. Y. (2015). Privacy of big data in the internet of things era. IEEE IT Special Issue Internet of Anything, 6.

File: 1412.8339.pdf

The authors note that an objective of IoT is to collect data, and whether that data is stored locally or in the cloud, privacy is a concern.

The paper notes the definition of Big Data in terms of Volume, Variety and Velocity (the 3Vs) and that much of the data will be personal. It reports that 60% of Internet users are aware of privacy issues and 85% want more control, but users are willing to give up privacy in exchange for value. The authors suggest that in the future, IoT consumers will be offered two models: (1) give up privacy in exchange for service, or (2) pay a fee to obtain service and retain privacy. Services need to obtain consent, but regarding social media privacy policies “most of the users underestimate the authorization given to the third party applications” – in other words, people are giving up more privacy than they realize. A related issue: can users migrate their own data from one service provider to another? Also, we are seldom anonymous on the Internet due to modern tracking capabilities. The paper also notes issues with security and the lack of updates/patches to IoT devices.

Paper recommends that manufacturers take privacy and security seriously, provide options to enable and disable data collection, limit the transmission of data to the cloud – and then there is the issue of 3rd party application developers and what they build on top of these platforms. While we think of IoT sensors in terms of consumers, in many cases, consumers may not have control, such as the use of IoT sensors in apartments and businesses.

Authors mention several sensors-as-a-service models, including OpenIoT, Lab of Things (LoT), Hub of All Things (HAT), Xively and Datacoup.

Movers and Shakers: Kinetic energy harvesting for the Internet of Things. Gorlatova, et al (2014)

Gorlatova, M., Sarik, J., Grebla, G., Cong, M., Kymissis, I., Zussman, G. (2014). Movers and Shakers: Kinetic Energy Harvesting for the Internet of Things. Columbia University, Electrical Engineering Technical Report #2014-03-27, Mar. 2014.

Retrieved from: https://arxiv.org/pdf/1307.0044.pdf

File: 1307.0044.pdf

Examines the use of motion (kinetic energy) harvesting to power IoT devices, specifically the motion of people as they move around during the day. The authors monitored 40 people for about 200 hours to capture a record of motion, and then developed a model that might be used to estimate the amount of energy that could be collected for powering IoT devices attached to people (in addition to motion, they also examine the potential use of photovoltaics to collect energy).

Internet of Things. Mulani and Pingle (2016). Paper Summary.

I am reading through a number of published, peer-reviewed papers that are related to the Internet of Things topic. Rather than create a personal annotated bibliography as I go through the papers, I am instead going to try and write paper summaries and comments here on this blog. May as well share what I read!

There is no particular order to these papers. I found them online and saved them in a folder, and am reading them in the order they appear in the folder.

Mulani, T., and Pingle, S.V. (2016). Internet of Things. International Research Journal of Multidisciplinary Studies. March 2016.

Retrieved from: http://www.irjms.in/sites/irjms/index.php/files/article/download/270/256

File: 270-772-1-PB.pdf

Defines the IoT and then examines issues surrounding IoT, including security, privacy, legal, economic, followed by comments on IoT communications models including peer to peer (device to device), device to cloud, device to gateway and back end data sharing. Concludes that IoT has “technical, social and policy considerations” that need to be examined. But … this paper does not examine them. Reads like a proposal.