<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>HLP/Jaeger lab blog</title>
	<atom:link href="https://hlplab.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://hlplab.wordpress.com</link>
	<description>2 spoons of psycholinguistics, 1/2 cup full of brain, add some modeling, and run the whole thing in the tropics</description>
	<lastBuildDate>Tue, 03 Jan 2012 19:57:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='hlplab.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>https://s-ssl.wordpress.com/i/buttonw-com.png</url>
		<title>HLP/Jaeger lab blog</title>
		<link>https://hlplab.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="https://hlplab.wordpress.com/osd.xml" title="HLP/Jaeger lab blog" />
	<atom:link rel='hub' href='https://hlplab.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Using pyjamas to program external Mechanical Turk experiments</title>
		<link>https://hlplab.wordpress.com/2011/12/25/using-pyjamas-to-program-external-mechanical-turk-experiments/</link>
		<comments>https://hlplab.wordpress.com/2011/12/25/using-pyjamas-to-program-external-mechanical-turk-experiments/#comments</comments>
		<pubDate>Sun, 25 Dec 2011 20:36:34 +0000</pubDate>
		<dc:creator>jdegen</dc:creator>
				<category><![CDATA[WWW experiments]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[Mechanical Turk]]></category>
		<category><![CDATA[online experiments]]></category>
		<category><![CDATA[pyjamas]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1076</guid>
		<description><![CDATA[I recently set up my first external Mechanical Turk study. My greatest friend and foe in this process was pyjamas, a Python-to-Javascript compiler and Widget Set API. THE great advantage of using pyjamas: you can program your entire experiment in python, and pyjamas will create the browser-dependent javascript code. If you already know javascript, writing [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1076&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I recently set up my first external Mechanical Turk study. My greatest friend and foe in this process was <a title="pyjamas" href="http://pyjs.org/">pyjamas</a>, a Python-to-JavaScript compiler and Widget Set API. THE great advantage of using pyjamas: you can program your entire experiment in Python, and pyjamas will generate the browser-specific JavaScript code for you. If you already know JavaScript, writing your experiment in Python without having to worry about browser-dependent issues will save you time. And if you don&#8217;t, you don&#8217;t have to go through the frustrating process of learning JavaScript. On the downside, the documentation for pyjamas is currently not very good, so figuring out how to get things to work can take a while.</p>
<p>That&#8217;s why I&#8217;m providing the (commented) code that I generated to create my MechTurk experiment. A short demo version of the experiment can be found <a href="http://www.hlp.rochester.edu/mturk/naturalness/output/natural.html?explist=1&amp;assignmentId=foo&amp;order=someafter&amp;plusminus=minus&amp;debug=True">here</a>.</p>
<div id="attachment_1086" class="wp-caption aligncenter" style="width: 665px"><a href="http://hlplab.files.wordpress.com/2011/12/gumballmachine1.png"><img class="size-full wp-image-1086 " title="gumballmachine" src="http://hlplab.files.wordpress.com/2011/12/gumballmachine1.png?w=655&#038;h=373" alt="" width="655" height="373" /></a><p class="wp-caption-text">A screenshot of the experiment. Participants were asked to rate on a 7-point scale how natural the statement they heard was as a description of the scene.</p></div>
<p><span id="more-1076"></span></p>
<p>The structure of the experiment, which combines timed image presentation, playing audio via HTML5, recording naturalness ratings via radio buttons, and recording feedback via a text box:</p>
<ol>
<li>Participants first see an instructions page with a CONTINUE button. Upon clicking CONTINUE,</li>
<li>they continue to a sound test. This is to make sure participants can actually hear sound via their speakers/headphones. They click PLAY and are asked to enter the first word of the phrase they hear. If the word is entered correctly,</li>
<li>the main body of the experiment begins. Each trial consists of
<ul>
<li>displaying an image of a full gumball machine, which after 1.5 seconds changes such that</li>
<li>a second image is displayed in which a certain number of gumballs have moved to the lower chamber. An audio file is played (statements of the form &#8220;You got X of the gumballs&#8221;). Next to the gumball machine, a 7-point scale of radio buttons is displayed, on which participants are asked to rate the naturalness of the statement they heard as a description of the scene. Below the scale, there is a FALSE button that participants are asked to click if they think the statement was false. When a button is clicked, the identity of that button is recorded and the next trial begins.</li>
</ul>
</li>
<li>On the last trial, a feedback box is displayed alongside a SUBMIT button. Upon clicking SUBMIT, all the information about the experiment (worker ID, trial information, participants&#8217; responses, etc.) is submitted to Mechanical Turk. Note that in the demo version I&#8217;ve made available, the information is instead printed to a window that pops up when you click the SUBMIT button.</li>
</ol>
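<p>The flow above can be sketched in plain Python. This is only an illustration of the page sequence and of what gets recorded; it is not the code in natural.py and does not use the pyjamas widget API. All class, method, and field names are made up for this sketch, and the feedback box is simplified into a separate final step.</p>

```python
from dataclasses import dataclass
from typing import List, Optional, Union

FALSE_RESPONSE = "FALSE"     # hypothetical code for the FALSE button
SCALE_POINTS = range(1, 8)   # the 7-point naturalness scale
IMAGE_SWITCH_DELAY_S = 1.5   # the full machine is shown for 1.5 seconds


@dataclass
class Trial:
    quantifier: str          # the X in "You got X of the gumballs"
    gumballs_moved: int      # gumballs shown in the lower chamber
    response: Optional[Union[int, str]] = None


def trial_events(trial):
    """Return (seconds after trial start, event) pairs for one trial."""
    return [
        (0.0, "show full gumball machine"),
        (IMAGE_SWITCH_DELAY_S,
         "show %d gumballs in lower chamber, play 'You got %s of the gumballs'"
         % (trial.gumballs_moved, trial.quantifier)),
    ]


@dataclass
class Experiment:
    trials: List[Trial]
    phase: str = "instructions"
    current: int = 0

    def click_continue(self):
        # Instructions page: CONTINUE leads to the sound test.
        if self.phase == "instructions":
            self.phase = "sound_test"

    def enter_word(self, typed, expected):
        # The main body only starts once the first word is typed correctly.
        if self.phase == "sound_test" and typed.strip().lower() == expected.lower():
            self.phase = "trials"

    def respond(self, button):
        # Record which button was clicked, then move on to the next trial.
        if self.phase != "trials":
            return
        if button != FALSE_RESPONSE and button not in SCALE_POINTS:
            raise ValueError("unknown button: %r" % (button,))
        self.trials[self.current].response = button
        self.current += 1
        if self.current == len(self.trials):
            self.phase = "submit"

    def submit(self, feedback, assignment_id):
        # Collect everything that would be sent to Mechanical Turk.
        if self.phase != "submit":
            raise RuntimeError("experiment is not finished yet")
        self.phase = "done"
        return {
            "assignmentId": assignment_id,
            "responses": [t.response for t in self.trials],
            "feedback": feedback,
        }


exp = Experiment([Trial("some", 5), Trial("all", 13)])
exp.click_continue()
exp.enter_word("You", expected="you")
exp.respond(4)
exp.respond(FALSE_RESPONSE)
data = exp.submit("no comments", assignment_id="foo")
```

<p>Keeping the current phase in one place makes it easy to ignore clicks that arrive at the wrong time, e.g. a rating before the sound test has been passed.</p>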
<p>The tgz file on my <a href="http://www.bcs.rochester.edu/people/jdegen/publications.html">homepage</a> (scroll to the bottom of the page) contains all the files necessary for building the pyjamas project that generates the experiment. You need to have pyjamas installed for the build to work. The structure of the experiment is reflected in the file natural.py, which contains the Python code that pyjamas compiles to JavaScript. Read the README to get started.</p>
<p>Good luck. Questions &amp; comments welcome!</p>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/statistics-methodology/www-experiments/'>WWW experiments</a> Tagged: <a href='https://hlplab.wordpress.com/tag/javascript/'>javascript</a>, <a href='https://hlplab.wordpress.com/tag/mechanical-turk/'>Mechanical Turk</a>, <a href='https://hlplab.wordpress.com/tag/online-experiments/'>online experiments</a>, <a href='https://hlplab.wordpress.com/tag/pyjamas/'>pyjamas</a>, <a href='https://hlplab.wordpress.com/tag/python/'>python</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/12/25/using-pyjamas-to-program-external-mechanical-turk-experiments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/93d41ee91229d49e4bde8b6097c52258?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">jdegen</media:title>
		</media:content>

		<media:content url="http://hlplab.files.wordpress.com/2011/12/gumballmachine1.png" medium="image">
			<media:title type="html">gumballmachine</media:title>
		</media:content>
	</item>
		<item>
		<title>The NSF in 2020: The future of the Social, Behavioral, and Economic Sciences</title>
		<link>https://hlplab.wordpress.com/2011/11/30/the-nsf-in-2020-the-future-of-the-social-behavioraland-economic-sciences/</link>
		<comments>https://hlplab.wordpress.com/2011/11/30/the-nsf-in-2020-the-future-of-the-social-behavioraland-economic-sciences/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 04:06:20 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[link]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[NSF]]></category>
		<category><![CDATA[vision]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1068</guid>
		<description><![CDATA[The NSF/SBE released its executive summary of 252 short white papers on the future of the social, behavioral, and economic sciences. Among other things, the report identifies four focus areas (population change; sources of disparities; communication, language, and linguistics; and technology, new media, and social network) and three properties of future research (data-intensive, multidisciplinary, and collaborative). But read [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1068&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The NSF/SBE released its <a href="http://www.nsf.gov/pubs/2011/nsf11086/nsf11086.pdf">executive summary of 252 short white papers on the future of the social, behavioral, and economic sciences</a>. Among other things, the report identifies four focus areas (population change; sources of disparities; <strong>communication, language, and linguistics</strong>; and technology, new media, and social networks) and three properties of future research (<strong>data-intensive</strong>, <strong>multidisciplinary, and collaborative</strong>). But read for yourself. The report summarizes what the community (the authors who submitted white papers) had to say about what works well and what needs to be improved in the processes that the NSF currently uses to distribute its funding. From p. 24 onward, you can read a summary of the many linguistic white papers that seem to have been submitted (see p. 39 for a summary of which disciplines the white papers came from). From p. 29 onward, the report lays out possible scenarios for how the NSF might change in order to realize the outlined vision.</p>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/link/'>link</a> Tagged: <a href='https://hlplab.wordpress.com/tag/funding/'>funding</a>, <a href='https://hlplab.wordpress.com/tag/nsf/'>NSF</a>, <a href='https://hlplab.wordpress.com/tag/vision/'>vision</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/11/30/the-nsf-in-2020-the-future-of-the-social-behavioraland-economic-sciences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>Google scholar now provides detailed citation report</title>
		<link>https://hlplab.wordpress.com/2011/11/30/google-scholar-now-provides-detailed-citation-report/</link>
		<comments>https://hlplab.wordpress.com/2011/11/30/google-scholar-now-provides-detailed-citation-report/#comments</comments>
		<pubDate>Thu, 01 Dec 2011 02:04:51 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[link]]></category>
		<category><![CDATA[For students]]></category>
		<category><![CDATA[Ever noticed?]]></category>
		<category><![CDATA[google scholar]]></category>
		<category><![CDATA[h-index]]></category>
		<category><![CDATA[impact]]></category>
		<category><![CDATA[tracking citations]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1056</guid>
		<description><![CDATA[This might be of interest to some of you: Google Scholar now allows you to correct links or citations to your work. It also provides a complete summary of all your citations, by article, by year, etc. It&#8217;s a functionality similar to academia.edu, but it let&#8217;s you remove wrong links to your work (e.g. to [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1056&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://hlplab.files.wordpress.com/2011/11/googlescholar2.png"><br />
</a>This might be of interest to some of you: Google Scholar now allows you to correct links or citations to your work. It also provides a complete summary of all your citations, by article, by year, etc. It&#8217;s a functionality similar to academia.edu, but it lets you remove wrong links to your work (e.g. to old prepublished manuscripts).</p>
<p>The interface is rather convenient since it allows you to import all references from scholar, which is almost 95% correct. Overall, it&#8217;s actually much more convenient than academia.edu (though I&#8217;d say it serves a slightly different purpose). It also generates a list of all your co-authors and other schnick-schnack <img src='https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> . <a href="http://googlescholar.blogspot.com/2011/11/google-scholar-citations-open-to-all.html">Check it out.</a> Sweet.</p>
<p><span id="more-1056"></span></p>
<p>Here&#8217;s a screenshot for those of you who remain unconvinced (you can get a similar view for each paper separately):</p>
<p><a href="http://hlplab.files.wordpress.com/2011/11/googlescholar2.png"><img class="aligncenter size-full wp-image-1060" style="border-color:initial;border-style:initial;" title="GoogleScholar" src="http://hlplab.files.wordpress.com/2011/11/googlescholar2.png?w=655" alt=""   /></a></p>
<p>Who knew that one of my most influential papers (ahem) is on Bulgarian wh-questions <img src='https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> . And can I just mention that it even found my undergraduate homework on &#8220;The [tragic] comedy in the 18th/19th/20th century&#8221;, prepared for a class in Modern German literature:</p>
<p style="padding-left:90px;"><a href="http://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=PyYDHEUAAAAJ&amp;pagesize=100&amp;citation_for_view=PyYDHEUAAAAJ:u5HHmVD_uO8C">Bibliographie zur Komödie im 18./19./20. Jahrhundert</a></p>
<p style="padding-left:90px;">TF Jaeger<br />
Humboldt-Universität zu Berlin, 1996</p>
<div>That&#8217;s right. A true masterpiece, though (as of yet) very much under-cited.</div>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/ever-noticed/'>Ever noticed?</a>, <a href='https://hlplab.wordpress.com/category/for-students/'>For students</a>, <a href='https://hlplab.wordpress.com/category/link/'>link</a> Tagged: <a href='https://hlplab.wordpress.com/tag/google-scholar/'>google scholar</a>, <a href='https://hlplab.wordpress.com/tag/h-index/'>h-index</a>, <a href='https://hlplab.wordpress.com/tag/impact/'>impact</a>, <a href='https://hlplab.wordpress.com/tag/tracking-citations/'>tracking citations</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/11/30/google-scholar-now-provides-detailed-citation-report/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>

		<media:content url="http://hlplab.files.wordpress.com/2011/11/googlescholar2.png" medium="image">
			<media:title type="html">GoogleScholar</media:title>
		</media:content>
	</item>
		<item>
		<title>Lots of zeros? Be careful with your chi-square (exact or not) and the like</title>
		<link>https://hlplab.wordpress.com/2011/11/17/lots-of-zeros-be-careful-with-your-chi-square-exact-or-not-and-alike/</link>
		<comments>https://hlplab.wordpress.com/2011/11/17/lots-of-zeros-be-careful-with-your-chi-square-exact-or-not-and-alike/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 16:20:44 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[Statistics & Methodology]]></category>
		<category><![CDATA[statistics/R]]></category>
		<category><![CDATA[chi-square]]></category>
		<category><![CDATA[low count data]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1044</guid>
		<description><![CDATA[If you&#8217;re running chi-squares to analyze categorical data and you have lots of very low count (or even 0 cells), be careful in how to interpret the result. There&#8217;s a nice article by Andrew Gelman on this topic, where he shows that the problem is that all the low counts can make it harder to [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1044&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re running chi-square tests to analyze categorical data and you have lots of cells with very low (or even zero) counts, be careful in how you interpret the result. There&#8217;s a <a href="http://andrewgelman.com/2011/11/chi-square-fail-when-many-cells-have-small-expected-values/">nice article by Andrew Gelman on this topic</a>, where he shows that the problem is that all the low counts can make it harder to detect the signal (and hence a significant deviation from the expected values for a part of the table). Put differently, there might be a real pattern in your data that the test fails to detect. I don&#8217;t think it&#8217;s much of a problem for most of the tests we conduct, since contingency tables in psycholinguistic and linguistic research are usually rather small. I can&#8217;t recall the last time that I saw anything larger than a 3&#215;4 table or the like. From what I understand of Gelman&#8217;s post, the problem he points out becomes more serious the larger the table is.</p>
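<p>If you want to check how sparse your own table is, here is a minimal plain-Python sketch that computes the expected counts under independence, the Pearson chi-square statistic, and the list of cells with small expected counts. The cutoff of 5 is just the common rule of thumb and is not taken from Gelman&#8217;s post; the function names and the example table are made up for illustration, and the sketch assumes that every row and column contains at least one observation.</p>

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)


def sparse_cells(table, threshold=5.0):
    """Return (row, column) indices of cells whose expected count is below threshold."""
    expected = expected_counts(table)
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, value in enumerate(row)
        if threshold > value
    ]


# Made-up 2x3 table with several low-count cells.
table = [[12, 0, 1],
         [30, 2, 0]]

statistic = pearson_chi_square(table)
flagged = sparse_cells(table)
print("chi-square = %.3f on %d df" % (statistic, degrees_of_freedom(table)))
print("cells with small expected counts:", flagged)
```

<p>If most of the table ends up in the flagged list, that is a sign to be cautious about reading too much into a non-significant result.</p>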
<br />Filed under: <a href='https://hlplab.wordpress.com/category/statistics-methodology/'>Statistics &amp; Methodology</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/statisticsr/'>statistics/R</a> Tagged: <a href='https://hlplab.wordpress.com/tag/chi-square/'>chi-square</a>, <a href='https://hlplab.wordpress.com/tag/low-count-data/'>low count data</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/11/17/lots-of-zeros-be-careful-with-your-chi-square-exact-or-not-and-alike/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>Some (relatively) new funding mechanisms through NSF</title>
		<link>https://hlplab.wordpress.com/2011/11/10/some-relatively-new-funding-mechanisms-through-nsf/</link>
		<comments>https://hlplab.wordpress.com/2011/11/10/some-relatively-new-funding-mechanisms-through-nsf/#comments</comments>
		<pubDate>Fri, 11 Nov 2011 02:14:19 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[Ever noticed?]]></category>
		<category><![CDATA[funding]]></category>
		<category><![CDATA[NSF]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1040</guid>
		<description><![CDATA[This might be of interest to folks, in case you haven&#8217;t seen it. First, there&#8217;s RAPID and EAGER. RAPID is a mechanism for research that requires fast funding decisions (e.g. b/c the first language with only one phoneme was just discovered but its last speaker is just about to enter into a vow of silence). [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1040&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This might be of interest to folks, in case you haven&#8217;t seen it. First, there&#8217;s RAPID and EAGER. <a href="http://www.nsf.gov/pubs/policydocs/pappguide/nsf10_1/gpg_2.jsp#IID1">RAPID</a> is a mechanism for research that requires fast funding decisions (e.g. b/c the first language with only one phoneme was just discovered but its last speaker is just about to enter into a vow of silence). <a href="http://www.nsf.gov/pubs/policydocs/pappguide/nsf09_1/gpg_2.jsp#IID2">EAGER</a>s are &#8220;Early-concept Grants for Exploratory Research&#8221; for exploratory work &#8211; i.e. high risk research with a high potential for high pay-off. One important property of both mechanisms is that submissions do not have to be sent out for external review, which should substantially shorten the time until you hear back from NSF.</p>
<p>Second, there is now a new type of proposal that is specifically aimed at interdisciplinary work that would not usually be funded by any of the existing NSF panels alone &#8211; <a href="http://www.nsf.gov/pubs/2012/nsf12011/nsf12011.jsp">CREATIV: Creative Research Awards for Transformative Interdisciplinary Ventures</a>.</p>
<p>Note that none of these three funding types allows resubmission.</p>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/ever-noticed/'>Ever noticed?</a> Tagged: <a href='https://hlplab.wordpress.com/tag/funding/'>funding</a>, <a href='https://hlplab.wordpress.com/tag/nsf/'>NSF</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/11/10/some-relatively-new-funding-mechanisms-through-nsf/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>The serial founder hypothesis and word order universals</title>
		<link>https://hlplab.wordpress.com/2011/11/07/the-serial-founder-hypothesis-and-word-order-universals/</link>
		<comments>https://hlplab.wordpress.com/2011/11/07/the-serial-founder-hypothesis-and-word-order-universals/#comments</comments>
		<pubDate>Mon, 07 Nov 2011 17:18:23 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[Papers & Presentations]]></category>
		<category><![CDATA[Atkinson]]></category>
		<category><![CDATA[Croft]]></category>
		<category><![CDATA[Cysouw]]></category>
		<category><![CDATA[Dunn]]></category>
		<category><![CDATA[Graff]]></category>
		<category><![CDATA[linear mixed models]]></category>
		<category><![CDATA[mixed models]]></category>
		<category><![CDATA[phonological complexity]]></category>
		<category><![CDATA[phonology]]></category>
		<category><![CDATA[Pontillo]]></category>
		<category><![CDATA[serial founder hypothesis]]></category>
		<category><![CDATA[Tily]]></category>
		<category><![CDATA[typology]]></category>
		<category><![CDATA[word order]]></category>
		<category><![CDATA[word order universals]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1037</guid>
		<description><![CDATA[Check out this article in ScienceNews summarizing commentaries on two recent language studies in Science (Atkinson, 2011: ) and Nature (Dunn et al., 2011). Each of the studies has received a lot of attention and they are the subject of two special issues in press for Linguistic Typology, to which HLP Lab contributed on three [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1037&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Check out this <a href="http://www.sciencenews.org/view/feature/id/335805/title/Darwin%E2%80%99s_Tongues">article in ScienceNews summarizing commentaries on two recent language studies in Science (Atkinson, 2011) and Nature (Dunn et al., 2011)</a>. Each of the studies has received a lot of attention, and they are the subject of two special issues in press for Linguistic Typology, to which HLP Lab contributed three articles. I will add a link to the special issue(s) once they come out.<span id="more-1037"></span></p>
<p><strong><a href="http://www.sciencemag.org/content/332/6027/346.short">Atkinson (2011)</a></strong> proposed the serial founder hypothesis, according to which languages further away from the point of origin of language have simpler phonology. He presents evidence based on a data set constructed from the World Atlas of Language Structures. The study has been criticized for, among other things, the choice of data, the way phonological complexity was calculated, and the statistical methods employed. The <a href="http://rochester.academia.edu/tiflo/Papers/774232/Jaeger_T._F._Graff_P._Croft_B._and_Pontillo_D._to_appear._Mixed_effect_models_for_genetic_and_areal_dependencies_in_linguistic_typology_Commentary_on_Atkinson._Linguistic_Typology">paper that I co-authored with Peter Graff, Bill Croft, and Dan Pontillo assesses the mixed effect regression approach taken by Atkinson to account for genetic relations between languages</a>. Bill Croft, of course, is at the University of New Mexico, Peter Graff is a graduate student at MIT, and Dan just joined Rochester&#8217;s graduate program. We provide an introduction to mixed effect regression, discuss when one can conclude that random slopes aren&#8217;t warranted, and extend the approach to account for language contact. Beyond the specific evaluation of Atkinson&#8217;s approach, we also hope that this paper will be of interest to anyone analyzing typological data. You might also be interested in the comment by Cysouw et al. that just got accepted by Science. They discuss to what extent Atkinson&#8217;s findings replicate on another data set. As soon as the article is officially in press, I will post a link here. Another comment from our lab (work with Dan Pontillo and Peter Graff) under review for Science presents large scale statistical simulations that assess the Type I error rate of Atkinson&#8217;s approach.</p>
<p><strong><a href="http://www.nature.com/nature/journal/v473/n7345/full/nature09923.html">Dunn et al. (2011)</a></strong> present a novel statistical approach to assess whether there is any typological evidence for word order universals (well, novel for typological research; their approach has been employed in evolutionary biology). They argue that there is only evidence for lineage-specific trends, rather than cross-lineage universals. In one comment to appear in Linguistic Typology, <a href="http://rochester.academia.edu/tiflo/Papers/674181/Tily_H._and_Jaeger_T.F._in_press._Complementing_quantitative_typology_with_behavioral_approaches_Evidence_for_typological_universals._Linguistic_Typology">Hal Tily and I discuss alternative evidence from behavioral paradigms</a> (artificial language learning and iterated artificial language learning) that <em>does</em> seem to provide evidence for cross-lineage universals (although I think the main message of our comment is that there are other methods that should be pursued in addition to statistical analyses of typological data, which suffer a lot from data sparseness). Hal is currently a post-doctoral fellow at MIT. In another comment, <a href="http://rochester.academia.edu/tiflo/Papers/790165/Croft_W._Bhattacharya_T._Kleinschmidt_D._Smith_D._E._and_Jaeger_T._F._to_appear._Greenbergian_universals_diachrony_and_statistical_analyses._Linguistic_Typology">Bill Croft, Tanmoy Bhattacharya, Dave Kleinschmidt, Eric Smith and I discuss the trade-offs of the statistical methods employed by Dunn et al.</a> (a trait evolution model for language change, implemented in the software <em>BayesTraits</em>). Tanmoy and Eric are at the Santa Fe Institute; Bill is at the University of New Mexico.</p>
<p>Comments welcome.</p>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/papers-presentations/articles/'>articles</a>, <a href='https://hlplab.wordpress.com/category/papers-presentations/'>Papers &amp; Presentations</a> Tagged: <a href='https://hlplab.wordpress.com/tag/atkinson/'>Atkinson</a>, <a href='https://hlplab.wordpress.com/tag/croft/'>Croft</a>, <a href='https://hlplab.wordpress.com/tag/cysouw/'>Cysouw</a>, <a href='https://hlplab.wordpress.com/tag/dunn/'>Dunn</a>, <a href='https://hlplab.wordpress.com/tag/graff/'>Graff</a>, <a href='https://hlplab.wordpress.com/tag/linear-mixed-models/'>linear mixed models</a>, <a href='https://hlplab.wordpress.com/tag/mixed-models/'>mixed models</a>, <a href='https://hlplab.wordpress.com/tag/phonological-complexity/'>phonological complexity</a>, <a href='https://hlplab.wordpress.com/tag/phonology/'>phonology</a>, <a href='https://hlplab.wordpress.com/tag/pontillo/'>Pontillo</a>, <a href='https://hlplab.wordpress.com/tag/serial-founder-hypothesis/'>serial founder hypothesis</a>, <a href='https://hlplab.wordpress.com/tag/tily/'>Tily</a>, <a href='https://hlplab.wordpress.com/tag/typology/'>typology</a>, <a href='https://hlplab.wordpress.com/tag/word-order/'>word order</a>, <a href='https://hlplab.wordpress.com/tag/word-order-universals/'>word order universals</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/hlplab.wordpress.com/1037/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/hlplab.wordpress.com/1037/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/hlplab.wordpress.com/1037/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/hlplab.wordpress.com/1037/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/hlplab.wordpress.com/1037/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/hlplab.wordpress.com/1037/" /></a> <a rel="nofollow" 
href="http://feeds.wordpress.com/1.0/gotwitter/hlplab.wordpress.com/1037/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/hlplab.wordpress.com/1037/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/hlplab.wordpress.com/1037/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/hlplab.wordpress.com/1037/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/hlplab.wordpress.com/1037/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/hlplab.wordpress.com/1037/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/hlplab.wordpress.com/1037/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/hlplab.wordpress.com/1037/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1037&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/11/07/the-serial-founder-hypothesis-and-word-order-universals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>More papers relevant to questions about information density</title>
		<link>https://hlplab.wordpress.com/2011/10/21/more-papers-relevant-to-questions-about-information-density/</link>
		<comments>https://hlplab.wordpress.com/2011/10/21/more-papers-relevant-to-questions-about-information-density/#comments</comments>
		<pubDate>Sat, 22 Oct 2011 04:40:39 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[Papers & Presentations]]></category>
		<category><![CDATA[information density]]></category>
		<category><![CDATA[information rate]]></category>
		<category><![CDATA[speech rate]]></category>
		<category><![CDATA[syllables]]></category>
		<category><![CDATA[uniform information density]]></category>
		<category><![CDATA[word length]]></category>
		<category><![CDATA[Zipf]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1032</guid>
		<description><![CDATA[And while I am at it, let me post three more papers relevant to anyone interested in uniform information density and, more generally, theories of communicatively efficient language production (though most of you may already know these papers): They call it speech information rate, but it&#8217;s essentially the same: Pellegrino, F., Coupé, C., and [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1032&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>And while I am at it, let me post three more papers relevant to anyone interested in <em>uniform information density</em> and, more generally, theories of communicatively efficient language production (though most of you may already know these papers):</p>
<ul>
<li>They call it speech information rate, but it&#8217;s essentially the same: Pellegrino, F., Coupé, C., and Marsico, E. 2011. <a href="http://www.lsadc.org/info/documents/2011/press-releases/pellegrino-et-al.pdf">A cross-linguistic perspective on speech information rate</a>. <em>Language</em> 87(3), 539-558.</li>
<li>Maurits, L., Perfors, A., and Navarro, D. 2010. <a href="http://www.psychology.adelaide.edu.au/personalpages/staff/amyperfors/papers/mauritsetal10nips-wordorderuid.pdf">Why are some word orders more common than others? A uniform information density account.</a> NIPS.</li>
<li>Piantadosi, S.T., Tily, H., and Gibson, E. 2011. <a href="http://web.mit.edu/piantado/www/">Word lengths are optimized for efficient communication</a>. <em>Proceedings of the National Academy of Sciences</em>, 108(9):3526.</li>
</ul>
<div>Lots of food for thought.</div>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/papers-presentations/articles/'>articles</a>, <a href='https://hlplab.wordpress.com/category/papers-presentations/'>Papers &amp; Presentations</a> Tagged: <a href='https://hlplab.wordpress.com/tag/information-density/'>information density</a>, <a href='https://hlplab.wordpress.com/tag/information-rate/'>information rate</a>, <a href='https://hlplab.wordpress.com/tag/speech-rate/'>speech rate</a>, <a href='https://hlplab.wordpress.com/tag/syllables/'>syllables</a>, <a href='https://hlplab.wordpress.com/tag/uniform-information-density/'>uniform information density</a>, <a href='https://hlplab.wordpress.com/tag/word-length/'>word length</a>, <a href='https://hlplab.wordpress.com/tag/zipf/'>Zipf</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/hlplab.wordpress.com/1032/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/hlplab.wordpress.com/1032/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/hlplab.wordpress.com/1032/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/hlplab.wordpress.com/1032/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/hlplab.wordpress.com/1032/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/hlplab.wordpress.com/1032/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/hlplab.wordpress.com/1032/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/hlplab.wordpress.com/1032/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/hlplab.wordpress.com/1032/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/hlplab.wordpress.com/1032/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/hlplab.wordpress.com/1032/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/hlplab.wordpress.com/1032/" /></a> <a rel="nofollow" 
href="http://feeds.wordpress.com/1.0/goreddit/hlplab.wordpress.com/1032/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/hlplab.wordpress.com/1032/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1032&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/10/21/more-papers-relevant-to-questions-about-information-density/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>UID and text generation</title>
		<link>https://hlplab.wordpress.com/2011/10/21/uid-and-text-generation/</link>
		<comments>https://hlplab.wordpress.com/2011/10/21/uid-and-text-generation/#comments</comments>
		<pubDate>Sat, 22 Oct 2011 04:28:57 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[Papers & Presentations]]></category>
		<category><![CDATA[complementizer]]></category>
		<category><![CDATA[Rajkumar]]></category>
		<category><![CDATA[text generation]]></category>
		<category><![CDATA[uniform information density]]></category>
		<category><![CDATA[White]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1030</guid>
		<description><![CDATA[Ah, just when I thought it couldn&#8217;t get any better: Uniform Information Density has been applied to text generation . Have a look at this paper (thanks, Raja, for forwarding it): Rajakrishnan Rajkumar and Michael White. 2011. Linguistically Motivated Complementizer Choice in Surface Realization. In Proc. of the EMNLP-11 Workshop on Using Corpora in NLG. (bib) According to [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1030&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Ah, just when I thought it couldn&#8217;t get any better: Uniform Information Density has been applied to text generation <img src='https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> . Have a look at this paper (thanks, Raja, for forwarding it):</p>
<ul>
<li>Rajakrishnan Rajkumar and Michael White. 2011. <a href="http://www.aclweb.org/anthology/W/W11/W11-2706.pdf">Linguistically Motivated Complementizer Choice in Surface Realization</a>. In <em>Proc. of the EMNLP-11 Workshop on Using Corpora in NLG</em>. <a href="http://www.aclweb.org/anthology/W/W11/W11-2706.bib">(bib)</a></li>
</ul>
<div>According to Raja (the first author), more work on this issue is in progress (e.g. an extension beyond complementizers), and future updates will be posted on <a href="http://www.ling.ohio-state.edu/~mwhite/#papers">Michael White&#8217;s lab page at Ohio State</a>.</div>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/papers-presentations/articles/'>articles</a>, <a href='https://hlplab.wordpress.com/category/papers-presentations/'>Papers &amp; Presentations</a> Tagged: <a href='https://hlplab.wordpress.com/tag/complementizer/'>complementizer</a>, <a href='https://hlplab.wordpress.com/tag/rajkumar/'>Rajkumar</a>, <a href='https://hlplab.wordpress.com/tag/text-generation/'>text generation</a>, <a href='https://hlplab.wordpress.com/tag/uniform-information-density/'>uniform information density</a>, <a href='https://hlplab.wordpress.com/tag/white/'>White</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/hlplab.wordpress.com/1030/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/hlplab.wordpress.com/1030/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/hlplab.wordpress.com/1030/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/hlplab.wordpress.com/1030/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/hlplab.wordpress.com/1030/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/hlplab.wordpress.com/1030/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/hlplab.wordpress.com/1030/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/hlplab.wordpress.com/1030/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/hlplab.wordpress.com/1030/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/hlplab.wordpress.com/1030/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/hlplab.wordpress.com/1030/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/hlplab.wordpress.com/1030/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/hlplab.wordpress.com/1030/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/hlplab.wordpress.com/1030/" /></a> <img alt="" 
border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1030&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/10/21/uid-and-text-generation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>Belated congratulations to Dave Kleinschmidt</title>
		<link>https://hlplab.wordpress.com/2011/10/13/belated-congratulations-to-dave-kleinschmidt/</link>
		<comments>https://hlplab.wordpress.com/2011/10/13/belated-congratulations-to-dave-kleinschmidt/#comments</comments>
		<pubDate>Thu, 13 Oct 2011 18:31:39 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[HLP lab]]></category>
		<category><![CDATA[Papers & Presentations]]></category>
		<category><![CDATA[AMLaP]]></category>
		<category><![CDATA[Kleinschmidt]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=1026</guid>
		<description><![CDATA[Better late than never: Congratulations to Dave Kleinschmidt for winning the &#8220;Student Talk Prize&#8221; at the 2011 meeting of Architectures and Mechanisms for Language Processing in Paris, France. If you want to learn more about Dave&#8217;s work on A Bayesian belief updating model of phonetic recalibration and selective adaptation either have a look at this AMLaP abstract or read [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1026&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Better late than never: Congratulations to <strong>Dave Kleinschmidt</strong> for winning the &#8220;Student Talk Prize&#8221; at the 2011 meeting of Architectures and Mechanisms for Language Processing in Paris, France. If you want to learn more about Dave&#8217;s work on <strong><em>A Bayesian belief updating model of phonetic recalibration and selective adaptation</em></strong> either have a look at this <a href="http://amlap2011.files.wordpress.com/2011/08/264_pdf.pdf">AMLaP abstract</a> or read Dave&#8217;s <a href="http://rochester.academia.edu/tiflo/Papers/574205/Kleinschmidt_D._and_Jaeger_T.F._Submitted_._A_Bayesian_belief_updating_model_of_phonetic_recalibration_and_selective_adaptation._ACL_Workshop_on_Cognitive_Modeling_and_Computational_Linguistics">short ACL paper</a> on some of the findings presented at the 2011 Cognitive Modeling and Computational Linguistics workshop in Portland, Oregon (here&#8217;s a link to the <a href="http://aclweb.org/anthology-new/W/W11/W11-06.pdf">full proceedings</a>).</p>
<p>If you&#8217;re interested in this line of work, you might also enjoy reading <a href="http://palm.mindmodeling.org/cogsci2010/papers/0063/paper0063.pdf">Morgan Sonderegger and Alan Yu&#8217;s 2010 CogSci paper</a> on <em>A rational account of perceptual compensation for coarticulation</em>, which we learned about recently.</p>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/papers-presentations/articles/'>articles</a>, <a href='https://hlplab.wordpress.com/category/hlp-lab/'>HLP lab</a>, <a href='https://hlplab.wordpress.com/category/papers-presentations/'>Papers &amp; Presentations</a> Tagged: <a href='https://hlplab.wordpress.com/tag/amlap/'>AMLaP</a>, <a href='https://hlplab.wordpress.com/tag/kleinschmidt/'>Kleinschmidt</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/hlplab.wordpress.com/1026/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/hlplab.wordpress.com/1026/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/hlplab.wordpress.com/1026/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/hlplab.wordpress.com/1026/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/hlplab.wordpress.com/1026/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/hlplab.wordpress.com/1026/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/hlplab.wordpress.com/1026/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/hlplab.wordpress.com/1026/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/hlplab.wordpress.com/1026/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/hlplab.wordpress.com/1026/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/hlplab.wordpress.com/1026/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/hlplab.wordpress.com/1026/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/hlplab.wordpress.com/1026/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/hlplab.wordpress.com/1026/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=1026&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/10/13/belated-congratulations-to-dave-kleinschmidt/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>New R resource for ordinary and multilevel regression modeling</title>
		<link>https://hlplab.wordpress.com/2011/07/27/new-r-resource-for-ordinary-and-multilevel-regression-modeling/</link>
		<comments>https://hlplab.wordpress.com/2011/07/27/new-r-resource-for-ordinary-and-multilevel-regression-modeling/#comments</comments>
		<pubDate>Wed, 27 Jul 2011 20:15:24 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[class/tutorial]]></category>
		<category><![CDATA[For students]]></category>
		<category><![CDATA[Statistics & Methodology]]></category>
		<category><![CDATA[statistics/R]]></category>
		<category><![CDATA[mixed models]]></category>
		<category><![CDATA[multilevel models]]></category>
		<category><![CDATA[R code]]></category>
		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=998</guid>
		<description><![CDATA[Here&#8217;s what I received from the Centre for Multilevel Modelling at Bristol (I haven&#8217;t checked it out yet; registration seems to be free but required): The Centre for Multilevel Modelling is very pleased to announce the addition of R practicals to our free on-line multilevel modelling course. These give detailed instructions of how to [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=998&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s what I received from the Centre for Multilevel Modelling at Bristol (I haven&#8217;t checked it out yet; registration seems to be free but required):</p>
<blockquote>
<pre>The Centre for Multilevel Modelling is very pleased to announce the addition of
R practicals to our free on-line multilevel modelling course. These give
detailed instructions of how to carry out a range of analyses in R, starting
from multiple regression and progressing through to multilevel modelling of
continuous and binary data using the lmer and glmer functions.

MLwiN and Stata versions of these practicals are already available.
You will need to log on or register onto the course to view these
practicals.

Read More...
<a href="http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13">http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13</a></pre>
</blockquote>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/for-students/classtutorial/'>class/tutorial</a>, <a href='https://hlplab.wordpress.com/category/for-students/'>For students</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/'>Statistics &amp; Methodology</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/statisticsr/'>statistics/R</a> Tagged: <a href='https://hlplab.wordpress.com/tag/mixed-models/'>mixed models</a>, <a href='https://hlplab.wordpress.com/tag/multilevel-models/'>multilevel models</a>, <a href='https://hlplab.wordpress.com/tag/r-code/'>R code</a>, <a href='https://hlplab.wordpress.com/tag/regression/'>regression</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/hlplab.wordpress.com/998/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/hlplab.wordpress.com/998/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/hlplab.wordpress.com/998/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/hlplab.wordpress.com/998/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/hlplab.wordpress.com/998/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/hlplab.wordpress.com/998/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/hlplab.wordpress.com/998/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/hlplab.wordpress.com/998/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/hlplab.wordpress.com/998/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/hlplab.wordpress.com/998/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/hlplab.wordpress.com/998/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/hlplab.wordpress.com/998/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/hlplab.wordpress.com/998/"><img alt="" border="0" 
src="http://feeds.wordpress.com/1.0/reddit/hlplab.wordpress.com/998/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=998&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/07/27/new-r-resource-for-ordinary-and-multilevel-regression-modeling/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>LSA 2011 class on Computational Psycholinguistics</title>
		<link>https://hlplab.wordpress.com/2011/07/14/lsa-2011-class-on-computational-psycholinguistics/</link>
		<comments>https://hlplab.wordpress.com/2011/07/14/lsa-2011-class-on-computational-psycholinguistics/#comments</comments>
		<pubDate>Fri, 15 Jul 2011 03:59:48 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[class/tutorial]]></category>
		<category><![CDATA[For students]]></category>
		<category><![CDATA[class]]></category>
		<category><![CDATA[computational]]></category>
		<category><![CDATA[computational psycholinguistics]]></category>
		<category><![CDATA[psycholinguistics]]></category>
		<category><![CDATA[syllabus]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=995</guid>
		<description><![CDATA[Due to popular demand &#8211; you can find the Computational Psycholinguistics class Roger Levy and I are currently teaching at the LSA 2011 institute at Boulder mirrored here. Filed under: class/tutorial, For students Tagged: class, computational, computational psycholinguistics, psycholinguistics, syllabus<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=995&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Due to popular demand <img src='https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  &#8211; you can find the <em>Computational Psycholinguistics </em>class Roger Levy and I are currently teaching at the LSA 2011 institute at Boulder mirrored <a href="http://idiom.ucsd.edu/~rlevy/teaching/2011summer/lsa008/syllabus.html">here</a>.</p>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/for-students/classtutorial/'>class/tutorial</a>, <a href='https://hlplab.wordpress.com/category/for-students/'>For students</a> Tagged: <a href='https://hlplab.wordpress.com/tag/class/'>class</a>, <a href='https://hlplab.wordpress.com/tag/computational/'>computational</a>, <a href='https://hlplab.wordpress.com/tag/computational-psycholinguistics/'>computational psycholinguistics</a>, <a href='https://hlplab.wordpress.com/tag/psycholinguistics/'>psycholinguistics</a>, <a href='https://hlplab.wordpress.com/tag/syllabus/'>syllabus</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/07/14/lsa-2011-class-on-computational-psycholinguistics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>R code for Jaeger, Graff, Croft and Pontillo (2011): Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson</title>
		<link>https://hlplab.wordpress.com/2011/07/13/glmm-for-typologists/</link>
		<comments>https://hlplab.wordpress.com/2011/07/13/glmm-for-typologists/#comments</comments>
		<pubDate>Thu, 14 Jul 2011 02:06:27 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[statistics/R]]></category>
		<category><![CDATA[articles]]></category>
		<category><![CDATA[Statistics & Methodology]]></category>
		<category><![CDATA[Papers & Presentations]]></category>
		<category><![CDATA[areal dependency]]></category>
		<category><![CDATA[Atkinson]]></category>
		<category><![CDATA[Croft]]></category>
		<category><![CDATA[data analysis]]></category>
		<category><![CDATA[genetic dependency]]></category>
		<category><![CDATA[Graff]]></category>
		<category><![CDATA[lmer]]></category>
		<category><![CDATA[mixed models]]></category>
		<category><![CDATA[multilevel models]]></category>
		<category><![CDATA[R code]]></category>
		<category><![CDATA[serial founder model]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[typology]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=992</guid>
		<description><![CDATA[Below I am sharing the R code for our paper on the serial founder effect: Jaeger, Graff, Croft, and Pontillo. 2011. Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson. Linguistic Typology 15(2), 281–319. [if you're not subscribed to Linguistic Typology, check out this pre-final draft or contact me for an offprint]. This [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=hlplab.wordpress.com&amp;blog=2188611&amp;post=992&amp;subd=hlplab&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<div>Below I am sharing the R code for our paper on the serial founder effect:</div>
<div>
<ul>
<li>Jaeger, Graff, Croft, and Pontillo. 2011. <a href="http://www.reference-global.com/doi/pdf/10.1515/LITY.2011.021">Mixed effect models for genetic and areal dependencies in linguistic typology: Commentary on Atkinson</a><em>. <a href="http://www.reference-global.com/toc/lity/15/2">Linguistic Typology 15(2)</a>, 281–319.</em> [if you're not subscribed to Linguistic Typology, check out this pre-final <a href="http://rochester.academia.edu/tiflo/Papers/774232/Jaeger_T._F._Graff_P._and_Croft_B._to_appear._Mixed_effect_models_for_genetic_and_areal_dependencies_in_linguistic_typology_Commentary_on_Atkinson._Linguistic_Typology">draft</a> or contact me for an offprint].</li>
</ul>
</div>
<div>This paper is a commentary on <a href="http://www.sciencemag.org/content/332/6027/346.abstract">Atkinson&#8217;s 2011 Science article on the serial founder model</a> (see also this <a href="http://www.sciencenews.org/view/feature/id/335805">interview with ScienceNews</a>, in which parts of our comment in Linguistic Typology and follow-up work are summarized). In the commentary, we provide an introduction to linear mixed effect models for typological research. We discuss how to fit and to evaluate these models, using Atkinson&#8217;s data as an example. We illustrate the use of crossed random effects to control for genetic and areal relations between languages. We also introduce a (novel?) way to model areal dependencies based on an exponential decay function over migration distances between languages.</div>
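<div>To make the exponential decay idea concrete, here is a minimal sketch in R. It is not the <code>getWeightedArealPhonemeDiversity()</code> function used in the scripts below (that one lives in functions.R, which is not posted here). The helper name <code>weighted_areal_mean</code>, the toy distances, and the decay constant are assumptions made up purely for illustration. The idea is simply that each language gets an areal predictor: the average of the other languages' values, with weights that shrink exponentially as distance grows.</div>

```r
# Hypothetical helper, NOT the function from functions.R.
# dist_mat: symmetric matrix of pairwise distances between languages
# y:        one numeric value per language (e.g. phoneme diversity)
# scale:    decay constant, same unit as dist_mat (an assumed parameter)
weighted_areal_mean = function(dist_mat, y, scale) {
  stopifnot(nrow(dist_mat) == length(y), ncol(dist_mat) == length(y))
  w = exp(-dist_mat / scale)  # weights decay exponentially with distance
  diag(w) = 0                 # a language is not its own neighbour
  as.numeric(w %*% y) / rowSums(w)
}

# Toy data: three made-up languages on a line at positions 0, 1 and 10
pos = c(A = 0, B = 1, C = 10)
dist_mat = as.matrix(dist(pos))
y = c(A = 1, B = 0, C = -1)
areal = weighted_areal_mean(dist_mat, y, scale = 2)
areal
```

<div>In the toy data, language A's areal value is dominated by its close neighbour B, while the distant language C contributes almost nothing. A predictor built this way can then be entered into a regression alongside the other predictors.</div>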
<div>Finally, we discuss limits to the statistical analysis due to data sparseness. In particular, we show that the data available to Atkinson did not contain enough language families with sufficiently many languages to test whether the observed effect holds once random by-family slopes (for the effect) are included in the model. We also present simulations that show that the Type I error rate (false rejections) of the approach taken in Atkinson is many times higher than conventionally accepted (i.e., above .2, when .05 is the conventionally accepted rate of Type I errors).</div>
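<div>For readers who have not run a Type I error simulation before, the general recipe can be sketched as follows. This is not the simulation from the paper: it uses an ordinary regression that ignores family structure as a stand-in, and the number of families, the family sizes, and the variances are all made-up values. Data are generated with no effect of the predictor at all, the model is fit many times, and we count how often it nevertheless reports p at or below .05.</div>

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# One simulated data set under the null: the predictor has NO effect on the
# outcome, but both vary between families (all settings are made-up values).
simulate_p = function(n_families = 20, n_per_family = 10) {
  family = rep(seq_len(n_families), each = n_per_family)
  # family-level predictor (related languages share the same value)
  x = rnorm(n_families)[family]
  # outcome = family-specific intercept + noise; x plays no role
  y = rnorm(n_families)[family] + rnorm(length(family))
  # ordinary regression that ignores the family structure;
  # row 2, column 4 of the coefficient table is the p-value for x
  summary(lm(y ~ x))$coefficients[2, 4]
}

p_values = replicate(1000, simulate_p())
# proportion of simulations with p at or below .05 (nominal rate: .05)
type1_rate = ecdf(p_values)(0.05)
type1_rate
```

<div>Because languages within a family are not independent observations, the proportion of false rejections in this toy setup should come out well above the nominal .05.</div>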
<div></div>
<div>The scripts presented below are <em>not </em>intended to allow full replication of our analyses (they lack annotation and we are not allowed to share the WALS data employed by Atkinson on this site anyway). However, there are many plots and tests in the paper that might be useful for typologists or other users of mixed models. For that reason, I am for now posting the raw code. Please comment below if you have questions and we will try to provide additional annotation for the scripts as needed and as time permits. <strong>If you find (parts of the) script(s) useful, please consider citing our article in Linguistic Typology.</strong></div>
<div><span id="more-992"></span></div>
<div>
<pre><span style="color:#99cc00;"># assumes that we are in the right working directory and that there </span>
<span style="color:#99cc00;"># is a subdirectory called /figures.</span>
<span style="color:#99cc00;">source("functions.R")</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># load data</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">load("EnrichedAtkinsonCorrected.RData")</span>
<span style="color:#99cc00;">str(d)</span>
<span style="color:#99cc00;">summary(d)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># Atkinson's ordinary model</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">l = lm(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000,</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">summary(l)</span>

<span style="color:#99cc00;">l = lm(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 + </span>
<span style="color:#99cc00;"> getWeightedArealPhonemeDiversity(d, 40),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">summary(l)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># Atkinson's lmer model</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># mixed model that Atkinson ran</span>
<span style="color:#99cc00;">library(lme4)</span>
<span style="color:#99cc00;">library(languageR) # provides pvals.fnc(), used below</span>

<span style="color:#99cc00;">l.atkinson = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.atkinson</span>
<span style="color:#99cc00;">round(cor(fitted(l.atkinson), l.atkinson@y)^2,3)</span>
<span style="color:#99cc00;">pvals.fnc(l.atkinson, nsim= 20000)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># outlier check</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">hist(scale(d$lEstimatedSpeakerPopSize))</span>
<span style="color:#99cc00;">hist(scale(d$DistanceFromBestFitOrigin1000))</span>
<span style="color:#99cc00;">hist(scale(d$TotalNormalizedPhonemeDiversity))</span>

<span style="color:#99cc00;"># one potential outlier for population size</span>
<span style="color:#99cc00;">d$slEstimatedSpeakerPopSize = as.numeric(scale(d$lEstimatedSpeakerPopSize))</span>
<span style="color:#99cc00;">d[abs(scale(d$lEstimatedSpeakerPopSize)) &gt; 2.5,]$LanguageName</span>
<span style="color:#99cc00;">d[abs(scale(d$lEstimatedSpeakerPopSize)) &gt; 2.5,]$slEstimatedSpeakerPopSize</span>

<span style="color:#99cc00;"># no outliers for distance</span>
<span style="color:#99cc00;">d[abs(scale(d$DistanceFromBestFitOrigin1000)) &gt; 2.5,]$LanguageName</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># examining residuals</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># outliers in terms of residuals</span>
<span style="color:#99cc00;">d[abs(scale(d$TotalNormalizedPhonemeDiversity - fitted(l.atkinson))) &gt; 2.5,]$LanguageName</span>
<span style="color:#99cc00;">library(ggplot2)</span>
<span style="color:#99cc00;">ggplot(d, aes(x = TotalNormalizedPhonemeDiversity - fitted(l.atkinson))) +</span>
<span style="color:#99cc00;"> geom_histogram(aes(y=..density..)) +</span>
<span style="color:#99cc00;"> geom_density(size = 2, color = "blue") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Residuals") +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank(), legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/histogram-residuals.png", width = 3.5, height = 3.5)</span>

<span style="color:#99cc00;">library(Design)</span>
<span style="color:#99cc00;">ggplot(d, aes(x = lEstimatedSpeakerPopSize, y = as.numeric(scale(TotalNormalizedPhonemeDiversity - fitted(l.atkinson))))) +</span>
<span style="color:#99cc00;"> geom_point() +</span>
<span style="color:#99cc00;"> geom_abline(intercept = -2.5, slope=0, linetype=2, color="gray25", size=1.2) +</span>
<span style="color:#99cc00;"> geom_abline(intercept = 2.5, slope=0, linetype=2, color="gray25", size=1.2) +</span>
<span style="color:#99cc00;"> geom_smooth(method = "lm") +</span>
<span style="color:#99cc00;"> geom_smooth(method = "lm", formula = y ~ pol(x, 2), color = "red") +</span>
<span style="color:#99cc00;"> geom_smooth(method = "lm", formula = y ~ pol(x, 3), color = "green") +</span>
<span style="color:#99cc00;"> scale_x_continuous("(log-transformed) population size") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Standardized residuals") +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank(), legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/populationSize-residuals.png", width = 3.5, height = 3.5)</span>
<span style="color:#99cc00;">ggplot(d, aes(x = DistanceFromBestFitOrigin, y = as.numeric(scale(TotalNormalizedPhonemeDiversity - fitted(l.atkinson))))) +</span>
<span style="color:#99cc00;"> geom_point() +</span>
<span style="color:#99cc00;"> geom_abline(intercept = -2.5, slope=0, linetype=2, color="gray25", size=1.2) +</span>
<span style="color:#99cc00;"> geom_abline(intercept = 2.5, slope=0, linetype=2, color="gray25", size=1.2) +</span>
<span style="color:#99cc00;"> geom_smooth(method = "lm") +</span>
<span style="color:#99cc00;"> geom_smooth(method = "lm", formula = y ~ pol(x, 2), color = "red") +</span>
<span style="color:#99cc00;"> geom_smooth(method = "lm", formula = y ~ pol(x, 3), color = "green") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Distance from origin") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Standardized residuals") +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank(), legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/distance-residuals.png", width = 3.5, height = 3.5)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># examining residuals by group (assessing normality and homoscedasticity)</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">library(ggplot2)</span>
<span style="color:#99cc00;">dd = prepareVars(d[table(d$Family)[as.character(d$Family)] &gt;= 4,])</span>
<span style="color:#99cc00;">l.atkinson.r = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> dd)</span>
<span style="color:#99cc00;">dd = dd[order(paste(dd$Continent, dd$Family)),]</span>
<span style="color:#99cc00;">ggplot(dd, aes(x = paste(Continent, Family, sep = " : "), y = resid(l.atkinson.r), fill = Continent)) +</span>
<span style="color:#99cc00;"> geom_boxplot(size = .5) +</span>
<span style="color:#99cc00;"># geom_abline(intercept = 0, slope = 0, color = "blue") +</span>
<span style="color:#99cc00;"> scale_x_discrete("Family") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Standardized residuals") +</span>
<span style="color:#99cc00;"> coord_flip() + </span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/by-family-residuals.png", width = 5.5, height = 8)</span>
<span style="color:#99cc00;">ggplot(d, aes(x = Continent, y = resid(l.atkinson), fill = Continent)) +</span>
<span style="color:#99cc00;"> geom_boxplot(size = .5) +</span>
<span style="color:#99cc00;"> scale_x_discrete("Continent") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Standardized residuals") +</span>
<span style="color:#99cc00;"> coord_flip() + </span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/by-continent-residuals.png", width = 5.5, height = 3)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># examining random effects</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">dotplot(ranef(l.atkinson, postVar=T))</span>
<span style="color:#99cc00;">qqmath(ranef(l.atkinson, postVar=T))</span>

<span style="color:#99cc00;">ranef(l.atkinson)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># rerunning analysis without suspicious data points</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># result holds</span>
<span style="color:#99cc00;">lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(subset(d, d$LanguageName != "Mandarin" &amp;</span>
<span style="color:#99cc00;"> abs(scale(TotalNormalizedPhonemeDiversity - fitted(l.atkinson))) &lt; 2.5 ))</span>
<span style="color:#99cc00;">)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># Illustrate shrinkage</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">library(ggplot2)</span>
<span style="color:#99cc00;">d$AtkinsonPrediction = fitted(l.atkinson)</span>
<span style="color:#99cc00;">d$FamilyMembers = table(d$Family)[as.character(d$Family)]</span>
<span style="color:#99cc00;">dd = aggregate(d[c('FamilyMembers','AtkinsonPrediction','DistanceFromBestFitOrigin','EstimatedSpeakerPopSize','TotalNormalizedPhonemeDiversity')], by= list(Family = d$Family), mean)</span>
<span style="color:#99cc00;">ggplot(dd, </span>
<span style="color:#99cc00;"> aes(x = DistanceFromBestFitOrigin)) +</span>
<span style="color:#99cc00;"> geom_point(aes(y= TotalNormalizedPhonemeDiversity, </span>
<span style="color:#99cc00;"> size = I(FamilyMembers)), </span>
<span style="color:#99cc00;"> alpha=.5, </span>
<span style="color:#99cc00;"> color = "black"</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> geom_point(aes(y= AtkinsonPrediction), color = "blue", shape = 8) +</span>
<span style="color:#99cc00;"> geom_segment(aes(</span>
<span style="color:#99cc00;"> xend = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y= TotalNormalizedPhonemeDiversity, </span>
<span style="color:#99cc00;"> yend= AtkinsonPrediction), </span>
<span style="color:#99cc00;"> color = "blue"</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> scale_color_discrete("Family") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Distance from origin") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Normalized phonological diversity") + </span>
<span style="color:#99cc00;"> scale_size_continuous("Languages\nin family") +</span>
<span style="color:#99cc00;"> geom_smooth(aes(x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity), </span>
<span style="color:#99cc00;"> color = "black", linetype = 1, size = 1.5, </span>
<span style="color:#99cc00;"> method = "lm", se=F) +</span>
<span style="color:#99cc00;"> geom_smooth(aes(x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = AtkinsonPrediction), </span>
<span style="color:#99cc00;"> color = "blue", linetype = 1, size = 1.5, </span>
<span style="color:#99cc00;"> method = "lm", se=F) +</span>
<span style="color:#99cc00;"> coord_cartesian(ylim = c(-1.3,1.3)) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/shrinkage.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># What about random slopes?</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># full mixed effect model does not converge</span>
<span style="color:#99cc00;"># backing off to mixed effect model with reduced random effect structure</span>
<span style="color:#99cc00;">library(lme4)</span>

<span style="color:#99cc00;"># limiting ourselves to language families with at least kmin languages</span>
<span style="color:#99cc00;">kmin = 10</span>
<span style="color:#99cc00;">dd = prepareVars(d[table(d$Family)[as.character(d$Family)] &gt;= kmin,])</span>
<span style="color:#99cc00;">nlevels(as.factor(as.character(dd$Family)))</span>
<span style="color:#99cc00;">nlevels(as.factor(as.character(dd$LanguageName)))</span>
<span style="color:#99cc00;">l.base = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> dd</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.distance = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 + cDistanceFromBestFitOrigin | Family),</span>
<span style="color:#99cc00;"> dd, </span>
<span style="color:#99cc00;"> control = list(msVerbose=F, maxIter = 1000, maxFN = 1200)</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.population = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 + clEstimatedSpeakerPopSize | Family),</span>
<span style="color:#99cc00;"> dd, </span>
<span style="color:#99cc00;"> control = list(msVerbose=F, maxIter = 1000, maxFN = 1200)</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.mainOnly = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 + clEstimatedSpeakerPopSize + cDistanceFromBestFitOrigin | Family),</span>
<span style="color:#99cc00;"> dd, </span>
<span style="color:#99cc00;"> control = list(msVerbose=F, maxIter = 1000, maxFN = 1200)</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.full = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 + clEstimatedSpeakerPopSize * cDistanceFromBestFitOrigin | Family),</span>
<span style="color:#99cc00;"> dd, </span>
<span style="color:#99cc00;"> control = list(msVerbose=F, maxIter = 1000, maxFN = 1200)</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">anova(l.base, l.distance, l.mainOnly, l.full)</span>
<span style="color:#99cc00;">anova(l.base, l.population, l.mainOnly, l.full)</span>

<span style="color:#99cc00;">library(languageR)</span>
<span style="color:#99cc00;">pvals.fnc(l.base, nsim= 10000)</span>

<span style="color:#99cc00;">kmin = 7</span>
<span style="color:#99cc00;">dd = prepareVars(d[table(d$Family)[as.character(d$Family)] &gt;= kmin,])</span>
<span style="color:#99cc00;">nlevels(as.factor(as.character(dd$Family)))</span>
<span style="color:#99cc00;">nlevels(as.factor(as.character(dd$LanguageName)))</span>
<span style="color:#99cc00;">l.base = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> dd</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.distance = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 + cDistanceFromBestFitOrigin | Family),</span>
<span style="color:#99cc00;"> dd</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.population = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 + clEstimatedSpeakerPopSize | Family),</span>
<span style="color:#99cc00;"> dd</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.mainOnly = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 + clEstimatedSpeakerPopSize + cDistanceFromBestFitOrigin | Family),</span>
<span style="color:#99cc00;"> dd</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">l.full = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 + clEstimatedSpeakerPopSize * cDistanceFromBestFitOrigin | Family),</span>
<span style="color:#99cc00;"> dd</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">anova(l.base, l.distance, l.mainOnly, l.full)</span>
<span style="color:#99cc00;">anova(l.base, l.population, l.mainOnly, l.full)</span>

<span style="color:#99cc00;">#################################</span>
<span style="color:#99cc00;"># assessing partial contributions</span>
<span style="color:#99cc00;">#################################</span>
<span style="color:#99cc00;">l = lmer(TotalNormalizedPhonemeDiversity ~ 1 + (1 | Genus) + (1 | Subfamily) + (1 | Family), d)</span>
<span style="color:#99cc00;">round(cor(fitted(l), l@y)^2,3)</span>

<span style="color:#99cc00;"># assessing partial contributions</span>
<span style="color:#99cc00;">l = lmer(TotalNormalizedPhonemeDiversity ~ 1 + (1 | Subfamily) + (1 | Family), d)</span>
<span style="color:#99cc00;">round(cor(fitted(l), l@y)^2,3)</span>

<span style="color:#99cc00;">l = lmer(TotalNormalizedPhonemeDiversity ~ 1 + (1 | Family), d)</span>
<span style="color:#99cc00;">round(cor(fitted(l), l@y)^2,3)</span>

<span style="color:#99cc00;">l = lmer(NormalizedVowelDiversity ~ 1 + (1 | Genus) + (1 | Subfamily) + (1 | Family), d)</span>
<span style="color:#99cc00;">round(cor(fitted(l), l@y)^2,3)</span>

<span style="color:#99cc00;">l = lmer(NormalizedConsonantDiversity ~ 1 + (1 | Genus) + (1 | Subfamily) + (1 | Family), d)</span>
<span style="color:#99cc00;">round(cor(fitted(l), l@y)^2,3)</span>

<span style="color:#99cc00;">l = lmer(NormalizedToneDiversity ~ 1 + (1 | Genus) + (1 | Subfamily) + (1 | Family), d)</span>
<span style="color:#99cc00;">round(cor(fitted(l), l@y)^2,3)</span>

<span style="color:#99cc00;">l.atkinson = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">round(cor(fitted(l.atkinson), l.atkinson@y)^2,3)</span>

<span style="color:#99cc00;">l.atkinson = lmer(NormalizedConsonantDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">round(cor(fitted(l.atkinson), l.atkinson@y)^2,3)</span>

<span style="color:#99cc00;">l.atkinson = lmer(NormalizedVowelDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">round(cor(fitted(l.atkinson), l.atkinson@y)^2,3)</span>

<span style="color:#99cc00;">l.atkinson = lmer(NormalizedToneDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">round(cor(fitted(l.atkinson), l.atkinson@y)^2,3)</span>

<span style="color:#99cc00;">l.country.continent.noGenus = lmer(NormalizedToneDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># are all genealogical grouping factors justified?</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">library(lme4)</span>

<span style="color:#99cc00;">l.atkinson = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.atkinson, l.noGenus)</span>
<span style="color:#99cc00;">l.noSubfamily = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.atkinson, l.noGenus, l.noSubfamily)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># assessing areal effects - random effects</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">library(lme4)</span>
<span style="color:#99cc00;">library(languageR)</span>

<span style="color:#99cc00;">l.country.continent = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.continent = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.country = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.continent, l.country)</span>
<span style="color:#99cc00;">anova(l.country.continent, l.continent)</span>
<span style="color:#99cc00;">anova(l.atkinson, l.continent)</span>
<span style="color:#99cc00;">l.noCountry = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.continent, l.country, l.noCountry)</span>
<span style="color:#99cc00;">l.country.noSubfamily = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.continent, l.country, l.country.noSubfamily)</span>

<span style="color:#99cc00;">l.country.continent.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.continent.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.country.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.continent.noGenus, l.country.noGenus)</span>
<span style="color:#99cc00;">anova(l.country.continent.noGenus, l.continent.noGenus)</span>
<span style="color:#99cc00;">anova(l.country.continent, l.country, l.country.noGenus)</span>
<span style="color:#99cc00;">anova(l.atkinson, l.country)</span>
<span style="color:#99cc00;">anova(l.country.noGenus, l.country.noSubfamily)</span>
<span style="color:#99cc00;">anova(l.country.noGenus, l.country.continent)</span>

<span style="color:#99cc00;">l.noCountry.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.noGenus, l.noCountry.noGenus)</span>

<span style="color:#99cc00;">#################################################</span>
<span style="color:#99cc00;"># let's look at random slopes for the best models</span>
<span style="color:#99cc00;">#################################################</span>
<span style="color:#99cc00;">l.slp.country.continent.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 + cDistanceFromBestFitOrigin1000 | Country) +</span>
<span style="color:#99cc00;"> (1 + cDistanceFromBestFitOrigin1000 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.slp.continent.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 + cDistanceFromBestFitOrigin1000 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.slp.country.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 + cDistanceFromBestFitOrigin1000 | Country) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.slp.country.continent.noGenus, l.slp.country.noGenus)</span>
<span style="color:#99cc00;">anova(l.slp.country.continent.noGenus, l.slp.continent.noGenus)</span>
<span style="color:#99cc00;">anova(l.country.noGenus, l.slp.country.noGenus)</span>
<span style="color:#99cc00;">anova(l.continent.noGenus, l.slp.continent.noGenus)</span>

<span style="color:#99cc00;">#################################################</span>
<span style="color:#99cc00;"># get p-values</span>
<span style="color:#99cc00;">#################################################</span>
<span style="color:#99cc00;">pvals.fnc(l.country.noGenus, nsim=20000)</span>
<span style="color:#99cc00;">pvals.fnc(l.continent.noGenus, nsim=20000)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># assessing areal effects - spill over</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># without interaction with population size</span>
<span style="color:#99cc00;">dev2 = c()</span>
<span style="color:#99cc00;">ss2 = c()</span>
<span style="color:#99cc00;">f2 = c()</span>
<span style="color:#99cc00;">for(s in c(seq(100,600,50),seq(600,660,10),seq(670,720,2.5),seq(730,800,10),seq(800,2500,50),seq(2500,15000,500), seq(15000,20000,1000))) {</span>
<span style="color:#99cc00;">#for(s in c(seq(20,100,20),seq(100,180,10),seq(180,250,5),seq(250,400,10),seq(400,2500,50),seq(2500,15000,500), seq(15000,20000,1000))) {</span>
<span style="color:#99cc00;"> l = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,s) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d)</span>
<span style="color:#99cc00;"> )</span>
<span style="color:#99cc00;"> ss2 = append(ss2, s)</span>
<span style="color:#99cc00;"> dev2 = as.numeric(append(dev2, summary(l)@deviance["ML"]))</span>
<span style="color:#99cc00;"> f2 = append(f2, fixef(l)[4])</span>
<span style="color:#99cc00;">}</span>
<span style="color:#99cc00;">plot(ss2[1:20], dev2[1:20])</span>
<span style="color:#99cc00;">plot(ss2[1:70], dev2[1:70])</span>
<span style="color:#99cc00;">plot(ss2, dev2)</span>
<span style="color:#99cc00;">plot(ss2, dev2*(f2/abs(f2)))</span>

<span style="color:#99cc00;">dd = data.frame(x = ss2[1:85], y = dev2[1:85])</span>
<span style="color:#99cc00;">dd[dd$y == min(dd$y),]$x</span>
<span style="color:#99cc00;">ggplot(dd,</span>
<span style="color:#99cc00;"> aes(x=x, y=y)) +</span>
<span style="color:#99cc00;"> geom_line() +</span>
<span style="color:#99cc00;"> geom_point() +</span>
<span style="color:#99cc00;"> geom_point(aes(x = dd[dd$y == min(dd$y),]$x, </span>
<span style="color:#99cc00;"> y = dd[dd$y == min(dd$y),]$y</span>
<span style="color:#99cc00;"> ),</span>
<span style="color:#99cc00;"> size = 6,</span>
<span style="color:#99cc00;"> color = "red", </span>
<span style="color:#99cc00;"> shape = 1,</span>
<span style="color:#99cc00;"> alpha = 1,</span>
<span style="color:#99cc00;"> solid = T</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> scale_x_continuous("Standard deviation of weight function", trans = "log10") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Deviance of model") +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank(), legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/areal-fits-overall-normalization.png", width = 4.5, height = 4.5)</span>

<span style="color:#99cc00;">d$ArealPhonemeDiversity = getMigrationDistanceWeightedArealPhonemeDiversity(d,685)</span>
<span style="color:#99cc00;">d$ContinentWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Continent), mean))[as.character(d$Continent)]</span>
<span style="color:#99cc00;">d$CountryWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Country), mean))[as.character(d$Country)]</span>
<span style="color:#99cc00;">d$FamilyWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Family), mean))[as.character(d$Family)]</span>
<span style="color:#99cc00;">d$SubfamilyWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Subfamily), mean))[as.character(d$Subfamily)]</span>
<span style="color:#99cc00;">d$GenusWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Genus), mean))[as.character(d$Genus)]</span>
<span style="color:#99cc00;"># select columns by name so that their order matches the labels below</span>
<span style="color:#99cc00;">my.pairs(d[, c("FamilyWisePhonemeDiversity", "SubfamilyWisePhonemeDiversity", "GenusWisePhonemeDiversity", "ContinentWisePhonemeDiversity", "CountryWisePhonemeDiversity", "ArealPhonemeDiversity")],</span>
<span style="color:#99cc00;"> labels = c("Family","Subfamily","Genus","Continent","Country",</span>
<span style="color:#99cc00;"> "Areal\n(s=685)")</span>
<span style="color:#99cc00;">)</span>

<span style="color:#99cc00;"># with interaction with population size</span>
<span style="color:#99cc00;">dev = c()</span>
<span style="color:#99cc00;">ss = c()</span>
<span style="color:#99cc00;">for(s in c(seq(10,100,5),seq(100,200,10),seq(200,2500,50),seq(2500,15000,500), seq(15000,20000,1000))) {</span>
<span style="color:#99cc00;"> l = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> (cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,s)) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;"> ss = append(ss, s)</span>
<span style="color:#99cc00;"> dev = append(dev, deviance(l))</span>
<span style="color:#99cc00;">}</span>
<span style="color:#99cc00;">plot(ss[1:20], dev[1:20])</span>
<span style="color:#99cc00;">plot(ss[1:30], dev[1:30])</span>
<span style="color:#99cc00;">plot(ss, dev)</span>

<span style="color:#99cc00;">## best fitting model</span>
<span style="color:#99cc00;">dd = prepareVars(d)</span>
<span style="color:#99cc00;">dd$ArealPhonemeDiversityBest = getMigrationDistanceWeightedArealPhonemeDiversity(dd,685)</span>
<span style="color:#99cc00;">l.areal = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> ArealPhonemeDiversityBest +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> dd</span>
<span style="color:#99cc00;">)</span>
<span style="color:#99cc00;">pvals.fnc(l.areal, nsim = 20000)</span>

<span style="color:#99cc00;">#############################################</span>
<span style="color:#99cc00;"># some additional explorations and plots</span>
<span style="color:#99cc00;">#############################################</span>
<span style="color:#99cc00;">d$ArealPhonemeDiversity50 = getMigrationDistanceWeightedArealPhonemeDiversity(d,50)</span>
<span style="color:#99cc00;">d$ArealPhonemeDiversity100 = getMigrationDistanceWeightedArealPhonemeDiversity(d,100)</span>
<span style="color:#99cc00;">d$ArealPhonemeDiversity250 = getMigrationDistanceWeightedArealPhonemeDiversity(d,250)</span>
<span style="color:#99cc00;">d$ArealPhonemeDiversity500 = getMigrationDistanceWeightedArealPhonemeDiversity(d,500)</span>
<span style="color:#99cc00;">d$ArealPhonemeDiversity1000 = getMigrationDistanceWeightedArealPhonemeDiversity(d,1000)</span>
<span style="color:#99cc00;">d$ArealPhonemeDiversity2500 = getMigrationDistanceWeightedArealPhonemeDiversity(d,2500)</span>
<span style="color:#99cc00;">d$ArealPhonemeDiversity5000 = getMigrationDistanceWeightedArealPhonemeDiversity(d,5000)</span>
<span style="color:#99cc00;">d$ContinentWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Continent), mean))[as.character(d$Continent)]</span>
<span style="color:#99cc00;">d$CountryWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Country), mean))[as.character(d$Country)]</span>
<span style="color:#99cc00;">d$FamilyWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Family), mean))[as.character(d$Family)]</span>
<span style="color:#99cc00;">d$SubfamilyWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Subfamily), mean))[as.character(d$Subfamily)]</span>
<span style="color:#99cc00;">d$GenusWisePhonemeDiversity = unlist(lapply(split(d$TotalNormalizedPhonemeDiversity, d$Genus), mean))[as.character(d$Genus)]</span>
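<span style="color:#99cc00;"># select columns by name so that their number and order match the labels below</span>
<span style="color:#99cc00;">my.pairs(d[, c("FamilyWisePhonemeDiversity", "SubfamilyWisePhonemeDiversity", "GenusWisePhonemeDiversity", "ContinentWisePhonemeDiversity", "CountryWisePhonemeDiversity",</span>
<span style="color:#99cc00;"> "ArealPhonemeDiversity50", "ArealPhonemeDiversity100", "ArealPhonemeDiversity250", "ArealPhonemeDiversity500",</span>
<span style="color:#99cc00;"> "ArealPhonemeDiversity1000", "ArealPhonemeDiversity2500", "ArealPhonemeDiversity5000")],</span>
<span style="color:#99cc00;"> labels = c("Family","Subfamily","Genus","Continent","Country",</span>
<span style="color:#99cc00;"> "Areal\n(s=50)", "Areal\n(s=100)", "Areal\n(s=250)", "Areal\n(s=500)", </span>
<span style="color:#99cc00;"> "Areal\n(s=1000)", "Areal\n(s=2500)", "Areal\n(s=5000)"</span>
<span style="color:#99cc00;"> )</span>
<span style="color:#99cc00;">)</span>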

<span style="color:#99cc00;">round(cor(d$ArealPhonemeDiversity50, d$TotalNormalizedPhonemeDiversity)^2, 3)</span>
<span style="color:#99cc00;">round(cor(d$ArealPhonemeDiversity100, d$TotalNormalizedPhonemeDiversity)^2, 3)</span>
<span style="color:#99cc00;">round(cor(d$ArealPhonemeDiversity250, d$TotalNormalizedPhonemeDiversity)^2, 3)</span>
<span style="color:#99cc00;">round(cor(d$ArealPhonemeDiversity500, d$TotalNormalizedPhonemeDiversity)^2, 3)</span>
<span style="color:#99cc00;">round(cor(d$ArealPhonemeDiversity1000, d$TotalNormalizedPhonemeDiversity)^2, 3)</span>
<span style="color:#99cc00;">round(cor(d$ArealPhonemeDiversity2500, d$TotalNormalizedPhonemeDiversity)^2, 3)</span>
<span style="color:#99cc00;">round(cor(d$ArealPhonemeDiversity5000, d$TotalNormalizedPhonemeDiversity)^2, 3)</span>

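<span style="color:#99cc00;">ddd = melt(d, </span>
<span style="color:#99cc00;"> # only the seven columns with a numeric suffix, matching the colors and labels below</span>
<span style="color:#99cc00;"> measure.vars = grep("^ArealPhonemeDiversity[0-9]+$", names(d))</span>
<span style="color:#99cc00;">)</span>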
<span style="color:#99cc00;">ggplot(ddd, aes(y = TotalNormalizedPhonemeDiversity, x= value, color= variable)) +</span>
<span style="color:#99cc00;"> geom_point(alpha = .4) +</span>
<span style="color:#99cc00;"> geom_smooth(size = 1.3, method = "lm") + </span>
<span style="color:#99cc00;"> scale_y_continuous("Normalized phonological diversity") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Weighted areal normalized phonological diversity") +</span>
<span style="color:#99cc00;"> scale_color_manual("Standard deviation of\ndistance-based weight decay",</span>
<span style="color:#99cc00;"> values = c("red","orange","yellow","green","blue","purple","black"),</span>
<span style="color:#99cc00;"> breaks = levels(ddd$variable),</span>
<span style="color:#99cc00;"> labels = c(50,100,250,500,1000,2500,5000)</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank()) </span>
<span style="color:#99cc00;">ggsave(file = "figures/areal-effects.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># assessing areal effects - spill over vs. random effect</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">best_s = 685</span>

<span style="color:#99cc00;">l.country.continent = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.country.continent.areal = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,best_s) +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Continent) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.continent.areal, l.country.continent)</span>

<span style="color:#99cc00;">l.country = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.country.areal = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,best_s) +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.areal = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,best_s) +</span>
<span style="color:#99cc00;"> (1 | Genus) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.country.areal.noGenus = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,best_s) +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Subfamily) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.areal, l.country)</span>
<span style="color:#99cc00;">anova(l.country.areal, l.areal)</span>
<span style="color:#99cc00;">anova(l.country.areal, l.country.areal.noGenus)</span>

<span style="color:#99cc00;">l.country.areal.noGenus.noSubfamily = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,best_s) +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.areal, l.country.areal.noGenus, l.country.areal.noGenus.noSubfamily)</span>

<span style="color:#99cc00;">l.country.areal.noGenus.noSubfamily.noFamily = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,best_s) +</span>
<span style="color:#99cc00;"> (1 | Country),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">l.country.areal.noGenus.noSubfamily.noCountry = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> getMigrationDistanceWeightedArealPhonemeDiversity(d,best_s) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.areal, l.country.areal.noGenus, l.country.areal.noGenus.noSubfamily,l.country.areal.noGenus.noSubfamily.noCountry)</span>
<span style="color:#99cc00;">anova(l.country.areal, l.country.areal.noGenus, l.country.areal.noGenus.noSubfamily,l.country.areal.noGenus.noSubfamily.noFamily)</span>

<span style="color:#99cc00;">l.country.noAreal.noGenus.noSubfamily = lmer(TotalNormalizedPhonemeDiversity ~ </span>
<span style="color:#99cc00;"> clEstimatedSpeakerPopSize *</span>
<span style="color:#99cc00;"> cDistanceFromBestFitOrigin1000 +</span>
<span style="color:#99cc00;"> (1 | Country) +</span>
<span style="color:#99cc00;"> (1 | Family),</span>
<span style="color:#99cc00;"> prepareVars(d))</span>
<span style="color:#99cc00;">anova(l.country.areal.noGenus.noSubfamily,l.country.noAreal.noGenus.noSubfamily)</span>

<span style="color:#99cc00;">pvals.fnc(l.country.areal.noGenus.noSubfamily, nsim = 20000)</span>

<span style="color:#99cc00;">s_best = 685</span>
<span style="color:#99cc00;">distanceBasedWeight(100, s_best) </span>
<span style="color:#99cc00;">distanceBasedWeight(500, s_best) / distanceBasedWeight(100, s_best) </span>
<span style="color:#99cc00;">distanceBasedWeight(1000, s_best) /distanceBasedWeight(100, s_best) </span>
<span style="color:#99cc00;">distanceBasedWeight(2500, s_best) /distanceBasedWeight(100, s_best) </span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># illustrate areal fit for example language</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">s_best = 685</span>

<span style="color:#99cc00;">chosenlang = "Hindi"</span>
<span style="color:#99cc00;">langs = as.character(d$ourWALScode)</span>
<span style="color:#99cc00;">dd = data.frame(t(d[d$Language == chosenlang,langs]))</span>
<span style="color:#99cc00;">names(dd) = c("distance")</span>
<span style="color:#99cc00;">dd$weight = distanceBasedWeight(dd$distance, s_best)</span>
<span style="color:#99cc00;">dd = subset(dd, weight &gt; 0)</span>
<span style="color:#99cc00;">str(dd)</span>

<span style="color:#99cc00;">ddd = d</span>
<span style="color:#99cc00;">row.names(ddd) = ddd$ourWALScode</span>
<span style="color:#99cc00;">dd$AdjustedLongitude = (ddd[row.names(dd),]$Longitude - d[d$Language == chosenlang,]$Longitude) * (dd$distance / max(dd$distance))</span>
<span style="color:#99cc00;">dd$AdjustedLatitude = (ddd[row.names(dd),]$Latitude - d[d$Language == chosenlang,]$Latitude) * (dd$distance / max(dd$distance))</span>
<span style="color:#99cc00;">dd$AbsoluteContribution = abs(ddd[row.names(dd),]$TotalNormalizedPhonemeDiversity * dd$weight)</span>
<span style="color:#99cc00;">dd$Language = ddd[row.names(dd),]$Language</span>
<span style="color:#99cc00;">dd$ourWALScode = ddd[row.names(dd),]$ourWALScode</span>
<span style="color:#99cc00;">dd = subset(dd, Language != chosenlang)</span>

<span style="color:#99cc00;">#size = AbsoluteContribution</span>
<span style="color:#99cc00;">ggplot(dd, aes(x=AdjustedLongitude, y=AdjustedLatitude, size = weight )) +</span>
<span style="color:#99cc00;"> geom_text(aes(label = Language), alpha = .8) +</span>
<span style="color:#99cc00;"> geom_point(aes(x=0,y=0),shape=8,size=6,color="red") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Longitudinal migration distance") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Latitudinal migration distance") +</span>
<span style="color:#99cc00;"> scale_size_continuous("Weight based on\nmigration distance") +</span>
<span style="color:#99cc00;"> coord_cartesian(xlim=c(-1,1), ylim=c(-1,1)) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/hindi-weights.png", width = 4.5, height = 4.5)</span>

<span style="color:#99cc00;">chosenlang = "Albanian"</span>
<span style="color:#99cc00;">langs = as.character(d$ourWALScode)</span>
<span style="color:#99cc00;">dd = data.frame(t(d[d$Language == chosenlang,langs]))</span>
<span style="color:#99cc00;">names(dd) = c("distance")</span>
<span style="color:#99cc00;">dd$weight = distanceBasedWeight(dd$distance, s_best)</span>
<span style="color:#99cc00;">dd = subset(dd, weight &gt; 0)</span>
<span style="color:#99cc00;">str(dd)</span>

<span style="color:#99cc00;">ddd = d</span>
<span style="color:#99cc00;">row.names(ddd) = ddd$ourWALScode</span>
<span style="color:#99cc00;">dd$AdjustedLongitude = (ddd[row.names(dd),]$Longitude - d[d$Language == chosenlang,]$Longitude) * (dd$distance / max(dd$distance))</span>
<span style="color:#99cc00;">dd$AdjustedLatitude = (ddd[row.names(dd),]$Latitude - d[d$Language == chosenlang,]$Latitude) * (dd$distance / max(dd$distance))</span>
<span style="color:#99cc00;">dd$AbsoluteContribution = abs(ddd[row.names(dd),]$TotalNormalizedPhonemeDiversity * dd$weight)</span>
<span style="color:#99cc00;">dd$Language = ddd[row.names(dd),]$Language</span>
<span style="color:#99cc00;">dd$ourWALScode = ddd[row.names(dd),]$ourWALScode</span>
<span style="color:#99cc00;">dd = subset(dd, Language != chosenlang)</span>

<span style="color:#99cc00;">#size = AbsoluteContribution</span>
<span style="color:#99cc00;">ggplot(dd, aes(x=AdjustedLongitude, y=AdjustedLatitude, size = weight )) +</span>
<span style="color:#99cc00;"> geom_text(aes(label = Language), alpha = .8) +</span>
<span style="color:#99cc00;"> geom_point(aes(x=0,y=0),shape=8,size=6,color="red") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Longitudinal migration distance") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Latitudinal migration distance") +</span>
<span style="color:#99cc00;"> scale_size_continuous("Weight based on\nmigration distance") +</span>
<span style="color:#99cc00;"> coord_cartesian(xlim=c(-1,1), ylim=c(-1,1)) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank(), legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/albanian-weights.png", width = 3.75, height = 4.5)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># assessing non-linearity</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">library(ggplot2)</span>
<span style="color:#99cc00;">library(gam)</span>

<span style="color:#99cc00;">hist(table(d$Family), breaks = 100)</span>
<span style="color:#99cc00;">ggplot(d, </span>
<span style="color:#99cc00;"> aes(</span>
<span style="color:#99cc00;"> x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity</span>
<span style="color:#99cc00;"> )) +</span>
<span style="color:#99cc00;"> geom_point(aes(size = EstimatedSpeakerPopSize), alpha = .5) +</span>
<span style="color:#99cc00;"> scale_color_discrete("Family") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Distance from origin") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Normalized phonological diversity") + </span>
<span style="color:#99cc00;"> scale_size_continuous("Population Size", trans="log10") +</span>
<span style="color:#99cc00;"> geom_smooth(aes(x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity), </span>
<span style="color:#99cc00;"> color = "black", linetype = 2, size = 1.5, </span>
<span style="color:#99cc00;"> method = "gam", formula = y ~ ns(x,5)) +</span>
<span style="color:#99cc00;"> coord_cartesian(ylim = c(-1.5,2)) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank(), legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/regression-linear.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">ggplot(d, </span>
<span style="color:#99cc00;"> aes(</span>
<span style="color:#99cc00;"> x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity</span>
<span style="color:#99cc00;"> )) +</span>
<span style="color:#99cc00;"> geom_point(aes(size = EstimatedSpeakerPopSize), alpha = .5) +</span>
<span style="color:#99cc00;"> scale_color_discrete("Family") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Distance from origin") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Normalized phonological diversity") + </span>
<span style="color:#99cc00;"> scale_size_continuous("Population Size", trans="log10") +</span>
<span style="color:#99cc00;"> geom_smooth(aes(x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity), </span>
<span style="color:#99cc00;"> color = "red", linetype = 2, size = 1.3, </span>
<span style="color:#99cc00;"> method = "loess") +</span>
<span style="color:#99cc00;"> geom_smooth(aes(x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity), </span>
<span style="color:#99cc00;"> color = "black", linetype = 1, size = 1.3, </span>
<span style="color:#99cc00;"> method = "lm") +</span>
<span style="color:#99cc00;"> coord_cartesian(ylim = c(-1.5,2)) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank(), legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/regression-loess.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># assessing local trends</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">ggplot(subset(d, table(d$Family)[as.character(d$Family)] &gt;= 16), </span>
<span style="color:#99cc00;"> aes(</span>
<span style="color:#99cc00;"> x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity, </span>
<span style="color:#99cc00;"> color = as.factor(as.character(Family))</span>
<span style="color:#99cc00;"> )) +</span>
<span style="color:#99cc00;"> geom_point(aes(size = EstimatedSpeakerPopSize), alpha = .5) +</span>
<span style="color:#99cc00;"> scale_color_brewer("Family", palette = "Set1") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Distance from origin") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Normalized phonological diversity") + </span>
<span style="color:#99cc00;"> scale_size_continuous("Population Size", trans="log10") +</span>
<span style="color:#99cc00;"> geom_smooth(aes(x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity, </span>
<span style="color:#99cc00;"> color = as.factor(as.character(Family))),</span>
<span style="color:#99cc00;"> method = "lm", formula = y ~ x, size=1.5) +</span>
<span style="color:#99cc00;"> coord_cartesian(ylim = c(-1.5,2)) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/regression-linear-less.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">ggplot(subset(d, table(d$Family)[as.character(d$Family)] &gt;= 16), </span>
<span style="color:#99cc00;"> aes(</span>
<span style="color:#99cc00;"> x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity, </span>
<span style="color:#99cc00;"> size = EstimatedSpeakerPopSize, </span>
<span style="color:#99cc00;"> color = as.factor(as.character(Family))</span>
<span style="color:#99cc00;"> )) +</span>
<span style="color:#99cc00;"> geom_point(alpha = .75) +</span>
<span style="color:#99cc00;"> scale_color_brewer("Family", palette = "Set1") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Distance from origin") +</span>
<span style="color:#99cc00;"> scale_y_continuous("Normalized phonological diversity") + </span>
<span style="color:#99cc00;"> scale_size_continuous("Population Size", trans="log10") +</span>
<span style="color:#99cc00;"> geom_smooth(size=1.5) +</span>
<span style="color:#99cc00;"> geom_smooth(aes(x = DistanceFromBestFitOrigin, </span>
<span style="color:#99cc00;"> y = TotalNormalizedPhonemeDiversity), color = "black", linetype = 2) +</span>
<span style="color:#99cc00;"> coord_cartesian(ylim = c(-1.5,2))</span>
<span style="color:#99cc00;">ggsave(file = "figures/regression-non-linear-less.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># plotting distribution on map</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">library(maps)</span>
<span style="color:#99cc00;">ggplot(subset(d, table(d$Family)[as.character(d$Family)] &gt;= 10), </span>
<span style="color:#99cc00;"> aes(</span>
<span style="color:#99cc00;"> x = Longitude, </span>
<span style="color:#99cc00;"> y = Latitude</span>
<span style="color:#99cc00;"> )) +</span>
<span style="color:#99cc00;"> borders("world") +</span>
<span style="color:#99cc00;"> geom_point(aes(</span>
<span style="color:#99cc00;"> size = EstimatedSpeakerPopSize, </span>
<span style="color:#99cc00;"> color = as.factor(as.character(Family))</span>
<span style="color:#99cc00;"> ), </span>
<span style="color:#99cc00;"> alpha = .8) +</span>
<span style="color:#99cc00;"> scale_color_brewer("Family", palette = "Set3") +</span>
<span style="color:#99cc00;"> scale_size_continuous("Population Size", trans="log10") +</span>
<span style="color:#99cc00;"> opts(panel.grid.major = theme_blank(), panel.background = theme_blank(), panel.grid.minor = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/world.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">ggplot(subset(d, table(d$Family)[as.character(d$Family)] &gt;= 16), </span>
<span style="color:#99cc00;"> aes(</span>
<span style="color:#99cc00;"> x = Longitude, </span>
<span style="color:#99cc00;"> y = Latitude</span>
<span style="color:#99cc00;"> )) +</span>
<span style="color:#99cc00;"> borders("world") +</span>
<span style="color:#99cc00;"> geom_point(aes(</span>
<span style="color:#99cc00;"> size = EstimatedSpeakerPopSize, </span>
<span style="color:#99cc00;"> color = as.factor(as.character(Family))</span>
<span style="color:#99cc00;"> ), </span>
<span style="color:#99cc00;"> alpha = .75) +</span>
<span style="color:#99cc00;"> scale_color_brewer("Family", palette = "Set1") +</span>
<span style="color:#99cc00;"> scale_size_continuous("Population Size", trans="log10") +</span>
<span style="color:#99cc00;"> opts(panel.grid.major = theme_blank(), panel.background = theme_blank(), panel.grid.minor = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/world-less.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">ggplot(subset(d, table(d$Family)[as.character(d$Family)] &gt;= 30), </span>
<span style="color:#99cc00;"> aes(</span>
<span style="color:#99cc00;"> x = Longitude, </span>
<span style="color:#99cc00;"> y = Latitude</span>
<span style="color:#99cc00;"> )) +</span>
<span style="color:#99cc00;"> borders("world") +</span>
<span style="color:#99cc00;"> geom_point(aes(</span>
<span style="color:#99cc00;"> size = EstimatedSpeakerPopSize, </span>
<span style="color:#99cc00;"> shape = as.factor(as.character(Family)),</span>
<span style="color:#99cc00;"> color = TotalNormalizedPhonemeDiversity</span>
<span style="color:#99cc00;"> ), </span>
<span style="color:#99cc00;"> alpha = .75) +</span>
<span style="color:#99cc00;"> scale_color_gradient("Phonological Complexity") +</span>
<span style="color:#99cc00;"> scale_size_continuous("Population Size", trans="log10") +</span>
<span style="color:#99cc00;"> scale_shape_discrete("Family")</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># simulation for simpson's paradox</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>

<span style="color:#99cc00;">## -----------------</span>
<span style="color:#99cc00;"># set parameters</span>
<span style="color:#99cc00;">## -----------------</span>
<span style="color:#99cc00;">init = function(ngroup = 10, nitem = 10) {</span>
<span style="color:#99cc00;"> x = 1:27000</span>
<span style="color:#99cc00;"> spread_x = (max(x) - min(x)) * .2</span>
<span style="color:#99cc00;"> group_x_start = runif(ngroup, min(x), max(x))</span>
<span style="color:#99cc00;"> group_x_end = runif(ngroup, </span>
<span style="color:#99cc00;"> apply(cbind(group_x_start + spread_x/2, max(x)), MARGIN = 1, FUN = min),</span>
<span style="color:#99cc00;"> apply(cbind(group_x_start + spread_x, max(x)), MARGIN = 1, FUN = min))</span>

<span style="color:#99cc00;"> # variances</span>
<span style="color:#99cc00;"> group_sigma = .1</span>
<span style="color:#99cc00;"> group_x_sigma = .00001</span>
<span style="color:#99cc00;"> indiv_sigma = .1</span>

<span style="color:#99cc00;"> # betas</span>
<span style="color:#99cc00;"> alpha = .6</span>
<span style="color:#99cc00;"> group_alpha = rnorm(ngroup, 0, group_sigma)</span>
<span style="color:#99cc00;"> beta_x = -.00005</span>
<span style="color:#99cc00;"> # for normal differences</span>
<span style="color:#99cc00;"> # beta_group_x = rnorm(ngroup, 0, group_x_sigma)</span>
<span style="color:#99cc00;"> beta_group_x = 2 * rbinom(ngroup, 1, 0.5) * -beta_x + rnorm(ngroup, 0, group_x_sigma)</span>

<span style="color:#99cc00;"> ## -----------------</span>
<span style="color:#99cc00;"> # create data set</span>
<span style="color:#99cc00;"> ## -----------------</span>
<span style="color:#99cc00;"> d = data.frame(Group = rep(1:ngroup, nitem), </span>
<span style="color:#99cc00;"> Item = sort(rep(1:nitem, ngroup)))</span>
<span style="color:#99cc00;"> d$group_x_start = group_x_start[d$Group]</span>
<span style="color:#99cc00;"> d$group_x_end = group_x_end[d$Group]</span>
<span style="color:#99cc00;"> d$x = runif(ngroup * nitem, d$group_x_start, d$group_x_end)</span>
<span style="color:#99cc00;"> # y under the assumption of a linear mixed model</span>
<span style="color:#99cc00;"> d$y = (alpha + group_alpha[d$Group]) + </span>
<span style="color:#99cc00;"> (beta_x + beta_group_x[d$Group]) * d$x + </span>
<span style="color:#99cc00;"> rnorm(ngroup * nitem, 0, indiv_sigma) </span>
<span style="color:#99cc00;"> # y under simpson's paradox</span>
<span style="color:#99cc00;"> d$y_simpson = (alpha + group_alpha[d$Group]) + </span>
<span style="color:#99cc00;"> (beta_x) * d$x + </span>
<span style="color:#99cc00;"> (beta_group_x[d$Group]) * (d$x - d$group_x_start) +</span>
<span style="color:#99cc00;"> rnorm(ngroup * nitem, 0, indiv_sigma) </span>

<span style="color:#99cc00;"> # convert group and item to factor (only AFTER all the above has happened)</span>
<span style="color:#99cc00;"> d$Group = factor(d$Group)</span>
<span style="color:#99cc00;"> d$Item = factor(d$Item)</span>

<span style="color:#99cc00;"> return(d)</span>
<span style="color:#99cc00;">}</span>

<span style="color:#99cc00;">## -----------------</span>
<span style="color:#99cc00;"># plot</span>
<span style="color:#99cc00;">## -----------------</span>
<span style="color:#99cc00;">library(ggplot2)</span>
<span style="color:#99cc00;">dd = init(15, 20)</span>
<span style="color:#99cc00;">ggplot(dd, aes(y = y_simpson, x = x, color = Group)) +</span>
<span style="color:#99cc00;"> geom_point(alpha = .4, size = 3) +</span>
<span style="color:#99cc00;"> geom_smooth(method = "lm", </span>
<span style="color:#99cc00;"> formula = y ~ x, </span>
<span style="color:#99cc00;"> se = F, </span>
<span style="color:#99cc00;"> size = 1.2</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> geom_smooth(aes(y = y_simpson, x = x),</span>
<span style="color:#99cc00;"> method = "lm", </span>
<span style="color:#99cc00;"> formula = y ~ x, </span>
<span style="color:#99cc00;"> size = 2, </span>
<span style="color:#99cc00;"> color = "black", </span>
<span style="color:#99cc00;"> linetype = 2</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> scale_y_continuous("Pseudo normalized phonological diversity") +</span>
<span style="color:#99cc00;"> scale_x_continuous("Pseudo distance from origin") + </span>
<span style="color:#99cc00;"> scale_color_discrete("Pseudo\nlanguage\nfamily") +</span>
<span style="color:#99cc00;"> coord_cartesian(ylim = c(-1.6, 1.2)) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank()) </span>
<span style="color:#99cc00;">ggsave(file = "figures/simpsons-paradox.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;"># data frame to store p-values, t-values, and coefficient estimates</span>
<span style="color:#99cc00;">k = 2000</span>
<span style="color:#99cc00;">steps = c(10,20,40,80,160)</span>
<span style="color:#99cc00;"># one row per simulated data set; result columns are pre-allocated with NA</span>
<span style="color:#99cc00;">s = data.frame(run = 1:(k*length(steps)^2), ngroup = NA, nitem = NA,</span>
<span style="color:#99cc00;"> l = NA, l.est = NA, l.m.int = NA, l.m.int.est = NA, l.m.slp = NA, l.m.slp.est = NA)</span>

<span style="color:#99cc00;">library(lme4)</span>
<span style="color:#99cc00;">j = 0</span>
<span style="color:#99cc00;">for (i in 1:k) {</span>
<span style="color:#99cc00;"> for (ngroup in steps) { </span>
<span style="color:#99cc00;"> for (nitem in steps) {</span>
<span style="color:#99cc00;"> j = j + 1</span>
<span style="color:#99cc00;"> dd = init(ngroup, nitem)</span>
<span style="color:#99cc00;"> s$ngroup[j] = nlevels(dd$Group)</span>
<span style="color:#99cc00;"> s$nitem[j] = nlevels(dd$Item)</span>

<span style="color:#99cc00;"> l.1 = lm(y_simpson ~ 1 + x, dd)</span>
<span style="color:#99cc00;"> s$l[j] = summary(l.1)$coefficients[2,"Pr(&gt;|t|)"]</span>
<span style="color:#99cc00;"> s$l.est[j] = summary(l.1)$coefficients[2,"Estimate"]</span>

<span style="color:#99cc00;"> l.2 = lmer(y_simpson ~ 1 + x + (1 | Group), dd)</span>
<span style="color:#99cc00;"> s$l.m.int[j] = abs(summary(l.2)@coefs[2,3])</span>
<span style="color:#99cc00;"> s$l.m.int.est[j] = as.numeric(fixef(l.2)[2])</span>

<span style="color:#99cc00;"> l.3 = lmer(y_simpson ~ 1 + x + (1 + x | Group), dd)</span>
<span style="color:#99cc00;"> s$l.m.slp[j] = abs(summary(l.3)@coefs[2,3])</span>
<span style="color:#99cc00;"> s$l.m.slp.est[j] =as.numeric(fixef(l.3)[2])</span>
<span style="color:#99cc00;"> }</span>
<span style="color:#99cc00;"> }</span>
<span style="color:#99cc00;">}</span>
<span style="color:#99cc00;">summary(s)</span>
<span style="color:#99cc00;">save(s, file = "simulation-simpson-paradox.RData", compress=T)</span>

<span style="color:#99cc00;">lapply(split(s$l.m.int, paste(s$nitem, s$ngroup)), FUN = function(x) { sum(ifelse(x &gt; 1.96,1,0)) / length(x) } ) </span>
<span style="color:#99cc00;">lapply(split(s$l.m.slp, paste(s$nitem, s$ngroup)), FUN = function(x) { sum(ifelse(x &gt; 1.96,1,0)) / length(x) } ) </span>

<span style="color:#99cc00;">load("simulation-simpson-paradox.RData")</span>
<span style="color:#99cc00;">ggplot(s, aes(x = as.factor(ngroup), </span>
<span style="color:#99cc00;"> y = l.m.slp, </span>
<span style="color:#99cc00;"> color = as.factor(nitem)</span>
<span style="color:#99cc00;"> )</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> geom_point(size = 4, </span>
<span style="color:#99cc00;"> alpha = .4, </span>
<span style="color:#99cc00;"> position = position_jitter(width = .2)</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> stat_summary(fun.y = "mean", </span>
<span style="color:#99cc00;"> geom = "point",</span>
<span style="color:#99cc00;"> aes(shape = as.factor(nitem)), </span>
<span style="color:#99cc00;"> color = "black",</span>
<span style="color:#99cc00;"> size = 4</span>
<span style="color:#99cc00;"> ) + </span>
<span style="color:#99cc00;"> geom_abline(intercept = 1.96, </span>
<span style="color:#99cc00;"> slope = 0, </span>
<span style="color:#99cc00;"> size = 1.2, </span>
<span style="color:#99cc00;"> color = "gray25", </span>
<span style="color:#99cc00;"> linetype = 2</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> scale_x_discrete("Number of groups") +</span>
<span style="color:#99cc00;"> scale_y_continuous("absolute t-value") +</span>
<span style="color:#99cc00;"> scale_color_brewer("Number of individual\ndata points per group", palette = "Set1") +</span>
<span style="color:#99cc00;"> scale_shape_discrete("Mean by individual\ndata points per group",</span>
<span style="color:#99cc00;"> solid = F</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> opts(panel.background = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/simpsons-paradox-simulation-lmer-with-slope.png", </span>
<span style="color:#99cc00;"> width = 4, height = 4.5)</span>
<span style="color:#99cc00;">last_plot() + aes(y = l.m.int) + opts(legend.position = "none")</span>
<span style="color:#99cc00;">ggsave(file = "figures/simpsons-paradox-simulation-lmer-no-slope.png", </span>
<span style="color:#99cc00;"> width = 2.5, height = 4.5)</span>

<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;"># re-analysis of single origin determination</span>
<span style="color:#99cc00;">## -------------------------------------------------------------</span>
<span style="color:#99cc00;">load("BestOriginFitByModelType.RData")</span>

<span style="color:#99cc00;"># plot model fit for each hypothetical origin</span>
<span style="color:#99cc00;">library(maps)</span>
<span style="color:#99cc00;">library(ggplot2)</span>
<span style="color:#99cc00;">dd = subset(all.long, model == "l.dist.pop.int.fam.sub.count")</span>
<span style="color:#99cc00;">dd[dd$devDifference == max(dd$devDifference),]</span>
<span style="color:#99cc00;">aggregate(dd[,"devDifference"],</span>
<span style="color:#99cc00;"> by = list(Continent = dd$Continent), quantile)</span>
<span style="color:#99cc00;">ggplot(dd) +</span>
<span style="color:#99cc00;"> borders("world") +</span>
<span style="color:#99cc00;"> geom_point(aes(</span>
<span style="color:#99cc00;"> x = Longitude, </span>
<span style="color:#99cc00;"> y = Latitude,</span>
<span style="color:#99cc00;"> color = devDifference</span>
<span style="color:#99cc00;"> ),</span>
<span style="color:#99cc00;"> alpha = 0.5,</span>
<span style="color:#99cc00;"> size = 2.5</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> geom_point(aes(x = dd[dd$devDifference == max(dd$devDifference),]$Longitude, </span>
<span style="color:#99cc00;"> y = dd[dd$devDifference == max(dd$devDifference),]$Latitude</span>
<span style="color:#99cc00;"> ),</span>
<span style="color:#99cc00;"> size = 6,</span>
<span style="color:#99cc00;"> color = "black", </span>
<span style="color:#99cc00;"> shape = 8,</span>
<span style="color:#99cc00;"> alpha = 1,</span>
<span style="color:#99cc00;"> solid = T</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> geom_point(aes(x = c(31,29,105,-170,-77.5), </span>
<span style="color:#99cc00;"> y = c(30,41,11.5,66,8)</span>
<span style="color:#99cc00;"> ),</span>
<span style="color:#99cc00;"> size = 4,</span>
<span style="color:#99cc00;"> color = "black", </span>
<span style="color:#99cc00;"> alpha = 1,</span>
<span style="color:#99cc00;"> solid = T</span>
<span style="color:#99cc00;"> ) +</span>
<span style="color:#99cc00;"> scale_color_gradient("Improvement in\nmodel quality\n(change in deviance)") +</span>
<span style="color:#99cc00;"> opts(panel.grid.major = theme_blank(), panel.background = theme_blank(), panel.grid.minor = theme_blank())</span>
<span style="color:#99cc00;">ggsave(file = "figures/best-orgin-l.dist.pop.int.fam.sub.count.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;"># %+% swaps the new subset into the previous plot; last_plot() alone would redraw the old data</span>
<span style="color:#99cc00;">dd = subset(all.long, model == "l.dist.pop.int.fam.sub.gen")</span>
<span style="color:#99cc00;">last_plot() %+% dd</span>
<span style="color:#99cc00;">ggsave(file = "figures/best-orgin-l.dist.pop.int.fam.sub.gen.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">dd = subset(all.long, model == "l.dist.pop.int.fam.sub")</span>
<span style="color:#99cc00;">last_plot() %+% dd</span>
<span style="color:#99cc00;">ggsave(file = "figures/best-orgin-l.dist.pop.int.fam.sub.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">dd = subset(all.long, model == "l.dist")</span>
<span style="color:#99cc00;">last_plot() %+% dd</span>
<span style="color:#99cc00;">ggsave(file = "figures/best-orgin-l.dist.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">dd = subset(all.long, model == "l.dist.pop")</span>
<span style="color:#99cc00;">last_plot() %+% dd</span>
<span style="color:#99cc00;">ggsave(file = "figures/best-orgin-l.dist.pop.png", width = 8, height = 4.5)</span>

<span style="color:#99cc00;">dd = subset(all.long, model == "l.dist.pop.int")</span>
<span style="color:#99cc00;">last_plot() %+% dd</span>
<span style="color:#99cc00;">ggsave(file = "figures/best-orgin-l.dist.pop.int.png", width = 8, height = 4.5)</span></pre>
</div>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/papers-presentations/articles/'>articles</a>, <a href='https://hlplab.wordpress.com/category/papers-presentations/'>Papers &amp; Presentations</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/'>Statistics &amp; Methodology</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/statisticsr/'>statistics/R</a> Tagged: <a href='https://hlplab.wordpress.com/tag/areal-dependency/'>areal dependency</a>, <a href='https://hlplab.wordpress.com/tag/atkinson/'>Atkinson</a>, <a href='https://hlplab.wordpress.com/tag/croft/'>Croft</a>, <a href='https://hlplab.wordpress.com/tag/data-analysis/'>data analysis</a>, <a href='https://hlplab.wordpress.com/tag/genetic-dependency/'>genetic dependency</a>, <a href='https://hlplab.wordpress.com/tag/graff/'>Graff</a>, <a href='https://hlplab.wordpress.com/tag/lmer/'>lmer</a>, <a href='https://hlplab.wordpress.com/tag/mixed-models/'>mixed models</a>, <a href='https://hlplab.wordpress.com/tag/multilevel-models/'>multilevel models</a>, <a href='https://hlplab.wordpress.com/tag/r-code/'>R code</a>, <a href='https://hlplab.wordpress.com/tag/serial-founder-model/'>serial founder model</a>, <a href='https://hlplab.wordpress.com/tag/simulation/'>simulation</a>, <a href='https://hlplab.wordpress.com/tag/typology/'>typology</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/07/13/glmm-for-typologists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>More on random slopes and what it means if your effect is no longer significant after the inclusion of random slopes</title>
		<link>https://hlplab.wordpress.com/2011/06/25/more-on-random-slopes/</link>
		<comments>https://hlplab.wordpress.com/2011/06/25/more-on-random-slopes/#comments</comments>
		<pubDate>Sat, 25 Jun 2011 18:13:59 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[Statistics & Methodology]]></category>
		<category><![CDATA[statistics/R]]></category>
		<category><![CDATA[mixed models]]></category>
		<category><![CDATA[R code]]></category>
		<category><![CDATA[random effects]]></category>
		<category><![CDATA[regression]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=985</guid>
		<description><![CDATA[I thought the following snippet from a somewhat edited email I recently wrote in reply to a question about random slopes and what it means that an effect becomes insignificant might be helpful to some. I also took it as an opportunity to update the procedure I described at http://hlplab.wordpress.com/2009/05/14/random-effect-structure/. As always, comments are welcome. [...]]]></description>
			<content:encoded><![CDATA[<p>I thought the following snippet from a somewhat edited email I recently wrote in reply to a question about random slopes and what it means that an effect becomes insignificant might be helpful to some. I also took it as an opportunity to update the procedure I described at <a href="http://hlplab.wordpress.com/2009/05/14/random-effect-structure/">http://hlplab.wordpress.com/2009/05/14/random-effect-structure/</a>. As always, comments are welcome. What I am writing below are just suggestions.</p>
<blockquote><p><em>[...] an insignificant effect in a (1 + factor|subj) model means that, after controlling for random by-subject variation in the slope/effect of factor, you find no (by-convention-significant) evidence for the effect. As you suggest, this is due to the fact that there is between-subject variability in the slope that is sufficiently large to let us call into question the hypothesis that the &#8216;overall&#8217; slope is significantly different from zero.</em></p>
<p><em>[...] </em><em>So, what&#8217;s the rule of thumb here? If you run any of the standard simple designs (2&#215;2, 2&#215;3, 2&#215;2&#215;2, etc.) and you have the psychologist&#8217;s luxury of plenty of data (24+ items, 24+ subjects [...]), the full random effect structure is something you should entertain as your starting point. That&#8217;s in Clark&#8217;s spirit. That&#8217;s what F1 and F2 were meant for. [...] All of these approaches do not just capture random intercept differences by subject and item. They also aim to capture random slope differences.</em></p>
<p><em>[...] here&#8217;s what I&#8217;d recommend during tutorials now because it often saves time for psycholinguistic data. I am only writing down the random effects but, of course, I am assuming there are fixed effects, too, and that your design factors will remain in the model. Let&#8217;s look at a 2&#215;2 design:<span id="more-985"></span></em></p>
<p><em>1) Find the largest model that still converges. For normal psycholinguistic data sets, you can actually often fit the full model:</em></p>
<ul>
<li><em>(1 + factorA * factorB | subject) + (1 + factorA * factorB | item)</em></li>
</ul>
<p><em>but you might have to back off if this doesn&#8217;t converge. If so, try both:</em></p>
<ul>
<li><em>(1 + factorA + factorB | subject) + (1 + factorA * factorB | item)</em></li>
<li><em>(1 + factorA * factorB | subject) + (1 + factorA + factorB | item)</em></li>
</ul>
<p><em>If neither of those works, try:</em></p>
<ul>
<li><em>(1 + factorA + factorB | subject) + (1 + factorA + factorB | item)</em></li>
</ul>
<p><em>etc. This will give you what I started to call &#8220;the maximal random effect structure justified by your sample&#8221;. NB: this does not mean that you can go around and say that higher random slope terms don&#8217;t matter and that your results would hold if you included those. Your sample does not have enough data to afford that conclusion within the mixed model implementations available to you. That&#8217;s a normal caveat, I find.</em></p>
<p><em>At this point, you can say: I have enough data, the random effects are theoretically motivated, so I will leave it at this. Or, e.g., b/c you have reason to suspect that there are power issues, you might want to check whether you can reduce the random effect structure further. If so, continue to 2)</em></p>
<p><em>2) Compare the maximal model against:</em></p>
<ul>
<li><em>the intercept-only model: (1 | subject) + (1 | item)</em></li>
</ul>
<p><em>Compare the deviance between the two models (e.g. the chisq of anova(model1, model2)). If it&#8217;s less than 3, there is no room for any of the slopes to matter (deviance differences are cumulative), and you&#8217;re done with slope tests. If not, continue at 3)</em></p>
<p><em>3) if the comparison of the full and the intercept-only model is significant, we need to find out which slopes matter. The size of the deviance difference between the full and the intercept-only model is very instructive as it gives us an idea about how much of a deviance difference is there to be accounted for by additional slopes.</em></p>
<p><em>In my experience, the homogeneous nature of psycholinguistic stimuli usually means that there is not much item variance and that most of your variance will be due to subjects. This is often also visible in the size of the variance estimates of the by-subject and by-item intercepts. So, if you want to save some time, I&#8217;d recommend first looking at which of the random by-subject slopes matters the most. This is done by further model comparison (e.g. using the anova(model1, model2, ...) command; although there are more complicated tests that have been argued to be better).</em></p>
<p><em>Usually this will result in a clear winner model. Be aware that it&#8217;s theoretically possible that two models with different, non-nested, random effect structures are equally good in terms of their deviance. In that case, write to ling-R-lang.</em></p>
<p><em>What else? I would always follow R&#8217;s default to include random covariances between different random terms by the same group (e.g. random by-subject intercepts and by-subject slopes for factorA). You can test this assumption, too (again using model comparison), but I find that it&#8217;s usually not worth removing the random covariances.</em></p>
<p><em>4) Of course, you can also assess whether you need a subject or item effect at all. Simply compare the intercept-only model against, e.g.:</em></p>
<ul>
<li><em>(1 | subject)</em></li>
<li><em>(1 | item)</em></li>
</ul>
<p><em>For example, if anova(intercept-only.model, subject-intercept-only.model) is not significant, your sample doesn&#8217;t provide evidence that you need item effects.</em></p>
<p><em>5) Note that, to the best of my knowledge, it&#8217;s *not* legit to test whether you might not need any random effect by comparing e.g. (1 | subject) against an ordinary linear model. See, for example, the link provided on <a href="http://hlplab.wordpress.com/2011/05/31/two-interesting-papers-on-mixed-models/">http://hlplab.wordpress.com/2011/05/31/two-interesting-papers-on-mixed-models/</a>.</em></p>
<p><em>This whole procedure may seem cumbersome, but this is a matter of implementation. To the best of my knowledge, several folks are working on implementations that make these comparisons easier [...]</em></p>
<p><em>HTH,</em><br />
<em> Florian</em></p></blockquote>
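<p>As a minimal sketch of the kind of comparison described in the email (my own illustration, not part of the original message; the simulated data and the names d, m_int and m_slope are hypothetical), here is how one might compare an intercept-only random effect structure against one that adds a by-subject slope in lme4. Both models are fit with maximum likelihood so that their deviances are directly comparable:</p>
<pre>library(lme4)
set.seed(1)

# Hypothetical toy data: 20 subjects x 10 items, one two-level predictor
d = expand.grid(subject = factor(1:20), item = factor(1:10))
d$factorA = ifelse(as.integer(d$item) %% 2 == 0, 0.5, -0.5)
subj_int = rnorm(20, 0, 1)      # by-subject intercepts
subj_slope = rnorm(20, 0, 1)    # by-subject slopes for factorA
item_int = rnorm(10, 0, 0.3)    # by-item intercepts
d$y = 1 + subj_int[as.integer(d$subject)] + item_int[as.integer(d$item)] +
	(0.5 + subj_slope[as.integer(d$subject)]) * d$factorA +
	rnorm(nrow(d), 0, 1)

# Intercept-only random effects vs. an added by-subject slope
m_int = lmer(y ~ factorA + (1 | subject) + (1 | item),
	data = d, REML = FALSE)
m_slope = lmer(y ~ factorA + (1 + factorA | subject) + (1 | item),
	data = d, REML = FALSE)

# Likelihood ratio test; the Chisq column holds the deviance difference
cmp = anova(m_int, m_slope)
print(cmp)

# The same deviance difference computed by hand from the log-likelihoods
dev_diff = -2 * (as.numeric(logLik(m_int)) - as.numeric(logLik(m_slope)))
print(dev_diff)</pre>
<p>If dev_diff is small (the email suggests less than 3 as a rule of thumb), the added slope is unlikely to matter. With several candidate slopes, the same call can be repeated for each of them.</p>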
<br />Filed under: <a href='https://hlplab.wordpress.com/category/statistics-methodology/'>Statistics &amp; Methodology</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/statisticsr/'>statistics/R</a> Tagged: <a href='https://hlplab.wordpress.com/tag/mixed-models/'>mixed models</a>, <a href='https://hlplab.wordpress.com/tag/r-code/'>R code</a>, <a href='https://hlplab.wordpress.com/tag/random-effects/'>random effects</a>, <a href='https://hlplab.wordpress.com/tag/regression/'>regression</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/06/25/more-on-random-slopes/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>Two interesting papers on mixed models</title>
		<link>https://hlplab.wordpress.com/2011/05/31/two-interesting-papers-on-mixed-models/</link>
		<comments>https://hlplab.wordpress.com/2011/05/31/two-interesting-papers-on-mixed-models/#comments</comments>
		<pubDate>Wed, 01 Jun 2011 03:27:50 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[articles]]></category>
		<category><![CDATA[Papers & Presentations]]></category>
		<category><![CDATA[Statistics & Methodology]]></category>
		<category><![CDATA[statistics/R]]></category>
		<category><![CDATA[linear mixed models]]></category>
		<category><![CDATA[lmer]]></category>
		<category><![CDATA[mixed models]]></category>
		<category><![CDATA[power analyses]]></category>
		<category><![CDATA[random effects]]></category>
		<category><![CDATA[random slopes]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[zero variance]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=980</guid>
		<description><![CDATA[While searching for something else, I just came across two papers that should be of interest to folks working with mixed models. Schielzeth, H. and Forstmeier, W. 2009. Conclusions beyond support: overconfident estimates in mixed models. Behavioral Ecology Volume 20, Issue 2, 416-420.  I have seen the same point being made in several papers under review and [...]]]></description>
			<content:encoded><![CDATA[<p>While searching for something else, I just came across two papers that should be of interest to folks working with mixed models.</p>
<ul>
<li>Schielzeth, H. and Forstmeier, W. 2009. <strong><a href="http://beheco.oxfordjournals.org/content/20/2/416.short">Conclusions beyond support: overconfident estimates in mixed models</a></strong>. Behavioral Ecology Volume 20, Issue 2, 416-420.  I have seen the same point being made in several papers under review and at a recent CUNY (e.g. Doug Roland&#8217;s 2009? CUNY poster). On the one hand, it should be absolutely clear that random intercepts alone are often insufficient to account for violations of independence (this is a point I make every time I teach a tutorial). On the other hand, I have reviewed quite a number of papers where this mistake was made. So, here you go, in black and white. The moral is (once again) that no statistical procedure does what you think it should do <em>if you don&#8217;t use it the way it was intended to be used</em>.</li>
<li>The second paper takes on a more advanced issue, but one that is becoming more and more relevant. <strong>How can we test whether a random effect is essentially unnecessary &#8211; i.e. that it has a variance of 0?</strong> Currently, most people conduct model comparison (following Baayen, Davidson and Bates, 2008). But this approach is not recommended (and Baayen et al. do not recommend it either) if we want to test whether <em>all </em>random effects can be completely removed from the model (cf. the very useful <a href="http://glmm.wikidot.com/faq">R FAQ list</a>, which states &#8220;<em>do not</em> compare lmer models with the corresponding lm fits, or glmer/glm; the log-likelihoods [...] include different additive terms&#8221;). This issue is taken on in Scheipl, F., Greven, S. and Küchenhoff, H. 2008. <strong><a href="http://www.sciencedirect.com/science/article/pii/S0167947307004306">Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models.</a> </strong>Computational Statistics &amp; Data Analysis, Volume 52, Issue 7, 3283-3299. They present power comparisons of various tests.</li>
</ul>
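<p>A hedged pointer that goes beyond the two papers themselves: to the best of my knowledge, the R package RLRsim provides simulation-based tests for a zero random effect variance of the kind discussed in the second paper. The following is a minimal sketch, assuming that RLRsim and lme4 are installed, and using lme4&#8217;s built-in sleepstudy data purely as a stand-in data set:</p>
<pre>library(lme4)
library(RLRsim)
set.seed(1)

# Model with a single random effect (by-subject intercept), fit with REML
m = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Restricted likelihood ratio test of the null hypothesis that the
# variance of the by-subject intercept is 0; the null distribution
# is obtained by simulation rather than from a chi-square reference
res = exactRLRT(m, nsim = 5000)
print(res)</pre>
<p>This sidesteps the comparison of an lmer fit against an lm fit that the FAQ warns about. For models with more than one random effect, the package documentation describes which additional models need to be passed in.</p>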
<br />Filed under: <a href='https://hlplab.wordpress.com/category/papers-presentations/articles/'>articles</a>, <a href='https://hlplab.wordpress.com/category/papers-presentations/'>Papers &amp; Presentations</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/'>Statistics &amp; Methodology</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/statisticsr/'>statistics/R</a> Tagged: <a href='https://hlplab.wordpress.com/tag/linear-mixed-models/'>linear mixed models</a>, <a href='https://hlplab.wordpress.com/tag/lmer/'>lmer</a>, <a href='https://hlplab.wordpress.com/tag/mixed-models/'>mixed models</a>, <a href='https://hlplab.wordpress.com/tag/power-analyses/'>power analyses</a>, <a href='https://hlplab.wordpress.com/tag/random-effects/'>random effects</a>, <a href='https://hlplab.wordpress.com/tag/random-slopes/'>random slopes</a>, <a href='https://hlplab.wordpress.com/tag/regression/'>regression</a>, <a href='https://hlplab.wordpress.com/tag/zero-variance/'>zero variance</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/05/31/two-interesting-papers-on-mixed-models/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>
	</item>
		<item>
		<title>Mixed models and Simpson&#8217;s paradox</title>
		<link>https://hlplab.wordpress.com/2011/05/31/mixed-models-and-simpsons-paradox/</link>
		<comments>https://hlplab.wordpress.com/2011/05/31/mixed-models-and-simpsons-paradox/#comments</comments>
		<pubDate>Tue, 31 May 2011 09:12:37 +0000</pubDate>
		<dc:creator>tiflo</dc:creator>
				<category><![CDATA[Statistics & Methodology]]></category>
		<category><![CDATA[statistics/R]]></category>
		<category><![CDATA[mixed models]]></category>
		<category><![CDATA[R code]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[Simpson's paradox]]></category>
		<category><![CDATA[simulation]]></category>

		<guid isPermaLink="false">http://hlplab.wordpress.com/?p=972</guid>
		<description><![CDATA[For a paper I am currently working on, I started to think about Simpson&#8217;s paradox, which wikipedia succinctly defines as &#8220;a paradox in which a correlation (trend) present in different groups is reversed when the groups are combined. This result is often encountered in social-science [...]&#8221; The wikipedia page also gives a nice visual illustration. Here&#8217;s my [...]]]></description>
			<content:encoded><![CDATA[<p>For a paper I am currently working on, I started to think about <strong><a href="http://en.wikipedia.org/wiki/Simpson's_paradox">Simpson&#8217;s paradox</a></strong>, which wikipedia succinctly defines as</p>
<blockquote><p>&#8220;a <a title="Paradox" href="http://en.wikipedia.org/wiki/Paradox">paradox</a> in which a correlation (trend) present in different groups is reversed when the groups are combined. This result is often encountered in social-science [...]&#8221;</p></blockquote>
<p>The wikipedia page also gives a nice visual illustration. Here&#8217;s my own version of it. The plot shows 15 groups, each with 20 data points. The groups happen to be ordered along the x-axis (&#8220;Pseudo distance from origin&#8221;) in a way that suggests a negative trend of the <em>Pseudo distance from origin</em> against the outcome (&#8220;Pseudo normalized phonological diversity&#8221;). However, this trend does not hold within groups. As a matter of fact, in this particular sample, most groups show the opposite of the global trend (10 out of 15 within-group slopes are clearly positive). If this data set is analyzed by an ordinary linear regression (which does not have access to the grouping structure), the result will be a significant negative slope for the <em>Pseudo distance from origin</em>. So, I got curious: what about linear mixed models?</p>
<p><a href="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox.png"><img class="aligncenter size-full wp-image-973" title="Simpson's Paradox" src="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox.png?w=655&#038;h=368" alt="" width="655" height="368" /></a><span id="more-972"></span></p>
<p>Here, I don&#8217;t want to ask how likely the above case is to be an actual instance of Simpson&#8217;s paradox (for that, we would need to know that the order of the groups on the x-axis is indeed coincidental rather than itself being due to a causal effect of <em>Pseudo distance from origin</em>).</p>
<p>So, as a first step, here&#8217;s a function that creates data in which the within-group slope is either the global slope or its negative (plus, in either case, a bit of normal noise).</p>
<pre>init = function(ngroup = 10, nitem = 10) {
	x = 1:27000
	spread_x = (max(x) - min(x)) * .2
	group_x_start = runif(ngroup, min(x), max(x))
	group_x_end = runif(ngroup,
		apply(cbind(group_x_start + spread_x/2, max(x)), MARGIN = 1, FUN =  min),
		apply(cbind(group_x_start + spread_x, max(x)), MARGIN = 1, FUN =  min))

	# variances
	group_sigma = .1
	group_x_sigma = .00001
	indiv_sigma = .1

	# betas
	alpha = .6
	group_alpha = rnorm(ngroup, 0, group_sigma)
	beta_x = -.00005
	# for normal differences
	# beta_group_x = rnorm(ngroup, 0, group_x_sigma)
	beta_group_x = 2 * rbinom(ngroup, 1, 0.5) * -beta_x + rnorm(ngroup, 0, group_x_sigma)

	## -----------------
	# create data set
	## -----------------
	d = data.frame(Group = rep(1:ngroup, nitem),
		Item = sort(rep(1:nitem, ngroup)))
	d$group_x_start = group_x_start[d$Group]
	d$group_x_end = group_x_end[d$Group]
	d$x = runif(ngroup * nitem, d$group_x_start, d$group_x_end)
	# y under assumption linear mixed model
	d$y = (alpha + group_alpha[d$Group]) +
		(beta_x + beta_group_x[d$Group]) * d$x  +
		rnorm(ngroup * nitem, 0, indiv_sigma)
	# y under simpson's paradox
	d$y_simpson = (alpha + group_alpha[d$Group]) +
		(beta_x) * d$x  +
		(beta_group_x[d$Group]) * (d$x - d$group_x_start) +
		rnorm(ngroup * nitem, 0, indiv_sigma) 

	# convert group and item to factor (only AFTER all the above has happened)
	d$Group = factor(d$Group)
	d$Item = factor(d$Item)

	return(d)
}</pre>
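<p>To spell out what the function generates for y_simpson (this is my reading of the code above; the symbols are introduced here only for illustration), take observation <em>i</em> in group <em>j</em>, with s_j the group&#8217;s start position on the x-axis. In LaTeX notation:</p>
<pre>y_{ij} = (\alpha + a_j) + \beta x_{ij} + b_j (x_{ij} - s_j) + \epsilon_{ij}

a_j \sim N(0, 0.1^2), \quad \epsilon_{ij} \sim N(0, 0.1^2)

b_j = -2 \beta z_j + \eta_j, \quad z_j \sim \mathrm{Bernoulli}(0.5), \quad \eta_j \sim N(0, 0.00001^2)

\partial y_{ij} / \partial x_{ij} = \beta + b_j \approx \beta \text{ or } -\beta</pre>
<p>Here \alpha = 0.6 and \beta = -0.00005. Because b_j multiplies the distance from the group&#8217;s own start position rather than x itself, each group still starts close to the global line, while its within-group slope is either the global slope or its negative.</p>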
<p>With this function, (a random version of) the above graph is created as follows:</p>
<pre>library(ggplot2)
dd = init(15, 20)
ggplot(dd, aes(y = y_simpson, x = x, color = Group)) +
	geom_point(alpha = .4, size = 3) +
	geom_smooth(method = "lm",
		formula = y ~ x,
		se = FALSE,
		size = 1.2
	) +
	geom_smooth(aes(y = y_simpson, x = x),
		method = "lm",
		formula = y ~ x,
		size = 2,
		color = "black",
		linetype = 2
	) +
	scale_y_continuous("Pseudo normalized phonological diversity") +
	scale_x_continuous("Pseudo distance from origin") +
	scale_color_discrete("Pseudo\nlanguage\nfamily") +
	coord_cartesian(ylim = c(-1.6, 1.2)) +
	theme(panel.background = element_blank())</pre>
<p>Next, I ran 2000 simulations each for calls to init(<em>g</em>, <em>i</em>), with all combinations of g=10,20,40,80,160 groups and i=10,20,40,80,160 cases per group. For each simulation run, three models were fit:</p>
<ol>
<li>An ordinary linear regression model with only the predictor <em>Pseudo distance from origin</em></li>
<li>A linear mixed effect model with the predictor<em> Pseudo distance from origin</em> and a random by-group intercept</li>
<li>A linear mixed effect model with the predictor<em> <em>Pseudo distance from origin</em></em> and both a random by-group intercept and a random by-group slope for the predictor <em><em>Pseudo distance from origin</em></em></li>
</ol>
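<p>For a single simulated data set, the three fits might look roughly as follows. This is a sketch rather than the exact simulation code: make_simpson() is a hypothetical, simplified stand-in for init() so that the example is self-contained, and rescaling x to thousands (x_k) is my own assumption, made only to keep the random slope on a scale that lmer handles comfortably.</p>
<pre>library(lme4)
set.seed(42)

# Hypothetical compact stand-in for init(): within-group slopes are
# either the global slope beta_x or its negative
make_simpson = function(ngroup = 15, nitem = 20, beta_x = -0.00005) {
	g = rep(1:ngroup, each = nitem)
	x_start = runif(ngroup, 1, 21600)
	x = x_start[g] + runif(ngroup * nitem, 0, 5400)
	# group-specific deviation from the global slope: 0 or -2 * beta_x
	beta_group = 2 * rbinom(ngroup, 1, 0.5) * -beta_x
	y = 0.6 + rnorm(ngroup, 0, 0.1)[g] + beta_x * x +
		beta_group[g] * (x - x_start[g]) +
		rnorm(ngroup * nitem, 0, 0.1)
	data.frame(Group = factor(g), x = x, y = y)
}

d = make_simpson()
d$x_k = d$x / 1000   # rescaled predictor (assumption, see above)

# 1. ordinary linear regression
m1 = lm(y ~ x_k, data = d)
# 2. random by-group intercept
m2 = lmer(y ~ x_k + (1 | Group), data = d)
# 3. random by-group intercept and by-group slope
m3 = lmer(y ~ x_k + (1 + x_k | Group), data = d)

# t-values for the predictor in each of the three models
t_values = c(
	lm = coef(summary(m1))["x_k", "t value"],
	lmer_intercept = coef(summary(m2))["x_k", "t value"],
	lmer_slope = coef(summary(m3))["x_k", "t value"])
print(t_values)</pre>
<p>Wrapping the last part in a loop over repeated calls to the generator, and counting how often the absolute t-value exceeds 1.96, gives the kind of summary reported next.</p>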
<div>The first model indeed always returned significance for the predictor <em>Pseudo distance from origin</em>. The linear mixed model with a random intercept is considerably less likely to return significance, but still does so in 60-98% of all cases. The last model, which also includes a random slope term, shows different results, returning significance in 3-7% of the 2000 runs for many combinations of number of groups and group size. As the number of groups becomes much larger than the number of cases per group, this model, too, finds significance more than 70% of the time. The left figure below shows the simulation results for model 2; the right figure shows the results for model 3. Each colored point represents one run of the simulation and shows the <em>t</em>-value for the predictor (the pseudo distance effect in Figure 2) in the respective model. The non-solid shapes indicate the mean <em>t</em>-value for all 2000 mixed models fit for each combination of “number of groups” and “number of individual data points per group”. The dashed line indicates <em>t</em>=1.96, above which a t-test would be significant.</div>
<div>

<a href='https://hlplab.wordpress.com/2011/05/31/mixed-models-and-simpsons-paradox/simpsons-paradox/' title='Simpson&#039;s Paradox'><img data-attachment-id='973' data-orig-size='2400,1350' data-liked='0'width="150" height="84" src="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox.png?w=150&#038;h=84" class="attachment-thumbnail" alt="Simpson&#039;s Paradox" title="Simpson&#039;s Paradox" /></a>
<a href='https://hlplab.wordpress.com/2011/05/31/mixed-models-and-simpsons-paradox/simpsons-paradox-simulation-lmer-no-slope/' title='simpsons-paradox-simulation-lmer-no-slope'><img data-attachment-id='975' data-orig-size='750,1350' data-liked='0'width="83" height="150" src="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox-simulation-lmer-no-slope.png?w=83&#038;h=150" class="attachment-thumbnail" alt="simpsons-paradox-simulation-lmer-no-slope" title="simpsons-paradox-simulation-lmer-no-slope" /></a>
<a href='https://hlplab.wordpress.com/2011/05/31/mixed-models-and-simpsons-paradox/simpsons-paradox-simulation-lmer-with-slope/' title='simpsons-paradox-simulation-lmer-with-slope'><img data-attachment-id='976' data-orig-size='1200,1350' data-liked='0'width="133" height="150" src="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox-simulation-lmer-with-slope.png?w=133&#038;h=150" class="attachment-thumbnail" alt="simpsons-paradox-simulation-lmer-with-slope" title="simpsons-paradox-simulation-lmer-with-slope" /></a>

</div>
<div>I&#8217;d be curious to hear what other people think about this.</div>
<br />Filed under: <a href='https://hlplab.wordpress.com/category/statistics-methodology/'>Statistics &amp; Methodology</a>, <a href='https://hlplab.wordpress.com/category/statistics-methodology/statisticsr/'>statistics/R</a> Tagged: <a href='https://hlplab.wordpress.com/tag/mixed-models/'>mixed models</a>, <a href='https://hlplab.wordpress.com/tag/r-code/'>R code</a>, <a href='https://hlplab.wordpress.com/tag/regression/'>regression</a>, <a href='https://hlplab.wordpress.com/tag/simpsons-paradox/'>Simpson's paradox</a>, <a href='https://hlplab.wordpress.com/tag/simulation/'>simulation</a>]]></content:encoded>
			<wfw:commentRss>https://hlplab.wordpress.com/2011/05/31/mixed-models-and-simpsons-paradox/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="https://secure.gravatar.com/avatar/2fdd62ea4887c75d358ad9d77c6ab38b?s=96&#38;d=https%3A%2F%2Fsecure.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D96&#38;r=G" medium="image">
			<media:title type="html">tiflo</media:title>
		</media:content>

		<media:content url="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox.png" medium="image">
			<media:title type="html">Simpson&#039;s Paradox</media:title>
		</media:content>

		<media:content url="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox.png?w=150" medium="image">
			<media:title type="html">Simpson&#039;s Paradox</media:title>
		</media:content>

		<media:content url="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox-simulation-lmer-no-slope.png?w=83" medium="image">
			<media:title type="html">simpsons-paradox-simulation-lmer-no-slope</media:title>
		</media:content>

		<media:content url="http://hlplab.files.wordpress.com/2011/05/simpsons-paradox-simulation-lmer-with-slope.png?w=133" medium="image">
			<media:title type="html">simpsons-paradox-simulation-lmer-with-slope</media:title>
		</media:content>
	</item>
	</channel>
</rss>
