Frequently Taught Electives in the Minor
ACMS 34445: Probability and Statistics for Data Science
Alan Huebner, Applied and Computational Math and Statistics
In this course, you will learn the fundamentals of probability theory and statistical inference used in data science. These foundational principles and techniques will allow you to transform data science problems into mathematical terms and validate them as statistical statements.
Note: this course is delivered fully online. The course design combines required live weekly meetings online with self-scheduled lectures, problems, assignments, and interactive learning materials. To participate, students will need to have a computer with webcam, reliable internet connection, and a quiet place to participate in live sessions.
ANTH 43200: The Social Species
Mark Golitko, Anthropology
Human beings are distinguished in the animal kingdom by the degree to which we are embedded in wide-ranging networks of interaction. Anthropologists and archaeologists have long been interested in reconstructing the evolutionary causes of the human social talent, and the nature and structure of these connections in the past and present. This course will review current understanding of the evolutionary causes and consequences of human social networks, beginning with our earliest ancestors, and the archaeological and anthropological methods used to study them, including social network analysis (SNA). Network analysis is a powerful set of tools and theories drawn from across the social and physical sciences that can be used to study and model relational data. The course will review both the basics of network analysis as a tool and what we know of the structure of human social networks from classic sociological and anthropological studies, and will discuss how network approaches can be used to study and model interaction and social structure in the past (or present). No prior mathematical or statistical training is needed to take this course.
BIOS 40427: The Epidemiology and Ecology of Infectious Diseases
Edwin Michael, Biology
This course provides an introduction to epidemiology and disease ecology; topics covered include historical perspectives on disease, tracking of disease, spread of disease, and disease mitigation.
BIOS 30318: Introduction to Biocomputing
Stuart Jones, Biology
Modern biology, as well as biochemistry and biophysics, relies significantly on computation. The volumes of data generated by modern lab and field research commonly require greater capacity and more sophisticated algorithms for reformatting, filtering, and analyzing than are available in traditional spreadsheet software. As a result, an efficient and productive scientist must possess at least basic biocomputational skills. Often these requisite skills include the ability to navigate the Unix Shell environment, to understand and implement existing software tools, and to use a scripting language for data processing and analysis. This course will provide students with the knowledge and experience required to apply these important tools in diverse contexts. Approximately one-third of the course will focus on using the Unix Shell environment with an introduction to bioinformatics approaches. The remaining two-thirds of the course will build the students' skills in the use of the R scripting language and applications in statistics and dynamic modeling. No previous coding experience is required of students in this course.
CSE 10102: Elements of Computing II
Corey Pennycuff, Computer Science & Engineering
CSE 10102/CDT 30020 is the second course in the core programming sequence in the Computing & Digital Technologies Minor. Building on your prior experience with the Python programming language, you will explore advanced programming paradigms such as functional and object-oriented programming, familiarize yourself with elements of software engineering such as the command line interface, version control, and development environments, and utilize web and cloud-based services. To demonstrate your mastery of these skills and concepts, you will work on an interdisciplinary team project throughout the semester that applies your knowledge to a problem related to one of the CDT tracks.
CSE 40838: Data Visualization
Chaoli Wang, Computer Science & Engineering
Introduction to scientific and information visualization. Topics include visualization of scalar and vector fields (isosurface extraction, volume rendering, line integral convolution, and particle tracing); visual data representations (parallel coordinates, treemaps, and graph layouts); interactive techniques (focus+context visualization and coordinated multiple views); and solutions for big data visual analytics. Students will gain hands-on experience with a popular visualization programming library (D3.js) and toolkit (ParaView). Students will have the opportunity to learn, implement, and apply visualization techniques through assignments and projects.
CSE 44640: Data Science
Nitesh Chawla, Computer Science & Engineering
Data science can be viewed as the art and craft of extracting knowledge from large bodies of structured and unstructured data using methods from many disciplines, including (but not limited to) machine learning, databases, probability and statistics, information theory, and data visualization. This course will focus on the process of data science -- from data acquisition to analytics methods to deployment -- and will walk the students through both the technical and use-case aspects in the process. It will place a larger emphasis on the machine learning component, with relevant inclusions and references from other disciplines. The course will give students an opportunity to implement and experiment with some of the concepts as part of a class project, in addition to the hands-on assignments using the Python programming language. Additionally, the course touches upon some of the advances in related topics such as big data and discusses the role of data mining in contemporary society. The course has been designed and developed by Nitesh Chawla, the Frank Freimann Professor of Computer Science and Engineering and Director of iCeNSA at the University of Notre Dame.
Note: this course is delivered fully online. The course design combines required live weekly meetings online with self-scheduled lectures, problems, assignments, and interactive learning materials. To participate, students will need to have a computer with webcam, reliable internet connection, and a quiet place to participate in live sessions. Students who will be on the Main campus are not eligible to enroll in this course.
Students enrolling in this course should have taken one or more courses or implemented one or more projects involving Python programming and one or more courses in probability or statistics.
DESN 40120: Visualization of Data
Neeta Verma, Visual Communication Design
MATERIALS FEE. The course develops an understanding of what data means to humans and how its visualization helps communicate ideas in the fields of medicine, technology, and social sciences. The course touches upon measurement, collection, reporting, and analysis but ultimately focuses on visualization. Visualization is when the data comes alive and is ready to be used to communicate a complex concept, be it numeric, spatial, process, or temporal. Types of data covered in this course include but are not limited to: geographical, cultural, scientific, financial, statistical, meteorological, natural, and transportation data. The goal of the exercises within this course is to understand how data can be used to tell a story as opposed to merely packaging and plotting a set of numbers on a page. The design process therefore explores static, dynamic, interactive, 3-dimensional, and performance formats of representation in order to understand why a certain format is more or less suitable for the nature of the data, its analysis, and therefore its representation. Students develop an understanding of how the graphics being used must correlate completely with the data and numbers that are being represented. The course traverses these considerations to understand the various approaches that can be used to bring data to life and allow the viewer to understand a story that is being packaged within the representation. Is there a revelation or a deeper understanding of a pattern, not discovered earlier, once your data has been visualized and presented?
ECON 30331: Econometrics
ENGL 30010/ CDT 30380: Text Mining the Novel
Matthew Wilkens, English
A course in quantitative and computational approaches to analyzing large bodies of text. Broadly speaking, the course covers text mining, content analysis, and basic machine learning, emphasizing (but not limited to) approaches with demonstrated value in literary studies. Students will learn how to clean and process textual corpora, extract information from unstructured texts, identify relevant textual and extra-textual features, assess document similarity, cluster and classify authors and texts using a variety of machine-learning methods, visualize the outputs of statistical models, and incorporate quantitative evidence into literary and humanistic analysis. Most of the methods treated in the class are relevant in other fields. Students from all majors are welcome. No prerequisites, but some programming experience is strongly recommended. Taught in Python.
MDSC/ANTH 20110: Archeology of Hacking: Everything You Wanted to Know About Hacking But Were Afraid to Ask
"Hacking" is one of the most pressing topics of technological and societal interest. Yet, it is one of the most misunderstood and mischaracterized practices in the public sphere, given its ethical and technical complexities. In this course we will combine anthropological and computer science methods to explore the digital tools, practices, and sociocultural histories of hacking with a focus on their context of occurrence from the late 1960s to the present. Our goal is to help students think anthropologically about computing as well as technically about the digital mediations that we depend on in our lives.
MDSC 30003: Baseball in America
Katherine Walden, American Studies
Baseball is one of the most enduringly popular and significant cultural activities in the United States. Since the late 19th century, baseball has occupied an important place for those wishing to define and understand "America." Who has been allowed to play on what terms? How have events from baseball's past been remembered and re-imagined? What is considered scandalous and why (and who decides)? How has success in baseball been defined and redefined? Centering baseball as an industry and a cultural practice, this course will cover topics that include the political, economic, and social development of professional baseball in the United States; the rise of the organized baseball industry and Major League Baseball; and globalization in professional baseball. Readings for this course will include chapters from texts that include Rob Ruck's How the Major Leagues Colonized the Black and Latin Game (2011), Adrian Burgos's Playing America's Game: Baseball, Latinos, and the Color Line (2007), Daniel Gilbert's Expanding the Strike Zone: Baseball in the Age of Free Agency (2013), Robert Elias's How Baseball Sold U.S. Foreign Policy and Promoted the American Way Abroad (2010), and Michael Butterworth's Baseball and Rhetorics of Purity: The National Pastime and American Identity During the War on Terror (2010). Coursework may include response papers, primary source analysis, and a final project.
MDSC 30056: Digital Empires
Liang Cai, History
This course will provide advanced undergraduates and graduate students with a critical introduction to digital humanities for the study of early China, the fountainhead of Chinese Civilization. Collaborating with the Center of Digital Scholarship, this course will focus on relational data with structured information on historical figures, especially high officials, of early Chinese empires. Throughout the semester, we will read academic articles, mine data from primary sources, and employ Gephi and ArcGIS to visualize data. Those constructed data will cover three major themes: how geographical mobility contributed to the solidarity of a newly unified empire over diversified regions, how social networks served as the hidden social structure channeling the flow of power and talents, and how criminal records and excavated legal statutes shed light on the unique understanding of law and its relationship with the state power in Chinese history.
MDSC 30104: Data Feminism
Katherine Walden, American Studies
Feminism isn’t only about women, nor is feminism only for women. Feminism is about power, about who has it and who doesn’t. And in today’s world, data is power. Data can be used to create communities, advance research, and expose injustice. But data can also be used to discriminate, marginalize, and surveil. This course will draw on intersectional feminist theory and activism to identify models for challenging existing power differentials in data science, with the aim of using data science methods and tools to work towards justice. Class meetings will be split between discussions of theoretical readings and explorations of data science tools and methods (such as Tableau, RStudio, and Python). Those readings may include chapters from texts that include Catherine D’Ignazio and Lauren Klein’s Data Feminism (2020), Virginia Eubanks’s Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (2018), Ruha Benjamin’s Race After Technology: Abolitionist Tools for the New Jim Code (2019), and Sasha Costanza-Chock’s Design Justice: Community-Led Practices to Build the Worlds We Need (2020). This course will also examine the data advocacy and activism work undertaken by groups like Our Data Bodies, Data for Black Lives, the Anti-Eviction Mapping Project, and the Chicago-based Citizens Police Data Project. Over the course of the semester, students will develop original research projects that use data to intervene in issues of inequality and injustice.
MDSC 30161: Football in America
Katherine Walden, American Studies
Football is one of the most enduringly popular and significant cultural activities in the United States. Since the late 19th century, football has occupied an important place for those wishing to define and understand "America." And Notre Dame football plays a central role in that story, with larger-than-life figures and stories, from Knute Rockne's "Win one for the Gipper" line to the Four Horsemen backfield that led the program to a second national championship in 1924. The mythic proportions of the University's football program cast a long shadow on the institution's history, cultural significance, and traditions. This course focuses on Notre Dame football history as an entry point into larger questions about the cultural, historical, and social significance of football in the U.S. Who has been allowed to play on what terms? How have events from Notre Dame football's past been remembered and re-imagined? How has success in Notre Dame football been defined and redefined? In particular, the course will focus on how Notre Dame football became a touchstone for Catholic communities and institutions across the country navigating the fraught terrain of immigration, whiteness, and religious practice. This course will take up those questions through significant engagement with University Archive collections related to Notre Dame football, working toward increased levels of description and access for these materials. This course will include hands-on work with metadata, encoding and markup, digitization, and digital preservation/access through a collaboration with the University Archives and the Navari Family Center for Digital Scholarship.
Readings for this course will include chapters from texts such as Murray Sperber's Shake Down the Thunder: The Creation of Notre Dame Football (1993), TriStar Pictures' Rudy (1993), Steve Delsohn's Talking Irish: The Oral History of Notre Dame Football (2001), Jerry Barca's Unbeatable: Notre Dame's 1988 Championship and the Last Great College Football Season (2014), David Roediger's Working Toward Whiteness: How America's Immigrants Became White (2005), David Roediger's The Wages of Whiteness: Race and the Making of the American Working Class (1991), and Noel Ignatiev's How the Irish Became White (1995). Class meetings will be split between discussions of conceptual readings and applied work with library and information science technologies and systems. Coursework may include response papers, hands-on work with data, and a final project. Familiarity with archival methods, library/information science, data science, or computer science tools and methods is NOT a prerequisite for this course.
MDSC 30190: Sport and Big Data
Katherine Walden, American Studies
Sport is one of the most enduringly popular and significant cultural activities in the United States. Data has always been a central part of professional sport in the US, from Henry Chadwick’s invention of the baseball box score in the 1850s to the National Football League’s use of Wonderlic test scores to evaluate players. This course focuses on the intersecting structures of power and identity that shape how we make sense of the “datafication” of professional sport. By focusing on the cultural significance of sport data, this course will put the datafication of sport in historical context and trace the ways it has impacted athletes, fans, media, and other stakeholders in the sport industry. The course will also delve into the technology systems used to collect and analyze sport data, from the TrackMan and PITCHf/x systems used in Major League Baseball to the National Football League’s Next Gen Stats partnership to emerging computer vision and artificial intelligence research methods. Readings for this course will draw on texts like Christopher Phillips’ Scouting and Scoring: How We Know What We Know About Baseball (2019), Ruha Benjamin’s Captivating Technology: Race, Carceral Technoscience, and Liberatory Imagination in Everyday Life (2019), and Michael Lewis’s Moneyball: The Art of Winning an Unfair Game (2004). Class meetings will be split between discussions of conceptual readings and applied work with sport data and technology systems. Coursework may include response papers, hands-on work with data, and a final project. Familiarity with statistical analysis, data science, or computer science tools and methods is NOT a prerequisite for this course.
MDSC 33201: Geographic Information Systems
Matthew Sisk, GIS and Data Science
This course aims to provide a basic understanding of how Geographic Information Systems (GIS) and satellite imagery can be used to visualize and analyze environmental data. Students will learn basic techniques for analyzing, manipulating, and creating geospatial data in both pixel-based (satellite imagery and digital terrain models) and vector-based (point, line, and polygon representation of spatial data) formats. Students will also learn how to acquire high-resolution satellite imagery and other GIS data from online data servers.
MDSC 40810: Quantitative Political Analysis Using Stata
Michael Coppedge, Political Science
"Students in this course will learn to understand the most common statistical techniques used in political science and acquire the skills necessary to use these techniques and interpret their results. A mastery of these techniques is essential for understanding research on public opinion and voting behavior, electoral studies, and comparative research on the causes of democracy. For each topic, students will read works to orient them to key issues and debates. They will learn the reasoning behind the statistical analysis in these readings and create their own spreadsheet programs to execute such analyses. They will then download and clean datasets actually used in the published research, replicate selected analyses from these readings using the statistical package Stata and write short papers evaluating the inferences defended in the published research."
MDSC 40811: Quantitative Political Analysis Using R
Michael Coppedge, Political Science
This course is an introduction to political, economic, and social issues through the medium of visual displays. This kind of course has become important because data are now abundant and easy to access and software for displaying and analyzing data are available and easy to use. The ability to explore and display data is an increasingly valuable skill in many fields. However, this skill must be complemented by the ability to interpret visual displays orally, and by a commitment to use data responsibly: to reveal rather than slant or distort the truth. We will use data on democracy, economic development, and society as examples, but the main emphasis is on helping you explore many facets of a socioeconomic or political issue of particular interest to you. You will learn to manage data and produce your own graphics to describe and explain relationships. The graphics will include line and bar graphs, 2D and 3D scatterplots, maps, path diagrams, animated graphics, and others.
MDSC 43316: Sociotechnical Studies of Data Science
Luis Felipe Rosado Murillo, Anthropology
Data Science is an emergent, interdisciplinary field of research and technical practice that involves data-intensive and large-scale digital infrastructures. As Internet technologies and services become more relevant for the ways we organize our socioeconomic, political, and intimate lives, Data Science has been promoted as an expert domain to help make sense of multiple and heterogeneous digital traces that constitute "Big Data." In this seminar, we will review the literature on sociotechnical studies of Data Science to examine social, technical, and ethical dimensions of large-scale data analytics. Throughout the semester we will discuss a wide range of controversial cases in criminal law, labor recruitment, predictive policing, online marketing, Internet-based research, mass surveillance, and more. The goal is to develop a shared understanding of "data ethics" to enable us to critically engage with the mounting challenges of large-scale data analytics.
PHIL 24632: Robot Ethics
Don Howard, Philosophy
Robots or "autonomous systems" play an ever-increasing role in many areas, from weapons systems and driverless cars to health care and consumer services. As a result, it is ever more important to ask whether it makes any sense to speak of such systems' behaving ethically and how we can build into their programming what some call "ethics modules." After a brief technical introduction to the field, this course will approach these questions through contemporary philosophical literature on robot ethics and through popular media, including science-fiction text and video. This is an online course with required, regular class sessions each week. Class meetings are online via Zoom webinar software (provided by the University).
Note: this course is delivered fully online. The course design combines required live weekly meetings online with self-scheduled lectures, problems, assignments, and interactive learning materials. To participate, students will need to have a computer with a webcam, reliable internet connection, and a quiet place to participate in live sessions. Students who will be on the Main campus or residing in the Michiana region are not eligible to enroll in this course.
PHIL 20647/MDSC 20647: Data and Artificial Intelligence Ethics
Emanuele Ratti, Philosophy
In the last decade, the Big Data revolution and developments in Artificial Intelligence (AI) have both created promises and raised several ethical issues. Emerging computational technologies have fostered the achievement of apparent benefits, while at the same time they seem to exacerbate social inequalities and threaten even our own existence as a species. In this course, we will discuss those ethical and societal issues related to the development of AI and Big Data that have direct and concrete consequences on the way we perceive ourselves as persons, as members of society, and the way we conceive our place as a species on this planet. These issues will be analyzed in light of major ethical theories, but a special emphasis will be placed on virtue ethics. Recent works in virtue ethics are well positioned to make sense of the importance of our place as human beings on this planet, but at the same time they can account for the indispensable roles that machines play in our environment. The course is divided into three main parts. In the first part, I will introduce the main ethical frameworks, and in particular virtue ethics. In the second part, we will discuss AI. Societal and ethical issues raised by AI include the threats posed to the existence of our species; whether we should trust AI or should find a way to build artificial agents with moral characteristics; and whether AI will do most of our jobs in the future and whether this scenario is desirable. In the third part, we will focus on selected issues concerning the Big Data revolution, such as how the autonomy of very complex algorithms can shape our lives in opaque ways and whether transparency is desirable; whether the design of algorithms may hide bias leading to social inequalities; and how algorithms are changing the way healthcare is provided. Upon successful completion of this course, you will be able to: 1. define and sketch focal points of virtue ethics and other relevant ethical theories; 2. identify moral theories in arguments provided in support of or in opposition to the use of certain AI-related and Big Data technologies; 3. compare different arguments and highlight strengths and weaknesses.
PHYS 60410/MDSC 40410: Patterns of Life
Dervis Can Vural, Physics
This course focuses on the mathematical principles underlying the spatiotemporal patterns emerging in biological populations. Students are expected to be comfortable with calculus, differential equations, linear algebra, and elementary probability theory. The first part of the course focuses primarily on population genetics and evolutionary biology, while the second part will focus on reaction diffusion equations and pattern formation. Students will be expected to solve quantitative problems and design simulations, and will be guided towards developing research projects related to theoretical and computational biology.
POLS 30813/KSGA 30005/MDSC 30005: Simulating Politics and Global Affairs
Thomas Mustillo, Global Affairs
Politics, markets, and the environment are all spheres of development that are fundamentally shaped by the action and interaction of many individuals over time. For example, the Arab Spring protests, the shortage of medicines in Caracas, and the rising water temperatures of the Baltic Sea are all system-level outcomes arising from the individual actions of thousands or even billions of people. In these spheres, leadership is often weak or non-existent. Scientists call these "complex systems." Complexity is difficult to study in the real world. Instead, scientists often approach these phenomena using computer simulations (sometimes called agent-based models, social network models, and computational models). The goal is to build computer models of development that link the actions and interactions of individuals to the system-level outcomes. This class will use the perspective, literature, and tools of complexity science to approach core questions in the field of development.
POLS 40815: Visualizing Politics
Michael Coppedge, Political Science
This course is an introduction to political, economic, and social issues through the medium of visual displays. This kind of course has become feasible because data are now abundant and easy to access and software for displaying and analyzing data are available and easy to use. The ability to examine and display data is an increasingly valuable skill in many fields. However, this skill must be complemented by the ability to interpret visual displays orally, and by a commitment to use data responsibly: to reveal, rather than slant or distort, the truth. We will discuss examples concerning drugs, marriage, climate change, development, economic performance, social policy, democracy, voting, public opinion, and conflict, but the main emphasis is on helping you explore many facets of an issue of particular interest to you. You will learn to manage data and produce your own graphics to describe and explain political, social, economic (or other!) relationships. The graphics will include line and bar graphs, 2D and 3D scatterplots, motion charts, maps, and others.
POLS 34815/MDSC 34815: How To (Not) Lie with Statistics
Jeff Harden, Political Science
How will Amazon HQ2 impact local economies? Should parents allow kids to have screen time? What role did demographic shifts in suburban areas play in the 2016 and 2018 elections? Does the infield shift work? Modern society constantly faces questions that require data, statistics, and other empirical evidence to answer well. But the proliferation of niche media outlets, the rise of fake news, and the increase in academic research retractions make navigating potential answers to these questions difficult. This course is designed to give students tools to confront this challenge by developing their statistical and information literacy skills. It will demonstrate how data and statistical analyses are susceptible to a wide variety of known and implicit biases, which may ultimately lead consumers of information to make problematic choices. The course will consider this issue from the perspectives of consumers of research as well as researchers themselves. We will discuss effective strategies for reading and interpreting quantitative research while considering the incentives researchers face in producing it. Ultimately, students will complete the class better equipped to evaluate empirical claims made by news outlets, social media, or their peers. The goal is to encourage students to approach data-driven answers to important questions with appropriate tools rather than blind acceptance or excessive skepticism.
POLS 30111: Data and Politics
Nathanael Gratias Sumaktoyo, Postdoctoral Fellow, Global Religion Research Initiative
Sherlock Holmes famously said in The Adventure of the Copper Beeches, "Data! Data! Data! I can't make bricks without clay." Similarly, it is hard to understand our world without data, big or small. Data allows us to look for patterns and trends, to test for relationships, and to make predictions. This course is all about data and how we can use it to understand politics and social phenomena in general. We will learn various approaches to data gathering and analysis, ranging from public opinion surveys, experimental methods, unobtrusive measures, and machine learning to social network analysis. Our assignments and exams will test both students' knowledge of the methods and their ability to do basic analysis. While preexisting knowledge of statistical methods is not required, a willingness to learn statistics and programming (especially with R) is crucial. In terms of substance, while we will also discuss topics such as consumer behavior (think Netflix and Amazon), our foci will be on political behavior and religious behavior. Our focus on political behavior will include voting behavior, representation/redistricting, and political campaigns, whereas our focus on religious behavior will include religious violence, terrorism, and interreligious cooperation. Naturally, we will also discuss how the two foci are related both in the U.S. and around the world. This course counts as a methodology course for departmental honors.
PSY 40121: Psychological Measurement and Test Development
Ying (Alison) Cheng, Psychology
This course introduces measurement of human behavior in psychological studies, the construction and use of psychological instruments and educational assessments (including tests of intelligence, achievement, personality, and vocational interest), validation of these tests following classical test theory and item response theory, as well as practice in test construction, administration and validation. The course also highlights issues of test equality across groups, assessing measurement error, interpretation of test scores in the context of criterion-referenced tests vs. norm-referenced tests, standard setting and so on.
PSY 40122: Machine Learning for Social and Behavioral Research
Ross Jacobucci, Psychology
Cluster analysis is a statistical approach for the analysis of multivariate data that aims at discovering groups of subjects in a sample that are similar to each other. Clustering techniques are applied in a wide variety of areas including psychiatry (e.g., finding disease categories), marketing (e.g., different consumer profiles), sociology (e.g., social subgroups), etc. Cluster analysis is an example of unsupervised learning. The latter term is derived from the fact that clusters are discovered in the absence of an outcome variable that guides the clustering. Regression trees and random forests, on the other hand, are supervised learning approaches. An outcome such as "case" and "control" is predicted by a number of predictor variables, and the analysis focuses on finding groups with similar response patterns on the predictor variables. In other words, the outcome variable guides ("supervises") finding groups of similar subjects. Outcomes in regression trees or forests can be categorical or continuous. This graduate-level course consists of two parts. The first part covers the basics of cluster analysis whereas the second part provides an introduction to regression trees and random forests. The course consists of approximately 2/3 lectures providing the theoretical background, and 1/3 lab sessions, which will use the free software program R. The course is suitable for students with a strong interest in methods. Basic knowledge of matrix algebra and thorough knowledge of regression analysis are prerequisites. Due to the broad variety of applications of cluster analysis and regression trees, students in (Quantitative) Psychology, Sociology, Political Sciences, or Computer Sciences are equally welcome.
PSY 30109: R for Data Science and Exploratory and Graphical Data Analysis
Zhiyong (Johnny) Zhang, Psychology
This class aims to equip students with basic knowledge of R in data manipulation, data generation, data visualization and data analysis with a focus on data science. The first part of the class will introduce the very basics of R including the types of data such as vectors, matrices, and data frames as well as tibbles for refined data frames and bigmatrix for big data. The second part of the class will introduce data manipulation and preprocessing methods such as data transformation, subsetting, and combination. The third part will deal with specific types of data such as strings, texts, dates and times, images, audio, and video. The fourth part will teach ggplot2 and related packages for data visualization. The last part of the class will illustrate how to conduct data analysis using the above techniques through case studies such as basket analysis, network analysis, and log analysis. The class does not require previous knowledge of R.
PSY 40120: Advanced Statistics
Zhiyong (Johnny) Zhang, Psychology
This course extends PSY 30100 in two respects. First, additional attention is given to the logic of inferential statistics. Special focus is placed on the purpose, strengths, and limitations of hypothesis testing, especially as it is used in psychological research. Second, this course considers statistical analysis of data from more complex data structures than typically covered in PSY 30100. The goal of this part of the course is to heighten students' awareness of the variety of research questions that can be addressed through a wide range of designs and accompanying analyses. The orientation of the entire course focuses much less on the computational aspects of analyzing data than on the conceptual bases of what can be learned from different approaches to data analysis.
PSY 30105: Exploratory and Graphical Data Analysis
Zhiyong (Johnny) Zhang, Psychology
The process by which psychological knowledge advances involves a cycle of theory development, experimental design and hypothesis testing. But after the hypothesis test either does or doesn't reject a null hypothesis, where does the idea for the next experiment come from? Exploratory data analysis completes this research cycle by helping to form new theories and change existing ones. After the planned hypothesis testing for an experiment has finished, exploratory data analysis can look for patterns in these data that may have been missed by the original hypothesis tests. A second use of exploratory data analysis is in diagnostics for hypothesis tests. There are many reasons why a hypothesis test might fail. There are even times when a hypothesis test will reject the null for an unexpected reason. By becoming familiar with data through exploratory methods, the informed researcher can understand what went wrong (or what went right for the wrong reason). This class is recommended for advanced students who are interested in getting the most from their data.
SOC 43990/MDSC 43990: Social Networks
David Hachen, Sociology
Social networks are an increasingly important form of social organization. Social networks help to link persons with friends, families, co-workers and formal organizations. Via social networks information flows, support is given and received, trust is built, resources are exchanged, and interpersonal influence is exerted. Rather than being static, social networks are dynamic entities. They change as people form and dissolve social ties to others during the life course. Social networks have always been an important part of social life: in our kinship relations, our friendships, at work, in business, in our communities and voluntary associations, in politics, in schools, and in markets. Our awareness of and ability to study social networks has increased dramatically with the advent of social media and new communication tools through which people interact with others. Through email, texting, Facebook, Twitter and other platforms, people connect and communicate with others and leave behind traces of those interactions. This provides a rich source of data that we can use to better understand our connections to each other; how these connections vary across persons and change over time; and the impact that they have on our behaviors, attitudes, and tastes. This course will introduce students to (1) important substantive issues about, and empirical research on, social networks; (2) theories about network evolution and network effects on behavior; and (3) tools and methods that students can use to look at and analyze social networks. The course will be a combination of lectures, discussions and labs. Course readings will include substantive research studies, theoretical writings, and methodological texts. Through this course students will learn about social networks by collecting data on social networks and analyzing that data.
SOC 43919/MDSC 43919: Text Analysis for Social Science
Marshall Taylor, Sociology
Screens are all around us. From T.V.s to smartphones and e-books, the ubiquity of screens and the fact that we use them to communicate with one another mean that virtually all of us create some form of "text data" every day. Further, the proliferation of mass communication technologies over the past couple of decades - including the rise of social media, the emphasis on document digitization in archives, libraries, and organizations, and increasing access to these data - has opened the door to new questions for social scientists and to new data and methods for answering these questions. For example, do anti-immigration laws shape how people tweet about immigration? Does war shape how U.S. presidents frame the role of governance in society, as reflected in State of the Union addresses? What accounts for the gender gap in net neutrality activism? Did national news media or activist social media matter more for sparking #BlackLivesMatter? Can Twitter sentiment predict stock market activity? This course will introduce students to some of the methods that social scientists use to answer these types of questions. The focus will be on understanding and developing some of the fundamentals for designing and conducting text analysis projects from a social science perspective. We will also touch on some of the more advanced topics in this rapidly growing field. Hands-on analysis in the R statistical computing environment will be integral to the course, though no prior coding experience is required.
STV 40305: Data in the Humanities: Mining the Book of Nature
Caterina Agostini, The Center for Digital Humanities
This course will introduce advanced undergraduates and graduate students to data mining and computational methods in the humanities. What do we mean when we say “data” or “big data”? Why would data, or data visualization, count as an argument in the humanities? Through the concept of the “Book of Nature,” students explore unstructured and structured data in the work of scientists Thomas Harriot, Isaac Newton, Galileo Galilei, Andreas Vesalius, and Primo Levi, and practitioners in pharmacy and alchemy Camilla Erculiani and Caterina Sforza.* Students familiarize themselves with data mining and visualization, text analysis, and geospatial techniques through resources at the Center of Digital Scholarship and the Institute for Data and Society, while also gaining experience with digital cultural heritage. Primary sources include texts and images from manuscripts, printed books, artworks, and natural history that we will see in person at the Hesburgh Rare Books and Special Collections, the Snite Museum of Art, and the Notre Dame Museum of Biodiversity.