Frequently Taught Electives in the Minor

ACMS 34445: Probability and Statistics for Data Science 

Alan Huebner, Applied and Computational Math and Statistics

In this course, you will learn the fundamentals of probability theory and statistical inference used in data science. These foundational principles and techniques will allow you to transform data science problems into mathematical terms and validate them as statistical statements.

Note: this course is delivered fully online. The course design combines required live weekly meetings online with self-scheduled lectures, problems, assignments, and interactive learning materials. To participate, students will need to have a computer with a webcam, reliable internet connection, and a quiet place to participate in live sessions.

ANTH 43200: The Social Species 

Mark Golitko, Anthropology

Human beings are distinguished in the animal kingdom by the degree to which we are embedded in wide-ranging networks of interaction. Anthropologists and archaeologists have long been interested in reconstructing the evolutionary causes of the human social talent, and the nature and structure of these connections in the past and present. This course will review current understanding of the evolutionary causes and consequences of human social networks, beginning with our earliest ancestors, and will review the archaeological and anthropological methods used to study them, including social network analysis (SNA). Network analysis is a powerful set of tools and theories drawn from across the social and physical sciences that can be used to study and model relational data. The course will review both the basics of network analysis as a tool and what we know of the structure of human social networks from classic sociological and anthropological studies, and will discuss how network approaches can be used to study and model interaction and social structure in the past (or present). No prior mathematical or statistical training is needed to take this course.

BIOS 40427: The Epidemiology and Ecology of Infectious Diseases

Edwin Michael, Biology

This course provides an introduction to epidemiology and disease ecology; topics covered include historical perspectives on disease, tracking of disease, spread of disease, and disease mitigation.

BIOS 30318: Introduction to Biocomputing

Stuart Jones, Biology

Modern biology, as well as biochemistry and biophysics, relies significantly on computation. The volumes of data generated by modern lab and field research commonly require greater capacity and more sophisticated algorithms for reformatting, filtering, and analyzing than are available in traditional spreadsheet software. As a result, an efficient and productive scientist must possess at least basic biocomputational skills. Often these requisite skills include the ability to navigate the Unix Shell environment, to understand and implement existing software tools, and to use a scripting language for data processing and analysis. This course will provide students with the knowledge and experience required to apply these important tools in diverse contexts. Approximately one-third of the course will focus on using the Unix Shell environment with an introduction to bioinformatics approaches. The remaining two-thirds of the course will build the students' skills in the use of the R scripting language and applications in statistics and dynamic modeling. No previous coding experience is required of students in this course.

CSE 10102: Elements of Computing II

Corey Pennycuff, Computer Science & Engineering

CSE 10102/CDT 30020 is the second course in the core programming sequence in the Computing & Digital Technologies Minor. Building on your prior experience with the Python programming language, you will explore advanced programming paradigms such as functional and object-oriented programming, familiarize yourself with elements of software engineering such as the command line interface, version control, and development environments, and utilize web and cloud-based services. To demonstrate your mastery of these skills and concepts, you will work on an interdisciplinary team project throughout the semester that applies your knowledge to a problem related to one of the CDT tracks.

CSE 40838: Data Visualization

Chaoli Wang, Computer Science & Engineering

Introduction to scientific and information visualization. Topics include visualization of scalar and vector fields (isosurface extraction, volume rendering, line integral convolution, and particle tracing); visual data representations (parallel coordinates, treemaps, and graph layouts); interactive techniques (focus+context visualization and coordinated multiple views); and solutions for big data visual analytics. Students will gain hands-on experience with a popular visualization programming library (D3.js) and toolkit (ParaView). Students will have the opportunity to learn, implement, and apply visualization techniques through assignments and projects.

CSE 44640: Data Science 

Nitesh Chawla, Computer Science & Engineering

Data science can be viewed as the art and craft of extracting knowledge from large bodies of structured and unstructured data using methods from many disciplines, including (but not limited to) machine learning, databases, probability and statistics, information theory, and data visualization. This course will focus on the process of data science -- from data acquisition to analytics methods to deployment, and will walk the students through both the technical and use-case aspects in the process. It will place a larger emphasis on the machine learning component, with relevant inclusions and references from other disciplines. The course will give students an opportunity to implement and experiment with some of the concepts as part of a class project, in addition to the hands-on assignments using the Python programming language. Additionally, the course touches upon some of the advances in related topics such as big data and discusses the role of data mining in contemporary society. The course has been designed and developed by Nitesh Chawla, the Frank Freimann Professor of Computer Science and Engineering and Director of iCeNSA at the University of Notre Dame.

Note: this course is delivered fully online. The course design combines required live weekly meetings online with self-scheduled lectures, problems, assignments, and interactive learning materials. To participate, students will need to have a computer with a webcam, reliable internet connection, and a quiet place to participate in live sessions. Students who will be on the Main campus are not eligible to enroll in this course.

Students enrolling in this course should have taken one or more courses or implemented one or more projects involving Python programming and one or more courses in probability or statistics.

DESN 40120: Visualization of Data

Neeta Verma, Visual Communication Design

MATERIALS FEE. The course develops an understanding of what data means to humans and how its visualization helps communicate ideas in the fields of medicine, technology, and social sciences. The course touches upon measurement, collection and reporting, and analysis but ultimately focuses on visualization. Visualization is when the data comes alive and is ready to be used to communicate a complex concept, be it numeric, spatial, process, or temporal. Types of data covered in this course include but are not limited to: geographical, cultural, scientific, financial, statistical, meteorological, natural, and transportation data. The goal of the exercises within this course is to understand how data can be used to tell a story as opposed to merely packaging and plotting a set of numbers on a page. The design process therefore explores the static, dynamic, interactive, 3-dimensional, and performance formats of representation to understand why a certain format is more or less suitable for the nature of the data, its analysis, and therefore its representation. Students develop an understanding of how the graphics being used must correlate completely with the data and numbers that are being represented. The course traverses these considerations to understand the various approaches that can be used to bring data to life and allow the viewer to understand a story that is being packaged within the representation. Is there a revelation or a deeper understanding of a pattern, once your data has been visualized and presented, that had not been discovered earlier?

ECON 30331: Econometrics


ENGL 30010/CDT 30380: Text Mining the Novel

Matthew Wilkens, English

A course in quantitative and computational approaches to analyzing large bodies of text. Broadly speaking, the course covers text mining, content analysis, and basic machine learning, emphasizing (but not limited to) approaches with demonstrated value in literary studies. Students will learn how to clean and process textual corpora, extract information from unstructured texts, identify relevant textual and extra-textual features, assess document similarity, cluster and classify authors and texts using a variety of machine-learning methods, visualize the outputs of statistical models, and incorporate quantitative evidence into literary and humanistic analysis. Most of the methods treated in the class are relevant in other fields. Students from all majors are welcome. No prerequisites, but some programming experience is strongly recommended. Taught in Python.

PHIL 24632: Robot Ethics

Don Howard, Philosophy

Robots or "autonomous systems" play an ever-increasing role in many areas, from weapons systems and driverless cars to health care and consumer services. As a result, it is ever more important to ask whether it makes any sense to speak of such systems' behaving ethically and how we can build into their programming what some call "ethics modules." After a brief technical introduction to the field, this course will approach these questions through contemporary philosophical literature on robot ethics and through popular media, including science-fiction text and video. This is an online course with required, regular class sessions each week. Class meetings are online via Zoom webinar software (provided by the University).

Note: this course is delivered fully online. The course design combines required live weekly meetings online with self-scheduled lectures, problems, assignments, and interactive learning materials. To participate, students will need to have a computer with a webcam, reliable internet connection, and a quiet place to participate in live sessions. Students who will be on the Main campus or residing in the Michiana region are not eligible to enroll in this course.

PHIL 20647/MDSC 20647: Data and Artificial Intelligence Ethics

Emanuele Ratti, Philosophy

In the last decade, the Big Data revolution and developments in Artificial Intelligence (AI) have both created promises and raised several ethical issues. Computational emerging technologies have fostered the achievement of apparent benefits, while at the same time they seem to exacerbate social inequalities and threaten even our own existence as a species. In this course, we will discuss those ethical and societal issues related to the development of AI and Big Data that have direct and concrete consequences on the way we perceive ourselves as persons, as members of society, and the way we conceive our place as a species on this planet. These issues will be analyzed in light of major ethical theories, but a special emphasis will be placed on virtue ethics. Recent works in virtue ethics are well positioned to make sense of the importance of our place as human beings on this planet, but at the same time they can account for the indispensable roles that machines play in our environment. The course is divided into three main parts. In the first part, I will introduce the main ethical frameworks, and in particular virtue ethics. In the second part, we will discuss AI. Societal and ethical issues raised by AI include the threats posed to the existence of our species; whether we should trust AI or we should find a way to build artificial agents with moral characteristics; whether AI will do most of our jobs in the future and if this scenario is desirable. In the third part, we will focus on selected issues concerning the Big Data revolution, such as how the autonomy of very complex algorithms can shape our lives in opaque ways and whether transparency is desirable; if the design of algorithms may hide bias leading to social inequalities; how algorithms are changing the way healthcare is provided. Upon successful completion of this course, you will be able to: (1) define and sketch focal points of virtue ethics and other relevant ethical theories; (2) identify moral theories in arguments provided in support of or in opposition to the use of certain AI-related and Big Data technologies; and (3) compare different arguments and highlight strengths and weaknesses.

PHYS 60410/MDSC 40410: Patterns of Life 

Dervis Can Vural, Physics

This course focuses on the mathematical principles underlying the spatiotemporal patterns emerging in biological populations. Students are expected to be comfortable with calculus, differential equations, linear algebra, and elementary probability theory. The first part of the course focuses primarily on population genetics and evolutionary biology, while the second part will focus on reaction diffusion equations and pattern formation. Students will be expected to solve quantitative problems and design simulations, and will be guided towards developing research projects related to theoretical and computational biology.

POLS 30813/KSGA 30005/MDSC 30005: Simulating Politics and Global Affairs

Thomas Mustillo, Global Affairs

Politics, markets, and the environment are all spheres of development that are fundamentally shaped by the action and interaction of many individuals over time. For example, the Arab Spring protests, the shortage of medicines in Caracas, and the rising water temperatures of the Baltic Sea are all system-level outcomes arising from the individual actions of thousands or even billions of people. In these spheres, leadership is often weak or non-existent. Scientists call these "complex systems." Complexity is difficult to study in the real world. Instead, scientists often approach these phenomena using computer simulations (sometimes called agent-based models, social network models, and computational models). The goal is to build computer models of development that link the actions and interactions of individuals to the system-level outcomes. This class will use the perspective, literature, and tools of complexity science to approach core questions in the field of development.

POLS 40815: Visualizing Politics

Michael Coppedge, Political Science

This course is an introduction to political, economic, and social issues through the medium of visual displays. This kind of course has become feasible because data are now abundant and easy to access and software for displaying and analyzing data is available and easy to use. The ability to examine and display data is an increasingly valuable skill in many fields. However, this skill must be complemented by the ability to interpret visual displays orally, and by a commitment to use data responsibly: to reveal, rather than slant or distort, the truth. We will discuss examples concerning drugs, marriage, climate change, development, economic performance, social policy, democracy, voting, public opinion, and conflict, but the main emphasis is on helping you explore many facets of an issue of particular interest to you. You will learn to manage data and produce your own graphics to describe and explain political, social, economic (or other!) relationships. The graphics will include line and bar graphs, 2D and 3D scatterplots, motion charts, maps, and others.

POLS 34815/MDSC 34815: How To (Not) Lie with Statistics

Jeff Harden, Political Science

How will Amazon HQ2 impact local economies? Should parents allow kids to have screen time? What role did demographic shifts in suburban areas play in the 2016 and 2018 elections? Does the infield shift work? Modern society constantly faces questions that require data, statistics, and other empirical evidence to answer well. But the proliferation of niche media outlets, the rise of fake news, and the increase in academic research retractions make navigating potential answers to these questions difficult. This course is designed to give students tools to confront this challenge by developing their statistical and information literacy skills. It will demonstrate how data and statistical analyses are susceptible to a wide variety of known and implicit biases, which may ultimately lead consumers of information to make problematic choices. The course will consider this issue from the perspectives of consumers of research as well as researchers themselves. We will discuss effective strategies for reading and interpreting quantitative research while considering the incentives researchers face in producing it. Ultimately, students will complete the class better equipped to evaluate empirical claims made by news outlets, social media, or their peers. The goal is to encourage students to approach data-driven answers to important questions with appropriate tools rather than blind acceptance or excessive skepticism.

POLS 30111: Data and Politics

Nathanael Gratias Sumaktoyo, Postdoctoral Fellow, Global Religion Research Initiative

Sherlock Holmes famously said in The Adventure of the Copper Beeches, "Data! Data! Data! I can't make bricks without clay." Similarly, it is hard to understand our world without data, big or small. Data allows us to look for patterns and trends, to test for relationships, and to make predictions. This course is all about data and how we can use it to understand politics and social phenomena in general. We will learn various approaches to data gathering and analysis, ranging from public opinion surveys, experimental methods, unobtrusive measures, and machine learning to social network analysis. Our assignments and exams will test both students' knowledge of the methods and ability to do basic analysis. While preexisting knowledge of statistical methods is not required, a willingness to learn statistics and programming (especially with R) is crucial. In terms of substance, while we will also discuss topics such as consumer behavior (think Netflix and Amazon), our foci will be on political behavior and religious behavior. Our focus on political behavior will include voting behavior, representation/redistricting, and political campaigns, whereas our focus on religious behavior will include religious violence, terrorism, and interreligious cooperation. Naturally, we will also discuss how the two foci are related both in the U.S. and around the world. This course counts as a methodology course for departmental honors.

PSY 40121: Psychological Measurement and Test Development

Ying (Alison) Cheng, Psychology

This course introduces measurement of human behavior in psychological studies, the construction and use of psychological instruments and educational assessments (including tests of intelligence, achievement, personality, and vocational interest), validation of these tests following classical test theory and item response theory, as well as practice in test construction, administration and validation. The course also highlights issues of test equality across groups, assessing measurement error, interpretation of test scores in the context of criterion-referenced tests vs. norm-referenced tests, standard setting and so on.

PSY 40122: Machine Learning for Social and Behavioral Research 

Ross Jacobucci, Psychology

Cluster analysis is a statistical approach for the analysis of multivariate data that aims at discovering groups of subjects in a sample that are similar to each other. Clustering techniques are applied in a wide variety of areas including psychiatry (e.g., finding disease categories), marketing (e.g., different consumer profiles), sociology (e.g., social subgroups), etc. Cluster analysis is an example of unsupervised learning. The latter term is derived from the fact that clusters are discovered in the absence of an outcome variable that guides the clustering. Regression trees and random forests, on the other hand, are supervised learning approaches. An outcome such as "case" and "control" is predicted by a number of predictor variables, and the analysis focuses on finding groups with similar response patterns on the predictor variables. In other words, the outcome variable guides ("supervises") finding groups of similar subjects. Outcomes in regression trees or forests can be categorical or continuous. This graduate level course consists of two parts. The first part covers the basics of cluster analysis whereas the second part provides an introduction to regression trees and random forests. The course consists of approximately 2/3 lectures providing the theoretical background, and 1/3 lab sessions, which will use the free software program R. The course is suitable for students with a strong interest in methods. Basic knowledge of matrix algebra and thorough knowledge of regression analysis are prerequisites. Due to the broad variety of applications of cluster analysis and regression trees, students in (Quantitative) Psychology, Sociology, Political Sciences, or Computer Sciences are equally welcome.

PSY 30109: R for Data Science and Exploratory and Graphical Data Analysis 

Zhiyong (Johnny) Zhang, Psychology

This class aims to equip students with basic knowledge of R in data manipulation, data generation, data visualization and data analysis with a focus on data science. The first part of the class will introduce the very basics of R including the types of data such as vectors, matrices, and data frames as well as tibbles for refined data frames and bigmatrix for big data. The second part of the class will introduce data manipulation and preprocessing methods such as data transformation, subsetting, and combination. The third part will deal with specific types of data such as strings, texts, dates and times, images, audio, and video. The fourth part will teach ggplot2 and related packages for data visualization. The last part of the class will illustrate how to conduct data analysis using the above techniques through case studies such as basket analysis, network analysis, and log analysis. The class does not require previous knowledge of R.

PSY 40120: Advanced Statistics 

Zhiyong (Johnny) Zhang, Psychology

This course extends PSY 30100 in two respects. First, additional attention is given to the logic of inferential statistics. Special focus is placed on the purpose, strengths, and limitations of hypothesis testing, especially as it is used in psychological research. Second, this course considers statistical analysis of data from more complex data structures than typically covered in PSY 30100. The goal of this part of the course is to heighten students' awareness of the variety of research questions that can be addressed through a wide range of designs and accompanying analyses. The orientation of the entire course focuses much less on the computational aspects of analyzing data than on the conceptual bases of what can be learned from different approaches to data analysis.

PSY 30105: Exploratory and Graphical Data Analysis 

Zhiyong (Johnny) Zhang, Psychology

The process by which Psychological knowledge advances involves a cycle of theory development, experimental design and hypothesis testing. But after the hypothesis test either does or doesn't reject a null hypothesis, where does the idea for the next experiment come from? Exploratory data analysis completes this research cycle by helping to form and change new theories. After the planned hypothesis testing for an experiment has finished, exploratory data analysis can look for patterns in these data that may have been missed by the original hypothesis tests. A second use of exploratory data analysis is in diagnostics for hypothesis tests. There are many reasons why a hypothesis test might fail. There are even times when a hypothesis test will reject the null for an unexpected reason. By becoming familiar with data through exploratory methods, the informed researcher can understand what went wrong (or what went right for the wrong reason). This class is recommended for advanced students who are interested in getting the most from their data.

SOC 43990/MDSC 43990: Social Networks

David Hachen, Sociology

Social networks are an increasingly important form of social organization. Social networks help to link persons with friends, families, co-workers and formal organizations. Via social networks, information flows, support is given and received, trust is built, resources are exchanged, and interpersonal influence is exerted. Rather than being static, social networks are dynamic entities. They change as people form and dissolve social ties to others during the life course. Social networks have always been an important part of social life: in our kinship relations, our friendships, at work, in business, in our communities and voluntary associations, in politics, in schools, and in markets. Our awareness of and ability to study social networks have increased dramatically with the advent of social media and new communication tools through which people interact with others. Through email, texting, Facebook, Twitter and other platforms, people connect and communicate with others and leave behind traces of those interactions. This provides a rich source of data that we can use to better understand our connections to each other; how these connections vary across persons and change over time; and the impact that they have on our behaviors, attitudes, and tastes. This course will introduce students to (1) important substantive issues about, and empirical research on, social networks; (2) theories about network evolution and network effects on behavior; and (3) tools and methods that students can use to look at and analyze social networks. The course will be a combination of lectures, discussions and labs. Course readings will include substantive research studies, theoretical writings, and methodological texts. Through this course students will learn about social networks by collecting data on social networks and analyzing that data.

SOC 43919/MDSC 43919: Text Analysis for Social Science

Marshall Taylor, Sociology

Screens are all around us. From T.V.s to smartphones and e-books, the ubiquity of screens and the fact that we use them to communicate with one another means that virtually all of us create some form of "text data" every day. Further, the proliferation of mass communication technologies over the past couple of decades - including the rise of social media, the emphasis on document digitization in archives, libraries, and organizations, and increasing access to these data - has opened the door to new questions for social scientists and to new data and methods for answering these questions. For example, do anti-immigration laws shape how people tweet about immigration? Does war shape how U.S. presidents frame the role of governance in society, as reflected in State of the Union addresses? What accounts for the gender gap in net neutrality activism? Did national news media or activist social media matter more for sparking #BlackLivesMatter? Can Twitter sentiment predict stock market activity? This course will introduce students to some of the methods that social scientists use to answer these types of questions. The focus will be on understanding and developing some of the fundamentals for designing and conducting text analysis projects from a social science perspective. We will also touch on some of the more advanced topics in this rapidly growing field. Hands-on analysis in the R statistical computing environment will be integral to the course, though no prior coding experience is required.