A. Kuncoro, C. Dyer, J. Hale, D. Yogatama, S. Clark et al., LSTMs can learn syntax-sensitive dependencies well, but modeling structure makes them better, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp.1426-1436, 2018.

M. Baroni, Linguistic generalization and compositionality in modern artificial neural networks, 2019.

Y. Belinkov, N. Durrani, F. Dalvi, H. Sajjad, and J. Glass, What do neural machine translation models learn about morphology?, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp.861-872, 2017.

I. Berent and G. Marcus, No integration without structured representations: Response to Pater. Language, vol.95, pp.75-86, 2019.

J.-P. Bernardy and S. Lappin, Using deep neural networks to learn syntactic agreement. Linguistic Issues in Language Technology, vol.15, pp.1-15, 2017.

T. Blevins, O. Levy, and L. Zettlemoyer, Deep RNNs encode soft hierarchical syntax, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp.14-19, 2018.

S. R. Bowman, C. D. Manning, and C. Potts, Tree-structured composition in neural networks without tree-structured architectures, NIPS Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches, 2015.

N. Chomsky, Syntactic Structures. Mouton, 1957.

N. Chomsky, Rules and representations. Behavioral and Brain Sciences, vol.3, pp.1-15, 1980.

S. A. Chowdhury and R. Zamparelli, RNN simulations of grammaticality judgments on long-distance dependencies, Proceedings of the 27th International Conference on Computational Linguistics, pp.133-144, 2018.

A. Clark and R. Eyraud, Learning auxiliary fronting with grammatical inference, Conference on Computational Natural Language Learning, pp.125-132, 2006.

A. Clark and S. Lappin, Unsupervised learning and grammar induction, Handbook of Computational Linguistics and Natural Language Processing, 2010.

A. Conneau, G. Kruszewski, G. Lample, L. Barrault, and M. Baroni, What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pp.2126-2136, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01898412

C. Dyer, A. Kuncoro, M. Ballesteros, and N. A. Smith, Recurrent neural network grammars, North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.

G. Frege, Über Sinn und Bedeutung, Zeitschrift für Philosophie und philosophische Kritik, vol.100, pp.25-50, 1892.

M. Giulianelli, J. Harding, F. Mohnert, D. Hupkes, and W. Zuidema, Under the hood: Using diagnostic classifiers to investigate and improve how language models track agreement information, EMNLP Workshop Blackbox NLP: Analyzing and Interpreting Neural Networks for NLP, pp.240-248, 2018.

E. M. Gold, Language identification in the limit, Information and Control, vol.10, pp.447-474, 1967.

K. Gulordava, P. Bojanowski, E. Grave, T. Linzen, and M. Baroni, Colorless green recurrent networks dream hierarchically, North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.1195-1205, 2018.

J. Hewitt and P. Liang, Designing and interpreting probes with control tasks, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2019.

J. Hewitt and C. D. Manning, A structural probe for finding syntax in word representations, Proceedings of the North American Chapter of the Association for Computational Linguistics, 2019.

D. Hupkes, S. Veldhoen, and W. H. Zuidema, Visualisation and 'diagnostic classifiers' reveal how recurrent and recursive neural networks process hierarchical structure, Journal of Artificial Intelligence Research, vol.61, pp.907-926, 2018.

D. Kauchak, Improving text simplification language modeling using unsimplified text data, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp.1537-1546, 2013.

B. M. Lake and M. Baroni, Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks, 35th International Conference on Machine Learning, 2018.

S. Lappin and S. Shieber, Machine learning theory and practice as a source of insight into universal grammar, Journal of Linguistics, vol.43, pp.393-427, 2007.

B. Levin and M. Rappaport Hovav, Argument Realization, 2005.

O. Levy, S. Remus, C. Biemann, and I. Dagan, Do supervised distributional methods really learn lexical inference relations?, Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies, pp.970-976, 2015.

X. L. Li and J. Eisner, Specializing word embeddings (for parsing) by information bottleneck, 2019 Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing, pp.2744-2754, 2019.

T. Linzen, What can linguistics and deep learning contribute to each other? Response to Pater. Language, vol.95, pp.98-108, 2019.

T. Linzen, E. Dupoux, and Y. Goldberg, Assessing the ability of LSTMs to learn syntax-sensitive dependencies, Transactions of the Association for Computational Linguistics, vol.4, pp.521-535, 2016.

R. Marvin and T. Linzen, Targeted syntactic evaluation of language models, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.

R. T. McCoy, R. Frank, and T. Linzen, Revisiting the poverty of the stimulus: Hierarchical generalization without a hierarchical bias in recurrent neural networks, arXiv, 2018.

T. McCoy, E. Pavlick, and T. Linzen, Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.3428-3448, 2019.

T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, Recurrent neural network based language model, INTERSPEECH, 2010.

F. J. Newmeyer, Grammar is grammar and usage is usage. Language, vol.79, pp.682-707, 2003.

T. Niven and H. Kao, Probing neural network comprehension of natural language arguments, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.4658-4664, 2019.

J. Pater, Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language, vol.95, pp.41-74, 2019.

M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark et al., Deep contextualized word representations, North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

M. E. Peters, M. Neumann, L. Zettlemoyer, and W. Yih, Dissecting contextual word embeddings: Architecture and representation, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp.1499-1509, 2018.

S. Ravfogel, Y. Goldberg, and T. Linzen, Studying the inductive biases of RNNs with synthetic variations of natural languages, 2019.

S. Ravfogel, Y. Goldberg, and F. Tyers, Can LSTM learn to capture agreement? The case of Basque, EMNLP Workshop Blackbox NLP: Analyzing and Interpreting Neural Networks for NLP, pp.98-107, 2018.

N. Saphra and A. Lopez, Language models learn POS first, EMNLP Workshop Blackbox NLP: Analyzing and Interpreting Neural Networks for NLP, pp.328-330, 2018.

D. Saxton, E. Grefenstette, F. Hill, and P. Kohli, Analysing mathematical reasoning abilities of neural models, Proceedings of the 7th International Conference on Learning Representations, 2019.

M. van Schijndel, A. Mueller, and T. Linzen, Quantity doesn't buy quality syntax with neural language models, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing, pp.5830-5836, 2019.

C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal, vol.27, pp.379-423, 1948.

I. Tenney, P. Xia, B. Chen, A. Wang, A. Poliak et al., What do you learn from context? Probing for sentence structure in contextualized word representations, International Conference on Learning Representations, 2019.

N. Tishby, F. Pereira, and W. Bialek, The information bottleneck method, Annual Allerton Conference on Communication, Control and Computing, pp.368-377, 1999.

K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, Feature-rich part-of-speech tagging with a cyclic dependency network, Proceedings of the North American Chapter of the Association for Computational Linguistics, pp.173-180, 2003.